Hello and welcome to using MySQL to build Big Data applications.
This is going to be a tutorial about, obviously, using MySQL to build Big Data applications. But when I say Big Data, there could be two problems that you are addressing.
Either it's a problem of scaling, as in: my system already has a lot of data, and I would like to make the existing features more performant or be able to handle more volume.
The other problem is reporting, in the sense that you already have Big Data and you are asked to make use of it in some way: either to give more insight to the business users in your organization, or to give aggregated reports to your customers about how they are performing.
I'm going to focus today on this side, reporting, of the Big Data problem.
So what is the problem with Big Data?
Basically, you have a very large table with millions or billions of rows, and in order to do the reporting that you need to do, you have to gather all this information from the table and process it in some way.
What does that mean in terms of the underlying physics of it? You have a hard disk (let's pretend that's a hard disk), and in order to get certain rows from the table on the hard disk, you have to go over many different places on the disk. So if it is a large amount of data, that will obviously be more time consuming. If the data is fragmented across different places on the hard disk, that means the disk has to spin more.
Once you have that, you need to get the data into the CPU (roughly) to aggregate it, to process it, to manipulate it into whatever form you need. Then you produce a report, which you later provide to your users, and they are happy about it. (I'm not sure if you can see that.)
This problem has actually been around for a very long time: how are we able, with existing hardware technologies, to get more data faster so that we can process it and turn it into a report?
Many years ago, a person called Ralph Kimball, who is the main, or one of the two main, contributors to data warehousing (I wouldn't say movement, but technology), came up with an idea in 1995 or 1996. He said, basically: no matter what the technology is, we'll always have to go through a large number of rows, so how can we design our database in a way that lets us produce reports without very resource-intensive operations?
His solution to this problem was basically to create something called a summary table. A summary table is an aggregated version of this table: obviously smaller and with fewer rows. The data has already been taken from here (the large table) and summarized here (the small table). So when you access the summary table, it's obviously much easier to get the rows and much easier to give back the results.
So let me give some examples of what that would look like.
Let's say you have a table and it has orders, like a basic e-commerce site, and you usually get a hundred thousand rows per day. That's not really an issue for any relational database. You store those rows in the database. That's fine.
But over a period of time, let's say a year, you have quite a large number of rows: you start to have 36.5 million rows, and that could get cumbersome. In some cases it could be much more than 100,000 rows a day, but let's stick to this example.
So you want to create a report from the orders table. The business users in your organization want to know how certain products are doing across particular dates. What you could do is create a summary table like this.
For the sake of clarity, I'll write a SELECT statement here that explains the contents of the summary table. So you have SELECT; let's say we need date, because that is what was requested, and product_id, and we want the aggregated revenue, so SUM(revenue), FROM orders. And then we GROUP BY the two key columns, date and product_id.
This is now the new summary table, and we can call it product revenue summary. Let's say we have a hundred products, so this will have a hundred rows a day.
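As a rough sketch of the statement described above: the snippet below builds the product revenue summary with one GROUP BY over an orders table. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name `product_revenue_summary`, the column types and the sample rows are assumptions made for illustration; only `date`, `product_id` and `revenue` come from the talk.

```python
import sqlite3


def build_product_revenue_summary(conn):
    """Rebuild the summary: one row per (date, product_id)."""
    conn.execute("DROP TABLE IF EXISTS product_revenue_summary")
    conn.execute(
        """
        CREATE TABLE product_revenue_summary AS
        SELECT date, product_id, SUM(revenue) AS revenue
        FROM orders
        GROUP BY date, product_id
        """
    )


def demo_product_summary():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and rows, for illustration only.
    conn.execute("CREATE TABLE orders (date TEXT, product_id TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-06", "13A", 10.0),
            ("2024-01-06", "13A", 5.0),
            ("2024-01-06", "7B", 20.0),
            ("2024-01-08", "13A", 8.0),
        ],
    )
    build_product_revenue_summary(conn)
    return conn.execute(
        "SELECT date, product_id, revenue FROM product_revenue_summary "
        "ORDER BY date, product_id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo_product_summary():
        print(row)
```

Four order rows collapse into three summary rows here; with a hundred products the summary never grows by more than a hundred rows a day, however many orders arrive.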
So obviously, after generating this table, you could provide it to your business users and say, "Do whatever you need. Find out whatever information you want to gather."
Let's say, for example, someone wants to query for product 13A and how it performed on weekends. Perhaps you would find a table of dates, get only the weekends, and INNER JOIN it with the summary table. You'll get the answer very quickly, and your users will be happy because of it.
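The weekend question could look roughly like this. It is a sketch only: the `dates` table with an `is_weekend` flag is a hypothetical helper table (the talk only says "a table of dates"), and sqlite3 again stands in for MySQL.

```python
import sqlite3


def weekend_revenue(conn, product_id):
    """Total weekend revenue for one product, read from the summary table."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.revenue), 0)
        FROM product_revenue_summary s
        INNER JOIN dates d ON d.date = s.date
        WHERE d.is_weekend = 1
          AND s.product_id = ?
        """,
        (product_id,),
    ).fetchone()
    return row[0]


def make_weekend_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_revenue_summary "
        "(date TEXT, product_id TEXT, revenue REAL)"
    )
    # Hypothetical dates table: one row per calendar day.
    conn.execute("CREATE TABLE dates (date TEXT PRIMARY KEY, is_weekend INTEGER)")
    conn.executemany(
        "INSERT INTO product_revenue_summary VALUES (?, ?, ?)",
        [
            ("2024-01-06", "13A", 15.0),  # Saturday
            ("2024-01-07", "13A", 4.0),   # Sunday
            ("2024-01-08", "13A", 8.0),   # Monday
            ("2024-01-06", "7B", 20.0),
        ],
    )
    conn.executemany(
        "INSERT INTO dates VALUES (?, ?)",
        [("2024-01-06", 1), ("2024-01-07", 1), ("2024-01-08", 0)],
    )
    return conn


if __name__ == "__main__":
    print(weekend_revenue(make_weekend_demo(), "13A"))
```

The query touches only the small summary table and the small dates table, never the orders table itself.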
A different example, or a different summary table, could be for people who are interested to know how the product is selling across a particular geography; in this case, let's say city.
Since city_id isn't recorded in the orders table, we need to enrich the table a little bit, and the way we do that is to INNER JOIN it with the addresses table. I'll just write it here: SELECT o.date (o for orders), city, and SUM(o.revenue) FROM orders o INNER JOIN addresses a USING (address_id), GROUP BY date and city.
With that we fill up a new summary table called city revenue summary.
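A minimal sketch of that enrichment step, again with sqlite3 standing in for MySQL. The `addresses` layout (`address_id`, `city_id`) and the sample data are assumptions; the talk only says that orders carry an `address_id` and that the city lives in the addresses table.

```python
import sqlite3


def build_city_revenue_summary(conn):
    """Rebuild the summary: one row per (date, city_id)."""
    conn.execute("DROP TABLE IF EXISTS city_revenue_summary")
    conn.execute(
        """
        CREATE TABLE city_revenue_summary AS
        SELECT o.date AS date, a.city_id AS city_id, SUM(o.revenue) AS revenue
        FROM orders o
        INNER JOIN addresses a USING (address_id)
        GROUP BY o.date, a.city_id
        """
    )


def demo_city_summary():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas and rows, for illustration only.
    conn.execute("CREATE TABLE addresses (address_id INTEGER PRIMARY KEY, city_id TEXT)")
    conn.execute("CREATE TABLE orders (date TEXT, address_id INTEGER, revenue REAL)")
    conn.executemany(
        "INSERT INTO addresses VALUES (?, ?)",
        [(1, "LON"), (2, "LON"), (3, "PAR")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-06", 1, 10.0),
            ("2024-01-06", 2, 5.0),
            ("2024-01-06", 3, 20.0),
            ("2024-01-07", 3, 7.0),
        ],
    )
    build_city_revenue_summary(conn)
    return conn.execute(
        "SELECT date, city_id, revenue FROM city_revenue_summary "
        "ORDER BY date, city_id"
    ).fetchall()


if __name__ == "__main__":
    for row in demo_city_summary():
        print(row)
```

The join is paid once, when the summary is built; anyone reading city revenue summary afterwards does not need the addresses table at all.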
So here we have two summary tables: two different ways of slicing the data. Now, you aren't exactly limited in the number of summary tables you can have. Obviously, they take a certain amount of space, and they also take some effort to create, but we'll get into that soon.
What you could have done here, for example, is add city to the product summary, so you have date, product_id and city_id. That makes it a larger summary table, but you can get the data in two different ways. Or you can treat it as a more extensive summary table with a finer level of granularity, where you could search by product and city and date. That could be a user requirement. It depends on whether you're only interested in slicing the data this way (first summary table) or this way (second summary table).
Currently you have two summary tables, and this particular summary table (city revenue summary) has saved you an INNER JOIN. Saving an INNER JOIN could be quite valuable in terms of performance.
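To illustrate the trade-off, here is a small sketch of the combined table: one finer-grained summary keyed by date, product_id and city_id, which can be rolled up either way at query time. The table name `product_city_revenue_summary` and the helper `rollup` are made up for this example, and sqlite3 stands in for MySQL.

```python
import sqlite3

ALLOWED_KEYS = ("product_id", "city_id")


def rollup(conn, key):
    """Re-aggregate the combined summary by date plus one chosen key."""
    if key not in ALLOWED_KEYS:
        raise ValueError(f"unsupported key: {key}")
    # key is checked against a fixed whitelist before being put into the SQL.
    return conn.execute(
        f"""
        SELECT date, {key}, SUM(revenue)
        FROM product_city_revenue_summary
        GROUP BY date, {key}
        ORDER BY date, {key}
        """
    ).fetchall()


def make_combined_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_city_revenue_summary "
        "(date TEXT, product_id TEXT, city_id TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO product_city_revenue_summary VALUES (?, ?, ?, ?)",
        [
            ("2024-01-06", "13A", "LON", 10.0),
            ("2024-01-06", "13A", "PAR", 5.0),
            ("2024-01-06", "7B", "LON", 20.0),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_combined_demo()
    print(rollup(conn, "product_id"))
    print(rollup(conn, "city_id"))
```

The combined table has more rows than either of the two narrow summaries, and each report now needs a GROUP BY again, so whether it is worth it depends on how the users actually slice the data.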
So, those are the two examples. I'd just like to quickly give another example, of what happens nowadays in some other companies. Some social networks already kind of use the idea of summary tables in their systems.
Let's say they have lots of servers, spread geographically: this is Europe, this is North America, this is South America, and this is Asia. In order to get the reports they are interested in, they would get data from all the servers into, let's say, a map/reduce system; in this case, let's say Hadoop, for example.
And remember, once it arrives here, we don't need the exact data from them. We need the aggregated data to go to another database or another summary table. Once the data is aggregated, it goes into a reporting database. Depending on their needs, this can be a MySQL database; if their needs are greater, it could be any number of commercial or open source solutions which can handle larger amounts of data.
But the theory is very similar to the summary table example: you get data from someplace, you summarize and aggregate it, and you put it into a reporting database. Your users then have access to that database and can query it according to whatever they need, and they can also create reports of their own.
[unclear] A report is static: it is as it is. Whereas here, if they want to change the query and get different information, say they discover something and want more information according to what they found, they can query for that as well.
So, as well, there are a lot of things you're adding. Basically you have to add an ETL system, a subsystem or something, and whoever creates it needs to make sure that data is arriving, constantly arriving, to the summary tables, and that everything goes accordingly as well.
So, there are two ways that you can refresh the summary tables: one is real time and one is batch.
Batch usually runs either as a cron job once an hour, or as a job at night time, at 2 a.m. for example, when no one is using the servers [unclear]. Let's say, for example, that it runs once an hour: you're getting all the orders that happened over the last hour, aggregating them, and putting them into the summary tables. At night time it's the same principle, but for the whole day's worth of orders, so it becomes a somewhat larger amount of data.
In these cases it's important to note that the summary tables are only refreshed once an hour or once a day. If it's a business decision that that's okay, then you can go ahead and do that.
[unclear] As with any ETL system, you need to set up monitoring, because you can't just take it for granted that everything is all right. You can't just create something like the SELECT statement that I created, put it in the crontab, and assume that nothing goes wrong. You have to make sure that there are no warning messages and no errors, and that the data is arriving as it should.
Regarding MySQL, to set something like this up you can basically use the SELECT statement that I did, with the SELECT and a WHERE clause depending on your interval, and an INSERT INTO.
[unclear] So, for example, you have data coming through replication from the regular database, so you can set up a reporting server [unclear]. Here, for example, you have the two summary tables, and you would use the INSERT statement here, an INSERT ... SELECT: take the data from here, apply the WHERE clause and the GROUP BY, and insert it into the summary table.
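A batch load along those lines might look like the sketch below: an INSERT ... SELECT with a WHERE clause that limits the run to one interval, here a single day. The function name `load_day`, the schema and the sample rows are assumptions for illustration, and sqlite3 stands in for MySQL. For an hourly job the WHERE clause would instead bound a timestamp column to the last hour, and the summary would need an hour column or a merge step.

```python
import sqlite3


def load_day(conn, day):
    """Aggregate one day's orders and append them to the summary table."""
    with conn:  # commit on success, roll back if the statement fails
        conn.execute(
            """
            INSERT INTO product_revenue_summary (date, product_id, revenue)
            SELECT date, product_id, SUM(revenue)
            FROM orders
            WHERE date = ?
            GROUP BY date, product_id
            """,
            (day,),
        )


def demo_batch_load():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas and rows, for illustration only.
    conn.execute("CREATE TABLE orders (date TEXT, product_id TEXT, revenue REAL)")
    conn.execute(
        "CREATE TABLE product_revenue_summary "
        "(date TEXT, product_id TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("2024-01-06", "13A", 10.0),
            ("2024-01-06", "13A", 5.0),
            ("2024-01-07", "13A", 8.0),  # not part of this run
        ],
    )
    load_day(conn, "2024-01-06")  # what a nightly cron job would call
    return conn.execute(
        "SELECT date, product_id, revenue FROM product_revenue_summary"
    ).fetchall()


if __name__ == "__main__":
    print(demo_batch_load())
```

Running `load_day` twice for the same day would insert the rows twice with this plain INSERT, which is one of the things monitoring has to catch.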
The other way of doing this is perhaps slightly more confusing, but sometimes it's a requirement: you do a SELECT ... INTO OUTFILE to some kind of file, and then you load that file with the LOAD DATA INFILE command in MySQL. This sometimes just helps with [unclear] requirements. So, either one of these; please do whatever is more convenient for you.
I would advocate here, for example, when you have a GROUP BY, also adding an ORDER BY NULL. MySQL by default would have the GROUP BY also ORDER BY the columns that you chose, and that means it has to do an additional filesort. If you don't need that, if you don't require the ordering, you can cancel it with ORDER BY NULL.
You can also put the usual unique keys on these tables, and you can use REPLACE INTO [unclear].
[unclear] If it's only new data that arrives, that's fine. But if, for example, there's a chance of older data being updated, then you perhaps have to include more than one hour's worth of data in your interval, and you maybe have to do, as I said, REPLACE INTO instead of INSERT INTO. For data that may be updated, you either have to record which days changed, or you can just say in bulk: the last three, six, seven days, if any data was updated in that period of time, please update it. [unclear] It's something to look out for.
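The "just redo the last few days" approach can be sketched as follows, assuming a unique key on (date, product_id) so that REPLACE INTO overwrites the existing summary row instead of adding a second one. The names `refresh_recent` and `since_date` are made up for this example; sqlite3 stands in for MySQL, and both accept REPLACE INTO ... SELECT.

```python
import sqlite3


def refresh_recent(conn, since_date):
    """Recompute every summary row from since_date onwards.

    Safe to run repeatedly: the primary key makes REPLACE overwrite old rows.
    Caveat: if every order for a (date, product_id) pair is deleted, the
    stale summary row is left behind and has to be cleaned up separately.
    """
    with conn:
        conn.execute(
            """
            REPLACE INTO product_revenue_summary (date, product_id, revenue)
            SELECT date, product_id, SUM(revenue)
            FROM orders
            WHERE date >= ?
            GROUP BY date, product_id
            """,
            (since_date,),
        )


def demo_refresh():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schemas and rows, for illustration only.
    conn.execute(
        "CREATE TABLE orders "
        "(order_id INTEGER PRIMARY KEY, date TEXT, product_id TEXT, revenue REAL)"
    )
    conn.execute(
        "CREATE TABLE product_revenue_summary "
        "(date TEXT, product_id TEXT, revenue REAL, PRIMARY KEY (date, product_id))"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "2024-01-05", "13A", 10.0), (2, "2024-01-06", "13A", 5.0)],
    )
    refresh_recent(conn, "2024-01-01")
    # An older order is corrected after it was already summarized.
    conn.execute("UPDATE orders SET revenue = 12.0 WHERE order_id = 1")
    refresh_recent(conn, "2024-01-04")  # bulk "last few days" refresh
    return conn.execute(
        "SELECT date, product_id, revenue FROM product_revenue_summary ORDER BY date"
    ).fetchall()


if __name__ == "__main__":
    print(demo_refresh())
```

The wider the window, the more rows each run has to re-read from the orders table, so the number of days is a trade-off between catching late updates and keeping the job cheap.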
What you can also do is batch with Hadoop [unclear], which has become quite good recently. This is still the same batch approach. I would say, though, that regarding the GROUP BYs, this kind of aggregation is where it is very, very strong. You would look at it [unclear] because [unclear] does not parallelize very well. If it takes a very long time, you may want to use it in some way. The difference here is just that this is one server, and that can be five, six, seven, eight servers crunching the data. You would then get the result into a MySQL database [unclear], or you can have a specific reporting database.
Regarding real time: in essence this basically means triggers [unclear]. With triggers, once you update your regular database, when you insert into the regular database, the summary tables are updated as well, in the same instance.
So there is a bit of an overhead. If you have a high load on the database in general and you can't take this additional load, you may want to consider batch, or consider whether you want to use triggers at all, unless the requirement has to be real time; and I think for social networks this is going to be a likely requirement.
So basically, when you have an INSERT statement here, and it's a SUM for example, you have a SUM aggregation, then for this particular row you would add the amount of this one row into the summary table. If a row has been updated, change the total; if it's deleted, remove it.
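Here is one way the trigger approach could be sketched for a SUM. It uses SQLite triggers through Python's sqlite3 module as a stand-in, so the trigger syntax differs in detail from MySQL's (MySQL uses FOR EACH ROW bodies and a changed DELIMITER in the client). The trigger names and schema are assumptions. The update trigger only handles a changed revenue, not an order moved to another date or product.

```python
import sqlite3

SCHEMA = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, date TEXT, product_id TEXT, revenue REAL
);
CREATE TABLE product_revenue_summary (
    date TEXT, product_id TEXT, revenue REAL NOT NULL DEFAULT 0,
    PRIMARY KEY (date, product_id)
);
-- Insert: make sure the summary row exists, then add the new amount.
CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
BEGIN
    INSERT OR IGNORE INTO product_revenue_summary (date, product_id, revenue)
    VALUES (NEW.date, NEW.product_id, 0);
    UPDATE product_revenue_summary
    SET revenue = revenue + NEW.revenue
    WHERE date = NEW.date AND product_id = NEW.product_id;
END;
-- Update: swap the old amount for the new one.
CREATE TRIGGER orders_after_update AFTER UPDATE OF revenue ON orders
BEGIN
    UPDATE product_revenue_summary
    SET revenue = revenue - OLD.revenue + NEW.revenue
    WHERE date = OLD.date AND product_id = OLD.product_id;
END;
-- Delete: take the amount back out (the row stays, possibly at 0).
CREATE TRIGGER orders_after_delete AFTER DELETE ON orders
BEGIN
    UPDATE product_revenue_summary
    SET revenue = revenue - OLD.revenue
    WHERE date = OLD.date AND product_id = OLD.product_id;
END;
"""


def summary_revenue(conn, date, product_id):
    row = conn.execute(
        "SELECT revenue FROM product_revenue_summary "
        "WHERE date = ? AND product_id = ?",
        (date, product_id),
    ).fetchone()
    return None if row is None else row[0]


def demo_triggers():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    seen = []
    conn.execute("INSERT INTO orders VALUES (1, '2024-01-06', '13A', 10.0)")
    conn.execute("INSERT INTO orders VALUES (2, '2024-01-06', '13A', 5.0)")
    seen.append(summary_revenue(conn, "2024-01-06", "13A"))
    conn.execute("UPDATE orders SET revenue = 7.0 WHERE order_id = 2")
    seen.append(summary_revenue(conn, "2024-01-06", "13A"))
    conn.execute("DELETE FROM orders WHERE order_id = 1")
    seen.append(summary_revenue(conn, "2024-01-06", "13A"))
    return seen


if __name__ == "__main__":
    print(demo_triggers())
```

Every write to orders now costs one or two extra statements against the summary table, which is the overhead mentioned above.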
There are also a lot of other functions, average, min, max, that can be more complicated. There is an article [unclear] on another website that speaks about that, about how to write [unclear].
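As one illustration of why these are harder (this is a common technique, not something shown in the talk): an average cannot be adjusted from the old average alone, so a typical approach is to store the running sum and the row count and divide when reading. The table `product_stats_summary` and the helpers below are hypothetical, with sqlite3 standing in for MySQL. MIN and MAX are harder still, because deleting the current minimum or maximum forces a re-scan of the underlying rows.

```python
import sqlite3


def apply_order(conn, date, product_id, revenue, sign=1):
    """Add (sign=1) or remove (sign=-1) one order from the running totals."""
    conn.execute(
        "INSERT OR IGNORE INTO product_stats_summary "
        "(date, product_id, revenue_sum, order_count) VALUES (?, ?, 0, 0)",
        (date, product_id),
    )
    conn.execute(
        "UPDATE product_stats_summary "
        "SET revenue_sum = revenue_sum + ?, order_count = order_count + ? "
        "WHERE date = ? AND product_id = ?",
        (sign * revenue, sign, date, product_id),
    )


def average_revenue(conn, date, product_id):
    """Derive the average at read time; None when there are no orders."""
    row = conn.execute(
        "SELECT revenue_sum, order_count FROM product_stats_summary "
        "WHERE date = ? AND product_id = ?",
        (date, product_id),
    ).fetchone()
    if row is None or row[1] == 0:
        return None
    return row[0] / row[1]


def make_stats_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_stats_summary ("
        "date TEXT, product_id TEXT, revenue_sum REAL, order_count INTEGER, "
        "PRIMARY KEY (date, product_id))"
    )
    return conn


if __name__ == "__main__":
    conn = make_stats_demo()
    apply_order(conn, "2024-01-06", "13A", 10.0)
    apply_order(conn, "2024-01-06", "13A", 20.0)
    print(average_revenue(conn, "2024-01-06", "13A"))
```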
[unclear] And it's a bit more convenient, because you don't need to monitor it so much. If you insert and something goes wrong, [unclear], you fix it and you're done. If you're using a trigger to log something and then process it asynchronously at a later time, then you may need to use monitoring. But I'm not trying to confuse you. That's basically triggers, and you can find out how to set that up [unclear].
So, just to summarize [unclear]: if you want to speed up your reports, it is a good idea to have summary tables, and you can run your reports off those. It's very common in [unclear] nowadays, and I'm sure it can help you. It does take a bit of additional design, and I haven't spoken about that much, apart from the example where we removed an INNER JOIN [unclear]. It's something you have to look at more. I also haven't given too many code examples, so this is really an introduction to understanding what summary tables are.
Thank you for watching [unclear]. If you want to contact me, [unclear] website slash blog, where you can find more information [unclear]. Thank you very much.