
  • 00:00

    Stat! Quest! Stat! Quest! Stat! Quest! StatQuest!

  • 00:08

    StatQuest is brought to you by the friendly people in the genetics department at the University of North Carolina at Chapel Hill. Hello and welcome to StatQuest. In this video we're going to talk about R-squared.

  • 00:23

    R-squared is a metric of correlation that is easy to compute and intuitive to interpret. Most of us are already familiar with correlation and its standard metric, plain old 'r'. Correlation values that are close to 1 or negative 1 are good and tell you that two quantitative variables, for example weight and size, are strongly related. Correlation values close to zero are lame.

  • 00:51

    Some of you may be asking why we should care about R-squared when we already have regular 'r'. Some of you might just be asking what R-squared is.

  • 01:02

    R-squared is very similar to its hipper cousin 'r', but interpretation is easier. For example, it's not obvious that 'r' equals 0.7 is twice as good a correlation as 'r' equals 0.5. However, R-squared equals 0.7 is what it looks like: it's 1.4 times as good as R-squared equals 0.5. The other thing that I like about R-squared is that it's easy and intuitive to calculate.

  • 01:33

    Let's start with an example. Here we're plotting mouse weight on the y-axis, with high weights towards the top and low weights towards the bottom, and mouse identification numbers on the x-axis, with ID numbers 1 through 7.

  • 01:55

    We can calculate the mean, or average, of the mouse weights and plot it as a line that spans the graph. We can calculate the variation of the data around this mean as the sum of the squared differences between the weight of each mouse 'i', where 'i' is an individual mouse represented by a red dot, and the mean. The difference between each data point and the mean is squared so that the points below the mean don't cancel out the points above the mean.

  • 02:26

    Now, what if instead of ordering our mice by their identification number we ordered them by their size? Instead of using identification number on the x-axis, we have mouse size, with the smallest size on the left side and the largest size on the right side. All we have done is reorder the data on the x-axis; the mean and variation are exactly the same as before. Here we show the mean again as a black bar that spans the graph, in the exact same location as it was before.
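The "variation around the mean" described above can be sketched in plain Python. The weights below are hypothetical (the video never lists the raw values), but they are chosen so that the variation comes out to 32, matching the number used later in the walkthrough:

```python
# Hypothetical mouse weights; chosen so the variation works out to 32.
weights = [1, 4, 4, 5, 6, 7, 8]

mean = sum(weights) / len(weights)  # 5.0

# Variation around the mean: sum of squared differences from the mean.
# Squaring keeps points below the mean from canceling points above it.
variation = sum((w - mean) ** 2 for w in weights)
print(variation)  # 32.0
```

Note that reordering the list (e.g., by size instead of ID number) leaves both `mean` and `variation` unchanged, which is the point being made above.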

  • 03:04

    Also, the distances between the dots and the line have not changed, just the order of the dots. Here's a question for you: given that we know an individual mouse's size, is the mean, or average weight, the best way to predict that individual mouse's weight? Well, the answer is 'no'. We can do way better. All we have to do is fit a line to the data.

  • 03:28

    Now we can predict weight with our line. You tell me you have a large mouse, and I can look at my line and make a good guess about the weight. Here's another question: does the blue line that we just drew fit the data better than the mean? If so, how much better? By eye, it looks like the blue line fits the data better than the mean. How do we quantify that difference? R-squared.

  • 03:58

    In the bottom of the graph I've drawn the equation for R-squared. We're going to walk through it one step at a time. The first part of the equation is just the variation around the mean. We already calculated that: it's just the sum of the squared differences of the actual data values from the mean. The second part of the equation is the variation around our new blue line. This is calculated in a very similar way. Here, we just want the sum of the squared differences between the actual data points and our new blue line.

  • 04:36

    The numerator, which is the difference between the variation around the mean and the variation around the blue line, is then divided by the variation around the mean. This makes R-squared range from zero to one, because the variation around the line will never be greater than the variation around the mean, and it will never be less than zero. This division also makes R-squared a percentage, and we'll talk more about that in just a second.

  • 05:03

    Now we'll walk through an example where we calculate things one step at a time. First, we'll start with the variation around the mean. In this case that equals 32. The variation around the blue line is only 6, which is what we suspected, since it appears to fit the data much better. Once we've calculated the variation around the mean and the variation around our blue line, we can plug these values into our formula for R-squared. After plugging in our values, we get R-squared equals 32 minus 6 over 32. After subtracting 6 from 32, we get 26. Doing the division, 26 divided by 32, gives us 0.81, or 81%.
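The arithmetic above, as a minimal sketch:

```python
# R-squared from the two variations in the worked example.
var_mean = 32  # variation around the mean
var_line = 6   # variation around the fitted blue line

r_squared = (var_mean - var_line) / var_mean
print(r_squared)  # 0.8125, i.e. roughly 0.81 or 81%
```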

  • 06:00

    This means that there is 81% less variation around the line than around the mean. In other words, the size-weight relationship accounts for 81% of the total variation. This means that most of the variation in the data is explained by the size-weight relationship.

  • 06:22

    Here's another example. In this example we're comparing two possibly uncorrelated variables. On the y-axis we have mouse weight again, but on the x-axis we now have "time spent sniffing a rock". Like before, we calculate the variation around the mean, and just like before we get 32. However, this time when we calculate the variation around the blue line, we get a much larger value, 30. Now we just plug those values into our formula for R-squared. By doing the math, we see that R-squared equals 0.06, or 6%.
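Plugging the second example's numbers into the R-squared formula:

```python
var_mean = 32  # variation around the mean (same data spread as before)
var_line = 30  # variation around the blue line barely improves on the mean

r_squared = (var_mean - var_line) / var_mean
print(r_squared)  # 0.0625, i.e. about 6%
```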

  • 07:10

    Thus there is only 6% less variation around the line than around the mean. In other words, the sniff-weight relationship accounts for only 6% of the total variation. This means that hardly any of the variation in the data is explained by the sniff-weight relationship.

  • 07:35

    Now, when someone says "The statistically significant R-squared was 0.9", you can think to yourself, "Very good! The relationship between the two variables explains 90% of the variation in the data." And when someone else says, "The statistically significant R-squared was 0.01", you can think to yourself, "Dag! Who cares if that relationship is significant? It only accounts for 1% of the variation in the data. Something else must explain the remaining 99%."

  • 08:12

    What about plain old 'r'? How is it related to R-squared? R-squared is just the square of 'r'.
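That relationship is just one multiplication, as the transcript's own examples show:

```python
# Plain old 'r' converts to R-squared by squaring.
for r in (0.9, 0.7, 0.5):
    print(f"r = {r}  ->  R-squared = {r * r:.2f}")
# r = 0.9  ->  R-squared = 0.81
# r = 0.7  ->  R-squared = 0.49
# r = 0.5  ->  R-squared = 0.25
```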

  • 08:19

    Now, when someone says "The statistically significant 'r' was 0.9", and we're talking about just plain old 'r', you can think to yourself, "0.9 times 0.9 equals 0.81. Very good! The relationship between the two variables explains 81% of the variation in the data." And when someone else says "The statistically significant 'r'", that's plain old 'r', "was 0.5", you can think to yourself, "0.5 times 0.5 equals 0.25. The relationship accounts for 25% of the variation in the data. That's good if there are a million other things accounting for the remaining 75%, and bad if there's only one thing."

  • 09:10

    I like R-squared more than just plain old 'r' because it's easier to interpret. Here's an example: how much better is 'r' equals 0.7 than 'r' equals 0.5? Well, if we convert those numbers to R-squared, we see that when 'r' equals 0.7, R-squared equals 0.7 squared, which is about 0.5, meaning 50% of the original variation is explained by the relationship. When 'r' equals 0.5, R-squared equals 0.5 squared, which equals 0.25, so only 25% of the original variation is explained by the relationship.

  • 09:52

    With R-squared it's easy to see that the first correlation is twice as good as the second: explaining 50% of the original variation is twice as good as only explaining 25% of the original variation.

  • 10:09

    That said, R-squared does not indicate the direction of the correlation, because squared numbers are never negative. If the direction of the correlation isn't obvious, you can say "the two variables were positively (or negatively) correlated, with R-squared equals...", whatever that value may be.

  • 10:29

    These are the two main ideas for R-squared. R-squared is the percentage of variation explained by the relationship between two variables. And if someone gives you a value for plain old 'r', just square it in your head and you'll understand what's going on a whole lot better.

  • 10:49

    We've reached the end of our StatQuest. Tune in next time for an exciting adventure into the land of statistics.
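The whole procedure from the video — fit a least-squares line, compute the variation around the mean and around the line, then R-squared — can be sketched end-to-end in plain Python. The data here are hypothetical (the video never shows raw values; only the variation around the mean, 32, is chosen to match):

```python
# Hypothetical mouse data: sizes 1..7 and made-up weights whose
# variation around the mean happens to be 32, as in the video.
sizes = [1, 2, 3, 4, 5, 6, 7]
weights = [1, 4, 4, 5, 6, 7, 8]

n = len(sizes)
mean_s = sum(sizes) / n
mean_w = sum(weights) / n

# Ordinary least-squares fit for the "blue line": weight = slope * size + intercept.
sxy = sum((s - mean_s) * (w - mean_w) for s, w in zip(sizes, weights))
sxx = sum((s - mean_s) ** 2 for s in sizes)
slope = sxy / sxx
intercept = mean_w - slope * mean_s

# Variation around the mean, and around the fitted line.
var_mean = sum((w - mean_w) ** 2 for w in weights)
var_line = sum((w - (slope * s + intercept)) ** 2
               for s, w in zip(sizes, weights))

r_squared = (var_mean - var_line) / var_mean
print(round(r_squared, 2))  # 0.94: the line explains ~94% of the variation
```

Because a least-squares line can never fit worse than the flat mean line, `var_line <= var_mean` always holds, which is why R-squared stays between 0 and 1 as the transcript explains.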


R-squared, Clearly Explained!!!

502,765 views

Video Language:

  • English

Caption Language:

  • English (en)

Accent:

  • English (US)


Speech Rate:

  • 131 wpm - Conversational

Category:

  • Education

