
  • 00:00

    In the chi-square goodness-of-fit test, we are determining how well the distribution

  • 00:04

    of experimental or observed data fits the population or expectation.

  • 00:10

    It is employed when dealing with counts or frequencies for a categorical variable with

  • 00:15

    2 or more categories, or what is called a multinomial experiment.

  • 00:20

    In this first example, a car manufacturer expects customers will order colours of their

  • 00:25

    model J according to this distribution:

  • 00:29

    They selected a random sample of 140 orders as shown in this second table.

  • 00:33

    We want to test at the 5% significance level if there is a statistically significant difference

  • 00:39

    between the observed and expected frequencies.

  • 00:42

    The expected proportions in the first table will be the null hypothesized proportions.

  • 00:47

    In essence, we can write the null hypothesis as

  • 00:51

    p1 = 0.28, p2 = 0.25, p3 = 0.16 and p4 = 0.31.

  • 00:54

    The alternative will be that the null hypothesis is not true or not correct.

  • 01:04

    That is, at least one of the proportions is not as

  • 01:07

    specified in the null hypothesis.

  • 01:10

    In other words, the observed values are not consistent with

  • 01:13

    the expected distribution.

  • 01:16

    The goodness-of-fit test is a chi-square test.

  • 01:19

    Chi-square distributions are a family of distributions, with each defined by its degrees of freedom.

  • 01:24

    They are typically skewed to the right and take on only positive values.

  • 01:30

    The degrees of freedom for the goodness-of-fit test are defined as k – 1,

  • 01:34

    where k is the number of categories in the variable.

  • 01:38

    Since we have 4 categories here, the degrees of freedom will be 3.

  • 01:43

    At alpha = 0.05, with degrees of freedom = 3, the critical value from the table or software

  • 01:49

    is 7.815.

  • 01:51

    So the rejection region is where the observed chi-square is greater than 7.815.
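As a quick check, the degrees of freedom and the 7.815 critical value can be reproduced in a few lines of Python. The video reads the value from a table or software; using SciPy here is my own choice, not something shown in the video.

```python
from scipy.stats import chi2

k = 4           # number of colour categories (White, Black, Silver, Other)
df = k - 1      # degrees of freedom for the goodness-of-fit test
alpha = 0.05

# Critical value: the point leaving alpha = 0.05 in the right tail
critical = chi2.ppf(1 - alpha, df)
print(df, round(critical, 3))  # 3 7.815
```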

  • 01:57

    The formula for calculating the chi-square statistic is shown here.

  • 02:01

    Or, sometimes in this format.

  • 02:04

    Where O represents observed frequency and E represents expected frequency.

  • 02:09

    The test statistic is a measure of how far the observed frequencies are from expected

  • 02:14

    and it is easy to compute using a table set up like this one.

  • 02:18

    Here are the observed frequencies, which add up to 140.

  • 02:22

    The expected frequencies for the goodness-of-fit test are calculated by n times p_i,

  • 02:28

    where n is the total of the observed frequencies (that is, 140 in this case)

  • 02:33

    and the p_i’s are the expected proportions that we have in the null hypothesis here.

  • 02:38

    Therefore, for White, the expected frequency is 140 times 0.28

  • 02:43

    which gives 39.2.

  • 02:45

    For Black, 140 times 0.25 gives 35.

  • 02:49

    For Silver, we have 22.4 and for Other we have 43.4.

  • 02:54

    Adding these up we get a total Expected of 140 as well.

  • 02:58

    This is not a coincidence.

  • 02:59

    The expected frequencies must add up to the same total as the observed frequencies

  • 03:04

    since the expected proportions add up to 1.
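The expected counts described above are just n times each null proportion. A minimal sketch (variable names are my own):

```python
n = 140                               # total observed orders
p = [0.28, 0.25, 0.16, 0.31]          # null proportions: White, Black, Silver, Other

# Expected count per category: n * p_i
expected = [round(n * pi, 1) for pi in p]
print(expected)                       # [39.2, 35.0, 22.4, 43.4]

# The expected counts total n, since the null proportions sum to 1
assert abs(sum(expected) - n) < 1e-9
```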

  • 03:07

    Note that the goodness-of-fit test requires each expected cell frequency

  • 03:11

    to be at least 5.

  • 03:13

    If the expected cell for a category has a frequency less than 5,

  • 03:17

    we should combine that category with another category to satisfy that requirement.

  • 03:23

    Since all the expected values are greater than 5 here, that requirement is satisfied.

  • 03:29

    Another requirement is that of randomization.

  • 03:32

    Since we assumed we have a random sample at the beginning,

  • 03:35

    that requirement is also satisfied.

  • 03:37

    Next, we compute the difference between the Observed and Expected here.

  • 03:42

    Then square the differences.

  • 03:44

    And finally, divide the squared differences by the expected.

  • 03:48

    The sum of this last column is 1.6315.

  • 03:51

    And it is the value of the chi-square statistic for this test.
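The statistic is the sum of (O − E)²/E across the categories. The captions never list the individual observed counts, so the `observed` values below are hypothetical placeholders chosen only to sum to 140; the expected counts are the ones from the video.

```python
# Hypothetical observed counts -- the video's individual values aren't in the
# captions, only their total of 140. Expected counts are from the video (n * p_i).
observed = [45, 30, 20, 45]           # hypothetical, sums to 140
expected = [39.2, 35.0, 22.4, 43.4]

# Chi-square statistic: sum of (O - E)^2 / E over the categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 4))
```

With the video's actual observed counts in place of the placeholders, this sum would come out to the 1.6315 quoted above.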

  • 03:56

    Since this test statistic is not greater than the critical value,

  • 03:59

    we fail to reject the null hypothesis.

  • 04:02

    The result is not statistically significant, or we don’t have enough evidence to conclude

  • 04:07

    that the distribution of the observed colours differs significantly from the expectation.

  • 04:12

    We can also estimate the p-value by checking on the df = 3 row,

  • 04:17

    where 1.63 will fall.

  • 04:19

    We see that it will fall between 0.584 and 6.251 here.

  • 04:25

    Therefore, the p-value is somewhere between 0.1 and 0.9.

  • 04:32

    Using software, we can obtain the exact p-value of 0.6523.

  • 04:36

    Recall that the rule is: if the p-value < α, we reject the null hypothesis.

  • 04:42

    Since this p-value is much larger than our alpha of 0.05,

  • 04:46

    it confirms our decision to not reject the null hypothesis.
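The 0.6523 p-value is the right-tail probability of the chi-square distribution beyond the observed statistic. A sketch using SciPy (the video only says "using software", so this particular library is an assumption):

```python
from scipy.stats import chi2

chi_sq = 1.6315      # statistic from the first test
df = 3

# p-value: probability of a chi-square value at least this large under H0
p_value = chi2.sf(chi_sq, df)
print(round(p_value, 4))    # ~0.6523, as quoted in the video

# Much larger than alpha = 0.05, so we fail to reject H0
assert p_value > 0.05
```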

  • 04:50

    Now, suppose we simply want to test if there are preferences for any of the 4 categories.

  • 04:56

    That is, whether customer orders differ between the colour categories.

  • 05:00

    In other words, we want to determine if the frequencies differ significantly from a uniform

  • 05:06

    distribution.

  • 05:07

    Then the expected percentage in each category will be 100% divided by the number of categories,

  • 05:13

    which is 4 in this case.

  • 05:14

    So, we will expect 25% in each category if there is no preference.

  • 05:20

    The null hypothesis will thus be that all the proportions are equal to 0.25.

  • 05:25

    For the alternative hypothesis, we can still state that at least one p_i is not as specified

  • 05:31

    in H0, or we can simply say that

  • 05:34

    at least one of the proportions is not equal to 0.25.

  • 05:38

    The degrees of freedom are still 3 and the rejection region is the same.

  • 05:43

    In calculating the test statistic, expected values can be obtained by n times

  • 05:47

    p_i as before.

  • 05:49

    Or we can simply divide the sample size, 140, by the number of categories, 4,

  • 05:55

    to obtain 35 each for the expected cells.

  • 05:58

    Next, we take the differences, square them,

  • 06:04

    divide by expected and sum them to obtain the chi-square statistic

  • 06:08

    of 9.7714.

  • 06:11

    Since this statistic is greater than the critical value,

  • 06:14

    we reject the null hypothesis.

  • 06:16

    From the chi-square table, for df = 3, we also see that the test statistic falls

  • 06:21

    between the critical values for alpha = 0.025 and alpha = 0.01,

  • 06:27

    indicating that the p-value is less than 0.05.

  • 06:31

    Using software, the p-value is 0.0206, which is less than 0.05.
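The uniform, "no preference" version can be checked the same way; again, SciPy is my own choice of tool rather than the software used in the video:

```python
from scipy.stats import chi2

n, k = 140, 4
expected_each = n / k        # 35.0 expected per colour under the uniform null

chi_sq = 9.7714              # statistic from the second test
df = k - 1

# Right-tail probability beyond the observed statistic
p_value = chi2.sf(chi_sq, df)
print(expected_each, round(p_value, 4))   # 35.0 and ~0.0206

# p-value below alpha = 0.05: reject the no-preference null
assert p_value < 0.05
```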

  • 06:36

    So, we reject the null hypothesis.

  • 06:39

    The result is statistically significant.

  • 06:42

    That is, we have enough evidence to conclude that the colours are not equally preferred.

  • 06:48

    Or that the colour categories are not equally distributed among the customer orders.

  • 06:53

    And that’s it for this video.

  • 06:57

    Thanks for watching.


Chi-Square Goodness-of-Fit Test

12,179 views

  • Video Language: English
  • Caption Language: English (en)
  • Accent: English (CA)
  • Speech Time: 99% (6:57 / 7:00)
  • Speech Rate: 149 wpm - Conversational
  • Category: Education

