2
• Involves testing a hypothesis.
• There is no single parameter to estimate.
• Considers all categories to give an overall idea of whether the observed distribution differs from the hypothesized one.
“All creatures have their determined time for giving birth and carrying fetus, only a man is born all year long, not in determined time, one in the seventh month, the other in the eighth, and so on till the beginning of the eleventh month.” ~Aristotle

3
• Counted Data Condition
  ◦ Check that the data are counts for the categories of a categorical variable.
• Independence Assumption
  ◦ Check that the individuals counted in the cells are sampled independently from some population.
  ◦ If independence cannot be assumed directly, check the randomization condition – the individuals who have been counted should be a random sample from some population.
• Sample Size Assumption
  ◦ Expected cell frequency condition – the expected count in each cell should be at least 5.

4
• Compare the observed counts in each cell with the expected counts.
• Look at the differences between the observed and expected counts.
• The test is always one-sided.
• There is no direction to the rejection of the null model – we know only that it doesn’t fit.
• “Chi-square” refers to a family of sampling distribution models, indexed by degrees of freedom.
• Number of degrees of freedom is n – 1, where n is the number of categories.

5
• Hypothesis
  ◦ H0: Births are uniformly distributed over zodiac signs (p_Aries = p_Taurus = …).
  ◦ HA: Births are not uniformly distributed over zodiac signs.
• Check Conditions:
  ◦ Counted data condition: there are counts of the number of executives in categories.
  ◦ Randomization condition: this is a convenience sample, but there is no expectation of bias.
  ◦ Expected cell frequency condition: the null hypothesis expects that 1/12 of the 256 executives (about 21.3) should occur in each sign, which is at least 5.
• The sampling distribution of the test statistic is χ² with 12 – 1 = 11 degrees of freedom.
• Use a chi-square goodness-of-fit test.

6
• The chi-square procedure:
  ◦ Find the expected values.
    Values come from the null hypothesis.
    Multiply the total number of observations by the hypothesized proportion.
  ◦ Compute the residuals, Observed – Expected.
  ◦ Square the residuals.
  ◦ Compute the component for each cell, (Observed – Expected)² / Expected.
  ◦ Find the sum of the components.
  ◦ Find the degrees of freedom, the number of cells minus 1.
  ◦ Test the hypothesis: find the P-value.
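The procedure above can be sketched in a few lines of Python. The counts and hypothesized proportions below are made up for illustration only; they are not the zodiac data from the slides.

```python
# Hypothetical data for illustration: four categories, uniform null model.
observed = [18, 22, 20, 40]
hypothesized = [0.25, 0.25, 0.25, 0.25]

# Expected value = total number of observations * hypothesized proportion.
total = sum(observed)
expected = [total * p for p in hypothesized]

# Residuals, squared residuals scaled by the expected count, and their sum.
residuals = [o - e for o, e in zip(observed, expected)]
components = [r ** 2 / e for r, e in zip(residuals, expected)]
chi_square = sum(components)

# Degrees of freedom: number of cells minus 1.
df = len(observed) - 1

print("expected:", expected)
print("components:", components)
print("chi-square:", chi_square, "df:", df)
```

The last step, finding the P-value, needs the upper tail of the chi-square model with `df` degrees of freedom.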

7
  ◦ Enter counts in L1 and expected percentages in L2.
  ◦ Convert expected percentages to expected counts.
  ◦ Calculate the chi-square components in L3.

8
  ◦ Find the sum of L3.
  ◦ Find the P-value.
    The probability of finding a χ² value at least as high as the one calculated from the data.
    DISTR menu, χ²cdf

9
• P-value
  ◦ Test is one-sided; only consider the right tail.
  ◦ Large χ² values correspond to small P-values, leading to rejection of the null hypothesis.
  ◦ The P-value is the area in the upper tail of the χ² model for 11 degrees of freedom above the computed χ² value.
• Conclusion
  ◦ The P-value of 0.926 means that, if the null hypothesis were true, a chi-square value of 5.08 or higher would occur about 93% of the time.
  ◦ There is virtually no evidence that the distribution of zodiac signs among executives is not uniform.
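The upper-tail area can also be checked without a calculator. The sketch below uses a helper named `chi2_sf` (a name chosen here, not one from the slides) built on the standard closed-form expressions for integer degrees of freedom, and applies it to the reported statistic of 5.08 with 11 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square model with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum over r = 1..(df-1)/2
    # of x^(r - 1/2) / (1*3*...*(2r-1)), where phi is the standard normal density.
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = math.sqrt(x)  # r = 1 term
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total

p_value = chi2_sf(5.08, 11)
print(round(p_value, 3))
```

The result should land close to the 0.926 quoted on the slide; small differences in the third decimal come from rounding of the reported statistic.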

10
• Chi-square test for homogeneity
• Assumptions and Conditions
  ◦ Counted data condition
    Check that the data are counts for the categories of a categorical variable.
  ◦ Independence Assumption: Randomization condition
    When we test for homogeneity, we often are not interested in some larger population, so we don’t need to check the randomization condition.
  ◦ Sample Size Assumption
    Expected cell frequency condition – the expected count in each cell must be at least 5.

11
• Who: High school graduates
• What: Post-graduation activities
• When: 1980, 1990, 2000
• Why: Regular survey for general information

                             1980   1990   2000   Total
College/Post-HS education     320    245    288     853
Employment                     98     24     17     139
Military                       18     19      5      42
Travel                         17      2      5      24
Total                         453    290    315    1058

12
• Hypothesis
  ◦ Have the choices made by high school graduates in what they do after graduation changed?
  ◦ H0: The post-high school choices made by the classes of 1980, 1990, and 2000 have the same distribution (homogeneous).
  ◦ HA: The post-high school choices made by the classes of 1980, 1990, and 2000 do not have the same distribution.
• Check the conditions
  ✓ Counted data condition: there are counts of the number of students in categories.
  ✓ Randomization condition: No inference will be drawn to other high schools or other classes, so no need to check for a random sample.
  ✓ Expected cell frequency condition: The expected values are all at least 5 (see table, later).
• Under these conditions, the sampling distribution of the test statistic is χ² with (4 – 1) × (3 – 1) = 6 degrees of freedom.
• Perform a chi-square test of homogeneity.

13
• TI-84+ Steps:
  ◦ Enter data in a matrix.
  ◦ Do the chi-square test of homogeneity.
  ◦ View the expected counts: Matrix, Edit, [B].
• Note that all expected counts are at least 5.
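For readers without the calculator, the same computation can be sketched in Python from the counts in the graduates table. The variable names are chosen here for illustration; the P-value line uses the closed form of the chi-square upper tail that holds for even degrees of freedom.

```python
import math

# Observed counts from the table: rows are activities, columns are 1980, 1990, 2000.
counts = [
    [320, 245, 288],  # College/Post-HS education
    [98, 24, 17],     # Employment
    [18, 19, 5],      # Military
    [17, 2, 5],       # Travel
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(counts, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(counts) - 1) * (len(col_totals) - 1)

# Upper tail for even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
half = chi_square / 2
p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

print("smallest expected count:", min(min(row) for row in expected))
print("chi-square:", chi_square, "df:", df, "P-value:", p_value)
```

The smallest expected count (the Travel, 1990 cell) comes out a little above 6, which matches the note that all expected counts are at least 5.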

14
• Conclusion
  ◦ The P-value is very small.
    If the distributions were the same, the observed pattern would be very unlikely to occur by chance.
    Reject the null hypothesis.
    The choices made by high school graduates have changed over the two decades examined.

15
• Examine the Residuals
  ◦ Standardized Residuals
    Divide the cell’s residual by the square root of its expected value.
    Values are the square roots of the components calculated for each cell, with + or – to show whether we observed more or fewer cases than expected.
• What trends do you see?

                              1980     1990     2000
College/Post-HS education   -2.366    0.732    2.136
Employment                   4.989   -2.284   -3.791
Military                     0.004    2.207   -2.122
Travel                       2.098   -1.785   -0.803
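As a check on the residual table, the standardized residuals can be recomputed from the observed counts in the graduates table. This is a minimal sketch with names chosen here for illustration.

```python
import math

# Observed counts: rows are activities, columns are 1980, 1990, 2000.
labels = ["College/Post-HS education", "Employment", "Military", "Travel"]
counts = [
    [320, 245, 288],
    [98, 24, 17],
    [18, 19, 5],
    [17, 2, 5],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Standardized residual = (observed - expected) / sqrt(expected).
std_residuals = []
for row, r_total in zip(counts, row_totals):
    out_row = []
    for obs, c_total in zip(row, col_totals):
        exp = r_total * c_total / grand_total
        out_row.append((obs - exp) / math.sqrt(exp))
    std_residuals.append(out_row)

for label, row in zip(labels, std_residuals):
    print(label, [round(v, 3) for v in row])
```

The largest values in magnitude sit in the Employment row, positive for 1980 and negative for 1990 and 2000.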

16
• Chi-Square Test for Independence
  ◦ Data categorize subjects from a single group on two categorical variables.
  ◦ Contingency Tables
    Categorize counts on two or more variables.
    Decide whether the distribution of counts on one variable is contingent on the other.
• Assumptions and Conditions
  ◦ Counted data condition
    Check that the data are counts for the categories of a categorical variable.
  ◦ Independence Assumption: Randomization condition
    When we test for independence, we are interested in generalizing to some larger population.
  ◦ Sample Size Assumption
    Expected cell frequency condition – the expected count in each cell must be at least 5.

17
• Who: Patients being treated for non-blood-related disorders
• What: Tattoo status and hepatitis C status
• When: 1991, 1992
• Where: Texas

                     Hepatitis C   No Hepatitis C   Total
Tattoo, Parlor                17               35      52
Tattoo, elsewhere              8               53      61
None                          22              491     513
Total                         47              579     626

18
• Hypothesis
  ◦ Are the categorical variables “tattoo status” and “hepatitis C status” statistically independent?
  ◦ H0: Tattoo status and hepatitis C status are independent.
  ◦ HA: Tattoo status and hepatitis C status are not independent.
• Check the conditions
  ✓ Counted data condition: there are counts of individuals in categories of two categorical variables.
  ✓ Randomization condition: Although not an SRS, the data were selected to avoid biases and should be representative of the general population.
  ✗ Expected cell frequency condition: Not all expected values are at least 5. Continue with caution – be sure to check the residuals.
• Under these conditions, the sampling distribution of the test statistic is χ² with (3 – 1) × (2 – 1) = 2 df.
• Perform a chi-square test for independence.

19
• TI-84+ Steps:
  ◦ Enter data in a matrix.
  ◦ Do the chi-square test of independence.
  ◦ View the expected counts: Matrix, Edit, [B].
• Note that not all expected counts are at least 5.
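The expected counts and the test statistic for the tattoo data can be sketched in Python as follows. The names are illustrative, and the P-value line relies on the fact that with 2 degrees of freedom the chi-square upper tail is exp(-x/2).

```python
import math

# Observed counts: columns are Hepatitis C, No Hepatitis C.
labels = ["Tattoo, Parlor", "Tattoo, elsewhere", "None"]
counts = [
    [17, 35],
    [8, 53],
    [22, 491],
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Flag the cells that fail the expected cell frequency condition.
small_cells = [
    (labels[i], j, round(exp, 3))
    for i, exp_row in enumerate(expected)
    for j, exp in enumerate(exp_row)
    if exp < 5
]

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(counts, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(counts) - 1) * (len(col_totals) - 1)
p_value = math.exp(-chi_square / 2)  # valid because df == 2

print("cells with expected count below 5:", small_cells)
print("chi-square:", chi_square, "df:", df, "P-value:", p_value)
```

Both flagged cells are in the Hepatitis C column, for the two tattoo rows, which is why the result has to be read with caution.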

20
• Conclusion:
  ◦ The P-value is very small, indicating that if these variables were independent, the pattern seen would be very unlikely to occur by chance.
  ◦ The hepatitis C status is not independent of the tattoo status.
  ◦ HOWEVER, check the two cells with the small expected counts to determine whether they influenced the result too greatly.
• Remember: A complete solution must include additional analysis, recalculation, and a final conclusion.

21
• Analysis of Residuals
  ◦ Too small an expected frequency can arbitrarily inflate the residual, leading to an inflated chi-square statistic.
  ◦ In this case, the standardized residual for the Hepatitis C and Tattoo, Parlor cell is large ⇒ Inflated chi-square statistic?
• Standardized Residuals

                     Hepatitis C   No Hepatitis C
Tattoo, Parlor             6.628           -1.888
Tattoo, elsewhere          1.598           -0.455
None                      -2.661            0.758

22
• Options:
  ◦ Based upon concerns, choose not to report the results.
  ◦ Include a warning when reporting the results.
  ◦ Combine the appropriate categories to get a larger sample size and larger expected frequencies.
• Recalculation (the two tattoo categories combined):

           Hepatitis C   No Hepatitis C   Total
None                22              491     513
Tattoo              25               88     113
Total               47              579     626

• Conclusion:
  ◦ The tattoo status and hepatitis C status are not independent. The data suggest that tattoo parlors may be a particular problem, but we do not have enough data to draw that conclusion.
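The recalculation on the combined table can be reproduced with a short sketch. The numbers it prints are computed here from the counts in the combined table, not copied from the slides, and the P-value line uses the fact that with 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x/2)).

```python
import math

# Combined table: columns are Hepatitis C, No Hepatitis C.
counts = [
    [22, 491],  # None
    [25, 88],   # Tattoo (parlor and elsewhere combined)
]

row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
min_expected = min(min(row) for row in expected)

chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(counts, expected)
    for obs, exp in zip(obs_row, exp_row)
)
df = (len(counts) - 1) * (len(col_totals) - 1)
p_value = math.erfc(math.sqrt(chi_square / 2))  # valid because df == 1

print("smallest expected count:", min_expected)
print("chi-square:", chi_square, "df:", df, "P-value:", p_value)
```

After combining, every expected count is above 5 and the P-value is still very small, which is consistent with the conclusion that the two variables are not independent.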

23
• A failure of independence between two categorical variables does not show a cause-and-effect relationship between them.
• There is no way to differentiate the direction of any possible causation from one variable to another.
• Lurking variables could be responsible for the observed lack of independence.
• Don’t use chi-square methods unless the data are counts.
  ◦ Data reported as proportions or percentages can be used if they are converted to counts.
  ◦ Just because data are reported in a two-way table does not mean they are suitable for chi-square procedures.
• Beware large samples.
  ◦ The degrees of freedom for the chi-square tests do not grow with sample size.
  ◦ With a sufficiently large sample size, a chi-square test can almost always reject the null hypothesis, because even tiny departures from the null model become statistically significant.
  ◦ There are no confidence intervals to help in determining the effect size.
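The large-sample caution can be illustrated with a small sketch. The 2 × 2 table below is hypothetical; multiplying every count by the same factor keeps the proportions and the degrees of freedom fixed but multiplies the chi-square statistic by that factor.

```python
def chi_square_stat(table):
    """Chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, r_total in zip(table, row_totals):
        for obs, c_total in zip(row, col_totals):
            exp = r_total * c_total / grand_total
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical table with a very small departure from independence.
base = [[52, 48], [48, 52]]
CRITICAL_5_PERCENT_DF1 = 3.841  # chi-square critical value for df = 1 at the 5% level

results = {}
for factor in (1, 10, 100):
    scaled = [[cell * factor for cell in row] for row in base]
    results[factor] = chi_square_stat(scaled)
    print(factor, round(results[factor], 2), results[factor] > CRITICAL_5_PERCENT_DF1)
```

The same 52/48 split is far from significant with 200 observations but clearly significant with 20,000, even though the size of the departure has not changed.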

