## Presentation on theme: "Copyright © 2010 Pearson Education, Inc. Slide 26 - 1."— Presentation transcript:

Copyright © 2010 Pearson Education, Inc. Slide 26 - 2 Solution: B

Copyright © 2010 Pearson Education, Inc. Slide 26 - 4 Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 5 Assumptions and Conditions Counted Data Condition: Check that the data are counts for the categories of a categorical variable. Independence Assumption: The counts in the cells should be independent of each other. Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population. Sample Size Assumption: We must have enough data for the methods to work. Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 6 Calculations The test statistic, called the chi-square (or chi- squared) statistic, is found by adding up the sum of the squares of the deviations between the observed and expected counts divided by the expected counts:

Copyright © 2010 Pearson Education, Inc. Slide 26 - 7 Calculations (cont.) The chi-square models are actually a family of distributions indexed by degrees of freedom (much like the t-distribution). The number of degrees of freedom for a goodness-of-fit test is n – 1, where n is the number of categories. The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals. If the observed counts dont match the expected, the statistic will be largeit cant be too small. So the chi-square test is always one-sided. If the calculated statistic value is large enough, well reject the null hypothesis.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 8 The Chi-Square Calculation 1.Find the expected values: Every model gives a hypothesized proportion for each cell. The expected value is the product of the total number of observations times this proportion. 2.Compute the residuals: Once you have expected values for each cell, find the residuals, Observed – Expected. 3.Square the residuals.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 9 The Chi-Square Calculation (cont.) 4.Compute the components. Now find the components for each cell. 5.Find the sum of the components (thats the chi- square statistic).

Copyright © 2010 Pearson Education, Inc. Slide 26 - 10 The Chi-Square Calculation (cont.) 6.Find the degrees of freedom. Its equal to the number of cells minus one. 7.Test the hypothesis. Use your chi-square statistic to find the P- value. (Remember, youll always have a one- sided test.) Large chi-square values mean lots of deviation from the hypothesized model, so they give small P-values.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 11 Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi- square test of homogeneity. The statistic that we calculate for this test is identical to the chi-square statistic for goodness-of-fit. In this test, however, we ask whether choices are the same among different groups (i.e., there is no model). The expected counts are found directly from the data and we have different degrees of freedom.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 12 Assumptions and Conditions The assumptions and conditions are the same as for the chi-square goodness-of-fit test: Counted Data Condition: The data must be counts. Randomization Condition and 10% Condition: As long as we dont want to generalize, we dont have to check these conditions. Expected Cell Frequency Condition: The expected count in each cell must be at least 5.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 13 Calculations We calculated the chi-square statistic as we did in the goodness-of-fit test: In this situation we have (R – 1)(C – 1) degrees of freedom, where R is the number of rows and C is the number of columns.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 14 Examining the Residuals When we reject the null hypothesis, its always a good idea to examine residuals. For chi-square tests, we want to work with standardized residuals, since we want to compare residuals for cells that may have very different counts. To standardize a cells residual, we just divide by the square root of its expected value:

Copyright © 2010 Pearson Education, Inc. Slide 26 - 15 Independence Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other. A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table. A chi-square test of independence uses the same calculation as a test of homogeneity.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 16 Assumptions and Conditions We still need counts and enough data so that the expected values are at least 5 in each cell. If were interested in the independence of variables, we usually want to generalize from the data to some population. In that case, well need to check that the data are a representative random sample from that population.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 17 What have we learned? All three methods we examined look at counts of data in categories and rely on chi-square models. Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on theory or model. Tests of homogeneity compare the distribution of several groups for the same categorical variable. Tests of independence examine counts from a single group for evidence of an association between two categorical variables.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 18 Example: It has been proposed by some researchers that children who are the older ones in their class at school naturally perform better in sports and that these children then get more coaching and encouragement. Could that make a difference in who makes it to the professional level in sports. We have the birthdates of every one of the 16,804 players who ever played in a major league game from 1975 who played through 2006. Lets test whether the observed distribution of ballplayers birth months shows just random fluctuations or whether it represents a real deviation from the national pattern. How can we find the expected counts?

Copyright © 2010 Pearson Education, Inc. Slide 26 - 19 MonthBallplayer CountNational birth % 11378% 21217% 31168% 41218% 51268% 61148% 71029% 81659% 91349% 101159% 111058% 121229% Total1478100%

Copyright © 2010 Pearson Education, Inc. Slide 26 - 20 Example: Are the assumptions and conditions met for performing a goodness-of-fit test? Example: What are the hypotheses, and what does the test show? Example: Whats different about the distribution of birth months among major league ball players?

Copyright © 2010 Pearson Education, Inc. Slide 26 - 21 Example: Tiny black potato flea beetles can damage potato plants. These pests chew holes in the leaves, causing the plants to die. They can be killed with an insecticide, but canola oil spray has been suggested as a natural method of controlling the beetles. We gather 500 beetles and place them in 3 containers. Two hundred in the first container, they are sprayed with canola oil. 200 are in the second container, sprayed with insecticide. 100 in the last container serve as a control group. We wait 6 hours and count the number of surviving beetles in each container. a. Why do we need the control group? b. What would our null hypothesis be? c. After the experiment is over, we could summarize the results in a table. Draw the table and calculate the degrees of freedom for our chi-squared test. d. All together, 125 beetles survived. Whats the expected count in the first cell – survivors among those sprayed with the natural spray? e. If it turns out that only 40 of the beetles in the first container survived, whats the calculated component of chi-squared for that cell? f. If the total calculated value of chi-squared for this table turns out to be around 10, would you expect the P-value of our test to be large or small? Explain.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 22 Which of following chi-square tests would you use to calculate each of the following situations? (goodness-of-fit, homogeneity, or independence) a. A restaurant manger wonders whether customers who dine on Friday nights have the same preferences among the four chefs specials entrees as those who dine on Saturday nights. One weekend he has the wait staff record which entrees were ordered each night. Assuming these customers to be typical of all weekend diners, hell compare the distributions of meals chosen Friday and Saturday. b. Company policy calls for parking spaces to be assigned to everyone at random, but you suspect that may not be so. There are three lots of equal size: lot A, next to the building; lot B, at bit father away, and lot C, on the other side of the highway. You gather data about employees at middle management level and above to see how many were assigned parking in each lot. c. Is a students social life affected by where the student lives? A campus survey asked a random sample of students whether they lived in a dorm, in off-campus housing, or at home and whether they had been out on a date 0, 1-2, 3-4, or 5 or more times in the past two months.

Copyright © 2010 Pearson Education, Inc. Slide 26 - 23 Homework: Pg. 642 1,2,3-9 (day 2) pg. 642 13-29 odd