
Chapter 11: Applications of Chi-Square

Count or Frequency Data: Many problems involve data that are sorted into categories, with the results reported as counts. Results are often displayed in a chart showing the number of observations in each possible category.

Is Your Die Fair? Suppose you want to test whether a die is “fair,” i.e., whether each outcome has the same probability. You toss the die 60 times and record the results. You expect to get 10 of each number, but because of random variation you probably won’t. The question is: are the frequencies far enough off to convince you that the die is not fair? What if you get 10, 9, 11, 12, 8, 10? What if you get 5, 10, 15, 14, 9, 7? How can we evaluate whether such results are likely due to random chance or to an unbalanced die?

Background:
1. Suppose there are n observations.
2. Each observation falls into a cell (or class).
3. Observed frequencies in each cell: $O_1, O_2, O_3, \ldots, O_k$. The sum of the observed frequencies is n.
4. Expected, or theoretical, frequencies: $E_1, E_2, E_3, \ldots, E_k$.
Summary of notation: k cells, observed frequencies $O_i$, and expected frequencies $E_i$, for $i = 1, 2, \ldots, k$.

Goal:
1. Compare the observed frequencies with the expected frequencies.
2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies.
Methodology: Use the chi-square statistic
$$\chi^{2\star} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
This statistic is a measure of variation; note its similarity to the formula for sums of squares (variance).
Small values of $\chi^{2\star}$: the observed frequencies are close to the expected frequencies, because the variation is small.
Large values of $\chi^{2\star}$: the observed frequencies do not agree with the expected frequencies; the variation is large.
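As an illustration, the sketch below applies the statistic to the two sets of die results from the fair-die question, assuming a fair die (expected frequency 60/6 = 10 per face). The function name `chi_square_statistic` is illustrative, not taken from any library.

```python
def chi_square_statistic(observed, expected):
    """Chi-square statistic: sum over all cells of (O - E)**2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Die tossed 60 times; a fair die gives E = 60 * (1/6) = 10 for each of the 6 faces.
expected = [10] * 6
print(round(chi_square_statistic([10, 9, 11, 12, 8, 10], expected), 2))  # 1.0
print(round(chi_square_statistic([5, 10, 15, 14, 9, 7], expected), 2))   # 7.6
```

The first set gives 1.0 and the second 7.6. Both fall below the right-tail critical value $\chi^2(5, 0.05) \approx 11.07$ for $df = 6 - 1 = 5$, so under this sketch's assumptions neither set on its own would be strong evidence of an unfair die at the 0.05 level.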

Sampling Distribution of  2 *: When n is large and all expected frequencies are greater than or equal to 5, then  2 * has approximately a  2 (chi-square) distribution. Recall: Properties of the Chi-Square Distribution: 1.  2 is nonnegative in value; it is zero or positively valued. 2.  2 is not symmetrical; it is skewed to the right. 3.  2 is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.

[Figure: Various Chi-Square Distributions]

Critical values for chi-square:
1. Table 8, Appendix B.
2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value.
3. $\chi^2(df, \alpha)$: critical value of a chi-square distribution with df degrees of freedom and area $\alpha$ to the right.
4. The chi-square distribution is not symmetrical: critical values associated with the right and left tails are given separately.

Example: Find  2 (16, 0.05). Portion of Table 8  2 (16, 0.05) = 26.3

Example: Find  2 (10, 0.99). Portion of Table 8  2 (10, 0.99) = 2.56

Multinomial Experiment: An experiment with the following characteristics:
1. It consists of n identical independent trials.
2. The outcome of each trial fits into exactly one of k possible cells.
3. There is a probability associated with each particular cell, and these individual probabilities remain constant during the experiment.
4. The experiment will result in a set of observed frequencies, $O_1, O_2, \ldots, O_k$, where each $O_i$ is the number of times a trial outcome falls into that particular cell. (It must be the case that $O_1 + O_2 + \cdots + O_k = n$.)

Testing Procedure:
1. $H_0$: The probabilities $p_1, p_2, \ldots, p_k$ are correct. $H_a$: Not all of the given probabilities are correct.
2. Test statistic: $$\chi^{2\star} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
3. Use a one-tailed critical region: the right-hand tail.
4. Degrees of freedom: $df = k - 1$.
5. Expected frequencies: $E_i = n \cdot p_i$.
6. To ensure a good approximation to the chi-square distribution, each expected frequency should be at least 5.
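A minimal sketch of the setup steps, assuming the hypothesized probabilities are supplied as a list: it computes $E_i = n \cdot p_i$ and $df = k - 1$ and checks the at-least-5 rule. The function name `goodness_of_fit_setup` is hypothetical; the inputs are the sample sizes and probabilities stated in the two examples that follow.

```python
def goodness_of_fit_setup(n, probs):
    """Return expected frequencies E_i = n * p_i, df = k - 1, and the E_i >= 5 check."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    expected = [n * p for p in probs]
    df = len(probs) - 1
    all_at_least_5 = all(e >= 5 for e in expected)
    return expected, df, all_at_least_5


# Cereal example: 100 consumers, 5 cereals equally likely under H0.
print(goodness_of_fit_setup(100, [0.2] * 5))
# Blood-type example: 200 individuals, hypothesized proportions 0.41, 0.09, 0.46, 0.04.
print(goodness_of_fit_setup(200, [0.41, 0.09, 0.46, 0.04]))
```

For the cereals this gives 20 expected consumers per cell with df = 4; for the blood types it gives expected frequencies of 82, 18, 92, and 8 with df = 3. In both cases every expected frequency is at least 5, so the chi-square approximation is considered adequate.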

Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. A sample of 100 consumers tried each cereal and indicated the cereal he or she preferred. The results are given in the following table. Is there any evidence to suggest that the consumers preferred one cereal, or did they indicate that each cereal was equally likely to be selected? Use $\alpha = 0.05$.

Solution: If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.
1. The null and alternative hypotheses:
$H_0$: There was no preference shown (equally distributed).
$H_a$: There was a preference shown (not equally distributed).
2. The type of test (distribution): A multinomial experiment with specified probabilities. Use $\chi^{2\star}$ with $df = k - 1 = 5 - 1 = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2\star} > \chi^2(4, 0.05) = 9.49$.

4.Calculations:  2 * = 3.2 5.Conclusion: Fail to reject H 0. At the 0.05 level of significance, there is not sufficient evidence to suggest the consumers showed a preference for any one cereal.

Example: A sample of 200 individuals was tested for blood type, and the results were used to test the hypothesized distribution of blood types given in the table. At the 0.05 level of significance, is there any evidence to suggest the stated distribution is incorrect?

Solution:
1. The null and alternative hypotheses:
$H_0$: Blood type proportions are 0.41, 0.09, 0.46, 0.04.
$H_a$: Blood type proportions are not 0.41, 0.09, 0.46, 0.04.
2. The type of test (distribution): A multinomial experiment with specified probabilities. Use $\chi^{2\star}$ with $df = k - 1 = 4 - 1 = 3$.
3. Rejection region: Reject $H_0$ if $\chi^{2\star} > \chi^2(3, 0.05) = 7.82$.

4. Calculate the value of the test statistic: $\chi^{2\star} = 10.02$
5. Conclusion: Reject $H_0$ (10.02 > 7.82). At the 0.05 level of significance, the evidence suggests that the hypothesized proportions for blood types are incorrect.

Contingency Tables
Contingency table: an arrangement of data into a two-way classification. Data are sorted into cells, and the observed frequency in each cell is reported. A contingency table involves two factors, or variables. The usual question: are the two variables independent or dependent?

r  c Contingency Table: 1.r: number of rows; c: number of columns. 2.Used to test the independence of the row factor and the column factor. 3.Degrees of freedom: 4.n = grand total. 5.Expected frequency in the ith row and the jth column: Each E i,j should be at least 5. 6.R 1, R 2,..., R r and C 1, C 2,... C c : marginal totals.

Expected Frequencies for an r  c Contingency Table:

Example: A random sample of registered voters was selected and each was asked his or her opinion on Proposal 129, a property tax reform bill. The distribution of responses is given in the table below. Test the hypothesis “political party is independent of opinion on Proposal 129.” Use $\alpha = 0.01$.

Solution:
1. The null and alternative hypotheses:
$H_0$: Opinion on property tax reform is independent of political party.
$H_a$: Opinion on property tax reform is not independent of political party.
2. The type of test (distribution): A chi-square test of independence with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2\star} > \chi^2(4, 0.01) = 13.3$.

4. Calculations using the contingency table:

5. Conclusion: Reject $H_0$. At the 0.01 level of significance, there is evidence to suggest that opinion on tax reform and political party are not independent.

Test for Homogeneity:
1. Another type of contingency table problem.
2. Used when one of the two variables is controlled by the experimenter so that the row (or column) totals are predetermined.
3. Hypothesis test: the distribution of proportions within rows (or columns) is the same for all rows (or columns).
4. May be thought of as a comparison of several multinomial experiments.
5. The test procedure with contingency tables is the same for independence and for homogeneity.

Example: A pharmaceutical company conducted an experiment to determine the effectiveness of three new cough suppressants. Each cough syrup was given to 100 randomly selected subjects. Is there any evidence to suggest the syrups act differently to suppress coughs? Use $\alpha = 0.05$.

Solution:
1. The null and alternative hypotheses:
$H_0$: The proportion of individuals who receive various forms of relief is the same for all three cough syrups.
$H_a$: The proportion of individuals who receive various forms of relief is not the same for all three cough syrups. (In at least one group the proportions are different from the others.)
2. Type of test (distribution): A chi-square test of homogeneity with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2\star} > \chi^2(4, 0.05) = 9.49$.

4. Calculations (done by Minitab; columns A, B, C are the three cough syrups, and expected counts are printed below observed counts):

                  A        B        C    Total
    1            23       29       20       72
              24.00    24.00    24.00
    2            60       56       50      166
              55.33    55.33    55.33
    3            17       15       30       62
              20.67    20.67    20.67
    Total       100      100      100      300

    Chi-Sq = 0.042 + 1.042 + 0.667 + 0.394 + 0.008 + 0.514 + 0.651 + 1.554 + 4.215 = 9.085
    DF = 4, P-Value = 0.059
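The Minitab figures can be reproduced with a short Python sketch. It recomputes each cell's contribution $(O - E)^2/E$ with $E = R_i C_j / n$, sums them, and, because df = 4 is even, obtains the p-value from the exact closed form for an even-df chi-square right tail, $P(\chi^2 > x) = e^{-x/2}\sum_{j=0}^{df/2 - 1} (x/2)^j / j!$. The critical value 9.49 is taken from the text; variable and function names are illustrative.

```python
import math

# Observed counts as laid out in the Minitab table: rows 1-3, columns A, B, C (the syrups)
observed = [[23, 29, 20], [60, 56, 50], [17, 15, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Each cell contributes (O - E)^2 / E, with E = R_i * C_j / n
contributions = [
    [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
    for row, r in zip(observed, row_totals)
]
chi_sq = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, df):
    """Exact right-tail area P(X > x) for a chi-square with an even number of df."""
    if df % 2 != 0:
        raise ValueError("this closed form needs an even df")
    h = x / 2
    return math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(df // 2))


p_value = chi2_sf_even_df(chi_sq, df)
critical = 9.49  # chi2(4, 0.05), the Table 8 value quoted in the text

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if chi_sq > critical else "fail to reject H0")
```

It reports a statistic of about 9.085 with p of about 0.059; since 9.085 < 9.49, the decision is to fail to reject $H_0$, in agreement with the Minitab output.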

5. Conclusion: Fail to reject $H_0$ (9.085 < 9.49). At the 0.05 level of significance, there is not sufficient evidence to suggest the three remedies act differently to suppress coughs.
