Chapter 11: Applications of Chi-Square. Count or Frequency Data Many problems for which the data is categorized and the results shown by way of counts.


Chapter 11: Applications of Chi-Square

Count or Frequency Data: In many problems the data are categorized and the results are reported as counts. Results are often displayed in a chart showing the number of observations in each possible category.

Is Your Die Fair? Suppose you want to test whether a die is “fair,” that is, whether every outcome has the same probability. You toss the die 60 times and record the results. You expect to get 10 of each number, but because of random variation you probably won’t. The question is whether the frequencies are far enough off to convince you the die is not fair. Suppose you get 10, 9, 11, 12, 8, 10. Suppose instead you get 5, 10, 15, 14, 9, 7. How can we evaluate whether such differences are likely to be due to random chance or to an unbalanced die?

Background:
1. Suppose there are n observations.
2. Each observation falls into a cell (or class).
3. Observed frequencies in each cell: $O_1, O_2, O_3, \ldots, O_k$. The sum of the observed frequencies is n.
4. Expected, or theoretical, frequencies: $E_1, E_2, E_3, \ldots, E_k$.
Summary of notation: cell $i$ has observed frequency $O_i$ and expected frequency $E_i$, for $i = 1, \ldots, k$.

Goal:
1. Compare the observed frequencies with the expected frequencies.
2. Decide whether the observed frequencies seem to agree or seem to disagree with the expected frequencies.
Methodology: Use a chi-square statistic: $\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$
This statistic is a measure of variation. Note its similarity to the formula for sums of squares (variance).
Small values of $\chi^2$: the observed frequencies are close to the expected frequencies, so the variation is small.
Large values of $\chi^2$: the observed frequencies do not agree with the expected frequencies, so the variation is large.
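As a minimal sketch, the statistic (the sum over all cells of (O − E)²/E) can be computed for the two sets of die tosses from the fair-die slide. The function and variable names here are illustrative and do not come from the slides.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# A fair die tossed 60 times is expected to show each face 10 times.
expected = [10] * 6
close_sample = [10, 9, 11, 12, 8, 10]   # first set of tosses from the slides
spread_sample = [5, 10, 15, 14, 9, 7]   # second set of tosses from the slides

stat_close = chi_square_statistic(close_sample, expected)
stat_spread = chi_square_statistic(spread_sample, expected)
print(stat_close, stat_spread)  # roughly 1.0 and 7.6
```

The first sample gives a small value because every count is within 2 of the expected 10; the second gives a much larger value because several counts are off by 4 or 5.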

Sampling Distribution of $\chi^{2*}$: When n is large and all expected frequencies are greater than or equal to 5, $\chi^{2*}$ has approximately a $\chi^2$ (chi-square) distribution.
Recall: Properties of the Chi-Square Distribution:
1. $\chi^2$ is nonnegative in value; it is zero or positively valued.
2. $\chi^2$ is not symmetrical; it is skewed to the right.
3. $\chi^2$ is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom.

Various Chi-Square Distributions:

Critical values for chi-square:
1. Table 8, Appendix B.
2. Identified by degrees of freedom (df) and the area under the curve to the right of the critical value.
3. $\chi^2(df, \alpha)$: critical value of a chi-square distribution with df degrees of freedom and area $\alpha$ to the right.
4. The chi-square distribution is not symmetrical: critical values associated with the right and left tails are given separately.

Example: Find $\chi^2(16, 0.05)$. From the portion of Table 8: $\chi^2(16, 0.05) = 26.3$

Example: Find $\chi^2(10, 0.99)$. From the portion of Table 8: $\chi^2(10, 0.99) = 2.56$
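The two table lookups above can be reproduced numerically. The sketch below is one possible approach, not the method used in the slides: it evaluates the chi-square cumulative distribution with the standard power series for the regularized incomplete gamma function and then finds the critical value by bisection. The function names `chi2_cdf` and `chi2_critical` are made up for this illustration.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma
    function P(a, z) with a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(df, right_area):
    """Value that leaves `right_area` of the distribution to its right."""
    target = 1.0 - right_area
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:   # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):               # bisection on the CDF
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(16, 0.05), 1))  # compare with 26.3 from Table 8
print(round(chi2_critical(10, 0.99), 2))  # compare with 2.56 from Table 8
```

Because the second argument is the area to the right, an area of 0.99 gives a value in the left tail, which is why the second result is so small.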

Multinomial Experiment: An experiment with the following characteristics:
1. It consists of n identical independent trials.
2. The outcome of each trial fits into exactly one of k possible cells.
3. There is a probability associated with each particular cell, and these individual probabilities remain constant during the experiment.
4. The experiment will result in a set of observed frequencies, $O_1, O_2, \ldots, O_k$, where each $O_i$ is the number of times a trial outcome falls into that particular cell. (It must be the case that $O_1 + O_2 + \cdots + O_k = n$.)

Testing Procedure:
1. $H_0$: The probabilities $p_1, p_2, \ldots, p_k$ are correct. $H_a$: Not all of the given probabilities are correct.
2. Test statistic: $\chi^{2*} = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$
3. Use a one-tailed critical region: the right-hand tail.
4. Degrees of freedom: $df = k - 1$.
5. Expected frequencies: $E_i = n \cdot p_i$
6. To ensure a good approximation to the chi-square distribution, each expected frequency should be at least 5.
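The procedure can be sketched end to end as a short function. This is an illustrative outline under stated assumptions, not code from the slides: the name `goodness_of_fit` is hypothetical, the critical value is passed in by the caller, and the value 11.07 used in the demonstration is the standard table entry for 5 degrees of freedom and a right-tail area of 0.05, which the slides do not quote. The data are the second set of die tosses from the fair-die slide.

```python
def goodness_of_fit(observed, probabilities, critical_value):
    """Run the multinomial (goodness-of-fit) test steps on one sample."""
    n = sum(observed)
    # Expected frequency for each cell: E_i = n * p_i
    expected = [n * p for p in probabilities]
    # The chi-square approximation needs every expected count to be >= 5.
    if min(expected) < 5:
        raise ValueError("every expected frequency should be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return {
        "df": len(observed) - 1,                    # df = k - 1
        "expected": expected,
        "statistic": statistic,
        "reject_h0": statistic > critical_value,    # right-hand tail only
    }


# Die tossed 60 times; H0 says each face has probability 1/6.
result = goodness_of_fit(
    observed=[5, 10, 15, 14, 9, 7],
    probabilities=[1 / 6] * 6,
    critical_value=11.07,  # assumed table value for df = 5, alpha = 0.05
)
print(result["df"], round(result["statistic"], 2), result["reject_h0"])
```

With these counts the statistic is about 7.6, which is below the assumed critical value, so the sketch reports that the hypothesis of a fair die is not rejected.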

Example: A market research firm conducted a consumer-preference experiment to determine which of 5 new breakfast cereals was the most appealing to adults. Each of the 100 consumers in the sample tried every cereal and indicated which one he or she preferred. The results are given in the following table: [table not included in the transcript] Is there any evidence to suggest the consumers had a preference for one cereal, or did they indicate each cereal was equally likely to be selected? Use $\alpha = 0.05$.

Solution: If no preference was shown, we expect the 100 consumers to be equally distributed among the 5 cereals. Thus, if no preference is given, we expect (100)(0.2) = 20 consumers in each class.
1. The null and alternative hypotheses: $H_0$: There was no preference shown (equally distributed). $H_a$: There was a preference shown (not equally distributed).
2. The type of test (distribution): A multinomial experiment with specified probabilities. Use $\chi^{2*}$ with $df = k - 1 = 5 - 1 = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2*} > \chi^2(4, 0.05) = 9.49$.

4. Calculations: $\chi^{2*}$ = [value not included in the transcript]
5. Conclusion: Fail to reject $H_0$. At the 0.05 level of significance, there is not sufficient evidence to suggest the consumers showed a preference for any one cereal.

Example: A sample of 200 individuals was tested for blood type, and the results are used to test the hypothesized distribution of blood types: [table not included in the transcript] At the 0.05 level of significance, is there any evidence to suggest the stated distribution is incorrect?

Solution:
1. The null and alternative hypotheses: $H_0$: Blood type proportions are 0.41, 0.09, 0.46, 0.04. $H_a$: Blood type proportions are not 0.41, 0.09, 0.46, 0.04.
2. The type of test (distribution): A multinomial experiment with specified probabilities. Use $\chi^{2*}$ with $df = k - 1 = 4 - 1 = 3$.
3. Rejection region: Reject $H_0$ if $\chi^{2*} > \chi^2(3, 0.05) = 7.82$.
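As a small check on the setup of this example, the expected frequencies follow from the sample size of 200 and the four hypothesized proportions. The sketch below only computes those expected counts and the degrees of freedom; the observed counts are not available here, so no test statistic is computed. Variable names are illustrative.

```python
n = 200
hypothesized = [0.41, 0.09, 0.46, 0.04]  # proportions stated in H0

# The hypothesized proportions must describe a full distribution.
proportions_sum_to_one = abs(sum(hypothesized) - 1.0) < 1e-9

# Expected frequency for each cell: E_i = n * p_i (rounded to hide float noise).
expected = [round(n * p, 6) for p in hypothesized]

# The chi-square approximation needs every expected count to be at least 5.
all_large_enough = all(e >= 5 for e in expected)

df = len(hypothesized) - 1
print(expected, all_large_enough, df)
```

The smallest expected count is 8, so the requirement that each expected frequency be at least 5 is met for all four cells.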

4. Calculate the value of the test statistic: $\chi^{2*}$ = [value not included in the transcript]
5. Conclusion: Reject $H_0$. The evidence suggests that the hypothesized proportions for blood types are incorrect.

Contingency Tables: A contingency table is an arrangement of data into a two-way classification. Data are sorted into cells, and the observed frequency in each cell is reported. A contingency table involves two factors, or variables. The usual question: are the two variables independent or dependent?

$r \times c$ Contingency Table:
1. r: number of rows; c: number of columns.
2. Used to test the independence of the row factor and the column factor.
3. Degrees of freedom: $df = (r - 1)(c - 1)$
4. n = grand total.
5. Expected frequency in the ith row and the jth column: $E_{i,j} = \frac{R_i \times C_j}{n}$. Each $E_{i,j}$ should be at least 5.
6. $R_1, R_2, \ldots, R_r$ and $C_1, C_2, \ldots, C_c$: marginal totals.

Expected Frequencies for an $r \times c$ Contingency Table:
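A minimal sketch of the expected-frequency calculation for a contingency table: each expected count is the row total times the column total divided by the grand total, and the degrees of freedom are (r − 1)(c − 1). The function names and the 2 × 2 table of counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not data from the slides.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for the table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical observed counts (not from the slides).
toy_table = [
    [10, 20],
    [30, 40],
]
toy_expected = expected_frequencies(toy_table)
toy_statistic, toy_df = independence_statistic(toy_table)
print(toy_expected)
print(round(toy_statistic, 4), toy_df)
```

For the toy table the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the expected counts are 12, 18, 28, and 42. The same two functions apply unchanged to a test of homogeneity, since the computation is identical.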

Example: A random sample of registered voters was selected and each was asked his or her opinion on Proposal 129, a property tax reform bill. The distribution of responses is given in the table below. [table not included in the transcript] Test the hypothesis “political party is independent of opinion on Proposal 129.” Use $\alpha = 0.01$.

Solution:
1. The null and the alternative hypotheses: $H_0$: Opinion on property tax reform is independent of political party. $H_a$: Opinion on property tax reform is not independent of political party.
2. The type of test (distribution): A chi-square test of independence with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2*} > \chi^2(4, 0.01) = 13.3$.

4. Calculations using Contingency table:

5.Conclusion: Reject H 0. There is evidence to suggest that opinion on tax reform and political party are not independent.

Test for Homogeneity:
1. Another type of contingency table problem.
2. Used when one of the two variables is controlled by the experimenter so that the row (or column) totals are predetermined.
3. Hypothesis test: the distribution of proportions within rows (or columns) is the same for all rows (or columns).
4. May be thought of as a comparison of several multinomial experiments.
5. The test procedure for independence and for homogeneity with contingency tables is the same.

Example: A pharmaceutical company conducted an experiment to determine the effectiveness of three new cough suppressants. Each cough syrup was given to 100 randomly selected subjects. Is there any evidence to suggest the syrups act differently to suppress coughs? Use $\alpha = 0.05$.

Solution:
1. The null and alternative hypotheses: $H_0$: The proportion of individuals who receive various forms of relief is the same for all three cough syrups. $H_a$: The proportion of individuals who receive various forms of relief is not the same for all three cough syrups. (In at least one group the proportions are different from the others.)
2. Type of test (distribution): A chi-square test of homogeneity with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$.
3. Rejection region: Reject $H_0$ if $\chi^{2*} > \chi^2(4, 0.05) = 9.49$.

4. Calculations (done by Minitab): [Minitab output for syrups A, B, and C, with totals and the Chi-Sq value, is not included in the transcript.] DF = 4, P-Value = 0.059

5. Conclusion: Fail to reject $H_0$. At the 0.05 level of significance, there is not sufficient evidence to suggest the three remedies act differently to suppress coughs.
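The decision in this example can be cross-checked in two equivalent ways: by comparing the reported p-value with the significance level, or by comparing the test statistic with the critical value. The sketch below uses the reported p-value of 0.059 and the critical value 9.49 from the slides. The closed-form right-tail area for 4 degrees of freedom, exp(−x/2)(1 + x/2), is a standard identity for chi-square distributions with an even number of degrees of freedom and is not taken from the slides; the function name is illustrative.

```python
import math


def chi2_right_tail_df4(x):
    """P(X > x) for a chi-square variable with exactly 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


alpha = 0.05
reported_p_value = 0.059   # from the Minitab output in the slides
critical_value = 9.49      # chi-square(4, 0.05) from the slides

# p-value approach: reject H0 only when the p-value is below alpha.
reject_by_p_value = reported_p_value < alpha

# The area to the right of the critical value should be close to alpha,
# which ties the critical-value rule to the p-value rule.
area_beyond_critical = chi2_right_tail_df4(critical_value)

print(reject_by_p_value, round(area_beyond_critical, 3))
```

Since 0.059 is larger than 0.05, the test statistic must lie to the left of 9.49, which agrees with the decision not to reject the null hypothesis.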