Introduction Many experiments result in measurements that are qualitative or categorical rather than quantitative. Humans classified by ethnic origin Hair.

Slides:



Advertisements
Similar presentations
© 2011 Pearson Education, Inc
Advertisements

Categorical Data Analysis
Copyright ©2006 Brooks/Cole A division of Thomson Learning, Inc. Introduction to Probability and Statistics Twelfth Edition Robert J. Beaver Barbara M.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Chi Squared Tests. Introduction Two statistical techniques are presented. Both are used to analyze nominal data. –A goodness-of-fit test for a multinomial.
Discrete (Categorical) Data Analysis
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
12.The Chi-square Test and the Analysis of the Contingency Tables 12.1Contingency Table 12.2A Words of Caution about Chi-Square Test.
Chapter 16 Chi Squared Tests.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
Inferences About Process Quality
Chi-Square Tests and the F-Distribution
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Presentation 12 Chi-Square test.
Copyright © Cengage Learning. All rights reserved. 11 Applications of Chi-Square.
Aaker, Kumar, Day Ninth Edition Instructor’s Presentation Slides
Regression Analysis (2)
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 26 Comparing Counts.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Copyright © 2013, 2010 and 2007 Pearson Education, Inc. Chapter Inference on Categorical Data 12.
Chapter 11: Applications of Chi-Square. Count or Frequency Data Many problems for which the data is categorized and the results shown by way of counts.
Chapter 11: Applications of Chi-Square. Chapter Goals Investigate two tests: multinomial experiment, and the contingency table. Compare experimental results.
Chapter 16 – Categorical Data Analysis Math 22 Introductory Statistics.
Copyright © 2009 Cengage Learning 15.1 Chapter 16 Chi-Squared Tests.
Chapter 20 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 These tests can be used when all of the data from a study has been measured on.
Chapter 13: Categorical Data Analysis Statistics.
Chapter 16 The Chi-Square Statistic
1 Pertemuan 11 Uji kebaikan Suai dan Uji Independen Mata kuliah : A Statistik Ekonomi Tahun: 2010.
1 In this case, each element of a population is assigned to one and only one of several classes or categories. Chapter 11 – Test of Independence - Hypothesis.
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Chapter 16 Chi-Squared Tests.
+ Chi Square Test Homogeneity or Independence( Association)
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
© 2000 Prentice-Hall, Inc. Statistics The Chi-Square Test & The Analysis of Contingency Tables Chapter 13.
AP Statistics Section 14.. The main objective of Chapter 14 is to test claims about qualitative data consisting of frequency counts for different categories.
Copyright © 2010 Pearson Education, Inc. Slide
Introduction to Probability and Statistics Thirteenth Edition Chapter 13 Analysis of Categorical Data.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 Chapter 11: Analyzing the Association Between Categorical Variables Section 11.1: What is Independence and What is Association?
1 Chapter 10. Section 10.1 and 10.2 Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Dan Piett STAT West Virginia University Lecture 12.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Statistics 300: Elementary Statistics Section 11-2.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
CHAPTER INTRODUCTORY CHI-SQUARE TEST Objectives:- Concerning with the methods of analyzing the categorical data In chi-square test, there are 3 methods.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
+ Section 11.1 Chi-Square Goodness-of-Fit Tests. + Introduction In the previous chapter, we discussed inference procedures for comparing the proportion.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 11 Multinomial Experiments and Contingency Tables 11-1 Overview 11-2 Multinomial Experiments:
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
Chi-Två Test Kapitel 6. Introduction Two statistical techniques are presented, to analyze nominal data. –A goodness-of-fit test for the multinomial experiment.
Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.
Goodness-of-Fit and Contingency Tables Chapter 11.
Chapter 11 – Test of Independence - Hypothesis Test for Proportions of a Multinomial Population In this case, each element of a population is assigned.
Chapter 12 Tests with Qualitative Data
Lecture Slides Elementary Statistics Tenth Edition
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Chapter 10 Analyzing the Association Between Categorical Variables
Overview and Chi-Square
Chapter 13 – Applications of the Chi-Square Statistic
Analyzing the Association Between Categorical Variables
Presentation transcript:

Introduction Many experiments result in measurements that are qualitative or categorical rather than quantitative. Humans classified by ethnic origin Hair classified by color M&M ® s classified by type (plain or peanut) These data sets have the characteristics of a multinomial experiment. Chapter 14: Analysis of Categorical Data

The Multinomial Experiment 1. The experiment consists of n identical trials. 2. Each trial results in one of k categories. 3. The probability that the outcome falls into a particular category i on a single trial is p i and remains constant from trial to trial. The sum of all k probabilities, p 1 +p 2 +…+ p k = The trials are independent. 5. We are interested in the number of outcomes in each category, O 1, O 2, … O k with O 1 + O 2 +… + O k = n.

The Binomial Experiment A special case of the multinomial experiment with k = 2. Categories 1 and 2: success and failure p 1 and p 2 :p and q O 1 and O 2 :x and n-x We made inferences about p (and q = 1 - p) In the multinomial experiment, we make inferences about all the probabilities, p 1, p 2, p 3 …p k.

Pearson’s Chi-Square Statistic We have some preconceived idea about the values of the p i and want to use sample information to see if we are correct. The expected number of times that outcome i will occur is E i = np i. If the observed cell counts, O i, are too far from what we hypothesize under H 0, the more likely it is that H 0 should be rejected.

Pearson’s Chi-Square Statistic We use the Pearson chi-square statistic: When H 0 is true, the differences (O-E) will be small, but large when H 0 is false. Look for large values of X 2 based on the chi-square distribution with a particular number of degrees of freedom.

Degrees of Freedom These will be different depending on the application. 1. Start with the number of categories or cells in the experiment. 2. Subtract 1 df for each linear restriction on the cell probabilities. (You always lose 1 df since p 1 +p 2 +…+ p k = 1.) 3. Subtract 1 df for every population parameter you have to estimate to calculate or estimate E i.

The Goodness of Fit Test The simplest of the applications. A single categorical variable is measured, and exact numerical values are specified for each of the p i. Expected cell counts are E i = np i Degrees of freedom: df = k-1

Example A multinomial experiment with k = 6 and O 1 to O 6 given in the table. We test: Upper Face Number of times H 0 : p 1 = 1/6; p 2 = 1/6;…p 6 = 1/6 (die is fair) H a : at least one p i is different from 1/6 (die is biased) H 0 : p 1 = 1/6; p 2 = 1/6;…p 6 = 1/6 (die is fair) H a : at least one p i is different from 1/6 (die is biased) Toss a die 300 times with the following results. Is the die fair or biased?

Example Upper Face OiOi EiEi 50 E i = np i = 300(1/6) = 50 Calculate the expected cell counts: Test statistic and rejection region: Do not reject H 0. There is insufficient evidence to indicate that the die is biased.

Some Notes The test statistic, X 2 has only an approximate chi-square distribution. For the approximation to be accurate, statisticians recommend E i  5 for all cells. Goodness of fit tests are different from previous tests since the experimenter uses H 0 for the model he thinks is true. Be careful not to accept H 0 (say the model is correct) without reporting . H 0 : model is correct (as specified) H a : model is not correct H 0 : model is correct (as specified) H a : model is not correct

Contingency Tables: A Two- Way Classification The experimenter measures two qualitative variables to generate bivariate data. Examples: Gender and colorblindness Genes and disease Professorial rank and age group of person Summarize the data by counting the observed number of outcomes in each of the intersections of category levels in a contingency table.

r x c Contingency Table The contingency table has r rows and c columns—rc total cells. 12…c 1O 11 O 12 …O 1c 2O 21 O 22 …O 2c ……………. rO r1 O r2 …O rc We study the relationship between the two variables. Is one method of classification contingent or dependent on the other? Does the distribution of measurements in the various categories for variable 1 depend on which category of variable 2 is being observed? If not, the variables are independent.

Chi-Square Test of Independence Observed cell counts are O ij for row i and column j. Expected cell counts are E ij = np ij If H 0 is true and the classifications are independent, p ij = p i p j = P(falling in row i)P(falling in row j) H 0 : classifications are independent H a : classifications are dependent H 0 : classifications are independent H a : classifications are dependent

Chi-Square Test of Independence The test statistic has an approximate chi-square distribution with df = (r-1)(c-1).

Example Doctors’ errors are classified according to the seriousness of the error and shift on which it occurred. Shift Type123Total A15 (22.51)26 (22.99)33 (28.50)74 B21 (20.99)31 (21.44)17 (26.57)69 C45 (38.94)34 (39.77)49 (49.29)128 D13 (11.56)5 (11.81)20 (11.63)38 Total Do the data present sufficient evidence to indicate that the type of error varies with the shift during which the error occurred? Test at the 1% level of significance. H 0 : type of defect is independent of shift H a : type of defect depends on the shift H 0 : type of defect is independent of shift H a : type of defect depends on the shift Note: Expected values are indicated in brackets

The Furniture Problem Calculate the expected cell counts. For example : Reject H 0. There is sufficient evidence to indicate that the proportion of errors vary from shift to shift. Applet

Comparing Multinomial Populations Sometimes researchers design an experiment so that the number of experimental units falling in one set of categories is fixed in advance. Example: An experimenter selects 900 patients who have been treated for flu prevention. She selects 300 from each of three types—no vaccine, one shot, and two shots. No VaccineOne ShotTwo ShotsTotal Flur1r1 No Flur2r2 Total300 n = 900 The column totals have been fixed in advance!

Comparing Multinomial Populations Each of the c columns (or r rows) whose totals have been fixed in advance is actually a single multinomial experiment. The chi-square test of independence with (r-1)(c- 1) df is equivalent to a test of the equality of c (or r) multinomial populations. No VaccineOne ShotTwo ShotsTotal Flur1r1 No Flur2r2 Total300 n = 900 Three binomial populations—no vaccine, one shot and two shots. Is the probability of getting the flu independent of the type of flu prevention used? Three binomial populations—no vaccine, one shot and two shots. Is the probability of getting the flu independent of the type of flu prevention used?

Example Random samples of 200 voters in each of four wards were surveyed and asked if they favor candidate A in a local election. (expected values) Ward 1234Total Favor A76 (59.00) 53 (59.00) 59 (59.00) 48 (59.00) 236 Do not favor A 124 (141.00) 147 (141.00) 141 (141.00) 152 (141.00) 564 Total Do the data present sufficient evidence to indicate that the the fraction of voters favoring candidate A differs in the four wards? H 0 : fraction favoring A is independent of ward H a : fraction favoring A depends on the ward H 0 : fraction favoring A is independent of ward H a : fraction favoring A depends on the ward H 0 : p 1 = p 2 = p 3 = p 4 where p i = fraction favoring A in each of the four wards Applet

The Voter Problem Calculate the expected cell counts. For example : Reject H 0. There is sufficient evidence to indicate that the fraction of voters favoring A varies from ward to ward.

The Voter Problem Since we know that there are differences among the four wards, what are the nature of the differences? Look at the proportions in favor of candidate A in the four wards. Ward1234 Favor A76/200=.3853/200 =.2759/200 =.3048/200 =.24 Candidate A is doing best in the first ward, and worst in the fourth ward. More importantly, he does not have a majority of the vote in any of the wards!

Equivalent Statistical Tests A multinomial experiment with two categories is a binomial experiment. The data from two binomial experiments can be displayed as a two-way classification. There are statistical tests for these two situations based on the statistic of Chapter 9:

Other Applications Goodness-of-fit tests: It is possible to design a goodness-of-fit test to determine whether data are consistent with data drawn from a particular probability distribution, possibly the normal, binomial, Poisson, or other distributions. Time-dependent multinomials: The chi square statistic can be used to investigate the rate of change of multinomial proportions over time.

Other Applications Multidimensional contingency tables: Instead of only two methods of classification, it is possible to investigate a dependence among three or more classifications. Log-linear models: Complex models can be created in which the logarithm of the cell probability (ln p ij ) is some linear function of the row and column probabilities.

Assumptions Assumptions for Pearson’s Chi-Square: 1.The cell counts O 1, O 2, …,O k must satisfy the conditions of a multinomial experiment, or a set of multinomial experiments created by fixing either the row or the column totals. 2.The expected cell counts E 1, E 2, …, E k should equal or exceed five. Assumptions for Pearson’s Chi-Square: 1.The cell counts O 1, O 2, …,O k must satisfy the conditions of a multinomial experiment, or a set of multinomial experiments created by fixing either the row or the column totals. 2.The expected cell counts E 1, E 2, …, E k should equal or exceed five.

Assumptions When you calculate the expected cell counts, if you find that one or more is less than five, these options are available to you: 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic X 2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption. 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic X 2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption.

Key Concepts I.The Multinomial Experiment 1.There are n identical trials, and each outcome falls into one of k categories. 2.The probability of falling into category i is p i and remains constant from trial to trial. 3.The trials are independent,  p i  1, and we measure O i, the number of observations that fall into each of the k categories. II.Pearson’s Chi-Square Statistic which has an approximate chi-square distribution with degrees of freedom determined by the application.

Key Concepts III.The Goodness-of-Fit Test 1.This is a one-way classification with cell probabilities specified in H 0. 2.Use the chi-square statistic with E i  np i calculated with the hypothesized probabilities. 3.d f  k  1  (Number of parameters estimated in order to find E i ) 4.If H 0 is rejected, investigate the nature if the differences using the sampling proportions.

Key Concepts IV.Contingency Tables 1. A two-way classification with n observations into r  c cells of a two-way table using two different methods of classification is called a contingency table. 2. The test for independence of classifications methods uses the chi-square statistic 3.If the null hypothesis of independence of classifications is rejected, investigate the nature of the dependency using conditional proportions within either the rows or columns of the contingency table.

Key Concepts V.Fixing Row or Column Totals 1.When either the row or column totals are fixed, the test of independence of classifications becomes a test of the homogeneity of cell probabilities for several multinomial experiments. 2.Use the same chi-square statistic as for contingency tables. 3.The large-sample z tests for one and two binomial proportions are special cases of the chi-square statistic.

Key Concepts VI. Assumptions 1.The cell counts satisfy the conditions of a multinomial experiment, or a set of multinomial experiments with fixed sample sizes. 2.All expected cell counts must equal or exceed five in order that the chi-square approximation is valid.