Introduction to Probability and Statistics Thirteenth Edition Chapter 13 Analysis of Categorical Data.

Slides:



Advertisements
Similar presentations
Categorical Data Analysis
Advertisements

Copyright ©2006 Brooks/Cole A division of Thomson Learning, Inc. Introduction to Probability and Statistics Twelfth Edition Robert J. Beaver Barbara M.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Chi Squared Tests. Introduction Two statistical techniques are presented. Both are used to analyze nominal data. –A goodness-of-fit test for a multinomial.
1 1 Slide © 2003 South-Western /Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
1 1 Slide © 2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
CHAPTER 23: Two Categorical Variables: The Chi-Square Test
Inference about the Difference Between the
Discrete (Categorical) Data Analysis
1 1 Slide © 2009 Econ-2030-Applied Statistics-Dr. Tadesse. Chapter 11: Comparisons Involving Proportions and a Test of Independence n Inferences About.
1 Pertemuan 09 Pengujian Hipotesis Proporsi dan Data Katagorik Matakuliah: A0392 – Statistik Ekonomi Tahun: 2006.
Chapter 16 Chi Squared Tests.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
Chapter 11a: Comparisons Involving Proportions and a Test of Independence Inference about the Difference between the Proportions of Two Populations Hypothesis.
1. State the null and alternative hypotheses. 2. Select a random sample and record observed frequency f i for the i th category ( k categories) Compute.
Chi-Square Tests and the F-Distribution
Goodness of Fit Test for Proportions of Multinomial Population Chi-square distribution Hypotheses test/Goodness of fit test.
1 1 Slide © 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Presentation 12 Chi-Square test.
Copyright © Cengage Learning. All rights reserved. 11 Applications of Chi-Square.
GOODNESS OF FIT TEST & CONTINGENCY TABLE
Regression Analysis (2)
1 1 Slide © 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
1 1 Slide © 2005 Thomson/South-Western Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial Population Goodness of.
Chapter 11: Applications of Chi-Square. Count or Frequency Data Many problems for which the data is categorized and the results shown by way of counts.
Chapter 11: Applications of Chi-Square. Chapter Goals Investigate two tests: multinomial experiment, and the contingency table. Compare experimental results.
Copyright © 2009 Cengage Learning 15.1 Chapter 16 Chi-Squared Tests.
1 Pertemuan 11 Uji kebaikan Suai dan Uji Independen Mata kuliah : A Statistik Ekonomi Tahun: 2010.
1 1 Slide © 2006 Thomson/South-Western Slides Prepared by JOHN S. LOUCKS St. Edward’s University Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
1 In this case, each element of a population is assigned to one and only one of several classes or categories. Chapter 11 – Test of Independence - Hypothesis.
1 1 Slide Chapter 11 Comparisons Involving Proportions n Inference about the Difference Between the Proportions of Two Populations Proportions of Two Populations.
Chi-Square Procedures Chi-Square Test for Goodness of Fit, Independence of Variables, and Homogeneity of Proportions.
Introduction Many experiments result in measurements that are qualitative or categorical rather than quantitative. Humans classified by ethnic origin Hair.
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc Chapter 16 Chi-Squared Tests.
BPS - 5th Ed. Chapter 221 Two Categorical Variables: The Chi-Square Test.
© 2000 Prentice-Hall, Inc. Statistics The Chi-Square Test & The Analysis of Contingency Tables Chapter 13.
AP Statistics Section 14.. The main objective of Chapter 14 is to test claims about qualitative data consisting of frequency counts for different categories.
Copyright © 2010 Pearson Education, Inc. Slide
1 1 Slide © 2009 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
1/71 Statistics Tests of Goodness of Fit and Independence.
Chapter Outline Goodness of Fit test Test of Independence.
1 Chapter 10. Section 10.1 and 10.2 Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Dan Piett STAT West Virginia University Lecture 12.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
1 1 Slide 統計學 Spring 2004 授課教師:統計系余清祥 日期: 2004 年 3 月 23 日 第六週:配適度與獨立性檢定.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Slides by JOHN LOUCKS St. Edward’s University.
1. State the null and alternative hypotheses. 2. Select a random sample and record observed frequency f i for the i th category ( k categories) Compute.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 11 Multinomial Experiments and Contingency Tables 11-1 Overview 11-2 Multinomial Experiments:
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Chi-Två Test Kapitel 6. Introduction Two statistical techniques are presented, to analyze nominal data. –A goodness-of-fit test for the multinomial experiment.
Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.
Test of independence: Contingency Table
Chapter 11 – Test of Independence - Hypothesis Test for Proportions of a Multinomial Population In this case, each element of a population is assigned.
St. Edward’s University
Chapter 12 Tests with Qualitative Data
John Loucks St. Edward’s University . SLIDES . BY.
Statistics for Business and Economics (13e)
Econ 3790: Business and Economics Statistics
Overview and Chi-Square
Analyzing the Association Between Categorical Variables
St. Edward’s University
Presentation transcript:

Introduction to Probability and Statistics Thirteenth Edition Chapter 13 Analysis of Categorical Data

qualitativecategorical  Many experiments result in measurements that are qualitative or categorical rather than quantitative. ◦ People classified by ethnic origin ◦ Cars classified by color ◦ M&M ® s classified by type (plain or peanut) multinomial experiment  These data sets have the characteristics of a multinomial experiment.

n identical trials. 1. The experiment consists of n identical trials. one of k categories. 2. Each trial results in one of k categories. remains constant p 1 +p 2 +…+ p k = 1 3. The probability that the outcome falls into a particular category i on a single trial is p i and remains constant from trial to trial. The sum of all k probabilities, p 1 +p 2 +…+ p k = 1. independent 4. The trials are independent. O 1, O 2, … O k O 1 + O 2 +… + O k = n. 5. We are interested in the number of outcomes in each category, O 1, O 2, … O k with O 1 + O 2 +… + O k = n. m m m m m m

 A special case of the multinomial experiment with k = 2.  Categories 1 and 2: success and failure  p 1 and p 2 : p and q  O 1 and O 2 :: x and n-x  We made inferences about p (and q = 1 - p) In the multinomial experiment, we make inferences about all the probabilities, p 1, p 2, p 3 …p k.

 We have some preconceived idea about the values of the p i and want to use sample information to see if we are correct. expected number E i = np i  The expected number of times that outcome i will occur is E i = np i. observed cell counts, O i,  If the observed cell counts, O i, are too far from what we hypothesize under H 0, the more likely it is that H 0 should be rejected. m m m m m m

 We use the Pearson chi-square statistic: When H 0 is true, the differences O-E will be small, but large when H 0 is false. large values of 2Look for large values of χ 2 based on the chi-square distribution with a particular number of degrees of freedom. m m m m m m

 These will be different depending on the application. 1. Start with the number of categories or cells in the experiment. p 1 +p 2 +…+ p k = 1 2. Subtract 1df for each linear restriction on the cell probabilities. (You always lose 1 df since p 1 +p 2 +…+ p k = 1.) 3. Subtract 1 df for every population parameter you have to estimate to calculate or estimate E i.

Assumptions for Pearson’s Chi-Square: 1.The cell counts O 1, O 2, …,O k must satisfy the conditions of a multinomial experiment, or a set of multinomial experiments created by fixing either the row or the column totals. 2.The expected cell counts E 1, E 2, …, E k  5. Assumptions for Pearson’s Chi-Square: 1.The cell counts O 1, O 2, …,O k must satisfy the conditions of a multinomial experiment, or a set of multinomial experiments created by fixing either the row or the column totals. 2.The expected cell counts E 1, E 2, …, E k  5. 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic  2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption. 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic  2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption. If not (one or more is < 5)

The simplest of the applications. A single categorical variable is measured, and exact numerical values are specified for each of the p i. E i = np iExpected cell counts are E i = np i df = k-1Degrees of freedom: df = k-1

 A multinomial experiment with k = 6 and O 1 to O 6 given in the table.  We test: Upper Face Number of times H 0 : p 1 = 1/6; p 2 = 1/6;…p 6 = 1/6 (die is fair) H 1 : at least one p i is different from 1/6 (die is biased) H 0 : p 1 = 1/6; p 2 = 1/6;…p 6 = 1/6 (die is fair) H 1 : at least one p i is different from 1/6 (die is biased) Toss a die 300 times with the following results. Is the die fair or biased?

Upper Face OiOi EiEi 50 E i = np i = 300(1/6) = 50 Calculate the expected cell counts:  Test statistic and rejection region: Do not reject H 0. There is insufficient evidence to indicate that the die is biased.

 The test statistic, χ 2 has only an approximate chi- square distribution. E i  5  For the approximation to be accurate, statisticians recommend E i  5 for all cells.  Goodness of fit tests are different from previous tests since the experimenter uses H 0 for the model he thinks is true. Be careful not to accept H 0 (say the model is correct) without reporting . H 0 : model is correct (as specified) H 1 : model is not correct H 0 : model is correct (as specified) H 1 : model is not correct

Finger Lakes Homes manufactures four models of prefabricated homes, a two-story colonial, a ranch, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected. The number of homes sold of each model for 100 sales over the past two years is shown below. Model Colonial Ranch Split-Level A-Frame # Sold

 Notation p C = popul. proportion that purchase a colonial p R = popul. proportion that purchase a ranch p S = popul. proportion that purchase a split-level p A = popul. proportion that purchase an A-frame  Hypotheses H 0 : p C = p R = p S = p A =.25 H 1 : The population proportions are not p C =.25, p R =.25, p S =.25, and p A =.25

 Expected Frequencies E 1 =.25(100) = 25 E 2 =.25(100) = 25 E 3 =.25(100) = 25 E 4 =.25(100) = 25  Test Statistic  Rejection Rule Reject H 0 we reject the assumption that there is no home style preference, at the 0.05 level of significance.

n Conclusion Using the p-Value Approach The p-value < . We can reject the null hypothesis.. Because  2 = 10 is between and , the area in the upper tail of the distribution is between.025 and.01. Area in Upper Tail  2 Value (df = 3) Example 2: Finger Lakes Homes

2. C ONTINGENCY T ABLES : A T WO -W AY C LASSIFICATION The test of independence of variables is used to determine whether two variables are independent when a single sample is selected. two qualitative variables bivariate data The experimenter measures two qualitative variables to generate bivariate data. Gender and colorblindness Age and opinion Professorial rank and type of university contingency table. Summarize the data by counting the observed number of outcomes in each of the intersections of category levels in a contingency table.

r X c C ONTINGENCY T ABLE The contingency table has r rows and c columns = rc total cells. 12…c 1O 11 O 12 …O 1c 2O 21 O 22 …O 2c ……………. rO r1 O r2 …O rc contingentdependentWe study the relationship between the two variables. Is one method of classification contingent or dependent on the other? Does the distribution of measurements in the various categories for variable 1 depend on which category of variable 2 is being observed? If not, the variables are independent. Does the distribution of measurements in the various categories for variable 1 depend on which category of variable 2 is being observed? If not, the variables are independent.

C HI -S QUARE T EST OF I NDEPENDENCE O ijObserved cell counts are O ij for row i and column j. E ij = np ijExpected cell counts are E ij = np ij If H 0 is true and the classifications are independent, p ij = p i p j = P(falling in row i)P(falling in row j) H 0 : classifications are independent H 1 : classifications are dependent H 0 : classifications are independent H 1 : classifications are dependent

df = (r-1)(c-1). The test statistic has an approximate chi-square distribution with df = (r-1)(c-1). C HI -S QUARE T EST OF I NDEPENDENCE

E XAMPLE Furniture defects are classified according to type of defect and shift on which it was made. Shift Type123Total A B C D Total Do the data present sufficient evidence to indicate that the type of furniture defect varies with the shift during which the piece of furniture is produced? Test at the 1% level of significance. H 0 : type of defect is independent of shift H 1 : type of defect depends on the shift H 0 : type of defect is independent of shift H 1 : type of defect depends on the shift

Calculate the expected cell counts. For example: Reject H 0. There is sufficient evidence to indicate that the proportion of defect types vary from shift to shift. E XAMPLE You don’t need to divide the  by 2

Calculate the expected cell counts. For example: Chi-Square Test: 1, 2, 3 Expected counts are printed below observed counts Chi-Square contributions are printed below expected counts Total Total Chi-Sq = , DF = 6, P-Value = Reject H 0. There is sufficient evidence to indicate that the proportion of defect types vary from shift to shift. E XAMPLE

3. Comparing Multinomial Populations Sometimes researchers design an experiment so that the number of experimental units falling in one set of categories is fixed in advance. Example: An experimenter selects 900 patients who have been treated for flu prevention. She selects 300 from each of three types—no vaccine, one shot, and two shots. No VaccineOne ShotTwo ShotsTotal Flur1r1 No Flur2r2 Total300 n = 900 The column totals have been fixed in advance!

Each of the c columns (or r rows) whose totals have been fixed in advance is actually a single multinomial experiment. equivalent toThe chi-square test of independence with (r-1)(c-1) df is equivalent to a test of the equality of c (or r) multinomial populations. No VaccineOne ShotTwo ShotsTotal Flur1r1 No Flur2r2 Total300 n = 900 Three binomial populations—no vaccine, one shot and two shots. Is the probability of getting the flu independent of the type of flu prevention used? Three binomial populations—no vaccine, one shot and two shots. Is the probability of getting the flu independent of the type of flu prevention used? Comparing Multinomial Populations

Example Random samples of 200 voters in each of four wards were surveyed and asked if they favor candidate A in a local election. Ward 1234Total Favor A Do not favor A Total Do the data present sufficient evidence to indicate that the the fraction of voters favoring candidate A differs in the four wards? H 0 : fraction favoring A is independent of ward H 1 : fraction favoring A depends on the ward H 0 : fraction favoring A is independent of ward H 1 : fraction favoring A depends on the ward H 0 : p 1 = p 2 = p 3 = p 4 where p i = fraction favoring A in each of the four wards H 0 : p 1 = p 2 = p 3 = p 4 where p i = fraction favoring A in each of the four wards

Calculate the expected cell counts. For example: Reject H 0. There is sufficient evidence to indicate that the fraction of voters favoring A varies from ward to ward. Example - Solution

Since we know that there are differences among the four wards, what are the nature of the differences? Look at the proportions in favor of candidate A in the four wards. Ward1234 Favor A76/200=.3853/200 =.2759/200 =.3048/200 =.24 Candidate A is doing best in the first ward, and worst in the fourth ward. More importantly, he does not have a majority of the vote in any of the wards! Example - Solution

Equivalent Statistical Tests binomial experiment. A multinomial experiment with two categories is a binomial experiment. two binomial experiments The data from two binomial experiments can be displayed as a two-way classification. There are statistical tests for these two situations based on the statistic of Chapter 9:

Assumptions When you calculate the expected cell counts, if you find that one or more is less than five, these options are available to you: 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic X 2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption. 1.Choose a larger sample size n. The larger the sample size, the closer the chi-square distribution will approximate the distribution of your test statistic X 2. 2.It may be possible to combine one or more of the cells with small expected cell counts, thereby satisfying the assumption. m m m m m m