# © 2011 Pearson Education, Inc


Chapter 9 Categorical Data Analysis

Contents
9.1 Categorical Data and the Multinomial Experiment
9.2 Testing Category Probabilities: One-Way Table
9.3 Testing Category Probabilities: Two-Way Contingency Table
9.4 A Word of Caution about Chi-Square Tests

As a result of this class, you will be able to ...

Learning Objectives
- Discuss qualitative (i.e., categorical) data with more than two outcomes
- Present a chi-square hypothesis test for comparing the category proportions associated with a single qualitative variable, called a one-way analysis
- Present a chi-square hypothesis test for relating two qualitative variables, called a two-way analysis

9.1 Categorical Data and the Multinomial Experiment

Qualitative Data
Qualitative random variables yield responses that can be classified into categories. Example: gender (male, female).
Qualitative data that fall into more than two categories often result from a multinomial experiment.

Properties of the Multinomial Experiment
1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial. These outcomes are called classes, categories, or cells.
3. The probabilities of the k outcomes, denoted by p1, p2, …, pk, remain the same from trial to trial, where p1 + p2 + … + pk = 1.
4. The trials are independent.
5. The random variables of interest are the cell counts, n1, n2, …, nk, of the number of observations that fall in each of the k classes.
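The five properties can be mimicked with a short simulation. This is a minimal Python sketch rather than anything from the slides; the helper name `simulate_multinomial`, the category labels, and the probabilities .2, .3, .5 are hypothetical choices for illustration.

```python
import random
from collections import Counter


def simulate_multinomial(n, categories, probs, seed=None):
    """Run n identical, independent trials (hypothetical helper).

    Each trial falls into one of the k categories with the same
    probabilities p1..pk on every trial; the result is the list of
    cell counts n1..nk.
    """
    if len(categories) != len(probs):
        raise ValueError("need one probability per category")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    # n independent draws with fixed weights (properties 1 to 4).
    outcomes = rng.choices(categories, weights=probs, k=n)
    tally = Counter(outcomes)
    # Cell counts, including any empty cells (property 5).
    return [tally[c] for c in categories]


counts = simulate_multinomial(100, ["A", "B", "C"], [0.2, 0.3, 0.5], seed=1)
print(counts, sum(counts))  # the counts depend on the seed, but always sum to 100
```

Rerunning with a different seed changes the individual cell counts, while the number of cells (k) and their total (n) stay fixed, which is exactly the structure the one-way table summarizes.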

9.2 Testing Category Probabilities: One-Way Table

Multinomial Experiment
In this section, we consider a multinomial experiment with k outcomes that correspond to categories of a single qualitative variable. The results of such an experiment are summarized in a one-way table. The term one-way is used because only one variable is classified. Typically, we want to make inferences about the true proportions that occur in the k categories based on the sample information in the one-way table.

Chi-Square (χ²) Test for k Proportions
- Tests only hypotheses of equality (=) between each proportion and a specified value. Example: p1 = .2, p2 = .3, p3 = .5
- One variable with several levels
- Uses a one-way contingency table

One-Way Contingency Table
Shows the number of observations in k independent groups (outcomes or variable levels). Here there are k = 3 outcomes, one per candidate.

| Candidate | Tom | Bill | Mary | Total |
|---|---|---|---|---|
| Number of responses | 35 | 20 | 45 | 100 |

A Test of a Hypothesis about Multinomial Probabilities: One-Way Table
H0: $p_1 = p_{1,0}, p_2 = p_{2,0}, \ldots, p_k = p_{k,0}$, where $p_{1,0}, p_{2,0}, \ldots, p_{k,0}$ represent the hypothesized values of the multinomial probabilities.
Ha: At least one of the multinomial probabilities does not equal its hypothesized value.

A Test of a Hypothesis about Multinomial Probabilities: One-Way Table
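Test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - E_i)^2}{E_i}$$
where $E_i = n p_{i,0}$ is the expected cell count, that is, the expected number of outcomes of type $i$ assuming that H0 is true. The total sample size is $n$.
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has $(k - 1)$ df.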
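The test statistic translates directly into code. Below is a minimal Python sketch (the function name `chi_square_gof` is hypothetical), checked against the performance-evaluation example in this section, where 63, 45, and 72 of 180 employees rated Methods 1 to 3 as fair and H0 sets each probability to 1/3.

```python
def chi_square_gof(observed, hypothesized_probs):
    """Chi-square statistic for H0: p_i = p_{i,0}, i = 1..k.

    Returns (chi2, expected, df), where expected[i] = n * p_{i,0}
    and df = k - 1.
    """
    if len(observed) != len(hypothesized_probs):
        raise ValueError("need one hypothesized probability per cell")
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]  # E_i = n p_{i,0}
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1


stat, expected, df = chi_square_gof([63, 45, 72], [1 / 3, 1 / 3, 1 / 3])
print(round(stat, 2), df)  # 6.3 2
```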

Conditions Required for a Valid Test: One-way Table
1. A multinomial experiment has been conducted. This is generally satisfied by taking a random sample from the population of interest.
2. The sample size n is large. This is satisfied if, for every cell, the expected cell count Ei is 5 or more.
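The large-sample condition is easy to check before running the test. This is a minimal sketch; the helper name `expected_counts_ok` and the small hypothetical sample of n = 12 are assumptions for illustration only.

```python
def expected_counts_ok(n, hypothesized_probs, minimum=5.0):
    """Return (ok, expected): ok is True when every E_i = n * p_{i,0} >= minimum."""
    expected = [n * p for p in hypothesized_probs]
    return all(e >= minimum for e in expected), expected


ok_large, _ = expected_counts_ok(180, [1 / 3, 1 / 3, 1 / 3])  # every E_i is 60
ok_small, small = expected_counts_ok(12, [0.2, 0.3, 0.5])     # E is 2.4, 3.6, 6.0
print(ok_large, ok_small)  # True False
```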

χ² Test Basic Idea
- Compares the observed count to the expected count, assuming the null hypothesis is true
- The closer the observed counts are to the expected counts, the less evidence there is against H0
- The discrepancy is measured by the squared difference relative to the expected count
- Reject H0 for large values of χ²

Finding Critical Value Example
What is the critical χ² value if k = 3 and α = .05?

χ² Table (Portion), upper-tail areas:

| DF | .995 | .95 | .05 |
|---|---|---|---|
| 1 | ... | 0.004 | 3.841 |
| 2 | 0.010 | 0.103 | 5.991 |

With df = k − 1 = 2 and α = .05, the critical value is $\chi^2_{.05} = 5.991$: reject H0 if χ² > 5.991, and do not reject H0 otherwise. If every $n_i = E(n_i)$, then χ² = 0.
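For df = 1 and df = 2, the chi-square upper-tail area has a closed form, so the table entries above can be reproduced without a table. This is a minimal standard-library sketch that assumes only those two df values are needed. The helper names are hypothetical. For general df, the usual route would be a statistics library, for example SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) for a chi-square variable.

    Closed forms exist for df = 1 and df = 2, the only cases this sketch handles.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("sketch covers df = 1 and df = 2 only")


def chi2_critical(alpha, df, hi=100.0, tol=1e-10):
    """Find c with P(X > c) = alpha by bisection (the tail area decreases in x)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail area still too large: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 2), 3))  # 5.991
print(round(chi2_critical(0.05, 1), 3))  # 3.841
```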

χ² Test for k Proportions Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?
To check the sample-size assumption, compute the expected counts under H0: each Ei = 180(1/3) = 60, which is at least 5 in every cell.

χ² Test for k Proportions Solution
- H0: p1 = p2 = p3 = 1/3
- Ha: At least 1 is different
- α = .05
- n1 = 63, n2 = 45, n3 = 72
- Critical value: $\chi^2_{.05} = 5.991$ (df = 2); reject H0 if χ² > 5.991

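χ² Test for k Proportions Solution
Under H0, each $E_i = 180(1/3) = 60$, so
$$\chi^2 = \frac{(63 - 60)^2}{60} + \frac{(45 - 60)^2}{60} + \frac{(72 - 60)^2}{60} = \frac{9 + 225 + 144}{60} = 6.3$$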

χ² Test for k Proportions Solution
- Test statistic: χ² = 6.3
- Decision: Reject H0 at α = .05, since 6.3 > 5.991
- Conclusion: There is evidence of a difference in proportions
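The same decision can be reached through a p-value. For df = 2, the upper-tail area of the chi-square distribution is exactly exp(−x/2), so the p-value needs only the standard library. Here is a minimal sketch of that calculation for the performance-evaluation data.

```python
import math

observed = [63, 45, 72]              # employees rating Methods 1, 2, 3 as fair
n = sum(observed)                    # 180
expected = [n / 3] * 3               # H0: p1 = p2 = p3 = 1/3, so each E_i = 60
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # k - 1 = 2

# Upper-tail area for df = 2: P(X > x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)
alpha = 0.05

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")  # chi2 = 6.30, df = 2, p-value = 0.0429
print("Reject H0" if p_value < alpha else "Do not reject H0")    # Reject H0
```

Because p ≈ .043 < .05, the conclusion agrees with the critical-value comparison 6.3 > 5.991.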

9.3 Testing Category Probabilities: Two-Way (Contingency) Table

χ² Test of Independence
- Shows whether a relationship exists between two qualitative variables
- One sample is drawn
- Does not show causality
- Uses a two-way contingency table

χ² Test of Independence Contingency Table
Shows the number of observations from one sample classified jointly by two qualitative variables: the levels of variable 1 label the rows and the levels of variable 2 label the columns.

Finding Expected Cell Counts for a Two-Way Contingency Table
The estimate of the expected number of observations falling into the cell in row i and column j is given by
$$\hat{E}_{ij} = \frac{R_i C_j}{n}$$
where $R_i$ = total for row $i$, $C_j$ = total for column $j$, and $n$ = sample size.

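General Form of a Contingency Table Analysis: χ²-Test for Independence
H0: The two classifications are independent.
Ha: The two classifications are dependent.
Test statistic:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}}$$
where $\hat{E}_{ij} = R_i C_j / n$.
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has $(r - 1)(c - 1)$ df.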
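The general form above can be written as one function for any r × c table. This is a minimal Python sketch; the function name `chi_square_independence` is hypothetical. It is checked here against the house-style by location data used in this section.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (chi2, df, expected) with expected[i][j] = R_i * C_j / n and
    df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]        # R_i
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# House style (rows: split-level, ranch) by location (columns: urban, rural).
house = [[63, 49], [15, 33]]
chi2, df, expected = chi_square_independence(house)
print(round(chi2, 2), df)  # 8.41 1
```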

Conditions Required for a Valid 2-Test: Contingency Table
A multinomial experiment has been conducted . We may then consider this to be a multinomial experiment with r  c possible outcomes. The sample size n is large. This is satisfied if for every cell, the expected count Ei will be equal to 5 or more. © 2011 Pearson Education, Inc

χ² Test of Independence Expected Counts
- Statistical independence means the joint probability equals the product of the marginal probabilities
- Compute the marginal probabilities and multiply them to get the joint probability
- The expected count is the sample size times the joint probability:
$$\text{Expected count} = n \cdot \frac{\text{Row total}}{n} \cdot \frac{\text{Column total}}{n} = \frac{(\text{Row total})(\text{Column total})}{n}$$
where $n$ is the sample size.
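The marginal-probability route and the shortcut (Row total)(Column total)/n give the same number. This is a minimal sketch; the function name `expected_count` is hypothetical, and the totals come from the house-style example in this section.

```python
def expected_count(row_total, col_total, n):
    """Expected cell count under independence, built from marginal probabilities."""
    p_row = row_total / n      # marginal probability of the row category
    p_col = col_total / n      # marginal probability of the column category
    p_joint = p_row * p_col    # joint probability if the variables are independent
    return n * p_joint         # sample size times joint probability


# Split-level (row total 112), urban (column total 78), n = 160.
e = expected_count(112, 78, 160)
print(round(e, 1))               # 54.6
print(round(112 * 78 / 160, 1))  # 54.6, the Row total x Column total / n shortcut
```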

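Expected Count Example

| House Style | Urban (Obs.) | Rural (Obs.) | Total |
|---|---|---|---|
| Split-Level | 63 | 49 | 112 |
| Ranch | 15 | 33 | 48 |
| Total | 78 | 82 | 160 |

For the Split-Level, Urban cell:
- Marginal probability of Split-Level = 112/160
- Marginal probability of Urban = 78/160
- Joint probability (under independence) = (112/160)(78/160)
- Expected count = 160 · (112/160)(78/160) = 54.6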

Expected Count Calculation
| House Style | Urban Obs. | Urban Exp. | Rural Obs. | Rural Exp. | Total |
|---|---|---|---|---|---|
| Split-Level | 63 | 112·78/160 = 54.6 | 49 | 112·82/160 = 57.4 | 112 |
| Ranch | 15 | 48·78/160 = 23.4 | 33 | 48·82/160 = 24.6 | 48 |
| Total | 78 | 78 | 82 | 82 | 160 |

χ² Test of Independence Example
As a realtor, you want to determine whether house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

χ² Test of Independence Solution
- H0: No relationship
- Ha: Relationship
- α = .05
- df = (2 − 1)(2 − 1) = 1
- Critical value: $\chi^2_{.05} = 3.841$; reject H0 if χ² > 3.841

χ² Test of Independence Solution
$\hat{E}_{ij} \ge 5$ in all cells: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6

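χ² Test of Independence Solution
$$\chi^2 = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} \approx 8.41$$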

χ² Test of Independence Solution
- Test statistic: χ² = 8.41
- Decision: Reject H0 at α = .05, since 8.41 > 3.841
- Conclusion: There is evidence of a relationship
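Both decision rules can be checked in a few lines. For df = 1, the chi-square upper-tail area equals erfc(√(x/2)), so the standard library's `math.erfc` gives the p-value directly. This is a minimal sketch for the house-style data; the 3.841 critical value is the table entry quoted above.

```python
import math

observed = [[63, 49], [15, 33]]          # rows: split-level, ranch; columns: urban, rural
rows = [sum(r) for r in observed]        # 112, 48
cols = [sum(c) for c in zip(*observed)]  # 78, 82
n = sum(rows)                            # 160

chi2 = 0.0
for i, r in enumerate(rows):
    for j, c in enumerate(cols):
        e = r * c / n                    # expected count under independence
        chi2 += (observed[i][j] - e) ** 2 / e

df = (len(rows) - 1) * (len(cols) - 1)   # (2 - 1)(2 - 1) = 1
critical = 3.841                         # table value for df = 1, alpha = .05
# Upper-tail area for df = 1: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}")   # chi2 = 8.41, df = 1
print("Reject H0" if chi2 > critical else "Do not reject H0")  # Reject H0
print(p_value < 0.05)                    # True
```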

χ² Test of Independence Thinking Challenge
You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level of significance, is there evidence of a relationship?

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 84 | 32 | 116 |
| Yes | 48 | 122 | 170 |
| Total | 132 | 154 | 286 |

χ² Test of Independence Solution
- H0: No relationship
- Ha: Relationship
- α = .05
- df = (2 − 1)(2 − 1) = 1
- Critical value: $\chi^2_{.05} = 3.841$; reject H0 if χ² > 3.841

χ² Test of Independence Solution
$\hat{E}_{ij} \ge 5$ in all cells: 116·132/286 ≈ 53.5, 116·154/286 ≈ 62.5, 170·132/286 ≈ 78.5, 170·154/286 ≈ 91.5

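χ² Test of Independence Solution
$$\chi^2 = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} \approx 54.29$$
With unrounded expected counts the statistic is χ² ≈ 54.15; either value is far above the critical value 3.841.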

χ² Test of Independence Solution
- Test statistic: χ² = 54.29
- Decision: Reject H0 at α = .05, since 54.29 > 3.841
- Conclusion: There is evidence of a relationship
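The reported statistic depends slightly on whether the expected counts are rounded before squaring. The following minimal sketch computes both versions. The row/column orientation of the table is an assumption, but the χ² statistic is the same either way, because transposing a table does not change it.

```python
# Assumed layout: rows = Diet Coke (no, yes), columns = Diet Pepsi (no, yes).
table = [[84, 32], [48, 122]]
rows = [sum(r) for r in table]        # 116, 170
cols = [sum(c) for c in zip(*table)]  # 132, 154
n = sum(rows)                         # 286


def chi2_stat(expected):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    return sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )


exact = [[r * c / n for c in cols] for r in rows]
rounded = [[round(e, 1) for e in row] for row in exact]  # 53.5, 62.5, 78.5, 91.5

print(round(chi2_stat(exact), 2))    # 54.15
print(round(chi2_stat(rounded), 2))  # 54.29
```

The gap is small and does not affect the decision, since both values are far above 3.841.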

χ² Test of Independence Thinking Challenge 2
There is a statistically significant relationship between purchasing Diet Coke and Diet Pepsi. So what do you think the relationship is? Aren’t they competitors?

You Re-Analyze the Data

High Income:

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 4 | 30 | 34 |
| Yes | 40 | 2 | 42 |
| Total | 44 | 32 | 76 |

Low Income:

| Diet Coke | Diet Pepsi: No | Diet Pepsi: Yes | Total |
|---|---|---|---|
| No | 80 | 2 | 82 |
| Yes | 8 | 120 | 128 |
| Total | 88 | 122 | 210 |

There is a spurious relationship between purchasing Diet Coke and Diet Pepsi. Income is an intervening or control variable and is the true cause. The analysis here uses only descriptive statistics.
- For low income, consumers are price conscious. Either they can’t afford to buy either brand, or they buy whatever is on sale.
- For high income, consumers buy according to preference, regardless of price.
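The re-analysis can be reproduced with a few lines of descriptive arithmetic. This minimal sketch assumes the same row/column layout as the income tables above; the helper name `share_buying_pepsi` is hypothetical. It confirms that the two income tables add up to the original 286-consumer table, and it compares, within each table, the share of Diet Pepsi buyers among consumers who do and do not buy Diet Coke.

```python
# Assumed layout: rows = Diet Coke (no, yes), columns = Diet Pepsi (no, yes).
high_income = [[4, 30], [40, 2]]
low_income = [[80, 2], [8, 120]]

# Adding the two strata cell by cell recovers the original table.
pooled = [
    [h + l for h, l in zip(h_row, l_row)]
    for h_row, l_row in zip(high_income, low_income)
]


def share_buying_pepsi(table):
    """Within each row (Diet Coke no / yes), the proportion who buy Diet Pepsi."""
    return [row[1] / sum(row) for row in table]


for name, t in [("high", high_income), ("low", low_income), ("pooled", pooled)]:
    no_coke, coke = share_buying_pepsi(t)
    print(f"{name:>6}: {no_coke:.3f} (no Diet Coke) vs {coke:.3f} (Diet Coke)")
```

In the pooled and low-income tables, Diet Pepsi buying is more common among Diet Coke buyers. Within the high-income table the direction reverses, and Diet Coke buyers rarely buy Diet Pepsi. This is why the pooled association alone can mislead.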

Control or Intervening Variable (True Cause)
[Figure: True Relationships. Diet Coke and Diet Pepsi are joined by an apparent relation; a control or intervening variable (the true cause) has an underlying causal relation to each.]

Moral of the Story
Numbers don’t think; people do!

9.4 A Word of Caution about Chi-Square Tests

The 2 is one of the most widely applied statistical tools and also one of the most abused statistical tool. Be certain the experiment satisfies the assumptions. Be certain the sample is drawn from the correct population. Avoid using when the expected counts are very small. © 2011 Pearson Education, Inc

If the 2 value does not exceed the established critical value of 2 , do not accept the hypothesis of independence. You risk a Type II error. Avoid concluding that two classifications are independent, even when 2 is small. If a contingency table 2 value does exceed the critical value, we must be careful to avoid inferring that a causal relationship exists between the classifications. The existence of a causal relationship cannot be established by a contingency table analysis. © 2011 Pearson Education, Inc

Key Ideas
Multinomial Data: qualitative data that fall into more than two categories (or classes)

Key Ideas
Properties of a Multinomial Experiment
1. n identical trials
2. k possible outcomes
3. The probabilities of the k outcomes (p1, p2, …, pk) remain the same from trial to trial, where p1 + p2 + … + pk = 1
4. Trials are independent
5. Variables of interest: cell counts (i.e., the number of observations falling into each outcome category), denoted n1, n2, …, nk

Key Ideas
One-Way Table: summary table for a single qualitative variable
Two-Way (Contingency) Table: summary table for two qualitative variables