Discrete (Categorical) Data Analysis


TOPIC 10: Discrete (Categorical) Data Analysis

Discrete Random Variables
Recall that a discrete random variable can take only discrete values. For example:
- the number of errors in a software product: 0, 1, 2, 3, 4, …
- a product's quality level: high, medium, or low
- the cause of a machine breakdown: mechanical failure, electrical failure, or operator misuse

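Sample Proportions
Recall that the success probability $p$ can be estimated by the sample proportion $\hat{p} = X/n$, where $X$ is the number of successes in a sample of size $n$. For large enough values of $n$, the sample proportion has approximately a normal distribution:
$\hat{p} \approx N\left(p, \frac{p(1-p)}{n}\right)$
In terms of the standard normal distribution,
$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \approx N(0, 1)$, where $\sqrt{p(1-p)/n}$ is the standard error of $\hat{p}$.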
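A quick simulation makes the large-sample claim concrete. This is an illustration only: it assumes a Bernoulli population with $p = 0.175$ and samples of size $n = 200$ (values borrowed from the newspaper example below), draws 5,000 samples with Python's `random` module, and compares the spread of the simulated $\hat{p}$ values with the standard error $\sqrt{p(1-p)/n}$. The helper name `simulate_phat` is hypothetical.

```python
import math
import random
import statistics


def simulate_phat(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n from a Bernoulli(p) population and
    return the sample proportion p_hat = X / n of each sample."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.175, 200  # illustrative values, taken from the newspaper example
phats = simulate_phat(p, n, reps=5000)
theoretical_se = math.sqrt(p * (1 - p) / n)  # standard error sqrt(p(1-p)/n)

print(f"mean of simulated p_hat: {statistics.fmean(phats):.4f}  (p = {p})")
print(f"sd of simulated p_hat:   {statistics.stdev(phats):.4f}  (SE = {theoretical_se:.4f})")
```

The mean of the simulated proportions should sit close to $p$, and their standard deviation close to the standard error of about 0.0269.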

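Confidence Interval Estimation for p
Since $p$ is unknown, we replace it with its estimate $\hat{p}$ in the standard error. The $(1-\alpha)100\%$ confidence interval for $p$ is
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Assumptions: a random sample, with $n\hat{p}$ and $n(1-\hat{p})$ large enough for the normal approximation to hold.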
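The interval above translates directly into code. A minimal sketch, assuming Python 3.9+ and using `statistics.NormalDist` to compute $z_{\alpha/2}$ instead of a printed z table; `proportion_ci` is a hypothetical helper name, and the cutoff of 5 successes and failures used as a sanity check is an assumption, not a rule stated on the slide.

```python
import math
from statistics import NormalDist


def proportion_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample confidence interval for a population proportion p,
    using p_hat = x / n in the standard error."""
    # Rough large-sample check: n*p_hat = x and n*(1 - p_hat) = n - x (cutoff 5 assumed)
    if min(x, n - x) < 5:
        raise ValueError("n*p_hat or n*(1 - p_hat) too small for the normal approximation")
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


low, high = proportion_ci(35, 200, confidence=0.90)
print(f"90% CI for p: ({low:.4f}, {high:.4f})")
```

For the newspaper example (35 defective out of 200) this gives roughly (0.1308, 0.2192).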

Example
You’re a production manager for a newspaper. You want to find the percentage of defective newspapers. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?

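Example Solution
$\hat{p} = 35/200 = 0.175$ and $z_{0.05} = 1.645$:
$0.175 \pm 1.645\sqrt{\frac{0.175(0.825)}{200}} = 0.175 \pm 0.0442$, so $0.1308 \le p \le 0.2192$.
We are 90% confident that between 13.08% and 21.92% of the newspapers are defective.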

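Sample Size for Estimating p
Sampling too much wastes resources; sampling too little gives an imprecise estimate. The sample size needed to estimate $p$ with $(1-\alpha)100\%$ confidence to within a sampling error $SE$ is
$n = \frac{z_{\alpha/2}^2\, p(1-p)}{SE^2}$
rounded up to the next whole number. If no estimate of $p$ is available, use $p = 1 - p = 0.5$, which gives the largest (most conservative) sample size.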
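A sketch of the sample-size formula, assuming Python 3.9+; `sample_size_for_proportion` is a hypothetical name. It computes $z_{\alpha/2}$ exactly with `NormalDist` rather than reading it from a table, so a hand calculation with a rounded z value can occasionally differ by one.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(sampling_error: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """n = z_{alpha/2}^2 * p(1 - p) / SE^2, rounded up to the next integer.

    p defaults to 0.5, the conservative choice when no estimate of p is available.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    return math.ceil(z**2 * p * (1 - p) / sampling_error**2)


print(sample_size_for_proportion(0.05))  # 95% confidence, p = 0.5
```

With the defaults (95% confidence, $p = 0.5$), a sampling error of 0.05 requires a sample of 385.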

Example
What sample size is needed to estimate p with 90% confidence and a width L of .03?

Exercises
1. In an election poll, a random sample of 500 people showed that 42 preferred a particular candidate. Set up a 90% confidence interval estimate for the population proportion p of voters who prefer this candidate.
2. Suppose that the auditing procedures require you to have 95% confidence in estimating the population proportion of sales invoices with errors to within ± 0.07. Results from past months indicate that the largest proportion has been no more than 0.15. Find the sample size needed to satisfy the company's requirements.

Z Test of Hypothesis for the Proportion
One-sample Z test for the proportion:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$
where $p_0$ is the hypothesized proportion of successes in the population, $\hat{p} = X/n$ is the sample proportion of successes, $X$ is the number of items having the characteristic of interest, and $n$ is the sample size.
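The test statistic can be sketched as below (Python 3.9+). The slides decide with critical values; the p-value returned here is an addition for convenience, and the function name and the `alternative` argument are hypothetical choices for this example.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x: int, n: int, p0: float, alternative: str = "two-sided") -> tuple[float, float]:
    """One-sample Z test for a proportion; returns (Z, p-value).

    The standard error uses the hypothesized p0, not p_hat.
    alternative: "two-sided" (Ha: p != p0), "less" (Ha: p < p0) or "greater" (Ha: p > p0).
    """
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p_value


z, p_value = one_proportion_z_test(25, 500, 0.04)
print(f"Z = {z:.2f}, p-value = {p_value:.3f}")
```

For the audit example (25 errors in 500 transactions, $p_0 = 0.04$) it returns $Z \approx 1.14$, which lies inside the critical values ±1.96.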

Example
You’re an accounting manager. A year-end audit showed that 4% of transactions had errors. You implement new procedures. A random sample of 500 transactions had 25 errors. Has the proportion of incorrect transactions changed at the .05 level of significance?

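Example Solution
$H_0: p = 0.04$; $H_a: p \ne 0.04$
$\alpha = 0.05$ ($\alpha/2 = 0.025$ in each tail); $n = 500$
Critical values: $\pm 1.96$ (reject $H_0$ if $Z < -1.96$ or $Z > 1.96$)
Test statistic: with $\hat{p} = 25/500 = 0.05$, $Z = \frac{0.05 - 0.04}{\sqrt{0.04(0.96)/500}} \approx 1.14$
Decision: Do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is no evidence that the proportion of incorrect transactions has changed from 4%.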

Exercise
A fast-food chain has developed a new process to ensure that orders at the drive-through are filled correctly. The previous process filled orders correctly 85% of the time. Based on a sample of 100 orders using the new process, 94 were filled correctly. At a 0.01 level of significance, can you conclude that the new process has increased the proportion of orders filled correctly?

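Large-Sample Inference about p1 – p2
Assumptions: independent, random samples. The normal approximation can be used if $n_1p_1$, $n_1q_1$, $n_2p_2$, and $n_2q_2$ are all sufficiently large, where $q_i = 1 - p_i$; in practice these quantities are checked using the sample proportions.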

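Large-Sample Inference about p1 – p2
$(1-\alpha)100\%$ confidence interval for $(p_1 - p_2)$:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$
where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$.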
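A sketch of the interval for $p_1 - p_2$ (Python 3.9+). It uses the unpooled standard error from the formula above; `two_proportion_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1: int, n1: int, x2: int, n2: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = two_proportion_ci(63, 78, 49, 82, confidence=0.99)
print(f"99% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

For the performance-evaluation data (63 of 78 vs. 49 of 82) at 99% confidence, it gives approximately (0.029, 0.391).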

Example
As personnel director, you want to compare the perceived fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair; 49 of 82 rated Method 2 as fair. Find a 99% confidence interval for the difference in perceptions.

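Example Solution
To check the assumptions, use the sample proportions as estimators of the population proportions:
$n_1\hat{p}_1 = 78(63/78) = 63$, $n_1\hat{q}_1 = 78(1 - 63/78) = 15$, $n_2\hat{p}_2 = 82(49/82) = 49$, $n_2\hat{q}_2 = 82(1 - 49/82) = 33$
With $\hat{p}_1 = 0.8077$, $\hat{p}_2 = 0.5976$, and $z_{0.005} = 2.576$:
$(0.8077 - 0.5976) \pm 2.576\sqrt{\frac{0.8077(0.1923)}{78} + \frac{0.5976(0.4024)}{82}} = 0.210 \pm 0.181$, so $0.029 \le p_1 - p_2 \le 0.391$.
Since the interval does not contain 0, the data indicate a difference in perceived fairness.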

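Hypothesis Testing for Two Proportions (Large-Sample Inference about p1 – p2)
Hypotheses:
- No difference vs. any difference: $H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 \ne 0$
- Pop 1 ≥ Pop 2 vs. Pop 1 < Pop 2: $H_0: p_1 - p_2 \ge 0$, $H_a: p_1 - p_2 < 0$
- Pop 1 ≤ Pop 2 vs. Pop 1 > Pop 2: $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$
Z test statistic:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - D_0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where $D_0$ is the hypothesized difference ($D_0 = 0$ in each case above) and $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion. The rejection region is found in the same way as in the one-sample tests.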
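A sketch of the test of $H_0: p_1 - p_2 = 0$ with the pooled proportion (Python 3.9+; the function name is hypothetical). The pooled standard error applies only when the hypothesized difference is zero.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for H0: p1 - p2 = 0, using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


alpha = 0.01
z = two_proportion_z_test(63, 78, 49, 82)
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test: alpha/2 in each tail
print(f"Z = {z:.2f}, critical values = ±{critical:.3f}, reject H0: {abs(z) > critical}")
```

On the performance-evaluation data it gives $Z \approx 2.90$ against critical values of about ±2.576, so $H_0$ is rejected at $\alpha = .01$.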

Example
As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair; 49 of 82 rated Method 2 as fair. At the .01 level of significance, is there a difference in perceptions?

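Example Solution
Pooled sample proportion: $\hat{p} = \frac{63 + 49}{78 + 82} = \frac{112}{160} = 0.70$
$Z = \frac{0.8077 - 0.5976}{\sqrt{0.70(0.30)\left(\frac{1}{78} + \frac{1}{82}\right)}} = \frac{0.2101}{0.0725} \approx 2.90$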

Example Solution
$H_0: p_1 - p_2 = 0$; $H_a: p_1 - p_2 \ne 0$
$\alpha = 0.01$; $n_1 = 78$, $n_2 = 82$
Critical values: $\pm 2.58$ (area 0.005 in each tail)
Test statistic: $Z = +2.90$
Decision: Reject $H_0$ at $\alpha = 0.01$, since $2.90 > 2.58$.
Conclusion: There is evidence of a difference in proportions.

Chi-Square Tests for k Proportions
This topic extends hypothesis testing to analyze differences among population proportions based on two or more samples. Qualitative data that fall into more than two categories often result from a multinomial experiment. Some of the characteristics of the multinomial experiment are:
- The probabilities of the k outcomes, denoted $p_1, p_2, \dots, p_k$, remain the same from trial to trial, where $p_1 + p_2 + \dots + p_k = 1$.
- The trials are independent.
Recall that a binomial experiment is a multinomial experiment with k = 2.

Chi-Square (χ²) Tests
Claim about the populations: $p_1 = p_2 = p_3 = \dots = p_k$. Draw a sample, record the observed frequencies $x$, and compute the expected frequencies $e$ under the claim. The χ² test for equality of proportions then provides the evidence for deciding whether to reject the claim.

Road Map
Decision Making: One/Two Samples; Analysis of Variance; χ² Tests (One-Way Table, Two-Way Table)

Multinomial Experiment
- n identical and independent trials
- k possible outcomes on each trial
- constant outcome probabilities $p_1, \dots, p_k$
- the random variables are the counts $n_1, \dots, n_k$
Example: ask 100 people (n) which of 3 candidates (k) they will vote for.
Uses a one-way contingency table, which shows the number of observations in k independent groups (outcomes or variable levels).

One-Way Contingency Table
Number of responses by candidate (k = 3 outcomes):
Candidate    Tom   Bill   Mary   Total
Responses     35     20     45     100

χ² Test: Basic Idea
- Compares the observed frequency ($x_i$) with the expected frequency ($e_i$) computed assuming the null hypothesis is true.
- The closer the observed frequencies are to the expected frequencies, the less evidence there is against $H_0$.
- The discrepancy is measured by the squared differences relative to the expected frequencies; large values lead to rejection of $H_0$.
Assumptions: a multinomial experiment has been conducted, and the sample size n is large: $e_i \ge 5$ for every cell ($i = 1, 2, \dots, k$).

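χ² Test for k Proportions
1. Hypotheses: $H_0: p_1 = p_{1,0}, p_2 = p_{2,0}, \dots, p_k = p_{k,0}$, where $p_{i,0}$ are the hypothesized probabilities; $H_a$: at least one $p_i$ differs from its hypothesized value.
2. Test statistic: $\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$, where $x_i$ is the observed frequency and $e_i = n p_{i,0}$ is the expected frequency.
3. Degrees of freedom: $k - 1$, where k is the number of outcomes.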
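A sketch of the three steps in Python (`chi_square_gof` is a hypothetical name). It computes $e_i = n p_{i,0}$, checks the $e_i \ge 5$ assumption, and returns the statistic together with its $k - 1$ degrees of freedom.

```python
import math


def chi_square_gof(observed: list[int], hypothesized_probs: list[float]) -> tuple[float, int]:
    """Chi-square test statistic for H0: p_i = p_{i,0}, i = 1..k; returns (chi2, df)."""
    if len(observed) != len(hypothesized_probs):
        raise ValueError("need one hypothesized probability per outcome")
    if not math.isclose(sum(hypothesized_probs), 1.0):
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in hypothesized_probs]  # e_i = n * p_{i,0}
    if min(expected) < 5:
        raise ValueError("every expected frequency e_i should be at least 5")
    chi2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    return chi2, len(observed) - 1  # df = k - 1


stat, df = chi_square_gof([63, 45, 72], [1 / 3, 1 / 3, 1 / 3])
print(f"chi-square = {stat:.2f}, df = {df}")
```

For the three evaluation methods (63, 45, and 72 employees, with $p_{i,0} = 1/3$) it returns $\chi^2 = 6.3$ with 2 degrees of freedom.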

Finding the Critical Value: Example
What is the critical $\chi^2$ value if k = 3 and α = .05?
Degrees of freedom: df = k - 1 = 2. From the χ² table (portion), by upper-tail area:
df     .995     .95      .05
1      …        0.004    3.841
2      0.010    0.103    5.991
The critical value is 5.991: reject $H_0$ if $\chi^2 > 5.991$ (upper-tail area α = .05). If $x_i = e_i$ for every outcome, then $\chi^2 = 0$ and $H_0$ is not rejected.
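The slides read critical values from a printed χ² table. When no table is at hand, the value can be computed; the sketch below assumes only the Python standard library (no SciPy) and integer degrees of freedom. It evaluates the upper-tail area through a standard recurrence for the regularized incomplete gamma function and finds the critical value by bisection. The function names are hypothetical; with SciPy installed, `scipy.stats.chi2.ppf(1 - alpha, df)` gives the same critical value.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(chi-square with df degrees of freedom > x), integer df >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1) with h = x / 2,
    starting from Q(1, h) = exp(-h) (even df) or Q(1/2, h) = erfc(sqrt(h)) (odd df).
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        total, a = math.exp(-h), 1.0
    else:
        total, a = math.erfc(math.sqrt(h)), 0.5
    while a < df / 2:
        total += h**a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return total


def chi2_critical(alpha: float, df: int) -> float:
    """Critical value c with P(chi-square > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for df in (1, 2):
    print(f"df = {df}: critical value at alpha = .05 is {chi2_critical(0.05, df):.3f}")
```

This reproduces the table entries 3.841 (df = 1) and 5.991 (df = 2).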

χ² Test for k Proportions: Example
As personnel director, you want to test the perception of fairness of three methods of performance evaluation. Of 180 employees, 63 rated Method 1 as fair, 45 rated Method 2 as fair, and 72 rated Method 3 as fair. At the .05 level of significance, is there a difference in perceptions?
To check the assumptions: under $H_0$, each expected frequency is $e_i = 180(1/3) = 60 \ge 5$.

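χ² Test for k Proportions: Solution
Observed: $x_1 = 63$, $x_2 = 45$, $x_3 = 72$; expected: $e_1 = e_2 = e_3 = 180(1/3) = 60$
$\chi^2 = \frac{(63-60)^2}{60} + \frac{(45-60)^2}{60} + \frac{(72-60)^2}{60} = 0.15 + 3.75 + 2.40 = 6.3$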

χ² Test for k Proportions: Solution
$H_0: p_1 = p_2 = p_3 = 1/3$; $H_a$: at least one proportion is different
α = .05; observed counts 63, 45, 72
Critical value: 5.991 (df = 2)
Test statistic: $\chi^2 = 6.3$
Decision: Reject $H_0$ at α = .05, since 6.3 > 5.991.
Conclusion: There is evidence of a difference in proportions.

Road Map
Decision Making: One/Two Samples; Analysis of Variance; χ² Tests (One-Way Table; Two-Way Table: Test of Independence)

χ² Test of Independence
- Shows whether a relationship exists between two qualitative (categorical) variables.
- One sample is drawn.
- Does not show causality.
- Uses a two-way contingency table.
Assumptions: a multinomial experiment has been conducted, and the sample size n is large: $e_{ij} \ge 5$ for every cell.

Two-Way Contingency Table
Shows the number of observations from one sample classified jointly by two qualitative variables: the rows are the levels of variable 1 and the columns are the levels of variable 2.

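χ² Test of Independence
Hypotheses: $H_0$: the variables are independent; $H_a$: the variables are related (dependent).
Test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$, where $x_{ij}$ is the observed frequency and $e_{ij}$ is the expected frequency in row i, column j.
Degrees of freedom: $(r - 1)(c - 1)$, where r is the number of rows and c is the number of columns.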

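χ² Test of Independence: Expected Frequencies
Statistical independence means that a joint probability equals the product of the marginal probabilities. Compute the marginal probabilities and multiply them to get the joint probability; the expected frequency is the sample size times this joint probability:
$e_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \times C_j}{n}$
where $R_i$ is the total of row i, $C_j$ is the total of column j, and n is the sample size.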
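Combining the expected-frequency rule with the test statistic and degrees of freedom gives the sketch below (Python 3.9+; the function name is hypothetical). It accepts any r × c table as a list of rows of observed counts.

```python
def chi_square_independence(table: list[list[int]]) -> tuple[float, int, list[list[float]]]:
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (chi2, df, expected), with e_ij = R_i * C_j / n and df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Independence: joint probability = product of marginals, so e_ij = n * (R_i/n) * (C_j/n)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    if min(min(row) for row in expected) < 5:
        raise ValueError("every expected frequency e_ij should be at least 5")
    chi2 = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# House style (rows: split-level, ranch) by location (columns: urban, rural)
stat, df, expected = chi_square_independence([[63, 49], [15, 33]])
print(f"chi-square = {stat:.2f}, df = {df}")
```

For the house-style data it returns $\chi^2 \approx 8.41$ with 1 degree of freedom.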

Expected Frequency Example
Observed frequencies, house style by location:
                 Location
House Style    Urban   Rural   Total
Split-Level       63      49     112
Ranch             15      33      48
Total             78      82     160
Marginal probabilities: $P(\text{Split-Level}) = R_1/n = 112/160$ and $P(\text{Urban}) = C_1/n = 78/160$
Joint probability under independence: $(112/160)(78/160)$
Expected frequency: $e_{11} = 160 \times \frac{112}{160} \times \frac{78}{160} = 54.6$

Expected Frequency Calculation
$e_{ij} = \frac{R_i \times C_j}{n}$, where $R_i$ is the total frequency in row i and $C_j$ is the total frequency in column j.
                 Urban                      Rural
House Style    Obs.  Exp.                 Obs.  Exp.                 Total
Split-Level     63   112×78/160 = 54.6     49   112×82/160 = 57.4     112
Ranch           15   48×78/160 = 23.4      33   48×82/160 = 24.6       48
Total           78   78                    82   82                    160

Example
As a realtor, you want to determine whether house style and house location are related. At the .05 level of significance, is there evidence of a relationship?

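Example Solution
Expected frequencies:
$e_{11} = \frac{112 \times 78}{160} = 54.6$, $e_{12} = \frac{112 \times 82}{160} = 57.4$, $e_{21} = \frac{48 \times 78}{160} = 23.4$, $e_{22} = \frac{48 \times 82}{160} = 24.6$
$e_{ij} \ge 5$ in all cells, so the large-sample assumption holds.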

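Example Solution
Test statistic:
$\chi^2 = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} = 1.2923 + 1.2293 + 3.0154 + 2.8683 \approx 8.41$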

Example Solution
$H_0$: no relationship (house style and location are independent); $H_a$: relationship
α = .05; df = (2 - 1)(2 - 1) = 1
Critical value: 3.841
Test statistic: $\chi^2 = 8.41$
Decision: Reject $H_0$ at α = .05, since 8.41 > 3.841.
Conclusion: There is evidence of a relationship.

Exercise 1
You’re a marketing research analyst. You ask a random sample of 286 consumers whether they purchase Diet Pepsi and whether they purchase Diet Coke. At the .05 level of significance, is there evidence of a relationship?
                Diet Pepsi
Diet Coke     No    Yes   Total
No            84     32     116
Yes           48    122     170
Total        132    154     286

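Exercise 1 Solution
Expected frequencies (rounded to one decimal):
$e_{11} = \frac{116 \times 132}{286} = 53.5$, $e_{12} = \frac{116 \times 154}{286} = 62.5$, $e_{21} = \frac{170 \times 132}{286} = 78.5$, $e_{22} = \frac{170 \times 154}{286} = 91.5$
$e_{ij} \ge 5$ in all cells.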

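Exercise 1 Solution
Test statistic:
$\chi^2 = \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \frac{(48 - 78.5)^2}{78.5} + \frac{(122 - 91.5)^2}{91.5} \approx 54.29$
Using unrounded expected frequencies gives $\chi^2 \approx 54.15$; the conclusion is unchanged.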

Exercise 1 Solution
$H_0$: no relationship; $H_a$: relationship
α = .05; df = (2 - 1)(2 - 1) = 1
Critical value: 3.841
Test statistic: $\chi^2 = 54.29$
Decision: Reject $H_0$ at α = .05, since 54.29 > 3.841.
Conclusion: There is evidence of a relationship.

Exercise 2
There is a statistically significant relationship between purchasing Diet Coke and purchasing Diet Pepsi. What do you think the relationship is? Aren’t they competitors? (Same data as Exercise 1.)

You Re-Analyze the Data
High Income
                Diet Pepsi
Diet Coke     No    Yes   Total
No             4     30      34
Yes           40      2      42
Total         44     32      76

Low Income
                Diet Pepsi
Diet Coke     No    Yes   Total
No            80      2      82
Yes            8    120     128
Total         88    122     210

There is a spurious relationship between purchasing Diet Coke and purchasing Diet Pepsi. Income is an intervening (control) variable and is the true cause. The analysis here uses only descriptive statistics. Low-income consumers are price conscious: either they can’t afford to buy either brand, or they buy whichever is on sale. High-income consumers buy according to preference, regardless of price.
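The descriptive comparison can be made explicit by computing, within each table, the share of consumers who buy Diet Coke among those who do and do not buy Diet Pepsi. The sketch assumes the flattened slide tables read with rows = Diet Coke (No, Yes) and columns = Diet Pepsi (No, Yes); the helper name `coke_share_by_pepsi` is hypothetical.

```python
def coke_share_by_pepsi(table: list[list[int]]) -> tuple[float, float]:
    """For a 2x2 table with rows = Diet Coke (No, Yes) and columns = Diet Pepsi (No, Yes),
    return the share buying Diet Coke among (a) non-buyers and (b) buyers of Diet Pepsi."""
    (coke_no_pepsi_no, coke_no_pepsi_yes), (coke_yes_pepsi_no, coke_yes_pepsi_yes) = table
    share_if_no_pepsi = coke_yes_pepsi_no / (coke_no_pepsi_no + coke_yes_pepsi_no)
    share_if_pepsi = coke_yes_pepsi_yes / (coke_no_pepsi_yes + coke_yes_pepsi_yes)
    return share_if_no_pepsi, share_if_pepsi


high_income = [[4, 30], [40, 2]]
low_income = [[80, 2], [8, 120]]
# The combined table from Exercise 1 is the cell-by-cell sum of the two income groups.
combined = [[h + l for h, l in zip(h_row, l_row)] for h_row, l_row in zip(high_income, low_income)]

for label, table in [("High income", high_income), ("Low income", low_income), ("Combined", combined)]:
    no_pepsi, pepsi = coke_share_by_pepsi(table)
    print(f"{label:<12} P(Coke | no Pepsi) = {no_pepsi:.3f}   P(Coke | Pepsi) = {pepsi:.3f}")
```

Within each income group the association runs in opposite directions: in the high-income table, buying one brand goes with not buying the other, while in the low-income table consumers tend to buy both or neither. The combined table shows only the positive association.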

True Relationships*
Diagram: the apparent relation between buying Diet Coke and buying Diet Pepsi reflects an underlying causal relation with a control or intervening variable (income, the true cause) that affects both purchases.

Moral of the Story*
Numbers don’t think - people do!

Any Questions?