1 Contingency Tables: Tests for independence and homogeneity (§10.5) How to test hypotheses of independence (association) and homogeneity (similarity)

Presentation transcript:

1 Contingency Tables: Tests for independence and homogeneity (§10.5). How to test hypotheses of independence (association) and homogeneity (similarity) for general two-way cross classifications of count data. Terms: contingency table; cross-classification table; measure of association; independence in two-way tables; chi-square test for independence or homogeneity.

2 Test of Independence or Association. A university conducted a study of how students classify faculty teaching. A sample of 467 faculty is randomly selected, and each person is classified according to rank (Instructor, Assistant Professor, etc.) and teaching evaluation (Above, Average, Below). Each person therefore has two categorical responses, and the data can be formatted into a cross-tabulation or contingency table.

3 What are we interested in from this two-way classification table? Is the level of teaching evaluation related to rank? Are Professors more likely to be judged above average than other ranks? Two variables that have been categorized in a two-way table are independent if the probability that a measurement is classified into a given cell of the table is equal to the probability of being classified into that row times the probability of being classified into that column. This must be true for all cells of the table. $H_0$: Teaching Evaluation and Rank are independent variables.

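4 The independence assumption: $\pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for every cell $(i, j)$. Observed counts: $n_{ij}$. Expected counts under independence: $E_{ij} = n_{i\cdot}\,n_{\cdot j} / n$ (row total times column total, divided by the overall total). Here r = #rows = 3, c = #cols = 4, a 3 × 4 table. Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} (n_{ij} - E_{ij})^2 / E_{ij}$, with df = (r-1)(c-1) = 6.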

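5 Observed Counts
Evaluation   Instructor   Assistant   Associate   Professor   Total
Above                36          62          45          50     193
Average              48          50          35          43     176
Below                30          13          20          35      98
Total               114         125         100         128     467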

6 Expected Counts. Assumptions: no $E_{ij} < 1$, and no more than 20% of the $E_{ij} < 5$.
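The expected counts and the two rule-of-thumb assumptions can be checked with a few lines of code. The following is a minimal Python sketch, not the course's own tooling: the function names `expected_counts` and `check_assumptions` are illustrative, and the counts are taken from the SAS data step on slide 10 (rows are evaluations, columns are ranks in the order Instructor, Assistant, Associate, Professor).

```python
# Observed counts from the faculty evaluation example.
# Rows: evaluation (Above, Average, Below); columns: rank
# (Instructor, Assistant, Associate, Professor).
observed = [
    [36, 62, 45, 50],  # Above
    [48, 50, 35, 43],  # Average
    [30, 13, 20, 35],  # Below
]


def expected_counts(table):
    """Return E_ij = (row total * column total) / n for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_assumptions(expected):
    """Rule of thumb: no E_ij below 1, and at most 20% of the E_ij below 5."""
    cells = [e for row in expected for e in row]
    none_below_1 = all(e >= 1 for e in cells)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return none_below_1 and share_below_5 <= 0.20


expected = expected_counts(observed)
assumptions_ok = check_assumptions(expected)

for label, row in zip(["Above", "Average", "Below"], expected):
    print(label, [round(e, 1) for e in row])
print("Assumptions satisfied:", assumptions_ok)
```

The expected count for Assistant Professors rated Below is `expected[2][1]`, which is the cell discussed on the next slide.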

7 Individual Cell Chi-Square Values. Conclusion: reject $H_0$. There is evidence of an association between rank and evaluation. Note that we observed fewer Assistant Professors getting below-average evaluations (13) than we would expect under independence (26.2). The chi-square value for this cell is 6.67.
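The cell-by-cell contributions and the overall test can be sketched as follows. This is an illustrative Python version using only the standard library; `pearson_chi_square` and `chi2_sf_even_df` are hypothetical helper names, and the p-value helper relies on the closed form of the chi-square upper tail that holds only when the degrees of freedom are even (here df = 6).

```python
import math

# Rows: evaluation (Above, Average, Below); columns: rank
# (Instructor, Assistant, Associate, Professor).
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]


def pearson_chi_square(table):
    """Return (statistic, df, per-cell contributions) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = [
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]
    statistic = sum(sum(row) for row in contributions)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, contributions


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with EVEN df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here requires positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


statistic, df, contributions = pearson_chi_square(observed)
p_value = chi2_sf_even_df(statistic, df)

print("chi-square =", round(statistic, 2), "df =", df)
print("p-value =", round(p_value, 4))
print("Assistant/Below contribution =", round(contributions[2][1], 2))
```

For general (odd or even) degrees of freedom a statistics library would normally be used instead of the closed form above.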

8 Minitab. Input the data in three columns: rank, eval, count. Then choose STAT > TABLES > Cross Tabs. Classification Variables: rank eval. Check Chi-square Analysis, and Above and Std. residual. Frequencies are in: count.

9 Tabulated Statistics: eval, rank. Rows: eval; Columns: rank. Cell Contents: Count, Exp Freq, Std. Resid. The standardized residuals are the square roots of the individual chi-square values. The output also reports the chi-square statistic with DF = 6 and its P-Value.

10 SAS
options ls=79 ps=40 nocenter;
data eval;
  input job $ rating $ number;
  datalines;
Instructor Above 36
Instructor Average 48
Instructor Below 30
Assistant Above 62
Assistant Average 50
Assistant Below 13
Associate Above 45
Associate Average 35
Associate Below 20
Professor Above 50
Professor Average 43
Professor Below 35
;
run;
proc freq data=eval;
  weight number;
  table job*rating / chisq;
run;

Table of job by rating (cell frequencies)
job         Above   Average   Below   Total
Assistan       62        50      13     125
Associat       45        35      20     100
Instruct       36        48      30     114
Professo       50        43      35     128
Total         193       176      98     467

11 The FREQ Procedure. Statistics for Table of job by rating (columns: Statistic, DF, Value, Prob). Statistics reported: Chi-Square; Likelihood Ratio Chi-Square; Mantel-Haenszel Chi-Square; Phi Coefficient; Contingency Coefficient; Cramer's V. Sample Size = 467

12 SPSS. First you need to tell SPSS that each observation must be weighted by the cell count: DATA > WEIGHT CASES. Then you choose the analysis: ANALYZE > DESCRIPTIVE STATISTICS > CROSS TABS.

13

14 R
> score <- c(36,48,30,62,50,13,45,35,20,50,43,35)
> mscore <- matrix(score,3,4)
> mscore
     [,1] [,2] [,3] [,4]
[1,]   36   62   45   50
[2,]   48   50   35   43
[3,]   30   13   20   35
> chisq.test(mscore)
Pearson's Chi-squared test
data: mscore
X-squared = , df = 6, p-value =
> out <- chisq.test(mscore)
> out[1:length(out)]
$statistic
X-squared
$parameter
df
 6
$p.value
[1]

15 $method
[1] "Pearson's Chi-squared test"
$data.name
[1] "mscore"
$observed
     [,1] [,2] [,3] [,4]
[1,]
[2,]
[3,]
$expected
     [,1] [,2] [,3] [,4]
[1,]
[2,]
[3,]
$residuals (square roots of the individual chi-square values)
     [,1] [,2] [,3] [,4]
[1,]
[2,]
[3,]
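The residuals that R reports are the Pearson residuals, (observed − expected) / sqrt(expected), so each one squared gives that cell's chi-square value. A small Python sketch of the same calculation is shown here for comparison; `pearson_residuals` is an illustrative name, not a library function.

```python
import math

# Same layout as the R matrix mscore: rows are evaluations
# (Above, Average, Below), columns are ranks.
observed = [
    [36, 62, 45, 50],
    [48, 50, 35, 43],
    [30, 13, 20, 35],
]


def pearson_residuals(table):
    """Return (O_ij - E_ij) / sqrt(E_ij) for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) / math.sqrt(r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(table, row_totals)
    ]


residuals = pearson_residuals(observed)

# Squaring and summing the residuals recovers the chi-square statistic.
chi_square = sum(res ** 2 for row in residuals for res in row)

for row in residuals:
    print([round(res, 2) for res in row])
print("sum of squared residuals =", round(chi_square, 2))
```

The sign of a residual shows the direction of the departure: a negative value means fewer observations than expected under independence, as in the Assistant/Below cell.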

16 Test of Homogeneity. Suppose we wish to determine if there is an association between a rare disease and another more common categorical variable (e.g. smoking). We can’t just take a random sample of subjects and hope to get enough cases (subjects with the disease). One solution is to choose a fixed number of cases and a fixed number of controls, and classify each according to whether they are smokers or not. The same chi-square test of independence applies here, but since we are sampling within subpopulations (and so have fixed margin totals), this is now called a chi-square test of homogeneity (of distributions).

17 Homogeneity Null Hypothesis. In general, if the column categories represent c distinct subpopulations, random samples of size $n_1, n_2, \ldots, n_c$ are selected from each and classified into the r values of a categorical variable represented by the rows of the contingency table. The hypothesis of interest is whether there is a difference in the distribution of subpopulation units among the r levels of the categorical variable, i.e. whether the subpopulations are homogeneous or not.
$$\begin{array}{cccc} \text{Subpop 1} & \text{Subpop 2} & \cdots & \text{Subpop } c \\ \pi_{11} & \pi_{12} & \cdots & \pi_{1c} \\ \pi_{21} & \pi_{22} & \cdots & \pi_{2c} \\ \vdots & \vdots & & \vdots \\ \pi_{r1} & \pi_{r2} & \cdots & \pi_{rc} \end{array}$$
$\pi_{ij}$ = proportion of subpop j subjects (j=1,…,c) that fall in category i (i=1,…,r).

18 Null hypothesis of homogeneity: $H_0: \pi_{i1} = \pi_{i2} = \cdots = \pi_{ic}$ for every category $i = 1, \ldots, r$.

19 Example: Myocardial Infarction (MI). Data were collected to determine if there is an association between myocardial infarction and smoking in women. 262 women suffering from MI were classified according to whether they had ever smoked or not. Two controls (patients with other acute disorders) were matched to every case. Is the incidence of smoking the same for MI and non-MI sufferers? $H_0$: MI and non-MI sufferers are homogeneous with respect to smoking, i.e. $H_0: \pi_{11} = \pi_{12}$ and $\pi_{21} = \pi_{22}$.

20 Example: MI results in Minitab. Stat -> Tables -> Chi-Square Test. Chi-Square Test: MI Yes, MI No. Expected counts are printed below observed counts for the MI Yes and MI No columns, followed by the chi-square statistic with DF = 1 and its P-Value. Conclude: there is evidence of lack of homogeneity of incidence of MI with respect to smoking.
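The homogeneity test for the MI data can be reproduced as a sketch in Python. The counts are assumptions read off slide 22 (172 of 262 MI sufferers and 173 of 519 non-sufferers had ever smoked); the never-smoked counts are obtained by subtraction. No continuity correction is applied, which is also an assumption about how the slide's statistic was computed. For df = 1 the upper-tail probability has the closed form erfc(sqrt(x / 2)).

```python
import math

# Columns: MI Yes, MI No. Rows: ever smoked, never smoked.
# Never-smoked counts are derived: 262 - 172 = 90 and 519 - 173 = 346.
mi_table = [
    [172, 173],  # ever smoked
    [90, 346],   # never smoked
]


def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n  # expected count under homogeneity
            statistic += (o - e) ** 2 / e
    return statistic


def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


mi_statistic = chi_square_2x2(mi_table)
mi_p_value = chi2_sf_df1(mi_statistic)

print("chi-square =", round(mi_statistic, 2), "df = 1")
print("p-value =", mi_p_value)
```

The arithmetic is identical to the test of independence; only the sampling design (fixed column totals) and the wording of the hypothesis differ.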

21 Odds and Odds Ratios. Sometimes probabilities are expressed as odds, e.g. in gambling circles (why?) and in biomedical studies (easy interpretation in logistic regression, etc.). Odds of Event A = P(A) / (1 − P(A)), and P(A) = Odds of A / (1 + Odds of A). Ex: A horse has odds of 3 to 2 of winning. This means that in every 3+2=5 races the horse wins 3 and loses 2. So P(Wins) = 3/5. To use the above formula express the odds as d to 1, so 1.5 to 1 in this case. Thus P(Wins) = 1.5 / (1+1.5) = 1.5 / 2.5 = 3/5.
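The two conversion formulas are easy to check numerically. A minimal Python sketch follows, with illustrative function names, using the horse-racing example from the slide.

```python
def odds_from_prob(p):
    """Odds of an event with probability p: p / (1 - p). Requires 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def prob_from_odds(odds):
    """Probability of an event with the given odds: odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Odds of 3 to 2 are 1.5 to 1.
horse_odds = 3 / 2
p_win = prob_from_odds(horse_odds)

print("P(Wins) =", p_win)
print("Odds recovered from P(Wins) =", odds_from_prob(p_win))
```

The two functions are inverses of each other, so converting a probability to odds and back returns the original value up to floating-point rounding.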

22 Example: MI and Odds Ratios. For women sufferers of MI, the proportion who ever smoked is 172/262 = 0.656. In other words, the odds that a woman MI sufferer is a smoker are 0.656/(1 − 0.656) = 1.9. For women non-sufferers of MI, the proportion who ever smoked is 173/519 = 0.333. In other words, the odds that a woman non-MI sufferer is a smoker are 0.333/(1 − 0.333) = 0.5. We can now calculate the odds ratio of being a smoker for MI sufferers relative to non-sufferers: OR = 1.91/0.50 = 3.82 (using the odds before rounding). The odds of being a smoker among MI sufferers are about 4 times the odds of being a smoker among non-sufferers. Put another way: a randomly selected MI sufferer is about twice as likely (0.656/0.333) to be a smoker as a randomly selected non-sufferer.
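The odds ratio and the ratio of proportions can be computed directly from the four counts. The sketch below is illustrative Python; the never-smoked counts (90 and 346) are derived by subtraction from the totals on this slide, and the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


# MI sufferers: 172 ever smoked, 262 - 172 = 90 never smoked.
# Non-sufferers: 173 ever smoked, 519 - 173 = 346 never smoked.
smoking_or = odds_ratio(172, 90, 173, 346)

# Ratio of the proportions of smokers in the two groups.
proportion_ratio = (172 / 262) / (173 / 519)

print("odds ratio =", round(smoking_or, 2))
print("ratio of proportions =", round(proportion_ratio, 2))
```

The two numbers answer different questions: the odds ratio compares odds (about 3.8 to 1), while the ratio of proportions compares probabilities (about 2 to 1).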