How Can We Test whether Categorical Variables are Independent?


Section 10.2 How Can We Test whether Categorical Variables are Independent?

A Significance Test for Categorical Variables The hypotheses for the test are: H0: The two variables are independent Ha: The two variables are dependent (associated) The test assumes random sampling and a large sample size

What Do We Expect for Cell Counts if the Variables Are Independent? The count in any particular cell is a random variable Different samples have different values for the count The mean of its distribution is called an expected cell count This is found under the presumption that H0 is true

How Do We Find the Expected Cell Counts? For a particular cell, the expected cell count equals: (Row total × Column total) / Total sample size
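As a minimal sketch of this rule (the function and the small table below are illustrative, not from the slides), the expected counts can be computed directly from the row and column totals:

```python
def expected_counts(table):
    """Expected cell counts under H0 (independence):
    (row total * column total) / grand total, for each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical observed counts (rows = groups, columns = response levels)
observed = [[30, 70], [20, 80]]
print(expected_counts(observed))  # [[25.0, 75.0], [25.0, 75.0]]
```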

Example: Happiness by Family Income

The Chi-Squared Test Statistic The chi-squared statistic summarizes how far the observed cell counts in a contingency table fall from the expected cell counts for a null hypothesis

Example: Happiness and Family Income

Example: Happiness and Family Income State the null and alternative hypotheses for this test H0: Happiness and family income are independent Ha: Happiness and family income are dependent (associated)

Example: Happiness and Family Income Report the χ² statistic and explain how it was calculated: For each cell, calculate (observed count – expected count)² / expected count Sum the values for all the cells The resulting χ² value is 73.4

Example: Happiness and Family Income The larger the χ² value, the greater the evidence against the null hypothesis of independence and in support of the alternative hypothesis that happiness and income are associated

The Chi-Squared Distribution To convert the test statistic to a P-value, we use the sampling distribution of the statistic For large sample sizes, this sampling distribution is well approximated by the chi-squared probability distribution

The Chi-Squared Distribution

The Chi-Squared Distribution Main properties of the chi-squared distribution: It falls on the positive part of the real number line The precise shape of the distribution depends on the degrees of freedom: df = (r-1)(c-1)

The Chi-Squared Distribution Main properties of the chi-squared distribution: The mean of the distribution equals the df value It is skewed to the right The larger the χ² value, the greater the evidence against H0: independence

The Chi-Squared Distribution

The Five Steps of the Chi-Squared Test of Independence 1. Assumptions: Two categorical variables Randomization Expected counts ≥ 5 in all cells

The Five Steps of the Chi-Squared Test of Independence 2. Hypotheses: H0: The two variables are independent Ha: The two variables are dependent (associated)

The Five Steps of the Chi-Squared Test of Independence 3. Test Statistic: χ² = Σ (observed count – expected count)² / expected count, summed over all cells in the table

The Five Steps of the Chi-Squared Test of Independence 4. P-value: Right-tail probability above the observed value, for the chi-squared distribution with df = (r-1)(c-1) 5. Conclusion: Report P-value and interpret in context If a decision is needed, reject H0 when P-value ≤ significance level
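The five steps above can be sketched in code for the special case of a 2×2 table, where df = 1 and the right-tail chi-squared probability equals erfc(√(χ²/2)); the table used here is hypothetical:

```python
import math

def chi2_test_2x2(table, alpha=0.05):
    """Five-step chi-squared test of independence for a 2x2 table (df = 1).
    Step 1 (assumptions) is checked; steps 3-5 are computed."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Step 1: large-sample condition, expected counts >= 5 in all cells
    assert all(e >= 5 for row in expected for e in row)
    # Step 3: chi-squared statistic, sum of (observed - expected)^2 / expected
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(table, expected)
             for o, e in zip(obs_row, exp_row))
    # Step 4: for df = 1 the right-tail probability equals erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    # Step 5: reject H0 (independence) when P-value <= significance level
    return x2, p_value, p_value <= alpha

# Hypothetical 2x2 table (rows = groups, columns = response categories)
x2, p, reject = chi2_test_2x2([[60, 40], [40, 60]])
print(round(x2, 2), round(p, 4), reject)  # 8.0 0.0047 True
```

For larger tables, step 4 needs the chi-squared distribution with df = (r-1)(c-1); statistical software handles that case.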

Chi-Squared is Also Used as a “Test of Homogeneity” The chi-squared test does not depend on which is the response variable and which is the explanatory variable When a response variable is identified and the population conditional distributions are identical, they are said to be homogeneous The test is then referred to as a test of homogeneity

Example: Aspirin and Heart Attacks Revisited

Example: Aspirin and Heart Attacks Revisited What are the hypotheses for the chi-squared test for these data? The null hypothesis is that whether a doctor has a heart attack is independent of whether he takes placebo or aspirin The alternative hypothesis is that there’s an association

Example: Aspirin and Heart Attacks Revisited Report the test statistic and P-value for the chi-squared test: The test statistic is 25.01 with a P-value of 0.000 This is very strong evidence that the population proportion of heart attacks differed for those taking aspirin and for those taking placebo

Example: Aspirin and Heart Attacks Revisited The sample proportions indicate that the aspirin group had a lower rate of heart attacks than the placebo group

Limitations of the Chi-Squared Test If the P-value is very small, strong evidence exists against the null hypothesis of independence But… The chi-squared statistic and the P-value tell us nothing about the nature of the strength of the association

Limitations of the Chi-Squared Test We know that there is statistical significance, but the test alone does not indicate whether there is practical significance as well

Section 10.3 How Strong is the Association?

The following is a table on Gender and Happiness:

           Not   Pretty   Very
Females    163      898    502
Males      130      705    379

In a study of the two variables (Gender and Happiness), which one is the response variable? (a) Gender (b) Happiness

Using the same Gender and Happiness table: What is the expected cell count for ‘Females’ who are ‘Pretty Happy’? (a) 898 (b) 801.5 (c) 902 (d) 521

Using the same Gender and Happiness table: What is the expected cell count for ‘Females’ who are ‘Pretty Happy’? The answer is (c): 902 = N × (898 + 705)/N × (163 + 898 + 502)/N, where N = 2777 is the total sample size

Using the same Gender and Happiness table: Calculate the χ² statistic. (a) 1.75 (b) 0.27 (c) 0.98 (d) 10.34

Using the same Gender and Happiness table: At a significance level of 0.05, what is the correct decision? (a) ‘Gender’ and ‘Happiness’ are independent (b) There is an association between ‘Gender’ and ‘Happiness’
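A quick computation (a sketch using only the counts shown in the Gender and Happiness table above) confirms the expected count of about 902 and the χ² value of 0.27:

```python
observed = [[163, 898, 502],   # Females: Not / Pretty / Very happy
            [130, 705, 379]]   # Males

row_totals = [sum(r) for r in observed]        # [1563, 1214]
col_totals = [sum(c) for c in zip(*observed)]  # [293, 1603, 881]
n = sum(row_totals)                            # 2777

expected = [[r * c / n for c in col_totals] for r in row_totals]
print(round(expected[0][1]))  # 902: expected 'Females'/'Pretty Happy' count

x2 = sum((o - e) ** 2 / e
         for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row))
print(round(x2, 2))  # 0.27
```

With df = (2 – 1)(3 – 1) = 2, a χ² value of 0.27 gives a large P-value, so at the 0.05 level the data are consistent with independence.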

Analyzing Contingency Tables Is there an association? The chi-squared test of independence addresses this When the P-value is small, we infer that the variables are associated

Analyzing Contingency Tables How do the cell counts differ from what independence predicts? To answer this question, we compare each observed cell count to the corresponding expected cell count

Analyzing Contingency Tables How strong is the association? Analyzing the strength of the association reveals whether the association is an important one, or if it is statistically significant but weak and unimportant in practical terms

Measures of Association A measure of association is a statistic or a parameter that summarizes the strength of the dependence between two variables

Difference of Proportions An easily interpretable measure of association is the difference between the proportions making a particular response

Difference of Proportions

Difference of Proportions Case (a) exhibits the weakest possible association – no association. The difference of proportions is 0:

                 Accept Credit Card
Income        No      Yes
High          60%     40%
Low           60%     40%

Difference of Proportions Case (b) exhibits the strongest possible association. The difference of proportions is 100%:

                 Accept Credit Card
Income        No      Yes
High          0%      100%
Low           100%    0%

Difference of Proportions In practice, we don’t expect data to follow either extreme (0% difference or 100% difference), but the stronger the association, the larger the absolute value of the difference of proportions
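As a minimal sketch, the difference of proportions for cases (a) and (b) above:

```python
def diff_of_proportions(count1, n1, count2, n2):
    """Difference between two groups' proportions making a given response."""
    return count1 / n1 - count2 / n2

# Case (a): both income groups accept at the same rate -> no association
print(diff_of_proportions(60, 100, 60, 100))  # 0.0

# Case (b): 100% vs 0% acceptance -> strongest possible association
print(diff_of_proportions(100, 100, 0, 100))  # 1.0
```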

Example: Do Student Stress and Depression Depend on Gender?

Example: Do Student Stress and Depression Depend on Gender? Which response variable, stress or depression, has the stronger sample association with gender?

Example: Do Student Stress and Depression Depend on Gender? The difference of proportions between females and males was 0.35 – 0.16 = 0.19:

Gender     Yes    No
Female     35%    65%
Male       16%    84%

Example: Do Student Stress and Depression Depend on Gender? The difference of proportions between females and males was 0.08 – 0.06 = 0.02:

Gender     Yes    No
Female     8%     92%
Male       6%     94%

Example: Do Student Stress and Depression Depend on Gender? In the sample, stress (with a difference of proportions = 0.19) has a stronger association with gender than depression has (with a difference of proportions = 0.02)

Example: Relative Risk for Seat Belt Use and Outcome of Auto Accidents

Example: Relative Risk for Seat Belt Use and Outcome of Auto Accidents Treating the auto accident outcome as the response variable, find and interpret the relative risk
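The slide's actual accident table is not reproduced in this transcript, so the counts below are hypothetical; the sketch only shows how a relative risk is formed as the ratio of two conditional proportions:

```python
def relative_risk(events1, total1, events2, total2):
    """Ratio of the two groups' proportions of the outcome of interest."""
    return (events1 / total1) / (events2 / total2)

# Hypothetical counts: 20 of 2000 unbelted occupants died vs 5 of 2500 belted
rr = relative_risk(20, 2000, 5, 2500)
print(round(rr, 2))  # 5.0
```

With these made-up counts, the interpretation would be that unbelted occupants were about 5 times as likely to die as belted occupants.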

A Large χ² Does Not Mean There’s a Strong Association A large chi-squared value provides strong evidence that the variables are associated It does not imply that the variables have a strong association This statistic merely indicates (through its P-value) how certain we can be that the variables are associated, not how strong that association is

Section 10.4 How Can Residuals Reveal the Pattern of Association?

Association Between Categorical Variables The chi-squared test and measures of association such as (p1 – p2) and p1/p2 are fundamental methods for analyzing contingency tables The P-value for χ² summarizes the strength of evidence against H0: independence

Association Between Categorical Variables If the P-value is small, then we conclude that somewhere in the contingency table the population cell proportions differ from independence The chi-squared test does not indicate whether all cells deviate greatly from independence or perhaps only some of them do so

Residual Analysis A cell-by-cell comparison of the observed counts with the counts that are expected when H0 is true reveals the nature of the evidence against H0 The difference between an observed and expected count in a particular cell is called a residual

Residual Analysis The residual is negative when fewer subjects are in the cell than expected under H0 The residual is positive when more subjects are in the cell than expected under H0

Residual Analysis To determine whether a residual is large enough to indicate strong evidence of a deviation from independence in that cell, we use an adjusted form of the residual: the standardized residual

Residual Analysis The standardized residual for a cell: (observed count – expected count)/se A standardized residual reports the number of standard errors that an observed count falls from its expected count Its formula is complex Software can be used to find its value A large value provides evidence against independence in that cell
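The se in the formula above can be computed directly; the version below uses the se = sqrt(expected × (1 − row proportion) × (1 − column proportion)) form found in Agresti-style texts, applied to a hypothetical table, so treat it as a sketch rather than the slides' exact software output:

```python
import math

def standardized_residuals(table):
    """Standardized residual (observed - expected) / se for each cell, with
    se = sqrt(expected * (1 - row proportion) * (1 - column proportion))."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            se = math.sqrt(expected * (1 - row_totals[i] / n)
                                    * (1 - col_totals[j] / n))
            out.append((observed - expected) / se)
        result.append(out)
    return result

# Hypothetical 2x2 table; values beyond about +/-3 signal a clear deviation
for row in standardized_residuals([[60, 40], [40, 60]]):
    print([round(z, 2) for z in row])  # [2.83, -2.83] then [-2.83, 2.83]
```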

Example: Standardized Residuals for Religiosity and Gender “To what extent do you consider yourself a religious person?”

Example: Standardized Residuals for Religiosity and Gender

Example: Standardized Residuals for Religiosity and Gender Interpret the standardized residuals in the table

Example: Standardized Residuals for Religiosity and Gender The table exhibits large positive residuals for the cells for females who are very religious and for males who are not at all religious. In these cells, the observed count is much larger than the expected count There is strong evidence that the population has more subjects in these cells than if the variables were independent

Example: Standardized Residuals for Religiosity and Gender The table exhibits large negative residuals for the cells for females who are not at all religious and for males who are very religious In these cells, the observed count is much smaller than the expected count There is strong evidence that the population has fewer subjects in these cells than if the variables were independent

Section 10.5 What if the Sample Size is Small? Fisher’s Exact Test

Fisher’s Exact Test The chi-squared test of independence is a large-sample test When any expected frequency is small (less than about 5), small-sample tests are more appropriate Fisher’s exact test is a small-sample test of independence

Fisher’s Exact Test The calculations for Fisher’s exact test are complex Statistical software can be used to obtain the P-value for the test that the two variables are independent The smaller the P-value, the stronger is the evidence that the variables are associated

Example: Tea Tastes Better with Milk Poured First? This is an experiment conducted by Sir Ronald Fisher His colleague, Dr. Muriel Bristol, claimed that when drinking tea she could tell whether the milk or the tea had been added to the cup first

Example: Tea Tastes Better with Milk Poured First? Experiment: Fisher asked her to taste eight cups of tea: Four had the milk added first Four had the tea added first She was asked to indicate which four had the milk added first The order of presenting the cups was randomized

Example: Tea Tastes Better with Milk Poured First? Results:

Example: Tea Tastes Better with Milk Poured First? Analysis:

Example: Tea Tastes Better with Milk Poured First? The one-sided version of the test pertains to the alternative that her predictions are better than random guessing Does the P-value suggest that she had the ability to predict better than random guessing?

Example: Tea Tastes Better with Milk Poured First? The P-value of 0.243 does not give much evidence against the null hypothesis The data did not support Dr. Bristol’s claim that she could tell whether the milk or the tea had been added to the cup first
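The results table itself is not reproduced in this transcript; the 0.243 on the slide corresponds to Dr. Bristol correctly identifying 3 of the 4 milk-first cups. A sketch of the exact one-sided calculation, counting the equally likely ways of choosing 4 cups out of 8:

```python
from math import comb

def tea_p_value(correct):
    """One-sided exact P-value: probability that random guessing picks at
    least `correct` of the 4 milk-first cups when choosing 4 cups of 8."""
    total = comb(8, 4)  # 70 equally likely ways to pick 4 cups
    favorable = sum(comb(4, k) * comb(4, 4 - k) for k in range(correct, 5))
    return favorable / total

print(round(tea_p_value(3), 3))  # 0.243 = (16 + 1) / 70
print(round(tea_p_value(4), 3))  # 0.014 = 1 / 70
```

Only a perfect 4-out-of-4 result (P-value 1/70 ≈ 0.014) would have given convincing evidence against random guessing.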