BPS - 5th Ed. Chapter 22: Two Categorical Variables: The Chi-Square Test

Presentation transcript:

Relationships: Categorical Variables
• Chapter 20: compare proportions of successes for two groups
  – “Group” is the explanatory variable (2 levels)
  – “Success or failure” is the outcome (2 values)
• Chapter 22: “Is there a relationship between two categorical variables?”
  – may have 2 or more groups (one variable)
  – may have 2 or more outcomes (second variable)

Case Study: Health Care: Canada and U.S.
Data from patients’ own assessment of their quality of life relative to what it had been before their heart attack (data from patients who survived at least a year).
Source: Mark, D. B. et al., “Use of medical resources and quality of life after acute myocardial infarction in Canada and the United States,” New England Journal of Medicine, 331 (1994), pp

Case Study: Health Care: Canada and U.S.

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 75 | 541 |
| Somewhat better | 71 | 498 |
| About the same | 96 | 779 |
| Somewhat worse | 50 | 282 |
| Much worse | 19 | 65 |
| Total | 311 | 2165 |

Case Study: Health Care: Canada and U.S.

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 75 | 541 |
| Somewhat better | 71 | 498 |
| About the same | 96 | 779 |
| Somewhat worse | 50 | 282 |
| Much worse | 19 | 65 |
| Total | 311 | 2165 |

Compare the Canadian group to the U.S. group in terms of feeling much better: 75 Canadians reported feeling much better, compared with 541 Americans. The groups appear very different, but look at the group totals: the U.S. group is about seven times as large.

Case Study: Health Care: Canada and U.S.
Compare the Canadian group to the U.S. group in terms of feeling much better: change the counts to percents of each country's total.

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 24% | 25% |
| Somewhat better | 23% | 23% |
| About the same | 31% | 36% |
| Somewhat worse | 16% | 13% |
| Much worse | 6% | 3% |
| Total | 100% | 100% |

Now, with a fairer comparison using percents, the groups appear very similar in terms of feeling much better.
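The conversion from counts to percents can be sketched in a few lines of Python. The counts are the ones in the case-study table; the function name `column_percents` and the dictionary layout are illustrative choices, not part of the slides.

```python
# Observed counts from the case-study table (rows: quality of life).
counts = {
    "Much better":     {"Canada": 75, "United States": 541},
    "Somewhat better": {"Canada": 71, "United States": 498},
    "About the same":  {"Canada": 96, "United States": 779},
    "Somewhat worse":  {"Canada": 50, "United States": 282},
    "Much worse":      {"Canada": 19, "United States": 65},
}

def column_percents(table):
    """Return each cell as a percent of its column (country) total."""
    countries = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in countries}
    return {
        level: {c: 100 * row[c] / totals[c] for c in countries}
        for level, row in table.items()
    }

percents = column_percents(counts)
for level, row in percents.items():
    cells = "  ".join(f"{c}: {p:5.1f}%" for c, p in row.items())
    print(f"{level:16s} {cells}")
```

Dividing by the column total rather than the table total is what makes the two countries comparable despite their very different sample sizes.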

Case Study: Health Care: Canada and U.S.
Is there a relationship between the explanatory variable (Country) and the response variable (Quality of life)?

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 24% | 25% |
| Somewhat better | 23% | 23% |
| About the same | 31% | 36% |
| Somewhat worse | 16% | 13% |
| Much worse | 6% | 3% |
| Total | 100% | 100% |

Look at the conditional distributions of the response variable (Quality of life), given each level of the explanatory variable (Country).

Conditional Distributions
• If the conditional distributions of the second variable are nearly the same for each category of the first variable, then we say that there is not an association between the two variables.
• If there are significant differences in the conditional distributions for each category, then we say that there is an association between the two variables.

Hypothesis Test
• In tests for two categorical variables, we are interested in whether a relationship observed in a single sample reflects a real relationship in the population.
• Hypotheses:
  – Null: the percentages for one variable are the same for every level of the other variable (no difference in conditional distributions). There is no real relationship.
  – Alternative: the percentages for one variable vary over levels of the other variable. There is a real relationship.

Case Study: Health Care: Canada and U.S.
Null hypothesis: The percentages for one variable are the same for every level of the other variable. (No real relationship.)

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 24% | 25% |
| Somewhat better | 23% | 23% |
| About the same | 31% | 36% |
| Somewhat worse | 16% | 13% |
| Much worse | 6% | 3% |
| Total | 100% | 100% |

For example, we could look at the differences in percentages between Canada and the U.S. for each level of “Quality of life”: 24% vs. 25% for those who felt ‘Much better’, 23% vs. 23% for ‘Somewhat better’, etc. But testing each level separately raises the problem of multiple comparisons!

Hypothesis Test
• H0: no real relationship between the two categorical variables that make up the rows and columns of a two-way table
• To test H0, compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H0 were true)
  – if the observed counts are far from the expected counts, that is evidence against H0 in favor of a real relationship between the two variables

Expected Counts
• The expected count in any cell of a two-way table (when H0 is true) is
    expected count = (row total × column total) / (table total)
• The development of this formula is based on the fact that the expected number of successes in n independent tries is equal to n times the probability p of success on each try (expected count = n × p)
  – Example: to find the expected count in a certain row and column (cell): p = proportion in row = (row total)/(table total); n = column total; expected count in cell = np = (row total)(column total)/(table total)
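As a minimal sketch of the formula above, the following Python computes every expected count for the case-study table. The function name `expected_counts` and the list-of-rows layout are assumptions made for illustration; the observed counts come from the slides.

```python
def expected_counts(observed):
    """Expected cell counts under H0: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Rows: Much better ... Much worse; columns: Canada, United States.
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

Because each expected count is built from the margins, the expected table always has the same row and column totals as the observed table.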

Case Study: Health Care: Canada and U.S.
For the observed data below, find the expected value for each cell:

| Quality of life | Canada | United States | Total |
|---|---|---|---|
| Much better | 75 | 541 | 616 |
| Somewhat better | 71 | 498 | 569 |
| About the same | 96 | 779 | 875 |
| Somewhat worse | 50 | 282 | 332 |
| Much worse | 19 | 65 | 84 |
| Total | 311 | 2165 | 2476 |

For the expected count of Canadians who feel ‘Much better’ (expected count for Row 1, Column 1):
    expected count = (616 × 311) / 2476 ≈ 77.37

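Case Study: Health Care: Canada and U.S.

Observed counts:

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 75 | 541 |
| Somewhat better | 71 | 498 |
| About the same | 96 | 779 |
| Somewhat worse | 50 | 282 |
| Much worse | 19 | 65 |

Expected counts, each computed as (row total × column total) / (table total):

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 77.37 | 538.63 |
| Somewhat better | 71.47 | 497.53 |
| About the same | 109.91 | 765.09 |
| Somewhat worse | 41.70 | 290.30 |
| Much worse | 10.55 | 73.45 |

Compare the two tables to see if the data support the null hypothesis.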

Chi-Square Statistic
• To determine if the differences between the observed counts and expected counts are statistically significant (to show a real relationship between the two categorical variables), we use the chi-square statistic:
    X² = Σ (observed count − expected count)² / (expected count)
  where the sum is over all cells in the table.

Chi-Square Statistic
• The chi-square statistic is a measure of the distance of the observed counts from the expected counts
  – is always zero or positive
  – is only zero when the observed counts are exactly equal to the expected counts
  – large values of X² are evidence against H0 because these would show that the observed counts are far from what would be expected if H0 were true
  – the chi-square test is one-sided (any violation of H0 produces a large value of X²)
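The properties listed above can be checked numerically. The sketch below computes the per-cell terms of X² for the case-study counts and for a small made-up table whose observed counts equal their expected counts exactly. The function name `chi_square_terms` and the `perfect_fit` table are hypothetical, introduced only for illustration.

```python
def chi_square_terms(observed):
    """Per-cell contributions (observed - expected)^2 / expected for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    terms = []
    for row, r_tot in zip(observed, row_totals):
        term_row = []
        for obs, c_tot in zip(row, col_totals):
            exp = r_tot * c_tot / total  # expected count under H0
            term_row.append((obs - exp) ** 2 / exp)
        terms.append(term_row)
    return terms

# Case-study counts (columns: Canada, United States).
observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
terms = chi_square_terms(observed)
x2 = sum(sum(row) for row in terms)
print(f"X2 = {x2:.3f}")

# Hypothetical table in which every observed count equals its expected count.
perfect_fit = [[10, 20], [20, 40]]
x2_zero = sum(sum(row) for row in chi_square_terms(perfect_fit))
print(f"X2 for a perfect fit = {x2_zero:.3f}")
```

Keeping the individual terms, rather than only their sum, makes it easy to see which cells contribute most; here the largest term belongs to Canadians who report feeling much worse.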

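Case Study: Health Care: Canada and U.S.

Observed counts:

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 75 | 541 |
| Somewhat better | 71 | 498 |
| About the same | 96 | 779 |
| Somewhat worse | 50 | 282 |
| Much worse | 19 | 65 |

Expected counts:

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 77.37 | 538.63 |
| Somewhat better | 71.47 | 497.53 |
| About the same | 109.91 | 765.09 |
| Somewhat worse | 41.70 | 290.30 |
| Much worse | 10.55 | 73.45 |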

Chi-Square Test
• Calculate the value of the chi-square statistic
  – by hand (cumbersome)
  – using technology (computer software, etc.)
• Find the P-value in order to reject or fail to reject H0
  – use the chi-square table for the chi-square distribution (later in this chapter)
  – from computer output
• If a significant relationship exists (small P-value):
  – compare appropriate percents in the data table
  – compare individual observed and expected cell counts
  – look at individual terms in the chi-square statistic

Chi-Square Test
• The chi-square test for a two-way table with r rows and c columns uses critical values from a chi-square distribution with (r − 1)(c − 1) degrees of freedom
• The P-value is the area to the right of X² under the density curve of the chi-square distribution
  – use the chi-square table

Table D: Chi-Square Table
• See page 694 in the text for Table D (“Chi-square Table”)
• The process for using the chi-square table (Table D) is identical to the process for using the t-table (Table C, page 693), as discussed in Chapter 17
• For particular degrees of freedom (df) in the left margin of Table D, locate the X² critical value (x*) in the body of the table; the corresponding probability (p) of lying to the right of this value is found in the top margin of the table (this is how to find the P-value for a chi-square test)

Case Study: Health Care: Canada and U.S.

| Quality of life | Canada | United States |
|---|---|---|
| Much better | 75 | 541 |
| Somewhat better | 71 | 498 |
| About the same | 96 | 779 |
| Somewhat worse | 50 | 282 |
| Much worse | 19 | 65 |

X² = 11.725
df = (r − 1)(c − 1) = (5 − 1)(2 − 1) = 4
Look in the df = 4 row of Table D; the value X² = 11.725 falls between the 0.02 and 0.01 critical values. Thus, the P-value for this chi-square test is between 0.01 and 0.02 (it is actually about 0.0195).
** P-value < 0.05, so we conclude there is a significant relationship **
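In place of a table lookup, the P-value can be computed directly. The sketch below is not the Table D method from the slides: it uses a closed form for the upper tail of a chi-square distribution that holds only when the degrees of freedom are even, which covers df = 4 here. The function names are illustrative assumptions.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j!
        total += term
    return math.exp(-half) * total

def chi_square_statistic(observed):
    """X2 for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return sum(
        (obs - r * c / total) ** 2 / (r * c / total)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )

observed = [[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]]
x2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi2_upper_tail_even_df(x2, df)
print(f"X2 = {x2:.3f}, df = {df}, P-value = {p_value:.4f}")
```

For odd degrees of freedom a statistics library would be the safer choice, since the tail area then involves the incomplete gamma function.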

Chi-Square Test: Requirements
• The chi-square test is an approximate method, and becomes more accurate as the counts in the cells of the table get larger
• The following must be satisfied for the approximation to be accurate:
  – No more than 20% of the expected counts are less than 5
  – All individual expected counts are 1 or greater
• If these requirements fail, then two or more groups must be combined to form a new (‘smaller’) two-way table
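The two requirements can be checked mechanically once the expected counts are known. This is a minimal sketch: the function names and the returned dictionary are illustrative, and the `small_table` counts are hypothetical values chosen to show a failing case.

```python
def expected_counts(observed):
    """Expected counts under H0: (row total * column total) / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def check_requirements(observed):
    """Apply the two rules: at most 20% of expected counts below 5, none below 1."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    smallest = min(cells)
    return {
        "share_below_5": share_below_5,
        "smallest_expected": smallest,
        "ok": share_below_5 <= 0.20 and smallest >= 1,
    }

# Case-study counts from the slides (columns: Canada, United States).
case_study = check_requirements([[75, 541], [71, 498], [96, 779], [50, 282], [19, 65]])
print(case_study)

# Hypothetical small table: half of its expected counts fall below 5.
small_table = check_requirements([[1, 9], [3, 7]])
print(small_table)
```

Note that the rules apply to the expected counts, not the observed counts, so a table with a small observed cell can still satisfy them.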

Chi-Square Distributions
• Family of distributions that take only positive values and are skewed to the right
• A specific chi-square distribution is specified by giving its degrees of freedom (similar to the t distributions)

Chi-Square Goodness of Fit Test
• A variation of the chi-square statistic can be used to test a different kind of null hypothesis: that a single categorical variable has a specific distribution
• The null hypothesis specifies the probabilities (p_i) of each of the k possible outcomes of the categorical variable
• The chi-square goodness of fit test compares the observed counts for each category with the expected counts under the null hypothesis

Chi-Square Goodness of Fit Test
• H0: p_1 = p_10, p_2 = p_20, …, p_k = p_k0
• Ha: the proportions are not as specified in H0
• For a sample of n subjects, observe how many subjects fall in each category
• Calculate the expected number of subjects in each category under the null hypothesis: expected count = n × p_i for the i-th category

Chi-Square Goodness of Fit Test
• Calculate the chi-square statistic (same as in the previous test):
    X² = Σ (observed count − expected count)² / (expected count)
• The degrees of freedom for this statistic are df = k − 1 (the number of possible categories minus one)
• Find the P-value using Table D
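The goodness of fit steps translate into a short function. In this sketch the function name `goodness_of_fit` is an assumption, and the three-category counts are hypothetical illustration data, not taken from the slides.

```python
import math

def goodness_of_fit(observed, null_probs):
    """Return (X2, df, expected counts) for a chi-square goodness of fit test."""
    if len(observed) != len(null_probs):
        raise ValueError("need one null probability per category")
    if not math.isclose(sum(null_probs), 1.0):
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in null_probs]             # expected count = n * p_i
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                              # k - 1
    return x2, df, expected

# Hypothetical data: 60 subjects in 3 categories, H0 says all are equally likely.
observed = [18, 22, 20]
x2, df, expected = goodness_of_fit(observed, [1 / 3, 1 / 3, 1 / 3])
print(f"expected = {expected}, X2 = {x2:.2f}, df = {df}")
```

The only structural difference from the two-way test is where the expected counts come from: here they are fixed by the null probabilities rather than estimated from the table margins.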

Case Study: Births on Weekends?
A random sample of 140 births from local records was collected to show that there are fewer births on Saturdays and Sundays than there are on weekdays.
Source: National Center for Health Statistics, “Births: Final Data for 1999,” National Vital Statistics Reports, Vol. 49, No. 1, 1994.

Case Study: Births on Weekends?
Data

| Day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. |
|---|---|---|---|---|---|---|---|
| Births | | | | | | | |

Do these data give significant evidence that local births are not equally likely on all days of the week?

Case Study: Births on Weekends?
Null Hypothesis

| Day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. |
|---|---|---|---|---|---|---|---|
| Probability | p_1 | p_2 | p_3 | p_4 | p_5 | p_6 | p_7 |

H0: probabilities are the same on all days
H0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = p_7 = 1/7

Case Study: Births on Weekends?
Expected Counts

| Day | Sun. | Mon. | Tue. | Wed. | Thu. | Fri. | Sat. |
|---|---|---|---|---|---|---|---|
| Observed births | | | | | | | |
| Expected births | 20 | 20 | 20 | 20 | 20 | 20 | 20 |

Expected count = n × p_i = 140 × (1/7) = 20 for each category (day of the week)

Case Study: Births on Weekends?
Chi-square statistic
    X² = Σ (observed births − 20)² / 20 = 7.60, where the sum is over the seven days of the week

Case Study: Births on Weekends?
P-value, Conclusion
X² = 7.60
df = k − 1 = 7 − 1 = 6
P-value = Prob(X² > 7.60): X² = 7.60 is smaller than the smallest entry in the df = 6 row of Table D, so the P-value is larger than the largest tail probability listed in the table.
Conclusion: Fail to reject H0. There is not significant evidence that births are not equally likely on all days of the week.
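As a rough cross-check on the table lookup, the P-value can also be estimated by simulation. This is a different technique from the Table D method on the slide: the sketch draws many samples of 140 births under H0 (each day equally likely) and counts how often the simulated X² is at least 7.60. The function name, the number of repetitions and the seed are all assumptions made for illustration.

```python
import random

def simulate_p_value(x2_observed, n, k, reps=2000, seed=0):
    """Estimate P(X2 >= x2_observed) when n subjects fall into k equally likely categories."""
    rng = random.Random(seed)
    expected = n / k
    hits = 0
    for _ in range(reps):
        counts = [0] * k
        for _ in range(n):
            counts[rng.randrange(k)] += 1   # one birth on a uniformly random day
        x2 = sum((c - expected) ** 2 for c in counts) / expected
        if x2 >= x2_observed - 1e-9:         # small tolerance for float comparison
            hits += 1
    return hits / reps

p_sim = simulate_p_value(7.60, n=140, k=7)
print(f"simulated P(X2 >= 7.60) is about {p_sim:.3f}")
```

A simulated proportion well above 0.05 agrees with the slide's conclusion of failing to reject H0; the estimate varies slightly with the seed and the number of repetitions.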