Copyright ©2011 Brooks/Cole, Cengage Learning More about Inference for Categorical Variables Chapter 15 1.



Principal Question: Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?

Chi-Square Test for Two-Way Tables
Data are displayed in a contingency (two-way) table. Each row/column combination is a cell of the table.
Two types of conditional percents: row and column.
Row percents: percents across a row, based on the total number in the row.
Column percents: percents down a column, based on the total number in the column.
If one variable is explanatory, use it to define the rows and use row percents.

Recall: Five steps for assessing statistical significance.
Step 1: Null and alternative hypotheses
H0: The two variables are not related.
Ha: The two variables are related.
Sometimes "associated" is used instead of "related."

Example 15.1 Ear Infections and Xylitol
Experiment: n = 533 children randomized to 3 groups.
Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge.
Response: Did the child have an ear infection?
Only 16.2% of children in the Xylitol Gum group had an infection.
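As a rough illustration of row percents, the sketch below recomputes the 16.2% figure for the Xylitol Gum group. The cell counts are an assumption: the transcript preserves only n = 533 and the percent, so the table is filled in with counts consistent with those two values. The function name `row_percents` is hypothetical.

```python
# Assumed counts for illustration (rows: treatment group; columns: infection yes/no).
# The transcript gives only n = 533 and the 16.2% figure, not the table itself.
table = {
    "Placebo Gum": {"Yes": 49, "No": 129},
    "Xylitol Gum": {"Yes": 29, "No": 150},
    "Xylitol Lozenge": {"Yes": 39, "No": 137},
}


def row_percents(two_way_table):
    """Return each cell as a percent of its row total."""
    result = {}
    for row, cells in two_way_table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100.0 * count / row_total for col, count in cells.items()}
    return result


percents = row_percents(table)
for row, cells in percents.items():
    print(row, {col: round(pct, 1) for col, pct in cells.items()})
```

Treatment group is the explanatory variable here, so it defines the rows and the row percents are the natural summary.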

Example 15.1 Infections and Xylitol
Let
p1 = proportion who would get an ear infection in the population given placebo gum
p2 = proportion who would get an ear infection in the population given xylitol gum
p3 = proportion who would get an ear infection in the population given xylitol lozenges
H0: p1 = p2 = p3 (no relationship between treatment and outcome)
Ha: p1, p2, p3 are not all the same (there is a relationship)

Example 15.2 Making Friends
Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference?
H0: No difference in the distribution of responses of men and women (no relationship between gender and response)
Ha: There is a difference in the distribution of responses of men and women (there is a relationship between gender and response)

Tech Note: Homogeneity and Independence
There are two variations of the general hypothesis statements, and which one applies depends on the method of sampling.
If samples have been taken from separate populations, the null hypothesis is a statement of homogeneity (sameness) among the populations.
If a sample has been taken from a single population, and two categorical variables are measured for each individual, the statement of no relationship is a statement of independence between the two variables.

Step 2: Chi-square Statistic and Necessary Conditions
Compute the expected count for each cell:
$$\text{Expected count} = \frac{\text{Row total} \times \text{Column total}}{\text{Total } n}$$
Compute the test statistic by totaling over all cells:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (i.e., if the null hypothesis were true).

More on the Chi-square Statistic
A large difference between observed and expected counts is evidence of a relationship.
Guidelines for a large sample:
1. All expected counts should be greater than 1.
2. At least 80% of the cells should have an expected count greater than 5.
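The large-sample guidelines can be checked mechanically once the expected counts are known. The sketch below is one possible check, assuming the threshold in the first guideline is 1 (matching the "none are less than 1" condition stated later for the goodness-of-fit test). The function name and the two sets of expected counts are hypothetical.

```python
def meets_large_sample_guidelines(expected):
    """Check the two guidelines on a table (list of rows) of expected counts."""
    cells = [e for row in expected for e in row]
    # Guideline 1 (assumed threshold): every expected count is greater than 1.
    all_above_one = all(e > 1 for e in cells)
    # Guideline 2: at least 80% of cells have an expected count greater than 5.
    share_above_five = sum(1 for e in cells if e > 5) / len(cells)
    return all_above_one and share_above_five >= 0.80


# Hypothetical expected counts, for illustration only.
adequate = [[12.0, 8.0], [9.0, 6.0], [6.0, 4.0]]   # 5 of 6 cells exceed 5
too_sparse = [[12.0, 3.0], [8.0, 2.0]]             # only 2 of 4 cells exceed 5

print(meets_large_sample_guidelines(adequate))     # True
print(meets_large_sample_guidelines(too_sparse))   # False
```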

Example 15.3 Infections and Xylitol
Expected count for "Placebo Gum, Yes Infection" cell:
Expected Counts:

Example 15.3 Infections and Xylitol
Chi-square Test Statistic:
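The two calculations in Example 15.3 can be sketched as follows. The observed counts are an assumption, because the slide's table did not survive in the transcript; they were chosen to agree with n = 533 and the 16.2% infection rate reported for the Xylitol Gum group. The function names are hypothetical.

```python
def expected_counts(observed):
    """Expected count per cell = (row total * column total) / total n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Assumed counts: rows are Placebo Gum, Xylitol Gum, Xylitol Lozenge;
# columns are infection Yes, infection No.
observed = [[49, 129], [29, 150], [39, 137]]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
print(round(expected[0][0], 2))   # expected count for "Placebo Gum, Yes Infection"
print(round(statistic, 2))
```

With these assumed counts the statistic rounds to 6.69, the value quoted for this example on the later slides.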

Step 3: p-value of Chi-square Test
p-value = probability that the chi-square test statistic would be as large as or larger than the observed value if the null hypothesis were true.
A large test statistic is evidence of a relationship. So how large is large enough to declare significance?
A chi-square probability distribution is used to find the p-value.
Degrees of freedom: df = (Rows - 1)(Columns - 1) = (r - 1)(c - 1)

Chi-square Distributions
Distributions are skewed to the right.
Minimum value is 0.
Indexed by the degrees of freedom (df).

Example 15.4 Infections and Xylitol
Chi-square statistic was 6.69.
df = (3 - 1)(2 - 1) = 2
p-value = 0.035
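The p-value in Example 15.4 can be checked without a table. The sketch below uses the closed-form upper-tail probability of the chi-square distribution for whole-number degrees of freedom, built only from Python's `math` module; it is not the method used on the slides, which rely on software output or Table A.5. The function names are hypothetical.

```python
import math


def degrees_of_freedom(rows, columns):
    """df = (r - 1)(c - 1) for a two-way table."""
    return (rows - 1) * (columns - 1)


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    y = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{(df-1)/2} y^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += y ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


df = degrees_of_freedom(3, 2)
p_value = chi_square_p_value(6.69, df)
print(df, round(p_value, 3))
```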

Finding the p-value from Table A.5:
Look in the corresponding "df" row of Table A.5. Scan across until you find where the statistic falls.
If the value of the statistic falls between two table entries, the p-value is between the values of p (column headings) for these entries.
If the value of the statistic is larger than the entry in the rightmost column (labeled p = 0.001), the p-value is less than 0.001 (p < 0.001).
If the value of the statistic is smaller than the entry in the leftmost column (labeled p = 0.50), the p-value is greater than 0.50 (p > 0.50).

Example 15.5 Infections and Xylitol
Chi-square statistic was 6.69.
df = (3 - 1)(2 - 1) = 2
0.025 < p-value < 0.05
There is a statistically significant relationship between the risk of an ear infection and the preventative treatment.

Example 15.6 A Moderate p-Value
The table has three rows and three columns. The computed chi-square statistic is 8.12.
Degrees of freedom are df = (3 - 1)(3 - 1) = 4.
Finding the p-value: Scan the df = 4 row in Table A.5. The value 8.12 is between the entries 7.78 (p = 0.10) and 8.50 (p = 0.075).
Thus, the p-value is between 0.075 and 0.10: 0.075 < p-value < 0.10
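The table lookup in Example 15.6 amounts to finding the two neighboring entries that bracket the statistic. The sketch below shows that scan for the df = 4 row. Only the entries 7.78 and 8.50 come from the transcript; the remaining entries are standard chi-square critical values supplied as an assumption, and the set of columns may not match the layout of Table A.5. The names `DF4_ROW` and `bracket_p_value` are hypothetical.

```python
# (upper-tail probability, critical value) pairs for df = 4, ordered by
# increasing critical value. Entries other than 7.78 and 8.50 are assumed.
DF4_ROW = [
    (0.50, 3.36), (0.25, 5.39), (0.10, 7.78), (0.075, 8.50), (0.05, 9.49),
    (0.025, 11.14), (0.01, 13.28), (0.005, 14.86), (0.001, 18.47),
]


def bracket_p_value(statistic, row):
    """Return (lower, upper) bounds on the p-value from one table row."""
    lower, upper = 0.0, 1.0
    for p, critical in row:
        if statistic >= critical:
            upper = p   # statistic is at least this entry, so p-value <= p
        else:
            lower = p   # statistic is below this entry, so p-value > p
            break
    return lower, upper


print(bracket_p_value(8.12, DF4_ROW))   # between the 0.075 and 0.10 columns
print(bracket_p_value(2.00, DF4_ROW))   # left of the table: p-value > 0.50
print(bracket_p_value(20.0, DF4_ROW))   # right of the table: p-value < 0.001
```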

Steps 4 and 5: Making a Decision and Reporting a Conclusion
Large test statistic → small p-value → evidence that a real relationship exists in the population.
Two equivalent rules: Reject H0 when
the p-value ≤ 0.05, or
the chi-square statistic is greater than the entry in the 0.05 column of Table A.5 (the critical value).
Note: For 2x2 tables, a test statistic of 3.84 or larger is significant.
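For a 2x2 table (df = 1) the agreement between the two rejection rules can be seen directly. The sketch below is a minimal check, assuming the df = 1 upper-tail probability is computed as erfc(sqrt(statistic / 2)); the function names and the test values are hypothetical. Because 3.84 is a rounded critical value, the two rules can disagree for statistics within rounding distance of it.

```python
import math


def p_value_df1(statistic):
    """Upper-tail chi-square probability for df = 1."""
    return math.erfc(math.sqrt(statistic / 2.0))


def reject_by_p_value(statistic, alpha=0.05):
    return p_value_df1(statistic) <= alpha


def reject_by_critical_value(statistic, critical=3.84):
    # 3.84 is the rounded 0.05 critical value for df = 1.
    return statistic > critical


# Hypothetical test statistics, chosen away from the rounding boundary.
for statistic in (0.5, 2.5, 5.0, 10.0):
    print(statistic, reject_by_p_value(statistic), reject_by_critical_value(statistic))
```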

Reporting a Conclusion
Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).
Ways to write "do not reject H0":
The relationship between smoking and drinking alcohol is not statistically significant.
The proportions of smokers who never drink, drink occasionally, and drink often are not significantly different from the proportions of non-smokers who do so.
There is insufficient evidence to conclude that there is a relationship in the population between smoking and drinking alcohol.

Reporting a Conclusion
Example: Testing whether there is a relationship between smoking (yes or no) and drinking alcohol (never, occasionally, often).
Ways to write "reject H0":
There is a statistically significant relationship between smoking and drinking alcohol.
The proportions of smokers who never drink, drink occasionally, and drink often are not the same as the proportions of non-smokers who do so.
Smokers have significantly different drinking behavior than non-smokers.

Example 15.8 Making Friends
Q: With whom do you find it easiest to make friends: opposite sex, same sex, or no difference?
df = (2 - 1)(3 - 1) = 2.
Table A.5: the value of the chi-square statistic falls between the entries in the 0.025 column (7.38) and the 0.01 column (9.21), so 0.01 < p-value < 0.025.
There is a statistically significant relationship at the 0.05 level. There would appear to be a difference in the distribution of responses of men and women if the populations were asked this question.

Supporting Analyses
To learn about the specific nature of the relationship:
Description of row (or column) percents.
Bar chart of counts or percents.
Examination of each cell's "contribution to chi-square." Cells with the largest values have contributed most to the significance of the relationship and deserve attention in any description of the relationship.
Confidence intervals for important proportions or for differences between proportions.
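A cell's contribution to chi-square is its own (Observed - Expected)^2 / Expected term. The sketch below lists the contributions for the ear infection example and picks out the largest. The counts are assumed, since the transcript does not preserve the table, and the function name is hypothetical.

```python
def chi_square_contributions(observed, row_labels, col_labels):
    """Map (row label, column label) to that cell's chi-square contribution."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = {}
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contributions[(row_labels[i], col_labels[j])] = (count - expected) ** 2 / expected
    return contributions


# Assumed counts (not in the transcript): infection Yes / No by treatment group.
observed = [[49, 129], [29, 150], [39, 137]]
rows = ["Placebo Gum", "Xylitol Gum", "Xylitol Lozenge"]
cols = ["Yes", "No"]

contributions = chi_square_contributions(observed, rows, cols)
for cell, value in sorted(contributions.items(), key=lambda item: -item[1]):
    print(cell, round(value, 3))

largest_cell = max(contributions, key=contributions.get)
```

Under these assumed counts the two largest contributions come from the "Yes" cells of the Xylitol Gum and Placebo Gum rows, so a description of the relationship would focus on those groups.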

Chi-Square Test or Z-Test for Difference in Two Proportions?
Does it make a difference?
If the desired Ha has no specific direction (two-sided), the two tests give exactly the same p-value. The squared value of the z-statistic equals the chi-square statistic.
If the desired Ha has a direction (one-sided), the z-test should be used.
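The identity between the two tests can be verified numerically. The sketch below assumes the z-statistic uses the pooled sample proportion in its standard error and that no continuity correction is applied to the chi-square statistic. The counts (40 of 100 versus 25 of 100) and the function names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for H0: p1 = p2, using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / standard_error


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2x2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    return sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(2)
        for j in range(2)
    )


# Hypothetical data: 40 successes out of 100 versus 25 out of 100.
z = two_proportion_z(40, 100, 25, 100)
chi_square = chi_square_2x2(40, 100, 25, 100)

# Two-sided z p-value and chi-square (df = 1) p-value.
p_from_z = math.erfc(abs(z) / math.sqrt(2))
p_from_chi_square = math.erfc(math.sqrt(chi_square / 2))

print(round(z ** 2, 4), round(chi_square, 4))
print(round(p_from_z, 4), round(p_from_chi_square, 4))
```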

Testing Hypotheses about One Categorical Variable: Goodness of Fit (GOF) Test
Step 1: Determine the null and alternative hypotheses.
H0: The probabilities for k categories are p1, p2, ..., pk.
Ha: Not all probabilities specified in H0 are correct.
Note: Probabilities in the null hypothesis must sum to 1.

Goodness of Fit (GOF) Test
Step 2: Verify necessary data conditions, and if met, summarize the data into an appropriate test statistic.
If at least 80% of the expected counts are greater than 5 and none are less than 1, compute
$$\chi^2 = \sum_{i=1}^{k} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$
where the expected count for the i-th category is computed as $\text{Expected}_i = n p_i$.

Goodness of Fit (GOF) Test
Step 3: Assuming the null hypothesis is true, find the p-value. Use the chi-square distribution with df = k - 1.
Step 4: Decide whether or not the result is statistically significant based on the p-value. The result is statistically significant if the p-value ≤ α.
Step 5: Report the conclusion in the context of the situation.

Example Pennsylvania Daily Number
State lottery game: Three-digit number made by drawing a digit between 0 and 9 from each of three different containers.
Focus = draws from the first container.
If the numbers are randomly selected, each value would be equally likely to occur.
H0: p = 1/10 for each of the 10 possible digits
Ha: Not H0

Example Daily Number
Data: n = 500 days between 7/19/99 and 11/29/00

Example Daily Number
Chi-square goodness of fit statistic:
From Table A.5: df = k - 1 = 10 - 1 = 9
p-value > 0.50
Result is not statistically significant; the null hypothesis is not rejected.
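The goodness-of-fit steps can be sketched in a few lines. The digit counts below are hypothetical: the transcript does not preserve the lottery data or the value of the statistic, so these numbers only illustrate the mechanics for n = 500 draws and 10 equally likely digits. The function name is also hypothetical, and the comparison value 8.34 (the standard df = 9 entry for p = 0.50) is supplied as an assumption.

```python
import math


def goodness_of_fit_statistic(observed, null_probabilities):
    """Chi-square GOF statistic with expected count n * p_i per category."""
    if not math.isclose(sum(null_probabilities), 1.0):
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    return sum(
        (count - n * p) ** 2 / (n * p)
        for count, p in zip(observed, null_probabilities)
    )


# Hypothetical counts of digits 0 through 9 over 500 draws (not the real data).
observed = [47, 50, 55, 46, 53, 39, 61, 44, 58, 47]
null_probabilities = [1 / 10] * 10

statistic = goodness_of_fit_statistic(observed, null_probabilities)
df = len(observed) - 1

# Assumed table entry: for df = 9 the p = 0.50 critical value is about 8.34.
p_value_exceeds_half = statistic < 8.34

print(round(statistic, 2), df, p_value_exceeds_half)
```

With these made-up counts the statistic falls below the p = 0.50 entry, which would lead to the same kind of conclusion as the slide: the result is not statistically significant.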