Lesson 11 - R Chapter 11 Review:

Slides:



Advertisements
Similar presentations
Lesson 14 - R Chapter 14 Review:
Advertisements

 2 test for independence Used with categorical, bivariate data from ONE sample Used to see if the two categorical variables are associated (dependent)
Lesson Test for Goodness of Fit One-Way Tables.
Chapter 13: Inference for Distributions of Categorical Data
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
Chapter 26: Comparing Counts. To analyze categorical data, we construct two-way tables and examine the counts of percents of the explanatory and response.
1 Chapter 20 Two Categorical Variables: The Chi-Square Test.
Chapter 13 Chi-Square Tests. The chi-square test for Goodness of Fit allows us to determine whether a specified population distribution seems valid. The.
Lesson Inference for Two-Way Tables. Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from.
Chapter 26 Chi-Square Testing
Chapter 11 Inference for Tables: Chi-Square Procedures 11.1 Target Goal:I can compute expected counts, conditional distributions, and contributions to.
+ Chi Square Test Homogeneity or Independence( Association)
Chapter 11 Chi- Square Test for Homogeneity Target Goal: I can use a chi-square test to compare 3 or more proportions. I can use a chi-square test for.
The Practice of Statistics Third Edition Chapter (13.1) 14.1: Chi-square Test for Goodness of Fit Copyright © 2008 by W. H. Freeman & Company Daniel S.
Copyright © 2010 Pearson Education, Inc. Slide
Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
Lesson Inference for Two-Way Tables. Knowledge Objectives Explain what is mean by a two-way table. Define the chi-square (χ 2 ) statistic. Identify.
11.2 Tests Using Contingency Tables When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested by using.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Chapter 13- Inference For Tables: Chi-square Procedures Section Test for goodness of fit Section Inference for Two-Way tables Presented By:
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
+ Section 11.1 Chi-Square Goodness-of-Fit Tests. + Introduction In the previous chapter, we discussed inference procedures for comparing the proportion.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Class Seven Turn In: Chapter 18: 32, 34, 36 Chapter 19: 26, 34, 44 Quiz 3 For Class Eight: Chapter 20: 18, 20, 24 Chapter 22: 34, 36 Read Chapters 23 &
Warm up On slide.
Comparing Counts Chi Square Tests Independence.
11.1 Chi-Square Tests for Goodness of Fit
CHAPTER 26 Comparing Counts.
Presentation 12 Chi-Square test.
CHAPTER 11 Inference for Distributions of Categorical Data
Chi-square test or c2 test
Vocabulary Statistical Inference – provides methods for drawing conclusions about a population parameter from sample data Expected Values– row total *
Test for Goodness of Fit
Chapter 12 Tests with Qualitative Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 25 Comparing Counts.
Data Analysis for Two-Way Tables
Chapter 11 Goodness-of-Fit and Contingency Tables
Elementary Statistics
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Chi Square Two-way Tables
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Goodness of Fit Test - Chi-Squared Distribution
Chapter 11: Inference for Distributions of Categorical Data
Two Categorical Variables: The Chi-Square Test
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 10 Analyzing the Association Between Categorical Variables
15.1 Goodness-of-Fit Tests
Inference for Relationships
Inference on Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Paired Samples and Blocks
Analyzing the Association Between Categorical Variables
CHAPTER 11 CHI-SQUARE TESTS
Chapter 26 Comparing Counts.
Chapter 13: Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Inference for Two Way Tables
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 26 Comparing Counts.
Chapter 14.1 Goodness of Fit Test.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Presentation transcript:

Lesson 11 - R Chapter 11 Review: Inference for Distribution of Categorical Variables: Chi-Square Procedures

Objectives Explain what is meant by a chi-square goodness of fit test Conduct a chi-square goodness of fit test Given a two-way table, compute conditional distributions Conduct a chi-square test for homogeneity of populations Conduct a chi-square test for association / independence Use technology to conduct a chi-square significance test

Vocabulary none new

Chi-Square Distribution Total area under a chi-square curve is equal to 1 It is not symmetric, it is skewed right The shape of the chi-square distribution depends on the degrees of freedom (just like t-distribution) As the number of degrees of freedom increases, the chi-square distribution becomes more nearly symmetric The values of χ² are nonnegative; that is, values of χ² are always greater than or equal to zero (0); they increase to a peak and then asymptotically approach 0 Table D in the back of the book gives critical values

Conditions All Chi-Square tests (GOF, Homogeneity, Independence): Independent SRS(s) All expected counts are greater than or equal to 1 (all Ei ≥ 1) No more than 20% of expected counts are less than 5 Remember it is the expected counts, not the observed that are critical conditions

Chi-Square Test for Goodness of Fit

Chi-Square Test for Homogeneity H0: distribution of response variable is the same for all c populations Ha: distributions are not the same

χ² Test of Association/Independence This test assesses whether this observed association is statistically significant. That is, is the relationship in the sample sufficiently strong for us to conclude that it is due to a relationship between the two variables and not merely to chance.

Homogeneity vs Independence Homogeneity – two or more random samples from two or more distinct populations. Independence – a single sample from a single population, where two categorical variables are measured

z-Test versus χ² Test We use the χ² test to compare any number of proportions The results from the χ² test for 2 proportions will be the same as a z-test for 2 proportions z and χ2 p-value will be the same χ2 test statistic with df = 1 will be z test statistic squared z-Test is recommended to compare two proportions because it gives you a choice of a one-side test and is related to the confidence interval for p1 – p2.

Summary and Homework Summary Homework Goodness-of-fit tests apply to situations where there are a series of independent trials, and each trial has 3 or more possible outcomes The test for homogeneity analyzes whether the observed proportions are the same across the different samples of the populations The test for independence analyzes whether the row and column variables are independent in the same sample Homework Quiz

Problem 1 The makers of the movie Titanic imply that lower-class passengers were treated unfairly when the lifeboats were being filled. We want to determine whether that portrayal is accurate. The following table contains the survival data by passenger class for the 1316 passengers. Following the outline on the next page, you will use a chi-square test to determine whether there is a relationship between survival and passenger class. Class Survived Lost First 203 122 Second 118 167 Third 178 528

Problem 1 cont (a) If this table is considered an r x c table, r = ______ and c = _______. (b) State the null and alternative hypotheses that would be appropriate for this test: (c) Show how to determine the expected number of second class survivors. 3 2 H0: proportions of survivors is the same across classes Ha: at least one proportion is different Exp = rt  ct / tt = 499  285  1316 = 108.07

Problem 1 cont (d) In order to validly use the chi-square test, how many expected values could be less than 5? _______ How many expected values could be less than 1? ________ (e) How many degrees of freedom would be associated with this test? _________ Show how you determined the degrees of freedom: 1 2 (r-1)(c-1) = (3-1)(2-1) = 2(1) = 2

Problem 1 cont (f) Use your calculator to perform this chi-square test. Examine the matrix of expected values and write the expected values for each cell next to their observed frequencies in the table. (Round to tenths.) In what classes are there more survivors than would be expected under the assumption of the null hypothesis?  What is the value of the chi-square statistic? ______ Class Survived Lost First 203 122 Second 118 167 Third 178 528 123.23 201.77 108.07 176.93 267.7 438.3 about 80 more in 1st class and 10 more in 2nd class 133.05

Problem 1 cont (iv) What is the P-value associated with the chi-square statistic? __________ (v) State your conclusion regarding the hypotheses of this test: (vi) Examine the matrix of chi-square components that is created by your calculator. Which entry has the greatest contribution to your test statistic? What is the value of this component? 1.2  10-29 Since the p-value is so small we have strong evidence to reject H0 and conclude that survivor rates were different between classes. 51.94 from first class survivors

Problem 2 Class Survived Lost First 203 122 Second 118 167 Third 178 528 (a) What proportion of all 1316 passengers were third class passengers? (b) What proportion of survivors were third class passengers? (c) What proportion of first class passengers survived? 706 / 1316 = 53.65% 178 / 499 = 35.67% 203 / 325 = 62.46%

Countywide Percentage Problem 3 It is sometimes said that older people are over-represented on juries. The table below gives the percentage distribution of all people over 21 years of age in Alameda County, CA by age group. The table also shows the age group classification for a sample of 66 people who served on grand juries in this county. Age Countywide Percentage Number of jurors 21 to 40 42 5 41 to 50 23 9 51 to 60 16 19 61 or older 33 Total 100 66

Problem 3 cont We would like to perform a chi-square test to determine whether the age distribution of jurors is significantly different from the age distribution of county residents. That is, we want to test the following hypotheses: H0: 42% of jurors are 21 to 40 years old, 23% of jurors are 41 to 50 years old, 16% of jurors are 51 to 60 years old, and 19% of jurors are 61 or older. Ha: The age distribution of jurors is different from the one above.

Countywide Percentage Problem 3 cont Working under the assumption that the H0 is true, write how many of the 66 jurors would you expect next to the observed values in the table below: Write a few sentences to describe how the counts you expect if the null hypothesis is true compared to the counts observed in this sample. Age Countywide Percentage Number of jurors 21 to 40 42 5 41 to 50 23 9 51 to 60 16 19 61 or older 33 Total 100 66 27.72 15.18 10.56 12.54 the expected counts will be close to the observed values if the null hypothesis is true.

Countywide Percentage Problem 3 cont Use a chi-square test to determine whether the age distribution of jurors differs significantly from the age distribution of the general population. Show the computations needed to compute the chi-square statistic, and state the degrees of freedom, the P-value, and the conclusion. You do not need to state or check conditions.   (O – E)² χ² = ------------ E = 17.02 + 2.52 + 6.75 + 33.38 = 59.66 df = 3 p-value < 0.0005 Age Countywide Percentage Number of jurors 21 to 40 42 5 41 to 50 23 9 51 to 60 16 19 61 or older 33 Total 100 66 With such a low p-value we have strong evidence to reject H0 and conclude that the percentages of jurors by age does not follow the county’s percentages by age.