CHAPTER 26 Comparing Counts.

Slides:



Advertisements
Similar presentations
Copyright © 2010 Pearson Education, Inc. Slide
Advertisements

Chapter 26 Comparing Counts
Chapter 11 Other Chi-Squared Tests
1 Chi-Squared Distributions Inference for Categorical Data and Multiple Groups.
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 25 Comparing Counts.
Chapter 26: Comparing Counts
Chapter 26 Part 1 COMPARING COUNTS. Is an observed distribution consistent with what we expect? Are observed differences among several distributions large.
Analysis of Count Data Chapter 26
 Involves testing a hypothesis.  There is no single parameter to estimate.  Considers all categories to give an overall idea of whether the observed.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Chapter 26 Comparing Counts.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
AP Statistics Chapter 26 Notes
CHAPTER 26: COMPARING COUNTS OF CATEGORICAL DATA To test claims and make inferences about counts for categorical variables Objective:
Chi-square (χ 2 ) Fenster Chi-Square Chi-Square χ 2 Chi-Square χ 2 Tests of Statistical Significance for Nominal Level Data (Note: can also be used for.
Chapter 26: Comparing counts of categorical data
1-1 Copyright © 2015, 2010, 2007 Pearson Education, Inc. Chapter 25, Slide 1 Chapter 26 Comparing Counts.
Slide 26-1 Copyright © 2004 Pearson Education, Inc.
Business Statistics: A First Course, 5e © 2009 Prentice-Hall, Inc. Chap 11-1 Chapter 11 Chi-Square Tests Business Statistics: A First Course Fifth Edition.
Copyright © 2010 Pearson Education, Inc. Slide
Comparing Counts.  A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a.
Chapter 13 Inference for Counts: Chi-Square Tests © 2011 Pearson Education, Inc. 1 Business Statistics: A First Course.
Copyright © 2010 Pearson Education, Inc. Warm Up- Good Morning! If all the values of a data set are the same, all of the following must equal zero except.
11.2 Tests Using Contingency Tables When data can be tabulated in table form in terms of frequencies, several types of hypotheses can be tested by using.
Section 12.2: Tests for Homogeneity and Independence in a Two-Way Table.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Comparing Counts Chapter 26. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Statistics 26 Comparing Counts. Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square.
Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit.
Copyright © 2009 Pearson Education, Inc. Chapter 26 Comparing Counts.
Comparing Counts Chi Square Tests Independence.
Other Chi-Square Tests
Chapter 12 Chi-Square Tests and Nonparametric Tests
Chi-Square hypothesis testing
CHAPTER 11 Inference for Distributions of Categorical Data
Chi-square test or c2 test
Chapter 11 Chi-Square Tests.
Chapter 12 Tests with Qualitative Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 25 Comparing Counts.
Chi Square Two-way Tables
AP Stats Check In Where we’ve been… Chapter 7…Chapter 8…
Chapter 11: Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 10 Analyzing the Association Between Categorical Variables
Contingency Tables: Independence and Homogeneity
Overview and Chi-Square
15.1 Goodness-of-Fit Tests
Inference for Relationships
Chapter 11 Chi-Square Tests.
Inference on Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Lesson 11 - R Chapter 11 Review:
Paired Samples and Blocks
Analyzing the Association Between Categorical Variables
Chapter 26 Comparing Counts.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
Section 11-1 Review and Preview
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Inference for Two Way Tables
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
13.1 Test for Goodness of Fit
Chapter 26 Comparing Counts.
CHAPTER 11 Inference for Distributions of Categorical Data
Chapter 11 Chi-Square Tests.
CHAPTER 11 Inference for Distributions of Categorical Data
Presentation transcript:

CHAPTER 26 Comparing Counts

Goodness-of-Fit A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test. Is one distribution comparable to another or does one distribution fit another? As usual, there are assumptions and conditions to consider…

Assumptions and Conditions Counted Data Condition: Check that the data are counts for the categories of a categorical variable. Independence Assumption: Randomization Condition: The individuals who have been counted should be a random sample from some population. Sample Size Assumption: We must have enough data for the methods to work. Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell.

Calculations Since we want to examine how well the observed data reflect what would be expected, we look at the differences between the observed and expected counts (Obs – Exp). These differences are residuals, so we know that adding all of the differences will result in a sum of 0. We’ll handle the residuals as we did in regression, by squaring them. To get an idea of the relative sizes of the differences, we will divide these squared quantities by the expected values.

Calculations (cont.) The test statistic, called the chi-square (or chi-squared) statistic, is found by adding up the sum of the squares of the deviations between the observed and expected counts divided by the expected counts:

Calculations (cont.) The chi-square models are actually a family of distributions indexed by degrees of freedom (much like the t-distribution). The number of degrees of freedom for a goodness-of-fit test is n – 1, where n is the number of categories.

One-Sided or Two-Sided? The chi-square statistic is used only for testing hypotheses. If the observed counts don’t match the expected, the statistic will be large—it can’t be “too small.” So the chi-square test is always one-sided. If the calculated value is large enough, we’ll reject the null hypothesis.

The Chi-Square Calculation Find the expected values: Every model gives a hypothesized proportion for each cell. The expected value is the product of the total number of observations times this proportion. Compute the residuals: Once you have expected values for each cell, find the residuals, Observed – Expected. Square the residuals.

The Chi-Square Calculation (cont.) Compute the components. Now find the components for each cell. Find the sum of the components (that’s the chi-square statistic).

The Chi-Square Calculation (cont.) Find the degrees of freedom. It’s equal to the number of cells minus one. Test the hypothesis. Use your chi-square statistic to find the P-value. (Remember, you’ll always have a one-sided test.) Large chi-square values mean lots of deviation from the hypothesized model, so they give small P-values.

Example Trix cereal comes in five fruit flavors. A student sorted an entire box of the cereal and found the following distribution of flavors for the pieces of cereal in the box: Test the null hypothesis that the flavors are uniformly distributed versus the alternative that they are not. Flavor Grape Lemon Lime Orange Strawberry Freq 530 470 420 610 585

1. State the hypotheses. 2. Model

1. State the hypotheses. 2. Model Chi-squared goodness of fit Counted data Randomization – assume random box of cereal Sample Size – all expected counts are 5 or more

3. Mechanics

3. Mechanics Chi-squared = Flavor Grape Lemon Lime Orange Strawberry EC 523

4. Conclusion Since our p-value of 0 is less than .05, we reject the null hypothesis that states the flavors are uniformly distributed. We have sufficient evidence to prove that the flavors are not uniformly distributed.

M&M Activity Due Next Class ASSIGNMENT A#4/3 p. 628 #3, 5, 6, 7 M&M Activity Due Next Class

Comparing Observed Distributions A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square test of homogeneity. A test of homogeneity is actually the generalization of the two-proportion z-test.

Assumptions and Conditions The assumptions and conditions are the same as for the chi-square goodness-of-fit test: Counted Data Condition: The data must be counts. Randomization Condition: As long as we don’t want to generalize to the larger population, we don’t have to check this condition. Expected Cell Frequency Condition: The expected count in each cell must be at least 5.

Calculations To find the expected counts, we multiply the row total by the column total and divide by the grand total. We calculated the chi-square statistic as we did in the goodness-of-fit test: In this situation we have (R – 1)(C – 1) degrees of freedom, where R is the number of rows and C is the number of columns. We’ll need the degrees of freedom to find a P-value for the chi-square statistic.

Example Medical researchers enlisted 90 subjects for an experiment comparing treatments for depression. The subjects were randomly divided into three groups and given pills to take for a period of three months. One group received a placebo, the second group St. John’s wort, and the third group the prescription drug Posrex. After six months, psychologists and physicians evaluated the subjects to see if their depression had returned.

Perform a chi-squared test of homogeneity on the above data. Diagnosis Placebo StJ W Posrex Depression returned 24 22 14 No Sign of Depression 6 8 16 Perform a chi-squared test of homogeneity on the above data.

1. State the hypotheses. 2. Model

1. State the hypotheses. 2. Model Chi-squared test for homogeneity Counted data Randomization – subjects were randomly assigned to groups Expected Cell Frequency – All expected counts are 5 or more?

3. Mechanics

3. Mechanics Diagnosis Placebo StJ W Posrex Depression returned 20 No sign of depression 10

4. Conclusion Since our p-value of .0149 is less than .05, we reject the null hypothesis that states the recurrence rates for all three treatments is the same. We have sufficient evidence to prove that the recurrence rates are different for the three treatments.

Independence Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other. A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table. A chi-square test of independence uses the same calculation as a test of homogeneity.

Assumptions and Conditions We still need counts and enough data so that the expected values are at least 5 in each cell. If we’re interested in the independence of variables, we usually want to generalize from the data to some population. In that case, we’ll need to check that the data are a representative random sample from that population.

Example A study of the career plans of young women and men sent questionnaires to all 724 members of the senior class in the College of Business Administration at the University of Illinois. One question asked which major within the business program the student had chosen. Is there a relationship between the gender of the student and their choice of major?

Female Male Accounting 68 56 Administration 91 40 Economics 5 8 Finance 61 59

1. State the hypotheses. 2. Model

1. State the hypotheses. 2. Model Chi-squared test for independence Counted data Random sample – sent to all students at one University, not fully random Expected Counts – all expected counts 5 or more?

3. Mechanics

3. Mechanics Female Male Accounting 71.907 52.093 Administration 75.966 55.034 Economics 7.5387 5.4613 Finance 69.588 50.412

4. Conclusion Since our p-value of 0.0069 is less than 0.05, we reject the null hypothesis that states there is no relationship between gender and the major that was chosen. We have sufficient evidence to prove that there is a relationship between gender and major.

What Can Go Wrong? Don’t use chi-square methods unless you have counts. Just because numbers are in a two-way table doesn’t make them suitable for chi-square analysis. Beware large samples. With a sufficiently large sample size, a chi-square test can always reject the null hypothesis. Don’t say that one variable “depends” on the other just because they’re not independent. Association is not causation.

Investigative Task Due April 15/16 ASSIGNMENT A#4/4 p. 629 #11, 12, 14, 16, 20 Investigative Task Due April 15/16