# Warm up On slide.

Warm up On slide

Section 11.1 Chi-Square

Inference Summary (Hypothesis Tests and Confidence Intervals)

| Means | Proportions |
| --- | --- |
| One-sample z procedures | One-proportion z procedures |
| One-sample t procedures | Two-proportion z procedures |
| Matched pairs t procedures | |
| Two-sample t procedures | |

The questions then are…

- What if we want to compare MORE than 2 proportions? For example, let’s examine the proportion of high school students who go on to four-year colleges. Is that proportion different based on race (White, African American, Asian, Hispanic)? We’d be comparing 4 proportions!
- What if we want to check actual results against a predicted model? For example, we want to assess the results of mating two red-eyed fruit flies by comparing the actual results to the predicted model.
- What if we want to compare two categorical variables to see if there is a relationship? For example, is smoking behavior (current smoker, former smoker, never smoked) associated with socioeconomic status (high, medium, low)?

Chi (χ) is pronounced like KITE without the “te.”

Then there were three

There are three types of chi-square tests:

- Goodness of fit
- Homogeneity of proportions
- Association / independence

Today our focus will be the Chi-Squared Goodness of Fit test.

Goodness of Fit

The chi-squared goodness of fit test determines whether an observed sample distribution is significantly different from the hypothesized distribution. The idea is to compare the observed count in each category to the expected count for that category based on the hypothesized distribution.

H0: The specified distribution of the categorical variable is correct.
Ha: The specified distribution of the categorical variable is not correct.

Conditions

Use the chi-squared test if:

- The data come from an SRS (simple random sample).
- All the expected counts are at least 1.
- No more than 20% of expected counts are less than 5.
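The two expected-count conditions above can be checked mechanically. The following is a minimal sketch; the function name `expected_counts_ok` and the sample numbers are made up for illustration and do not come from the slides. The SRS condition depends on how the data were collected and cannot be checked in code.

```python
def expected_counts_ok(expected):
    """Return True when the expected counts meet both count conditions."""
    # Condition 1: every expected count is at least 1.
    all_at_least_one = all(e >= 1 for e in expected)
    # Condition 2: no more than 20% of the expected counts are below 5.
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Hypothetical expected counts: one of five categories is below 5 (exactly 20%).
print(expected_counts_ok([12.0, 7.5, 4.2, 6.3, 10.0]))
# Hypothetical expected counts: two of three categories are below 5.
print(expected_counts_ok([3.0, 4.0, 10.0]))
```

If the check fails, a common remedy is to combine small categories before running the test.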

Mars, Incorporated makes milk chocolate candies. Here’s what the company’s Consumer Affairs Department says about the color distribution of its M&M’S Milk Chocolate Candies: “On average, the new mix of colors of M&M’S Milk Chocolate Candies will contain 13 percent of each of browns and reds, 14 percent yellows, 16 percent greens, 20 percent oranges and 24 percent blues.”

The one-way table below summarizes the data from a sample bag of M&M’S Milk Chocolate Candies. In general, one-way tables display the distribution of a categorical variable for the individuals in a sample.

Since the company claims that 24% of all M&M’S Milk Chocolate Candies are blue, a sample proportion of blue candies far from 0.24 might make us believe that something fishy is going on. We could use the one-sample z test for a proportion from Chapter 9 to test the hypotheses

H0: p = 0.24
Ha: p ≠ 0.24

where p is the true population proportion of blue M&M’S. We could then perform additional significance tests for each of the remaining colors. Running a separate test for every color is inefficient, though, and each extra test adds another chance of a false alarm, which is why a single test of the whole distribution is preferable.

Hypotheses

The null hypothesis in a chi-square goodness-of-fit test should state a claim about the distribution of a single categorical variable in the population of interest. In our example, the appropriate hypotheses are

H0: The company’s stated color distribution for M&M’S Milk Chocolate Candies is correct.
Ha: The company’s stated color distribution for M&M’S Milk Chocolate Candies is not correct.

We can also write the hypotheses in symbols as

H0: $p_{\text{blue}} = 0.24$, $p_{\text{orange}} = 0.20$, $p_{\text{green}} = 0.16$, $p_{\text{yellow}} = 0.14$, $p_{\text{red}} = 0.13$, $p_{\text{brown}} = 0.13$
Ha: At least one of the $p_i$’s is incorrect

where $p_{\text{color}}$ = the true population proportion of M&M’S Milk Chocolate Candies of that color.

The formula

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Remember, Σ means sum. So compute this quantity for each category and add them all up!
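To make the summation concrete, here is a minimal sketch of the statistic as a function. The name `chi_square_statistic` and the die-roll counts are illustrative assumptions, not data from the slides.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up example: 60 rolls of a die, so a fair die predicts 10 per face.
observed_rolls = [8, 12, 10, 10, 9, 11]
expected_rolls = [10, 10, 10, 10, 10, 10]
statistic = chi_square_statistic(observed_rolls, expected_rolls)
print(f"chi-square = {statistic:.2f}")  # contributions: 0.4, 0.4, 0, 0, 0.1, 0.1
```

Each term is never negative, so large values of the statistic mean the observed counts sit far from the expected counts.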

P-value = 0.0703 (for the M&M’S goodness-of-fit test)

Example

Back in 1980, the US population had the following distribution by age:

| Age Group | Percent of the Population |
| --- | --- |
| 0 to 24 | 41.39% |
| 25 to 44 | 27.68% |
| 45 to 64 | 19.64% |
| 65 and older | 11.28% |

1996…

Suppose I take a sample of 500 US residents in 1996 and find the following distribution:

| Age Group | Count |
| --- | --- |
| 0 to 24 | 177 |
| 25 to 44 | 158 |
| 45 to 64 | 101 |
| 65 and older | 64 |
| Total | 500 |

I want to know: does the distribution of my sample in 1996 match the distribution of age from 1980?

Let’s Compare:

| Age Group | Observed (based on sample of 500) | Expected (based on 1980 percentage × 500) |
| --- | --- | --- |
| 0-24 | 177 | 206.95 |
| 25-44 | 158 | 138.4 |
| 45-64 | 101 | 98.2 |
| 65+ | 64 | 56.4 |

Help me fill in the last column!
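The expected column is just each 1980 percentage applied to the sample size of 500. A short sketch of that arithmetic follows; the variable names are illustrative.

```python
sample_size = 500

# 1980 age distribution, in percent, as given in the example.
percent_1980 = {
    "0 to 24": 41.39,
    "25 to 44": 27.68,
    "45 to 64": 19.64,
    "65 and older": 11.28,
}

# Expected count = (percent / 100) * sample size, rounded for display.
expected = {
    group: round(percent / 100 * sample_size, 2)
    for group, percent in percent_1980.items()
}

for group, count in expected.items():
    print(f"{group}: {count}")
```

The expected counts add up to 499.95 rather than 500 because the published percentages total 99.99%, a rounding artifact of the source figures.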

We see that the distributions are different. The question is ARE THEY SIGNIFICANTLY DIFFERENT?
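One way to answer that question is to turn the observed and expected counts into a chi-square statistic and a P-value. The sketch below uses only Python's standard library. The helper `chi_square_p_value` is a hypothetical name, and the rule df = number of categories − 1 is the standard goodness-of-fit rule, assumed here because the slides do not state it.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail area P(X >= statistic) for a chi-square curve, integer df >= 1."""
    x = statistic / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-x)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(x))
    # Raise the shape parameter one step at a time until it reaches df / 2.
    while a < df / 2.0:
        tail += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return tail


observed = [177, 158, 101, 64]          # 1996 sample counts
expected = [206.95, 138.4, 98.2, 56.4]  # 1980 percentages times 500
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # assumed rule: categories minus 1
p_value = chi_square_p_value(statistic, df)
print(f"chi-square = {statistic:.2f}, df = {df}, P-value = {p_value:.4f}")
```

With these counts the statistic comes out near 8.2 on 3 degrees of freedom and the P-value is a little above 0.04, so the 1996 sample differs significantly from the 1980 distribution at the 5% level but not at the 1% level.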

Characteristics of the Chi-Squared Statistic

- Chi-square is ALWAYS (always? Yes, always) skewed RIGHT.
- As the degrees of freedom increase, the graph becomes less skewed. It becomes more symmetric and looks more like a normal curve.
- The total area under a chi-square curve is 1. WHY?
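The first two characteristics can be seen in numbers. A standard result, not stated on the slides and used here as an assumption, is that a chi-square distribution with df degrees of freedom has skewness √(8/df). That value is always positive (skewed right) and shrinks toward 0 as df grows. A small sketch:

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)


# Skewness stays positive but gets smaller as df increases.
for df in (1, 3, 5, 10, 30, 100):
    print(f"df = {df:3d}  skewness = {chi_square_skewness(df):.3f}")
```

A perfectly symmetric curve such as the normal curve has skewness 0, which is the value these numbers approach.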

In the Calculator

1. Put Observed in L1 and Expected in L2.
2. Press Stat, Test, χ² GOF-Test.
3. Enter your df.

CAUTION!!!! You still need to know how to use the formula and table… Sometimes your calculator will give you an error! This happened in the 2008 Free Response!

How to recognize χ² Goodness of Fit

You are given percentages for several categories of one variable, and you want to know whether your sample matches that distribution.

Homework: Chapter 11 #9, 10, 13(a-c), 15, 19-22 (explain)