Chi-Squared Tutorial This is significantly important. Get your AP Equations and Formulas sheet.

The Purpose The chi-squared analysis exists to help us determine whether two sets of data have a significant difference. – Remember early in the semester when I said how scientists use the word “significant” only when they really mean it? This is one method to tell if you can use the word. Take a biostatistics course in college and you’ll learn a buttload more.

The Null Hypothesis Also recall that every experiment has a null hypothesis. – The “not very interesting” possibility, a.k.a. there is no difference between two sets of numbers. In order to accept your own hypothesis, you must reject the null hypothesis. – In other words, determine if the results are significant. The chi-squared test is one way to tell if you can do that.

An Example To resurrect this analogy and then kill it again, suppose you flip a coin 10 times. – You get 6 heads and 4 tails. – Is something fishy? That’s a 60% heads rate. If you flip 100 times and get 60 heads and 40 tails, that’s the same rate. – Now you might think something’s wrong. – But where do you draw the line? How many flips does it take? – Looks like you need one of them chi-squared tests. http://images.nationalgeographic.com/wpf/media-live/photos/000/002/cache/angler-fish_222_600x450.jpg

The Chi-Squared Test The Greek letter chi (χ) basically looks like an x, so the chi-squared test usually goes by the name χ². To perform the test, you need the following: – Data you observe (o). – Data you expect (e). – The degrees of freedom (df). For example, in the 100 flip test, you’d expect 50 heads, but you observed 60 heads.

The Chi-Squared Test: Step 1 Determine the difference between observed and expected numbers: 60 observed – 50 expected = 10 heads difference. Square the difference: 10² = 100. Divide by what you expected: 100/50 = 2. Do the same for all calculated “differences” and add them together. 40 observed – 50 expected = -10 tails difference, squared to 100, divided by 50 = 2. 2 + 2 = 4.
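The arithmetic in Step 1 can be sketched in a few lines of Python. This is only an illustration: the function name `chi_squared` and the variable `coin_stat` are made up for this example and are not part of the AP formula sheet.

```python
def chi_squared(observed, expected):
    """Add up (o - e)^2 / e over every outcome."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 100 flips: 60 heads and 40 tails observed, 50 of each expected.
coin_stat = chi_squared([60, 40], [50, 50])
print(coin_stat)  # 4.0
```

Each outcome (heads, tails) contributes its own (o - e)²/e term, and the statistic is their sum.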

The Chi-Squared Test: Step 2 That “4” we got as our answer is the calculated chi-squared statistic (χ² calc) for our test. – The higher this is relatively speaking, the less “random chance” can play a role. – It’s called “calculated” because…you just…calculated it. We will compare this statistic to another number to see if this indicates more variation than chance would suggest, or not.

The Chi-Squared Test: Step 2 The number to which you’ll compare the calculated χ² value is called the critical chi-squared value (χ² crit). To figure out how to get the critical value, you need to know one other thing – the degrees of freedom.

The Chi-Squared Test: Step 3 Degrees of freedom goes by “df” and represents…well…this is hard to explain. Let’s try this: – I flipped 100 times and got 60 heads. Once I know how many times I got heads, the number of times I got tails is a given. – As a result, though there are two outcomes, there is only one degree of freedom. Typically, df is the number of possible outcomes minus one.

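The Chi-Squared Test: Step 4 The p value is the probability of getting a difference at least this large through chance alone, assuming the null hypothesis is true. The cutoff it gets compared to is called alpha (α). Traditionally, scientists want 95% confidence that something is not caused by chance before they reject the null hypothesis. Therefore, we need a p value of 0.05 or less. p = 0.05 means a result this extreme would turn up only 5% of the time if chance were the only thing at work.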

The Chi-Squared Test: Step 5 Finally, you look up the value of χ² crit in a chi-squared analysis table. – Make sure your p value is 0.05 (or whatever is specified by the problem/experiment). Once you have both χ² crit and χ² calc, compare: χ² crit > χ² calc? Accept the null hypothesis. There is no significant difference. χ² crit ≤ χ² calc? Reject the null hypothesis. There’s something going on here.

The Chi-Squared Test: Step 5
df/prob.  0.99     0.95    0.90   0.80  0.70  0.50  0.30  0.20  0.10  0.05
1         0.00013  0.0039  0.016  0.06  0.15  0.46  1.07  1.64  2.71  3.84
2         0.02     0.10    0.21   0.45  0.71  1.39  2.41  3.22  4.60  5.99
3         0.12     0.35    0.58   1.00  1.42  2.37  3.66  4.64  6.25  7.82
4         0.3      0.71    1.06   1.65  2.20  3.36  4.88  5.99  7.78  9.49
5         0.55     1.14    1.61   2.34  3.00  4.35  6.06  7.29  9.24  11.07
Insignificant (accept null hypothesis) / Significant (reject null hypothesis)

The Chi-Squared Test: Step 6 At p = 0.05 (a result this extreme happens by chance only 5% of the time) and 1 df, χ² crit is 3.84, which is less than the “4” we got. Since χ² crit ≤ χ² calc, we can reject the null hypothesis. – Something’s up with this coin. Just so you know, doing this with 6/4 heads/tails leads to a χ² calc of 0.4, which is not a significant result. Let’s look at the table for χ² calc = 0.4.
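Here is one way the whole test could be strung together for the two coin examples, as a rough sketch. The critical values are copied from the p = 0.05 column of the table in these slides; the names `CRITICAL_AT_0_05` and `chi_squared_test` are invented for this illustration.

```python
# Critical values from the p = 0.05 column of the slide's table, keyed by df.
CRITICAL_AT_0_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}


def chi_squared_test(observed, expected):
    """Return (calculated statistic, critical value, reject the null?)."""
    calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1          # number of possible outcomes minus one
    crit = CRITICAL_AT_0_05[df]     # this sketch only covers df 1 through 5
    return calc, crit, calc >= crit  # reject when crit <= calc


hundred_flips = chi_squared_test([60, 40], [50, 50])
ten_flips = chi_squared_test([6, 4], [5, 5])

print("100 flips:", hundred_flips)  # statistic 4.0 vs 3.84, so reject
print("10 flips: ", ten_flips)      # statistic about 0.4 vs 3.84, so do not reject
```

Same 60% heads rate both times, but only the 100-flip result clears the 3.84 cutoff.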

The Chi-Squared Test: Step 5
df/prob.  0.99     0.95    0.90   0.80  0.70  0.50  0.30  0.20  0.10  0.05
1         0.00013  0.0039  0.016  0.06  0.15  0.46  1.07  1.64  2.71  3.84
2         0.02     0.10    0.21   0.45  0.71  1.39  2.41  3.22  4.60  5.99
3         0.12     0.35    0.58   1.00  1.42  2.37  3.66  4.64  6.25  7.82
4         0.3      0.71    1.06   1.65  2.20  3.36  4.88  5.99  7.78  9.49
5         0.55     1.14    1.61   2.34  3.00  4.35  6.06  7.29  9.24  11.07
Insignificant (accept null hypothesis) / Significant (reject null hypothesis)

6 Heads, 4 Tails Our χ² calc = 0.4 value corresponds to a p value somewhere between 0.70 and 0.50. – So a fair coin would give a result at least this lopsided more than half the time. Makes sense. Computer software can often calculate an exact p value for you, but for our purposes we’ll use tables.
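For the curious, here is a small sketch of what that software might do in the simplest case. It relies on a shortcut that holds only for 1 degree of freedom: the p value equals erfc(√(χ²/2)), where `math.erfc` is in Python's standard library. The function name `p_value_one_df` is made up for this example, and other df values need a statistics package instead.

```python
import math


def p_value_one_df(chi_sq):
    """Exact p value for a chi-squared statistic with 1 degree of freedom.

    Valid ONLY for df = 1; do not use this shortcut for other df values.
    """
    return math.erfc(math.sqrt(chi_sq / 2))


p_ten = p_value_one_df(0.4)      # 6 heads, 4 tails
p_hundred = p_value_one_df(4.0)  # 60 heads, 40 tails

print(round(p_ten, 3), round(p_hundred, 3))
```

The 6/4 result comes out a little above 0.5, inside the 0.50 to 0.70 range read off the table, and the 60/40 result comes out just under 0.05.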

Chi-Squared Summary o is “observed” – What you found. e is “expected” – What you would have gotten if there were no difference. Σ (sigma) means “sum of” – Add all the (o − e)²/e results together. Look up what you get for χ² on a chi-squared table in the row with the right “degrees of freedom”, under p = 0.05. – If your χ² value is higher, it’s a significant difference! – If not, find the closest p value.
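Written out as one formula, the summary looks like this. The index i is just a label for each possible outcome (heads, tails, and so on), and the second line plugs in the 100-flip coin numbers from earlier.

```latex
\chi^2 = \sum_i \frac{(o_i - e_i)^2}{e_i}
\qquad\text{e.g.}\qquad
\chi^2 = \frac{(60 - 50)^2}{50} + \frac{(40 - 50)^2}{50} = 2 + 2 = 4
```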

Scientific Example: Chantix™ Remember Chantix? The anti-smoking drug we discussed earlier in the year? – How do we relate this to chi-squared testing? First, what’s the null hypothesis? – Chantix has no effect on smoking cessation. The observed data? – How many smokers quit. The expected data? – How many smokers quit…on a placebo. Degrees of freedom? – One. You either quit or you don’t.

Chi-Squared Takeaways χ² increases with greater differences between data sets. So, to be confident it is not a chance effect, you need a χ² value from the test that is at least as big as the one listed on the table. With more degrees of freedom, you need an even larger difference between the data sets. Now let’s get to some M&Ms…

M&M Chi-Squared Activity Here’s the idea: – Mars says they measure out how many M&Ms of various colors are in a bag. – But are they really all equal? How can we tell? Perform a chi-squared test to find out! – Count the number of each color in your bag. – Convert the given percentages to numbers (no rounding necessary). – Complete the test and find out if your bag is significantly different from what Mars calls standard. – Note: We will pool all our data for the second half of the lab during the next class.
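If you want to check your hand calculation afterward, the activity could be sketched like this. Every number below is a HYPOTHETICAL placeholder, not Mars's real percentages and not real class data: swap in the percentages you are given and the counts from your own bag. The function name `m_and_m_test` is also invented for this example, and the critical value 11.07 assumes six colors (df = 5) at p = 0.05.

```python
def m_and_m_test(claimed_percent, counted, critical):
    """Compare one bag's color counts to the claimed percentages."""
    total = sum(counted.values())
    # Convert percentages to expected counts; no rounding, decimals are fine.
    expected = {color: total * pct / 100 for color, pct in claimed_percent.items()}
    calc = sum((counted[c] - expected[c]) ** 2 / expected[c] for c in counted)
    df = len(counted) - 1
    return expected, calc, df, calc >= critical


# HYPOTHETICAL placeholder percentages (must add up to 100).
claimed_percent = {"blue": 20, "orange": 20, "green": 20,
                   "yellow": 15, "red": 15, "brown": 10}
# HYPOTHETICAL placeholder counts from one imaginary bag.
counted = {"blue": 12, "orange": 9, "green": 11,
           "yellow": 8, "red": 7, "brown": 8}

# 11.07 is the table value for df = 5 at p = 0.05.
expected, calc, df, significant = m_and_m_test(claimed_percent, counted, 11.07)
print("expected counts:", expected)
print("chi-squared calc:", round(calc, 2), "df:", df, "significant:", significant)
```

With these made-up numbers the bag is not significantly different from the claimed mix; your real bag may tell a different story.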