Finish ANOVA and then Chi-Square
Fcrit Table A-5: 4 pages of values. Left-hand column: df denominator, the df for MSW = n - k, where n is the total number of observations and k is the number of groups. Next column: area in the upper tail. This is an alpha. Across the top: df numerator, the df for MSA = k - 1.
Conclusions for ANOVA: Look at the graph of the F distribution on page 290. If Fcalc is greater than Fcrit, we reject the null hypothesis that the means of the populations for the three or more groups are equal. We can accept the hypothesis that at least one of the groups comes from a different population.
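The comparison of Fcalc with Fcrit can be sketched in a few lines of Python. This is only an illustration: the three groups are made-up numbers, the function name `one_way_anova` is hypothetical, and the critical value 5.14 is the usual table entry for alpha = 0.05 with df = (2, 6), which you should confirm against Table A-5.

```python
def one_way_anova(groups):
    """Return (F, df_numerator, df_denominator) for a list of groups."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Sum of squares among groups and within groups
    ss_among = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_num = k - 1                        # df for MSA
    df_den = n - k                        # df for MSW
    msa = ss_among / df_num
    msw = ss_within / df_den
    return msa / msw, df_num, df_den


# Hypothetical data: three groups of three measurements each
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_calc, df_num, df_den = one_way_anova(groups)

f_crit = 5.14  # assumed table value for alpha = 0.05, df = (2, 6)
reject_null = f_calc > f_crit
print(f_calc, df_num, df_den, reject_null)
```

With these made-up groups Fcalc is far above Fcrit, so the null hypothesis of equal means would be rejected.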
Bonferroni Method: ANOVA tells us if there is a difference, but not whether all the groups come from different populations. Maybe two or more are from the same population and only one is different. How do we find out which one(s) are different? Bonferroni’s or Tukey’s method, or others. We aren’t going to do them until later in the course.
Degrees of Freedom: You may notice that in each of these cases we are still using n-1 for the degrees of freedom. Within the groups, take the group size minus one for each group and add them up; this equals n-k, where n is the total sample size. Among the groups, take the number of groups minus one; this equals k-1.
Any idea what degrees of freedom means? You can just remember n-1, or you can consider this: degrees of freedom means how many values are free to vary independently, given that the sum of the values is known. Once we have the sum and n-1 of the values, the last one is fixed, not free. It can be a sum or a mean or whatever that fixes that last value.
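A tiny Python sketch of that idea, using made-up numbers and a hypothetical helper name `last_value`: once the total and n-1 of the values are known, the remaining value has no freedom left.

```python
def last_value(total, known_values):
    """Given the sum of all n values and n-1 of them, return the last one."""
    return total - sum(known_values)


# Hypothetical sample of n = 4 values whose sum is known to be 20
total = 20
known = [3, 8, 5]          # n - 1 = 3 values are free to be anything
fixed = last_value(total, known)
print(fixed)               # the fourth value is forced by the total
```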
When do we use df? One situation is the sample standard deviation, which uses the deviations around the mean. Since we have the mean, only n-1 values within the sample are “free” or “variable”.
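As a minimal sketch, the sample standard deviation can be computed by hand with n-1 in the denominator and checked against Python's built-in `statistics.stdev`, which uses the same n-1 convention. The data values and the name `sample_sd` are illustrative only.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_sq_dev / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up measurements
by_hand = sample_sd(data)
built_in = statistics.stdev(data)  # also divides by n - 1
print(by_hand, built_in)
```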
Chi Square Testing for Associations
NEW QUESTIONS: Up till now we looked for relationships by comparing the means of measurements from samples or populations. Here we are using counts instead of means of measurements.
A Non-parametric Test: Can be used for nominal data. Count cases within categories. Chi-square tests have many applications. We will use the method to test for independence of categories.
Hypotheses. NULL: The categories are independent; there is no association between them. ALTERNATIVE: The categories are related.
What are we trying to prove? As usual, we are trying to reject the null hypothesis. We are looking for evidence that events are related. (In other applications of chi-square, we may be looking for independence. Not in this course.)
Example: Outbreak of stomach upset on the day following a community picnic. At the doctor’s office, we note two facts: Does the person have an upset stomach? Was the person at the picnic? Set up a contingency table. What’s that?
The Table: set it up by counts or proportions
                 Upset stomach   Not upset   Totals
At picnic
Not at picnic
Totals
Null Hypothesis: The upset stomachs have nothing to do with the picnic. HA: Presence at the picnic is related to upset stomach. If H0 is correct, each event will be independent of the others. The proportion in each category will be just by “chance”.
Let’s leave the actual numbers. Consider the situation: general concepts of the chance of getting sick if you were at the picnic, and if you were not at the picnic.
Think it through a bit: 100 persons. 20 of them are sick, 80 are fine: 1/5th and 4/5ths. What about the proportions among those who were at the picnic? If the events are independent, then 1/5 will be sick and 4/5 will be fine. And the same for those not at the picnic.
                 Upset stomach   Not upset      Totals
At picnic        1/5th of 25     4/5ths of 25
Not at picnic    1/5th of 75     4/5ths of 75
Totals
We call these the “expected” values Of course, we set up mechanisms so that we calculate these expected values automatically. For any cell, we take the total of the row times the total of the column and divide by the grand total.
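The row-total times column-total rule can be sketched in Python as follows. The function name `expected_counts` is hypothetical; the totals are the ones from the 100-person illustration (25 at the picnic, 75 not; 20 sick, 80 fine).

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell: row total * column total / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


row_totals = [25, 75]   # at picnic, not at picnic
col_totals = [20, 80]   # upset stomach, not upset
expected = expected_counts(row_totals, col_totals)
for row in expected:
    print(row)
```

Each row of the result holds the expected counts for upset and not upset, in that order.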
                 Upset stomach   Not upset   Totals
At picnic                                    25
Not at picnic                                75
Totals
Compare “expected” frequencies with actual frequencies (expected in parentheses)
                 Upset stomach   Not upset   Totals
At picnic        6 (5)           19 (20)     25
Not at picnic    14 (15)         61 (60)     75
Totals           20              80          100
Sum of Deviations: How different is the actual from the expected? Use the formula (E – A)²/E for each cell and take the sum: χ² = Σ (E – A)²/E. The general form looks familiar: a deviation squared and divided.
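A minimal Python sketch of that sum, applied to the actual and expected counts from the 100-person comparison table (6, 19, 14, 61 against 5, 20, 15, 60). The function name `chi_square_statistic` is illustrative. The result for this illustration need not match a statistic calculated from other counts.

```python
def chi_square_statistic(observed, expected):
    """Sum of (E - A)^2 / E over every cell of the table."""
    return sum(
        (e - a) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for a, e in zip(obs_row, exp_row)
    )


observed = [[6, 19], [14, 61]]    # actual counts
expected = [[5, 20], [15, 60]]    # expected counts under independence
chi2_calc = chi_square_statistic(observed, expected)
print(round(chi2_calc, 4))
```

The four cell contributions here are 1/5, 1/20, 1/15 and 1/60, which add up to 1/3.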
Hypothesis Test for χ²: Use Table A8. Need a value for alpha, and degrees of freedom. For χ², degrees of freedom = (number of rows – 1) * (number of columns – 1). Calculate it for our example. In our example, a 2 × 2 table, df = 1.
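The degrees-of-freedom rule is a one-liner; a short sketch with a hypothetical function name `chi_square_df`:

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a contingency table (totals not counted)."""
    return (n_rows - 1) * (n_cols - 1)


df_picnic = chi_square_df(2, 2)   # the 2 x 2 picnic table
print(df_picnic)
```

Count only the category rows and columns, not the totals row or column.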
The test: For alpha = 0.05 and df = 1, χ²crit = 3.84. Our calculated χ² was 2.07. We have failed to reject the null hypothesis. We have not proven an association between the potato salad, hot dogs & beer & the upset stomachs.
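The decision can be sketched in Python using the two numbers quoted on the slide. The p-value helper is an addition, not part of the slides: it relies on the fact that a chi-square variable with 1 df is the square of a standard normal, so it applies only when df = 1. The name `chi2_upper_tail_df1` is hypothetical.

```python
import math


def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x); valid for df = 1 only."""
    return math.erfc(math.sqrt(x / 2.0))


alpha = 0.05
chi2_crit = 3.84    # table value for alpha = 0.05, df = 1
chi2_calc = 2.07    # calculated value quoted on the slide

reject_null = chi2_calc > chi2_crit
p_value = chi2_upper_tail_df1(chi2_calc)
print(reject_null, round(p_value, 3))
```

Because the calculated value is below the critical value (equivalently, the p-value of roughly 0.15 is above alpha), the null hypothesis is not rejected.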