Anova and contingency tables

1 Anova and contingency tables
Week 12 Anova and contingency tables

2 Two categorical variables
Joint probabilities: p(x, y) = P(X = x and Y = y), the proportion of the population with values (x, y). Example: school performance and weight of children.

3 Conditional probabilities
Proportion within row

4 Conditional probabilities
School performance and weight of children. Weight & performance are independent (same conditional proportions in every row).
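
As a rough illustration of the last three slides, the sketch below turns a table of joint probabilities into row-conditional probabilities and checks whether every row has the same conditional distribution. The numbers and category labels are hypothetical (the slide's own table is not in this transcript), and the function names are made up for the example.

```python
from fractions import Fraction

# Hypothetical joint probabilities p(x, y); NOT the values from the slides.
# Rows are weight categories, columns are school performance.
joint = {
    "underweight": {"poor": Fraction(1, 20), "good": Fraction(3, 20)},
    "normal": {"poor": Fraction(3, 20), "good": Fraction(9, 20)},
    "overweight": {"poor": Fraction(1, 20), "good": Fraction(3, 20)},
}


def conditional_rows(joint_table):
    """Return P(Y = y | X = x): each row divided by its row total."""
    result = {}
    for x, row in joint_table.items():
        row_total = sum(row.values())
        result[x] = {y: p / row_total for y, p in row.items()}
    return result


def is_independent(joint_table):
    """X and Y are independent when every row has the same conditional distribution."""
    rows = list(conditional_rows(joint_table).values())
    return all(row == rows[0] for row in rows)


conditionals = conditional_rows(joint)
print(conditionals["normal"])   # conditional distribution within the "normal" row
print(is_independent(joint))
```

Exact fractions are used so that the equality check between rows is not affected by floating-point rounding.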

5 Independence from sample?
214 child skiers classified by skiing ability and whether they got injured. Are ability and injury independent in the underlying population?

6 Independence from sample?
Conditional sample proportions. Is there independence in the underlying population?

7 Testing for independence
Can a relationship observed in the sample data be inferred to hold in the population represented by the data? Could observed sample relationship have occurred by chance?

8 Expected counts — independence
31 out of 214 injured overall. Under independence, expect 31/214 of the 80 beginners to be injured, i.e. expect 80 × 31/214 = 11.59 injured beginners.

9 Expected counts — independence
General formula: expected count = (row total × column total) / overall total

               Injured   Uninjured   Total
Beginner                                80
Intermediate                            93
Advanced                                41
Total               31         183     214

10 Observed and estimated counts
               Injured      Uninjured    Total
Beginner       20 (11.59)   60 (68.41)      80
Intermediate    9 (13.47)   84 (79.53)      93
Advanced        2 (5.94)    39 (35.06)      41
Total          31           183            214

Are the differences more than would be expected by chance?
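
The expected counts in the table can be reproduced from the margins alone. The sketch below is one way to do it, using the observed skier counts from the table; the function name `expected_counts` is just an illustrative choice.

```python
# Observed counts from the skiing table (rows: ability, columns: injury status).
observed = {
    "Beginner": {"Injured": 20, "Uninjured": 60},
    "Intermediate": {"Injured": 9, "Uninjured": 84},
    "Advanced": {"Injured": 2, "Uninjured": 39},
}


def expected_counts(table):
    """Expected count under independence = row total * column total / overall total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    overall = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / overall for col in col_totals}
        for row in table
    }


expected = expected_counts(observed)
for row, cells in expected.items():
    print(row, {col: round(value, 2) for col, value in cells.items()})
```

Rounded to two decimals, the results should match the bracketed values in the table, for example 11.59 expected injured beginners.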

11 Chi-squared test of independence
H0: independence of injury & experience
HA: association between injury & experience
or equivalently
H0: P(injury | beginner) = P(injury | intermediate) = ...
HA: P(injury | experience) depends on experience
Test statistic: χ² = Σ (observed − expected)² / expected, summed over all cells of the table

12 Chi-squared test of independence
Small values of χ² are consistent with independence. Big values arise when observed counts are very different from what would be expected under independence.
p-value = Prob(χ² as big as obtained) if independent. This is a tail area of the chi-squared distribution.
d.f. of chi-squared = (rows − 1)(cols − 1)

13 Chi-squared distributions
Distributions skewed to the right. Minimum value is 0. Indexed by the degrees of freedom.

14 Skiing injury and experience
Chi-Square Test: Injured, Uninjured
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
Total Chi-Sq = , DF = 2, P-Value = 0.003
p-value = 0.003. Strong evidence that the chance of injury is related to experience.
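
The reported p-value can be checked by hand from the observed skier counts. The sketch below sums (observed − expected)² / expected over the six cells and then uses the fact that, for 2 degrees of freedom specifically, the upper tail area of the chi-squared distribution is exp(−x/2). That shortcut does not hold for other degrees of freedom, where a statistics library or table would be needed. The function name is illustrative only.

```python
import math

# Observed counts: rows are Beginner, Intermediate, Advanced;
# columns are Injured, Uninjured.
observed = [
    [20, 60],
    [9, 84],
    [2, 39],
]


def chi_squared_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / overall
            statistic += (obs - exp) ** 2 / exp
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


chi_sq, df = chi_squared_statistic(observed)

# Closed-form tail area, valid ONLY when df == 2.
if df != 2:
    raise ValueError("exp(-x/2) tail formula applies to df = 2 only")
p_value = math.exp(-chi_sq / 2)

print(f"chi-squared = {chi_sq:.2f}, df = {df}, p-value = {p_value:.3f}")
```

The p-value printed this way rounds to 0.003, in line with the output quoted on the slide.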

15 Ear Infections and Xylitol
Experiment: n = 533 children randomized to 3 groups
Group 1: Placebo Gum; Group 2: Xylitol Gum; Group 3: Xylitol Lozenge
Response = Did child have an ear infection?

16 Ear Infections and Xylitol
Moderately strong evidence of differences between the probabilities of infection in the three groups

17 Making friends
With whom do you find it easiest to make friends: opposite sex, same sex or no difference?

18 Making friends
H0: No difference in distribution of responses of men and women (no relationship between gender & response)
HA: Difference in distribution of responses of men and women (association between gender & response)
Chi-Square Test: Opposite sex, Same sex, No difference
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts
Chi-Sq = 8.515, DF = 2, P-Value = 0.014
Fairly strong evidence of a difference between females (1) & males (2). Females more likely to choose opposite sex.

19 Comparing means of 3+ groups
Do best students sit in the front of a classroom? Seat location and GPA for n = 384 students. Students sitting in the front generally have slightly higher GPAs than others. Chance?

20 Seat location and GPA
H0: μ1 = μ2 = μ3
HA: The means are not all equal.
p-value =
Such big differences between sample means unlikely if popn means were same. Extremely strong evidence that means are not all same.

21 Seat location and GPA
95% CIs for separate means:
Main difference seems to be between front and others

22 Assumptions for F-test
Independent random samples. Normal distribution within each population. Perhaps different population means. Same standard deviation, σ, in each group. Can still proceed if n is big or assumptions approx hold.

23 F ratio
More evidence of a real difference when:
Group means are far apart
Variability within groups is small
How do you measure:
Variation between means?
Variation within groups?

24 Variation between means
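Between-groups sum of squares: SSGroups = Σ n_i (x̄_i − x̄)², summed over the k groups, where n_i and x̄_i are the size and mean of group i and x̄ is the overall mean
Mean sum of squares for groups (k groups): MSGroups = SSGroups / (k − 1)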

25 Variation within groups
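Within-groups sum of squares, also called residual sum of squares: SSError = Σ (x_ij − x̄_i)², summed over every observation x_ij in every group i, where x̄_i is the mean of group i
Mean residual sum of squares: MSError = SSError / (N − k), where N is the total number of observations and k the number of groups
Best estimate of error st devn, σ: s = √MSError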

26 Total variation
Total sum of squares = SSTotal
SSTotal = SSGroups + SSError

27 Analysis of variance table
F test is based on the F ratio: F = MSGroups / MSError (mean sum of squares for groups divided by mean residual sum of squares)
p-value = Prob of such a high F ratio if all means same (p-value found from an ‘F distribution’)
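
To make the pieces of the anova table concrete, the sketch below computes the between-groups, within-groups and total sums of squares, the two mean squares and the F ratio for a tiny made-up data set. The GPA values are hypothetical (the seat-location data are not in this transcript), and `one_way_anova` is an illustrative helper rather than any library's function. It stops at the F ratio; the p-value would come from an F distribution with (k − 1, N − k) degrees of freedom.

```python
# Hypothetical GPAs for three seat locations; NOT the data from the slides.
groups = {
    "front": [3.5, 3.0, 4.0],
    "middle": [3.0, 2.5, 3.5],
    "back": [2.5, 3.0, 2.0],
}


def one_way_anova(data):
    """Return the sums of squares, mean squares and F ratio for one-way anova."""
    all_values = [x for values in data.values() for x in values]
    n_total = len(all_values)
    k = len(data)
    grand_mean = sum(all_values) / n_total

    ss_groups = 0.0  # variation between group means
    ss_error = 0.0   # variation within groups (residual)
    for values in data.values():
        group_mean = sum(values) / len(values)
        ss_groups += len(values) * (group_mean - grand_mean) ** 2
        ss_error += sum((x - group_mean) ** 2 for x in values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {
        "SSGroups": ss_groups,
        "SSError": ss_error,
        "SSTotal": ss_total,
        "MSGroups": ms_groups,
        "MSError": ms_error,
        "F": ms_groups / ms_error,
    }


table = one_way_anova(groups)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

For these made-up numbers the total sum of squares splits evenly into the between-groups and within-groups parts, which makes the identity SSTotal = SSGroups + SSError easy to verify by eye.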

28 Seat location and GPA (again)
H0: μ1 = μ2 = μ3
HA: The means are not all equal.
p-value = P(F ≥ 6.69) under H0 =
Such a big F ratio unlikely if popn means were same. Extremely strong evidence that means are not all same.

