Analysis of variance Tron Anders Moger 2006.31.10.


1 Analysis of variance Tron Anders Moger 2006.31.10

2 Comparing more than two groups
Up to now we have studied situations with:
- One observation per object: one group, or two groups
- Two or more observations per object
We will now study situations with one observation per object, and three or more groups of objects.
The most important question is, as usual: Do the numbers in the groups come from the same population, or from different populations?

3 ANOVA
If you have three groups, you could plausibly do pairwise comparisons. But what if you have 10 groups? That is too many pairwise comparisons: you would get too many false positives!
What you really want is to test a null hypothesis of all means being equal against the alternative of some difference.
ANOVA: ANalysis Of VAriance
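To see why pairwise comparisons break down, here is a small back-of-the-envelope calculation (not from the slides): with 10 groups there are 45 pairwise tests, and even when all means are truly equal, the chance of at least one false positive is large if the tests were (approximately) independent.

```python
from math import comb

# Number of pairwise two-sample tests among 10 groups
n_pairs = comb(10, 2)  # 45 comparisons

# Approximate chance of at least one false positive when every test
# uses alpha = 0.05 and the tests are treated as independent
fwer = 1 - 0.95 ** n_pairs
```

With 45 tests at the 5% level, the family-wise error rate is roughly 90%, which is why an overall test like ANOVA is preferred.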

4 One-way ANOVA: Example
Assume "treatment results" from 13 patients visiting one of three doctors are given:
- Doctor A: 24, 26, 31, 27
- Doctor B: 29, 31, 30, 36, 33
- Doctor C: 29, 27, 34, 26
H0: The means are equal for all groups (the treatment results are from the same population of results)
H1: The means are different for at least two groups (they are from different populations)

5 Comparing the groups
Averages within groups:
- Doctor A: 27
- Doctor B: 31.8
- Doctor C: 29
Total average: 29.46
The variance around the means matters for the comparison: we must compare the variance within the groups to the variance between the group means.

6 Variance within and between groups
Sum of squares within groups:
SSW = Σ_i Σ_j (x_ij − x̄_i)²   (sum over groups i and observations j within each group)
Compare it with the sum of squares between groups:
SSG = Σ_i n_i (x̄_i − x̄)²   (n_i observations in group i, overall mean x̄)
To compare these, we also need to take into account the number of observations and the sizes of the groups.

7 Adjusting for group sizes
Divide by the number of degrees of freedom:
MSW = SSW/(n − K) and MSG = SSG/(K − 1)
where n is the number of observations and K is the number of groups.
Test statistic: F = MSG/MSW; reject H0 if this is large.
Under H0, both MSW and MSG are estimates of the population variance of the errors.

8 Test statistic thresholds
If the populations are normal, with the same variance, then one can show that under the null hypothesis SSG/σ² and SSW/σ² are chi-square distributed with K−1 and n−K d.f.
Reject H0 at significance level α if F > F_{K−1,n−K,α}, the upper α point of the F distribution with K−1 and n−K degrees of freedom.
Find this value in the table on p. 871.

9 Continuing the example
MSG = 52.43/2 = 26.2 and MSW = 94.8/10 = 9.48, giving F = 26.2/9.48 = 2.77.
Since 2.77 < F_{2,10,0.05} = 4.10, we can NOT reject the null hypothesis in our case.
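The slides do the arithmetic by hand; as a cross-check (not part of the original slides, and assuming SciPy is installed), the same one-way ANOVA can be run in Python:

```python
from scipy import stats

# Treatment results for the three doctors (slide 4)
a = [24, 26, 31, 27]
b = [29, 31, 30, 36, 33]
c = [29, 27, 34, 26]

# One-way ANOVA: F statistic and p-value
f_stat, p_value = stats.f_oneway(a, b, c)

# Critical value F_{2,10,0.05}, instead of looking it up in the table
crit = stats.f.ppf(0.95, dfn=2, dfd=10)
```

Since f_stat (about 2.77) is below the critical value (about 4.10), equivalently p_value > 0.05, H0 is not rejected, matching the hand calculation.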

10 ANOVA table
Source of variation   Sum of squares   Deg. of freedom   Mean squares            F ratio
Between groups        SSG              K−1               MSG = SSG/(K−1)         MSG/MSW
Within groups         SSW              n−K               MSW = SSW/(n−K)
Total                 SST              n−1
NOTE: SST = SSG + SSW

11 Formulation of the model
H0: µ1 = µ2 = … = µK
X_ij = µ_i + ε_ij
Let G_i be the difference between the group mean and the population mean. Then:
G_i = µ_i − µ, or µ_i = µ + G_i
Giving X_ij = µ + G_i + ε_ij
And H0: G1 = G2 = … = GK = 0

12 One-way ANOVA in SPSS
Last column: the p-value, the smallest significance level α at which the null hypothesis is rejected.

13 One-way ANOVA in SPSS
Analyze → Compare Means → One-Way ANOVA
- Move the dependent variable to Dependent List and the group variable to Factor
- Choose Bonferroni in the Post Hoc window to get comparisons of all groups
- Choose Descriptive and Homogeneity of variance test in the Options window

14 Energy expenditure example
Let us say we have measurements of energy expenditure in three independent groups: anorectic, lean and obese.
We want to test H0: Energy expenditure is the same for anorectic, lean and obese.
Data for anorectic: 5.40, 6.23, 5.34, 5.76, 5.99, 6.55, 6.33, 6.21

15 SPSS output
We see that there is a difference between the groups, and also between which groups the differences lie!

16 Conclusion
There is a significant overall difference in energy expenditure between the three groups (p-value < 0.001).
There are also significant differences for all two-by-two comparisons of groups.

17 The Kruskal-Wallis test
ANOVA is based on the assumption of normality. There is a non-parametric alternative that does not rely on this assumption:
- Looking at all observations together, rank them
- Let R1, R2, …, RK be the sums of the ranks in each group
- If some R's are much larger than others, it indicates that the numbers in the different groups come from different populations

18 The Kruskal-Wallis test
The test statistic is
W = 12/(n(n+1)) · Σ_i R_i²/n_i − 3(n+1)
Under the null hypothesis, this has an approximate chi-square distribution with K−1 degrees of freedom. The approximation is OK when each group contains at least 5 observations.

19 Example: previous data
Doctor A         Doctor B         Doctor C
24 (rank 1)      29 (rank 6.5)    29 (rank 6.5)
26 (rank 2.5)    31 (rank 9.5)    27 (rank 4.5)
31 (rank 9.5)    30 (rank 8)      34 (rank 12)
27 (rank 4.5)    36 (rank 13)     26 (rank 2.5)
                 33 (rank 11)
R1 = 17.5        R2 = 48          R3 = 25.5
(We really have too few observations for this test!)
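The rank sums and the test statistic can be verified in Python (again a cross-check, assuming SciPy and NumPy are available; note that SciPy additionally applies a tie correction that the plain formula omits, so its statistic is slightly larger):

```python
import numpy as np
from scipy import stats

# Treatment results for the three doctors (slide 4)
a = [24, 26, 31, 27]
b = [29, 31, 30, 36, 33]
c = [29, 27, 34, 26]

# Rank all 13 observations together; ties get average ranks
ranks = stats.rankdata(np.array(a + b + c))
r1, r2, r3 = ranks[:4].sum(), ranks[4:9].sum(), ranks[9:].sum()

# Kruskal-Wallis statistic from the rank sums (formula on slide 18)
n = 13
w = 12 / (n * (n + 1)) * (r1**2 / 4 + r2**2 / 5 + r3**2 / 4) - 3 * (n + 1)

# SciPy's version, which also corrects for the tied ranks
h_stat, p_value = stats.kruskal(a, b, c)
```

Both versions give a statistic around 4.1–4.2 with a p-value above 0.05, so H0 is not rejected here either.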

20 Kruskal-Wallis in SPSS
Use Analyze → Nonparametric Tests → K Independent Samples
For our data, we get:

21 For the energy data: Same result as for one-way ANOVA! Reject H 0

22 When to use what method
In situations where we have one observation per object, and want to compare two or more groups:
- Use non-parametric tests if you have enough data:
  - For two groups: Mann-Whitney U-test (Wilcoxon rank sum)
  - For three or more groups: Kruskal-Wallis
- If data analysis indicates that the assumption of normally distributed independent errors is OK:
  - For two groups: t-test (equal or unequal variances assumed)
  - For three or more groups: ANOVA

23 When to use what method
When you, in addition to the main observation, have some observations that can be used to pair or block objects, want to compare groups, and the assumption of normally distributed independent errors is OK:
- For two groups, use the paired-data t-test
- For three or more groups, we can use two-way ANOVA

24 Two-way ANOVA (without interaction)
In two-way ANOVA, data fall into categories in two different ways: each observation can be placed in a table.
Example: Both doctor and type of treatment should influence the outcome.
Sometimes we are interested in studying both categories; sometimes the second category is used only to reduce unexplained variance (like an independent variable in regression!). Then it is called a blocking variable.
Compare means, just as before, but for different groups and blocks.

25 Data from exercise 17.46
Three types of aptitude tests (K=3) given to prospective management trainers: Profile Fit, Mindbender, Psych Out.
Each test type is given to members of each of four groups of subjects (H=4).

                 Test type
Subject type   Profile Fit   Mindbender   Psych Out
Poor           65            69           75
Fair           74            72           70
Good           64            68           78
Excellent      83            78           76

26 Sums of squares for two-way ANOVA
Assume K groups, H blocks, and assume one observation x_ij for each group i and each block j, so we have n = KH observations (independent!).
- Mean for group i: x̄_i· = (1/H) Σ_j x_ij
- Mean for block j: x̄_·j = (1/K) Σ_i x_ij
- Overall mean: x̄ = (1/n) Σ_i Σ_j x_ij
Model: X_ij = µ + G_i + B_j + ε_ij

27 Sums of squares for two-way ANOVA
SSG = H Σ_i (x̄_i· − x̄)²
SSB = K Σ_j (x̄_·j − x̄)²
SSE = Σ_i Σ_j (x_ij − x̄_i· − x̄_·j + x̄)²
SST = Σ_i Σ_j (x_ij − x̄)² = SSG + SSB + SSE

28 ANOVA table for two-way data
Source of variation   Sums of squares   Deg. of freedom   Mean squares             F ratio
Between groups        SSG               K−1               MSG = SSG/(K−1)          MSG/MSE
Between blocks        SSB               H−1               MSB = SSB/(H−1)          MSB/MSE
Error                 SSE               (K−1)(H−1)        MSE = SSE/((K−1)(H−1))
Total                 SST               n−1
Test for between-groups effect: compare MSG/MSE to F_{K−1,(K−1)(H−1)}
Test for between-blocks effect: compare MSB/MSE to F_{H−1,(K−1)(H−1)}
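The table entries for the 17.46 data can be computed directly with NumPy (a sketch reproducing the hand formulas, not SPSS output):

```python
import numpy as np

# Rows = subject type (blocks, H=4), columns = test type (groups, K=3)
x = np.array([
    [65, 69, 75],   # Poor
    [74, 72, 70],   # Fair
    [64, 68, 78],   # Good
    [83, 78, 76],   # Excellent
], dtype=float)
H, K = x.shape
n = H * K
grand = x.mean()
group_means = x.mean(axis=0)   # per test type
block_means = x.mean(axis=1)   # per subject type

ssg = H * ((group_means - grand) ** 2).sum()   # between groups
ssb = K * ((block_means - grand) ** 2).sum()   # between blocks
sst = ((x - grand) ** 2).sum()                 # total
sse = sst - ssg - ssb                          # error

msg = ssg / (K - 1)
msb = ssb / (H - 1)
mse = sse / ((K - 1) * (H - 1))
f_groups = msg / mse
f_blocks = msb / mse
```

For these data SSG ≈ 26.2, SSB = 170 and SSE = 162.5, so the group effect (F ≈ 0.48) is small while the block effect (F ≈ 2.09) is larger.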

29 Two-way ANOVA (with interaction)
The setup above assumes that the blocking variable influences outcomes in the same way in all categories (and vice versa).
We can check whether there is interaction between the blocking variable and the categories by extending the model with an interaction term.
This requires more observations per block. Other advantages: more precise estimates.

30 Data from exercise 17.46 cont'd
Each type of test was given three times for each type of subject.

                 Test type
Subject type   Profile Fit   Mindbender   Psych Out
Poor           65, 68, 62    69, 71, 67   75, 75, 78
Fair           74, 79, 76    72, 69, 69   70, 69, 65
Good           64, 72, 65    68, 73, 75   78, 82, 80
Excellent      83, 82, 84    78, 78, 75   76, 77, 75

31 Sums of squares for two-way ANOVA (with interaction)
Assume K groups, H blocks, and assume L observations x_ij1, x_ij2, …, x_ijL for each group i and each block j, so we have n = KHL observations (independent!).
- Mean for group i: x̄_i·· = (1/(HL)) Σ_j Σ_l x_ijl
- Mean for block j: x̄_·j· = (1/(KL)) Σ_i Σ_l x_ijl
- Mean for cell ij: x̄_ij· = (1/L) Σ_l x_ijl
- Overall mean: x̄ = (1/n) Σ_i Σ_j Σ_l x_ijl
Model: X_ijl = µ + G_i + B_j + I_ij + ε_ijl

32 Sums of squares for two-way ANOVA (with interaction)
SSG = HL Σ_i (x̄_i·· − x̄)²
SSB = KL Σ_j (x̄_·j· − x̄)²
SSI = L Σ_i Σ_j (x̄_ij· − x̄_i·· − x̄_·j· + x̄)²
SSE = Σ_i Σ_j Σ_l (x_ijl − x̄_ij·)²
SST = Σ_i Σ_j Σ_l (x_ijl − x̄)² = SSG + SSB + SSI + SSE

33 ANOVA table for two-way data (with interaction)
Source of variation   Sums of squares   Deg. of freedom   Mean squares              F ratio
Between groups        SSG               K−1               MSG = SSG/(K−1)           MSG/MSE
Between blocks        SSB               H−1               MSB = SSB/(H−1)           MSB/MSE
Interaction           SSI               (K−1)(H−1)        MSI = SSI/((K−1)(H−1))    MSI/MSE
Error                 SSE               KH(L−1)           MSE = SSE/(KH(L−1))
Total                 SST               n−1
Test for interaction: compare MSI/MSE with F_{(K−1)(H−1),KH(L−1)}
Test for block effect: compare MSB/MSE with F_{H−1,KH(L−1)}
Test for group effect: compare MSG/MSE with F_{K−1,KH(L−1)}
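The decomposition with an interaction term can also be sketched in NumPy on the replicated 17.46 data (again a hand-formula cross-check, not SPSS):

```python
import numpy as np

# x[block, group, replicate]: subject type x test type x 3 scores (slide 30)
x = np.array([
    [[65, 68, 62], [69, 71, 67], [75, 75, 78]],   # Poor
    [[74, 79, 76], [72, 69, 69], [70, 69, 65]],   # Fair
    [[64, 72, 65], [68, 73, 75], [78, 82, 80]],   # Good
    [[83, 82, 84], [78, 78, 75], [76, 77, 75]],   # Excellent
], dtype=float)
H, K, L = x.shape
n = H * K * L
grand = x.mean()
group_means = x.mean(axis=(0, 2))   # per test type
block_means = x.mean(axis=(1, 2))   # per subject type
cell_means = x.mean(axis=2)         # per (subject type, test type) cell

ssg = H * L * ((group_means - grand) ** 2).sum()
ssb = K * L * ((block_means - grand) ** 2).sum()
ssi = L * ((cell_means - (group_means - grand)[None, :]
            - (block_means - grand)[:, None] - grand) ** 2).sum()
sse = ((x - cell_means[:, :, None]) ** 2).sum()
sst = ((x - grand) ** 2).sum()

mse = sse / (K * H * (L - 1))
f_inter = (ssi / ((K - 1) * (H - 1))) / mse   # interaction F ratio
```

Note that the sums of squares add up exactly (SST = SSG + SSB + SSI + SSE), and the degrees of freedom in the table add up to n − 1.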

34 Two-way ANOVA in SPSS
Analyze → General Linear Model → Univariate
- Move the dependent variable (Score) to Dependent Variable
- Move test type and subject type to Fixed Factor(s)
- Under Options, you may check Descriptive Statistics and Homogeneity Tests; you can also get two-by-two comparisons by checking Bonferroni under Post Hoc
This gives you a full model (with interaction).

35 Some SPSS output
We see that there is a significant block effect, a significant group effect, and a significant interaction effect.
In plain words, this means that test scores differ between subject types and between the three tests, and that the difference between test types depends on which block you consider.
Equal variances can be assumed.

36 Two-by-two comparisons

37 Notes on ANOVA
All analysis of variance (ANOVA) methods are based on the assumptions of normally distributed and independent errors.
The same problems can be described using the regression framework. We get exactly the same tests and results!
There are many extensions beyond those mentioned here. In fact, the book only briefly touches on this subject; more material is needed in order to do two-way ANOVA on your own.

38 Next time: How to design a study?
- Different sampling methods
- Research designs
- Sample size considerations

