Copyright (c) Bani K. Mallick. STAT 651 Lecture #14.


1 STAT 651 Lecture #14

2 Topics in Lecture #14: The Kruskal-Wallis Test. Review of ANOVA. Theory for the ANOVA Table.

3 Book Sections Covered in Lecture #14: Chapter 8.6

4 Relevant SPSS Tutorials: Kruskal-Wallis

5 What you need for the ANOVA Table: Error sum of squares (SSE). Corrected Total sum of squares (CTSS). Between sum of squares: SSB = CTSS – SSE. Number of populations t: df1 = t – 1. Total number of observations n_T: df2 = n_T – t. Critical value from Table 8: F for α and df1 and df2.

6 The ANOVA Table: General Linear Model

Source            Sum of Squares   Degrees of freedom   Mean Square      F for equal means
Variable Name     SSB              df1 = t – 1          SSB/(t – 1)      [SSB/(t – 1)] / [SSE/(n_T – t)]
Error             SSE              df2 = n_T – t        SSE/(n_T – t)
Corrected Total   CTSS             n_T – 1
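
The entries of this table can be computed directly from grouped data. The following is a minimal sketch in plain Python, not the SPSS procedure used in the lecture; the function name `anova_table` and the data are made up for illustration.

```python
def anova_table(groups):
    """One-way ANOVA quantities from a list of samples (one list per population)."""
    pooled = [y for g in groups for y in g]
    t = len(groups)                 # number of populations
    n_total = len(pooled)           # total number of observations n_T
    grand_mean = sum(pooled) / n_total

    # Error sum of squares: squared deviations from each group's own mean.
    sse = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        sse += sum((y - mean_g) ** 2 for y in g)

    # Corrected total sum of squares: squared deviations from the grand mean.
    ctss = sum((y - grand_mean) ** 2 for y in pooled)
    ssb = ctss - sse                # between sum of squares, SSB = CTSS - SSE

    df1, df2 = t - 1, n_total - t
    f_stat = (ssb / df1) / (sse / df2)
    return {"SSB": ssb, "SSE": sse, "CTSS": ctss, "df1": df1, "df2": df2, "F": f_stat}


# Made-up data for three populations, for illustration only.
table = anova_table([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

The F statistic returned here is the one compared with the tabulated critical value for df1 and df2.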

7 What you need for the ANOVA Table: df1 = t – 1. df2 = n_T – t. Critical value from Table 8: F for α and df1 and df2. You compute the F statistic, and reject the hypothesis that the population means are equal at Type I error probability α if the F statistic exceeds this tabulated value.

8 Illustration: What you need for the ANOVA Table: df1 = t – 1 = 4. df2 = n_T – t = 25. α = 0.05. Critical value from Table 8: F for α and df1 and df2 = 2.76. You reject the hypothesis that the population means are equal at Type I error probability α if the F statistic exceeds 2.76.
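
The tabulated value 2.76 can be checked numerically. The sketch below is one way to do it in plain Python, assuming the standard identity that expresses the upper tail of the F distribution through the regularized incomplete beta function (evaluated here by a continued fraction). All function names are illustrative and are not part of the lecture or of SPSS.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies the even and then the odd term of the fraction.
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for an F distribution with df1 and df2 degrees of freedom."""
    return _betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Bisection search for the value whose upper-tail probability is alpha."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


crit = f_critical(0.05, 4, 25)
print(round(crit, 2))
```

The same `f_upper_tail` function gives a p-value for an observed F statistic, which is the quantity SPSS prints in its "Sig." column.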

9 Nonparametric Methods: As we found for 1-sample comparisons and 2-sample comparisons, there are also nonparametric methods for ANOVA. These are often called Kruskal-Wallis methods. The idea is the same as in the 2-sample problem. Remember it?

10 Nonparametric Methods: Replace each observation by its rank in the pooled data. Do the usual ANOVA F-test.
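
The two steps on this slide can be sketched in plain Python as follows. The function names and data are made up for illustration. The chi-square statistic that SPSS reports for the Kruskal-Wallis test is computed here as (n_T – 1)·SSB/CTSS on the ranks, an algebraically equivalent form of the usual H statistic; this is a sketch, not the SPSS implementation.

```python
def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0      # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Rank the pooled data, then run the one-way ANOVA arithmetic on the ranks."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    n_total, t = len(pooled), len(groups)

    # Split the pooled ranks back into their groups.
    ranked_groups, start = [], 0
    for g in groups:
        ranked_groups.append(ranks[start:start + len(g)])
        start += len(g)

    grand = sum(ranks) / n_total
    ctss = sum((r - grand) ** 2 for r in ranks)
    ssb = sum(len(rg) * (sum(rg) / len(rg) - grand) ** 2 for rg in ranked_groups)
    sse = ctss - ssb

    f_on_ranks = (ssb / (t - 1)) / (sse / (n_total - t))
    h_stat = (n_total - 1) * ssb / ctss     # Kruskal-Wallis chi-square statistic
    return f_on_ranks, h_stat


# Made-up data for three populations, for illustration only.
f_on_ranks, h_stat = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_on_ranks, h_stat)
```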

11 Female Concho Water Snakes, Ages 2-4, Tail Length: We need a method that allows for non-normal data!

12 Concho Water Snake Data for Females: Kruskal-Wallis Test

13 Concho Water Snake Data for Females

14 Nonparametric Methods: Once you have decided that the populations are different in their means, there is no version of an LSD. You simply have to do each comparison in turn. This is a bit of a pain in SPSS, because you physically must do each 2-population comparison, defining the groups as you go.
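
Outside SPSS, doing each 2-population comparison in turn is a short loop. The sketch below uses the Wilcoxon rank sum test with its normal approximation and assumes no tied values; the function name, group labels, and data are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist


def rank_sum_test(x, y):
    """Wilcoxon rank sum test, normal approximation, assuming no tied values."""
    pooled = sorted(x + y)
    rank_of = {value: i + 1 for i, value in enumerate(pooled)}
    w = sum(rank_of[value] for value in x)          # rank sum of the first sample
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = (n1 * n2 * (n1 + n2 + 1) / 12.0) ** 0.5
    z = (w - mean_w) / sd_w
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value


# Hypothetical samples keyed by population label.
samples = {
    "A": [1.1, 2.3, 3.8, 4.0],
    "B": [2.9, 5.2, 6.1, 7.4],
    "C": [6.5, 8.0, 9.3, 9.9],
}

# Every pair of populations is compared in turn.
results = {}
for name1, name2 in combinations(samples, 2):
    results[(name1, name2)] = rank_sum_test(samples[name1], samples[name2])
    print(name1, "vs", name2, results[(name1, name2)])
```

With t populations this runs t(t – 1)/2 tests, and no adjustment for multiple comparisons is made here.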

15 Nonparametric Methods: Illustrate Kruskal-Wallis in SPSS and then remind ourselves about how to do the 2-population comparisons.

16 Lipids Research Study: Four populations: Healthy, non-smokers; Healthy, smokers; CHD, non-smokers; CHD, smokers. Compared on the basis of cholesterol levels.

17 Lipid Research Study

18 Lipid Research Study

19 Lecture 14 Review: Residuals. Testing for Normality in ANOVA: I use the General Linear Model to define these residuals. Form the residuals, which are simply the differences of the data with their group mean. Then do a q-q plot. Useful if you have many groups with a small number of observations per group.
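
Forming the residuals and the coordinates of a q-q plot takes only a few lines. This is a minimal sketch in plain Python with made-up data; the plotting positions (i – 0.5)/n are one common convention and an assumption here, since the slides do not say which convention SPSS uses.

```python
from statistics import NormalDist, mean


def residuals(groups):
    """Each observation minus the mean of its own group."""
    out = []
    for g in groups:
        group_mean = mean(g)
        out.extend(y - group_mean for y in g)
    return out


def qq_points(resid):
    """Pairs (theoretical normal quantile, ordered residual) for a q-q plot."""
    n = len(resid)
    ordered = sorted(resid)
    # Plotting positions (i - 0.5)/n: an assumed convention, others exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Made-up data: several small groups pooled into one set of residuals.
groups = [[210, 225, 240], [198, 230, 244], [221, 229, 250]]
points = qq_points(residuals(groups))
for theoretical, observed in points:
    print(round(theoretical, 3), round(observed, 3))
```

If the residuals are roughly normal, the printed pairs fall close to a straight line when plotted.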

20 Lipid Research Study

21 Lipid Research Study: Note Table Entries

Tests of Between-Subjects Effects. Dependent Variable: Cholesterol

Source            Type III Sum of Squares   df    Mean Square    F          Sig.
Corrected Model   13450.180 (a)             3     4483.393       2.738      .043
Intercept         12637872.4                1     12637872.42    7716.573   .000
NCHDSMOK          13450.180                 3     4483.393       2.738      .043
Error             545373.126                333   1637.757
Total             17709567.0                337
Corrected Total   558823.306                336

a. R Squared = .024 (Adjusted R Squared = .015)
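
The table entries are tied together by simple arithmetic, which the following sketch checks. It reads the NCHDSMOK row as a sum of squares of 13450.180 on 3 df, the Error row as 545373.126 on 333 df, and the Corrected Total row as 558823.306 on 336 df; the variable names are illustrative.

```python
ss_model, df_model = 13450.180, 3        # NCHDSMOK (between) row
ss_error, df_error = 545373.126, 333     # Error row
ss_ctotal, df_ctotal = 558823.306, 336   # Corrected Total row

ms_model = ss_model / df_model           # mean square = sum of squares / df
ms_error = ss_error / df_error
f_stat = ms_model / ms_error             # F for equal means
r_squared = ss_model / ss_ctotal
adj_r_squared = 1 - (1 - r_squared) * df_ctotal / df_error

print(round(ms_model, 3), round(ms_error, 3), round(f_stat, 3))
print(round(r_squared, 3), round(adj_r_squared, 3))
```

The between and error sums of squares also add up to the corrected total, as SSB = CTSS – SSE requires.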

22 Lipid Research Study: Kruskal-Wallis Test

Test Statistics (a, b): Cholesterol
Chi-Square    7.360
df            3
Asymp. Sig.   .061

a. Kruskal Wallis Test
b. Grouping Variable: CHD/Smoking Category
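
The asymptotic significance can be reproduced from the chi-square statistic and its degrees of freedom. For 3 df the upper tail of the chi-square distribution has a closed form, which the sketch below uses; the function name is made up, and the formula applies only to df = 3.

```python
import math


def chi2_upper_tail_df3(x):
    """P(chi-square with 3 df > x), using the closed form available for df = 3."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


p_value = chi2_upper_tail_df3(7.360)
print(round(p_value, 3))
```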

23 Lipids Research Study: The p-values are 0.043 for ANOVA, 0.061 for Kruskal-Wallis. Weakish evidence of population differences. The Q-Q plot was pretty normal, so I would probably go with the smaller p-value and publish, but with some warnings. BTW, what hypothesis were we testing?

24 Lipids Research Study: The p-values are 0.043 for ANOVA, 0.061 for Kruskal-Wallis. Weakish evidence of population differences. BTW, what hypothesis were we testing? That the population means for the 4 populations were all simultaneously equal.

25 Lipids Research Study: Fisher LSD. Table footnotes: Based on observed means. *. The mean difference is significant at the .05 level. Healthy Nonsmokers differ from Heart Disease Smokers. Healthy and CHD patients differ among smokers.

26 Lipids Research Study: Fisher’s LSD suggested that Healthy, Non-smokers have significantly lower cholesterol than Healthy, Smokers (p = 0.011). The p-value for the Wilcoxon rank sum test is 0.024.

27 Variances: The ANOVA method is remarkably robust. Although it assumes that the populations have equal population variances, as long as the sample sizes are reasonably close, it is not much affected by unequal variances. Of course, the sample variances will be different: why?

28 Variances: It is still possible to compare variances. Realistically, if you are intrinsically interested in whether populations have the same variances or not, you should consult a statistician. However, there is a version of the Levene test that can be computed from SPSS. It uses the same algorithm as in the 2-population case.
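
A Levene-type statistic for several populations can be sketched as a one-way ANOVA F statistic computed on absolute deviations. The version below takes deviations from each group mean, which is an assumption: the slides do not say which variant SPSS computes. The function name and data are made up.

```python
def levene_statistic(groups):
    """F statistic from a one-way ANOVA on absolute deviations from group means."""
    # Step 1: replace each observation by |y - group mean|.
    z_groups = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z_groups.append([abs(y - mean_g) for y in g])

    # Step 2: ordinary one-way ANOVA F statistic on those deviations.
    pooled = [z for zg in z_groups for z in zg]
    n_total, t = len(pooled), len(z_groups)
    grand = sum(pooled) / n_total
    ssb = sum(len(zg) * (sum(zg) / len(zg) - grand) ** 2 for zg in z_groups)
    sse = sum((z - sum(zg) / len(zg)) ** 2 for zg in z_groups for z in zg)
    return (ssb / (t - 1)) / (sse / (n_total - t))


# Made-up data with increasing spread across three populations.
w = levene_statistic([[1, 2, 3], [2, 4, 6], [0, 5, 10]])
print(w)
```

The statistic is referred to an F distribution with t – 1 and n_T – t degrees of freedom, just as in the ANOVA table.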

29 Lipid Research Study: Variances?

30 Variances: The IQRs are 57, 44, 56, 74. I don’t see a massive inequality of variability. The medians are 220, 225, 225, 232.

31 Lipid Research Study: Variances? You get this under “Options” in the ANOVA run.

32 Concho Water Snakes

33 Concho Water Snakes

34 Concho Water Snakes

