Download presentation

Presentation is loading. Please wait.

1
**ANOVA (Analysis of Variance)**

Martina Litschmannovรก K210

2
**The basic ANOVA situation**

Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative variables depend on which group (given by categorical variable) the individual is in? If categorical variable has only 2 values: null hypothesis: ๐ 1 โ ๐ 2 =๐ท ANOVA allows for 3 or more groups

3
**An example ANOVA situation**

Subjects: 25 patients with blisters Treatments: Treatment A, Treatment B, Placebo Measurement: # of days until blisters heal Data [and means]: A: 5,6,6,7,7,8,9, [7.25] B: 7,7,8,9,9,10,10, [8.875] P: 7,9,9,10,10,10,11,12,13 [10.11] Are these differences significant?

4
**Informal Investigation**

Graphical investigation: side-by-side box plots multiple histograms Whether the differences between the groups are significant depends on the difference in the means the standard deviations of each group the sample sizes ANOVA determines p-value from the F statistic

5
Side by Side Boxplots

6
What does ANOVA do? At its simplest (there are extensions) ANOVA tests the following hypotheses: H0: The means of all the groups are equal. HA: Not all the means are equal. Doesnโt say how or which ones differ. Can follow up with โmultiple comparisonsโ. Note: We usually refer to the sub-populations as โgroupsโ when doing ANOVA.

7
**Assumptions of ANOVA each group is approximately normal**

check this by looking at histograms and/or normal quantile plots, or use assumptions can handle some nonnormality, but not severe outliers test of normality standard deviations of each group are approximately equal rule of thumb: ratio of largest to smallest sample st. dev. must be less than 2:1 test of homoscedasticity

8
**Normality Check We should check for normality using:**

assumptions about population histograms for each group normal quantile plot for each group test of normality (Shapiro-Wilk, Liliefors, Anderson- Darling test, โฆ) With such small data sets, there really isnโt a really good way to check normality from data, but we make the common assumption that physical measurements of people tend to be normally distributed.

9
Shapiro-Wilk test One of the strongest tests of normality. [Shapiro, Wilk] Online computer applet (Simon Ditto, 2009) for this test can be found here.

10
**Standard Deviation Check**

Variable treatment N Mean Median StDev days A B P Compare largest and smallest standard deviations: largest: 1,764 smallest: 1,458 1,764/1,458=1,210<2 OK Note: Std. dev. ratio greather then 2 signs heteroscedasticity.

11
**ANOVA Notation Number of Individuals all together : ๐= ๐=1 ๐ ๐ ๐ ,**

Group 1 2 โฆ k Sample ๐ 11 ๐ 12 ๐ 1๐ โฎ ๐ ๐ 1 1 ๐ ๐ 2 2 ๐ ๐ ๐ ๐ Sample Size ๐ 1 ๐ 2 ๐ ๐ Sample average ๐ 1 ๐ 2 ๐ ๐ Sample Std. Deviation ๐ 1 ๐ 2 ๐ ๐ Number of Individuals all together : ๐= ๐=1 ๐ ๐ ๐ , Sample means: ๐ ๐ = 1 ๐ ๐ ๐=1 ๐ ๐ ๐ ๐๐ , Grand mean: ๐ = 1 ๐ ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ , Sample Standard Deviations: ๐ ๐ 2 = 1 ๐ ๐ โ1 ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ ๐

12
**Levene Test Null and alternative hypothesis:**

H0: ๐ 1 2 = ๐ 2 2 =โฏ= ๐ ๐ 2 , HA: ๐ป 0 Test Statistic: ๐น ๐ = ๐๐ ๐๐ต ๐โ1 ๐๐ ๐๐ ๐โ๐ , where ๐ ๐๐ = ๐ ๐๐ โ ๐ ๐ , ๐ ๐ = ๐=1 ๐ ๐ ๐ ๐๐ ๐ ๐ , ๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ ๐ , ๐๐ ๐๐ต = ๐=1 ๐ ๐ ๐ ๐ ๐ โ ๐ 2 , ๐๐ ๐๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ ๐ p-value:ย ๐โ๐ฃ๐๐๐ข๐=1โ ๐น 0 ๐ฅ ๐๐ต๐ , where ๐น 0 ๐ฅ is CDF of Fisher-Snedecor distribution with ๐โ1, ๐โ๐ degrees of freedom.

13
**How ANOVA works (outline)**

ANOVA measures two sources of variation in the data and compares their relative sizes.

14
**How ANOVA works (outline)**

Sum of Squares between Groups, ย ๐๐ ๐ต = ๐=1 ๐ ๐ ๐ ๐ ๐ โ ๐ 2 , ย resp. Mean of Squares โ between groups ๐๐ ๐ต = ๐๐ ๐ต ๐โ1 , where ๐โ1 is degrees of freedom ๐๐ ๐ต . Sum of Squares โ errors ๐๐ ๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ ๐ 2 = ๐=1 ๐ ๐ ๐ โ1 ๐ ๐ 2 , ย resp. Mean of squares - error ๐๐ ๐ = ๐๐ ๐ ๐โ๐ , where ๐โ๐ is degrees of freedom ๐๐ ๐ . Difference between Means Difference within Groups

15
The ANOVA F-statistic is a ratio of the Between Group Variaton divided by the Within Group Variation: ๐น= ๐๐ ๐ต ๐๐ ๐ A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

16
**ANOVA Output Source of Variation SS DF MS F p-value Between Groups**

34,74 2 17,37 6,45 0,006 Within Groups 59,26 22 2,69 Total 94,00 24

17
**How are these computations made?**

Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups ๐๐ ๐ต = ๐=1 ๐ ๐ ๐ ๐ ๐ โ ๐ 2 ๐๐ ๐ต =๐โ1 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups ๐๐ ๐ = ๐=1 ๐ ๐ ๐ โ1 ๐ ๐ 2 ๐๐ ๐ =๐โ๐ ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total ๐๐ ๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ 2 ๐๐ ๐ =๐โ1

18
Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Count Average Variance <35 let 53 25, , let , ,2775 >50 let 76 26, , Total , ,8971

19
Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Assumptions: Normality Homoskedasticita H0: ๐ 1 2 = ๐ 2 2 = ๐ 3 2 , HA: ๐ป 0 , ๐โ๐ฃ๐๐๐ข๐=0,129 (Levene test)

20
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Count Average Variance <35 let , ,3825 let , ,2775 >50 let , ,3393 Total , ,8971 ๐๐ ๐ต = ๐=1 ๐ ๐ ๐ ๐ ๐ โ ๐ 2 =53โ 25,1โ25, โ 25,9โ25, +76โ 26,1โ25,8 2 =34,0 ๐๐ ๐ = ๐=1 ๐ ๐ ๐ โ1 ๐ ๐ 2 =52โ10,4+122โ16,3+75โ12,3=3451,9

21
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups ๐๐ ๐ต = ๐=1 ๐ ๐ ๐ ๐ ๐ โ ๐ 2 ๐๐ ๐ต =๐โ1 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups ๐๐ ๐ = ๐=1 ๐ ๐ ๐ โ1 ๐ ๐ 2 ๐๐ ๐ =๐โ๐ ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total ๐๐ ๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ 2 ๐๐ ๐ =๐โ1

22
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 ๐๐ ๐ต =๐โ1 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups 3451,9 ๐๐ ๐ =๐โ๐ ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total ๐๐ ๐ = ๐=1 ๐ ๐=1 ๐ ๐ ๐ ๐๐ โ ๐ 2 ๐๐ ๐ =๐โ1

23
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 ๐๐ ๐ต =๐โ1 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups 3451,9 ๐๐ ๐ =๐โ๐ ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total 3485,9 ๐๐ ๐ =๐โ1

24
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 ๐๐ ๐ต =๐โ1 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups 3451,9 ๐๐ ๐ =๐โ๐ ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total 3485,9 ๐๐ ๐ =๐โ1 k โฆ number of sanmples ๐=3 n โฆ total sample si๐ง๐ ๐=252

25
Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 2 ๐๐ ๐ต = ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups 3451,9 249 ๐๐ ๐ = ๐๐ ๐ ๐๐ ๐ --- Total 3485,9 251 / = / =

26
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 2 17,0 ๐๐ ๐ต ๐๐ ๐ 1โ ๐น 0 ๐ฅ ๐๐ต๐ Within Groups 3451,9 249 13,9 --- Total 3485,9 251

27
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 2 17,0 1,23 0,294 Within Groups 3451,9 249 13,9 --- Total 3485,9 251 ๐โ๐ฃ๐๐๐ข๐=1โ๐น 1,23 =0,294 , where F(x) is CDF of Fisher-Snedecor distribution with 2 , 249 degrees of freedom

28
**Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 **

Using the results of exploratory analysis and ANOVA test, verify that the age of a statistically significant effect on BMI. Null and alternative hypothesis: H0: ๐ 1 = ๐ 2 = ๐ 3 , HA: ๐ป 0 Calculating of p-value: Result: We dont reject null hypothesis at the significance level 0,05. There is not a statistically significant difference between the means of BMI depended on the age. Source of Variation Sum of Squares Degrees of Freedom Mean of Squares ๐น ๐โ๐ฃ๐๐๐ข๐ Between Groups 34,0 2 17,0 1,23 0,294 Within Groups 3451,9 249 13,9 --- Total 3485,9 251

29
**Whereโs the Difference?**

Once ANOVA indicates that the groups do not all appear to have the same means, what do we do? Analysis of Variance for days Source DF SS MS F P treatmen Error Total Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev A ( * ) B ( * ) P (------* ) Pooled StDev = Clearest difference: P is worse than A (CIโs donโt overlap)

30
Multiple Comparisons Once ANOVA indicates that the groups do not all have the same means, we can compare them two by two using the 2-sample t test. We need to adjust our p-value threshold because we are doing multiple tests with the same data. There are several methods for doing this. If we really just want to test the difference between one pair of treatments, we should set the study up that way.

31
**Bonferroni method โ post hoc analysis**

We reject null hypothesis if ๐ฅ ๐ผ โ ๐ฅ ๐ฝ โฅ ๐ก ๐โ๐ 1โ ๐ผ โ ๐๐ ๐ ๐ ๐ผ ๐ ๐ฝ , where ๐ผ โ is correct significance level, ๐ผ โ = ๐ผ ๐ 2 , ๐ก 1โ ๐ผ โ 2 ๐โ๐ is 1โ ๐ผ โ 2 is quantile of Student distribution with ๐โ๐ degrees of freedom.

32
Kruskal-Wallis test The KruskalโWallis test is most commonly used when there is one nominal variable and one measurement variable, and the measurement variable does not meet the normality assumption of an ANOVA. It is the non-parametric analogue of a one-way ANOVA.

33
Kruskal-Wallis test Like most non-parametric tests, it is performed on ranked data, so the measurement observations are converted to their ranks in the overall data set: the smallest value gets a rank of 1, the next smallest gets a rank of 2, and so on. The loss of information involved in substituting ranks for the original values can make this a less powerful test than an anova, so the anova should be used if the data meet the assumptions. If the original data set actually consists of one nominal variable and one ranked variable, you cannot do an anova and must use the KruskalโWallis test.

34
**The farm bred three breeds of rabbits. An attempt was made (rabbits**

The farm bred three breeds of rabbits. An attempt was made (rabbits.xls), whose objective was to determine whether there is statistically significant (conclusive) the difference in weight between breeds of rabbits. Verify.

35
**The effects of three drugs on blood clotting was determined**

The effects of three drugs on blood clotting was determined. Among other indicators was determined the thrombin time. Information about the 45 monitored patients are recorded in the file thrombin.xls. Does the magnitude of thrombin time depend on the used preparation?

36
Study materials : (p p.154) Shapiro, S.S., Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika. 1965, roฤ. 52, ฤ. 3/4, s Dostupnรฉ z:

Similar presentations

OK

Comparing Three or More Means ANOVA (One-Way Analysis of Variance)

Comparing Three or More Means ANOVA (One-Way Analysis of Variance)

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Hindi ppt on pollution Ppt on effect of global warming on weather center Free download ppt on food security in india Ppt on art of war book Ppt on sources of energy for class 8th result Ppt on regional trade agreements in germany Ppt on instrument landing system mark Ppt on synthesis and degradation of purines and pyrimidines structure Ppt on 600 mw generators Water level indicator using seven segment display ppt online