Published by Meghan Willis. Modified over 9 years ago.
1
Analysis of Variance
Dr. Mohammed Alahmed, Ph.D. in BioStatistics
alahmed@ksu.edu.sa, (011) 4674108
2
Introduction
Analysis of variance (ANOVA), as the name implies, is a statistical technique that analyzes variability in data in order to infer whether population means are unequal. The purpose of ANOVA is much the same as that of the t-tests presented in the preceding sections: to determine whether the mean differences observed in sample data are large enough to justify the conclusion that the populations from which the samples were obtained have different means.
3
The difference between ANOVA and the t-tests is that ANOVA can be used when more than two means are being compared, whereas the t-tests are limited to situations where only two means are involved. If more than two means are compared, repeated use of the independent-samples t-test leads to a higher overall Type I error rate than the level set for each individual t-test (the multiple comparison problem).
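To make the multiple comparison problem concrete, here is a minimal sketch of how the overall Type I error rate grows with the number of groups. It assumes the pairwise t-tests are independent, which is not strictly true in practice, so the numbers are an approximation; the function name `familywise_error_rate` is illustrative and does not come from the slides.

```python
from math import comb


def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Approximate chance of at least one false positive when every pair
    of k groups is compared with its own t-test at level alpha.

    Assumes the pairwise tests are independent (a simplification).
    """
    m = comb(k, 2)  # number of pairwise comparisons among k groups
    return 1 - (1 - alpha) ** m


for k in (2, 3, 4, 5):
    print(f"groups={k} comparisons={comb(k, 2)} "
          f"overall error rate={familywise_error_rate(k):.3f}")
```

With three groups and α = 0.05, this approximation gives an overall error rate of about 0.14, nearly three times the level set for each test.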
4
The basic ANOVA situation
Variables in ANOVA:
– The dependent variable is metric.
– The independent variable(s) are nominal with two or more levels; an independent variable is also called a treatment, manipulation, or factor.
One-Way ANOVA:
– Two variables: 1 categorical, 1 quantitative.
– Main question: does the mean of the quantitative variable depend on which group (given by the categorical variable) the individual is in?
– If the categorical variable has only 2 values: 2-sample t-test.
– ANOVA allows for 3 or more groups.
5
ANOVA Hypotheses
The null hypothesis:
– The means for all groups are the same (equal).
H0: μ1 = μ2 = … = μk
The alternative hypothesis:
– The means are different for at least one pair of groups.
H1: μi ≠ μj for at least one pair of groups i and j
If we reject H0, how do we determine which means are significantly different?
6
Before we begin, we must consider the assumptions required to use ANOVA:
– The underlying distributions of the populations are normal.
– The variance of each group is equal (this is critical for ANOVA).
7
If all of the groups had the same mean, the distributions of all the populations would look exactly the same, and their graphs would overlay one another.
If the means of the populations were different, the distributions would be shifted apart. In that case the variability between the groups is much greater than the variability within a group.
8
Sources of variance
When we take samples from each group, there will be two sources of variability:
– Within-group variability: when we sample from a group, there will be variability from person to person in the same group.
– Between-group variability: the difference from group to group.
If the between-group variability is large relative to the within-group variability, the means of the groups are unlikely to be the same. We can use the two types of variability to determine whether the means are likely to differ.
9
Blue arrow: within group; red arrow: between group. Notice that when the distributions are separate, the between-group variability is much greater than the within-group variability.
10
Notation for ANOVA
All groups:
– n = number of individuals all together
– k = number of groups
– x̄ = mean for the entire data set
Group i has:
– n_i = number of individuals in group i
– x_ij = value for individual j in group i
– x̄_i = mean for group i
– s_i = standard deviation for group i
11
Sources of variability
ANOVA measures two sources of variation in the data and compares their relative sizes:
1. Variation BETWEEN groups: for each data value, look at the difference between its group mean and the overall mean.
2. Variation WITHIN groups: for each data value, look at the difference between that value and the mean of its group.
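Written out as sums of squares, the two sources of variation take the standard form below. This is a sketch in the usual textbook notation, which is assumed here rather than taken from the slides: k groups, n_i observations in group i, x_ij the j-th value in group i, x̄_i the mean of group i, x̄ the overall mean, and s_i the standard deviation of group i.

```latex
\begin{align*}
SS_B &= \sum_{i=1}^{k} n_i \,(\bar{x}_i - \bar{x})^2 \\
SS_W &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
      = \sum_{i=1}^{k} (n_i - 1)\, s_i^2 \\
SS_T &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2 = SS_B + SS_W
\end{align*}
```

The factor n_i in the first line appears because the squared difference between a group mean and the overall mean is counted once for each data value in that group.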
12
F-statistic
The F-statistic assesses whether you can conclude that statistically significant differences are present somewhere among the group means. It is the ratio of the between-group variation to the within-group variation: F = MS_B / MS_W, where MS_B and MS_W are the between-group and within-group mean squares. This test statistic is compared to an F-table with k - 1 and n - k degrees of freedom. A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
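The ratio can be computed directly from raw data. The following is a minimal sketch using only the Python standard library; the function name `one_way_f` and the sample values are hypothetical and are not the data from these slides.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples,
    one inner list per group."""
    k = len(groups)                    # number of groups
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Between-group sum of squares: group mean vs. overall mean,
    # counted once for every observation in the group.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: each value vs. its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k


# Hypothetical data: three groups of three observations each.
f_value, df1, df2 = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_value:.2f} with {df1} and {df2} degrees of freedom")
```

The resulting F would then be compared with the critical value from an F-table at the returned degrees of freedom; the p-value is not computed here because the standard library has no F distribution.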
13
ANOVA Table
Source of variation   SS     df      MS     F             p-value
Between               SS_B   k - 1   MS_B   MS_B / MS_W
Within                SS_W   n - k   MS_W
Total                 SS_T   n - 1
Total sum of squares (SS_T) = Between group sum of squares (SS_B) + Within group sum of squares (SS_W)
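The table rows follow mechanically once the two sums of squares are known. As a rough sketch, the helper below fills in the df, MS, and F columns; the name `anova_table` and the input values are hypothetical, and the p-value column is omitted because it requires the F distribution.

```python
def anova_table(ss_between, ss_within, k, n):
    """Assemble one-way ANOVA table rows from the two sums of squares.

    k is the number of groups, n the total number of observations.
    """
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return [
        {"source": "Between", "SS": ss_between, "df": df_between,
         "MS": ms_between, "F": ms_between / ms_within},
        {"source": "Within", "SS": ss_within, "df": df_within,
         "MS": ms_within, "F": None},
        # The total row is the sum of the two rows above it.
        {"source": "Total", "SS": ss_between + ss_within,
         "df": n - 1, "MS": None, "F": None},
    ]


# Hypothetical sums of squares for k = 3 groups and n = 9 observations.
for row in anova_table(ss_between=26.0, ss_within=6.0, k=3, n=9):
    print(row)
```

Note that both the SS column and the df column add up to the total row, which is a quick check when reading software output.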
14
Example
A researcher wishes to try three different techniques to lower the blood pressure of individuals diagnosed with high blood pressure. The subjects are randomly assigned to three groups; the first group takes medication, the second group exercises, and the third group follows a special diet. After four weeks, the reduction in each person’s blood pressure is recorded.
15
The data are: [data table]
At α = 0.05, test the claim that there is no difference among the means.
The hypotheses to be tested are:
H0: μ1 = μ2 = μ3
H1: At least one mean is different from the others
16
Analyze → Compare Means → One-Way ANOVA
In the One-Way ANOVA menu window, place “Bp” in the Dependent List box and “Treatment” in the Factor box.
17
To complete the process described in the text, select OK in this window without doing anything else. The resulting output is the ANOVA table.
Since p < α, the null hypothesis is rejected. As mentioned in the text, this result allows us only to conclude that at least one (true) treatment mean differs from the others; we can say nothing about the relative sizes of the (true) treatment means. Further tests can be performed to determine which treatment mean(s) differ and, consequently, which (true) treatment mean(s) might have the highest (or lowest) values.
18
Verifying the Assumptions for the One-Way ANOVA F-test
The assumptions for the one-way ANOVA F-test, as expressed in the text, are:
1. The populations from which the samples were obtained must be normally or approximately normally distributed.
2. The samples must be independent of one another.
3. The variances of the populations must be equal.
19
Assessing the normality and constant variance assumptions
Analyze → Descriptive Statistics → Explore
20
This table provides results of the test of the following hypotheses:
H0: The population random variable is normally distributed
H1: The population random variable is not normally distributed
An ideal Normal QQ Plot will have plotted points that approximately follow a straight line. If the error bars are close to each other in length, as appears to be the case here, one might expect the constant variance assumption to be approximately valid.
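For readers who want to see what a Normal QQ plot actually plots, here is a minimal sketch that pairs each sorted observation with the matching standard normal quantile. The plotting positions (i - 0.5) / n are one common convention and are an assumption here; statistical packages may use a slightly different formula. The function name `normal_qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a
    Normal QQ plot, using plotting positions (i - 0.5) / n."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical sample; points near a straight line suggest normality.
for q, x in normal_qq_points([12, 9, 15, 10, 13]):
    print(f"{q:6.3f}  {x}")
```

If the data are roughly normal, plotting the second column against the first gives points that fall close to a straight line.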
21
The second table of use is the “Test of Homogeneity of Variances”. The test to use here is the one that is “Based on Median”. This table provides results of the test of the following hypotheses:
H0: The population variances are equal
H1: The population variances are not equal
The p-value given in the last column is large enough that the hypothesis of equal variances is not rejected, so the constant variance assumption may be treated as valid.
Verifying the validity of the independence assumption: the validity of the independence assumption can be difficult to assess. The best approach is to ensure the independence of the samples through proper sampling and data collection practices.
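A median-based homogeneity test of this kind is commonly computed as Levene's test on absolute deviations from each group's median (often called the Brown-Forsythe variant). The sketch below assumes that this is what the “Based on Median” row reports; the function name and data are hypothetical, and only the test statistic is computed, not the p-value.

```python
from statistics import mean, median


def levene_median_statistic(groups):
    """Levene's W statistic using absolute deviations from group medians.

    This is a one-way ANOVA F-ratio applied to the deviations, so W is
    compared with an F distribution on k - 1 and n - k degrees of freedom.
    """
    # Replace each value by its absolute distance from its group median.
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(deviations)
    n = sum(len(d) for d in deviations)
    grand_mean = mean([z for d in deviations for z in d])

    ss_between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    ss_within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: the second group is more spread out than the first.
print(levene_median_statistic([[1, 2, 3], [2, 4, 6]]))
```

A small W (and hence a large p-value) is consistent with equal population variances.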
22
Pair-wise Comparisons of Treatment Means
Where’s the Difference?
– Once ANOVA indicates that the groups do not all appear to have the same mean, what do we do?
– We can do pairwise comparisons to determine which specific means are different, but we must still take into account the problem with multiple comparisons!
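One simple way to account for multiple comparisons is the Bonferroni correction, which divides α by the number of pairwise tests. The slides do not say which post hoc procedure was used, so this is only an illustrative sketch of the idea; the function name `bonferroni_pairs` is hypothetical, while the group names come from the blood pressure example.

```python
from itertools import combinations


def bonferroni_pairs(group_names, alpha=0.05):
    """List every pair of groups and the per-test significance level
    that keeps the overall Type I error rate at or below alpha."""
    pairs = list(combinations(group_names, 2))
    return pairs, alpha / len(pairs)


pairs, per_test_alpha = bonferroni_pairs(["medication", "exercise", "diet"])
for a, b in pairs:
    print(f"compare {a} vs {b} at level {per_test_alpha:.4f}")
```

With three groups there are three pairwise comparisons, so each is tested at 0.05 / 3 rather than 0.05. Bonferroni is conservative; other procedures offered by statistical packages trade off power differently.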
23
Click on Post Hoc.
24
We conclude that there is a significant difference between the medication group and the exercise group, but no significant difference between medication and diet or between exercise and diet.