# © 2008 McGraw-Hill Higher Education The Statistical Imagination Chapter 12: Analysis of Variance: Differences among Means of Three or More Groups.

## Analysis of Variance (ANOVA)

ANOVA is used to compare three or more group means. Instead of comparing each group mean to the others (as with a t-test), ANOVA compares each group mean to the grand mean, which is the mean for all cases in the sample.

## Main Effects

In ANOVA, the difference between each group mean and the grand mean is a test effect; these test effects are called main effects. When all of the main effects are zero, there are no differences among the means.
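As a minimal sketch of this definition, the snippet below computes the grand mean and each group's main effect. The SES scores and the group names `suburban` and `rural` are invented for illustration; only the urban mean of 50 and the grand mean of 45 echo the example used later in the slides.

```python
# Hypothetical SES scores for three residence groups (invented data).
groups = {
    "urban": [60, 45, 45],
    "suburban": [50, 45, 40],
    "rural": [45, 40, 35],
}

# Grand mean: the mean of all cases in the sample, ignoring group membership.
all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Main effect of a group: its group mean minus the grand mean.
main_effects = {
    name: sum(scores) / len(scores) - grand_mean
    for name, scores in groups.items()
}

print(grand_mean)    # 45.0
print(main_effects)  # {'urban': 5.0, 'suburban': 0.0, 'rural': -5.0}
```

If every main effect were 0.0, all three group means would equal the grand mean, which is the situation the null hypothesis describes.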

## The ANOVA Hypothesis Test

For the ANOVA test, $H_0$ states that the population means of the groups are equal. $H_0$ can also be stated as “the main effects are equal to zero,” or “there is no difference among the means.”

## The Idea Behind ANOVA

ANOVA hypothesizes about differences among means, but its calculation is based on explaining variance around the grand mean. For example, suppose that the overall or “grand” mean of socioeconomic status (SES) of all household heads is 45. Urban residents, however, average 50. We call this 5-point difference the main effect of the category urban.

## The Idea Behind ANOVA (cont.)

Shaneka, an urban dweller, scores 60. This is 15 SES points more than the grand mean of 45. These 15 SES points are her deviation score, the difference between her raw score and the overall mean. ANOVA determines whether it is feasible to say that 5 SES points of her 15-point deviation score are due to the fact that she is an urban resident.

## The Idea Behind ANOVA (cont.)

The focus with ANOVA is on explaining deviation scores. Deviation scores, when squared, summed, and averaged for a group of scores, make up the variance. Hence the name “analysis of variance.”

## The Idea Behind ANOVA (cont.)

With ANOVA we are asserting that the spread of scores is due to the main effects of the groups, as illustrated in Figure 12-2 in the text. Can scores be explained by differences between group classifications? If so, then scores will cluster around group means rather than the grand mean, and this suggests a difference among means.

## The General Linear Model

The general linear model is a useful framework for understanding ANOVA. It states that the best prediction of an individual’s score on a dependent variable is the overall mean plus an adjustment for the effects of group membership on an independent variable.

## Applying the General Linear Model

For Shaneka, the urban resident with an SES of 60, we decompose her score into 45 points for the grand mean and 5 points explained by urban residence (the main effect of urban). The remaining 10 points are unexplained error.
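The decomposition of Shaneka's score can be sketched in a few lines. The function name `decompose` is a hypothetical helper, not something from the text; the numbers (60, 45, 50) are the ones given in the slides.

```python
def decompose(score, grand_mean, group_mean):
    """Split a raw score into grand mean + main effect + unexplained error.

    Hypothetical helper illustrating the general linear model.
    """
    main_effect = group_mean - grand_mean  # explained by group membership
    error = score - group_mean             # left over after the group adjustment
    return grand_mean, main_effect, error


grand, effect, error = decompose(score=60, grand_mean=45, group_mean=50)
print(grand, effect, error)  # 45 5 10

# The three parts always add back up to the raw score.
assert grand + effect + error == 60
```

The main effect plus the error (5 + 10) is her 15-point deviation score, so the model splits that deviation into an explained and an unexplained part.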

## Calculating ANOVA Statistics

ANOVA calculations are summarized in a source table. To obtain variances, we calculate three parts of the variation (or sums of squares) of the interval/ratio dependent variable and divide them by degrees of freedom.

## Sums of Squares

The three types of sums of squares for ANOVA are:

1. the total sum of squares ($SS_T$),
2. the between-group or “explained” sum of squares ($SS_B$), and
3. the within-group or unexplained sum of squares ($SS_W$).

## Calculating the $SS_T$

The total sum of squares ($SS_T$) is calculated by summing the squared deviation scores for all cases. The $SS_T$ is the same sum of squares calculated for the standard deviation (Chapter 5).

## Calculating the $SS_B$

The between-group or explained sum of squares ($SS_B$) is calculated by squaring the main effect for each case (that is, the main effect of the group to which the case belongs) and summing these squares. The $SS_B$ is explained in the sense that it is accounted for by differences among the group means, as measured by main effects.

## Calculating the $SS_W$

The within-group or unexplained sum of squares ($SS_W$) is that part of the squared deviation scores that is not accounted for by main effects. It is unexplained error in the prediction of scores. The $SS_W$ is most easily calculated by subtracting the between-group sum of squares from the total sum of squares.
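The three sums of squares can be sketched as follows. The data are invented for illustration (three hypothetical residence groups of three SES scores each), and the variable names are assumptions rather than notation from the text.

```python
# Hypothetical SES scores for three residence groups (invented data).
groups = {
    "urban": [60, 45, 45],
    "suburban": [50, 45, 40],
    "rural": [45, 40, 35],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)

# SS_T: squared deviation of every case from the grand mean, summed.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# SS_B: squared main effect, counted once for each case in the group.
ss_between = sum(
    len(scores) * (sum(scores) / len(scores) - grand_mean) ** 2
    for scores in groups.values()
)

# SS_W: the easy route is subtraction ...
ss_within = ss_total - ss_between

# ... which matches the direct route: squared deviations from each group mean.
ss_within_direct = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

print(ss_total, ss_between, ss_within)  # 400.0 150.0 250.0
```

Computing `ss_within_direct` is not required in practice; it is included only to show that the subtraction shortcut gives the same within-group sum of squares.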

## Calculating the Mean Square Variance (MSV)

After sums of squares are computed, these sums are divided by their degrees of freedom to account for sample size and the number of groups. The resulting variances are called mean square variances (MSV):

$MSV_B$ = the mean square variance between groups

$MSV_W$ = the mean square variance within groups

## Calculating the F-Ratio Test Statistic

The test statistic for ANOVA is the F-ratio statistic. This is the ratio of the mean square variance between groups to the mean square variance within groups:

$F = MSV_B / MSV_W$

The p-value is determined using F-distribution curves (Appendix B, Tables D and E).
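A minimal sketch of the full calculation, from raw scores to the F-ratio, is shown below. The slides do not list the degrees-of-freedom formulas; the sketch assumes the standard one-way ANOVA values of k − 1 between groups and N − k within groups, where k is the number of groups and N the total sample size. The function name `f_ratio` and the data are hypothetical.

```python
def f_ratio(groups):
    """Return (MSV_B, MSV_W, F) for a list of groups of scores.

    Assumes the standard one-way ANOVA degrees of freedom:
    k - 1 between groups and N - k within groups.
    """
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between

    msv_between = ss_between / (k - 1)
    msv_within = ss_within / (n_total - k)
    return msv_between, msv_within, msv_between / msv_within


# Hypothetical SES scores for three groups (invented data).
msv_b, msv_w, f = f_ratio([[60, 45, 45], [50, 45, 40], [45, 40, 35]])
print(round(msv_b, 2), round(msv_w, 2), round(f, 2))  # 75.0 41.67 1.8
```

The resulting F is then compared against the F-distribution with (k − 1, N − k) degrees of freedom, using the tables cited above or, if SciPy is installed, a call such as `scipy.stats.f.sf(f, 2, 6)`.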

## When to Use the F-ratio Test

In general, we use ANOVA and the F-ratio when testing a hypothesis about the relationship between a nominal/ordinal independent variable with three or more categories and an interval/ratio dependent variable. ANOVA is a difference of means test and a cousin of the t-test.

## When to Use the F-ratio Test (cont.)

1. Number of variables, samples, and populations:
   a) one population with a single interval/ratio dependent variable, comparing means for three or more groups of a single nominal/ordinal independent variable (each group’s sample must be representative of its subpopulation), or
   b) a single interval/ratio dependent variable whose mean is compared for three or more populations using representative samples.

## When to Use the F-ratio Test (cont.)

2. Sample size: generally no requirements. However, the dependent interval/ratio variable should not be highly skewed within any group sample. Moreover, range tests are unreliable unless sample sizes of groups are about equal. These restrictions are less important when group sample sizes are large.

## When to Use the F-ratio Test (cont.)

3. Variances (and standard deviations) of the groups are equal. This is the same requirement as for the t-test (see equality of variances, Chapter 11).

## Existence and Direction of the Relationship for ANOVA

Existence: Determined by using the F-ratio to test the null hypothesis of equal group means.

Direction: Not applicable (because the independent variable is nominal).

## Strength of the Relationship for ANOVA

Strength: A strong relationship is one in which a high proportion of the total variance in the dependent interval/ratio variable is accounted for by the group variable. The correlation ratio, $\varepsilon^2$ (epsilon squared), is a conservative measure that is unlikely to overinflate the strength of the relationship.
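The slides do not give a formula for epsilon squared. The sketch below assumes the commonly used definition, $\varepsilon^2 = (SS_B - df_B \cdot MSV_W) / SS_T$, and compares it with the unadjusted proportion $SS_B / SS_T$ (often called eta squared) to show why it counts as conservative. The function name and data are hypothetical, and the textbook's own computational formula may be written differently.

```python
def strength_measures(groups):
    """Return (epsilon_squared, eta_squared) for a list of groups of scores.

    Assumes epsilon squared = (SS_B - df_B * MSV_W) / SS_T, a common
    definition that is not stated in the slides themselves.
    """
    scores = [x for g in groups for x in g]
    n_total, k = len(scores), len(groups)
    grand_mean = sum(scores) / n_total

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    msv_within = (ss_total - ss_between) / (n_total - k)

    epsilon_sq = (ss_between - (k - 1) * msv_within) / ss_total
    eta_sq = ss_between / ss_total  # unadjusted proportion of variation explained
    return epsilon_sq, eta_sq


# Hypothetical SES scores for three groups (invented data).
eps_sq, eta_sq = strength_measures([[60, 45, 45], [50, 45, 40], [45, 40, 35]])
print(round(eps_sq, 3), round(eta_sq, 3))  # 0.167 0.375
```

With these invented scores the adjusted measure is less than half the unadjusted one, because the adjustment subtracts the amount of between-group variation expected from sampling error alone.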

## Nature of the Relationship for ANOVA

To assess the nature of the relationship for ANOVA:

1. Make best estimates at the group level by reporting the grand mean, group means, and main effects.
2. Provide examples of best estimates for individuals using the general linear model.
3. Use range tests to specify which group means are significantly different from others.

## Range Tests

With ANOVA, rejection of the null hypothesis merely indicates that at least two group means are significantly different. Range tests determine which means differ by establishing the range of differences between means that is statistically significant. Tukey’s Honestly Significant Difference (HSD) is a conservative range test, unlikely to mistakenly tell us that a difference exists when in fact it does not.
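The slides do not show how the HSD is computed. As a rough sketch, the version below assumes the usual equal-group-size formula, $HSD = q \sqrt{MSV_W / n}$, where $q$ is a critical value read from a studentized range table and $n$ is the number of cases per group. The function names, the group means, and the value `q = 4.34` (a table value for 3 groups and 6 within-group degrees of freedom at the .05 level) are illustrative assumptions; check $q$ against the table you are using.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q, msv_within, n_per_group):
    """Smallest difference between two group means that counts as significant.

    Assumes equal group sizes; q must be looked up in a studentized range table.
    """
    return q * sqrt(msv_within / n_per_group)


def significant_pairs(group_means, hsd):
    """List the pairs of groups whose means differ by more than the HSD."""
    return [
        (a, b)
        for a, b in combinations(group_means, 2)
        if abs(group_means[a] - group_means[b]) > hsd
    ]


# Hypothetical group means and within-group variance (invented data).
means = {"urban": 50.0, "suburban": 45.0, "rural": 40.0}
hsd = tukey_hsd(q=4.34, msv_within=250.0 / 6, n_per_group=3)

print(round(hsd, 2))                  # 16.17
print(significant_pairs(means, hsd))  # []
```

Here the largest difference between means is 10 points, well under the 16.17-point range, so no pair of groups would be reported as significantly different.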

## Statistical Follies

Care must be taken not to apply a group finding to individuals. The “ecological fallacy,” drawing conclusions about individuals on the basis of analysis of group units, such as communities, is an extreme case of misapplying statistical findings.
