# Multisample inference: Analysis of Variance

## Presentation on theme: "Multisample inference: Analysis of Variance"— Presentation transcript:

Multisample inference: Analysis of Variance
Chapter 12 Multisample inference: Analysis of Variance EPI809/Spring 2008

Learning Objectives 1. Describe Analysis of Variance (ANOVA)
2. Explain the Rationale of ANOVA 3. Compare Experimental Designs 4. Test the Equality of 2 or More Means Completely Randomized Design Randomized Block Design Factorial Design As a result of this class, you will be able to ... EPI809/Spring 2008

Analysis of Variance A analysis of variance is a technique that partitions the total sum of squares of deviations of the observations about their mean into portions associated with independent variables in the experiment and a portion associated with error EPI809/Spring 2008

Analysis of Variance The ANOVA table was previously discussed in the context of regression models with quantitative independent variables, in this chapter the focus will be on nominal independent variables (factors) EPI809/Spring 2008

Analysis of Variance A factor refers to a categorical quantity under examination in an experiment as a possible cause of variation in the response variable. EPI809/Spring 2008

Analysis of Variance Levels refer to the categories, measurements, or strata of a factor of interest in the experiment. EPI809/Spring 2008

Types of Experimental Designs
This teleology is based on the number of explanatory variables & nature of relationship between X & Y. Completely Randomized Randomized Block Factorial One-Way Anova Two-Way Anova EPI809/Spring 2008 24

Completely Randomized Design
EPI809/Spring 2008 9 17

Completely Randomized Design
1. Experimental Units (Subjects) Are Assigned Randomly to Treatments Subjects are Assumed Homogeneous 2. One Factor or Independent Variable 2 or More Treatment Levels or groups 3. Analyzed by One-Way ANOVA EPI809/Spring 2008

One-Way ANOVA F-Test Tests the Equality of 2 or More (p) Population Means Variables One Nominal Independent Variable One Continuous Dependent Variable Note: There is one dependent variable in the ANOVA model. MANOVA has more than one dependent variable. Ask, what are nominal & interval scales? EPI809/Spring 2008

One-Way ANOVA F-Test Assumptions
1. Randomness & Independence of Errors 2. Normality Populations (for each condition) are Normally Distributed 3. Homogeneity of Variance Populations (for each condition) have Equal Variances EPI809/Spring 2008

One-Way ANOVA F-Test Hypotheses
H0: 1 = 2 = 3 = ... = p All Population Means are Equal No Treatment Effect Ha: Not All j Are Equal At Least 1 Pop. Mean is Different Treatment Effect NOT 1  2  ...  p EPI809/Spring 2008

One-Way ANOVA F-Test Hypotheses
H0: 1 = 2 = 3 = ... = p All Population Means are Equal No Treatment Effect Ha: Not All j Are Equal At Least 1 Pop. Mean is Different Treatment Effect NOT 1 = 2 = ... = p Or i ≠ j for some i, j. f(X) X = = 1 2 3 f(X) X = 1 2 3 EPI809/Spring 2008

One-Way ANOVA Basic Idea
1. Compares 2 Types of Variation to Test Equality of Means 2. If Treatment Variation Is Significantly Greater Than Random Variation then Means Are Not Equal 3.Variation Measures Are Obtained by ‘Partitioning’ Total Variation EPI809/Spring 2008

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. EPI809/Spring 2008 84

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. EPI809/Spring 2008 85

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. Variation due to treatment EPI809/Spring 2008 86

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. Variation due to treatment Variation due to random sampling EPI809/Spring 2008 87

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. Variation due to treatment Variation due to random sampling Sum of Squares Among Sum of Squares Between Sum of Squares Treatment Among Groups Variation EPI809/Spring 2008 88

One-Way ANOVA Partitions Total Variation
Variation due to Random Sampling are due to Individual Differences Within Groups. Variation due to treatment Variation due to random sampling Sum of Squares Among Sum of Squares Between Sum of Squares Treatment (SST) Among Groups Variation Sum of Squares Within Sum of Squares Error (SSE) Within Groups Variation EPI809/Spring 2008 89

Total Variation Y Response, Y Group 1 Group 2 Group 3
EPI809/Spring 2008

Treatment Variation Y3 Y Y2 Y1 Response, Y Group 1 Group 2 Group 3
EPI809/Spring 2008

Random (Error) Variation
Response, Y Y3 Y2 Y1 Group 1 Group 2 Group 3 EPI809/Spring 2008

One-Way ANOVA F-Test Test Statistic
F = MST / MSE MST Is Mean Square for Treatment MSE Is Mean Square for Error 2. Degrees of Freedom 1 = p -1 2 = n - p p = # Populations, Groups, or Levels n = Total Sample Size EPI809/Spring 2008

One-Way ANOVA Summary Table
Source of Degrees Sum of Mean F Variation of Squares Square Freedom (Variance) n = sum of sample sizes of all populations. c = number of factor levels All values are positive. Why? (squared terms) Degrees of Freedom & Sum of Squares are additive; Mean Square is NOT. Treatment p - 1 SST MST = MST SST/(p - 1) MSE Error n - p SSE MSE = SSE/(n - p) Total n - 1 SS(Total) = SST+SSE EPI809/Spring 2008

One-Way ANOVA F-Test Critical Value
If means are equal, F = MST / MSE  1. Only reject large F! Reject H Do Not Reject H F F a ( p 1 , n p ) Always One-Tail! © T/Maker Co. EPI809/Spring 2008

One-Way ANOVA F-Test Example
As a vet epidemiologist you want to see if 3 food supplements have different mean milk yields. You assign 15 cows, 5 per food supplement. Question: At the .05 level, is there a difference in mean yields? Food1 Food2 Food EPI809/Spring 2008

One-Way ANOVA F-Test Solution
H0: 1 = 2 = 3 Ha: Not All Equal  = .05 1 = 2 2 = 12 Critical Value(s): Test Statistic: Decision: Conclusion: MST 23 . 5820 F 25 . 6 MSE . 9211 Reject at  = .05  = .05 There Is Evidence Pop. Means Are Different F 3.89 EPI809/Spring 2008

Summary Table Solution
Source of Degrees of Sum of Mean F Variation Freedom Squares Square (Variance) Food 3 - 1 = 2 25.60 Error = 12 .9211 Total = 14 EPI809/Spring 2008

SAS CODES FOR ANOVA proc anova; /* or PROC GLM */ class group;
Data Anova; input group\$ milk cards; food food food food food food food food food food food food food food food ; run; proc anova; /* or PROC GLM */ class group; model milk=group; EPI809/Spring 2008

SAS OUTPUT - ANOVA Sum of
Source DF Squares Mean Square F Value Pr > F Model <.0001 Error Corrected Total EPI809/Spring 2008

Pair-wise comparisons
Needed when the overall F test is rejected Can be done without adjustment of type I error if other comparisons were planned in advance (least significant difference - LSD method) Type I error needs to be adjusted if other comparisons were not planned in advance (Bonferroni’s and scheffe’s methods) EPI809/Spring 2008

Fisher’s Least Significant Difference (LSD) Test
To compare level 1 and level 2 Compare this to t/2 = Upper-tailed value or - t/2 lower-tailed from Student’s t-distribution for /2 and (n - p) degrees of freedom MSE = Mean square within from ANOVA table n = Number of subjects p = Number of levels EPI809/Spring 2008

Bonferroni’s method To compare level 1 and level 2
Adjust the significance level α by taking the new significance level α* EPI809/Spring 2008

SAS CODES FOR multiple comparisons
proc anova; class group; model milk=group; means group/ lsd bon; run; EPI809/Spring 2008

SAS OUTPUT - LSD t Tests (LSD) for milk
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate. Alpha Error Degrees of Freedom Error Mean Square Critical Value of t = t.975,12 Least Significant Difference Means with the same letter are not significantly different. t Grouping Mean N group A food1 B food2 C food3 EPI809/Spring 2008

SAS OUTPUT - Bonferroni
Bonferroni (Dunn) t Tests for milk NOTE: This test controls the Type I experimentwise error rate Alpha Error Degrees of Freedom Error Mean Square Critical Value of t =t1-0.05/3/2,12 Minimum Significant Difference Means with the same letter are not significantly different. Bon Grouping Mean N group A food1 B food2 C food3 EPI809/Spring 2008

Randomized Block Design
EPI809/Spring 2008

Randomized Block Design
1. Experimental Units (Subjects) Are Assigned Randomly within Blocks Blocks are Assumed Homogeneous 2. One Factor or Independent Variable of Interest 2 or More Treatment Levels or Classifications 3. One Blocking Factor EPI809/Spring 2008

Randomized Block Design
Factor Levels: (Treatments) A, B, C, D Experimental Units  Treatments are randomly assigned within blocks Block 1 A C D B Block 2 Block 3 . Block b Are the mean training times the same for 3 different methods? 9 subjects 3 methods (factor levels) EPI809/Spring 2008 78

Randomized Block F-Test
1. Tests the Equality of 2 or More (p) Population Means 2. Variables One Nominal Independent Variable One Nominal Blocking Variable One Continuous Dependent Variable Note: There is one dependent variable in the ANOVA model. MANOVA has more than one dependent variable. Ask, what are nominal & interval scales? EPI809/Spring 2008

Randomized Block F-Test Assumptions
1. Normality Probability Distribution of each Block-Treatment combination is Normal 2. Homogeneity of Variance Probability Distributions of all Block-Treatment combinations have Equal Variances EPI809/Spring 2008

Randomized Block F-Test Hypotheses
H0: 1 = 2 = 3 = ... = p All Population Means are Equal No Treatment Effect Ha: Not All j Are Equal At Least 1 Pop. Mean is Different Treatment Effect 1  2  ...  p Is wrong EPI809/Spring 2008

Randomized Block F-Test Hypotheses
H0: 1 = 2 = ... = p All Population Means are Equal No Treatment Effect Ha: Not All j Are Equal At Least 1 Pop. Mean is Different Treatment Effect 1  2  ...  p Is wrong f(X) X = = 1 2 3 f(X) X = 1 2 3 EPI809/Spring 2008

The F Ratio for Randomized Block Designs
SS=SSE+SSB+SST

Randomized Block F-Test Test Statistic
F = MST / MSE MST Is Mean Square for Treatment MSE Is Mean Square for Error 2. Degrees of Freedom 1 = p -1 2 = n – b – p +1 p = # Treatments, b = # Blocks, n = Total Sample Size EPI809/Spring 2008

Randomized Block F-Test Critical Value
If means are equal, F = MST / MSE  1. Only reject large F! Reject H Do Not Reject H F F a ( p 1 , n p ) Always One-Tail! © T/Maker Co. EPI809/Spring 2008

Randomized Block F-Test Example
You wish to determine which of four brands of tires has the longest tread life. You randomly assign one of each brand (A, B, C, and D) to a tire location on each of 5 cars. At the .05 level, is there a difference in mean tread life? Tire Location Block Left Front Right Front Left Rear Right Rear Car 1 A: 42,000 C: 58,000 B: 38,000 D: 44,000 Car 2 B: 40,000 D: 48,000 A: 39,000 C: 50,000 Car 3 C: 48,000 D: 39,000 B: 36,000 Car 4 A: 41,000 D: 42,000 C: 43,000 Car 5 D: 51,000 A: 44,000 C: 52,000 B: 35,000 EPI809/Spring 2008

Randomized Block F-Test Solution
H0: 1 = 2 = 3= 4 Ha: Not All Equal  = .05 1 = 3 2 = 12 Critical Value(s): Test Statistic: Decision: Conclusion: F = Reject at  = .05  = .05 There Is Evidence Pop. Means Are Different F 3.49 EPI809/Spring 2008

SAS CODES FOR ANOVA data block; input Block\$ trt\$ resp @@; cards;
Car1 A: Car1 C: Car1 B: Car1 D: 44000 Car2 B: Car2 D: Car2 A: Car2 C: 50000 Car3 C: Car3 D: Car3 B: Car3 A: 39000 Car4 A: Car4 B: Car4 D: Car4 C: 43000 Car5 D: Car5 A: Car5 C: Car5 B: 35000 ; run; proc anova; class trt block; model resp=trt block; Means trt /lsd bon; EPI809/Spring 2008

Dependent Variable: resp
SAS OUTPUT - ANOVA Dependent Variable: resp Sum of Source DF Squares Mean Square F Value Pr > F Model Error Corrected Total R-Square Coeff Var Root MSE resp Mean Source DF Anova SS Mean Square F Value Pr > F trt Block EPI809/Spring 2008

SAS OUTPUT - LSD Means with the same letter are not significantly different. t Grouping Mean N trt A C: B D: B C B A: C C B: EPI809/Spring 2008

SAS OUTPUT - Bonferroni
Means with the same letter are not significantly different. Bon Grouping Mean N trt A C: A B A D: B B C A: C C B: EPI809/Spring 2008

Factorial Experiments
EPI809/Spring 2008 9 17

Factorial Design 1. Experimental Units (Subjects) Are Assigned Randomly to Treatments Subjects are Assumed Homogeneous 2. Two or More Factors or Independent Variables Each Has 2 or More Treatments (Levels) 3. Analyzed by Two-Way ANOVA EPI809/Spring 2008

1. Saves Time & Effort e.g., Could Use Separate Completely Randomized Designs for Each Variable 2. Controls Confounding Effects by Putting Other Variables into Model 3. Can Explore Interaction Between Variables EPI809/Spring 2008

Two-Way ANOVA Tests the Equality of 2 or More Population Means When Several Independent Variables Are Used Same Results as Separate One-Way ANOVA on Each Variable But Interaction Can Be Tested EPI809/Spring 2008

Two-Way ANOVA Assumptions
1. Normality Populations are Normally Distributed 2. Homogeneity of Variance Populations have Equal Variances 3. Independence of Errors Independent Random Samples are Drawn EPI809/Spring 2008

Two-Way ANOVA Data Table
Factor Factor B A 1 2 ... b Observation k 1 Y Y ... Y 111 121 1b1 Yijk Y Y ... Y 112 122 1b2 2 Y Y ... Y 211 221 2b1 Y Y ... YX 212 222 2b2 Level i Factor A Level j Factor B : : : : : a Y Y ... Y a11 a21 ab1 Y Y ... Y a12 a22 ab2 EPI809/Spring 2008

Two-Way ANOVA Null Hypotheses
1. No Difference in Means Due to Factor A H0: 1. = 2. =... = a. 2. No Difference in Means Due to Factor B H0: .1 = .2 =... = .b 3. No Interaction of Factors A & B H0: ABij = 0 EPI809/Spring 2008

Two-Way ANOVA Total Variation Partitioning
SS(Total) Variation Due to Treatment A Variation Due to Treatment B SSB SSA Variation Due to Interaction Variation Due to Random Sampling SS(AB) SSE EPI809/Spring 2008

Two-Way ANOVA Summary Table
Source of Degrees of Sum of Mean F Variation Freedom Squares Square A a - 1 SS(A) MS(A) MS(A) (Row) MSE B b - 1 SS(B) MS(B) MS(B) (Column) MSE AB (a-1)(b-1) SS(AB) MS(AB) MS(AB) (Interaction) MSE Error n - ab SSE MSE Same as Other Designs Total n - 1 SS(Total) EPI809/Spring 2008

Interaction 1. Occurs When Effects of One Factor Vary According to Levels of Other Factor 2. When Significant, Interpretation of Main Effects (A & B) Is Complicated 3. Can Be Detected In Data Table, Pattern of Cell Means in One Row Differs From Another Row In Graph of Cell Means, Lines Cross EPI809/Spring 2008

Graphs of Interaction Effects of Gender (male or female) & dietary group (sv, lv, nor) on systolic blood pressure Interaction No Interaction Average Average Response male Response male female female sv lv nor sv lv nor EPI809/Spring 2008

Two-Way ANOVA F-Test Example
Effect of diet (sv-strict vegetarians, lv-lactovegetarians, nor-normal) and gender (female, male) on systolic blood pressure. Question: Test for interaction and main effects at the .05 level. EPI809/Spring 2008

SAS CODES FOR ANOVA data factorial; input dietary\$ sex\$ sbp; cards;
sv male sv male sv male sv male sv male sv male sv female 102.6 sv female 99 sv female 83 .6 sv female 99.6 sv female 112.6 lv male 116.5 lv male 118.5 lv male 119.5 lv male 110.5 lv male 115.5 lv male 105.2 nor male 128.3 nor male 129.3 nor male 126.3 nor male 127.3 nor male 125.3 nor female 119.1 nor female 119.2 nor female 115.6 nor female 119.9 nor female 119.8 nor female 119.7 ; run; EPI809/Spring 2008

SAS CODES FOR ANOVA proc glm; class dietary sex;
model sbp=dietary sex dietary*sex; run; model sbp=dietary sex; EPI809/Spring 2008

Dependent Variable: sbp
SAS OUTPUT - ANOVA Dependent Variable: sbp Sum of Source DF Squares Mean Square F Value Pr > F Model Error Corrected Total R-Square Coeff Var Root MSE sbp Mean Source DF Type I SS Mean Square F Value Pr > F dietary sex dietary*sex Source DF Type III SS Mean Square F Value Pr > F dietary sex dietary*sex EPI809/Spring 2008

Linear Contrast Linear Contrast is a linear combination of the means of populations Purpose: to test relationship among different group means with Example: 4 populations on treatments T1, T2, T3 and T4. Contrast T T T T4 relation to test L μ1 - μ3 = 0 L / / μ1 – μ2/2 – μ3/2 = 0 EPI809/Spring 2008

T-test for Linear Contrast (LSD)
Construct a t statistic involving k group means. Degrees of freedom of t - test: df = n-k. Construct To test H0: Compare with critical value t1-α/2,, n-k. Reject H0 if |t| ≥ t1-α/2,, n-k. SAS uses contrast statement and performs an F – test df (1, n-k); Or estimate statement and perform a t-test df (n-k). EPI809/Spring 2008

T-test for Linear Contrast (Scheffe)
Construct multiple contrasts involving k group means. Trying to search for significant contrast Construct To test H0: Compare with critical value. Reject H0 if |t| ≥ a EPI809/Spring 2008

SAS Code for contrast testing
proc glm; class trt block; model resp=trt block; Means trt /lsd bon scheffe; contrast 'A - B = 0' trt ; contrast 'A - B/2 - C/2 = 0' trt ; contrast 'A - B/3 - C/3 -D/3 = 0' trt ; contrast 'A + B - C - D = 0' trt ; lsmeans trt/stderr pdiff; lsmeans trt/stderr pdiff adjust=scheffe; /* Scheffe's test */ lsmeans trt/stderr pdiff adjust=bon; /* Boneferoni's test */ estimate ‘A - B' trt ; run; EPI809/Spring 2008

Regression representation of Anova
EPI809/Spring 2008

Regression representation of Anova
One-way anova: Two-way anova: SAS uses a different constraint EPI809/Spring 2008

Regression representation of Anova
One-way anova: Dummy variables of factor with p levels This is the parameterization used by SAS EPI809/Spring 2008

Conclusion: should be able to
1. Recognize the applications that uses ANOVA 2. Understand the logic of analysis of variance. 3. Be aware of several different analysis of variance designs and understand when to use each one. 4. Perform a single factor hypothesis test using analysis of variance manually and with the aid of SAS or any statistical software. EPI809/Spring 2008

Conclusion: should be able to
5. Conduct and interpret post-analysis of variance pairwise comparisons procedures. 6. Recognize when randomized block analysis of variance is useful and be able to perform the randomized block analysis. 7. Perform two factor analysis of variance tests with replications using SAS and interpret the output. EPI809/Spring 2008

Key Terms Between-Sample Variation Levels One-Way Analysis of Variance
Completely Randomized Design Experiment-Wide Error Rate Factor Levels One-Way Analysis of Variance Total Variation Treatment Within-Sample Variation EPI809/Spring 2008