Topic 25: Inference for Two-Way ANOVA
Outline Two-way ANOVA –Data, models, parameter estimates ANOVA table, EMS Analytical strategies Regression approach
Data Response written Y ijk where –i denotes the level of the factor A –j denotes the level of the factor B –k denotes the k th observation in cell (i,j) i = 1,..., a levels of factor A j = 1,..., b levels of factor B k = 1,..., n observations in cell (i,j)
Cell means model Y ijk = μ ij + ε ijk –where μ ij is the theoretical mean or expected value of all observations in cell (i,j) –the ε ijk are iid N(0, σ 2 ) –This means Y ijk ~N(μ ij, σ 2 ) and independent
Factor effects model μ ij = μ + α i + β j + (αβ) ij Consider μ to be the overall mean α i is the main effect of A β j is the main effect of B (αβ) ij is the interaction between A and B
Constraints for this interpretation α. = Σ i α i = 0 (df = a-1) β. = Σ j β j = 0 (df = b-1) (αβ).j = Σ i (αβ) ij = 0 for all j (αβ) i. = Σ j (αβ) ij = 0 for all I df = (a-1)(b-1)
SAS GLM Constraints α a = 0 (1 constraint) β b = 0 (1 constraint) (αβ) aj = 0 for all j (b constraints) (αβ) ib = 0 for all i (a constraints) The total is 1+1+a+b-1=a+b+1 (the constraint (αβ) ab is counted twice in the last two bullets above)
Parameters and constraints The cell means model has ab parameters for the means The factor effects model has (1+a+b+ab) parameters –An intercept (1) –Main effect of A (a) –Main effect of B (b) –Interaction of A and B (ab)
Factor effects model There are 1+a+b+ab parameters There are 1+a+b constraints There are ab unconstrained parameters (or sets of parameters), the same number of parameters for the means in the cell means model While certain parameters depend on choice of constraints, others do not
KNNL Example KNNL p 833 Y is the number of cases of bread sold A is the height of the shelf display, a=3 levels: bottom, middle, top B is the width of the shelf display, b=2: regular, wide n=2 stores for each of the 3x2 treatment combinations
Proc GLM with solution proc glm data=a1; class height width; model sales=height width height*width /solution; means height*width; run;
Solution output Intercept 44.0 B height B height B height B width B width B
Solution output height*width B height*width B height*width B height*width B height*width B height*width B
Means height width Mean = = = = = = Based on estimates from previous two pages
Check normality Alternative way to form QQplot proc glm data=a1; class height width; model sales=height width height*width; output out=a2 r=resid; proc rank data=a2 out=a3 normal=blom; var resid; ranks zresid;
Normal Quantile plot proc sort data=a3; by zresid; symbol1 v=circle i=sm70; proc gplot data=a3; plot resid*zresid/frame; run;
The plot Note, dfE is only 6
ANOVA Table Source df SS MS F A a-1 SSA MSA MSA/MSE B b-1 SSB MSB MSB/MSE AB (a-1)(b-1) SSAB MSAB MSAB/MSE Error ab(n-1) SSE MSE _ Total abn-1 SSTO
Expected Mean Squares E(MSE) = σ 2 E(MSA) = σ 2 + nb( Σ i α i 2 )/(a-1) E(MSB) = σ 2 + na( Σ j β j 2 )/(b-1) E(MSAB) = σ 2 + n( Σ )/((a-1)(b-1)) Here, α i, β j, and (αβ) ij are defined with the usual factor effects constraints
An analytical strategy Run the model with main effects and the two-way interaction Plot the data, the means, and look at the normal quantile plot and residual plots If assumptions seem reasonable, check the significance of test for the interaction
AB interaction not sig If the AB interaction is not statistically significant –Possibly rerun the analysis without the interaction (See pooling §19.10) –Potential Type II errors when pooling –For a main effect with more than two levels that is significant, use the means statement with the Tukey multiple comparison procedure
GLM Output Source DF SS MS F Pr > F Model Error Total Note that there are 6 cells in this design.
Output ANOVA Type I or Type III Source DF SS MS F Pr > F height <.0001 width h*w Note Type I and Type III analyses are the same because cell size n is constant
Rerun without interaction proc glm data=a1; class height width; model sales=height width; means height / tukey lines; run;
ANOVA output Source DF MS F Pr > F height <.0001 width MS(height) and MS(width) have not changed. The MSE, F*’s, and P-values have because of pooling.
Comparison of MSEs Error Error Model with interaction Model without interaction Little change in MSE here…often only pool when df small
Pooling SS Data = Model + Residual When we remove a term from the `model’, we put this variation and the associated df into `residual’ This is called pooling A benefit is that we have more df for error and a simpler model Potential Type II errors Beneficial only in small experiments
Pooling SSE and SSAB For model with interaction SSAB=24, dfAB=2 SSE=62, dfE=6 MSE=10.33 For the model with main effects only SSE=62+24=86, dfE=6+2=8 MSE=10.75
Tukey Output Mean N height A B B B
Plot of the means
Regression Approach Similar to what we did for one-way Use a-1 variables for A Use b-1 variables for B Multiply each of the a-1 variables for A times each of the b-1 for B to get (a- 1)(b-1) for AB You can use the test statement in Proc reg to perform F tests
Create Variables data a4; set a1; X1 = (height eq 1) - (height eq 3); X2 = (height eq 2) - (height eq 3); X3 = (width eq 1) - (width eq 2); X13 = X1*X3; X23 = X2*X3;
Run Proc Reg proc reg data=a4; model sales= X1 X2 X3 X13 X23 / ss1; height: test X1, X2; width: test X3; interaction: test X13, X23; run;
SAS Output Analysis of Variance SourceDF Sum of Squares Mean SquareF ValuePr > F Model Error Corrected Total Same basic ANOVA table
SAS Output Parameter Estimates VariableDF Parameter Estimate Standard Errort ValuePr > |t|Type I SS Intercept < X X < X X X
SS Results SS(Height) = SS(X1)+SS(X2|X1) 1544 = SS(Width) = SS(X3|X1,X2) 12 = 12 SS(Height*Width) = SS(X13|X1,X2,X3) + SS(X23|X1, X2,X3,X13) 24 =
Test Results Test height Results for Dependent Variable sales SourceDF Mean SquareF ValuePr > F Numerator <.0001 Denominator Test interaction Results for Dependent Variable sales SourceDF Mean SquareF ValuePr > F Numerator Denominator Test width Results for Dependent Variable sales SourceDF Mean SquareF ValuePr > F Numerator Denominator
Interpreting Estimates
Last slide Finish reading KNNL Chapter 19 Topic25.sas contains the SAS commands for these slides We will now focus more on the strategies needed to handle a two- or more factor ANOVA