# SADC Course in Statistics Analysis of Variance with two factors (Session 13)


## Slide 2: Learning Objectives

At the end of this session, you will be able to:

- understand and interpret the components of a linear model with two categorical factors
- fit a model involving two factors, interpret the output and present the results
- understand the difference between raw means and adjusted means
- appreciate that a residual analysis is the same with more complex models

## Slide 3: Using Paddy again!

In the paddy example, there were two categorical factors, variety and village. Here we will look at a model including both factors and the corresponding output. We will also discuss the assumptions associated with anova models with categorical factors and the procedures to check these assumptions.

## Slide 4: A model using two factors

The objective here is to compare paddy yields across the 3 varieties and also across villages. A linear model for this takes the form:

$$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$

Here $\beta_0$ represents a constant, and the $g_j$ ($j = 1, 2, 3$) represent the variety effect as before. We also have the term $v_i$ ($i = 1, 2, 3, 4$) to represent the village effect.

## Slide 5: Anova results

| Source | d.f. | S.S. | M.S. | F | Prob. |
|---|---|---|---|---|---|
| Village | 3 | 13.91 | 4.64 | 14.0 | 0.000 |
| Variety | 2 | 25.68 | 12.84 | 38.7 | 0.000 |
| Residual | 30 | 9.95 | 0.3318 | | |
| Total | 35 | 49.55 | | | |

The above is a two-way anova, since there are two factors explaining the variability in paddy yields. Again, the Residual M.S. ($s^2 = 0.3318$) describes the variation not explained by village and variety.
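The mean squares and F ratios in the anova table above follow directly from the sums of squares and degrees of freedom. The following is a minimal sketch of that arithmetic, using only the rounded figures printed in the table; the variable and function names are illustrative, not taken from any statistics package. Because the inputs are rounded, the results agree with the table only up to rounding.

```python
# Degrees of freedom and sums of squares copied from the anova table above.
anova = {
    "Village": {"df": 3, "ss": 13.91},
    "Variety": {"df": 2, "ss": 25.68},
    "Residual": {"df": 30, "ss": 9.95},
}


def mean_square(row):
    """Mean square = sum of squares divided by its degrees of freedom."""
    return row["ss"] / row["df"]


# The residual mean square (s^2) is the denominator of every F ratio.
residual_ms = mean_square(anova["Residual"])

# F ratio for each factor = factor M.S. / residual M.S.
f_ratios = {
    name: mean_square(row) / residual_ms
    for name, row in anova.items()
    if name != "Residual"
}

for name, f in f_ratios.items():
    print(f"{name}: M.S. = {mean_square(anova[name]):.2f}, F = {f:.1f}")
print(f"Residual M.S. (s^2) = {residual_ms:.4f}")
```

The probabilities in the last column of the table come from the F distribution with the corresponding degrees of freedom, which is not computed in this sketch.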

## Slide 6: Sample sizes

Number of observations by village (rows) and variety (columns):

| Village | New | Old | Trad | Total |
|---|---|---|---|---|
| KESEN | 0 | 3 | 4 | 7 |
| NANDA | 2 | 7 | 5 | 14 |
| NIKO | 0 | 2 | 3 | 5 |
| SABEY | 2 | 5 | 3 | 10 |
| Total | 4 | 17 | 15 | 36 |

The table shows that the data are not balanced, so we need to worry about the order in which terms are fitted. How then should we interpret the sequential S.S. shown in the slide 5 anova ("Anova results")?
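The balance check described above can be sketched in a few lines. This is an illustration only: it uses the cell counts from the sample-size table, treats "balanced" in the simple sense of equal numbers of observations in every village-by-variety cell, and all names are hypothetical.

```python
# Cell counts copied from the sample-size table (village -> variety -> n).
counts = {
    "KESEN": {"New": 0, "Old": 3, "Trad": 4},
    "NANDA": {"New": 2, "Old": 7, "Trad": 5},
    "NIKO": {"New": 0, "Old": 2, "Trad": 3},
    "SABEY": {"New": 2, "Old": 5, "Trad": 3},
}

# Marginal totals for villages (rows) and varieties (columns).
row_totals = {village: sum(cells.values()) for village, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for variety, n in cells.items():
        col_totals[variety] = col_totals.get(variety, 0) + n
grand_total = sum(row_totals.values())

# Simple balance check: every cell has the same number of observations.
cell_sizes = {n for cells in counts.values() for n in cells.values()}
is_balanced = len(cell_sizes) == 1

# Cells with no observations at all.
empty_cells = [
    (village, variety)
    for village, cells in counts.items()
    for variety, n in cells.items()
    if n == 0
]

print("Village totals:", row_totals)
print("Variety totals:", col_totals)
print("Grand total:", grand_total)
print("Balanced:", is_balanced)
print("Empty cells:", empty_cells)
```

With unequal cell counts, the sum of squares attributed to a factor depends on whether the other factor has already been fitted, which is why the order of terms matters here.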

## Slide 7: Anova with adjusted SS and MS

| Source | d.f. | Adj. S.S. | Adj. M.S. | F | Prob. |
|---|---|---|---|---|---|
| Village | 3 | 4.32 | 1.44 | 4.34 | 0.012 |
| Variety | 2 | 25.68 | 12.84 | 38.7 | 0.000 |
| Residual | 30 | 9.95 | 0.3318 | | |
| Total | 35 | 49.55 | | | |

How may the above results be interpreted? What are your conclusions?

## Slide 8: Model estimates

| Parameter | Coeff. | Std. error | t | t prob. |
|---|---|---|---|---|
| $\beta_0$: constant | 5.284 | 0.386 | 13.7 | 0.000 |
| $v_2$ (Nanda) | 0.718 | 0.272 | 2.63 | 0.013 |
| $v_3$ (Niko) | -0.179 | 0.337 | -0.53 | 0.599 |
| $v_4$ (Sabey) | 0.633 | 0.294 | 2.16 | 0.039 |
| $g_2$ (old) | -1.201 | 0.327 | -3.67 | 0.001 |
| $g_3$ (trad) | -2.614 | 0.340 | -7.68 | 0.000 |

What do these results tell us?
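One way to read the model estimates is to turn them into fitted yields for each village and variety combination. The sketch below assumes, as the estimates table implies, that the first level of each factor (Kesen for village, New for variety) has its effect set to zero; the function name `fitted_yield` is hypothetical.

```python
# Estimates copied from the model estimates table.
# Assumption: the base levels (Kesen, New) have effect zero.
constant = 5.284
village_effect = {"Kesen": 0.0, "Nanda": 0.718, "Niko": -0.179, "Sabey": 0.633}
variety_effect = {"New": 0.0, "Old": -1.201, "Trad": -2.614}


def fitted_yield(village, variety):
    """Fitted value = constant + village effect + variety effect."""
    return constant + village_effect[village] + variety_effect[variety]


# Fitted yields for every village x variety combination.
for village in village_effect:
    row = ", ".join(
        f"{variety}: {fitted_yield(village, variety):.3f}"
        for variety in variety_effect
    )
    print(f"{village:6s} {row}")

# Within any one village, the Old - New difference is the estimate of g_2.
old_minus_new = fitted_yield("Nanda", "Old") - fitted_yield("Nanda", "New")
print(f"Old - New (in Nanda): {old_minus_new:.3f}")
```

Because the model is additive, the difference between two varieties is the same in every village. Note that some combinations (for example Kesen with the New variety) have no observations in the sample, so their fitted values rest entirely on the model.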

## Slide 9: Relating estimates to means

Again:

- Old - New = -1.201 = estimate of $g_2$
- Trad - New = -2.614 = estimate of $g_3$

This is similar to the case with one categorical factor: comparisons with the base level can be made easily using the model estimates. But when sample sizes are unequal across the two categorical factors, results should be reported in terms of adjusted means!

## Slide 10: Raw means and adjusted means

| Variety | Sample size (n) | Raw mean | Std. error (s.d./√n) |
|---|---|---|---|
| New improved | 4 | 5.96 | 0.128 |
| Old improved | 17 | 4.54 | 0.173 |
| Traditional | 15 | 3.00 | 0.168 |

Model based summaries (adjusted means):

| Variety | Adjusted mean | Std. error (s/√n) |
|---|---|---|
| New improved | 5.58 | 0.308 |
| Old improved | 4.38 | 0.148 |
| Traditional | 2.96 | 0.150 |

## Slide 11: Computing adjusted means

The model equation

$$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$

can be used to find the variety adjusted means. For example, the adjusted mean for the traditional variety is:

$$5.284 + 0.25\,[0 + 0.718 - 0.179 + 0.633] - 2.614 = 2.963$$

Thus the variety adjusted mean is an average over the 4 villages.
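The same calculation can be repeated for all three varieties. The sketch below averages the four village effects with equal weight and adds the constant and the variety effect, as in the worked example for the traditional variety; the estimates are those reported in the model estimates table, with the base levels (Kesen, New) assumed to have effect zero, and the function name `adjusted_mean` is hypothetical.

```python
# Estimates from the fitted two-factor model (base levels have effect zero).
constant = 5.284
village_effect = {"Kesen": 0.0, "Nanda": 0.718, "Niko": -0.179, "Sabey": 0.633}
variety_effect = {"New": 0.0, "Old": -1.201, "Trad": -2.614}


def adjusted_mean(variety):
    """Variety mean averaged equally over the villages."""
    average_village = sum(village_effect.values()) / len(village_effect)
    return constant + average_village + variety_effect[variety]


adjusted = {variety: adjusted_mean(variety) for variety in variety_effect}
for variety, value in adjusted.items():
    print(f"{variety}: {value:.3f}")
```

Each village contributes equally to an adjusted mean regardless of how many observations it has, whereas a raw mean is weighted by the unequal sample sizes. This is why the two sets of means differ when the data are not balanced.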

## Slide 12: Checking model assumptions

The anova model with two categorical factors is:

$$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$

The model assumptions are associated with the $\varepsilon_{ij}$. These are checked in exactly the same way as before: a residual analysis is done, looking at plots of residuals in various ways. We give below a residual analysis for the model fitted above.

## Slide 13: Histogram to check normality

Histogram of standardised residuals after fitting a model of yield on village and variety.

## Slide 14: A normal probability plot…

Another check on the normality assumption. Do you think the points follow a straight line?

## Slide 15: Std. residuals versus fitted values

Checking the assumption of variance homogeneity, and identification of outliers. What can you say here about the variance homogeneity assumption?

## Slide 16: Finally… know your software

Different software packages impose different constraints on the model parameters, so you need to be aware of which constraint is in use. For example:

- Stata and Genstat set the first level of the factor to zero.
- SPSS and SAS set the last level to zero.
- Minitab imposes a constraint that sets the sum of the parameter estimates to zero!

Check also whether the software produces sequential, adjusted or some other form of sums of squares. The correct interpretation of the anova results depends on this.
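To see why the choice of constraint changes the printed estimates but not the fitted model, the variety effects can be re-expressed under each of the three constraints mentioned above. This is a sketch of the algebra only, not the output of any of the packages named; it starts from the first-level-zero estimates reported earlier, and the function `reparameterise` is hypothetical.

```python
# Variety estimates under the first-level-zero constraint (from the model
# estimates table); the constant here refers to the base village.
constant = 5.284
variety_effect = {"New": 0.0, "Old": -1.201, "Trad": -2.614}


def reparameterise(constant, effects, constraint):
    """Shift the effects to satisfy a constraint, compensating in the constant.

    constraint: "first" (first level zero), "last" (last level zero)
    or "sum" (effects sum to zero).
    """
    levels = list(effects)
    if constraint == "first":
        shift = effects[levels[0]]
    elif constraint == "last":
        shift = effects[levels[-1]]
    elif constraint == "sum":
        shift = sum(effects.values()) / len(effects)
    else:
        raise ValueError(f"unknown constraint: {constraint}")
    # Subtracting the shift from every effect and adding it to the constant
    # leaves constant + effect unchanged for every level.
    return constant + shift, {lvl: e - shift for lvl, e in effects.items()}


last_const, last_eff = reparameterise(constant, variety_effect, "last")
sum_const, sum_eff = reparameterise(constant, variety_effect, "sum")

for label, (c, eff) in {
    "first level zero": (constant, variety_effect),
    "last level zero": (last_const, last_eff),
    "sum to zero": (sum_const, sum_eff),
}.items():
    shown = ", ".join(f"{lvl}: {e:.3f}" for lvl, e in eff.items())
    print(f"{label:17s} constant = {c:.3f}; {shown}")
```

Under every constraint the differences between levels (for example Old - New = -1.201) and the value of constant plus effect are identical; only the individual parameter estimates are labelled differently.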

Practical work follows to ensure learning objectives are achieved…