# SADC Course in Statistics A model for comparing means (Session 12)


## Learning Objectives

At the end of this session, you will be able to:

- understand and interpret the components of a linear model for comparing means
- make comparisons from an examination of the parameter estimates via t-tests
- describe assumptions associated with a linear model for two categorical factors
- conduct a residual analysis to check model assumptions

## A model for the paddy data

Consider again the objective of comparing paddy yields across the 3 varieties. A linear model for these data takes the form:

$$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}, \quad i = 1, 2, 3$$

Here $\beta_0$ represents a constant, and the $g_i$ represent the variety effects. Estimates of $\beta_0$ and the $g_i$ can be obtained with appropriate software.

## A model for the paddy data

[Figure: graph showing the model $y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$, $i = 1, 2, 3$. Annotations: grand mean = 4.06; mean value for the old improved variety. Variety axis: New imp, Old imp, Traditional.]

## Model estimates and anova

| Source | d.f. | S.S. | M.S. | F | Prob. |
|---|---|---|---|---|---|
| Variety | 2 | 35.278 | 17.639 | 40.8 | 0.000 |
| Residual | 33 | 14.269 | 0.4324 | | |
| Total | 35 | 49.547 | | | |

| Parameter | Coeff. | Std. error | t | t prob. |
|---|---|---|---|---|
| Constant | 5.960 | 0.329 | 18.1 | 0.000 |
| Old improved | -1.416 | 0.365 | -3.88 | 0.000 |
| Traditional | -2.960 | 0.370 | -8.00 | 0.000 |

What do these results tell us?
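As a rough check on how the columns of these tables relate, the following sketch recomputes the mean squares, the F ratio and the t statistics from the printed sums of squares, degrees of freedom, coefficients and standard errors. The variable names are illustrative only; the numbers are those shown on the slide.

```python
# Values copied from the anova table on the slide
ss_variety, df_variety = 35.278, 2
ss_residual, df_residual = 14.269, 33

# Mean square = sum of squares / degrees of freedom
ms_variety = ss_variety / df_variety
ms_residual = ss_residual / df_residual

# F ratio = variety mean square / residual mean square
f_ratio = ms_variety / ms_residual

# Each t statistic is the coefficient divided by its standard error
estimates = {
    "Constant": (5.960, 0.329),
    "Old improved": (-1.416, 0.365),
    "Traditional": (-2.960, 0.370),
}
t_values = {name: coef / se for name, (coef, se) in estimates.items()}

print(f"MS variety = {ms_variety:.3f}, MS residual = {ms_residual:.4f}")
print(f"F = {f_ratio:.1f}")
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

The recomputed values agree with the table to the printed precision.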

## Graph showing model again

$$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}, \quad i = 1, 2, 3$$

Mean for variety $i$ = constant + $g_i$ = 5.96 + $g_i$, where $g_1 = 0$, $g_2 = -1.416$, $g_3 = -2.96$.

[Figure: graph of the model for the three varieties (New imp, Old imp, Traditional), with the mean value of the new improved variety marked at 5.96.]
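The relation "mean for variety i = constant + g_i" can be sketched in a few lines. The estimates are those quoted on the slide; the dictionary and variable names are only illustrative.

```python
# Parameter estimates from the fitted model (first level is the baseline)
constant = 5.96
effects = {
    "New improved": 0.0,      # g_1 = 0: baseline variety
    "Old improved": -1.416,   # g_2
    "Traditional": -2.96,     # g_3
}

# Fitted mean for each variety = constant + variety effect
fitted_means = {variety: constant + g for variety, g in effects.items()}

for variety, m in fitted_means.items():
    print(f"{variety}: {m:.2f}")
```

Because the new improved variety is the baseline, the constant is its mean, and the other two effects are differences from it.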

## Relating estimates to means

| Variety | Mean | Std. error |
|---|---|---|
| New improved | 5.96 | 0.329 |
| Old improved | 4.54 | 0.159 |
| Traditional | 3.00 | 0.170 |

Note:

- Old - New = -1.42 = estimate of $g_2$
- Trad - New = -2.96 = estimate of $g_3$

Thus comparison with the first level becomes easy, and the t-tests for the parameter estimates (slide 5, "Model estimates and anova") can be interpreted as comparisons with this level.

## Other comparisons

How do we compare old with traditional? First note (using parameter estimates) that

$$\text{Old} - \text{Trad} = (\text{Old} - \text{New}) - (\text{Trad} - \text{New}) = g_2 - g_3 = -1.416 - (-2.960) = 1.544$$

This is the same as the difference in means between the two varieties, 4.54 - 3.00 = 1.54 (see the table below).

| Variety | Mean | Std. error |
|---|---|---|
| New improved | 5.96 | 0.329 |
| Old improved | 4.54 | 0.159 |
| Traditional | 3.00 | 0.170 |

## Finding the standard error

But how can the std. error be found? For this, the variance-covariance matrix between parameter estimates is needed (see below), followed by some computations. Variances are the diagonal elements; covariances are the off-diagonal elements.

Var-covar:

| | $\beta_0$ | $g_2$ | $g_3$ |
|---|---|---|---|
| $\beta_0$ | 0.1081 | -0.1081 | |
| $g_2$ | | 0.1335 | 0.1081 |
| $g_3$ | | | 0.1369 |

## Computing the standard error

We need

$$\operatorname{Var}(g_2 - g_3) = \operatorname{Var}(g_2) + \operatorname{Var}(g_3) - 2\operatorname{Cov}(g_2, g_3) = 0.1335 + 0.1369 - 2(0.1081) = 0.0542$$

Hence

$$\text{std. error}(g_2 - g_3) = \sqrt{0.0542} = 0.2328$$

So the t-test for the comparison will be

$$t = 1.544 / 0.2328 = 6.63,$$

which is clearly a highly significant result. So there is clear evidence of a difference between the old improved and traditional varieties.
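The same calculation can be written as a small contrast computation. This is a minimal sketch, assuming only the variances and covariance quoted from the variance-covariance matrix; the helper `contrast_variance` is a hypothetical name introduced here, not part of any package.

```python
import math

def contrast_variance(c, v):
    """Variance of the linear combination sum(c[i] * estimate[i]), i.e. c' V c."""
    n = len(c)
    return sum(c[i] * v[i][j] * c[j] for i in range(n) for j in range(n))

# Estimates of g_2 and g_3 and their variance-covariance sub-matrix
g = [-1.416, -2.960]
v = [
    [0.1335, 0.1081],
    [0.1081, 0.1369],
]

# Contrast g_2 - g_3 corresponds to coefficients (1, -1)
c = [1, -1]

diff = sum(ci * gi for ci, gi in zip(c, g))
var_diff = contrast_variance(c, v)
se_diff = math.sqrt(var_diff)
t_stat = diff / se_diff

print(f"difference = {diff:.3f}")
print(f"variance   = {var_diff:.4f}")
print(f"std. error = {se_diff:.4f}")
print(f"t          = {t_stat:.2f}")
```

With coefficients (1, -1) the general form c'Vc reduces to Var(g_2) + Var(g_3) - 2 Cov(g_2, g_3), which is the expression used on the slide.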

## Model Assumptions

The anova model with one categorical factor is:

$$y_{ij} = \beta_0 + g_i + \varepsilon_{ij}$$

As in linear regression, it is assumed that this model is linear. Additionally, the $\varepsilon_{ij}$ are assumed to be independent, with zero mean and constant variance $\sigma^2$, and to be normally distributed.

Note: As before, values predicted for $y_{ij}$ are called fitted values.

## Checking Model Assumptions

Model assumptions are checked in exactly the same way as for regression analysis. A residual analysis is done, looking at plots of residuals in various ways. Such procedures are the same when modelling any quantitative response using a model linear in its unknown parameters. We give below a residual analysis for the model fitted above.

## Histogram to check normality

[Figure: histogram of standardised residuals after fitting a model of yield on variety.]

## A normal probability plot…

[Figure: normal probability plot of the residuals.]

This is another check on the normality assumption. Do you think the points follow a straight line?

## Std. residuals versus fitted values

[Figure: standardised residuals plotted against fitted values.]

This plot checks the assumption of variance homogeneity and helps identify outliers. Is this plot satisfactory? The straight vertical lines appear because variety has just 3 distinct values, so there are only 3 distinct fitted values.
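The quantities behind these residual plots can be sketched as follows. The yields below are hypothetical values invented purely for illustration (the paddy data are not reproduced in this transcript), and the residuals are standardised in the simplest way, by dividing by the square root of the residual mean square; statistical packages may use a leverage-adjusted version instead.

```python
import math
from statistics import mean

# HYPOTHETICAL yields per variety, for illustration only (not the paddy data)
yields = {
    "New improved": [6.1, 5.8, 6.3],
    "Old improved": [4.2, 4.9, 4.5, 4.6],
    "Traditional": [3.1, 2.8, 3.3, 2.9],
}

# With one categorical factor, the fitted value for a plot is its variety mean
fitted = {variety: mean(ys) for variety, ys in yields.items()}

# Residual = observed - fitted
residuals = [(variety, y - fitted[variety])
             for variety, ys in yields.items() for y in ys]

# Residual mean square = residual SS / residual d.f. (n - number of varieties)
df_residual = len(residuals) - len(yields)
mse = sum(r ** 2 for _, r in residuals) / df_residual

# Simple standardisation: residual / sqrt(residual mean square)
std_residuals = [(variety, r / math.sqrt(mse)) for variety, r in residuals]

# Possible outliers: standardised residuals larger than 2 in absolute value
flagged = [(variety, round(r, 2)) for variety, r in std_residuals if abs(r) > 2]

for variety, r in std_residuals:
    print(f"{variety:13s} fitted = {fitted[variety]:.2f}  std. residual = {r:+.2f}")
print("flagged:", flagged)
```

Plotting the standardised residuals against the fitted values gives one vertical strip of points per variety, which is why the plot on the slide shows straight vertical lines.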

## Conclusions

There was little indication to doubt any of the assumptions associated with the model. There was clear evidence that the varieties differed in their mean paddy yields. The new improved variety gave the highest production, showing an increase of 1.42 tonnes/ha, with confidence interval (0.67, 2.16), over the old improved variety. The traditional variety gave the lowest production.
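The quoted interval can be reproduced from the estimate and standard error for the old improved variety. This sketch assumes a 95% confidence level (the slide does not state the level) and uses the tabulated two-sided critical value of about 2.0345 for a t distribution with 33 residual degrees of freedom, hard-coded here to keep the example free of external libraries.

```python
# Estimated difference (new improved - old improved) and its standard error,
# taken from the parameter estimates table
diff = 1.416
se = 0.365

# ASSUMPTION: 95% confidence level; t critical value for 33 residual d.f.
t_crit = 2.0345

lower = diff - t_crit * se
upper = diff + t_crit * se

print(f"{diff:.2f} tonnes/ha, interval ({lower:.2f}, {upper:.2f})")
```

Under that assumption the limits agree with the interval (0.67, 2.16) given in the conclusions.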

Practical work follows to ensure learning objectives are achieved…
