SADC Course in Statistics Analysis of Variance for comparing means (Session 11)
Learning Objectives
At the end of this session, you will be able to:
- understand and interpret the components of an anova table for comparing means
- present the results following an anova in terms of an appropriate summary table
- make simple comparisons across pairs of levels of an explanatory categorical variable

Comparing two groups
Recall from Module H2 that the means of two population sub-groups, with respect to a quantitative measurement of interest, can be compared using a t-test. For example, we could compare the mean poverty levels, or the mean land area owned by households, across urban and rural areas. Or we could compare household size, or the household dependency ratio, across male-headed and female-headed households.

Comparing more than two groups
In the above examples, just two groups were being compared. Can we extend these ideas to a comparison across more than two groups? This is possible through use of an analysis of variance (anova). We have met an anova already, but the objective and hypothesis are different here!

Objectives addressed
Some examples of questions to be answered:
- Is the average income of households in Malawi the same across its three regions?
- Is the average length of the rainy season the same across different districts?
- Have interventions to control the incidence of malaria (mean number of cases per 1000 of population) been equally effective across three areas in Zambia where controls were put in place?

An example – Paddy data again!
Suppose farmers want to know which variety of rice to grow in order to maximise their yields. There are three varieties to choose from: new improved, old improved and traditional.
The null hypothesis to be tested is
H0: the mean yields are the same across all varieties
versus the alternative hypothesis
H1: not all variety means are equal, i.e. there is some difference somewhere.

Using anova to compare means
As in a simple linear regression, the anova splits the overall variation in y (here y = rice yields) into two components:
- variation due to differences in variety means
- residual variation, i.e. variation not due to variety differences
H0 is tested by comparing the two components of variation above, using the variance ratio (F-ratio) of the variety mean square to the residual mean square. A large variance ratio is evidence against H0.

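The split into two components can be sketched in a few lines of Python. This is a minimal illustration only: the function name `one_way_anova` and the toy yields are hypothetical, invented for the sketch, and are not the paddy data.

```python
def one_way_anova(groups):
    """Split total variation in y into between-group and within-group parts.

    `groups` maps a group label to a list of observations.
    """
    all_values = [y for values in groups.values() for y in values]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0  # variation due to differences in group means
    ss_within = 0.0   # residual variation within groups
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((y - group_mean) ** 2 for y in values)

    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": ms_between / ms_within,
    }


# Hypothetical yields, made up purely for illustration (NOT the paddy data).
toy_yields = {
    "A": [5.0, 6.0, 7.0],
    "B": [4.0, 5.0, 6.0],
    "C": [2.0, 3.0, 4.0],
}
result = one_way_anova(toy_yields)
print(result)
```

The two sums of squares add up to the total sum of squares, which is the sense in which the anova "splits" the overall variation.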
Anova table - interpretation

Source     d.f.   S.S.     M.S.     F      Prob.
Variety      2    35.278   17.639   40.8   0.000
Residual    33    14.269    0.4324
Total       35    49.547

There are 2 d.f. for variety, since it reflects variation between 3 varieties. The residual M.S. of 0.4324 comes from the balance, or unexplained component, of variation in yields. It represents variation between farmers within varieties.

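As a quick check on how the columns of the anova table relate to one another, the mean squares and F-ratio can be recomputed from the sums of squares and degrees of freedom. The variable names below are chosen for this sketch; the numbers are those in the table.

```python
# Sums of squares and degrees of freedom taken from the anova table.
ss_variety, df_variety = 35.278, 2
ss_residual, df_residual = 14.269, 33

# Mean square = sum of squares / degrees of freedom.
ms_variety = ss_variety / df_variety
ms_residual = ss_residual / df_residual

# F-ratio = variety mean square / residual mean square.
f_ratio = ms_variety / ms_residual

# The Total row is the sum of the Variety and Residual rows.
ss_total = ss_variety + ss_residual
df_total = df_variety + df_residual

print(f"Variety M.S.  = {ms_variety:.3f}")
print(f"Residual M.S. = {ms_residual:.4f}")
print(f"F-ratio       = {f_ratio:.1f}")
print(f"Total         = {ss_total:.3f} on {df_total} d.f.")
```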
Anova table - results
For the paddy data, the F-ratio of 40.8 on (2, 33) d.f. is highly significant (the p-value is printed as 0.000, i.e. p < 0.001). This is strong evidence against H0, so H0 is rejected.

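A p-value printed as 0.000 is not exactly zero. When the numerator has exactly 2 d.f., as here, the upper-tail probability of the F distribution has a simple closed form, P(F > f) = (1 + 2f/d)^(-d/2) with d the residual d.f., so it can be sketched without a statistics library. The helper name `f_tail_prob_2df` is hypothetical, and the shortcut does not apply for other numerator d.f., where statistical software or tables would be needed.

```python
def f_tail_prob_2df(f_ratio, df_residual):
    """Upper-tail probability P(F > f_ratio) for an F distribution
    on (2, df_residual) d.f. Valid ONLY when the numerator d.f. is 2."""
    return (1.0 + 2.0 * f_ratio / df_residual) ** (-df_residual / 2.0)


# F-ratio and residual d.f. from the anova table for the paddy data.
p_value = f_tail_prob_2df(40.8, 33)
print(f"p-value = {p_value:.2e}")  # far below 0.001
```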
Presentation of results
Results are presented in terms of the variety means and their standard errors.

Variety        Mean   Std. error   95% C.I.
New improved   5.96   0.329        (5.29, 6.63)
Old improved   4.54   0.159        (4.22, 4.87)
Traditional    3.00   0.170        (2.65, 3.35)
Overall        4.06   0.110        (3.84, 4.28)

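The standard errors in this table are consistent with sqrt(residual M.S. / n) for each variety, and the confidence limits with mean ± t × s.e. on 33 d.f. The sketch below rests on stated assumptions: the group sizes of 4 and 17 are those used later in the session for the new and old improved varieties; the traditional group size of 15 is inferred from the 36 observations implied by the total of 35 d.f.; and the multiplier 2.0345 is the two-sided 5% t-value for 33 d.f. read from standard tables, not a figure given in the slides.

```python
import math

residual_ms = 0.4324   # residual mean square from the anova table
t_crit = 2.0345        # assumption: two-sided 5% t-value on 33 d.f. (from tables)

means = {"New improved": 5.96, "Old improved": 4.54, "Traditional": 3.00}
# 4 and 17 appear in the session; 15 is inferred as 36 - 4 - 17.
n_obs = {"New improved": 4, "Old improved": 17, "Traditional": 15}

summary = {}
for variety, mean in means.items():
    std_error = math.sqrt(residual_ms / n_obs[variety])
    half_width = t_crit * std_error
    summary[variety] = (std_error, mean - half_width, mean + half_width)

for variety, (std_error, lower, upper) in summary.items():
    print(f"{variety:13s} s.e. = {std_error:.3f}  95% C.I. = ({lower:.2f}, {upper:.2f})")
```

Because the means used here are already rounded to two decimals, a limit may differ from the table in the last digit.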
Conclusions
The anova results indicate clear evidence that varieties differ in terms of their yields. The best one is the new improved variety, giving a mean yield of about 6 tonnes per hectare (95% confidence limits ranging from 5.3 to 6.6). The traditional variety does poorly in comparison with the other two varieties, yielding only about 3 tonnes per hectare.

Further comparisons
The anova is only a first step in the analysis. If the F-ratio is significant, proceed further to see where the actual differences occur. Do this using t-tests. For example, to compare the mean yields for the new and old improved varieties, first calculate:
Difference in means = 1.416
Standard error of the difference = $\sqrt{s^2 (1/n_1 + 1/n_2)} = \sqrt{0.4324 \times (1/4 + 1/17)} = 0.365$
where $n_i$ is the number of observations for variety $i$, and $s^2$ is the residual mean square from the anova.

t-test for comparing means
Then find the t-statistic, given by
t = (difference in means) / (std. error of difference)
Here, t = 1.416/0.365 = 3.88.
Compare this with t-tables with 33 d.f. (since the std. error of the difference is based on the anova residual mean square).
The result is significant at the 0.1% significance level.
Conclude: strong evidence of a difference.

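The pairwise comparison can be retraced numerically as follows. This is a sketch under one stated assumption: the two-sided 0.1% critical value for 33 d.f. is taken as roughly 3.61 from standard t-tables, a figure not given in the slides.

```python
import math

residual_ms = 0.4324        # s^2, the residual mean square from the anova
n_new, n_old = 4, 17        # observations for new and old improved varieties
diff_in_means = 1.416       # new improved minus old improved

# Standard error of the difference, based on the pooled residual mean square.
se_diff = math.sqrt(residual_ms * (1 / n_new + 1 / n_old))
t_stat = diff_in_means / se_diff

# Assumption: approximate two-sided 0.1% critical value on 33 d.f. from t-tables.
t_crit_0_1_percent = 3.61

print(f"s.e. of difference = {se_diff:.3f}")
print(f"t-statistic        = {t_stat:.2f}")
print("significant at 0.1% level:", abs(t_stat) > t_crit_0_1_percent)
```

The t-statistic computed from the unrounded standard error differs slightly in the second decimal from 1.416/0.365, which uses the rounded value; the conclusion is unaffected.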
Other comparisons
There may be other comparisons of interest, for example, comparing the mean of the traditional variety with the improved varieties. Here the comparison of interest is the difference between the traditional mean and the mean of the improved varieties. A t-test may again be used, by computing the t-statistic as this difference divided by its standard error, and comparing the result with a t-distribution with d.f. = residual d.f.

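One way to set up such a comparison is as a contrast of the three variety means. The sketch below assumes an equal-weight average of the two improved varieties minus the traditional mean; that choice of weights is an assumption made for illustration, as is the traditional group size of 15 (inferred from 36 observations in total, less the 4 and 17 used earlier). The standard error of a contrast with coefficients c_i is sqrt(s^2 × sum of c_i^2 / n_i).

```python
import math

residual_ms = 0.4324   # s^2, residual mean square from the anova (33 d.f.)

means = {"New improved": 5.96, "Old improved": 4.54, "Traditional": 3.00}
# 4 and 17 appear in the session; 15 is inferred as 36 - 4 - 17.
n_obs = {"New improved": 4, "Old improved": 17, "Traditional": 15}

# Assumed contrast: average of the two improved varieties minus traditional.
coefficients = {"New improved": 0.5, "Old improved": 0.5, "Traditional": -1.0}

contrast_estimate = sum(coefficients[v] * means[v] for v in means)
contrast_se = math.sqrt(
    residual_ms * sum(coefficients[v] ** 2 / n_obs[v] for v in means)
)
t_stat = contrast_estimate / contrast_se

print(f"contrast estimate = {contrast_estimate:.2f}")
print(f"standard error    = {contrast_se:.3f}")
print(f"t-statistic       = {t_stat:.2f}  (compare with t on 33 d.f.)")
```

A different weighting, for instance weighting the improved varieties by their numbers of observations, would change the coefficients but not the method.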
Practical work follows to ensure learning objectives are achieved…