# SADC Course in Statistics: Comparing Means from Independent Samples (Session 12)


## Learning Objectives

By the end of this session, you will be able to:

- explain how means from two populations may be compared
- describe the assumptions associated with the independent samples t-test
- interpret computer output from a two-sample t-test
- present and write up conclusions resulting from such tests
- explain the difference between statistical significance and an important result

## An example: Comparing 2 means

As part of a health survey, cholesterol levels of men in a small rural area were measured, including those working in agriculture and those employed in non-agricultural work.

| Agric | Non-agric |
|---|---|
| 156 | 223 |
| 282 | 131 |
| 222 | 137 |
| 172 | 146 |
| 183 | 130 |
| 206 | 122 |
| 210 | 141 |
| 198 | 192 |
| 199 | 188 |
| 211 | 212 |

Aim: To see if mean cholesterol levels were different between the two groups.

## Summary statistics

Begin with summarising each column of data.

| | Agric | Non-agric |
|---|---|---|
| Mean | 203.9 | 162.2 |
| Std. dev. | 33.9 | 37.6 |
| Variance | 1147 | 1412 |

There appears to be a substantial difference between the two means. Our question of interest is: Is this difference showing a real effect, or could it merely be a chance occurrence?
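The summary figures can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the names `agric`, `non_agric` and `summarise` are illustrative, and the values are the ten readings per group from the example data.

```python
from statistics import mean, stdev, variance

# Cholesterol readings for the two groups, as listed in the example.
agric = [156, 282, 222, 172, 183, 206, 210, 198, 199, 211]
non_agric = [223, 131, 137, 146, 130, 122, 141, 192, 188, 212]

def summarise(values):
    """Return the sample mean, standard deviation and variance (n - 1 divisor)."""
    return mean(values), stdev(values), variance(values)

for label, values in (("Agric", agric), ("Non-agric", non_agric)):
    m, sd, var = summarise(values)
    print(f"{label:10s} mean={m:.1f}  sd={sd:.1f}  variance={var:.0f}")
```

Both `stdev` and `variance` use the n - 1 divisor, which is what the slide figures appear to be based on.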

## Setting up the hypotheses

To answer the question, we set up:

- Null hypothesis $H_0$: no difference between the two groups (in terms of mean response), i.e. $\mu_1 = \mu_2$
- Alternative hypothesis $H_1$: there is a difference, i.e. $\mu_1 \neq \mu_2$

The resulting test will be two-sided since the alternative is “not equal to”.

## Test for comparing means

Use a two-sample (unpaired) t-test, which is appropriate with 2 independent samples.

Assumptions:

- normal distributions for each sample
- constant variance (so the test uses a pooled estimate of variance)
- observations are independent

Procedure:

- assess how large the difference in means is, relative to the noise in this difference, i.e. the std. error of the difference.

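## Test Statistic

The test statistic is:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $s^2$, the pooled estimate of variance, is given by

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Here $\bar{x}_i$, $s_i^2$ and $n_i$ are the mean, variance and size of sample $i$, and $t$ is based on $n_1 + n_2 - 2$ d.f.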
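As a rough illustration, the pooled-variance calculation can be written as a small Python function working from summary statistics. This sketch assumes the standard pooled two-sample formulas (sample variances weighted by their degrees of freedom); the function name `pooled_t` is hypothetical, and the inputs are the means and variances quoted for the cholesterol example.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t-test quantities using a pooled variance estimate."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their d.f.
    s2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Standard error of the difference between the two means.
    se_diff = sqrt(s2 * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return s2, se_diff, t, df

# Summary statistics from the cholesterol example (10 men per group).
s2, se_diff, t, df = pooled_t(203.9, 1147, 10, 162.2, 1412, 10)
print(f"s2={s2:.1f}  se={se_diff:.2f}  t={t:.2f}  df={df}")
```

With equal group sizes the pooled variance is simply the average of the two sample variances.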

## Numerical Results

The pooled estimate of variance is:

$$s^2 = \frac{9 \times 1147 + 9 \times 1412}{18} = 1279.5$$

Hence the t-statistic is:

$$t = \frac{41.7}{\sqrt{2 \times 1279.5 / 10}} = 2.61,$$

based on 18 d.f.

Comparing with tables of $t_{18}$, this result is significant at the 2% level, so reject $H_0$.

Note: The exact p-value = 0.018
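The exact p-value would normally come from statistical software. As a hedged sketch of what such software does, the two-sided tail area of the t distribution can be approximated by numerical integration using only the Python standard library. The function names `t_pdf` and `two_sided_p` are illustrative, and Simpson's rule is just one convenient choice of integration method.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, 1 - P(-|t| < T < |t|), via Simpson's rule on [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|, by symmetry
    return 1 - central_area

p = two_sided_p(2.61, 18)
print(f"p = {p:.3f}")
```

The result should lie between 0.01 and 0.02, in line with the statement that the test is significant at the 2% level.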

## Presenting the results

For comparisons, should report:

- difference between means
- s.e. of difference in means
- 95% confidence interval for true diff.

In addition, may report for each group:

- mean
- s.e. of each mean
- sample size for each mean

Conclusions will then follow…

## Results and conclusions

- Difference of means: 41.7
- Standard error of difference: 16.00
- 95% confidence interval for difference in means: (8.09, 75.31)

Conclusions: There is some evidence (p=0.018) that the mean cholesterol levels differ between those working in agriculture and others. The difference in means is 42 mg/dL with 95% confidence interval (8.1, 75.3).
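The standard error and confidence interval follow directly from the pooled variance. The sketch below assumes the tabulated 97.5% point of the t distribution with 18 d.f. (about 2.101); the variable names are illustrative.

```python
from math import sqrt

diff = 41.7          # difference between the two sample means
s2 = 1279.5          # pooled estimate of variance
n1 = n2 = 10         # men per group
t_crit = 2.101       # tabulated 97.5% point of t with 18 d.f. (assumed from tables)

# Standard error of the difference, then the interval diff +/- t_crit * s.e.
se_diff = sqrt(s2 * (1 / n1 + 1 / n2))
lower = diff - t_crit * se_diff
upper = diff + t_crit * se_diff
print(f"s.e. = {se_diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval excludes zero, it agrees with the two-sided test rejecting $H_0$ at the 5% level.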

## Significance ideas again!

e.g. Farmers report that using a fungicide increased crop yields by 2.7 kg ha⁻¹, s.e.m. = 0.41. This gave a t-statistic of 6.6 (p-value < 0.001).

Recall that the p-value is the probability of obtaining a result at least as extreme as the one observed when the null hypothesis is true, i.e. a small p-value means that a yield increase this large would be very unlikely to arise by chance alone if the fungicide had no effect!

## How important are sig. tests?

In relation to the example on the previous slide, we may find one of the following situations for different crops.

Mean yields, with and without fungicide:

| With fungicide | Without fungicide | Finding |
|---|---|---|
| 589.9 | 587.2 | Not an important finding! |
| 9.9 | 7.2 | Very important finding! |

It is likely that in the first of these results, either too much replication or the incorrect level of replication had been used (e.g. plant level variation, rather than plot level variation used to compare means).
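One way to see why the same absolute difference matters so differently is to express it relative to the untreated mean. The short sketch below does this for the two pairs of yields; the helper `relative_increase` is hypothetical, and it assumes the larger value in each pair is the mean with fungicide.

```python
def relative_increase(with_fungicide, without_fungicide):
    """Yield increase as a percentage of the untreated mean."""
    return 100 * (with_fungicide - without_fungicide) / without_fungicide

for with_f, without_f in ((589.9, 587.2), (9.9, 7.2)):
    diff = with_f - without_f
    pct = relative_increase(with_f, without_f)
    print(f"{with_f} vs {without_f}: difference {diff:.1f}, increase {pct:.1f}%")
```

Both pairs differ by 2.7, yet the increase is under 1% of the untreated yield in the first case and well over a third in the second.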

## What does non-significance tell us?

e.g. There was insufficient evidence in the data to demonstrate that using a fungicide had any effect on plant yields (p=0.128).

Mean yields, with and without fungicide: 157.2 and 89.9.

This difference may be an important finding, but the statistical analysis was unable to pick up this difference as being statistically significant.

How can this happen?

- Too small a sample size?
- High variability in the experimental material?
- One or two outliers?
- Not all sources of variability identified?
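To illustrate the first two possibilities, the sketch below shows how the t-statistic for a fixed difference grows as the number of observations per group increases. The pooled standard deviation of 80 and the group sizes are purely hypothetical values chosen for illustration; the slide does not give the variability or the sample size behind p=0.128.

```python
from math import sqrt

diff = 157.2 - 89.9      # observed difference in mean yields
pooled_sd = 80.0         # HYPOTHETICAL pooled standard deviation, for illustration only

def t_for_group_size(n):
    """t-statistic for the fixed difference with n observations per group."""
    se_diff = pooled_sd * sqrt(2 / n)
    return diff / se_diff

for n in (4, 8, 16, 32):  # hypothetical group sizes
    print(f"n per group = {n:2d}  t = {t_for_group_size(n):.2f}  d.f. = {2 * n - 2}")
```

A smaller standard deviation has the same effect as a larger sample: the standard error of the difference shrinks, so the same difference in means gives a larger t-statistic.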

## Significance: Key Points

- Statistical significance alone is not enough. Consider whether the result is also scientifically meaningful and important.
- When a significant result is found, report the finding in terms of the corresponding estimates, their standard errors and C.I.s.

Some practical work follows…