# SADC Course in Statistics: Common Non-Parametric Methods for Comparing Two Samples (Session 20)

## Presentation transcript


### Learning Objectives

At the end of this session, you will be able to:

- Understand the logic behind common non-parametric tests for comparing two groups based on ranks
- Interpret and understand the commonly used Wilcoxon signed-rank test
- Appreciate some practical problems associated with the methods

### An illustrative example: paired samples

Ten farmers recorded their crop yield (tonnes/hectare) before and after the use of a fertiliser. Has the use of fertiliser changed the yield?

Data (after minus before, pair-wise differences): 0.02, 0.89, -0.06, 0.26, 0.83, 0.42, 0.80, -0.05, 0.64, 0.84

How can we address this question objectively?

### Start by plotting

[Figure: plots of the pair-wise differences]

Is the distribution roughly symmetric?

### Addressing the question …

- A paired t-test is often employed in such cases
- Recall that this is simply a one-sample t-test applied to the pair-wise differences
- The procedure assumes the pair-wise differences are from a normal distribution
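As a minimal sketch of the point that a paired t-test is a one-sample t-test on the differences, the statistic can be computed directly from the ten differences in the example. The function name `one_sample_t` is an illustrative choice, not something from the session, and only the statistic and degrees of freedom are computed (the p-value would need a t-distribution table or statistical software).

```python
import math
import statistics

# After-minus-before yield differences (tonnes/hectare) from the example.
differences = [0.02, 0.89, -0.06, 0.26, 0.83, 0.42, 0.80, -0.05, 0.64, 0.84]


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample standard deviation (n - 1 divisor)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


t_stat, df = one_sample_t(differences)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```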

### Addressing the question

- Recall that the t-test procedure is quite robust against departures from normality (cf. Session 19)
- However, if we are concerned about the validity of the normality assumption, we might use a non-parametric test that does not make this assumption

### Addressing the question

- One possibility is to use a sign test, to test $H_0$: population median difference $= 0$ vs. $H_1$: population median difference $\neq 0$
- However, this procedure is inefficient, as it effectively uses only the signs (+/-) of the pair-wise differences

Can more information be used?
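To make the sign test concrete, here is a small sketch of an exact two-sided version applied to the example differences. It assumes the usual construction (under the null hypothesis the number of positive signs is Binomial(n, 1/2), and the two-sided p-value doubles the smaller tail); the function name is illustrative.

```python
from math import comb

# After-minus-before yield differences (tonnes/hectare) from the example.
differences = [0.02, 0.89, -0.06, 0.26, 0.83, 0.42, 0.80, -0.05, 0.64, 0.84]


def sign_test_two_sided(values):
    """Exact two-sided sign test for H0: median == 0 (illustrative helper).

    Returns (number of positive signs, number of non-zero values, p-value).
    """
    nonzero = [v for v in values if v != 0]    # zero differences carry no sign
    n = len(nonzero)
    positives = sum(1 for v in nonzero if v > 0)
    # Under H0 the count of positive signs is Binomial(n, 1/2).
    k = min(positives, n - positives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)


positives, n, p_value = sign_test_two_sided(differences)
print(f"{positives} positive signs out of {n}, two-sided p = {p_value:.4f}")
```

Only the count of positive signs enters the calculation; the sizes of the differences are never used, which is the inefficiency noted above.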

### Wilcoxon signed-rank test

Yes, but at a price…

- We use the rank order of the pair-wise differences, but not the actual values
- This leads to the Wilcoxon signed-rank test

Assumptions

- The pair-wise differences are not only independent, but also come from a symmetric (though otherwise unspecified) distribution

### Back to the example

- Let us assume the pair-wise differences are symmetrically distributed
- This is not unreasonable based on the plots
- Also, the sample median and mean are similar: 0.53 and 0.46 respectively
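The two summary figures quoted above can be checked with a few lines using Python's standard `statistics` module; this is just a quick verification sketch on the example data.

```python
import statistics

# After-minus-before yield differences (tonnes/hectare) from the example.
differences = [0.02, 0.89, -0.06, 0.26, 0.83, 0.42, 0.80, -0.05, 0.64, 0.84]

sample_mean = statistics.mean(differences)
sample_median = statistics.median(differences)   # average of the two middle values for even n

print(f"mean = {sample_mean:.2f}, median = {sample_median:.2f}")
```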

### Wilcoxon signed-rank test

Rank the n = 10 differences according to their magnitude, then re-attach the signs to give signed ranks:

| Difference | 0.02 | 0.89 | -0.06 | 0.26 | 0.83 | 0.42 | 0.80 | -0.05 | 0.64 | 0.84 |
|---|---|---|---|---|---|---|---|---|---|---|
| Rank | 1 | 10 | 3 | 4 | 8 | 5 | 7 | 2 | 6 | 9 |
| Sign | + | + | - | + | + | + | + | - | + | + |
| Signed rank | +1 | +10 | -3 | +4 | +8 | +5 | +7 | -2 | +6 | +9 |

Notes

- Use average ranks for ties
- Zero differences are ignored in the above process (reducing the sample size)
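The ranking procedure, including the two notes on ties and zero differences, can be sketched as follows. The function `signed_ranks` is an illustrative helper written for this example, not a library routine.

```python
# After-minus-before yield differences (tonnes/hectare) from the example.
differences = [0.02, 0.89, -0.06, 0.26, 0.83, 0.42, 0.80, -0.05, 0.64, 0.84]


def signed_ranks(values):
    """Rank non-zero values by magnitude (average ranks for ties), then re-attach signs."""
    nonzero = [v for v in values if v != 0]            # zero differences are dropped
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        # Find the run of tied magnitudes beginning at sorted position `start`.
        end = start
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1           # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return [rank if v > 0 else -rank for v, rank in zip(nonzero, ranks)]


result = signed_ranks(differences)
print(result)
```

For the example data there are no ties or zeros, so the signed ranks are whole numbers that match the table row by row.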

### Wilcoxon signed-rank test

Let

- $T^+$ = sum of positive ranks = 50
- $T^-$ = sum of negative ranks = 5

Take either $T^+$ or $T^-$ as the test statistic, since $T^+ + T^- = n(n+1)/2$.

Consider $T^+$. A sufficiently small or large value is evidence to reject $H_0$.

To obtain a p-value we compare $T^+$ with its null distribution, which is a symmetric discrete distribution.

A two-sided p-value is then $\Pr(T^+ \le 5) + \Pr(T^+ \ge 50)$.
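For a sample this small the null distribution can be enumerated exactly. The sketch below assumes no ties and no zero differences, so that under the null hypothesis each of the ranks 1 to n is independently positive or negative with probability 1/2. The function names are illustrative.

```python
def signed_rank_null_counts(n):
    """counts[t] = number of the 2**n sign assignments to ranks 1..n with T+ == t."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for rank in range(1, n + 1):
        # Each rank is either negative (sum unchanged) or positive (sum grows by rank).
        # Iterating downwards ensures each rank is used at most once.
        for total in range(max_sum, rank - 1, -1):
            counts[total] += counts[total - rank]
    return counts


def exact_two_sided_p(t_plus, n):
    """Two-sided p-value Pr(T+ <= low) + Pr(T+ >= high), assuming integer ranks."""
    counts = signed_rank_null_counts(n)
    max_sum = n * (n + 1) // 2
    t_minus = max_sum - t_plus
    low, high = min(t_plus, t_minus), max(t_plus, t_minus)
    lower_tail = sum(counts[: low + 1])
    upper_tail = sum(counts[high:])
    return min(1.0, (lower_tail + upper_tail) / 2 ** n)


counts = signed_rank_null_counts(10)
p_exact = exact_two_sided_p(50, 10)
print(f"exact two-sided p = {p_exact:.4f}")
```

With $T^+ = 50$ and $n = 10$, each tail contains 10 of the 1024 equally likely sign patterns, giving 20/1024, or about 0.020.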

### The p-value calculation

Exact method

- Cumbersome
- Use appropriate software

Large sample approximation

- Approximate the null distribution of $T^+$ using a normal distribution
- n > 20 will usually give a reasonable approximation

### P-value calculation

Using statistical software, e.g. Stata:

| sign | obs | sum ranks |
|---|---|---|
| positive | 8 | 50 |
| negative | 2 | 5 |
| all | 10 | 55 |

Ho: diff = 0; z = 2.293; Prob > |z| = 0.0218

- Approximate p-value = 0.022
- From SPSS, exact p-value = 0.020
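The approximate figures can be reproduced with a short sketch of the large-sample approximation. It assumes the standard null moments of $T^+$ when there are no ties or zeros, mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$, and applies no continuity correction; the function name is illustrative.

```python
import math


def signed_rank_normal_approx(t_plus, n):
    """Return (z, two-sided p) from the normal approximation to T+ (no ties, no zeros)."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / math.sqrt(variance)
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_approx = signed_rank_normal_approx(50, 10)
print(f"z = {z:.3f}, approximate two-sided p = {p_approx:.4f}")
```

With only n = 10 pairs the approximation is being used below the n > 20 guideline, yet here it is still close to the exact value.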

### Conclusions

- The p-value is small. Hence, there is evidence to reject $H_0$
- The estimated median difference (after minus before), 0.53, is significantly different from zero
- Based on this study, there is evidence for a positive fertiliser effect

### Comments

While the Wilcoxon signed-rank test makes less restrictive assumptions than the t-test, there are still a number of major practical problems:

- The symmetry assumption is still quite limiting, as many distributions are skewed
- Confidence intervals (CIs): as with the sign test (Session 19), most software packages concentrate on the p-value rather than point estimates and confidence intervals

### Two independent samples

- For comparing independent samples, a t-test for independent samples is often used
- If we were concerned about the validity of the underlying assumptions, we could employ a non-parametric method
- The Wilcoxon rank-sum test (or the equivalent Mann-Whitney U test) is a common choice
- Once again, this is based on ranks
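A brief sketch of why the two tests are equivalent: the rank sum $W$ for one group and the Mann-Whitney $U$ differ only by the constant $n_a(n_a+1)/2$. The session gives no two-sample data, so the values below are made up purely for illustration, and the function name is hypothetical.

```python
def rank_sum_and_u(sample_a, sample_b):
    """Return (W, U) for sample_a: Wilcoxon rank sum and Mann-Whitney U."""
    pooled = sorted(sample_a + sample_b)
    # Average 1-based rank for each distinct value (this handles ties).
    rank_of = {}
    for value in set(pooled):
        first = pooled.index(value)            # 0-based position of the first copy
        count = pooled.count(value)
        rank_of[value] = first + (count + 1) / 2
    w = sum(rank_of[v] for v in sample_a)
    n_a = len(sample_a)
    u = w - n_a * (n_a + 1) / 2
    return w, u


# Made-up illustrative values, NOT data from the session.
group_a = [1.1, 2.3, 3.8]
group_b = [0.9, 1.7, 2.0, 4.2]

w, u = rank_sum_and_u(group_a, group_b)
# With no ties, U equals the number of (a, b) pairs in which a exceeds b.
pairs_a_greater = sum(1 for a in group_a for b in group_b if a > b)
print(w, u, pairs_a_greater)
```

Because $U$ is a fixed shift of $W$, either statistic leads to the same p-value.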

### Concluding remarks

Non-parametric tests may be used. However…

- Their usefulness is often over-rated
- The lack of confidence intervals is a major disadvantage

Practical statistics is frequently more complicated than comparing two groups. In this case:

- t-test methodology naturally extends into a more general modelling framework
- The non-parametric tests discussed do not naturally extend
