# SADC Course in Statistics: Introduction to Non-Parametric Methods (Session 19)


## Learning Objectives

At the end of this session, you will be able to:

- understand the general meaning of non-parametric methods and when they might be used;
- implement and interpret a simple non-parametric test, the sign test, and understand its advantages and limitations;
- appreciate some practical problems associated with non-parametric methods.

## An Illustrative Example

A random sample of 12 small businesses was asked: "What percentage of last year's profit was reinvested?" The data (%) were:

5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7

A government official claims that the true average is 10%. How can this claim be tested?

## Start by Plotting

Plotting the data reveals a very skewed distribution: eight of the twelve values lie between 4.7% and 8.2%, while the other four are spread out as far as 23.6%.
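
The original plot is not reproduced in this transcript. As a rough stand-in, the sketch below prints a text histogram of the 12 values (retyped from the slide). The helper name `text_histogram`, the bin start of 2.5 and the bin width of 2.5 are arbitrary illustrative choices, not part of the course material.

```python
# Reinvestment percentages from the 12 sampled businesses (retyped from the slide)
data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

def text_histogram(values, start=2.5, width=2.5):
    """Hypothetical helper: count values in bins [start, start + width), ...

    Assumes every value is >= start. Returns printable rows and the bin counts.
    """
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the bin holding v
    rows = []
    for i, c in enumerate(counts):
        lo = start + i * width
        rows.append(f"{lo:5.1f} to {lo + width:5.1f} | " + "*" * c)  # one star per value
    return rows, counts

rows, counts = text_histogram(data)
print("\n".join(rows))
```

Most of the stars pile up in the lower bins, with isolated values trailing off to the right: the long right tail described above.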

## Addressing the Question

A one-sample t-test is often employed in such cases, but the procedure assumes normally distributed data. This is clearly not the case here, and hence the validity of the t-test is questionable.

## Robustness of the t-test

Recall that the t-test is robust to departures from normality because of the Central Limit Theorem: the sample mean is approximately normally distributed even when the individual observations are not. We only need to worry if the sample size is quite small and/or the underlying distribution is very non-normal. Both conditions apply to our example (n = 12, strongly skewed data), so we might be concerned about applying a t-test here.

## Two Alternative Approaches

- Transformations: are the measurements approximately normally distributed on a different measurement scale, e.g. a logarithmic scale? If so, analyse the data on the transformed scale.
- Non-parametric methods: use a technique that does not assume a normal distribution. Such methods are often collectively referred to as non-parametric methods.
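
One quick way to judge whether a log transformation helps is to compare a skewness measure on the raw and log scales. The sketch below assumes the moment coefficient of skewness (no small-sample correction); the helper name `sample_skewness` is hypothetical, and this is an informal numerical check rather than a formal test of normality.

```python
import math

# Reinvestment percentages (retyped from the slide); all values are positive,
# so the natural log is defined for every observation.
data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

def sample_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

raw_skew = sample_skewness(data)
log_skew = sample_skewness([math.log(v) for v in data])
print(f"skewness on raw scale: {raw_skew:.2f}")
print(f"skewness on log scale: {log_skew:.2f}")
```

For these data the log scale reduces the skewness but does not remove it, so with only 12 observations the transformation route is not clear-cut.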

## Non-Parametric Methods

Non-parametric methods (or tests) derive their name from the fact that no explicit distribution (e.g. normal, gamma, …) is assumed for the data. The techniques are occasionally called distribution-free methods, but some assumptions may still be made, e.g. that the distribution is symmetric; hence the name "distribution-free" is potentially misleading.

To illustrate the above, we shall now apply a simple sign test to the example.

## Back to the Example

Let us make no assumption about the distribution of reinvestment percentages. Having said this, the distribution is clearly very skewed, and when summarising the average of such a distribution the median is a natural choice. With 12 values, the sample median is the average of the 6th and 7th ordered values: (7.1 + 7.7)/2 = 7.4%.

The median is a flexible summary, and so hypotheses of interest are generally phrased in terms of a population median.
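
The median calculation can be checked with Python's standard `statistics` module. Computing the mean alongside it (an addition for comparison, not on the original slide) shows how the few large values pull the mean upwards in a skewed sample.

```python
import statistics

# Reinvestment percentages (retyped from the slide)
data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]

# With an even sample size, statistics.median averages the two middle ordered values
median = statistics.median(data)
mean = statistics.mean(data)
print(f"median = {median:.1f}%, mean = {mean:.1f}%")
```

The mean (9.8%) sits well above the median (7.4%), which is the usual signature of a right-skewed distribution.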

## The Sign Test

- Hypotheses: $H_0$: population median $= 10\%$ vs. $H_1$: population median $\neq 10\%$.
- Assumptions: the data values are independent. No distributional assumption is necessary.
- Logic: if $H_0$ is true, then we would expect half of the observed values to fall below 10 and half above 10. How inconsistent are our data with this expectation?

## Applying the Sign Test

- List the data in ascending order: 4.7, 5.1, …, 8.2, 11.6, …, 23.6.
- Assign a negative sign to each value below 10 and a positive sign to each value above 10.
- Under $H_0$, we have a random sample of $n = 12$ binary outcomes (– or +): – – – – – – – – + + + +

This gives 8 negative and 4 positive signs, compared with the expected 6 and 6 respectively.
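
The sign-assignment step can be sketched in a few lines of Python. The variable names are illustrative; values exactly equal to the hypothesised median are skipped, although none occur in this sample.

```python
# Reinvestment percentages (retyped from the slide)
data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
m0 = 10.0  # hypothesised population median under H0

# Sort, then label values below m0 with '-' and values above m0 with '+'.
# Values equal to m0 receive no sign and are left out.
signs = ['-' if x < m0 else '+' for x in sorted(data) if x != m0]
n_minus = signs.count('-')
n_plus = signs.count('+')
print(" ".join(signs))
print(f"{n_minus} negative, {n_plus} positive, n = {len(signs)}")
```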

## Applying the Sign Test (continued)

How unusual is this result under $H_0$?

- A natural test statistic is simply the number of positive signs, $R$ (the choice of negative vs. positive signs is arbitrary).
- A sufficiently small or large value of $R$ is evidence against $H_0$.
- Under $H_0$, $R$ follows a binomial distribution with $n = 12$ and $p = 0.5$. This is a symmetric distribution, centred on $n/2 = 6$.

A two-sided p-value is then

$$P(R \le 4) + P(R \ge 8) = 2\,P(R \le 4).$$
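
The exact binomial calculation can be sketched with Python's standard library alone. The helper `binom_cdf` is a hypothetical name written for this illustration, not a library function.

```python
from math import comb

def binom_cdf(k, n, p=0.5):
    """P(R <= k) for R ~ Binomial(n, p), summed directly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, r = 12, 4  # 12 signed observations, 4 of them positive

# With p = 0.5 the distribution is symmetric about n/2, so
# P(R <= 4) + P(R >= 8) = 2 * P(R <= 4). Taking the smaller tail,
# min(r, n - r), makes this work whichever sign is less frequent;
# min(1.0, ...) caps the result when r = n/2.
k = min(r, n - r)
p_two_sided = min(1.0, 2 * binom_cdf(k, n))
print(f"P(R <= {k}) = {binom_cdf(k, n):.4f}, two-sided p = {p_two_sided:.4f}")
```

Here $P(R \le 4) = (1 + 12 + 66 + 220 + 495)/4096 = 794/4096$, so the two-sided p-value is $1588/4096 \approx 0.3877$.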

## The p-Value

Using statistical software, e.g. Stata:

- Two-sided test: `Ho: median of reinvest - 10 = 0` vs. `Ha: median of reinvest - 10 != 0`
- `Pr(#positive >= 8 or #negative >= 8) = min(1, 2*Binomial(n = 12, x >= 8, p = 0.5)) = 0.3877`

The p-value is therefore 0.39. It may also be calculated with the Excel BINOMDIST worksheet function, e.g. `=2*BINOMDIST(4,12,0.5,TRUE)`.

## Conclusions

- The p-value (0.39) is large. Hence, there is no evidence to reject $H_0$.
- The estimated median reinvestment, 7.4%, is not significantly different from 10%.
- This survey provides no evidence against the government official's claim.

## Further Notes

P-value calculation: the p-value may be approximated using the normal approximation to the binomial distribution. Under $H_0$, $R$ has mean $n/2$ and standard deviation $\sqrt{n}/2$, so the statistic

$$Z = \frac{R - n/2}{\sqrt{n}/2}$$

is compared with the tails of a $N(0,1)$ distribution. A sample size of $n > 20$ will usually give a reasonable approximation.
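
The normal approximation can be sketched as follows. The optional continuity correction (subtracting 0.5 from $|R - n/2|$) is a standard refinement added here for comparison; it is not mentioned on the slides. The helper name `normal_approx_p` is hypothetical.

```python
import math

def normal_approx_p(r, n, continuity=False):
    """Two-sided sign-test p-value via the normal approximation to Binomial(n, 0.5).

    Uses |Z| = |r - n/2| / (sqrt(n)/2). With continuity=True, 0.5 is
    subtracted from |r - n/2| first (floored at zero).
    """
    diff = abs(r - n / 2)
    if continuity:
        diff = max(0.0, diff - 0.5)
    z = diff / (math.sqrt(n) / 2)
    # Two-sided N(0,1) tail area: 2 * (1 - Phi(z)) equals erfc(z / sqrt(2))
    return z, math.erfc(z / math.sqrt(2))

z, p = normal_approx_p(4, 12)
z_cc, p_cc = normal_approx_p(4, 12, continuity=True)
print(f"Z = {z:.3f}, p = {p:.3f}")
print(f"with continuity correction: Z = {z_cc:.3f}, p = {p_cc:.3f}")
```

With only $n = 12$, the uncorrected approximation gives roughly 0.25, well away from the exact 0.39, which illustrates why a larger sample is advised; the continuity-corrected value lands much closer to the exact answer.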

## Further Notes (continued)

- No signs: if any value equals the hypothesised median of 10, it is ignored and the sample size is reduced accordingly.
- One-sided tests: although a two-sided example was discussed, one-sided tests are also possible.
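
Both notes can be folded into a single exact sign-test function. This is a minimal sketch under the slides' rules (ties dropped, $R \sim \text{Binomial}(n, 0.5)$ under $H_0$); the function name `sign_test` and the `alternative` labels are assumptions made for illustration, not a particular package's interface.

```python
from math import comb

def sign_test(data, m0, alternative="two-sided"):
    """Exact sign test of H0: population median = m0 (hypothetical helper).

    Values equal to m0 are dropped and n is reduced accordingly.
    alternative: "two-sided", "greater" (median > m0) or "less" (median < m0).
    Returns (n_used, number_of_plus_signs, p_value).
    """
    kept = [x for x in data if x != m0]
    n = len(kept)
    r = sum(1 for x in kept if x > m0)  # number of '+' signs

    def cdf(k):  # P(R <= k) for R ~ Binomial(n, 0.5); cdf(-1) is 0
        return sum(comb(n, i) for i in range(k + 1)) / 2**n

    if alternative == "less":         # few '+' signs point to median < m0
        p = cdf(r)
    elif alternative == "greater":    # many '+' signs point to median > m0
        p = 1 - cdf(r - 1)
    elif alternative == "two-sided":
        p = min(1.0, 2 * cdf(min(r, n - r)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return n, r, p

data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
print(sign_test(data, 10))                  # two-sided
print(sign_test(data, 10, "less"))          # one-sided, H1: median < 10
print(sign_test(data + [10.0], 10))         # a hypothetical tied value is dropped
```

The one-sided p-value in the direction of the data, $P(R \le 4) \approx 0.19$, is half the two-sided value, as expected for a symmetric null distribution.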

## Pros and Cons of the Sign Test

Advantages:

- Simple and logical.
- Widely applicable: few assumptions are needed.
- Robust to outliers: the recorded values are not used, only their signs.

## Pros and Cons of the Sign Test (continued)

Major disadvantages:

- Severe loss of information: the recorded values are not used, only their signs, which makes the sign test inefficient.
- Confidence intervals (CIs): a CI for the true median can be constructed, but it is cumbersome. Software packages tend not to present a CI for the median, concentrating instead on the p-value.
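
For illustration, here is one standard construction (not taken from the slides): the order-statistic interval $[x_{(k)}, x_{(n-k+1)}]$, whose coverage is $1 - 2P(B \le k-1)$ with $B \sim \text{Binomial}(n, 0.5)$. Because the binomial is discrete, the achieved coverage generally differs from the nominal level, which is part of what makes such intervals awkward. The function name `median_ci` is hypothetical.

```python
from math import comb

def median_ci(data, conf=0.95):
    """Sign-test-based CI for the population median (hypothetical helper).

    Returns (lower, upper, coverage) using order statistics x_(k) and
    x_(n-k+1) (1-based), with k the largest value such that the coverage
    1 - 2 * P(B <= k - 1) is at least conf, where B ~ Binomial(n, 0.5).
    """
    x = sorted(data)
    n = len(x)
    alpha = 1 - conf

    def cdf(j):  # P(B <= j)
        return sum(comb(n, i) for i in range(j + 1)) / 2**n

    k = 0
    # Move to k + 1 only if that narrower interval still has coverage >= conf,
    # i.e. 2 * P(B <= k) <= alpha, and its lower index stays below its upper.
    while k + 1 <= n // 2 and 2 * cdf(k) <= alpha:
        k += 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * cdf(k - 1)
    return x[k - 1], x[n - k], coverage

data = [5.1, 6.4, 7.1, 23.6, 4.7, 14.3, 5.9, 5.5, 11.6, 17.5, 8.2, 7.7]
lo, hi, coverage = median_ci(data)
print(f"CI for median: [{lo}, {hi}] with coverage {coverage:.3f}")
```

For $n = 12$ this gives $k = 3$, i.e. the interval from the 3rd to the 10th ordered value, [5.5, 14.3], with actual coverage about 96.1%. It contains 10%, consistent with the sign test not rejecting $H_0$.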

## Concluding Remarks

- Non-parametric methods generally concentrate on hypothesis testing, and hence on the p-value.
- The lack of confidence intervals is a major disadvantage.
- We shall return to these issues in Session 20.

## References

The two references below apply to both Sessions 19 and 20, and also to non-parametric methods in general.

- Conover, W.J. (1999). Practical Nonparametric Statistics, 3rd edn. Wiley. 584 pp.
- Sprent, P. (1993). Applied Nonparametric Statistical Methods, 2nd edn. Chapman and Hall, London.

Practical work follows …