# Today’s lecture: inferential methods review (Bayesian, frequentist; parametric, non-parametric, semi-parametric) and a more modern approach to non-parametric procedures

1 Today’s lecture

- Inferential methods: review
  - Bayesian, frequentist
  - Parametric, non-parametric, semi-parametric
- A more modern approach to non-parametric procedures
  - Randomisation tests
  - Bootstraps
- Next week: revision

2 Web site

Material for the last two weeks is now on www.maths.napier.ac.uk/~gillianr. This includes the materials for today’s practical workshop on bootstraps.

3 Parametric and non-parametric methods

In both kinds of method we assume that the data we observe follow some model. For parametric methods this is a model based on known probability distributions. What we are saying is: “IF the model is true, then we can conclude … about the model and its parameters.”

4 Parametric and non-parametric methods

Non-parametric tests also make assumptions: they imply that the observed DATA are a random sample from some unspecified distribution. What we are saying is: “IF we have observed these data, then we can conclude … about the distribution(s) they come from.”

5 Modern non-parametric methods

- Parametric methods condition (IF) on the model; non-parametric methods condition (IF) on the data.
- Traditional non-parametric tests used ranks.
  - This was for practical reasons in pre-computer days.
- “Modern” non-parametric methods can use the data themselves:
  - randomisation tests for hypotheses (not so modern, as they go back to R. A. Fisher);
  - bootstrap methods for confidence intervals.

6 Randomisation test for a difference in means

The test’s null hypothesis is that the two samples come from a common distribution, so in some ways it tests more than a difference in means, or even in medians. This is the same null hypothesis as that tested by the traditional rank tests (e.g. the Wilcoxon Mann-Whitney test), so rank tests are not just tests of medians either. See Anna Hart, “Mann-Whitney test is not just a test of medians: differences in spread can be important”, BMJ 2001; 323: 391-393 (available from BMJ.com).

7 Sample data set

Data on the weights (in pounds) of 19 young people: 9 female, 10 male.

- Are males or females heavier?
- Are the weights of males more variable than those of females?

8 T-test output from SAS

The p-value for the difference in means is 0.0702 assuming a pooled (common) SD, or 0.0680 allowing a separate SD for each group (unequal variances).

9 Permutation/randomisation test

Here the difference between the means (girls - boys) was -18.84 pounds. Is this more than we would expect by chance if there were no difference between males and females? To find out:

- consider all 19 people together;
- select 9 of them at random to be labelled ‘female’;
- calculate the difference in mean weight, ‘females’ - ‘males’;
- repeat this many times: the differences form the randomisation/permutation distribution under H0.
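A minimal Python sketch of the steps above follows. It assumes the 19 weights are those of SAS’s SASHELP.CLASS sample data set: the %bootdiff call on slide 21 uses in=class, var=weight, group=sex, and these values give exactly the 9/10 split and the -18.84 lb difference quoted, but the slides do not list the data, so treat them as an assumption. The function names, the 10,000 relabellings and the seed are illustrative choices, not taken from the SAS or EXCEL macros.

```python
import random

# ASSUMPTION: weights (lb) from SAS's SASHELP.CLASS; not listed on the slides.
FEMALE = [84.0, 98.0, 102.5, 84.5, 112.5, 50.5, 90.0, 77.0, 112.0]
MALE = [112.5, 102.5, 83.0, 84.0, 99.5, 150.0, 128.0, 133.0, 85.0, 112.0]


def mean_diff(pooled, n_first):
    """Mean of the first n_first values minus the mean of the rest."""
    first, rest = pooled[:n_first], pooled[n_first:]
    return sum(first) / len(first) - sum(rest) / len(rest)


def randomisation_test(a, b, n_perm=10_000, seed=1):
    """Monte Carlo randomisation test of H0: a and b come from one distribution.

    Returns the observed mean(a) - mean(b) and the proportion of random
    relabellings whose difference is at least as far from zero (two-sided).
    """
    rng = random.Random(seed)
    pooled = list(a) + list(b)       # under H0 the group labels are exchangeable
    observed = mean_diff(pooled, len(a))
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)          # first len(a) people become "group a"
        if abs(mean_diff(pooled, len(a))) >= abs(observed) - 1e-9:
            extreme += 1
    return observed, extreme / n_perm


obs, p = randomisation_test(FEMALE, MALE)
print(f"observed difference = {obs:.2f} lb, two-sided p ~ {p:.4f}")
```

Because only a random subset of relabellings is used, p will differ slightly from the 0.0673 on the slides, and from run to run with different seeds.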

10 Programming the randomisation test

This can be done easily. Details of a SAS macro to do it are on the next slide. An EXCEL macro to do it is also available on the class web page. It was downloaded from http://www.bioss.ac.uk/smart/frames.html and incorporates corrections from one of last year’s Napier honours students.

11 SAS program to do this

On my web site, www.maths.napier.ac.uk/~gillianr:

- the macro file randmac.sas (submit this first);
- the program rand.sas, which reads in the data and runs the macro; you can alter it for your own data.

Go to SAS now if time (this is V8.1).

12 Randomisation distribution of the difference in means

The actual difference was -18.84. The proportion of the distribution further away from zero than this is 0.0673, which compares with 0.0680 or 0.0702 for the t-tests.
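With only 19 people, all C(19, 9) = 92,378 ways of choosing the 9 ‘females’ can be enumerated, which gives the exact proportion that random relabelling estimates. A sketch, again assuming the SASHELP.CLASS weights (an assumption, as the slides do not list the data); the function name is hypothetical.

```python
from itertools import combinations

# ASSUMPTION: weights (lb) from SAS's SASHELP.CLASS; not listed on the slides.
FEMALE = [84.0, 98.0, 102.5, 84.5, 112.5, 50.5, 90.0, 77.0, 112.0]
MALE = [112.5, 102.5, 83.0, 84.0, 99.5, 150.0, 128.0, 133.0, 85.0, 112.0]


def exact_randomisation_p(a, b):
    """Exact two-sided randomisation p-value for mean(a) - mean(b).

    Enumerates every choice of which len(a) pooled values form group a and
    returns (observed difference, p-value, number of relabellings).
    """
    pooled = list(a) + list(b)
    n, n_a = len(pooled), len(a)
    n_b = n - n_a
    total = sum(pooled)
    observed = sum(a) / n_a - sum(b) / n_b
    extreme = count = 0
    for idx in combinations(range(n), n_a):
        s_a = sum(pooled[i] for i in idx)
        diff = s_a / n_a - (total - s_a) / n_b
        if abs(diff) >= abs(observed) - 1e-9:   # at least as far from zero
            extreme += 1
        count += 1
    return observed, extreme / count, count


obs, p_exact, n_splits = exact_randomisation_p(FEMALE, MALE)
print(f"{n_splits} relabellings, exact two-sided p = {p_exact:.4f}")
```

Enumeration grows quickly with sample size (C(40, 20) is about 1.4 × 10^11), which is why sampling relabellings at random is the usual approach.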

14 Conclusions

For this problem, all the methods give very similar answers for both means and variances, and this is usually true. The exceptions are odd distributions with possible outliers. For these a randomisation test is a good choice, and to go with it you can use a bootstrap confidence interval.

15 Comparing parametric and bootstrap confidence intervals

The bootstrap 95% confidence interval for the difference (1000 bootstraps) is (-37.87, 0.16). So again, this is very similar to the parametric interval.
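A minimal sketch of a percentile bootstrap interval for the difference in means. It assumes the SASHELP.CLASS weights (not listed on the slides) and that each sex is resampled separately, with replacement, at its own sample size; the %bootdiff macro’s code is not shown, so this is not necessarily what it does. Python’s random number generator differs from SAS’s, so the seed 45345 borrowed from the macro call will not reproduce (-37.87, 0.16) exactly.

```python
import random

# ASSUMPTION: weights (lb) from SAS's SASHELP.CLASS; not listed on the slides.
FEMALE = [84.0, 98.0, 102.5, 84.5, 112.5, 50.5, 90.0, 77.0, 112.0]
MALE = [112.5, 102.5, 83.0, 84.0, 99.5, 150.0, 128.0, 133.0, 85.0, 112.0]


def bootstrap_diff_ci(a, b, nboots=1000, level=0.95, seed=45345):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(nboots):
        ra = rng.choices(a, k=len(a))   # resample group a with replacement
        rb = rng.choices(b, k=len(b))   # resample group b with replacement
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    alpha = (1 - level) / 2
    # e.g. 25th and 975th ordered values when nboots = 1000
    lo = diffs[max(round(alpha * nboots) - 1, 0)]
    hi = diffs[round((1 - alpha) * nboots) - 1]
    return lo, hi


lo, hi = bootstrap_diff_ci(FEMALE, MALE)
print(f"95% bootstrap CI for female - male: ({lo:.2f}, {hi:.2f})")
```

Resampling within each group keeps the 9/10 split fixed, mirroring the design of the study; the percentile method is the simplest of several bootstrap interval types.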

16 Bootstraps

Methods developed from the late 1970s by Brad Efron, later with Rob Tibshirani. A textbook by these authors, An Introduction to the Bootstrap, is in the library; it also describes randomisation tests.

17 What is a bootstrap?

The data consist of n items. Take a sample of size n from the data WITH replacement. For example:

- data: 1 2 3 4
- possible bootstraps: 1 1 2 3; 1 1 1 1; 1 2 3 4; and so on.

Take lots of bootstrap samples and calculate the quantity of interest (e.g. the mean) from each one. The variability between the quantities calculated from the bootstraps is like the variation you would expect in your estimate.
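The resampling idea can be sketched in a few lines of Python using the toy data 1 2 3 4. The helper bootstrap_replicates is hypothetical, only loosely analogous to the fun= and nboots= arguments of the %boots macro on slide 20 (whose code is not shown).

```python
import random
from statistics import mean, pstdev, stdev


def bootstrap_replicates(data, stat=mean, nboots=1000, seed=1):
    """Apply stat to nboots resamples of data, each of size n WITH replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(nboots)]


data = [1, 2, 3, 4]
one_sample = random.Random(2).choices(data, k=len(data))
print("one bootstrap sample:", one_sample)

reps = bootstrap_replicates(data)
print(f"bootstrap SE of the mean: {stdev(reps):.3f}")
# For the mean, the ideal (infinitely many bootstraps) value is pstdev(data) / sqrt(n).
print(f"theoretical value: {pstdev(data) / len(data) ** 0.5:.3f}")
```

The spread of the replicates (their standard deviation) is the bootstrap estimate of the standard error; for the mean it approaches the plug-in value, here about 0.559, as nboots grows.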

18 Example of 3 bootstrap samples

19 Go to EXCEL demo

From the M2T2 project (Mathematics Material for Tomorrow’s Teachers) at the University of Illinois, found on the web at http://www.mste.uiuc.edu/ and copied on my web page.

20 Bootstrap macro for SAS

First submit the macro file (randmac.sas), then run the macro (an example is in the file bootstrap.sas):

`%boots(in=new,var=x,seed=233667,fun=mean,nboots=100,out=result);`

Each argument is explained in the sample program bootstrap.sas. Go to SAS now.

21 Other bootstrap macros

- A bootstrap CI for a difference in means; example in program rand.sas: `%bootdiff(in=class,var=weight,group=sex,nboots=1000,seed=45345,out=bootdiff);`
- Bootstraps for correlation coefficients; example in rand.sas: `%bootscor(in=new2,var1=score1,var2=score2,seed=5465,nboots=50,out=corr);`

22 Pearson’s correlation coefficient

It is calculated for sample data and takes values between -1 and +1, with 0 representing no (linear) association. We can think of our sample value r as estimating a population quantity ρ, so we can calculate a bootstrap CI for ρ.
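For reference, the standard formula for the sample coefficient from n pairs $(x_i, y_i)$ (not given on the slide) is below. By the Cauchy-Schwarz inequality the numerator can never exceed the denominator in absolute value, which is why r, and every bootstrap replicate of it, lies between -1 and +1.

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$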

23 Bootstrap for correlation

The data consist of pairs of values (x1, y1), (x2, y2), (x3, y3), (x4, y4), (x5, y5). Bootstraps are samples of pairs, taken with replacement, e.g. (x1, y1), (x1, y1), (x3, y3), (x5, y5), (x5, y5). The correlations from the bootstrap samples will always be between -1 and +1. Sample data next.
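A minimal sketch of resampling pairs, using HYPOTHETICAL scores from two observers for 8 items (the 30-item data on the next slide are not listed). Pairs are drawn whole, so each x stays with its own y. A resample in which every x (or every y) is identical makes r undefined; this sketch simply skips such resamples, which is one possible convention and not necessarily what %bootscor does. The 1000 bootstraps and the seed are illustrative.

```python
import random
from math import sqrt

# HYPOTHETICAL scores given by two observers to the same 8 items.
SCORE1 = [12, 15, 9, 20, 17, 11, 14, 18]
SCORE2 = [13, 14, 10, 19, 18, 10, 15, 17]


def pearson_r(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


def bootstrap_r(pairs, nboots=1000, seed=5465):
    """Bootstrap replicates of r from resampled (x, y) pairs."""
    rng = random.Random(seed)
    reps = []
    while len(reps) < nboots:
        sample = rng.choices(pairs, k=len(pairs))   # whole pairs, with replacement
        try:
            reps.append(pearson_r(sample))
        except ZeroDivisionError:                   # no spread in x or y: skip
            continue
    return reps


pairs = list(zip(SCORE1, SCORE2))
reps = sorted(bootstrap_r(pairs))
lo, hi = reps[24], reps[974]                        # 25th and 975th of 1000
print(f"r = {pearson_r(pairs):.3f}, 95% percentile CI ({lo:.3f}, {hi:.3f})")
```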

24 Confidence interval for a correlation coefficient

Data on agreement in scoring between two observers, from a sample of 30 items scored. The sample value is 0.966. How well might this represent the true population value? A bootstrap confidence interval gives us the interval and also the bootstrap distribution (see next slide). Next we will look at the classical approach to this.

27 Distribution of r

If we took many small samples, the distribution of r would not be normal. Fisher showed that, for bivariate normal distributions,

$$z = \tfrac{1}{2}\ln\left[\frac{1+r}{1-r}\right]$$

is approximately normal with a standard error of $1/\sqrt{n-3}$. This can be used to get a confidence interval for $\rho$: form the interval for $z$, then convert each end back to the correlation scale with

$$r = \frac{\exp(2z)-1}{\exp(2z)+1}.$$

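28 Example used for bootstraps

With $r = 0.966$ and $n = 30$:

$$z = \tfrac{1}{2}\ln\left[\frac{1+0.966}{1-0.966}\right] = 2.0287, \qquad \mathrm{SE}(z) = \frac{1}{\sqrt{30-3}} = \frac{1}{\sqrt{27}} = 0.1925.$$

The 95% interval for $z$ is $2.0287 \pm 1.96 \times 0.1925 = (1.651, 2.406)$, which converts back to $(0.93, 0.98)$ for $\rho$. The bootstraps gave $(0.94, 0.99)$: near enough.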
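The Fisher z calculation can be checked with a short Python sketch; the helper name fisher_z_ci is hypothetical, and 1.96 is the usual normal point for a 95% interval. In Python’s math module, atanh(r) equals 0.5 ln[(1+r)/(1-r)] and tanh(z) equals [exp(2z)-1]/[exp(2z)+1], so they give the transform and its inverse directly.

```python
from math import atanh, sqrt, tanh


def fisher_z_ci(r, n, z_crit=1.96):
    """Approximate CI for rho via Fisher's z (assumes bivariate normal data)."""
    z = atanh(r)              # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1 / sqrt(n - 3)      # approximate standard error of z
    # Interval for z, then back-transform each end with tanh.
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


lo, hi = fisher_z_ci(0.966, 30)
print(f"95% CI for rho: ({lo:.2f}, {hi:.2f})")   # prints: 95% CI for rho: (0.93, 0.98)
```

Note the interval is not symmetric about 0.966: the back-transform compresses the upper end as it approaches +1.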

29 Conclusions

Randomisation tests and bootstraps are a useful alternative to parametric tests, BUT they are more bother to do, though this should get easier in future. In general they vindicate the robustness of classical tests, even when those tests’ assumptions are not true. An exception may be data with outliers; we will investigate this in the practical.

30 Summary

- F tests for ratios of variances
- How F ratios are used in regression and analysis of variance (sketch)
- Randomisation / permutation tests
- Bootstraps
- Correlation coefficient (introduction)