
1 RDP Statistical Methods in Scientific Research - Lecture 4
Lecture 4: Sample size determination
4.1 Criteria for sample size determination
4.2 Finding the sample size
4.3 Some simple variations
4.4 Further considerations

2 4.1 Criteria for sample size determination
Suppose that we are to conduct an investigation comparing two populations, $\Pi_A$ and $\Pi_B$. Sample A comprises $n_A$ units of observation from $\Pi_A$, and sample B comprises $n_B$ units of observation from $\Pi_B$. Suppose that $n_A = n_B$ and that $n = n_A + n_B$. The responses will be quantitative, and the analysis will use a t-test. How should we choose $n$?

3 Let $\mu_A$ be the mean response for $\Pi_A$ and $\mu_B$ the mean response for $\Pi_B$. The null hypothesis is $H_0: \mu_A = \mu_B$. From the data, we will obtain the sample means $\bar{x}_A$ and $\bar{x}_B$ and the sample standard deviations $S_A$ and $S_B$ for groups A and B. Once we have the data, we can:
- reject $H_0$ and say that $\mu_A > \mu_B$;
- reject $H_0$ and say that $\mu_A < \mu_B$;
- not reject $H_0$.

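4 When $n_A = n_B = n/2$, the t-statistic is
$$t = \frac{\bar{x}_A - \bar{x}_B}{S\sqrt{1/n_A + 1/n_B}} = \frac{\sqrt{n}\,(\bar{x}_A - \bar{x}_B)}{2S},$$
where $S^2 = (S_A^2 + S_B^2)/2$ is the pooled sample variance. The statistic $t$ will tend to be positive if $\mu_A > \mu_B$, negative if $\mu_A < \mu_B$, and close to zero if $\mu_A = \mu_B$.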

5 We will:
- reject $H_0$ and say that $\mu_A > \mu_B$ if $t \ge k$;
- reject $H_0$ and say that $\mu_A < \mu_B$ if $t \le -k$;
- not reject $H_0$ if $-k < t < k$.
[Figure: decision regions for $t$, on an axis marked at $-k$, $0$ and $k$.]
Now we need to find both $n$ and $k$.

6 Suppose that, in truth, $\mu_A = \mu_B$. This does not mean that we will observe $\bar{x}_A = \bar{x}_B$, nor $t = 0$. In fact, we may observe $t \ge k$ or $t \le -k$ just by chance. This means that we might reject $H_0$ when $H_0$ is true. This is called a type I error.

7 Suppose that, in truth, $\mu_A = \mu_B + \delta$, where $\delta > 0$ and $\delta$ is of a magnitude that would be scientifically worth detecting. We may still observe $t < k$ by chance. This means that we might fail to reject $H_0$ when $H_0$ is false. This is called a type II error.

8 The probability that $t \ge k$ or $t \le -k$, when $\mu_A = \mu_B$, is called the risk of type I error, and is denoted by $\alpha$. (This is for a two-sided alternative. The probability that $t \ge k$, when $\mu_A = \mu_B$, is the risk of type I error for a one-sided alternative, and is equal to $\alpha/2$.) The probability that $t < k$, when $\mu_A = \mu_B + \delta$, is called the risk of type II error, and is denoted by $\beta$. The probability that $t \ge k$, when $\mu_A = \mu_B + \delta$, is called the power, and is equal to $1 - \beta$.

9 Reducing type I error: increase $k$, which makes it harder to reject $H_0$.
Increasing power: decrease $k$, which makes it easier to reject $H_0$.
Reducing type I error and increasing power simultaneously: increase $n$. This will make the study more informative, but it will cost more.

10 4.2 Finding the sample size
Suppose that the true standard deviation within each of the populations $\Pi_A$ and $\Pi_B$ is $\sigma$. Then $t \approx Z$, where
$$Z = \frac{\sqrt{n}\,(\bar{x}_A - \bar{x}_B)}{2\sigma}$$
follows the normal distribution with standard deviation 1. When $\mu_A = \mu_B$, $Z$ has mean 0; when $\mu_A = \mu_B + \delta$, $Z$ has mean $\delta\sqrt{n}/(2\sigma)$.

11 Specify that the (two-sided) risk of type I error should be $\alpha$:
$$P(Z \ge k \text{ or } Z \le -k;\ \mu_A = \mu_B) = \alpha. \qquad (1)$$
Under $H_0$, $Z$ is normally distributed with mean 0 and standard deviation 1, so $k$ is the value exceeded by a normal $(0, 1)$ random variable with probability $\alpha/2$.

12 Specify that the risk of type II error should be $\beta$:
$$P(Z < k;\ \mu_A = \mu_B + \delta) = \beta. \qquad (2)$$
When $\mu_A = \mu_B + \delta$, $Z$ is normally distributed with mean $\delta\sqrt{n}/(2\sigma)$ and standard deviation 1, so $k - \delta\sqrt{n}/(2\sigma)$ is the value exceeded by a normal $(0, 1)$ random variable with probability $1 - \beta$.

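13 For $\alpha = 0.05$ and $1 - \beta = 0.90$, we have $k = 1.960$ and $k - \delta\sqrt{n}/(2\sigma) = -1.282$. Thus $\delta\sqrt{n}/(2\sigma) = 1.960 + 1.282$, and
$$n = \frac{4(1.960 + 1.282)^2\sigma^2}{\delta^2} \approx 42.0\,\frac{\sigma^2}{\delta^2}.$$
More generally, $n = 4(z_{\alpha/2} + z_{\beta})^2\sigma^2/\delta^2$, where $z_p$ is the value exceeded by a normal $(0, 1)$ random variable with probability $p$. Values of the multiplier $n\delta^2/\sigma^2$:

| Type I error $\alpha$ | Power $1 - \beta = 0.80$ | $0.90$ | $0.95$ |
|---|---|---|---|
| 0.10 | 24.730 | 34.255 | 43.289 |
| 0.05 | 31.396 | 42.030 | 51.979 |
| 0.01 | 46.716 | 59.518 | 71.257 |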
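As a check on the table, the multipliers can be recomputed from the normal-approximation formula $n = 4(z_{\alpha/2} + z_\beta)^2\sigma^2/\delta^2$. The sketch below is a minimal Python version that takes the normal quantiles from `statistics.NormalDist` in the standard library; the function name `normal_sample_size` is a hypothetical choice, not something from the lecture. Setting $\sigma = \delta = 1$ returns the multiplier of $\sigma^2/\delta^2$.

```python
from statistics import NormalDist


def normal_sample_size(alpha: float, power: float, sigma: float, delta: float) -> float:
    """Total sample size n (two equal groups) from the normal approximation
    n = 4 * (z_{alpha/2} + z_beta)**2 * sigma**2 / delta**2."""
    std_normal = NormalDist()  # N(0, 1)
    k = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value, e.g. 1.960
    z_beta = std_normal.inv_cdf(power)     # upper-beta point, e.g. 1.282 when power = 0.90
    return 4 * (k + z_beta) ** 2 * sigma ** 2 / delta ** 2


# With sigma = delta = 1 the result is the multiplier of sigma^2 / delta^2.
for alpha in (0.10, 0.05, 0.01):
    row = [normal_sample_size(alpha, power, 1.0, 1.0) for power in (0.80, 0.90, 0.95)]
    print(f"alpha = {alpha:.2f}: " + "  ".join(f"{m:7.3f}" for m in row))
```

Varying one argument at a time also shows the qualitative behaviour of the formula: a larger `sigma`, a smaller `delta`, a smaller `alpha` or a larger `power` each increases `n`.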

14 Sample size increases:
- as $\sigma$ increases;
- as $\delta$ decreases;
- as $\alpha$ decreases;
- as $1 - \beta$ increases.

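15 4.3 Some simple variations
Unequal randomisation
The power of a study depends on the group sizes $n_E$ and $n_C$ through
$$\frac{1}{n_E} + \frac{1}{n_C},$$
which, for equal sample sizes $n_E = n_C = n/2$, is equal to $4/n$. For $n_E = R n_C$, we have $n = R n_C + n_C$, and so
$$\frac{1}{n_E} + \frac{1}{n_C} = \frac{R + 1}{R\,n_C} = \frac{(R + 1)^2}{R\,n}.$$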

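16 Unequal randomisation
So, relative to an equal-allocation design with the same power, the overall sample size is multiplied by the factor $F = (R + 1)^2/(4R)$, $n_E$ by $F_E = (R + 1)/2$ and $n_C$ by $F_C = (R + 1)/(2R)$, where:

| $R$ | 1 | 2 | 3 | 5 | 10 |
|---|---|---|---|---|---|
| $F$ | 1 | 1.125 | 1.333 | 1.800 | 3.025 |
| $F_E$ | 1 | 1.500 | 2.000 | 3.000 | 5.500 |
| $F_C$ | 1 | 0.750 | 0.667 | 0.600 | 0.550 |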
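The allocation factors follow from holding $1/n_E + 1/n_C$ at its equal-allocation value $4/n$. A minimal Python sketch that recomputes $F = (R+1)^2/(4R)$, $F_E = (R+1)/2$ and $F_C = (R+1)/(2R)$; the function name `allocation_factors` is hypothetical.

```python
def allocation_factors(R: float) -> tuple[float, float, float]:
    """Inflation factors for n_E = R * n_C, relative to an equal-allocation design
    of total size n0 with the same value of 1/n_E + 1/n_C (namely 4/n0)."""
    F = (R + 1) ** 2 / (4 * R)  # total sample size: n = F * n0
    F_E = (R + 1) / 2           # experimental group: n_E = F_E * (n0 / 2)
    F_C = (R + 1) / (2 * R)     # control group: n_C = F_C * (n0 / 2)
    return F, F_E, F_C


for R in (1, 2, 3, 5, 10):
    F, F_E, F_C = allocation_factors(R)
    print(f"R = {R:>2}: F = {F:.3f}, F_E = {F_E:.3f}, F_C = {F_C:.3f}")
```

Because both group factors are relative to half the equal-allocation total, $F = (F_E + F_C)/2$ for every $R$.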

17 Unknown standard deviation
The sample size formula depends on guessing $\sigma$. If this guess is smaller than the truth, the sample size will be too small and the study underpowered. If this guess is larger than the truth, the sample size will be unnecessarily large. A more accurate calculation can be based on the t-distribution rather than the normal, but this makes little difference and does not overcome the dependence on $\sigma$.
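To see how much a wrong guess for $\sigma$ matters, the sketch below sizes a study with a guessed $\sigma$ and then evaluates the approximate power $P(Z \ge k)$ under other values of the true $\sigma$, using the normal approximation in which $Z$ has mean $\delta\sqrt{n}/(2\sigma)$ (the lower tail $Z \le -k$ is ignored as negligible). The design values ($\sigma = 1$, $\delta = 0.5$, $\alpha = 0.05$, power 0.90) and the function names are hypothetical illustrations; the total is rounded up to an even number so that the groups are equal.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)


def n_for_power(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Normal-approximation total n, rounded up to an even number (equal groups)."""
    n = 4 * (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) ** 2 * sigma ** 2 / delta ** 2
    return 2 * math.ceil(n / 2)


def achieved_power(n: int, alpha: float, sigma: float, delta: float) -> float:
    """Approximate P(Z >= k) when mu_A = mu_B + delta, where Z ~ N(delta*sqrt(n)/(2*sigma), 1)."""
    k = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(k - delta * math.sqrt(n) / (2 * sigma))


# Hypothetical design: guessed sigma = 1, delta = 0.5, alpha = 0.05, power 0.90
n = n_for_power(0.05, 0.90, sigma=1.0, delta=0.5)
for sigma_true in (0.8, 1.0, 1.25, 1.5):
    print(f"n = {n}, true sigma = {sigma_true}: power = {achieved_power(n, 0.05, sigma_true, 0.5):.3f}")
```

An underestimate of $\sigma$ leaves the type I error at $\alpha$ but erodes the power, which is the asymmetry discussed in Section 4.4.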

18 Unknown standard deviation
Often, the final analysis will be based on a linear model, not just a t-test. The formulae given can still be used, but $\sigma$ is now the residual standard deviation (the SD about the fitted model). Fitting the right factors will reduce the residual standard deviation, and so the sample size will also be reduced, but you have to guess what $\sigma$ will be in advance!
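Because $n$ is proportional to $\sigma^2$, any reduction in the residual SD feeds directly into the sample size. A minimal sketch with purely hypothetical numbers (a raw SD of 10, a residual SD of 8 after fitting the model, and $\delta = 5$), using the normal-approximation formula $n = 4(z_{\alpha/2} + z_\beta)^2\sigma^2/\delta^2$ with $\alpha = 0.05$ and power 0.90; the function name `total_n` is hypothetical.

```python
import math
from statistics import NormalDist


def total_n(alpha: float, power: float, sigma: float, delta: float) -> int:
    """Normal-approximation total n, rounded up to an even number (equal groups)."""
    z = NormalDist()
    n = 4 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 * (sigma / delta) ** 2
    return 2 * math.ceil(n / 2)


# Hypothetical values: delta = 5, raw SD = 10, residual SD = 8 after fitting covariates
print(total_n(0.05, 0.90, sigma=10, delta=5))  # t-test, using the raw SD
print(total_n(0.05, 0.90, sigma=8, delta=5))   # linear model, using the residual SD
```

The unrounded totals (about 168.1 and 107.6) differ by exactly the factor $0.8^2 = 0.64$, so a 20% smaller residual SD saves about a third of the sample.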

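19 Sample size for estimation
The sample size can be determined to give a confidence interval of specified width $W$. The 95% confidence interval for $\delta = \mu_A - \mu_B$ is of the form
$$\bar{x}_A - \bar{x}_B \pm 1.96\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}$$
when sample sizes are large (Lecture 1, Slide 24). When $n_A = n_B = n/2$ and both standard deviations equal $\sigma$, this has length
$$2 \times 1.96 \times \frac{2\sigma}{\sqrt{n}} = \frac{7.84\,\sigma}{\sqrt{n}}.$$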

20 Sample size for estimation
We need to set
$$\frac{7.84\,\sigma}{\sqrt{n}} = W,$$
which means that
$$n = \left(\frac{7.84\,\sigma}{W}\right)^2 \approx 61.5\,\frac{\sigma^2}{W^2}.$$
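A minimal Python sketch of the estimation-based calculation: it solves $2 \times z \times 2\sigma/\sqrt{n} = W$ for $n$ and rounds up to an even total so that the groups are equal. With $z = 1.960$ this gives $n = (7.84\,\sigma/W)^2 \approx 61.5\,\sigma^2/W^2$ before rounding. The function name `n_for_ci_width` and its `level` parameter are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_ci_width(W: float, sigma: float, level: float = 0.95) -> int:
    """Total n (equal groups) so that the large-sample confidence interval for
    mu_A - mu_B, of length 2 * z * 2 * sigma / sqrt(n), is no wider than W."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 1.960 for a 95% interval
    n = (4 * z * sigma / W) ** 2
    return 2 * math.ceil(n / 2)  # round up to an even total


print(n_for_ci_width(W=1.0, sigma=1.0))
print(n_for_ci_width(W=0.5, sigma=1.0))  # halving W roughly quadruples n
```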

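21 Binary data
For $R = 1$, $\alpha = 0.05$ and $1 - \beta = 0.90$, we have
$$n = \frac{4(1.960 + 1.282)^2}{\theta^2\,\bar{p}(1 - \bar{p})}, \qquad \theta = \log\left\{\frac{p_E(1 - p_C)}{p_C(1 - p_E)}\right\}, \qquad \bar{p} = \frac{p_C + p_E}{2},$$
where $p_C$ is the anticipated success rate in $\Pi_C$, and $p_E$ the improved rate in $\Pi_E$ to be detected with power $1 - \beta$.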

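22 Examples for binary data: $R = 1$, $\alpha = 0.05$ and $1 - \beta = 0.90$

| $p_C$ | $p_E$ | $\theta$ (log-odds ratio) | $\bar{p}(1 - \bar{p})$ | $n$ |
|---|---|---|---|---|
| 0.1 | 0.2 | 0.811 | 0.1275 | 502 |
| 0.1 | 0.3 | 1.350 | 0.1600 | 144 |
| 0.1 | 0.5 | 2.197 | 0.2100 | 42 |
| 0.3 | 0.4 | 0.442 | 0.2275 | 946 |
| 0.3 | 0.5 | 0.847 | 0.2400 | 244 |
| 0.3 | 0.7 | 1.695 | 0.2500 | 60 |
| 0.4 | 0.5 | 0.405 | 0.2475 | 1034 |
| 0.4 | 0.6 | 0.811 | 0.2500 | 256 |
| 0.4 | 0.8 | 1.792 | 0.2400 | 56 |
| 0.5 | 0.6 | 0.405 | 0.2475 | 1034 |
| 0.5 | 0.7 | 0.847 | 0.2400 | 244 |
| 0.5 | 0.9 | 2.197 | 0.2100 | 42 |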
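The rows of the binary-data table can be recomputed from the log-odds-ratio formula $n = 4(z_{\alpha/2} + z_\beta)^2/\{\theta^2\bar{p}(1 - \bar{p})\}$, with $\theta = \log\{p_E(1 - p_C)/(p_C(1 - p_E))\}$ and $\bar{p} = (p_C + p_E)/2$. The minimal sketch below returns the unrounded $n$; the slide's $n$ column agrees with it to within 2, so the published values appear to have been rounded to an even total. The function name `binary_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def binary_sample_size(p_C: float, p_E: float, alpha: float = 0.05, power: float = 0.90):
    """Return (theta, pbar * (1 - pbar), n) for equal allocation (R = 1), using
    n = 4 * (z_{alpha/2} + z_beta)**2 / (theta**2 * pbar * (1 - pbar))."""
    z = NormalDist()
    theta = math.log(p_E * (1 - p_C) / (p_C * (1 - p_E)))  # log-odds ratio
    pbar = (p_C + p_E) / 2                                   # average success rate
    n = 4 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / (theta ** 2 * pbar * (1 - pbar))
    return theta, pbar * (1 - pbar), n


for p_C, p_E in [(0.1, 0.2), (0.3, 0.5), (0.4, 0.8)]:
    theta, v, n = binary_sample_size(p_C, p_E)
    print(f"p_C = {p_C}, p_E = {p_E}: theta = {theta:.3f}, pbar(1-pbar) = {v:.4f}, n = {n:.1f}")
```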

23 Binary data
This approach is based on the log-odds ratio. Many other approximate formulae exist. All give similar answers when sample sizes are large; exact calculations can be made for small sample sizes.

24 4.4 Further considerations
Setting the values for $\alpha$ and $\beta$
The standard scientific convention is to ensure that $\alpha$ will be small, and to allow any risks to be taken with $\beta$. For example, if an SD or a control success rate is underestimated at the design stage, the study will be underpowered: the analysis maintains the type I error $\alpha$ at the cost of losing power.
- $\alpha$ is the community's risk of being given a false conclusion.
- $\beta$ is the scientist's risk of not proving his/her point.

25 Exceptions
- If the scientist wishes to prove the null hypothesis (equivalence testing), then $\beta$ should be kept small, while $\alpha$ can be inflated if necessary.
- In a pilot study, preliminary to a larger confirmatory study, type I errors can be rectified in the next study, but type II errors will mean that the next study is not conducted at all.

26 Finally:
- Many more sample size formulae exist: see Machin et al. (1997).
- Software also exists: nQuery Advisor, PASS.
- Ensure that the sample size formula used matches the intended final analysis.
- In complicated situations, the whole study can be simulated on the computer in advance to determine its power.
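A minimal Monte Carlo sketch of the simulation approach, for the simple two-sample case only: generate normal responses, compute $t = \sqrt{n}(\bar{x}_A - \bar{x}_B)/(2S)$ with the pooled $S$, and count how often $|t| \ge k$. It uses only Python's standard library, so it takes $k = 1.96$ from the normal approximation rather than an exact t quantile. The design values ($n = 170$, $\delta = 0.5$, $\sigma = 1$), the seed and the function name `simulate_power` are hypothetical; a realistic simulation would generate data from, and analyse it with, the intended final analysis model.

```python
import math
import random


def simulate_power(n: int, delta: float, sigma: float, k: float = 1.96,
                   reps: int = 2000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(|t| >= k) for a two-sample t-test with n/2
    normal observations per group and true mean difference mu_A - mu_B = delta."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    m = n // 2
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(delta, sigma) for _ in range(m)]
        b = [rng.gauss(0.0, sigma) for _ in range(m)]
        mean_a, mean_b = sum(a) / m, sum(b) / m
        var_a = sum((x - mean_a) ** 2 for x in a) / (m - 1)
        var_b = sum((x - mean_b) ** 2 for x in b) / (m - 1)
        s = math.sqrt((var_a + var_b) / 2)  # pooled SD for equal group sizes
        t = math.sqrt(n) * (mean_a - mean_b) / (2 * s)
        if abs(t) >= k:
            rejections += 1
    return rejections / reps


print(simulate_power(n=170, delta=0.5, sigma=1.0))  # estimated power (planned: 0.90)
print(simulate_power(n=170, delta=0.0, sigma=1.0))  # estimated type I error (nominal: 0.05)
```

With these settings the first estimate should land near the planned power of 0.90 and the second near $\alpha = 0.05$, up to Monte Carlo error.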

