SJS SDI_181 Design of Statistical Investigations 18 Sample Size Determination Stephen Senn
A Note About Terminology
The term sample is best reserved for cases where a subset of a population (usually a representative one), the sample, is drawn. Nevertheless, in the context of experiments, when deciding on the size of the experiment one often refers loosely to "sample size" rather than "experiment size".
An Important Topic
- If your experiment/sample is too small, you may have an inconclusive result
  - Consequence: you have wasted resources
- If your experiment/sample is larger than necessary, you could have reached an adequate result with less effort
  - Consequence: you have wasted resources
Hence getting it right is important.
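Sampling
Consider a simple random sample of size $n$ from a population with standard deviation $\sigma$. Ignoring any finite-population correction, the standard error of the sample mean is
$$\mathrm{SE}(\bar{y}) = \frac{\sigma}{\sqrt{n}}.$$
Hence, if we have some target precision for the standard error, we can, with knowledge of $\sigma$, solve for $n$. In fact, it is often the case (but not always: practice differs) that we use a standard of precision for the 95% confidence interval instead. Since the limits are approximately 2 SE from the point estimate, we often solve for
$$\frac{2\sigma}{\sqrt{n}} = d, \quad\text{that is,}\quad n = \frac{4\sigma^2}{d^2},$$
where $d$ is the target half-width of the interval.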
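As a rough sketch, both precision criteria translate directly into a sample size. The function names below are hypothetical, $\sigma$ is assumed known (in practice, some previous estimate), and the 95% version uses the "2 SE" approximation described above rather than 1.96.

```python
import math


def n_for_standard_error(sigma: float, target_se: float) -> int:
    """Smallest n with sigma / sqrt(n) <= target_se (simple random sample,
    finite-population correction ignored)."""
    return math.ceil((sigma / target_se) ** 2)


def n_for_ci_half_width(sigma: float, half_width: float) -> int:
    """Smallest n with approximate 95% CI half-width 2 * sigma / sqrt(n) <= half_width."""
    return math.ceil(4 * sigma**2 / half_width**2)


# Hypothetical illustration: population SD of 10 units.
print(n_for_standard_error(10, 1))  # SE of at most 1 unit -> 100
print(n_for_ci_half_width(10, 2))   # CI of roughly +/- 2 units -> 100
```

Rounding up with `math.ceil` reflects the fact that the precision target is a bound: any smaller $n$ would fall short of it.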
Sample Size and Clinical Trials
- A context in which sample size determination is well established
  - Ethical/commercial pressures
- The usual approach is in terms of power
  - A concept in the Neyman-Pearson theory of testing, established at UCL during the late 1920s and early 1930s
We shall now review this theory very generally before proceeding to a more formal treatment.
Hypothesis Testing
- Nominate a null hypothesis
  - Default state of nature
  - Hypothesis one wishes to disprove
- Example
  - Null: no difference between treatment and control
  - Alternative: treatment superior to control
Hypothesis Testing Continued
- Establish a suitable alternative hypothesis
  - Usually we have a family of alternative hypotheses
- Establish the distribution of a suitable statistic under the null
  - It should be salient for null and alternative: different values should be probable under the null and under the alternative, so that a given value distinguishes between the two
Hypothesis Testing
- Choose the critical region of the test
  - Its boundary is the critical value
- The statistic should be unlikely to fall in this region if the null is true
- The statistic should be likely to fall in this region if the alternative is true
- The region should be chosen so as to fix the probability of falling in it if the null is true (the type I error rate)
Power
- The probability of rejecting the null hypothesis given that it is false (the probability of accepting the alternative if it is true)
  - For example, the probability of claiming a difference between treatments when they are different
- This depends on the size of the effect (the amount of difference between treatments)
  - Usually it is best to think of power as a function of the size of the effect
How to do a Power Calculation
- Assume the null hypothesis is true
  - Establish the critical value as a function of the sample size
- Assume the clinically relevant difference obtains
  - Use the previously established critical value
  - Calculate the probability of rejection: this is the power of the test
- Modify the sample size to achieve the desired power
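The three steps above can be sketched as a simple search over $n$. This is a minimal illustration, assuming a known $\sigma$, a Normal test statistic and, for concreteness, the two-arm parallel design treated later (standard error of the difference $\sigma\sqrt{2/n}$); the function names are hypothetical.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard Normal


def power_at(n: int, delta: float, sigma: float, alpha: float) -> float:
    """Power of a one-sided z-test of size alpha for a two-arm parallel trial
    with n per arm (assumed design: SE of the difference = sigma * sqrt(2 / n))."""
    se = sigma * math.sqrt(2 / n)
    # Step 1: assume H0 is true (tau = 0) and fix the critical value for this n.
    crit = N01.inv_cdf(1 - alpha) * se
    # Step 2: assume the clinically relevant difference delta obtains and
    # compute the probability that the statistic exceeds the critical value.
    return 1 - N01.cdf((crit - delta) / se)


def smallest_n(delta: float, sigma: float, alpha: float, target: float) -> int:
    """Step 3: increase n until the desired power is reached."""
    n = 2
    while power_at(n, delta, sigma, alpha) < target:
        n += 1
    return n


# Asthma example from these notes: a two-sided 5% test treated as one-sided 2.5%.
print(smallest_n(delta=200, sigma=450, alpha=0.025, target=0.80))  # 80
```

A linear search is crude but transparent; it mirrors the recipe exactly, whereas the closed-form solution derived below gets there in one step.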
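We use the following notation:
- $\hat\tau$: test statistic (assumed approximately Normally distributed with expectation $\tau$)
- $\sigma_\tau$: its standard error (assumed known)
- $\alpha$: type I error rate
- $\beta$: type II error rate
- $\delta$: clinically relevant difference (assumed positive)
- $c$: critical value of the test
- $\phi$: probability density function of the Normal
- $\Phi$: distribution function of the Normal
- $\Phi^{-1}$: inverse distribution function (quantile function) of the Normal; we write $z_\alpha = \Phi^{-1}(1-\alpha)$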
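We assume a one-tailed test. (For a two-tailed test it is conventional to treat this as if it were a one-tailed test of size $\alpha/2$. The theory is then the same, making the necessary substitution.) First we establish the critical value. We assume that under $H_0$, $\tau = 0$, and we require $c$ such that
$$P(\hat\tau > c \mid \tau = 0) = 1 - \Phi\left(\frac{c}{\sigma_\tau}\right) = \alpha, \quad\text{so that}\quad c = \Phi^{-1}(1-\alpha)\,\sigma_\tau = z_\alpha\,\sigma_\tau.$$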
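A quick simulation can confirm that the critical value $c = z_\alpha\sigma_\tau$ fixes the probability of rejection under $H_0$ at $\alpha$. This is only a sanity-check sketch: the standard error of 70 units, the seed and the number of replicates are arbitrary assumptions.

```python
import random
from statistics import NormalDist


def critical_value(alpha: float, se: float) -> float:
    """One-sided critical value c = z_alpha * se, with z_alpha = Phi^{-1}(1 - alpha)."""
    return NormalDist().inv_cdf(1 - alpha) * se


def simulated_size(alpha: float, se: float, reps: int = 200_000, seed: int = 1) -> float:
    """Fraction of statistics drawn under H0 (tau = 0) that exceed c."""
    rng = random.Random(seed)
    c = critical_value(alpha, se)
    hits = sum(rng.gauss(0.0, se) > c for _ in range(reps))
    return hits / reps


# Hypothetical standard error of 70 units; the rejection rate should be close to alpha.
print(round(critical_value(0.025, 70), 2))  # 137.2
print(simulated_size(0.025, 70))
```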
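Next we establish the power of the test. Substituting $c = z_\alpha\sigma_\tau$,
$$1 - \beta = P(\hat\tau > c \mid \tau = \delta) = 1 - \Phi\left(\frac{c - \delta}{\sigma_\tau}\right) = \Phi\left(\frac{\delta}{\sigma_\tau} - z_\alpha\right).$$
Hence $z_\beta = \Phi^{-1}(1-\beta) = \delta/\sigma_\tau - z_\alpha$, which gives
$$\frac{1}{\sigma_\tau^2} = \frac{(z_\alpha + z_\beta)^2}{\delta^2}.$$
This formula thus provides a target value for the reciprocal of the variance of the treatment estimate.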
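The power formula $\Phi(\tau/\sigma_\tau - z_\alpha)$ can be evaluated over a range of true effects $\tau$, which makes concrete the idea of power as a function of the size of effect. A minimal sketch, assuming a one-sided test; the function names are hypothetical and $\sigma_\tau$ is chosen from the target-precision formula for $\delta = 200$.

```python
from statistics import NormalDist

N01 = NormalDist()


def power(tau: float, se: float, alpha: float) -> float:
    """Power Phi(tau / se - z_alpha) of a one-sided z-test when the true effect is tau."""
    return N01.cdf(tau / se - N01.inv_cdf(1 - alpha))


def required_precision(delta: float, alpha: float, beta: float) -> float:
    """Target 1 / se**2 = (z_alpha + z_beta)**2 / delta**2."""
    z_a = N01.inv_cdf(1 - alpha)
    z_b = N01.inv_cdf(1 - beta)
    return (z_a + z_b) ** 2 / delta**2


# Power as a function of effect size, for a hypothetical se chosen to give
# 80% power at delta = 200 with one-sided alpha = 0.025.
se = required_precision(200, 0.025, 0.20) ** -0.5
for tau in (0, 100, 200, 300):
    print(tau, round(power(tau, se, 0.025), 3))
```

At $\tau = 0$ the "power" is just $\alpha$, and at $\tau = \delta$ it equals the design value $1 - \beta$.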
Now take a specific example, that of a parallel group trial with variance $\sigma^2$ and two groups each of size $n$. Then $\sigma_\tau^2 = 2\sigma^2/n$, and we have
$$n = \frac{2\sigma^2 (z_\alpha + z_\beta)^2}{\delta^2}. \qquad (3)$$
Note that $n$ is an increasing function of $\sigma$ and a decreasing function of $\delta$. Also, as $\alpha$ and $\beta$ decrease, $z_\alpha$ and $z_\beta$ increase, so that $n$ is a decreasing function of $\alpha$ and $\beta$. (The more we wish to reduce our type I and type II error rates, the greater our sample size must be.)
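The closed-form result for the parallel group trial can be coded directly, and the directions of change noted above can be checked numerically. This is a sketch with a hypothetical function name; $\alpha$ is one-sided here, so a conventional two-sided test would pass $\alpha/2$.

```python
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float, beta: float) -> float:
    """n = 2 sigma^2 (z_alpha + z_beta)^2 / delta^2 for a two-arm parallel trial
    (one-sided alpha; pass alpha/2 for a conventional two-sided test)."""
    z = NormalDist().inv_cdf
    return 2 * sigma**2 * (z(1 - alpha) + z(1 - beta)) ** 2 / delta**2


base = n_per_group(delta=1.0, sigma=1.0, alpha=0.025, beta=0.2)
# The directions of change described above, with hypothetical values:
print(n_per_group(1.0, 2.0, 0.025, 0.2) > base)  # larger sigma -> larger n
print(n_per_group(2.0, 1.0, 0.025, 0.2) < base)  # larger delta -> smaller n
print(n_per_group(1.0, 1.0, 0.005, 0.2) > base)  # smaller alpha -> larger n
print(n_per_group(1.0, 1.0, 0.025, 0.1) > base)  # smaller beta -> larger n
```

Because $n \propto \sigma^2/\delta^2$, doubling $\sigma$ quadruples the required sample size, as does halving $\delta$.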
An Example
Placebo-controlled parallel group trial in asthma. The target variable is FEV1. The clinically relevant difference is 200 ml. The standard deviation is 450 ml. Two-sided significance test at the 5% level. Power is 0.8 or 80%.
Solution
- $\delta$ = 200 ml, $\sigma$ = 450 ml
- $\alpha$ = 0.05, so $z_{\alpha/2}$ = 1.96 (NB: a two-sided test is being used)
- $\beta$ = 1 - 0.8 = 0.2, so $z_\beta$ = 0.84
Substituting, we have
$$n = \frac{2 \times (450\ \text{ml})^2 \times (1.96 + 0.84)^2}{(200\ \text{ml})^2} = 79.38.$$
So about 80 patients per group are needed.
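The arithmetic can be reproduced, and compared with what exact Normal quantiles give in place of the rounded values 1.96 and 0.84. A small sketch; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf
delta, sigma = 200.0, 450.0  # ml, from the asthma example
alpha, beta = 0.05, 0.20

# With the rounded quantiles used in the solution.
n_rounded = 2 * sigma**2 * (1.96 + 0.84) ** 2 / delta**2
# With exact quantiles z_{alpha/2} = 1.95996..., z_beta = 0.84162...
n_exact = 2 * sigma**2 * (z(1 - alpha / 2) + z(1 - beta)) ** 2 / delta**2

print(round(n_rounded, 2), round(n_exact, 2))  # 79.38 79.47
print(math.ceil(n_exact))                      # 80
```

The rounding of the quantiles barely matters here: both versions lead to 80 patients per group.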
Actually, in practice the standard error is not known, and hence for carrying out the test an estimated standard error has to be substituted. This generally means that the test is based on the t-distribution rather than the Normal distribution, and the theory needs to be adjusted accordingly. We take the specific example of the parallel group trial with two arms and $n$ patients per arm to illustrate this. For such a trial we estimate the standard error using the formula
$$\hat\sigma_\tau = s\sqrt{2/n},$$
where $s^2$ is the pooled within-group variance estimate on $2n - 2$ degrees of freedom.
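Consequently the critical value for $\hat\tau$ is not $z_\alpha\sigma\sqrt{2/n}$ but $t_\alpha s\sqrt{2/n}$, where $t_\alpha$ is the upper $\alpha$ point of Student's t-distribution with $\nu = 2n - 2$ degrees of freedom. Hence we require
$$P\left(\frac{\hat\tau}{s\sqrt{2/n}} > t_\alpha \mid \tau = \delta\right) = 1 - \beta.$$
Now, given $\tau = \delta$, the statistic on the left-hand side has a non-central t-distribution with $\nu_n = 2n - 2$ degrees of freedom and non-centrality parameter $\lambda_n = \delta/(\sigma\sqrt{2/n})$. Hence we can solve (numerically)
$$P(T > t_{\alpha,\nu_n};\ \nu_n, \lambda_n) = 1 - \beta$$
for $n$, where $T$ has this non-central t-distribution and $t_{\alpha,\nu_n}$ is the critical value on $\nu_n$ degrees of freedom. In practice, however, this refinement usually makes little difference.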
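The numerical solution can be sketched with the Python standard library alone (no statistics package assumed). The non-central t tail probability is obtained by integrating over the chi-square distribution of the variance estimate, using $T = (Z + \lambda)/\sqrt{V/\nu}$, and $t_\alpha$ is found by bisection. The function names, the integration range (12 chi-square standard deviations either side of the mean) and the step count are all assumptions of this sketch, and only the upper tail is counted. For the asthma example it should agree with the nQuery figure of 81 per group quoted in the practice slide.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard Normal distribution function


def nct_upper_tail(c: float, df: int, ncp: float, steps: int = 2000) -> float:
    """P(T > c) for T ~ non-central t(df, ncp), by Simpson's rule.

    Uses T = (Z + ncp) / sqrt(V / df) with Z ~ N(0, 1) and V ~ chi-square(df),
    so that P(T > c) = E_V[Phi(ncp - c * sqrt(V / df))]. `steps` must be even.
    """
    spread = 12 * math.sqrt(2 * df)  # chi-square(df) has SD sqrt(2 df)
    lo, hi = max(1e-12, df - spread), df + spread
    h = (hi - lo) / steps
    log_const = (df / 2) * math.log(2) + math.lgamma(df / 2)
    total = 0.0
    for i in range(steps + 1):
        v = lo + i * h
        density = math.exp((df / 2 - 1) * math.log(v) - v / 2 - log_const)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * density * PHI(ncp - c * math.sqrt(v / df))
    return total * h / 3


def t_upper_point(alpha: float, df: int) -> float:
    """Upper alpha point of the central t(df) distribution, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if nct_upper_tail(mid, df, 0.0) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def n_per_group_t(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Smallest n per arm giving the target power for a two-sided level-alpha
    two-sample t-test (tested as one-sided alpha/2; opposite tail ignored)."""
    z = NormalDist().inv_cdf
    # Start from the Normal-theory answer; in practice the t refinement only adds a little.
    n = max(2, math.ceil(2 * sigma**2 * (z(1 - alpha / 2) + z(power)) ** 2 / delta**2))
    while True:
        df = 2 * n - 2
        crit = t_upper_point(alpha / 2, df)
        ncp = delta / (sigma * math.sqrt(2 / n))
        if nct_upper_tail(crit, df, ncp) >= power:
            return n
        n += 1


print(n_per_group_t(delta=200, sigma=450))  # 81
```

Starting the search at the Normal-theory answer keeps the number of (relatively slow) numerical integrations small.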
Sample Size Determination in Practice
There are many specialist packages now available for the statistician:
- nQuery, Pass, Power and Precision, etc.
For the previous example, nQuery gives the following solution:
"A sample size of 81 in each group will have 80% power to detect a difference in means of 200.000 assuming that the common standard deviation is 450.000 using a two group t-test with a 0.050 two-sided significance level."
Note that use of the non-central t has led to a slightly larger sample size (81 rather than 80 per group).
Practical Problems
A number of practical problems remain, however. First, we should note that a different approach will be required if:
- more than two groups are being compared;
- the design is not a parallel group trial;
- the outcome is not Normally distributed;
- the analysis is more complicated than that indicated above;
- the trial is sequential;
- the allocation ratio is not one to one; or
- the purpose of the trial is to prove equivalence.
These are technical problems, however, for which solutions can be found. Some more practical issues are listed below.
Practical Issues
- Although the test itself does not require knowledge of $\sigma$, the sample-size formula (3) does. In practice we use some previous estimate, but this is itself subject to sampling error, which (3) does not take into account.
- There is usually no agreed standard for a clinically relevant difference.
- The levels of $\alpha$ and $\beta$ are themselves arbitrary.
- An allowance must be made for drop-outs (patient withdrawals).
- It may be required that the results be robust to a number of analyses. This requires a larger sample.
- How do we trade off the interests of patients in the trial against those of future patients?
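Two of these issues can be made concrete with a short sketch. The first function shows the power actually achieved by the planned 80 per group (asthma example) if the true standard deviation differs from the 450 ml assumed. The second applies one common drop-out convention, recruiting $n/(1 - p)$ for an anticipated drop-out proportion $p$; that convention, the 10% rate and both function names are assumptions for illustration, not part of these notes.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def achieved_power(n: int, delta: float, true_sigma: float, alpha: float = 0.05) -> float:
    """Normal-approximation power (two-sided alpha, upper tail only) for a two-arm
    trial with n per arm when the real SD is true_sigma."""
    se = true_sigma * math.sqrt(2 / n)
    return N01.cdf(delta / se - N01.inv_cdf(1 - alpha / 2))


def inflate_for_dropout(n: int, dropout_rate: float) -> int:
    """One common convention (an assumption here): recruit n / (1 - dropout_rate)."""
    return math.ceil(n / (1 - dropout_rate))


# Planned with sigma = 450 ml and n = 80 per arm (asthma example).
for sigma in (400, 450, 500):
    print(sigma, round(achieved_power(80, 200, sigma), 3))
print(inflate_for_dropout(80, 0.10))  # 89
```

If the true standard deviation were 500 ml rather than 450 ml, power would fall to roughly 72%, which illustrates why the sampling error in the variance estimate matters.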
Criticisms of the NP Approach
- The criterion is very strange
  - Fixed $\alpha$ and $\beta$: why?
- There is no explicit mention of the costs of sampling
  - The same solution results, however costly observations are to obtain.
Finally
It must be understood that choosing a sample size which gives 80% power does not imply that there is an 80% chance that the trial will be successful:
1) The drug may not work.
2) If it works, it may not produce a clinically relevant difference.
3) The drug might have a greater effect than the clinically relevant difference (implying more power).
4) The sample size determination depends on the assumption that the trial is run competently.
Questions
Suppose you are estimating a sample size for a placebo-controlled clinical trial in hypertension (say), but this is the first time the test drug has ever been used.
- How would you estimate the variance?
- If you had no previous information at all on which to base a variance estimate, what could you do?