Choice of sample size
In practice, choosing the sample size is one of the most important decisions in designing an experiment. The goal may be to be reasonably sure that you will "detect an effect" if there is an effect. Usually, an initial guess of the size of the effect is needed to estimate the necessary sample size; your goal may be to detect effects of "relevant size". A related goal: to estimate the sample size necessary for a sufficiently small confidence interval.
The power of a test
The power of a test is the probability of rejecting the null hypothesis when the null hypothesis is wrong. It is often denoted 1 − β.

Given that:        H0 is not rejected        H0 is rejected
H0 is true         correct (1 − α)           α (Type I error)
H0 is not true     β (Type II error)         correct (power, 1 − β)
The power depends on the alternative hypothesis
The power can only be computed when the alternative hypothesis is formulated so that we can compute the distribution of the test statistic. Thus, the power can be a function, for example, of the size of the effect that we are trying to detect.
Example: Sample size to detect an effect on a continuous variable
Assumptions:
– We compare two samples of equal size n.
– Each sample is from a normal distribution with variance σ² (common for both distributions).
– The actual difference between the means is d.
– We use a two-sided t-test to detect this difference.
Then the sample size n needed per group for power 1 − β (probability 1 − β to reject the null hypothesis that there is no difference between the means of the groups) is approximately

n ≈ 2kσ²/d²

(for the value of k, see next overhead).
Table for k

                            Power
Significance level α    0.80    0.90    0.95
0.10                     6.2     8.6    10.8
0.05                     7.9    10.5    13.0
0.01                    11.7    14.9    17.8
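The table values can be reproduced from standard normal quantiles via k = (z(1−α/2) + z(1−β))². A small sketch, assuming SciPy is available (`k_factor` is a hypothetical helper name, not from the slides):

```python
# Reproduce the k-table: k = (z_{1-alpha/2} + z_{1-beta})^2,
# where z_q is the standard normal quantile function.
from scipy.stats import norm

def k_factor(alpha: float, power: float) -> float:
    """k such that n ≈ 2*k*sigma^2/d^2 gives roughly the stated power."""
    return (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2

for alpha in (0.10, 0.05, 0.01):
    row = [round(k_factor(alpha, p), 1) for p in (0.80, 0.90, 0.95)]
    print(alpha, row)
```

The printed rows agree with the table to within rounding (e.g. α = 0.05, power 0.90 gives k ≈ 10.5).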
Example
You are comparing a new production process with an old one, to find out whether the new one has a better yield. The standard deviation of the yield for such processes is 5 (you know this from other data). You want to detect the yield difference if it is at least 5. How many repetitions do you need?
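Plugging the example's numbers into n ≈ 2kσ²/d², and assuming a significance level of 0.05 and power 0.90 (the slides do not fix these, so this choice is an assumption), gives:

```python
# Worked example: assumed alpha = 0.05, power = 0.90, so k = 10.5 from the table.
sigma = 5.0   # known standard deviation of the yield
d = 5.0       # smallest yield difference we want to detect
k = 10.5      # table value for alpha = 0.05, power = 0.90

n = 2 * k * sigma**2 / d**2
print(n)  # 21.0 -> about 21 repetitions of each process
```

With a different target power or significance level, read off the corresponding k from the table instead.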
Why is this so?
The actual distribution of the difference in means is N(d, 2σ²/n). The actual distribution of the test statistic is therefore, approximately, N(d·√(n/2)/σ, 1). In the t-test, this is compared to, approximately, N(0, 1). k, for power 1 − β and significance level α, is defined such that if x ~ N(k^(1/2), 1), then P(|x| > z(1−α/2)) = 1 − β; that is, k^(1/2) = z(1−α/2) + z(1−β). Our equation can then be derived from

d·√(n/2)/σ = k^(1/2), which gives n = 2kσ²/d².
Sample size to detect an effect on a proportion
Assume we compare the frequencies P1 and P2 of "successes" in two groups, of sizes n1 and n2. If we want to test the hypothesis that the population frequencies p1 and p2 in the two groups are equal, we can use the test statistic

Z = (P1 − P2) / √( P3(1 − P3)(1/n1 + 1/n2) )

where P3 is the average (pooled) frequency of the two groups, and compare it with a standard normal distribution. The sample size n needed in each group for power 1 − β is then approximately

n ≈ k (p1(1 − p1) + p2(1 − p2)) / (p1 − p2)²

with k as in the table above.
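A minimal sketch of this approximation in code, assuming SciPy (the function name `n_two_proportions` is hypothetical):

```python
# Approximate per-group sample size to detect p1 vs p2 with a
# two-sided two-sample z-test, following n ≈ k(p1(1-p1)+p2(1-p2))/(p1-p2)^2.
from math import ceil
from scipy.stats import norm

def n_two_proportions(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.90) -> int:
    k = (norm.ppf(1 - alpha / 2) + norm.ppf(power)) ** 2
    return ceil(k * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

# e.g. detecting a change from 50% to 60% success rate:
print(n_two_proportions(0.5, 0.6))  # roughly 500 per group
```

Note how much larger the samples must be for proportions than for the continuous example: the variance term p(1 − p) is large relative to the squared difference.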
Sample size to limit the confidence interval for an effect
The confidence interval for an effect eff is approximately of the form

eff ± z(1−α/2) · σ/√n

Thus the length of the confidence interval is approximately 2·z(1−α/2)·σ/√n. Thus the n that will give a confidence interval of length d is

n = 4·z(1−α/2)²·σ²/d².
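The length-to-sample-size inversion can be sketched as follows, assuming SciPy (`n_for_ci_length` is a hypothetical helper name):

```python
# Sample size so that a CI of the form eff ± z*sigma/sqrt(n)
# has total length at most d: n = 4*z^2*sigma^2/d^2.
from math import ceil
from scipy.stats import norm

def n_for_ci_length(sigma: float, d: float, alpha: float = 0.05) -> int:
    z = norm.ppf(1 - alpha / 2)
    return ceil(4 * z**2 * sigma**2 / d**2)

# e.g. sigma = 5, desired 95% CI length 2:
print(n_for_ci_length(5, 2))  # 97
```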
Sample size to limit the confidence interval for a proportion
For a proportion, the above gives the formula

n = 4·z(1−α/2)²·p(1 − p)/d²

The factor p(1 − p) is never larger than 0.25 (it equals 0.25 at p = 0.5). Thus, replacing it with 0.25 gives an upper limit for the necessary sample size: n ≤ z(1−α/2)²/d².
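The conservative bound is convenient when p is unknown. A sketch, assuming SciPy (`n_for_proportion_ci` is a hypothetical name):

```python
# Sample size so the CI p_hat ± z*sqrt(p(1-p)/n) has length <= d.
# With the default p = 0.5, this is the conservative upper bound z^2/d^2.
from math import ceil
from scipy.stats import norm

def n_for_proportion_ci(d: float, alpha: float = 0.05, p: float = 0.5) -> int:
    z = norm.ppf(1 - alpha / 2)
    return ceil(4 * z**2 * p * (1 - p) / d**2)

# e.g. a 95% CI of total length 0.1 never needs more than:
print(n_for_proportion_ci(0.1))  # 385
```

If a rough prior guess for p is available (say p ≈ 0.1), passing it in can cut the required n substantially.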
Computations in more complex situations
For tests similar to those above, we can derive similar formulas. In general, if we specify
– the experiment (including sample size),
– the exact alternative hypothesis, and
– the test procedure,
we can always estimate the power of the test (if necessary by simulation). Then we can work backwards to get a sample size.
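One generic way to do this when no closed formula is at hand is Monte Carlo: simulate the experiment under the exact alternative, estimate the rejection rate, and increase n until the target power is reached. A sketch, assuming NumPy and SciPy, using the earlier yield example (σ = 5, true difference d = 5) as the assumed alternative:

```python
# Estimate power by Monte Carlo for a two-sample t-test, then
# search for the smallest n reaching the target power.
import numpy as np
from scipy.stats import ttest_ind

def simulated_power(n: int, d: float = 5.0, sigma: float = 5.0,
                    alpha: float = 0.05, reps: int = 2000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        a = rng.normal(0.0, sigma, n)   # sample under the old process
        b = rng.normal(d, sigma, n)     # sample under the new process
        if ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / reps

# Work backwards: increase n until the estimated power reaches 0.90.
n = 2
while simulated_power(n) < 0.90:
    n += 1
print(n)  # typically around 21-23, consistent with the formula's n ≈ 21
```

The simulation approach carries over unchanged to designs where the test statistic's distribution under the alternative is intractable.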