Sample Sizes for IE Power Calculations.


1 Sample Sizes for IE Power Calculations

2 Overview General question: How large does the sample need to be to credibly detect a given effect size? What does "credibly" mean here? It means we can be reasonably sure that the difference between the treatment group and the comparison group is due to the program. Randomization removes bias, but it does not remove noise. To reduce noise, we need a large sample. But how large is large?

3 Measuring Impact At the end of an experiment, we compare the outcome of interest in the treatment and comparison groups. We are interested in the difference: Mean in treatment - Mean in control = Effect size. For example: mean malaria prevalence in villages with ITN distribution vs. mean malaria prevalence in villages with no ITNs. To draw conclusions from that effect size, we need it to be estimated precisely, since there is always variability in the data. If many other unobserved factors affect outcomes, it is harder to say whether the treatment had an effect.
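The effect size on this slide is a plain difference in means. Below is a minimal Python sketch of that calculation. The village prevalence figures are made up for illustration, and the function name `effect_size` is hypothetical.

```python
from statistics import mean


def effect_size(treatment: list[float], control: list[float]) -> float:
    """Effect size as defined on the slide: mean in treatment - mean in control."""
    return mean(treatment) - mean(control)


# Hypothetical village-level malaria prevalence (share of residents infected);
# these numbers are invented for illustration, not taken from any study.
itn_villages = [0.10, 0.12, 0.08, 0.11]     # treatment: villages that received ITNs
no_itn_villages = [0.18, 0.15, 0.20, 0.17]  # comparison: villages with no ITNs

print(f"Estimated effect: {effect_size(itn_villages, no_itn_villages):+.4f}")
```

A negative value here means lower prevalence in the ITN villages.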

4 Precise outcomes

5 Some noise

6 Very noisy

7 Confidence Intervals We only work with data from a sample of the population. To assess whether our estimate holds for the entire population, we need a measure of reliability. A 95% confidence interval for an effect size is built so that, if we drew many samples from the same population and constructed the interval the same way each time, 95% of those intervals would contain the true effect. The standard error (se) of the estimate captures both the size of the sample and the variability of the outcome: it is larger with a small sample and with a more variable outcome.
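The following is a small sketch, assuming a simple comparison of two independent groups. It shows how the standard error combines sample size and outcome variability and how a 95% interval is built from it. It uses a normal approximation (z = 1.96). With samples as tiny as these made-up ones, a t critical value would be more appropriate. The function name `diff_in_means_ci` is hypothetical.

```python
import math
from statistics import mean, stdev


def diff_in_means_ci(treatment, control, z=1.96):
    """Return (effect, se, (lower, upper)) for a difference in means.

    se = sqrt(s_T^2 / n_T + s_C^2 / n_C): it shrinks as the samples grow and
    grows with the variability of the outcome. The interval uses a normal
    approximation, effect +/- z * se, with z = 1.96 for 95%.
    """
    effect = mean(treatment) - mean(control)
    se = math.sqrt(stdev(treatment) ** 2 / len(treatment)
                   + stdev(control) ** 2 / len(control))
    return effect, se, (effect - z * se, effect + z * se)


# Hypothetical outcome data for illustration only.
treated = [0.10, 0.12, 0.08, 0.11]
comparison = [0.18, 0.15, 0.20, 0.17]

effect, se, (lower, upper) = diff_in_means_ci(treated, comparison)
print(f"effect={effect:+.4f}  se={se:.4f}  95% CI=({lower:+.4f}, {upper:+.4f})")

# Same values, twice as many observations: the standard error falls.
_, se_bigger, _ = diff_in_means_ci(treated * 2, comparison * 2)
print(f"se with doubled sample: {se_bigger:.4f}")
```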

8 Two Types of Errors First type of error: concluding that there is an effect when in fact there is none. The level of your test (α) is the probability that you will falsely conclude that the program has an effect when in fact it does not. So with a level of 5%, if the program truly had no effect, there is only a 5% chance that you would wrongly conclude that it did. Conventional levels are α = 5%, 10%, or 1%. Rule of thumb: if the estimated effect is more than about twice its standard error (the exact cutoff is 1.96), it is statistically significant at the 5% level.
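The "twice the standard error" rule can be checked directly. Below is a minimal sketch, assuming a two-sided test with a normal approximation. The helper names `two_sided_p_value` and `is_significant` are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(effect: float, se: float) -> float:
    """Two-sided p-value for the ratio effect/se under a normal approximation."""
    z = abs(effect / se)
    return 2 * (1 - NormalDist().cdf(z))


def is_significant(effect: float, se: float, alpha: float = 0.05) -> bool:
    """Reject 'no effect' when the p-value is below the test level alpha."""
    return two_sided_p_value(effect, se) < alpha


critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96: the "twice the se" rule
print(f"5% two-sided critical value: {critical:.2f}")
print(is_significant(2.0, 1.0), is_significant(1.5, 1.0))
```

An estimate two standard errors from zero gives p ≈ 0.046, just under 5%. At 1.5 standard errors, p ≈ 0.13 and the result is not significant.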

9 Two Types of Errors Second type of error: you fail to reject that the program had no effect, when in fact it does have an effect. The power of a test is the probability of finding a significant effect in the RCT when there truly is an effect. Only with a significant effect can you cleanly influence policy. Power calculations are a tool to see how likely we are to find a significant effect for a given sample size.
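Power can be computed for a candidate sample size. The sketch below makes several assumptions: two equal arms, individual randomization, the same outcome standard deviation in both arms, and a normal approximation. The function name `power_two_arm` and the example numbers (an effect of 0.2 with sd 1.0) are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_arm(effect: float, sd: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two equal-sized arms.

    Assumes individual randomization and a common outcome standard deviation
    sd in both arms; the tiny probability of a significant result in the
    wrong direction is ignored.
    """
    se = sd * math.sqrt(2 / n_per_arm)  # standard error of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z_crit)


for n in (100, 200, 393, 600):
    print(n, round(power_two_arm(effect=0.2, sd=1.0, n_per_arm=n), 2))
```

Power rises with the sample size. Under these assumptions, about 393 people per arm give roughly 80% power for an effect of 0.2 standard deviations.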

10 What you Need for a Power Calculation
Significance level
- This is often conventionally set at 5%.
- At a lower level (less risk of a false positive), we need a larger sample to detect the effect.
Power level
- A power level of 80% says: if there is a true effect, you will detect it in 80% of samples of the given size.
- A larger sample gives more power.
The mean and the variability of the outcome in the comparison group
- From previous surveys conducted in similar settings.
- The larger the variability, the larger the sample needed for a given power.
The effect size that we want to detect
- What is the smallest effect that should prompt a policy response?
- The smaller the expected effect size, the larger the sample size needed.
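The four ingredients above can be combined into a sample-size formula. Below is a minimal sketch of the standard normal-approximation formula for two equal arms under individual randomization: n per arm = 2 (z_{1-α/2} + z_{power})² σ² / δ². The function name `sample_size_per_arm` and the input values are hypothetical. In practice, σ would come from surveys of a similar comparison group.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect: float, sd: float, alpha: float = 0.05,
                        power: float = 0.80) -> int:
    """Observations needed in each of two equal arms (individual randomization).

    n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd^2 / effect^2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # from the significance level
    z_power = NormalDist().inv_cdf(power)           # from the desired power
    n = 2 * (z_alpha + z_power) ** 2 * sd ** 2 / effect ** 2
    return math.ceil(n)


# Hypothetical inputs: comparison-group sd of 1.0 and a target effect of 0.2.
print(sample_size_per_arm(effect=0.2, sd=1.0))              # reference case
print(sample_size_per_arm(effect=0.2, sd=1.0, alpha=0.01))  # lower level -> bigger n
print(sample_size_per_arm(effect=0.2, sd=1.5))              # noisier outcome -> bigger n
print(sample_size_per_arm(effect=0.1, sd=1.0))              # smaller effect -> bigger n
```

Each variation moves the required sample size in the direction the slide describes.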

11 How to Determine Effect Size
What is the smallest effect that would justify adopting the program (in terms of cost-benefit)? This sets the minimum effect size we would want to be able to detect. Common danger: using an effect size that is too optimistic, which leads to a sample size that is too small. How large an effect you can detect with a given sample depends on how variable the outcome is. Example: if all children have very similar diarrhea prevalence without a program, even a very small impact will be easy to detect. The standardized effect size is the effect size divided by the standard deviation of the outcome. Common effect sizes are: .20 (small); .40 (medium); .50 (large).
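One way to see how outcome variability drives what a fixed sample can detect is to compute the minimum detectable effect (MDE). The sketch below assumes two equal arms, individual randomization, and a normal approximation. The function names and the two standard deviations are hypothetical.

```python
import math
from statistics import NormalDist


def standardized_effect(effect: float, sd: float) -> float:
    """Effect size divided by the standard deviation of the outcome."""
    return effect / sd


def minimum_detectable_effect(sd: float, n_per_arm: int, alpha: float = 0.05,
                              power: float = 0.80) -> float:
    """Smallest true effect detectable with the given power (two equal arms)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sd * math.sqrt(2 / n_per_arm)


# Hypothetical outcome standard deviations: a variable vs. a homogeneous population.
for sd in (0.25, 0.10):
    mde = minimum_detectable_effect(sd=sd, n_per_arm=400)
    print(f"sd={sd:.2f}: MDE={mde:.3f} ({standardized_effect(mde, sd):.2f} sd)")
```

In raw units, the MDE shrinks in proportion to the outcome's standard deviation. In standardized units it stays the same, which is why standardized effect sizes are convenient for comparing designs.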

12 Design Factors to Take into Account
Availability of a baseline
- A baseline can help reduce the needed sample size, since it:
  - removes some variability in the data, increasing precision;
  - can be used to stratify and create subgroups.
The level of randomization
- Whenever treatment occurs at a group level, this reduces power relative to randomization at the individual level.
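One way to quantify the baseline's benefit is sketched below. It assumes the analysis regresses the follow-up outcome on treatment and the baseline outcome (ANCOVA) with a single baseline and a single follow-up round. Under that assumption, the residual variance, and therefore the required sample, scales by (1 - ρ²), where ρ is the baseline/follow-up correlation of the outcome. This is one common approach, not necessarily the one the presentation had in mind. The function name is hypothetical.

```python
import math


def sample_size_with_baseline(n_without_baseline: int, rho: float) -> int:
    """Per-arm sample size when the analysis controls for the baseline outcome.

    Assumes an ANCOVA-style regression with one baseline and one follow-up
    round, so the residual variance is (1 - rho**2) times the outcome
    variance, where rho is the baseline/follow-up correlation of the outcome.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be a correlation between -1 and 1")
    return math.ceil(n_without_baseline * (1 - rho ** 2))


# Hypothetical: 393 per arm needed without a baseline.
for rho in (0.0, 0.3, 0.5, 0.8):
    print(rho, sample_size_with_baseline(393, rho))
```

The more persistent the outcome over time (higher ρ), the more the baseline saves.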

13 Cluster (Group) Randomization
Examples, with the level at which treatment was randomized:
- Rural Water Project: Water Guard (individual)
- Rural Water Project: Spring Improvement (village)
- Community-based Monitoring in Uganda
- HIV/AIDS Education (school level)

14 Implications from Group Design
The outcomes for all the individuals within a unit may be correlated:
- All villagers are affected by spring improvements at the same time.
- All students at a school with trained teachers may have benefited from the information.
The sample size needs to be adjusted for this correlation. The more correlation within the group, the more we need to adjust the standard errors.
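A common way to make this adjustment is the design effect, 1 + (m - 1) × ICC, where m is the number of people surveyed per group and ICC is the intra-cluster correlation of the outcome. The sketch below assumes equal-sized clusters. The function names and the inputs (393 per arm, clusters of 20, ICC of 0.05) are hypothetical.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from randomizing groups of equal size: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual: int, cluster_size: int,
                          icc: float) -> tuple[int, int]:
    """Inflate an individually randomized per-arm sample size for clustering.

    Returns (individuals per arm, clusters per arm).
    """
    n_total = math.ceil(n_individual * design_effect(cluster_size, icc))
    return n_total, math.ceil(n_total / cluster_size)


# Hypothetical: 393 per arm needed under individual randomization,
# villages of 20 surveyed households, intra-cluster correlation 0.05.
print(clustered_sample_size(393, cluster_size=20, icc=0.05))
```

With no within-group correlation (ICC = 0), the design effect is 1 and no adjustment is needed.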

15 Implications It is extremely important to randomize an adequate number of groups. Typically, the number of individuals within groups matters less than the number of groups. Big increases in power usually happen only when the number of randomized groups increases. If you randomize at the level of the district, with one treated district and one control district, you have only 2 observations!
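The sketch below compares hypothetical designs by their "effective" sample size: total individuals divided by the design effect 1 + (m - 1) × ICC. It assumes equal cluster sizes and an ICC of 0.1, and the function name is hypothetical. The formula is only a rough guide when there are very few clusters.

```python
def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """How many independent observations a clustered sample is 'worth' (per arm).

    Total individuals divided by the design effect 1 + (m - 1) * icc.
    As cluster_size grows, this approaches n_clusters / icc, a ceiling that
    only adding clusters can raise.
    """
    return n_clusters * cluster_size / (1 + (cluster_size - 1) * icc)


# Hypothetical designs per arm, all with intra-cluster correlation 0.1.
for k, m in [(10, 200), (100, 20), (10, 2000)]:
    print(f"{k:>3} clusters x {m:>4} people: "
          f"{effective_sample_size(k, m, 0.1):6.1f} effective observations")
```

The first two designs survey the same 2,000 people per arm, yet the one with 100 clusters is worth about seven times as many effective observations. Surveying ten times more people in each of the 10 clusters barely helps (from roughly 96 to roughly 100).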

16 Conclusions Power calculations involve some guesswork.
Sometimes we do not have the right information to do them properly. However, it is important to do them to:
- Avoid launching studies that will have no power at all, which wastes time and money.
- Devote appropriate (but not excessive) resources to the studies that you decide to conduct.
- Determine, if you have a fixed budget, whether the project is feasible at all.
Software:

