Sample Sizes for IE Power Calculations.

Overview
General question: how large does the sample need to be to credibly detect a given effect size?
What does “credibly” mean here? It means we can be reasonably sure that the difference between the treatment group and the comparison group is due to the program.
Randomization removes bias, but it does not remove noise. To reduce noise, we need a large sample size. But how large is large?

Measuring Impact
At the end of an experiment, we compare the outcome of interest in the treatment and comparison groups. We are interested in the difference:
Mean in treatment - Mean in control = Effect size
For example: mean malaria prevalence in villages with ITN (insecticide-treated net) distribution vs. mean malaria prevalence in villages with no ITNs.
To draw conclusions from that effect size, we need it to be estimated with precision, since there is always variability in the data. If many other unobserved factors affect outcomes, it is harder to say whether the treatment had an effect.

[Figure: three panels comparing treatment and comparison outcomes, titled "Precise outcomes", "Some noise", and "Very noisy"]

Confidence Intervals
We only work with data that is a sample of the population. To assess whether a result is valid for the entire population, we need a measure of reliability.
A 95% confidence interval for an effect size is built so that, for 95% of the samples we could have drawn from the same population, the interval would contain the true effect.
The standard error (se) of the estimate in the sample captures both the size of the sample and the variability of the outcome: it is larger with a small sample and with a variable outcome.
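The relationship between sample size, variability, and the standard error can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `difference_in_means` and the outcome data are made up, and the interval uses a normal approximation for two independent samples.

```python
import math
from statistics import NormalDist, mean, stdev

def difference_in_means(treatment, control, confidence=0.95):
    """Return (effect, standard_error, (ci_low, ci_high)) for two independent samples."""
    effect = mean(treatment) - mean(control)
    # Standard error of a difference in means: grows with outcome variability,
    # shrinks as either group gets larger.
    se = math.sqrt(stdev(treatment) ** 2 / len(treatment)
                   + stdev(control) ** 2 / len(control))
    # Normal critical value (about 1.96 for a 95% interval).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return effect, se, (effect - z * se, effect + z * se)

# Hypothetical outcome data, for illustration only.
treated = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
control = [10.0, 12.0, 9.0, 11.0, 13.0, 11.0]

effect, se, (low, high) = difference_in_means(treated, control)
print(f"effect={effect:.2f}, se={se:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

With samples this small a t-based critical value would be more appropriate; the normal value is used here only to keep the sketch short.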

Two Types of Errors
First type of error: concluding that there is an effect when in fact there is no effect.
The level of your test (α) is the probability that you will falsely conclude that the program has an effect when in fact it does not. So with a level of 5%, a program with no true effect will be wrongly declared effective in only 5% of samples.
Common choices for α are 5%, 10%, and 1%.
Rule of thumb: if the estimated effect is more than twice its standard error, it is statistically significant at roughly the 5% level.

Two Types of Errors
Second type of error: you fail to reject the hypothesis that the program had no effect when in fact it does have an effect.
The power of a test is the probability of finding a significant effect in the RCT when the program truly has an effect.
Only with a significant effect can you cleanly influence policy.
Power calculations are a tool to see how likely we are to find a significant effect for a given sample size.
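Both error rates can be illustrated by simulating many experiments. The sketch below is an assumption-laden illustration rather than anything from the slides: the function `simulate_power`, the normally distributed outcomes, the effect of 0.5 standard deviations, and the sample sizes are all hypothetical choices. It applies the "twice the standard error" rule using the normal critical value 1.96.

```python
import math
import random
from statistics import mean, stdev

def simulate_power(n_per_arm, true_effect, sd=1.0, n_sims=2000, seed=0):
    """Share of simulated experiments in which |effect / se| exceeds 1.96."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        # Draw one simulated experiment with normally distributed outcomes.
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(true_effect, sd) for _ in range(n_per_arm)]
        effect = mean(treated) - mean(control)
        se = math.sqrt(stdev(treated) ** 2 / n_per_arm
                       + stdev(control) ** 2 / n_per_arm)
        if abs(effect / se) > 1.96:
            significant += 1
    return significant / n_sims

# No true effect: the rejection rate estimates the first type of error.
false_positive_rate = simulate_power(50, true_effect=0.0)
# True effect of 0.5 sd: the rejection rate estimates power.
power_small_sample = simulate_power(20, true_effect=0.5)
power_large_sample = simulate_power(100, true_effect=0.5)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"power with 20 per arm: {power_small_sample:.3f}")
print(f"power with 100 per arm: {power_large_sample:.3f}")
```

When there is no true effect, the rejection rate should sit near the chosen level; when there is one, the rejection rate is the power, and it rises with the sample size.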

What You Need for a Power Calculation
Significance level
- This is conventionally set at 5%.
- With a lower level (less likely to produce a false positive), we need a larger sample to detect the effect.
Power level
- A power level of 80% says: if there is a true effect, you will detect it in 80% of samples of the given size.
- A larger sample gives more power.
The mean and the variability of the outcome in the comparison group
- From previous surveys conducted in similar settings
- The larger the variability, the larger the sample needed for a given power.
The effect size that we want to detect
- What is the smallest effect that should prompt a policy response?
- The smaller the expected effect size, the larger the sample size needed.
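These four ingredients combine in the standard normal-approximation formula for comparing two means with equal-sized arms: n per arm = 2 × ((z for the significance level + z for the power) × sd / effect)². The sketch below assumes individual-level randomization and a two-sided test; the function name `sample_size_per_arm` and the example inputs are illustrative, not taken from the slides.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect `effect` with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the level
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)

# Hypothetical inputs: outcome sd of 1.0, so effects are in standard deviation units.
for effect in (0.2, 0.4, 0.5):
    print(f"effect {effect}: {sample_size_per_arm(effect, sd=1.0)} per arm")

# A stricter level or a higher power target raises the required sample.
print(sample_size_per_arm(0.2, sd=1.0, alpha=0.01))
print(sample_size_per_arm(0.2, sd=1.0, power=0.90))
```

Halving the effect size roughly quadruples the required sample, which is why the choice of effect size dominates the calculation.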

How to Determine Effect Size
What is the smallest effect that would justify adopting the program (in terms of cost-benefit)? This sets the minimum effect size we would want to be able to test for.
Common danger: using an effect size that is too optimistic leads to a sample size that is too small.
How large an effect you can detect with a given sample depends on how variable the outcome is. Example: if all children have very similar diarrhea prevalence without a program, even a very small impact will be easy to detect.
The standardized effect size is the effect size divided by the standard deviation of the outcome.
Common effect sizes are: .20 (small); .40 (medium); .50 (large)
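The same calculation can be turned around to ask how small an effect a given sample can detect. The sketch below uses the normal approximation for two equal-sized arms under individual-level randomization; the function name `minimum_detectable_effect` and the sample sizes are hypothetical. With `sd=1.0` the result is a standardized effect size.

```python
import math
from statistics import NormalDist

def minimum_detectable_effect(n_per_arm, sd=1.0, alpha=0.05, power=0.80):
    """Smallest true effect detectable with the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Standard error of a difference in means with equal arms is sd * sqrt(2 / n).
    return (z_alpha + z_power) * sd * math.sqrt(2 / n_per_arm)

# Hypothetical sample sizes per arm.
for n in (50, 200, 800):
    mde = minimum_detectable_effect(n)
    print(f"n per arm = {n}: minimum detectable effect = {mde:.3f} sd")

# A less variable outcome makes the same raw effect easier to detect.
print(minimum_detectable_effect(200, sd=0.5))
```

If the minimum detectable effect is larger than the smallest effect that would justify the program, the planned sample is too small.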

Design Factors to Take into Account
Availability of a baseline. A baseline can help reduce the needed sample size, since it:
- Removes some variability in the data, increasing precision
- Can be used to stratify and create subgroups
The level of randomization. Whenever treatment is assigned at the group level, this reduces power relative to randomization at the individual level.

Cluster (Group) Randomization
Examples of projects and their level of randomization:
Rural Water Project, Water Guard: individual
Rural Water Project, Spring Improvement: village
Community-based Monitoring in Uganda
HIV/AIDS Education: school-level

Implications from Group Design
The outcomes for all the individuals within a unit may be correlated:
- All villagers are affected by spring improvements at the same time.
- All students at a school with trained teachers may have benefited from the information.
The sample size needs to be adjusted for this correlation. The more correlation within the group, the more we need to adjust the standard errors.

Implications
It is extremely important to randomize an adequate number of groups.
Typically the number of individuals within groups matters less than the number of groups.
Big increases in power usually happen only when the number of randomized groups increases.
If you randomize at the level of the district, with one treated district and one control district, you have 2 observations!
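One common way to quantify this is the design effect, 1 + (m - 1) × ρ, where m is the number of individuals per group and ρ is the within-group (intracluster) correlation. The slides do not give this formula; it is the standard expression for equal-sized groups, and the function names and numbers below are illustrative assumptions.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from group-level randomization with equal-sized groups."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Two hypothetical designs with the same total of 1,000 individuals and icc = 0.1.
few_large_groups = effective_sample_size(n_clusters=20, cluster_size=50, icc=0.1)
many_small_groups = effective_sample_size(n_clusters=100, cluster_size=10, icc=0.1)
print(f"20 groups of 50:  about {few_large_groups:.0f} effective observations")
print(f"100 groups of 10: about {many_small_groups:.0f} effective observations")

# Extreme case: two districts whose residents are perfectly correlated.
print(effective_sample_size(n_clusters=2, cluster_size=5000, icc=1.0))
```

Adding individuals to existing groups raises the design effect along with the sample, while adding groups does not, which is why the number of groups drives power.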

Conclusions
Power calculations involve some guesswork. Sometimes we do not have the right information to conduct them properly.
However, it is important to do them in order to:
- Avoid launching studies that will have no power at all, which is a waste of time and money
- Devote the appropriate resources to the studies that you decide to conduct (and not too much)
- Determine, if you have a fixed budget, whether the project is feasible at all
Software: http://sitemaker.umich.edu/group-based/optimal_design_software