Common Pitfalls in Randomized Evaluations
Jenny C. Aker, Tufts University


What affects the sample size?
– Effect size
– Variance
– Sample size
– Significance level
– Power
– Proportion in treatment

What affects sample size?
– Variance in outcome Y: higher variance, larger sample size
– Effect size: larger effect, smaller sample size
– Significance level and power: more stringent significance or higher power, larger sample size
– Balance between T and C: more balanced allocation, higher power
– Intra-cluster correlation: higher ICC, larger sample size
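These relationships follow from the standard two-sample formula for the number of observations per arm. As an illustrative sketch (not from the slides; the parameter values are hypothetical), the textbook approximation for detecting a difference in means is:

```python
import math
from statistics import NormalDist

def sample_size_per_arm(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sided, two-sample comparison of
    means: n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / effect_size)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * (sd / effect_size) ** 2)

# Larger effects need far smaller samples; higher variance needs larger ones
n_small_effect = sample_size_per_arm(effect_size=0.2, sd=1.0)  # ~393 per arm
n_large_effect = sample_size_per_arm(effect_size=0.5, sd=1.0)  # ~63 per arm
```

Doubling the standard deviation of Y quadruples the required sample, which is why the variance bullet dominates in practice.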

Threats to Analysis of RCTs
– Attrition
– Spillovers
– Partial compliance and sample selection bias (can calculate ITT, TOT/LATE)
– Choice of outcomes
– External validity

Course Overview
– Introduction to ATAI and J-PAL
– Using randomized evaluations to test adoption constraints
– From adoption to impact
– Alternative strategies for randomizing programs
– Power and sample size
– Managing and minimizing threats to analysis
– Common pitfalls
– Randomized evaluation: start-to-finish

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size
4. Failing to Monitor Data Quality
5. Communication and Implementation

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size
4. Failing to Monitor Data Quality
5. Communication and Implementation

What is attrition?
– Attrition refers to the failure to collect outcome data from some individuals who were part of the original sample
– Causes include drop-out, migration, death, illness, or moving
– Attrition can be either random or non-random

Attrition
– Random attrition only reduces a study's statistical power
– Randomization ensures "balance" between the initial treatment and comparison groups, but this balance does not survive non-random attrition
– Attrition that is correlated with the treatment may bias estimates

How should we deal with attrition?
– Do not ignore it or treat it as "missing" data!
– Manage attrition during the data collection process (it is very difficult to solve ex post)
– Report attrition levels in the treatment and comparison groups
– Compare attriters in the two groups using baseline data to determine whether they differ systematically
– Bound the treatment effect
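Reporting and comparing attrition across arms can be sketched with a simple two-proportion z-test. This is a hypothetical illustration (the survey counts below are invented, not from the slides):

```python
from statistics import NormalDist

def attrition_gap(treat_followed, treat_total, control_followed, control_total):
    """Attrition rates in each arm plus a two-proportion z-test for
    whether attrition differs between treatment and control."""
    a_t = 1 - treat_followed / treat_total      # attrition rate, treatment
    a_c = 1 - control_followed / control_total  # attrition rate, control
    lost = (treat_total - treat_followed) + (control_total - control_followed)
    pooled = lost / (treat_total + control_total)
    se = (pooled * (1 - pooled) * (1 / treat_total + 1 / control_total)) ** 0.5
    z = (a_t - a_c) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return a_t, a_c, p

# Hypothetical counts: 450/500 resurveyed in treatment, 400/500 in control
rate_t, rate_c, p_value = attrition_gap(450, 500, 400, 500)
# A small p-value flags differential attrition worth investigating
```

A significant gap like this would prompt the baseline comparison and bounding exercises listed above, not just a footnote.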

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size (imprecise effect)
4. Failing to Monitor Data Quality
5. Communication and Implementation

Dropping non-compliers
– Imperfect compliance:
  o Some farmers in treated villages receive (use) fertilizer (compliers), whereas others who were supposed to receive fertilizer do not (non-compliers)
  o Some farmers in control villages receive fertilizer
– The treatment assignment (allocation) differs from the treatment actually received

Dropping non-compliers
– We can't simply drop the "non-compliers" or include them in the control group (if non-compliance is non-random)
– What can we do?
  o Analyze the program effect by calculating the intention to treat (ITT)
  o Analyze the program effect by calculating the TOT or LATE
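The two estimators above can be sketched in a few lines: the ITT compares mean outcomes by *assignment*, and the LATE is the ITT scaled by the difference in take-up (the Wald estimator). The village data below are hypothetical:

```python
def itt_and_late(assigned, treated, outcome):
    """ITT: difference in mean outcomes by assignment.
    LATE (Wald): ITT divided by the assignment-induced take-up gap."""
    def mean(xs):
        return sum(xs) / len(xs)
    y_assigned = [y for z, y in zip(assigned, outcome) if z == 1]
    y_control = [y for z, y in zip(assigned, outcome) if z == 0]
    d_assigned = [d for z, d in zip(assigned, treated) if z == 1]
    d_control = [d for z, d in zip(assigned, treated) if z == 0]
    itt = mean(y_assigned) - mean(y_control)
    take_up = mean(d_assigned) - mean(d_control)  # the "first stage"
    return itt, itt / take_up

# Hypothetical farmers: assignment, actual fertilizer use, yield
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 0, 0, 0, 0]   # one non-complier in treatment, none in control
y = [12, 11, 13, 8, 8, 9, 7, 8]
itt, late = itt_and_late(z, d, y)  # itt = 3.0, late = 4.0
```

Note that neither estimator drops the non-complier: their low yield stays in the treatment-arm mean, which is exactly what keeps the comparison unbiased.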

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size
4. Failing to Monitor Data Quality
5. Communication and Implementation

Estimating Treatment Effects
– Quality of estimation:
  o "Good estimate" (unbiased, attributable to the program)
  o "Precise estimate" (size of standard errors)
– Good estimate:
  o Unbiased
  o No spillovers
  o High-quality data (no measurement error)
– Precise estimate:
  o How close is the estimate to the truth? (sample size)

So did the treatment have an effect?

Having a (too small) sample size
– The confidence interval (standard error) depends on the sample size, as well as on the variance of Y (the outcome variable) and the variance of X
– Since it can be hard to "manipulate" Y and X, we can mainly control N (the sample size)
– A "too small" sample size means we might fail to detect a statistically significant effect (even if there is one)

Having a (too small) sample size
– Ignoring sample size calculations, or doing only one calculation, can lead to a sample size that is too small to detect an effect
– Calculate the sample size several times, modifying key parameters (s.d. of Y, power, intra-cluster correlation), to see how the required sample size varies
– Choose the sample size to balance power and cost
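Re-running the calculation while varying the intra-cluster correlation, as suggested above, can be sketched with the standard design-effect approximation DEFF = 1 + (m - 1) * ICC for cluster-randomized designs (the parameter values below are hypothetical):

```python
import math
from statistics import NormalDist

def clustered_sample_size(effect_size, sd, icc, cluster_size,
                          alpha=0.05, power=0.80):
    """Individual-level n per arm for a two-sample comparison of means,
    inflated by the design effect DEFF = 1 + (m - 1) * ICC."""
    z = NormalDist().inv_cdf
    n_simple = 2 * (z(1 - alpha / 2) + z(power)) ** 2 * (sd / effect_size) ** 2
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_simple * deff)

# Sensitivity check: the same design under three ICC assumptions
sizes = {rho: clustered_sample_size(0.25, 1.0, rho, cluster_size=20)
         for rho in (0.0, 0.05, 0.20)}
# Required n per arm grows sharply as the assumed ICC rises
```

Even a modest ICC of 0.05 with 20 respondents per village roughly doubles the required sample, which is why budgeting off a single optimistic calculation is risky.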

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size (imprecise effect)
4. Failing to Monitor Data Quality
5. Communication and Implementation

Failing to Monitor Data Quality
– Classical and non-classical measurement error
– Data collection in treatment and control groups

Measurement error
– We correctly recorded what the respondent said, but can we trust what they said?
  o Respondent bias
  o Interviewer bias
  o Question comprehension
– If you were a 16-year-old girl who had had unprotected sex, would you want to tell an older male stranger about it?
– Can you remember what you ate in the past 24 hours?
– Would you know how big (in square feet) your apartment is?

Measurement Error
– Classical measurement error: the error is random
  o One farmer overestimates his yields while another underestimates hers, but the errors are not correlated with farmer characteristics (not systematic)
  o On average, the error is zero
– Non-classical measurement error: the error is non-random
  o Richer farmers report lower yields (to avoid taxes), so reported yields fall as farmers get richer
  o Farmers in control villages report lower yields in order to participate in the program next year

Measurement Error
– With classical measurement error in the dependent variable, the treatment effect is unbiased but the standard errors are less precise
  o This can make it harder to detect a statistically significant effect
– With classical measurement error in the other independent variables (not the treatment indicator), the treatment effect can be biased
– Non-classical measurement error in either can lead to even larger problems
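A small Monte Carlo sketch illustrates the first point: mean-zero noise added to the outcome leaves the estimated treatment effect centered on the truth but widens its sampling spread. All numbers here are simulated, not from the slides:

```python
import random
from statistics import mean, stdev

def simulate_effect_estimates(noise_sd, n=2000, reps=200, seed=0):
    """Monte Carlo: add classical (mean-zero) measurement error to the
    outcome and record the estimated treatment effect each replication."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys_t, ys_c = [], []
        for i in range(n):
            treated = (i % 2 == 0)
            y = (1.0 if treated else 0.0) + rng.gauss(0, 1)  # true effect = 1
            y += rng.gauss(0, noise_sd)  # classical measurement error in Y
            (ys_t if treated else ys_c).append(y)
        estimates.append(mean(ys_t) - mean(ys_c))
    return mean(estimates), stdev(estimates)

clean_mean, clean_sd = simulate_effect_estimates(noise_sd=0.0)
noisy_mean, noisy_sd = simulate_effect_estimates(noise_sd=2.0)
# Both estimates are centered near the true effect of 1, but the noisy
# outcome produces a much larger spread across replications
```

The noisy design therefore needs a larger sample to reach the same power, tying this pitfall back to the sample size discussion.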

How can these be avoided?
– Use simpler questions
– Limit recall periods (if possible):
  o 24 hours for consumption
  o Two weeks for health/nutrition
  o The agricultural season for crop planting and harvests
– Carefully select and train enumerators and supervisors
– Clearly explain the objective of the survey
– Verify independently (if possible, measure yields on-site)
– Consider survey timing (respondent fatigue)

Data Collection in Treatment and Control Groups
– Two scenarios:
  o Program staff collect data in treatment areas while professional enumerators collect data in control areas
  o Yields are estimated from the cooperative's sales records in the treatment group but from a household survey in the control group
– How does this affect our results? We can't tell whether the difference is due to the program or to differences in the data collection process (interviewer or respondent bias)

Data Collection in Treatment and Control Groups
– We cannot ignore differences in data collection across treatment and control groups
– Data should (ideally) be collected by the same enumerators, at the same time (period), and in the same way (using the same methods and tools) in both treatment and control groups

Common Pitfalls
1. Ignoring attrition
2. Dropping non-compliers
3. Having a (too small) sample size (imprecise effect)
4. (Failing to) Monitor Data Quality
5. Communication and Implementation

Communication and Implementation
– Unlike some other evaluation techniques, randomization can change the way a program is designed and implemented (not just evaluated)
– This has implications beyond pure evaluation (baseline and endline):
  o The project design: who, what, where
  o The rollout of the program
  o The evaluation strategy

Communication and Implementation
– This implies greater communication between the program and the evaluator(s):
  o Can randomization be used? If so, how? How does this change the implementation?
  o Did the randomization work?
  o Were there changes in implementation along the way?
  o Was there drop-out or imperfect compliance?
– The evaluator should provide feedback and data in a timely manner