Impact Evaluation Toolbox. Gautam Rao, University of California, Berkeley. Presentation credit: Temina Madon.

Impact Evaluation
1) The "final" outcomes we care about: identify and measure them

Measuring Impacts: INPUTS → OUTPUTS → OUTCOMES → IMPACTS (the difficulty of showing causality increases along the chain)

Measuring Impacts, HIV Awareness Campaign: INPUTS → OUTPUTS (no. of street plays, users targeted at govt. clinics) → OUTCOMES (knowledge about HIV, sexual behavior) → IMPACTS (HIV incidence); the difficulty of showing causality increases along the chain

Impact Evaluation
1) The "final" outcomes we care about: identify and measure them
2) The true "causal" effect of the intervention
– Counterfactual: what would have happened in the absence of the intervention?
– Compare measured outcomes with the counterfactual → causal effect
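
In potential-outcomes notation (notation added here, not in the original slides), the same idea reads:

    Y_i(1): outcome of unit i with the intervention
    Y_i(0): outcome of unit i without the intervention (the counterfactual)
    \tau_i = Y_i(1) - Y_i(0)              (causal effect for unit i)
    \text{ATE} = E[Y_i(1) - Y_i(0)]       (average causal effect)

Only one of Y_i(1) and Y_i(0) is ever observed for any unit, so the missing one has to be estimated from a comparison group; the rest of the toolbox is about how to do that credibly.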

Toolbox for Impact Evaluation
Non- or Quasi-Experimental:
1) Before vs. After
2) With / Without Program
3) Difference-in-Differences
4) Discontinuity Methods
5) Multivariate Regression
6) Instrumental Variables
Experimental Method (Gold Standard):
7) Randomized Evaluation

Naïve Comparisons 1) Before-After the program 2) With-Without the program

An Example: the Millennium Village Project (MVP)
Intensive package intervention to spark development in rural Africa
– 2005 to 2010 (and ongoing)
– Non-randomly selected sites
– Poorer villages targeted

Before vs After

With vs. Without Program: MVP villages vs. comparison villages

With a little more data

Before-After Comparison (malnutrition)
Compare before and after the intervention: A = observed outcome before (2002); A' = assumed outcome with no intervention; B = observed outcome after (2004); B - A = estimated impact.
We observe that villagers provided with training experience an increase in malnutrition from 2002 to 2004.

With-Without Comparison
Now compare villagers in program region A to those in another region B. We find that our "program" villagers have a larger increase in malnutrition than those in region B. Did the program have a negative impact?
– Not necessarily (remember program placement!)
– Region B is closer to the food aid distribution point, so villagers can access food easily (observable)
– Region A is less collectivized: villagers are less likely to share food with their malnourished neighbors (unobservable)

Problems with Simple Comparisons
Before-After problem: many things change over time, not just the project (omitted variables).
With-Without problem: comparing apples with oranges. Why did the enrollees decide to enroll? Selection bias.

Another Example of Selection Bias: Impact of Health Insurance
Can we just compare those with and without health insurance?
– Who purchases insurance?
– What happens if we compare the health of those who sign up with those who don't?

Multivariate Regression
Estimate the change in outcome in the treatment group, controlling for observable characteristics. Requires theorizing about which observable characteristics may affect the outcome of interest besides the program.
Issues:
– How many characteristics can be accounted for? (omitted variable bias)
– Requires a large sample if many factors are to be controlled for
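
A minimal sketch of what this looks like in practice (simulated data; the variable names treated, age, and income are hypothetical, not from the slides):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 1000
    df = pd.DataFrame({
        "age": rng.integers(18, 60, size=n),
        "income": rng.normal(500, 100, size=n),
    })
    # Enrollment depends only on observables here (selection on observables)
    df["treated"] = (df["income"] + rng.normal(0, 50, size=n) < 500).astype(int)
    # Outcome depends on the program and on the characteristics we control for
    df["outcome"] = (2.0 * df["treated"] + 0.05 * df["income"]
                     + 0.1 * df["age"] + rng.normal(0, 1, size=n))

    # Regress the outcome on treatment status plus observable controls
    fit = smf.ols("outcome ~ treated + age + income", data=df).fit()
    print(fit.params["treated"])  # near the true effect of 2.0, but only because
                                  # every confounder in this simulation is observed

If an unobserved variable drove both enrollment and the outcome, this estimate would be biased no matter how many observed controls were added.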

Before-After Comparison, controlling for other factors (malnutrition)
A = observed outcome before (2002); A' = assumed outcome with no intervention; B = observed outcome after (2004); B - A = estimated impact.
Controlling for other factors: C = actual outcome with no intervention (control); B - C = true impact.

To Identify Causal Links, we need to know:
– The change in outcomes for the treatment group
– What would have happened in the absence of the treatment (the "counterfactual")
At baseline, the comparison group must be identical (in observable and unobservable dimensions) to those receiving the program.

Creating the Counterfactual Women aged years

What we can observe (women aged years), screened by: income, education level, ethnicity, marital status, employment

What we can’t observe Women aged years What we can’t observe: risk tolerance entrepreneurialism generosity respect for authority

Selection based on Observables With Program Without Program

What can happen (unobservables) With Program Without Program

Randomization With Program Without Program

Methods & Study Designs
Methods Toolbox: Randomization, Regression Discontinuity, Difference in Differences, Matching, Instrumental Variables
Study Designs: Clustering, Phased Roll-Out, Selective Promotion, Variation in Treatments

Methods Toolbox: Randomization
– Use a lottery to give all people an equal chance of being in the control or treatment group
– With a large enough sample, randomization ensures that all factors/characteristics are equal between the groups, on average
– The only systematic difference between the two groups is the intervention itself
– The "gold standard"
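
A small simulation of the lottery logic (illustrative only; the outcome model and effect size are made up):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    n = 2000
    baseline = rng.normal(50, 10, size=n)            # a pre-program characteristic

    # The lottery: every unit has the same chance of being treated
    treated = rng.permutation(np.repeat([0, 1], n // 2))

    outcome = baseline + 3.0 * treated + rng.normal(0, 5, size=n)

    # Under random assignment a simple difference in means estimates the impact
    effect = outcome[treated == 1].mean() - outcome[treated == 0].mean()
    t_stat, p_val = stats.ttest_ind(outcome[treated == 1], outcome[treated == 0])
    print(round(effect, 2), round(p_val, 4))         # effect close to the true 3.0

    # Balance check: baseline characteristics should look similar in both groups
    print(round(baseline[treated == 1].mean() - baseline[treated == 0].mean(), 2))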

Ensuring Validity
Drawing a random sample from the national population gives EXTERNAL VALIDITY; randomizing treatment within that sample gives INTERNAL VALIDITY.

Ensuring Validity
Internal validity: Have we estimated a causal effect? Have we identified a valid counterfactual?
External validity: Is our estimate of the impact "generalizable"? E.g., will IE results from Kerala generalize to Punjab?

Methods Toolbox: Regression Discontinuity
When you can't randomize, use a transparent and observable criterion to determine who is offered the program.
Examples: age, income eligibility, test scores (scholarships), political borders.
Assumption: program participation changes discontinuously at the threshold, but nothing else that affects the outcome of interest does.
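
A bare-bones version of the estimation (simulated data; the cutoff, bandwidth, and variable names are illustrative choices):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n = 5000
    score = rng.uniform(0, 100, size=n)              # eligibility criterion (e.g. a poverty score)
    cutoff = 50
    offered = (score < cutoff).astype(int)           # program offered below the cutoff
    outcome = 0.02 * score + 1.5 * offered + rng.normal(0, 1, size=n)

    # Local linear regression within a bandwidth around the cutoff,
    # letting the slope differ on each side
    bw = 10
    df = pd.DataFrame({"y": outcome, "d": offered, "r": score - cutoff})
    local = df[df["r"].abs() < bw]
    fit = smf.ols("y ~ d + r + d:r", data=local).fit()
    print(fit.params["d"])                           # estimated jump at the cutoff, near 1.5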

Methods Toolbox: Regression Discontinuity [charts: the outcome plotted against the eligibility criterion for Non-Poor and Poor households on either side of the cutoff]

Methods Toolbox: Regression Discontinuity
Example: South Africa pension program. [Table: share receiving the pension, by age and sex.] What happens to children living in these households?
– Girls living with pension-eligible women vs. almost-eligible: 0.6 s.d. heavier for age
– Boys living with pension-eligible women vs. almost-eligible: 0.3 s.d. heavier for age
– Living with pension-eligible men vs. almost-eligible: no difference

Methods Toolbox: Regression Discontinuity
Limitations: poor generalizability (it only measures the impact for those near the threshold), and it requires a well-enforced eligibility rule.

Methods Toolbox: Difference in Differences
A two-way comparison of observed changes in outcomes (Before-After) for a sample of participants and non-participants (With-Without).

Methods Toolbox: Difference in Differences
Big assumption ("parallel trends"): in the absence of the program, participants and non-participants would have experienced the same changes.
Robustness checks: compare the two groups across several periods prior to the intervention; compare variables unrelated to the program pre/post.
Can be combined with randomization, regression discontinuity, and matching methods.
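
In symbols (notation added here), with T = participants and C = non-participants:

    \hat{\tau}_{DiD} = (\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}) - (\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}})

    equivalently, \tau in the regression
    Y_{it} = \alpha + \beta\,\text{Treat}_i + \gamma\,\text{Post}_t + \tau\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}

The parallel-trends assumption is exactly what lets the control group's change stand in for the treated group's counterfactual change.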

Methods Toolbox: Time Series Analysis
Also called interrupted time series or trend-break analysis. Collect data on a continuous variable across numerous consecutive periods (high frequency) to detect changes over time that are correlated with the intervention.
Challenges: the expected trend must be well defined upfront, and noise often has to be filtered out.
Variant: panel data analysis.
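
One common implementation is segmented regression (a sketch with made-up monthly data; the break point and effect sizes are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    months = np.arange(48)
    post = (months >= 24).astype(int)                 # intervention starts at month 24
    t_since = np.where(post == 1, months - 24, 0)

    # Pre-existing trend, plus a level shift of -5 and a slope change of -0.3 afterwards
    y = 100 + 0.5 * months - 5 * post - 0.3 * t_since + rng.normal(0, 2, size=48)

    df = pd.DataFrame({"y": y, "t": months, "post": post, "t_since": t_since})
    fit = smf.ols("y ~ t + post + t_since", data=df).fit()
    print(fit.params[["post", "t_since"]])            # estimated level shift and slope change
    # Real series are usually serially correlated, so robust (e.g. Newey-West) errors are needed.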

Time Series Analysis Methods Toolbox

Methods Toolbox: Matching
Pair each program participant with one or more non-participants, based on observable characteristics.
Big assumption required: in the absence of the program, matched participants and non-participants would have had the same outcomes (no unobservable traits influence participation).
Combine with difference in differences where possible.

Methods Toolbox: Matching with Propensity Scores
Each unit is assigned a propensity score: e(x_i) = Pr(Z_i = 1 | X_i = x_i), where Z is treatment assignment and x the observed covariates.
Requires treatment to be independent of the potential outcomes conditional on x.
Works best when x is related to both the outcomes and selection (Z).
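
A minimal propensity-score matching sketch (simulated data; the covariates and the one-nearest-neighbour rule are illustrative choices, not from the slides):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 3000
    X = rng.normal(size=(n, 3))                                  # observed covariates
    z = (X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n) > 0).astype(int)  # selection
    y = 2.0 * z + X.sum(axis=1) + rng.normal(size=n)             # true program effect = 2.0

    # Step 1: estimate the propensity score e(x) = Pr(Z = 1 | X = x)
    ps = LogisticRegression().fit(X, z).predict_proba(X)[:, 1]

    # Step 2: match each treated unit to the control with the closest score
    treated = np.where(z == 1)[0]
    controls = np.where(z == 0)[0]
    matches = controls[np.abs(ps[controls][None, :] - ps[treated][:, None]).argmin(axis=1)]

    # Step 3: average the outcome gap within matched pairs
    print((y[treated] - y[matches]).mean())                      # near 2.0 only because
                                                                 # selection is on observables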

Matching Methods Toolbox

Methods Toolbox: Matching
Often needs large data sets for baseline and follow-up: household data, environmental and ecological data (context).
Problems with ex-post matching: more bias with "covariates of convenience" (e.g. age, marital status, race, sex).
Match prospectively if possible!

Methods Toolbox: Instrumental Variables
Example: we want to learn the effect of access to health clinics on maternal mortality.
Find a variable (the "instrument") that affects access to health clinics but does not affect maternal mortality directly, e.g. changes in the distance from home to the nearest clinic.
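
A two-stage sketch of the clinic example (simulated data; the manual two-step is for illustration only, so its standard errors should not be trusted):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 2000
    dist_change = rng.normal(size=n)                  # instrument: change in distance to clinic
    u = rng.normal(size=n)                            # unobserved confounder
    access = 0.5 * dist_change + u + rng.normal(size=n)
    mortality = -1.0 * access + 2.0 * u + rng.normal(size=n)   # true effect of access = -1.0

    # First stage: predict clinic access from the instrument
    first = sm.OLS(access, sm.add_constant(dist_change)).fit()
    access_hat = first.fittedvalues

    # Second stage: regress the outcome on predicted access
    second = sm.OLS(mortality, sm.add_constant(access_hat)).fit()
    print(second.params)    # slope near -1.0; naive OLS of mortality on access is badly biased
    # A manual two-step gets the point estimate right but not the standard errors;
    # use a dedicated 2SLS routine in practice.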

Overview of Study Designs
– Clusters
– Phased Roll-Out: not everyone has access to the intervention at the same time (supply variation)
– Selective Promotion: the program is available to everyone (universal access or already rolled out)
– Variation in Treatments

Study Designs: Cluster vs. Individual Randomization
Cluster when spillovers or practical constraints prevent individual randomization. But it is easier to get a big enough sample if we randomize individuals.

Issues with Clusters
Assignment at a higher level is sometimes necessary:
– Political constraints on differential treatment within a community
– Practical constraints: confusing for one person to implement different versions
– Spillover effects may require higher-level randomization
Many groups are required because of within-community ("intraclass") correlation, and group assignment may also require many observations (measurements) per group.
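
A standard way to quantify the cost of clustering (a textbook formula, not taken from these slides) is the design effect:

    \text{DEFF} = 1 + (m - 1)\,\rho

    m = observations per cluster, \rho = intraclass correlation

A cluster-randomized design needs roughly DEFF times the sample of an individually randomized design for the same precision; for example, with m = 40 and \rho = 0.05, DEFF = 1 + 39(0.05) = 2.95, so roughly three times the sample.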

Example: Deworming
School-based deworming program in Kenya:
– Treatment of individuals affects the untreated (spillovers) by interrupting oral-fecal transmission
– So cluster at the school level instead of the individual level
– Outcome measure: school attendance
We want the cost-effectiveness of the intervention even if not all children are treated.
Findings: 25% decline in absenteeism.

Possible Units for Assignment: individual, household, clinic, hospital, village, women's association, youth group, school

Unit of Intervention vs. Analysis 20 Intervention Villages 20 Comparison Villages

Target Population → (random sample) → Evaluation Sample → (random assignment) → Treatment and Control groups at the unit of intervention/randomization → (random sample again?) → Unit of Analysis

Study Designs: Phased Roll-Out
Use when a simple lottery is impossible (because no one can be excluded from the program) but you have control over timing.

Phased Roll-Out with clusters (illustration): clusters 1, 2, and 3 are phased into the program in successive periods (A through D); until a cluster is phased in, it serves as the comparison group.

Study Designs: Randomized Encouragement
Randomize who gets an extra push to participate in the program. The population includes people who already have the program, people unlikely to join regardless, and people likely to join if pushed.
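
Encouragement designs are usually analysed with the push as an instrument for take-up (a standard result, not spelled out in the slides):

    \hat{\tau}_{\text{compliers}} = \frac{\bar{Y}_{\text{encouraged}} - \bar{Y}_{\text{not encouraged}}}{\bar{D}_{\text{encouraged}} - \bar{D}_{\text{not encouraged}}}

    where D indicates program take-up

The estimate applies to those "likely to join if pushed" (the compliers); it says nothing about people who already have the program or who would never join.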

Evaluation Toolbox Variation in Treatment

Advantages of "experiments"
Clear and precise causal impact. Relative to other methods:
– Much easier to analyze
– Can be cheaper (smaller sample sizes)
– Easier to explain
– More convincing to policymakers
– Methodologically uncontroversial

When to think about impact evaluation? EARLY!
Plan evaluation into your program design and roll-out phases.
When you introduce a CHANGE into the program:
– New targeting strategy
– Variation on a theme
– Creating an eligibility criterion for the program
Scale-up is a good time for impact evaluation: the sample size is larger.

When is prospective impact evaluation impossible?
– Treatment was selectively assigned and announced, with no possibility for expansion
– The program is over (retrospective)
– Take-up is already universal
– The program is national and non-excludable, e.g. freedom of the press or exchange rate policy (though sometimes some components can be randomized)
– The sample size is too small to make it worthwhile