Impact Evaluation Methods: Difference in difference & Matching

Africa Program for Education Impact Evaluation
David Evans, Impact Evaluation Cluster, AFTRL
Slides by Paul J. Gertler & Sebastian Martinez
AFRICA IMPACT EVALUATION INITIATIVE, AFTRL

Measuring Impact
- Randomized Experiments
- Quasi-experiments
  - Randomized Promotion – Instrumental Variables
  - Regression Discontinuity
  - Double differences (Diff in diff)
  - Matching

Case 5: Diff in diff
- Compare the change in outcomes between the treatment and non-treatment groups
- Impact is the difference in the change in outcomes

Impact = (Yt1 - Yt0) - (Yc1 - Yc0)
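The formula can be sketched in a few lines of Python. This is only an illustration: the function name `diff_in_diff` and the four group means are hypothetical and do not come from the slides.

```python
def diff_in_diff(y_t0, y_t1, y_c0, y_c1):
    """Impact = (Yt1 - Yt0) - (Yc1 - Yc0).

    y_t0, y_t1: mean outcome of the treatment group before and after.
    y_c0, y_c1: mean outcome of the control group before and after.
    """
    change_treated = y_t1 - y_t0
    change_control = y_c1 - y_c0
    return change_treated - change_control


# Hypothetical group means, chosen only to illustrate the arithmetic.
impact = diff_in_diff(y_t0=50.0, y_t1=70.0, y_c0=40.0, y_c1=45.0)
print(impact)  # (70 - 50) - (45 - 40) = 15.0
```

Note that the groups start at different levels (50 vs 40); only the difference in their changes enters the estimate.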

[Figure: outcome plotted against time for the treatment group and the control group, with the time of treatment and the average treatment effect marked.]

[Figure: outcome plotted against time for the treatment group and the control group, showing the measured effect without pre-measurement.]

[Figure: outcome plotted against time for the treatment group and the control group, contrasting the estimated average treatment effect with the average treatment effect.]

Diff in diff
What is the key difference between these two cases?
- Fundamental assumption: trends (slopes) are the same in the treatment and control groups (sometimes true, sometimes not)
- Need a minimum of three points in time, two of them pre-intervention, to check this assumption and estimate the treatment effect
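A minimal sketch of why three time points are needed, assuming two pre-intervention observations and one post-intervention observation per group. The function names and the outcome values are hypothetical. The first two observations are used to compare pre-intervention trends; the last two give the diff-in-diff estimate.

```python
def pre_trend_gap(treated, control):
    """Difference in pre-intervention changes between the groups.

    Each argument is [first pre, second pre, post] mean outcomes.
    A value near zero is consistent with equal pre-intervention trends.
    """
    return (treated[1] - treated[0]) - (control[1] - control[0])


def did_estimate(treated, control):
    """Diff in diff using the last pre-intervention and the post observation."""
    return (treated[2] - treated[1]) - (control[2] - control[1])


# Hypothetical mean outcomes at three points in time.
treated = [10.0, 14.0, 25.0]
control = [6.0, 10.0, 14.0]

gap = pre_trend_gap(treated, control)    # (14 - 10) - (10 - 6) = 0.0
effect = did_estimate(treated, control)  # (25 - 14) - (14 - 10) = 7.0
print(gap, effect)
```

Equal pre-intervention trends support the assumption but do not prove that the trends would have stayed equal after the intervention.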

[Figure: outcome plotted against time for the treatment group and the control group, with the first, second and third observations marked for each group, along with the time of treatment and the average treatment effect.]

Examples
Two neighboring school districts
- School enrollment or test scores are improving at the same rate before the program (even if at different levels)
- One receives the program, one does not
Neighboring _______

Case 5: Diff in Diff

                    Not Enrolled   Enrolled   t-stat
Mean change CPC     8.26           35.92      10.31

                            Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC     27.66**             25.53**
                            (2.68)              (2.77)

** Significant at 1% level
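The linear-regression estimate can be checked against the mean changes reported on this slide. The sketch below uses only numbers from the slide; treating the figure in parentheses as a standard error is an assumption, since the slide does not label it.

```python
# Mean change in CPC, taken from the Case 5 slide.
mean_change_not_enrolled = 8.26
mean_change_enrolled = 35.92

# Diff in diff: difference between the two mean changes.
impact = round(mean_change_enrolled - mean_change_not_enrolled, 2)
print(impact)  # 27.66, the linear regression estimate on the slide

# Assumption: (2.68) is the standard error of the 27.66 estimate.
ratio = round(27.66 / 2.68, 2)
print(ratio)  # close to the t-stat of 10.31 reported on the slide
```

The small gap between this ratio and the reported t-stat is plausibly due to rounding in the published figures.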

Impact Evaluation Example – Summary of Results

Case                                 Estimated Impact on CPC
Case 1 - Before and After            34.28**  (2.11)
Case 2 - Enrolled/Not Enrolled       -4.15    (4.05)
Case 3 - Randomization               29.79**  (3.00)
Case 4 - Regression Discontinuity    30.58**  (5.93)
Case 5 - Diff in Diff                25.53**  (2.77)

Estimator labels shown on the slide: Multivariate Linear
** Significant at 1% level

Example
Old-age pensions and schooling in South Africa
- Eligible if a household member is over 60
- Not eligible if under 60
- Used households with a member aged 55-60
- Pensions for women and girls’ education

Matching
- Pick the ideal comparison group that matches the treatment group from a larger survey.
- The matches are selected on the basis of similarities in observed characteristics. For example? Example: income
- This assumes no selection bias based on unobserved characteristics. Example: entrepreneurship
Source: Martin Ravallion

Propensity-Score Matching (PSM)
- Controls: non-participants with the same characteristics as participants
- In practice, this is very hard: the entire vector X of observed characteristics could be huge
- Match on the basis of the propensity score: P(Xi) = Pr(participation_i = 1 | X)
- Instead of aiming to ensure that the matched control for each participant has exactly the same value of X, the same result can be achieved by matching on the probability of participation
- This assumes that participation is independent of outcomes given X (not true if important unobserved outcomes are affecting participation)

Steps in Score Matching
1. Obtain a representative and highly comparable survey of non-participants and participants.
2. Pool the two samples and estimate a logit (or probit) model of program participation. This gives the probability of participating for a person with characteristics X.
3. Restrict the samples to assure common support (lack of common support is an important source of bias in observational studies).
4. For each participant, find a sample of non-participants that have similar propensity scores.
5. Compare the outcome indicators. The difference is the estimate of the gain due to the program for that observation.
6. Calculate the mean of these individual gains to obtain the average overall gain.
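The restriction to common support, the matching and the averaging of gains can be sketched as follows. This is a simplified illustration, not the procedure used for the slides: the propensity scores are taken as given (the logit or probit estimation is skipped), matching is nearest-neighbor with replacement, and all function names and data are hypothetical.

```python
def common_support(participants, nonparticipants):
    """Keep only units whose propensity score lies in the overlap region.

    Each unit is a (propensity_score, outcome) pair.
    """
    lo = max(min(p for p, _ in participants),
             min(p for p, _ in nonparticipants))
    hi = min(max(p for p, _ in participants),
             max(p for p, _ in nonparticipants))

    def keep(units):
        return [(p, y) for p, y in units if lo <= p <= hi]

    return keep(participants), keep(nonparticipants)


def average_gain(participants, nonparticipants, k=1):
    """Mean over participants of (own outcome - mean outcome of k nearest matches)."""
    gains = []
    for p, y in participants:
        nearest = sorted(nonparticipants, key=lambda u: abs(u[0] - p))[:k]
        counterfactual = sum(u[1] for u in nearest) / len(nearest)
        gains.append(y - counterfactual)
    return sum(gains) / len(gains)


# Hypothetical (propensity score, outcome) pairs.
participants = [(0.30, 12.0), (0.55, 15.0), (0.80, 19.0), (0.95, 30.0)]
nonparticipants = [(0.05, 5.0), (0.32, 9.0), (0.50, 11.0), (0.82, 14.0)]

p_cs, n_cs = common_support(participants, nonparticipants)
gain = average_gain(p_cs, n_cs)
print(len(p_cs), len(n_cs), gain)  # 3 3 4.0
```

In this example the participant with score 0.95 and the non-participant with score 0.05 fall outside the overlap region and are dropped before matching.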

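[Figure: density of propensity scores for participants, with the region of common support marked. The horizontal axis is the propensity score, running up to 1; high values indicate a high probability of participating given X.]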

PSM vs an experiment
- A pure experiment does not require the untestable assumption of independence conditional on observables
- PSM requires large samples and good data

Lessons on Matching Methods
- Typically used for IE when neither randomization, RD nor other quasi-experimental options are possible (e.g. no baseline)
- Be cautious of ex-post matching: matching on variables that change due to participation (i.e., endogenous variables). What are some variables that won’t change?
- Matching helps control for OBSERVABLE differences

More Lessons on Matching Methods
Matching at baseline can be very useful:
- Estimation: combine with other techniques (e.g. diff in diff); know the assignment rule (match on this rule)
- Sampling: selecting a non-randomized control sample
Need good quality data
- Common support can be a problem

Case 7: Matching

Case 7 - PROPENSITY SCORE: Pr(treatment=1)

Variable       Coef.    Std. Err.
Age Head       -0.03    0.00
Educ Head      -0.05    0.01
Age Spouse     -0.02    [gap]
Educ Spouse    -0.06    [gap]
Ethnicity       0.42    0.04
Female Head    -0.23    0.07
Constant        1.6     0.10

P-score Quintiles

Xi             Quintile   T        C        t-score
Age Head       1          68.04    67.45    -1.2
Age Head       2          53.61    53.38    -0.51
Age Head       3          44.16    44.68    1.34
Age Head       4          37.67    38.2     1.72
Age Head       5          32.48    32.14    -1.18
Educ Head      1          1.54     1.97     3.13
Educ Head      2          2.39     2.69     1.67
Educ Head      3          3.25     3.26     -0.04
Educ Head      4          3.53     3.43     -0.98
Educ Head      5          2.98     3.12     1.96
Age Spouse     1          55.95    55.05    -1.43
Age Spouse     2          46.5     46.41    0.66
Age Spouse     3          39.54    40.01    1.86
Age Spouse     4          34.2     34.8     1.84
Age Spouse     5          29.6     29.19    -1.44
Educ Spouse    1          1.89     2.19     2.47
Educ Spouse    2          2.61     2.64     0.31
Educ Spouse    3          3.17     3.19     0.23
Educ Spouse    4          3.34     [gap]    -0.78
Educ Spouse    5          2.37     2.72     1.99
Ethnicity      1          0.16     0.11     -2.81
Ethnicity      2          0.24     0.27     -1.73
Ethnicity      3          0.3      0.32     1.04
Ethnicity      4          0.14     0.13     -0.11
Ethnicity      5          0.7      [gap]    -2.3
Female Head    1          0.19     0.21     0.92
Female Head    2          0.42     [gap]    -1.4
Female Head    3          0.092    0.088    -0.35
Female Head    4          0.35     [gap]    -0.34
Female Head    5          0.008    [gap]    0.83

[gap] marks a value missing from the transcript. In rows with a gap, the surviving value may belong to either T or C.
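To show how the coefficients above turn into a propensity score, here is a minimal sketch. It assumes a logit link (the slide does not say whether Case 7 uses logit or probit), it assumes that Ethnicity and Female Head are coded 0/1, and the household is hypothetical.

```python
import math

# Coefficients from the Case 7 propensity-score table.
COEFS = {
    "age_head": -0.03,
    "educ_head": -0.05,
    "age_spouse": -0.02,
    "educ_spouse": -0.06,
    "ethnicity": 0.42,
    "female_head": -0.23,
}
CONSTANT = 1.6


def propensity_score(x):
    """Pr(treatment=1 | X) under an assumed logit link."""
    index = CONSTANT + sum(COEFS[name] * x[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-index))


# Hypothetical household; the variable coding is an assumption.
household = {
    "age_head": 40,
    "educ_head": 3,
    "age_spouse": 35,
    "educ_spouse": 3,
    "ethnicity": 1,
    "female_head": 0,
}

score = propensity_score(household)
print(round(score, 2))  # about 0.45
```

Households can then be grouped into quintiles of this score, and covariate means compared between T and C within each quintile, as in the table above.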

Case 7: Matching

                            Linear Regression   Multivariate Linear Regression
Estimated Impact on CPC     1.16                7.06+
                            (3.59)              (3.65)

** Significant at 1% level, + Significant at 10% level

Impact Evaluation Example – Summary of Results

Case                                 Estimated Impact on CPC
Case 1 - Before and After            34.28**  (2.11)
Case 2 - Enrolled/Not Enrolled       -4.15    (4.05)
Case 3 - Randomization               29.79**  (3.00)
Case 4 - Regression Discontinuity    30.58**  (5.93)
Case 5 - Diff in Diff                25.53**  (2.77)
Case 6 - IV (TOT)                    30.44**  (3.07)
Case 7 - Matching                    7.06+    (3.65)

Estimator labels shown on the slide: Multivariate Linear; 2SLS
** Significant at 1% level, + Significant at 10% level

Measuring Impact
- Experimental design/randomization
- Quasi-experiments
  - Regression Discontinuity
  - Double differences (Diff in diff)
- Other options
  - Instrumental Variables
  - Matching
- Combinations of the above

Remember...
- The objective of impact evaluation is to estimate the CAUSAL effect of a program on outcomes of interest
- In designing the program we must understand the data generation process: the behavioral process that generates the data, and how benefits are assigned
- Fit the best evaluation design to the operational context

Design: Randomization
  When to use: Whenever possible; when an intervention will not be universally implemented
  Advantages: Gold standard; most powerful
  Disadvantages: Not always feasible; not always ethical

Design: Random Promotion
  When to use: When an intervention is universally implemented
  Advantages: Learn and intervention
  Disadvantages: Only looks at a sub-group of the sample

Design: Regression Discontinuity
  When to use: If an intervention is assigned based on rank
  Advantages: Assignment based on rank is common
  Disadvantages: Only looks at a sub-group of the sample

Design: Double differences
  When to use: If two groups are growing at similar rates
  Advantages: Eliminates fixed differences not related to treatment
  Disadvantages: Can be biased if trends change

Design: Matching
  When to use: When other methods are not possible
  Advantages: Overcomes observed differences between treatment and comparison
  Disadvantages: Assumes no unobserved differences (often implausible)