Quasi-Experimental Methods

Quasi-Experimental Methods
Jean-Louis Arcand, The Graduate Institute | Geneva
jean-louis.arcand@graduateinstitute.ch
This presentation draws on previous presentations by Markus Goldstein, Leandre Bassole, and Alberto Martini.

Objective
Find a plausible counterfactual.
Every method is associated with an assumption.
The stronger the assumption, the more we need to worry about the causal effect.
Question your assumptions (reality check).

Program to evaluate
Hopetown HIV/AIDS Program (2008-2012)
Objective: Reduce HIV transmission
Intervention: Peer education
Target group: Youth 15-24
Indicator: Pregnancy rate (proxy for unprotected sex)

I. Before-after identification strategy (aka reflexive comparison)
Counterfactual: Rate of pregnancy observed before the program started
EFFECT = After minus Before

Number of areas: 70

Year         Teen pregnancy rate (per 1000)
2008         62.90
2012         66.37
Difference   +3.47

Counterfactual assumption: no change over time
Effect = +3.47
Question: What else might have happened in 2008-2012 to affect teen pregnancy?

Examine assumption with prior data
Number of areas: 70

Year   Teen pregnancy (per 1000)
2004   54.96
2008   62.90
2012   66.37

Assumption of no change over time looks a bit shaky.
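The before-after logic can be sketched in a few lines of Python. This is only an illustration using the three rates quoted on the slides; the function name `before_after` is an assumption made for this sketch. It applies the same "After minus Before" rule to the program period and to the period before the program.

```python
# Teen pregnancy rates (per 1000) in the 70 participating areas, from the slides.
rates = {2004: 54.96, 2008: 62.90, 2012: 66.37}

def before_after(rates, before, after):
    """Reflexive comparison: outcome after minus outcome before."""
    return round(rates[after] - rates[before], 2)

# Estimated "effect" over the program period.
effect = before_after(rates, 2008, 2012)

# Same calculation over the pre-program period, when there was no program.
pre_program_change = before_after(rates, 2004, 2008)

print(effect, pre_program_change)
```

If the no-change assumption held, the pre-program change would be close to zero. Here it is larger than the estimated effect itself, which is why the assumption looks shaky.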

II. Non-participant identification strategy
Counterfactual: Rate of pregnancy among non-participants

Teen pregnancy rate (per 1000) in 2012
Participants       66.37
Non-participants   57.50
Difference         +8.87

Counterfactual assumption: Without the intervention, participants would have the same pregnancy rate as non-participants
Effect = +8.87
Question: How might participants differ from non-participants?

Test assumption with pre-program data
REJECT the counterfactual hypothesis of same pregnancy rates.
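As a rough sketch of this check, the participant/non-participant gap can be computed for 2012 and again for 2008, before the program existed. The 2008 non-participant rate (46.37) is taken from the difference-in-difference table later in the deck; the dictionary and function names are assumptions made for this illustration.

```python
# Teen pregnancy rates (per 1000) quoted on the slides.
rates_2012 = {"participants": 66.37, "non_participants": 57.50}
rates_2008 = {"participants": 62.90, "non_participants": 46.37}

def group_gap(rates):
    """Participants' rate minus non-participants' rate in a single year."""
    return round(rates["participants"] - rates["non_participants"], 2)

# Naive effect estimate: compare the two groups after the program.
naive_effect = group_gap(rates_2012)

# Same comparison before the program, when the true effect must be zero.
pre_program_gap = group_gap(rates_2008)

print(naive_effect, pre_program_gap)
```

A pre-program gap this far from zero means the two groups already differed before any intervention, so the simple comparison cannot be read as a program effect.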

III. Difference-in-Difference identification strategy
Counterfactual:
1. Nonparticipant rate of pregnancy, purging pre-program differences between participants and nonparticipants
2. “Before” rate of pregnancy, purging the before-after change for nonparticipants
1 and 2 are equivalent

Average rate of teen pregnancy
                        2008    2012    Difference (2008-2012)
Participants (P)        62.90   66.37   3.47
Non-participants (NP)   46.37   57.50   11.13
Difference (P-NP)       16.53   8.87    -7.66

Participants: 66.37 - 62.90 = 3.47
Non-participants: 57.50 - 46.37 = 11.13
Effect = 3.47 - 11.13 = -7.66

After: 66.37 - 57.50 = 8.87
Before: 62.90 - 46.37 = 16.53
Effect = 8.87 - 16.53 = -7.66
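The equivalence of the two calculations can be checked with a short Python sketch. It uses only the four rates from the slides; the function names are assumptions for this illustration. The last helper computes the rate participants would have reached in 2012 had they followed the non-participants' trend.

```python
# Average teen pregnancy rates (per 1000) from the slides.
rates = {
    ("P", 2008): 62.90, ("P", 2012): 66.37,    # participants
    ("NP", 2008): 46.37, ("NP", 2012): 57.50,  # non-participants
}

def did_over_time(r, before, after):
    """Change for participants minus change for non-participants."""
    return (r["P", after] - r["P", before]) - (r["NP", after] - r["NP", before])

def did_across_groups(r, before, after):
    """P-NP gap after the program minus P-NP gap before it."""
    return (r["P", after] - r["NP", after]) - (r["P", before] - r["NP", before])

def counterfactual(r, before, after):
    """Participants' rate if they had followed the non-participants' trend."""
    return r["P", before] + (r["NP", after] - r["NP", before])

effect_1 = round(did_over_time(rates, 2008, 2012), 2)
effect_2 = round(did_across_groups(rates, 2008, 2012), 2)
implied_rate = round(counterfactual(rates, 2008, 2012), 2)

print(effect_1, effect_2, implied_rate)
```

Both routes rearrange the same four numbers, so they always agree. The estimated effect is also the observed 2012 participant rate minus the implied counterfactual rate.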

Counterfactual assumption: Without the intervention, participants’ and nonparticipants’ pregnancy rates follow the same trends

[Figure: pregnancy-rate trends for participants and nonparticipants. Chart labels: 74.0, 16.5 and -7.6]

Questioning the assumption
Why might participants’ trends differ from those of nonparticipants?

Examine assumption with pre-program data

Average rate of teen pregnancy
                        2004    2008    Difference (2004-2008)
Participants (P)        54.96   62.90   7.94
Non-participants (NP)   39.96   46.37   6.41
Difference (P-NP)       15.00   16.53   +1.53

The counterfactual hypothesis of same trends doesn’t look so believable.
Or examine it with other outcomes not affected by the intervention, e.g. household consumption.
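One way to read this table is as a placebo difference-in-difference: the same estimator run on 2004-2008, when no program existed. The sketch below uses the slide's numbers; the function name `did` is an assumption for this illustration.

```python
# Average teen pregnancy rates (per 1000) before the program, from the slides.
rates = {
    ("P", 2004): 54.96, ("P", 2008): 62.90,    # participants
    ("NP", 2004): 39.96, ("NP", 2008): 46.37,  # non-participants
}

def did(r, before, after):
    """Change for participants minus change for non-participants."""
    return (r["P", after] - r["P", before]) - (r["NP", after] - r["NP", before])

# Placebo estimate: no program ran in 2004-2008, so this should be near zero
# if the two groups were on the same trend.
placebo_effect = round(did(rates, 2004, 2008), 2)

print(placebo_effect)
```

This sketch only reproduces the point estimate. Judging whether the placebo differs from zero by more than chance would need standard errors, which the slides do not report.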

IV. Matching with Difference-in-Difference identification strategy
Counterfactual: The comparison group is constructed by pairing each program participant with a “similar” nonparticipant using a larger dataset, creating a control group from non-participants who are similar in observable ways
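A minimal sketch of the pairing step, assuming nearest-neighbour matching (with replacement) on two observable covariates. The slides do not say which matching rule or covariates were used, and every record below is hypothetical, invented only to make the example run.

```python
# Hypothetical records: (observable covariates, outcome before, outcome after).
participants = [
    ((0.80, 0.30), 60.0, 64.0),
    ((0.50, 0.60), 65.0, 68.0),
]
nonparticipant_pool = [
    ((0.79, 0.31), 59.0, 70.0),
    ((0.10, 0.90), 40.0, 50.0),
    ((0.52, 0.58), 66.0, 77.0),
    ((0.20, 0.20), 45.0, 52.0),
]

def distance(x, y):
    """Euclidean distance between two covariate vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def nearest(x, pool):
    """Nonparticipant whose observables are closest to x."""
    return min(pool, key=lambda record: distance(x, record[0]))

def matched_did(participants, pool):
    """Average over pairs of (participant change - matched nonparticipant change)."""
    gaps = []
    for covariates, before, after in participants:
        _, match_before, match_after = nearest(covariates, pool)
        gaps.append((after - before) - (match_after - match_before))
    return sum(gaps) / len(gaps)

estimate = matched_did(participants, nonparticipant_pool)
print(estimate)
```

The match uses only what is recorded in the covariate vector. Anything that drives both participation and outcomes but is missing from that vector is ignored.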

Counterfactual assumption: Unobserved characteristics do not affect outcomes of interest
Unobserved = things we cannot measure (e.g. ability) or things we left out of the dataset
Question: How might participants differ from matched nonparticipants?

[Figure: Participant 66.37; Matched nonparticipant 73.36; Effect = -7.01]

Can only test assumption with experimental data
Studies that compare both methods (because they have experimental data) find that:
unobservables often matter!
direction of bias is unpredictable!
Apply with care: think very hard about unobservables

V. Regression discontinuity identification strategy
Applicability: When strict quantitative criteria determine eligibility
Counterfactual: Nonparticipants just below the eligibility cutoff are the comparison for participants just above the eligibility cutoff

Counterfactual assumption: Nonparticipants just below the eligibility cutoff are the same (in observable and unobservable ways) as participants just above the eligibility cutoff
Question: Is the distribution around the cutoff smooth? If so, the assumption might be reasonable
Question: Are unobservables likely to be important (e.g. correlated with cutoff criteria)? If so, the assumption might not be reasonable
However, we can only estimate impact around the cutoff, not for the whole program

Example: Effect of school inputs on test scores
Target transfer to poorest schools
Construct poverty index from 1 to 100
Schools with a score <=50 are in
Schools with a score >50 are out
Inputs transfer to poor schools
Measure outcomes (i.e. test scores) before and after transfer
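A sketch of how the comparison at the cutoff could be computed, assuming a local linear fit on each side of the threshold. The slides do not specify an estimator, so this is one common choice rather than the presenter's method. The cutoff of 50 comes from the example; the outcome data, the bandwidth of 10 and the built-in jump of 5 points are hypothetical.

```python
CUTOFF = 50  # from the example: schools with an index <= 50 receive the transfer

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rd_estimate(scores, outcomes, bandwidth):
    """Gap at the cutoff between lines fitted to treated and untreated schools."""
    treated = [(s, y) for s, y in zip(scores, outcomes)
               if CUTOFF - bandwidth < s <= CUTOFF]
    untreated = [(s, y) for s, y in zip(scores, outcomes)
                 if CUTOFF < s <= CUTOFF + bandwidth]
    a_t, b_t = fit_line(*zip(*treated))
    a_u, b_u = fit_line(*zip(*untreated))
    return (a_t + b_t * CUTOFF) - (a_u + b_u * CUTOFF)

# Hypothetical, noise-free data: outcomes rise with the index and
# treated schools get a jump of 5 points.
scores = list(range(1, 101))
outcomes = [40 + 0.2 * s + (5 if s <= CUTOFF else 0) for s in scores]

estimate = rd_estimate(scores, outcomes, bandwidth=10)
print(round(estimate, 2))
```

Only schools within the bandwidth enter the estimate, which is the code counterpart of estimating impact around the cutoff rather than for the whole program.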

[Figure: Poor and Non-Poor schools on either side of the cutoff; the gap at the cutoff is labelled Treatment Effect]

Applying RDD in practice: Lessons from an HIV-nutrition program
Lesson 1: Criteria not applied well
Multiple criteria: hh size, income level, months on ART
Nutritionist helps her friends fill out the form with the “right” answers
Now unobservables separate treatment from control…
Lesson 2: Watch out for criteria that can be altered (e.g. land holding size)

Summary
Gold standard is randomization: minimal assumptions needed, intuitive estimates
Nonexperimental requires assumptions: can you defend them?

Different assumptions will give you different results
The program: ART treatment for adult patients
Impact of interest: effect of ART on children of patients (are there spillover & intergenerational effects of treatment?)
- Child education (attendance)
- Child nutrition
Data:
- 250 patient HHs
- 500 random sample HHs
- Before & after treatment
Can’t randomize ART, so what is the counterfactual?

Possible counterfactual candidates
- Random sample difference in difference. Are they on the same trajectory?
- Orphans (parents died: what would have happened in absence of treatment). But when did they die, which orphans do you observe, which do you not observe?
- Parents self-report moderate to high risk of HIV. Self report!
- Propensity score matching. Unobservables (so why do people get HIV?)

Estimates of treatment effects using alternative comparison groups
Effects are now very large and significant for all kids, particularly in newly treated households.
Effects represent a 30-50% increase relative to orphans and high/mod risk kids within the first 100 days of treatment (base = 29 hrs).
Boys continue to experience large increases after 100 days.
SEGUE: We can also estimate average treatment on the treated using the propensity score approach.
Compare to around 6.4 if we use the simple difference in difference using the random sample.
Standard errors clustered at the household level in each round. Includes child fixed effects, round 2 indicator and month-of-interview indicators.

Estimating ATT using propensity score matching
Allows us to define the comparison group using more than one characteristic of children and their households.
Propensity scores are defined at the household level, with the most significant variables being single-headed household and HIV risk.
As such, we can view the propensity score approach as a hybrid of the prior comparisons.
SEGUE: First present the probit results for the propensity score regression.

Probit regression results
Dependent variable: household has adult ARV recipient
Interesting to note that neither wealth nor travel time were strong predictors of treatment.
Use these coefficients to calculate propensity scores for everyone in our sample.
SEGUE: Turning to results

ATT using propensity score matching
Nearest neighbor is significant at less than the 5% level, and kernel nearly so.
Two things to note:
- really power constrained to do matching
  - do not break into newly and veteran treated due to power issues
- relative to other comparisons, number quite similar
  - 7.94 hours when compared to orphans
  - 7.85 hours when compared to high/mod risk households
SEGUE: Now ready to turn attention to nutrition
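The two steps described here (turn probit coefficients into propensity scores, then match each treated household to its nearest neighbour on that score) might be sketched as follows. The coefficient values, household records and outcome figures are all hypothetical, since the slides do not report them. Only the two predictor names come from the presentation.

```python
from statistics import NormalDist

# Hypothetical probit coefficients; the actual estimates are not in the slides.
COEFS = {"intercept": -1.0, "single_headed": 0.8, "hiv_risk": 0.6}

def propensity(household):
    """Probit propensity score: standard normal CDF of the linear index."""
    index = (COEFS["intercept"]
             + COEFS["single_headed"] * household["single_headed"]
             + COEFS["hiv_risk"] * household["hiv_risk"])
    return NormalDist().cdf(index)

def att_nearest_neighbor(treated, controls):
    """Mean outcome gap between each treated unit and its closest control."""
    gaps = []
    for unit in treated:
        match = min(controls,
                    key=lambda c: abs(propensity(c) - propensity(unit)))
        gaps.append(unit["outcome"] - match["outcome"])
    return sum(gaps) / len(gaps)

# Hypothetical households; "outcome" stands in for weekly hours of schooling.
treated = [
    {"single_headed": 1, "hiv_risk": 1, "outcome": 36.0},
    {"single_headed": 0, "hiv_risk": 1, "outcome": 34.0},
]
controls = [
    {"single_headed": 1, "hiv_risk": 1, "outcome": 29.0},
    {"single_headed": 0, "hiv_risk": 1, "outcome": 27.0},
    {"single_headed": 0, "hiv_risk": 0, "outcome": 30.0},
]

att = att_nearest_neighbor(treated, controls)
print(att)
```

Matching is done with replacement and on the score alone. A kernel estimator, also mentioned on the slide, would instead weight all controls by how close their scores are.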

Nutritional impacts of ARV treatment
Very large increase in BMI z-scores of 0.57 standard deviations.
Reassuring that we do not see significant changes in height-for-age, which is a longer-term measure of nutrition.
Also see a significant 11% decrease in wasting; does not kick in until after the parent has been treated for a while.
NOTE: Interviewer fixed effects are included because measuring height is tricky and may depend on the skill and experience of the interviewer.
SEGUE: What about alternative comparison groups?
Includes child fixed effects, age controls, round 2 indicator, interviewer fixed effects, and month-of-interview indicators.

Nutrition with alternative comparison groups
Comparison to orphans insignificant, but only 7 orphans under 5 in random sample.
Comparison to high/mod risk hhs reveals larger nutritional impacts: increase in z-score of 0.768 standard deviations.
Context: Duflo’s (2003) work on pensions in SA found 1.19 improvement in z-score.
NOTE: Not enough power to do propensity score approach here.
SEGUE: OK. So what is the take away?
Includes child fixed effects, age controls, round 2 indicator, interviewer fixed effects, and month-of-interview indicators.

Summary: choosing among non-experimental methods
At the end of the day, they can give us quite different estimates (or not, in some rare cases)
Which assumption can we live with?

Thank You