Presentation on theme: "An Overview Lori Beaman, PhD RWJF Scholar in Health Policy UC Berkeley"— Presentation transcript:
1 Impact Evaluation: An Overview. Lori Beaman, PhD, RWJF Scholar in Health Policy, UC Berkeley
2 What is Impact Evaluation? IE assesses how a program affects the well-being or welfare of individuals, households or communities (or businesses). Well-being at the individual level can be captured by income and consumption, health outcomes, or ideally both. At the community level, poverty levels or growth rates may be appropriate, depending on the question.
3 Outline. Advantages of Impact Evaluation. Challenges for IE: Need for Comparison Groups. Methods for Constructing Comparison
4 IE Versus other M&E Tools. The key distinction between impact evaluation and other M&E tools is the focus on discerning the impact of the program from all other confounding effects. IE seeks to provide evidence of the causal link between an intervention and outcomes.
5 Monitoring and IE. INPUTS: Financial and physical resources (spending in primary health care). OUTPUTS: Goods and services generated (number of nurses, availability of medicine). OUTCOMES: Access, usage and satisfaction of users (number of children vaccinated, percentage within 5 km of health center). IMPACT: Effect on living standards and welfare (infant and child mortality, improved household income). Note: Example here is an agricultural extension project.
6 Monitoring and IE. IMPACTS. Program impacts confounded by local, national, global effects: difficulty of showing causality. OUTCOMES. Users meet service delivery. OUTPUTS. Gov’t/program production function. INPUTS.
7 Logic Model: An Example. Consider a program of providing Insecticide-Treated Nets (ITNs) to poor households. What are: Inputs? Outputs? Outcomes? Impacts?
8 Logic Model: An Example. Inputs: # of ITNs; # of health or NGO employees to help dissemination. Outputs: # of ITNs received by HHs. Outcomes: ITNs utilized by # of households. Impact: Reduction in illness from malaria; increase in income; improvements in children’s school attendance and performance.
9 Advantages of IE. In order to be able to determine which projects are successful, need a carefully designed impact evaluation strategy. This is useful for: Understanding if projects worked (justification for funding; scaling up). Meta-analysis: learning from others. Cost-benefit tradeoffs across projects. Can test between different approaches of same program or different projects to meet national indicator. Note: Talk about PROGRESA and advantages for continuity in political process.
10 Essential Methodology. Difficulty is determining what would have happened to the individuals or communities of interest in absence of the project. The key component to an impact evaluation is to construct a suitable comparison group to proxy for the “counterfactual”. Problem: can only observe people in one state of the world at one time.
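The counterfactual problem on this slide can be written compactly in potential-outcomes notation. The symbols below ($Y_i(1)$, $Y_i(0)$, $D_i$, $\Delta_i$) are notation assumed here for illustration; they do not appear in the presentation.

```latex
% Impact of the program on unit i: outcome with the program minus outcome without it
\Delta_i = Y_i(1) - Y_i(0)

% Only one potential outcome is ever observed for each unit
% (D_i = 1 if unit i participates, 0 otherwise)
Y_i^{\mathrm{obs}} = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)
```

Because $Y_i(0)$ is never observed for participants, a comparison group has to stand in for it.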
11 Before/After Comparisons. Why not collect data on individuals before and after intervention (the Reflexive)? Difference in income, etc., would be due to project. Problem: many things change over time, including the project. The country is growing and ITN usage is increasing generally (from NetMark data), so how do we know whether an increase in ITN use is due to the program or would have occurred in absence of the program? Many factors affect the malaria rate in a given year.
12 Example: Providing Insecticide-Treated Nets (ITNs) to Poor Households. The intervention: provide free ITNs to households in Zamfara. Program targets poor areas. Women have to enroll at local NGO office in order to receive bednets. Starts in 2002, ends in 2003; we have data on malaria rates from [...]. Scenario 1: we observe that the households in Zamfara we provided bednets to have an increase in malaria from 2002 to 2003.
13 Basic Problem of Impact Evaluation: Scenario 1. Underestimated impact when using before/after comparisons: high rainfall year. [Figure: malaria rate by year, 2001-2004, treatment period marked between 2002 and 2003. Zamfara households with bednets move from A to C. Impact = C - A? An increase in malaria rate!] Note: Opposite situation where the project is implemented in a good year. We would erroneously attribute the difference of the project to be B-A. In reality, all HH would have been better off in period Y3 compared to Y2 even without the program. In this case, the true impact is B-C.
14 Basic Problem of Impact Evaluation: Scenario 1. Underestimated impact when using before/after comparisons: high rainfall year. [Figure: malaria rate by year, 2001-2004, treatment period marked between 2002 and 2003. Starting point A; “Counterfactual” (Zamfara households if no bednets provided) at B; Zamfara households with bednets at C. Impact = C - B: a decline in the malaria rate! Impact ≠ C - A.]
15 Basic Problem of Impact Evaluation: Scenario 2. Overestimated impact: bad rainfall. [Figure: malaria rate by year, 2001-2004, treatment period marked between 2002 and 2003. Starting point A; “Counterfactual” (Zamfara households if no bednets provided) at B; Zamfara households at C. Impact ≠ C - A. TRUE impact = C - B.] Note: Example: drought. Imagine a land titling project which is intended to improve investments in land and therefore increase income. The project is implemented and the following year a drought hits. Then you might see that there was no effect on household income (comparing B and A). However, the project had increased income compared to what would have happened in absence of the project (B - C). In this case, the change in income would have been significantly negative if the project had not been implemented.
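The two scenarios above can be illustrated with a small worked example. This is only a sketch: the malaria rates below are hypothetical numbers chosen to mimic the figures (A = rate before the program, B = counterfactual rate without bednets, C = observed rate with bednets), and the function names are invented for illustration.

```python
def before_after_estimate(a: float, c: float) -> float:
    """Naive reflexive estimate: rate after the program (C) minus rate before (A)."""
    return c - a


def counterfactual_estimate(b: float, c: float) -> float:
    """Impact relative to the counterfactual: rate with the program (C)
    minus the rate the same households would have had without it (B)."""
    return c - b


# Scenario 1 (hypothetical): high rainfall pushes malaria up for everyone.
A1, B1, C1 = 20.0, 30.0, 25.0
naive_s1 = before_after_estimate(A1, C1)     # looks like the program raised malaria
impact_s1 = counterfactual_estimate(B1, C1)  # the program actually lowered it

# Scenario 2 (hypothetical): low rainfall pulls malaria down for everyone.
A2, B2, C2 = 20.0, 15.0, 10.0
naive_s2 = before_after_estimate(A2, C2)     # overstates the decline
impact_s2 = counterfactual_estimate(B2, C2)  # the smaller, true decline

print(naive_s1, impact_s1, naive_s2, impact_s2)
```

In both cases the gap between the naive and the counterfactual estimate equals B - A, the change that would have happened without the program.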
16 Comparison Groups. Instead of using before/after comparisons, we need to use comparison groups to proxy for the counterfactual. Two core problems in finding suitable groups: (1) Programs are targeted: recipients receive the intervention for a particular reason. (2) Participation is voluntary: individuals who participate differ in observable and unobservable ways (selection bias). Hence, as with before/after comparisons, a comparison of participants and an arbitrary group of non-participants can lead to misleading or incorrect results.
17 Comparison 1: Treatment and Region B. Scenario 1: Failure of reflexive comparison due to higher rainfall, and everyone experienced an increase in malaria rates. We compare the households in the program region to those in another region. We find that our “treatment” households in Zamfara have a larger increase in malaria rates than those in region B, Oyo. Did the program have a negative impact? Not necessarily! Program placement is important: Region B has better sanitation and is therefore affected less by rainfall (unobservable).
18 Basic Problem of Impact Evaluation: Program Placement. High rainfall. [Figure: malaria rate by year, 2001-2004, treatment period marked between 2002 and 2003. “Treatment”: Zamfara, with points A, D and E. TRUE IMPACT: E - D.]
19 Basic Problem of Impact Evaluation: Program Placement. Underestimated impact when using region B comparison group: high rainfall. [Figure: malaria rate by year, 2001-2004, treatment period marked between 2002 and 2003. Region B: Oyo, with points B and C; “Treatment”: Zamfara, with points A, D and E. E - A > C - B: Region B affected less by rainfall. TRUE IMPACT: E - D.]
20 Comparison 2: Treatment vs. Neighbors. We compare “treatment” households with their neighbors. We think the sanitation and rainfall patterns are about the same. Scenario 2: Let’s say we observe that treatment households’ malaria rates decrease more than those of comparison households. Did the program work? Not necessarily: There may be two types of households, types A and B, with type A households knowing how malaria is transmitted and also burning mosquito coils. Type A households were more likely to register with the program. However, their other characteristics mean they would have had lower malaria rates in the absence of the ITNs (individual unobservables).
21 Basic Problem of Impact Evaluation: Selection Bias. Comparing project beneficiaries (Type A) to neighbors (Type B). [Figure: malaria rates by year, Y1-Y4, treatment period marked between Y2 and Y3. Lines: Type B HHs; Type A HHs with project. The observed difference is marked between them.] Note: Malaria rates are going down overall over time (ITNs being adopted without project, for example). But Type A households are generally experiencing a quicker decline even without project. With project, however, they experience a larger decline. But the true impact is only a small part of the observed difference between Type A households with the project and Type B households.
22 Basic Problem of Impact Evaluation: Selection Bias. Participants are often different from non-participants. [Figure: malaria rates by year, Y1-Y4, treatment period marked between Y2 and Y3. Lines: Type B HHs; Type A households; Type A HHs with project. The observed difference is split into selection bias and true impact.]
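The split shown in the figure corresponds to a standard decomposition of the naive participant versus non-participant comparison. The notation is assumed here for illustration and is not from the slides: $Y(1)$ and $Y(0)$ are outcomes with and without the program, and $D = 1$ marks participants (Type A households in the example).

```latex
% Observed difference = true impact on participants + selection bias
\underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=0]}_{\text{observed difference}}
  = \underbrace{E[Y(1) \mid D=1] - E[Y(0) \mid D=1]}_{\text{true impact on participants}}
  + \underbrace{E[Y(0) \mid D=1] - E[Y(0) \mid D=0]}_{\text{selection bias}}
```

The selection-bias term is nonzero whenever participants would have had different outcomes from non-participants even without the program.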
23 Basic Problem of Impact Evaluation: Spillover Effects. Another difficulty in finding a true counterfactual has to do with spillover or contagion effects. Example: ITNs will not only reduce malaria rates for those sleeping under nets, but also may lower overall rates because ITNs kill mosquitoes. Problem: children who did not receive “treatment” may also have lower malaria rates, and therefore higher school attendance rates. Generally leads to an underestimate of the treatment effect.
24 Basic Problem of Impact Evaluation: Spillover Effects. [Figure: school attendance by year, 2001-2004, treatment period marked between 2002 and 2003. “Treatment” children at B; “Control” group of children in neighborhood school at C; point A below C. Impact ≠ B - C. Impact = B - A. C > A due to spillover from treatment children.]
25 Counterfactual: Methodology. We need a comparison group that is as identical as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits. Number of techniques: Randomization as gold standard. Various techniques of matching.
26 How to construct a comparison group: building the counterfactual. Randomization. Difference-in-Difference. Regression discontinuity. Matching: pipeline comparisons; propensity score.
27 1. Randomization. Individuals/communities/firms are randomly assigned into participation. Counterfactual: randomized-out group. Advantages: Often referred to as the “gold standard”: by design, selection bias is zero on average and mean impact is revealed. Perceived as a fair process of allocation with limited resources.
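Random assignment and the resulting mean-impact estimate can be sketched in a few lines. This is a minimal illustration, not the procedure used in any actual evaluation: the function names, the 50% treatment share and the household identifiers are all assumptions made for the example.

```python
import random
from statistics import mean


def randomize(units, share_treated=0.5, seed=0):
    """Randomly split units into a treatment group and a randomized-out group."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    n_treated = round(len(shuffled) * share_treated)
    return shuffled[:n_treated], shuffled[n_treated:]


def difference_in_means(outcomes, treated, control):
    """Mean outcome of the treated minus mean outcome of the randomized-out group."""
    return mean(outcomes[u] for u in treated) - mean(outcomes[u] for u in control)


# Hypothetical household identifiers.
households = [f"hh{i}" for i in range(10)]
treated, control = randomize(households)

# Hypothetical follow-up malaria rates, for illustration only.
outcomes = {u: (1.0 if u in treated else 3.0) for u in households}
estimate = difference_in_means(outcomes, treated, control)
print(treated, control, estimate)
```

Because assignment does not depend on household characteristics, the two groups are comparable on average, which is what makes the simple difference in means informative.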
28 Randomization: Disadvantages. Ethical issues, political constraints. Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance). External validity (generalizability): usually run controlled experiment on a pilot, small scale. Difficult to extrapolate the results to a larger population. Does not always solve problem of spillovers.
29 When to Randomize. If funds are insufficient to treat all eligible recipients: randomization can be the most fair and transparent approach. The program is administered at the individual, household or community level: higher level of implementation difficult (example: trunk roads). Program will be scaled up: learning what works is very valuable.
30 2. Difference-in-difference. Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants. Identification assumption: the selection bias or unobservable characteristics are time-invariant (‘parallel trends’ in the absence of the program). Counterfactual: changes over time for the non-participants. Difference-in-difference is also known as double difference or second difference.
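The double difference described above reduces to simple arithmetic on group means. The sketch below assumes one pre-program and one post-program observation per group; the function name and the malaria rates are hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change over time for participants minus change over time for non-participants."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical malaria rates, for illustration only.
did = diff_in_diff(
    treated_pre=[20, 22], treated_post=[24, 26],  # participants: +4
    control_pre=[15, 17], control_post=[25, 27],  # non-participants: +10
)
print(did)
```

Here both groups get worse, but participants worsen by less, so the estimate is a reduction in the malaria rate. The number is only meaningful if the two groups would have changed in parallel without the program.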
31 Diff-in-Diff: Continued. Constraint: Requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants. Need to think about the evaluation ex-ante, before the program. More valid if there are 2 pre-periods, so can observe whether trend is same. Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate.
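With two pre-program periods, one simple way to look at whether the trend is the same is to compute the double difference on the pre-program data alone, where no program effect can exist. This is a sketch of that idea under assumed names and hypothetical numbers; it is a descriptive check, not a formal statistical test.

```python
from statistics import mean


def pre_trend_gap(treated_t0, treated_t1, control_t0, control_t1):
    """Difference between the two groups' changes across two PRE-program periods.

    A value near zero is consistent with parallel trends; a large value
    suggests the groups were already diverging before the program.
    """
    treated_change = mean(treated_t1) - mean(treated_t0)
    control_change = mean(control_t1) - mean(control_t0)
    return treated_change - control_change


# Hypothetical pre-program malaria rates for two periods before the program.
gap_parallel = pre_trend_gap([20, 22], [22, 24], [15, 17], [17, 19])
gap_diverging = pre_trend_gap([20, 22], [26, 28], [15, 17], [16, 18])
print(gap_parallel, gap_diverging)
```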
32 Implementing differences in differences: Different Strategies. Some arbitrary comparison group. Matched diff in diff. Randomized diff in diff. These are in order from more problems to fewer problems; think about this as we look at this graphically.
33 Essential Assumptions of Diff-in-Diff. Initial difference must be time invariant. In absence of program, the change over time would be identical. Note: The earlier example where the drought differentially affected the treatment regions would be a violation of this assumption. If rainfall was observed, then this could still be a strategy.
34 Difference-in-Difference in ITN Example. Instead of comparing Zamfara to Oyo, compare Zamfara to Niger if, while Zamfara and Niger have different malaria rates and different ITN usage, we expect that they change in parallel. Use NetMark data to compare 2000 to 2003 in Zamfara and Niger states. Use additional data (GHS, NLSS) to compare incomes and sanitation infrastructure levels and changes prior to program implementation.
35 3. Regression discontinuity design. Exploit the rule generating assignment into a program given to individuals only above a given threshold. Assume that there is a discontinuity in participation but not in counterfactual outcomes. Counterfactual: individuals just below the cut-off who did not participate. Advantages: “Identification” built into the program design. Delivers marginal gains from the program around the eligibility cut-off point; important for program expansion. Disadvantages: Threshold has to be applied in practice, and individuals should not be able to manipulate the score used in the program to become eligible.
36 RDD in ITN Example. Program available for poor households. Eligibility criteria: must be below the national poverty line or < 1 ha of land. Treatment group: those below the cut-off (those with income below the poverty line and therefore qualified for ITNs). Comparison group: those right above the cut-off (those with income just above the poverty line and therefore not eligible).
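The comparison around the poverty line can be sketched as a difference in mean outcomes within a narrow window on each side of the cut-off. This is a deliberately simplified illustration: applied work usually fits regressions on each side of the threshold instead of comparing raw means, and the function name, bandwidth and data below are all hypothetical.

```python
from statistics import mean


def rdd_local_difference(records, cutoff, bandwidth):
    """Mean outcome just below the cut-off (eligible) minus just above (not eligible).

    records: iterable of (income, outcome) pairs.
    Only households within `bandwidth` of the cut-off are used.
    """
    below = [y for x, y in records if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in records if cutoff <= x <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no observations on one side of the cut-off within the bandwidth")
    return mean(below) - mean(above)


# Hypothetical (income, malaria rate) pairs; poverty line assumed at 100.
records = [(90, 10), (95, 12), (105, 18), (110, 20), (200, 5)]
local_effect = rdd_local_difference(records, cutoff=100, bandwidth=10)
print(local_effect)
```

The household with income 200 is ignored, which reflects the local nature of the estimate: it describes households near the poverty line only.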
37 RDD in ITN Example. Problems: How well enforced was the rule? Can the rule be manipulated? Local effect: may not be generalizable if program expands to households well above poverty line. Particularly relevant since NetMark data indicate low ITN usage across all socio-economic status groups.
38 4. Matching. Match participants with non-participants from a larger survey. Counterfactual: matched comparison group. Each program participant is paired with one or more non-participants that are similar based on observable characteristics. Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity. When the set of variables to match is large, often match on a summary statistic: the probability of participation as a function of the observables (the propensity score).
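The pairing step can be sketched as nearest-neighbor matching on a propensity score. This sketch assumes the score has already been estimated for every household (for example from a participation model, which is not shown), matches with replacement, and uses hypothetical function names and data.

```python
from statistics import mean


def match_nearest(participants, non_participants):
    """Pair each participant with the non-participant whose score is closest.

    Both arguments are lists of (propensity_score, outcome) pairs.
    Matching is with replacement: a non-participant can be used more than once.
    Returns a list of (participant_outcome, matched_outcome) pairs.
    """
    pairs = []
    for score, outcome in participants:
        nearest = min(non_participants, key=lambda c: abs(c[0] - score))
        pairs.append((outcome, nearest[1]))
    return pairs


def matched_impact(participants, non_participants):
    """Average outcome difference between participants and their matches."""
    pairs = match_nearest(participants, non_participants)
    return mean(p - c for p, c in pairs)


# Hypothetical (propensity score, malaria rate) pairs.
participants = [(0.8, 10), (0.6, 12)]
non_participants = [(0.79, 15), (0.55, 16), (0.1, 30)]
impact = matched_impact(participants, non_participants)
print(impact)
```

The non-participant with a score of 0.1 is never used, since no participant resembles it. The estimate is only as good as the assumption that no unobserved differences remain once the score is matched.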
39 4. Matching. Advantages: Does not require randomization, nor baseline (pre-intervention) data. Disadvantages: Strong identification assumptions. In many cases, may make interpretation of results very difficult. Requires very good quality data: need to control for all factors that influence program placement. Requires a large enough sample to generate a comparison group.
40 Matching in Practice. Using statistical techniques, we match a group of non-participants with participants using variables like gender, household size, education, experience, land size (rainfall to control for drought), irrigation (as many observable characteristics not affected by program intervention as possible). One common method: Propensity Score Matching.
41 Matching in Practice: 2 Approaches. Approach 1: After program implementation, we match (within region) those who received ITNs with those who did not. Problem? Problem: likelihood of usage of different households is unobservable, so not included in propensity score. This creates selection bias. Approach 2: The program is allocated based on land size. After implementation, we match those eligible in region A with those in region B. Problem? Problems: same issues of individual unobservables, but lessened because we compare eligible to potential eligible. Now problem of unobservable factors across regions. Note: Eligible vs. potential eligible: those who have land (or land of a particular size) may vary substantially. Especially if baseline data is not very accurate or non-existent, we may be comparing individuals of different income or wealth levels, or individuals who are politically connected in the village and can therefore take better advantage of new opportunities. Regional unobservables: land quality, soil type. This data is sometimes available, and is then very valuable. But there are many factors: availability of certain products may vary across regions. Different ethnic groups may differ in their agricultural practices as well.
42 An extension of matching: pipeline comparisons. Idea: compare those just about to get an intervention with those getting it now. Assumption: the stopping point of the intervention does not separate two fundamentally different populations. Example: extending irrigation networks. In ITN example: If only some communities within Zamfara receive ITNs in round 1, compare them to nearby communities that will receive ITNs in round 2. Difficulty with infrastructure: spillover effects or anticipatory effects may be strong.