2 parts
- Impact evaluation methods
- Impact evaluation practicalities: IE and the project cycle
- Use rural project examples
Outline: methods
- Monitoring and impact evaluation
- Why do impact evaluation
- Why we need a comparison group
- Methods for constructing the comparison group
- When to do an impact evaluation
Monitoring and IE
The example here is an agricultural extension project.
Monitoring and IE
- IMPACTS: program impacts are confounded by local, national, and global effects; difficulty of showing causality
- OUTCOMES: users meet service delivery
- OUTPUTS: government/program production function
- INPUTS
Impact evaluation
- It goes by many names (e.g., Rossi et al. call this impact assessment), so what matters is knowing the concept.
- Impact is the difference between outcomes with the program and without it.
- The goal of impact evaluation is to measure this difference in a way that attributes it to the program, and only to the program.
Why it matters
- We want to know whether the program had an impact and the average size of that impact
- Understand whether policies work
- Justification for the program (big $$)
- Scale up or not: did it work?
- Meta-analyses: learning from others
- (With cost data) understand the net benefits of the program
- Understand the distribution of gains and losses
What we need
- The difference in outcomes with the program versus without the program, for the same unit of analysis (e.g., an individual)
- Problem: individuals only have one existence
- Hence we have a missing counterfactual: a problem of missing data
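To make the missing-data problem concrete, here is a minimal sketch with four hypothetical farmers whose yields with and without the program are invented for illustration (the names `farmers`, `observed`, and `true_average_impact` are ours). In real data only one of the two outcomes per farmer is ever seen; the true average impact can be computed here only because we made both up.

```python
# Hypothetical potential outcomes (yield, t/ha) for four farmers: illustrative numbers, not data.
# y0 = yield without the program, y1 = yield with the program.
farmers = [
    {"id": "A", "y0": 2.0, "y1": 2.6, "treated": True},
    {"id": "B", "y0": 1.5, "y1": 1.9, "treated": True},
    {"id": "C", "y0": 1.8, "y1": 2.1, "treated": False},
    {"id": "D", "y0": 1.2, "y1": 1.7, "treated": False},
]


def observed(f):
    """Only one potential outcome is ever observed for each farmer."""
    return f["y1"] if f["treated"] else f["y0"]


def true_average_impact(fs):
    """Average of y1 - y0: computable only because we invented both outcomes."""
    return sum(f["y1"] - f["y0"] for f in fs) / len(fs)


for f in farmers:
    missing = "y0" if f["treated"] else "y1"
    print(f["id"], "observed:", observed(f), "| counterfactual", missing, "is missing")
print("true average impact:", round(true_average_impact(farmers), 3))
```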
Thinking about the counterfactual
- Why not compare individuals before and after (the reflexive comparison)?
- The rest of the world moves on, and you cannot be sure what was caused by the program and what by the rest of the world
- We need a control/comparison group that will allow us to attribute any change in the "treatment" group to the program (causality)
Comparison group issues
Two central problems:
- Programs are targeted: program areas will differ in observable and unobservable ways precisely because the program intended this
- Individual participation is (usually) voluntary: participants will differ from non-participants in observable and unobservable ways
Hence, a comparison of participants with an arbitrary group of non-participants can lead to heavily biased results.
Example: providing fertilizer to farmers
- The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)
- The program targets poor areas
- Farmers have to enroll at the local extension office to receive the fertilizer
- The program starts in 2002 and ends in 2004; we have data on yields for farmers in the poor region and in another region (region B) for both years
- We observe that the farmers we provide fertilizer to have a decrease in yields from 2002 to 2004
Did the program not work?
- Further study reveals there was a national drought, and everyone's yields went down (failure of the reflexive comparison)
- We compare the farmers in the program region to those in another region. We find that our "treatment" farmers have a larger decline than those in region B. Did the program have a negative impact?
- Not necessarily (program placement):
  - Farmers in region B have better-quality soil (unobservable)
  - Farmers in region B have more irrigation, which is key in this drought year (observable)
OK, so let's compare the farmers in region A
- We compare "treatment" farmers with their neighbors. We think the soil is roughly the same.
- Say we observe that treatment farmers' yields decline by less than comparison farmers' yields. Did the program work?
- Not necessarily. Farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, while the fertilizer was irrelevant (individual unobservables).
- Say we observe no difference between the two groups. Did the program not work?
- Not necessarily. What little rain there was caused the fertilizer to run off onto the neighbors' fields (spillover/contamination).
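The ability problem can be simulated. In this sketch every parameter is assumed for illustration: unobserved ability both raises yields and makes a farmer more likely to register, so a naive participant vs. non-participant comparison overstates the assumed fertilizer effect of 0.3 t/ha.

```python
import random
import statistics

random.seed(1)
TRUE_EFFECT = 0.3  # assumed fertilizer effect on yield (t/ha), illustrative only


def simulate(n=10_000):
    """Split simulated farmers into self-selected participants and non-participants."""
    treated, untreated = [], []
    for _ in range(n):
        ability = random.gauss(0, 1)                 # unobserved by the evaluator
        enrolls = ability + random.gauss(0, 1) > 0   # abler farmers are more likely to register
        y0 = 2.0 + 0.5 * ability + random.gauss(0, 0.3)  # ability also raises yields
        if enrolls:
            treated.append(y0 + TRUE_EFFECT)
        else:
            untreated.append(y0)
    return treated, untreated


treated, untreated = simulate()
naive = statistics.mean(treated) - statistics.mean(untreated)
print(f"naive difference: {naive:.2f}  true effect: {TRUE_EFFECT}")
```

Under these assumptions the naive gap comes out well above `TRUE_EFFECT`; the excess is the contribution of ability, not fertilizer.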
The comparison group
- In the end, with these naïve comparisons, we cannot tell whether the program had an impact
- We need a comparison group that is as similar as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits
How to construct a comparison group: building the counterfactual
1. Randomization
2. Matching
3. Difference-in-differences
4. Instrumental variables
5. Regression discontinuity
1. Randomization
- Individuals/communities/firms are randomly assigned into participation
- Counterfactual: the randomized-out group
- Advantages:
  - Often referred to as the "gold standard": by design, selection bias is zero on average and the mean impact is revealed
  - Perceived as a fair process of allocation with limited resources
- Disadvantages:
  - Ethical issues, political constraints
  - Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
  - Unable to estimate entry effect
  - External validity (generalizability): the controlled experiment is usually run on a small-scale pilot, so it is difficult to extrapolate the results to a larger population
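Re-running the same hypothetical yield model, but assigning fertilizer by lottery, shows why randomization removes selection bias on average: ability still varies, but it is independent of assignment. This sketch assumes full compliance and no spillovers (both questioned on the next slide); all parameters are illustrative.

```python
import random
import statistics

random.seed(2)
TRUE_EFFECT = 0.3  # assumed effect, illustrative only


def randomized_trial(n=10_000):
    """Assign each farmer to treatment by coin flip and record yields by group."""
    treat, control = [], []
    for _ in range(n):
        ability = random.gauss(0, 1)          # still unobserved
        assigned = random.random() < 0.5      # lottery, independent of ability
        y0 = 2.0 + 0.5 * ability + random.gauss(0, 0.3)
        if assigned:
            treat.append(y0 + TRUE_EFFECT)
        else:
            control.append(y0)
    return treat, control


treat, control = randomized_trial()
estimate = statistics.mean(treat) - statistics.mean(control)
# Standard error of a difference in means between two independent samples
se = (statistics.variance(treat) / len(treat) + statistics.variance(control) / len(control)) ** 0.5
print(f"difference in means: {estimate:.3f} (SE {se:.3f})")
```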
Randomization in our example…
- Simple answer: randomize farmers within a community to receive fertilizer...
- Potential problems?
  - Run-off (contamination): need to control for this
  - Take-up (what question are we answering?)
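On take-up: if fertilizer is randomly offered but only some farmers use it, the simple difference between offered and not-offered groups answers the intention-to-treat (ITT) question. Dividing by the take-up rate (the Bloom adjustment, a special case of the Wald estimator) recovers the effect on those who actually used it, assuming no farmer in the non-offered group obtains fertilizer and the offer affects yields only through use. This is a minimal sketch with assumed parameters; it ignores spillovers, and one common response to run-off is to randomize at the community level rather than within communities.

```python
import random
import statistics

random.seed(3)
TRUE_EFFECT = 0.3  # assumed effect for farmers who actually apply fertilizer


def offer_trial(n=20_000):
    """Return (offered, used, yield) rows for a randomized offer with voluntary take-up."""
    rows = []
    for _ in range(n):
        ability = random.gauss(0, 1)
        offered = random.random() < 0.5
        # take-up is voluntary and, again, related to ability; nobody un-offered can use it
        used = offered and (ability + random.gauss(0, 1) > 0)
        y = 2.0 + 0.5 * ability + random.gauss(0, 0.3) + (TRUE_EFFECT if used else 0.0)
        rows.append((offered, used, y))
    return rows


rows = offer_trial()
y_offer = [y for z, d, y in rows if z]
y_no_offer = [y for z, d, y in rows if not z]
itt = statistics.mean(y_offer) - statistics.mean(y_no_offer)
take_up = statistics.mean(1.0 if d else 0.0 for z, d, y in rows if z)
effect_on_takers = itt / take_up  # valid only if no control farmer gets fertilizer
print(f"ITT {itt:.3f}, take-up {take_up:.2f}, effect on takers {effect_on_takers:.3f}")
```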
2. Matching
- Match participants with non-participants from a larger survey
- Counterfactual: matched comparison group
- Each program participant is paired with one or more non-participants who are similar based on observable characteristics
- Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity
- When the set of variables to match on is large, we often match on a summary statistic: the probability of participation as a function of the observables (the propensity score)
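A minimal sketch of one-to-one nearest-neighbour matching on the propensity score, with replacement, estimating the average effect on participants. For brevity it uses the true participation probability as the score; in practice the score would be estimated, for example with a logit of participation on the observables. Participation here depends only on an observable (`land`), so the no-unobserved-selection assumption holds by construction; the data and the function name `att_nearest_neighbor` are hypothetical.

```python
import bisect
import math
import random
import statistics


def att_nearest_neighbor(scores_t, y_t, scores_c, y_c):
    """ATT by 1-to-1 nearest-neighbour matching on the propensity score, with replacement."""
    order = sorted(range(len(scores_c)), key=lambda j: scores_c[j])
    sorted_scores = [scores_c[j] for j in order]
    diffs = []
    for s, y in zip(scores_t, y_t):
        k = bisect.bisect_left(sorted_scores, s)
        # candidate neighbours sit on either side of the insertion point
        cands = [i for i in (k - 1, k) if 0 <= i < len(sorted_scores)]
        best = min(cands, key=lambda i: abs(sorted_scores[i] - s))
        diffs.append(y - y_c[order[best]])
    return statistics.mean(diffs)


random.seed(4)
TRUE_EFFECT = 0.3  # illustrative
st, yt, sc, yc = [], [], [], []
for _ in range(5_000):
    land = random.gauss(0, 1)              # observable characteristic
    p = 1 / (1 + math.exp(-land))          # true propensity; in practice estimated (e.g. logit)
    d = random.random() < p
    y = 2.0 + 0.4 * land + random.gauss(0, 0.3) + (TRUE_EFFECT if d else 0.0)
    if d:
        st.append(p)
        yt.append(y)
    else:
        sc.append(p)
        yc.append(y)

naive = statistics.mean(yt) - statistics.mean(yc)
matched = att_nearest_neighbor(st, yt, sc, yc)
print(f"naive {naive:.3f}, matched {matched:.3f}")
```

If participation also depended on something unobserved, such as ability, matching on `land` alone would leave that bias in place, which is exactly the strong identification assumption listed on the next slide.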
2. Matching
- Advantages:
  - Does not require randomization, nor baseline (pre-intervention) data
- Disadvantages:
  - Strong identification assumptions
  - Requires very good quality data: need to control for all factors that influence program placement
  - Requires a significantly large sample to generate the comparison group
Matching in our example…
Using statistical techniques, we match a group of non-participants with participants using variables like gender, household size, education, experience, land size, rainfall (to control for the drought), and irrigation: as many observable characteristics not affected by fertilizer as possible.
Matching in our example… 2 scenarios
- Scenario 1: We show up afterwards, so we can only match (within region) those who got fertilizer with those who did not. Problem?
  - Problem: farmers select on expected gains and/or ability (unobservable)
- Scenario 2: The program is allocated based on historical crop choice and land size. We show up afterwards and match those eligible in region A with those in region B. Problem?
  - Problems: the same issues of individual unobservables, but lessened because we compare eligible farmers to potentially eligible ones; now, however, there are unobservables across regions
An extension of matching: pipeline comparisons
- Idea: compare those just about to get an intervention with those getting it now
- Assumption: the stopping point of the intervention does not separate two fundamentally different populations
- Example: extending irrigation networks
3. Difference-in-differences
- Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants
- Identification assumption: the selection bias is time-invariant ("parallel trends" in the absence of the program)
- Counterfactual: changes over time for the non-participants
- Constraint: requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants
  - Need to think about the evaluation ex ante, before the program
- Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate
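The estimator itself is a difference of two changes: DiD = (participants' mean outcome after - before) - (non-participants' mean outcome after - before). A sketch with hypothetical mean yields for the neighbour comparison in the fertilizer example, where treatment yields fall by 0.4 t/ha during the drought and comparison yields by 0.6 t/ha; the function name `diff_in_diff` is ours.

```python
def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """DiD = (change for participants) - (change for non-participants).

    Inputs are group mean outcomes; the comparison group's change stands in
    for what would have happened to participants without the program.
    """
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical mean yields (t/ha), 2002 -> 2004, illustrative only
impact = diff_in_diff(treat_pre=2.0, treat_post=1.6, comp_pre=2.4, comp_post=1.8)
print(f"estimated impact: {impact:.2f}")  # prints "estimated impact: 0.20"
```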
Implementing differences-in-differences in our example…
- Some arbitrary comparison group
- Matched diff-in-diff
- Randomized diff-in-diff
These are in order from more problems to fewer problems; think about this as we look at them graphically.
As long as the bias is additive and time-invariant, diff-in-diff will work…
What if the observed changes over time are affected (i.e., the two groups would not have followed parallel trends without the program)?
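One way to see what goes wrong, with illustrative numbers: region B starts with higher yields, a time-invariant gap that diff-in-diff removes. If instead irrigation means region B would have suffered less from the drought anyway, the trends are not parallel, and the difference in trends is attributed to the program. The names and values below are hypothetical.

```python
def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """(change for participants) - (change for non-participants)."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


TRUE_EFFECT = 0.2    # illustrative program effect
BASELINE_GAP = 0.4   # region B starts higher: an additive, time-invariant bias


def scenario(trend_treat, trend_comp):
    """Group means before/after; trends are what each group would do without the program."""
    comp_pre = 2.4
    treat_pre = comp_pre - BASELINE_GAP
    treat_post = treat_pre + trend_treat + TRUE_EFFECT
    comp_post = comp_pre + trend_comp
    return diff_in_diff(treat_pre, treat_post, comp_pre, comp_post)


parallel = scenario(trend_treat=-0.6, trend_comp=-0.6)   # same drought shock for both
diverging = scenario(trend_treat=-0.6, trend_comp=-0.3)  # irrigated region B suffers less
print(f"parallel trends: {parallel:.2f}, diverging trends: {diverging:.2f}")
```

With parallel trends the estimate equals the assumed effect; with diverging trends the same program appears to have a negative impact.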
4. Instrumental variables
- Identify variables that affect participation in the program, but not outcomes conditional on participation (exclusion restriction)
- Counterfactual: the causal effect is identified from the exogenous variation of the instrument
- Advantages:
  - Does not require the exogeneity assumption of matching
- Disadvantages:
  - The estimated effect is local: IV identifies the effect of the program only for the sub-population of those induced to take up the program by the instrument
  - Therefore different instruments identify different parameters, and can yield estimated effects of different magnitudes
  - The validity of the instrument can be questioned, and it cannot be tested
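With a binary instrument and binary participation, the IV estimate reduces to the Wald ratio: the difference in mean outcomes between instrument groups divided by the difference in participation rates. The simulation below assumes a randomly assigned outreach visit as the instrument and the same effect for every farmer, so the local effect coincides with the average effect; with heterogeneous effects it would apply only to farmers induced to enroll by the visit. All names and parameters are illustrative.

```python
import random
import statistics


def wald_estimate(z, d, y):
    """IV (Wald) estimator with a binary instrument z and binary participation d."""
    y1 = statistics.mean(yi for zi, yi in zip(z, y) if zi)
    y0 = statistics.mean(yi for zi, yi in zip(z, y) if not zi)
    d1 = statistics.mean(float(di) for zi, di in zip(z, d) if zi)
    d0 = statistics.mean(float(di) for zi, di in zip(z, d) if not zi)
    return (y1 - y0) / (d1 - d0)


random.seed(5)
TRUE_EFFECT = 0.3  # illustrative, identical for every farmer
z, d, y = [], [], []
for _ in range(20_000):
    ability = random.gauss(0, 1)           # unobserved confounder
    visited = random.random() < 0.5        # instrument: random outreach visit
    # the visit raises the chance of enrolling; some unvisited farmers still enroll
    enrolls = ability + random.gauss(0, 1) + (1.0 if visited else -1.0) > 0
    outcome = 2.0 + 0.5 * ability + random.gauss(0, 0.3) + (TRUE_EFFECT if enrolls else 0.0)
    z.append(visited)
    d.append(enrolls)
    y.append(outcome)

naive = (statistics.mean(yi for di, yi in zip(d, y) if di)
         - statistics.mean(yi for di, yi in zip(d, y) if not di))
iv = wald_estimate(z, d, y)
print(f"naive {naive:.3f}, IV {iv:.3f}")
```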
IV in our example
- It turns out that outreach was done randomly, so the timing of farmers' intake into the program is essentially random
- We can use this as an instrument
- Problems?
  - Is it really random? (roads, etc.)
5. Regression discontinuity design
- Exploit the rule generating assignment into a program given only to individuals above a given threshold
  - Assumes a discontinuity in participation but not in counterfactual outcomes
- Counterfactual: individuals just below the cut-off who did not participate
- Advantages:
  - Identification is built into the program design
  - Delivers marginal gains from the program around the eligibility cut-off point, which is important for program expansion
- Disadvantages:
  - The threshold has to be applied in practice, and individuals should not be able to manipulate the score used in the program to become eligible
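A sketch of a sharp RDD under assumed data: eligibility switches on at a hypothetical score of 5, the rule is perfectly enforced, and nobody manipulates the score. A straight line is fitted on each side of the cut-off within a bandwidth, and the impact is the gap between the two fitted values at the cut-off. The bandwidth of 1.0 is an arbitrary choice for illustration; in practice the estimate should be checked across bandwidths.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares intercept and slope for one regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rdd_estimate(score, y, cutoff, bandwidth):
    """Sharp RDD: jump in the fitted outcome at the cutoff, local linear fit on each side."""
    left = [(s - cutoff, yi) for s, yi in zip(score, y) if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, yi) for s, yi in zip(score, y) if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left  # scores are centred, so intercepts are predictions at the cutoff


random.seed(6)
TRUE_EFFECT = 0.3  # illustrative
CUTOFF = 5.0       # hypothetical eligibility score threshold
score, y = [], []
for _ in range(20_000):
    s = random.uniform(0, 10)
    eligible = s >= CUTOFF  # rule strictly enforced, no manipulation
    score.append(s)
    y.append(1.0 + 0.2 * s + random.gauss(0, 0.3) + (TRUE_EFFECT if eligible else 0.0))

est = rdd_estimate(score, y, CUTOFF, bandwidth=1.0)
print(f"RDD estimate: {est:.3f}")
```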
RDD in our example…
- Back to the eligibility criteria: land size and crop history
- We use those right below the cut-off and compare them with those right above…
- Problems:
  - How well enforced was the rule?
  - Can the rule be manipulated?
  - Local effect
Discussion example: building a control group for irrigation
- Scenario: we have a project to extend existing reaches and build some new canals
- An initial analysis shows that farmers who are newly irrigated have increased yields… was the project a success?
- What is the evaluation question?
- What is a logical comparison group and method?
Investment operation vs. adjustment/budget support
- Project
  - Maybe evaluate all, but unlikely
  - Pick subcomponents
- Adjustment/budget support
  - Build a strong M&E unit
  - Impact evaluation designed by government
  - Evaluate policy reform pilots, e.g. health insurance pilot, P4P, tariff changes
  - Anything economy-wide ≠ impact evaluation
Prioritizing for impact evaluation
- It is not cheap, relative to monitoring
- Possible prioritization criteria:
  1. Don't know if the policy is effective (e.g., conditional cash transfers)
  2. Politics (e.g., Argentina workfare program)
  3. It's a lot of money
- Note that 2 and 3 are variants of not "knowing" (in this context, etc.)
Summing up: methods
- No clear "gold standard" in reality: do what works best in the context
- Watch for unobservables, but don't forget observables
- Be flexible, be creative: use the context
- IE requires good monitoring, and monitoring will help you understand the effect size
Objective of this part of the presentation
- Walk you through what it takes to do an impact evaluation for your project, from identification to ICR
- Persuade you that impact evaluation will add value to your project
We will talk about…
- General principles
- In the context of 3 project periods:
  - Evaluation activities: the core issues for evaluation design and implementation, and
  - Housekeeping activities: procedural, administrative, and financial management issues
- Where to go for assistance
Some general principles
- Government ownership as a whole: what matters is institutional buy-in, so that the results get used
- Relevance and applicability: asking the right questions
- Flexibility and adaptability
- Horizon matters
Ownership
- IE can provide one avenue to build institutional capacity and a culture of managing-by-results, so the IE should be as widely owned within government as possible
- Agree on a dissemination plan to maximize use of results for policy development
- Identify entry points in project and policy cycles:
  - midpoint and closing, for the project;
  - sector reporting, CGs, MTEF, budget, for the WB;
  - budget cycles and policy reviews, for government
- Use partnerships with local academics to build local capacity for impact evaluation
Relevance and applicability
- For an evaluation to be relevant, it must be designed to respond to the policy questions that are of importance
- Clarifying early what will be learned, and designing the evaluation to that end, will go some way to ensuring that the recommendations of the evaluation feed into policy making
- Make sure to think about unintended consequences (e.g., export crop promotion shifts the intrahousehold allocation of power, or S. Africa pensions); qualitative and interdisciplinary perspectives are key here
Flexibility and adaptability
- The evaluation must be tailored to the specific project and adapted to the specific institutional context
- The project design must be flexible, to secure our ability to learn in a structured manner, feed evaluation results back into the project, and change the project mid-course to improve project end results
  - This can be a broad project redesign or a push in new directions, e.g. feeding into nutritional targeting design
- This is an important point: in the past, projects have been penalized for effecting mid-course changes in project design. Now we want to make change part of the project design.
Horizon matters
- The time it takes to achieve results is an important consideration for timing the evaluation. Conversely, the timing of the evaluation will determine which outcomes should be the focus
- Early evaluations should focus on outcomes that are quick to show change
- For long-term outcomes, evaluations may need to span beyond the project cycle, e.g. Indonesia school building project
- Think through how things are expected to change over time and focus on what is within the time horizon for the evaluation
- Do not confuse the importance of an outcome with the time it takes for it to change: some important outcomes are obtained instantaneously!
- But don't be afraid to look at intermediate outcomes either
Get an early start
How do you get started?
- Get help and access to resources: contact the person in your region or sector responsible for impact evaluation and/or the Thematic Group on Impact Evaluation
- Define the timing for the various steps of the evaluation to ensure you have enough lead time for preparatory activities (e.g., the baseline goes to the field before program activities start)
- The evaluation will require support from a range of policy-makers: start building and maintaining constituents, dialogue with relevant actors in government, build a broad base of support, and include stakeholders
Build the team
- Select the impact evaluation team and define the responsibilities of:
  - program managers (government),
  - the WB project team and other donors,
  - the lead evaluator (impact evaluation specialist),
  - the local research/evaluation team, and
  - the data collection agency or firm
- Selection of the lead evaluator is critical for ensuring the quality of the product, and so is the capacity of the data collection agency
- Partner with local researchers and research institutes to build local capacity
Shift paradigm
- From a project design based on "we know what's best"
- To a project design based on the notion that "we can learn what's best in this context, and adapt to new knowledge as needed"
Work iteratively:
- Discuss what the team knows and what it needs to learn (the questions for the evaluation) to deliver on project objectives
- Discuss translating this into a feasible project design
- Figure out what questions can feasibly be addressed
- Housekeeping: include these first thoughts in a paragraph in the PCN
- E.g., ARV evaluation: funding constraints shifted radically and quickly, so the design changed, and changed again
Define project development objectives and results framework
This activity:
- clarifies the results chain (logic of impacts) for the project,
- identifies the outcomes of interest and the indicators best suited to measure changes in those outcomes, and
- sets the expected time horizon for changes in those outcomes.
This will provide the lead evaluator with the project-specific variables that must be included in the survey questionnaire and a sense of timing for scheduling data collection.
Work out project design features that will affect evaluation design
- Target population and rules of selection
  - This provides the evaluator with the universe for the treatment and comparison samples
- Roll-out plan
  - This provides the evaluator with a framework for timing data collection and, possibly, an opportunity to define a comparison group
- Think about changes that would enhance the evaluation without undermining the project's objectives (this will likely be iterative)
Narrow down the questions for the evaluation
- Questions aimed at measuring the impact of the project on a set of outcomes, and
- Questions aimed at measuring the relative effectiveness of different features of the project
Questions aimed at measuring the impact of the project are relatively straightforward
- What is your hypothesis? (Results framework)
  - By expanding water supply, the use of clean water will increase, water-borne disease will decline, and health status will improve
- What is the evaluation question?
  - Does improved water supply result in better health outcomes?
- How do you test the hypothesis?
  - The government might randomly assign areas for expansion in water supply during the first and second phases of the program
- What will you measure?
  - Measure the change in health outcomes in phase I areas relative to the change in outcomes in phase II areas. Outcomes will include use of safe water (S-T), incidence of diarrhea (S/M-T), and health status (L-T, depending on when phase II occurs). Add other outcomes.
- What will you do with the results?
  - If the hypothesis proves true, go to phase II; if false, modify policy.
Questions aimed at measuring the relative effectiveness of different project features require identifying the tough design choices on the table…
- What is the issue?
  - What is the best package of products or services?
- Where do you start from (what is the counterfactual)?
  - What package is the government delivering now?
- Which changes do you or the government think could be made to improve effectiveness?
- How do you test it?
  - The government might agree to provide one package to a randomly selected group of households and another package to another group of households, to see how the two packages perform
- What will you measure?
  - The average change in relevant outcomes for households receiving one package versus the same for households receiving the other package
  - E.g., extension vs. fertilizer+extension vs. fertilizer+extension+seeds
- What will you do with the results?
  - The package that is most effective in delivering desirable outcomes becomes the one adopted by the project from the evaluation onwards
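The package comparison can be sketched as a multi-arm randomized trial. The average yield changes assigned to each package below are invented for illustration, and `ARMS`, `run_trial`, and `compare` are hypothetical names. The code randomly allocates households to packages, reports the mean change and its standard error for each arm, and picks the package with the highest mean.

```python
import random
import statistics

random.seed(7)
# Hypothetical true average yield changes (t/ha) per package, for illustration only
ARMS = {"extension": 0.10, "fertilizer+extension": 0.25, "fertilizer+extension+seeds": 0.35}


def run_trial(n_per_arm=3_000):
    """Allocate packages to households by lottery and record each household's yield change."""
    labels = [arm for arm in ARMS for _ in range(n_per_arm)]
    random.shuffle(labels)  # random allocation of packages across households
    changes = {arm: [] for arm in ARMS}
    for arm in labels:
        household_shock = random.gauss(0, 0.4)  # household-specific factors (soil, weather)
        changes[arm].append(ARMS[arm] + household_shock + random.gauss(0, 0.3))
    return changes


def compare(changes):
    """Mean change and its standard error for each arm."""
    return {
        arm: (statistics.mean(v), statistics.stdev(v) / len(v) ** 0.5)
        for arm, v in changes.items()
    }


summary = compare(run_trial())
for arm, (mean, se) in summary.items():
    print(f"{arm:28s} mean change {mean:.3f} (SE {se:.3f})")
best = max(summary, key=lambda arm: summary[arm][0])
print("best-performing package:", best)
```

Picking the highest mean is only the last step; in a real evaluation the gap between the top two arms should be judged against its standard error before the project commits to a package.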
Application: features that should be tested early on
- Early testing of project features (say 6 months to 1 year) can provide the team with the information needed to adjust the project early on in the direction most likely to deliver success.
- Features might include:
  - alternative modes of delivery (e.g., using seed merchants vs. extension agents),
  - alternative packages of outputs, or
  - different pricing schemes (e.g., alternative subsidy levels).
Develop an identification strategy (to identify the impact of the project separately from changes due to other causes)
- Once the questions are defined, the lead evaluator selects one or more comparison groups against which to measure results in the treatment group.
- The "rigor" with which the comparison group is selected will determine the reliability of the impact estimates.
- Rigor?
  - More: same observables and unobservables (experimental)
  - Less: same observables (non-experimental)
Explore existing data
- Explore what data exist that might be relevant for use in the evaluation.
- Discuss with the agencies of the national statistical system and universities to identify existing data sources and future data collection plans.
- Check the DECDG website
- Record data periodicity, quality, variables covered, sampling frame, and sample size for:
  - Censuses
  - Surveys (household, firm, facility, etc.)
  - Administrative data
  - Data from the project monitoring system
New data
- Start identifying additional data collection needs.
- Data for impact evaluation must be representative of the treatment and comparison groups
- Questionnaires must include outcomes of interest (consumption, income, assets, etc.), questions about the program in question and about other programs, as well as control variables
- The data might be at the household, community, firm, facility, or farm level and might be combined with specialty data, such as those from water or land quality tests.
- Investigate synergies with other projects to combine data collection efforts and/or explore existing data collection efforts on which the new data collection could piggyback
- Develop a data strategy for the impact evaluation, including:
  - the timing for data collection
  - the variables needed
  - the sample (including size)
  - plans to integrate data from other sources (e.g., project monitoring data)
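Deciding the sample size usually starts with a power calculation. Below is a minimal sketch using the standard normal-approximation formula for comparing two means with equal-sized arms, n = 2(z_(1-alpha/2) + z_power)^2 sd^2 / mde^2. The minimum detectable effect (`mde`) and standard deviation (`sd`) are inputs the team must supply, and the function name is hypothetical. The sketch ignores clustering, so a design effect should be applied if villages rather than households are randomized.

```python
import math
from statistics import NormalDist


def sample_size_per_group(mde, sd, alpha=0.05, power=0.8):
    """Units per arm to detect a difference in means of `mde` (two-sided test, equal arms).

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / mde^2
    Ignores clustering: inflate by a design effect if groups, not individuals, are randomized.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / mde ** 2)


print(sample_size_per_group(mde=0.2, sd=1.0))  # prints 393
```

For example, detecting a difference of 0.2 standard deviations at 5% significance with 80% power requires 393 households per arm; doubling the detectable effect cuts this roughly fourfold, to 99.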
Prepare for collecting data
- Identify the data collection agency
- The lead evaluator or team will work with the data collection agency to design the sample and train enumerators
- The lead evaluator or team will prepare the survey questionnaire or questionnaire module as needed
- Pre-testing of the survey instrument may take place at this stage to finalize instruments
- If financed with outside funds, the baseline can now go to the field. If financed by project funds, the baseline will go to the field just after effectiveness but before implementation starts
Develop a financial plan
- Costs:
  - lead evaluator and research/evaluation team,
  - data collection,
  - supervision, and
  - dissemination
- Finances:
  - BB,
  - trust fund,
  - research grants,
  - project funds, or
  - other donor funds
Housekeeping
- Initiate an IE activity. The IE code in SAP is a way of formalizing evaluation activities; it recognizes the evaluation as a separate AAA product.
  - Prepare concept note
  - Identify peer reviewers: impact evaluation and sector specialists
  - Carry out review process
- Appraisal documents
  - Include in the project description plans to modify the project over time to incorporate results
  - Work the impact evaluation into the M&E section of the PAD and Annex 3
  - Include the impact evaluation in the Quality Enhancement Review (TTL)
Ensure timely implementation
- Ensure timely procurement of evaluation services, especially contracting the data collection, and
- Supervise timely implementation of the evaluation, including:
  - data collection
  - data analysis
  - dissemination and feedback
Data collection agency/firm
- The data collection agency or firm must have technical knowledge and sufficient logistical capacity relative to the scale of data collection required
- The same agency or firm should be expected to do baseline and follow-up data collection (and use the same survey instrument)
Baseline data collection and analysis
- Baseline data collection should be carried out before program implementation begins; optimally, even before the program is announced
- Analysis of baseline data will provide program management with additional information that might help finalize program design
Follow-up data collection and analysis
- The timing of follow-up data collection must reflect the learning strategy adopted
- Early data collection will help modify programs mid-course to maximize longer-term effectiveness
- Later data collection will confirm achievement of longer-term outcomes and justify continued flows of fiscal resources into the program
Watch implementation closely from an evaluation point of view
- Watch (monitor) what is actually being implemented:
  - will help understand results of the evaluation
  - will help with timing of evaluation activities
- Watch for contamination in the control group
- Watch for violation of eligibility criteria
- Watch for other programs for the same beneficiaries
- Look for unintended impacts
- Look for unexploited evaluation opportunities
- Good evaluation team communication is key here
Dissemination
- Implement the plan for dissemination of evaluation results, ensuring that the timing is aligned with the government's decision-making cycle
- Ensure that results are used to inform project management and that available entry points are exploited to provide additional feedback to the government
- Ensure that wider dissemination takes place only after the client has had a chance to preview and discuss the results
- Nurture collaboration with local researchers throughout the process
Housekeeping
- Put in place arrangements to procure the impact evaluation work and fund it on time
- Use early results to inform the mid-term review
- Use later results to inform the ICR, CAS, and future operations
Summing up: practicalities
- Making evaluation work for you requires a change in the culture of project design and implementation, one that maximizes the use of learning to change course when necessary and improve the chances of success
- Impact evaluation is more than a tool: it is an organizing analytical framework for doing this. It is less about measuring success or failure than about learning…
Where to go for assistance / more information
- Clinics
- Brochure here, PREMTG resources
- Searchable database of evaluations
- Searchable roster of consultants
- Doing IE series: general and sector notes
- Website (http://impactevaluation)
- Courses: workshop on IE, WBI training, PAL course
- South Asia resources: Jishnu Das (12/06)