Evaluation in the Practice of Development


1 Evaluation in the Practice of Development
Lecture at the “Perspectives on Impact Evaluation” Conference, Cairo, 2009.
Martin Ravallion, Development Research Group, World Bank.
Further reading:
“Evaluation in the Practice of Development,” World Bank Research Observer, Spring 2009.
“Should the Randomistas Rule?” Economists’ Voice, Vol. 6, No.

2 “Seeking truth from facts.”
In 1978, the Chinese Communist Party’s 11th Congress broke with its ideology-based view of policy making in favor of a pragmatic approach, which Deng Xiaoping famously dubbed “feeling our way across the river.” At its core was the idea that public action should be based on evaluations of experiences with different policies—“the intellectual approach of seeking truth from facts.” In looking for facts, a high weight was put on demonstrable success in actual policy experiments on the ground. Not Randomized Control Trials (RCTs); would not pass muster by modern standards. But the basic idea was right. The rural reforms that were then implemented nationally helped achieve probably the most dramatic reduction in the extent of poverty the world has yet seen.

3 2. Current evaluations fall short
1. We under-invest in evaluation
There are significant gaps between what we know and what we want to know about development effectiveness. These gaps stem from distortions in the market for knowledge.
2. Current evaluations fall short
Standard approaches are not geared to addressing these distortions and consequent knowledge gaps.
We can do better! Ten steps for more relevant impact evaluations.

4 Why are there persistent gaps in our knowledge about development effectiveness?

5 Knowledge market failures
Suppliers and demanders of knowledge meet, but:
Asymmetric information about the quality of the evaluation. Development practitioners cannot easily assess the quality and expected benefits of an impact evaluation, to weigh against the costs. Short-cut, non-rigorous methods promise quick results at low cost, though users are rarely well informed of the inferential dangers.
Externalities: benefits spill over to future development projects, but current individual projects hold the purse strings. The project manager will typically not take account of the external benefits when deciding how much to spend on evaluative research. Externalities are larger for some types of evaluation (the first of its kind; “clones” expected; more innovative designs).

6 Current research priorities are not in line with the needs of practitioners
Classic concern is with internal validity for mean treatment effect on the treated for an assigned program with no spillover effects. And internal validity is mainly judged by how well one has dealt with selection bias due to unobservables. This approach has severely constrained the relevance of impact evaluation to development policy making.

7 Bigger problem for some evaluations
Far easier to evaluate an intervention that yields its likely impact within (say) one year than one that takes many years. No surprise that credible evaluations of the longer-term impacts of (for example) infrastructure projects are rare. We know very little about the long-term impacts of development projects that do deliver short-term gains; so future practitioners are often poorly informed about what works and what does not. This biases our knowledge toward projects with well-defined beneficiaries that yield quick results.

8 Are social experiments the solution?
No obvious reason why doing more social experiments would help correct for the distortions that generated those knowledge gaps.
Randomization is only feasible for a non-random sub-set of policies and settings. It is rarely feasible to randomize the location of infrastructure projects and related programs, which are core activities in almost any poor country’s development strategy.
And when randomization is a relevant tool, it will be adopted more readily in some settings than others. Ethical and political concerns loom large—stemming from the fact that some of those to whom a program is randomly assigned will almost certainly not need it, while some in the control group will.
Better idea: randomize what gets evaluated rigorously, and then choose a method appropriate to each sampled intervention, with randomization as one option.

9 Dissemination/publication is a crucial but weak link in the learning process
Externalities again: high current costs of completing the cycle of knowledge generation, with benefits accruing to others.
Prior beliefs (based on past publications) seem to get too high a weight in the review process. Reviewers (often past contributors) tend to overstate the precision of past knowledge—priors are overweighted in the Bayesian sense.
Academic objectives dominate: emphasis on internal validity even if the question addressed is of little relevance to filling pressing knowledge gaps about development effectiveness.

10 Researcher incentives
The incentives of researchers are not well aligned with prevailing knowledge gaps:
Choice of topic and what to report: multiple outcomes, with selective reporting. If you have enough outcome variables you will eventually have something to report, even if there is no real impact.
Replication is treated as a second-class citizen.
Weak incentives to make data public, and weak coverage and enforcement of journal data policies.
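A minimal simulation (hypothetical data, not from any actual evaluation) of the selective-reporting point: for a program with zero true impact on every one of 20 outcomes, some outcome will still cross the 5% significance threshold in roughly two-thirds of trials.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sims = 100, 20, 2000              # sample size per arm, outcomes, replications
any_signif = 0
for _ in range(sims):
    treat = rng.normal(size=(n, K))     # zero true impact on every outcome
    ctrl = rng.normal(size=(n, K))
    z = (treat.mean(0) - ctrl.mean(0)) / np.sqrt(2.0 / n)   # z-stat per outcome
    any_signif += (np.abs(z) > 1.96).any()
rate = any_signif / sims
print(f"P(at least one 'significant' outcome) = {rate:.2f}")  # about 1 - 0.95**20 ~ 0.64
```

With independent outcomes the chance of at least one false positive is 1 − 0.95^20 ≈ 0.64, which is why pre-registered outcomes and multiple-testing corrections matter.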

11 Rising donor interest
There is now a broader awareness of the problems faced when trying to do evaluations, including the age-old problem of identifying “causal” impacts. This has helped make donors less willing to fund weak proposals for evaluations that are unlikely to yield reliable knowledge about development effectiveness. However, the resources do not always go to rigorous evaluations. Nor are the extra resources having as much impact as they could on the incentives facing project managers, governments and researchers. Donor support needs to focus on increasing the marginal private benefits from evaluation, or reducing the marginal costs.

12 Ten steps to more policy-relevant impact evaluations
Can we do better? Ten steps to more policy-relevant impact evaluations

13 1: Start with a policy-relevant question and be eclectic on methods
Policy-relevant evaluation must start with interesting and important questions. But many evaluators instead start with a preferred method and look for questions that can be addressed with that method. By constraining evaluative research to situations in which one favorite method is feasible, research may exclude many of the most important and pressing development questions.

14 Standard methods don’t address all the policy-relevant questions
What is the relevant counterfactual? “Do nothing” is rare; but how to identify a more relevant counterfactual? Example from workfare programs in India (do-nothing counterfactual vs. an alternative policy).
What are the relevant parameters to estimate? Mean vs. poverty (marginal distribution); average vs. marginal impact; the joint distribution of Y_T and Y_C, especially if some participants are worse off—the ATE only gives the net gain for participants; policy effects vs. structural parameters.
What are the lessons for scaling up? Why did the program have (or not have) impact?
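A short sketch (with hypothetical parameters) of why the ATE can hide losers: if individual gains Y_T − Y_C are heterogeneous with mean 0.5, the average impact looks solidly positive even while a substantial minority of participants are made worse off.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y_c = rng.normal(0.0, 1.0, n)       # counterfactual (untreated) outcome
gain = rng.normal(0.5, 1.0, n)      # heterogeneous individual gains, mean 0.5
y_t = y_c + gain                    # treated outcome

ate = (y_t - y_c).mean()            # average treatment effect: the usual target
share_worse = (y_t < y_c).mean()    # what the joint distribution reveals
print(f"ATE = {ate:.2f}, share of participants made worse off = {share_worse:.2f}")
```

Here about 31% of participants lose (the probability a N(0.5, 1) gain is negative), a fact invisible from the marginal distributions alone; recovering it requires assumptions about the joint distribution of Y_T and Y_C.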

15 2. Take the ethical objections and political sensitivities seriously; policy makers do!
Pilots (using NGOs) can often get away with methods not acceptable to governments accountable to voters: deliberately denying a program to those who need it and providing the program to some who do not.
Yes, there are too few resources to go around. But is randomization the fairest solution to limited resources? And what does one condition on in conditional randomizations?
Intention-to-treat helps alleviate these concerns => randomize assignment, but leave people free to not participate. But even then, the “randomized out” group may include people in great need. The information available to the evaluator (for conditioning impacts) is a partial subset of the information available “on the ground” (incl. to voters).
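A sketch of the intention-to-treat idea under assumed (hypothetical) parameters: assignment is randomized but take-up is voluntary and selective on an unobservable, so comparing actual participants to non-participants is contaminated, while comparing by assignment recovers the effect scaled by the take-up rate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
u = rng.normal(size=n)                         # latent need, unseen by the evaluator
assigned = rng.integers(0, 2, n).astype(bool)  # random assignment (intention to treat)
takeup = assigned & (u > 0)                    # free to refuse: only some assigned join
y = u + 1.0 * takeup + rng.normal(size=n)      # true participation effect = 1.0

# ITT: compare by random assignment -> unbiased for (effect x take-up rate)
itt = y[assigned].mean() - y[~assigned].mean()
# "As-treated": compare actual participants -> contaminated by selection on u
as_treated = y[takeup].mean() - y[~takeup].mean()
print(f"ITT = {itt:.2f} (take-up among assigned = {takeup[assigned].mean():.2f}), "
      f"as-treated = {as_treated:.2f}")
```

With a 50% take-up rate the ITT is about 0.5 (half the true effect of 1.0), while the as-treated contrast is roughly double the truth because high-u people select in.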

16 3. Taking a comprehensive approach to the sources of bias
Two sources of selection bias: observables and unobservables (to the evaluator)—i.e., participants have latent attributes that yield higher/lower outcomes.
Some economists have become obsessed with the latter bias, while ignoring innumerable other biases/problems, including less-than-ideal methods of controlling for observable heterogeneity, such as ad hoc (linear, parametric) models of outcomes. There is evidence that we have given too little attention to the problem of selection bias based on observables.
Arbitrary preferences for one conditional independence assumption (exclusion restrictions) over another (conditional exogeneity of placement).
We cannot scientifically judge appropriate assumptions/methods independently of program, setting and data.
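A small illustration (fabricated data) of bias from selection on observables handled badly: placement depends only on an observable x, so there is no unobservable-selection problem at all, yet an ad hoc linear-in-x regression is badly biased because the outcome is nonlinear in x; exact stratification on x recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60_000
x = rng.integers(0, 3, n)                      # an observable, e.g. a village size class
p = np.array([0.1, 0.2, 0.8])[x]               # program placement nonlinear in x
d = rng.random(n) < p                          # selection on observables only
y = x.astype(float) ** 2 + 1.0 * d + rng.normal(size=n)   # true impact = 1.0

# Ad hoc linear control for x: biased, because the x**2 term is omitted
X = np.column_stack([np.ones(n), d, x])
tau_linear = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Exact stratification on x (matching on the observable): unbiased here
diffs = [y[(x == v) & d].mean() - y[(x == v) & ~d].mean() for v in range(3)]
weights = [((x == v) & d).sum() for v in range(3)]
tau_strat = np.average(diffs, weights=weights)
print(f"linear control = {tau_linear:.2f}, stratified = {tau_strat:.2f}")
```

The linear-in-x regression overstates the impact by over a third in this design, even though "selection on unobservables" is absent by construction: the control for observables, not the identifying assumption, is what failed.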

17 4. Do a better job on spillover effects
Are there hidden impacts for non-participants? Spillover effects can stem from: markets; the behavior of participants/non-participants; the behavior of intervening agents (governmental/NGO).
Example 1: Employment Guarantee Scheme—an assigned program, but no valid comparison group if the program works the way it is intended to work.
Example 2: Southwest China Poverty Reduction Program—displacement of local government spending in treatment villages => benefits go to the control villages => substantial underestimation of impact. The model implies that the true DD is 1.5 x the empirical DD. Key conclusions on long-run impact were robust in this case.
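The displacement mechanism can be shown with stylized numbers (chosen here only to reproduce the 1.5 ratio stated above; they are not the SWPRP estimates): if displaced spending raises control-village outcomes, the double difference nets out part of the true gain.

```python
# All figures are hypothetical, in arbitrary outcome units.
true_gain_treated = 1.5       # project's true impact in treated villages
spillover_to_controls = 0.5   # displaced local spending raises control outcomes
common_trend = 2.0            # growth both groups would have had anyway

# Empirical DD: (treated change) minus (control change); the common trend
# cancels, but the spillover is wrongly subtracted from the impact.
dd_empirical = (true_gain_treated + common_trend) - (spillover_to_controls + common_trend)
dd_true = true_gain_treated
print(dd_true / dd_empirical)   # 1.5: true DD is 1.5 x the empirical DD
```

The DD design removes the common trend as intended, but any effect on the comparison group—positive or negative—is silently folded into the estimate.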

18 5. Take a sectoral approach, recognizing fungibility/flypaper effects
Fungibility: you may not in fact be evaluating what the extra public resources (incl. aid) actually financed, so your evaluation can be deceptive about the true impact of those resources.
Flypaper effects: impacts may well be found largely within the “sector”. Example from a Vietnam roads project: fungibility within the transport sector, but a flypaper effect at the sector level.
Hence the need for a broad sectoral approach.

19 6. Fully explore impact heterogeneity
Impacts will vary with participant characteristics (including those not observed by the evaluator) and with context.
Participant heterogeneity: interaction effects; also “essential heterogeneity”, whereby impacts covary with participation responses (Heckman-Urzua-Vytlacil).
Implications for: evaluation methods (the local instrumental variables estimator); project design, and even whether the project can have any impact (example from China’s SWPRP); external validity (generalizability) =>

20 Impact heterogeneity (cont.)
Contextual heterogeneity “In certain settings anything works, in others everything fails” Local institutional factors in development impact Example of Bangladesh’s Food-for-Education program Same program works well in one village, but fails hopelessly nearby

21 7. Take “scaling up” seriously
With scaling up:
Inputs change. Entry effects: the nature and composition of those who “sign up” changes with scale; migration responses.
Intervention changes. Resource effects on the intervention.
Outcomes change. Lags in outcome responses; market responses (partial-equilibrium assumptions are fine for a pilot but not when scaled up); social effects/political-economy effects; early vs. late capture.
But there is little work on external validity and scaling up.

22 Examples of external invalidity: Scaling up from randomized pilots
The people normally attracted to a program do not have the same characteristics as those randomly assigned, and impacts vary with those characteristics => “randomization bias” (Heckman and Smith). We have then evaluated a different program from the one that actually gets implemented nationally!
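A stylized sketch of randomization bias under assumed parameters (the entry rules and impact function below are hypothetical): impacts rise with an individual characteristic, the pilot draws unusually high values of it, and the scaled-up program draws a broader population, so the pilot estimate overstates impact at scale.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
need = rng.normal(size=n)          # characteristic that drives impact
impact = 1.0 + 0.8 * need          # heterogeneous impacts

# Pilot: randomization among volunteers, who are disproportionately high-need
volunteers = need > 0.5
pilot_estimate = impact[volunteers].mean()   # internally valid for this group

# Scaled-up program: a broader entry process draws a different population
national = need > -0.5                       # hypothetical entry rule at scale
scale_impact = impact[national].mean()
print(f"pilot = {pilot_estimate:.2f}, scaled-up = {scale_impact:.2f}")
```

The pilot estimate is internally valid, but for a population that no longer exists once the program scales up: external validity fails even though the randomization was flawless.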

23 8. Understand what determines impact
Replication across differing contexts. Example of Bangladesh’s FFE: within-village inequality etc. shaped program outcomes. Implications for sample design => a trade-off between the precision of overall impact estimates and the ability to explain impact heterogeneity.
Intermediate indicators. Example of China’s SWPRP: small impact on consumption poverty, but a large share of the gains were saved.
Qualitative research/mixed methods: test the assumptions (“theory-based evaluation”), but a poor substitute for assessing impacts on final outcomes.
In understanding impact, Step 9 is key =>

24 9. Don’t reject theory and structural models
Standard evaluations are “black boxes”: they give policy effects in specific settings but not structural parameters (as relevant to other settings). Structural methods allow us to simulate changes in program design or setting. However, assumptions are needed. (The same is true for black box social experiments.) That is the role of theory. PROGRESA example (Attanasio et al.; Todd & Wolpin) Modeling schooling choices using randomized assignment for identification Budget-neutral switch from primary to secondary subsidy would increase impact

25 10. Develop capabilities for evaluation within developing countries
Strive for a culture of evidence-based evaluation practice. China example: “seeking truth from facts” + the role of research.
Evaluation is a natural addition to the roles of the government’s sample survey unit: independence/integrity should already be in place, though connectivity to other public agencies may be a bigger problem. Sometimes a private evaluation capability will be required.

26 There are significant gaps between what we know and what we want to know about development effectiveness. These gaps stem from distortions in the market for knowledge. Standard approaches to impact evaluation are not geared to addressing these distortions and consequent knowledge gaps. We can do better!

