Evaluation in the Practice of Development


Evaluation in the Practice of Development
Lecture at the "Perspectives on Impact Evaluation" Conference, Cairo, 2009
Martin Ravallion, Development Research Group, World Bank

Further reading:
"Evaluation in the Practice of Development," World Bank Research Observer, Spring 2009.
"Should the Randomistas Rule?" Economists' Voice, Vol. 6, No. 2, 2009.

"Seeking truth from facts." In 1978, the Chinese Communist Party's 11th Congress broke with its ideology-based view of policy making in favor of a pragmatic approach, which Deng Xiaoping famously dubbed "feeling our way across the river." At its core was the idea that public action should be based on evaluations of experiences with different policies: "the intellectual approach of seeking truth from facts." In looking for facts, a high weight was put on demonstrable success in actual policy experiments on the ground. These were not randomized control trials (RCTs) and would not pass muster by modern standards, but the basic idea was right. The rural reforms that were then implemented nationally helped achieve probably the most dramatic reduction in the extent of poverty the world has yet seen.

1. We under-invest in evaluation: there are significant gaps between what we know and what we want to know about development effectiveness. These gaps stem from distortions in the market for knowledge.
2. Current evaluations fall short: standard approaches are not geared to addressing these distortions and the consequent knowledge gaps.
3. We can do better! Ten steps for more relevant impact evaluations.

Why are there persistent gaps in our knowledge about development effectiveness?

Knowledge market failures. Suppliers and demanders of knowledge meet, but:
- Asymmetric information about the quality of the evaluation. Development practitioners cannot easily assess the quality and expected benefits of an impact evaluation, to weigh against the costs. Short-cut, non-rigorous methods promise quick results at low cost, though users are rarely well informed of the inferential dangers.
- Externalities: benefits spill over to future development projects, but current individual projects hold the purse strings. The project manager will typically not take account of the external benefits when deciding how much to spend on evaluative research. The externalities are larger for some types of evaluation (the first of its kind; "clones" expected; more innovative designs).

Current research priorities are not in line with the needs of practitioners. The classic concern is with internal validity for the mean treatment effect on the treated, for an assigned program with no spillover effects. And internal validity is mainly judged by how well one has dealt with selection bias due to unobservables. This approach has severely constrained the relevance of impact evaluation to development policy making.
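To make the unobservables problem concrete, here is a minimal simulation sketch (all variable names and numbers are hypothetical, not from the lecture): participants with higher latent ability select into the program, so a naive comparison of participants and non-participants overstates the true mean impact on the treated.

```python
# Illustrative sketch (hypothetical numbers): selection on an unobservable
# inflates a naive comparison of participants and non-participants.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
ability = rng.normal(0, 1, n)          # latent attribute the evaluator never sees
true_effect = 2.0                      # true impact of the program

# People with higher latent ability are more likely to participate.
participate = rng.random(n) < 1 / (1 + np.exp(-ability))

y0 = 10 + 3 * ability + rng.normal(0, 1, n)   # outcome without the program
y1 = y0 + true_effect                          # outcome with the program
y = np.where(participate, y1, y0)

naive = y[participate].mean() - y[~participate].mean()
att = (y1 - y0)[participate].mean()
print(f"naive difference: {naive:.2f}   true effect on the treated: {att:.2f}")
# The naive gap folds in the ability difference, so it is well above 2.0.
```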

A bigger problem for some evaluations. It is far easier to evaluate an intervention that yields its likely impact within (say) one year than one that takes many years. No surprise, then, that credible evaluations of the longer-term impacts of (for example) infrastructure projects are rare. We know very little about the long-term impacts even of development projects that do deliver short-term gains, so future practitioners are often poorly informed about what works and what does not. The result is a bias in our knowledge, favoring projects with well-defined beneficiaries that yield quick results.

Are social experiments the solution? There is no obvious reason why doing more social experiments would help correct for the distortions that generated those knowledge gaps.
- Randomization is only feasible for a non-random subset of policies and settings. It is rarely feasible to randomize the location of infrastructure projects and related programs, which are core activities in almost any poor country's development strategy.
- And when randomization is a relevant tool, it will be adopted more readily in some settings than others. Ethical and political concerns loom large, stemming from the fact that some of those to whom a program is randomly assigned will almost certainly not need it, while some in the control group will.
A better idea: randomize what gets evaluated rigorously, and then choose a method appropriate to each sampled intervention, with randomization as one option.

Dissemination and publication are a crucial but weak link in the learning process.
- Externalities again: the costs of completing the cycle of knowledge generation are borne now, while the benefits accrue to others.
- Prior beliefs (based on past publications) seem to get too high a weight in the review process. Reviewers, who are often past contributors, tend to overstate the precision of past knowledge (priors are over-weighted in the Bayesian sense).
- Academic objectives dominate: the emphasis is on internal validity even if the question addressed is of little relevance to filling pressing knowledge gaps about development effectiveness.

Researcher incentives. The incentives of researchers are not well aligned with prevailing knowledge gaps.
- Choice of topic and of what to report: with multiple outcomes comes selective reporting. If you have enough outcome variables you will eventually have something to report, even if there is no real impact.
- Replication is treated as a second-class activity.
- Incentives to make data public are weak, as are the coverage and enforcement of journal data policies.

Rising donor interest. There is now a broader awareness of the problems faced when trying to do evaluations, including the age-old problem of identifying causal impacts. This has made donors less willing to fund weak proposals for evaluations that are unlikely to yield reliable knowledge about development effectiveness. However, the resources do not always go to rigorous evaluations. Nor are the extra resources having as much impact as they could on the incentives facing project managers, governments and researchers. Donor support needs to focus on increasing the marginal private benefits from evaluation, or on reducing the marginal costs.

Can we do better? Ten steps to more policy-relevant impact evaluations.

1. Start with a policy-relevant question and be eclectic about methods. Policy-relevant evaluation must start with interesting and important questions. Instead, many evaluators start with a preferred method and look for questions that can be addressed with that method. By constraining evaluative research to situations in which one favorite method is feasible, research may exclude many of the most important and pressing development questions.

Standard methods don't address all the policy-relevant questions.
- What is the relevant counterfactual? A "do nothing" counterfactual is rare; how do we identify a more relevant one? (Example from workfare programs in India: a do-nothing counterfactual versus an alternative policy.)
- What are the relevant parameters to estimate? The mean versus poverty (the marginal distribution); the average versus the marginal impact; the joint distribution of Y_T and Y_C, especially if some participants are worse off: the ATE only gives the net gain for participants. Policy effects versus structural parameters.
- What are the lessons for scaling up? Why did the program have (or not have) impact?
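The point about the joint distribution of Y_T and Y_C can be illustrated with a small, hypothetical simulation (the numbers are invented): a clearly positive average treatment effect can coexist with a large share of participants who are made worse off, something the mean alone cannot reveal.

```python
# Hypothetical illustration: a positive mean impact can hide many losers.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
gain = rng.normal(1.0, 2.0, n)   # individual impacts Y_T - Y_C, heterogeneous

ate = gain.mean()
share_worse_off = (gain < 0).mean()
print(f"ATE = {ate:.2f}, but {share_worse_off:.0%} of participants lose")
# Knowing the joint distribution of (Y_T, Y_C), not just the means,
# is what lets us see the losers.
```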

2. Take the ethical objections and political sensitivities seriously; policy makers do! Pilots (using NGOs) can often get away with methods not acceptable to governments accountable to voters: deliberately denying a program to those who need it and providing it to some who do not.
- Yes, there are too few resources to go around. But is randomization the fairest solution to limited resources?
- What does one condition on in conditional randomizations?
Intention-to-treat helps alleviate these concerns: randomize assignment, but leave people free not to participate. Even then, the "randomized out" group may include people in great need. The information available to the evaluator (for conditioning impacts) is only a subset of the information available "on the ground" (including to voters).
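A brief sketch of the intention-to-treat logic under imperfect take-up (illustrative numbers only): compare outcomes by random assignment, then rescale by the take-up gap to recover the effect on those who actually participate (the standard Wald/IV calculation).

```python
# Illustrative sketch of intention-to-treat under imperfect take-up.
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
assigned = rng.random(n) < 0.5                 # random offer of the program
takes_up = assigned & (rng.random(n) < 0.6)    # only 60% of those offered participate

effect = 2.0
y = 5 + effect * takes_up + rng.normal(0, 1, n)

itt = y[assigned].mean() - y[~assigned].mean()                        # effect of being offered
take_up_gap = takes_up[assigned].mean() - takes_up[~assigned].mean()  # control take-up is zero here
print(f"ITT = {itt:.2f}, implied effect on participants = {itt / take_up_gap:.2f}")
```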

3. Take a comprehensive approach to the sources of bias. There are two sources of selection bias: observables and unobservables (to the evaluator), i.e., participants have latent attributes that yield higher or lower outcomes. Some economists have become obsessed with the latter bias while ignoring innumerable other biases and problems:
- less-than-ideal methods of controlling for observable heterogeneity, including ad hoc (linear, parametric) models of outcomes; there is evidence that we have given too little attention to selection bias based on observables;
- arbitrary preferences for one conditional independence assumption (exclusion restrictions) over another (conditional exogeneity of placement).
We cannot scientifically judge the appropriate assumptions and methods independently of program, setting and data.
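The observables point can be illustrated with a hedged simulation (functional forms and numbers are invented): even with no selection on unobservables, an ad hoc linear outcome model mis-corrects for observable heterogeneity when the outcome depends on the covariate non-linearly, whereas a more flexible adjustment, here coarse matching within bins of the covariate, does much better.

```python
# Illustrative sketch: selection is purely on an observable x, but the outcome
# depends on x non-linearly, so a linear regression adjustment is biased while
# matching within narrow bins of x recovers the true effect (1.0 here).
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.uniform(0, 2, n)
treated = rng.random(n) < (x / 2) ** 2            # higher x -> more likely treated
y = 1 + 4 * x ** 2 + 1.0 * treated + rng.normal(0, 1, n)

# Ad hoc linear adjustment: regress y on treatment and x (misspecified in x).
X = np.column_stack([np.ones(n), treated, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"linear-adjustment estimate: {beta[1]:.2f}")   # typically well above 1.0

# Flexible alternative: compare treated and controls within narrow bins of x.
bins = np.digitize(x, np.linspace(0, 2, 41))
diffs, weights = [], []
for b in np.unique(bins):
    m = bins == b
    if treated[m].any() and (~treated[m]).any():
        diffs.append(y[m & treated].mean() - y[m & ~treated].mean())
        weights.append(treated[m].sum())
print(f"matching-style estimate:    {np.average(diffs, weights=weights):.2f}")
```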

4. Do a better job on spillover effects. Are there hidden impacts on non-participants? Spillover effects can stem from markets, the behavior of participants and non-participants, and the behavior of intervening agents (governmental or NGO).
Example 1: an Employment Guarantee Scheme is an assigned program, but there is no valid comparison group if the program works the way it is intended to work.
Example 2: in the Southwest China Poverty Reduction Program, displacement of local government spending in treatment villages meant that benefits went to the control villages, leading to substantial underestimation of impact. A model implies that the true double difference (DD) is about 1.5 times the empirical DD; the key conclusions on long-run impact were robust in this case.
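A hypothetical sketch of the spillover problem in Example 2 (numbers invented to mimic the 1.5 factor): when displaced spending benefits the comparison villages, the empirical double difference understates the true gain to treatment villages.

```python
# Hypothetical sketch: spillovers to "control" villages shrink the measured
# double difference (DD) relative to the true impact.
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
treated = rng.random(n) < 0.5

true_gain = 3.0
spillover_to_controls = 1.0     # displaced spending benefits comparison villages

baseline = 20 + rng.normal(0, 2, n)
followup = (baseline + 1.0                       # common time trend
            + true_gain * treated
            + spillover_to_controls * (~treated)
            + rng.normal(0, 2, n))

change = followup - baseline
dd = change[treated].mean() - change[~treated].mean()
print(f"empirical DD = {dd:.2f}, true impact = {true_gain:.2f}")
# Here the empirical DD is about 2.0 while the true gain is 3.0
# (true impact = 1.5 x measured DD).
```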

5. Take a sectoral approach, recognizing fungibility and flypaper effects. You are not in fact evaluating what the extra public resources (including aid) actually financed, so your evaluation may be deceptive about the true impact of those resources. With flypaper effects, impacts may well be found largely within the "sector." Example from a Vietnam roads project: fungibility within the transport sector, but a flypaper effect at the sector level. Hence the need for a broad sectoral approach.
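A toy budget calculation (all figures hypothetical) illustrating fungibility: the donor earmarks money for roads, but much of the marginal spending shows up elsewhere, so evaluating only the earmarked project misses what the aid actually financed.

```python
# Hypothetical budget arithmetic for fungibility.
pre_aid  = {"roads": 100, "schools": 80, "other": 60}   # planned budget without aid
aid_to_roads = 50                                        # donor earmarks 50 for roads
post_aid = {"roads": 110, "schools": 95, "other": 85}    # observed budget with aid

marginal = {k: post_aid[k] - pre_aid[k] for k in pre_aid}
print(f"aid earmarked for roads: {aid_to_roads}, marginal spending: {marginal}")
# Only 10 of the 50 shows up as extra road spending; the rest finances schools
# and other items, so evaluating "the road project" alone can be deceptive
# about what the aid actually bought.
```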

6. Fully explore impact heterogeneity. Impacts will vary with participant characteristics (including those not observed by the evaluator) and with context.
- Participant heterogeneity: interaction effects; also "essential heterogeneity," in which impacts vary with participants' responses (Heckman, Urzua and Vytlacil).
- Implications for: evaluation methods (the local instrumental variables estimator); project design, and even whether the project can have any impact (example from China's SWPRP); and external validity (generalizability).
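A minimal illustration of interaction effects (hypothetical setup): the program only works for households with a particular characteristic, so the overall mean impact conceals both the success and the failure.

```python
# Hypothetical sketch: the mean impact hides an interaction with a
# participant characteristic (e.g., landholding, schooling, local context).
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
characteristic = rng.random(n) < 0.5           # e.g., household has some land
treated = rng.random(n) < 0.5

impact = np.where(characteristic, 4.0, 0.0)    # program only works for one group
y = 10 + impact * treated + rng.normal(0, 1, n)

overall = y[treated].mean() - y[~treated].mean()
by_group = [y[treated & (characteristic == g)].mean()
            - y[~treated & (characteristic == g)].mean() for g in (True, False)]
print(f"overall impact = {overall:.1f}; with the characteristic = {by_group[0]:.1f}, "
      f"without = {by_group[1]:.1f}")
```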

Impact heterogeneity, continued: contextual heterogeneity. "In certain settings anything works; in others everything fails." Local institutional factors shape development impact. Example of Bangladesh's Food-for-Education program: the same program works well in one village but fails hopelessly nearby.

7. Take "scaling up" seriously. With scaling up:
- Inputs change. Entry effects: the nature and composition of those who "sign up" changes with scale; migration responses.
- The intervention changes: resource effects on the intervention.
- Outcomes change: lags in outcome responses; market responses (partial-equilibrium assumptions are fine for a pilot but not when scaled up); social and political-economy effects, early versus late capture.
Yet there is little work on external validity and scaling up.

Examples of external invalidity: scaling up from randomized pilots. The people normally attracted to a program do not have the same characteristics as those randomly assigned, and impacts vary with those characteristics, giving "randomization bias" (Heckman and Smith). We have then evaluated a different program from the one that actually gets implemented nationally!
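A hedged simulation of randomization bias (the "need" variable and the take-up rule are invented): the randomized pilot recovers the mean impact over the eligible population, but the self-selected participants of the scaled-up program gain more, so the pilot describes a different program from the one implemented nationally.

```python
# Hypothetical sketch of randomization bias: those who sign up at scale differ
# from the randomly assigned pilot participants, and impacts vary with the
# characteristic that drives sign-up.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
need = rng.uniform(0, 1, n)            # e.g., depth of poverty
impact = 5 * need                      # program helps the needy most

pilot_assigned = rng.random(n) < 0.5   # randomized pilot over the eligible population
scaled_signup = rng.random(n) < need   # self-selection at scale

print(f"pilot estimate of mean impact:   {impact[pilot_assigned].mean():.2f}")
print(f"mean impact among actual takers: {impact[scaled_signup].mean():.2f}")
# The pilot recovers the population mean (about 2.5); the scaled-up program's
# self-selected participants gain more (about 3.3).
```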

8. Understand what determines impact.
- Replication across differing contexts. Example of Bangladesh's FFE: inequality and other conditions within the village shape the outcomes of the program. Implications for sample design: a trade-off between the precision of overall impact estimates and the ability to explain impact heterogeneity.
- Intermediate indicators. Example of China's SWPRP: a small impact on consumption poverty, but a large share of the gains were saved.
- Qualitative research and mixed methods. Test the assumptions ("theory-based evaluation"), though this is a poor substitute for assessing impacts on final outcomes.
In understanding impact, step 9 is key.

9. Don't reject theory and structural models. Standard evaluations are "black boxes": they give policy effects in specific settings but not the structural parameters relevant to other settings. Structural methods allow us to simulate changes in program design or setting. Assumptions are needed, but the same is true of black-box social experiments; that is the role of theory. PROGRESA example (Attanasio et al.; Todd and Wolpin): modeling schooling choices, using the randomized assignment for identification, suggests that a budget-neutral switch from the primary to the secondary subsidy would increase impact.

10. Develop capabilities for evaluation within developing countries. Strive for a culture of evidence-based evaluation practice (the China example: "seeking truth from facts" plus the role of research). Evaluation is a natural addition to the roles of the government's sample survey unit: independence and integrity should already be in place, though connectivity to other public agencies may be a bigger problem. Sometimes a private evaluation capability will be required.

There are significant gaps between what we know and what we want to know about development effectiveness. These gaps stem from distortions in the market for knowledge. Standard approaches to impact evaluation are not geared to addressing these distortions and consequent knowledge gaps. We can do better!