What could go wrong?
Deon Filmer, Development Research Group, The World Bank
Evidence-Based Decision-Making in Education Workshop
Africa Program for Education Impact Evaluation (APEIE)
Accra, Ghana, May

What could go wrong? A lot!
Threats to internal and external validity:
– Randomization is undermined
– Hawthorne effect
– Spillover effects
– Attrition
– Non-compliance
– Pilot-phase “startup” problems
– …
Let’s just focus on a few areas.

Just a reminder
– Internal validity: the extent to which the evaluation is indeed estimating the parameter of interest (e.g., the impact of a scholarship program on attendance).
– External validity: the extent to which the evaluation provides relevant information about the likely effectiveness of the program in a different setting, or if implemented at a different scale.

Internal validity
The extent to which the evaluation is indeed estimating the parameter of interest: are you estimating what you think you are?
Some issues:
– Attrition
– Spillover effects
– Partial compliance and sample selection bias

Internal validity: Attrition
Attrition is a generic problem in data analysis: some people drop out of the analysis or the sample.
Is this a problem in impact evaluation? Yes, if the attrition is related to the intervention or its likely impact.

Attrition bias: An example
Study: impact of a scholarship program on learning achievement
– Baseline = students in grade 5
– Intervention = offer of a scholarship for grade 6
– Follow-up = test scores of students at the end of grade 6
Impact evaluation question: did the scholarship program improve test scores?
Can you see a problem with this setup? How might you go about addressing any potential bias?

Addressing attrition bias
Check that attrition rates do not differ between the treatment and control groups, and that attrition is not correlated with observables.
Try to bound the extent of the bias. For example:
– Recalculate the impact assuming everyone who dropped out of the treatment group got the lowest score that anyone got.
– Recalculate the impact assuming everyone who dropped out of the control group got the highest score that anyone got.
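
As a rough illustration of this bounding exercise, here is a minimal Python sketch. The data and all column names (`treat`, `score`) are hypothetical: it first checks whether attrition rates differ by arm, then recomputes the impact under the two extreme imputations described above.

```python
import numpy as np
import pandas as pd

# Hypothetical follow-up data (names are illustrative): `treat` is the random
# assignment (1 = offered the scholarship), `score` is the grade-6 test score,
# NaN where the student attrited before follow-up testing.
rng = np.random.default_rng(0)
df = pd.DataFrame({"treat": np.repeat([1, 0], 500),
                   "score": rng.normal(50, 10, 1000)})
df.loc[df.sample(frac=0.1, random_state=1).index, "score"] = np.nan  # simulate attrition

# Step 1: is attrition differential across arms?
print(df["score"].isna().groupby(df["treat"]).mean())

# Step 2: worst-case bounds, imputing extreme observed scores for attriters.
lo, hi = df["score"].min(), df["score"].max()

def impact(scores):
    return scores[df["treat"] == 1].mean() - scores[df["treat"] == 0].mean()

# Lower bound: treated attriters get the lowest observed score,
# control attriters get the highest observed score.
pessimistic = df["score"].fillna(
    pd.Series(np.where(df["treat"] == 1, lo, hi), index=df.index))
# Upper bound: the reverse imputation.
optimistic = df["score"].fillna(
    pd.Series(np.where(df["treat"] == 1, hi, lo), index=df.index))

print("Impact bounds (lower, upper):", impact(pessimistic), impact(optimistic))
```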

Internal validity: Spillover effects / externalities
Spillover effects (externalities) occur when people other than the target population are affected by the intervention.
How is this a problem?

Spillover effects / externalities: Example
A teacher training program affects all teachers in a school, not just those who were randomly selected to go through the training.

Addressing spillover effects / externalities
The main way to address the problem is to randomize in such a way as to encompass the externalities.
– E.g., randomize the training at the level of the school, so that all teachers in a school are assigned together.
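
A minimal sketch of what “randomize to encompass externalities” could look like in practice, assuming a hypothetical teacher roster with a `school_id` column: assignment is drawn at the school (cluster) level, so every teacher in a school lands in the same arm.

```python
import numpy as np
import pandas as pd

# Hypothetical teacher roster (names are illustrative): one row per teacher,
# grouped into schools via `school_id`.
teachers = pd.DataFrame({"teacher_id": range(400),
                         "school_id": np.repeat(range(40), 10)})

# Randomize at the school (cluster) level, so that spillovers among teachers
# within a school stay inside a single experimental arm.
rng = np.random.default_rng(42)
schools = teachers["school_id"].unique()
treated_schools = rng.choice(schools, size=len(schools) // 2, replace=False)

teachers["treat"] = teachers["school_id"].isin(treated_schools).astype(int)
print(teachers.groupby("treat")["school_id"].nunique())  # 20 schools in each arm
```

The design choice here is that the unit of randomization is the school, not the teacher; the analysis then needs to account for that clustering (e.g., cluster-robust standard errors).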

Internal validity: Partial compliance and sample selection bias
Occurs when the treatment and control groups are not comparable, either
– because the randomization “didn’t work”, or
– because the study population’s behavior “undermined” the randomization.

Randomization “didn’t work”
The validity of randomization as a way to ensure comparable groups depends on sufficiently large samples. In any single application the samples could differ: average characteristics might differ between the treatment and control groups.
How might you address this in the analysis?
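
One common way to check this is a baseline balance test: compare baseline means between the arms. Below is a minimal sketch with hypothetical variable names and synthetic data; it is illustrative rather than exhaustive.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical baseline data (column names are illustrative) with the random
# assignment already drawn.
rng = np.random.default_rng(7)
df = pd.DataFrame({"treat": rng.integers(0, 2, 800),
                   "baseline_score": rng.normal(45, 12, 800),
                   "age": rng.normal(11, 1.5, 800)})

# Compare baseline means across arms; large, statistically significant gaps
# suggest this particular draw produced unbalanced groups.
for var in ["baseline_score", "age"]:
    t = df.loc[df["treat"] == 1, var]
    c = df.loc[df["treat"] == 0, var]
    _, p = stats.ttest_ind(t, c, equal_var=False)
    print(f"{var}: treat={t.mean():.2f}, control={c.mean():.2f}, p={p:.3f}")
```

If meaningful imbalance turns up, one common response is to control for the unbalanced baseline covariates in the impact regression (or, if the design still allows it, to re-draw the assignment before the program starts).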

Behavior “undermined” the randomization
For example:
– Example 1: students who were offered a scholarship drop out after one year (and could therefore be considered “control” children).
– Example 2: students from non-recipient schools move to schools that were randomly chosen to receive a grant.
How might you address this in the analysis?

Addressing partial compliance and sample selection bias
Use the “Intent To Treat” (ITT) approach: frame the evaluation in terms of the original design. For example:
– Example 1: study the impact of offering a scholarship on outcomes.
– Example 2: study the impact of the grants program in terms of the school students were originally enrolled in.
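
A minimal sketch of an ITT estimate, using hypothetical variable names and synthetic data: the outcome is regressed on the original offer, regardless of whether the offer was actually taken up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical scholarship data (names are illustrative): `assigned` is the
# original random offer, `took_up` is actual participation (some assigned
# students drop out), `score` is the end-of-grade-6 test score.
rng = np.random.default_rng(3)
n = 1000
assigned = rng.integers(0, 2, n)
took_up = assigned * rng.binomial(1, 0.8, n)   # 20% of offers not taken up
score = 50 + 4 * took_up + rng.normal(0, 10, n)
df = pd.DataFrame({"assigned": assigned, "took_up": took_up, "score": score})

# Intent-to-treat: compare outcomes by the original assignment, ignoring
# whether the offer was actually taken up.
itt = smf.ols("score ~ assigned", data=df).fit()
print("ITT effect of the offer:", itt.params["assigned"])
```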

Addressing partial compliance and sample selection bias
The “Intent To Treat” approach is powerful, but it raises issues of its own.
– It often captures what is in the control of those implementing the intervention (e.g., offering a scholarship).
– But it may not reflect what is likely to happen if the program goes to scale (e.g., if all schools were in the grants program, then students wouldn’t switch).
One can use the results to estimate the “Average Treatment on the Treated,” but this requires further modeling.
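
That further modeling is often an instrumental-variables step: under standard assumptions (here, one-sided non-compliance), the effect on those actually treated can be recovered by dividing the ITT effect by the difference in take-up rates between arms (the Wald estimator). A minimal sketch, using the same hypothetical setup as the ITT example:

```python
import numpy as np
import pandas as pd

# Same hypothetical setup as the ITT sketch: a random offer, imperfect
# take-up, and an end-line test score.
rng = np.random.default_rng(3)
n = 1000
assigned = rng.integers(0, 2, n)
took_up = assigned * rng.binomial(1, 0.8, n)
score = 50 + 4 * took_up + rng.normal(0, 10, n)
df = pd.DataFrame({"assigned": assigned, "took_up": took_up, "score": score})

# Wald / instrumental-variables estimator: scale the ITT effect by the gap in
# take-up between arms to recover the effect on those actually treated.
by_arm = df.groupby("assigned")[["score", "took_up"]].mean()
itt_effect = by_arm.loc[1, "score"] - by_arm.loc[0, "score"]
takeup_gap = by_arm.loc[1, "took_up"] - by_arm.loc[0, "took_up"]
print("Effect on the treated:", itt_effect / takeup_gap)
```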

External validity
The extent to which the evaluation provides relevant information about the likely effectiveness of a program in a different setting, or if implemented at a different scale.
Are the findings meaningful for policy?

External validity: Behavioral responses
The behavior of the study population may be affected by the study itself (as opposed to the intervention).
– Treatment group behavior changes: the Hawthorne effect. People in the treatment group are being closely observed and studied, so they alter their behavior (e.g., teachers being observed).
– Control group behavior changes: the John Henry effect. People in the control group view themselves as being in competition with the treatment group and so change their behavior (e.g., students denied a scholarship).

External validity: Generalizability
Is the program, as evaluated, truly possible to replicate at scale?
– Did the intervention require a lot of careful attention in order to make it work?
Was the evaluation carried out on a truly representative population?
– Was it restricted to a province that was “ready” for the intervention, and therefore unlike the other places where the program might be carried out?

Conclusion
Many “threats” to both internal and external validity.
Some may come from pressure:
– from the study population
– from higher levels of government
– from donors
The “technocratic” approach becomes something of an “art” in needing to balance these various goals.

Thank you