The Overall Agenda When Will We Ever Learn: general introduction to Impact Evaluation When Random Assignment is Possible? –Implementing and Evaluating.

1 The Overall Agenda: When Will We Ever Learn? A general introduction to impact evaluation. When Random Assignment Is Possible: implementing and evaluating RCTs. When Random Assignment Is Not Possible: quasi-experimental methods (propensity scores, matching, IV, regression discontinuity and DinD).

2 When Will We Ever Learn? What is impact evaluation, and when and how should we use it? Session 1, Scott Rozelle, Stanford University

3 Amazing Ideas: sleeping-bag incubator; treadle-pump irrigation; agricultural price services through mobile phones; computer-assisted learning for remedial tutoring

4 New programs in China (huge fiscal investment): Rural Health Insurance (NCMS, the New Cooperative Medical Scheme, or hezuo yiliao, “cooperative medical care,” in China); New Subsidy Program (liangshi butie, “grain subsidy”); New Education Programs (e.g., raising teacher salaries … or … eliminating tuition for high school); Financial Crisis Stimulus Package (investments by central gov’t; investments by localities)

5 How many of the innovations/programs that we heard about on the news … how many of the new technologies/programs that we have become excited about … how many have been rigorously evaluated? Do we have empirical evidence, based on carefully constructed counterfactuals, that these breakthroughs/programs work, can positively affect the lives of the poor, and do so in a cost-effective way? Unfortunately, the answer is almost certainly: some, but not many …

6 Huge Global Initiatives: UN Millennium Development Villages; USAID Bilateral Investment Program; World Bank / ADB’s Loan Program

7 Statement of Facts: “Accelerating social progress in low- and middle-income countries requires knowledge about what kinds of social programs are effective. Yet all too often, such basic knowledge is lacking because governments, development agencies, and foundations/NGOs have few incentives to start and sustain the impact evaluations that generate this important information.” (International Evaluation Gap Working Group) “When it comes to attribution, there is shockingly little concrete evidence about what works and what does not” (author of the report When Will We Ever Learn?) About 2 to 3 years ago I was at a conference where one young researcher claimed (in front of scores of older, experienced development economists) that after 40+ years of development economics we had not learned anything … until his work (of course).

8 The Excuses: We don’t have time. It costs too much to do rigorous impact evaluation. It is unethical. Project implementation is site and context specific.

9 The Excuses: We don’t have time. It costs too much to do rigorous impact evaluation. It is unethical. Project implementation is completely site and context specific. We already know!

10 Example: J. Sachs: we already know that ITNs (insecticide-treated nets) work for the prevention of malaria. In fact, this time rigorous public health trials support it: –90 villages: give residents ITNs –90 villages: give residents “0” In treatment villages: reductions in malaria and anemia, and other benefits; even positive spillovers: villages/hamlets around the treatment villages (within 300 meters) also benefited through reduced malaria (although they had no ITNs) … miracle?

11 ITNs (insecticide-treated nets)

12 Policy implications: People were not buying them … –Despite being “very afraid” of malaria … Why? –Stanford University team’s hypothesis: the one-time cost is too high –This led to a new RCT

13 Microcredit or free? Working through an NGO that had “cells of members” (10 to 30 households per village) in hundreds of villages in India, the team ran an RCT with two treatment arms: –Treatment village 1: give ITNs away for free to all NGO members –Treatment village 2: sell ITNs to households as part of a microcredit (peer-monitoring) project –Control villages: “0” What is the outcome?

14 Impact: ZERO [none for malaria / none for anemia … none in treatment village 1 / none in treatment village 2 / none in control villages] Explanation: –We have no definitive proof (though we may now know why villagers do not buy them … they don’t seem to work) … Theory: –Revisit the original trial … and revisit and live in our own project villages –People do not always use ITNs … trouble / hard to hang / uncomfortable / too many people, not enough nets

15 An explanation: If people do not use them (even in the treatment villages of the original public health trial), how is it that they have an impact in those villages AND in surrounding villages? The only real difference between the original trial (100% of households in the trial) and Stanford’s trial (10 to 20% of households in the trial): maybe all mosquitoes are killed and populations collapse when all households have ITNs … this would account for both the efficacy in the trial and the spillover … However, in the partial-rollout villages, the ITNs were not effective!

16 ITNs do work … but with a caveat. Current most plausible explanation: –In the large public health trial, when all of the villagers received ITNs … and were encouraged to use them (and did, at first) … ALL of the mosquitoes died … this reduced malaria in the treatment villages and the surrounding hamlets. Jeffrey’s response? –Of course, that is why we give them to all of the families … of course, he had no idea … –Maybe he did “know” … but surely he does not understand … [but then, none of us do now]

17

18 The Rural Education Action Project of Stanford University is a Research Organization / NGO / Government Organization / Policy Action partnership. Collaborators in China / At Stanford University

19 Fundamental question we try to answer: What can be done to overcome the gap in human capital between the rural, unskilled poor and the urban, skilled middle class?

20 To understand the barriers keeping the rural poor from closing the gap AND learn what can be done … REAP works in two ways: 1.) We design and implement new program interventions AND we do the evaluations 2.) We partner with NGOs and gov’t agencies that are trying to implement projects –We advise. –They carry out. –We evaluate.

21 REAP Partners

22 Including our best partner (of course): LICOS

23 REAP’s Educational Challenge Areas: Access to Secondary Education and Beyond; Technology and Human Capital; Health, Nutrition and Education

24 REAP Projects in China (1) Health, Nutrition and Education: 1. Overcoming the Anemia Puzzle in Rural China 2. Worm Count: Intestinal Worms in Rural China 3. Is One Egg Enough? School Nutrition Programs in Rural Shaanxi 4. Vitameal or Vitamins? Grades and Nutrition in Shaanxi 5. Experimenting with Nutrition: Ningshan County 6. Paying for Performance in the Battle against Anemia 7. Conditional Cash Transfers and Cost Effectiveness in the Battle Against Anemia 8. Nutritional Training in Ningxia 9. Eggs and Grades 10. Reducing Transaction Costs: Chewable Vitamins in Gansu 11. Best Buy Toolkit: Nutrition, Deworming & Vision Interventions in Rural Schools

25 REAP Projects in China (2) Technology and Human Capital 12. Computer Assisted Learning in Beijing Area Migrant Schools 13. Computer Assisted Learning in Rural Boarding Schools 14. Computer Assisted Learning in Rural Minority Areas 15. One Laptop Per Child: Does It Help? 16. Nutritional Training and Mobile Messaging

26 REAP Projects in China (3) Access to Quality Secondary Education and Beyond: 17. Boarding School Management 18. Preschool Vouchers for Needy Families 19. Evaluating Preschool Teacher Training (Nokia, China) 20. Early Commitment of Financial Aid for University 21. SOAR Foundation: What if High School Were Free? 22. Scholarships with Strings Attached: Community Service 23. Financial Aid in Shilou County 24. Contracting for Dreams in Ningshan County 25. Summer Fresh Migrant School Teacher Training Program 26. Peer Tutoring versus Paying for Grades 27. Vouchers, Vocational Education and Career Counselling 28. Scholarships at Four Tier One Universities 29. Breaking the Cycle of Poverty: Cash Transfers for Jr. High

27 REAP Projects in China

28 Today’s (this session’s) plan: Introduce the concept of IE –Definitions and examples of what is right and what is not right –RCTs … when possible, sexy gold standards! –When you can’t randomize (still a lot of excitement) IE is not enough: supplementary tools Issues in choosing an IE strategy –Selecting a control group –RCTs or quasi-experimental approaches? A lot more rigor in sessions 2 and 3 …

29 What is impact? Impact = the outcome with the intervention compared to what it would have been in the absence of the intervention. Unpacking the definition: –Can include unintended outcomes –Can include others, not just intended beneficiaries –No reference to time frame, which is context-specific –At the heart of it is the idea of attribution, and attribution implies a counterfactual (either implicit or explicit)
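The definition can be written in potential-outcomes notation. This is a standard formulation added for illustration, and the symbols are ours, not the slide’s: Y_i(1) is unit i’s outcome with the intervention, Y_i(0) its outcome without it, and D_i = 1 marks units that received the intervention. Only one of the two outcomes is ever observed for any unit, so Y_i(0) for treated units is exactly the counterfactual an evaluation has to construct.

```latex
\text{Impact}_i = Y_i(1) - Y_i(0),
\qquad
\text{ATT} = \mathbb{E}\left[\, Y_i(1) - Y_i(0) \mid D_i = 1 \,\right]
```

ATT (the average effect of treatment on the treated) is the average impact on those who actually received the intervention.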

30 Defined in this way – UNFORTUNATELY (as discussed above) – we (the international development community) have little evidence on the impact of development programs [in other words: we don’t (systematically) know the results of many of the countless programs that development agencies, gov’ts and other organizations have been implementing in recent years]

31 The attribution problem: factual and counterfactual. Impact varies over time; impacts are also defined over time … Little attention has been given to the dynamics over time … though people think about this … [Figure: outcome path “with project” versus the counterfactual path; the gap between them is the project impact]

32 Change in the CAL program effect on the standardized math test scores over time The CAL program effect occurred by the midterm evaluation, less than two months after the start of the program

33 Change in the CAL program effect on the standardized math test scores over time There is no improvement between month two and month three

34 Impact of nutrition at infancy in Guatemala: after 2 years, greater BMI; after 10 years, higher grades in school; after 15 years, higher school attainment; after 40 years, higher wages / income

35 What has been the impact of the French Revolution? “It is too early to say.” (Zhou Enlai) And, even longer run …

36 Let’s examine a less grandiose intervention. The venue: poor areas of southwest China … a remote mountainous region … populated by groups of the Dai and Dong minorities … In the 1980s and 1990s only a small share of girls attended school … most were involved with farming, tending livestock and raising siblings … An NGO began giving scholarships in the early 1990s … objective: increase the attendance of girls … they claim, in their very polished promotional material and in the many workshops they attend, that they have been effective in their mission …

37 What do we need to measure impact?
Girls’ primary school enrollment (%):
Project (treatment): Before = (no data), After = 92
Control: Before = (no data), After = (no data)
The majority of evaluations have just this information … which means we can say absolutely nothing about impact. NOTE: if you measure this well, what is it? Outcome monitoring.

38 What does 92 percent mean? Is it high? Is it low? What does a single number mean? What do we compare it to? Even if done well … outcome monitoring in its simplest form TELLS US NOTHING about impact

39 “Before versus after” single-difference comparisons
Girls’ primary school enrollment (%):
Project (treatment): Before = 40, After = 92
Control: Before = (no data), After = (no data)
Before versus after = 92 – 40 = 52: “scholarships have led to rising schooling of young girls in the project villages”
This ‘before versus after’ approach is more careful outcome monitoring, which has become popular recently. Outcome monitoring has its place, but: outcome monitoring ≠ impact evaluation.
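A standard decomposition (our notation, added for illustration) shows what the 52 points contain. Write \bar{Y}^{T}_{\text{after}}(0) for the hypothetical enrollment the project villages would have had without scholarships; it is never observed.

```latex
\underbrace{\bar{Y}^{T}_{\text{after}} - \bar{Y}^{T}_{\text{before}}}_{92 - 40 \,=\, 52}
= \underbrace{\bar{Y}^{T}_{\text{after}} - \bar{Y}^{T}_{\text{after}}(0)}_{\text{impact}}
+ \underbrace{\bar{Y}^{T}_{\text{after}}(0) - \bar{Y}^{T}_{\text{before}}}_{\text{change that would have happened anyway}}
```

Because the second term cannot be observed, the before-after figure credits the scholarships with any change driven by other forces, such as the rising off-farm employment and wages shown on slide 40.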

40 The changing macro environment … and rising employment opportunities and wages [Figure: employment in the off-farm labor market, 16 to 25 year olds (percent of cohort), and off-farm wage rate (yuan / month)]

41 Elementary school completion rates of male and female students in all of rural China’s poor areas [Figure; vertical axis: share of rural children]

42 Outcome monitoring does not tell us about effectiveness Results… cannot as a rule be attributed specifically, either wholly or in part, to the intervention

43 An (important) aside: collecting data to measure outcomes “before an intervention.” Can we collect data about outcomes before the intervention after the intervention has taken place (that is: are recollection data valid)? No (or be careful): work by economists has shown clearly that relying on recollection data introduces lots of biases into IE (most of them psychological) –If individuals have been given a treatment, they often will remember selectively … they will exaggerate the benefits as a way of showing their gratitude … –Those in the control groups will often want to show they are less fortunate and understate their condition (or improvement) –Empirically, recollection data have lots of biases … and it is hard to determine the direction … –Best practice (only practice?): collect baseline data before the project begins

44 Post-treatment control comparisons. Another common approach (let’s compare to another set of villages):
Girls’ primary school enrollment (%):
Project (treatment): Before = (no data), After = 92
Control: Before = (no data), After = 84
Single difference = 92 – 84 = 8

45 But we don’t know if treatment and control groups were similar before … How often are intervention villages / schools / clinics / etcetera chosen in a way that makes them systematically different from control villages? [either for convenience / political necessity / feasibility / cost considerations / or from leaving it to the local partner, who uses who-knows-what type of selection method] In the SW China villages, the NGO went to a poor county, but the local bureau of education chose the villages … and chose them along the road … Is attendance in elementary school lower in the control villages because the NGO did not pass out scholarships, or because villagers in control villages had less use for education (or faced higher costs of going to school)?
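The same kind of decomposition (again our notation, for illustration) applies to the post-treatment comparison. The 92 - 84 gap equals the impact plus a selection-bias term: the difference the roadside project villages would have shown relative to the control villages even without scholarships.

```latex
\underbrace{\bar{Y}^{T}_{\text{after}} - \bar{Y}^{C}_{\text{after}}}_{92 - 84 \,=\, 8}
= \underbrace{\bar{Y}^{T}_{\text{after}} - \bar{Y}^{T}_{\text{after}}(0)}_{\text{impact}}
+ \underbrace{\bar{Y}^{T}_{\text{after}}(0) - \bar{Y}^{C}_{\text{after}}}_{\text{selection bias}}
```

The selection-bias term is zero only if the two groups would have had the same enrollment in the absence of the intervention.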

46 Post-treatment control comparisons. Another common approach (let’s compare to another set of villages):
Girls’ primary school enrollment (%):
Project (treatment): Before = (no data), After = 92
Control: Before = (no data), After = 84
Single difference = 92 – 84 = 8
Main point: post-treatment control comparisons are only valid if treatments and controls were identical at the time the intervention began …

47 Therefore: let’s collect data for all of the cells.
Girls’ primary school enrollment (%):
Project (treatment): Before = 40, After = 92
Control: Before = 26, After = 84
Double difference = (92 – 40) – (84 – 26) = 52 – 58 = –6
Conclusion: Longitudinal (panel) data, with a control group, allow for the strongest impact evaluation design (BUT: we still need matching if the groups are different at the start of the project … is there something different in the village that would affect the village’s response to the intervention?)
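The three estimators from slides 39 to 47 can be reproduced from the four cells of the enrollment table. This is a minimal Python sketch; the function name impact_estimates is our own, and the inputs are the enrollment percentages from the slides.

```python
def impact_estimates(treat_before, treat_after, ctrl_before, ctrl_after):
    """Compute the naive and double-difference estimates from a 2x2 table of group means."""
    return {
        # Slide 39: change in the project villages only (no control group)
        "before_after": treat_after - treat_before,
        # Slides 44-46: project vs. control after the intervention (no baseline)
        "post_only": treat_after - ctrl_after,
        # Slide 47: change in project villages minus change in control villages
        "double_difference": (treat_after - treat_before) - (ctrl_after - ctrl_before),
    }


estimates = impact_estimates(treat_before=40, treat_after=92, ctrl_before=26, ctrl_after=84)
print(estimates)  # {'before_after': 52, 'post_only': 8, 'double_difference': -6}
```

The double difference removes any pre-existing gap between the groups that stays constant over time, but it still assumes the two groups would have followed parallel trends without the scholarships.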

48 Main points so far: Analysis of impact implies a counterfactual comparison. Outcome monitoring is a factual analysis, and so cannot tell us about impact. The counterfactual is most commonly determined by using a rigorously/carefully chosen control group. If you are going to do impact evaluation, you need a credible counterfactual using a control group (not necessarily an RCT, but you still need a control).

49 “Gold Standard”: Randomized Control Trials [Figure: medical trial, treatment versus “zero”] What is the counterfactual?

50 “Gold Standard”: Randomized Control Trials [Figure: crop field trials, treatment versus “zero”] What is the counterfactual?

51 We can also run randomized control trials in schools to test the effectiveness of a new school program … social experimentation … Step 1: choose 50 schools … randomly divide them into 2 groups [Figure: 25 elementary schools in Gansu per group]
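Step 1 can be sketched in a few lines of Python. The school IDs, the function name assign_schools, and the seed value are hypothetical. This is simple (unstratified) randomization; real designs often stratify, for example by county or baseline scores, which this sketch does not do. Fixing the seed makes the assignment reproducible and auditable.

```python
import random


def assign_schools(school_ids, seed=2024):
    """Randomly split schools into equal-sized treatment and control groups."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    shuffled = list(school_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": sorted(shuffled[:half]), "control": sorted(shuffled[half:])}


schools = [f"school_{i:02d}" for i in range(1, 51)]  # 50 hypothetical school IDs
groups = assign_schools(schools)
print(len(groups["treatment"]), len(groups["control"]))  # 25 25
```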

52 Does one egg per day improve test scores / attendance? [Figure: One Egg Per Day versus None (“0”), 25 elementary schools in Gansu] What is the counterfactual?

53 Randomized Control Trial [like in agriculture or medicine]. Our question: Will one egg per day lead to higher test scores? Three stages: 1. Baseline survey 2. Policy experiment (RCT) 3. Evaluation survey [Figure: treated and control groups]

54 Results: One Egg vs. “0” [Figure: change in test scores between baseline and evaluation surveys, One Egg / Day schools versus control schools (0)] Difference statistically significant at the 95% confidence level. What is causing the difference between treatment schools and control schools? What is the counterfactual?
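The comparison behind this result is a difference in mean test-score changes between treatment and control schools. The sketch below is illustrative only: the function name and the input numbers are made up, not REAP data, and each value is assumed to be one school’s average change, because schools (not students) were randomized. A student-level analysis would instead need standard errors clustered by school. The function reports a Welch t-statistic; with only a few dozen schools, significance should be judged against the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def diff_in_mean_changes(treat_changes, ctrl_changes):
    """Return (treatment-minus-control difference in mean change, Welch t-statistic).

    Each list holds one score change (evaluation minus baseline) per school.
    """
    diff = mean(treat_changes) - mean(ctrl_changes)
    # Welch standard error: does not assume equal variances in the two groups
    se = sqrt(variance(treat_changes) / len(treat_changes)
              + variance(ctrl_changes) / len(ctrl_changes))
    return diff, diff / se


# Hypothetical school-level changes in standardized scores (not real data)
treated = [0.3, 0.1, 0.4, 0.2]
control = [0.1, 0.0, 0.2, 0.1]
effect, t_stat = diff_in_mean_changes(treated, control)
print(round(effect, 3), round(t_stat, 2))
```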

55 Before we talk more about the pros/cons and keys/pitfalls of running large RCT studies: broaden our set of definitions of IE. The discussion above was for ‘large n’ interventions –There are a large number of units of intervention, e.g. children, households, firms, schools. –Examples of ‘small n’ are most (but not all) policy reforms and many (but not all) capacity-building projects. –E.g.: some reforms (e.g. health insurance) can be given large n designs ‘Small n’ interventions require: –Modelling (computable general equilibrium, CGE, models), e.g. trade and fiscal policy … or the role of agriculture in development? –A theory-based analysis (this is what is modeled in a small-n study …) … it is the logic through which the reform / new capacity will drive economic change …

56 In fact, many things can’t be randomized? Effect of a road on access to off-farm employment. An agricultural subsidy program (that has already been rolled out). Impact of the decisions of migrant families to leave kids behind (to be educated in the village’s rural public schools while living with Grandma) … or to take them along with Mom and Dad (to be educated in the city in a private, unregulated migrant school).

57 How well do students that attend migrant schools perform in standardized tests? [Figure: standardized math scores] Children in migrant schools actually score a bit above those in poor rural schools.

58 Control for observable characteristics of students and parents (in both rural schools and migrant schools) … and for the length of time that migrant children have been in migrant schools … and use quasi-experimental methods (e.g., matching in this case) [Figure: standardized math scores] The argument is that parents bring the children who are better students to urban areas with them … so after controlling for these factors … the difference goes away … AND: If you then compare students in migrant schools that have been in Beijing for > 3 years

59 In fact, many things can’t be randomized? Effect of a road on access to off-farm employment. An agricultural subsidy program (that has already been rolled out). Impact of the decisions of migrant families to leave kids behind (to be educated in the village’s rural public schools while living with Grandma) … or to take them along with Mom and Dad (to be educated in the city in a private, unregulated migrant school). Should we not work on these questions?

60 But some “randomistas” act as though “if you can’t randomize, don’t study it.” Why? They argue: you can’t control for unobservables … no matter what you do … except randomize … [there is a name for this: RADICAL SKEPTICISM] OLS does not work / IV does not work / matching is not enough / regression discontinuity … there are always possible unobservables that might confound the results … so just don’t try … just randomize them away.

61 Carter and Barrett response: Barrett, C. & M. Carter. “The Power and Pitfalls of Experiments in Development Economics.” Applied Economic Perspectives and Policy 32(4). Carter, M. & C. Barrett. “Retreat from Radical Skepticism: Rebalancing Theory, Observational Data and Randomization in Development Economics.” On the web (forthcoming in an edited volume)

62 Their main point: According to randomistas: “If you are a radical skeptic (you can never account for unobservables) …” Carter/Barrett’s statement: If this is true, you should NOT be a randomista … [based on the randomistas’ own logic] Logically, you cannot believe that there is anything generalizable to be learned from impact evaluations … Why? … because there is NO external validity, for the same reason of radical skepticism … [there are unobservables in the locality in which the experiments are being run that interact with the treatment and are part of the measured effect … since those unobservables are unknowable / unmeasurable and uncountable, we have ZERO predictive power when we take the program outside of the original experimental zone …]

63 How do biologists deal with this? I recently met a biologist who told me that he was working on a study: Meta-Analysis of Meta-Analyses: Effect of Aspirin on Heart Attacks … In the past 15 years: 1500 studies (almost all RCTs) … 33 meta-studies … and now 1 meta-meta-study … Is this one way to deal with it? But are we ever going to do 1500 RCTs on the effect of ITNs on malaria? In economics, will we ever do two?

64 The truth is somewhere in between: we need skepticism … not radical skepticism … RCTs are great when you can do them … but there are still limitations … Observational data sets have limitations, but there is still a lot to learn from them … if care and caution are taken …

65 Regardless of being large-n or small-n, our focus is on learning why things work, not just what: (measurement is not evaluation) This is where we need: Qualitative supplementary work (with quantitative IE) and/or Theory-based impact evaluation Why? To allow us to interpret the IE measurements … and identify why a project is working … or not … or how it can be improved … etc …

66 This helps address one more criticism of the current wave of IE studies … They only tell us what works … and not much else! This has two disadvantages: –Why would anyone want to be told that their project does not work? A World Bank employee? A government official? An NGO? –If you only know it does not work, what is the implication? Eliminate the program … or fix it? But how?

67 The International Initiative for Impact Evaluation (3ie) is an international organization trying to put the “how” [or wow] into rigorous IE with “theory-based evaluation” or “causal chain analysis.” Example: a nutrition project in Bangladesh. Source: Howard White and Edoardo Masset (2007) ‘The Bangladesh Integrated Nutrition Program: findings from an impact evaluation’, Journal of International Development 19

68 Bangladesh Integrated Nutrition Project (BINP) … a World Bank project. Problem: lots of malnutrition … difficult to solve within traditional institutional structures. Response: growth monitoring, nutritional counselling and supplementary feeding (based on a program in Tamil Nadu, which was successful). As designed, the project was implemented by NGOs at the field level, using Community Nutrition Practitioners (CNPs).

69 Program design (theory of change) [Diagram: causal chain linking targeting (children, mothers) and participants, counselling and supplemental feeding, change in behavior in child nutrition, sufficient quantity / quality of food, no leakage / substitution, and an improved nutrition outcome; nodes labelled A, B1, B2, C1, C2, D1, D2, D3, E]

70 The evaluation story: It looked like it was working: all the pieces were in place, and outcome monitoring data showed a fall in severe malnutrition. The Bank agreed to scale up (this is an expensive program … funded at the expense of other projects). But Save the Children UK was critical, though the Bank’s M&E team was positive. The Bank’s evaluation department (IEG) did a more rigorous evaluation and found little or no impact. A theory-based approach explains why.

71 Measuring outcomes and impacts (M&E) [Figure: height-for-age scores, single differences (between treatment and controls) at mid-term and after the project; project M&E: post-treatment control differences; OED: propensity score matching (PSM)]

73 Measuring outcomes and impacts [Figure: height-for-age scores, single differences (between treatment and controls) at mid-term and after the project; project M&E: post-treatment control differences; OED: propensity score matching (PSM)] Two points: 1.) need for a control group that is similar to the treatment group (this is what PSM does) 2.) demands an explanation of the problems with the program
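The idea behind PSM can be sketched as one-to-one nearest-neighbour matching with replacement. This is a simplified illustration, not the OED’s actual specification: it assumes the propensity scores have already been estimated (typically from a logit or probit of participation on observed characteristics), and the function name, scores, and height-for-age values below are hypothetical. Matching balances only observed characteristics, so it cannot rule out bias from unobservables (the concern raised on slide 60).

```python
def att_nearest_neighbour(treated, controls):
    """Average treatment effect on the treated via 1-to-1 nearest-neighbour matching.

    treated, controls: lists of (propensity_score, outcome) pairs.
    Matching is with replacement: one control can be the match for several treated units.
    """
    gaps = []
    for p_treat, y_treat in treated:
        # Pick the control unit whose propensity score is closest to this treated unit's
        _, y_match = min(controls, key=lambda unit: abs(unit[0] - p_treat))
        gaps.append(y_treat - y_match)
    return sum(gaps) / len(gaps)


# Hypothetical (propensity score, height-for-age z-score) pairs, not BINP data
treated_children = [(0.80, -1.0), (0.60, -1.2)]
control_children = [(0.75, -1.1), (0.30, -1.6), (0.55, -1.3)]
print(att_nearest_neighbour(treated_children, control_children))
```

In this toy example the unmatched difference in means would be about 0.23, versus 0.1 after matching, because the control child with a low propensity score (and the lowest height-for-age score) is never used as a match.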

75 Implementing theory-based analysis
Assumption: Provide nutritional counseling to caregivers. Finding: Mothers are not decision makers, especially if they live with their mother-in-law.
Assumption: Women know about sessions and attend. Finding: 90% participation, lower in more conservative areas.
Assumption: Malnourished and growth-faltering children are correctly identified. Finding: No; community nutrition practitioners (CNPs) cannot interpret growth charts.
Assumption: Women acquire knowledge. Finding: Those attending training do so.
Assumption: Knowledge is turned into practice. Finding: No; there is a substantial knowledge-practice gap.
Assumption: Supplementary feeding is additional food for the intended beneficiary. Finding: No; considerable evidence of substitution and leakage.
Assumption: Adopted changes are sufficient to improve intended outcomes. Finding: Only sometimes.
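The assumption/finding pairs from slide 75 can be treated as data: walk the causal chain in order and flag the links the evaluation found to fail. This is a toy Python sketch; the names Link, BINP_CHAIN, broken_links and first_break are ours, and the pass/fail coding is our own reading of the findings (in particular, “mothers are not decision makers” and “Only sometimes” are coded as failures), not a verdict taken from the evaluation itself.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    assumption: str
    holds: bool  # our coding of the finding: True if the assumption held


# Links in causal-chain order, transcribed from slide 75
BINP_CHAIN = [
    Link("Provide nutritional counseling to caregivers", False),  # mothers not decision makers
    Link("Women know about sessions and attend", True),  # 90% participation
    Link("Malnourished children correctly identified", False),  # CNPs cannot read growth charts
    Link("Women acquire knowledge", True),
    Link("Knowledge is turned into practice", False),  # knowledge-practice gap
    Link("Supplementary feeding is additional food", False),  # substitution and leakage
    Link("Adopted changes sufficient to improve outcomes", False),  # only sometimes
]


def broken_links(chain):
    """All assumptions in the chain that did not hold."""
    return [link.assumption for link in chain if not link.holds]


def first_break(chain) -> Optional[str]:
    """Earliest failing link: where the theory of change first breaks down."""
    for link in chain:
        if not link.holds:
            return link.assumption
    return None


print(first_break(BINP_CHAIN))
print(len(broken_links(BINP_CHAIN)), "of", len(BINP_CHAIN), "links fail")
```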

77 Impacts when the mother participated [Figure: height-for-age scores, single differences (between treatment and controls) at mid-term and after the project; project M&E: post-treatment control differences; OED: propensity score matching (PSM)] When examining just mothers who did not live with their mothers-in-law … and whose babies were supposed to be in the project …

78 Illustrating the principles of rigorous IE (good measurement and astute analysis): Rigorous evaluation of impact using an appropriate counterfactual: PSM versus simple control. Map out the causal chain (programme theory). Understand context: Bangladesh is not Tamil Nadu (where CNPs were well trained; they were NOT well trained in Bangladesh). Anticipate heterogeneity: more malnourished children are in some subgroups of the population than in others. Use mixed methods: informed by anthropology, focus groups, own field visits.

79 Final comments Things to remember

80 Why has impact evaluation become important? The results agenda –E.g. Millennium Development Goals and report cards –Focus on outcomes is an improvement over impact monitoring But outcome monitoring is not impact –Experience of USAID

81 Other places where it is taken more seriously: examples of results orientation –Academic funding in developed countries –Mexico: Progresa/Oportunidades and Coneval –India: outcome budgeting and IEO

82 Key steps for impact evaluation design Well defined objectives and measurable outcomes A credible counterfactual, usually using a comparison group based on either random assignment or matching (before versus after and naïve comparisons don’t work)

83 Key steps for impact evaluation design (2) Baseline data considerably strengthens the design [start early!!] A theory-based approach (using causal chain) allowing analysis of causal pathways

84 The appeal of random assignment In a randomized control trial the treatment is randomly allocated This is more possible than you may think, with examples from governance, legal reform, and environment as well as health and education Most arguments against it are readily countered

85 Alternatives to randomization: Randomization is not always possible. Then use statistical matching / other quasi-experimental methods … Don’t give up on observational data (either your own … or others’): you can often examine program impact using existing data (e.g. IADB vocational training studies). “Dean Karlan fights poverty one RCT at a time …” … this is the type of statement that gets Esther Duflo and others who practice rigorous IE in trouble … [on the cover of his book More Than Good Intentions]

86 When to do an impact evaluation Pilot programs – only a meaningful pilot if you evaluate it (and outcome monitoring tells you nothing about impact) Representative or important programs We have learned a lot about development in the past 40 years … we could have learned a lot more!

87 The final word Impact evaluation matters Matters to spending money well Matters to know how to make programs work better Matters to ending poverty

88 Thank you [ see next slide for schedule for rest of today and tomorrow ]

89 Rest of the day / tomorrow: Doing Impact Evaluation When Relying on Randomized Control Trials; Doing Impact Evaluation With Quasi-experimental Methods

