American Lessons on Designing Reliable Impact Evaluations, from Studies of WIA and Its Predecessor Programs
Larry L. Orr, Independent Consultant
Stephen H. Bell, Abt Associates
Jacob A. Klerman, Abt Associates
The early evaluations
1960s: MDTA (pre/post comparisons)
1970s:
– YEDPA (400+ studies; various methods)
– CETA (comparison groups drawn from national survey samples)
1980s:
– A National Academy review of the YEDPA studies found “little reliable information on the effectiveness of the programs” and recommended random assignment
– More than a dozen CETA evaluations produced widely divergent impact estimates from essentially the same data (Barnow, 1987)
– A DOL-convened expert panel recommended random assignment for the evaluation of the new Job Training Partnership Act (JTPA)
Evaluating the econometric evaluations
– LaLonde (1986) and Maynard and Fraker (1987) applied a variety of nonexperimental methods to data from a randomized trial and were unable to replicate the experimental estimates
– Since then, a number of replication studies have been conducted (see summaries in Glazerman et al., 2003; Bloom et al., 2005; and Pirog et al., 2009)
– No nonexperimental method has consistently replicated experimental results
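The replication studies described above are within-study comparisons: the experimental treatment group is compared once with the randomized controls (the benchmark) and once with a nonexperimental comparison sample, and the gap between the two estimates measures the nonexperimental method's bias. What follows is a minimal sketch of that procedure on simulated data, assuming an outside comparison sample that earns 1,500 more than applicants do. The function names, the earnings figures, and the naive difference-in-means estimator are hypothetical placeholders for whatever method is being tested.

```python
import random
import statistics


def simulate_study(seed, n=2000, true_impact=1000.0, comparison_shift=1500.0):
    """Hypothetical data for one randomized trial plus an outside comparison sample.

    The comparison sample comes from a population whose earnings are higher by
    `comparison_shift`, standing in for a survey sample that differs from applicants.
    """
    rng = random.Random(seed)
    treatment = [rng.gauss(8000, 3000) + true_impact for _ in range(n)]
    control = [rng.gauss(8000, 3000) for _ in range(n)]
    comparison = [rng.gauss(8000 + comparison_shift, 3000) for _ in range(n)]
    return treatment, control, comparison


def difference_in_means(group, reference):
    """Naive estimator: mean outcome of `group` minus mean outcome of `reference`."""
    return statistics.fmean(group) - statistics.fmean(reference)


def replication_bias(treatment, control, comparison, estimator):
    """Within-study comparison: nonexperimental estimate minus experimental benchmark."""
    benchmark = difference_in_means(treatment, control)  # experimental estimate
    candidate = estimator(treatment, comparison)          # nonexperimental estimate
    return candidate - benchmark


# Apply the same estimator to several simulated trials.
biases = [replication_bias(*simulate_study(seed), estimator=difference_in_means)
          for seed in range(5)]
print([round(b) for b in biases])
```

Because the estimator is passed in as a function and run against several trials, the same harness tests a method as a reusable algorithm rather than a single set of estimates.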
The current consensus
– No known nonexperimental method can reliably produce unbiased estimates of the impact of training programs; this means that you can never know ex post whether you have a good estimate or not
– Randomized trials are the strongly preferred method of estimating training program impacts on technical grounds
– Randomized trials are also more intuitively understandable to policy makers than complex econometric methods
– Nonexperimental studies frequently give rise to technical controversy that detracts from their credibility and acceptance, whereas randomized trials are generally accepted by both evaluators and policy makers
Why is it so hard to obtain reliable results from nonexperimental studies?
– “Impact” is the difference between trainees’ actual outcomes (e.g., earnings) and what those outcomes would have been without training
– The fundamental problem of evaluation is to estimate what the trainees’ outcomes would have been without training
– To see how difficult that task is, consider the time path of earnings for the JTPA control group: individuals who were just like the trainees except that they didn’t get JTPA services…
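The definition of impact above can be made precise in standard potential-outcomes notation (the symbols are ours, not the presenters'). Let Y(1) and Y(0) be a person's earnings with and without training, let T be the trainee group and C the group used to stand in for the counterfactual, and note that the observed earnings Y equal Y(1) for members of T and Y(0) for members of C. Any difference in mean outcomes then splits into two terms:

```latex
\underbrace{E[Y \mid T] - E[Y \mid C]}_{\text{estimated impact}}
  = \underbrace{E[Y(1) - Y(0) \mid T]}_{\text{true impact on trainees}}
  + \underbrace{E[Y(0) \mid T] - E[Y(0) \mid C]}_{\text{selection bias}}
```

Random assignment draws C from the same applicant pool as T, so E[Y(0) | T] = E[Y(0) | C] by design and the selection-bias term vanishes. A nonexperimental comparison group carries no such guarantee, and because Y(0) is never observed for trainees, the data cannot reveal the size of that term.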
[Figure: Time path of earnings, control group, National JTPA Study]
What is the margin for error?
[Figure: earnings of the Treatment Group and the Control Group]
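The margin for error can be illustrated with a stylized simulation in which earnings fall around job loss before random assignment and recover afterwards, the kind of pre-program dynamics these slides warn about. The path shape and every number in it (earnings level, 40% dip, 15-point-per-quarter recovery, 300-per-quarter impact) are hypothetical assumptions, not JTPA data. The sketch compares a pre/post estimate for trainees with the experimental treatment-minus-control contrast.

```python
import random
import statistics

QUARTERS = range(-6, 7)  # quarters relative to random assignment (quarter 0)


def earnings_path(rng, impact=0.0):
    """Hypothetical quarterly earnings for one applicant.

    Assumed shape: a stable level, a dip to 40% of it in quarters -2..0
    (around job loss), then recovery of 15 percentage points per quarter,
    capped at the original level. `impact` is added after assignment.
    """
    base = rng.gauss(3000, 500)
    path = {}
    for q in QUARTERS:
        if q < -2:
            level = base
        elif q <= 0:
            level = 0.4 * base
        else:
            level = base * min(1.0, 0.4 + 0.15 * q) + impact
        path[q] = max(0.0, level + rng.gauss(0, 300))  # earnings cannot be negative
    return path


def mean_in_quarter(paths, q):
    return statistics.fmean([p[q] for p in paths])


rng = random.Random(42)
treatment = [earnings_path(rng, impact=300.0) for _ in range(1000)]
control = [earnings_path(rng) for _ in range(1000)]

# Pre/post: trainees six quarters after assignment vs. the quarter before it.
pre_post = mean_in_quarter(treatment, 6) - mean_in_quarter(treatment, -1)
# Experimental: trainees vs. randomized controls in the same quarter.
experimental = mean_in_quarter(treatment, 6) - mean_in_quarter(control, 6)
# Controls received no services, yet their own pre/post change is large.
control_pre_post = mean_in_quarter(control, 6) - mean_in_quarter(control, -1)

print(f"pre/post estimate:     {pre_post:8.0f}")
print(f"experimental estimate: {experimental:8.0f}")
print(f"control pre/post:      {control_pre_post:8.0f}")
```

In this stylized setup the pre/post figure is several times the assumed impact, because most of it is recovery that the control group experiences without any services. The experimental contrast needs no model of that recovery.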
[Figure: Time path of earnings, program and comparison groups, from Heinrich et al.]
Our Conclusions/Recommendations (1)
Random assignment is the only safe way to estimate the impacts of training programs:
– Different nonexperimental approaches yield widely varying results
– In dozens of replication studies, nonexperimental methods have almost never satisfactorily replicated the experimental estimates
– The stakes are too high to take on the kind of risk and uncertainty entailed in nonexperimental methods
– Nonexperimental evaluations inevitably shift the debate from substance to method
Our Conclusions/Recommendations (2)
If the ESF does decide to use nonexperimental methods:
– Pay close attention to the timing of job loss and the pre-program dynamics of earnings when matching the comparison group (a necessary, but not sufficient, condition)
– Before adopting any nonexperimental method, demonstrate that it replicates multiple experimental results (note that what should be tested is an algorithm that can be applied in other evaluations, not a set of estimates that are unique to a single evaluation)
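The matching recommendation can be sketched as a simple nearest-neighbour matcher that requires an exact match on the quarter of job loss and then picks the comparison person whose pre-program earnings path is closest. The `Person` record, its fields, the toy data, and the distance measure (Euclidean distance between quarterly earnings paths) are hypothetical choices for illustration, not a method the presenters endorse. As the slide says, matching on these dimensions is necessary but not sufficient for an unbiased estimate.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, not taken from any specific data set.
    job_loss_quarter: int   # calendar quarter in which the person lost a job
    pre_earnings: tuple     # quarterly earnings before program entry, oldest first
    outcome: float          # post-program earnings


def match_and_estimate(trainees, pool):
    """Nearest-neighbour matching with replacement.

    Exact match on job-loss quarter, then the closest pre-program earnings
    path within that cell. Returns (mean matched difference, number matched).
    """
    diffs = []
    for t in trainees:
        cell = [c for c in pool if c.job_loss_quarter == t.job_loss_quarter]
        if not cell:
            continue  # unmatched trainees are dropped; a real study should report them
        best = min(cell, key=lambda c: math.dist(t.pre_earnings, c.pre_earnings))
        diffs.append(t.outcome - best.outcome)
    if not diffs:
        raise ValueError("no trainee could be matched")
    return statistics.fmean(diffs), len(diffs)


trainees = [
    Person(3, (2000.0, 1800.0, 500.0), 2600.0),
    Person(5, (3000.0, 2900.0, 0.0), 3100.0),
]
pool = [
    Person(3, (2100.0, 1700.0, 600.0), 2300.0),
    Person(3, (2000.0, 2000.0, 2000.0), 2100.0),
    Person(5, (3100.0, 3000.0, 100.0), 2800.0),
    Person(4, (3000.0, 2900.0, 0.0), 2000.0),  # same path, but lost job a quarter earlier
]
estimate, n_matched = match_and_estimate(trainees, pool)
print(estimate, n_matched)  # → 300.0 2
```

Dropping the exact-match requirement would pair the second trainee with the quarter-4 record, whose earnings path is identical, and change that trainee's contribution from 300 to 1,100. Timing of job loss is doing real work here, not just the earnings levels.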
Our Conclusions/Recommendations (3)
Learn from our mistakes – don’t spend 40 years repeating them!