Monitoring and Evaluating the Performance and Impact of ESF Interventions. Carolyn J. Heinrich, University of Wisconsin-Madison. Shaping the Future of the ESF: ESF and Europe 2020, 23rd-24th June 2010. Workshop 1: Learning (monitoring, evaluation, mutual learning, etc.)
Monitoring and evaluation in the ESF ESF is committed to strengthening its capacity to evaluate added value and use results-based management in applying what is learned to improve program outcomes Evidence-based policy making: basing public policies and practices on scientifically rigorous evidence on impacts Performance management: regularly assessing policy or program performance to promote learning and performance improvement and to hold public officials accountable for policy and program results What are key challenges and prospects in evaluating active labor market policies/ESF-funded schemes and feeding results back into the policy-making cycle?
Key issues/questions addressed in this presentation Can rigorous standards and methods for identifying program impacts be reconciled with demands for public accountability and production of timely performance information for decision-making? How can evidence on performance and impacts be made more accessible in the policy cycle for improving policy/program outcomes?
Key issues/questions (cont.) What are the costs, financial and political, associated with performance assessment and impact evaluation activities and use of this information in policy making? What do recent U.S. impact evaluation and performance management efforts reveal about program effectiveness in increasing skills levels, employment and earnings among the workforce, particularly for economically vulnerable groups?
Basic tasks/challenges in performance and impact evaluation Clearly define and agree on measurable performance and evaluation goals ESF objectives: social protection, social inclusion, employment growth and labor market insertion Constructing performance measures for only a subset of objectives can contribute to distorted responses U.S. focus on earnings impacts reflects societal values Identify empirical measures (and data sources) for calculating outcomes and determine methods for assessing program performance and impacts
Knowledge creation under performance pressures. Calculating policy/program impacts (or added value) requires: observation and measurement of the program over time; careful construction of the counterfactual state (what would have happened in the absence of the program); hypothesis testing and estimation using quantitative (experimental or nonexperimental) methods. Performance analysis: relies on shorter-term measures for regularly monitoring program processes and outcomes (no counterfactual); may be characterized by a broader range of acceptable evidence, more participatory discourse and a larger role for the public in evaluation (e.g., citizen satisfaction measures).
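As a rough illustration of the counterfactual logic above: under random assignment, the control group's mean outcome stands in for what would have happened without the program, and the impact estimate is the difference in means. The sketch below is a minimal example under that assumption; the function name `estimate_impact` and all earnings figures are hypothetical and not taken from any evaluation.

```python
from math import sqrt
from statistics import mean, variance


def estimate_impact(treated, control):
    """Difference in mean outcomes, with an unpooled standard error.

    Assumes random assignment, so the control mean approximates the
    counterfactual outcome of the treated group.
    """
    impact = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return impact, se


# Hypothetical quarterly earnings (illustrative numbers only).
treated = [3200.0, 2900.0, 3500.0, 3100.0, 3300.0]
control = [2800.0, 2700.0, 3000.0, 2900.0, 2600.0]

impact, se = estimate_impact(treated, control)
t_stat = impact / se  # compared with a critical value to test "no impact"
print(f"impact={impact:.1f}, se={se:.1f}, t={t_stat:.2f}")
```

A performance measure, by contrast, would report only `mean(treated)` with no comparison to a counterfactual.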
Holy grail for policymakers and evaluators Identifying performance measures where effects of public sector actions on measured performance are aligned with effects of those same actions on policy or program added value Empirically, investigations look for strong correlation between public program performance measured with simple, short-term measures (e.g., job placements) and value added by programs over a longer term (e.g., net impacts on individual earnings)
Evidence from experimental evaluations. Weak relationships between simple, short-term outcome measures and impacts (added value). U.S. Job Training Partnership Act Study: measures of job entry at program exit (and 13 weeks after exit) were weakly or negatively correlated with earnings impacts measured 18 and 30 months after random assignment. U.S. National Job Corps youth training program evaluation: annual performance rankings produced to identify high and low performing centers (completed high school equivalency diploma, hourly wage, hours/week worked, and weeks in last 13 employed) were uncorrelated with earnings impacts estimated at 12, 30 and 48 months after random assignment.
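The kind of check described above can be sketched as a site-level correlation between a short-term performance measure and a longer-term impact estimate. This is a minimal sketch, not the method of either study: the helper `pearson` and the site-level figures are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical data for five sites (illustrative numbers only).
placement_rate = [0.55, 0.60, 0.65, 0.70, 0.75]     # short-term measure
earnings_impact = [300.0, 100.0, 250.0, 50.0, 200.0]  # longer-term net impact

r = pearson(placement_rate, earnings_impact)
print(f"correlation between placement rate and earnings impact: {r:.2f}")
```

A value of `r` near zero or below zero would mean that ranking sites by the short-term measure says little about which sites add the most value.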
Sensitivity of results to modeling and methodological choices Impact estimates sensitive to length of time over which outcomes are observed Meta-analyses of training programs (incl. a 2009 analysis using 199 estimates from 97 recent studies, 4/5 from Western Europe) find job training programs (with longer durations) have small or negative initial impacts, but impacts become positive in 2nd or 3rd years
Experimental vs. nonexperimental On the contrary, impact estimates differ little between experimental and nonexperimental studies Important recent study randomly assigned subjects to experimental and observational studies and corroborated this finding Random assignment frequently alters typical client selection process, while nonexperimental methods require modeling processes of selection into programs and production of outcomes
Accounting for selection and strategic responses By design, many evaluation systems involve stakeholders in tasks of producing evidence on effectiveness for performance accountability Accounting for who receives services is particularly important in performance assessment/accountability systems that establish standards for performance Research confirms strategic responses (cream-skimming, restricted access to services and other resource diversions to unproductive uses) Formal adjustments to performance standards can reduce risks of working with hard-to-serve groups
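One way a formal adjustment to a performance standard could work is sketched below: the standard is shifted according to how the local client mix differs from a reference mix, so that serving more hard-to-serve clients lowers the target. This is an illustrative assumption about the form of the adjustment, not a description of any actual system; the function `adjusted_standard`, the weights, and the client shares are all hypothetical.

```python
def adjusted_standard(base_standard, weights, local_mix, reference_mix):
    """Shift a performance standard by weighted differences in client mix.

    weights[k] is the assumed change in expected performance per unit
    increase in the share of clients with characteristic k.
    """
    adjustment = sum(
        weights[k] * (local_mix[k] - reference_mix[k]) for k in weights
    )
    return base_standard + adjustment


# Hypothetical inputs (illustrative numbers only).
base = 0.60  # unadjusted job-entry rate standard
weights = {"no_diploma": -0.20, "long_term_unemployed": -0.30}
reference_mix = {"no_diploma": 0.25, "long_term_unemployed": 0.20}
local_mix = {"no_diploma": 0.45, "long_term_unemployed": 0.40}

standard = adjusted_standard(base, weights, local_mix, reference_mix)
print(f"adjusted standard: {standard:.2f}")
```

With these inputs the local area serves more hard-to-serve clients than the reference, so its target falls below the base standard, which weakens the incentive to cream-skim.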
Balancing tradeoffs using multiple methods and measures More sophisticated methods for periodically assessing added value/impacts balanced with process and outcome measures that are more accessible and regularly available to public Different weights may be placed on measures in decision making Accommodating larger role for public in production/use of information while limiting vulnerability to politicization or inappropriate use of information by stakeholders Requires transparency in processes and methods for both producing and using information
Dialogue and transparency Citizens may contribute to interactive dialogues in learning forums, etc. Dialogue among policymakers and public after expert peer review to verify quality of performance/evaluation information Citizen satisfaction measures: beneficial if citizen satisfaction reports reflect service quality and improvement Research shows little relationship between self-reported satisfaction/outcomes and program impacts
Workforce Investment Act (WIA) Impact Evaluation Largest workforce development program in U.S.; program implementation differs by state and local area Distinctive features: universal access to employment and training services; One-Stop service delivery system to improve coordination and integration of employment and training services with other social services; use of training vouchers (Individual Training Accounts) to allow training recipients to purchase services from private sector Nonexperimental impact evaluation of WIA Adult and Dislocated Worker programs commissioned by U.S. Department of Labor in 2007
WIA programs evaluated. Two primary adult programs serve: disadvantaged workers (unemployed, underemployed and those in low-paying and unstable jobs); dislocated workers (have lost jobs or are scheduled to be laid off). Voluntary participation; universal access but some restrictions through sequencing of services. Core services: outreach, job search, placement aid, and labor market information. Intensive services: comprehensive assessments, individual employment plans, counseling and career planning. Training services: mostly occupational/vocational training, some on-the-job training.
Nonexperimental impact evaluation. Outcomes: earnings and employment four years after program entry. Estimation: average program impact across 12 states for WIA participants entering in program years 2003 and 2004 (relative to two comparison groups). Sample: approximately 160,000 WIA participants and 3 million comparison group members. Methodology: propensity score matching within states, by gender and within quarter of participation.
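The matching step above can be sketched as follows. This is a minimal illustration, not the evaluation's actual procedure: it assumes propensity scores have already been estimated (for example, by a logistic regression of participation on observed characteristics), uses one-to-one nearest-neighbor matching with replacement as a stand-in for whatever matching variant was used, and treats each (state, gender, entry quarter) combination as a stratum. The function `att_by_matching` and all records are hypothetical.

```python
from collections import defaultdict


def att_by_matching(participants, comparisons):
    """Average effect on participants via nearest-neighbor matching.

    Each record is a dict with keys "stratum", "pscore" and "outcome".
    Matches are drawn, with replacement, only from the same stratum.
    """
    pool = defaultdict(list)
    for c in comparisons:
        pool[c["stratum"]].append(c)

    diffs = []
    for p in participants:
        candidates = pool.get(p["stratum"])
        if not candidates:
            continue  # no comparison members in this stratum; skip
        match = min(candidates, key=lambda c: abs(c["pscore"] - p["pscore"]))
        diffs.append(p["outcome"] - match["outcome"])

    if not diffs:
        raise ValueError("no participant could be matched")
    return sum(diffs) / len(diffs)


# Hypothetical records (illustrative numbers only); outcome = quarterly earnings.
participants = [
    {"stratum": ("MO", "F", "2003Q3"), "pscore": 0.30, "outcome": 3400.0},
    {"stratum": ("MO", "F", "2003Q3"), "pscore": 0.60, "outcome": 3000.0},
    {"stratum": ("UT", "M", "2004Q1"), "pscore": 0.45, "outcome": 4100.0},
]
comparisons = [
    {"stratum": ("MO", "F", "2003Q3"), "pscore": 0.28, "outcome": 3000.0},
    {"stratum": ("MO", "F", "2003Q3"), "pscore": 0.65, "outcome": 2900.0},
    {"stratum": ("UT", "M", "2004Q1"), "pscore": 0.50, "outcome": 3900.0},
    {"stratum": ("UT", "M", "2004Q1"), "pscore": 0.10, "outcome": 2500.0},
]

att = att_by_matching(participants, comparisons)
print(f"estimated average impact on participants: {att:.2f}")
```

Restricting matches to the same stratum mirrors the "within states, by gender and within quarter of participation" design, so participants are compared only with people facing the same local labor market at the same time.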
Administrative data used to construct comparison groups. Comparison groups: drawn from programs with substantial overlap; individuals with employment problems seeking assistance; WIA participants receive the most intensive services. Programs: WIA; U.S. Employment Services; UI Claimants. 12 states: Connecticut, Indiana, Kentucky, Maryland, Mississippi, Missouri, Minnesota, Montana, New Mexico, Tennessee, Utah, Wisconsin.
Impact of WIA Training on Quarterly Earnings for Females
Impact of WIA Training on Quarterly Earnings for Males
Impact of Dislocated Worker Program on Female Quarterly Earnings
Impact of Dislocated Worker Program on Male Quarterly Earnings
Summary of WIA evaluation findings. Women obtain greater benefits from participation. More women receive classroom training; men are more likely to get on-the-job training. Value of training (compared to intensive or core services) is greater, particularly in the long run. Training recipients have lower initial earnings but catch up within 2.5 years. Differences in program impacts/patterns of impacts across states reflect differences in local economic environments, program structure, and labor force composition. Nonexperimental evaluation cost: less than $1.5 million (mostly for data assembly), or 1/20 of anticipated experimental evaluation costs.
Concluding comments Appropriate methodologies to apply in performance or impact evaluation depend on: Policy questions, goals and context, data quality/availability, and program design/implementation Policymakers will need to rely on some combination of impact evaluation and performance information in guiding policy and resource allocation decisions Tradeoffs in timeliness and accuracy (in measuring added value) and limitations inherent in data/methods should be acknowledged Encouraging dialogues about performance and impact evaluation findings is more likely to promote productive uses of evaluation results than attaching rewards or sanctions to results based on performance standards