© Institute for Fiscal Studies The role of evaluation in social research: current perspectives and new developments Lorraine Dearden, Institute of Education.


© Institute for Fiscal Studies
The role of evaluation in social research: current perspectives and new developments
Lorraine Dearden, Institute of Education & Institute for Fiscal Studies
Presentation for the Social Research Association (SRA) Annual Conference, British Library, 6th December 2012

Introduction
Understanding what works and what doesn't work is crucial in social research
To do this effectively, you need good qualitative work coupled with robust impact evaluation and an assessment of both costs and impact. Why?
– Inappropriate quantitative methods can often find a correlation between a policy and an outcome that is not causal, which is of no use for social research or policy making (Patrick's talk)
– Finding a causal ex post quantitative impact on an outcome of interest is generally only part of the story and will often not tell you why a policy is having an impact – you need qualitative work to help with this (William's talk) or more sophisticated evaluation models (dynamic structural models)
– Just because something has a quantitative impact doesn't mean it is a good policy if the costs are large – you need some assessment of both benefits and costs

The Evaluation Problem
Most empirical questions in social research can be set up in an evaluation framework
What most empirical social researchers want to ask is: what is the causal impact/effect of some program/variable of interest on an outcome of interest?
Good quantitative evaluation methods try to estimate this causal impact in a robust way
However, there is no off-the-shelf evaluation technique that can be used in all circumstances – assuming there is one is a mistake often made in social research
The best method depends on the nature of the intervention; the way it was introduced; the richness of the data; whether outcomes are also measured before the intervention...

The missing counterfactual and selection
Question we want to answer: what is the effect of some program/treatment on some outcome of interest, compared to the outcome if the program/treatment had not taken place?
The problem is that we never observe this missing counterfactual
This is fine if the program/treatment is randomly assigned, but in most social research settings this is not the case
Generally we have to construct a counterfactual group from those who don't get the treatment
But these two groups are generally systematically different from each other in both observed and unobserved characteristics, which means they are often not a good counterfactual group – the selection problem
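The selection problem can be seen in a small simulation. This is a hypothetical sketch, not from the talk: an unobserved "ability" variable drives both programme take-up and the outcome, so a naive comparison of group means badly overstates the true effect.

```python
import random

random.seed(0)

TRUE_EFFECT = 2.0

# Unobserved "ability" raises both programme take-up and the outcome,
# so treated and untreated groups differ systematically.
ability = [random.gauss(0, 1) for _ in range(10_000)]
treated = [1 if a + random.gauss(0, 1) > 0 else 0 for a in ability]
outcome = [TRUE_EFFECT * d + 3.0 * a + random.gauss(0, 1)
           for d, a in zip(treated, ability)]

def mean(xs):
    return sum(xs) / len(xs)

treated_mean = mean([y for y, d in zip(outcome, treated) if d == 1])
control_mean = mean([y for y, d in zip(outcome, treated) if d == 0])

# Naive "treated minus untreated" comparison: biased well above 2.0,
# because the untreated are not a good counterfactual group.
naive_estimate = treated_mean - control_mean
```

Because ability is unobserved, no amount of extra data on these two groups fixes the comparison; that is why the design (randomisation, or the methods below) matters.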

So how do we get around this selection problem?
Non-experimental evaluation techniques use a variety of statistical methods to identify the causal impact of a treatment on an outcome of interest
They generally rely on having good quality data and/or a natural or social experiment (policy accident/pilot study)
Methods differ in the assumptions they make in order to recover the missing counterfactual, but all try to replicate a randomised control trial
A brief tour of some of these:
– Matching methods
– Regression Discontinuity Design (Patrick already discussed)
– Instrumental Variable and Control Function methods (won't discuss)
– Difference-in-Difference (DID) methods
– Dynamic structural models

Matching Methods
Need to have a well-defined treatment and control group
Relies on having a rich set of pre-program/treatment variables for those who get the treatment and those who don't – the matching variables
The matching variables need to be good predictors of whether you get the treatment or not and/or of the outcome of interest
They need to be measured before the treatment – you cannot match on any variable which could be affected by the program/treatment
Crucial assumption: ALL relevant pre-treatment differences between the groups can be captured by the matching variables – the Conditional Independence Assumption (CIA)

How do you match? Regression models
Standard regression models are matching models, but with quite strong assumptions
Simply regress the outcome of interest on the matching variables and a treatment variable (a dummy variable for whether or not you receive the program/treatment)
– The coefficient on the treatment dummy variable gives you the effect
Some key assumptions:
– There is only selection on the basis of the matching variables
– A linear model can accurately specify the relationship between the matching variables and the outcome of interest
– The effect of the matching variables on the outcome of interest doesn't change as a result of the intervention
– Can relax (test) this last assumption within a regression framework by interacting the matching variables with the treatment variable
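The regression approach above can be sketched with the standard library alone. This is a toy example with one simulated matching variable x (in practice you would use a package such as statsmodels); it uses the Frisch–Waugh result that partialling x out of both the outcome and the treatment dummy recovers the coefficient on the dummy.

```python
import random

random.seed(1)

TRUE_EFFECT = 2.0
n = 5_000

# One observed matching variable x drives both treatment take-up and
# the outcome; conditioning on x removes the selection bias.
x = [random.gauss(0, 1) for _ in range(n)]
d = [1 if xi + random.gauss(0, 1) > 0 else 0 for xi in x]
y = [TRUE_EFFECT * di + 3.0 * xi + random.gauss(0, 1)
     for di, xi in zip(d, x)]

def mean(v):
    return sum(v) / len(v)

def slope(u, v):
    """OLS slope from regressing v on u (with an intercept)."""
    mu, mv = mean(u), mean(v)
    num = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    den = sum((ui - mu) ** 2 for ui in u)
    return num / den

# Frisch-Waugh: partial x out of both y and d; the slope of the
# y-residuals on the d-residuals equals the coefficient on the
# treatment dummy in the full regression of y on d and x.
mx, my, md = mean(x), mean(y), mean(d)
b_yx, b_dx = slope(x, y), slope(x, d)
res_y = [(yi - my) - b_yx * (xi - mx) for yi, xi in zip(y, x)]
res_d = [(di - md) - b_dx * (xi - mx) for di, xi in zip(d, x)]

treatment_effect = slope(res_d, res_y)  # close to TRUE_EFFECT
```

Note the strong assumptions baked in: selection is on x only, and x enters the outcome linearly.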

Propensity Score Matching (PSM)
A more flexible matching method, but more computationally demanding
Involves selecting from the non-treated pool a control group in which the distribution of the observed/matching variables is as similar as possible to the distribution in the treated group
– This is done by deriving weights which make the control group look like the treatment group in terms of the matching variables
There are a number of ways of doing this, but they almost always involve calculating the propensity score
The propensity score is the predicted probability of being in the treatment group, given your matching characteristics
– Can estimate this using traditional regression techniques; significant variables in this estimation will pick up matching variables that systematically differ between the two groups
Rather than matching on all the matching variables, you can match on this propensity score alone (Rosenbaum and Rubin, 1983)
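A minimal sketch of the propensity score step, assuming a logit model with a single illustrative matching variable x and fitting it by gradient ascent on the log-likelihood. This hand-rolled fitter is only for illustration; in practice you would use a standard routine such as statsmodels' Logit.

```python
import math
import random

random.seed(2)

# Toy data: one matching variable x; higher x means more likely treated.
n = 2_000
x = [random.gauss(0, 1) for _ in range(n)]
d = [1 if xi + random.gauss(0, 1) > 0 else 0 for xi in x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(v):
    return sum(v) / len(v)

# Fit P(d = 1 | x) = sigmoid(a + b*x) by gradient ascent on the
# average log-likelihood (2000 small steps is ample for 2 parameters).
a, b = 0.0, 0.0
for _ in range(2_000):
    ga = sum(di - sigmoid(a + b * xi) for di, xi in zip(d, x)) / n
    gb = sum((di - sigmoid(a + b * xi)) * xi for di, xi in zip(d, x)) / n
    a += 0.5 * ga
    b += 0.5 * gb

# The propensity score: each unit's predicted probability of treatment.
pscore = [sigmoid(a + b * xi) for xi in x]
```

A significantly positive b confirms that x systematically differs between the two groups, which is exactly what makes it a matching variable worth conditioning on.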

How do we match using the propensity score?
Nearest neighbour matching
– Each person in the treatment group is matched to the individual in the control group with the closest propensity score
– Can be done with (most common) or without replacement
– Not very efficient, as it discards a lot of information about the control group (all unmatched people are thrown away, and some individuals may be used many times)
Kernel-based matching
– Each person in the treatment group is matched to a weighted sum of individuals (with weights adding to one) who have similar propensity scores, with the greatest weight given to people with closer scores
– Some methods use ALL people in the non-treated group (e.g. Gaussian kernel), whereas others only use people within a certain range (e.g. Epanechnikov)
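Nearest-neighbour matching with replacement is simple enough to sketch directly. The (score, outcome) pairs below are made up for illustration; real scores would come from a fitted propensity score model.

```python
# Hypothetical (propensity score, outcome) pairs for treated and
# control units.
treated = [(0.81, 5.0), (0.62, 4.1), (0.55, 3.9)]
controls = [(0.80, 3.2), (0.60, 2.0), (0.30, 1.1), (0.52, 1.8)]

def nearest_neighbour_att(treated, controls):
    """One-to-one nearest-neighbour matching with replacement:
    each treated unit is compared with the control whose propensity
    score is closest (the same control may be reused)."""
    diffs = []
    for p_t, y_t in treated:
        p_c, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

att = nearest_neighbour_att(treated, controls)  # average effect on the treated
```

Note the inefficiency the slide mentions: the control at score 0.30 is never used, and with a different draw of treated units one control could be matched many times.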

Estimated impact with PSM
Compare the mean outcome in the treated group to the appropriately weighted mean outcome in the control group (using the propensity score weights)
So you are just comparing two (weighted) means
There is no guarantee that you can come up with matching weights that make the two groups look the same in terms of the matching variables
– Quite common if the two groups are fundamentally different
– Can drop those for whom you can't find matches (imposing common support), but then the effect is measured only on a sub-sample
You don't have to specify how the matching variables affect the outcome, so PSM is much more flexible and robust than regression methods, but much less efficient
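The kernel-weighted comparison of means, including the common-support drop, might look like the sketch below. The scores, outcomes and the bandwidth of 0.1 are all invented for illustration; the last treated unit has no control within the bandwidth and so is dropped.

```python
# Hypothetical (propensity score, outcome) pairs; the 0.95 treated
# unit is off common support for this bandwidth.
treated = [(0.81, 5.0), (0.62, 4.1), (0.55, 3.9), (0.95, 6.0)]
controls = [(0.80, 3.2), (0.60, 2.0), (0.30, 1.1), (0.52, 1.8)]

def epanechnikov(u):
    """Epanechnikov kernel: positive weight only within one bandwidth."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def kernel_att(treated, controls, bandwidth=0.1):
    """Compare each treated outcome with a kernel-weighted mean of
    control outcomes (weights normalised to sum to one)."""
    diffs = []
    for p_t, y_t in treated:
        weights = [epanechnikov((p_c - p_t) / bandwidth)
                   for p_c, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no controls nearby: off common support, drop
        y_hat = sum(w * y_c for w, (_, y_c) in zip(weights, controls)) / total
        diffs.append(y_t - y_hat)
    return sum(diffs) / len(diffs) if diffs else None

att = kernel_att(treated, controls)
```

Because one treated unit was dropped, this estimate applies only to the sub-sample on common support, exactly the caveat in the slide.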

Difference-in-difference methods
The DID approach uses a natural experiment to mimic the randomisation of a social experiment
Natural experiment: some naturally occurring event which creates a policy shift for one group and not another
– E.g. a change in policy in one jurisdiction but not another
The difference between the two groups' before-and-after changes in outcomes gives the estimate of the policy impact
Requires either longitudinal data on the same person/firm/area, or repeated cross-section data on similar persons/firms/areas (where samples are drawn from the same population) before and after the intervention
Assumes that the change that occurs in the control group would have happened to the treatment group in the absence of the policy change, so any additional change is the impact of the policy (the ATT)
Can do matched DID (DID using propensity score weights)
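The DID calculation itself is just a difference of two differences. The mean outcomes below (say, employment rates in two jurisdictions) are invented for illustration.

```python
# Hypothetical mean outcomes before and after a policy change in a
# treated jurisdiction and an untouched comparison jurisdiction.
treated_before, treated_after = 52.0, 60.0
control_before, control_after = 50.0, 54.0

# The control group's change proxies what would have happened to the
# treated group without the policy; the extra change is the ATT.
did_estimate = ((treated_after - treated_before)
                - (control_after - control_before))
```

Here the treated group improved by 8 points and the control by 4, so the policy is credited with the remaining 4, provided the common-trends assumption holds.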

Dynamic Structural Models
The evaluation techniques described so far are for analysing the impact of changes we have observed (ex post)
– Results are specific to the policy, time and environment
How can we model the impact of future (ex ante) policy reforms?
Build a dynamic structural model of impact, based on theory, which can disentangle the impact of the programme on incentives from how incentives affect individual decisions (cf. traditional evaluation methods)
Use existing quasi-experimental results/data to estimate and validate (calibrate) the models
If this succeeds, use the model to simulate the impact of new policies
Very difficult and computationally complex, so models to date tend to be very simple, but increasing computing power means this is a new and exciting area in evaluation, and one of great policy interest

Conclusions
A number of options are available when evaluating whether something has an impact, or is likely to have an impact, in social research
Which to use depends on the nature of the intervention, the available data, and the question you want to answer
Each method has advantages and disadvantages and involves assumptions that may or may not be credible – all these factors have to be carefully assessed
The new PEPA Node based at IFS will be looking at improving/developing programme evaluation methods for policy analysis, as well as running a comprehensive training and capacity-building programme