Why Non-Experimental Methods are Not Good Enough and Why Experimental Methods Are: Challenging the Folk Lore of Evaluation Research
David Weisburd
Hebrew University, George Mason University

Oliver Wendell Holmes

Where I am Going
Describe how non-experimental evaluation studies attempt to gain unbiased results in a world where outcomes are confounded.
Define the fundamental weakness of this approach.
Critically examine the folk lore that suggests non-experimental studies are good enough despite this weakness.
Folk lore: the traditional beliefs, customs, and stories of a community, passed through the generations by word of mouth. (Oxford Pocket Dictionary)

Experiments are Good Enough
Experimental studies provide a statistical solution to the problem of confounding.
They should be good enough.
Critically examine the folk lore that seems to suggest that experiments are not good enough despite their statistical advantages.

Neutralizing Confounding in Non-Experimental Research

The Key Question
In evaluating treatments or programs, the key issue is getting an unbiased estimate of the treatment effect.
Without that, any other considerations, such as the ability to generalize results, are superfluous.
The main problem we face is that treatment is confounded with other factors.

The Problem We Need to Solve
Example: We measure the effect of prison on recidivism.
We find that prison increases recidivism.
But the reason for this may be that we have not taken into account the fact that prisoners are more likely to recidivate in the first place, because they have on average more severe prior records.
Treatment (prison) is confounded with prior record.
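The prison example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the probabilities below are hypothetical, not taken from any study, and prison is given no true effect at all. Because offenders with severe prior records are both more likely to be imprisoned and more likely to reoffend, the naive comparison still shows prison "increasing" recidivism.

```python
import random

def simulate_confounded(n=20000, seed=0):
    """Simulate offenders where prison has NO true effect on recidivism.

    All probabilities are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe_prior = rng.random() < 0.5
        # A severe prior record makes a prison sentence more likely.
        prison = rng.random() < (0.8 if severe_prior else 0.2)
        # Recidivism depends only on prior record, not on prison.
        recid = rng.random() < (0.6 if severe_prior else 0.3)
        rows.append((severe_prior, prison, recid))
    return rows

def recidivism_rate(rows, prison_value):
    """Share of offenders who reoffend among those with the given prison status."""
    group = [recid for (_, prison, recid) in rows if prison == prison_value]
    return sum(group) / len(group)

rows = simulate_confounded()
naive_diff = recidivism_rate(rows, True) - recidivism_rate(rows, False)
print(f"naive prison 'effect' on recidivism: {naive_diff:.3f}")
```

Under these made-up numbers the naive difference is clearly positive even though the true effect of prison is zero by construction.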

Creating Unbiased Estimates in Non-Experimental Studies
Non-experimental methods such as regression techniques or matching rely on a similar logic.
If we know what the factors are that confound treatment, we can take them into account.
The primary method of doing this is statistical (Multivariate Statistical Methods).
But Quasi-Experiments that rely on matching or propensity scores are based on the same logic.
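The logic of "taking a known confounder into account" can be sketched with simple stratification, a stand-in here for the regression and matching methods named above. The data and probabilities are hypothetical, and prison again has no true effect by construction. Comparing imprisoned and non-imprisoned offenders within each prior-record group removes the bias, but only because the single confounder is known and measured.

```python
import random

def simulate(n=20000, seed=1):
    """Hypothetical data: prior record drives both prison and recidivism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe_prior = rng.random() < 0.5
        prison = rng.random() < (0.8 if severe_prior else 0.2)
        recid = rng.random() < (0.6 if severe_prior else 0.3)  # no prison effect
        rows.append((severe_prior, prison, recid))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Prison minus no-prison recidivism rate, ignoring prior record."""
    return (mean([r for _, p, r in rows if p])
            - mean([r for _, p, r in rows if not p]))

def stratified_difference(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for stratum in (False, True):
        sub = [(p, r) for s, p, r in rows if s == stratum]
        diff = (mean([r for p, r in sub if p])
                - mean([r for p, r in sub if not p]))
        total += diff * len(sub)
    return total / len(rows)

rows = simulate()
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(f"naive: {naive:.3f}  adjusted for prior record: {adjusted:.3f}")
```

The adjusted estimate is close to zero in this sketch only because prior record is the sole confounder and it is observed; an unobserved confounder would leave the adjusted estimate biased.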

Solving the Problem Statistically: CC is the Confounding Cause

Elegant Solution, But…
If we want to get an unbiased estimate of treatment in a non-experimental study, we would in theory have to identify all confounding causes.
That assumption is on its face unrealistic, but evaluation researchers often use folk lore to argue that non-experimental studies are in any event good enough.
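The requirement can be written out with the standard omitted-variable-bias result. This is a sketch rather than the slide's own notation: the symbols $Y$ (outcome), $T$ (treatment), $CC$ (confounding cause), and the coefficients $\beta_0, \beta_1, \beta_2$ are assumed here for illustration, in a linear model with a single confounding cause.

```latex
% Assumed true model, with one confounding cause CC:
Y = \beta_0 + \beta_1 T + \beta_2\, CC + \varepsilon

% If CC is left out and Y is regressed on T alone, the estimated
% treatment coefficient \tilde{b}_1 converges to
\operatorname{plim}\ \tilde{b}_1
  = \beta_1 + \beta_2 \,\frac{\operatorname{Cov}(T, CC)}{\operatorname{Var}(T)}

% The bias term vanishes only if \beta_2 = 0 (CC does not affect Y)
% or \operatorname{Cov}(T, CC) = 0 (CC is unrelated to treatment).
```

With several excluded causes there is one such bias term per cause, so an unbiased estimate requires every cause that is related to both treatment and outcome to be in the model.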

The Folk Lore of Non- Experimental Evaluations Non-Experimental Methods are Good Enough?

1) Overall We Identify the Most Important Causes

Aren't we Doing Well Enough?
A common defense for non-experimental methods is that our models take into account the most important factors.
The assumption here is that in practice we don't have to worry about excluded variables.
The major ones (that might affect the outcomes in meaningful ways) are already known and accounted for in the model.

Impact of Small Excluded Effects With Little Influence is Small

How Well do Criminologists Explain Crime
Alex Piquero and I have recently published an article in Crime and Justice in which we examined this assumption.
We reviewed all the articles in Criminology that used multivariate statistical modeling to examine a criminological theory and provided some measurement of variance explained.
While my concern here is isolating a treatment effect, the question is similar, since we would not expect our understanding of treatments or programs to be very different than our underlying understanding of crime and justice.

Average Variance Explained
Across the articles that reported an R2 value over the time period covered, the average R2 was .389.
Some 25% of the 169 articles exhibit R2 values below .20, while over 70% have an R2 under .50.

Aggregate R2 Value over Time (N=169 articles). (Note: Years with zero observations are removed for ease of presentation.)

The Folk Lore is Most Likely Wrong
There is a good deal left unexplained, most often more than half the variance.
It would seem very difficult to assume that in all of this unexplained variance there are not very meaningful confounding factors that are routinely excluded.

2) If the Effect of Treatment is Large, Then You can Assume that Excluded Causes Would not Change that Estimate in a Meaningful Way

This Effect is Large Enough Not to Worry About!
Another folk lore often used to defend a reliance on non-experimental methods is that very large and robust effects are not likely to be meaningfully altered even if there are unmeasured confounding factors.
Statisticians, in contrast, have often noted the instability of regression parameters under differing assumptions.

AOC Death Penalty Study
Joe Naus from Rutgers University and I were asked by the AOC of New Jersey to assess the effects of race on death penalty sentencing.
Following an approach that identified major factors influencing death penalty sentencing, we developed a model that showed a very significant effect of race of victim on the likelihood of advancement to penalty trial.

White Victim is the Single Most Significant Effect on Advancement to Penalty Trial

Regional Effects
The State Prosecutor argued that the effect of race of victim was confounded by district of prosecution.
He noted that counties that had large numbers of white victims were places where it was more likely for a case to go to penalty trial for other reasons.
For example, the cases with large numbers of white victims were in counties with many fewer death-eligible cases. Prosecutors in such counties were more likely to focus more aggressively on such cases.

White Victim Controlling for County

3) We can Assume that the Biases are Balanced

Everything Will Balance Off in the End
A common folk lore is that the excluded variables balance each other, so we can assume that the parameter estimate is unbiased.
This assumption relies on a model in which the exclusion of variables is random, and therefore we would assume an unbiased estimate of b.
If this assumption had any basis to it, we could just rely on the bivariate model. No one would argue that the bivariate model provides an unbiased estimate!

Knowledge Development is not Random
Indeed, there is good reason to believe that we identify variables in clusters around specific theoretical constructs (like poverty or social disorganization).
By definition we are then missing clusters, which is likely to cause bias in specific directions.
Data restrictions (e.g. gathering official data) are likely to be even more systematic in their biases.

So Why are Experiments Good Enough?

Randomized Experiments: A Naïve Approach
Because treatment has been allocated randomly, in theory it is not going to be related systematically to other factors such as gender, race, age, attitudes, etc.
THERE ARE NO CONFOUNDING CAUSES!

No Confounding!

So Rather than Taking Confounding Causes Into Account, a Randomized Experiment Makes the Confounding Irrelevant
The product of the correlations is zero in a randomized experiment.
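A short simulation can illustrate why random allocation makes confounding irrelevant. Everything here is a hypothetical sketch: the outcome depends on a measured factor, an unmeasured factor, and an assumed true treatment effect of 0.10, while treatment is assigned by coin flip. Because assignment is unrelated to either factor, the simple difference in outcome rates recovers the true effect without any statistical adjustment.

```python
import random

TRUE_EFFECT = 0.10  # assumed for illustration only

def simulate_randomized(n=20000, seed=2):
    """Hypothetical experiment with coin-flip treatment assignment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe_prior = rng.random() < 0.5   # measured factor
        unmeasured = rng.random() < 0.5     # factor the analyst never sees
        treated = rng.random() < 0.5        # random allocation
        p_outcome = (0.2 + 0.2 * severe_prior + 0.2 * unmeasured
                     + TRUE_EFFECT * treated)
        outcome = rng.random() < p_outcome
        rows.append((severe_prior, unmeasured, treated, outcome))
    return rows

def mean(values):
    return sum(values) / len(values)

rows = simulate_randomized()
treated_rows = [r for r in rows if r[2]]
control_rows = [r for r in rows if not r[2]]

# Simple difference in outcome rates, with no adjustment at all.
estimate = (mean([r[3] for r in treated_rows])
            - mean([r[3] for r in control_rows]))

# Balance check: the unmeasured factor is about equally common in both arms.
imbalance = (mean([r[1] for r in treated_rows])
             - mean([r[1] for r in control_rows]))

print(f"estimated effect: {estimate:.3f} (true effect {TRUE_EFFECT})")
print(f"imbalance on unmeasured factor: {imbalance:.3f}")
```

In any single experiment the balance is approximate rather than exact; the claim is that treatment is unrelated to other factors in expectation, so the estimate is unbiased across repeated randomizations.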

The Folk Lore of Why Experiments are Not Good Enough

1) Experiments are Not Ethical
Many people still claim that it is not ethical to carry out social experiments.
It seems that, at least in crime and justice, evaluation researchers don't really accept this folk lore (Lum and Yang, 2003).

Randomized experimental design is the best method of linking cause and effect. t= -2.70* p=.010

Randomized experiments cannot be carried out ethically in criminal justice settings. t= p=.051

2) Experiments Cannot be Implemented in the Real World
Crime Reduction Experiments (N=267)

3) Experiments Have Low External Validity
Only innovative agencies are willing to participate in experiments.
Ordinary agencies may be brought on board if there is strong governmental encouragement and financial support that rewards participation.
Experiments operate in an artificial world that is controlled and not dynamic.
There is no free lunch!

Randomized Experiments are Good Enough
Everything Else is Commentary

The great Talmudic scholar Hillel responded, when asked to explain Judaism on one foot, that its essence was the dictum: Treat others as you would like them to treat you. He then noted that everything else is commentary.
In our case, the essence of evaluation research is that experiments are good enough.
Non-experimental methods are not good enough.

The Commentary
Of course, as in the case of Hillel, the commentary is very important.
But a simple rule should follow. We should begin any study with an assumption that an experimental design is required. We should only then get to the commentary.