Why Non-Experimental Methods are Not Good Enough and Why Experimental Methods Are: Challenging the Folk Lore of Evaluation Research
David Weisburd
Hebrew University, George Mason University

Oliver Wendell Holmes

Where I am Going
Describe how non-experimental evaluation studies attempt to gain unbiased results in a world where outcomes are confounded.
Define the fundamental weakness of this approach.
Critically examine the folklore that suggests non-experimental studies are good enough despite this weakness.
Folk lore: the traditional beliefs, customs, and stories of a community, passed through the generations by word of mouth. (Oxford Pocket Dictionary)

Experiments are Good Enough
Experimental studies provide a statistical solution to the problem of confounding.
They should be good enough.
Critically examine the folk lore that seems to suggest that experiments are not good enough despite their statistical advantages.

Neutralizing Confounding in Non-Experimental Research

The Key Question
In evaluating treatments or programs, the key issue is getting an unbiased estimate of the treatment effect.
Without that, any other considerations, such as the ability to generalize results, are superfluous.
The main problem we face is that treatment is confounded with other factors.

The Problem We Need to Solve
Example: We measure the effect of prison on recidivism.
We find that prison increases recidivism.
But the reason for this may be that we have not taken into account the fact that prisoners are more likely to recidivate in the first place because they have, on average, more severe prior records.
Treatment (prison) is confounded with prior record.
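The prison example can be illustrated with a small simulation. This is only a sketch under invented assumptions: every probability below is hypothetical, the names `simulate_observational`, `naive_difference`, and `stratified_difference` are introduced here for illustration, and prison is given no true effect on recidivism. The naive comparison still shows prison "increasing" recidivism, and the gap shrinks toward zero once prior record is held constant.

```python
import random

def simulate_observational(n=20000, seed=0):
    """Simulate offenders whose sentence depends on prior record.

    All probabilities are hypothetical. Prison has NO true effect here;
    a severe prior record raises both the chance of prison and the
    chance of recidivism.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe_prior = rng.random() < 0.5
        # Confounding: a severe prior record makes prison more likely.
        prison = rng.random() < (0.8 if severe_prior else 0.2)
        # The outcome depends only on prior record, not on prison.
        recid = rng.random() < (0.6 if severe_prior else 0.3)
        rows.append((severe_prior, prison, recid))
    return rows

def recidivism_rate(rows):
    return sum(1 for r in rows if r[2]) / len(rows)

def naive_difference(rows):
    """Recidivism rate of the imprisoned minus that of the non-imprisoned."""
    treated = [r for r in rows if r[1]]
    control = [r for r in rows if not r[1]]
    return recidivism_rate(treated) - recidivism_rate(control)

def stratified_difference(rows):
    """Average the prison/no-prison gap within each prior-record stratum."""
    total = 0.0
    for level in (True, False):
        stratum = [r for r in rows if r[0] == level]
        total += naive_difference(stratum) * len(stratum) / len(rows)
    return total

rows = simulate_observational()
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

The adjustment works here only because the single confounder is known and measured, which is exactly the condition the following slides question.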

Creating Unbiased Estimates in Non-Experimental Studies
Non-experimental methods such as regression techniques or matching rely on a similar logic.
If we know what the factors are that confound treatment, we can take them into account.
The primary method of doing this is statistical (multivariate statistical methods).
But quasi-experiments that rely on matching or propensity scores are based on the same logic.

Solving the Problem Statistically: CC is the Confounding Cause
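The diagram for this slide is not part of the transcript. As a hedged sketch of the statistical logic, the standard omitted-variable result for a linear model can be written as follows. The notation (Y for the outcome, T for treatment, CC for the confounding cause, and the coefficient symbols) is assumed here rather than taken from the slides, and the result presumes an error term uncorrelated with T and CC.

```latex
% Assumed notation, not from the slides; requires amsmath.
\begin{align}
Y &= \beta_0 + \beta_1 T + \beta_2\, CC + \varepsilon
  && \text{(model that includes the confounding cause)} \\
Y &= b_0 + b_1 T + u
  && \text{(model that omits } CC\text{)} \\
E[\hat{b}_1] &= \beta_1 + \beta_2 \,\frac{\operatorname{Cov}(T, CC)}{\operatorname{Var}(T)}
  && \text{(bias from omitting } CC\text{)}
\end{align}
```

The bias term vanishes only if CC has no effect on the outcome or is uncorrelated with treatment. Including CC in the model removes it, but only for confounders that are actually measured.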

Elegant Solution, But…
If we want to get an unbiased estimate of treatment in a non-experimental study, we would in theory have to identify all confounding causes.
That assumption is on its face unrealistic, but evaluation researchers often use folk lore to argue that non-experimental studies are in any event good enough.

The Folk Lore of Non-Experimental Evaluations
Non-Experimental Methods are Good Enough?

1) Overall We Identify the Most Important Causes

Aren't We Doing Well Enough?
A common defense for non-experimental methods is that our models take into account the most important factors.
The assumption here is that in practice we don't have to worry about excluded variables.
The major ones (that might affect the outcomes in meaningful ways) are already known and accounted for in the model.

Impact of Small Excluded Effects With Little Influence is Small

How Well do Criminologists Explain Crime?
Alex Piquero and I have recently published an article in Crime and Justice in which we examined this assumption.
We reviewed all the articles in Criminology that used multivariate statistical modeling to examine a criminological theory and provided some measurement of variance explained.
While my concern here is isolating a treatment effect, the question is similar, since we would not expect our understanding of treatments or programs to be very different from our underlying understanding of crime and justice.

Average Variance Explained
Across the articles that reported an R² value over the time period covered, the average R² was .389.
Some 25% of the 169 articles exhibit R² values below .20, while over 70% have an R² under .50.
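As a small worked step, the reported average can be restated as the share of variance left unexplained. The only input is the .389 figure from the slide; the bar notation for the average is an assumption of this sketch.

```latex
% Average unexplained variance implied by the reported average R^2.
\[
1 - \overline{R^2} = 1 - 0.389 = 0.611
\]
```

On average, then, roughly 61% of the variance in these models is unexplained.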

Aggregate R² Value over Time (N=169 articles). (Note: Years with zero observations are removed for ease of presentation.)

The Folk Lore is Most Likely Wrong
There is a good deal left unexplained, most often more than half the variance.
It would seem very difficult to assume that in all of this unexplained variance there are not very meaningful confounding factors that are routinely excluded.

2) If the Effect of Treatment is Large then You can Assume that Excluded Causes Would not Change that Estimate in a Meaningful Way

This Effect is Large Enough Not to Worry About!
Another piece of folk lore often used to defend a reliance on non-experimental methods is that very large and robust effects are not likely to be meaningfully altered even if there are unmeasured confounding factors.
Statisticians, in contrast, have often noted the instability of regression parameters under differing assumptions.

AOC Death Penalty Study
Joe Naus from Rutgers University and I were asked by the AOC of New Jersey to assess the effects of race on death penalty sentencing.
Following an approach that identified major factors influencing death penalty sentencing, we developed a model that showed a very significant effect of race of victim on the likelihood of advancement to penalty trial.

White Victim is the Single Most Significant Effect on Advancement to Penalty Trial

Regional Effects
The State Prosecutor argued that the effect of race of victim was confounded by district of prosecution.
He noted that counties that had large numbers of white victims were places where it was more likely for a case to go to penalty trial for other reasons.
For example, the cases with large numbers of white victims were in counties with many fewer death-eligible cases. Prosecutors in such counties were more likely to focus more aggressively on those cases.

White Victim Controlling for County

3) We can Assume that the Biases are Balanced

Everything Will Balance Off in the End
A common folklore is that the excluded variables balance each other, so we can assume that the parameter estimate is unbiased.
This assumption relies on a model in which the exclusion of variables is random, and therefore we would assume an unbiased estimate of b.
If this assumption had any basis to it, we could just rely on the bivariate model. No one would argue that the bivariate model provides an unbiased estimate!
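What "balancing" would require can be made explicit with the standard omitted-variable result for several excluded causes. This is a sketch in assumed notation, not taken from the slides: b is the estimated treatment coefficient from a model containing only treatment T, β is the true treatment effect, and CC_1, ..., CC_K are excluded causes with effects γ_k on the outcome.

```latex
% Assumed notation, not from the slides; requires amsmath.
\[
E[\hat{b}] - \beta \;=\; \sum_{k=1}^{K} \gamma_k \,\frac{\operatorname{Cov}(T, CC_k)}{\operatorname{Var}(T)}
\]
```

The estimate is unbiased only if these K terms sum to zero. That happens if every excluded cause is uncorrelated with treatment or has no effect, or if positive and negative terms happen to cancel exactly, and nothing in the way variables come to be excluded guarantees such cancellation.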

Knowledge Development is not Random
Indeed, there is good reason to believe that we identify variables in clusters around specific theoretical constructs (like poverty or social disorganization).
By definition we are then missing clusters, which are likely to cause bias in specific directions.
Data restrictions (e.g. gathering official data) are likely to be even more systematic in their biases.

So Why are Experiments Good Enough?

Randomized Experiments: A Naïve Approach
Because treatment has been allocated randomly, in theory it is not going to be related systematically to other factors such as gender, race, age, attitudes, etc.
THERE ARE NO CONFOUNDING CAUSES!

No Confounding!

So Rather than Taking Confounding Causes Into Account, a Randomized Experiment Makes the Confounding Irrelevant
The product of the correlations is zero in a randomized experiment.
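The point about the correlations can be checked with a short simulation. This is a sketch with invented numbers: the probabilities, the hypothetical true effect of -0.10, and the names `pearson` and `simulate_experiment` are all assumptions made for illustration. Treatment is assigned by coin flip, so its correlation with the confounding cause is zero in expectation (and close to zero in any one large sample), and the simple difference in outcome rates lands near the true effect without adjusting for anything.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate_experiment(n=20000, true_effect=-0.10, seed=1):
    """Randomized experiment with hypothetical numbers.

    The confounding cause still raises the outcome rate, but treatment
    is assigned by coin flip and therefore ignores it.
    """
    rng = random.Random(seed)
    treat, confounder, outcome = [], [], []
    for _ in range(n):
        cc = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < 0.5 else 0  # random allocation
        p = 0.3 + 0.3 * cc + true_effect * t
        y = 1 if rng.random() < p else 0
        treat.append(t)
        confounder.append(cc)
        outcome.append(y)
    return treat, confounder, outcome

treat, confounder, outcome = simulate_experiment()
r_t_cc = pearson(treat, confounder)
treated = [y for t, y in zip(treat, outcome) if t == 1]
control = [y for t, y in zip(treat, outcome) if t == 0]
estimate = sum(treated) / len(treated) - sum(control) / len(control)
print(f"correlation(treatment, confounder): {r_t_cc:.3f}")
print(f"estimated treatment effect:         {estimate:.3f}")
```

Because one of the two correlations is near zero, their product is too, whatever the strength of the confounder's link to the outcome.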

The Folk Lore of Why Experiments are Not Good Enough

1) Experiments are Not Ethical
Many people still claim that it is not ethical to carry out social experiments.
It seems that, at least in crime and justice, evaluation researchers don't really accept this folk lore (Lum and Yang, 2003).

Randomized experimental design is the best method of linking cause and effect. (t = -2.70*, p = .010)

Randomized experiments cannot be carried out ethically in criminal justice settings. (t = -1.98, p = .051)

2) Experiments Cannot be Implemented in the Real World
Crime Reduction Experiments 1945-1993 (N=267)

3) Experiments Have Low External Validity
Only innovative agencies are willing to participate in experiments.
Ordinary agencies may be brought on board if there is strong governmental encouragement and financial support that rewards participation.
Experiments operate in an artificial world that is controlled and not dynamic.
There is no free lunch!

Randomized Experiments are Good Enough
Everything Else is Commentary

The great Talmudic scholar Hillel, when asked to explain Judaism on one foot, responded that its essence was the dictum: "Treat others as you would like them to treat you." He then noted that everything else is commentary.
In our case, the essence of evaluation research is that experiments are good enough.
Non-experimental methods are not good enough.

The Commentary
Of course, as in the case of Hillel, the commentary is very important.
But a simple rule should follow. We should begin any study with an assumption that an experimental design is required. We should only then get to the commentary.
