Clash of Causal Inference Techniques


Clash of Causal Inference Techniques
Terrence Willett and Nathan Pellegrin
RP Group Conference, Kellogg West, April 2014
Image credits: The Clash "London Calling" album cover from http://en.wikipedia.org/wiki/File:TheClashLondonCallingalbumcover.jpg; Basil Rathbone as Sherlock Holmes photo by employee(s) of Universal Studios (photograph in possession of SchroCat) [public domain], via Wikimedia Commons


Outcomes
- Describe the purpose of regression and propensity score matching (PSM)
- Explain the data requirements and basic procedure of regression and PSM
- Compare and contrast regression and PSM
- Identify additional resources for further exploration

Why causal inference?
"If you need to use statistics, then you should design a better experiment." –attributed to Rutherford
Most education research is observational/correlational, not experimental.
(Atom image from http://simple.wikipedia.org/wiki/File:Stylised_Lithium_Atom.png)

Common scenario
- Did participation in an activity, class, or support service result in better outcomes for students than they would have had without participating?
- Students self-selected to participate and/or were recruited to participate.
- When participants are compared to non-participants, differences in outcomes can be attributed to differences in background variables or motivation rather than to the program itself.
- Can we determine whether the participation caused a change in outcomes? No, but…

Regression
$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n + e$
- The classic correlational technique
- Covariates are included in the model to attempt to control for differences in background variables or motivation
- Background variables can include measures of, or proxies for, skill level, social capital, or socio-economic status
- Measures of self-motivation are often unavailable
- Models are imperfect and generally must be combined with other evidence to more completely describe the possible influence of an intervention, program, or strategy (a minimal code sketch follows below)
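A minimal sketch of the regression approach on simulated data, using statsmodels OLS; the coefficient on the participation flag is the covariate-adjusted estimate of the effect. The column names (y, treated, prior_gpa, low_income) and the data-generating process are illustrative assumptions, not the presenters' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000
# Simulated stand-in for an institutional dataset (all names illustrative).
prior_gpa = rng.normal(2.8, 0.5, n)
low_income = rng.binomial(1, 0.4, n)
# Participation depends on background -> self-selection.
treated = rng.binomial(1, 1 / (1 + np.exp(-(prior_gpa - 2.8 - 0.5 * low_income))))
# Outcome depends on background and on participation (true effect = 0.3).
y = 0.5 * prior_gpa - 0.2 * low_income + 0.3 * treated + rng.normal(0, 0.5, n)
df = pd.DataFrame(dict(y=y, treated=treated, prior_gpa=prior_gpa, low_income=low_income))

# The coefficient on `treated` is the covariate-adjusted difference in outcomes
# between participants and non-participants.
fit = smf.ols("y ~ treated + prior_gpa + low_income", data=df).fit()
print(fit.params["treated"], fit.bse["treated"])
```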

Propensity Score Matching
- One of several ways to create a matched comparison group of non-participants intended to be similar to the participant group, so that a valid comparison can be made
- Logistic regression or other techniques are used to create a score indicating the likelihood that a particular non-participant would have been a participant; non-participants are then matched to participants with similar scores (a sketch of the score estimation follows below)
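A minimal sketch of the score itself, not the presenters' procedure: a logistic regression of a 0/1 participation flag on background covariates, fitted with statsmodels; the fitted probabilities are the propensity scores. The column names (treated, prior_gpa, low_income) and the simulated data are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"prior_gpa": rng.normal(2.8, 0.5, n),
                   "low_income": rng.binomial(1, 0.4, n)})
# Participation probability rises with prior GPA (illustrative self-selection).
df["treated"] = rng.binomial(1, 1 / (1 + np.exp(-(df.prior_gpa - 2.8 - 0.5 * df.low_income))))

# The propensity score is the fitted probability of participation given covariates.
ps_fit = smf.logit("treated ~ prior_gpa + low_income", data=df).fit(disp=False)
df["pscore"] = ps_fit.predict(df)
print(df.groupby("treated")["pscore"].describe())
```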

The counterfactual (potential outcomes) framework
$\Delta = Y^t - Y^c$
- This is the causal effect of some event (participation, treatment, intervention) on an individual.
- $Y^t$ is the potential outcome under treatment.
- $Y^c$ is the potential outcome under non-treatment (control, the counterfactual) and thus cannot be observed in our universe.*
*In light of recently published evidence for inflationary theories of the cosmos, we note that we may be living in a multiverse (Alan Guth, 2014). Counterfactual conditions may obtain in alternate universes.

The potential outcomes matrix

Actual treatment status (T) | $Y^t$ (under treatment) | $Y^c$ (under control)
T = 1                       | observable              | not observable
T = 0                       | not observable          | observable

Average treatment effect (ATE)
- Participants and non-participants differ systematically (with respect to demographics, trajectories, risk profiles, self-selection, etc.).
- Different people respond differently to treatment (differential response).
- These facts must be taken into account when modeling/computing treatment effects, which means all four cells of the matrix must be estimated in order to obtain an average treatment effect. How?
- Programs (treatments) are often designed for and targeted toward those who are expected to gain the most from participation ("positive selection"); the potential treatment effect can differ for participants and non-participants.

Symbolic derivation of average effects
(1) $\Delta_i = Y_i^t - Y_i^c$
(2) $Y_i = T_i Y_i^t + (1 - T_i) Y_i^c$   (the observed outcome)
(3) $E[\Delta] = E[Y_i^t] - E[Y_i^c]$   (ATE)
(4) $E_N[Y_i \mid T_i = 1] - E_N[Y_i \mid T_i = 0]$   (the naive contrast of observed group means)
(5) $E[\Delta] = \{\rho E[Y_i^t \mid T_i = 1] + (1-\rho) E[Y_i^t \mid T_i = 0]\} - \{\rho E[Y_i^c \mid T_i = 1] + (1-\rho) E[Y_i^c \mid T_i = 0]\}$, where $\rho$ is the proportion treated
(6) $E[\Delta \mid T_i = 1] = E[Y_i^t \mid T_i = 1] - E[Y_i^c \mid T_i = 1]$   (ATT)
(7) $E[\Delta \mid T_i = 0] = E[Y_i^t \mid T_i = 0] - E[Y_i^c \mid T_i = 0]$   (ATU)
(A simulation sketch follows below.)
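A simulation sketch of (3), (4), (6), and (7) under an assumed positive-selection data-generating process, purely to illustrate why the naive contrast in (4) can differ from the ATE, ATT, and ATU. Both potential outcomes are known here only because we generated them; in real data one of the two is always missing.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical data-generating process with positive selection: students with
# higher baselines and larger individual gains are more likely to participate.
baseline = rng.normal(0.0, 1.0, n)
gain = rng.normal(0.3, 0.2, n)                  # individual treatment effect
y_c = baseline                                  # potential outcome, untreated
y_t = baseline + gain                           # potential outcome, treated
t = rng.random(n) < 1 / (1 + np.exp(-(baseline + 3 * gain)))

ate = np.mean(y_t - y_c)                        # eq. (3)
att = np.mean(y_t[t] - y_c[t])                  # eq. (6)
atu = np.mean(y_t[~t] - y_c[~t])                # eq. (7)
naive = y_t[t].mean() - y_c[~t].mean()          # eq. (4): observed means only

print(f"ATE={ate:.3f}  ATT={att:.3f}  ATU={atu:.3f}  naive={naive:.3f}")
```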

Conditional independence (CI), a.k.a. perfect stratification (PS) or selection on observables (SOO)
- If our observations include information on every one of the variables influencing the likelihood of participation or differential response*, then it is possible to avoid omitted-variable bias and so achieve CI (PS, SOO).
- And if we have CI, this is like randomized assignment within levels of the covariates:
$E[Y^t \mid T = 1, \mathbf{X}] = E[Y^t \mid T = 0, \mathbf{X}]$
$E[Y^c \mid T = 1, \mathbf{X}] = E[Y^c \mid T = 0, \mathbf{X}]$
These equations capture the conditions that randomized assignment achieves by design. (A stratification sketch follows below.)
* How often do we encounter such datasets in institutional research?
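A sketch of what perfect stratification buys: with a single discrete covariate X that fully determines selection (an assumption built into this simulation), the within-stratum treated-versus-untreated differences, averaged over the distribution of X, recover the treatment effect while the pooled comparison does not. All names and values are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 20_000

# One discrete confounder X (say, a placement level); CI holds by construction
# because participation depends only on X.
x = rng.integers(0, 3, n)
t = rng.random(n) < np.array([0.2, 0.5, 0.8])[x]
y = 1.0 * x + 0.3 * t + rng.normal(0, 1, n)     # true effect = 0.3
df = pd.DataFrame({"x": x, "t": t.astype(int), "y": y})

# Within-stratum treated-vs-untreated differences, weighted by stratum size.
by = df.groupby(["x", "t"])["y"].mean().unstack()
diff = by[1] - by[0]
weights = df["x"].value_counts(normalize=True).sort_index()
ate_strat = (diff * weights).sum()

# The pooled comparison ignores the confounding through X and is biased upward.
naive = df.loc[df.t == 1, "y"].mean() - df.loc[df.t == 0, "y"].mean()
print(f"stratified estimate = {ate_strat:.3f}, naive pooled = {naive:.3f}")
```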

Assuming CI…
Under CI, the cells of the potential outcomes matrix that were not observable can be estimated from the other treatment group within levels of X:

Actual treatment status (T) | $Y^t$ (under treatment) | $Y^c$ (under control)
T = 1                       | observable              | estimable under CI
T = 0                       | estimable under CI      | observable

Average treatment effect (ATE), continued
- PSM offers a way to plug values into (5), (6), and (7) to obtain unbiased estimates.
- However, if we have CI, then why not just estimate something like $Y_i = \alpha + \gamma T_i + \boldsymbol{\beta}' \mathbf{X}_i + \varepsilon_i$ (or one of many other regression techniques)?
- There is no hard and fast answer to this question; decisions are based on pragmatic considerations. However…
- Programs (treatments) are often designed for and targeted toward those who are expected to gain the most from participation ("positive selection"); the potential treatment effect can differ for participants and non-participants.

Example

Pros and cons
- Regressions can be "easier" to run but harder to explain to a general audience; PSM can be more time-consuming to conduct but easier to explain to a general audience.
- Regressions tend to perform better with large data sets, while PSM tends to perform better with few observations, provided the non-participant pool has sufficient numbers of individuals with similar values on the key confounding variables.
- Regressions have been used for many years and are well described mathematically, with broad consensus on proper error terms; PSM is newer, and there is not yet consensus on optimal matching procedures or proper error terms.
- Regression will use all cases with non-missing data, while PSM may use only a subset of cases from the pool of non-participants.
- All analytic methods suffer if key variables are not available.
- Conclusions are often the same with either method.

How to run PSM
1. Create the data file (95% of the effort).
2. Match participants and non-participants on a set of control variables to create a comparison group with similar proportions on all characteristics (i.e., the comparison group has a similar percent female, Hispanic, low income, etc. to the participant group). This step is referred to as "balancing" and generally must be repeated several times, adjusting matching criteria or removing variables, until balance is obtained on all variables of interest.
3. Run comparative analyses, which can include simple t-tests, post-PSM regressions, or other techniques.
Software: major packages that conduct PSM include Stata, R, and SAS. Stata 12 and older use psmatch2; Stata 13 has teffects psmatch. Note that SPSS/PASW does not do PSM directly, but there is an R plugin for SPSS: http://arxiv.org/ftp/arxiv/papers/1201/1201.6385.pdf
(A worked sketch of steps 2-3 follows below.)
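A sketch of steps 2-3 on simulated data, using one of many possible matching schemes (the deck notes there is no consensus on the optimal procedure): 1:1 nearest-neighbor matching with replacement on the logit of the propensity score with a 0.2-standard-deviation caliper, a standardized-mean-difference balance check, and a simple matched mean difference. The column names, caliper rule, and data-generating process are illustrative assumptions, not the presenters' procedure.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
n = 5000
df = pd.DataFrame({"prior_gpa": rng.normal(2.8, 0.5, n),
                   "low_income": rng.binomial(1, 0.4, n)})
p_treat = 1 / (1 + np.exp(-(df.prior_gpa - 2.8 - 0.5 * df.low_income)))
df["treated"] = rng.binomial(1, p_treat)
df["y"] = 0.5 * df.prior_gpa - 0.2 * df.low_income + 0.3 * df.treated + rng.normal(0, 0.5, n)

# Estimate propensity scores and work on the logit scale.
ps = smf.logit("treated ~ prior_gpa + low_income", data=df).fit(disp=False).predict(df)
df["logit_ps"] = np.log(ps / (1 - ps))
treated = df[df.treated == 1]
control = df[df.treated == 0]

# Step 2: 1:1 nearest-neighbor matching with replacement, caliper of
# 0.2 standard deviations of the logit score (a common rule of thumb).
caliper = 0.2 * df["logit_ps"].std()
nn = NearestNeighbors(n_neighbors=1).fit(control[["logit_ps"]])
dist, idx = nn.kneighbors(treated[["logit_ps"]])
keep = dist.ravel() <= caliper
matched_t = treated[keep]
matched_c = control.iloc[idx.ravel()[keep]]

# Balance check: standardized mean differences (standardized by the overall SD,
# one common variant); values near zero indicate acceptable balance.
for v in ["prior_gpa", "low_income"]:
    smd = (matched_t[v].mean() - matched_c[v].mean()) / df[v].std()
    print(f"SMD after matching, {v}: {smd:.3f}")

# Step 3: compare outcomes in the matched sample (here a simple mean difference;
# standard errors for matched estimates need more care than shown here).
print("Matched ATT estimate:", matched_t["y"].mean() - matched_c["y"].mean())
```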

An alternative strategy for deciding this battle (warning: empiricists approaching)

Alternative perspectives
Estimating program effects from observational data can also be understood as:
- Delimiting the error "built into" inductive reasoning about causes (the problem of induction)
- An "inverse problem" in the study of social dynamics
- A missing data problem
- An optimization problem
From these alternative perspectives there is a large menu of methods and extensions, including Euclidean, Mahalanobis, and Gower's distances, nearest neighbors, cosine similarity, kernel functions, genetic algorithms, imputation, dimensionality reduction, and other supervised/unsupervised learning algorithms.
A criterion that can be applied to regression, PSM, and other methods is this: how well do they predict new (future) observations (false positives, false negatives, correlated errors)? In empirical investigations we do not answer this question just once: we use new replications, or sampling under varying conditions, and the criterion applies in any situation where predictions are made about new (possibly future) observations. We update our models based on new evidence. In this respect, regression and PSM can both be used as tools of discovery: ways to extend our understanding of the processes producing the patterns we find in sets of observations (and in streams of information generally).
SO: CHOOSE THE METHOD/MODEL THAT YIELDS THE SMALLEST PREDICTION ERROR. That may decide a battle in a particular setting (or on a particular occasion), but the war between methods will go on…
A number of the studies reviewed here used a comparison scheme in which a dataset is constructed with a known ATE, and estimates of the ATE are then recovered via PSM and regression. However, OLS regression is typically used, and it has been recognized for decades that OLS has limitations; many techniques have been developed to address them (ridge regression, quantile regression, bootstrapping, non-linear models, …). "Empirical investigations" here includes applied economics and the quantitative side of business. (A cross-validation sketch follows below.)
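One hedged way to operationalize the "smallest prediction error" criterion above: compare candidate outcome models by out-of-sample error, here 5-fold cross-validated mean squared error on simulated data with OLS and ridge regression as the candidates. The models, data, and error metric are illustrative choices, not a recommendation from the deck.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 2000
X = rng.normal(size=(n, 5))                      # illustrative covariates
t = rng.binomial(1, 0.5, n)                      # treatment indicator
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.0]) + 0.3 * t + rng.normal(0, 0.5, n)
features = np.column_stack([t, X])

# Compare candidate outcome models by out-of-sample mean squared error.
for name, model in [("OLS", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    mse = -cross_val_score(model, features, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.4f}")
```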

"The inability to predict outliers implies the inability to predict the course of history." — Nassim Nicholas Taleb, The Black Swan: The Impact of the Highly Improbable
"If you insist on strict proof (or strict disproof) in the empirical sciences, you will never benefit from experience, and never learn from it how wrong you are." — Karl Raimund Popper, The Logic of Scientific Discovery (2002), 28.

Further reading
- Angrist, J. D., & Pischke, J.-S. (2008). Mostly Harmless Econometrics: An Empiricist's Companion.
- Morgan, S., & Harding, D. (2006). Matching Estimators of Causal Effects: From Stratification and Weighting to Practical Data Analysis Routines.
- Caliendo, M., & Kopeinig, S. (2005). Practical Guide for PSM. www.caliendo.de/Papers/practical_revised_DP.pdf
- Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.
- Padgett, R., Salisbury, M., An, B., & Pascarella, E. (2010). Required, practical, or unnecessary? An examination and demonstration of propensity score matching using longitudinal secondary data. New Directions for Institutional Research, Assessment Supplement, 29-42. San Francisco, CA: Jossey-Bass.
- Soledad Cepeda, M., Boston, R., Farrar, J., & Strom, B. (2003). Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. American Journal of Epidemiology, 158, 280-287.
- Propensity score software list: http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html

Thank you
Terrence Willett, Director of Planning, Research, and Knowledge Systems, Cabrillo College, terrence@cabrillo.edu
Nathan Pellegrin, Data Processing Specialist, Peralta District, npellegrin@peralta.edu