NASPA ASSESSMENT & RETENTION CONFERENCE, JUNE 11, 2010 FORREST LANE UNIVERSITY OF NORTH TEXAS Why Propensity Score Matching should be used to Assess Programmatic Effects

Presentation transcript:

NASPA ASSESSMENT & RETENTION CONFERENCE, JUNE 11, 2010 FORREST LANE UNIVERSITY OF NORTH TEXAS Why Propensity Score Matching should be used to Assess Programmatic Effects

Contact Information 6/11/2010 Forrest Lane, NASPA Assessment & Retention Conference Center for Interdisciplinary Research & Analysis (CIRA) Department of Educational Psychology University of North Texas

Program Outline
- Assessment Practices within Post-Secondary Education
- Challenges to Quasi-Experimental Evaluation and Assessment Methods
- Propensity Score Matching
- Heuristic Example

Educational Assessment
There is increased importance placed on modeling the relationship between university resources and institutional outcomes. From a student development perspective, this is often done by evaluating program effects.
- First Year Programming
  - Orientation, New Student Camps, Freshman Seminars
- Co-curricular Activities
  - Greek Life, Student Activities, Community Involvement
- Service-Learning Initiatives
- Living Learning Communities

Modeling Effects
To assess programmatic effects accurately, cause and effect need to be established.
- May use control/comparison groups (experimental design)
- If quasi-experimental:
  - Participants have traditionally been matched on demographic or other relevant variables
  - Or matched on some pre-treatment outcome (examination of baseline differences)

Example from the Literature
The effects of on- and off-campus living arrangements were explored with regard to students’ openness to diversity.
- The 13 variables in the model were analyzed using path modeling.
- Spurious effects were modeled on background characteristics.
Results indicated that living on campus was directly associated with significantly higher levels of openness to diversity than living off campus.
Pike, G. (2009). The differential effects of on- and off-campus living arrangements on students’ openness to diversity. Journal of Student Affairs Research & Practice, 46(4).

Commonly Reported Limitations
“The fact that students self selected into different residential communities represents another potential limitation of the research. Females, minority students, and higher-ability students were over-represented in the research sample due to the under-representation of off-campus students. Although background differences were accounted for in the study, the possibility remains that the residence groups might have differed in ways that were not explored” (Pike, 2009, p. 639).

Empirical Problems with Self-Selection
True randomization is rarely an option in educational assessment (Luellen, Shadish, & Clark, 2005; Grunwald & Mayhew, 2008).
As a result, there is an abundance of, and often an over-reliance on, reported effects that may inadequately address the variables contributing to differences in treatment group selection.
Non-randomized groups may differ systematically from one another on any number of covariates.
This leads to biased effect sizes when interpreting treatment effects.

Experimental vs. Quasi-Experimental
Under true randomization, groups can be compared directly to one another because systematic differences have been controlled through the experimental design: the probability of group membership is equal (p = .50).
In quasi-experimental designs, group differences arise from non-randomization, so the groups cannot be compared directly to one another: the probability of group membership is not equal (p ≠ .50).

The ANCOVA Problem
ANCOVA is often used to control for differences on an outcome of interest based on theoretically relevant covariates.
Controlling for covariates on an outcome is theoretically different from matching participants on their likelihood of being in a treatment group (the independent variable).
- Covariates that control for outcome differences may or may not have anything to do with group membership or self-selection.

Solution to Quasi-Experimental Designs
Propensity score matching (PSM) is used to estimate the true treatment effect and to reduce group bias due to non-randomization.
- Participants are matched across groups on their likelihood of group membership.
- Recommended by the U.S. Department of Education to improve the quality of quasi-experimental research (Glenn, 2005).
- Increasingly used in medical and economic research since the mid-1980s.

Defining a Propensity Score
A propensity score is defined as the conditional probability of assignment to a particular treatment or control condition given a set of covariates (Rosenbaum & Rubin, 1983b).
Propensity scores combine the covariates into a single scalar variable ranging from 0 to 1.
This new scalar variable can then be used to match participants across treatment groups.
Once participants are matched, treatment effects should be more reflective of the true effect and can be interpreted analogously to those from randomized designs.
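In symbols, the definition above can be written as follows. The notation is an assumed convention rather than something taken from the slides: Z is the treatment indicator (1 = treated, 0 = control), X is the vector of covariates, and e(x) is the propensity score.

```latex
e(\mathbf{x}) = \Pr(Z = 1 \mid \mathbf{X} = \mathbf{x}), \qquad 0 \le e(\mathbf{x}) \le 1
```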

Calculating Propensity Scores
The most commonly used method is logistic regression.
Other methods include classification trees and ensemble methods such as bagging, boosted regression trees, and random forests (Shadish, Luellen, & Clark, 2006).
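A minimal sketch of the logistic regression approach, written in plain Python so it runs without extra packages. It is only an illustration: the presentation's own analysis was run in SPSS, the data below are simulated, and the covariate names (`test_score_z`, `in_state`) and the fitting routine (`fit_logistic`, simple gradient descent) are hypothetical choices made for this example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_predictor(w, row):
    """Intercept w[0] plus the weighted sum of the covariates in row."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent.

    X: list of covariate rows, y: list of 0/1 group indicators.
    Returns the weights with the intercept first.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, t in zip(X, y):
            err = sigmoid(linear_predictor(w, row)) - t
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, w):
    """Predicted probability of treatment for every row."""
    return [sigmoid(linear_predictor(w, row)) for row in X]

# Simulated (hypothetical) data: a standardized test score and an
# in-state flag, with participation depending on both.
random.seed(0)
X = []
for _ in range(200):
    test_score_z = random.gauss(0, 1)
    in_state = 1.0 if random.random() < 0.5 else 0.0
    X.append([test_score_z, in_state])
y = [1 if random.random() < sigmoid(0.8 * a - 0.5 * b) else 0 for a, b in X]

w = fit_logistic(X, y)
ps = propensity_scores(X, w)  # one score per student, each between 0 and 1
```

The saved predicted probabilities play the same role as the "predicted probabilities" a statistics package stores after a logistic regression.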

PSM in the Literature
Grunwald & Mayhew (2008) examined the development of moral reasoning in young adults and demonstrated a significant reduction in the overestimation of effects.
Morgan (2001) used propensity score matching and demonstrated that the effect of private school education on math and reading achievement is actually larger than findings in non-matched samples suggest.
Similar results have been demonstrated in economics (Dehejia & Wahba, 2002), medicine (Schafer & Kang, 2008), and sociology (Morgan & Harding, 2006).

PSM in the Literature
Over 1,000 articles using propensity score matching were found in JSTOR among sociology, economics, and medical journals, yet the method remains virtually absent from educational research and assessment.

PSM in Higher Education Literature
The following reflects a search for propensity score matching techniques in the literature between the years of [years not preserved].

Journal                                          Articles using PSM
Journal of College Student Development           0
Journal of Student Affairs Research & Practice   0
Journal of College & Character                   0
Journal of Higher Education                      1
Review of Higher Education                       2
Research in Higher Education                     4

Heuristic Example
College X believes participation in a living learning community (LLC) contributes to better academic performance (GPA).
A sample of 30 students was collected.
Data were examined to determine whether academic performance among LLC participants was statistically and meaningfully different from that of students who did not participate in an LLC.

Pre-Matching Achievement Scores
[Table: columns N, M, SD, t, df, p, d; rows Non-Participants and LLC Participants; cell values not preserved.]
[Figure: Non-LLC (3.21) vs. LLC (3.43); the gap is labeled "Biased Treatment Effect".]

Propensity Score Calculation
Logistic regression was performed in SPSS 18.0 with the following covariates to predict participation in an LLC**
- In-State vs. Out-of-State
- Legacy
- PSAT Scores
- SAT Scores
- Gender
Predicted probabilities were saved in the analysis.
**Covariates should be theoretically driven variables that contribute to group membership, not to the outcome of interest.

Pre-Matching Propensity Scores
[Table: columns N, M, SD, t, df, p, d; rows Non-Participants and LLC Participants; cell values not preserved.]
[Figure: scale from 0 (Unlikely to be in LLC) to 1 (Likely to be in LLC); Non-LLC (.380) vs. LLC (.565); the gap is labeled "Amount of Bias".]

Propensity Score Matching
Balance groups on covariates through matching, regression adjustment, or stratification.
- Stratification across quintiles is the recommended and most common method.
- Shown to reduce approximately 90% of bias due to covariates (Rosenbaum & Rubin, 1983b; Rosenbaum & Rubin, 1984; Shadish, Luellen, & Clark, 2005).
Caliper matching can also substantially reduce bias (Rosenbaum & Rubin, 1985b).
- A caliper of 0.25 standard deviations of the logit transformation of the propensity score can also work well to reduce bias (Stuart & Rubin, 2007, ¶4.3.3).
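A rough sketch of the two ideas on this slide: sorting cases into propensity score quintiles and computing a caliper as 0.25 standard deviations of the logit of the propensity score. The function names, the rank-based way of forming quintiles, the size-weighted averaging of stratum differences, and all data values are assumptions made for illustration; statistical packages may form strata and weights differently.

```python
import math
import statistics

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def caliper_width(scores, multiplier=0.25):
    """Caliper = multiplier x sample SD of the logit propensity scores."""
    return multiplier * statistics.stdev([logit(p) for p in scores])

def quintile_strata(scores):
    """Assign each case a stratum 0-4 by rank, giving near-equal strata."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    strata = [0] * n
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // n
    return strata

def stratified_effect(outcomes, treated, strata):
    """Size-weighted average of within-stratum mean differences.

    Strata lacking either treated or control cases are skipped.
    """
    total, used = 0.0, 0
    for s in range(5):
        t = [y for y, z, g in zip(outcomes, treated, strata) if g == s and z == 1]
        c = [y for y, z, g in zip(outcomes, treated, strata) if g == s and z == 0]
        if t and c:
            n_s = len(t) + len(c)
            total += n_s * (statistics.mean(t) - statistics.mean(c))
            used += n_s
    return total / used if used else float("nan")

# Hypothetical propensity scores, group indicators, and GPA-like outcomes
scores = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
treated = [0, 1] * 5
strata = quintile_strata(scores)
outcomes = [2.0 + 0.1 * g + 0.5 * z for g, z in zip(strata, treated)]

effect = stratified_effect(outcomes, treated, strata)
width = caliper_width(scores)
```

In this toy data the outcome rises with the stratum and with treatment, so comparing groups only within strata recovers the built-in treatment difference of 0.5.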

Matching Algorithms
- MatchIt in R (Ho, Imai, King, & Stuart, 2007)
- PSMATCH2 algorithm in Stata (Leuven & Sianesi, 2004)
- SUGI “GREEDY” macro in SAS (D’Agostino, 1998)
- SPSS algorithm (Painter, 2009)
  - Core code written by Raynald Levesque and adapted for use with propensity matching by John Painter, Feb 2004
  - Program developed and tested with SPSS 11.5
  - Procedure will find the best match for each treatment case from the control cases
  - Control case is then removed and not reconsidered for subsequent matches
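The matching behavior described in the last two bullets (find the closest control for each treatment case, then retire that control) can be sketched as below. This is not the Levesque/Painter SPSS code or any of the packages listed; `greedy_match`, its optional `caliper` argument, and the scores are hypothetical, and distances are taken on the raw propensity score scale for simplicity.

```python
def greedy_match(treated_ps, control_ps, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Treatment cases are processed in the order given. Returns a list of
    (treated_index, control_index) pairs; a treatment case whose nearest
    remaining control lies outside the caliper stays unmatched.
    """
    available = set(range(len(control_ps)))
    pairs = []
    for ti, tp in enumerate(treated_ps):
        if not available:
            break
        # Closest remaining control; ties go to the lower index.
        best = min(available, key=lambda ci: (abs(control_ps[ci] - tp), ci))
        if caliper is None or abs(control_ps[best] - tp) <= caliper:
            pairs.append((ti, best))
            available.remove(best)  # not reconsidered for later matches
    return pairs

# Hypothetical propensity scores
treated_ps = [0.62, 0.55, 0.90]
control_ps = [0.60, 0.58, 0.30, 0.50]

pairs = greedy_match(treated_ps, control_ps, caliper=0.05)
```

Here the third treatment case (0.90) has no control within the caliper and is dropped, which is one reason the matched sample is smaller than the original sample.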

Assessing Matched Samples
Some ways of assessing balance (Rubin, 2001):
- The standardized difference in the mean propensity score in the two groups should be near zero (d < .20).
- The ratio of the variance of the propensity score in the two groups should be near one, preferably between 0.80 and [upper bound not preserved].
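A small sketch of the two balance checks named on this slide. The pooled standard deviation used for the standardized difference (the square root of the average of the two group variances) is an assumed convention, and the propensity scores are made up for illustration. Only the d < .20 guideline from the slide is applied; the variance ratio is reported without a cutoff.

```python
import math
import statistics

def standardized_difference(a, b):
    """Mean difference divided by the pooled SD of the two groups."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

def variance_ratio(a, b):
    """Ratio of sample variances; values near 1 indicate similar spread."""
    return statistics.variance(a) / statistics.variance(b)

# Hypothetical propensity scores for the two groups
ps_llc = [0.50, 0.60, 0.70]
ps_non = [0.40, 0.50, 0.60]

d = standardized_difference(ps_llc, ps_non)
ratio = variance_ratio(ps_llc, ps_non)
balanced_on_mean = abs(d) < 0.20  # guideline quoted on the slide
```

In this toy case the spreads are identical but the means are a full pooled standard deviation apart, so the groups would not count as balanced.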

Post-Matching Propensity Scores
[Table: columns N, M, SD, t, df, p, d; rows Non-Participants and LLC Participants; cell values not preserved.]
[Figure: scale from 0 (Unlikely to be in LLC) to 1 (Likely to be in LLC); Non-LLC (.487) vs. LLC (.476).]

Histogram of Post-Matching PS Differences
[Figure: histogram of post-matching propensity score differences.]

Post-Matching Achievement Scores
[Table: columns N, M, SD, t, df, p, d; rows Non-Participants and LLC Participants; cell values not preserved.]
[Figure: Non-LLC (3.32) vs. LLC (3.44); the gap is labeled "True Treatment Effects".]

Limitations & Cautions
- Algorithms across various platforms make different assumptions about how to treat data.
- Matched data sets tend to be more homogeneous than randomized samples.
- The pre-matched sample size (n) and post-matched sample size (n) will not be equal, and the difference should be taken into account with regard to statistical power.
- Propensity score matching typically requires larger sample sizes.

References
D’Agostino, R. B. (1998). Tutorial in biostatistics: Propensity score methods for bias reduction in the comparison of treatment to a non-randomized control group. Statistics in Medicine, 17.
National Research Council (2000). Scientific research in education. Washington, D.C.: National Academy Press.
Glenn, D. (2005, March). New federal policy favors randomized trials in education research. The Chronicle of Higher Education. Retrieved December 5, 2009.
Grunwald, H. E., & Mayhew, M. J. (2008). The use of propensity scores in identifying a comparison group in a quasi-experimental design: Moral reasoning development as an outcome. Research in Higher Education, 49(8).
Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15.
Leuven, E., & Sianesi, B. (2004). PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Statistical Software Components S432001, Boston College Department of Economics.
Morgan, S. L. (2001). Counterfactuals, causal effect heterogeneity, and the Catholic school effect on learning. Sociology of Education, 74, 341–374.
Morgan, S., & Harding, D. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods & Research, 35(1).
Painter, J. (2009). Jordan Institute for Families: Virtual research community.
Pike, G. (2009). The differential effects of on- and off-campus living arrangements on students’ openness to diversity. Journal of Student Affairs Research & Practice, 46(4).
Rosenbaum, P. R., & Rubin, D. B. (1983b). The central role of the propensity score in observational studies for causal effects. Biometrika, 70.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387).
Rubin, D. B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology, 2, 169–188.
Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13(4).
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs (report from the Governing Board of the American Educational Research Association Grants Program). Washington, DC: American Educational Research Association.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2005). Propensity scores: An introduction and experimental test. Evaluation Review, 29(6).
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin & P. E. McKnight (Eds.), Strengthening research methodology: Psychological measurement and evaluation (pp. 143–157). Washington, DC: American Psychological Association.
Stuart, E. A., & Rubin, D. B. (2008). Matching methods for causal inference: Designing observational studies. In J. Osborne (Ed.), Best practices in quantitative methods. Thousand Oaks, CA: Sage Publishing.