NASPA ASSESSMENT & RETENTION CONFERENCE, JUNE 11, 2010 FORREST LANE UNIVERSITY OF NORTH TEXAS Why Propensity Score Matching should be used to Assess Programmatic.


1 NASPA ASSESSMENT & RETENTION CONFERENCE, JUNE 11, 2010 FORREST LANE UNIVERSITY OF NORTH TEXAS Why Propensity Score Matching should be used to Assess Programmatic Effects

2 Contact Information 6/11/2010 Forrest Lane, NASPA Assessment & Retention Conference Center for Interdisciplinary Research & Analysis (CIRA) Department of Educational Psychology University of North Texas Forrest.lane@unt.edu

3 Program Outline
• Assessment Practices within Post-Secondary Education
• Challenges to Quasi-Experimental Evaluation and Assessment Methods
• Propensity Score Matching
• Heuristic Example

4 Educational Assessment
Modeling the link between university resources and institutional outcomes is increasingly important. From a student development perspective, this is often done by evaluating program effects.
• First Year Programming
  • Orientation, New Student Camps, Freshman Seminars
• Co-curricular Activities
  • Greek Life, Student Activities, Community Involvement
• Service-Learning Initiatives
• Living Learning Communities

5 Modeling Effects
To assess programmatic effects accurately, cause and effect need to be established.
• May use control/comparison groups (experimental design)
• If quasi-experimental:
  • Participants have traditionally been matched on demographic or other relevant variables
  • Or matched on some pre-treatment outcome (examination of baseline differences)

6 Example from the Literature
The effects of on- and off-campus living arrangements were explored with regard to students’ openness to diversity.
• The 13 variables in the model were analyzed using path modeling.
• Spurious effects were modeled on background characteristics.
Results indicated that living on campus was directly associated with significantly higher levels of openness to diversity than living off campus.
Pike, G. (2009). The differential effects of on- and off-campus living arrangements on students’ openness to diversity. Journal of Student Affairs Research & Practice, 46(4), 629-645.

7 Commonly Reported Limitations
“The fact that students self selected into different residential communities represents another potential limitation of the research. Females, minority students, and higher-ability students were over-represented in the research sample due to the under-representation of off-campus students. Although background differences were accounted for in the study, the possibility remains that the residence groups might have differed in ways that were not explored” (Pike, 2009, p. 639).

8 Empirical Problems with Self-Selection
True randomization is rarely an option in educational assessment (Luellen, Shadish, & Clark, 2005; Grunwald & Mayhew, 2008).
As a result, there is an abundance of, and often an over-reliance on, reported effects that may inadequately address the variables contributing to differences in treatment group selection.
Non-randomized groups may differ systematically from one another on any number of covariates.
This leads to biased effect sizes when interpreting treatment effects.

9 Experimental vs. Quasi-Experimental
With true randomization, groups can be compared directly because systematic differences have been controlled through the experimental design: the probability of group membership is equal (p = .50).
In quasi-experimental designs, group differences arise from non-randomization, so the groups cannot be compared directly: the probability of group membership is not equal (p ≠ .50).

10 The ANCOVA Problem
ANCOVA is often used to control for differences on an outcome of interest based on theoretically relevant covariates. Controlling for covariates on an outcome is theoretically different from matching participants on their likelihood of being in a treatment group (the independent variable).
• Covariates that control for outcome differences may or may not have anything to do with group membership or self-selection.

11 Solution to Quasi-Experimental Designs
Propensity score matching (PSM) is used to estimate the true treatment effect and to reduce group bias due to non-randomization.
• Participants are matched across groups on their likelihood of group membership.
• Recommended by the U.S. Department of Education as a method to improve the quality of quasi-experimental research (Glenn, 2005).
• Increasingly used in medical and economic research since the mid-1980s.

12 Defining a Propensity Score
A propensity score is defined as the conditional probability of assignment to a particular treatment or control condition given a set of covariates (Rosenbaum & Rubin, 1983b).
Propensity scores combine the covariates into a single scalar variable ranging from 0 to 1.
This new scalar variable can then be used to match participants across treatment groups.
Once matched, treatment effects should be more reflective of the true effect and can be interpreted analogously to those from randomized designs.
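In symbols, one common way to write this definition is given below. The notation is an assumption added for illustration and does not appear on the slide: Z_i is 1 if participant i is in the treatment group and 0 otherwise, and X_i is that participant's vector of observed covariates.

```latex
e(\mathbf{x}_i) = \Pr\left(Z_i = 1 \mid \mathbf{X}_i = \mathbf{x}_i\right),
\qquad 0 < e(\mathbf{x}_i) < 1
```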

13 Calculating Propensity Scores
The most commonly used method is logistic regression.
Other methods include classification trees or ensemble methods such as bagging, boosted regression trees, and random forests (Shadish, Luellen, & Clark, 2006).
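As a rough illustration of the logistic regression approach, the sketch below fits a small model by gradient descent in plain Python and returns each case's predicted probability of being in the treatment group. The covariates (an in-state flag and a standardized test score), the data values, and the function names are hypothetical and not taken from the presentation; in practice a statistics package would normally do the fitting.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, treated, learning_rate=0.1, epochs=5000):
    """Fit a logistic regression by batch gradient descent.

    rows    -- list of covariate lists, one per participant
    treated -- list of 0/1 group indicators (1 = treatment group)
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        gradient = [0.0] * (k + 1)
        for row, z in zip(rows, treated):
            linear = weights[0] + sum(w * x for w, x in zip(weights[1:], row))
            error = sigmoid(linear) - z
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, gradient)]
    return weights


def propensity_scores(rows, weights):
    """Predicted probability of treatment for each participant."""
    return [
        sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))
        for row in rows
    ]


# Hypothetical data: [in_state (0/1), standardized test score]
covariates = [
    [1, -1.2], [1, -0.5], [0, -0.8], [0, 0.1], [1, 0.3],
    [0, 0.9], [1, 1.1], [0, 1.5], [1, -0.1], [0, 0.4],
]
group = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]  # 1 = participated in the program

weights = fit_logistic(covariates, group)
scores = propensity_scores(covariates, weights)
```

Each value in `scores` lies between 0 and 1 and plays the role of the saved predicted probability that is later used for matching.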

14 PSM in the Literature
Grunwald & Mayhew (2008) examined the development of moral reasoning in young adults and demonstrated a significant reduction in the overestimation of effects.
Morgan (2001) used propensity score matching and demonstrated that the effect of private school education on math and reading achievement is actually larger than the findings from non-matched samples suggest.
Similar studies have been conducted in economics (Dehejia & Wahba, 2002), medicine (Schafer & Kang, 2008), and sociology (Morgan & Harding, 2006).

15 PSM in the Literature
Over 1,000 articles using propensity score matching were found in JSTOR among sociology, economics, and medical journals, yet the method remains virtually absent from educational research and assessment methods.

16 PSM in Higher Education Literature
The following reflects a search for propensity score matching techniques in the literature between 1996 and 2010.

Journal                                          Articles using PSM
Journal of College Student Development           0
Journal of Student Affairs Research & Practice   0
Journal of College & Character                   0
Journal of Higher Education                      1
Review of Higher Education                       2
Research in Higher Education                     4

17 Heuristic Example
College X believes participation in a Living Learning Community (LLC) contributes to better academic performance (GPA).
A sample of 30 students was collected.
Data were examined to determine whether academic performance among LLC participants was statistically and meaningfully different from that of students who do not participate in an LLC.

18 Pre-Matching Achievement Scores

Group              N    M      SD     t       df   p      d
Non-Participants   16   3.21   .323   1.795   28   .084   .660
LLC Participants   14   3.43   .565

[Figure: GPA scale from 3.0 to 4.0 marking the Non-LLC mean (3.21) and the LLC mean (3.43); the gap between them is labeled "Biased Treatment Effect".]

19 Propensity Score Calculation
Logistic regression was performed in SPSS 18.0 using the following covariates to predict participation in an LLC:**
• In-State vs. Out-of-State
• Legacy
• PSAT Scores
• SAT Scores
• Gender
Predicted probabilities were saved from the analysis.
**Covariates should be theoretically driven variables that contribute to group membership, not to the outcome of interest.

20 Pre-Matching Propensity Scores

Group              N    M      SD     t       df   p      d
Non-Participants   16   .380   .226   2.534   28   .017   .942
LLC Participants   14   .565   .161

[Figure: propensity scale from 0 ("Unlikely to be in LLC") to 1 ("Likely to be in LLC") marking the Non-LLC mean (.380) and the LLC mean (.565); the gap between them is labeled "Amount of Bias".]

21 Propensity Score Matching
Balance groups on covariates through matching, regression adjustment, or stratification.
• Stratification across quintiles is the recommended and most common method.
  • Shown to reduce approximately 90% of bias due to covariates (Rosenbaum & Rubin, 1983b; Rosenbaum & Rubin, 1984; Shadish, Luellen, & Clark, 2005)
Caliper matching can also substantially reduce bias (Rosenbaum and Rubin, 1985b).
• A caliper of 0.25 standard deviations of the logit transformation of the propensity score can also work well to reduce bias (Stuart & Rubin, 2007, ¶4.3.3).
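The two adjustments named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the presentation: quintiles are formed by rank on the pooled propensity scores, strata missing either group are skipped, and all data values and function names are hypothetical.

```python
import math
import statistics


def logit(p):
    """Log-odds of a propensity score strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_width(scores, multiplier=0.25):
    """Caliper equal to `multiplier` SDs of the logit of the propensity score."""
    return multiplier * statistics.stdev([logit(p) for p in scores])


def quintile_strata(scores):
    """Assign each case to stratum 1..5 by rank on the pooled scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(scores) + 1
    return strata


def stratified_effect(scores, treated, outcome):
    """Size-weighted mean of within-stratum treated-minus-control differences."""
    strata = quintile_strata(scores)
    weighted_sum, total_weight = 0.0, 0
    for s in range(1, 6):
        t = [y for y, z, q in zip(outcome, treated, strata) if q == s and z == 1]
        c = [y for y, z, q in zip(outcome, treated, strata) if q == s and z == 0]
        if not t or not c:
            continue  # stratum lacks one of the groups, so no comparison is possible
        weight = len(t) + len(c)
        weighted_sum += weight * (statistics.mean(t) - statistics.mean(c))
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


# Hypothetical propensity scores, group indicators, and GPA outcomes
ps = [0.15, 0.22, 0.31, 0.38, 0.45, 0.52, 0.58, 0.66, 0.74, 0.81]
in_program = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
gpa = [3.0, 3.2, 3.1, 3.3, 3.2, 3.4, 3.3, 3.5, 3.4, 3.6]

effect = stratified_effect(ps, in_program, gpa)
width = caliper_width(ps)
```

Comparing cases only within the same stratum keeps the comparison among participants with similar likelihoods of group membership, which is the point of both stratification and caliper matching.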

22 Matching Algorithms
MatchIt in R (Ho, Imai, King, and Stuart, 2007)
PSMATCH2 algorithm in STATA (Leuven & Sianesi, 2004)
SUGI 214-26 “GREEDY” Macro in SAS (D’Agostino, 1998)
SPSS algorithm (Painter, 2009)
• Core code written by Raynald Levesque and adapted for use with propensity matching by John Painter, Feb 2004
• Program developed and tested with SPSS 11.5
• Procedure will find the best match for each treatment case from the control cases
• Control case is then removed and not reconsidered for subsequent matches
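The last two bullets describe greedy nearest-neighbor matching without replacement. The sketch below illustrates that general idea in Python; it is not a port of the SPSS, SAS, Stata, or R routines listed above, and the function name, the optional caliper argument, and the data are hypothetical.

```python
def greedy_match(treated_scores, control_scores, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    For each treated case, in order, pick the closest remaining control
    case on the propensity score. A matched control is removed from the
    pool. If `caliper` is given, matches farther apart than the caliper
    are rejected and the treated case stays unmatched.
    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for t_idx, t_score in enumerate(treated_scores):
        if not available:
            break  # no controls left to match
        # Ties on distance are broken by the lower control index.
        best = min(available, key=lambda c: (abs(control_scores[c] - t_score), c))
        if caliper is not None and abs(control_scores[best] - t_score) > caliper:
            continue
        pairs.append((t_idx, best))
        available.remove(best)
    return pairs


# Hypothetical propensity scores
treated = [0.60, 0.45, 0.90]
controls = [0.40, 0.62, 0.52, 0.10]

pairs_no_caliper = greedy_match(treated, controls)
pairs_with_caliper = greedy_match(treated, controls, caliper=0.10)
```

Because each control is used at most once and the result depends on the order in which treated cases are processed, different implementations can return different matched samples from the same data. The same function can be run on logit-transformed scores if the caliper is defined on that scale.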

23 Assessing Matched Samples
Some ways of assessing balance (Rubin, 2001):
• The standardized difference in the mean propensity score in the two groups should be near zero (d < .20).
• The ratio of the variance of the propensity score in the two groups should be near one, preferably between 0.80 and 1.25.
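Both checks can be computed from group summary statistics. The sketch below assumes the standardized difference is the mean difference divided by the square root of the average of the two group variances, and takes the variance ratio as treatment over control; both conventions are assumptions made for illustration. The inputs are the rounded propensity score means and standard deviations from the heuristic example's pre- and post-matching tables.

```python
import math


def standardized_difference(mean_t, sd_t, mean_c, sd_c):
    """Mean difference in units of the averaged-variance standard deviation."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2.0)
    return (mean_t - mean_c) / pooled_sd


def variance_ratio(sd_t, sd_c):
    """Treatment-group variance divided by control-group variance."""
    return (sd_t ** 2) / (sd_c ** 2)


def check_balance(mean_t, sd_t, mean_c, sd_c):
    """Apply the two rules of thumb: |d| < .20 and ratio within [0.80, 1.25]."""
    d = standardized_difference(mean_t, sd_t, mean_c, sd_c)
    ratio = variance_ratio(sd_t, sd_c)
    return {
        "d": d,
        "variance_ratio": ratio,
        "d_ok": abs(d) < 0.20,
        "ratio_ok": 0.80 <= ratio <= 1.25,
    }


# LLC participants are treated as the treatment group, non-participants as control.
pre_matching = check_balance(0.565, 0.161, 0.380, 0.226)
post_matching = check_balance(0.476, 0.158, 0.484, 0.177)
```

With these rounded inputs the standardized difference is roughly 0.94 before matching and about 0.05 in absolute value after matching. The post-matching variance ratio lands right around the 0.80 boundary, so rounding in the inputs matters for that check.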

24 Pre-Matching Propensity Scores

Group              N    M      SD     t       df   p      d
Non-Participants   16   .380   .226   2.534   28   .017   .942
LLC Participants   14   .565   .161

[Figure: propensity scale from 0 ("Unlikely to be in LLC") to 1 ("Likely to be in LLC") marking the Non-LLC mean (.380) and the LLC mean (.565); the gap between them is labeled "Amount of Bias".]

25 Post-Matching Propensity Scores

Group              N    M      SD     t       df   p      d
Non-Participants   8    .484   .177   .092    14   .928   .047
LLC Participants   8    .476   .158

[Figure: propensity scale from 0 ("Unlikely to be in LLC") to 1 ("Likely to be in LLC") marking the Non-LLC mean (.487) and the LLC mean (.476).]

26 Histogram of Post-Matching PS Differences

27 Pre-Matching Achievement Scores

Group              N    M      SD     t       df   p      d
Non-Participants   16   3.21   .323   1.795   28   .084   .660
LLC Participants   14   3.43   .565

[Figure: GPA scale from 3.0 to 4.0 marking the Non-LLC mean (3.21) and the LLC mean (3.43); the gap between them is labeled "Biased Treatment Effect".]

28 Post-Matching Achievement Scores

Group              N    M      SD     t       df   p      d
Non-Participants   8    3.32   .249   .816    14   .428   .384
LLC Participants   8    3.44   .364

[Figure: GPA scale from 3.0 to 4.0 marking the Non-LLC mean (3.32) and the LLC mean (3.44); the gap between them is labeled "True Treatment Effects".]

29 Limitations & Cautions
• Algorithms across various platforms make different assumptions about how to treat data.
• Matched data sets tend to be more homogeneous than randomized samples.
• The pre-matched sample size (n) and post-matched sample size (n) will not be equal, which should be taken into account with regard to statistical power.
• Propensity score matching typically requires larger sample sizes.

30 References
D’Agostino, R. B. (1998). Tutorial in biostatistics: Propensity score methods for bias reduction in the comparison of treatment to a non-randomized control group. Statistics in Medicine, 17, 2265-2281.
National Research Council (2000). Scientific research in education. Washington, DC: National Academy Press.
Glenn, D. (2005, March). New federal policy favors randomized trials in education research. The Chronicle of Higher Education. Retrieved December 5, 2009, from http://www.chronicle.com.
Grunwald, H. E., & Mayhew, M. J. (2008). The use of propensity scores in identifying a comparison group in a quasi-experimental design: Moral reasoning development as an outcome. Research in Higher Education, 49(8), 758-775.
Ho, D., Imai, K., King, G., & Stuart, E. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15, 199-236.
Leuven, E., & Sianesi, B. (2004). PSMATCH2: Stata module to perform full Mahalanobis and propensity score matching, common support graphing, and covariate imbalance testing. Statistical Software Components S432001, Boston College Department of Economics.
Morgan, S. L. (2001). Counterfactuals, causal effect heterogeneity, and the Catholic school effect on learning. Sociology of Education, 74, 341–374.

31 References
Morgan, S., & Harding, D. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods & Research, 35(1), 3-60. doi:10.1177/0049124106289164.
Painter, J. (2009). Jordan institute for families: Virtual research community. Retrieved from http://ssw.unc.edu/VRC/Lectures/index.htm.
Pike, G. (2009). The differential effects of on- and off-campus living arrangements on students’ openness to diversity. Journal of Student Affairs Research & Practice, 46(4), 629-645.
Rosenbaum, P. R., & Rubin, D. B. (1983b). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.
Rosenbaum, P. R., & Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79(387), 516-524.
Rubin, D. B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology, 2, 169–188.

32 References
Schafer, J. L., & Kang, J. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated example. Psychological Methods, 13(4), 279-313. doi:10.1037/a0014268.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating causal effects using experimental and observational designs (report from the Governing Board of the American Educational Research Association Grants Program). Washington, DC: American Educational Research Association.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2005). Propensity scores: An introduction and experimental test. Evaluation Review, 29(6), 530-558. doi:10.1177/0193841X0575596.
Shadish, W. R., Luellen, J. K., & Clark, M. H. (2006). Propensity scores and quasi-experiments: A testimony to the practical side of Lee Sechrest. In R. R. Bootzin & P. E. McKnight (Eds.), Strengthening research methodology: Psychological measurement and evaluation (pp. 143–157). Washington, DC: American Psychological Association.
Stuart, E. A., & Rubin, D. B. (2008). Matching methods for causal inference: Designing observational studies. In J. Osborne (Ed.), Best practices in quantitative methods. Thousand Oaks, CA: Sage Publishing.

