1
Propensity Score Analysis A tool for causal inference in non-randomized studies Summer Statistics Workshop 2010 Felix Thoemmes Texas A&M University
2
Agenda
Motivating examples
Definition of causal effects with potential outcomes
Definition of propensity scores
Applied example of propensity scores
Hands-on example in R
Advanced topics
3
Motivation
Randomized experiment is considered the "gold standard" for causal inference
Randomization not always possible:
Trials where treatment is offered to the community at large
Participants do not permit randomization
Ethical or legal considerations
Naturally occurring phenomena
Broken randomized experiments (attrition, non-compliance, treatment diffusion)
The randomized controlled trial has been called the gold standard for causal inference, because a randomized experiment has desirable properties for inferring causal relationships from observed associations. A properly conducted randomized experiment allows for causal claims. Now, we cannot always randomize treatments, but there are still phenomena that we are interested in and would like to study. Does that mean we are unable to, or even should not, study a phenomenon because we cannot randomize? No, but we have to be careful about the implications of non-randomization. Here are a few examples of instances where we cannot randomize. Trials where treatment is offered to the community at large: for example, a social-justice-oriented program that is offered to everybody who needs it; our main interest may be to help the community, but we might still be interested in evaluating the effectiveness of the program. Stakeholders do not permit randomization: I live in Arizona, and we currently have a lot of TV ads warning people about the dangers of using methamphetamine. The people from Montana who fund this campaign are mainly interested in putting these ads on TV, but they do not want to randomize this (granted, it would be very hard to do). Ethical or legal considerations. Naturally occurring phenomena: you obviously cannot randomly assign cities to be hit by Hurricane Katrina, but there are numerous studies out there that tried to assess effects on hurricane victims. Broken randomized experiments: even if you randomize a treatment, as soon as you leave the confinement of a laboratory the randomization can break down. Attrition, or even differential attrition, can occur; non-compliance (meaning people do not do what they are supposed to do under the treatment) can occur; or treatments can diffuse to the control group. This is an important category, because just because you randomize a treatment does not mean that everything will work out that way. As a consultant at the ASU School of Social Work I worked with a colleague who randomized a treatment for immigrant families, but the randomization occurred at the treatment delivery site. Upon inspecting the data it became clear that the treatment administrators actually gave the treatment more often to families that were in greater need.
4
Broken randomized experiments
5
Motivation Non-random assignment leads to group imbalances at pretest
Selection bias
Confounding of treatment effects due to imbalanced covariates
Any of the situations mentioned above leads us to not use randomization. But we know that non-random assignment can lead to group imbalances, and that in turn can lead to confounds if an imbalanced variable is also related to the outcome. Wilson and Lipsey, for example, examined a total of 302 meta-analyses, mostly in the area of intervention research, and looked for differences between randomized and non-randomized experiments. Even though there was no systematic bias across all 302 domains, there are areas of research where randomized and non-randomized studies yield different answers. Knowing that non-randomized studies can sometimes be misleading and yield different answers than randomized controlled trials, what are some strategies that can bolster our faith in drawing a causal inference from a non-randomized study?
6
Hormone Replacement therapy
1968: "Feminine Forever"
2002: Women's Health Initiative trial on hormone replacement therapy
7
Motivation Adjustment methods
ANCOVA / regression adjustment
Matching
Stratification
Many covariates are needed to control for potentially confounding influences
Traditionally, researchers in the social sciences have been using ANCOVA models and regression adjustment a lot. Matching and stratification are methods that can be used but are less frequently employed. Usually, we want to include many covariates to rule out any potential confounding influence.
8
Motivation Assumptions of classic ANCOVA model
linearity
no baseline-by-treatment interactions
region of common support in multi-dimensional space is hard to assess
extrapolation beyond the data is sensitive to model adequacy
But adding all these covariates can be a problem. ANCOVA, or regression adjustment for that matter, has assumptions that become increasingly hard to check as the number of covariates increases. First, we need to check that relationships to the outcome are linear; if this assumption is violated, treatment effects might be biased. Second, there can be no baseline-by-treatment interaction; again, if this is violated, our treatment effects might be biased. In short, we need to model the relationship between the outcome and the covariates properly to have confidence in our results. Another problem that can arise in multivariate adjustment approaches is that the region of common support is hard to assess. The region of common support is simply the region in multi-dimensional space in which the control and treatment groups overlap on all the covariates. Regression adjustment can actually extrapolate results far beyond the region in which data are observed, which makes it very sensitive to model assumptions.
9
Motivation – Key Issues
Non-randomized studies are necessary
Many covariates should be assessed to control for confounding influences
High-dimensional regression adjustment has strong assumptions, and distributional overlap is hard to check
10
Defining causal effects
Definition of causal effect is often lacking in applied social science Parameter estimates from any model (ANOVA, regression, structural equation model) may or may not be causally interpretable
11
Rubin Causal Model: Unit-level Causal Effect (TREATMENT vs. CONTROL)
12
Rubin Causal Model: Average Causal Effect (TREATMENT vs. CONTROL)
13
Rubin Causal Model: Estimate of the Average Causal Effect (TREATMENT vs. CONTROL)
14
Rubin Causal Model
[Diagram: units assigned to CONTROL or TREATMENT under randomized vs. non-randomized assignment]
Randomized assignment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Non-randomized assignment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
15
Rubin Causal Model
[Table from West and Thoemmes (2008): potential outcomes and observed outcomes for treatment (T) and control (C); individual values partially garbled in extraction (10, 11, 12, 16, 13, 15, ...)]
τ = 11.33 - 12.83 = -1.5 (from the potential outcomes)
τ* = 11.33 - 13.33 = -2.0 (from the observed outcomes)
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
16
Rubin Causal Model
[Table from West and Thoemmes (2008): potential outcomes and observed outcomes for treatment (T) and control (C); individual values partially garbled in extraction]
τ = 11.33 - 12.83 = -1.5 (from the potential outcomes)
τ* = 11.66 - 11.00 = .66 (from the observed outcomes)
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
17
Obtaining unbiased estimates
Randomized experiment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Non-randomized experiment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
Non-randomized experiment with unconfoundedness assumption:
E(Yi1) = Ex{E(Yi1 | zi = 1, x)}
E(Yi0) = Ex{E(Yi0 | zi = 0, x)}
X contains all confounding covariates
18
Randomization
Randomized experiment is the gold standard for causal inference
Covariate balance ensures that confounders cannot bias the treatment effect
Few assumptions:
compliance
no missing data
no hidden treatment variations
independence of units (assignment of one unit does not influence the outcome of another unit)
19
Non-randomized trial
Lack of randomization can create imbalance PRIOR to treatment assignment
Confounding occurs due to imbalance and a relationship with the outcome
Bias can be corrected, but all confounders must be assessed; no unique influence of a confounder can be left out for an unbiased effect estimate
20
Increasing use of Propensity Scores
Source: Web of Science
21
Propensity scores
e(x) = p(z = 1 | x)
e(x): the propensity score
p(·): a probability; "|" reads "conditional on" (controlled for)
z: treatment assignment (1 = treatment group, 0 = control group)
x: vector of covariates
22
e(x) = p (z=1 | x) Propensity scores
A single number summary based on all available covariates that expresses the probability that a given subject is assigned to the treatment condition, based on the values of the set of observed covariates
23
Propensity scores
[Figure: actual assignment (treatment vs. control) plotted against the likelihood of receiving treatment]
24
Example of balance property
Original sample: covariates a and b, treatment assignment z, propensity score e(x) [table values garbled in extraction; the distinct propensity scores are .33, .5, .66, and 1]
e(x) = p(z=1 | x={0 0}) = .5
e(x) = p(z=1 | x={1 0}) = .33
e(x) = p(z=1 | x={0 1}) = .66
e(x) = p(z=1 | x={1 1}) = 1
p(a=1 | z=0) = .5    p(b=1 | z=0) = 1/4
p(a=1 | z=1) = .5    p(b=1 | z=1) = .5
25
Example of balance property
Matched sample [table of a, b, z, e*(x) values garbled in extraction]
p(z, x | e(x)) = p(z | e(x)) p(x | e(x))
Example for z=1 and x = {0 1}:
p(z=1, x={0 1} | e(x)) = 1/6
p(z=1 | e(x)) = .5
p(x={0 1} | e(x)) = .33
p(z | e(x)) p(x | e(x)) = (.5)(.33) = 1/6
p(a=1 | z=0) = .5    p(b=1 | z=0) = .5
p(a=1 | z=1) = .5    p(b=1 | z=1) = .5
26
Propensity scores Balance on the propensity score implies on average balance on all observed covariates Two units in the treatment and the control group that have the same propensity score are similar on all covariates. They only differ in terms of treatment received
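In the notation of the previous slides, this balance property can be written compactly as a conditional-independence statement (a restatement of the factorization p(z, x | e(x)) = p(z | e(x)) p(x | e(x)) shown above, with x the full covariate vector):

$$x \;\perp\; z \;\mid\; e(x) \quad\Longleftrightarrow\quad p(z, x \mid e(x)) = p(z \mid e(x))\, p(x \mid e(x))$$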
27
Propensity score
The propensity score models the influence of confounders on treatment assignment
In comparison, ANCOVA models the influence of confounders on the outcome
[Path diagram: Confounder, Treatment, Outcome]
28
Propensity scores vs. regression adjustment: both are tools to strengthen causal conclusions
Propensity scores | Regression adjustment
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Assessment of overlap | No assessment of overlap
No assumption about functional form of propensity score | Classic ANCOVA assumes linearity and absence of interactions, but can be extended
Non-parametric conditioning (e.g., matching) | Parametric conditioning, functional form of regression adjustment
Outcome variable unknown during propensity score analysis | Outcome variable always part of the adjustment
Sample size can be diminished, loss of power | Sample size stays constant, power can increase due to covariates
Hard, time-consuming | Easy, widely implemented in software
Subjective choices in modeling | Widely accepted procedure
29
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Propensity score analysis is a multi-step process Researcher has choices at each step of the analysis
30
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
[Diagram: the confounder predicts selection (treatment) and also predicts the outcome]
Select true confounders and covariates predictive of the outcome
31
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Estimation of propensity scores can be achieved in numerous ways Logistic regression Discriminant analysis (Boosted) regression trees
32
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
Logistic regression model: log(e(x) / (1 - e(x))) = β0 + Σ βi Xi
The outcome is treatment assignment; the predictors are the covariates
The model can be overfitted to the sample, e.g., by including interactions and higher-order terms; the only interest is prediction and covariate balance
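A minimal R sketch of this estimation step (assuming a data frame psa with treatment indicator z and covariates X1–X3; the names mirror the code later in this deck but are placeholders here):

# fit the propensity score model with logistic regression
ps_model <- glm(z ~ X1 + X2 + X3, family = binomial(link = "logit"), data = psa)
# e(x): predicted probability of receiving the treatment
psa$pscore <- predict(ps_model, type = "response")
# logit of the propensity score, often used as the distance measure for matching
psa$logit_pscore <- qlogis(psa$pscore)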
33
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Conditioning strategies Matching Weighting Regression adjustment
34
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
Check of covariate balance: t-test (not recommended) or standardized difference; graphical assessment (e.g., Q-Q plot)
Region of common support (distributional overlap): graphical assessment (e.g., histograms)
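A small sketch of the standardized-difference check (a hypothetical helper, not part of MatchIt; the pooled-SD denominator is one common convention):

# standardized mean difference for one covariate
# x: covariate vector, z: 0/1 treatment indicator
std_diff <- function(x, z) {
  m1 <- mean(x[z == 1]); m0 <- mean(x[z == 0])
  s  <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)  # pooled SD
  (m1 - m0) / s
}
# apply to each covariate before and after conditioning, e.g.:
# sapply(psa[, c("X1", "X2", "X3")], std_diff, z = psa$z)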
35
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
[Q-Q plots of the logit propensity score, treatment group vs. control group, before and after matching]
Quantiles of both distributions are plotted against each other
36
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Before Matching After Matching
37
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
38
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
39
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Estimate of treatment effect Mean difference Standard error dependent on conditioning scheme
40
Applied Example Braver, Thoemmes, Moser, & Baham (in progress)
Can random invitation designs yield the same results as randomized controlled trials?
Evaluation of a math treatment teaching the rules of exponents, administered either as a randomized experiment or as a random invitation design
Currently in progress – pilot data available
41
Design
[Design diagram: the overall sample is split into an RCT arm (RCT-T vs. RCT-C) and a random invitation arm (RI-Treatment vs. RI-Control); d and d* label the effect estimates being compared; attrition is indicated in the diagram]
42
Example covariates
Pretest
General attitude towards math
Altruism scale
Available time
19 covariates after forming factor scores
46
Results
[Results table partially garbled in extraction: the slide reports effect estimates, 95% CIs, and sample sizes for the RCT comparison (RCT-T vs. RCT-C) and the random invitation comparison (RI-T vs. RI-C), each under linear regression adjustment and under propensity score adjustment. Reported effects: .146* (n = 176), .146* (n = 193), .176* (n = 193), .148* (n = 122).]
Note: ANCOVA with all covariates that went into the propensity score estimation yields an effect of .185, which is in the opposite direction.
47
Mechanics of propensity score analysis
Can all of this be done in SPSS / SAS?
Only parts of the analysis can be performed, mostly based on self-written macros
The R packages MatchIt and PSAgraphics offer the best solutions
Some experience with / learning of R is required
The packages automate most of the analysis
48
Estimation of propensity score
Put covariates in the model that are
theoretically important confounders
significantly related to treatment selection (unbalanced)
Iterative process of including covariates and potentially higher-order terms (interactions, polynomials)
49
Estimation of propensity score
Estimate the PS with
Logistic regression
Generalized additive model
Classification tree / regression tree / recursive partitioning
(sketches of the latter two approaches follow below)
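Hedged sketches of the GAM and tree approaches, assuming the mgcv and rpart packages and the same psa data frame with placeholder covariate names:

library(mgcv)    # generalized additive models
library(rpart)   # classification / regression trees

# GAM: smooth terms s() replace the linear regression weights
gam_fit <- gam(z ~ s(X1) + s(X2) + s(X3), family = binomial, data = psa)
ps_gam  <- predict(gam_fit, type = "response")

# single classification tree: splits on predictors that separate treated and control units
tree_fit <- rpart(factor(z) ~ X1 + X2 + X3, data = psa, method = "class")
ps_tree  <- predict(tree_fit, type = "prob")[, "1"]   # predicted probability of treatment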
50
Estimation of propensity score
Generalized additive model
Instead of a regular regression weight, a smoother is applied
[Graphic from SAS PROC GAM]
Imagine a lowess smoother for each coefficient
51
Estimation of propensity score
Regression tree
Splits the sample at predictors that maximally separate the groups
Final nodes are balanced on all variables
[Picture taken from XLMiner]
52
Estimation choices GAMs are useful because they model non-linear relationships automatically Regression trees automatically detect non-linear relationships and interactions Little research on performance of these methods
53
Estimation choices Shadish et al. report that single regression trees do not outperform regular logistic regression Regression trees need to be pruned – unclear what kind of pruning achieves good overall balance in dataset Work on boosted regression trees (McCaffrey et al.)
54
Conditioning
Choices are:
Matching (in many different variations)
Stratification / subclassification
Weighting (done outside MatchIt; see the sketch after this list)
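A minimal sketch of inverse-probability-of-treatment weighting built from an estimated propensity score (assumes a column psa$pscore from an earlier estimation step; the ATE/ATT weight formulas are the standard ones):

# inverse probability of treatment weights
psa$w_ate <- with(psa, z / pscore + (1 - z) / (1 - pscore))    # ATE weights
psa$w_att <- with(psa, z + (1 - z) * pscore / (1 - pscore))    # ATT weights: controls reweighted to resemble the treated
# weighted point estimate of the effect (standard errors need robust / survey methods):
# lm(y ~ z, data = psa, weights = w_ate)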
55
Conditioning: Stratification
A straightforward method: classify the sample into strata based on the estimated PS
Within each stratum, covariates should be approximately balanced
The number of strata can be varied – Cochran suggested 5 strata to remove about 90% of the bias due to a single confounding covariate
Stratification is easy to implement, but residual bias can occur (a minimal sketch follows below)
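A minimal sketch of stratification on the estimated propensity score, using 5 quintile strata as in Cochran's suggestion (assumes psa$pscore and outcome y, and that every stratum contains both treated and control units):

# form 5 strata at the quintiles of the estimated propensity score
psa$stratum <- cut(psa$pscore,
                   breaks = quantile(psa$pscore, probs = seq(0, 1, 0.2)),
                   include.lowest = TRUE, labels = FALSE)
# stratum-specific mean differences, combined with weights proportional to stratum size
per_stratum <- sapply(split(psa, psa$stratum),
                      function(d) mean(d$y[d$z == 1]) - mean(d$y[d$z == 0]))
stratum_wts <- table(psa$stratum) / nrow(psa)
sum(per_stratum * stratum_wts)   # stratified estimate of the treatment effect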
56
Conditioning Matching Exact matching Nearest Neighbor matching
Optimal matching Full matching other matching algorithms possible
57
Matching: Exact matching
Only units that are identical are matched with each other
[Illustration: table of estimated propensity scores for control and treated units; values .124, .226, .126, .289, .211, .365, .389, .415, .517, .566, .656, .789, .733, .856, .821, .997]
58
Matching: Nearest neighbor matching
Match units that are approximately the same
Options: order, caliper, replacement, ratio
[Same illustration table of control and treated propensity scores as above]
59
Matching: nearest neighbor options
order: start matching at the largest, the smallest, or a random propensity score
a matched control unit will not be matched to another treated unit, so the algorithm finds a local minimum (contrast optimal matching)
caliper: defines the maximum distance between two neighbors, e.g., .1 of a standard deviation of the PS
replacement: a unit can be recycled to be matched again; weights are then necessary
ratio: one treated unit can be matched to more than one control (e.g., 1:2, 1:3)
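These options map onto arguments of MatchIt's matchit() call; a hedged sketch (argument names as in MatchIt — m.order, caliper, replace, ratio — though accepted values and defaults can differ across package versions):

match_nn <- matchit(z ~ X1 + X2 + X3, data = psa,
                    method  = "nearest",
                    m.order = "largest",   # order: start matching at the largest propensity scores
                    caliper = 0.1,         # caliper: maximum allowed distance, here .1 SD
                    replace = FALSE,       # replacement: controls are not recycled (no weights needed)
                    ratio   = 1)           # ratio: one control per treated unit (e.g., 2 for 1:2 matching)
summary(match_nn)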
60
Matching: Optimal matching
Match units that are approximately the same, but allow matches to be broken if a better match is possible
Finds a global minimum (of the overall matched distance)
Option: ratio
[Same illustration table of control and treated propensity scores as above]
61
Matching: Full matching
Allows several controls to be matched to one treated unit in some regions of the propensity score, and several treated units to be matched to one control in other regions
Useful if distributions are highly imbalanced
[Same illustration table of control and treated propensity scores as above]
62
Summary of matching choices
Methods offer tradeoffs between bias and variance
Incomplete vs. inexact matching
Exact matching has (in theory) no bias, but can have large variance due to the diminished sample size
Less exact matching methods may have small residual bias, but less variance due to the inclusion of more subjects
63
Other issues Discarding units prior to matching
Usually not necessary if a caliper is defined
Excluding units from only one side changes the quantity of interest
64
Overlap – Graphic from T. Love, ASA Workshop
65
Annotated R code (reference)
library(foreign)
library(MatchIt)
library(PSAgraphics)
library(Rcmdr)    # provides bin.var(), used below

#read in dataset using Rcmdr#
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav",
                 use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)

#prima facie effect#
pf <- lm(y~z, data=psa)
summary(pf)

##matching##
##model to be used to predict z#
#(z ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16,
##type of matching#
#method= "nearest",
##discard option#
#discard="both",
##caliper option#
#caliper = .2,
##dataset#
#data = psa)

#same code in one single line#
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10,
                  method= "nearest", discard="none", caliper = .1, data = psa)

#summary of the matched sample#
summary(match1)
plot(match1,type="QQ")
plot(match1,type="hist")
plot(match1,type="jitter")

#additional plot of standardized differences
smatch1<-summary(match1,standardize=TRUE)
plot(smatch1)

#write out data
dmatch1 <-match.data(match1)

#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z

#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata<-dmatch1$strata

#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)

#treatment effect of matched sample
m1 <- lm(y~z, data=dmatch1)
summary(m1)
66
Annotated R output (reference)
> pf <- lm(y~z, data=psa)
> summary(pf)

Call:
lm(formula = y ~ z, data = psa)

[Residuals and coefficient table shown on the slide; numeric values lost in extraction. Both the intercept and z are significant (p < .001).]

OLS regression: the outcome is regressed on the treatment indicator
Prima facie treatment effect 1.476, p < .01
67
Annotated R output (reference)
Summary of balance for all data:
[For each variable (distance, X1–X10), MatchIt reports Means Treated, Means Control, SD Control, Mean Diff, eQQ Med, eQQ Mean, and eQQ Max; numeric values lost in extraction.]

Summary of balance for matched data:
[Same columns for distance and X1–X10; numeric values lost in extraction.]

Balance on all covariates, for the unmatched and the matched sample: means, SDs, mean differences, and differences based on the Q-Q plot (median, mean, max)
68
Annotated R output (reference)
Percent Balance Improvement:
[Mean Diff., eQQ Med, eQQ Mean, and eQQ Max improvement for distance and X1–X10; numeric values lost in extraction.]

Sample sizes:
[Control and Treated counts for All, Matched, and Unmatched units; Discarded: 0 and 0.]

Percent balance improvement is a measure of balance; sample sizes are reported before and after matching
69
Graphics Q-Q plot for each variable to check balance
Straight line following 45° diagonal is desired plot(match1,type="QQ")
70
Graphics Histograms to examine distribution of propensity score
Split by treatment and control Before and after matching plot(match1,type="hist")
71
Graphics Jittered dotplot
Shows regions of propensity score that were matched plot(match1,type="jitter")
72
Graphics Plot of standardized differences pre- and post-matching
smatch1<-summary(match1,standardize=TRUE) plot(smatch1)
73
Graphics

#write out data
dmatch1 <-match.data(match1)

#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z

#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata<-dmatch1$strata

#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)
74
Annotated R output (reference)
Call:
lm(formula = y ~ z, data = dmatch1)

[Residuals and coefficient table shown on the slide; numeric values lost in extraction. The coefficient for z is significant.]

Treatment effect after matching is now .789, p < .05
Treatment effect is still in the same direction but greatly diminished
75
Advantages of Propensity Scores
Collapses the multivariate problem into a single-dimensional problem
No stringent assumptions about functional form
Model checks allow easy assessment of balance
Clearly defined region of common support (no extrapolation)
76
Limitations Unmeasured covariates can still bias effect estimates
Propensity score function can be challenging to estimate If assumptions of ANCOVA are fully met, propensity scores offer little gain
77
Propensity scores
The method is another tool for applied researchers to adjust for confounding influences
Propensity scores have some advantages and disadvantages over traditional regression adjustment
In applied contexts, the choice of confounding variables and the reliability of their measurement will be more critical than the choice of adjustment method!
78
Thank you Summer Statistics Workshop 2010 Felix Thoemmes Texas A&M University