1
Propensity Score Analysis A tool for causal inference in non-randomized studies Summer Statistics Workshop 2010 Felix Thoemmes Texas A&M University
2
Agenda
Motivating examples
Definition of causal effects with potential outcomes
Definition of propensity scores
Applied example of propensity scores
Hands-on example in R
Advanced topics
3
Motivation
Randomized experiment is considered the "gold standard" for causal inference
Randomization not always possible:
Trials where treatment is offered to the community at large
Participants do not permit randomization
Ethical or legal considerations
Naturally occurring phenomena
Broken randomized experiments (attrition, non-compliance, treatment diffusion)
The randomized controlled trial has been called the gold standard for causal inference, because a randomized experiment has desirable properties for inferring causal relationships from observed associations. A properly conducted randomized experiment allows for causal claims. Now, we cannot always randomize treatments, but there are still phenomena that we are interested in and would like to study. Does that mean we are unable to, or even should not, study a phenomenon because we cannot randomize? No, but we have to be careful about the implications of non-randomization. Here are a few examples of instances where we cannot randomize. Trials where treatment is offered to the community at large: for example, a social-justice-oriented program that is offered to everybody who needs it; our main interest may be to help the community, but we might still be interested in evaluating the effectiveness of the program. Stakeholders do not permit randomization: I live in Arizona, and we currently have a lot of TV ads warning people about the dangers of using methamphetamine. The people from Montana who fund this campaign are mainly interested in putting these ads on TV, but they do not want to randomize this (granted, it would be very hard to do). Ethical or legal considerations. Naturally occurring phenomena: you obviously cannot randomly assign cities to be hit by Hurricane Katrina, but there are numerous studies out there that tried to assess effects on hurricane victims. Broken randomized experiments: even if you randomize a treatment, as soon as you leave the confinement of a laboratory the randomization can break down. Attrition, or even differential attrition, can occur; non-compliance (meaning people do not do what they are supposed to do under the treatment) can occur; or treatments can diffuse to the control group. This is an important category, because just because you randomize a treatment does not mean that everything will work out that way. As a consultant at the ASU School of Social Work I worked with a colleague who randomized a treatment for immigrant families, but the randomization occurred at the treatment delivery site. Upon inspecting the data it became clear that the treatment administrators actually gave the treatment more often to families that were in greater need.
4
Broken randomized experiments
5
Motivation Non-random assignment leads to group imbalances at pretest
Selection bias
Confounding of treatment effects due to imbalanced covariates
Any of the situations mentioned above leads us to not use randomization. But we know that non-random assignment can lead to group imbalances, and that in turn can lead to confounds if an imbalanced variable is also related to the outcome. Wilson and Lipsey, for example, examined a total of 302 meta-analyses, mostly in the area of intervention research, and looked for differences between randomized and non-randomized experiments. Even though there was no systematic bias across all 302 domains, there are areas of research where randomized and non-randomized studies yield different answers. Knowing that non-randomized studies can sometimes be misleading and yield different answers than randomized controlled trials, what are some strategies that can bolster our faith in drawing a causal inference from a non-randomized study?
6
Hormone Replacement therapy
1968: "Feminine Forever"
2002: Women's Health Initiative trial on hormone replacement therapy
7
Motivation Adjustment methods
ANCOVA / regression adjustment
Matching
Stratification
Many covariates are needed to control for potentially confounding influences
Traditionally, researchers in the social sciences have been using ANCOVA models and regression adjustment a lot. Matching and stratification are methods that can be used but are less frequently employed. Usually, we want to include many covariates to rule out any potential confounding influence.
8
Motivation Assumptions of classic ANCOVA model
linearity
no baseline-by-treatment interactions
region of common support in multi-dimensional space is hard to assess
extrapolation beyond the data is sensitive to model adequacy
But adding all these covariates can be a problem. ANCOVA, or regression adjustment for that matter, has assumptions that become increasingly hard to check as the number of covariates increases. First, we need to check that relationships to the outcome are linear; if this assumption is violated, treatment effects might be biased. Second, there can be no baseline-by-treatment interaction; again, if this is violated, our treatment effects might be biased. In short, we need to model the relationship between the outcome and the covariates properly to have confidence in our results. Another problem that can arise in multivariate adjustment approaches is that the region of common support is hard to assess. The region of common support is simply the region in multi-dimensional space in which the control and treatment groups overlap on all the covariates. Regression adjustment can actually extrapolate results far beyond the region in which data are observed, which makes it very sensitive to model assumptions.
9
Motivation – Key Issues
Non-randomized studies are necessary
Many covariates should be assessed to control for confounding influences
High-dimensional regression adjustment has strong assumptions, and distributional overlap is hard to check
10
Defining causal effects
Definition of causal effect is often lacking in applied social science Parameter estimates from any model (ANOVA, regression, structural equation model) may or may not be causally interpretable
11
Rubin Causal Model: Unit-level Causal Effect (TREATMENT vs. CONTROL)
12
Rubin Causal Model: Average Causal Effect (TREATMENT vs. CONTROL)
13
Rubin Causal Model: Estimate of the Average Causal Effect (TREATMENT vs. CONTROL)
14
Rubin Causal Model
[Diagram: units assigned to CONTROL or TREATMENT under randomized vs. non-randomized assignment]
Randomized assignment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Non-randomized assignment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
15
Rubin Causal Model
[Table from West and Thoemmes (2008): potential outcomes and observed outcomes for treatment (T) and control (C); individual values partially garbled in extraction (10, 11, 12, 16, 13, 15, ...)]
τ = 11.33 - 12.83 = -1.5 (from the potential outcomes)
τ* = 11.33 - 13.33 = -2.0 (from the observed outcomes)
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
16
Rubin Causal Model
[Table from West and Thoemmes (2008): potential outcomes and observed outcomes for treatment (T) and control (C); individual values partially garbled in extraction]
τ = 11.33 - 12.83 = -1.5 (from the potential outcomes)
τ* = 11.66 - 11.00 = .66 (from the observed outcomes)
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
Source: West and Thoemmes (2008)
17
Obtaining unbiased estimates
Randomized experiment:
E(Yi1) = E(Yi1 | zi = 1)
E(Yi0) = E(Yi0 | zi = 0)
Non-randomized experiment:
E(Yi1) ≠ E(Yi1 | zi = 1)
E(Yi0) ≠ E(Yi0 | zi = 0)
Non-randomized experiment with unconfoundedness assumption:
E(Yi1) = Ex{E(Yi1 | zi = 1, x)}
E(Yi0) = Ex{E(Yi0 | zi = 0, x)}
X contains all confounding covariates
18
Randomization
Randomized experiment is the gold standard for causal inference
Covariate balance ensures that confounders cannot bias the treatment effect
Few assumptions:
compliance
no missing data
no hidden treatment variations
independence of units (assignment of one unit does not influence the outcome of another unit)
19
Non-randomized trial
Lack of randomization can create imbalance PRIOR to treatment assignment
Confounding occurs due to imbalance and a relationship with the outcome
Bias can be corrected, but all confounders must be assessed; no unique influence of a confounder can be left out for an unbiased effect estimate
20
Increasing use of Propensity Scores
Source: Web of Science
21
Propensity scores
e(x) = p(z = 1 | x)
e(x): the propensity score
p(·): a probability; "|" reads "conditional on" (controlled for)
z: treatment assignment (1 = treatment group, 0 = control group)
x: vector of covariates
22
e(x) = p (z=1 | x) Propensity scores
A single number summary based on all available covariates that expresses the probability that a given subject is assigned to the treatment condition, based on the values of the set of observed covariates
23
Propensity scores
[Figure: actual assignment (treatment vs. control) plotted against the likelihood of receiving treatment]
24
Example of balance property
Original sample: covariates a and b, treatment assignment z, propensity score e(x) [table values garbled in extraction; the distinct propensity scores are .33, .5, .66, and 1]
e(x) = p(z=1 | x={0 0}) = .5
e(x) = p(z=1 | x={1 0}) = .33
e(x) = p(z=1 | x={0 1}) = .66
e(x) = p(z=1 | x={1 1}) = 1
p(a=1 | z=0) = .5    p(b=1 | z=0) = 1/4
p(a=1 | z=1) = .5    p(b=1 | z=1) = .5
25
Example of balance property
Matched sample [table of a, b, z, e*(x) values garbled in extraction]
p(z, x | e(x)) = p(z | e(x)) p(x | e(x))
Example for z=1 and x = {0 1}:
p(z=1, x={0 1} | e(x)) = 1/6
p(z=1 | e(x)) = .5
p(x={0 1} | e(x)) = .33
p(z | e(x)) p(x | e(x)) = (.5)(.33) = 1/6
p(a=1 | z=0) = .5    p(b=1 | z=0) = .5
p(a=1 | z=1) = .5    p(b=1 | z=1) = .5
26
Propensity scores Balance on the propensity score implies on average balance on all observed covariates Two units in the treatment and the control group that have the same propensity score are similar on all covariates. They only differ in terms of treatment received
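In the notation of the previous slides, this balance property can be written compactly as a conditional-independence statement (a restatement of the factorization p(z, x | e(x)) = p(z | e(x)) p(x | e(x)) shown above, with x the full covariate vector):

$$x \;\perp\; z \;\mid\; e(x) \quad\Longleftrightarrow\quad p(z, x \mid e(x)) = p(z \mid e(x))\, p(x \mid e(x))$$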
27
Propensity score
The propensity score models the influence of confounders on treatment assignment
In comparison, ANCOVA models the influence of confounders on the outcome
[Path diagram: Confounder, Treatment, Outcome]
28
Propensity scores vs. regression adjustment: both are tools to strengthen causal conclusions
Propensity scores | Regression adjustment
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Assessment of overlap | No assessment of overlap
No assumption about functional form of propensity score | Classic ANCOVA assumes linearity and absence of interactions, but can be extended
Non-parametric conditioning (e.g., matching) | Parametric conditioning, functional form of regression adjustment
Outcome variable unknown during propensity score analysis | Outcome variable always part of the adjustment
Sample size can be diminished, loss of power | Sample size stays constant, power can increase due to covariates
Hard, time-consuming | Easy, widely implemented in software
Subjective choices in modeling | Widely accepted procedure
29
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Propensity score analysis is a multi-step process Researcher has choices at each step of the analysis
30
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
[Diagram: the confounder predicts selection (treatment) and also predicts the outcome]
Select true confounders and covariates predictive of the outcome
31
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Estimation of propensity scores can be achieved in numerous ways Logistic regression Discriminant analysis (Boosted) regression trees
32
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
Logistic regression model: log(e(x) / (1 - e(x))) = β0 + Σ βi Xi
The outcome is treatment assignment; the predictors are the covariates
The model can be overfitted to the sample, e.g., by including interactions and higher-order terms; the only interest is prediction and covariate balance
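A minimal R sketch of this estimation step (assuming a data frame psa with treatment indicator z and covariates X1–X3; the names mirror the code later in this deck but are placeholders here):

# fit the propensity score model with logistic regression
ps_model <- glm(z ~ X1 + X2 + X3, family = binomial(link = "logit"), data = psa)
# e(x): predicted probability of receiving the treatment
psa$pscore <- predict(ps_model, type = "response")
# logit of the propensity score, often used as the distance measure for matching
psa$logit_pscore <- qlogis(psa$pscore)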
33
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Conditioning strategies Matching Weighting Regression adjustment
34
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
Check of covariate balance: t-test (not recommended) or standardized difference; graphical assessment (e.g., Q-Q plot)
Region of common support (distributional overlap): graphical assessment (e.g., histograms)
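A small sketch of the standardized-difference check (a hypothetical helper, not part of MatchIt; the pooled-SD denominator is one common convention):

# standardized mean difference for one covariate
# x: covariate vector, z: 0/1 treatment indicator
std_diff <- function(x, z) {
  m1 <- mean(x[z == 1]); m0 <- mean(x[z == 0])
  s  <- sqrt((var(x[z == 1]) + var(x[z == 0])) / 2)  # pooled SD
  (m1 - m0) / s
}
# apply to each covariate before and after conditioning, e.g.:
# sapply(psa[, c("X1", "X2", "X3")], std_diff, z = psa$z)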
35
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
[Q-Q plots of the logit propensity score, treatment group vs. control group, before and after matching]
Quantiles of both distributions are plotted against each other
36
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Before Matching After Matching
37
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
38
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation
39
Propensity Score Workflow
Selection Estimation Conditioning Model Checks Effect Estimation Estimate of treatment effect Mean difference Standard error dependent on conditioning scheme
40
Applied Example Braver, Thoemmes, Moser, & Baham (in progress)
Can random invitation designs yield the same results as randomized controlled trials?
Evaluation of a math treatment teaching the rules of exponents, administered either as a randomized experiment or as a random invitation design
Currently in progress – pilot data available
41
Design
[Design diagram: the overall sample is split into an RCT arm (RCT-T vs. RCT-C) and a random invitation arm (RI-Treatment vs. RI-Control); d and d* label the effect estimates being compared; attrition is indicated in the diagram]
42
Example covariates
Pretest
General attitude towards math
Altruism scale
Available time
19 covariates after forming factor scores
46
Results
[Results table partially garbled in extraction: the slide reports effect estimates, 95% CIs, and sample sizes for the RCT comparison (RCT-T vs. RCT-C) and the random invitation comparison (RI-T vs. RI-C), each under linear regression adjustment and under propensity score adjustment. Reported effects: .146* (n = 176), .146* (n = 193), .176* (n = 193), .148* (n = 122).]
Note: ANCOVA with all covariates that went into the propensity score estimation yields an effect of .185, which is in the opposite direction.
47
Mechanics of propensity score analysis
Can all of this be done in SPSS / SAS?
Only parts of the analysis can be performed, mostly based on self-written macros
The R packages MatchIt and PSAgraphics offer the best solutions
Some experience with / learning of R is required
The packages automate most of the analysis
48
Estimation of propensity score
Put covariates in the model that are
theoretically important confounders
significantly related to treatment selection (unbalanced)
Iterative process of including covariates and potentially higher-order terms (interactions, polynomials)
49
Estimation of propensity score
Estimate the PS with
Logistic regression
Generalized additive model
Classification tree / regression tree / recursive partitioning
(sketches of the latter two approaches follow below)
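Hedged sketches of the GAM and tree approaches, assuming the mgcv and rpart packages and the same psa data frame with placeholder covariate names:

library(mgcv)    # generalized additive models
library(rpart)   # classification / regression trees

# GAM: smooth terms s() replace the linear regression weights
gam_fit <- gam(z ~ s(X1) + s(X2) + s(X3), family = binomial, data = psa)
ps_gam  <- predict(gam_fit, type = "response")

# single classification tree: splits on predictors that separate treated and control units
tree_fit <- rpart(factor(z) ~ X1 + X2 + X3, data = psa, method = "class")
ps_tree  <- predict(tree_fit, type = "prob")[, "1"]   # predicted probability of treatment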
50
Estimation of propensity score
Generalized additive model
Instead of a regular regression weight, a smoother is applied
[Graphic from SAS PROC GAM]
Imagine a lowess smoother for each coefficient
51
Estimation of propensity score
Regression tree
Splits the sample at predictors that maximally separate the groups
Final nodes are balanced on all variables
[Picture taken from XLMiner]
52
Estimation choices GAMs are useful because they model non-linear relationships automatically Regression trees automatically detect non-linear relationships and interactions Little research on performance of these methods
53
Estimation choices Shadish et al. report that single regression trees do not outperform regular logistic regression Regression trees need to be pruned – unclear what kind of pruning achieves good overall balance in dataset Work on boosted regression trees (McCaffrey et al.)
54
Conditioning
Choices are:
Matching (in many different variations)
Stratification / subclassification
Weighting (done outside MatchIt; see the sketch after this list)
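A minimal sketch of inverse-probability-of-treatment weighting built from an estimated propensity score (assumes a column psa$pscore from an earlier estimation step; the ATE/ATT weight formulas are the standard ones):

# inverse probability of treatment weights
psa$w_ate <- with(psa, z / pscore + (1 - z) / (1 - pscore))    # ATE weights
psa$w_att <- with(psa, z + (1 - z) * pscore / (1 - pscore))    # ATT weights: controls reweighted to resemble the treated
# weighted point estimate of the effect (standard errors need robust / survey methods):
# lm(y ~ z, data = psa, weights = w_ate)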
55
Conditioning: Stratification
A straightforward method: classify the sample into strata based on the estimated PS
Within each stratum, covariates should be approximately balanced
The number of strata can be varied – Cochran suggested 5 strata to remove about 90% of the bias due to a single confounding covariate
Stratification is easy to implement, but residual bias can occur (a minimal sketch follows below)
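A minimal sketch of stratification on the estimated propensity score, using 5 quintile strata as in Cochran's suggestion (assumes psa$pscore and outcome y, and that every stratum contains both treated and control units):

# form 5 strata at the quintiles of the estimated propensity score
psa$stratum <- cut(psa$pscore,
                   breaks = quantile(psa$pscore, probs = seq(0, 1, 0.2)),
                   include.lowest = TRUE, labels = FALSE)
# stratum-specific mean differences, combined with weights proportional to stratum size
per_stratum <- sapply(split(psa, psa$stratum),
                      function(d) mean(d$y[d$z == 1]) - mean(d$y[d$z == 0]))
stratum_wts <- table(psa$stratum) / nrow(psa)
sum(per_stratum * stratum_wts)   # stratified estimate of the treatment effect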
56
Conditioning Matching Exact matching Nearest Neighbor matching
Optimal matching Full matching other matching algorithms possible
57
Matching: Exact matching
Only units that are identical are matched with each other
[Illustration: table of estimated propensity scores for control and treated units; values .124, .226, .126, .289, .211, .365, .389, .415, .517, .566, .656, .789, .733, .856, .821, .997]
58
Matching: Nearest neighbor matching
Match units that are approximately the same
Options: order, caliper, replacement, ratio
[Same illustration table of control and treated propensity scores as above]
59
Matching: nearest neighbor options
order: start matching at the largest, the smallest, or a random propensity score
a matched control unit will not be matched to another treated unit, so the algorithm finds a local minimum (contrast optimal matching)
caliper: defines the maximum distance between two neighbors, e.g., .1 of a standard deviation of the PS
replacement: a unit can be recycled to be matched again; weights are then necessary
ratio: one treated unit can be matched to more than one control (e.g., 1:2, 1:3)
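These options map onto arguments of MatchIt's matchit() call; a hedged sketch (argument names as in MatchIt — m.order, caliper, replace, ratio — though accepted values and defaults can differ across package versions):

match_nn <- matchit(z ~ X1 + X2 + X3, data = psa,
                    method  = "nearest",
                    m.order = "largest",   # order: start matching at the largest propensity scores
                    caliper = 0.1,         # caliper: maximum allowed distance, here .1 SD
                    replace = FALSE,       # replacement: controls are not recycled (no weights needed)
                    ratio   = 1)           # ratio: one control per treated unit (e.g., 2 for 1:2 matching)
summary(match_nn)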
60
Matching: Optimal matching
Match units that are approximately the same, but allow matches to be broken if a better match is possible
Finds a global minimum (of the overall matched distance)
Option: ratio
[Same illustration table of control and treated propensity scores as above]
61
Matching: Full matching
Allows several controls to be matched to one treated unit in some regions of the propensity score, and several treated units to be matched to one control in other regions
Useful if distributions are highly imbalanced
[Same illustration table of control and treated propensity scores as above]
62
Summary of matching choices
Methods offer tradeoffs between bias and variance
Incomplete vs. inexact matching
Exact matching has (in theory) no bias, but can have large variance due to the diminished sample size
Less exact matching methods may have small residual bias, but less variance due to the inclusion of more subjects
63
Other issues Discarding units prior to matching
Usually not necessary if a caliper is defined
Excluding units from only one side changes the quantity of interest
64
Overlap – Graphic from T. Love, ASA Workshop
65
Annotated R code (reference)
library(foreign)
library(MatchIt)
library(PSAgraphics)
library(Rcmdr)    # provides bin.var(), used below

#read in dataset using Rcmdr#
psa <- read.spss("C:/Users/fthoemmes/Desktop/ps workshop/testdatapsa.sav",
                 use.value.labels=FALSE, max.value.labels=Inf, to.data.frame=TRUE)

#prima facie effect#
pf <- lm(y~z, data=psa)
summary(pf)

##matching##
##model to be used to predict z#
#(z ~ x1+x2+x3+x4+x5+x6+x7+x8+x9+x10+x11+x12+x13+x14+x15+x16,
##type of matching#
#method= "nearest",
##discard option#
#discard="both",
##caliper option#
#caliper = .2,
##dataset#
#data = psa)

#same code in one single line#
match1 <- matchit(z ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10,
                  method= "nearest", discard="none", caliper = .1, data = psa)

#summary of the matched sample#
summary(match1)
plot(match1,type="QQ")
plot(match1,type="hist")
plot(match1,type="jitter")

#additional plot of standardized differences
smatch1<-summary(match1,standardize=TRUE)
plot(smatch1)

#write out data
dmatch1 <-match.data(match1)

#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z

#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata<-dmatch1$strata

#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)

#treatment effect of matched sample
m1 <- lm(y~z, data=dmatch1)
summary(m1)
66
Annotated R output (reference)
> pf <- lm(y~z, data=psa)
> summary(pf)

Call:
lm(formula = y ~ z, data = psa)

[Residuals and coefficient table shown on the slide; numeric values lost in extraction. Both the intercept and z are significant (p < .001).]

OLS regression: the outcome is regressed on the treatment indicator
Prima facie treatment effect 1.476, p < .01
67
Annotated R output (reference)
Summary of balance for all data:
[For each variable (distance, X1–X10), MatchIt reports Means Treated, Means Control, SD Control, Mean Diff, eQQ Med, eQQ Mean, and eQQ Max; numeric values lost in extraction.]

Summary of balance for matched data:
[Same columns for distance and X1–X10; numeric values lost in extraction.]

Balance on all covariates, for the unmatched and the matched sample: means, SDs, mean differences, and differences based on the Q-Q plot (median, mean, max)
68
Annotated R output (reference)
Percent Balance Improvement:
[Mean Diff., eQQ Med, eQQ Mean, and eQQ Max improvement for distance and X1–X10; numeric values lost in extraction.]

Sample sizes:
[Control and Treated counts for All, Matched, and Unmatched units; Discarded: 0 and 0.]

Percent balance improvement is a measure of balance; sample sizes are reported before and after matching
69
Graphics Q-Q plot for each variable to check balance
Straight line following 45° diagonal is desired plot(match1,type="QQ")
70
Graphics Histograms to examine distribution of propensity score
Split by treatment and control Before and after matching plot(match1,type="hist")
71
Graphics Jittered dotplot
Shows regions of propensity score that were matched plot(match1,type="jitter")
72
Graphics Plot of standardized differences pre- and post-matching
smatch1<-summary(match1,standardize=TRUE) plot(smatch1)
73
Graphics

#write out data
dmatch1 <-match.data(match1)

#additional graphics#
#put variables in objects
continuous<-dmatch1$X1
treatment<-dmatch1$z

#create strata from estimated PS
dmatch1$strata <- bin.var(dmatch1$distance, bins=5, method='proportions', labels=FALSE)
strata<-dmatch1$strata

#box plot comparing balance of variables across strata
box.psa(continuous, treatment, strata)
74
Annotated R output (reference)
Call:
lm(formula = y ~ z, data = dmatch1)

[Residuals and coefficient table shown on the slide; numeric values lost in extraction. The coefficient for z is significant.]

Treatment effect after matching is now .789, p < .05
Treatment effect is still in the same direction but greatly diminished
75
Advantages of Propensity Scores
Collapses the multivariate problem into a single-dimensional problem
No stringent assumptions about functional form
Model checks allow easy assessment of balance
Clearly defined region of common support (no extrapolation)
76
Limitations Unmeasured covariates can still bias effect estimates
Propensity score function can be challenging to estimate If assumptions of ANCOVA are fully met, propensity scores offer little gain
77
Propensity scores
The method is another tool for applied researchers to adjust for confounding influences
Propensity scores have some advantages and disadvantages over traditional regression adjustment
In applied contexts, the choice of confounding variables and the reliability of their measurement will be more critical than the choice of adjustment method!
78
Thank you Summer Statistics Workshop 2010 Felix Thoemmes Texas A&M University