Propensity Score Matching Lava Timsina Kristina Rabarison CPH 786-001 Doctoral Seminar Fall 2012.

1 Propensity Score Matching
Lava Timsina, Kristina Rabarison
CPH 786-001 Doctoral Seminar, Fall 2012

2 Introduction
- Program evaluation
- Counterfactual outcome: what would have happened to the participants in the absence of treatment
- Statistical techniques
- Propensity score

3 Concept of PSM
Identify neighborhoods that are as similar as possible to each other with respect to the probability of receiving the treatment (Chiavegatto Filho, Kawachi, & Gotlieb, 2012). The average treatment effect is then measured as the mean difference in outcomes between the treatment and comparison groups.

4 Experimental vs. Non-experimental
Experimental evaluation
- Random assignment to treatment and control, so the control group can be regarded as the counterfactual.

5 Motivation for Propensity Score Matching
Non-experimental evaluation
- Random assignment may not be possible in non-experimental evaluation (Heinrich et al., 2010).
- Assignment to treatment is often nonrandom and hence may bias the comparison of outcomes between participants and nonparticipants.
- Treatment units are therefore matched with “similar” counterparts that differ only in the treatment under study.
- Finding close matches on many covariates at once is challenging.

6 Propensity score matching reduces this matching problem to a single dimension. The propensity score is defined as the probability that a unit in the combined sample of treated and control units receives the treatment, given a set of observed covariates.
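
The definition above can be written compactly. In this sketch Z is the treatment indicator and X the observed covariates, as on the later slides; the symbol p(X) and the coefficients \beta_0, \beta are notation assumed here, and the logit form is only one of the two models (probit or logit) the deck mentions.

```latex
p(X) = \Pr(Z = 1 \mid X),
\qquad
\text{logit model: } p(X) = \frac{1}{1 + \exp\{-(\beta_0 + X^{\top}\beta)\}}
```

Because p(X) is a single number per unit, matching on it replaces matching on every element of X.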

7 Assumptions for PSM
PSM holds under two assumptions (Khandker, 2010; Rosenbaum & Rubin, 1983):
- Conditional independence (unconfoundedness) assumption
- Common support (overlap) condition

8 Conditional Independence or Unconfoundedness Assumption
- Conditional on observable covariates, the potential outcomes are independent of treatment.
In the absence of randomization, the groups may differ not only in treatment status but also in other covariates, so these covariates must be controlled for to avoid potential bias. There is a set of covariates X observable to the researcher such that, after controlling for them, the potential outcomes (r_1, r_0) are independent of the treatment status Z:
(r_1, r_0) ⊥ Z | X

9 Common Support Condition
- This condition ensures that treatment observations have comparison observations “nearby” in the propensity score distribution. For each value of X, there is a positive probability of being treated and a positive probability of being untreated: 0 < P(Z=1|X) < 1
- Also called the overlap condition
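
Taken together, the two assumptions are what allow the average treatment effect on the treated to be estimated from matched comparisons. A standard way to write this, using the slides' r_1, r_0, Z and X, with p(X) = P(Z=1|X) and the symbol \tau_{ATT} as notation assumed here:

```latex
\tau_{ATT} = E[\, r_1 - r_0 \mid Z = 1 \,]
           = E_{p(X) \mid Z = 1}\Bigl\{ E[\, r_1 \mid Z = 1,\, p(X) \,] - E[\, r_0 \mid Z = 0,\, p(X) \,] \Bigr\}
```

Conditional independence justifies replacing the unobserved E[r_0 | Z=1, p(X)] with the observed E[r_0 | Z=0, p(X)], and common support guarantees that comparison units exist at each value of p(X) being averaged over.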

10 Steps of PSM (European Commission, 2009; Khandker, 2010):
1. Estimating a model of program participation
2. Defining the region of common support and balancing tests
3. Matching participants to nonparticipants
4. Estimating the average effect and its standard error

11 Steps of PSM (European Commission, 2009; Khandker, 2010):
1. Estimating a model of program participation
   i. Samples of participants and nonparticipants should be pooled.
   ii. Participation Z should be regressed on all the observed covariates X in the data that are likely to determine participation.
   - Use a probit or logit model of program participation.
   - The predicted outcome is the estimated probability of participation, or propensity score, for every sampled participant and nonparticipant.
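
A minimal sketch of step 1, written in Python rather than the SAS used in the deck's example. It fits a logit model by plain gradient ascent on a pooled toy sample and returns one propensity score per unit. The function names, learning rate, step count, and data are all illustrative assumptions; in practice a statistical package's logistic regression routine would be used.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logit(X, z, lr=0.1, steps=5000):
    """Fit P(Z=1|X) by gradient ascent on the mean log-likelihood.
    X: list of covariate rows, z: list of 0/1 participation flags."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, zi in zip(X, z):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            grad[0] += zi - p
            for j, x in enumerate(row):
                grad[j + 1] += (zi - p) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    # Predicted probability of participation for one unit.
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))

# Made-up pooled sample: one covariate, participation more likely for large x.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
z = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logit(X, z)
scores = [propensity(beta, row) for row in X]
```

Participants and nonparticipants are pooled in `X` and `z` before fitting, as the slide requires, and every unit receives a score regardless of its actual treatment status.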

12 Steps of PSM (European Commission, 2009; Khandker, 2010):
2. Defining the region of common support and balancing tests
- The region of common support, where the propensity score distributions of the treatment and comparison groups overlap, needs to be defined.

13 Steps of PSM (European Commission, 2009; Khandker, 2010):
Some participant and nonparticipant observations that fall outside the region of common support may have to be dropped.
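
One simple way to define the region and drop observations outside it is the min/max rule: keep only scores that lie inside both groups' ranges. The Python sketch below assumes that rule (the slides do not say which rule they use) and runs on made-up scores; the function names are hypothetical.

```python
def common_support(scores, z):
    """Return (low, high), the overlap of the treated and comparison
    propensity score ranges (min/max rule)."""
    treated = [s for s, zi in zip(scores, z) if zi == 1]
    control = [s for s, zi in zip(scores, z) if zi == 0]
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    return low, high

def on_support(scores, z):
    """Indices of the observations that fall inside the common support."""
    low, high = common_support(scores, z)
    return [i for i, s in enumerate(scores) if low <= s <= high]

# Made-up scores: the first four units are comparisons, the last four treated.
scores = [0.10, 0.25, 0.40, 0.60, 0.30, 0.55, 0.80, 0.95]
z = [0, 0, 0, 0, 1, 1, 1, 1]
low, high = common_support(scores, z)
kept = on_support(scores, z)
```

Here two comparison units with very low scores and two treated units with very high scores fall outside the overlap and are dropped.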

14 Steps of PSM (European Commission, 2009; Khandker, 2010):
Balancing tests can be conducted to check whether, within each quantile of the propensity score distribution, the treatment and comparison groups have similar average propensity scores and similar means of X. That is, the distributions of X in the treated group and the control group must be similar:
p̂(X|Z=1) = p̂(X|Z=0)
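
The check described above can be sketched as follows: rank units by propensity score, cut them into equal-sized strata, and compare group means of the score and of one covariate inside each stratum. This Python sketch only reports the differences; a formal balancing test would add a significance test, which is omitted here. The function name and data are illustrative assumptions.

```python
import statistics

def balance_by_stratum(scores, x, z, n_strata=2):
    """For each propensity score stratum, return the treated-minus-comparison
    difference in mean score and in mean covariate x (None if a group is empty)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    report = []
    for k in range(n_strata):
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        t = [i for i in idx if z[i] == 1]
        c = [i for i in idx if z[i] == 0]
        if not t or not c:
            report.append(None)  # no comparison possible in this stratum
            continue
        report.append({
            "score_diff": statistics.mean([scores[i] for i in t])
                          - statistics.mean([scores[i] for i in c]),
            "x_diff": statistics.mean([x[i] for i in t])
                      - statistics.mean([x[i] for i in c]),
        })
    return report

# Made-up data: eight units, one covariate.
scores = [0.20, 0.25, 0.30, 0.35, 0.60, 0.65, 0.70, 0.75]
z = [0, 1, 0, 1, 0, 1, 1, 0]
x = [1, 2, 1, 2, 5, 4, 6, 5]
report = balance_by_stratum(scores, x, z)
```

Differences near zero in every stratum suggest the groups are balanced; large differences suggest the participation model should be respecified.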

15 Steps of PSM (European Commission, 2009; Khandker, 2010):
3. Matching participants to nonparticipants
- Participants can be matched to nonparticipants on the basis of the propensity score using different matching techniques.

16 Techniques of matching
1. Nearest-neighbor (NN) matching
- Each treatment unit is matched to the comparison unit with the closest propensity score. Matching can be done with or without replacement.
- In this method, both treatment and control groups are first randomly sorted. Then the first treatment unit is selected and matched to the control unit with the smallest absolute difference in propensity score (or in the logit of the propensity score). The procedure is repeated for each remaining treatment unit.
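
A minimal Python sketch of greedy nearest-neighbor matching as described above. Units are processed in the order supplied, so the random sort the slide mentions is assumed to have been done by the caller. The function name, unit ids, and scores are made up for illustration.

```python
def nearest_neighbor_match(treated, control, replacement=False):
    """treated, control: dicts mapping unit id -> propensity score.
    Returns {treated_id: control_id}."""
    available = dict(control)
    pairs = {}
    for t_id, t_score in treated.items():
        if not available:
            break  # no comparison units left
        # Closest comparison unit by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs[t_id] = c_id
        if not replacement:
            del available[c_id]  # each comparison unit is used at most once
    return pairs

treated = {"t1": 0.30, "t2": 0.32, "t3": 0.70}
control = {"c1": 0.31, "c2": 0.50, "c3": 0.69}
without = nearest_neighbor_match(treated, control)
with_repl = nearest_neighbor_match(treated, control, replacement=True)
```

Without replacement, `t2` has to settle for `c2` because `c1` is already taken; with replacement, both `t1` and `t2` share `c1`. This is the usual trade-off between match quality and the number of distinct comparison units used.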

17 Techniques of matching
2. Caliper or radius matching
- This method is similar to NN matching except that it adds a restriction. Both treatment and control units are randomly sorted, and the first treated unit is matched to its closest control in terms of the propensity score, but only if the control’s propensity score is within a certain radius (caliper).
- This avoids bad matches and ensures that the matched pairs are within a certain range of propensity scores.
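
The caliper restriction can be sketched in Python as a small change to greedy nearest-neighbor matching without replacement: a pair is accepted only if the score difference is within the caliper. The function name, the caliper value, and the data are illustrative assumptions.

```python
def caliper_match(treated, control, caliper=0.1):
    """treated, control: dicts mapping unit id -> propensity score.
    Returns (pairs, unmatched_treated_ids)."""
    available = dict(control)
    pairs, unmatched = {}, []
    for t_id, t_score in treated.items():
        best = min(available, key=lambda c: abs(available[c] - t_score), default=None)
        if best is not None and abs(available[best] - t_score) <= caliper:
            pairs[t_id] = best
            del available[best]
        else:
            unmatched.append(t_id)  # closest comparison unit is too far away
    return pairs, unmatched

treated = {"t1": 0.30, "t2": 0.90}
control = {"c1": 0.33, "c2": 0.60}
pairs, unmatched = caliper_match(treated, control, caliper=0.1)
```

Treated unit `t2` is left unmatched rather than paired with a comparison unit 0.30 away, which is the "bad match" the caliper is meant to prevent. The cost is that some treated units drop out of the analysis.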

18 Techniques of matching
3. Stratification or interval matching
- Partitions the common support into different strata (or intervals) and calculates the program’s impact within each interval.
- The treated and control groups are ranked on the basis of their propensity scores and then grouped into K intervals (strata).
- The impact within each stratum k = 1, ..., K is then evaluated.
- The overall impact is the weighted average of the strata effects, with weights proportional to the number of treated units in each stratum.
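
The weighted average in the last bullet can be sketched in Python as follows. The strata are assumed to have been formed already; the function name and outcome values are made up.

```python
def stratified_effect(strata):
    """strata: list of (treated_outcomes, control_outcomes), one pair per stratum.
    Returns the average of within-stratum mean differences, weighted by the
    share of treated units in each stratum."""
    total_treated = sum(len(t) for t, _ in strata)
    effect = 0.0
    for t, c in strata:
        within = sum(t) / len(t) - sum(c) / len(c)  # impact in this stratum
        effect += within * len(t) / total_treated
    return effect

# Two made-up strata: impacts of 1 and 3, with 2 and 4 treated units.
strata = [([10, 12], [9, 11, 10]), ([20, 22, 24, 26], [21, 19])]
overall = stratified_effect(strata)
```

The second stratum holds two thirds of the treated units, so its impact of 3 dominates: the overall estimate is (2/6)(1) + (4/6)(3) = 7/3.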

19 Techniques of matching
4. Kernel and local linear matching
- Kernel matching uses weighted averages of all individuals in the control group to construct the counterfactual outcome.
- Weights depend on the distance between each individual from the control group and the participant observation for which the counterfactual is estimated.
- The kernel function assigns higher weight to observations close in terms of propensity score to a treated individual and lower weight to more distant observations.
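
A minimal Python sketch of kernel matching. It assumes a Gaussian kernel and a fixed bandwidth, neither of which the slides specify, and it does not cover local linear matching. The function name and data are illustrative.

```python
import math

def kernel_effect(treated, control, bandwidth=0.1):
    """treated, control: lists of (propensity_score, outcome).
    Each treated unit's counterfactual is a kernel-weighted average of all
    comparison outcomes; returns the mean treated-minus-counterfactual gap."""
    gaps = []
    for p_t, y_t in treated:
        # Gaussian kernel: weight falls off with distance in propensity score.
        weights = [math.exp(-0.5 * ((p_c - p_t) / bandwidth) ** 2) for p_c, _ in control]
        counterfactual = sum(w * y_c for w, (_, y_c) in zip(weights, control)) / sum(weights)
        gaps.append(y_t - counterfactual)
    return sum(gaps) / len(gaps)

# One treated unit; the comparison unit at the same score dominates the average.
effect = kernel_effect([(0.5, 12.0)], [(0.5, 10.0), (0.9, 0.0)])
```

The comparison unit at score 0.9 is four bandwidths away and receives almost no weight, so the counterfactual stays close to 10. A wider bandwidth would pull in more distant units, trading variance for bias.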

20 Steps of PSM (European Commission, 2009; Khandker, 2010):
4. Estimating the average effect and its standard error
- Compute the sample average outcomes of the two matched groups and take the difference.
- Standard errors can be computed using bootstrapping methods.
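
Step 4 can be sketched in Python as a difference in means with a bootstrap standard error. As a simplifying assumption, this sketch resamples the matched outcomes only; a fuller bootstrap would re-estimate the propensity scores and redo the matching in every replicate. The function names, replicate count, seed, and outcome values are made up.

```python
import random
import statistics

def mean_difference(treated, control):
    # Average effect: treated mean minus comparison mean.
    return statistics.mean(treated) - statistics.mean(control)

def bootstrap_se(treated, control, reps=1000, seed=0):
    """Standard deviation of the mean difference across bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        t = rng.choices(treated, k=len(treated))  # resample with replacement
        c = rng.choices(control, k=len(control))
        estimates.append(mean_difference(t, c))
    return statistics.stdev(estimates)

# Made-up outcomes for matched treated and comparison units.
treated = [48, 52, 45, 50, 47]
control = [50, 55, 49, 53, 51]
effect = mean_difference(treated, control)
se = bootstrap_se(treated, control)
```

Fixing the seed makes the standard error reproducible from run to run.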

21 Example
Research question: Does homelessness affect physical health, as measured by the physical component score (PCS) of the SF-36?

22 A glimpse at the dataset
    proc print data=ref.help (obs=10);
      var id pcs mcs age female i1 homeless;
    run;
    Obs  ID      PCS      MCS  AGE  FEMALE  I1  HOMELESS
      1   1  58.4137  25.1120   37       0  13         0
      2   2  36.0369  26.6703   37       0  56         1
      3   3  74.8063   6.7629   26       0   0         0
      4   4  61.9317  43.9679   39       1   5         0
      5   5  37.3456  21.6758   32       0  10         1
      6   6  46.4752  55.5090   47       1   4         0
      7   7  24.5150  21.7930   49       1  13         0
      8   8  65.1380   9.1605   28       0  12         1
      9   9  38.2709  22.0297   50       1  71         1
     10  10  22.6106  36.1438   39       0  20         1

23 Modeling as linear regression
    proc reg data=ref.help;
      model pcs=homeless;
    run;
    Number of Observations Read  453
    Number of Observations Used  453
    Parameter Estimates
    Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept   1            49.00083         0.68802    71.22    <.0001
    HOMELESS    1            -2.06405         1.01292    -2.04    0.0422
    proc reg data=ref.help;
      model pcs = homeless age female i1 mcs;
    run; quit;
    Parameter Estimates
    Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept   1            58.21224         2.56675    22.68    <.0001
    HOMELESS    1            -1.14707         0.99794    -1.15    0.2510
    AGE         1            -0.26593         0.06410    -4.15    <.0001
    FEMALE      1            -3.95519         1.15142    -3.44    0.0006
    I1          1            -0.08079         0.02538    -3.18    0.0016
    MCS         1             0.07032         0.03807     1.85    0.0654

24 Creating propensity scores
    proc logistic data=ref.help desc;
      model homeless = age female i1 mcs;
      output out=propen pred=propensity;
    run;
    Association of Predicted Probabilities and Observed Responses
    Percent Concordant   64.9    Somers' D  0.302
    Percent Discordant   34.7    Gamma      0.304
    Percent Tied          0.4    Tau-a      0.151
    Pairs               50996    c          0.651

25 Checking the common support assumption
    proc means data=propen;
      class homeless;
      var propensity;
    run;
    Analysis Variable : propensity Estimated Probability
    HOMELESS  N Obs       Mean    Std Dev    Minimum    Maximum
           0    244  0.4296704  0.1166290  0.2136791  0.7876000
           1    209  0.4983750  0.1382013  0.2635031  0.9642827

26 Checking the common support assumption
    proc univariate data=propen;
      class homeless;
      var propensity;
      histogram propensity;
    run;

27 Restricting analysis to common support region
    proc reg data=propen;
      where propensity<0.8;
      model pcs=homeless propensity;
    run; quit;
    Parameter Estimates
    Variable    Label                  DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept                           1            54.19945         1.98265    27.34    <.0001
    HOMELESS                            1            -1.19613         1.03893    -1.15    0.2502
    propensity  Estimated Probability   1           -12.09909         4.33385    -2.79    0.0055

28 Macros
    data prop2;
      set propen;
      if propensity<0.8;
    run;
    %include "C:\Documents and Settings\LRTI222\Desktop\vmatch.sas";
    %include "C:\Documents and Settings\LRTI222\Desktop\dist.sas";
    %include "C:\Documents and Settings\LRTI222\Desktop\nobs.sas";
    %dist(data=prop2, group=homeless, id=id, mvars=propensity, wts=1, vmatch=Y, a=1, b=1, lilm=201, dmax=0.1, outm=mp1_b, summatch=n, printm=N, mergeout=mpropen);

29 Before & After Matching
    proc means data=propen mean;
      class homeless;
      var age female i1 mcs;
    run;
    HOMELESS  N Obs  Variable        Mean
           0    244  AGE       35.0409836
                     FEMALE     0.2745902
                     I1        13.5122951
                     MCS       32.4868303
           1    209  AGE       36.3684211
                     FEMALE     0.1913876
                     I1        23.0382775
                     MCS       30.7308549
    proc means data=mpropen mean;
      where matched;
      class homeless;
      var age female i1 mcs;
    run;
    HOMELESS  N Obs  Variable        Mean
           0    201  AGE       35.6218905
                     FEMALE     0.1791045
                     I1        15.9154229
                     MCS       31.4815123
           1    201  AGE       36.1492537
                     FEMALE     0.1990050
                     I1        19.9452736
                     MCS       30.9176772
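
To see how much matching improved balance, the gaps between the group means on this slide can be compared before and after. The Python sketch below copies the means from the two PROC MEANS outputs and computes raw differences (homeless minus not homeless). These are unstandardized because the slide reports no standard deviations; the variable names in the code are illustrative.

```python
# Group means from the slide: (not homeless, homeless).
before = {
    "AGE": (35.0409836, 36.3684211),
    "FEMALE": (0.2745902, 0.1913876),
    "I1": (13.5122951, 23.0382775),
    "MCS": (32.4868303, 30.7308549),
}
after = {
    "AGE": (35.6218905, 36.1492537),
    "FEMALE": (0.1791045, 0.1990050),
    "I1": (15.9154229, 19.9452736),
    "MCS": (31.4815123, 30.9176772),
}

# Raw gap in means for each covariate, before and after matching.
gaps = {
    var: (before[var][1] - before[var][0], after[var][1] - after[var][0])
    for var in before
}
for var, (gap_before, gap_after) in gaps.items():
    print(f"{var:6s} before {gap_before:8.4f}   after {gap_after:8.4f}")
```

Every covariate's gap is smaller in absolute value after matching. The largest change is in I1, where the gap falls from about 9.5 to about 4.0.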

30 Propensity score matched analysis
    proc reg data=mpropen;
      where matched;
      model pcs=homeless;
    run; quit;
    Parameter Estimates
    Variable   DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
    Intercept   1            48.95273         0.76200    64.24    <.0001
    HOMELESS    1            -1.79386         1.07763    -1.66    0.0968

