Presentation on theme: "Matching Procedures: Propensity Score" — Presentation transcript:

1 Matching Procedures: Propensity Score
Héctor Lamadrid Center for Population Health Research National Institute of Public Health, Mexico We will now cover matching, an important technique for dealing with non-experimental designs.

2 What is matching for? In an ideal setting, we will always have a random assignment of the treatment. This makes estimating the programme effect straightforward and relatively easy… However: often this is not possible, or we are called to evaluate once the programme is already running… What to do? We always need to look for the best possible proxy of the counterfactual. We already explained that the counterfactual is the holy grail of impact evaluation, but we seldom have the chance to conduct experiments. What to do then? How do we get a good proxy?

3 Matching Suppose that individuals are voluntarily enrolling (or not enrolling) in a programme. It is very likely that those who decide to enroll are very different from those who don't. A simple comparison between both groups will surely be very biased. We already made it clear that simple comparisons are likely biased because of selection processes which make the groups non-comparable.

4 Matching What to do then? Take one subject A, who enrolled….
Look for another subject who is almost identical to subject A, but who DID NOT ENROLL; let's call him subject B. He will be our proxy of the counterfactual. Calculate the difference in the outcome between both subjects. Repeat for all the treated subjects… Voilà! We get an estimate of the Average Treatment Effect! What if we somehow could find, for each treated person, a person who is almost exactly the same (same age, same SES, same marital status, same religion, etc.) but who did not get the treatment? We could then use that person as the proxy of the counterfactual and take the difference between their outcome and the outcome of the treated individual. If we do this for all the subjects in the sample and average the differences, we get the average treatment effect.
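The steps above can be sketched in a few lines. This is a minimal illustration with hypothetical toy data (the covariates and outcomes are invented for the example): for every treated subject we look for an untreated subject with identical covariates, take the outcome difference, and average.

```python
# A minimal sketch of the matching idea: each record is a pair
# (covariates, outcome). For every treated subject we find an
# untreated subject with identical covariates and take the outcome
# difference; the average of those differences estimates the ATT.

def exact_match_att(treated, controls):
    """treated/controls: lists of (covariates_tuple, outcome)."""
    diffs = []
    for cov_t, y_t in treated:
        # find an untreated subject with the exact same covariates
        match = next((y_c for cov_c, y_c in controls if cov_c == cov_t), None)
        if match is not None:
            diffs.append(y_t - match)
    return sum(diffs) / len(diffs)

# hypothetical example: covariates are (age_group, ses_level)
treated  = [(("young", "low"), 10.0), (("old", "high"), 14.0)]
controls = [(("young", "low"),  7.0), (("old", "high"), 12.0)]
print(exact_match_att(treated, controls))  # average of (10-7) and (14-12) = 2.5
```

Note that a treated subject with no exact counterpart is simply skipped here, which is precisely the difficulty the next slides turn to.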

5 Matching All this sounds great, but note that subject B must be almost identical to A: we want both to have the exact same values of MANY variables related to the outcome. The problem is that being "almost identical" means finding a person with the exact same values of many covariates.

6 Matching on 1 variable Example 1: pairs=4
If we were to match on one variable, the task would be relatively easy…

7 Matching on 3 variables Example 2: ?
However, what happens if we have more covariates? In this example we have three covariates, each with four possible values. This means we have 4 × 4 × 4 = 64 possible covariate patterns! Assuming these are equally distributed in the population (which is often not the case), we would find one control out of every 64 subjects. This does not even consider the case where some variables have more than four possible values (or are numerical), and even then, matching on only a few covariates is unlikely to be very "convincing". This problem is called the "curse of dimensionality". With only 3 covariates, each with only 4 possible values, we have 4^3 = 64 possible covariate patterns: it is very hard to get a good match! Dimensionality problem
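The arithmetic behind the dimensionality problem is easy to verify: the number of distinct covariate patterns is the product of the number of levels of each covariate, so it grows exponentially with the number of matching variables.

```python
# A small illustration of the dimensionality problem: the number of
# distinct covariate patterns grows exponentially with the number of
# matching variables, so exact matches quickly become rare.

def n_patterns(levels_per_covariate):
    """Number of distinct covariate patterns."""
    total = 1
    for levels in levels_per_covariate:
        total *= levels
    return total

print(n_patterns([4]))         # 1 covariate, 4 levels -> 4 patterns
print(n_patterns([4, 4, 4]))   # 3 covariates -> 4^3 = 64 patterns
print(n_patterns([4] * 10))    # 10 covariates -> 1,048,576 patterns
```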

8 Matching So we could try to match on several important variables, but adding more variables makes it exponentially more difficult to find a match. So… Rosenbaum & Rubin came up with the idea of collapsing all the covariate information into a single measure: the "Propensity Score". One solution to this problem is Propensity Score Matching, or PSM: a technique that collapses the information of many covariates into a single variable called the "propensity score", which is the estimated probability that an individual (or household, or any other unit of analysis) receives the programme.

9 The propensity score Employs a predicted probability of group membership (e.g., treatment vs. control group), based on observed predictors and usually obtained from logistic or probit regression, to create a counterfactual proxy. Propensity scores may be used for matching or as covariates in a regression model, alone or with other matching variables or covariates. The predicted probability is estimated through a probit or logit model in which the left-hand-side variable is programme participation. After obtaining the propensity score, one can use it to match someone in the treated group to someone untreated but with the same probability of participating. This is much easier than trying to match on a large number of covariates.

10 Propensity Score The propensity score is the estimated probability that an individual participates in the programme, given a set of observable characteristics X. p(X) = Pr(T=1 | X) It can be thought of as a "summary" of the multiple factors that may make the comparison groups (treatment & controls) different. Like any other probability, it lies between 0 and 1. Once again: the propensity score is the estimated probability that somebody joins the programme. Since the probability is constructed from covariate information, it is, in fact, a summary of those covariates.
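To make the estimation of p(X) = Pr(T=1 | X) concrete, here is a minimal sketch of a logistic model fitted by plain gradient descent on hypothetical data (the covariates and participation indicators are invented; in practice you would use the logit or probit commands of a statistics package such as Stata).

```python
import math

# A minimal sketch of estimating p(X) = Pr(T = 1 | X) with a logistic
# model fitted by plain gradient descent (hypothetical data; in practice
# you would use a statistics package, e.g. logit/probit in Stata).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(X, T, lr=0.1, steps=2000):
    """Return coefficients (intercept first) of a logistic model T ~ X."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, T):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = ti - p          # gradient of the log-likelihood
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))

# hypothetical covariates: (income in 10k units, years of schooling / 10)
X = [(0.5, 0.6), (0.9, 1.2), (0.2, 0.4), (1.5, 1.6), (0.3, 0.8), (1.1, 1.4)]
T = [0, 1, 0, 1, 0, 1]
beta = fit_logit(X, T)
scores = [propensity(beta, xi) for xi in X]
print(scores)  # each score lies in (0, 1)
```

As expected, the fitted scores are higher for the subjects who actually participated, which is exactly the "summary of the covariates" the slide describes.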

11 Propensity Score Useful for matching:
Instead of matching on an enormous set of covariates, we can match two individuals who have the same probability of being enrolled in the programme! We have to make a couple of important assumptions: Conditional on X, the treatment assignment is independent of the potential outcomes. There is a common support area (an overlap of the distribution of p(X) in the treatment & control groups). A couple of important assumptions/issues to bear in mind when working with propensity scores: first, that the treatment is independent of the potential outcomes, conditional on the X's; in practical terms, this means that all covariates that are relevant to both the selection process and the outcome were included in the propensity model. The second issue is that there has to be overlap in the distribution of the propensity score in both groups, that is: for every treated subject, there must be someone in the control group who has the same (or a very close) propensity score.

12 Regions of common support
In this graphical representation of the propensity score, we observe that, naturally, the distribution of the propensity is closer to one in the treated group; this speaks well of the model. The problem, however, is that there is a subset of subjects with a very high probability of participation (far right in the upper graph) who have no potential controls. These might have to be excluded from the analysis, sadly.
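A simple way to think about the common-support region is as the interval where the two propensity distributions overlap. The sketch below, on hypothetical scores, takes the interval [max of the two minima, min of the two maxima] and drops treated units outside it, exactly as the slide describes.

```python
# A minimal sketch of the common-support check: the usable region is
# where the propensity distributions of the two groups overlap, i.e.
# [max of the two minima, min of the two maxima] (hypothetical scores).

def common_support(ps_treated, ps_control):
    lo = max(min(ps_treated), min(ps_control))
    hi = min(max(ps_treated), max(ps_control))
    return lo, hi

def on_support(ps, lo, hi):
    return [p for p in ps if lo <= p <= hi]

ps_t = [0.35, 0.50, 0.70, 0.95]   # hypothetical treated scores
ps_c = [0.05, 0.20, 0.40, 0.60]   # hypothetical control scores
lo, hi = common_support(ps_t, ps_c)
print(lo, hi)                     # 0.35 0.6
print(on_support(ps_t, lo, hi))   # treated units above 0.6 are dropped
```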

13 PS Matching Disadvantages:
Large samples are required. Group overlap must be substantial (which often reduces the effective sample size). Most important: bias may remain, because matching only controls for observed variables (to the extent that they are perfectly measured), and not for unobservables. The main limitations of the PSM strategy are the large sample size needed and the need for overlap or common support. But the most important thing is that PSM can give evaluators a false sense of confidence; this occurs because covariate balance checks after matching tend to give spectacular results. However, we should never forget that balance is only achieved in the covariates that were included in the PSM estimation process and, as opposed to randomized designs, there is absolutely no guarantee that balance will occur in the unobservables.

14 General procedure STEP 1 Obtain the propensity score
Run a logistic or probit regression: Y=1 if the subject participates in the programme, else Y=0. Get the estimated probabilities. The X's must not be affected by the programme (use baseline information). STEP 2 Select non-participants and match them to participants on the basis of p(X) similarity. (There are several algorithms to do this.) To perform propensity score matching, one has to follow a two-step procedure. The first step is the estimation of the propensity score itself. The guide for selecting the covariates is the same one we use in any other evaluation design: the conceptual framework. The same care has to be taken to avoid including covariates that are affected by the programme, to avoid endogeneity bias due to reverse causality. Step 2 is the matching procedure proper; several algorithms do the matching for you…
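Step 2 can be sketched with the simplest of the matching algorithms: one-to-one nearest-neighbour matching (with replacement) on an already-estimated propensity score. The scores below are hypothetical.

```python
# A sketch of Step 2 under simple assumptions: one-to-one
# nearest-neighbour matching (with replacement) on an already-estimated
# propensity score.

def nearest_neighbour_match(ps_treated, ps_control):
    """Return, for each treated unit, the index of the closest control."""
    matches = []
    for p_t in ps_treated:
        j = min(range(len(ps_control)), key=lambda j: abs(ps_control[j] - p_t))
        matches.append(j)
    return matches

ps_t = [0.30, 0.55, 0.58]          # hypothetical treated scores
ps_c = [0.10, 0.28, 0.60, 0.90]    # hypothetical control scores
print(nearest_neighbour_match(ps_t, ps_c))  # [1, 2, 2]
```

Note that control 2 is reused for two treated units: that is matching with replacement, an issue a later slide returns to.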

15 STEP 3: Check balance between the samples. If the samples are not balanced, revise the PS model Pr(T=1 | X): include more variables & interactions. The third step is to check the performance of the procedure by looking at the imbalances in the observed covariates after the matching. If the balance is not satisfactory, you need to go back to step 1.

16 STEP 4 Estimate the ATT:
ATT = (1/N_T) Σ_{i ∈ T} [ Y_i − Σ_{j ∈ C} w(i,j) Y_j ]
The 4th step is to estimate the Average Treatment Effect on the Treated, or ATT. The formula above describes the general procedure: the difference in the outcome variable between the members of each matched pair or group of treated and controls. The formula features weights w(i,j) that vary between procedures; for example, in some procedures the controls nearest to the treated unit receive more weight than those that are far away. The different procedures are briefly mentioned in the following slides.

17 Different ways to determine w:
nearest neighbor: w(i,j) = 1 if j is the closest control to i, 0 otherwise. k nearest neighbors: w(i,j) = 1/k for the k closest controls, 0 otherwise. kernel weights: w(i,j) = K((p_j − p_i)/h) / Σ_{j'} K((p_{j'} − p_i)/h). These are the formulas of the weighting procedures; we will not get into the statistical details, as they can be found in the dedicated theoretical literature. Suffice it to say that nearest-neighbor matching is the simplest procedure, choosing a number of units which are closest to the treated unit in terms of the propensity score; each control receives a weight that is the inverse of the number of controls per treated unit. Other procedures utilize all the information available but give more weight the closer a particular control is to the treated unit; all the controls get some weight, though.
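The kernel-weight idea can be sketched as follows, assuming a Gaussian kernel K and a hypothetical bandwidth h: each control j gets a weight proportional to K((p_j − p_i)/h), normalised so the weights for treated unit i sum to one.

```python
import math

# A sketch of kernel weights under stated assumptions: a Gaussian
# kernel with bandwidth h; each control j gets weight proportional to
# K((p_j - p_i)/h), normalised so the weights for treated unit i sum to 1.

def kernel_weights(p_i, ps_control, h=0.1):
    k = [math.exp(-0.5 * ((p_j - p_i) / h) ** 2) for p_j in ps_control]
    total = sum(k)
    return [kj / total for kj in k]

w = kernel_weights(0.50, [0.10, 0.45, 0.55, 0.90])
print(w)  # nearby controls (0.45, 0.55) dominate; weights sum to 1
```

Every control receives some weight, but the ones closest to the treated unit's score dominate, which is exactly the behaviour described above.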

18 local linear matching:

19 Implementation in STATA
psmatch2: psmatch2 d, pscore(px) out(y) options for matching (one-to-one, k nearest neighbors, kernel, radius, spline, llr, mahalanobis) attnd, atts, attnw, attk, attr nnmatch: nnmatch y d x1 (only nearest neighbor) pscore: an alternative to estimate the PS and check balance In Stata, there are several algorithms implemented to do the matching. A popular one is psmatch2, which features different matching algorithms and a bootstrap for the estimation of the standard errors.

20 psmatch2: advantages: several possible matching algorithms; disadvantages: needs to use the bootstrap to correctly estimate standard errors.
nnmatch: advantages: exact estimation of the standard error; disadvantages: only matches using the nearest neighbour algorithm. The nnmatch command is another useful tool, however limited to nearest-neighbour matching.

21 Sample balance After matching we need to check if …
The average values of X are approximately equal in the treatment & control groups. This is meant to verify the Conditional Independence Assumption. In STATA we have: pstest x1 x2 x3, t(d) ps(pscore) support(varname) The conditional independence assumption, or more commonly "the balance", is checked with the pstest command. This command can compare the imbalances before vs. after the matching, and the improvement is often dramatic.
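One common balance statistic (analogous to what pstest reports) is the standardised mean difference of each covariate across groups. The sketch below computes it on hypothetical data; values near 0 indicate good balance.

```python
import math

# A sketch of a balance check: the standardised mean difference of a
# covariate between treated and controls, computed on hypothetical
# data; values near 0 indicate good balance.

def std_mean_diff(x_treated, x_control):
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / (len(v) - 1)
    pooled_sd = math.sqrt((var(x_treated) + var(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

before = std_mean_diff([40, 45, 50, 55], [20, 25, 30, 35])  # all controls
after  = std_mean_diff([40, 45, 50, 55], [38, 44, 52, 56])  # matched controls
print(before, after)  # the difference shrinks dramatically after matching
```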

22 Issues to have in mind
Number of pairs: more pairs: ↑ bias, ↓ variance. Replacement: with: ↑ variance, ↓ bias; without: ↓ variance, ↑ bias. Several other issues to bear in mind: if you use more controls, you will decrease the imprecision of the estimates but increase the bias, since you will match to controls that are farther away from the treated unit. You can choose to sample with replacement or without replacement; replacement will translate into less bias, because a given control can be the best match for more than one treated unit; however, this will increase imprecision, since using the same control several times does not introduce more information into the model.

23 Recommendations for a good PSM
Identify well all the mechanisms that might affect participation in the programme. Use the same data source to identify participants & non-participants. Try to obtain a sample with a large enough number of eligibles not participating in the programme. Try not to over-specify the PS model: ironically, a perfect participation prediction is not good! No common support: no matching! Guidelines for a good PSM are listed above. Of course, the main guide for the selection model is the conceptual framework. One cautionary note: if the propensity model is "too good", there will be no common support. This means that covariates that are relevant to the selection process but are not related to the outcome might have to be excluded from the selection model.

24 Advantages of PSM Avoids the linearity assumption required by OLS or other regression models (no functional form is assumed). Easy to implement. Robust. These are the good things about PSM. The first point refers to the fact that "usual" regression models make implicit or explicit assumptions about the functional form of the relationship between the explanatory variables and the outcomes. With PSM, that is not an issue.

25 Disadvantages Does not deal with unobservables
Requires a common support: no common support, no matching. Requires a lot of information to model the participation process. Finally, the main disadvantages are the large sample required and the fact that unobservables are not taken into account in the estimation process, which could lead to biased estimates while at the same time giving a false sense of confidence.
