1
Almost-Exact Matching for Causal Inference
Cynthia Rudin
Associate Professor, Departments of Computer Science, Electrical and Computer Engineering, and Statistical Science, Duke University
I'm not talking about decision trees today. I talk about them too much, so I decided at the last minute to switch to causal inference. When you're working with observational data, you ideally want to find identical twins in the treatment and control groups. But you can't, because very few individuals are really alike in high dimensions. So we want to match them on as many relevant covariates as possible, and that's what my talk is about.
2
FLAME: Fast Large-scale Almost Matching Exactly
(with Sudeepa Roy, Alexander Volfovsky, Tianyu Wang, Awa Dieng, Yameng Liu)
3
Matching in Potential Outcomes Framework
X (n x p covariates), Y (n x 1 outcomes), T (n x 1 treatment indicators in {0,1})
Observational data, SUTVA, strong ignorability
Causal inference is "half supervised"
Matching is interpretable: "identical" twins
Most matching methods don't try to match exactly.
I always say that causal inference is half supervised. For each treatment unit, you know their outcome; for each control unit, you know their outcome; but you never get the *treatment effect* for any individual. You never know what would happen if you gave the treatment to that control unit. Matching is a popular way to handle causal inference problems because it can be really interpretable.
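The setup on this slide can be written out in the usual potential-outcomes notation. This is the standard textbook formulation rather than something recovered from the slide; the symbols Y_i(1), Y_i(0), and tau_i are assumed notation.

```latex
\begin{align*}
Y_i &= T_i\, Y_i(1) + (1 - T_i)\, Y_i(0), \qquad T_i \in \{0, 1\} \\
\tau_i &= Y_i(1) - Y_i(0) \quad \text{(never observed for any single unit)} \\
\text{Strong ignorability:}\quad & \bigl(Y_i(1), Y_i(0)\bigr) \perp\!\!\!\perp T_i \mid X_i,
\qquad 0 < P(T_i = 1 \mid X_i) < 1
\end{align*}
```

Only one of the two potential outcomes is observed per unit, which is the sense in which the problem is "half supervised".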
4
covariates: age, gender, heart conditions, blood pressure, toenail length, eyeball width, etc
treated patient: Marietta [ F cm cm ]
control patient 1: Lee Ann [ F cm cm ]
5
FLAME: Fast Large-scale Almost Matching Exactly
Goal: Match treatment and control units using as many important covariates as possible
Handle large data sets
Work fast
BasicExactMatch subroutine uses an efficient database query:
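The query itself did not survive in this transcript. As a rough illustration of how exact matching can be phrased as a database query, here is a minimal sketch using Python's built-in sqlite3 module. The function name basic_exact_match, the table name units, and the column names id and T are assumptions made for this example, and the GROUP BY / HAVING formulation is one plausible way to do it, not necessarily the query on the slide.

```python
import sqlite3


def basic_exact_match(units, covariates):
    """Group units that agree on every listed covariate and keep only
    groups containing at least one treated and one control unit.

    units: list of dicts with keys 'id', 'T' (0 or 1), and each covariate.
    covariates: list of column names (assumed to be safe SQL identifiers).
    Returns {covariate_values_tuple: sorted list of unit ids}.
    """
    conn = sqlite3.connect(":memory:")
    col_list = ", ".join(covariates)
    conn.execute(f"CREATE TABLE units (id INTEGER, T INTEGER, {col_list})")
    placeholders = ", ".join("?" for _ in range(len(covariates) + 2))
    conn.executemany(
        f"INSERT INTO units VALUES ({placeholders})",
        [(u["id"], u["T"], *(u[c] for c in covariates)) for u in units],
    )
    # One pass over the table: a group is valid only if it holds both
    # treatment values, i.e. COUNT(DISTINCT T) = 2.
    query = f"""
        SELECT {col_list}, GROUP_CONCAT(id) AS members
        FROM units
        GROUP BY {col_list}
        HAVING COUNT(DISTINCT T) = 2
    """
    groups = {}
    for row in conn.execute(query):
        groups[row[:-1]] = sorted(int(i) for i in row[-1].split(","))
    conn.close()
    return groups
```

The grouping is delegated to the database engine, which is what makes this style of matching attractive for large data sets.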
6
FLAME: Fast Large-scale Almost Matching Exactly
Use a holdout training set to determine:
How important a variable is for prediction error
Whether a subset of variables predicts sufficiently well
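One way to make "importance for prediction error" concrete is to compare the holdout error with and without each covariate. The sketch below assumes a very simple predictor (the mean outcome within each cell of treatment status and covariate values, fit and scored on the same holdout set) and hypothetical names holdout_error and variable_importance. The slide does not say which predictor is used, so treat this only as an illustration of the idea.

```python
from collections import defaultdict


def holdout_error(holdout, covariates):
    """Mean squared error of a cell-mean predictor on the holdout set.

    Assumption: the prediction for a unit is the mean outcome Y of all
    holdout units sharing its treatment status and covariate values.
    """
    cells = defaultdict(lambda: [0.0, 0])  # key -> [sum of Y, count]
    for u in holdout:
        key = (u["T"], tuple(u[c] for c in covariates))
        cells[key][0] += u["Y"]
        cells[key][1] += 1
    squared = 0.0
    for u in holdout:
        total, count = cells[(u["T"], tuple(u[c] for c in covariates))]
        squared += (u["Y"] - total / count) ** 2
    return squared / len(holdout)


def variable_importance(holdout, covariates):
    """Increase in holdout error when each covariate is left out."""
    base = holdout_error(holdout, covariates)
    return {
        c: holdout_error(holdout, [k for k in covariates if k != c]) - base
        for c in covariates
    }
```

A covariate whose removal barely changes the error is a candidate for elimination; one whose removal raises the error sharply should be kept.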
7
FLAME: Fast Large-scale Almost Matching Exactly
Algorithm:
Start with exact matching on all covariates. Find as many matched groups as possible (using BasicExactMatch).
Eliminate the least important covariate.
Repeat: find as many matched groups as possible each time we eliminate a covariate.
Match Quality on training set: MQ = -PredictionError + C*BalancingFactor
Prediction error constraint: Always keep enough covariates to be able to predict the outcome well.
Balance constraint: Never eliminate too many points from either the treatment or the control group.
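The loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: flame_sketch and exact_match are hypothetical names, match_quality is a caller-supplied function standing in for MQ as computed on the holdout training set, and the prediction error and balance constraints are left out (the loop simply stops when one covariate remains or no units are left).

```python
from collections import defaultdict


def exact_match(units, covariates):
    """Return groups of units that agree on all covariates and contain
    both a treated (T=1) and a control (T=0) unit."""
    buckets = defaultdict(list)
    for u in units:
        buckets[tuple(u[c] for c in covariates)].append(u)
    return [g for g in buckets.values() if {u["T"] for u in g} == {0, 1}]


def flame_sketch(units, covariates, match_quality):
    """Match exactly, drop one covariate, and repeat on unmatched units.

    match_quality(subset) is assumed to return the MQ of matching on
    `subset`; higher is better. Returns a list of
    (covariates_used, matched_unit_ids) pairs.
    """
    remaining = list(units)
    active = list(covariates)
    matched_groups = []
    while active and remaining:
        groups = exact_match(remaining, active)
        for g in groups:
            matched_groups.append((tuple(active), [u["id"] for u in g]))
        matched_ids = {u["id"] for g in groups for u in g}
        remaining = [u for u in remaining if u["id"] not in matched_ids]
        if len(active) == 1:
            break  # never drop the last covariate in this sketch
        # The least important covariate is the one whose removal leaves
        # the highest match quality.
        drop = max(
            active,
            key=lambda c: match_quality([a for a in active if a != c]),
        )
        active.remove(drop)
    return matched_groups
```

Units matched in an early round are matched on more covariates, so earlier groups are the higher-quality ones.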
9
Some (Insightful) Experiments
No ground truth, so we need simulated data
U=5 (no noise)
20K units: 10K treatment, 10K control
10
Regression cannot handle model misspecification
11
Some (Insightful) Experiments
(no noise)
13
On the dataset I will show next
16
Collapsing FLAME
Try to match each treatment unit to at least one control on as many variables as possible.
Match on all variables.
Temporarily remove variable 1, get matches, put it back.
Temporarily remove variable 2, get matches, put it back.
...
Temporarily remove variables 1 AND 2, get matches, put them back.
Temporarily remove variables 1 AND 3, get matches, put them back.
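The order of removals listed above (nothing, then each single variable, then each pair, and so on) can be enumerated with itertools.combinations. The sketch below is a brute-force illustration with hypothetical names collapsing_order and collapsing_match. It assumes each treated unit keeps the first covariate set at which it finds a control, which is one reading of "as many variables as possible"; the slide does not specify tie-breaking, and a practical version would need to avoid enumerating every subset.

```python
from itertools import combinations


def collapsing_order(covariates):
    """Yield the covariate sets to remove temporarily, smallest first.
    The empty tuple (match on everything) comes first; the full set is
    never removed."""
    for k in range(len(covariates)):
        for dropped in combinations(covariates, k):
            yield dropped


def collapsing_match(units, covariates):
    """For each treated unit, find controls that agree on as many
    covariates as possible.

    Returns {treated_id: (covariates_kept, [control_ids])}.
    """
    controls = [u for u in units if u["T"] == 0]
    result = {}
    for t in (u for u in units if u["T"] == 1):
        for dropped in collapsing_order(covariates):
            kept = [c for c in covariates if c not in dropped]
            hits = [c["id"] for c in controls
                    if all(c[k] == t[k] for k in kept)]
            if hits:
                result[t["id"]] = (tuple(kept), hits)
                break  # removed variables are "put back" for the next unit
    return result
```

Because subsets are visited in order of size, the first hit for a treated unit uses the largest number of covariates available to it.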
17
Breaking the Cycle of Drugs and Crime in the United States
Alabama, Florida, and Washington
Participants were chosen to receive screening shortly after arrest and to participate in a drug intervention under supervision; the control group came from the same population.
[Figure: covariates ordered from first covariate eliminated to last covariate eliminated]
18
CORELS Rule List for treatment effect:
[Figure legend: positive estimated treatment effect; negative estimated treatment effect]
Siong Thye Goh
19
Takeaway
Most matching methods can't handle irrelevant variables.
FLAME leverages ideas from ML + databases:
scalable
fast
accurate
passes sanity checks
FLAME's code is here:
20
Thanks
FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference (with Sudeepa Roy, Alexander Volfovsky, Tianyu Wang)
Code: