


1 Super Learning & Targeted Maximum Likelihood Estimation
Maya Petersen MD, PhD
Div. of Biostatistics, School of Public Health, University of California, Berkeley
CIMPOD, Washington DC, Feb 25-26, 2016

2 Example: The Roadmap in Action
(Roadmap diagram:)
1. Causal model
2. Question
3. Data
4. Identified? (Y/N; if not, add convenience assumptions)
5. Estimand (within a statistical model)
6. Estimator
7. Interpretation

3 Ex: Impact of a Prevention Intervention on HIV Incidence (Simulated Data)
100 communities, differing in HIV risk factors: HIV prevalence, circumcision prevalence, trading center present
Prevention package non-randomly assigned
Community-level outcome: 3-year HIV incidence

4 The Roadmap in Action
1. Causal model (causal diagram with unmeasured background factors U)
C: Baseline HIV risk factors (HIV prevalence, circumcision prevalence, trading center present)
I: HIV prevention package
Y: 3-year HIV cumulative incidence

5 Pearl: Structural Equations Liberated!
1. Causal model: the same thing written as non-parametric structural equations (Pearl)
C: Baseline HIV risk factors
I: HIV prevention package
Y: 3-year HIV cumulative incidence
(each node with its own unmeasured input U)
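The slide's equations did not survive the transcript; for this causal diagram, the non-parametric structural equations take the standard form:

```latex
C = f_C(U_C), \qquad I = f_I(C, U_I), \qquad Y = f_Y(C, I, U_Y)
```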

6 The Roadmap in Action
2. Causal question (same causal diagram)
C: Baseline HIV risk factors (HIV prevalence, circumcision prevalence, trading center present)
I: HIV prevention package
Y: 3-year HIV cumulative incidence

7 The Roadmap in Action
2. Causal question: intervene to set Prevention Package (I=1)
Y^{I=1}: counterfactual cumulative incidence with the intervention

8 The Roadmap in Action
2. Causal question: intervene to set Prevention Package (I=0)
Y^{I=0}: counterfactual cumulative incidence without the intervention

9 The Roadmap in Action
2. Causal question
Target causal parameter: average treatment effect
Difference between average counterfactual 3-year HIV incidence if all communities had received the prevention package versus if none had:
E(Y^{I=1}) - E(Y^{I=0})

10 The Roadmap in Action
3. Observed data
100 randomly sampled communities; on each we measure:
C: baseline confounders
I: receipt of the prevention package
Y: 3-year cumulative incidence
Observe 100 independent and identically distributed copies of O = (C, I, Y)

11 The Roadmap in Action
4. Identification: do we know enough to translate our causal question to a statistical question?
(Causal diagram with unmeasured factors U)

12 The Roadmap in Action
4. Identification: do we know enough to translate our causal question to a statistical question? NO.

13 The Roadmap in Action
4. Identification: convenience assumptions
Under what additional assumptions can we translate our causal question to a statistical question?
Assumption: no unmeasured confounding

14 The Roadmap in Action
5. Statistical model and estimand
1. Statistical model: absent any other knowledge, the observed data O = (C, I, Y) might have any distribution, so we use a non-parametric statistical model.
2. Statistical quantity to estimate (estimand): under our causal model plus assumptions, the average treatment effect equals the observed difference in mean outcome within confounder strata, standardized to the distribution of confounders.
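In symbols (standard G-computation notation, not written out on the slide), the estimand is:

```latex
\psi = E_C\big[\, E(Y \mid I=1, C) - E(Y \mid I=0, C) \,\big]
```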

15 The Roadmap in Action
(Roadmap diagram repeated: 1. Causal model, 2. Question, 3. Data, 4. Identified?, 5. Estimand, 6. Estimator, 7. Interpretation)
Causal Reasoning = Science!

16 Causality, Statistics, and Science
(Roadmap diagram repeated)
Statistics = Science!

17 The Roadmap in Action
6. Estimation
Choosing an estimator is a statistical problem: for a given model and estimand there are many choices, and one estimator is not "more causal" than another.
Estimators do have important differences in their statistical properties, even in point-treatment settings.
All methods are NOT created equal: simpler/more familiar is NOT necessarily better!
Complexity can be used to improve science, not just for intimidation. Statistics is not cooking!
As scientists you should care about getting your statistics right.

18 What do we want from an estimator?
Low bias: on average, close to the truth
Low variance: precise
Reliable inference: signal vs. noise (confidence interval coverage, Type I error control)

Truth: RR = 1.5. Does the 95% CI contain 1.5? (Repetitions of the experiment:)
YYYYYYYYYYYYNYYYYYYY
YYYYNYYYYYYYYYYNYYYY
YYYYYYYYYNYYYYYYYYYY
YYYYYYYYYYYYYYYYYYYY
YYYNYYYYYYYYYYYYYYYY

Truth: RR = 1.0. Reject the null (p < 0.05)? (Repetitions of the experiment:)
NNNYNNNNNNNNNNNNNNNN
NNNNNNNNNNNNYNNNNNNN
NNNNNNYNNNNNNNNNNYNN
NNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNYNNNNNNNNN

19 What’s wrong with standard parametric regression approaches? No single regression coefficient = estimand – Ex: Time-dependent confounding – Ex. Marginal treatment effect if true outcome regression is non linear (eg logistic regression) We don’t know enough to specify them correctly – Performance depends on respecting the limits of our knowledge! – Misspecification-> bias and misleading inference Wrong answers and wrong conclusions Increasing sample size makes things worse

20 Regression Misspecification -> Bias and Wrong Inference

Scenario: intervention reduces HIV incidence by 2.6%

Performance metric             | Correctly specified regression | Misspecified regression (linear main terms)
Bias (mean estimate - truth)   | <0.01%                         | -1.7%
Variance                       | 0.003%                         | 0.02%
95% CI coverage                | 95%                            | 68.2%

Scenario: intervention has no effect

Type I error control (% of experiments concluding an effect) | 5% | 33.2%
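The mechanism behind the misspecified column can be illustrated with a toy pure-Python simulation (a hypothetical data-generating process, not the one behind the table above): omitting an interaction from a main-terms fit biases the coefficient on the intervention, while simple stratified standardization recovers the truth.

```python
import random

random.seed(1)

# Hypothetical data-generating process: binary confounder C, exposure I
# with P(I=1|C) = 0.5 + 0.4*C, and E[Y|I,C] = 0.1 + 0.1*I + 0.2*C + 0.4*I*C,
# so the true average treatment effect is 0.1 + 0.4*P(C=1) = 0.30.
n = 5000
C = [1 if random.random() < 0.5 else 0 for _ in range(n)]
I = [1 if random.random() < 0.5 + 0.4 * c else 0 for c in C]
Y = [1 if random.random() < 0.1 + 0.1 * i + 0.2 * c + 0.4 * i * c else 0
     for i, c in zip(I, C)]

def solve(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

# Misspecified main-terms OLS fit: Y ~ b0 + b1*I + b2*C (interaction omitted).
X = [[1.0, float(i), float(c)] for i, c in zip(I, C)]
XtX = [[sum(r[j] * r[k] for r in X) for k in range(3)] for j in range(3)]
XtY = [sum(r[j] * y for r, y in zip(X, Y)) for j in range(3)]
b0, b1, b2 = solve(XtX, XtY)

# Correct analysis: stratum-specific differences, standardized over C.
def stratum_mean(i_val, c_val):
    ys = [y for y, i, c in zip(Y, I, C) if i == i_val and c == c_val]
    return sum(ys) / len(ys)

p1 = sum(C) / n
ate_gcomp = ((1 - p1) * (stratum_mean(1, 0) - stratum_mean(0, 0))
             + p1 * (stratum_mean(1, 1) - stratum_mean(0, 1)))

print(f"main-terms coefficient on I (biased): {b1:.3f}")
print(f"standardized estimate: {ate_gcomp:.3f}  (truth: 0.300)")
```

The biased coefficient lands near 0.21 rather than 0.30 because, with the interaction omitted, OLS averages the stratum effects with weights proportional to the conditional variance of I, not the confounder distribution.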

21 SEARCH HIV Prevention Trial
- www.searchendaids.com
- 89% baseline population testing coverage
Determinants of baseline HIV testing uptake?
- Without causal assumptions: adjusted predictors
- Many covariates: region, age, gender, occupation, marital status, education, wealth, mobility
Parametric regression... how to specify? Logistic? Poisson? Which variables? Which interactions?
Chamie et al, Lancet HIV, 2016

22 Why not just choose based on the data?
Don't make any a priori assumptions; just choose the estimator that works best with your data.
Good idea! That is just what you should do, but... be careful:
- Statistical inference relies on having a well-defined experiment
- An estimator is an algorithm (i.e. a computer program)

23 Dangers of ad hoc analytic decisions
Run a bunch of regressions and choose the one with:
1. Smallest p-values?
2. Results that make the most sense?
-> Misleading (under-)estimate of uncertainty: not accounting for the model selection process
-> Bias: humans are good at creating narratives from complexity, with a tendency to confirm what we expect to find

24 Herd of p-values spotted approaching significance! “It was amazing! The α-male, a majestic 0.06, was seen slowly but surely approaching significance, followed closely by a small group of marginal p-values…After seeing p-values approaching significance, what we really want to observe is p-values retreating from significance. But that kind of behavior has never been reported” Collectively Unconscious; Nov 3, 2012

25 Dangers of ad hoc analytic decisions
Under-estimation of uncertainty and bias: as long as there is "art" in statistics, we will continue to make a lot of wrong inferences.

Truth: RR = 1.5. Does the 95% CI contain 1.5? (Repetitions of the experiment:)
YYYNNNNNYYYYNYYYNNNY
YYYYNYYYYYYYYYYNYYYY
YYNNYYYYYNYYYYYYYYYY
YYYYYYYYYYYYYNNNNNYY
YYYNYYYNNNNNNNYYYYYY

Truth: RR = 1.0. Reject the null (p < 0.05)? (Repetitions of the experiment:)
NNNYNNNNNNNNYYYYYNNNN
NNNYYYYYYNNNNYNNNNNNN
NNNNNNYNNYNNNNNNNYNNY
NYYYYYNNNNNNNYYYYYNNN
NNYYYYNNNNYNNNNNNNNNN

26 Does this mean we must abandon flexibility?
Rigidly pre-specify a single parametric regression model and stick to it, even if the data tell us it makes no sense?
-> More bad answers/misleading inference
To make good decisions we must learn from our data in a flexible way, more true than ever in the Big Data era.
However, we must do so in a way that preserves our ability to draw valid inferences.

27 Use the data to choose... but in a rigorous (supervised) way
1. Pre-specify candidate estimators
- Ex: different parametric regression models
- Ex: machine learning methods (neural networks, random forests, LASSO, etc.)
2. Pre-specify a rigorous, automated way to choose between them
With these ingredients, our estimator includes the selection process.

28 Super Learning
"Competition" of algorithms
- Parametric models
- Data-adaptive (ex. random forests, neural nets)
Best "team" wins
- Convex combination of algorithms
Performance judged on independent data
- V-fold cross-validation (internal data splits)
van der Laan, Polley, 2007

29 V-fold Cross Validation
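The cross-validated "competition" on the two slides above can be sketched in pure Python. This is a minimal toy illustration with two hypothetical candidate learners; the real SuperLearner R package handles arbitrary algorithm libraries and fits the convex weights by constrained regression rather than a grid search.

```python
import random

random.seed(2)

# Toy data (hypothetical): binary covariate X, continuous outcome Y
# with E[Y|X] = 1 + 2*X plus Gaussian noise.
n = 500
X = [random.randint(0, 1) for _ in range(n)]
Y = [1 + 2 * x + random.gauss(0, 1) for x in X]

# Two candidate learners; each "fit" returns a prediction function.
def learn_marginal(xs, ys):
    m = sum(ys) / len(ys)          # ignores X entirely
    return lambda x: m

def learn_stratified(xs, ys):
    means = {}
    for v in (0, 1):
        sub = [y for x, y in zip(xs, ys) if x == v]
        means[v] = sum(sub) / len(sub) if sub else sum(ys) / len(ys)
    return lambda x: means[x]      # mean within each stratum of X

learners = [learn_marginal, learn_stratified]

# V-fold cross-validation: every observation gets an out-of-fold
# prediction from every candidate learner.
V = 5
folds = [i % V for i in range(n)]
cv_preds = [[0.0] * n for _ in learners]
for v in range(V):
    train = [i for i in range(n) if folds[i] != v]
    test = [i for i in range(n) if folds[i] == v]
    for j, learner in enumerate(learners):
        f = learner([X[i] for i in train], [Y[i] for i in train])
        for i in test:
            cv_preds[j][i] = f(X[i])

# Convex combination: grid-search the weight on the stratified learner
# that minimizes the cross-validated mean squared error.
def cv_mse(alpha):
    return sum((alpha * cv_preds[1][i] + (1 - alpha) * cv_preds[0][i] - Y[i]) ** 2
               for i in range(n)) / n

best_alpha = min((a / 100 for a in range(101)), key=cv_mse)
print(f"best convex weight on the stratified learner: {best_alpha:.2f}")
```

Because performance is judged only on held-out folds, a learner that overfits its training folds cannot win the weights; here the correctly specified stratified learner receives nearly all the weight.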

30 Super Learner: using high-dimensional electronic adherence data (MEMS caps) to build an optimal predictor of virologic failure
Petersen et al, JAIDS 2015

31 Super Learner improves on alternative machine learning algorithms
Pirracchio et al, Lancet Resp Med 2015

32 Problem Solved? Not yet... optimization for the wrong target
Super Learner (and other machine learning methods) aim to do a good job at prediction/classification.
However, used in isolation they don't let us make reliable inferences about causally motivated parameters, because they are not targeting the question of interest.
-> Too much bias
-> Misleading confidence intervals/hypothesis tests

33 Targeted Maximum Likelihood Estimation
General statistical methodology
- For a range of causally and non-causally motivated statistical quantities
- Uses state-of-the-art machine learning (Super Learner)
- Updates the output in a targeted way: reduces bias and regains the statistical properties needed for reliable inference
Efficient (minimal asymptotic variance) if nuisance parameters are estimated well
Often nice robustness properties
van der Laan, Rose, Springer 2011

34 TMLE for Average Treatment Effect
1. Use Super Learner to flexibly estimate the outcome regression E(Y|I,C)
- Ex: conditional expectation of the outcome (HIV incidence) given the prevention package and covariates (HIV and circumcision prevalence, trading center)
2. Use Super Learner to flexibly estimate the propensity score P(I=1|C)
- Ex: probability of receiving the exposure (HIV prevention package) given covariates
van der Laan, Rose, Springer 2011

35 TMLE for our Simulated Example
3. Update the initial outcome regression fit
- Fit a standard logistic regression of the outcome on a single covariate: the initial outcome regression fit (step 1) is used as an offset, and the single "clever" covariate is a function of the inverse propensity score
- The MLE of the coefficient on the "clever" covariate is used to update the initial outcome regression
- The update results in targeted bias reduction for the estimand
van der Laan, Rose, Springer 2011
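For the average treatment effect, the "clever" covariate and logistic fluctuation of step 3 take the standard form (standard TMLE notation, not written out on the slide, with g(C) = P(I=1|C) the propensity score and Q-bar the initial outcome regression):

```latex
H(I, C) = \frac{I}{g(C)} - \frac{1 - I}{1 - g(C)},
\qquad
\operatorname{logit} \bar{Q}^{*}(I, C)
  = \operatorname{logit} \bar{Q}(I, C) + \hat{\varepsilon}\, H(I, C)
```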

36 TMLE for our Simulated Example
4. Using the updated outcome regression fit, generate a predicted outcome for each observation with and without the intervention
- Ex: predict HIV incidence for each community with the prevention intervention and without it
5. Take the difference of these predicted outcomes and average
van der Laan, Rose, Springer 2011
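Steps 1-5 can be sketched end to end in pure Python. This is a toy, hypothetical simulation (not the talk's own R code): stratified means stand in for Super Learner, and the initial outcome regression is deliberately misspecified so the targeting step has visible work to do.

```python
import math
import random

random.seed(3)
logit = lambda p: math.log(p / (1 - p))
expit = lambda x: 1 / (1 + math.exp(-x))

# Hypothetical data: P(C=1)=0.5, P(I=1|C)=0.3+0.4*C,
# E[Y|I,C]=0.2+0.3*I+0.2*C, so the true ATE is 0.30.
n = 2000
C = [1 if random.random() < 0.5 else 0 for _ in range(n)]
I = [1 if random.random() < 0.3 + 0.4 * c else 0 for c in C]
Y = [1 if random.random() < 0.2 + 0.3 * i + 0.2 * c else 0
     for i, c in zip(I, C)]

# Step 1 (stand-in for Super Learner): a deliberately MISSPECIFIED
# initial outcome regression -- the overall mean, ignoring I and C.
ybar = sum(Y) / n
Qbar = lambda i, c: ybar

# Step 2 (stand-in for Super Learner): correct propensity score,
# estimated by stratifying on the binary confounder C.
g_hat = {c: sum(ii for ii, cc in zip(I, C) if cc == c)
            / sum(1 for cc in C if cc == c) for c in (0, 1)}

# Step 3: clever covariate H = I/g(C) - (1-I)/(1-g(C)); fluctuate the
# initial fit on the logit scale, solving for epsilon by Newton steps.
H = [i / g_hat[c] - (1 - i) / (1 - g_hat[c]) for i, c in zip(I, C)]
offset = [logit(Qbar(i, c)) for i, c in zip(I, C)]
eps = 0.0
for _ in range(50):
    Q = [expit(o + eps * h) for o, h in zip(offset, H)]
    score = sum(h * (y - q) for h, y, q in zip(H, Y, Q))
    info = sum(h * h * q * (1 - q) for h, q in zip(H, Q))
    eps += score / info

# Steps 4-5: predict under I=1 and I=0 with the UPDATED fit,
# take the difference, and average over the sample.
def Qstar(i, c):
    h = i / g_hat[c] - (1 - i) / (1 - g_hat[c])
    return expit(logit(Qbar(i, c)) + eps * h)

psi_tmle = sum(Qstar(1, c) - Qstar(0, c) for c in C) / n
psi_naive = sum(Qbar(1, c) - Qbar(0, c) for c in C) / n  # = 0: no update

print(f"initial (misspecified) plug-in: {psi_naive:.3f}")
print(f"TMLE after targeting step:      {psi_tmle:.3f}  (truth: 0.300)")
```

Because the fluctuation solves the efficient influence curve equation, the updated plug-in inherits consistency from the correctly estimated propensity score even though the initial outcome regression ignores I and C entirely, which is the double robustness discussed on the next slide.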

37 TMLE is Double Robust and Efficient
Double robust: if either the outcome regression or the propensity score is estimated consistently, TMLE is consistent
- Converges to the truth as sample size -> ∞
- Two chances to get it right
- In practice: meaningful reduction in bias
Efficient: if both the outcome regression and the propensity score are estimated consistently at reasonable rates, TMLE has the lowest possible variance
- Among reasonable estimators, as sample size -> ∞
- In practice: meaningful reduction in variance

38 Double Robustness: Simulated Example
(Figure: sampling distributions of the TMLE, outcome regression, and IPTW estimators when both the outcome regression and the propensity score are correctly specified)

39 Double Robustness: Simulated Example
(Figure: sampling distributions of the TMLE, outcome regression, and IPTW estimators when the outcome regression is misspecified and when the propensity score is misspecified)

40 Does it matter in practice? Not always, but sometimes
- Estimates from the standard approach and TMLE are sometimes very similar
- But sometimes, estimates and inference can change
Example: HIV testing uptake in the SEARCH Trial
- Goal: estimate the relative risk of not testing, adjusting for other covariates
1. Poisson regression
2. TMLE

41 Ex. HIV Testing Uptake in SEARCH
TMLE: RR 0.84 (95% CI 0.80, 0.89): adults with a primary education were more likely to test than those with less than a primary education
Poisson: RR 0.99 (95% CI 0.94, 1.05): no difference

42 TMLE: Beyond a simple single time point...
TMLE is a general method with broad applications:
- Longitudinal problems with time-dependent confounding
- Parameters of (longitudinal) marginal structural models
- Dynamic regimes (personalized treatment/adaptive strategies)
- Informative censoring
- RCTs (including SMART designs), for improved efficiency
Estimands, estimators, and implementation differ.
R packages implementing all of the above are available (ltmle, tmle, SuperLearner).

43 Summary
Super Learner
- A method to combine, and thereby improve on, existing machine learning algorithms and parametric regressions for prediction/classification problems
TMLE
- One of several double robust estimators (ex: A-IPW)
DR estimators share a number of attractive properties
- Efficient (under assumptions)
- Can reduce bias and variance
- Naturally integrate machine learning (Super Learner) for the outcome regression and propensity score to improve performance

44 PS: Should applied researchers care about statistics, or just focus on the causal piece?
"These are just technical details, don't worry about it."
"I am worried: I know that a poor choice of estimator -> incorrect inferences and lower power -> bad science."

45 PS: Should applied researchers care about statistics, or just focus on the causal piece?
"This is too hard, you won't understand it."
"Try me! I want to understand the properties and the tradeoffs."

46 PS: Should applied researchers care about statistics, or just focus on the causal piece?
"Trust me."
"Work with me."

47 Additional Resources
R code for the simulated example: works.bepress.com/maya_petersen/71/
R packages ltmle, tmle and SuperLearner: cran.r-project.org/web/packages/
Course with introductory materials on these methods (lectures, R labs, assignments): www.ucbbiostat.com

48 PIs: Diane Havlir, Moses Kamya, Maya Petersen
Statistician: Laura Balzer
Vice-Chair: Edwin Charlebois
Virologist: Teri Liegler
KEMRI: Elizabeth Bukusi
KEMRI/UCSF: Craig Cohen
UCSF: Tamara Clark, Gabe Chamie, Vivek Jain, Carol Camlin, Starley Shade
UC Berkeley: Mark van der Laan, Wenjing Zheng
David Bangsberg (MGH)
The MACH-14 Collaboration: Mark van der Laan, Joshua Schwab, Laura Balzer

