Improving Overlap Farrokh Alemi, Ph.D.


Improving Overlap Farrokh Alemi, Ph.D. This brief presentation, organized by Dr. Alemi, reviews methods for improving overlap between cases and controls in stratified covariate balancing.

Overlap The problem with stratification is that as the number of covariates increases, fewer and fewer cases and controls match. The number of cases per stratum decreases, and some combinations of covariates become quite rare. In these circumstances, a large portion of the cases may have no matching controls and therefore go unused, which reduces the generalizability of the findings.

$$\%\,\text{Overlap} = \frac{\sum_c M_c Q_c}{\sum_c M_c Q_c + \sum_c (1 - M_c Q_c)}\,\%$$

Percent overlap reports the percent of cases that are matched to controls. In this equation, $c$ is an index to cases, and $M_c$ is 1 when the case is matched to a control and 0 otherwise.

$$\%\,\text{Overlap} = \frac{\sum_c M_c Q_c}{\sum_c M_c Q_c + \sum_c (1 - M_c Q_c)}\,\%$$

The parameter $Q_c$ indicates the percent of covariates in the case that were matched to the controls. If all covariates in the case were matched, then $Q_c = 100\%$ (that is, 1 when written as a proportion, which keeps the term $1 - M_c Q_c$ between 0 and 1).
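As a rough illustration, the percent overlap calculation might be sketched in Python as follows. The function name `percent_overlap`, the variable names, and the four example cases are hypothetical, and the sketch assumes that $Q_c$ is stored as a proportion between 0 and 1 rather than as a number between 0 and 100.

```python
def percent_overlap(matched, share_matched):
    """Percent overlap across cases.

    matched[c] is M_c: 1 if case c is matched to a control, 0 otherwise.
    share_matched[c] is Q_c: the share of the case's covariates that were
    matched, written as a proportion between 0 and 1 (an assumption).
    """
    if len(matched) != len(share_matched):
        raise ValueError("matched and share_matched must have the same length")
    if not matched:
        raise ValueError("at least one case is required")
    numerator = sum(m * q for m, q in zip(matched, share_matched))
    unmatched = sum(1 - m * q for m, q in zip(matched, share_matched))
    return 100.0 * numerator / (numerator + unmatched)


# Hypothetical data: four cases. The third has no match, and the fourth
# is matched on 2 of its 3 covariates.
M = [1, 1, 0, 1]
Q = [1.0, 1.0, 0.0, 2 / 3]
overlap = percent_overlap(M, Q)
print(round(overlap, 1))
```

Under this reading the two sums in the denominator add up to the number of cases, so the measure is the average of $M_c Q_c$ across cases, expressed as a percent.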

Unmatched Controls are Ignored Note that the percent of overlap does not depend on controls that were not matched. The intent of the analysis is to examine the effect of treatment on the treated patients; thus, what matters is matching to the cases. Unmatched controls do not affect the estimated treatment effect and can therefore be ignored.

Better than 80% When the percent of overlap is low (e.g., lower than 80%), the findings cannot be generalized, because many cases are not matched to controls.

Reduce Stratification & Increase Overlap The remainder of this lecture discusses how one can reduce stratification, increase overlap between cases and controls, and improve the generalizability of stratified covariate balancing.

Three Strategies: 1. Expected Values 2. Parents in Markov Blanket 3. Synthetic Controls We discuss three strategies. The first is to modify the case so that a partial match can be made. This approach calculates the outcome for the modified case using expected values. The second approach is to use the parents in the Markov Blanket of treatment to identify irrelevant covariates and drop them from the analysis. The third approach is to add new synthetic controls.

1. Partial Match through Expected Values The first method alters the case and its outcome to levels that can be matched to the controls. In this method, the percent of overlap is increased by partially matching the covariates of the unmatched cases. The unmatched case is reduced to the largest portion of the case that matches at least one control. The matched covariates are referred to as the shared component. The outcome for the altered case is set to the expected outcome for all cases that share this component.

Partial Match: Expected Values [Slide diagram: a case (male, 70 years, unable to walk) and a control, partially matched with the outcome taken over ages.] For example, suppose a 70-year-old male patient with a walking disability has no match among the controls. The closest match we can find is male patients with walking disabilities. The patient's age is then dropped from the analysis. The outcome for the new case is the average for all male disabled cases, which includes individuals in different age groups. This new case is matched to the male patients unable to walk among the controls.

Partial Match: Expected Values The outcome is changed to the expected value associated with the shared component. The percent of overlap is improved in these cases by the portion of covariates matched, in this example by 2 out of 3.
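The partial-match steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function names, the covariate names, and the patient data are all hypothetical, and when several subsets of the same size match a control, the sketch simply returns the first one it finds.

```python
from itertools import combinations
from statistics import mean


def largest_shared_component(case, controls):
    """Return the largest subset of the case's covariates matched by a control."""
    names = list(case)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if any(all(ctrl[n] == case[n] for n in subset) for ctrl in controls):
                return subset
    return ()


def expected_outcome(component, case, cases, outcomes):
    """Average outcome over all cases that share the component with the case."""
    values = [y for other, y in zip(cases, outcomes)
              if all(other[n] == case[n] for n in component)]
    return mean(values)


# Hypothetical cases (treated patients) and their outcomes.
cases = [
    {"male": 1, "age": 70, "unable_to_walk": 1},
    {"male": 1, "age": 80, "unable_to_walk": 1},
    {"male": 1, "age": 60, "unable_to_walk": 1},
    {"male": 0, "age": 70, "unable_to_walk": 0},
]
outcomes = [0.5, 0.3, 0.7, 0.9]
# Hypothetical controls: none is a 70-year-old male who is unable to walk.
controls = [
    {"male": 1, "age": 60, "unable_to_walk": 1},
    {"male": 0, "age": 70, "unable_to_walk": 0},
]

case = cases[0]
component = largest_shared_component(case, controls)
altered_outcome = expected_outcome(component, case, cases, outcomes)
share_matched = len(component) / len(case)  # Q_c for this case
print(component, round(altered_outcome, 2), round(share_matched, 2))
```

Here age is dropped, the altered case keeps the shared component of being male and unable to walk, and its outcome becomes the average over the three male cases who are unable to walk.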

2. Partial Match through Parents in Markov Blanket The second method of improving overlap is to drop irrelevant variables by identifying the parents in the Markov Blanket of treatment.

Partial Match: Parents in Markov Blanket The Markov Blanket of treatment is a set of covariates that blocks the effect of other covariates on treatment. The Markov Blanket includes parents, children, and co-parents, which are the direct causes, the direct effects, and the other direct causes of those effects, respectively.

Partial Match: Parents in Markov Blanket Parents in the Markov Blanket are identified by focusing the analysis on independent variables that occur prior to treatment. In electronic health records, healthcare events are time stamped, so it is relatively easy to identify what occurred prior to the treatment. Obviously, age, gender, and demographics are set at birth, so they occur prior to treatment. Medical history and comorbidities also occur prior to treatment.
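A minimal Python sketch of the time-stamp filter might look like the following. The event names, dates, and the function `events_before_treatment` are hypothetical, and the record layout (a list of name and date pairs) is an assumption made for illustration only.

```python
from datetime import date


def events_before_treatment(events, treatment_date):
    """Keep the names of events recorded strictly before the treatment date."""
    return sorted({name for name, when in events if when < treatment_date})


# Hypothetical time-stamped record for one patient.
events = [
    ("diabetes", date(2015, 3, 1)),
    ("hip_fracture", date(2016, 1, 10)),
    ("wound_infection", date(2016, 2, 20)),  # recorded after treatment
]
treatment_date = date(2016, 2, 1)

prior = events_before_treatment(events, treatment_date)
print(prior)
```

Only the events that precede treatment remain as candidate parents; the later event is left out because it could lie on the path from treatment to outcome.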

LASSO Regression: Parents in Markov Blanket Many algorithms exist for identifying the parents in the Markov Blanket; here we focus on one that uses LASSO regression. LASSO regression is a type of regression that penalizes the size of the coefficients, so that the variables retained are limited to those that have a large effect size.

LASSO Regression: Parents in Markov Blanket

$$T_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_r X_{ri} + e_i$$

Prior to conducting the regression, we exclude covariates that occur after treatment. This step removes covariates on the causal path from treatment to outcome. For example, complications of treatment are excluded from the list of independent variables.

LASSO Regression: Parents in Markov Blanket Regress Treatment on Prior Events Next, the treatment variable is regressed on independent variables that occur before the treatment, for example, patient demographics, medical history, or comorbidities.

LASSO Regression: Parents in Markov Blanket Significant & Large Main Effects Parents in the Markov Blanket consist of covariates that (a) have a statistically significant impact on treatment, the response variable in this regression, and (b) have an effect size greater than a pre-set cutoff value. Stratification focuses on the parents in the Markov Blanket and ignores other covariates. The procedure allows for matching on relevant covariates and ignoring irrelevant variables. It has been shown to reduce the number of covariates by a factor of 3 to 4,000, depending on the number of variables in the initial data.
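To make the selection step concrete, here is a small Python sketch of a LASSO regression of treatment on prior covariates, fitted by coordinate descent. It is an illustration under stated assumptions: the data, the penalty of 0.1, the cutoff of 0.1, and all function names are hypothetical, and the sketch applies only the penalty and the effect-size cutoff; it does not test statistical significance.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(X, t, penalty, sweeps=100):
    """Coordinate descent for (1/2n) * squared error + penalty * sum of |beta_j|."""
    n, r = len(X), len(X[0])
    beta0, beta = sum(t) / n, [0.0] * r
    for _ in range(sweeps):
        for j in range(r):
            rho = 0.0  # correlation of covariate j with the partial residual
            z = 0.0    # sum of squares of covariate j
            for i in range(n):
                fit_without_j = beta0 + sum(
                    beta[k] * X[i][k] for k in range(r) if k != j)
                rho += X[i][j] * (t[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (z / n) if z > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        beta0 = sum(t[i] - sum(beta[k] * X[i][k] for k in range(r))
                    for i in range(n)) / n
    return beta0, beta


# Hypothetical pre-treatment covariates (male, diabetes) and treatment T.
names = ["male", "diabetes"]
X = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
T = [1, 1, 1, 1, 0, 0, 0, 1]

beta0, beta = lasso_fit(X, T, penalty=0.1)
cutoff = 0.1  # hypothetical pre-set effect-size cutoff
parents = [name for name, b in zip(names, beta) if abs(b) > cutoff]
print(parents, [round(b, 3) for b in beta])
```

In this toy data the penalty shrinks the diabetes coefficient to zero, so only `male` is kept as a candidate parent, and stratification would proceed on that covariate alone.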

3. Add Synthetic Controls The third method of improving overlap is to add synthetic controls. The approach is similar in idea to oversampling rare events. It uses the existing sample to estimate what might have occurred for the missing controls.

Add Synthetic Cases: Create Model of Outcomes for Control Patients The analysis is not done on all of the data. It focuses only on control patients, because the missing controls must reflect the pattern of outcomes among the controls. The missing outcome for the control is estimated from a model of the data, usually using regression or the 2 nearest cases. In these models, it is important to take into account interactions among the covariates.

Add Synthetic Cases: Predict Outcome for Features of Unmatched Case The model is used to predict the outcome for the missing control by evaluating the model at the covariates of the unmatched case.

Add Synthetic Cases

$$\text{Survival} = -1 + (1 - 0.8\,\text{Male})(1 - 0.4\,\text{Unable to walk})(1 - 0.2\,\text{Unable to bathe})(1 - 0.9\,\text{Unable to toilet})(1 - 0.3\,\text{Above 74 years old})$$

For example, suppose that a 70-year-old male resident who is unable to walk is an unmatched case. We need a control that would match this case. Also assume that, for control patients, the outcome (survival rate) is predicted by the equation shown here. Evaluating the equation at the covariates of the unmatched case gives the predicted outcome for the missing control. Now that the predicted outcome is available, the synthetic control can be added to the controls and the analysis repeated, with the synthetic controls matching the previously unmatched cases.
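The synthetic-control steps can be sketched in Python as follows. Rather than the survival equation, this sketch uses the 2-nearest-cases option mentioned earlier, with a simple count of differing covariates as the distance. The distance measure, the function names, and the patient data are all assumptions made for illustration.

```python
from statistics import mean


def hamming(a, b):
    """Number of covariates on which two patients differ."""
    return sum(a[name] != b[name] for name in a)


def synthetic_control(case, controls, outcomes, k=2):
    """Build a control with the case's covariates and a predicted outcome.

    The prediction uses control patients only: it is the average outcome
    of the k controls nearest to the unmatched case.
    """
    ranked = sorted(range(len(controls)), key=lambda i: hamming(case, controls[i]))
    predicted = mean(outcomes[i] for i in ranked[:k])
    return dict(case), predicted


# Hypothetical unmatched case: a 70-year-old male who is unable to walk.
case = {"male": 1, "above_74": 0, "unable_to_walk": 1}
# Hypothetical controls and their observed survival rates.
controls = [
    {"male": 1, "above_74": 1, "unable_to_walk": 1},
    {"male": 0, "above_74": 0, "unable_to_walk": 1},
    {"male": 0, "above_74": 1, "unable_to_walk": 0},
]
outcomes = [0.4, 0.6, 0.9]

new_control, predicted = synthetic_control(case, controls, outcomes)
controls.append(new_control)
outcomes.append(predicted)
print(new_control, round(predicted, 2))
```

After the synthetic control is appended, the previously unmatched case has an exact match among the controls, and the stratified analysis can be repeated.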

Several Methods Exist for Improving Overlap between Cases and Controls The take-home message is that different methods exist for improving the match between cases and controls.