Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)

Presentation transcript:

Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)

Outline
– Background and motivation
– Developing and evaluating prediction rules based on a set of markers for
  – a continuous or binary outcome
  – a censored event time outcome
– Evaluating the incremental value of a biomarker over
  – the entire population
  – various sub-populations
– Incorporating the patient-level precision of the prediction
  – Prediction intervals/sets
– Remarks

Background and Motivation
– Diagnosis, prognosis, treatment
– Personalized medicine: using information about a person’s biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease
– Important step: develop prediction rules that can accurately predict health outcome or diagnosis of clinical phenotype

Background and Motivation
– Predictor Z: subject characteristics, biomarkers, genetic markers
– Outcome Y: disease status, time to event, treatment response
– Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.
– Developing prediction rules involves
  – identifying important predictors
  – evaluating the accuracy of the prediction
  – evaluating the incremental value of new markers

Background and Motivation: AIDS Clinical Trial ACTG320
– Study objective: to compare
  – 3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir
  – 2-drug regimen (n=577): Zidovudine + Lamivudine
– Identify biomarkers for predicting treatment response
– Predictor Z: Age, CD4 week 0, ΔCD4 week 8, RNA week 0, ΔRNA week 8
– Outcome Y: ΔCD4 week 24
– How well can we predict the treatment response? Is RNA needed?

Background and Motivation
– Regression analysis: association between the predictors and ΔCD4 week 24
– Are the coefficients for RNA significant? Is RNA needed?

Background and Motivation: AIDS Clinical Trial
– Regression coefficients (Estimate, SE, P-value) for Age, RNA week 0, ΔRNA week 8, CD4 week 0, ΔCD4 week 8
– Coefficient for ΔRNA week 8 highly significant → RNA needed for a more precise prediction of responses?

Background and Motivation
– Y = ΔCD4 week 24, Z = predictors
– Is RNA needed? Does adding RNA improve the prediction?
– Prediction procedure
  1. Prediction rule $\hat{Y}(Z)$: based on regression models
  2. The distance between $\hat{Y}(Z)$ and Y?

Developing Prediction Rules Based on a Set of Markers
– Regression approach to approximate Y | Z
  – Continuous or binary outcome: generalized linear regression
  – Survival outcome: proportional hazards model, time-specific prediction models
– Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!

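Developing and Evaluating Prediction Rules
– Predict Y with $\hat{Y}(Z)$ based on the prediction model
– Evaluate the performance of the prediction by the average “distance” between $\hat{Y}(Z)$ and Y
– The utility or cost of predicting Y as $\hat{Y}$ is $d(Y, \hat{Y})$
– The average “distance” is $D = E\{d(Y, \hat{Y}(Z))\}$
– Examples:
  – Absolute prediction error: $D = E|Y - \hat{Y}(Z)|$
  – Total “cost” of risk stratification: cost table with rows Y = 0 and Y = 1 and entries (as extracted) $d_{01}$, $d_{02}$, $d_{03}$, $d_{11}$, $d_{31}$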
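
As a rough illustration of the average “distance” idea, the Python sketch below computes the sample average of d(Y, Ŷ) for two choices of d: the absolute error and a cost table over risk strata. The function names, the toy data and the cost values are made up for illustration and do not come from the slides.

```python
def average_distance(y, y_hat, d):
    """Sample average of d(y_i, y_hat_i)."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum(d(a, b) for a, b in zip(y, y_hat)) / len(y)


def absolute_error(y, y_hat):
    """d(y, y_hat) = |y - y_hat|."""
    return abs(y - y_hat)


# Hypothetical cost table: COST[y][stratum] is the cost of assigning a
# subject with true outcome y to risk stratum 1, 2 or 3.
COST = {0: {1: 0.0, 2: 1.0, 3: 2.0},
        1: {1: 5.0, 2: 1.0, 3: 0.0}}


def stratification_cost(y, stratum):
    """d(y, stratum) looked up in the hypothetical cost table."""
    return COST[y][stratum]


# Continuous outcome with absolute prediction error.
y_cont = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 40.0]
abs_err = average_distance(y_cont, y_pred, absolute_error)

# Binary outcome with three risk strata and the cost table above.
y_bin = [0, 0, 1, 1]
strata = [1, 3, 3, 2]
avg_cost = average_distance(y_bin, strata, stratification_cost)

print(abs_err, avg_cost)
```

Swapping in a different `d` changes the performance measure without touching the prediction rule, which is why the choice of distance deserves as much care as the model.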

Evaluating and Comparing Prediction Rules
– The performance of the prediction model/rule with $\hat{Y}(Z)$ can be estimated by $\hat{D} = n^{-1} \sum_{i=1}^{n} d(Y_i, \hat{Y}(Z_i))$
– Prediction model/rule comparison: prediction with $E(Y \mid Z) = g_1(a'Z)$ vs $E(Y \mid W) = g_2(b'W)$
– Compare the two models/rules by comparing $\hat{D}_1$ and $\hat{D}_2$
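
A minimal sketch of such a comparison, assuming two single-predictor linear working models fitted by least squares and the absolute prediction error as the distance. The data, the helper names `fit_line` and `apparent_abs_error`, and the choice of one predictor per model are illustrative assumptions.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def apparent_abs_error(x, y):
    """Average |y - fitted value|, evaluated on the data used for fitting."""
    a, b = fit_line(x, y)
    return sum(abs(yi - (a + b * xi)) for xi, yi in zip(x, y)) / len(y)


# Toy data: z tracks y closely, w only loosely.
y = [2.1, 3.9, 6.2, 7.8, 10.1]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [3.0, 1.0, 4.0, 1.0, 5.0]

d1_hat = apparent_abs_error(z, y)   # rule based on Z
d2_hat = apparent_abs_error(w, y)   # rule based on W
delta_hat = d1_hat - d2_hat         # negative when the Z-based rule is better

print(d1_hat, d2_hat, delta_hat)
```

These are apparent errors, computed on the same data used to fit each rule, so they tend to be optimistic.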

Variability in the Estimated Prediction Performance Measures
– Variability in the prediction errors: an estimate of 50 with SE = 1? With SE = 50?
– Inference about D and $\Delta = D_1 - D_2$
– Confidence intervals based on large-sample approximations to the distribution of the estimators
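
To make the "SE = 1 or SE = 50" question concrete, here is a small sketch of a normal-approximation (Wald) interval. It is a simplification: the standard error in the second half treats the prediction rule as fixed and ignores the variability from estimating it, which the large-sample theory referred to on the slide is meant to capture. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def wald_interval(estimate, se, level=0.95):
    """Normal-approximation interval: estimate +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def mean_error_with_se(errors):
    """Mean of per-subject errors and its naive standard error."""
    return mean(errors), stdev(errors) / sqrt(len(errors))


# Same point estimate, very different conclusions.
narrow = wald_interval(50.0, 1.0)    # roughly 48 to 52
wide = wald_interval(50.0, 50.0)     # spans negative values to well above 100

# Naive interval from per-subject absolute errors of a fixed rule.
errors = [1.0, 2.0, 3.0, 4.0, 5.0]
d_hat, se_hat = mean_error_with_se(errors)
ci = wald_interval(d_hat, se_hat)

print(narrow, wide, ci)
```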

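Bias Correction
– Bias issue in the apparent-error-type estimators
– Bias correction via cross-validation:
  – Partition the data into a training set $T_k$ and a validation set $V_k$
  – For each partition: obtain the fitted prediction rule based on observations in $T_k$, then obtain the prediction error estimate based on observations in $V_k$
  – Obtain the cross-validated estimator by combining the estimates across partitions
– The apparent and cross-validated estimators have the same limiting distribution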
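
The cross-validation steps can be sketched as follows, assuming K-fold partitions, a single-predictor linear working model and the absolute prediction error. The pooled average over held-out subjects, the fold construction and all names are illustrative choices, not necessarily the authors' exact procedure.

```python
import random


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cv_abs_error(x, y, k=10, seed=0):
    """K-fold cross-validated absolute prediction error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]            # validation sets V_k
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]  # training set T_k
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        total += sum(abs(y[i] - (a + b * x[i])) for i in fold)
    return total / n                                 # pooled over held-out subjects


x = [float(i) for i in range(20)]
y_exact = [2.0 * xi + 1.0 for xi in x]               # no noise
rng = random.Random(1)
y_noisy = [yi + rng.gauss(0.0, 1.0) for yi in y_exact]

cv_exact = cv_abs_error(x, y_exact, k=5)
cv_noisy = cv_abs_error(x, y_noisy, k=5)
print(cv_exact, cv_noisy)
```

Each subject's error is computed from a rule that never saw that subject, which is what removes the optimism of the apparent error.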

Example: AIDS Clinical Trial
– Objective: identify biomarkers to predict the treatment response
– Outcome: Y = ΔCD4 week 24
– Predictors Z: Age, CD4 week 0, ΔCD4 week 8, RNA week 0, ΔRNA week 8
– Working model: $E(Y \mid Z) = \beta'Z$

Example: AIDS Clinical Trial, Incremental Value of RNA
– Estimates (* : standard error)
  – Apparent: full model 51 (2.7*), w/o RNA 52 (2.7)
  – 10-fold CV: full model 52, w/o RNA 53
  – 2n/3 CV: 53
– 95% C.I.
  – Apparent: full model [46, 56], w/o RNA [47, 57]
  – 10-fold CV: full model [47, 57], w/o RNA [48, 58]
  – 2n/3 CV: [48, 58]
– Gain due to RNA: -0.61 (0.61); intervals [-2.0, 0.4] and [-1.5, 0.9]

Incremental Value of RNA within Various Sub-populations

Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
– Prognostic importance of left ventricular dysfunction
  – Thune et al (2005): Diamond study
  – TRACE study (Kober et al 2005, NEJM)
– Designed to determine whether patients with left ventricular dysfunction soon after myocardial infarction benefit from long-term oral ACE inhibition
– Between 1990 and 1992, a total of 6676 patients with myocardial infarction were screened with echocardiography
– A total of 5921 subjects had available data

Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
– Routine markers include:
  – age
  – creatinine (CRE)
  – occurrence of heart failure (CHF)
  – history of diabetes (DIA)
  – history of hypertension (HYP)
  – cardiogenic shock after MI (KS)
– We are interested in evaluating the incremental value of wall motion index (WMI)

Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
– Regression coefficients (Est, SE, P-value) for Age, CRE, CHF, DIA, HYP, KS, WMI
– Does WMI improve the prediction of 5-year survival?

Population Average Incremental Value of WMI: Predicting 5-year Survival
– 5-year mortality rate = 42%
– OME, routine markers w/o WMI: 0.28
– OME, markers including WMI: 0.26
– Population gain attributed to WMI: 0.02

Figure: D1 and D2

Gain Due to WMI

 = 1  = 4  = 9 Gain Due to WMI with respect to D 

Example: Breast Cancer Gene Expression Study
– Objective: construct a new classifier that can accurately predict future disease outcome
– van’t Veer et al (2002) established a classifier based on a 70-gene profile: a good- or poor-prognosis signature based on the correlation with the previously determined average profile in tumors from patients with good prognosis
– Classify subjects as
  – good prognosis if gene score > cut-off
  – poor prognosis if gene score < cut-off
– van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves

Example: Breast Cancer Gene Expression Study
– Data consist of 295 subjects
– Outcome T: time to death
– Predictors: lymph-node status, estrogen receptor status, gene score
– We are interested in
  – constructing prediction rules for identifying subjects who would survive t years, Y = I(T ≥ t) = 1
  – evaluating the incremental value of the gene score
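
With a censored event time, the t-year survival indicator is not observed for everyone. The sketch below, which assumes that surviving t years means T ≥ t, shows which subjects have a known status. The function name and the toy records are hypothetical, and how the undetermined subjects are handled in estimation (for example by weighting) is outside this sketch.

```python
def t_year_status(observed_time, event, t):
    """Return the t-year survival indicator when it can be determined.

    observed_time: follow-up time (event time or censoring time)
    event: True if death was observed at observed_time, False if censored
    Returns 1 (survived to t), 0 (died before t) or None (censored before t).
    """
    if observed_time >= t:
        return 1          # still under observation at t, so T >= t
    if event:
        return 0          # death observed before t
    return None           # censored before t: status unknown


# (follow-up time in years, death observed?)
records = [(12.0, True), (3.0, True), (4.5, False), (10.0, False)]
statuses = [t_year_status(time, died, t=10.0) for time, died in records]
print(statuses)
```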

Example: Breast Cancer Data, Predicting 10-year Survival
Apparent error by model (the slide also has 10-fold CV and random CV columns):
– Naïve: 0.30 (0.031)
– Clinical only: 0.28 (0.033)
– Clinical + Gene Score: 0.25 (0.036)
– Van de Vijver: 0.35 (0.050)

Evaluating the Prediction Rule Based on Various Accuracy Measures
– For a future patient with $T^0$ and $Z^0$, we predict the t-year survival status
– Classification accuracy measures: sensitivity, specificity
– Prediction accuracy measures: positive and negative predictive values (PPV, NPV)
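
For reference, a minimal sketch of the four accuracy measures computed from fully observed binary outcomes and binary predictions. It ignores censoring, so it is only an illustration of the definitions, and the function name and data are made up.

```python
def accuracy_measures(y, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 outcomes and predictions.

    Assumes every cell count needed for a denominator is non-zero.
    """
    pairs = list(zip(y, y_pred))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # P(pred = 1 | Y = 1)
        "specificity": tn / (tn + fp),   # P(pred = 0 | Y = 0)
        "ppv": tp / (tp + fp),           # P(Y = 1 | pred = 1)
        "npv": tn / (tn + fn),           # P(Y = 0 | pred = 0)
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_hat = [1, 1, 0, 1, 0, 0, 0, 0]
measures = accuracy_measures(y_true, y_hat)
print(measures)
```

Sensitivity and specificity condition on the true outcome, while PPV and NPV condition on the prediction, so the latter pair depends on how common the outcome is.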

Example: Breast Cancer Data, Predicting 10-year Survival
(Figure legend: Naïve; Clinical; Clinical + Gene; van de Vijver)

Example: Breast Cancer Data
– To compare
  – Model II: g(a + Node + ER)
  – Model III: g(a + Node + ER + Gene)
– Choosing cut-off values for each model to achieve sensitivity (SE) = 69%, which is an attainable value for Model II:
  – Model II → SP = 0.45, PPV = 0.35, NPV = 0.77
  – Model III → SP = 0.75, PPV = 0.54, NPV = (value missing)
– Confidence intervals for the differences: SP [0.11, 0.45], PPV [0.01, 0.24], NPV [0.06, 0.19]
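
Matching two models at a common sensitivity and then comparing specificity can be sketched as below. The rule "positive if score ≥ cut-off", the helper names and the toy scores are assumptions for illustration. In practice the same target sensitivity would be applied to each model's scores in turn.

```python
def cutoff_for_sensitivity(scores, y, target):
    """Largest observed cut-off whose rule (score >= cut-off) reaches the target."""
    case_scores = [s for s, a in zip(scores, y) if a == 1]
    for c in sorted(set(scores), reverse=True):
        sens = sum(s >= c for s in case_scores) / len(case_scores)
        if sens >= target:
            return c
    raise ValueError("target sensitivity is not attainable")


def specificity_at(scores, y, cutoff):
    """Fraction of Y = 0 subjects with score below the cut-off."""
    control_scores = [s for s, a in zip(scores, y) if a == 0]
    return sum(s < cutoff for s in control_scores) / len(control_scores)


# Toy scores: four subjects with Y = 1 followed by five with Y = 0.
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.2, 0.1]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0]

cutoff = cutoff_for_sensitivity(scores, y, target=0.75)
spec = specificity_at(scores, y, cutoff)
print(cutoff, spec)
```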

Prediction Interval: Accounting for the Precision of the Prediction
– Based on a prediction model
  – predict the response
  – summarize the corresponding population average accuracy
– What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy?
– What if the rule can predict $Y^0$ more precisely for certain $Z^0$, while failing to predict $Y^0$ accurately for others?
  – Account for the precision of the prediction?
  – Identify patients who would need further assessment?

Classic Rule
– Risk of death < 0.50 → survivor {Y=0}
– Risk of death ≥ 0.50 → non-survivor {Y=1}
– Examples: predicted risk = 0.04 → {0}; predicted risk = 0.51 → {1}

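Prediction Interval
– To account for patient-level prediction error, one may instead predict an interval (or set) that contains $Y^0$ with a prespecified probability, for example 90%
– The optimal interval for the population with covariate value $Z^0$ is determined by the estimated conditional density function of $Y^0$ given $Z^0$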

Example: Breast Cancer Study
– Data: 295 patients
– Response: 10-year survival
– Predictors: lymph-node status, estrogen receptor status, gene score
– Model: clinical predictors + gene score
– Possible prediction sets: ∅, {0}, {1}, {0,1}
– Classic prediction: considers {0}, {1} only

– Predicted risk = 0.51 → 90% prediction set: {0,1}
– Predicted risk = 0.04 → 90% prediction set: {0}
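
One simple way to build such a set for a binary outcome is to add outcomes in decreasing order of estimated probability until the target coverage is reached. The sketch below follows that rule, which is an assumption here rather than the authors' stated procedure (it never returns the empty set, for instance), and contrasts it with the classic 0.50 threshold rule using the two predicted risks from the classic-rule slide.

```python
def classic_rule(risk):
    """Classic point prediction: 1 (non-survivor) if risk >= 0.50, else 0."""
    return 1 if risk >= 0.50 else 0


def prediction_set(risk, level=0.90):
    """Smallest set of outcomes whose estimated probability reaches `level`.

    risk is the estimated probability of Y = 1.
    """
    probs = {1: risk, 0: 1.0 - risk}
    chosen, covered = set(), 0.0
    for outcome in sorted(probs, key=probs.get, reverse=True):
        if covered >= level:
            break
        chosen.add(outcome)
        covered += probs[outcome]
    return chosen


for risk in (0.04, 0.51):
    print(risk, classic_rule(risk), prediction_set(risk))

set_low = prediction_set(0.04)    # confident: a single outcome suffices
set_mid = prediction_set(0.51)    # uncertain: both outcomes are needed
set_high = prediction_set(0.97)
```

A set of {0,1} flags a patient for whom the model cannot commit at the requested level, which is one way to identify who might need further assessment.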

Example: Breast Cancer Study, Prediction Sets Based on Clinical + Gene Score
(Figure values as extracted: (0%), (63%), (37%); 4%, 39%, 57%)

Remarks
– Proper choice of the accuracy/cost measure
  – Classification accuracy vs predictive values
  – Utility function: what is the consequence of predicting a subject with outcome Y as $\hat{Y}$?
– With an expensive or invasive marker
  – Should it be applied to the entire population?
  – Is it helpful for a certain sub-population?
  – Should the cost of the marker be considered when evaluating its value?