Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur,

1 Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)

2 Outline: Background and motivation. Developing and evaluating prediction rules based on a set of markers for (a) continuous or binary outcomes and (b) censored event time outcomes. Evaluating the incremental value of a biomarker over the entire population and over various sub-populations. Incorporating the patient-level precision of the prediction: prediction intervals/sets. Remarks.

3 Background and Motivation: Diagnosis, Prognosis, Treatment. Personalized medicine: using information about a person’s biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease. Important step: develop prediction rules that can accurately predict a health outcome or the diagnosis of a clinical phenotype.

4 Background and Motivation: Predictor Z (subject characteristics, biomarkers, genetic markers); Outcome Y (disease status, time to event, treatment response). Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task. Developing prediction rules involves: identifying important predictors; evaluating the accuracy of the prediction; evaluating the incremental value of new markers.

5 Background and Motivation: AIDS Clinical Trial ACTG320. Study objective: to compare a 3-drug regimen (n=579): Zidovudine + Lamivudine + Indinavir, with a 2-drug regimen (n=577): Zidovudine + Lamivudine. Identify biomarkers for predicting treatment response. How well can we predict the treatment response? Is RNA needed? Predictor Z: Age, CD4 week 0, ΔCD4 week 8, RNA week 0, ΔRNA week 8. Outcome Y: ΔCD4 week 24.

6 Background and Motivation: Regression analysis of the association between the predictors and ΔCD4 week 24. Are the coefficients for RNA significant? Is RNA needed?

7 Background and Motivation: AIDS Clinical Trial. Regression coefficients:

            Age     RNA week 0   ΔRNA week 8   CD4 week 0   ΔCD4 week 8
Estimate   -0.55    0.08        -12.06         0.03         0.68
SE          0.35    5.53          2.80         0.07         0.10
P-value     0.12    0.99          0.00         0.72         0.00

The coefficient for ΔRNA week 8 is highly significant. Does this mean RNA is needed for a more precise prediction of responses?
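The p-values on slide 7 can be roughly checked from the printed estimates and standard errors. The sketch below assumes a two-sided Wald test with a normal reference distribution, which the slides do not state explicitly; the function name `wald_p_value` and the dictionary keys are illustrative only. Because the inputs are already rounded, the results need not match the slide to the last digit.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2.0))

# Estimates and standard errors as printed on the slide (already rounded).
coefficients = {
    "Age": (-0.55, 0.35),
    "RNA week 0": (0.08, 5.53),
    "dRNA week 8": (-12.06, 2.80),
    "CD4 week 0": (0.03, 0.07),
    "dCD4 week 8": (0.68, 0.10),
}

p_values = {name: wald_p_value(est, se) for name, (est, se) in coefficients.items()}
for name, p in p_values.items():
    print(f"{name:12s} p = {p:.2f}")
```

A small p-value here only says the coefficient differs from zero under the working model; it does not by itself measure how much the marker improves prediction, which is the question the following slides take up.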

8 Background and Motivation: Y = ΔCD4 week 24, Z = predictors. Is RNA needed? Does adding RNA improve the prediction? Prediction procedure: 1. Prediction rule $\hat{Y}(Z)$, based on regression models. 2. The distance between $\hat{Y}(Z)$ and Y?

9 Developing Prediction Rules Based on a Set of Markers: Regression approach to approximate Y | Z. Continuous or binary outcome: generalized linear regression. Survival outcome: proportional hazards model; time-specific prediction models. Regression modeling as a vehicle: the procedure has to be valid when the imposed statistical model is not the true model!

10 Developing and Evaluating Prediction Rules: Predict Y with Z by $\hat{Y}(Z)$, based on the prediction model. Evaluate the performance of the prediction by the average “distance” between $\hat{Y}(Z)$ and Y. The utility or cost of predicting Y as $\hat{Y}(Z)$ is $d(\hat{Y}(Z), Y)$. The average “distance” is $D = E\{d(\hat{Y}(Z), Y)\}$. Examples: Absolute prediction error: $D = E|Y - \hat{Y}(Z)|$. Total “cost” of risk stratification: a cost table with entries d 01, d 02, d 03, d 11, d 31 for Y = 0 and Y = 1.
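As a rough illustration of the average “distance” idea on slide 10, the sketch below averages a user-supplied distance function over paired predictions and outcomes. It covers the absolute prediction error and a risk-stratification cost. The function names, the toy data and the cost values are all made up for illustration and are not taken from the slides.

```python
def average_distance(y_true, y_pred, distance):
    """Average of distance(y_hat, y) over paired predictions and outcomes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(distance(y_hat, y) for y_hat, y in zip(y_pred, y_true)) / len(y_true)

def absolute_error(y_hat, y):
    """Distance used for the absolute prediction error."""
    return abs(y - y_hat)

# Hypothetical costs of placing a subject in risk stratum 1, 2 or 3,
# indexed as stratum_cost[y][stratum]; the numbers are illustrative only.
stratum_cost = {0: {1: 0.0, 2: 1.0, 3: 2.0}, 1: {1: 3.0, 2: 1.0, 3: 0.0}}

def stratification_cost(stratum, y):
    """Cost of assigning a subject with outcome y to the given stratum."""
    return stratum_cost[y][stratum]

# Continuous outcome: mean absolute prediction error.
outcomes = [10.0, 20.0, 30.0]
predictions = [12.0, 20.0, 25.0]
abs_err = average_distance(outcomes, predictions, absolute_error)

# Binary outcome with three risk strata: average stratification cost.
events = [0, 1, 1, 0]
strata = [1, 3, 2, 2]
avg_cost = average_distance(events, strata, stratification_cost)
```

Passing the distance as a function keeps the evaluation step separate from the choice of accuracy or cost measure, so the same routine serves both examples.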

11 Evaluating and Comparing Prediction Rules: The performance of the prediction model/rule with $\hat{Y}(Z)$ can be estimated by $\hat{D} = n^{-1} \sum_{i=1}^{n} d(\hat{Y}(Z_i), Y_i)$. Prediction model/rule comparison: prediction with $E(Y \mid Z) = g_1(a'Z)$ vs $E(Y \mid W) = g_2(b'W)$. Compare two models/rules by comparing $\hat{D}_1$ and $\hat{D}_2$.

12 Variability in the Estimated Prediction Performance Measures: Variability in the prediction errors: Estimate $\hat{D} = 50$, SE = 1? SE = 50? Inference about D and $\Delta = D_1 - D_2$: confidence intervals based on large sample approximations to the distributions of $\hat{D}$ and $\hat{\Delta} = \hat{D}_1 - \hat{D}_2$.
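To make the point about variability concrete, the sketch below attaches a standard error and a 95% interval to the estimated difference in mean absolute prediction error between two rules. The slides rely on large sample approximations. A simple nonparametric bootstrap is swapped in here as a stand-in, and the fitted rules are held fixed during resampling, which is a simplification. All names and the simulated data are hypothetical.

```python
import math
import random

def mean_abs_error(y, pred):
    """Mean absolute prediction error for paired outcomes and predictions."""
    return sum(abs(a - b) for a, b in zip(y, pred)) / len(y)

def compare_rules(y, pred1, pred2, n_boot=500, seed=0):
    """Estimate D1 - D2 with a bootstrap SE and a normal-approximation 95% CI.

    Simplification: the two fitted rules are held fixed and only subjects are
    resampled, so variability from refitting the models is ignored.
    """
    rng = random.Random(seed)
    n = len(y)
    estimate = mean_abs_error(y, pred1) - mean_abs_error(y, pred2)
    replicates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        d1 = mean_abs_error(yb, [pred1[i] for i in idx])
        d2 = mean_abs_error(yb, [pred2[i] for i in idx])
        replicates.append(d1 - d2)
    mean_rep = sum(replicates) / n_boot
    se = math.sqrt(sum((r - mean_rep) ** 2 for r in replicates) / (n_boot - 1))
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)

# Simulated data: rule 2 uses the signal x, rule 1 always predicts 0.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [xi + rng.gauss(0, 1) for xi in x]
estimate, se, ci = compare_rules(y, [0.0] * 200, x)
print(f"difference = {estimate:.2f}, SE = {se:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```

An interval that excludes zero would suggest a real difference between the rules; a wide interval around zero, as with the RNA example later in the deck, would not.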

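13 Bias Correction: Bias issue in the apparent error type estimators. Bias correction via cross-validation: partition the data into a training set $T_k$ and a validation set $V_k$. For each partition: obtain the fitted prediction rule based on observations in $T_k$; obtain the estimated prediction error $\hat{D}_{(k)}$ based on observations in $V_k$. Obtain the cross-validated estimator $\hat{D}_{cv}$ by averaging over partitions. The apparent error estimator $\hat{D}$ and $\hat{D}_{cv}$ have the same limiting distribution.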
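The cross-validation steps on slide 13 can be sketched as follows for the mean absolute prediction error. This is a minimal illustration that assumes a single predictor, a least-squares line as the working model, and K equal-sized random folds. The helper names and the simulated data are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def cv_abs_error(xs, ys, k=10, seed=0):
    """K-fold cross-validated mean absolute prediction error."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the training part only, then score on the held-out part.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        err = sum(abs(ys[i] - (a + b * xs[i])) for i in fold) / len(fold)
        fold_errors.append(err)
    return sum(fold_errors) / k

# Simulated data from a noisy line.
rng = random.Random(2)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Apparent error: fit and evaluate on the same observations.
a_hat, b_hat = fit_line(xs, ys)
apparent = sum(abs(y - (a_hat + b_hat * x)) for x, y in zip(xs, ys)) / len(xs)
cross_validated = cv_abs_error(xs, ys, k=10)
```

The apparent error reuses the data that produced the fit, which is the source of the bias the slide mentions; the cross-validated version scores each observation with a rule that never saw it.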

14 Example: AIDS Clinical Trial. Objective: identify biomarkers to predict the treatment response. Outcome: Y = ΔCD4 week 24. Predictors Z: Age, CD4 week 0, ΔCD4 week 8, RNA week 0, ΔRNA week 8. Working model: E(Y | Z) = β’Z.

15 Example: AIDS Clinical Trial, Incremental Value of RNA. Estimates (*: std. error), Full Model / w/o RNA: Apparent 51 (2.7*) / 52 (2.7); 10-fold CV 52 / 53; 2n/3 CV 53. 95% C.I., Full Model / w/o RNA: Apparent [46, 56] / [47, 57]; 10-fold CV [47, 57] / [48, 58]; 2n/3 CV [48, 58]. Gain Due to RNA: estimates -0.61 (0.61), -0.64, -0.28; 95% C.I. [-2.0, 0.4], [-1.5, 0.9].

16 Incremental Value of RNA within Various Sub-populations

17 Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM): Prognostic importance of left ventricular dysfunction –Thune et al (2005): Diamond study –Trace study (Kober et al 2005, NEJM). Designed to determine whether patients with left ventricular dysfunction soon after myocardial infarction benefit from long-term oral ACE inhibition. Between 1990 and 1992, a total of 6676 patients with myocardial infarction were screened with echocardiography. A total of 5921 subjects had available data.

18 Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM): Routine markers include: –Age –creatine (CRE) –occurrence of heart failure (CHF) –history of diabetes (DIA) –history of hypertension (HYP) –cardiogenic shock after MI (KS). We are interested in evaluating the incremental value of wall motion index (WMI).

19 Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)

           Age     CRE     CHF     DIA     HYP     KS      WMI
Est        .055   -.010    .759    .718    .187   1.153  -1.097
SE         .004    .002    .067    .101    .073    .163    .083
Pvalue     .000    .000    .000    .000    .010    .000    .000

Does WMI improve the prediction of 5-year survival?

20 Population Average Incremental Value of WMI: Predicting 5-year Survival (5-year mortality rate = 42%). OME: Routine markers w/o WMI 0.28; Markers including WMI 0.26; Population gain attributed to WMI 0.02.

21 [Figure: $D_1$ and $D_2$]

23 Gain Due to WMI

24 Gain Due to WMI with respect to D [Figure: three panels, for index values 1, 4 and 9]

25 Example: Breast Cancer Gene Expression Study. Objective: construct a new classifier that can accurately predict future disease outcome. van’t Veer et al (2002) established a classifier based on a 70-gene profile: a good- or poor-prognosis signature based on the correlation with the previously determined average profile in tumors from patients with good prognosis. Classify subjects as: Good prognosis if Gene score > cut-off; Poor prognosis if Gene score < cut-off. van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves.

26 Example: Breast Cancer Gene Expression Study. Data consist of 295 subjects. Outcome T: time to death. Predictors: Lymph-Node Status, Estrogen Receptor Status, Gene Score. We are interested in: constructing prediction rules for identifying subjects who would survive t years, Y = I(T ≥ t) = 1; evaluating the incremental value of the Gene Score.

27 Example: Breast Cancer Data, Predicting 10-year Survival

Model                    Apparent Error
Naïve                    0.30 (0.031)
Clinical only            0.28 (0.033)
Clinical + Gene Score    0.25 (0.036)
Van de Vijver            0.35 (0.050)

10-fold CV / Random CV: 0.29 / 0.30; 0.28; 0.27 / 0.28

28 Evaluating the Prediction Rule Based on Various Accuracy Measures: For a future patient with $T^0$ and $Z^0$, we predict $Y^0 = I(T^0 \ge t)$. Classification accuracy measures: sensitivity, specificity. Prediction accuracy measures: PPV, NPV.

29 Example: Breast Cancer Data, Predicting 10-year Survival [Figure legend: Naïve, Clinical, Clinical + Gene, van de Vijver]

30 Example: Breast Cancer Data. To compare Model II: g(a + Node + ER) with Model III: g(a + Node + ER + Gene), choose cut-off values for each model to achieve a sensitivity (SE) of 69%, which is an attainable value for Model II. Then Model II: SP = 0.45, PPV = 0.35, NPV = 0.77; Model III: SP = 0.75, PPV = 0.54, NPV = 0.85. 95% CI for the difference in SP: [0.11, 0.45], PPV: [0.01, 0.24], NPV: [0.06, 0.19].
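The comparison on slide 30 fixes the sensitivity and then reads off the other accuracy measures. A minimal sketch of that bookkeeping is given below, assuming that a score at or above the cut-off predicts Y = 1. The function names and the six-subject toy data are hypothetical and unrelated to the breast cancer study.

```python
import math

def accuracy_measures(scores, y, cutoff):
    """Sensitivity, specificity, PPV and NPV when score >= cutoff predicts Y = 1."""
    tp = sum(1 for s, yi in zip(scores, y) if s >= cutoff and yi == 1)
    fp = sum(1 for s, yi in zip(scores, y) if s >= cutoff and yi == 0)
    fn = sum(1 for s, yi in zip(scores, y) if s < cutoff and yi == 1)
    tn = sum(1 for s, yi in zip(scores, y) if s < cutoff and yi == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def cutoff_for_sensitivity(scores, y, target):
    """Largest observed cut-off whose sensitivity is at least `target`."""
    case_scores = sorted((s for s, yi in zip(scores, y) if yi == 1), reverse=True)
    needed = max(1, math.ceil(target * len(case_scores)))
    return case_scores[needed - 1]

# Toy data: risk scores and binary outcomes for six subjects.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
y = [1, 1, 0, 0, 1, 0]

cutoff = cutoff_for_sensitivity(scores, y, target=0.6)
measures = accuracy_measures(scores, y, cutoff)
```

Applying the same target sensitivity to each model's scores gives cut-offs that put the models on an equal footing before their specificity, PPV and NPV are compared.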

31 Prediction Interval Accounting for the Precision of the Prediction: Based on a prediction model: predict the response; summarize the corresponding population average accuracy. What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy? What if the rule can predict $Y^0$ more precisely for certain $Z^0$, while for others it fails to predict $Y^0$ accurately? How to account for the precision of the prediction? How to identify patients who would need further assessment?

32 Classic Rule: Risk of Death < 0.50 → Survivor {Y=0}; Risk of Death ≥ 0.50 → Non-survivor {Y=1}. [Figure: Predicted Risk = 0.04 gives {0}; Predicted Risk = 0.51 gives {1}]

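33 Prediction Interval: To account for patient-level prediction error, one may instead predict that $Y^0$ falls in an interval (or set) chosen to achieve a target coverage level. The optimal interval for the population with a given $Z^0$ is based on the estimated conditional density function of $Y^0$ given $Z^0$.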

34 Example: Breast Cancer Study. Data: 295 patients. Response: 10-year survival. Predictors: Lymph-Node Status, Estrogen Receptor Status, Gene Score. Model: Clinical + Gene Score. Possible prediction sets: { } (empty set), {0}, {1}, {0,1}. Classic prediction: considers {0}, {1} only.

35 Predicted Risk = 0.51: 90% Prediction Set {0,1}. Predicted Risk = 0.04: 90% Prediction Set {0}.
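For a binary outcome, one simple way to obtain sets like those on slide 35 is to add outcomes in decreasing order of estimated probability until the target level is reached. The sketch below is an assumption about the construction, not necessarily the authors' procedure. It treats the predicted risk as known, ignoring estimation error, and it never returns the empty set. The function name `prediction_set` is hypothetical.

```python
def prediction_set(risk, level=0.90):
    """Smallest set of outcomes in {0, 1} whose estimated probability is >= level.

    `risk` is the estimated probability of Y = 1 for one patient.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    probs = {0: 1.0 - risk, 1: risk}
    chosen, total = [], 0.0
    # Add the most likely outcome first and stop once the level is reached.
    for outcome in sorted(probs, key=probs.get, reverse=True):
        if total >= level:
            break
        chosen.append(outcome)
        total += probs[outcome]
    return sorted(chosen)

low_risk_set = prediction_set(0.04)    # the patient with predicted risk 0.04
borderline_set = prediction_set(0.51)  # the patient with predicted risk 0.51
```

With these inputs the low-risk patient gets the single-outcome set and the borderline patient gets both outcomes, which flags the second patient as one whose outcome cannot be predicted at the 90% level and who may need further assessment.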

36 Example: Breast Cancer Study Prediction Sets Based on Clinical + Gene Score (0%) (63%) (37%) 4% 39% 57%

37 Remarks: Proper choice of the accuracy/cost measure: classification accuracy vs predictive values. Utility function: what is the consequence of predicting a subject with outcome Y as $\hat{Y}$? With an expensive or invasive marker: Should it be applied to the entire population? Is it helpful for a certain sub-population? Should the cost of the marker be considered when evaluating its value?

