Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences

Presentation transcript:

Solving Wide Predictive Modeling Problems With Clinical and Genomic Data Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences SAS Institute, Inc.

Introduction: Outline
  Precision Medicine Initiative
  Predictive Models and the Impact of “Big” Data
  Tools for Model Assessment
  Subgroup Analysis
  Live Demonstration

Introduction: Precision Medicine Initiative
  Biological data to drive research for tailored therapies
  Wide range of application areas, including oncology and pharmacogenomics
  Better predict:
    Treatment outcomes
    Responders
    Survival or time-to-event
  Rich data mining environment

Predictive Modeling: Methods and Big Data
  Rich set of methodology for prediction problems
  Popular methods:
    Continuous: GLM, PLS, Kernel methods (e.g. Ridge, Radial-Basis), Trees (e.g. Forest, Gradient Boosting), Quantile Regression
    Discrete: Logistic, Discriminant, KNN
    Censored: Life Regression, Cox Proportional Hazards, Buckley-James
  “Big” biological data => wide prediction problem!
  Serious risk of overfitting

Predictive Modeling: Predictor Reduction
  Filtering techniques, from simple to complex:
    Known biology
    Statistical testing
    Clustering
    Forest models or linear regression model selection
    Optimization
  Combination of algorithms + predictor reduction = MILLIONS of potential models
  Critical to perform filtering within a cross-validated framework to prevent OVERFITTING and generalization bias in your models
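The last point on this slide can be illustrated with a small sketch. It is not the JMP implementation. It uses pure-noise data, a simple class-mean-difference filter, and a nearest-centroid classifier, all of which are assumptions chosen only to keep the example self-contained (the function names are hypothetical). Selecting predictors on the full data set before cross-validation lets the held-out rows influence the filter, so the reported accuracy tends to look better than chance even though no predictor carries signal; repeating the filter inside each training fold avoids that leak.

```python
import random
import statistics

def make_noise_data(n_samples=40, n_features=500, seed=0):
    """Pure-noise predictors with alternating labels: nothing is truly predictive."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def top_k_features(X, y, rows, k):
    """Rank features by absolute difference in class means, using only `rows`."""
    scores = []
    for j in range(len(X[0])):
        m1 = statistics.fmean(X[i][j] for i in rows if y[i] == 1)
        m0 = statistics.fmean(X[i][j] for i in rows if y[i] == 0)
        scores.append((abs(m1 - m0), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

def centroid_predict(X, y, train, test_row, feats):
    """Nearest-centroid classifier restricted to the selected features."""
    dists = {}
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [statistics.fmean(X[i][j] for i in members) for j in feats]
        dists[label] = sum((X[test_row][j] - c) ** 2 for j, c in zip(feats, centroid))
    return min(dists, key=dists.get)

def cv_accuracy(X, y, n_folds=5, k=10, filter_inside=True):
    """K-fold accuracy; the filter runs per training fold only if filter_inside is True."""
    n = len(y)
    all_rows = list(range(n))
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    # Filter fitted on ALL rows: used only for the leaky variant.
    global_feats = top_k_features(X, y, all_rows, k)
    correct = 0
    for test in folds:
        train = [i for i in all_rows if i not in test]
        feats = top_k_features(X, y, train, k) if filter_inside else global_feats
        correct += sum(centroid_predict(X, y, train, i, feats) == y[i] for i in test)
    return correct / n

X, y = make_noise_data()
honest = cv_accuracy(X, y, filter_inside=True)   # filter refit in every fold
leaky = cv_accuracy(X, y, filter_inside=False)   # filter saw the held-out rows
print(f"filter inside CV: {honest:.2f}, filter outside CV: {leaky:.2f}")
```

On noise data the honest estimate should hover around 0.5, while the leaky one is typically well above it; the exact values depend on the seed.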

Model Assessment: Cross-Validation Model Comparison
  Data Hold Out: K-fold, leave L-out, leave P-percent-out, etc.
  Hold Out Methods: Simple Random, Random Partition, Stratified, etc.
  Performance Metrics: RMSE, Harrell’s C, AUC, Correlation, etc.
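For reference, three of the performance metrics named on this slide can be written in a few lines each. This is a minimal sketch using the textbook definitions, not the code used in JMP; the function names are hypothetical, and tied risk scores are given half credit, which is one common convention.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error for a continuous response."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def harrell_c(times, events, risk):
    """Harrell's C for right-censored data. A pair is usable when the shorter
    time is an observed event; it is concordant when that subject also has
    the higher predicted risk."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

print(rmse([1, 2, 3], [1, 2, 5]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4, 3, 2, 1]))
```

In a cross-validated comparison, each metric would be computed on the held-out rows of every fold and then summarized across folds.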

Specialized Prediction Problems: Subgroup Analysis
  Identify subjects most likely to respond to treatment
  Benefits in study design / safety / ethics
  Subgroup Guidance (CPMP, 2014)
  Classification and Regression Trees are popular models (Zink et al., 2015)
  [Figure: P(Improve if Treated) plotted against P(Improve if NOT Treated), with regions labeled INCURABLE, GET WELL ANYWAY, DRUG MAKES YOU WORSE, and DRUG CURES YOU.]

JMP Life Sciences: Live Demonstration
  JMP Genomics and JMP Clinical Predictive Modeling Reviews
  Example Data:
    Sepsis prediction in hospitals with metabolite and protein data
    Survival prediction in prostate cancer with clinical trials data

Discovery and prediction Hospital Biomarker utility to predict sepsis survival

Subgroup Analysis: Interaction Trees
  Linear, Logistic or Cox Model fit to all randomized subjects:
    f(y_i) = β_0 + β_1 x_i + β_2 Treatment_i + β_3 Treatment_i * x_i
  Significant interaction implies differential treatment effect between subgroups defined by binary covariate
  Split based on p-value of treatment by covariate interaction term
  Su et al. (2009)
  [Figure: interaction tree. All Randomized Subjects are split into Biomarker 1 Absent and Biomarker 1 Present, with further splits on Biomarker 2 (Absent / Present) and Biomarker 3 (Absent / Present).]
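To make the role of the interaction coefficient concrete, here is a minimal sketch for the linear case with a binary covariate and a binary treatment indicator. In that saturated setting the least-squares coefficients equal differences of the four cell means, so β_3 is the treatment effect when the covariate is present minus the treatment effect when it is absent. The function name and toy data are illustrative assumptions; the significance test that interaction trees use to choose splits is omitted.

```python
from statistics import fmean

def interaction_coefficients(y, x, treatment):
    """Coefficients of y = b0 + b1*x + b2*T + b3*T*x for binary x and T,
    obtained from the four cell means (exact for the saturated linear model)."""
    def cell_mean(t, b):
        return fmean(yi for yi, xi, ti in zip(y, x, treatment) if ti == t and xi == b)

    m00, m01 = cell_mean(0, 0), cell_mean(0, 1)  # control arm: x absent / present
    m10, m11 = cell_mean(1, 0), cell_mean(1, 1)  # treated arm: x absent / present
    b0 = m00
    b1 = m01 - m00
    b2 = m10 - m00                     # treatment effect when x is absent
    b3 = (m11 - m01) - (m10 - m00)     # extra treatment effect when x is present
    return b0, b1, b2, b3

# Toy data: treatment helps far more when the biomarker is present.
y         = [1.0, 3.0, 2.0, 4.0, 2.0, 4.0, 7.0, 9.0]
x         = [0,   0,   1,   1,   0,   0,   1,   1]
treatment = [0,   0,   0,   0,   1,   1,   1,   1]

b0, b1, b2, b3 = interaction_coefficients(y, x, treatment)
print("effect if biomarker absent: ", b2)
print("effect if biomarker present:", b2 + b3)
```

An interaction tree would compute a test of b3 for every candidate binary split and choose the split with the smallest p-value, then recurse within each child node.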

Subgroup Analysis: Virtual Twins
  Virtual Twins (Foster et al., 2011)
  Fit a forest model to the response to predict each subject's outcome under both the received and the counterfactual treatment
  Fit a tree model to the resulting estimated treatment effects

Subgroup Analysis: Optimal Treatment Regimes
  Subgroup identification: “the right patients for a given drug”
  Optimal treatment regimes: “the best drug for a given patient”
  Zhang et al. (2011) methodology: fit a response regression model and a propensity score logistic model to create a pseudo binary response and weight (augmented inverse probability weighted estimators, or AIPWE)
  Use as input into predictive modeling routines, including cross-validated designs (Freidlan et al., 2009)
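The slide does not give the formulas, so the following is only a sketch of how a pseudo binary response and weight could be formed, assuming the usual doubly robust (AIPW) estimate of each subject's treatment contrast. The inputs `pi` (estimated propensity of treatment), `mu1` and `mu0` (predicted responses under treatment and control) would come from the propensity and response models; here they are passed in as plain numbers, and all function and argument names are hypothetical.

```python
def aipw_contrast(y, a, pi, mu1, mu0):
    """Assumed AIPW estimate of one subject's contrast E[Y(1)] - E[Y(0)].
    y: observed response, a: treatment received (1 or 0),
    pi: estimated P(a = 1 | covariates), mu1/mu0: model-predicted responses."""
    est_treated = a * y / pi - (a - pi) / pi * mu1
    est_control = (1 - a) * y / (1 - pi) + (a - pi) / (1 - pi) * mu0
    return est_treated - est_control

def pseudo_response(contrast):
    """Pseudo binary label (1 = treatment looks beneficial) and its weight."""
    label = 1 if contrast > 0 else 0
    weight = abs(contrast)
    return label, weight

# Two illustrative subjects from a 1:1 randomized trial (pi = 0.5).
c1 = aipw_contrast(y=3.0, a=1, pi=0.5, mu1=2.0, mu0=1.0)
c2 = aipw_contrast(y=0.0, a=1, pi=0.5, mu1=1.0, mu0=2.0)
print(c1, pseudo_response(c1))
print(c2, pseudo_response(c2))
```

The labels and weights can then be handed to any weighted classifier, which is what allows the regime-estimation problem to reuse standard predictive modeling and cross-validation routines.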