Lucila Ohno-Machado, MD, PhD Division of Health Sciences and Technology Harvard Medical School Massachusetts Institute of Technology.


1 Lucila Ohno-Machado, MD, PhD machado@dsg.harvard.edu Division of Health Sciences and Technology Harvard Medical School Massachusetts Institute of Technology Introduction to HST 951 Medical Decision Support

2 Welcome. Objectives: provide a practical approach to medical decision support; put a strong emphasis on computer-based applications that utilize concepts from the fields of artificial intelligence and statistics; focus on principled predictive modeling in biomedicine. Audience: background in quantitative methods is desirable; undergraduates; graduate students and post-doctoral fellows (MDs) in medical informatics.

3 Goals [Figure: Decision Support Cycle linking Model Selection, Data Pre-Processing, Model Construction, and System Evaluation.]

4 Types of Models What type of support is needed? “Exploratory analysis” “Confirmatory analysis” (gold-standard) Clustering Classification

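5 Models: Logistic Regression, Neural Networks, CART, Rough Sets. [Figure: two diagrams sharing the inputs Age (34), Gender, and Mitoses (4). Neural Networks: the inputs pass through weighted connections to the output “Probability of Cancer” (0.6). Logistic Regression: the inputs (independent variables) are weighted by coefficients and summed to give the output (prediction) “Probability of cancer” (0.6), with $p = \frac{1}{1 + e^{-(\Sigma + \text{cte})}}$, where $\Sigma$ is the weighted sum of the inputs and cte is a constant.]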

6 Requirements, Strengths and Weaknesses, Application Examples: Naïve Bayes; Bayesian Networks; Logistic Regression; Neural Networks; Classification Trees; Rough Set Models; Support Vector Machines; Clustering (Hierarchical and Partitioning)

7 Evaluation and Comparisons. Classification: calibration (plots, goodness-of-fit); discrimination (ROC areas); explanation (variable selection); outliers, influential observations (case selection). Clustering: distance metrics; homogeneity; inter-cluster distance.

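8 [Figure: overlapping distributions of nl and disease cases on an axis from 1.0 to 3.0, with a threshold at 1.7 separating TN and FN (below) from FP and TP (above).] Confusion table at this threshold, test result (“D”, “nl”) against truth (D, nl): TP = 40, FN = 10, FP = 10, TN = 40, with 50 cases in each truth column. Sensitivity = 40/50 = .8; Specificity = 40/50 = .8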
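The two ratios on this slide come straight from the cells of the confusion table. The sketch below is one way to compute them; the function names are illustrative, and the four cell counts are inferred from the slide's 40/50 ratios rather than read directly from the transcript.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly diseased cases that the test labels "D"."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of truly normal cases that the test labels "nl"."""
    return tn / (tn + fp)


# Cell counts at threshold 1.7 (assumed from Sensitivity = 40/50 and
# Specificity = 40/50, with 50 diseased and 50 normal cases).
tp, fn = 40, 10
tn, fp = 40, 10

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, spec)  # 0.8 0.8
```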

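9 ROC curve [Figure: Sensitivity (0 to 1) plotted against 1 - Specificity (0 to 1), one point per threshold.] Confusion tables, test result (“D”, “nl”) against truth (D, nl), 50 cases in each truth column. Threshold 1.4: TP = 50, FN = 0, FP = 20, TN = 30 (70 called “D”, 30 called “nl”). Threshold 1.7: TP = 40, FN = 10, FP = 10, TN = 40 (50 called “D”, 50 called “nl”). Threshold 2.0: TP = 40, FN = 10, FP = 0, TN = 50 (40 called “D”, 60 called “nl”).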
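Each threshold on this slide contributes one point (1 - specificity, sensitivity) to the ROC curve. The sketch below turns the three tables into points and estimates the area with the trapezoid rule. The cell counts are one reading of the flattened tables (an assumption, checked only against their row and column totals), and closing the curve at (0, 0) and (1, 1) is a common convention rather than something the slide states.

```python
# (threshold, TP, FN, FP, TN): assumed reading of the slide's three tables.
tables = [
    (1.4, 50, 0, 20, 30),
    (1.7, 40, 10, 10, 40),
    (2.0, 40, 10, 0, 50),
]


def roc_point(tp, fn, fp, tn):
    """Return (1 - specificity, sensitivity) for one confusion table."""
    return (fp / (fp + tn), tp / (tp + fn))


points = [roc_point(tp, fn, fp, tn) for _, tp, fn, fp, tn in tables]

# Close the curve at the two trivial classifiers and sort left to right.
curve = sorted([(0.0, 0.0), (1.0, 1.0)] + points)

# Trapezoid rule over consecutive points.
auc = sum((x1 - x0) * (y0 + y1) / 2
          for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

for (threshold, *_), (x, y) in zip(tables, points):
    print(f"threshold {threshold}: 1-spec = {x:.1f}, sens = {y:.1f}")
print(f"trapezoid area = {auc:.2f}")
```

With only three thresholds the area is a coarse estimate; a full ROC analysis would sweep every distinct score.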

10 ROC Curves [Figure: Sensitivity against 1-Specificity, both axes from 0 to 1, with one curve each for LR, NN, and RS.]

11 Calibration Curves [Figure: sum of system’s estimates plotted against sum of real outcomes, both axes from 0 to 1, with the region of overestimation marked.]
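A calibration curve of this kind can be built by sorting cases by predicted probability, cutting them into groups, and comparing the sum of estimates with the sum of observed outcomes in each group. The sketch below assumes that grouping scheme; the function name and the eight example cases are hypothetical, not course data.

```python
def calibration_groups(estimates, outcomes, n_groups):
    """Return (sum of estimates, sum of outcomes) per group of sorted cases."""
    pairs = sorted(zip(estimates, outcomes))
    groups = []
    for g in range(n_groups):
        start = g * len(pairs) // n_groups
        end = (g + 1) * len(pairs) // n_groups
        chunk = pairs[start:end]
        groups.append((sum(e for e, _ in chunk), sum(o for _, o in chunk)))
    return groups


# Hypothetical predictions and 0/1 outcomes for eight cases.
estimates = [0.1, 0.2, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

groups = calibration_groups(estimates, outcomes, n_groups=2)
for est_sum, out_sum in groups:
    label = "overestimation" if est_sum > out_sum else "no overestimation"
    print(f"estimates {est_sum:.1f} vs outcomes {out_sum}: {label}")
```

A well-calibrated model keeps the two sums close in every group, so its points fall near the diagonal of the plot.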

12

13 Important Topics Decision Analysis Cost-effectiveness analysis Design of Experiments Real-World Applications Blocking inferences: quantifying anonymity

14 Examples of Projects

15 Students have worked in the past in different domains. Diagnosis of: Coronary Artery Disease, Breast Cancer, Melanoma. Prognosis in: Interventional Cardiology, Spinal Cord Injury, AIDS, Pregnancy.

16 Data Mining and Predictive Modeling in (Bio) Medical Databases

17 We emphasize comparison of different models [Figure: area under ROC (0.75 to 0.91) plotted by year (1 to 6) and by balance (0.05 to 0.45) for Logistic and Neural Net models, with an inset labeled Logistic Regression.]

18 Modeling the Risk of Major In-Hospital Complications Following Percutaneous Coronary Interventions Frederic S. Resnic, Lucila Ohno-Machado, Gavin J. Blake, Jimmy Pavliska, Andrew Selwyn, Jeffrey J. Popma ACC, 2000

19 Methods: Consecutive BWH patients, 1/97 through 2/99, randomly divided into training (n = 1,877) and test (n = 927) sets. Outcomes: death and combined death, CABG or MI (MACE). Validation using independent dataset: 3/99 - 12/99 (n = 1,460).
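A random division like the one described here can be sketched with a seeded shuffle. The slide does not say how the split was actually performed, so the function name, the seed, and the use of integer case identifiers are all assumptions; only the set sizes (1,877 and 927) come from the slide.

```python
import random


def split_cases(case_ids, n_train, seed=0):
    """Shuffle case identifiers and cut them into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# 2,804 development cases, identified here by hypothetical integer ids.
train, test = split_cases(range(2804), n_train=1877)
print(len(train), len(test))  # 1877 927
```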

20 Dataset: Attributes. Data Source: Medical Record, Clinician Derived.
History: age, gender, diabetes, iddm, history CABG, baseline creatinine, CRI, ESRD, hyperlipidemia
Presentation: acute MI, primary, rescue, CHF class, angina class, cardiogenic shock, failed CABG
Angiographic: occluded, lesion type (A, B1, B2, C), graft lesion, vessel treated, ostial
Procedural: number lesions, multivessel, number stents, stent types (8), closure device, gp 2b3a antagonists
Angiographic or Procedural: dissection post, rotablator, atherectomy, angiojet, max pre stenosis, max post stenosis, no reflow
Operator/Lab: annual volume, device experience, daily volume, lab device experience, unscheduled case

21 Study Population: Development Set (1/97-2/99) vs. Validation Set (3/99-12/99)
Cases: 2,804 vs. 1,460
Women: 909 (32.4%) vs. 433 (29.7%), p=.066
Age > 74yrs: 595 (21.2%) vs. 308 (22.5%), p=.340
Acute MI: 250 (8.9%) vs. 144 (9.9%), p=.311
Primary: 156 (5.6%) vs. 95 (6.5%), p=.214
Shock: 62 (2.2%) vs. 20 (1.4%), p=.058
Class 3/4 CHF: 176 (6.3%) vs. 80 (5.5%), p=.298
gp IIb/IIIa antagonist: 1,005 (35.8%) vs. 777 (53.2%), p<.001
Death: 67 (2.4%) vs. 24 (1.6%), p=.110
Death, MI, CABG (MACE): 177 (6.3%) vs. 96 (6.6%), p=.739
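The p-values on this slide compare a proportion between the development and validation sets. The slide does not name the test; as an assumption, the sketch below applies an uncorrected Pearson chi-square test for a 2x2 table to the counts for women, using only the standard library. The helper names are illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


dev_n, val_n = 2804, 1460          # cases in each set (from the slide)
dev_women, val_women = 909, 433    # women in each set (from the slide)

chi2 = chi_square_2x2(dev_women, dev_n - dev_women,
                      val_women, val_n - val_women)
p_women = p_value_1df(chi2)
print(f"chi-square = {chi2:.2f}, p = {p_women:.3f}")
```

Under this assumed test the result lands close to the p=.066 reported for women, which suggests, without confirming, that a test of this kind was used.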

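22 Logistic regression. These models are based on statistics and can only discover linear relationships among the data. [Figure: Logistic Regression diagram in which the inputs (independent variables) Age (34), Gender (1), and Mitoses (4) are weighted by coefficients and summed to give the output (prediction) “Probability of cancer” (0.6), with $p = \frac{1}{1 + e^{-(\Sigma + \text{cte})}}$, where $\Sigma$ is the weighted sum of the inputs and cte is a constant.]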
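The logistic formula on this slide maps a weighted sum of inputs plus a constant to a probability between 0 and 1. The sketch below is a minimal version of that calculation; the function name is illustrative, and the coefficient and constant values are hypothetical, chosen only to show the mechanics, not taken from any fitted model.

```python
import math


def logistic_probability(inputs, coefficients, constant):
    """Apply p = 1 / (1 + exp(-(weighted sum + constant)))."""
    weighted_sum = sum(b * x for b, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-(weighted_sum + constant)))


# Inputs as on the slide: Age = 34, Gender = 1, Mitoses = 4.
inputs = [34, 1, 4]
# Hypothetical coefficients and constant (not from the course material).
coefficients = [0.01, 0.2, 0.1]
constant = -0.5

p = logistic_probability(inputs, coefficients, constant)
print(f"p = {p:.3f}")
```

The model is linear in the log-odds: each unit change in an input shifts log(p / (1 - p)) by its coefficient.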

23 Complications in Coronary Intervention [Figure: logistic regression diagram with inputs age, IDDM, CHF class, type, number, and procedure, and output “Probability of complication” (0.6).]

24 Logistic and Score Models for Death. Logistic Regression Model (odds ratio, p-value):
Age > 74yrs: 2.51, 0.02
B2/C Lesion: 2.12, 0.05
Acute MI: 2.06, 0.13
Class 3/4 CHF: 8.41, 0.00
Left main PCI: 5.93, 0.03
IIb/IIIa Use: 0.57, 0.20
Stent Use: 0.53, 0.12
Cardiogenic Shock: 7.53, 0.00
Unstable Angina: 1.70, 0.17
Tachycardic: 2.78, 0.04
Chronic Renal Insuf.: 2.58, 0.06

25 Logistic and Score Models for Death. Logistic Regression Model (odds ratio, p-value) and Prognostic Risk Score Model (beta coefficient; the Risk Value column is not transcribed):
Age > 74yrs: 2.51, 0.02; beta 0.9212
B2/C Lesion: 2.12, 0.05; beta 0.7521
Acute MI: 2.06, 0.13; beta 0.7241
Class 3/4 CHF: 8.41, 0.00; beta 2.1294
Left main PCI: 5.93, 0.03; beta 1.7793
IIb/IIIa Use: 0.57, 0.20; beta -0.554
Stent Use: 0.53, 0.12; beta -0.626
Cardiogenic Shock: 7.53, 0.00; beta 2.0194
Unstable Angina: 1.70, 0.17; beta 0.5311
Tachycardic: 2.78, 0.04; beta 1.0222
Chronic Renal Insuf.: 2.58, 0.06; beta 0.9482
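In a logistic regression model the beta coefficient of a binary variable is the natural logarithm of its odds ratio, which is how the two columns on this slide relate. The sketch below checks that relationship against the slide's values; the pairing of numbers to variables follows the order in which they appear, and the small gaps are consistent with the odds ratios being rounded to two decimals.

```python
import math

# (odds ratio, beta coefficient) per variable, in slide order.
death_model = {
    "Age > 74yrs": (2.51, 0.9212),
    "B2/C Lesion": (2.12, 0.7521),
    "Acute MI": (2.06, 0.7241),
    "Class 3/4 CHF": (8.41, 2.1294),
    "Left main PCI": (5.93, 1.7793),
    "IIb/IIIa Use": (0.57, -0.554),
    "Stent Use": (0.53, -0.626),
    "Cardiogenic Shock": (7.53, 2.0194),
    "Unstable Angina": (1.70, 0.5311),
    "Tachycardic": (2.78, 1.0222),
    "Chronic Renal Insuf.": (2.58, 0.9482),
}

# beta = ln(odds ratio); record the gap to the slide's beta for each variable.
gaps = {name: abs(math.log(odds_ratio) - beta)
        for name, (odds_ratio, beta) in death_model.items()}
max_gap = max(gaps.values())

for name, (odds_ratio, beta) in death_model.items():
    print(f"{name}: ln({odds_ratio}) = {math.log(odds_ratio):.4f}, slide beta = {beta}")
```

An odds ratio below 1 (IIb/IIIa use, stent use) gives a negative beta, meaning the variable lowers the predicted odds of death.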

26 Neural networks These are mathematical models that can discover non-linear relationships among the data
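As a small illustration of a non-linear relationship, the sketch below hand-wires a two-layer network of threshold units that computes exclusive-or, a pattern no single linear boundary can separate. The weights are set by hand for illustration (nothing is learned here), and the example is hypothetical rather than one of the course's models.

```python
def step(value: float) -> int:
    """Threshold unit: fire (1) when the weighted input is positive."""
    return 1 if value > 0 else 0


def xor_network(x1: int, x2: int) -> int:
    """Two hidden units (OR and AND) feeding one output unit."""
    hidden_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    hidden_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(hidden_or - hidden_and - 0.5)  # OR but not AND


outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

The hidden layer is what makes this possible: each hidden unit draws one linear boundary, and the output unit combines them.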

27 Neural networks for predicting death and complications [Figure: network with inputs age, IDDM, CHF class, type, number, and procedure, and outputs disease free, death, and other complications.]

28 Death Models. Validation Set: 1460 Cases. ROC Area: LR 0.840; Score 0.855; aNN 0.835. [Figure: ROC curves for the three models, with a reference line at ROC = 0.50.]

29 Risk Score of Death: BWH Experience. Unadjusted Overall Mortality Rate = 2.1%. [Figure: Number of Cases and Mortality Risk by risk score, with mortality risk labels 62%, 26%, 7.6%, 2.9%, 1.6%, 1.3%, 0.4%, 1.4%.]

30 Regression Trees. These are models that partition the data using one variable at a time, and can model non-linear relationships among data
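The basic step in growing such a tree is choosing, for one variable, the cut point that best separates the outcome values. The sketch below searches the midpoints between distinct values and keeps the split with the lowest total squared error, a common criterion assumed here rather than taken from the slide. The function name and the six example cases are hypothetical.

```python
def best_split(xs, ys):
    """Return (threshold, squared error) of the best split x <= threshold.

    Returns None when xs has fewer than two distinct values.
    """
    def squared_error(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    distinct = sorted(set(xs))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = squared_error(left) + squared_error(right)
        if best is None or error < best[1]:
            best = (threshold, error)
    return best


# Hypothetical data: risk jumps once age passes 65.
ages = [40, 50, 60, 70, 80, 90]
risks = [0.1, 0.1, 0.1, 0.6, 0.6, 0.6]

split = best_split(ages, risks)
print(split[0])  # 65.0
```

A full tree repeats this search over every variable and then recurses into each side of the chosen split, which is how a step-shaped, non-linear relationship is captured.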

31 Diagnosis of Melanoma (Michael Binder, Greg Sharp et al., 1999)

32 Dermatoscopy

33 [Figure: classification tree over the dermatoscopy features asymmetry, border, color, and detail, with splits such as detail < 2 and > 10 and leaves labeled “benign” and “malig”.]

34 Performance using ABCD rule

35 Rough Sets These are mathematical models that derive rules for grouping cases based on boolean logic

36 Rules. Multiple subsamples of a large table are created and combined for rule extraction. Example: If [(number>2) and …] then Complication = true

37 Comparison of Practical Prediction Models for Ambulation Following Spinal Cord Injury (Rowland et al, 1998)

38 Study Population: Spinal Cord Injury Model Systems of Care Database. Admitted to one of 24 federally funded designated regional SCI care systems. 17,861 patients who sustained a spinal cord injury between 1973 and 1997. 1755 patients had data for LEMS scores, 1993 to 1997. 1138 had complete data for variables of interest.

39 SCI Mortality NN Design: Input & Output. Admission Info (9 items): system days, injury days, age, gender, racial/ethnic group, level of neurologic fxn, ASIA impairment index, UEMS, LEMS. Ambulation (1 item): yes, no

40 Results: ROC Curve Area

41 Results: ROC Curves

42 Other methods Support Vector Machines, multiple variations of the nearest neighbor algorithm, etc.

43 Heart Attack Alert Program (Wang et al., 2001)

44 Cox’s Models for Prediction [Figure: curves plotted against time (years).]

45 Genetic Algorithms: search mechanism. Used for: variable selection (model construction); case selection (regression diagnostics); multidisorder diagnosis

46 People Brigham and Women’s Hospital Children’s Hospital EECS MIT School of Public Health Partners Information Systems

47 Administrivia. Grading based on: 30% homeworks (almost every week)/participation; 30% midterm, open notes; 40% project (no final exam). Lectures on the WWW for reference. Handouts with Prof. Szolovits’ assistant at NE-43 r416.

48 Questions/Suggestions machado@dsg.harvard.edu isaac_kohane@harvard.edu psz@mit.edu

