Michael A. Kohn, MD, MPP 10/28/2010 Chapter 7 – Prognostic Tests Chapter 8 – Combining Tests and Multivariable Decision Rules.

Outline of Topics
Prognostic Tests
–Differences from diagnostic tests
–Quantifying prediction: calibration and discrimination
–Value of prognostic information
–Comparing predictions
–Example: ABCD2 Score
Combining Tests/Diagnostic Models
–Importance of test non-independence
–Recursive Partitioning
–Logistic Regression
–Variable (Test) Selection
–Importance of validation separate from derivation

Prognostic Tests (Ch 7)*
–Differences from diagnostic tests
–Validation/Quantifying accuracy (calibration and discrimination)
–Assessing the value of prognostic information
–Comparing predictions by different people or different models
*Will not discuss time-to-event analysis or predicting continuous outcomes. (Covered in Chapter 7.)

Chance determines whether you get the disease Spin the needle

Diagnostic Test 1) Spin needle to see if you develop disease. 2) Perform test for disease. 3) Gold standard determines true disease state. (Can calculate sensitivity, specificity, LRs.)

Prognostic Test 1) Perform test to predict the risk of disease. 2) Spin needle to see if you develop disease. 3) How do you assess the validity of the predictions?

Example: Mastate Cancer Once developed, always fatal. Can be prevented by mastatectomy. Two oncologists separately assign each of N individuals a risk for developing mastate cancer in the next 5 years.

PatientID   Oncologist 1's Predicted Probability   Oncologist 2's Predicted Probability   Mastate Cancer within 5 years
1           20%                                    20%                                    0
2           50%                                    20%                                    0
3           35%                                    20%                                    0
4           50%                                    20%                                    1
5           35%                                    20%                                    …
…           …                                      …                                      …
10          50%                                    20%                                    0
11          50%                                    20%                                    1
12          35%                                    20%                                    0
…           …                                      …                                      …

How do you assess the validity of the predictions?

Oncologist 1 assigns risk of 50% How many like this? How many get mastate cancer? Spin the needles.

Oncologist 1 assigns risk of 35% How many like this? How many get mastate cancer? Spin the needles.

Oncologist 1 assigns risk of 20% How many like this? How many get mastate cancer? Spin the needles.

Calibration*
How accurate are the predicted probabilities? Break the population into groups. Compare actual and predicted probabilities for each group.
*Related to Goodness-of-Fit and diagnostic model validation.

Calibration

Oncologist 1's Predicted Risk   Observed Proportion   Observed - Predicted
50%                             5/16 = 31.3%          -18.8%
35%                             3/16 = 18.8%          -16.3%
20%                             2/16 = 12.5%          -7.5%

Oncologist 2's Predicted Risk   Observed Proportion   Observed - Predicted
20%                             10/48 = 20.8%         +0.8%
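Below is a minimal sketch (mine, not from the original lecture) of how such a calibration check might be computed; the helper function and data encoding are illustrative, with 16 patients per predicted-risk group as in the tables above.

```python
# Hypothetical sketch: group subjects by predicted risk and compare the
# observed proportion of events with the predicted probability in each group.
from collections import defaultdict

def calibration_table(predicted, observed):
    """predicted: predicted probabilities; observed: 0/1 outcomes."""
    groups = defaultdict(list)
    for p, o in zip(predicted, observed):
        groups[p].append(o)
    for p in sorted(groups, reverse=True):
        outcomes = groups[p]
        obs = sum(outcomes) / len(outcomes)
        print(f"predicted {p:.0%}: observed {sum(outcomes)}/{len(outcomes)} "
              f"= {obs:.1%} (observed - predicted = {obs - p:+.1%})")

# Oncologist 1: 16 patients at each predicted risk (50%, 35%, 20%),
# of whom 5, 3, and 2 develop mastate cancer, as in the table above.
predicted = [0.50] * 16 + [0.35] * 16 + [0.20] * 16
observed = [1] * 5 + [0] * 11 + [1] * 3 + [0] * 13 + [1] * 2 + [0] * 14
calibration_table(predicted, observed)
```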

Calibration

Discrimination
How well can the test separate subjects in the population from the mean probability to values closer to zero or 1? May be more generalizable. Often measured with the C-statistic (AUROC).

Oncologist 1   D+          D-
Risk = 50%     5 (50%)     11 (29%)
Risk = 35%     3 (30%)     13 (34%)
Risk = 20%     2 (20%)     14 (37%)
Total          10 (100%)   38 (100%)

Discrimination

AUROC = 0.63
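One way to see where an AUROC like this comes from: the c-statistic equals the proportion of D+/D- pairs in which the D+ subject was assigned the higher predicted risk, with ties counted as one half. A minimal sketch (mine, not from the lecture), using Oncologist 1's counts from the table above:

```python
# c-statistic (AUROC) by pair counting: fraction of D+/D- pairs in which the
# D+ subject received the higher predicted risk; ties count as 1/2.
def c_statistic(risks_dpos, risks_dneg):
    concordant = ties = 0
    for rp in risks_dpos:
        for rn in risks_dneg:
            if rp > rn:
                concordant += 1
            elif rp == rn:
                ties += 1
    return (concordant + 0.5 * ties) / (len(risks_dpos) * len(risks_dneg))

# Oncologist 1: 5, 3, and 2 D+ patients at predicted risks of 50%, 35%, 20%;
# 11, 13, and 14 D- patients at the same predicted risks.
dpos = [0.50] * 5 + [0.35] * 3 + [0.20] * 2
dneg = [0.50] * 11 + [0.35] * 13 + [0.20] * 14
print(round(c_statistic(dpos, dneg), 2))  # 0.63
```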

True Risk
Oncologist 1: 20%, Oncologist 2: 20%  ->  True Risk: 11.1%
Oncologist 1: 35%, Oncologist 2: 20%  ->  True Risk: 16.7%
Oncologist 1: 50%, Oncologist 2: 20%  ->  True Risk: 33.3%

True Risk -- Calibration

True Risk   Observed Proportion   Observed - Predicted
33.3%       5/16 = 31.3%          -2.1%
16.7%       3/16 = 18.8%          +2.1%
11.1%       2/16 = 12.5%          +1.4%

True Risk -- Calibration

True Risk -- Discrimination

True Risk   D+          D-
33.3%       5 (50%)     11 (29%)
16.7%       3 (30%)     13 (34%)
11.1%       2 (20%)     14 (37%)
Total       10 (100%)   38 (100%)

True Risk -- Discrimination

AUROC = 0.63

ROC curve depends only on rankings, not calibration

Random event occurs AFTER the prognostic test. 1) Perform test to predict the risk of disease. 2) Spin needle to see if you develop disease. Only a crystal ball allows perfect prediction.

Maximum AUROC
True risks: 11.1%, 16.7%, 33.3%
Maximum AUROC = 0.65

Diagnostic versus Prognostic Tests

                                 Diagnostic Test              Prognostic Test
Purpose                          Identify prevalent disease   Predict incident disease/outcome
Chance event occurs to patient   Prior to test                After test
Study design                     Cross-sectional              Cohort
Test result                      +/-, ordinal, continuous     Risk (probability)
Maximum obtainable AUROC         1 (gold standard)            <1 (not clairvoyant)

Value of Prognostic Information Why do you want to know risk of mastate cancer? To decide whether to do a mastatectomy.

Value of Prognostic Information
It is 4 times worse to die of mastate cancer than to have a mastatectomy: C_death = 4 × C_mastatectomy.
Should do mastatectomy when P × C_death > C_mastatectomy, i.e., when P > C_mastatectomy/C_death = 1/4.
Fine point: If it is 4 times worse to die of mastate cancer than to live AND have a mastatectomy, then the NET cost of a death is 4C - C = 3C, and the threshold odds equal C:B, or 1:3, which is the same threshold probability of 1/4.
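The same arithmetic written out as a single line (a restatement of the slide above, with B taken as the net benefit of mastatectomy in a patient who would otherwise die, i.e., 3 × C_mastatectomy):

\[
P \times C_{\text{death}} > C_{\text{mastatectomy}}
\;\Longleftrightarrow\;
P > \frac{C_{\text{mastatectomy}}}{C_{\text{death}}} = \frac{1}{4},
\qquad
\text{threshold odds} = \frac{C}{B} = \frac{C_{\text{mastatectomy}}}{3\,C_{\text{mastatectomy}}} = \frac{1}{3}.
\]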

Value of Prognostic Information: 300 patients (100 per risk group)
Oncologist 1: 20% < 25%  ->  NO mastatectomy  ->  11 out of 100 die of mastate cancer, no mastatectomies
Oncologist 1: 35% > 25%  ->  mastatectomy  ->  83 out of 100 unnecessary; no mastate cancer deaths
Oncologist 1: 50% > 25%  ->  mastatectomy  ->  67 out of 100 unnecessary; no mastate cancer deaths

Value of Prognostic Information: 300 patients (100 per risk group)
Oncologist 2: 20% < 25%  ->  no mastatectomy  ->  11 out of 100 die of mastate cancer; no mastatectomies
Oncologist 2: 20% < 25%  ->  no mastatectomy  ->  17 out of 100 die; no mastatectomies
Oncologist 2: 20% < 25%  ->  no mastatectomy  ->  33 out of 100 die; no mastatectomies

Value of Prognostic Information: 300 patients (100 per risk group)
True risk: 11% < 25%  ->  no mastatectomy  ->  11 out of 100 die of mastate cancer; no mastatectomies
True risk: 17% < 25%  ->  no mastatectomy  ->  17 out of 100 die; no mastatectomies
True risk: 33% > 25%  ->  mastatectomy  ->  67 out of 100 unnecessary; no mastate cancer deaths

Value of Prognostic Information: 300 patients (100 per risk group)

               Mastatectomies   Deaths from Mastate Cancer   Mastatectomy "Equivalents"   Death "Equivalents"
Oncologist 1   200              11                           244                          61
Oncologist 2   0                61                           244                          61
True Risk      100              28                           211                          53

Value of the true risk estimate relative to Oncologists 1 and 2 = 33 "mastatectomy equivalents" or 8 "death equivalents".
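A small sketch (mine, not from the lecture) that reproduces the bookkeeping in the table above, assuming 100 patients per risk group, true 5-year risks of 11.1%, 16.7%, and 33.3%, and the 25% treatment threshold:

```python
# Expected mastatectomies and mastate-cancer deaths per 300 patients for each
# strategy, converted to "mastatectomy equivalents" using C_death = 4 x C_mast.
true_risks = [0.111, 0.167, 0.333]   # true 5-year risks of the three groups
threshold = 0.25                     # do mastatectomy if predicted risk > 25%

def totals(predicted_risks):
    mastatectomies = deaths = 0.0
    for pred, true in zip(predicted_risks, true_risks):
        if pred > threshold:
            mastatectomies += 100    # everyone in the group is treated
        else:
            deaths += 100 * true     # untreated patients die at the true rate
    return mastatectomies, deaths

for name, preds in [("Oncologist 1", [0.20, 0.35, 0.50]),
                    ("Oncologist 2", [0.20, 0.20, 0.20]),
                    ("True risk",    true_risks)]:
    m, d = totals(preds)
    print(f"{name}: {m:.0f} mastatectomies, {d:.1f} deaths, "
          f"{m + 4 * d:.0f} mastatectomy equivalents")
```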

Comparing Predictions Identify cohort. Obtain predictions (or information necessary for prediction) at inception. Provide uniform treatment to cohort or at least treat independent of (blinded to) prediction. Determine outcomes. Scenario: What would have happened if treatment were based on predicted risk?

Value of Prognostic Information
Doctors and patients like prognostic information, but it is hard to assess its value. The most objective approach is decision-analytic. Consider: What decision is to be made? What are the costs of errors? What is the cost of the test?

Common Problems with Studies of Prognostic Tests See Chapter 7

Comparing Predictions
Compare ROC curves and AUROCs; reclassification tables*, Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI). See the Jan. 30, 2008 issue of Statistics in Medicine (? and EBD Edition 2 ?).
*Pencina MJ, et al. Stat Med. 2008 Jan 30;27(2):157-72.

ABCD2 (Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92)

Risk Factor                                                  Points
Age ≥ 60 years                                               1
Blood pressure: SBP ≥ 140 or DBP ≥ 90                        1
Clinical features of TIA:
  Unilateral weakness (with or without speech impairment)    2
  Speech impairment without unilateral weakness              1
Duration:
  TIA duration ≥ 60 minutes                                  2
  TIA duration 10 – 59 minutes                               1
Diabetes diagnosed by a physician                            1
Total ABCD2 score                                            0 – 7

ABCD2 (Calibration)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

Score   % of TIA Patients   90-Day Stroke Risk
0 – 3   34%                 3.1%
4 – 5   45%                 9.8%
6 – 7   21%                 17.8%

ABCD2 (Discrimination)
Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

Score   90-Day Stroke   No 90-Day Stroke   LR
6 – 7   40.6%           19.0%              2.1
4 – 5   47.9%           44.7%              1.1
0 – 3   11.5%           36.3%              0.3
Total   100%            100%

ABCD2 (Discrimination) Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

ABCD2 (Discrimination) Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

Better Discrimination

Replace This. Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

With This

Replace This. Johnston SC, et al. Lancet. 2007 Jan 27;369(9558):283-92

With This

What to do with the ABCD2 score? Recommendation is to admit TIA patients with ABCD2 > 5 and consider admission for ABCD2 4 – 5. Could give tPA if they have a stroke. Accelerated work-up. (? evidence that accelerated work-up actually improves outcomes.)

Combining Tests/Diagnostic Models
–Importance of test non-independence
–Recursive Partitioning
–Logistic Regression
–Variable (Test) Selection
–Importance of validation separate from derivation (calibration and discrimination revisited)

Combining Tests Example
Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam as dichotomous tests for Trisomy 21*
*Cicero S, Rembouskos G, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3):218-23.

If NT ≥ 3.5 mm Positive for Trisomy 21* *What’s wrong with this definition?

In general, don’t make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff I did it here to make the discussion of multiple tests easier I arbitrarily chose to call ≥ 3.5 mm positive

One Dichotomous Test

Nuchal Translucency   Trisomy 21 (D+)   D-     LR
≥ 3.5 mm              212               478    7.0
< 3.5 mm              121               4745   0.4
Total                 333               5223

Do you see that the LR of 7.0 is (212/333)/(478/5223)?
Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)

Nuchal Translucency
Sensitivity = 212/333 = 64%
Specificity = 4745/5223 = 91%
Prevalence = 333/(333 + 5223) = 6% (Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21)
PPV = 212/(212 + 478) = 31%
NPV = 4745/(4745 + 121) = 97.5%*
* Not that great; prior to the test, P(D-) = 94%

Clinical Scenario – One Test
Pre-test probability of Down's = 6%. NT positive.
Pre-test prob: 0.06; pre-test odds: 0.06/0.94 = 0.064
LR(+) = 7.0
Post-test odds = pre-test odds × LR(+) = 0.064 × 7.0 ≈ 0.44
Post-test prob = 0.44/(1 + 0.44) = 0.31
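A minimal sketch of this probability-to-odds, times LR, back-to-probability update; the helper function names are mine, not from the lecture:

```python
# Convert probability to odds, multiply by the likelihood ratio, convert back.
def prob_to_odds(p):
    return p / (1 - p)

def odds_to_prob(o):
    return o / (1 + o)

def post_test_prob(pre_test_prob, lr):
    return odds_to_prob(prob_to_odds(pre_test_prob) * lr)

# Pre-test probability 6%, positive nuchal translucency (LR+ = 7.0):
print(round(post_test_prob(0.06, 7.0), 2))  # 0.31
```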

NT Positive Pre-test Prob = 0.06 P(Result|Trisomy 21) = 0.64 P(Result|No Trisomy 21) = 0.09 Post-Test Prob = ? Slide Rule

Nasal bone seen: NBA = "No", negative for Trisomy 21.
Nasal bone absent: NBA = "Yes", positive for Trisomy 21.

Second Dichotomous Test

Nasal Bone Absent   Tri21+   Tri21-   LR
Yes                 229      129      27.8
No                  104      5094     0.3
Total               333      5223

Do you see that the LR of 27.8 is (229/333)/(129/5223)?

Clinical Scenario – Two Tests (Using Probabilities)
Pre-test probability of Trisomy 21 = 6%
NT positive for Trisomy 21 (≥ 3.5 mm)  ->  post-NT probability of Trisomy 21 = 31%
Nasal bone absent  ->  post-NBA probability of Trisomy 21 = ?

Clinical Scenario – Two Tests (Using Odds)
Pre-test odds of Tri21 = 0.064
NT positive (LR = 7.0)  ->  post-test odds of Tri21 = 0.44
Nasal bone absent (LR = 27.8?)  ->  post-test odds of Tri21 = 0.44 × 27.8? = 12.4? (P = 12.4/(1 + 12.4) = 92.5%?)

Clinical Scenario – Two Tests Pre-Test Probability of Trisomy 21 = 6% NT ≥ 3.5 mm AND Nasal Bone Absent

Question Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination? i.e., can we combine the positive results by multiplying their LRs? LR(NT+, NBE +) = LR(NT +) x LR(NBE +) ? = 7.0 x 27.8 ? = 194 ?

Answer = No

NT    NBE   Trisomy 21+   %      Trisomy 21-   %      LR
Pos   Pos   158           47%    36            0.7%   69
Pos   Neg   54            16%    442           8.5%   1.9
Neg   Pos   71            21%    93            1.8%   12
Neg   Neg   50            15%    4652          89%    0.2
Total       333           100%   5223          100%

Post-test probability = 4.40/(1 + 4.40) = 81%, not 92.5%.

Non-Independence Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.

Clinical Scenario (Using Odds)
Pre-test odds of Tri21 = 0.064
NT+/NBE+ (LR = 68.8)  ->  post-test odds = 0.064 × 68.8 = 4.40 (P = 4.40/(1 + 4.40) = 81%, not 92.5%)
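A short sketch (mine, not from the lecture) using the joint counts from the table above, showing how much the naive product of the two LRs overstates the evidence:

```python
# Counts from Cicero et al. 2004: 333 Trisomy 21 fetuses, 5223 normal fetuses.
d_pos, d_neg = 333, 5223

def lr(count_dpos, count_dneg):
    return (count_dpos / d_pos) / (count_dneg / d_neg)

lr_nt = lr(212, 478)    # NT >= 3.5 mm                        ~ 7.0
lr_nba = lr(229, 129)   # nasal bone absent                   ~ 27.8
lr_both = lr(158, 36)   # NT >= 3.5 mm AND nasal bone absent  ~ 68.8

pre_test_odds = 0.06 / 0.94
naive = pre_test_odds * lr_nt * lr_nba   # assumes the two tests are independent
correct = pre_test_odds * lr_both        # uses the observed joint result

print(f"naive post-test prob:   {naive / (1 + naive):.0%}")      # ~93%
print(f"correct post-test prob: {correct / (1 + correct):.0%}")  # ~81%
```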

Non-Independence

Non-Independence of NT and NBA Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm only 2.0% had nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had nasal bone absent. Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of “Afro-Caribbean” ethnicity had both larger NTs and more frequent absence of the nasal bone. In Trisomy 21 (D+) fetuses, normal NT was associated with the presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.

Non-Independence Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency? A second positive NT would do little to increase your certainty of Trisomy 21. If it was false positive the first time around, it is likely to be false positive the second time.

Reasons for Non- Independence Tests measure the same aspect of disease. One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly and the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.

Reasons for Non- Independence Heterogeneity of Disease (e.g. spectrum of severity)*. Heterogeneity of Non-Disease. (See EBD page 158.) *In this example, Down’s syndrome is the only chromosomal abnormality considered, so disease is fairly homogeneous

Unless tests are independent, we can’t combine results by multiplying LRs

Ways to Combine Multiple Tests
On a group of patients (derivation set), perform the multiple tests and (independently*) determine true disease status (apply the gold standard). Then:
–Measure the LR for each possible combination of results
–Recursive Partitioning
–Logistic Regression
*Beware of incorporation bias

Determine LR for Each Result Combination

NT    NBA   Tri21+   %      Tri21-   %      LR    Post-Test Prob*
Pos   Pos   158      47%    36       0.7%   69    81%
Pos   Neg   54       16%    442      8.5%   1.9   11%
Neg   Pos   71       21%    93       1.8%   12    43%
Neg   Neg   50       15%    4652     89%    0.2   1%
Total       333      100%   5223     100%

*Assumes pre-test prob = 6%

Sort by LR (Descending)

NT    NBA   Tri21+   %      Tri21-   %       LR
Pos   Pos   158      47%    36       0.70%   69
Neg   Pos   71       21%    93       1.80%   12
Pos   Neg   54       16%    442      8.5%    1.9
Neg   Neg   50       15%    4652     89%     0.2

Apply Chapter 4 – Multilevel Tests Now you have a multilevel test (In this case, 4 levels.) Have LR for each test result Can create ROC curve and calculate AUROC Given pre-test probability and treatment threshold probability (C/(B+C)), can find optimal cutoff.
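A small sketch of that last step (mine, not from the lecture), using the four combined-result LRs from the tables above, a pre-test probability of 6%, and the 2% CVS threshold used below:

```python
# Convert each combined result's LR to a post-test probability and compare it
# with the treatment (CVS) threshold probability.
lrs = {("NT+", "NBA+"): 69, ("NT-", "NBA+"): 12,
       ("NT+", "NBA-"): 1.9, ("NT-", "NBA-"): 0.2}

pre_test_odds = 0.06 / 0.94
threshold = 0.02   # CVS threshold probability

for result, lr in lrs.items():
    odds = pre_test_odds * lr
    p = odds / (1 + odds)
    action = "CVS" if p > threshold else "no CVS"
    print(result, f"post-test prob = {p:.0%} -> {action}")
```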

Create ROC Table

NT    NBE   Tri21+   Sens (cum.)   Tri21-   1 - Spec (cum.)   LR
                     0%                     0%
Pos   Pos   158      47%           36       0.70%             69
Neg   Pos   71       68%           93       3%                12
Pos   Neg   54       84%           442      11%               1.9
Neg   Neg   50       100%          4652     100%              0.2

AUROC = 0.896

Optimal Cutoff

NT    NBE   LR    Post-Test Prob
Pos   Pos   69    81%
Neg   Pos   12    43%
Pos   Neg   1.9   11%
Neg   Neg   0.2   1%

Assume pre-test probability = 6%; threshold probability for CVS is 2%.

Determine LR for Each Result Combination 2 dichotomous tests: 4 combinations 3 dichotomous tests: 8 combinations 4 dichotomous tests: 16 combinations Etc. 2 3-level tests: 9 combinations 3 3-level tests: 27 combinations Etc.

Determine LR for Each Result Combination How do you handle continuous tests? Not always practical for groups of tests.

Recursive Partitioning Measure NT First

Recursive Partitioning Examine Nasal Bone First

Do Nasal Bone Exam First Better separates Trisomy 21 from chromosomally normal fetuses If your threshold for CVS is between 11% and 43%, you can stop after the nasal bone exam If your threshold is between 1% and 11%, you should do the NT exam only if the NBE is normal.

Recursive Partitioning: Examine Nasal Bone First. CVS if P(Trisomy 21) > 5%.

Recursive Partitioning Same as Classification and Regression Trees (CART) Don’t have to work out probabilities (or LRs) for all possible combinations of tests, because of “tree pruning”
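For readers who want to experiment, here is a minimal sketch (assuming scikit-learn is installed; not part of the original lecture) of fitting a small classification tree to individual-level data rebuilt from the NT/nasal-bone counts above:

```python
# Rebuild individual-level data from the 2x2x2 counts above and fit a CART
# tree. Features: NT >= 3.5 mm (0/1) and nasal bone absent (0/1).
from sklearn.tree import DecisionTreeClassifier, export_text

counts = [  # (nt, nba, trisomy21, n)
    (1, 1, 1, 158), (1, 0, 1, 54), (0, 1, 1, 71), (0, 0, 1, 50),
    (1, 1, 0, 36), (1, 0, 0, 442), (0, 1, 0, 93), (0, 0, 0, 4652),
]
X, y = [], []
for nt, nba, d, n in counts:
    X += [[nt, nba]] * n
    y += [d] * n

tree = DecisionTreeClassifier(max_depth=2, criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=["NT >= 3.5 mm", "Nasal bone absent"]))
print(tree.predict_proba([[1, 1]]))  # class probabilities for an NT+, NBA+ fetus
```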

Recursive Partitioning Does not deal well with continuous test results* *when there is a monotonic relationship between the test result and the probability of disease

Logistic Regression

Ln(Odds(D+)) = a + b_NT × NT + b_NBA × NBA + b_interact × NT × NBA

where a positive result ("+") is coded as 1 and a negative result ("-") as 0.

More on this later in ATCR!
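A sketch of fitting that model (assuming the statsmodels package is available; the data are rebuilt from the counts above and the variable names are mine):

```python
# Fit ln(odds(D+)) = a + b_NT*NT + b_NBA*NBA + b_interact*NT*NBA on
# individual-level data rebuilt from the 2x2x2 counts above.
import numpy as np
import statsmodels.api as sm

counts = [  # (nt, nba, trisomy21, n)
    (1, 1, 1, 158), (1, 0, 1, 54), (0, 1, 1, 71), (0, 0, 1, 50),
    (1, 1, 0, 36), (1, 0, 0, 442), (0, 1, 0, 93), (0, 0, 0, 4652),
]
rows = [(nt, nba, d) for nt, nba, d, n in counts for _ in range(n)]
nt, nba, y = (np.array(col) for col in zip(*rows))

X = sm.add_constant(np.column_stack([nt, nba, nt * nba]))
fit = sm.Logit(y, X).fit(disp=False)
print(fit.params)  # [a, b_NT, b_NBA, b_interact]; the negative interaction
                   # coefficient reflects the non-independence of the two tests
```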

Why does logistic regression model log-odds instead of probability? Related to why the LR Slide Rule’s log-odds scale helps us visualize combining test results.

Probability of Trisomy 21 vs. Maternal Age

Ln(Odds) of Trisomy 21 vs. Maternal Age

Combining 2 Continuous Tests > 1% Probability of Trisomy 21 < 1% Probability of Trisomy 21

Choosing Which Tests to Include in the Decision Rule
Have focused on how to combine results of two or more tests, not on which of several tests to include in a decision rule.
Variable selection options include:
–Recursive partitioning
–Automated stepwise logistic regression
Choice of variables in the derivation data set requires confirmation in a separate validation data set.

Variable Selection Especially susceptible to overfitting

Need for Validation: Example* Study of clinical predictors of bacterial diarrhea. Evaluated 34 historical items and 16 physical examination questions. 3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%; specificity 60% for all 3). Would these 3 be the best predictors in a new dataset? Would they have the same sensitivity and specificity? *DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. Oct 1985;76(4):

Need for Validation Develop prediction rule by choosing a few tests and findings from a large number of candidates. Takes advantage of chance variations* in the data. Predictive ability of rule will probably disappear when you try to validate on a new dataset. Can be referred to as “overfitting.” e.g., low serum calcium in 12 children with hemolytic uremic syndrome and bad outcomes

VALIDATION No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule. Beware of studies that use a “validation set” to tweak the model. This is really just a second derivation step.

Prognostic Tests and Multivariable Diagnostic Models
Commonly express results in terms of a probability:
--risk of the outcome by a fixed time point (prognostic test)
--posterior probability of disease (diagnostic model)
Need to assess both calibration and discrimination.

Validation Dataset Measure all the variables needed for the model. Determine disease status (D+ or D-) on all subjects.

VALIDATION
Calibration
-- Divide the dataset into probability groups (deciles, quintiles, …) based on the model (no tweaking allowed).
-- In each group, compare the actual D+ proportion to the model-predicted probability.

VALIDATION
Discrimination
-- Test result is the model-predicted probability of disease.
-- Use the "Walking Man" to draw the ROC curve and calculate the AUROC.
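A minimal sketch of both validation checks (mine, not from the lecture) on a hypothetical validation dataset, where pred holds the frozen model's predicted probabilities and obs the observed D+/D- status:

```python
# Calibration by quantile groups plus the c-statistic (AUROC).
import numpy as np

def validate(pred, obs, n_groups=5):
    pred, obs = np.asarray(pred, float), np.asarray(obs, int)

    # Calibration: within each quantile group of predicted risk, compare the
    # observed D+ proportion with the mean predicted probability (no re-fitting).
    order = np.argsort(pred)
    for chunk in np.array_split(order, n_groups):
        print(f"predicted {pred[chunk].mean():.2f}  observed {obs[chunk].mean():.2f}")

    # Discrimination: c-statistic = P(predicted risk of a random D+ subject
    # exceeds that of a random D- subject), ties counted as 1/2.
    diff = pred[obs == 1][:, None] - pred[obs == 0][None, :]
    print(f"AUROC = {(diff > 0).mean() + 0.5 * (diff == 0).mean():.2f}")

# Simulated, well-calibrated example data (stand-ins for a real validation set):
rng = np.random.default_rng(0)
pred = rng.uniform(0, 1, 500)
obs = (rng.uniform(0, 1, 500) < pred).astype(int)
validate(pred, obs)
```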

Outline of Topics
Prognostic Tests
–Differences from diagnostic tests
–Quantifying prediction: calibration and discrimination
–Comparing predictions
–Value of prognostic information
–Example: ABCD2
Combining Tests/Diagnostic Models
–Importance of test non-independence
–Recursive Partitioning
–Logistic Regression
–Variable (Test) Selection
–Importance of validation separate from derivation