Michael A. Kohn, MD, MPP 10/30/2008 Chapter 7 – Prognostic Tests Chapter 8 – Combining Tests and Multivariable Decision Rules.

1 Michael A. Kohn, MD, MPP 10/30/2008 Chapter 7 – Prognostic Tests Chapter 8 – Combining Tests and Multivariable Decision Rules

2 Outline of Topics Prognostic Tests –Differences from diagnostic tests –Quantifying prediction: calibration and discrimination –Comparing predictions –Value of prognostic information Combining Tests/Diagnostic Models –Importance of test non-independence –Recursive Partitioning –Logistic Regression –Variable (Test) Selection –Importance of validation separate from derivation

3 Prognostic Tests Differences from diagnostic tests Validation/Quantifying Accuracy (calibration and discrimination) Comparing predictions by different people or different models Assessing the value of prognostic information

4 Difference from Diagnostic Tests Diagnostic tests are for prevalent disease; prognostic tests are for incident outcomes. Studies of prognostic tests have a longitudinal rather than cross-sectional time dimension.* (Fix a future time point and determine whether the dichotomous outcome has occurred at that point, e.g., death or recurrence at 5 years.) Prognostic test “result” is often a probability of having the outcome by the future time point (e.g. risk of death or recurrence by 5 years). *But studies of diagnostic tests that use clinical follow-up as a gold standard also are longitudinal.

5 Problems with estimating risk of outcome by a fixed future time point Equates all outcomes prior to the time point and all outcomes after the time point. (Death at 1 month is the same as death at 4 years and 11 months; 5-year-1-month survival is the same as > 10-year survival.) Cannot analyze subjects lost to follow-up prior to the time point. Time-to-event analysis (proportional hazards) often important/necessary, but it’s covered elsewhere in your curriculum.

6 Predicting Continuous Outcomes Time to death/recurrence Birth weight Weight loss/gain

7 Predicting Continuous Outcomes Glare, P., K. Virik, et al. (2003). "A systematic review of physicians' survival predictions in terminally ill cancer patients." Bmj 327(7408): 195-8.

8 Predicting Continuous Outcomes Can calculate Outcome actual - Outcome predicted for each individual.* Summarize with mean and SD of individual differences. Plot individual differences vs. actual outcome. Looks like a Bland-Altman plot. (And that’s all I’m going to say about predicting continuous outcomes.) *This does not make sense for dichotomous outcomes.
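The summary described on this slide can be sketched in a few lines of Python. The survival times below are hypothetical values chosen only for illustration; they are not taken from Glare et al.

```python
from statistics import mean, stdev

# Hypothetical survival times in weeks (illustrative only, not study data)
actual = [4, 10, 12, 20, 30]
predicted = [8, 12, 20, 24, 26]

# Individual difference: actual outcome minus predicted outcome
diffs = [a - p for a, p in zip(actual, predicted)]

mean_diff = mean(diffs)  # a negative mean suggests over-optimistic predictions
sd_diff = stdev(diffs)   # sample SD of the individual differences

print(diffs, mean_diff, round(sd_diff, 2))
```

Plotting `diffs` against `actual` would give the Bland-Altman-like picture the slide mentions.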

9 Prognostic Tests and Multivariable Diagnostic Models Commonly express results in terms of a probability --risk of the outcome by a fixed time point (prognostic test) --posterior probability of disease (diagnostic model) Need to assess both calibration and discrimination.

10 Example* Oncologists estimated the probability of “cure” (5-year disease-free survival) in each of 96 cancer patients. After 5 years, 70 (of the 96) died or had recurrence, and 26 (27%) were “cured.” *Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.

11

Patient ID   Oncologist's Predicted Probability   Disease-Free Survival
1            1%                                   0
2            25%                                  0
3            50%                                  0
4            90%                                  1
5            60%                                  0
6            40%                                  0
7            45%                                  0
8            35%                                  0
9                                                 1
10           10%                                  0
11           75%                                  1
12           55%                                  0

12 How do you assess the validity of the predictions?

13 Calibration* How accurate are the predicted probabilities? –Break the population into groups –Compare actual and predicted probabilities for each group *Related to Goodness-of-Fit and diagnostic model validation, which will be discussed shortly.
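A minimal sketch of the grouping step, using the eleven patients from slide 11 whose predicted probability is legible in this transcript (patient 9 is left out because the prediction is missing). The single cutoff at 50% and the coding 1 = disease-free at 5 years are assumptions made for illustration; the study itself used its own grouping.

```python
# (predicted probability of cure, outcome) for the patients on slide 11.
# Outcome coding 1 = disease-free at 5 years is assumed.
patients = [(0.01, 0), (0.25, 0), (0.50, 0), (0.90, 1), (0.60, 0), (0.40, 0),
            (0.45, 0), (0.35, 0), (0.10, 0), (0.75, 1), (0.55, 0)]

def calibration_table(data, cutoffs):
    """Group by predicted probability; return (n, mean predicted, observed) per group."""
    edges = [0.0] + list(cutoffs) + [1.0 + 1e-9]
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        group = [(p, y) for p, y in data if lo <= p < hi]
        if group:
            n = len(group)
            mean_pred = sum(p for p, _ in group) / n
            observed = sum(y for _, y in group) / n
            rows.append((n, mean_pred, observed))
    return rows

rows = calibration_table(patients, cutoffs=[0.5])  # hypothetical cutoff
for n, mean_pred, observed in rows:
    print(f"n={n}  mean predicted={mean_pred:.2f}  observed={observed:.2f}")
```

In this small subset the mean predicted probability exceeds the observed proportion in both groups, which is the kind of miscalibration a calibration plot makes visible.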

14 Calibration Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.

15 Discrimination How well can the test separate subjects in the population from the mean probability to values closer to zero or 1? May be more generalizable. Often measured with C-statistic (AUROC).

16 Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.

17 Discrimination Mackillop, W. J. and C. F. Quirt (1997). "Measuring the accuracy of prognostic judgments in oncology." J Clin Epidemiol 50(1): 21-9.

18 Calibration vs. Discrimination Perfect calibration, no discrimination: –Oncologist assigned 27% probability of cure to each of the 96 patients. Perfect discrimination, poor calibration: –Mean* of oncologist-assigned “cure” probabilities was 50%, but every patient who died or had a recurrence was assigned a cure probability ≤ 40% and every patient who survived was assigned a probability ≥ 60%. *Mean = $\sum_i p_i n_i / N$ (It was actually 30% in the study.)
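The contrast can be checked numerically with a pairwise C-statistic: the proportion of (cured, not cured) pairs in which the cured patient received the higher predicted probability, counting ties as one half. This is a sketch; the function name is hypothetical, and the inputs are the eleven legible patients from slide 11 plus the constant-27% scenario described above.

```python
def c_statistic(preds_pos, preds_neg):
    """Fraction of (D+, D-) pairs ranked correctly; ties count as 0.5."""
    concordant = 0.0
    for p in preds_pos:
        for q in preds_neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(preds_pos) * len(preds_neg))

# Slide 11 patients (patient 9 omitted: prediction missing in the transcript)
cured = [0.90, 0.75]
not_cured = [0.01, 0.25, 0.50, 0.60, 0.40, 0.45, 0.35, 0.10, 0.55]
c_slide11 = c_statistic(cured, not_cured)

# Same prediction for everyone: well calibrated on average, no discrimination
c_constant = c_statistic([0.27] * 2, [0.27] * 9)

print(c_slide11, c_constant)
```

In this small subset both cured patients outrank every other patient, so discrimination is perfect even though the predicted probabilities themselves may be off.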

19 Calibration

20 Discrimination [figure]

21 Comparing Predictions Compare ROC Curves and AUROCs Reclassification Tables*, Net Reclassification Improvement (NRI), Integrated Discrimination Improvement (IDI) See Jan. 2008 Issue of Statistics in Medicine** (? and EBD Edition 2 ?) *Problem 8-1 has a reclassification table. **Pencina et al. Stat Med. 2008 Jan 30;27(2):157-72;

22 Value of Prognostic Information Why do you want to know prognosis? -- ALS, slow vs rapid progression -- GBM, expected survival -- Na-MELD Score vs. Na-MELD + Ascites

23 Value of Prognostic Information To inform treatment or other clinical decisions To inform (prepare) patients and their families To stratify by disease severity in clinical trials Altman, D. G. and P. Royston (2000). "What do we mean by validating a prognostic model?" Stat Med 19(4): 453-73.

24 Value of Prognostic Information Doctors and patients like prognostic information, but it is hard to assess its value. Most objective approach is decision-analytic. Consider: –What decision is to be made? –Costs of errors? –Cost of test?

25 Common Problems with Studies of Prognostic Tests See Chapter 7

26 Combining Tests/Diagnostic Models –Importance of test non-independence –Recursive Partitioning –Logistic Regression –Variable (Test) Selection –Importance of validation separate from derivation (calibration and discrimination revisited)

27 Combining Tests Example Prenatal sonographic Nuchal Translucency (NT) and Nasal Bone Exam as dichotomous tests for Trisomy 21* *Cicero, S., G. Rembouskos, et al. (2004). "Likelihood ratio for trisomy 21 in fetuses with absent nasal bone at the 11-14-week scan." Ultrasound Obstet Gynecol 23(3): 218-23.

28 If NT ≥ 3.5 mm Positive for Trisomy 21* *What’s wrong with this definition?

29

30 In general, don’t make multi-level tests like NT into dichotomous tests by choosing a fixed cutoff I did it here to make the discussion of multiple tests easier I arbitrarily chose to call ≥ 3.5 mm positive

31 One Dichotomous Test

Nuchal Translucency   Trisomy 21 D+   Trisomy 21 D-   LR
≥ 3.5 mm              212             478             7.0
< 3.5 mm              121             4745            0.4
Total                 333             5223

Do you see that the LR of 7.0 is (212/333)/(478/5223)? Review of Chapter 3: What are the sensitivity, specificity, PPV, and NPV of this test? (Be careful.)

32 Nuchal Translucency Sensitivity = 212/333 = 64% Specificity = 4745/5223 = 91% Prevalence = 333/(333+5223) = 6% (Study population: pregnant women about to undergo CVS, so high prevalence of Trisomy 21) PPV = 212/(212 + 478) = 31% NPV = 4745/(121 + 4745) = 97.5%* * Not that great; prior to test P(D-) = 94%
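As a quick check, the same quantities can be recomputed from the 2x2 counts. This is only a sketch; the variable names are arbitrary.

```python
# Counts from the nuchal translucency 2x2 table (positive = NT >= 3.5 mm)
tp, fp = 212, 478    # NT positive: Trisomy 21, chromosomally normal
fn, tn = 121, 4745   # NT negative: Trisomy 21, chromosomally normal

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
prevalence = (tp + fn) / (tp + fn + fp + tn)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)

# Likelihood ratios for a positive and a negative result
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(f"Sens {sensitivity:.0%}, Spec {specificity:.0%}, Prev {prevalence:.0%}, "
      f"PPV {ppv:.0%}, NPV {npv:.1%}, LR+ {lr_pos:.1f}, LR- {lr_neg:.1f}")
```

PPV and NPV computed this way apply only at the study's 6% prevalence, which is why the slide asks you to be careful.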

33 Clinical Scenario – One Test Pre-Test Probability of Down’s = 6% NT Positive Pre-test prob: 0.06 Pre-test odds: 0.06/0.94 = 0.064 LR(+) = 7.0 Post-Test Odds = Pre-Test Odds x LR(+) = 0.064 x 7.0 = 0.44 Post-Test prob = 0.44/(0.44 + 1) = 0.31
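The odds arithmetic on this slide can be wrapped in a small helper. The function name is hypothetical; the steps are exactly the three shown above (probability to odds, multiply by the LR, odds back to probability).

```python
def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert to odds, multiply by the LR, convert back to a probability."""
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)

# Pre-test probability 6%, positive NT with LR(+) = 7.0
p_after_nt = post_test_probability(0.06, 7.0)
print(round(p_after_nt, 2))
```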

34 NT Positive Pre-test Prob = 0.06 P(Result|Trisomy 21) = 0.64 P(Result|No Trisomy 21) = 0.09 Post-Test Prob = ? http://www.quesgen.com/Calculators/PostProdOfDisease/PostProdOfDisease.html Slide Rule

35 Nasal Bone Seen NBA=“No” Neg for Trisomy 21 Nasal Bone Absent NBA=“Yes” Pos for Trisomy 21

36 Second Dichotomous Test

Nasal Bone Absent   Tri21+   Tri21-   LR
Yes                 229      129      27.8
No                  104      5094     0.32
Total               333      5223

Do you see that the LR of 27.8 is (229/333)/(129/5223)?

37 Clinical Scenario – Two Tests, Using Probabilities Pre-Test Probability of Trisomy 21 = 6%. NT Positive for Trisomy 21 (≥ 3.5 mm): Post-NT Probability of Trisomy 21 = 31%. NBA Positive (no bone seen): Post-NBA Probability of Trisomy 21 = ?

38 Clinical Scenario – Two Tests, Using Odds Pre-Test Odds of Tri21 = 0.064. NT Positive (LR = 7.0): Post-Test Odds of Tri21 = 0.44. NBA Positive (LR = 27.8?): Post-Test Odds of Tri21 = 0.44 x 27.8? = 12.4? (P = 12.4/(1+12.4) = 92.5%?)

39 Clinical Scenario – Two Tests Pre-Test Probability of Trisomy 21 = 6% NT ≥ 3.5 mm AND Nasal Bone Absent

40 Question Can we use the post-test odds after a positive Nuchal Translucency as the pre-test odds for the positive Nasal Bone Examination? i.e., can we combine the positive results by multiplying their LRs? LR(NT+, NBE+) = LR(NT+) x LR(NBE+) ? = 7.0 x 27.8 ? = 194 ?

41 Answer = No

NT      NBE   Trisomy 21+   %      Trisomy 21-   %      LR
Pos     Pos   158           47%    36            0.7%   69
Pos     Neg   54            16%    442           8.5%   1.9
Neg     Pos   71            21%    93            1.8%   12
Neg     Neg   50            15%    4652          89%    0.2
Total         333           100%   5223          100%

The LR for NT+/NBE+ is 69, not 194. 158/(158 + 36) = 81%, not 92.5%.
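The comparison between the observed joint LR and the naive product of the two single-test LRs can be reproduced from the four-cell counts. This is a sketch; the dictionary layout and names are arbitrary.

```python
# (Trisomy 21+, Trisomy 21-) counts keyed by (NT result, NBE result)
counts = {
    ("+", "+"): (158, 36),
    ("+", "-"): (54, 442),
    ("-", "+"): (71, 93),
    ("-", "-"): (50, 4652),
}
n_pos = sum(d for d, _ in counts.values())  # all Trisomy 21 fetuses
n_neg = sum(n for _, n in counts.values())  # all chromosomally normal fetuses

def lr(d_pos, d_neg):
    """LR = P(result | D+) / P(result | D-)."""
    return (d_pos / n_pos) / (d_neg / n_neg)

# LR for each combination of results, measured directly
joint_lr = {combo: lr(*cell) for combo, cell in counts.items()}

# Single-test LRs from the margins, then the naive product
lr_nt_pos = lr(158 + 54, 36 + 442)
lr_nbe_pos = lr(158 + 71, 36 + 93)
naive_product = lr_nt_pos * lr_nbe_pos

print(round(joint_lr[("+", "+")], 1), round(naive_product, 1))
```

The directly measured LR for two positive results is far below the product, which is what non-independence looks like in the numbers.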

42 Non-Independence Absence of the nasal bone does not tell you as much if you already know that the nuchal translucency is ≥ 3.5 mm.

43 Clinical Scenario, Using Odds Pre-Test Odds of Tri21 = 0.064. NT+/NBE+ (LR = 68.8): Post-Test Odds = 0.064 x 68.8 = 4.40 (P = 4.40/(1+4.40) = 81%, not 92.5%)

44 Non-Independence

45 Non-Independence of NT and NBA Apparently, even in chromosomally normal fetuses, enlarged NT and absence of the nasal bone are associated. A false positive on the NT makes a false positive on the NBE more likely. Of normal (D-) fetuses with NT < 3.5 mm only 2.0% had nasal bone absent. Of normal (D-) fetuses with NT ≥ 3.5 mm, 7.5% had nasal bone absent. Some (but not all) of this may have to do with ethnicity. In this London study, chromosomally normal fetuses of “Afro-Caribbean” ethnicity had both larger NTs and more frequent absence of the nasal bone. In Trisomy 21 (D+) fetuses, normal NT was associated with the presence of the nasal bone, so a false negative on the NT was associated with a false negative on the NBE.
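The percentages quoted on this slide follow from the four-cell counts in the combined table. A short sketch that conditions on the NT result within each disease group (variable names are arbitrary):

```python
# Chromosomally normal (D-) fetuses: proportion with nasal bone absent
absent_given_nt_neg_normal = 93 / (93 + 4652)   # NT < 3.5 mm
absent_given_nt_pos_normal = 36 / (36 + 442)    # NT >= 3.5 mm

# Trisomy 21 (D+) fetuses: proportion with nasal bone present (false negative NBE)
present_given_nt_neg_t21 = 50 / (50 + 71)       # NT < 3.5 mm
present_given_nt_pos_t21 = 54 / (54 + 158)      # NT >= 3.5 mm

print(f"{absent_given_nt_neg_normal:.1%} vs {absent_given_nt_pos_normal:.1%}")
print(f"{present_given_nt_neg_t21:.1%} vs {present_given_nt_pos_t21:.1%}")
```

If the two tests were independent within each disease group, the two proportions on each line would be about equal.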

46 Non-Independence Instead of looking for the nasal bone, what if the second test were just a repeat measurement of the nuchal translucency? A second positive NT would do little to increase your certainty of Trisomy 21. If it was false positive the first time around, it is likely to be false positive the second time.

47 Reasons for Non-Independence Tests measure the same aspect of disease. One aspect of Down’s syndrome is slower fetal development; the NT decreases more slowly and the nasal bone ossifies later. Chromosomally NORMAL fetuses that develop slowly will tend to have false positives on BOTH the NT Exam and the Nasal Bone Exam.

48 Reasons for Non-Independence Tests measure the same aspect of disease. Consider exercise ECG (EECG) and radionuclide scan as tests for coronary artery disease (CAD) with the gold standard being anatomic narrowing of the arteries on angiogram. Both EECG and nuclide scan measure functional narrowing. In a patient without anatomic narrowing (a D- patient), coronary artery spasm could cause false positives on both tests.

49 Reasons for Non-Independence Spectrum of disease severity. In the EECG/nuclide scan example, CAD is defined as ≥70% stenosis on angiogram. A D+ patient with 71% stenosis is much more likely to have a false negative on both the EECG and the nuclide scan than a D+ patient with 99% stenosis.

50 Reasons for Non-Independence Spectrum of non-disease severity. In this example, CAD is defined as ≥70% stenosis on angiogram. A D- patient with 69% stenosis is much more likely to have a false positive on both the EECG and the nuclide scan than a D- patient with 33% stenosis.

51 Unless tests are independent, we can’t combine results by multiplying LRs

52 Ways to Combine Multiple Tests On a group of patients (derivation set), perform the multiple tests and (independently*) determine true disease status (apply the gold standard) Measure LR for each possible combination of results Recursive Partitioning Logistic Regression *Beware of incorporation bias

53 Determine LR for Each Result Combination

NT      NBA   Tri21+   %      Tri21-   %       LR    Post-Test Prob*
Pos     Pos   158      47%    36       0.7%    69    81%
Pos     Neg   54       16%    442      8.5%    1.9   11%
Neg     Pos   71       21%    93       1.8%    12    43%
Neg     Neg   50       15%    4652     89.1%   0.2   1%
Total         333      100%   5223     100%

*Assumes pre-test prob = 6%

54 Sort by LR (Descending)

NT    NBA   Tri21+   %     Tri21-   %        LR
Pos   Pos   158      47%   36       0.70%    69
Neg   Pos   71       21%   93       1.80%    12
Pos   Neg   54       16%   442      8.50%    1.9
Neg   Neg   50       15%   4652     89.10%   0.2

55 Apply Chapter 4 – Multilevel Tests Now you have a multilevel test (In this case, 4 levels.) Have LR for each test result Can create ROC curve and calculate AUROC Given pre-test probability and treatment threshold probability (C/(B+C)), can find optimal cutoff.

56 Create ROC Table

NT    NBE   Tri21+   Sens   Tri21-   1 - Spec   LR    AUROC (cumulative)
                     0%              0%               0
Pos   Pos   158      47%    36       0.70%      69    0.002
Neg   Pos   71       68%    93       3%         12    0.012
Pos   Neg   54       84%    442      11%        1.9   0.077
Neg   Neg   50       100%   4652     100%       0.2   0.896

57 AUROC = 0.896
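The area can be recomputed by walking through the result combinations in descending LR order and summing trapezoids. This is a sketch from the raw counts; computed this way the result is about 0.90, marginally different from the figure on the slide, which appears to have been built from rounded percentages.

```python
# (Tri21+, Tri21-) counts, sorted by descending LR:
# NT+/NBE+, NT-/NBE+, NT+/NBE-, NT-/NBE-
rows = [(158, 36), (71, 93), (54, 442), (50, 4652)]
n_pos = sum(d for d, _ in rows)
n_neg = sum(n for _, n in rows)

tpr = fpr = 0.0   # cumulative sensitivity and 1 - specificity
auroc = 0.0
roc_points = [(0.0, 0.0)]
for d_pos, d_neg in rows:
    new_tpr = tpr + d_pos / n_pos
    new_fpr = fpr + d_neg / n_neg
    # Trapezoid between the previous ROC point and the new one
    auroc += (new_fpr - fpr) * (tpr + new_tpr) / 2
    tpr, fpr = new_tpr, new_fpr
    roc_points.append((fpr, tpr))

print(round(auroc, 3))
```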

58 Optimal Cutoff

NT    NBE   LR    Post-Test Prob
Pos   Pos   69    0.81
Neg   Pos   12    0.43
Pos   Neg   1.9   0.11
Neg   Neg   0.2   0.01

Assume pre-test probability = 6%. Threshold for CVS is 2%.
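With a 2% threshold, the cutoff falls between the last two rows. A minimal sketch that applies the pre-test probability and threshold from this slide to the rounded LRs in the table (names are arbitrary):

```python
PRE_TEST_PROB = 0.06
CVS_THRESHOLD = 0.02

# Rounded LRs from the table, keyed by (NT, NBE), in descending order
lrs = [(("Pos", "Pos"), 69), (("Neg", "Pos"), 12),
       (("Pos", "Neg"), 1.9), (("Neg", "Neg"), 0.2)]

pre_test_odds = PRE_TEST_PROB / (1 - PRE_TEST_PROB)

def post_test_prob(lr):
    odds = pre_test_odds * lr
    return odds / (1 + odds)

# Result combinations whose post-test probability reaches the CVS threshold
do_cvs = [combo for combo, lr in lrs if post_test_prob(lr) >= CVS_THRESHOLD]
print(do_cvs)
```

Under these assumptions only the NT-negative, NBE-negative group falls below the threshold.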

59 Determine LR for Each Result Combination 2 dichotomous tests: 4 combinations 3 dichotomous tests: 8 combinations 4 dichotomous tests: 16 combinations Etc. 2 3-level tests: 9 combinations 3 3-level tests: 27 combinations Etc.
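The counts on this slide are just the product of the number of levels of each test. A small sketch that enumerates the combinations explicitly (the function name is hypothetical):

```python
from itertools import product

def result_combinations(levels_per_test):
    """Enumerate every combination of results, given each test's number of levels."""
    return list(product(*[range(k) for k in levels_per_test]))

print(len(result_combinations([2, 2])))      # two dichotomous tests
print(len(result_combinations([3, 3, 3])))   # three 3-level tests
```

Each combination needs enough D+ and D- subjects to estimate its own LR, which is why this approach quickly becomes impractical.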

60 Determine LR for Each Result Combination How do you handle continuous tests? Not always practical for groups of tests.

61 Recursive Partitioning Measure NT First

62 Recursive Partitioning Examine Nasal Bone First

63 Do Nasal Bone Exam First Better separates Trisomy 21 from chromosomally normal fetuses If your threshold for CVS is between 11% and 43%, you can stop after the nasal bone exam If your threshold is between 1% and 11%, you should do the NT exam only if the NBE is normal.
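The stopping rule can be sketched as a check of whether the NT result could move a patient across the treatment threshold after the nasal bone exam. The probabilities come from the four-cell counts at the study's 6% prevalence; the function name and the example thresholds (20% and 5%) are assumptions for illustration.

```python
# (Trisomy 21+, Trisomy 21-) counts keyed by (NT result, NBE result)
counts = {("+", "+"): (158, 36), ("+", "-"): (54, 442),
          ("-", "+"): (71, 93), ("-", "-"): (50, 4652)}

def leaf_prob(nt, nbe):
    """Probability of Trisomy 21 given both results (study prevalence)."""
    d_pos, d_neg = counts[(nt, nbe)]
    return d_pos / (d_pos + d_neg)

def nt_needed(nbe, threshold):
    """After the nasal bone exam, NT is worth doing only if its two possible
    results fall on opposite sides of the threshold."""
    leaves = [leaf_prob(nt, nbe) for nt in ("+", "-")]
    return min(leaves) < threshold <= max(leaves)

for threshold in (0.20, 0.05):
    print(threshold, {nbe: nt_needed(nbe, threshold) for nbe in ("+", "-")})
```

With a threshold of 20% neither branch needs the NT exam; with 5% it is needed only when the nasal bone is present (NBE negative).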

64 Recursive Partitioning Examine Nasal Bone First CVS if P(Trisomy 21) > 5%

65

66 Recursive Partitioning Same as Classification and Regression Trees (CART) Don’t have to work out probabilities (or LRs) for all possible combinations of tests, because of “tree pruning”

67 Tree Pruning: Goldman Rule* 8 “Tests” for Acute MI in ER Chest Pain Patient : 1.ST Elevation on ECG; 2.CP < 48 hours; 3.ST-T changes on ECG; 4.Hx of MI; 5.Radiation of Pain to Neck/LUE; 6.Longest pain > 1 hour; 7.Age > 40 years; 8.CP not reproduced by palpation. *Goldman L, Cook EF, Brand DA, et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med. 1988;318(13):797-803.

68 8 tests → 2^8 = 256 Combinations

69

70 Recursive Partitioning Does not deal well with continuous test results* *when there is a monotonic relationship between the test result and the probability of disease

71 Logistic Regression $\ln(\text{Odds}(D+)) = a + b_{NT}\,NT + b_{NBA}\,NBA + b_{interact}\,(NT)(NBA)$, where “+” = 1 and “-” = 0. More on this later in ATCR!
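With two dichotomous tests and an interaction term the model is saturated, so its maximum-likelihood coefficients can be read directly off the observed log-odds in the four cells; no iterative fitting is needed. The sketch below does this for the NT and nasal bone counts from the earlier table. The coefficient values are computed here, not taken from the slides, and the intercept reflects the study's 6% prevalence.

```python
import math

# (D+, D-) counts keyed by (NT, NBA), with 1 = positive and 0 = negative
cells = {(1, 1): (158, 36), (1, 0): (54, 442), (0, 1): (71, 93), (0, 0): (50, 4652)}

def log_odds(cell):
    d_pos, d_neg = cells[cell]
    return math.log(d_pos / d_neg)

a = log_odds((0, 0))                       # both tests negative
b_nt = log_odds((1, 0)) - a                # effect of NT+ when NBA-
b_nba = log_odds((0, 1)) - a               # effect of NBA+ when NT-
b_interact = log_odds((1, 1)) - a - b_nt - b_nba

def predicted_prob(nt, nba):
    z = a + b_nt * nt + b_nba * nba + b_interact * nt * nba
    return 1 / (1 + math.exp(-z))

print(round(a, 2), round(b_nt, 2), round(b_nba, 2), round(b_interact, 2))
print(round(predicted_prob(1, 1), 2))
```

The negative interaction coefficient is the regression counterpart of non-independence: two positive results add less on the log-odds scale than the sum of their separate effects.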

72 Why does logistic regression model log-odds instead of probability? Related to why the LR Slide Rule’s log-odds scale helps us visualize combining test results.

73 Probability of Trisomy 21 vs. Maternal Age

74 Ln(Odds) of Trisomy 21 vs. Maternal Age

75 Combining 2 Continuous Tests > 1% Probability of Trisomy 21 < 1% Probability of Trisomy 21

76 Logistic Regression Approach to the “R/O ACI patient”*

Variable                 Coefficient   MV Odds Ratio
Constant                 -3.93
Presence of chest pain   1.23          3.42
Pain major symptom       0.88          2.41
Male Sex                 0.71          2.03
Age 40 or less           -1.44         0.24
Age > 50                 0.67          1.95
Male over 50 years**     -0.43         0.65
ST elevation             1.314         3.72
New Q waves              0.62          1.86
ST depression            0.99          2.69
T waves elevated         1.095         2.99
T waves inverted         1.13          3.10
T wave + ST changes**    -0.314        0.73

*Selker HP, Griffith JL, D'Agostino RB. A tool for judging coronary care unit admission appropriateness, valid for both real-time and retrospective use. A time-insensitive predictive instrument (TIPI) for acute cardiac ischemia: a multicenter study. Med Care. Jul 1991;29(7):610-627. For corrected coefficients, see http://medg.lcs.mit.edu/cardiac/cpain.htm

77 Clinical Scenario* 71 y/o man with 2.5 hours of CP, substernal, non-radiating, described as “bloating.” Cannot say if same as prior MI or worse than prior angina. Hx of CAD, s/p CABG 10 yrs prior, stenting 3 years and 1 year ago. DM on Avandia. ECG: RBBB, Qs inferiorly. No ischemic ST-T changes. *Real patient seen by MAK 1 am 10/12/04

78

79

Variable                 Coefficient   Clinical Scenario   Result
Constant                 -3.93                             -3.93
Presence of chest pain   1.23          1                   1.23
Pain major symptom       0.88          1                   0.88
Sex                      0.71          1                   0.71
Age 40 or less           -1.44         0                   0
Age > 50                 0.67          1                   0.67
Male over 50 years       -0.43         1                   -0.43
ST elevation             1.314         0                   0
New Q waves              0.62          0                   0
ST depression            0.99          0                   0
T waves elevated         1.095         0                   0
T waves inverted         1.13          0                   0
T wave + ST changes      -0.314        0                   0
Sum (log odds)                                             -0.87
Odds of ACI                                                0.418952
Probability of ACI                                         30%
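The calculation on this slide is a sum of coefficient times indicator, followed by exponentiation. A sketch using the coefficients and the patient's indicator values as they appear in the table (the dictionary keys are shortened labels, not names from the original instrument):

```python
import math

CONSTANT = -3.93
coefficients = {
    "chest_pain": 1.23, "pain_major_symptom": 0.88, "male": 0.71,
    "age_40_or_less": -1.44, "age_over_50": 0.67, "male_over_50": -0.43,
    "st_elevation": 1.314, "new_q_waves": 0.62, "st_depression": 0.99,
    "t_waves_elevated": 1.095, "t_waves_inverted": 1.13, "t_wave_and_st": -0.314,
}
# The 71-year-old man from the clinical scenario: 1 = present, 0 = absent
patient = {
    "chest_pain": 1, "pain_major_symptom": 1, "male": 1,
    "age_40_or_less": 0, "age_over_50": 1, "male_over_50": 1,
    "st_elevation": 0, "new_q_waves": 0, "st_depression": 0,
    "t_waves_elevated": 0, "t_waves_inverted": 0, "t_wave_and_st": 0,
}

log_odds = CONSTANT + sum(coefficients[k] * patient[k] for k in coefficients)
odds = math.exp(log_odds)
probability = odds / (1 + odds)

print(round(log_odds, 2), round(odds, 6), f"{probability:.0%}")
```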

80 Choosing Which Tests to Include in the Decision Rule Have focused on how to combine results of two or more tests, not on which of several tests to include in a decision rule. Variable Selection Options include: Recursive partitioning Automated stepwise logistic regression Choice of variables in derivation data set requires confirmation in a separate validation data set.

81 Variable Selection Especially susceptible to overfitting

82 Need for Validation: Example* Study of clinical predictors of bacterial diarrhea. Evaluated 34 historical items and 16 physical examination questions. 3 questions (abrupt onset, > 4 stools/day, and absence of vomiting) best predicted a positive stool culture (sensitivity 86%; specificity 60% for all 3). Would these 3 be the best predictors in a new dataset? Would they have the same sensitivity and specificity? *DeWitt TG, Humphrey KF, McCarthy P. Clinical predictors of acute bacterial diarrhea in young children. Pediatrics. Oct 1985;76(4):551-556.

83 Need for Validation Develop prediction rule by choosing a few tests and findings from a large number of possibilities. Takes advantage of chance variations* in the data. Predictive ability of rule will probably disappear when you try to validate on a new dataset. Can be referred to as “overfitting.” e.g., low serum calcium in 12 children with hemolytic uremic syndrome and bad outcomes
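A small simulation illustrates how selecting from many candidates exploits chance. Everything here is hypothetical: 50 candidate findings and the outcome are independent coin flips, so no finding has any real predictive value. The seed, the sample size of 100, and the use of simple accuracy as the selection criterion are arbitrary choices for the sketch.

```python
import random

random.seed(0)  # arbitrary seed, only for reproducibility
N_SUBJECTS, N_CANDIDATES = 100, 50

def simulate(n):
    """Outcome and candidate findings are independent coin flips (pure noise)."""
    outcome = [random.randint(0, 1) for _ in range(n)]
    findings = [[random.randint(0, 1) for _ in range(n)] for _ in range(N_CANDIDATES)]
    return outcome, findings

def accuracy(finding, outcome):
    return sum(f == y for f, y in zip(finding, outcome)) / len(outcome)

deriv_y, deriv_x = simulate(N_SUBJECTS)   # derivation set
valid_y, valid_x = simulate(N_SUBJECTS)   # separate validation set

# "Variable selection": keep the finding that looks best in the derivation set
best = max(range(N_CANDIDATES), key=lambda j: accuracy(deriv_x[j], deriv_y))
deriv_acc = accuracy(deriv_x[best], deriv_y)
valid_acc = accuracy(valid_x[best], valid_y)

print(f"best finding #{best}: derivation {deriv_acc:.0%}, validation {valid_acc:.0%}")
```

The selected finding will typically look better than chance in the derivation set and fall back toward 50% in the validation set, because its apparent performance was only the maximum of 50 noisy estimates.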

84 VALIDATION No matter what technique (CART or logistic regression) is used, the tests included in a model and the way in which their results are combined must be tested on a data set different from the one used to derive the rule. Beware of studies that use a “validation set” to tweak the model. This is really just a second derivation step.

85 Validation Dataset Measure all the variables needed for the model. Determine disease status (D+ or D-) on all subjects.

86 VALIDATION Calibration -- Divide dataset into probability groups (deciles, quintiles, …) based on the model (no tweaking allowed). -- In each group, compare actual D+ proportion to model-predicted probability.

87 VALIDATION Discrimination -- Test result is model-predicted probability of disease. -- Use “Walking Man” to draw ROC curve and calculate AUROC.
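One way to sketch the walking-man construction: sort subjects from highest to lowest predicted probability, step up for each D+ subject and right for each D- subject, and add up the area under the path. The six subjects below are hypothetical, and this simple version assumes no tied predictions (ties would call for diagonal steps).

```python
def walking_man(preds, outcomes):
    """Return ROC points and AUROC; step up for each D+, right for each D-."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    ordered = sorted(zip(preds, outcomes), key=lambda t: t[0], reverse=True)
    x = y = area = 0.0
    points = [(0.0, 0.0)]
    for _, diseased in ordered:
        if diseased:
            y += 1 / n_pos            # step up: sensitivity increases
        else:
            x += 1 / n_neg            # step right: 1 - specificity increases
            area += y * (1 / n_neg)   # column of area under the current height
        points.append((x, y))
    return points, area

# Hypothetical validation subjects: model-predicted probability, true status
preds = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
outcomes = [1, 1, 0, 1, 0, 0]
points, auroc = walking_man(preds, outcomes)
print(round(auroc, 3))
```

Eight of the nine D+/D- pairs are ranked correctly in this example, so the area equals 8/9.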

88 Outline of Topics Prognostic Tests –Differences from diagnostic tests –Quantifying prediction: calibration and discrimination –Comparing predictions –Value of prognostic information Combining Tests/Diagnostic Models –Importance of test non-independence –Recursive Partitioning –Logistic Regression –Variable (Test) Selection –Importance of validation separate from derivation

