Common Errors by Teachers and Proponents of EBM

1 Common Errors by Teachers and Proponents of EBM
Thomas B. Newman, MD, MPH with thanks to Michael Kohn, MD, MPP and Andi Marmor, MD Evidence-Based Pediatrics SIG, 2012

2

3 Outline/Menu
Interval likelihood ratios
  Septic arthritis
When not to use likelihood ratios
  UTI in young febrile children
Critical appraisal of studies of diagnostic tests: Beyond the checklist
  Signs and symptoms of appendicitis
Getting the most out of ROC curves (LAST YEAR)
  Meningitis in young infants
  ROC curve demonstration

4 Septic Arthritis Bacterial infection in a joint.

5 Does this Adult Patient Have Septic Arthritis? JAMA. 2007;297:1478-1488.
“A 48-year-old woman…presents to the emergency department with a 2-day history of a red, swollen right knee that is painful to touch…. On examination, she is afebrile and has a right knee effusion…An arthrocentesis is performed and initial laboratory results show a negative Gram stain...”
Pre-Test Probability of Septic Arthritis = 38%
Synovial Fluid WBC Count = 48,000/µL
Post-Test Probability of Septic Arthritis = ?

6 Test Characteristics of Synovial Fluid Studies
Margaretten, M. E. et al. JAMA 2007;297: Copyright restrictions may apply.

7 Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at Different Cutoffs
WBC (/µL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3
Synovial WBC Count = 48,000/µL
Which LR should we use?

8 Synovial WBC Count = 48,000/µL
Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at 3 Different Cutoffs
WBC (/µL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3    <- JAMA authors used this one

9 Clinical Scenario Synovial WBC = 48,000/µL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(+) = 2.9 (According to JAMA authors)
Post-Test Odds = Pre-Test Odds x LR(+) = 0.61 x 2.9 = 1.77
Post-Test prob = 1.77/(1.77+1) = 0.64
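
The odds arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation; the function name `post_test_probability` is an assumption chosen for this example.

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    # Probability -> odds
    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    # Post-test odds = pre-test odds x likelihood ratio
    post_test_odds = pre_test_odds * lr
    # Odds -> probability
    return post_test_odds / (1.0 + post_test_odds)


# Slide values: pre-test probability 0.38, LR(+) = 2.9 (the LR used by the JAMA authors).
jama_result = post_test_probability(0.38, 2.9)
print(f"Post-test probability: {jama_result:.2f}")
```

Carrying full precision through the intermediate steps gives the same rounded answer as the slide's hand calculation.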

10 Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at 3 Different Cutoffs
WBC (/µL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3
Synovial WBC Count = 48,000/µL
Which LR should we use?

11 Sensitivity, Specificity, LR(+), and LR(-) of the Synovial Fluid WBC Count for Septic Arthritis at 3 Different Cutoffs
WBC (/µL)    Sensitivity   Specificity   LR+    LR-
>100,000     29%           99%           29.0   0.7
>50,000      62%           92%           7.8    0.4
>25,000      77%           73%           2.9    0.3
Synovial WBC Count = 48,000/µL
Which LR should we use? NONE of THESE!

12 LR(result) = P(result|D+)/P(result|D-)
Likelihood Ratios
LR(result) = [P(Result) in patients WITH disease] / [P(Result) in patients WITHOUT disease]
LR(result) = P(result|D+)/P(result|D-)

13 Likelihood Ratio
WBC (/µL) Interval    % of D+   % of D-   Interval LR
>100,000              29%       1%        29.0
>50,000-100,000       33%       7%        4.7
>25,000-50,000        15%       19%       0.8
0-25,000              23%       73%       0.3

14 Likelihood Ratio
WBC (/µL) Interval    % of D+   % of D-   Interval LR
>100,000              29%       1%        29.0
>50,000-100,000       33%       7%        4.7
>25,000-50,000        15%       19%       0.8    <- More appropriate LR?
0-25,000              23%       73%       0.3
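
The interval table can be derived from the cutoff-based sensitivities and specificities on slide 7 by differencing adjacent cutoffs. The following Python sketch shows one way to do that; the names `CUTOFFS`, `INTERVAL_LABELS`, and `interval_likelihood_ratios` are illustrative assumptions, not from the presentation.

```python
# Cumulative sensitivity and specificity at each cutoff (slide 7), highest cutoff first.
CUTOFFS = [
    (">100,000", 0.29, 0.99),
    (">50,000", 0.62, 0.92),
    (">25,000", 0.77, 0.73),
]
INTERVAL_LABELS = [">100,000", ">50,000-100,000", ">25,000-50,000", "0-25,000"]


def interval_likelihood_ratios(cutoffs):
    """Return a list of (P(result|D+), P(result|D-), LR), one tuple per interval."""
    # Cumulative fraction of D+ (sensitivity) and D- (1 - specificity) above each
    # cutoff. The final 1.0 closes the lowest interval, which holds everyone else.
    cum_dpos = [sens for _, sens, _ in cutoffs] + [1.0]
    cum_dneg = [1.0 - spec for _, _, spec in cutoffs] + [1.0]
    results = []
    prev_pos, prev_neg = 0.0, 0.0
    for pos, neg in zip(cum_dpos, cum_dneg):
        p_dpos = pos - prev_pos  # fraction of D+ falling in this interval
        p_dneg = neg - prev_neg  # fraction of D- falling in this interval
        results.append((p_dpos, p_dneg, p_dpos / p_dneg))
        prev_pos, prev_neg = pos, neg
    return results


rows = interval_likelihood_ratios(CUTOFFS)
for label, (p_dpos, p_dneg, lr) in zip(INTERVAL_LABELS, rows):
    print(f"{label:>16}  D+ {p_dpos:.0%}  D- {p_dneg:.0%}  LR {lr:.1f}")
```

A result of 48,000/µL falls in the >25,000-50,000 interval, so the interval LR for that row is the one that applies to the clinical scenario.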

15 LR = Slope of ROC Curve
[Figure: ROC curve segment between the > 50k and > 25k cutoffs; rise = 15%, run = 19%]
Slope = 15%/19% = 0.8

16 Clinical Scenario Synovial WBC = 48,000/µL
Pre-test prob: 0.38
Pre-test odds: 0.38/0.62 = 0.61
LR(WBC btw 25,000 and 50,000) = 0.8
Post-Test Odds = Pre-Test Odds x LR(48) = 0.61 x 0.8 = 0.49
Post-Test prob = 0.49/(0.49+1) = 0.33

17 Doing it right makes a difference
From JAMA paper: “Her synovial WBC count of 48,000/µL increases the probability from 38% to 64%.” (Used LR = 2.9)
Alternative calculation: “Her synovial WBC count of 48,000/µL decreases the probability from 38% to 33%.” (Used LR = 0.8)

18 Does This Dyspneic Patient in the Emergency Department Have Congestive Heart Failure? JAMA. 2005;294: How to interpret serum BNP (B-type Natriuretic Peptide) results? “In this case, a BNP level could be very helpful. If it were less than 100 pg/mL, heart failure would be extremely unlikely (LR 0.09). If it were elevated, the probability of heart failure is higher but not diagnostic.”

19 Summary of Operating Characteristics of Serum BNP in Emergency Department Patients
Wang, C. S. et al. JAMA 2005;294: Copyright restrictions may apply.

20 When NOT to use LR

21 Background Black children (at least girls) appear to be at lower risk of UTI (RR ~0.3) Circumcised boys are at much lower risk than uncircumcised boys (RR ~0.1) In diagnosing UTI, it makes sense to use both history findings like these with physical examination (height of fever, etc.) and laboratory (urine white cells) But there is a very important difference!

22 Does This Child Have a UTI?
JAMA. 2007;298(24):

23 Does This Child Have a UTI?
JAMA. 2007;298(24):

24 What is wrong with using LRs for these risk factors?
LR will vary tremendously with the prevalence of the risk factor in each study!

25 Definitions
                         Disease
Risk factor or
Test Result      Yes      No       Total
Present (+)      a        b        a+b
Absent (-)       c        d        c+d
Total            a+c      b+d      N
LR+ = [a/(a+c)] / [b/(b+d)]
LR- = [c/(a+c)] / [d/(b+d)]
OR = ad/bc = LR+/LR-
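
To see why LRs for a risk factor shift with the prevalence of that risk factor while the OR stays fixed, the definitions above can be applied to two 2x2 tables. This is a sketch only: the cell counts below are hypothetical numbers chosen so that both tables have OR = 9, and the function name `two_by_two_stats` is an assumption for this example.

```python
def two_by_two_stats(a, b, c, d):
    """Return (LR+, LR-, OR) using the cell labels from the Definitions slide."""
    lr_pos = (a / (a + c)) / (b / (b + d))
    lr_neg = (c / (a + c)) / (d / (b + d))
    odds_ratio = (a * d) / (b * c)
    return lr_pos, lr_neg, odds_ratio


# Hypothetical table 1: risk factor is rare (1% of those without disease have it).
rare = two_by_two_stats(a=10, b=10, c=110, d=990)

# Hypothetical table 2: risk factor is common (50% of those without disease have it).
common = two_by_two_stats(a=90, b=500, c=10, d=500)

for name, (lr_pos, lr_neg, odds_ratio) in [("rare", rare), ("common", common)]:
    print(f"{name:>6}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, OR = {odds_ratio:.1f}")
```

With these made-up counts the same OR of 9 corresponds to an LR+ of about 8.3 when the risk factor is rare and 1.8 when it is common, which is the pattern the next two figure slides illustrate.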

26 Figure 8.9. Relationship between prior odds, LR+ and LR−, posterior odds and the OR. Panel A: Low prevalence of strong risk factor.

27 Figure 8.9. Relationship between prior odds, LR+ and LR−, posterior odds and the OR. Panel B: High prevalence of strong risk factor.

28 OR vs LR

29 Except in blacks, urinalysis and urine culture recommended for:
Additional problem: failing to quantify risks and benefits of tests and treatments, leading to overly aggressive testing recommendations
Except in blacks, urinalysis and urine culture recommended for:
Girls and uncircumcised boys 3-24 months with any fever of any duration, even if they look well and have an apparent source
Circumcised boys with any fever > 24 hours, even if they look well and have an apparent source
*Shaikh N et al. JAMA 2007;298: , figures 2 & 3

30 Critical Appraisal of Studies of Diagnostic Test Accuracy
Index Test = Test Being Evaluated Gold Standard = Test Used to Determine True Disease Status

31 Chapter 5 – Studies of Diagnostic Tests
Incorporation Bias – index test part of gold standard (Sensitivity Up, Specificity Up)
Verification/Referral Bias – positive index test increases referral to gold standard (Sensitivity Up, Specificity Down)
Double Gold Standard – positive index test causes application of definitive gold standard, negative index test results in clinical follow-up (Sensitivity Up, Specificity Up)*
Spectrum Bias – D+ sickest of the sick (Sensitivity Up); D- wellest of the well (Specificity Up)
*If cases resolve spontaneously.

32 Bias #2 Example: Visual assessment of jaundice in newborns
Study patients who are getting a bilirubin measurement
Ask clinicians to estimate extent of jaundice at time of blood draw
Compare with blood test

33 Visual Assessment of jaundice*: Results
Sensitivity of jaundice below the nipple line for bilirubin ≥ 12 mg/dL = 97% Specificity = 19% What is the problem? Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication. --Catherine D. DeAngelis, MD *Moyer et al., APAM 2000; 154:391

34 Bias #2: Verification Bias* -1
Inclusion criterion for study: gold standard test was done (in this case, blood test for bilirubin)
Subjects with positive index tests are more likely to get the gold standard and to be included in the study
Clinicians usually don’t order blood test for bilirubin if there is little or no jaundice
How does this affect sensitivity and specificity?
*AKA Work-up, Referral Bias, or Ascertainment Bias

35 Verification Bias
                             TSB >12   TSB < 12
Jaundice below nipple        a         b
No jaundice below nipple     c         d
Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
*AKA Work-up, Referral Bias, or Ascertainment Bias
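
One way to work out the direction of each bias is to start from a full-population 2x2 table and keep only the subjects who would have been verified. The sketch below does this with expected counts; all numbers are hypothetical (they are not from Moyer et al.), and the names `sens_spec` and `apply_verification` are assumptions for this example.

```python
def sens_spec(a, b, c, d):
    """Sensitivity a/(a+c) and specificity d/(b+d) for a 2x2 table."""
    return a / (a + c), d / (b + d)


def apply_verification(a, b, c, d, p_verify_pos, p_verify_neg):
    """Expected cell counts among subjects who actually get the gold standard."""
    # Index-test positives (a, b) are verified with probability p_verify_pos;
    # index-test negatives (c, d) are verified with probability p_verify_neg.
    return a * p_verify_pos, b * p_verify_pos, c * p_verify_neg, d * p_verify_neg


# Hypothetical full population: a, b = jaundice below nipple; c, d = no jaundice below nipple.
full_table = (80, 200, 20, 700)
true_sens, true_spec = sens_spec(*full_table)

# Assumption: every jaundiced infant gets a bilirubin test, but only 10% of the others do.
verified_table = apply_verification(*full_table, p_verify_pos=1.0, p_verify_neg=0.1)
observed_sens, observed_spec = sens_spec(*verified_table)

print(f"True:     sensitivity {true_sens:.3f}, specificity {true_spec:.3f}")
print(f"Observed: sensitivity {observed_sens:.3f}, specificity {observed_spec:.3f}")
```

Under these assumptions the verified sample loses mostly c and d subjects, so sensitivity is biased up and specificity is biased down.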

36 Double Gold Standard Bias
Two different “gold standards” One gold standard (usually an immediate, more invasive test, e.g., angiogram, surgery) is more likely to be applied in patients with positive index test Second gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test.

37 Double Gold Standard Bias
There are some patients in whom the two “gold standards” do not give the same answer Spontaneously resolving disease (positive with immediate invasive test, but not with follow-up) Newly occurring or newly detectable disease (positive with follow-up but not with immediate invasive test)

38 Effect of Double Gold Standard Bias: Spontaneously resolving disease
Test result will always agree with gold standard
Both sensitivity and specificity increase
Example: Joey has an intussusception that will resolve spontaneously.
If his ultrasound scan is positive, he will get a contrast enema that will show (and cure) the intussusception (true positive)
If his ultrasound scan is negative, his intussusception will resolve and we will think he never had one (true negative)
Ultrasound scan can’t be wrong!
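
The same effect can be shown with counts. In the sketch below, subjects with spontaneously resolving disease are classified as diseased if everyone gets the immediate invasive test, but those with a negative index test are reclassified as true negatives when follow-up is used instead. All counts are hypothetical, and the variable and function names are assumptions for this example.

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity from classified counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts by true state and index test result (positive, negative).
persistent_pos, persistent_neg = 45, 5    # disease that would not resolve
resolving_pos, resolving_neg = 30, 20     # disease that resolves spontaneously
healthy_pos, healthy_neg = 50, 850        # no disease

# Single gold standard: everyone gets the immediate invasive test,
# so resolving disease counts as disease whatever the index test showed.
single_sens, single_spec = sens_spec(
    tp=persistent_pos + resolving_pos,
    fn=persistent_neg + resolving_neg,
    fp=healthy_pos,
    tn=healthy_neg,
)

# Double gold standard: index-test negatives get follow-up only, so their
# resolving disease is never detected and they are counted as true negatives.
double_sens, double_spec = sens_spec(
    tp=persistent_pos + resolving_pos,
    fn=persistent_neg,
    fp=healthy_pos,
    tn=healthy_neg + resolving_neg,
)

print(f"Single gold standard: sensitivity {single_sens:.3f}, specificity {single_spec:.3f}")
print(f"Double gold standard: sensitivity {double_sens:.3f}, specificity {double_spec:.3f}")
```

With these made-up counts both sensitivity and specificity rise under the double gold standard, because the index test cannot be wrong for the resolving cases.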

39 Does This Child Have Appendicitis? JAMA. 2007;298:438-451.
RLQ Pain: Sensitivity = 96% Specificity = 5% (1 – Specificity = 95%) Likelihood Ratio =1.0 RLQ pain was present in 96% of those with appendicitis and 95% of those without appendicitis. Copyright restrictions may apply.

40 Verification (Referral) Bias
Biases the accuracy of a finding when the presence of the finding makes the patient more likely to be studied.
Specificity biased down (5%).
Sensitivity biased up (96%).

41 Does the LR of 1 mean that, in children, RLQ pain is not indicative of appendicitis?
Bundy, D. G. et al. JAMA 2007;298: Study Population: Children who underwent appendectomy
No; it means only kids with RLQ pain get appendectomies.
Copyright restrictions may apply.

42 Studies of Diagnostic Test Accuracy: Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), p 68

43 A clinical decision rule to identify children at low risk for appendicitis* (Problem 5.6 in EBD)
Study design: prospective cohort study
Subjects:
4140 patients 3-18 years presenting to Boston Children’s Hospital ED with abdominal pain
767 (19%) received surgical consultation for possible appendicitis
113 excluded (chronic diseases, recent imaging)
53 missed
601 included in the study (425 in derivation set)
*Kharbanda et al. Pediatrics 2005; 116(3):

44 A clinical decision rule to identify children at low risk for appendicitis
Predictor variables:
Standardized assessment by pediatric ED attending
Focus on “Pain with percussion, hopping or cough” (complete data in N=381)
Outcome variable:
Pathologic diagnosis of appendicitis (or not) for those who received surgery (37%)
Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3):

45 A clinical decision rule to identify children at low risk for appendicitis
Results: Pain with percussion, hopping or cough
78% sensitivity and 83% NPV seem low to me. Are they valid for me in deciding whom to image?
Kharbanda et al. Pediatrics 116(3):

46 Checklist
Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
Was the reference standard applied regardless of the diagnostic test result?
Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), p 68

47 In what direction would these biases affect results?
Sample not representative (population referred to pedi surgery)? Verification bias? Double gold standard bias? Spectrum bias?
Sample NOT representative: prevalence of appendicitis too high for a decision about imaging.
Verification bias is probably operating: lack of pain with hopping would make me LESS likely to seek surgical consultation. But this would bias sensitivity UP.
Double gold standard bias COULD be a bias, if some cases of appendicitis spontaneously resolve, but this would bias sensitivity and specificity UP.
Spectrum bias probably operates for specificity, not sensitivity. Presumably the non-appendicitis cases referred to pedi surgery looked more like appendicitis, and are therefore likely to have a higher false-positive rate for pain with hopping than those not studied.

48 For children presenting with abdominal pain to SFGH 6-M
Sensitivity probably valid (not falsely low)
But whether all of the kids in the study tried to hop is not clear
Specificity probably low
PPV is too high
NPV is too low
Does not address surgical consultation decision
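
The statements about PPV and NPV follow from how predictive values depend on prevalence. The sketch below illustrates this with the 78% sensitivity reported on slide 45; the specificity of 0.80 and both prevalence values are hypothetical placeholders (the transcript does not give them), and the function name `predictive_values` is an assumption for this example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


SENSITIVITY = 0.78   # from slide 45
SPECIFICITY = 0.80   # hypothetical, for illustration only

# Hypothetical prevalences: a referred (surgical consult) sample vs. all ED abdominal pain.
ppv_referred, npv_referred = predictive_values(SENSITIVITY, SPECIFICITY, prevalence=0.35)
ppv_general, npv_general = predictive_values(SENSITIVITY, SPECIFICITY, prevalence=0.10)

print(f"Referred sample: PPV {ppv_referred:.2f}, NPV {npv_referred:.2f}")
print(f"General ED:      PPV {ppv_general:.2f}, NPV {npv_general:.2f}")
```

Holding sensitivity and specificity fixed, the higher-prevalence sample gives a higher PPV and a lower NPV than the lower-prevalence population would.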

