Studies of Diagnostic Tests Thomas B. Newman, MD, MPH October 11, 2007.


1 Studies of Diagnostic Tests Thomas B. Newman, MD, MPH October 11, 2007

2 Overview
• Common biases of studies of diagnostic test accuracy
  – Incorporation bias
  – Verification bias
  – Double gold standard bias
  – Spectrum bias
• Prevalence, spectrum and nonindependence
• Meta-analysis of diagnostic tests
• Checklist & Systematic approach
• Examples:
  – Visual assessment of jaundice
  – Physical examination for presentation

3 Example: BNP study (Chapter 4, Problem 3)

4 Studies of Diagnostic Tests Incorporation Bias
• Gold standard: determination of Congestive Heart Failure (CHF) by two cardiologists
• Blinded to BNP but not to Chest X-ray
• Chest X-ray found to be highly predictive of CHF
• Incorporation bias for assessment of Chest X-ray, not BNP
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.

5 Verification Bias*
• Inclusion criterion: gold standard was applied
• Subjects with positive index tests are more likely to be referred for the gold standard.
• Example: V/Q Scan as a test for pulmonary embolism (PE; blood clot in lungs).
  – Gold standard is a pulmonary arteriogram (PA-gram)
  – Retrospective study of patients receiving arteriograms to rule out PE
  – Patients with negative V/Q scans less likely to be referred for PA-gram
*AKA Work-up, Referral Bias, or Ascertainment Bias

6 Verification Bias
              PA-gram +   PA-gram -
V/Q Scan +        a           b
V/Q Scan -        c           d
Sensitivity (a/(a+c)) is biased UP. Specificity (d/(b+d)) is biased DOWN.
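
A minimal sketch of why partial verification pushes the two measures in opposite directions. The cohort counts and the referral fractions are hypothetical, chosen only for illustration; they are not taken from any V/Q scan study.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) from a 2x2 table.

    a = test+ disease+, b = test+ disease-,
    c = test- disease+, d = test- disease-
    """
    return a / (a + c), d / (b + d)


# Hypothetical full cohort, as if everyone had received the gold standard.
full = dict(a=80, b=40, c=20, d=160)

# Assumed referral pattern: every test-positive patient is verified,
# but only 25% of test-negative patients are.
verify_pos, verify_neg = 1.0, 0.25

observed = dict(
    a=full["a"] * verify_pos,
    b=full["b"] * verify_pos,
    c=full["c"] * verify_neg,
    d=full["d"] * verify_neg,
)

true_sens, true_spec = sens_spec(**full)
obs_sens, obs_spec = sens_spec(**observed)
print(f"true:     sensitivity {true_sens:.2f}, specificity {true_spec:.2f}")
print(f"observed: sensitivity {obs_sens:.2f}, specificity {obs_spec:.2f}")
```

Under these assumptions, cells c and d both shrink in the verified sample, so a/(a+c) rises and d/(b+d) falls.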

7 Double Gold Standard Bias
• One gold standard (e.g., biopsy) is applied in patients with a positive index test; another gold standard (e.g., clinical follow-up) is applied in patients with a negative index test.

8 Studies of Diagnostic Tests Double Gold Standard
• Test: V/Q Scan
• Disease: PE
• Gold Standard: PA-gram in patients who had one, clinical follow-up in patients who didn’t
• Study Population: All patients presenting to the ED who received a V/Q scan.
• Assume some patients did not get a PA-gram because of normal/low probability V/Q scans but would have had positive PA-grams. Instead they had negative clinical follow-up and were counted as true negatives. If they had had PA-grams, they would have been counted as false negatives.
*PIOPED. JAMA 1990;263(20):2753-9.

9 Studies of Diagnostic Tests Double Gold Standard
              PA-gram +   PA-gram -
V/Q Scan +        a           b
V/Q Scan -        c           d
Sensitivity (a/(a+c)) biased UP. Specificity (d/(b+d)) biased UP.
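
A minimal sketch of the double gold standard effect described above: some patients who would have been false negatives on PA-gram are instead counted as true negatives after uneventful clinical follow-up. All counts are hypothetical and serve only to show the direction of the bias.

```python
def sens_spec(a, b, c, d):
    """Return (sensitivity, specificity) from a 2x2 table."""
    return a / (a + c), d / (b + d)


# Hypothetical table if every patient had received the PA-gram.
a, b, c, d = 80, 40, 20, 160

# Assumption: 10 of the 20 false negatives never get a PA-gram, have
# negative clinical follow-up, and are recorded as true negatives.
moved = 10

true_sens, true_spec = sens_spec(a, b, c, d)
obs_sens, obs_spec = sens_spec(a, b, c - moved, d + moved)
print(f"true:     sensitivity {true_sens:.3f}, specificity {true_spec:.3f}")
print(f"observed: sensitivity {obs_sens:.3f}, specificity {obs_spec:.3f}")
```

Moving patients from cell c to cell d shrinks the sensitivity denominator and enlarges the specificity numerator, so both estimates rise.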

10 Double Gold Standard Bias: Ultrasound diagnosis of intussusception

11

12 Spectrum of Disease, Nondisease and Test Results
• Disease is often easier to diagnose if severe
• “Nondisease” is easier to diagnose if the patient is well than if the patient has other diseases
• Test results will be more reproducible if ambiguous results are excluded

13 Spectrum Bias
• Sensitivity depends on the spectrum of disease in the population being tested.
• Specificity depends on the spectrum of non-disease in the population being tested.
• Example: Absence of Nasal Bone (on 13-week ultrasound) as a Test for Chromosomal Abnormality

14 Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality
Sensitivity = 229/333 = 69%
BUT the D+ group only included fetuses with Trisomy 21

15 Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
• D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
• Among these fetuses, sensitivity was 32% (not 69%)
• What decision is this test supposed to help with?
  – If it is whether to do CVS or amnio, these 295 fetuses should be included!

16 Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group
Sensitivity = 324/628 = 52%
NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21
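
The three sensitivities quoted on the nasal bone slides can be checked against one another with simple arithmetic. The counts below come from the slides; the only assumption is that the 324/628 figure is the Trisomy 21 group plus the 295 fetuses with other chromosomal abnormalities, so that the "other" group can be recovered by subtraction.

```python
# Counts taken from the slides.
t21_pos, t21_total = 229, 333   # D+ group restricted to Trisomy 21
all_pos, all_total = 324, 628   # D+ group with all chromosomal abnormalities

# Derived by subtraction (assumption: the pooled group is the sum of the two).
other_pos = all_pos - t21_pos
other_total = all_total - t21_total

sens_t21 = t21_pos / t21_total
sens_other = other_pos / other_total
sens_all = all_pos / all_total

print(f"Trisomy 21 only:      {t21_pos}/{t21_total} = {sens_t21:.0%}")
print(f"Other abnormalities:  {other_pos}/{other_total} = {sens_other:.0%}")
print(f"All abnormalities:    {all_pos}/{all_total} = {sens_all:.0%}")
```

The pooled sensitivity is a weighted average of the two subgroup sensitivities, which is why it falls between 32% and 69%.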

17 Quiz: What if we considered the nasal bone absence as a test for Trisomy 21?
• Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-
• What would happen to sensitivity?
• What would happen to specificity?

18 Prevalence, spectrum and nonindependence
• Prevalence (prior probability) of disease may be related to disease severity
• One mechanism is different spectra of disease or nondisease
• Another is that whatever is causing the high prior is related to the same aspect of the disease as the test

19 Prevalence, spectrum and nonindependence
• Examples
  – Iron deficiency
  – Diseases identified by screening
  – UA for UTI

20 Meta-analyses of Diagnostic Tests
• Systematic and reproducible approach to finding studies
• Summary of results of each study
• Investigation into heterogeneity
• Summary estimate of results, if appropriate

21 MRI for the diagnosis of MS Whiting et al. BMJ 2006;332:875-84

22 Studies of Diagnostic Test Accuracy: Checklist
• Was there an independent, blind comparison with a reference (“gold”) standard of diagnosis?
• Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
• Was the reference standard applied regardless of the diagnostic test result?
• Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000. p 68

23 Systematic Approach
• Authors and funding source
• Research question
  – Relevance?
  – What decision is the test supposed to help you make?
• Study design
  – Timing of measurements of predictor and outcome
  – Cross-sectional vs “case-control” sampling

24 Systematic Approach, cont’d
• Study subjects
  – Diseased subjects representative?
  – Nondiseased subjects representative?
  – If not, in what direction will results be affected?
• Predictor variable
  – How was the test done?
  – Is it difficult?
  – Will it be done as well in your setting?

25 Systematic Approach, cont’d
• Outcome variable
  – Is the “Gold Standard” really gold?
  – Were those measuring it blinded to results of the index test?
• Results & Analysis
  – Were all subjects analyzed?
  – If predictive value was reported, is prevalence similar to your population?
  – Would clinical implications change depending on location of true result within confidence intervals?
• Conclusions
  – Do they go beyond data?
  – Do they apply to patients in your setting?
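
The checklist question about predictive value and prevalence can be made concrete with a short sketch. The sensitivity, specificity and prevalence values below are hypothetical; the point is only that the same test gives very different predictive values in populations with different prevalence.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test characteristics, not from any study in these slides.
sens, spec = 0.90, 0.90

for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

A predictive value reported in a referral population therefore should not be carried over unchanged to a setting where the disease is much rarer or much more common.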

26 Should every newborn have a bilirubin test before discharge?
• About 60% of newborns develop some jaundice
• Usually it is harmless
• Current practice: Check bilirubin level if jaundice appears significant
• Proposal: check it on all newborns

27 Kernicterus Public Information Campaign Draft Posters

28 Advancement of Dermal Icterus in the Jaundiced Newborn Kramer LI, AJDC 1969;118:454

29 Accuracy of Clinical Judgment in Neonatal Jaundice*
• RQ: How well can clinicians estimate bilirubin levels in jaundiced newborns?
• Study Design: cross-sectional study
• Subjects: 122 healthy term newborns (mean age 2 days) whose total serum bilirubin (TSB) was measured in the course of standard newborn care
*Moyer et al., Archives Peds Adol Med 2000; 154:391

30 Accuracy of Clinical Judgment in Neonatal Jaundice*
• Measurements:
  – Jaundice assessed by attendings, nurse practitioners and pediatric residents (absent/slight/obvious) at each body part and Total Serum Bilirubin (TSB) estimated
  – TSB levels measured in clinical laboratory
• Analysis
  – Agreement for jaundice at each body part by Weighted Kappa
  – Sensitivity and specificity for TSB ≥ 12 mg/dL
*Moyer et al., Archives Peds Adol Med 2000; 154:391
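
For readers unfamiliar with weighted kappa, here is a minimal sketch of how it can be computed for two raters on the absent/slight/obvious scale. The slides do not say which weighting scheme the study used; linear weights are assumed here, and the ratings are made up for illustration.

```python
from collections import Counter


def weighted_kappa(ratings_a, ratings_b, categories):
    """Linearly weighted kappa for two raters on an ordinal scale.

    Disagreement weight for categories i and j is |i - j| / (k - 1).
    Undefined (division by zero) if both raters use a single category.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    observed = Counter((index[x], index[y]) for x, y in zip(ratings_a, ratings_b))
    row = Counter(index[x] for x in ratings_a)
    col = Counter(index[y] for y in ratings_b)

    obs_disagree = 0.0
    exp_disagree = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            obs_disagree += w * observed[(i, j)] / n
            exp_disagree += w * (row[i] / n) * (col[j] / n)
    return 1 - obs_disagree / exp_disagree


scale = ["absent", "slight", "obvious"]
# Hypothetical ratings of the same eight infants by two examiners.
rater_1 = ["absent", "slight", "obvious", "slight", "absent", "obvious", "slight", "absent"]
rater_2 = ["absent", "slight", "slight", "obvious", "absent", "obvious", "slight", "slight"]

kappa = weighted_kappa(rater_1, rater_2, scale)
print(f"weighted kappa = {kappa:.2f}")
```

Weighting gives partial credit for near misses (absent vs slight) and penalizes distant ones (absent vs obvious) more heavily, which suits an ordered scale.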

31 Results: 1. Moyer et al., APAM 2000; 154:391

32 Results: 2 Moyer et al., APAM 2000; 154:391
• Sensitivity of jaundice below the nipple line for TSB ≥ 12 mg/dL = 97%
• Specificity = 19%
Editor’s Note: The take-home message for me is that no jaundice below the nipple line equals no bilirubin test, unless there’s some other indication. --Catherine D. DeAngelis, MD
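
One way to read the reported 97% sensitivity and 19% specificity is through likelihood ratios. This is a derived illustration, not a result stated in the slides, and it takes the two reported figures at face value.

```python
# Reported on the slide: jaundice below the nipple line for TSB >= 12 mg/dL.
sens, spec = 0.97, 0.19

# Standard definitions of the likelihood ratios for a dichotomous test.
lr_pos = sens / (1 - spec)      # how much a positive finding raises the odds
lr_neg = (1 - sens) / spec      # how much a negative finding lowers the odds

print(f"LR+ = {lr_pos:.2f}")
print(f"LR- = {lr_neg:.2f}")
```

On these figures a positive finding barely changes the odds of a TSB ≥ 12 mg/dL, while a negative finding lowers them substantially, which matches the emphasis of the editor's note on the negative result.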

33 Issues: 1
• No information on the numbers of different types of examiners or their years of experience
  – Generalizability uncertain
• No CI around sensitivity and specificity
  – Sensitivity based upon 67/69
  – 95% CI: 90% to 99.6%
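
A confidence interval for a proportion such as 67/69 can be computed with the exact binomial (Clopper-Pearson) method. The slide does not say which method produced its interval, so the choice of method here is an assumption; the sketch uses only the standard library and solves for the bounds by bisection.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, iterations=60):
    """Bisect on (0, 1) for f(p) = target, where f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n."""
    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2
    )
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2
    )
    return lower, upper


lower, upper = clopper_pearson(67, 69)
print(f"67/69 = {67 / 69:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

With only 69 infants above the threshold, the lower bound sits several percentage points below the point estimate, which is the reason the slide asks for the interval to be reported.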

34 Issues: 2
• Verification bias (Type 1)
  – Infants NOT jaundiced below the nipples not likely to have a TSB measured
  – Sensitivity too high, specificity too low
  – What effect on NPV?

35 Issues for universal screening
• How often would the bilirubin test alter management?
• How often would this affect outcomes?
  – None of the bilirubin levels in the study was dangerously high
• What other effects might universal bilirubin screening have?

36 CDC Posters

37 E-mail from a parent -1 To: Subject: my hyperbili son Date: Thu, 11 Aug 2005 Dear Dr Newman, I would like your input as to the prognosis with my son. He had a neonatal jaundice that was horribly mismanaged and I am now a hysterical mom.... My son was born [Wednesday] 4/13/2005 at 10am...On Sat night we had him tested, at 8pm TBR was 24, Coombs test positive. He was admitted under double lights and his TBR was 16 on Sun morn...

38 E-mail from a parent -2 He was breast fed throughout and had a strong suck. He is now 4 months old and milestones seem within developmental norms. Hearing seems ok. I am sleepless, hysterical and depressed. How concerned for the future do I have to be? Please could you get back to me asap. Thanking you, Tracey P

39 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• RQ: (above)
  – Important to know presentation before onset of labor to know whether to try external version
• Study design: Cross-sectional study
• Subjects:
  – 1633 women with singleton pregnancies at 35-37 weeks at antenatal clinics at a Women’s and Babies Hospital in Australia
  – 96% of those eligible for the study consented
*BMJ 2006;333:578-80

40 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• Predictor variable
  – Clinical examination by one of more than 60 clinicians
    residents or registrars 55%
    midwives 28%
    obstetricians 17%
  – Results classified as cephalic or noncephalic
• Outcome variable: presentation by ultrasound, blinded to clinical examination
*BMJ 2006;333:578-80

41 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
• Results
• No significant differences in accuracy by experience level
• Conclusions: clinical examination is not sensitive enough
*BMJ 2006;333:578-80

