1 Studies of Diagnostic Tests Thomas B. Newman, MD, MPH October 16, 2008

2 Reminders/Announcements
- Corrected page proofs of all of EBD are now on the web
  – Tell us if you find additional mistakes, ASAP
  – Index is a mess; if you look for things there and do not find them, let us know
- Final exam to be passed out 12/4, reviewed 12/11
  – Send questions!

3 Overview
- Common biases of studies of diagnostic test accuracy
  – Incorporation bias
  – Verification bias
  – Double gold standard bias
  – Spectrum bias
- Prevalence, spectrum and nonindependence
- Meta-analysis of diagnostic tests
- Checklist & systematic approach
- Examples:
  – Physical examination for fetal presentation
  – Pain with percussion, hopping or cough for appendicitis

4 Incorporation Bias
- Recall study of BNP to diagnose congestive heart failure (CHF, Chapter 4, Problem 3)

5 Incorporation Bias
- Gold standard: determination of CHF by two cardiologists blinded to BNP
- Chest X-ray found to be highly predictive of CHF, but cardiologists not blinded to chest X-ray
- Incorporation bias for assessment of chest X-ray, not BNP
*Maisel AS, Krishnaswamy P, Nowak RM, McCord J, Hollander JE, Duc P, et al. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347(3):161-7.
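To see why incorporating the index test into the gold standard inflates its apparent accuracy, here is a minimal simulation sketch in Python. All numbers (CHF prevalence, chest X-ray accuracy, how often unblinded adjudicators are swayed by the film) are made up for illustration; they are not from the Maisel study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical parameters: 40% CHF prevalence and a chest X-ray whose
# true sensitivity/specificity are 70%/80%.
chf = rng.random(n) < 0.40
cxr_pos = np.where(chf, rng.random(n) < 0.70, rng.random(n) < 0.20)

# Blinded gold standard: cardiologists who never see the CXR, 90% accurate.
blinded_dx = np.where(rng.random(n) < 0.90, chf, ~chf)

# Incorporation bias: unblinded cardiologists are swayed toward calling CHF
# in 30% of CXR-positive patients, whatever the truth is.
swayed = cxr_pos & (rng.random(n) < 0.30)
unblinded_dx = np.where(swayed, True, blinded_dx)

def sens_spec(test, gold):
    return (test & gold).sum() / gold.sum(), (~test & ~gold).sum() / (~gold).sum()

print("CXR vs true CHF       : Se=%.2f Sp=%.2f" % sens_spec(cxr_pos, chf))
print("CXR vs blinded gold   : Se=%.2f Sp=%.2f" % sens_spec(cxr_pos, blinded_dx))
print("CXR vs unblinded gold : Se=%.2f Sp=%.2f" % sens_spec(cxr_pos, unblinded_dx))
```

With these assumptions the chest X-ray looks both more sensitive and more specific when judged against the unblinded adjudication than against the truth or a blinded adjudication, which is the point of the slide.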

6 Verification Bias*
- Inclusion criterion: gold standard was applied
- Subjects with positive index tests are more likely to be referred for the gold standard
- Example: V/Q scan as a test for pulmonary embolism (PE; blood clot in lungs)
  – Gold standard is a pulmonary arteriogram (PA-gram)
  – Retrospective study of patients receiving arteriograms to rule out PE
  – Patients with negative V/Q scans less likely to be referred for PA-gram
- Many additional examples
  – E.g., visual assessment of jaundice mentioned in DCR
*AKA work-up bias, referral bias, or ascertainment bias

7 Verification Bias

              PA-gram +   PA-gram -
V/Q Scan +        a           b
V/Q Scan -        c ↓         d ↓
(↓: cells depleted because patients with negative V/Q scans are less likely to get the PA-gram)

Sensitivity, a/(a+c), is biased ___.
Specificity, d/(b+d), is biased ___.
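A minimal numerical sketch of the direction of this bias, using made-up 2x2 counts (not PIOPED data): everyone with a positive V/Q scan gets the PA-gram, but only a fraction of scan-negative patients do, so cells c and d shrink.

```python
# Hypothetical "truth": a full cohort in which everyone gets a PA-gram.
# Rows are V/Q result, columns are PE status (numbers are made up).
a, b = 80, 120    # V/Q +: true positives, false positives
c, d = 20, 780    # V/Q -: false negatives, true negatives

def sens_spec(a, b, c, d):
    return a / (a + c), d / (b + d)

print("All verified          : Se=%.2f Sp=%.2f" % sens_spec(a, b, c, d))

# Verification bias: everyone with a positive V/Q scan gets the PA-gram,
# but only 10% of those with a negative scan do (and only they enter the study).
f = 0.10
print("Only 10%% of negatives : Se=%.2f Sp=%.2f" % sens_spec(a, b, f * c, f * d))
```

Under these assumptions sensitivity appears much higher and specificity much lower than in the fully verified cohort, which is the usual direction of verification bias.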

8 Double Gold Standard Bias
- Two different "gold standards"
  – One gold standard (e.g., surgery, invasive test) is more likely to be applied in patients with a positive index test
  – The other gold standard (e.g., clinical follow-up) is more likely to be applied in patients with a negative index test
- There are some patients in whom the two gold standards do not give the same answer
  – Spontaneously resolving disease
  – Newly occurring disease

9 Double Gold Standard Bias, example
- Study population: all patients presenting to the ED who received a V/Q scan
- Test: V/Q scan
- Disease: pulmonary embolism (PE)
- Gold standards:
  – 1. Pulmonary arteriogram (PA-gram) if done (more likely with a more abnormal V/Q scan)
  – 2. Clinical follow-up in other patients (more likely with a normal V/Q scan)
- What happens if some PEs resolve spontaneously?
*PIOPED. JAMA 1990;263(20):2753-9.

10 Double Gold Standard Bias: effect of spontaneously resolving cases

              PE +    PE -
V/Q Scan +     a       b
V/Q Scan -     c       d

Sensitivity, a/(a+c), biased __; Specificity, d/(b+d), biased __
– Double gold standard compared with PA-gram for all
– Double gold standard compared with follow-up for all

11 Double Gold Standard Bias: effect of newly occurring cases

              PE +    PE -
V/Q Scan +     a       b
V/Q Scan -     c       d

Sensitivity, a/(a+c), biased __; Specificity, d/(b+d), biased __
– Double gold standard compared with PA-gram for all
– Double gold standard compared with follow-up for all

12 Double Gold Standard Bias: Ultrasound diagnosis of intussusception

13 What if 10% resolve spontaneously?
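A hypothetical worked example for slides 10-13 (the counts are made up, not the intussusception data), comparing a double gold standard against a single invasive gold standard applied to everyone at the time of testing:

```python
# A hypothetical cohort in which a single gold standard (e.g., PA-gram or surgery
# for everyone, applied at the time of testing) would give these 2x2 counts:
a, b = 90, 60      # index test +: true positives, false positives
c, d = 10, 840     # index test -: false negatives, true negatives

def sens_spec(a, b, c, d):
    return a / (a + c), d / (b + d)

print("Single gold standard for all  : Se=%.4f Sp=%.4f" % sens_spec(a, b, c, d))

# Double gold standard + spontaneously resolving disease: test-negative patients get
# clinical follow-up instead of the invasive test, so the 10% of false negatives whose
# disease resolves end up counted as true negatives (test-positive patients are
# verified immediately, so cell a is unchanged).
resolving = 0.10 * c
print("Double standard, 10%% resolve  : Se=%.4f Sp=%.4f"
      % sens_spec(a, b, c - resolving, d + resolving))

# Double gold standard + newly occurring disease: suppose 20 test-negative patients who
# were disease-free at testing develop disease during follow-up; they are counted as
# false negatives rather than the true negatives an immediate gold standard would give.
new_cases = 20
print("Double standard, 20 new cases : Se=%.4f Sp=%.4f"
      % sens_spec(a, b, c + new_cases, d - new_cases))
```

Under these made-up numbers, spontaneously resolving disease makes both sensitivity and specificity look better, while newly occurring disease pushes both down; the comparison against follow-up for everyone (the other row on slides 10-11) can be worked out the same way.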

14 Spectrum of Disease, Nondisease and Test Results
- Disease is often easier to diagnose if severe
- "Nondisease" is easier to diagnose if the patient is well than if the patient has other diseases
- Test results will be more reproducible if ambiguous results are excluded

15 Spectrum Bias
- Sensitivity depends on the spectrum of disease in the population being tested.
- Specificity depends on the spectrum of non-disease in the population being tested.
- Example: absence of nasal bone (on 13-week ultrasound) as a test for chromosomal abnormality

16 Spectrum Bias Example: Absence of Nasal Bone as a Test for Chromosomal Abnormality*
Sensitivity = 229/333 = 69%
BUT the D+ group only included fetuses with Trisomy 21
*Cicero et al. Ultrasound Obstet Gynecol 2004;23:218-23

17 Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality
- D+ group excluded 295 fetuses with other chromosomal abnormalities (esp. Trisomy 18)
- Among these fetuses, sensitivity 32% (not 69%)
- What decision is this test supposed to help with?
  – If it is whether to test chromosomes using chorionic villus sampling or amniocentesis, these 295 fetuses should be included!

18 Spectrum Bias: Absence of Nasal Bone as a Test for Chromosomal Abnormality, effect of including other trisomies in D+ group
Sensitivity = 324/628 = 52%
NOT the 69% obtained when the D+ group only included fetuses with Trisomy 21

19 Quiz: What if we considered absence of the nasal bone as a test for Trisomy 21?
- Then instead of excluding subjects with other chromosomal abnormalities or including them as D+, we should count them as D-.
- Compared with excluding them:
  – What would happen to sensitivity?
  – What would happen to specificity?
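A sketch of the three ways to handle the 295 fetuses with other abnormalities, using the counts given on the slides. The size and nasal-bone-absence rate of the chromosomally normal (D-) group are not in this transcript, so those two numbers are hypothetical placeholders chosen only to show the direction of the change.

```python
# Counts taken from the slides: 333 fetuses with trisomy 21 (229 with absent nasal
# bone) and 295 with other chromosomal abnormalities (324 - 229 = 95 with absent
# nasal bone). The chromosomally normal (D-) group below is hypothetical.
t21_total, t21_absent = 333, 229
other_total, other_absent = 295, 324 - 229
normal_total, normal_absent = 5000, 140          # hypothetical D- group

# Option 1: D+ = trisomy 21 only (other abnormalities excluded)
se1 = t21_absent / t21_total
sp1 = (normal_total - normal_absent) / normal_total

# Option 2: D+ = any chromosomal abnormality
se2 = (t21_absent + other_absent) / (t21_total + other_total)
sp2 = sp1                                        # D- group unchanged

# Option 3 (the quiz): D+ = trisomy 21, other abnormalities counted as D-
se3 = se1                                        # back to 69%
sp3 = ((normal_total - normal_absent) + (other_total - other_absent)) / (normal_total + other_total)

for name, se, sp in [("exclude others", se1, sp1),
                     ("others as D+ ", se2, sp2),
                     ("others as D- ", se3, sp3)]:
    print(f"{name}: Se={se:.3f} Sp={sp:.3f}")
```

Counting the other abnormalities as D- sends sensitivity back up to 69% but pulls specificity down, because about a third of those fetuses also lack a nasal bone.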

20 Prevalence, spectrum and nonindependence
- Prevalence (prior probability) of disease may be related to disease severity
- One mechanism is different spectra of disease or nondisease
- Another is that whatever is causing the high prior probability is related to the same aspect of the disease as the test

21 Prevalence, spectrum and nonindependence
- Examples:
  – Iron deficiency
  – Diseases identified by screening
- Urinalysis as a test for UTI in women with more and fewer symptoms (high and low prior probability)

22 Meta-analyses of Diagnostic Tests
- Systematic and reproducible approach to finding studies
- Summary of results of each study
- Investigation into heterogeneity
- Summary estimate of results, if appropriate
- Unlike other meta-analyses (risk factors, treatments), results aren't summarized with a single number (e.g., RR), but with two related numbers (sensitivity and specificity)
- These can be plotted on an ROC plane
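A small plotting sketch of the last point: each study contributes one (1 - specificity, sensitivity) point on the ROC plane. The four study results below are invented for illustration; they are not the MRI/MS data cited on the next slide.

```python
import matplotlib.pyplot as plt

# Made-up sensitivity/specificity pairs standing in for individual studies
# in a meta-analysis (NOT the Whiting et al. results).
studies = [
    ("Study A", 0.85, 0.70),
    ("Study B", 0.72, 0.81),
    ("Study C", 0.91, 0.55),
    ("Study D", 0.64, 0.88),
]

fig, ax = plt.subplots()
for name, sens, spec in studies:
    ax.scatter(1 - spec, sens)
    ax.annotate(name, (1 - spec, sens), textcoords="offset points", xytext=(5, 3))

ax.plot([0, 1], [0, 1], linestyle="--")   # line of no diagnostic value
ax.set_xlim(0, 1); ax.set_ylim(0, 1)
ax.set_xlabel("1 - specificity")
ax.set_ylabel("Sensitivity")
ax.set_title("Each study is one point on the ROC plane")
plt.show()
```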

23 MRI for the diagnosis of MS
Whiting et al. BMJ 2006;332:875-84

24 Studies of Diagnostic Test Accuracy: Checklist
- Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-Based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000, p. 68

25 Systematic Approach
- Authors and funding source
- Research question
  – Relevance?
  – What decision is the test supposed to help you make?
- Study design
  – Timing of measurements of predictor and outcome
  – Cross-sectional vs. "case-control" sampling

26 Systematic Approach, cont'd
- Study subjects
  – Diseased subjects representative?
  – Nondiseased subjects representative?
  – If not, in what direction will results be affected?
- Predictor variable
  – How was the test done?
  – Is it difficult?
  – Will it be done as well in your setting?

27 Systematic Approach, cont'd
- Outcome variable
  – Is the "gold standard" really gold?
  – Were those measuring it blinded to results of the index test?
- Results & analysis
  – Were all subjects analyzed?
  – If predictive value was reported, is prevalence similar to your population?
  – Would clinical implications change depending on the location of the true result within the confidence intervals?
- Conclusions
  – Do they go beyond the data?
  – Do they apply to patients in your setting?

28 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
- RQ: (above)
  – Important to know presentation before onset of labor to know whether to try external version
- Study design: cross-sectional study
- Subjects:
  – 1633 women with singleton pregnancies at 35-37 weeks at antenatal clinics at a Women's and Babies Hospital in Australia
  – 96% of those eligible for the study consented
*BMJ 2006;333:578-80

29 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
- Predictor variable
  – Clinical examination by one of more than 60 clinicians: residents or registrars 55%, midwives 28%, obstetricians 17%
  – Results classified as cephalic or noncephalic
- Outcome variable: presentation by ultrasound, blinded to clinical examination
*BMJ 2006;333:578-80

30 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy*
- Results
- No significant differences in accuracy by experience level
- Conclusions: clinical examination is not sensitive enough
*BMJ 2006;333:578-80

31 Diagnostic Accuracy of Clinical Examination for Detection of Non-cephalic Presentation in Late Pregnancy: Issues*
- RQ
- Subjects
- Predictor
- Outcome
- Results
- Conclusions – what decision was the test supposed to help with?
*BMJ 2006;333:578-80

32 A clinical decision rule to identify children at low risk for appendicitis
- Study design: prospective cohort study
- Subjects
  – Of 4140 patients aged 3-18 years presenting to the Boston Children's Hospital ED with a chief complaint of abdominal pain,
  – 767 (19%) received surgical consultation for possible appendicitis
  – 113 excluded (chronic diseases, recent imaging)
  – 53 missed
  – 601 included in the study (425 in the derivation set)
Kharbanda et al. Pediatrics 116(3):709-16

33 A clinical decision rule to identify children at low risk for appendicitis
- Predictor variable
  – Standardized assessment by a PEM (pediatric emergency medicine) attending
  – For today, focus on "Pain with percussion, hopping or cough" (complete data in N=381)
- Outcome variable:
  – Pathologic diagnosis of appendicitis for those who received surgery (37%)
  – Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3):709-16

34 A clinical decision rule to identify children at low risk for appendicitis
- Results: Pain with percussion, hopping or cough
- 78% sensitivity seems low to me. Is it valid for me in deciding whom to image?
Kharbanda et al. Pediatrics 116(3):709-16
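One way to think about whether 78% sensitivity is "enough" is to push it through Bayes' theorem. The sensitivity is from the slide, but the specificity and pretest probability below are hypothetical placeholders; neither appears in this transcript.

```python
# Sensitivity (78%) is from the slide; specificity and pretest probability
# are assumed values for illustration only.
sens = 0.78
spec = 0.55          # assumed
pretest = 0.20       # assumed probability of appendicitis before the exam

lr_neg = (1 - sens) / spec           # negative likelihood ratio
pre_odds = pretest / (1 - pretest)
post_odds = pre_odds * lr_neg
post_prob = post_odds / (1 + post_odds)

print(f"LR- = {lr_neg:.2f}")
print(f"P(appendicitis | no pain with percussion/hopping/cough) = {post_prob:.2f}")
```

With these assumed values a negative finding lowers the probability of appendicitis only from 20% to about 9%, which may still be too high to forgo imaging; that is one way to read the skepticism on this slide.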

35 Checklist
- Was there an independent, blind comparison with a reference ("gold") standard of diagnosis?
- Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?
- Was the reference standard applied regardless of the diagnostic test result?
- Was the test (or cluster of tests) validated in a second, independent group of patients?
From Sackett et al., Evidence-Based Medicine, 2nd ed. (NY: Churchill Livingstone), 2000, p. 68

36 Systematic approach
- Study design: prospective cohort study
- Subjects
  – Of 4140 patients aged 3-18 years presenting to the Boston Children's Hospital ED with a chief complaint of abdominal pain,
  – 767 (19%) received surgical consultation for possible appendicitis
Kharbanda et al. Pediatrics 116(3):709-16

37 A clinical decision rule to identify children at low risk for appendicitis
- Predictor variable
  – "Pain with percussion, hopping or cough" (complete data in N=381)
- Outcome variable:
  – Pathologic diagnosis of appendicitis for those who received surgery (37%)
  – Follow-up telephone call to family or pediatrician 2-4 weeks after the ED visit for those who did not receive surgery (63%)
Kharbanda et al. Pediatrics 116(3):709-16

38 Issues
- Sample representative?
- Verification bias?
- Double gold standard bias?
- Spectrum bias?

39 For children presenting with abdominal pain to SFGH 6-M
- Sensitivity probably valid (not falsely low)
  – But whether all of them tried to hop is not clear
- Specificity probably low
- PPV is high
- NPV is low

