Systematic Reviews of Diagnostic Studies: Guides for Appraisal. Acknowledgements: Paul Glasziou, Jon Deeks, Madhukar Pai, Patrick Bossuyt, and Matthias Egger.
Information Overload: 20,000 biomedical periodicals (6M articles); 17,000 biomedical books annually; 30,000 recognized diseases; 15,000 therapeutic agents (250/yr); MEDLINE: 4,000 journals surveyed, 11,000,000 citations, 1.27 million articles related to oncology, 35,000 articles related to ear, nose, or throat surgery.
What makes a review “systematic”? It is based on a clearly formulated question, identifies relevant studies, appraises the quality of studies, summarizes evidence using an explicit methodology, and bases its comments on the evidence gathered.
Origin of Clinical Questions: Diagnosis: how to select and interpret diagnostic tests. Prognosis: how to anticipate the patient’s likely course. Therapy: how to select treatments that do more good than harm. Prevention: how to screen and reduce the risk of disease.
ROAD MAP FOR DIAGNOSTIC REVIEWS
Steps in a Systematic Review: (1) Framing the question (Q); (2) Identifying relevant publications (F); (3) Assessing study quality (A); (4) Summarising evidence and interpreting findings (S).
Unstructured Question: Is cervicovaginal fetal fibronectin useful? – For what? – For whom? – What is meant by “useful”?
Structured Question: Does a positive cervicovaginal fetal fibronectin test [test (intervention)] predict spontaneous preterm birth [outcome] in asymptomatic women [patient]?
Step 2 – Identifying relevant publications (F): Wide search of medical/scientific databases – Medline – Cochrane Reviews – Ovid. Relevance to the focused question (PPICO): – Population – Prior test – Intervention – Comparator – Outcome
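One common way to turn a structured PPICO question into a database query is to OR together the synonyms within each element and AND the elements together. The sketch below assumes that convention; the class name `PPICOQuestion` and all search terms are hypothetical, and real interfaces (for example Medline via Ovid) use their own field tags and subject headings, which this plain-string version ignores.

```python
from dataclasses import dataclass, field


@dataclass
class PPICOQuestion:
    """Hypothetical container: free-text synonyms for each PPICO element."""

    population: list[str] = field(default_factory=list)
    prior_test: list[str] = field(default_factory=list)
    intervention: list[str] = field(default_factory=list)  # the index test
    comparator: list[str] = field(default_factory=list)
    outcome: list[str] = field(default_factory=list)  # target condition

    def to_query(self) -> str:
        """OR the synonyms within each element; AND the non-empty elements."""
        blocks = []
        for terms in (self.population, self.prior_test, self.intervention,
                      self.comparator, self.outcome):
            if terms:
                blocks.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
        return " AND ".join(blocks)


question = PPICOQuestion(
    population=["pregnant women", "pregnancy"],
    intervention=["fetal fibronectin", "fFN"],
    outcome=["preterm birth", "preterm delivery"],
)
print(question.to_query())
```

Empty elements (here prior test and comparator) are skipped, so this prints `("pregnant women" OR "pregnancy") AND ("fetal fibronectin" OR "fFN") AND ("preterm birth" OR "preterm delivery")`.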
Publication and reporting biases (Health Technology Assessment, 2000; 4(10):1–115). Diagram levels: all studies conducted; all studies published; studies reviewed; grey literature. Biases: positive results bias, grey literature bias, time-lag bias, language and country bias, multiple publication bias, selective citation bias, database indexing bias, selective outcome reporting bias.
Registered vs. Published Studies: ovarian cancer chemotherapy, single vs. combined (Simes, J Clin Oncol 1986, p. 1529).
Search filters for diagnostic studies: no search filter, 39 studies retrieved.
Search filters for diagnostic studies (Lucas Bachmann): with search filter, 12 studies retrieved (27 missed).
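The filter figures can be read as a recall (sensitivity) calculation for the search itself. A minimal sketch, assuming the 39 unfiltered hits are the full set of relevant studies and that the 12 filtered hits are a subset of them; `filter_yield` is a hypothetical helper name.

```python
def filter_yield(retrieved_without: int, retrieved_with: int) -> tuple[int, float]:
    """Studies missed by a search filter and the share of relevant studies it kept.

    Assumes the unfiltered hits are the complete relevant set and the
    filtered hits are a subset of them.
    """
    missed = retrieved_without - retrieved_with
    recall = retrieved_with / retrieved_without
    return missed, recall


missed, recall = filter_yield(39, 12)
print(f"missed={missed}, recall={recall:.1%}")  # missed=27, recall=30.8%
```

On these numbers the filter keeps fewer than a third of the relevant studies, which is why filters for diagnostic studies are used with caution.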
Documenting & storing
Assessment of Study Quality (A): Quality varies; therefore – Standardized assessment (blinded?*) – Group/rank studies by quality – Select a threshold, e.g. all prospective studies with blind reading of reference and index tests. *Assessment of quality blind to study outcome.
Quality Score: Mammals example – In natural habitat (No = 0; Yes = 1): setting – Whole animals (No = 0; Yes = 1): complete information – Photographs (No = 0; Yes = 1): level of evidence
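The mammals example scores each item as No = 0 or Yes = 1. A minimal sketch of summing those items, ranking studies, and keeping those at or above a threshold; the item keys and study names are hypothetical. A criterion-based threshold like the one on the previous slide (prospective design with blind reading) could instead be written as a filter on individual items rather than on the total.

```python
# Hypothetical item keys mirroring the slide; each is scored No = 0, Yes = 1.
ITEMS = ("in_natural_habitat", "whole_animal", "photograph")


def quality_score(study: dict[str, bool]) -> int:
    """Sum of 0/1 item scores; a missing item counts as 0 (No)."""
    return sum(int(study.get(item, False)) for item in ITEMS)


def select_by_threshold(studies: dict[str, dict[str, bool]], minimum: int) -> list[str]:
    """Rank studies by score (highest first) and keep those scoring >= minimum."""
    ranked = sorted(studies, key=lambda name: quality_score(studies[name]), reverse=True)
    return [name for name in ranked if quality_score(studies[name]) >= minimum]


studies = {
    "study_a": {"in_natural_habitat": True, "whole_animal": True, "photograph": True},
    "study_b": {"in_natural_habitat": False, "whole_animal": True, "photograph": False},
    "study_c": {"in_natural_habitat": True, "whole_animal": True, "photograph": False},
}
print(select_by_threshold(studies, minimum=2))  # ['study_a', 'study_c']
```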
Exercise I: Study Quality
Assessing a Study of a Test (Jaeschke et al., JAMA 1994; 271: ): – Was an appropriate spectrum of patients included? (spectrum bias) – Were all patients subjected to a gold standard? (verification bias) – Was there an independent, "blind" comparison with a gold standard? (observer bias; differential reference bias) – Were the methods described so that you could repeat the test?
Diagnostic Accuracy Study: Basic Design. Diagram: series of patients → index test → reference standard → blinded cross-classification.
Spectrum Bias. Diagram: selected patients → index test → reference standard → blinded cross-classification.
Verification Bias. Diagram: series of patients → index test → reference standard → blinded cross-classification.
Differential Reference Bias. Diagram: series of patients → index test → reference standard A or reference standard B → blinded cross-classification.
Observer Bias. Diagram: series of patients → index test → reference standard → unblinded cross-classification.
“Case-control” design. Diagram: HF patients and controls → index test → blinded cross-classification.
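The cross-classification at the end of the basic design amounts to tabulating each patient's index-test result against the reference-standard result in a 2×2 table. A sketch with hypothetical booleans (`True` = positive index test, or target condition present on the reference standard); `cross_classify` is a hypothetical name, and blinding concerns how the results were obtained, so it cannot be enforced by the tabulation itself.

```python
from collections import Counter


def cross_classify(index_results: list[bool], reference_results: list[bool]) -> dict[str, int]:
    """Tabulate paired index-test and reference-standard results as a 2x2 table."""
    if len(index_results) != len(reference_results):
        raise ValueError("each patient needs both an index and a reference result")
    cells = Counter(zip(index_results, reference_results))
    return {
        "TP": cells[(True, True)],    # index positive, condition present
        "FP": cells[(True, False)],   # index positive, condition absent
        "FN": cells[(False, True)],   # index negative, condition present
        "TN": cells[(False, False)],  # index negative, condition absent
    }


index = [True, True, False, True, False, False]
reference = [True, False, True, True, False, False]
print(cross_classify(index, reference))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```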
Empirical Effects of Bias Lijmer JG et al. JAMA 1999;282:
Step 4 – Summarising the Evidence (S): – Extracting data from trials – Combining data: meta-analysis – Does it make sense to combine?
What is a meta-analysis? – A way to calculate an average – Estimates an ‘average’ or ‘common’ effect – Improves the precision of an estimate by using all available data
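The ‘average’ in most meta-analyses is a weighted one. A minimal sketch of the fixed-effect inverse-variance method, one common choice that the slide does not itself specify; the three estimates and standard errors are hypothetical. The pooled standard error comes out smaller than any single study's, which is the gain in precision the slide refers to.

```python
import math


def inverse_variance_pool(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical: three studies reporting the same effect on a log scale.
pooled, se = inverse_variance_pool([1.0, 1.4, 1.2], [0.4, 0.5, 0.3])
print(f"pooled={pooled:.3f}, SE={se:.3f}")  # pooled=1.179, SE=0.216
```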
What is a meta-analysis? An optional part of a systematic review (diagram labels: systematic reviews; meta-analyses).
Summary ROC Meta-analytical Display
Threshold effects: – Increasing the threshold increases specificity but decreases sensitivity – Decreasing the threshold increases sensitivity but decreases specificity
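The trade-off can be seen by sweeping a positivity threshold over a continuous test result. A sketch using hypothetical scores (not data from any study), where a result counts as positive when the score is at or above the threshold; `sens_spec_at` is a hypothetical name.

```python
def sens_spec_at(threshold: float, diseased: list[float],
                 non_diseased: list[float]) -> tuple[float, float]:
    """Positive if score >= threshold; return (sensitivity, specificity)."""
    sensitivity = sum(s >= threshold for s in diseased) / len(diseased)
    specificity = sum(s < threshold for s in non_diseased) / len(non_diseased)
    return sensitivity, specificity


# Hypothetical test scores, for illustration only.
diseased = [3.1, 4.5, 5.2, 6.0, 7.4]
non_diseased = [1.2, 2.0, 2.8, 3.5, 4.9]

for t in (2.5, 3.5, 5.0):
    sens, spec = sens_spec_at(t, diseased, non_diseased)
    print(f"threshold={t}: sensitivity={sens:.0%}, specificity={spec:.0%}")
```

With these values, raising the threshold from 2.5 to 5.0 moves sensitivity from 100% down to 60% and specificity from 40% up to 100%.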
Accuracy effects: – Overestimation of accuracy, e.g. spectrum bias – Underestimation of accuracy, e.g. poor reference standard
Spectrum effects: – Variation in the non-diseased study participants – Variation in the diseased study participants
Methods of Meta-analysis: Separate pooling of sensitivity and specificity (and likelihood ratios) – Inappropriate when results are highly heterogeneous – Underestimates accuracy if there is heterogeneity in threshold
Constant diagnostic odds ratio across thresholds (table columns: sensitivity, specificity, LR+, LR−, DOR = LR+/LR−; first row: sensitivity 99%, specificity 71%).
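The table's columns are linked by LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and DOR = LR+ / LR−, which is the same as odds(sensitivity) × odds(specificity). A minimal sketch computes these for the 99% / 71% operating point and then, holding that DOR fixed, derives the sensitivity that would go with other specificities. The function names are hypothetical, and the specificities 86%, 97% and 99% are chosen for illustration, not recovered from the slide.

```python
import math


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for one operating point."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


def sensitivity_for(dor: float, specificity: float) -> float:
    """Sensitivity that keeps the DOR fixed at the given specificity.

    Since DOR = odds(sensitivity) * odds(specificity),
    logit(sensitivity) = log(DOR) - logit(specificity).
    """
    logit_sens = math.log(dor) - math.log(specificity / (1 - specificity))
    return 1 / (1 + math.exp(-logit_sens))


lr_pos, lr_neg, dor = likelihood_ratios(0.99, 0.71)
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.4f}, DOR={dor:.0f}")
for spec in (0.71, 0.86, 0.97, 0.99):
    print(f"specificity={spec:.0%} -> sensitivity={sensitivity_for(dor, spec):.1%}")
```

For the 99% / 71% point this gives LR+ ≈ 3.41, LR− ≈ 0.0141 and DOR ≈ 242; at the same DOR, specificity 99% pairs with sensitivity 71%, so one DOR describes a whole curve of operating points.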
Methods of Meta-analysis: Creation of a summary ROC (SROC) curve – DOR often reasonably consistent across studies – Deals with variation in threshold – Moses/Littenberg method allows for trends in DOR with threshold – Difficult to interpret a unique operating point – More advanced methods (HSROC, bivariate normal, random effects) estimate variability and uncertainty in values – Investigate why studies have different results
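The Moses/Littenberg approach regresses D = logit(TPR) − logit(FPR) (the log DOR) on S = logit(TPR) + logit(FPR) (a proxy for threshold), fitting D = a + bS, and back-transforms the line into a curve: logit(TPR) = [a + (1 + b) logit(FPR)] / (1 − b). A minimal sketch, assuming an unweighted least-squares fit (weighted variants exist) and 0.5 added to every cell; the tables and function names are hypothetical. A slope b near 0 means a roughly constant DOR, and the back-transform only makes sense for b < 1.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def moses_littenberg(tables: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Unweighted least-squares fit of D = a + b * S across studies.

    Each table is (TP, FP, FN, TN); 0.5 is added to every cell (assumption).
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tpr = (tp + 0.5) / (tp + fn + 1)
        fpr = (fp + 0.5) / (fp + tn + 1)
        d_vals.append(logit(tpr) - logit(fpr))  # log DOR
        s_vals.append(logit(tpr) + logit(fpr))  # threshold proxy
    n = len(tables)
    mean_s, mean_d = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - mean_s) ** 2 for s in s_vals)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = mean_d - b * mean_s
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """TPR on the fitted SROC curve at a given FPR (requires b < 1)."""
    logit_tpr = (a + (1 + b) * logit(fpr)) / (1 - b)
    return 1 / (1 + math.exp(-logit_tpr))


# Hypothetical 2x2 tables (TP, FP, FN, TN), for illustration only.
tables = [(45, 10, 5, 40), (30, 5, 10, 55), (48, 20, 2, 30), (20, 2, 15, 63)]
a, b = moses_littenberg(tables)
for fpr in (0.05, 0.1, 0.2, 0.3):
    print(f"FPR={fpr:.2f} -> SROC TPR={sroc_tpr(fpr, a, b):.3f}")
```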
Variation in studies: fetal fibronectin in asymptomatic women. Figure: SROC curve (95% CI) for predicting spontaneous preterm birth before 37 weeks’ gestation (28 studies).
Does it make sense to combine? Do we need studies to be exactly the same? When can we say we are measuring the same thing?
Are the studies consistent? – Are variations in results between studies consistent with chance? (The test of homogeneity has low power.) – If not, then why? – Variation in study methods (biases) – Variation in intervention – Variation in outcome measure (e.g. timing) – Variation in population
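One widely used check of whether between-study variation is consistent with chance is Cochran's Q on the log DORs, often reported with I² = max(0, (Q − df) / Q). A minimal sketch, assuming 0.5 is added to every cell and the usual variance approximation (sum of reciprocal cell counts); the tables and function names are hypothetical. The p-value would come from a chi-square distribution with `df` degrees of freedom (for example SciPy's `scipy.stats.chi2.sf(q, df)`), omitted here to keep the block dependency-free. Given the test's low power, a non-significant Q does not show that the studies are homogeneous.

```python
import math


def log_dor_and_variance(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Log diagnostic odds ratio and its approximate variance for one 2x2 table.

    Assumption: 0.5 is added to every cell so zero counts do not break the logs.
    """
    tp_, fp_, fn_, tn_ = (c + 0.5 for c in (tp, fp, fn, tn))
    log_dor = math.log((tp_ * tn_) / (fp_ * fn_))
    variance = 1 / tp_ + 1 / fp_ + 1 / fn_ + 1 / tn_
    return log_dor, variance


def cochran_q(estimates: list[float], variances: list[float]) -> tuple[float, int, float]:
    """Cochran's Q, its degrees of freedom, and I^2 = max(0, (Q - df) / Q)."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical 2x2 tables (TP, FP, FN, TN), for illustration only.
tables = [(45, 10, 5, 40), (30, 5, 10, 55), (48, 20, 2, 30), (20, 2, 15, 63)]
log_dors, variances = zip(*(log_dor_and_variance(*t) for t in tables))
q, df, i_squared = cochran_q(list(log_dors), list(variances))
print(f"Q={q:.2f} on {df} df, I^2={i_squared:.0%}")
```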
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth (Q). Objective: to determine the accuracy with which a cervicovaginal fetal fibronectin test predicts spontaneous preterm birth in women with or without symptoms of preterm labour.
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth Q – Clearly focused question? F – Found all available evidence?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth (F). Electronic search: Medline (1966–2000), Embase (1980–2000), PASCAL (1973–2001), BIOSIS (1969–2001), the Cochrane Library (2000:4), MEDION (1974–2000), National Research Register (2000:4), SCISEARCH (1974–2001), and conference papers (1973–2000). Grey literature: contacted individual experts and the manufacturer of the fetal fibronectin test. Cross-checking: checked reference lists of known reviews and primary articles to identify cited articles not captured by electronic searches.
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth Q – Clearly focused question? F – Found all available evidence? A – Studies are critically appraised?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth (A)
Q – Clearly focused question? F – Found all available evidence? A – Studies are critically appraised? S – Results are adequately synthesised?
Exercise II: Fetal fibronectin for predicting spontaneous preterm birth (S). Subgroups: – Asymptomatic women: spontaneous preterm birth before 34 and 37 weeks' gestation – Symptomatic women: spontaneous preterm birth before 34 and 37 weeks' gestation, and within 7–10 days of testing. Quantitative summary: – Used SROC curves as measures of accuracy for all included studies, regardless of their thresholds – Provided summary likelihood ratios (positive and negative). Heterogeneity: – Assessed heterogeneity of diagnostic odds ratios graphically and statistically – Used meta-regression to explore sources of heterogeneity – Sensitivity analysis: estimated accuracy in the highest-quality studies
Exercise II Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS. Accuracy of cervicovaginal fetal fibronectin test in predicting spontaneous preterm birth: systematic review. BMJ 2002;325: 301-4
Final points: To assess systematic reviews of diagnostic studies, use QFAS. Q – The question should be a structured one (PPICO). F – Finding studies of diagnostic tools is generally more difficult than finding studies of therapies.
Final points: A – Spectrum, verification, differential reference and observer biases should be taken into account. S – Summaries are affected by the choice of threshold, population, and reference test; methods are not as well researched as for therapies; heterogeneity analysis is particularly important in these reviews.
Guidelines for Conducting Systematic Reviews of Diagnostic Studies. For diagnostic reviews: – Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods. Cochrane Collaboration. – Deville WL, Buntinx F, Bouter LM, Montori VM, De Vet HC, Van Der Windt DA, Bezemer P. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2(1): 9. – Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in health care. Meta-analysis in context. London: BMJ Publishing Group, 2001: 248–282. – Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995; 48: 119–30. – The Bayes Library of Diagnostic Studies and Reviews. 2nd Edition. For reporting diagnostic studies: – Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC; Standards for Reporting of Diagnostic Accuracy steering group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326: 41–44.
Databases/sources of studies: – Electronic databases: general (Cochrane CENTRAL, PubMed, Embase, etc.) and subject-specific (AIDSLINE, CANCERLIT, PsycInfo, MEDION, etc.) – Reference lists of included studies (citation tracking) – Reference lists of earlier reviews and commentaries (CDSR, DARE, MEDION, PubMed search with filters for systematic reviews) – Personal communication with experts and authors – Contacting drug/device companies – Hand-searching of key, high-yield journals – Grey literature (dissertation abstracts, reports, conference proceedings, etc.) – Sources of ongoing studies (trial registers, drug companies, contacting experts)
Quality assessment: criteria for validity of diagnostic studies – Study design: cross-sectional study of a clinically indicated population, or case-control – Verification: complete, with different reference tests, or partial – Blinding: blinded or not – Patient selection: consecutive, random, or nonconsecutive – Data collection: prospective or retrospective – Appropriateness of reference standard – Description of test – Description of study population. Lijmer et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061.
Present absolute numbers for test results. Figure: distribution of plasma concentrations of B-type natriuretic peptide in normal elderly people and in those with left ventricular systolic dysfunction confirmed by echocardiography (BMJ, 2000; 320:).