Presentation transcript: "Systematic Reviews of Diagnostic Studies"
1 Systematic Reviews of Diagnostic Studies: Guides for appraisal
Acknowledgements: Paul Glasziou, Jon Deeks, Madhukar Pai, Patrick Bossuyt, and Matthias Egger.
2 Information Overload
- 20,000 biomedical periodicals (6M articles)
- 17,000 biomedical books annually
- 30,000 recognized diseases
- 15,000 therapeutic agents (250/yr)
- MEDLINE:
  - 4,000 journals surveyed
  - 11,000,000 citations
  - 1.27 million articles related to oncology
  - 35,000 articles related to ear, nose, or throat surgery
3 What makes a Review “Systematic”?
- Based on a clearly formulated question
- Identifies relevant studies
- Appraises the quality of studies
- Summarizes evidence using an explicit methodology
- Bases its comments on the evidence gathered
4 Origin of Clinical Questions
- Diagnosis: how to select and interpret diagnostic tests
- Prognosis: how to anticipate the patient’s likely course
- Therapy: how to select treatments that do more good than harm
- Prevention: how to screen and reduce the risk of disease
6 Steps in a Systematic Review
- Framing the question (Q)
- Identifying relevant publications (F)
- Assessing study quality (A)
- Summarising the evidence and interpreting the findings (S)
7 Step 1 - Framing the Question (Q)
- Clear, unambiguous, structured question
- Questions formulated in terms of PPICO:
  - Population of interest
  - Prior test(s) (if appropriate)
  - Intervention (the index test)
  - Comparison (if appropriate)
  - Outcome
8 Unstructured Question
Is cervicovaginal fetal fibronectin useful?
- For what?
- For whom?
- What is meant by “useful”?
9 Structured Question
Does a positive cervicovaginal fetal fibronectin test [Test (Intervention)] predict spontaneous preterm birth [Outcome] in asymptomatic women [Patient]?
Sometimes there is a comparison test as well.
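A structured question can also be captured as a small record, which makes it easy to check that each PPICO element has been specified before searching. A minimal Python sketch; the class `StructuredQuestion`, its field names, and the sentence template are hypothetical and not part of any standard tool:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StructuredQuestion:
    """Hypothetical record for a PPICO question in a diagnostic review."""
    population: str                    # P: patients of interest
    intervention: str                  # I: the index test
    outcome: str                       # O: target condition or event
    prior_tests: Optional[str] = None  # P: tests already done, if appropriate
    comparison: Optional[str] = None   # C: comparator test, if appropriate

    def as_sentence(self) -> str:
        """Render the question in a 'Does test X predict Y in Z?' form."""
        text = f"Does {self.intervention}"
        if self.comparison:
            text += f", compared with {self.comparison},"
        text += f" predict {self.outcome} in {self.population}"
        if self.prior_tests:
            text += f" who have already had {self.prior_tests}"
        return text + "?"


question = StructuredQuestion(
    population="asymptomatic women",
    intervention="a positive cervicovaginal fetal fibronectin test",
    outcome="spontaneous preterm birth",
)
print(question.as_sentence())
```

Leaving an optional field as `None` makes it explicit that no prior test or comparator was intended, rather than forgotten.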
10 Step 2 - Identifying relevant publications (F)
- Wide search of medical/scientific databases:
  - Medline
  - Cochrane Reviews
  - Ovid
- Relevance to the focused question (PPICO):
  - Population
  - Prior test
  - Intervention
  - Comparator
  - Outcome
19 Assessment of Study Quality (A)
- Quality varies, therefore:
  - Standardized assessment (blind?*)
  - Group/rank by quality
  - Select a threshold, e.g. all prospective studies with blind reading of reference and index tests.
* Assessment of quality blind to study outcome.
20 Quality Score: Mammals example
- Setting: in natural habitat (No = 0; Yes = 1)
- Complete information: whole animals (No = 0; Yes = 1)
- Level of evidence: photographs (No = 0; Yes = 1)
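The two approaches to quality on these slides, ranking by a summed score and selecting studies that pass a threshold, can be sketched in a few lines. A minimal Python sketch; the item names and the three example "studies" are hypothetical. Summing the items gives each one equal weight.

```python
# Hypothetical item names; each item is scored No = 0 / Yes = 1 as on the slide.
MAMMAL_ITEMS = ("in_natural_habitat", "whole_animal", "photograph")


def quality_score(study: dict, items=MAMMAL_ITEMS) -> int:
    """Sum the 0/1 item scores; a missing item counts as No (0)."""
    return sum(1 if study.get(item) else 0 for item in items)


def meets_threshold(study: dict, required) -> bool:
    """Threshold approach: keep a study only if every required item is Yes."""
    return all(study.get(item, False) for item in required)


studies = [  # hypothetical 'studies'
    {"id": "A", "in_natural_habitat": True, "whole_animal": True, "photograph": True},
    {"id": "B", "in_natural_habitat": False, "whole_animal": True, "photograph": False},
    {"id": "C", "in_natural_habitat": True, "whole_animal": True},
]

# Group/rank by quality: highest total score first.
ranked = sorted(studies, key=quality_score, reverse=True)
print([s["id"] for s in ranked])

# Select a threshold: e.g. keep only studies in natural habitat with photographs.
selected = [s["id"] for s in studies if meets_threshold(s, ("in_natural_habitat", "photograph"))]
print(selected)
```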
23 Assessing a Study of a Test (Jaeschke et al., JAMA 1994; 271: 389-91)
- Was an appropriate spectrum of patients included? (Spectrum bias)
- Were all patients subjected to a gold standard? (Verification bias)
- Was there an independent, "blind" comparison with a gold standard? (Observer bias; differential reference bias)
- Were the methods described well enough for you to repeat the test?
24 Diagnostic Accuracy Study: Basic Design
- Series of patients
- Index test
- Reference standard
- Blinded cross-classification
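The blinded cross-classification yields a 2×2 table of index test result against reference standard result, from which sensitivity, specificity, likelihood ratios, and the diagnostic odds ratio (DOR) follow. A minimal Python sketch; the class name `TwoByTwo` and the counts are hypothetical, and no continuity correction is applied, so a zero in `fp` or `fn` would raise a division error:

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of index test result against reference standard."""
    tp: int  # index test +, reference standard +
    fp: int  # index test +, reference standard -
    fn: int  # index test -, reference standard +
    tn: int  # index test -, reference standard -

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def lr_pos(self) -> float:
        return self.sensitivity / (1 - self.specificity)

    @property
    def lr_neg(self) -> float:
        return (1 - self.sensitivity) / self.specificity

    @property
    def dor(self) -> float:
        # Diagnostic odds ratio; algebraically equal to lr_pos / lr_neg.
        return (self.tp * self.tn) / (self.fp * self.fn)


table = TwoByTwo(tp=90, fp=20, fn=10, tn=80)  # hypothetical counts
print(table.sensitivity, table.specificity)
print(round(table.lr_pos, 2), round(table.lr_neg, 3), table.dor)
```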
30 Empirical Effects of Bias
Lijmer JG et al. JAMA 1999;282:
31 Step 4 - Summarising the Evidence (S)
- Extracting data from studies
- Combining data: meta-analysis
- Does it make sense to combine?
32 What is a meta-analysis?
- A way to calculate an average
- Estimates an ‘average’ or ‘common’ effect
- Improves the precision of an estimate by using all available data
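The "average" a meta-analysis computes is usually a weighted one. A minimal fixed-effect (inverse-variance) sketch in Python, assuming each study supplies an estimate on a suitable scale (for example a log diagnostic odds ratio) and its standard error; random-effects methods additionally estimate between-study variance and are not shown. The example numbers are hypothetical:

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate: each study weighted by 1 / SE^2."""
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return pooled, pooled_se


# Hypothetical log diagnostic odds ratios and standard errors from three studies.
log_dors = [2.0, 2.5, 3.0]
ses = [0.5, 0.4, 0.6]
pooled, pooled_se = inverse_variance_pool(log_dors, ses)
print(f"pooled log DOR {pooled:.2f} (SE {pooled_se:.2f}); DOR {math.exp(pooled):.1f}")
```

Because the pooled variance is 1/Σw, the pooled standard error is smaller than that of any single study, which is the gain in precision the slide refers to.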
33 What is a meta-analysis?
- An optional part of a systematic review
[Figure: overlapping circles labelled "Systematic reviews" and "Meta-analyses"]
Notes: A meta-analysis may be part of a systematic review. It may be worth asking participants for examples of when it is not appropriate to combine studies in a meta-analysis. Systematic reviews may include meta-analyses, but a meta-analysis may also be done without systematically reviewing the studies; there are examples of this in journals, and these may therefore be biased. In the US the terms are used interchangeably, but this is not the case in the UK.
35 Threshold effects
- Decreasing the threshold increases sensitivity but decreases specificity
- Increasing the threshold increases specificity but decreases sensitivity
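The trade-off can be seen by sweeping a cut-off over a continuous test result. A minimal Python sketch using hypothetical values for diseased and non-diseased participants, where a result at or above the threshold counts as positive:

```python
# Hypothetical test results (e.g. a biomarker level); higher values suggest disease.
diseased = [4.1, 5.3, 6.0, 6.8, 7.2, 8.5, 9.1, 9.9]
non_diseased = [1.2, 2.0, 2.7, 3.5, 4.4, 5.0, 5.8, 7.0]


def accuracy_at(threshold: float):
    """Sensitivity and specificity when a result >= threshold counts as positive."""
    sens = sum(v >= threshold for v in diseased) / len(diseased)
    spec = sum(v < threshold for v in non_diseased) / len(non_diseased)
    return sens, spec


for threshold in (8.0, 6.0, 4.0):  # decreasing threshold
    sens, spec = accuracy_at(threshold)
    print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these values, lowering the threshold from 8.0 to 4.0 raises sensitivity from 0.375 to 1.0 while specificity falls from 1.0 to 0.5.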
36 Accuracy effects
- Over-estimation of accuracy, e.g. spectrum bias
- Under-estimation of accuracy, e.g. poor reference standard
37 Spectrum effects
- Variation in the diseased study participants
- Variation in the non-diseased study participants
38 Methods of Meta-analysis
- Separate pooling of sensitivity and specificity (and likelihood ratios)
  - Inappropriate when results are highly heterogeneous
  - Underestimates accuracy if there is heterogeneity in threshold
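Why separate pooling underestimates accuracy when thresholds vary: studies at different thresholds lie at different points on the same ROC curve, and because that curve is concave, the average of their sensitivities and specificities falls below it. A minimal Python sketch, assuming five hypothetical studies that share a common DOR of 231 but use different thresholds; the pooling here is an unweighted arithmetic mean, a simplified stand-in for naive separate pooling:

```python
TRUE_DOR = 231.0  # hypothetical common DOR shared by every study
specificities = [0.70, 0.80, 0.90, 0.95, 0.99]  # each study uses a different threshold


def sens_for(spec: float, dor: float) -> float:
    """Sensitivity lying on the constant-DOR curve at this specificity."""
    sens_odds = dor / (spec / (1 - spec))  # DOR = sens_odds * spec_odds
    return sens_odds / (1 + sens_odds)


def dor_of(sens: float, spec: float) -> float:
    return (sens / (1 - sens)) * (spec / (1 - spec))


sensitivities = [sens_for(sp, TRUE_DOR) for sp in specificities]

# Naive separate pooling: average sensitivity and specificity independently.
mean_sens = sum(sensitivities) / len(sensitivities)
mean_spec = sum(specificities) / len(specificities)
print(f"pooled sens {mean_sens:.3f}, pooled spec {mean_spec:.3f}")
print(f"DOR implied by the pooled pair: {dor_of(mean_sens, mean_spec):.0f} vs true {TRUE_DOR:.0f}")
```

Every individual study has a DOR of 231, yet the pooled pair implies a much lower DOR: the pooled point lies inside the curve.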
39 Constant diagnostic odds ratio across thresholds
Sensitivity  Specificity    LR+    LR-   DOR = LR+/LR-
99%          71%           3.44   0.01   231
97%          86%           6.95   0.03   231
94%          94%          15.19   0.07   231
86%          97%          33.21   0.14   231
71%          99%          67.09   0.29   231
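The table can be checked from its LR+ column alone. Because DOR = LR+/LR-, each LR- follows from LR+ and the DOR, and the two equations sens = LR+ × (1 - spec) and 1 - sens = LR- × spec can then be solved for sensitivity and specificity. A minimal Python sketch, taking the DOR as 231 (the constant value in this table); the function name is hypothetical:

```python
# LR+ values from the table; the DOR is the constant 231 shown in the table.
DOR = 231.0
lr_pos_column = [3.44, 6.95, 15.19, 33.21, 67.09]


def from_likelihood_ratios(lr_pos: float, lr_neg: float):
    """Solve sens = LR+ * (1 - spec) and 1 - sens = LR- * spec for (sens, spec)."""
    spec = (lr_pos - 1) / (lr_pos - lr_neg)
    sens = lr_pos * (1 - spec)
    return sens, spec


rows = []
for lr_pos in lr_pos_column:
    lr_neg = lr_pos / DOR  # since DOR = LR+ / LR-
    sens, spec = from_likelihood_ratios(lr_pos, lr_neg)
    rows.append((sens, spec, lr_pos, lr_neg))
    print(f"sens {sens:.0%}  spec {spec:.0%}  LR+ {lr_pos:.2f}  LR- {lr_neg:.2f}")
```

As the threshold moves toward higher specificity, LR+ rises steeply and LR- also rises, while their ratio stays fixed.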
40 Methods of Meta-analysis
- Creation of a summary ROC (SROC) curve
  - DOR is often reasonably consistent across studies
  - Deals with variation in threshold
  - Moses/Littenberg method allows for trends in DOR with threshold
  - Difficult to interpret: there is no unique operating point
  - More advanced methods (HSROC, bivariate normal, random effects) estimate variability and uncertainty in values
- Investigate why studies have different results
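The Moses/Littenberg approach transforms each study's true and false positive rates to the logit scale, computes D = logit(TPR) - logit(FPR) (the log DOR) and S = logit(TPR) + logit(FPR) (a proxy for threshold), fits the line D = a + bS, and back-transforms it into an SROC curve. A minimal Python sketch, assuming unweighted least squares and a 0.5 continuity correction (both common choices, though weighted variants exist); the HSROC and bivariate models need mixed-model software and are not shown. The three 2×2 tables in the example are hypothetical and were chosen so that every study has DOR = 81:

```python
import math


def moses_littenberg(tables, cc=0.5):
    """Fit the Moses-Littenberg SROC line D = a + b*S by unweighted least squares.

    tables: (tp, fp, fn, tn) per study; cc is added to every cell.
    Returns (a, b, tpr_at), where tpr_at(fpr) gives the summary TPR at a chosen FPR.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
        logit_tpr = math.log(tp / fn)  # logit of TP / (TP + FN)
        logit_fpr = math.log(fp / tn)  # logit of FP / (FP + TN)
        d_vals.append(logit_tpr - logit_fpr)  # D: log DOR
        s_vals.append(logit_tpr + logit_fpr)  # S: proxy for threshold
    n = len(tables)
    s_mean, d_mean = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean

    def tpr_at(fpr: float) -> float:
        x = math.log(fpr / (1 - fpr))
        y = (a + (1 + b) * x) / (1 - b)  # solve D = a + b*S for logit(TPR)
        return 1 / (1 + math.exp(-y))

    return a, b, tpr_at


# Hypothetical studies built so that each has DOR = 81 at a different threshold.
tables = [(9, 1, 1, 9), (27, 1, 1, 3), (3, 1, 1, 27)]
a, b, tpr_at = moses_littenberg(tables, cc=0.0)
print(f"DOR at S = 0: {math.exp(a):.1f}, TPR at FPR 0.1: {tpr_at(0.1):.3f}")
```

A slope b near 0 means the DOR does not change with threshold, giving a symmetric curve; a non-zero b is the trend in DOR with threshold that the method allows for.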
41 Variation in studies: Fetal fibronectin in asymptomatic women
[Figure: SROC curve (95% CI) for predicting spontaneous preterm birth before 37 weeks’ gestation; 28 studies]
42 Does it make sense to combine?
- Do we need studies to be exactly the same?
- When can we say we are measuring the same thing?
43 Are the studies consistent?
- Are variations in results between studies consistent with chance? (Tests of homogeneity have low power.)
- If NO, then WHY?
  - Variation in study methods (biases)
  - Variation in intervention
  - Variation in outcome measure (e.g. timing)
  - Variation in population
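One common homogeneity test is Cochran's Q on the log DOR, often reported alongside I², the proportion of total variation attributable to between-study heterogeneity rather than chance. A minimal Python sketch, assuming a 0.5 continuity correction and the usual approximate variance 1/TP + 1/FP + 1/FN + 1/TN; Q is referred to a chi-square distribution with k - 1 degrees of freedom (with SciPy installed, `scipy.stats.chi2.sf(q, df)` gives the p-value). The example tables are hypothetical:

```python
import math


def log_dor_with_variance(tp, fp, fn, tn, cc=0.5):
    """Log DOR and its approximate variance, after adding cc to every cell."""
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def cochran_q(tables):
    """Return (Q, degrees of freedom, I^2) for the log DORs of the given 2x2 tables."""
    estimates, weights = [], []
    for tp, fp, fn, tn in tables:
        y, var = log_dor_with_variance(tp, fp, fn, tn)
        estimates.append(y)
        weights.append(1 / var)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(tables) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical (tp, fp, fn, tn) tables from three studies.
tables = [(45, 5, 5, 45), (30, 10, 10, 30), (10, 20, 40, 30)]
q, df, i2 = cochran_q(tables)
print(f"Q = {q:.1f} on {df} df, I^2 = {i2:.0%}")
```

Because of the low power noted on the slide, a non-significant Q does not show that the studies are consistent.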
46 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Objective: To determine the accuracy with which a cervicovaginal fetal fibronectin test predicts spontaneous preterm birth in women with or without symptoms of preterm labour. (Q)
47 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
- Q - Clearly focused question?
- F - Found all available evidence?
48 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
- Electronic search: Medline (1966-2000), Embase (1980-2000), PASCAL (1973-2001), BIOSIS (1969-2001), the Cochrane Library (2000:4), MEDION (1974-2000; a database of diagnostic test reviews set up by Dutch and Belgian researchers), National Research Register (2000:4), SCISEARCH (1974-2001), and conference papers (1973-2000).
- Grey literature: Contacted individual experts and the manufacturer of the fetal fibronectin test.
- Cross-checking: Checked reference lists of known reviews and primary articles to identify cited articles not captured by electronic searches.
49 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
- Q - Clearly focused question?
- F - Found all available evidence?
- A - Studies are critically appraised?
50 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Bias can be associated with case-control study designs, lack of blinding of carer to test results, nonconsecutive patient enrolment, nonprospective data collection, inadequate test description, use of different reference tests, partial verification, and lack of description of either the population or the reference test.[28] The last four items, however, are not relevant to our review because they refer to delivery of neonates (preterm or term births). Therefore, we considered a study to be of good quality if it used a prospective design, consecutive enrolment, adequate test description (to allow replication by others), and blinding of the test result from clinicians managing the patients.
51 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
- Q - Clearly focused question?
- F - Found all available evidence?
- A - Studies are critically appraised?
- S - Results are adequately synthesised?
52 Exercise II: Fetal fibronectin for predicting spontaneous preterm birth
Subgroups:
- Asymptomatic women: spontaneous preterm birth before 34 and 37 weeks’ gestation
- Symptomatic women: spontaneous preterm birth before 34 and 37 weeks’ gestation, and within 7-10 days of testing
Quantitative summary:
- Used SROC curves as measures of accuracy for all included studies regardless of their thresholds
- Provided summary likelihood ratios (positive and negative)
Heterogeneity:
- Assessed heterogeneity of diagnostic odds ratios graphically and statistically
- Used meta-regression to explore sources of heterogeneity
Sensitivity analysis:
- Estimated accuracy in the highest-quality studies
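Meta-regression and a quality-based sensitivity analysis both ask whether accuracy differs with study characteristics. A minimal Python sketch of the idea, regressing each study's log DOR on one binary covariate (meeting all four good-quality criteria the review used: prospective design, consecutive enrolment, adequate test description, blinding) by fixed-effect weighted least squares. This is a simplification: published meta-regressions usually add a between-study variance term, and the review's exact model is not described here. The key names and study data are hypothetical:

```python
import math

# The review's four good-quality criteria (key names are hypothetical).
GOOD_QUALITY_ITEMS = ("prospective", "consecutive", "test_described", "blinded")


def is_good_quality(study: dict) -> bool:
    return all(study.get(item, False) for item in GOOD_QUALITY_ITEMS)


def log_dor_with_variance(tp, fp, fn, tn, cc=0.5):
    """Log DOR and approximate variance, with continuity correction cc."""
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def meta_regression(studies, covariate):
    """Fixed-effect weighted least squares of log DOR on one study-level covariate.

    Returns (intercept, slope); exp(slope) is the relative DOR per unit of covariate.
    """
    xs, ys, ws = [], [], []
    for s in studies:
        y, var = log_dor_with_variance(*s["table"])
        xs.append(float(covariate(s)))
        ys.append(y)
        ws.append(1 / var)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


flags = {"prospective": True, "consecutive": True, "test_described": True, "blinded": True}
studies = [  # hypothetical studies: (tp, fp, fn, tn) plus quality flags
    {"table": (45, 5, 5, 45), **flags},
    {"table": (40, 8, 6, 50), **flags},
    {"table": (80, 2, 2, 80), "prospective": False},
]
intercept, slope = meta_regression(studies, is_good_quality)
print(f"relative DOR, good vs other quality: {math.exp(slope):.2f}")
```

With a binary covariate, the intercept is the weighted mean log DOR of the other-quality studies and the slope is the difference for the good-quality studies, so exp(slope) compares the two groups directly.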
53 Exercise II
Honest H, Bachmann LM, Gupta JK, Kleijnen J, Khan KS. Accuracy of cervicovaginal fetal fibronectin test in predicting spontaneous preterm birth: systematic review. BMJ 2002; 325: 301-4.
54 Final points
To assess systematic reviews of diagnostic studies, use QFAS:
- Q: The question should be a structured one (PPICO)
- F: Finding studies of diagnostic tools is generally more difficult than finding studies of therapies
55 Final points
- A: Spectrum, verification, differential reference, and observer bias must be taken into account
- S: Summaries are affected by the choice of threshold, population, and reference test
  - Methods are not as well researched as for therapies
  - Heterogeneity analysis is particularly important in these reviews
58 Guidelines for Conducting Systematic Reviews of Diagnostic Studies
For diagnostic reviews:
- Cochrane Methods Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods. Cochrane Collaboration, 1996.
- Deville WL, Buntinx F, Bouter LM, Montori VM, De Vet HC, Van Der Windt DA, Bezemer P. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2(1): 9.
- Deeks JJ. Systematic reviews of evaluations of diagnostic and screening tests. In: Egger M, Smith GD, Altman DG, eds. Systematic reviews in health care. Meta-analysis in context. London: BMJ Publishing Group, 2001: 248-282.
- Irwig L, Macaskill P, Glasziou P, Fahey M. Meta-analytic methods for diagnostic test accuracy. J Clin Epidemiol 1995; 48: 119-30.
- The Bayes Library of Diagnostic Studies and Reviews. 2nd Edition, 2002.
For reporting of diagnostic studies:
- Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC; Standards for Reporting of Diagnostic Accuracy steering group. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326: 41-44.
59 Databases/sources of studies
- Electronic databases:
  - General: Cochrane CENTRAL, PubMed, Embase, etc.
  - Subject-specific: AIDSLINE, CANCERLIT, PsycInfo, MEDION, etc.
- Reference lists of included studies (citation tracking)
- Reference lists of earlier reviews and commentaries
  - CDSR, DARE, MEDION, PubMed search with filters for systematic reviews
- Personal communication with experts and authors
- Contacting drug/device companies
- Hand-searching of key, high-yield journals
- Grey literature: dissertation abstracts, reports, conference proceedings, etc.
- Sources of ongoing studies: trial registers, drug companies, contacting experts
60 Quality assessment
Criteria for validity of diagnostic studies:
- Study design: cross-sectional study of a clinically indicated population, or case-control
- Verification: complete, different reference tests, or partial
- Blinding: blinded or not
- Patient selection: consecutive, random, or nonconsecutive
- Data collection: prospective or retrospective
- Appropriateness of reference standard
- Description of test
- Description of study population
Lijmer et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282: 1061.
61 Present absolute numbers for test results
[Figure: Distribution of plasma concentrations of B-type natriuretic peptide in normal elderly people and in those with left ventricular systolic dysfunction confirmed by echocardiography. BMJ, 2000; 320:]