Addressing differences in rigour and relevance of evidence – a review of existing methods Rebecca Turner, David Spiegelhalter, Simon Thompson MRC Biostatistics Unit, Cambridge
Outline
– Why address rigour and relevance?
– Review of methods for addressing rigour and relevance
– Bias modelling and using external information on sources of bias
– Ongoing work and issues for discussion
Differences in rigour (internal bias)
Examples of internal bias:
– Inadequacy of randomisation/allocation concealment in RCTs
– Non-compliance and drop-out in RCTs
– Selection bias and non-response bias in observational studies
– Confounding in observational studies
– Misclassification of exposure in case-control studies
Evidence synthesis: usual approach
– Choose a minimum standard
– Include studies which achieve standard and make no allowance for further differences between these
– Exclude studies failing to reach standard
Problems with usual approach to rigour
– Relevant evidence, in some cases the majority of available studies, is discarded
  – undesirable effects on precision (and bias?)
– No allowance for differences in rigour between studies included in combined analysis
  – once minimum criteria achieved, less rigorous studies given equal influence in analysis
  – policy decisions may be based on misleading results?
Differences in relevance (external bias)
Examples of external bias:
– Study population different from target population
– Outcome similar but not identical to target outcome
– Interventions different from target interventions, e.g. dose
Evidence synthesis: usual approach
– Similar to that for rigour: studies which achieve minimum standard included and no allowance for further differences
– Sometimes separate analyses carried out for different types of population/intervention
Degrees of relevance are specific to target setting, so decisions on relevance are necessarily rather subjective
Example 1: donepezil for treatment of dementia due to Alzheimer’s disease
Analysis reported by Birks and Harvey (2003) included 17 double-blind placebo-controlled RCTs only.
40 relevant comparative studies identified: 17 double-blind placebo-controlled RCTs; 1 single-blind placebo-controlled RCT; 14 non-randomised and/or open-label studies; 8 donepezil vs. active comparisons (1 randomised)
Example 2: modified donepezil example
25 relevant comparative studies identified: 1 double-blind placebo-controlled RCT; 2 single-blind placebo-controlled RCTs; 14 non-randomised and/or open-label studies; 8 donepezil vs. active comparisons (1 randomised)
– Include only randomised studies? Allow for degree of blinding?
– Include all studies? Allow for degree of blinding and randomisation?
– Allow for additional sources of bias, and deviations from target population, outcome, details of intervention?
Methods for addressing differences in rigour and relevance
Existing approaches:
– Methods based on quality scores
– Random effects modelling of bias
– Full bias modelling using external information on specific sources of bias
Methods based on quality scores
– Exclude studies below a quality score threshold
– Weight the analysis by quality
– Examine relationship between effect size and quality score
– Cumulative meta-analysis according to quality score
Problems include:
– Difficult to capture quality in a single score
– Quality items represented may be irrelevant to bias
– No allowance for direction of individual biases
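As a rough illustration of "weight the analysis by quality", the sketch below multiplies each study's inverse-variance weight by a quality score between 0 and 1. This is only one of several possible weighting schemes, and the function name, the scores and the effect estimates are all hypothetical.

```python
def pooled_estimate(estimates, variances, quality=None):
    """Weighted mean of study estimates with weights quality / variance.

    With quality=None every study gets quality 1, which reduces to the
    usual fixed-effect inverse-variance average.
    """
    if quality is None:
        quality = [1.0] * len(estimates)
    weights = [q / v for q, v in zip(quality, variances)]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, estimates)) / total


# Hypothetical effect estimates, equal variances, decreasing quality scores.
estimates = [0.10, 0.40, 0.70]
variances = [0.04, 0.04, 0.04]
quality = [1.0, 0.5, 0.25]

unweighted = pooled_estimate(estimates, variances)
weighted = pooled_estimate(estimates, variances, quality)
print(f"unweighted: {unweighted:.3f}, quality-weighted: {weighted:.3f}")
```

The weighting only shifts influence towards higher-scoring studies. It says nothing about the direction in which a low-quality study might be biased, which is one of the problems listed on the slide.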
Random effects modelling of bias
– Assume that each study $i$ estimates a biased parameter $\theta_i = \theta + \delta_i$ rather than the target parameter $\theta$
– Choose a distribution to describe plausible size (and direction) of the bias $\delta_i$ for each study
– Standard random-effects analysis is equivalent to assuming $E[\delta_i] = 0$, $V[\delta_i] = \tau^2$
– This assumption of common uncertainty about study biases seems rather strong
Hip replacements example
– Comparison of hip replacements (Charnley vs. Stanmore)
– Endpoint: patient requires revision operation
– Three studies available: registry data, RCT, case series
– Assumptions: RCT evidence unbiased; bias in case series > bias in registry data
Spiegelhalter and Best, 2003
Hip replacements example: allowing for bias
Values assigned to variance of bias, which controls the extent to which evidence will be downweighted.
Problems:
– How to choose the values which control the weighting?
– No separation of internal and external bias.
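To show how a bias variance controls downweighting, here is a minimal sketch. It assumes zero expected bias, independent studies and simple inverse-variance pooling, so each study's weight becomes 1 / (sampling variance + bias variance). It is not the model fitted by Spiegelhalter and Best (2003), and all numbers are hypothetical. They are chosen only to respect the stated ordering: no bias for the RCT, and a larger bias variance for the case series than for the registry.

```python
def bias_adjusted_pool(estimates, variances, bias_variances):
    """Inverse-variance pooling with each study's sampling variance
    inflated by its assumed bias variance (expected bias taken as zero).

    Returns (pooled estimate, pooled variance, share of weight per study).
    """
    weights = [1.0 / (v + b) for v, b in zip(variances, bias_variances)]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    shares = [w / total for w in weights]
    return pooled, 1.0 / total, shares


# Hypothetical values for illustration only.
studies = ["registry", "RCT", "case series"]
estimates = [0.30, 0.10, 0.50]       # e.g. log hazard ratios
variances = [0.01, 0.04, 0.02]       # sampling variances
bias_variances = [0.05, 0.00, 0.20]  # assumed: RCT unbiased, case series worst

naive = bias_adjusted_pool(estimates, variances, [0.0, 0.0, 0.0])
adjusted = bias_adjusted_pool(estimates, variances, bias_variances)

for name, before, after in zip(studies, naive[2], adjusted[2]):
    print(f"{name:12s} weight share {before:.2f} -> {after:.2f}")
```

In this toy setting the RCT's share of the weight rises and the pooled variance grows. The result depends entirely on the chosen bias variances, which is the first problem raised on the slide.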
Full bias modelling
– Identify sources of potential bias in available evidence
– Obtain external information on the likely form of each bias
– Construct a model to correct the data analysis for multiple biases, on the basis of external information
Example (Greenland, 2005): 14 case-control studies of association between residential magnetic fields and childhood leukaemia.
Potential biases identified: non-response, confounding, misclassification of exposure
Magnetic fields example: allowing for bias
Bias corrected for    OR     95% CI          P-value
Non-response          1.45   (0.94, 2.28)    0.05
Confounding           1.69   (1.32, 2.33)    0.002
Misclassification     2.92   (1.42, 35.1)    0.011
All three biases      2.70   (0.99, 32.5)    0.026
Conventional analysis
Odds ratio for leukaemia, fields >3mG vs. ≤3mG: 1.68 (1.27, 2.22), P-value of
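One quick way to read these results is to compare how wide each interval is on the ratio scale (upper limit divided by lower limit). The sketch below does that arithmetic using only the odds ratios and interval limits quoted on the slide. The ratio is an informal summary chosen here for illustration, not a quantity reported by Greenland (2005).

```python
# (odds ratio, lower limit, upper limit) as quoted on the slide.
results = {
    "Conventional analysis": (1.68, 1.27, 2.22),
    "Non-response": (1.45, 0.94, 2.28),
    "Confounding": (1.69, 1.32, 2.33),
    "Misclassification": (2.92, 1.42, 35.1),
    "All three biases": (2.70, 0.99, 32.5),
}


def interval_ratio(lower, upper):
    """Upper limit divided by lower limit: a crude measure of interval width."""
    return upper / lower


ratios = {name: interval_ratio(lo, hi) for name, (_, lo, hi) in results.items()}
widest = max(ratios, key=ratios.get)

for name, ratio in ratios.items():
    print(f"{name:22s} upper/lower = {ratio:6.2f}")
```

The intervals that allow for misclassification are far wider than the conventional one, which illustrates how uncertainty about bias can dominate uncertainty due to random error.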
Choosing values for the bias parameters
Bias due to an unknown confounder U:
– Need to express prior beliefs for:
  – OR relating U to magnetic fields (within leukaemia strata)
  – OR relating U to leukaemia (within magnetic fields strata)
– Greenland (2005) chooses wide distributions giving 5th and 95th percentiles of 1/6 and 6.
Multiple studies
– Greenland expects degree of confounding to vary according to study location and method of measuring magnetic fields
– Uses location, measurement method as predictors for log ORs
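To give a feel for what "5th and 95th percentiles of 1/6 and 6" implies, the sketch below assumes, purely for illustration, that the log odds ratio has a normal prior. The slide does not state the distributional family, so this is an assumption rather than Greenland's specification. Under that assumption the two percentiles fix the prior mean and standard deviation.

```python
import math
from statistics import NormalDist


def normal_prior_on_log_or(lower, upper, coverage=0.90):
    """Mean and SD of a normal prior on log(OR) whose central `coverage`
    interval on the OR scale runs from `lower` to `upper`."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # 95th percentile of N(0, 1)
    mu = (math.log(lower) + math.log(upper)) / 2.0
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return mu, sigma


mu, sigma = normal_prior_on_log_or(1 / 6, 6)

# Check: transform the prior's percentiles back to the odds ratio scale.
prior = NormalDist(mu, sigma)
p05 = math.exp(prior.inv_cdf(0.05))
p95 = math.exp(prior.inv_cdf(0.95))
print(f"mean log OR = {mu:.3f}, SD log OR = {sigma:.3f}")
print(f"5th percentile OR = {p05:.3f}, 95th percentile OR = {p95:.3f}")
```

Under this assumed normal form the prior is centred on an odds ratio of 1, with a standard deviation of a little over 1 on the log scale.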
Example: ETS (environmental tobacco smoke) and lung cancer
Wolpert and Mengersen, 2004: 29 case-control and cohort studies of association between ETS and lung cancer.
Potential biases identified: eligibility violations, misclassification of exposure, misclassification of cases
– Penalty points represent each study’s control of each bias
– Error rates assumed to increase with each penalty point
– E.g. eligibility violation: 5% for typical studies, doubles with each penalty point
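The doubling rule for eligibility violations can be written as a one-line function. The sketch below assumes that a "typical" study corresponds to zero penalty points and caps the rate at 100%. Both choices, and the function name, are illustrative assumptions and are not taken from Wolpert and Mengersen (2004).

```python
def eligibility_error_rate(penalty_points, base_rate=0.05):
    """Assumed error rate: base_rate for zero penalty points, doubling
    with each additional point and capped at 1.0 (cap is an assumption)."""
    if penalty_points < 0:
        raise ValueError("penalty_points must be non-negative")
    return min(1.0, base_rate * 2 ** penalty_points)


for points in range(4):
    print(f"{points} penalty point(s): {eligibility_error_rate(points):.0%}")
```

Because the rate doubles at every step, it rises from 5% to 40% within three penalty points, so the choice of penalty scale has a large effect on the adjusted analysis.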
Arguments against bias modelling
– Impossible to identify all sources of bias
– Little information on the likely effects of bias, even for known sources
– Bias modelling requires external (subjective) input, rather than letting the data “speak for themselves”
– Increases complexity of analysis
  – problems with presentation and interpretation
Arguments for bias modelling
– Assumption of zero bias is extremely implausible in most analyses (although zero expected bias may be reasonable)
– Uncertainty due to potential biases may be much larger than uncertainty due to random error
– Informal discussion of the possible effects of bias is not sufficient
– Preferable to include all relevant data and model bias, rather than throwing much of the data away?
Aims of planned work
– Allow for both rigour and relevance (internal & external bias)
– Consider potential sources of bias, and available evidence on plausible sizes of biases
– Construct simple models for adjustment
– Develop elicitation strategy for obtaining judgements on reasonable size of unmodelled sources of bias
– Develop strategy for sensitivity analysis
Simple models for bias
– Require 4 bias parameters for each study: two (subscript RIG) control rigour, two (subscript REL) control relevance
Challenges
– Problem of multiple biases is complex, but approach for correction must be simple and accessible. Otherwise evidence synthesis will, in general, continue to exclude some studies and make no allowance for differences between others.
– When correcting for multiple biases, important to determine a strategy for sensitivity analysis.
Issues for discussion
– Credibility of findings which incorporate external information in addition to data
  – More acceptable when available evidence is scarce and expected to be biased than when many RCTs available?
– Greenland and others argue that analysis corrected for biases should be treated as definitive analysis (i.e. not only sensitivity analysis) – is this a realistic aim?
References
Eddy DM, Hasselblad V, Shachter R. Meta-analysis by the Confidence Profile Method. Academic Press: San Diego,
Greenland S. Multiple-bias modelling for analysis of observational data. Journal of the Royal Statistical Society Series A 2005; 168:
Spiegelhalter DJ, Best NG. Bayesian approaches to multiple sources of evidence and uncertainty in complex cost-effectiveness modelling. Statistics in Medicine 2003; 22:
Wolpert RL, Mengersen KL. Adjusted likelihoods for synthesizing empirical evidence from studies that differ in quality and design: effects of environmental tobacco smoke. Statistical Science 2004; 19: