1 Plan: GRADE background; certainty in evidence (quality, confidence in evidence); evidence profiles; strength of recommendation; exercises in applying GRADE
2 Experience participating in guideline panels? Clinical epidemiology methodology course? Is grading recommendations a good idea? If so, why? Experience with grading systems used?
3 Grading is a good idea, but which grading system to use? Many are available: Australian National Health and Medical Research Council (NHMRC); Oxford Centre for Evidence-Based Medicine; Scottish Intercollegiate Guidelines Network (SIGN); US Preventive Services Task Force; American professional organizations (AHA/ACC, ACCP, AAP, Endocrine Society, etc.). The result: a cause of confusion and dismay
5 Common international grading system? GRADE (Grading of Recommendations Assessment, Development and Evaluation): international group (Australian NHMRC, SIGN, USPSTF, WHO, NICE, Oxford CEBM, CDC, CC); ~35 meetings over the last 14 years (~10 to 70 attendees)
6 GRADE guidance: 2004 BMJ, first description; 2008 BMJ, six-part series for guideline users; 21-part series, 15 published, for systematic review authors, HTA practitioners, guideline developers
7 Grading system, for what? Interventions: management strategy 1 versus 2. What GRADE is not about: individual studies (it rates a body of evidence)
8 What GRADE is not primarily about: diagnostic accuracy questions (in patients with a sore leg, what is the accuracy of a blood test (D-dimer) in sorting out whether a deep venous thrombosis is the cause of the pain?); prognosis. What it is about: diagnostic impact (are patients better off, with improved outcomes, when doctors use the D-dimer test?)
11 What are we grading? Two components. Certainty in estimate of effect adequate to support decision (quality of body of evidence): high, moderate, low, very low
12 Likelihood of and confidence in an outcome. We can look at this as depicted in this cartoon. In the cartoon, one meteorologist says to another, "I figure there is a 40% chance of showers and a 10% chance we know what we are talking about." This expresses both the likelihood that an outcome occurs and our confidence in that estimate. For instance, the confidence interval around the 40% chance of showers may be very tight; it may be based on modeling that yields an interval of 35% to 45%. However, the development of the model, or its application from one setting to another, may leave us with very little confidence that the estimate is actually correct for the particular setting. Just imagine the model being developed in Australia and applied to North America. This is similar to how we look at confidence in evidence in the GRADE approach.
13 Semantic issue: label for trustworthiness. Quality: initial choice, defined as confidence; natural to clinicians, but confusion with risk of bias. Confidence: what we actually mean, but confusion with confidence intervals, and experts are always confident. Certainty: avoids the confusion of the others, and experts might acknowledge uncertainty; current preferred term
14 What are we grading? Two components. Certainty in evidence adequate to support decision (quality of body of evidence): high, moderate, low, very low. Strength of recommendation: strong and weak; alternatives to "weak": conditional, contingent, discretionary
15 Generate an estimate of effect for each outcome (flow chart). Health care question (PICO), then systematic reviews of studies (S1 to S5), then outcomes (OC1 to OC4), classified as important outcomes (OC1, OC2) or critical outcomes (OC3, OC4). Generate an estimate of effect for each outcome. Rate the quality of evidence for each outcome, across studies: RCTs start high, observational studies start low. Rate down (-) for: study limitations; imprecision; inconsistency of results; indirectness of evidence; publication bias likely. Rate up (+) for: large magnitude of effect; dose response; plausible confounders that would ↓ the effect when an effect is present or ↑ the effect if an effect is absent. Final rating of quality for each outcome: high, moderate, low, or very low. Rate overall quality of evidence (lowest quality among critical outcomes). Decide on the direction (for/against) and grade strength (strong/weak*) of the recommendation considering: quality of the evidence; balance of desirable/undesirable outcomes; values and preferences. Decide if any revision of direction or strength is necessary considering resource use. *Also labeled "conditional" or "discretionary"
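The rating steps in this flow chart can be read as simple level arithmetic. The following is a minimal Python sketch under that reading, not an official GRADE tool: the function name `rate_certainty` and the idea of counting levels as integers are assumptions made for illustration, and in practice each decision to rate down or up is a judgment, not a formula.

```python
# Illustrative sketch only: GRADE ratings are judgments, not computed values.
LEVELS = ["very low", "low", "moderate", "high"]


def rate_certainty(design: str, levels_down: int = 0, levels_up: int = 0) -> str:
    """Return the certainty rating for one outcome across studies.

    design: "rct" (starts high) or "observational" (starts low).
    levels_down: total levels rated down (study limitations, imprecision,
        inconsistency, indirectness, publication bias).
    levels_up: total levels rated up (large effect, dose response,
        plausible confounding working against the observed result).
    """
    start = {"rct": 3, "observational": 1}[design]
    score = start - levels_down + levels_up
    # Clamp to the four available levels.
    return LEVELS[max(0, min(len(LEVELS) - 1, score))]


print(rate_certainty("rct", levels_down=1))          # moderate
print(rate_certainty("observational", levels_up=1))  # moderate
```

The clamp reflects that there are only four levels: a body of RCT evidence rated down four times still ends at very low.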
16 Structured question. Patients: males over 50 presenting with fatigue, malaise and erectile dysfunction, with laboratory evidence of decreased testosterone. Intervention: testosterone. Comparator: no testosterone. Outcomes?
17 Rating certainty: where to start? RCTs and observational studies (high, moderate, low, very low)? Recall antioxidant vitamins: observational studies showed less cancer and fewer CV outcomes; RCTs showed no difference; result observed repeatedly. What went wrong?
18 Determinants of confidence: RCTs start high; observational studies start low. What can lower confidence? Risk of bias; inconsistency; indirectness; imprecision; publication bias
19 Risk of bias, RCTs: what to consider? Well established: concealment; intention-to-treat principle observed; blinding; completeness of follow-up. More recent: selective outcome reporting bias; stopping early for benefit
20 RoB, observational studies: what to consider? Accurate assessment of exposure; adjusted analysis for all important prognostic factors, accurately measured; accurate assessment of outcome; completeness of follow-up
21 Risk of bias differs: what to do? 6 studies, 100 patients each; 3 studies at low risk of bias, 3 at high. Rate down for risk of bias?
26 Homogeneous: test for heterogeneity. What is the null hypothesis for the test for heterogeneity? H0: RR1 = RR2 = RR3 = RR4. What is the p-value? p = 0.99 for heterogeneity
27 Heterogeneous: test for heterogeneity. What is the p-value? p-value for heterogeneity < 0.001
28 I² interpretation: 0%, no worries; 25%, only a little concerned; 50%, getting concerned; 75%, very concerned; 100%, why are we pooling?
29 Homogeneous: what is the I²? p = 0.99 for heterogeneity; I² = 0%
30 Heterogeneous: what is the I²? p-value for heterogeneity < 0.001; I² = 89%
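For readers who want to see where an I² value comes from, here is a minimal Python sketch using the usual definition I² = max(0, (Q - df) / Q), where Q is Cochran's Q computed from inverse-variance weights. The function names and the study values are hypothetical; they are not the data behind the forest plots on these slides.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q: weighted sum of squared deviations from the pooled effect."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))


def i_squared(effects, std_errors):
    """I² in percent: share of variability beyond what chance would explain."""
    q = cochran_q(effects, std_errors)
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log relative risks and standard errors for four studies.
similar = [-0.20, -0.21, -0.19, -0.20]
scattered = [-0.70, 0.00, 0.50, -0.20]
std_errors = [0.10, 0.10, 0.10, 0.10]

print(round(i_squared(similar, std_errors), 1))
print(round(i_squared(scattered, std_errors), 1))
```

With nearly identical study results Q falls below its degrees of freedom and I² is truncated to 0%; with widely scattered results I² approaches 100%.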
31 Relative Risk with 95% CI for Vitamin D (Non-Vertebral Fractures)
32 Relative Risk with 95% CI for Vitamin D (Non-Vertebral Fractures, Dose >400)
33 Relative Risk with 95% CI for Vitamin D (Non-Vertebral Fractures, Dose = 400)
34 Should we believe sub-group analysis? Within-study comparison? No. Unlikely to be chance? Yes, p = 0.006. Consistent across studies? Yes. One of a small number of a priori hypotheses, with direction specified? Yes. Biologically compelling? Yes. Shall we believe the sub-group analysis?
35 Credibility of sub-group analysis: scale from "no way" to "sure thing" (100)
36 Confidence judgments: directness. Populations: older, sicker or more co-morbidity. Interventions: warfarin in trials vs clinical practice. Outcomes: important versus surrogate outcomes, e.g. glucose control versus CV events
37 Hierarchy of outcomes according to their patient-importance: effect of phosphate-lowering drugs in patients with renal failure and hyperphosphatemia. [Figure: importance of endpoints on a scale from 9 to 1, banded as critical for decision making; important, but not critical for decision making; and of low patient-importance. Patient-important outcomes: mortality (9); myocardial infarction (8); fractures; pain due to soft tissue calcification / function (6); flatulence (low importance). Surrogates of declining importance: coronary calcification and Ca2+/P product for myocardial infarction; bone density and Ca2+/P product for fractures; soft tissue calcification and Ca2+/P product for pain. Surrogates are lowered by one level or by two levels for indirectness.]
38 Directness: interested in A versus B; available data are A vs C and B vs C. Example: alendronate, risedronate, placebo
39 Imprecision: small sample size; small number of events; wide confidence intervals; uncertainty about magnitude of effect. How do you decide what is too wide? Primary criterion: would decisions differ at the ends of the CI?
40 Precision: atrial fibrillation at risk of stroke. Warfarin increases serious GI bleeding by 3% per year. Per 1,000 patients, 1 fewer stroke: 30 more bleeds for each stroke prevented. Per 1,000 patients, 100 fewer strokes: about 3 strokes prevented for each bleed. Where is your threshold? How many strokes must be prevented per 100 patients to accept 3% bleeding?
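The arithmetic behind these two scenarios is small enough to write out. This is a sketch with hypothetical function names; the only inputs are the slide's figures (3% bleeding, i.e. 30 extra bleeds per 1,000 patients, and either 1 or 100 strokes prevented per 1,000).

```python
def bleeds_per_stroke_prevented(strokes_prevented_per_1000, extra_bleeds_per_1000=30):
    """Extra serious bleeds incurred for each stroke prevented."""
    return extra_bleeds_per_1000 / strokes_prevented_per_1000


def strokes_prevented_per_bleed(strokes_prevented_per_1000, extra_bleeds_per_1000=30):
    """Strokes prevented for each extra serious bleed."""
    return strokes_prevented_per_1000 / extra_bleeds_per_1000


print(bleeds_per_stroke_prevented(1))               # 30.0
print(round(strokes_prevented_per_bleed(100), 1))   # 3.3
```

The threshold question on the slide amounts to asking which value of `strokes_prevented_per_1000` between these extremes a patient or panel would accept.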
45 Example: clopidogrel or ASA? Patients with threatened stroke; RCT of clopidogrel vs ASA, 19,185 patients; ischaemic stroke, MI, or vascular death compared: 939 events (5.32%) with clopidogrel, 1021 events (5.83%) with aspirin; RR 0.91 (95% CI 0.83 to 0.99), p = 0.043. Rate down for precision?
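One way to approach the "rate down for precision?" question is to convert the ends of the confidence interval into absolute effects and ask whether a decision threshold falls between them. The sketch below assumes the aspirin event rate quoted above (5.83%) as the baseline risk; the function names and the 0.5% threshold are hypothetical, chosen only to show the mechanics. A different baseline risk gives a different absolute range.

```python
def absolute_reduction(control_risk, rr):
    """Absolute risk reduction implied by a relative risk at a given baseline."""
    return control_risk * (1.0 - rr)


def decision_could_differ(control_risk, rr_low, rr_high, threshold):
    """True if the threshold lies strictly between the absolute effects at the CI ends."""
    largest = absolute_reduction(control_risk, rr_low)    # most favourable end
    smallest = absolute_reduction(control_risk, rr_high)  # least favourable end
    return smallest < threshold < largest


control_risk = 0.0583  # aspirin arm event rate from the slide, used as an assumption
print(f"point estimate: {absolute_reduction(control_risk, 0.91):.2%}")
print(f"range: {absolute_reduction(control_risk, 0.99):.2%} "
      f"to {absolute_reduction(control_risk, 0.83):.2%}")
# Hypothetical threshold: smallest absolute benefit judged worth the cost.
print(decision_could_differ(control_risk, 0.83, 0.99, threshold=0.005))
```

If the function returns True, the two ends of the interval would lead to different decisions, which is the slide's primary criterion for rating down for imprecision.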
46 Clopidogrel or ASA for threatened vascular events: RCT, 19,185 patients; RR 0.91 (95% CI 0.83 to 0.99). [Figure labels: 1.7% to 0.1%; 1.0%]
50 Small trials, large effect: likely to be an overestimate; analogy to stopping early; lack of prognostic balance. Solution: optimal information size, the number of patients from a conventional sample size calculation; specify control group risk, α, β, Δ
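A conventional sample size calculation for two proportions can serve as a sketch of the optimal information size. The formula below is the standard normal-approximation one; treating Δ as a relative risk reduction, the function name, and the example inputs (10% control risk, 25% relative risk reduction) are all assumptions for illustration and do not come from the slides.

```python
from math import ceil
from statistics import NormalDist


def optimal_information_size(control_risk, rrr, alpha=0.05, beta=0.20):
    """Total patients (both arms) for a two-sided test of two proportions.

    control_risk: event risk in the control group.
    rrr: relative risk reduction to detect (the assumed reading of Δ).
    alpha: two-sided type I error; beta: type II error (power = 1 - beta).
    """
    treated_risk = control_risk * (1.0 - rrr)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    variance = (control_risk * (1.0 - control_risk)
                + treated_risk * (1.0 - treated_risk))
    delta = control_risk - treated_risk
    n_per_group = (z_alpha + z_beta) ** 2 * variance / delta ** 2
    return 2 * ceil(n_per_group)


# Hypothetical inputs, for illustration only.
print(optimal_information_size(control_risk=0.10, rrr=0.25))
```

A meta-analysis whose total sample falls well short of this number is a candidate for rating down for imprecision even when the pooled confidence interval looks narrow.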
51 Fluoroquinolone prophylaxis in neutropenia: infection-related mortalityTotal number of events: 47
57 What can raise confidence? What do you do when there is high certainty but no RCTs? Common criteria: everyone used to do badly; almost everyone does well; quick action. Examples: insulin for diabetic ketoacidosis? Thyroxine for thyroid deficiency? Hydrocortisone for adrenal insufficiency?
58 Dose-response gradient: childhood lymphoblastic leukemia, risk for CNS malignancies 15 years after cranial irradiation. No radiation: 1% (95% CI 0% to 2.1%); 12 Gy: 1.6% (95% CI 0% to 3.4%); 18 Gy: 3.3% (95% CI 0.9% to 5.6%)
60 Overall level of evidence: what to do when certainty differs across outcomes? Options: ignore all but primary; previous approach; least certainty of any outcome; some blended approach; least certainty of critical outcomes
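The last option, taking the least certainty among critical outcomes, is easy to state in code. This is a minimal sketch; the function name, the tuple layout and the example outcomes are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]


def overall_certainty(outcomes):
    """Lowest certainty rating among outcomes flagged as critical.

    outcomes: iterable of (name, certainty, is_critical) tuples.
    """
    critical = [certainty for _, certainty, is_critical in outcomes if is_critical]
    if not critical:
        raise ValueError("at least one critical outcome is required")
    return min(critical, key=LEVELS.index)


# Hypothetical evidence profile, for illustration only.
profile = [
    ("mortality", "high", True),
    ("stroke", "moderate", True),
    ("flatulence", "very low", False),  # not critical, so it is ignored
]
print(overall_certainty(profile))  # moderate
```

Under the "least certainty of any outcome" option the same profile would be rated very low, which shows why restricting the rule to critical outcomes matters.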
61 Trading off desirable and undesirable: what do patients/clinicians need to know? Relative risk reduction? Absolute risk difference? Toxic treatment with 50% RRR in mortality: OK? 1% to 0.5%: OK? 40% to 20%: OK? Body of evidence: how do we get the risk difference?
62 How to get absolute? Meta-analysis: get the pooled relative risk; obtain the baseline risk and multiply. BR 10%, RRR 50%, RD 5%. Why not get the risk difference directly?
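The multiplication on this slide, applied to the baseline risks mentioned in the deck (1%, 10% and 40%), can be sketched as follows. The function name is hypothetical; the point is that a constant relative effect yields very different absolute effects as baseline risk changes.

```python
def risk_difference(baseline_risk, rrr):
    """Absolute risk difference from a baseline risk and a relative risk reduction."""
    return baseline_risk * rrr


for baseline in (0.01, 0.10, 0.40):
    rd = risk_difference(baseline, 0.50)
    print(f"baseline {baseline:.0%}, RRR 50%: risk difference {rd:.1%}")
```

The spread of absolute effects across baselines is also one reason a directly pooled risk difference can mislead when studies enrol populations at different baseline risk.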
65 High versus low PEEP in ALI and ARDS
Population | No. of participants (trials) † | Higher PEEP | Lower PEEP | Adjusted relative risk (95% CI; P-value) ‡ | Adjusted absolute risk difference (95% CI) | Quality
Patients with ARDS | 1892 (3) | 324/951 (34.1%) | 368/941 (39.1%) | 0.90 (0.81 to 1.00; 0.049) | -3.9% (-7.4% to -0.04%) | High
Patients without ARDS | 404 (3) | 50/184 (27.2%) | 41/220 (18.6%) | 1.37 (0.98 to 1.92; 0.065) | 6.9% (-0.4% to 17.1%) | Moderate (imprecision)
66 Strength of recommendation. Strong recommendation: benefits clearly outweigh risks/hassle/cost, or risks/hassle/cost clearly outweigh benefits. What can downgrade strength? Low confidence in estimates; close balance between upsides and downsides
67 Risk/benefit tradeoff. Aspirin after myocardial infarction: 25% reduction in relative risk; side effects minimal, cost minimal; benefit obviously much greater than risk/cost. Warfarin in low-risk atrial fibrillation: warfarin reduces stroke vs ASA by 50%, but if the risk is only 1% per year, the ARR is 0.5%; bleeds increase by 1% per year. The recommendation is clear in the first example because benefits are moderate to large and risks or costs are minimal. It is equivocal in the second because benefits are slightly smaller, risks greater, and costs greater, which makes it a close call (or at least some might think so).
68 Strength of recommendations. Aspirin after MI: do it. Warfarin rather than ASA in AFib: probably do it, or probably don't do it
72 Significance of strong vs weak. Variability in patient preference: strong, almost all make the same choice (> 90%); weak, choice varies appreciably. Interaction with patient: strong, just inform the patient; weak, ensure the choice reflects values. Use of decision aid: strong, don't bother; weak, use the aid. Quality of care criterion: strong, consider; weak, don't consider
73 When evidence is low confidence: choice is more preference dependent; risk aversion. Example: steroids for pulmonary fibrosis, with low quality evidence in support of benefit and high quality evidence of toxicity
74 When confidence is low. Recommendation to the hopeful patient: "I'm likely to deteriorate; if something might work, let's try it; damn the torpedoes." Recommendation to the fearful patient: "Doctor, you mean you know it's toxic (diabetes, skin changes, body habitus, infection, osteoporosis) and you don't know for sure it works? Are you crazy?" Weak recommendation mandated
75 Presentation. Strong: "we recommend…". Weak: "we suggest…". Never: "we recommend (or suggest) you consider…"
76 Challenge: comparator often not clear. Examples: "Children with suspected or confirmed tuberculous meningitis should be treated with a four-drug regimen (HRZE) for 2 months, followed by a two-drug regimen (HR) for 10 months." "Offer and promote postpartum and post-abortion contraception to adolescents through multiple home visits and/or clinic visits."
77 Strong recommendations, low certainty: discordant recommendations. Experts use them often. Why? What are the possibilities?
78 Why all the inappropriate strong recommendations? Panels don't believe their own confidence ratings; personal conviction trumps evidence; belief that weak recommendations are ignored; influence funders
79 Discordant recommendations: what are the possibilities? Good practice; mistaken judgment; inappropriate; exceptional situation, they got it right
80 Good practice statements. Example: "For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess." Wealth of indirect linked evidence; high confidence in net benefit (benefit clear, minimal harms or costs); poor use of guideline panel time and effort to summarize
81 Summarizing evidence is a poor use of time. Symptoms and signs appear not infrequently: collect cohort studies of incidence; studies of accuracy of symptoms and signs. Patients suffer if clinicians fail to recognize: reports of untreated glucocorticoid excess. Clinical action can ameliorate the problem: evidence supporting therapy. Describe how the evidence is linked
82 Questions panels considering a good practice statement should ask: Is the statement clear and actionable? Is the message really necessary? Is the net benefit large and unequivocal? Is the evidence difficult to collect and summarize? If a public health guideline, are there specific issues that should be considered (e.g. equity)? Have you made the rationale explicit? Is this better to be formally GRADEd?
83 Clear and actionable? "For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess." Monitor how often? Nature of monitoring? What to do if signs of excess are found?
84 Really necessary? "For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess." Is it really plausible that clinicians won't monitor? If not, not necessary
85 Provide rationale: relevant symptoms and signs appear not infrequently; patients will suffer if clinicians fail to recognize these signs; clinical action can ameliorate the problem
86 1. LQE (low quality evidence) in a life-threatening situation: fresh frozen plasma and intracranial bleed. 2. LQoE suggests benefit and HQoE suggests harm: head-to-toe CT/MRI screening for cancer. 3. LQoE suggests equivalence, HQoE suggests less harm for one alternative: Helicobacter pylori eradication in early stage gastric MALT lymphoma. 4. HQoE suggests equivalence, LQoE suggests harm in one alternative: ACEI in hypertension in women planning conception and in pregnancy. 5. HQoE suggests benefit in one outcome, LQoE suggests harm in a more highly valued outcome: testosterone in males with or at risk of prostate cancer
91 Methods: systematic survey of all published ES guidelines between 2005 and 2011; screening and extraction in duplicate; for each recommendation: confidence in estimates, strength of recommendation; for strong recommendations based on LQE, taxonomy of paradigmatic recommendations applied
92 If they did not fit one of 5 paradigms
Condition | Example
1. Best practice statements | For patients with congenital adrenal hyperplasia, we recommend monitoring patients for signs of glucocorticoid excess
2. Additional research | We recommend additional investigation using rodents and primates to further define the specific targets of androgen action
3. Mistaken judgment | For overweight and obese children and adolescents, intensive lifestyle modification for the patient and entire family
4. Inappropriate strong recommendation | In patients with primary aldosteronism who are unable or unwilling to undergo laparoscopic adrenalectomy, we recommend medical treatment with mineralocorticoid receptor antagonists
93 Guidelines: strength and confidence
Strong recommendations (n=206) | n (%)
High/moderate confidence in estimates | 85 (41%)
Very low/low confidence in estimates | 121 (59%)
Total | 206 (100%)
Weak recommendations (n=151) | n (%)
High/moderate confidence in estimates | 16 (8%)
Very low/low confidence in estimates | 135 (92%)
Total | 151
94 Condition; N = 35: 1; LQE in a life-threatening situation; 13; LQoE suggests benefit and HQoE suggests harm or a very high cost; 7; LQoE suggests equivalence, HQoE suggests less harm for one of the competing alternatives; 5; HQoE suggests equivalence of two alternatives and LQoE suggests harm in one alternative; 9; HQoE suggests modest benefits and LQoE suggests possibility of catastrophic harm
96 Summary: majority of ES recommendations strong; 121 (59%) discordant; 35/121 (29%) of discordant appropriate; of 86 inappropriate, 43 (50%) best practice statements; 33/86 inappropriate, should have been weak recommendations
97 WHO recommendation: mental health guideline 2013. 8 of 9: confidence low, recommendation strong. “On a number of occasions, the GDG decided to give a strong recommendation despite a GRADE assessment of the available evidence on effect as being of “very low quality”.”
98 Why? “This occurred only when the following conditions applied: (a) there was certainty about the balance of benefits versus harms and burdens; (b) the expected values and preferences were clearly in favour of the recommendation; and (c) there was certainty about the balance between benefits and resources being consumed.”
99 Value and preference statements: underlying values and preferences always present; sometimes crucial; important to make explicit
100 Values and preferences. Stroke guideline: in patients with TIA, clopidogrel over aspirin (Grade 2B). Underlying values and preferences: This recommendation to use clopidogrel over aspirin places a relatively high value on a small absolute risk reduction in stroke rates, and a relatively low value on minimizing drug expenditures.
101 Values and preferences. Peripheral vascular disease: aspirin be used instead of clopidogrel (Grade 2A). Underlying values and preferences: This recommendation places a relatively high value on avoiding large expenditures to achieve small reductions in vascular events.
102 Values and preferences: consider the UpToDate style of values and preferences. Weak recommendation, low certainty evidence, for a trial of testosterone in men with apparent testosterone deficiency and cardiovascular disease. Men who place a high value on minimizing the risk of an adverse cardiovascular event and a relatively low value on ameliorating the symptoms of testosterone deficiency are likely to choose against testosterone use
103 Flavonoids for hemorrhoids. Venotonic agents: mechanism unclear, increase venous return. Popularity: 90 venotonics commercialized in France; none in Sweden and Norway; France 70% of world market. Possibilities: French misguided; rest of world missing out
104 Systematic review: 14 trials, 1432 patients. Key outcome: risk of not improving/persistent symptoms; 11 studies, 1002 patients, 375 events; RR 0.4, 95% CI 0.29 to 0.57; minimal side effects. Is France right? What is the certainty of evidence?
105 What can lower confidence? Risk of bias: lack of detail re concealment; questionnaires not validated. Indirectness: no problem. Inconsistency: need to look at the results
109 What can lower confidence? Risk of bias: lack of detail re concealment; questionnaires not validated. Inconsistency: almost all show positive effect, trend; heterogeneity p < 0.001; I² 65.1%. Indirectness. Imprecision: RR 0.4, 95% CI 0.29 to 0.57. Publication bias: 40 to 234 patients, most around 100
110 Is France right? Recommendation: yes (for use) or no (against use)? Strength: strong or weak?
111 Beta blockers in non-cardiac surgery
Outcome | Number of participants (studies) | Quality assessment (risk of bias, consistency, directness, precision, publication bias) | Quality | Relative effect (95% CI) | Absolute risk difference
Myocardial infarction | 10,125 (9) | No serious limitations; no serious limitations; not detected | High | 0.71 (0.57 to 0.86) | 1.5% fewer (0.7% fewer to 2.1% fewer)
Mortality | 10,205 (7) | No serious limitations; imprecise | Moderate | 1.23 (0.98 to 1.55) | 0.5% more (0.1% fewer to 1.3% more)
Stroke | 10,889 (5) | No serious limitations | | 2.21 (1.37 to 3.55) | (0.2% more to 1.3% more)
112 Where to from here? GRADE values and preferences; GRADE diagnosis; aspirin for primary prevention; culprit only vs complete revascularization in STEMI; management of esophageal varices