
1 Studying the Impact of Tests Jon Deeks Professor of Health Statistics University of Birmingham Work supported by a DOH NCC RCD Senior Research Scientist in Evidence Synthesis Award

2 Answering policy questions about the use of diagnostic tests: Should GPs refer patients with low back pain for X-ray and/or MRI? Should patients with dyspeptic symptoms receive serology tests for H. pylori, endoscopy, or empirical therapy?

3 Standard hierarchy for HTA of tests (Fryback and Thornbury 1991): 1. Technical quality of the test 2. Diagnostic accuracy 3. Change in diagnostic thinking 4. Change in patient management 5. Change in patient outcomes 6. Societal costs and benefits

4 Studies on the Diagnostic Evaluation Pathway. Analytical validity: reliability (repeatability and reproducibility); measurement accuracy. Diagnostic validity: diagnostic accuracy; comparative/incremental diagnostic accuracy. Impact: change in diagnostic yield; change in management; change in patient outcomes. Economic evaluation.

5 HTA policy on evaluating tests (up until 2004): "the emphasis of the HTA programme is to assess the effect on patient management and outcomes … improvements in diagnostic accuracy, whilst relevant, are not the primary interest of this commissioned research programme"

6 Studies on the Diagnostic Evaluation Pathway. Analytical validity: reliability (repeatability and reproducibility); measurement accuracy. Diagnostic validity: diagnostic accuracy; incremental diagnostic accuracy. Impact (the focus of the HTA programme): change in diagnostic yield; change in management; change in patient outcomes. Economic evaluation.

7 Outline of talk: trials of diagnostic evaluations; problems (what is being evaluated? statistical power; study validity; outcomes); pragmatic suggestions (when are trials really needed? alternative trial designs; alternative ways of assessing comparative accuracy); more research is needed.

8 RCT to assess patient outcomes: Population → Sample → Randomise → Active or Control → Outcome.

9 Diagnostic RCT: Population → Sample → Randomise → Test or Control → Outcome.

10 RCT 1: X-ray at first GP presentation for low back pain (HTA 2000;4(20)). Sample: N=153 GP attendees (age-restricted) with LBP; excluded if flu or a consultation for LBP in the previous 4 weeks. Randomised: referred for X-ray N=73; no X-ray referral N=80; outcomes assessed at 6 weeks (N=59 and N=67) and 1 year (N=50 and N=58). Primary outcomes: Roland score, HADS, SF-36, EuroQol. Secondary outcomes: time off work, use of therapists, medication, satisfaction.

11 RCT 1 results: at 6 weeks, differences on the SF-36 mental health and vitality subscales (P<.05); at 12 months, on the SF-36 mental health subscale only (P<.05).

12 RCT 2: X-ray for GP presentation with low back pain of more than 6 weeks (HTA 2001;5(30)). Sample: N=421 GP attendees (age-restricted) with a 1st episode of LBP of between 6 weeks and 6 months duration; excluded if red flags were present. Randomised: referred for X-ray N=210; no X-ray referral N=211; outcomes assessed at 3 months (N=199 and N=203) and 9 months (N=195 and N=199). Primary outcome: Roland score. Secondary outcomes: pain (VAS, diary, any), EuroQol, satisfaction, belief in X-ray, time off work, use of therapists, medication, consultations.

13 RCT 2 results: at 3 months, a difference in the proportion reporting LBP (P<.05); at 9 months, none.

14 What is being evaluated? The pathway runs Medical Test → Information → Decision → Action → Patient Outcome, with diagnostic accuracy operating at the Information step, diagnostic yield at the Decision step, management at the Action step, and test harms and placebo effects acting on outcomes directly. An RCT combines all of these effects.

15 What is being evaluated? Conditions for a test to be of diagnostic benefit: the test is more accurate; interpretation of test results is rational and consistent; management is rational and consistent; treatment is effective. Conditions for a trial to be informative: the rules for interpreting test results are described; the management protocol is described. Neither description was given in the example trials, so applying their results requires faith that the behaviour of your patients and clinicians is the same as in the trial.

16 What is being evaluated? If no difference is observed: is the test no more accurate? Are clinicians not correctly interpreting test results? Are management decisions inconsistent or inappropriate? Is the treatment ineffective? None of these questions can be answered, and if any one element changes, the results of the trial become redundant.

17 Statistical Power. RCT 1: a reduction in the proportion with pain at 2 weeks from 40% to 30% could be detected with 300 patients (80% power, 5% significance). RCT 2: a difference of 1.5 on the Roland score could be detected with 388 patients (90% power, 5% significance; SD=4.5, standardised difference=1.5/4.5=0.33). These sample size calculations are suitable for a trial of treatment vs placebo, not a trial of test+treatment.
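
The RCT 2 figure can be checked against the standard normal-approximation formula for comparing two means. Below is a minimal Python sketch (assuming scipy is available; the function name n_per_arm_continuous is ours, not the talk's). The small discrepancy from the quoted 388 presumably reflects rounding or an allowance in the original calculation.

```python
import math
from scipy.stats import norm

def n_per_arm_continuous(delta, sd, power=0.9, alpha=0.05):
    """Per-arm sample size for a two-arm comparison of means
    (normal approximation): n = 2 * (z_{alpha/2} + z_beta)^2 / d^2,
    where d = delta / sd is the standardised difference."""
    d = delta / sd
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

# RCT 2: difference of 1.5 on the Roland score, SD 4.5, 90% power
n = n_per_arm_continuous(1.5, 4.5, power=0.9)
print(n, 2 * n)  # ~190 per arm, ~380 in total (the talk quotes 388)
```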

18 Diagnostic Accuracy of Clinical Judgement: a 2×2 table cross-classifying clinical diagnosis against disease status. Serious (requires intervention): TP and FN. Minor (requires no intervention): FP and TN.

19 Diagnostic Accuracy of Clinical Judgement + X-ray: the same 2×2 table for the strategy of clinical judgement plus X-ray. Serious (requires intervention): TP and FN. Minor (requires no intervention): FP and TN.

20 Comparison of Diagnostic Accuracy: cross-classifying the two strategies against disease status. Serious (requires intervention): All TP (detected by both), Discrepant A (classified differently by the two strategies), All FN (missed by both). Minor (requires no intervention): All FP (positive on both), Discrepant B (classified differently), All TN (negative on both).
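
As a concrete illustration of slides 18-20, the Python sketch below (the helper name and toy data are ours) cross-classifies two binary testing strategies against a reference standard and counts the cells, including the discrepant cells that drive everything that follows.

```python
from collections import Counter

def discrepant_cells(reference, old_test, new_test):
    """Cross-classify two binary test strategies against a reference
    standard (1 = serious, 0 = minor) and count the cells of slide 20.
    Illustrative helper: labels follow the talk's Discrepant A/B naming."""
    cells = Counter()
    for ref, old, new in zip(reference, old_test, new_test):
        if ref == 1:  # serious: requires intervention
            if old == 1 and new == 1:
                cells["All TP"] += 1
            elif old != new:
                cells["Discrepant A"] += 1
            else:
                cells["All FN"] += 1
        else:         # minor: requires no intervention
            if old == 1 and new == 1:
                cells["All FP"] += 1
            elif old != new:
                cells["Discrepant B"] += 1
            else:
                cells["All TN"] += 1
    return cells

# toy data: only patients in the discrepant cells can have their
# management, and hence their outcome, changed by the new test
ref = [1, 1, 1, 0, 0, 0]
old = [1, 0, 0, 1, 1, 0]
new = [1, 1, 0, 1, 0, 0]
print(discrepant_cells(ref, old, new))
```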

21 Benefit can only occur in those whose diagnosis changes. Where can differences arise? Discrepant A could benefit if the intervention is effective; Discrepant B could benefit if the intervention is harmful; all others have no benefit as there is no change in their intervention. Sample size must take into account: prevalence of the treatable condition; detection rate (sensitivity) with the control test; detection rate (sensitivity) with the new test; treatment rate if the control test is negative (assume zero); treatment rate if the new test is positive (assume 100%); outcome for the treatable condition if untreated; treatment effect.

22 Sample size for detecting treatment effects: start from the sample size for treatment vs control, then inflate it according to the proportion in the discrepant cells (particularly A). If 20% have serious disease and sensitivity differs by 20 percentage points, 4% of patients fall in Discrepant A, so N must increase 25-fold (N=7,500-10,000). If 10% have serious disease and sensitivity differs by 10 points, 1% fall in Discrepant A, so N must increase 100-fold (N=30,000-40,000).
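
The inflation arithmetic is simple enough to check directly. A minimal sketch, assuming a base treatment-trial sample size of 300-400 as in slide 17:

```python
import math

def inflated_n(base_n, prevalence, sens_diff):
    """Inflate a treatment-trial sample size by the proportion of
    patients in Discrepant A (prevalence x difference in sensitivity),
    the only patients whose outcome the new test can change."""
    discrepant_a = prevalence * sens_diff
    return math.ceil(base_n / discrepant_a)

for base in (300, 400):
    print(inflated_n(base, 0.20, 0.20))  # 7,500 and 10,000
    print(inflated_n(base, 0.10, 0.10))  # 30,000 and 40,000
```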

23 Sample size for detecting differences in accuracy depends on whether the whole sample receives both tests (paired) or patients are randomised between tests (parallel). If 20% have serious disease, to detect a 20 percentage point difference in sensitivity (70% vs 90%; 80% power, alpha 0.05): paired cohort design N=116 [68-136]; parallel cohort design N=232. If 10% have serious disease, to detect a 10 point difference (80% vs 90%; 80% power, alpha 0.05): paired cohort design N=706; parallel cohort design N=1,411.
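
The slide's figures were presumably computed under specific assumptions about the dependence between the two tests, which is why a range is quoted for the paired design. As an illustrative sketch only (not a reproduction of the talk's calculation), the function below applies Connor's (1987) McNemar-based approximation among the diseased and scales by prevalence; the result is highly sensitive to the assumed overlap between the tests, so exact figures differ between sources.

```python
import math
from scipy.stats import norm

def paired_sens_n(s_old, s_new, prevalence, power=0.8, alpha=0.05,
                  discordant=None):
    """Total cohort size for comparing two sensitivities in a paired
    design (everyone gets both tests), using Connor's (1987) McNemar
    approximation among the diseased, then scaling by prevalence.
    `discordant` is the proportion of diseased with discordant results;
    conditional independence of the tests is assumed if not supplied."""
    d = abs(s_new - s_old)
    psi = (discordant if discordant is not None
           else s_old * (1 - s_new) + s_new * (1 - s_old))
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n_diseased = (z_a * math.sqrt(psi)
                  + z_b * math.sqrt(psi - d ** 2)) ** 2 / d ** 2
    return math.ceil(n_diseased / prevalence)

# slide 23 scenario: 70% vs 90% sensitivity, 20% serious disease;
# the answer depends strongly on the assumed overlap between tests
print(paired_sens_n(0.7, 0.9, 0.2))                  # independence
print(paired_sens_n(0.7, 0.9, 0.2, discordant=0.2))  # minimal discordance
```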

24 Sample size for detecting differences in diagnoses and management: the accuracy sample size is inflated according to, for diagnostic impact, the diagnosis rate if the control test is negative and the diagnosis rate if the new test is positive*, and, for therapeutic impact, the treatment rate if the control test is negative and the treatment rate if the new test is positive*. (*Subject to learning effects.)
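
One plausible reading of this slide, shown purely as an illustrative sketch (the function and its inputs are our assumptions, not the talk's): only patients whose diagnosis or treatment actually changes between strategies carry information, so the accuracy-study sample size is divided by the difference in action rates.

```python
import math

def impact_n(accuracy_n, rate_if_control_neg, rate_if_new_pos):
    """Hypothetical inflation for studies of diagnostic or therapeutic
    impact: divide the accuracy sample size by the difference between
    the action rate when the new test is positive and the action rate
    when the control test is negative (both subject to learning effects)."""
    return math.ceil(accuracy_n / (rate_if_new_pos - rate_if_control_neg))

# e.g. an accuracy study of N=232; 90% treated if the new test is
# positive, 10% treated if the control test is negative (illustrative)
print(impact_n(232, 0.10, 0.90))  # 290
```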

25 Validity Concerns. Blinding: participants and outcome assessors are rarely blind in diagnostic trials, so trials may be more susceptible to measuring the preconceived notions of participants and the expectations of trialists. Drop-out: lack of blinding can induce differential drop-out, and there are more stages at which drop-out can occur. Compliance: lack of blinding and the complexity of the strategies can reduce compliance.

26 What outcomes? The problem is multi-multi-factorial: assessing the effect of a single intervention for a single disease already requires multiple outcomes, and tests are used to differentiate between multiple diseases and disease states. A trial should therefore assess all the important outcomes for the multiple diseases within the differential diagnosis, but trials usually focus on one condition.

27 Summary of problems. Diagnostic trials are rarely done; they assess the effects of a test+treatment package; they are uninformative about the value of the test itself; they are often underpowered and at risk of bias; they may not assess all relevant outcomes; they may be more likely to detect placebo effects than the benefits of better diagnosis; and they may not represent future impact on treatment and diagnostic decisions.

28 Key issues: trials only need to be done in limited circumstances; only patients in the discrepant cells are informative; audit and feedback studies are better than trials for assessing and changing clinicians' behaviour; more good comparative studies of test accuracy are required.

29 When is measuring sensitivity and specificity sufficient to evaluate a new test? (Lord et al., Ann Intern Med 2006;144.) Categories of test attributes: the new test is safer or less costly; the new test is more specific (excludes more cases of non-disease); the new test is more sensitive (detects more cases of disease). If an RCT of treatments exists, when do we still need to undertake an RCT of test+treatment?

30 Figure (Lord SJ et al., Ann Intern Med 2006;144): trial evidence versus linked evidence of test accuracy and treatment efficacy.

31 Figure (Lord SJ et al., Ann Intern Med 2006;144): assessing new tests using evidence of test accuracy, given that treatment is effective for cases detected by the old test.

32 When is measuring sensitivity and specificity sufficient to evaluate a new test? If the new test has similar sensitivity: trials of test+treatment are not required; reductions in harm or cost are benefits; improved specificity can only be a benefit; decision models can be used to analyse trade-offs between positive and negative benefits.
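
As an indication of what such a decision model looks like, here is a deliberately minimal sketch with entirely hypothetical inputs; real models would add costs, test harms for specific tests, and uncertainty analysis.

```python
def expected_net_benefit(prev, sens, spec,
                         benefit_tp, harm_fp, test_harm=0.0):
    """Minimal linked-evidence decision model: combine test accuracy
    with trial evidence on treatment effect. All inputs are per-patient
    probabilities and utility-scale benefits/harms; illustrative only."""
    tp = prev * sens                # treated, diseased
    fp = (1 - prev) * (1 - spec)    # treated, not diseased
    return tp * benefit_tp - fp * harm_fp - test_harm

# hypothetical numbers: trade off a more sensitive, less specific
# new test against the old one without a test+treatment RCT
old = expected_net_benefit(0.2, 0.70, 0.95, benefit_tp=1.0, harm_fp=0.3)
new = expected_net_benefit(0.2, 0.90, 0.90, benefit_tp=1.0, harm_fp=0.3)
print(old, new, new - old)
```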

33 When is measuring sensitivity and specificity sufficient to evaluate a new test? If the new test has improved sensitivity, the value of using the test depends on the treatment response in the extra cases detected. A trial is still not needed if: inclusion in the treatment trial was based on the reference standard used to assess test accuracy; the test is evaluated in a treatment trial as a predictor of response; the new cases represent the same spectrum or subtype of disease; or treatment response is known to be similar across the spectrum or subtypes of disease.

34 Alternative Diagnostic RCT: Population → Sample → Randomise to X-ray or clinical diagnosis; within each arm, serious → intervene → outcome, and minor → do not intervene → outcome.

35 Alternative Diagnostic RCT: the same design, with the outcomes of the X-ray and clinical diagnosis arms compared.

36 Alternative Diagnostic RCT: everybody gets all tests, and only those with discrepant results are randomised (simulated in the sketch below). Benefits: assesses diagnostic yield and the resultant patient outcomes; less follow-up is required; if a reference standard is applied to a random sample, comparative diagnostic accuracy can also be assessed. Downsides: more tests are undertaken; problems arise when test material is limited; test harms and other direct effects are not assessed; it may not be ethical to randomise treatment.
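
To make the design concrete, here is a toy simulation; all parameter names and values are our illustrative assumptions, and both tests are given perfect specificity for simplicity, so every discrepant patient sits in Discrepant A. Concordant patients are never randomised or followed up, which is where the follow-up savings come from.

```python
import random

def simulate_discrepant_design(n, prev, sens_old, sens_new,
                               p_bad_untreated, p_bad_treated, seed=1):
    """Toy simulation of the 'randomise only discrepant results' design:
    every patient gets both tests; concordant patients are managed the
    same way under either strategy, so only discrepant patients are
    randomised between strategies and followed up."""
    rng = random.Random(seed)
    arms = {"old": [0, 0], "new": [0, 0]}  # arm -> [n randomised, bad outcomes]
    for _ in range(n):
        diseased = rng.random() < prev
        old_pos = diseased and rng.random() < sens_old
        new_pos = diseased and rng.random() < sens_new
        if old_pos == new_pos:
            continue  # concordant: outcome identical under both strategies
        arm = rng.choice(["old", "new"])
        treated = old_pos if arm == "old" else new_pos
        p_bad = p_bad_treated if treated else p_bad_untreated
        arms[arm][0] += 1
        arms[arm][1] += rng.random() < p_bad
    return arms

print(simulate_discrepant_design(20000, 0.2, 0.7, 0.9,
                                 p_bad_untreated=0.5, p_bad_treated=0.2))
```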

37 Assessing clinicians' behaviours: informative trials require documentation and standardisation of decision-making, which is particularly difficult when the comparison group is standard practice. The behaviour observed in a trial may not be representative: future behaviour will depend on the trial results, and learning curves may affect compliance (becoming acquainted with a test, ascertaining how best to use it, gaining confidence in its findings, allowing it to replace other investigations).

38 Diagnostic Before-and-After Studies. Design: doctors' diagnostic, prognostic and required-management decisions are recorded; the result of the new test is made available; changes in the doctors' diagnostic, prognostic and management decisions are noted; (a reference standard is then applied). Application: assessment of an additional test only; assessment of diagnostic yield and management. Concerns: whether the new test is assessed independently of other tests; doctors' processes may not reflect standard clinical practice; learning effects.

39 Conclusions: 1. We have much to learn about the best way of studying diagnostic tests. 2. Test+treatment trials are difficult to undertake, are prone to bias, and often require unattainable sample sizes. 3. Good comparative studies of test accuracy, combined in decision models with evidence from trials of treatments, may in many circumstances provide the necessary evidence for policy decisions. 4. Good comparative studies of test accuracy should be commissioned more readily.


42 Situations when test accuracy is likely to be adequate: 1. An effective treatment exists. 2. The reference standard is similar to the trial entry criteria. 3. All tests detect disease at the same stage. 4. All tests lead to the same treatment options. 5. The evidence of test accuracy comes from populations similar to those in the evidence of effectiveness. Studies of diagnostic accuracy need not be large; good studies that compare tests head-to-head need to be done.

43 Research Questions: Do the limitations of HTA evaluations of patient outcomes bias towards showing no difference? How often are trials appropriately powered? How often are the criteria for not needing RCT evidence met? What efficiencies can be made with other designs? How are economic analyses affected by these issues? Would money be better spent getting good evidence of the comparative accuracy of alternative diagnostic tests?

44 Figure (Lord SJ et al., Ann Intern Med 2006;144): new diagnostic test assessment framework and examples.

45 When are trials most needed? When tests detect disease earlier, introducing different treatment options (screening); when interventions have harmful effects, so that treating some of the non-diseased (FP) may outweigh the benefits of treating the diseased (TP); when the test itself has a harmful effect; and when diagnostic accuracy cannot be assessed because no reference standard exists (although there still need to be indications for therapy, which could be used as a reference standard).

46 Destination worth reaching

47 The Role of Randomised Trials in Evaluating Diagnostic Tests Jon Deeks Professor of Health Statistics University of Birmingham Work supported by a DOH NCC RCD Senior Research Scientist in Evidence Synthesis Award

48 Defects and Disasters in Evaluations of the Impact of Diagnostic Tests Jon Deeks Professor of Health Statistics University of Birmingham Work supported by a DOH NCC RCD Senior Research Scientist in Evidence Synthesis Award

