Presentation is loading. Please wait.

Presentation is loading. Please wait.

Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham Chris Hyde Professor of Public Health and.

Similar presentations

Presentation on theme: "Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham Chris Hyde Professor of Public Health and."— Presentation transcript:

1 Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham Chris Hyde Professor of Public Health and Clinical Epidemiology University of Exeter

2 Archie Cochrane. Effectiveness and efficiency. Random reflections on health services. The Nuffield Provincial Hospitals Trusts, 1972.

3 Medical Test Information Decision Action Patient Outcome Test harms/ placebo effects Diagnostic accuracy Diagnostic yield Management Diagnostic and Treatment Pathways Accuracy provides information about the hypothetical value of a test in decision making

4 Phases of Test Evaluation Technical Performance Does CT produce good quality images, reliably and reproducibly? DIAGNOSTIC PERFORMANCE Do CT images accurately differentiate diseased from non- diseased patients? Diagnostic Impact Does CT change how diagnoses are made by doctors? Therapeutic Impact Does CT change how treatment decisions are made? Patient Health Impact Does CT ultimately reduce mortality or morbidity? Fryback and Thornbury Med Dec Mak 1991;11:88-94

5 Why Emphasis on Test Accuracy?  Lax regulatory system for tests: no requirement for evidence about patient impact  Health Technology assessment traditionally concerned with treatments -what if we are treating the wrong patients? -what if test use itself causes harm?  Medical education: concept of EBM relatively new and test evaluation even more so  Difficulties conducting test-treat trials: sample size; blinding; measurement and reporting of diagnostic and treatment decisions

6 Outcomes Framework: 14 Mechanisms Test results produced Diagnostic decision made Treatment decision made Patient Outcome Treatment implemented Patient given test 6. Accuracy 8. Dx confidence 9. Dx Yield 10. Rx Yield 3. Feasibility 5. Interpretability 1. Test Process 11. Rx Confidence 2. Timing Test 13.Timing Treatment 4. Timing Results 7. Timing Diagnosis 12. Adherence 14. Patient / Clinician Perspective Ferrante di Ruffano et al. BMJ 2012;344:e686

7 Diagnostic confidence  Do I know where to find trustworthy information on the accuracy of this test?  Do I understand the test accuracy information available?  What are the implications of the accuracy of this test for my patients

8 Test Accuracy Measures Disease (Reference test result) Present (D+) Absent (D-) Index Test + True Positives False Positives TP+FP - False Negatives True Negatives FN+TN TP+FNFP+TN TP+FP+ FN+TN Sensitivity TP / (TP+FN) Specificity TN / (TN+FP) Positive Predictive Value TP / (TP+FP) Negative Predictive Value TN / (TN+FN) Disease as Reference Class Test Result as Reference Class DOR = True Positives X True Negatives False Negatives X False Positives

9 AUC Summary ROC curves SPECIFICITY LR- ROC Space Summary specificity ERROR RATE NPV Summary sensitivity LR+ DOR ROC plot RDOR 2x2 table Relative sensitivity Relative specificity


11 CA125 is a marker used to identify individuals who have an increased risk of ovarian cancer. NICE endorse the use of the test in primary care to select who should undergo pelvic ultrasound testing with and those who can safely not be investigated further. Sensitivity OR Specificity? Ca125 testing for ovarian cancer

12  False Positives receive further investigation with US  False Negatives do not receive further investigation Sensitivity : FN Specificity : FP Do not have ovarian cancer but test positive Have ovarian cancer but test negative

13  Contextual variables cause variation in test accuracy estimates: -Characteristics of the population to be tested -Variation in the conduct of the test itself  The proposed role of a test determines the point in a care pathway that it should be evaluated and any comparator tests that should be considered.  Context determines the relative value placed on test errors (false positives and false negatives)  Systematic reviews of test accuracy offer the potential to mitigate the lack of contextual fit observed in primary studies of test accuracy… are they doing their job?

14 It is stated to be common knowledge that decision makers have difficulty understanding and applying test accuracy informationBUT  What is the EXTENT of the problem ?  WHY do decision makers have particular difficulty with test accuracy measures – what is the nature of the difficulty?

15 To what extent are contextual considerations represented in Test Accuracy Reviews ?

16  Aims: -Assess the extent to which testing context is reflected at each stage of the review process: -when formulating review questions -synthesis, including investigation of heterogeneity -discussing results and making recommendations  Searches: -3 review databases chosen on the basis of an epidemiological mapping exercise (Bayliss & Davenport 2008 IJTAHC; 24(4): ) ARIF, (University of Birmingham) DARE (CRD York) Cochrane Database Systematic Reviews -Cochrane Diagnostic Test Accuracy Working Group & the UK National Research Register: unpublished reviews

17 HITS: N=1215 Cochrane Database of Systematic Reviews DAREARIF Excluded at abstract stage: N=641 Excluded at full paper stage: N=34 Duplicates N=303 Eligible test accuracy reviews N=237 Random 100 reviews data extracted

18  Date of publication ranged from 1990 to 2006; 23% of reviews before 2000 and 73% on or after  A total of 16 disease topic areas were represented  Between 1 and 50 index tests (median 3) were evaluated by a single review  The majority of reviews (43/100) were conducted in the USA, 23 in the UK, 12 in the Netherlands and 8 in the rest of Europe, 6 in Australia, 4 in Canada, 2 in Peru and one each in Columbia and China.  94/100 reviews included a clinician as an author  Using a modified checklist of 9 items taken from the QUORUM and AMSTAR checklists, study quality ranged from 0-9 (median 4.6; inter-quartile range 3 to 6)

19  Only 24/100 reviews clearly specified all of test application, test role and prior tests as part of question formulation; 26% of the 73 reviews published on or after 2000 and 22% of the 23 reviews published before 2000

20 Only 9/100 reviews reported all of setting, (pesentation (symptomatic or asymptomatic) and prior tests.



23  Theoretical perspective papers: to 2010, (88% after 1990) -34 papers written by 30 unique authors -25/30 clinicians of which 16/25 affiliated to an academic institution  Empirical research: to 2010,(60% after 1995) -majority of health professional samples were self- selected, convenience samples from medical education courses - only 3/26 health professional samples representative of primary care

24  Systematic review of evidence concerned with the understanding and application of test accuracy metrics.  Survey of use and understanding of test accuracy metrics by general practitioners

25 Total Hits N=16765 Included papers N=67 Empirical research N=33 Theoretical papers N=34 Clinical sample N=26 Other N=7 Excluded N=14136 Duplicates N=2508

26  Understanding and application of test accuracy information: -Accessibility of test accuracy metrics -Knowledge about accuracy of tests used in practice likely to be limited -Lack of appreciation of variation in pre-test probability across healthcare settings -Use of graphical aids and frequencies (rather than % or proportions) to facilitate probability revision  Factors affecting testing behaviour: -Testing viewed as a risk aversive behaviour -Testing context considered important modifier of attitudes to risk

27 Theoretical: understanding and applying test accuracy information Sensitivity and specificity “A clinician will not start from diseased or not diseased, but from a positive or negative test. Therefore sensitivity and specificity are intuitively not so evident” (Dujardin 1994) Global test accuracy measures do not distinguish between test accuracy in 2 dimensions “The problem that occurs in a meta-analysis of diagnostic studies is the multi-directional performance of the diagnostic instrument regarding its ability to detect (specificity) or exclude (sensitivity) the characteristic of interest is not distinguished. Multi-dimensional outcomes cannot be summarised well by a single estimate.” (Stengel 2003) Likelihood Ratios “Never in 20 years of teaching clinical logic, have we found a clinician who used the word “positive likelihood ratio”. (Van Den Ende 2005)

28 Test Accuracy Measures Disease (Reference test result) Present (D+) Absent (D-) Index Test + True Positives False Positives TP+FP - False Negatives True Negatives FN+TN TP+FNFP+TN TP+FP+ FN+TN Sensitivity TP / (TP+FN) Specificity TN / (TN+FP) Positive Predictive Value TP / (TP+FP) Negative Predictive Value TN / (TN+FN) Disease as Reference Class Test Result as Reference Class DOR = True Positives X True Negatives False Negatives X False Positives

29 Theoretical: understanding and applying test accuracy information Pre- test probability and test accuracy estimation “research has shown that clinicians’ estimates of probability vary widely and are often itself, clinical experience appears insufficient to guide accurate probability estimation” (Richardson 2003) “We rarely know what the sensitivities, specificities or likelihood ratios are for tests. At best clinicians carry a general impression about their usefulness” (Gill 2005) Contextual variation in test accuracy Unfortunately, it is often not realised that there can be no generally valid estimates of a test’s sensitivity, specificity or likelihood ratio that apply to all patients of a particular population, nor should such values be sought” (Moons 2003)

30 Theoretical: factors affecting testing behaviour Clinicians are uncomfortable with uncertainty Testing ‘risk’ is context dependent “...some physicians order all the tests that may be even remotely applicable in a given clinical situation. Such a practice may comfort the patient and enhance the physician’s belief that all diagnostic avenues have been pursued, but more tests do not necessarily produce more certainty..... we continue to test excessively, partly because of our discomfort with uncertainty.” (Kassirer 1989) “.. feelings of uncertainty regarding medical problems can differ depending on the situation, not only because one physician may be faced with more complicated diagnostic puzzles than the other, but also, and primarily because the consequences of a vague and uncertain diagnosis may vary in each situation.” (Zaat (1992)

31  High levels of error observed for estimates of the accuracy of particular tests. Estimates were based on clinical experience of test use, rather than published evidence.  Between-person variation in estimation of disease prevalence (pre-test probability) for any one disease was considerable (25-100%)  Confusion with interpretation of test accuracy metrics (for example sensitivity and specificity confused with positive and negative predictive values).

32  Research based on premise that probability revision is necessary for diagnostic decision making... (32/33 empirical studies) investigated the ability of respondents to undertake probability revision  Average proportion of respondents able to undertake probability revision 46%, range 0% - 33% for practising clinicians, 33%-73% academic clinicians  Presentation of test accuracy as frequencies rather than proportions or percentages appears to facilitate probability revision  …. However only 3% of respondents reported using probability revision in clinical practice

33 Probability revision..... A serum test screens pregnant women for babies with Down’s syndrome. The test is a very good one but not perfect. Roughly 1% of babies have Down’s syndrome. If the baby has Down’s syndrome, there is a 90% chance that the result will be positive. If the baby is unaffected, there is still a 1% chance that the result will be positive. A pregnant woman has been tested and the result is positive. What is the chance that her baby actually has Down’s syndrome? (0.9) x (0.01) (probability of a TP) (0.9 x 0.01)+(0.01 x 0.99 ) (probability of a TP or a FP) Bramwell 2006; BMJ 333(7562):

34 Facilitation of probabilistic reasoning: frequencies Probabilistic representation Frequency representation  The prevalence of disease is 10% (0.1). 100 patients 10 with disease  90 no disease 8 test +ve 2 test -ve 10 test +ve 80 test -ve Probability of disease if test +ve 8 18 Probability of disease if test +ve (0.8) x (0.1) (0.8 x 0.1)+(0.11 x 0.9)  The probability of testing positive if you have disease is 80% (0.8)  The probability of testing negative if you do not have disease is 89% (0.89)  The probability of testing positive even if you do not have disease is 11% (0.11)

35 “It would just confirm what we already know, doctors, on the whole, struggle with these concepts”

36 Survey Objectives  To identify which sources of test accuracy information are used by primary care clinicians and barriers to their use  To evaluate the utility of existing test accuracy metrics as measured by self-reported familiarity, perceived ability to define metrics and self- reported use of metrics in clinical practice  To investigate whether there is consistency in the application of different test accuracy metrics and graphics across a common scenario

37 Survey Methods: distribution  Incentivised, electronic survey hosted by a professional network of ~200,000 GMC registered doctors with access to approximately of general practitioners across the UK ( )  Sample size of 200 pre-specified

38 Survey Results: respondent characteristics  224/215 participants accessing the survey (95%) completed the survey in full  Number of years since qualification in the specialty ranged from 0-41 (median 14 years)  11% had work responsibilities that might result in greater knowledge about test accuracy (GP trainer; GP with an academic position; GP involved in policy)  13% of respondents had undertaken training that included test accuracy interpretation in the last 3 years.

39 “Please estimate how often you use the following test accuracy information sources as part of your clinical work”




43 A new biological marker for ovarian cancer has been identified and is available as a blood test for use in primary care. A 57 year old asymptomatic woman presents to you concerned about her risk of ovarian cancer and you perform the blood test at her request.” TEST ACCURACY INFORMATION PRESENTED IN ONE OF NINE DIFFERENT FORMATS  “If the test came back positive would you refer the woman for further investigation?  If the test came back negative would you be confident not to investigate further at this point in time?”

44  Sensitivity and Specificity  Sensitivity and Specificity (frequencies)  Predictive values  Predictive Values (frequencies)  Likelihood ratios  Pre to post test probability  Diagnostic Odds Ratio  Annotated 2x2 Diagnostic contingency table  Annotated pictogram

45  Sensitivity and Specificity: “The marker has a sensitivity of 76% and a specificity of 98%”  Sensitivity and Specificity (frequencies): “Of every 100 women with ovarian cancer, 76 would test positive (be detected by the test) but 24 would test negative (be missed). Of every 100 women without ovarian cancer, 98 would test negative (receive a correct diagnosis) but 2 would test positive (be falsely labelled as having cancer).”

46 Annotated 2x2 table Women with confirmed ovarian cancer (based on surgery and long term clinical follow up) Women confirmed free of ovarian cancer (based on surgery and long term clinical follow up) New blood test for detecting ovarian cancer: POSITIVE RESULT 31 women with ovarian cancer correctly test +ve with the new blood test (TRUE +VES) 26 women without ovarian cancer incorrectly test +ve with the new blood test (FALSE +VES) New blood test for detecting ovarian cancer: NEGATIVE RESULT 10 women with ovarian cancer incorrectly test –ve with the new blood test (FALSE -VES) 1293 women without ovarian cancer correctly test –ve with the new blood test (TRUE -VES) 41 women, in total, with confirmed ovarian cancer 1319 women, in total, confirmed free of ovarian cancer 1360 women tested in total

47 Pictograph

48 Survey RESULTS: Scenarios :“ If the test came back POSITIVE would you refer the woman for further investigation?”

49 Survey RESULTS: Scenarios :“If the test came back NEGATIVE would you be confident not to investigate further at this point in time?” YES – would not investigate further / would not refer NO – would investigate further / would refer

50  Obligation to test further: -“Would probably investigate (on the basis of a negative test result) but aware all further tests may be negative” -“I would refer -ve result here even - would be difficult to defend if subsequently turned out to have ovarian carcinoma.” -“Patient choice as well - but if she wanted further referral I would do this”

51  : SENSITIVITY AND SPECIFICITY: RESPONDENT EMPHASIS ON FALSE NEGATIVES (more important test error in this testing context) -“ will miss about 1/4 +ve cases ” -“24% ?false negative.....too high” -“there are a lot of falsely negative results ”  PREDICTIVE VALUES: RESPONDENT EMPHASIS ON FALSE POSITIVES (low prevalence testing context) - “A lot of healthy women would be investigated due to a positive result” -“Would have concerns over doing test in view of high false positive rate.”

52 2X2 TABLE: FALSE NEGATIVES AND FALSE POSITIVES - “Too many false positives-they nearly equal the true positives. It is much better at helping you predict who does not have ovarian cancer but still too many false negatives” - “1303 women had negative tests. You cannot send all of these for further investigation; you would swamp the system. The 10 false -ves will just have to come in if they develop symptoms.”

53  Predictive Values : “ Not familiar with terminology here, presume PPV and NPV correspond with sensitivity and specificity but I would need to check”  Diagnostic Odds Ratio: “DOR is this good or bad ?” “Would need guidelines to follow here because I have no experience of the DOR”  Likelihood Ratios: ”I do not understand the LR terminology” “Not sure what LR - or + means” “I have no experience of using likelihood ratios so would have to research before deciding on next course of action”

54 -“Is this the same data being presented with different indices? Scary how presenting the same data differently induces different behaviour!” -“It’s slightly scary how the way this is presented can change the way you feel about the results.” - “Interesting - when asked this question earlier I would have referred -ve result patient, but realising now I can confidently say she has only a 0.6% chance of having the disease I would explain this to her” -“Too many false negatives for me to feel comfortable when presented in this way.”

55  The degree to which testing context is reflected in question formulation, conduct and reporting of systematic reviews appears limited  Inadequacies in contextualisation of review methods appear to reflect a deficiency in methodological approach rather than poor reporting of methods.  94 % of test accuracy reviews included a clinician as a co- author.  No relationship between review quality and contextualisation of review reporting was observed suggesting AMSTAR and QUORUM did not capture this dimension of review quality.

56  Ability to understand and apply test accuracy metrics and contextual factors are both likely to be contributing factors to the observed effect of test accuracy metric on diagnostic decision making.  Sensitivity and specificity are understood by a significant proportion of respondents, but it is unclear to what extent this is due to familiarity as opposed to their intuitive nature.  Predictive values and likelihood ratios do not appear to be well understood.

57  Test errors (false negatives and false positives) appear prominent as part of the translational pathway from summary estimates of test accuracy to management decisions.  ‘Raw data’ in the from of the 2x2 diagnostic table appears to reduce framing effects introduced by summary test accuracy metrics

58  For practice: - Medical education - Dissemination of test evaluation research and guideline development  For research: - Further investigation in more diverse testing contexts -What do decision makers want / need

59 “I t’s clinical medicine... not based on any form of probability. That’s gambling with lives.”

Download ppt "Clare Davenport Senior Clinical Lecturer Public Health, Epidemiology and Biostatistics University of Birmingham Chris Hyde Professor of Public Health and."

Similar presentations

Ads by Google