Interpreting Diagnostic Tests


1 Interpreting Diagnostic Tests
Ian McDowell, Department of Epidemiology & Community Medicine, January 2010.
Note to users: you may find the additional notes & explanations in the ppt notes panel helpful.

2 Objectives
To understand sources of error in typical measurements
To understand sensitivity and specificity
To explain the implications of false positives and false negatives
To understand predictive values and likelihood ratios

3 Road map to date
This session considers the interpretation of diagnostic tests, a daily issue in clinical practice. It builds on some of the ideas introduced last term:
Measurements: validity, bias, and the determinants of bias
Applying conclusions from a study sample to an individual patient
Contrasts between research on hospital patients and community practice
Evidence-based practice

4 The Challenge of Clinical Measurement
Diagnoses are based on information, from formal measurements and/or from your clinical judgment. This information is seldom perfectly accurate:
Random errors can occur (machine not working?)
Biases in judgment or measurement can occur (“this kid doesn’t look sick”)
Due to biological variability, this patient may not fit the general rule
Diagnosis (e.g., hypertension) involves a categorical judgment; this often requires dividing a continuous score (blood pressure) into categories. Choosing the cutting-point is challenging.

5 Therefore… You need to be aware …
That diagnostic judgments are based on probabilities;
That using a quantitative approach is better than just guessing!
That you will gradually become familiar with the typical accuracy of measurements in your chosen clinical field;
That the principles apply to both diagnostic and screening tests;
Of some of the ways to describe the accuracy of a measurement.
This approach to thinking about interpreting tests applies to all types of clinical information-gathering, including general clinical observations, history taking and formal tests.

6 Why choose one test and not another?
Reliability: consistency or reproducibility; this considers chance or random errors (which sometimes increase, sometimes decrease, scores). “Is it measuring something?”
Validity: “Is it measuring what it is supposed to measure?” By extension, “what diagnostic conclusion can I draw from a particular score on this test?” Validity may be affected by bias, which refers to systematic errors (these fall in a certain direction).
Safety, acceptability, cost, etc.
Costs are becoming important due to the increasing cost of care; it is probably no longer adequate to call for a complete blood scan, especially if you don’t really know what conclusion you would draw from a particular result on a particular test. The philosophy is to call only for tests that you need to eliminate specific diagnostic possibilities.
Safety is perhaps a minor concern, but we always hesitate to needlessly expose patients to radiation, etc.
Reliability errors arise from a wide range of sources: laboratory error, dirty reagents, contaminated samples, mis-reading the scale, and so forth. Reliability is a characteristic of the measurement system as a whole (taking the blood, storing it, analysing it…).
Validity is concerned more with the interpretation of the test results: what conclusions can I draw from the test? Does it tell me what I wanted to know?

7 Reliability and Validity
[Figure: target-shooting diagram contrasting low vs. high reliability and low vs. high validity; a reliable but invalid pattern gives a biased result.]
The target metaphor represents reliability by the consistency of shooting: do the arrows hit the same area consistently? Validity is represented by how close the arrows fall to the centre.
The case at the lower left is theoretically possible, although a bit strange: here the measures are unreliable, but if you average them you will get an accurate picture of what is going on. An illustration might come from asking a patient about their depression; you ask six questions, each of which has some bias in the way it is phrased, but overall these biases cancel each other out. Clearly this is just a theoretical possibility (included here for completeness) and we wish to avoid this situation if at all possible! The average of these inaccurate results is not bad; this is probably how screening questionnaires (e.g., for depression) work.

8 Ways of Assessing Validity
Content or “Face” validity: does it make clinical or biological sense? Does it include the relevant symptoms?
Criterion validity: comparison to a “gold standard” definitive measure (e.g., biopsy, autopsy); expressed as sensitivity and specificity.
Construct validity (this is used with abstract themes, such as “quality of life”, for which there is no definitive standard).
Content validity of a diagnostic test is often judged against set criteria. A good example comes from depression measurements: the American Psychiatric Association sets diagnostic standards for common conditions. These illustrate the symptoms of various categories of depression (for example), and if the rating scale or questionnaire you are using covers these, it may be judged to have content validity. (American Psychiatric Association. Diagnostic and statistical manual of mental disorders (DSM-IV). Washington, DC: American Psychiatric Association, 1994.)
For criterion validity, see page 95 of Beaglehole, “Basic Epidemiology”; also see chap. 4 in “Clinical Epidemiology” by DL Sackett, RB Haynes et al. (Little, Brown & Co., Boston, 1991).

9 Criterion validation: “Gold Standard”
The criterion that your clinical observation or simple test is judged against: more definitive (but expensive or invasive) tests, such as a complete work-up, or the clinical outcome (for screening tests, when workup of well patients is unethical).
Sensitivity and specificity are calculated from a research study comparing the test to a gold standard.

10 “2 x 2” table for validating a test
                     Gold standard
                     Disease present   Disease absent
Test positive        a (TP)            b (FP)
Test negative        c (FN)            d (TN)

Validity:
Sensitivity = a/(a+c) = TP/Diseased
Specificity = d/(b+d) = TN/Healthy
TP = true positive; FP = false positive…
Golden Rule: always calculate based on the gold standard.
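As an illustration, here is a minimal sketch of these two calculations from the 2 x 2 cells; the counts are invented for the example, not taken from the slides:

```python
def sensitivity(tp, fn):
    """Proportion of diseased patients (gold standard) the test detects: a/(a+c)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of healthy patients (gold standard) the test rules out: d/(b+d)."""
    return tn / (tn + fp)

# Hypothetical validation study: a = 90, b = 40, c = 10, d = 860
print(sensitivity(tp=90, fn=10))   # 0.90
print(specificity(tn=860, fp=40))  # about 0.96
```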

11 A Bit More on Sensitivity
Sensitivity = the test’s ability to detect disease when it is present: a/(a+c) = TP/(TP+FN) = TP/Diseased
Mnemonics:
- a sensitive person is one who is aware of your feelings (not that your feelings are diseased, of course!)
- (1 – seNsitivity) = false Negative rate = how many cases are missed by the screening test?

12 …and More on Specificity
Specificity reflects the precision of the test: a specific test would identify only that type of disease. “Nothing else looks like this.”
A highly specific test generates few false positives. So, if the result is positive, you can be confident the patient has this diagnosis.
Mnemonic: (1 – sPecificity) = false Positive rate (how many are falsely classified as having the disease?)

13 Problems Resulting from Test Errors
False positives can arise due to other factors (such as taking other medications, diet, etc.). They entail the cost and danger of further investigations, labeling, and worry for the patient. This is similar to Type I or alpha error in a test of statistical significance (the possibility of falsely concluding that there is an effect of an intervention).
False negatives imply missed cases, and so potentially bad outcomes if the disease goes untreated. Cf. Type II or beta error: the chance of missing a true difference.
“Labeling” refers to the possible adverse consequences of being reported as having a disease. In extreme cases, this may lead to loss of a job, or being socially excluded. Obviously it hinges on who is told about the diagnosis, and patient confidentiality is a central principle. However, sometimes you are obliged to inform authorities (e.g., in cases of transmissible and serious diseases; see “reportable diseases” on the I&PH web site).

14 Most Tests Provide a Continuous Score. Selecting a Cutting Point
[Figure: overlapping distributions of test scores for a healthy population and a sick population, with a possible cut-point between the healthy and pathological scores. Moving the cut-point toward the healthy scores increases sensitivity (includes more of the sick group); moving it toward the pathological scores increases specificity (excludes healthy people).]
Back at the beginning, remember that we mentioned that one difficulty in reaching a diagnosis is that most biological processes are continuous, yet the diagnosis implies dividing people into 2 categories. We should always think of disease as a continuum, but there is still a binary decision to be made: to treat or not to treat. The dividing line may be more or less arbitrary, and has shifted over the years. The more we learn (for example) about the long-term consequences of mild elevations of blood pressure, the more physicians are willing to start people on anti-hypertensive medications at relatively low BP. But this can lead to criticisms that people who are within the normal range of BP are being “medicalised”… Sometimes, it feels as though you just can’t win. Care has to be negotiated with the patient; it makes it much simpler when the patient can understand these subtleties.
Crucial issue: changing the cut-point can improve sensitivity or specificity, but never both.

15 Choosing the cut-point
The choice depends on the relative implications of false positive and false negative errors.
Choose a low cut-point (= increase sensitivity):
If the results of missing a case (FN) are important
If the cost of further diagnostic confirmation is not high
e.g., the phenylketonuria (PKU) test
Choose a higher cut-point:
If the implications of a false positive are serious
e.g., the alpha-fetoprotein test for Down’s syndrome during pregnancy
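To make the trade-off concrete, here is a minimal sketch assuming (purely hypothetically) normally distributed scores in the healthy and sick groups; the distributions, means and cut-points are illustrative, not from the slides:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sd):
    """Cumulative probability of a normal distribution (standard formula)."""
    return 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))

# Hypothetical score distributions: healthy ~ N(100, 15), sick ~ N(140, 15)
HEALTHY_MU, SICK_MU, SD = 100, 140, 15

for cut in (110, 120, 130):
    sens = 1 - norm_cdf(cut, SICK_MU, SD)    # sick patients scoring above the cut-point
    spec = norm_cdf(cut, HEALTHY_MU, SD)     # healthy people scoring below the cut-point
    print(f"cut-point {cut}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# Lowering the cut-point raises sensitivity but lowers specificity, and vice versa.
```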

16 Clinical applications
A specific test can be useful to rule in a disease. Why? Very specific tests give few false positives. So, if the result is positive, you can be sure the patient has the condition (“nothing else would give this result”): “SpPin”.
A sensitive test can be useful for ruling a disease out: a negative result on a very sensitive test (which detects all true cases) reassures you that the patient does not have the disease: “SnNout”.
This one often catches people out, and you may well meet experienced clinicians who still get confused. Spend some good quality after-dinner time to wrap your head around it. The following thoughts may help you:
Ruling in and out is not the same as screening for a disease. Screening concerns a population; here, a sensitive test will help you find cases with the disease, but it will also pick up many who do not have the disease.
1) Ruling in is concerned with this patient only, and once you have got a positive result: does this positive result prove they have the disease? Hence, you need a specific test (which only identifies this type of disease).
2) Sadly, you still may have missed the disease (false negative), since a specific test may be low in sensitivity.
3) If someone tries to argue that a very sensitive test rules disease in, point out to them that, to achieve high sensitivity, the cut-point on the test has probably been moved to include quite a lot of healthy people too (see the last slide), so it doesn’t prove that this patient really does have the disease. So, it is the specificity that concerns us in ruling a disease in.

17 Your Patient’s Question: “Doctor, how likely am I to have this disease?”
This introduces Predictive Values. Sensitivity & specificity don’t answer this, because they work from the gold standard. Now you need to work from the test result, but you won’t know whether this person is a true positive or a false positive (or a true or false negative). Hmmm…
How accurately does a positive (or negative) result predict disease (or health)?

18 Start from Prevalence
Before you do any test, the best guide you have to a diagnosis is based on prevalence:
Common conditions (in this population) are the more likely diagnosis.
Prevalence indicates the ‘pre-test probability of disease’.

19 2 x 2 table: Prevalence

                 Disease present   Disease absent   Total
Test positive    a                 b                a+b
Test negative    c                 d                c+d
Total            a+c               b+d              N

Prevalence = (a+c)/N

20 Positive and Negative Predictive Values
Predictive values are based on the rows of the table, not the columns.
Positive Predictive Value (PPV) = a/(a+b) = probability that a positive score is a true positive.
Negative Predictive Value (NPV) = d/(c+d); the same idea for a negative test result.
BUT… there’s a big catch: each row runs across both columns, so PPV & NPV depend on how many cases of disease there are (prevalence). As prevalence goes down, PPV goes down (it’s harder to find the smaller number of cases) and NPV rises.
So, PPV and NPV must be determined for each clinical setting. But they are immediately useful to the clinician: they reflect this population, so they tell us about this patient.

21 Prevalence and Predictive Values
A. Specialist referral hospital:

                 D+    D–
Test positive    50    10
Test negative     5   100

Sensitivity = 50/55 = 91%; Specificity = 100/110 = 91%; Prevalence = 55/165 = 33%
PPV = 50/60 = 83%; NPV = 100/105 = 95%

B. Primary care:

                 D+     D–
Test positive    50    100
Test negative     5   1000

Sensitivity = 50/55 = 91%; Specificity = 1000/1100 = 91%; Prevalence = 55/1155 ≈ 5%
PPV = 50/150 = 33%; NPV = 1000/1005 = 99.5%

The first point here is that predictive value results derived from a hospital setting will not apply if you are using the same test in a community (family practice) setting. The second point is that if you indiscriminately administer tests in a primary care setting, you are certain to get a lot of false positives. Hence, be wise in whom you test!
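A minimal sketch reproducing the two settings above from their 2 x 2 counts; the helper name is illustrative:

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from 2 x 2 counts: a=TP, b=FP, c=FN, d=TN."""
    ppv = a / (a + b)   # probability that a positive result is a true positive
    npv = d / (c + d)   # probability that a negative result is a true negative
    return ppv, npv

# A. Specialist referral hospital (prevalence ~33%)
print(predictive_values(a=50, b=10, c=5, d=100))    # (0.833, 0.952)
# B. Primary care (prevalence ~5%): same sensitivity and specificity, much lower PPV
print(predictive_values(a=50, b=100, c=5, d=1000))  # (0.333, 0.995)
```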

22 Predictive Values
High specificity = few FPs: Sp = TN/(TN+FP). FPs also drive PPV: PPV = TP/(TP+FP). So, the clinician is more certain that a patient with a positive test has the disease (it rules in the disease).
The higher the sensitivity, the higher the NPV: Sn = TP/(TP+FN); NPV = TN/(TN+FN). The clinician can be more confident that a patient with a negative score does not have the diagnosis (because there are few false negatives). So, a high NPV can rule out a disease.

23 From the literature you can get Sensitivity & Specificity
From the literature you can get Sensitivity & Specificity. To work out PPV and NPV for your practice, you need to estimate prevalence, then work backwards.
Fill the cells of the 2 x 2 table in order: start from the total N and the estimated prevalence (which gives the Disease present and Disease absent column totals), use sensitivity to fill the diseased column, use specificity to fill the non-diseased column, then sum the rows and calculate the predictive values.
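A minimal sketch of that back-calculation, assuming a hypothetical test with sensitivity and specificity of 91% (as in the earlier example) applied in a practice where you guess prevalence at 5%; the function name and numbers are illustrative:

```python
def back_calculate(prevalence, sensitivity, specificity, n=1000):
    """Rebuild the 2 x 2 table from prevalence, sensitivity and specificity."""
    diseased = prevalence * n        # column total, from estimated prevalence
    healthy = n - diseased
    a = sensitivity * diseased       # true positives
    c = diseased - a                 # false negatives
    d = specificity * healthy        # true negatives
    b = healthy - d                  # false positives
    ppv = a / (a + b)
    npv = d / (c + d)
    return ppv, npv

ppv, npv = back_calculate(prevalence=0.05, sensitivity=0.91, specificity=0.91)
print(f"PPV = {ppv:.0%}, NPV = {npv:.1%}")   # roughly PPV = 35%, NPV = 99.5%
```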

24 Gasp…! Isn’t there an easier way to do all this…?
Yes (good!) But first, you need a couple more concepts (less good…).
We said that before you apply a test, prevalence gives your best guess about the chances that this patient has the disease. This is known as the “pretest probability of disease”: (a+c)/N in the 2 x 2 table.
It can also be expressed as the odds of disease: (a+c)/(b+d); when the disease is rare, the odds and the probability are nearly the same.
Odds introduce Bayesian estimation; you already met odds ratios in the session on study designs, under “case-control studies”. (Remember them? – the odds of getting a disease due to a risk factor. Oh joy.) Odds ratios were a useful way of summarizing the effect of a causal factor when you do not have information on the whole population, which is the way odds are helping us out here.
This is on the margins of what you need to know as a clinician, but you should be aware of the existence of this thinking: it may become more mainstream over your time in practice. Alternatively, palm pilots and other gizmos (nanotechnology mathematical neurons implanted in your brain?) will figure it all out for you.

25 This Leads to … Likelihood Ratios
Defined as the odds that a given level of a diagnostic test result would be expected in a patient with the disease, as opposed to a patient without: true positive rate / false positive rate [TP/FP].
Advantages:
Combines sensitivity and specificity into one number
Can be calculated for many levels of the test
Can be turned into predictive values
LR for a positive test = Sensitivity / (1 – Specificity)
LR for a negative test = (1 – Sensitivity) / Specificity
For the mathematically inclined: calculating likelihood ratios from Bayes’ Theorem. For this you must express pretest probability as odds:
Pretest odds = prevalence / (1 – prevalence)
Convert sensitivity and specificity to likelihood ratios:
LR+ = sens/(1 – spec); bigger is better (>>1)
LR– = (1 – sens)/spec; smaller is better (<<1)
Post-test odds = pretest odds × LR (+ or –)
Convert to post-test probability if desired:
Post-test probability = post-test odds / (1 + post-test odds)
Remember that:
Positive post-test probability = PPV
Negative post-test probability = 1 – NPV
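The same arithmetic as a minimal sketch; the function names are illustrative, and the steps follow the formulas above:

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1 - specificity)

def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes via odds: probability -> odds, scale by LR, convert back to probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)

# Hypothetical test with sensitivity 91% and specificity 91%, as in slide 21
lr_plus = positive_likelihood_ratio(0.91, 0.91)   # about 10.1
print(post_test_probability(0.33, lr_plus))       # hospital setting: ~0.83 (= PPV)
print(post_test_probability(0.05, lr_plus))       # primary care: ~0.35
```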

26 Practical application: a Nomogram
1) You need the LR for this test.
2) Plot the likelihood ratio on the center axis (e.g., LR+ = 20).
3) Select the pretest probability (prevalence) on the left axis (e.g., prevalence = 30%).
4) Draw a line through these points to the right axis to indicate the post-test probability of disease.
Example: post-test probability = 91%.
Interpretation: in a population with a base rate of 30% prevalence, a patient with a positive score on this test is more than 90% likely to have the disease. The test has greatly increased the information you have on this patient – but a test with an LR+ of 20 is a good test.
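The nomogram is just a graphical shortcut for the odds arithmetic on the previous slide; a self-contained sketch of the same worked example (helper name is illustrative):

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back (previous slide)."""
    odds = pretest_probability / (1 - pretest_probability) * likelihood_ratio
    return odds / (1 + odds)

# Pretest probability 30%, LR+ = 20: the nomogram example
print(post_test_probability(0.30, 20))   # about 0.90, matching the ~91% read off the nomogram
```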

27 There is another way to combine sensitivity and specificity: meet Receiver Operating Characteristic (ROC) curves
Work out sensitivity and specificity for every possible cut-point, then plot these. The area under the curve indicates the information provided by the test.
[Figure: ROC curve plotting sensitivity against 1 – specificity (= false positive rate).]
In an ideal test, the curve would reach the top left corner. For a useless test it would lie along the diagonal: no better than guessing.
This is really a slide for nerds; others please relax... You may well hear about ROC curves, but you are really unlikely to get examined on this in the next few years. But some of the neat connections are as follows:
Remember I defined validity of a test in terms of the interpretation you can place on its result? Well, this is close to speaking of the information it can give you, and the ROC curve summarizes (across all possible cut-points) the information the test can provide.
ROC curves were developed as part of the problem of sorting out signals from noise in radar applications, and so the false positives represent “noise”, or things you are not interested in detecting (clouds, etc., rather than another aircraft).
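A minimal sketch of how an ROC curve and its area could be computed from scores with known disease status; the scores below are invented for illustration:

```python
# Hypothetical scores from a validation study (higher score = more likely diseased)
sick =    [142, 150, 133, 160, 128, 155, 147, 139]
healthy = [101, 95, 118, 110, 87, 124, 99, 106]

def roc_point(cut, sick, healthy):
    """Sensitivity and false-positive rate (1 - specificity) at one cut-point."""
    sens = sum(s >= cut for s in sick) / len(sick)
    fpr = sum(h >= cut for h in healthy) / len(healthy)
    return fpr, sens

# One ROC point per possible cut-point (every observed score)
points = sorted(roc_point(c, sick, healthy) for c in set(sick + healthy))

# Area under the curve by the trapezoidal rule, adding the (0,0) and (1,1) endpoints
points = [(0.0, 0.0)] + points + [(1.0, 1.0)]
auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"AUC = {auc:.2f}")   # 1.00 here (these invented scores separate perfectly); 0.5 = guessing
```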

28 Chaining LRs Together (1)
Example: a 45-year-old woman presents with “chest pain”.
Based on her age, the pretest probability that a vague chest pain indicates CAD is about 1%.
Take a fuller history. She reports a 1-month history of intermittent chest pain, suggesting angina (substernal pain; radiating down arm; induced by effort; relieved by rest…).
The LR of this history for angina is about 100.
This illustrates a practical application in a clinical setting. It also shows you how much further you come out ahead if you interpret diagnostic tests serially (one after the other), interpreting the next test on the basis of the results of previous tests. It can be more informative than applying a raft of tests together and then trying to interpret the pattern of results. The problem is: to actually administer tests in sequence may take more time (so it may not be feasible in general practice).

29 The previous example: 1. From the History:
She’s young; the pretest probability is about 1%.
With the history (LR = 100), the pretest probability of 1% rises to a post-test probability of about 50%.

30 Chaining LRs Together (2)
A 45-year-old woman with a 1-month history of intermittent chest pain…
After the history, the post-test probability is now about 50%. What will you do?
A more precise (but also more costly) test: record an ECG.
Result = 2.2 mm ST-segment depression. The LR for a 2.2 mm ECG result = 10.
This raises the post-test probability to > 90% for coronary artery disease (see next slide).
For the mathematically inclined, the calculation is the same as on slide 25: express the pretest probability as odds (pretest odds = prevalence/(1 – prevalence)), multiply by the LR (+ or –) to get the post-test odds, and convert back if desired (post-test probability = post-test odds / (1 + post-test odds)). Remember that the positive post-test probability = PPV and the negative post-test probability = 1 – NPV.

31 The previous example: ECG Results
Starting from the new pretest probability of 50% (prior to the ECG, based on the history), the ECG result (LR = 10) raises the post-test probability to about 90%.
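As a sketch, chaining the two LRs with the hypothetical helper from the likelihood-ratio slide reproduces the numbers in this worked case:

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    odds = pretest_probability / (1 - pretest_probability) * likelihood_ratio
    return odds / (1 + odds)

p = 0.01                            # pretest probability of CAD from age alone
p = post_test_probability(p, 100)   # history suggestive of angina, LR ~100 -> about 0.50
p = post_test_probability(p, 10)    # 2.2 mm ST depression on ECG, LR ~10 -> about 0.91
print(f"Probability of CAD after history and ECG: {p:.0%}")
```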

