Evaluation of Diagnostic Tests & ROC Curve Analysis. Özgür Tosun, PhD

TODAY'S EXAMPLE: Why does a physician need biostatistics?

Understanding the "Statistics". A 50-year-old woman with no symptoms participates in routine mammography screening. She tests positive, is alarmed, and wants to know from you whether she has breast cancer for certain, or what the chances are. Apart from the screening result, you know nothing else about this woman. How many women who test positive actually have breast cancer?

Additional Info
The probability that a woman has breast cancer is 1% ("prevalence").
If a woman has breast cancer, the probability that she tests positive is 90% ("sensitivity").
If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% ("false positive rate").

Your answer??? a) nine in 10 (90%) b) eight in 10 (80%) c) one in 10 (10%) d) one in 100 (1%)

ATTENTION !! The fact that 90% of women with breast cancer get a positive result from a mammogram (sensitivity) doesn't mean that 90% of women with positive results have breast cancer.

Prevalence, Sensitivity, False Positive Rate

Answer. Out of 1,000 women, 10 have breast cancer (1% prevalence) and 9 of them test positive (90% sensitivity); of the 990 women without cancer, about 89 test positive (9% false positive rate). Total positive test results among 1,000 women: 98. Only 9 of them actually have cancer. How many women who test positive actually have breast cancer? 9/98 = roughly one in 10 (10%). The high false positive rate, combined with the disease's prevalence of 1%, means that roughly nine out of 10 women with a worrying mammogram don't actually have breast cancer.
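The arithmetic is easy to check in code. A minimal sketch in Python, using the slide's numbers (the variable names are just illustrative):

```python
# Natural-frequency check of the mammogram example.
prevalence = 0.01       # P(disease) = 1%
sensitivity = 0.90      # P(test positive | disease)
false_pos_rate = 0.09   # P(test positive | no disease)

n = 1000
with_disease = n * prevalence                           # 10 women
true_positives = with_disease * sensitivity             # 9 women
false_positives = (n - with_disease) * false_pos_rate   # ~89 women

ppv = true_positives / (true_positives + false_positives)
print(f"Positive tests: {true_positives + false_positives:.0f}")  # ~98
print(f"P(disease | positive test) = {ppv:.2f}")                  # ~0.09, about 1 in 10
```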

What Do Doctors Do with the Question? In one trial, almost half of a group of 160 gynecologists responded that the woman's chance of having cancer was nine in 10 (90%). Only 21% said that the figure was one in 10 (10%), which is the correct answer. That is a worse result than if the doctors had answered at random (25%).

What Happens When the Doctor Does Not Explain the Right Probabilities to the Patient? Strikingly few specialists understand the risk a woman with a positive mammogram result actually faces. We can only imagine how much anxiety such innumerate doctors cause in women; this may even lead to unnecessary cancer treatment for healthy women. Research suggests that months after a mammogram false alarm, up to a quarter of women are still affected by the experience on a daily basis.

EVALUATION OF DIAGNOSTIC TESTS

The "Gold Standard": What is a gold standard? Biopsy results, pathological evaluation, radiological procedures, prolonged follow-up, autopsies. The gold standard is almost always more costly, more invasive, and less feasible than the candidate test. For some conditions there is no objective standard of disease (e.g. angina pectoris, where the gold standard is careful history taking).

Diagnostic Characteristics. This is not hypothesis testing; instead we ask:
◦ How well does the test identify patients with the disease?
◦ How well does the test identify patients without the disease?

Evaluation of the Diagnostic Test Give a group of people (with and without the disease) both tests (the candidate test and the “gold standard” test) and then cross-classify the results and report the diagnostic characteristics of the test.

                        Truth (Gold Standard)
                        +           -
Candidate Test   +      a (TP)      b (FP)
                 -      c (FN)      d (TN)

A perfect test would have b and c equal to 0.

Diagnostic Characteristics. Sensitivity: the probability that a diseased individual will be identified as "diseased" by the test = P(T+ | D+) = a/(a+c). Specificity: the probability that an individual without the disease will be identified as "healthy" by the test = P(T- | D-) = d/(b+d).

Diagnostic Characteristics. False positive rate: given a subject without the disease, the probability that he will have a positive test result
◦ P(T+ | D-) = b/(b+d) = 1 - Specificity
False negative rate: given a subject with the disease, the probability that he will have a negative test result
◦ P(T- | D+) = c/(a+c) = 1 - Sensitivity
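As a sketch, a small Python helper that turns the 2x2 cell counts a (TP), b (FP), c (FN), d (TN) into these four rates; the counts passed in below are hypothetical:

```python
def diagnostic_rates(a, b, c, d):
    """Compute Se, Sp, FPR, FNR from the 2x2 table cells a, b, c, d."""
    sensitivity = a / (a + c)   # P(T+ | D+)
    specificity = d / (b + d)   # P(T- | D-)
    fpr = b / (b + d)           # 1 - specificity
    fnr = c / (a + c)           # 1 - sensitivity
    return sensitivity, specificity, fpr, fnr

se, sp, fpr, fnr = diagnostic_rates(a=90, b=50, c=10, d=850)  # hypothetical counts
print(f"Se={se:.2f}  Sp={sp:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```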

Predictive Values of Diagnostic Tests. These are more informative from the patient's or physician's perspective, and they are special applications of Bayes' theorem.

Predictive Values of Diagnostic Tests. Positive Predictive Value: the probability that an individual with a positive test result has the disease = P(D+ | T+) = a/(a+b).

Predictive Values of Diagnostic Tests. Negative Predictive Value: the probability that an individual with a negative test result does not have the disease = P(D- | T-) = d/(c+d).
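Because they are Bayes-theorem quantities, PPV and NPV depend on prevalence even when sensitivity and specificity are fixed. A sketch with illustrative numbers (specificity 0.91 mirrors the mammogram example's 9% false positive rate):

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """PPV and NPV via Bayes' theorem from Se, Sp, and prevalence."""
    p, se, sp = prevalence, sensitivity, specificity
    ppv = (se * p) / (se * p + (1 - sp) * (1 - p))
    npv = (sp * (1 - p)) / (sp * (1 - p) + (1 - se) * p)
    return ppv, npv

for prev in (0.01, 0.10, 0.40):
    ppv, npv = ppv_npv(sensitivity=0.90, specificity=0.91, prevalence=prev)
    print(f"prevalence={prev:.0%}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

At 1% prevalence the PPV is about 0.09, echoing the mammogram answer; at 40% prevalence it rises to about 0.87.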

A LAST SIMPLE EXAMPLE TO SUM IT UP

Suppose we have a test statistic for predicting the presence or absence of disease. Cross-classifying the test result against the true disease status fills a 2x2 table, one cell at a time:

                        True Disease Status
                        Pos        Neg        Total
Test Criterion  Pos     TP         FP
                Neg     FN         TN
Total                   P          N          P + N

TP = True Positive; FP = False Positive; FN = False Negative; TN = True Negative. P and N are the numbers of truly diseased and truly disease-free subjects.

From this table we can read off every diagnostic characteristic:

Accuracy = probability that the test yields a correct result = (TP + TN) / (P + N).

Sensitivity = probability that a true case will test positive = TP / P. Also referred to as the True Positive Rate (TPR) or True Positive Fraction (TPF).

Specificity = probability that a true negative will test negative = TN / N. Also referred to as the True Negative Rate (TNR) or True Negative Fraction (TNF).

False Negative Rate = probability that a true positive will test negative = FN / P = 1 - Sensitivity. Also referred to as the False Negative Fraction (FNF).

False Positive Rate = probability that a true negative will test positive = FP / N = 1 - Specificity. Also referred to as the False Positive Fraction (FPF).

Positive Predictive Value (PPV) = probability that a positive test will truly have disease = TP / (TP + FP).

Negative Predictive Value (NPV) = probability that a negative test will truly be disease free = TN / (TN + FN).

A worked example with 1,000 subjects:

                        True Disease Status
                        Pos        Neg        Total
Test Criterion  Pos     27         173        200
                Neg     73         727        800
Total                   100        900        1000

Se = 27/100 = .27; Sp = 727/900 = .81; FPR = 1 - Sp = .19; FNR = 1 - Se = .73; Acc = (27 + 727)/1000 = .75; PPV = 27/200 = .14; NPV = 727/800 = .91.
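The same numbers can be verified in a few lines of Python (a sketch; the counts are the worked example's):

```python
# Worked example: TP=27, FP=173, FN=73, TN=727.
TP, FP, FN, TN = 27, 173, 73, 727
P, N = TP + FN, FP + TN                  # 100 diseased, 900 healthy

print("Sensitivity:", TP / P)            # 0.27
print("Specificity:", TN / N)            # ~0.81
print("FPR:", FP / N)                    # ~0.19
print("FNR:", FN / P)                    # 0.73
print("Accuracy:", (TP + TN) / (P + N))  # 0.754
print("PPV:", TP / (TP + FP))            # 0.135
print("NPV:", TN / (TN + FN))            # ~0.909
```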

ROC CURVE

Introduction to ROC curves. ROC = Receiver Operating Characteristic. The ROC curve was first developed by electrical and radar engineers during World War II for detecting enemy objects on the battlefield, and was soon introduced into psychology to account for the perceptual detection of stimuli. Following the attack on Pearl Harbor in 1941, the United States Army began new research to improve the prediction of correctly detected Japanese aircraft from their radar signals.

ROC: Receiver Operating Characteristic. ROC analysis was developed for the signal receivers in radars. The basic aim was to distinguish enemy signals from ordinary signals. It is a graphical analysis method.

Development of Receiver Operating Characteristics (ROC) Curves

If you decrease the threshold (cut-off), sensitivity will increase: you will be able to catch every enemy plane's signal. However, you will also pick up more noise, so false alarms will increase.

The ROC curve in this example traces alternative threshold (cut-off) values; note that sensitivity and specificity change simultaneously as we change the threshold. Remember, some signals come from enemy planes while others are ordinary noise.

ROC Analysis. ROC analysis has since been used in medicine, radiology, biometrics, and other areas for many decades. In medicine it has been used extensively in the evaluation of diagnostic tests, and it is widely used in epidemiology, medical research, and evidence-based medicine. In radiology, ROC analysis is a common technique for evaluating new imaging techniques. It can also be used to compare tests and procedures.

ROC Curves: Use and Interpretation. The ROC methodology generalizes easily to test statistics that are continuous (such as lung function or a blood gas). The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as the threshold varies. The shape of the curve also gives visual clues about the overall strength of association between the underlying test statistic and disease status.
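A sketch of that threshold sweep for a continuous test statistic, using made-up scores; each candidate cut-off yields one (FPR, TPR) point of the ROC curve:

```python
# Illustrative test values; higher values are taken to indicate disease.
diseased = [68, 72, 75, 80, 84, 91]   # true cases
healthy  = [55, 58, 60, 63, 70, 74]   # true non-cases

# Try every observed value as a cut-off, from high (strict) to low (lenient).
thresholds = sorted(set(diseased + healthy), reverse=True)
for t in thresholds:
    tpr = sum(x >= t for x in diseased) / len(diseased)  # sensitivity
    fpr = sum(x >= t for x in healthy) / len(healthy)    # 1 - specificity
    print(f"cut-off {t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```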

Example. [Figure series: a continuous test result plotted as two overlapping distributions, one for people with the disease and one for people without. A vertical threshold divides the axis: patients to its right are called "positive", those to its left "negative". The diseased distribution to the right of the line gives the true positives and to the left the false negatives; the healthy distribution to the right gives the false positives and to the left the true negatives. Moving the threshold to the right yields fewer false positives but more false negatives; moving it to the left does the opposite.]

[Figure series: frequency distributions of the test parameter (mg/dl) for diseased and healthy subjects, first as separated by the gold standard and then as seen by the alternative test, whose distributions overlap. The alternative test's positive/negative cut-off partitions the diseased into TP and FN and the healthy into FP and TN.]

Sensitivity and Specificity. Sensitivity: the ability of a test to correctly identify the real patients. Sensitivity = TP / (TP + FN). Specificity: the ability of a test to correctly identify the really healthy people. Specificity = TN / (TN + FP).

The "Receiver Operating Characteristic" Curve. It is the graphical representation of all sensitivity and specificity combinations for every possible threshold (cut-off) value; the aim is to differentiate the diseased from the healthy subjects. [Figure: with 25 diseased and 25 healthy subjects, sliding the cut-off along the measured value produces one (sensitivity, specificity) pair per threshold, e.g. Se = 25/25 = 1.00 with Sp = 0/25 = 0.00; Se = 25/25 = 1.00 with Sp = 1/25 = 0.04; Se = 25/25 = 1.00 with Sp = 3/25 = 0.12; Se = 25/25 = 1.00 with Sp = 5/25 = 0.20; Se = 24/25 = 0.96 with Sp = 8/25 = 0.32; and so on.]

The "Receiver Operating Characteristic" Curve. The Area Under the Curve (AUC) summarizes the diagnostic performance of a test; for a useful test the AUC lies between 0.5 and 1.0.
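Given the (FPR, TPR) points from such a threshold sweep, the AUC can be computed with the trapezoidal rule; a minimal sketch with illustrative points:

```python
def auc_trapezoid(points):
    """Area under an ROC curve given (fpr, tpr) points, by the trapezoidal rule."""
    pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]  # anchor the endpoints
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2               # one trapezoid per segment
    return area

roc_points = [(0.0, 0.6), (0.2, 0.8), (0.4, 1.0)]       # illustrative values
print(f"AUC = {auc_trapezoid(roc_points):.2f}")          # ~0.92
```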

The "Receiver Operating Characteristic" Curve. We can use ROC curves to compare the diagnostic performance of two or more alternative tests. [Figure: score distributions and ROC curves for Test 1 and Test 2 plotted together on sensitivity/specificity axes.]

ROC curve. [Figure: the ROC curve plots the True Positive Rate (sensitivity) on the y-axis against the False Positive Rate (1 - specificity) on the x-axis, each running from 0% to 100%.]

ROC curve comparison. [Figure: two ROC plots, one for a good test and one for a poor test, on True Positive Rate vs False Positive Rate axes.]

ROC curve extremes. Best test: the diseased and healthy distributions don't overlap at all, and the curve reaches the top-left corner. Worst test: the distributions overlap completely, so the curve follows the diagonal (equivalent to tossing a coin).

Area under ROC curve (AUC). An overall measure of test performance. Comparisons between two tests are based on differences between their (estimated) AUCs. For continuous data, the AUC is equivalent to the Mann-Whitney U-statistic (the nonparametric test of a difference in location between two populations).
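The Mann-Whitney connection also gives a direct way to compute the AUC from raw scores: it is the fraction of (diseased, healthy) pairs in which the diseased subject scores higher, counting ties as one half. A sketch with the same made-up scores as before:

```python
diseased = [68, 72, 75, 80, 84, 91]
healthy  = [55, 58, 60, 63, 70, 74]

# Compare every diseased score with every healthy score.
pairs = [(d, h) for d in diseased for h in healthy]
auc = sum(1.0 if d > h else 0.5 if d == h else 0.0 for d, h in pairs) / len(pairs)
print(f"AUC = {auc:.2f}")  # 33 of 36 pairs ordered correctly, ~0.92
```

This pairwise fraction is exactly the probability interpretation of the AUC described below.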

AUC for ROC curves. [Figure: four ROC curves with AUC = 50%, 65%, 90%, and 100%; the larger the area, the closer the curve lies to the top-left corner.]

Interpretation of AUC. The AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual is more indicative of disease than that from a randomly chosen healthy individual. Beyond that, it has no direct clinically relevant meaning.