Receiver Operating Characteristic (ROC) Curves


Receiver Operating Characteristic (ROC) Curves Assessing the predictive properties of a test statistic – Decision Theory

Binary Prediction Problem: Conceptual Framework

Suppose we have a test statistic for predicting the presence or absence of disease. Cross-tabulating the test result against true disease status gives a 2x2 table:

                         True Disease Status
                         Pos       Neg
Test Criterion    Pos    TP        FP        TP+FP
                  Neg    FN        TN        FN+TN
                         P         N         P+N

TP = True Positive, FP = False Positive, FN = False Negative, TN = True Negative. P and N are the total numbers of subjects with and without disease, respectively.
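The cell counts above can be tallied directly from paired test results and true labels. Below is a minimal Python sketch; the function name and toy data are ours, for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Tally the 2x2 table. Inputs are sequences of 0/1,
    with 1 = disease present (y_true) or test positive (y_pred)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Toy example with six subjects:
y_true = [1, 1, 0, 0, 1, 0]   # true disease status
y_pred = [1, 0, 0, 1, 1, 0]   # test result
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
```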

Binary Prediction Problem: Test Properties

Accuracy = probability that the test yields a correct result = (TP+TN) / (P+N)

Sensitivity = probability that a true case will test positive = TP / P. Also referred to as the True Positive Rate (TPR) or True Positive Fraction (TPF).

Specificity = probability that a true negative will test negative = TN / N. Also referred to as the True Negative Rate (TNR) or True Negative Fraction (TNF).

1 - Specificity = probability that a true negative will test positive = FP / N. Also referred to as the False Positive Rate (FPR) or False Positive Fraction (FPF).

Positive Predictive Value (PPV) = probability that a positive test will truly have disease = TP / (TP+FP)

Negative Predictive Value (NPV) = probability that a negative test will truly be disease free = TN / (TN+FN)

Binary Prediction Problem: Example

                         True Disease Status
                         Pos       Neg
Test Criterion    Pos    27        173       200
                  Neg    73        727       800
                         100       900       1000

Se  = 27/100  = .27        Acc = (27+727)/1000 = .75
Sp  = 727/900 = .81        PPV = 27/200  = .14
FPF = 1 - Sp  = .19        NPV = 727/800 = .91
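These values follow mechanically from the four cell counts. A quick Python sketch (the function name is ours) that reproduces the slide's numbers:

```python
def test_properties(tp, fp, fn, tn):
    """Compute the standard test properties from 2x2 cell counts."""
    p, n = tp + fn, fp + tn            # subjects with / without disease
    return {
        "accuracy":    (tp + tn) / (p + n),
        "sensitivity": tp / p,          # true positive rate
        "specificity": tn / n,          # true negative rate
        "fpf":         fp / n,          # 1 - specificity
        "ppv":         tp / (tp + fp),
        "npv":         tn / (tn + fn),
    }

print(test_properties(tp=27, fp=173, fn=73, tn=727))
# {'accuracy': 0.754, 'sensitivity': 0.27, 'specificity': 0.8077...,
#  'fpf': 0.1922..., 'ppv': 0.135, 'npv': 0.9087...}
# which round to the values shown on the slide above.
```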

Binary Prediction Problem: Test Properties

Of these properties, only Se and Sp (and hence FPR) are considered invariant test characteristics; Accuracy, PPV, and NPV will vary according to the underlying prevalence of disease. Se and Sp are thus "fundamental" test properties and hence the most useful measures for comparing different test criteria, even though PPV and NPV are probably the most clinically relevant properties.

ROC Curves

Now assume that our test statistic is no longer binary but takes on a series of values (for instance, how many of five distinct risk factors a person exhibits). Clinically we make a rule that says the test is positive if the number of risk factors meets or exceeds some threshold (#RF ≥ x). Suppose our previous table resulted from using x = 4. Let's see what happens as we vary x.

ROC Curves: Impact of using a threshold of 3 or more RFs

                         True Disease Status
                         Pos       Neg
Test Criterion    Pos    45        200       245
                  Neg    55        700       755
                         100       900       1000

Se  = 45/100  = .45 (was .27)      Acc = (45+700)/1000 = .75 (was .75)
Sp  = 700/900 = .78 (was .81)      PPV = 45/245  = .18 (was .14)
FPF = 1 - Sp  = .22 (was .19)      NPV = 700/755 = .93 (was .91)

Se rises and Sp falls; interestingly, both PPV and NPV rise as well.

ROC Curves: Summary of all possible options

Threshold (x)   TPR     FPR
6               0.00    0.00
5               0.10    0.11
4               0.27    0.19
3               0.45    0.22
2               0.73
1               0.98    0.80
0               1.00    1.00

As we relax our threshold for defining "disease," our true positive rate (sensitivity) increases, but so does the false positive rate (FPR). The ROC curve is a way to visually display this information.
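Each row of the table is just the 2x2 calculation repeated at a different cutpoint. Here is a Python sketch of that sweep; the risk-factor counts and disease labels are made up for illustration, not the data behind the slide:

```python
def roc_points(scores, labels):
    """One (FPR, TPR) point per candidate rule 'score >= x'."""
    p = sum(labels)               # number of true cases
    n = len(labels) - p           # number of true controls
    points = []
    for x in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= x and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= x and y == 0)
        points.append((fp / n, tp / p))
    return points

# Toy data: number of risk factors (0-5) and true disease status.
scores = [5, 4, 4, 3, 2, 2, 1, 1, 0, 0]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```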

ROC Curves: Summary of all possible options

[Figure: the (FPR, TPR) pairs from the table above plotted as an ROC curve, with the points for x = 5, x = 4, and x = 2 labeled.]

The diagonal line shows what we would expect from simple guessing (i.e., pure chance). What might an even better ROC curve look like?

ROC Curves: Summary of a more optimal curve

Threshold (x)   TPR     FPR
6               0.00    0.00
5               0.10    0.01
4               0.77    0.02
3               0.90    0.03
2               0.95    0.04
1               0.99    0.40
0               1.00    1.00

Note the immediate sharp rise in sensitivity. Perfect accuracy is represented by the upper left corner.

ROC Curves: Use and interpretation

The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies. The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic (in this case the number of RFs present) and disease status.

ROC Curves: Use and interpretation

The ROC methodology generalizes easily to test statistics that are continuous (such as lung function or a blood gas): every observed value becomes a candidate threshold, and we simply fit a smoothed ROC curve through all observed data points.
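For continuous markers the curve is usually computed with library routines. A minimal sketch using scikit-learn's roc_curve and roc_auc_score (assuming scikit-learn is available; the simulated marker values are purely illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
# Simulate a continuous marker that runs higher in cases than controls.
controls = rng.normal(loc=0.0, scale=1.0, size=200)
cases = rng.normal(loc=1.0, scale=1.0, size=100)

y_true = np.concatenate([np.zeros(200), np.ones(100)])
y_score = np.concatenate([controls, cases])

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # empirical ROC points
print("AUC =", roc_auc_score(y_true, y_score))
```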

ROC Curves: Use and interpretation

See the demo at www.anaesthetist.com/mnm/stats/roc/index.htm

ROC Curves: Area under the curve (AUC)

An ROC curve is plotted on the unit square, whose total area is 1, since both TPR and FPR range from 0 to 1. The portion of this total area that falls below the ROC curve is known as the area under the curve, or AUC.

Area Under the Curve (AUC): Interpretation

The AUC serves as a quantitative summary of the strength of association between the underlying test statistic and disease status. An AUC of 1.0 would mean that the test statistic could be used to perfectly discriminate between cases and controls. An AUC of 0.5 (reflected by the diagonal 45° line) is equivalent to simply guessing.

Area Under the Curve (AUC): Interpretation

The AUC can be shown to equal the Mann-Whitney U statistic (divided by the product of the group sizes), or equivalently the Wilcoxon rank-sum statistic, for testing whether the test measure differs for individuals with and without disease. It also equals the probability that the value of our test measure would be higher for a randomly chosen case than for a randomly chosen control.
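That probabilistic reading can be verified by brute force: compare every case against every control and count the fraction of pairs in which the case scores higher, counting ties as one half. A small Python sketch (names and data are ours):

```python
def auc_by_pairs(case_scores, control_scores):
    """P(case score > control score), ties counted as 1/2.
    This equals the Mann-Whitney U statistic divided by
    n_cases * n_controls, i.e., the AUC."""
    wins = ties = 0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1
            elif c == k:
                ties += 1
    n_pairs = len(case_scores) * len(control_scores)
    return (wins + 0.5 * ties) / n_pairs

cases = [3, 5, 4, 4, 2]
controls = [1, 2, 2, 3, 0]
print(auc_by_pairs(cases, controls))  # 0.9
```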

Area Under the Curve (AUC): Interpretation

[Figure: an ROC curve hugging the diagonal, with AUC ~ 0.54, alongside heavily overlapping score distributions for controls and cases.]

Area Under the Curve (AUC): Interpretation

[Figure: an ROC curve rising steeply toward the upper left, with AUC ~ .95, alongside well-separated score distributions for controls and cases.]

Area Under the Curve (AUC): Interpretation

What defines a "good" AUC? Opinions vary, and the answer is probably context specific: what counts as a good AUC for predicting COPD may be very different from what counts as a good AUC for predicting prostate cancer.

Area Under the Curve (AUC): Interpretation

One grading scale (http://gim.unmc.edu/dxtests/roc3.htm):
.90-1.0 = excellent
.80-.90 = good
.70-.80 = fair
.60-.70 = poor
.50-.60 = fail
Remember that < .50 is worse than guessing!

Area Under the Curve (AUC): Interpretation

Another grading scale (www.childrens-mercy.org/stats/ask/roc.asp):
.97-1.0 = excellent
.92-.97 = very good
.75-.92 = good
.50-.75 = fair

ROC Curves: Comparing multiple ROC curves

Suppose we have two candidate test statistics to use to create a binary decision rule. Can we use ROC curves to choose an optimal one?

ROC Curves: Comparing multiple ROC curves

[Figure: adapted from curves at http://gim.unmc.edu/dxtests/roc3.htm]

ROC Curves: Comparing multiple ROC curves

[Figure: from http://en.wikipedia.org/wiki/Receiver_operating_characteristic]

ROC Curves: Comparing multiple ROC curves

We can formally compare AUCs for two competing test statistics, but does this answer our question? AUC speaks to which measure, as a continuous variable, best discriminates between cases and controls. It does not tell us which specific cutpoint to use, or even which test statistic will ultimately provide the "best" cutpoint.

ROC Curves: Choosing an optimal cutpoint

The choice of a particular Se and Sp should reflect the relative costs of FP and FN results. What if a positive test triggers an invasive procedure? What if the disease is life threatening and I have an inexpensive and effective treatment? How do you balance these and other competing factors? See the excellent discussion of these issues at www.anaesthetist.com/mnm/stats/roc/index.htm
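One way to make those trade-offs concrete is to assign explicit unit costs to FP and FN errors and pick the cutpoint with the smallest total cost. A toy Python sketch; the 5:1 cost ratio and the data are assumptions for illustration, not a recommendation:

```python
def best_threshold(scores, labels, cost_fp=1.0, cost_fn=5.0):
    """Return the cutpoint x (rule: 'score >= x' is positive)
    minimizing cost_fp * FP + cost_fn * FN. Here a missed case
    is assumed to be five times as costly as a false alarm."""
    best_x, best_cost = None, float("inf")
    for x in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= x and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < x and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

scores = [5, 4, 4, 3, 2, 2, 1, 1, 0, 0]
labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]
print(best_threshold(scores, labels))  # (2, 2.0)
```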

ROC Curves: Generalizations

These techniques can be applied to any binary outcome; it doesn't have to be disease status. In fact, ROC curves were first introduced during WWII in response to the challenge of accurately identifying enemy planes on radar screens.

ROC Curves: Final cautionary notes

We have assumed throughout that a gold standard exists for measuring "disease," when in practice no such gold standard may exist: consider COPD, asthma, even cancer (can we truly rule out the absence of cancer in a given patient?). As a result, even Se and Sp may not be inherently stable test characteristics; they may vary depending on how we define disease and the clinical context in which it is measured. Are we evaluating the test in the general population or only among patients referred to a specialty clinic? Misclassification of true disease status (P and N) will differ between these two settings.