Receiver Operating Characteristic Curve (ROC) Analysis for Prediction Studies. Ruth O’Hara, Helena Kraemer, Jerome Yesavage, Jean Thompson, Art Noda, Joy Taylor, Jared Tinklenberg.

Presentation transcript:

Receiver Operating Characteristic Curve (ROC) Analysis for Prediction Studies
Ruth O’Hara, Helena Kraemer, Jerome Yesavage, Jean Thompson, Art Noda, Joy Taylor, Jared Tinklenberg
Stanford University, Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine
Sierra Pacific MIRECC, Veterans Affairs Palo Alto Health Care System

The Clinical Need for Signal Detection Procedures
Clinical practice is often “hit or miss” therapy: try one thing; if that does not work, try another. This is frustrating for the patient and expensive. The Goal: find the best treatment for the patient with specific characteristics. New news in psychiatry; old hat in internal medicine.

Receiver Operating Characteristic Curve (ROC) Analysis
Signal detection technique, traditionally used to evaluate diagnostic tests. Now employed to identify subgroups of a population at differential risk for a specific outcome (clinical decline, treatment response). Identifies moderators.

Receiver Operating Characteristic Curve (ROC) Analysis Historical Development

ROC Analysis: Historical Development (1)
Derived from early radar in the WW2 Battle of Britain, to address: how to accurately identify the signals on the radar scan that predicted the outcome of interest (enemy planes) when there were many extraneous signals (e.g., geese).

ROC Analysis: Historical Development (2)
True Positives = radar operator interpreted the signal as enemy planes and there were enemy planes (good result: no wasted resources)
True Negatives = radar operator said no planes and there were none (good result: no wasted resources)
False Positives = radar operator said planes, but there were none (geese: wasted resources)
False Negatives = radar operator said no planes, but there were planes (bombs dropped: very bad outcome)

ROC Analysis: Historical Development
Sensitivity = probability of correctly interpreting the radar signal as enemy planes among those times when enemy planes were actually coming.
SE = True Positives / (True Positives + False Negatives)
Specificity = probability of correctly interpreting the radar signal as no enemy planes among those times when no enemy planes were actually coming.
SP = True Negatives / (True Negatives + False Positives)
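In code, the two formulas above are simple ratios over the four cell counts. A minimal sketch (the function names and the radar counts are illustrative, not from the slides):

```python
def sensitivity(tp, fn):
    """SE = TP / (TP + FN): proportion of true signals correctly flagged."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """SP = TN / (TN + FP): proportion of non-signals correctly passed."""
    return tn / (tn + fp)

# Hypothetical radar log: 45 raids correctly called, 5 missed;
# 90 quiet periods correctly called, 10 goose flocks flagged as planes.
se = sensitivity(tp=45, fn=5)   # 0.9
sp = specificity(tn=90, fp=10)  # 0.9
```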

ROC: Prediction of Enemy Planes by RAF Radar Operators

Receiver Operating Characteristic Curve (ROC) Analysis Applications: Evaluating Medical Tests

ROC Analysis: Evaluating Medical Tests
Evaluating the ability of a diagnostic test to identify a disease involves considering:
P = Prevalence = occurrence in the population of the outcome of interest (e.g., disease)
True Positives, True Negatives, False Positives, False Negatives
With the cells expressed as proportions, P = Prevalence = True Positives + False Negatives

ROC Analysis: Medical Test Evaluation
True Positives = test states you have the disease when you do have the disease
True Negatives = test states you do not have the disease when you do not have the disease
False Positives = test states you have the disease when you do not have the disease
False Negatives = test states you do not have the disease when you do

ROC Analysis: Evaluating Medical Tests
Sensitivity = the probability of having a positive test result among those with a positive diagnosis for the disease.
SE = True Positives / (True Positives + False Negatives)
Specificity = the probability of having a negative test result among those with a negative diagnosis for the disease.
SP = True Negatives / (True Negatives + False Positives)

The Basic Tool: the 2x2 Table

          Test+      Test-
O+        TP (a)     FN (b)     P = a + b
O-        FP (c)     TN (d)     P' = 1 - P
          Q = a + c  Q' = 1 - Q

Sensitivity (SE) = a/P; Specificity (SP) = d/P'
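The margins and indices of the table can be computed directly from its four cells. A sketch, assuming (as on the slide) that the cells a, b, c, d are proportions summing to 1; the cell values below are made up for illustration:

```python
def table_indices(a, b, c, d):
    """Margins and accuracy indices for a 2x2 table whose cells
    a (TP), b (FN), c (FP), d (TN) are proportions summing to 1."""
    p = a + b          # P: prevalence of the outcome
    q = a + c          # Q: level of the test (proportion test-positive)
    return {
        "P": p, "P'": 1 - p,
        "Q": q, "Q'": 1 - q,
        "SE": a / p,        # sensitivity = a/P
        "SP": d / (1 - p),  # specificity = d/P'
    }

ix = table_indices(a=0.30, b=0.10, c=0.15, d=0.45)
# P ≈ .40, Q ≈ .45, SE ≈ .75, SP ≈ .75
```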

ROC: GDS (Test) for Diagnosis of Clinically Confirmed Depression

Which Test Do You Use? Medical Test Evaluation
GDS: SE = .80; SP = .85
Beck Depression Inventory: SE = .85; SP = .75
Major Depression Inventory: SE = .66; SP = .63

ROC Analysis
ROC first calculates sensitivity and specificity. Quality indices measure the quality of the sensitivity and specificity. ROC computes the quality indices for each predictor to find the ones with optimal sensitivity and specificity.

To Detect the Optimal Sensitivity and Specificity
The choice depends on the relative CLINICAL importance of false negatives versus false positives: w = 1 means only false negatives matter; w = 0 means only false positives matter; w = 1/2 means both matter equally. Analytically: use weighted kappa.

ROC Analysis
With the 2x2 cells as proportions:
P = TP + FN; P' = 1 - (TP + FN)
Q = TP + FP; Q' = 1 - (TP + FP)
EFF = TP + TN
κ(0.5, 0) = [(TP + TN) - (TP + FN)(TP + FP) - (1 - (TP + FN))(1 - (TP + FP))] / [1 - (TP + FN)(TP + FP) - (1 - (TP + FN))(1 - (TP + FP))]
Equivalently: κ(0.5, 0) = (EFF - PQ - P'Q') / (1 - PQ - P'Q')
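This κ(0.5, 0) index is observed agreement (EFF) corrected for the chance agreement PQ + P'Q'. A sketch of the computation; the cell proportions in the example are made up for illustration:

```python
def kappa_half(tp, fn, fp, tn):
    """kappa(0.5, 0) = (EFF - chance) / (1 - chance), where
    EFF = TP + TN and chance = P*Q + P'*Q', with P = TP + FN,
    Q = TP + FP. Cells are proportions summing to 1."""
    p, q = tp + fn, tp + fp
    eff = tp + tn
    chance = p * q + (1 - p) * (1 - q)
    return (eff - chance) / (1 - chance)

k = kappa_half(tp=0.30, fn=0.10, fp=0.15, tn=0.45)  # ≈ 0.49
```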

ROC Plane and “Curve”
[Figure: the ROC plane, showing the chance diagonal (“Random ROC”), the Ideal Point, the ROC “curve”, and points labeled (P, P) and (Q, Q).]

Receiver Operating Characteristic Curve (ROC) Analysis Applications: Identifying Predictors of Clinical Outcome

ROC Analysis: Prediction Studies (Dr. Kraemer)
ROC can identify predictors/characteristics of patients that are at differential risk for a specific outcome of interest. E.g.: What are the characteristics of AD patients who are at risk for rapid decline and are high priority for treatment? What are the clinical predictors of Alzheimer’s disease patients who are “good responders” (or “poor responders”) to cholinesterase inhibitor treatments?
Useful in “real world” clinical medicine, where multiple variables affect the clinical outcome and patients seldom have one pure diagnosis.

ROC: Identifying Predictors of an Outcome
1. ROC relates a predictor (test) to the clinical outcome of interest (diagnosis/gold standard).
2. ROC searches all predictors and their associated cut-points.
3. ROC determines which predictor and associated cut-point yields the optimal sensitivity and specificity for identifying the outcome of interest, yielding two groups at differential risk for the outcome.

ROC: Identifying Predictors of an Outcome
4. ROC is an iterative process: it is rerun automatically for each group yielded in Step 3, in order to examine which predictor and associated cut-point may further divide the groups.
5. ROC keeps searching within each group yielded until one of three stopping rules applies (see the Stopping Rules slide).
6. ROC thus identifies subgroups of individuals that are at increased risk for the outcome of interest.
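Steps 1 through 6 amount to recursive partitioning: at each node, scan every predictor and cut-point, split on the best one, and recurse until a stopping rule fires. A schematic sketch of that loop, not the authors' actual program; the quality score is passed in as a function, and only two simplified stopping rules (sample size, no useful split) are shown:

```python
def best_split(rows, predictors, outcome, score):
    """Return (predictor, cut, score) maximizing the quality score over
    every predictor and every candidate cut-point (value <= cut vs. above)."""
    best = None
    for pred in predictors:
        for cut in sorted({r[pred] for r in rows}):
            low = [r[outcome] for r in rows if r[pred] <= cut]
            high = [r[outcome] for r in rows if r[pred] > cut]
            if low and high:
                s = score(low, high)
                if best is None or s > best[2]:
                    best = (pred, cut, s)
    return best

def roc_tree(rows, predictors, outcome, score, min_n=50):
    """Split recursively; each leaf reports its size and observed risk."""
    leaf = {"n": len(rows),
            "risk": sum(r[outcome] for r in rows) / len(rows)}
    if len(rows) < min_n:              # stopping rule: inadequate sample size
        return leaf
    split = best_split(rows, predictors, outcome, score)
    if split is None or split[2] <= 0:  # stopping rule: no useful split left
        return leaf
    pred, cut, _ = split
    return {"split": (pred, cut),
            "low": roc_tree([r for r in rows if r[pred] <= cut],
                            predictors, outcome, score, min_n),
            "high": roc_tree([r for r in rows if r[pred] > cut],
                             predictors, outcome, score, min_n)}
```

With a score such as the absolute difference in outcome rates between the two groups, this reproduces the tree-growing behavior described above on toy data; the real procedure uses the weighted kappa quality index and the significance-based stopping rule.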

ROC Analysis: Advantages and Disadvantages
Advantages: no assumption of a normal distribution; multiple predictors can be evaluated simultaneously; indicates interactions among predictors; indicates cut-points on these predictors; yields clinically relevant information.
Disadvantages: non-hypothesis-testing; requires large samples; capitalizes on chance, so needs a stringent stopping rule.

ROC Analysis: Procedure
Start with a large sample size.
Define the outcome of interest (always binary).
Choose success/failure criteria.
Select predictor variables of interest (as many as you like).
Run the ROC program, which systematically finds the best predictors of success/failure.

The Basic Tool: the 2x2 Table

          RF+        RF-
O+        TP (a)     FN (b)     P = a + b
O-        FP (c)     TN (d)     P' = 1 - P
          Q = a + c  Q' = 1 - Q

Sensitivity (SE) = a/P; Specificity (SP) = d/P'

ROC: Identifying Predictors & Their Cut-points
Dichotomous variables, such as gender: ROC calculates the SE and SP for female vs. male.
Continuous variables, such as age: ROC would calculate SE and SP for the cut-point of 60, then for each successive candidate cut-point up through 85, and so forth.
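The scan over candidate cut-points for a continuous predictor can be sketched as follows (names and data are illustrative; here a value at or above the cut counts as test-positive, and both outcome classes are assumed present in the data):

```python
def se_sp_by_cutpoint(values, outcomes):
    """For each candidate cut-point, treat value >= cut as test-positive
    and compute (SE, SP) against the binary outcome (1 = outcome present).
    Assumes both outcome classes occur in the data."""
    results = {}
    for cut in sorted(set(values)):
        tp = sum(1 for v, o in zip(values, outcomes) if v >= cut and o)
        fn = sum(1 for v, o in zip(values, outcomes) if v < cut and o)
        fp = sum(1 for v, o in zip(values, outcomes) if v >= cut and not o)
        tn = sum(1 for v, o in zip(values, outcomes) if v < cut and not o)
        results[cut] = (tp / (tp + fn), tn / (tn + fp))
    return results

ages = [62, 68, 71, 77, 80, 85]
declined = [0, 0, 1, 1, 0, 1]
curves = se_sp_by_cutpoint(ages, declined)  # e.g. curves[77] == (2/3, 2/3)
```

Plotting the resulting (SE, SP) pairs gives exactly the “swarm” of points shown in the ROC-plane figure later in the deck.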

ROC: Gender as Predictor of Clinically Confirmed Depression

ROC: Identifying Predictors & Their Cut-points
Dichotomous variables: ROC calculates the SE and SP for female vs. male, aphasia vs. no aphasia, etc.
Continuous variables, such as age: ROC would calculate SE and SP for the cut-point of 60, then for each successive candidate cut-point up through 85, and so forth.

ROC: Age as Predictor of Clinically Confirmed Depression

Receiver Operating Characteristic Curve (ROC) Analysis Conducting the ROC: An Example

ROC Analysis: Procedure
Start with a large sample size.
Define the outcome of interest.
Choose success/failure criteria.
Identify predictor variables of interest.
Run the ROC program, which systematically finds the best predictors of success/failure.

ROC Analysis: Example
Population under investigation: 1,472 AD patients from 10 centers, with a 12-month follow-up.
Clinically significant outcome: more rapid decline, defined as a loss of 3 or more MMSE points per year, post-visit.
O'Hara R et al. (2002). Which Alzheimer patients are at risk for rapid cognitive decline? J Geriatr Psychiatry Neurol, 15(4):233-8.
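The binary outcome criterion is straightforward to operationalize. A hypothetical sketch (the function and argument names are illustrative, not from the study):

```python
def rapid_decliner(mmse_baseline, mmse_followup, years=1.0):
    """True if the patient lost 3 or more MMSE points per year of follow-up."""
    return (mmse_baseline - mmse_followup) / years >= 3

rapid_decliner(24, 20)        # True: lost 4 points in one year
rapid_decliner(24, 22)        # False: lost 2 points in one year
rapid_decliner(24, 19, 2.0)   # False: 2.5 points per year over two years
```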

Predictor Variables
Age at patient visit
Reported age of symptom onset
Gender
Years of education
Ethnicity
MMSE score
Living arrangement
Presence of aphasia
Presence of hallucinations
Presence of extrapyramidal signs

Stopping Rules
No more possibilities (rare!)
Inadequate sample size
The optimal test (if chosen ‘a priori’) would not have been statistically significant (p < .001)

Figure 10.3: ROC decision tree for the IHDP control group, with the outcome of low IQ at age 3 (w = 0.5). Overall: P = .53, N = 512 (100%). The first split, on minority status, yields groups with P = .70 (N = 321, 63%) and P = .25 (N = 191, 37%). Subsequent splits use the Bayley Mental Development Index (cut-points 115 and 106) and maternal education (never attended college / attended but did not graduate / graduated from college), yielding subgroups with risks ranging from P = .02 (N = 61, 12%) to P = .91 (N = 131, 26%).

ROC Plane and “Swarm” of Points
[Figure: each candidate predictor and cut-point plots as a point in the ROC plane; the ROC “curve” is drawn through the swarm.]

To Detect the Optimal Sensitivity and Specificity
The choice depends on the relative CLINICAL importance of false negatives versus false positives: w = 1 means only false negatives matter; w = 0 means only false positives matter; w = 1/2 means both matter equally.
Analytically: use weighted kappa.
Geometrically: draw a line through the Ideal Point with slope determined by P and w, and push this line down until it just touches the ROC “curve”. That point is optimal.
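One common analytic form of the weighted kappa for a 2x2 table with cell proportions a (TP), b (FN), c (FP), d (TN) is κ(w) = (ad − bc) / (wPQ' + (1 − w)P'Q), which reduces to Cohen's kappa at w = 1/2. A sketch of picking the optimal candidate this way; this is an assumed formulation for illustration, not a transcription of the authors' program, and the candidate tables are made up:

```python
def weighted_kappa(a, b, c, d, w=0.5):
    """kappa(w) = (ad - bc) / (w*P*Q' + (1 - w)*P'*Q) for 2x2 cell
    proportions a=TP, b=FN, c=FP, d=TN. w = 1 weights only false
    negatives; w = 0 weights only false positives."""
    p, q = a + b, a + c
    return (a * d - b * c) / (w * p * (1 - q) + (1 - w) * (1 - p) * q)

def optimal_point(candidates, w=0.5):
    """Return the (label, (a, b, c, d)) candidate maximizing kappa(w)."""
    return max(candidates.items(),
               key=lambda kv: weighted_kappa(*kv[1], w=w))

cuts = {
    "cut A": (0.30, 0.10, 0.15, 0.45),
    "cut B": (0.35, 0.05, 0.30, 0.30),
}
optimal_point(cuts, w=0.5)   # equal weights favor cut A
optimal_point(cuts, w=1.0)   # weighting only false negatives favors cut B
```

Changing w moves the optimal operating point along the curve, which is the analytic counterpart of tilting the geometric line described above.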

ROC Analysis: Conclusion
Yields clinically relevant information.
Identifies complex interactions.
Identifies individuals with different characteristics but at the same risk for the clinically relevant outcome.
Identifies individuals at the least risk.
Can take the differential clinical costs of false positives and false negatives into account.

Conclusion
It is not sufficient to identify risk factors, or even to identify moderators, mediators, or a structural model. It is necessary to present and interpret the results so that clinicians, policy makers, consumers, and other researchers can apply them. ROC trees are one method to accomplish this purpose.