
1 ROC Statistics for the Lazy Machine Learner in All of Us Bradley Malin Lecture for COS Lab School of Computer Science Carnegie Mellon University 9/22/2005

2 Why Should I Care?
Imagine you have 2 different probabilistic classification models
–e.g. logistic regression vs. neural network
How do you know which one is better?
How do you communicate your belief?
Can you provide quantitative evidence beyond a gut feeling and subjective interpretation?

3 Recall Basics: Contingencies

                                        MODEL PREDICTED
                             It's NOT a Heart Attack   Heart Attack!!!
GOLD        Was NOT a Heart Attack            A               B
STANDARD
TRUTH       Was a Heart Attack                C               D

4 Some Terms

                              MODEL PREDICTED
                        NO EVENT             EVENT
GOLD        NO EVENT    TRUE NEGATIVE (A)    B
STANDARD
TRUTH       EVENT       C                    TRUE POSITIVE (D)

5 Some More Terms

                              MODEL PREDICTED
                        NO EVENT                          EVENT
GOLD        NO EVENT    A                                 FALSE POSITIVE (Type 1 Error)
STANDARD
TRUTH       EVENT       FALSE NEGATIVE (Type 2 Error)     D
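A minimal sketch (not from the slides) of how these four cells are tallied from paired truth/prediction labels; the helper name contingency is hypothetical:

```python
def contingency(truth, predicted):
    """Tally the 2x2 contingency table from paired 0/1 labels (1 = EVENT).

    Returns (TN, FP, FN, TP), i.e. cells A, B, C, D in the tables above.
    """
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)  # A: true negative
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)  # B: false positive
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)  # C: false negative
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)  # D: true positive
    return tn, fp, fn, tp
```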

6 Accuracy
What does this mean? What is the difference between "accuracy" and an "accurate prediction"?
Contingency Table Interpretation:
Accuracy = [(True Positives) + (True Negatives)] /
           [(True Positives) + (True Negatives) + (False Positives) + (False Negatives)]
Is this a good measure? (Why or Why Not?)
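Continuing the sketch above, accuracy is one line; the toy numbers here are hypothetical and chosen to hint at the answer to the slide's question, since a lazy all-negative classifier can still score high accuracy on a skewed class distribution:

```python
truth     = [0] * 95 + [1] * 5     # 95 non-events, 5 events (hypothetical skew)
predicted = [0] * 100              # a model that always says "no event"

tn, fp, fn, tp = contingency(truth, predicted)
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                    # 0.95, yet it never catches a single event
```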

7 Note on Discrete Classes
TRADITION … show a contingency table when reporting a model's predictions.
BUT … probabilistic models do not fill the matrix cells with discrete counts!!!
IN OTHER WORDS … regression does not report the number of individuals predicted positive (e.g. has a heart attack) … well, not directly
INSTEAD … it reports the probability that the output takes a certain value (e.g. 1 or 0)
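A one-line sketch (assumed, not from the slides) of the missing step: a probabilistic model's scores become discrete classes only after you pick a cutoff; the helper name discretize is hypothetical:

```python
def discretize(scores, cutoff=0.5):
    """Turn predicted probabilities into 0/1 class labels at a cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]
```

Every choice of cutoff yields a different contingency table, which is exactly what ROC analysis exploits.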

8 Visual Perspective
(figure omitted in transcript)

9 ROC Curves
Originated from signal detection theory
–Binary signal corrupted by Gaussian noise
–What is the optimal threshold (i.e. operating point)?
Dependence on 3 factors
–Signal strength
–Noise variance
–Personal tolerance for the hit / false alarm rate

10 ROC Curves
Receiver operating characteristic
–Summarizes & presents the performance of any binary classification model
–Measures the model's ability to distinguish between true & false positives

11 Use Multiple Contingency Tables
Sample contingency tables across a range of threshold probabilities.

TRUE POSITIVE RATE (also called SENSITIVITY)
TP-Rate = (True Positives) / [(True Positives) + (False Negatives)]

FALSE POSITIVE RATE (also called 1 - SPECIFICITY)
FP-Rate = (False Positives) / [(False Positives) + (True Negatives)]

Plot Sensitivity vs. (1 – Specificity) for the sampled thresholds and you are done.
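In Python, reusing the hypothetical contingency and discretize helpers from earlier, one sampled table gives one point of the curve:

```python
def roc_point(truth, scores, cutoff):
    """Return (FP-rate, TP-rate) for one threshold choice."""
    tn, fp, fn, tp = contingency(truth, discretize(scores, cutoff))
    return fp / (fp + tn), tp / (tp + fn)   # (1 - specificity, sensitivity)
```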

12 Data-Centric Example

TRUTH   LOGISTIC   NEURAL
  1      0.7198    0.9038
  0      0.2460    0.8455
  0      0.1219    0.4655
  0      0.1560    0.3204
  0      0.7527    0.2491
  1      0.3064    0.7129
  0      0.7194    0.4983
  0      0.5531    0.6513
  1      0.2173    0.3806
  0      0.0839    0.1619
  1      0.8429    0.7028
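Sweeping cutoffs over this data with the roc_point sketch from above generates the rate table on the next slide (whose rows are labeled THRESHOLD = 1 - cutoff):

```python
truth    = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
logistic = [0.7198, 0.2460, 0.1219, 0.1560, 0.7527, 0.3064,
            0.7194, 0.5531, 0.2173, 0.0839, 0.8429]
neural   = [0.9038, 0.8455, 0.4655, 0.3204, 0.2491, 0.7129,
            0.4983, 0.6513, 0.3806, 0.1619, 0.7028]

for cutoff in [i / 10 for i in range(11)]:
    # classify as EVENT when score >= cutoff
    print(cutoff, roc_point(truth, logistic, cutoff),
                  roc_point(truth, neural, cutoff))
```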

13 ROC Rates

              LOGISTIC REGRESSION        NEURAL NETWORK
THRESHOLD     TP-Rate     FP-Rate        TP-Rate    FP-Rate
1             1           1              1          1
0.9           1           0.8571         1          1
0.8           1           0.5714         1          0.8571
0.7           0.75        0.4286         1          0.7143
0.6           0.5         0.4286         0.75       0.5714
0.5           0.5         0.4286         0.75       0.2857
0.4           0.5         0.2857         0.75       0.2857
0.3           0.5         0.2857         0.75       0.1429
0.2           0.25        0              0.25       0.1429
0.1           0           0              0.25       0
0             0           0              0          0

(Here an example is classified as an EVENT when its score is at least
1 - THRESHOLD; e.g. the 0.7 row corresponds to a score cutoff of 0.3.)

14 ROC Plot
(figure: ROC curves for the LOGISTIC and NEURAL models, plotted from the rates above)

15 Sidebar: Use More Samples (These are plots from a much larger dataset – See Malin 2005)

16 ROC Quantification
Area Under ROC Curve
–Use quadrature to calculate the area
–e.g. the trapz (trapezoidal rule) function in Matlab will work
Example – the "Neural Network" model appears to be better:

            AREA UNDER ROC CURVE
LOGISTIC          0.7321
NEURAL            0.7679
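A sketch of the quadrature step in Python, with numpy.trapz standing in for Matlab's trapz; the points must be sorted by FP-rate before integrating. On the slide's data it reproduces the areas above:

```python
import numpy as np

def auc(truth, scores):
    """Area under the sampled ROC curve via the trapezoidal rule."""
    cutoffs = [i / 10 for i in range(11)]
    points = sorted(roc_point(truth, scores, c) for c in cutoffs)
    fpr, tpr = zip(*points)
    return np.trapz(tpr, fpr)   # integrate TP-rate over FP-rate

# auc(truth, logistic) -> 0.7321..., auc(truth, neural) -> 0.7679...
```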

17 Theory: Model Optimality
Classifiers on the convex hull are always "optimal"
–e.g. Net & Tree
Classifiers below the convex hull are always "suboptimal"
–e.g. Naïve Bayes
(figure: ROC points for the Naïve Bayes, Decision Tree, and Neural Net classifiers)

18 Building Better Classifiers
Classifiers on the convex hull can be combined to form a strictly dominant hybrid classifier
–ordered sequence of classifiers
–can be converted into a "ranker"
(figure: the convex hull through the Decision Tree and Neural Net operating points)
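A sketch of the standard construction behind this claim (assumed, not spelled out on the slide): any point on the segment between two classifiers' ROC points is reachable by randomizing between them:

```python
import random

def hybrid(clf_a, clf_b, weight):
    """With probability `weight` answer with clf_a, otherwise with clf_b.

    The hybrid's expected ROC point is the convex combination
    weight * point(clf_a) + (1 - weight) * point(clf_b), so sweeping
    `weight` from 0 to 1 traces the hull segment between the two.
    """
    def predict(x):
        return clf_a(x) if random.random() < weight else clf_b(x)
    return predict
```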

19 Some Statistical Insight
Curve Area:
–Take a random healthy patient → score of X
–Take a random heart attack patient → score of Y
–The area is an estimate of P[Y > X]
Slope of the curve is equal to the likelihood ratio:
–slope = P(score | Signal) / P(score | Noise)
The ROC graph captures all of the information in the contingency table
–False negative & true negative rates are the complements of the true positive & false positive rates, respectively
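That probabilistic reading of the area has a direct empirical estimator (a sketch, with ties counted as half): compare every (heart attack, healthy) pair of scores:

```python
def auc_pairwise(truth, scores):
    """Estimate P[Y > X], where Y is the score of a random positive
    and X the score of a random negative; this equals the ROC area."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum((y > x) + 0.5 * (y == x) for y in pos for x in neg)
    return wins / (len(pos) * len(neg))
```

On the slide's logistic scores this gives 20/28 ≈ 0.7143, close to (but finer-grained than) the 0.7321 obtained from the coarse threshold grid.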

20 Can Always Quantify Best Operating Point
When misclassification costs are equal, the best operating point is …
–where a 45° line is tangent to the curve, i.e. the point on the curve closest to the (0,1) coordinate
Verify this mathematically (economic interpretation). Why?
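Under the equal-cost assumption, the tangency condition amounts to maximizing TP-rate minus FP-rate over the sampled cutoffs; a sketch, reusing the hypothetical roc_point helper:

```python
def best_cutoff(truth, scores, cutoffs):
    """Equal-cost operating point: where a 45-degree line is tangent
    to the sampled curve, i.e. TP-rate - FP-rate is largest."""
    def gap(cutoff):
        fpr, tpr = roc_point(truth, scores, cutoff)
        return tpr - fpr
    return max(cutoffs, key=gap)
```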

21 Quick Question
Are ROC curves always appropriate? Subjective operating points?
Must weigh the tradeoffs between false positives and false negatives
–the ROC curve plot is independent of the class distribution and error costs
This leads into utility theory (not touching this today)

22 Much Much More on ROC
Oh, if only I had more time. You should also look up and learn about:
–Iso-accuracy lines
–Skew distributions and why the 45° line isn't always "best"
–Convexity vs. non-convexity vs. concavity
–Mann-Whitney-Wilcoxon sum of ranks
–Gini coefficient
–Calibrated thresholds
–Averaging ROC curves
–Precision-Recall (THIS IS VERY IMPORTANT)
–Cost Curves

23 Some References
Good bibliography: http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html

Drummond C and Holte R. What ROC curves can and can't do (and cost curves can). In Proceedings of the Workshop on ROC Analysis in AI, held in conjunction with the European Conference on AI. Valencia, Spain. 2004.

Malin B. Probabilistic prediction of myocardial infarction: logistic regression versus simple neural networks. Data Privacy Lab Working Paper WP-25, School of Computer Science, Carnegie Mellon University. Sept 2005.

McNeil BJ and Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making. 1984; 4: 137-150.

Provost F and Fawcett T. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. Madison, Wisconsin. 1998: 445-453.

Swets J. Measuring the accuracy of diagnostic systems. Science. 1988; 240(4857): 1285-1293. (Based on his 1967 book Information Retrieval Systems.)

