ROCR: visualizing classifier performance in R
Tobias Sing, Ph.D. (joint work with Oliver Sander)
Modeling & Simulation, Novartis Pharma AG
3rd BaselR meeting (Oct 13, 2010)

Q: When can ROCR be useful for you? Not only in machine learning!
A: Whenever you want to evaluate a numerical score against a categorical (binary) outcome: "Does the score separate the two classes well?"

Examples (1): Markers of disease severity in psoriatic arthritis (from Englbrecht et al., 2010)
- Numerical score: number of swollen and tender joints
- Categorical outcome: clinical response, ACR20 improvement (yes/no)

Examples (2): HIV drug resistance
- Physicians count specific mutations in the HIV genome to predict whether the virus will be resistant or susceptible to antiviral therapy.
- e.g., the protease mutation list for SQV (saquinavir) resistance from the International AIDS Society

Examples (3): Evaluating scoring output from machine learning approaches
- Logistic regression:
  model <- glm(Y ~ X, family = binomial, data = A)
  predict(model, data.frame(X = 31), type = 'response')
- Decision trees:
  m1 <- rpart(Class ~ ., data = GlaucomaM)
  predict(m1, X)   # class probabilities for "glaucoma" and "normal"
- SVMs, Random Forests, ...
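As a self-contained illustration of how such a numeric score arises, here is a minimal sketch with simulated data (the data frame A and the variables X and Y below are made up for illustration; they are not the data sets used on the slide):

# Simulate a binary outcome whose probability depends on a numeric predictor
set.seed(1)
A <- data.frame(X = rnorm(200))
A$Y <- rbinom(200, size = 1, prob = plogis(1.5 * A$X))

# Logistic regression; the fitted probabilities are the numeric scores
model <- glm(Y ~ X, family = binomial, data = A)
predict(model, newdata = data.frame(X = 0.5), type = "response")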

Scoring classifiers
- Output: continuous score (instead of an actual class prediction)
- Example: saquinavir (SQV) resistance prediction with SVMs:
  > predict(m, X.new)
      1    2
   TRUE TRUE
  Use the distance to the hyperplane as numeric score:
  > predict(m, X.new, decision.values=TRUE)
  attr(,"decision.values")
- If predictions are (numeric) scores, the actual predictions of (categorical) classes depend on the choice of a cutoff c:
  f(x) ≥ c → class "1"
  f(x) < c → class "-1"
- Thus, a scoring classifier induces a family {f_c} of binary classifiers, one per cutoff c
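A small sketch of the cutoff idea (the scores and labels below are simulated for illustration; they are not the SQV data from the slide):

# Simulated scores and true classes
set.seed(2)
scores <- c(rnorm(50, mean = 1), rnorm(50, mean = -1))
labels <- factor(rep(c("1", "-1"), each = 50))

# One scoring classifier induces a whole family of binary classifiers f_c,
# one per cutoff c:
classify <- function(score, cutoff) ifelse(score >= cutoff, "1", "-1")

table(predicted = classify(scores, cutoff = 0),   true = labels)
table(predicted = classify(scores, cutoff = 0.5), true = labels)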

Binary classifiers (1/2): Prediction outcomes

                                    True class
                              1                      -1
Predicted class    1    True positive (TP)     False positive (FP)
                  -1    False negative (FN)    True negative (TN)

Binary classifiers (2/2): Some metrics of predictive performance
- Accuracy: P(Ŷ = Y); estimated as (TP+TN) / (TP+FP+FN+TN)
- Error rate: P(Ŷ ≠ Y); est.: (FP+FN) / (TP+FP+FN+TN)
- True positive rate (sensitivity, recall): P(Ŷ = 1 | Y = 1); est.: TP / P
- False positive rate (fallout): P(Ŷ = 1 | Y = -1); est.: FP / N
- True negative rate (specificity): P(Ŷ = -1 | Y = -1); est.: TN / N
- False negative rate (miss): P(Ŷ = -1 | Y = 1); est.: FN / P
- Precision (positive predictive value): P(Y = 1 | Ŷ = 1); est.: TP / (TP+FP)
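A short sketch computing these estimates from the four confusion-matrix counts in base R (the counts below are invented for illustration):

# Illustrative confusion-matrix counts
TP <- 40; FP <- 10; FN <- 5; TN <- 45
P <- TP + FN   # actual positives
N <- FP + TN   # actual negatives

accuracy  <- (TP + TN) / (TP + FP + FN + TN)
tpr       <- TP / P            # sensitivity, recall
fpr       <- FP / N            # fallout
tnr       <- TN / N            # specificity
precision <- TP / (TP + FP)    # positive predictive value
c(accuracy = accuracy, tpr = tpr, fpr = fpr, tnr = tnr, precision = precision)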

Performance curves for visualizing trade-offs
- Often the choice of a cutoff c involves trading off two competing performance measures, e.g. TPR: P(Ŷ = 1 | Y = 1) vs. FPR: P(Ŷ = 1 | Y = -1)
- Performance curves (ROC, sensitivity-specificity, precision-recall, lift charts, ...) are used to visualize this trade-off

Classifier evaluation with ROCR
- Only three commands:
  pred <- prediction(scores, labels)                 # pred: S4 object of class 'prediction'
  perf <- performance(pred, measure.Y, measure.X)    # perf: S4 object of class 'performance'
  plot(perf)
- Input format:
  Single run: vectors (scores: numeric; labels: anything)
  Multiple runs (cross-validation, bootstrapping, ...): matrices or lists
- Output format:
  Formal class 'performance' [package "ROCR"] with 6 slots
    ..@ x.name      : chr
    ..@ y.name      : chr
    ..@ alpha.name  : chr
    ..@ x.values    : List of ...
    ..@ y.values    : List of ...
    ..@ alpha.values: list()
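A complete minimal run of these three commands, using the ROCR.simple example data that ships with the package (a stand-in here for the slide's scores and labels):

library(ROCR)
data(ROCR.simple)   # list with $predictions (numeric scores) and $labels (0/1)

pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred, "tpr", "fpr")   # measure.Y = "tpr", measure.X = "fpr"
plot(perf)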

Classifier evaluation with ROCR: An example
- Example: cross-validation of saquinavir resistance prediction with SVMs
  > library(ROCR)
  Only take the first training/test run for now:
  > pred <- prediction(predictions[[1]], labels[[1]])
  > perf <- performance(pred, "acc")
  > perf@y.values
  [[1]]
  [1] ...
  (You will understand in a minute why there are three accuracies; the one in the middle is the relevant one here.)
- Works exactly the same with cross-validation data (give the predictions and true classes for the different folds as a list or matrix); we have already prepared the data accordingly:
  > str(predictions)
  List of 10
   $ : Factor w/ 2 levels "FALSE","TRUE": ...
   $ : Factor w/ 2 levels "FALSE","TRUE": ...
   ...
  > pred <- prediction(predictions, labels)
  > perf <- performance(pred, "acc")
  > perf@y.values
  [[1]]
  [1] ...
  [[2]]
  [1] ...
  ...
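For readers without the SQV data, the same list-based workflow can be tried with the ROCR.xval example data bundled with ROCR (used here purely as a stand-in for the slide's cross-validation folds):

library(ROCR)
data(ROCR.xval)   # 10 cross-validation folds: $predictions and $labels are lists of length 10

pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred, "acc")

length(perf@y.values)   # one accuracy-vs-cutoff vector per fold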

Examples (1/8): ROC curves
pred <- prediction(scores, labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, colorize=T)
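Two small additions that often accompany an ROC plot (not shown on the slide): a diagonal reference line for a random classifier, and the AUC as a single-number summary.

abline(a = 0, b = 1, lty = 2)             # reference line for a random classifier
performance(pred, "auc")@y.values[[1]]    # area under the ROC curve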

Examples (2/8): Precision/recall curves
pred <- prediction(scores, labels)
perf <- performance(pred, "prec", "rec")
plot(perf, colorize=T)

Examples (3/8): Averaging across multiple runs
pred <- prediction(scores, labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, avg='threshold', spread.estimate='stddev', colorize=T)

Examples (4/8): Performance vs. cutoff
perf <- performance(pred, "acc")
plot(perf, avg="vertical", spread.estimate="boxplot", show.spread.at=seq(0.1, 0.9, by=0.1))

perf <- performance(pred, "cal", window.size=50)
plot(perf)
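A common follow-up to an accuracy-vs-cutoff plot is reading off the cutoff with the highest accuracy; a sketch for the first run, using the slots of the performance class described earlier:

perf    <- performance(pred, "acc")
acc     <- perf@y.values[[1]]   # accuracies of the first run
cutoffs <- perf@x.values[[1]]   # corresponding cutoffs
cutoffs[which.max(acc)]         # cutoff maximizing accuracy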

Examples (5/8): Cutoff labeling
pred <- prediction(scores, labels)
perf <- performance(pred, "pcmiss", "lift")
plot(perf, colorize=T, print.cutoffs.at=seq(0,1,by=0.1), text.adj=c(1.2,1.2), avg="threshold", lwd=3)

Examples (6/8): Cutoff labeling, multiple runs
plot(perf, print.cutoffs.at=seq(0,1,by=0.2), text.cex=0.8,
     text.y=lapply(as.list(seq(0,0.5,by=0.05)), function(x) { ... }),
     col=as.list(terrain.colors(10)),
     text.col=as.list(terrain.colors(10)),
     points.col=as.list(terrain.colors(10)))

Examples (7/8): More complex trade-offs...
perf <- performance(pred, "acc", "lift")
plot(perf, colorize=T)
plot(perf, colorize=T, print.cutoffs.at=seq(0,1,by=0.1), add=T, text.adj=c(1.2, 1.2), avg="threshold", lwd=3)

Examples (8/8): Some other examples
perf <- performance(pred, 'ecost')   # expected cost curve
plot(perf)
perf <- performance(pred, 'rch')     # ROC convex hull
plot(perf)

Extending ROCR: An example
- Extend the internal environments:
  assign("auc", "Area under the ROC curve", envir = long.unit.names)
  assign("auc", ".performance.auc", envir = function.names)
  assign("auc", "fpr.stop", envir = optional.arguments)
  assign("auc:fpr.stop", 1, envir = default.values)
- Implement the performance measure (predefined signature):
  .performance.auc <- function(predictions, labels, cutoffs, fp, tp, fn, tn,
                               n.pos, n.neg, n.pos.pred, n.neg.pred, fpr.stop) {
    ...
  }
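Following the slide's pattern, a hypothetical new measure might look roughly like the sketch below. The name "bacc" and the function .performance.bacc are made up for illustration; the environment names and the function signature are taken from the slide, and the return convention (a list of cutoffs and measure values) is an assumption about ROCR's internal interface, which may differ between versions.

# Register the hypothetical measure (environments as named on the slide)
assign("bacc", "Balanced accuracy", envir = long.unit.names)
assign("bacc", ".performance.bacc", envir = function.names)

# One value per cutoff: the mean of TPR and TNR (assumed return convention)
.performance.bacc <- function(predictions, labels, cutoffs, fp, tp, fn, tn,
                              n.pos, n.neg, n.pos.pred, n.neg.pred) {
    list(cutoffs, 0.5 * (tp / n.pos + tn / n.neg))
}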

Now... get ROCR from CRAN!
- demo(ROCR)   (cycle through the examples by hitting <Return>; examine the R code that is shown)
- Help pages:
  help(package=ROCR)
  ?prediction
  ?performance
  ?plot.performance
  ?'prediction-class'
  ?'performance-class'
- Please cite the ROCR paper!