Evaluating Classifiers

Evaluation
Rank the ads according to the click probability predicted by logistic regression, and display them accordingly.
Measures of evaluation: lift, accuracy, precision, recall, F-score.
Error evaluation: in practice the equation is often written with an error term. For example, for the logit model we write logit(P(c_i = 1 | x_i)) = α + β^T x_i + err
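
A minimal sketch of the ranking step (not from the slides; scikit-learn and the feature/label data are assumptions made up for illustration): fit a logistic regression, score new ads, and display them in decreasing order of predicted click probability.

```python
# Sketch: ranking ads by predicted click probability with logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: rows are ads, columns are features,
# y is 1 if the ad was clicked, 0 otherwise.
X_train = np.array([[0.2, 1.0], [0.9, 0.1], [0.5, 0.5], [0.1, 0.8]])
y_train = np.array([0, 1, 1, 0])

model = LogisticRegression().fit(X_train, y_train)

# Score new ads and display them in decreasing order of P(click = 1 | x).
X_new = np.array([[0.7, 0.3], [0.2, 0.9], [0.6, 0.6]])
p_click = model.predict_proba(X_new)[:, 1]   # column 1 = P(class 1)
ranking = np.argsort(-p_click)               # indices, best ad first
print(ranking, p_click[ranking])
```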

Evaluation (contd.)
Lift: you apply a classifier's results to a business process; how much do the outcomes change once the new methods/model are in use? E.g., how many more people are clicking (or buying) because of the introduction of the model?
Accuracy: how often is the correct outcome predicted?
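
A small sketch of how these two measures might be computed (the labels are made up, and lift is taken here as the response rate among model-flagged customers divided by the overall baseline rate, which is one common definition rather than something stated on the slide):

```python
# Sketch: accuracy as the fraction of correct predictions, and lift as the
# ratio of the response rate among model-targeted people to the baseline rate.
import numpy as np

y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])   # actual click/buy
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0, 0, 0])   # model's prediction

accuracy = np.mean(y_true == y_pred)

baseline_rate = y_true.mean()               # response rate with no model
targeted = y_pred == 1                      # people the model flags
targeted_rate = y_true[targeted].mean()     # response rate among the flagged
lift = targeted_rate / baseline_rate

print(f"accuracy={accuracy:.2f}, lift={lift:.2f}")
```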

Positive and Negative Classifications
In most business applications of classification, you are looking for one class more than the other. For example, you are looking for promising stocks rather than potential poor performers. There are two aspects to how well a classifier performs: you want it to "flag" the items you are looking for, but also NOT flag things you are NOT looking for.
Flag aggressively, and you will get a lot of false positives.
Flag conservatively, and you will leave out items that should have been flagged. This is especially dangerous in certain situations, e.g., screening for cancer.

True Positive Rate
True positive rate (TPR): of all the things that should be flagged by our classifier, this is the fraction that actually gets flagged. We want it to be high; 1.0 if perfect.
False positive rate (FPR): of all the things that should NOT be flagged, this is the fraction that still ends up getting flagged. We want it to be low; 0.0 if perfect.
The graph below illustrates the trade-off.
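
A short sketch of the two rates (the scores and labels are invented for illustration). It also shows the aggressive-vs-conservative trade-off from the previous slide: lowering the flagging threshold raises both TPR and FPR.

```python
# Sketch: TPR and FPR at a few different flagging thresholds.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.35, 0.3, 0.2, 0.75, 0.55, 0.1, 0.05])

for threshold in (0.3, 0.5, 0.7):
    flagged = scores >= threshold
    tp = np.sum(flagged & (y_true == 1))
    fp = np.sum(flagged & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)   # fraction of should-be-flagged items flagged
    fpr = fp / np.sum(y_true == 0)   # fraction of should-NOT-be-flagged items flagged
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```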

[Figure: ROC space, with false positive rate on the x-axis (0 to 1.0) and true positive rate on the y-axis (0 to 1.0). Annotations: "Many false positives" (high FPR), "Many false negatives" (low TPR), "A happy medium" (upper left), and "Anything in this lower triangle is not good" (below the diagonal).]

Receiver Operating Characteristic (ROC) Curve
The ROC curve was designed during World War II for detecting enemy objects on the battlefield. It is now used extensively in medicine and in the statistical evaluation of classifiers. The link below gives an example of comparing multiple classifiers using ROC curves.

https://towardsdatascience.com/fraud-detection-under-extreme-class-imbalance-c241854e60c
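
A sketch of this kind of multiple-classifier comparison (synthetic data; scikit-learn and the two particular models are assumptions, not taken from the slides or the linked article): fit two classifiers and overlay their ROC curves.

```python
# Sketch: comparing two classifiers by plotting their ROC curves on one plot.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem (about 10% positives).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("logistic regression", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier(random_state=0))]:
    scores = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    plt.plot(fpr, tpr, label=name)

plt.plot([0, 1], [0, 1], "k--", label="random guessing")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```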

Area Under the ROC Curve (AUC or AUROC)
The linked example also compares the methods with another metric: the area under the ROC curve (AUC). It reduces each ROC curve to a single number, so the curves can be compared directly. A good curve approaches an AUC of 1.0, while a shallow curve sits around 0.5; a higher AUC is better. You can evaluate all of these metrics using Spark MLlib.
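
A minimal PySpark sketch of the AUC computation mentioned above (the label/score data and column names are hypothetical; Spark's BinaryClassificationEvaluator accepts a double-valued score column as the raw prediction):

```python
# Sketch: area under the ROC curve with Spark MLlib.
from pyspark.sql import SparkSession
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.getOrCreate()

# Hypothetical (label, score) pairs: score is the model's P(class = 1).
df = spark.createDataFrame(
    [(1.0, 0.9), (0.0, 0.2), (1.0, 0.65), (0.0, 0.4), (1.0, 0.3), (0.0, 0.1)],
    ["label", "score"],
)

evaluator = BinaryClassificationEvaluator(
    labelCol="label", rawPredictionCol="score", metricName="areaUnderROC"
)
print("AUC:", evaluator.evaluate(df))
```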

Other Performance Metrics
Precision: #true positives / (#true positives + #false positives)
Recall: #true positives / (#true positives + #false negatives)
F-score: the harmonic mean of precision and recall = 2*precision*recall / (precision + recall)
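
A worked sketch of these formulas on made-up counts (the numbers are illustrative, not from the slides):

```python
# Sketch: precision, recall and F-score from raw TP/FP/FN counts.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)                                  # 40/50 = 0.80
recall = tp / (tp + fn)                                     # 40/60 ≈ 0.67
f_score = 2 * precision * recall / (precision + recall)     # harmonic mean ≈ 0.73

print(f"precision={precision:.2f}, recall={recall:.2f}, F={f_score:.2f}")
```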

Confusion Matrix for Multiclass Classifiers
A simple way to apply the evaluation measures we have learned so far is to form a "confusion matrix".

Confusion Matrix
Here is an example:

Class      Sports   Business   World   USA
Sports       35        4         1
Business      5       40         3
World        15        6        45
USA                                     30
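
A sketch of how such a multiclass confusion matrix can be built (the labels below are made up and do not reproduce the slide's numbers; scikit-learn is an assumption):

```python
# Sketch: multiclass confusion matrix. Rows are actual classes, columns are
# predicted classes; diagonal cells count the correct predictions.
from sklearn.metrics import confusion_matrix

classes = ["Sports", "Business", "World", "USA"]
y_true = ["Sports", "Sports", "Business", "World", "USA", "World", "Business"]
y_pred = ["Sports", "Business", "Business", "World", "USA", "Sports", "Business"]

cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)
```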

Confusion Matrix
A simple binary-classifier confusion matrix and the associated calculations: http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/
Let's work out an example in the final review.
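
A sketch of those binary-classifier calculations, in the spirit of the guide linked above (labels are made up; scikit-learn is an assumption):

```python
# Sketch: unpack a 2x2 confusion matrix and derive the usual rates.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, sklearn orders the cells as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)        # true positive rate
fpr       = fp / (fp + tn)        # false positive rate

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, "
      f"recall={recall:.2f}, FPR={fpr:.2f}")
```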

Summary
We looked at various evaluation metrics for assessing the quality of a classification. We worked with a simple binary-classification confusion matrix.