Evaluation – next steps

Evaluation – next steps: Lift and Costs

Outline
- Lift and gains charts
- ROC curves
- Cost-sensitive learning
- Evaluation for numeric predictions

Application Example: Direct Marketing
- Paradigm: find the most likely prospects to contact
  - Not everybody needs to be contacted
  - The number of targets is usually much smaller than the number of prospects
- Typical applications:
  - Retailers, catalogues, direct mail (and e-mail)
  - Customer acquisition, cross-sell, attrition prediction
  - ...

Direct Marketing Evaluation
- Accuracy on the entire dataset is not the right measure
- Approach:
  - Develop a target model
  - Score all prospects and rank them by decreasing score
  - Select the top P% of prospects for action
- How do we decide which selection is best?

Model-Sorted List
- Use a model to assign a score to each customer
- Sort customers by decreasing score
- Expect more targets (hits) near the top of the list

  No   Score  Target  CustID  Age
  1    0.97   Y       1746    …
  2    0.95   N       1024
  3    0.94           2478
  4    0.93           3820
  5    0.92           4897
  …
  99   0.11           2734
  100  0.06           2422

- 3 hits in the top 5% of the list
- If there are 15 targets overall, then the top 5 contains 3/15 = 20% of the targets
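As a small illustration of building such a list (my own sketch; the customer IDs, scores, and target flags below are hypothetical, not the slide's data):

```python
# Hypothetical (customer_id, model_score, is_target) triples
scored_customers = [
    ("c01", 0.97, True), ("c02", 0.95, False), ("c03", 0.94, True),
    ("c04", 0.93, False), ("c05", 0.92, True), ("c06", 0.11, False),
    ("c07", 0.06, False),
]

# Sort by decreasing model score so the likely targets bubble to the top
ranked = sorted(scored_customers, key=lambda row: row[1], reverse=True)

# Inspect the top of the list: this is where the hits should concentrate
for cust_id, score, is_target in ranked[:5]:
    print(f"{cust_id}  score={score:.2f}  target={'Y' if is_target else 'N'}")
```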

CPH (Cumulative Pct Hits)
- Definition: CPH(P, M) = % of all targets in the first P% of the list scored by model M
- CPH is frequently called "gains"
- [Chart: Cumulative % Hits vs. Pct of list]
- 5% of a random list has 5% of the targets
- Q: What is the expected value of CPH(P, Random)?
- A: Expected value of CPH(P, Random) = P
- (Note: The potential benefit of a good customer segmentation is very high, but hard to estimate objectively. For tactical tasks we have some analytical tools we can use; first, a brief review of the notion of lift.)
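A minimal sketch of CPH as a function (names are mine, not from the slides; it assumes a list of boolean target flags already sorted by decreasing model score, such as `ranked` above):

```python
def cph(ranked_is_target, p):
    """CPH(P, M): share of all targets found in the first p (a fraction, e.g. 0.05) of the ranked list."""
    n = len(ranked_is_target)
    total_targets = sum(ranked_is_target)
    top_n = int(round(p * n))
    hits_in_top = sum(ranked_is_target[:top_n])
    return hits_in_top / total_targets if total_targets else 0.0

# For a randomly ordered list, cph(flags, p) is approximately p in expectation
```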

CPH: Random List vs Model-Ranked List
- [Chart: Cumulative % Hits vs. Pct of list]
- 5% of a random list has 5% of the targets, but 5% of the model-ranked list has 21% of the targets
- CPH(5%, model) = 21%

Lift
- Lift(P, M) = CPH(P, M) / P, where P is the percent of the list
- Lift (at 5%) = 21% / 5% = 4.2 times better than random
- Note: some authors (including Witten & Eibe) use "lift" for what we call CPH
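Lift then follows directly from the `cph` sketch above (again illustrative, not the authors' code):

```python
def lift(ranked_is_target, p):
    """Lift(P, M) = CPH(P, M) / P: how many times better than random the top P% of the list is."""
    return cph(ranked_is_target, p) / p

# With the slide's numbers: CPH(5%, model) = 0.21, so lift = 0.21 / 0.05 = 4.2
```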

Lift Properties
- Q: Lift(P, Random) = ?  A: 1 (expected value; can vary)
- Q: Lift(100%, M) = ?  A: 1 (for any model M)
- Q: Can lift be less than 1?  A: Yes, if the model is inverted (all the non-targets precede the targets in the list)
- Generally, a better model has higher lift

ROC Curves
- ROC stands for "receiver operating characteristic"
- Used in signal detection to show the tradeoff between hit rate and false alarm rate over a noisy channel
- Plots TP rate vs. FP rate
- True-positive rate (also known as recall) = TP / (TP + FN): the fraction of positives that you correctly classify
- False-positive rate = FP / (FP + TN): the fraction of negatives that you incorrectly classify
(Witten & Eibe)
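One way to trace these points yourself is to sweep the decision threshold down the score-sorted list; a self-contained sketch (my own, assuming binary labels, both classes present, and no tied scores):

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs obtained by lowering the threshold one example at a time."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if y_true[i]:
            tp += 1   # one more positive correctly classified
        else:
            fp += 1   # one more negative incorrectly classified
        points.append((fp / neg, tp / pos))
    return points

print(roc_points([True, False, True, False], [0.9, 0.8, 0.6, 0.3]))
```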

A Sample ROC Curve
- Jagged curve: one set of test data
- Smooth curve: use cross-validation
(Witten & Eibe)

ROC Curves for Two Schemes
- For a small, focused sample, use method A (you only need to act on a few predictions)
- For a larger one, use method B
- In between, choose between A and B with appropriate probabilities (see the sketch below)
(Witten & Eibe)
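The "appropriate probabilities" remark can be made concrete: sending each case to scheme A with probability t and to scheme B otherwise yields an operating point on the straight line between the two curves. A tiny sketch of that idea (my own, not from the slides):

```python
def mixed_operating_point(point_a, point_b, t):
    """Expected (fpr, tpr) when using scheme A with probability t and scheme B with probability 1 - t."""
    (fpr_a, tpr_a), (fpr_b, tpr_b) = point_a, point_b
    return (t * fpr_a + (1 - t) * fpr_b,
            t * tpr_a + (1 - t) * tpr_b)

# Halfway between A at (0.1, 0.6) and B at (0.4, 0.9): approximately (0.25, 0.75)
print(mixed_operating_point((0.1, 0.6), (0.4, 0.9), 0.5))
```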

Cost-Sensitive Learning
- There are two types of errors: false positives and false negatives
- Machine learning methods usually minimize FP + FN
- Direct marketing maximizes TP

                       Predicted class
                         Yes                   No
  Actual class   Yes     TP: true positive     FN: false negative
                 No      FP: false positive    TN: true negative
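For concreteness, a short sketch that tallies the four cells from actual and predicted boolean labels (my own helper, not from the slides):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, FP, TN for boolean actual and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fn, fp, tn

print(confusion_counts([True, True, False, False], [True, False, True, False]))  # (1, 1, 1, 1)
```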

Different Costs
- In practice, false positive and false negative errors often incur different costs
- Examples:
  - Medical diagnostic tests: does X have leukemia?
  - Loan decisions: approve mortgage for X?
  - Web mining: will X click on this link?
  - Promotional mailing: will X buy the product?
  - …

Cost-Sensitive Learning
- Most learning schemes do not perform cost-sensitive learning
  - They generate the same classifier no matter what costs are assigned to the different classes
  - Example: standard decision tree learner
- Simple methods for cost-sensitive learning (see the sketch below):
  - Re-sampling of instances according to costs
  - Weighting of instances according to costs
  - Changing the probability threshold
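As one concrete example of the third option, a sketch of shifting the probability threshold so that "yes" is predicted only when the expected cost favors it (the cost values and function names are illustrative assumptions, not from the slides):

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability above which predicting the positive class minimizes expected cost."""
    # Predict positive when (1 - p) * cost_fp < p * cost_fn, i.e. p > cost_fp / (cost_fp + cost_fn)
    return cost_fp / (cost_fp + cost_fn)

def cost_sensitive_predict(prob_positive, cost_fp=1.0, cost_fn=10.0):
    """Classify with the cost-adjusted threshold instead of the default 0.5."""
    return prob_positive >= cost_sensitive_threshold(cost_fp, cost_fn)

# A false negative costing 10x a false positive lowers the threshold to about 0.09
print(cost_sensitive_threshold(1.0, 10.0))
```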

Evaluating Numeric Prediction
- Same strategies: independent test set, cross-validation, significance tests, etc.
- Difference: error measures
- Actual target values: a1, a2, …, an
- Predicted target values: p1, p2, …, pn
- Most popular measure, the mean-squared error, is easy to manipulate mathematically:
    MSE = ((p1 - a1)^2 + … + (pn - an)^2) / n
(Witten & Eibe)

Other Measures
- The root mean-squared error:
    RMSE = sqrt(((p1 - a1)^2 + … + (pn - an)^2) / n)
- The mean absolute error is less sensitive to outliers than the mean-squared error:
    MAE = (|p1 - a1| + … + |pn - an|) / n
(Witten & Eibe)
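These measures are simple to compute directly; a small self-contained sketch matching the formulas above (my own code, not the book's):

```python
import math

def mse(actual, predicted):
    """Mean-squared error."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean-squared error: same units as the target variable."""
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    """Mean absolute error: less sensitive to outliers than MSE."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 4.0]))  # about 0.91
```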