Download presentation

Presentation is loading. Please wait.

Published byAlice Kelley Modified over 4 years ago

2
CSCI 347 / CS 4206: Data Mining Module 06: Evaluation Topic 07: Cost-Sensitive Measures

3
Counting the Cost ● In practice, different types of classification errors often incur different costs ● Examples: Terrorist profiling ● “Not a terrorist” correct 99.99% of the time Loan decisions Oil-slick detection Fault diagnosis Promotional mailing

4
Counting the Cost ● The confusion matrix: There are many other types of cost! ● E.g.: cost of collecting training data Actual class True negativeFalse positiveNo False negativeTrue positiveYes NoYes Predicted class

5
The Kappa Statistic ● Two confusion matrices for a 3-class problem: actual predictor (left) vs. random predictor (right) Number of successes: sum of entries in diagonal (D) ● Kappa statistic: measures relative improvement over random predictor

6
Classification with Costs ● Two cost matrices: ● Success rate is replaced by average cost per prediction ● Cost is given by appropriate entry in the cost matrix

7
Cost-Sensitive Classification ● Can take costs into account when making predictions ● Basic idea: only predict high-cost class when very confident about prediction ● Given: predicted class probabilities ● Normally we just predict the most likely class ● Here, we should make the prediction that minimizes the expected cost ● Expected cost: dot product of vector of class probabilities and appropriate column in cost matrix ● Choose column (class) that minimizes expected cost

8
Cost-Sensitive Learning ● So far we haven't taken costs into account at training time ● Most learning schemes do not perform cost-sensitive learning ● They generate the same classifier no matter what costs are assigned to the different classes ● Example: standard decision tree learner ● Simple methods for cost-sensitive learning: ● Resampling of instances according to costs ● Weighting of instances according to costs ● Some schemes can take costs into account by varying a parameter, e.g. naïve Bayes

9
Lift Charts ● In practice, costs are rarely known ● Decisions are usually made by comparing possible scenarios ● Example: promotional mailout to 1,000,000 households Mail to all; 0.1% respond (1000) Data mining tool identifies subset of 100,000 most promising, 0.4% of these respond (400) 40% of responses for 10% of cost may pay off Identify subset of 400,000 most promising, 0.2% respond (800) ● A lift chart allows a visual comparison

10
Generating a Lift Chart ● Sort instances according to predicted probability of being positive: ● x axis is sample size ● y axis is number of true positives …… … Yes0.88 4 No0.90 3 Yes0.93 2 Yes0.95 1 Actual classPredicted probability

11
Comparing Models by Measuring Lift

13
A Hypothetical Lift Chart 40% of responses for 10% of cost 80% of responses for 40% of cost

14
ROC Curves ● ROC curves are similar to lift charts Stands for “receiver operating characteristic” Used in signal detection to show tradeoff between hit rate and false alarm rate over noisy channel ● Differences to lift chart: y axis shows percentage of true positives in sample rather than absolute number x axis shows percentage of false positives in sample rather than sample size

15
A Sample ROC Curve ● Jagged curve—one set of test data ● Smooth curve—use cross-validation

16
Cross-Validation and ROC Curves ● Simple method of getting a ROC curve using cross- validation: Collect probabilities for instances in test folds Sort instances according to probabilities ● This method is implemented in WEKA ● However, this is just one possibility Another possibility is to generate an ROC curve for each fold and average them

17
ROC Curves for Two Schemes ● For a small, focused sample, use method A ● For a larger one, use method B ● In between, choose between A and B with appropriate probabilities

18
More Measures... ● Percentage of retrieved documents that are relevant: precision=TP/(TP+FP) ● Percentage of relevant documents that are returned: recall =TP/(TP+FN) ● Precision/recall curves have hyperbolic shape ● Summary measures: average precision at 20%, 50% and 80% recall (three-point average recall) ● F-measure=(2 × recall × precision)/(recall+precision) ● sensitivity × specificity = (TP / (TP + FN)) × (TN / (FP + TN)) ● Area under the ROC curve (AUC): probability that randomly chosen positive instance is ranked above randomly chosen negative one

19
Summary of Some Measures ExplanationPlotDomain TP/(TP+FN) TP/(TP+FP) Recall Precision Information retrieval Recall- precision curve TP/(TP+FN) FP/(FP+TN) TP rate FP rate CommunicationsROC curve TP (TP+FP)/(TP+FP+TN +FN) TP Subset size MarketingLift chart

20
The Mystery Sound Bet you can’t tell what this is…

Similar presentations

© 2019 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google