# CSCI 347 / CS 4206: Data Mining, Module 06: Evaluation, Topic 07: Cost-Sensitive Measures


## Counting the Cost

● In practice, different types of classification errors often incur different costs
● Examples:
   Terrorist profiling: "not a terrorist" is correct 99.99% of the time
   Loan decisions
   Oil-slick detection
   Fault diagnosis
   Promotional mailing

## Counting the Cost

● The confusion matrix:

|             | Predicted: Yes | Predicted: No  |
|-------------|----------------|----------------|
| Actual: Yes | True positive  | False negative |
| Actual: No  | False positive | True negative  |

● There are many other types of cost! E.g.: the cost of collecting training data

## The Kappa Statistic

● Two confusion matrices for a 3-class problem: actual predictor (left) vs. random predictor (right)
● Number of successes: sum of the entries on the diagonal (D)
● Kappa statistic: measures relative improvement over the random predictor
   Kappa = (D_observed - D_random) / (D_perfect - D_random)
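The kappa computation above can be sketched from a single confusion matrix: the random predictor's expected successes come from the row and column totals. The 3-class matrix below is illustrative, not taken from the slide's (missing) figure.

```python
def kappa(cm):
    """Kappa statistic from a confusion matrix (rows = actual, cols = predicted)."""
    total = sum(sum(row) for row in cm)
    d_observed = sum(cm[i][i] for i in range(len(cm)))   # diagonal successes
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Expected successes of a random predictor with the same marginal totals
    d_random = sum(r * c / total for r, c in zip(row_totals, col_totals))
    d_perfect = total                                    # a perfect predictor
    return (d_observed - d_random) / (d_perfect - d_random)

cm = [[88, 10, 2],
      [14, 40, 6],
      [18, 10, 12]]
print(kappa(cm))  # (140 - 82) / (200 - 82), roughly 0.49
```

A kappa of 0 means no better than random guessing with the same class proportions; 1 means perfect agreement.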

## Classification with Costs

● Two cost matrices: each entry gives the cost of predicting the column's class when the row's class is the actual one
● Success rate is replaced by average cost per prediction
● Cost is given by the appropriate entry in the cost matrix
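A minimal sketch of "average cost per prediction": each (actual, predicted) pair is charged the corresponding cost-matrix entry. The cost values here are illustrative assumptions, since the slide's cost matrices were in a lost figure.

```python
# cost[(actual, predicted)] -- assumed values for illustration
COST = {
    ("yes", "yes"): 0, ("yes", "no"): 1,   # a false negative costs 1
    ("no",  "yes"): 5, ("no",  "no"): 0,   # a false positive costs 5
}

def average_cost(actuals, predictions):
    """Total cost over all predictions divided by the number of predictions."""
    return sum(COST[(a, p)] for a, p in zip(actuals, predictions)) / len(actuals)

actuals     = ["yes", "no", "no", "yes"]
predictions = ["yes", "yes", "no", "no"]
print(average_cost(actuals, predictions))  # (0 + 5 + 0 + 1) / 4 = 1.5
```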

## Cost-Sensitive Classification

● Costs can be taken into account when making predictions
   Basic idea: only predict the high-cost class when very confident about the prediction
● Given: predicted class probabilities
   Normally we just predict the most likely class
   Here, we should make the prediction that minimizes the expected cost
● Expected cost: dot product of the vector of class probabilities and the appropriate column in the cost matrix
● Choose the column (class) that minimizes the expected cost
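The dot-product rule above can be sketched as follows; the cost matrix and probabilities are illustrative assumptions.

```python
# cost[actual][predicted] -- assumed values for illustration
cost = [
    [0, 1],    # actual class 0: predicting 1 instead costs 1
    [10, 0],   # actual class 1: missing it (predicting 0) costs 10
]

def min_expected_cost_class(probs):
    """Expected cost of predicting class j is the dot product of the
    probability vector with column j of the cost matrix; pick the minimum."""
    n = len(cost)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])

# Class 0 is more likely, but predicting 0 risks the expensive error:
print(min_expected_cost_class([0.8, 0.2]))  # expected costs [2.0, 0.8] -> 1
```

Note how this implements the "only predict the high-cost class when very confident" idea: only at 0.95 confidence in class 0 does predicting 0 become cheaper.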

## Cost-Sensitive Learning

● So far we haven't taken costs into account at training time
● Most learning schemes do not perform cost-sensitive learning
   They generate the same classifier no matter what costs are assigned to the different classes
   Example: standard decision tree learner
● Simple methods for cost-sensitive learning:
   Resampling of instances according to costs
   Weighting of instances according to costs
● Some schemes can take costs into account by varying a parameter, e.g. naïve Bayes
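The resampling idea can be sketched as below: instances of the expensive-to-miss class are replicated in proportion to their misclassification cost, so a cost-blind learner sees them more often. The data and the integer cost weights are illustrative assumptions.

```python
import random

def resample_by_cost(instances, labels, class_cost, seed=0):
    """Replicate each instance class_cost[label] times (integer weights
    for simplicity), then shuffle so the learner sees a mixed stream."""
    rng = random.Random(seed)
    resampled = []
    for x, y in zip(instances, labels):
        resampled.extend([(x, y)] * class_cost[y])
    rng.shuffle(resampled)
    return resampled

data = resample_by_cost([[1], [2], [3]], ["no", "yes", "no"], {"no": 1, "yes": 5})
print(len(data))  # 1 + 5 + 1 = 7 training instances
```

Weighting by costs is the same idea without the copying: schemes that accept per-instance weights are given `class_cost[y]` directly.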

## Lift Charts

● In practice, costs are rarely known
● Decisions are usually made by comparing possible scenarios
● Example: promotional mailout to 1,000,000 households
   Mail to all: 0.1% respond (1,000 responses)
   A data mining tool identifies a subset of the 100,000 most promising households: 0.4% of these respond (400 responses); 40% of the responses for 10% of the cost may pay off
   Identify a subset of the 400,000 most promising: 0.2% respond (800 responses)
● A lift chart allows a visual comparison

## Generating a Lift Chart

● Sort instances according to predicted probability of being positive:

| Rank | Predicted probability | Actual class |
|------|-----------------------|--------------|
| 1    | 0.95                  | Yes          |
| 2    | 0.93                  | Yes          |
| 3    | 0.90                  | No           |
| 4    | 0.88                  | Yes          |
| …    | …                     | …            |

● x axis is sample size
● y axis is number of true positives
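The procedure above can be sketched directly: sort by predicted probability, then for each prefix record (sample size, cumulative true positives). The four instances mirror the table; the "…" rows are omitted.

```python
def lift_points(scored):
    """scored: list of (predicted_probability, actual_is_positive).
    Returns one (sample size, true positives so far) point per instance."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    points, tps = [], 0
    for size, (_, positive) in enumerate(ranked, start=1):
        tps += positive
        points.append((size, tps))
    return points

scored = [(0.95, True), (0.93, True), (0.90, False), (0.88, True)]
print(lift_points(scored))  # [(1, 1), (2, 2), (3, 2), (4, 3)]
```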

## Comparing Models by Measuring Lift

## A Hypothetical Lift Chart

● 40% of responses for 10% of cost
● 80% of responses for 40% of cost

## ROC Curves

● ROC curves are similar to lift charts
   "ROC" stands for "receiver operating characteristic"
   Used in signal detection to show the tradeoff between hit rate and false alarm rate over a noisy channel
● Differences from lift charts:
   y axis shows the percentage of true positives in the sample rather than the absolute number
   x axis shows the percentage of false positives in the sample rather than the sample size

## A Sample ROC Curve

● Jagged curve: obtained from one set of test data
● Smooth curve: obtained using cross-validation
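A jagged ROC curve of the kind shown can be sketched like the lift chart, but with both axes expressed as rates: at each prefix of the ranking, plot (FP rate, TP rate). The scored instances reuse the earlier illustrative table; ties in probability are ignored for simplicity.

```python
def roc_points(scored):
    """scored: list of (predicted_probability, actual_is_positive).
    Returns (FP rate, TP rate) points tracing the ROC curve."""
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    pos = sum(1 for _, y in ranked if y)
    neg = len(ranked) - pos
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, y in ranked:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

scored = [(0.95, True), (0.93, True), (0.90, False), (0.88, True)]
print(roc_points(scored))
# (0,0) -> (0, 1/3) -> (0, 2/3) -> (1, 2/3) -> (1, 1): up for each TP, right for each FP
```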

## Cross-Validation and ROC Curves

● Simple method of getting an ROC curve using cross-validation:
   Collect probabilities for instances in the test folds
   Sort instances according to probabilities
● This method is implemented in WEKA
● However, this is just one possibility
   Another possibility is to generate an ROC curve for each fold and average them

## ROC Curves for Two Schemes

● For a small, focused sample, use method A
● For a larger one, use method B
● In between, choose between A and B with appropriate probabilities
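The "in between" case can be sketched as follows: choosing scheme A with probability p and scheme B with probability 1 - p yields an operating point on the straight line between their (FP rate, TP rate) points. The two points below are illustrative assumptions, not read off the slide's (missing) figure.

```python
def interpolate(point_a, point_b, p):
    """Operating point when scheme A is used with probability p, B otherwise."""
    (fa, ta), (fb, tb) = point_a, point_b
    return (p * fa + (1 - p) * fb, p * ta + (1 - p) * tb)

a = (0.10, 0.60)  # scheme A: strong at small, focused samples (assumed point)
b = (0.40, 0.90)  # scheme B: stronger on larger samples (assumed point)
print(interpolate(a, b, 0.5))  # midpoint of a and b: (0.25, 0.75)
```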

## More Measures...

● Percentage of retrieved documents that are relevant: precision = TP/(TP+FP)
● Percentage of relevant documents that are returned: recall = TP/(TP+FN)
● Precision/recall curves have a hyperbolic shape
● Summary measures: average precision at 20%, 50% and 80% recall (three-point average recall)
● F-measure = (2 × recall × precision)/(recall + precision)
● sensitivity × specificity = (TP/(TP+FN)) × (TN/(FP+TN))
● Area under the ROC curve (AUC): the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one
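These measures can be sketched directly from counts, with AUC estimated via its probabilistic definition: the fraction of (positive, negative) pairs ranked correctly. The counts and scores are illustrative.

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * r * p / (r + p)

def auc(scored):
    """Fraction of (positive, negative) pairs where the positive instance
    is ranked above the negative one; ties count as half."""
    pos = [s for s, y in scored if y]
    neg = [s for s, y in scored if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(precision(40, 10))      # 40 / 50 = 0.8
print(recall(40, 20))         # 40 / 60, roughly 0.667
print(f_measure(40, 10, 20))  # 8 / 11, roughly 0.727
scored = [(0.95, True), (0.93, True), (0.90, False), (0.88, True)]
print(auc(scored))            # 2 of 3 positives outrank the negative: 2/3
```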

## Summary of Some Measures

| Domain                | Plot                   | y axis (explanation)   | x axis (explanation)              |
|-----------------------|------------------------|------------------------|-----------------------------------|
| Information retrieval | Recall-precision curve | Recall = TP/(TP+FN)    | Precision = TP/(TP+FP)            |
| Communications        | ROC curve              | TP rate = TP/(TP+FN)   | FP rate = FP/(FP+TN)              |
| Marketing             | Lift chart             | TP                     | Subset size = (TP+FP)/(TP+FP+TN+FN) |

