Copyright © 2003, SAS Institute Inc. All rights reserved. Cost-Sensitive Classifier Selection Ross Bettinger Analytical Consultant SAS Services.


1 Cost-Sensitive Classifier Selection. Ross Bettinger, Analytical Consultant, SAS Services

2 Rule-Based Knowledge Extraction
- A typical goal in extracting knowledge from data is to produce a classification rule that assigns a class membership to a future event with a specified probability.
- A binary classifier assigns an object to one of two classes.
  - The class assignment will be either correct or incorrect, so there are four possible outcomes:
    - {Predicted Event, Actual Event} (True Positive)
    - {Predicted Event, Actual Nonevent} (False Positive)
    - {Predicted Nonevent, Actual Event} (False Negative)
    - {Predicted Nonevent, Actual Nonevent} (True Negative)
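The four outcomes can be tallied mechanically. The following is a minimal Python sketch for illustration only; the function names `outcome` and `confusion_counts` are assumptions introduced here and do not come from the presentation.

```python
def outcome(predicted_event: bool, actual_event: bool) -> str:
    """Map one (predicted, actual) pair to one of the four outcomes."""
    if predicted_event:
        return "TP" if actual_event else "FP"
    return "FN" if actual_event else "TN"


def confusion_counts(predicted, actual):
    """Count each outcome over paired sequences of booleans."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        counts[outcome(p, a)] += 1
    return counts
```

Calling `confusion_counts` on the predictions and the known class labels gives the four cells of a 2x2 classification table.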

3 Evaluating Classifier Performance
- Use a 2x2 classification table of predicted vs. actual class membership
- A critical concept in the discussion of decisions is the definition of an event
  - An observation or instance, I, has been classified into class e with probability if the classifier assigns a probability

4 The Cost of a Decision
- Correct Decision: TP = p(E|p) =
- False Positive: FP = p(E|n) = …
- Assume that correct decisions incur no cost
- The theoretical expected cost of misclassifying an instance I is
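The cost formula itself did not survive in this transcript. As a minimal sketch, assume the expected-cost form commonly used in ROC analysis (for example by Provost and Fawcett), in which correct decisions cost nothing and only false negatives and false positives are charged. The parameter names `p_pos`, `c_fn`, and `c_fp` are hypothetical and chosen here for illustration.

```python
def expected_cost(tp: float, fp: float, p_pos: float, c_fn: float, c_fp: float) -> float:
    """Expected misclassification cost at an ROC point (fp, tp).

    Assumed form (not recovered from the slide):
        cost = p(pos) * (1 - TP) * c_fn  +  p(neg) * FP * c_fp
    tp, fp  : true-positive and false-positive rates
    p_pos   : prior probability of the event class
    c_fn    : cost of a false negative
    c_fp    : cost of a false positive
    """
    p_neg = 1.0 - p_pos
    return p_pos * (1.0 - tp) * c_fn + p_neg * fp * c_fp
```

For example, with `tp=0.70`, `fp=0.29`, an event prior of 0.3, and a false negative five times as costly as a false positive, the two terms are 0.3 × 0.3 × 5 and 0.7 × 0.29 × 1.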

5 Receiver Operating Characteristic
- Compute a 2x2 classification table for each value of the probability threshold, and plot the curve traced by (FP, TP) as the threshold ranges from 0 to 1
- This curve is called the "receiver operating characteristic" (ROC) and was developed during World War II to assess the performance of radar receivers in detecting targets accurately
- The area under the ROC curve (AUC) is defined to be the performance index of interest
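The threshold sweep and the area computation can be sketched as follows. This is an illustrative Python version, not the presenter's implementation: the function names are assumptions, labels are assumed to be 1 for an event and 0 for a nonevent, both classes are assumed to be present, and the area is approximated with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) points, one per distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted as event
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Trapezoidal area under a list of (FP, TP) points ordered by FP."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Lowering the threshold can only add predicted events, so the points come out ordered from (0, 0) to (1, 1).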

6 ROC Plot (figure; labeled points (.29, .70) and (.29, .67))

7 ROC Curve and Decision Costs
- The ROC curve does not include any class distribution or misclassification cost information in its construction
  - It does not give much guidance in the choice among competing classifiers unless one of them clearly dominates all of the others over all threshold values
- Overlay class distribution and misclassification cost on ROC curve using average cost of decision

8 ROC Curve and Decision Costs (cont'd)
- For the 2x2 classification table, the equation becomes
- At minimum average cost point, slope of ROC curve is
- ROC operating point is sensitive to class distribution, misclassification costs
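The slope expression is missing from this transcript. A minimal sketch, assuming the standard result from ROC analysis (Provost and Fawcett) that lines of equal expected cost in (FP, TP) space have slope p(neg)·c_fp / (p(pos)·c_fn); the function and parameter names are hypothetical.

```python
def isoperformance_slope(p_pos: float, c_fp: float, c_fn: float) -> float:
    """Slope of a line of constant expected cost in (FP, TP) space.

    Assumed derivation: holding
        cost = p_pos * (1 - TP) * c_fn + p_neg * FP * c_fp
    constant and solving for TP gives
        TP = (p_neg * c_fp) / (p_pos * c_fn) * FP + constant.
    """
    p_neg = 1.0 - p_pos
    return (p_neg * c_fp) / (p_pos * c_fn)
```

A rarer event class or a more expensive false positive steepens the line, which moves the operating point toward the lower left of the ROC plot; this is the sensitivity the last bullet refers to.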

9 Determine ROC Operating Point
- Represent slope of ROC curve using adjacent points to form the isoperformance line
- Compute slopes at adjacent points, determine interval containing slope, match with classifier point, find
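The interval search described above can be sketched in Python as follows. The sketch assumes the input is a concave sequence of (FP, TP) vertices running from (0, 0) to (1, 1), so that segment slopes decrease from left to right; the function name `operating_point` is an assumption made for illustration.

```python
def operating_point(hull, slope):
    """Return the vertex where a line of the given slope touches the hull.

    Walk the segments left to right. The first segment whose slope is no
    greater than the target slope starts at the point of tangency.
    """
    for i in range(1, len(hull)):
        (x0, y0), (x1, y1) = hull[i - 1], hull[i]
        seg = float("inf") if x1 == x0 else (y1 - y0) / (x1 - x0)
        if seg <= slope:
            return hull[i - 1]
    # Target slope is flatter than every segment: classify everything as an event.
    return hull[-1]
```

For instance, on the vertices (0, 0), (0.1, 0.5), (0.4, 0.8), (1, 1) the segment slopes are roughly 5, 1, and 0.33, so a target slope of 2 falls between the first two segments and selects (0.1, 0.5).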

10 ROC Convex Hull (Provost and Fawcett, 1997)
- Overlay multiple ROC curves on same (FP, TP) axes

11 ROC Convex Hull (cont'd)
- Add convex hull to ROC curves
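One way to compute the hull of pooled ROC points is an upper-hull pass of the monotone chain algorithm. The sketch below is illustrative and is not taken from the presentation; it assumes points are (FP, TP) tuples and always includes the trivial classifiers (0, 0) and (1, 1).

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points, ordered from (0, 0) to (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the line to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Points that fall below the hull, such as (0.3, 0.6) in the set {(0.1, 0.5), (0.3, 0.6), (0.4, 0.8)}, are discarded because no combination of class distribution and costs makes them the least-cost choice.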

12 ROC Convex Hull (cont'd)
- Add isoperformance line

13 Selecting Classifiers Using ROC Method
- The isoperformance line, which is tangent to the ROC convex hull (ROCCH) at the point of minimum expected cost, indicates which classifier to use for a specified combination of class distribution and misclassification costs
- Furthermore, the ROCCH method indicates the range of slopes over which a particular classifier is optimal with respect to class distribution and costs
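The selection step can be illustrated with a shortcut: because expected cost is linear in (FP, TP), its minimum over a set of points always lies on their convex hull, so evaluating the cost at every pooled point finds the same point the tangent isoperformance line would touch. The sketch below takes that shortcut rather than building the hull explicitly. The function name, the dictionary layout, and the cost form (false negatives and false positives only) are assumptions.

```python
def select_classifier(curves, p_pos, c_fp, c_fn):
    """Pick the least-cost operating point across several classifiers.

    curves : dict mapping a classifier name to a list of (FP, TP) points
    Returns (cost, classifier_name, (FP, TP)) for the cheapest point.
    """
    p_neg = 1.0 - p_pos
    best = None
    for name, points in curves.items():
        for fp, tp in points:
            cost = p_pos * (1.0 - tp) * c_fn + p_neg * fp * c_fp
            if best is None or cost < best[0]:
                best = (cost, name, (fp, tp))
    return best
```

With equal priors and equal costs one classifier may win, while raising the false-positive cost can hand the decision to a different classifier with a more conservative operating point. Only the points supplied are considered, so the trivial classifiers at (0, 0) and (1, 1) must be added to `curves` if they should compete.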

14 Selecting Classifiers Using ROCCH (cont'd)
- Convex hull points + associated classifier

15 Selecting Classifiers Using ROCCH (cont'd)
- Range of slopes, points of tangency, classifier
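The slide's table is not preserved in this transcript, but a table of that shape can be derived from the hull vertices. In the illustrative sketch below, each vertex is a hypothetical (FP, TP, classifier name) triple, listed from (0, 0) to (1, 1); a vertex is the point of tangency for every isoperformance slope between the slope of the segment leaving it and the slope of the segment arriving at it.

```python
import math


def slope_ranges(hull):
    """Return (name, (FP, TP), lowest_slope, highest_slope) for each hull vertex."""

    def seg_slope(a, b):
        dx = b[0] - a[0]
        return math.inf if dx == 0 else (b[1] - a[1]) / dx

    rows = []
    for i, (fp, tp, name) in enumerate(hull):
        # Steepest slope for which this vertex is optimal: the incoming segment.
        upper = math.inf if i == 0 else seg_slope(hull[i - 1], hull[i])
        # Flattest slope for which this vertex is optimal: the outgoing segment.
        lower = 0.0 if i == len(hull) - 1 else seg_slope(hull[i], hull[i + 1])
        rows.append((name, (fp, tp), lower, upper))
    return rows
```

Looking up the isoperformance slope for a given class distribution and cost ratio in these ranges identifies the classifier to deploy.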

16 Selecting Classifiers Using ROCCH (cont'd)
- Classifier and AUC for German credit ensemble classifiers

17 Selecting Classifiers Using ROCCH (cont'd)
- Ensemble classifiers for Catalog Direct Mail

18 Selecting Classifiers Using ROCCH (cont'd)
- Classifier and AUC for Catalog Direct Mail

19 Selecting Classifiers Using ROCCH (cont'd)
- Ensemble classifiers for KDD-98 Cup

20 Selecting Classifiers Using ROCCH (cont'd)
- Classifier and AUC for KDD-98 Cup

21 Summary
- The ROCCH methodology for selecting binary classifiers explicitly includes class distribution and misclassification costs in its formulation.
- It is a robust alternative to whole-curve metrics like AUC, which report global classifier performance but may not indicate the best classifier (in the least-cost sense) for the range of operating conditions under which the classifier will assign class memberships.
