Published by Christian Marshall. Modified over 2 years ago.

1
Pruning and Dynamic Scheduling of Cost-sensitive Ensembles
Wei Fan, Haixun Wang, and Philip S. Yu (IBM T.J. Watson, Hawthorne, New York)
Fang Chu (UCLA, Los Angeles, CA)

2
Inductive Learning
Training data: ($43.45,retail,10025,10040,...,nonfraud), ($246,70,weapon,10001,94583,...,fraud)
Learner: 1. Decision trees 2. Rules 3. Naive Bayes ...
Classifier: maps a transaction to a class label in {fraud, nonfraud}
Test data: ($99.99,pharmacy,10013,10027,...,?), ($1.00,gas,10040,00234,...,?)
Predicted class labels: nonfraud, fraud

3

4
Cost-sensitive Problems
• Charity donation: solicit people who will donate a large amount. It costs $0.68 to send a letter. A(x): donation amount. Only solicit if A(x) > $0.68; otherwise we lose money.
• Credit card fraud detection: detect frauds with a high transaction amount. It costs $90 to challenge a potential fraud. A(x): fraudulent transaction amount. Only challenge if A(x) > $90; otherwise we lose money.

5
Scalability Issues of Data Mining
• Learning algorithms:
  - complexity is non-linear in the dataset size n.
  - memory-based, because records in the dataset are accessed in a random pattern.
  - significantly slower if the dataset is not held entirely in memory.
• State of the art:
  - many scalable solutions are algorithm-specific.
  - general algorithms are not very scalable and only work for cost-insensitive problems.
  - cost-sensitive examples: charity donation (solicit people who will donate a lot) and credit card fraud (detect frauds with a high transaction amount).
• Our solution: a general framework for both cost-sensitive and cost-insensitive problems.

6
Training
A large dataset D is partitioned into K subsets D1, D2, ..., DK. A learner (ML 1, ML 2, ..., ML K) is trained on each subset, generating K models C1, C2, ..., CK.

7
Testing
Each example in the test set D is sent to the k models C1, C2, ..., Ck, which compute k predictions P1, P2, ..., Pk. These are combined into one prediction P.
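The training and testing flow on these two slides can be sketched roughly as follows. This is only an illustration: `train_majority` is a hypothetical stand-in learner, and simple voting is assumed as the combiner because the slide does not say how the predictions are combined at this point.

```python
import random
from collections import Counter

def partition(records, k, seed=0):
    """Shuffle the records and split them into k disjoint subsets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def train_majority(subset):
    """Hypothetical stand-in learner: always predicts the subset's majority label."""
    majority = Counter(label for _, label in subset).most_common(1)[0][0]
    return lambda x: majority

def combine(models, x):
    """Combine the individual predictions into one (assumed here: simple voting)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: every fifth record is labeled fraud.
data = [((i,), "fraud" if i % 5 == 0 else "nonfraud") for i in range(100)]
subsets = partition(data, k=4)
models = [train_majority(s) for s in subsets]
prediction = combine(models, (7,))
```

Any real learner (decision tree, rules, naive Bayes) could replace the stand-in, since each model only sees its own subset.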

8
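Cost-sensitive Decision Making
• Assume that \( b[\ell_i, \ell_j] \) records the benefit received by predicting an example of class \( \ell_i \) to be an instance of class \( \ell_j \).
• The expected benefit received by predicting an example \( x \) to be an instance of class \( \ell_i \) (regardless of its true label) is \( e(\ell_i \mid x) = \sum_{j} b[\ell_j, \ell_i] \, p(\ell_j \mid x) \).
• The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., \( \ell_{\max} = \arg\max_{\ell_i} e(\ell_i \mid x) \).
• When \( b[\ell_i, \ell_i] = 1 \) and \( b[\ell_i, \ell_j] = 0 \) for \( i \neq j \), this is a traditional accuracy-based problem.
• Total benefits: \( \sum_{x} b[\ell(x), \ell_{\max}(x)] \), where \( \ell(x) \) is the true label of \( x \).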
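A minimal sketch of this decision policy, assuming the benefit table is keyed by (true label, predicted label). The fraud benefit values below are illustrative assumptions built from the $90 challenge cost and a made-up $500 transaction; they are not taken from the slides.

```python
def expected_benefit(benefit, probs, predicted):
    """Expected benefit of predicting `predicted`, summed over possible true labels."""
    return sum(benefit[(true, predicted)] * p for true, p in probs.items())

def best_label(benefit, probs):
    """Choose the label that maximizes the expected benefit."""
    return max(probs, key=lambda label: expected_benefit(benefit, probs, label))

# Assumed benefit table for one $500 transaction with a $90 challenge cost.
amount = 500.0
fraud_benefit = {
    ("fraud", "fraud"): amount - 90.0,    # fraud caught, minus the challenge cost
    ("nonfraud", "fraud"): -90.0,         # wasted challenge
    ("fraud", "nonfraud"): 0.0,
    ("nonfraud", "nonfraud"): 0.0,
}
probs = {"fraud": 0.3, "nonfraud": 0.7}
decision = best_label(fraud_benefit, probs)

# Accuracy-based special case: benefit 1 for a correct label, 0 otherwise.
accuracy_benefit = {(t, p): 1.0 if t == p else 0.0 for t in probs for p in probs}
accuracy_decision = best_label(accuracy_benefit, probs)
```

With the 0/1 benefit table the policy simply picks the most probable label, which is why the accuracy-based problem is a special case.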

9
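Charity Donation Example
• It costs $0.68 to send a solicitation.
• Assume that \( y(x) \) is the best estimate of the donation amount and \( p(\text{donate} \mid x) \) is the estimated probability that \( x \) donates.
• The cost-sensitive decision-making policy will solicit an individual if and only if \( p(\text{donate} \mid x) \cdot y(x) > \$0.68 \).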

10
Credit Card Fraud Detection Example
• It costs $90 to challenge a potential fraud.
• Assume that \( y(x) \) is the transaction amount.
• The cost-sensitive decision-making policy will predict a transaction to be fraudulent if and only if \( p(\text{fraud} \mid x) \cdot y(x) > \$90 \).
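Both examples can be read as the same threshold test: act only when the estimated probability times the estimated amount exceeds the fixed cost of acting. The sketch below assumes that reading; `should_act` and the probabilities and amounts used are hypothetical.

```python
def should_act(probability, amount, cost):
    """Act (solicit or challenge) only if the expected gain exceeds the cost."""
    return probability * amount > cost

# Charity donation: a letter costs $0.68.
solicit_likely_donor = should_act(probability=0.05, amount=20.0, cost=0.68)    # 1.00 > 0.68
solicit_unlikely_donor = should_act(probability=0.05, amount=10.0, cost=0.68)  # 0.50 > 0.68

# Credit card fraud: a challenge costs $90.
challenge_large = should_act(probability=0.3, amount=500.0, cost=90.0)  # 150 > 90
challenge_small = should_act(probability=0.3, amount=200.0, cost=90.0)  # 60 > 90
```

Note that the same fraud probability leads to different decisions depending on the transaction amount, which is what makes the problem cost-sensitive.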

11
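Adult Dataset
• Downloaded from the UCI Machine Learning Repository.
• Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
• The decision to predict positive is \( 2 \cdot p(\text{positive} \mid x) > 1 \cdot p(\text{negative} \mid x) \).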
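Assuming the decision compares the benefit-weighted probabilities of the two classes, and that the two probabilities sum to 1, the rule reduces to a simple probability threshold:

```latex
\begin{align*}
2\,p(\text{positive}\mid x) &> 1\cdot p(\text{negative}\mid x) \\
2\,p(\text{positive}\mid x) &> 1 - p(\text{positive}\mid x) \\
p(\text{positive}\mid x) &> \tfrac{1}{3}
\end{align*}
```

Under these assumptions, an example is predicted positive as soon as its estimated probability exceeds one third rather than one half.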

12
Calculating probabilities
• For decision trees: if n is the number of examples in a node and k is the number of examples with class label \( \ell \), then the probability is \( p(\ell \mid x) = k / n \). More sophisticated methods: smoothing, early stopping, and early stopping plus smoothing.
• For rules: the probability is calculated in the same way as for decision trees.
• For naive Bayes: if \( s(\ell_j) \) is the score for class label \( \ell_j \), then \( p(\ell_j \mid x) = s(\ell_j) / \sum_i s(\ell_i) \). Alternative: binning.
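A small sketch of these estimates. The slide only says "smoothing", so Laplace smoothing is used here as one assumed choice, and score normalization is an assumed way of turning naive Bayes scores into probabilities; all function names are hypothetical.

```python
def leaf_probability(k, n):
    """Raw estimate: fraction of the n examples in a node that carry the class label."""
    return k / n

def laplace_probability(k, n, num_classes=2):
    """Smoothed estimate (assumed: Laplace correction), pulled toward 1/num_classes."""
    return (k + 1) / (n + num_classes)

def normalize_scores(scores):
    """Turn per-class scores into probabilities that sum to 1."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

raw = leaf_probability(3, 3)            # a pure leaf with only 3 examples
smoothed = laplace_probability(3, 3)    # less extreme than the raw estimate
nb_probs = normalize_scores({"fraud": 0.002, "nonfraud": 0.006})
```

Smoothing matters most for small leaves, where a raw estimate of 0 or 1 would otherwise dominate the expected-benefit calculation.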

13

14
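Combining Technique: Averaging
• Each model \( C_k \) computes an expected benefit \( e_k(\ell_i \mid x) \) for example \( x \) over every class label \( \ell_i \).
• The individual expected benefits are combined by averaging: \( E_K(\ell_i \mid x) = \frac{1}{K} \sum_{k} e_k(\ell_i \mid x) \).
• We choose the label with the highest combined expected benefit: \( \ell_{\max} = \arg\max_{\ell_i} E_K(\ell_i \mid x) \).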
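The averaging step can be sketched as below, assuming each model reports its expected benefits as a dictionary from class label to value. The three example dictionaries are made-up numbers for illustration.

```python
def average_expected_benefit(per_model_benefits):
    """Average each label's expected benefit over all models."""
    labels = per_model_benefits[0].keys()
    k = len(per_model_benefits)
    return {label: sum(m[label] for m in per_model_benefits) / k for label in labels}

def choose_label(per_model_benefits):
    """Pick the label with the highest combined expected benefit."""
    combined = average_expected_benefit(per_model_benefits)
    return max(combined, key=combined.get)

# Hypothetical expected benefits from three models for one transaction.
per_model = [
    {"fraud": 60.0, "nonfraud": 0.0},
    {"fraud": -20.0, "nonfraud": 0.0},
    {"fraud": 10.0, "nonfraud": 0.0},
]
combined = average_expected_benefit(per_model)
label = choose_label(per_model)
```

Here one model disagrees with the other two, but the averaged benefit for "fraud" stays positive, so the ensemble still predicts fraud.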

15
Why is accuracy higher?
1. Decision threshold line.
2. Examples on the left are more profitable than those on the right.
3. "Evening effect": biases towards big fish.

16
Experiments
• Decision tree learner: C4.5 version 8
• Datasets: Donation, Credit Card, Adult

17
Accuracy comparison

18
Accuracy comparison

19

20
Detailed Spread

21
Credit Card Fraud Dataset

22
Adult Dataset

23
Why is accuracy higher?

24
Pruning
A large dataset D is partitioned into K subsets D1, D2, ..., DK, and learners ML 1, ML 2, ..., ML K generate K models C1, C2, ..., CK. Pruning keeps k of these models.

25
Techniques
• Always use a greedy strategy to choose the next classifier.
• Criteria:
  - Directly use accuracy or total benefits: choose the most accurate classifier.
  - Most diversified.
  - Most accurate combinations.
• Result: directly using accuracy works best.
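A rough sketch of greedy pruning with the "directly use accuracy" criterion: repeatedly pick the most accurate remaining classifier until k are kept. Measuring accuracy on a held-out validation set is an assumption, and the toy models are hypothetical.

```python
def accuracy(model, validation):
    """Fraction of validation examples the model labels correctly."""
    correct = sum(1 for x, label in validation if model(x) == label)
    return correct / len(validation)

def prune_by_accuracy(models, validation, keep):
    """Greedily select the `keep` most accurate classifiers, best first."""
    remaining = list(models)
    selected = []
    while remaining and len(selected) < keep:
        best = max(remaining, key=lambda m: accuracy(m, validation))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy validation set and three hypothetical classifiers.
validation = [(1, "neg"), (2, "neg"), (3, "neg"), (8, "pos"), (9, "pos")]
always_pos = lambda x: "pos"
always_neg = lambda x: "neg"
threshold = lambda x: "pos" if x > 5 else "neg"

kept = prune_by_accuracy([always_pos, always_neg, threshold], validation, keep=2)
```

For a cost-sensitive problem, total benefits on the validation set could replace `accuracy` as the selection key without changing the loop.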

26
Pruning Results

27
Dynamic scheduling
• For a fixed number of classifiers, do we need every classifier to predict on every example? Not necessarily.
• Some examples are easier to predict than others. Easier examples do not require as many classifiers as more difficult ones.
• Techniques:
  - Order the classifiers by accuracy into a pipeline.
  - The most accurate classifier is always called first.
  - Each prediction generates a confidence that describes how likely the current prediction is to be the same as the prediction by the full fixed set of classifiers.
  - If the confidence is too low, more classifiers are employed.
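The pipeline idea can be sketched as follows. The slides do not say how the confidence is computed, so `margin_confidence` is a hypothetical stand-in based on how far the running average is from the decision threshold; the models are also assumed to return the expected benefit of the positive action.

```python
def margin_confidence(used, average, scale=50.0):
    """Hypothetical confidence: grows with the distance from the decision threshold 0."""
    return min(1.0, abs(average) / scale)

def dynamic_predict(x, pipeline, confidence, min_confidence):
    """Call classifiers in order until the running prediction is confident enough.

    pipeline: functions x -> expected benefit of the positive action,
              ordered from most to least accurate.
    Returns (prediction, confidence, number of classifiers used).
    """
    if not pipeline:
        raise ValueError("pipeline must contain at least one classifier")
    total = 0.0
    for used, model in enumerate(pipeline, start=1):
        total += model(x)
        average = total / used
        prediction = average > 0
        conf = confidence(used, average)
        # Stop early when confident, or when no classifiers are left.
        if conf >= min_confidence or used == len(pipeline):
            return prediction, conf, used

# Three hypothetical classifiers, assumed already sorted by accuracy.
pipeline = [lambda x: x - 100, lambda x: x - 120, lambda x: x - 90]

easy = dynamic_predict(300, pipeline, margin_confidence, min_confidence=0.8)
hard = dynamic_predict(110, pipeline, margin_confidence, min_confidence=0.8)
```

The easy example stops after the first classifier, while the hard one near the threshold falls through to the whole pipeline.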

28
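Dynamic Scheduling
Examples from D enter the classifier pipeline. Each stage outputs a (pred, conf) pair: first from (C1), then from (C1, C2), then from (C1, C2, C3). Examples whose confidence is high enough leave the pipeline as predicted examples.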

29
Dynamic Scheduling Result
