
Published by Savannah Roche. Modified over 2 years ago.

1
A Fully Distributed Framework for Cost-sensitive Data Mining
Wei Fan, Haixun Wang, and Philip S. Yu, IBM T.J. Watson, Hawthorne, New York
Salvatore J. Stolfo, Columbia University, New York City, New York

2
Inductive Learning
Training data → Learner → Classifier
• Training data: ($43.45,retail,10025,10040,..., nonfraud), ($246,70,weapon,10001,94583,...,fraud)
• Learners: 1. Decision trees 2. Rules 3. Naive Bayes ...
• Classifier: Transaction → {fraud, nonfraud}
• Test data: ($99.99,pharmacy,10013,10027,...,?), ($1.00,gas,10040,00234,...,?)
• Classifier output (class labels): nonfraud, fraud

3

4
Distributed Data Mining
• Data is inherently distributed across the network. Many credit card authorization servers are distributed, and data are collected at each individual site. Other examples include supermarket customer and transaction databases, hotel reservations, travel agencies, and so on.
• In some situations, data cannot even be shared. Many different banks have their own data servers; they would rather share models, and cannot share data for reasons such as privacy, legal constraints, and competition.

5
Cost-sensitive Problems
• Charity donation: solicit people who will donate large amounts. It costs $0.68 to send a letter. A(x): donation amount. Only solicit if A(x) > $0.68; otherwise we lose money.
• Credit card fraud detection: detect frauds with high transaction amounts. It costs $90 to challenge a potential fraud. A(x): fraudulent transaction amount. Only challenge if A(x) > $90; otherwise we lose money.

6
Different Learning Frameworks

7
Fully Distributed Framework (training)
K sites hold data subsets D1, D2, ..., Dk. Each site runs a learner (ML1, ML2, ..., MLt) on its local data, generating K models C1, C2, ..., Ck.
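A minimal Python sketch of the training side, under loud assumptions: records are hypothetical (category, label) pairs, the K sites are simulated by randomly splitting one list into disjoint subsets, and each site's learner is a one-level tree on a single categorical attribute, standing in for C4.5. What it shows is only the data flow: each model is built from its own site's data and nothing else.

```python
import random
from collections import Counter, defaultdict


def partition(records, k, seed=0):
    """Simulate K sites: randomly split records into k disjoint subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def train_site_model(site_records):
    """Hypothetical stand-in for ML_i: a one-level tree on one categorical attribute.

    Each leaf (attribute value) keeps the class-label counts of the local
    examples that reach it; no other site's data is consulted.
    """
    leaves = defaultdict(Counter)
    for category, label in site_records:
        leaves[category][label] += 1
    return leaves


def train_distributed(records, k, seed=0):
    """Train k models C_1..C_k, one per simulated site."""
    return [train_site_model(site) for site in partition(records, k, seed)]
```

For example, `train_distributed([("retail", "nonfraud"), ("weapon", "fraud")] * 3, 3)` returns three independent models, one per simulated site.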

8
Fully Distributed Framework (predicting)
The test set D is sent to the k models C1, C2, ..., Ck, which compute k predictions P1, P2, ..., Pk; these are combined into one prediction P.

9
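Cost-sensitive Decision Making
• Assume that \(b[\ell_i, \ell_j]\) records the benefit received by predicting an example of class \(\ell_i\) to be an instance of class \(\ell_j\).
• The expected benefit received to predict an example \(x\) to be an instance of class \(\ell_j\) (regardless of its true label) is \(e(\ell_j|x) = \sum_i p(\ell_i|x)\, b[\ell_i, \ell_j]\).
• The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., \(\ell_{\max} = \arg\max_{\ell_j} e(\ell_j|x)\).
• When \(b[\ell_i, \ell_i] = 1\) and \(b[\ell_i, \ell_j] = 0\) for \(i \neq j\), the problem is a traditional accuracy-based problem.
• Total benefits: \(L = \sum_{x} b[\ell(x), \ell_{\max}(x)]\), where \(\ell(x)\) is the true label of test example \(x\).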
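The expected-benefit policy, e(l_j|x) = sum over i of p(l_i|x) · b[l_i, l_j] followed by an argmax over l_j, can be sketched in Python as below. This is an illustration, not the authors' code: the benefit matrix is a dict keyed by (true_label, predicted_label), so `benefit[(l_i, l_j)]` plays the role of b[l_i, l_j], and the probabilities and the $900 transaction are hypothetical. With a 0/1 matrix the policy reduces to choosing the most probable label.

```python
def expected_benefit(probs, benefit, predicted):
    """e(l_j|x) = sum_i p(l_i|x) * b[l_i, l_j], with l_j = predicted."""
    return sum(p * benefit[(true_label, predicted)] for true_label, p in probs.items())


def optimal_label(probs, benefit, labels):
    """Choose the label with the highest expected benefit."""
    return max(labels, key=lambda lj: expected_benefit(probs, benefit, lj))


def total_benefit(pairs, benefit):
    """Total benefit over (true_label, chosen_label) pairs of a test set."""
    return sum(benefit[(true_label, chosen)] for true_label, chosen in pairs)


labels = ["fraud", "nonfraud"]
# 0/1 matrix: the traditional accuracy-based problem
zero_one = {(a, b): 1 if a == b else 0 for a in labels for b in labels}
# Hypothetical $900 transaction with a $90 challenge cost
cost_sensitive = {
    ("fraud", "fraud"): 900 - 90,
    ("nonfraud", "fraud"): -90,
    ("fraud", "nonfraud"): 0,
    ("nonfraud", "nonfraud"): 0,
}
probs = {"fraud": 0.3, "nonfraud": 0.7}
print(optimal_label(probs, zero_one, labels))        # nonfraud
print(optimal_label(probs, cost_sensitive, labels))  # fraud
```

The same probabilities yield different decisions: under the 0/1 matrix the more likely label wins, while under the cost-sensitive matrix the expected benefit of challenging (about $180) beats not challenging ($0).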

10
Charity Donation Example
• It costs $0.68 to send a solicitation.
• Assume that \(y(x)\) is the best estimate of the donation amount.
• The cost-sensitive decision making will solicit an individual \(x\) if and only if \(p(\text{donate}|x) \cdot y(x) > 0.68\).

11
Credit Card Fraud Detection Example
• It costs $90 to challenge a potential fraud.
• Assume that \(y(x)\) is the transaction amount.
• The cost-sensitive decision making policy will predict a transaction \(x\) to be fraudulent if and only if \(p(\text{fraud}|x) \cdot y(x) > 90\).
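Both the charity and the fraud policies reduce to one test: act (solicit, or challenge) if and only if p(positive|x) · y(x) exceeds the fixed cost ($0.68 or $90). The Python sketch below states that test and the expected-benefit form it comes from, under an assumed benefit matrix in which acting on a positive earns y(x) minus the cost, acting on a negative loses the cost, and not acting earns nothing. Function names and figures are hypothetical.

```python
def should_act(p_positive, amount, cost):
    """Solicit / challenge iff p(positive|x) * y(x) > cost."""
    return p_positive * amount > cost


def expected_gain_of_acting(p_positive, amount, cost):
    """Expected benefit of acting minus that of not acting (which is 0).

    Assumed benefits: positive -> amount - cost, negative -> -cost.
    Algebraically this equals p_positive * amount - cost.
    """
    return p_positive * (amount - cost) + (1 - p_positive) * (-cost)


# Charity: $0.68 per letter, estimated donation y(x) = $20
print(should_act(0.05, 20.0, 0.68))   # True  (expected donation $1.00)
print(should_act(0.01, 20.0, 0.68))   # False (expected donation $0.20)
# Fraud: $90 per challenge
print(should_act(0.5, 300.0, 90.0))   # True  (expected recovery $150)
print(should_act(1.0, 80.0, 90.0))    # False (amount below the challenge cost)
```

The last call shows a consequence of the rule: since p(fraud|x) ≤ 1, a transaction smaller than $90 is never challenged, however confident the model is.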

12
Adult Dataset
• Downloaded from the UCI repository.
• Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
• The decision to predict positive is \(2 \cdot p(+|x) > p(-|x)\), i.e., \(p(+|x) > 1/3\).
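Assuming the two benefit factors mean b[+,+] = 2, b[-,-] = 1 and zero benefit for misclassifications (the slide gives only the factors), the rule 2 · p(+|x) > p(-|x) becomes a probability threshold of 1/3 instead of the accuracy-based 1/2. A small Python sketch with hypothetical parameter names:

```python
def predict_positive(p_pos, b_pos=2.0, b_neg=1.0):
    """Predict positive iff b_pos * p(+|x) > b_neg * p(-|x).

    Assumes misclassifications earn zero benefit.
    """
    return b_pos * p_pos > b_neg * (1.0 - p_pos)


def threshold(b_pos=2.0, b_neg=1.0):
    """Equivalent cut-off on p(+|x): b_neg / (b_pos + b_neg)."""
    return b_neg / (b_pos + b_neg)


print(threshold())            # 0.333... for factors 2 and 1
print(predict_positive(0.4))  # True, although p(+|x) < 0.5
print(predict_positive(0.3))  # False
```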

13
Calculating probabilities
• For decision trees, if \(n\) is the number of examples in a node and \(k\) is the number of examples with class label \(\ell\), then the probability is \(p(\ell|x) = k/n\). More sophisticated methods: smoothing, early stopping, and early stopping plus smoothing.
• For rules, the probability is calculated in the same way as for decision trees.
• For naive Bayes, if \(s(\ell|x)\) is the score for class label \(\ell\), then \(p(\ell|x) = s(\ell|x) / \sum_{\ell'} s(\ell'|x)\). Binning can also be used.
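The estimates can be sketched in Python as follows. The slide names smoothing without fixing a formula; the Laplace correction (k + 1)/(n + C), with C the number of classes, is used here purely as an assumed example, and binning is not shown. Leaf counts and naive Bayes scores are passed in as plain dictionaries with hypothetical values.

```python
from collections import Counter


def leaf_probability(counts, label):
    """Decision tree / rule estimate: p(l|x) = k / n for the node x reaches."""
    n = sum(counts.values())
    return counts[label] / n


def laplace_probability(counts, label, num_classes):
    """One possible smoothing (assumed): (k + 1) / (n + C)."""
    n = sum(counts.values())
    return (counts[label] + 1) / (n + num_classes)


def naive_bayes_probability(scores, label):
    """Normalize naive Bayes scores s(l|x) so they sum to 1 over labels."""
    return scores[label] / sum(scores.values())


pure_leaf = Counter({"nonfraud": 4})                # a small leaf with no frauds
print(leaf_probability(pure_leaf, "fraud"))         # 0.0
print(laplace_probability(pure_leaf, "fraud", 2))   # 0.1666...
```

The pure leaf shows why smoothing matters for cost-sensitive decisions: a raw estimate of exactly 0 would rule out challenging any transaction in that leaf, whatever its amount.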

14

15
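Combining Technique: Averaging
• Each model \(C_j\) computes an expected benefit \(e_j(\ell|x)\) for example \(x\) over every class label \(\ell\).
• The individual expected benefits are combined by averaging: \(E(\ell|x) = \frac{1}{k}\sum_{j=1}^{k} e_j(\ell|x)\).
• We choose the label with the highest combined expected benefit, \(\arg\max_{\ell} E(\ell|x)\).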
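A Python sketch of the averaging combiner, assuming each of the k models reports p(l|x) as a dict and all models share one benefit matrix keyed by (true_label, predicted_label); the transaction amount and per-model probabilities are hypothetical. It averages expected benefits rather than votes, so one confident model can outweigh several lukewarm ones.

```python
def combined_expected_benefit(per_model_probs, benefit, predicted):
    """E(l|x) = (1/k) * sum_j e_j(l|x), with e_j(l|x) = sum_i p_j(l_i|x) b[l_i, l]."""
    total = 0.0
    for probs in per_model_probs:
        total += sum(p * benefit[(true_label, predicted)] for true_label, p in probs.items())
    return total / len(per_model_probs)


def averaging_predict(per_model_probs, benefit, labels):
    """Choose the label with the highest combined expected benefit."""
    return max(labels, key=lambda l: combined_expected_benefit(per_model_probs, benefit, l))


labels = ["fraud", "nonfraud"]
# Hypothetical $300 transaction with a $90 challenge cost
benefit = {
    ("fraud", "fraud"): 300 - 90,
    ("nonfraud", "fraud"): -90,
    ("fraud", "nonfraud"): 0,
    ("nonfraud", "nonfraud"): 0,
}
# Fraud probabilities from three site models C_1, C_2, C_3
per_model = [{"fraud": p, "nonfraud": 1 - p} for p in (0.1, 0.2, 0.8)]
print(averaging_predict(per_model, benefit, labels))  # fraud
```

Here C_1 and C_2 alone would not challenge (expected benefits of about -$60 and -$30), but C_3's +$150 lifts the average to about +$20, above the $0 of not challenging.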

16
Why is accuracy higher?
1. Decision threshold line.
2. Examples on the left are more profitable than those on the right.
3. "Evening effect": bias towards big fish.

17
Partially Distributed Combining Techniques
• Regression: treat the base classifiers' outputs as the independent variables of a regression and the true label as the dependent variable.
• Modified meta-learning: learn a classifier that maps the base classifiers' class label predictions to the true class label. For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
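The regression combiner can be sketched in pure Python. The slide does not fix the regression form; ordinary least squares with an intercept, solved through the normal equations, is an assumption made here. Each row holds the base classifiers' outputs (for example, their probability estimates) for one validation example, and the target is the true label coded as 0 or 1. The data are hypothetical.

```python
def fit_linear_combiner(base_outputs, truth):
    """Least-squares weights w, with w[0] the intercept, such that
    truth ~= w[0] + sum_j w[j] * output_j."""
    X = [[1.0] + list(row) for row in base_outputs]
    m = len(X[0])
    # Normal equations (X^T X) w = X^T y
    A = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    b = [sum(r[i] * y for r, y in zip(X, truth)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, m))) / A[i][i]
    return w


def combine(w, outputs):
    """Combined score for one example from its base classifiers' outputs."""
    return w[0] + sum(wj * o for wj, o in zip(w[1:], outputs))


# Hypothetical validation set: probability outputs of two base classifiers
rows = [(0.9, 0.8), (0.2, 0.4), (0.7, 0.3), (0.1, 0.2), (0.6, 0.9)]
truth = [1, 0, 1, 0, 1]
w = fit_linear_combiner(rows, truth)
print(w, combine(w, (0.8, 0.7)))
```

A linear fit is not guaranteed to stay within [0, 1], so treating the combined score as a probability in the benefit-based rule needs care.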

18
Communication Overhead Summary

19
Experiments
• Decision tree learner: C4.5 version 8
• Datasets: Donation, Credit Card, Adult

20
Accuracy comparison

21
Accuracy comparison

22

23
Detailed Spread

24
Credit Card Fraud Dataset

25
Adult Dataset

26
Why is accuracy higher?

27
Summary and Future Work
• Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
• Averaging, although simple, has the highest accuracy.
• Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
• Future work: ensemble pruning and performance estimation.


