
1 Mining Concept-Drifting Data Streams Using Ensemble Classifiers
Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han
Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 226-235, 2003
Reporter: 侯佩廷

2 Outline
– Introduction
– Concept Drift
– Data Expiration
– Ensemble Classifiers
– Instance Based Pruning
– Experiments
– Conclusion

3 Introduction
The problem of mining data streams:
– the tremendous amount of data is constantly evolving
– concept drift
Proposal: use a weighted classifier ensemble to address the problem.

4 Concept Drift
When the underlying concept is updated or changes over time, concept drift occurs.
Figure 1: Concept drift (data arriving from 5/11 to 5/20)

5 Data Expiration
The fundamental problem: how do we identify the data that is no longer useful?
A straightforward solution: discard the old data after a fixed time period T.
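A minimal sketch (not from the slides) of this fixed-period solution, assuming records are kept as (timestamp, x, c) tuples in arrival order; the function and parameter names are illustrative.

```python
from collections import deque

def expire_by_time(window: deque, now: float, T: float) -> deque:
    """Fixed-period expiration: drop every record that arrived more than
    T time units before `now`.  `window` holds (timestamp, x, c) records
    in arrival order, so the oldest record is always at the left end."""
    while window and now - window[0][0] > T:
        window.popleft()
    return window
```

Such a window keeps or drops data purely by age, which is exactly the weakness the next slides illustrate.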

6 Data Expiration
Figure 2: data distributions and optimum boundaries for data chunks S0, S1, S2 arriving over the intervals [t0, t1], [t1, t2], [t2, t3] (legend: optimum boundary vs. overfitting boundary; positive vs. negative examples)

7 Expiration
Figure 3: Which training dataset to use? Optimum boundaries learned from (a) S1 + S2, (b) S0 + S1 + S2, (c) S2 + S0

8 Data Expiration
Instead of discarding data using criteria based on their arrival time, we shall make decisions based on their class distribution.

9 Ensemble Classifiers
y: a test example
f_c(y): the probability of y being an instance of class c
The probability output of the ensemble (via averaging): \( f_c^{E}(y) = \frac{1}{k} \sum_{i=1}^{k} f_c^{i}(y) \), where \( f_c^{i}(y) \) is the probability output of the i-th classifier in the ensemble.

10 Example: a test example y is fed to Classifier 1, Classifier 2, and Classifier 3, whose probability outputs are 0.4, 0.6, and 0.8; averaging gives an ensemble output of 0.6.
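A minimal sketch of this averaging step, assuming each classifier is exposed as a function returning the probability of y belonging to class c; the interface and names are illustrative, not the paper's.

```python
def ensemble_probability(classifiers, y, c):
    """Average the per-classifier probabilities that y is an instance of class c."""
    probs = [f(y, c) for f in classifiers]
    return sum(probs) / len(probs)

# The slide's example: three classifiers output 0.4, 0.6 and 0.8 for y,
# so the (unweighted) ensemble output is their average, 0.6.
clfs = [lambda y, c: 0.4, lambda y, c: 0.6, lambda y, c: 0.8]
print(ensemble_probability(clfs, y=None, c=1))   # -> 0.6
```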

11 Ensemble Classifiers
Figure: data chunks S1, S2, S3, …, S10 arrive at times t1, t2, t3, t4, …, ti, ti+1, and a classifier Ci is built on each chunk Si. Gk denotes a single classifier trained on the most recent k chunks, and Ek denotes the ensemble of the k classifiers built on the most recent k chunks (e.g., G9 and E9).

12 Ensemble Classifiers
Sn consists of records in the form (x, c), where c is the true label of the record.
Ci's classification error on example (x, c) is \( 1 - f_c^{i}(x) \), where \( f_c^{i}(x) \) is the probability Ci assigns to x being an instance of class c.
Mean square error of classifier Ci: \( MSE_i = \frac{1}{|S_n|} \sum_{(x,c) \in S_n} \left(1 - f_c^{i}(x)\right)^2 \)
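A sketch of the mean square error above, assuming `predict_proba(x)` returns a mapping from class label to probability (an illustrative interface, not one fixed by the paper).

```python
def mse_i(classifier, chunk):
    """MSE_i = (1/|S_n|) * sum over (x, c) in S_n of (1 - f_c^i(x))^2."""
    total = 0.0
    for x, c in chunk:
        f_c = classifier.predict_proba(x)[c]   # probability assigned to the true class c
        total += (1.0 - f_c) ** 2
    return total / len(chunk)
```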

13 Ensemble Classifiers
A classifier that predicts randomly will have mean square error \( MSE_r = \sum_c p(c) \left(1 - p(c)\right)^2 \).
Ex: a classifier that assigns Class 1 or Class 2 at random with p = 0.5 has \( MSE_r = 0.5 \times 0.5^2 + 0.5 \times 0.5^2 = 0.25 \).

14 Ensemble Classifiers
We discard classifiers whose error is equal to or larger than MSE_r.
Weight w_i for classifier Ci: \( w_i = MSE_r - MSE_i \)
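A sketch of this error-based weighting scheme, reusing the `mse_i` helper above; `mse_random` estimates MSE_r from the class distribution of the most recent chunk. The interfaces are illustrative.

```python
from collections import Counter

def mse_random(chunk):
    """MSE_r = sum over classes c of p(c) * (1 - p(c))^2, where p(c) is
    estimated from the class distribution of the chunk."""
    counts = Counter(c for _, c in chunk)
    n = len(chunk)
    return sum((m / n) * (1.0 - m / n) ** 2 for m in counts.values())

def classifier_weights(classifiers, chunk):
    """Weight each classifier by w_i = MSE_r - MSE_i on the most recent chunk,
    discarding classifiers that do no better than random guessing."""
    mse_r = mse_random(chunk)
    weights = {}
    for clf in classifiers:
        w = mse_r - mse_i(clf, chunk)
        if w > 0:                       # MSE_i >= MSE_r  ->  discard the classifier
            weights[clf] = w
    return weights
```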

15 Ensemble Classifiers
For cost-sensitive applications such as credit card fraud detection, weights are based on the benefits achieved on the most recent chunk rather than on errors. Benefit matrix, where t(x) is the transaction amount and cost is the overhead of investigating a transaction:

                     Predict fraud    Predict not fraud
  Actual fraud       t(x) - cost      0
  Actual not fraud   -cost            0

Example: Actual fraud: 900, -900; Actual not fraud: -900.
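A hedged sketch of how a classifier's total benefit on a chunk could be computed under this matrix; the (x, t_x, is_fraud) record layout, the "fraud" probability key, and the decision rule "investigate when p(fraud | x) * t(x) > cost" are assumptions of this sketch rather than text from the slides.

```python
def total_benefit(classifier, chunk, cost):
    """Total benefit earned on a chunk of (x, t_x, is_fraud) records:
    catching a fraud earns t(x) - cost, investigating a legitimate
    transaction loses cost, and not investigating earns 0."""
    benefit = 0.0
    for x, t_x, is_fraud in chunk:
        p_fraud = classifier.predict_proba(x)["fraud"]
        if p_fraud * t_x > cost:          # investigate only when the expected gain is positive
            benefit += (t_x - cost) if is_fraud else -cost
    return benefit
```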

16 Instance Based Pruning
Goal:
– Use the first k classifiers with the highest weights to reach the same decision as when all K classifiers are used.

17 Instance Based Pruning
The classifiers are arranged in a pipeline ordered by weight, from the highest (C1) to the lowest (CK), and consulted in that order. The pipeline procedure stops when:
– a confident prediction can be made, or
– there are no more classifiers in the pipeline.

18 Instance Based Pruning
After consulting the first k classifiers, we derive the current weighted probability: \( F_k(x) = \frac{\sum_{i=1}^{k} w_i f^{i}(x)}{\sum_{i=1}^{k} w_i} \)

19 Instance Based Pruning
Let \( \varepsilon_k(x) = F_k(x) - F_K(x) \) be the error at stage k. We compute the mean and the variance of \( \varepsilon_k(x) \).
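A hedged sketch of the pruning pipeline built from the pieces above: classifiers are consulted in order of decreasing weight, the running weighted probability F_k(x) is updated, and evaluation stops once F_k(x), corrected by the training-time statistics of ε_k(x), is confidently on one side of the decision threshold. The single mean/std per stage, the confidence multiplier t = 3, the 0.5 threshold, and the "positive" class key are simplifying assumptions, not the paper's exact procedure.

```python
def predict_with_pruning(classifiers, weights, x, err_mean, err_std,
                         t=3.0, threshold=0.5):
    """Consult classifiers from highest to lowest weight; stop as soon as the
    running weighted probability is confidently above or below the threshold.

    err_mean[k] and err_std[k] are the mean and standard deviation of
    eps_k(x) = F_k(x) - F_K(x), estimated on training data for stage k.
    """
    ordered = sorted(classifiers, key=lambda clf: weights[clf], reverse=True)
    prob_sum, weight_sum, F_k = 0.0, 0.0, 0.0
    for k, clf in enumerate(ordered, start=1):
        w = weights[clf]
        prob_sum += w * clf.predict_proba(x)["positive"]
        weight_sum += w
        F_k = prob_sum / weight_sum                  # current weighted probability F_k(x)
        # The final output F_K(x) is roughly F_k(x) - mean(eps_k), give or take t std devs.
        low = F_k - err_mean[k] - t * err_std[k]
        high = F_k - err_mean[k] + t * err_std[k]
        if low > threshold or high < threshold:      # a confident prediction can be made
            break                                    # ...otherwise continue down the pipeline
    return F_k > threshold
```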

20 Experiments
Two kinds of data:
Synthetic data – synthetic data with drifting concepts generated on a moving hyperplane.
Credit card fraud data – one year of data containing 5 million transactions.
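A minimal sketch, in the spirit of the moving-hyperplane data above: points are drawn uniformly from [0, 1]^d, labeled by the side of the hyperplane Σ a_i x_i = a_0 with a_0 = ½ Σ a_i, and the weights a_i are perturbed between chunks to simulate concept drift. The dimensionality, drift magnitude, and chunk size are illustrative choices, not the paper's exact parameters.

```python
import random

def hyperplane_chunk(a, n):
    """Generate n points in [0, 1]^d labeled by the hyperplane with weights a."""
    a0 = 0.5 * sum(a)                              # threshold: half of the total weight mass
    chunk = []
    for _ in range(n):
        x = [random.random() for _ in a]
        label = 1 if sum(w * v for w, v in zip(a, x)) >= a0 else 0
        chunk.append((x, label))
    return chunk

def drift(a, magnitude=0.1):
    """Nudge each hyperplane weight to move the concept between chunks."""
    return [w + random.uniform(-magnitude, magnitude) for w in a]

a = [random.random() for _ in range(10)]           # a 10-dimensional hyperplane
stream = []
for _ in range(5):                                 # five chunks with a drifting concept
    stream.append(hyperplane_chunk(a, n=1000))
    a = drift(a)
```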

21 Experiments
Figure 4: Training Time, ChunkSize, and Error Rate

22 Experiments
Figure 5: Effects of Instance Based Pruning

23 Experiments
Figure 6: Average Error Rate of Single and Ensemble Decision Tree Classifiers

24 Experiments
Figure 7: Averaged Benefits using Single Classifiers and Classifier Ensembles
The benefits are averaged from multiple runs with different chunk sizes (3000 to 12000 transactions per chunk), averaging the benefits of E_K and G_K (K = 2, …, 8) for each fixed chunk size.

25 Conclusion
The problem of mining data streams:
– the tremendous amount of data is constantly evolving
– concept drift
Weighted ensemble classifiers are more efficient than single classifiers.

26 Q & A

27 THANKS FOR YOUR ATTENTION.

