
Published by Alejandro Sanders. Modified over 5 years ago

1
Study on Ensemble Learning By Feng Zhou

2
Contents
– Introduction
– A Statistical View of the M3 Network
– Future Work

3
Introduction
Ensemble learning:
– Combines a group of classifiers rather than designing a new one.
– The decisions of multiple hypotheses are combined to produce more accurate results.
Problems in traditional learning algorithms:
– Statistical problem
– Computational problem
– Representation problem
Related work:
– Resampling techniques: Bagging, Boosting
– Approaches for extending binary classifiers to multi-class problems: One-vs-One, One-vs-All
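To make "combining the decisions of multiple hypotheses" concrete, here is a minimal sketch of One-vs-One combination by majority voting. It assumes a hypothetical callable `pairwise_decide(i, j, x)` that returns the winning class of the two; the nearest-centroid toy classifier and the `CENTROIDS` values are invented for illustration only.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise_decide, x):
    """Combine K(K-1)/2 pair-wise classifiers by majority voting.

    pairwise_decide(i, j, x) is a hypothetical callable returning i or j.
    """
    votes = Counter()
    for i, j in combinations(classes, 2):
        votes[pairwise_decide(i, j, x)] += 1
    # most_common(1) breaks ties in the order classes first received a vote
    return votes.most_common(1)[0][0]

# Hypothetical 1-D toy problem: each class is represented by a centroid,
# and each pair-wise "classifier" picks whichever of its two centroids is closer.
CENTROIDS = {0: 0.0, 1: 5.0, 2: 10.0}

def nearest_of_pair(i, j, x):
    return i if abs(x - CENTROIDS[i]) <= abs(x - CENTROIDS[j]) else j

print(one_vs_one_predict([0, 1, 2], nearest_of_pair, 4.0))  # 1
```

At x = 4.0 class 1 wins two of the three pair-wise contests, so it gets the majority. M3 keeps the pair-wise decomposition but replaces voting with the MIN and MAX processes.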

4
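Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)
Steps:
– Dividing the training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
– Training the pair-wise classifiers
– Integrating the outcomes (Zhao, IJCNN 2005)
Example of the MIN and MAX processes (outputs of a 4 × 4 grid of pair-wise modules):
0.1 0.5 0.7 0.2 → MIN 0.1
0.4 0.3 0.5 0.6 → MIN 0.3
0.8 0.5 0.4 0.2 → MIN 0.2
0.5 0.9 0.7 0.3 → MIN 0.3
MAX process: max(0.1, 0.3, 0.2, 0.3) = 0.3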
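The integration step can be sketched as follows, using the example values from this slide. The layout `outputs[i][j]` (row i = positive subset, column j = negative subset) is an assumption consistent with the example: the MIN is taken along each row, then the MAX across the row minima.

```python
def min_max_combine(outputs):
    """M3-style integration: MIN over each row of pair-wise outputs, then MAX across rows.

    outputs[i][j] is assumed to be the output of the module trained on
    positive subset i versus negative subset j (values in [0, 1]).
    """
    row_mins = [min(row) for row in outputs]  # MIN process
    return max(row_mins), row_mins            # MAX process

grid = [
    [0.1, 0.5, 0.7, 0.2],
    [0.4, 0.3, 0.5, 0.6],
    [0.8, 0.5, 0.4, 0.2],
    [0.5, 0.9, 0.7, 0.3],
]
score, mins = min_max_combine(grid)
print(mins, score)  # [0.1, 0.3, 0.2, 0.3] 0.3
```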

5
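A Statistical View
Assumption:
– Each pair-wise classifier outputs a probabilistic value.
– Sigmoid function (J. C. Platt, ALMC 1999): $P(y = 1 \mid f) = \frac{1}{1 + \exp(Af + B)}$, where $f$ is the raw classifier output and $A$, $B$ are fitted parameters.
– Bayesian decision theory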
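A minimal sketch of fitting the sigmoid parameters A and B on classifier scores, followed by the Bayes decision for two classes under 0-1 loss (pick w+ when its posterior is at least 1/2). This is a simplification, not Platt's exact procedure: his paper uses a more robust optimizer and smoothed targets ((N+ + 1)/(N+ + 2) and 1/(N- + 2)); here plain gradient descent on raw 0/1 targets is assumed, and the toy scores are invented.

```python
import math

def platt_probability(f, A, B):
    """P(y = 1 | f) = 1 / (1 + exp(A*f + B)), computed without overflow."""
    z = A * f + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(scores, targets, lr=0.1, steps=2000):
    """Fit A, B by plain gradient descent on the mean cross-entropy.

    For z = A*f + B the per-sample gradient is d(loss)/dz = t - p.
    """
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        residuals = [t - platt_probability(f, A, B) for f, t in zip(scores, targets)]
        grad_A = sum(r * f for r, f in zip(residuals, scores)) / n
        grad_B = sum(residuals) / n
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B

# Hypothetical classifier scores with their true labels (1 = w+, 0 = w-).
A, B = fit_platt([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
p = platt_probability(1.5, A, B)
decision = "w+" if p >= 0.5 else "w-"  # Bayes rule with equal costs
```

With this parameterization A comes out negative, so larger scores map to larger posteriors.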

6
A Simple Discrete Example
[Table: P(w|x) for the classes w+ and w- at the inputs x1, x2, x3, x4; legible entries: 1/2, 2/5, 1/5]

7
A Simple Discrete Example (II)
– Classifier 0 ($w^+ : w^-$): $P_{c_0}(w^+ \mid x = x_2) = 1/3$
– Classifier 1 ($w^+ : w_1^-$): $P_{c_1}(w^+ \mid x = x_2) = 1/2$
– Classifier 2 ($w^+ : w_2^-$): $P_{c_2}(w^+ \mid x = x_2) = 1/2$
Hence $P_{c_0} < \min(P_{c_1}, P_{c_2})$.
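The numbers on this slide can be reproduced with exact fractions. The sketch assumes that at x = x2 the classes w+, w1- and w2- carry equal probability mass. This is one choice consistent with the slide, since P_c1 = P_c2 = 1/2 forces each negative mass to equal the positive one. The helper `posterior_vs` is hypothetical.

```python
from fractions import Fraction

def posterior_vs(pos_mass, neg_masses):
    """P(w+ | x) for a classifier that sees w+ against the union of neg_masses."""
    return pos_mass / (pos_mass + sum(neg_masses))

# Hypothetical masses at x = x2: w+, w1- and w2- are equally likely there.
p_pos, p_neg1, p_neg2 = Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)

P_c1 = posterior_vs(p_pos, [p_neg1])          # classifier 1: w+ vs w1-
P_c2 = posterior_vs(p_pos, [p_neg2])          # classifier 2: w+ vs w2-
P_c0 = posterior_vs(p_pos, [p_neg1, p_neg2])  # classifier 0: w+ vs w- (w1- and w2- merged)
print(P_c0, P_c1, P_c2)  # 1/3 1/2 1/2
```

Merging the negative sub-classes adds their masses to the denominator, which is why the global posterior falls below both pair-wise ones.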

8
A More Complicated Example
As more classifiers are considered, the evidence that x belongs to $w^+$ shrinks: $P_{\text{global}}(w^+) < \min P_{\text{partial}}(w^+)$.
– The classifier reporting the minimum value carries the most information about $w^-$ (minimization principle).
– If $P_{\text{partial}}(w^+) = 1$, that classifier contains no information about $w^-$.
[Figure: Classifier 1 ($w^+ : w_1^-$), Classifier 2 ($w^+ : w_2^-$), …; the information about $w^-$ increases as classifiers are added]
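The shrinking evidence can be illustrated under an idealizing assumption: every partial classifier j outputs the exact posterior P_j = a / (a + n_j), where a is the mass of w+ and n_j the mass of the negative sub-class w_j-. Then n_j / a = 1/P_j - 1, so 1/P_global = 1 + sum_j (1/P_j - 1). The helper `combine_partial` and the list of partial outputs are hypothetical.

```python
from fractions import Fraction

def combine_partial(partials):
    """Global P(w+|x) from exact pair-wise posteriors P_j = a / (a + n_j).

    Uses 1 / P_global = 1 + sum_j (1 / P_j - 1); requires every P_j > 0.
    """
    inv = 1 + sum(1 / p - 1 for p in partials)
    return 1 / inv

# Hypothetical partial outputs, added one classifier at a time.
partials = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 1), Fraction(2, 3)]
for k in range(1, len(partials) + 1):
    print(k, combine_partial(partials[:k]))
# 1 1/2
# 2 1/3
# 3 1/3
# 4 2/7
```

The global value never increases as classifiers are added, and the third classifier (output 1) leaves it unchanged, matching the observation that such a classifier carries no information about w-.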

9
Analysis
– For each classifier $c_{ij}$
– For each sub-positive class $w_i^+$
– For the positive class $w^+$
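One way to make these three levels concrete, as a sketch under an idealizing assumption (not necessarily the author's derivation): each classifier $c_{ij}$ outputs the exact posterior restricted to its two classes, with $a_i = P(w_i^+ \mid x)$ and $n_j = P(w_j^- \mid x)$.

```latex
\begin{aligned}
P_{ij} &= \frac{a_i}{a_i + n_j}
  &&\text{(classifier } c_{ij}:\ w_i^+ \text{ vs. } w_j^-) \\
P_i &= \frac{a_i}{a_i + \sum_j n_j}, \qquad
  \frac{1}{P_i} - 1 = \sum_j \Big(\frac{1}{P_{ij}} - 1\Big)
  \;\Rightarrow\; P_i \le \min_j P_{ij}
  &&\text{(sub-positive class } w_i^+) \\
P(w^+ \mid x) &= \frac{\sum_i a_i}{\sum_i a_i + \sum_j n_j}
  \;\ge\; \max_i P_i
  &&\text{(positive class } w^+)
\end{aligned}
```

The last inequality holds because $a/(a + n)$ increases with $a$ and $\sum_i a_i \ge a_i$. The MIN and MAX processes follow this pattern: a minimum over the negative sub-classes, then a maximum over the positive sub-classes.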

10
Analysis (II)
– Decomposition of a complex problem
– Restoration to the original resolution

11
Composition of Training Sets
[Figure: grid pairing the positive sets ($w^+$, $w_1^+$, …, $w_{n^+}^+$) with the negative sets ($w^-$, $w_1^-$, …, $w_{n^-}^-$); each cell is marked "Have been used", "Trivial set, useless", or "Not used yet"]
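The sub-problem grid ($w_i^+$ against $w_j^-$) can be built as in the sketch below. It assumes a random split into near-equal subsets. That is only one possible choice; the cited works study other ways of dividing the training sets, such as equal clustering (Wen & Lu, ICONIP 2005). The function names are hypothetical.

```python
import random
from itertools import product

def split_into_subsets(samples, n_subsets, seed=0):
    """Randomly split samples into n_subsets chunks of (nearly) equal size."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k::n_subsets] for k in range(n_subsets)]

def pairwise_training_sets(positives, negatives, n_pos, n_neg, seed=0):
    """Build the n_pos * n_neg sub-problems (w_i^+ vs. w_j^-), keyed by (i, j)."""
    pos_parts = split_into_subsets(positives, n_pos, seed)
    neg_parts = split_into_subsets(negatives, n_neg, seed + 1)
    return {
        (i, j): (pos_parts[i], neg_parts[j])
        for i, j in product(range(n_pos), range(n_neg))
    }

# Hypothetical data: 10 positive and 15 negative sample ids.
tasks = pairwise_training_sets(list(range(10)), list(range(100, 115)), n_pos=2, n_neg=3)
print(len(tasks))  # 6
```

Each of the six sub-problems trains one pair-wise module, whose outputs are then integrated by the MIN and MAX processes.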

12
Another Way of Combination
[Figure: the grid of positive sets ($w^+$, $w_1^+$, …, $w_{n^+}^+$) against negative sets ($w^-$, $w_1^-$, …, $w_{n^-}^-$)]
Training and testing time:

13
Experiments – Synthetic Data

14
Experiments – Text Categorization (20 Newsgroups corpus)
Experimental setup:
– Removing words: stemming, stop words, words with frequency < 30
– Using Naïve Bayes as the elementary classifier
– Estimating the probability with a sigmoid function
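A minimal sketch of an elementary classifier of this kind: a two-class multinomial Naïve Bayes with Laplace smoothing whose log-odds score is passed through a sigmoid. The toy documents, the smoothing constant `alpha`, and the default sigmoid parameters are assumptions. With A = -1 and B = 0 the sigmoid returns the raw Naïve Bayes posterior; in practice A and B would be fitted on held-out scores.

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes for two classes (labels 0 and 1) with Laplace smoothing."""
    vocab = {w for doc in docs for w in doc}
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    n_docs = Counter(labels)
    log_prior = {y: math.log(n_docs[y] / len(docs)) for y in (0, 1)}
    log_lik = {}
    for y in (0, 1):
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {w: math.log((counts[y][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def nb_score(doc, model):
    """Log-odds log P(1|doc) - log P(0|doc); words outside the vocabulary are skipped."""
    log_prior, log_lik = model
    score = log_prior[1] - log_prior[0]
    for w in doc:
        if w in log_lik[1]:
            score += log_lik[1][w] - log_lik[0][w]
    return score

def sigmoid(score, A=-1.0, B=0.0):
    """Platt-style mapping 1 / (1 + exp(A*score + B)), computed without overflow."""
    z = A * score + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

docs = [["goal", "match", "team"], ["match", "score", "goal"],
        ["vote", "election", "party"], ["party", "policy", "vote"]]
labels = [1, 1, 0, 0]  # hypothetical toy data: 1 = sports, 0 = politics
model = train_nb(docs, labels)
p = sigmoid(nb_score(["goal", "team"], model))
print(round(p, 4))  # 0.8571
```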

15
Future Work
Situation with consideration of noise:
– The nature of the problem: to estimate the underlying distribution
– Independent parameters for the model:
– Constraints we get:
– To obtain the best estimate: Kullback-Leibler distance (T. Hastie, Ann. Statist. 1998)
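For reference, here is a sketch of the iterative pairwise-coupling procedure of Hastie & Tibshirani (1998). It moves the class probabilities p toward values whose implied pair-wise ratios mu_ij = p_i / (p_i + p_j) minimize a Kullback-Leibler criterion against the observed r_ij. Equal weights n_ij = 1 are assumed. This shows the cited technique, not the author's noise model, which the slide does not give.

```python
def pairwise_coupling(r, max_sweeps=1000, tol=1e-12):
    """Couple pair-wise estimates into class probabilities (equal weights n_ij = 1).

    r[i][j] estimates P(class i | class i or j, x), with r[j][i] = 1 - r[i][j]
    and 0 < r[i][j] < 1 for i != j; the diagonal is ignored.
    """
    k = len(r)
    p = [1.0 / k] * k
    for _ in range(max_sweeps):
        previous = p[:]
        for i in range(k):
            others = [j for j in range(k) if j != i]
            num = sum(r[i][j] for j in others)
            den = sum(p[i] / (p[i] + p[j]) for j in others)  # sum of current mu_ij
            p[i] *= num / den
            total = sum(p)
            p = [v / total for v in p]  # renormalize after each update
        if max(abs(a - b) for a, b in zip(p, previous)) < tol:
            break
    return p

# Hypothetical check: pair-wise estimates that are exactly consistent with p_true.
p_true = [0.5, 0.3, 0.2]
r = [[0.0 if i == j else p_true[i] / (p_true[i] + p_true[j]) for j in range(3)]
     for i in range(3)]
p_hat = pairwise_coupling(r)
```

When the r_ij are mutually consistent, the procedure recovers the generating probabilities. With noisy r_ij it returns the best compromise under the criterion.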

16
References
[1] T. Hastie & R. Tibshirani, Classification by pairwise coupling, Ann. Statist., 1998.
[2] J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, ALMC, 1999.
[3] B. Lu et al., Task decomposition and module combination based on class relations: a modular neural network for pattern classification, IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, Equal Clustering Makes Min-Max Modular Support Vector Machines More Efficient, ICONIP 2005.
[5] H. Zhao & B. Lu, On efficient selection of binary classifiers for min-max modular classifier, IJCNN 2005.
[6] K. Chen & B. Lu, Efficient classification of multi-label and imbalanced data using min-max modular classifiers, IJCNN 2006.
