Published by Alejandro Sanders. Modified over 2 years ago.

1
Study on Ensemble Learning By Feng Zhou

2
Content
– Introduction
– A Statistical View of M3 Network
– Future Work

3
Introduction
Ensemble learning:
– Combine a group of classifiers rather than design a new one.
– The decisions of multiple hypotheses are combined to produce more accurate results.
Problems in traditional learning algorithms:
– Statistical problem
– Computational problem
– Representation problem
Related works:
– Resampling techniques: Bagging, Boosting
– Approaches for extending to multi-class problems: One-vs-One, One-vs-All

4
Min-Max Modular (M3) Network (Lu, IEEE TNN 1999)
Steps:
– Dividing the training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
– Training pair-wise classifiers
– Integrating the outcomes (Zhao, IJCNN 2005)
  – Min process
  – Max process
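The integration step can be sketched as follows. This is a minimal illustration that assumes the usual min-max rule: a minimum over the classifiers that share one positive subset, followed by a maximum over the positive subsets. The function name `m3_combine` and the score values are hypothetical and do not come from the slides.

```python
from typing import Sequence


def m3_combine(pairwise: Sequence[Sequence[float]]) -> float:
    """Combine pair-wise outputs with a Min process followed by a Max process.

    pairwise[i][j] is the output of the classifier trained on positive
    subset i against negative subset j (hypothetical layout).
    """
    if not pairwise or any(len(row) == 0 for row in pairwise):
        raise ValueError("need at least one positive and one negative subset")
    # Min process: one value per positive subset, over all negative subsets.
    min_per_positive = [min(row) for row in pairwise]
    # Max process: one value for the whole positive class.
    return max(min_per_positive)


# Hypothetical outputs for 2 positive subsets and 3 negative subsets.
scores = [
    [0.9, 0.6, 0.8],
    [0.7, 0.95, 0.85],
]
combined = m3_combine(scores)
print(combined)  # 0.7
```

The row minima here are 0.6 and 0.7, so the Max process returns 0.7.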

5
A Statistical View
Assumption:
– The pair-wise classifier outputs a probabilistic value.
Sigmoid function (J. C. Platt, ALMC 1999)
Bayesian decision theory
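The slide's own formulas did not survive extraction. As a hedged illustration of the assumption, the sketch below uses the sigmoid form commonly attributed to Platt, P(w+ | f) = 1 / (1 + exp(A f + B)), to turn a raw classifier score f into a probability, and then applies the Bayes decision rule under equal misclassification costs. The parameter values `a = -2.0` and `b = 0.0` are hypothetical; in practice they would be fitted on held-out data.

```python
import math


def platt_probability(score: float, a: float, b: float) -> float:
    """Map a raw score to P(w+ | score) = 1 / (1 + exp(a * score + b))."""
    z = a * score + b
    # Two algebraically equal branches, chosen to avoid overflow in exp().
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def bayes_decide(p_positive: float) -> str:
    """Pick the class with the larger posterior (equal costs assumed)."""
    return "w+" if p_positive >= 0.5 else "w-"


# Hypothetical sigmoid parameters; a < 0 so larger scores favour w+.
a, b = -2.0, 0.0
for score in (-1.0, 0.0, 1.0):
    p = platt_probability(score, a, b)
    print(score, round(p, 4), bayes_decide(p))
```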

6
A Simple Discrete Example
[Table: $P(w \mid x)$ with columns $w^+$ and $w^-$ and rows $x_1$, $x_2$, $x_3$, $x_4$. Surviving entries, in extraction order: 1/2, 2/5, 1/5.]

7
A Simple Discrete Example (II)
Classifier 0 ($w^+ : w^-$): $P_{c_0}(w^+ \mid x = x_2) = 1/3$
Classifier 1 ($w^+ : w_1^-$): $P_{c_1}(w^+ \mid x = x_2) = 1/2$
Classifier 2 ($w^+ : w_2^-$): $P_{c_2}(w^+ \mid x = x_2) = 1/2$
$P_{c_0} < \min(P_{c_1}, P_{c_2})$
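These numbers can be checked with exact arithmetic. The sketch below assumes that, at $x = x_2$, the three sub-classes $w^+$, $w_1^-$ and $w_2^-$ carry equal probability mass. That assumption is chosen only because it reproduces the values on the slide, and the helper name `pairwise` is hypothetical.

```python
from fractions import Fraction


def pairwise(p_pos: Fraction, p_neg: Fraction) -> Fraction:
    """Posterior of w+ when only w+ and one negative group are considered."""
    return p_pos / (p_pos + p_neg)


# Assumed masses at x = x2 (picked to match the slide's numbers).
p_pos = Fraction(1, 3)
p_neg1 = Fraction(1, 3)
p_neg2 = Fraction(1, 3)

p_c1 = pairwise(p_pos, p_neg1)           # w+ : w1-
p_c2 = pairwise(p_pos, p_neg2)           # w+ : w2-
p_c0 = pairwise(p_pos, p_neg1 + p_neg2)  # w+ : w-

print(p_c0, p_c1, p_c2)  # 1/3 1/2 1/2

# The global posterior is strictly below every partial posterior.
assert p_c0 < min(p_c1, p_c2)

# With exact pair-wise posteriors, the global one can be recovered from
# 1/P_c0 = 1/P_c1 + 1/P_c2 - 1.
recovered = 1 / (1 / p_c1 + 1 / p_c2 - 1)
print(recovered)  # 1/3
```

The last identity follows from $1/P_{c_i} = 1 + P(w_i^-)/P(w^+)$. It holds only when the pair-wise outputs are exact posteriors, which is the assumption of this statistical view.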

8
A More Complicated Example
– Each time one more classifier is considered, the evidence that $x$ belongs to $w^+$ shrinks.
– $P_{\text{global}}(w^+) < \min(P_{\text{partial}}(w^+))$
– The classifier reporting the minimum value contains the most information about $w^-$ (minimization principle).
– If $P_{\text{partial}}(w^+) = 1$, that classifier contains no information about $w^-$.
Classifier 1 ($w^+ : w_1^-$), Classifier 2 ($w^+ : w_2^-$), ……: information about $w^-$ is increasing.
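The shrinking effect can be illustrated numerically. The sketch below uses hypothetical unnormalised masses for $w^+$ and three negative sub-classes, and assumes each classifier outputs an exact posterior. It prints the partial posteriors and the posterior obtained as negative sub-classes are taken into account one at a time.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Sequence


def running_posteriors(p_pos: Fraction, p_negs: Sequence[Fraction]) -> List[Fraction]:
    """Posterior of w+ against the first k negative sub-classes, k = 1..n."""
    return [p_pos / (p_pos + total) for total in accumulate(p_negs)]


# Hypothetical unnormalised masses at a fixed x.
p_pos = Fraction(2)
p_negs = [Fraction(1), Fraction(2), Fraction(1)]

# One partial posterior per pair-wise classifier (w+ : w_j-).
partial = [p_pos / (p_pos + q) for q in p_negs]
running = running_posteriors(p_pos, p_negs)

print([str(p) for p in partial])  # ['2/3', '1/2', '2/3']
print([str(p) for p in running])  # ['2/3', '2/5', '1/3']

# Evidence for w+ shrinks with every additional negative sub-class.
assert all(a > b for a, b in zip(running, running[1:]))
# The global value lies below the smallest partial value.
assert running[-1] < min(partial)
```

In this example the smallest partial value (1/2) comes from the negative sub-class with the largest mass, which matches the minimization principle stated on the slide.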

9
Analysis
– For each classifier $c_{ij}$
– For each sub-positive class $w_i^+$
– For the positive class $w^+$
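The formulas for these three levels were lost in extraction. The block below is a reconstruction sketch, not the slide's own derivation. It assumes each pair-wise classifier outputs an exact posterior and introduces the shorthand $p_i = P(w_i^+ \mid x)$ and $q_j = P(w_j^- \mid x)$, which does not appear in the original.

```latex
\begin{align*}
\text{Classifier } c_{ij}: \quad
  & P_{c_{ij}}(w^+ \mid x) = \frac{p_i}{p_i + q_j} \\
\text{Sub-positive class } w_i^+: \quad
  & P_i(w^+ \mid x) = \frac{p_i}{p_i + \sum_j q_j}
    \;\le\; \min_j P_{c_{ij}}(w^+ \mid x) \\
\text{Positive class } w^+: \quad
  & P(w^+ \mid x) = \frac{\sum_i p_i}{\sum_i p_i + \sum_j q_j}
    \;\ge\; \max_i P_i(w^+ \mid x)
\end{align*}
```

Both inequalities follow because $t / (t + Q)$ is increasing in $t$ and decreasing in $Q$ for non-negative $t$ and $Q$. Under these assumptions the Min process gives an upper bound at the sub-class level, and the Max process gives a lower bound at the class level.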

10
Analysis (II)
– Decomposition of a complex problem
– Restoration to the original resolution

11
Composition of Training Sets
[Figure: grid of training-set pairings. Rows and columns are each split into $w^+$ ($w_1^+, \ldots, w_{n^+}^+$) and $w^-$ ($w_1^-, \ldots, w_{n^-}^-$). Cell legend: "Have been used"; "Trivial set, useless"; "Not used yet".]

12
Another Way of Combination
[Figure: grid of training-set pairings. Rows and columns are each split into $w^+$ ($w_1^+, \ldots, w_{n^+}^+$) and $w^-$ ($w_1^-, \ldots, w_{n^-}^-$).]
Training and testing time:

13
Experiments – Synthetic Data

14
Experiments – Text Categorization (20 Newsgroups corpus)
Experiment setup:
– Removing words: stemming, stop words, < 30
– Using Naïve Bayes as the elementary classifier
– Estimating the probability with a sigmoid function

15
Future Work
Situation with consideration of noise:
– The virtue of the problem: To access the underlying distribution
– Independent parameters for the model:
– Constraints we get:
– To obtain the best estimation.
Kullback-Leibler Distance (T. Hastie, Ann Statist 1998)
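The model parameters and constraints on this slide were lost in extraction, so the following is only a sketch of the cited criterion, not the author's formulation. It assumes the weighted Kullback-Leibler objective used in pairwise coupling, where observed pair-wise estimates $r_{ij}$ are compared with model values $\mu_{ij} = p_i / (p_i + p_j)$. The function names and the example probabilities are hypothetical.

```python
import math
from typing import Optional, Sequence


def _kl_term(observed: float, model: float) -> float:
    """observed * log(observed / model), with the 0 * log(0) = 0 convention."""
    return 0.0 if observed == 0.0 else observed * math.log(observed / model)


def coupling_kl(
    p: Sequence[float],
    r: Sequence[Sequence[float]],
    n: Optional[Sequence[Sequence[float]]] = None,
) -> float:
    """Weighted KL distance between pair-wise estimates r and the model p.

    r[i][j] estimates P(class i | class i or j, x); n[i][j] is an optional
    weight for the pair (i, j), defaulting to 1.
    """
    total = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            mu = p[i] / (p[i] + p[j])
            weight = 1.0 if n is None else n[i][j]
            total += weight * (
                _kl_term(r[i][j], mu) + _kl_term(1.0 - r[i][j], 1.0 - mu)
            )
    return total


# Hypothetical class probabilities and the noise-free estimates they imply.
true_p = [0.5, 0.3, 0.2]
r = [[pi / (pi + pj) for pj in true_p] for pi in true_p]

print(coupling_kl(true_p, r))           # 0.0 for the consistent model
print(coupling_kl([1 / 3] * 3, r) > 0)  # True: a wrong model scores worse
```

With noisy $r_{ij}$ no choice of $p$ generally drives the distance to zero, and the best estimate is the $p$ that minimizes it.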

16
References
[1] T. Hastie & R. Tibshirani, Classification by pairwise coupling, Ann Statist 1998
[2] J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, ALMC 1999
[3] B. Lu &, Task decomposition and module combination based on class relations: a modular neural network for pattern classification, IEEE Trans. Neural Networks, 1999
[4] Y. M. Wen & B. Lu, Equal Clustering Makes Min-Max Modular Support Vector Machines More Efficient, ICONIP 2005
[5] H. Zhao & B. Lu, On efficient selection of binary classifiers for min-max modular classifier, IJCNN 2005
[6] K. Chen & B. Lu, Efficient classification of multi-label and imbalanced data using min-max modular classifiers, IJCNN 2006
