# Study on Ensemble Learning By Feng Zhou. Content Introduction A Statistical View of M3 Network Future Works.


Study on Ensemble Learning By Feng Zhou

Content

- Introduction
- A Statistical View of M3 Network
- Future Works

Introduction

- Ensemble learning:
  - To combine a group of classifiers rather than to design a new one.
  - The decisions of multiple hypotheses are combined to produce more accurate results.
- Problems in traditional learning algorithms:
  - Statistical problem
  - Computational problem
  - Representation problem
- Related works:
  - Resampling techniques: Bagging, Boosting
  - Approaches for extending to multi-class problems: One-vs-One, One-vs-All

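Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)

Steps:

- Dividing training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
- Training pair-wise classifiers
- Integrating the outcomes (Zhao, IJCNN 2005)

Example of the min and max processes:

| Pair-wise classifier outputs | Min process |
|---|---|
| 0.1, 0.5, 0.7, 0.2 | 0.1 |
| 0.4, 0.3, 0.5, 0.6 | 0.3 |
| 0.8, 0.5, 0.4, 0.2 | 0.2 |
| 0.5, 0.9, 0.7, 0.3 | 0.3 |

Max process over the four minima: 0.3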
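The integration step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `min_max_combine` is hypothetical, and grouping the sixteen slide values into four rows of four is an assumption (it is the grouping under which each row's minimum matches the values 0.1, 0.3, 0.2, 0.3 listed on the slide).

```python
from typing import List, Sequence


def min_max_combine(outputs: Sequence[Sequence[float]]) -> float:
    """Min within each row of pair-wise outputs, then max across the rows.

    Each row is assumed to hold the outputs of the modules that share one
    positive subset; this layout is an assumption made for illustration.
    """
    if not outputs or any(len(row) == 0 for row in outputs):
        raise ValueError("outputs must be a non-empty list of non-empty rows")
    row_minima: List[float] = [min(row) for row in outputs]  # min process
    return max(row_minima)                                   # max process


# Values taken from the slide, grouped into four rows (assumed layout).
slide_outputs = [
    [0.1, 0.5, 0.7, 0.2],
    [0.4, 0.3, 0.5, 0.6],
    [0.8, 0.5, 0.4, 0.2],
    [0.5, 0.9, 0.7, 0.3],
]
slide_minima = [min(row) for row in slide_outputs]
slide_result = min_max_combine(slide_outputs)
```

Here `slide_minima` holds the result of the min process for each row and `slide_result` is the final output of the max process.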

A Statistical View

- Assumption: the pair-wise classifier outputs a probabilistic value.
  - Sigmoid function (J. C. Platt, ALMC 1999)
- Bayesian decision theory
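The sigmoid mapping cited here is commonly written as P(y = 1 | f) = 1 / (1 + exp(A f + B)), where f is the raw classifier score. The sketch below assumes that form. The parameter values used in the example are hypothetical; in Platt's method A and B are fitted on held-out data, which this sketch does not do.

```python
import math


def platt_sigmoid(score: float, a: float, b: float) -> float:
    """Map a raw classifier score to a probability: 1 / (1 + exp(a*score + b)).

    Written in two branches so that math.exp never receives a large
    positive argument.
    """
    z = a * score + b
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Hypothetical parameters: a negative `a` makes larger scores more probable.
A_DEMO, B_DEMO = -2.0, 0.0
prob_at_boundary = platt_sigmoid(0.0, A_DEMO, B_DEMO)
prob_positive_side = platt_sigmoid(1.5, A_DEMO, B_DEMO)
prob_negative_side = platt_sigmoid(-1.5, A_DEMO, B_DEMO)
```

With these illustrative parameters a score of zero maps to 0.5, and scores on either side of the boundary map symmetrically around it.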

A Simple Discrete Example

[Table: $P(w \mid x)$ for classes $w^+$ and $w^-$ at points $x_1, x_2, x_3, x_4$; legible entries: 1/2, 2/5, 1/5]

A Simple Discrete Example (II)

- Classifier 0 ($w^+ : w^-$): $P_{c0}(w^+ \mid x = x_2) = 1/3$
- Classifier 1 ($w^+ : w_1^-$): $P_{c1}(w^+ \mid x = x_2) = 1/2$
- Classifier 2 ($w^+ : w_2^-$): $P_{c2}(w^+ \mid x = x_2) = 1/2$

$P_{c0} < \min(P_{c1}, P_{c2})$

A More Complicated Example

Classifier 1 ($w^+ : w_1^-$), Classifier 2 ($w^+ : w_2^-$), ...: information about $w^-$ is increasing.

- Each time one more classifier is considered, the evidence that $x$ belongs to $w^+$ shrinks.
- $P_{\text{global}}(w^+) < \min(P_{\text{partial}}(w^+))$
- The classifier reporting the minimum value contains the most information about $w^-$ (minimization principle).
- If $P_{\text{partial}}(w^+) = 1$, no information about $w^-$ is contained.
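The shrinking effect can be checked numerically. The sketch below assumes that the negative sub-classes are mutually exclusive and that every classifier reports the exact conditional probability. The "masses" stand for the unnormalised probability of each class at the point x; the unit masses are a hypothetical choice that reproduces the 1/2, 1/2 and 1/3 of the discrete example, and the second set of masses is purely illustrative.

```python
from fractions import Fraction
from typing import List, Sequence


def posterior_against(pos_mass: Fraction, neg_masses: Sequence[Fraction]) -> Fraction:
    """P(w+ | x) when w+ competes with the listed negative sub-classes only."""
    return pos_mass / (pos_mass + sum(neg_masses))


def shrinking_sequence(pos_mass: Fraction, neg_masses: Sequence[Fraction]) -> List[Fraction]:
    """Posterior of w+ as negative sub-classes are taken into account one by one."""
    return [posterior_against(pos_mass, neg_masses[:k])
            for k in range(1, len(neg_masses) + 1)]


# Hypothetical unit masses: reproduces P_c1 = P_c2 = 1/2 and P_c0 = 1/3.
pos = Fraction(1)
negs = [Fraction(1), Fraction(1)]
partial = [posterior_against(pos, [n]) for n in negs]   # pair-wise classifiers
global_posterior = posterior_against(pos, negs)          # w+ against all of w-

# A second, purely illustrative case with three unequal negative sub-classes.
negs_b = [Fraction(1), Fraction(2), Fraction(3)]
partial_b = [posterior_against(pos, [n]) for n in negs_b]
sequence_b = shrinking_sequence(pos, negs_b)
```

In both cases the global posterior falls below the smallest pair-wise value, and `sequence_b` decreases with each added negative sub-class.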

Analysis

- For each classifier $c_{ij}$
- For each sub-positive class $w_i^+$
- For positive class $w^+$
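One way the three levels could be written under the probabilistic assumption is sketched below. The notation is introduced here for illustration and is not taken from the slides: $a_i = P(w_i^+ \mid x)$, $b_j = P(w_j^- \mid x)$, the sub-classes are assumed mutually exclusive with $\sum_j b_j > 0$, and each classifier is assumed to report the exact conditional probability.

```latex
% Level 1: classifier c_{ij} separates w_i^+ from w_j^-
p_{ij} = \frac{a_i}{a_i + b_j}

% Level 2: sub-positive class w_i^+ against all of w^-
q_i = \frac{a_i}{a_i + \sum_{j=1}^{n^-} b_j}
    = \left( \sum_{j=1}^{n^-} \frac{1}{p_{ij}} - (n^- - 1) \right)^{-1}
    \le \min_j \, p_{ij}

% Level 3: positive class w^+ against w^-
P(w^+ \mid x) = \frac{\sum_{i=1}^{n^+} a_i}{\sum_{i=1}^{n^+} a_i + \sum_{j=1}^{n^-} b_j}
    = \frac{S}{1 + S},
\qquad S = \sum_{i=1}^{n^+} \frac{q_i}{1 - q_i},
\qquad P(w^+ \mid x) \ge \max_i \, q_i
```

Under these assumptions the min process acts as an upper bound on the exact second-level value and the max process as a lower bound on the exact third-level value.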

Analysis (II)

- Decomposition of a complex problem
- Restoration to the original resolution

Composition of Training Sets

[Diagram: $w^+$ is divided into $w_1^+, \dots, w_{n^+}^+$ and $w^-$ into $w_1^-, \dots, w_{n^-}^-$. Legend: "Have been used", "Trivial set, useless", "Not used yet"]

Another Way of Combination

[Diagram: $w^+$ is divided into $w_1^+, \dots, w_{n^+}^+$ and $w^-$ into $w_1^-, \dots, w_{n^-}^-$]

Training and testing time:

Experiments – Synthetic Data

Experiments – Text Categorization (20 Newsgroups corpus)

Experimental setup:

- Removing words: stemming, stop words, < 30
- Using Naïve Bayes as the elementary classifier
- Estimating the probability with a sigmoid function
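The word-removal step might look roughly like the sketch below. Several things here are assumptions rather than details from the slides: "< 30" is read as "words occurring fewer than 30 times in the corpus", the stop-word list is a tiny hypothetical one, tokenisation is plain whitespace splitting, and stemming is left out because it needs an external stemmer.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

# Hypothetical stop-word list, for illustration only.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "of", "and", "to", "in", "is"})


def build_vocabulary(docs: Iterable[str],
                     min_count: int = 30,
                     stop_words: FrozenSet[str] = STOP_WORDS) -> Set[str]:
    """Keep words seen at least `min_count` times that are not stop words."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {w for w, c in counts.items() if c >= min_count and w not in stop_words}


def filter_document(doc: str, vocabulary: Set[str]) -> List[str]:
    """Drop every token of `doc` that is not in the vocabulary."""
    return [tok for tok in doc.lower().split() if tok in vocabulary]


# Toy corpus: "cat" and "sat" occur 30 times, "dog" only once.
toy_docs = ["the cat sat"] * 30 + ["a dog"]
toy_vocab = build_vocabulary(toy_docs)
toy_filtered = filter_document("The cat and a dog", toy_vocab)
```

The filtered token lists would then be the input to the elementary Naïve Bayes classifier.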

Future Work

- Situation with consideration of noise
  - The virtue of the problem: to access the underlying distribution
  - Independent parameters for the model:
  - Constraints we get:
  - To obtain the best estimation.
- Kullback-Leibler distance (T. Hastie, Ann Statist 1998)
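For reference, the Kullback-Leibler distance between two discrete distributions p and q is the sum over outcomes of p_i log(p_i / q_i). The sketch below computes only this basic quantity. How it would enter the estimation problem described above is not specified on the slide, and the function name and example distributions are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler distance D(p || q) for discrete distributions.

    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0
    the distance is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions over two outcomes.
p_demo = [0.5, 0.5]
q_demo = [0.25, 0.75]
kl_same = kl_divergence(p_demo, p_demo)
kl_pq = kl_divergence(p_demo, q_demo)
kl_qp = kl_divergence(q_demo, p_demo)
```

The distance is zero only when the two distributions coincide, and it is not symmetric, so `kl_pq` and `kl_qp` differ.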

References

- [1] T. Hastie & R. Tibshirani, Classification by pairwise coupling, Ann. Statist., 1998.
- [2] J. C. Platt, Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods, ALMC, 1999.
- [3] B. Lu & …, Task decomposition and module combination based on class relations: a modular neural network for pattern classification, IEEE Trans. Neural Networks, 1999.
- [4] Y. M. Wen & B. Lu, Equal clustering makes min-max modular support vector machines more efficient, ICONIP, 2005.
- [5] H. Zhao & B. Lu, On efficient selection of binary classifiers for min-max modular classifier, IJCNN, 2005.
- [6] K. Chen & B. Lu, Efficient classification of multi-label and imbalanced data using min-max modular classifiers, IJCNN, 2006.
