Data Dependence in Combining Classifiers
Mohamed Kamel, PAMI Lab, University of Waterloo (MCS 2003)


Outline
Introduction
Data Dependence
Implicit Dependence
Explicit Dependence
Feature Based Architecture
Training Algorithm
Results
Conclusions

Introduction
Pattern Recognition Systems
- Best possible classification rates
- Increase efficiency and accuracy
Multiple Classifier Systems
- Empirical observation: "Patterns misclassified by different classifiers are not necessarily the same" [Kittler et al., 98]
- Problem decomposed naturally from using various sensors
- Avoid making commitments to arbitrary initial conditions or parameters

Categorization of MCS
- Architecture
- Input/Output Mapping
- Representation
- Specialized Classifiers

Categorization of MCS (cont'd)
Architecture
- Parallel [Dasarathy, 94]: diagram of N classifiers operating side by side on their inputs, with a fusion stage producing the final output
- Serial [Dasarathy, 94]: diagram of N classifiers applied in sequence, the last stage producing the output

Categorization of MCS (cont'd)
Input/Output Mapping
Linear Mapping
- Sum Rule
- Weighted Average [Hashem 97]
Non-linear Mapping
- Maximum
- Majority
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Stacked Generalization [Wolpert 92]

Categorization of MCS (cont'd)
Representation
Similar representations
- Classifiers need to be different
Different representations
- Use of different sensors
- Different features extracted from the same data set

Categorization of MCS (cont'd)
Specialized Classifiers
Specialized classifiers
- Encourage specialization in areas of the feature space
- All classifiers must contribute to achieve a final decision
- Examples: Hierarchical Mixture of Experts [Jordan and Jacobs 94], Co-operative Modular Neural Networks [Auda and Kamel 98]
Ensemble of classifiers
- Set of redundant classifiers

Categorization of MCS (cont'd)
Data Dependence
- Classifiers are inherently dependent on the data
- Describes how the final aggregation uses the information present in the input pattern
- Describes the relationship between the final output Q(x) and the pattern under classification x

Data Dependence
- Data Independent
- Implicitly Dependent
- Explicitly Dependent

Data Independence
- Rely solely on the output of the classifiers to determine the final classification output
- Q(x) is the final class assigned to pattern x
- C_j is a vector composed of the outputs of the various classifiers in the ensemble {c_1j, c_2j, ..., c_Nj} for a given class y_j
- c_ij is the confidence classifier i has in pattern x belonging to class y_j
- The mapping F_j can be linear or non-linear
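The combining rule itself appeared as an equation image on the slide and did not survive extraction; a plausible reconstruction, consistent with the definitions above, is

$$Q(x) = \arg\max_{y_j} F_j\big(C_j(x)\big), \qquad C_j(x) = \big[c_{1j}(x), c_{2j}(x), \ldots, c_{Nj}(x)\big]^{T},$$

i.e. the final decision is a function of the classifier outputs only, never of the pattern x directly.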

Data Independence (cont'd)
Example: Average Vote
- The aggregation result relies only on the output confidences of the classifiers
- The operator F_j is the summation operation
- The result is skewed if individual confidences contain bias
- Aggregation has no means of correcting this bias

Data Independence (cont'd)
Simple voting techniques are data independent
- Average
- Maximum
- Majority
Susceptible to incorrect estimates of the confidence
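As a concrete illustration (not from the slides), here is a minimal sketch of these data independent rules, assuming each classifier reports a vector of per-class confidences; function and variable names are illustrative:

```python
import numpy as np

def fuse_independent(confidences, rule="average"):
    """Data independent fusion of an ensemble.

    confidences: array of shape (N_classifiers, N_classes), where
                 confidences[i, j] is classifier i's confidence in class y_j.
    Returns the index of the winning class.
    """
    c = np.asarray(confidences, dtype=float)
    if rule == "average":                        # sum/average rule
        scores = c.mean(axis=0)
    elif rule == "maximum":                      # maximum rule
        scores = c.max(axis=0)
    elif rule == "majority":                     # majority vote on crisp decisions
        votes = c.argmax(axis=1)
        scores = np.bincount(votes, minlength=c.shape[1])
    else:
        raise ValueError(rule)
    return int(scores.argmax())

# Example: three classifiers, four classes; a biased classifier can skew the average,
# and the combiner has no way to correct that bias.
ens = [[0.1, 0.6, 0.2, 0.1],
       [0.2, 0.5, 0.2, 0.1],
       [0.7, 0.1, 0.1, 0.1]]
print(fuse_independent(ens, "average"), fuse_independent(ens, "majority"))
```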

Implicit Data Dependence
- Train the combiner on the global performance of the data
- W(C(x)) is the weighting matrix composed of elements w_ij
- w_ij is the weight assigned to class j in classifier i
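The corresponding formula was again an image; a plausible reconstruction from the definitions above is

$$Q(x) = \arg\max_{y_j} F_j\big(W(C(x))\, C_j(x)\big),$$

where the weights depend on the global behavior of the classifier outputs but not on the input pattern x itself.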

Implicit Data Dependence (cont'd)
Example: Weighted Average
- The individual weights are assigned based on the error correlation matrix
- The weights depend on the behavior of the classifiers amongst themselves
- The weights can be represented as the function W(C_j(x))

Implicit Data Dependence (cont'd)
Example: Weighted Average
- The mapping F_j is the summation operator
- Hence the weighted average fits into the representation
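The slide does not show the exact derivation used in [Hashem 97], so the following is only one standard way to obtain weighted-average weights from an error correlation matrix (minimizing the combined mean squared error subject to the weights summing to one):

```python
import numpy as np

def correlation_weights(errors):
    """Weights for a weighted-average combiner from the error correlation matrix.

    errors: array of shape (N_samples, N_classifiers), each classifier's error
            on a validation set. This is a common formulation, used here as an
            illustration rather than the slide's own derivation.
    """
    errors = np.asarray(errors, dtype=float)
    sigma = errors.T @ errors / len(errors)       # error correlation matrix
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)              # Sigma^{-1} 1
    return w / w.sum()                            # normalize by 1^T Sigma^{-1} 1

# Example: classifier 0 has smaller errors, so it receives a larger weight.
rng = np.random.default_rng(0)
e = rng.normal(size=(200, 3)) * np.array([0.1, 0.3, 0.3])
print(correlation_weights(e))
```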

Implicit Data Dependence (cont'd)
Implicitly data dependent approaches include
- Weighted average [Hashem 97]
- Fuzzy measures [Gader 96]
- Belief theory [Xu and Krzyzak, 92]
- Behavior Knowledge Space (BKS) [Huang, 95]
- Decision Templates [Kuncheva 01]
- Modular approaches [Auda and Kamel, 98]
- Stacked Generalization [Wolpert 92]
- Boosting [Schapire, 90]
Lacks consideration for the local superiority of classifiers

Explicit Data Dependence
- Classifier selection or combining is performed based on the sub-space to which the input pattern belongs
- The final classification is dependent on the pattern being classified
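A plausible reconstruction of the missing formula for this case is

$$Q(x) = \arg\max_{y_j} F_j\big(W(x)\, C_j(x)\big),$$

with the weights W(x) determined directly from the input pattern.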

Explicit Data Dependence (cont'd)
Example: Dynamic Classifier Selection (DCS)
- Estimate the accuracy of each classifier in local regions of the feature space
- The estimate is determined by observing the input pattern
- Once the locally superior classifier is identified, its output is used as the final decision
- i.e. binary weights are assigned based on the local superiority of the classifiers
- Since the weights depend on the input feature space, they can be represented as W(x)
- DCS can therefore be considered explicitly data dependent, with the mapping F_j being the maximum operator
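A minimal sketch of selection by local accuracy, in the spirit of DCS_LA [Woods et al., 97]; the interfaces (objects with a scikit-learn-style `.predict` method) and parameter names are assumptions, not the authors' implementation:

```python
import numpy as np

def dcs_local_accuracy(x, classifiers, X_val, y_val, k=10):
    """Pick the classifier with the best accuracy in the local region around x.

    classifiers: list of fitted models exposing .predict(X) -> labels.
    X_val, y_val: validation data used to estimate local accuracy.
    """
    x = np.asarray(x, dtype=float)
    X_val, y_val = np.asarray(X_val, dtype=float), np.asarray(y_val)

    # The k nearest validation patterns to x define the local region.
    dist = np.linalg.norm(X_val - x, axis=1)
    neigh = np.argsort(dist)[:k]

    # Local accuracy acts as a binary weight W(x): 1 for the winner, 0 otherwise.
    local_acc = [np.mean(clf.predict(X_val[neigh]) == y_val[neigh])
                 for clf in classifiers]
    best = int(np.argmax(local_acc))

    # The locally superior classifier's output is used as the final decision.
    return classifiers[best].predict(x.reshape(1, -1))[0]
```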

Explicit Data Dependence (cont'd)
Explicitly data dependent approaches include
- Dynamic Classifier Selection (DCS)
  - DCS with Local Accuracy (DCS_LA) [Woods et al., 97]
  - DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01]
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Feature-based approach [Wanas et al., 99]
The weights demonstrate dependence on the input pattern
Intuitively expected to perform better than other methods

Feature Based Architectures
A methodology to incorporate multiple classifiers in a dynamically adapting system
Aggregation adapts to the behavior of the ensemble
- Detectors generate weights for each classifier that reflect the degree of confidence in each classifier for a given input
- A trained aggregation learns to combine the different decisions

Feature Based Architectures (cont'd)
Architecture I (block diagram: the input is fed both to the classifiers and to the detectors, and a trained aggregation layer combines their outputs)

Feature Based Architectures (cont'd)
Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x
- Utilizes sub-optimal classifiers
- The collection of classifier outputs for class y_j is represented as C_j(x)
Detector
- Detector D_l is a classifier that uses the input features to extract information useful for aggregation
- It does not aim to solve the classification problem
- The detector output d_lg(x) is a probability that the input pattern x is categorized to group g
- The output of all the detectors is represented by D(x)
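A minimal sketch of Architecture I under the assumption that classifier confidences C(x) and detector outputs D(x) are concatenated and fed to a trained aggregation model; the class name, the use of scikit-learn's MLPClassifier as the aggregator, and the `predict_proba` interfaces are all illustrative choices, not fixed by the slides:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

class FeatureBasedAggregation:
    """Architecture I (sketch): detectors look at the input features and a
    trained aggregation combines classifier outputs with detector outputs."""

    def __init__(self, classifiers, detectors):
        self.classifiers = classifiers   # each assumed to expose predict_proba(X) -> (n, n_classes)
        self.detectors = detectors       # each assumed to expose predict_proba(X) -> (n, n_groups)
        self.aggregator = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000)

    def _meta_features(self, X):
        C = np.hstack([c.predict_proba(X) for c in self.classifiers])  # C(x)
        D = np.hstack([d.predict_proba(X) for d in self.detectors])    # D(x)
        return np.hstack([C, D])

    def fit(self, X, y):
        # The aggregation is trained to adapt to the behavior of the modules.
        self.aggregator.fit(self._meta_features(X), y)
        return self

    def predict(self, X):
        return self.aggregator.predict(self._meta_features(X))
```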

Feature Based Architectures (cont'd)
Aggregation
- Fusion layer for all the classifiers
- Trained to adapt to the behavior of the various modules
- Explicitly data dependent: the weights depend on the input pattern being classified

Feature Based Architectures (cont'd)
Architecture II (block diagram: the detector also receives the classifier outputs, so the weights depend on both the input and the ensemble's decisions)

Feature Based Architectures (cont'd)
Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x
- Utilizes sub-optimal classifiers
- The collection of classifier outputs for class y_j is represented as C_j(x)
Detector
- Appends the input to the output of the classifier ensemble
- Produces a weighting factor, w_ij, for each class in a classifier output
- The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x))

Feature Based Architectures (cont'd)
Aggregation
- Fusion layer for all the classifiers
- Trained to adapt to the behavior of the various modules
- Combines implicit and explicit data dependence: the weights depend on the input pattern and on the performance of the classifiers
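Putting the two sources of dependence together, a plausible reconstruction of the combining rule for Architecture II is

$$Q(x) = \arg\max_{y_j} F_j\big(W(x, C_j(x))\, C_j(x)\big).$$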

Results
- Five one-hidden-layer BP classifiers
- Training used partially disjoint data sets
- No optimization is performed for the trained networks
- The parameters of all the networks are maintained for all the classifiers that are trained
Three data sets
- 20 Class Gaussian
- Satimages
- Clouds data

Results (cont'd)
Comparison across the 20 Class, Clouds, and Satimages data sets (most numeric values did not survive the transcript; surviving figures are shown in the original column order):
- Baselines: Singlenet (... 1.33), Oracle (7.29 ... 0.36)
- Data independent approaches: Maximum (... 0.21), Majority (... 0.16), Average (... 0.22), Borda (... 0.20)
- Implicitly data dependent approaches: Weighted Average (... 0.21), Bayesian (... 0.16), Fuzzy Integral (... 0.19)
- Explicitly data dependent: Feature-based (8.64 ... 0.19)

Training
Training each component independently
- Optimizing individual components may not lead to overall improvement
- Collinearity: high correlation between classifiers
- Components may be under-trained or over-trained

Training (cont'd)
Adaptive training
- Selective: reduces correlation between components
- Focused: re-training focuses on misclassified patterns
- Efficient: determines the duration of training

Adaptive Training: Main Loop
- Increase diversity among the ensemble
- Incremental learning
- Evaluation of training to determine the re-training set

Adaptive Training: Training
- Save a classifier if it performs well on the evaluation set
- Determine when to terminate training for each module
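A brief sketch of this per-module checkpointing and termination step; `module`, `train_step`, and `evaluate` are placeholders for the ensemble member and its training/evaluation routines, which the slides do not specify:

```python
import copy

def train_with_checkpointing(module, train_step, evaluate, max_epochs=200, patience=10):
    """Keep the best snapshot on the evaluation set; stop when it no longer improves."""
    best_score = -float("inf")
    best_state = copy.deepcopy(module)
    since_best = 0
    for epoch in range(max_epochs):
        train_step(module)                   # one incremental training pass
        score = evaluate(module)             # performance on the evaluation set
        if score > best_score:               # save classifier if it performs well
            best_score, best_state, since_best = score, copy.deepcopy(module), 0
        else:
            since_best += 1
        if since_best >= patience:           # terminate training for this module
            break
    return best_state, best_score
```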

Adaptive Training: Evaluation
- Train the aggregation modules
- Evaluate the training sets for each classifier
- Compose new training data

Adaptive Training: Data Selection
New training data are composed by concatenating (see the sketch below)
- Error_i: misclassified entries of the training data for classifier i
- Correct_i: a random choice of R*(P*δ_i) correctly classified entries of the training data for classifier i
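A minimal sketch of this data-selection step. The slide only gives the expression R*(P*δ_i); treating P as the training-set size, δ_i as classifier i's error rate, R as a sampling-rate constant, and rounding up are assumptions made here for illustration:

```python
import numpy as np

def compose_retraining_set(X, y, y_pred_i, R=0.5, seed=0):
    """Re-training set for classifier i: all misclassified patterns (Error_i)
    plus ceil(R * (P * delta_i)) randomly chosen correct ones (Correct_i).
    The meanings of R, P, delta_i and the rounding are assumptions."""
    X, y, y_pred_i = np.asarray(X), np.asarray(y), np.asarray(y_pred_i)
    wrong = np.flatnonzero(y_pred_i != y)                  # Error_i
    right = np.flatnonzero(y_pred_i == y)
    P, delta_i = len(y), len(wrong) / len(y)               # assumed definitions
    n_correct = int(np.ceil(R * (P * delta_i)))
    rng = np.random.default_rng(seed)
    keep = rng.choice(right, size=min(n_correct, len(right)), replace=False)  # Correct_i
    idx = np.concatenate([wrong, keep])
    return X[idx], y[idx]
```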

Results
- Five one-hidden-layer BP classifiers
- Training used partially disjoint data sets
- No optimization is performed for the trained networks
- The parameters of all the networks are maintained for all the classifiers that are trained
Three data sets
- 20 Class Gaussian
- Satimages
- Clouds data

Results (cont'd)
Comparison across the 20 Class, Clouds, and Satimages data sets (most numeric values did not survive the transcript; surviving figures are shown in the original column order):
- Singlenet (... 1.33)
- Normal training: Best Classifier (... 0.43), Oracle (7.29 ... 0.36), Feature Based (8.64 ... 0.19)
- Ensemble trained adaptively using WA as the evaluation function: Best Classifier (... 1.03), Oracle (6.79 ... 0.17), Feature Based (8.62 ... 0.12)
- Feature Based architecture trained adaptively: Best Classifier (... 0.87), Oracle (5.42 ... 0.18), Feature Based (8.01 ... 0.14)

Conclusions
Categorization of various combining approaches based on data dependence
- Data independent: vulnerable to incorrect confidence estimates
- Implicitly dependent: does not take into account the local superiority of classifiers
- Explicitly dependent: the literature focuses on selection rather than combining

Conclusions (cont'd)
Feature-based approach
- Combines implicit and explicit data dependence
- Uses an evolving training algorithm to enhance diversity amongst classifiers
  - Reduces harmful correlation
  - Determines the duration of training
- Improved classification accuracy

References
[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. PAMI, 20(3), 1998.
[Dasarathy, 94] B. Dasarathy, "Decision Fusion", IEEE Computer Society Press, 1994.
[Hashem 97] S. Hashem, "Algorithms for Optimal Linear Combination of Neural Networks", Int. Conf. on Neural Networks, Vol. 1, 1997.
[Jordan and Jacobs 94] M. Jordan and R. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm", Neural Computation, 1994.
[Wolpert 92] D. Wolpert, "Stacked Generalization", Neural Networks, Vol. 5, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, "Modular Neural Network Classifiers: A Comparative Study", J. Intelligent and Robotic Systems, Vol. 21, 117–129, 1998.
[Gader 96] P. Gader, M. Mohamed, and J. Keller, "Fusion of Handwritten Word Classifiers", Pattern Recognition Letters, 17(6), 577–584, 1996.
[Xu and Krzyzak, 92] L. Xu, A. Krzyzak, and C. Suen, "Methods of Combining Multiple Classifiers and their Applications to Handwriting Recognition", IEEE Trans. Systems, Man and Cybernetics, 22(3), 1992.
[Kuncheva 01] L. Kuncheva, J. Bezdek, and R. Duin, "Decision Templates for Multiple Classifier Fusion: An Experimental Comparison", Pattern Recognition, Vol. 34, 299–314, 2001.
[Huang, 95] Y. Huang, K. Liu, and C. Suen, "The Combination of Multiple Classifiers by a Neural Network Approach", Int. J. Pattern Recognition and Artificial Intelligence, Vol. 9, 579–597, 1995.
[Schapire, 90] R. Schapire, "The Strength of Weak Learnability", Machine Learning, Vol. 5, 197–227, 1990.
[Woods et al., 97] K. Woods, W. P. Kegelmeyer, and K. Bowyer, "Combination of Multiple Classifiers Using Local Accuracy Estimates", IEEE Trans. PAMI, 19(4), 1997.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, "Dynamic Classifier Selection Based on Multiple Classifier Behaviour", Pattern Recognition, Vol. 34, 2001.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature Based Decision Aggregation in Modular Neural Network Classifiers", Pattern Recognition Letters, 20(11–13), 1999.