
1 Data Dependence in Combining Classifiers Mohamed Kamel PAMI Lab University of Waterloo

2 Outline
- Introduction
- Data Dependence: Implicit Dependence, Explicit Dependence
- Feature-Based Architecture
- Training Algorithm
- Results
- Conclusions

3 Introduction (MCS 2003 — Data Dependence in Combining Classifiers)
Pattern Recognition Systems
- Aim for the best possible classification rates.
- Increase efficiency and accuracy.
Multiple Classifier Systems
- Empirical observation: "Patterns misclassified by different classifiers are not necessarily the same" [Kittler et al., 98].
- The problem may decompose naturally from using various sensors.
- Avoid making commitments to arbitrary initial conditions or parameters.

4 Categorization of MCS
- Architecture
- Input/Output Mapping
- Representation
- Specialized classifiers

5 Categorization of MCS (cntd…)
Architecture
- Parallel [Dasarathy, 94]: Classifiers 1…N each receive their input and all outputs feed a common FUSION stage that produces the output.
- Serial [Dasarathy, 94]: Classifiers 1…N are chained, each passing its result on toward the final output.
(Slide shows block diagrams of both arrangements.)

6 Categorization of MCS (cntd…)
Input/Output Mapping
- Linear mapping: Sum Rule; Weighted Average [Hashem 97]
- Non-linear mapping: Maximum; Majority; Hierarchical Mixture of Experts [Jordan and Jacobs 94]; Stacked Generalization [Wolpert 92]

7 Categorization of MCS (cntd…)
Representation
- Similar representations: the classifiers themselves need to be different.
- Different representations: use of different sensors, or different features extracted from the same data set.

8 Categorization of MCS (cntd…)
Specialized classifiers
- Encourage specialization in areas of the feature space.
- All classifiers must contribute to achieve a final decision.
- Examples: Hierarchical Mixture of Experts [Jordan and Jacobs 94]; Co-operative Modular Neural Networks [Auda and Kamel 98].
Ensemble of classifiers
- A set of redundant classifiers.

9 Categorization of MCS (cntd…)
Data Dependence
- Classifiers are inherently dependent on the data.
- Describes how the final aggregation uses the information present in the input pattern.
- Describes the relationship between the final output Q(x) and the pattern under classification, x.

10 Data Dependence
- Data Independent
- Implicitly Dependent
- Explicitly Dependent

11 Data Independence
- Relies solely on the outputs of the classifiers to determine the final classification output.
- Q(x) is the final class assigned to pattern x.
- C_j is the vector of the outputs of the classifiers in the ensemble, {c_1j, c_2j, ..., c_Nj}, for a given class y_j.
- c_ij is the confidence classifier i has in pattern x belonging to class y_j.
- The mapping F_j can be linear or non-linear.

12 Data Independence (cntd…)
Example: Average Vote
- The aggregation result relies only on the output confidences of the classifiers.
- The operator F_j is the summation operation.
- The result is skewed if the individual confidences contain bias; the aggregation has no means of correcting this bias.

13 Data Independence (cntd…)
- Simple voting techniques are data independent: Average, Maximum, Majority.
- They are susceptible to incorrect estimates of the confidence.
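As a minimal sketch, the three data-independent rules operate only on the matrix of classifier confidences; the confidence values below are illustrative, not from the talk:

```python
import numpy as np

# Hypothetical ensemble output: row i, column j holds c_ij, the
# confidence classifier i has in pattern x belonging to class y_j.
C = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
])

def average_vote(C):
    """Average rule: pick the class with the highest mean confidence."""
    return int(np.argmax(C.mean(axis=0)))

def maximum_vote(C):
    """Maximum rule: pick the class holding the single highest confidence."""
    return int(np.argmax(C.max(axis=0)))

def majority_vote(C):
    """Majority rule: each classifier casts one vote for its top class."""
    votes = np.argmax(C, axis=1)
    return int(np.bincount(votes, minlength=C.shape[1]).argmax())
```

None of these rules sees the input pattern x itself, which is exactly why a biased confidence estimate cannot be corrected at aggregation time.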

14 Implicit Data Dependence
- Train the combiner on the global performance over the data.
- W(C(x)) is the weighting matrix composed of elements w_ij.
- w_ij is the weight assigned to class j in classifier i.

15 Implicit Data Dependence (cntd…)
Example: Weighted Average
- The individual weights are assigned based on the error correlation matrix.
- The weights depend on the behavior of the classifiers amongst themselves.
- The weights can be represented as the function W(C_j(x)).

16 Implicit Data Dependence (cntd…)
Example: Weighted Average
- The mapping F_j is the summation operator.
- Hence the weighted average fits this representation.
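A hedged sketch of the weighted average as an implicitly data-dependent rule. For simplicity, the weights here are normalized inverse-error weights computed from hypothetical validation error rates, i.e. only the diagonal of the error correlation matrix the talk refers to; the error values are illustrative:

```python
import numpy as np

# Hypothetical validation error rates for three classifiers.
errors = np.array([0.12, 0.20, 0.15])

# Simplified weighting: inverse-error weights, normalized to sum to 1.
# (The full method also uses inter-classifier error correlations.)
w = 1.0 / errors
w = w / w.sum()

def weighted_average_vote(C, w):
    """F_j is summation: pick argmax_j of sum_i w_i * c_ij."""
    return int(np.argmax(w @ C))
```

The weights are fixed after training, so they capture global classifier behavior but do not change with the pattern under classification.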

17 Implicit Data Dependence (cntd…)
Implicitly data dependent approaches include:
- Weighted average [Hashem 97]
- Fuzzy measures [Gader 96]
- Belief theory [Xu and Krzyzak, 92]
- Behavior Knowledge Space (BKS) [Huang, 95]
- Decision Templates [Kuncheva 01]
- Modular approaches [Auda and Kamel, 98]
- Stacked Generalization [Wolpert 92]
- Boosting [Schapire, 90]
These approaches lack consideration for the local superiority of classifiers.

18 Explicit Data Dependence
- Classifier selection or combining is performed based on the subspace to which the input pattern belongs.
- The final classification is dependent on the pattern being classified.

19 Explicit Data Dependence (cntd…)
Example: Dynamic Classifier Selection (DCS)
- Estimates the accuracy of each classifier in local regions of the feature space; the estimate is determined by observing the input pattern.
- Once the locally superior classifier is identified, its output is used as the final decision, i.e. binary weights are assigned based on the local superiority of the classifiers.
- Since the weights depend on the input feature space, they can be represented as W(x).
- DCS can therefore be considered explicitly data dependent, with the mapping F_j being the maximum operator.
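This selection step can be sketched as follows, under common assumptions: the local region is taken as the k validation points nearest to x, and each classifier is a callable returning predicted labels; all names and data shapes here are illustrative:

```python
import numpy as np

def dcs_la(x, X_val, y_val, classifiers, k=5):
    """Dynamic Classifier Selection via local accuracy (sketch).

    Estimate each classifier's accuracy on the k validation points
    nearest to x, then let the locally best classifier decide alone
    (binary weights: 1 for the winner, 0 for the rest).
    """
    # k nearest validation neighbours of x (Euclidean distance).
    d = np.linalg.norm(X_val - x, axis=1)
    nn = np.argsort(d)[:k]

    # Local accuracy of each classifier on that neighbourhood.
    local_acc = [np.mean(clf(X_val[nn]) == y_val[nn]) for clf in classifiers]

    # Selection = maximum operator over locally weighted classifiers.
    best = int(np.argmax(local_acc))
    return classifiers[best](x[None, :])[0]
```

Because the winning classifier changes with x, the effective weights are a function W(x) of the input, not a fixed matrix.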

20 Explicit Data Dependence (cntd…)
Explicitly data dependent approaches include:
- Dynamic Classifier Selection (DCS): DCS with Local Accuracy (DCS_LA) [Woods et al., 97]; DCS based on Multiple Classifier Behavior (DCS_MCB) [Giacinto and Roli, 01]
- Hierarchical Mixture of Experts [Jordan and Jacobs 94]
- Feature-based approach [Wanas et al., 99]
The weights demonstrate dependence on the input pattern, and are intuitively expected to perform better than the other methods.

21 Feature Based Architectures
- A methodology to incorporate multiple classifiers in a dynamically adapting system; the aggregation adapts to the behavior of the ensemble.
- Detectors generate weights for each classifier that reflect the degree of confidence in that classifier for a given input.
- A trained aggregation learns to combine the different decisions.

22 Feature Based Architectures (cntd…)
Architecture I (block diagram)

23 Feature Based Architectures (cntd…)
Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x; sub-optimal classifiers are utilized.
- The collection of classifier outputs for class y_j is represented as C_j(x).
Detector
- A detector D_l is a classifier that uses the input features to extract information useful for aggregation; it does not aim to solve the classification problem itself.
- The detector output d_lg(x) is the probability that the input pattern x is categorized into group g.
- The output of all the detectors is represented by D(x).

24 Feature Based Architectures (cntd…)
Aggregation
- A fusion layer for all the classifiers, trained to adapt to the behavior of the various modules.
- Explicitly data dependent: the weights depend on the input pattern being classified.

25 Feature Based Architectures (cntd…)
Architecture II (block diagram)

26 Feature Based Architectures (cntd…)
Classifiers
- Each individual classifier, C_i, produces some output representing its interpretation of the input x; sub-optimal classifiers are utilized.
- The collection of classifier outputs for class y_j is represented as C_j(x).
Detector
- Appends the input to the output of the classifier ensemble.
- Produces a weighting factor, w_ij, for each class in a classifier output.
- The dependence of the weights on both the classifier output and the input pattern is represented by W(x, C_j(x)).

27 Feature Based Architectures (cntd…)
Aggregation
- A fusion layer for all the classifiers, trained to adapt to the behavior of the various modules.
- Combines implicit and explicit data dependence: the weights depend on both the input pattern and the performance of the classifiers.
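The combined dependence can be sketched as below. The detector here is an untrained, illustrative stand-in (a real detector in this architecture is trained), but it shows weights W(x, C_j(x)) that depend on both the input pattern and the classifier outputs:

```python
import numpy as np

def feature_based_decision(x, C, detector):
    """Architecture II sketch: the detector sees both the input pattern x
    and the ensemble outputs C, and emits a weight w_ij per
    classifier/class, i.e. W(x, C_j(x)).
    Final decision: argmax_j of sum_i w_ij * c_ij (F_j is summation)."""
    W = detector(x, C)            # shape (n_classifiers, n_classes)
    scores = (W * C).sum(axis=0)
    return int(np.argmax(scores))

# Illustrative stand-in detector: trust confident classifiers more,
# modulated by a crude function of the input features.
def toy_detector(x, C):
    conf = C.max(axis=1, keepdims=True)        # per-classifier confidence
    gate = 1.0 / (1.0 + np.exp(-x.mean()))     # input-dependent factor
    W = gate * conf * np.ones_like(C)
    return W / W.sum(axis=0, keepdims=True)    # normalize over classifiers
```

Holding C fixed and varying x changes W, which is what distinguishes this combiner from the purely implicit weighted average.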

28 Results
Experimental setup:
- Five one-hidden-layer BP classifiers; training used partially disjoint data sets.
- No optimization is performed for the trained networks; the network parameters are kept the same for all the classifiers that are trained.
- Three data sets: 20 Class Gaussian, Satimages, Clouds.

29 Results (cntd…)
(mean ± std)

                            20 Class        Clouds          Satimages
  Singlenet                 13.82 ± 1.16    10.92 ± 0.08    14.06 ± 1.33
  Oracle                     7.29 ± 1.06     7.41 ± 0.16     7.20 ± 0.36
  Data Independent Approaches
  Maximum                   12.92 ± 0.35    10.68 ± 0.04    13.61 ± 0.21
  Majority                  13.13 ± 0.36    10.71 ± 0.02    13.40 ± 0.16
  Average                   12.83 ± 0.26    10.66 ± 0.04    13.23 ± 0.22
  Borda                     13.04 ± 0.30    10.71 ± 0.02    13.77 ± 0.20
  Implicitly Data Dependent Approaches
  Weighted Avg.             12.57 ± 0.20    10.59 ± 0.05    13.14 ± 0.21
  Bayesian                  12.48 ± 0.21    10.71 ± 0.02    13.51 ± 0.16
  Fuzzy Integral            12.95 ± 0.34    10.67 ± 0.05    13.71 ± 0.19
  Explicitly Data Dependent
  Feature-based              8.64 ± 0.60    10.28 ± 0.10    12.48 ± 0.19

30 Training
Training each component independently:
- Optimizing individual components may not lead to overall improvement.
- Collinearity: high correlation between classifiers.
- Components may be under-trained or over-trained.

31 Training (cntd…)
Adaptive training:
- Selective: reduces correlation between components.
- Focused: re-training focuses on misclassified patterns.
- Efficient: determines the duration of training.

32 Adaptive Training: Main loop
- Increase diversity among the ensemble.
- Incremental learning.
- Evaluation of training to determine the re-training set.

33 Adaptive Training: Training
- Save a classifier if it performs well on the evaluation set.
- Determine when to terminate training for each module.

34 Adaptive Training: Evaluation
- Train the aggregation modules.
- Evaluate the training sets for each classifier.
- Compose new training data.

35 Adaptive Training: Data Selection
New training data are composed by concatenating:
- Error_i: the misclassified entries of the training data for classifier i.
- Correct_i: a random choice of ⌈R·(P·δ_i)⌉ correctly classified entries of the training data for classifier i.
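One possible reading of this selection step, sketched below. Treating P as the training-set size, δ_i as classifier i's error rate, and R as a sampling-rate parameter is an assumption, since the slide does not define these symbols:

```python
import math
import random

def compose_retraining_set(train_data, predictions, labels, R=0.5):
    """Sketch of the adaptive data-selection step for one classifier i.

    The new training set concatenates:
      Error_i   -- all misclassified entries, and
      Correct_i -- ceil(R * (P * delta_i)) randomly chosen correct entries,
    where P is the training-set size and delta_i the error rate
    (our reading of the slide; R is a sampling-rate parameter).
    """
    errors = [d for d, p, y in zip(train_data, predictions, labels) if p != y]
    correct = [d for d, p, y in zip(train_data, predictions, labels) if p == y]
    P = len(train_data)
    delta = len(errors) / P                       # error rate of classifier i
    n_correct = math.ceil(R * (P * delta))
    sample = random.sample(correct, min(n_correct, len(correct)))
    return errors + sample
```

Under this reading, a classifier that errs more keeps proportionally more correct examples alongside its full error set, so re-training stays focused without forgetting what was already learned.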

36 Results
Experimental setup (as before):
- Five one-hidden-layer BP classifiers; training used partially disjoint data sets.
- No optimization is performed for the trained networks; the network parameters are kept the same for all the classifiers that are trained.
- Three data sets: 20 Class Gaussian, Satimages, Clouds.

37 Results (cntd…)
(mean ± std)

                            20 Class        Clouds          Satimages
  Singlenet                 13.82 ± 1.16    10.92 ± 0.08    14.06 ± 1.33
  Normal Training
  Best Classifier           14.03 ± 0.64    11.00 ± 0.09    14.72 ± 0.43
  Oracle                     7.29 ± 1.06     7.41 ± 0.16     7.20 ± 0.36
  Feature Based              8.64 ± 0.60    10.28 ± 0.10    12.48 ± 0.19
  Ensemble Trained Adaptively (WA as the evaluation function)
  Best Classifier           14.75 ± 1.06    12.03 ± 0.52    17.13 ± 1.03
  Oracle                     6.79 ± 2.30     5.73 ± 0.11     5.58 ± 0.17
  Feature Based              8.62 ± 0.25    10.24 ± 0.17    12.40 ± 0.12
  Feature Based Architecture Trained Adaptively
  Best Classifier           14.80 ± 1.32    11.97 ± 0.59    16.96 ± 0.87
  Oracle                     5.42 ± 1.30     5.43 ± 0.11     5.48 ± 0.18
  Feature Based              8.01 ± 0.19    10.06 ± 0.13    12.33 ± 0.14

38 Conclusions
Categorization of various combining approaches based on data dependence:
- Data independent: vulnerable to incorrect confidence estimates.
- Implicitly dependent: does not take the local superiority of classifiers into account.
- Explicitly dependent: the literature focuses on selection rather than combining.

39 Conclusions (cntd…)
Feature-based approach:
- Combines implicit and explicit data dependence.
- Uses an evolving training algorithm to enhance diversity amongst classifiers, reduce harmful correlation, and determine the duration of training.
- Improved classification accuracy.

40 References
[Kittler et al., 98] J. Kittler, M. Hatef, R. Duin, and J. Matas, "On Combining Classifiers", IEEE Trans. PAMI, 20(3), 226-239, 1998.
[Dasarathy, 94] B. Dasarathy, "Decision Fusion", IEEE Computer Soc. Press, 1994.
[Hashem, 97] S. Hashem, "Algorithms for Optimal Linear Combination of Neural Networks", Int. Conf. on Neural Networks, Vol. 1, 242-247, 1997.
[Jordan and Jacobs, 94] M. Jordan and R. Jacobs, "Hierarchical Mixtures of Experts and the EM Algorithm", Neural Computation, 181-214, 1994.
[Wolpert, 92] D. Wolpert, "Stacked Generalization", Neural Networks, Vol. 5, 241-259, 1992.
[Auda and Kamel, 98] G. Auda and M. Kamel, "Modular Neural Network Classifiers: A Comparative Study", J. Int. Rob. Sys., Vol. 21, 117-129, 1998.
[Gader et al., 96] P. Gader, M. Mohamed, and J. Keller, "Fusion of Handwritten Word Classifiers", Patt. Reco. Lett., 17(6), 577-584, 1996.
[Xu et al., 92] L. Xu, A. Krzyzak, and C. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwritten Recognition", IEEE Trans. Sys. Man and Cyb., 22(3), 418-435, 1992.
[Kuncheva et al., 01] L. Kuncheva, J. Bezdek, and R. Duin, "Decision Templates for Multiple Classifier Fusion: An Experimental Comparison", Patt. Reco., Vol. 34, 299-314, 2001.
[Huang et al., 95] Y. Huang, K. Liu, and C. Suen, "The Combination of Multiple Classifiers by a Neural Network Approach", J. Patt. Reco. and Art. Int., Vol. 9, 579-597, 1995.
[Schapire, 90] R. Schapire, "The Strength of Weak Learnability", Mach. Learn., Vol. 5, 197-227, 1990.
[Giacinto and Roli, 01] G. Giacinto and F. Roli, "Dynamic Classifier Selection Based on Multiple Classifier Behaviour", Patt. Reco., Vol. 34, 1879-1881, 2001.
[Wanas et al., 99] N. Wanas, M. Kamel, G. Auda, and F. Karray, "Feature Based Decision Aggregation in Modular Neural Network Classifiers", Patt. Reco. Lett., 20(11-13), 1353-1359, 1999.

