Visual Information Systems multiple processor approach.


Objectives
An introductory tutorial on multiple classifier combination:
- Motivation and basic concepts
- Main methods for creating multiple classifiers
- Main methods for fusing multiple classifiers
- Applications, achievements, open issues and conclusions

Why?
A natural move when trying to solve numerous complicated pattern recognition problems:
- Efficiency: high dimensionality; complicated architectures such as neural networks; speed
- Accuracy

Pattern Classification
[Diagram: processing -> feature extraction -> classification, illustrated with fork/spoon images]

Traditional approach to pattern classification
- Unfortunately, no classifier dominates for all data distributions, and the data distribution of the task at hand is usually unknown
- No single classifier can discriminate well enough when the number of classes is huge
- For applications where the objects/classes of content are numerous, unlimited, or unpredictable, one specific classifier/detector cannot solve the problem

Combine individual classifiers
- Besides avoiding the selection of the worst classifier, under particular hypotheses, fusion of multiple classifiers can improve the performance of the best individual classifier and, in some special cases, provide the optimal Bayes classifier
- This is possible if the individual classifiers make "different" errors
- For linear combiners, Tumer and Ghosh (1996) showed that averaging the outputs of individual classifiers with unbiased and uncorrelated errors can improve on the best individual classifier and, in the limit of infinitely many classifiers, yields the optimal Bayes classifier
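The averaging effect is easy to reproduce numerically. Below is a minimal NumPy sketch (not from the slides): the toy posterior, the noise level, and the ensemble size are illustrative assumptions, chosen only to show the averaged ensemble approaching the Bayes error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy binary problem: true posterior p(y=1|x) drawn from a Beta distribution.
p = rng.beta(2, 2, size=n)
y = (rng.random(n) < p).astype(int)

def error_rate(scores):
    return np.mean((scores > 0.5).astype(int) != y)

# Each base classifier estimates the posterior with unbiased, uncorrelated noise.
def noisy_classifier():
    return np.clip(p + rng.normal(0, 0.25, size=n), 0, 1)

singles = [noisy_classifier() for _ in range(25)]
print("best single classifier:", min(error_rate(s) for s in singles))
print("average of 25 outputs :", error_rate(np.mean(singles, axis=0)))
print("Bayes error (bound)   :", error_rate(p))
```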

Definitions
- A "classifier" is any mapping from the space of features (measurements) to a space of class labels (names, tags, distances, probabilities)
- A classifier is a hypothesis about the real relation between features and class labels
- A "learning algorithm" is a method for constructing hypotheses
- A learning algorithm applied to a set of samples (the training set) outputs a classifier

Definitions
- A multiple classifier system (MCS) is a structured way to combine (exploit) the outputs of individual classifiers
- An MCS can be thought of as: multiple expert systems, committees of experts, mixtures of experts, classifier ensembles, or composite classifier systems

Basic concepts
Multiple classifier systems (MCS) can be characterized by:
- The architecture
- The combination strategy (fixed or trained)
- Other design choices

MCS Architecture/Topology
Serial: [Diagram: Expert 1 -> Expert 2 -> ... -> Expert N]

MCS Architecture/Topology
Parallel: [Diagram: Expert 1, Expert 2, ..., Expert N all feed a combining strategy]

MCS Architecture/Topology
Hybrid: [Diagram: Experts 1..N connected through Combiner 1 and Combiner 2 in a mixed serial/parallel layout]

Where do multiple classifiers come from?
- Different feature spaces: face, voice, fingerprint
- Different training sets: sampling
- Different classifiers: k-NN, neural net, SVM
- Different architectures: e.g. a neural net's layers, units, transfer function
- Different parameter values: k in k-NN, the kernel in an SVM
- Different initializations: neural nets

Where do multiple classifiers come from?
[Figure: three classifiers trained on the same feature space exhibit different decision boundaries and performance]

Combination based on different feature spaces

Combining based on a single feature space but different classifiers

Architecture of multiple classifier combination

Fixed Combination Rules
- Product, minimum
  - Suited to independent feature spaces and different areas of expertise
  - Assume error-free posterior probability estimates
- Sum (mean), median, majority vote
  - Suited to equal posterior-estimate distributions in the same feature space, i.e. differently trained classifiers drawn from the same distribution
  - Bad if some classifiers (experts) are very good or very bad
- Maximum rule
  - Trusts the most confident classifier/expert
  - Bad if some classifiers (experts) are badly trained
Are these rules ever optimal?
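The rules themselves are one-liners over a stack of per-classifier posterior estimates. A minimal sketch with made-up numbers; note how the deliberately over-confident classifier 2 sways most rules but not the median or the majority vote:

```python
import numpy as np

# posteriors[i, j] = classifier i's estimate of P(class j | x); toy values,
# with classifier 2 deliberately over-confident in class 1.
posteriors = np.array([
    [0.60, 0.30, 0.10],   # classifier 1
    [0.05, 0.90, 0.05],   # classifier 2 (over-confident)
    [0.70, 0.20, 0.10],   # classifier 3
])

rules = {
    "product": posteriors.prod(axis=0),
    "minimum": posteriors.min(axis=0),
    "sum":     posteriors.sum(axis=0),
    "median":  np.median(posteriors, axis=0),
    "maximum": posteriors.max(axis=0),
}
for name, scores in rules.items():
    print(f"{name:8s} -> class {scores.argmax()}")

# Majority vote operates at the abstract level: each classifier casts one label.
votes = posteriors.argmax(axis=1)
print("majority -> class", np.bincount(votes).argmax())
```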

Fixed combining rules are sub-optimal
- Base classifiers are never really independent (product)
- Base classifiers are never really equally imperfectly trained (sum, median, majority)
- Sensitivity to over-confident base classifiers (product, min, max)
- Fixed combining rules are never optimal

Trained combiner

Remarks on fixed and trained combination strategies
Fixed rules:
- Simplicity
- Low memory and time requirements
- Well suited to ensembles of classifiers with independent or weakly correlated errors and similar performances
Trained rules:
- Flexibility: potentially better performance than fixed rules
- Claimed to be more suitable than fixed rules for classifiers that are correlated or exhibit different performances
- High memory and time requirements
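A common realization of a trained combiner is stacking: fit a simple meta-classifier on the base classifiers' outputs, using data held out from base-classifier training. A minimal scikit-learn sketch; the choice of base models and the synthetic dataset are illustrative assumptions, not part of the original slides:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
# Three disjoint splits: base training, combiner training, evaluation.
X_base, X_rest, y_base, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_comb, X_test, y_comb, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

bases = [KNeighborsClassifier(),
         DecisionTreeClassifier(random_state=0),
         LogisticRegression(max_iter=1000)]
for clf in bases:
    clf.fit(X_base, y_base)

def meta_features(X):
    # Measurement-level outputs of all base classifiers, concatenated.
    return np.hstack([clf.predict_proba(X) for clf in bases])

combiner = LogisticRegression(max_iter=1000).fit(meta_features(X_comb), y_comb)
print("stacked accuracy:", combiner.score(meta_features(X_test), y_test))
```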

Methods for fusing multiple classifiers
Fusion methods can be classified according to the type of information produced by the individual classifiers (Xu et al., 1992):
- Abstract-level output: the classifier outputs only a unique label for each input pattern
- Rank-level output: the classifier outputs a list of possible classes, with ranking, for each input pattern
- Measurement-level output: the classifier outputs class "confidence" levels for each input pattern
For each of these categories, methods can be further subdivided into integration vs. selection rules, and fixed vs. trained rules.
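The three levels form a hierarchy: measurement-level outputs can always be degraded to rank- and abstract-level ones, but not the other way around. A tiny sketch with illustrative scores:

```python
import numpy as np

scores = np.array([0.10, 0.75, 0.15])   # measurement level: one score per class
ranking = scores.argsort()[::-1]         # rank level: class indices, best first
label = int(scores.argmax())             # abstract level: a single label
print(ranking, label)                    # [1 2 0] 1
```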

Example
The majority voting rule is a fixed rule at the abstract level.

Fuser ("combination" rule)
Two main categories of fuser:
- Integration (fusion) functions: for each pattern, all the classifiers contribute to the final decision. Integration assumes competitive classifiers
- Selection functions: for each pattern, just one classifier, or a subset, is responsible for the final decision. Selection assumes complementary classifiers
Integration and selection can be "merged" to design hybrid fusers, and multiple functions may be necessary for non-parallel architectures.

Classifier "diversity" vs. fuser complexity
- Fusion is obviously useful only if the combined classifiers are mutually complementary
- Ideally, one wants classifiers with high accuracy and high diversity
- The required degree of error diversity depends on the fuser complexity
  - Majority vote fuser: the majority should always be correct
  - Ideal selector: only one classifier needs to be correct for each pattern

Classifier "diversity" vs. fuser complexity
An example with four diversity levels (A. Sharkey, 1999); a sketch for checking them follows:
- Level 1: no more than one classifier is wrong for each pattern
- Level 2: the majority is always correct
- Level 3: at least one classifier is correct for each pattern
- Level 4: all classifiers are wrong for some patterns
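A small sketch (not from the slides) that determines which level an ensemble attains, given a boolean matrix of per-pattern correctness; the toy matrix is made up:

```python
import numpy as np

def sharkey_level(correct):
    """Diversity level per A. Sharkey (1999).
    correct[i, j] is True iff classifier i is right on pattern j."""
    n = correct.shape[0]
    wrong = (~correct).sum(axis=0)       # classifiers wrong on each pattern
    if wrong.max() <= 1:
        return 1                          # at most one classifier wrong anywhere
    if (wrong < n / 2).all():
        return 2                          # the majority is always correct
    if (wrong < n).all():
        return 3                          # at least one classifier always correct
    return 4                              # some pattern defeats all classifiers

correct = np.array([[1, 1, 0],
                    [1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 1],
                    [1, 0, 1]], dtype=bool)
print(sharkey_level(correct))             # 2: a majority is correct on every pattern
```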

Classifier diversity
- Measures of diversity in classifier ensembles are a matter of ongoing research (L. I. Kuncheva)
- Key issue: how are the diversity measures related to the accuracy of the ensemble?
- Simple fusers (e.g. majority voting) can be used for classifiers that exhibit a simple complementary pattern
- Complex fusers, for example a dynamic selector, are necessary for classifiers with a complex dependency model
- The required "complexity" of the fuser depends on the degree of classifier diversity
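One widely used pairwise diversity measure from this literature is Yule's Q-statistic, studied by Kuncheva and co-workers. A minimal sketch; the oracle (correct/wrong) outputs below are made up:

```python
import numpy as np

def q_statistic(ci, cj):
    """Yule's Q between two classifiers from their per-pattern correctness.
    Q = 1: identical behaviour; Q = 0: independence; Q < 0: the classifiers
    tend to err on different patterns (more diverse)."""
    n11 = np.sum(ci & cj)        # both correct
    n00 = np.sum(~ci & ~cj)      # both wrong
    n10 = np.sum(ci & ~cj)       # only the first correct
    n01 = np.sum(~ci & cj)       # only the second correct
    return (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)

a = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=bool)
b = np.array([1, 0, 1, 1, 1, 0, 1, 1], dtype=bool)
print(q_statistic(a, b))         # -1.0: maximally diverse errors on this toy data
```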

Analogy between MCS and single classifier design
- Single classifier: feature design -> classifier design -> performance evaluation
- MCS: ensemble design -> fuser design -> performance evaluation

MCS design
- The design of an MCS involves two main phases: the design of the classifier ensemble and the design of the fuser
- The design of the classifier ensemble aims to create a set of complementary/diverse classifiers
- The design of the combination function (fuser) aims to create a fusion mechanism that can exploit the complementarity/diversity of the classifiers and combine them optimally
- The two design phases are obviously linked (Roli and Giacinto, 2002)

Methods for constructing MCS
The effectiveness of an MCS relies on combining diverse/complementary classifiers. Several approaches have been proposed to construct ensembles of complementary classifiers, among them:
- Using problem and designer knowledge
- Injecting randomness
- Varying the classifier type, architecture, or parameters
- Manipulating the training data (see the bagging sketch below)
- Manipulating the input features
- Manipulating the output features
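Manipulating the training data by bootstrap resampling gives bagging, perhaps the best-known instance of this family. A minimal sketch, assuming decision trees as base learners (an illustrative choice) and majority vote as the fuser:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_classifiers=11, seed=0):
    """Train each base classifier on a bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        ensemble.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return ensemble

def bagging_predict(ensemble, X):
    """Fuse the abstract-level outputs by majority vote."""
    votes = np.stack([clf.predict(X) for clf in ensemble]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Usage: ensemble = bagging_fit(X_train, y_train)
#        y_pred   = bagging_predict(ensemble, X_test)
```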

Using problem and designer knowledge
When problem or designer knowledge is available, "complementary" classification algorithms can be designed:
- In applications with multiple sensors
- In applications where complementary representations of patterns are possible (e.g. statistical and structural representations)
- When designer knowledge allows varying the classifier type, architecture, or parameters to create complementary classifiers
These heuristic approaches perform only as well as the problem and designer knowledge allows complementary classifiers to be designed.

Two main methods for MCS design (T. K. Ho, 2000)
- Coverage optimisation methods: a simple fuser is given without any design; the goal is to create a set of complementary classifiers that can be fused optimally
- Decision optimisation methods: a set of carefully designed and optimised classifiers is given and unchangeable; the goal is to optimise the fuser

Two main methods for MCS design
- The decision optimisation approach is often used when previously and carefully designed classifiers are available, or when valid problem and designer knowledge is available
- The coverage optimisation approach makes sense when creating carefully designed, "strong" classifiers is difficult or time-consuming
- Integration of the two basic approaches is often used
- However, no design method guarantees the "optimal" ensemble for a given fuser or a given application (Roli and Giacinto, 2002)
- The best MCS can only be determined by performance evaluation

Rank-level fusion methods
- Some classifiers provide class "scores", or some sort of class probabilities
- This information can be used to "rank" each class, e.g.:
  Pc1 = 0.10 -> Rc1 = 1
  Pc2 = 0.75 -> Rc2 = 3
  Pc3 = 0.15 -> Rc3 = 2
- In general, if Ω = {c1, ..., ck} is the set of classes, a classifier can provide an "ordered" (ranked) list of class labels

The Borda count method: an example
Let N = 3 classifiers and k = 4 classes, with Ω = {a, b, c, d}. For a given pattern, the ranked outputs of the three classifiers are:

Rank value   Classifier 1   Classifier 2   Classifier 3
4            c              a              b
3            b              b              a
2            d              d              c
1            a              c              d

The Borda count method: an example
So we have:
r_a = r_a^1 + r_a^2 + r_a^3 = 1 + 4 + 3 = 8
r_b = r_b^1 + r_b^2 + r_b^3 = 3 + 3 + 4 = 10
r_c = r_c^1 + r_c^2 + r_c^3 = 4 + 1 + 2 = 7
r_d = r_d^1 + r_d^2 + r_d^3 = 2 + 2 + 1 = 5
The winning class is b, because it has the maximum overall rank.
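The same computation in a few lines of Python, reproducing the example above (rank value k for the top choice, down to 1 for the last):

```python
# Each classifier lists the classes from most to least preferred (table above).
rankings = [
    ["c", "b", "d", "a"],   # classifier 1
    ["a", "b", "d", "c"],   # classifier 2
    ["b", "a", "c", "d"],   # classifier 3
]
k = 4                        # number of classes

borda = {c: 0 for c in "abcd"}
for ranking in rankings:
    for position, cls in enumerate(ranking):
        borda[cls] += k - position        # rank values k, k-1, ..., 1

print(borda)                              # {'a': 8, 'b': 10, 'c': 7, 'd': 5}
print(max(borda, key=borda.get))          # 'b' wins
```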

Remarks on rank-level methods
Advantages over the abstract level (majority vote):
- Ranking is suitable in problems with many classes, where the correct class may often appear near the top of the list, although not at the top
- Example: word recognition with a sizeable lexicon
Advantages over the measurement level:
- Rankings can be preferred to soft outputs to avoid a lack of consistency when using different classifiers
- Rankings can be preferred to soft outputs to simplify the combiner design
Drawbacks:
- Rank-level methods are not supported by clear theoretical underpinnings
- Results depend on the scale of numbers assigned to the choices

Open issues
General combination strategies are only sub-optimal solutions for most applications.

References
1. K. Sirlantzis, "Diversity in Multiple Classifier Systems", University of Kent.
2. F. Roli, "Fusion of Multiple Pattern Classifiers" (tutorial), University of Cagliari.
3. R. P. W. Duin, "The Combining Classifier: To Train or Not to Train?", ICPR 2002, Pattern Recognition Group, Faculty of Applied Sciences.
4. L. Xu, A. Krzyzak, C. Y. Suen, "Methods of Combining Multiple Classifiers and Their Applications to Handwriting Recognition", IEEE Transactions on Systems, Man, and Cybernetics, 22(3), 1992.
5. J. Kittler, M. Hatef, R. P. W. Duin, J. Matas, "On Combining Classifiers", IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), March 1998.
6. D. Tax, M. van Breukelen, R. P. W. Duin, J. Kittler, "Combining Multiple Classifiers by Averaging or by Multiplying?", Pattern Recognition, 33(2000).
7. L. I. Kuncheva, "A Theoretical Study on Six Classifier Fusion Strategies", IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(2), 2002.