
1 Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK l.i.kuncheva@bangor.ac.uk

2 [Diagram: feature values (object description) → classifier, classifier, …, classifier → combiner → class label. The classifiers together with the combiner form the classifier ensemble.]

3 Congratulations! The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, check out team scores on the Leaderboard, and join the discussions on the Forum. We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love. [Ensemble diagram repeated: the winning entry combined many predictors.]

4 [The paper shown on the slide:] cited 7194 times by 28 July 2013 (Google Scholar). [Ensemble diagram repeated.]

5 [Photos: Saso Dzeroski, David Hand]
S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54, 255-273.
D. J. Hand (2006) Classifier technology and the illusion of progress. Statistical Science, 21(1), 1-14.
Classifier combination? Hmmmm… We are kidding ourselves; there is no real progress in spite of ensemble methods. Chances are that the single best classifier will be better than the ensemble.

6 Quo Vadis? "combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (( "random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning")

7 Gartner’s Hype Cycle: a typical evolution pattern of a new technology. Where are we?...

8 [Chart legend; the annotation reads: top cited paper is from… an application paper.]
(6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
JASA = Journal of the American Statistical Association
IJCV = International Journal of Computer Vision
JTB = Journal of Theoretical Biology
(2) PPL = Protein and Peptide Letters
JAE = Journal of Animal Ecology
PR = Pattern Recognition
(4) ML = Machine Learning
NN = Neural Networks
CC = Cerebral Cortex

9 [Image-only slide.]

10 International Workshop on Multiple Classifier Systems, 2000–2013 and continuing.

11 Levels of questions [Diagram: Data set → Features → Classifier 1, Classifier 2, … Classifier L → Combiner]
A. Combination level: selection or fusion? Voting or another combination method? Trainable or non-trainable combiner?
B. Classifier level: same or different classifiers? Decision trees, neural networks or other? How many?
C. Feature level: all features or subsets of features? Random or selected subsets?
D. Data level: independent/dependent bootstrap samples? Selected data sets?

12 Number of classifiers L (against the strength of the individual classifiers):
L = 1: the perfect classifier.
L = 3-8: heterogeneous classifiers, trained combiner (stacked generalisation).
L = 30-50: how about here? Same or different models? Trained or non-trained combiner? Selection or fusion? IS IT WORTH IT?
L = 100+: same model, non-trained combiner (bagging, boosting, etc.).
Large ensembles of nearly identical classifiers suffer from REDUNDANCY; small ensembles of weak classifiers from INSUFFICIENCY. Must engineer diversity…

13 [Slide 12 repeated, adding:] Diversity is absolutely CRUCIAL! Diversity is pretty impossible…

14 [Diagram: for an object x in a three-class problem, classifier outputs come in two forms: label outputs (each classifier names one of classes 1, 2, 3) and continuous-valued outputs, which stack into the decision profile.]

15 Ensemble (label outputs, R, G, B): five classifiers output Red, Blue, Red, Green, Red. Majority vote → Red.

16 Ensemble (label outputs, R, G, B): the same five outputs Red, Blue, Red, Green, Red. Majority vote → Red, but weighted majority vote → Green: with suitable weights, the single Green vote outweighs the three Red votes (the slide shows the values 0.05, 0.50, 0.02, 0.10, 0.70, 0.10, 0.27, 0.70, 0.50).
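A minimal Python sketch of both voting rules on this example. The weights below are illustrative stand-ins chosen to reproduce the Green outcome; the slide's exact weight assignment is not fully recoverable from this transcript.

```python
from collections import Counter

def majority_vote(labels):
    """Plurality vote: the label with the most votes wins."""
    return Counter(labels).most_common(1)[0][0]

def weighted_majority_vote(labels, weights):
    """Each classifier adds its weight to its label's score; top score wins."""
    scores = {}
    for label, w in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

votes = ["Red", "Blue", "Red", "Green", "Red"]
print(majority_vote(votes))                    # Red (3 of 5 votes)

# Hypothetical weights: the heavily weighted 'Green' voter (0.70)
# outweighs the three 'Red' voters combined (0.05 + 0.02 + 0.10).
weights = [0.05, 0.10, 0.02, 0.70, 0.10]
print(weighted_majority_vote(votes, weights))  # Green
```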

17 Ensemble (label outputs, R, G, B): the vector of label outputs itself (shown on the slide as the string 'RBRRGR') can be fed into a classifier acting as the combiner; here it outputs Green.
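Slide 17's idea, feeding the label vector into a second-level classifier, is the core of stacked generalisation. A hedged sketch using scikit-learn; the training data here are invented purely for illustration (past label-output vectors of five base classifiers plus the true class).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Encode the three labels as integers so label outputs can act as features.
enc = {"Red": 0, "Green": 1, "Blue": 2}

# Invented training set: each row is the label-output vector of the five
# base classifiers on a past object; y_train holds the true classes.
X_train = np.array([
    [enc["Red"], enc["Blue"], enc["Red"], enc["Green"], enc["Red"]],
    [enc["Green"], enc["Green"], enc["Red"], enc["Green"], enc["Green"]],
    [enc["Blue"], enc["Blue"], enc["Blue"], enc["Red"], enc["Blue"]],
])
y_train = ["Green", "Green", "Blue"]

combiner = DecisionTreeClassifier().fit(X_train, y_train)

# A fresh vector of base-classifier labels is mapped to a final class,
# which may disagree with the plain majority vote.
print(combiner.predict([[0, 2, 0, 1, 0]]))  # ['Green'] on this toy data
```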

18 Ensemble (continuous outputs, [R, G, B]): six classifiers output the support vectors [0.6 0.3 0.1], [0.1 0.0 0.6], [0.7 0.6 0.5], [0.4 0.3 0.1], [0 1 0], [0.9 0.7 0.8].

19-21 [Build slides repeating slide 18 and adding the class-wise means one at a time:] Mean R = 0.45, Mean G = 0.48, Mean B = 0.35 → Class GREEN.

22 Stacking the six support vectors row by row gives the decision profile:
0.6 0.3 0.1
0.1 0.0 0.6
0.7 0.6 0.5
0.4 0.3 0.1
0.0 1.0 0.0
0.9 0.7 0.8
Column means: R = 0.45, G = 0.48, B = 0.35 → Class GREEN.
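The averaging steps of slides 19-22 in code form, a small sketch using the slide's own numbers:

```python
import numpy as np

# Decision profile from slide 22: one row per classifier,
# one column per class (R, G, B).
DP = np.array([
    [0.6, 0.3, 0.1],
    [0.1, 0.0, 0.6],
    [0.7, 0.6, 0.5],
    [0.4, 0.3, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.7, 0.8],
])

classes = ["Red", "Green", "Blue"]
means = DP.mean(axis=0)                            # column-wise means
print(dict(zip(classes, means.round(2).tolist())))  # {'Red': 0.45, 'Green': 0.48, 'Blue': 0.35}
print(classes[means.argmax()])                      # Green

# Indexing the profile: DP[3, 2] == 0.1 is the support that
# classifier #4 gives to class #3 (0-based indices 3 and 2).
```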

23 Decision profile (rows = classifiers, columns = classes):
0.6 0.3 0.1
0.1 0.0 0.6
0.7 0.6 0.5
0.4 0.3 0.1
0.0 1.0 0.0
0.9 0.7 0.8
The entry in row 4, column 3 (here 0.1) is the support that classifier #4 gives to the hypothesis that the object to classify comes from class #3. Would be nice if these were probability distributions...

24 Decision profile (rows = classifiers, columns = classes): we can take probability outputs from the classifiers, so that each row becomes a probability distribution over the classes.

25 Combination Rules
For label outputs: majority (plurality) vote; weighted majority vote; Naïve Bayes; BKS (Behaviour-Knowledge Space); a classifier.
For continuous-valued outputs: simple rules (minimum, maximum, product, average/sum); regressions; a classifier.

26 [Slide 25 repeated, noting that the rules for continuous-valued outputs operate on the decision profile.]
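The simple rules differ only in which column-wise statistic they take over the decision profile. A sketch reusing the profile from slide 22:

```python
import numpy as np

DP = np.array([[0.6, 0.3, 0.1], [0.1, 0.0, 0.6], [0.7, 0.6, 0.5],
               [0.4, 0.3, 0.1], [0.0, 1.0, 0.0], [0.9, 0.7, 0.8]])
classes = ["Red", "Green", "Blue"]

# Each simple rule reduces the profile column-wise; argmax picks the class.
rules = {
    "minimum": DP.min(axis=0),
    "maximum": DP.max(axis=0),
    "product": DP.prod(axis=0),
    "average": DP.mean(axis=0),
}
for name, support in rules.items():
    print(f"{name:8s} -> {classes[support.argmax()]}")

# On this profile, classifier 5's crisp [0, 1, 0] output drives the
# minimum and product supports to 0 for every class, so argmax falls
# back to the first class; a known fragility of those two rules.
```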

27 [Image-only slide.]

28 [Image-only slide.]

29 [Ensemble diagram repeated: feature values (object description) → classifiers → combiner → class label.]

30 [The same diagram with the combiner box replaced by a classifier: the combiner can itself be a trained classifier.]

31 Bob Duin: “The Combining Classifier: to Train or Not to Train?” (image credit: http://samcnitt.tumblr.com/)

32 Tin Kam Ho: “Multiple Classifier Combination: Lessons and Next Steps”, 2002. “Instead of looking for the best set of features and the best classifier, now we look for the best set of classifiers and then the best combination method. One can imagine that very soon we will be looking for the best set of combination methods and then the best way to use them all. If we do not take the chance to review the fundamental problems arising from this challenge, we are bound to be driven into such an infinite recurrence, dragging along more and more complicated combination schemes and theories and gradually losing sight of the original problem.”

33 Conclusions - 1
Classifier ensembles: does the combination rule matter? In a word, yes. But the merit of the rule depends upon the base classifier model, the training of the individual classifiers, the diversity, the possibility to train the combiner, and more.

34 Conclusions - 2
1. The choice of the combiner should not be side-lined.
2. The combiner should be chosen in relation to the rest of the ensemble and the available data.

35 Questions to you:
1. What is the future of classifier ensembles? (Are they here to stay, or are they a mere phase?)
2. In what direction(s) will they evolve/dissolve?
3. What will be the ‘classifier of the future’? Or the ‘classification paradigm of the future’?
4. And one last question: how can we get a handle on the ever-growing scientific literature in each and every area? How can we find the gems among the pile of stones?

