Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK

Presentation transcript:

Classifier ensembles: Does the combination rule matter? Ludmila Kuncheva School of Computer Science Bangor University, UK

[Diagram: feature values (object description) → classifier, classifier, …, classifier → combiner → class label. The classifiers together with the combiner form the classifier ensemble.]
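As a rough Python sketch of the diagram above (my illustration; the names `classifiers` and `combiner` are placeholders, not from the talk), the pipeline is simply:

```python
def ensemble_predict(x, classifiers, combiner):
    """Generic ensemble pipeline: feature values -> classifiers -> combiner -> class label."""
    # Each base classifier maps the object description x to an output
    # (a class label or a vector of class supports).
    outputs = [clf.predict(x) for clf in classifiers]
    # The combiner fuses the individual outputs into a single class label.
    return combiner(outputs)
```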

Congratulations! The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences. On September 21, 2009 we awarded the $1M Grand Prize to team “BellKor’s Pragmatic Chaos”. Read about their algorithm, check out team scores on the Leaderboard, and join the discussions on the Forum. We applaud all the contributors to this quest, which improves our ability to connect people to the movies they love.

Cited 7194 times by 28 July 2013 (Google Scholar).

Saso Dzeroski, David Hand.
S. Dzeroski and B. Zenko (2004) Is combining classifiers better than selecting the best one? Machine Learning, 54.
David J. Hand (2006) Classifier technology and the illusion of progress, Statist. Sci. 21 (1).
Classifier combination? Hmmmm….. We are kidding ourselves; there is no real progress in spite of ensemble methods. Chances are that the single best classifier will be better than the ensemble.

Quo Vadis? "combining classifiers" OR "classifier combination" OR "classifier ensembles" OR "ensemble of classifiers" OR "combining multiple classifiers" OR "committee of classifiers" OR "classifier committee" OR "committees of neural networks" OR "consensus aggregation" OR "mixture of experts" OR "bagging predictors" OR adaboost OR (( "random subspace" OR "random forest" OR "rotation forest" OR boosting) AND "machine learning")

Gartner’s Hype Cycle: a typical evolution pattern of a new technology. Where are we?...

(6) IEEE TPAMI = IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE TSMC = IEEE Transactions on Systems, Man and Cybernetics
JASA = Journal of the American Statistical Association
IJCV = International Journal of Computer Vision
JTB = Journal of Theoretical Biology
(2) PPL = Protein and Peptide Letters
JAE = Journal of Animal Ecology
PR = Pattern Recognition
(4) ML = Machine Learning
NN = Neural Networks
CC = Cerebral Cortex
The top cited paper is from… an application paper.

International Workshop on Multiple Classifier Systems 2000 – continuing

Levels of questions
[Diagram: Data set → Features → Classifier 1, Classifier 2, … Classifier L → Combiner]
A. Combination level: selection or fusion? voting or another combination method? trainable or non-trainable combiner?
B. Classifier level: same or different classifiers? decision trees, neural networks or other? how many?
C. Feature level: all features or subsets of features? random or selected subsets?
D. Data level: independent/dependent bootstrap samples? selected data sets?

Number of classifiers L (plotted against strength of classifiers):
- L = 1: the perfect classifier
- 3-8 classifiers: heterogeneous, trained combiner (stacked generalisation)
- 100+ classifiers: same model, non-trained combiner (bagging, boosting, etc.)
Large ensembles of nearly identical classifiers - REDUNDANCY. Small ensembles of weak classifiers - INSUFFICIENCY. Must engineer diversity…
How about here? Classifiers of the same or different models? Trained or non-trained combiner? Selection or fusion? IS IT WORTH IT?

Number of classifiers L (plotted against strength of classifiers):
- L = 1: the perfect classifier
- 3-8 classifiers: heterogeneous, trained combiner (stacked generalisation)
- 100+ classifiers: same model, non-trained combiner (bagging, boosting, etc.)
Large ensembles of nearly identical classifiers - REDUNDANCY. Small ensembles of weak classifiers - INSUFFICIENCY. Must engineer diversity…
Classifiers of the same or different models? Trained or non-trained combiner? Selection or fusion? IS IT WORTH IT?
Diversity is absolutely CRUCIAL! Diversity is pretty impossible…
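To make the two ends of this spectrum concrete, here is a minimal scikit-learn sketch (my illustration, not part of the talk): a large ensemble of identical learners with a non-trained plurality-vote combiner (bagging), against a small heterogeneous ensemble with a trained combiner (stacked generalisation). The data set is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# 100+ classifiers, same model, non-trained combiner (plurality vote): bagging.
bagging = BaggingClassifier(n_estimators=100, random_state=1)  # default base learner: decision tree

# 3-8 heterogeneous classifiers, trained combiner: stacked generalisation.
stacking = StackingClassifier(
    estimators=[("nb", GaussianNB()),
                ("knn", KNeighborsClassifier()),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))

for name, model in [("bagging", bagging), ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```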

Two types of classifier outputs for an object x: label outputs and continuous-valued outputs. The continuous-valued outputs form a decision profile.

Ensemble (label outputs, classes R, G, B). The classifiers output: Red, Blue, Red, Green, Red. Majority vote: Red.

Ensemble (label outputs, classes R, G, B). The classifiers output: Red, Blue, Red, Green, Red. Majority vote: Red. Weighted majority vote: Green.

Ensemble (label outputs, classes R, G, B). The classifiers output: Red, Blue, Red, Green, Red. These label outputs can themselves be fed to a classifier acting as the combiner, which here outputs: Green.
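A plain-Python sketch of the two label-output combiners above. The classifier weights are hypothetical, chosen only so that the weighted vote flips the decision to Green as on the slide; the slide's actual weights are not reproduced.

```python
from collections import Counter

def plurality_vote(labels):
    """Return the class label that receives the most votes."""
    return Counter(labels).most_common(1)[0][0]

def weighted_majority_vote(labels, weights):
    """Return the class label with the largest total weight of the classifiers voting for it."""
    support = {}
    for label, w in zip(labels, weights):
        support[label] = support.get(label, 0.0) + w
    return max(support, key=support.get)

votes = ["Red", "Blue", "Red", "Green", "Red"]
weights = [0.10, 0.10, 0.10, 0.60, 0.10]   # hypothetical weights, heavily favouring classifier 4

print(plurality_vote(votes))                   # Red
print(weighted_majority_vote(votes, weights))  # Green
```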

Ensemble (continuous outputs, [R,G,B]). Each classifier outputs a vector of class supports: [ ] [ ] [ ] [ ] [0 1 0] [ ]

Ensemble (continuous outputs, [R,G,B]), same supports as above. Mean R = 0.45.

Ensemble (continuous outputs, [R,G,B]), same supports as above. Mean R = 0.45, Mean G = 0.48.

Ensemble (continuous outputs, [R,G,B]), same supports as above. Mean R = 0.45, Mean G = 0.48, Mean B = 0.35. Class: GREEN.

Ensemble (continuous outputs, [R,G,B]), same supports as above. Mean R = 0.45, Mean G = 0.48, Mean B = 0.35. Class: GREEN. The stacked support vectors form the decision profile.
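A sketch of the mean (average) rule on continuous-valued outputs. The support values below are hypothetical, not the ones on the slide; rows of the decision profile are classifiers, columns are the classes R, G, B.

```python
import numpy as np

classes = ["Red", "Green", "Blue"]

# Decision profile DP(x): one row per classifier, one column per class (R, G, B).
# The support values are hypothetical.
DP = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.1, 0.8],
               [0.5, 0.4, 0.1],
               [0.0, 1.0, 0.0],
               [0.4, 0.5, 0.1]])

mean_support = DP.mean(axis=0)          # column-wise mean: one support value per class
print(dict(zip(classes, mean_support.round(2))))
print("Class:", classes[int(mean_support.argmax())])   # Green for these values
```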

Decision profile: a matrix with one row per classifier and one column per class. The entry in row 4, column 3 is the support that classifier #4 gives to the hypothesis that the object to classify comes from class #3. Would be nice if these were probability distributions...

Decision profile (rows = classifiers, columns = classes) … We can take probability outputs from the classifiers.
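For instance, with scikit-learn-style classifiers (my sketch, not part of the talk), the rows of the decision profile can be taken directly from the classifiers' probability estimates:

```python
import numpy as np

def decision_profile(classifiers, x):
    """Build DP(x): one row per (already fitted) classifier, one column per class,
    using each classifier's posterior probability estimates."""
    # x is assumed to be a 1-D numpy array of feature values for a single object.
    return np.vstack([clf.predict_proba(x.reshape(1, -1))[0] for clf in classifiers])
```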

Combination Rules
For label outputs: majority (plurality) vote; weighted majority vote; Naïve Bayes; BKS (Behaviour Knowledge Space); a classifier.
For continuous-valued outputs: simple rules: minimum, maximum, product, average (sum); c regressions (one regression per class); a classifier.

Combination Rules
For label outputs: majority (plurality) vote; weighted majority vote; Naïve Bayes; BKS (Behaviour Knowledge Space); a classifier.
For continuous-valued outputs (the decision profile): simple rules: minimum, maximum, product, average (sum); c regressions (one regression per class); a classifier.
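A sketch of the simple rules, reusing the hypothetical decision profile from the earlier sketch: each rule collapses the columns of DP(x) to a single support value per class, and the class with the largest support wins.

```python
import numpy as np

classes = ["Red", "Green", "Blue"]
DP = np.array([[0.6, 0.3, 0.1],     # hypothetical decision profile:
               [0.1, 0.1, 0.8],     # rows = classifiers, columns = classes
               [0.5, 0.4, 0.1],
               [0.0, 1.0, 0.0],
               [0.4, 0.5, 0.1]])

rules = {"minimum": np.min, "maximum": np.max, "product": np.prod, "average": np.mean}

for name, rule in rules.items():
    support = rule(DP, axis=0)                          # one support value per class
    print(f"{name:8s} -> {classes[int(np.argmax(support))]}")
```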

[Diagram: feature values (object description) → classifier, classifier, …, classifier → combiner → class label. The classifier ensemble.]

[Diagram: feature values (object description) → classifier, classifier, …, classifier → classifier in place of the combiner → class label. The classifier ensemble.]

Bob Duin: The Combining Classifier: to Train or Not to Train?

Tin Ho: “Multiple Classifier Combination: Lessons and Next Steps”, 2002 “Instead of looking for the best set of features and the best classifier, now we look for the best set of classifiers and then the best combination method. One can imagine that very soon we will be looking for the best set of combination methods and then the best way to use them all. If we do not take the chance to review the fundamental problems arising from this challenge, we are bound to be driven into such an infinite recurrence, dragging along more and more complicated combination schemes and theories and gradually losing sight of the original problem.”

Conclusions - 1
Classifier ensembles: Does the combination rule matter? In a word, yes. But its merit depends upon the base classifier model, the training of the individual classifiers, the diversity, the possibility to train the combiner, and more.

Conclusions - 2
1. The choice of the combiner should not be side-lined.
2. The combiner should be chosen in relation to the rest of the ensemble and the available data.

Questions to you:
1. What is the future of classifier ensembles? (Are they here to stay or are they a mere phase?)
2. In what direction(s) will they evolve/dissolve?
3. What will be the ‘classifier of the future’? Or the ‘classification paradigm of the future’?
4. And one last question: How can we get a handle on the ever-growing scientific literature in each and every area? How can we find the gems among the pile of stones?