Christoph Eick: Ensemble Learning


Ensembles: Combining Multiple Learners for Better Accuracy

Reading material on ensembles:
- http://www.scholarpedia.org/article/Ensemble_learning (optional)
- http://en.wikipedia.org/wiki/Ensemble_learning
- http://en.wikipedia.org/wiki/Adaboost
- Rudin video (optional): http://videolectures.net/mlss05us_rudin_da/

You do not need to read the textbook, as long as you understand the transparencies!

Rationale

No Free Lunch Theorem: there is no algorithm that is always the most accurate (http://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization). Each algorithm makes assumptions that might or might not be valid for the problem at hand. The idea is therefore to generate a group of base learners which, when combined, achieves higher accuracy. Different learners can use different:
- Algorithms
- Hyperparameters
- Representations (modalities)
- Training sets
- Subproblems
A minimal sketch combining three different algorithms follows this list.
(Slide credit: E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004)
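A minimal sketch of combining learners that differ in their algorithm (scikit-learn assumed available; the synthetic dataset and the particular estimator choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("tree", DecisionTreeClassifier())],
    voting="hard",  # majority vote over three different algorithms
).fit(X, y)
print(ensemble.score(X, y))
```

With voting="hard", the ensemble predicts whichever class the majority of the three learners votes for.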

General Idea

[Figure: the training data D is used to build multiple base classifiers C1, ..., Ct; on a new record, their predictions are combined into a single ensemble prediction C*.]

Fixed Combination Rules

[Figure: common fixed rules for combining the base-learner outputs dji: sum (average), weighted sum, median, minimum, maximum, and product.] A sketch of these rules follows.
(Slide credit: E. Alpaydın, Introduction to Machine Learning 2e, The MIT Press, 2010)
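A sketch of those rules (numpy assumed; the three class-probability vectors are made up) applied to the outputs of L = 3 base learners:

```python
import numpy as np

d = np.array([[0.7, 0.3],    # learner 1: P(class 1), P(class 2)
              [0.6, 0.4],    # learner 2
              [0.2, 0.8]])   # learner 3

print("sum/average:", d.mean(axis=0))
print("median:     ", np.median(d, axis=0))
print("max:        ", d.max(axis=0))
print("min:        ", d.min(axis=0))
print("product:    ", d.prod(axis=0))
# The predicted class is the argmax of the combined scores, e.g.:
print("average-rule prediction:", d.mean(axis=0).argmax())
```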

Why Does It Work?

Suppose there are 25 base classifiers, each with error rate $\varepsilon = 0.35$, and assume the classifiers are independent. The majority-vote ensemble makes a wrong prediction only if at least 13 of the 25 base classifiers err:

$P(\text{ensemble errs}) = \sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1-\varepsilon)^{25-i} \approx 0.06$

which is far below the 0.35 error rate of any individual base classifier.
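The same number can be checked directly (Python standard library only):

```python
from math import comb

eps, n = 0.35, 25
# Majority vote errs only when at least ceil(n/2) = 13 of the 25
# independent base classifiers err simultaneously.
p_err = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"P(ensemble errs) = {p_err:.3f}")  # ~0.060
```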

Why Are Ensembles Successful?

Bayesian perspective: averaging over models approximates $P(C_i \mid x) = \sum_j P(C_i \mid x, \mathcal{M}_j)\,P(\mathcal{M}_j)$.

If the base-learner outputs $d_j$ are independent with the same expectation, the average $y = \frac{1}{L}\sum_{j=1}^{L} d_j$ leaves the bias unchanged, $E[y] = E[d_j]$, while the variance decreases by a factor of L: $\mathrm{Var}(y) = \frac{1}{L^2}\,\mathrm{Var}\big(\sum_j d_j\big) = \frac{1}{L}\,\mathrm{Var}(d_j)$.

If the learners are dependent, covariance terms are added: $\mathrm{Var}(y) = \frac{1}{L^2}\big[\sum_j \mathrm{Var}(d_j) + 2\sum_{i<j} \mathrm{Cov}(d_i, d_j)\big]$, so positive correlation among the learners increases the error.
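A quick numerical check of the variance argument (numpy assumed; the noise model is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L, trials = 25, 100_000
# Each base learner outputs the true value 1.0 plus independent unit noise.
d = rng.normal(loc=1.0, scale=1.0, size=(trials, L))
y = d.mean(axis=1)                 # ensemble average over L learners
print(np.var(d[:, 0]))             # single learner: ~1.0
print(np.var(y), 1.0 / L)          # ensemble: ~0.04 = Var(d_j)/L
```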

What Is the Main Challenge for Developing Ensemble Models?

The main challenge is not to obtain highly accurate base models, but rather to obtain base models that make different kinds of errors. For example, if ensembles are used for classification, high accuracy can be accomplished if different base models misclassify different training examples, even if the accuracy of each base classifier is low. Independence between two base classifiers can be assessed in this case by measuring the degree of overlap between the sets A and B of training examples they misclassify ($|A \cap B| / |A \cup B|$): more overlap means less independence between the two models.
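A small helper along these lines (names hypothetical) computes that overlap from two classifiers' predictions:

```python
def misclassification_overlap(y_true, pred_a, pred_b):
    """Jaccard overlap |A&B|/|A|B| of the example sets two classifiers get wrong."""
    errs_a = {i for i, (t, p) in enumerate(zip(y_true, pred_a)) if p != t}
    errs_b = {i for i, (t, p) in enumerate(zip(y_true, pred_b)) if p != t}
    union = errs_a | errs_b
    return len(errs_a & errs_b) / len(union) if union else 0.0
```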

Bagging

Use bootstrapping to generate L training sets and train one base learner on each (Breiman, 1996). Combine the base learners by voting (average or median for regression). Unstable algorithms, whose models change a lot when the training set changes, profit most from bagging.
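A minimal bagging sketch with scikit-learn (assumed available; by default the base learner is a decision tree, an unstable algorithm):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, random_state=0)
# 25 decision trees, each trained on its own bootstrap replicate of (X, y);
# their predictions are combined by voting.
bag = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)
print(bag.score(X, y))
```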

Bagging

Sampling with replacement: draw n examples from the n-example training set and build a classifier on each such bootstrap sample. Each individual example has probability $(1 - 1/n)^n \approx e^{-1} \approx 0.368$ of not appearing in a given bootstrap sample, i.e., probability $1 - (1 - 1/n)^n \approx 0.632$ of being selected at least once.
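A quick simulation of the 0.632 figure (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
sample = rng.integers(0, n, size=n)       # bootstrap: draw n indices with replacement
frac_selected = len(np.unique(sample)) / n
print(frac_selected, 1 - (1 - 1/n) ** n)  # both ~0.632
```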

Boosting

An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records. Initially, all N records are assigned equal weights 1/N. Unlike bagging, the weights may change at the end of each boosting round.

Boosting

Records that are wrongly classified will have their weights increased; records that are classified correctly will have their weights decreased. [Figure: across sampling rounds, example 4 is hard to classify; its weight is increased, so it is more likely to be chosen again in subsequent rounds.]

Basic AdaBoost Loop

D1 = initial dataset, with equal example weights
FOR i = 1 TO k DO
  learn new classifier Ci on Di;
  compute αi (classifier's importance);
  update example weights;
  create new training set Di+1 (using weighted sampling)
END FOR
Construct the ensemble, which weights each classifier Ci by αi (i = 1, ..., k).

Example: AdaBoost

Base classifiers: C1, C2, ..., CT.

Error rate of classifier Ci (the example weights wj add up to 1):
$\varepsilon_i = \sum_{j=1}^{N} w_j\,\delta\big(C_i(x_j) \neq y_j\big)$

Importance of a classifier:
$\alpha_i = \frac{1}{2}\ln\left(\frac{1-\varepsilon_i}{\varepsilon_i}\right)$

Example: AdaBoost

Weight update (http://en.wikipedia.org/wiki/Adaboost): increase an example's weight if it is misclassified, decrease it otherwise; the size of the change grows with the classifier's importance:

$w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \times \begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$

where $Z_i$ is a normalization factor that makes the new weights sum to 1. If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/N and the re-sampling procedure is repeated.

Classification ($\alpha_j$ is classifier j's importance for the whole dataset):
$C^*(x) = \arg\max_y \sum_{j=1}^{T} \alpha_j\,\delta\big(C_j(x) = y\big)$
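A from-scratch sketch of this procedure (assumptions: numpy and scikit-learn available, binary labels y in {-1, +1}, decision stumps as base learners; it follows the formulas above but re-weights the training set instead of re-sampling it):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=25):
    """Train T weighted decision stumps; y must be in {-1, +1}."""
    N = len(y)
    w = np.full(N, 1.0 / N)                 # equal initial weights
    classifiers, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.sum(w[pred != y])          # weighted error (weights sum to 1)
        if eps >= 0.5:                      # bad round: revert weights, retry
            w = np.full(N, 1.0 / N)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        w *= np.exp(-alpha * y * pred)      # e^{-alpha} if correct, e^{+alpha} if not
        w /= w.sum()                        # normalization factor Z_i
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(X, classifiers, alphas):
    """Weighted vote: sign of the alpha-weighted sum of stump predictions."""
    scores = sum(a * c.predict(X) for c, a in zip(classifiers, alphas))
    return np.sign(scores)
```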

Illustrating AdaBoost

[Figure: a toy training set in which every data point starts with the same initial weight.]

Video containing an introduction to AdaBoost: http://videolectures.net/mlss05us_rudin_da/

Illustrating AdaBoost (continued)

[Figure: subsequent boosting rounds on the same data, re-weighting the points each base classifier misclassifies.]

Mixture of Experts

Voting where the weights are input-dependent, computed by a gating function (Jacobs et al., 1991). Both the experts and the gating function can be nonlinear.
(Slide credit: E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004)
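A minimal mixture-of-experts sketch (all names hypothetical; linear experts and a linear gating function passed through a softmax):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                    # numerical stability
    e = np.exp(z)
    return e / e.sum()

def moe_predict(x, expert_weights, gate_weights):
    """expert_weights: (L, d), one linear expert per row;
    gate_weights: (L, d), producing input-dependent mixing weights."""
    expert_outputs = expert_weights @ x    # each expert's prediction
    gates = softmax(gate_weights @ x)      # weights depend on the input x
    return gates @ expert_outputs          # gated (weighted) combination
```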

Stacking

The combiner f() is itself another learner, trained on the outputs of the base learners (Wolpert, 1992).
(Slide credit: E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004)
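A stacking sketch with scikit-learn (assumed available), where a logistic regression combiner is trained on the base learners' outputs:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # the combiner f()
).fit(X, y)
print(stack.score(X, y))
```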

Cascading

Use learner dj only if the preceding learners are not confident enough in their prediction. Cascade the learners in order of complexity, so that cheap learners handle the easy cases and expensive ones are consulted only when needed.
(Slide credit: E. Alpaydın, Introduction to Machine Learning, The MIT Press, 2004)
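A cascading sketch (names hypothetical; assumes scikit-learn-style classifiers exposing predict_proba and classes_):

```python
def cascade_predict(x, learners, threshold=0.9):
    """learners: classifiers ordered cheap-to-expensive; stop at the first
    one whose top class probability reaches the confidence threshold."""
    for model in learners[:-1]:
        probs = model.predict_proba([x])[0]
        if probs.max() >= threshold:           # confident enough: stop early
            return model.classes_[probs.argmax()]
    return learners[-1].predict([x])[0]        # final learner always decides
```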

Summary: Ensemble Learning

Ensemble approaches use multiple models in their decision making. They frequently achieve high accuracy, are less likely to over-fit, and exhibit low variance. They have been used successfully in the Netflix contest and for other tasks. However, some research suggests that they are sensitive to noise (http://www.phillong.info/publications/LS10_potential.pdf).

The key to designing ensembles is diversity, and not necessarily high accuracy of the base classifiers: members of the ensemble should vary in the examples they misclassify. Therefore, most ensemble approaches, such as AdaBoost, seek to promote diversity among the models they combine.

The trained ensemble represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built; thus, ensembles can be shown to have more flexibility in the functions they can represent. Example: http://www.scholarpedia.org/article/Ensemble_learning

Current research on ensembles centers on: more complex ways to combine models, understanding the convergence behavior of ensemble learning algorithms, parameter learning, understanding over-fitting in ensemble learning, characterization of ensemble models, and sensitivity to noise.