Ensemble learning Reminder - Bagging of Trees Random Forest


Ensemble learning: Reminder - Bagging of Trees, Random Forest, Boosting (1), AdaBoost

Ensemble learning Aggregating a group of classifiers (“base classifiers”) into an ensemble committee and making the prediction by consensus. Weak learner ensembles (each base learner has a high expected prediction error, EPE, but is easy to train): Current Bioinformatics, 5(4):296-308, 2010.

Ensemble learning Strong learner ensembles (“Stacking” and beyond): Current Bioinformatics, 5(4):296-308, 2010.

Ensemble learning Why? (1) Statistical: a learning algorithm searches a space of hypotheses for the best fit to the data. With insufficient data (which is almost always the case), the algorithm can find many equally good solutions. Averaging over them reduces the risk of picking a bad one. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

Ensemble learning Why? (2) Computational: modern learning algorithms solve complicated optimization problems, and the search often cannot guarantee a global optimum. An ensemble can be seen as running the search from many different starting points. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

Ensemble learning Why? (3) Representational: the true function may not be representable by any single hypothesis in the space searched. An ensemble expands the space of representable functions. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”

Reminder - Bootstrapping Directly assess uncertainty from the training data. Basic idea: assuming the observed data approximate the true underlying density, re-sampling from them (with replacement) gives an idea of the uncertainty caused by sampling.
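
A minimal sketch of the bootstrap idea, assuming numpy is available; the statistic (a sample median) and the data are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)   # stand-in for the training data

B = 1000
boot_medians = np.empty(B)
for b in range(B):
    # resample N points with replacement from the empirical distribution
    sample = rng.choice(data, size=data.size, replace=True)
    boot_medians[b] = np.median(sample)

# spread of the bootstrap replicates approximates the sampling uncertainty
print("median:", np.median(data), "bootstrap SE:", boot_medians.std(ddof=1))
```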

Bagging “Bootstrap aggregation.” Resample the training dataset, build a prediction model on each resampled dataset, and average the predictions: f_bag(x) = (1/B) · Σ_{b=1..B} f*_b(x). This is a Monte Carlo estimate of E_F̂[ f*(x) ], where F̂ is the empirical distribution putting equal probability 1/N on each of the data points. Bagging only differs from the original estimate when f() is a non-linear or adaptive function of the data; when f() is a linear function of the data, the bagged estimate reproduces the original fit as B grows. Trees are a perfect candidate for bagging – each bootstrap tree will differ in structure.
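
A minimal sketch of bagging trees with scikit-learn; the dataset and settings are illustrative, and the `estimator=` parameter name assumes a recent scikit-learn (older releases call it `base_estimator=`).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# synthetic data standing in for a real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# B bootstrap resamples, one unpruned (high-variance) tree per resample,
# predictions combined by majority vote over the ensemble
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner
    n_estimators=100,                    # B
    bootstrap=True,
    random_state=0,
)
bag.fit(X_tr, y_tr)

print("single tree :", DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te))
print("bagged trees:", bag.score(X_te, y_te))
```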

Bagging trees Bagged trees differ in structure – each bootstrap sample yields a different tree.

Random Forest Bagging can be seen as a method to reduce the variance of an estimated prediction function. It mostly helps high-variance, low-bias classifiers. Comparatively, boosting builds weak classifiers one by one, allowing the collection to evolve in the right direction. Random forest is a substantial modification of bagging – it builds a large collection of de-correlated trees. - Similar performance to boosting - Simpler to train and tune compared to boosting

Random Forest The intuition – the average of random variables.
B i.i.d. random variables, each with variance σ²: the mean has variance σ²/B.
B i.d. (identically distributed but not independent) random variables, each with variance σ² and pairwise correlation ρ: the mean has variance ρσ² + (1 − ρ)σ²/B.
Bagged trees are similar to i.d. samples. Random forest aims at reducing the correlation ρ in order to reduce the variance of the average. This is achieved by random selection of variables at each split.
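
A quick numerical check of the second variance formula above (an illustrative numpy simulation; the values of σ, ρ and B are arbitrary choices).

```python
import numpy as np

sigma, rho, B = 2.0, 0.6, 50
# covariance matrix of B equi-correlated variables: sigma^2 on the diagonal,
# rho * sigma^2 off the diagonal
cov = sigma**2 * (rho * np.ones((B, B)) + (1 - rho) * np.eye(B))

rng = np.random.default_rng(0)
draws = rng.multivariate_normal(np.zeros(B), cov, size=100_000)

print("empirical variance of the mean :", draws.mean(axis=1).var())
print("rho*sigma^2 + (1-rho)*sigma^2/B:", rho * sigma**2 + (1 - rho) * sigma**2 / B)
```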

Random Forest

Random Forest Benefit of RF – the out-of-bag (OOB) samples give a built-in estimate of the cross-validation error. For sample i, compute its RF error using only the trees built from bootstrap samples in which sample i did not appear. The OOB error rate is close to the N-fold cross-validation error rate. Unlike many other nonlinear estimators, RF can therefore be fit in a single sequence: stop growing the forest when the OOB error stabilizes.
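
A minimal sketch of the OOB idea with scikit-learn's RandomForestClassifier; the dataset and settings are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True evaluates each sample using only the trees whose bootstrap
# sample did not contain it -- an almost-free cross-validation estimate
rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                            bootstrap=True, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
```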

Random Forest Variable importance – finding the most relevant predictors. At every split of every tree, one variable contributes to the improvement of the impurity measure. Accumulating the reduction of impurity i(N) over all splits for each variable gives a measure of the relative importance of the variables: the predictors that appear most often at split points, and lead to the largest reductions of impurity, are the important ones. Another method: permute the values of a predictor in the OOB samples of every tree; the resulting decrease in prediction accuracy, accumulated over all trees, is also a measure of importance.
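
A minimal sketch of both importance measures using scikit-learn (impurity-based importances and permutation importances); the dataset is synthetic and illustrative, and note that scikit-learn's permutation_importance permutes on whatever set you pass in rather than on per-tree OOB samples, though the idea is the same.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=3, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# 1) accumulated impurity decrease at split points, averaged over trees
print("impurity-based:", np.round(rf.feature_importances_, 3))

# 2) drop in accuracy when a predictor's values are permuted
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("permutation   :", np.round(perm.importances_mean, 3))
```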

Random Forest

Random Forest Finding interactions between variables? Test function: Y = sin(2·V2) + V5² + V2·V5 + V8·V9 + |V9|
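
An illustrative sketch of simulating data from this test function and fitting a random forest to it; the standard-normal predictors and the noise level are assumptions, not the slide's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, p = 2000, 10
V = rng.normal(size=(n, p))                      # V1..V10 stored as columns 0..9

# Y = sin(2*V2) + V5^2 + V2*V5 + V8*V9 + |V9| + small noise
Y = (np.sin(2 * V[:, 1]) + V[:, 4] ** 2 + V[:, 1] * V[:, 4]
     + V[:, 7] * V[:, 8] + np.abs(V[:, 8]) + rng.normal(scale=0.1, size=n))

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(V, Y)
print(np.round(rf.feature_importances_, 3))      # V2, V5, V8, V9 should dominate
```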

Boosting Construct a sequence of weak classifiers and combine them into a strong classifier by a weighted majority vote. “Weak”: better than random coin-tossing. Some properties: flexible; able to do feature selection; good generalization; could fit noise.

Boosting Adaboost:

Boosting Figures from “A Tutorial on Boosting”, Yoav Freund and Rob Schapire.

Boosting

Boosting α_m = log((1 − err_m)/err_m), where err_m is the weighted training error of the m-th weak classifier G_m: this is the weight of the current weak classifier in the final model. The weights w_i are for individual observations; notice they accumulate from step 1, w_i ← w_i · exp(α_m · I(y_i ≠ G_m(x_i))). If an observation is correctly classified at this step, its weight doesn’t change; if incorrectly classified, its weight increases.
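
A minimal sketch of these updates (AdaBoost.M1 with stumps as weak learners and labels recoded to {−1, +1}); the data and settings are illustrative, not the slides' own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = 2 * y - 1                                   # recode labels to {-1, +1}

n, M = len(y), 50
w = np.full(n, 1.0 / n)                         # observation weights at step 1
alphas, stumps = [], []

for m in range(M):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    miss = stump.predict(X) != y
    err = np.sum(w * miss) / np.sum(w)          # weighted training error err_m
    err = np.clip(err, 1e-10, 1 - 1e-10)        # guard against degenerate errors
    alpha = np.log((1 - err) / err)             # weight of this weak classifier
    w *= np.exp(alpha * miss)                   # only misclassified points are up-weighted
    w /= w.sum()
    alphas.append(alpha)
    stumps.append(stump)

# final prediction: sign of the alpha-weighted vote of the weak classifiers
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(F) == y))
```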

Boosting

Boosting 10 predictors. The weak classifier is a stump: a tree with a single split (two terminal nodes).
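
A minimal sketch of this kind of experiment with scikit-learn – 10 predictors, stump weak learners, AdaBoost; the data are synthetic rather than the slide's exact simulation, and `estimator=` assumes a recent scikit-learn (older releases use `base_estimator=`).

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# stump = a tree with a single split (max_depth=1)
stump = DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(estimator=stump, n_estimators=400, random_state=0)
ada.fit(X_tr, y_tr)

print("single stump  :", stump.fit(X_tr, y_tr).score(X_te, y_te))
print("boosted stumps:", ada.score(X_te, y_te))
```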