Ensemble Learning: AdaBoost
Jianping Fan, Dept. of Computer Science, UNC-Charlotte


Ensemble Learning
Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem. [Figure: previously, a single learner solves the problem; in an ensemble, several learners are combined to solve it.] The generalization ability of the ensemble is usually significantly better than that of any individual learner. Boosting is one of the most important families of ensemble methods.

A Brief History
- Resampling for estimating a statistic: Bootstrapping
- Resampling for classifier design: Bagging, Boosting (Schapire 1989), AdaBoost (Schapire 1995)

Bootstrap Estimation
Repeatedly draw n samples from D. For each set of samples, estimate the statistic. The bootstrap estimate is the mean of the individual estimates. Bootstrapping is used to estimate a statistic (parameter) and its variance.
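A minimal sketch of bootstrap estimation in Python; the statistic, the data, and the number of resamples below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def bootstrap_estimate(data, statistic, n_resamples=1000, seed=None):
    """Estimate a statistic and its variance by repeated resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Each bootstrap replicate: draw n samples from the data with replacement
    estimates = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(n_resamples)
    ])
    # The bootstrap estimate is the mean of the individual estimates;
    # the spread of the replicates estimates the statistic's variance.
    return estimates.mean(), estimates.var(ddof=1)

# Example: bootstrap the median of a small sample
data = np.array([2.3, 1.9, 3.1, 2.8, 2.2, 3.5, 2.7])
est, var = bootstrap_estimate(data, np.median, n_resamples=2000, seed=0)
print(f"bootstrap median = {est:.3f}, variance = {var:.4f}")
```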

Bagging: Aggregate Bootstrapping
For i = 1..M: draw n* < n samples from D with replacement and learn classifier C_i. The final classifier is a vote of C_1..C_M. Bagging increases classifier stability and reduces variance.
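A hedged sketch of this bagging loop in Python; the base classifier (a scikit-learn decision tree), the sample fraction, and M are illustrative choices, and binary labels in {-1, +1} are assumed for the vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, sample_frac=0.8, seed=None):
    """Train M classifiers, each on a bootstrap sample of the training data."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(M):
        # Draw n* < n samples from D with replacement
        idx = rng.choice(n, size=int(sample_frac * n), replace=True)
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    """Final classifier: majority vote of C_1 .. C_M (labels assumed in {-1, +1})."""
    votes = np.stack([clf.predict(X) for clf in classifiers])
    return np.sign(votes.sum(axis=0))   # use an odd M to avoid ties
```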

Bagging
[Figure: T random samples drawn with replacement from the training data, each fed to the learner (ML) to produce classifiers f_1, f_2, ..., f_T, which are combined into the final classifier f.]

Boosting
[Figure: starting from the training sample, each round produces a weighted sample from which the learner (ML) trains f_1, f_2, ..., f_T; the weak classifiers are combined into the final classifier f.]

Revisit Bagging

Boosting Classifier

Bagging vs. Boosting
Bagging: the construction of complementary base learners is left to chance and to the instability of the learning method.
Boosting: actively seeks to generate complementary base learners by training the next base learner on the mistakes of the previous learners.

Boosting (Schapire 1989)
- Randomly select n_1 < n samples from D without replacement to obtain D_1; train weak learner C_1.
- Select n_2 < n samples from D, with half of the samples misclassified by C_1, to obtain D_2; train weak learner C_2.
- Select all samples from D on which C_1 and C_2 disagree; train weak learner C_3.
- The final classifier is a vote of the weak learners (a rough code sketch follows below).
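A rough sketch, under simplifying assumptions, of the three-learner scheme above: decision stumps stand in for the weak learners, labels are assumed to be in {-1, +1}, n1 and the way D_2 is balanced are illustrative, and degenerate cases (such as an empty disagreement set) are not handled.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_three(X, y, n1, seed=None):
    """Boosting with three weak learners (Schapire 1989), as sketched on the slide."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # D1: n1 < n samples drawn without replacement; train weak learner C1
    i1 = rng.choice(n, size=n1, replace=False)
    C1 = DecisionTreeClassifier(max_depth=1).fit(X[i1], y[i1])

    # D2: a set on which C1 is wrong about half the time; train weak learner C2
    correct = np.flatnonzero(C1.predict(X) == y)
    wrong = np.flatnonzero(C1.predict(X) != y)
    k = min(len(correct), len(wrong))
    i2 = np.concatenate([rng.choice(correct, k, replace=False),
                         rng.choice(wrong, k, replace=False)])
    C2 = DecisionTreeClassifier(max_depth=1).fit(X[i2], y[i2])

    # D3: all samples on which C1 and C2 disagree; train weak learner C3
    i3 = np.flatnonzero(C1.predict(X) != C2.predict(X))
    C3 = DecisionTreeClassifier(max_depth=1).fit(X[i3], y[i3])

    # Final classifier: majority vote of the three weak learners
    return lambda Xq: np.sign(C1.predict(Xq) + C2.predict(Xq) + C3.predict(Xq))
```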

AdaBoost (Schapire 1995)
- Instead of sampling, re-weight the training examples.
- The previous weak learner has only 50% accuracy over the new distribution.
- Can be used to learn weak classifiers.
- The final classification is based on a weighted vote of the weak classifiers.

AdaBoost Terms
- Learner = Hypothesis = Classifier
- Weak learner: < 50% error over any distribution
- Strong classifier: a thresholded linear combination of the weak learners' outputs

AdaBoost = Adaptive + Boosting
A learning algorithm for building a strong classifier out of a lot of weaker ones.

AdaBoost Concept
[Figure: many weak classifiers, each only slightly better than random, are combined into a strong classifier.]

The Weak Classifiers
Each weak classifier (slightly better than random) learns by considering one simple feature; the T most beneficial features for classification should be selected. How to:
- define the features?
- select the beneficial features?
- train the weak classifiers?
- manage (weight) the training samples?
- associate a weight with each weak classifier?

The Strong Classifier
The weak classifiers, each slightly better than random, are combined into a strong classifier. How good will the strong one be?
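One standard answer, from Freund and Schapire's analysis of AdaBoost (stated here for reference; the slide itself leaves the question open): if every weak classifier is slightly better than random, the training error of the combined classifier drops exponentially fast in the number of rounds.

```latex
% Training-error bound for the combined classifier H.
% If each weak classifier h_t has weighted error \epsilon_t = 1/2 - \gamma_t, then
\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\{H(x_i)\neq y_i\}
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
  \;=\; \prod_{t=1}^{T}\sqrt{1-4\gamma_t^{2}}
  \;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big).
```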

The AdaBoost Algorithm
Given: (x_1, y_1), ..., (x_m, y_m), where x_i ∈ X and y_i ∈ {-1, +1}
Initialization: D_1(i) = 1/m for i = 1, ..., m
For t = 1, ..., T:
- Find the classifier h_t : X → {-1, +1} that minimizes the error with respect to D_t, i.e., h_t = argmin_{h_j} ε_j, where ε_j = Σ_{i=1}^{m} D_t(i) [y_i ≠ h_j(x_i)]
- Weight the classifier: α_t = (1/2) ln((1 - ε_t) / ε_t)
- Update the distribution: D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor
Output the final classifier: H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
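A compact Python sketch of this algorithm, with one-level decision trees (stumps) as the weak learners; the choice of weak learner and of T are assumptions for illustration, not part of the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost for labels y in {-1, +1}, following the algorithm above."""
    m = len(X)
    D = np.full(m, 1.0 / m)              # initialization: D_1(i) = 1/m
    stumps, alphas = [], []
    for t in range(T):
        # Find a classifier h_t with small weighted error w.r.t. D_t
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()
        if eps >= 0.5:                   # no better than random guessing: stop
            break
        # Weight the classifier: alpha_t = (1/2) ln((1 - eps_t) / eps_t)
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        # Update the distribution: D_{t+1}(i) ∝ D_t(i) exp(-alpha_t y_i h_t(x_i))
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Final classifier: H(x) = sign( sum_t alpha_t h_t(x) )."""
    scores = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(scores)
```

Note that the stump learner here only approximately minimizes the weighted error, whereas the slide's argmin is taken over an explicit set of candidate classifiers.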

Boosting Illustration: Weak Classifier 1

Boosting Illustration: Weights Increased

The AdaBoost Algorithm
The update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t multiplies the weight of each example by e^{α_t} if it is misclassified and by e^{-α_t} if it is classified correctly (typically α_t > 0, since the weak learner's error ε_t < 1/2). The weights of incorrectly classified examples are therefore increased, so that the base learner is forced to focus on the hard examples in the training set.
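As a worked example with illustrative numbers (not taken from the slides), suppose the weak classifier at round t has weighted error ε_t = 0.2:

```latex
% Illustrative numbers, not from the slides: eps_t = 0.2
\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}
         = \tfrac{1}{2}\ln\frac{0.8}{0.2}
         \approx 0.693
% Before renormalization by Z_t, each misclassified example's weight is
% multiplied by e^{\alpha_t} \approx 2, while each correctly classified
% example's weight is multiplied by e^{-\alpha_t} \approx 0.5.
```

After renormalization, the misclassified fifth of the data carries half of the total weight in D_{t+1}, which is exactly why the previous weak learner has only 50% accuracy over the new distribution.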

Boosting Illustration: Weak Classifier 2

Boosting Illustration: Weights Increased

Boosting Illustration: Weak Classifier 3

Boosting Illustration: the final classifier is a combination of the weak classifiers.

The AdaBoost Algorithm (as above)
What goal does AdaBoost want to reach? The weighting rule for α_t and the distribution update are goal dependent.

Goal
Minimize the exponential loss Σ_{i=1}^{m} exp(-y_i H(x_i)), where H(x) = Σ_{t=1}^{T} α_t h_t(x) is the unthresholded combination and the final classifier is sign(H(x)). Minimizing the exponential loss maximizes the margin y H(x).

Goal
Final classifier: sign(H_T(x)), where H_t(x) = Σ_{τ=1}^{t} α_τ h_τ(x). Minimize the exponential loss Σ_{i=1}^{m} exp(-y_i H_T(x_i)).

Define D_{t+1}(i) ∝ exp(-y_i H_t(x_i)), normalized so that Σ_i D_{t+1}(i) = 1.

Then, at round t the loss factors as
Σ_i exp(-y_i H_t(x_i)) = Σ_i exp(-y_i H_{t-1}(x_i)) · exp(-α_t y_i h_t(x_i))
                       ∝ Σ_i D_t(i) exp(-α_t y_i h_t(x_i))
                       = e^{-α_t} Σ_{i: h_t(x_i) = y_i} D_t(i) + e^{α_t} Σ_{i: h_t(x_i) ≠ y_i} D_t(i)
                       = (1 - ε_t) e^{-α_t} + ε_t e^{α_t}.

Set the derivative with respect to α_t to 0:
-(1 - ε_t) e^{-α_t} + ε_t e^{α_t} = 0  ⇒  α_t = (1/2) ln((1 - ε_t) / ε_t).

With this choice the per-round factor equals 2 sqrt(ε_t (1 - ε_t)), which is less than 1 for ε_t < 1/2; the reduction of the loss is maximized when the weak learner h_t makes ε_t as small as possible.

At time t, the distribution for the next round follows from the current one:
D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, starting from D_1(i) = 1/m at time 1, and the same update is applied again at time t+1.

Pros and Cons of AdaBoost
Advantages:
- Very simple to implement
- Does feature selection, resulting in a relatively simple classifier
- Fairly good generalization
Disadvantages:
- Suboptimal solution
- Sensitive to noisy data and outliers

Intuition
Train a set of weak hypotheses h_1, ..., h_T. The combined hypothesis H is a weighted majority vote of the T weak hypotheses; each hypothesis h_t has a weight α_t. During training, focus on the examples that are misclassified: at round t, example x_i has weight D_t(i).

Basic Setting
- Binary classification problem
- Training data: (x_1, y_1), ..., (x_m, y_m), with x_i ∈ X and y_i ∈ {-1, +1}
- D_t(i): the weight of x_i at round t; D_1(i) = 1/m
- A learner L that finds a weak hypothesis h_t : X → Y given the training set and D_t
- The error of a weak hypothesis h_t: ε_t = Pr_{i ~ D_t}[ h_t(x_i) ≠ y_i ] = Σ_{i: h_t(x_i) ≠ y_i} D_t(i)

The Basic AdaBoost Algorithm
For t = 1, ..., T:
- Train the weak learner using the training data and D_t
- Get h_t : X → {-1, +1} with error ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i)
- Choose α_t = (1/2) ln((1 - ε_t) / ε_t)
- Update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t, where Z_t normalizes D_{t+1} to a distribution
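For concreteness, a toy usage of the adaboost_fit / adaboost_predict sketch given earlier; the dataset and the number of rounds are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_moons

# Toy data; AdaBoost expects labels in {-1, +1}
X, y01 = make_moons(n_samples=300, noise=0.25, random_state=0)
y = 2 * y01 - 1

stumps, alphas = adaboost_fit(X, y, T=100)
train_err = np.mean(adaboost_predict(stumps, alphas, X) != y)
print(f"{len(stumps)} weak learners, training error = {train_err:.3f}")
```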

The General AdaBoost Algorithm
[Figure: the full algorithm, as given above.]
