
Boosting Rong Jin

Inefficiency with Bagging

[Figure: Bagging. Bootstrap sampling draws datasets D_1, D_2, ..., D_k from the training set D, and a classifier h_1, h_2, ..., h_k is trained on each.]

Inefficiency with bootstrap sampling:
- Every example has an equal chance of being sampled
- No distinction between "easy" examples and "difficult" examples

Inefficiency with model combination:
- A constant weight for each classifier
- No distinction between accurate classifiers and inaccurate classifiers
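A minimal sketch (my own illustration, not code from the slides) makes both points concrete: the bootstrap step samples every example with the same probability, and the combination step gives every classifier the same weight.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    """Train k classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(k):
        # Bootstrap sampling: every example has the same probability 1/n.
        idx = rng.integers(0, n, size=n)
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    # Model combination: every classifier gets the same (constant) weight.
    votes = np.mean([clf.predict(X) for clf in classifiers], axis=0)
    return np.sign(votes)  # assumes labels in {-1, +1}
```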

Improve the Efficiency of Bagging

- Better sampling strategy: focus on the examples that are difficult to classify correctly
- Better combination strategy: more accurate models should be assigned larger weights

Intuition: Education in China

[Figure: sequential training. Classifier 1 is trained on the examples (x_1, y_1), ..., (x_4, y_4) and makes mistakes on (x_1, y_1) and (x_3, y_3); Classifier 2 focuses on those mistakes and still errs on (x_1, y_1); Classifier 3 focuses on that remaining mistake. The combined classifier makes no training mistakes, but it may overfit the training data.]

AdaBoost Algorithm
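The algorithm on this slide appeared as an image. A minimal sketch of the standard AdaBoost procedure, written in Python for reference (it assumes NumPy arrays and labels in {-1, +1}; it is not code from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """Standard AdaBoost for labels y in {-1, +1}; returns (classifiers, alphas)."""
    n = len(X)
    D = np.full(n, 1.0 / n)              # start from the uniform distribution
    classifiers, alphas = [], []
    for t in range(T):
        # Train a weak learner (here a decision stump) on the weighted examples.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()         # weighted training error of h_t
        if eps >= 0.5:                   # no better than chance: stop
            break
        eps = max(eps, 1e-12)            # numerical guard when eps == 0
        alpha = 0.5 * np.log((1 - eps) / eps)
        # Up-weight misclassified examples, down-weight the rest, renormalize.
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        classifiers.append(h)
        alphas.append(alpha)
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, X):
    # H_T(x) = sum_t alpha_t * h_t(x); the final prediction is its sign.
    H = sum(a * h.predict(X) for a, h in zip(alphas, classifiers))
    return np.sign(H)
```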

AdaBoost Example (α_t = ln 2)

[Figure: one pass of AdaBoost over five examples (x_1, y_1), ..., (x_5, y_5). Start from the uniform distribution D_0 = (1/5, ..., 1/5). Sample a training set (x_5, x_3, x_1) and train h_1. Multiply the weight of each example that h_1 misclassifies by e^{α_t} = 2 and renormalize, giving D_1 with weight 2/7 on the two misclassified examples and 1/7 on the rest. Sample again from D_1, train h_2, and combine the classifiers as 3/5 h_1 + 2/5 h_2. Updating the weights again gives D_2 (4/9 on the example misclassified by both classifiers, 2/9 on the example misclassified once more, and 1/9 on the others). Repeat.]
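As a check on the numbers above, assuming the update multiplies a misclassified example's weight by e^{α_t} and then renormalizes (which is consistent with the weights shown on the slide):

```latex
% Round 1: two of the five examples are misclassified by h_1.
% Unnormalized weights: misclassified (1/5) e^{\ln 2} = 2/5, correct 1/5.
% Normalizer: Z_1 = 2 \cdot (2/5) + 3 \cdot (1/5) = 7/5.
\[
D_1(i) =
\begin{cases}
\dfrac{2/5}{7/5} = \dfrac{2}{7}, & \text{if } h_1 \text{ misclassifies } x_i,\\[2mm]
\dfrac{1/5}{7/5} = \dfrac{1}{7}, & \text{otherwise.}
\end{cases}
\]
```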

How To Choose α_t in AdaBoost?

Problem with a constant weight α_t:
- No distinction between accurate classifiers and inaccurate classifiers

Consider how to construct the best distribution D_{t+1}(i) given D_t(i) and h_t:
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly

Optimization View for Choosing α_t

- h_t(x): x → {1, -1}; a base (weak) classifier
- H_T(x): a linear combination of base classifiers
- Goal: minimize the training error
- Approximate the training error with an exponential function
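The formulas on this slide appeared as images; a standard reconstruction in AdaBoost notation (not copied from the slide) is:

```latex
\[
H_T(x) = \sum_{t=1}^{T} \alpha_t\, h_t(x), \qquad
\hat{y}(x) = \operatorname{sign}\bigl(H_T(x)\bigr),
\]
\[
\frac{1}{n}\sum_{i=1}^{n} \mathbb{I}\bigl[\,y_i \neq \operatorname{sign}(H_T(x_i))\,\bigr]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} \exp\bigl(-y_i\, H_T(x_i)\bigr).
\]
```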

AdaBoost: A Greedy Approach to Optimizing the Exponential Function

- Exponential cost function
- Use the inductive form H_T(x) = H_{T-1}(x) + α_T h_T(x)
- Minimize the exponential function with respect to α_T, splitting the sum into the data points that h_T(x) predicts correctly and those it predicts incorrectly

AdaBoost is a greedy approach: does it overfit?
- Empirical studies show that AdaBoost is robust in general
- AdaBoost tends to overfit with noisy data
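The minimization itself was shown as an image; a standard reconstruction (writing w_i for the current example weights and ε_T for the weighted error of h_T) is:

```latex
\[
\sum_{i=1}^{n} e^{-y_i H_T(x_i)}
= \sum_{i=1}^{n} e^{-y_i H_{T-1}(x_i)}\, e^{-\alpha_T y_i h_T(x_i)}
= e^{-\alpha_T}\!\!\sum_{i:\,h_T(x_i)=y_i}\!\! w_i
  \;+\; e^{\alpha_T}\!\!\sum_{i:\,h_T(x_i)\neq y_i}\!\! w_i,
\qquad w_i = e^{-y_i H_{T-1}(x_i)}.
\]
Setting the derivative with respect to $\alpha_T$ to zero gives
\[
\alpha_T = \frac{1}{2}\ln\frac{1-\varepsilon_T}{\varepsilon_T},
\qquad
\varepsilon_T = \frac{\sum_{i:\,h_T(x_i)\neq y_i} w_i}{\sum_i w_i}.
\]
```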

Empirical Study of AdaBoost

AdaBoosting decision trees:
- Generate 50 decision trees through the AdaBoost procedure
- Linearly combine the decision trees using the weights computed by the AdaBoost algorithm

In general, AdaBoost and Bagging perform comparably, and both outperform C4.5; AdaBoost usually needs fewer classifiers than Bagging.
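One way to run a comparable experiment with scikit-learn (my own illustration; the slide specifies neither the implementation nor the datasets, and a CART tree stands in for C4.5 here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # placeholder dataset

# 50 trees combined by AdaBoost vs. 50 bagged trees vs. a single tree.
# (AdaBoostClassifier's default weak learner is a decision stump.)
models = {
    "AdaBoost (50 trees)": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Bagging (50 trees)": BaggingClassifier(n_estimators=50, random_state=0),
    "Single decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```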

Bias-Variance Tradeoff for AdaBoost

AdaBoost can reduce both model variance and model bias.

[Figure: bias and variance of a single decision tree, bagged decision trees, and AdaBoosted decision trees.]