AdaBoost Robert E. Schapire (Princeton University) Yoav Freund (University of California at San Diego) Presented by Zhi-Hua Zhou (Nanjing University)


Ensemble Learning A machine learning paradigm in which multiple learners are trained and combined to solve the same problem (previously: one problem, one learner; ensemble: one problem, many learners). The generalization ability of the ensemble is usually significantly better than that of an individual learner. Boosting is one of the most important families of ensemble methods.

Boosting Significant advantages: solid theoretical foundation; very accurate prediction; very simple ("just 10 lines of code" [R. Schapire]); wide and successful applications; … R. Schapire and Y. Freund won the 2003 Gödel Prize (one of the most prestigious awards in theoretical computer science). Prize-winning paper (which introduced AdaBoost): "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, 55(1):119-139, 1997.

How was AdaBoost born? In 1988, M. Kearns and L. G. Valiant posed an interesting question: whether a "weak" learning algorithm that performs just slightly better than random guessing can be "boosted" into an arbitrarily accurate "strong" learning algorithm, or in other words, whether the two complexity classes of "weakly learnable" and "strongly learnable" problems are equal.

How was AdaBoost born? (cont'd) In R. Schapire's MLJ90 paper, Rob answered "yes" and proved it; the proof is a construction, which became the first Boosting algorithm. Then, in his PhD thesis (1993), Y. Freund gave a scheme for combining weak learners by majority voting. But these algorithms were not very practical. Later, at AT&T Bell Labs, they published the 1997 paper (in fact the work was done in 1995), which proposed AdaBoost, a practical Boosting algorithm.

The AdaBoost Algorithm From [R. Schapire, NE&C03] (the pseudocode on the original slide is not reproduced in this transcript). On each round t, a base classifier h_t is trained on the weighted training set, its coefficient α_t is typically set from its weighted error ε_t, and the example weights are updated so that the weights of incorrectly classified examples are increased and the base learner is forced to focus on the hard examples in the training set.
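Since the slide's pseudocode did not survive the transcript, here is a minimal sketch of the standard binary AdaBoost loop in Python, using decision stumps as base learners purely for illustration; the names (adaboost_fit, n_rounds, etc.) and the choice of scikit-learn stumps are assumptions of this sketch, not part of the original slides.

```python
# Minimal sketch of binary AdaBoost with decision stumps as base learners.
# Assumes labels y are numeric in {-1, +1}; all names here are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                     # D_1: uniform weights
    stumps, alphas = [], []
    for t in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = w[pred != y].sum()                # weighted error of h_t
        if eps >= 0.5:                          # no better than random: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # the "typical" alpha_t
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes, down-weight hits
        w /= w.sum()                            # renormalize to a distribution
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # sign of the weighted combination of base classifiers
    score = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return np.sign(score)
```

With y in {-1, +1}, calling adaboost_fit and then adaboost_predict reproduces the reweight-then-combine flow sketched on the next slide.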

An Easy Flow Original training set → Data set 1 → Learner 1, Data set 2 → Learner 2, …, Data set T → Learner T; training instances that are wrongly predicted by Learner 1 play more important roles in the training of Learner 2 (and so on for later learners); the final prediction is a weighted combination of the learners.

Theoretical Properties Y. Freund and R. Schapire [JCSS97] have proved that the training error of AdaBoost is bounded by $\prod_t \big[2\sqrt{\epsilon_t(1-\epsilon_t)}\big] = \prod_t \sqrt{1-4\gamma_t^2} \le \exp\big(-2\sum_t \gamma_t^2\big)$, where $\gamma_t = 1/2 - \epsilon_t$. Thus, if each base classifier is slightly better than random so that $\gamma_t \ge \gamma$ for some $\gamma > 0$, then the training error drops exponentially fast in $T$ since the above bound is at most $\exp(-2\gamma^2 T)$.
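As a small numerical illustration of this bound (my own addition, not from the slides): with a fixed weak-learner edge of γ = 0.1, the bound exp(-2γ²T) falls below 1% after roughly 230 rounds.

```python
# How fast the training-error bound exp(-2 * gamma^2 * T) shrinks for a fixed edge.
# Illustrative only; gamma = 0.1 is an assumed value, not from the slides.
import math

gamma = 0.1                                   # each weak learner beats random by 10%
for T in (10, 50, 100, 250, 500):
    print(f"T = {T:4d}   bound = {math.exp(-2 * gamma**2 * T):.6f}")
# At T = 250 the bound is exp(-5) ≈ 0.0067, i.e. below 1% training error.
```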

Theoretical Properties (cont'd) Y. Freund and R. Schapire [JCSS97] have tried to bound the generalization error as $\Pr[H(x) \ne y] \le \hat{\Pr}[H(x) \ne y] + \tilde{O}\big(\sqrt{Td/s}\big)$, where $\hat{\Pr}[\cdot]$ denotes empirical probability on the training sample, $s$ is the sample size, and $d$ is the VC-dimension of the base learner. The above bound suggests that Boosting will overfit if T is large. However, empirical studies show that Boosting often does not overfit. R. Schapire et al. [AnnStat98] gave a margin-based bound: for any $\theta > 0$, with high probability, $\Pr[H(x) \ne y] \le \hat{\Pr}\big[\mathrm{margin}_f(x,y) \le \theta\big] + \tilde{O}\big(\sqrt{d/(s\theta^2)}\big)$, where $\mathrm{margin}_f(x,y) = y\sum_t \alpha_t h_t(x) \big/ \sum_t \alpha_t$.
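To make the margin quantity concrete, here is a tiny helper (my own illustration, building on the adaboost_fit sketch above; not part of the slides) that computes the normalized voting margin of the boosted combination on each training example:

```python
# Normalized voting margin margin_f(x, y) = y * sum_t(alpha_t * h_t(x)) / sum_t(alpha_t),
# computed for the (stumps, alphas) returned by the adaboost_fit sketch above.
import numpy as np

def margins(stumps, alphas, X, y):
    score = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
    return y * score / np.sum(alphas)   # in [-1, 1]; large positive = confident and correct

# The margin theory observes that AdaBoost tends to keep increasing these values
# even after the training error reaches zero, which is one proposed explanation
# for why the test error can keep dropping as T grows.
```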

Application AdaBoost and its variants have been applied to diverse domains with great success; here I only show one example. P. Viola & M. Jones [CVPR'01] combined AdaBoost with a cascade process for face detection. They regarded rectangular features as weak classifiers.
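As a rough illustration of what such a weak classifier looks like (my own sketch under assumed window coordinates and threshold, not from the talk), a rectangular feature can be evaluated in constant time with an integral image and turned into a classifier by thresholding its value:

```python
# Evaluating a two-rectangle feature with an integral image (illustrative sketch;
# the window coordinates and the threshold below are made-up values).
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[:y, :x]; zero-padded so any rectangle sum needs 4 lookups
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y0, x0, y1, x1):
    # sum of img[y0:y1, x0:x1] from four integral-image lookups
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def two_rect_feature(ii, y0, x0, y1, x1):
    # left-half sum minus right-half sum of the window [y0:y1, x0:x1]
    xm = (x0 + x1) // 2
    return rect_sum(ii, y0, x0, y1, xm) - rect_sum(ii, y0, xm, y1, x1)

# A feature becomes a weak classifier by thresholding; AdaBoost then selects and
# weights a small set of such weak classifiers to form the detector.
img = np.random.rand(24, 24)                 # a toy 24x24 detection window
ii = integral_image(img)
value = two_rect_feature(ii, 4, 4, 20, 20)
prediction = 1 if value > 0.0 else -1        # hypothetical threshold of 0.0
```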

Application (cont'd) By using AdaBoost to weight the weak classifiers, they got two very intuitive features for face detection. In order to get high accuracy as well as high efficiency, they used a cascade process (which is beyond the scope of this talk). The result is a very strong face detector: on a 466MHz SUN machine, a 384 × 288 image cost only a fraction of a second (on average, only 8 features needed to be evaluated per image).

Application (cont'd) A detection result of Viola & Jones (image not reproduced in this transcript).

Application (cont'd) Comparable accuracy, but 15 times faster than state-of-the-art face detectors (at that time). The Viola-Jones detector has been recognized as one of the most exciting breakthroughs in computer vision (in particular, face detection) during the past ten years, and it is by far the most widely used face detector. "Boosting" has become a buzzword in computer vision.

Interesting problems Here I only list two (of course there are more). Theory-oriented: why does Boosting often not overfit? Application-oriented: AdaBoost-based feature selection.

Interesting problems: Why does Boosting not overfit? Many researchers have studied this and several theoretical explanations have been given, but none has convinced everyone. The margin theory of Boosting (see the margin-based bound above) is particularly interesting: if it succeeds, a strong connection between Boosting and SVMs can be established. But L. Breiman [NCJ99] indicated that a larger margin does not necessarily mean better generalization, which almost sentenced the margin theory of Boosting to death.

Interesting problems: Why does Boosting not overfit? (cont'd) A favorable turn has appeared: L. Reyzin & R. Schapire [ICML'06 best paper] found that L. Breiman had considered the minimum margin instead of the average or median margin … Can the margin theory of Boosting survive?

Thanks! Applause goes to R. Schapire & Y. Freund