Published by Corey Rice. Modified over 9 years ago.
1
Lecture notes for Stat 231: Pattern Recognition and Machine Learning. A.L. Yuille. Fall 2004. AdaBoost. Binary Classification. Read Section 9.5 of Duda, Hart, and Stork.
2
Boosting Basics. Suppose we have weak classifiers which work slightly better than chance. Can we combine these classifiers to make a strong classifier? This is Boosting. History: it was developed by computer scientists as an example of Valiant’s version of PAC learning theory. But it now has a statistical interpretation.
3
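Basic AdaBoost Algorithm.
Data: $\{(x_i, y_i) : i = 1, \dots, N\}$ with labels $y_i \in \{-1, +1\}$.
Set of weak classifiers: $\{h_\mu(x) : \mu = 1, \dots, M\}$, each with $h_\mu(x) \in \{-1, +1\}$.
Assign weights $D_t(i)$ to the data. Initialize $D_1(i) = 1/N$.
Update rule: $D_{t+1}(i) = D_t(i) \exp(-\alpha_t y_i h_t(x_i)) / Z_t[\alpha_t, h_t]$, where $Z_t[\alpha, h] = \sum_{i=1}^{N} D_t(i) \exp(-\alpha y_i h(x_i))$.
Set $(\alpha_t, h_t) = \arg\min_{\alpha, h} Z_t[\alpha, h]$.
Strong classifier: $H(x) = \text{sign}\left(\sum_{t=1}^{t_{\max}} \alpha_t h_t(x)\right)$.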
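The loop on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own code: it assumes the standard AdaBoost form (weights $D_t(i)$ initialized to $1/N$, each round picking the weak classifier and weight $\alpha_t$ that minimize the normalizer $Z_t$, with $\alpha_t = \frac{1}{2}\log(W_+/W_-)$ for the weighted correct and incorrect masses $W_+$ and $W_-$). The 1D threshold stumps, the toy data, and every function name are hypothetical.

```python
import math

def make_stump(threshold, polarity):
    """Hypothetical weak classifier on scalars: +polarity to the right of threshold."""
    return lambda x: polarity if x > threshold else -polarity

def adaboost(xs, ys, weak_classifiers, t_max):
    """Return a list of (alpha_t, h_t) pairs chosen greedily by minimizing Z_t."""
    n = len(xs)
    weights = [1.0 / n] * n  # D_1(i) = 1/N
    ensemble = []
    for _ in range(t_max):
        best = None
        for h in weak_classifiers:
            # Weighted mass of correctly and incorrectly classified samples.
            w_plus = sum(d for d, x, y in zip(weights, xs, ys) if h(x) == y)
            w_minus = sum(d for d, x, y in zip(weights, xs, ys) if h(x) != y)
            z = 2.0 * math.sqrt(w_plus * w_minus)  # Z_t at the optimal alpha
            if best is None or z < best[0]:
                best = (z, h, w_plus, w_minus)
        _, h_t, w_plus, w_minus = best
        tiny = 1e-12  # guards against log(0) when a stump is perfect
        alpha_t = 0.5 * math.log((w_plus + tiny) / (w_minus + tiny))
        # Reweight: misclassified samples (y * h = -1) gain weight when alpha_t > 0.
        weights = [d * math.exp(-alpha_t * y * h_t(x))
                   for d, x, y in zip(weights, xs, ys)]
        z_t = sum(weights)  # normalizer Z_t
        weights = [d / z_t for d in weights]
        ensemble.append((alpha_t, h_t))
    return ensemble

def strong_classifier(ensemble, x):
    """Sign of the weighted vote; a zero score is mapped to +1."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single stump separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
stumps = [make_stump(t + 0.5, p) for t in range(-1, 10) for p in (1, -1)]
ensemble = adaboost(xs, ys, stumps, t_max=3)
train_errors = sum(strong_classifier(ensemble, x) != y for x, y in zip(xs, ys))
print(train_errors)
```

A stump that is worse than chance on the weighted data receives a negative weight, which is equivalent to using its flipped version with a positive weight.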
4
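Basic AdaBoost Algorithm: Intuition. The update rule for the weights $D_t(i)$ gives greater weight to data $x_i$ that have been misclassified. E.g., $D_2(i) = D_1(i) \exp(-\alpha_1 y_i h_1(x_i)) / Z_1$, and our estimate of the strong classifier after 1 iteration is $H_1(x) = \text{sign}(\alpha_1 h_1(x))$. So, for $\alpha_1 > 0$, the weight increases for $i$ if $y_i h_1(x_i) < 0$ (misclassified) and decreases for $i$ if $y_i h_1(x_i) > 0$ (correctly classified). In general, the strong classifier after $t$ iterations is $H_t(x) = \text{sign}\left(\sum_{\tau=1}^{t} \alpha_\tau h_\tau(x)\right)$.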
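A small numeric illustration of one reweighting step may help. It is a sketch under assumptions: ten equally weighted samples, three of them misclassified by the first weak classifier (the indices are hypothetical), and the standard choice $\alpha_1 = \frac{1}{2}\log(W_+/W_-)$ for the classifier weight.

```python
import math

n = 10
misclassified = {6, 7, 8}   # hypothetical indices where y_i * h_1(x_i) = -1
d1 = [1.0 / n] * n          # D_1(i) = 1/N

w_minus = sum(d1[i] for i in misclassified)   # weighted error, 0.3 here
w_plus = 1.0 - w_minus
alpha_1 = 0.5 * math.log(w_plus / w_minus)    # assumed standard weight choice

# Multiply by exp(+alpha) when misclassified, exp(-alpha) when correct.
unnormalized = [d * math.exp(alpha_1 if i in misclassified else -alpha_1)
                for i, d in enumerate(d1)]
z_1 = sum(unnormalized)                       # normalizer Z_1
d2 = [u / z_1 for u in unnormalized]

print(d2[0], d2[6])
```

Each misclassified sample moves from weight 1/10 to 1/6 and each correctly classified sample drops to 1/14, so the misclassified group carries half of the total weight in the next round.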
5
Basic AdaBoost Algorithm. $Z_t$ is an estimate of the error of the new weak classifier $h_t(\cdot)$ and its weight $\alpha_t$ over the weighted samples. Recall that $(\alpha_t, h_t)$ is obtained by minimizing $Z_t[\alpha, h]$ wrt the weight $\alpha$ and the choice of weak classifier $h$. So each iteration selects a new weak classifier and weight to minimize an error on the weighted samples, then reweights the samples to emphasize the misclassified ones.
6
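So Why Does It Work? We can write: $G(t_{\max}) = \prod_{t=1}^{t_{\max}} Z_t[\alpha_t, h_t] = \frac{1}{N} \sum_{i=1}^{N} \exp\left(-y_i \sum_{t=1}^{t_{\max}} \alpha_t h_t(x_i)\right)$. AdaBoost is a greedy algorithm to minimize $G(t_{\max})$. Minimize $Z_1$ to estimate $(\alpha_1, h_1)$. Then minimize $Z_2$ to estimate $(\alpha_2, h_2)$. And so on.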
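The product form can be checked by unrolling the weight update. The derivation below is a sketch that assumes the standard normalized update $D_{t+1}(i) = D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} / Z_t$ with $D_1(i) = 1/N$; the symbols are assumptions wherever the slide's own equations did not survive.

```latex
\begin{align*}
D_{t_{\max}+1}(i)
  &= \frac{1}{N}\,
     \frac{\exp\left(-y_i \sum_{t=1}^{t_{\max}} \alpha_t h_t(x_i)\right)}
          {\prod_{t=1}^{t_{\max}} Z_t}, \\
\sum_{i=1}^{N} D_{t_{\max}+1}(i) = 1
  \;&\Longrightarrow\;
  \prod_{t=1}^{t_{\max}} Z_t
  = \frac{1}{N} \sum_{i=1}^{N}
    \exp\left(-y_i \sum_{t=1}^{t_{\max}} \alpha_t h_t(x_i)\right).
\end{align*}
```

Because each $Z_t$ depends only on the weights left by earlier rounds, minimizing the factors one at a time is a greedy minimization of the whole product.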
7
So Why Does It Work? $G(t_{\max})$ is an upper bound on the training error of the strong classifier: $\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}[H(x_i) \neq y_i] \leq G(t_{\max})$. So if we can ensure that $Z_t \leq \lambda < 1$ for all $t$, then $G(t_{\max}) \leq \lambda^{t_{\max}}$. Exponential convergence rate.
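The bound rests on the fact that a misclassified sample has margin $y_i F(x_i) \leq 0$, so its exponential term is at least 1. A short sketch of both sides of the inequality follows, with $F(x) = \sum_t \alpha_t h_t(x)$ assumed to be the weighted vote; the margin values, the per-round bound `lam`, and the target are hypothetical.

```python
import math

def training_error(margins):
    """Fraction of samples with y_i * F(x_i) <= 0 (ties counted as errors)."""
    return sum(m <= 0 for m in margins) / len(margins)

def exponential_bound(margins):
    """(1/N) * sum_i exp(-y_i * F(x_i)), the quantity bounding the error."""
    return sum(math.exp(-m) for m in margins) / len(margins)

margins = [0.32, -0.53, 0.98, -0.32, 1.4]   # hypothetical values of y_i * F(x_i)
err = training_error(margins)
bound = exponential_bound(margins)

# If every round achieves Z_t <= lam < 1, the bound is at most lam ** t_max.
lam, target = 0.9, 0.01
rounds_needed = math.ceil(math.log(target) / math.log(lam))
print(err, bound, rounds_needed)
```

The last line shows what the exponential rate means in practice: with the assumed per-round bound, the number of rounds needed grows only logarithmically in the target value.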
8
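Details of Learning. Minimize $Z_t[\alpha, h]$ wrt $\alpha$ and $h$. For classifier $h$, let $W_+ = \sum_{i : h(x_i) = y_i} D_t(i)$ and $W_- = \sum_{i : h(x_i) \neq y_i} D_t(i)$, so that $W_+ + W_- = 1$. Then $Z_t[\alpha, h] = W_+ e^{-\alpha} + W_- e^{\alpha}$. Minimizing wrt $\alpha$ gives $\alpha = \frac{1}{2} \log\frac{W_+}{W_-}$ and $Z_t = 2\sqrt{W_+ W_-}$. So to get $Z_t < 1$ we just need to find a classifier s.t. $W_+ \neq W_-$, i.e. one that is not exactly at chance on the weighted samples.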
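The minimization over the weight can be verified numerically. This sketch assumes the usual decomposition $Z = W_+ e^{-\alpha} + W_- e^{\alpha}$, where $W_+$ and $W_-$ are the weighted masses of correctly and incorrectly classified samples (names chosen here for illustration); the values 0.7 and 0.3 are hypothetical.

```python
import math

def z_value(alpha, w_plus, w_minus):
    """Z as a function of the classifier weight alpha."""
    return w_plus * math.exp(-alpha) + w_minus * math.exp(alpha)

def optimal_alpha(w_plus, w_minus):
    """Stationary point of z_value: alpha = 0.5 * log(W+ / W-)."""
    return 0.5 * math.log(w_plus / w_minus)

w_plus, w_minus = 0.7, 0.3   # hypothetical weighted masses, summing to 1
alpha = optimal_alpha(w_plus, w_minus)
z_min = z_value(alpha, w_plus, w_minus)
closed_form = 2.0 * math.sqrt(w_plus * w_minus)
print(alpha, z_min, closed_form)
```

Since $W_+ + W_- = 1$, the closed form $2\sqrt{W_+ W_-}$ equals 1 only when both masses are 1/2, which is why any classifier away from chance gives $Z < 1$.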
9
AdaBoost Warning. This result says we can keep on improving our classification rate on the training data as we add weak classifiers. But we will probably be memorizing, not generalizing. Validation or cross-validation is required to check that we are not memorizing. Theoretical machine learning results suggest that the critical factor is the size of the set of weak learners.
10
AdaBoost Example: Face Detection in Natural Images (Viola and Jones). Dataset of face images and non-face images. E.g. $x$ is a 32 x 32 image. A set of linear filters applied to $x$ defines a set of weak classifiers. Apply AdaBoost. Extras: need to run the algorithm at different scales. Need to modify the algorithm to prune out image regions which are easily identified as non-faces.
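The slide's definition of the weak classifiers did not survive extraction, so the following is only a sketch of one common construction, assumed here rather than taken from the lecture: threshold the response of a single linear filter. The filter, the threshold, the tiny 2 x 2 "images", and all names are hypothetical.

```python
def filter_response(filt, image):
    """Inner product of a linear filter with an image, both flattened lists."""
    return sum(f * p for f, p in zip(filt, image))

def make_weak_classifier(filt, threshold, polarity=1):
    """Weak classifier: +polarity if the filter response exceeds the threshold."""
    def h(image):
        return polarity if filter_response(filt, image) > threshold else -polarity
    return h

# Hypothetical 2 x 2 filter: top row minus bottom row.
filt = [1, 1, -1, -1]
h = make_weak_classifier(filt, threshold=0.0)

bright_top = [0.9, 0.8, 0.1, 0.2]
bright_bottom = [0.1, 0.2, 0.9, 0.8]
print(h(bright_top), h(bright_bottom))
```

Each filter and threshold pair yields one candidate, so a large filter bank gives AdaBoost a large pool of weak classifiers to choose from.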
11
Statistical Interpretation. AdaBoost is motivated by minimizing a bound on the classification error rate. But the bound is the empirical exponential loss $G = \frac{1}{N} \sum_{i=1}^{N} \exp(-y_i F(x_i))$, where $F(x) = \sum_t \alpha_t h_t(x)$, and AdaBoost minimizes it wrt $F$.
12
Statistical Interpretation. As $N \to \infty$, $G \to \sum_{y = \pm 1} \int dx\, P(x, y)\, e^{-y F(x)}$. Minimizing wrt $F(x)$ at each $x$, we obtain: $F(x) = \frac{1}{2} \log\frac{P(y = 1 | x)}{P(y = -1 | x)}$. If the
13
Statistical Interpretation. With $F(x) = \sum_t \alpha_t h_t(x)$ at the minimizer $F(x) = \frac{1}{2} \log\frac{P(y = 1 | x)}{P(y = -1 | x)}$, we have $P(y | x) = \frac{e^{y F(x)}}{e^{F(x)} + e^{-F(x)}}$. Hence AdaBoost gives us the posterior distribution of $y$ conditioned on the data $x$. In the limit $N \to \infty$, AdaBoost’s strong classifier $\text{sign}(F(x))$ becomes the log-likelihood ratio test.
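The link between the weighted vote and the posterior can be checked with a few lines of arithmetic. The sketch assumes the relation $P(y | x) = e^{y F(x)} / (e^{F(x)} + e^{-F(x)})$, which follows if $F(x)$ equals half the log posterior odds; the function names and the value 0.8 are hypothetical.

```python
import math

def posterior(f_value, y):
    """P(y | x) = exp(y F) / (exp(F) + exp(-F)) for y in {-1, +1}."""
    return math.exp(y * f_value) / (math.exp(f_value) + math.exp(-f_value))

def half_log_odds(p_plus):
    """F = 0.5 * log(P(y=+1 | x) / P(y=-1 | x))."""
    return 0.5 * math.log(p_plus / (1.0 - p_plus))

p = 0.8                      # hypothetical posterior of the positive class
f_value = half_log_odds(p)   # the score that minimizes the exponential loss
print(posterior(f_value, 1), posterior(f_value, -1))
```

The sign of the score agrees with the more probable class, so thresholding the vote at zero is the same decision as comparing the two posteriors.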
14
Summary. AdaBoost is a novel learning algorithm for binary classification. It works by combining weak classifiers and dynamically reweighting the data to emphasize misclassified samples. It converges exponentially fast under mild conditions. It has been applied successfully to face detection. It can also be given a statistical interpretation, as a way to estimate the posterior distributions of the classes conditioned on the data.