Download presentation

Presentation is loading. Please wait.

Published byLucas Durk Modified over 2 years ago

1
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting Feb 18, 2008 10-601 Machine Learning Thanks to Citeseer and : A Short Introduction to Boosting. Yoav Freund, Robert E. Schapire, Journal of Japanese Society for Artificial Intelligence,14(5):771-780, September, 1999

2
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Valiant CACM 1984 and PAC- learning: partly inspired by Turing FormalInformal AIValiant (1984) Turing test (1950) ComplexityTuring machine (1936) 1936 - T Question: what sort of AI questions can we formalize and study with formal methods ?

3
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … “Weak” pac-learning (Kearns & Valiant 88) (PAC learning) say, ε=0.49

4
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … “Weak” PAC-learning is equivalent to “strong” PAC-learning (!) (Schapire 89) (PAC learning) say, ε=0.49 =

5
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … “Weak” PAC-learning is equivalent to “strong” PAC-learning (!) (Schapire 89) The basic idea exploits the fact that you can learn a little on every distribution: –Learn h1 from D0 with error < 49% –Modify D0 so that h1 has error 50% (call this D1) Flip a coin; if heads wait for an example where h1(x)=f(x), otherwise wait for an example where h1(x)!=f(x). –Learn h2 from D1 with error < 49% –Modify D1 so that h1 and h2 always disagree (call this D2) –Learn h3 from D2 with error <49%. –Now vote h1,h2, and h3. This has error better than any of the “weak” hypotheses. –Repeat this as needed to lower the error rate more….

6
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting can actually help experimentally…but… (Drucker, Schapire, Simard) The basic idea exploits the fact that you can learn a little on every distribution: –Learn h1 from D0 with error < 49% –Modify D0 so that h1 has error 50% (call this D1) Flip a coin; if heads wait for an example where h1(x)=f(x), otherwise wait for an example where h1(x)!=f(x). –Learn h2 from D1 with error < 49% –Modify D1 so that h1 and h2 always disagree (call this D2) –Learn h3 from D2 with error <49%. –Now vote h1,h2, and h3. This has error better than any of the “weak” hypotheses. –Repeat this as needed to lower the error rate more….

7
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … AdaBoost: Adaptive Boosting (Freund & Schapire, 1995) Theoretically, one can upper bound an upper bound on the training error of boosting.

8
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting improved decision trees…

9
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting single features performed well…

10
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting didn’t seem to overfit…(!)

11
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting is closely related to margin classifiers like SVM, voted perceptron, … (!)

12
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting and optimization 1999 - FHT Jerome Friedman, Trevor Hastie and Robert Tibshirani. Additive logistic regression: a statistical view of boosting. The Annals of Statistics, 2000. Compared using AdaBoost to set feature weights vs direct optimization of feature weights to minimize log- likelihood, squared error, …

13
1984 - V 1988 - KV 1989 - S 1993 - DSS 1995 - FS 1950 - T … … Boosting in the real world William’s wrap up: –Boosting is not discussed much in the ML research community any more It’s much too well understood –It’s really useful in practice as a meta-learning method Eg, boosted Naïve Bayes usually beats Naïve Bayes –Boosted decision trees are almost always competitive with respect to accuracy very robust against rescaling numeric features, extra features, non-linearities, … somewhat slower to learn and use than many linear classifiers But getting probabilities out of them is a little less reliable.

Similar presentations

OK

Boosting Main idea: train classifiers (e.g. decision trees) in a sequence. a new classifier should focus on those cases which were incorrectly classified.

Boosting Main idea: train classifiers (e.g. decision trees) in a sequence. a new classifier should focus on those cases which were incorrectly classified.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on christian leadership Ppt on 555 timer projects Ppt on merger and amalgamation companies act 2013 Ppt on council of ministers sudan Ppt on rational numbers Free ppt on smart note taker ppt Ppt on e-sales order processing system Ppt on sets theory Ppt on standing order act scores A ppt on leadership