
1 A Speech about Boosting. Presenter: Roberto Valenti.

2 The Paper* *R. Schapire. The Boosting Approach to Machine Learning: An Overview, 2001.

3 I want YOU… …TO UNDERSTAND

4 Overview
Introduction
Adaboost
– How does it work?
– Why does it work?
– Demo
– Extensions
– Performance & Applications
Summary & Conclusions
Questions

5 Introduction to Boosting. Let’s start.

6 Introduction
An example of machine learning: a spam classifier.
– A highly accurate rule is difficult to find.
– An inaccurate rule is easy: e.g. flag messages containing “BUY NOW”.
Introducing boosting: “an effective method of producing an accurate prediction rule from inaccurate rules”.

7 Introduction
History of boosting:
– 1989: Schapire. The first provable polynomial-time boosting algorithm.
– 1990: Freund. Much more efficient, but with practical drawbacks.
– 1995: Freund & Schapire. Adaboost: the focus of this presentation.
– …

8 Introduction
The boosting approach: turn lots of weak classifiers into one strong classifier.
Boosting key points:
– Give more importance to misclassified data.
– Find a way to combine the weak classifiers into a general rule.

9 Adaboost How does it work?

10 Adaboost – How does it work?

11 Base Learner
Job:
– Find a base hypothesis h_t: X -> {-1, +1}.
– Minimize the weighted error ε_t = Pr_{i~D_t}[h_t(x_i) ≠ y_i].
– Choose α_t = ½ ln((1 − ε_t) / ε_t).

12 Adaboost – How does it work?
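
The formulas on slides 10–12 did not survive the transcript, so here is a minimal sketch of the standard binary AdaBoost loop from the Schapire overview cited on slide 2. It assumes NumPy arrays with labels in {-1, +1} and uses a simple decision stump as the weak learner; the function names (adaboost, stump_learner) and the choice of stumps are illustrative, not taken from the slides.

import numpy as np

def stump_learner(X, y, D):
    """Weak learner: the 1-dimensional decision stump with the smallest weighted error under D."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                err = float(np.sum(D * (pred != y)))
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    _, j, thr, sign = best
    return lambda Z: sign * np.where(Z[:, j] > thr, 1, -1)

def adaboost(X, y, T=50, base_learner=stump_learner):
    """Binary AdaBoost for labels y in {-1, +1} (the loop outlined on slides 10-12)."""
    m = len(y)
    D = np.full(m, 1.0 / m)                      # D_1(i) = 1/m: start with uniform weights
    hyps, alphas = [], []
    for t in range(T):
        h = base_learner(X, y, D)                # base hypothesis h_t
        pred = h(X)
        eps = float(np.sum(D * (pred != y)))     # weighted error eps_t
        if eps >= 0.5:                           # no edge over random guessing: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # alpha_t = 1/2 ln((1-eps)/eps)
        D = D * np.exp(-alpha * y * pred)        # up-weight mistakes, down-weight correct points
        D = D / D.sum()                          # normalize by Z_t so D_{t+1} is a distribution
        hyps.append(h)
        alphas.append(alpha)
    # Final strong classifier H(x) = sign(sum_t alpha_t h_t(x))
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hyps)))

Usage would look like H = adaboost(X_train, y_train, T=100) followed by y_hat = H(X_test).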

13 Adaboost Why does it work?

14 Adaboost – Why does it work?
Basic property: it reduces the training error.
On binary distributions, write the edge of round t as γ_t = ½ − ε_t.
The training error of the combined classifier is bounded by ∏_t 2√(ε_t(1 − ε_t)) = ∏_t √(1 − 4γ_t²), which is at most e^(−2 Σ_t γ_t²) (≤ e^(−2Tγ²) if every edge is at least γ) -> it drops exponentially!
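
As a small worked illustration of this bound: if every base classifier has edge γ_t = 0.1 (weighted error 0.4), then after T = 200 rounds the training error is at most e^(−2·200·0.1²) = e^(−4) ≈ 0.018, i.e. below 2%.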

15 Adaboost – Why does it work?
Generalization error bounded by Pr[H(x) ≠ y] + Õ(√(Td/m)), where:
– T = number of iterations
– m = sample size
– d = Vapnik-Chervonenkis dimension of the base hypothesis space
– Pr[.] = empirical probability
– Õ = hides logarithmic and constant factors
Overfitting in T!

16 Margins of the training examples
margin(x, y) = y · (Σ_t α_t h_t(x)) / (Σ_t α_t) Є [-1, +1]
– Positive only if the example is correctly classified by H.
– Its magnitude measures the confidence in the prediction.
Qualitative explanation of effectiveness – not quantitative.
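
A small worked example: with three base classifiers of weights α = 0.4, 0.35 and 0.25 voting +1, +1 and −1 on an example whose true label is +1, the margin is (0.4 + 0.35 − 0.25)/(0.4 + 0.35 + 0.25) = 0.5; the example is classified correctly, with moderate confidence.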

17 Adaboost – Other View
Adaboost as a zero-sum game:
– Game matrix M
– Row player: Adaboost
– Column player: Base learner
– Row player plays rows with distribution P
– Column player plays columns with distribution Q
– Expected loss: PᵀMQ
– Play a repeated game over the matrix M.

18 Adaboost – Other View
Von Neumann’s minmax theorem:
– If for every distribution there exists a base classifier with edge γ (error ≤ ½ − γ),
– then there exists a combination of base classifiers with margin > 2γ.
Adaboost has the potential of success.
Relations with linear programming and online learning.

19 Adaboost Demo

20 Demo

21 Adaboost Extensions

22 Adaboost - Extensions
History of boosting:
– …
– 1997: Freund & Schapire
  Adaboost.M1: the first multiclass generalization; fails if the weak learner achieves less than 50% accuracy.
  Adaboost.M2: creates a set of binary problems (“for x, is label l1 better than l2?”).
– 1999: Schapire & Singer
  Adaboost.MH: “for x, is label l1 better than one of the others?” (see the sketch below).
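
The one-vs-rest question behind Adaboost.MH can be made concrete with a small sketch; the function name and data layout are illustrative assumptions, not from the slides. Each multiclass example is expanded into one binary example per candidate label, asking whether that label is correct for x, and binary boosting (e.g. the sketch after slide 12) is then run over these (example, label) pairs.

def expand_for_mh(examples, labels):
    """Adaboost.MH-style reduction: turn each example (x, Y) with correct label set Y
    into one binary example ((x, l), +1/-1) per candidate label l."""
    binary = []
    for x, Y in examples:
        for l in labels:
            binary.append(((x, l), +1 if l in Y else -1))
    return binary

# e.g. a document whose only correct label is "a", with candidates ("a", "b", "c"):
# expand_for_mh([("doc1", {"a"})], ("a", "b", "c"))
# -> [(("doc1", "a"), 1), (("doc1", "b"), -1), (("doc1", "c"), -1)]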

23 Adaboost - Extensions
– 2001: Rochery, Schapire et al. Incorporating human knowledge.
Adaboost is data-driven; human knowledge can compensate for a lack of data.
Human expert:
– Chooses a rule p mapping each x to p(x) Є [0,1].
– Difficult! But simple rules should work.

24 Adaboost - Extensions
To incorporate human knowledge, a relative-entropy term is added to the objective,
where RE(p||q) = p ln(p/q) + (1 − p) ln((1 − p)/(1 − q)).
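
As a small illustration of the relative entropy defined above, presumably penalizing model estimates q that stray from the human rule p(x); the function name is only for illustration:

import math

def relative_entropy(p, q):
    """RE(p || q) = p ln(p/q) + (1 - p) ln((1 - p)/(1 - q)), for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# relative_entropy(0.9, 0.5) ~= 0.368; the value is 0 when q = p and grows as q moves away from p.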

25 Adaboost Performance and Applications

26 Adaboost - Performance & Applications
Error rates on text categorization: Reuters newswire articles and AP newswire headlines.

27 Adaboost - Performance & Applications
Six-class text classification (TREC): training error and test error.

28 Adaboost - Performance & Applications
Spoken-language classification: “How may I help you” and “Help desk”.

29 Adaboost - Performance & Applications
OCR outliers: example images annotated with class, label1/weight1, label2/weight2, shown at rounds 12, 25 and 4.

30 Adaboost - Applications
Text filtering
– Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998.
Routing
– Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000.
“Ranking” problems
– Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998.
Image retrieval
– Tieu, Viola. Boosting image retrieval. 2000.
Medical diagnosis
– Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001.

31 Adaboost - Applications
Learning problems in natural language processing
– Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999.
– Collins. Discriminative reranking for natural language parsing. 2000.
– Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000.
– Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999.
– Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001.
– Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001.

32 Summary and Conclusions At last…

33 Summary
– Boosting takes a weak learner and converts it into a strong one.
– It works by asymptotically minimizing the training error.
– It effectively maximizes the margin of the combined hypothesis.
– Adaboost is related to many other topics.
– It works!

34 Conclusions
Adaboost advantages:
– Fast, simple and easy to program.
– No parameters required.
Performance depends on:
– Sample size: (Skurichina, 2001) boosting is only useful for large sample sizes.
– Choice of weak classifier.
– Incorporation of classifier weights.
– Data distribution.

35 Questions? (don’t be mean)

