
1 Ensemble Learning: AdaBoost. Jianping Fan, Dept of Computer Science, UNC-Charlotte

2 Ensemble Learning. A machine learning paradigm in which multiple learners are combined to solve a problem. [Figure: previously, a single learner was trained per problem; in an ensemble, many learners are trained and their outputs combined.] The generalization ability of the ensemble is usually significantly better than that of an individual learner. Boosting is one of the most important families of ensemble methods.

3 A Brief History. Bootstrapping: resampling for estimating a statistic. Bagging, Boosting (Schapire 1989), and AdaBoost (Schapire 1995): resampling for classifier design.

4 Bootstrap Estimation. Repeatedly draw n samples from D. For each set of samples, estimate the statistic. The bootstrap estimate is the mean of the individual estimates. Used to estimate a statistic (parameter) and its variance.
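A minimal sketch of bootstrap estimation, assuming the data sit in a NumPy array and the sample mean is the statistic of interest (the function and variable names are illustrative, not from the slides):

```python
import numpy as np

def bootstrap_estimate(data, statistic=np.mean, num_resamples=1000, rng=None):
    """Estimate a statistic and its variance by bootstrap resampling."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    # Repeatedly draw n samples from the data with replacement and
    # compute the statistic on each resample.
    estimates = np.array([
        statistic(rng.choice(data, size=n, replace=True))
        for _ in range(num_resamples)
    ])
    # The bootstrap estimate is the mean of the individual estimates;
    # their spread gives a variance estimate for the statistic.
    return estimates.mean(), estimates.var()

# Example: bootstrap estimate of the mean of a small sample.
data = np.array([2.1, 2.9, 3.3, 4.0, 4.8, 5.5])
est, var = bootstrap_estimate(data)
print(f"bootstrap mean: {est:.3f}, variance of the estimate: {var:.3f}")
```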

5 Bagging (Aggregate Bootstrapping). For i = 1..M: draw n* < n samples from D with replacement and learn classifier C_i. The final classifier is a vote of C_1..C_M. Increases classifier stability / reduces variance.
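A minimal bagging sketch, assuming NumPy arrays, scikit-learn decision trees as the base classifiers, labels in {-1, +1}, and a majority vote at prediction time (the helper names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, num_models=25, sample_frac=0.8, rng=None):
    """Train num_models trees, each on a bootstrap sample of the data."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(X)
    n_star = int(sample_frac * n)  # draw n* < n samples per model
    models = []
    for _ in range(num_models):
        idx = rng.choice(n, size=n_star, replace=True)  # sample with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final classifier is a majority vote of C_1..C_M (labels in {-1, +1})."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)
```

Voting over many high-variance base learners stabilizes the prediction, which is why bagging is described above as reducing variance.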

6 Bagging. [Figure: random samples drawn with replacement from the training data; the ML algorithm trains f_1, f_2, ..., f_T, which are combined into the final classifier f.]

7 Boosting. [Figure: starting from the training sample, each round trains f_1, f_2, ..., f_T on a re-weighted sample that emphasizes previous mistakes; the learners are combined into the final classifier f.]

8 Revisit Bagging

9 Boosting Classifier

10 Bagging vs. Boosting. Bagging: the construction of complementary base-learners is left to chance and to the instability of the learning methods. Boosting: actively seeks to generate complementary base-learners by training the next base-learner on the mistakes of the previous learners.

11 Boosting (Schapire 1989). Randomly select n_1 < n samples from D without replacement to obtain D_1; train weak learner C_1. Select n_2 < n samples from D, with half of them misclassified by C_1, to obtain D_2; train weak learner C_2. Select all samples from D on which C_1 and C_2 disagree; train weak learner C_3. The final classifier is a vote of the three weak learners.
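A rough sketch of this three-learner scheme, assuming NumPy arrays, scikit-learn decision stumps as the weak learners, and labels in {-1, +1}; all names are illustrative and edge cases (e.g. an empty D2 or D3) are not handled:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_1989_fit(X, y, n1, n2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    stump = lambda: DecisionTreeClassifier(max_depth=1)

    # D1: n1 samples drawn without replacement; train C1.
    i1 = rng.choice(len(X), size=n1, replace=False)
    c1 = stump().fit(X[i1], y[i1])

    # D2: n2 samples, half of them misclassified by C1; train C2.
    wrong = np.flatnonzero(c1.predict(X) != y)
    right = np.flatnonzero(c1.predict(X) == y)
    half = n2 // 2
    i2 = np.concatenate([rng.choice(wrong, size=half, replace=True),
                         rng.choice(right, size=n2 - half, replace=True)])
    c2 = stump().fit(X[i2], y[i2])

    # D3: all samples on which C1 and C2 disagree; train C3.
    i3 = np.flatnonzero(c1.predict(X) != c2.predict(X))
    c3 = stump().fit(X[i3], y[i3])
    return c1, c2, c3

def boost_1989_predict(classifiers, X):
    # Final classifier is a majority vote of C1, C2, C3.
    votes = sum(c.predict(X) for c in classifiers)
    return np.where(votes >= 0, 1, -1)
```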

12 AdaBoost (Schapire 1995). Instead of sampling, re-weight the training examples. The previous weak learner has only 50% accuracy over the new distribution. Can be used to learn weak classifiers. Final classification is based on a weighted vote of the weak classifiers.

13 AdaBoost Terms. Learner = hypothesis = classifier. Weak learner: < 50% error over any distribution. Strong classifier: a thresholded linear combination of the weak learners' outputs.

14 AdaBoost = Adaptive + Boosting: a learning algorithm for building a strong classifier out of a lot of weaker ones.

15 AdaBoost Concept. [Figure: many weak classifiers, each slightly better than random, are combined into a strong classifier.]

16 The Weak Classifiers. [Figure: weak classifiers slightly better than random are combined into a strong classifier.] Each weak classifier learns by considering one simple feature, and the T most beneficial features for classification should be selected. How to: define the features? select beneficial features? train the weak classifiers? manage (weight) the training samples? associate a weight with each weak classifier?
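A minimal sketch of such a weak classifier: a decision stump that considers one feature and one threshold, trained to minimize the weighted error over the current sample weights (the form and names here are assumptions for illustration, not from the slides):

```python
import numpy as np

def train_stump(X, y, weights):
    """Pick the single feature, threshold, and polarity with lowest weighted error."""
    best = (np.inf, None)  # (weighted error, (feature, threshold, polarity))
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for polarity in (+1, -1):
                pred = np.where(X[:, f] >= thr, polarity, -polarity)
                err = weights[pred != y].sum()
                if err < best[0]:
                    best = (err, (f, thr, polarity))
    return best  # weighted error and stump parameters

def stump_predict(params, X):
    f, thr, polarity = params
    return np.where(X[:, f] >= thr, polarity, -polarity)
```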

17 The Strong Classifier. [Figure: weak classifiers slightly better than random are combined into a strong classifier.] How good will the strong classifier be?

18 The AdaBoost Algorithm. Given: (x_1, y_1), ..., (x_m, y_m), where x_i in X and y_i in {-1, +1}. Initialization: D_1(i) = 1/m. For t = 1, ..., T: find the classifier h_t : X -> {-1, +1} which minimizes the error with respect to D_t, i.e., h_t = argmin_{h_j} eps_j with eps_j = sum_{i=1}^m D_t(i) [y_i != h_j(x_i)]; weight the classifier: alpha_t = (1/2) ln((1 - eps_t)/eps_t); update the distribution: D_{t+1}(i) = D_t(i) exp(-alpha_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization constant.

19 The AdaBoost Algorithm (continued). As above, and output the final classifier: H(x) = sign( sum_{t=1}^T alpha_t h_t(x) ).
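A compact sketch of the algorithm above, reusing the decision-stump helpers (train_stump, stump_predict) from the earlier snippet; labels are assumed to be in {-1, +1} and all names are illustrative:

```python
import numpy as np

def adaboost_fit(X, y, num_rounds=20):
    """Return a list of (alpha_t, stump parameters) pairs."""
    m = len(X)
    D = np.full(m, 1.0 / m)              # D_1(i) = 1/m
    ensemble = []
    for _ in range(num_rounds):
        # Find the stump h_t minimizing the weighted error w.r.t. D_t.
        eps, params = train_stump(X, y, D)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)   # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - eps) / eps)  # weight the classifier
        pred = stump_predict(params, X)
        # Update the distribution and renormalize (Z_t).
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()
        ensemble.append((alpha, params))
    return ensemble

def adaboost_predict(ensemble, X):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(alpha * stump_predict(params, X) for alpha, params in ensemble)
    return np.where(score >= 0, 1, -1)
```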

20 Boosting Illustration: Weak Classifier 1.

21 Boosting Illustration: Weights Increased.

22 The AdaBoost Algorithm. Update the distribution: D_{t+1}(i) = D_t(i) exp(-alpha_t y_i h_t(x_i)) / Z_t, where Z_t is a normalization factor and typically alpha_t = (1/2) ln((1 - eps_t)/eps_t) > 0, so that the weights of incorrectly classified examples are increased and the base learner is forced to focus on the hard examples in the training set.

23 Boosting Illustration: Weak Classifier 2.

24 Boosting Illustration: Weights Increased.

25 Boosting Illustration: Weak Classifier 3.

26 Boosting Illustration: the final classifier is a combination of the weak classifiers.

27 The AdaBoost Algorithm (recap of slide 19). What goal does AdaBoost want to reach?

28 The AdaBoost Algorithm (recap of slide 19). What goal does AdaBoost want to reach? The classifier weights alpha_t and the distribution update are goal dependent.

29 Goal. Minimize the exponential loss: loss = sum_{i=1}^m exp(-y_i H(x_i)), where H(x) = sum_{t=1}^T alpha_t h_t(x) is the real-valued strong classifier whose sign gives the final prediction.

30 Goal. Minimizing the exponential loss exp(-y H(x)) pushes the margin y H(x) of every training example to be large and positive, i.e., it maximizes the margin y H(x).

31 Goal. Final classifier: H(x) = sum_{t=1}^T alpha_t h_t(x). Minimize loss = sum_{i=1}^m exp(-y_i H(x_i)). Define H_t(x) = sum_{s=1}^t alpha_s h_s(x) with H_0(x) = 0, so that H(x) = H_T(x); the loss is then minimized greedily, one round at a time.

32 At round t, minimize the loss over alpha_t: loss(alpha_t) = sum_{i=1}^m exp(-y_i (H_{t-1}(x_i) + alpha_t h_t(x_i))). Set d loss / d alpha_t = 0.

33 Write w_t(i) = exp(-y_i H_{t-1}(x_i)). Then d loss / d alpha_t = -sum_{i=1}^m w_t(i) y_i h_t(x_i) exp(-alpha_t y_i h_t(x_i)) = 0.

34 Since y_i h_t(x_i) = +1 on correctly classified examples and -1 on misclassified ones, the condition splits into e^{-alpha_t} sum_{i correct} w_t(i) = e^{alpha_t} sum_{i wrong} w_t(i).

35 Define eps_t = (sum_{i wrong} w_t(i)) / (sum_{i=1}^m w_t(i)), the weighted error of h_t. The condition becomes (1 - eps_t) e^{-alpha_t} = eps_t e^{alpha_t}.

36 Solving gives alpha_t = (1/2) ln((1 - eps_t)/eps_t): the smaller the weighted error eps_t, the larger the weight alpha_t given to the weak classifier.

37 The normalized weights D_t(i) = w_t(i) / sum_j w_t(j) are exactly AdaBoost's distribution over the examples, so greedy minimization of the exponential loss recovers the algorithm's choice of alpha_t.

38 The loss reduction at round t is maximized when h_t has the smallest weighted error eps_t, i.e., when the weak learner fits the current distribution D_t as well as possible.

39 At time t, the (unnormalized) weight of example i is w_t(i) = exp(-y_i H_{t-1}(x_i)) = prod_{s=1}^{t-1} exp(-alpha_s y_i h_s(x_i)).

40 At time 1, w_1(i) = 1 for every example (the uniform distribution D_1(i) = 1/m); at time t+1, w_{t+1}(i) = w_t(i) exp(-alpha_t y_i h_t(x_i)), which after normalization is exactly the distribution update D_{t+1}(i) = D_t(i) exp(-alpha_t y_i h_t(x_i)) / Z_t.
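A small numeric check of the closed form above, using illustrative values only: for fixed example weights and weak-classifier predictions, the weighted exponential loss as a function of alpha should be minimized at alpha = (1/2) ln((1 - eps)/eps).

```python
import numpy as np

# Illustrative weighted sample: weights D, true labels y, and a weak
# classifier's predictions h (labels in {-1, +1}).
D = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
y = np.array([+1, -1, +1, +1, -1])
h = np.array([+1, -1, -1, +1, +1])  # misclassifies the 3rd and 5th examples

eps = D[h != y].sum()                       # weighted error of h
alpha_star = 0.5 * np.log((1 - eps) / eps)  # closed-form minimizer

# Brute-force minimization of the weighted exponential loss over alpha.
alphas = np.linspace(-3, 3, 100001)
losses = np.array([(D * np.exp(-a * y * h)).sum() for a in alphas])
alpha_grid = alphas[losses.argmin()]

print(f"eps = {eps:.3f}, closed form alpha = {alpha_star:.4f}, "
      f"grid-search alpha = {alpha_grid:.4f}")  # the two should agree closely
```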

41 Pros and Cons of AdaBoost. Advantages: very simple to implement; performs feature selection, resulting in a relatively simple classifier; fairly good generalization. Disadvantages: suboptimal solution; sensitive to noisy data and outliers.

42 Intuition. Train a set of weak hypotheses h_1, ..., h_T. The combined hypothesis H is a weighted majority vote of the T weak hypotheses; each hypothesis h_t has a weight alpha_t. During training, focus on the examples that are misclassified: at round t, example x_i has the weight D_t(i).

43 Basic Setting. Binary classification problem. Training data: (x_1, y_1), ..., (x_m, y_m) with y_i in {-1, +1}. D_t(i) is the weight of x_i at round t, with D_1(i) = 1/m. A learner L finds a weak hypothesis h_t : X -> Y given the training set and D_t. The error of a weak hypothesis h_t is eps_t = Pr_{i ~ D_t}[h_t(x_i) != y_i] = sum_{i : h_t(x_i) != y_i} D_t(i).

44 The Basic AdaBoost Algorithm. For t = 1, ..., T: train the weak learner using the training data and D_t; get h_t : X -> {-1, +1} with error eps_t; choose alpha_t = (1/2) ln((1 - eps_t)/eps_t); update D_{t+1}(i) = D_t(i) exp(-alpha_t y_i h_t(x_i)) / Z_t, where Z_t normalizes D_{t+1} to a distribution.
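For reference, an off-the-shelf run of this kind of algorithm, assuming scikit-learn's AdaBoostClassifier (which boosts decision stumps by default); the dataset here is just an illustrative placeholder, not from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Illustrative two-class dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# T = 100 boosting rounds over decision-stump weak learners.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```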

45 The General AdaBoost Algorithm

46 Pros and Cons of AdaBoost (summary). Advantages: very simple to implement; performs feature selection, resulting in a relatively simple classifier; fairly good generalization. Disadvantages: suboptimal solution; sensitive to noisy data and outliers.

