
1
Ensemble Methods (also known as classifier combination): an ensemble method constructs a set of base classifiers from the training data, then predicts the class label of previously unseen records by aggregating the predictions of the individual classifiers.

2
General Idea

3
**Why does it work? Suppose there are 25 base classifiers**

Each classifier has an error rate of ε = 0.35. Assume the classifiers are independent. The ensemble (majority vote) makes a wrong prediction only if at least 13 of the 25 base classifiers are wrong:

$$P(\text{ensemble errs}) = \sum_{i=13}^{25} \binom{25}{i}\,\varepsilon^{i}\,(1-\varepsilon)^{25-i} \approx 0.06$$
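
A quick way to evaluate this sum (a sketch; nothing is assumed beyond the slide's numbers, and the threshold of 13 is simply the smallest majority of 25):

```python
from math import comb

eps, n = 0.35, 25  # error rate of each base classifier, ensemble size

# The majority vote is wrong only when 13 or more of the 25 base
# classifiers are wrong simultaneously.
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
                     for i in range(13, n + 1))
print(round(ensemble_error, 3))  # 0.06 -- far below the base rate of 0.35
```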

4
Methods: by manipulating the training dataset, a classifier is built for each sampled subset of the training dataset. Two such ensemble methods are bagging (bootstrap aggregating) and boosting.

5
Characteristics: ensemble methods work better with unstable classifiers, i.e., base classifiers that are sensitive to minor perturbations in the training set (for example, decision trees and ANNs). The variability among training examples is one of the primary sources of error in a classifier.

6
**Bias-Variance Decomposition**

Consider the trajectory of a projectile launched with force f at a particular angle θ. Suppose the target is t, but the projectile hits at x, a distance d away from t. The observed distance d can be divided into three components, attributable to the angle (bias), the variability of the force (variance), and intrinsic noise at the target.
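
In symbols, the decomposition the slide describes can be sketched as follows (the subscripts indicate which quantity each component depends on; this rendering is an assumption based on the standard bias-variance analogy):

$$d_{f,\theta}(x, t) = \mathrm{Bias}_{\theta} + \mathrm{Variance}_{f} + \mathrm{Noise}_{t}$$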

7
Two Decision Trees (1)

8
Two Decision Trees (2)

- Bias: the stronger the assumptions a classifier makes about the nature of its decision boundary, the larger the classifier's bias. A smaller tree makes stronger assumptions; if those assumptions are wrong, the algorithm cannot learn the target.
- Variance: variability in the training data affects the expected error, because different compositions of the training set may lead to different decision boundaries.
- Intrinsic noise in the target class: the target class for some domains can be non-deterministic, i.e., the same attribute values can appear with different class labels.

9
**Bagging: Sampling with Replacement**

Build a classifier on each bootstrap sample. Each training record has probability 1 - (1 - 1/n)^n of being selected at least once in a bootstrap sample of size n; when n is large, a bootstrap sample D_i therefore contains about 63.2% of the distinct records in the training data.
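
A quick numerical check of the formula above (nothing assumed beyond the slide's expression):

```python
# Probability that a given record appears at least once in a
# bootstrap sample of size n: 1 - (1 - 1/n)^n  ->  1 - 1/e ~ 0.632
for n in (10, 100, 10_000):
    print(n, round(1 - (1 - 1/n)**n, 4))
# 10     0.6513
# 100    0.6340
# 10000  0.6321
```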

10
Bagging Algorithm
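
Since the algorithm itself was shown as a figure, here is a minimal sketch of it in Python. Two choices below are assumptions not fixed by the slides: scikit-learn's DecisionTreeClassifier as the unstable base learner, and integer-coded class labels for the majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # assumed base learner

def bagging_fit(X, y, T=10, seed=0):
    """Train T base classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(T):
        idx = rng.integers(0, n, size=n)  # draw n records with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate the T predictions by majority vote."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)  # (T, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```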

11
A Bagging Example (1) Consider a one-level binary decision tree (a decision stump) that splits on x <= k, where k is a split point chosen to minimize the entropy. Without bagging, the best decision stump is x <= 0.35 or x >= 0.75.

12
A Bagging Example (2)

13
A Bagging Example (3)

14
A Bagging Example (4)

15
Summary on Bagging Bagging improves the generalization error by reducing the variance of the base classifier. Its effect depends on the stability of the base classifier. If the base classifier is unstable, bagging helps to reduce the errors associated with random fluctuations in the training data. If the base classifier is stable, the error of the ensemble is primarily caused by bias in the base classifier; in that case bagging may even make the error larger, because each bootstrap sample is missing about 37% of the distinct training records, effectively shrinking the training set.

16
Boosting An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records. Initially, all N records are assigned equal weights. Unlike bagging, the weights may change at the end of each boosting round. The weights can be used by a base classifier to learn a model that is biased toward higher-weight examples.

17
Boosting Records that are wrongly classified will have their weights increased; records that are classified correctly will have their weights decreased. Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds.

18
**Example: AdaBoost**

Base classifiers: C_1, C_2, ..., C_T. The weighted error rate of classifier C_i (with record weights w_j summing to 1) is

$$\varepsilon_i = \sum_{j=1}^{N} w_j \,\delta\!\big(C_i(x_j) \neq y_j\big)$$

Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$
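
The importance α_i is large for accurate classifiers and approaches zero as ε_i approaches 0.5; a quick illustration (the ε values below are arbitrary):

```python
from math import log

for eps in (0.05, 0.25, 0.35, 0.49):
    alpha = 0.5 * log((1 - eps) / eps)
    print(eps, round(alpha, 3))
# 0.05 -> 1.472   0.25 -> 0.549   0.35 -> 0.31   0.49 -> 0.02
```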

19
**Example: AdaBoost**

Weight update after round i, where Z_i is a normalization factor chosen so that the updated weights sum to 1:

$$w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \times \begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$$

If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated.

Classification:

$$C^{*}(x) = \underset{y}{\arg\max} \sum_{i=1}^{T} \alpha_i \,\delta\!\big(C_i(x) = y\big)$$
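
A minimal sketch of these rules in Python. Assumptions not fixed by the slides: labels coded as -1/+1, scikit-learn decision stumps as the base classifiers, and reweighting in place of the slide's resampling; the reset at ε >= 0.5 follows the slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stumps assumed as base learners

def adaboost_fit(X, y, T=10):
    """AdaBoost sketch; y must be coded as -1/+1."""
    n = len(X)
    w = np.full(n, 1.0 / n)          # all records start with equal weight
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))            # weighted error rate
        if eps >= 0.5:                           # revert weights, per the slide
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / eps)    # classifier importance
        w *= np.exp(-alpha * y * pred)           # shrink correct, grow misclassified
        w /= w.sum()                             # the Z_i normalization
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """Weighted vote: sign of sum_i alpha_i * C_i(x)."""
    return np.sign(sum(a * m.predict(X) for a, m in zip(models, alphas)))
```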

21
A Boosting Example (1) Consider a one-level binary decision tree (a decision stump) that splits on x <= k, where k is a split point chosen to minimize the entropy. Without boosting, the best decision stump is x <= 0.35 or x >= 0.75.

22
A Boosting Example (2)

23
A Boosting Example (3)
