Ensemble Methods: presentation transcript of © Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

1 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Ensemble Methods l An ensemble method constructs a set of base classifiers from the training data –Also called an ensemble or classifier combination l It predicts the class label of previously unseen records by aggregating the predictions made by multiple classifiers

2 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 General Idea

3 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Why does it work? l Suppose there are 25 base classifiers –Each classifier has error rate ε = 0.35 –Assume the classifiers are independent –The ensemble (majority vote) makes a wrong prediction only if at least 13 of the 25 classifiers are wrong, which happens with probability sum over i = 13..25 of C(25, i) ε^i (1 - ε)^(25 - i) ≈ 0.06
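
The binomial computation behind this slide's (missing) figure can be checked with a few lines of Python; this is only a sketch of the arithmetic, using exactly the assumptions stated above (25 independent classifiers combined by majority vote):

# Check the ensemble error rate for 25 independent base classifiers, each
# with error rate eps = 0.35: a majority-vote ensemble errs only when at
# least 13 of the 25 classifiers err.
from math import comb

eps, n = 0.35, 25
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
                     for i in range(13, n + 1))
print(f"ensemble error rate: {ensemble_error:.3f}")  # about 0.06, far below 0.35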

4 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Methods l By manipulating the training dataset: a classifier is built on each sampled subset of the training data –Two such ensemble methods: bagging (bootstrap aggregating) and boosting

5 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Characteristics l Ensemble methods work better with unstable classifiers –Base classifiers that are sensitive to minor perturbations in the training set, for example decision trees and ANNs –The variability among training examples is one of the primary sources of error in a classifier

6 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Bias-Variance Decomposition l Consider the trajectory of a projectile launched with force f at angle θ –Suppose the target is t, but the projectile hits x at distance d away from t –The observed error d can be divided into three components: bias, variance, and intrinsic noise
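
To make the decomposition concrete, the small simulation below (illustrative only, not from the slides; the setup and names are assumptions) estimates the bias-squared, variance, and noise terms for a deliberately over-simple estimator under repeated sampling of the training data:

# Illustrative simulation: estimate bias^2, variance, and noise for a
# constant-mean estimator of y = sin(x) + Gaussian noise at one test point.
import numpy as np

rng = np.random.default_rng(0)
x_test, noise_sd, trials, n = 1.0, 0.3, 2000, 20
true_f = np.sin(x_test)

predictions = []
for _ in range(trials):
    x = rng.uniform(0, 2, n)
    y = np.sin(x) + rng.normal(0, noise_sd, n)
    predictions.append(y.mean())          # a highly biased, low-variance model

pred = np.array(predictions)
bias_sq = (pred.mean() - true_f) ** 2      # systematic error of the model class
variance = pred.var()                      # sensitivity to the training sample
print(f"bias^2={bias_sq:.3f}  variance={variance:.3f}  noise={noise_sd**2:.3f}")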

7 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Two Decision Trees (1)

8 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Two Decision Trees (2) l Bias: The stronger the assumptions a classifier makes about the nature of its decision boundary, the larger the classifier's bias will be –A smaller tree embodies a stronger assumption –If the assumption is too strong, the algorithm cannot learn the target l Variance: Variability in the training data affects the expected error, because different compositions of the training set may lead to different decision boundaries l Intrinsic noise in the target class –The target class for some domains can be non-deterministic –Records with the same attribute values can have different class labels

9 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Bagging l Sampling with replacement l Build a classifier on each bootstrap sample l Each record has probability 1 - (1 - 1/n)^n of being selected; when n is large, a bootstrap sample D_i contains about 63.2% of the distinct training records
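
The 63.2% figure follows from 1 - (1 - 1/n)^n approaching 1 - 1/e for large n; a quick empirical check (a sketch assuming a plain random bootstrap, not code from the slides):

# Verify that a bootstrap sample of size n covers about 63.2% of the
# distinct training records when n is large (1 - 1/e ~ 0.632).
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
sample = rng.integers(0, n, size=n)            # sampling with replacement
print(f"fraction of distinct records: {len(np.unique(sample)) / n:.3f}")
print(f"1 - (1 - 1/n)^n = {1 - (1 - 1/n)**n:.3f}")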

10 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Bagging Algorithm
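
The algorithm figure on this slide did not survive extraction; the following is a minimal sketch of the bagging procedure it describes, assuming numpy arrays with integer class labels and scikit-learn decision trees as base classifiers (the function names are illustrative, not from the slides):

# Bagging sketch: train k base classifiers on bootstrap samples and
# combine them by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k, rng):
    """Train k base classifiers, each on a bootstrap sample of (X, y)."""
    models, n = [], len(X)
    for _ in range(k):
        idx = rng.integers(0, n, size=n)       # sample with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Predict by majority vote over the base classifiers (integer labels)."""
    votes = np.array([m.predict(X) for m in models])
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

A typical call would be models = bagging_fit(X, y, 10, np.random.default_rng(0)) followed by bagging_predict(models, X_test).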

11 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Bagging Example (1) l Consider a one-level binary decision tree x ≤ k, where k is a split point chosen to minimize the entropy l Without bagging, the best decision stump is –x ≤ k with k = 0.75
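
How such a one-level tree (decision stump) can be chosen is sketched below, assuming a single numeric attribute and integer class labels; the helper names are illustrative and the slide's own data (from its missing figure) is not reproduced:

# Find the split point k for a stump "x <= k" that minimizes the weighted
# entropy of the two resulting partitions.
import numpy as np

def entropy(labels):
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def best_stump(x, y):
    u = np.unique(x)
    candidates = (u[:-1] + u[1:]) / 2          # midpoints between attribute values
    best_k, best_h, n = None, np.inf, len(y)
    for k in candidates:
        left, right = y[x <= k], y[x > k]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_k, best_h = k, h
    return best_k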

12 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Bagging Example (2)

13 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Bagging Example (3)

14 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Bagging Example (4)

15 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Summary on Bagging l Bagging improves the generalization error by reducing the variance of the base classifiers l Its benefit depends on the stability of the base classifier l If a base classifier is unstable, bagging helps to reduce the errors associated with random fluctuations in the training data l If a base classifier is stable, the error of the ensemble is primarily caused by bias in the base classifier; bagging may even increase the error, because each bootstrap sample effectively contains about 37% less of the distinct training data

16 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Boosting l An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records –Initially, all N records are assigned equal weights –Unlike bagging, the weights may change at the end of each boosting round –The weights can be used by a base classifier to learn a model that is biased toward higher-weight examples

17 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Boosting l Records that are wrongly classified will have their weights increased l Records that are classified correctly will have their weights decreased –Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds

18 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Example: AdaBoost l Base classifiers: C_1, C_2, …, C_T l Error rate of classifier C_i (the weighted fraction of training records it misclassifies): ε_i = Σ_{j=1..N} w_j δ(C_i(x_j) ≠ y_j), where the weights w_j sum to 1 l Importance of a classifier: α_i = (1/2) ln((1 - ε_i) / ε_i)

19 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Example: AdaBoost l Weight update: w_j^(i+1) = (w_j^(i) / Z_i) exp(-α_i) if C_i(x_j) = y_j, and w_j^(i+1) = (w_j^(i) / Z_i) exp(α_i) if C_i(x_j) ≠ y_j, where Z_i normalizes the weights to sum to 1 l If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated l Classification: C*(x) = argmax_y Σ_{i=1..T} α_i δ(C_i(x) = y)
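
Putting the error rate, importance, weight update, and voting rule together, here is a compact AdaBoost sketch, assuming binary labels in {-1, +1} and scikit-learn decision stumps (the function names are illustrative, not the book's code):

# AdaBoost sketch following the slides' formulas: weighted error eps_i,
# importance alpha_i = 0.5*ln((1-eps)/eps), exponential weight updates,
# and a weighted vote at prediction time.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, rounds):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # start with equal weights
    models, alphas = [], []
    for _ in range(rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = w[pred != y].sum()               # weighted error rate
        if eps >= 0.5:                         # revert weights (slides also resample)
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
        w *= np.exp(-alpha * y * pred)         # shrink correct, grow incorrect
        w /= w.sum()                           # Z_i: renormalize to a distribution
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    scores = sum(a * m.predict(X) for m, a in zip(models, alphas))
    return np.sign(scores)                     # weighted majority vote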

20 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

21 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Boosting Example (1) l Consider a one-level binary decision tree x ≤ k, where k is a split point chosen to minimize the entropy l Without boosting, the best decision stump is –x ≤ k with k = 0.75

22 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Boosting Example (2)

23 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 A Boosting Example (3)

