1 Decision Trees

   ID      Hair     Height    Weight    Lotion   Result
   Sarah   Blonde   Average   Light     No       Sunburn
   Dana    Blonde   Tall      Average   Yes      None
   Alex    Brown    Tall      Average   Yes      None
   Annie   Blonde   Short     Average   No       Sunburn
   Emily   Red      Average   Heavy     No       Sunburn
   Pete    Brown    Tall      Heavy     No       None
   John    Brown    Average   Heavy     No       None
   Katie   Blonde   Short     Light     Yes      None
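The table above is small enough to build a decision tree by hand, but a short sketch shows how an entropy-based tree recovers the Hair/Lotion splits. This assumes scikit-learn and pandas; the slides themselves do not name any library.

```python
# Minimal sketch, assuming scikit-learn and pandas are available.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.DataFrame({
    "Hair":   ["Blonde", "Blonde", "Brown", "Blonde", "Red", "Brown", "Brown", "Blonde"],
    "Height": ["Average", "Tall", "Tall", "Short", "Average", "Tall", "Average", "Short"],
    "Weight": ["Light", "Average", "Average", "Average", "Heavy", "Heavy", "Heavy", "Light"],
    "Lotion": ["No", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
    "Result": ["Sunburn", "None", "None", "Sunburn", "Sunburn", "None", "None", "None"],
}, index=["Sarah", "Dana", "Alex", "Annie", "Emily", "Pete", "John", "Katie"])

# One-hot encode the categorical attributes and fit an entropy-based tree.
X = pd.get_dummies(data.drop(columns="Result"))
y = data["Result"]
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```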

2 Ensemble methods
- Use multiple models to obtain better predictive performance
- Ensembles combine multiple hypotheses to form a (hopefully) better hypothesis
- Combine multiple weak learners to produce a strong learner
- Typically much more computation, since you are training multiple learners
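A minimal sketch of combining several models by a vote, assuming scikit-learn; the particular base classifiers and dataset here are illustrative choices, not taken from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # each hypothesis casts one equal-weight vote
)
ensemble.fit(X, y)
print("training accuracy of the ensemble:", ensemble.score(X, y))
```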

3 Ensemble learners
- Typically combine multiple fast learners (like decision trees)
  - Individually, these fast learners tend to overfit
- The ensemble tends to get better results because significant diversity is deliberately introduced among the models
  - Diversity does not mean reduced performance
- Empirical studies have shown that random forests do better than an ensemble of decision trees
  - A random forest is an ensemble of decision trees whose node splits are not chosen simply by minimizing entropy; randomness is injected into the split selection
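A sketch of that comparison, assuming scikit-learn and a synthetic dataset (the slides cite empirical studies but give no concrete setup): a bagged ensemble of entropy-reducing trees versus a random forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagged entropy-minimizing trees (keyword is `estimator` in recent
# scikit-learn releases; older versions call it `base_estimator`).
bagged_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(criterion="entropy"),
    n_estimators=100, random_state=0)

# Random forest: splits chosen from random feature subsets.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("bagged entropy trees:", cross_val_score(bagged_trees, X, y).mean())
print("random forest:       ", cross_val_score(forest, X, y).mean())
```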

4 Bayes optimal classifier is an ensemble learner
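Written out (notation mine, following the usual statement): the Bayes optimal classifier is a weighted vote over every hypothesis in the hypothesis space, each weighted by its posterior probability given the data,

```latex
\hat{y} \;=\; \operatorname*{arg\,max}_{c \in \mathcal{C}} \;\sum_{h \in \mathcal{H}} P(c \mid x, h)\, P(h \mid D),
```

which is why it can be viewed as an (idealized) ensemble learner.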

5 Bagging: Bootstrap aggregating
- Each model in the ensemble votes with equal weight
- Train each model on a random (bootstrap) training set
- Random forests do better than bagged entropy-reducing decision trees
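In symbols (my notation, not the slide's), the equal-weight vote over the M bagged classifiers is

```latex
\hat{y}(x) \;=\; \operatorname*{arg\,max}_{c}\; \sum_{i=1}^{M} \mathbf{1}\!\left[\, C_i(x) = c \,\right].
```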

6 Bootstrap estimation
- Repeatedly draw n samples from D
- For each set of samples, estimate a statistic
- The bootstrap estimate is the mean of the individual estimates
- Used to estimate a statistic (parameter) and its variance
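A minimal NumPy sketch of bootstrap estimation; the statistic (the median) and the sample sizes are illustrative choices, not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=200)    # the observed sample D

B = 1000                                        # number of bootstrap resamples
estimates = np.array([
    # draw n samples from D with replacement, estimate the statistic
    np.median(rng.choice(D, size=len(D), replace=True))
    for _ in range(B)
])

print("bootstrap estimate of the median:", estimates.mean())
print("bootstrap variance of the estimate:", estimates.var(ddof=1))
```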

7 Bagging
- For i = 1..M
  - Draw n* < n samples from D with replacement
  - Learn classifier C_i
- Final classifier is a vote of C_1..C_M
- Increases classifier stability / reduces variance
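A hedged sketch of this loop with scikit-learn decision trees as the base classifiers; the choice of n* and the synthetic dataset are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = n_star or (2 * n) // 3              # n* < n, an illustrative default
    classifiers = []
    for _ in range(M):
        idx = rng.choice(n, size=n_star, replace=True)   # draw with replacement
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def bagging_predict(classifiers, X):
    votes = np.stack([c.predict(X) for c in classifiers])   # shape (M, n_samples)
    # majority vote over the M classifiers (assumes integer class labels)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

X, y = make_classification(n_samples=500, random_state=0)
ensemble = bagging_fit(X, y)
print("training accuracy of the bagged vote:", (bagging_predict(ensemble, X) == y).mean())
```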

8 Boosting
- Incremental: build each new model to do better on the previous model's misclassifications
  - Can get better accuracy
  - Tends to overfit
- AdaBoost is the canonical boosting algorithm
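A quick sketch of the canonical algorithm via scikit-learn's AdaBoostClassifier; the dataset and parameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each new weak learner concentrates on points the previous ones misclassified.
booster = AdaBoostClassifier(n_estimators=200, random_state=0)
booster.fit(X_tr, y_tr)
print("held-out accuracy:", booster.score(X_te, y_te))
```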

9 Boosting (Schapire 1989)
- Randomly select n_1 < n samples from D without replacement to obtain D_1
  - Train weak learner C_1
- Select n_2 < n samples from D, with half of the samples misclassified by C_1, to obtain D_2
  - Train weak learner C_2
- Select all samples from D on which C_1 and C_2 disagree
  - Train weak learner C_3
- Final classifier is a vote of the weak learners
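A hedged sketch of this three-learner scheme with decision stumps as the weak learners; the sampling details are simplified and the sizes n1, n2 are illustrative choices, not prescribed by the slide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def schapire_boost(X, y, n1, n2, seed=0):
    rng = np.random.default_rng(seed)
    weak = lambda: DecisionTreeClassifier(max_depth=1)

    # D1: n1 < n samples drawn without replacement, train C1
    idx1 = rng.choice(len(X), size=n1, replace=False)
    c1 = weak().fit(X[idx1], y[idx1])

    # D2: half the samples misclassified by C1, half classified correctly
    wrong = np.flatnonzero(c1.predict(X) != y)
    right = np.flatnonzero(c1.predict(X) == y)
    idx2 = np.concatenate([
        rng.choice(wrong, size=min(n2 // 2, len(wrong)), replace=False),
        rng.choice(right, size=min(n2 // 2, len(right)), replace=False),
    ])
    c2 = weak().fit(X[idx2], y[idx2])

    # D3: every sample on which C1 and C2 disagree, train C3
    disagree = np.flatnonzero(c1.predict(X) != c2.predict(X))
    c3 = weak().fit(X[disagree], y[disagree])

    def predict(Xq):
        votes = np.stack([c.predict(Xq) for c in (c1, c2, c3)])
        # majority vote of the three weak learners (assumes integer labels)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return predict

X, y = make_classification(n_samples=600, random_state=0)
predict = schapire_boost(X, y, n1=200, n2=200, seed=0)
print("training accuracy of the voted classifier:", (predict(X) == y).mean())
```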

10 AdaBoost
- Learner = Hypothesis = Classifier
- Weak learner: < 50% error over any distribution
- Strong classifier: thresholded linear combination of weak learner outputs
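Written out in the usual notation (mine, not the slide's): each weak learner h_t has weighted error ε_t < 1/2, and the strong classifier is the thresholded (sign of a) linear combination of their outputs,

```latex
H(x) \;=\; \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t\, h_t(x) \right),
\qquad
\alpha_t \;=\; \tfrac{1}{2}\ln\!\frac{1 - \varepsilon_t}{\varepsilon_t},
\qquad \varepsilon_t < \tfrac{1}{2}.
```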

11 Discrete AdaBoost
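The algorithm figure on this slide was not transcribed; what follows is a standard from-scratch sketch of discrete AdaBoost with decision stumps and labels in {-1, +1}, not the slide's exact pseudocode.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def discrete_adaboost(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                      # uniform initial weights
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - err) / err)    # weak learner's vote weight
        w *= np.exp(-alpha * y * pred)           # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)

    def predict(Xq):
        scores = sum(a * s.predict(Xq) for a, s in zip(alphas, stumps))
        return np.sign(scores)                   # thresholded linear combination
    return predict

X, y01 = make_classification(n_samples=500, random_state=0)
y = 2 * y01 - 1                                  # map labels {0,1} -> {-1,+1}
H = discrete_adaboost(X, y)
print("training accuracy:", (H(X) == y).mean())
```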

12 Real AdaBoost
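This slide's figure was also not transcribed. For contrast with the discrete version, Real AdaBoost lets each weak learner emit a real-valued confidence built from a weighted class-probability estimate rather than a ±1 vote, commonly written (notation mine):

```latex
f_t(x) \;=\; \tfrac{1}{2}\,\ln\frac{p_t(x)}{1 - p_t(x)},
\qquad p_t(x) = P_w\!\left(y = 1 \mid x\right),
\qquad
H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} f_t(x)\right).
```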

13 Comparison

