Decision Trees

ID     Hair    Height   Weight   Lotion  Result
Sarah  Blonde  Average  Light    No      Sunburn
Dana   Blonde  Tall     Average  Yes     None
Alex   Brown   Tall     Average  Yes     None
Annie  Blonde  Short    Average  No      Sunburn
Emily  Red     Average  Heavy    No      Sunburn
Pete   Brown   Tall     Heavy    No      None
John   Brown   Average  Heavy    No      None
Katie  Blonde  Short    Light    Yes     None
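As a sketch (not from the slides), assuming scikit-learn and pandas are available, an entropy-based decision tree can be fit to this table; criterion="entropy" matches ID3-style information-gain splitting:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The sunburn table from the slide, one row per person.
data = pd.DataFrame({
    "Hair":   ["Blonde", "Blonde", "Brown", "Blonde", "Red", "Brown", "Brown", "Blonde"],
    "Height": ["Average", "Tall", "Tall", "Short", "Average", "Tall", "Average", "Short"],
    "Weight": ["Light", "Average", "Average", "Average", "Heavy", "Heavy", "Heavy", "Light"],
    "Lotion": ["No", "Yes", "Yes", "No", "No", "No", "No", "Yes"],
    "Result": ["Sunburn", "None", "None", "Sunburn", "Sunburn", "None", "None", "None"],
}, index=["Sarah", "Dana", "Alex", "Annie", "Emily", "Pete", "John", "Katie"])

# One-hot encode the categorical attributes so the tree can split on them.
X = pd.get_dummies(data.drop(columns="Result"))
y = data["Result"]

# Entropy-minimizing splits, as in the decision trees discussed on these slides.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```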
Ensemble methods
- Use multiple models to obtain better predictive performance
- Ensembles combine multiple hypotheses to form a (hopefully) better hypothesis
- Combine multiple weak learners to produce a strong learner
- Typically much more computation, since you are training multiple learners
Ensemble learners
- Typically combine multiple fast learners (like decision trees), which individually tend to overfit
- Tend to get better results, since significant diversity is deliberately introduced among the models
- Diversity does not mean reduced performance
- Empirical studies have shown that random forests do better than a plain ensemble of decision trees
- A random forest is an ensemble of decision trees that do not choose tree nodes purely by entropy minimization (each split is chosen from a random subset of the features)
Bayes optimal classifier is an ensemble learner
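To see why: in the standard formulation, the Bayes optimal classifier averages the predictions of every hypothesis in the hypothesis space H, weighted by each hypothesis's posterior probability given the data D:

\[
y^{*} = \operatorname*{arg\,max}_{c \in C} \; \sum_{h \in H} P(c \mid x, h)\, P(h \mid D)
\]

That is, it takes a weighted vote over all hypotheses rather than committing to a single model, which makes it an ensemble by construction.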
Bagging: Bootstrap aggregating
- Each model in the ensemble votes with equal weight
- Train each model with a random (bootstrap) training set
- Random forests do better than bagged entropy-reducing decision trees
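A minimal sketch of that last comparison, assuming scikit-learn and one of its bundled datasets (in recent scikit-learn versions the base learner is passed as `estimator`):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagged entropy-reducing decision trees: each tree sees a bootstrap sample.
bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(criterion="entropy"),
    n_estimators=100, random_state=0)

# Random forest: bootstrap samples plus a random feature subset at each split.
forest = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)

for name, model in [("bagged trees", bagged), ("random forest", forest)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy {score:.3f}")
```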
Bootstrap estimation
- Repeatedly draw n samples from D
- For each set of samples, estimate the statistic
- The bootstrap estimate is the mean of the individual estimates
- Used to estimate a statistic (parameter) and its variance
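A small sketch of this procedure, assuming NumPy; the data set and the choice of the median as the statistic are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=200)   # hypothetical data set D of n samples

B = 1000                                        # number of bootstrap rounds
estimates = []
for _ in range(B):
    resample = rng.choice(D, size=len(D), replace=True)   # draw n samples from D with replacement
    estimates.append(np.median(resample))                 # estimate the statistic on this resample

estimates = np.array(estimates)
print("bootstrap estimate of the median:", estimates.mean())
print("bootstrap estimate of its variance:", estimates.var(ddof=1))
```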
Bagging
- For i = 1..M:
  - Draw n* < n samples from D with replacement
  - Learn classifier C_i
- Final classifier is a vote of C_1..C_M
- Increases classifier stability / reduces variance
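An illustrative from-scratch sketch of this loop, assuming NumPy arrays, scikit-learn decision trees as the base classifier C_i, and integer class labels (the function names are made up for the example):

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, base=DecisionTreeClassifier(), seed=0):
    """Train M classifiers, each on n* samples drawn from (X, y) with replacement."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_star = n_star or n                        # n* defaults to n; any n* <= n works
    models = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n_star)   # bootstrap sample indices
        models.append(clone(base).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final classifier: majority vote of C_1..C_M (assumes integer labels)."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)   # shape (M, n_samples)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```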
Boosting
- Incremental: build new models that try to do better on the previous model's misclassifications
- Can get better accuracy
- Tends to overfit
- AdaBoost is the canonical boosting algorithm
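A quick library-level sketch, assuming scikit-learn and one of its bundled datasets; staged_score shows accuracy evolving as weak learners are added incrementally:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Decision stumps are the default weak learner for AdaBoostClassifier.
boost = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Accuracy after 1, 2, ..., 100 rounds: each new model focuses on earlier mistakes.
for i, score in enumerate(boost.staged_score(X_te, y_te), start=1):
    if i % 25 == 0:
        print(f"{i} weak learners: test accuracy {score:.3f}")
```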
Boosting (Schapire 1989)
- Randomly select n_1 < n samples from D without replacement to obtain D_1
- Train weak learner C_1
- Select n_2 < n samples from D, with half of the samples misclassified by C_1, to obtain D_2
- Train weak learner C_2
- Select all samples from D that C_1 and C_2 disagree on
- Train weak learner C_3
- Final classifier is a vote of the weak learners
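A rough from-scratch sketch of these steps, assuming NumPy arrays, scikit-learn decision stumps as the weak learners, and integer class labels; the sampling edge cases are handled only crudely:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def schapire_boost(X, y, n1, base=DecisionTreeClassifier(max_depth=1), seed=0):
    """Train three weak learners C1, C2, C3 as described on the slide."""
    rng = np.random.default_rng(seed)
    n = len(X)

    # D1: n1 < n samples drawn without replacement; train weak learner C1.
    idx1 = rng.choice(n, size=n1, replace=False)
    C1 = clone(base).fit(X[idx1], y[idx1])

    # D2: half misclassified by C1, half classified correctly; train weak learner C2.
    pred1 = C1.predict(X)
    wrong, right = np.flatnonzero(pred1 != y), np.flatnonzero(pred1 == y)
    k = min(len(wrong), len(right))
    if k == 0:            # C1 got everything right (or wrong); nothing informative to sample
        C2 = C1
    else:
        idx2 = np.concatenate([rng.choice(wrong, k, replace=False),
                               rng.choice(right, k, replace=False)])
        C2 = clone(base).fit(X[idx2], y[idx2])

    # D3: all samples on which C1 and C2 disagree; train weak learner C3.
    idx3 = np.flatnonzero(C1.predict(X) != C2.predict(X))
    C3 = clone(base).fit(X[idx3], y[idx3]) if len(idx3) else C2

    def predict(Xq):
        # Final classifier: majority vote of C1, C2, C3.
        votes = np.stack([C1.predict(Xq), C2.predict(Xq), C3.predict(Xq)]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

    return predict
```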
AdaBoost
- Learner = Hypothesis = Classifier
- Weak learner: < 50% error over any distribution
- Strong classifier: thresholded linear combination of weak learner outputs
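In the usual notation (labels y in {-1, +1}), that thresholded linear combination is

\[
H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(x)\right),
\qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t},
\]

where h_t is the weak learner chosen at round t and ε_t is its weighted training error.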
Discrete AdaBoost
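A minimal from-scratch sketch of the discrete AdaBoost loop, assuming NumPy, labels in {-1, +1}, and scikit-learn decision stumps as the weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def discrete_adaboost(X, y, T=50):
    """Discrete AdaBoost: y must be in {-1, +1}; returns (stumps, alphas)."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()            # weighted training error
        if err >= 0.5 or err == 0:          # not a usable weak learner / perfect fit
            break
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified samples
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Strong classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(score)
```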
Real AdaBoost
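For reference, in the formulation of Friedman, Hastie & Tibshirani (2000), each round of Real AdaBoost fits a weighted class-probability estimate instead of a hard ±1 classifier:

\[
p_m(x) = \hat{P}_w(y = 1 \mid x), \qquad
f_m(x) = \tfrac{1}{2}\ln\frac{p_m(x)}{1 - p_m(x)},
\]
\[
w_i \leftarrow w_i\, e^{-y_i f_m(x_i)} \ \text{(then renormalize)}, \qquad
H(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} f_m(x)\right).
\]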
Comparison