
Slide 1: Ensemble Learning – Bagging, Boosting, Stacking, and Other Topics
Professor Carolina Ruiz, Department of Computer Science, WPI, Worcester, Massachusetts

2
Prof. Carolina Ruiz Constructing predictors/models 1. Given labeled data, use a data mining technique to train a model 2. Given a new unlabeled data instance, use the trained model to predict its label 2 Data new data prediction Techniques: -Decision trees - Bayesian nets - Neural nets - … Wish list: - Good predictor: low error - Stable: small variations in training data => small variations in resulting model Wish list: - Good predictor: low error - Stable: small variations in training data => small variations in resulting model

3
Prof. Carolina Ruiz Looking for a good model 3 Varying data usedVarying DM technique/parameters - subset of the attributes- different parameters for a technique - subset of the data instances- different techniques - …- … Until a good (low error, stable, …) model is found. But, what if a good model is not found? And even if one is found, how can we improve it? Data prediction

4
Prof. Carolina Ruiz Approach : Ensemble of models 4 Form an ensemble of models and combine their predictions into a single prediction Data prediction

5
Prof. Carolina Ruiz Constructing Ensembles – How? 1. Given labeled data, how to construct an ensemble of models? 2. Given a new unlabeled data instance, how to use the ensemble to predict its label? 5 Data new data prediction Data: What (part of the) data to use to train each model in the ensemble? Data Mining Techniques: What technique and/or what parameters to use to train each model? How to combine the individual model predictions into a unified prediction?

6
Prof. Carolina Ruiz Several Approaches Bagging (Bootstrap Aggregating) Breiman, UC Berkeley Boosting Schapire, ATT Research (now at Princeton U). Friedman, Stanford U. Stacking Wolpert, NASA Ames Research Center Model Selection Meta-learning Floyd, Ruiz, Alvarez, WPI and Boston College Mixture of Experts in Neural Nets Alvarez, Ruiz, Kawato, Kogel, Boston College and WPI … 6

Slide 7: Bagging (Bootstrap Aggregation)
Breiman, UC Berkeley
1. Create bootstrap replicates of the data (random samples of the data instances, drawn with replacement) and train a model on each replicate. Usually the same data mining technique is used to train every model.
2. Given a new unlabeled data instance, input it to each model. The ensemble prediction is the (weighted) average or vote of the individual model predictions (a voting system).
May help stabilize models.
[Diagram: data → replicates R1, R2, …, Rn → one model per replicate → combined prediction]

8
Prof. Carolina Ruiz 1. Assign equal weights to data instances. 2. Train a model. Increase (decrease) the weight of incorrectly (correctly) predicted data instances. Repeat Given a new unlabeled data instance, run it by the merged model: 8 data new data prediction Usually same data mining technique the ensemble prediction is the prediction of the merged model (e.g., majority vote, weighted average, …) Boosting Schapire, ATT Research/Princeton U. Friedman, Stanford U. data ……… May help decrease prediction error

9
Prof. Carolina Ruiz 1. Train different models on the same data (Level-0 models) 2. Train a new (Level-1) model with the outputs of the Level-0 models 2. Given a new unlabeled data instance, input it to each Level-0 model: 9 new data prediction Using different parameters and/or different data mining techniques the ensemble prediction is the Level-1 model prediction based on the Level-0 model predictions Stacking Wolpert, NASA Ames Research Center data …… May help reduce prediction error … Level-0 Level-1
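A minimal runnable sketch of the two-level scheme in plain Python. The Level-0 learners here (a 1-nearest-neighbor rule and a majority-class rule) and the table-based Level-1 model are illustrative stand-ins chosen for brevity, not the techniques referenced in the slides:

```python
from collections import Counter, defaultdict

# Two illustrative Level-0 learners, trained on the same labeled 1-D points.
def nearest_neighbor(data):
    return lambda q: min(data, key=lambda p: abs(p[0] - q))[1]

def majority_class(data):
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda q: label

def stack(data):
    level0 = [nearest_neighbor(data), majority_class(data)]
    # Level-1 model: trained on the Level-0 *outputs*, it learns which
    # combination of Level-0 predictions corresponds to which true label.
    table = defaultdict(Counter)
    for x, y in data:
        table[tuple(m(x) for m in level0)][y] += 1
    def predict(x):
        outputs = tuple(m(x) for m in level0)
        if outputs in table:
            return table[outputs].most_common(1)[0][0]
        return outputs[0]  # unseen combination: fall back to the first Level-0 model
    return predict

# Toy labeled data: label +1 for x >= 5, -1 otherwise.
data = [(x, 1 if x >= 5 else -1) for x in range(10)]
clf = stack(data)
print(clf(7.2), clf(2.5))
```

A table over output combinations is the simplest possible Level-1 model; Wolpert's formulation allows any learner (e.g., a linear model) at Level 1.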

10
Prof. Carolina Ruiz 1. Train different Level-0 models 2. Train a Level-1 model to predict which is the best Level-0 model for a given data instance 2. Given a new unlabeled data instance, input it to the Level-1 model: 10 new data prediction Using different parameters and/or different data mining techniques the ensemble prediction is the prediction of the Level-0 model selected by the Level-1 model for the input data instance Model Selection Meta-learning Floyd, Ruiz, Alvarez, WPI and Boston College data …… May help determine what technique/model works best on given data … Level-0 Level-1 … … …
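A minimal sketch of the selection idea, assuming (for illustration only) a 1-nearest-neighbor rule as the Level-1 selector: each training instance is meta-labeled with the index of the Level-0 model that predicts it correctly, and the selector generalizes those meta-labels to new instances. The two Level-0 models are deliberately simplistic stand-ins:

```python
def model_selection(data, models):
    # Meta-label each training instance with the index of the first
    # Level-0 model that predicts it correctly (default: model 0).
    meta = []
    for x, y in data:
        best = next((i for i, m in enumerate(models) if m(x) == y), 0)
        meta.append((x, best))
    # Level-1 model: 1-nearest-neighbor over the meta-labels.
    def select(q):
        return min(meta, key=lambda p: abs(p[0] - q))[1]
    # Ensemble prediction: the prediction of the selected Level-0 model.
    return lambda q: models[select(q)](q)

# Two illustrative Level-0 models, each accurate on half the input space.
low_expert = lambda x: 1     # always predicts +1 (right for x < 5)
high_expert = lambda x: -1   # always predicts -1 (right for x >= 5)
data = [(x, 1 if x < 5 else -1) for x in range(10)]
clf = model_selection(data, [low_expert, high_expert])
print(clf(1), clf(8))
```

Neither constant model is useful alone, but the selector routes each query to the model that is reliable in that region.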

11
Prof. Carolina Ruiz 1. Split data attributes into domain meaningful subgroups: A, A, … 2. Create and train a Mixture of Experts Feed-Forward Neural Net: 3. Given a new unlabeled data instance, feed it forward through the mixture of experts 11 ANN layers: input hidden output Note that not all connections between input and hidden nodes are included the mixture of experts prediction is the output produced by the network Mixture of Experts Architecture Alvarez, Ruiz, Kawato, Kogel, Boston College and WPI May help speed-up ANN training without increasing prediction error Data A A A new data A A A prediction
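A minimal sketch of the restricted-connectivity forward pass described above: each hidden unit receives connections only from one attribute subgroup, so the missing input-to-hidden links simply never appear in its weight list. The subgroup choice, layer sizes, and random weights are illustrative assumptions, and no training step is shown:

```python
import math
import random

def make_moe(groups, hidden_per_group, n_out, seed=0):
    """Build a mixture-of-experts net: each hidden unit stores only the
    input indices of its own attribute subgroup plus one weight per index;
    connections to all other inputs are absent (i.e., fixed at zero)."""
    rng = random.Random(seed)
    hidden = []  # list of (input_indices, weights) per hidden unit
    for idx in groups:
        for _ in range(hidden_per_group):
            hidden.append((idx, [rng.uniform(-1, 1) for _ in idx]))
    out_w = [[rng.uniform(-1, 1) for _ in hidden] for _ in range(n_out)]
    return hidden, out_w

def forward(x, hidden, out_w):
    """Feed an instance forward: sigmoid hidden layer, sigmoid output layer."""
    sig = lambda z: 1 / (1 + math.exp(-z))
    h = [sig(sum(w * x[i] for i, w in zip(idx, ws))) for idx, ws in hidden]
    return [sig(sum(w * hi for w, hi in zip(row, h))) for row in out_w]

# Hypothetical subgroups A1 = attributes {0,1}, A2 = {2,3,4} of 5 attributes.
hidden, out_w = make_moe(groups=[(0, 1), (2, 3, 4)], hidden_per_group=2, n_out=1)
y = forward([0.5, -0.2, 0.9, 0.1, -0.7], hidden, out_w)
print(len(hidden), len(y))
```

Because each hidden unit touches only its own subgroup, a forward/backward pass over this layer costs proportionally less than in a fully connected net, which is the source of the training speed-up the slide mentions.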

12
Prof. Carolina Ruiz Conclusions Ensemble methods construct and/or combine collection of predictors with the purpose of improving upon the properties of the individual predictors: stabilize models reduce prediction error aggregate individual predictors that make different errors more resistant to noise 12

13
Prof. Carolina Ruiz References J.F. Elder, G. Ridgeway. Combining Estimators to Improve Performance KDD- 99 tutorial notes L. Breiman. Bagging Predictors. Machine Learning, 24(2), R.E. Schapire. The strength of weak learnability. Machine Learning. 5(2), Y. Freund, R. Schapire. Experiments with a new boosting algorithm. Proc. of the 13 th Intl. Conf. on Machine Learning J. Friedman, T. Hastie, R. Tibshirani. Additive Logistic Regression: a statistical view of boosting. Annals of Statistics D.H Wolpert. Stacked Generalization. Neural Networks. 5(2), S. Floyd, C. Ruiz, S. A. Alvarez, J. Tseng, and G. Whalen. "Model Selection Meta- Learning for the Prognosis of Pancreatic Cancer", full paper, Proc. 3rd Intl. Conf. on Health Informatics (HEALTHINF 2010), pp S.A. Alvarez, C. Ruiz, T. Kawato, and W. Kogel. Faster neural networks for combined collaborative and content based recommendation. Journal of Computational Methods in Sciences and Engineering (JCMSE). IOS Press. Vol. 11, N. 4, pp

14
Prof. Carolina Ruiz The End Questions? 14

15
Prof. Carolina Ruiz Bagging (Bootstrap Aggregation) Model Creation: Create bootstrap replicates of the dataset and fit a model to each one Prediction: Average/vote predictions of each model Advantages Stabilizes unstable methods Easy to implement, parallelizable. 15

16
Prof. Carolina Ruiz Bagging Algorithm 1. Create k bootstrap replicates of the dataset 2. Fit a model to each of the replicates 3. Average/vote the predictions of the k models 16
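The three steps above can be sketched in plain Python. The toy 1-D dataset and the decision-stump base learner are illustrative assumptions chosen so the sketch is self-contained, not choices made in the slides:

```python
import random
from collections import Counter

def train_stump(data):
    """Fit a decision stump on 1-D points: choose the threshold and sign
    that minimize training error (a deliberately weak, unstable learner)."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum(1 for x, y in data if sign * (1 if x >= t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x, t=t, s=sign: s * (1 if x >= t else -1)

def bagging(data, k, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # Step 1: k bootstrap replicates (n draws with replacement each).
    # Step 2: fit one model per replicate.
    models = [train_stump([rng.choice(data) for _ in range(n)])
              for _ in range(k)]
    # Step 3: predict by majority vote over the k models.
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

# Toy labeled data: +1 for x >= 5, -1 otherwise, with one noisy point.
data = [(x, 1 if x >= 5 else -1) for x in range(10)]
data[3] = (3, 1)  # label noise
ensemble = bagging(data, k=25)
print(ensemble(8), ensemble(1))
```

Individual stumps trained on different replicates disagree near the noisy point, but the vote smooths those disagreements out, which is the stabilizing effect the previous slide advertises.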

17
Prof. Carolina Ruiz Boosting Creating the model: Construct a sequence of datasets and models in such a way that a dataset in the sequence weights an instance heavily when the previous model has misclassified it. Prediction: Merge the models in the sequence Advantages: Improves classification accuracy 17

18
Prof. Carolina Ruiz Generic Boosting Algorithm 1. Equally weight all instance in dataset 2. For I = 1 to T 2.1. Fit a model to current dataset 2.2. Upweight poorly predicted instances 2.3 Downweight well-predicted instances 3. Merge the models in the sequence to obtain the final model 18
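One well-known concrete instance of this generic scheme is AdaBoost (Freund & Schapire, cited on the references slide). A minimal sketch follows; the decision-stump base learner and the toy 1-D dataset are illustrative assumptions:

```python
import math

def train_stump(data, w):
    """Step 2.1: fit a stump minimizing *weighted* training error."""
    best = None
    for t in sorted({x for x, _ in data}):
        for sign in (1, -1):
            err = sum(wi for (x, y), wi in zip(data, w)
                      if sign * (1 if x >= t else -1) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return err, (lambda x, t=t, s=sign: s * (1 if x >= t else -1))

def adaboost(data, T):
    n = len(data)
    w = [1.0 / n] * n                         # step 1: equal weights
    models = []
    for _ in range(T):                        # step 2: for i = 1 to T
        err, h = train_stump(data, w)         # step 2.1
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        # steps 2.2 / 2.3: up-weight misclassified, down-weight correct
        w = [wi * math.exp(-alpha * y * h(x)) for (x, y), wi in zip(data, w)]
        z = sum(w)
        w = [wi / z for wi in w]
        models.append((alpha, h))
    # step 3: merged model = sign of the alpha-weighted vote
    return lambda x: 1 if sum(a * h(x) for a, h in models) >= 0 else -1

# Toy data no single stump can fit: -1 on the interval {2,3,4}, +1 elsewhere.
data = [(x, -1 if x in (2, 3, 4) else 1) for x in range(10)]
clf = adaboost(data, T=10)
print([clf(x) for x, _ in data])
```

The best single stump misclassifies two of the ten points; by re-weighting and combining stumps, the merged model fits all of them, illustrating the accuracy gain claimed on the previous slide.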
