1 Ensemble Learning Spring 2009 Ben-Gurion University of the Negev

2 Sensor Fusion Spring 2009 Instructor: Dr. H. B. Mitchell, email: harveymitchell@walla.co.il

3 Sensor Fusion Spring 2009 Ensemble Learning Ensemble learning uses a collection (ensemble) of hypotheses and combines their predictions. Example: generate 100 different decision trees from the same or different training sets and have them vote on the best classification for a new example. Motivation: reduce the error rate, in the hope that it becomes much less likely that an ensemble of 100 decision trees will misclassify an example.
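
A minimal sketch of this 100-tree example. scikit-learn's BaggingClassifier and the synthetic dataset are assumptions made here for illustration; the slide prescribes no library or data.

```python
# Sketch: 100 decision trees, each trained on a resampled training set,
# vote on the classification of new examples.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # toy data

ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
ensemble.fit(X, y)

# Each of the 100 trees votes; the majority label is returned.
print(ensemble.predict(X[:5]))
```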

4 Sensor Fusion Spring 2009 Why Ensembles? No single algorithm wins all the time. However, when combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced. Source: Ray Mooney

5 Sensor Fusion Spring 2009 Learning Ensembles Learn multiple alternative definitions of a concept using different training data or different learning algorithms. Combine the decisions of the multiple definitions, e.g. using weighted voting. [Diagram: the training data is split into Data 1 ... Data m; each Data i feeds Learner i, producing Model i; a model combiner merges Model 1 ... Model m into the final model.]

6 Sensor Fusion Spring 2009 Methods for Constructing Ensembles
- Subsampling the training examples: multiple hypotheses are generated by training individual classifiers on different datasets obtained by resampling a common training set (bagging, boosting).
- Manipulating the input features: multiple hypotheses are generated by training individual classifiers on different representations, or different subsets, of a common feature vector.
- Manipulating the output targets: the output targets for C classes are encoded with an L-bit codeword, and an individual classifier is built to predict each of the bits in the codeword.
- Modifying the learning parameters of the classifier: a number of classifiers are built with different learning parameters, such as the number of neighbors in a k-nearest-neighbor rule or the initial weights in an MLP (see the sketch below).
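
A sketch of the fourth method under stated assumptions: members that differ only in a learning parameter (k in a k-NN rule), combined by a plain majority vote. scikit-learn and the synthetic dataset are choices made here, not prescribed by the slide.

```python
# Ensemble members differ only in the parameter k of a k-NN classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=1)  # toy data

members = [KNeighborsClassifier(n_neighbors=k).fit(X, y) for k in (1, 3, 5, 7, 9)]

votes = np.stack([m.predict(X) for m in members])    # shape (5, n_samples)
prediction = (votes.mean(axis=0) > 0.5).astype(int)  # majority of 5 votes
```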

7 Sensor Fusion Spring 2009 Combination Strategies
Static combiners:
- Voting: based on labels
- Averaging: based on confidences
- Borda counts: based on ranks
- Weighted averaging: based on performance
- etc.
Adaptive combiners:
- Mixture of Experts (ME): based on input region
- etc.
The first three static combiners are sketched below.
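
A sketch of the three static combiners named above, given hypothetical confidence (posterior) vectors from three classifiers over four classes; the numbers are invented for illustration.

```python
import numpy as np

P = np.array([[0.5, 0.2, 0.2, 0.1],   # classifier 1's posteriors
              [0.1, 0.6, 0.2, 0.1],   # classifier 2's posteriors
              [0.4, 0.3, 0.2, 0.1]])  # classifier 3's posteriors

# Voting (label-based): each classifier casts one vote for its top class.
labels = P.argmax(axis=1)
vote_winner = np.bincount(labels, minlength=P.shape[1]).argmax()

# Averaging (confidence-based): average the posteriors, pick the maximum.
avg_winner = P.mean(axis=0).argmax()

# Borda count (rank-based): a class earns points equal to its rank in each
# classifier's ordering (C-1 for best, 0 for worst); highest total wins.
ranks = P.argsort(axis=1).argsort(axis=1)
borda_winner = ranks.sum(axis=0).argmax()

print(vote_winner, avg_winner, borda_winner)
```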

8 Sensor Fusion Spring 2009 Majority Vote Suppose we have 5 completely independent classifiers. If the accuracy of each is 70%, then the probability that 3, 4, or 5 of them are correct is (0.7)^5 + 5(0.7)^4(0.3) + 10(0.7)^3(0.3)^2 ≈ 83.7%, which is the majority-vote accuracy. With 101 such classifiers we obtain 99.9% majority-vote accuracy.
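
The slide's arithmetic, reproduced as a short calculation: the accuracy of a majority vote over n independent classifiers, each individually correct with probability p, is a binomial tail sum.

```python
# Majority-vote accuracy of n independent classifiers with accuracy p.
from math import comb

def majority_vote_accuracy(n: int, p: float) -> float:
    k_min = n // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(majority_vote_accuracy(5, 0.7))    # 0.83692 -> the slide's 83.7%
print(majority_vote_accuracy(101, 0.7))  # ~0.9999 -> the slide's 99.9%
```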

9 Sensor Fusion Spring 2009 Bootstrap Estimation Repeatedly draw n samples (with replacement) from the dataset D. For each set of samples, estimate the statistic. The bootstrap estimate is the mean of the individual estimates. Used to estimate a statistic (parameter) and its variance.
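
A minimal bootstrap sketch: the statistic (the median here), the toy dataset, and B are arbitrary choices for illustration.

```python
# Estimate a statistic and its variance from B bootstrap resamples.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=200)  # toy dataset

B = 1000
estimates = np.array([np.median(rng.choice(D, size=len(D), replace=True))
                      for _ in range(B)])

bootstrap_estimate = estimates.mean()  # mean of the individual estimates
bootstrap_variance = estimates.var()   # estimated variance of the statistic
print(bootstrap_estimate, bootstrap_variance)
```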

10 Sensor Fusion Spring 2009 Bagging Introduced by Breiman in 1996. Based on bootstrapping with replacement. Useful with unstable algorithms (e.g. decision trees). [Diagram: bootstrap samples S_1, S_2, ..., S_b are drawn from the training set S = {X_1, ..., X_n}; each sample is fed to the learning algorithm, producing classifiers C_1, C_2, ..., C_b, which together form the ensemble.]

11 Sensor Fusion Spring 2009 Bagging Create ensembles by "bootstrap aggregation", i.e., by repeatedly randomly resampling the training data (Breiman, 1996). Bootstrap: draw N items from D with replacement. Bagging: train M learners on M bootstrap samples and combine their outputs by voting (e.g., majority vote). Decreases error by decreasing the variance due to unstable learning algorithms (like decision trees and neural networks) whose output can change dramatically when the training data is changed slightly.

12 Sensor Fusion Spring 2009 Bagging - Aggregate Bootstrapping Given a standard training set D of size n: for i = 1..M, draw a sample of size n* < n from D uniformly and with replacement, and learn classifier C_i. The final classifier is a vote of C_1..C_M. Increases classifier stability / reduces variance.
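
A from-scratch sketch in the spirit of this pseudocode. The scikit-learn decision-tree base learner, M, binary 0/1 labels, and full-size bootstrap samples are assumptions made for simplicity, not prescribed by the slide.

```python
# Bagging: M bootstrap samples -> M classifiers -> majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(M):
        idx = rng.integers(0, len(X), size=len(X))  # draw n with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (M, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)     # majority vote
```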

13 Sensor Fusion Spring 2009 Random Subspace Method Introduced by Ho in 1998. Modifies the training data in feature space. Useful with high-dimensional data. [Diagram: from the training set S, each subset S'_1, S'_2, ..., S'_b keeps only P' of the original feature dimensions; each S'_i is fed to the learning algorithm, producing classifiers C_1, C_2, ..., C_b, which together form the ensemble.]
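
A minimal sketch under the same conventions as the bagging code above; the subset size P' (half the features here), the decision-tree base learner, and the 0/1 majority vote are assumptions, not Ho's exact setup.

```python
# Random subspace: each member trains on a random subset of the features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_subspace_fit(X, y, M=15, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    n_sub = max(1, n_features // 2)  # P': features kept per member
    members = []
    for _ in range(M):
        feats = rng.choice(n_features, size=n_sub, replace=False)
        members.append((feats, DecisionTreeClassifier().fit(X[:, feats], y)))
    return members

def random_subspace_predict(members, X):
    votes = np.stack([model.predict(X[:, feats]) for feats, model in members])
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote
```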

14 Sensor Fusion Spring 2009 Strong and Weak Learners
Strong learner:
- The objective of machine learning
- Takes labeled data for training
- Produces a classifier which can be arbitrarily accurate
Weak learner:
- Takes labeled data for training
- Produces a classifier which is more accurate than random guessing
A quick illustration follows.
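
As a quick illustration (not from the slide): a depth-1 decision stump is a classic weak learner, while an unrestricted decision tree can fit the training data arbitrarily well. The dataset is synthetic and the scores are training scores.

```python
# Weak learner (decision stump) vs. a strong learner (unrestricted tree).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=10, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X, y)  # weak: a single split
tree = DecisionTreeClassifier().fit(X, y)              # strong: arbitrarily deep

# The stump is only somewhat better than the 50% chance level on hard data;
# the unrestricted tree can drive training error to zero.
print("stump:", stump.score(X, y), "tree:", tree.score(X, y))
```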

15 Sensor Fusion Spring 2009 AdaBoost - Adaptive Boosting Instead of resampling, AdaBoost re-weights the training set: each training sample carries a weight which determines its probability of being selected for a training set. AdaBoost is an algorithm for constructing a "strong" classifier as a linear combination of "simple" "weak" classifiers. The final classification is based on a weighted vote of the weak classifiers.

16 Sensor Fusion Spring 2009 Construct Weak Classifiers Use different data distributions:
- Start with uniform weighting.
- During each step of learning, increase the weights of the examples which are not correctly learned by the weak learner, and decrease the weights of the examples which are correctly learned by the weak learner.
Idea: focus on the difficult examples which were not correctly classified in the previous steps.

17 Sensor Fusion Spring 2009 Combine Weak Classifiers Weighted voting: construct the strong classifier by a weighted vote of the weak classifiers. Idea: a better weak classifier gets a larger weight, and weak classifiers are added iteratively, increasing the accuracy of the combined classifier through minimization of a cost function.

18 Sensor Fusion Spring 2009 AdaBoost Terminology h_t(x) is the "weak" or basis classifier (classifier = learner = hypothesis); H(x) = sign( Σ_t α_t h_t(x) ) is the "strong" or final classifier. For a binary classifier: a weak classifier has < 50% error over any distribution; the strong classifier is a thresholded linear combination of the weak classifier outputs.

19 Sensor Fusion Spring 2009 Adaptive Boosting: High-Level Description
t = 0: iteration counter; T: required number of hypotheses.
Set the same weight for all the examples (typically each example has weight = 1).
While (t < T):
- Increase the iteration counter: t = t + 1.
- Generate a new hypothesis (classifier) h_t.
- Increase the weights of the examples misclassified by h_t.
A weighted majority rule is used to combine all T hypotheses, where the weights tell us how well each h_t performed on the training set.

20 Sensor Fusion Spring 2009 Discrete AdaBoost Algorithm Each training sample has a weight, which determines its probability of being selected for training the classifier.
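
The slide's algorithm listing did not survive as text, so here is a standard textbook sketch of discrete AdaBoost. Assumptions: labels encoded as -1/+1, scikit-learn decision stumps as the weak learners, T rounds; the 1e-12 guard against division by zero is an implementation choice.

```python
# Discrete AdaBoost: reweight the training set, fit a weak learner,
# weight it by its quality, and combine by weighted vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    ensemble = []
    for _ in range(T):
        # Weak learner h_t trained on the current weighting of the data.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        err = w[pred != y].sum()  # weighted training error of h_t
        if err >= 0.5:            # no better than chance: stop adding
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        # Increase weights of misclassified examples, decrease the rest.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, h))
    return ensemble

def adaboost_predict(ensemble, X):
    # Strong classifier: H(x) = sign( sum_t alpha_t * h_t(x) )
    return np.sign(sum(alpha * h.predict(X) for alpha, h in ensemble))
```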

21 Sensor Fusion Spring 2009 Example: the Netflix Prize. Supervised learning task: the training data is a set of users and the ratings (1, 2, 3, 4, or 5 stars) those users have given to movies. Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars. A $1 million prize is offered for a 10% improvement over Netflix's current movie recommender/classifier (RMSE = 0.9514). The competition began in October 2006.

22 Sensor Fusion Spring 2009 Just three weeks after it began, at least 40 teams had bested the Netflix classifier. Top teams showed about 5% improvement.

23 Sensor Fusion Spring 2009 Today, the top team has posted an 8.5% improvement. Ensemble methods are the best performers…

24 Sensor Fusion Spring 2009 “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies

25 Sensor Fusion Spring 2009 "My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post-processed with kernel ridge regression" Arek Paterek, http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf

26 Sensor Fusion Spring 2009 “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf

27 Sensor Fusion Spring 2009 Gravity home.mit.bme.hu/~gtakacs/download/gravity.pdf

28 Sensor Fusion Spring 2009 "Our final solution (RMSE=0.8712) consists of blending 107 individual results." BellKor / KorBell

