Ensemble Learning Introduction to Machine Learning and Data Mining, Carla Brodley.


1 Ensemble Learning Introduction to Machine Learning and Data Mining, Carla Brodley

2 Example: Weather Forecast (Two heads are better than one)
[Figure: forecasts 1–5 compared against reality and their combination, with X marks where each forecast errs. Picture source: Carla Gomez] Introduction to Machine Learning and Data Mining, Carla Brodley

3 Majority Vote Model
Majority vote: choose the class predicted by more than ½ of the classifiers; if no class receives a majority, return an error. When does this work? Introduction to Machine Learning and Data Mining, Carla Brodley

4 Majority Vote Model
Let p be the probability that a single classifier makes an error, and assume that classifier errors are independent. The probability that exactly k of the n classifiers make an error is C(n, k) p^k (1 − p)^(n−k), so the probability that a majority-vote classifier is in error is Σ_{k > n/2} C(n, k) p^k (1 − p)^(n−k). What happens when p > .5??? Introduction to Machine Learning and Data Mining, Carla Brodley
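The calculation on this slide is easy to check numerically. Below is a minimal sketch, assuming n independent classifiers that each err with probability p; the function name and the example values are illustrative, not from the slides.

```python
from math import comb

def majority_vote_error(n, p):
    """Probability that more than half of n independent classifiers,
    each with error rate p, are wrong at the same time."""
    k_min = n // 2 + 1  # smallest number of errors that forms a majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# With p < 0.5 the ensemble error falls as n grows; with p > 0.5 it gets worse.
print(majority_vote_error(21, 0.3))   # roughly 0.026, far below the individual 0.3
print(majority_vote_error(21, 0.6))   # roughly 0.83, worse than a single classifier
```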

5 Value of Ensembles
“No Free Lunch” Theorem: no single algorithm wins all the time! When combining multiple independent decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced. Human ensembles are demonstrably better: How many jelly beans are in the jar? Who Wants to Be a Millionaire: “Ask the audience.” Majority vote is just one kind of ensemble; we will look at several. So what is our goal? We want to create an ensemble of classifiers that make independent errors. Introduction to Machine Learning and Data Mining, Carla Brodley

6 What is Ensemble Learning?
Ensemble: a collection of base learners. Each learns the target function, and their outputs are combined for a final prediction. Often called “meta-learning.” How can you get different learners? How can you combine learners? [Instructor note: give the class one idea, like using different learning algorithms, then ask them to break up into groups and think of other ways to create ensembles.] Introduction to Machine Learning and Data Mining, Carla Brodley

7 Ensemble Method 1: Bagging
Create ensembles by “bootstrap aggregation,” i.e., by repeatedly re-sampling the training data at random. Bootstrap: draw n items from X with replacement. Given a training set X of m instances, for i = 1 to T: draw a sample of size n < m from X uniformly with replacement and learn classifier Ci from sample i. The final classifier is an unweighted vote of C1 .. CT; a minimal sketch follows below. Introduction to Machine Learning and Data Mining, Carla Brodley
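A minimal bagging sketch in Python, assuming scikit-learn style base learners with fit/predict (a decision tree here, since the slides single out trees as unstable). The function names, the default of 25 estimators, and the use of bootstrap samples of size m are illustrative assumptions; the slide's n < m would simply change the sample size.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Learn n_estimators classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    m = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, m, size=m)  # draw m indices uniformly with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final classifier: an unweighted majority vote of the base classifiers."""
    votes = np.array([model.predict(X) for model in models])
    return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])
```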

8 Will Bagging Improve Accuracy?
It depends on the stability of the base classifiers. If small changes in the sample cause small changes in the base-level classifier, then the ensemble will not be much better than the base classifiers. If small changes in the sample cause large changes and the error is < ½, then we will see a big improvement. What algorithms are stable/unstable? [Class discussion on which algorithms are stable. Stable: k-NN, linear discriminant functions. Unstable: decision trees.] Introduction to Machine Learning and Data Mining, Carla Brodley

9 Bias-Variance Decomposition
Expected squared error decomposes into three terms, E[(y − h(x))²] = bias² + variance + noise: the bias measures the distance of the average prediction from the true function f, the variance is the variance of the predictions across training samples, and the noise term is independent of the predictor. Introduction to Machine Learning and Data Mining, Carla Brodley

10 Bias and Variance
Bias problem: the hypothesis space made available by a particular classification method does not include the true hypothesis. Variance problem: the hypothesis space is “too large” for the amount of training data, so the selected hypothesis may be inaccurate on unseen data. Introduction to Machine Learning and Data Mining, Carla Brodley

11 Why Bagging Improves Accuracy
Bagging decreases error by decreasing the variance in the results due to unstable learners: algorithms (like decision trees and neural networks) whose output can change dramatically when the training data is slightly changed. Introduction to Machine Learning and Data Mining, Carla Brodley

12 Why Bagging Improves Accuracy
“Bagging goes a ways toward making a silk purse out of a sow’s ear, especially if the sow’s ear is twitchy.” – Leo Breiman Introduction to Machine Learning and Data Mining, Carla Brodley

13 Ensemble Method 2: Boosting
Key idea: instead of sampling (as in bagging), re-weight the examples. Let m be the number of hypotheses to generate. Initialize all training instances to have the same weight. For i = 1 to m: generate hypothesis hi, then increase the weights of the training instances that hi misclassifies. The final classifier is a weighted vote of all m hypotheses, where the weights are set based on training-set accuracy. There are many variants, which differ in how they set the weights and how they combine the hypotheses. Introduction to Machine Learning and Data Mining, Carla Brodley

14 Adaptive Boosting [Figure: each rectangle corresponds to an example, with weight proportional to its height, shown across hypotheses h1, h2, h3.] Introduction to Machine Learning and Data Mining, Carla Brodley

15 How do these algorithms handle instance weights?
Linear discriminant functions? Decision trees? k-NN? Introduction to Machine Learning and Data Mining, Carla Brodley

16 AdaBoost
Given m training instances (x_i, y_i) with labels y_i ∈ {−1, +1}, a base learning algorithm H, and T iterations:
Initialize D_1(i) = 1/m for every instance i.
For t = 1 to T:
  train hypothesis h_t on the data weighted by D_t;
  compute the weighted error ε_t = Σ_{i: h_t(x_i) ≠ y_i} D_t(i);
  if ε_t ≥ ½ then break;
  set α_t = ½ ln((1 − ε_t)/ε_t);
  update D_{t+1}(i) = D_t(i) e^{−α_t} / Z_t if h_t(x_i) = y_i, else D_t(i) e^{α_t} / Z_t.
Return the final classifier sign(Σ_t α_t h_t(x)).
Here T is the number of iterations, m is the number of instances, and H is the base classifier algorithm. Z_t is a normalization factor that makes the D_{t+1}(i) sum to 1 so they form a probability distribution; Z_t is calculated by adding up D_t(i)e^{α_t} over the incorrect instances plus D_t(i)e^{−α_t} over the correct instances. [Homework assignment: explain how to calculate Z_t, or give a different example and have students calculate the D_{t+1}(i).] Introduction to Machine Learning and Data Mining, Carla Brodley
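A sketch of this loop in Python, assuming binary labels in {−1, +1} and a base learner that accepts per-instance weights (a depth-1 decision stump here); the function names and the stump are illustrative choices, not part of the slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    """AdaBoost sketch; y must contain labels in {-1, +1}."""
    m = len(X)
    D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
    hypotheses, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()            # weighted training error
        if eps == 0 or eps >= 0.5:          # perfect, or weaker than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        D = D * np.exp(-alpha * y * pred)   # shrink correct weights, grow misclassified ones
        D = D / D.sum()                     # divide by Z_t so D_{t+1} is a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def adaboost_predict(hypotheses, alphas, X):
    """Weighted vote: sign of the alpha-weighted sum of the hypotheses' outputs."""
    return np.sign(sum(a * h.predict(X) for h, a in zip(hypotheses, alphas)))
```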

17 Example Let m = 20, so D_1(i) = 1/20 = 0.05 for every instance. Imagine that h_1 is correct on 15 and incorrect on 5 instances; then ε_1 = 5 × 0.05 = 0.25 and α_1 = ½ ln(0.75/0.25) ≈ 0.55. We reweight as follows. Correct instances: D_1(i) e^{−α_1} ≈ 0.029. Incorrect instances: D_1(i) e^{α_1} ≈ 0.087. Note that after normalization by Z_1 ≈ 0.866 the correct instances have weight ≈ 0.033, the incorrect instances have weight 0.1, and the weights sum to 1. Introduction to Machine Learning and Data Mining, Carla Brodley
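The numbers in this example can be verified with a few lines of Python (a re-derivation under the same assumptions, not part of the original deck):

```python
import numpy as np

m = 20
D1 = 1 / m                             # 0.05 for every instance
eps = 5 * D1                           # 0.25: five instances are misclassified
alpha = 0.5 * np.log((1 - eps) / eps)  # about 0.549
w_correct = D1 * np.exp(-alpha)        # about 0.029
w_incorrect = D1 * np.exp(alpha)       # about 0.087
Z = 15 * w_correct + 5 * w_incorrect   # about 0.866
print(w_correct / Z, w_incorrect / Z)  # about 0.033 and 0.1; over all 20 instances they sum to 1
```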

18 Boosting Originally developed by computational learning theorists to guarantee performance improvements on fitting training data for a weak learner that only needs to generate a hypothesis with a training accuracy greater than 0.5 (Schapire, 1990). Revised to be a practical algorithm, AdaBoost, for building ensembles that empirically improves generalization performance (Freund & Schapire, 1996). Introduction to Machine Learning and Data Mining, Carla Brodley

19 Strong and Weak Learners
A “strong learner” produces a classifier that can be arbitrarily accurate. A “weak learner” produces a classifier that is merely more accurate than random guessing. Original question: can a set of weak learners create a single strong learner? Introduction to Machine Learning and Data Mining, Carla Brodley

20 Summary of Boosting and Bagging
Bagging and boosting are called “homogeneous ensembles”: both use a single learning algorithm but manipulate the training data to learn multiple models, so Data1 ≠ Data2 ≠ … ≠ DataT while Learner1 = Learner2 = … = LearnerT. Methods for changing the training data: bagging resamples the training data; boosting reweights the training data. In WEKA these are called meta-learners: they take a learning algorithm as an argument (the base learner) and create a new learning algorithm. Introduction to Machine Learning and Data Mining, Carla Brodley

21 What is Ensemble Learning?
Ensemble: a collection of base learners. Each learns the target function, and their outputs are combined for a final prediction. Often called “meta-learning.” How can you get different learners? How can you combine learners? [Instructor note: give the class one idea, like using different learning algorithms, then ask them to break up into groups and think of other ways to create ensembles.] Introduction to Machine Learning and Data Mining, Carla Brodley

22 Where do Learners come from?
Bagging; boosting; partitioning the data (you must have a large amount); using different feature subsets, different algorithms, or different parameters of the same algorithm. Introduction to Machine Learning and Data Mining, Carla Brodley

23 Ensemble Method 3: Random Forests
For i = 1 to T: take a bootstrap sample (bag) and grow a random decision tree T_i, where at each node the split feature is chosen from a random subset of n features (n < total number of features) and the tree is grown to full depth (no pruning). Classify new objects by taking a majority vote of the T random trees. Grow the trees deep to avoid bias. A minimal sketch follows below. Introduction to Machine Learning and Data Mining, Carla Brodley
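A minimal random-forest sketch following the steps above, again in scikit-learn style; using max_features="sqrt" as the per-node feature subset size, 100 trees, and these function names are illustrative assumptions rather than choices made on the slide.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, T=100, seed=0):
    rng = np.random.default_rng(seed)
    m = len(X)
    trees = []
    for _ in range(T):
        idx = rng.integers(0, m, size=m)   # bootstrap sample (the "bag")
        tree = DecisionTreeClassifier(
            max_features="sqrt",           # random feature subset considered at each node
            max_depth=None,                # grow a full tree, do not prune
        ).fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def random_forest_predict(trees, X):
    """Classify by majority vote of the T random trees."""
    votes = np.array([tree.predict(X) for tree in trees])
    return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])
```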

24 Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1), 5-32
Introduction to Machine Learning and Data Mining, Carla Brodley

25 What is Ensemble Learning?
Ensemble: a collection of base learners. Each learns the target function, and their outputs are combined for a final prediction. Often called “meta-learning.” How can you get different learners? How can you combine learners? [Instructor note: give the class one idea, like combining with unweighted votes as in bagging, and ask for other ideas.] Introduction to Machine Learning and Data Mining, Carla Brodley

26 Methods for Combining Classifiers
Unweighted vote (as in bagging); combining class probabilities, if the classifiers produce probabilities rather than votes; weighted vote (the weight is typically a function of each classifier’s accuracy); and stacking, i.e., learning how to combine the classifiers. A rough stacking sketch follows below. Introduction to Machine Learning and Data Mining, Carla Brodley
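Of these, stacking is the least obvious, so here is a rough sketch. It assumes base classifiers that expose predict_proba and uses out-of-fold probabilities to train a logistic-regression meta-learner; all of these choices (and the function names) are illustrative, not from the slides.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression

def stacking_fit(X, y, base_learners):
    """Learn how to combine classifiers: base-level predictions become meta-features."""
    # Out-of-fold class probabilities, so the meta-learner never sees predictions
    # made on the same data a base learner was trained on.
    meta_X = np.hstack([
        cross_val_predict(bl, X, y, cv=5, method="predict_proba") for bl in base_learners
    ])
    fitted = [bl.fit(X, y) for bl in base_learners]
    meta = LogisticRegression(max_iter=1000).fit(meta_X, y)
    return fitted, meta

def stacking_predict(fitted, meta, X):
    meta_X = np.hstack([bl.predict_proba(X) for bl in fitted])
    return meta.predict(meta_X)
```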

27 Introduction to Machine Learning and Data Mining, Carla Brodley

28 Supervised learning task
Began in October 2006. It is a supervised learning task: the training data is a set of users and the ratings (1, 2, 3, 4, or 5 stars) those users have given to movies. Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars. There is a $1 million prize for a 10% improvement over Netflix’s current movie recommender. Introduction to Machine Learning and Data Mining, Carla Brodley

29 Ensemble methods are the best performers… Introduction to Machine Learning and Data Mining, Carla Brodley

30 “Our final solution (RMSE = 0.8712) consists of blending 107 individual results.” Introduction to Machine Learning and Data Mining, Carla Brodley

