My name is Dustin Boswell and I will be presenting: Ensemble Methods in Machine Learning by Thomas G. Dietterich Oregon State University, Corvallis, Oregon.

Classification problem through supervised learning (notation). Given a set S of training examples {x_1, x_2, …, x_m} with corresponding class labels {y_1, y_2, …, y_m}. Each x_i is a vector of n "features". Each y_i is one of K class labels. A learning algorithm takes S and produces a hypothesis h.

Pictorial View of the Learning Process: the training pairs (x_1, y_1), (x_2, y_2), …, (x_m, y_m) go into a learning algorithm (neural net, nearest neighbor, etc.), which produces a hypothesis ("gimme an x, I'll give you a y").

The Ensemble Method: the same training pairs (x_1, y_1), (x_2, y_2), …, (x_m, y_m) go into an ensemble learning algorithm, which internally trains a set of hypotheses H_1, H_2, H_3, …, H_L ("hey, what's going on inside there?"); given an x, the ensemble combines their outputs and gives back a y ("give us an x, we'll give you a y").

Characteristics of Ensemble Classifiers. An ensemble classifier is often more accurate than any of its individual members. A necessary and sufficient condition for this is that the individual classifiers be accurate and diverse. A classifier is accurate when its error rate ε is less than 1/2. Two classifiers are diverse when their out-of-sample errors are uncorrelated (the errors are independent random variables).
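
As a rough illustration of why accuracy and diversity matter (my own simulation, not from the paper), the sketch below assumes L classifiers that each err independently with probability eps < 1/2 and measures how often their majority vote errs:

    import numpy as np

    rng = np.random.default_rng(0)

    def majority_vote_error(eps, L, trials=100_000):
        """Simulate L independent classifiers that are each wrong with probability eps,
        and return how often their majority vote is wrong."""
        errors = rng.random((trials, L)) < eps        # True = this classifier is wrong on this trial
        return np.mean(errors.sum(axis=1) > L / 2)    # the vote is wrong when more than half err

    for L in (1, 5, 21):
        print(L, majority_vote_error(eps=0.3, L=L))
    # A single classifier errs ~30% of the time; a vote of 5 errs ~16%, a vote of 21 only a few percent.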

Fundamental Reasons why Ensembles might perform better.
- Statistical: when the training data is too small, many hypotheses fit it equally well. Ensembling reduces the chance of picking a bad classifier.
- Computational: depending on the learning algorithm, individual classifiers can get stuck in local minima of the training error. Ensembling reduces the chance of ending up in a bad minimum.
- Representational: an ensemble can represent a classifier that was not expressible by any single hypothesis in the original hypothesis space.

Methods for obtaining an Ensemble. Problem: given that we only have one training set and one learning algorithm, how can we produce multiple hypotheses? Solution: fiddle with everything we can get our hands on!
- Manipulate the training examples
- Manipulate the input features
- Manipulate the target output (the class labels) of the training data
- Inject randomness

Manipulating the training examples. Run the learning algorithm multiple times, each time on a different subset of the training data. This works well for unstable learning algorithms, whose output changes noticeably when the training set changes slightly (see the sketch after this list).
- Unstable: decision trees, neural networks, rule learning
- Stable: linear regression, nearest neighbor, linear threshold
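
A hedged illustration of the stable vs. unstable distinction, on a synthetic regression problem of my own choosing (not from the talk): retrain a decision tree and a linear model on bootstrap resamples of the same data and compare how much their predictions at fixed test points move around.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
    X_test = np.linspace(-3, 3, 50).reshape(-1, 1)

    def prediction_spread(make_model, n_resamples=30):
        """Refit the model on bootstrap resamples and return the average standard
        deviation of its predictions at the fixed test points."""
        preds = []
        for _ in range(n_resamples):
            idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
            preds.append(make_model().fit(X[idx], y[idx]).predict(X_test))
        return np.std(preds, axis=0).mean()

    print("decision tree    :", prediction_spread(DecisionTreeRegressor))  # unstable: large spread
    print("linear regression:", prediction_spread(LinearRegression))       # stable: small spread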

Bagging (manipulating the training examples). Take multiple bootstrap replicates of the training data. Question: if you sample N points from a batch of N (with replacement), how many of the original N do you expect to have? Poll the audience…
- On each draw, a given point has a (N-1)/N chance of being missed.
- The probability of being left out of all N draws is [(N-1)/N]^N.
- So the probability of being included at least once is 1 - [(N-1)/N]^N = 1 - [1 - 1/N]^N, which approaches 1 - 1/e ≈ 0.63 as N gets large.
- Thus we expect about 63% of the original points to appear in each bootstrap replicate.
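
A small sketch (variable names are my own) that draws one bootstrap replicate and checks the 63% figure empirically:

    import numpy as np

    rng = np.random.default_rng(42)
    N = 10_000

    # One bootstrap replicate: N draws, with replacement, from the N original indices.
    replicate = rng.integers(0, N, size=N)

    # Fraction of the original points that appear at least once in the replicate.
    print(np.unique(replicate).size / N)   # ~0.632
    print(1 - (1 - 1 / N) ** N)            # exact inclusion probability for this N, also ~0.632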

AdaBoost (still manipulating the training set). Chooses a series of hypotheses, where each later hypothesis is trained to excel on the training examples that the earlier hypotheses got wrong.
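
The reweighting idea can be sketched roughly as follows. This is a simplified binary AdaBoost outline in my own notation, with decision stumps as the base learner; it is not the exact formulation used in the paper.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, n_rounds=50):
        """Sketch of binary AdaBoost; labels y must be -1 or +1."""
        y = np.asarray(y)
        w = np.full(len(y), 1.0 / len(y))           # start with uniform example weights
        stumps, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.sum(w * (pred != y)) / np.sum(w)
            if err >= 0.5:                          # base learner no better than chance: stop
                break
            alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
            w *= np.exp(-alpha * y * pred)          # up-weight the examples this stump got wrong
            w /= w.sum()
            stumps.append(stump)
            alphas.append(alpha)
        return stumps, alphas

    def adaboost_predict(stumps, alphas, X):
        score = sum(alpha * stump.predict(X) for stump, alpha in zip(stumps, alphas))
        return np.sign(score)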

Manipulation of the input features.
- Each input x is a vector of n features.
- Train multiple hypotheses on the same training set, but give each hypothesis only a subset of the n features.
- Cherkauer (1996) used this method to train an ensemble of neural nets to identify volcanoes on Venus: there were 119 input features, which were grouped (by hand) into subsets based on different image-processing operations (PCA, Fourier transforms, etc.), and the resulting ensemble matched the ability of human experts.
- Tumer and Ghosh (1996) applied this technique to sonar data and found that removing any of the input features hurt performance.
- The technique only works when the features contain redundant information.
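
A hedged sketch of the feature-subset idea, using random subsets and decision trees rather than the hand-chosen feature groups and neural nets that Cherkauer used:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def fit_feature_subset_ensemble(X, y, n_members=10, subset_size=None):
        """Train each member on all the examples but only a random subset of the features."""
        n_total = X.shape[1]
        subset_size = subset_size or max(1, n_total // 2)
        members = []
        for _ in range(n_members):
            cols = rng.choice(n_total, size=subset_size, replace=False)
            members.append((cols, DecisionTreeClassifier().fit(X[:, cols], y)))
        return members

    def predict_majority(members, X):
        """Majority vote; assumes integer class labels 0..K-1."""
        votes = np.array([clf.predict(X[:, cols]) for cols, clf in members])
        return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])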

Manipulation of the output targets (the class labels).
- Each x is mapped to one of K classes (where K is large).
- Divide the set of K classes into 2 groups, A and B.
- Learn this new (and simpler) binary problem for various partitions A and B.
- Each member of the ensemble then implicitly votes for the K/2 classes in whichever group (A or B) it predicts.
- Think of it as classifying cities to their states, but first classifying which region (southwest, northwest, etc.) the city is in.
- Benefit: any binary classifier can be used to solve an arbitrary K-class problem.
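
A rough sketch of the relabeling step for one binary partition (a real ensemble would repeat this for many partitions A/B and tally the implicit votes; the classifier choice is arbitrary, and X_train/y_train below are placeholders):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_partition_member(X, y, group_A):
        """Relabel the K-class problem as 'in A' (1) vs 'in B' (0) and fit a binary classifier."""
        y_binary = np.isin(y, list(group_A)).astype(int)
        return LogisticRegression(max_iter=1000).fit(X, y_binary)

    # Usage sketch with K = 8 classes and one partition A = {0, 1, 2, 3}:
    #   member = fit_partition_member(X_train, y_train, group_A={0, 1, 2, 3})
    # Its prediction on a new x is an implicit vote for the 4 classes of A or the 4 of B.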

Injection of Randomness.
- Neural networks: the initial weights can be chosen randomly, so the ensemble consists of networks trained from different initial weights.
- Comparing a) 10-fold cross-validated committees, b) bagging, and c) random initial weights, they performed in that order: a) was the best, c) the worst.
- Injecting randomness into the input vectors is another option.
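
A minimal sketch of the random-initial-weights idea, assuming scikit-learn's MLPClassifier as a stand-in for the hand-built networks of the original work:

    from sklearn.neural_network import MLPClassifier

    def fit_random_init_ensemble(X, y, n_members=10):
        """Same data, same architecture; only the random weight initialization differs."""
        return [
            MLPClassifier(hidden_layer_sizes=(20,), random_state=seed, max_iter=2000).fit(X, y)
            for seed in range(n_members)
        ]
    # At prediction time the members' outputs are combined by majority vote,
    # as in the earlier sketches.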

Comparison of Ensemble Methods (empirical). Four methods were compared: 1) C4.5, 2) C4.5 with injected randomness (in the tree-building), 3) bagged C4.5, 4) AdaBoosted C4.5.
- On 33 data sets with little or no noise, AdaBoost performed the best.
- On the same 33 data sets with artificial 20% class-label noise, bagging was the best (AdaBoost overfit).
- Analogy: AdaBoost tries to come up with a theory that explains everything, while bagging settles for knowing most of it.
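
C4.5 itself is not in common Python libraries, but the flavor of the experiment can be roughly reproduced with decision trees standing in for C4.5 and one synthetic data set standing in for the 33. A sketch under those assumptions:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Flip 20% of the training labels to mimic the artificial class-label noise.
    rng = np.random.default_rng(0)
    flip = rng.random(len(y_tr)) < 0.20
    y_noisy = np.where(flip, 1 - y_tr, y_tr)

    for name, clf in [
        ("bagged trees   ", BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)),
        ("AdaBoost stumps", AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50)),
    ]:
        print(name, clf.fit(X_tr, y_noisy).score(X_te, y_te))
    # With noisy labels, bagging typically holds up better; AdaBoost tends to chase the noise.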

Interpretation of the Methods, by appealing to the fundamental reasons for ensemble performance.
- Bagging and randomness injection work by attacking the statistical issue.
- AdaBoost attacks the representational problem (it recognizes and exploits the fact that no single hypothesis will be correct on all the training points).