Sparse vs. Ensemble Approaches to Supervised Learning

Greg Grudic (Intro AI: Ensembles)

Goal of Supervised Learning?
Minimize the probability of model prediction errors on future data.
Two competing methodologies:
- Build one really good model (the traditional approach).
- Build many models and average the results (ensemble learning, a more recent approach).

The Single Model Philosophy
Motivation: Occam’s Razor: “one should not increase, beyond what is necessary, the number of entities required to explain anything.”
Infinitely many models can explain any given dataset, so we might as well pick the smallest one.

Which Model is Smaller? [figure comparing candidate models]
It is not always easy to define “small”!

Exact Occam’s Razor Models
Exact approaches find optimal solutions. Examples:
- Support Vector Machines: find a model structure that uses the smallest percentage of the training data (to explain the rest of it).
- Bayesian approaches.
- Minimum description length.

How Do Support Vector Machines Define “Small”?
Minimize the number of support vectors, obtained by maximizing the margin.
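As a concrete illustration (not part of the original slides), the sketch below fits a linear SVM with scikit-learn and reads off the number of support vectors; the synthetic dataset and the C value are arbitrary illustrative choices.

```python
# Minimal sketch: the "size" of a trained SVM can be read off as the number
# of support vectors it keeps (assumes scikit-learn is installed).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="linear", C=1.0)   # C=1.0 is an arbitrary illustrative choice
clf.fit(X, y)

# Only the support vectors are needed to define the decision boundary.
print("support vectors per class:", clf.n_support_)
print("total support vectors:", clf.support_vectors_.shape[0])
```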

Approximate Occam’s Razor Models
Approximate solutions use a greedy search approach, which is not optimal. Examples:
- Kernel projection pursuit algorithms: find a minimal set of kernel projections.
- Relevance Vector Machines: an approximate Bayesian approach.
- Sparse Minimax Probability Machine Classification: find a minimal set of kernels and features.

Other Single Models: Not Necessarily Motivated by Occam’s Razor
- Minimax Probability Machine (MPM)
- Trees: a greedy approach to sparseness
- Neural networks
- Nearest neighbor
- Basis function models, e.g. kernel ridge regression

Ensemble Philosophy
Build many models and combine them; only through averaging do we get at the truth! It is too hard (impossible?) to build a single model that works best.
Two types of approaches:
- Models that don’t use randomness
- Models that incorporate randomness

Ensemble Approaches
- Bagging (bootstrap aggregating)
- Boosting
- Random Forests (“bagging reborn”)

Bagging
Main assumption: combining many unstable predictors produces a stable ensemble predictor.
- Unstable predictor: small changes in the training data produce large changes in the model (e.g. neural nets, trees).
- Stable predictors: SVMs (sometimes), nearest neighbor.
Hypothesis space: variable size (nonparametric); can model any function if you use an appropriate base predictor (e.g. trees).

The Bagging Algorithm
Given data {(x_i, y_i)} for i = 1, ..., N:
For m = 1, ..., M:
- Obtain a bootstrap sample D_m from the training data.
- Build a model f_m(x) from the bootstrap data D_m.
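A minimal from-scratch sketch of this loop is given below (Python with scikit-learn decision trees; the dataset, the value of M, and the variable names are illustrative assumptions, not taken from the slides).

```python
# Bagging sketch: M bootstrap samples, one tree per sample, majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

M = 25                      # number of bagged models (illustrative choice)
models = []
for m in range(M):
    # Bootstrap sample: N draws with replacement from the training data.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier()          # grown to full depth, hence unstable
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Classification: vote over the M classifier outputs.
votes = np.stack([m.predict(X) for m in models])
bagged_pred = np.apply_along_axis(
    lambda col: np.bincount(col).argmax(), axis=0, arr=votes)
print("training accuracy of bagged ensemble:", (bagged_pred == y).mean())
```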

The Bagging Model
- Regression: average the model outputs, f(x) = (1/M) * [f_1(x) + ... + f_M(x)].
- Classification: vote over the M classifier outputs.

Bagging Details
- A bootstrap sample of N instances is obtained by drawing N examples at random, with replacement.
- On average each bootstrap sample contains about 63% of the distinct training instances.
- Bagging encourages the predictors to have uncorrelated errors; this is why it works.
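The 63% figure follows from a short calculation: the probability that a particular training example appears at least once in a bootstrap sample of size N is

```latex
P(\text{example } i \in D_m)
  = 1 - \left(1 - \tfrac{1}{N}\right)^{N}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632
  \quad \text{as } N \to \infty .
```

So for large N each bootstrap sample contains roughly 63.2% of the distinct training examples.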

Bagging Details (2)
- The number of models M is usually set to a fixed value, or chosen using validation data.
- The models need to be unstable: usually full-length (or only slightly pruned) decision trees.

Boosting
Main assumption: combining many weak predictors (e.g. tree stumps or 1-R predictors) produces a strong ensemble predictor.
- The weak predictors or classifiers need to be stable.
Hypothesis space: variable size (nonparametric); can model any function if you use an appropriate base predictor (e.g. trees).

Commonly Used Weak Predictor (or Classifier)
A decision tree stump (1-R): a tree with a single split on one feature.
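A stump is small enough to write out directly. The sketch below (illustrative Python; the helper names fit_stump and predict_stump are my own, not from the slides) finds the single feature/threshold split that minimizes weighted classification error, which is the form of weak learner boosting typically uses.

```python
# Decision stump (1-R): one split on one feature, chosen to minimize
# weighted training error. Labels are assumed to be in {-1, +1}.
import numpy as np

def fit_stump(X, y, w):
    """Return the best weighted split (feature, threshold, polarity) and its error."""
    n, d = X.shape
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(d):
        for t in np.unique(X[:, j]):
            for polarity in (+1, -1):
                pred = np.where(X[:, j] <= t, polarity, -polarity)
                err = w[pred != y].sum()     # weighted misclassification error
                if err < best_err:
                    best_err, best = err, (j, t, polarity)
    return best, best_err

def predict_stump(stump, X):
    j, t, polarity = stump
    return np.where(X[:, j] <= t, polarity, -polarity)
```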

Boosting
Each classifier is trained from a weighted sample of the training data.

Boosting (Continued)
- Each predictor is created using a biased sample of the training data.
- Instances (training examples) with high error are weighted higher than those with lower error.
- Difficult instances get more attention; this is the motivation behind boosting.

Background Notation
- The indicator function I(c) is defined as I(c) = 1 if the condition c is true, and I(c) = 0 otherwise.
- The function log(x) is the natural logarithm.

The AdaBoost Algorithm (Freund and Schapire, 1996)
Given data {(x_i, y_i)} for i = 1, ..., N, with y_i in {-1, +1}:
1. Initialize weights w_i = 1/N.
2. For m = 1, ..., M:
   - Fit classifier G_m(x) to the data using weights w_i.
   - Compute the weighted error err_m = [ sum_i w_i * I(y_i != G_m(x_i)) ] / [ sum_i w_i ].
   - Set alpha_m = log( (1 - err_m) / err_m ).
   - Update the weights: w_i <- w_i * exp( alpha_m * I(y_i != G_m(x_i)) ).
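The sketch below follows these steps directly (illustrative Python; labels are assumed to be in {-1, +1}, and fit_stump/predict_stump are the hypothetical stump helpers from the earlier sketch).

```python
# AdaBoost sketch (binary classification, y in {-1, +1}).
import numpy as np

def adaboost(X, y, M=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # 1. initialize weights
    stumps, alphas = [], []
    for m in range(M):                             # 2. for m = 1..M
        stump, err = fit_stump(X, y, w)            # (a) fit weak classifier with weights
        err = max(err / w.sum(), 1e-12)            # (b) weighted error rate (clamped)
        alpha = np.log((1.0 - err) / err)          # (c) classifier weight alpha_m
        pred = predict_stump(stump, X)
        w = w * np.exp(alpha * (pred != y))        # (d) up-weight the mistakes
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(stumps, alphas, X):
    # Final model: sign of the alpha-weighted vote of the weak classifiers.
    votes = sum(a * predict_stump(s, X) for s, a in zip(stumps, alphas))
    return np.sign(votes)
```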

The AdaBoost Model
The final classifier is the weighted vote f(x) = sign( alpha_1 * G_1(x) + ... + alpha_M * G_M(x) ).
AdaBoost is NOT used for regression!

The Updates in Boosting
The weight update multiplies the weight of each misclassified instance by exp(alpha_m) and leaves correctly classified instances unchanged, so the next classifier concentrates on the examples the current one gets wrong.

Boosting Characteristics
[Figure] Simulated data: test error rate for boosting with stumps, as a function of the number of iterations. Also shown are the test error rates for a single stump and for a 400-node tree.

Loss Functions for Misclassification
[Figure: each loss plotted against the margin y·f(x); the left half of the axis corresponds to incorrect classification, the right half to correct classification.]
- Exponential (boosting)
- Binomial deviance (cross entropy)
- Squared error
- Support vector (hinge)
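For reference, the standard forms of these losses, written as functions of the margin m = y·f(x) (the usual textbook convention; these expressions are supplied here rather than reproduced from the slide):

```latex
% Losses as functions of the margin m = y f(x), with y \in \{-1, +1\}:
L_{\mathrm{exponential}}(m) = e^{-m}                      % used by AdaBoost
L_{\mathrm{deviance}}(m)    = \log\bigl(1 + e^{-2m}\bigr) % binomial deviance / cross entropy
L_{\mathrm{squared}}(m)     = (1 - m)^{2}                 % since (y - f)^2 = (1 - y f)^2
L_{\mathrm{hinge}}(m)       = \max(0,\; 1 - m)            % support vector (hinge) loss
```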

Other Variations of Boosting
- Gradient boosting: can use any (differentiable) cost function.
- Stochastic (gradient) boosting: each model is fit to a bootstrap sample (uniform random sampling with replacement); often outperforms the non-random version.

Gradient Boosting
Fit each new model to the negative gradient of the loss function evaluated at the current ensemble’s predictions (for squared-error loss this is simply the residual), then add it to the ensemble with a small step size.
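A minimal least-squares gradient-boosting sketch (Python with scikit-learn regression trees; the learning rate, tree depth, and function names are illustrative assumptions):

```python
# Gradient boosting sketch for squared-error loss: each tree is fit to the
# residuals (the negative gradient of the loss) of the current ensemble.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    f0 = y.mean()                          # initial constant prediction
    f = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        residual = y - f                   # negative gradient of 0.5*(y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        f = f + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gb_predict(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```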

Boosting Summary
Good points:
- Fast learning
- Capable of learning any function (given an appropriate weak learner)
- Feature weighting
- Very little parameter tuning
Bad points:
- Can overfit the data
- Only for binary classification (in its basic AdaBoost form)
Learning parameters (picked via cross validation):
- Size of tree
- When to stop
Software: http://www-stat.stanford.edu/~jhf/R-MART.html

Random Forests (Leo Breiman, 2001)
http://www.stat.berkeley
- Injecting the right kind of randomness makes accurate models: as good as SVMs (and sometimes better), and as good as boosting.
- Very little playing with learning parameters is needed to get very good models.
- Traditional tree algorithms spend a lot of time choosing how to split at a node; random forest trees put very little effort into this.

Algorithmic Goal of Random Forests
Create many trees (50 to 1,000) and inject randomness into the trees such that:
- Each tree has maximal strength, i.e. it is a fairly good model on its own.
- Each tree has minimal correlation with the other trees, i.e. the errors tend to cancel out.

RFs Use Out-of-Bag Samples
The out-of-bag samples for a given tree in the forest are those training examples that were NOT used to construct that tree. Out-of-bag samples can be used for model selection; they give unbiased estimates of:
- Error on future data (no need for cross validation!)
- Internal strength
- Internal correlation
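For reference, the sketch below shows how an out-of-bag error estimate can be obtained with scikit-learn’s random forest (an illustrative library choice, not the implementation used in the slides).

```python
# Out-of-bag accuracy estimate with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Each tree is evaluated only on the examples it did not see (its OOB samples),
# giving an estimate of generalization error without cross validation.
print("OOB accuracy estimate:", rf.oob_score_)
```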

R.I. (Random Input) Forests
For each of K trees, build the tree by:
- Selecting at random, at each node, a small set of F features to split on (out of the M available features). Common values of F are small, e.g. F = 1 or roughly log2(M) + 1.
- At each node, splitting on the best feature of this subset.
- Growing the tree to full length (no pruning).
Prediction: average over the trees for regression; vote for classification.
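In scikit-learn terms (an illustrative mapping, not from the slides), the per-node random feature subset corresponds to the max_features parameter:

```python
# Random-input selection in scikit-learn: max_features controls the size F of
# the random feature subset considered at each split (out of M features).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=25, random_state=0)

# max_features="sqrt" uses F = sqrt(M); an integer (e.g. 1) can be passed instead.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print("training accuracy:", rf.score(X, y))
```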

R.C. (Random Combination) Forests
For each of K trees, build the tree by:
- At each node, creating F random linear combinations of L input variables.
- Splitting on the best of these linear boundaries.
- Growing the tree to full length (no pruning).
Prediction: average over the trees for regression; vote for classification.

Classification Data [data table]

Results: RI [results table]

Results: RC [results table]

Strength, Correlation and Out-of-Bag Error [figure]

Adding Noise: a random 5% of the training data have their outputs flipped [figure: percent increase in error]. Random forests are robust to noise!

Random Forests Regression: The Data [data table]

Results: RC with 25 features and 2 random inputs [figure: mean squared test error]

Results: Test Error and OOB Error [figure]

Random Forests Regression: Adding Noise to the Output vs. Bagging [figure]

Random Forests Summary
Adding the right kind of noise is good! Inject randomness such that:
- Each model has maximal strength, i.e. it is a fairly good model on its own.
- Each model has minimal correlation with the other models, i.e. the errors tend to cancel out.