Bagging LING 572 Fei Xia 1/24/06

Ensemble methods So far, we have covered several learning methods: FSA, HMM, DT, DL, TBL. Question: how can we improve the results? One solution: generate and combine multiple predictors –Bagging: bootstrap aggregating –Boosting –…

Outline An introduction to the bootstrap Bagging: basic concepts (Breiman, 1996) Case study: bagging a treebank parser (Henderson and Brill, ANLP 2000)

Introduction to bootstrap

Motivation What is the average house price? From the population F, draw a sample x = (x_1, x_2, …, x_n) and calculate its average u. Question: how reliable is u? What is the standard error of u? What is the confidence interval?

Solutions One possibility: get several samples from F. Problem: it is impossible (or too expensive) to get multiple samples. Solution: bootstrap

The general bootstrap algorithm Let the original sample be L = (x_1, x_2, …, x_n). Repeat B times: –Generate a sample L_k of size n from L by sampling with replacement. –Compute the statistic of interest on L_k. Now we end up with B bootstrap values of the statistic. Use these values to calculate the quantities of interest (e.g., standard deviation, confidence intervals).
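As a concrete illustration of the algorithm above, here is a minimal Python sketch (not part of the original slides; the function and variable names are my own) that computes bootstrap replicates of the sample mean and uses them to estimate a standard error and a rough percentile confidence interval. The numbers reuse the six values from the example on the next slide.

    import random
    import statistics

    def bootstrap(sample, stat=statistics.mean, B=1000, seed=0):
        """Return B bootstrap replicates of `stat` computed on resamples of `sample`."""
        rng = random.Random(seed)
        n = len(sample)
        replicates = []
        for _ in range(B):
            # Draw a bootstrap sample of size n by sampling with replacement.
            resample = [rng.choice(sample) for _ in range(n)]
            replicates.append(stat(resample))
        return replicates

    x = [3.12, 0, 1.57, 19.67, 0.22, 2.20]      # original sample
    values = sorted(bootstrap(x, B=2000))

    std_error = statistics.stdev(values)        # bootstrap estimate of the standard error
    ci = (values[int(0.025 * len(values))],     # rough 95% percentile
          values[int(0.975 * len(values))])     # confidence interval
    print(statistics.mean(x), std_error, ci)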

An example X = (3.12, 0, 1.57, 19.67, 0.22, 2.20), mean = 4.46. Bootstrap samples: X1 = (1.57, 0.22, 19.67, 0, 0, 2.2, 3.12), mean = 4.13. X2 = (0, 2.20, 2.20, 2.20, 19.67, 1.57), mean = 4.64. X3 = (0.22, 3.12, 1.57, 3.12, 2.20, 0.22), mean = 1.74.

A quick view of bootstrapping Introduced by Bradley Efron in 1979. Named after the phrase “to pull oneself up by one’s bootstraps”, which is widely believed to come from “The Adventures of Baron Munchausen”. Popularized in the 1980s due to the introduction of computers into statistical practice. It has a strong mathematical background. It is well known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.

Bootstrap distribution The bootstrap does not replace or add to the original data. We use the bootstrap distribution as a way to estimate the variation in a statistic based on the original data.

Sampling distribution vs. bootstrap distribution The population: certain unknown quantities of interest (e.g., the mean). Multiple samples → sampling distribution. Bootstrapping: –One original sample → B bootstrap samples –B bootstrap samples → bootstrap distribution

Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution. Bootstrap distributions are centered at the value of the statistic from the original sample plus any bias. The sampling distribution is centered at the value of the parameter in the population, plus any bias.

Cases where the bootstrap does not apply Small data sets: the original sample is not a good approximation of the population. Dirty data: outliers add variability to our estimates. Dependence structures (e.g., time series, spatial problems): the bootstrap is based on the assumption of independence. …

How many bootstrap samples are needed? The choice of B depends on: computer availability, the type of problem (standard errors, confidence intervals, …), and the complexity of the problem.

Resampling methods Bootstrap Permutation tests Jackknife: leave out one observation at a time …
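For contrast with the bootstrap, the jackknife mentioned above can be sketched in a few lines of Python (an illustrative sketch, not from the slides): the statistic is recomputed once per observation, each time with that observation left out.

    import statistics

    def jackknife(sample, stat=statistics.mean):
        """Leave-one-out replicates: recompute `stat` with each observation removed."""
        return [stat(sample[:i] + sample[i+1:]) for i in range(len(sample))]

    x = [3.12, 0, 1.57, 19.67, 0.22, 2.20]
    print(jackknife(x))   # six leave-one-out means, one per omitted observation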

Bagging: basic concepts

Bagging Introduced by Breiman (1996) “Bagging” stands for “bootstrap aggregating”. It is an ensemble method: a method of combining multiple predictors.

Predictors Let L be a training set {(x_i, y_i) | x_i in X, y_i in Y}, drawn from the set Λ of possible training sets. A predictor Φ: X → Y is a function that, for any given x, produces y = Φ(x). A learning algorithm Ψ is a function that, given any L in Λ, produces a predictor Φ = Ψ(L). Types of predictors: –Classifiers: DTs, DLs, TBLs, … –Estimators: regression trees –Others: parsers
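The distinction between a predictor Φ and a learning algorithm Ψ can be made concrete with Python type hints; this is only an illustrative sketch of the definitions above, not notation used in the original slides.

    from typing import Callable, List, Tuple, TypeVar

    X = TypeVar("X")   # the input space
    Y = TypeVar("Y")   # the output space (labels for classifiers, reals for estimators)

    # A predictor Phi maps a single input x to an output y = Phi(x).
    Predictor = Callable[[X], Y]

    # A learning algorithm Psi maps a training set L to a predictor Phi = Psi(L).
    TrainingSet = List[Tuple[X, Y]]
    LearningAlgorithm = Callable[[TrainingSet], Predictor]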

Bagging algorithm Let the original training data be L. Repeat B times: –Get a bootstrap sample L_k from L. –Train a predictor using L_k. Combine the B predictors by –Voting (for classification problems) –Averaging (for estimation problems) –…
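A minimal Python sketch of the bagging loop above, assuming a generic train(data) function that returns a predictor (the helper names here are placeholders, not from the slides); voting is used for classifiers and averaging for estimators.

    import random
    import statistics
    from collections import Counter

    def bag(train, data, B=50, seed=0):
        """Train B predictors, each on a bootstrap sample of `data`."""
        rng = random.Random(seed)
        n = len(data)
        predictors = []
        for _ in range(B):
            boot = [rng.choice(data) for _ in range(n)]   # sample with replacement
            predictors.append(train(boot))                # `train` is assumed, not defined here
        return predictors

    def predict_by_voting(predictors, x):
        """Combine classifiers: majority vote over the B predictions."""
        votes = Counter(p(x) for p in predictors)
        return votes.most_common(1)[0][0]

    def predict_by_averaging(predictors, x):
        """Combine estimators: average the B numeric predictions."""
        return statistics.mean(p(x) for p in predictors)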

Bagging decision trees 1. Split the data set into a training set T1 and a test set T2. 2. Run bagging using 50 bootstrap samples. 3. Repeat steps 1–2 100 times and calculate the average test-set misclassification rate.
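The protocol above can be approximated with scikit-learn, as in the sketch below; it assumes scikit-learn is available, substitutes a synthetic dataset for the ones Breiman used, and treats the 10% test split as an assumption, so the resulting numbers will not match the slides.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)   # stand-in dataset

    errors = []
    for trial in range(100):                                    # step 3: repeat the cycle
        X1, X2, y1, y2 = train_test_split(X, y, test_size=0.1,  # step 1: train/test split
                                          random_state=trial)
        bagger = BaggingClassifier(DecisionTreeClassifier(),    # step 2: bag 50 trees
                                   n_estimators=50, random_state=trial)
        bagger.fit(X1, y1)
        errors.append(1.0 - bagger.score(X2, y2))               # test-set misclassification rate

    print(sum(errors) / len(errors))                            # average over the 100 trials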

Bagging regression trees Bagging with 25 bootstrap samples. Repeat 100 times.

How many bootstrap samples are needed? Bagging decision trees for the waveform task: the unbagged error rate is 29.0%. Most of the improvement is obtained using only 10 bootstrap samples.

Bagging k-nearest neighbor classifiers 100 bootstrap samples. 100 iterations. Bagging does not help.

Experiment results Bagging works well for “unstable” learning algorithms. Bagging can slightly degrade the performance of “stable” learning algorithms.

Learning algorithms Unstable learning algorithms: small changes in the training set result in large changes in predictions. –Neural network –Decision tree –Regression tree –Subset selection in linear regression Stable learning algorithms: –K-nearest neighbors

Case study

Experiment settings Henderson and Brill ANLP-2000 paper Parser: Collins’s Model 2 (1997) Training data: sections Test data: Section 23 Bagging: Different ways of combining parsing results

Techniques for combining parsers (Henderson and Brill, EMNLP-1999) Parse hybridization: combining the substructures of the input parses –Constituent voting –Naïve Bayes Parser switching: selecting one of the input parses –Similarity switching –Naïve Bayes
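As a rough illustration of constituent voting (a simplified sketch, not Henderson and Brill's actual algorithm or code): treat each input parse as a set of labeled spans and keep any constituent proposed by more than half of the parsers.

    from collections import Counter

    def constituent_voting(parses):
        """Each parse is a set of (label, start, end) constituents.
        Keep a constituent if more than half of the parsers propose it."""
        counts = Counter(c for parse in parses for c in parse)
        return {c for c, k in counts.items() if k > len(parses) / 2.0}

    # Hypothetical example with three parsers; spans are (label, start, end).
    p1 = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
    p2 = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
    p3 = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
    print(constituent_voting([p1, p2, p3]))   # keeps constituents proposed by at least 2 of 3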

Experiment results Baseline (no bagging): Initial (one bag): Final (15 bags): 89.17

Training corpus size effects

Summary The bootstrap is a resampling method. Bagging is directly related to the bootstrap: –It uses bootstrap samples to train multiple predictors. –The outputs of the predictors are combined by voting or other methods. Experiment results: –Bagging is effective for unstable learning methods. –It does not help stable learning methods.

Uncovered issues How do we determine whether a learning method is stable or unstable? Why does bagging work for unstable algorithms?