When Efficient Model Averaging Out-Performs Bagging and Boosting. Ian Davidson, SUNY Albany; Wei Fan, IBM T.J. Watson.

Ensemble Techniques Techniques such as boosting and bagging are methods of combining models. They are used extensively in ML and DM and seem to work well in a large variety of situations. But model averaging is the correct Bayesian method of using multiple models. Does model averaging have a place in ML and DM?

What is Model Averaging? Posterior weighting: class-probability integration over the model space, i.e., the class probabilities of the individual models are averaged, weighted by each model's posterior. –Removes model uncertainty by averaging over models rather than committing to one. –Prohibitive for large model spaces such as that of decision trees.
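For reference, the averaging rule this slide describes can be written out as below. This is the standard Bayesian model averaging formulation, supplied for clarity rather than copied from the slides.

```latex
% Bayesian model averaging: integrate class probabilities over the model
% space \mathcal{M}, weighting each model M by its posterior given data D.
P(y \mid x, D) = \sum_{M \in \mathcal{M}} P(y \mid x, M)\, P(M \mid D),
\qquad P(M \mid D) \propto P(D \mid M)\, P(M)
```

Summing over every model in a space as large as that of all decision trees is what makes the exact computation prohibitive, which motivates the efficient approximations on the next slide.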

Efficient Model Averaging: PBMA and Random DT PBMA (Davidson 04): parametric bootstrap model averaging –Use a parametric model to generate multiple bootstrap samples from a single training set. Random Decision Tree (Fan et al 03) –Construct each tree's structure randomly (see the sketch below) Categorical feature used at most once in a decision path Random threshold for continuous features. –Leaf node statistics estimated from data. –Average the class probabilities of multiple trees.
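Below is a minimal sketch of the random decision tree procedure just outlined. The class name, the stopping rule, and the use of the training data's value ranges for the random thresholds are illustrative assumptions, not the authors' implementation; PBMA would be sketched analogously by training one model per parametric-bootstrap replicate and averaging the resulting class probabilities.

```python
import numpy as np

class RandomDecisionTree:
    """Minimal sketch of a random decision tree (after Fan et al. 2003).

    The structure is chosen randomly: each internal node tests a randomly
    picked feature, a categorical feature is used at most once per decision
    path, and a continuous feature gets a fresh random threshold.  Only the
    leaf statistics (class frequencies) are estimated from the training data.
    """

    def __init__(self, max_depth=10, min_leaf=2, rng=None):
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X, y, categorical=()):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.categorical_ = set(categorical)
        self.root_ = self._grow(X, y, used=frozenset(), depth=0)
        return self

    def _leaf(self, y):
        counts = np.array([(y == c).sum() for c in self.classes_], float)
        return {"proba": counts / max(counts.sum(), 1.0)}

    def _grow(self, X, y, used, depth):
        candidates = [f for f in range(X.shape[1]) if f not in used]
        if depth >= self.max_depth or len(y) < self.min_leaf or not candidates:
            return self._leaf(y)
        f = self.rng.choice(candidates)
        if f in self.categorical_:
            thr = self.rng.choice(np.unique(X[:, f]))   # random category value
            mask = X[:, f] == thr                       # equality test
            used = used | {f}                           # categorical: once per path
        else:
            lo, hi = X[:, f].min(), X[:, f].max()
            thr = self.rng.uniform(lo, hi)              # random threshold
            mask = X[:, f] <= thr
        if mask.all() or not mask.any():                # degenerate split -> leaf
            return self._leaf(y)
        return {"feat": int(f), "thr": thr, "cat": f in self.categorical_,
                "left": self._grow(X[mask], y[mask], used, depth + 1),
                "right": self._grow(X[~mask], y[~mask], used, depth + 1)}

    def predict_proba(self, X):
        return np.array([self._proba(self.root_, np.asarray(x, float)) for x in X])

    def _proba(self, node, x):
        while "proba" not in node:
            go_left = (x[node["feat"]] == node["thr"]) if node["cat"] \
                      else (x[node["feat"]] <= node["thr"])
            node = node["left"] if go_left else node["right"]
        return node["proba"]


def averaged_random_trees(X_train, y_train, X_test, n_trees=10, **tree_kw):
    """Average class-probability estimates over several random trees."""
    trees = [RandomDecisionTree(**tree_kw).fit(X_train, y_train)
             for _ in range(n_trees)]
    return np.mean([t.predict_proba(X_test) for t in trees], axis=0)
```

Because no single tree is trusted to have the right structure, averaging the probability estimates of many random trees is itself a crude form of model averaging, which is the connection the slide draws.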

Our Empirical Study Idea: when model uncertainty occurs, model averaging should perform well. Four specific but common situations where factoring in model uncertainty is beneficial: –Class label noise –Many-label problems –Sample selection bias –Small data sets

Class Label Noise Randomly flip 10% of labels
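A small sketch of how this kind of label noise can be injected into a training set; the function name and the use of NumPy are assumptions for illustration, not the authors' code.

```python
import numpy as np

def flip_labels(y, noise_rate=0.10, rng=None):
    """Return a copy of y with roughly noise_rate of the labels flipped."""
    rng = rng if rng is not None else np.random.default_rng(0)
    y_noisy = np.asarray(y).copy()
    classes = np.unique(y_noisy)
    flip = np.where(rng.random(len(y_noisy)) < noise_rate)[0]  # ~10% of examples
    for i in flip:
        others = classes[classes != y_noisy[i]]                # any other label
        y_noisy[i] = rng.choice(others)
    return y_noisy
```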

Data Set with Many Classes

Biased Training Sets See ICDM 2005 for a formal analysis, KDD 2006 on estimating accuracy under bias, and ICDM 2006 for a case study.

Universe of Examples Two classes, red and green: red if f2 > f1, green if f2 <= f1.
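A small sketch of how such a universe and the two training samples on the next slide could be generated; the feature range, the sample sizes, and the particular selection bias used for the biased sample are illustrative assumptions, since the transcript does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Universe of examples: two features f1, f2; an example is "red" iff f2 > f1
# and "green" otherwise (the rule stated on the slide).
F = rng.uniform(0.0, 1.0, size=(100_000, 2))
labels = np.where(F[:, 1] > F[:, 0], "red", "green")

# Unbiased training sample: draw examples uniformly at random.
unbiased = rng.choice(len(F), size=1_000, replace=False)

# Biased training sample: make the inclusion probability depend on f1, so the
# training distribution no longer matches the universe.  This particular bias
# is only an illustration; the authors' actual sampling scheme is not given here.
keep = rng.random(len(F)) < 0.02 * (1.0 - F[:, 0])
biased = np.where(keep)[0]

print(len(unbiased), len(biased), labels[unbiased][:5])
```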

Unbiased and Biased Samples

Single Decision Tree Unbiased: 97.1%, Biased: 92.1%

Random Decision Tree Unbiased: 96.9%, Biased: 95.9%

Bagging Unbiased: 97.82%, Biased: 93.52%

PBMA Unbiased: 99.08%, Biased: 94.55%

Boosting Unbiased: %, Biased: 92.7%

Scope of This Paper Identifies conditions where model averaging should outperform bagging and boosting. Empirically verifies these claims. Other questions: –Why do bagging and boosting perform badly in these conditions?