Classification and Prediction: Ensemble Methods
Bamshad Mobasher, DePaul University


Ensemble Methods
• Ensemble methods
  ◦ Use a combination of models to increase accuracy
  ◦ Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M* (see the voting sketch below)
• Popular ensemble methods
  ◦ Bagging: averaging the prediction over a collection of classifiers
  ◦ Boosting: weighted vote with a collection of classifiers
  ◦ Ensemble: combining a set of heterogeneous classifiers
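As an illustrative sketch (not part of the original slides), the voting step that combines k trained models M1, …, Mk can be written in a few lines of Python; the helper name `majority_vote` is hypothetical.

```python
import numpy as np
from collections import Counter

def majority_vote(models, X):
    """Combine k fitted classifiers by plurality vote.

    models: list of fitted classifiers, each with a .predict(X) method
    X: feature matrix of shape (n_samples, n_features)
    """
    # Each model's predictions, stacked into shape (k, n_samples)
    all_preds = np.array([m.predict(X) for m in models])
    # For each sample, return the most frequently predicted label
    return np.array([Counter(col).most_common(1)[0][0]
                     for col in all_preds.T])
```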

Bagging: Bootstrap Aggregation
• Analogy: diagnosis based on multiple doctors' majority vote
• Training
  ◦ Given a set D of d instances, at each iteration i, a training set Di of d instances is sampled with replacement from D (i.e., bootstrap sampling)
  ◦ A classifier model Mi is learned from each training set Di
• Classification: to classify an unknown sample X
  ◦ Each classifier Mi returns its class prediction
  ◦ The bagged classifier M* counts the votes and assigns the class with the most votes to X
• Prediction: can be applied to the prediction of continuous values by taking the average of the classifiers' predictions for a given test tuple
• Accuracy
  ◦ Often significantly better than a single classifier derived from D
  ◦ For noisy data: not considerably worse, and more robust
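A minimal bagging sketch, assuming scikit-learn is available and reusing the `majority_vote` helper sketched above; decision trees are used as base classifiers only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def bagging_fit(X, y, k=25, random_state=0):
    """Train k trees, each on a bootstrap sample D_i of (X, y)."""
    rng = np.random.RandomState(random_state)
    models = []
    for _ in range(k):
        # D_i: d instances drawn with replacement from D
        X_i, y_i = resample(X, y, replace=True, random_state=rng)
        models.append(DecisionTreeClassifier().fit(X_i, y_i))
    return models
```

The bagged prediction for a test set is then `majority_vote(models, X_test)`; for numeric prediction, replace the vote with the mean of the models' outputs.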

Boosting
• Analogy: consult several doctors and combine their weighted diagnoses, where each weight is assigned based on that doctor's previous diagnosis accuracy
• How boosting works:
  ◦ Weights are assigned to each training tuple
  ◦ A series of k classifiers is iteratively learned
  ◦ After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
  ◦ The final M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy
• The boosting algorithm can be extended for numeric prediction
• Compared to bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data
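For experimentation, scikit-learn's AdaBoostClassifier implements this accuracy-weighted voting scheme; the synthetic data set below is a placeholder, not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 50 weak learners combined by an accuracy-weighted vote
clf = AdaBoostClassifier(n_estimators=50).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```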

Adaboost (Freund and Schapire, 1997)
• Given a set of d class-labeled instances: (X1, y1), …, (Xd, yd)
• Initially, all instance weights are set to the same value (1/d)
• Generate k classifiers in k rounds. At round i:
  ◦ D is sampled (with replacement) to form a training set Di of the same size
  ◦ Each tuple's chance of being selected is proportional to its weight
  ◦ A classification model Mi is derived from Di
  ◦ Its error rate is calculated using Di as a test set
  ◦ If an instance is misclassified, its weight is increased; otherwise it is decreased
• The error rate of classifier Mi is the weighted sum of the errors of the misclassified instances:
  $\mathrm{error}(M_i) = \sum_{j=1}^{d} w_j \cdot \mathrm{err}(X_j)$
  where err(Xj) is the misclassification error of instance Xj (1 if Xj is misclassified, 0 otherwise)
• The weight of classifier Mi's vote is $\log \frac{1 - \mathrm{error}(M_i)}{\mathrm{error}(M_i)}$
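A condensed sketch of one boosting round using these formulas. It reweights instances directly rather than resampling (a common variant of the procedure above), assumes labels in {-1, +1}, and uses the standard binary-AdaBoost vote weight, which carries an extra factor of 1/2 relative to the slide's formula.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_round(X, y, w):
    """One AdaBoost round. y must take values in {-1, +1}.

    Returns the fitted weak learner M_i, its vote weight alpha_i,
    and the updated, renormalized instance weights."""
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)               # train on weighted data
    miss = stump.predict(X) != y                   # err(X_j): 1 if misclassified
    error = np.sum(w * miss) / np.sum(w)           # error(M_i) = sum_j w_j * err(X_j)
    error = np.clip(error, 1e-10, 1 - 1e-10)       # guard the log below
    alpha = 0.5 * np.log((1 - error) / error)      # vote weight of M_i
    # Increase weights of misclassified instances, decrease the rest
    w = w * np.exp(alpha * np.where(miss, 1.0, -1.0))
    return stump, alpha, w / w.sum()
```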

Random Forest (Breiman 2001)
• Random Forest:
  ◦ Each classifier in the ensemble is a decision tree, generated using a random selection of attributes at each node to determine the split
  ◦ During classification, each tree votes and the most popular class is returned
• Two methods to construct a Random Forest:
  ◦ Forest-RI (random input selection): at each node, randomly select F attributes as candidates for the split. The CART methodology is used to grow the trees to maximum size
  ◦ Forest-RC (random linear combinations): creates new attributes (features) that are linear combinations of the existing attributes (this reduces the correlation between individual classifiers)
• Comparable in accuracy to Adaboost, but more robust to errors and outliers
• Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting
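Forest-RI corresponds roughly to scikit-learn's RandomForestClassifier with a small max_features; the iris data set below is a placeholder used only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# F = sqrt(number of attributes) candidate attributes per split,
# with trees grown deep (no pruning), as in Forest-RI
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print("5-fold CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```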

Random Forest
• Features and advantages
  ◦ One of the most accurate learning algorithms available for many data sets
  ◦ Runs fairly efficiently on large data sets and can be parallelized
  ◦ Can handle many variables without variable deletion
  ◦ Gives estimates of which variables are important in the classification (see the sketch below)
  ◦ Has methods for balancing error on class-imbalanced data sets
  ◦ Generated forests can be saved for future use on other data
• Disadvantages
  ◦ For data that include categorical variables with different numbers of levels, random forests are biased in favor of attributes with more levels, so variable importance scores are not reliable for this type of data
  ◦ Random forests have been observed to overfit on some data sets with noisy classification/regression tasks
  ◦ A large number of trees may make the algorithm slow for real-time prediction
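The variable-importance estimates mentioned above are exposed in scikit-learn as feature_importances_; a self-contained sketch follows (keeping in mind the bias toward high-cardinality attributes noted in the disadvantages).

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

# Impurity-based importance of each attribute, highest first
for name, score in sorted(zip(data.feature_names, rf.feature_importances_),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```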

Class-Imbalanced Data Sets
• Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud, oil-spill, or fault detection
• Traditional methods assume a balanced class distribution and equal error costs, and so are not suitable for class-imbalanced data
• Typical methods for imbalanced data in two-class classification (a threshold-moving sketch follows this list):
  ◦ Oversampling: re-sample data from the positive class
  ◦ Under-sampling: randomly eliminate tuples from the negative class
  ◦ Threshold-moving: move the decision threshold t so that rare-class tuples are easier to classify, leaving less chance of costly false-negative errors
  ◦ Ensemble techniques: combine multiple classifiers
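Threshold-moving is the simplest of these to sketch: instead of the default 0.5 cutoff on the predicted probability, lower the threshold for the rare class. The data set and the threshold value 0.2 below are illustrative choices, not from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class data with a 5% rare positive class
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]   # estimated P(rare class)

# Moving the decision threshold t from 0.5 down to 0.2 predicts more
# positives, trading false positives for fewer costly false negatives
for t in (0.5, 0.2):
    pred = (proba >= t).astype(int)
    recall = (pred[y_te == 1] == 1).mean()
    print(f"t={t}: recall on rare class = {recall:.2f}")
```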