Classification, Regression and Other Learning Methods CS240B Presentation Peter Huang June 4, 2014

Outline
- Motivation
- Introduction to Data Streams and Concept Drift
- Survey of Ensemble Methods:
  - Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
  - Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
  - Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams
- Summary
- Conclusion

Motivation
- A significant amount of recent research has focused on mining data streams
- Real-world applications include financial data analysis, credit card fraud detection, network monitoring, sensor networks, and many others
- Algorithms for mining data streams have to overcome challenges not seen in traditional data mining, particularly performance constraints and unending data sets
- Traditional algorithms must be made non-blocking, fast and light, and must adapt to data stream issues

Data Streams
A data stream is a continuous stream of data items, in the form of tuples or vectors, that arrive at a high rate and are subject to unknown changes such as concept drift or shift.
Algorithms that process data streams must be:
- Iterative: reading data sequentially
- Efficient: fast and light in computation/memory
- Single-pass: account for the surplus of data
- Adaptive: account for concept drift
- Any-time: able to provide the best answer continuously

Data Stream Classification
Various types of methods are used to classify data streams:
- Single classifier
  - Sliding window on recent data, fixed or variable
  - Naive Bayes, C4.5, RIPPER
  - Support vector machines, neural networks
  - k-NN, linear regression
- Decision trees
  - BOAT algorithm
  - VFDT, Hoeffding tree
  - CVFDT
- Ensemble methods
  - Bagging
  - Boosting
  - Random Forest

Concept Drift
Concept drift is an implicit property of data streams: the concept may change or drift over time due to sudden or gradual changes in the external environment.
Mining changes is one of the core issues of data mining, useful in many real-world applications.
Two types of concept change: gradual and shift.
Methods to adapt to concept drift:
- Ensemble methods: majority or weighted voting
- Exponential forgetting: a forgetting factor discounts old data (a minimal sketch follows)
- Replacement methods: create a new classifier
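To make the forgetting-factor idea concrete, here is a minimal Python sketch, assuming a simple exponentially decayed running error estimate; the decay constant lam and the function name are illustrative choices, not taken from the surveyed papers.

    # Exponential forgetting: older observations contribute exponentially
    # less to a running estimate. Here the estimate is a classifier's error.
    def update_error(prev_error: float, new_error: float, lam: float = 0.95) -> float:
        # lam is the forgetting factor (0 < lam < 1); larger = slower forgetting.
        return lam * prev_error + (1.0 - lam) * new_error

    # Usage: feed per-batch error rates as the stream arrives; after a drift
    # the estimate rises and signals that the model is going stale.
    err = 0.10
    for batch_error in [0.10, 0.12, 0.40, 0.45]:
        err = update_error(err, batch_error)
        print(round(err, 3))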

Types of Concept Drift
Two types of concept change: gradual and shift.
- Shift: change in mean; class/distribution change
- Gradual: change in mean and variance; trends

Ensemble Classifiers
Ensemble methods are one classification approach that naturally handles concept drift.
They combine the predictions of multiple base models, each learned using a base learner.
It is well known that combining multiple models consistently outperforms individual models.
Use either traditional averaging or weighted averaging to classify data stream items (a small voting sketch follows).
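To make the distinction between traditional and weighted averaging concrete, below is a minimal Python sketch of both combination rules; the predictions and weights are made-up illustrative values.

    # Traditional combination: every base classifier gets one equal vote.
    def majority_vote(predictions):
        return max(set(predictions), key=predictions.count)

    # Weighted combination: each vote counts in proportion to its weight,
    # e.g. a weight derived from the classifier's recent accuracy.
    def weighted_vote(predictions, weights):
        scores = {}
        for label, w in zip(predictions, weights):
            scores[label] = scores.get(label, 0.0) + w
        return max(scores, key=scores.get)

    preds = [1, 0, 1]           # predictions from three base classifiers
    weights = [0.9, 0.4, 0.7]   # illustrative per-classifier weights
    print(majority_vote(preds))           # -> 1
    print(weighted_vote(preds, weights))  # -> 1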

Survey of Ensemble Methods
- Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
- Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
- Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams

KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
Approaches the problem of large-scale or streaming classification by building a committee, or ensemble, of classifiers, each constructed on a subset of the available data points.
Essentially introduces the concept of ensemble classification for data streams.
Uses the traditional scheme of averaging predictions.
Later improved by KDD '03, KDD '04, and others.

Ensemble of Classifiers
- Fixed ensemble size (a small, constant number of classifiers)
- A new classifier replaces the lowest-quality classifier in the existing ensemble
- Building blocks are decision trees constructed using C4.5
- An operational parameter is whether or not to prune the trees; in experiments, pruning decreased overall accuracy because of over-fitting
- Adapts to concept drift by changing over time, following a Gaussian-like CDF for gradual change

Streaming Ensemble Pseudocode
while more data points are available:
    read d points, create training set D
    build classifier C_i using D
    evaluate C_{i-1} on D
    evaluate all classifiers in ensemble E on D
    if E not full:
        insert C_{i-1} into E
    else if Quality(C_{i-1}) > Quality(E_j) for some j:
        replace E_j with C_{i-1}
Quality is measured by the ability to classify points in the current test set.
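A minimal Python rendering of the loop above, with scikit-learn decision trees standing in for C4.5; note that, as in the pseudocode, the classifier trained on the previous block is the one judged against (and possibly inserted using) the current block. The capacity constant MAX_ENSEMBLE and the use of plain accuracy as Quality are assumptions for illustration.

    from sklearn.tree import DecisionTreeClassifier

    MAX_ENSEMBLE = 25  # illustrative capacity; SEA fixes a small constant

    def sea_step(ensemble, pending, X_block, y_block):
        # pending is the classifier built on the previous block (None at
        # start); it is evaluated on data it has never seen: this block.
        if pending is not None:
            quality = pending.score(X_block, y_block)
            if len(ensemble) < MAX_ENSEMBLE:
                ensemble.append(pending)
            else:
                quals = [c.score(X_block, y_block) for c in ensemble]
                worst = quals.index(min(quals))
                if quality > quals[worst]:
                    ensemble[worst] = pending  # replace lowest-quality member
        # The classifier trained on this block becomes the next candidate.
        return DecisionTreeClassifier().fit(X_block, y_block)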

Replacement of Existing Classifiers
[Figure: a newly trained classifier replaces the worst member of the existing ensemble, raising the average ensemble quality from 77.4 to 80.4; the process repeats for the next trained classifier.]

Experimental Results: Adult Data

Experimental Results: SEER Data

Experimental Results: Web Data

Experimental Results: Concept Drift

Survey of Ensemble Methods
- Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
- Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
- Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams

KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
A general framework for mining concept-drifting data streams using an ensemble of weighted classifiers.
Improves on ensemble classification by using weighted averaging instead of traditional averaging.
A classifier's weight decreases with its expected error (MSE): w_i = MSE_r - MSE_i, where MSE_r is the MSE of a classifier that predicts randomly.
Eliminates the effect of examples representing outdated concepts by assigning the classifiers trained on them lower weights.
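As a concrete reading of the weight rule, the sketch below computes MSE_r from the class distribution, following the random-prediction formula MSE_r = sum over classes of p(c)(1 - p(c))^2 as summarized here, and then derives w_i; the numbers are illustrative.

    # MSE of a classifier that predicts class c with probability p(c):
    # MSE_r = sum over classes of p(c) * (1 - p(c))^2.
    def mse_random(class_probs):
        return sum(p * (1.0 - p) ** 2 for p in class_probs)

    # w_i = MSE_r - MSE_i: members that beat random guessing get positive
    # weight; members worse than random get weight <= 0 (effectively dropped).
    def classifier_weight(mse_i, class_probs):
        return mse_random(class_probs) - mse_i

    # Example: two balanced classes, a member with MSE 0.05 on the newest block.
    print(classifier_weight(0.05, [0.5, 0.5]))  # MSE_r = 0.25, so w = 0.2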

Ensemble of Classifiers
- Fixed ensemble size: the top K classifiers are kept
- New classifiers replace the lowest-weighted classifiers in the existing ensemble
- Building blocks are decision trees constructed using C4.5
- Adapts to concept drift by removing incorrect classifiers and/or reducing their weights

Streaming Ensemble Pseudocode
while more data points are available:
    read d points, create training set S
    build classifier C' from S
    compute error rate of C' via cross-validation on S
    derive weight w' for C': w' = MSE_r - MSE'
    for each classifier C_i in C:
        apply C_i on S to derive MSE_i, compute weight w_i
    C <- top K weighted classifiers in C ∪ {C'}
    return C
Quality is measured by the ability to classify points in the current test set.
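A Python sketch of this loop, again with scikit-learn trees standing in for C4.5; cross_val_score approximates the cross-validated error estimate, 0/1 error is used as a simple stand-in for MSE, and K is an illustrative capacity, so treat this as a sketch of the scheme rather than the paper's exact implementation.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    K = 10  # illustrative number of top-weighted classifiers to keep

    def mse_random(y):
        # MSE of random guessing according to the block's class distribution.
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return float(np.sum(p * (1.0 - p) ** 2))

    def weighted_bagging_step(ensemble, X, y):
        # ensemble is a list of (classifier, weight) pairs; (X, y) is block S.
        new_clf = DecisionTreeClassifier().fit(X, y)
        # Estimate the new classifier's error by cross-validation on S.
        new_err = 1.0 - cross_val_score(DecisionTreeClassifier(), X, y, cv=3).mean()
        mse_r = mse_random(y)
        scored = [(new_clf, mse_r - new_err)]
        for clf, _ in ensemble:
            err = 1.0 - clf.score(X, y)  # 0/1 error as a stand-in for MSE_i
            scored.append((clf, mse_r - err))
        scored.sort(key=lambda cw: cw[1], reverse=True)
        return scored[:K]  # keep the top K weighted classifiers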

Data Expiration Problem
The challenge is to identify, in a timely manner, data in the training set that are no longer consistent with the current concepts.
Data are discarded after they become old, that is, after a fixed period of time T has passed since their arrival.
If T is large, the training set is likely to contain outdated concepts, which reduces classification accuracy.
If T is small, the training set may not have enough data, and as a result the learned model will likely carry a large variance due to over-fitting.

Expiration Problem Illustrated

Replacement of Existing Classifiers
[Figure: replacement of existing classifiers in the stream; newer classifiers appear on the right, and the numbers represent MSE. A new classifier trained on the latest examples replaces the worst-performing member of the ensemble.]

Experimental Results: Average Error

Experimental Results: Error Rates

Experimental Results: Concept Drift

Survey of Ensemble Methods
- Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
- Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
- Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams

KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams
A novel adaptive boosting ensemble method for the problem of continuously mining data streams.
Improves on ensemble classification by boosting incorrectly classified samples: each sample the ensemble misclassifies receives weight w_i = (1 - e_j)/e_j, where e_j is the ensemble's error rate on the current block.
Uses the traditional scheme of averaging predictions.
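A worked instance of the boosting rule, with illustrative numbers: with block error rate e_j = 0.4, each misclassified sample is up-weighted to (1 - 0.4)/0.4 = 1.5 while correctly classified samples keep weight 1.

    def boost_weights(correct, e_j):
        # correct: booleans, True where the ensemble predicted correctly.
        # Misclassified samples are up-weighted by (1 - e_j) / e_j.
        up = (1.0 - e_j) / e_j
        return [1.0 if ok else up for ok in correct]

    print(boost_weights([True, False, True, False, True], e_j=0.4))
    # -> [1.0, 1.5, 1.0, 1.5, 1.0]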

Ensemble of Classifiers
- Fixed ensemble size: the most recent M classifiers are kept
- Boosting the weights of incorrectly classified samples provides a number of formal guarantees on performance
- Building blocks are decision trees constructed using C4.5
- Adapts to concept drift via change detection, restarting the ensemble from scratch when a change is detected

Streaming Ensemble Pseudocode
E_b = {C_1, ..., C_m}; B_j = {(x_1, y_1), ..., (x_n, y_n)}
while more data points are available:
    read n points, create training block B_j
    compute the ensemble prediction on each of the n points
    change detection: E_b <- {} if change detected
    if E_b != {}:
        compute error rate e_j of E_b on B_j
        set misclassified sample weights w_i = (1 - e_j)/e_j
    else:
        w_i = 1
    learn new classifier C_{m+1} from B_j
    update E_b <- E_b ∪ {C_{m+1}}; remove C_1 if m = M
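A Python sketch of this loop under the same stand-ins as before (scikit-learn trees for C4.5, integer class labels and numpy arrays assumed); the change detector is passed in as a flag, since the paper's two-stage test is summarized on the next slide, and M is illustrative.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    M = 8  # illustrative maximum ensemble size

    def adaptive_boost_step(ensemble, X, y, change_detected):
        # ensemble is a list of trained trees; (X, y) is block B_j.
        if change_detected:
            ensemble = []  # restart from scratch on detected concept change
        weights = np.ones(len(y))
        if ensemble:
            # Ensemble prediction on the block by unweighted majority vote.
            votes = np.array([clf.predict(X) for clf in ensemble]).astype(int)
            pred = np.array([np.bincount(col).argmax() for col in votes.T])
            wrong = pred != y
            e_j = max(wrong.mean(), 1e-6)          # guard against e_j = 0
            weights[wrong] = (1.0 - e_j) / e_j     # boost misclassified samples
        new_clf = DecisionTreeClassifier().fit(X, y, sample_weight=weights)
        ensemble.append(new_clf)
        return ensemble[-M:]  # drop the oldest member once m = M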

Change Detection
To detect change, test the null hypothesis H0 (the concept is unchanged) against the alternative hypothesis H1 (the concept has changed).
Two-stage method: a cheap significance screen first, followed by a hypothesis test (one minimal illustrative realization follows).
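Below is one plausible minimal realization of such a two-stage check, not the paper's exact statistics: stage one screens on the raw increase in error, and stage two applies a one-sided z-test of whether the block error rate significantly exceeds the reference rate; the thresholds are illustrative.

    import math

    def change_detected(err_ref, err_block, n, z_crit=2.33, screen=0.05):
        # Stage 1: cheap screen, ignore small fluctuations in error.
        if err_block - err_ref < screen:
            return False
        # Stage 2: one-sided z-test of H0 "error unchanged" vs H1 "error rose",
        # using the normal approximation to the binomial over n block samples.
        se = math.sqrt(err_ref * (1.0 - err_ref) / n)
        z = (err_block - err_ref) / max(se, 1e-12)
        return z > z_crit  # reject H0 at roughly the 1% level

    print(change_detected(err_ref=0.10, err_block=0.25, n=500))  # -> True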

Replacement of Existing Classifiers
[Figure: a new classifier trained on the boosted block joins the ensemble; newer classifiers appear on the right, and the numbers represent accuracy.]

Experimental Results: Concept Drift

Experimental Results: Comparison

Experimental Results: Time and Space

Summary
- Bagging: KDD '01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification
  Introduced the bagging ensemble for data streams
- Weighted Bagging: KDD '03: Mining Concept-Drifting Data Streams Using Ensemble Classifiers
  Adds weighting to improve accuracy and handle drift
- Adaptive Boosting: KDD '04: Fast and Light Boosting for Adaptive Mining of Data Streams
  Adds boosting to further improve accuracy and speed

Thank You. Questions?

Sources
Adams, Niall M., et al. "Efficient Streaming Classification Methods." 2010.
Street, W. Nick, and Yong Seog Kim. "A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification." Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2001.
Wang, Haixun, et al. "Mining Concept-Drifting Data Streams Using Ensemble Classifiers." Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2003.
Chu, Fang, and Carlo Zaniolo. "Fast and Light Boosting for Adaptive Mining of Data Streams." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004.