The Problem of Concept Drift: Definitions and Related Work Alexey Tsymbal (paper, April 29, 2004)
Abstract A. Tsymbal, “The problem of concept drift: definitions and related work”. Real-World Problem Concepts are often not stable but change with time. –Weather prediction –Customer preferences The underlying data distribution may also change with time.
Definitions and Peculiarities Concept Drift –Changes in the hidden context that can induce more or less radical changes in the target concept. The cause of the change is hidden and not known a priori. –For example, the effect of a car accident on a yearly budget. Concepts Often Reoccur –For example, weather patterns such as El Niño and La Niña. Hidden Context –A dependency that is not given explicitly in the form of predictive features.
An Ideal Concept Drift Handling System –Quickly adapts to concept drift. –Is robust to noise and distinguishes it from concept drift. –Recognizes and reacts to recurring contexts, such as seasonal differences.
Types of Concept Drift There are two kinds of concept drift: –Sudden (abrupt, instantaneous) –Gradual (moderate, slow) Changes in the hidden context may change not only the target concept but also the underlying data distribution. –For example, a week of record warm temperatures.
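The difference between sudden and gradual drift can be sketched as the probability that an instance arriving at time t is labelled by the new concept rather than the old one. This is a minimal Python sketch, not taken from the paper: the function names and the `width` parameter are hypothetical, and modelling gradual drift as a linear ramp is an assumption (a sigmoid would serve equally well).

```python
import random

def prob_new_concept(t, change_point, width=0.0):
    """Probability that the instance arriving at time t follows the new concept.

    width <= 0 models sudden drift (a step at change_point); width > 0 models
    gradual drift as a linear ramp centred on change_point (an assumption).
    """
    if width <= 0:
        return 1.0 if t >= change_point else 0.0
    # clip the ramp to the valid probability range [0, 1]
    return min(1.0, max(0.0, (t - change_point) / width + 0.5))

def sample_concept(t, change_point, width=0.0, rng=random):
    """Draw which concept ('old' or 'new') labels the instance at time t."""
    return "new" if rng.random() < prob_new_concept(t, change_point, width) else "old"
```

With `width=0` every instance before the change point comes from the old concept and every later one from the new concept; with `width=50` the two concepts are mixed over roughly 50 time steps.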
Virtual Concept Drift –The need to change the current model because the data distribution has changed. –Also called sampling shift. Real Concept Drift –A change in the target concept itself. –Also called concept shift. Virtual concept drift often occurs together with real concept drift.
Systems for Handling Concept Drift Three main approaches –Instance Selection –Instance Weighting –Ensemble Learning (learning with multiple concept descriptions)
Systems for Handling Concept Drift (Instance Selection) The goal is to select instances relevant to the current concept. Typically, the learner generalizes from a window that moves over recently arrived instances and uses the learnt concepts for prediction only in the immediate future. –The window size can be fixed or determined heuristically (adaptive window).
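A fixed-size window can be sketched in a few lines of Python. This is a hypothetical illustration, not the method of any particular system: the class name is invented, and a 1-nearest-neighbour rule is assumed as the learner inside the window.

```python
import math
from collections import deque

class SlidingWindowNN:
    """1-nearest-neighbour classifier that only remembers the most recent
    `window_size` instances (fixed-size window; hypothetical illustration)."""

    def __init__(self, window_size):
        # a deque with maxlen discards the oldest instance automatically
        self.window = deque(maxlen=window_size)

    def learn(self, x, y):
        self.window.append((x, y))

    def predict(self, x):
        if not self.window:
            return None
        # label of the stored instance closest to x (Euclidean distance)
        _, label = min(self.window, key=lambda item: math.dist(item[0], x))
        return label

learner = SlidingWindowNN(window_size=3)
for v in (0.0, 1.0, 2.0):
    learner.learn((v,), "old")
print(learner.predict((1.1,)))  # old
for v in (0.0, 1.0, 2.0):
    learner.learn((v,), "new")
print(learner.predict((1.1,)))  # new: the old instances have left the window
```

An adaptive variant would shrink the window when drift is suspected and let it grow otherwise; that heuristic is not shown here.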
Systems for Handling Concept Drift (Instance Selection) Case-base editing strategies in case-based reasoning, which delete noisy, irrelevant, and redundant cases, are also considered a form of instance selection.
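One well-known noise-removing editing rule is Wilson's editing: a case is deleted when its label disagrees with the majority of its k nearest neighbours. The Python sketch below shows that rule only as an example of case-base editing. It is not necessarily a strategy used in the surveyed systems, and the function name is hypothetical. An odd k is assumed so that two-class votes cannot tie.

```python
import math
from collections import Counter

def wilson_edit(cases, k=3):
    """Drop cases whose label disagrees with the majority label of their
    k nearest other cases. `cases` is a list of (features, label) pairs."""
    kept = []
    for i, (x, y) in enumerate(cases):
        others = [c for j, c in enumerate(cases) if j != i]
        neighbours = sorted(others, key=lambda c: math.dist(c[0], x))[:k]
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept

cases = [((0.0,), "a"), ((1.0,), "a"), ((2.0,), "a"), ((1.5,), "b"),
         ((10.0,), "b"), ((11.0,), "b"), ((12.0,), "b")]
print(wilson_edit(cases))  # the isolated ((1.5,), 'b') case is dropped
```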
Systems for Handling Concept Drift (Instance Weighting) Uses the ability of some learning algorithms, such as support vector machines (SVMs), to process weighted instances. Instances can be weighted by: –Age –Relevance to the current concept Instance weighting has been reported to handle concept drift worse than analogous instance selection techniques. –Probably because of overfitting the data.
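Weighting by age can be sketched with an exponential decay, so that an instance of age a gets weight decay ** a. The decay scheme, the parameter value, and the function names below are assumptions for illustration. A learner that accepts per-instance weights, such as an SVM, would receive these weights during training. Here a weighted class tally stands in for that learner.

```python
from collections import defaultdict

def age_weights(n, decay=0.9):
    """Weights for n instances ordered oldest -> newest: decay ** age,
    so the newest instance (age 0) gets weight 1."""
    return [decay ** (n - 1 - i) for i in range(n)]

def weighted_class_scores(labels, decay=0.9):
    """Total age weight per class label; recent instances count more."""
    scores = defaultdict(float)
    for label, w in zip(labels, age_weights(len(labels), decay)):
        scores[label] += w
    return dict(scores)

labels = ["a"] * 6 + ["b"] * 3       # "b" is the minority but the most recent
scores = weighted_class_scores(labels, decay=0.5)
print(max(scores, key=scores.get))   # b
```

An unweighted count would favour "a" (6 against 3). With a decay of 0.5, the three recent "b" instances outweigh the six older "a" instances.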
Systems for Handling Concept Drift (Ensemble Learning) Maintains a set of concept descriptions and either –combines their predictions using voting or weighted voting, or –selects the most relevant description. Complicated concept descriptions can also be produced iteratively using feature construction, guided by relevance.
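The two ways of using the set of concept descriptions can be contrasted in a short Python sketch. The function names and the example weights are hypothetical. Each member's weight is assumed to reflect its relevance to the current concept.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Combine member predictions by weighted voting: return the label
    with the largest total weight."""
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)

def most_relevant(predictions, weights):
    """Alternative: use only the prediction of the highest-weighted member."""
    best = max(range(len(weights)), key=weights.__getitem__)
    return predictions[best]

preds, weights = ["a", "b", "b"], [0.6, 0.5, 0.4]
print(weighted_vote(preds, weights))  # b (0.9 against 0.6)
print(most_relevant(preds, weights))  # a (the single best member)
```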
All incremental ensemble approaches use criteria, normally based on the base models’ consistency with the current data, to dynamically delete, reactivate, or create ensemble members.
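Member management of this kind can be sketched in the spirit of Dynamic Weighted Majority (Kolter and Maloof). A member that misclassifies a new instance has its weight multiplied by `beta`. Weights are then normalised, and members whose weight falls below `theta` are deleted. A new member is created whenever the ensemble as a whole errs. This is a simplified sketch, not a faithful reimplementation: it updates on every instance, omits reactivation, and uses a toy majority-label base learner. All class names are hypothetical.

```python
from collections import Counter

class MajorityLearner:
    """Toy incremental base learner: predicts the most frequent label seen."""
    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

class DWMLikeEnsemble:
    def __init__(self, beta=0.5, theta=0.1, make_learner=MajorityLearner):
        self.beta = beta      # weight multiplier applied after a member's mistake
        self.theta = theta    # members below this normalised weight are deleted
        self.make_learner = make_learner
        self.members = []     # [learner, weight] pairs

    def predict(self, x):
        totals = Counter()
        for learner, w in self.members:
            label = learner.predict(x)
            if label is not None:
                totals[label] += w
        return max(totals, key=totals.get) if totals else None

    def update(self, x, y):
        """Process one labelled instance; return the ensemble's prediction for it."""
        # 1. penalise members that are inconsistent with the new instance
        for member in self.members:
            if member[0].predict(x) != y:
                member[1] *= self.beta
        global_pred = self.predict(x)
        # 2. normalise so the best member has weight 1, then delete weak members
        if self.members:
            top = max(w for _, w in self.members)
            for member in self.members:
                member[1] /= top
            self.members = [m for m in self.members if m[1] >= self.theta]
        # 3. create a new member when the ensemble as a whole was wrong
        if global_pred != y:
            self.members.append([self.make_learner(), 1.0])
        # 4. every surviving member learns the new instance
        for learner, _ in self.members:
            learner.learn(x, y)
        return global_pred

ensemble = DWMLikeEnsemble(beta=0.5, theta=0.1)
stream = ["a"] * 5 + ["b"] * 5      # the concept switches after five instances
predictions = [ensemble.update(None, y) for y in stream]
print(predictions)  # [None, 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
print(len(ensemble.members))  # 1
```

After the switch, the member trained on the old concept keeps erring until its weight falls below `theta` and it is deleted. The member created at the first error then carries the ensemble.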
Base Learning Algorithms –Rule-based learning –Decision trees, including incremental decision trees –Naïve Bayes –Support vector machines (SVMs) –Radial basis function (RBF) networks –Instance-based learning
Global Eager Learners –Unable to adapt to local concept drift. Concept drift is often local –Record high temperatures in one part of the world do not necessarily mean that temperatures around the globe are higher. Local Lazy Learning –Able to adapt well to local concept drift because of its local nature. –Performs well with disjoint concepts. –Easy to update (case-based learners). –Allows easy sharing of knowledge for some problems, since multiple distributed case bases are easier to maintain.
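Why a lazy learner copes with local drift can be sketched with a minimal 1-nearest-neighbour case base. Updating the case base in one region of the input space leaves predictions elsewhere untouched. The class, its methods, and the temperature-style labels are hypothetical illustrations.

```python
import math

class CaseBase:
    """Minimal lazy (1-nearest-neighbour) learner: learning is storing cases."""
    def __init__(self):
        self.cases = []

    def add(self, x, y):
        self.cases.append((x, y))

    def retire(self, predicate):
        """Delete every stored case for which predicate(x, y) is true."""
        self.cases = [(x, y) for x, y in self.cases if not predicate(x, y)]

    def predict(self, x):
        return min(self.cases, key=lambda c: math.dist(c[0], x))[1]

cb = CaseBase()
for i in range(10):                     # one "region" per integer position
    cb.add((float(i),), "normal")
cb.retire(lambda x, y: x[0] >= 7)       # drift affects only positions 7..9
for v in (7.0, 8.0, 9.0):
    cb.add((v,), "record high")
print(cb.predict((2.2,)))  # normal: the unaffected region keeps its old cases
print(cb.predict((8.1,)))  # record high
```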
Common Testing Datasets STAGGER and Moving Hyperplane –Allow controlling the type and rate of concept drift, context recurrence, the presence of noise, and irrelevant attributes. –Do not allow checking scalability.
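A STAGGER-style stream can be generated with the standard library alone. Each object has a size (small, medium, large), a color (red, green, blue), and a shape (square, circular, triangular). The three target concepts usually used are: size = small and color = red; color = green or shape = circular; size = medium or large. The generator's name and parameters are hypothetical. The drift schedule (an abrupt switch every `change_every` instances, cycling through the concepts so that contexts recur) and the label-flipping noise model are also assumptions.

```python
import random

SIZES = ("small", "medium", "large")
COLORS = ("red", "green", "blue")
SHAPES = ("square", "circular", "triangular")

# the three target concepts commonly used with STAGGER
CONCEPTS = (
    lambda size, color, shape: size == "small" and color == "red",
    lambda size, color, shape: color == "green" or shape == "circular",
    lambda size, color, shape: size in ("medium", "large"),
)

def stagger_stream(n, change_every, noise=0.0, seed=0):
    """Yield n ((size, color, shape), label) pairs. The concept switches
    abruptly every `change_every` instances and cycles, so contexts recur.
    With probability `noise` the label is flipped."""
    rng = random.Random(seed)
    for i in range(n):
        concept = CONCEPTS[(i // change_every) % len(CONCEPTS)]
        attrs = (rng.choice(SIZES), rng.choice(COLORS), rng.choice(SHAPES))
        label = concept(*attrs)
        if rng.random() < noise:
            label = not label
        yield attrs, label

for attributes, label in stagger_stream(5, change_every=2, noise=0.1, seed=42):
    print(attributes, label)
```

Irrelevant attributes could be added by appending extra random values that no concept consults.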
Real-World Test Problems –Flight simulator data –Web page access data –Text Retrieval Conference (TREC) –Credit card fraud data –Breast cancer data –Anonymous web browsing –US Census Bureau data –Email data Unfortunately, most real-world data sets contain little concept drift.
Theoretical Results –A maximal frequency of concept changes (rate of drift) that is acceptable by any learner implies a lower bound on the size of the window needed for drifting concepts to be learnable. –It is sufficient for a learner to see a fixed number of the most recent instances. –However, the large window sizes given by the theoretical bounds would be impractical to use.
Incremental (Online) Learning vs. Batch Learning Most algorithms for handling concept drift consider incremental (online) learning environments rather than batch learning. –Because real-life data often needs to be processed online. –Data streams: incremental learning –Databases: batch learning
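Processing a data stream incrementally can be sketched as a test-then-train loop. Each arriving instance is first used to evaluate the current model and then to update it, so the full data set is never stored. The nearest-centroid learner with running-mean updates is an assumed stand-in for any incremental learner, and both names are hypothetical.

```python
import math

class IncrementalCentroid:
    """Nearest-centroid classifier whose class means are updated one
    instance at a time, so past instances need not be stored."""
    def __init__(self):
        self.means = {}   # label -> list of feature means
        self.counts = {}  # label -> number of instances seen

    def predict(self, x):
        if not self.means:
            return None
        return min(self.means, key=lambda lbl: math.dist(self.means[lbl], x))

    def learn(self, x, y):
        n = self.counts.get(y, 0) + 1
        mean = self.means.get(y, [0.0] * len(x))
        # running-mean update: m_n = m_{n-1} + (x - m_{n-1}) / n
        self.means[y] = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.counts[y] = n

def prequential_accuracy(learner, stream):
    """Test-then-train: predict each arriving instance before learning it."""
    correct = total = 0
    for x, y in stream:
        correct += learner.predict(x) == y
        learner.learn(x, y)
        total += 1
    return correct / total if total else 0.0

stream = [((0.0,), "a"), ((10.0,), "b"), ((0.5,), "a"), ((9.5,), "b")]
print(prequential_accuracy(IncrementalCentroid(), stream))  # 0.5
```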
Criteria for Updating the Current Model Many algorithms for handling concept drift update the model regularly as new data arrive. –This can be very costly. An alternative is to detect changes and adapt the model only when it is inevitable. Possible triggers: –The model’s average confidence in correct prediction on new instances. –The fraction of new instances for which the confidence is below a given threshold.
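Both triggers can be tracked over a sliding window of recent prediction confidences. This is a minimal sketch: the class name, the window length, the confidence threshold, and the maximum tolerated fraction are all hypothetical values chosen only for illustration.

```python
from collections import deque

class ConfidenceMonitor:
    """Tracks the model's confidence on recent instances and signals when an
    update looks necessary. All parameter values are illustrative."""
    def __init__(self, window=50, conf_threshold=0.6, max_low_fraction=0.3):
        self.confidences = deque(maxlen=window)
        self.conf_threshold = conf_threshold
        self.max_low_fraction = max_low_fraction

    def mean_confidence(self):
        return sum(self.confidences) / len(self.confidences)

    def low_fraction(self):
        low = sum(1 for c in self.confidences if c < self.conf_threshold)
        return low / len(self.confidences)

    def observe(self, confidence):
        """Record the confidence on a new instance; return True if the
        model should be updated."""
        self.confidences.append(confidence)
        if len(self.confidences) < self.confidences.maxlen:
            return False  # wait until the window is full
        return self.low_fraction() > self.max_low_fraction

monitor = ConfidenceMonitor(window=10, conf_threshold=0.6, max_low_fraction=0.3)
for c in [0.9] * 10 + [0.2] * 4:
    if monitor.observe(c):
        print("update triggered")  # fires once 4 of the last 10 are low
```

The model is retrained only when the trigger fires, instead of after every batch of new data.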
Case-Based Criteria –Problem-solution regularity –Problem-distribution regularity These may be good measures of the quality of a case base. –In the real world, they are not easy to apply as triggers for model updating, because the drift rate and the level of noise may vary drastically over time.
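Simple proxies for the two regularities can be sketched in Python. Problem-solution regularity is approximated by how often a case's nearest other case has the same solution. Problem-distribution regularity is approximated by how many new problems lie within a given radius of some stored case. These are illustrative assumptions, not the formal definitions used in the case-based reasoning literature. The function names and the `radius` parameter are hypothetical.

```python
import math

def solution_regularity(cases):
    """Fraction of cases whose nearest other case has the same solution
    (leave-one-out 1-NN agreement); a hypothetical proxy."""
    if len(cases) < 2:
        return 1.0
    agree = 0
    for i, (x, y) in enumerate(cases):
        _, nn_y = min((c for j, c in enumerate(cases) if j != i),
                      key=lambda c: math.dist(c[0], x))
        agree += nn_y == y
    return agree / len(cases)

def distribution_regularity(cases, new_problems, radius):
    """Fraction of new problems with a stored case within `radius`;
    a hypothetical proxy."""
    if not new_problems:
        return 1.0
    covered = sum(any(math.dist(x, c[0]) <= radius for c in cases)
                  for x in new_problems)
    return covered / len(new_problems)

cases = [((0.0,), "a"), ((1.0,), "a"), ((10.0,), "b"), ((11.0,), "b")]
print(solution_regularity(cases))                                 # 1.0
print(distribution_regularity(cases, [(0.2,), (5.0,)], radius=1.0))  # 0.5
```

As the slide notes, turning such scores into update triggers is hard, because a drop may reflect either drift or a change in the level of noise.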
Conclusions Two kinds of concept drift –Real (hidden contexts) –Virtual (data distribution) Three basic approaches –Instance selection –Instance weighting –Ensemble learning
There are problems with most real-world datasets. –They contain little concept drift, or the concept drift they contain was introduced artificially. Criteria need to be developed for detecting crucial changes, so that the model is adapted only when it is inevitable. –Current triggers are not robust enough to differentiate between types of concept drift and different levels of noise.