Supervised Time Series Pattern Discovery through Local Importance

1 Supervised Time Series Pattern Discovery through Local Importance
Mustafa Gokce Baydogan*, George Runger*, Eugene Tuv†
* Arizona State University
† Intel Corporation
10/14/2012, INFORMS Annual Meeting 2012, Phoenix

2 Outline
Time series classification
  Problem definition
  Motivation
Supervised Time Series Pattern Discovery through Local Importance (TS-PD)
Computational experiments and results
Conclusions and future work

3 Time Series Classification
Time series classification is a supervised learning problem.
The input consists of a set of training examples and their associated class labels.
Each example is formed by one or more time series.
The task is to predict the class of a new (test) series.

4 Motivations
People measure things, and things (with rare exceptions) change over time.
Time series are everywhere.
Examples shown: ECG heartbeat, stock.

5 Motivations
Other types of data can be converted to time series. Everything is about the representation.
Example: recognizing words.
  An example word, “Alexandria”, from the dataset of word profiles for George Washington's manuscripts.
  A word can be represented by two time series created by moving over and under the word.
Images from E. Keogh. A quick tour of the datasets for VLDB. In VLDB, 2008.

6 Challenges
How can we handle the warping in time series?
The 4 observed peaks are related to a certain event in the manufacturing process and indicate a problem.
TRANSLATION: the time of the peaks may change (two peaks are observed earlier for the blue series).
DILATION: the problem may occur over a shorter time interval.

7 Approaches
Instance-based methods
  Predict based on the similarity to the training time series.
  Require a similarity measure (distance measure): Euclidean distance, …
  Dynamic Time Warping (DTW) distance is known to be a strong solution [1]. It handles translations and dilations by matching observations.
Feature-based methods
  Predict a test instance based on a model trained on extracted feature vectors.
  Require feature extraction methods and a supervised learner (e.g. decision tree, support vector machine) trained on the extracted features.
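
To illustrate how DTW "handles translations by matching observations", here is a minimal sketch of the textbook dynamic-programming DTW next to the Euclidean distance. The function names and the toy series are assumptions made for this example and do not come from the presentation.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW with squared point costs, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def euclidean_distance(a, b):
    """Point-by-point distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A peak and the same peak shifted by one time step (a translation).
peak = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]

print(euclidean_distance(peak, shifted))  # penalizes the shift
print(dtw_distance(peak, shifted))        # the warping path absorbs the shift
```

On this toy pair the Euclidean distance is nonzero while DTW aligns the two peaks exactly, which is the behavior the slide attributes to DTW.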

8 Instance-based methods
Advantages
  Accurate.
  Do not require setting many parameters.
Disadvantages
  May not be suitable for real-time applications [3].
    DTW has a time complexity of O(n^2), where n is the length of the time series (it is a variation of the shortest path problem). A lower bound such as LB_Keogh [8] can be computed in O(n) and used to skip full DTW computations.
  Not scalable with a large number of training samples and variables.
    No model is built: each test series is compared to all (or some) training series.
  Require storage of the training time series.
    Not suitable for resource-limited environments (e.g. sensors).
  Performance degrades with long time series and short features of interest.
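
The LB_Keogh lower bound mentioned above can be sketched as follows. This is an illustrative version assuming equal-length series and a warping window of half-width `r` (a parameter name chosen here, not taken from the slides). The envelope is computed naively for readability; a sliding-window min/max would make the whole bound linear in n.

```python
import math

def lb_keogh(query, candidate, r):
    """Lower bound on window-constrained DTW between equal-length series.

    An envelope (running min and max over +/- r positions) is built around
    the candidate; only the parts of the query outside it contribute.
    """
    n = len(candidate)
    total = 0.0
    for i, q in enumerate(query):
        lo = max(0, i - r)
        hi = min(n, i + r + 1)
        upper = max(candidate[lo:hi])
        lower = min(candidate[lo:hi])
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return math.sqrt(total)

flat = [0, 0, 0, 0, 0]
spike = [0, 0, 5, 0, 0]
print(lb_keogh(spike, flat, r=1))   # the spike lies outside the flat envelope
print(lb_keogh(flat, spike, r=1))   # the flat query lies inside the spike envelope
```

In a nearest-neighbor search, a training series whose bound already exceeds the best distance found so far can be discarded without running the full DTW.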

9 Feature-based methods
Time series are represented by the features generated.
  Shape-based features: mean, variance, slope, …
  Wavelet features: coefficients, …
Global features (such as the global mean or variance) provide a compact representation of the series.
Local features are important.
  Features are computed from time series segments (intervals), e.g. the mean of an interval.
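
As a small sketch of the shape-based local features named above, the function below computes the mean, variance and least-squares slope of one interval. The function name, the use of the population variance and the toy series are assumptions for illustration only.

```python
def interval_features(series, start, end):
    """Mean, population variance and least-squares slope of series[start:end]."""
    seg = series[start:end]
    n = len(seg)
    mean = sum(seg) / n
    var = sum((v - mean) ** 2 for v in seg) / n
    # Slope of the best-fit line through the points (0, seg[0]), (1, seg[1]), ...
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        slope = 0.0  # a single-point interval has no slope
    else:
        slope = sum((t - t_mean) * (v - mean) for t, v in enumerate(seg)) / denom
    return mean, var, slope

series = [1, 2, 3, 4, 10, 10]
print(interval_features(series, 0, 4))  # rising segment
print(interval_features(series, 4, 6))  # flat segment
```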

10 Feature-based methods
Advantages
  Fast.
  Robust to noise.
  Fusion of domain knowledge: features specific to the domain, e.g. linear predictive coding (LPC) features for speech recognition.
Disadvantages
  Problems in handling warping.
  Cardinality of the feature set may vary.

11 Time Series Pattern Discovery through Local Importance (TS-PD)
Identifying the regions of a time series that are important to classification is required for
  Interpretability.
  Good classification with appropriate approaches (matching the patterns).
Local importance is a measure that evaluates the potential descriptiveness of a certain segment (interval) of the time series.

12 TS-PD Local Importance
Time series are represented by interval features.
  Currently shape-based: mean, variance and slope, at different scales.
  Any features can be added to the representation. Application specific?
A tree-based ensemble (Random Forest) is trained on this representation -> RFint.
A permutation-based approach evaluates the descriptiveness of each interval (based on the out-of-bag idea).

13 TS-PD Local Importance
Train. Test. Test on permuted OOB samples.
Let time series 1 be of class 1.
Local importance is defined =
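
The exact local importance formula is not preserved in this transcript, so the sketch below only illustrates the general permutation idea: shuffle one feature column across held-out samples and measure how much the accuracy drops. It works at the dataset level, uses a hypothetical threshold rule (`toy_predict`) in place of the Random Forest, and should not be read as the TS-PD definition.

```python
import random

def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Average drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    total_drop = 0.0
    for _ in range(n_repeats):
        values = [r[column] for r in rows]
        rng.shuffle(values)
        # Rebuild each row with the shuffled value in the chosen column.
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, values)]
        total_drop += baseline - accuracy(permuted)
    return total_drop / n_repeats

# Toy held-out set: feature 0 separates the classes, feature 1 is noise.
rows = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.7, 1.0], [0.8, 5.0], [0.9, 2.0]]
labels = [0, 0, 0, 1, 1, 1]

def toy_predict(row):
    """Hypothetical stand-in for a trained model."""
    return 1 if row[0] > 0.5 else 0

imp_informative = permutation_importance(toy_predict, rows, labels, column=0)
imp_noise = permutation_importance(toy_predict, rows, labels, column=1)
print(imp_informative, imp_noise)
```

In the setting of the slides, each column would be an interval feature and the held-out rows would be the out-of-bag series of a tree.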

14 Local Importance

15 TS-PD Distance-based features
Find the important intervals for each time series.
Sample intervals from these important intervals (regions).
For each sampled region, search for similarity over all time series (Euclidean distance in our case).
Use the minimum distance of a pattern to the time series as a feature for classification.
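
The last step, the minimum distance of a pattern to a time series, can be sketched as a brute-force scan over all windows of the series. The function name and the example values are assumptions for illustration. The presentation does not say how the search is implemented.

```python
import math

def min_distance(pattern, series):
    """Smallest Euclidean distance between pattern and any same-length window of series."""
    m = len(pattern)
    best = float("inf")
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((p - s) ** 2 for p, s in zip(pattern, window)))
        best = min(best, d)
    return best

pattern = [1, 2, 1]
print(min_distance(pattern, [0, 0, 0, 1, 2, 1, 0]))  # pattern occurs late in the series
print(min_distance(pattern, [0, 0, 0, 0]))           # pattern is absent
```

Because the scan covers every start position, the feature value does not depend on where in the series the pattern occurs.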

16 TS-PD Classification
In the feature set
  Each row is a time series.
  Each column is a pattern.
  Each entry is the distance between the pattern and the region of the time series most similar to it.
  Basically, a kernel based on the distances to the patterns.
A tree-based ensemble (Random Forest) is trained on this feature set -> RFts.
  Scalable.
  Variable importance measure.
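
The layout of this feature set can be sketched as a plain list of lists with one row per series and one column per pattern. The helper names and toy data are assumptions for illustration. Training the Random Forest on the resulting matrix is left out.

```python
import math

def distance_feature_matrix(series_list, patterns):
    """Rows are time series, columns are patterns, entries are minimum distances."""
    def closest(pattern, series):
        m = len(pattern)
        return min(
            math.sqrt(sum((p - s) ** 2 for p, s in zip(pattern, series[i:i + m])))
            for i in range(len(series) - m + 1)
        )
    return [[closest(p, s) for p in patterns] for s in series_list]

series_list = [[0, 1, 2, 1, 0], [0, 0, 0, 0, 0]]
patterns = [[1, 2, 1], [0, 0]]
matrix = distance_feature_matrix(series_list, patterns)
for row in matrix:
    print(row)
```

A series that contains a pattern gets an entry of zero in that pattern's column, so a tree can separate classes by thresholding individual columns.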

17 TS-PD Interpretability
Variable importance [9] enables interpretability.
  Find the most important features from RF.
  Visualize.

18 TS-PD Experiments
43 datasets from the UCR database

19 TS-PD Experiments
Parameters
Interval length and sliding window: 6 and 3 time units
  Set small enough that the probability of missing a pattern is decreased.
Number of locally important intervals to be used as reference patterns: 10 intervals
  Depends on the dataset characteristics.
  If the features of interest are long, a larger setting is preferred.
  Interval length also affects this setting.
  RF is not affected by this setting if it is set large enough, because of the embedded feature selection.
    Irrelevant patterns are easily identified.
    Correlated patterns are handled by building trees on random feature subspaces.
Number of trees for both RFs (RFint and RFts): 2000 trees
  This can easily be set based on the OOB error rates.
  If there is no concern about computation time, a larger setting is preferred.
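
One reading of the first parameter pair is an interval length of 6 time units moved along the series in steps of 3. Under that assumption, which the transcript does not confirm, the candidate intervals can be enumerated as below. The function and argument names are illustrative.

```python
def sliding_intervals(series_length, interval_length=6, step=3):
    """Return (start, end) index pairs, end exclusive, for fixed-length windows."""
    last_start = series_length - interval_length
    return [(s, s + interval_length) for s in range(0, last_start + 1, step)]

print(sliding_intervals(15))
print(sliding_intervals(5))  # series shorter than one interval
```

With a step smaller than the interval length, consecutive intervals overlap, which lowers the chance that a pattern is split across two windows and missed.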

20 TS-PD Experiments
Two types of NN classifiers with DTW
  NNDTWNoWin
  NNBestDTW searches for the best warping window, based on the training data.
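
A rough sketch of the two baselines: 1-NN with unconstrained DTW, and 1-NN with a warping window chosen on the training data. The slides do not say how the window is searched. Leave-one-out error over a small candidate list is assumed here, and all names and data are illustrative.

```python
def dtw_windowed(a, b, window=None):
    """Squared-cost DTW; window bounds |i - j| (None means no constraint)."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def nn_predict(train, labels, query, window=None):
    """Label of the training series closest to the query under DTW."""
    best = min(range(len(train)), key=lambda k: dtw_windowed(train[k], query, window))
    return labels[best]

def best_window(train, labels, candidates):
    """Candidate window with the fewest leave-one-out 1-NN errors on the training set."""
    def loo_errors(w):
        errors = 0
        for k in range(len(train)):
            rest = train[:k] + train[k + 1:]
            rest_labels = labels[:k] + labels[k + 1:]
            if nn_predict(rest, rest_labels, train[k], w) != labels[k]:
                errors += 1
        return errors
    return min(candidates, key=loo_errors)

# Two spike series with the spike at different times, and two constant series.
train = [[0, 2, 0, 0, 0, 0], [0, 0, 0, 0, 2, 0], [1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1]]
labels = ["spike", "spike", "flat", "flat"]

print(best_window(train, labels, [0, 1, 3]))
print(nn_predict(train, labels, [0, 0, 2, 0, 0, 0]))  # no window
```

A window of 0 reduces DTW to the squared Euclidean distance, so the candidate list covers the range from no warping to generous warping.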

21 TS-PD Example
Extending TS-PD to MTS (multivariate time series) classification
Gesture recognition task [12]
  Acceleration of the hand on the x, y and z axes.
  Classify gestures (8 different types of gestures).

22 TS-PD Example

23 TS-PD Conclusion
TS-PD identifies regions of interest.
  Provides a visualization tool for understanding underlying relations.
  Fast approach to detect the local information related to the classification.
Handles the warping partially.
  Handles translations.
  Dilations? Distance-based features do not guarantee this.
Provides a kernel based on local distances.
  Interpretable and provides fast classification results.
For reproducibility of the results, the code of TS-PD is available on

24 Questions and Comments?
Thanks!

25 References

26 References (continued)

27 References (continued)

