Supervised Time Series Pattern Discovery through Local Importance

Presentation transcript:

Supervised Time Series Pattern Discovery through Local Importance
Mustafa Gokce Baydogan*, George Runger*, Eugene Tuv† (*Arizona State University, †Intel Corporation)
INFORMS Annual Meeting 2012, Phoenix, 10/14/2012

Outline
- Time series classification: problem definition and motivation
- Supervised Time Series Pattern Discovery through Local Importance (TS-PD)
- Computational experiments and results
- Conclusions and future work

Time Series Classification
- Time series classification is a supervised learning problem.
- The input consists of a set of training examples and their associated class labels; each example is formed by one or more time series.
- The goal is to predict the class of a new (test) series.

Motivations
- People measure things, and things (with rare exceptions) change over time.
- Time series are everywhere: ECG heartbeats, stock prices, and many more.

Motivations
- Other types of data can be converted to time series; everything is about the representation.
- Example: recognizing words. The word "Alexandria", from a dataset of word profiles of George Washington's manuscripts, can be represented by two time series created by tracing over and under the word.
- Images from E. Keogh, "A quick tour of the datasets for VLDB 2008," VLDB, 2008.

Challenges
- How can we handle warping in time series?
- Example: four observed peaks are related to a certain event in a manufacturing process and indicate a problem.
- Translation: the time of the peaks may change (two peaks are observed earlier in the blue series).
- Dilation: the problem may occur over a shorter time interval.

Approaches
Instance-based methods
- Predict based on the similarity to the training time series.
- Requires a similarity (distance) measure, e.g., Euclidean distance.
- The Dynamic Time Warping (DTW) distance is known to be a strong solution [1]; it handles translations and dilations by matching observations.
Feature-based methods
- Predict a test instance based on a model trained on extracted feature vectors.
- Requires feature extraction methods and a supervised learner (e.g., a decision tree or support vector machine) trained on the extracted features.
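For concreteness, a minimal DTW sketch in Python (not the authors' implementation). The distance is computed by dynamic programming; the optional `window` argument is an assumption added here for the Sakoe-Chiba band that the warping-window classifiers discussed later rely on:

```python
import numpy as np

def dtw(a, b, window=None):
    """DTW distance between 1-D series a and b via dynamic programming.
    `window` is an optional Sakoe-Chiba warping-window width; None means
    unconstrained warping. O(len(a) * len(b)) time in the worst case."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```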

Instance-based Methods
Advantages
- Accurate.
- Does not require setting many parameters.
Disadvantages
- May not be suitable for real-time applications [3]: DTW is a variation of a shortest-path problem and is quadratic in the series length n; the O(n) LB_Keogh lower bound [8] only prunes candidates during nearest-neighbor search.
- Not scalable to large numbers of training samples and variables: there is no model, so each test series is compared to all (or some) training series.
- Requires storage of the training time series, so it is not suitable for resource-limited environments (e.g., sensors).
- Performance degrades with long time series and short features of interest.
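A sketch of the LB_Keogh pruning idea from [8], assuming `query` and `candidate` are equal-length NumPy arrays; a candidate whose lower bound already exceeds the best DTW distance found so far can be skipped without running DTW:

```python
import numpy as np

def lb_keogh(query, candidate, r):
    """LB_Keogh lower bound on the DTW distance: accumulates how far the
    candidate falls outside the query's upper/lower envelope of radius r.
    Linear in the series length for a fixed envelope radius."""
    total = 0.0
    for i, c in enumerate(candidate):
        seg = query[max(0, i - r):i + r + 1]  # envelope window around position i
        lo, hi = seg.min(), seg.max()
        if c > hi:
            total += (c - hi) ** 2
        elif c < lo:
            total += (lo - c) ** 2
    return float(np.sqrt(total))
```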

Feature-based Methods
- Time series are represented by the generated features:
  - Shape-based features: mean, variance, slope, ...
  - Wavelet features: coefficients, ...
- Global features (such as the global mean and variance) provide a compact representation of the series.
- Local features are important: features extracted from time series segments (intervals), such as the interval mean.
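A sketch of shape-based interval feature extraction, assuming a univariate NumPy series; for brevity it uses a single interval length, while the method pools several scales. The defaults `interval_length=6` and `stride=3` mirror the settings reported in the experiments:

```python
import numpy as np

def interval_features(series, interval_length=6, stride=3):
    """Mean, variance, and least-squares slope for each sliding interval."""
    t = np.arange(interval_length)
    feats = []
    for start in range(0, len(series) - interval_length + 1, stride):
        seg = series[start:start + interval_length]
        slope = np.polyfit(t, seg, 1)[0]  # slope of the fitted line
        feats.extend([seg.mean(), seg.var(), slope])
    return np.array(feats)
```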

Feature-based Methods
Advantages
- Fast.
- Robust to noise.
- Allows fusion of domain knowledge via features specific to the domain, e.g., linear predictive coding (LPC) features for speech recognition.
Disadvantages
- Problems in handling warping.
- The cardinality of the feature set may vary.

Time Series Pattern Discovery through Local Importance (TS-PD)
- Identifying the regions of a time series that are important to classification is required for:
  - Interpretability.
  - Good classification with appropriate approaches (matching the patterns).
- Local importance is a measure that evaluates the potential descriptiveness of a certain segment (interval) of the time series.

TS-PD Local Importance
- Time series are represented by interval features at different scales: mean, variance, and slope.
- A tree-based ensemble (a Random Forest, RFint) is trained on this representation, as sketched below.
- Any features can be added to the representation; currently they are shape-based, but application-specific features are possible.
- A permutation-based approach evaluates the descriptiveness of each interval, based on the out-of-bag (OOB) idea.
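A minimal sketch of training RFint with scikit-learn, reusing the `interval_features` helper above and assuming `X_train` (an array of training series) and `y_train` (their labels) are already loaded; the 2,000 trees match the experimental settings:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# interval_features() is the sketch from the previous slide.
X_int = np.vstack([interval_features(s) for s in X_train])
rf_int = RandomForestClassifier(n_estimators=2000, oob_score=True,
                                random_state=0)
rf_int.fit(X_int, y_train)
print("OOB accuracy:", rf_int.oob_score_)
```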

TS-PD Local Importance
- Train RFint, then test each tree on its permuted out-of-bag (OOB) samples.
- For example, let time series 1 be of class 1: permute a given interval of the OOB series and compare the OOB predictions before and after.
- The local importance of an interval is then defined as the loss in correct OOB predictions caused by permuting that interval.
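A simplified, dataset-level sketch of this permutation scoring: the paper scores intervals per series using each tree's own OOB samples, whereas this version permutes an interval's block of features on whatever evaluation set is passed in:

```python
import numpy as np

def interval_importance(rf, X, y, feats_per_interval=3, seed=0):
    """Accuracy drop when an interval's feature block is shuffled across
    samples; larger drop = more descriptive interval."""
    rng = np.random.default_rng(seed)
    base = rf.score(X, y)
    n_intervals = X.shape[1] // feats_per_interval
    importance = np.empty(n_intervals)
    for k in range(n_intervals):
        cols = slice(k * feats_per_interval, (k + 1) * feats_per_interval)
        Xp = X.copy()
        Xp[:, cols] = Xp[rng.permutation(len(Xp))][:, cols]  # break link to y
        importance[k] = base - rf.score(Xp, y)
    return importance
```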

Local Importance

TS-PD Distance-based Features
- Find the important intervals for each time series.
- Sample intervals (reference patterns) from these important regions.
- For each specific region, search for similarity over all time series (Euclidean distance in our case).
- Use the minimum distance of a pattern to a time series as a feature for classification.
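The distance feature itself is simple; a sketch, assuming the pattern is shorter than the series and both are NumPy arrays:

```python
import numpy as np

def min_distance(pattern, series):
    """Minimum Euclidean distance between a reference pattern and any
    equal-length window of the series."""
    m = len(pattern)
    return min(float(np.linalg.norm(series[i:i + m] - pattern))
               for i in range(len(series) - m + 1))
```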

TS-PD Classification
- In the feature set, each row is a time series and each column is a pattern; each entry is the distance of the pattern to the region of the time series most similar to it.
- Basically, this is a kernel based on the distances to the patterns.
- A tree-based ensemble (a Random Forest, RFts) is trained on this feature set; it is scalable and provides a variable importance measure.
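Putting the pieces together, a sketch of building the distance feature matrix and training RFts; `patterns` is a hypothetical list of locally important segments selected as on the previous slides:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# min_distance() is the sketch above; `patterns` is assumed to hold the
# sampled locally important segments.
D_train = np.array([[min_distance(p, s) for p in patterns]
                    for s in X_train])
rf_ts = RandomForestClassifier(n_estimators=2000, random_state=0)
rf_ts.fit(D_train, y_train)

# Variable importance over columns ranks the reference patterns,
# which is what makes the model interpretable.
ranked = np.argsort(rf_ts.feature_importances_)[::-1]
```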

TS-PD Interpretability
- Variable importance [9] enables interpretability: find the most important features (patterns) from the Random Forest and visualize them.

TS-PD Experiments
- 43 datasets from the UCR database.

TS-PD Experiments
Parameters
- Interval length and sliding window (6 and 3 time units): set small enough that the probability of missing a pattern is decreased.
- Number of locally important intervals used as reference patterns (10 intervals): depends on the dataset characteristics; if the features of interest are long, a larger setting is preferred, and the interval length also affects this choice. The RF is not affected by this setting if it is large enough, because of the embedded feature selection: irrelevant patterns are easily identified, and correlated patterns are handled by building trees on random feature subspaces.
- Number of trees for both forests, RFint and RFts (2,000 trees): can easily be set based on the OOB error rates; if computation time is not a concern, a larger setting is preferred.

TS-PD Experiments
- Compared against two types of nearest-neighbor (NN) classifiers with DTW:
  - NNDTWNoWin: DTW with no warping window.
  - NNBestDTW: searches for the best warping window based on the training data.
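Both baselines follow from the `dtw` sketch above; the window search below is an assumed leave-one-out procedure on the training data, not necessarily the benchmark's exact protocol:

```python
import numpy as np

def nn_dtw_predict(x, X_train, y_train, window=None):
    """1-NN classification under DTW (window=None gives NNDTWNoWin).
    Reuses the dtw() sketch from earlier."""
    dists = [dtw(x, s, window) for s in X_train]
    return y_train[int(np.argmin(dists))]

def best_window(X_train, y_train, grid=range(0, 11)):
    """NNBestDTW-style selection: the window with the fewest
    leave-one-out errors on the training set."""
    def loo_errors(w):
        errors = 0
        for i in range(len(X_train)):
            rest = [j for j in range(len(X_train)) if j != i]
            pred = nn_dtw_predict(X_train[i],
                                  [X_train[j] for j in rest],
                                  [y_train[j] for j in rest], window=w)
            errors += int(pred != y_train[i])
        return errors
    return min(grid, key=loo_errors)
```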

TS-PD Example
- Extending TS-PD to multivariate time series (MTS) classification.
- Gesture recognition task [12]: acceleration of the hand on the x, y and z axes.
- Classify gestures (8 different types of gestures).

TS-PD Example

TS-PD Conclusion
- TS-PD identifies regions of interest and provides a visualization tool for understanding the underlying relations.
- It is a fast approach to detecting the local information related to the classification.
- It handles warping partially: translations are handled, but the distance-based features do not guarantee that dilations are.
- It provides a kernel based on local distances.
- It is interpretable and provides fast classification results.
- For reproducibility, the code of TS-PD is available at http://www.mustafabaydogan.com/supervised-time-series-pattern-discovery-through-local-importance-tspd.html

Thanks! Questions and comments?

References