A Bag-of-Features Framework for Time Series Classification (INFORMS Annual Meeting 2011, Charlotte)


A Bag-of-Features Framework for Time Series Classification
Mustafa Gokce Baydogan*, George Runger*, Eugene Tuv†
November 13, 2011
*Arizona State University, †Intel Corporation

Outline
- Time series classification
  - Problem definition
  - Literature review
  - Motivation
- A bag-of-features framework for time series classification
  - Approach
  - Algorithm
- Computational experiments and results
- Conclusions and future work

Time Series Classification
- A supervised learning problem: assign class labels to temporally structured univariate (or multivariate) sequences of fixed (or variable) length.

Time Series Classification
- Example problems: Gun Point, OSU Leaf (figure)

Time Series Classification (figure)

Literature Review
- Algorithms proposed for time series classification can be divided into two groups:
  - Instance-based methods
    - Predict a test instance based on its similarity to the training instances
    - Require a similarity (distance) measure
  - Feature-based methods
    - Predict a test instance using a model built on feature vectors extracted from a set of instances
    - Require feature extraction methods and a prediction model built on the extracted features

Literature Review: Instance-Based Methods
- Nearest neighbor classifiers
  - Euclidean distance: fast, but cannot handle shifts in time
  - Dynamic time warping (DTW): a strong solution known for time series problems in a variety of domains [1] (a minimal sketch follows below)
- Shapelets [2,3]
  - Find local patterns in a time series that are highly predictive of a class, based on a distance measure
  - (figure: heraldic shields example)
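For reference, a minimal sketch of the classic dynamic-programming DTW distance between two univariate series (no warping-window constraint); the function and variable names are illustrative, not taken from any particular implementation.

```python
import numpy as np

def dtw_distance(x, y):
    """DTW distance between two 1-D NumPy arrays, O(len(x) * len(y))."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # cumulative-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])
```

Unlike the Euclidean distance, the recurrence lets point i of one series align with a nearby point j of the other, which is what absorbs shifts in time.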

Literature Review: Feature-Based Methods
- Global features (such as the mean or variance of the series) are a compact representation, but they are often too rigid to represent a time series.
- Local patterns define the class:
  - They may shift over time
  - The relation between certain patterns may be important
  - The cardinality of the local feature set may vary

Literature Review: Feature-Based Methods
- Most approaches extract features over intervals and build a prediction model:
  - Knots of a piecewise linear approximation of the series used as features [4]
  - Feature extraction through a genetic algorithm (finding representative features), with an SVM classifying the extracted features [5]
  - Neural networks [6]
  - Decision trees [7]

Motivation
- Although dynamic time warping (DTW) provides strong results, it is not suitable for real-time applications [3]:
  - DTW has a time complexity of O(n²), where n is the length of the time series; the complexity reduces to O(n) using a lower bound (LB_Keogh [8], sketched below).
  - This complexity is for comparing only two series (one training, one test). The DTW distance of a test instance must be computed against every training instance (or at least some of them), which increases the computation time significantly.
- This is a disadvantage of instance-based classifiers.
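To illustrate how the lower bound works, here is a sketch of LB_Keogh [8] under the usual assumptions (equal-length 1-D NumPy arrays and a warping window of radius r): the candidate is compared against an envelope around the query in a single O(n) pass, and the full DTW computation is only needed when the bound does not already rule the candidate out.

```python
import numpy as np

def lb_keogh(query, candidate, r):
    """Lower bound on DTW(query, candidate) for warping-window radius r."""
    n = len(query)
    total = 0.0
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        upper, lower = query[lo:hi].max(), query[lo:hi].min()  # envelope around the query
        c = candidate[i]
        if c > upper:
            total += (c - upper) ** 2
        elif c < lower:
            total += (c - lower) ** 2
    return np.sqrt(total)
```

In a 1-NN search, any training series whose bound already exceeds the best DTW distance found so far can be discarded without running the O(n²) computation.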

Motivation
- Feature-based approaches assume that patterns occur in the same time interval across instances, but a pattern that defines a certain class may occur anywhere in time.
- (figure; each time series is standardized)

Motivation
- DTW attempts to compensate for possible time translations between features, but with long time series, relatively short features of interest, and moderate noise, its capability is degraded.
- The discontinuity of certain patterns in time is another problem that affects the performance of DTW.

A Bag-of-Features Framework for Time Series Classification

A Bag-of-Features Framework for Time Series Classification
- "Bag of features" is also referred to as:
  - Bag of words in document analysis
  - Bag of instances in multiple instance learning (MIL)
  - Bag of frames in audio and speech recognition
- Three main steps:
  - Local feature extraction
  - Codebook generation (similarity-based, unsupervised, or supervised)
  - Classification from the codebook

Local Feature Extraction
- A subsequence s is generated randomly; a lower bound on the subsequence length is set as a factor of the length of the time series (M).
- After partitioning the subsequence into intervals, interval features are extracted (mean, variance, slope).
  - The number of intervals (d) must be the same for all subsequences (the cardinality of the feature set must be the same), although the subsequence lengths are random.
  - Random-length intervals are used.
  - A minimum interval length is set so that the extracted features are valid (e.g., the mean of a single time point is not meaningful).
- Parameters: the number of intervals to partition a subsequence, the number of subsequences generated for each series, and the maximum number of intervals.
- (a minimal sketch of this step follows below)
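A minimal sketch of this step, following the description above (random subsequences, each summarized by the mean, variance and slope of its intervals). It makes simplifying assumptions: equal-length rather than random-length intervals, and illustrative parameter defaults; the authors' R/C implementation differs in these details.

```python
import numpy as np

def interval_features(segment):
    """Mean, variance, and least-squares slope of one interval."""
    t = np.arange(len(segment))
    slope = np.polyfit(t, segment, 1)[0] if len(segment) > 1 else 0.0
    return [segment.mean(), segment.var(), slope]

def extract_subsequences(series, n_sub=20, min_frac=0.25, d=5, rng=None):
    """Represent one series by n_sub random subsequences, each described by
    d intervals of (mean, variance, slope) plus its start position and length."""
    rng = rng or np.random.default_rng()
    M = len(series)
    min_len = max(int(min_frac * M), 2 * d)      # lower bound on subsequence length
    rows = []
    for _ in range(n_sub):
        length = int(rng.integers(min_len, M + 1))
        start = int(rng.integers(0, M - length + 1))
        edges = np.linspace(start, start + length, d + 1).astype(int)
        feats = []
        for a, b in zip(edges[:-1], edges[1:]):
            feats.extend(interval_features(series[a:b]))
        feats.extend([start, length])            # where the subsequence lies
        rows.append(feats)
    return np.array(rows)                        # shape: (n_sub, 3 * d + 2)
```

Stacking these rows for every training series, while recording which series each row came from, yields the subsequence-level dataset used in the next step.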

Local Feature Extraction (figure)

Codebook Generation
- After local feature extraction, a new dataset is generated in which each subsequence from each time series becomes an instance.
  - The class label defined for each instance is the class of the corresponding time series.
- We use a classifier that generates a class-probability estimate for each instance (subsequence).
  - The estimate provides information on the strength of an assignment.
- The codebook consists of:
  - The frequency of predicted subsequence classes for each time series
  - A histogram of the class-probability estimates for each class (the number of equal-probability bins, b, is a parameter)
- (a sketch of this step follows below)
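Continuing the sketch, codebook construction could look as follows. scikit-learn's RandomForestClassifier is used here as a stand-in for the random forest described later (the paper uses its own R/C implementation), and the out-of-bag class-probability estimates are summarized per series as b-bin histograms plus predicted-class frequencies; all parameter values are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def build_codebook(sub_X, sub_series, sub_y, n_series, n_classes, b=10):
    """sub_X: subsequence features; sub_series: index of the series each row came from;
    sub_y: class of that series. Returns one codebook row per series."""
    rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
    rf.fit(sub_X, sub_y)
    proba = rf.oob_decision_function_                  # OOB class-probability estimate per subsequence
    proba = np.nan_to_num(proba, nan=1.0 / n_classes)  # guard: rows never left out of bag
    pred = proba.argmax(axis=1)

    bins = np.linspace(0.0, 1.0, b + 1)
    codebook = np.zeros((n_series, n_classes * b + n_classes))
    for s in range(n_series):
        idx = np.where(sub_series == s)[0]
        # histogram of probability estimates for each class, over this series' subsequences
        hists = [np.histogram(proba[idx, c], bins=bins)[0] for c in range(n_classes)]
        # relative frequency of predicted subsequence classes
        freqs = np.bincount(pred[idx], minlength=n_classes) / len(idx)
        codebook[s] = np.concatenate(hists + [freqs])
    return codebook, rf
```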

Codebook Generation (figure)

Classification
- Global features can supplement the codebook.
- A classifier is built on the codebook and the global features (see the sketch below).
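A sketch of this final step, continuing from the codebook above. It assumes X_train (raw training series, one row per series), y_train (series labels), and codebook (the output of build_codebook) are already defined; the choice of global features (overall mean and variance here) is purely illustrative and may differ from the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative global features: per-series mean and variance of the raw values.
global_feats = np.column_stack([X_train.mean(axis=1), X_train.var(axis=1)])

# Series-level classifier trained on the codebook plus the global features.
final_rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
final_rf.fit(np.hstack([codebook, global_feats]), y_train)
print("OOB accuracy estimate:", final_rf.oob_score_)
```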

TSBF
- Random Forest (RF) [9] is an ensemble of N decision trees:
  - Each tree is constructed using a different bootstrap sample from the original data.
  - About one-third of the cases are left out of the bootstrap sample and not used in the construction of that tree (the out-of-bag (OOB) samples).
  - At each node of each tree, RF considers the best split over only a random sample of the features.
  - The random selection reduces the variance of the classifier and also reduces the computational complexity.
- RF is used as the classifier in our study because:
  - It is fast (a parallel implementation is even faster).
  - It is inherently multiclass (unlike, e.g., SVM).
  - It can handle missing values.
  - It provides class-probability estimates based on OOB samples.
    - Estimates computed from OOB predictions are good estimates of the generalization error (illustrated below).
    - Other classifiers that generate similar information may have problems with overfitting.
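A small self-contained illustration of the OOB point, using scikit-learn on a toy dataset purely to demonstrate the behavior described above: the in-sample score of a random forest is optimistic, while the OOB score behaves like a cross-validated estimate of generalization accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0).fit(X, y)
print("in-sample accuracy:", rf.score(X, y))   # optimistic: each tree has memorized its bootstrap sample
print("OOB accuracy:      ", rf.oob_score_)    # each sample scored only by trees that never saw it
```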

Computational Experiments and Results
- Parameter settings (table)
- 20 datasets from the UCR time series database

Computational Experiments and Results
- Some of the parameters (number of bins, number of trees) are fixed based on the OOB error rates.
  - They are chosen using the training data only (not after seeing the results on the test data).

Computational Experiments and Results
- TSBF is implemented in both R and C.
- We compare our algorithm to nearest-neighbor (NN) classifiers with DTW. Two versions of DTW are considered:
  - NNBestDTW searches for the best warping window based on the training data, then uses the learned window on the test data.
  - NNDTWNoWin uses no warping window.
- NNBestDTW is known to be a strong solution for time series classification.
- Because of its random nature, TSBF is replicated ten times, and the average error rate on the test data over the ten replications is reported.

Computational Experiments and Results (results figures)

Computational Experiments and Results
- (figure) *Windows 7 system with 8 GB RAM, dual-core CPU (i7-3620M, 2.7 GHz)

Conclusion and Future Work
- We present an approach that:
  - Can handle local information better
  - Is faster than instance-based classifiers
  - Allows for integration of local information
  - Performs better than competitive methods
  - Provides insight about the problem

Conclusion and Future Work
- Knowledge extraction from random forests
  - Find patterns that are unique to each class
  - Variable importance measures from RF
- The current approach works on series of the same length
  - Modify local feature extraction to handle series of variable length
- Extend to multivariate time series classification problems (straightforward)

Thanks. Questions?
- This research was partially supported by ONR grant N.
- The code of TSBF and the datasets are provided online.
- The paper will be available soon after submission.

References

References (continued)
