
1 Shapelet. Time Series Shapelets: A New Primitive for Data Mining (KDD 2009); A Shapelet Transform for Time Series Classification (KDD 2012); Learning Time-Series Shapelets (KDD 2014). By: Zhang Yonghui

2 Outline: Background; What is a shapelet?; Finding the shapelet; Shapelet transform; Learning shapelets directly; Summary

3 Background TSC (time series classification)

4 TSC. The simple nearest-neighbor algorithm is very difficult to beat on most time series problems. Disadvantages of the nearest-neighbor approach: high time and space complexity; not interpretable.

5 What is a shapelet? Time Series Shapelets: A New Primitive for Data Mining (KDD 2009). Shapelets are time series subsequences which are, in some sense, maximally representative of a class. Advantages: interpretable; more accurate/robust on some datasets; faster at classification.

6 An Example Samples of leaves from two species

7 An Example Convert shape to “time series” representation

8 An Example. The best shapelet found by the algorithm.

9 Why are shapelets effective?
Shapelets are local features, whereas most other state-of-the-art time series/shape classifiers rely on global features, which can be brittle even to low levels of noise and distortion. Geurts (PKDD 2001): "it is impossible in practice to consider every such subsignal as a candidate pattern." What makes it feasible now: improvements in CPU performance and pruning techniques.

10 Finding the shapelet. Definition 1: Time Series. A time series T = t1, …, tm is an ordered set of m real-valued variables. Definition 4: Distance between time series. Dist(T, R), where T and R are of the same length. Definition 5: Distance from a time series to a subsequence. SubsequenceDist(T, S) = min(Dist(S, S')), taken over all subsequences S' of T with the same length as S.
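A minimal sketch of these two distance definitions, assuming plain (non-normalized) Euclidean distance; the function names are illustrative, not taken from the paper:

```python
import numpy as np

def dist(a, b):
    """Euclidean distance between two equal-length sequences (Definition 4)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def subsequence_dist(T, S):
    """Distance from time series T to subsequence S (Definition 5):
    the minimum distance over all |S|-length subsequences of T."""
    T, S = np.asarray(T, dtype=float), np.asarray(S, dtype=float)
    L = len(S)
    return min(dist(T[i:i + L], S) for i in range(len(T) - L + 1))
```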

11 Finding the shapelet. Definition 8: Optimal Split Point (OSP). An Optimal Split Point is a distance threshold that splits the dataset into two subsets such that the information gain of the split is maximized. Definition 9: Shapelet. shapelet(D) is the subsequence that, with its corresponding optimal split point, achieves the highest information gain over all candidate subsequences of D.
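The slide omits the formulas; for reference, the standard entropy and information-gain definitions these two definitions rely on (notation mine) are:

$$I(D) = -\sum_{c} p(c)\log p(c), \qquad \mathrm{Gain}(S, d_{th}) = I(D) - \left(\frac{|D_1|}{|D|} I(D_1) + \frac{|D_2|}{|D|} I(D_2)\right)$$

where D_1 (resp. D_2) contains the series whose SubsequenceDist to the candidate S is below (resp. at least) the threshold d_{th}; the OSP is the d_{th} that maximizes this gain.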

12 Finding the shapelet Brute-Force Algorithm
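A minimal sketch of the brute-force idea (not the paper's exact pseudocode), reusing subsequence_dist from the sketch above: enumerate every subsequence of every training series as a candidate, score it by the information gain of its best split point, and keep the best. This deliberately ignores the paper's speedups (distance early abandoning, admissible entropy pruning).

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in (labels.count(v) for v in set(labels)))

def best_split_gain(dists, labels):
    """Information gain of the best distance threshold (Definition 8)."""
    base, n = entropy(labels), len(labels)
    order = sorted(range(n), key=lambda i: dists[i])
    best = 0.0
    for k in range(1, n):
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = max(best, gain)
    return best

def brute_force_shapelet(dataset, labels, min_len, max_len):
    """Exhaustive shapelet search over all candidate subsequences (no pruning)."""
    best_gain, best_shapelet = 0.0, None
    for T in dataset:
        for L in range(min_len, max_len + 1):
            for start in range(len(T) - L + 1):
                S = T[start:start + L]
                dists = [subsequence_dist(R, S) for R in dataset]
                gain = best_split_gain(dists, labels)
                if gain > best_gain:
                    best_gain, best_shapelet = gain, S
    return best_shapelet, best_gain
```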

13 Finding the shapelet Build a decision tree Pruning

14 Experiments. Accuracy of the shapelet decision tree vs. the k-NN classifier:
Dataset              | Shapelet decision tree | k-NN  | Run time (shapelet vs. k-NN)
Projectile Points    | 80.0%                  | 68.0% | 3×10^3 times faster
Historical Documents | 89.9%                  | 82.9% | 3×10^4 times faster
Gun/NoGun Problem    | 93.3%                  | 91.3% | four times faster
Wheat Spectrography  | 72.6%                  | 44.1% | -

15 The original research embedded the shapelet-finding procedure within a decision tree.
The relatively time-consuming shapelet search is therefore called repeatedly, once per tree node, which is costly, especially on multi-class problems. How can we take full advantage of existing classification algorithms?

16 Shapelet transform. A Shapelet Transform for Time Series Classification (KDD 2012). Describes a means of extracting the k best shapelets from a data set in a single pass and then using these shapelets to transform the data into a new feature space. The aim is to increase classification accuracy while reducing training time and maintaining the interpretability of the model.

17 Alternative Quality Measure
The original paper uses information gain to determine the quality of a shapelet. The concern, however, is not necessarily how well a candidate splits the data but how the distributions of distances for the different classes differ, and the utility of the information-gain measure degrades on multi-class problems. Alternative measure: the F-statistic of a fixed-effects ANOVA.

18 One-way ANOVA. Analysis of variance (ANOVA), also known as the F-test, is used to infer whether the population means of several groups are equal. Here a one-way (single-factor) ANOVA is used.
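For reference, the one-way fixed-effects ANOVA F-statistic used as the shapelet quality measure is the standard ratio of between-class to within-class variance of the distances (notation mine, not from the slides):

$$F = \frac{\sum_{c=1}^{C} n_c\,(\bar{d}_c - \bar{d})^2 \,/\, (C - 1)}{\sum_{c=1}^{C}\sum_{i=1}^{n_c} (d_{c,i} - \bar{d}_c)^2 \,/\, (N - C)}$$

where d_{c,i} is the distance from the i-th series of class c to the candidate shapelet, \bar{d}_c the class mean, \bar{d} the overall mean, n_c the class size, C the number of classes, and N the total number of series. A larger F indicates better class separation.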

19 Find the k best shapelets

20 Shapelet transform.
Length Parameter Approximation: a heuristic rule; not optimal, but an automatic approach.
Shapelet Selection: a heuristic rule (n/2 shapelets); 5-fold cross-validation.
Data Transformation: each attribute corresponds to the distance from a shapelet to the original time series.
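A minimal sketch of the data-transformation step, assuming a list of already-selected shapelets and reusing subsequence_dist from the earlier sketch; the resulting matrix can be fed to any standard classifier.

```python
import numpy as np

def shapelet_transform(dataset, shapelets):
    """Map each time series to a feature vector whose j-th attribute is
    the distance from the series to the j-th shapelet."""
    return np.array([[subsequence_dist(T, S) for S in shapelets]
                     for T in dataset])

# Usage: X = shapelet_transform(train_series, top_k_shapelets)
# then train, e.g., a random forest or linear SVM on X.
```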

21 Data sets. 18 data sets from the UCR time series repository and 8 new data sets provided by the authors. New data sets: hand X-rays, focusing on eight specific bones, with 3 class labels: infant (0-6 years), junior (7-12 years), teen (13-18 years).

22 Experiments(Embedded vs. Transformed)

23 Experiments(Shapelet Transformation Classifiers)
1 Shapelet Tree, 2 C4.5, 3 1NN, 4 Naïve Bayes, 5 Bayesian Network, 6 Random Forest, 7 Rotation Forest, 8 SVM (linear)

24 Experiments(Other Classifiers)
This offers promising support for shapelet-based approaches, suggesting that they fill a classification niche that has not been covered in the literature.

25 Can we learn top-K shapelets directly without the need to try out lots of candidates?

26 Learning shapelets directly
Learning Time-Series Shapelets (KDD 2014). Learn near-optimal shapelets directly, without the need to try out large numbers of candidates. Learn the true top-K shapelets by capturing their interaction.

27 Method. Two steps: (i) start with rough initial guesses for the shapelets; (ii) iteratively learn/optimize the shapelets by minimizing a classification loss function.

28 Objective Function. For the sake of simplicity, assume binary targets Y ∈ {0, 1} and a fixed shapelet length L. Learning model: a linear model over the transformed shapelet representation (M_{i,k}, the distance from the i-th time series to the k-th shapelet) with weights W.
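The model itself appears only as an image in the slide; reconstructed from the paper's setup (notation assumed, may differ slightly from the slides), the per-instance prediction is linear in the shapelet distances:

$$\hat{Y}_i = W_0 + \sum_{k=1}^{K} M_{i,k}\, W_k, \qquad M_{i,k} = \min_{j} \frac{1}{L}\sum_{l=1}^{L}\big(T_{i,\,j+l-1} - S_{k,l}\big)^2$$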

29 Objective Function: loss function; regularized objective function.
The idea of the paper is to jointly learn the optimal shapelets S and the optimal linear hyperplane W that minimize the classification objective F.
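The loss and the regularized objective are shown as images; a hedged reconstruction from the paper is a logistic loss with L2 regularization on the weights:

$$\mathcal{L}(Y_i, \hat{Y}_i) = -Y_i \ln \sigma(\hat{Y}_i) - (1 - Y_i)\ln\big(1 - \sigma(\hat{Y}_i)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$F(S, W) = \sum_{i=1}^{I} \mathcal{L}(Y_i, \hat{Y}_i) + \lambda_W \lVert W \rVert^2$$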

30 Differentiable Soft-Minimum Function
The distance between a shapelet and a series involves a hard minimum over all segments of the series, which is not differentiable.

31 Differentiable Soft-Minimum Function
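The soft-minimum formula is shown only as an image; reconstructed from the paper (notation assumed), the hard minimum over the per-segment distances D_{i,k,j} is replaced by a differentiable soft-minimum:

$$D_{i,k,j} = \frac{1}{L}\sum_{l=1}^{L}\big(T_{i,\,j+l-1} - S_{k,l}\big)^2, \qquad \hat{M}_{i,k} = \frac{\sum_{j} D_{i,k,j}\, e^{\alpha D_{i,k,j}}}{\sum_{j} e^{\alpha D_{i,k,j}}}$$

which approaches the true minimum as the precision parameter α → −∞.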

32 Stochastic gradient descent
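The update rules are shown as images; as a sketch of what the stochastic gradient descent optimizes, the gradient of the per-instance objective with respect to a shapelet value decomposes by the chain rule through the soft-minimum:

$$\frac{\partial F_i}{\partial S_{k,l}} = \frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} \cdot \frac{\partial \hat{Y}_i}{\partial \hat{M}_{i,k}} \cdot \frac{\partial \hat{M}_{i,k}}{\partial S_{k,l}}$$

and both the shapelets S and the weights W are updated per instance with learning rate η, e.g. S_{k,l} ← S_{k,l} − η ∂F_i/∂S_{k,l}.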

33 Comparison to State of the Art
Learning Near-Optimal Shapelets: the baselines cannot explore candidates that do not appear literally as segments, and minimizing the classification objective by picking among candidate guesses has no guarantee of optimality. Capturing Interactions Among Shapelets: the baselines score each shapelet independently.

34 Interactions among Shapelets
Interactions among shapelets can become a game changing factor.

35 Learning general shapelets
Multi-class problems; shapelets of various lengths; a generalized objective function; a generalized soft-minimum.

36 Weaker Aspects. Relies on more hyper-parameters: learning rate η, regularization parameter λ_W, soft-min precision α. The interpretability of the learned shapelets is weakened to some extent.

37 Experiments. Baselines: Shapelet Tree Methods (IG, KW, FST, MM); Basic Classifiers (1NN, NB, C4.5); More Complex Classifiers (BN, RAF, ROF, SVM); Other Related Methods (FSH, DTW). Run-time experiments: next page.

38 Experiments (run time)

39 Experiments (accuracy)

40 Experiments (accuracy)

41 Summary. One of the most promising recent approaches in the time-series domain. Open directions: improving algorithm efficiency; improving classification accuracy; extending to multivariate time series; applications.

42 Research progress: mining community-evolution sequences with shapelets. Goals: prediction (dissolution, shrinking/growing, stability, splitting); finding meaningful (interpretable) subsequences.
Work done so far: generated finer-grained community-evolution sequences (monthly/weekly); implemented the F-Stat algorithm (KDD12); improved the F-Stat (KDD12) algorithm to raise its efficiency; preliminary experiments.

43 Improving F-Stat. Problem with the original algorithm: too many candidate shapelets. Improvement idea: discretize the candidates, store shapelets that have already been evaluated, and avoid repeated computation.
Resolution r = 0.01.
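A minimal sketch of the caching idea described above (my own illustration, not the presenter's code): round each candidate to the stated resolution r = 0.01, and reuse the stored quality score when a discretized candidate has already been evaluated.

```python
def discretize(candidate, r=0.01):
    """Map each value of a candidate shapelet to an integer bucket of width r,
    so that near-identical candidates collide on the same key."""
    return tuple(int(round(x / r)) for x in candidate)

def cached_quality(candidate, quality_fn, cache, r=0.01):
    """Evaluate a candidate's quality (e.g. its F-statistic) only once per
    discretized key; repeated near-duplicate candidates hit the cache."""
    key = discretize(candidate, r)
    if key not in cache:
        cache[key] = quality_fn(candidate)
    return cache[key]

# Usage (f_stat_of_candidate is a placeholder for the F-Stat quality function):
# cache = {}
# score = cached_quality(S, f_stat_of_candidate, cache, r=0.01)
```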

44 Run time

45 Predicting community dissolution. Predict whether a community will have dissolved 10 time steps later (currently only the community node count is considered; later work will add factors such as the community edge count and community hierarchy).
t = 140; 10% of the data for training, 90% for prediction. Comparison method: logistic regression on lifetime, community node count (mean, standard deviation, maximum, minimum), and the node count at the current time step. Evaluation metric: F1.
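A minimal sketch of the logistic-regression baseline as described on the slide (the feature set is my reading of the slide; the data variables are hypothetical, using scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def community_features(node_counts):
    """Per-community features following the slide: lifetime, node-count
    statistics (mean, std, max, min), and the node count at the current step."""
    counts = np.asarray(node_counts, dtype=float)
    return [len(counts), counts.mean(), counts.std(),
            counts.max(), counts.min(), counts[-1]]

def run_baseline(histories_train, y_train, histories_test, y_test):
    """Train the logistic-regression baseline and report F1.
    histories_* are lists of node-count sequences (hypothetical variables);
    y_* is 1 if the community dissolves within 10 steps, else 0."""
    X_train = np.array([community_features(h) for h in histories_train])
    X_test = np.array([community_features(h) for h in histories_test])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))
```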

46 Predicting community dissolution: Enron, Facebook

47 Top 20 Shapelets: Facebook, Enron

48 Future work. Algorithms and experiments: further filtering of shapelets; extending to multivariate time series; more comparison algorithms; transfer learning; other prediction problems (shrinking/growing, stability, splitting, ...); adding more factors such as community edge counts and community hierarchy.

49 Thank you!

