
1 Shapelet. Time Series Shapelets: A New Primitive for Data Mining (KDD 2009); A Shapelet Transform for Time Series Classification (KDD 2012); Learning Time-Series Shapelets (KDD 2014). By: Zhang Yonghui

2 Outline: Background; What is a shapelet?; Finding the shapelet; Shapelet transform; Learning shapelets directly; Summary

3 Background TSC (time series classification)

4 TSC. The simple nearest-neighbor algorithm is very difficult to beat on most time series problems. Disadvantages of the nearest-neighbor approach: high time and space complexity; not interpretable.

5 What is a shapelet? Time Series Shapelets: A New Primitive for Data Mining (KDD 2009). Shapelets are time series subsequences which are, in some sense, maximally representative of a class. Advantages: interpretable; more accurate/robust on some datasets; faster at classification.

6 An Example Samples of leaves from two species

7 An Example Convert shape to “time series” representation

8 An Example. The best shapelet found by the algorithm.

9 Why are shapelets effective?
Shapelets are local features, whereas most other state-of-the-art time series/shape classifiers rely on global features, which can be brittle even to low levels of noise and distortion. Geurts (PKDD 2001): "it is impossible in practice to consider every such subsignal as a candidate pattern." What makes it feasible now: improvements in CPU performance and pruning techniques.

10 Finding the shapelet. Definition 1: Time Series. A time series T = t1, …, tm is an ordered set of m real-valued variables. Definition 4: Distance between time series. Dist(T, R), where T and R are of the same length. Definition 5: Distance from a time series to a subsequence. SubsequenceDist(T, S) = min(Dist(S, S')), taken over all subsequences S' of T with the same length as S.
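A minimal sketch of these two distance definitions, assuming plain (non-normalized) Euclidean distance; the function names are illustrative, not taken from the paper:

```python
import numpy as np

def dist(a, b):
    """Euclidean distance between two equal-length sequences (Definition 4)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def subsequence_dist(T, S):
    """Distance from time series T to subsequence S (Definition 5):
    the minimum distance over all |S|-length subsequences of T."""
    T, S = np.asarray(T, dtype=float), np.asarray(S, dtype=float)
    L = len(S)
    return min(dist(T[i:i + L], S) for i in range(len(T) - L + 1))
```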

11 Finding the shapelet. Definition 8: Optimal Split Point (OSP). An Optimal Split Point is a distance threshold that splits the dataset into two subsets such that the information gain of the split is maximized. Definition 9: Shapelet. shapelet(D) is the subsequence that, with its corresponding optimal split point, achieves the highest information gain over all candidate subsequences of D.
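The slide omits the formulas; for reference, the standard entropy and information-gain definitions these two definitions rely on (notation mine) are:

$$I(D) = -\sum_{c} p(c)\log p(c), \qquad \mathrm{Gain}(S, d_{th}) = I(D) - \left(\frac{|D_1|}{|D|} I(D_1) + \frac{|D_2|}{|D|} I(D_2)\right)$$

where D_1 (resp. D_2) contains the series whose SubsequenceDist to the candidate S is below (resp. at least) the threshold d_{th}; the OSP is the d_{th} that maximizes this gain.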

12 Finding the shapelet Brute-Force Algorithm
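A minimal sketch of the brute-force idea (not the paper's exact pseudocode), reusing subsequence_dist from the sketch above: enumerate every subsequence of every training series as a candidate, score it by the information gain of its best split point, and keep the best. This deliberately ignores the paper's speedups (distance early abandoning, admissible entropy pruning).

```python
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in (labels.count(v) for v in set(labels)))

def best_split_gain(dists, labels):
    """Information gain of the best distance threshold (Definition 8)."""
    base, n = entropy(labels), len(labels)
    order = sorted(range(n), key=lambda i: dists[i])
    best = 0.0
    for k in range(1, n):
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = max(best, gain)
    return best

def brute_force_shapelet(dataset, labels, min_len, max_len):
    """Exhaustive shapelet search over all candidate subsequences (no pruning)."""
    best_gain, best_shapelet = 0.0, None
    for T in dataset:
        for L in range(min_len, max_len + 1):
            for start in range(len(T) - L + 1):
                S = T[start:start + L]
                dists = [subsequence_dist(R, S) for R in dataset]
                gain = best_split_gain(dists, labels)
                if gain > best_gain:
                    best_gain, best_shapelet = gain, S
    return best_shapelet, best_gain
```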

13 Finding the shapelet Build a decision tree Pruning

14 Experiments. Accuracy of the shapelet decision tree vs. the k-NN classifier:
Dataset              | Shapelet decision tree | k-NN  | Run time (shapelet vs. k-NN)
Projectile Points    | 80.0%                  | 68.0% | 3×10^3 times faster
Historical Documents | 89.9%                  | 82.9% | 3×10^4 times faster
Gun/NoGun Problem    | 93.3%                  | 91.3% | four times faster
Wheat Spectrography  | 72.6%                  | 44.1% | -

15 The original research embedded the shapelet-finding procedure within a decision tree.
The relatively time-consuming shapelet search is therefore called repeatedly, once per tree node, which is costly, especially on multi-class problems. How can we take full advantage of existing classification algorithms?

16 Shapelet transform. A Shapelet Transform for Time Series Classification (KDD 2012). Describes a means of extracting the k best shapelets from a data set in a single pass and then using these shapelets to transform the data into a new feature space. The aim is to increase classification accuracy while reducing training time and maintaining the interpretability of the model.

17 Alternative Quality Measure
The original paper uses information gain to determine the quality of a shapelet. The concern, however, is not necessarily how well a candidate splits the data but how the distributions of distances for the different classes differ, and the utility of the information-gain measure degrades on multi-class problems. Alternative measure: the F-statistic of a fixed-effects ANOVA.

18 One-way ANOVA. Analysis of variance (ANOVA), also known as the F-test, is used to infer whether the population means of several groups are equal. Here a one-way (single-factor) ANOVA is used.
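For reference, the one-way fixed-effects ANOVA F-statistic used as the shapelet quality measure is the standard ratio of between-class to within-class variance of the distances (notation mine, not from the slides):

$$F = \frac{\sum_{c=1}^{C} n_c\,(\bar{d}_c - \bar{d})^2 \,/\, (C - 1)}{\sum_{c=1}^{C}\sum_{i=1}^{n_c} (d_{c,i} - \bar{d}_c)^2 \,/\, (N - C)}$$

where d_{c,i} is the distance from the i-th series of class c to the candidate shapelet, \bar{d}_c the class mean, \bar{d} the overall mean, n_c the class size, C the number of classes, and N the total number of series. A larger F indicates better class separation.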

19 Find the k best shapelets

20 Shapelet transform.
Length Parameter Approximation: a heuristic rule; not optimal, but an automatic approach.
Shapelet Selection: a heuristic rule (n/2 shapelets); 5-fold cross-validation.
Data Transformation: each attribute corresponds to the distance from a shapelet to the original time series.
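A minimal sketch of the data-transformation step, assuming a list of already-selected shapelets and reusing subsequence_dist from the earlier sketch; the resulting matrix can be fed to any standard classifier.

```python
import numpy as np

def shapelet_transform(dataset, shapelets):
    """Map each time series to a feature vector whose j-th attribute is
    the distance from the series to the j-th shapelet."""
    return np.array([[subsequence_dist(T, S) for S in shapelets]
                     for T in dataset])

# Usage: X = shapelet_transform(train_series, top_k_shapelets)
# then train, e.g., a random forest or linear SVM on X.
```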

21 Data sets. 18 data sets from the UCR time series repository and 8 new data sets provided by the authors. New data sets: hand X-rays, focusing on eight specific bones, with 3 class labels: infant (0-6 years), junior (7-12 years), teen (13-18 years).

22 Experiments(Embedded vs. Transformed)

23 Experiments(Shapelet Transformation Classifiers)
1 Shapelet Tree, 2 C4.5, 3 1NN, 4 Naïve Bayes, 5 Bayesian Network, 6 Random Forest, 7 Rotation Forest, 8 SVM (linear)

24 Experiments(Other Classifiers)
This offers promising support for shapelet-based approaches, suggesting that they fill a classification niche that has not been covered in the literature.

25 Can we learn top-K shapelets directly without the need to try out lots of candidates?

26 Learning shapelets directly
Learning Time-Series Shapelets (KDD 2014). Learn near-optimal shapelets directly, without the need to try out large numbers of candidates. Learn the true top-K shapelets by capturing their interaction.

27 Method. Two steps: (i) start with rough initial guesses for the shapelets; (ii) iteratively learn/optimize the shapelets by minimizing a classification loss function.

28 Objective Function. For the sake of simplicity, assume binary targets Y ∈ {0, 1} and a fixed shapelet length L. Learning model: a linear model over the transformed shapelet representation (M_{i,k}, the distance from the i-th time series to the k-th shapelet) with weights W.
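The model itself appears only as an image in the slide; reconstructed from the paper's setup (notation assumed, may differ slightly from the slides), the per-instance prediction is linear in the shapelet distances:

$$\hat{Y}_i = W_0 + \sum_{k=1}^{K} M_{i,k}\, W_k, \qquad M_{i,k} = \min_{j} \frac{1}{L}\sum_{l=1}^{L}\big(T_{i,\,j+l-1} - S_{k,l}\big)^2$$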

29 Objective Function: loss function; regularized objective function.
The idea of the paper is to jointly learn the optimal shapelets S and the optimal linear hyperplane W that minimize the classification objective F.
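The loss and the regularized objective are shown as images; a hedged reconstruction from the paper is a logistic loss with L2 regularization on the weights:

$$\mathcal{L}(Y_i, \hat{Y}_i) = -Y_i \ln \sigma(\hat{Y}_i) - (1 - Y_i)\ln\big(1 - \sigma(\hat{Y}_i)\big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$F(S, W) = \sum_{i=1}^{I} \mathcal{L}(Y_i, \hat{Y}_i) + \lambda_W \lVert W \rVert^2$$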

30 Differentiable Soft-Minimum Function
The distance between a shapelet and a series involves a hard minimum over all segments of the series, which is not differentiable.

31 Differentiable Soft-Minimum Function
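The soft-minimum formula is shown only as an image; reconstructed from the paper (notation assumed), the hard minimum over the per-segment distances D_{i,k,j} is replaced by a differentiable soft-minimum:

$$D_{i,k,j} = \frac{1}{L}\sum_{l=1}^{L}\big(T_{i,\,j+l-1} - S_{k,l}\big)^2, \qquad \hat{M}_{i,k} = \frac{\sum_{j} D_{i,k,j}\, e^{\alpha D_{i,k,j}}}{\sum_{j} e^{\alpha D_{i,k,j}}}$$

which approaches the true minimum as the precision parameter α → −∞.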

32 Stochastic gradient descent
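The update rules are shown as images; as a sketch of what the stochastic gradient descent optimizes, the gradient of the per-instance objective with respect to a shapelet value decomposes by the chain rule through the soft-minimum:

$$\frac{\partial F_i}{\partial S_{k,l}} = \frac{\partial \mathcal{L}(Y_i, \hat{Y}_i)}{\partial \hat{Y}_i} \cdot \frac{\partial \hat{Y}_i}{\partial \hat{M}_{i,k}} \cdot \frac{\partial \hat{M}_{i,k}}{\partial S_{k,l}}$$

and both the shapelets S and the weights W are updated per instance with learning rate η, e.g. S_{k,l} ← S_{k,l} − η ∂F_i/∂S_{k,l}.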

33 Comparison to State of the Art
Learning Near-Optimal Shapelets: the baselines cannot explore candidates that do not appear literally as segments, and minimizing the classification objective by picking among candidate guesses has no guarantee of optimality. Capturing Interactions Among Shapelets: the baselines score each shapelet independently.

34 Interactions among Shapelets
Interactions among shapelets can become a game changing factor.

35 Learning general shapelets
Multi-class problems; shapelets of various lengths; a generalized objective function; a generalized soft-minimum.

36 Weaker Aspects. Relies on more hyper-parameters: learning rate η, regularization parameter λ_W, soft-min precision α. The interpretability of the learned shapelets is weakened to some extent.

37 Experiments. Baselines: Shapelet Tree Methods (IG, KW, FST, MM); Basic Classifiers (1NN, NB, C4.5); More Complex Classifiers (BN, RAF, ROF, SVM); Other Related Methods (FSH, DTW). Run-time experiments: next page.

38 Experiments (run time)

39 Experiments (accuracy)

40 Experiments (accuracy)

41 Summary. One of the most promising recent approaches in the time-series domain. Open directions: improving algorithm efficiency; improving classification accuracy; extending to multivariate time series; applications.

42 Research progress: mining community-evolution sequences with shapelets. Goals: prediction (dissolution, shrinking/growing, stability, splitting); finding meaningful (interpretable) subsequences.
Work done so far: generated finer-grained community-evolution sequences (monthly/weekly); implemented the F-Stat algorithm (KDD12); improved the F-Stat (KDD12) algorithm to raise its efficiency; preliminary experiments.

43 Improving F-Stat. Problem with the original algorithm: too many candidate shapelets. Improvement idea: discretize the candidates, store shapelets that have already been evaluated, and avoid repeated computation.
Resolution r = 0.01.
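A minimal sketch of the caching idea described above (my own illustration, not the presenter's code): round each candidate to the stated resolution r = 0.01, and reuse the stored quality score when a discretized candidate has already been evaluated.

```python
def discretize(candidate, r=0.01):
    """Map each value of a candidate shapelet to an integer bucket of width r,
    so that near-identical candidates collide on the same key."""
    return tuple(int(round(x / r)) for x in candidate)

def cached_quality(candidate, quality_fn, cache, r=0.01):
    """Evaluate a candidate's quality (e.g. its F-statistic) only once per
    discretized key; repeated near-duplicate candidates hit the cache."""
    key = discretize(candidate, r)
    if key not in cache:
        cache[key] = quality_fn(candidate)
    return cache[key]

# Usage (f_stat_of_candidate is a placeholder for the F-Stat quality function):
# cache = {}
# score = cached_quality(S, f_stat_of_candidate, cache, r=0.01)
```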

44 Run time

45 Predicting community dissolution. Predict whether a community will have dissolved 10 time steps later (currently only the community node count is considered; later work will add factors such as the community edge count and community hierarchy).
t = 140; 10% of the data for training, 90% for prediction. Comparison method: logistic regression on lifetime, community node count (mean, standard deviation, maximum, minimum), and the node count at the current time step. Evaluation metric: F1.
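A minimal sketch of the logistic-regression baseline as described on the slide (the feature set is my reading of the slide; the data variables are hypothetical, using scikit-learn):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def community_features(node_counts):
    """Per-community features following the slide: lifetime, node-count
    statistics (mean, std, max, min), and the node count at the current step."""
    counts = np.asarray(node_counts, dtype=float)
    return [len(counts), counts.mean(), counts.std(),
            counts.max(), counts.min(), counts[-1]]

def run_baseline(histories_train, y_train, histories_test, y_test):
    """Train the logistic-regression baseline and report F1.
    histories_* are lists of node-count sequences (hypothetical variables);
    y_* is 1 if the community dissolves within 10 steps, else 0."""
    X_train = np.array([community_features(h) for h in histories_train])
    X_test = np.array([community_features(h) for h in histories_test])
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))
```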

46 Predicting community dissolution: Enron, Facebook

47 Top 20 Shapelets: Facebook, Enron

48 Future work. Algorithms and experiments: further filtering of shapelets; extending to multivariate time series; more comparison algorithms; transfer learning; other prediction problems (shrinking/growing, stability, splitting, ...); adding more factors such as community edge counts and community hierarchy.

49 Thank you!

