
1 Time Series Shapelets: A New Primitive for Data Mining. Lexiang Ye and Eamonn Keogh, University of California, Riverside. KDD 2009. Presented by: Zhenhui Li

2 Classification in Time Series. Applications: finance, medicine. 1-Nearest Neighbor – Pros: accurate, robust, simple – Cons: high time and space complexity (lazy learning); results are not interpretable
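
For concreteness, here is a minimal sketch of the 1-nearest-neighbor baseline the slide refers to, assuming equal-length series and Euclidean distance; the function name is illustrative and not from the paper.

```python
import numpy as np

def one_nn_classify(query, train_series, train_labels):
    """Label `query` with the class of its nearest training series under
    Euclidean distance (all series assumed to have equal length)."""
    dists = [np.linalg.norm(np.asarray(query) - np.asarray(s)) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```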

3 Solution: Shapelets – a time series subsequence – representative of a class – discriminative from other classes

4 MOTIVATING EXAMPLE

5 [Figure: leaf outlines of false nettles and stinging nettles converted to time series; the Shapelet Dictionary shows the discovered shapelet I with split threshold 5.1, and the Leaf Decision Tree uses it (yes/no) to separate the two classes (0/1)]

6 BRUTE-FORCE ALGORITHM

7 Candidates Pool: extract subsequences of all possible lengths from the training time series
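
A minimal sketch of building the candidates pool; the minimum length of 3 and the function name are assumptions for illustration.

```python
def generate_candidates(train_series, min_len=3, max_len=None):
    """Collect every subsequence of every training series, for all lengths
    between min_len and (by default) the full series length."""
    candidates = []
    for series in train_series:
        upper = max_len if max_len is not None else len(series)
        for length in range(min_len, upper + 1):
            for start in range(len(series) - length + 1):
                candidates.append(series[start:start + length])
    return candidates
```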

8 Testing the utility of a candidate shapelet – Arrange the time series objects on a line by their distance to the candidate – Find the optimal split point (maximal information gain) – Pick the candidate achieving the best utility as the shapelet [Figure: objects ordered by distance to the candidate, with the split point and its information gain]
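
A sketch of the split-point search, assuming the distance from the candidate to every training object has already been computed; it sorts the objects by distance and returns the threshold with maximal information gain. Helper names are illustrative.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(distances, labels):
    """Return (threshold, information gain) of the best split of the objects
    ordered by their distance to the candidate shapelet."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    d = [distances[i] for i in order]
    y = [labels[i] for i in order]
    base = entropy(y)
    best_gain, best_thresh = 0.0, None
    for i in range(1, len(d)):
        if d[i] == d[i - 1]:
            continue                      # no realizable split between equal distances
        left, right = y[:i], y[i:]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_thresh = gain, (d[i - 1] + d[i]) / 2
    return best_thresh, best_gain
```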

9 Problem: the total number of candidates is huge, and each candidate requires computing the distance to every training sample. Trace dataset – 200 instances, each of length 275 – 7,480,200 shapelet candidates – approximately three days of computation
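
As a sanity check on the 7,480,200 figure, assuming candidates of every length from 3 up to the full length 275 are enumerated (the minimum length of 3 is inferred from the quoted count, not stated on the slide):

$$
200 \times \sum_{l=3}^{275} (275 - l + 1) = 200 \times \frac{273 \cdot 274}{2} = 200 \times 37{,}401 = 7{,}480{,}200
$$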

10 Speedup. Distance calculations from time series objects to shapelet candidates are the most expensive part. Reduce the time in two ways – Distance Early Abandon: reduce the time of each distance computation between two time series – Admissible Entropy Pruning: reduce the number of distance calculations

11 DISTANCE EARLY ABANDON

12 [Figure: candidate subsequence S slid across time series T to find its best match]

13 [Figure: the best matching location of S in T, with Dist = 0.4]

14 [Figure: at another alignment the partial distance already exceeds 0.4, so the calculation is abandoned at that point]

15 Distance Early Abandon. We only need the minimum Dist. Method – Keep the best-so-far distance – Abandon the calculation as soon as the current partial distance exceeds the best so far.
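
A sketch of the subsequence distance with early abandoning, assuming the squared Euclidean distance is used; variable names are illustrative.

```python
def subsequence_dist_early_abandon(series, candidate):
    """Minimum squared Euclidean distance between `candidate` and any
    same-length subsequence of `series`; each alignment is abandoned as soon
    as its partial sum can no longer beat the best distance found so far."""
    m = len(candidate)
    best = float("inf")
    for start in range(len(series) - m + 1):
        total = 0.0
        for j in range(m):
            total += (series[start + j] - candidate[j]) ** 2
            if total >= best:          # early abandon: this alignment cannot win
                break
        else:
            best = total               # completed without abandoning: new best-so-far
    return best
```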

16 ADMISSIBLE ENTROPY PRUNING

17 Admissible Entropy Pruning. We only need the best shapelet for each class. For a candidate shapelet – We don't need to calculate the distance for every training sample – If, after processing only some of the training samples, the upper bound on the achievable information gain is already below that of the best shapelet found so far – stop the calculation – and try the next candidate
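
A two-class sketch of the optimistic bound behind this pruning, under the assumption that distances have been computed for only part of the training set: every not-yet-evaluated object is hypothetically placed at whichever end of the distance axis most favours its class, and the candidate is abandoned if even this best case cannot beat the best information gain seen so far. All names are illustrative.

```python
from collections import Counter
import math

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def _best_gain(distances, labels):
    """Best information gain over all split thresholds on the distance axis."""
    pairs = sorted(zip(distances, labels))
    y = [lab for _, lab in pairs]
    base, best = _entropy(y), 0.0
    for i in range(1, len(y)):
        rem = (i * _entropy(y[:i]) + (len(y) - i) * _entropy(y[i:])) / len(y)
        best = max(best, base - rem)
    return best

def optimistic_gain(seen_dists, seen_labels, remaining_labels):
    """Upper bound on the gain this candidate can still achieve: place every
    unevaluated object at distance 0 or just beyond the largest seen distance,
    trying both ways of assigning the two classes to the ends."""
    far = (max(seen_dists) if seen_dists else 0.0) + 1.0
    best = 0.0
    for near_class in set(seen_labels) | set(remaining_labels):
        dists = list(seen_dists) + [0.0 if lab == near_class else far
                                    for lab in remaining_labels]
        labels = list(seen_labels) + list(remaining_labels)
        best = max(best, _best_gain(dists, labels))
    return best

# Prune: if optimistic_gain(...) < best_gain_so_far, skip the rest of this candidate.
```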

18 [Figure: training objects arranged on the distance axis starting at 0, labelled false nettles vs. stinging nettles]

19 [Figure: two candidate orderings of the objects on the distance axis, with information gains I = 0.42 and I = 0.29]

20 Classification [Figure: the learned shapelet (I, threshold 5.1) and the Leaf Decision Tree applied to classify new leaves as false nettles or stinging nettles]

21 EXPERIMENTAL EVALUATION

22 Performance Comparison. Lightning Dataset (original) – Length: 2000 – Training: 2000 – Testing: 18000

23 Projectile Points

24 [Figure: Shapelet Dictionary with shapelets I (Clovis) and II (Avonlea), and the Arrowhead Decision Tree built from them] [Table: Method vs. Accuracy and Time for Shapelet and Rotation Invariant Nearest Neighbor; the values appear only in the original slide image]

25 Wheat Spectrography [Figure: one spectrography sample from each class] Wheat Dataset – Length: 1050 – Training: 49 – Testing: 276

26 [Figure: Shapelet Dictionary with shapelets I–VI and the Wheat Decision Tree built from them] [Table: Method vs. Accuracy and Time for Shapelet and Nearest Neighbor; the values appear only in the original slide image]

27 Coffee [Figure: Coffee Decision Tree with shapelet I on the Robusta branch; the Shapelet Dictionary shows that shapelet I covers the spectral regions of caffeine and chlorogenic acid] Method – Accuracy: Shapelet (28/28): 1.0; Nearest Neighbor (leave-one-out): 0.98

28 The Gun/NoGun Problem [Table: Method vs. Accuracy and Time for Shapelet and Rotation Invariant Nearest Neighbor; the values appear only in the original slide image] [Figure: Shapelet Dictionary with shapelet I (No Gun) and the Gun Decision Tree separating No Gun from Gun]

29 Gait Analysis. Gait Dataset – Length: varying – Training: 18 – Testing: 143

30 Shapelets reduce sensitivity to alignment. Time for classification of the 143 testing objects – Shapelet: 0.060 s – Rotation Invariant Nearest Neighbor: 2.684 s [Figure: Walk Decision Tree with shapelet I (Normal Walk), found on the left-toe/right-toe signals]

31 Conclusions – Interpretable results – More accurate/robust – Significantly faster at classification

32 Discussions – Comparison. Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, "Discriminative Frequent Pattern Analysis for Effective Classification" (ICDE'07); Hong Cheng, Xifeng Yan, Jiawei Han, and Philip S. Yu, "Direct Discriminative Pattern Mining for Effective Classification" (ICDE'08). Similarities – motivation: a discriminative frequent pattern plays the same role as a shapelet – technique: both use an upper bound on information gain to speed up the search. Differences – application: general feature selection vs. time series (no explicit features) – split node: binary (contains / does not contain a pattern) vs. numeric (distance smaller / larger than a threshold)

33 Discussions – other topics Similar ideas could be applied to other research topics – graph – image – spatio-temporal – social network – ….

34 Discussions – other topics Graph classification: Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu, Mining Significant Graph Patterns by Scalable Leap Search, Proc ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'08), Vancouver, BC, Canada, June 2008.

35 Discussions – other topics. Moving object classification: discriminative sub-movements

36 Discussions – other topics Social network – classify normal/spamming users

37 Discussions – other topics

38 Social network – classify normal/spamming users – How to find discriminative features in a social network? – social network structure – user behaviour

39 Discussions – other topics. For other applications this idea could be adapted to improve performance, but the adaptation is not straightforward.

40 Thank You. Questions?

