Presentation is loading. Please wait.

Presentation is loading. Please wait.

Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan.

Similar presentations


Presentation on theme: "Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan."— Presentation transcript:

1 Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan Zhu, Liudmila Ulanova Nurjahan Begum, Yifei Ding, Hoang Anh Dau Diego Furtado Silva, Abdullah Mueen, Eamonn Keogh

2 Motivation Given a time series, T and a desired subsequence length, m

3 Motivation Given a time series, T and a desired subsequence length, m
We can use sliding window of length m to extract all subsequences of length m |T|-m+1

4 Motivation Given a time series, T and a desired subsequence length, m
We can then compute the pairwise distance among these subsequences and store them to a matrix 7.6952 7.7399 7.7106 |T|-m+1 4

5 Motivation Given a time series, T and a desired subsequence length, m
We can visualize the matrix by plot the matrix as a image where blue is more similar subsequences and red is more dissimilar subsequences |T|-m+1 5

6 Motivation Given a time series, T and a desired subsequence length, m
Different information about the time series can be extract from the matrix (e.g., pair with smallest distance is the motif, checkerboard kernel can be used to segment the time series [a]) |T|-m+1 6 [a] J Foote, Automatic Audio Segmentation Using A Measure of Audio Novelty

7 Motivation Given a time series, T and a desired subsequence length, m
However, since the space complexity is O( T −m+1 2 ), which is quadratic in T , the corresponding matrix for a time series with 1 million doubles requires 4,000 GB to store |T|-m+1 7

8 Problem statement Given a time series, T and a desired subsequence length, m m Most part of the matrix contains information that is repeated or irrelevant for tasks in time series data mining What kind of information should we extract from the matrix? How do we compactly storing it? |T|-m+1

9 What kind of information should be extracted from the matrix?
set of all subsequences set of corresponding nearest neighbor Who is each subsequence’s nearest neighbor |T|-m+1

10 What kind of information should be extracted from the matrix?
set of all subsequences set of corresponding nearest neighbor 2.88 0.90 1.61 6.04 5.69 1.23 1.40 Who is each subsequence’s nearest neighbor How far away from their corresponding nearest neighbor |T|-m+1

11 How to store the information? (1 of 2)
The distance to the corresponding nearest neighbor of each subsequence can be stored in a vector called matrix profile time series, T matrix profile, P

12 How to store the information? (1 of 2)
The distance to the corresponding nearest neighbor of each subsequence can be stored in a vector called matrix profile Ti time series, T matrix profile, P m The matrix profile value at this location i is the distance between Ti and its nearest neighbor

13 How to store the information? (1 of 2)
The distance to the corresponding nearest neighbor of each subsequence can be stored in a vector called matrix profile time series, T matrix profile, P Local minimums are corresponding to motifs

14 How to store the information? (2 of 2)
The index of corresponding nearest neighbor of each subsequence is also stored in a vector called matrix profile index Ti time series, T matrix profile index, I m 192 193 194 195 196 The matrix profile index value at this location i is the index of Ti‘s nearest neighbor

15 How to store the information? (2 of 2)
The index of corresponding nearest neighbor of each subsequence is also stored in a vector called matrix profile index Ti T194 time series, T matrix profile index, I m 192 193 194 195 196 It turns out that Ti‘s nearest neighbor is T194

16 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m m inf Matrix profile is initialized as inf vector This is just a toy example, so the values and the vector length does not fit the time series shown above

17 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m inf At the first iteration, a subsequence T i is randomly selected from T

18 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m inf We compute the distances between T i and every subsequences from T (time complexity = O(|T|log(|T|))) We them put the distances in a vector based on the position of the subsequences 3 2 5 4 1 9 8 6 The distance between T i and T 1 (first subsequence) is 3

19 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m inf We compute the distances between T i and every subsequences from T (time complexity = O(|T|log(|T|))) We them put the distances in a vector based on the position of the subsequences 3 2 5 4 1 9 8 6 Let say T i happen to be the third subsequences, therefore the third value in the distance vector is 0

20 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m inf Matrix profile is updated by apply elementwise minimum to these two vectors min 3 2 5 4 1 9 8 6

21 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m 3 inf Matrix profile is updated by apply elementwise minimum to these two vectors min 3 2 5 4 1 9 8 6

22 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m 3 2 inf Matrix profile is updated by apply elementwise minimum to these two vectors min 3 2 5 4 1 9 8 6 Because T i is the third subsequences and the distance between oneself is unimportant, we skip it

23 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Ti m 3 2 inf 5 4 1 9 8 6 After we finish update matrix profile for the first iteration 3 2 5 4 1 9 8 6

24 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 3 2 inf 5 4 1 9 8 6 In the second iteration, we randomly select another subsequence Tj and it happens to be the 12th subsequences

25 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 3 2 inf 5 4 1 9 8 6 Once again, we compute the distance between Tj and every subsequences of T 2 3 1 4 6 5 8 9

26 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 3 2 inf 5 4 1 9 8 6 min The same elementwise minimum 2 3 1 4 6 5 8 9

27 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 2 inf 5 3 4 1 9 8 6 min The same elementwise minimum 2 3 1 4 6 5 8 9

28 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 2 inf 5 3 4 1 9 8 6 min The same elementwise minimum 2 3 1 4 6 5 8 9

29 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 2 1 5 3 4 9 8 6 min The same elementwise minimum 2 3 1 4 6 5 8 9

30 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 2 1 5 3 4 9 8 6 min 2 3 1 4 6 5 8 9 We repeat the two steps (distance computation and update) until we have used every subsequences

31 How to compute matrix profile? (1 of 2)
Given a time series, T and a desired subsequence length, m Tj m 2 1 5 3 4 9 8 6 min 2 3 1 4 6 5 8 9 There are |T| subsequences and the distance computation is O(|T|log(|T|)) The overall time complexity is O(|T|2log(|T|))

32 How to compute matrix profile? (2 of 2)
It is exact It is simple and parameter-free It is fast O(n2 log n) or O(n2) if not anytime It is space efficient O(n) It allows anytime algorithms (the quality of matrix profile improves over iteration) It can leverage hardware (embarrassingly parallelizable) It is incrementally maintainable

33 Motif mining with matrix profile: repeated earthquakes
Seismologists are interested in finding repeated earthquakes in long sequence of seismometer readings as the repeated earthquakes could be years apart Excerpt of a seismic time series from Mammoth Lakes, California

34 Motif mining with matrix profile: repeated earthquakes
Seismologists are interested in finding repeated earthquakes in long sequence of seismometer readings as the repeated earthquakes could be years apart

35 Discord mining with matrix profile: abnormal heartbeat detection
Identifying abnormal heartbeat from ECG reading is important for diagnosis patients Maximum value in matrix profile indicates discord

36 Discord mining with matrix profile: abnormal heartbeat detection
Identifying abnormal heartbeat from ECG reading is important for diagnosis patients Premature ventricular contraction

37 Common subsequence mining with matrix profile: music sampling detection (1 of 2)
Sometimes a musician “borrows” section of other music to generate new music The borrowed section is the common subsequence in both music and can be discovered using matrix profile Given two time series T a and T b , matrix profile can be compute by either finding the nearest neighbor of T a ’s subsequences in T b or vise versa

38 Common subsequence mining with matrix profile: music sampling detection (2 of 2)

39 Common subsequence mining with matrix profile: music sampling detection (2 of 2)
Most similar 5 secs subsequences

40 Common subsequence mining with matrix profile: music sampling detection (2 of 2)
Music 1: Under Pressure by Queen-David Bowie Music 2: Ice Ice Baby by Vanilla Ice Most similar 5 secs subsequences Under Pressure Ice Ice Baby

41 Segmentation with matrix profile: human activity (1 of 2)
Matrix profile and index can also be used for segmentation Given a time series, by examining its matrix profile index, the neighboring information can be extracted, and similar pattern will be neighbors The neighboring information can be visualized with arcs walkwalkwalkwalkwalkwalkwalkwalkrunrunrunrunrunrunrunrunrun

42 Segmentation with matrix profile: human activity (1 of 2)
Matrix profile and index can also be used for segmentation Given a time series, by examining its matrix profile index, the neighboring information can be extracted, and similar pattern will be neighbors The neighboring information can be visualized with arcs walkwalkwalkwalkwalkwalkwalkwalkrunrunrunrunrunrunrunrunrun 1 arc 3 arc 2 arc

43 Segmentation with matrix profile: human activity (1 of 2)
Matrix profile and index can also be used for segmentation Given a time series, by examining its matrix profile index, the neighboring information can be extracted, and similar pattern will be neighbors The neighboring information can be visualized with arcs walkwalkwalkwalkwalkwalkwalkwalkrunrunrunrunrunrunrunrunrun Arc counting curve

44 Segmentation with matrix profile: human activity (1 of 2)
Matrix profile and index can also be used for segmentation Given a time series, by examining its matrix profile index, the neighboring information can be extracted, and similar pattern will be neighbors The neighboring information can be visualized with arcs walkwalkwalkwalkwalkwalkwalkwalkrunrunrunrunrunrunrunrunrun Local minimum is a good split point Arc counting curve

45 Segmentation with matrix profile: human activity (2 of 2)
CMU Motion Capture database Local minimum is found with MATLAB’s peak finding function 5,000 10,000 Matrix Profile Arc Counting Curve One-dimension of multi-d time series: Subject 86, recording 4, dimension 30 W S P N W C T D P W Ground Truth

46 Conclusion http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Given a time series, T and a desired subsequence length, m m pairwise distance matrix Important (neighboring) information are extracted from O( T 2 ) pairwise distance matrix and stored in O( T ) matrix profile and matrix profile index matrix profile matrix profile index Tasks such as motif mining, discord mining, or segmentation can be easily performed with matrix profile and matrix profile index

47 Questions?


Download ppt "Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets Chin-Chia Michael Yeh, Yan."

Similar presentations


Ads by Google