Time Series Sequence Matching Jiaqin Wang CMPS 565.


1 Time Series Sequence Matching Jiaqin Wang CMPS 565

2 Papers: “Fast Subsequence Matching in Time-Series Databases”, Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos; “Skyline Index for Time Series Data”, Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon

3 Types of time series sequences: financial and marketing data (stock prices, sales numbers); scientific databases (weather data, environmental data)

4 Categories of time series sequence matching: Whole matching: the data sequences and the query sequence have the same length. Subsequence matching: the query sequence and the data sequences have different lengths (the query is typically shorter).

5 Whole matching: Given N sequences of the same length l, use a feature extraction function to convert each sequence into an n-dimensional value (Q1, Q2, …, Qn). With the DFT as the feature extraction function, most of the energy is in the first few coefficients, so keep only those coefficients to reduce the dimensionality of each sequence.

6 Whole matching: Map each sequence to an n-dimensional point in the feature space (here, only the first 2 coefficients are kept). Organize these points in an R-tree for indexing and search.

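7 Whole matching: For a new incoming query sequence, use the DFT to convert it to a feature point and map that point into the feature space. Find the points whose distance to the query point is within tolerance e and treat them as candidate matches. The distance over the kept coefficients never exceeds the true distance, so no qualifying sequence is missed, but candidates should be checked against the full sequences.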
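The whole-matching steps on slides 5 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `dft_features`, `euclidean`, and `whole_match` are hypothetical, a plain list scan stands in for the R-tree, and the DFT is scaled by 1/sqrt(n) so that the feature-space distance cannot exceed the time-domain distance.

```python
import cmath
import math


def dft_features(seq, k=2):
    """Return the first k DFT coefficients of seq as a flat tuple of reals.

    The transform is scaled by 1/sqrt(n) (orthonormal DFT), so by Parseval's
    theorem the distance over all coefficients equals the time-domain
    distance, and dropping coefficients can only make it smaller.
    """
    n = len(seq)
    feats = []
    for f in range(k):
        c = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                for t, x in enumerate(seq))
        c /= math.sqrt(n)
        feats.extend((c.real, c.imag))
    return tuple(feats)


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def whole_match(data, query, eps, k=2):
    """Return indices of sequences in data within distance eps of query."""
    # Assumption: a list of feature points replaces the R-tree index.
    index = [dft_features(s, k) for s in data]
    q = dft_features(query, k)
    # Step 1: range query in feature space yields candidates.
    candidates = [i for i, p in enumerate(index) if euclidean(p, q) <= eps]
    # Step 2: discard false alarms by checking the full sequences.
    return [i for i in candidates if euclidean(data[i], query) <= eps]
```

For example, `whole_match([[1, 2, 3, 4], [1, 2, 3, 5], [10, 0, 10, 0]], [1, 2, 3, 4], eps=1.5)` keeps the first two sequences and rejects the third in the feature-space step.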

8 Some pictures of time series data and their DFT: with the Discrete Fourier Transform (DFT), keep the first few (2-3) coefficients. The first few coefficients contain most of the energy of the signal.

9 Feature space: TS1 (0.05, 3), TS2 (0.01, 12), …

10 Feature space The distance e < minimum query distance

11 Subsequence matching: Given a collection of N sequences, each of possibly different length, and a query Q with tolerance e, find all sequences S_i (1 <= i <= N), along with the correct offsets k, such that the subsequence S_i[k:k+Len(Q)-1] matches the query sequence: D(Q, S_i[k:k+Len(Q)-1]) <= e
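The definition on slide 11 can be made concrete with a brute-force sequential scan, which is the baseline an index is meant to beat. This is only a sketch: the names `distance` and `subsequence_matches` are hypothetical, D is assumed to be the Euclidean distance, and indices are 0-based, so the slide's inclusive range S_i[k:k+Len(Q)-1] becomes the Python slice `s[k:k + m]`.

```python
import math


def distance(a, b):
    """Euclidean distance D between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_matches(sequences, query, eps):
    """Return all (i, k) with D(query, sequences[i][k:k+len(query)]) <= eps."""
    m = len(query)
    hits = []
    for i, s in enumerate(sequences):
        # Offsets run from 0 to len(s) - m; shorter sequences give no offsets.
        for k in range(len(s) - m + 1):
            if distance(query, s[k:k + m]) <= eps:
                hits.append((i, k))
    return hits
```

The scan compares the query against every window of every sequence, which is the cost the index described on the following slides tries to avoid.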

12 ST-index: Assume a minimum query length w. Slide a window of size w over each data sequence, placing it at every possible offset. Extract the features of the window at each offset and map them to a point in the feature space.
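A minimal sketch of the sliding-window step, assuming the first few DFT coefficients are used as the features (as in the whole-matching slides). The names `window_features` and `trail` are hypothetical; the 1/sqrt(w) scaling is an assumption chosen so that feature distances do not exceed window distances.

```python
import cmath
import math


def window_features(window, k=2):
    """First k scaled DFT coefficients of one window, as a flat tuple of reals."""
    n = len(window)
    feats = []
    for f in range(k):
        c = sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
                for t, x in enumerate(window)) / math.sqrt(n)
        feats.extend((c.real, c.imag))
    return tuple(feats)


def trail(sequence, w, k=2):
    """Map every window of length w to a feature point, in offset order.

    With 0-based offsets the window starts at 0, 1, ..., len(sequence) - w,
    so a sequence yields len(sequence) - w + 1 points.
    """
    return [window_features(sequence[off:off + w], k)
            for off in range(len(sequence) - w + 1)]
```

Because consecutive windows share w-1 values, consecutive feature points tend to lie close together, which is why the points form a curve in feature space rather than a scattered cloud.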

13-17 Figure (five frames): a sliding window of length w moves across the sequence S from offset 0 to Len(S)-w, giving Len(S)-w+1 window positions.

18 Result: the series of points in the feature space forms a curve (a trail), which is indexed with an R-tree.

19 MBRs: Storing individual points in the R-tree is inefficient. Divide each trail into sub-trails and represent each sub-trail by its minimum bounding rectangle (MBR).

20 MBRs in R-tree: Combine small MBRs into larger ones to obtain the index structure.

21 How to insert points into MBRs: either group the points into MBRs with a fixed number of points each, or group them into MBRs with a variable number of points.

22 I-adaptive method: a greedy algorithm. A cost function estimates the number of disk accesses for an MBR; the average cost function divides that cost by the number of points in the MBR.

23 Algorithm: Assign the first point of the trail to a sub-trail. For each successive point: if including it increases the average cost of the current sub-trail, then start another sub-trail; else include the point in the current sub-trail.
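The greedy rule on slide 23 can be sketched as below. The slides do not give the cost function, so this sketch assumes an estimate of the form "product over dimensions of (MBR side length + 0.5)" for feature points scaled to the unit hypercube, with the average cost being that estimate divided by the number of points. The names `mbr_cost` and `split_trail` are hypothetical.

```python
def mbr_cost(points):
    """Assumed disk-access estimate for the MBR enclosing points.

    Product over dimensions of (side length + 0.5); assumes coordinates
    are scaled to the unit hypercube.
    """
    dims = len(points[0])
    cost = 1.0
    for d in range(dims):
        vals = [p[d] for p in points]
        cost *= (max(vals) - min(vals)) + 0.5
    return cost


def split_trail(points):
    """Greedily split a trail (list of feature points) into sub-trails."""
    if not points:
        return []
    subtrails = [[points[0]]]  # the first point opens the first sub-trail
    for p in points[1:]:
        current = subtrails[-1]
        old_avg = mbr_cost(current) / len(current)
        new_avg = mbr_cost(current + [p]) / (len(current) + 1)
        if new_avg > old_avg:
            subtrails.append([p])  # adding p raises the average cost
        else:
            current.append(p)      # p fits cheaply in the current MBR
    return subtrails
```

On a trail that makes a large jump, such as three points near the origin followed by two points near (0.9, 0.9), the rule closes the first sub-trail at the jump because the enlarged MBR would raise the per-point cost.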

24 Skyline index for time series data: “Skyline Index for Time Series Data”, Quanzhong Li, Ines Fernando Vega Lopez, Bongki Moon

25 Adaptive Piecewise Constant Approximation (APCA) What is APCA?

26 Adaptive Piecewise Constant Approximation (APCA): Limitation of APCA: internal overlap in MBRs

27 Skyline Bounding Region (SBR): For N time series data objects of length l, the SBR is the 2-dimensional region bounded by a top skyline and a bottom skyline (the pointwise maximum and minimum of the N series).
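As a small illustration of slide 27, the two skylines can be computed as the pointwise maximum and minimum over the N series. The function name `skyline_bounding_region` is hypothetical, and the sketch assumes all series have the same length l.

```python
def skyline_bounding_region(series):
    """Return (top, bottom) skylines for a list of equal-length series."""
    # zip(*series) walks the series column by column, one time step at a time.
    top = [max(column) for column in zip(*series)]
    bottom = [min(column) for column in zip(*series)]
    return top, bottom
```

For the three series `[1, 5, 3]`, `[2, 4, 6]`, and `[0, 7, 2]`, the top skyline is `[2, 7, 6]` and the bottom skyline is `[0, 4, 2]`; every series lies between the two.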

28 Approximate SBR (ASBR): Many approaches are possible, including equal-length constant-valued segments and variable-length constant-valued segments. The ASBR always covers the original SBR.
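A sketch of the equal-length variant from slide 28, assuming the skylines are already available as two lists. Taking the maximum of the top skyline and the minimum of the bottom skyline within each segment is what makes the approximation cover the original SBR. The name `approximate_sbr` is hypothetical, and `num_segments` is assumed to be at most the skyline length so that no segment is empty.

```python
def approximate_sbr(top, bottom, num_segments):
    """Approximate skylines with num_segments constant-valued segments.

    Returns (approx_top, approx_bottom), one value per segment.
    Assumes 1 <= num_segments <= len(top) == len(bottom).
    """
    n = len(top)
    approx_top, approx_bottom = [], []
    for s in range(num_segments):
        start = s * n // num_segments
        end = (s + 1) * n // num_segments
        # The segment value must dominate the skyline it replaces.
        approx_top.append(max(top[start:end]))
        approx_bottom.append(min(bottom[start:end]))
    return approx_top, approx_bottom
```

With `top = [2, 7, 6, 3]`, `bottom = [0, 4, 2, 1]`, and two segments, the result is `([7, 6], [0, 1])`: each segment stores two numbers instead of the full skyline, at the price of a looser bound.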

29 Indexing approximate SBRs: an R-tree-based Skyline index. Internal node entry: an approximate SBR and a pointer to a child node. Leaf node entry: a pointer to the time series data.

30 The End Thank You

