2 General problem Retrieval of time-series similar to a given pattern.
3 Example: Stock charts. [Figures: database of time-series; pattern; retrieval results]
7 Example: Electrocardiogram. [Figures: database of time-series; pattern; retrieval results]
10 Outline: Previous work; Important points; Indexing and retrieval; Empirical results; Conclusions. A brace on the slide marks the contributions.
12 Criteria for retrieval methods (Gunopulos [2000]): work for erratic time-series; accept any pattern; find inexact matches; work when some points are missing; work on streaming data.
14 Previous work: feature choice; similarity metrics; indexing and retrieval.
15 Previous work: Feature choice. Discrete Fourier transforms; alphabets; statistical features; subsets of points.
16 Previous work: Similarity metrics. Euclidean distance; bounding rectangles; envelope count; aggregate similarity.
17 Previous work: Indexing and retrieval. Advanced techniques: B-trees, R-trees, KD-trees, VP-trees, grids. Applied techniques: linear search with compression.
19 Important points: Choose “important” maxima and minima, and discard the other points. [Figure: example of an original series and its compressed series]
23 Definition of important points. Important minimum: $a_m$ is the minimum among $a_i, \ldots, a_j$; $a_i / a_m \ge R$ and $a_j / a_m \ge R$. R is a knob that determines the compression rate.
27 Definition of important points. Important maximum: $a_m$ is the maximum among $a_i, \ldots, a_j$; $a_m / a_i \ge R$ and $a_m / a_j \ge R$. R is a knob that determines the compression rate.
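A minimal Python sketch that checks the two definitions directly, point by point. The function names are illustrative, values are assumed to be positive, and the ratio tests are taken as "at least R". This is a brute-force check of the definition, not the presenters' compression procedure.

```python
def _reaches(a, m, step, is_min, R):
    """Scan from index m in one direction (step is -1 or +1).

    Return True if a point that differs from a[m] by a factor of at
    least R is found before any point that beats a[m] as the extremum.
    """
    k = m + step
    while 0 <= k < len(a):
        if is_min:
            if a[k] < a[m]:
                return False  # a[m] is no longer the minimum of the range
            if a[k] / a[m] >= R:
                return True
        else:
            if a[k] > a[m]:
                return False  # a[m] is no longer the maximum of the range
            if a[m] / a[k] >= R:
                return True
        k += step
    return False  # ran off the end of the series


def is_important_minimum(a, m, R):
    # Both sides must rise by a factor of at least R.
    return _reaches(a, m, -1, True, R) and _reaches(a, m, +1, True, R)


def is_important_maximum(a, m, R):
    # Both sides must fall by a factor of at least R.
    return _reaches(a, m, -1, False, R) and _reaches(a, m, +1, False, R)


series = [10, 12, 9, 6, 8, 13, 11, 12, 7]
minima = [m for m in range(len(series)) if is_important_minimum(series, m, 1.5)]
maxima = [m for m in range(len(series)) if is_important_maximum(series, m, 1.5)]
print(minima, maxima)
```

With R = 1.5, only the dip to 6 and the peak at 13 qualify; a larger R would discard more points.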
28 Compression example. [Figure: original series and compressed series]
32 Compression algorithm: linear time; constant memory; accepts streaming data. For a series with n values, compression time is n milliseconds (300 MHz PC, Visual Basic 6.0).
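One way to get these three properties is a single pass that alternates between tracking a candidate minimum and a candidate maximum. The sketch below is an assumption about how such a pass could look, not the presenters' code: `important_points` is a hypothetical name, values are assumed positive, and the first extremum is only used to establish the direction and is not reported, because it has no qualifying point on its left.

```python
def important_points(stream, R):
    """Yield (index, value, kind) for important minima and maxima.

    One pass over any iterable, with a constant amount of state.
    """
    lo = hi = None   # (index, value) of the current candidate min / max
    mode = None      # None until the direction is known, then "min" or "max"
    for t, x in enumerate(stream):
        if lo is None:
            lo = hi = (t, x)
            continue
        if mode != "max" and x < lo[1]:
            lo = (t, x)
        if mode != "min" and x > hi[1]:
            hi = (t, x)
        if mode != "max" and x / lo[1] >= R:
            # The series rose by a factor of R above the candidate minimum.
            if mode == "min":
                yield (lo[0], lo[1], "min")
            mode, hi = "max", (t, x)
        elif mode != "min" and hi[1] / x >= R:
            # The series fell by a factor of R below the candidate maximum.
            if mode == "max":
                yield (hi[0], hi[1], "max")
            mode, lo = "min", (t, x)


series = [10, 12, 9, 6, 8, 13, 11, 12, 7]
compressed = list(important_points(iter(series), 1.5))
print(compressed)
```

Because the function is a generator over an arbitrary iterable, it can consume values as they arrive and report each important point as soon as the following swing confirms it.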
34 Retrieval: Retrieval of time-series similar to a given pattern. Intuition: (1) find a prominent feature in the pattern; (2) find candidate segments with a similar feature; (3) compare the candidates with the pattern for similarity.
35 Example: Stock charts. [Figures: database of time-series; pattern; retrieval results]
41 Algorithm: (1) identify the prominent leg in the pattern; (2) retrieve similar legs from the database; (3) identify the corresponding candidate segments; (4) for each candidate segment, compute its similarity to the pattern; (5) output the candidates whose similarity is above the threshold.
42 Important details: Use the compressed pattern and compressed sequences in the retrieval process. The prominent feature is the leg with the greatest ratio of its right end to its left end. All legs in the database are indexed by their prominence, using a binary search tree.
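The indexing step can be sketched as follows. This is an illustration under assumptions, not the presenters' implementation: a sorted list searched with `bisect` stands in for the binary search tree, a leg is taken to be the segment between two consecutive points of a compressed series, and `tolerance` is a hypothetical parameter that decides how close two end-point ratios must be to count as similar.

```python
import bisect


def build_leg_index(database):
    """Index every leg of every compressed series by its end-point ratio.

    database maps a series name to its list of compressed values.
    Returns (keys, entries): entries are (ratio, name, position) sorted
    by ratio, and keys holds the ratios alone for binary search.
    """
    entries = sorted(
        (values[p + 1] / values[p], name, p)
        for name, values in database.items()
        for p in range(len(values) - 1)
    )
    keys = [ratio for ratio, _, _ in entries]
    return keys, entries


def prominent_leg(pattern):
    """Return (position, ratio) of the leg with the greatest
    ratio of right end to left end."""
    ratios = [pattern[p + 1] / pattern[p] for p in range(len(pattern) - 1)]
    pos = max(range(len(ratios)), key=ratios.__getitem__)
    return pos, ratios[pos]


def similar_legs(keys, entries, ratio, tolerance):
    """Return indexed legs whose ratio is within a factor of tolerance."""
    first = bisect.bisect_left(keys, ratio / tolerance)
    last = bisect.bisect_right(keys, ratio * tolerance)
    return entries[first:last]


database = {"A": [10, 20, 15, 30], "B": [5, 6, 3, 9]}
keys, entries = build_leg_index(database)
pos, ratio = prominent_leg([4, 3, 6, 5])
candidates = similar_legs(keys, entries, ratio, tolerance=1.1)
print(pos, ratio, candidates)
```

Each returned leg identifies a candidate segment (series name and leg position), which would then be compared with the whole pattern using whichever similarity metric is chosen.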
43 Alternative versions: different prominence definitions; different similarity metrics. The end-point ratio prominence usually gives the best empirical results.
44 Extended legs. [Figure: extended legs; similar sequence]
45 Indexing on extended legs. Advantage: more accurate retrieval. Disadvantage: larger index, more memory. If a compressed sequence has n legs: worst case, $n^2/2$ extended legs; average case, on the order of $n \lg n$ extended legs.
47 Data sets: stock charts (60,000 points); air and sea temperatures (445,000 points); wind speeds (79,000 points); electroencephalograms (17,000 points); electrocardiograms (2,000 points).
49 Patterns: compressed patterns with 4 to 27 legs. [Figure: example patterns]
50 Retrieval time: $0.07 \cdot m \cdot k$ milliseconds, where m is the number of legs in the pattern and k is the number of candidates.
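A small worked example of this estimate. The function name is illustrative, and the inputs are assumptions chosen for the example: a pattern with 10 legs, and 270 candidates, which is what 5% of the 5,400 stock-chart legs would be if the candidate share is measured against the number of indexed legs.

```python
def estimated_retrieval_ms(m, k):
    """Estimated retrieval time in milliseconds for a pattern
    with m legs and k candidate segments."""
    return 0.07 * m * k


legs_in_pattern = 10              # assumed; the patterns have 4 to 27 legs
candidates = 5_400 * 5 // 100     # assumed: 5% of 5,400 indexed legs
print(estimated_retrieval_ms(legs_in_pattern, candidates))
```

Under these assumptions the estimate comes to roughly 189 milliseconds, and it grows linearly in both the pattern length and the number of candidates.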
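51 Retrieval accuracy: Stock charts. [Plots: retrieval accuracy with 5%, 10%, and 20% candidates, for several values of C including 1.1, 2, and 3]
52 Retrieval accuracy: Wind speeds. [Plots: retrieval accuracy with 20% candidates, for several values of C including 1.1]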
53 Retrieval candidate quality. Found matches among ten best, by share of candidates:
   Data set                                 5%   10%   20%
   Stock charts (5,400 legs)                 4     4     7
   Air and sea temperatures (5,500 legs)     4     5     6
   Wind speeds (10,500 legs)                 3     7     9
55 Criteria for retrieval methods (Gunopulos [2000]), revisited: work for erratic time-series; accept any pattern; find inexact matches; work when some points are missing; work on streaming data. Two of the criteria carry a “~” mark on the slide.
61 Main results. Compression: fast compression procedure; preserves similarity. Retrieval: works with compressed data; controlled trade-off between speed and accuracy.