Mining Time Series Data


1 Mining Time Series Data
Carlo Zaniolo, UCLA CS Dept. With slides from: A Tutorial on Indexing and Mining Time Series Data, ICDM '01 (The 2001 IEEE International Conference on Data Mining), Dr. Eamonn Keogh, Computer Science & Engineering Department, University of California, Riverside, Riverside, CA.

2 Outline
Introduction, Motivation
Similarity Measures: properties of distance measures; preprocessing the data; time-warped measures
Indexing Time Series: dimensionality reduction via the Discrete Fourier Transform, Discrete Wavelet Transform, Singular Value Decomposition, Piecewise Linear Approximation, Symbolic Approximation, Piecewise Aggregate Approximation, and Adaptive Piecewise Constant Approximation
Summary, Conclusions

3 What are Time Series?
A time series is a collection of observations made sequentially in time. Note that virtually all the similarity measurement, indexing and dimensionality reduction techniques discussed in this tutorial can be used with other data types.

4 Time Series are Ubiquitous! I
People measure things... the president's approval rating, their blood pressure, the annual rainfall in Riverside, the value of their Yahoo stock, the number of web hits per second... and things change over time. Thus time series occur in virtually every medical, scientific and business domain.

5 Time Series are Ubiquitous! II
A random sample of 4,000 graphics from 15 of the world's newspapers published from 1974 to 1980 found that more than 75% of all graphics were time series (Tufte, 1983).

6 Time Series Similarity
Defining the similarity between two time series is at the heart of most time series data mining applications/tasks: query by content, classification, clustering, and rule discovery. Thus time series similarity will be the primary focus of this tutorial.

7 Why is Working With Time Series so Difficult? Part I
Answer: We are dealing with very large databases. 1 hour of EKG data: 1 Gigabyte. Typical weblog: 5 Gigabytes per week. Space Shuttle database: 158 Gigabytes and growing. MACHO database: 2 Terabytes, updated with 3 Gigabytes per day. Since most of the data lives on disk (or tape), we need a representation of the data we can efficiently manipulate.

8 Why is Working With Time Series so Difficult? Part II
Answer: We are dealing with subjective notions of similarity. The definition of similarity depends on the user, the domain and the task at hand. We need to be able to handle this subjectivity.

9 Why is Working With Time Series so Difficult? Part III
Answer: Miscellaneous data handling problems. Differing data formats. Differing sampling rates. Noise, missing values, etc.

10 Similarity Matching Problem, Flavor 1: Whole Matching
Given a Query Q (template), a reference database C and a distance measure, find the Ci that best matches Q. [In the illustrated example of ten candidate sequences, C6 is the best match.]

11 Similarity Matching Problem, Flavor 2: Subsequence Matching
Given a Query Q (template), a reference database C and a distance measure, find the location of the best matching subsection. Note that we can always convert subsequence matching to whole matching by sliding a window across the long sequence and copying the window contents.

12 After all that background we might have forgotten what we are doing and why we care!
So here is a simple motivator and review... You go to the doctor because of chest pains. Your ECG looks strange... Your doctor wants to search a database to find similar ECGs, in the hope that they will offer clues about your condition... Two questions: How do we define similar? How do we search quickly?

13 Similarity is always subjective (i.e., it depends on the application)
All models are wrong, but some are useful... This slide was taken from: A Practical Time-Series Tutorial with MATLAB, presented at ECML/PKDD 2005 by Michalis Vlachos.

14 Distance Functions
Metric distances are those that satisfy the triangle inequality d(x,z) ≤ d(x,y) + d(y,z); e.g., correlation, Euclidean distance. The triangle inequality lets us prune: assume d(Q,bestMatch) = 35, d(Q,B) = 150, d(A,B) = 20. Then d(Q,A) ≥ d(Q,B) - d(B,A) = 150 - 20 = 130. Since this lower bound already exceeds the best match found so far, we do not need to get A from disk. Non-metric distances include Time Warping and LCSS (longest common subsequence).
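As a tiny illustrative sketch of this pruning (plain Python, with the example's distances hard-coded; the function name is my own):

```python
def can_skip(d_q_best, d_q_b, d_b_a):
    # Triangle inequality: d(Q,A) >= d(Q,B) - d(B,A).
    # If this lower bound already exceeds the best match found so far,
    # A cannot be the nearest neighbor, so we never fetch A from disk.
    return d_q_b - d_b_a > d_q_best

# Values from the example: d(Q,bestMatch)=35, d(Q,B)=150, d(A,B)=20
print(can_skip(35, 150, 20))  # True: the lower bound is 130 > 35
```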

15 Preprocessing the data before distance calculations
If we naively try to measure the distance between two "raw" time series, we may get very unintuitive results. This is because Euclidean distance is very sensitive to some distortions in the data. For most problems these distortions are not meaningful, and thus we can and should remove them. In the next four slides, we discuss the four most common forms of distortion, and how to remove them: offset translation, amplitude scaling, linear trend, and noise.

16 Transformation I: Offset Translation
Operations performed: Q = Q - mean(Q); C = C - mean(C); then compute D(Q,C).

17 Transformation II: Amplitude Scaling
Operations performed: Q = (Q - mean(Q)) / std(Q); C = (C - mean(C)) / std(C); then compute D(Q,C).

18 Transformation III: Linear Trend
Removing a linear trend: fit the best-fitting straight line to the time series, then subtract that line from the time series. Operations performed: 1. remove linear trend; 2. offset translation; 3. amplitude scaling.

19 Transformation IV: Noise
The intuition behind removing noise is this: average each datapoint's value with its neighbors. Operations performed: Q = smooth(Q); C = smooth(C); then compute D(Q,C).
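Below is a minimal numpy sketch of the four transformations, assuming 1-D arrays; the function names are my own, and `smooth` uses a simple moving average, one reasonable choice among several:

```python
import numpy as np

def offset_translate(x):
    # Transformation I: Q = Q - mean(Q)
    return x - np.mean(x)

def amplitude_scale(x):
    # Transformation II: z-normalization (also removes the offset)
    return (x - np.mean(x)) / np.std(x)

def remove_linear_trend(x):
    # Transformation III: fit the best straight line, then subtract it
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def smooth(x, w=5):
    # Transformation IV: average each datapoint with its neighbors
    return np.convolve(x, np.ones(w) / w, mode='same')

def preprocess(x):
    # Remove all four distortions before computing D(Q, C)
    return amplitude_scale(smooth(remove_linear_trend(x)))
```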

20 A Quick Experiment to Demonstrate the Utility of Preprocessing the Data
[Two dendrograms are shown: one clustered using Euclidean distance on the raw data, and one clustered using Euclidean distance on the raw data after removing noise, linear trend, offset translation and amplitude scaling.]

21 Summary of Preprocessing
The "raw" time series may have distortions which we should remove before clustering, classification etc. Of course, sometimes the distortions are the most interesting thing about the data; the above is only a general rule. We should keep these problems in mind as we consider the high-level representations of time series which we will encounter later (Fourier transforms, wavelets etc.), since these representations often allow us to handle distortions in elegant ways.

22 Dynamic Time Warping
Fixed time axis: sequences are aligned "one to one". "Warped" time axis: nonlinear alignments are possible. Note: we will first see the utility of DTW, then see how it is calculated.

23 Utility of Dynamic Time Warping: Data Mining Power-Demand Time Series
Each sequence corresponds to a week's demand for power in a Dutch research facility in 1997 [van Selow 1999]. [Figure: demand for Monday through Sunday; Wednesday was a national holiday.]

24 Hierarchical clustering with Euclidean Distance.
<Group Average Linkage> The two 5-day weeks are correctly grouped. Note however that the three 4-day weeks are not clustered together, and neither are the two 3-day weeks.

25 Hierarchical clustering with Dynamic Time Warping.
<Group Average Linkage> The two 5-day weeks are correctly grouped, the three 4-day weeks are clustered together, and the two 3-day weeks are also clustered together.

26 Dynamic Time Warping (how does it work?)
The intuition is that we copy an element multiple times so as to achieve a better matching.
Euclidean distance: d = 1
T1 = [1, 1, 2, 2]
      |  |  |  |
T2 = [1, 2, 2, 2]
Warping distance: d = 0 (the single 1 in T2 is matched to both 1s in T1)

27 Computing the Dynamic Time Warp Distance I
Note that the input sequences can be of different lengths, |Q| = n and |C| = p. [Figure: Q and C arranged along the sides of an n-by-p search matrix, with a warping path w1, ..., wk passing through cells (i,j).]

28 Computing the Dynamic Time Warp Distance II
Every possible mapping from Q to C can be represented as a warping path in the search matrix. We simply want to find the cheapest one. Although there are exponentially many such paths, we can find the cheapest one in only quadratic time using dynamic programming, via the recurrence
γ(i,j) = d(q_i, c_j) + min{ γ(i-1,j-1), γ(i-1,j), γ(i,j-1) }
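A minimal dynamic-programming sketch of this recurrence in Python. The optional `window` argument (my own addition) implements the warping-path restriction discussed on the next slide, a Sakoe-Chiba style band that reduces the cost from quadratic to O(δn):

```python
import numpy as np

def dtw_distance(q, c, window=None):
    """DTW distance between sequences q (length n) and c (length p)."""
    n, p = len(q), len(c)
    w = max(window, abs(n - p)) if window is not None else max(n, p)
    D = np.full((n + 1, p + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(p, i + w) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2            # d(q_i, c_j)
            D[i, j] = cost + min(D[i - 1, j - 1],        # match
                                 D[i - 1, j],            # q_i repeated
                                 D[i, j - 1])            # c_j repeated
    return np.sqrt(D[n, p])

# Example from slide 26: warping distance 0 despite Euclidean distance 1
print(dtw_distance([1, 1, 2, 2], [1, 2, 2, 2]))  # 0.0
```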

29 Complexity of Time Warping
Time taken to create a hierarchical clustering of the power-demand time series: building the dendrogram took seconds with Euclidean distance, but hours with Dynamic Time Warping. How to speed it up: Approach 1: the complexity is O(n²); we can reduce it to O(δn) simply by restricting the warping path. Approach 2: approximate the time series with some compressed or downsampled representation, and do DTW on the new representation.

30 Fast Approximations to the Dynamic Time Warp Distance
[In the example, exact DTW took 22.7 sec; the approximation took 1.3 sec.] There is strong visual evidence to suggest it works well, and good experimental evidence for the utility of the approach on clustering, classification and query-by-content problems has also been demonstrated.

31 Weighted Distance Measures I
Intuition: For some queries, different parts of the sequence are more important. Weighting features is a well-known technique in the machine learning community for improving classification and the quality of clustering.

32 Relevance Feedback for Time Series
The original query. The weight vector: initially, all weights are the same. Note: in this example we are using a piecewise linear approximation of the data. We will learn more about this representation later.

33 The initial query is executed, and the five best matches are shown (in the dendrogram)
One by one, the 5 best matching sequences will appear, and the user will rank them from very bad (-3) to very good (+3).

34 The new query can be executed.
Based on the user feedback, both the shape and the weight vector of the query are changed, and the new query can be executed. The hope is that the query shape and weights will converge to the optimal query. Among the papers that consider relevance feedback for time series: L. Wu, C. Faloutsos, K. Sycara, T. Payne: FALCON: Feedback Adaptive Loop for Content-Based Retrieval. VLDB 2000.

35 Motivating Example Revisited...
You go to the doctor because of chest pains. Your ECG looks strange... Your doctor wants to search a database to find similar ECGs, in the hope that they will offer clues about your condition... Two questions: How do we define similar? How do we search quickly?

36 Indexing Time Series
We have seen techniques for assessing the similarity of two time series. However, we have not addressed the problem of finding the best match to a query in a large database... We need some way to index the data: a topic extensively discussed in the literature that we will not cover in depth here for lack of time. [Query Q: find shapes like this in this DB.]

37 Compression – Dimensionality Reduction
Project all sequences into a new space, and search this space instead.

38 Compressed Representations
DFT: Agrawal, Faloutsos & Swami, FODO 1993; Faloutsos, Ranganathan & Manolopoulos, SIGMOD 1994
DWT: Chan & Fu, ICDE 1999
SVD: Korn, Jagadish & Faloutsos, SIGMOD 1997
PLA: Morinaka, Yoshikawa, Amagasa & Uemura, PAKDD 2001
PAA: Keogh, Chakrabarti, Pazzani & Mehrotra, KAIS 2000; Yi & Faloutsos, VLDB 2000
APCA: Keogh, Chakrabarti, Pazzani & Mehrotra, SIGMOD 2001

39 Fourier Transforms and Discrete Fourier Transforms
Fourier transforms represent functions as sums of a series of sine and cosine terms of increasing frequency. The Discrete Fourier Transform is obtained by sampling the function at regular intervals; it can be computed efficiently with the FFT.

40 Discrete Fourier Transform I
Basic Idea: Represent the time series as a linear combination of sines and cosines, but keep only the first n/2 coefficients. Why n/2 coefficients? Because each sine wave requires 2 numbers, for the phase (w) and amplitude (A, B). Excellent free Fourier primer: Hagit Shatkay, "The Fourier Transform: A Primer", Technical Report, Department of Computer Science, Brown University, 1995.

41 An Example of a Dimensionality Reduction Technique
The graphic shows a time series C with n = 128 points. The raw data used to produce the graphic is also reproduced as a column of numbers (just the first 30 or so points are shown): 0.4995, 0.5264, 0.5523, 0.5761, 0.5973, 0.6153, 0.6301, 0.6420, 0.6515, 0.6596, 0.6672, 0.6751, 0.6843, 0.6954, 0.7086, 0.7240, 0.7412, 0.7595, 0.7780, 0.7956, 0.8115, 0.8247, 0.8345, 0.8407, 0.8431, 0.8423, 0.8387, ...

42 Dimensionality Reduction (cont.)
We can decompose the data into pure sine waves using the Discrete Fourier Transform (just the first few sine waves are shown). The Fourier coefficients are reproduced as a column of numbers (just the first 30 or so are shown): 1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928, 0.1635, 0.1602, 0.0992, 0.1282, 0.1438, 0.1416, 0.1400, 0.1412, 0.1530, 0.0795, 0.1013, 0.1150, 0.1801, 0.1082, 0.0812, 0.0347, 0.0052, 0.0017, 0.0002, ... Note that at this stage we have not done dimensionality reduction; we have merely changed the representation.

43 An Example of a Dimensionality Reduction Technique III
... however, note that the first few sine waves tend to be the largest (equivalently, the magnitudes of the Fourier coefficients tend to decrease as you move down the column). We can therefore truncate most of the small coefficients with little effect, keeping only the first N = 8 numbers of the n = 128: C' = [1.5698, 1.0485, 0.7160, 0.8406, 0.3709, 0.4670, 0.2667, 0.1928], giving Cratio = 1/16. We have discarded 15/16 of the data.
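A sketch of this truncation using numpy's FFT; keeping four complex coefficients stores eight real numbers, matching N = 8 and Cratio = 1/16 (function names are my own):

```python
import numpy as np

def dft_reduce(x, n_coeffs):
    # Keep only the first n_coeffs complex Fourier coefficients
    return np.fft.rfft(x)[:n_coeffs]

def dft_reconstruct(coeffs, n):
    # Zero out the discarded coefficients and invert the transform
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[:len(coeffs)] = coeffs
    return np.fft.irfft(full, n)

x = np.sin(np.linspace(0, 10, 128)) + 0.1 * np.random.randn(128)
x_approx = dft_reconstruct(dft_reduce(x, 4), 128)  # C', 8 real numbers
```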

44 An Example of a Dimensionality Reduction Technique IV
Instead of taking the first few coefficients, we could take the best (largest-magnitude) coefficients. This can help greatly in terms of approximation quality, but makes indexing hard (impossible?). Note this applies also to wavelets.

45 Compressed Representations
(The roadmap of representations and citations from slide 38, repeated.)

46 Discrete Wavelet Transform I
Basic Idea: Represent the time series as a linear combination of wavelet basis functions, but keep only the first N coefficients. Although there are many different types of wavelets, researchers in time series mining/indexing generally use Haar wavelets. Haar wavelets seem to be as powerful as the other wavelets for most problems and are very easy to code. Excellent free wavelets primer: Stollnitz, E., DeRose, T., & Salesin, D. (1995). Wavelets for Computer Graphics: A Primer. IEEE Computer Graphics and Applications.

47 Haar Wavelet Example: X = {8, 4, 1, 3}
h1 = 4 = mean(8, 4, 1, 3); h2 = 2 = mean(8, 4) - h1; h3 = 2 = (8 - 4)/2; h4 = -1 = (1 - 3)/2. I have converted the raw time series X = {8, 4, 1, 3} into the Haar wavelet representation H = [4, 2, 2, -1]. We can convert the Haar representation back to the raw signal with no loss of information.
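A short sketch that reproduces this example, using the averages-and-differences form of the Haar transform (assumes the length is a power of two; the helper name is my own):

```python
def haar(x):
    # Recursively replace the signal by [averages of pairs | half-differences]
    if len(x) == 1:
        return list(x)
    averages    = [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]
    differences = [(a - b) / 2 for a, b in zip(x[0::2], x[1::2])]
    return haar(averages) + differences

print(haar([8, 4, 1, 3]))  # [4.0, 2.0, 2.0, -1.0], i.e. H = [4, 2, 2, -1]
```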

48 Discrete Wavelet Transform II
We have only considered one type of wavelet; there are many others. Are the other wavelets better for indexing? YES: I. Popivanov, R. Miller. Similarity Search Over Time Series Data Using Wavelets. ICDE 2002. NO: K. Chan and A. Fu. Efficient Time Series Matching by Wavelets. ICDE 1999. I consider this an open question...

49 Discrete Wavelet Transform III
Pros and cons of wavelets as a time series representation.
Pros: good ability to compress stationary signals; fast linear-time algorithms for the DWT exist; able to support some interesting non-Euclidean similarity measures.
Cons: signals must have a length n = 2^(some integer); works best if N = 2^(some integer), otherwise the wavelets approximate the left side of the signal at the expense of the right side; cannot support weighted distance measures.

50 Singular Value Decomposition
Basic Idea: Represent the time series as a linear combination of eigenwaves, but keep only the first N coefficients. SVD is similar to the Fourier and wavelet approaches in that we represent the data in terms of a linear combination of shapes (in this case eigenwaves); SVD differs in that the eigenwaves are data dependent. SVD has been successfully used in the text processing community (where it is known as Latent Semantic Indexing) for many years, but it is computationally expensive. Good free SVD primer: Sonia Leach, Singular Value Decomposition: A Primer.

51 Singular Value Decomposition (cont.)
How do we create the eigenwaves? We have previously seen that we can regard time series as points in a high-dimensional space. We can rotate the axes such that axis 1 is aligned with the direction of maximum variance, axis 2 is aligned with the direction of maximum variance orthogonal to axis 1, etc. Since the first few eigenwaves contain most of the variance of the signal, the rest can be truncated with little loss.
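A numpy sketch of this reduction, assuming the database is a matrix A with one (z-normalized) time series per row; the names are my own:

```python
import numpy as np

def svd_reduce(A, N):
    # Rows of A are time series; the rows of Vt are the eigenwaves,
    # ordered by how much variance of the data they capture.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    eigenwaves = Vt[:N]            # keep the top-N eigenwaves
    coeffs = A @ eigenwaves.T      # N coefficients per series
    return coeffs, eigenwaves

A = np.random.randn(50, 128)       # toy database: 50 series of length 128
coeffs, eigenwaves = svd_reduce(A, 8)
A_approx = coeffs @ eigenwaves     # rank-8 reconstruction of the database
```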

52 Piecewise Linear Approximation I
Basic Idea: Represent the time series as a sequence of straight lines. If our budget is N numbers: if the lines are connected, we can represent N/2 lines, since each segment needs only its length and left_height (the right_height can be inferred by looking at the next segment); if the lines are disconnected, we can represent only N/3 lines, since each segment needs its length, left_height and right_height. Personal experience on dozens of datasets suggests disconnected is better. Also, only the disconnected version allows a lower-bounding Euclidean approximation.

53 How do we obtain the Piecewise Linear Approximation?
The optimal solution is O(n²N), which is too slow for data mining. A vast body of work on faster heuristic solutions to the problem can be classified into the following classes: Top-Down, O(n²N); Bottom-Up, O(n(1/Cratio)); Sliding Window, O(n(1/Cratio)); Other (genetic algorithms, randomized algorithms, B-spline wavelets, MDL etc.). A recent extensive empirical evaluation of all approaches suggests that Bottom-Up is the best approach overall.
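A minimal sketch of the Bottom-Up approach, under my own simplifying choices: a least-squares line per segment, squared residuals as the error measure, and a user-supplied merge threshold as the stopping rule:

```python
import numpy as np

def fit_cost(t, x):
    # Squared residual of the least-squares line through points (t, x)
    if len(x) <= 2:
        return 0.0
    slope, intercept = np.polyfit(t, x, 1)
    return float(np.sum((x - (slope * t + intercept)) ** 2))

def bottom_up_pla(x, max_error):
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Start with the finest possible segmentation: 2-point segments
    segments = [(i, min(i + 2, n)) for i in range(0, n, 2)]

    def merge_cost(k):
        lo, hi = segments[k][0], segments[k + 1][1]
        return fit_cost(np.arange(lo, hi), x[lo:hi])

    costs = [merge_cost(k) for k in range(len(segments) - 1)]
    while costs and min(costs) < max_error:
        k = int(np.argmin(costs))
        segments[k] = (segments[k][0], segments[k + 1][1])  # merge k, k+1
        del segments[k + 1]
        del costs[k]
        if k < len(costs):
            costs[k] = merge_cost(k)          # new right-neighbor cost
        if k > 0:
            costs[k - 1] = merge_cost(k - 1)  # left-neighbor cost changed
    return segments  # list of (start, end) index pairs
```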

54 Piecewise Linear Approximation III
Pros and cons of PLA as a time series representation.
Pros: good ability to compress natural signals; fast linear-time algorithms for PLA exist; able to support some interesting non-Euclidean similarity measures, including weighted measures, relevance feedback and fuzzy queries; already widely accepted in some communities (e.g., biomedical).
Con: not (currently) indexable by any data structure (but does allow fast sequential scanning).

55 Symbolic Approximation
Basic Idea: Convert the time series into an alphabet of discrete symbols (e.g., X' = C U U C D C U D, with key C = Constant, U = Up, D = Down), and use string indexing techniques to manage the data. Potentially an interesting idea, but all the papers thus far are very ad hoc.
Pros and cons of symbolic approximation as a time series representation.
Pro: potentially, we could take advantage of a wealth of techniques from the very mature field of string processing.
Cons: there is no known technique to allow the support of Euclidean queries; it is not clear how we should discretize the time series (discretize the values, the slope, shapes? how big an alphabet? etc.).

56 Piecewise Aggregate Approximation I
Basic Idea: Represent the time series of n points as a sequence of box basis functions, each box of the same length w (simple case: assume n is a multiple of w). The reduced representation is the vector of segment means (x̄1, ..., x̄N), where x̄i is the mean value of the points in the i-th segment. Given the reduced-dimensionality representation we can calculate an approximate Euclidean distance, which is a lower bound on the true distance. Independently introduced by Keogh, Chakrabarti, Pazzani & Mehrotra, KAIS (2000) and Byoung-Kee Yi & Christos Faloutsos, VLDB (2000).
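A numpy sketch of PAA and its lower-bounding distance (assuming N divides n; each squared segment difference is weighted by the segment length n/N, the standard weighting):

```python
import numpy as np

def paa(x, N):
    # Mean of each of the N equal-length segments
    return np.asarray(x, dtype=float).reshape(N, -1).mean(axis=1)

def paa_lower_bound(q_bar, c_bar, n):
    # Lower-bounds the true Euclidean distance between the raw series
    seg_len = n / len(q_bar)
    return np.sqrt(seg_len * np.sum((q_bar - c_bar) ** 2))

q, c = np.random.randn(128), np.random.randn(128)
d_lb = paa_lower_bound(paa(q, 8), paa(c, 8), 128)
assert d_lb <= np.linalg.norm(q - c)  # the bound never overestimates
```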

57 Piecewise Aggregate Approximation II
Pros and cons of PAA as a time series representation.
Pros: extremely fast to calculate; as efficient as the other approaches (empirically); supports queries of arbitrary lengths; can support any Minkowski metric; supports non-Euclidean measures; supports weighted Euclidean distance; simple and intuitive!
Con: if visualized directly, it looks aesthetically unpleasing.

58 Adaptive Piecewise Constant Approximation I
Basic Idea: Generalize PAA to allow the piecewise constant segments to have arbitrary lengths. Note that we now need 2 coefficients to represent each segment: its value and its length (written as pairs <cv1,cr1>, <cv2,cr2>, ...). The intuition is this: many signals have little detail in some places and high detail in other places, and APCA can adaptively fit itself to the data, achieving a better approximation. [Example on raw electrocardiogram data: reconstruction error 2.61 for the adaptive representation (APCA), versus 3.27 for the Haar wavelet and 3.11 for the DFT.]

59 Adaptive Piecewise Constant Approximation II
The high quality of the APCA has been noted by many researchers. However, it was believed that the representation could not be indexed, because some coefficients represent values and some represent lengths. Nevertheless, an indexing method was discovered! (SIGMOD 2001 best paper award.) Unfortunately, it is non-trivial to understand and implement...

60 SAX: Symbol Mapping
Each average value from the PAA vector is replaced by a symbol from an alphabet; an alphabet size of 5 to 8 is recommended (e.g., a,b,c,d,e up to a,b,c,d,e,f,g,h). Given an average value, we need a symbol. This is achieved by using the normal distribution from statistics: because our input series is normalized, we can use the normal distribution as the data model. We divide the area under the normal distribution into 'a' equal-sized areas, where a is the alphabet size; each such area is bounded by breakpoints.

61 SAX Computation – in pictures
[Figure, taken from Eamonn's tutorial on SAX: the series is reduced with PAA and each segment mean is mapped through the breakpoints to a symbol (a, b or c), yielding the SAX word baabccbc.]
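A sketch of the whole SAX pipeline: z-normalize, apply PAA, then map each segment mean through Gaussian breakpoints. It assumes scipy is available and that the word length divides the series length:

```python
import numpy as np
from scipy.stats import norm

def sax(x, word_len, alphabet_size):
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()                  # normalize first
    means = x.reshape(word_len, -1).mean(axis=1)  # PAA step
    # Breakpoints cut the standard normal into equal-probability areas
    breakpoints = norm.ppf(np.arange(1, alphabet_size) / alphabet_size)
    symbols = np.searchsorted(breakpoints, means)
    return ''.join(chr(ord('a') + int(s)) for s in symbols)

# e.g. with word_len=8 and alphabet_size=3, a series maps to an
# 8-symbol word of the form shown on the slide, such as 'baabccbc'
```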

62 Conclusion This is just an introduction, with many unavoidable omissions: There are dozens of papers that offer new distance measures. Hidden Markov models do have a sound basis, but don’t scale well. Time series analysis remains a hot area of research and the most recent papers have not been discussed here.

63 References
Eamonn J. Keogh: Indexing and Mining Time Series Data. Encyclopedia of GIS, 2008.
Jin Shieh, Eamonn J. Keogh: iSAX: Indexing and Mining Terabyte Sized Time Series. KDD 2008.
Jessica Lin, Michail Vlachos, Eamonn J. Keogh, Dimitrios Gunopulos: Iterative Incremental Clustering of Time Series. EDBT 2004.
B. Chiu, E. Keogh, S. Lonardi: Probabilistic Discovery of Time Series Motifs. 9th ACM SIGKDD International Conference, 2003.

64 Discrete Fourier Transform II
Pros and cons of DFT as a time series representation.
Pros: good ability to compress most natural signals; fast, off-the-shelf DFT algorithms exist (the FFT is O(n log n)); (weakly) able to support time-warped queries.
Cons: difficult to deal with sequences of different lengths; cannot support weighted distance measures.

