# Choosing Distance Measures for Mining Time Series Data


It's about time: Choosing Distance Measures for Mining Time Series Data

Spencer Schnier, 2/22/11

Major Time Series Data Mining Tasks
- Indexing
- Clustering
- Classification
- Prediction
- Summarization
- Anomaly Detection
- Segmentation

Indexing and clustering make explicit use of a distance measure; the others make implicit use of one (Ratanamahatana et al., 2010).

Popular Distance Measures
- Lock-step measures (one-to-one)
  - Minkowski distance
    - L1 norm (Manhattan distance)
    - L2 norm (Euclidean distance)
    - L∞ norm (supremum distance)
- Elastic measures (one-to-many/one-to-none)
  - Dynamic Time Warping (DTW)
  - Edit-distance-based measures
    - Longest Common SubSequence (LCSS)
    - Edit Distance on Real Sequence (EDR)
- Threshold-based measures
  - Threshold query based similarity search (TQuEST)
- Pattern-based measures
  - Spatial Assembling Distance (SpADe)

Here "distance measure" and "similarity measure" are used interchangeably. A lock-step measure compares the i-th point of one series to the i-th point of the other (one-to-one). Elastic measures allow one-to-many mappings (DTW) or one-to-many/one-to-none mappings (LCSS) (Ding et al., 2008).

Minkowski Distance
- h = 1: Manhattan (city block, L1 norm) distance. E.g., the Hamming distance: the number of bits that are different between two binary vectors.
- h = 2: Euclidean (L2 norm) distance.
- h → ∞: "supremum" (Lmax norm, L∞ norm) distance. This is the maximum difference between any component (attribute) of the vectors.

Borrowed from CS 412 Chp2 slides.
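The three special cases follow from the general Minkowski form $d(x, y) = \left(\sum_i |x_i - y_i|^h\right)^{1/h}$. The Python sketch below is one minimal way to compute it; the function name `minkowski` and the sample vectors are illustrative choices, not taken from the slides.

```python
import math

def minkowski(x, y, h):
    """Minkowski distance of order h between two equal-length sequences.

    h = 1 gives Manhattan, h = 2 gives Euclidean, and h = math.inf gives
    the supremum distance. (Illustrative helper, not from the slides.)
    """
    if len(x) != len(y):
        raise ValueError("lock-step measures need equal-length inputs")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(h):
        # Limit case: the largest per-component difference.
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)

# Made-up two-component vectors for illustration.
x = [1, 2]
y = [3, 5]
manhattan = minkowski(x, y, 1)         # |1-3| + |2-5|
euclidean = minkowski(x, y, 2)         # sqrt(2^2 + 3^2)
supremum = minkowski(x, y, math.inf)   # max(2, 3)
```

Because the comparison is point-by-point, the function rejects inputs of different lengths, which is the defining restriction of lock-step measures.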

Minkowski Distance Examples
[Figure: dissimilarity matrices under Manhattan (L1), Euclidean (L2), and supremum (L∞) distance.] Borrowed from CS 412 Chp2 slides.

What’s wrong with Euclidean Distance?
- Similar sequences may be shifted and have different scales.
  - Normalize the time series before measuring the distance between them: $x_i' = \frac{x_i - \mu}{\sigma}$, where μ and σ are the mean and standard deviation of the series.
- What if a sequence is stretched or compressed along the time axis?

(Goldin and Kanellakis, 1995)
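As a rough illustration of why normalization helps, the Python sketch below z-normalizes two series with the same shape but different offset and scale before comparing them. The helper names and sample data are made up for this example, and using the population standard deviation (dividing by n) is an assumption; the slide does not say which estimator is intended.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    # Assumption: population standard deviation (divide by n).
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        # A constant series has no shape to preserve.
        return [0.0] * n
    return [(v - mean) / std for v in series]

def euclidean(x, y):
    """Lock-step Euclidean distance between equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

a = [1.0, 2.0, 3.0, 2.0, 1.0]
b = [10.0 + 5.0 * v for v in a]  # same shape, shifted and scaled

raw = euclidean(a, b)                                    # large
normalized = euclidean(z_normalize(a), z_normalize(b))   # close to zero
```

Normalization removes offset and amplitude differences only; it does nothing for series that are stretched or compressed in time.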

Dynamic Time Warping
- Sequences are similar but accelerate differently along the time axis.
- Enforcing a temporal constraint δ on the warping window size improves computational efficiency and accuracy.
- Application: speech recognition

(Berndt and Clifford, 1996)
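A minimal Python sketch of DTW with a warping window is given below. It is the textbook dynamic-programming formulation, not code from the cited paper. The `window` parameter plays the role of δ, and using the absolute difference as the point-wise cost is an assumption (squared difference is also common).

```python
import math

def dtw(x, y, window=None):
    """DTW distance with an optional band constraint |i - j| <= window."""
    n, m = len(x), len(y)
    if window is None:
        window = max(n, m)  # unconstrained warping
    # The band must be at least wide enough to reach the final cell.
    window = max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = abs(x[i - 1] - y[j - 1])  # assumed point-wise cost
            # Extend the cheapest of the three admissible predecessor paths.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Same bump, but it starts one step later in `slow` (made-up data).
slow = [0, 0, 1, 2, 1, 0]
fast = [0, 1, 2, 1, 0, 0]

unconstrained = dtw(slow, fast)        # warping absorbs the shift
banded = dtw(slow, fast, window=1)     # a narrow band is enough here
lock_step = dtw(slow, fast, window=0)  # degenerates to the L1 distance
```

With `window=0` only the diagonal is allowed, so the result equals the lock-step Manhattan distance; widening the band lets the measure absorb the time shift while still visiting far fewer cells than the full matrix.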

Longest Common Subsequence Similarity
- Matches two sequences while allowing some elements to remain unmatched.
- Example: for C = {1,2,3,4,5,1,7} and Q = {2,5,4,5,3,1,8}, the longest common subsequence is {2,4,5,1}.
- This is an edit-distance-based measure.
- Two points from two time series are considered to match if their distance is less than ε. Tolerance: $c(1-\varepsilon) < q < c(1+\varepsilon)$
- Dissimilarity is the minimum number of elements that must be removed from and inserted into C to transform C into Q, normalized by the total length: $LCSS(C,Q) = \frac{m+n-2l}{m+n}$, where m and n are the lengths of C and Q and l is the length of their longest common subsequence.
- Specifying a matching window can improve accuracy.
- Application: bioinformatics

[Figure: dynamic-programming table of LCS lengths for C and Q; the final entry is 4.]

(Vlachos et al., 2002)
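As a worked instance of the dissimilarity formula, take the example sequences: both have length 7 (m = n = 7) and their longest common subsequence {2,4,5,1} has length l = 4, so

```latex
\mathrm{LCSS}(C, Q) = \frac{m + n - 2l}{m + n}
                    = \frac{7 + 7 - 2 \cdot 4}{7 + 7}
                    = \frac{6}{14} \approx 0.43
```

The numerator counts the three elements of C outside the common subsequence plus the three elements of Q that would have to be inserted.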

Longest Common Subsequence Similarity
- Input: sequences C[1..m] and Q[1..n].
- Compute the LCS between C[1..i] and Q[1..j] for all 1 ≤ i ≤ m and 1 ≤ j ≤ n, and store its length in L[i,j].
- L[m,n] is the length of the LCS.

Pseudocode (with the base cases L[i,0] = L[0,j] = 0 made explicit):

    for i := 0..m: L[i,0] := 0
    for j := 0..n: L[0,j] := 0
    for i := 1..m
        for j := 1..n
            if C[i] = Q[j]
                L[i,j] := L[i-1,j-1] + 1
            else
                L[i,j] := max(L[i,j-1], L[i-1,j])
    return L[m,n]

[Figure: dynamic-programming table L for C = {1,2,3,4,5,1,7} and Q = {2,5,4,5,3,1,8}; L[m,n] = 4, corresponding to {2,4,5,1}.]

(Vlachos et al., 2002)
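The recurrence translates fairly directly into Python. The sketch below is one possible rendering: the function names are illustrative, and the `eps` tolerance uses an absolute difference with `<=` so that `eps=0` reproduces the exact-match pseudocode. That is an assumption rather than the exact matching rule of Vlachos et al.

```python
def lcss_length(c, q, eps=0.0):
    """Length of the longest common subsequence of c and q.

    Two points match when they differ by at most eps (assumed rule).
    """
    m, n = len(c), len(q)
    # Row 0 and column 0 stay 0: an empty prefix has no common elements.
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if abs(c[i - 1] - q[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    return L[m][n]

def lcss_dissimilarity(c, q, eps=0.0):
    """Normalized dissimilarity (m + n - 2l) / (m + n)."""
    m, n = len(c), len(q)
    l = lcss_length(c, q, eps)
    return (m + n - 2 * l) / (m + n)

# The example sequences from the slides.
C = [1, 2, 3, 4, 5, 1, 7]
Q = [2, 5, 4, 5, 3, 1, 8]
length = lcss_length(C, Q)
dissimilarity = lcss_dissimilarity(C, Q)
```

On the slide example this yields a common-subsequence length of 4, matching {2,4,5,1}. The table costs O(mn) time and space; a matching window would restrict the inner loop to a band around the diagonal.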

Edit Distance on Real Sequence
- Similar to LCSS.
- Uses a threshold parameter ε to quantize the distance between a pair of points to 0 or 1.
- Seeks the minimum number of edit operations needed to change one sequence into another.
- Assigns penalties to the unmatched segments according to the lengths of the gaps.
- Application: trajectories of moving objects

(Chen et al., 2005)
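A hedged sketch of the EDR recurrence for one-dimensional series is shown below. It follows the usual edit-distance dynamic program with a substitution cost of 0 when two points are within ε and 1 otherwise; the function name, the `<=` comparison, and the test data are assumptions made for illustration.

```python
def edr(r, s, eps):
    """Edit Distance on Real sequence for 1-D series (illustrative sketch)."""
    m, n = len(r), len(s)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Matching against an empty sequence costs one edit per element,
    # so a gap is penalized in proportion to its length.
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Quantize the point distance to 0 (match) or 1 (mismatch).
            subcost = 0 if abs(r[i - 1] - s[j - 1]) <= eps else 1
            D[i][j] = min(
                D[i - 1][j - 1] + subcost,  # match or substitute
                D[i - 1][j] + 1,            # skip a point of r
                D[i][j - 1] + 1,            # skip a point of s
            )
    return D[m][n]

identical = edr([1, 2, 3], [1, 2, 3], eps=0.1)
one_outlier = edr([1, 2, 3], [1.05, 2.0, 9.0], eps=0.1)
one_gap = edr([1, 2, 3, 4], [1, 2, 4], eps=0.1)
```

Because every mismatch costs exactly 1 regardless of magnitude, a single large outlier contributes no more than a small one.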

TQuEST (Aßfalg et al., 2006)
- Uses a threshold parameter τ to transform a time series into a sequence of threshold-crossing intervals (the points within each interval have a value greater than the given τ).
- Each interval is treated as a 2D point: x = starting time, y = ending time.
- The similarity between two time series is then defined as the Minkowski sum of the two sequences of time interval points.

SpADe (Chen et al., 2007)
- A pattern-based similarity measure for time series.
- Finds matching segments, called patterns, by allowing shifting and scaling.
- Then finds the most similar set of matching patterns.
- Disadvantage: requires many parameters (temporal and amplitude scale factors, pattern length, sliding step size, etc.).
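The first step of TQuEST, turning a series into threshold-crossing intervals, can be sketched as follows. This covers only the transformation, not the distance itself. Using integer sample indices as times and taking the last above-threshold sample as the interval end are simplifying assumptions, and the function name is hypothetical.

```python
def threshold_intervals(series, tau):
    """Return (start, end) index pairs of maximal runs with value > tau."""
    intervals = []
    start = None  # index where the current run began, if any
    for t, value in enumerate(series):
        if value > tau and start is None:
            start = t                         # series rises above tau
        elif value <= tau and start is not None:
            intervals.append((start, t - 1))  # series fell back to tau or below
            start = None
    if start is not None:
        # The series ends while still above the threshold.
        intervals.append((start, len(series) - 1))
    return intervals

# Made-up series with two runs above tau = 1.
points = threshold_intervals([0, 2, 3, 0, 0, 4, 5, 6], tau=1)
```

Each returned pair can then be treated as a 2D point (x = starting time, y = ending time), as described for TQuEST.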

Comparison of Distance Measures
(Ding et al., 2008)

Comparison of Distance Measures
- The accuracy of elastic measures converges with that of Euclidean distance as the training set grows.
- On small data sets, elastic measures can be significantly more accurate than lock-step measures.
- Constraining the warping window size for elastic measures can reduce the computation cost and increase accuracy.
- The accuracy of edit-distance-based similarity measures is very close to that of DTW; only EDR is potentially slightly better than DTW.
- The accuracy of several newer similarity measures, such as TQuEST and SpADe, is in general inferior to that of elastic measures.
- To improve the accuracy of a similarity measure, get more training data. If you can't get more data, trying the other measures might help; however, be careful to avoid overfitting.

Measure groups in the comparison: lock-step (e.g., L1-norm, Euclidean, and DISSIM), elastic (e.g., DTW, LCSS, EDR, and ERP), and edit-distance-based (e.g., LCSS, EDR, and ERP).

(Ding et al., 2008)

ELKI 0.2
- Software for visualization and performance evaluation of distance measures for time series (Achtert et al., 2009)

Research Questions
- Is distance measure performance related to some intrinsic properties of the data set?
- If so, can those properties be used to identify the most appropriate distance measure?

References
- Achtert, E., T. Bernecker, H.-P. Kriegel, E. Schubert, and A. Zimek. "ELKI in Time: ELKI 0.2 for the Performance Evaluation of Distance Measures for Time Series." SSTD, 2009.
- Aßfalg, J., H.-P. Kriegel, P. Kröger, P. Kunath, A. Pryakhin, and M. Renz. "Similarity search on time series based on threshold queries." EDBT, 2006.
- Berndt, D., and J. Clifford. "Finding Patterns in Time Series: A Dynamic Programming Approach." Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, Menlo Park, CA, 1996.
- Chen, L., M. Ozsu, and V. Oria. "Robust and fast similarity search for moving object trajectories." SIGMOD, 2005.
- Chen, Y., M. Nascimento, B. Ooi, and A. Tung. "SpADe: On Shape-based Pattern Detection in Streaming Time Series." ICDE, 2007.
- Ding, H., G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh. "Querying and Mining of Time Series Data: Experimental Comparison of Representations and Distance Measures." VLDB, 2008.
- Goldin, D., and P. Kanellakis. "On Similarity Queries for Time-Series Data: Constraint Specification and Implementation." Proceedings of the 1st International Conference on the Principles and Practice of Constraint Programming, 1995.
- Ratanamahatana, C., J. Lin, D. Gunopulos, E. Keogh, M. Vlachos, and G. Das. "Mining Time Series Data." Data Mining and Knowledge Discovery Handbook, Part 6, 2010.
- Vlachos, M., D. Gunopulos, and G. Kollios. "Discovering similar multidimensional trajectories." ICDE, 2002.