Time Series I.

Time Series I

Syllabus
Nov 4: Introduction to data mining
Nov 5: Association Rules, Clustering and Data Representation
Nov 17: Exercise session 1 (Homework 1 due)
Nov 19: Classification
Nov 24, 26: Similarity Matching and Model Evaluation
Dec 1: Exercise session 2 (Homework 2 due)
Dec 3: Combining Models
Dec 8, 10: Time Series Analysis
Dec 15: Exercise session 3 (Homework 3 due)
Dec 17: Ranking
Jan 13: Review
Jan 14: EXAM
Feb 23: Re-EXAM

Why deal with sequential data? Because all data is sequential: all data items arrive in the data store in some order. Examples: transaction data, documents and words. In some (or many) cases the order does not matter; in many others the order is of interest.

Time-series data: example Financial time series

Questions What is a time series? How do we compare time series data? What is the structure of sequential data? Can we represent this structure compactly and accurately?

Time Series A sequence of observations: X = (x1, x2, x3, x4, …, xn), where each xi is a real number, e.g., (2.0, 2.4, 4.8, 5.6, 6.3, 5.6, 4.4, 4.5, 5.8, 7.5). [Figure: the series plotted with time on the horizontal axis and value on the vertical axis.]

Time Series Databases A time series is an ordered set of real numbers, representing the measurements of a real variable at equal time intervals. Examples: stock prices, volume of sales over time, daily temperature readings, ECG data. A time series database is a large collection of time series.

Time Series Problems The similarity problem: given X = x1, x2, …, xn and Y = y1, y2, …, yn, define and compute Sim(X, Y) or Dist(X, Y), e.g., do stocks X and Y have similar movements? The retrieval problem: efficiently retrieve similar time series, i.e., indexing for similarity queries.

Types of queries: whole match vs. subsequence match; range query vs. nearest neighbor query.

Examples Find companies with similar stock prices over a time interval Find products with similar sell cycles Cluster users with similar credit card utilization Find similar subsequences in DNA sequences Find scenes in video streams

Distance function: chosen by an expert (e.g., Euclidean distance). [Figure: price in $ plotted against day, from day 1 to day 365.]

Problems Define the similarity (or distance) function, and find an efficient algorithm to retrieve similar time series from a database (faster than a sequential scan). The similarity function depends on the application.

Metric Distances What properties should a similarity distance have to allow (easy) indexing? Symmetry: D(A,B) = D(B,A). Constancy of self-similarity: D(A,A) = 0. Positivity: D(A,B) ≥ 0. Triangle inequality: D(A,B) ≤ D(A,C) + D(B,C). Sometimes the distance function that best fits an application is not a metric; then indexing becomes interesting and challenging.

Euclidean Distance Each time series is a point in n-dimensional space, and the distance is the pair-wise point distance between X = x1, x2, …, xn and Y = y1, y2, …, yn.

Euclidean model The Euclidean distance between two time series Q = {q1, q2, …, qn} and X = {x1, x2, …, xn}. [Figure: a query Q of n datapoints compared against four database series, with distances 0.98, 0.07, 0.21, 0.43 and corresponding ranks 4, 1, 2, 3.]
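As a minimal sketch (not code from the course), the Euclidean distance D(Q, X) = sqrt(Σ (qi − xi)^2) and the ranking of a toy database can be written as follows; the numeric values are made up for illustration only.

```python
import math

def euclidean_distance(q, x):
    """Euclidean distance D(Q, X) = sqrt(sum_i (q_i - x_i)^2) for equal-length series."""
    if len(q) != len(x):
        raise ValueError("Euclidean distance requires equal-length series")
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))

# Toy example (made-up values): rank a small database by distance to the query.
query = [2.0, 2.4, 4.8, 5.6, 6.3]
database = [[2.1, 2.5, 4.9, 5.5, 6.1],
            [7.0, 6.0, 5.0, 4.0, 3.0],
            [2.0, 2.0, 2.0, 2.0, 2.0]]
ranking = sorted(database, key=lambda s: euclidean_distance(query, s))
```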

Advantages Easy to compute: O(n). Allows scalable solutions to other problems, such as indexing, clustering, etc.

Disadvantages Query and target lengths must be equal. Cannot tolerate noise: time shifts, sequences out of phase, scaling in the y-axis.

Limitations of Euclidean Distance With Euclidean distance, sequences Q and C are aligned “one to one”; this is a very brittle distance measure. What we need is a method that allows elastic shifting on the time axis to accommodate sequences that are similar but out of phase: with a “warped” time axis, nonlinear alignments are possible.

Dynamic Time Warping [Berndt, Clifford, 1994] DTW allows sequences to be stretched along the time axis: insert ‘stutters’ into a sequence, then compute the (Euclidean) distance. [Figure: an original sequence and a version with stutters inserted.]

Computation DTW is computed by dynamic programming. Given two sequences P = {p1, p2, …, pi} and Q = {q1, q2, …, qj}, each alignment step either inserts a p-stutter, inserts a q-stutter, or uses no stutter.

DTW: Dynamic time warping (1/2) Each cell c = (i, j) is a pair of indices whose corresponding squared difference, (xi − yj)^2, is computed and included in the sum for the distance. The Euclidean path takes i = j always and ignores off-diagonal cells. [Figure: the warping matrix between X and Y, with cells accumulating (x1 − y1)^2, then (x1 − y1)^2 + (x2 − y2)^2, and so on along the diagonal.]

DTW: Dynamic time warping (2/2) DTW allows any path. Examine all paths using standard dynamic programming to fill in the table; the top-right cell contains the final result. Cell (i, j) is reached from (i-1, j), (i, j-1), or (i-1, j-1); moving along only one axis shrinks x / stretches y (or stretches x / shrinks y). The gray cells correspond to prefix subsequences, and only these are used in the recursive definition.
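The recurrence described on this slide can be written as a short dynamic-programming sketch (a textbook implementation, not the course's own code):

```python
def dtw_distance(x, y):
    """Full DTW table: cell (i, j) holds the cost of the best warping path
    aligning the prefixes x[:i] and y[:j]; each cell extends a path coming
    from (i-1, j), (i, j-1) or (i-1, j-1), as in the slide's recurrence."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # repeat y[j-1] (stretch y)
                                 D[i][j - 1],      # repeat x[i-1] (stretch x)
                                 D[i - 1][j - 1])  # advance both (no stutter)
    return D[n][m]  # accumulated squared cost; take sqrt for a Euclidean-style value
```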

Properties of a DTW legal path A warping path W is a set of grid cells in the time warping matrix; DTW finds the optimum warping path W, i.e., the path with the smallest matching score (the best alignment). A legal path satisfies: Boundary conditions: W1 = (1,1) and WK = (n,m). Continuity: given Wk = (a, b), then Wk-1 = (c, d), where a − c ≤ 1 and b − d ≤ 1. Monotonicity: Wk-1 = (c, d), where a − c ≥ 0 and b − d ≥ 0.

Properties of DTW The boundary, continuity, and monotonicity conditions above follow the classical formulations: C. S. Myers and L. R. Rabiner, "A comparative study of several dynamic time-warping algorithms for connected word recognition," The Bell System Technical Journal, 60(7):1389-1409, Sept. 1981. H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43-49, 1978.

Advantages Query and target lengths need not be equal. Can tolerate noise: time shifts, sequences out of phase, scaling in the y-axis.

Disadvantages Computational complexity: O(nm). May not be able to handle some types of noise. It is not a metric (the triangle inequality does not hold).

Global Constraints (Sakoe-Chiba Band, Itakura Parallelogram) Global constraints slightly speed up the calculations and prevent pathological warpings, where a small section of one series maps to a much longer one. A global constraint limits the indices of the warping path wk = (i, j)k such that j − r ≤ i ≤ j + r, where r defines the allowed range of warping for a given point in a sequence.

Complexity of DTW The basic implementation is O(n^2), where n is the length of the sequences: we have to solve the problem for each (i, j) pair. If a warping window is specified, the cost drops to O(nr), since we only solve for the (i, j) pairs where |i − j| ≤ r.
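As a sketch of the O(nr) observation, a Sakoe-Chiba-banded DTW that only fills the cells with |i − j| ≤ r might look like this (a simplified illustration, not the course's code):

```python
def dtw_band(x, y, r):
    """DTW restricted to a Sakoe-Chiba band of half-width r.

    Only cells with |i - j| <= r are filled, giving roughly O(n*r) work
    for sequences of comparable length instead of O(n^2)."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```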

Longest Common Subsequence Measures (Allowing for Gaps in Sequences) [Figure: two sequences aligned with a gap skipped.]

Longest Common Subsequence (LCSS) LCSS is more resilient to noise than DTW. Disadvantages of DTW: all points are matched, outliers can distort the distance, and one-to-many mappings are allowed. Advantages of LCSS: outlying values are not matched, the distance/similarity is distorted less, and the majority of the noise can be ignored.

Longest Common Subsequence A similar dynamic programming solution as DTW, but now we measure similarity, not distance; it can also be expressed as a distance.
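A possible sketch of the LCSS dynamic program for real-valued series; the matching threshold epsilon is an assumption, since the slides do not give the exact recurrence.

```python
def lcss_similarity(x, y, epsilon=1.0):
    """Longest common subsequence for real-valued series (a sketch).

    Two points are treated as a match when they differ by at most `epsilon`
    (this threshold is an assumption). Returns the length of the longest
    matching subsequence; a distance can be derived from it, e.g.
    1 - L / min(len(x), len(y))."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= epsilon:
                L[i][j] = L[i - 1][j - 1] + 1          # points match: extend
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])  # skip one point (gap)
    return L[n][m]
```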

Similarity Retrieval Range query: find all time series X where D(Q, X) ≤ ε. Nearest neighbor query: find the k most similar time series to Q. A method to answer the above queries: linear scan. A better approach: GEMINI [next time].

Lower Bounding – NN search We can speed up similarity search by using a lower bounding function: D is the distance measure and LB is a lower bounding function such that LB(Q, X) ≤ D(Q, X). Intuition: use the cheap lower bounding calculation as often as possible, and do the expensive, full calculation only when absolutely necessary. We assume a database of time series DB = {X1, X2, …, XN}. 1-NN search using LB: set best = ∞; for each Xi, if LB(Xi, Q) < best, then if D(Xi, Q) < best, set best = D(Xi, Q).
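The slide's pseudocode translates directly into the sketch below; `lb` and `dist` are placeholders for whatever lower bound and exact distance are in use (the pruning is safe precisely because LB never exceeds D).

```python
def nn_search(query, database, lb, dist):
    """1-NN search with lower-bound pruning; requires LB(Q, X) <= D(Q, X)."""
    best_dist = float("inf")
    best_series = None
    for x in database:
        if lb(query, x) < best_dist:      # cheap test first
            d = dist(query, x)            # expensive exact distance only if needed
            if d < best_dist:
                best_dist, best_series = d, x
    return best_series, best_dist
```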

Lower Bounding – Range query The same lower bounding function answers a range query with radius ε: for each Xi, if LB(Xi, Q) ≤ ε, then if D(Xi, Q) < ε, report Xi.
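A matching sketch of the range query, again with placeholder `lb` and `dist` functions:

```python
def range_query(query, database, eps, lb, dist):
    """Report every series whose exact distance to the query is below eps,
    using the lower bound to skip most exact computations."""
    results = []
    for x in database:
        if lb(query, x) <= eps:       # LB > eps implies D > eps, so skipping is safe
            if dist(query, x) < eps:
                results.append(x)
    return results
```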

Problems How to define lower bounds for different distance measures? How to extract the features? How to define the feature space? Options: Fourier transform, wavelet transform, averages of segments (histograms or APCA), Chebyshev polynomials, …, your favorite curve approximation.

Some Lower Bounds on DTW LB_Kim: each sequence is represented by 4 features, <First, Last, Min, Max>; LB_Kim is the maximum squared difference of the corresponding features. LB_Yi: based on max(Q) and min(Q), i.e., the points of the candidate that fall above the query's maximum or below its minimum.
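A sketch of LB_Kim as described on the slide (four features, maximum squared difference); squared differences are used here to match a DTW that accumulates squared costs.

```python
def lb_kim(q, c):
    """LB_Kim: lower-bound DTW using four features of each sequence.

    Each sequence is summarized by (first, last, min, max); the bound is the
    largest squared difference among the corresponding features."""
    features_q = (q[0], q[-1], min(q), max(q))
    features_c = (c[0], c[-1], min(c), max(c))
    return max((fq - fc) ** 2 for fq, fc in zip(features_q, features_c))
```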

LB_Keogh [Keogh 2004] Build an envelope around the query Q: Ui = max(qi-r : qi+r) and Li = min(qi-r : qi+r), where the global constraint (Sakoe-Chiba band or Itakura parallelogram) determines the allowed range r at each point. [Figure: the upper envelope U and lower envelope L around Q, shown for both band types.]

LB_Keogh [Figure: LB_Keogh illustrated for a candidate C against the envelope (U, L) of Q, under the Sakoe-Chiba band and the Itakura parallelogram.]
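A sketch of LB_Keogh under the envelope definition on the previous slide (Ui = max(qi-r : qi+r), Li = min(qi-r : qi+r)); equal-length query and candidate are assumed, and the result is returned as a squared cost so that it lower-bounds the squared-cost DTW above (take square roots of both otherwise).

```python
def lb_keogh(q, c, r):
    """LB_Keogh: squared deviation of candidate c from the band envelope of query q.

    Only points of c that escape the envelope [L_i, U_i] contribute to the bound."""
    total = 0.0
    n = len(q)
    for i in range(n):
        lo = max(0, i - r)
        hi = min(n, i + r + 1)
        U = max(q[lo:hi])
        L = min(q[lo:hi])
        ci = c[i]
        if ci > U:
            total += (ci - U) ** 2
        elif ci < L:
            total += (ci - L) ** 2
    return total
```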

Tightness of LB 0 ≤ T ≤ 1; the larger the better. The tightness is proportional to the length of the gray lines used in the illustrations, shown for LB_Kim, LB_Yi, LB_Keogh (Sakoe-Chiba band), and LB_Keogh (Itakura parallelogram).
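Tightness is commonly measured as the ratio T = LB(Q, C) / D(Q, C) averaged over many pairs; that ratio definition is an assumption here, since the slide only states 0 ≤ T ≤ 1.

```python
def average_tightness(pairs, lb, dist):
    """Average T = LB(Q, C) / D(Q, C) over (query, candidate) pairs; T is in [0, 1]
    because the lower bound never exceeds the true distance."""
    ratios = []
    for q, c in pairs:
        d = dist(q, c)
        if d > 0:                      # skip identical pairs to avoid division by zero
            ratios.append(lb(q, c) / d)
    return sum(ratios) / len(ratios) if ratios else 1.0
```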

Lower Bounding We want to find the 1-NN to our query data series, Q. [Figure: a bar chart of distances to Q, built up over the following slides.]

Lower Bounding We compute the distance to the first data series in our dataset, D(S1,Q); this becomes the best so far (BSF).

Lower Bounding We compute LB(S2,Q) and it is greater than the BSF; we can safely prune S2, since D(S2,Q) ≥ LB(S2,Q).

Lower Bounding We compute LB(S3,Q) and it is smaller than the BSF; we have to compute D(S3,Q) ≥ LB(S3,Q), since it may still be smaller than the BSF.

Lower Bounding It turns out that D(S3,Q) ≥ BSF, so we can safely prune S3.

Lower Bounding We compute LB(S4,Q) and it is smaller than the BSF; we have to compute D(S4,Q) ≥ LB(S4,Q), since it may still be smaller than the BSF.

Lower Bounding It turns out that D(S4,Q) < BSF, so S4 becomes the new BSF.

Lower Bounding S1 cannot be the 1-NN, because S4 is closer to Q.

How about subsequence matching? DTW is defined for full-sequence matching: all points of the query sequence are matched to all points of the target sequence. Subsequence matching: the query is matched to a part (subsequence) of the target sequence. [Figure: a short query sequence against a long data stream.]

Subsequence Matching X: long sequence; Q: short sequence. What subsequence of X is the best match for Q?

J-Position Subsequence Match X: long sequence; Q: short sequence. What subsequence of X is the best match for Q, such that the match ends at position j?

J-Position Subsequence Match Naïve solution: run DTW separately on every possible subsequence of X ending at position j. Too costly!

Why not ‘naive’? The naive approach computes a time warping matrix starting from every database frame, to capture the optimal subsequence starting from t = tstart; this needs O(n) matrices and O(nm) time per frame. [Figure: query Q of length m against X = x1 … xn, with a candidate subsequence from xtstart to xtend.]

Key Idea: Star-padding Use only a single matrix (the naïve solution uses n matrices). Prefix Q with ‘*’, which always gives zero distance: instead of Q = (q1, q2, …, qm), compute distances with Q' = (*, q1, q2, …, qm). This takes O(m) time and space per frame (the naïve approach requires O(nm)).

SPRING: dynamic programming Initialization: insert a “dummy” state ‘*’ at the beginning of the query Q; ‘*’ matches every value in the database sequence X with score 0.

SPRING: dynamic programming Computation: perform the dynamic programming computation in a similar manner as standard DTW, reaching cell (i, j) from (i-1, j), (i, j-1), or (i-1, j-1).

SPRING: dynamic programming For each (i, j): compute the j-position subsequence match of the first i items of Q to X[s:j], i.e., Q[1:i] is matched with X[s:j] for some starting position s.

SPRING: dynamic programming The top row of the table contains the j-position subsequence match of the full query Q for every j; the final answer is the best among these j-position matches.

Subsequence vs. full matching [Figure: the star-padded matrix of query Q against database sequence X, contrasted with the full-matching DTW matrix between sequences p1 … pN and q1 … qM.]

Computational complexity Assume that the database is one very long sequence (concatenate all sequences into one). The cost is O(|Q| · |X|), but the computation can be carried out efficiently by looking at only two adjacent columns of the matrix at a time.

STWM (Subsequence Time Warping Matrix) Problem of the star-padding alone: we lose the information about the starting frame of the match, so after the scan we cannot tell which subsequence is the optimal one. Each element of the STWM therefore stores both the distance value of the subsequence and its starting position. The combination of star-padding and the STWM efficiently identifies the optimal subsequence in a streaming fashion.
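A simplified, single-best-match sketch in the spirit of SPRING: the star-padding row lets a warping path start anywhere for free, and a parallel table of start positions plays the role of the STWM. The function name and the reporting of only one global best match are assumptions; the actual algorithm reports matches under a threshold in a streaming fashion.

```python
def spring_best_match(query, stream):
    """Best subsequence match of `query` anywhere in `stream` (squared DTW cost).

    Row 0 is the '*' state with cost 0 at every stream position, so a warping
    path may start at any position for free. Start positions are propagated
    alongside the costs so the optimal subsequence itself can be recovered.
    Only two columns are kept, giving O(len(query)) space per stream value."""
    m = len(query)
    INF = float("inf")
    prev = [0.0] + [INF] * m        # column 0: nothing matched yet
    prev_start = [0] * (m + 1)      # placeholder starts; never selected as winners
    best = (INF, None, None)        # (distance, start, end), positions 1-based

    for j, xj in enumerate(stream, start=1):
        cur = [0.0] + [INF] * m     # the '*' state costs 0 at every position
        cur_start = [j] + [0] * m   # leaving '*' at column j starts a match at j
        for i in range(1, m + 1):
            cost = (query[i - 1] - xj) ** 2
            # predecessors: repeat query point, repeat stream point, advance both
            choices = [(prev[i], prev_start[i]),
                       (cur[i - 1], cur_start[i - 1]),
                       (prev[i - 1], prev_start[i - 1])]
            d, s = min(choices, key=lambda t: t[0])
            cur[i] = cost + d
            cur_start[i] = s
        if cur[m] < best[0]:        # top row: the j-position subsequence match
            best = (cur[m], cur_start[m], j)
        prev, prev_start = cur, cur_start

    return best                     # stream[start-1:end] is the matched subsequence
```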

Up next… Time series summarizations: Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Piecewise Aggregate Approximation (PAA), Symbolic ApproXimation (SAX). Time series classification: lazy learners, shapelets.