Mining Frequent Spatio-temporal Sequential Patterns Authors: Huiping Cao, Nikos Mamoulis, David W. Cheung Department of Computer Science, University of Hong Kong Accepted to ICDM 2005 Abhinaya Sinha Vijay Gandhi CSCI 8715 Date: 4/3/06
Outline Introduction Problem Definition Contribution Approach Experimental Evaluation Assumptions Rewrite today
Spatio-Temporal Pattern Spatio-temporal sequence is a list of time-sampled locations. (x1,y1,t1), (x2,y2,t2),…,(xn,yn,tn) A pattern is a subset of the sequence A pattern is frequent if its support >= threshold value Bus routes across 3 runs
Motivation Predict trajectory of an object Detect frequent routes followed by an object Ecology CHIMPS project at UofM (http://www.discoverchimps.org) Tracking chimpanzee movements and analyzing their behavior Location based services Where a user may go next and thus, provide customized services
Problem Definition Input: Given spatio-temporal sequence S and support threshold min_sup Output: Frequent spatio-temporal patterns Objective: Lower computation time and space Constraints: Use of a device such as GPS to sample locations
Contributions Model for spatio-temporal sequential pattern mining Algorithm for extracting frequent singular spatial pattern elements a.k.a. spatial regions Algorithm for combining single regions to patterns of longer length
Methodology Convert spatio-temporal sequence to set of line segments i.e. abstract the trajectory Detect frequent, singular regions Combine singular patterns to longer sequences
Modeling the problem Aim: abstract (approximate) the trajectory Input: original spatio-temporal sequence Output: sequence segments Need locations are not repeated exactly Derived line segments are used for defining spatial regions Compress original data and decrease effort required in mining Uses Douglas-Pecker (DP) algorithm for performing this step
Concepts in Modeling Spatio-temporal Segment Representative line segment of a segment Similarity between line segments | l1.angle – l2.angle | <= θ | l1.len – l2.len | <= length factor * max ( l1.len, l2.len) Closeness between line segments Distance (points belonging to line1, line2) <= ε Mean line segment of a set of line segments Spatial pattern element/ spatial region Sides determined by mean line segment and average orthogonal distance Spatio-temporal sequential pattern Ordered sequence of spatial pattern elements Support of a pattern number of sequences supporting it
Discovering frequent singular patterns Input: All segments representing the spatio-temporal sequence, support threshold Output: All segments have been assigned to regions or been found as outliers. This step constructs frequent single spatial regions Uses similarity and closeness to assign segments to regions
Deriving longer patterns Goal is to derive longer patterns from the singular frequent patterns obtained in previous step. Input is series SR of spatial regions. Output: Frequent patterns. Two algorithms Level-wise mining Mining using substring tree
Example Input: r1-r2-r1-r2-r3 minsup = 2 Output candidates r1-r2 #2; r2-r1 #1; r2-r3 #1 r1 #2 r2 #2; r3 #1 Frequent Patterns: r1-r2; r1; r2
Level-wise pattern mining Apriori Algorithm e.g. input: r1-r2-r1-r2 r1(2) r2(2) r1-r2(2) r2-r1(1) Output: r1; r2; r1-r2
Contribution: Improvements Connectivity constraint Closeness property Input: r1-r2-r1-r3-r1-r2 minsup: 2 r1(3) r2(2) r3(1) r1-r2 r2-r1 r1-r3 r3-r1 r2r3 r3r2
Mining using substring tree Tree structure with each node is pattern element The substring tree is used to help in counting long substrings Depth first search gives substring e.g. r1-r2-r1-r3-r1-r2
Validation Methodology Computer simulations Candidate Algorithms: Level-wise, Grid I, Grid II, Substring tree Evaluation done on Real-world dataset Synthetic dataset
Experimental Evaluation Substring takes least time
Experimental Evaluation Extracted patterns Comparison with grid-based approach
Assumptions Spatio-temporal sequence is a list of (x,y) values Trajectory is abstracted by line segments Closeness and similarity of line segments is defined using 2-D data constructs Hence is only applicable to 2-D data
Rewrite Today Experimental evaluation on more real-life datasets Account for noise in input data (inaccuracy of GPS data) Explore techniques that abstract the spatial trajectory with higher accuracy
Any questions??