1 Incremental Clustering for Trajectories. Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han. Univ. of Illinois at Urbana-Champaign. DASFAA Conference 2010, April, Tsukuba, Japan.
2 Outline: Motivation; Introducing previous work TRACLUS; Trajectory Clustering using Micro- and Macro-clustering; Experiment; Conclusion; Future Work.
4 Tracking by GPS/Sensor is becoming more common: vehicles, animals, hurricanes.
5 Moving object data accumulates fast: A taxi tracking system tracks 5,000 taxis in San Francisco. Location information is received from each taxi every minute. After a day, 7.2 million points are collected. After a week, 50.4 million points are collected...
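The figures on this slide follow from simple arithmetic. A quick check, assuming every taxi reports exactly once per minute around the clock (the constant names are illustrative):

```python
TAXIS = 5_000
REPORTS_PER_TAXI_PER_DAY = 24 * 60  # one location report per minute

points_per_day = TAXIS * REPORTS_PER_TAXI_PER_DAY
points_per_week = 7 * points_per_day

print(points_per_day)   # 7200000
print(points_per_week)  # 50400000
```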
6 Online monitoring demand: Trajectory clusters have applications in discovering common hurricane paths, monitoring hot traffic paths, and analyzing animals’ movements. As data is updated over time, the clustering result needs to be monitored online. However, recomputing the trajectory clusters from scratch every time is inefficient.
7 New data will only affect local shifts: The key observation is that new data causes only local shifts in the clustering result. (Figure: clusters at Snapshot Time 1 and Snapshot Time 2.)
9 TRACLUS: trajectory clustering. Clustering trajectories as a whole cannot detect similar portions of the trajectories (i.e., common sub-trajectories). Example: if we cluster TR1~TR5 as a whole, we cannot discover the common behavior, since they move in totally different directions. (Figure: trajectories TR1 to TR5 sharing a common sub-trajectory.) Reference: Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, “Trajectory Clustering: A Partition-and-Group Framework”, in Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.
10 The Partition-and-Group Framework: Consists of two phases, partitioning and grouping. (Figure: a set of trajectories TR1 to TR5 is (1) partitioned into a set of line segments, which are then (2) grouped into a cluster with a representative trajectory.) Note: a representative trajectory is a common sub-trajectory.
11 Partition: Identify the points where the behavior of a trajectory changes rapidly; such points are called characteristic points. A trajectory is partitioned at every characteristic point. A line segment between consecutive characteristic points is called a trajectory partition. (Figure legend: characteristic point, trajectory partition.)
12 Group: Group line segments based on density. In the example (MinLns = 3): L1, L2, L3, L4, and L5 are core line segments; L2 (or L3) is directly density-reachable from L1; L6 is density-reachable from L1, but not vice versa; L1, L4, and L5 are all density-connected. (Figure: line segments L1 to L6.)
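The density notions on this slide (core line segments, density-reachability) can be illustrated with a DBSCAN-style pass over line segments. This is a minimal sketch, not the TRACLUS implementation: the function names, the `eps` radius parameter, and especially the midpoint distance are assumptions made for illustration. TRACLUS uses its own line-segment distance and further checks on the resulting clusters, which are omitted here.

```python
from math import hypot


def midpoint_distance(a, b):
    """Stand-in distance (assumption): Euclidean distance between midpoints."""
    ax, ay = (a[0][0] + a[1][0]) / 2, (a[0][1] + a[1][1]) / 2
    bx, by = (b[0][0] + b[1][0]) / 2, (b[0][1] + b[1][1]) / 2
    return hypot(ax - bx, ay - by)


def group_segments(segments, eps, min_lns, dist=midpoint_distance):
    """Return one cluster id per segment; -1 marks noise."""
    n = len(segments)
    # eps-neighborhood of every segment (a segment is its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(segments[i], segments[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_lns:
            labels[i] = -1  # not a core segment; provisionally noise
            continue
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # density-reachable border segment
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_lns:  # core segment: keep expanding
                queue.extend(neighbors[j])
        cluster_id += 1
    return labels


segments = [
    ((0, 0), (2, 0)), ((0, 0.5), (2, 0.5)), ((0, 1), (2, 1)),
    ((0, 1.5), (2, 1.5)), ((10, 10), (12, 10)),
]
labels = group_segments(segments, eps=0.6, min_lns=3)
print(labels)  # [0, 0, 0, 0, -1]
```

In this toy input the two middle segments are core segments, the two outer ones are reached from them as border segments, and the distant segment is left as noise.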
14 TCMM Framework: Trajectories are received over time. Each trajectory is partitioned into line segments. A micro-cluster stores a small group of close line segments. A macro-cluster is a cluster of micro-clusters.
15 Data Preprocess: Finding the optimal partitioning translates to finding the best hypothesis using the MDL (minimum description length) principle. H: a set of trajectory partitions; D: a trajectory. L(H): the sum of the lengths of all trajectory partitions. L(D|H): the sum of the differences between a trajectory and the set of its trajectory partitions. L(H) measures conciseness; L(D|H) measures preciseness.
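To make the trade-off between L(H) and L(D|H) concrete, the sketch below scores two candidate partitionings of an L-shaped trajectory. The slide only describes the two terms informally, so the details here are assumptions modeled on the TRACLUS formulation: costs are measured in bits via log2, the "difference" is split into a perpendicular and an angle component, and values below 1 are clamped so they cost 0 bits. All function names are illustrative.

```python
from math import hypot, log2


def _point_line_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    length = hypot(b[0] - a[0], b[1] - a[1])
    if length == 0:
        return hypot(p[0] - a[0], p[1] - a[1])
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / length


def perpendicular_distance(a, b, c, d):
    """Combine the distances of c and d from line ab (Lehmer mean of order 2)."""
    l1 = _point_line_distance(c, a, b)
    l2 = _point_line_distance(d, a, b)
    return 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)


def angle_distance(a, b, c, d):
    """|cd| * sin(angle) below 90 degrees, |cd| otherwise."""
    ux, uy = b[0] - a[0], b[1] - a[1]
    vx, vy = d[0] - c[0], d[1] - c[1]
    lu, lv = hypot(ux, uy), hypot(vx, vy)
    if lu == 0 or lv == 0:
        return 0.0
    if ux * vx + uy * vy <= 0:  # angle of 90 degrees or more
        return lv
    return abs(ux * vy - uy * vx) / lu  # equals lv * sin(angle)


def _bits(x):
    return log2(max(x, 1.0))  # clamp (assumption): tiny values cost 0 bits


def mdl_cost(points, cuts):
    """cuts: sorted indices of characteristic points, including both ends."""
    l_h = l_d_h = 0.0
    for s, e in zip(cuts, cuts[1:]):
        a, b = points[s], points[e]
        l_h += _bits(hypot(b[0] - a[0], b[1] - a[1]))  # conciseness
        for k in range(s, e):  # preciseness
            c, d = points[k], points[k + 1]
            l_d_h += _bits(perpendicular_distance(a, b, c, d))
            l_d_h += _bits(angle_distance(a, b, c, d))
    return l_h + l_d_h


trajectory = [(0, 0), (10, 0), (20, 0), (20, 10), (20, 20)]
cost_no_cut = mdl_cost(trajectory, [0, 4])
cost_corner_cut = mdl_cost(trajectory, [0, 2, 4])
print(cost_corner_cut < cost_no_cut)  # True
```

Cutting at the corner adds a second partition to L(H) but removes the deviation counted in L(D|H), so it has the lower total cost in this example.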
16 Micro-Cluster Definition: A micro-cluster maintains a fine-granularity clustering. Each micro-cluster holds and summarizes the information of local partitioned trajectories. A micro-cluster for a set of directed line segments is defined as a tuple consisting of: the number of line segments; the linear sums of the line segments’ center points, angles, and lengths; and the squared sums of the line segments’ center points, angles, and lengths.
17 Distance between Micro-Clusters: Each micro-cluster has a representative line segment. The distance between two micro-clusters can be defined as the distance between their representative line segments.
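A micro-cluster summary and its representative line segment can be sketched as a small data class. The formulas on these slides did not survive extraction, so this is an assumed reading: the representative segment is taken to have the mean center, mean angle, and mean length of the summarized segments. The class and field names are hypothetical, and plain averaging of angles ignores wrap-around at ±π.

```python
from dataclasses import dataclass
from math import atan2, cos, hypot, sin


@dataclass
class MicroCluster:
    n: int    # number of line segments
    ls: list  # linear sums of [center x, center y, angle, length]
    ss: list  # squared sums of the same four features

    @staticmethod
    def features(start, end):
        """Center, direction angle, and length of a directed line segment."""
        return (
            (start[0] + end[0]) / 2,
            (start[1] + end[1]) / 2,
            atan2(end[1] - start[1], end[0] - start[0]),
            hypot(end[0] - start[0], end[1] - start[1]),
        )

    @classmethod
    def from_segment(cls, start, end):
        f = cls.features(start, end)
        return cls(1, list(f), [v * v for v in f])

    def add(self, start, end):
        f = self.features(start, end)
        self.n += 1
        self.ls = [a + v for a, v in zip(self.ls, f)]
        self.ss = [a + v * v for a, v in zip(self.ss, f)]

    def representative(self):
        """Segment with the mean center, mean angle, and mean length."""
        cx, cy, angle, length = (v / self.n for v in self.ls)
        dx, dy = cos(angle) * length / 2, sin(angle) * length / 2
        return (cx - dx, cy - dy), (cx + dx, cy + dy)


mc = MicroCluster.from_segment((0, 0), (2, 0))
mc.add((0, 2), (2, 2))
rep = mc.representative()
print(rep)  # ((0.0, 1.0), (2.0, 1.0))
```

Because only counts and sums are stored, the representative segment can be recomputed at any time without keeping the individual line segments.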
18 Creating and updating Micro-Clusters: When a new line segment is received, find the closest micro-cluster. If the distance between the new line segment and its closest micro-cluster is less than a distance threshold, add the new line segment to this micro-cluster and update the micro-cluster. If not, create a new micro-cluster that contains only this line segment: its linear sums are the center, angle, and length of this line segment, and its squared sums are the squares of the center, angle, and length of this line segment.
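The update rule on this slide can be sketched as follows. The threshold is called `d_max` here on the assumption that it is the parameter named in the sensitivity experiment; the dictionary layout, the function names, and the center-only distance are simplifications for illustration and stand in for the distance between the segment and the representative line segment.

```python
from math import atan2, hypot


def features(start, end):
    """(center x, center y, angle, length) of a directed line segment."""
    return (
        (start[0] + end[0]) / 2,
        (start[1] + end[1]) / 2,
        atan2(end[1] - start[1], end[0] - start[0]),
        hypot(end[0] - start[0], end[1] - start[1]),
    )


def center_distance(cluster, feat):
    """Stand-in distance (assumption): new center vs. the cluster's mean center."""
    mx = cluster["ls"][0] / cluster["n"]
    my = cluster["ls"][1] / cluster["n"]
    return hypot(mx - feat[0], my - feat[1])


def insert_segment(clusters, start, end, d_max, dist=center_distance):
    """Add the segment to its closest micro-cluster, or open a new one."""
    feat = features(start, end)
    closest = min(clusters, key=lambda c: dist(c, feat), default=None)
    if closest is not None and dist(closest, feat) < d_max:
        closest["n"] += 1
        closest["ls"] = [a + f for a, f in zip(closest["ls"], feat)]
        closest["ss"] = [a + f * f for a, f in zip(closest["ss"], feat)]
        return closest
    new_cluster = {"n": 1, "ls": list(feat), "ss": [f * f for f in feat]}
    clusters.append(new_cluster)
    return new_cluster


clusters = []
insert_segment(clusters, (0, 0), (2, 0), d_max=1.0)      # opens cluster 0
insert_segment(clusters, (0, 0.5), (2, 0.5), d_max=1.0)  # joins cluster 0
insert_segment(clusters, (10, 10), (12, 10), d_max=1.0)  # opens cluster 1
print([c["n"] for c in clusters])  # [2, 1]
```

The linear scan over all micro-clusters in `min` is the cost that grows with the number of micro-clusters.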
19 Merging Micro-Clusters: Why merge micro-clusters? If the number of micro-clusters is large, it is time-consuming to (a) find the closest micro-cluster when a new line segment is received and (b) do macro-clustering over the micro-clusters. Memory might also be insufficient to store all the micro-clusters. Merging close micro-clusters saves storage space and computation time. “Closeness” can be simply defined as the distance between two micro-clusters. However, this does not consider the “tightness” of a micro-cluster.
20 Merging Micro-Clusters (cont.): We prefer to merge loose micro-clusters rather than tight ones, to better preserve the “tightness” of micro-clusters. More information is lost when merging two tight micro-clusters.
21 Merging Micro-Clusters (cont.): Introducing the “extent” of a micro-cluster. Extent defines the tightness of a micro-cluster in terms of center, angle, and length.
22 Merging Micro-Clusters (cont.): Distance between micro-clusters with extent, defined per component: center distance, angle distance, and length distance.
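The extent formulas on these slides were lost in extraction, so the following is only one plausible reading, stated as an assumption: the extent of a component is its standard deviation, recovered from the stored count, linear sum, and squared sum, and the distance with extent is the gap between the two means reduced by both extents and floored at zero. The sketch handles a single scalar component (for example, length); the function names are hypothetical.

```python
from math import sqrt


def extent(n, ls, ss):
    """Standard deviation of one scalar component, from its summary sums."""
    mean = ls / n
    return sqrt(max(ss / n - mean * mean, 0.0))


def distance_with_extent(a, b):
    """a, b: (n, ls, ss) summaries of the same component in two micro-clusters."""
    gap = abs(a[1] / a[0] - b[1] / b[0])
    return max(gap - extent(*a) - extent(*b), 0.0)


def merge(a, b):
    """Summaries are additive, so merging is component-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


tight = (2, 20.0, 200.0)       # lengths 10 and 10: extent 0
other_tight = (2, 12.0, 72.0)  # lengths 6 and 6: extent 0
loose = (2, 12.0, 80.0)        # lengths 4 and 8: extent 2

print(distance_with_extent(tight, other_tight))  # 4.0
print(distance_with_extent(tight, loose))        # 2.0
print(merge(tight, loose))                       # (4, 32.0, 280.0)
```

Under this reading, two micro-clusters with the same mean gap look closer when one of them is loose, so loose micro-clusters are merged first, as the preceding slides prefer.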
24 Macro-Clustering: Macro-clustering is invoked only when the user requests it. It is performed on the representative line segments of the micro-clusters, similar to the group step in the TRACLUS framework.
26 Experiment: Real taxi data from San Francisco; trajectories collected over one week, 100,000 points in total.
27 Experiment (cont.): Effectiveness. Quality is measured by SSQ (sum of squared distance), computed as the average, over all line segments, of the squared distance from a line segment to the centroid of its macro-cluster. TCMM reaches quality similar to that of TRACLUS.
28 Experiment (cont.): Efficiency. TCMM is much faster than TRACLUS.
29 Experiment (cont.): Sensitivity to the parameter d_max. When d_max is larger, the quality is lower but the efficiency is better.
31 Conclusion: We address the problem of incrementally clustering trajectories. We propose the TCMM (Trajectory Clustering based on Micro- and Macro-clustering) framework. The definition of extent is proposed to better capture the “tightness” of micro-clusters. Experiments show that TCMM achieves quality similar to that of TRACLUS while being much faster.
33 Future Work: Efficiency: use an index to find the closest micro-cluster; this is not easy because our distance function is non-metric. Parameter insensitivity: make our algorithm less sensitive to parameter values. Temporal information: take temporal information into account during clustering. Other applications: incrementally discover outliers and patterns.