Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han Univ. of Illinois at Urbana-Champaign DASFAA Conference 2010 April, Tsukuba, Japan Incremental Clustering.

1 Zhenhui Li, Jae-Gil Lee, Xiaolei Li, Jiawei Han Univ. of Illinois at Urbana-Champaign DASFAA Conference 2010 April, Tsukuba, Japan Incremental Clustering for Trajectories

2 Outline: Motivation; Introducing previous work: TRACLUS; Trajectory Clustering using Micro- and Macro-clustering; Experiment; Conclusion; Future Work

3 Outline (current section: Motivation)

4 Tracking by GPS/sensors is becoming more common, e.g., for hurricanes, animals, and vehicles.

5 Moving-object data accumulates fast. A taxi tracking system tracks 5,000 taxis in San Francisco and receives location information from each taxi every minute. After a day, 7.2 million points are collected; after a week, 50.4 million points are collected...
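
The volumes on this slide follow directly from the reporting rate (5,000 taxis, one report per taxi per minute). A quick check:

```python
# Back-of-the-envelope data volume for the taxi example:
# 5,000 taxis, one location report per taxi per minute.
TAXIS = 5_000
REPORTS_PER_DAY = 60 * 24  # one report per minute, 1,440 per day

points_per_day = TAXIS * REPORTS_PER_DAY
points_per_week = points_per_day * 7

print(f"{points_per_day:,} points per day")    # 7,200,000 points per day
print(f"{points_per_week:,} points per week")  # 50,400,000 points per week
```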

6 Online monitoring demand. Trajectory clusters have applications in discovering common hurricane paths, monitoring hot traffic paths, and analyzing animal movement. As data is updated over time, the clustering result needs to be monitored online. However, recomputing the trajectory clusters from scratch every time is inefficient.

7 New data only causes local shifts. The key observation is that new data only causes local shifts in the clustering result. [Figure: clustering snapshots at Time 1 and Time 2]

8 Outline (current section: Introducing previous work, TRACLUS)

9 TRACLUS: trajectory clustering. Clustering trajectories as a whole cannot detect similar portions of the trajectories (i.e., common sub-trajectories). Example: if we cluster TR1 to TR5 as a whole, we cannot discover their common behavior, since they move in totally different directions. Reference: Jae-Gil Lee, Jiawei Han, and Kyu-Young Whang, "Trajectory Clustering: A Partition-and-Group Framework," in Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007. [Figure: trajectories TR1 to TR5 sharing a common sub-trajectory]

10 The Partition-and-Group Framework. The framework consists of two phases: (1) partition, which turns a set of trajectories (TR1 to TR5 in the figure) into a set of line segments, and (2) group, which groups the line segments into a cluster summarized by a representative trajectory. Note: a representative trajectory is a common sub-trajectory.

11 Partition. Identify the points where the behavior of a trajectory changes rapidly; such points are called characteristic points. A trajectory is partitioned at every characteristic point, and the line segment between two consecutive characteristic points is called a trajectory partition.
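
Once the characteristic points are known, producing the trajectory partitions is a simple pairing of consecutive points. A minimal sketch, assuming the characteristic-point indices are already given in increasing order and include the first and last point; `partition_at` is a hypothetical name:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def partition_at(points: List[Point], characteristic: List[int]) -> List[Segment]:
    """Split a trajectory into trajectory partitions.

    `characteristic` holds indices of characteristic points (assumed sorted and
    including the first and last point). Each pair of consecutive characteristic
    points becomes one line segment; the points in between are dropped.
    """
    return [(points[a], points[b]) for a, b in zip(characteristic, characteristic[1:])]

traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
print(partition_at(traj, [0, 2, 4]))
# [((0.0, 0.0), (2.0, 0.0)), ((2.0, 0.0), (2.0, 2.0))]
```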

12 Group. Line segments are grouped based on density. In the example (MinLns = 3): L1, L2, L3, L4, and L5 are core line segments; L2 (or L3) is directly density-reachable from L1; L6 is density-reachable from L1, but not vice versa; and L1, L4, and L5 are all density-connected. [Figure: line segments L1 to L6]
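
The core / density-reachable / density-connected vocabulary is that of DBSCAN applied to line segments. A minimal DBSCAN-style sketch, assuming a placeholder midpoint distance in place of the TRACLUS segment distance (not given on this slide), an assumed neighborhood radius `eps`, and `min_lns` standing in for MinLns; all function names are hypothetical:

```python
import math
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def midpoint_distance(s: Segment, t: Segment) -> float:
    """Placeholder distance (NOT the TRACLUS segment distance): distance between midpoints."""
    (a, b), (c, d) = s, t
    return math.dist(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2),
                     ((c[0] + d[0]) / 2, (c[1] + d[1]) / 2))

def group_segments(segs: List[Segment], eps: float, min_lns: int,
                   dist: Callable[[Segment, Segment], float] = midpoint_distance) -> List[int]:
    """DBSCAN-style grouping of line segments; returns one label per segment (-1 = noise)."""
    n = len(segs)
    # eps-neighborhood of each segment (includes the segment itself)
    neighbors = [[j for j in range(n) if dist(segs[i], segs[j]) <= eps] for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_lns:      # not a core line segment
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # border segment first seen as noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_lns:  # core: expand (density-reachable)
                queue.extend(neighbors[j])
        cluster += 1
    return labels
```

With TRACLUS's own distance function substituted for `dist`, the same control flow applies unchanged.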

13 Outline (current section: Trajectory Clustering using Micro- and Macro-clustering)

14 TCMM Framework. Trajectories are received over time, and each trajectory is partitioned into line segments. A micro-cluster stores a small group of close line segments; a macro-cluster is a cluster of micro-clusters.

15 Data Preprocessing. Finding the optimal partitioning translates to finding the best hypothesis under the MDL (minimum description length) principle, i.e., the hypothesis H that minimizes L(H) + L(D|H). Here H is a set of trajectory partitions and D is a trajectory; L(H) is the sum of the lengths of all trajectory partitions, and L(D|H) is the sum of the differences between the trajectory and its set of trajectory partitions. L(H) measures conciseness; L(D|H) measures preciseness.
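
A minimal sketch of MDL-based partitioning in the style of TRACLUS's greedy approximation: walk along the trajectory and place a characteristic point as soon as replacing the points seen so far by a single partition costs more than keeping them. Assumptions (not from the slides): L(H) is encoded as log2(1 + partition length), L(D|H) as the sum of log2(1 + perpendicular distance) of the skipped points, the angle term is omitted, and the "1 +" offset (which avoids log2 of 0) is our choice. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def _len(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])

def _perp(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b."""
    base = _len(a, b)
    if base == 0.0:
        return _len(p, a)
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / base

def cost_par(pts: List[Point], i: int, j: int) -> float:
    """L(H) + L(D|H) if pts[i]..pts[j] is replaced by one trajectory partition."""
    l_h = math.log2(1 + _len(pts[i], pts[j]))
    l_d_h = sum(math.log2(1 + _perp(pts[k], pts[i], pts[j])) for k in range(i + 1, j))
    return l_h + l_d_h

def cost_nopar(pts: List[Point], i: int, j: int) -> float:
    """Cost of keeping every original point; L(D|H) is then 0."""
    return sum(math.log2(1 + _len(pts[k], pts[k + 1])) for k in range(i, j))

def mdl_partition(pts: List[Point]) -> List[int]:
    """Greedy approximate partitioning; returns characteristic-point indices."""
    if len(pts) < 3:
        return list(range(len(pts)))
    cps, start, length = [0], 0, 1
    while start + length < len(pts):
        curr = start + length
        if cost_par(pts, start, curr) > cost_nopar(pts, start, curr):
            cps.append(curr - 1)   # partitioning at the previous point pays off
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(len(pts) - 1)
    return cps
```

On a straight line this keeps only the endpoints; on an L-shaped path it also keeps the corner.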

16 Micro-Cluster Definition. Micro-clusters maintain a fine-granularity clustering: each micro-cluster holds and summarizes the information of local partitioned trajectories. A micro-cluster for a set of directed line segments is defined as a tuple consisting of: the number of line segments; the linear sums of the line segments' center points, angles, and lengths; and the squared sums of the line segments' center points, angles, and lengths.
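
The tuple can be kept as a small summary record that is updated in O(1) per segment, without storing the segments themselves. A minimal sketch; the field names, the per-coordinate squared sum of the center, and the use of `atan2` for the angle (which ignores angle wrap-around when summing) are assumptions, not the slide's notation:

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class MicroCluster:
    """Summary of a group of directed line segments (field names are hypothetical)."""
    n: int = 0                                   # number of line segments
    ls_center: Tuple[float, float] = (0.0, 0.0)  # linear sum of center points
    ls_angle: float = 0.0                        # linear sum of angles
    ls_length: float = 0.0                       # linear sum of lengths
    ss_center: Tuple[float, float] = (0.0, 0.0)  # squared sum of centers, per coordinate
    ss_angle: float = 0.0                        # squared sum of angles
    ss_length: float = 0.0                       # squared sum of lengths

    def add(self, start: Tuple[float, float], end: Tuple[float, float]) -> None:
        """Fold one directed segment start -> end into the sums."""
        cx, cy = (start[0] + end[0]) / 2, (start[1] + end[1]) / 2
        dx, dy = end[0] - start[0], end[1] - start[1]
        angle = math.atan2(dy, dx)   # direction of the directed segment
        length = math.hypot(dx, dy)
        self.n += 1
        self.ls_center = (self.ls_center[0] + cx, self.ls_center[1] + cy)
        self.ls_angle += angle
        self.ls_length += length
        self.ss_center = (self.ss_center[0] + cx * cx, self.ss_center[1] + cy * cy)
        self.ss_angle += angle * angle
        self.ss_length += length * length
```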

17 Distance between Micro-Clusters. Each micro-cluster has a representative line segment. The distance between two micro-clusters can be defined as the distance between their representative line segments.
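
A representative line segment can be rebuilt from the linear sums alone: mean center, mean angle, mean length. A minimal sketch; the construction and the `segment_distance` placeholder (an unweighted sum of center, angle, and length differences, with hypothetical weights) are assumptions, since the slide does not give the segment distance:

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def representative(n: int, ls_center: Point, ls_angle: float, ls_length: float) -> Segment:
    """Segment through the mean center, with the mean angle and mean length."""
    cx, cy = ls_center[0] / n, ls_center[1] / n
    angle, half = ls_angle / n, ls_length / n / 2
    ux, uy = math.cos(angle) * half, math.sin(angle) * half
    return (cx - ux, cy - uy), (cx + ux, cy + uy)

def segment_distance(s: Segment, t: Segment,
                     w_center: float = 1.0, w_angle: float = 1.0, w_length: float = 1.0) -> float:
    """Placeholder distance between two directed segments (weights are hypothetical)."""
    def features(seg: Segment):
        (x1, y1), (x2, y2) = seg
        return ((x1 + x2) / 2, (y1 + y2) / 2), math.atan2(y2 - y1, x2 - x1), math.hypot(x2 - x1, y2 - y1)
    (c1, a1, l1), (c2, a2, l2) = features(s), features(t)
    d_angle = abs(math.atan2(math.sin(a1 - a2), math.cos(a1 - a2)))  # wrapped into [0, pi]
    return w_center * math.dist(c1, c2) + w_angle * d_angle + w_length * abs(l1 - l2)
```

The micro-cluster distance is then `segment_distance(representative(...), representative(...))` applied to the two micro-clusters' sums.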

18 Creating and Updating Micro-Clusters. When a new line segment is received, find its closest micro-cluster. If the distance between the new line segment and its closest micro-cluster is less than a threshold, add the line segment to this micro-cluster and update the micro-cluster. If not, create a new micro-cluster that contains only this line segment: its linear sums are the center, angle, and length of this line segment, and its squared sums are the squares of the center, angle, and length of this line segment.
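
The create-or-absorb rule can be sketched as a single insertion routine. Assumptions: each segment is reduced to a (center_x, center_y, angle, length) feature vector, the segment-to-micro-cluster distance is a placeholder Euclidean distance to the micro-cluster mean, the closest micro-cluster is found by linear scan, and `threshold`, `insert_segment`, and the class layout are hypothetical:

```python
import math
from typing import List, Tuple

Features = Tuple[float, float, float, float]  # (center_x, center_y, angle, length)

def features(start, end) -> Features:
    dx, dy = end[0] - start[0], end[1] - start[1]
    return ((start[0] + end[0]) / 2, (start[1] + end[1]) / 2, math.atan2(dy, dx), math.hypot(dx, dy))

class MicroCluster:
    def __init__(self, f: Features):
        # A new micro-cluster holds exactly one segment: N = 1, LS = f, SS = f squared.
        self.n = 1
        self.ls = list(f)
        self.ss = [v * v for v in f]

    def add(self, f: Features) -> None:
        self.n += 1
        for k, v in enumerate(f):
            self.ls[k] += v
            self.ss[k] += v * v

    def mean(self) -> List[float]:
        return [v / self.n for v in self.ls]

def distance(f: Features, mc: MicroCluster) -> float:
    # Placeholder: Euclidean distance in (center, angle, length) feature space.
    return math.dist(f, mc.mean())

def insert_segment(mcs: List[MicroCluster], start, end, threshold: float) -> None:
    f = features(start, end)
    closest = min(mcs, key=lambda mc: distance(f, mc), default=None)
    if closest is not None and distance(f, closest) < threshold:
        closest.add(f)               # absorb into the closest micro-cluster
    else:
        mcs.append(MicroCluster(f))  # start a new single-segment micro-cluster
```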

19 Merging Micro-Clusters. Why merge micro-clusters? If the number of micro-clusters is large, it is time-consuming both to find the closest micro-cluster when a new line segment is received and to run macro-clustering over the micro-clusters. In addition, memory might not be sufficient to store all the micro-clusters. We therefore merge close micro-clusters to save storage space and computation time. Closeness can simply be defined as the distance between two micro-clusters; however, this does not consider the tightness of a micro-cluster.
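
Because counts, linear sums, and squared sums are additive, two micro-clusters can be merged exactly by adding their tuples. A minimal sketch using the simple closeness criterion from this slide (distance between micro-cluster means, which ignores tightness); the memory budget `max_clusters` and all names are hypothetical:

```python
import itertools
import math
from dataclasses import dataclass
from typing import List

@dataclass
class MicroCluster:
    n: int
    ls: List[float]  # linear sums of (center_x, center_y, angle, length)
    ss: List[float]  # squared sums of the same features

    def mean(self) -> List[float]:
        return [v / self.n for v in self.ls]

def merge(a: MicroCluster, b: MicroCluster) -> MicroCluster:
    # Counts, linear sums and squared sums are additive, so the merge is exact.
    return MicroCluster(a.n + b.n,
                        [x + y for x, y in zip(a.ls, b.ls)],
                        [x + y for x, y in zip(a.ss, b.ss)])

def merge_until(mcs: List[MicroCluster], max_clusters: int) -> List[MicroCluster]:
    """Repeatedly merge the closest pair (by mean distance) until within the budget."""
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    mcs = list(mcs)
    while len(mcs) > max_clusters:
        i, j = min(itertools.combinations(range(len(mcs)), 2),
                   key=lambda p: math.dist(mcs[p[0]].mean(), mcs[p[1]].mean()))
        merged = merge(mcs[i], mcs[j])
        mcs = [mc for k, mc in enumerate(mcs) if k not in (i, j)] + [merged]
    return mcs
```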

20 Merging Micro-Clusters (cont.) We prefer to merge loose micro-clusters rather than tight ones, to better preserve the tightness of micro-clusters: more information is lost when two tight micro-clusters are merged.

21 Merging Micro-Clusters (cont.) We introduce the extent of a micro-cluster, which defines its tightness in terms of center, angle, and length.
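
The slide's formula for extent is not preserved. One natural way to measure tightness from the stored sums is the per-feature standard deviation, sqrt(SS/N - (LS/N)^2); the sketch below assumes that definition, which may differ from the paper's, and `extent` is a hypothetical name:

```python
import math
from typing import List

def extent(n: int, ls: List[float], ss: List[float]) -> List[float]:
    """Per-feature spread of a micro-cluster from its count, linear and squared sums.

    Assumed definition: standard deviation of each feature, sqrt(SS/N - (LS/N)^2),
    clamped at 0 to absorb floating-point rounding.
    """
    return [math.sqrt(max(0.0, s / n - (l / n) ** 2)) for l, s in zip(ls, ss)]
```

For two segments of lengths 1 and 3 (N = 2, LS = 4, SS = 10) the length extent is 1; for two identical segments it is 0.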

22 Merging Micro-Clusters (cont.) The distance between micro-clusters with extent combines a center distance, an angle distance, and a length distance.
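
The slide's center, angle, and length distance formulas are not preserved. The sketch below is one hypothetical way to encode the stated preference for merging loose micro-clusters: each feature contributes the gap between the intervals mean ± extent, so wider extents shrink the distance. It is an illustration under that assumption, not the paper's definition:

```python
import math
from typing import Sequence

def _gap(m1: float, m2: float, e1: float, e2: float) -> float:
    # Separation of two 1-D intervals [m - e, m + e]; 0 when they overlap.
    return max(0.0, abs(m1 - m2) - (e1 + e2))

def extent_distance(mean_a: Sequence[float], ext_a: Sequence[float],
                    mean_b: Sequence[float], ext_b: Sequence[float]) -> float:
    """Hypothetical extent-aware distance over (center_x, center_y, angle, length)."""
    gx, gy, ga, gl = (_gap(m1, m2, e1, e2)
                      for m1, m2, e1, e2 in zip(mean_a, mean_b, ext_a, ext_b))
    center = math.hypot(gx, gy)  # center distance
    angle = ga                   # angle distance (no wrap-around handling)
    length = gl                  # length distance
    return center + angle + length
```

With identical means, a pair of loose micro-clusters scores a smaller distance than a pair of tight ones, so it is merged first.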

23 Micro-clustering summary

24 Macro-Clustering. Macro-clustering is invoked only when requested by the user. It is performed on the representative line segments of the micro-clusters, similar to the group step in the TRACLUS framework.
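
Macro-clustering can reuse the density-based grouping, now over one representative segment per micro-cluster. A minimal sketch, assuming (1) a placeholder midpoint distance, (2) an assumed radius `eps`, and (3) that density counts the line segments inside neighboring micro-clusters, so a representative is core when their counts sum to at least `min_lns`. The weighting rule and all names are hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

def _mid(s: Segment) -> Point:
    (x1, y1), (x2, y2) = s
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def macro_cluster(reps: List[Segment], counts: List[int], eps: float, min_lns: int) -> List[int]:
    """Density-based clustering of micro-cluster representatives (-1 = noise)."""
    n = len(reps)
    nbrs = [[j for j in range(n) if math.dist(_mid(reps[i]), _mid(reps[j])) <= eps]
            for i in range(n)]
    # Assumed weighting: density = number of line segments in neighboring micro-clusters.
    core = [sum(counts[j] for j in nbrs[i]) >= min_lns for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            k = stack.pop()
            if not core[k]:
                continue  # border representative: joins but does not expand
            for j in nbrs[k]:
                if labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels
```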

25 Outline (current section: Experiment)

26 Experiment. Real taxi data from San Francisco: 7,000+ trajectories over one week, 100,000 points in total.

27 Experiment (cont.): Effectiveness. SSQ (sum of squared distances) averages, over all line segments, the squared distance from each line segment to the centroid of its macro-cluster. TCMM reaches quality similar to TRACLUS.
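
The quality measure can be sketched as follows, assuming each line segment is represented by its center point and that noise segments are excluded beforehand; `ssq` is a hypothetical name:

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

def ssq(centers: Sequence[Point], labels: Sequence[int]) -> float:
    """Average squared distance from each segment center to its macro-cluster centroid."""
    groups: Dict[int, List[Point]] = {}
    for c, lab in zip(centers, labels):
        groups.setdefault(lab, []).append(c)
    centroids = {lab: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
                 for lab, pts in groups.items()}
    total = sum((c[0] - centroids[lab][0]) ** 2 + (c[1] - centroids[lab][1]) ** 2
                for c, lab in zip(centers, labels))
    return total / len(centers)
```

For centers (0, 0) and (2, 0) in one cluster and (5, 5) alone in another, the squared distances are 1, 1, and 0, giving an SSQ of 2/3.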

28 Experiment (cont.): Efficiency. TCMM is much faster than TRACLUS.

29 Experiment (cont.): Sensitivity to parameters. When d_max is larger, the quality is lower but the efficiency is better.

30 Outline (current section: Conclusion)

31 Conclusion. We address the problem of incrementally clustering trajectories. We propose the TCMM (Trajectory Clustering based on Micro- and Macro-clustering) framework, and we propose the definition of extent to better capture the tightness of micro-clusters. Experiments show that TCMM achieves quality similar to TRACLUS while being much faster.

32 Outline (current section: Future Work)

33 Future Work. Efficiency: use an index to find the closest micro-cluster, which is not easy because our distance function is non-metric. Parameter insensitivity: make our algorithm less sensitive to parameter values. Temporal information: take temporal information into account during clustering. Other applications: incrementally discover outliers and patterns.

34 Thank you!

