黃福銘 (Angus Fuming Huang), Academia Sinica, Institute of Information Science, ANTS Lab. Paper authors: Jae-Gil Lee and Jiawei Han (UIUC), Kyu-Young Whang (KAIST). ACM SIGMOD’07.

1 黃福銘 (Angus)

2 Angus Fuming Huang, Academia Sinica, Institute of Information Science, ANTS Lab. Jae-Gil Lee and Jiawei Han (UIUC), Kyu-Young Whang (KAIST). ACM SIGMOD’07. 2012.01.04

3 Introduction; Trajectory clustering; Trajectory partitioning; Line segment clustering; Experimental evaluation; Discussion and conclusions

4 Background; The key observation; Examples in real applications; Possible arguments; Contributions

5 Previous research has mainly dealt with clustering of point data (K-means, BIRCH, DBSCAN, OPTICS, STING). Recent research clusters trajectories as a whole. Improvements in satellites and tracking facilities.

6 Clustering trajectories as a whole cannot detect similar portions of the trajectories.

7 Hurricanes: landfall forecasts (coastline: at the time of landing; sea: before landing). Animal movements: effects of roads and traffic (road segments; traffic rate).

8 What if we prune the useless parts of trajectories and keep only the interesting ones? It is tricky to determine which parts of the trajectories are useless. Pruning useless parts of trajectories prevents us from discovering unexpected clustering results.

9 Partition-and-group framework: to cluster trajectories; to discover common sub-trajectories. Formal trajectory partitioning algorithm: minimum description length principle. Density-based clustering algorithm for line segments. Demonstration using various real data sets.

10 Problem statement; The TRACLUS algorithm; Distance function

11 Input: a set of trajectories $\mathcal{T} = \{TR_1, \ldots, TR_{num_{tra}}\}$. Output: a set of clusters $\mathcal{O} = \{C_1, \ldots, C_{num_{clus}}\}$. Trajectory: $TR_i = p_1 p_2 p_3 \cdots p_j \cdots p_{len_i}$. Sub-trajectory. Characteristic point. Cluster: a set of trajectory partitions. Representative trajectory: the major behavior of the trajectory partitions.

14 The perpendicular distance ($d_\perp$); the parallel distance ($d_\parallel$); the angle distance ($d_\theta$)
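The slide names the three components but gives no formulas. The sketch below assumes the definitions usually quoted from the TRACLUS paper: the longer segment acts as L_i, d_perp is the Lehmer mean of order 2 of the two perpendicular offsets, d_par is the smaller of the two parallel offsets, and d_theta is ||L_j|| sin(theta), or ||L_j|| when theta is 90 degrees or more. The function names, the weight parameters and the handling of zero-length segments are illustrative assumptions, not something stated in the slides.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _length(seg: Segment) -> float:
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def distance_components(a: Segment, b: Segment) -> Tuple[float, float, float]:
    """Return (d_perp, d_par, d_theta) for two 2-D line segments."""
    # The longer segment plays the role of L_i, the shorter one of L_j.
    li, lj = (a, b) if _length(a) >= _length(b) else (b, a)
    (sx, sy), (ex, ey) = li
    vx, vy = ex - sx, ey - sy
    len_i, len_j = _length(li), _length(lj)
    if len_i == 0.0:
        # Assumption: two point-like segments differ only by their gap.
        return (math.hypot(lj[0][0] - sx, lj[0][1] - sy), 0.0, 0.0)

    perp, par = [], []
    for (px, py) in lj:
        # Project each endpoint of L_j onto the line through L_i.
        u = ((px - sx) * vx + (py - sy) * vy) / (len_i ** 2)
        qx, qy = sx + u * vx, sy + u * vy
        perp.append(math.hypot(px - qx, py - qy))
        par.append(min(math.hypot(qx - sx, qy - sy),
                       math.hypot(qx - ex, qy - ey)))

    total_perp = perp[0] + perp[1]
    d_perp = (perp[0] ** 2 + perp[1] ** 2) / total_perp if total_perp > 0 else 0.0
    d_par = min(par)

    if len_j == 0.0:
        d_theta = 0.0
    else:
        wx, wy = lj[1][0] - lj[0][0], lj[1][1] - lj[0][1]
        cos_t = max(-1.0, min(1.0, (vx * wx + vy * wy) / (len_i * len_j)))
        # theta < 90 degrees: ||L_j|| * sin(theta); otherwise the full length.
        d_theta = len_j * math.sqrt(1.0 - cos_t ** 2) if cos_t > 0 else len_j
    return d_perp, d_par, d_theta


def segment_distance(a: Segment, b: Segment,
                     w_perp: float = 1.0, w_par: float = 1.0,
                     w_theta: float = 1.0) -> float:
    """Weighted sum of the three components (weights are hypothetical defaults)."""
    d_perp, d_par, d_theta = distance_components(a, b)
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta
```

For example, a short segment running parallel to a long one, one unit away, has a zero angle component, while a short segment standing at a right angle to the long one contributes its whole length as angle distance.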

15 Desirable properties; Formalization using the MDL principle; Approximate solution

16 Preciseness: the difference between a trajectory and a set of its trajectory partitions should be as small as possible. Conciseness: the number of trajectory partitions should be as small as possible.

17 Goal: to find the optimal tradeoff between preciseness and conciseness. Minimum description length (MDL) cost components: H is the hypothesis and D is the data. L(H) is the length, in bits, of the description of the hypothesis; L(D|H) is the length, in bits, of the description of the data when encoded with the help of the hypothesis. Definition: the best hypothesis H to explain D is the one that minimizes the sum of L(H) and L(D|H). A hypothesis corresponds to a specific set of trajectory partitions. Finding the optimal partitioning translates to finding the best hypothesis using the MDL principle.

18 L(H) represents the sum of the lengths of all trajectory partitions. L(D|H) represents the sum of the differences between a trajectory and a set of its trajectory partitions. So let's minimize L(H) + L(D|H).
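The slides describe the two cost terms only in words. As a hedged reconstruction, assuming the formulation usually quoted from the TRACLUS paper, the terms for one trajectory $TR_i$ with characteristic points $p_{c_1}, \ldots, p_{c_{par_i}}$ can be written as follows. The index names $c_j$ and $par_i$ and the omission of the parallel distance from $L(D|H)$ are assumptions taken from that formulation, not from the slides, and should be checked against the paper.

```latex
\begin{align}
L(H) &= \sum_{j=1}^{par_i - 1} \log_2\!\bigl(\mathrm{len}(p_{c_j} p_{c_{j+1}})\bigr) \\
L(D \mid H) &= \sum_{j=1}^{par_i - 1} \; \sum_{k=c_j}^{c_{j+1}-1}
  \Bigl\{ \log_2 d_\perp\bigl(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\bigr)
        + \log_2 d_\theta\bigl(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1}\bigr) \Bigr\}
\end{align}
```

Under this reading, fewer and longer partitions lower $L(H)$ (conciseness), while partitions that stay close to the original points lower $L(D|H)$ (preciseness).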

19 MDL = L(H) + L(D|H)

20 Density of line segments; Clustering algorithm; Representative trajectory of a cluster; Heuristic for parameter value selection

21 ε-neighborhood; Core line segment; Directly density-reachable; Density-reachable; Density-connected; Density-connected set

22 L1, L2, L3, L4, and L5 are core line segments. L2 (or L3) is directly density-reachable from L1. L6 is density-reachable from L1, but not vice versa. L1, L4, and L5 are all density-connected.
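The first two density notions can be sketched in a few lines. This is a minimal illustration under the DBSCAN-style definitions that the terminology suggests: the ε-neighborhood of a segment is every segment (itself included) within distance ε, and a core line segment is one whose neighborhood holds at least MinLns segments. The `midpoint_distance` function is only a stand-in so the example runs on its own; it is not the TRACLUS distance, and all function names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def midpoint_distance(a: Segment, b: Segment) -> float:
    """Stand-in distance: Euclidean gap between segment midpoints."""
    ax, ay = (a[0][0] + a[1][0]) / 2, (a[0][1] + a[1][1]) / 2
    bx, by = (b[0][0] + b[1][0]) / 2, (b[0][1] + b[1][1]) / 2
    return math.hypot(ax - bx, ay - by)


def eps_neighborhoods(segments: Sequence[Segment], eps: float,
                      dist: Callable[[Segment, Segment], float]) -> List[List[int]]:
    """For each segment, list the indices of all segments within eps of it."""
    return [[j for j, other in enumerate(segments) if dist(seg, other) <= eps]
            for seg in segments]


def core_segments(neighborhoods: Sequence[Sequence[int]], min_lns: int) -> List[int]:
    """Indices of segments whose eps-neighborhood has at least min_lns members."""
    return [i for i, nbrs in enumerate(neighborhoods) if len(nbrs) >= min_lns]


segments = [((0, 0), (0, 2)), ((1, 0), (1, 2)), ((2, 0), (2, 2)), ((10, 0), (10, 2))]
neighborhoods = eps_neighborhoods(segments, eps=1.5, dist=midpoint_distance)
cores = core_segments(neighborhoods, min_lns=3)
```

In this toy input only the middle of the three nearby segments is a core line segment; its two neighbors are directly density-reachable from it, and the distant fourth segment has only itself in its neighborhood.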

23 A short line segment might induce over-clustering. Our experience indicates that increasing the length of trajectory partitions by 20~30% generally improves the clustering quality.

24 A cluster is a density-connected set. Trajectory cardinality: a set is classified as a cluster or as noise. Directly density-reachable; ε-neighborhood; Core line segment; Cardinality checking
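The labels on this slide outline a grouping loop that can be sketched as below. This is a DBSCAN-style sketch, not a transcription of the paper's pseudocode: it takes precomputed ε-neighborhoods (lists of segment indices), grows a cluster from each unvisited core line segment, and then applies the cardinality check by discarding clusters whose segments come from fewer than `min_lns` distinct trajectories. Reusing `min_lns` as the cardinality threshold, the `NOISE` marker and the function name are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Set

NOISE = -1


def cluster_segments(neighborhoods: Sequence[Sequence[int]],
                     traj_ids: Sequence[int], min_lns: int) -> List[int]:
    """Return one label per segment: a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(neighborhoods)
    cluster_id = 0
    for i in range(len(neighborhoods)):
        if labels[i] is not None:
            continue
        if len(neighborhoods[i]) < min_lns:
            labels[i] = NOISE  # may later be absorbed as a border segment
            continue
        # i is a core line segment: grow a density-connected set from it.
        labels[i] = cluster_id
        queue = deque([i])
        while queue:
            m = queue.popleft()
            if len(neighborhoods[m]) < min_lns:
                continue  # border segment: joins the cluster, does not expand it
            for j in neighborhoods[m]:
                if labels[j] is None or labels[j] == NOISE:
                    labels[j] = cluster_id
                    queue.append(j)
        cluster_id += 1

    # Cardinality check: count the distinct trajectories behind each cluster.
    trajectories: Dict[int, Set[int]] = {}
    for idx, label in enumerate(labels):
        if label != NOISE:
            trajectories.setdefault(label, set()).add(traj_ids[idx])
    rejected = {c for c, ids in trajectories.items() if len(ids) < min_lns}
    return [NOISE if label in rejected else label for label in labels]
```

The cardinality check is what separates this from plain DBSCAN on segments: many partitions of a single trajectory lying close together do not count as a common sub-trajectory.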

28 The values of ε and MinLns; Simulated annealing; Entropy function
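The slide lists the ingredients of the heuristic without the formula. The sketch below assumes the entropy commonly quoted from the TRACLUS paper, H(X) = -Σ p(x_i) log2 p(x_i) with p(x_i) = |N_ε(x_i)| / Σ_j |N_ε(x_j)|, where a good ε is one that makes the neighborhood sizes skewed and therefore the entropy small. It replaces simulated annealing with a plain scan over candidate values, and the function names and the `sizes_for` callback are hypothetical.

```python
import math
from typing import Callable, Iterable, Sequence


def neighborhood_entropy(sizes: Sequence[int]) -> float:
    """Shannon entropy (bits) of the normalized eps-neighborhood sizes."""
    total = sum(sizes)
    if total <= 0:
        raise ValueError("at least one neighborhood must be non-empty")
    return -sum((s / total) * math.log2(s / total) for s in sizes if s > 0)


def pick_eps(candidates: Iterable[float],
             sizes_for: Callable[[float], Sequence[int]]) -> float:
    """Return the candidate eps with the lowest entropy.

    A simple scan stands in for the simulated annealing named on the slide.
    sizes_for(eps) must return |N_eps(x_i)| for every segment x_i.
    """
    return min(candidates, key=lambda eps: neighborhood_entropy(sizes_for(eps)))


def average_neighborhood_size(sizes: Sequence[int]) -> float:
    """Mean |N_eps|; MinLns is then typically set slightly above this value."""
    return sum(sizes) / len(sizes)
```

Uniform neighborhood sizes give the maximum entropy, which happens both when ε is so small that every segment is alone and when ε is so large that every segment sees all the others.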

29 Experimental setting; Results for hurricane track data; Results for animal movement data; Effects of parameter values

30 Hurricane track data set: Atlantic, 1950~2004; 570 trajectories and 17736 points; latitude and longitude. Animal movement data set: Elk, 1993: 33 trajectories and 47204 points; Deer, 1995: 32 trajectories and 20065 points.

31 No well-defined measure exists for density-based clustering methods. Sum of Squared Error (SSE). N: the set of all noise line segments. The noise penalty becomes larger if we select too small an ε or too large a MinLns.
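The formula itself did not survive on this slide. A minimal sketch, assuming the quality measure usually quoted from the TRACLUS paper: the total SSE sums, for each cluster C, the squared pairwise distances divided by 2|C|, and the noise penalty applies the same expression to the noise set N. The function names are hypothetical, and the usage note measures plain numbers with an absolute difference only to keep the arithmetic easy to follow; in practice `dist` would be the line segment distance.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sse(group: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """Sum of squared pairwise distances over all ordered pairs, over 2|group|."""
    if not group:
        return 0.0
    total = sum(dist(x, y) ** 2 for x in group for y in group)
    return total / (2 * len(group))


def q_measure(clusters: Sequence[Sequence[T]], noise: Sequence[T],
              dist: Callable[[T, T], float]) -> float:
    """Total SSE of the clusters plus the noise penalty (smaller is better)."""
    return sum(sse(c, dist) for c in clusters) + sse(noise, dist)
```

With `dist = lambda x, y: abs(x - y)`, one cluster `[0, 2]` and noise `[10, 13]`, the cluster contributes (4 + 4) / 4 = 2 and the noise contributes (9 + 9) / 4 = 4.5. Labelling more segments as noise, for instance by shrinking ε, adds pairs to the penalty term.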

34 Using a smaller ε or a larger MinLns discovers a larger number of smaller clusters. Using a larger ε or a smaller MinLns discovers a smaller number of larger clusters.

35 Discussion; Conclusions

36 Extensibility: undirected or weighted trajectories. Parameter insensitivity: point data, trajectory data. Efficiency: index. Movement patterns: circular motion. Temporal information.

37 Partition-and-group framework; Trajectory clustering algorithm TRACLUS; Experiments on two real data sets; A visual inspection tool; Common sub-trajectories

38 Detailed sentences with explicit illustrations! What is the principle of the parallel distance function? (p.14) What is the basis for the 20~30% length increase? (p.23)

