Trajectory Clustering: A Partition-and-Group Framework
Paper by Jae-Gil Lee, Jiawei Han (UIUC) and Kyu-Young Whang (KAIST), ACM SIGMOD'07.
Presented by 黃福銘 (Angus Fuming Huang), ANTS Lab, Institute of Information Science, Academia Sinica.

Slide 3: Outline
- Introduction
- Trajectory clustering
- Trajectory partitioning
- Line segment clustering
- Experimental evaluation
- Discussion and conclusions

Slide 4: Introduction
- Background
- The key observation
- Examples in real applications
- Possible arguments
- Contributions

Slide 5: Background
- Previous research has mainly dealt with clustering point data (k-means, BIRCH, DBSCAN, OPTICS, STING).
- More recent research clusters trajectories as a whole.
- Trajectory data are becoming widely available thanks to improvements in satellites and tracking facilities.

Slide 6: The key observation
Clustering trajectories as a whole cannot detect similar portions of trajectories: two trajectories may share a common sub-trajectory even when their overall paths differ.

Slide 7: Examples in real applications
- Hurricanes: landfall forecasts, where the interest is in common behavior near the coastline (at the time of landfall) or at sea (before landfall).
- Animal movements: effects of roads and traffic on movement (road segments, traffic rate).

Slide 8: Possible arguments
Argument: why not prune the useless parts of trajectories and keep only the interesting ones?
- It is tricky to determine which parts of the trajectories are useless.
- Pruning parts of trajectories prevents us from discovering unexpected clustering results.

Slide 9: Contributions
- A partition-and-group framework that clusters trajectories and discovers common sub-trajectories
- A formal trajectory partitioning algorithm based on the minimum description length (MDL) principle
- A density-based clustering algorithm for line segments
- A demonstration on various real data sets

Slide 10: Trajectory clustering
- Problem statement
- The TRACLUS algorithm
- Distance function

Slide 11: Problem statement
- Input: a set of trajectories T = {TR_1, …, TR_num_tra}
- Output: a set of clusters O = {C_1, …, C_num_clus}
- Trajectory: TR_i = p_1 p_2 p_3 … p_j … p_len_i, a sequence of points
- Characteristic point: a point where the behavior of a trajectory changes rapidly
- Sub-trajectory (trajectory partition): a line segment p_i p_j (i < j) joining two characteristic points of the same trajectory
- Cluster: a set of trajectory partitions
- Representative trajectory: the major behavior of the trajectory partitions in a cluster
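The problem statement describes the representative trajectory only informally. The sketch below shows one way to compute it, modeled on the sweep-line approach of the TRACLUS paper as we recall it, so treat the details as assumptions: rotate the axes so that X' is parallel to the cluster's average direction vector, sweep along X' over the segment endpoints, and wherever at least `min_lns` segments cross the sweep line, emit their average Y' coordinate, skipping positions closer than the smoothing parameter `gamma` to the previously emitted one. The function name `representative_trajectory`, the segment representation, and `gamma` are illustrative; `min_lns` plays the role of MinLns.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def representative_trajectory(segments: Sequence[Segment], min_lns: int,
                              gamma: float) -> List[Point]:
    """Hypothetical sweep-line sketch of a cluster's representative trajectory."""
    if not segments:
        return []
    # Average direction vector of the cluster's segments.
    vx = sum(e[0] - s[0] for s, e in segments) / len(segments)
    vy = sum(e[1] - s[1] for s, e in segments) / len(segments)
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        return []  # no dominant direction (segments cancel out)
    cos_a, sin_a = vx / norm, vy / norm

    def rotate(p: Point) -> Point:  # X' axis parallel to the average direction
        return (p[0] * cos_a + p[1] * sin_a, -p[0] * sin_a + p[1] * cos_a)

    def unrotate(p: Point) -> Point:  # inverse rotation back to original axes
        return (p[0] * cos_a - p[1] * sin_a, p[0] * sin_a + p[1] * cos_a)

    rotated: List[Segment] = []
    for start, end in segments:
        a, b = rotate(start), rotate(end)
        rotated.append((a, b) if a[0] <= b[0] else (b, a))  # order endpoints by X'

    result: List[Point] = []
    prev_x: Optional[float] = None
    for x in sorted(p[0] for seg in rotated for p in seg):  # sweep over endpoints
        ys = []
        for (x1, y1), (x2, y2) in rotated:
            if x1 <= x <= x2:
                # Y' where the sweep line crosses the segment (y1 if perpendicular to X').
                ys.append(y1 if x2 == x1 else y1 + (y2 - y1) * (x - x1) / (x2 - x1))
        if len(ys) >= min_lns and (prev_x is None or x - prev_x >= gamma):
            result.append(unrotate((x, sum(ys) / len(ys))))
            prev_x = x
    return result
```

A larger `gamma` yields a coarser, smoother polyline; positions crossed by fewer than `min_lns` segments produce no point at all.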

Slide 14: Distance function
The distance between two line segments is a weighted sum of three components:
- the perpendicular distance (d⊥)
- the parallel distance (d∥)
- the angle distance (dθ)
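A minimal Python sketch of the three components for 2D segments, following the TRACLUS definitions as we understand them. Assumptions: L_i is the longer of the two segments and L_j the shorter; p_s and p_e are the projections of L_j's endpoints onto the line through L_i; d⊥ = (l⊥1² + l⊥2²) / (l⊥1 + l⊥2); for d∥ each projection's distance to the nearer endpoint of L_i is taken and the smaller of the two is kept (the exact endpoint pairing should be checked against the paper); dθ = ||L_j|| sin θ for θ < 90° and ||L_j|| otherwise. The name `segment_distance` and the weight parameters `w_perp`, `w_par`, `w_angle` (defaulting to 1) are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1]


def _norm(a: Point) -> float:
    return math.hypot(a[0], a[1])


def _project(p: Point, s: Point, e: Point) -> Point:
    """Projection of p onto the infinite line through s and e."""
    d = _sub(e, s)
    denom = _dot(d, d)
    if denom == 0.0:  # degenerate segment: treat it as a point
        return s
    u = _dot(_sub(p, s), d) / denom
    return (s[0] + u * d[0], s[1] + u * d[1])


def segment_distance(a: Segment, b: Segment, w_perp: float = 1.0,
                     w_par: float = 1.0, w_angle: float = 1.0) -> float:
    # Assumption: L_i is the longer segment, L_j the shorter one.
    if _norm(_sub(a[1], a[0])) >= _norm(_sub(b[1], b[0])):
        (s_i, e_i), (s_j, e_j) = a, b
    else:
        (s_i, e_i), (s_j, e_j) = b, a
    p_s = _project(s_j, s_i, e_i)
    p_e = _project(e_j, s_i, e_i)

    # Perpendicular distance: (l1^2 + l2^2) / (l1 + l2).
    l1, l2 = _norm(_sub(s_j, p_s)), _norm(_sub(e_j, p_e))
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)

    # Parallel distance (assumed reading): each projection's distance to the
    # nearer endpoint of L_i, then the smaller of the two.
    lp1 = min(_norm(_sub(p_s, s_i)), _norm(_sub(p_s, e_i)))
    lp2 = min(_norm(_sub(p_e, s_i)), _norm(_sub(p_e, e_i)))
    d_par = min(lp1, lp2)

    # Angle distance: ||L_j|| * sin(theta) below 90 degrees, else ||L_j||.
    v_i, v_j = _sub(e_i, s_i), _sub(e_j, s_j)
    len_i, len_j = _norm(v_i), _norm(v_j)
    if len_i == 0.0 or len_j == 0.0:
        d_angle = 0.0
    else:
        cos_t = max(-1.0, min(1.0, _dot(v_i, v_j) / (len_i * len_j)))
        d_angle = len_j if cos_t <= 0.0 else len_j * math.sqrt(1.0 - cos_t ** 2)

    return w_perp * d_perp + w_par * d_par + w_angle * d_angle
```

Because the longer segment is always used as L_i, the function gives the same result whichever order the two segments are passed in, even though each component is defined asymmetrically.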

Slide 15: Trajectory partitioning
- Desirable properties
- Formalization using the MDL principle
- Approximate solution

Slide 16: Desirable properties
- Preciseness: the difference between a trajectory and its set of trajectory partitions should be as small as possible.
- Conciseness: the number of trajectory partitions should be as small as possible.

Slide 17: Formalization using the MDL principle
Goal: find the optimal tradeoff between preciseness and conciseness using the minimum description length (MDL) principle.
- Cost components: H is the hypothesis and D is the data. L(H) is the length, in bits, of the description of the hypothesis; L(D|H) is the length, in bits, of the description of the data when encoded with the help of the hypothesis.
- Definition: the best hypothesis H to explain D is the one that minimizes L(H) + L(D|H).
- A hypothesis corresponds to a specific set of trajectory partitions, so finding the optimal partitioning translates to finding the best hypothesis under the MDL principle.

Slide 18:
- L(H) represents the sum of the lengths of all trajectory partitions (conciseness).
- L(D|H) represents the sum of the differences between a trajectory and its set of trajectory partitions (preciseness).
So we minimize L(H) + L(D|H).
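Written out for a trajectory p_1 … p_len whose characteristic points have indices c_1 < c_2 < … < c_par, the two cost components look as follows. This is our reading of the TRACLUS paper's formulation rather than something stated on these slides; as we recall it, only the perpendicular and angle distances enter L(D|H).

```latex
\begin{aligned}
L(H) &= \sum_{j=1}^{par-1} \log_2\!\big(\mathrm{len}(p_{c_j} p_{c_{j+1}})\big) \\
L(D \mid H) &= \sum_{j=1}^{par-1} \sum_{k=c_j}^{c_{j+1}-1}
  \Big[ \log_2\!\big(d_{\perp}(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1})\big)
      + \log_2\!\big(d_{\theta}(p_{c_j} p_{c_{j+1}},\, p_k p_{k+1})\big) \Big]
\end{aligned}
```

Each log term is undefined at 0, which happens whenever an original segment coincides with its partition, so an implementation needs a convention for zero (or very small) lengths and distances.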

Slide 19: MDL = L(H) + L(D|H)
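The trajectory-partitioning outline lists an approximate solution to this optimization. A minimal sketch of a greedy approximation in that spirit, as we recall the paper's approach: scan the trajectory, compare the MDL cost of replacing points start..curr with one partition (`mdl_par`) against the cost of keeping the original segments (`mdl_nopar`, where L(D|H) = 0), and when partitioning becomes more expensive, make the previous point a characteristic point. Assumptions: planar coordinates; `_log2` treats values at or below 1 as costing 0 bits to avoid log2(0); and `nopar_bias` is a hypothetical knob that adds a constant to the no-partition cost, which makes partitions longer. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _log2(x: float) -> float:
    # Assumption: values <= 1 cost 0 bits, avoiding log2(0) = -inf.
    return math.log2(x) if x > 1.0 else 0.0


def _dist(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def _project(p: Point, s: Point, e: Point) -> Point:
    # Projection of p onto the infinite line through s and e.
    dx, dy = e[0] - s[0], e[1] - s[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return s
    u = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / denom
    return (s[0] + u * dx, s[1] + u * dy)


def _perp(s_i: Point, e_i: Point, s_j: Point, e_j: Point) -> float:
    l1 = _dist(s_j, _project(s_j, s_i, e_i))
    l2 = _dist(e_j, _project(e_j, s_i, e_i))
    return 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)


def _angle(s_i: Point, e_i: Point, s_j: Point, e_j: Point) -> float:
    len_i, len_j = _dist(s_i, e_i), _dist(s_j, e_j)
    if len_i == 0.0 or len_j == 0.0:
        return 0.0
    dot = (e_i[0] - s_i[0]) * (e_j[0] - s_j[0]) + (e_i[1] - s_i[1]) * (e_j[1] - s_j[1])
    cos_t = max(-1.0, min(1.0, dot / (len_i * len_j)))
    if cos_t <= 0.0:  # angle of 90 degrees or more
        return len_j
    return len_j * math.sqrt(1.0 - cos_t * cos_t)


def mdl_par(traj: Sequence[Point], i: int, j: int) -> float:
    """L(H) + L(D|H) when points i..j are replaced by the partition p_i p_j."""
    s, e = traj[i], traj[j]
    l_dh = sum(_log2(_perp(s, e, traj[k], traj[k + 1]))
               + _log2(_angle(s, e, traj[k], traj[k + 1])) for k in range(i, j))
    return _log2(_dist(s, e)) + l_dh


def mdl_nopar(traj: Sequence[Point], i: int, j: int) -> float:
    """Cost of keeping every original segment between i and j (L(D|H) = 0)."""
    return sum(_log2(_dist(traj[k], traj[k + 1])) for k in range(i, j))


def approximate_partition(traj: Sequence[Point], nopar_bias: float = 0.0) -> List[int]:
    """Greedy scan returning the indices of characteristic points."""
    n = len(traj)
    if n == 0:
        return []
    cps = [0]  # the first point is always characteristic
    start, length = 0, 1
    while start + length < n:
        curr = start + length
        if mdl_par(traj, start, curr) > mdl_nopar(traj, start, curr) + nopar_bias:
            cps.append(curr - 1)  # the previous point becomes characteristic
            start, length = curr - 1, 1
        else:
            length += 1
    if cps[-1] != n - 1:
        cps.append(n - 1)  # the last point is always characteristic
    return cps
```

On a straight trajectory the single partition is always cheaper, so only the endpoints are kept; an L-shaped trajectory gets an extra characteristic point at the corner.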

Slide 20: Line segment clustering
- Density of line segments
- Clustering algorithm
- Representative trajectory of a cluster
- Heuristic for parameter value selection

Slide 21: Density of line segments
- ε-neighborhood
- Core line segment
- Directly density-reachable
- Density-reachable
- Density-connected
- Density-connected set
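For reference, these definitions can be written as in DBSCAN but with line segments in place of points; D is the set of all trajectory partitions and dist is the line-segment distance function. This is our transcription of the paper's definitions, so the exact wording may differ. A density-connected set is a non-empty C ⊆ D in which every pair of segments is density-connected (connectivity) and which contains every segment density-reachable from any of its members (maximality).

```latex
\begin{aligned}
N_{\varepsilon}(L_i) &= \{\, L_j \in D \mid \mathrm{dist}(L_i, L_j) \le \varepsilon \,\} \\
L_i \text{ is a core line segment} &\iff |N_{\varepsilon}(L_i)| \ge \mathit{MinLns} \\
L_i \text{ is directly density-reachable from } L_j &\iff L_i \in N_{\varepsilon}(L_j) \,\wedge\, |N_{\varepsilon}(L_j)| \ge \mathit{MinLns} \\
L_i \text{ is density-reachable from } L_j &\iff \exists\, M_1 = L_j, \dots, M_n = L_i :\ M_{k+1} \text{ directly density-reachable from } M_k \\
L_i, L_j \text{ are density-connected} &\iff \exists\, L_k :\ L_i \text{ and } L_j \text{ are density-reachable from } L_k
\end{aligned}
```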

Slide 22: Example of the density definitions
- L1, L2, L3, L4, and L5 are core line segments.
- L2 (or L3) is directly density-reachable from L1.
- L6 is density-reachable from L1, but not vice versa, because L6 is not a core line segment.
- L1, L4, and L5 are all density-connected.

Slide 23:
- A short line segment might induce over-clustering.
- Our experience indicates that increasing the length of trajectory partitions by 20~30% generally improves the clustering quality.

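Slide 24: Clustering algorithm
- A cluster is a density-connected set; each line segment is classified as belonging to a cluster or as noise.
- Steps: compute the ε-neighborhood of a segment; if it is a core line segment, expand the cluster through directly density-reachable segments; finally, perform cardinality checking.
- Trajectory cardinality: the number of distinct trajectories whose partitions belong to a cluster; a cluster whose trajectory cardinality is below a threshold is discarded.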
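A minimal sketch of the grouping phase as a DBSCAN-style expansion over line segments, assuming a caller-supplied distance function (for example, a weighted sum of d⊥, d∥ and dθ) and a list `traj_ids` recording which trajectory each segment came from. The names `group_segments`, `epsilon_neighborhood`, `NOISE` and `UNCLASSIFIED` are hypothetical. Two further assumptions: border segments are claimed by the first cluster that reaches them, and the trajectory-cardinality check reuses MinLns as its threshold.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Sequence, Set, TypeVar

S = TypeVar("S")
NOISE = -1
UNCLASSIFIED = -2


def epsilon_neighborhood(i: int, segments: Sequence[S],
                         dist: Callable[[S, S], float], eps: float) -> List[int]:
    """Indices of all segments within eps of segments[i] (including i itself)."""
    return [j for j, other in enumerate(segments) if dist(segments[i], other) <= eps]


def group_segments(segments: Sequence[S], traj_ids: Sequence[int],
                   dist: Callable[[S, S], float], eps: float, min_lns: int) -> List[int]:
    """Return a cluster id per segment, or NOISE."""
    labels = [UNCLASSIFIED] * len(segments)
    next_id = 0
    for i in range(len(segments)):
        if labels[i] != UNCLASSIFIED:
            continue
        neighbors = epsilon_neighborhood(i, segments, dist, eps)
        if len(neighbors) < min_lns:
            labels[i] = NOISE  # may later be absorbed by a cluster as a border segment
            continue
        # segments[i] is a core line segment: open a new cluster and expand it.
        cid, next_id = next_id, next_id + 1
        labels[i] = cid
        queue: Deque[int] = deque()
        for j in neighbors:
            if labels[j] == UNCLASSIFIED:
                queue.append(j)
            if labels[j] in (UNCLASSIFIED, NOISE):
                labels[j] = cid
        while queue:
            m = queue.popleft()
            m_neighbors = epsilon_neighborhood(m, segments, dist, eps)
            if len(m_neighbors) < min_lns:
                continue  # border segment: stays in the cluster, not expanded
            for x in m_neighbors:  # each x is directly density-reachable from m
                if labels[x] == UNCLASSIFIED:
                    queue.append(x)
                if labels[x] in (UNCLASSIFIED, NOISE):
                    labels[x] = cid

    # Trajectory cardinality check (assumption: threshold = min_lns).
    trajs: Dict[int, Set[int]] = {}
    for idx, cid in enumerate(labels):
        if cid >= 0:
            trajs.setdefault(cid, set()).add(traj_ids[idx])
    return [NOISE if cid >= 0 and len(trajs[cid]) < min_lns else cid for cid in labels]
```

The neighborhood query is a brute-force scan, so the whole pass is quadratic in the number of segments; a spatial index would be needed for large data sets. A dense group of segments that all come from one trajectory is turned into noise by the cardinality check.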

Slide 28: Heuristic for parameter value selection
- Choosing the values of ε and MinLns
- Simulated annealing
- Entropy function
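The heuristic picks ε by minimizing the entropy of the distribution of ε-neighborhood sizes, with p(x_i) = |N_ε(x_i)| / Σ_j |N_ε(x_j)| and H = -Σ_i p(x_i) log2 p(x_i). Entropy is maximal when all neighborhoods have the same size, which happens both when ε is so small that every segment is alone and when ε is so large that every segment sees all others; an ε in between yields a skewed distribution. A minimal sketch with one deliberate simplification: ε is chosen by a plain grid search over caller-supplied candidates instead of simulated annealing. `dist` is assumed symmetric with dist(x, x) = 0 and candidates non-negative; the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")


def neighborhood_sizes(segments: Sequence[S], dist: Callable[[S, S], float],
                       eps: float) -> List[int]:
    # |N_eps(x)| for every segment; each segment counts itself (dist(x, x) = 0).
    return [sum(1 for b in segments if dist(a, b) <= eps) for a in segments]


def neighborhood_entropy(segments: Sequence[S], dist: Callable[[S, S], float],
                         eps: float) -> float:
    sizes = neighborhood_sizes(segments, dist, eps)
    total = sum(sizes)
    return -sum((n / total) * math.log2(n / total) for n in sizes)


def choose_eps(segments: Sequence[S], dist: Callable[[S, S], float],
               candidates: Sequence[float]) -> Tuple[float, float]:
    """Grid search (a stand-in for simulated annealing) for the entropy-minimizing eps.

    Returns the chosen eps and the average neighborhood size at that eps.
    """
    best = min(candidates, key=lambda e: neighborhood_entropy(segments, dist, e))
    sizes = neighborhood_sizes(segments, dist, best)
    return best, sum(sizes) / len(sizes)
```

The second return value, the average |N_ε| at the chosen ε, is a natural starting point for MinLns; as we recall, the paper suggests setting MinLns slightly above it (roughly 1 to 3 more), which is worth checking against the paper.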

Slide 29: Experimental evaluation
- Experimental setting
- Results for hurricane track data
- Results for animal movement data
- Effects of parameter values

Slide 30: Experimental setting
- Hurricane track data set: Atlantic hurricanes, 1950~…; … trajectories and … points, each point given by latitude and longitude
- Animal movement data set:
  - Elk (1993): 33 trajectories and … points
  - Deer (1995): 32 trajectories and … points

Slide 31: Clustering quality measure
- There is no well-defined quality measure for density-based clustering methods.
- The measure used combines the sum of squared error (SSE) within clusters with a noise penalty, where N is the set of all noise line segments.
- The noise penalty becomes larger if we select too small an ε or too large a MinLns.
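A minimal sketch of a quality score of this shape, assuming the form of the paper's QMeasure as we recall it: for each cluster C_i the term (1 / (2|C_i|)) Σ_{x∈C_i} Σ_{y∈C_i} dist(x, y)², summed over all clusters (total SSE), plus the same expression evaluated over the noise set N (noise penalty). Lower is better. The function name `qmeasure` and the label convention (`-1` for noise) are assumptions. Shrinking ε or raising MinLns pushes more segments into N, and because noise segments tend to be far apart, their pairwise distances inflate the penalty.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

S = TypeVar("S")


def _half_mean_sq(members: List[S], dist: Callable[[S, S], float]) -> float:
    # (1 / (2|G|)) * sum of squared distances over all ordered pairs in G
    return sum(dist(x, y) ** 2 for x in members for y in members) / (2 * len(members))


def qmeasure(segments: Sequence[S], labels: Sequence[int],
             dist: Callable[[S, S], float], noise_label: int = -1) -> float:
    groups: Dict[int, List[S]] = {}
    for seg, lab in zip(segments, labels):
        groups.setdefault(lab, []).append(seg)
    total_sse = sum(_half_mean_sq(m, dist) for lab, m in groups.items() if lab != noise_label)
    noise_penalty = _half_mean_sq(groups[noise_label], dist) if noise_label in groups else 0.0
    return total_sse + noise_penalty
```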

Slide 34: Effects of parameter values
- A smaller ε or a larger MinLns discovers a larger number of smaller clusters.
- A larger ε or a smaller MinLns discovers a smaller number of larger clusters.

Slide 35: Discussion and conclusions
- Discussion
- Conclusions

Slide 36: Discussion
- Extensibility: undirected or weighted trajectories
- Parameter insensitivity
- Point data vs. trajectory data
- Efficiency: use of an index
- Movement patterns: e.g., circular motion
- Temporal information

Slide 37: Conclusions
- A partition-and-group framework for clustering trajectories
- The trajectory clustering algorithm TRACLUS
- Experiments on two real data sets
- A visual inspection tool
- Discovery of common sub-trajectories

Slide 38: Comments and questions
- The paper is written in detail, with explicit illustrations.
- What is the principle behind the parallel distance function? (slide 14)
- What is the basis for the 20~30% increase in partition length? (slide 23)