Clustering Trajectories of Moving Objects in an Uncertain World

Presentation transcript:

Clustering Trajectories of Moving Objects in an Uncertain World. Nikos Pelekis (1), Ioannis Kopanakis (2), Evangelos E. Kotsifakos (1), Elias Frentzos (1), Yannis Theodoridis (1). (1) Dept. of Informatics, Univ. of Piraeus, Greece; (2) Tech. Educational Institute of Crete, Greece. IEEE International Conference on Data Mining (ICDM 2009), Miami, FL, USA, 6-9 December 2009.

Outline: Related work; Motivation; Our contribution; From Trajectories to Intuitionistic Fuzzy Sets; A similarity metric for Uncertain Trajectories (Un-Tra); Cen-Tra: The Centroid Trajectory of a bunch of trajectories; TR-I-FCM: A novel clustering algorithm for Un-Tra; Experimental study; Conclusions & future work. Pelekis et al. "Clustering Trajectories of Moving Objects in an Uncertain World"

Related Work on Mobility Data Mining Trajectory clustering

Trajectory Clustering Questions: Which distance between trajectories? Which kind of clustering? What is a cluster ‘mean’ or ‘centroid’? A representative trajectory?

Which distance? Average Euclidean distance: a “synchronized” behaviour distance. Similar objects = almost always in the same place at the same time. Computed on the whole trajectory: $D(\tau_1, \tau_2)|_T = \frac{\int_T d(\tau_1(t), \tau_2(t))\, dt}{|T|}$, where $d(\tau_1(t), \tau_2(t))$ is the distance between moving objects $\tau_1$ and $\tau_2$ at time $t$. Computational aspects: cost = $O(|\tau_1| + |\tau_2|)$, where $|\tau|$ is the number of points in $\tau$. It is a metric, so efficient indexing methods are allowed, e.g. [Frentzos et al. 2007]. Timeseries-based approaches: LCSS, DTW, ERP, EDR. Trajectory-oriented approach: (time-relaxed) route similarity vs. (time-aware) trajectory similarity and variations (speed-pattern based similarity; directional similarity; …) [Pelekis et al. 2007].
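The average Euclidean distance above can be sketched numerically as follows. This is only an illustration under stated assumptions: trajectories are arrays of (t, x, y) rows sorted by time, positions between samples are linearly interpolated, and the integral is approximated with the trapezoidal rule over the common time interval. The function names and the `num_steps` parameter are hypothetical, and the sampling makes this an approximation rather than the exact O(|τ1| + |τ2|) computation mentioned on the slide.

```python
import numpy as np

def position_at(traj, t):
    """Linearly interpolate (x, y) positions of a trajectory at times t.

    traj is an array of rows (t, x, y) sorted by t.
    """
    traj = np.asarray(traj, dtype=float)
    x = np.interp(t, traj[:, 0], traj[:, 1])
    y = np.interp(t, traj[:, 0], traj[:, 2])
    return np.stack([x, y], axis=-1)

def average_euclidean_distance(traj1, traj2, num_steps=1000):
    """Approximate D(tau1, tau2)|_T on the common time interval T."""
    a = np.asarray(traj1, dtype=float)
    b = np.asarray(traj2, dtype=float)
    t_start = max(a[0, 0], b[0, 0])
    t_end = min(a[-1, 0], b[-1, 0])
    if t_end <= t_start:
        raise ValueError("trajectories do not overlap in time")
    t = np.linspace(t_start, t_end, num_steps)
    # Point-wise distance between the two objects at each sampled time.
    d = np.linalg.norm(position_at(a, t) - position_at(b, t), axis=1)
    # Trapezoidal approximation of the integral of d over T.
    integral = np.sum((d[1:] + d[:-1]) / 2.0 * np.diff(t))
    return integral / (t_end - t_start)
```

For example, two objects moving in parallel at a constant offset of 3 units have an average distance of 3.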

Which kind of clustering? K-means; HAC-average; T-OPTICS [Nanni & Pedreschi, 2006]: reachability plot (= object reordering for distance distribution), ε threshold.

TRACLUS: A Partition-and-Group Framework [Lee et al. 2007] Discovers similar portions of trajectories (sub-trajectories) Two phases: partitioning and grouping

What about usage of Mobility Patterns? Visual analytics for mobility data

Visual analytics for mobility data [Andrienko et al. 2007] What is an appropriate way to visualize groups of trajectories?

Summarizing a bunch of trajectories: 1) Trajectories → sequences of “moves” between “places”; 2) For each pair of “places”, compute the number of “moves”; 3) Represent “moves” by arrows (with proportional widths). Figure labels: many small moves; major flow; minor variations.
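Step 2 of the summary above (counting moves per pair of places) can be sketched as below. It assumes each trajectory has already been reduced to a sequence of place labels; the function name and the choice to skip a "move" that stays in the same place are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

def count_moves(place_sequences):
    """Count directed moves (origin, destination) over all trajectories."""
    moves = Counter()
    for seq in place_sequences:
        # Pair every place with its successor in the same trajectory.
        for origin, dest in zip(seq, seq[1:]):
            if origin != dest:  # staying in the same place is not a move
                moves[(origin, dest)] += 1
    return moves
```

The resulting counts could then drive the arrow widths of step 3.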

A word on uncertainty

Handling Uncertainty. Handling uncertainty is a relatively new topic! A lot of research effort has been devoted to developing models for representing uncertainty in trajectories. The most popular one [Trajcevski et al. 2004]: a trajectory of an object is modeled as a 3D cylindrical volume around the tracked trajectory (polyline). Various degrees of uncertainty.

Coming back to our approach

Motivation. Challenge 1: Introduce trajectory fuzziness in spatial clustering techniques. The application of spatial clustering algorithms (k-means, BIRCH, DBSCAN, STING, …) to Trajectory Databases (TD) is not straightforward. Fuzzy clustering algorithms (Fuzzy C-Means and its variants) quantify the degree of membership of each data vector to a cluster. The inherent uncertainty in TD should be taken into account. Challenge 2: Study the nature of the centroid / mean / representative trajectory in a cluster of trajectories. Is it a ‘trajectory’ itself?

Our contribution. I-Un-Tra: An intuitionistic fuzzy vector representation of trajectories, which enables clustering of trajectories by existing (fuzzy or not) clustering algorithms. DUnTra: A distance metric for uncertain trajectories. Cen-Tra: The centroid of a bunch of trajectories, using density and local similarity properties. TR-I-FCM: A novel modification of the FCM algorithm for clustering complex trajectory datasets, exploiting DUnTra and Cen-Tra.

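From Fuzzy sets to Intuitionistic fuzzy sets. Definition 1 (Zadeh, 1965). Let a set E be fixed. A fuzzy set on E is an object of the form $A = \{\langle x, \mu_A(x)\rangle \mid x \in E\}$, where $\mu_A : E \to [0, 1]$ is the membership function. Definition 2 (Atanassov, 1986; Atanassov, 1994). An intuitionistic fuzzy set (IFS) A is an object of the form $A = \{\langle x, \mu_A(x), \gamma_A(x)\rangle \mid x \in E\}$, where $\mu_A : E \to [0, 1]$ and $\gamma_A : E \to [0, 1]$ give the degrees of membership and non-membership, and $0 \le \mu_A(x) + \gamma_A(x) \le 1$ for every $x \in E$.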

Hesitancy. For every element $x \in E$, with membership degree $\mu_A(x)$ and non-membership degree $\gamma_A(x)$, the hesitancy of the element x to the set A is $\pi_A(x) = 1 - \mu_A(x) - \gamma_A(x)$.
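A small sketch of one element of an intuitionistic fuzzy set, using the standard constraint that membership plus non-membership does not exceed 1 and that hesitancy is the remainder. The class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IFSElement:
    """Membership and non-membership degrees of one element x in a set A."""
    membership: float
    non_membership: float

    def __post_init__(self):
        # Both degrees must lie in [0, 1] ...
        if not (0.0 <= self.membership <= 1.0 and 0.0 <= self.non_membership <= 1.0):
            raise ValueError("degrees must lie in [0, 1]")
        # ... and their sum must not exceed 1 (small tolerance for rounding).
        if self.membership + self.non_membership > 1.0 + 1e-12:
            raise ValueError("membership + non-membership must not exceed 1")

    @property
    def hesitancy(self):
        """Hesitancy: whatever is left after membership and non-membership."""
        return 1.0 - self.membership - self.non_membership
```

With membership 0.4 and non-membership 0.2, the hesitancy is 0.4; an ordinary fuzzy set is the special case where hesitancy is always 0.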

Vector representation of trajectories. Assume a regular grid G(m × n) consisting of cells $c_{k,l}$, a trajectory $T_i$ and a target dimension $p \ll n_i$. The “approximate trajectory” consists of p regions (i.e. sets of cells) crossed by $T_i$ during period $p_j$. The “Uncertain Trajectory” is the ε-buffer of […]
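The “approximate trajectory” idea can be sketched as below, under simplifying assumptions that are not from the slides: the p periods have equal duration, the grid covers a known bounding box, and only the sampled points are assigned to cells (cells crossed between two samples are not added). All names are hypothetical.

```python
def approximate_trajectory(points, p, m, n, bounds):
    """Map a trajectory to p regions, each a set of grid cells (k, l).

    points: list of (t, x, y) tuples sorted by t.
    bounds: (x_min, y_min, x_max, y_max) of the m x n grid.
    """
    x_min, y_min, x_max, y_max = bounds
    t0, t1 = points[0][0], points[-1][0]
    regions = [set() for _ in range(p)]
    for t, x, y in points:
        # Index of the period that contains time t (last instant goes to p - 1).
        j = min(int((t - t0) / (t1 - t0) * p), p - 1) if t1 > t0 else 0
        # Indices of the grid cell that contains (x, y).
        k = min(int((x - x_min) / (x_max - x_min) * m), m - 1)
        l = min(int((y - y_min) / (y_max - y_min) * n), n - 1)
        regions[j].add((k, l))
    return regions
```

A trajectory with many points is thereby reduced to a vector of p regions, which is the dimensionality reduction the slide refers to with p << n_i.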

Intuitionistic Uncertain Trajectories. Membership = inside the cell with 100% probability (i.e. thick portions); non-membership = outside the cell with 100% probability (i.e. dotted portions); hesitancy = ignorance whether inside or outside the cell (i.e. solid thin portions). Figure labels: a cell $c_{k,l}$; ε.

Intuitionistic Uncertain Trajectories

Proposed similarity metric (1/2). The distance between two I-UnTra A and B is: [equation not preserved in the transcript]

Proposed similarity metric (2/2). Assuming two intuitionistic fuzzy sets A = (MA, ΓA, ΠA) and B = (MB, ΓB, ΠB) with the same cardinality n, the similarity measure Z between A and B is given by an equation [not preserved in the transcript] built on z(A’, B’), which is defined for fuzzy sets A’ and B’ (e.g. for MA, MB) and applied similarly to ΓA, ΓB and ΠA, ΠB.

An example: A = {x, 0.4, 0.2}, B = {x, 0.5, 0.3}, C = {x, 0.5, 0.2}. C is more similar to A than B is.
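The ordering in this example can be checked with a simple stand-in measure. The sketch below does not implement the authors' similarity measure Z (its formula is not available in the transcript); it uses a normalized Hamming distance over membership, non-membership and hesitancy, which is a common choice for intuitionistic fuzzy sets. Function names are hypothetical.

```python
def hesitancy(mu, gamma):
    """Hesitancy degree implied by membership mu and non-membership gamma."""
    return 1.0 - mu - gamma

def ifs_hamming(a, b):
    """Normalized Hamming distance between two IFSs of equal cardinality.

    a, b: lists of (mu, gamma) pairs, one pair per element of the universe.
    Returns a value in [0, 1]; smaller means more similar.
    """
    if len(a) != len(b):
        raise ValueError("sets must have the same cardinality")
    total = 0.0
    for (mu_a, g_a), (mu_b, g_b) in zip(a, b):
        total += abs(mu_a - mu_b) + abs(g_a - g_b)
        total += abs(hesitancy(mu_a, g_a) - hesitancy(mu_b, g_b))
    return total / (2 * len(a))

A = [(0.4, 0.2)]
B = [(0.5, 0.3)]
C = [(0.5, 0.2)]
```

Under this stand-in measure the distance from A to C is smaller than the distance from A to B, which agrees with the statement on the slide. Note that a measure using membership alone could not separate B from C here, since both have membership 0.5.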

The Centroid Trajectory. The idea (similarity-density-based approach): adopt some local similarity function to identify common sub-trajectories (concurrent existence in space-time), then follow a region-growing approach according to density.

Algorithm CenTra: An example

The Centroid Trajectory

Fuzzy C-Means algorithm. The FCM objective function (in its standard form, with fuzzifier $m > 1$): $J_m(U, V) = \sum_{i=1}^{N} \sum_{k=1}^{c} u_{ik}^{m} \, \lVert x_i - v_k \rVert^2$. Given that $\sum_{k=1}^{c} u_{ik} = 1$ for every $i$, minimizing it requires $u_{ik} = 1 / \sum_{l=1}^{c} \left( \lVert x_i - v_k \rVert / \lVert x_i - v_l \rVert \right)^{2/(m-1)}$ and $v_k = \sum_{i=1}^{N} u_{ik}^{m} x_i / \sum_{i=1}^{N} u_{ik}^{m}$.
The FCM algorithm minimizes intra-cluster variance, but shares the same problems as k-means: it is not guaranteed to converge to the global optimum, the minimum it finds is local, and the results depend on the initial choice of centroids. FCM tries to partition the dataset by just looking at the data vectors, and as such it ignores the fact that these vectors may be accompanied by qualitative information, which may be given per feature (i.e. dimension)!
1. Determine c (1 < c < N), and initialize V(0), j=1.
2. Calculate the membership matrix U(j).
3. Update the centroids’ matrix V(j).
4. If |U(j+1)-U(j)| > ε then j=j+1 and go to Step 2.
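The four steps above correspond to the standard Fuzzy C-Means loop, sketched here with NumPy. This is a minimal illustration, not the authors' code: the initial centroids are passed in explicitly (the slide notes that results depend on this choice), the fuzzifier `m`, the tolerance `eps` and the iteration cap are assumed defaults, and distances are plain Euclidean.

```python
import numpy as np

def fuzzy_c_means(X, V0, m=2.0, eps=1e-5, max_iter=300):
    """Standard FCM. X: (N, dims) data, V0: (c, dims) initial centroids.

    Returns the membership matrix U (N, c) and the centroids V (c, dims).
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(V0, dtype=float).copy()  # Step 1: initial centroids
    N, c = X.shape[0], V.shape[0]
    if not 1 < c < N:
        raise ValueError("need 1 < c < N")
    U = np.zeros((N, c))
    for _ in range(max_iter):
        # Step 2: memberships from distances to every centroid.
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero at a centroid
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: centroids as membership-weighted means.
        W = U_new ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop when the membership matrix no longer changes.
        converged = np.linalg.norm(U_new - U) <= eps
        U = U_new
        if converged:
            break
    return U, V
```

Each row of the returned U sums to 1, so a hard assignment can be read off with `U.argmax(axis=1)` when one is needed.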

CenTR-I-FCM algorithm. The FCM objective function and its minimization conditions are used as in FCM, except that the update-centroid step is ignored and CenTra is used instead.
1. V(0) = c random I-UnTra; j=1;
2. repeat
3. Calculate the membership matrix U(j);
4. Update the centroids’ matrix V(j) using CenTra;
5. Compute the membership and non-membership degrees of V(j);
6. j=j+1; until ||U(j+1)-U(j)||F ≤ ε
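The structural change described above, keeping the fuzzy membership update but delegating the centroid update to an external procedure, can be sketched with a pluggable distance and centroid function. This is only an illustration of the loop shape: `distance` stands in for DUnTra and `centroid_fn` stands in for CenTra, neither of which is implemented here. The weighted-medoid helper is a swapped-in placeholder, and all names and defaults are hypothetical.

```python
import numpy as np

def fuzzy_cluster_custom(items, initial_centroids, distance, centroid_fn,
                         m=2.0, eps=1e-5, max_iter=100):
    """FCM-style loop with a user-supplied distance and centroid update."""
    centroids = list(initial_centroids)
    c = len(centroids)
    U = np.zeros((len(items), c))
    for _ in range(max_iter):
        # Membership update, as in FCM but with the supplied distance.
        d = np.array([[distance(x, v) for v in centroids] for x in items],
                     dtype=float)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Centroid update delegated to the supplied procedure.
        centroids = [centroid_fn(items, U_new[:, k]) for k in range(c)]
        converged = np.linalg.norm(U_new - U) <= eps
        U = U_new
        if converged:
            break
    return U, centroids

def weighted_medoid(distance, m=2.0):
    """Placeholder centroid procedure: the item with least weighted cost."""
    def centroid_fn(items, weights):
        costs = [sum((w ** m) * distance(cand, x)
                     for x, w in zip(items, weights))
                 for cand in items]
        return items[int(np.argmin(costs))]
    return centroid_fn
```

Because nothing in the loop assumes vector arithmetic on the items, the same shape works for objects such as trajectories, for which a mean is not directly defined.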

Experiments (1/2). Dataset: ‘Athens trucks’ MOD (www.rtreeportal.org): 50 trucks, 1100 trajectories, 112,300 position records.

Experiments (2/2). Use CommonGIS [Andrienko et al., 2007] to identify real clusters: “round trips” clusters and “linear” clusters.

Results (clustering accuracy when scaling cell size and ε). The density threshold is fixed to δ = 2% of the total number of trajectories.

Results (clustering accuracy when scaling the density threshold δ). The uncertainty is fixed to ε = 1.

Results (scaling the number of clusters)

Results (scaling the dataset cardinality)

Results (Quality of CenTra). Representative Trajectories vs. Centroid Trajectories. Panel parameters: cell size = 1.3%, ε = 0, δ = 0.09; cell size = 2.8%, ε = 0, δ = 0.02. Figure 6(a) illustrates the outcome of TRACLUS. Evidently, the cluster representative (red line) does not fit the real movement, mainly due to its averaging technique. Recall at this point that TRACLUS clusters segments rather than whole trajectories (even considering this, the algorithm does not capture the turn occurring at the bottom of the figure). On the other hand, Figure 6(b) and Figure 6(c) illustrate CenTra, produced with variable cell size, ε and density δ. It turns out that CenTra not only resides on the data traces, but also suppresses the non-interesting movement details (the ‘noisy’ infrequent parts are not part of the centroid), catches turns, and becomes thicker in portions where something interesting (i.e. dense, similar sub-trajectories) happens.

Conclusions. We proposed a three-step approach for clustering trajectories of moving objects, motivated by the observation that clustering and representation issues in TD are inherently subject to uncertainty. 1st step: an intuitionistic fuzzy vector representation of trajectories, plus a distance metric consisting of a metric for sequences of regions and a metric for intuitionistic fuzzy sets. 2nd step: Algorithm CenTra, a novel technique for discovering the centroid of a bundle of trajectories. 3rd step: Algorithm CenTR-I-FCM, for clustering trajectories under uncertainty.

Future Work. Devise a clever sampling technique for multi-dimensional data so as to diminish the effect of initialization in the algorithm; exploit the metric properties of the proposed distance function by using a distance-based index structure (for efficiency purposes); perform extensive experimental evaluation using large trajectory datasets.

Acknowledgements. Research partially supported by the FP7 ICT/FET Project MODAP (Mobility, Data Mining, and Privacy) funded by the European Union (URL: www.modap.org), a continuation of the FP6-14915 IST/FET Project GeoPKDD (Geographic Privacy-aware Knowledge Discovery and Delivery) funded by the European Union (URL: www.geopkdd.eu). Some slides are from: Fosca Giannotti, Dino Pedreschi, and Yannis Theodoridis, “Geographic Privacy-aware Knowledge Discovery and Delivery”, EDBT Tutorial, 2009.

Back up slides

Examples of mobility patterns exploitation. Trajectory density-based queries: find hot-spots (popular places) [Giannotti et al. 2007]; find T-Patterns [Giannotti et al. 2007]; find hot motion paths [Sacharidis et al. 2008]; find typical trajectories [Lee et al. 2007]; identify flocks & leaders [Benkert et al. 2008]. Figure labels: axes X, Y, T; δt; ε.

Which kind of clustering? General requirements: non-spherical clusters should be allowed (e.g. a traffic jam along a road = “snake-shaped” cluster); tolerance to noise; low computational cost; applicability to complex, possibly non-vectorial data.

Temporal focusing. Different time intervals can show different behaviours, e.g. objects that are close to each other within a time interval can be much more distant in other periods of time. The time interval becomes a parameter, e.g. rush hours vs. low traffic times. Already supported by the distance measure: just compute $D(\tau_1, \tau_2)|_{T'}$ on a time interval $T' \subseteq T$. Problem: significant T’ are not always known a priori, so an automated mechanism is needed to find them.

TRACLUS – representative trajectory. The representative trajectory of the cluster: 1) Compute the average direction vector and rotate the axes temporarily. 2) Sort the starting and ending points by the coordinate of the rotated axis. 3) While scanning the starting and ending points in the sorted order, count the number of line segments and compute the average coordinate of those line segments.
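The sweep described above can be sketched as follows. This is a simplified reading of the slide, not the TRACLUS reference implementation: segments are 2D point pairs, the axes are rotated so that X' follows the average direction, and at each sorted endpoint the average Y' of the segments crossing that position is taken. The `min_lines` threshold (emit a point only when enough segments are crossed) is an assumed parameter that the slide does not mention.

```python
import math

def representative_trajectory(segments, min_lines=2):
    """segments: list of ((x1, y1), (x2, y2)); returns a list of (x, y)."""
    # Average direction vector of the cluster's segments.
    dx = sum(e[0] - s[0] for s, e in segments)
    dy = sum(e[1] - s[1] for s, e in segments)
    angle = math.atan2(dy, dx)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def rotate(p):  # into the frame whose X' axis is the average direction
        return (p[0] * cos_a + p[1] * sin_a, -p[0] * sin_a + p[1] * cos_a)

    def unrotate(p):  # back to the original frame
        return (p[0] * cos_a - p[1] * sin_a, p[0] * sin_a + p[1] * cos_a)

    rotated = []
    for s, e in segments:
        a, b = rotate(s), rotate(e)
        rotated.append((a, b) if a[0] <= b[0] else (b, a))

    # Starting and ending points sorted by their X' coordinate.
    sweep_xs = sorted({p[0] for seg in rotated for p in seg})

    result = []
    for x in sweep_xs:
        ys = []
        for (x1, y1), (x2, y2) in rotated:
            if x1 <= x <= x2:  # this segment is crossed by the sweep line
                if x2 == x1:
                    ys.append((y1 + y2) / 2.0)
                else:
                    ys.append(y1 + (y2 - y1) * (x - x1) / (x2 - x1))
        if len(ys) >= min_lines:
            result.append(unrotate((x, sum(ys) / len(ys))))
    return result
```

For two parallel horizontal segments, the result runs midway between them, which illustrates the averaging behaviour that the results slide contrasts with CenTra.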

Trajectory Uncertainty vs. Anonymization. Never Walk Alone [Bonchi et al. 2008]: trade uncertainty for anonymity, i.e. trajectories that are close up to the uncertainty threshold are indistinguishable. Combines k-anonymity and perturbation. Two steps: 1) cluster trajectories into groups of k similar ones (removing outliers); 2) perturb the trajectories in a cluster so that each one is close to each other up to the uncertainty threshold.

Qualitative evaluation of Z