Time-focused density-based clustering of trajectories of moving objects
Margherita D’Auria, Mirco Nanni, Dino Pedreschi

Presentation transcript:

Time-focused density-based clustering of trajectories of moving objects
Margherita D’Auria, Mirco Nanni, Dino Pedreschi

2 Plan of the talk
Introduction
  - Motivations
  - Problem & context
  - Density-based clustering (OPTICS)
Density-based clustering on trajectories
  - Trajectory data model & distance measure
  - Results
Temporal focusing
  - A clustering quality measure
  - Heuristics for the optimal temporal interval
Conclusions & future work

3 Motivations
Plenty of current and future data sources for spatio-temporal data
Sophisticated analysis methods are required in order to fully exploit them
  - Data mining methods
  - Which kind of patterns/models?
Main objectives
  - A better understanding of the application domain
  - An improvement for private and public services

4 Problem & context
A distinguishing case: mobile devices
  - PDAs
  - Mobile phones
  - LBS-enabled devices (may include the two above)
They (can) yield traces of their movement
An important problem:
  - Discovering groups of individuals that (approximately) move together in some period of time
  - E.g.: detection of traffic jams during rush hours
A candidate data mining reformulation of the problem:
  - Clustering of individuals’ trajectories

5 Which kind of clustering?
Several alternatives are available
General requirements:
  - Non-spherical clusters should be allowed
    E.g.: a traffic jam along a road should be represented as a cluster whose individuals form a “snake-shaped” group
  - Tolerance to noise
  - Low computational cost
  - Applicability to complex, possibly non-vectorial data
A suitable candidate: density-based clustering
  - In particular, we adopt OPTICS

6 A crushed intro to OPTICS
A density threshold is defined through two parameters:
  - ε: a neighborhood radius
  - MinPts: a minimum number of points
Key concepts:
  - Core objects: objects whose ε-neighborhood contains at least MinPts objects
  - Reachability-distance reach-d(p, q): (simplified definition) the distance between objects p and q
Example:
  - Object q is a core object if MinPts = 2
  - Object p is not
  - Their reach-d() is shown
[Figure: the ε-neighborhood of q and reach-d(p, q)]
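A minimal sketch of these notions in Python, under the simplified definition of reach-d given on the slide; the names (neighborhood, is_core, reach_dist, eps, min_pts) and the use of Euclidean distance on 2D points are illustrative assumptions, not part of the original slides.

```python
# Sketch of the OPTICS neighborhood notions (illustrative only).
from math import dist as euclid  # Euclidean distance between two points

def neighborhood(o, dataset, eps):
    """All objects within radius eps of o (including o itself)."""
    return [p for p in dataset if euclid(o, p) <= eps]

def is_core(o, dataset, eps, min_pts):
    """o is a core object if its eps-neighborhood holds at least min_pts objects."""
    return len(neighborhood(o, dataset, eps)) >= min_pts

def reach_dist(p, q, dataset, eps, min_pts):
    """Simplified reachability-distance used on the slide: the distance between p
    and q, defined only when q is a core object."""
    if not is_core(q, dataset, eps, min_pts):
        return None
    return euclid(p, q)
```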

7 A crushed intro to OPTICS
The algorithm:
  1. Repeatedly choose a non-visited object at random, until a core object is selected
  2. Select the non-visited core object having the smallest reachability-distance from the visited core objects; if none can be found, go to step 1
Output: reach-d() of all visited points, in order of visit (the reachability plot)
[Figure: reachability plot with a reachability threshold; the “jump” separates the left-hand group (objects 0-9, Cluster 1) from the right-hand one (objects 10-18, Cluster 2)]
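A compact, unoptimized sketch of this ordering loop, again assuming 2D points and Euclidean distance. It is illustrative only: it visits every density-reachable object (core or not), which is closer to full OPTICS than the two-step summary above, and it omits core-distances and the priority queue used by the real algorithm. Names such as optics_order, eps and min_pts are assumptions.

```python
import math

def optics_order(dataset, eps, min_pts):
    """Return [(object index, reachability)] in visit order; None marks an undefined reach-d."""
    unvisited = set(range(len(dataset)))
    plot = []  # the reachability plot: (index, reachability-distance) pairs

    def dist(i, j):
        return math.dist(dataset[i], dataset[j])

    def is_core(i):
        return sum(1 for j in range(len(dataset)) if dist(i, j) <= eps) >= min_pts

    while unvisited:
        # Step 1: pick non-visited objects until a core object is found
        seed = next((i for i in unvisited if is_core(i)), None)
        if seed is None:
            break  # only non-core (noise) objects are left
        unvisited.discard(seed)
        plot.append((seed, None))
        visited_cores = [seed]
        while unvisited:
            # Step 2: among the non-visited objects, take the one with the smallest
            # distance from the visited core objects (simplified reach-d)
            r, i = min((min(dist(i, c) for c in visited_cores), i) for i in unvisited)
            if r > eps:
                break  # nothing is density-reachable: go back to step 1
            unvisited.discard(i)
            plot.append((i, r))
            if is_core(i):
                visited_cores.append(i)
    return plot
```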

8 Applying OPTICS to trajectories
Two key issues have to be solved:
  - A suitable representation for trajectories is needed
    Which data model for trajectories?
  - A means of comparing trajectories has to be provided
    Which distance between objects? OPTICS needs one in order to perform range queries

9 A trajectory data model
Raw input data:
  - Each trajectory is represented as a sequence of time-stamped coordinates
  - T = (t_1, x_1, y_1), …, (t_n, x_n, y_n)  =>  the object position at time t_i was (x_i, y_i)
Data model:
  - Parametric spaghetti: linear interpolation between consecutive points
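A minimal sketch of the interpolation behind the parametric-spaghetti model, assuming trajectories are stored as time-sorted (t, x, y) tuples; the function name position_at is an illustrative assumption.

```python
from bisect import bisect_right

def position_at(trajectory, t):
    """trajectory: list of (t, x, y) tuples sorted by t; returns (x, y) at time t."""
    times = [p[0] for p in trajectory]
    if not (times[0] <= t <= times[-1]):
        raise ValueError("t is outside the trajectory's lifespan")
    i = bisect_right(times, t)
    if i == len(times):          # t equals the last timestamp
        return trajectory[-1][1:]
    t0, x0, y0 = trajectory[i - 1]
    t1, x1, y1 = trajectory[i]
    if t1 == t0:
        return (x0, y0)
    a = (t - t0) / (t1 - t0)     # interpolation coefficient in [0, 1]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
```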

10 A distance between trajectories
Adopted distance = average distance, i.e. the spatial distance between the two trajectories averaged over time
It is a metric => efficient indexing methods are allowed
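A numerical sketch of such an average distance, approximating the time-average of the pointwise Euclidean distance by sampling a common time interval. It reuses position_at from the previous sketch; the sampling approach (rather than exact integration over the linear segments) is an assumption made for brevity.

```python
import math

def avg_distance(traj_a, traj_b, t_start, t_end, samples=100):
    """Approximate average spatial distance between two trajectories over [t_start, t_end]."""
    total = 0.0
    for k in range(samples + 1):
        t = t_start + (t_end - t_start) * k / samples
        xa, ya = position_at(traj_a, t)   # interpolated positions at time t
        xb, yb = position_at(traj_b, t)
        total += math.hypot(xa - xb, ya - yb)
    return total / (samples + 1)
```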

11 A sample dataset
A set of trajectories forming 4 clusters + noise
Generated by the CENTRE system (KDDLab software)

12 OPTICS vs. HAC & K-means
[Figure: clustering results on the sample dataset obtained with K-means, HAC-average and OPTICS]

13 Temporal focusing
Different time intervals can show different behaviours
  - E.g.: objects that are close to each other within one time interval can be far apart in other periods of time
The time interval becomes a parameter
  - E.g.: rush hours vs. low-traffic times
Problem: significant time intervals are not always known a priori
  - An automated mechanism is needed to find them

14 Temporal focusing
The proposed method:
  1. Provide a notion of interestingness to be associated with time intervals
     We define it in terms of the estimated quality of the clustering extracted on the given time interval
  2. Formalize the temporal focusing task as an optimization problem
     Discover the time interval that maximizes the interestingness measure

15 A quality measure for density-based clustering
General principle:
  - High-density clusters separated by low-density noise are preferred
The method:
  - High-density clusters correspond to low dents (valleys) in the reachability plot
  - => Evaluate the global quality Q of the clustering output as the (negated) average reachability within clusters (noise is discarded)
[Figure: reachability plot with low-, medium- and high-density regions]
Definition: given ε and dataset D, compute Q_{D,ε} as:
  Q_{D,ε} = −R(D, ε') = −AVG_{o ∈ D'} reach-d(o),   where D' = D − {noise objects}
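A sketch of this measure computed on a reachability plot, assuming clusters and noise are separated by a simple reachability threshold eps_prime; the plot uses the (index, reach-d) format of the optics_order sketch above, and the helper name quality is an assumption.

```python
def quality(plot, eps_prime):
    """Negated average reachability of the clustered (non-noise) objects.
    plot: [(object index, reach_d)] as returned by optics_order; reach_d may be None."""
    within = [r for _, r in plot if r is not None and r <= eps_prime]
    if not within:
        return float("-inf")  # no clustered objects at this threshold
    return -sum(within) / len(within)
```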

16 FAQs
How is Q() computed for a given time interval I?
  - Step 1: trajectory segments outside I are clipped away
  - Step 2: OPTICS is run on the clipped trajectories
  - Step 3: Q(I) is computed on the output reachability plot
How is the reachability threshold set for each interval?
  - A reachability threshold is needed in order to locate clusters (and noise)
  - The threshold for the largest I is manually set by the user
  - Thresholds for other intervals I' ⊆ I are computed from the first one by proportionally rescaling w.r.t. the average reachability
Is the optimal Q(I) biased towards tiny intervals?
  - Yes. The problem has been fixed by defining Q'(I) = Q(I) / log |I|
  - => A small decrease in Q(I) is accepted when it yields a much larger I
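A sketch of the per-interval score that ties these steps together; it assumes the earlier sketches (OPTICS run on the clipped trajectories producing a reachability plot, and quality()), and the proportional threshold rescaling shown here is a plain reading of the slide rather than the authors' exact formula.

```python
import math

def interval_score(clipped_plot, base_threshold, base_avg_reach, interval_length):
    """clipped_plot: reachability plot from OPTICS run on trajectories clipped to I.
    base_threshold / base_avg_reach refer to the largest interval (set by the user).
    interval_length is |I|, assumed > 1 so that log |I| > 0."""
    reaches = [r for _, r in clipped_plot if r is not None]
    if not reaches:
        return float("-inf")  # no reachable objects in this interval
    avg_reach = sum(reaches) / len(reaches)
    # Rescale the user-set threshold proportionally to this interval's average reachability
    threshold = base_threshold * avg_reach / base_avg_reach
    q = quality(clipped_plot, threshold)       # Q(I)
    return q / math.log(interval_length)       # Q'(I) = Q(I) / log |I|
```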

17 Experiments
A more complex sample dataset (generated by CENTRE)
  - Clear clusters in the central time interval vs. dispersion at the borders

18 Optimizing Q()
Find the optimal Q() by plotting its values for all time intervals
  - The optimum corresponds to the central time interval
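A sketch of the exhaustive evaluation over all sub-intervals with endpoints on a discrete time grid; eval_interval stands for the clip, cluster and score pipeline sketched above and, like the other helper names here, is an assumption.

```python
def exhaustive_search(time_grid, eval_interval):
    """Evaluate Q'(I) for every sub-interval [t_start, t_end] of the time grid
    and return the best one (O(N^2) evaluations, each requiring an OPTICS run)."""
    best, best_score = None, float("-inf")
    for i, t_start in enumerate(time_grid):
        for t_end in time_grid[i + 1:]:
            score = eval_interval(t_start, t_end)
            if score > best_score:
                best, best_score = (t_start, t_end), score
    return best, best_score
```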

19 Heuristics for optimum search
Each Q() value computation requires a run of the OPTICS algorithm
Computing all O(N²) values is too expensive (N = number of base time units, giving O(N²) candidate sub-intervals)
Alternative approaches are needed
Preliminary tests with a hill-climbing (i.e., greedy) approach:
  - Test on the same dataset
  - Global optimum found in 70.7% of the runs
  - Avg. number of steps: 17
  - Avg. OPTICS runs: 49
[Figure: Q() landscape with starting points, local optima and the global optimum]
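A sketch of a hill-climbing search in the spirit of the greedy approach mentioned above (not the authors' implementation): an interval is a pair of indices on a discrete time grid, neighbours move one endpoint by one step, and eval_interval is the assumed scoring helper from the previous sketches.

```python
def hill_climb(start, n_points, eval_interval):
    """Greedy search from a starting interval (i, j) with 0 <= i < j < n_points."""
    current, current_score = start, eval_interval(*start)
    while True:
        i, j = current
        # Neighbouring intervals: shift one endpoint by one grid step
        neighbours = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= i + di < j + dj < n_points]
        if not neighbours:
            return current, current_score
        best_score, best = max((eval_interval(a, b), (a, b)) for a, b in neighbours)
        if best_score <= current_score:
            return current, current_score  # local optimum reached
        current, current_score = best, best_score
```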

20 Conclusions & future work
Summary of the work:
  - Extension of OPTICS to a trajectory data model & distance
  - Definition of the temporal focusing problem
  - Definition of a clustering quality measure
  - (Preliminary) tests with exhaustive & greedy optimization
Future work:
  - Experimental validation over broader benchmarks
  - Tighter integration between OPTICS and the search strategy
  - Alternative, domain-specific definitions of quality measures