On Discovering Moving Clusters in Spatio-temporal Data Panos Kalnis National University of Singapore Nikos Mamoulis University of Hong Kong Spiridon Bakiras.


On Discovering Moving Clusters in Spatio-temporal Data
Panos Kalnis (National University of Singapore), Nikos Mamoulis (University of Hong Kong), Spiridon Bakiras (Hong Kong University of Science and Technology)

What is a Moving Cluster?
- Dense clusters of objects that move similarly for a long time period
- Not necessarily the same objects during the lifetime of the cluster
- Examples: migrating animals, convoys of cars, military applications
- Solutions: efficient exact and approximate algorithms

Problem Formulation
- Example:
- Moving cluster
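The formula on this slide did not survive in the transcript. A minimal sketch, assuming the usual criterion that two clusters from consecutive timeslices belong to the same moving cluster when their shared objects, divided by all objects in either cluster, reach the threshold θ used on the later slides. The function names `overlap` and `continues` are hypothetical.

```python
def overlap(c_prev, c_next):
    """Shared objects divided by all objects of the two clusters (assumed criterion)."""
    union = c_prev | c_next
    return len(c_prev & c_next) / len(union) if union else 0.0


def continues(c_prev, c_next, theta):
    """True if c_next may extend a moving cluster that currently ends in c_prev."""
    return overlap(c_prev, c_next) >= theta


# Object ids are illustrative; one object leaves and one joins between timeslices.
c1 = {"o1", "o2", "o3", "o4"}
c2 = {"o2", "o3", "o4", "o5"}
print(overlap(c1, c2))          # 3 shared objects out of 5 in total
print(continues(c1, c2, 0.5))   # the cluster survives with theta = 0.5
print(continues(c1, c2, 0.9))   # but not with theta = 0.9
```

With this criterion the member set may change gradually, which matches the earlier point that a moving cluster need not contain the same objects throughout its lifetime.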

Related Work (Static)
- Partition-based clustering (k-medoids)
- Hierarchical clustering (BIRCH, CURE)
- Density-based clustering (DBSCAN)
[Figure: DBSCAN neighborhoods of radius ε, MinPts=3]
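Since the rest of the talk builds on DBSCAN, here is a minimal textbook-style sketch of it with the two parameters named on the slide (ε and MinPts). It uses a quadratic neighbor scan for brevity and is not the authors' implementation; the sample points are made up.

```python
from math import dist


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        labels[i] = cluster_id           # i is a core point: start a cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```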

Related Work (Moving Objects)
- Grouping trajectories [Vlachos et al., ICDE 02]
  - Trajectory cluster: constant set of objects through its lifetime
  - Only similar movement; no space proximity
- Dense areas over time [Hadjieleftheriou et al., SSTD 03]
  - Static dense regions
  - No common objects between regions in sequence
- Incremental DBSCAN/OPTICS [Ester et al., VLDB 98]
  - Only a small percentage of objects moves
- Maintaining Data Bubbles [Nassar et al., SIGMOD 04]
  - Redistributes updated objects in existing bubbles

MC1: The Straightforward Approach
- G: set of moving clusters
- Apply clustering to the next timeslice S_i
- Expand moving clusters in G
- Add new moving clusters to G
- Report ending clusters
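The MC1 loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: clusters are sets of object ids, the continuation test is the overlap ratio against θ, and `cluster_fn` is a hypothetical callback that clusters one timeslice (in the demo the timeslices are already clustered, so it is the identity).

```python
def jaccard(a, b):
    """Shared objects divided by all objects of the two clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mc1(timeslices, theta, cluster_fn):
    """Return moving clusters, each a list of per-timeslice clusters (sets)."""
    active = []      # G: moving clusters that may still be extended
    finished = []    # moving clusters that ended in an earlier timeslice
    for snapshot in timeslices:
        clusters = cluster_fn(snapshot)
        used = set()
        next_active = []
        for g in active:
            extended = False
            # Compare the last cluster of g with every cluster of this timeslice.
            for k, c in enumerate(clusters):
                if jaccard(g[-1], c) >= theta:
                    next_active.append(g + [c])
                    used.add(k)
                    extended = True
            if not extended:
                finished.append(g)          # report the ending cluster
        # Clusters that extended nothing start new moving clusters.
        for k, c in enumerate(clusters):
            if k not in used:
                next_active.append([c])
        active = next_active
    return finished + active


slices = [
    [{1, 2, 3, 4}, {7, 8, 9}],
    [{2, 3, 4, 5}, {7, 8, 9}],
    [{3, 4, 5, 6}],
]
result = mc1(slices, theta=0.5, cluster_fn=lambda s: s)
for g in result:
    print(len(g), g)
```

The nested loop over `active` and `clusters` is the all-pairs comparison that the following slides set out to avoid.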

Hash-based DBSCAN
- Memory:
- 10M objects with 1GB RAM
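The slide does not say how the hashing is organized. One common layout, shown here purely as an assumption, hashes each point into a grid cell of side ε, so that every ε-neighbor of a point lies in the 3×3 block of cells around it. The names `build_grid` and `region_query` are hypothetical.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, eps):
    """Hash every point index into the grid cell (side eps) that contains it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(idx)
    return grid


def region_query(points, grid, i, eps):
    """Indices within eps of points[i], scanning only the 3x3 surrounding cells."""
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if dist(points[i], points[j]) <= eps:
                    hits.append(j)
    return hits


points = [(0.2, 0.2), (0.9, 0.2), (1.6, 0.2), (5.0, 5.0)]
grid = build_grid(points, eps=1.0)
print(sorted(region_query(points, grid, 0, 1.0)))  # [0, 1]
print(sorted(region_query(points, grid, 1, 1.0)))  # [0, 1, 2]
```

A query of this kind can replace the full scan in a DBSCAN implementation without changing its result.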

MC1 is inefficient!
1. Checks all possible combinations of clusters in consecutive timeslices
2. Performs clustering for every timeslice

MC2: Minimizing Redundant Checks
- Clustering in every timeslice
- Select a random object in c_1
- Search for the object in S_2
- Repeat for the remaining objects
- Max: (1-θ)|c_i| objects
[Figure: c_1 c_2 is a moving cluster]
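A sketch of the MC2 idea, assuming the overlap-ratio criterion with threshold θ. A matching cluster must contain at least θ|c| objects of c, so at most (1-θ)|c| randomly chosen objects can miss it; the sketch therefore stops after that many misses plus one lookup. The function name, the `owner` lookup table, and the `rng` parameter are hypothetical.

```python
import random
from math import ceil


def jaccard(a, b):
    """Shared objects divided by all objects of the two clusters."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mc2_match(c_prev, next_clusters, theta, rng=random):
    """Index of a cluster in next_clusters that continues c_prev, or None."""
    # Object -> cluster index; a real implementation would build this once per timeslice.
    owner = {o: k for k, c in enumerate(next_clusters) for o in c}
    min_shared = ceil(theta * len(c_prev) - 1e-9)   # objects a match must share
    max_tries = len(c_prev) - min_shared + 1        # misses allowed, plus one
    tested = set()
    for obj in rng.sample(sorted(c_prev), min(max_tries, len(c_prev))):
        k = owner.get(obj)
        if k is None or k in tested:
            continue                                # object vanished or cluster already checked
        tested.add(k)
        if jaccard(c_prev, next_clusters[k]) >= theta:
            return k
    return None


c_prev = set(range(1, 11))
next_ok = [{1, 2, 3, 4, 5, 6, 7, 8, 9, 11}, {10, 12, 13}]
next_split = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9, 10}]
print(mc2_match(c_prev, next_ok, 0.8, random.Random(0)))     # 0
print(mc2_match(c_prev, next_split, 0.8, random.Random(0)))  # None
```

Each candidate cluster is compared at most once, so the all-pairs comparison of MC1 is replaced by a few hash lookups per cluster.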

Ambiguity Cases: θ < 0.5
- {c_0 c_1, c_2}
- {c_0 c_2, c_1}

MC3: Approximate Moving Clusters
- Intuition: many clusters will remain the same even if objects move
- Avoid performing clustering in every timeslice
- For an object o:
  - If o belongs to cluster c in timeslice S_i
  - Assume that o also belongs to c in the next timeslice (note: objects may have moved)

Refine clusters
- Hash new clusters in a grid
- Legal cluster:
  - Does not meet/intersect with other clusters
  - It is connected (cells meet)
- Objects in legal clusters are not considered further
- For the rest of the objects, perform clustering
- Possible inaccuracies!
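The legality test can be sketched on a grid as follows. The reading used here is an assumption: two clusters "meet or intersect" when they occupy the same cell or adjacent cells (8-neighborhood), and a cluster is "connected" when its occupied cells form a single adjacent group. The cell size and all function names are hypothetical.

```python
NEIGHBORHOOD = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def cells_of(points, cell):
    """Grid cells (side `cell`) occupied by the points of one cluster."""
    return {(int(x // cell), int(y // cell)) for x, y in points}


def touches(a, b):
    """True if some cell of a equals or is adjacent to some cell of b."""
    return any((x + dx, y + dy) in b for x, y in a for dx, dy in NEIGHBORHOOD)


def connected(cells):
    """True if the occupied cells form one group of mutually adjacent cells."""
    if not cells:
        return False
    start = next(iter(cells))
    seen, stack = {start}, [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in NEIGHBORHOOD:
            n = (x + dx, y + dy)
            if n in cells and n not in seen:
                seen.add(n)
                stack.append(n)
    return len(seen) == len(cells)


def legal_clusters(clusters, cell):
    """One boolean per cluster: isolated from all other clusters and connected."""
    grids = [cells_of(pts, cell) for pts in clusters]
    return [
        connected(g) and not any(touches(g, h) for j, h in enumerate(grids) if j != i)
        for i, g in enumerate(grids)
    ]


clusters = [
    [(0.5, 0.5), (1.5, 0.5)],      # touches the third cluster
    [(5.5, 5.5), (8.5, 5.5)],      # isolated but split into two parts
    [(2.5, 0.5)],                  # touches the first cluster
    [(10.5, 10.5), (11.5, 10.5)],  # isolated and connected
]
print(legal_clusters(clusters, cell=1.0))  # [False, False, False, True]
```

Only the objects of clusters flagged `False` would need to be clustered again.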

Minimize Error
- Perform exact clustering to absorb (may not eliminate) the accumulated error
- Period for exact clustering: grows linearly, drops exponentially
- Exact clustering: if more than α|G| clusters have been added/removed
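One possible reading of "grows linearly, drops exponentially" is an additive-increase, multiplicative-decrease rule for the number of timeslices between exact clusterings. The sketch below assumes that reading; the `step` and `factor` parameters and the function name are hypothetical, and only the α|G| trigger comes from the slide.

```python
def next_period(period, changed, total, alpha, step=1, factor=2):
    """Timeslices to wait before the next exact clustering.

    period  -- current number of timeslices between exact clusterings
    changed -- clusters added or removed since the last exact clustering
    total   -- |G|, the current number of moving clusters
    """
    if changed > alpha * total:
        return max(1, period // factor)  # too much drift: back off sharply
    return period + step                 # stable: wait a little longer


period = 4
for changed in (2, 3, 25, 1):
    period = next_period(period, changed, total=100, alpha=0.1)
    print(period)  # 5, 6, 3, 4
```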

Experimental Evaluation
- 10K-50K objects per timeslice
- timeslices, up to 5M objects
- Linux, C++, 1.3GHz CPU, 1.2GB RAM
- Generator: clusters move/rotate, objects appear/disappear

Varying data size (10K-50K per timeslice)
- θ=0.9, α=0.1
- Larger dataset: larger clusters, more interactions
[Figure annotation: Avg: 87%]

Varying number of clusters ( per timeslice)
- 5M objects, θ=0.9, α=0.1
- Many clusters: reaches the error threshold fast
[Figure annotations: 96%, 87%, 73%]

Varying α
- 5M objects, θ=0.9, 800 clusters
- α small: may not recover!

Varying α for different agilities
- Low agility: fewer errors, hence faster

MC3 for varying θ
- 5M objects, α=0.1, 800 clusters
- θ large: incorrect clusters are pruned for not satisfying the θ criterion

Conclusions
- Moving clusters
  - Objects may move/change
  - Exact and approximate solutions
- Future work
  - Automatic setting of parameter α
  - Better error estimation
  - Constraints (e.g., a moving cluster must span at least k timeslices)

Questions?