Clustering Uncertain Taxi data


Clustering Uncertain Taxi Data. By: Jianfeng Zhu, Salwa Aljehani. Kent State University.

Outline: Introduction; Problem definition and solution; Experiment; Conclusion.

Introduction: Consider a database of location data reported by moving taxis through a GPS system. Data clustering is used to discover cluster patterns in a dataset: the dataset is partitioned into several groups such that the data within the same cluster are closer to each other, or more similar (based on some distance function), than to the data in any other cluster. Uncertainty in location: for moving objects such as taxis, the actual locations may have changed by the time their reported location data is received. This work extends traditional clustering methods to handle such uncertain data.

Problem definition and Solution: We cluster taxi data from a holiday weekend (Sunday, Feb 3rd, 2008) in Beijing, China, to find the hotspots for visitors. There are many methods for clustering data, such as K-means clustering. K-means clustering represents each cluster by a centroid (the mean of the data in the cluster). It assigns each vehicle to one of the K clusters such that its location is closer in Euclidean distance to that cluster's representative than to any other cluster's representative. The representative of each cluster is then updated to the mean of the locations of the vehicles in the cluster, and each vehicle is re-assigned to one of the K clusters using the new representatives. This process repeats until some objective is met, e.g. no vehicle changes cluster between two successive iterations.
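The assign-and-update loop described on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `kmeans`, the `init` and `seed` parameters, and the toy coordinates are assumptions made for the example, and the slides do not say how the initial representatives are chosen.

```python
import numpy as np

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain K-means on an (n, 2) array of vehicle locations."""
    points = np.asarray(points, dtype=float)
    if init is None:
        # Assumption: start from k distinct input points picked at random.
        rng = np.random.default_rng(seed)
        centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    else:
        centroids = np.array(init, dtype=float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Euclidean distance from every point to every representative: (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no vehicle changed cluster between two iterations
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old representative if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy locations (hypothetical): two well-separated groups of three taxis.
taxis = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(taxis, k=2, init=[[0, 0], [10, 10]])
```

K-means can settle in a poor local optimum for unlucky starting representatives, which is why the example passes `init` explicitly rather than relying on the random choice.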

Problem definition and Solution: In the case of uncertainty, the data in the database is not fully accurate. Each taxi record has an "uncertainty region" around the reported position, and the taxi's actual location lies within this region. The uncertainty region can be a circle centered at the taxi's reported location, with a radius determined by the taxi's maximum speed; the radius assumes one time stamp of travel, i.e. the distance the taxi could cover at maximum speed in one time stamp.

Problem definition and Solution: For the probability of the vehicle's actual location being at a particular point of the region, the uncertainty region can be associated with an arbitrary probability density function (pdf). Here a uniform distribution is used, based on the total number of samples inside the region. The pdf need not be uniform: for example, if part of the region is sea or buildings, that part may be assigned a total probability of 0.1, with the remaining probability of 0.9 for the taxi being at any point in the rest of the region.
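One way to make the sampled uncertainty region concrete is sketched below, assuming a circular region and a discrete set of sample points. The helper names `sample_region` and `sample_weights`, the unit-radius region, and the rule used to mark samples as "blocked" (sea or building) are all hypothetical; the 0.1 / 0.9 split is the example from the slide.

```python
import numpy as np

def sample_region(center, radius, n, rng):
    """Draw n points uniformly from the disc of the given radius around center."""
    # The square root on the radial draw makes the density uniform over area.
    r = radius * np.sqrt(rng.random(n))
    theta = 2.0 * np.pi * rng.random(n)
    offsets = np.column_stack((r * np.cos(theta), r * np.sin(theta)))
    return np.asarray(center, dtype=float) + offsets

def sample_weights(blocked, blocked_mass=0.1):
    """Per-sample probabilities: blocked samples share blocked_mass in total."""
    blocked = np.asarray(blocked, dtype=bool)
    n, n_blocked = len(blocked), int(blocked.sum())
    if n_blocked == 0 or n_blocked == n:
        # Nothing to split: fall back to the uniform pdf, 1/n per sample.
        return np.full(n, 1.0 / n)
    return np.where(blocked,
                    blocked_mass / n_blocked,
                    (1.0 - blocked_mass) / (n - n_blocked))

rng = np.random.default_rng(0)
samples = sample_region(center=(0.0, 0.0), radius=1.0, n=200, rng=rng)
blocked = samples[:, 0] > 0.5  # hypothetical stand-in for a sea/building test
weights = sample_weights(blocked)
```

With no blocked samples the weights reduce to 1/n each, which matches the uniform case based on the number of samples inside the region.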

Problem definition and Solution: UK-means clustering is used with uncertain data. It is based on the traditional K-means clustering algorithm, but its distance function is the "expected distance" from a data object's uncertainty region (the taxi's region) to the centroid of the cluster it may be assigned to. For a centroid c of a cluster, an uncertainty region R with a pdf f, and a Euclidean distance function D(p, c), the expected distance is: $ED(R, c) = \int_{p \in R} f(p)\, D(p, c)\, dp$
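When the region is represented by weighted sample points, the expected distance can be approximated by a weighted sum of Euclidean distances from the samples to the centroid. The sketch below assumes every object has the same number of samples and that each cluster representative is updated to the mean of its members' expected locations; the slides do not spell out the update step, so that choice, like the names `expected_distances` and `uk_means`, is an assumption.

```python
import numpy as np

def expected_distances(samples, weights, centroids):
    """ED[i, j] = sum_s weights[i, s] * ||samples[i, s] - centroids[j]||."""
    samples = np.asarray(samples, dtype=float)      # shape (n, m, 2)
    weights = np.asarray(weights, dtype=float)      # shape (n, m)
    centroids = np.asarray(centroids, dtype=float)  # shape (k, 2)
    diffs = samples[:, :, None, :] - centroids[None, None, :, :]
    norms = np.linalg.norm(diffs, axis=3)             # shape (n, m, k)
    return (weights[:, :, None] * norms).sum(axis=1)  # shape (n, k)

def uk_means(samples, weights, init, max_iter=100):
    """K-means loop with expected distance in place of point distance."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centroids = np.array(init, dtype=float)
    # Expected location (centre of mass) of each uncertain object: (n, 2).
    means = (weights[:, :, None] * samples).sum(axis=1)
    labels = np.full(len(samples), -1)
    for _ in range(max_iter):
        new_labels = expected_distances(samples, weights, centroids).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(len(centroids)):
            if np.any(labels == j):
                centroids[j] = means[labels == j].mean(axis=0)
    return labels, centroids

# One object with two equally likely positions, (0, 0) and (2, 0).
obj = np.array([[[0.0, 0.0], [2.0, 0.0]]])
w = np.array([[0.5, 0.5]])
ed = expected_distances(obj, w, [[1.0, 0.0]])
```

In the small example the object's expected location coincides with the centroid at (1, 0), yet the expected distance is 1.0, which shows how the expected distance differs from the distance measured from the mean position.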

Experiment: The dataset contains the GPS trajectories of 10,357 taxis over 3 days in Beijing. It has about 15 million points in total, and the total trajectory distance reaches 9 million kilometers.

UK-means:

Clustering with K-means:

Clustering with UK-means:

Pruning Method: Basic idea, considering the movement of the object: compute the lower and upper bounds of the taxi speed, spd(s1, s2), from the object point T in the uncertainty region, and use these speed bounds to filter out false sample points.
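The slide does not define spd(s1, s2) precisely, so the following is only one possible reading: each sample point implies a speed from the taxi's previous reported position, and samples whose implied speed falls outside the given bounds are discarded before any expected distance is computed. The function name `prune_by_speed`, its parameters, and the toy values are hypothetical.

```python
import numpy as np

def prune_by_speed(prev_location, samples, dt, v_min, v_max):
    """Keep samples whose implied speed from prev_location lies in [v_min, v_max]."""
    prev_location = np.asarray(prev_location, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Speed needed to travel from the previous position to each sample in dt.
    implied_speed = np.linalg.norm(samples - prev_location, axis=1) / dt
    keep = (implied_speed >= v_min) & (implied_speed <= v_max)
    return samples[keep], keep

# Toy values: the third candidate would need a speed of 50 units per time stamp.
candidates = [[1.0, 0.0], [3.0, 4.0], [30.0, 40.0]]
kept, mask = prune_by_speed(prev_location=[0.0, 0.0], samples=candidates,
                            dt=1.0, v_min=0.0, v_max=10.0)
```

Filtering first shrinks the set of sample points, so each later expected distance sums over fewer terms.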

Conclusion and Future Work: In this project we studied the problem of clustering moving objects with defined uncertainty regions, applying the UK-means algorithm to cluster uncertain objects using expected distances. To reduce the cost of expected distance computations, effective pruning techniques are necessary. This work can be applied to other methods such as nearest neighbor. We can also use another pdf, such as a Gaussian distribution, to compute the probability of the uncertain points.

References:
Ngai, Wang Kay, et al. "Efficient clustering of uncertain data." Data Mining, 2006. ICDM'06. Sixth International Conference on. IEEE, 2006.
Yang, Z., & Tang, H. (2010). A Model of Clustering Uncertain Data, 969–972.
Patil, A. B. (2014). A Review of Clustering Algorithms for Clustering Uncertain Data, (November), 3643–3646.
https://mubaris.com/2017-10-01/kmeans-clustering-in-python
Dataset: https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/

Thank You