Clustering Spatial Data Using Random Walks. Authors: David Harel, Yehuda Koren. Graduate: Chien-Ming Hsiao.

Presentation transcript:

Clustering Spatial Data Using Random Walks
Authors: David Harel, Yehuda Koren
Graduate: Chien-Ming Hsiao

Outline
Motivation
Objective
Introduction
Basic Notions
Modeling The Data
Clustering Using Random Walks
– Separators and separating operators
– Clustering by separation
– Clustering spatial points
Integration with Agglomerative Clustering
Examples
Conclusion
Opinion

Motivation
The characteristics of spatial data pose several difficulties for clustering algorithms
The clusters may have arbitrary shapes and non-uniform sizes
– Different clusters may have different densities
The existence of noise may interfere with the clustering process

Objective
Present a new approach to clustering spatial data
Seek efficient clustering algorithms
Overcome noise and outliers

Introduction
The heart of the method is in what we shall be calling separating operators.
Their effect is to sharpen the distinction between the weights of inter-cluster edges and intra-cluster edges
– By decreasing the former and increasing the latter
They can be used on their own or can be embedded in a classical agglomerative clustering framework.

BASIC NOTIONS
Graph-theoretic notions (a higher edge weight means the two nodes are more similar)

BASIC NOTIONS
The probability of a transition from node i to node j
The escape probability: the probability that a random walk originating at s will reach t before returning to s
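The slide's formula for the transition probability is not preserved in the transcript. A minimal sketch, assuming the usual random-walk definition on a weighted graph (the weight of edge (i, j) divided by the total weight of the edges at i); the function name `transition_matrix` and the example weights are illustrative only:

```python
import numpy as np

def transition_matrix(W):
    """Row-normalize a non-negative weight matrix into transition probabilities.

    Assumption: P[i, j] = w(i, j) / sum_k w(i, k), the standard definition
    for a random walk on a weighted graph.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    if np.any(degrees == 0):
        raise ValueError("every node needs at least one weighted edge")
    return W / degrees[:, None]

# Hypothetical 3-node graph; a higher weight means more similar.
W = np.array([[0, 2, 1],
              [2, 0, 1],
              [1, 1, 0]])
P = transition_matrix(W)
```

Each row of `P` sums to 1, and the walk prefers heavier (more similar) edges.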

MODELING THE DATA
Delaunay triangulation (DT)
– Many O(n log n) time and O(n) space algorithms exist for computing the DT of a planar point set.
k-mutual neighborhood
– The k-nearest neighbors of each point can be computed in O(n log n) time and O(n) space for any fixed dimension.
The weight of the edge (a,b) is defined using two quantities:
– d(a,b) is the Euclidean distance between a and b.
– ave is the average Euclidean distance between two adjacent points.
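The weight formula itself did not survive in the transcript. The sketch below is one plausible reading, not the authors' confirmed formula: it assumes a Gaussian-style decay, exp(-(d(a,b)/ave)^2), so that short edges get weights near 1 and long edges get weights near 0. The function name `edge_weights` and the sample points are hypothetical.

```python
import math

def edge_weights(points, edges):
    """Map each edge (a, b) to a similarity weight.

    Assumption (not taken from the slides): w(a, b) = exp(-(d(a, b) / ave)^2),
    where ave is the mean length of the given edges.
    """
    dists = {(a, b): math.dist(points[a], points[b]) for a, b in edges}
    ave = sum(dists.values()) / len(dists)
    return {edge: math.exp(-(d / ave) ** 2) for edge, d in dists.items()}

# Hypothetical points on a line; the edges would normally come from the
# Delaunay triangulation or the k-mutual neighborhood graph.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
weights = edge_weights(points, [(0, 1), (1, 2)])
```

Dividing by `ave` makes the weights independent of the scale of the coordinates, which is presumably why the slide introduces the average distance.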

CLUSTERING USING RANDOM WALKS
Identifying natural clusters in a graph requires ways to compute an intimacy relation between the nodes incident to each of the graph’s edges.
Separators are identified through an iterative process of separation.
– This is a kind of sharpening pass

NS : Separation by neighborhood similarity Definition :
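The definition on this slide is not preserved in the transcript. As a rough illustration of the idea of neighborhood similarity, the sketch below re-weights each edge (u, v) by how similar the short random walks from u and from v are. Everything specific here is an assumption: the walk length `k`, the use of summed visit probabilities over 1..k steps, and cosine similarity as the comparison.

```python
import numpy as np

def ns_sharpen(W, k=3):
    """Re-weight existing edges by similarity of k-step visit profiles.

    Assumptions: row v of `visit` sums the probabilities of being at each
    node after 1..k steps of a walk started at v; profiles are compared
    with cosine similarity. Non-edges stay at weight 0.
    """
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)
    visit = np.zeros_like(P)
    step = np.eye(len(P))
    for _ in range(k):
        step = step @ P          # distribution after one more step
        visit += step
    norms = np.linalg.norm(visit, axis=1)
    sim = (visit @ visit.T) / np.outer(norms, norms)
    return np.where(W > 0, sim, 0.0)

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[a, b] = W[b, a] = 1.0
S = ns_sharpen(W, k=3)
```

On this toy graph the bridge edge 2-3 ends up with a lower weight than the edges inside either triangle, which is the sharpening effect the Introduction describes.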

CE : Separation by circular escape Definition :
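The CE definition is also missing from the transcript, so the sketch below covers only its building block: the escape probability from the Basic Notions slide, the probability that a walk starting at s reaches t before returning to s. It is computed on the whole graph by solving a small linear system; how the operator combines or restricts these probabilities is not shown. The function name and example graph are illustrative.

```python
import numpy as np

def escape_probability(W, s, t):
    """Probability that a random walk from s reaches t before returning to s.

    h[v] is the probability of hitting t before s when starting at v, with
    h[s] = 0 and h[t] = 1; interior values satisfy h = P h. Assumes the
    graph is connected, so the linear system has a unique solution.
    """
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)
    n = len(P)
    interior = [v for v in range(n) if v not in (s, t)]
    h = np.zeros(n)
    h[t] = 1.0
    if interior:
        A = np.eye(len(interior)) - P[np.ix_(interior, interior)]
        b = P[interior, t]
        h[interior] = np.linalg.solve(A, b)
    # First step leaves s; afterwards the walk must hit t before s.
    return float(P[s] @ h)

# Hypothetical unit-weight triangle and 3-node path.
triangle = np.ones((3, 3)) - np.eye(3)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
p_triangle = escape_probability(triangle, 0, 1)
p_path = escape_probability(path, 0, 2)
```

Nodes that are tightly connected yield a higher escape probability between them than nodes linked only through a thin chain.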

Clustering spatial points

Integration with Agglomerative Clustering
The separation operators can be used as a preprocessing step before running agglomerative clustering on the graph
This can effectively prevent bad local merges that oppose the graph structure.
It is equivalent to a “single link” algorithm preceded by a separation operation
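A minimal sketch of the "single link after separation" idea, assuming the edge weights have already been sharpened by a separating operator. It merges clusters along the heaviest remaining edge until a target number of clusters is left. The function name, the stopping rule, and the example weights are illustrative assumptions, not values from the presentation.

```python
def single_link(n, weighted_edges, num_clusters):
    """Single-link agglomeration on a similarity graph using union-find.

    weighted_edges: iterable of (weight, a, b); higher weight = more similar.
    Returns one cluster label (a representative node) per node.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = n
    for _, a, b in sorted(weighted_edges, reverse=True):
        if clusters <= num_clusters:
            break
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            clusters -= 1
    return [find(v) for v in range(n)]

# Hypothetical sharpened weights: strong inside each triangle, weak on the bridge.
edges = [(0.9, 0, 1), (0.9, 0, 2), (0.9, 1, 2),
         (0.9, 3, 4), (0.9, 3, 5), (0.9, 4, 5),
         (0.2, 2, 3)]
labels = single_link(6, edges, num_clusters=2)
```

Because the separation step has already pushed the bridge weight down, the weak edge is considered last and the two triangles stay apart.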

Examples

Conclusion
The method is robust in the presence of noise and outliers, and is flexible in handling data of different densities.
The CE operator yields better results than the NS operator
The time complexity of our algorithm applied to n data points is O(n log n)

Opinion
Since the algorithm does not rely on spatial knowledge, we can try it on other types of data.

END