Presentation is loading. Please wait.

Presentation is loading. Please wait.

Clustering Spatial Data Using Random Walks Author : David Harel Yehuda Koren Graduate : Chien-Ming Hsiao.

Similar presentations


Presentation on theme: "Clustering Spatial Data Using Random Walks Author : David Harel Yehuda Koren Graduate : Chien-Ming Hsiao."— Presentation transcript:

1 Clustering Spatial Data Using Random Walks Author : David Harel Yehuda Koren Graduate : Chien-Ming Hsiao

2 Outline Motivation Objective Introduction Basic Notions Modeling The Data Clustering Using Random Walks –Separators and separating operators –Clustering by separation –Clustering spatial points Integration with Agglomerative Clustering Examples Conclusion Opinion

3 Motivation The characteristics of spatial data pose several difficulties for clustering algorithms The clusters may have arbitrary shapes and non- uniform sizes –Different cluster may have different densities The existence of noise may interfere the clustering process

4 Objective Present a new approach to clustering spatial data Seeking efficient clustering algorithms. Overcoming noise and outliers

5 Introduction The heart of the method is in what we shall be calling separating operators. Their effect is to sharpen the distinction between the weights of inter-cluster edges and intra-cluster edges –By decreasing the former and increasing the latter It can be used on their own or can be embedded in a classical agglomerative clustering framework.

6 BASIC NOTIONS graph-theoretic notions (A higher value means more similar)

7 BASIC NOTIONS The probability of a transition from node i to node j The probability that a random walk originating at s will reach t before returning to s

8 MODELINE THE DATA Delaunay triangulation (DT) –Many O(n log n) time and O(n) space algorithms exist for computing the DT of a planar point set. K-mutual neighborhood –The k-nearest neighbors of each point can be O(n log n) time O(n) space for any fixed arbitrary dimension. The weight of the edge (a,b) is –d(a,b) is the Euclidean distance between a and b. –ave is the average Euclidean distance between two adjacent points.

9 CLUSTERING USING RANDOM WALKS To identifying natural clusters in a graph is to find ways to compute an intimacy relation between the nodes incident to each of the graph’s edges. Identifying separators is to use an iterative process of separation. –This is a kind of sharpening pass

10

11 NS : Separation by neighborhood similarity Definition :

12 CE : Separation by circular escape Definition :

13 Clustering spatial points

14 Integration with Agglomerative Clustering The separation operators can be used as a preprocessing before activating agglomerative clustering on the graph Can effectively prevent bad local merging opposing the graph structure. It is equivalent to a “single link” algorithm preceded by a separation operation

15 Examples

16 Conclusion It is robust in the presence of noise and outliers, and is flexible in handling data of different densities. The CE operator yields better results than the NS operator The time complexity of our algorithm applied to n data points is O(n log n)

17 Opinion Since the algorithm does not rely on spatial knowledge, we can to try it on other types of data.

18 END


Download ppt "Clustering Spatial Data Using Random Walks Author : David Harel Yehuda Koren Graduate : Chien-Ming Hsiao."

Similar presentations


Ads by Google