Clustering An overview of clustering algorithms Dènis de Keijzer GIA 2004.

1 Clustering An overview of clustering algorithms Dènis de Keijzer GIA 2004

2 Overview Algorithms GRAVIclust AUTOCLUST AUTOCLUST+ 3D Boundary-based Clustering SNN

3 GRAVIclust: gravity-based spatial clustering. Initialisation Phase: calculate the initial cluster centres. Optimisation Phase: improve the position of the cluster centres so as to reach a solution that minimises the distance function.

4 GRAVIclust: Initialisation Phase Input: set of points P

5 GRAVIclust: Initialisation Phase Input: set of points P; matrix of distances between all pairs of points. Assumption: the actual access path distance exists in GIS maps, e.g. http://www.transinfo.qld.gov.au. Very versatile: footpath, road map, rail map.

6 GRAVIclust: Initialisation Phase Input: set of points P; matrix of distances between all pairs of points; number of required clusters k.

7 GRAVIclust: Initialisation Phase Step 1: calculate the first initial centre, i.e. the point with the largest number of points within radius r; remove the first initial centre and all points within radius r from further consideration. Step 2: repeat Step 1 until k initial centres have been chosen. Step 3: create initial clusters by assigning all points to the closest cluster centre.
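
The three initialisation steps above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes a static radius `r`, a precomputed distance matrix `D` (a list of lists), and breaks ties by lowest point index. The names `initial_centres` and `assign_to_centres` are made up for this sketch.

```python
def initial_centres(D, k, r):
    """Steps 1-2: repeatedly pick the remaining point with the most remaining
    points within radius r, then drop it and those points from consideration."""
    remaining = set(range(len(D)))
    centres = []
    while len(centres) < k and remaining:
        def within(i):
            # Remaining points (including i itself) no farther than r from i.
            return [j for j in remaining if D[i][j] <= r]
        # Assumption: ties are broken by the lowest point index.
        best = max(sorted(remaining), key=lambda i: len(within(i)))
        centres.append(best)
        remaining.difference_update(within(best))
    return centres


def assign_to_centres(D, centres):
    """Step 3: label every point with the index of its closest centre."""
    return [min(centres, key=lambda c: D[i][c]) for i in range(len(D))]
```

With a static radius the neighbour counts could be cached; they are recomputed here only to keep the sketch short. If the points run out before k centres are found, fewer than k centres are returned.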

8 GRAVIclust: radius calculation Radius r: calculated based on the area of the region considered for clustering. Static radius: based on the assumption that all clusters are of the same size. Dynamic radius: recalculated after each initial cluster centre is chosen.

9 GRAVIclust: Static vs. Dynamic Static: reduced computation, since the number of points within radius r has to be calculated only once; not suitable for problems where the points are separated by large empty areas. Dynamic: increases computation time; ensures the radius is adjusted as points are removed. The two differ only when the distribution is non-uniform.

10 GRAVIclust: Optimisation Phase Step 1: for each cluster, calculate a new centre based on the point closest to the cluster's centre of gravity. Step 2: re-assign points to the new cluster centres. Step 3: recalculate the distance function (its value is never greater than the previous one). Step 4: repeat Steps 1 to 3 until the value of the distance function equals the previous value.
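
A minimal sketch of this optimisation loop, under several assumptions that the slide does not state: points are coordinate tuples, the distance `d` defaults to Euclidean distance, and the "distance function" is taken to be the sum of point-to-centre distances. The function name `optimise` is hypothetical. The loop stops as soon as an iteration fails to lower the total, which also guards against any increase.

```python
from math import dist


def optimise(points, centres, d=dist):
    """Move each centre to the member point nearest the cluster's centre of
    gravity, reassign, and stop once the total distance no longer improves."""
    n, dims = len(points), len(points[0])

    def assign(cs):
        # Label every point with the index of its closest centre.
        return [min(cs, key=lambda c: d(points[i], points[c])) for i in range(n)]

    def total(labels):
        # Assumed distance function: sum of point-to-centre distances.
        return sum(d(points[i], points[labels[i]]) for i in range(n))

    labels = assign(centres)
    previous = total(labels)
    while True:
        new_centres = []
        for c in centres:
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # only possible when points are duplicated
                new_centres.append(c)
                continue
            gravity = tuple(sum(points[i][a] for i in members) / len(members)
                            for a in range(dims))
            new_centres.append(min(members, key=lambda i: d(points[i], gravity)))
        new_labels = assign(new_centres)
        current = total(new_labels)
        if current >= previous:  # no further improvement: stable point
            return centres, labels, previous
        centres, labels, previous = new_centres, new_labels, current
```

To work with access path distances instead, `d` would have to be replaced by a lookup into the distance matrix; the centre of gravity still needs coordinates.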

11 GRAVIclust Deterministic Can handle obstacles Monotonic convergence of the distance function to a stable point

12 AUTOCLUST Definitions

13 AUTOCLUST Definitions II

14 AUTOCLUST Phase 1: finding boundaries Phase 2: restoring and re-attaching Phase 3: detecting second-order inconsistency

15 AUTOCLUST: Phase 1 Finding boundaries: calculate the Delaunay Diagram; for each point p_i, determine ShortEdges(p_i), LongEdges(p_i) and OtherEdges(p_i); remove ShortEdges(p_i) and LongEdges(p_i).

16 AUTOCLUST: Phase 2 Restoring and re-attaching: for each point p_i where ShortEdges(p_i) ≠ ∅, determine a candidate connected component C for p_i. If there are 2 edges e_j = (p_i, p_j) and e_k = (p_i, p_k) in ShortEdges(p_i) with CC[p_j] ≠ CC[p_k], then compute, for each edge e = (p_i, p_j) ∈ ShortEdges(p_i), the size ||CC[p_j]|| and let M = max over e = (p_i, p_j) ∈ ShortEdges(p_i) of ||CC[p_j]||. Let C be the class label of the largest connected component (if there are two different connected components with cardinality M, we let C be the one with the shortest edge to p_i).

17 AUTOCLUST: Phase 2 Restoring and re-attaching: for each point p_i where ShortEdges(p_i) ≠ ∅, determine a candidate connected component C for p_i. If … Otherwise, let C be the label of the connected component that all edges e ∈ ShortEdges(p_i) connect p_i to.

18 AUTOCLUST: Phase 2 Restoring and re-attaching: for each point p_i where ShortEdges(p_i) ≠ ∅, determine a candidate connected component C for p_i. If the edges in OtherEdges(p_i) connect to a connected component different from C, remove them. Note that all edges in OtherEdges(p_i) are removed, and only in this case will p_i swap connected components. Add all edges e ∈ ShortEdges(p_i) that connect to C.

19 AUTOCLUST: Phase 3 Detecting second-order inconsistency: compute the LocalMean for 2-neighbourhoods; remove all edges in N_{2,G}(p_i) that are long edges.

20 AUTOCLUST

21 No user-supplied arguments: eliminates expensive human-based exploration time for finding best-fit arguments. Robust to noise, outliers, bridges and type of distribution. Able to detect clusters with arbitrary shapes, different sizes and different densities. Can handle multiple bridges. O(n log n).

22 AUTOCLUST+ Construct the Delaunay Diagram. Calculate MeanStDev(P). For all edges e, remove e if it intersects some obstacle. Apply the 3 phases of AUTOCLUST to the planar graph resulting from the previous steps.
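
The obstacle step above ("remove e if it intersects some obstacle") could look roughly like this. The slide does not say how obstacles are represented, so the sketch assumes each obstacle is a 2D line segment given as a pair of endpoints; all function names are hypothetical. It uses the standard orientation test for segment intersection and treats touching as intersecting.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1 or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def in_box(a, b, c):
    """True if c lies inside the bounding box of segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p, q, r, s):
    """True if segment pq and segment rs share at least one point."""
    o1, o2 = orient(p, q, r), orient(p, q, s)
    o3, o4 = orient(r, s, p), orient(r, s, q)
    if o1 != o2 and o3 != o4:
        return True
    # Remaining cases: an endpoint is collinear with the other segment.
    return ((o1 == 0 and in_box(p, q, r)) or (o2 == 0 and in_box(p, q, s))
            or (o3 == 0 and in_box(r, s, p)) or (o4 == 0 and in_box(r, s, q)))


def remove_blocked_edges(points, edges, obstacles):
    """Keep only the edges (i, j) that cross none of the obstacle segments."""
    return [(i, j) for i, j in edges
            if not any(segments_intersect(points[i], points[j], a, b)
                       for a, b in obstacles)]
```

The edge list is assumed to come from a Delaunay triangulation computed elsewhere; polygonal obstacles would be passed as their boundary segments.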

23 3D Boundary-based Clustering Benefits from 3D Clustering more accurate spatial analysis distinguish positive clusters: clusters in higher dimensions but not in lower dimensions

24 3D Boundary-based Clustering Benefits from 3D Clustering more accurate spatial analysis distinguish positive clusters: clusters in higher dimensions but not in lower dimensions negative clusters: clusters in lower dimensions but not in higher dimensions

25 3D Boundary-based Clustering Based on AUTOCLUST. Uses Delaunay Tetrahedrizations. Definitions: e_j is a potential inter-cluster edge if:

26 3D Boundary-based Clustering Phase I: for all p_i ∈ P, classify each edge e_j incident to p_i into one of three groups: ShortEdges(p_i) when the length of e_j is less than the range in AI(p_i); LongEdges(p_i) when the length of e_j is greater than the range in AI(p_i); OtherEdges(p_i) when the length of e_j is within AI(p_i). For all p_i ∈ P, remove all edges in ShortEdges(p_i) and LongEdges(p_i).
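
As a hedged illustration of Phase I, the sketch below classifies the edges incident to each point. The slides do not define AI(p_i), so the interval used here is an assumption borrowed from the published description of AUTOCLUST: LocalMean(p_i) ± MeanStDev(P), where LocalMean is the mean length of the edges incident to p_i and MeanStDev(P) is the average of the per-point standard deviations. The edge list is assumed to come from a Delaunay triangulation or tetrahedrization computed elsewhere; `classify_edges` is a made-up name.

```python
from math import dist
from statistics import mean, pstdev


def classify_edges(points, edges):
    """Split the edges incident to each point into short / long / other."""
    incident = {i: [] for i in range(len(points))}
    for i, j in edges:
        incident[i].append((i, j))
        incident[j].append((i, j))
    length = {e: dist(points[e[0]], points[e[1]]) for e in edges}

    # MeanStDev(P): average of the local standard deviations (assumption).
    mean_st_dev = mean(pstdev([length[e] for e in es])
                       for es in incident.values() if es)

    groups = {}
    for i, es in incident.items():
        if not es:
            continue
        local_mean = mean([length[e] for e in es])
        lo, hi = local_mean - mean_st_dev, local_mean + mean_st_dev  # assumed AI(p_i)
        groups[i] = {
            "short": [e for e in es if length[e] < lo],
            "long": [e for e in es if length[e] > hi],
            "other": [e for e in es if lo <= length[e] <= hi],
        }
    return groups
```

Because `math.dist` accepts points of any dimension, the same sketch applies to 2D and 3D input.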

27 3D Boundary-based Clustering Phase II: recuperate ShortEdges(p_i) incident to border points using connected component analysis. Phase III: remove exceptionally long edges in local regions.

28 Shared Nearest Neighbour Clustering in higher dimensions: distances or similarities between points become more uniform, making clustering more difficult. Also, similarity between points can be misleading, i.e. a point can be more similar to a point that “actually” belongs to a different cluster. Solution: shared nearest neighbour approach to similarity.

29 SNN: An alternative definition of similarity Euclidean distance: the most common distance metric used; while useful in low dimensions, it doesn’t work well in high dimensions.

      A1  A2  A3  A4  A5  A6  A7  A8  A9  A10
P1     3   0   0   0   0   0   0   0   0    0
P2     0   0   0   0   0   0   0   0   0    4
P3     3   2   4   0   1   2   3   1   2    0
P4     0   2   4   0   1   2   3   1   2    4
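
One way to read the four points P1 to P4 above (an interpretation, since the slide gives no commentary): the Euclidean distance between P1 and P2 equals that between P3 and P4, although P3 and P4 agree on most attributes while P1 and P2 have nothing in common. The short check below computes both distances; `shared_nonzero` is a helper invented for this illustration.

```python
from math import dist

P1 = (3, 0, 0, 0, 0, 0, 0, 0, 0, 0)
P2 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 4)
P3 = (3, 2, 4, 0, 1, 2, 3, 1, 2, 0)
P4 = (0, 2, 4, 0, 1, 2, 3, 1, 2, 4)


def shared_nonzero(a, b):
    """Number of attributes that are non-zero in both points."""
    return sum(1 for x, y in zip(a, b) if x and y)


d12 = dist(P1, P2)  # sqrt(3**2 + 4**2)
d34 = dist(P3, P4)  # the two points differ only in A1 and A10

print(d12, shared_nonzero(P1, P2))
print(d34, shared_nonzero(P3, P4))
```

Both distances come out as 5, while the pairs share 0 and 7 non-zero attributes respectively, which is the kind of mismatch the shared nearest neighbour approach is meant to address.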

30 SNN: An alternative definition of similarity Define the similarity of two points in terms of their shared nearest neighbours: the similarity of the points is “confirmed” by their common shared nearest neighbours.

31 SNN: An alternative definition of density SNN similarity, with the k-nearest neighbour approach: if the k-th nearest neighbour of a point, with respect to SNN similarity, is close, then we say that there is a high density at this point. Since it reflects the local configuration of the points in the data space, it is relatively insensitive to variations in density and the dimensionality of the space.

32 SNN: Algorithm Compute the similarity matrix: this corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points.

33 SNN: Algorithm Compute the similarity matrix. Sparsify the similarity matrix by keeping only the k most similar neighbours: this corresponds to keeping only the k strongest links of the similarity graph.

34 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared nearest neighbour graph from the sparsified similarity matrix
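
The first three steps (similarity, sparsification, SNN graph) might be sketched as below. Assumptions not stated on the slides: similarity is taken as closeness in Euclidean distance, and two points are linked only when each appears in the other's k-nearest-neighbour list, with the link weight equal to the number of neighbours they share. The names `knn_lists` and `snn_graph` are hypothetical.

```python
from math import dist


def knn_lists(points, k):
    """Sparsify: for each point keep the indices of its k closest other points."""
    n = len(points)
    return [set(sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))[:k])
            for i in range(n)]


def snn_graph(knn):
    """Weight of link (i, j), i < j: number of shared nearest neighbours.
    A link exists only if i and j are in each other's neighbour lists."""
    weights = {}
    for i, neighbours in enumerate(knn):
        for j in neighbours:
            if i < j and i in knn[j]:
                weights[(i, j)] = len(neighbours & knn[j])
    return weights
```

This brute-force neighbour search is quadratic in the number of points; it is meant to show the definition, not to scale.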

35 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared … Find the SNN density of each point Find the core points

36 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared … Find the SNN density of each point

37 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared … Find the SNN density of each point Form clusters from the core points

38 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared … Find the SNN density of each point Form clusters from the core points Discard all noise points

39 SNN: Algorithm Compute the similarity matrix Sparsify the similarity matrix … Construct the shared … Find the SNN density of each point Form clusters from the core points Discard all noise points Assign all non-noise, non-core points to clusters
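
The remaining steps (density, core points, clusters, noise, border assignment) can be illustrated with the sketch below. The slides name no parameters, so `eps` (minimum SNN similarity for a link to count) and `min_pts` (minimum density for a core point) are assumptions, as is the choice to define density as the number of links with weight at least `eps`. The input `weights` is a dictionary mapping point pairs `(i, j)` to SNN similarity; `snn_cluster` is a made-up name.

```python
def snn_cluster(n, weights, eps, min_pts):
    """Return one label per point; None marks a discarded noise point."""
    strong = {i: {} for i in range(n)}
    for (i, j), w in weights.items():
        if w >= eps:
            strong[i][j] = w
            strong[j][i] = w

    # SNN density: number of sufficiently similar neighbours (assumption).
    density = [len(strong[i]) for i in range(n)]
    core = {i for i in range(n) if density[i] >= min_pts}

    # Form clusters: core points joined by strong links share a label.
    labels = [None] * n
    next_label = 0
    for start in sorted(core):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in strong[i]:
                if j in core and labels[j] is None:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1

    # Non-core points join their most similar core point; the rest is noise.
    for i in range(n):
        if i in core:
            continue
        core_links = {j: w for j, w in strong[i].items() if j in core}
        if core_links:
            labels[i] = labels[max(core_links, key=core_links.get)]
    return labels
```

Points left with the label `None` have no strong link to any core point and correspond to the noise points that the algorithm discards.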

40 Shared Nearest Neighbour Finds clusters of varying shapes, sizes, and densities, even in the presence of noise and outliers. Handles data of high dimensionality and varying densities. Automatically detects the number of clusters.

