Published by Felix Melton. Modified about 1 year ago.

1
Clustering Methods
Professor: Dr. Mansouri
Presented by: Muhammad Abouei & Mohsen Ghahremani Manesh

2
Clustering Methods
Density-Based Clustering Methods:
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- OPTICS (Ordering Points To Identify the Clustering Structure)
- DENCLUE (DENsity-based CLUstEring)
Grid-Based Clustering

3
Density-Based Clustering

4
DBSCAN Concepts
ε-neighborhood: the points within distance ε (the radius) of a point. MinPts: the minimum number of points required in the ε-neighborhood of a point for that point to be a core point. Both ε and MinPts are user-defined parameters. [Figure: ε-neighborhood of p and ε-neighborhood of q, MinPts = 5]

5
DBSCAN Concepts
Density: the number of points within a specified radius ε of a point. In the figure, Density(p) = 5.
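
The two definitions above (ε-neighborhood and density) can be sketched in a few lines. This is only an illustration: the function names `eps_neighborhood` and `density` and the sample points are assumptions, and the sketch assumes Euclidean distance and counts the point itself as part of its own neighborhood.

```python
import math

def eps_neighborhood(points, p, eps):
    """Return every point within distance eps of p (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def density(points, p, eps):
    """Density of p: the number of points in its eps-neighborhood."""
    return len(eps_neighborhood(points, p, eps))

# Hypothetical sample data: four points around the origin plus one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5), (3.0, 3.0)]

print(density(points, (0.0, 0.0), eps=1.0))  # 5
print(density(points, (3.0, 3.0), eps=1.0))  # 1
```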

6
DBSCAN Concepts
Core point: a point is a core point if its ε-neighborhood contains at least MinPts points. Core points lie in the interior of a cluster. In the figure (MinPts = 5), p is a core point and q is not a core point.

7
DBSCAN Concepts
Directly density-reachable (DDR): a point p is directly density-reachable from a point q w.r.t. ε and MinPts if (1) p belongs to the ε-neighborhood of q, and (2) q is a core point. In the figure (MinPts = 4), p is DDR from q, but q is not DDR from p because p is not a core point. DDR is therefore not a symmetric relation in general.
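
The asymmetry of direct density-reachability is easy to check numerically. The sketch below is a hypothetical example, not the slide's figure: the helper names and the five sample points are assumptions, and a point is treated as core when its ε-neighborhood (itself included) holds at least MinPts points.

```python
import math

def neighbors(points, p, eps):
    """All points within eps of p, including p itself."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core(points, p, eps, min_pts):
    return len(neighbors(points, p, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p is directly density-reachable from q."""
    return math.dist(p, q) <= eps and is_core(points, q, eps, min_pts)

# q sits inside a small dense group; p lies on the edge of that group.
q = (0.0, 0.0)
p = (1.0, 0.0)
points = [q, (0.3, 0.0), (0.0, 0.3), (-0.3, 0.0), p]
eps, min_pts = 1.0, 4

print(directly_density_reachable(points, p, q, eps, min_pts))  # True
print(directly_density_reachable(points, q, p, eps, min_pts))  # False
```

The second call fails only because p has three points in its neighborhood, so condition (2) does not hold for it.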

8
DBSCAN Concepts
Density-reachable (DR): a point p is density-reachable from a point q w.r.t. ε and MinPts if there is a chain of points p_1, …, p_n with p_1 = q and p_n = p such that p_{i+1} is directly density-reachable from p_i. Equivalently, there is a path from q to p on which every point, except possibly p itself, is a core point. In the figure (MinPts = 4), p is DR from q, but q is not DR from p because p is not a core point. DR is not a symmetric relation in general.

9
DBSCAN Concepts
Density-connectivity (DC): a point p is density-connected to a point q w.r.t. ε and MinPts if there is a point r such that both p and q are density-reachable from r w.r.t. ε and MinPts. In the figure (MinPts = 4), p and q are density-connected. DC is a symmetric relation.
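
Density-reachability and density-connectivity can be illustrated with a breadth-first search that only expands core points. This is a minimal sketch under stated assumptions: points are addressed by index, the names `reachable_from` and `density_connected` are made up for the example, and the brute-force search is meant for tiny inputs only.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices of all points within eps of point i (i itself included)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def reachable_from(points, q, eps, min_pts):
    """Indices of all points that are density-reachable from point q."""
    if len(neighbors(points, q, eps)) < min_pts:
        return set()  # q is not core, so no chain can start at q
    seen = {q}
    queue = deque([q])
    while queue:
        i = queue.popleft()
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # border point: reached, but a chain cannot pass through it
        for j in nbrs:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def density_connected(points, p, q, eps, min_pts):
    """True if some point r exists from which both p and q are density-reachable."""
    for r in range(len(points)):
        reached = reachable_from(points, r, eps, min_pts)
        if p in reached and q in reached:
            return True
    return False

# Five points on a line, one unit apart; the two end points are border points.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
eps, min_pts = 1.0, 3

print(4 in reachable_from(points, 1, eps, min_pts))   # True
print(1 in reachable_from(points, 4, eps, min_pts))   # False
print(density_connected(points, 0, 4, eps, min_pts))  # True
```

The two end points are not density-reachable from each other, yet they are density-connected through any of the interior core points.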

10
DBSCAN Concepts
Border point: a border point has fewer than MinPts points within ε but lies in the ε-neighborhood of a core point. [Figure: MinPts = 5, ε = circle radius]

11
DBSCAN Concepts
Noise (outlier) point: any point that is neither a core point nor a border point. [Figure: MinPts = 5, ε = circle radius]
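
The three point types (core, border, noise) can be assigned directly from the definitions. The following is a sketch with an assumed function name `classify` and hypothetical sample data; it uses brute-force distance checks and counts each point as a member of its own neighborhood.

```python
import math

def classify(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # Neighborhood of every point, stored as lists of indices.
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
            for p in points]
    core = {i for i, n in enumerate(nbrs) if len(n) >= min_pts}
    labels = []
    for i, n in enumerate(nbrs):
        if i in core:
            labels.append("core")
        elif any(j in core for j in n):
            labels.append("border")  # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")
    return labels

# Five points on a line plus one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 10.0)]
print(classify(points, eps=1.0, min_pts=3))
# ['border', 'core', 'core', 'core', 'border', 'noise']
```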

12
DBSCAN Concepts
DBSCAN relies on a density-based notion of a cluster. Cluster: a cluster C is a non-empty set of density-connected points that is maximal w.r.t. density-reachability. Maximality: for all p, q: if q ∈ C and p is density-reachable from q w.r.t. ε and MinPts, then p ∈ C. [Figure: MinPts = 3, ε = circle radius]

13
DBSCAN Algorithm
1. Arbitrarily select a point p.
2. Retrieve all points density-reachable from p w.r.t. ε and MinPts.
3. If p is a core point, a cluster is formed.
4. If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
5. Continue the process until all of the points have been processed.
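
The steps above can be sketched as a short, brute-force implementation. It is an illustration under stated assumptions rather than a reference implementation: the name `dbscan`, the label `-1` for noise, and the sample points are choices made for this example, neighborhood queries are O(n) each, and a border point within reach of two clusters simply joins whichever cluster reaches it first.

```python
import math
from collections import deque

NOISE = -1  # assumed label for noise points

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ... or NOISE)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = region(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may be relabelled as a border point later
            continue
        # i is a core point: grow a new cluster from it.
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region(j)
            if len(j_nbrs) >= min_pts:  # only core points extend the cluster
                queue.extend(j_nbrs)
        cluster += 1
    return labels

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (20.0, 20.0)]
print(dbscan(points, eps=1.0, min_pts=3))
# [0, 0, 0, 0, 0, 1, 1, 1, -1]
```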

14
DBSCAN
[Figure: MinPts = 4]

15
DBSCAN
DBSCAN is sensitive to its parameters. [Figure: MinPts = 4]

16
DBSCAN
Core, border, and noise points (MinPts = 4, ε = 10). [Figure: original points; point types: core, border, and noise]

17
DBSCAN
When DBSCAN works well: it is resistant to noise and can handle clusters of different shapes and sizes. [Figure: original points; clusters]

18
DBSCAN
When DBSCAN does not work well: varying densities; high-dimensional data.

19
DBSCAN Complexity
If a spatial index (e.g., a kd-tree or an R*-tree) is used, the computational complexity of DBSCAN is O(n log n), where n is the number of database objects. Otherwise, it is O(n^2).
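
To show why an index helps, the sketch below answers an ε-neighborhood query by looking only at nearby cells of a uniform grid instead of scanning all n points. Note that this is a simpler stand-in, not a kd-tree or an R*-tree, and it carries no O(n log n) guarantee; the names `build_grid` and `region_query` and the sample data are assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import product

def cell_of(p, eps):
    """Grid cell (with side length eps) that contains point p."""
    return tuple(math.floor(c / eps) for c in p)

def build_grid(points, eps):
    """Map each occupied cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, eps)].append(idx)
    return grid

def region_query(points, grid, i, eps):
    """Indices of all points within eps of point i, using the grid."""
    p = points[i]
    cell = cell_of(p, eps)
    result = []
    # With cell side eps, every point within eps lies in an adjacent cell.
    for offset in product((-1, 0, 1), repeat=len(p)):
        candidate_cell = tuple(c + o for c, o in zip(cell, offset))
        for j in grid.get(candidate_cell, ()):
            if math.dist(p, points[j]) <= eps:
                result.append(j)
    return result

points = [(0.0, 0.0), (0.9, 0.0), (1.1, 0.0), (5.0, 5.0)]
eps = 1.0
grid = build_grid(points, eps)
print(sorted(region_query(points, grid, 0, eps)))  # [0, 1]
```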

20
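OPTICS
Core distance: the core distance of an object p is the smallest ε′ that makes p a core object. If p is not a core object w.r.t. ε and MinPts, its core distance is undefined. In the figure (MinPts = 5, ε = 3 cm), the core distance ε′ of p is the distance between p and its 4th nearest neighbor.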
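
The core distance can be computed by sorting the distances from p. This sketch assumes, as in the figure, that p counts toward MinPts, so the core distance is the distance to the (MinPts − 1)-th nearest neighbor; the function name, the use of `None` for "undefined", and the sample points are assumptions.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Core distance of point i, or None if point i is not a core object."""
    # Sorted distances from point i; dists[0] is 0.0 (the point itself).
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

# Hypothetical data: p = points[0] with neighbors at distances 1, 1.5, 2, 2.5 and 3.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (-2.0, 0.0), (0.0, -2.5), (3.0, 0.0)]

print(core_distance(points, 0, eps=3.0, min_pts=5))  # 2.5
print(core_distance(points, 0, eps=2.0, min_pts=5))  # None
```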

21
OPTICS
Reachability distance: the reachability distance of r w.r.t. p is the greater of the core distance of p and the Euclidean distance between p and r. If p is not a core object, the reachability distance of r w.r.t. p is undefined. In the figure (MinPts = 5, ε = 3 cm): reachability-distance_{ε, MinPts}(p, r) = ε′ and reachability-distance_{ε, MinPts}(p, r′) = d(p, r′).
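
The reachability distance follows directly from the definition as a maximum of two values. The sketch below is illustrative only: the function names, the `None` return for "undefined", and the sample points are assumptions, and p is counted toward MinPts when its core distance is computed.

```python
import math

def core_distance(points, i, eps, min_pts):
    """Core distance of point i, or None if point i is not a core object."""
    dists = sorted(math.dist(points[i], q) for q in points)
    if len(dists) < min_pts or dists[min_pts - 1] > eps:
        return None
    return dists[min_pts - 1]

def reachability_distance(points, p, r, eps, min_pts):
    """Reachability distance of point r w.r.t. point p (both given as indices)."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None  # undefined when p is not a core object
    return max(cd, math.dist(points[p], points[r]))

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (-2.0, 0.0), (0.0, -2.5), (3.0, 0.0)]
eps, min_pts = 3.0, 5

print(reachability_distance(points, 0, 1, eps, min_pts))  # 2.5 (the core distance of p)
print(reachability_distance(points, 0, 5, eps, min_pts))  # 3.0 (the distance d(p, r'))
print(reachability_distance(points, 5, 0, eps, min_pts))  # None (points[5] is not core)
```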

22-31
OPTICS [slides 22 to 31: no text extracted]

32
OPTICS
Color image segmentation using density-based clustering

33
DENCLUE (DENsity-based CLUstEring)
Major features:
- Solid mathematical foundation
- Good for data sets with large amounts of noise
- Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
- Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
- But needs a large number of parameters

34
DENCLUE Technical Essence
Uses grid cells but keeps information only about grid cells that actually contain data points, and manages these cells in a tree-based access structure.

35
DENCLUE Technical Essence
DENCLUE is based on the following concepts: influence function, density function, and density attractors.

36
DENCLUE

37
DENCLUE
Density function: for a data space of N points, D = {x_1, …, x_N}, the density function at x is defined as the sum of the influence functions of all data points at x: f^D(x) = Σ_{i=1}^{N} f^{x_i}(x), where f^{x_i}(x) denotes the influence of data point x_i at x. The goal of the definition: identify all "significant" local maxima x_j*, j = 1, …, m, of f^D(x); then create a cluster C_j for each x_j* and assign to C_j all points of D that lie within the "region of attraction" of x_j*.

38
DENCLUE
Example: density computation. D = {x_1, x_2, x_3, x_4}; f^D_Gaussian(x) = influence(x_1) + influence(x_2) + influence(x_3) + influence(x_4) = 0.78. Remark: the density value of y would be larger than the one for x.
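
A density computation of this kind can be sketched with a Gaussian influence function. The form used here, exp(−d(x, x_i)² / (2σ²)), is the usual Gaussian influence function, but the function names, the value of σ, and the sample data are assumptions for this example and do not reproduce the slide's figure or its value of 0.78.

```python
import math

def gaussian_influence(x, xi, sigma):
    """Influence of data point xi at location x (assumed Gaussian form)."""
    return math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Density at x: the sum of the influences of all data points."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)

# Hypothetical data: three points close together and one far away.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
sigma = 1.0

near_group = density((0.3, 0.3), data, sigma)
near_outlier = density((5.0, 5.0), data, sigma)
print(near_group > near_outlier)  # True: more points contribute near the group
```

Each data point contributes at most 1 to the sum, so the density at any location lies between 0 and the number of data points.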

39
DENCLUE
Density attractors: density attractors are local maxima of the overall density function f^D(x). Clusters can then be determined mathematically by identifying density attractors. A hill-climbing algorithm guided by the gradient can be used to determine the density attractor of a set of data points.
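
Gradient-guided hill climbing toward a density attractor can be sketched as follows. This is a simplified illustration, not DENCLUE's full procedure: it assumes a Gaussian influence function, a fixed step length, and a stop rule of "halt when the density no longer increases"; all names, parameter values, and sample data are hypothetical.

```python
import math

def density(x, data, sigma):
    """Sum of Gaussian influences of all data points at x."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Gradient of the Gaussian density function at x."""
    grad = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) / sigma ** 2
        for d in range(len(x)):
            grad[d] += w * (xi[d] - x[d])
    return grad

def hill_climb(x, data, sigma, step=0.05, max_iter=1000):
    """Follow the gradient from x until the density stops increasing."""
    x = list(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = math.hypot(*g)
        if norm == 0.0:
            break
        candidate = [xd + step * gd / norm for xd, gd in zip(x, g)]
        if density(candidate, data, sigma) <= density(x, data, sigma):
            break  # no further improvement: x approximates the attractor
        x = candidate
    return tuple(x)

# Two well-separated groups of points.
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
a = hill_climb((1.0, 1.0), data, sigma=0.5)
b = hill_climb((4.0, 4.0), data, sigma=0.5)
print(a, b)  # a ends near the first group, b near the second
```

Points whose climbs end at (approximately) the same attractor would be assigned to the same cluster.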

40
DENCLUE
Density-attracted: a point x is density-attracted to a density attractor x* if there exists a set of points x_0, x_1, …, x_k such that x_0 = x, x_k = x*, and the gradient at x_{i-1} points in the direction of x_i for each consecutive pair of points in the chain.

41
DENCLUE

42
DENCLUE
Multicenter-defined clusters: a multicenter-defined cluster is a set of center-defined clusters linked by a path of significance.

43-45
DENCLUE [slides 43 to 45: no text extracted]

46
Grid-based
- Uses a multi-resolution grid data structure
- Clustering complexity depends on the number of populated grid cells, not on the number of objects in the dataset
- Several interesting methods: CS Tree (Clustering Statistical Tree), STING, WaveCluster

47
Grid-based
Basic grid-based algorithm:
1. Define a set of grid cells.
2. Assign objects to the appropriate grid cell and compute the density of each cell.
3. Eliminate cells whose density is below a certain threshold τ.
4. Form clusters from contiguous (adjacent) groups of dense cells (usually minimizing a given objective function).
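
The four steps above can be sketched as follows. The sketch makes several simplifying assumptions that are not in the slides: cells are axis-aligned with a fixed side length, the density of a cell is simply its point count, cells that touch (including diagonally) count as adjacent, and no objective function is minimized. The name `grid_cluster`, the label `-1` for points in eliminated cells, and the sample data are hypothetical.

```python
import math
from collections import defaultdict, deque
from itertools import product

def grid_cluster(points, cell_size, tau):
    """Return one cluster label per point; -1 marks points in sparse cells."""
    def cell_of(p):
        return tuple(math.floor(c / cell_size) for c in p)

    # Steps 1 and 2: assign points to cells and count points per cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[cell_of(p)].append(idx)

    # Step 3: keep only cells whose density reaches the threshold tau.
    dense = {c for c, members in cells.items() if len(members) >= tau}

    # Step 4: group adjacent dense cells into clusters (connected components).
    cell_label = {}
    cluster = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = cluster
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = cluster
                    queue.append(nb)
        cluster += 1

    return [cell_label.get(cell_of(p), -1) for p in points]

points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.3, 0.4),
          (5.5, 5.5), (5.6, 5.7), (9.0, 0.5)]
print(grid_cluster(points, cell_size=1.0, tau=2))
# [0, 0, 0, 0, 1, 1, -1]
```

After the initial assignment, all work is done on cells rather than on individual points, which is why the cost of the grouping step depends on the number of populated cells.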

48
Grid-based
Fast: no distance computations; clustering is performed on summaries and not on individual objects; complexity is usually O(number of populated grid cells) rather than O(number of objects); and it is easy to determine which clusters are neighboring.

50
?
