Clustering Methods. Professor: Dr. Mansouri. Presented by: Muhammad Abouei & Mohsen Ghahremani Manesh.

2 Clustering Methods
- Density-Based Clustering Methods
  - DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  - OPTICS (Ordering Points To Identify the Clustering Structure)
  - DENCLUE (DENsity-based CLUstEring)
- Grid-based Clustering

3 Density-Based Clustering

4 DBSCAN Concepts
- ε-neighborhood: the set of points within distance ε (the radius) of a point.
- MinPts: the minimum number of points required in the ε-neighborhood of a point for that point to count as dense.
- ε and MinPts are user-defined parameters.
Figure: ε-neighborhood of p and ε-neighborhood of q, MinPts = 5.

5 DBSCAN Concepts
- Density: the number of points within a specified radius ε of a point.
Figure: Density(p) = 5.
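The ε-neighborhood and density definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names `eps_neighborhood` and `density`, the sample points and the Euclidean metric are all assumptions, and the point itself is counted as a member of its own neighborhood.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def density(points, i, eps):
    """Density of points[i]: the size of its eps-neighborhood."""
    return len(eps_neighborhood(points, i, eps))

# Hypothetical sample data: four nearby points and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4), (3.0, 3.0)]

print(eps_neighborhood(points, 0, eps=1.0))
print(density(points, 0, eps=1.0))
```

The brute-force scan costs O(n) per query, which is where the quadratic worst case of DBSCAN without a spatial index comes from.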

6 DBSCAN Concepts
- Core point: a point is a core point if its ε-neighborhood contains at least MinPts points.
- Core points lie in the interior of a cluster.
Figure: ε-neighborhood of p and ε-neighborhood of q; p is a core point (MinPts = 5), q is not a core point.

7 DBSCAN Concepts
- Directly density-reachable (DDR): a point p is directly density-reachable from a point q w.r.t. ε, MinPts if
  1. p belongs to the ε-neighborhood of q, and
  2. q is a core point.
- DDR is an asymmetric relation.
Figure: MinPts = 4; p is DDR from q, but q is not DDR from p.

8 DBSCAN Concepts
- Density-reachable (DR): a point p is density-reachable from a point q w.r.t. ε, MinPts if there is a chain of points P_1, …, P_n with P_1 = q and P_n = p such that P_{i+1} is directly density-reachable from P_i.
- Equivalently, p is density-reachable from q if there is a chain of points from q to p in which every point except possibly p is a core point.
- DR is an asymmetric relation.
Figure: MinPts = 4; p is DR from q, but q is not DR from p because p is not a core point.

9 DBSCAN Concepts
- Density-connectivity (DC): a point p is density-connected to a point q w.r.t. ε, MinPts if there is a point r such that both p and q are density-reachable from r w.r.t. ε and MinPts.
- DC is a symmetric relation.
Figure: MinPts = 4; p and q are density-connected.
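The reachability relations above can be checked with a small breadth-first search that only expands through core points. The sketch below is illustrative: the helper names, the "at least MinPts, counting the point itself" convention for core points and the sample data are assumptions, not part of the slides.

```python
import math
from collections import deque

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def density_reachable_from(points, q, eps, min_pts):
    """Set of indices density-reachable from q; empty if q is not a core point."""
    if len(eps_neighborhood(points, q, eps)) < min_pts:
        return set()
    reached = {q}
    queue = deque([q])
    while queue:
        current = queue.popleft()
        nbrs = eps_neighborhood(points, current, eps)
        if len(nbrs) < min_pts:
            continue  # not a core point: it is reached, but cannot extend the chain
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point r has both p and q density-reachable from it."""
    return any(
        {p, q} <= density_reachable_from(points, r, eps, min_pts)
        for r in range(len(points))
    )

# Hypothetical data: five points on a line; the two end points are not core.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
eps, min_pts = 1.1, 3

print(4 in density_reachable_from(points, 1, eps, min_pts))  # end point from a core point
print(1 in density_reachable_from(points, 4, eps, min_pts))  # core point from the end point
print(density_connected(points, 0, 4, eps, min_pts))
```

In this data set the end point is density-reachable from an interior core point but not the other way round, which shows the asymmetry of DR, while the two end points are still density-connected through the interior.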

10 DBSCAN Concepts
- Border point: a border point has fewer than MinPts points within ε, but lies in the ε-neighborhood of a core point.
Figure: MinPts = 5, ε = circle radius.

11 DBSCAN Concepts
- Noise (outlier) point: any point that is neither a core point nor a border point.
Figure: MinPts = 5, ε = circle radius.
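The three point types (core, border, noise) can be assigned with one pass over the ε-neighborhoods. The following is a minimal sketch under stated assumptions: `classify_points` is a hypothetical helper, a point counts itself in its own neighborhood, and a core point needs at least MinPts neighbors.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    neighborhoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nbrs in enumerate(neighborhoods) if len(nbrs) >= min_pts}
    labels = []
    for i, nbrs in enumerate(neighborhoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a plus-shaped group around the origin and one isolated point.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1), (5, 5)]
print(classify_points(points, eps=1.1, min_pts=5))
```

Only the center of the plus shape has five points within ε, so it is the single core point; the four arms are border points and the isolated point is noise.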

12 DBSCAN Concepts
- DBSCAN relies on a density-based notion of cluster.
- Cluster: a cluster C is a non-empty set of density-connected points that is maximal w.r.t. density-reachability.
- Maximality: for all p, q: if q ∈ C and p is density-reachable from q w.r.t. ε and MinPts, then also p ∈ C.
Figure: MinPts = 3, ε = circle radius.

13 DBSCAN Algorithm
- Arbitrarily select a point p.
- Retrieve all points density-reachable from p w.r.t. ε and MinPts.
- If p is a core point, a cluster is formed.
- If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
- Continue the process until all of the points have been processed.
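The steps above can be sketched as a short brute-force implementation. This is an illustrative version under stated assumptions, not the presenters' code: the function name `dbscan`, the label `NOISE = -1`, the "at least MinPts, counting the point itself" convention and the sample data are all choices made for this example.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one cluster id per point (NOISE for outliers), using O(n^2) region queries."""
    def region_query(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is core: keep expanding through it
        cluster_id += 1
    return labels

# Hypothetical data: two square groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Points first marked as noise are relabeled if a later core point reaches them, which is how border points end up inside a cluster.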

14 DBSCAN
Figure: MinPts = 4.

15 DBSCAN
- DBSCAN is sensitive to its parameters.
Figure: MinPts = 4.

16 DBSCAN
- Core, border and noise points: MinPts = 4, ε = 10.
Figure panels: original points; point types (core, border and noise).

17 DBSCAN
When DBSCAN works well:
- Resistant to noise
- Can handle clusters of different shapes and sizes
Figure panels: original points; clusters.

18 DBSCAN
When DBSCAN does not work well:
- Varying densities
- High-dimensional data

19 DBSCAN Complexity
- If a spatial index (e.g., a kd-tree or R*-tree) is used, the computational complexity of DBSCAN is O(n log n), where n is the number of database objects.
- Otherwise, it is O(n^2).

20 OPTICS
- Core distance of p: the smallest radius ε′ ≤ ε that makes p a core object. If p is not a core object w.r.t. ε and MinPts, its core distance is undefined.
Figure: the core distance ε′ of p is the distance between p and its 4th nearest neighbor (MinPts = 5, ε = 3 cm).

21 OPTICS
- Reachability distance of r w.r.t. p: the greater of the core distance of p and the Euclidean distance between p and r. If p is not a core object, the reachability distance of r w.r.t. p is undefined.
Figure: reachability-distance_{ε, MinPts}(p, r) = ε′ and reachability-distance_{ε, MinPts}(p, r′) = d(p, r′), with MinPts = 5 and ε = 3 cm.
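The two OPTICS distances can be written down directly from their definitions. The sketch below is an assumption-laden illustration: the function names are hypothetical, `None` stands for "undefined", a point counts itself toward MinPts (so with MinPts = 5 the core distance is the distance to the 4th nearest neighbor, as on the slide), and the coordinates are invented.

```python
import math

def core_distance(points, p, eps, min_pts):
    """Distance from points[p] to its (min_pts - 1)-th nearest neighbor, or None if p is not core."""
    dists = sorted(math.dist(points[p], q) for q in points)  # dists[0] == 0.0 is p itself
    if len(dists) < min_pts:
        return None
    d = dists[min_pts - 1]
    return d if d <= eps else None

def reachability_distance(points, p, r, eps, min_pts):
    """max(core distance of p, d(p, r)), or None if p is not a core object."""
    cd = core_distance(points, p, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[p], points[r]))

# Hypothetical data: p is index 0; its 4th nearest neighbor is 2.5 away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (-2.0, 0.0), (0.0, -2.5), (0.0, 2.8)]
eps, min_pts = 3.0, 5

print(core_distance(points, 0, eps, min_pts))             # core distance of p
print(reachability_distance(points, 0, 1, eps, min_pts))  # r closer than the core distance
print(reachability_distance(points, 0, 5, eps, min_pts))  # r' farther than the core distance
print(core_distance(points, 5, eps, min_pts))             # a point that is not core
```

A neighbor closer than the core distance gets the core distance as its reachability distance, while a neighbor farther away gets its actual distance, matching the two cases r and r′ in the figure.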

22-31 OPTICS
Figures only: these ten slides carry no text in the transcript.

32 OPTICS
- Color image segmentation using density-based clustering

33 DENCLUE
- DENCLUE (DENsity-based CLUstEring)
- Major features:
  - Solid mathematical foundation
  - Good for data sets with large amounts of noise
  - Allows a compact mathematical description of arbitrarily shaped clusters in high-dimensional data sets
  - Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
  - But needs a large number of parameters

34 DENCLUE
- Technical essence: uses grid cells, but only keeps information about grid cells that actually contain data points, and manages these cells in a tree-based access structure.

35 DENCLUE
- Technical essence: DENCLUE is based on the following concepts:
  - Influence function
  - Density function
  - Density attractors


37 DENCLUE
- Density function: the density function at x, based on a data space of N points D = {x_1, …, x_N}, is defined as the sum of the influence functions of all data points at x:
  f^D(x) = influence(x_1) + influence(x_2) + … + influence(x_N)
- The goal of the definition:
  - Identify all "significant" local maxima x_j*, j = 1, …, m, of f^D(x).
  - Create a cluster C_j for each x_j* and assign to C_j all points of D that lie within the "region of attraction" of x_j*.

38 DENCLUE
- Example: density computation for D = {x_1, x_2, x_3, x_4}:
  f^D_Gaussian(x) = influence(x_1) + influence(x_2) + influence(x_3) + influence(x_4) = 0.04 + 0.06 + 0.08 + 0.6 = 0.78
- Remark: the density value of y would be larger than the one for x.
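The density computation can be sketched in Python. The slide that defines the influence function is missing from this transcript, so the example assumes the Gaussian influence function commonly used with DENCLUE, exp(-d(x, x_i)^2 / (2σ^2)); the smoothing parameter `sigma`, the function names and the sample coordinates are assumptions and do not reproduce the numbers on the slide.

```python
import math

def gaussian_influence(x, xi, sigma):
    """Influence of data point xi at location x (assumed Gaussian kernel)."""
    return math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Density at x: the sum of the influences of all data points."""
    return sum(gaussian_influence(x, xi, sigma) for xi in data)

# Hypothetical data set with four points; three of them lie close together.
data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (4.0, 4.0)]

print(density((0.2, 0.2), data, sigma=1.0))  # location inside the dense group
print(density((2.0, 2.0), data, sigma=1.0))  # location between the group and the far point
```

A location surrounded by nearby points accumulates several large influence terms, so its density is higher than that of a location far from most of the data.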

39 DENCLUE
- Density attractors: density attractors are local maxima of the overall density function f^D(x).
- Clusters can then be determined mathematically by identifying density attractors.
- A hill-climbing algorithm guided by the gradient can be used to determine the density attractor of a set of data points.

40 DENCLUE
- Density-attracted: a point x is density-attracted to a density attractor x* if there exists a set of points x_0, x_1, …, x_k such that x_0 = x, x_k = x*, and the gradient of x_{i-1} is in the direction of x_i for 0 < i < k.
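A gradient-guided hill climb toward a density attractor might look like the sketch below. It rests on several assumptions that are not in the slides: a Gaussian influence function with parameter `sigma`, a fixed step length `step` along the normalized gradient, a stopping rule that halts as soon as a step no longer increases the density, and invented sample data.

```python
import math

def density(x, data, sigma):
    """Sum of (assumed) Gaussian influences of all data points at x."""
    return sum(math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) for xi in data)

def gradient(x, data, sigma):
    """Gradient of the Gaussian density at x."""
    g = [0.0] * len(x)
    for xi in data:
        w = math.exp(-math.dist(x, xi) ** 2 / (2 * sigma ** 2)) / sigma ** 2
        for d in range(len(x)):
            g[d] += (xi[d] - x[d]) * w
    return g

def hill_climb(x, data, sigma, step=0.05, max_iter=1000):
    """Follow the gradient in fixed-length steps until the density stops increasing."""
    x = list(x)
    for _ in range(max_iter):
        g = gradient(x, data, sigma)
        norm = math.hypot(*g)
        if norm == 0.0:
            break
        candidate = [xd + step * gd / norm for xd, gd in zip(x, g)]
        if density(candidate, data, sigma) <= density(x, data, sigma):
            break  # the next step would not climb any further
        x = candidate
    return x

# Hypothetical data: two small groups, one near (0, 0) and one near (5, 5).
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]

attractor_a = hill_climb((0.8, 0.8), data, sigma=0.5)
attractor_b = hill_climb((4.5, 4.5), data, sigma=0.5)
print(attractor_a, attractor_b)
```

Starting points that climb to the same attractor would be placed in the same cluster; here the two starts end near different groups. The result is only accurate to about one step length, so a smaller `step` trades speed for precision.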


42 DENCLUE
- Multicenter-defined clusters: a multicenter-defined cluster is a set of center-defined clusters linked by a path of significance.




46 Grid-based
- Uses a multi-resolution grid data structure
- Clustering complexity depends on the number of populated grid cells, not on the number of objects in the dataset
- Several interesting methods:
  - CS Tree (Clustering Statistical Tree)
  - STING
  - WaveCluster

47 Grid-based
Basic grid-based algorithm:
1. Define a set of grid cells.
2. Assign objects to the appropriate grid cell and compute the density of each cell.
3. Eliminate cells whose density is below a certain threshold τ.
4. Form clusters from contiguous (adjacent) groups of dense cells (usually minimizing a given objective function).
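The four steps above can be sketched for two-dimensional data as follows. This is a simplified illustration under stated assumptions: square cells of side `cell_size`, cell density measured as a raw point count, 8-connected adjacency, no objective function, and a hypothetical function name `grid_cluster`.

```python
from collections import Counter, deque

def grid_cluster(points, cell_size, tau):
    """Label each 2-D point with the id of its dense-cell cluster, or -1 if its cell is sparse."""
    def cell_of(p):
        return (int(p[0] // cell_size), int(p[1] // cell_size))

    # Steps 1 and 2: assign points to cells and count the points per cell.
    counts = Counter(cell_of(p) for p in points)
    # Step 3: keep only cells whose count reaches the threshold tau.
    dense = {cell for cell, n in counts.items() if n >= tau}
    # Step 4: group adjacent dense cells with a breadth-first search.
    cell_label = {}
    cluster_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = cluster_id
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = cluster_id
                        queue.append(nb)
        cluster_id += 1
    return [cell_label.get(cell_of(p), -1) for p in points]

# Hypothetical data: two populated regions and one isolated point.
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.5), (1.1, 0.2), (1.5, 0.7),
          (5.1, 5.1), (5.5, 5.5), (5.2, 5.9), (9.5, 0.5)]
print(grid_cluster(points, cell_size=1.0, tau=2))
```

After the counting pass, the clustering work touches only the populated cells, never pairs of points, so no distance computations are needed.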

48 Grid-based
Fast:
- No distance computations
- Clustering is performed on summaries and not on individual objects; complexity is usually O(no_of_populated_grid_cells) and not O(no_of_objects)
- Easy to determine which clusters are neighboring
