Clustering By: Avshalom Katz. We will be talking about… What is Clustering? Different Kinds of Clustering What is DBSCAN? Pseudocode Example of Clustering.


1 Clustering By: Avshalom Katz

2 We will be talking about…
- What is Clustering?
- Different Kinds of Clustering
- What is DBSCAN?
- Pseudocode
- Example of Clustering
- Definitions of parameters
- Complexity

3 What is Clustering? Clustering is the assignment of a set of observations to subsets (called clusters) so that observations in the same cluster are similar in some sense.

4 Different types of Clustering
- Biology
- Information retrieval
- Climate
- Business
- Clustering for utility
- Summarization

5 Example

6 DIFFERENT KINDS OF CLUSTERS

7 Well Separated

8 Prototype based

9 Graph based

10 Density based

11 Shared property (conceptual clusters)

12 DBSCAN: Introduction. DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. Since society started using databases, the amount of information we handle has been growing rapidly. As a result, automatic algorithms have been introduced into nearly every field.

13 Database Example

14 Density-Based Spatial Clustering of Applications with Noise. There are four main steps in the algorithm, and the algorithm takes two parameters: 1. MINPTS: the minimum number of points a neighborhood must contain to count as dense. 2. EPS: the distance around a point within which the density is checked.

15 Definition 1 (EPS-neighborhood): Find all points adjacent to a point P. A point is called “adjacent” to P only if its distance from P is no greater than EPS. The set of all points adjacent to P is denoted Neps(P).
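
As a minimal sketch of Definition 1, the neighborhood can be computed by a linear scan. The function name `region_query` mirrors the `regionQuery` call in the pseudocode slides; the list-of-tuples representation, the Euclidean distance, and the sample points are assumptions made only for this illustration.

```python
import math

def region_query(points, p, eps):
    """Return the indices of all points within distance eps of points[p].

    The point itself is included, since its distance to itself is 0.
    """
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]

# Hypothetical sample data: three nearby points and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.8), (3.0, 3.0)]
neighbors = region_query(points, 0, eps=1.0)
print(neighbors)  # [0, 1, 2]
```

A linear scan costs O(n) per query; a spatial index would be needed to do better.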

16 Definition 2 (directly density-reachable): A point p is directly density-reachable from a point q if p is in Neps(q) and q is a core point, that is, the size of Neps(q) is at least MINPTS.
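
The following sketch illustrates Definition 2 under the assumption that a core point is one whose neighborhood holds at least MINPTS points, as in the pseudocode slides. The helper names `is_core` and `directly_density_reachable` and the sample points are hypothetical, chosen only to show that the relation is not symmetric.

```python
import math

def region_query(points, p, eps):
    """Indices of all points within distance eps of points[p] (p included)."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]

def is_core(points, q, eps, min_pts):
    """A point is a core point if its neighborhood holds at least min_pts points."""
    return len(region_query(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in the neighborhood of q and q is a core point."""
    return p in region_query(points, q, eps) and is_core(points, q, eps, min_pts)

# Hypothetical sample data: point 1 is a core point, point 3 is a border point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.8), (1.4, 0.0)]
print(directly_density_reachable(points, 3, 1, eps=1.0, min_pts=3))  # True
print(directly_density_reachable(points, 1, 3, eps=1.0, min_pts=3))  # False
```

Point 3 is reachable from point 1, but not the other way round, because point 3 has too few neighbors to be a core point.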

17 Definition 3 (density-reachable): A point p is density-reachable from a point q if there is a sequence of points whose first point is q and whose last point is p, such that each point in the sequence is directly density-reachable from the one before it.
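
One way to check Definition 3 is a breadth-first search that only continues through core points. This is a sketch rather than part of the original slides; the function name `density_reachable` and the points on a line are assumptions for illustration.

```python
import math
from collections import deque

def region_query(points, p, eps):
    """Indices of all points within distance eps of points[p] (p included)."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain of directly density-reachable steps leads from q to p."""
    visited = {q}
    queue = deque([q])
    while queue:
        current = queue.popleft()
        neighbors = region_query(points, current, eps)
        if len(neighbors) < min_pts:
            continue  # not a core point, so no chain can continue through it
        for n in neighbors:
            if n == p:
                return True
            if n not in visited:
                visited.add(n)
                queue.append(n)
    return False

# Hypothetical sample data: points 1 and 2 are core points, 0 and 3 are border
# points, and point 4 is isolated.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(density_reachable(points, 3, 1, eps=1.0, min_pts=3))  # True
print(density_reachable(points, 1, 3, eps=1.0, min_pts=3))  # False
```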

18 Definition 4 (density-connected): Two points p and q are density-connected if there is a single point o from which both are density-reachable, possibly in different directions. For example, in the slide's diagram P and Q are both density-reachable from O. Therefore, P and Q are density-connected.
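
Definition 4 can be sketched by brute force: try every point as the candidate o and test whether both points are density-reachable from it. The names `reachable_from` and `density_connected`, and the sample points, are assumptions for illustration. The search over all candidates is deliberately simple, not efficient.

```python
import math

def region_query(points, p, eps):
    """Indices of all points within distance eps of points[p] (p included)."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]

def reachable_from(points, o, eps, min_pts):
    """Set of indices density-reachable from o (empty if o is not a core point)."""
    reached, expanded, stack = set(), set(), [o]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.add(current)
        neighbors = region_query(points, current, eps)
        if len(neighbors) >= min_pts:  # only core points extend the chain
            for n in neighbors:
                reached.add(n)
                if n not in expanded:
                    stack.append(n)
    return reached

def density_connected(points, p, q, eps, min_pts):
    """True if some point o has both p and q density-reachable from it."""
    for o in range(len(points)):
        reached = reachable_from(points, o, eps, min_pts)
        if p in reached and q in reached:
            return True
    return False

# Hypothetical sample data: border points 0 and 3 are connected through the
# core points 1 and 2, while point 4 is isolated.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
print(density_connected(points, 0, 3, eps=1.0, min_pts=3))  # True
print(density_connected(points, 0, 4, eps=1.0, min_pts=3))  # False
```

Points 0 and 3 are not density-reachable from each other, since neither is a core point, yet they are density-connected. This is why the cluster definition relies on connectivity rather than reachability.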

19 Definition 5 (cluster): A cluster C with respect to EPS and MINPTS is a non-empty subset of the database that satisfies the following two conditions: 1. If p is a member of C and q is density-reachable from p (which requires the size of Neps(p) to be at least MINPTS), then q is also a member of C. 2. If p and q are both members of C, then p and q are density-connected to each other.

20 Definition 6 (noise): Once the database has been divided into clusters, every point that does not belong to any cluster is called “noise”.

21 [Animated example: DBSCAN ( Eps = ε, MinPts = 3 ) run on points labelled A through V. At each step the animation shows the number of adjacent points, the stack of points still to be visited, and the current ClusterId (first green, then purple). Points that end up in no cluster are marked as noise.]

22 Pseudocode of the algorithm

    DBSCAN (SetOfPoints, Eps, MinPts)
      // SetOfPoints is UNCLASSIFIED
      ClusterId := nextId(NOISE);
      FOR i FROM 1 TO SetOfPoints.size DO
        Point := SetOfPoints.get(i);
        IF Point.ClId = UNCLASSIFIED THEN
          IF ExpandCluster(SetOfPoints, Point, ClusterId, Eps, MinPts) THEN
            ClusterId := nextId(ClusterId)
          END IF
        END IF
      END FOR
    END; // DBSCAN

23

    ExpandCluster(SetOfPoints, Point, ClId, Eps, MinPts) : Boolean;
      seeds := SetOfPoints.regionQuery(Point, Eps);
      IF seeds.size < MinPts THEN // no core point
        SetOfPoints.changeClId(Point, NOISE);
        RETURN False;
      ELSE
        // all points in seeds are density-reachable from Point
        SetOfPoints.changeClIds(seeds, ClId);
        seeds.delete(Point);
        WHILE seeds <> Empty DO
          currentP := seeds.first();
          result := SetOfPoints.regionQuery(currentP, Eps);
          IF result.size >= MinPts THEN
            FOR i FROM 1 TO result.size DO
              resultP := result.get(i);
              IF resultP.ClId IN {UNCLASSIFIED, NOISE} THEN
                IF resultP.ClId = UNCLASSIFIED THEN
                  seeds.append(resultP);

24

                END IF;
                SetOfPoints.changeClId(resultP, ClId);
              END IF; // UNCLASSIFIED or NOISE
            END FOR;
          END IF; // result.size >= MinPts
          seeds.delete(currentP);
        END WHILE; // seeds <> Empty
        RETURN True;
      END IF
    END; // ExpandCluster
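
The pseudocode above can be translated into a short runnable sketch. The list of labels, the constants `UNCLASSIFIED` and `NOISE`, the integer cluster ids starting at 0, and the linear-scan `region_query` are assumptions of this illustration. The original pseudocode leaves the data structure and the index behind `regionQuery` open.

```python
import math
from collections import deque

UNCLASSIFIED = None  # assumed marker for points not yet visited
NOISE = -1           # assumed marker for noise points

def region_query(points, p, eps):
    """Indices of all points within distance eps of points[p] (p included)."""
    return [i for i, q in enumerate(points) if math.dist(points[p], q) <= eps]

def expand_cluster(points, labels, p, cluster_id, eps, min_pts):
    """Grow a cluster from p; return False if p is not a core point."""
    neighbors = region_query(points, p, eps)
    if len(neighbors) < min_pts:
        labels[p] = NOISE  # may later be relabelled as a border point
        return False
    for s in neighbors:
        labels[s] = cluster_id
    seeds = deque(s for s in neighbors if s != p)
    while seeds:
        current = seeds.popleft()
        result = region_query(points, current, eps)
        if len(result) >= min_pts:  # current is a core point
            for r in result:
                if labels[r] is UNCLASSIFIED or labels[r] == NOISE:
                    if labels[r] is UNCLASSIFIED:
                        seeds.append(r)  # still has to be expanded itself
                    labels[r] = cluster_id
    return True

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNCLASSIFIED] * len(points)
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] is UNCLASSIFIED:
            if expand_cluster(points, labels, p, cluster_id, eps, min_pts):
                cluster_id += 1
    return labels

# Hypothetical sample data: two dense squares and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Taking the first seed with `popleft()` at the start of the loop has the same effect as the `seeds.first()` and `seeds.delete(currentP)` pair in the pseudocode, because new seeds are only ever appended at the end.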

25 Example

26 Defining the value of the parameter EPS by MINPTS:
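
The figure for this slide is not preserved in the transcript. As an assumption about what it showed, the sketch below uses the sorted k-distance heuristic described in the DBSCAN paper by Ester et al. (1996) listed in the bibliography. For k derived from MINPTS, it computes each point's distance to its k-th nearest neighbor and sorts those distances in descending order. EPS is then read off where the sorted values drop sharply. The function name `sorted_k_distances` and the sample points are hypothetical.

```python
import math

def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists, reverse=True)

# Hypothetical sample data: two dense squares and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.0)]
k_dists = sorted_k_distances(points, k=3)
print([round(d, 2) for d in k_dists])
```

In this sample the isolated point has a much larger 3-distance than the points inside the squares. An EPS just above the small values would separate the two squares from the noise point.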

27 The complexity: With a spatial index such as an R*-tree, a single region query takes O(log n) on average on a database of size n. The algorithm performs at most one region query for each of the n points, so the overall complexity is O(n * log(n)).

28 Bibliography
Ankerst, M., Breunig, M. M., Kriegel, H.-P., and Sander, J. (1999). OPTICS: ordering points to identify the clustering structure. SIGMOD Rec., 28(2):49-60.
Clustering. (2010, April 19). In Wikipedia, The Free Encyclopedia. Retrieved 14:14, April 19, 2010 from http://en.wikipedia.org/w/index.php?title=Clustering&oldid=357078594
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise.
Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1995). A Database Interface for Clustering in Large Spatial Databases. Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining, Montreal, Canada, 1995, AAAI Press, 1995.
Schikuta, E., and Erhart, M. (1997). The BANG-clustering system: Grid-based data analysis. Proc. Sec. Int. Symp. IDA-97, Vol. 1280 LNCS, London, UK, Springer-Verlag, 1997.

