Published by Clinton Parsons. Modified over 2 years ago.

1
Part II - Clustering © Prentice Hall
Clustering Large DB
Most clustering algorithms assume a large data structure that is memory resident.
Clustering may be performed first on a sample of the database, then applied to the entire database.
Algorithms
–BIRCH
–DBSCAN
–CURE
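The sample-then-apply idea on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not say which algorithm clusters the sample, so plain k-means is swapped in here, and the names `cluster_large`, `scan`, and `sample_size` are hypothetical. `scan` is assumed to be a function that returns a fresh iterator over the database tuples.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a small, memory-resident list of points (assumed choice)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def cluster_large(scan, sample_size, k, seed=0):
    """Cluster a sample, then apply the result to every tuple in a second pass."""
    rng = random.Random(seed)
    sample = []
    # Reservoir sampling: one pass, memory bounded by sample_size.
    for n, p in enumerate(scan()):
        if n < sample_size:
            sample.append(p)
        else:
            j = rng.randint(0, n)
            if j < sample_size:
                sample[j] = p
    centers = kmeans(sample, k, seed=seed)
    # Apply to the entire database: label each tuple by its nearest center.
    labels = [nearest(p, centers) for p in scan()]
    return centers, labels
```

Only the sample and the k centers are held in memory while clustering; the labelling pass touches each tuple once. The list of labels is kept here for simplicity and could instead be written back to the database.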

2
Desired Features for Large Databases
One scan (or less) of DB
Online
Suspendable, stoppable, resumable
Incremental
Work with limited main memory
Different techniques to scan (e.g. sampling)
Process each tuple once

3
BIRCH
Balanced Iterative Reducing and Clustering using Hierarchies
Incremental, hierarchical, one scan
Save clustering information in a tree
Each entry in the tree contains information about one cluster
New nodes inserted in closest entry in tree

4
Clustering Feature
(N, LS, SS)
–N: Number of points in cluster
–LS: Sum of points in the cluster
–SS: Sum of squares of points in the cluster
CF Tree
–Balanced search tree
–Node has CF triple for each child
–Leaf node represents cluster and has CF value for each subcluster in it.
–Subcluster has maximum diameter
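A minimal sketch of the (N, LS, SS) triple, showing why it is enough to summarise a subcluster. It assumes SS is stored as a single scalar (the sum of squared norms) and that radius and diameter are the root-mean-square definitions commonly used with BIRCH; the class and method names are illustrative, not taken from the slides.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    """Clustering feature (N, LS, SS) for d-dimensional points."""
    n: int       # N: number of points
    ls: tuple    # LS: per-dimension sum of the points
    ss: float    # SS: sum of squared norms of the points (scalar, assumed)

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def __add__(self, other):
        # Additivity: the CF of a union is the component-wise sum of the CFs.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # Root-mean-square distance from the points to the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def diameter(self):
        # Root-mean-square pairwise distance between the points.
        if self.n < 2:
            return 0.0
        ls2 = sum(x * x for x in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls2) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))
```

For example, `CF.of_point((0, 0)) + CF.of_point((2, 0))` gives N = 2, LS = (2, 0), SS = 4, from which the centroid (1, 0) and the diameter 2 follow without revisiting the points. This additivity is what lets a parent node keep one CF triple per child.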

5
BIRCH Algorithm
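As a rough illustration of the incremental, one-scan insertion described on the earlier BIRCH slides, the sketch below keeps a flat list of subcluster CF triples instead of a balanced CF tree. That is a deliberate simplification, not the algorithm shown on this slide: there is no node splitting or tree rebuilding, and the names `insert`, `birch_flat`, and `threshold` are hypothetical.

```python
import math

def cf_add(cf, p):
    """Return the CF triple (N, LS, SS) after absorbing point p."""
    n, ls, ss = cf
    return (n + 1, tuple(a + b for a, b in zip(ls, p)), ss + sum(x * x for x in p))

def cf_centroid(cf):
    n, ls, _ = cf
    return tuple(x / n for x in ls)

def cf_diameter(cf):
    """Root-mean-square pairwise distance, computed from the triple alone."""
    n, ls, ss = cf
    if n < 2:
        return 0.0
    ls2 = sum(x * x for x in ls)
    return math.sqrt(max((2 * n * ss - 2 * ls2) / (n * (n - 1)), 0.0))

def insert(subclusters, p, threshold):
    """Absorb p into the closest subcluster, or start a new one."""
    if subclusters:
        i = min(range(len(subclusters)),
                key=lambda j: math.dist(p, cf_centroid(subclusters[j])))
        merged = cf_add(subclusters[i], p)
        # Absorb only if the subcluster stays within the maximum diameter.
        if cf_diameter(merged) <= threshold:
            subclusters[i] = merged
            return
    subclusters.append((1, tuple(p), sum(x * x for x in p)))

def birch_flat(points, threshold):
    subclusters = []
    for p in points:  # single scan; each tuple is processed once
        insert(subclusters, p, threshold)
    return subclusters
```

In a real CF tree the search for the closest entry descends from the root rather than scanning every subcluster, which is what keeps insertion cheap as the number of subclusters grows.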

6
Improve Clusters

7
DBSCAN
Density Based Spatial Clustering of Applications with Noise
Outliers will not affect creation of cluster.
Input
–MinPts – minimum number of points in cluster
–Eps – for each point in cluster there must be another point in it less than this distance away.

8
DBSCAN Density Concepts
Eps-neighborhood: Points within Eps distance of a point.
Core point: Eps-neighborhood dense enough (MinPts)
Directly density-reachable: A point p is directly density-reachable from a point q if the distance is small (Eps) and q is a core point.
Density-reachable: A point is density-reachable from another point if there is a path from one to the other in which each step is directly density-reachable, so every point on the path except possibly the last is a core point.
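The four definitions above can be written down almost literally. The sketch assumes Euclidean distance, a brute-force neighborhood search, and that a point counts as a member of its own Eps-neighborhood; points are referred to by their index in a list, and the function names are illustrative.

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core point has at least min_pts points in its Eps-neighborhood."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is directly density-reachable from points[q]."""
    return is_core(points, q, eps, min_pts) and p in eps_neighborhood(points, q, eps)

def density_reachable(points, p, q, eps, min_pts):
    """True if a chain of directly density-reachable steps leads from q to p."""
    seen, frontier = {q}, [q]
    while frontier:
        cur = frontier.pop()
        if not is_core(points, cur, eps, min_pts):
            continue  # only core points can extend the chain
        for j in eps_neighborhood(points, cur, eps):
            if j == p:
                return True
            if j not in seen:
                seen.add(j)
                frontier.append(j)
    return False
```

Note that direct density-reachability is not symmetric: a border point is reachable from a neighbouring core point, but the core point is not reachable from the border point.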

9
Density Concepts

10
DBSCAN Algorithm
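A compact sketch of DBSCAN built on the Eps and MinPts inputs from the earlier slides. It follows the commonly published form of the algorithm rather than this slide's own listing, assumes Euclidean distance, and uses a brute-force neighbor search (quadratic in the number of points) where a practical version would use a spatial index. The `NOISE` label value is an arbitrary choice.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE        # may later be claimed as a border point
            continue
        cluster += 1                 # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:   # j is a core point, so keep expanding
                queue.extend(nj)
    return labels
```

Points that end with the `NOISE` label are the outliers; they never start or extend a cluster, which is why they do not affect cluster creation.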

11
CURE
Clustering Using Representatives
Use many points to represent a cluster instead of only one
Points will be well scattered
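One way to pick well-scattered representatives is a farthest-point heuristic, sketched below. The slide does not give the selection procedure, so this is an assumption. The `shrink` step, which moves each representative a fraction `alpha` toward the centroid, comes from the published CURE method and is not stated on this slide. The parameter names `c` and `alpha` are illustrative.

```python
import math

def scattered_points(cluster, c):
    """Pick up to c well-scattered points with a farthest-point heuristic."""
    centroid = tuple(sum(x) / len(cluster) for x in zip(*cluster))
    # Start with the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(reps) < min(c, len(cluster)):
        # Next representative: the point farthest from those already chosen.
        reps.append(max(cluster, key=lambda p: min(math.dist(p, r) for r in reps)))
    return centroid, reps

def shrink(reps, centroid, alpha):
    """Move each representative a fraction alpha of the way to the centroid."""
    return [tuple(r_i + alpha * (m_i - r_i) for r_i, m_i in zip(r, centroid))
            for r in reps]

def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

Using several scattered points instead of a single centroid lets the cluster distance follow non-spherical shapes, while shrinking dampens the pull of outliers that were picked as representatives.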

12
CURE Approach

13
CURE Algorithm

14
CURE for Large Databases
