Part II - Clustering (© Prentice Hall): Clustering Large DB

Clustering Large DB
– Most clustering algorithms assume a large data structure that is memory resident.
– Clustering may be performed first on a sample of the database, then applied to the entire database.
– Algorithms
  – BIRCH
  – DBSCAN
  – CURE
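The sample-then-apply idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the slides do not say which algorithm runs on the sample, so plain k-means is swapped in, and the names `kmeans`, `nearest`, and `cluster_via_sample` are hypothetical. The database is modeled as an in-memory list, where a real system would scan it from disk.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iters=20, seed=0):
    """Plain in-memory k-means (an assumed choice), run only on the small sample."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if the group ended up empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def cluster_via_sample(database, k, sample_size, seed=0):
    """Cluster a random sample, then label every tuple of the full database."""
    rng = random.Random(seed)
    sample = rng.sample(database, sample_size)
    centers = kmeans(sample, k, seed=seed)
    # Single pass over the full database: each tuple is labeled once.
    labels = [nearest(p, centers) for p in database]
    return centers, labels
```

Only the sample has to fit in memory for the iterative part; the full database is touched in one labeling pass.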

Desired Features for Large Databases
– One scan (or less) of DB
– Online
– Suspendable, stoppable, resumable
– Incremental
– Work with limited main memory
– Different techniques to scan (e.g. sampling)
– Process each tuple once
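One way to combine "one scan", "limited main memory", and "sampling" is reservoir sampling. The slides do not name this technique; it is brought in here only as an assumed illustration, and `reservoir_sample` is a hypothetical name. The sketch draws a uniform sample of `k` tuples while reading each tuple exactly once and holding at most `k` tuples in memory.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for t, item in enumerate(stream):
        if t < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item number t+1 replaces a reservoir slot with probability k/(t+1).
            j = rng.randint(0, t)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Because the reservoir is a valid sample of everything seen so far, the scan can be stopped at any point and still yield a usable result.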

BIRCH
– Balanced Iterative Reducing and Clustering using Hierarchies
– Incremental, hierarchical, one scan
– Save clustering information in a tree
– Each entry in the tree contains information about one cluster
– New points are inserted into the closest entry in the tree

Clustering Feature
– (N, LS, SS)
  – N: Number of points in cluster
  – LS: Sum of points in the cluster
  – SS: Sum of squares of points in the cluster
– CF Tree
  – Balanced search tree
  – Node has CF triple for each child
  – Leaf node represents cluster and has CF value for each subcluster in it.
  – Subcluster has maximum diameter
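A CF triple is useful because it is additive and still lets you recover summary statistics without the original points. The sketch below is a minimal illustration under two assumptions: SS is stored as a scalar (the sum of squared Euclidean norms), and "diameter" means the root of the average squared pairwise distance. The class name `CF` and its method names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    n: int       # N: number of points
    ls: tuple    # LS: per-dimension sum of the points
    ss: float    # SS: sum of squared norms of the points (assumed scalar)

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), sum(x * x for x in p))

    def merge(self, other):
        """CF of the union of two disjoint clusters: add component-wise."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        """Root mean squared distance to the centroid: sqrt(SS/N - |LS/N|^2)."""
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def diameter(self):
        """Root of the average squared pairwise distance within the cluster."""
        if self.n < 2:
            return 0.0
        ls2 = sum(x * x for x in self.ls)
        value = (2 * self.n * self.ss - 2 * ls2) / (self.n * (self.n - 1))
        return math.sqrt(max(value, 0.0))
```

For example, merging the CFs of the points (0, 0) and (2, 0) gives N = 2, LS = (2, 0), SS = 4, from which the centroid (1, 0), radius 1, and diameter 2 follow directly.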

BIRCH Algorithm
[slide body not captured in transcript]
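As a hedged illustration of the insertion step that BIRCH performs (a new point goes to the closest entry, subject to a maximum subcluster diameter), here is a deliberately simplified sketch. It is not the algorithm from the slide: it keeps a flat list of leaf entries instead of a balanced CF tree, omits node splitting and rebuilding, and assumes a scalar SS. The names `absorb`, `centroid`, `diameter`, and `insert` are hypothetical.

```python
import math

def absorb(entry, p):
    """CF triple (N, LS, SS) of an entry after adding point p."""
    n, ls, ss = entry
    return (n + 1, tuple(a + b for a, b in zip(ls, p)), ss + sum(x * x for x in p))

def centroid(entry):
    n, ls, _ = entry
    return tuple(x / n for x in ls)

def diameter(entry):
    """Root of the average squared pairwise distance, computed from the CF alone."""
    n, ls, ss = entry
    if n < 2:
        return 0.0
    ls2 = sum(x * x for x in ls)
    return math.sqrt(max((2 * n * ss - 2 * ls2) / (n * (n - 1)), 0.0))

def insert(entries, p, threshold):
    """Add p to the closest entry, or open a new entry if the diameter would exceed threshold."""
    if entries:
        i = min(range(len(entries)), key=lambda j: math.dist(p, centroid(entries[j])))
        candidate = absorb(entries[i], p)
        if diameter(candidate) <= threshold:
            entries[i] = candidate
            return
    entries.append((1, tuple(p), sum(x * x for x in p)))

# Each point is seen once; only the CF triples are kept in memory.
entries = []
for point in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.5), (0.0, 0.5)]:
    insert(entries, point, threshold=2.0)
```

With the five points above and a threshold of 2.0, the run ends with two entries holding three and two points respectively.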

Improve Clusters
[slide body not captured in transcript]

DBSCAN
– Density Based Spatial Clustering of Applications with Noise
– Outliers will not affect the creation of clusters.
– Input
  – MinPts: minimum number of points in a cluster
  – Eps: for each point in a cluster there must be another point in it less than this distance away.

DBSCAN Density Concepts
– Eps-neighborhood: Points within Eps distance of a point.
– Core point: Eps-neighborhood dense enough (at least MinPts points)
– Directly density-reachable: A point p is directly density-reachable from a point q if the distance between them is small (at most Eps) and q is a core point.
– Density-reachable: A point is density-reachable from another point if there is a path from one to the other consisting of only core points.
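The density concepts above translate fairly directly into code. The following is a minimal sketch rather than a tuned implementation: it uses a brute-force neighborhood search, assumes Euclidean distance, and assumes that a point counts as a member of its own Eps-neighborhood. The names `region_query`, `dbscan`, and `NOISE` are hypothetical.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Eps-neighborhood of points[i]: indices of all points within eps (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may later become a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is itself a core point, so everything near it is density-reachable.
                frontier.extend(j_neighbors)
    return labels
```

Points that are never reached from any core point keep the `NOISE` label, which is how outliers are kept from influencing the clusters.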

Density Concepts
[slide body not captured in transcript]

CURE
– Clustering Using Representatives
– Use many points to represent a cluster instead of only one
– Points will be well scattered
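One plausible way to pick "well scattered" representatives is a greedy farthest-point selection. The slide does not spell out the selection rule, so treat this as an assumed sketch covering only the scattering step; the function name `scattered_representatives` and the parameter `c` (the number of representatives) are hypothetical.

```python
import math

def scattered_representatives(cluster, c):
    """Pick up to c points of the cluster that are spread far apart from each other."""
    centroid = tuple(sum(x) / len(cluster) for x in zip(*cluster))
    # Start with the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(reps) < min(c, len(cluster)):
        candidates = [p for p in cluster if p not in reps]
        if not candidates:
            break  # only duplicates of already chosen points remain
        # Next: the point whose nearest chosen representative is farthest away.
        reps.append(max(candidates, key=lambda p: min(math.dist(p, r) for r in reps)))
    return reps
```

For the collinear cluster (0, 0), (1, 0), (2, 0), (3, 0), (4, 0) and c = 3, the selection yields the two ends and then the middle point, which describes the cluster's extent better than its centroid alone.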
