Presentation is loading. Please wait.

Presentation is loading. Please wait.

CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author:George et al. Advisor:Dr. Hsu Graduate:ZenJohn Huang IDSL seminar 2001/10/23.

Similar presentations


Presentation on theme: "CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author:George et al. Advisor:Dr. Hsu Graduate:ZenJohn Huang IDSL seminar 2001/10/23."— Presentation transcript:

1 CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author:George et al. Advisor:Dr. Hsu Graduate:ZenJohn Huang IDSL seminar 2001/10/23

2 Outline Motivation Objective Research restrict Literature review An overview of related clustering algorithms The limitations of clustering algorithms CHAMELEON Concluding remarks Personal opinion

3 Motivation Existing clustering algorithms can breakdown Choice of parameters is incorrect Model is not adequate to capture the characteristics of clusters Diverse shapes, densities, and sizes

4 Objective Presenting a novel hierarchical clustering algorithm – CHAMELEON Facilitating discovery of natural and homogeneous Being applicable to all types of data

5 Research Restrict In this paper, authors ignored the issue of scaling to large data sets that cannot fit in the main memory

6 Literature Review Clustering An overview of related clustering algorithms The limitations of the recently proposed state of the art clustering algorithms

7 Clustering The intracluster similarity is maximized and the intercluster similarity is minimized [Jain and Dubes, 1988] Serving as the foundation for data mining and analysis techniques

8 Clustering(cont ’ d) Applications Purchasing patterns Categorization of documents on WWW [Boley, et al., 1999] Grouping of genes and proteins that have similar functionality[Harris, et al., 1992] Grouping if spatial locations prone to earth quakes[Byers and Adrian, 1998]

9 An Overview of Related Clustering Algorithms Partitional techniques Hierarchical techniques

10 Partitional Techniques K means[Jain and Dubes, 1988]

11 Hierarchical Techniques CURE [Guha, Rastogi and Shim, 1998] ROCK [Guha, Rastogi and Shim, 1999]

12 Limitations of Existing Hierarchical Schemas CURE Fail to take into account special characteristics

13 Limitations of Existing Hierarchical Schemas(cont ’ d) ROCK Irrespective of densities and shapes

14 CHAMELEON Overview Modeling the data Modeling the cluster similarity A two-phase clustering algorithm Performance analysis Experimental Results

15 Overall Framework CHAMELEON

16 Modeling the Data K-nearest graphs from an original data in 2D

17 Modeling the Cluster Similarity Relative inter-connectivity

18 Modeling the Cluster Similarity(cont ’ d) Relative closeness

19 A Two-phase Clustering Algorithm Phase I: Finding initial sub-clusters

20 A Two-phase Clustering Algorithm(cont ’ d) Phase I: Finding initial sub-clusters Multilevel paradigm[Karypis & Kumar, 1999] hMeT|s [Karypis & Kumar, 1999]

21 A Two-phase Clustering Algorithm(cont ’ d) Phase II: Merging sub-clusters using a dynamic framework T RI, T RC : user specified threshold

22 A Two-phase Clustering Algorithm(cont ’ d) Phase II: Merging sub-clusters using a dynamic framework

23 Performance Analysis The amount of time required to compute K-nearest neighbor graph Two-phase clustering

24 Performance Analysis(cont ’ d) The amount of time required to compute K-nearest neighbor graph Low-dimensional data sets = O(n log n) High-dimensional data sets = O(n 2 )

25 Performance Analysis(cont ’ d) The amount of time required to compute Two-phase clustering Computing internal inter-connectivity and closeness for each cluster: O(nm) Selecting the most similar pair of cluster: O(n log n + m 2 log m) Total time = O(nm + n log n + m 2 log m)

26 Experimental Results Program DBSCAN: a publicly available version CURE: a locally implemented version Data sets Qualitative comparison

27 Data Sets Five clusters Different size, shape, and density Noise point Two clusters Close to each other Different region, different densities Six clusters Different size, shape, and orientation Random noise point Special artifacts Eight clusters Different size, shape, and orientation Random noise and special artifacts Eight clusters Different size, shape, density, and orientation Random noise point

28 Concluding remarks CHAMELEON can discover natural clusters of different shapes and sizes It is possible to use other algorithms instead of k-nearest neighbor graph Different domains may require different models for capturing closeness and inter-connectivity

29 Personal Opinion Without further work


Download ppt "CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling Author:George et al. Advisor:Dr. Hsu Graduate:ZenJohn Huang IDSL seminar 2001/10/23."

Similar presentations


Ads by Google