A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering Thanh Le, Tom Altman University of Colorado Denver July 19, 2011.


1 A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering Thanh Le, Tom Altman University of Colorado Denver July 19, 2011

2 Overview. Introduction: data clustering approaches and current challenges. fzSC: a novel fuzzy subtractive clustering method for FCM parameter initialization. Datasets: artificial and real datasets for testing fzSC. Experimental results. Discussion.

3 Clustering problem. Data points are clustered based on similarity and dissimilarity. Clusters are defined by: number of clusters; cluster boundaries and overlaps; compactness within clusters; separation between clusters.

4 Clustering approaches. Hierarchical approach. Partitioning approach: hard clustering (crisp cluster boundaries, crisp cluster membership) and soft/fuzzy clustering (soft/fuzzy membership, overlapping cluster boundaries; most appropriate for real problems).

5 Fuzzy C-Means algorithm. The model (the equation shown on the slide is not part of this transcript). Features: fuzzy membership and soft cluster boundaries; each data point can belong to multiple clusters, which provides more relationship information.

6 Fuzzy C-Means (contd.). Possibility-based model; fuzzy sets describe the clusters; model parameters are estimated by an iterative process; rapid convergence. Challenges: determining the number of clusters; initializing the partition matrix to avoid local optima.
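The iteration described on slides 5 and 6 can be sketched as follows. This is a minimal illustration of the textbook Fuzzy C-Means updates (membership-weighted centers, then inverse-distance memberships), not the authors' code. The function name `fcm`, the fuzzifier default `m=2.0`, the tolerance and the seeded random initial partition are all assumptions made for this example.

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Minimal Fuzzy C-Means sketch; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random fuzzy partition: u[k][i] is the membership of point i in cluster k.
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(u[k][i] for k in range(c))
        for k in range(c):
            u[k][i] /= s  # memberships of each point sum to 1
    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[k][i] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: proportional to distance^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # The point sits exactly on a center: give it crisp membership.
                zeros = [k for k in range(c) if dist[k] == 0.0]
                new = [1.0 / len(zeros) if k in zeros else 0.0 for k in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                s = sum(inv)
                new = [v / s for v in inv]
            for k in range(c):
                shift = max(shift, abs(new[k] - u[k][i]))
                u[k][i] = new[k]
        if shift < tol:  # stop when the partition matrix no longer changes
            break
    return centers, u
```

Because the starting partition is random, different seeds can end in different local optima on harder data, which is the initialization problem the talk addresses.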

7 Methods for partition matrix initialization. Based on randomization. Problem: different randomization methods depend on different data distributions. Using heuristic algorithms (Particle Swarm). Problem: slow convergence because of velocity adjustment. Integrated with optimization algorithms. Problem: still based on other methods of partition matrix initialization.

8 Methods for partition matrix initialization (contd.): using Subtractive Clustering. Mountain function: the data density; parameter: mountain peak radius. Mountain amendment: density adjustment; parameter: mountain radius. Cluster candidate: the most dense data point; parameter: threshold to stop the cluster center selection. (The formulas and parameter symbols on the slide did not survive extraction.)

9 Subtractive Clustering method: the problems. Mountain peak radius? Remaining density to be selected? Mountain radius? Computational time: O(n^2).
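For orientation, the three steps on slide 8 can be sketched as below. The slide's own formulas are missing from the transcript, so this example assumes the commonly used Gaussian form of subtractive clustering. The function name and the parameter names `peak_radius`, `mountain_radius` and `stop_ratio` are hypothetical stand-ins for the three parameters the slides mention.

```python
import math

def subtractive_clustering(points, peak_radius, mountain_radius, stop_ratio):
    """Sketch of classical subtractive clustering (assumed Gaussian form)."""
    n = len(points)
    alpha = 4.0 / peak_radius ** 2      # width of the mountain function
    beta = 4.0 / mountain_radius ** 2   # width of the amendment
    # Mountain function: density at every point, O(n^2) pairwise distances.
    density = [
        sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points)
        for p in points
    ]
    first_peak = max(density)
    centers = []
    while len(centers) < n:
        # Cluster candidate: the most dense remaining data point.
        best = max(range(n), key=density.__getitem__)
        peak = density[best]
        if peak < stop_ratio * first_peak:
            break  # remaining density is too low to justify another center
        centers.append(points[best])
        # Mountain amendment: remove the density explained by the new center.
        for i in range(n):
            density[i] -= peak * math.exp(
                -beta * math.dist(points[i], points[best]) ** 2)
    return centers
```

All three arguments must be chosen by the user and the density step compares every pair of points, which are the problems listed on slide 9.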

10 The proposed method: fzSC for partition matrix initialization. 1. Generate a random fuzzy partition. 2. Compute cluster density using a histogram. 3. Use the strong uniform fuzzy partition concept. 4. Estimate the mountain function based on cluster density. 5. Amend the mountain function: (a) update cluster density (step 2); (b) re-estimate the mountain function (step 4).

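11 fzSC: Optimal number of clusters. 1. The most dense data point is a cluster candidate. Data density is not much affected: say, less than 0.05 of the data density is removed by the mountain function amendment process. The number of such points is less than a bound in n (the symbol before n is lost in this transcript). 2. The three subtractive clustering parameters (mountain peak radius, mountain radius, stopping threshold) are not required. 3. Computational time: O(c*n).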

12 Datasets. Artificial datasets: finite mixture model based datasets; a manually created (MC) dataset, for which data were generated using a finite mixture model and clusters were then moved to have different distances among clusters. Real datasets: Iris, Wine, Glass and Breast Cancer Wisconsin datasets from the UC Irvine Machine Learning Repository.

13 Visualization of fzSC result on the manually created (MC) dataset. Rectangles: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC.

14 A visualization… Stars: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC. The utility is available online: http://ouray.ucdenver.edu/~tnle/fzsc/

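15 Experimental results on manually created dataset. The algorithm performance on the MC dataset (correctness ratio by class; cells left blank in the flattened slide are read as 1.00, which agrees with the reported averages):
Algorithm   Class 1  Class 2  Class 3  Class 4  Class 5  Class 6  Avg. ratio
fzSC        1.00     1.00     1.00     1.00     1.00     1.00     1.00
k-means     0.97     0.87     1.00     1.00     1.00     0.75     0.93
k-medians   0.95     0.82     1.00     1.00     1.00     0.62     0.90
FCM         0.97     1.00     0.95     1.00     1.00     0.96     0.98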
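The slides do not say how the correctness ratio is computed. A plausible reading, sketched here as an assumption, is the fraction of points in each true class that received the matching label, with the average taken over classes. The function name is hypothetical, and the sketch assumes the cluster labels have already been matched to the class labels.

```python
from collections import Counter

def correctness_by_class(true_labels, predicted_labels):
    """Per-class fraction of correctly labeled points and their unweighted mean."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    ratios = {cls: hits[cls] / totals[cls] for cls in sorted(totals)}
    average = sum(ratios.values()) / len(ratios)
    return ratios, average
```

Averaging over classes rather than over points gives small classes the same weight as large ones.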

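16 Experimental results on artificial datasets. Correctness ratio in determining cluster number. Rows: the number of clusters generated in the dataset (5 to 9). Columns: the dataset dimension (2 to 5). Values as they survive in the transcript (the column positions of blank cells are lost): 5 clusters: 0.97, 1.00. 6 clusters: 0.98, 0.90, 1.00. 7 clusters: no values. 8 clusters: 0.99, 0.97, 1.00. 9 clusters: 0.87, 0.99, 1.00, 0.96.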

17 Experimental results on real datasets. Correctness ratio in determining cluster number:
Dataset                   # data points  Known # clusters  Predicted # clusters  Ratio
Iris                      150            3                 3                     1.00
Wine                      178            3                 3                     1.00
Glass                     214            6                 6                     0.95
                                                           5                     0.05
Breast Cancer Wisconsin   699            6                 6                     0.65
                                                           5                     0.35

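18 Discussion: The advantages of fzSC. Compared with traditional subtractive clustering: the three parameters (mountain peak radius, mountain radius, stopping threshold) are not required; computational time is O(c*n) vs. O(n^2). Compared with heuristic-based approaches: rapid convergence; escapes local optima. Compared with probability-model-based approaches: rapid convergence; no assumption about the data distribution.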

19 Discussion: Future work Combine fzSC with biological cluster validation methods and optimization algorithms for novel clustering algorithms regarding the gene expression data analysis problem.

20 Thank you! Questions? We acknowledge the support from Vietnamese Ministry of Education and Training, the 322 scholarship program.

