A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering Thanh Le, Tom Altman University of Colorado Denver July 19, 2011.

Presentation transcript:

A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering
Thanh Le, Tom Altman
University of Colorado Denver
July 19, 2011

Overview
- Introduction: data clustering approaches and current challenges
- fzSC: a novel fuzzy subtractive clustering method for FCM parameter initialization
- Datasets: artificial and real datasets for testing fzSC
- Experimental results
- Discussion

Clustering problem
- Data points are clustered based on:
  - Similarity
  - Dissimilarity
- Clusters are defined by:
  - Number of clusters
  - Cluster boundaries and overlaps
  - Compactness within clusters
  - Separation between clusters

Clustering approaches
- Hierarchical approach
- Partitioning approach
  - Hard clustering approach
    - Crisp cluster boundaries
    - Crisp cluster membership
  - Soft/fuzzy clustering approach
    - Soft/fuzzy membership
    - Overlapping cluster boundaries
    - Most appropriate for real problems

Fuzzy C-Means algorithm
- The model (equation not preserved in this transcript)
- Features:
  - Fuzzy membership, soft cluster boundaries
  - Each data point can belong to multiple clusters, which provides more relationship information
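
The model equation did not survive in the transcript. For orientation, the standard textbook form of the FCM objective and its two update rules is given below. The notation is an assumption, not taken from the slide: n data points x_i, c cluster centers v_k, memberships u_ki, and fuzzifier m.

```latex
% Objective minimized by FCM
J_m(U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m} \, \lVert x_i - v_k \rVert^{2},
\qquad \text{subject to } \sum_{k=1}^{c} u_{ki} = 1,\; u_{ki} \in [0, 1],\; m > 1

% Alternating updates for centers and memberships
v_k = \frac{\sum_{i=1}^{n} u_{ki}^{m} \, x_i}{\sum_{i=1}^{n} u_{ki}^{m}},
\qquad
u_{ki} = \left[ \sum_{l=1}^{c} \left( \frac{\lVert x_i - v_k \rVert}{\lVert x_i - v_l \rVert} \right)^{2/(m-1)} \right]^{-1}
```

The constraint that each point's memberships sum to 1 is what lets a point belong to several clusters at once with different degrees.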

Fuzzy C-Means (contd.)
- Possibility-based model
- Fuzzy sets describe the clusters
- Model parameters are estimated using an iterative process
- Rapid convergence
- Challenges:
  - Determining the number of clusters
  - Initializing the partition matrix to avoid local optima
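
The iteration process and the initialization challenge named on this slide can be sketched as follows. This is a minimal version of the textbook FCM alternation, not code from the presentation. The names `fcm`, `U0`, `tol`, and `seed` are illustrative assumptions. The `U0` argument marks the place where an initial partition matrix would be supplied; without it, the sketch falls back on a random partition.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, U0=None, seed=0):
    """Textbook Fuzzy C-Means.

    X: (n, d) data, c: number of clusters, m > 1: fuzzifier.
    U0: optional (n, c) initial partition matrix (rows sum to 1).
    Returns the partition matrix U (n, c) and the centers V (c, d).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if U0 is None:
        # Random initial partition; each row is normalised to sum to 1.
        rng = np.random.default_rng(seed)
        U = rng.random((n, c))
        U /= U.sum(axis=1, keepdims=True)
    else:
        U = np.asarray(U0, dtype=float)

    for _ in range(max_iter):
        Um = U ** m
        # Centers: membership-weighted means of the data.
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against division by zero
        # Membership update, written in terms of squared distances.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V
```

Because the loop only descends from wherever `U0` places it, two different starting partitions can end in two different local optima, which is why the choice of initial partition matters.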

Methods for partition matrix initialization
- Based on randomization
  - Problem: different randomization methods depend on different data distributions
- Using heuristic algorithms: Particle Swarm
  - Problem: slow convergence because of velocity adjustment
- Integrated with optimization algorithms
  - Problem: still based on other methods of partition matrix initialization

Methods for partition matrix… (contd.): using Subtractive Clustering
- Mountain function: the data density. Parameter: mountain peak radius
- Mountain amendment: density adjustment. Parameter: mountain radius
- Cluster candidate: the most dense data point. Parameter: threshold to stop the cluster center selection
(The equations and the parameter symbols are not preserved in this transcript.)

Subtractive Clustering method: the problems
- Mountain peak radius?
- Remaining density to be selected?
- Mountain radius?
- Computational time: O(n^2)
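
To make the three questioned parameters concrete, here is a minimal sketch of classical subtractive clustering in the commonly cited form due to Chiu, simplified to a single stopping threshold. It is not the formulation shown on the slides, whose equations were lost. The names `ra` (mountain peak radius), `rb` (mountain radius), and `eps` (stopping threshold) are assumptions, as is the default `rb = 1.5 * ra`.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, rb=None, eps=0.15):
    """Simplified subtractive clustering (single stopping threshold).

    ra:  radius of the density ("mountain") kernel.
    rb:  radius of the kernel subtracted around each chosen center.
    eps: stop once the best remaining density drops below
         eps * (density of the first center).
    Returns an array of the selected cluster centers.
    """
    X = np.asarray(X, dtype=float)
    rb = 1.5 * ra if rb is None else rb

    # Pairwise squared distances: this n x n matrix is the O(n^2) cost.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    # Mountain function: density of every data point.
    density = np.exp(-d2 / (ra / 2.0) ** 2).sum(axis=1)
    first_peak = density.max()

    centers = []
    while True:
        k = int(np.argmax(density))  # most dense remaining point
        if density[k] < eps * first_peak:
            break
        centers.append(X[k])
        # Mountain amendment: remove the density explained by this center.
        density = density - density[k] * np.exp(-d2[:, k] / (rb / 2.0) ** 2)
    return np.array(centers)
```

All three parameters must be chosen by hand and depend on the scale of the data, and the pairwise distance matrix grows quadratically with the number of points.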

The proposed method: fzSC for partition matrix initialization
1. Generate a random fuzzy partition
2. Compute cluster density using histogram
3. Use strong uniform fuzzy partition concept
4. Estimate mountain function based on cluster density
5. Amend mountain function:
   1. Update cluster density (step 2)
   2. Re-estimate mountain function (step 4)
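
Steps 1 and 2 can be illustrated with a short sketch. This is one possible reading, not the authors' implementation: it assumes that the "histogram" accumulates membership mass per cluster, and that a "strong" partition is one in which each point's memberships sum to 1. The function names `random_fuzzy_partition` and `fuzzy_histogram` are hypothetical.

```python
import numpy as np

def random_fuzzy_partition(n, c, seed=0):
    """Step 1 (assumed form): random (n, c) memberships, each row summing to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((n, c))
    return U / U.sum(axis=1, keepdims=True)

def fuzzy_histogram(U):
    """Step 2 (assumed form): share of total membership mass held by each cluster."""
    return U.sum(axis=0) / U.shape[0]

U = random_fuzzy_partition(n=150, c=3)
print(fuzzy_histogram(U))  # three non-negative shares that sum to 1
```

Under this reading, the density estimate works on c cluster-level quantities rather than on all n-by-n point pairs.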

fzSC: Optimal number of clusters
1. The most dense data point is a cluster candidate
   - Data density is not much affected, say less than 0.05 of the data density is removed by the mountain function amendment process.
   - The number of such points is less than [symbol not preserved] n
2. The mountain peak radius, mountain radius, and stopping threshold parameters are not required
3. Computational time: O(c*n)

Datasets
- Artificial datasets
  - Finite mixture model based datasets
  - A manually created (MC) dataset
    - Data were generated using a finite mixture model
    - Clusters were moved to have different distances among clusters
- Real datasets
  - Iris, Wine, Glass, and Breast Cancer Wisconsin datasets from the UC Irvine Machine Learning Repository

Visualization of fzSC result on the manually created (MC) dataset
Rectangles: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC.

A visualization…
Stars: cluster centers of the random fuzzy partition. Circles: cluster centers found by fzSC.
The utility is available online:

Experimental results on manually created dataset
Table: the algorithm performance on the MC dataset
Columns: Algorithm | Correctness ratio by class | Avg. ratio
Rows: fzSC, k-means, k-medians, FCM
Only one value survives in this transcript: fzSC, 1.00.

Experimental results on artificial datasets
Figure: correctness ratio in determining cluster number, plotted against the number of clusters generated in the dataset and the dataset dimension (plot not preserved).

Experimental results on real datasets
Table: correctness ratio in determining cluster number
Columns: Dataset | # data points | Known # clusters | Predicted # clusters | Ratio
Rows: Iris, Wine, Glass, Breast Cancer Wisconsin
(Cell values are not preserved in this transcript.)

Discussion: The advantages of fzSC
- Versus traditional subtractive clustering:
  - The mountain peak radius, mountain radius, and stopping threshold parameters are not required
  - Computational time O(c*n) vs. O(n^2)
- Versus heuristic-based approaches:
  - Rapid convergence
  - Escape local optima
- Versus probability-model-based approaches:
  - Rapid convergence
  - No assumption of data distribution

Discussion: Future work
Combine fzSC with biological cluster validation methods and optimization algorithms to develop novel clustering algorithms for gene expression data analysis.

Thank you! Questions?
We acknowledge the support of the Vietnamese Ministry of Education and Training, the 322 scholarship program.