A Fuzzy k-Modes Algorithm for Clustering Categorical Data

Presentation transcript:

A Fuzzy k-Modes Algorithm for Clustering Categorical Data 國立雲林科技大學 National Yunlin University of Science and Technology Advisor: Dr. Hsu Graduate: Chien-Ming Hsiao Authors: Zhexue Huang and Michael K. Ng

Outline Motivation; Objective; Introduction; Notation; Hard and fuzzy k-means algorithms; Hard and fuzzy k-Modes algorithms; Experimental Results; Conclusions; Personal Opinion

Motivation k-means-type algorithms work only on numeric data, which limits their use in data mining. Most algorithms for clustering categorical data suffer from a common efficiency problem when applied to massive categorical-only data sets.

Objective To tackle the problem of clustering large categorical data sets in data mining

Introduction Fuzzy versions of the k-means algorithm allow each pattern to have a degree of membership in every cluster. Working only on numeric data limits the use of these k-means-type algorithms in such areas as data mining.

Introduction Methods for clustering categorical data include: the k-means algorithm [Ralambondrainy, 1995]; hierarchical clustering methods [Gower, 1991]; the PAM algorithm [Kaufman et al., 1990]; the fuzzy-statistical algorithms [Woodbury, 1974]; the conceptual clustering methods [Michalski, 1983].

Notation The set of objects to be clustered is stored in a database table T defined by a set of attributes A1, A2,…, Am.

Hard and fuzzy k-means algorithms Let X be a set of n objects described by m numeric attributes.
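
The formula on this slide did not survive in the transcript. As a reconstruction under stated assumptions (not a copy of the slide), the objective that k-means-type algorithms are usually described as minimizing, written with the symbols F, W, and Z used on the slides that follow, is given below. Here k (number of clusters), the exponent \alpha, and the dissimilarity d are assumed notation; \alpha = 1 corresponds to the hard case and \alpha > 1 to the fuzzy case.

```latex
\[
  F(W, Z) \;=\; \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li}^{\alpha}\, d(Z_l, X_i),
  \qquad
  \text{subject to}\quad
  0 \le w_{li} \le 1,
  \quad
  \sum_{l=1}^{k} w_{li} = 1 \ \ (1 \le i \le n),
\]
% W = [w_{li}] is the k-by-n partition matrix, Z = \{Z_1, \dots, Z_k\} the cluster
% centres, and d(Z_l, X_i) the dissimilarity between centre Z_l and object X_i
% (squared Euclidean distance in the usual numeric setting).
```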

Hard and fuzzy k-means algorithms The usual method for optimizing F is partial optimization for Z and W: (1) fix Z and find necessary conditions on W to minimize F; (2) fix W and minimize F with respect to Z.

Hard and fuzzy k-means algorithms Theorem 1: Let Z be fixed and consider Problem (P1), the minimization of F with respect to W.

Hard and fuzzy k-means algorithms Theorem 2: Let W be fixed and consider Problem (P2), the minimization of F with respect to Z.
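
The statements of Theorems 1 and 2 are not in the transcript, so the following is only a minimal sketch of the alternating scheme for the numeric case. It assumes squared Euclidean distance and the membership and centre updates commonly given for fuzzy k-means (fuzzy c-means). The function names, the exponent `alpha`, and the stopping tolerance are illustrative assumptions, not taken from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def update_memberships(X, Z, alpha=2.0):
    """Fix Z and compute W, where W[l][i] is the membership of object i in cluster l."""
    k, n = len(Z), len(X)
    W = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(X):
        d = [sq_dist(z, x) for z in Z]
        if 0.0 in d:
            # The object coincides with a centre: full membership in the first such cluster.
            W[d.index(0.0)][i] = 1.0
        else:
            inv = [dl ** (-1.0 / (alpha - 1.0)) for dl in d]
            total = sum(inv)
            for l in range(k):
                W[l][i] = inv[l] / total
    return W

def update_centres(X, W, alpha=2.0):
    """Fix W and compute each centre as a membership-weighted mean of the objects."""
    m = len(X[0])
    Z = []
    for w_l in W:
        weights = [w ** alpha for w in w_l]
        total = sum(weights)
        Z.append([sum(wt * x[j] for wt, x in zip(weights, X)) / total for j in range(m)])
    return Z

def fuzzy_k_means(X, Z, alpha=2.0, max_iter=100, tol=1e-9):
    """Alternate the two partial optimizations until the centres stop moving."""
    for _ in range(max_iter):
        W = update_memberships(X, Z, alpha)
        Z_new = update_centres(X, W, alpha)
        shift = max(abs(a - b) for zo, zn in zip(Z, Z_new) for a, b in zip(zo, zn))
        Z = Z_new
        if shift < tol:
            break
    return W, Z
```

Calling `fuzzy_k_means(X, Z0)` with a list of numeric rows `X` and initial centres `Z0` returns the partition matrix and the final centres. Each column of the returned matrix sums to 1.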

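Hard and fuzzy k-means algorithms Time complexity of the algorithm: O(tkmn), where t is the number of iterations, k the number of clusters, m the number of attributes, and n the number of objects. Space requirement: O(n(m + k) + km).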

Hard and fuzzy k-Modes algorithms The k-modes algorithms modify k-means by: using a simple matching dissimilarity measure for categorical objects; replacing the means of clusters with the modes; using a frequency-based method to find the modes.

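Hard and fuzzy k-Modes algorithms Let X and Y be two categorical objects described by m attributes, X = [x1, x2, ..., xm] and Y = [y1, y2, ..., ym]. The simple matching dissimilarity measure between X and Y is the number of attributes on which they differ: d(X, Y) = δ(x1, y1) + ... + δ(xm, ym), where δ(xj, yj) = 0 if xj = yj and 1 otherwise.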
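
As a small illustration, simple matching can be sketched in a few lines. The function name and the example attribute values are hypothetical, and objects are assumed to be equal-length sequences of category labels.

```python
def simple_matching(x, y):
    """Count the attributes on which two categorical objects disagree."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)
```

For example, `simple_matching(["red", "small", "round"], ["red", "large", "round"])` counts one mismatch (the second attribute).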

Hard and fuzzy k-Modes algorithms Z is updated with a frequency-based method, in two variants: the hard k-modes update method and the fuzzy k-modes update method.

Hard and fuzzy k-Modes algorithms Theorem 3 (the hard k-modes update method): the category of attribute Aj of the cluster mode Zl is the mode of the categories of attribute Aj in the set of objects belonging to cluster l.
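
The hard update can be sketched as a per-attribute frequency count. This is a minimal illustration, assuming the objects of one cluster are given as equal-length lists of category labels. The function name is hypothetical, and ties are broken by whichever category `Counter` lists first.

```python
from collections import Counter

def hard_mode(cluster_objects):
    """Return the cluster mode: the most frequent category of each attribute."""
    m = len(cluster_objects[0])
    return [
        Counter(obj[j] for obj in cluster_objects).most_common(1)[0][0]
        for j in range(m)
    ]
```

With the objects `["red", "small"]`, `["red", "large"]`, and `["blue", "large"]`, the most frequent categories are `red` for the first attribute and `large` for the second.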

Hard and fuzzy k-Modes algorithms Theorem 4 (the fuzzy k-modes update method): the category of attribute Aj of the cluster mode Zl is the category whose objects have the largest total membership wli in cluster l, compared across all categories of Aj.
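
A sketch of the fuzzy update follows. Instead of counting objects, it sums their memberships in the cluster per category and keeps the category with the largest total. Raising each membership to the power `alpha` is an assumption carried over from the usual fuzzy objective (with `alpha=1.0` it reduces to a plain sum of memberships). The function name is hypothetical.

```python
from collections import defaultdict

def fuzzy_mode(X, w_l, alpha=2.0):
    """Return the mode of one cluster; w_l[i] is the membership of object i in it."""
    m = len(X[0])
    mode = []
    for j in range(m):
        score = defaultdict(float)
        for x, w in zip(X, w_l):
            # Accumulate the weighted membership for the category of attribute j.
            score[x[j]] += w ** alpha
        mode.append(max(score, key=score.get))
    return mode
```

With objects `["a", "x"]`, `["b", "x"]`, `["b", "y"]` and memberships `0.9, 0.3, 0.2`, category `a` wins the first attribute even though `b` occurs more often, because the single object with `a` belongs much more strongly to the cluster.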

Hard and fuzzy k-Modes algorithms Theorem 5

Experimental Results Goals: to evaluate the performance and efficiency of the fuzzy k-modes algorithm, and to compare it with the conceptual k-means algorithm and the hard k-modes algorithm. Data: real and artificial data sets, including the soybean disease data set.

Conclusions The paper introduced the fuzzy k-modes algorithm for clustering categorical objects, based on extensions to the fuzzy k-means algorithm. A consequence of Theorem 4 is that the k-means paradigm can be used to generate the fuzzy partition matrix from categorical data.

Personal Opinion The fuzzy partition matrix provides more information, helping the user determine the final clustering and identify the boundary objects.
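
One way to read boundary objects off a fuzzy partition matrix is to flag objects whose two largest memberships are close. The sketch below is illustrative only: the function name and the `margin` threshold are assumptions, not something the slides specify. The matrix is assumed to be stored as one row per cluster.

```python
def boundary_objects(W, margin=0.2):
    """Return indices of objects whose two largest memberships differ by less than margin."""
    n = len(W[0])
    result = []
    for i in range(n):
        memberships = sorted((row[i] for row in W), reverse=True)
        if len(memberships) > 1 and memberships[0] - memberships[1] < margin:
            result.append(i)
    return result
```

For a two-cluster matrix `[[0.9, 0.55, 0.1], [0.1, 0.45, 0.9]]`, only the middle object is flagged, since its memberships (0.55 and 0.45) are nearly equal.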