Non-parametric Methods for Clustering Continuous and Categorical Data
Steven X. Wang, Dept. of Math. and Stat., York University
May 13, 2010

What is Clustering?
Cluster: a collection of data objects that are
◦ similar to one another within the same cluster
◦ dissimilar to the objects in other clusters
Cluster analysis: grouping a set of data objects into clusters.
Clustering is unsupervised classification: there are no predefined classes.

Good Clustering
A good clustering method produces high-quality clusters with
◦ high intra-cluster similarity
◦ low inter-cluster similarity
The quality of a clustering result depends on both the similarity measure used by the method and its implementation.

Part I: Clustering Continuous Data
There are many existing algorithms:
◦ K-means
◦ PAM
◦ Hierarchical clustering
◦ Model-based clustering
◦ …

Why Non-parametric Clustering?
Our goal is to develop algorithms that
1) do not assume a functional form for the data distribution,
2) do not depend on tuning parameters,
3) depend less on the choice of distance function, and
4) achieve better accuracy.

Particle Movement

Gravitational Field

Shrinking Method 1
Convergence can be achieved if only one point is moved at a time while the rest of the data points are held fixed. This can be proved using the previous theorem together with the Fixed Point Theorem. We also derived an optimization algorithm based on this idea.

Shrinking Method 2
Moving one point at a time with the K-nearest-neighbor method is slow. To achieve faster convergence, we propose updating every data point at each iteration. This is less accurate but faster. It corresponds to the following stochastic process:
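In symbols, one plausible form of the update (writing N_K(x) for the indices of the K nearest neighbors of x, and assuming the coordinate-wise median step used in CLUES below):

$$x_i^{(t+1)} = \operatorname{median}\bigl\{\, x_j^{(t)} : j \in N_K\bigl(x_i^{(t)}\bigr) \bigr\}, \qquad i = 1, \dots, n.$$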

CLUES Algorithm
◦ Given K = 2^(j+1), j = 0, 1, 2, …
◦ For each data point, perform the shrinking procedure: replace the point with the median of its K nearest neighbors.
◦ Repeat until convergence.
◦ Detect cluster centers and partition the data.
Since the result depends on K, we search for the K that gives the optimal result.
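A minimal Python sketch of the shrinking step, assuming Euclidean distance and the simultaneous (Method 2) update; the function names are illustrative, not from the paper:

```python
import numpy as np

def shrink_once(X, K):
    """One shrinking pass: move every point to the coordinate-wise
    median of its K nearest neighbors (Euclidean distance)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise distances
    idx = np.argsort(d2, axis=1)[:, :K]  # K nearest neighbors (self included)
    return np.median(X[idx], axis=1)

def clues_shrink(X, K, tol=1e-6, max_iter=100):
    """Iterate the shrinking step until the points stop moving."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        X_new = shrink_once(X, K)
        if np.max(np.abs(X_new - X)) < tol:
            break
        X = X_new
    return X
```

After convergence the shrunken points collapse toward the cluster centers, so each original point can be assigned to the center its shrunken copy reaches.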

Convergence
If one point is moved while the other points are kept fixed, convergence can be proved by the Fixed Point Theorem. If all points move at the same time, the process corresponds to a stochastic process, namely an anti-diffusion process.

Partition by PAM

Partition by CLUES

Relationship with Similar Algorithms
◦ Mean shift algorithm
◦ Data sharpening (Heckman and Hall)
◦ Gravitational clustering
What is our advantage? 1) The K-nearest-neighbor neighborhood is highly adaptive. 2) Our algorithm is parameter-free.

References
◦ Wang, X., Qiu, W. and Zamar, R. (2006). An Iterative Non-parametric Clustering Algorithm Based on Local Shrinking. Computational Statistics and Data Analysis.
◦ Wang, X., Liang, D., Feng, X. and Ye, L. (2007). A Derivative-Free Optimization Algorithm Based on Conditional Moments. Journal of Mathematical Analysis and Applications, Vol. 331, No. 2.

Part II: Clustering Categorical Data
◦ Review of current literature
◦ Hamming distance and the CD vector
◦ Modified Chi-square test
◦ Description of our algorithm
◦ Numerical results
◦ Conclusion and discussion

Review of Existing Algorithms: K-modes
1. This algorithm is built on the idea of the K-means algorithm.
2. It requires the number of clusters as input.
3. The partition is sensitive to the input order.
4. Computational complexity: O(n).

AutoClass Algorithm
This algorithm can cluster both categorical and numeric data types.
1. It utilizes the EM algorithm.
2. It searches for the optimal number of clusters.
3. The EM algorithm is known to have slow convergence.
4. The computational complexity is O(n).

Categorical Sample Space
Assume that the data set is stored in an n × p matrix, where n is the number of observations and p is the number of categorical variables. The sample space consists of all possible combinations generated by the p variables (e.g., three variables with 2, 3, and 4 categories generate 2 × 3 × 4 = 24 points). The sample space is discrete and has no natural origin.

Hamming Distance and the CD Vector
The Hamming distance counts the number of attributes on which two categorical observations differ. It has been used to cluster categorical data in algorithms similar to K-modes. We construct the Categorical Distance (CD) vector to project the sample space onto a one-dimensional space.
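A minimal sketch in Python, assuming the CD vector is the histogram of Hamming distances (d = 1, …, p) from a chosen origin; the names are illustrative:

```python
import numpy as np

def hamming(x, y):
    """Number of attributes on which two categorical records differ."""
    return int(np.sum(np.asarray(x) != np.asarray(y)))

def cd_vector(data, origin):
    """CD vector: counts of observations at each Hamming distance
    d = 1..p from the origin, projecting the sample space onto 1-D."""
    data = np.asarray(data)
    p = data.shape[1]
    counts = np.zeros(p, dtype=int)
    for row in data:
        d = hamming(row, origin)
        if d > 0:  # skip the origin itself
            counts[d - 1] += 1
    return counts
```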

Example of a CD vector

More on the CD Vector
The dense region of a CD vector is NOT necessarily a cluster! The length of the CD vector is p. We can construct many CD vectors from one data set by choosing different origins.

How to Detect a Cluster?
The CD vector shows some clustering pattern, but is it statistically significant? We use a statistical hypothesis test:
◦ Null hypothesis: the data are uniformly distributed.
◦ Alternative: the data are not uniformly distributed.
We call the expected CD vector under the null the Uniform CD (UCD) vector.

UCD: the expected CD vector under the null.
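One way to compute the UCD vector, assuming that under the null each of the p variables is uniform over its categories, so the Hamming distance from the origin is Poisson-binomial; the paper's exact null construction may differ:

```python
import numpy as np

def ucd_vector(n, n_categories):
    """Expected counts at Hamming distances d = 1..p for n uniform
    observations; attribute j differs from the origin with
    probability (c_j - 1) / c_j."""
    pmf = np.array([1.0])
    for c in n_categories:
        q = (c - 1) / c
        pmf = np.convolve(pmf, [1.0 - q, q])  # Poisson-binomial DP
    return n * pmf[1:]  # drop d = 0 to match the CD vector
```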

CD vector vs. UCD vector

How to Compare These Two Vectors?
One is the observed CD vector; the other is the expected CD vector under the null hypothesis. The Chi-square statistic is the most natural tool for testing the null hypothesis based on these two vectors. However, clustering patterns are local features, so we are not interested in a comparison at the global level.

Modified Chi-square Test
The modified Chi-square statistic is defined as follows:
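One plausible form, consistent with the choice of the truncation point C on the next slide (writing O_d and E_d for the observed CD and expected UCD counts at Hamming distance d):

$$\chi^2_C = \sum_{d=1}^{C} \frac{(O_d - E_d)^2}{E_d}$$

Restricting the sum to d ≤ C keeps the test sensitive to local departures from uniformity near the candidate center.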

Choice of C and Radius of a Cluster

CD Algorithm
1. Find a cluster center.
2. Construct the CD vector given the current center.
3. Perform the modified Chi-square test.
4. If the null is rejected, determine the radius of the current cluster.
5. Extract the cluster.
6. Repeat until the null is no longer rejected.
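A sketch of this loop under the assumptions above, reusing hamming, cd_vector, and ucd_vector from the earlier sketches; find_center and cluster_radius are hypothetical helpers standing in for the paper's center-finding and radius rules, and df = C is an assumed degrees-of-freedom choice:

```python
import numpy as np
from scipy.stats import chi2

def cd_algorithm(data, n_categories, C, alpha=0.05):
    """Extract clusters one at a time until the remaining data
    look uniform under the modified Chi-square test."""
    data = np.asarray(data)
    clusters = []
    while len(data) > 0:
        center = find_center(data)                # hypothetical helper
        obs = cd_vector(data, center)
        exp = ucd_vector(len(data), n_categories)
        stat = np.sum((obs[:C] - exp[:C]) ** 2 / exp[:C])
        if stat < chi2.ppf(1.0 - alpha, df=C):    # fail to reject: stop
            break
        r = cluster_radius(obs, exp)              # hypothetical helper
        dist = np.array([hamming(x, center) for x in data])
        clusters.append(data[dist <= r])
        data = data[dist > r]                     # sample size shrinks
    return clusters
```

Note how the sample shrinks after each extraction; this is what keeps the overall cost near O(kpn), as discussed below.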

Numerical Comparison with K-modes and AutoClass
Soybean Data: n = 47, p = 35, number of clusters = 4.

                   CD     AutoClass   K-modes
No. of clusters    4      4           [3]   [4]   [5]
Classif. rates     100%   100%        75%   84%   82%
"Variations"       0%     0%          6%    15%   10%
Inform. gain       100%   100%        67%   84%   93%
"Variations"       0%     0%          10%   15%   11%

Numerical Comparison with K-modes and AutoClass
Zoo Data: n = 101, p = 16, number of clusters = 7.

                   CD     AutoClass   K-modes
No. of clusters    7      3           [6]   [7]   [8]
Classif. rates     95%    73%         74%   72%   71%
"Variations"       0%     0%          6%    15%   10%
Inform. gain       92%    60%         75%   79%   81%
"Variations"       0%     0%          7%    6%    6%

Run-Time Comparison
Average and standard deviation of run times (K-modes vs. CD) on the Soybean and Zoo data. Note that AutoClass requires human intervention.

Computational Complexity
The upper bound on the computational complexity of our algorithm is O(kpn). Note that the sample size shrinks whenever the CD algorithm extracts a detected cluster. Our algorithm is less computationally intensive than K-modes and AutoClass, since both have complexity O(akpn) with a > 1.

Conclusion
◦ Our algorithm requires no convergence criterion.
◦ It estimates the number of clusters automatically; it neither demands nor searches for the true number of clusters.
◦ The sample size is reduced after each detected cluster is extracted.
◦ The computational complexity of our algorithm is bounded by O(n).

Future Work
◦ Clustering functional data.
◦ Clustering mixed data: continuous and categorical variables at the same time.
◦ Clustering data with spatial and temporal structure.
◦ Developing parallel computing algorithms for our methods.

Reference
Zhang, P., Wang, X. and Song, P. (2007). Clustering Categorical Data Based on Distance Vectors. Journal of the American Statistical Association.

Software Download
Please go to the Software section to download the program. We are also developing a C implementation of both methods.