Model-based Clustering Saisandeep Sanka Chengjun Zhu
Introduction
Data mining: extracting information from a data source. Where do data come from? Experiments and observations.
Supervised learning vs. unsupervised learning: with labels vs. without labels.
History and background
Clustering dates back to the late 1950s. It is a method for finding cohesive groups based on measured characteristics, using numerical measurements. Heuristic methods include partitioning methods (e.g., K-means) and hierarchical clustering.
References: Model-based Clustering by HaiJiang Steven Shi
With heuristic methods, it is hard to compare performance across methods, and there is no principled way to deal with outliers.
References: Model-based Clustering by HaiJiang Steven Shi
Clustering Classification
Clustering algorithms:
- Hierarchical: complete, single, average, Ward linkage
- Partitional: K-means, FCM, MDS
- Density- and grid-based
- Graph-theoretic
- Model-based
Model-Based Clustering
Model-based clustering is based on a probability model for the data. We assume the data come from some distribution, so the reason to divide the data into groups is that the groups come from different probability models (one for each group).
References: Model-based Clustering by HaiJiang Steven Shi
Mixture Model
The observations are often heterogeneous, rather than forming one single homogeneous group, and can often be modeled by a mixture distribution. A finite mixture distribution is a weighted linear combination of a finite number of simple component distributions:
$X_i \sim f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, f_k(x_i; \theta_k)$
References: Model-based Clustering by HaiJiang Steven Shi
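To make the definition concrete, a minimal sketch of evaluating a mixture density, assuming a one-dimensional two-component normal mixture (the weights and component parameters below are made up for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, params):
    """f(x | Theta) = sum_k pi_k * f_k(x; theta_k)."""
    return sum(pi * normal_pdf(x, mu, sigma)
               for pi, (mu, sigma) in zip(weights, params))

# Two groups: pi = (0.3, 0.7), components N(0, 1) and N(4, 1).
weights = [0.3, 0.7]
params = [(0.0, 1.0), (4.0, 1.0)]
density_at_zero = mixture_pdf(0.0, weights, params)
```

Near x = 0 the first component dominates the sum, which is exactly the intuition behind assigning that region to group 1.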
Multivariate Normal Mixture Models
The multivariate normal distribution is often used as the common mixture component:
$f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, \varphi(x_i; \mu_k, \Sigma_k)$
Parameters: $\theta_k$ becomes $\{\mu_k, \Sigma_k\}$, and $\Theta = (\pi_1, \pi_2, \ldots, \pi_{g-1}, \mu_1, \mu_2, \ldots, \mu_g, \Sigma_1, \Sigma_2, \ldots, \Sigma_g)$.
References: Model-based Clustering by HaiJiang Steven Shi
Example 1: Breast Cancer
Response: group label (1: no cancer, 2: cancer) for observations obs1, obs2, obs3, ..., obsN.
Covariates: age, breast pain, redness, skin dimpling.
Terminology
References: Introduction to Mixture Modeling, Kevin A. Kupzyk, MA, Methodological Consultant, CYFS SRM Unit
Covariance Matrix Decomposition
$\Sigma_k = \lambda_k O_k D_k O_k^{T}$
- $\lambda_k$ is a scalar constant representing the volume of the kth covariance matrix
- $O_k$ is an orthogonal matrix representing the orientation of the kth covariance matrix
- $D_k = \mathrm{Diag}\{\alpha_{1k}, \alpha_{2k}, \ldots, \alpha_{pk}\}$, where $\alpha_{1k} \ge \alpha_{2k} \ge \ldots \ge \alpha_{pk} \ge 0$, is a diagonal matrix representing the shape of the kth covariance matrix
This is the covariance matrix representation given in Banfield and Raftery (1993). Count the parameters.
Reference: Model-based Clustering by HaiJiang Steven Shi
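A minimal sketch of the decomposition in two dimensions, assuming O_k is a plane rotation (the specific volume, angle, and shape values are made up for illustration):

```python
import math

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def covariance_from_decomposition(lam, theta, alphas):
    """Build Sigma = lam * O * D * O^T for a 2-D component, where
    O is a rotation by angle theta (orientation) and
    D = diag(alphas) with alphas[0] >= alphas[1] >= 0 (shape)."""
    O = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta),  math.cos(theta)]]
    D = [[alphas[0], 0.0], [0.0, alphas[1]]]
    OT = [[O[j][i] for j in range(2)] for i in range(2)]  # transpose of O
    S = matmul(matmul(O, D), OT)
    return [[lam * S[i][j] for j in range(2)] for i in range(2)]

# Volume lam = 2, orientation 45 degrees, shape ratio 4:1.
Sigma = covariance_from_decomposition(2.0, math.pi / 4, (4.0, 1.0))
# In 2-D, det(Sigma) = lam^2 * alpha_1 * alpha_2: the volume factor
# scales every axis of the ellipse, so it enters the determinant squared.
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
```

Constraining which of volume, shape, and orientation are shared across the g components is what generates the different covariance structures counted on the next slide.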
Structure of Cluster based on Covariance matrix
Number of Parameters in Covariance Matrix Reference: Model-Based Clustering: An Overview by Paul McNicholas Department of Mathematics & Statistics, University of Guelph.
Parameter Estimation
Maximum likelihood or Bayesian estimation; iterative methods are needed.
Maximum Likelihood Estimation
Maximum likelihood estimation has been by far the most commonly used approach to fitting mixture distributions. In model-based clustering, MLE is used to estimate the parameters of the probability model:
$L(\Theta \mid X) \propto \prod_{i=1}^{n} f(x_i \mid \Theta) = \prod_{i=1}^{n} \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k)$
Log-Likelihood Function
$\ell(\Theta \mid X) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k) \right)$
The log-likelihood function leads to a non-linear optimization problem.
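The log-likelihood above can be evaluated directly, as in this sketch for a 1-D two-component normal mixture (the data values and parameters are made up for illustration):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(data, weights, params):
    """l(Theta | X) = sum_i log( sum_k pi_k * f_k(x_i; theta_k) )."""
    return sum(math.log(sum(pi * normal_pdf(x, mu, sigma)
                            for pi, (mu, sigma) in zip(weights, params)))
               for x in data)

data = [-0.5, 0.2, 3.8, 4.3]
ll = log_likelihood(data, [0.5, 0.5], [(0.0, 1.0), (4.0, 1.0)])
```

The log of a sum does not factor over components, which is why no closed-form maximizer exists and an iterative scheme such as EM is used instead.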
Estimation (EM Algorithm)
- Handles the missing-data problem by introducing latent variables: the class labels are treated as missing
- A general iterative optimization algorithm for maximizing a likelihood function
- At each EM step, the likelihood can only increase
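A minimal sketch of the EM iteration for a two-component 1-D normal mixture (the data and initialization scheme are made up for illustration; real implementations add convergence checks and multiple restarts):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_normals(data, n_iter=50):
    """EM for a two-component 1-D normal mixture.
    E-step: posterior probabilities of the latent class labels.
    M-step: responsibility-weighted updates of pi, mu, sigma."""
    pi, mu, sigma = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            num = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            tot = sum(num)
            resp.append([v / tot for v in num])
        # M-step: maximize the expected complete-data log-likelihood.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # guard against collapse
    return pi, mu, sigma

# Two well-separated groups around 0 and 5.
data = [-0.3, 0.1, 0.4, -0.2, 4.8, 5.1, 5.3, 4.9]
pi, mu, sigma = em_two_normals(data)
```

On this toy data the estimated means settle near 0 and 5, recovering the two groups; the final responsibilities also give each observation's soft cluster assignment.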
Mahalanobis distance: a criterion for classifying observations to clusters.
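A minimal sketch of classification by Mahalanobis distance in two dimensions, assigning a point to the cluster whose mean is nearest under that cluster's covariance (the means and covariances below are made up for illustration):

```python
import math

def mahalanobis_2d(x, mu, Sigma):
    """Mahalanobis distance sqrt((x - mu)^T Sigma^{-1} (x - mu)) for 2-D."""
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    # Closed-form inverse of a 2x2 matrix.
    inv = [[ Sigma[1][1] / det, -Sigma[0][1] / det],
           [-Sigma[1][0] / det,  Sigma[0][0] / det]]
    q = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
         + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
    return math.sqrt(q)

# Two hypothetical clusters with different means and covariances.
mu1, S1 = (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]
mu2, S2 = (5.0, 5.0), [[2.0, 0.0], [0.0, 2.0]]
x = (1.0, 1.0)
label = 1 if mahalanobis_2d(x, mu1, S1) <= mahalanobis_2d(x, mu2, S2) else 2
```

Unlike Euclidean distance, this criterion accounts for each cluster's covariance, so an elongated cluster can "claim" points far from its mean along its long axis.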
BIC Criterion
$2 \log f(X \mid M_k) \approx 2 \log f(X \mid \hat{\Theta}_k, M_k) - v_k \log n = \mathrm{BIC}$
- $v_k$ is the number of parameters to be estimated in model $M_k$
- $\hat{\Theta}_k$ is the maximum likelihood estimate of the parameter vector $\Theta_k$
The higher the BIC, the better the clustering.
Reference: Model-based Clustering by HaiJiang Steven Shi
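A minimal sketch of using BIC to choose the number of components; the fitted log-likelihood values below are hypothetical placeholders, and the parameter count assumes a fully unrestricted multivariate normal mixture:

```python
import math

def n_params_full(g, p):
    """Parameter count for a g-component p-variate normal mixture with
    unrestricted covariance matrices:
    (g - 1) mixing weights + g*p means + g*p*(p+1)/2 covariance terms."""
    return (g - 1) + g * p + g * p * (p + 1) // 2

def bic(log_lik, v, n):
    """BIC = 2 * log L(Theta-hat) - v * log(n); with this sign
    convention a higher BIC indicates a better model."""
    return 2.0 * log_lik - v * math.log(n)

# Hypothetical fitted log-likelihoods for g = 1, 2, 3 on n = 100 bivariate points.
n, p = 100, 2
loglik = {1: -420.0, 2: -365.0, 3: -361.0}
scores = {g: bic(ll, n_params_full(g, p), n) for g, ll in loglik.items()}
best_g = max(scores, key=scores.get)
```

Here the jump from one to two components buys a large likelihood gain, while the third component's small gain does not cover its v_k log n penalty, so BIC selects g = 2.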
Outliers
We can supply an initial guess about which observations are outliers to Mclust as prior information.
Thank You