Model-based Clustering


1 Model-based Clustering
Saisandeep Sanka, Chengjun Zhu

2 Introduction
Data mining: extracting information from a data source.
Where does the data come from? Experiments and observations.

3 Supervised learning & unsupervised learning
Supervised learning works with labeled data; unsupervised learning works with unlabeled data.

4 History and background
Clustering dates back to the late 1950s. It is a method for finding cohesive groups based on measured characteristics, using numerical measurements. Classical heuristic methods include partitioning methods (such as K-means) and hierarchical clustering.
Reference: Model-based Clustering by HaiJiang Steven Shi

5 With heuristic methods it is hard to compare performance between methods, and there is no principled way to deal with outliers.
Reference: Model-based Clustering by HaiJiang Steven Shi

6 Clustering Classification
Clustering algorithms:
- Hierarchical: complete, single, average, Ward
- Partitional: graph theoretic, K-means, FCM, MDS
- Density- and grid-based
- Model-based

7 Model-Based Clustering
Model-based clustering is based on a probability model for the data. We assume the data come from some distribution; the reason to divide the data into groups is that different groups come from different probability models (one for each group).
Reference: Model-based Clustering by HaiJiang Steven Shi

8 Mixture Model
The observations are often heterogeneous, rather than one single homogeneous group, and can often be modeled by a mixture distribution. A finite mixture distribution is a weighted linear combination of a finite number of simple component distributions:

$X_i \sim f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, f_k(x_i; \theta_k)$

Reference: Model-based Clustering by HaiJiang Steven Shi
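As a minimal sketch of this definition in Python (an assumption; the slides name no language), the mixture density is just a weighted sum of component densities. The two components here, a normal and a gamma, and all parameter values are illustrative, not from the slides:

```python
# A minimal sketch of a finite mixture density: a weighted linear
# combination of g simple component densities. The component choices
# (normal and gamma) and all parameter values are illustrative.
import numpy as np
from scipy import stats

# Mixing proportions pi_k (non-negative, summing to 1)
weights = [0.6, 0.4]
# Component densities f_k(x; theta_k) -- any simple distributions work
components = [stats.norm(loc=0.0, scale=1.0).pdf,
              stats.gamma(a=3.0, scale=1.0).pdf]

def mixture_pdf(x):
    """f(x | Theta) = sum_k pi_k * f_k(x; theta_k)."""
    return sum(pi * f(x) for pi, f in zip(weights, components))

x = np.linspace(-4, 10, 5)
print(mixture_pdf(x))  # pointwise mixture density values
```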

9 Multivariate Normal Mixture Models
The multivariate normal distribution is often used as the common mixture component:

$f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, \varphi(x_i; \mu_k, \Sigma_k)$

Parameters: $\theta_k$ becomes $\{\mu_k, \Sigma_k\}$, and $\Theta = (\pi_1, \pi_2, \ldots, \pi_{g-1}, \mu_1, \mu_2, \ldots, \mu_g, \Sigma_1, \Sigma_2, \ldots, \Sigma_g)$.

Reference: Model-based Clustering by HaiJiang Steven Shi
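A sketch of a two-component multivariate normal mixture follows; all values for $\Theta$ are illustrative assumptions. It evaluates the density above and samples by first drawing a component label with probability $\pi_k$:

```python
# A sketch of a g-component multivariate normal mixture with
# illustrative parameter values for Theta = (pi_k, mu_k, Sigma_k).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

pis = np.array([0.5, 0.5])                               # mixing proportions
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]       # component means mu_k
sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])] # covariances Sigma_k

def density(x):
    """f(x | Theta) = sum_k pi_k * phi(x; mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, sigmas))

# To sample: first draw a component label k with probability pi_k,
# then draw from the corresponding normal phi(.; mu_k, Sigma_k).
labels = rng.choice(len(pis), size=500, p=pis)
X = np.array([rng.multivariate_normal(mus[k], sigmas[k]) for k in labels])
print(density(X[0]), X.shape)
```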

10 Example 1: Breast Cancer (Group: 1 = No Cancer, 2 = Cancer)

        Group   Age   Breast pain   Redness   Skin dimpling
 obs1     ..     ..        ..          ..           ..
 obs2     ..     ..        ..          ..           ..
 obs3     ..     ..        ..          ..           ..
 ...
 obsN     ..     ..        ..          ..           ..

11 Terminology
Reference: Introduction to Mixture Modeling by Kevin A. Kupzyk, MA, Methodological Consultant, CYFS SRM Unit

12 Covariance Matrix Decomposition

$\Sigma_k = \lambda_k O_k D_k O_k^T$

- $\lambda_k$ is a scalar constant representing the volume of the kth covariance matrix
- $O_k$ is an orthogonal matrix representing the orientation of the kth covariance matrix
- $D_k = \mathrm{Diag}\{\alpha_{1k}, \alpha_{2k}, \ldots, \alpha_{pk}\}$, where $\alpha_{1k} \ge \alpha_{2k} \ge \cdots \ge \alpha_{pk} \ge 0$, is a diagonal matrix representing the shape of the kth covariance matrix

This is the covariance matrix representation given in Banfield and Raftery (1993). Counting the parameters in this decomposition (see the sketch below) shows how constrained variants reduce model complexity.
Reference: Model-based Clustering by HaiJiang Steven Shi
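The decomposition can be computed from an eigendecomposition of $\Sigma_k$. The sketch below assumes the common normalization $\lambda_k = |\Sigma_k|^{1/p}$, which makes $\det D_k = 1$; the slide does not specify a normalization, so treat this as one convention:

```python
# A sketch of the volume/orientation/shape decomposition
# Sigma_k = lambda_k * O_k * D_k * O_k^T via an eigendecomposition.
# The normalization lambda_k = det(Sigma_k)^(1/p), giving det(D_k) = 1,
# is an assumed convention.
import numpy as np

def decompose(sigma):
    p = sigma.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # alpha_1k >= ... >= alpha_pk
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.linalg.det(sigma) ** (1.0 / p)    # volume lambda_k
    D = np.diag(eigvals / lam)                 # shape D_k, det(D_k) = 1
    O = eigvecs                                # orientation O_k
    return lam, O, D

sigma = np.array([[3.0, 1.0], [1.0, 2.0]])
lam, O, D = decompose(sigma)
# Reconstruction check: lambda_k * O_k D_k O_k^T should recover Sigma_k
print(np.allclose(lam * O @ D @ O.T, sigma))   # True
```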

13 Structure of Clusters Based on the Covariance Matrix

14 Number of Parameters in the Covariance Matrix
A full p×p covariance matrix contributes $p(p+1)/2$ free parameters per component; constrained parameterizations (spherical, diagonal, or shared across components) reduce this count, as sketched below.
Reference: Model-Based Clustering: An Overview by Paul McNicholas, Department of Mathematics & Statistics, University of Guelph
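As a rough sketch, the counts for a few standard covariance structures can be tallied as follows; the structure names reflect common usage and may not match the exact table this slide referenced:

```python
# A sketch counting free parameters in a g-component, p-dimensional
# mixture: (g - 1) mixing proportions + g*p means + covariance terms.
def cov_params_per_component(p, structure):
    if structure == "spherical":   # Sigma_k = lambda_k * I
        return 1
    if structure == "diagonal":    # Sigma_k = Diag(...)
        return p
    if structure == "full":        # unconstrained symmetric Sigma_k
        return p * (p + 1) // 2
    raise ValueError(structure)

def total_params(g, p, structure, shared_cov=False):
    cov = cov_params_per_component(p, structure)
    cov_total = cov if shared_cov else g * cov   # one Sigma shared, or one per cluster
    return (g - 1) + g * p + cov_total

print(total_params(g=3, p=4, structure="full"))                   # 2 + 12 + 30 = 44
print(total_params(g=3, p=4, structure="full", shared_cov=True))  # 2 + 12 + 10 = 24
```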

15 Parameter Estimation
Maximum likelihood or Bayesian estimation; iterative methods are needed.

16 Maximum Likelihood Estimation
Maximum likelihood estimation has been by far the most commonly used approach to fitting mixture distributions. In model-based clustering, MLE is used to find the parameters of the probability model:

$L(\Theta \mid X) \propto \prod_{i=1}^{n} f(x_i \mid \Theta) = \prod_{i=1}^{n} \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k)$

17 Log-likelihood Function

$\ell(\Theta \mid X) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k) \right)$

The log-likelihood function leads to a non-linear optimization problem; a stable way to evaluate it is sketched below.
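A sketch of evaluating this log-likelihood for a normal mixture, using logsumexp for numerical stability; all parameter values are illustrative:

```python
# A sketch of the mixture log-likelihood l(Theta | X); logsumexp avoids
# underflow when the per-component densities are tiny.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, sigmas):
    """l(Theta | X) = sum_i log( sum_k pi_k * f_k(x_i | theta_k) )."""
    n, g = X.shape[0], len(pis)
    log_terms = np.empty((n, g))
    for k in range(g):
        log_terms[:, k] = (np.log(pis[k]) +
                           multivariate_normal.logpdf(X, mus[k], sigmas[k]))
    return logsumexp(log_terms, axis=1).sum()

X = np.random.default_rng(1).normal(size=(100, 2))
print(log_likelihood(X, [0.5, 0.5],
                     [np.zeros(2), np.ones(2)],
                     [np.eye(2), np.eye(2)]))
```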

18 Estimation (EM Algorithm)
- Handles the missing-data problem by introducing latent variables: the class labels are treated as missing
- A general iterative optimization algorithm for maximizing a likelihood function
- At each EM step the likelihood can only increase

A minimal sketch follows below.
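Here is a minimal, illustrative EM implementation for a multivariate normal mixture, not the algorithm of any particular package; the random-responsibility initialization and the small ridge added to each covariance are my assumptions for numerical stability:

```python
# A minimal EM sketch for a Gaussian mixture, treating class labels as
# missing data (latent variables).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, g, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(g), size=n)     # random initial responsibilities
    for _ in range(n_iter):
        # M-step: update pi_k, mu_k, Sigma_k from the responsibilities
        nk = resp.sum(axis=0)                    # effective cluster sizes
        pis = nk / n
        mus = (resp.T @ X) / nk[:, None]
        sigmas = []
        for k in range(g):
            d = X - mus[k]
            sigmas.append((resp[:, k] * d.T) @ d / nk[k] + 1e-6 * np.eye(p))
        # E-step: recompute responsibilities (posterior class probabilities)
        dens = np.column_stack(
            [pis[k] * multivariate_normal.pdf(X, mus[k], sigmas[k])
             for k in range(g)])
        resp = dens / dens.sum(axis=1, keepdims=True)
    return pis, mus, sigmas, resp

X = np.vstack([np.random.default_rng(1).normal(0, 1, (100, 2)),
               np.random.default_rng(2).normal(4, 1, (100, 2))])
pis, mus, sigmas, resp = em_gmm(X, g=2)
print(pis, mus)
```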

19 Mahalanobis Distance as a Criterion to Classify

$d_M(x, \mu_k) = \sqrt{(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$
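A sketch of classifying an observation by its Mahalanobis distance to each cluster; assigning to the nearest cluster by this distance is one simple rule, while the full model-based rule also weighs $\pi_k$ and $\det \Sigma_k$. All parameter values are illustrative:

```python
# Mahalanobis distance d_M(x, mu_k) = sqrt((x - mu_k)^T Sigma_k^{-1} (x - mu_k)),
# used to assign an observation to its nearest cluster.
import numpy as np

def mahalanobis(x, mu, sigma):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))

mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
x = np.array([3.0, 3.5])
dists = [mahalanobis(x, mu, S) for mu, S in zip(mus, sigmas)]
print(dists, "-> cluster", int(np.argmin(dists)))
```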

20 BIC Criterion

$2 \log f(X \mid M_k) \approx 2 \log f(X \mid \hat{\Theta}_k, M_k) - v_k \log n = \mathrm{BIC}$

- $v_k$ is the number of parameters to be estimated in model $M_k$
- $\hat{\Theta}_k$ is the maximum likelihood estimate of the parameter vector $\Theta_k$
- The higher the BIC, the better the clustering

Reference: Model-based Clustering by HaiJiang Steven Shi
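A sketch of BIC-based selection of the number of clusters g, using scikit-learn's GaussianMixture (an assumption; the slides name Mclust, an R package, not scikit-learn). Note the sign convention: the slide's BIC is $2\log L - v_k \log n$ (higher is better), while sklearn's `gm.bic(X)` returns the negative of this (lower is better), so we flip the sign:

```python
# BIC model selection over g, in the slide's higher-is-better convention.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(5, 1, (150, 2))])

best_g, best_bic = None, -np.inf
for g in range(1, 6):
    gm = GaussianMixture(n_components=g, covariance_type="full",
                         random_state=0).fit(X)
    bic = -gm.bic(X)   # sklearn's BIC is -(2*logL - v*log n); flip the sign
    if bic > best_bic:
        best_g, best_bic = g, bic
print("chosen g =", best_g, "BIC =", round(best_bic, 1))
```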

21 Outliers
We can give Mclust an initial guess regarding the outliers as prior information.

22 Thank You

