Model-based Clustering


1 Model-based Clustering
Saisandeep Sanka, Chengjun Zhu

2 Introduction
Data mining: extracting information from a data source.
Where does the data come from? Experiments and observations.

3 Supervised learning & unsupervised learning
Supervised learning works with labeled data; unsupervised learning works with unlabeled data.

4 History and background
Clustering dates back to the late 1950s. It is a method for finding cohesive groups based on measured characteristics, using numerical measurements. Classical heuristic methods include partitioning methods (such as K-means) and hierarchical clustering.
Reference: Model-based Clustering by HaiJiang Steven Shi

5 With heuristic methods it is hard to compare performance between methods, and there is no principled way to deal with outliers.
Reference: Model-based Clustering by HaiJiang Steven Shi

6 Clustering Classification
Clustering algorithms:
- Hierarchical: complete, single, average, Ward
- Partitional: graph theoretic, K-means, FCM, MDS
- Density- and grid-based
- Model-based

7 Model-Based Clustering
Model-based clustering is based on a probability model for the data. We assume the data come from some distribution; the reason to divide the data into groups is that different groups come from different probability models (one for each group).
Reference: Model-based Clustering by HaiJiang Steven Shi

8 Mixture Model
The observations are often heterogeneous, rather than one single homogeneous group, and can often be modeled by a mixture distribution. A finite mixture distribution is a weighted linear combination of a finite number of simple component distributions:

$X_i \sim f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, f_k(x_i; \theta_k)$

Reference: Model-based Clustering by HaiJiang Steven Shi
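As a minimal sketch of this definition in Python (an assumption; the slides name no language), the mixture density is just a weighted sum of component densities. The two components here, a normal and a gamma, and all parameter values are illustrative, not from the slides:

```python
# A minimal sketch of a finite mixture density: a weighted linear
# combination of g simple component densities. The component choices
# (normal and gamma) and all parameter values are illustrative.
import numpy as np
from scipy import stats

# Mixing proportions pi_k (non-negative, summing to 1)
weights = [0.6, 0.4]
# Component densities f_k(x; theta_k) -- any simple distributions work
components = [stats.norm(loc=0.0, scale=1.0).pdf,
              stats.gamma(a=3.0, scale=1.0).pdf]

def mixture_pdf(x):
    """f(x | Theta) = sum_k pi_k * f_k(x; theta_k)."""
    return sum(pi * f(x) for pi, f in zip(weights, components))

x = np.linspace(-4, 10, 5)
print(mixture_pdf(x))  # pointwise mixture density values
```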

9 Multivariate Normal Mixture Models
The multivariate normal distribution is often used as the common mixture component:

$f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, \varphi(x_i; \mu_k, \Sigma_k)$

Parameters: $\theta_k$ becomes $\{\mu_k, \Sigma_k\}$, and $\Theta = (\pi_1, \pi_2, \ldots, \pi_{g-1}, \mu_1, \mu_2, \ldots, \mu_g, \Sigma_1, \Sigma_2, \ldots, \Sigma_g)$.

Reference: Model-based Clustering by HaiJiang Steven Shi
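A sketch of a two-component multivariate normal mixture follows; all values for $\Theta$ are illustrative assumptions. It evaluates the density above and samples by first drawing a component label with probability $\pi_k$:

```python
# A sketch of a g-component multivariate normal mixture with
# illustrative parameter values for Theta = (pi_k, mu_k, Sigma_k).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

pis = np.array([0.5, 0.5])                               # mixing proportions
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]       # component means mu_k
sigmas = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])] # covariances Sigma_k

def density(x):
    """f(x | Theta) = sum_k pi_k * phi(x; mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, sigmas))

# To sample: first draw a component label k with probability pi_k,
# then draw from the corresponding normal phi(.; mu_k, Sigma_k).
labels = rng.choice(len(pis), size=500, p=pis)
X = np.array([rng.multivariate_normal(mus[k], sigmas[k]) for k in labels])
print(density(X[0]), X.shape)
```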

10 Example 1: Breast Cancer (Group: 1 = No Cancer, 2 = Cancer)

        Group   Age   Breast pain   Redness   Skin dimpling
 obs1     ..     ..        ..          ..           ..
 obs2     ..     ..        ..          ..           ..
 obs3     ..     ..        ..          ..           ..
 ...
 obsN     ..     ..        ..          ..           ..

11 Terminology
Reference: Introduction to Mixture Modeling by Kevin A. Kupzyk, MA, Methodological Consultant, CYFS SRM Unit

12 Covariance Matrix Decomposition

$\Sigma_k = \lambda_k O_k D_k O_k^T$

- $\lambda_k$ is a scalar constant representing the volume of the kth covariance matrix
- $O_k$ is an orthogonal matrix representing the orientation of the kth covariance matrix
- $D_k = \mathrm{Diag}\{\alpha_{1k}, \alpha_{2k}, \ldots, \alpha_{pk}\}$, where $\alpha_{1k} \ge \alpha_{2k} \ge \cdots \ge \alpha_{pk} \ge 0$, is a diagonal matrix representing the shape of the kth covariance matrix

This is the covariance matrix representation given in Banfield and Raftery (1993). Counting the parameters in this decomposition (see the sketch below) shows how constrained variants reduce model complexity.
Reference: Model-based Clustering by HaiJiang Steven Shi
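The decomposition can be computed from an eigendecomposition of $\Sigma_k$. The sketch below assumes the common normalization $\lambda_k = |\Sigma_k|^{1/p}$, which makes $\det D_k = 1$; the slide does not specify a normalization, so treat this as one convention:

```python
# A sketch of the volume/orientation/shape decomposition
# Sigma_k = lambda_k * O_k * D_k * O_k^T via an eigendecomposition.
# The normalization lambda_k = det(Sigma_k)^(1/p), giving det(D_k) = 1,
# is an assumed convention.
import numpy as np

def decompose(sigma):
    p = sigma.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # alpha_1k >= ... >= alpha_pk
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lam = np.linalg.det(sigma) ** (1.0 / p)    # volume lambda_k
    D = np.diag(eigvals / lam)                 # shape D_k, det(D_k) = 1
    O = eigvecs                                # orientation O_k
    return lam, O, D

sigma = np.array([[3.0, 1.0], [1.0, 2.0]])
lam, O, D = decompose(sigma)
# Reconstruction check: lambda_k * O_k D_k O_k^T should recover Sigma_k
print(np.allclose(lam * O @ D @ O.T, sigma))   # True
```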

13 Structure of Clusters Based on the Covariance Matrix

14 Number of Parameters in the Covariance Matrix
A full p×p covariance matrix contributes $p(p+1)/2$ free parameters per component; constrained parameterizations (spherical, diagonal, or shared across components) reduce this count, as sketched below.
Reference: Model-Based Clustering: An Overview by Paul McNicholas, Department of Mathematics & Statistics, University of Guelph
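As a rough sketch, the counts for a few standard covariance structures can be tallied as follows; the structure names reflect common usage and may not match the exact table this slide referenced:

```python
# A sketch counting free parameters in a g-component, p-dimensional
# mixture: (g - 1) mixing proportions + g*p means + covariance terms.
def cov_params_per_component(p, structure):
    if structure == "spherical":   # Sigma_k = lambda_k * I
        return 1
    if structure == "diagonal":    # Sigma_k = Diag(...)
        return p
    if structure == "full":        # unconstrained symmetric Sigma_k
        return p * (p + 1) // 2
    raise ValueError(structure)

def total_params(g, p, structure, shared_cov=False):
    cov = cov_params_per_component(p, structure)
    cov_total = cov if shared_cov else g * cov   # one Sigma shared, or one per cluster
    return (g - 1) + g * p + cov_total

print(total_params(g=3, p=4, structure="full"))                   # 2 + 12 + 30 = 44
print(total_params(g=3, p=4, structure="full", shared_cov=True))  # 2 + 12 + 10 = 24
```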

15 Parameter Estimation
Maximum likelihood or Bayesian estimation; iterative methods are needed.

16 Maximum Likelihood Estimation
Maximum likelihood estimation has been by far the most commonly used approach to fitting mixture distributions. In model-based clustering, MLE is used to find the parameters of the probability model:

$L(\Theta \mid X) \propto \prod_{i=1}^{n} f(x_i \mid \Theta) = \prod_{i=1}^{n} \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k)$

17 Log-likelihood Function

$\ell(\Theta \mid X) = \sum_{i=1}^{n} \log\left( \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k) \right)$

The log-likelihood function leads to a non-linear optimization problem; a stable way to evaluate it is sketched below.
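A sketch of evaluating this log-likelihood for a normal mixture, using logsumexp for numerical stability; all parameter values are illustrative:

```python
# A sketch of the mixture log-likelihood l(Theta | X); logsumexp avoids
# underflow when the per-component densities are tiny.
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pis, mus, sigmas):
    """l(Theta | X) = sum_i log( sum_k pi_k * f_k(x_i | theta_k) )."""
    n, g = X.shape[0], len(pis)
    log_terms = np.empty((n, g))
    for k in range(g):
        log_terms[:, k] = (np.log(pis[k]) +
                           multivariate_normal.logpdf(X, mus[k], sigmas[k]))
    return logsumexp(log_terms, axis=1).sum()

X = np.random.default_rng(1).normal(size=(100, 2))
print(log_likelihood(X, [0.5, 0.5],
                     [np.zeros(2), np.ones(2)],
                     [np.eye(2), np.eye(2)]))
```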

18 Estimation (EM Algorithm)
- Handles the missing-data problem by introducing latent variables: the class labels are treated as missing
- A general iterative optimization algorithm for maximizing a likelihood function
- At each EM step the likelihood can only increase

A minimal sketch follows below.
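Here is a minimal, illustrative EM implementation for a multivariate normal mixture, not the algorithm of any particular package; the random-responsibility initialization and the small ridge added to each covariance are my assumptions for numerical stability:

```python
# A minimal EM sketch for a Gaussian mixture, treating class labels as
# missing data (latent variables).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, g, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(g), size=n)     # random initial responsibilities
    for _ in range(n_iter):
        # M-step: update pi_k, mu_k, Sigma_k from the responsibilities
        nk = resp.sum(axis=0)                    # effective cluster sizes
        pis = nk / n
        mus = (resp.T @ X) / nk[:, None]
        sigmas = []
        for k in range(g):
            d = X - mus[k]
            sigmas.append((resp[:, k] * d.T) @ d / nk[k] + 1e-6 * np.eye(p))
        # E-step: recompute responsibilities (posterior class probabilities)
        dens = np.column_stack(
            [pis[k] * multivariate_normal.pdf(X, mus[k], sigmas[k])
             for k in range(g)])
        resp = dens / dens.sum(axis=1, keepdims=True)
    return pis, mus, sigmas, resp

X = np.vstack([np.random.default_rng(1).normal(0, 1, (100, 2)),
               np.random.default_rng(2).normal(4, 1, (100, 2))])
pis, mus, sigmas, resp = em_gmm(X, g=2)
print(pis, mus)
```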

19 Mahalanobis Distance as a Criterion to Classify

$d_M(x, \mu_k) = \sqrt{(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$
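A sketch of classifying an observation by its Mahalanobis distance to each cluster; assigning to the nearest cluster by this distance is one simple rule, while the full model-based rule also weighs $\pi_k$ and $\det \Sigma_k$. All parameter values are illustrative:

```python
# Mahalanobis distance d_M(x, mu_k) = sqrt((x - mu_k)^T Sigma_k^{-1} (x - mu_k)),
# used to assign an observation to its nearest cluster.
import numpy as np

def mahalanobis(x, mu, sigma):
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(sigma, d)))

mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
sigmas = [np.eye(2), 2.0 * np.eye(2)]
x = np.array([3.0, 3.5])
dists = [mahalanobis(x, mu, S) for mu, S in zip(mus, sigmas)]
print(dists, "-> cluster", int(np.argmin(dists)))
```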

20 BIC Criterion

$2 \log f(X \mid M_k) \approx 2 \log f(X \mid \hat{\Theta}_k, M_k) - v_k \log n = \mathrm{BIC}$

- $v_k$ is the number of parameters to be estimated in model $M_k$
- $\hat{\Theta}_k$ is the maximum likelihood estimate of the parameter vector $\Theta_k$
- The higher the BIC, the better the clustering

Reference: Model-based Clustering by HaiJiang Steven Shi
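A sketch of BIC-based selection of the number of clusters g, using scikit-learn's GaussianMixture (an assumption; the slides name Mclust, an R package, not scikit-learn). Note the sign convention: the slide's BIC is $2\log L - v_k \log n$ (higher is better), while sklearn's `gm.bic(X)` returns the negative of this (lower is better), so we flip the sign:

```python
# BIC model selection over g, in the slide's higher-is-better convention.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(5, 1, (150, 2))])

best_g, best_bic = None, -np.inf
for g in range(1, 6):
    gm = GaussianMixture(n_components=g, covariance_type="full",
                         random_state=0).fit(X)
    bic = -gm.bic(X)   # sklearn's BIC is -(2*logL - v*log n); flip the sign
    if bic > best_bic:
        best_g, best_bic = g, bic
print("chosen g =", best_g, "BIC =", round(best_bic, 1))
```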

21 Outliers
We can give Mclust an initial guess regarding the outliers as prior information.

22 Thank You

