Model-based Clustering

Presentation transcript:

Model-based Clustering
Saisandeep Sanka, Chengjun Zhu

Introduction
Data mining: extracting information from a data source. Where do the data come from? Experiments and observations.

Supervised learning vs. unsupervised learning: learning with labels vs. learning without labels.

History and Background
Clustering dates back to the late 1950s. It is a method for finding cohesive groups of observations based on measured, numerical characteristics. Early approaches were heuristic methods, such as partitioning methods and hierarchical clustering; the best-known partitioning method is K-means.
Reference: Model-based Clustering, HaiJiang Steven Shi

With heuristic methods it is hard to compare the performance of different methods, and there is no principled way to deal with outliers.
Reference: Model-based Clustering, HaiJiang Steven Shi

Clustering Classification
Clustering algorithms:
- Hierarchical: complete, single, average, Ward
- Partitional: graph theoretic, K-means, FCM, MDS
- Density- and grid-based
- Model-based

Model-based Clustering
Model-based clustering is based on a probability model for the data. We assume the data come from some distribution, so the reason to divide the data into groups is that they come from different probability models (one for each group).
Reference: Model-based Clustering, HaiJiang Steven Shi

Mixture Model
The observations are often heterogeneous, rather than one single homogeneous group, and can often be modeled by a mixture distribution. A finite mixture distribution is a weighted linear combination of a finite number of simple component distributions:
$X_i \sim f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, f_k(x_i; \theta_k)$
Reference: Model-based Clustering, HaiJiang Steven Shi

Multivariate Normal Mixture Models
The multivariate normal distribution is often used as the common mixture component:
$f(x_i \mid \Theta) = \sum_{k=1}^{g} \pi_k \, \varphi(x_i; \mu_k, \Sigma_k)$
The component parameters $\theta_k$ become $\{\mu_k, \Sigma_k\}$, and
$\Theta = (\pi_1, \pi_2, \ldots, \pi_{g-1}, \mu_1, \mu_2, \ldots, \mu_g, \Sigma_1, \Sigma_2, \ldots, \Sigma_g)$.
Reference: Model-based Clustering, HaiJiang Steven Shi
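
As an illustration of the mixture density above, here is a minimal Python sketch that evaluates a two-component multivariate normal mixture at a point; the weights, means, and covariances are made-up values, not taken from the slides.

```python
# A minimal sketch of f(x | Theta) = sum_k pi_k * phi(x; mu_k, Sigma_k)
# for a two-component multivariate normal mixture (illustrative parameters).
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.6, 0.4])                                # mixing proportions, sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]        # component means
Sigma = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]  # component covariances

def mixture_density(x):
    """Weighted sum of component normal densities at point x."""
    return sum(p * multivariate_normal(m, S).pdf(x)
               for p, m, S in zip(pi, mu, Sigma))

print(mixture_density(np.array([1.0, 1.0])))
```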

Example 1: Breast Cancer
Observations obs1, obs2, ..., obsN each carry a group label (1 = no cancer, 2 = cancer) and covariates such as age, breast pain, redness, and skin dimpling.

Terminology
Reference: Introduction to Mixture Modeling, Kevin A. Kupzyk, Methodological Consultant, CYFS SRM Unit

Covariance Matrix Decomposition
$\Sigma_k = \lambda_k O_k D_k O_k^T$
$\lambda_k$ is a scalar constant representing the volume of the kth covariance matrix.
$O_k$ is an orthogonal matrix representing the orientation of the kth covariance matrix.
$D_k = \mathrm{Diag}\{\alpha_{1k}, \alpha_{2k}, \ldots, \alpha_{pk}\}$, where $\alpha_{1k} \ge \alpha_{2k} \ge \ldots \ge \alpha_{pk} \ge 0$, is a diagonal matrix representing the shape of the kth covariance matrix.
This is the covariance matrix representation given in Banfield and Raftery (1993); from it the free parameters of each model can be counted.
Reference: Model-based Clustering, HaiJiang Steven Shi
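
A minimal sketch, assuming NumPy, of recovering the volume, orientation, and shape factors of a single covariance matrix via its eigendecomposition; the example matrix and the normalization det(D_k) = 1 are illustrative assumptions, not prescribed by the slides.

```python
# Decompose Sigma = lambda * O * D * O^T using an eigendecomposition.
import numpy as np

Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])          # illustrative covariance matrix
p = Sigma.shape[0]

eigvals, O = np.linalg.eigh(Sigma)      # O: orthogonal matrix (orientation)
order = np.argsort(eigvals)[::-1]       # sort so alpha_1 >= ... >= alpha_p
eigvals, O = eigvals[order], O[:, order]

lam = np.linalg.det(Sigma) ** (1.0 / p) # lambda: volume, chosen so det(D) = 1
D = np.diag(eigvals / lam)              # D: shape (normalized eigenvalues)

# The decomposition reproduces Sigma
print(np.allclose(lam * O @ D @ O.T, Sigma))  # True
```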

Cluster structure based on the covariance matrix

Number of Parameters in the Covariance Matrix
Reference: Model-Based Clustering: An Overview, Paul McNicholas, Department of Mathematics & Statistics, University of Guelph

Parameter Estimation
Maximum likelihood or Bayesian estimation; iterative methods are needed.

Maximum Likelihood Estimation
Maximum likelihood has been by far the most commonly used approach to fitting mixture distributions. In model-based clustering, MLE (maximum likelihood estimation) is used to find the parameters of the probability model via the likelihood function:
$L(\Theta \mid X) \propto \prod_{i=1}^{n} f(x_i \mid \Theta) = \prod_{i=1}^{n} \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k)$

Log-likelihood Function
$\ell(\Theta \mid X) = \sum_{i=1}^{n} \log\!\left( \sum_{k=1}^{g} \pi_k \, f_k(x_i \mid \theta_k) \right)$
Maximizing the log-likelihood leads to a non-linear optimization problem.
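
As a sketch of how this quantity can be computed, the snippet below evaluates the mixture log-likelihood using a log-sum-exp for numerical stability; the use of SciPy and the example data and parameters are assumptions for illustration, not part of the slides.

```python
# Mixture log-likelihood l(Theta | X) = sum_i log( sum_k pi_k * phi(x_i; mu_k, Sigma_k) )
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def log_likelihood(X, pi, mu, Sigma):
    """Total log-likelihood of data X under a Gaussian mixture."""
    # log_dens[i, k] = log pi_k + log phi(x_i; mu_k, Sigma_k)
    log_dens = np.column_stack([
        np.log(p) + multivariate_normal(m, S).logpdf(X)
        for p, m, S in zip(pi, mu, Sigma)
    ])
    return logsumexp(log_dens, axis=1).sum()

X = np.random.default_rng(0).normal(size=(100, 2))       # illustrative data
print(log_likelihood(X,
                     pi=[0.5, 0.5],
                     mu=[np.zeros(2), np.ones(2)],
                     Sigma=[np.eye(2), np.eye(2)]))
```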

Estimation (EM Algorithm)
The EM algorithm handles the missing-data problem by introducing latent variables: the class labels are treated as missing. It is a general iterative optimization algorithm for maximizing a likelihood function, and at each EM step the likelihood can only increase.
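
The slides do not give the update formulas, so the following is a minimal EM sketch for a Gaussian mixture in Python/NumPy; the initialization scheme, iteration count, and variable names (gamma for the responsibilities) are assumptions for illustration, and a full implementation would add convergence checks and safeguards against singular covariance matrices.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, g, n_iter=50, seed=0):
    """Fit a g-component Gaussian mixture by EM (bare-bones sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(g, 1.0 / g)                         # mixing proportions
    mu = X[rng.choice(n, g, replace=False)]          # random initial means
    Sigma = np.array([np.cov(X.T) for _ in range(g)])

    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] = P(cluster k | x_i)
        dens = np.column_stack([
            pi[k] * multivariate_normal(mu[k], Sigma[k]).pdf(X)
            for k in range(g)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: update weights, means, and covariances
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(g):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]

    return pi, mu, Sigma, gamma

# Hard clustering: assign each point to its most probable component
X = np.vstack([np.random.default_rng(1).normal(0, 1, (50, 2)),
               np.random.default_rng(2).normal(4, 1, (50, 2))])
pi, mu, Sigma, gamma = em_gmm(X, g=2)
labels = gamma.argmax(axis=1)
```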

Mahalanobis Distance: a criterion for assigning observations to clusters (see the sketch below).
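
Since the slide only names the criterion, here is a small sketch of the squared Mahalanobis distance $d^2(x, \mu_k) = (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)$ in NumPy; the point, mean, and covariance values are illustrative.

```python
# Squared Mahalanobis distance of a point x from a cluster centre mu.
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

x = np.array([1.0, 2.0])
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
print(mahalanobis_sq(x, mu, Sigma))
```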

BIC Criterion
$2 \log f(X \mid M_k) \approx 2 \log f(X \mid \hat{\Theta}_k, M_k) - v_k \log n = \mathrm{BIC}$
$v_k$ is the number of parameters to be estimated in model $M_k$, and $\hat{\Theta}_k$ is the maximum likelihood estimate of the parameter vector $\Theta_k$. The higher the BIC, the better the clustering.
Reference: Model-based Clustering, HaiJiang Steven Shi
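
A minimal sketch of computing the BIC as defined above (the convention where larger BIC is better); the log-likelihood, parameter count, and sample size below are illustrative numbers, not results from the slides.

```python
# BIC = 2 * log-likelihood - v_k * log(n); select the model with the highest BIC.
import numpy as np

def bic(log_lik, n_params, n_obs):
    return 2.0 * log_lik - n_params * np.log(n_obs)

print(bic(log_lik=-412.7, n_params=11, n_obs=150))
```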

Outliers
We can pass an initial guess regarding the outliers to Mclust as prior information.

Thank You