PROBABILISTIC MODEL-BASED CLUSTERING USING MIXTURE MODELS Data Mining Lecture VI [4.5, 8.4, 9.2, 9.6, Hand, Mannila, Smyth]
Probabilistic Model-Based Clustering using Mixture Models A probability mixture model A mixture model is a formalism for modeling a probability density function as a weighted sum of parameterized component densities. In mathematical terms:

p_X(x) = Σ_{k=1}^{K} a_k · h(x | λ_k)
A probability mixture model where p_X(x) is the modeled probability density function, K is the number of components in the mixture model, and a_k is the mixture proportion of component k. By definition, 0 < a_k < 1 for all k = 1…K and:

Σ_{k=1}^{K} a_k = 1
A probability mixture model h(x | λ_k) is a probability distribution parameterized by λ_k. Mixture models are often used when we know the form of h and can sample from p_X(x), but we would like to determine the a_k and λ_k values. Such situations can arise in studies in which we sample from a population that is composed of several distinct subpopulations.
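The definitions above can be made concrete with a small sketch. Assuming a hypothetical two-component Gaussian mixture (the values of a_k, μ_k, σ_k below are chosen purely for illustration), the density p_X(x) is the weighted sum of the component densities, and sampling follows the "membership" view: first draw a component label with probabilities a_k, then draw from that component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-component Gaussian mixture (illustrative parameters)
a = np.array([0.3, 0.7])       # mixture proportions a_k, sum to 1
mu = np.array([-2.0, 3.0])     # component means (part of lambda_k)
sigma = np.array([1.0, 0.5])   # component standard deviations

def gauss_pdf(x, m, s):
    # Density of a univariate Gaussian h(x | lambda_k) with lambda_k = (m, s)
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def mixture_pdf(x):
    # p_X(x) = sum_k a_k * h(x | lambda_k)
    return sum(a_k * gauss_pdf(x, m_k, s_k)
               for a_k, m_k, s_k in zip(a, mu, sigma))

def sample(n):
    # Draw a component label k with probability a_k, then sample from it
    ks = rng.choice(len(a), size=n, p=a)
    return rng.normal(mu[ks], sigma[ks])

x = sample(10_000)
```

Because the a_k sum to one and each h(· | λ_k) integrates to one, p_X integrates to one as well; the sample mean of x approaches Σ_k a_k μ_k.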
A common approach for ‘decomposing’ a mixture model It is common to think of mixture modeling as a missing data problem. One way to understand this is to assume that the data points under consideration have "membership" in one of the distributions we are using to model the data. When we start, this membership is unknown, or missing. The job of estimation is to devise appropriate parameters for the model functions we choose, with the connection to the data points being represented as their membership in the individual model distributions.
Probabilistic Model-Based Clustering using Mixture Models The EM-algorithm [book section 8.4]
Mixture Decomposition: The ‘Expectation-Maximization’ Algorithm The Expectation-Maximization algorithm computes the missing memberships of data points in our chosen distribution model. It is an iterative procedure, where we start with initial parameters for our model distribution (the a_k's and λ_k's of the model listed above). The estimation process proceeds iteratively in two steps, the Expectation Step and the Maximization Step.
The ‘Expectation-Maximization’ Algorithm The expectation step With initial guesses for the parameters in our mixture model, we compute "partial membership" of each data point in each constituent distribution. This is done by calculating expectation values for the membership variables of each data point.
The ‘Expectation-Maximization’ Algorithm The maximization step With the expectation values in hand for group membership, we can recompute plug-in estimates of our distribution parameters. For the mixing coefficients, the new estimate of a_k is simply the average fractional membership of all data points in component k.
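The two steps above can be sketched for a one-dimensional two-Gaussian mixture; the synthetic data and initial guesses below are illustrative, not from the lecture. The E-step computes the partial memberships (responsibilities), and the M-step re-estimates a_k, μ_k, σ_k as membership-weighted plug-in estimates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from two hypothetical Gaussian subpopulations
data = np.concatenate([rng.normal(-2.0, 1.0, 300),
                       rng.normal(3.0, 0.5, 700)])

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Initial guesses for the parameters a_k, mu_k, sigma_k
a = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(100):
    # E-step: expected ("partial") membership of each point in each component
    w = a * gauss_pdf(data[:, None], mu, sigma)   # shape (n, K)
    r = w / w.sum(axis=1, keepdims=True)          # responsibilities

    # M-step: plug-in re-estimates weighted by the memberships
    nk = r.sum(axis=0)                            # effective counts per component
    a = nk / len(data)                            # mixing proportions
    mu = (r * data[:, None]).sum(axis=0) / nk     # weighted means
    sigma = np.sqrt((r * (data[:, None] - mu) ** 2).sum(axis=0) / nk)
```

With well-separated components, the estimates settle near the generating values after a few dozen iterations.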
EM-algorithm for Clustering Suppose we have data D, a model with parameters θ, and hidden variables H. Interpretation: H = the class label. Log-likelihood of observed data:

l(θ) = log p(D | θ) = log Σ_H p(D, H | θ)
EM-algorithm for Clustering With p the probability model over the data D, let Q be an unknown distribution over the hidden variables H. Then the log-likelihood satisfies:

l(θ) = log Σ_H Q(H) · p(D, H | θ) / Q(H) ≥ Σ_H Q(H) log [ p(D, H | θ) / Q(H) ] = F(Q, θ)
Jensen's inequality: for a concave function, the expected value of the function is less than or equal to the function of the expected value. [Figure: the gray rectangle along the horizontal axis represents the probability distribution of x, uniform here for simplicity, but the general idea applies to any distribution.]
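Since log is concave, Jensen's inequality gives E[log x] ≤ log E[x], which is exactly the step that makes F(Q, θ) a lower bound on the log-likelihood. A quick numeric check (the uniform distribution below is arbitrary; any positive-valued distribution works):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 4.0, 100_000)   # samples from an arbitrary positive distribution

# Jensen's inequality for the concave log: E[log x] <= log E[x]
lhs = np.log(x).mean()   # expected value of the function
rhs = np.log(x.mean())   # function of the expected value
```

The gap rhs − lhs is strictly positive unless x is (almost surely) constant.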
EM-algorithm So F(Q, θ) is a lower bound on the log-likelihood function l(θ). EM alternates between: E-step: maximising F with respect to Q with θ fixed, and: M-step: maximising F with respect to θ with Q fixed.
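Written out, this coordinate-ascent view of EM (the standard form; notation matches the slides above) is:

```latex
\text{E-step:}\quad Q^{(t+1)} = \arg\max_{Q} F(Q, \theta^{(t)}) = p(H \mid D, \theta^{(t)})
\quad\text{(the bound becomes tight: } F(Q^{(t+1)}, \theta^{(t)}) = l(\theta^{(t)})\text{)}

\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta} F(Q^{(t+1)}, \theta)
= \arg\max_{\theta} \sum_{H} Q^{(t+1)}(H) \log p(D, H \mid \theta)
```

Each step can only increase F, and the E-step closes the gap to l(θ), so the observed-data log-likelihood never decreases across iterations.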
Gaussian Mixture Decomposition Gaussian mixture decomposition is a good classifier. It allows supervised as well as unsupervised learning (finding how many classes are optimal, how they should be defined, ...), but training is iterative and time-consuming. The idea is to set the position and width of the Gaussian distribution(s) to optimize the coverage of the learning samples.
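A minimal sketch of the supervised case (labeled data is invented for illustration): when class labels are known, no EM iteration is needed, since the position (mean) and width (standard deviation) of each class's Gaussian follow directly as plug-in estimates, and a new point is assigned to the class with the highest weighted density a_k · h(x | λ_k).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical labeled training set: two classes, one Gaussian each
x0 = rng.normal(-2.0, 1.0, 300)   # class 0 samples
x1 = rng.normal(3.0, 0.5, 700)    # class 1 samples

# "Training" = direct plug-in estimates per class (no iteration needed)
a = np.array([len(x0), len(x1)], dtype=float)
a /= a.sum()                               # class priors a_k
mu = np.array([x0.mean(), x1.mean()])      # positions
sigma = np.array([x0.std(), x1.std()])     # widths

def gauss_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def classify(x):
    # Assign each point to the class with the largest a_k * h(x | lambda_k)
    return np.argmax(a * gauss_pdf(np.asarray(x)[:, None], mu, sigma), axis=1)
```

The iterative, time-consuming part arises only in the unsupervised setting, where the labels themselves must be recovered by EM as in the earlier slides.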