 # Segmentation and Fitting Using Probabilistic Methods

## Presentation transcript

Segmentation and Fitting Using Probabilistic Methods
Or, How Expectation-Maximization Can Cure Your Computer Vision System of Almost Anything. Well… maybe…

Departure Point
Up to now, most of what we’ve done in the grouping and segmentation arena has been local. Now we want to model things globally, and in probabilistic terms: explain a large collection of tokens with a few parameters. (Hmmm… like the Hough transform?)

Missing Data Problems, Fitting, Segmentation
Often, if some parameters were known, the maximum likelihood problem would be easy. Fitting: if you know which line each token comes from, getting the parameters is easy. Segmentation: if you know the segment each pixel comes from, the segment’s parameters are easily determined. Fundamental matrix: if you know the correspondences…

Missing Data Problem
A missing data problem is one where either some terms in a data vector are missing in some instances but present in others, or an inference problem can be made simpler by rewriting it using some variables whose values are unknown. Algorithm concept: take an expectation over the missing data.

Missing Data Problems: Strategy
Estimate values for the missing data. Plug these in, and estimate the parameters. Re-estimate values for the missing data. Continue to convergence. For example: guess a mapping of points to lines, fit each line to its points, reallocate points to the fitted lines, and loop to convergence. Reminiscent of K-means, is it not?
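The line example above can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's code: the function names `fit_line_tls` and `hard_assign_lines` are hypothetical, lines are represented as a unit normal and an offset, and the fit is assumed to be total least squares on perpendicular distance.

```python
import numpy as np

def fit_line_tls(points):
    """Total least squares fit: unit normal n and offset c, with n . p = c on the line."""
    centroid = points.mean(axis=0)
    # The last right singular vector is the direction of least spread (the line normal).
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return normal, float(normal @ centroid)

def hard_assign_lines(points, labels, num_lines, max_iters=50):
    """Alternate between fitting each line and reallocating points (hypothetical helper)."""
    labels = np.asarray(labels).copy()
    lines = [None] * num_lines
    for _ in range(max_iters):
        for l in range(num_lines):
            members = points[labels == l]
            if len(members) >= 2:  # keep the previous fit if a line lost its points
                lines[l] = fit_line_tls(members)
        if any(line is None for line in lines):
            raise ValueError("each line needs at least two points in the initial guess")
        # Perpendicular distance from every point to every line, shape (n, num_lines).
        dists = np.stack([np.abs(points @ n - c) for n, c in lines], axis=1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # the mapping of points to lines stopped changing
        labels = new_labels
    return lines, labels
```

Each point belongs to exactly one line here, which is what makes the loop look like K-means; the result depends on the initial guess.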

Refining the Strategy
The problem has parameters to be estimated, and missing variables (data). Iterate to convergence: (1) replace the missing data with their expected values, given fixed parameter values; (2) fix the missing data and compute a maximum likelihood estimate of the parameters, given that data.

Refining the Example
Allocate each point to a line with a weight equal to the probability of the point, given the line’s parameters. Refit the lines to the weighted set of points. This converges to a local extremum (caution). It can be generalized…
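A sketch of the weighted version follows. It assumes details the slide does not state: Gaussian noise on perpendicular distance with a fixed, known `sigma`, equal mixing weights for all lines, and weights normalized across lines for each point. The names `weighted_line_fit` and `soft_line_em` are hypothetical.

```python
import numpy as np

def weighted_line_fit(points, w):
    """Weighted total least squares: unit normal n and offset c, with n . p = c."""
    centroid = (w[:, None] * points).sum(axis=0) / w.sum()
    centered = points - centroid
    scatter = (w[:, None] * centered).T @ centered
    # eigh returns eigenvalues in ascending order; the first eigenvector is the normal.
    _, eigvecs = np.linalg.eigh(scatter)
    normal = eigvecs[:, 0]
    return normal, float(normal @ centroid)

def soft_line_em(points, lines, sigma=1.0, iters=30):
    """Soft allocation of points to lines, then weighted refit (assumed noise model)."""
    lines = list(lines)
    for _ in range(iters):
        # Signed perpendicular distances, shape (n, number of lines).
        d = np.stack([points @ n - c for n, c in lines], axis=1)
        # Weights proportional to exp(-d^2 / (2 sigma^2)), normalized over lines.
        log_w = -0.5 * (d / sigma) ** 2
        log_w -= log_w.max(axis=1, keepdims=True)  # avoid underflow in exp
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)
        for l in range(len(lines)):
            if w[:, l].sum() > 1e-12:  # skip lines that explain no points
                lines[l] = weighted_line_fit(points, w[:, l])
    return lines, w
```

Compared with hard allocation, every point now contributes to every line, but points far from a line contribute almost nothing. A poor starting guess can still settle into a poor local extremum.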

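Image Segmentation
$\pi_l$: probability of choosing segment $l$ at random (a priori). $p(x \mid \theta_l)$: conditional density of the feature vector $x$, given that it comes from segment $l$, $l = 1, \ldots, g$. Model: $p(x \mid \theta_l)$ is Gaussian, with $\theta_l = (\mu_l, \Sigma_l)$. The total density for the feature vector of any pixel drawn at random is $p(x) = \sum_{l=1}^{g} \pi_l \, p(x \mid \theta_l)$. [Figure: an image divided into four segments, labeled Segment 1 ($\theta_1$) through Segment 4 ($\theta_4$).] This is known as a mixture model.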

Mixture Model: Generative
To produce a pixel (feature vector): pick an image segment $l$ with prior probability $\pi_l$, then draw a sample from $p(x \mid \theta_l)$. The density in $x$ space is a set of $g$ Gaussian blobs, one per segment. We want to determine: the parameters of each blob (the $\mu$ and $\Sigma$ values); the mixing weights (the $\pi$ values); and a mapping of pixels to components (the segmentation).
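The generative recipe can be written directly as a sampler. This is a small sketch with a hypothetical function name, `sample_mixture`; the weights, means, and covariances passed to it are whatever the caller assumes for the blobs.

```python
import numpy as np

def sample_mixture(n, weights, means, covs, seed=0):
    """Draw n feature vectors from a Gaussian mixture; also return the hidden labels."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    # Step 1: pick a segment for each pixel with the prior (mixing) probabilities.
    labels = rng.choice(len(weights), size=n, p=weights)
    samples = np.empty((n, means.shape[1]))
    # Step 2: draw each pixel's feature vector from its segment's Gaussian.
    for l in range(len(weights)):
        idx = np.flatnonzero(labels == l)
        if len(idx):
            samples[idx] = rng.multivariate_normal(means[l], covs[l], size=len(idx))
    return samples, labels
```

In segmentation only `samples` is observed; `labels` is exactly the missing data that the rest of the presentation tries to recover.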

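Package all these things into a parameter vector: $\Theta = (\alpha_1, \ldots, \alpha_g, \theta_1, \ldots, \theta_g)$, where the $\alpha_l$ are the mixing weights (the prior probabilities of the segments) and the $\theta_l$ are the blob parameters.
The mixture model becomes: $p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)$. With each component a multivariate Gaussian in $d$ dimensions: $p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} |\Sigma_l|^{1/2}} \exp\left(-\tfrac{1}{2}(x - \mu_l)^T \Sigma_l^{-1} (x - \mu_l)\right)$.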

The Chicken and the Egg
If we knew which pixel belonged to which component, $\Theta$ would be straightforward: use maximum likelihood estimates for each $\theta_l$, and the fraction of the image in each component gives $\alpha_l$. If we knew $\Theta$, then for each pixel we could assign it to its most likely blob. Unfortunately, we know neither. That’s where Expectation-Maximization (EM) comes in: iterate guesses until convergence.

Formal Statement of Missing Data Problems
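A map $f: X \to Y$ takes the complete data space $X$ to the incomplete data space $Y$. For image segmentation, $X$ holds the measurements at each pixel and the set of variables matching pixels to mixture components, while $Y$ holds only the measurements at each pixel. For line fitting, $X$ holds the measurements at each token and the mapping of tokens to lines, while $Y$ holds only the measurements at each token.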

Missing, Formally
The parameter space $U$ contains the mixing weights and the parameters (mean, covariance) of each mixture component (or the parameters of each line). We want to obtain a maximum-likelihood estimate of these parameters given incomplete data. If we had complete data, then we could use the joint density function for the complete data space, $p_c(x; u)$. Complete data log-likelihood: $L_c(x; u) = \log \prod_j p_c(x_j; u) = \sum_j \log p_c(x_j; u)$.

OK. We maximize this to estimate each segment’s parameters (image segmentation), or the mixing weights and parameters of the lines given the mapping of the tokens to lines (the line fitting example). Problem: we don’t have complete data. The density for the incomplete space is the marginal density of the complete space, where we’ve integrated out the variables we don’t know (the missing data).
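Written out, the marginalization looks roughly like the following. The notation is an assumption chosen for this sketch: $p_i$ for the incomplete-data density, $L_i$ for its log-likelihood, $\eta$ for the measure over the missing variables, and $\alpha_l$, $\theta_l$ for the mixing weights and component parameters.

```latex
% Incomplete-data density: integrate the complete-data density over
% every x that maps to the observed y.
p_i(y; u) = \int_{\{x \,:\, f(x) = y\}} p_c(x; u)\, d\eta

% Incomplete-data log-likelihood over the observations y_j.
L_i(y; u) = \sum_j \log p_i(y_j; u)

% For a mixture, the integral is a sum over the unknown component label.
L_i(y; u) = \sum_j \log \sum_{l=1}^{g} \alpha_l \, p(y_j \mid \theta_l)
```

The logarithm of a sum in the last line does not separate into per-component terms, which is why direct maximization is awkward.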

This is a pain in the neck… We don’t know which of the many possible x values that could correspond to the y values we observe are correct. We’ve taken a projection (of some sort), and we cannot uniquely reconstruct the full joint density. So we have to average over all those possibilities to make our best guess. But all is not lost… We have the following strategy: 1. Obtain some estimate of the missing data using a guess at the parameters. 2. Form a maximum likelihood estimate of the free parameters using the estimate of the missing data. 3. Iterate to (hopefully) convergence.

Strategy by Example
Image segmentation: obtain an estimate of the component from which each pixel comes, using an estimate of the $\theta_l$; then update the $\theta_l$ and the mixing weights using this estimate. Tokens and lines: obtain an estimate of the correspondence between tokens and lines, using a guess at the line parameters; then revise the estimate of the line parameters using the estimated correspondences.

Expectation-Maximization For Mixture Models
Assume the complete data log-likelihood is linear in the missing variables (a common case). Mixture model: the missing data indicate the mixture component from which a data item is drawn. Represent this by associating with each data point a bit vector $z$ of $g$ elements (one per component in the mixture).

The missing data form an $n \times g$ array. Rows are data points, one per observation, each row a $z$ vector; columns are mixture components, one Gaussian per column $l$. Entry $(j, l)$ is 1 if pixel (token) $j$ was produced by Gaussian mixture component $l$, and 0 otherwise. Its expectation is the probability of that event.

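So our complete information can be written as: $x_j = [y_j, z_j]$.
Write the mixture model as (line example): $p(y) = \sum_{l=1}^{g} \pi_l \, p(y \mid \theta_l)$, where $\theta_l$ holds the parameters of line $l$. Complete data log-likelihood is: $\sum_j \sum_{l=1}^{g} z_{jl} \log\left[\pi_l \, p(y_j \mid \theta_l)\right]$. This is linear in the missing variables. Good news! How did we ensure that that would happen? We will think of the entries in $z$ as probabilities, expectations.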

EM: The Key Idea
Obtain working values for the missing data, and so for $x$, by substituting the expectation for each missing value. That is, fix the parameters, then compute each expectation $E[z_{jl}]$, given $y_j$ and the parameter values. Plug $E[z_{jl}]$ into the complete data log-likelihood and find the parameters that maximize it. $E[z_{jl}]$ has probably changed, so repeat.

More Formally
Given $u^s$, we form $u^{s+1}$ by:
1. E-Step: Compute the expected value for the complete data using the incomplete data and the current parameter estimates. We know the expected value of $y_j$ (the means of the current Gaussian guesses) and only need the expected value of $z_j$ for each $j$. These values carry a superscript $s$, indicating that the expectation depends on the current parameter values at step $s$.
2. M-Step: Maximize the complete data log-likelihood with respect to $u$ using the expectation from the E-step.
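To make the two steps concrete, here is a deliberately tiny instance. It assumes a 1-D mixture of two unit-variance Gaussians with equal mixing weights, so the only parameters $u$ are the two means. The function name `em_two_means` and these simplifications are assumptions for illustration only.

```python
import numpy as np

def em_two_means(y, means, iters=100, tol=1e-8):
    """EM for two unit-variance 1-D Gaussians with equal weights; only the means are unknown."""
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    for _ in range(iters):
        # E-step: expected indicators given y and the current means (step s).
        log_p = -0.5 * (y[:, None] - means[None, :]) ** 2
        log_p -= log_p.max(axis=1, keepdims=True)  # avoid underflow in exp
        z_bar = np.exp(log_p)
        z_bar /= z_bar.sum(axis=1, keepdims=True)
        # M-step: the maximizer of the expected complete data log-likelihood
        # is the weighted mean of the data for each component.
        new_means = (z_bar * y[:, None]).sum(axis=0) / z_bar.sum(axis=0)
        converged = np.max(np.abs(new_means - means)) < tol
        means = new_means
        if converged:
            break
    return means, z_bar
```

For example, `em_two_means(y, [-1.0, 1.0])` starts from a rough guess at the means and returns the refined means together with the final expected indicators.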

Image Segmentation In Practice (Warning: Your text is a typo minefield)
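Set up an $n$ by $g$ array of indicators $I$ (each row is like a $z$ vector). E-Step: the $(j, l)$ element of $I$ is 1 if pixel $j$ comes from blob $l$, so $E(I_{jl}) = \mathrm{Prob}(\text{pixel } j \text{ comes from Gaussian blob } l)$. Note: this is no longer a binary value! [Figure: two Gaussian blobs evaluated at a point $x$, with heights $a$ and $b$; the expected indicator for the blob with height $b$ is approximately $b/(a+b)$.]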
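A possible E-step for Gaussian blobs is sketched below. It assumes the standard posterior form, in which each expected indicator is the weighted density of one blob divided by the sum over all blobs; the names `log_gaussian` and `e_step`, and the use of log densities for numerical stability, are choices made for this sketch.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve chol @ u = diff^T; the squared norm of each column of u
    # is the squared Mahalanobis distance of that data point.
    u = np.linalg.solve(chol, diff.T)
    maha = (u ** 2).sum(axis=0)
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + maha)

def e_step(x, alphas, means, covs):
    """Return the n-by-g array of expected indicators E(I_jl)."""
    log_w = np.stack(
        [np.log(alphas[l]) + log_gaussian(x, means[l], covs[l]) for l in range(len(alphas))],
        axis=1,
    )
    log_w -= log_w.max(axis=1, keepdims=True)  # avoid underflow in exp
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)
```

With two blobs, each row reduces to the pair $a/(a+b)$, $b/(a+b)$, where $a$ and $b$ are the weighted densities of the two blobs at that pixel.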

Practice… M-Step: Now form a maximum-likelihood estimate of $\Theta^{s+1}$
Mixing weights: the average value in each column of $I$. Means: the weighted average feature vector for each column. Covariances: the weighted average covariance matrix for each column.
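The three updates can be sketched as follows, assuming the standard maximum-likelihood forms for a Gaussian mixture. The function name `m_step` and the argument `resp` (the n-by-g array of expected indicators from the E-step) are hypothetical.

```python
import numpy as np

def m_step(x, resp):
    """Update mixing weights, means, and covariances from expected indicators."""
    n, d = x.shape
    g = resp.shape[1]
    col_sums = resp.sum(axis=0)  # total weight claimed by each blob
    # Mixing weights: average value in each column.
    alphas = col_sums / n
    # Means: weighted average feature vector for each column.
    means = (resp.T @ x) / col_sums[:, None]
    # Covariances: weighted average outer product about the new mean.
    covs = np.empty((g, d, d))
    for l in range(g):
        diff = x - means[l]
        covs[l] = (resp[:, l, None] * diff).T @ diff / col_sums[l]
    return alphas, means, covs
```

A blob whose column sums to nearly zero would produce a division by a tiny number here, so practical code usually guards against empty components and regularizes the covariances.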

When it Converges...
We can make a maximum a posteriori (MAP) decision by assigning each pixel to the Gaussian for which it has the highest $E(I_{jl})$. We can also keep the probabilities and work with them in, for instance, a probabilistic relaxation framework (coming attractions).
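The MAP decision is a single argmax over each row of the indicator array. The sketch below assumes the pixels were flattened in row-major order before fitting, and the name `map_segmentation` is hypothetical.

```python
import numpy as np

def map_segmentation(resp, height, width):
    """Assign each pixel to its most probable blob and reshape to the image grid."""
    labels = np.asarray(resp).argmax(axis=1)  # index of the largest E(I_jl) in each row
    return labels.reshape(height, width)
```

Keeping `resp` itself, instead of only the argmax, preserves the soft probabilities for later processing.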