Presentation on theme: "Segmentation and Fitting Using Probabilistic Methods"— Presentation transcript:
1 Segmentation and Fitting Using Probabilistic Methods. Or, How Expectation-Maximization Can Cure Your Computer Vision System of Almost Anything. Well… maybe...
2 Departure Point: Up to now, most of what we’ve done in the grouping and segmentation arena has been local. Now we want to model things globally, and in probabilistic terms. Explain a large collection of tokens with a few parameters. (Hmmm… Like the Hough transform?)
3 Missing Data Problems, Fitting, Segmentation: Often, if some parameters were known, the maximum likelihood problem would be easy. Fitting: If you know which line each token comes from, getting the parameters is easy. Segmentation: If you know the segment each pixel comes from, the segment’s parameters are easily determined. Fundamental Matrix: If you know the correspondences….
4 Missing Data Problem: A missing data problem is one where… Some terms in a data vector are missing in some instances, but present in others. An inference problem can be made simpler by rewriting it using some variables whose values are unknown. Algorithm Concept: Take an expectation over the missing data.
5 Missing Data Problems. Strategy: Estimate values for the missing data. Plug these in, now estimate parameters. Re-estimate values for missing data. Continue to convergence. For example: Guess a mapping of points to lines. Fit each line to its points. Reallocate points to the fitted lines. Loop to convergence. Reminiscent of K-means, is it not?
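The guess / fit / reallocate loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lines are written y = a*x + b and fitted by ordinary least squares, and a point is reallocated to the line with the smallest vertical residual. The slides do not specify either choice, and the names `fit_line`, `residual` and `hard_fit` are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b (assumed fitting method)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def residual(point, line):
    """Vertical distance from a point to a line (assumed distance measure)."""
    x, y = point
    a, b = line
    return abs(y - (a * x + b))


def hard_fit(points, labels, n_lines, max_iters=50):
    """Start from a guessed mapping of points to lines, then alternate."""
    lines = [(0.0, 0.0)] * n_lines
    for _ in range(max_iters):
        # Fit each line to the points currently allocated to it.
        for k in range(n_lines):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:  # otherwise keep the previous line
                lines[k] = fit_line(members)
        # Reallocate every point to its closest fitted line.
        new_labels = [min(range(n_lines), key=lambda k: residual(p, lines[k]))
                      for p in points]
        if new_labels == labels:  # nothing moved: converged
            break
        labels = new_labels
    return lines, labels
```

As with K-means, the result depends on the initial guess, and a poor guess can leave the loop stuck in a wrong allocation.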
6 Refining the Strategy: The problem has parameters to be estimated, and missing variables (data). Iterate to convergence: Replace missing data with expected values, given fixed parameter values. Fix the missing data, do a maximum likelihood estimate of the parameters, given that data.
7 Refining the Example: Allocate each point to a line with a weight equal to the probability of the point, given the line’s parameters. Refit the lines to the weighted set of points. Converges to local extremum (caution). Can be generalized…
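A minimal sketch of the weighted version follows. It makes several assumptions the slides do not state: lines are y = a*x + b, the probability of a point given a line is a Gaussian in the vertical residual with a fixed, hand-picked `sigma`, all lines are equally likely a priori, and the weights for each point are normalized to sum to 1 across lines. The function names are hypothetical.

```python
import math


def soft_weights(points, lines, sigma):
    """Row j holds the weights tying point j to each line; each row sums to 1."""
    weights = []
    for x, y in points:
        sq = [(y - (a * x + b)) ** 2 for a, b in lines]
        best = min(sq)  # subtract the smallest value to limit underflow
        like = [math.exp(-(s - best) / (2 * sigma ** 2)) for s in sq]
        total = sum(like)
        weights.append([v / total for v in like])
    return weights


def weighted_fit(points, w):
    """Weighted least-squares fit of y = a*x + b."""
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, points)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, points)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, points))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, points))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, my - a * mx


def em_lines(points, lines, sigma=1.0, iters=20):
    """Alternate soft allocation and weighted refitting for a fixed number of steps."""
    for _ in range(iters):
        w = soft_weights(points, lines, sigma)
        lines = [weighted_fit(points, [row[k] for row in w])
                 for k in range(len(lines))]
    return lines, soft_weights(points, lines, sigma)
```

The sketch assumes every line keeps a non-zero total weight. A line that starts far from all points could end up with no support, which is one face of the local-extremum caution above.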
8 Image Segmentation. π_l: Probability of choosing segment l at random (a priori). p(x|θ_l): Conditional density of feature vector x, given that it comes from segment l, l = 1,…,g. Model: p(x|θ_l) is Gaussian, θ_l = (μ_l, Σ_l). The total density for the feature vector of any pixel drawn at random… [Figure: an image divided into Segment 1 (θ_1), Segment 2 (θ_2), Segment 3 (θ_3) and Segment 4 (θ_4).] This is known as a Mixture Model.
9 Mixture Model: Generative. To produce a pixel (feature vector): Pick an image segment l with prior probability π_l. Draw a sample from p(x|θ_l). Density in x space is a set of g Gaussian blobs, one per segment. We want to determine: The parameters of each blob (the μ and Σ values). The mixing weights (the π values). A mapping of pixels to components (the segmentation).
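The generative recipe is short enough to write out. The sketch below assumes a one-dimensional feature (for instance a pixel intensity) so that each blob is described by a mean and a standard deviation; the slides use full feature vectors with covariance matrices. The function name and arguments are hypothetical.

```python
import random


def sample_mixture(weights, means, stds, n, seed=0):
    """Draw n one-dimensional features from a Gaussian mixture.

    Returns the samples and, for each one, the component that produced it.
    In a real segmentation problem those component labels are the missing data.
    """
    rng = random.Random(seed)
    samples, labels = [], []
    for _ in range(n):
        # Pick a segment l with its prior probability (mixing weight).
        l = rng.choices(range(len(weights)), weights=weights)[0]
        # Draw the feature from that segment's Gaussian.
        samples.append(rng.gauss(means[l], stds[l]))
        labels.append(l)
    return samples, labels
```

Discarding `labels` and keeping only `samples` gives exactly the situation the remaining slides address: the features are observed, the component memberships are not.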
10 Package all these things into a parameter vector: mixing weights, blob parameters. The mixture model becomes: With each component a multivariate Gaussian:
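The equations on this slide did not survive in the transcript. A standard way to write them, consistent with the definitions on the preceding slides, is sketched below. The symbols are assumptions: $\alpha_l$ for the mixing weights (the prior probability of choosing segment $l$), $\theta_l = (\mu_l, \Sigma_l)$ for the blob parameters, $g$ for the number of components and $d$ for the dimension of the feature vector.

```latex
\Theta = (\alpha_1, \dots, \alpha_g,\; \theta_1, \dots, \theta_g),
\qquad \sum_{l=1}^{g} \alpha_l = 1

p(x \mid \Theta) = \sum_{l=1}^{g} \alpha_l \, p_l(x \mid \theta_l)

p_l(x \mid \theta_l) =
  \frac{1}{(2\pi)^{d/2} \, |\Sigma_l|^{1/2}}
  \exp\!\left\{ -\tfrac{1}{2} (x - \mu_l)^{T} \Sigma_l^{-1} (x - \mu_l) \right\}
```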
11 The Chicken and the Egg: If we knew which pixel belonged to which component, Θ would be straightforward: Use Max Likelihood estimates for each θ_l. Fraction of image in each component gives α_l. If we knew Θ, then: For each pixel, assign it to its most likely blob. Unfortunately, we know neither. That’s where Expectation-Maximization (EM) comes in; iterate guesses until convergence.
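Both halves of the chicken-and-egg problem are easy in isolation, which a small sketch makes concrete. It assumes one-dimensional features, so each blob has a mean and a variance rather than a mean vector and covariance matrix, and it assumes every component owns at least one pixel. The function names are hypothetical.

```python
import math


def gauss_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def ml_estimates(xs, labels, g):
    """Known labels -> parameters: (mixing weight, mean, variance) per component."""
    params = []
    for l in range(g):
        members = [x for x, lab in zip(xs, labels) if lab == l]
        alpha = len(members) / len(xs)  # fraction of the image in component l
        mu = sum(members) / len(members)
        var = sum((x - mu) ** 2 for x in members) / len(members)
        params.append((alpha, mu, var))
    return params


def most_likely_blob(x, params):
    """Known parameters -> label: the component with the largest alpha * density."""
    return max(range(len(params)),
               key=lambda l: params[l][0] * gauss_pdf(x, params[l][1], params[l][2]))
```

EM alternates between these two directions, but with soft rather than hard assignments.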
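12 Formal Statement of Missing Data Problems. X: Complete data space. For image segmentation: measurements at each pixel and the set of variables matching pixels to mixture components. For line fitting: measurements at each token and the mapping of tokens to lines. Y: Incomplete data space. For image segmentation: measurements at each pixel. For line fitting: measurements at each token. f: the map from X to Y.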
13 Missing, Formally. U: Parameter space. Mixing weights and parameters (mean, covariance) of each mixture component (or the parameters of each line). We want to obtain a maximum-likelihood estimate of these parameters given incomplete data. If we had complete data, then we could use the joint density function for the complete data space, p_c(x; u). Complete data log-likelihood:
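The log-likelihood itself is missing from the transcript. Assuming the complete data items $x_j$ are independent draws from $p_c$, it would take the usual form below; the subscript $c$ on $L$ is an assumed notation mirroring $p_c$.

```latex
L_c(x; u) = \log \prod_{j} p_c(x_j; u) = \sum_{j} \log p_c(x_j; u)
```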
14 OK. We maximize this to estimate each segment’s parameters (image segmentation) or the mixing weights and parameters of the lines, given the mapping of the tokens to lines (for the line fitting example). Problem: We don’t have complete data. The density for the incomplete space is the marginal density of the complete space, where we’ve integrated out the variables we don’t know (the missing data).
15 This is a pain in the neck… We don’t know which of the many possible x values that could correspond to the y values we observe is correct. We’ve taken a projection (of some sort), and we cannot uniquely reconstruct the full joint density. So we have to average over all those possibilities to make our best guess. But all is not lost… We have the following strategy: 1. Obtain some estimate of the missing data using a guess at the parameters. 2. Form a maximum likelihood estimate of the free parameters using the estimate of the missing data. 3. Iterate to (hopefully) convergence.
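Written out, the marginalization might look as follows. This is a sketch in assumed notation: $p_i$ denotes the incomplete data density, $f$ the map from complete to incomplete data, and $\eta$ a suitable measure on the set of complete data points that project to $y$. For a mixture the missing variable is discrete, so the integral becomes a sum over components, shown here with assumed mixing weights $\alpha_l$ and component parameters $\theta_l$.

```latex
p_i(y; u) = \int_{\{x \,:\, f(x) = y\}} p_c(x; u) \, d\eta

p_i(y_j; u) = \sum_{l=1}^{g} \alpha_l \, p(y_j \mid \theta_l)
\quad \text{(mixture case)}
```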
16 Strategy by Example. Image segmentation: Obtain an estimate of the component from which each pixel comes using an estimate of the θ_l. Update the θ_l and the mixing weights using this estimate. Tokens and lines: Obtain an estimate of the correspondence between tokens and lines, using a guess at the line parameters. Revise the estimate of the line parameters using the estimated correspondences.
17 Expectation-Maximization For Mixture Models: Assume the complete log-likelihood is linear in the missing variables. (Common.) Mixture model: Missing data indicate the mixture component from which a data item is drawn. Represent this by associating with each data point a bit vector z of g elements (one per component in the mix).
18 About the z Vectors (matrix): The z vectors stack into an n by g matrix. Columns (index l, g of them): mixture components, one Gaussian per column. Rows (index j, n of them): data points, one per row. That is, one row per observation, each row a z vector. Entry (j, l) is 1 if pixel (token) j was produced by Gaussian mixture component l. Expectation: Probability of that event.
19 So our complete information can be written as: Write the mixture model as (line example): Complete data log-likelihood is: This is linear in the missing variables. Good news! How did we ensure that that would happen? We will think of the entries in z as probabilities, expectations.
20 EM: The Key Idea: Obtain working values for the missing data, and so for x, by substituting the expectation for each missing value. That is, fix the parameters, then compute each expectation E[z_jl], given y_j and the parameter values. Plug E[z_jl] into the complete data log-likelihood and find the parameters that maximize it. E[z_jl] has probably changed, so repeat.
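The three equations on this slide are absent from the transcript. One consistent reconstruction, under assumed notation ($y_j$ for the observed measurement, $z_j$ for its indicator vector, $\alpha_l$ for the mixing weights and $\theta_l$ for the parameters of component or line $l$), is:

```latex
x_j = (y_j, z_j)

p(y) = \sum_{l=1}^{g} \alpha_l \, p(y \mid \theta_l)

L_c(x; u) = \sum_{j=1}^{n} \sum_{l=1}^{g}
  z_{jl} \, \log\!\big[ \alpha_l \, p(y_j \mid \theta_l) \big]
```

This form also suggests an answer to the slide's question. Because each $z_j$ has exactly one entry equal to 1, the complete data density of item $j$ can be written as $\prod_l [\alpha_l \, p(y_j \mid \theta_l)]^{z_{jl}}$, and taking the log turns the exponents into coefficients, so the $z_{jl}$ appear linearly.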
21 More Formally: Given u^s we form u^(s+1) by: 1. E-Step: Compute the expected value for the complete data using the incomplete data and the current parameter estimates. We know the expected value of y_j (the means of the current Gaussian guesses) and only need the expected value of z_j for each j. Denote these values with a superscript s, which indicates that the expectation depends on the current parameter values at step s. 2. M-Step: Maximize the complete data log-likelihood with respect to u using the expectation from the E-step.
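For a mixture model the two steps can be written compactly. The notation below is assumed rather than taken from the slide, whose symbol for the expected value was lost: $\bar{z}_{jl}^{(s)}$ stands for the expectation of $z_{jl}$ at step $s$, $\alpha_l$ for the mixing weights and $\theta_l$ for the component parameters.

```latex
\text{E-step:}\quad
\bar{z}_{jl}^{(s)} = E\big[ z_{jl} \mid y_j, u^{(s)} \big]
= \frac{\alpha_l^{(s)} \, p\big(y_j \mid \theta_l^{(s)}\big)}
       {\sum_{k=1}^{g} \alpha_k^{(s)} \, p\big(y_j \mid \theta_k^{(s)}\big)}

\text{M-step:}\quad
u^{(s+1)} = \arg\max_{u} \; L_c\big( [\, y, \bar{z}^{(s)} \,]; u \big)
```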
22 Image Segmentation In Practice (Warning: Your text is a typo minefield). Set up an n by g array of indicators I (each row like a z vector). E-Step: The (j, l) element of I is 1 if pixel j comes from blob l. E(I_jl) = Prob(pixel j comes from Gaussian blob l). Note: This is no longer a binary value! [Figure: two Gaussian blobs evaluated at a point x, with density values a and b; the expectation for the blob with value b is approximately b/(a+b).]
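A minimal E-step sketch follows, assuming one-dimensional features so that each blob has a mean and a variance. With two blobs whose weighted densities at x are a and b, the second entry of the row for x is b/(a+b). The names `gauss_pdf` and `e_step` are hypothetical.

```python
import math


def gauss_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs, alphas, mus, variances):
    """Return the n-by-g array of expected indicators E(I_jl).

    Entry [j][l] is the probability that pixel j came from blob l under the
    current parameters; each row sums to 1.
    """
    expected = []
    for x in xs:
        joint = [a * gauss_pdf(x, m, v)
                 for a, m, v in zip(alphas, mus, variances)]
        total = sum(joint)  # assumed non-zero; can underflow far from every blob
        expected.append([p / total for p in joint])
    return expected
```

A pixel midway between two identical, equally weighted blobs gets a row of [0.5, 0.5], while a pixel sitting on one blob's mean gets a row very close to a hard 0/1 indicator.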
23 Practice… M-Step: Now form a maximum-likelihood estimate of Θ^(s+1). Mixing weights: average value in each column. Means: weighted average feature vector for each column. Covariances: weighted average covariance matrix for each column.
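The three column-wise averages can be sketched as below, again assuming one-dimensional features so that the covariance matrix reduces to a variance. The argument `expected` is the n-by-g array of expected indicators from the E-step; the function name is hypothetical.

```python
def m_step(xs, expected):
    """Re-estimate mixing weights, means and variances from expected indicators."""
    n, g = len(xs), len(expected[0])
    alphas, mus, variances = [], [], []
    for l in range(g):
        col = [expected[j][l] for j in range(n)]
        total = sum(col)  # assumed non-zero for every component
        # Mixing weight: average value in column l.
        alphas.append(total / n)
        # Mean: average feature, weighted by column l.
        mu = sum(w * x for w, x in zip(col, xs)) / total
        mus.append(mu)
        # Variance: average squared deviation, weighted by column l.
        variances.append(sum(w * (x - mu) ** 2 for w, x in zip(col, xs)) / total)
    return alphas, mus, variances
```

When every row of `expected` is a hard 0/1 indicator, these formulas reduce to the ordinary per-segment maximum likelihood estimates.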
24 When it Converges... Can make a maximum a posteriori (MAP) decision by assigning each pixel to the Gaussian for which it has the highest E(I_jl). Can also keep the probabilities and work with them in, for instance, a probabilistic relaxation framework. (Coming attractions.)
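The MAP decision is a row-wise argmax over the array of expected indicators. A minimal sketch, with a hypothetical function name and ties broken in favour of the lower-numbered component:

```python
def map_segmentation(expected):
    """Assign each pixel to the component with the highest expected indicator."""
    return [max(range(len(row)), key=row.__getitem__) for row in expected]
```

For example, `map_segmentation([[0.9, 0.1], [0.2, 0.8]])` labels the first pixel with component 0 and the second with component 1.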