1
Model fitting ECE 847: Digital Image Processing Stan Birchfield
Clemson University
2
Fitting
Choose a parametric object (or several objects) to represent a set of tokens. The most interesting case is when the criterion is not local: you cannot tell whether a set of points lies on a line by looking only at each point and the next. Three main questions:
- What object represents this set of tokens best?
- Which of several objects gets which token?
- How many objects are there?
(You could read "line" for "object" here, or circle, or ellipse, or ...) D. Forsyth,
3
Line fitting
Least squares: Represent the line as y = mx + b. Stack the equations yi = m xi + b for all i = 1, ..., N points and rewrite in matrix notation as Az = d, where z = (m, b) are the unknown parameters and A, d collect the known (xi, yi) coordinates. Solve Az = d using least squares. The result is the same as z = (A'A)^(-1) A'd, but in practice use Gaussian elimination (much faster than computing the inverse). The result minimizes ||Az - d||, which is the vertical distance to the line.
Alternative: Represent the line as ax + by + c = 0. Stack the equations and write in matrix notation as Au = 0, where u = (a, b, c) are the unknown parameters. This is a homogeneous equation, so solve it using the SVD: the right singular vector associated with the smallest singular value gives the u that minimizes ||Au|| subject to ||u|| = 1. The result minimizes the perpendicular distance to the line.
Either way, it is best to first shift the origin to the centroid of the points (normalization).
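As a hedged illustration of both procedures (not from the original slides), here is a minimal MATLAB sketch; x and y are assumed to be column vectors of point coordinates:

% Least squares on y = m*x + b (minimizes vertical distance)
A = [x, ones(size(x))];
z = A \ y;                          % backslash solves the least-squares problem without forming inv(A'*A)
m = z(1);  b0 = z(2);

% Homogeneous fit of a*x + b*y + c = 0 via SVD (minimizes perpendicular distance)
xm = mean(x);  ym = mean(y);        % shift origin to the centroid of the points (normalization)
B = [x - xm, y - ym, ones(size(x))];
[~, ~, V] = svd(B, 0);
u = V(:, end);                      % right singular vector of the smallest singular value, ||u|| = 1
a = u(1);  b = u(2);
c = u(3) - a*xm - b*ym;             % express the line in the original (unshifted) coordinates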
4
Other curves
Many 2D curves can be represented by equations that are linear in the coefficients of the curve: lines ax + by + c = 0, and conics x'Ax = 0 (with x in homogeneous coordinates), which include parabolas, hyperbolas, and ellipses. The same homogeneous least-squares procedure can be used for any of these.
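For instance, a general conic a x^2 + b xy + c y^2 + d x + e y + f = 0 can be fitted with the same recipe; a minimal sketch (illustrative, assuming column vectors x, y of point coordinates):

D = [x.^2, x.*y, y.^2, x, y, ones(size(x))];   % one row per point, linear in the conic coefficients
[~, ~, V] = svd(D, 0);
k = V(:, end);                                 % k = [a b c d e f]' minimizing ||D*k|| with ||k|| = 1

As with lines, shifting and scaling the coordinates before building D improves the conditioning of the fit.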
5
Fitting and the Hough Transform
Purports to answer all three questions. We explain it for lines.
One representation: a line is the set of points (x, y) such that x cos θ + y sin θ = r. Different choices of θ, r > 0 give different lines. For any token (x, y) there is a one-parameter family of lines through this point, given by x cos θ + y sin θ = r. Each point gets to vote for each line in the family; if there is a line that has lots of votes, that should be the line passing through the points. D. Forsyth,
6
Tokens and their votes (Figure 15.1, top half). The vote array spans θ from 0 to 3.14 rad and r from 0 to 1.55. The brightest point (20 votes) is at θ = 45° = π/4 rad, r = √2/2 ≈ 0.707. Note that most points in the vote array are very dark, because they get only one vote. D. Forsyth,
7
Mechanics of the Hough transform
Construct an array representing θ, r. For each point, render the curve (θ, r) into this array, adding one at each cell.
Difficulties:
- How big should the cells be? (Too big, and we cannot distinguish between quite different lines; too small, and noise causes lines to be missed.)
- How many lines? Count the peaks in the Hough array.
- Who belongs to which line? Tag the votes.
Hardly ever satisfactory in practice, because problems with noise and cell size defeat it. D. Forsyth,
8
Hough algorithm
initialize accumulator A(r, θ) to zero for all r, θ
for each (x, y)
    if |I(x, y)| > threshold
        for each 0 ≤ θ < 2π
            compute r = x cos θ + y sin θ
            A(r, θ) = A(r, θ) + 1
find peaks in A
Note: be sure to translate the origin to the center of the image for best results.
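A minimal MATLAB sketch of this voting loop, assuming E is a binary edge image and using illustrative bin counts; θ is restricted to [0, π) here, which (as a later slide notes) avoids the duplicate peak at (θ + π, −r):

[h, w] = size(E);
cx = w/2;  cy = h/2;                         % translate origin to the image center
nTheta = 180;  nR = 200;
rMax = hypot(w, h) / 2;                      % largest possible |r| after centering
thetas = (0:nTheta-1) * pi / nTheta;
A = zeros(nR, nTheta);                       % accumulator A(r, theta)
[ys, xs] = find(E);                          % edge tokens
for i = 1:numel(xs)
    x = xs(i) - cx;  y = ys(i) - cy;
    for t = 1:nTheta
        r = x*cos(thetas(t)) + y*sin(thetas(t));
        ri = 1 + round((r + rMax) / (2*rMax) * (nR - 1));   % quantize r into a bin
        A(ri, t) = A(ri, t) + 1;
    end
end
% peaks of A correspond to detected lines (r, theta)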
9
Hough transform
Notice that (θ, r) and (θ + π, −r) are the same line.
Image space vs. line space: note the symmetry (flip vertically, then slide by π). Because (θ, r) and (θ + π, −r) are the same line, we get two peaks. Solution: let 0 ≤ θ < π.
10
Tokens with more noise (Figure 15.1, lower half): the brightest point in the (θ, r) vote array now gets only 6 votes. D. Forsyth,
11
Figure 15.2; the main point is that lots of noise can lead to large spurious peaks in the vote array.
D. Forsyth,
12
Noise Lowers the Peaks
This is the number of votes that the real line of 20 points gets with increasing noise (Figure 15.3). D. Forsyth,
13
Noise Increases the Votes in Spurious Accumulator Elements
Figure 15.4; as the noise increases in a picture without a line, the number of points in the max cell goes up, too D. Forsyth,
14
Optimizations to the Hough Transform
Noise: If the orientation of tokens (pixels) is known, only accumulator elements for lines with that general orientation are voted on. (Most edge detectors give orientation information.) Speed: The accumulator array can be coarse, then repeated in areas of interest at a finer scale. F. Dellaert,
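A hedged sketch of the orientation idea, reusing the accumulator variables from the Hough voting sketch earlier; gx and gy are assumed gradient images from the edge detector, and (x0, y0) a token location (all names illustrative):

theta0 = mod(atan2(gy(y0, x0), gx(y0, x0)), pi);   % gradient direction = normal direction of the line
win = 10 * pi/180;                                 % vote only within +/- 10 degrees of it
for t = 1:nTheta
    th = thetas(t);
    d = abs(mod(th - theta0 + pi/2, pi) - pi/2);   % angular distance modulo pi
    if d <= win
        r = (x0 - cx)*cos(th) + (y0 - cy)*sin(th);
        ri = 1 + round((r + rMax) / (2*rMax) * (nR - 1));
        A(ri, t) = A(ri, t) + 1;
    end
end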
15
Real World Example (figure panels: Original, Edge Detection, Found Lines, Parameter Space)
F. Dellaert,
16
Circle Example The Hough transform can be used to fit points to any object that can be parameterized. Most common are circles, ellipses. With no orientation, each token (point) votes for all possible circles. With orientation, each token can vote for a smaller number of circles. F. Dellaert,
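A minimal sketch of a circle Hough transform for one known radius R, without orientation information (so each token votes for a full ring of candidate centers); E is a binary edge image and the names are illustrative:

R = 30;                                    % known circle radius, in pixels
[h, w] = size(E);
A = zeros(h, w);                           % accumulator over candidate centers
[ys, xs] = find(E);
angles = linspace(0, 2*pi, 72);
for i = 1:numel(xs)
    for a = angles                         % every center at distance R from the token gets a vote
        cx = round(xs(i) - R*cos(a));
        cy = round(ys(i) - R*sin(a));
        if cx >= 1 && cx <= w && cy >= 1 && cy <= h
            A(cy, cx) = A(cy, cx) + 1;
        end
    end
end
% With gradient orientation, each token would instead vote only at the two
% points R pixels away along its gradient direction, giving far fewer votes per token.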
17
Real World Circle Examples
Crosshair indicates results of Hough transform, bounding box found via motion differencing. F. Dellaert,
18
Finding Coins (figure panels: Original; Edges, note the noise)
F. Dellaert,
19
Finding Coins (Continued)
(Figure panels: Penny, Quarters.) F. Dellaert,
20
Finding Coins (Continued)
Note that because the quarters and penny are different sizes, a different Hough transform (with separate accumulators) was used for each circle size. Coin finding sample images from: Vivik Kwatra F. Dellaert,
21
Fitting other objects
The Hough transform is closely related to template matching, and it can be used to fit points to any object that can be parameterized. Objects of arbitrary shape can be parameterized by building an R-table (this assumes orientation information is available for each token): the r and β value(s) are obtained from the R-table, indexed by ω (the orientation). More detail in the book. (Speaker note: I typically draw on the whiteboard and wave my hands for this particular topic, rather than work from slides.) F. Dellaert,
22
Generalized Hough transform
Each edge point, with its gradient orientation, indicates the location of the reference point. (Figure labels: reference point, gradient orientation, tangent vector, radius.) Ballard and Brown, Computer Vision, 1982, p. 129
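A rough MATLAB sketch of the R-table idea, under simplifying assumptions (fixed scale and rotation; bx, by, phi are template boundary points and their gradient orientations, (rx, ry) the chosen reference point, and ex, ey, ephi the edge points and orientations of an h-by-w image; all names illustrative):

% Build the R-table: index by quantized gradient orientation, store displacements to the reference point.
nBins = 36;
Rtable = cell(nBins, 1);
for i = 1:numel(bx)
    k = 1 + mod(round(phi(i) / (2*pi) * nBins), nBins);
    Rtable{k}(end+1, :) = [rx - bx(i), ry - by(i)];
end

% Detection: each edge point votes for the reference-point locations its orientation allows.
A = zeros(h, w);
for i = 1:numel(ex)
    k = 1 + mod(round(ephi(i) / (2*pi) * nBins), nBins);
    for j = 1:size(Rtable{k}, 1)
        vx = round(ex(i) + Rtable{k}(j, 1));
        vy = round(ey(i) + Rtable{k}(j, 2));
        if vx >= 1 && vx <= w && vy >= 1 && vy <= h
            A(vy, vx) = A(vy, vx) + 1;
        end
    end
end
% Peaks of A mark likely locations of the shape's reference point.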
23
Generalized Hough transform
Ballard and Brown, Computer Vision, 1982, p. 129
24
Conclusion Finding lines and other parameterized objects is an important task for computer vision. The (generalized) Hough transform can detect arbitrary shapes from (edge detected) tokens. Success rate depends directly upon the noise in the edge image. Downsides: Can be slow, especially for objects in arbitrary scales and orientations (extra parameters increase accumulator space exponentially). F. Dellaert,
25
Tensor voting Another voting technique
26
Line fitting
Line fitting can be maximum likelihood, but the choice of model is important. (This is Figure 15.5; I didn't have the energy to write out the algebra around that figure on a slide, but the point is well worth making.) D. Forsyth,
27
Who came from which line?
Assume we know how many lines there are, but which lines are they? Easy, if we know who came from which line.
Three strategies:
- Incremental line fitting
- K-means (MacQueen 1967)
- Probabilistic (later!)
D. Forsyth,
28
Incremental line fitting is a really useful idea; it's the easiest way to fit a set of lines to a curve, and it is often quite useful in practice. D. Forsyth,
29
Douglas-Peucker algorithm
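The slide itself is a figure; as a hedged illustration, here is a minimal recursive MATLAB sketch of Douglas-Peucker polyline simplification (P is an N-by-2 polyline, tol the distance tolerance; names are illustrative):

function Q = dp_simplify(P, tol)
% Keep the interior point farthest from the chord if it is farther than tol; recurse on both halves.
a = P(1, :);  b = P(end, :);
v = b - a;  L = norm(v);
if L == 0
    d = sqrt(sum((P - a).^2, 2));                             % degenerate chord: distance to the endpoint
else
    d = abs((P(:,1) - a(1))*v(2) - (P(:,2) - a(2))*v(1)) / L; % perpendicular distance to the chord a-b
end
[dmax, k] = max(d);
if dmax > tol
    Q1 = dp_simplify(P(1:k, :), tol);
    Q2 = dp_simplify(P(k:end, :), tol);
    Q = [Q1(1:end-1, :); Q2];                                 % drop the duplicated split point
else
    Q = [a; b];                                               % the chord is a good enough approximation
end
end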
30
You should point out that a little additional cleverness will be required to avoid having lines that pass through one point, etc. D. Forsyth,
31
Robustness
As we have seen, squared error can be a source of bias in the presence of noise points.
- One fix is EM (we'll do this shortly).
- Another is an M-estimator: square nearby, threshold far away.
- A third is RANSAC: search for good points.
D. Forsyth,
32
Line fit to set of points
Least squares fit to the red points D. Forsyth,
33
... with one outlier
Also a least squares fit. The problem is the single point on the right; the error for that point is so large that it drags the line away from the other points (the total error is lower if those points are some way from the line, because then the point on the right is not so dramatically far away). D. Forsyth,
34
... with a different outlier
Same problem as above D. Forsyth,
35
Zoom of previous – clearly a bad fit
Detail of the previous slide - the line is actually quite far from the points. (All this is fig 15.7) D. Forsyth,
36
M-estimators: ρ(x; s) = x² / (s² + x²)
The issue here is that one wants to replace (distance)² with something that looks like distance² for small distances, and is about constant for large distances. We use d² / (d² + s²) for some parameter s; the curve is plotted for different values of s. D. Forsyth,
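As a hedged illustration of how such an M-estimator is typically minimized (iteratively reweighted least squares; this routine is not from the slides), assuming x, y are column vectors of points and s the scale parameter:

z = [x, ones(size(x))] \ y;                        % start from the ordinary least-squares line y = m*x + b
for it = 1:20
    r = y - (z(1)*x + z(2));                       % residuals under the current line
    wgt = s^2 ./ (s^2 + r.^2).^2;                  % IRLS weight for rho(r; s) = r^2 / (s^2 + r^2)
    A = [x, ones(size(x))];
    z = (A' * (wgt .* A)) \ (A' * (wgt .* y));     % weighted least-squares update
end
m = z(1);  b = z(2);

Distant points get weights near zero, so they influence the fit only weakly, which is the "square nearby, threshold far away" behaviour described earlier.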
37
s is just right: noise is ignored
Fit to earlier data with an appropriate choice of s D. Forsyth,
38
Too small: all data is ignored
Here the parameter is too small, so the error value is about constant for every point, and the fit pays almost no attention to any of the data. D. Forsyth,
39
Too large: noise influences outcome
Here the parameter is too large, meaning that the error for every point looks like distance², so the fit behaves like ordinary least squares and the noise points influence the outcome. D. Forsyth,
40
RANSAC
- Choose a small subset uniformly at random.
- Fit to that.
- Anything that is close to the result is signal; all others are noise.
- Refit.
- Do this many times and choose the best.
Issues:
- How many times? Often enough that we are likely to have a good line.
- How big a subset? Smallest possible.
- What does close mean? Depends on the problem.
- What is a good line? One where the number of nearby points is so big it is unlikely to be all outliers.
This algorithm contains concepts that are hugely useful; in my experience, everyone with some experience of applied problems who doesn't yet know RANSAC is almost immediately able to use it to simplify some problem they've seen. D. Forsyth,
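A minimal MATLAB sketch of RANSAC line fitting under illustrative parameter choices (x, y column vectors of points, tol the inlier distance, nIter the number of random trials):

N = numel(x);
bestCount = 0;  bestInliers = false(N, 1);
for it = 1:nIter
    idx = randperm(N, 2);                              % smallest possible subset: two points
    p1 = [x(idx(1)); y(idx(1))];  p2 = [x(idx(2)); y(idx(2))];
    n = [p2(2) - p1(2); p1(1) - p2(1)];                % normal of the line through p1, p2
    if norm(n) == 0, continue; end                     % identical points: skip this trial
    n = n / norm(n);
    c = -n' * p1;                                      % line: n(1)*x + n(2)*y + c = 0
    d = abs(n(1)*x + n(2)*y + c);                      % perpendicular distance of every point
    inliers = d < tol;                                 % points close to the hypothesis are signal
    if sum(inliers) > bestCount
        bestCount = sum(inliers);  bestInliers = inliers;
    end
end
% Refit to bestInliers (e.g., with the least-squares or SVD fit from the earlier slides).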
41
D. Forsyth, http://luthuli.cs.uiuc.edu/~daf/book/bookpages/slides.html
42
Fitting curves other than lines
In principle, an easy generalization: the probability of obtaining a point, given a curve, is given by a negative exponential of distance squared. In practice, rather hard: it is generally difficult to compute the distance between a point and a curve. More detail in the book. (I typically draw on the whiteboard and wave my hands for this particular topic, rather than work from slides.) D. Forsyth,
43
K-means Recall line fitting
Now suppose that you have more than one line, but you do not know which points belong to which line.
44
K-Means Choose a fixed number of clusters (K)
Choose cluster centers and point-cluster allocations to minimize error. We can't do this by search, because there are too many possible allocations.
Algorithm: alternate two steps:
- Fix cluster centers; allocate points to the closest cluster.
- Fix the allocation; compute the best cluster centers.
x could be any set of features for which we can compute a distance (careful about scaling). * From Marc Pollefeys COMP
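A minimal MATLAB sketch of the two alternating steps (X is an N-by-D matrix of feature vectors, K the number of clusters; initialization and iteration count are illustrative):

N = size(X, 1);
C = X(randperm(N, K), :);                              % initialize centers from K random data points
for it = 1:50
    D2 = sum(X.^2, 2) + sum(C.^2, 2)' - 2*X*C';        % squared distances, N-by-K
    [~, lbl] = min(D2, [], 2);                         % step 1: allocate points to the closest center
    for k = 1:K                                        % step 2: recompute each center as a mean
        if any(lbl == k)
            C(k, :) = mean(X(lbl == k, :), 1);
        end
    end
end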
45
K-Means * From Marc Pollefeys COMP
46
K-Means: alternate between assigning data points to clusters and computing the mean of the data points in each cluster.
S. Thrun,
47
Image Segmentation by K-Means
1. Select a value of K.
2. Select a feature vector for every pixel (color, texture, position, or a combination of these, etc.).
3. Define a similarity measure between feature vectors (usually Euclidean distance).
4. Apply the K-means algorithm.
5. Apply a connected-components algorithm (to enforce spatial continuity).
6. Merge any component smaller than some threshold into the adjacent component that is most similar to it.
* From Marc Pollefeys COMP
48
Example (figure panels: Image, Clusters on intensity, Clusters on color)
S. Thrun,
49
Idea Data generated from mixture of Gaussians
Latent (hidden) variables: Correspondence between Data Items and Gaussians S. Thrun,
50
Expectation-Maximization (Generalized K-Means)
Notice:
- Given the mixture model, it's easy to calculate the correspondence.
- Given the correspondence, it's easy to estimate the mixture model.
K-means involves a model (hypothesis space) that is a mixture of N Gaussians, with latent variables giving the correspondence of data and Gaussians. Replacing the hard assignments with soft assignments gives EM. EM is guaranteed to converge (EM steps do not decrease the likelihood).
51
Learning a Gaussian Mixture (with known covariance)
E-step and M-step update equations (shown as figures on the original slide). Under the standard formulation with known, shared covariance σ²I, the E-step computes responsibilities w_ij ∝ π_j exp(−||x_i − μ_j||² / (2σ²)), normalized over j, and the M-step sets μ_j = Σ_i w_ij x_i / Σ_i w_ij and π_j = (1/N) Σ_i w_ij. S. Thrun,
52
Generalized K-Means (EM)
Alternate: compute the weights for assigning data points to clusters (E-step); compute the weighted mean of the data points (M-step). S. Thrun,
53
EM Clustering: Results
54
Application: Clustering Flow
(Figure panels: ML correspondence, Clustered Flow.)
55
Missing variable problems
In many vision problems, if some variables were known, the maximum likelihood inference problem would be easy:
- Fitting: if we knew which line each token came from, it would be easy to determine the line parameters.
- Segmentation: if we knew the segment each pixel came from, it would be easy to determine the segment parameters.
- Fundamental matrix estimation: if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix.
- etc.
This sort of thing happens in statistics, too. D. Forsyth,
56
Missing variable problems
Strategy:
- Estimate appropriate values for the missing variables.
- Plug these in; now estimate the parameters.
- Re-estimate appropriate values for the missing variables; continue.
E.g., guess which line gets which point; now fit the lines; now reallocate points to lines, using our knowledge of the lines; now refit, etc. We've seen this line of thought before (k-means). D. Forsyth,
57
Missing variables - strategy
We have a problem with parameters and missing variables. This suggests iterating until convergence:
- Replace the missing variables with their expected values, given fixed values of the parameters.
- Fix the missing variables; choose parameters to maximise the likelihood given those fixed values.
E.g., iterate until convergence: allocate each point to a line with a weight, which is the probability of the point given the line; refit the lines to the weighted set of points. Converges to a local extremum. A somewhat more general form is available. D. Forsyth,
58
EM
E-step: calculate errors, calculate probabilities.
M-step: re-calculate the RGB clusters.
Demo with fixed sigma = 20: segment01.m. F. Dellaert,
59
MATLAB code essentials
% init models (loops over clusters g and color channels c are added here to make the slide's fragments runnable)
sigma = 20;
for g = 1:nrClusters
  m(:,g) = 128 + 50*randn(3,1);        % random RGB cluster mean
  pi(g) = 1/nrClusters;                % equal mixing weights
end

% E-step
for g = 1:nrClusters
  % calculate errors
  E{g} = zeros(h,w);
  for c = 1:3
    e = (I(:,:,c) - m(c,g)) / sigma;
    E{g} = E{g} + e.*e;
  end
  % unnormalized probabilities
  q{g} = pi(g) * exp(-0.5*E{g});
end
% normalize probabilities
sumq = zeros(h,w);
for g = 1:nrClusters, sumq = sumq + q{g}; end
for g = 1:nrClusters, p{g} = q{g} ./ sumq; end

% M-step
for g = 1:nrClusters
  P = p{g};
  R = P.*I(:,:,1);  G = P.*I(:,:,2);  B = P.*I(:,:,3);
  pi(g) = sum(P(:));
  m(:,g) = [sum(R(:)); sum(G(:)); sum(B(:))] / pi(g);
end
pi = pi/sum(pi)
F. Dellaert,
60
Expectation-Maximization
See Dellaert's TR online. We have parameters and data U; also hidden "nuisance" variables J. We want to find the optimal parameters (the objective is shown as a figure on the slide). F. Dellaert,
61
Example: Mixture of 2 Gaussian Components
F. Dellaert,
62
Posterior in Parameter Space
2D ! F. Dellaert,
63
EM Successive lower bounds
F. Dellaert,
64
Chapter 3, p. 33 from Duda, Hart, Stork, Pattern Classification, 2nd ed., 2000
65
Line Fitting
Parameters θ = (φ, c), where c is the distance of the line to the origin.
F. Dellaert,
66
Lines and robustness We have one line, and n points
Some come from the line, some from "noise". This is a mixture model: we wish to determine the line parameters, P(comes from line), and P(comes from noise) = 1 − P(comes from line). D. Forsyth,
67
Estimating the mixture model
Introduce a set of hidden variables, δ, one for each point: δ_i is one when the point comes from the line and zero when it comes from noise. If these are known, the negative log-likelihood becomes (the line's parameters are φ, c) a sum over points of δ_i times a squared-distance term plus (1 − δ_i) times a noise term, roughly Σ_i [ δ_i (x_i cos φ + y_i sin φ + c)² / (2σ²) + (1 − δ_i) k_n ] + K. Here K is a normalising constant, and k_n is the noise intensity (we'll choose this later). D. Forsyth,
68
Substituting for delta
We shall substitute the expected value of δ_i, for a given θ (recall θ = (φ, c, λ)): E(δ_i) = 1 · P(δ_i = 1 | θ) + 0 · P(δ_i = 0 | θ) = P(δ_i = 1 | θ). Notice that if k_n is small and positive, then if the distance is small this value is close to 1, and if it is large, it is close to zero. D. Forsyth,
69
Algorithm for line fitting
Obtain some start point. Now compute the δ's using the formula above. Now compute the maximum likelihood estimate of θ: φ and c come from fitting to the weighted points, and λ comes by counting. Iterate to convergence. D. Forsyth,
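A hedged MATLAB sketch of this loop under simplifying assumptions (x, y column vectors of points, sigma the line noise, kn the constant noise term; the weighted fit is done in slope-intercept form for brevity, so it minimizes vertical rather than perpendicular distance):

lambda = 0.5;                                      % initial P(point comes from the line)
z = [x, ones(size(x))] \ y;                        % start point: ordinary least-squares line
for it = 1:30
    % E-step: expected delta_i = posterior probability that point i comes from the line
    r2 = (y - (z(1)*x + z(2))).^2;
    pline  = lambda * exp(-r2 / (2*sigma^2));
    pnoise = (1 - lambda) * exp(-kn);              % constant likelihood for the noise component
    w = pline ./ (pline + pnoise);                 % the soft deltas
    % M-step: line parameters from fitting to the weighted points; lambda by "counting"
    A = [x, ones(size(x))];
    z = (A' * (w .* A)) \ (A' * (w .* y));
    lambda = mean(w);
end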
70
D. Forsyth, http://luthuli.cs.uiuc.edu/~daf/book/bookpages/slides.html
71
The expected values of the deltas at the maximum
(notice the one value close to zero). D. Forsyth,
72
Closeup of the fit D. Forsyth,
73
Choosing parameters
What about the noise parameter, and the sigma for the line? Several methods:
- From first-principles knowledge of the problem (seldom really possible).
- Play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much).
Notice that if k_n is large, this says that points very seldom come from noise, however far from the line they lie; this usually biases the fit by pushing outliers into the line. Rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem. D. Forsyth,
74
Other examples
Segmentation:
- A segment is a Gaussian that emits feature vectors (which could contain colour; or colour and position; or colour, texture and position).
- Segment parameters are the mean and (perhaps) the covariance.
- If we knew which segment each point belonged to, estimating these parameters would be easy; the rest is on the same lines as fitting a line.
Fitting multiple lines:
- Rather like fitting one line, except there are more hidden variables.
- Easiest is to encode them as an array of hidden variables, representing a table with a one where the i'th point comes from the j'th line and zeros otherwise; the rest is on the same lines as above.
D. Forsyth,
75
Issues with EM
Local maxima:
- Can be a serious nuisance in some problems.
- No guarantee that we have reached the "right" maximum.
Starting:
- Using k-means to cluster the points is often a good idea.
D. Forsyth,
76
Local maximum: things can go wrong with EM, too.
D. Forsyth,
77
which is an excellent fit to some points
D. Forsyth,
78
and the deltas for this maximum
D. Forsyth,
79
A dataset that is well fitted by four lines
D. Forsyth,
80
Result of EM fitting, with one line (or at least,
one available local maximum). D. Forsyth,
81
Result of EM fitting, with two lines (or at least,
one available local maximum). D. Forsyth,
82
Seven lines can produce a rather logical answer
D. Forsyth,
83
Segmentation with EM (the story is in the figure caption). Figure from "Color and Texture Based Image Segmentation Using EM and Its Application to Content Based Image Retrieval", S.J. Belongie et al., Proc. Int. Conf. Computer Vision, 1998, © 1998 IEEE. D. Forsyth,
84
Motion segmentation with EM
Model the image pair (or video sequence) as consisting of regions of parametric motion; affine motion is popular. Now we need to determine which pixels belong to which region, and estimate the motion parameters. The likelihood is an assumed model (given as a formula on the slide). This is a straightforward missing-variable problem; the rest is calculation. D. Forsyth,
85
Three frames from the MPEG “flower garden” sequence
Story in Figure 16.5. Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE. D. Forsyth,
86
Grey level shows region no. with highest probability
Figure 16.6: segments and the motion fields associated with them. Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE. D. Forsyth,
87
If we use multiple frames to estimate the appearance
of a segment, we can fill in occlusions; so we can re-render the sequence with some segments removed. Story in Figure 16.7. Figure from "Representing Images with Layers", by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994, © 1994 IEEE. D. Forsyth,
88
Some generalities
- Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC; you need to be able to get a parameter estimate with a manageably small number of random choices. RANSAC is usually better.
- We didn't present EM in its most general form: in the general form, the likelihood may not be a linear function of the missing variables; in that case, one takes an expectation of the likelihood, rather than substituting expected values of the missing variables. The issue doesn't seem to arise in vision applications.
D. Forsyth,
89
Model Selection
We wish to choose a model to fit to the data:
- e.g., is it a line or a circle?
- e.g., is this a perspective or an orthographic camera?
- e.g., is there an aeroplane there, or is it noise?
Issue: in general, models with more parameters will fit a dataset better, but are poorer at prediction. This means we can't simply look at the negative log-likelihood (or fitting error). D. Forsyth,
90
Top is not necessarily a better fit than bottom
(actually, almost always worse) A general principle here: it's usually a good idea to accept some fitting error, because your fitted object may then reflect the behaviour of data points that you haven't seen rather better. Equivalently, if you fit a "simple" object to a lot of points, your estimate of that object's parameters is accurate, even though it may not represent those points as well as a "complex" object does; in the case of the complex object, your estimate of the object's parameters may be very unreliable, so new data points are poorly represented. This is sometimes thought of as a tradeoff between bias (the model doesn't represent the data perfectly) and variance (the simple model's parameters can be estimated accurately, meaning that it will handle future data points well). D. Forsyth,
91
We expect a picture that looks like this; more flexible things fit better (but usually predict worse). D. Forsyth,
92
We can discount the fitting error with some term in the number
of parameters in the model. D. Forsyth,
93
Discounts
- AIC (an information criterion): choose the model with the smallest value of the criterion; in its standard form this is −2 log L + 2p, where p is the number of parameters.
- BIC (Bayes information criterion): choose the model with the smallest value of the criterion; in its standard form this is −2 log L + p log N, where N is the number of data points.
- Minimum description length: the same criterion as BIC, but derived in a completely different way.
D. Forsyth,
94
Cross-validation
- Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other.
- Average over multiple different splits.
- Choose the model with the smallest value of this average.
The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data. D. Forsyth,
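An illustrative MATLAB sketch of this procedure for choosing a polynomial degree (held-out squared error stands in for the negative log-likelihood, which it equals up to constants under a Gaussian noise model; names are illustrative):

degrees = 1:5;  nSplits = 20;
N = numel(x);
err = zeros(size(degrees));
for s = 1:nSplits
    idx = randperm(N);
    tr = idx(1:floor(N/2));  te = idx(floor(N/2)+1:end);   % split the data into two pieces
    for d = 1:numel(degrees)
        c = polyfit(x(tr), y(tr), degrees(d));              % fit to one piece
        r = y(te) - polyval(c, x(te));                      % score on the other
        err(d) = err(d) + mean(r.^2);
    end
end
err = err / nSplits;                                        % average over the splits
[~, best] = min(err);                                       % choose the model with the smallest average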
95
Model averaging
Very often, it is smarter to use multiple models for prediction than just one. E.g., motion capture data: there are a small number of schemes that are used to put markers on the body; given that we know the scheme S and the measurements D, we can estimate the configuration of the body X. We want the posterior that averages over schemes, P(X | D) = Σ_S P(X | S, D) P(S | D). If it is obvious what the scheme is from the data, then averaging makes little difference; if it isn't, then not averaging underestimates the variance of X (we think we have a more precise estimate than we do). D. Forsyth,
96
Mosaicking
Algorithm:
- Choose a central picture which overlaps with the highest percentage of pictures in the picture set.
- Find correspondence points (~30 points per pair of pictures).
- Because the final mosaic will be much larger than each picture, each picture must be placed onto a larger canvas in order to be shifted with the homography matrix (in addition, the correspondence points must be offset to correspond with the new origin of the canvas, not the original picture).
Matt Pepper
97
Mosaicking (cont) Algorithm (cont)
- Create homography mappings from each picture to the central picture, or to a picture that is touching the center.
- For the outer pictures with no overlap with the central picture, homographies must be multiplied to put the picture into the same frame as the center (think rotation matrices in ECE 455/655, e.g. H6 = H65 * H53).
- After each picture has been shifted with the homography matrix, stitch it onto a final canvas.
Helpful Blepo functions: MatrixMultiply, Transpose, InitWarpHomography, Warp.
Matt Pepper
98
Two ways to warp
Forward warp: "For each pixel in the image, where does it go?"
    for (x, y) in I
        (x', y') = H(x, y)
        if (x', y') in bounds, canvas(x', y') = I(x, y)
    leads to gaps in the canvas
Backward warp: "For each pixel in the canvas, where does it come from?"
    for (x', y') in canvas
        (x, y) = H⁻¹(x' − x0, y' − y0)
        if (x, y) in bounds, canvas(x', y') = I(x, y)
    correctly fills in every pixel of the canvas
    (the offset (x0, y0) being subtracted is the top-left corner of the reference frame in the canvas)
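A minimal MATLAB sketch of the backward warp for a single grayscale image, assuming H is the 3x3 homography from image coordinates to canvas coordinates and (x0, y0) the offset of the reference frame's top-left corner in the canvas (nearest-neighbor sampling; names are illustrative):

Hinv = inv(H);
[hc, wc] = size(canvas);
[hi, wi] = size(I);
for yp = 1:hc
    for xp = 1:wc
        q = Hinv * [xp - x0; yp - y0; 1];           % where does this canvas pixel come from?
        xs = round(q(1) / q(3));  ys = round(q(2) / q(3));
        if xs >= 1 && xs <= wi && ys >= 1 && ys <= hi
            canvas(yp, xp) = I(ys, xs);             % every in-bounds canvas pixel gets a value: no gaps
        end
    end
end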
99
Feathering
Iout(x, y) = d1 / (d1 + d2) · I1(x, y) + d2 / (d1 + d2) · I2(x, y)
Distances d1 and d2 can be found by chamfering. (Figure labels: d2, d1, I1, I2.)
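A hedged MATLAB sketch of feathering two already-warped images on a common canvas; bwdist (Image Processing Toolbox) gives a chamfer-style distance of each pixel to the edge of each image's support, and M1, M2 are logical masks of valid pixels (names are illustrative):

d1 = bwdist(~M1);                        % distance to the boundary of image 1's support
d2 = bwdist(~M2);
wsum = max(d1 + d2, eps);                % avoid division by zero outside both images
Iout = (d1 ./ wsum) .* double(I1) + (d2 ./ wsum) .* double(I2);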