Multiple View Geometry Unified

by Yi Ma (UIUC), Kun Huang (UIUC), Rene Vidal (Electrical Engineering & Computer Sciences, University of California at Berkeley) and Jana Kosecka (GMU)
This talk is about the geometry of multiple images. It is joint work with my student Kun Huang at UIUC, Rene Vidal, a student at UC Berkeley, and Professor Jana Kosecka at George Mason University. The title may seem a little grand, and I may have set the goal too high, but I will try to convince you by the end of this talk that what we propose is worth a try.

FORMULATION: camera model and multiple images
ALGEBRA: multilinear constraints vs. rank deficiency condition
GEOMETRY: geometric interpretation of rank deficiency condition
ALGORITHM: matching, transfer, motion and structure recovery
GENERALIZATION: line features, 3-D curves and surfaces
Here is the outline of my talk. For people who are not necessarily experts in computer vision, I will first say a little about what multiple view geometry is: its history, previous work, and the basic mathematical models for camera and image. Then I will show how to study multiple view geometry using *algebraic* techniques. I do this first because it connects well with existing results on the subject. I will compare two approaches: the existing multilinear constraints and the so-called rank deficiency condition that we propose in this talk. Then I will give a clear geometric interpretation of all the algebraic constraints and conditions. The geometry of the newly proposed rank deficiency condition is the most interesting part. Since rank deficiency is simply a linear algebraic condition, I will show you how to use it to develop algorithms for the tasks that people used to handle with multilinear constraints, and try to convince you that the rank deficiency condition makes life much easier. The algorithms include matching multiple images, transferring images to a new view and, most importantly, recovering motion and structure from multiple images. Finally, I will show you how the rank deficiency condition can be used to unify the study of multiple view geometry for lines, curves and even surfaces.

FORMULATION – Fundamental Geometric Problem
Input: Corresponding images (of points or lines) in multiple views. Output: Camera motion, camera calibration, object 3-D structure.
Here is a picture illustrating the fundamental problem of multiple view geometry. We are given multiple images of some 3-D object. After establishing correspondence of certain geometric features such as points, lines or curves, we use that information to recover the relative locations of the cameras from which the images were taken, as well as the 3-D structure of the object.

FORMULATION – Literature Review
Multiple view geometry theory
Two views: Longuet-Higgins’81, Huang & Faugeras’89, …
Three views: Spetsakis & Aloimonos’90, Shashua’94, Hartley’94, …
Four views: Triggs’95, Shashua’00, …
Multiple views: Heyden & Astrom’97’98, Ma et al.’99, …
Multiple view geometry algorithms
Euclidean: Maybank’93, Weng, Ahuja & Huang’93, …
Affine: Quan & Kanade’96, …
Projective: Triggs’96, …
Orthographic: Tomasi & Kanade’92, …
Recent books on multiple view geometry
1. Multiple view geometry in computer vision, Hartley & Zisserman’00.
2. Geometry of multiple images, Faugeras & Luong’01.
It turns out that this is a very difficult problem in its full generality, and many people have contributed to it over the past twenty years or so. The theory was developed gradually for two, three and four views, and in recent years results on multiple views have attracted attention. On the other hand, people have also developed numerous algorithms to solve various versions of this problem. Depending on the type of camera model assumed, these algorithms can be classified as Euclidean, affine, projective or orthographic. Two comprehensive books on the subject were recently published. Such a vast literature makes us wonder: what can we still do to make things better?

FORMULATION – An Anatomy of Cases (State of the Art)
Diagram labels: features (point, line, curve, surface); reconstruction (Euclidean, affine, projective); views (2 views, 3 views, 4 views, m views); techniques (algebra, geometry, optimization); theory, algorithm, practice.
In its current state, multiple view geometry consists of many cases. First, different types of 3-D objects are studied separately. Secondly, depending on whether the camera is calibrated or not, different algorithms are developed for recovering the solution up to a Euclidean, affine or projective transformation. Thirdly, different techniques are used to understand different aspects of multiple view geometry, including various algebraic, geometric and optimization methods. Finally, existing theory very often treats 2, 3 or 4 views differently. This unavoidably separates theory from algorithm: the theory is developed for these different cases, but in practice we want the algorithm to use the data from all images simultaneously. There is no consensus on how to choose a particular set of pair-wise, triple-wise or quadruple-wise views for recovery. Even if such a choice is made, we still face the problem of how to cascade the so-called bifocal, trifocal or quadrifocal tensors in a systematic way.

FORMULATION – A Need for Unification
Diagram labels: the same cases as before (point, line, curve, surface; Euclidean, affine, projective; 2 views, 3 views, 4 views, m views; algebra, geometry, optimization; theory, algorithm, practice), now joined by the rank deficiency condition.
What we really want now is something that can bring everything together! In this talk, I am going to convince you that this something is exactly what we call the rank deficiency condition. It works like a charm and indeed unifies all these different cases very nicely, which used to be thought of as a mission impossible.

FORMULATION – Pinhole Camera Model
Homogeneous coordinates of a 3-D point
Homogeneous coordinates of its 2-D image
Projection of a 3-D point to an image plane
Now let me quickly go through the basic mathematical model for a camera system. Here is the notation. We will use a four-dimensional vector X for the homogeneous coordinates of a 3-D point p; its image on a pre-specified plane is also described in homogeneous coordinates, as a three-dimensional vector x. If everything is Euclidean, then W and z can be chosen to be 1. We use a 3x4 matrix Pi to denote the transformation from the world frame to the camera frame; R stands for rotation and T for translation. The image x and the world coordinates X of a point are then related through the projection equation, where lambda is a scale associated with the depth of the 3-D point relative to the camera center o. In general the matrix Pi can be any 3x4 matrix, because the camera may add some unknown linear transformation on the image plane, usually denoted by a 3x3 matrix A(t).
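
The slide's formulas did not survive in this transcript. The block below is a reconstruction from the notation described in the notes (X, x, W, z, Pi, R, T, lambda), not a copy of the slide; writing the uncalibrated case as A [R, T] is an assumption based on the remark about the 3x3 matrix A(t).

```latex
% Homogeneous coordinates of the 3-D point and of its image
\[
X = [X,\, Y,\, Z,\, W]^T \in \mathbb{R}^4, \qquad
x = [x,\, y,\, z]^T \in \mathbb{R}^3 .
\]
% Projection: lambda is the scale associated with the depth of the point
\[
\lambda\, x = \Pi X, \qquad
\Pi = [R,\ T] \in \mathbb{R}^{3 \times 4}
\quad \text{(calibrated case; assumed } \Pi = A\,[R,\ T] \text{ otherwise)} .
\]
```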

FORMULATION – Hat Operator
We will always use column vectors, except for one case which I will mention later on. Given a three-dimensional vector u, we use u-hat to represent the 3x3 skew-symmetric matrix associated with it. In the literature, people also write u× for the same matrix. With this notation, u-hat multiplied by a vector v equals the cross product of u and v. In particular, the cross product of u with itself is zero.
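
As a minimal sketch of the hat operator described above, the following NumPy snippet builds the skew-symmetric matrix and compares it with the cross product. The function name `hat` and the sample vectors are illustrative choices, not taken from the slides.

```python
import numpy as np

def hat(u):
    """Return the 3x3 skew-symmetric matrix u^ with hat(u) @ v == np.cross(u, v)."""
    u = np.asarray(u, dtype=float)
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

# Illustrative vectors (arbitrary values)
u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])

cross_via_hat = hat(u) @ v   # should agree with np.cross(u, v)
self_cross = hat(u) @ u      # u crossed with itself: the zero vector
```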

FORMULATION – Multiple View Structure From Motion
Given m corresponding images of n points: … recover everything else from the equations: … “incidental condition”
Mathematically, we can now formulate the problem at hand as follows. We are given m corresponding images of n 3-D points. That is, for each of these n points, say p, its images with respect to m distinct camera frames are known. These images satisfy the projection equations, where Pi_i is a 3x4 matrix representing the transformation and projection from the world frame to the ith image, and lambda_i is the depth of the point p relative to the center of the ith frame.
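
The equations referred to here are missing from the transcript. Reconstructed from the notes (Pi_i, lambda_i, m views, n points), they presumably read as follows; the superscript j for the point index is an assumption, chosen to match the alpha^j used later in the talk.

```latex
\[
\lambda_i^j \, x_i^j = \Pi_i X^j, \qquad i = 1, \dots, m, \quad j = 1, \dots, n .
\]
```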

ALGEBRA: multilinear constraints vs. rank deficiency condition
GEOMETRY: geometric interpretation of rank deficiency condition
ALGORITHM: matching, transfer, motion and structure recovery
GENERALIZATION: line features, 3-D curves and surfaces
Let’s first see how to study this problem algebraically and how the rank deficiency condition can be a better algebraic tool than multilinear constraints.

ALGEBRA – Multilinear Constraints
For images of the same 3-D point: … is rank deficient (leading to the conventional approach). Multilinear constraints among 2, 3, 4 views.
Let us first see how multiple view geometry is traditionally studied, and in particular what the nature of the multilinear constraints among multiple images is. Given m images of a 3-D point p, we can rewrite the equations that these images must satisfy in a single matrix form. Looking at this equation for a while, we observe that both lambda and X are associated with the 3-D location of the point p, relative to different coordinate frames. To eliminate these 3-D structural parameters, we form a matrix N. From the matrix equation it is straightforward to see that N is rank deficient. Furthermore, if N has rank exactly m+3, we can recover lambda and X from its one-dimensional kernel. Hence we can reduce our study to examining this matrix N. Notice that, for a camera configuration specified by the matrices Pi_i, not every set of vectors x1, …, xm makes this matrix rank deficient. That means that images of the same 3-D point must satisfy certain relationships. One way to describe these relationships is to write down the determinants of all (m+4) by (m+4) submatrices of N, which must all be zero. It turns out that these equations can be reduced to equations involving images from 2, 3 or 4 views at a time, which leads to the traditional multilinear constraints for 2, 3 and 4 views.
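
The matrix N itself is not shown in the transcript. The sketch below assumes the usual layout: the matrices Pi_i stacked in the first four columns and each image x_i placed in its own extra column, giving a 3m x (m+4) matrix whose kernel contains [X; -lambda_1; …; -lambda_m]. The synthetic cameras, the point and the helper name `build_N` are illustrative assumptions.

```python
import numpy as np

def build_N(Pis, xs):
    """Assemble N (shape 3m x (m+4)): Pi_i in columns 0..3, x_i in column 4+i."""
    m = len(Pis)
    N = np.zeros((3 * m, m + 4))
    for i, (Pi, x) in enumerate(zip(Pis, xs)):
        N[3 * i:3 * i + 3, :4] = Pi
        N[3 * i:3 * i + 3, 4 + i] = x
    return N

rng = np.random.default_rng(0)
m = 4
X = np.array([0.3, -0.2, 5.0, 1.0])              # homogeneous 3-D point, W = 1

# Synthetic cameras: frame 1 is the reference, the others are random 3x4 matrices
Pis = [np.hstack([np.eye(3), np.zeros((3, 1))])]
for _ in range(m - 1):
    Q = np.linalg.qr(rng.standard_normal((3, 3)))[0]
    Pis.append(np.hstack([Q, rng.standard_normal((3, 1))]))

# Images and scales satisfying lambda_i * x_i = Pi_i @ X (x_i scaled to unit norm)
lambdas = np.array([np.linalg.norm(Pi @ X) for Pi in Pis])
xs = [Pi @ X / lam for Pi, lam in zip(Pis, lambdas)]

N = build_N(Pis, xs)
rank_N = np.linalg.matrix_rank(N, tol=1e-9)      # expected m + 3 for a generic setup

# The kernel of N gives back X and the scales lambda_i
kernel = np.linalg.svd(N)[2][-1]
kernel = kernel / kernel[3]                      # normalize so that W = 1
X_rec, lambdas_rec = kernel[:4], -kernel[4:]
```

With exact data the rank drops from m+4 to m+3, and the single kernel vector returns the structure parameters that the matrix was built to eliminate.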

ALGEBRA – Rank Deficiency of the Multiple View Matrix
WLOG, choose camera frame 1 as the reference. Multiple View Matrix. Theorem [Rank Deficiency Condition]: the rank is 1 (generic) or 0 (degenerate), so the two columns are linearly dependent.
My opinion on the conventional approach is that it is far too early to write down the rank deficiency condition on N in terms of the determinants of its submatrices. After a little linear algebraic manipulation of N, we obtain a matrix H (the multiple view matrix, labelled M in the slide titles) whose rank is closely related to that of N. More specifically, rank(N) is equal to m+2 plus rank(H). This gives two possible cases: one generic, the other degenerate. In either case, we know that if x1, …, xm are images of some 3-D point p, the matrix H must be rank deficient; in other words, the two columns of H are linearly dependent.
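
The matrix on this slide is missing from the transcript. The block below reconstructs it from the standard form of the multiple view matrix, which is an assumption here; it uses the slide-title symbol M for what the notes call H, and the hat operator introduced earlier.

```latex
% Reference frame: \Pi_1 = [I,\ 0], \quad \Pi_i = [R_i,\ T_i], \quad i = 2, \dots, m
\[
M =
\begin{bmatrix}
\hat{x}_2 R_2 x_1 & \hat{x}_2 T_2 \\
\hat{x}_3 R_3 x_1 & \hat{x}_3 T_3 \\
\vdots & \vdots \\
\hat{x}_m R_m x_1 & \hat{x}_m T_m
\end{bmatrix}
\in \mathbb{R}^{3(m-1) \times 2},
\qquad
\operatorname{rank}(N) = m + 2 + \operatorname{rank}(M).
\]
% Why the columns are dependent: substitute X = \lambda_1 x_1 into
% \lambda_i x_i = R_i X + T_i and multiply on the left by \hat{x}_i
\[
\lambda_1\, \hat{x}_i R_i x_1 + \hat{x}_i T_i = 0
\quad \Longrightarrow \quad
M \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} = 0,
\qquad
\operatorname{rank}(M) \le 1 .
\]
```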

ALGEBRA – M Matrix Implies Bilinear Constraints
Fact: Given non-zero vectors … Hence, we have … These constraints are only necessary but NOT sufficient!
Now let us examine the rank deficient matrix H and see what we get from it. First observe that if the two columns of H are linearly dependent, then within each block of three rows the pair of vectors must be linearly dependent. This in fact gives rise to the well-known epipolar constraints. One thing we notice from this derivation is that the epipolar constraints are only necessary, not sufficient, for the matrix H to be rank deficient.
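
The step from the rank condition to the epipolar constraint can be spelled out as follows. This is a sketch of the argument under the assumed form of the multiple view matrix (rows built from x_i-hat R_i x_1 and x_i-hat T_i); it is not copied from the slide.

```latex
% Each 3-row block of the matrix holds the two vectors
%   a_i = \hat{x}_i R_i x_1 = x_i \times (R_i x_1), \qquad b_i = \hat{x}_i T_i = x_i \times T_i .
% Using (x \times u) \times (x \times w) = x \,\bigl(x^T (u \times w)\bigr):
\[
a_i \times b_i = x_i \,\bigl( x_i^T \, ( R_i x_1 \times T_i ) \bigr) ,
\]
% so a_i and b_i are linearly dependent exactly when
\[
x_i^T \, \hat{T}_i \, R_i \, x_1 = 0, \qquad i = 2, \dots, m ,
\]
% which is the bilinear (epipolar) constraint between views 1 and i.
```

Each block being dependent on its own does not force all blocks to share the same coefficient, which is why these constraints are necessary but not sufficient.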

ALGEBRA – M Matrix Implies Trilinear Constraints
Fact: Given non-zero vectors … Hence, we have … These constraints are only necessary but NOT sufficient! However, there is NO further relationship among any 4 views. Quadrilinear constraints hence do not exist!
Now let us see what else we can get from the matrix H. Another simple linear algebraic fact leads directly to the so-called trilinear constraints. Again, these constraints are only necessary, not sufficient, since some of the components in the matrix H might be zero. In any case, there is no further relationship among any 4 views. Hence quadrilinear constraints do not exist! So all the papers in the computer vision literature on quadrilinear constraints and quadrifocal tensors are in fact studying something empty.
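
The linear algebraic fact behind the trilinear constraints is not legible in the transcript. A plausible reconstruction, under the same assumed form of the multiple view matrix as above, is:

```latex
% Fact: if the two columns of
%   \begin{bmatrix} a_i & b_i \\ a_j & b_j \end{bmatrix}, \quad a_i, b_i, a_j, b_j \in \mathbb{R}^3,
% are linearly dependent, then a_i b_j^T - b_i a_j^T = 0 .
% With a_i = \hat{x}_i R_i x_1, \; b_i = \hat{x}_i T_i and \hat{x}_j^T = -\hat{x}_j this gives
\[
\hat{x}_i \left( T_i \, x_1^T R_j^T - R_i \, x_1 \, T_j^T \right) \hat{x}_j = 0 ,
\qquad i, j = 2, \dots, m ,
\]
% a trilinear constraint among views 1, i and j.
```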

ALGEBRA: multilinear constraints vs. rank deficiency condition
GEOMETRY: geometric interpretation of rank deficiency condition
ALGORITHM: matching, transfer, motion and structure recovery
GENERALIZATION: line features, 3-D curves and surfaces
So far we have derived the rank deficient matrix H for multiple images and shown how it algebraically implies all the well-known constraints among 2, 3 or 4 views. The next thing we would like to know is what these conditions and constraints mean geometrically.

GEOMETRY – Uniqueness of Pre-image by Bilinear Constraints
“Bilinear means pair-wise coplanar”: … except in a rare coplanar case: …
Figure labels: trifocal plane; rectilinear motion.
First, let us see how the bilinear and trilinear constraints can be interpreted geometrically; then we will compare them with the interpretation of the matrix H.

GEOMETRY – Uniqueness of Pre-image by Trilinear Constraints
“Trilinear means triple-wise incidental”: … except in a rare collinear case: …
Geometrically, trilinear constraints mean… However, if we are given more than three views, we need to apply the bilinear and trilinear conditions to all possible pairs or triples of views. Since each of them may introduce its own degeneracy, it is very difficult to say whether or not the overall configuration of all views is degenerate.

GEOMETRY – Uniqueness of Pre-image by M Matrix
Theorem [Uniqueness of Pre-image] Given vectors x1, …, xm with respect to m camera frames, they correspond to a unique point in 3-D space if the rank of the matrix is 1. If the rank is 0, the point is determined up to a line on which all the camera centers must lie. “incidental condition”
Using the matrix H instead, you do not have this problem. There are only two cases for its rank. If the rank is one, the pre-image of all the images x1, …, xm is uniquely determined in 3-D. If the rank is zero, i.e., the matrix is zero, we are in the only degenerate configuration: the point is determined up to a line on which all the camera centers must lie.

GEOMETRY – Geometric Interpretation of M Matrix
… is the “depth” of the point relative to the camera center. Points that give the same matrix are on a sphere of radius …
We have seen how the rank deficiency of the matrix H imposes conditions on the m image vectors. Here we give an interesting geometric interpretation of the value of the matrix H. In other words, we want to know which points in 3-D may give rise to the same matrix H. The answer is very clear. The coefficient that relates the two linearly dependent columns of H corresponds to the depth of the 3-D point. On the other hand, any point at the same distance from the center of the reference camera frame may give the same matrix H. Hence each matrix H corresponds to a sphere around the camera center. If you have the matrix H relative to a different reference camera frame, you obtain a second sphere. The two spheres intersect in a circle on which the point p must lie.

ALGEBRA: multilinear constraints vs. rank deficiency condition
GEOMETRY: geometric interpretation of rank deficiency condition
ALGORITHM: matching, transfer, motion and structure recovery
GENERALIZATION: line features, 3-D curves and surfaces
Knowing the algebraic and geometric characteristics of the matrix H, we next see how to use this condition to develop new algorithms for old problems in multiple view geometry.

ALGORITHM 1 – Multiple View Matching Test
Given the projection matrices associated with m camera frames. Then for vectors …
The first problem is how to tell whether a set of m vectors could be images of some 3-D point relative to a given set of m camera frames.
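
The matching test can be sketched numerically as a rank check on the multiple view matrix. The code below is an illustration only: it assumes calibrated cameras with frame 1 as reference, the standard form of the matrix (rows x_i-hat R_i x_1 and x_i-hat T_i), and a singular-value tolerance that is an arbitrary choice. All function names and sample values are hypothetical.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix of u."""
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def rotation(axis, angle):
    """Rotation matrix from axis and angle (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    K = hat(k / np.linalg.norm(k))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def multiple_view_matrix(xs, Rs, Ts):
    """Stack [hat(x_i) R_i x_1, hat(x_i) T_i] for views 2..m; shape (3(m-1), 2)."""
    x1 = xs[0]
    blocks = [np.column_stack([hat(x) @ R @ x1, hat(x) @ T])
              for x, R, T in zip(xs[1:], Rs, Ts)]
    return np.vstack(blocks)

def is_match(xs, Rs, Ts, tol=1e-8):
    """True if the multiple view matrix has numerical rank <= 1."""
    s = np.linalg.svd(multiple_view_matrix(xs, Rs, Ts), compute_uv=False)
    return bool(s[1] <= tol * max(s[0], 1.0))

# Synthetic example: one 3-D point seen from three views
p = np.array([0.5, -0.3, 4.0])
Rs = [rotation([0, 1, 0], 0.1), rotation([1, 0, 0], -0.2)]
Ts = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.2])]
xs = [p / p[2]] + [(R @ p + T) / (R @ p + T)[2] for R, T in zip(Rs, Ts)]

good = is_match(xs, Rs, Ts)                       # true correspondences
bad_xs = [xs[0], xs[1], xs[2] + np.array([0.3, 0.2, 0.0])]
bad = is_match(bad_xs, Rs, Ts)                    # third image moved off the true point
```

With noisy measurements a fixed tolerance like this would need to be replaced by a threshold tied to the noise level.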

ALGORITHM 2 – Motion and Structure from Multiple Views
Given m images of n points: …
The second problem, and the most important one, is how to recover the camera configuration from m given images of a set of n points. Using the rank deficiency condition on H, the problem becomes looking for T2, R2, …, Tm, Rm such that the two columns of H are linearly dependent. That is, there exist coefficients alpha^j such that the equations hold. I will not go into the details of how to determine the coefficients alpha^j without knowing all the camera motions. For now, I will only say that their values depend on the choice of coordinate frames, which can be Euclidean, affine or projective. Assuming these coefficients are known, finding the camera configuration simply becomes a problem of solving a linear equation. This equation has a unique solution if, in general, we have more than 6 points.
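
The linear step can be sketched as follows, assuming the coefficients alpha^j are the inverse depths 1/lambda_1^j in the reference view and are already known, so that hat(x_i^j) (R_i x_1^j + alpha^j T_i) = 0 is linear in (R_i, T_i). This is a noise-free illustration with hypothetical names and synthetic data, not the four-step algorithm of the talk; with noise, the recovered matrix would also need to be projected onto a rotation.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix of u."""
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

def rotation(axis, angle):
    """Rotation matrix from axis and angle (Rodrigues' formula)."""
    k = np.asarray(axis, dtype=float)
    K = hat(k / np.linalg.norm(k))
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def recover_motion(x1s, xis, alphas):
    """Solve hat(x_i^j) (R x_1^j + alpha^j T) = 0 for (R, T) of view i via SVD."""
    rows = []
    for x1, xi, a in zip(x1s, xis, alphas):
        # hat(xi) @ R @ x1 == kron(x1^T, hat(xi)) @ vec(R), vec stacking columns
        rows.append(np.hstack([np.kron(x1.reshape(1, 3), hat(xi)), a * hat(xi)]))
    P = np.vstack(rows)                          # shape (3n, 12)
    v = np.linalg.svd(P)[2][-1]                  # null vector, defined up to scale
    R_s = v[:9].reshape(3, 3, order="F")
    T_s = v[9:]
    scale = np.cbrt(np.linalg.det(R_s))          # fixes scale and sign: det(R) = +1
    return R_s / scale, T_s / scale

# Synthetic data: n points seen in the reference view and in view i
rng = np.random.default_rng(1)
n = 8
Xs = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 6.0], size=(n, 3))
R_true = rotation([0.2, 1.0, -0.1], 0.3)
T_true = np.array([1.0, -0.2, 0.3])

x1s = Xs / Xs[:, 2:3]                            # images in view 1 (z = 1)
Ys = Xs @ R_true.T + T_true
xis = Ys / Ys[:, 2:3]                            # images in view i (z = 1)
alphas = 1.0 / Xs[:, 2]                          # assumed known: 1 / lambda_1^j

R_est, T_est = recover_motion(x1s, xis, alphas)
```

Each point contributes two independent equations for the twelve unknowns, which agrees with the remark that six or more points in general position make the solution unique.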

ALGORITHM 2 – SVD Based Four Step Algorithm
Here is the outline of this algorithm. The only thing I’d like to point out is that the algorithm is initialized by a two view algorithm due to the fact that non-trivial constraints for point features start with two views.

ALGORITHM 2 – Simulation Results
Motion XX-YY, 1000 trials and T/R ratio 1.5
Here are some simulation results for the proposed linear algorithm.

ALGORITHM 2 – Simulation Results
Motion XX-YY-ZZ, 1000 trials and T/R ratio 1.5

ALGORITHM 3 – Mapping Images to a New View
Given the projection matrices associated with camera frames. Then for given vectors … So given the other images, rank deficiency adds a linear constraint on the new image. Computing the kernel of … gives the new image.
We can also play with the rank condition and have some fun. One particular thing we can do is to map multiple images to a new view without performing any 3-D reconstruction.

ALGEBRA: multilinear constraints vs. rank deficiency condition
GEOMETRY: geometric interpretation of rank deficiency condition
ALGORITHM: matching, transfer, motion and structure recovery
GENERALIZATION: line features, 3-D curves and surfaces
So far we have only demonstrated how to use the rank deficiency condition to study point features. Now let’s see how the same idea can be generalized so as to unify the study of multiple view geometry for lines, curves and even surfaces.

GENERALIZATION – Line Features
Homogeneous representation of a 3-D line
Homogeneous representation of its 2-D image
Projection of a 3-D line to an image plane
First let us talk about line features. To describe a line in 3-D, we need to specify a base point on the line and a vector indicating the direction of the line. On the image plane we can use a three-dimensional vector l to describe the image of a line L. More specifically, if x is the image of a point on this line, its inner product with l is 0.

GENERALIZATION – Multiple View Matrix: Line vs. Point
Point Features. Line Features.
We may then derive a matrix H for a line feature, which we call H_l. Like the matrix H for a point feature, now denoted H_p, it has rank 1. H_l, however, is a matrix with four columns. The linear dependency between any two rows of H_l gives rise to the trilinear constraints in terms of image lines, which are well known in the literature. The geometric interpretation of the matrix H_l is no longer a sphere but a circle. This circle describes a family of parallel lines that give the same matrix H_l. The radius of the circle is the distance of these lines from the center of the reference camera frame, and the normal of the circle is the direction of these lines.
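
The two matrices compared on this slide are missing from the transcript. The block below gives their standard forms as an assumption, written with the slide-title symbol M (the notes' H_p and H_l); l_i denotes the image line in view i, so that l_i^T x_i = 0 for every image point x_i on it.

```latex
\[
M_p =
\begin{bmatrix}
\hat{x}_2 R_2 x_1 & \hat{x}_2 T_2 \\
\vdots & \vdots \\
\hat{x}_m R_m x_1 & \hat{x}_m T_m
\end{bmatrix}
\in \mathbb{R}^{3(m-1) \times 2},
\qquad
M_l =
\begin{bmatrix}
l_2^T R_2 \hat{l}_1 & l_2^T T_2 \\
\vdots & \vdots \\
l_m^T R_m \hat{l}_1 & l_m^T T_m
\end{bmatrix}
\in \mathbb{R}^{(m-1) \times 4},
\]
\[
\operatorname{rank}(M_p) \le 1, \qquad \operatorname{rank}(M_l) \le 1 .
\]
% Linear dependence of rows i and j of M_l yields a trilinear constraint in image lines:
\[
\left( l_j^T T_j \right) l_i^T R_i \hat{l}_1
- \left( l_i^T T_i \right) l_j^T R_j \hat{l}_1 = 0 .
\]
```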

GENERALIZATION – Point/Point Duality
Point/point duality between a camera center and a 3-D point. Theorem [Point/Point Duality] From the multiple view matrix alone, if a camera center is moving on a straight line, a fixed 3-D point is determined up to a circle; if a camera center is fixed but a point is moving on a line, the line is determined up to a circle.
From the geometric interpretation of the matrix H for point and line features, we immediately get an interesting duality between a 3-D point and a camera center.

GENERALIZATION – Line Features (vs. Point Features)
We continue with the constraints that the rank of the matrix H_l imposes on multiple images of a line feature. If rank(H_l) is 1, all the planes through each camera center and its image line intersect in a unique line in 3-D. If the rank is 0, we are in the only degenerate case: all the planes coincide, hence the 3-D line is determined only up to a plane on which all the camera centers must lie.

GENERALIZATION – SFM from Line Features
Given images of lines: …
Using the rank deficiency condition for line features, we can develop a set of algorithms exactly parallel to those for point features. In particular, for motion estimation from line features, the problem is simply to find R2, T2, …, Rm, Tm such that the matrix H_l has rank 1. Hence, for some particular alpha, they satisfy the following equation. In general, given more than 12 lines, the rank of the matrix Mi is 11, hence the solution to the equation is unique. The rest of the algorithm is omitted here.

GENERALIZATION – Planar Features
Homogeneous representation of a 3-D plane Projection of a planar point to the image Projection of a planar line to the image

GENERALIZATION – Multiple View Matrix: Coplanar Features
Given that point and line features lie on a plane in 3-D space: … Besides multilinear constraints, it simultaneously gives the homography: …

GENERALIZATION – Coplanar Point/Line Duality
On the plane any two points determine a line and any two lines determine a point. Theorem [Point/Line Duality] For planar features, points and lines are hence equivalent!

GENERALIZATION – SFM from Coplanar Features
On the plane, a set of points is equivalent to a set of lines, and vice versa. One can use either the planar point or the planar line multiple view matrix to solve SFM as in the generic point and line case. Algorithms need only minor changes. Rank deficiency of either planar matrix exploits multilinear constraints and homography constraints simultaneously.

GENERALIZATION – 3-D Curves & Surfaces
Differentiating the matrix of a point (moving) along a curve: … gives rise to the rank deficiency condition for curves.
Figure labels: intensity level sets; region boundaries.

GENERALIZATION – From Tangent to Point-wise Correspondence
The rank deficiency condition for … relates points and tangent lines of a curve. … gives rise to a set of ordinary differential equations: … Solving these equations establishes point-wise correspondence for image curves and, eventually, for surfaces as well. … gives constraints on the curvature and normals of image curves.

CONCLUSIONS AND ON-GOING WORK
Conclusions:
Rank deficiency condition simplifies and unifies existing algebraic results in multilinear constraints (no tensor and algebraic geometry).
Rank deficiency condition exhibits clear geometric interpretation.
Rank deficiency condition unifies the study of point, line, curve and even surface in 3-D.
Rank deficiency condition naturally reveals point/point and point/line duality.
Rank deficiency condition gives rise to uniform linear algorithms for feature matching, motion recovery and new view synthesis.
The results no longer discriminate two, three, four or multiple views, nor Euclidean, affine or projective camera models.
On-going work:
Consistent, optimal and robust reconstruction of motion/structure.
Numerical algorithms for curve, surface reconstruction from m views.
Multiple views of multiple rigid body motions.

Multiple View Geometry Unified
by Yi Ma, Kun Huang, Rene Vidal and Jana Kosecka
CSL Technical Report, UILU-ENG # (DC-200), 05/08/01
CSL Technical Report, UILU-ENG # (DC-201), 05/08/01
