Rank Conditions in Multiple View Geometry

by Yi Ma, Rene Vidal (EECS.UCB), Kun Huang (ECE.UIUC), Jana Kosecka (CS.GMU), Robert Fossum (Math.UIUC). Perception & Decision Laboratory, Decision & Control Group, CSL; Image Formation & Processing Group, Beckman; Electrical & Computer Engineering Dept., UIUC. This talk is about the geometry of multiple images. It is joint work with my student Kun Huang at UIUC, Rene Vidal, a student at UC Berkeley, Professor Jana Kosecka at George Mason University, and Robert Fossum of the mathematics department at the University of Illinois. This work is really a continuation of my PhD project at UC Berkeley. Of course, I wish I had had the same material for my interview talk here last year. However, I am also afraid that what I am going to talk about today will to some extent trivialize the four years of my PhD work, or the twenty years of people’s work on multiple view geometry. But in any case, the simplicity and beauty of this subject are so irresistible that I am willing to damage my career a little bit, if that is necessary.

FORMULATION: camera model and multiple images
POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK Here is the outline of my talk. For people who are not necessarily experts in computer vision, I will first talk a little bit about what multiple view geometry is: its history, previous work, and the basic mathematical models for camera and image. Then I am going to start with the simplest case, i.e. studying the geometry of multiple images of a point. I will compare two approaches: the traditional multilinear (multifocal) constraints v.s. the so-called rank conditions that I propose in this talk. Once we are clear about the weakness of the multilinear constraint approach, we will see how far the rank conditions will take us. Since rank conditions are simply linear algebraic, I will show you a few conceptual algorithms for the kinds of tasks that people used to solve with multilinear constraints, and try to convince you that the rank conditions make our life much easier. In the end, I hope I will be able to sell you the following message: we are merely at the beginning of understanding multiple view geometry, in both its theory and its practice…

POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK First of all, some background material and problem formulation.

FORMULATION - Fundamental Geometric Problem
Input: Corresponding images (of “features”) in multiple images. Output: Camera motion, camera calibration, object structure. Jana’s apartment (image courtesy of Jana Kosecka) Here is a picture illustrating the fundamental problem of interest in multiple view geometry. The scenario is that we are given multiple pictures or images of some 3D object. After establishing correspondence of certain geometric features such as points, lines or curves, we try to use this information to recover the relative locations of the cameras where these images were taken, as well as the 3D structure of the object.

FORMULATION – Orthodox View (State of the Art)
curve & surface line point plane Euclidean affine projective geometry algebra algorithm 2 views 3 views 4 views m views perspective orthographic omni-directional This problem has been extensively studied in the past twenty years by people in computer vision and robotics. The orthodox approach to multiple view geometry in the literature is a subtle network of case studies. First, a generic scene usually consists of different types of features, yet they are mostly studied or used separately. Second, depending on whether the camera is calibrated or not, different theories or algorithms are developed for Euclidean, affine or projective transformations or spaces, or for different physical or geometrical models of cameras. Surprisingly, the existing multiple view geometry is not really about multiple views: analysis for m images is typically reduced to pairwise, triplewise or quadruplewise view analysis. In the effort to generalize multiple view geometry to higher-dimensional spaces, the results are even more sporadic. This unavoidably causes a separation between theory and algorithm: theory is developed for these different cases, but in practice we want the algorithm to utilize all the features and all the constraints from all images simultaneously. There is no consensus on how to do that. Although so many cases have been studied, many theoretical questions remain largely open. For example, a simple but important question: have we found all the intrinsic constraints among multiple images, after all these 1,728 combinations of cases?

FORMULATION – Literature Review
“Multiple” view geometry theory Two views: Kruppa’13, Longuet-Higgins’81, Huang & Faugeras’89, … Three views: Spetsakis & Aloimonos’90, Shashua’94, Hartley’94, … Four views: Triggs’95, Shashua’00, … Multiple views: Heyden & Astrom’97’98, … Higher dimension: Wolf & Shashua’01 Multiple view geometry algorithms Euclidean: Maybank’93, Weng, Ahuja & Huang’93, … Affine: Quan & Kanade’96, … Projective: Triggs’96, … Orthographic: Tomasi & Kanade’92, … Recent books on multiple view geometry 1. Multiple view geometry in computer vision, Hartley & Zisserman’00. 2. Geometry of multiple images, Faugeras, Luong & Papadopoulo’01. The theoretical study of multiple view geometry started at the beginning of the last century and was revived around twenty years ago. I underlined a few papers which have made genuine contributions to the theory. In parallel to the theory, people have also developed numerous algorithms to solve various versions of this problem. Two comprehensive books on this subject were recently published. They give a rather good organization of this orthodox approach to multiple view geometry. Such a vast literature makes us wonder: what can we still do to make things better?

FORMULATION – A Provocative Stand: One Theorem for All & More?
curve & surface plane line algorithm projective algebra point affine geometry Euclidean Rank Conditions perspective orthographic omni-directional 2 views 3 views 4 views m views What is common to all the things here in multiple view geometry? They all turn out to be instantiations of something truly fundamental, which we call the rank conditions. The rank conditions are nothing but a unified way of expressing all kinds of incidence relations that are present among multiple features and among multiple images. What we can promise here is that after this talk, you will know a great deal of multiple view geometry by remembering only one theorem!

FORMULATION – A Little Notation: Hat Operator
We will always use column vectors, except for one case which I will mention later on. Given a three dimensional vector u, we use u-hat to represent the 3x3 skew symmetric matrix associated to it. In the literature, people also use u-product for the same thing. Using this notation, u-hat multiplying a vector v is equal to the cross product of u and v. In particular, the cross product of u with itself is zero.
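The hat operator can be written out in a few lines. The following is a minimal sketch, not taken from the slides; the function name `hat` and the test vectors are illustrative choices.

```python
import numpy as np

def hat(u):
    """Return the 3x3 skew-symmetric matrix u^ with hat(u) @ v == np.cross(u, v)."""
    u = np.asarray(u, dtype=float)
    return np.array([[0.0, -u[2], u[1]],
                     [u[2], 0.0, -u[0]],
                     [-u[1], u[0], 0.0]])

# Illustrative vectors (arbitrary values).
u = np.array([1.0, 2.0, 3.0])
v = np.array([-4.0, 0.5, 2.0])

cross_via_hat = hat(u) @ v   # same vector as np.cross(u, v)
self_cross = hat(u) @ u      # the zero vector
```

The same helper is reused, redefined locally, in the later sketches so that each one runs on its own.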

POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK Let us start with the simplest object, point features, and compare the two approaches: multilinear (multifocal) constraints and the rank conditions.

POINT FEATURE – Pinhole Camera Model
Homogeneous coordinates of a 3-D point Homogeneous coordinates of its 2-D image Projection of a 3-D point to an image plane Now let me quickly go through the basic mathematical model for a camera system. Here is the notation. We use a four dimensional vector X for the homogeneous coordinates of a 3-D point p; its image on a pre-specified plane is also described in homogeneous coordinates, as a three dimensional vector x. If everything is normalized, then W and z can be chosen to be 1. We use a 3x4 matrix Pi to denote the transformation from the world frame to the camera frame; R stands for rotation and T for translation. The image x and the world coordinates X of a point are then related through the equation, where lambda is a scale associated to the depth of the 3D point relative to the camera center o. In general, however, the matrix Pi can be any 3x4 matrix, because the camera may add some unknown linear transformation on the image plane. Usually it is denoted by a 3x3 matrix A(t).
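The slide formulas did not survive in this transcript. The following is a reconstruction from the narration only, so the exact symbols are assumptions; in particular, writing the uncalibrated case as A[R, T] is one common convention and is not confirmed by the text.

```latex
\mathbf{X} = [X, Y, Z, W]^T \in \mathbb{R}^4, \qquad
\mathbf{x} = [x, y, z]^T \in \mathbb{R}^3, \qquad
\Pi = [R, \; T] \in \mathbb{R}^{3 \times 4},
\qquad
\lambda \, \mathbf{x} = \Pi \, \mathbf{X}.
% Uncalibrated case (assumed form): \Pi = A \, [R, \; T] with A \in \mathbb{R}^{3 \times 3}.
```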

POINT FEATURE – Multiple View Structure From Motion
Given corresponding images of points: recover everything else from equations associated to each: Now, mathematically, we can formulate the problem at hand as follows. We are given m corresponding images of n 3D points; that is, for each of these n points, say p, its images with respect to m distinct camera frames are known. These images then satisfy the following equations, where Pi_i is a 3x4 matrix representing the transformation and projection from the world frame to the ith image, and lambda_i is the depth of the point p relative to the center of the ith frame.

POINT FEATURE – Conventional Multilinear (Multifocal) Constraints
For images of the same 3-D point : (leading to the conventional approach) Multilinear constraints among 2, 3, 4-wise views Let us first see how multiple view geometry is traditionally studied; in particular, what the nature of the multilinear constraints among multiple images is. Given m images of a 3-D point p, we can rewrite the equations in a single matrix form as follows. Examining this equation for a while, we may observe that both lambda and X are associated to the 3-D location of the point p, relative to different coordinate frames. To eliminate these 3-D structural parameters, we can form the following matrix N. From the above equation, it is straightforward to see that this matrix is rank deficient. Furthermore, if this matrix has rank exactly m+3, we can recover lambda and X from its one-dimensional kernel. Hence we can reduce our study to examining this matrix N. Notice that, given a camera configuration specified by the matrices Pi_i, not every set of vectors x1, …, xm makes this matrix rank deficient. That means that images of the same 3-D point must satisfy certain relationships. One way to describe such relationships is to write down the determinants of all (m+4) by (m+4) submatrices of N; these determinants must be zero. It turns out that these equations can be reduced to equations involving images from 2, 3, or 4 views at a time, which leads to the traditional multilinear constraints for 2, 3, 4 views…
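To make the matrix N concrete, here is a small numerical sketch. The block layout of N is an assumption reconstructed from the narration (each row block holds Pi_i followed by x_i in its own column), and the camera poses and the point are made-up values.

```python
import numpy as np

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def build_N(Pis, xs):
    """Stack the blocks [Pi_i | 0 .. x_i .. 0] into a 3m x (m+4) matrix."""
    m = len(Pis)
    N = np.zeros((3 * m, m + 4))
    for i, (Pi, x) in enumerate(zip(Pis, xs)):
        N[3 * i:3 * i + 3, :4] = Pi
        N[3 * i:3 * i + 3, 4 + i] = x
    return N

# Hypothetical configuration: four cameras looking at one point.
X = np.array([0.4, -0.3, 4.0, 1.0])
angles = [0.0, 0.2, -0.3, 0.1]
Ts = [[0.0, 0.0, 0.0], [-1.0, 0.1, 0.2], [1.0, -0.2, 0.1], [0.2, 0.8, -0.1]]
Pis = [np.hstack([rot_y(a), np.array(T).reshape(3, 1)]) for a, T in zip(angles, Ts)]

lams = np.array([(Pi @ X)[2] for Pi in Pis])         # depths lambda_i
xs = [(Pi @ X) / lam for Pi, lam in zip(Pis, lams)]  # images, normalized so z = 1

N = build_N(Pis, xs)
m = len(Pis)
rank_N = np.linalg.matrix_rank(N)                    # m + 3 for this configuration

# The one-dimensional kernel is spanned by [X; -lambda_1; ...; -lambda_m].
kernel = np.linalg.svd(N)[2][-1]
kernel = kernel / kernel[3]                          # fix the scale so that W = 1
X_rec, lams_rec = kernel[:4], -kernel[4:]
```

With exact correspondences the kernel returns both the point and all m depths, which is the sense in which N "contains" the 3-D structure.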

POINT FEATURE – Rank Condition on the Multiple View Matrix
WLOG, choose camera frame 1 as the reference Multiple View Matrix Lemma [Rank Condition for Point Features] My opinion on this approach now is that it is way too early to write down the rank deficiency condition on N in terms of the determinants of its submatrices. After a little linear algebraic manipulation of N, we obtain a matrix M of this form, whose rank is closely related to that of N. More specifically, rank(N) is equal to m+2 plus rank(M). This gives us two possible cases: one is more generic and the other degenerate. In any case, we know that if x1, …, xm are images of some 3-D point p, the matrix M must be rank deficient; in other words, the two columns of M are linearly dependent. (generic) (degenerate) Let then and are linearly dependent.
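The matrix itself is missing from the transcript. A reconstruction consistent with the narration (two columns, three rows per additional view, and rank(N) = m + 2 + rank(M)) is sketched here. It assumes the reference frame is chosen so that Pi_1 = [I, 0] and Pi_i = [R_i, T_i]; these symbols are assumptions.

```latex
\lambda_i x_i = \lambda_1 R_i x_1 + T_i
\;\Longrightarrow\;
\lambda_1 \, \widehat{x_i} R_i x_1 + \widehat{x_i} T_i = 0,
\qquad i = 2, \dots, m,

M \doteq
\begin{bmatrix}
\widehat{x_2} R_2 x_1 & \widehat{x_2} T_2 \\
\vdots & \vdots \\
\widehat{x_m} R_m x_1 & \widehat{x_m} T_m
\end{bmatrix}
\in \mathbb{R}^{3(m-1) \times 2},
\qquad
M \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} = 0,
\qquad
\operatorname{rank}(M) = \operatorname{rank}(N) - (m+2) \le 1.
```

The first line follows by multiplying the projection equation on the left by the hat of x_i, which annihilates x_i.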

POINT FEATURE – Rank Condition Implies Bilinear Constraints
Fact: Given non-zero vectors Hence, we have These constraints are only necessary but NOT sufficient! Now let us examine the rank deficient M matrix and see what we get from it. First observe that if the two columns of M are linearly dependent, then the pair of vectors in each block of three rows must be linearly dependent. This in fact gives rise to the well-known epipolar constraints. One thing we notice from this derivation is that the epipolar constraints are only necessary but not sufficient for the matrix M to be rank deficient.
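The missing step can be written out as follows. This is a reconstruction under the assumption that the i-th block of M consists of the two columns named a_i and b_i below (names introduced here for illustration); it uses the identity (a × b) × (c × d) = (a · (b × d)) c − (a · (b × c)) d.

```latex
a_i \doteq \widehat{x_i} R_i x_1, \qquad b_i \doteq \widehat{x_i} T_i,
\qquad
a_i, b_i \ \text{linearly dependent} \;\Longrightarrow\; a_i \times b_i = 0,

a_i \times b_i
= (x_i \times R_i x_1) \times (x_i \times T_i)
= \bigl( x_i^T (R_i x_1 \times T_i) \bigr) \, x_i
= - \bigl( x_i^T \widehat{T_i} R_i x_1 \bigr) \, x_i,

\text{hence} \qquad x_i^T \, \widehat{T_i} \, R_i \, x_1 = 0, \qquad i = 2, \dots, m.
```

Each block only has to be dependent on its own for this to hold, with possibly different coefficients per block, which is why the bilinear constraints are necessary but not sufficient for rank(M) ≤ 1.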

POINT FEATURE – Rank Condition Implies Trilinear Constraints
Fact: Given non-zero vectors Hence, we have These constraints are only necessary but NOT sufficient! Now let us see what else we can get from the M matrix. Another simple linear algebraic fact directly leads to the so called trilinear constraints. Again, these constraints are only necessary but not sufficient since some of the components in the M matrix might be zero. In any case, you do not have any further relationship among any 4 views. Hence quadrilinear constraints do not exist! So all papers in the computer vision literature on quadrilinear constraints and quadrifocal tensors are in fact studying something empty and null. However, there is NO further relationship among quadruple wise views. Quadrilinear constraints hence are redundant!
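The trilinear constraint can be checked numerically. The sketch below assumes the constraint takes the form x_i^ (T_i x_1^T R_j^T − R_i x_1 T_j^T) x_j^ = 0, which follows from the vanishing 2x2 minors of a rank-1 M under the reconstruction Pi_1 = [I, 0], Pi_i = [R_i, T_i]; the function names and the camera values are illustrative.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(R, T, X):
    """Image of the 3-D point X (frame-1 coordinates), normalized so z = 1."""
    y = R @ X + T
    return y / y[2]

def trilinear_residual(x1, x2, x3, R2, T2, R3, T3):
    """x2^ (T2 x1^T R3^T - R2 x1 T3^T) x3^: a 3x3 matrix, zero for a true match."""
    middle = np.outer(T2, R3 @ x1) - np.outer(R2 @ x1, T3)
    return hat(x2) @ middle @ hat(x3)

# Hypothetical point and camera motions relative to frame 1.
X = np.array([0.4, -0.3, 4.0])
R2, T2 = rot_y(0.2), np.array([-1.0, 0.1, 0.2])
R3, T3 = rot_y(-0.3), np.array([1.0, -0.2, 0.1])
x1 = X / X[2]
x2, x3 = project(R2, T2, X), project(R3, T3, X)

res_true = trilinear_residual(x1, x2, x3, R2, T2, R3, T3)
# Shift the third image away from the true correspondence.
x3_wrong = x3 + np.array([0.05, 0.02, 0.0])
res_wrong = trilinear_residual(x1, x2, x3_wrong, R2, T2, R3, T3)
```

For the true correspondence every entry of the residual vanishes up to rounding, while the shifted image gives a clearly non-zero residual.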

POINT FEATURE – Uniqueness of Pre-image (Bilinear Constraints)
“Bilinear means pair-wise coplanar”: except in a rare coplanar case: First, let us see how bilinear and trilinear constraints can be interpreted geometrically and then we will compare them with that of the M matrix. Trifocal plane Rectilinear motion

POINT FEATURE – Uniqueness of Pre-image (Trilinear Constraints)
“Trilinear means triple-wise incidental”: except in a rare collinear case: Geometrically, trilinear constraints mean… However, if we are given more than three views, in order to apply the bilinear and trilinear conditions we need to do it for all possible pairwise or triplewise views. Since each of them may cause certain degeneracy, it is very difficult to make a statement on whether or not the overall configuration for all views is degenerate.

POINT FEATURE – Uniqueness of Pre-image (Multiple View Matrix)
Proposition [Uniqueness of Pre-image] Given vectors with respect to camera frames, they correspond to a unique point in the 3-D space if the rank of the matrix is 1. If the rank is 0, the point is determined up to a line on which all the camera centers must lie. “point incidence condition” Using the M matrix instead, you won’t have such a problem. There are only two cases for its rank. If the rank is one, the pre-image of all the images x1, …, xm is uniquely determined in 3-D; if the rank is zero, i.e., the matrix is zero, it corresponds to the only degenerate configuration: the point is determined only up to a line on which all the camera centers must lie.

POINT FEATURE – Geometric Interpretation
is the “depth” of the point relative to the camera center. Points that give the same matrix are on a sphere of radius We have seen how the rank deficiency of the matrix M imposes conditions on the m image vectors. Here we give an interesting geometric interpretation of the value of the matrix M. In other words, we want to know all the possible points in 3-D that may give rise to the same M matrix. The answer is very clear. The coefficient that relates the two linearly dependent columns of M corresponds to the depth of the 3-D point. On the other hand, any point at the same distance from the center of the reference camera frame may give the same M matrix. Hence each M matrix corresponds to a sphere around the camera center. If you have the M matrix relative to a different reference camera frame, you obtain a second sphere. The two spheres intersect in a circle on which the point p must lie.

POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK Now let us see how deep the rabbit hole runs along the rank condition.

GENERALIZATION – Line Feature
Homogeneous representation of a 3-D line Homogeneous representation of its 2-D co-image Projection of a 3-D line to an image plane First let us talk about line features. To describe a line in 3-D, we need to specify a base point on the line and a vector indicating the direction of the line. On the image plane we can use a three dimensional vector l to describe the image of a line L. More specifically, if x is the image of a point on this line, its inner product with l is 0.

GENERALIZATION – Multiple View Matrix: Line v.s. Point
Point Features Line Features We may then derive an M matrix for a line feature, which we call M_l. Like the M matrix for a point feature, now denoted M_p, it has rank 1. M_l is, however, a matrix of four columns. The linear dependency between any two rows of M_l gives rise to the trilinear constraints in terms of image lines, which are well known in the literature. The geometric interpretation of the M_l matrix is no longer a sphere, but a circle. This circle gives the family of parallel lines which may give the same M_l matrix. The radius of the circle is the distance of these lines from the center of the reference camera frame, and the normal of the circle is the direction of these lines.

GENERALIZATION – Rank Conditions: Line v.s. Point
We continue with the constraints that the rank of the matrix M_l imposes on multiple images of a line feature. If rank(M_l) is 1, all the planes through each camera center and its image line intersect in a unique line in 3-D. If the rank is 0, we are in the only degenerate case: all the planes are the same, hence the 3-D line is determined only up to a plane on which all the camera centers must lie.
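A numerical sketch of the line case follows. The form of M_l used here, with rows [l_i^T R_i l_1^, l_i^T T_i] for i = 2, …, m, is an assumption chosen to match the description above (four columns, one row per additional view); the helper names and numeric values are illustrative.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def line_coimage(R, T, Xo, v):
    """Unit normal of the plane through the camera center and the line {Xo + mu * v}."""
    l = np.cross(R @ Xo + T, R @ v)
    return l / np.linalg.norm(l)

def build_M_line(Rs, Ts, ls):
    """Rows [l_i^T R_i hat(l_1), l_i^T T_i] for i = 2..m; shape (m-1, 4)."""
    l1 = ls[0]
    rows = [np.concatenate([l @ R @ hat(l1), [l @ T]])
            for R, T, l in zip(Rs[1:], Ts[1:], ls[1:])]
    return np.array(rows)

# Hypothetical cameras (frame 1 is the reference) and a 3-D line.
Rs = [np.eye(3), rot_y(0.2), rot_y(-0.3), rot_y(0.1)]
Ts = [np.zeros(3), np.array([-1.0, 0.1, 0.2]),
      np.array([1.0, -0.2, 0.1]), np.array([0.2, 0.8, -0.1])]
Xo, v = np.array([0.5, 0.2, 4.0]), np.array([1.0, 0.5, 0.2])

ls = [line_coimage(R, T, Xo, v) for R, T in zip(Rs, Ts)]
rank_true = np.linalg.matrix_rank(build_M_line(Rs, Ts, ls), tol=1e-9)

# Replace one co-image by that of an unrelated line: the rank condition fails.
ls_bad = list(ls)
ls_bad[2] = line_coimage(Rs[2], Ts[2], np.array([-0.5, 0.4, 5.0]), np.array([0.2, 1.0, -0.1]))
rank_bad = np.linalg.matrix_rank(build_M_line(Rs, Ts, ls_bad), tol=1e-9)
```

In this setup every row of M_l is proportional to the same vector built from the line direction, which is why the rank is 1 for true correspondences.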

GENERALIZATION – Incidence Relations Among Features
Incidence conditions: inclusion, intersection, and restriction. “nothing but incidence conditions” Now consider multiple images of a simple object, say a cube. All the constraints are incidence relations, and they are all of the same nature. Is there any way to express all the constraints in a unified way? Yes, there is.

GENERALIZATION – What is the Matrix?
Theorem 1 [The Universal Rank Condition] for images of a point on a line: Multi-nonlinear constraints among 3, 4-wise images. Multi-linear constraints among 2, 3-wise images. Here comes my favorite slide. Consider multiple images of a point on a line. In order to express all the incidence constraints associated to these features and all their images, you can formally define a matrix M as follows, where the components D_i and D_i^perp are up to you to choose and plug in. The D_i are the images of either the point or the line, and the D_i^perp are the coimages. The choices for D_i and D_j are independent if i is not equal to j. The interesting thing is that, no matter what you choose, the rank of the resulting matrix must fall into one of the two cases. The second case gives you all the multilinear constraints, of course only up to 2 or 3 views; if you really want to know what holds among four views, there are some nonlinear constraints. We will see a few examples and give you some intuitive ideas…

GENERALIZATION – Implications of the Rank Condition
Examples: Case 1: a line reference Case 2: a point reference All previously known constraints are the theorem’s instantiations. It implies more constraints and is now complete. Degenerate configurations occur if and only if the rank drops. Here we take two instantiations of the multiple view matrix.

GENERALIZATION – Global Multiple-View Analysis (Example)
The rank condition actually allows you to do multiple view analysis globally. For example, suppose you choose a multiple view matrix as follows. Its rank can only be 1 or 2. Corresponding to each value, there is a generic picture of the configuration of the 3D features involved and the relative camera configuration.

GENERALIZATION – A Family of Incidental Lines (A Corollary)
each can randomly take the image of any of the lines: Nonlinear constraints among up to four views What is the essential meaning of this rank 2 case? Here is an example explaining it. Consider a family of lines in 3D intersecting at one point p. You then randomly choose the image of any of the lines in the family in each view and form a multiple view matrix. Then this matrix in general has rank 2. We know before that if all the images chosen happen to correspond to the same 3D line, the rank of M is 1. Here, you don’t need to have exact correspondence among those lines, yet you still get some non-trivial constraints among their images… . . .

GENERALIZATION – Restriction to a Plane (A Corollary)
Homogeneous representation of a 3-D plane Corollary [Coplanar Features] Rank conditions on the new extended M remain exactly the same! We have talked about points and lines; what about planes? Suppose now that the point or the line feature belongs to some plane in 3D, say pi. We know that in general such a plane can be expressed by the following equation. We may lump all the coefficients into a vector pi: pi^1 is just the first three components and pi^2 is d. The rank condition we had before must then be modified somehow, since we now have this extra restriction. It turns out that all you need to do is append an extra row to the multiple view matrix, and then all the rank conditions remain exactly the same.

GENERALIZATION – Multiple View Matrix: Coplanar Features
Given that a point and line features lie on a plane in 3-D space: In addition to previous constraints, it simultaneously gives homography: For example, the multiple view matrices for m images of a point or a line on such a plane are given by the following two matrices. According to the theorem, they both should have rank 1. What this extra row gives you is exactly the so called homography in computer vision literature. In addition to the homography, the multiple view matrix keeps all the constraints.
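The coplanar point case can be sketched numerically. The extra row [pi^1 x_1, pi^2] and the homography form R_i − T_i pi^1 / pi^2 are assumptions derived from the plane equation pi^1 X + pi^2 = 0 in the reference frame together with the point matrix reconstruction Pi_1 = [I, 0], Pi_i = [R_i, T_i]; the plane and the cameras are made-up values.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(R, T, X):
    """Image of the 3-D point X (frame-1 coordinates), normalized so z = 1."""
    y = R @ X + T
    return y / y[2]

# Hypothetical plane pi = [pi1, pi2] and a point on it (frame-1 coordinates).
pi1, pi2 = np.array([0.1, -0.2, 1.0]), -4.0
X = np.array([0.4, -0.3, 0.0])
X[2] = -(pi2 + pi1[:2] @ X[:2]) / pi1[2]      # choose Z so that pi1 . X + pi2 = 0

Rs = [rot_y(0.2), rot_y(-0.3)]
Ts = [np.array([-1.0, 0.1, 0.2]), np.array([1.0, -0.2, 0.1])]
x1 = X / X[2]
xs = [project(R, T, X) for R, T in zip(Rs, Ts)]

# Point multiple view matrix with the extra planar row appended.
blocks = [np.column_stack([hat(x) @ R @ x1, hat(x) @ T]) for R, T, x in zip(Rs, Ts, xs)]
M_ext = np.vstack(blocks + [np.array([[pi1 @ x1, pi2]])])
rank_ext = np.linalg.matrix_rank(M_ext, tol=1e-9)

# Homography residuals x_i^ (R_i - T_i pi1 / pi2) x_1, one 3-vector per view.
residuals = [hat(x) @ (R - np.outer(T, pi1) / pi2) @ x1 for R, T, x in zip(Rs, Ts, xs)]
```

The homography appears when the extra row is paired with any block of three rows: linear dependence of those rows is exactly the residual equation above.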

GENERALIZATION – Coplanar Point/Line Duality
On the plane any two points determine a line and any two lines determine a point. Corollary [Point/Line Duality] For planar features, points and lines are hence equivalent! Here is just a simple result which shows how the duality between coplanar points and lines is revealed, or kept, through the rank condition. However, I don’t really like these new multiple view matrices for the planar case. Why? Because these multiple view matrices depend explicitly on the vector pi, which is the 3D location of the plane. Usually we only know that points and lines are on a plane, but we do not know what pi is. This raises a question: can we express the coplanar condition intrinsically using the rank condition? The answer is yes.

GENERALIZATION – Intrinsic Rank Condition for Planar Features
Multi-quadratic equations in given images Suppose you have four coplanar points. Then take any two pairs of points and each pair should give you a virtual line through them. These two virtual lines then intersect at a virtual point. Although you do not physically see these lines and point on the images, their location on the image plane can however be computed from the images of these four points. From the old rank condition, we know how to express the incidence condition that two lines intersect at a point. That is to build a multiple view matrix of the form, and it must have rank 1. This leads to some multi-quadratic equations in given images of the four points. Taking different pairs of 4 coplanar points, you can get a total of 3 virtual points. Rank conditions like this associated to these 7 points are then equivalent to all the homographic constraints. 4 coplanar points + 3 virtual points = 7 effective points

GENERALIZATION – Euclidean Imbedding of Dynamic Scenes
Before: Now: This is a perspective projection from R^n to R^2. Time base We have seen that the rank conditions really capture all the geometric relationships among multiple images of a static scene. What about situations like these, where there are multiple independent objects or links in a scene? The previous imaging model no longer applies here because, even with respect to the world reference frame, the coordinates of each point may be time-varying. In situations where you can find a so-called time base such that X(t) is just a linear combination of these base functions, you can substitute this expression in, lump the time base with the projection matrix Pi, and call the resulting matrix Pi-bar. We call this process Euclidean imbedding. As a result, you end up with a non-traditional projection from R^n to R^2. What are all the constraints among multiple images generated by such a projection?

GENERALIZATION – Rank Condition in Space of High Dimension
Theorem 2 [Generalized Rank Condition for projection from R^n to R^k] In fact, mathematically we can study the most general case and consider the projection from R^n to R^k with k < n. Then, if you have multiple images of a hyperplane lying inside another, you get a natural generalization of the rank condition we had for classic multiple view geometry. Just notice that if n = 3, k = 2, and we take a point lying on a line, this is exactly the same theorem we had before!

GENERALIZATION – Rank Conditions for Curves & Surfaces
Differentiating the matrix of a point (moving) along a curve: gives rise to a rank condition for curve. intensity level sets region boundaries Of course, using the point, line or plane as basic ingredients it is almost trivial to find out what are the rank conditions for a 3D curve or surface. Here I don’t want to get into the detail. But just want to say that multiple view geometry may surely be used to understand shape from shading. . . .

GENERALIZATION – From Tangent to Point-wise Correspondence
gives rise to a set of ordinary differential equations: The rank deficiency condition for relates points and tangent lines of a curve. Solving these equations establishes point-wise correspondence for image curves and in fact eventually for surface as well. gives constraints on curvature and normals of image curves. Well this slide is just the continuation on the curve story…

POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK

APPLICATIONS – Multiple View Matching Test
Given the projection matrix associated to camera frames. Then for vectors The first problem is how to tell whether a set of m vectors could be images of some 3-D point relative to a given set of m camera frames.
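One way to turn the rank condition into a matching test is to look at the second singular value of the point multiple view matrix. This is a minimal sketch, assuming Pi_1 = [I, 0] and Pi_i = [R_i, T_i]; the function names `build_M` and `is_match`, the tolerance and the numeric values are illustrative choices, not part of the talk.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(R, T, X):
    """Image of the 3-D point X (frame-1 coordinates), normalized so z = 1."""
    y = R @ X + T
    return y / y[2]

def build_M(Rs, Ts, xs):
    """Stack [x_i^ R_i x_1, x_i^ T_i] for i = 2..m; shape (3(m-1), 2)."""
    x1 = xs[0]
    blocks = [np.column_stack([hat(x) @ R @ x1, hat(x) @ T])
              for R, T, x in zip(Rs[1:], Ts[1:], xs[1:])]
    return np.vstack(blocks)

def is_match(Rs, Ts, xs, tol=1e-8):
    """True if rank(M) <= 1 numerically, i.e. the second singular value is negligible."""
    s = np.linalg.svd(build_M(Rs, Ts, xs), compute_uv=False)
    return bool(s[1] <= tol * max(s[0], 1.0))

# Hypothetical cameras (frame 1 is the reference) and two 3-D points.
Rs = [np.eye(3), rot_y(0.2), rot_y(-0.3), rot_y(0.1)]
Ts = [np.zeros(3), np.array([-1.0, 0.1, 0.2]),
      np.array([1.0, -0.2, 0.1]), np.array([0.2, 0.8, -0.1])]
X, X_other = np.array([0.4, -0.3, 4.0]), np.array([-0.6, 0.5, 5.0])

xs = [project(R, T, X) for R, T in zip(Rs, Ts)]
xs_bad = list(xs)
xs_bad[2] = project(Rs[2], Ts[2], X_other)   # wrong correspondence in the third view

good = is_match(Rs, Ts, xs)
bad = is_match(Rs, Ts, xs_bad)
```

With noisy images the tolerance would have to be tuned to the noise level; the fixed value here only suits exact synthetic data.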

APPLICATIONS – Transferring Images to a New View
Given the projection matrix associated to camera frames. Then for given vectors So given images, rank deficiency adds a linear constraint on the image. Computing the kernel of gives the new image.
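Image transfer can be sketched along the same lines. This is one possible reading of the slide, under the same assumption Pi_1 = [I, 0], Pi_i = [R_i, T_i]: the kernel of M built from the known views is proportional to [lambda_1, 1], and the row block of the new view then forces the new image to be parallel to lambda_1 R x_1 + T. The function name `transfer` and the numbers are illustrative.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(R, T, X):
    """Image of the 3-D point X (frame-1 coordinates), normalized so z = 1."""
    y = R @ X + T
    return y / y[2]

def transfer(Rs, Ts, xs, R_new, T_new):
    """Predict the image in a new view from matched images in the known views."""
    x1 = xs[0]
    M = np.vstack([np.column_stack([hat(x) @ R @ x1, hat(x) @ T])
                   for R, T, x in zip(Rs[1:], Ts[1:], xs[1:])])
    k = np.linalg.svd(M)[2][-1]      # kernel direction, proportional to [lambda_1, 1]
    lam1 = k[0] / k[1]               # depth of the point in the reference frame
    y = lam1 * (R_new @ x1) + T_new  # the new image must be parallel to this vector
    return y / y[2]

# Hypothetical known views (frame 1 is the reference) and a new view.
Rs = [np.eye(3), rot_y(0.2), rot_y(-0.3)]
Ts = [np.zeros(3), np.array([-1.0, 0.1, 0.2]), np.array([1.0, -0.2, 0.1])]
R_new, T_new = rot_y(0.1), np.array([0.2, 0.8, -0.1])
X = np.array([0.4, -0.3, 4.0])

xs = [project(R, T, X) for R, T in zip(Rs, Ts)]
x_new = transfer(Rs, Ts, xs, R_new, T_new)
x_expected = project(R_new, T_new, X)
```

Using all known views in one SVD, rather than a single pair, is what makes the estimate of the depth a least-squares one once the images are noisy.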

APPLICATIONS – Motion and Structure from Multiple Views
Given images of points: The third problem, also the most important one, is how to recover the camera configuration from m given images of a set of n points. Using the rank deficiency condition on M, the problem becomes looking for T2, R2, …, Tm, Rm such that the two columns of M are linearly dependent. That is, there exist coefficients alpha^j such that the equations hold. I won’t go into the details of how to determine those coefficients alpha^j without knowing all the camera motions. For now, I will only tell you that their values depend on a choice of coordinate frames. These frames could be Euclidean, affine or projective. Assuming these coefficients are known, finding the camera configuration simply becomes a problem of solving a linear equation. This equation has a unique solution if we have, in general, more than 6 points.
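The linear step can be illustrated as follows. This sketch assumes the equations take the form x_i^ (R_i x_1 + alpha T_i) = 0 for each point, with alpha taken to be the true inverse depth 1/lambda_1 in a Euclidean frame; that choice is an assumption made so the example is checkable, since the talk does not spell out how the alphas are obtained. The solver name and the data are illustrative.

```python
import numpy as np

def hat(u):
    """3x3 skew-symmetric matrix with hat(u) @ v == np.cross(u, v)."""
    return np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])

def rot_y(theta):
    """Rotation by angle theta (radians) about the y-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def solve_motion(x1s, xis, alphas):
    """Solve x_i^ (R x_1 + alpha T) = 0 over all points for R and T (null vector of a 3n x 12 system)."""
    rows = [np.hstack([np.kron(x1[None, :], hat(xi)), a * hat(xi)])
            for x1, xi, a in zip(x1s, xis, alphas)]
    P = np.vstack(rows)                          # unknowns: [vec(R) column-major; T]
    p = np.linalg.svd(P)[2][-1]                  # null vector, defined up to scale
    p = p / np.cbrt(np.linalg.det(p[:9].reshape(3, 3, order="F")))  # scale so det(R) = 1
    return p[:9].reshape(3, 3, order="F"), p[9:]

# Hypothetical data: eight points in front of the reference camera.
rng = np.random.default_rng(0)
Xs = np.column_stack([rng.uniform(-1, 1, 8), rng.uniform(-1, 1, 8), rng.uniform(3, 6, 8)])
R_true, T_true = rot_y(0.2), np.array([-1.0, 0.1, 0.2])

x1s = Xs / Xs[:, 2:3]                            # images in view 1 (z = 1)
alphas = 1.0 / Xs[:, 2]                          # assumed known: inverse depths
Ys = Xs @ R_true.T + T_true
xis = Ys / Ys[:, 2:3]                            # images in view i (z = 1)

R_est, T_est = solve_motion(x1s, xis, alphas)
```

Each point contributes two independent equations for twelve unknowns defined up to scale, which matches the requirement of at least six points in general position.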

APPLICATIONS – SVD Based Four Step Algorithm for SFM
Here is the outline of this algorithm. The only thing I’d like to point out is that the algorithm is initialized by a two view algorithm due to the fact that non-trivial constraints for point features start with two views.

APPLICATIONS – Utilize All Incidence Conditions (Example)
Three edges intersect at each vertex. . . .

APPLICATIONS – Utilizing All Incidence Conditions (Simulations)

APPLICATIONS – Landing a Helicopter (Experiments)
Images courtesy of Omid Shakernia, UCB

POINT FEATURE: multilinear constraints v.s. rank conditions GENERALIZATION: line, plane, space of higher dimensions APPLICATIONS: matching, transfer, structure from motion CONCLUSIONS AND ON-GOING WORK

CONCLUSIONS AND ON-GOING WORK
Rank condition simplifies, unifies and completes existing algebraic results on multiview constraints (no tensor and algebraic geometry). Rank condition is for all features, all incidence relations, all number of views, all types of projection, all (linear) spaces of arbitrary dimensions. Rank condition intrinsically ties together geometry and algebra. Metric multiple view geometry. Consistent, optimal and robust algorithms for correspondence, image-based view synthesis, and reconstruction of motion & structure. Apply to dynamical scenes: human motion; sensor networks… Real-time algorithms for autonomous navigation and robotic control.

Rank Conditions in Multiple View Geometry
by Rene Vidal (EECS.UCB), Kun Huang (ECE.UIUC), Jana Kosecka (CS.GMU), Robert Fossum (Math.UIUC). Since we are not able to publish our papers in any of the mainstream computer vision conferences, we submitted our papers directly to journals. In the meantime, we are writing a book on the geometry of multiple images. We hope that it will change the traditional view of multiple view geometry. “Rank conditions on the multiple view matrix”, submitted to IJCV. “General rank conditions in multiple view geometry”, submitted to D&CG. “An invitation to 3-D vision”, Ma, Soatto, Kosecka, and Sastry, 2002.
