© 2003 by Davi Geiger. Computer Vision, October 2003. Structure-from-EgoMotion (based on notes from David Jacobs, CS-Maryland).

L1.1 Structure-from-EgoMotion (based on notes from David Jacobs, CS-Maryland)
Determining the 3-D structure of the world, and/or the motion of a camera, using a sequence of images taken by a moving camera.
– Equivalently, we can think of the world as moving and the camera as fixed.
Like stereo, but the position of the camera isn’t known (and it’s more natural to use many images with little motion between them, not just two with a lot of motion).
– We may or may not assume we know the parameters of the camera, such as its focal length.

L1.2 Structure-from-EgoMotion …
As with stereo, we can divide the problem:
– Correspondence.
– Reconstruction.
We will focus on the reconstruction.
– So we assume that each image contains some points, and we know which points match which.

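L1.3 Representation
We’ll talk about a fixed camera and a moving object. We use scaled orthographic projection (weak perspective).
– We remove the z coordinate and scale all x and y coordinates by the same amount.
Key point: the image is a matrix product.
– Points: P = [x1, ..., xn; y1, ..., yn; z1, ..., zn; 1, ..., 1], a 4 x n matrix with one point per column in homogeneous coordinates.
– Some matrix: S, a 2 x 4 matrix.
– The image: I, a 2 x n matrix with one image point per column.
Then: I = S * P.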

L1.4 Examples
S = [s, 0, 0, 0; 0, s, 0, 0];
This is just projection (3D to 2D), with scaling by s.
S = [s, 0, 0, s*tx; 0, s, 0, s*ty];
This is 3D translation by (tx, ty, anything), projection, and scaling.
S encodes:
– Projection, scaling, translation (by tx and ty), and rotation.
– Let us now derive S.
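The two example matrices can be tried on a couple of points. This is a minimal Python sketch, assuming the points are stored as columns of a 4 x n matrix in homogeneous coordinates (last row all ones); the helper names `matmul` and `weak_perspective` and the sample coordinates are illustrative and not from the slides.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def weak_perspective(s, tx, ty):
    """2 x 4 matrix S for scale s and translation (tx, ty), with no rotation."""
    return [[s, 0, 0, s * tx],
            [0, s, 0, s * ty]]

# Two 3D points, (1, 3, 5) and (2, 4, 6), as homogeneous columns of P.
P = [[1, 2],
     [3, 4],
     [5, 6],
     [1, 1]]

image_scaled = matmul(weak_perspective(2, 0, 0), P)    # projection and scaling only
image_shifted = matmul(weak_perspective(2, 1, -1), P)  # translation, projection, scaling

print(image_scaled)   # [[2, 4], [6, 8]]
print(image_shifted)  # [[4, 6], [4, 6]]
```

The z coordinates (5 and 6) never reach the image, which is why the z component of the translation can be "anything".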

L1.5 Rotation
R can represent a 3D rotation of the points in P. What are the constraints on R?
First, look at 2D rotation (easier): R = [cos(theta), -sin(theta); sin(theta), cos(theta)].
R * R^T = Identity. R^T is also a rotation matrix, in the opposite direction to R.
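Both properties of a 2D rotation can be checked numerically. The sketch below assumes the usual counterclockwise convention for the rotation matrix; the 30-degree angle and the helper names are arbitrary illustrative choices.

```python
import math

def rot2d(theta):
    """2D rotation by theta radians (counterclockwise)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def close(a, b, tol=1e-12):
    """True when two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

R = rot2d(math.radians(30))
# R times its transpose is the identity.
identity_check = close(matmul(R, transpose(R)), [[1, 0], [0, 1]])
# The transpose is the rotation by the opposite angle.
inverse_check = close(transpose(R), rot2d(math.radians(-30)))
```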

L1.6 Full 3D Rotation
Any rotation can be expressed as a combination of three rotations about three axes. Rows (and columns) of R are orthonormal vectors. R has determinant 1 (not -1).
Rotation about the z axis: R = [cos(theta), -sin(theta), 0; sin(theta), cos(theta), 0; 0, 0, 1]. Rotates x,y coordinates. Leaves z coordinates fixed.

L1.7
Intuitively, it makes sense that 3D rotations can be expressed as 3 separate rotations about fixed axes. Rotations have 3 degrees of freedom; two describe an axis of rotation, and one the amount.
Rotations preserve the length of a vector, and the angle between two vectors. Therefore, (1,0,0), (0,1,0), (0,0,1) must be orthonormal after rotation. After rotation, they are the three columns of R. So these columns must be orthonormal vectors for R to be a rotation.
Similarly, if they are orthonormal vectors (with determinant 1), R will have the effect of rotating (1,0,0), (0,1,0), (0,0,1). The same reasoning as in 2D tells us all other points rotate too.
Note that if R has determinant -1, then R is a rotation plus a reflection.
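These constraints can be checked on a rotation built from three axis rotations. This is a sketch under assumed conventions: the axis order (x, then y, then z), the angles, and all function names are illustrative choices and not taken from the slides.

```python
import math

def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_orthonormal(m, tol=1e-12):
    """True when the columns of m are unit length and mutually orthogonal."""
    gram = matmul(transpose(m), m)
    return all(abs(gram[i][j] - (1 if i == j else 0)) < tol
               for i in range(3) for j in range(3))

# A general rotation: rotate about x, then y, then z.
R = matmul(rot_z(0.3), matmul(rot_y(-1.1), rot_x(2.0)))
# Negating one column keeps the columns orthonormal but adds a reflection.
mirrored = [[-row[0], row[1], row[2]] for row in R]
```

Here `det3(R)` comes out as 1 and `det3(mirrored)` as -1, up to floating-point rounding, while both matrices pass `is_orthonormal`.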

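L1.8 S: Putting it Together
S = s * [1, 0, 0, 0; 0, 1, 0, 0] * [1, 0, 0, tx; 0, 1, 0, ty; 0, 0, 1, tz; 0, 0, 0, 1] * [r11, r12, r13, 0; r21, r22, r23, 0; r31, r32, r33, 0; 0, 0, 0, 1]
The factors are, from left to right: scale, projection, 3D translation, 3D rotation.
Multiplying out: S = [s*r11, s*r12, s*r13, s*tx; s*r21, s*r22, s*r23, s*ty] = [s11, s12, s13, s*tx; s21, s22, s23, s*ty].
We can just write s*tx as tx and s*ty as ty.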
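The product of the four factors can be multiplied out directly. The sketch below assumes the rotation is applied first and the translation second (rotation as the rightmost factor), and uses a 90-degree rotation about z so that all entries stay integers; the numeric values and variable names are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

s, tx, ty, tz = 2, 3, 4, 5

scale_and_project = [[s, 0, 0, 0],
                     [0, s, 0, 0]]      # drop z, then scale x and y by s
translate = [[1, 0, 0, tx],
             [0, 1, 0, ty],
             [0, 0, 1, tz],
             [0, 0, 0, 1]]
rotate = [[0, -1, 0, 0],                # 90-degree rotation about z,
          [1, 0, 0, 0],                 # written as a 4 x 4 homogeneous matrix
          [0, 0, 1, 0],
          [0, 0, 0, 1]]

S = matmul(scale_and_project, matmul(translate, rotate))
print(S)  # [[0, -2, 0, 6], [2, 0, 0, 8]]
```

The result is the top two rows of the rotation scaled by s, followed by a column (s*tx, s*ty); tz does not appear anywhere in S.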

L1.9 Affine Structure from Motion

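L1.10 Affine Structure-from-Motion: Two Frames (1)
With two frames, I is 4 x n (two rows per image) and S is 4 x 4 with rows [si1, si2, si3, ti], so that I = S * P.
To simplify, suppose for the first four points that they are the origin and the three unit points on the axes, i.e., the first four columns of P are [0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1; 1, 1, 1, 1].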

L1.11 Affine Structure-from-Motion: Two Frames (2)
Looking at the first four points, we get a system in which the images of those points equal the motion matrix times the matrix of points. We can solve for the motion by inverting the matrix of points. Or, explicitly, we see that the first column on the left (the images of the first point) gives the translations. After solving for these, we can solve for each column of the s components of the motion using the images of each point, in turn.
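The explicit solution can be sketched in a few lines. This example assumes that the four simple points are the origin and the three unit points on the axes, which is what the remark about the first column giving the translations implies; the motion values in `S_true` and the function name `recover_motion` are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical two-frame affine motion: rows are [si1, si2, si3, ti].
S_true = [[1, 2, 3, 10],
          [4, 5, 6, 20],
          [7, 8, 9, 30],
          [1, 0, 2, 40]]

# Assumed simple points: origin, then unit points on x, y, z (homogeneous columns).
P4 = [[0, 1, 0, 0],
      [0, 0, 1, 0],
      [0, 0, 0, 1],
      [1, 1, 1, 1]]

I4 = matmul(S_true, P4)  # the images of the four points in both frames

def recover_motion(images):
    """Read the motion off the images of the four simple points.

    The image of the origin is the translation; subtracting it from the
    image of each unit point leaves one column of the s components.
    """
    return [[row[1] - row[0], row[2] - row[0], row[3] - row[0], row[0]]
            for row in images]

S_recovered = recover_motion(I4)
```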

L1.12 Affine Structure-from-Motion: Two Frames (3)
Once we know the motion, we can use the images of another point to solve for the structure. We have four linear equations (two image coordinates in each of the two frames), with three unknowns (the 3D coordinates of the point).
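With four equations and three unknowns the system is overdetermined, so one reasonable way to solve it is least squares. The sketch below uses the normal equations with exact rational arithmetic; this choice of solver, the function name `solve_structure`, and the motion values are assumptions made for illustration, since the slides do not say how the equations are solved.

```python
from fractions import Fraction

def solve_structure(S, image_point):
    """Least-squares 3D point from its image coordinates in all frames.

    S has one row [si1, si2, si3, ti] per image coordinate; image_point
    holds the matching measured coordinates.
    """
    rows = range(len(S))
    M = [[Fraction(v) for v in row[:3]] for row in S]
    rhs = [Fraction(u) - row[3] for u, row in zip(image_point, S)]
    # Normal equations (M^T M) x = M^T rhs, stored as an augmented 3 x 4 system.
    aug = [[sum(M[r][i] * M[r][j] for r in rows) for j in range(3)]
           + [sum(M[r][i] * rhs[r] for r in rows)]
           for i in range(3)]
    # Gauss-Jordan elimination on the augmented system.
    for col in range(3):
        pivot = next(r for r in range(col, 3) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(3):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[3] for row in aug]

# Hypothetical known two-frame motion and the images of one further point.
S = [[1, 2, 3, 10],
     [4, 5, 6, 20],
     [7, 8, 9, 30],
     [1, 0, 2, 40]]
image_point = [19, 41, 63, 48]  # images of the point (2, -1, 3) under S

structure = solve_structure(S, image_point)
```

With noise-free measurements the four equations are consistent and the point (2, -1, 3) is recovered exactly; with noisy measurements the same code returns the least-squares compromise.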

L1.13 Affine Structure-from-Motion: Two Frames (4)
Suppose we just know where the k’th point, with 3D coordinates (a_k, b_k, c_k), is in image 1. Then, we can use the first two equations to write a_k and b_k as linear in c_k. The final two equations lead to two linear equations in the missing values and c_k. If we eliminate c_k we get one linear equation in the missing values. This means the unknown point lies on a known line. That is, we recover the epipolar constraint. Furthermore, these lines are all parallel.
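The elimination can be written out as a short derivation. The notation here is assumed for illustration: the rows of the two-frame motion are $(s_{i1}, s_{i2}, s_{i3}, t_i)$ for $i = 1, \dots, 4$, the known image-1 coordinates are $(u_k, v_k)$, the missing image-2 coordinates are $(u'_k, v'_k)$, and the 2 x 2 block $M$ is taken to be invertible.

```latex
\begin{align*}
\begin{pmatrix} u_k \\ v_k \end{pmatrix}
  &= M \begin{pmatrix} a_k \\ b_k \end{pmatrix}
   + c_k \begin{pmatrix} s_{13} \\ s_{23} \end{pmatrix}
   + \begin{pmatrix} t_1 \\ t_2 \end{pmatrix},
  & M &= \begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix} \\
\begin{pmatrix} a_k \\ b_k \end{pmatrix}
  &= M^{-1} \begin{pmatrix} u_k - t_1 \\ v_k - t_2 \end{pmatrix}
   - c_k \, M^{-1} \begin{pmatrix} s_{13} \\ s_{23} \end{pmatrix} \\
\begin{pmatrix} u'_k \\ v'_k \end{pmatrix}
  &= \begin{pmatrix} p_k \\ r_k \end{pmatrix}
   + c_k \begin{pmatrix} q \\ w \end{pmatrix},
  & N &= \begin{pmatrix} s_{31} & s_{32} \\ s_{41} & s_{42} \end{pmatrix} \\
\begin{pmatrix} p_k \\ r_k \end{pmatrix}
  &= N M^{-1} \begin{pmatrix} u_k - t_1 \\ v_k - t_2 \end{pmatrix}
   + \begin{pmatrix} t_3 \\ t_4 \end{pmatrix},
  & \begin{pmatrix} q \\ w \end{pmatrix}
  &= \begin{pmatrix} s_{33} \\ s_{43} \end{pmatrix}
   - N M^{-1} \begin{pmatrix} s_{13} \\ s_{23} \end{pmatrix} \\
0 &= w \, (u'_k - p_k) - q \, (v'_k - r_k)
\end{align*}
```

The last line is the one linear equation in the missing values. Its direction $(q, w)$ depends only on the motion, not on $k$, which is why the lines for different points are all parallel; only the offset $(p_k, r_k)$ changes from point to point.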

L1.14 Affine Structure-from-Motion: Two Frames (5)
But, what if the first four points aren’t so simple? Then we define A, an affine transformation, so that A maps the first four points to the simple configuration. This is always possible as long as the points aren’t coplanar.
Writing A as a 4 x 4 matrix in homogeneous coordinates with last row [0, 0, 0, 1], note that A corresponds to a translation of the points, plus a linear transformation.

L1.15 Affine Structure-from-Motion: Two Frames (6)
We have: I = S * P.
And: A * P has the simple first four points.
Then: I = (S * inv(A)) * (A * P), where S * inv(A) is our motion and A * P is our structure.
Thus, we can never determine the exact 3D structure of the scene. We can only determine it up to some transformation, A.
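The ambiguity is easy to see numerically: a different motion and a different structure produce exactly the same images. In this sketch the motion `S`, the points `P`, and the affine map `A` (with its inverse written out by hand) are arbitrary illustrative values, and exact fractions are used so that the comparison is exact.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

S = [[1, 2, 3, 10],      # hypothetical two-frame motion
     [4, 5, 6, 20],
     [7, 8, 9, 30],
     [1, 0, 2, 40]]
P = [[0, 1, 2, 5],       # four 3D points as homogeneous columns
     [1, 0, 3, 0],
     [2, 2, 0, 4],
     [1, 1, 1, 1]]

# An affine map of the points (a linear part plus a translation) and its inverse.
A = [[2, 1, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
A_inv = [[F(1, 2), F(-1, 2), 0, F(-1, 2)],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
identity = [[int(i == j) for j in range(4)] for i in range(4)]

image = matmul(S, P)
other_motion = matmul(S, A_inv)    # S * inv(A)
other_structure = matmul(A, P)     # A * P
same_image = matmul(other_motion, other_structure) == image
```

Since `same_image` holds for any invertible affine `A`, the images alone cannot distinguish `P` from `A * P`.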

L1.16 Affine Structure-from-Motion: Many Frames (1)
I = S * P, where, for f frames and n points, I is 2f x n, S is 2f x 4, and P is 4 x n.

L1.17 First Step: Solve for Translation (1)
We pick the center of mass as the origin, i.e., the average of all 3D points. Averaging also averages out the noise in the point locations. Rotation doesn’t move the origin, which is now the center of mass. Neither does scaled orthographic projection.

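L1.18 First Step: Solve for Translation (2)
Thus, translation can be eliminated. The mean of each row of I is the image of the center of mass, which is that row’s translation, so subtracting each row’s mean from I leaves I = S * P with S of size 2f x 3 and P of size 3 x n.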
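The effect of centering can be verified on a small example. This sketch uses one frame, exact fractions, and made-up values for the 2 x 3 motion part `M`, the translation `t`, and the points; `center_rows` is an illustrative helper name.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def center_rows(m):
    """Subtract each row's mean from that row."""
    centered = []
    for row in m:
        mean = Fraction(sum(row), len(row))
        centered.append([v - mean for v in row])
    return centered

M = [[1, 2, 3],
     [4, 5, 6]]            # rotation-and-scale part of one frame (arbitrary values)
t = [10, 20]               # translation of that frame
points = [[0, 1, 2, 5],
          [1, 0, 3, 0],
          [2, 2, 0, 4]]    # four 3D points as columns

# Image with translation: every column of M * points is shifted by t.
image = [[v + ti for v in row] for row, ti in zip(matmul(M, points), t)]

centered_image = center_rows(image)
centered_points = center_rows(points)
translation_gone = centered_image == matmul(M, centered_points)
```

After centering, the image depends only on `M` and on the points measured from their center of mass; `t` has dropped out.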

L1.19 Rank Theorem
I has rank 3. This means there are 3 vectors such that every row of I is a linear combination of these vectors. These vectors are the rows of P, since I = S * P.
SVD is made to do this: I = U * D * V. D is diagonal with non-increasing values; select the first (top) three values, i.e., make D a 3 x 3 matrix. U is then 2f x 3 with orthonormal columns, and V is 3 x n with orthonormal rows.
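The rank claim can be checked on a small synthetic example. To stay within the Python standard library, this sketch computes the rank by exact Gaussian elimination rather than by SVD, so it illustrates the Rank Theorem itself and not the SVD factorization; the motion `S` (three frames), the centered points `P`, and the function names are made up for illustration.

```python
from fractions import Fraction

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(matrix):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Three frames (2f = 6 rows) of an arbitrary affine motion, translation removed.
S = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 2, 3], [2, 0, 1]]
# Five non-coplanar points whose center of mass is the origin.
P = [[1, 0, 0, -1, 0],
     [0, 1, 0, -1, 0],
     [0, 0, 1, 0, -1]]

I = matmul(S, P)                      # 6 x 5 measurement matrix
noisy = [row[:] for row in I]
noisy[0][0] += Fraction(1, 100)       # perturb a single measurement
```

Here `rank(I)` is 3 even though I is 6 x 5, while `rank(noisy)` is 4: even one perturbed measurement breaks the exact rank-3 structure.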

L1.20 Linear Ambiguity (as before)
I = U(:,1:3) * D(1:3,1:3) * V(1:3,:) = (U(:,1:3) * A) * (inv(A) * D(1:3,1:3) * V(1:3,:)) for any invertible 3 x 3 matrix A.
Noise
With noise, I has full rank. The best solution is to estimate an I that’s as near to the measured I as possible, with the estimate of I having rank 3. Our current method does this.

L1.21 Weak Perspective Motion
I = S * P. Rows 2k and 2k+1 of S should be orthogonal. All rows should be unit vectors. (Push all scale into P.)
I = (U(:,1:3) * A) * (inv(A) * D(1:3,1:3) * V(1:3,:))
Choose A so that (U(:,1:3) * A) satisfies these conditions.
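One common way to choose A, sketched here as an assumption since the slides do not spell out a method, is to note that the conditions are linear in the symmetric matrix $Q = A A^{T}$. Let $u_j$ denote row $j$ of U(:,1:3), written as a 1 x 3 row vector, so that row $j$ of U(:,1:3) * A is $u_j A$.

```latex
\begin{align*}
(u_{2k} A)(u_{2k} A)^{T} &= u_{2k} \, Q \, u_{2k}^{T} = 1
  && \text{(unit length)} \\
(u_{2k+1} A)(u_{2k+1} A)^{T} &= u_{2k+1} \, Q \, u_{2k+1}^{T} = 1
  && \text{(unit length)} \\
(u_{2k} A)(u_{2k+1} A)^{T} &= u_{2k} \, Q \, u_{2k+1}^{T} = 0
  && \text{(orthogonality)}
\end{align*}
```

Each frame contributes three equations that are linear in the six distinct entries of $Q$, so $Q$ can be estimated by linear least squares over all frames. If the estimated $Q$ is positive definite, a matrix $A$ with $Q = A A^{T}$ follows from a Cholesky factorization. Any such $A$ can still be multiplied on the right by a 3D rotation without changing $Q$, so the reconstruction is fixed only up to a rotation of the whole scene.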

L1.22 Related problems we won’t cover
– Missing data.
– Points with different, known noise.
– Multiple moving objects.

L1.23 Final Messages
– Structure-from-egomotion for points can be reduced to linear algebra.
– The epipolar constraint reemerges.
– SVD is useful.
– The Rank Theorem says the images a scene produces aren’t complicated (also important for recognition).