Computer Vision – Lecture 17


Computer Vision – Lecture 17 Structure-from-Motion 17.01.2012 Bastian Leibe RWTH Aachen http://www.mmp.rwth-aachen.de leibe@umic.rwth-aachen.de Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz

Announcements Exams will be oral exams. Requirement: at least 50% of the exercise points achieved. Proposed periods: … We will send around a Doodle poll with 3-4 dates.

Course Outline Image Processing Basics Segmentation & Grouping Object Recognition Local Features & Matching Object Categorization 3D Reconstruction Epipolar Geometry and Stereo Basics Camera calibration & Uncalibrated Reconstruction Structure-from-Motion Motion and Tracking

Recap: A General Point Equations of the form Ax = 0. How do we solve them? Apply SVD (always!). The singular values of A are the square roots of the eigenvalues of ATA. The solution of Ax = 0 is the nullspace vector of A, which corresponds to the singular vector for the smallest singular value. B. Leibe
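As a quick illustration (not on the slide), the SVD recipe translates directly into NumPy; the test matrix below is constructed so that a known unit vector spans the null space:

```python
import numpy as np

def solve_homogeneous(A):
    """Least-squares solution of A x = 0 with ||x|| = 1: the right
    singular vector belonging to the smallest singular value."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Build a matrix whose rows are all orthogonal to a known unit vector,
# so that vector spans the null space of A.
x_true = np.array([1.0, 2.0, 3.0])
x_true /= np.linalg.norm(x_true)
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))
A = B - np.outer(B @ x_true, x_true)     # remove the x_true component
x = solve_homogeneous(A)
print(np.isclose(abs(x @ x_true), 1.0))  # recovered up to sign: True
```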

Recap: Properties of SVD Frobenius norm: generalization of the Euclidean norm to matrices. Partial reconstruction property of SVD: let σi, i = 1,…,N, be the singular values of A, and let Ap = UpDpVpT be the reconstruction of A when we set σp+1,…,σN to zero. Then Ap is the best rank-p approximation of A in the sense of the Frobenius norm (i.e. the best least-squares approximation). B. Leibe
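A minimal sketch of this partial-reconstruction (Eckart–Young) property; the random test matrix is an illustrative assumption:

```python
import numpy as np

def rank_p_approx(A, p):
    """Best rank-p approximation of A in the Frobenius norm,
    obtained by truncating the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :p] * s[:p]) @ Vt[:p]

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 5))
A2 = rank_p_approx(A, 2)
s = np.linalg.svd(A, compute_uv=False)
# The Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(A - A2, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[2:] ** 2))))  # True
```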

Recap: Camera Parameters Intrinsic parameters Principal point coordinates Focal length Pixel magnification factors Skew (non-rectangular pixels) Radial distortion Extrinsic parameters Rotation R Translation t (both relative to world coordinate system) Camera projection matrix  General pinhole camera: 9 DoF  CCD Camera with square pixels: 10 DoF  General camera: 11 DoF s s B. Leibe

Recap: Calibrating a Camera Goal Compute intrinsic and extrinsic parameters using observed camera data. Main idea Place “calibration object” with known geometry in the scene Get correspondences Solve for mapping from scene to image: estimate P=PintPext Xi xi Slide credit: Kristen Grauman B. Leibe

Recap: Camera Calibration (DLT Algorithm) P has 11 degrees of freedom. Two linearly independent equations per independent 2D/3D correspondence. Solve with SVD (similar to homography estimation); the solution corresponds to the smallest singular vector. 5½ correspondences are needed for a minimal solution. Slide adapted from Svetlana Lazebnik B. Leibe
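A hedged sketch of the DLT in NumPy; the synthetic camera P_true and the use of 8 correspondences (more than the 5½ minimum) are illustrative choices, not part of the slide:

```python
import numpy as np

def calibrate_dlt(X, x):
    """DLT estimate of the 3x4 projection matrix P from n >= 6
    world-image correspondences. X: (n, 3) scene points, x: (n, 2)
    image points. Solves the stacked homogeneous system with SVD."""
    rows = []
    for (Xs, Ys, Zs), (u, v) in zip(X, x):
        Xh = [Xs, Ys, Zs, 1.0]
        rows.append([0.0] * 4 + [-c for c in Xh] + [v * c for c in Xh])
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

# Synthetic check: project known points with a known P, then recover it.
P_true = np.array([[800., 0., 320., 10.],
                   [0., 800., 240., 20.],
                   [0., 0., 1., 2.]])
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (8, 3)) + np.array([0., 0., 5.])
Xh = np.hstack([X, np.ones((8, 1))])
xh = Xh @ P_true.T
x = xh[:, :2] / xh[:, 2:]
P = calibrate_dlt(X, x)
P = P / P[2, 3] * P_true[2, 3]   # fix the arbitrary overall scale
print(np.allclose(P, P_true))    # True
```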

Recap: Triangulation – Linear Algebraic Approach Two independent equations each in terms of the three unknown entries of X. Stack the equations and solve with SVD. This approach nicely generalizes to multiple cameras. Slide credit: Svetlana Lazebnik B. Leibe
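The linear triangulation step might be sketched as follows (stacking more rows generalizes it to more cameras); the two camera matrices and the test point are made up for the example:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear triangulation: stack two equations per view (from
    x cross PX = 0) and take the SVD null vector as homogeneous X."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # baseline
X_true = np.array([0.2, 0.1, 5.0])
x1 = P1 @ np.append(X_true, 1.0)
x2 = P2 @ np.append(X_true, 1.0)
x1, x2 = x1[:2] / x1[2], x2[:2] / x2[2]
print(np.allclose(triangulate(P1, P2, x1, x2), X_true))  # True
```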

Recap: Epipolar Geometry – Calibrated Case X x x’ t R Camera matrix: [I|0] X = (u, v, w, 1)T x = (u, v, w)T Camera matrix: [RT | –RTt] Vector x’ in second coord. system has coordinates Rx’ in the first one. The vectors x, t, and Rx’ are coplanar Slide credit: Svetlana Lazebnik B. Leibe

Recap: Epipolar Geometry – Calibrated Case X x x’ Essential Matrix (Longuet-Higgins, 1981) Slide credit: Svetlana Lazebnik B. Leibe

Recap: Epipolar Geometry – Uncalibrated Case X The calibration matrices K and K’ of the two cameras are unknown We can write the epipolar constraint in terms of unknown normalized coordinates: x x’ Slide credit: Svetlana Lazebnik B. Leibe

Recap: Epipolar Geometry – Uncalibrated Case X x x’ Fundamental Matrix (Faugeras and Luong, 1992) Slide credit: Svetlana Lazebnik B. Leibe

Recap: The Eight-Point Algorithm x = (u, v, 1)T, x’ = (u’, v’, 1)T. This minimizes the algebraic least-squares error. Solve using SVD. Slide adapted from Svetlana Lazebnik B. Leibe

Recap: Normalized Eight-Point Algorithm Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels. Use the eight-point algorithm to compute F from the normalized points. Enforce the rank-2 constraint using SVD: set the smallest singular value d33 to zero and reconstruct F. Transform the fundamental matrix back to original units: if T and T’ are the normalizing transformations in the two images, then the fundamental matrix in original coordinates is TT F T’. Slide credit: Svetlana Lazebnik B. Leibe [Hartley, 1995]
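The whole normalized eight-point pipeline could look like the following sketch (here normalizing the mean distance to sqrt(2), one common variant of the scaling above); the synthetic two-view setup exists only for a sanity check:

```python
import numpy as np

def normalize_points(x):
    """Similarity T: centroid to origin, mean distance sqrt(2)."""
    c = x.mean(axis=0)
    s = np.sqrt(2) / np.sqrt(((x - c) ** 2).sum(axis=1)).mean()
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (x - c) * s, T

def eight_point(x1, x2):
    """Normalized eight-point estimate of F from n >= 8 matches,
    with the rank-2 constraint enforced via SVD."""
    x1n, T1 = normalize_points(x1)
    x2n, T2 = normalize_points(x2)
    u1, v1, u2, v2 = x1n[:, 0], x1n[:, 1], x2n[:, 0], x2n[:, 1]
    A = np.column_stack([u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2,
                         u1, v1, np.ones_like(u1)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    U, s, Vt = np.linalg.svd(F)
    F = U @ np.diag([s[0], s[1], 0.0]) @ Vt   # enforce rank 2
    return T2.T @ F @ T1                       # back to original units

# Synthetic two-view geometry for a quick check.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (12, 3)) + np.array([0., 0., 6.])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.], [0.2], [0.]])])
Xh = np.hstack([X, np.ones((12, 1))])
p1, p2 = Xh @ P1.T, Xh @ P2.T
x1, x2 = p1[:, :2] / p1[:, 2:], p2[:, :2] / p2[:, 2:]
F = eight_point(x1, x2)
x1h = np.hstack([x1, np.ones((12, 1))])
x2h = np.hstack([x2, np.ones((12, 1))])
resid = np.abs(np.sum(x2h * (x1h @ F.T), axis=1))  # |x2' F x1|
print(resid.max() < 1e-9)  # epipolar constraint holds: True
```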

Active Stereo with Structured Light Idea: Project “structured” light patterns onto the object simplifies the correspondence problem Allows us to use only one camera camera projector Slide credit: Steve Seitz B. Leibe

Microsoft Kinect – How Does It Work? Built-in IR projector IR camera for depth Regular camera for color B. Leibe

Recall: Optical Triangulation 3D Scene point X? Image plane x1 O1 Camera center

Recall: Optical Triangulation Principle: 3D point given by intersection of two rays. Crucial information: point correspondence Most expensive and error-prone step in the pipeline… 3D Scene point X Image plane x1 x2 O1 O2 Camera center

Active Stereo with Structured Light Idea: Replace one camera by a projector. Project “structured” light patterns onto the object Simplifies the correspondence problem 3D Scene point X Image plane x1 x2 O1 O2 Camera center Projector

What the Kinect Sees… B. Leibe

3D Reconstruction with the Kinect B. Leibe

Digital Michelangelo Project Laser Scanning Optical triangulation Project a single stripe of laser light Scan it across the surface of the object This is a very precise version of structured light scanning Digital Michelangelo Project http://graphics.stanford.edu/projects/mich/ Slide credit: Steve Seitz B. Leibe

Laser Scanned Models The Digital Michelangelo Project, Levoy et al. Slide credit: Steve Seitz B. Leibe

Multi-Stripe Triangulation To go faster, project multiple stripes. But which stripe is which? Answer #1: assume surface continuity, e.g. Eyetronics’ ShapeCam. Slide credit: Szymon Rusinkiewicz

Multi-Stripe Triangulation To go faster, project multiple stripes. But which stripe is which? Answer #2: colored stripes (or dots). Slide credit: Szymon Rusinkiewicz

Active Stereo with Color Structured Light L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming. 3DPVT 2002 Slide credit: Steve Seitz B. Leibe

Multi-Stripe Triangulation To go faster, project multiple stripes. But which stripe is which? Answer #3: time-coded stripes. O. Hall-Holt, S. Rusinkiewicz, Stripe Boundary Codes for Real-Time Structured-Light Scanning of Moving Objects, ICCV 2001. Slide credit: Szymon Rusinkiewicz

Time-Coded Light Patterns Assign each stripe a unique illumination code over time [Posdamer 82]. Slide credit: Szymon Rusinkiewicz

Better codes… Gray code: neighbors differ in only one bit. Slide credit: David Gallup
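A binary-reflected Gray code is simply i XOR (i >> 1); a short sketch of the one-bit-difference property:

```python
def gray(i):
    """i-th binary-reflected Gray code: i XOR (i >> 1)."""
    return i ^ (i >> 1)

codes = [gray(i) for i in range(8)]
print([format(c, '03b') for c in codes])
# → ['000', '001', '011', '010', '110', '111', '101', '100']
print(all(bin(a ^ b).count('1') == 1 for a, b in zip(codes, codes[1:])))
# → True  (successive stripes differ in exactly one bit)
```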

Poor Man’s Scanner Bouguet and Perona, ICCV’98

Slightly More Elaborate (But Still Cheap) Software freely available from the Robotics Institute, TU Braunschweig: http://www.david-laserscanner.com/ B. Leibe

Recap: Active Stereo with Structured Light Optical triangulation Project a single stripe of laser light Scan it across the surface of the object This is a very precise version of structured light scanning Digital Michelangelo Project http://graphics.stanford.edu/projects/mich/ Slide credit: Steve Seitz B. Leibe

Topics of This Lecture Structure from Motion (SfM) Affine SfM Motivation Ambiguity Affine SfM Affine cameras Affine factorization Euclidean upgrade Dealing with missing data Projective SfM Two-camera case Projective factorization Bundle adjustment Practical considerations Applications B. Leibe

Structure from Motion Given: m images of n fixed 3D points, xij = Pi Xj , i = 1, … , m, j = 1, … , n. Problem: estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij. Slide credit: Svetlana Lazebnik B. Leibe

What Can We Use This For? E.g. movie special effects Video B. Leibe Video Credit: Stefan Hafeneger

Structure from Motion Ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:  It is impossible to recover the absolute scale of the scene! Slide credit: Svetlana Lazebnik B. Leibe

Structure from Motion Ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change Slide credit: Svetlana Lazebnik B. Leibe
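This invariance is easy to check numerically; in the sketch below Q is a pure scaling by k = 2.5, but any invertible 4×4 transformation works the same way:

```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.standard_normal((3, 4))              # arbitrary camera matrix
X = np.append(rng.standard_normal(3), 1.0)   # homogeneous scene point
Q = np.diag([2.5, 2.5, 2.5, 1.0])            # scale the scene by k = 2.5
x_before = P @ X
x_after = (P @ np.linalg.inv(Q)) @ (Q @ X)   # transformed scene + camera
print(np.allclose(x_before, x_after))        # image is unchanged: True
```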

Reconstruction Ambiguity: Similarity Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

Reconstruction Ambiguity: Affine Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

Reconstruction Ambiguity: Projective Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

Projective Ambiguity Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

From Projective to Affine Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

From Affine to Similarity Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

Hierarchy of 3D Transformations With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. Projective 15dof Preserves intersection and tangency Affine 12dof Preserves parallellism, volume ratios Similarity 7dof Preserves angles, ratios of length Euclidean 6dof Preserves angles, lengths Slide credit: Svetlana Lazebnik B. Leibe

Topics of This Lecture Structure from Motion (SfM) Affine SfM Motivation Ambiguity Affine SfM Affine cameras Affine factorization Euclidean upgrade Dealing with missing data Projective SfM Two-camera case Projective factorization Bundle adjustment Practical considerations Applications B. Leibe

Structure from Motion Let’s start with affine cameras (the math is easier) center at infinity Slide credit: Svetlana Lazebnik B. Leibe Images from Hartley & Zisserman

Orthographic Projection Special case of perspective projection Distance from center of projection to image plane is infinite Projection matrix: Image World Slide credit: Steve Seitz B. Leibe

Affine Cameras Orthographic Projection Parallel Projection Slide credit: Svetlana Lazebnik B. Leibe

Affine Cameras A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image. Affine projection is a linear mapping + translation in inhomogeneous coordinates. Slide credit: Svetlana Lazebnik B. Leibe

Affine Structure from Motion Given: m images of n fixed 3D points: xij = Ai Xj + bi , i = 1,… , m, j = 1, … , n Problem: use the mn correspondences xij to estimate m projection matrices Ai and translation vectors bi, and n points Xj The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom): We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity). Thus, we must have 2mn >= 8m + 3n – 12. For two views, we need four point correspondences. Slide credit: Svetlana Lazebnik B. Leibe

Affine Structure from Motion Centering: subtract the centroid of the image points. For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. After centering, each normalized point xij is related to the 3D point Xj by xij = Ai Xj. Slide credit: Svetlana Lazebnik B. Leibe

Affine Structure from Motion Let’s create a 2m × n data (measurement) matrix: Cameras (2 m) Points (n) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. Slide credit: Svetlana Lazebnik B. Leibe

Affine Structure from Motion Let’s create a 2m × n data (measurement) matrix: The measurement matrix D = MS must have rank 3! Points (3 × n) Cameras (2 m × 3) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. Slide credit: Svetlana Lazebnik B. Leibe

Factorizing the Measurement Matrix Slide credit: Martial Hebert B. Leibe

Factorizing the Measurement Matrix Singular value decomposition of D: Slide credit: Martial Hebert

Factorizing the Measurement Matrix Singular value decomposition of D: Slide credit: Martial Hebert

Factorizing the Measurement Matrix Obtaining a factorization from SVD: Slide credit: Martial Hebert

Factorizing the Measurement Matrix Obtaining a factorization from SVD: This decomposition minimizes |D-MS|2 Slide credit: Martial Hebert

Affine Ambiguity The decomposition is not unique. We get the same D by taking any invertible 3×3 matrix C and applying the transformations M → MC, S → C-1S. That is because we have only an affine transformation and we have not enforced any Euclidean constraints (such as forcing the image axes to be perpendicular). We need a Euclidean upgrade. Slide credit: Martial Hebert B. Leibe

Estimating the Euclidean Upgrade Orthographic assumption: image axes are perpendicular and scale is 1: a1 · a2 = 0, |a1|2 = |a2|2 = 1. This can be converted into a system of 3m equations for the transformation matrix C. Goal: estimate C. Slide adapted from S. Lazebnik, M. Hebert B. Leibe

Estimating the Euclidean Upgrade System of 3m equations: let L = CCT. Then the constraints translate to 3m equations in L. Solve for L. Recover C from L by Cholesky decomposition: L = CCT. Update M and S: M = MC, S = C-1S. Slide adapted from S. Lazebnik, M. Hebert B. Leibe
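A sketch of these steps, assuming an orthographic setup; the packing of the six unknowns of the symmetric matrix L and the synthetic test cameras are illustrative choices:

```python
import numpy as np

def euclidean_upgrade(M):
    """Solve the orthographic constraints for L = C C^T (symmetric,
    6 unknowns), recover C by Cholesky, and return the upgraded
    motion matrix M C. M is the 2m x 3 affine factorization result."""
    idx = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
    def row(a, b):   # coefficients of a^T L b in the 6 unknowns
        return [a[i] * b[j] + (a[j] * b[i] if i != j else 0.0)
                for i, j in idx]
    A, rhs = [], []
    for i in range(M.shape[0] // 2):
        a1, a2 = M[2 * i], M[2 * i + 1]
        A += [row(a1, a1), row(a2, a2), row(a1, a2)]
        rhs += [1.0, 1.0, 0.0]
    l = np.linalg.lstsq(np.asarray(A), np.asarray(rhs), rcond=None)[0]
    L = np.array([[l[0], l[1], l[2]],
                  [l[1], l[3], l[4]],
                  [l[2], l[4], l[5]]])
    return M @ np.linalg.cholesky(L)

# Synthetic check: rows of true orthographic cameras, mixed by an
# unknown invertible 3x3 matrix G (the affine ambiguity).
rng = np.random.default_rng(5)
M_true = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]
                    for _ in range(6)])
G = rng.standard_normal((3, 3))
M_up = euclidean_upgrade(M_true @ G)
r1, r2 = M_up[0::2], M_up[1::2]
ok = (np.allclose((r1 * r1).sum(1), 1)
      and np.allclose((r2 * r2).sum(1), 1)
      and np.allclose((r1 * r2).sum(1), 0))
print(ok)  # camera rows are orthonormal again: True
```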

Algorithm Summary Given: m images and n features xij For each image i, center the feature coordinates. Construct a 2m × n measurement matrix D: Column j contains the projection of point j in all views Row i contains one coordinate of the projections of all the n points in image i Factorize D: Compute SVD: D = U W VT Create U3 by taking the first 3 columns of U Create V3 by taking the first 3 columns of V Create W3 by taking the upper left 3 × 3 block of W Create the motion and shape matrices: M = U3W3½ and S = W3½ V3T (or M = U3 and S = W3V3T) Eliminate affine ambiguity Slide credit: Martial Hebert
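The summary above might be implemented as the following sketch (without the final Euclidean upgrade step); the random affine cameras serve only as test data:

```python
import numpy as np

def tomasi_kanade(D):
    """Rank-3 factorization of the centered measurement matrix D
    (2m x n) into motion M (2m x 3) and shape S (3 x n), following
    the SVD recipe (affine ambiguity not yet removed)."""
    U, w, Vt = np.linalg.svd(D, full_matrices=False)
    sqrt_w3 = np.sqrt(w[:3])
    M = U[:, :3] * sqrt_w3
    S = sqrt_w3[:, None] * Vt[:3]
    return M, S

# Synthetic centered data: random affine cameras, zero-centroid points.
rng = np.random.default_rng(6)
m, n = 5, 20
A = rng.standard_normal((2 * m, 3))           # stacked 2x3 camera blocks
S_true = rng.standard_normal((3, n))
S_true -= S_true.mean(axis=1, keepdims=True)  # centroid at the origin
D = A @ S_true
M, S = tomasi_kanade(D)
print(np.allclose(M @ S, D))  # exact for rank-3 data: True
```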

Reconstruction Results C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992. Slide credit: Svetlana Lazebnik B. Leibe Image Source: Tomasi & Kanade

Dealing with Missing Data So far, we have assumed that all points are visible in all views In reality, the measurement matrix typically looks something like this: Cameras Points Slide credit: Svetlana Lazebnik B. Leibe

Dealing with Missing Data Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement Perform factorization on a dense sub-block F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Slide credit: Svetlana Lazebnik

Dealing with Missing Data Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Slide credit: Svetlana Lazebnik

Dealing with Missing Data Possible solution: decompose matrix into dense sub-blocks, factorize each sub-block, and fuse the results Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) (3) Solve for a new camera that sees at least three known 3D points (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007. Slide credit: Svetlana Lazebnik

Comments: Affine SfM Affine SfM was historically developed first. It is valid under the assumption of affine cameras, which does not hold for real physical cameras, but is still tolerable if the scene points are far away from the camera. For good results with real cameras, we typically need projective SfM. Harder problem, more ambiguity, and the math is a bit more involved. (Here, only the basic ideas. If you want to implement it, please look at the H&Z book for details.) B. Leibe

Topics of This Lecture Structure from Motion (SfM) Affine SfM Motivation Ambiguity Affine SfM Affine cameras Affine factorization Euclidean upgrade Dealing with missing data Projective SfM Two-camera case Projective factorization Bundle adjustment Practical considerations Applications B. Leibe

Projective Structure from Motion Given: m images of n fixed 3D points, xij = Pi Xj , i = 1, … , m, j = 1, … , n. Problem: estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij. Slide credit: Svetlana Lazebnik B. Leibe

Projective Structure from Motion Given: m images of n fixed 3D points, zij xij = Pi Xj , i = 1, … , m, j = 1, … , n (with projective depths zij). Problem: estimate the m projection matrices Pi and the n 3D points Xj from the mn correspondences xij. With no calibration info, cameras and points can only be recovered up to a 4×4 projective transformation Q: X → QX, P → PQ-1. We can solve for structure and motion when 2mn >= 11m + 3n – 15. For two cameras, at least 7 points are needed. Slide credit: Svetlana Lazebnik B. Leibe

Projective SfM: Two-Camera Case Assume the fundamental matrix F between the two views is known. First camera matrix: [I|0]Q-1. Second camera matrix: [A|b]Q-1. One can show that b is the epipole (FTb = 0) and A = –[b×]F. Slide adapted from Svetlana Lazebnik B. Leibe F&P sec. 13.3.1
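These formulas can be verified numerically: starting from a valid F, build the second camera as [A|b] and check that the pair [I|0], [A|b] reproduces F (a sketch with made-up geometry):

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

# Build a valid rank-2 F from a known two-view geometry.
rng = np.random.default_rng(7)
R = np.linalg.qr(rng.standard_normal((3, 3)))[0]
t = rng.standard_normal(3)
F = skew(t) @ R                    # F for cameras [I|0], [R|t]

b = np.linalg.svd(F.T)[2][-1]      # epipole: F^T b = 0 (unit norm)
A = -skew(b) @ F                   # A = -[b]_x F
F_back = skew(b) @ A               # F implied by cameras [I|0], [A|b]
print(np.allclose(F_back, F))      # recovers F: True
```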

Projective SfM: Two-Camera Case This means that if we can compute the fundamental matrix between two cameras, we can directly estimate the two projection matrices from F. Once we have the projection matrices, we can compute the 3D position of any point X by triangulation. How can we obtain both kinds of information at the same time? B. Leibe

Projective Factorization If we knew the depths z, we could factorize D to estimate M and S. If we knew M and S, we could solve for z. Solution: iterative approach (alternate between above two steps). Points (4 × n) Cameras (3 m × 4) D = MS has rank 4 Slide credit: Svetlana Lazebnik B. Leibe
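The key rank-4 property can be demonstrated directly; the snippet below assumes the true projective depths are known (in the algorithm they are found by the iteration described above), so the scaled measurement matrix factors exactly:

```python
import numpy as np

# With the correct projective depths z_ij, the scaled measurement
# matrix D (3m x n, entries z_ij * x_ij) factors as D = M S, rank 4.
rng = np.random.default_rng(8)
m, n = 4, 10
P = rng.standard_normal((m, 3, 4))                      # cameras
X = np.vstack([rng.standard_normal((3, n)), np.ones((1, n))])
D = (P @ X).reshape(3 * m, n)   # stacks z_ij * x_ij per camera
s = np.linalg.svd(D, compute_uv=False)
print(s[4] / s[0] < 1e-12)      # fifth singular value vanishes: True
```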

Sequential Structure from Motion Initialize motion from two images using fundamental matrix Initialize structure For each additional view: Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration Points Cameras Slide credit: Svetlana Lazebnik B. Leibe

Sequential Structure from Motion Initialize motion from two images using fundamental matrix Initialize structure For each additional view: Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation Points Cameras Slide credit: Svetlana Lazebnik B. Leibe

Sequential Structure from Motion Initialize motion from two images using fundamental matrix Initialize structure For each additional view: Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation Refine structure and motion: bundle adjustment Points Cameras Slide credit: Svetlana Lazebnik B. Leibe

Bundle Adjustment Non-linear method for refining structure and motion by minimizing the mean-square reprojection error. Slide credit: Svetlana Lazebnik B. Leibe

Bundle Adjustment Seeks the Maximum Likelihood (ML) solution assuming the measurement noise is Gaussian. It involves adjusting the bundle of rays between each camera center and the set of 3D points. Bundle adjustment should generally be used as the final step of any multi-view reconstruction algorithm. Considerably improves the results. Allows assignment of individual covariances to each measurement. However… It needs a good initialization. It can become an extremely large minimization problem. Very efficient algorithms available. B. Leibe
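The quantity being minimized might be sketched as a residual function; in practice one would hand this to a Levenberg–Marquardt solver (e.g. scipy.optimize.least_squares) exploiting the sparse Jacobian structure. The parameter packing and the synthetic cameras/points below are assumptions of the example:

```python
import numpy as np

def reprojection_residuals(params, x_obs, m, n):
    """Residual vector for bundle adjustment: params packs m cameras
    (12 entries each, row-major 3x4) followed by n 3D points."""
    P = params[:12 * m].reshape(m, 3, 4)
    X = params[12 * m:].reshape(n, 3)
    Xh = np.hstack([X, np.ones((n, 1))])
    res = []
    for i in range(m):
        proj = Xh @ P[i].T
        res.append(proj[:, :2] / proj[:, 2:] - x_obs[i])
    return np.concatenate(res).ravel()

# At the true cameras/points the reprojection error is exactly zero.
rng = np.random.default_rng(9)
m, n = 3, 8
X_true = rng.uniform(-1, 1, (n, 3)) + np.array([0., 0., 6.])
P_true = np.stack([np.hstack([np.eye(3), np.array([[i], [0.], [0.]])])
                   for i in range(m)])
Xh = np.hstack([X_true, np.ones((n, 1))])
x_obs = np.stack([(Xh @ P.T)[:, :2] / (Xh @ P.T)[:, 2:] for P in P_true])
params = np.concatenate([P_true.ravel(), X_true.ravel()])
r = reprojection_residuals(params, x_obs, m, n)
print(np.allclose(r, 0.0))  # True
```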

Projective Ambiguity If we don’t know anything about the camera or the scene, the best we can get with this is a reconstruction up to a projective ambiguity Q. This can already be useful. E.g. we can answer questions like “at what point does a line intersect a plane”? If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade. Need to put in additional knowledge about the camera (calibration) or about the scene (e.g. from markers). Several methods available (see F&P Chapter 13.5 or H&Z Chapter 19) B. Leibe Images from Hartley & Zisserman

Self-Calibration Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images. For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images. Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form Pi = K [Ri | ti]. Can use constraints on the form of the calibration matrix: square pixels, zero skew, fixed focal length, etc. Slide credit: Svetlana Lazebnik B. Leibe

Practical Considerations (1) Role of the baseline: Small baseline: large depth error. Large baseline: difficult search problem. Solution: track features between frames until the baseline is sufficient. Slide adapted from Steve Seitz B. Leibe

Practical Considerations (2) There will still be many outliers Incorrect feature matches Moving objects  Apply RANSAC to get robust estimates based on the inlier points. Estimation quality depends on the point configuration Points that are close together in the image produce less stable solutions.  Subdivide image into a grid and try to extract about the same number of features per grid cell. B. Leibe

General Guidelines Use calibrated cameras wherever possible. It makes life so much easier, especially for SfM. SfM with 2 cameras is far more robust than with a single camera. Triangulate feature points in 3D using stereo. Perform 2D-3D matching to recover the motion. More robust to loss of scale (main problem of 1-camera SfM). Any constraint on the setup can be useful E.g. square pixels, zero skew, fixed focal length in each camera E.g. fixed baseline in stereo SfM setup E.g. constrained camera motion on a ground plane Making best use of those constraints may require adapting the algorithms (some known results are described in H&Z). B. Leibe

Structure-from-Motion: Limitations Very difficult to reliably estimate metric SfM unless Large (x or y) motion or Large field-of-view and depth variation Camera calibration important for Euclidean reconstruction Need good feature tracker Slide adapted from Steve Seitz B. Leibe

Topics of This Lecture Structure from Motion (SfM) Affine SfM Motivation Ambiguity Affine SfM Affine cameras Affine factorization Euclidean upgrade Dealing with missing data Projective SfM Two-camera case Projective factorization Bundle adjustment Practical considerations Applications B. Leibe

Commercial Software Packages boujou (http://www.2d3.com/) PFTrack (http://www.thepixelfarm.co.uk/) MatchMover (http://www.realviz.com/) SynthEyes (http://www.ssontech.com/) Icarus (http://aig.cs.man.ac.uk/research/reveal/icarus/) Voodoo Camera Tracker (http://www.digilab.uni-hannover.de/) B. Leibe

boujou demo (We have a license available at i8, so if you want to try it for interesting projects, contact us.) B. Leibe

Applications: Matchmoving Putting virtual objects into real-world videos Original sequence Tracked features SfM results Final video B. Leibe Videos from Stefan Hafeneger

Applications: 3D City Modeling N. Cornelis, K. Cornelis, B. Leibe, L. Van Gool, 3D Urban Scene Modeling Integrating Recognition and Reconstruction, IJCV, Vol. 78(2-3), pp. 121-141, 2008. B. Leibe

Another Example: The Campanile Movie Video from SIGGRAPH’97 Animation Theatre http://www.debevec.org/Campanile/#movie

Applications: Large-Scale SfM from Flickr S. Agarwal, N. Snavely, I. Simon, S.M. Seitz, R. Szeliski, Building Rome in a Day, ICCV’09, 2009. (Video from http://grail.cs.washington.edu/rome/) B. Leibe

References and Further Reading A (relatively short) treatment of affine and projective SfM and the basic ideas and algorithms can be found in Chapters 12 and 13 of D. Forsyth, J. Ponce, Computer Vision – A Modern Approach, Prentice Hall, 2003. More detailed information (if you really want to implement this) and better explanations can be found in Chapters 10, 18 (factorization) and 19 (self-calibration) of R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Ed., Cambridge Univ. Press, 2004. B. Leibe