Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Computer Vision – Lecture 17 Structure-from-Motion 21.01.2009 Bastian Leibe RWTH Aachen.

Slides:

Advertisements

Similar presentations

Structure from motion.

Advertisements

The fundamental matrix F

3D reconstruction.

MASKS © 2004 Invitation to 3D vision Lecture 7 Step-by-Step Model Buidling.

Computer vision: models, learning and inference

Two-View Geometry CS Sastry and Yang

Jan-Michael Frahm, Enrique Dunn Spring 2012

Two-view geometry.

Lecture 8: Stereo.

Lecture 11: Structure from Motion

Camera calibration and epipolar geometry

Structure from motion.

Algorithms & Applications in Computer Vision

Epipolar geometry. (i)Correspondence geometry: Given an image point x in the first view, how does this constrain the position of the corresponding point.

Structure from motion. Multiple-view geometry questions Scene geometry (structure): Given 2D point matches in two or more images, where are the corresponding.

Uncalibrated Geometry & Stratification Sastry and Yang

Multiple View Geometry Marc Pollefeys University of North Carolina at Chapel Hill Modified by Philippos Mordohai.

Multiple-view Reconstruction from Points and Lines

Many slides and illustrations from J. Ponce

Stereo and Structure from Motion

Previously Two view geometry: epipolar geometry Stereo vision: 3D reconstruction epipolar lines Baseline O O’ epipolar plane.

Single-view geometry Odilon Redon, Cyclops, 1914.

CS 558 C OMPUTER V ISION Lecture IX: Dimensionality Reduction.

3-D Scene u u’u’ Study the mathematical relations between corresponding image points. “Corresponding” means originated from the same 3D point. Objective.

Epipolar Geometry and Stereo Vision Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/12/11 Many slides adapted from Lana Lazebnik,

Multi-view geometry. Multi-view geometry problems Structure: Given projections of the same 3D point in two or more images, compute the 3D coordinates.

776 Computer Vision Jan-Michael Frahm, Enrique Dunn Spring 2013.

Epipolar Geometry and Stereo Vision Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/05/15 Many slides adapted from Lana Lazebnik,

Computer vision: models, learning and inference

Epipolar Geometry and Stereo Vision Computer Vision ECE 5554 Virginia Tech Devi Parikh 10/15/13 These slides have been borrowed from Derek Hoiem and Kristen.

Camera Geometry and Calibration Thanks to Martial Hebert.

Multi-view geometry.

Epipolar geometry The fundamental matrix and the tensor

Projective cameras Motivation Elements of Projective Geometry Projective structure from motion Planches : –

Homogeneous Coordinates (Projective Space) Let be a point in Euclidean space Change to homogeneous coordinates: Defined up to scale: Can go back to non-homogeneous.

Course 12 Calibration. 1.Introduction In theoretic discussions, we have assumed: Camera is located at the origin of coordinate system of scene.

Structure from Motion Computer Vision CS 143, Brown James Hays 11/18/11 Many slides adapted from Derek Hoiem, Lana Lazebnik, Silvio Saverese, Steve Seitz,

Structure from Motion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 03/06/12 Many slides adapted from Lana Lazebnik, Silvio Saverese,

Computer Vision – Lecture 16

Structure from Motion Course web page: vision.cis.udel.edu/~cv April 25, 2003  Lecture 26.

CSCE 643 Computer Vision: Structure from Motion

Computer Vision – Lecture 17

Multiview Geometry and Stereopsis. Inputs: two images of a scene (taken from 2 viewpoints). Output: Depth map. Inputs: multiple images of a scene. Output:

Stereo Course web page: vision.cis.udel.edu/~cv April 11, 2003  Lecture 21.

Affine Structure from Motion

Single-view geometry Odilon Redon, Cyclops, 1914.

Two-view geometry. Epipolar Plane – plane containing baseline (1D family) Epipoles = intersections of baseline with image planes = projections of the.

EECS 274 Computer Vision Affine Structure from Motion.

Computer vision: models, learning and inference M Ahad Multiple Cameras

776 Computer Vision Jan-Michael Frahm & Enrique Dunn Spring 2013.

Structure from Motion ECE 847: Digital Image Processing

Single-view geometry Odilon Redon, Cyclops, 1914.

Uncalibrated reconstruction Calibration with a rig Uncalibrated epipolar geometry Ambiguities in image formation Stratified reconstruction Autocalibration.

Structure from motion Multi-view geometry Affine structure from motion Projective structure from motion Planches : –

EECS 274 Computer Vision Projective Structure from Motion.

Camera Calibration Course web page: vision.cis.udel.edu/cv March 24, 2003  Lecture 17.

CSE 185 Introduction to Computer Vision Stereo 2.

Multi-view geometry. Multi-view geometry problems Structure: Given projections of the same 3D point in two or more images, compute the 3D coordinates.

55:148 Digital Image Processing Chapter 11 3D Vision, Geometry

Parameter estimation class 5

Two-view geometry Computer Vision Spring 2018, Lecture 10

Epipolar geometry.

Structure from motion Input: Output: (Tomasi and Kanade)

Uncalibrated Geometry & Stratification

Two-view geometry.

Two-view geometry.

Multi-view geometry.

Single-view geometry Odilon Redon, Cyclops, 1914.

Structure from motion.

Structure from motion Input: Output: (Tomasi and Kanade)

Presentation transcript:

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Computer Vision – Lecture 17 Structure-from-Motion Bastian Leibe RWTH Aachen TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA A AA A A A A Many slides adapted from Svetlana Lazebnik, Martial Hebert, Steve Seitz

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Course Outline Image Processing Basics Segmentation & Grouping Object Recognition Local Features & Matching Object Categorization 3D Reconstruction  Epipolar Geometry and Stereo Basics  Camera calibration & Uncalibrated Reconstruction  Structure-from-Motion Motion and Tracking 2

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: A General Point Equations of the form How do we solve them? (always!)  Apply SVD  Singular values of A = square roots of the eigenvalues of A T A.  The solution of Ax=0 is the nullspace vector of A.  This corresponds to the smallest singular vector of A. 3 SVD Singular valuesSingular vectors B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Properties of SVD Frobenius norm  Generalization of the Euclidean norm to matrices Partial reconstruction property of SVD  Let  i i=1,…,N be the singular values of A.  Let A p = U p D p V p T be the reconstruction of A when we set  p+1,…,  N to zero.  Then A p = U p D p V p T is the best rank-p approximation of A in the sense of the Frobenius norm (i.e. the best least-squares approximation). 4 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Camera Parameters Intrinsic parameters  Principal point coordinates  Focal length  Pixel magnification factors  Skew (non-rectangular pixels)  Radial distortion Extrinsic parameters  Rotation R  Translation t (both relative to world coordinate system) Camera projection matrix  General pinhole camera: 9 DoF  CCD Camera with square pixels: 10 DoF  General camera: 11 DoF 5 B. Leibe ss

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Calibrating a Camera Goal Compute intrinsic and extrinsic parameters using observed camera data. Main idea Place “calibration object” with known geometry in the scene Get correspondences Solve for mapping from scene to image: estimate P=P int P ext 6 B. Leibe Slide credit: Kristen Grauman XiXi xixi

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Camera Calibration (DLT Algorithm) P has 11 degrees of freedom. Two linearly independent equations per independent 2D/3D correspondence. Solve with SVD (similar to homography estimation)  Solution corresponds to smallest singular vector. 5 ½ correspondences needed for a minimal solution. 7 B. Leibe Slide adapted from Svetlana Lazebnik Solve with …?

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Two independent equations each in terms of three unknown entries of X. Stack equations and solve with SVD. This approach nicely generalizes to multiple cameras. Recap: Triangulation – Linear Algebraic Approach 8 B. Leibe Slide credit: Svetlana Lazebnik O1O1 O2O2 x1x1 x2x2 X? R1R1 R2R2 Stack equations and solve with …?

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Epipolar Geometry – Calibrated Case 9 B. Leibe X xx’ Camera matrix: [I|0] X = (u, v, w, 1) T x = (u, v, w) T Camera matrix: [R T | –R T t] Vector x’ in second coord. system has coordinates Rx’ in the first one. t The vectors x, t, and Rx’ are coplanar R Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Epipolar Geometry – Calibrated Case 10 B. Leibe X xx’ Slide credit: Svetlana Lazebnik Essential Matrix (Longuet-Higgins, 1981)

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Epipolar Geometry – Uncalibrated Case The calibration matrices K and K’ of the two cameras are unknown We can write the epipolar constraint in terms of unknown normalized coordinates: 11 B. Leibe X xx’ Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Epipolar Geometry – Uncalibrated Case 12 B. Leibe X xx’ Slide credit: Svetlana Lazebnik Fundamental Matrix (Faugeras and Luong, 1992)

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Problem: poor numerical conditioning Recap: The Eight-Point Algorithm 13 B. Leibe x = (u, v, 1) T, x’ = (u’, v’, 1) T Minimize: under the constraint |F| 2 = 1 Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Normalized Eight-Point Algorithm 1. Center the image data at the origin, and scale it so the mean squared distance between the origin and the data points is 2 pixels. 2. Use the eight-point algorithm to compute F from the normalized points. 3. Enforce the rank-2 constraint using SVD. 4. Transform fundamental matrix back to original units: if T and T’ are the normalizing transformations in the two images, than the fundamental matrix in original coordinates is T T F T’. 14 B. Leibe [Hartley, 1995] Slide credit: Svetlana Lazebnik SVD Set d 33 to zero and reconstruct F

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Comparison of Estimation Algorithms 15 B. Leibe 8-pointNormalized 8-pointNonlinear least squares Av. Dist pixels0.92 pixel0.86 pixel Av. Dist pixels0.85 pixel0.80 pixel Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Epipolar Transfer Assume the epipolar geometry is known Given projections of the same point in two images, how can we compute the projection of that point in a third image? 16 B. Leibe x1x1 x2x2 x3x3 l 32 l 31 l 31 = F T 13 x 1 l 32 = F T 23 x 2 Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Recap: Active Stereo with Structured Light Optical triangulation  Project a single stripe of laser light  Scan it across the surface of the object  This is a very precise version of structured light scanning 17 B. Leibe Digital Michelangelo Project Slide credit: Steve Seitz

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Topics of This Lecture Structure from Motion (SfM)  Motivation  Ambiguity Affine SfM  Affine cameras  Affine factorization  Euclidean upgrade  Dealing with missing data Projective SfM  Two-camera case  Projective factorization  Bundle adjustment  Practical considerations Applications 18 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Structure from Motion Given: m images of n fixed 3D points x ij = P i X j, i = 1, …, m, j = 1, …, n Problem: estimate m projection matrices P i and n 3D points X j from the mn correspondences x ij 19 x1jx1j x2jx2j x3jx3j XjXj P1P1 P2P2 P3P3 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 What Can We Use This For? 20 B. Leibe E.g. movie special effects Video Video Credit: Stefan Hafeneger

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Structure from Motion Ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same:  It is impossible to recover the absolute scale of the scene! 21 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Structure from Motion Ambiguity If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor of 1/k, the projections of the scene points in the image remain exactly the same. More generally: if we transform the scene using a transformation Q and apply the inverse transformation to the camera matrices, then the images do not change 22 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Reconstruction Ambiguity: Similarity 23 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Reconstruction Ambiguity: Affine 24 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Reconstruction Ambiguity: Projective 25 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective Ambiguity 26 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 From Projective to Affine 27 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 From Affine to Similarity 28 B. Leibe Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Hierarchy of 3D Transformations With no constraints on the camera calibration matrix or on the scene, we get a projective reconstruction. Need additional information to upgrade the reconstruction to affine, similarity, or Euclidean. 29 B. Leibe Projective 15dof Affine 12dof Similarity 7dof Euclidean 6dof Preserves intersection and tangency Preserves parallellism, volume ratios Preserves angles, ratios of length Preserves angles, lengths Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Topics of This Lecture Structure from Motion (SfM)  Motivation  Ambiguity Affine SfM  Affine cameras  Affine factorization  Euclidean upgrade  Dealing with missing data Projective SfM  Two-camera case  Projective factorization  Bundle adjustment  Practical considerations Applications 30 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Structure from Motion Let’s start with affine cameras (the math is easier) 31 B. Leibe center at infinity Slide credit: Svetlana Lazebnik Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Orthographic Projection Special case of perspective projection  Distance from center of projection to image plane is infinite  Projection matrix: 32 B. Leibe Slide credit: Steve Seitz Image World

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Cameras 33 B. Leibe Orthographic Projection Parallel Projection Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Cameras A general affine camera combines the effects of an affine transformation of the 3D space, orthographic projection, and an affine transformation of the image: Affine projection is a linear mapping + translation in inhomogeneous coordinates 34 B. Leibe x X a1a1 a2a2 Projection of world origin Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Structure from Motion Given: m images of n fixed 3D points: x ij = A i X j + b i, i = 1,…, m, j = 1, …, n Problem: use the mn correspondences x ij to estimate m projection matrices A i and translation vectors b i, and n points X j The reconstruction is defined up to an arbitrary affine transformation Q (12 degrees of freedom): We have 2mn knowns and 8m + 3n unknowns (minus 12 dof for affine ambiguity).  Thus, we must have 2mn >= 8m + 3n – 12.  For two views, we need four point correspondences. 35 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Structure from Motion Centering: subtract the centroid of the image points For simplicity, assume that the origin of the world coordinate system is at the centroid of the 3D points. After centering, each normalized point x ij is related to the 3D point X i by 36 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Structure from Motion Let’s create a 2m × n data (measurement) matrix: 37 B. Leibe Cameras (2 m) Points (n) C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2): , November 1992.Shape and motion from image streams under orthography: A factorization method. Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Structure from Motion Let’s create a 2m × n data (measurement) matrix: The measurement matrix D = MS must have rank 3! 38 B. Leibe C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2): , November 1992.Shape and motion from image streams under orthography: A factorization method. Slide credit: Svetlana Lazebnik Cameras (2 m × 3) Points (3 × n)

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Factorizing the Measurement Matrix 39 B. Leibe Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Factorizing the Measurement Matrix Singular value decomposition of D: 40 Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Factorizing the Measurement Matrix Singular value decomposition of D: 41 Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Factorizing the Measurement Matrix Obtaining a factorization from SVD: 42 Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Factorizing the Measurement Matrix Obtaining a factorization from SVD: 43 Slide credit: Martial Hebert This decomposition minimizes |D-MS| 2

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Affine Ambiguity The decomposition is not unique. We get the same D by using any 3×3 matrix C and applying the transformations M → MC, S → C -1 S. That is because we have only an affine transformation and we have not enforced any Euclidean constraints (like forcing the image axis to be perpendicular, for example). We need a Euclidean upgrade. 44 B. Leibe Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Estimating the Euclidean Upgrade Orthographic assumption: image axes are perpendicular and scale is 1. This can be converted into a system of 3m equations: 45 B. Leibe x X a1a1 a2a2 a 1 · a 2 = 0 |a 1 | 2 = |a 2 | 2 = 1 Slide adapted from S. Lazebnik, M. Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Estimating the Euclidean Upgrade This can be converted into a system of 3m equations: Let Then this translates to 3m equations in L  Solve for L  Recover C from L by Cholesky decomposition: L = CC T  Update M and S: M = MC, S = C -1 S 46 B. Leibe Slide adapted from S. Lazebnik, M. Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Algorithm Summary Given: m images and n features x ij For each image i, center the feature coordinates. Construct a 2m × n measurement matrix D:  Column j contains the projection of point j in all views  Row i contains one coordinate of the projections of all the n points in image i Factorize D:  Compute SVD: D = U W V T  Create U 3 by taking the first 3 columns of U  Create V 3 by taking the first 3 columns of V  Create W 3 by taking the upper left 3 × 3 block of W Create the motion and shape matrices:  M = U 3 W 3 ½ and S = W 3 ½ V 3 T (or M = U 3 and S = W 3 V 3 T ) Eliminate affine ambiguity 47 Slide credit: Martial Hebert

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Reconstruction Results 48 B. Leibe C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2): , November 1992.Shape and motion from image streams under orthography: A factorization method. Slide credit: Svetlana Lazebnik Image Source: Tomasi & Kanade

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Dealing with Missing Data So far, we have assumed that all points are visible in all views In reality, the measurement matrix typically looks something like this: 49 B. Leibe Cameras Points Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Dealing with Missing Data Possible solution: decompose matrix into dense sub- blocks, factorize each sub-block, and fuse the results  Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement 50 (1)Perform factorization on a dense sub-block F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Dealing with Missing Data Possible solution: decompose matrix into dense sub- blocks, factorize each sub-block, and fuse the results  Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement 51 (1)Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Dealing with Missing Data Possible solution: decompose matrix into dense sub- blocks, factorize each sub-block, and fuse the results  Finding dense maximal sub-blocks of the matrix is NP-complete (equivalent to finding maximal cliques in a graph) Incremental bilinear refinement 52 (1)Perform factorization on a dense sub-block (2) Solve for a new 3D point visible by at least two known cameras (linear least squares) (3) Solve for a new camera that sees at least three known 3D points (linear least squares) F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. PAMI 2007.Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects. Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Comments: Affine SfM Affine SfM was historically developed first. It is valid under the assumption of affine cameras.  Which does not hold for real physical cameras…  …but which is still tolerable if the scene points are far away from the camera. For good results with real cameras, we typically need projective SfM.  Harder problem, more ambiguity  Math is a bit more involved… (Here, only basic ideas. If you want to implement it, please look at the H&Z book for details). 53 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Topics of This Lecture Structure from Motion (SfM)  Motivation  Ambiguity Affine SfM  Affine cameras  Affine factorization  Euclidean upgrade  Dealing with missing data Projective SfM  Two-camera case  Projective factorization  Bundle adjustment  Practical considerations Applications 54 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective Structure from Motion Given: m images of n fixed 3D points x ij = P i X j, i = 1, …, m, j = 1, …, n Problem: estimate m projection matrices P i and n 3D points X j from the mn correspondences x ij 55 x1jx1j x2jx2j x3jx3j XjXj P1P1 P2P2 P3P3 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective Structure from Motion Given: m images of n fixed 3D points z ij x ij = P i X j, i = 1,…, m, j = 1, …, n Problem: estimate m projection matrices P i and n 3D points X j from the mn correspondences x ij With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X → QX, P → PQ -1 We can solve for structure and motion when 2mn >= 11m +3n – 15 For two cameras, at least 7 points are needed. 56 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective SfM: Two-Camera Case Assume fundamental matrix F between the two views  First camera matrix: [I|0]Q -1  Second camera matrix: [A|b]Q -1 Let, then And So we have 57 B. Leibe b : epipole ( F T b = 0 ), A = –[b × ]F Slide adapted from Svetlana Lazebnik F&P sec

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective SfM: Two-Camera Case This means that if we can compute the fundamental matrix between two cameras, we can directly estimate the two projection matrices from F. Once we have the projection matrices, we can compute the 3D position of any point X by triangulation. How can we obtain both kinds of information at the same time? 58 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective Factorization If we knew the depths z, we could factorize D to estimate M and S. If we knew M and S, we could solve for z. Solution: iterative approach (alternate between above two steps). 59 B. Leibe Cameras (3 m × 4) Points (4 × n) D = MS has rank 4 Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Sequential Structure from Motion Initialize motion from two images using fundamental matrix Initialize structure For each additional view:  Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration 60 B. Leibe Cameras Points Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Sequential Structure from Motion Initialize motion from two images using fundamental matrix Initialize structure For each additional view:  Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration  Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation 61 B. Leibe Cameras Points Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Initialize motion from two images using fundamental matrix Initialize structure For each additional view:  Determine projection matrix of new camera using all the known 3D points that are visible in its image – calibration  Refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation Refine structure and motion: bundle adjustment Sequential Structure from Motion 62 B. Leibe Cameras Points Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Bundle Adjustment Non-linear method for refining structure and motion Minimizing mean-square reprojection error 63 B. Leibe x1jx1j x2jx2j x3jx3j XjXj P1P1 P2P2 P3P3 P1XjP1Xj P2XjP2Xj P3XjP3Xj Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Bundle Adjustment Seeks the Maximum Likelihood (ML) solution assuming the measurement noise is Gaussian. It involves adjusting the bundle of rays between each camera center and the set of 3D points. Bundle adjustment should generally be used as the final step of any multi-view reconstruction algorithm.  Considerably improves the results.  Allows assignment of individual covariances to each measurement. However…  It needs a good initialization.  It can become an extremely large minimization problem. Very efficient algorithms available. 64 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Projective Ambiguity If we don’t know anything about the camera or the scene, the best we can get with this is a reconstruction up to a projective ambiguity Q.  This can already be useful.  E.g. we can answer questions like “at what point does a line intersect a plane”? If we want to convert this to a “true” reconstruction, we need a Euclidean upgrade.  Need to put in additional knowledge about the camera (calibration) or about the scene (e.g. from markers).  Several methods available (see F&P Chapter 13.5 or H&Z Chapter 19) 65 B. Leibe Images from Hartley & Zisserman

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Self-Calibration Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images. For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images.  Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form P i = K [R i | t i ]. Can use constraints on the form of the calibration matrix: square pixels, zero skew, fixed focal length, etc. 66 B. Leibe Slide credit: Svetlana Lazebnik

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Practical Considerations (1) 1. Role of the baseline  Small baseline: large depth error  Large baseline: difficult search problem Solution  Track features between frames until baseline is sufficient. 67 B. Leibe Large BaselineSmall Baseline Slide adapted from Steve Seitz

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Practical Considerations (2) 2. There will still be many outliers  Incorrect feature matches  Moving objects  Apply RANSAC to get robust estimates based on the inlier points. 3. Estimation quality depends on the point configuration  Points that are close together in the image produce less stable solutions.  Subdivide image into a grid and try to extract about the same number of features per grid cell. 68 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 General Guidelines Use calibrated cameras wherever possible.  It makes life so much easier, especially for SfM. SfM with 2 cameras is far more robust than with a single camera.  Triangulate feature points in 3D using stereo.  Perform 2D-3D matching to recover the motion.  More robust to loss of scale (main problem of 1-camera SfM). Any constraint on the setup can be useful  E.g. square pixels, zero skew, fixed focal length in each camera  E.g. fixed baseline in stereo SfM setup  E.g. constrained camera motion on a ground plane  Making best use of those constraints may require adapting the algorithms (some known results are described in H&Z). 69 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Topics of This Lecture Structure from Motion (SfM)  Motivation  Ambiguity Affine SfM  Affine cameras  Affine factorization  Euclidean upgrade  Dealing with missing data Projective SfM  Two-camera case  Projective factorization  Bundle adjustment  Practical considerations Applications 70 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Commercial Software Packages boujou ( PFTrack ( MatchMover ( SynthEyes ( Icarus ( Voodoo Camera Tracker ( 71 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 boujou demo (We have a license available, so if you want to try it for interesting projects, contact us.) 72 B. Leibe

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Applications: Matchmoving 73 B. Leibe Putting virtual objects into real-world videos Original sequenceTracked features SfM resultsFinal video Videos from Stefan Hafeneger

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 Another Example: The Campanile Movie 74 VideoVideo from SIGGRAPH’97 Animation Theatre

Perceptual and Sensory Augmented Computing Computer Vision WS 08/09 References and Further Reading A (relatively short) treatment of affine and projective SfM and the basic ideas and algorithms can be found in Chapters 12 and 13 of More detailed information (if you really want to implement this) and better explanations can be found in Chapters 10, 18 (factorization) and 19 (self-calibration) of B. Leibe 75 D. Forsyth, J. Ponce, Computer Vision – A Modern Approach. Prentice Hall, 2003 R. Hartley, A. Zisserman Multiple View Geometry in Computer Vision 2nd Ed., Cambridge Univ. Press, 2004