Structure from motion (Tomasi and Kanade)

Lecture 15: Structure from motion

Structure from motion (Tomasi and Kanade)
- Input: a set of point tracks
- Output: the 3D location of each point (shape) and the camera parameters (motion)

Orthographic SFM: Setup
- $I_1, I_2, \dots, I_f$: a collection of images (video frames) depicting a rigid scene
- Orthographic projection (no scale)
- $p$ point tracks in those $f$ frames
- Unknown 3D locations: $P_j = (X_j, Y_j, Z_j)^T \in \mathbb{R}^3$, $j = 1, \dots, p$
- Projected locations: denote by $(x_{ij}, y_{ij})^T$ the location of $P_j$ in frame $i$; then
$$x_{ij} = \mathbf{r}_i^T P_j + c_i, \qquad y_{ij} = \mathbf{s}_i^T P_j + d_i$$
where $\mathbf{r}_i^T, \mathbf{s}_i^T$ are the top two rows of a rotation matrix.
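The setup above can be made concrete with synthetic data. The following is a minimal NumPy sketch that draws random rotations and projects a random point cloud orthographically, using $x_{ij} = \mathbf{r}_i^T P_j + c_i$ and $y_{ij} = \mathbf{s}_i^T P_j + d_i$. The helper names `random_rotation` and `orthographic_tracks` and the random scene are illustrative assumptions, not part of the original slides.

```python
import numpy as np


def random_rotation(rng):
    """Random 3x3 rotation (det = +1) from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def orthographic_tracks(P, rotations, offsets):
    """Orthographic projection x_ij = r_i^T P_j + c_i, y_ij = s_i^T P_j + d_i.

    P: 3 x p array of 3D points; rotations: f rotation matrices whose first two
    rows play the roles of r_i^T and s_i^T; offsets: f x 2 array of (c_i, d_i).
    Returns x, y of shape (f, p).
    """
    x = np.array([R[0] @ P + c for R, (c, _) in zip(rotations, offsets)])
    y = np.array([R[1] @ P + d for R, (_, d) in zip(rotations, offsets)])
    return x, y


# Hypothetical synthetic scene: 5 frames, 20 points.
rng = np.random.default_rng(0)
f, p = 5, 20
P = rng.normal(size=(3, p))
rotations = [random_rotation(rng) for _ in range(f)]
offsets = rng.normal(size=(f, 2))
x, y = orthographic_tracks(P, rotations, offsets)
```

Each row of `x` and `y` is one frame, which matches the $(i, j)$ indexing used in the slides.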

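Orthographic SFM: Objective
Find $\mathbf{r}_i, \mathbf{s}_i \in \mathbb{R}^3$, $c_i, d_i \in \mathbb{R}$, and the points $P_j \in \mathbb{R}^3$ that minimize
$$\sum_{i=1}^{f}\sum_{j=1}^{p}\left[\left(\mathbf{r}_i^T P_j + c_i - x_{ij}\right)^2 + \left(\mathbf{s}_i^T P_j + d_i - y_{ij}\right)^2\right]$$
subject to $\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$ and $\mathbf{r}_i^T \mathbf{s}_i = 0$.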

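Eliminate translation
We can eliminate translation by representing the location of each point relative to the centroid of all $p$ points:
- Assume without loss of generality that the centroid of $P_1, \dots, P_p$ coincides with the origin $\mathbf{0} \in \mathbb{R}^3$
- Translate each image point by setting $x_{ij} \leftarrow x_{ij} - \bar{x}_i$ and $y_{ij} \leftarrow y_{ij} - \bar{y}_i$, where $(\bar{x}_i, \bar{y}_i)$ denotes the centroid of the points $(x_{ij}, y_{ij})$ in frame $i$
- Because orthographic projection is affine, the image centroid is the projection of the 3D centroid, so $\bar{x}_i = c_i$ and $\bar{y}_i = d_i$; subtracting it removes the translation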
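The centering step can be checked numerically. This is a minimal sketch with a hypothetical helper `center_tracks` and assumed synthetic data. It subtracts each frame's image centroid and confirms that the result equals the pure rotation part applied to the centered 3D points, so the offsets $c_i, d_i$ drop out.

```python
import numpy as np


def center_tracks(x, y):
    """Subtract each frame's image centroid (x_bar_i, y_bar_i) from its tracks.

    x, y: arrays of shape (f, p); row i holds the p tracked points in frame i.
    """
    x_c = x - x.mean(axis=1, keepdims=True)
    y_c = y - y.mean(axis=1, keepdims=True)
    return x_c, y_c


# Synthetic check (assumed data): orthographic tracks with per-frame offsets c_i, d_i.
rng = np.random.default_rng(1)
f, p = 4, 10
P = rng.normal(size=(3, p))
Rs = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(f)]
c, d = rng.normal(size=f), rng.normal(size=f)
x = np.array([Rs[i][0] @ P + c[i] for i in range(f)])
y = np.array([Rs[i][1] @ P + d[i] for i in range(f)])

x_c, y_c = center_tracks(x, y)
P_c = P - P.mean(axis=1, keepdims=True)  # move the 3D centroid to the origin
```

After centering, `x_c[i]` should equal `Rs[i][0] @ P_c` for every frame, whatever the offsets were.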

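Objective (w/o translation)
Find $\mathbf{r}_i, \mathbf{s}_i \in \mathbb{R}^3$ and the points $P_j$ that minimize
$$\sum_{i=1}^{f}\sum_{j=1}^{p}\left[\left(\mathbf{r}_i^T P_j - x_{ij}\right)^2 + \left(\mathbf{s}_i^T P_j - y_{ij}\right)^2\right]$$
subject to $\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$ and $\mathbf{r}_i^T \mathbf{s}_i = 0$.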

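Measurement matrix
$$M = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p}\\ \vdots & \vdots & & \vdots\\ x_{f1} & x_{f2} & \cdots & x_{fp}\\ y_{11} & y_{12} & \cdots & y_{1p}\\ \vdots & \vdots & & \vdots\\ y_{f1} & y_{f2} & \cdots & y_{fp} \end{pmatrix}_{2f\times p}$$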
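Building $M$ amounts to stacking the centered x-rows on top of the centered y-rows. The sketch below uses a hypothetical helper `measurement_matrix` and noise-free synthetic orthographic tracks (an assumption for illustration). It also checks the property the factorization relies on: for such data, $M$ has rank 3.

```python
import numpy as np


def measurement_matrix(x_c, y_c):
    """Stack centered tracks into the 2f x p measurement matrix M.

    Row i (i < f) holds the x-coordinates of frame i; row f + i holds its y-coordinates.
    """
    x_c = np.asarray(x_c, dtype=float)
    y_c = np.asarray(y_c, dtype=float)
    if x_c.shape != y_c.shape:
        raise ValueError("x and y tracks must have the same (f, p) shape")
    return np.vstack([x_c, y_c])


# Noise-free synthetic tracks (assumed data) with the 3D centroid at the origin.
rng = np.random.default_rng(2)
f, p = 6, 15
P = rng.normal(size=(3, p))
P -= P.mean(axis=1, keepdims=True)
Rs = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(f)]
M = measurement_matrix([R[0] @ P for R in Rs], [R[1] @ P for R in Rs])
rank = np.linalg.matrix_rank(M)
```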

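Transformation and shape matrices
$$T = \begin{pmatrix} \mathbf{r}_1^T\\ \vdots\\ \mathbf{r}_f^T\\ \mathbf{s}_1^T\\ \vdots\\ \mathbf{s}_f^T \end{pmatrix} = \begin{pmatrix} r_{11} & r_{12} & r_{13}\\ \vdots & \vdots & \vdots\\ r_{f1} & r_{f2} & r_{f3}\\ s_{11} & s_{12} & s_{13}\\ \vdots & \vdots & \vdots\\ s_{f1} & s_{f2} & s_{f3} \end{pmatrix}_{2f\times 3}, \qquad S = \begin{pmatrix} X_1 & X_2 & \cdots & X_p\\ Y_1 & Y_2 & \cdots & Y_p\\ Z_1 & Z_2 & \cdots & Z_p \end{pmatrix}_{3\times p}$$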

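Objective: matrix notation
Find $T$ and $S$ that minimize $\|M - TS\|_F$ subject to $\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$ and $\mathbf{r}_i^T \mathbf{s}_i = 0$, where $M$ is $2f\times p$, $T$ is $2f\times 3$, and $S$ is $3\times p$.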

$M = TS + \text{Noise}$
$$\underbrace{M}_{2f\times p} = \underbrace{T}_{2f\times 3}\;\underbrace{S}_{3\times p} + \text{Noise}$$
In the noise-free case $M = TS$, so $M$ has rank at most 3.

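TK-Factorization
$M = TS + \text{Noise}$
Step 1: find a rank-3 approximation to $M$ using the SVD, $M = U\Sigma V^T$, where
- $U$ is $2f\times 2f$ with $U^T U = I$
- $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots)$ has size $2f\times p$, with $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$
- $V$ is $p\times p$ with $V^T V = I$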

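TK-Factorization
$$\hat{M} = U\Sigma_3 V^T, \qquad \Sigma_3 = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3, 0, 0, \dots)$$
Note: this is a relaxation. $\hat{M}$ is the best rank-3 approximation of $M$ in the Frobenius norm (Eckart-Young), but it ignores the orthonormality constraints; only the noise components outside the 3D subspace are annihilated.
Step 2: factorization
$$\hat{T} = U\Sigma_3^{1/2}, \qquad \hat{S} = \Sigma_3^{1/2} V^T$$
(keeping only the first three columns of $\hat{T}$ and the first three rows of $\hat{S}$).
Ambiguity: $\hat{M} = (\hat{T}A)(A^{-1}\hat{S})$ for any non-singular $3\times 3$ matrix $A$.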
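Steps 1 and 2 can be sketched in a few lines of NumPy. This sketch assumes `np.linalg.svd` with `full_matrices=False` (a thin SVD), which yields the same top three singular triplets as the full SVD on the slide. The helper name `tk_factorize` and the synthetic data are illustrative assumptions. The checks confirm that $\hat{M}$ is no farther from $M$ than the noise-free rank-3 matrix, and that any non-singular $A$ leaves $\hat{M}$ unchanged.

```python
import numpy as np


def tk_factorize(M):
    """Steps 1-2: rank-3 truncated SVD, split as T_hat = U Sigma_3^(1/2), S_hat = Sigma_3^(1/2) V^T."""
    U, sigma, Vt = np.linalg.svd(M, full_matrices=False)  # thin SVD; same top-3 factors
    root = np.sqrt(sigma[:3])
    T_hat = U[:, :3] * root          # scale column k by sqrt(sigma_k): 2f x 3
    S_hat = root[:, None] * Vt[:3]   # scale row k by sqrt(sigma_k):    3 x p
    return T_hat, S_hat


# Synthetic rank-3 data plus small noise (assumed values, for illustration only).
rng = np.random.default_rng(3)
f, p = 8, 30
noise = 1e-3 * rng.normal(size=(2 * f, p))
M = rng.normal(size=(2 * f, 3)) @ rng.normal(size=(3, p)) + noise
T_hat, S_hat = tk_factorize(M)
M_hat = T_hat @ S_hat

A = rng.normal(size=(3, 3))                    # generic (hence non-singular) 3x3 matrix
M_alt = (T_hat @ A) @ np.linalg.solve(A, S_hat)  # (T_hat A)(A^{-1} S_hat)
```

Splitting $\Sigma_3^{1/2}$ evenly between the two factors is a convention; any split gives the same $\hat{M}$, which is exactly the ambiguity Step 3 must resolve.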

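TK-Factorization
Step 3: resolve the ambiguity using $\|\mathbf{r}_i\| = \|\mathbf{s}_i\| = 1$ and $\mathbf{r}_i^T \mathbf{s}_i = 0$
- Let $R_i = \begin{pmatrix} \mathbf{r}_i^T\\ \mathbf{s}_i^T \end{pmatrix}$ ($2\times 3$); note that $R_i R_i^T = I$
- Let $\hat{T}_i = \begin{pmatrix} \hat{\mathbf{r}}_i^T\\ \hat{\mathbf{s}}_i^T \end{pmatrix}$ ($2\times 3$) be the corresponding rows of $\hat{T}$; then $R_i = \hat{T}_i A$
- Find a $3\times 3$ symmetric matrix $AA^T$ such that
$$\hat{T}_i AA^T \hat{T}_i^T = R_i R_i^T = I$$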

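TK-Factorization
$$\hat{T}_i AA^T \hat{T}_i^T = R_i R_i^T = I$$
- The equation is linear in $AA^T$
- There are $3f$ equations (two unit-norm conditions and one orthogonality condition per frame) in 6 unknowns (the entries of the symmetric $AA^T$), so at least two frames are needed
- Find $A$ by eigendecomposition, $AA^T = W\Delta W^T$, so that $A = W\Delta^{1/2}$ (this requires the estimated $AA^T$ to be positive definite)
- The solution is obtained up to a rotation ambiguity: $\hat{T}_i (AB)(B^T A^T) \hat{T}_i^T = I$ for any $B$ with $BB^T = I$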
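The metric upgrade is a small linear least-squares problem followed by an eigendecomposition. In this sketch, $Q = AA^T$ is parameterized by its six upper-triangular entries. Each frame contributes the three equations $\hat{\mathbf{r}}_i^T Q \hat{\mathbf{r}}_i = 1$, $\hat{\mathbf{s}}_i^T Q \hat{\mathbf{s}}_i = 1$, and $\hat{\mathbf{r}}_i^T Q \hat{\mathbf{s}}_i = 0$. The helpers `_coeffs` and `metric_upgrade` are hypothetical names. The test data (true rotations scrambled by an assumed matrix `A0`) is synthetic.

```python
import numpy as np


def _coeffs(a, b):
    """Row g with a^T Q b = g @ [q11, q12, q13, q22, q23, q33] for symmetric Q."""
    return np.array([a[0] * b[0],
                     a[0] * b[1] + a[1] * b[0],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[1],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])


def metric_upgrade(T_hat):
    """Step 3: solve for Q = A A^T from orthonormality, then return A = W Delta^(1/2)."""
    f = T_hat.shape[0] // 2
    rows, rhs = [], []
    for i in range(f):
        r, s = T_hat[i], T_hat[f + i]  # rows r_hat_i^T and s_hat_i^T
        rows += [_coeffs(r, r), _coeffs(s, s), _coeffs(r, s)]
        rhs += [1.0, 1.0, 0.0]
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    evals, W = np.linalg.eigh(Q)  # Q = W diag(evals) W^T
    if np.any(evals <= 0):
        raise ValueError("estimated A A^T is not positive definite (too much noise?)")
    return W * np.sqrt(evals)  # scale column k of W by sqrt(evals[k])


# Synthetic test: true orthographic rows scrambled by an assumed ambiguity A0.
rng = np.random.default_rng(4)
f = 6
Rs = [np.linalg.qr(rng.normal(size=(3, 3)))[0] for _ in range(f)]
T_true = np.vstack([np.array([R[0] for R in Rs]), np.array([R[1] for R in Rs])])
A0 = np.eye(3) + 0.3 * rng.normal(size=(3, 3))
T_hat = T_true @ np.linalg.inv(A0)
A = metric_upgrade(T_hat)
T_up = T_hat @ A
```

The recovered `A` generally differs from `A0` by an orthogonal factor $B$. This is the rotation (and reflection) ambiguity that the slide notes, so the check is on `A @ A.T` and on the orthonormality of each frame's rows rather than on `A` itself.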

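TK-Factorization: Summary
1. Eliminate translation and construct $M$
2. Compute $\mathrm{SVD}(M)$ to get the rank-3 $\hat{M}$ and factorize $\hat{M} = \hat{T}\hat{S}$ (a $3\times 3$ ambiguity $A$ remains)
3. Resolve the ambiguity: estimate $AA^T$ from the orthonormality constraints and factorize it to obtain $A$
The solution is determined up to a rotation and a reflection.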

Incomplete tracks
- Tracks are often incomplete, which calls for factorization with missing data
- The rank constraint is difficult to enforce: the SVD no longer applies directly when entries of $M$ are missing
- Surrogate: minimize the nuclear norm, the sum of singular values $\sigma_1 + \sigma_2 + \sigma_3 + \dots$
- The nuclear norm is convex, and its minimization often achieves low rank
- Accurate reconstruction usually requires accounting for perspective distortion
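One common way to use the nuclear-norm surrogate is a Soft-Impute style iteration. Each step fills the missing entries with the current estimate and then soft-thresholds the singular values, which is the proximal step of the nuclear norm. This is a swapped-in illustration of the penalized problem $\tfrac{1}{2}\|P_\Omega(M - Z)\|_F^2 + \lambda\|Z\|_*$, not necessarily the method behind the slide. The helpers `svt` and `soft_impute`, the value `lam=0.1`, and the synthetic data are assumptions. The sketch also ignores how to center tracks when some points are missing.

```python
import numpy as np


def svt(Z, lam):
    """Proximal operator of lam * nuclear norm: soft-threshold the singular values."""
    U, sigma, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(sigma - lam, 0.0)) @ Vt


def soft_impute(M, observed, lam=0.1, iters=500):
    """Approximately minimize 0.5 * ||observed * (M - Z)||_F^2 + lam * ||Z||_*.

    M: measurement matrix (entries where observed is False are ignored);
    observed: boolean mask, True where the track was seen in that frame.
    """
    Z = np.where(observed, M, 0.0)
    for _ in range(iters):
        # Keep observed entries, fill the rest from the current estimate, then shrink.
        Z = svt(np.where(observed, M, Z), lam)
    return Z


# Synthetic rank-3 matrix with roughly 20% of entries missing (assumed setup).
rng = np.random.default_rng(5)
M_true = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 40))
observed = rng.random(M_true.shape) > 0.2
M_obs = np.where(observed, M_true, np.nan)  # unobserved entries are unknown
Z = soft_impute(M_obs, observed)
err = np.linalg.norm((Z - M_true)[~observed]) / np.linalg.norm(M_true[~observed])
```

A larger `lam` pushes harder toward low rank but also shrinks the recovered singular values. In this sketch it is a tuning choice, not a value from the slides.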

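Perspective projection
A point $P = (X, Y, Z)$ is projected to
$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}$$
where $f$ here denotes the focal length (not the number of frames). A point rotated by $R$ and translated by $\mathbf{t}$ projects to
$$x = \frac{f(\mathbf{r}_1^T P + t_x)}{\mathbf{r}_3^T P + t_z}, \qquad y = \frac{f(\mathbf{r}_2^T P + t_y)}{\mathbf{r}_3^T P + t_z}$$
where $\mathbf{r}_k^T$ ($k = 1, 2, 3$) denote the rows of $R$. We call the $3\times 4$ matrix $C = K[R, \mathbf{t}]$ a camera matrix: $K$ is the calibration matrix, $R$ the camera orientation, and $\mathbf{t}$ the translation (the camera center is located at $-R^T\mathbf{t}$).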
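The camera-matrix form of the projection can be sketched with homogeneous coordinates. With $K = \mathrm{diag}(f, f, 1)$ (an assumption: zero skew, principal point at the origin), multiplying by $C = K[R, \mathbf{t}]$ and dividing by the third coordinate reproduces the two formulas above. The function name `project` is hypothetical.

```python
import numpy as np


def project(K, R, t, P):
    """Project 3D points P (3 x n) with the camera matrix C = K [R | t]; returns 2 x n."""
    C = K @ np.hstack([R, np.reshape(t, (3, 1))])   # 3 x 4 camera matrix
    P_h = np.vstack([P, np.ones((1, P.shape[1]))])  # homogeneous coordinates (4 x n)
    x_h = C @ P_h
    return x_h[:2] / x_h[2]                         # divide by depth r_3^T P + t_z


focal = 2.0
K = np.diag([focal, focal, 1.0])  # assumed: zero skew, principal point at the origin
uv = project(K, np.eye(3), np.zeros(3), np.array([[1.0], [2.0], [4.0]]))
# With R = I and t = 0: x = 2 * 1 / 4 = 0.5 and y = 2 * 2 / 4 = 1.0
```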

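Bundle adjustment
Given $p$ points tracked in $f$ frames, $(x_{ij}, y_{ij})$, find camera matrices $C_i$ ($i = 1, \dots, f$) and positions $P_j$ ($j = 1, \dots, p$) that minimize
$$\sum_{i=1}^{f}\sum_{j=1}^{p}\left[\left(\frac{f(\mathbf{r}_{i1}^T P_j + t_{ix})}{\mathbf{r}_{i3}^T P_j + t_{iz}} - x_{ij}\right)^2 + \left(\frac{f(\mathbf{r}_{i2}^T P_j + t_{iy})}{\mathbf{r}_{i3}^T P_j + t_{iz}} - y_{ij}\right)^2\right]$$
where $\mathbf{r}_{ik}^T$ is the $k$-th row of $R_i$ and $f$ in the numerators is the focal length.
- Alternate optimization: given $R_i$ and $\mathbf{t}_i$, solve for $P_j$; given $P_j$, solve for $R_i$ and $\mathbf{t}_i$
- The objective is non-convex, so a very good initial guess is required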
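Two pieces of the alternation can be sketched directly: the objective itself, and the "given $R_i, \mathbf{t}_i$, solve for $P_j$" half-step. For the point step, this sketch swaps in a linear (DLT-style) least-squares triangulation. Rearranging $x = f(\mathbf{r}_1^T P + t_x)/(\mathbf{r}_3^T P + t_z)$ gives $(f\mathbf{r}_1 - x\mathbf{r}_3)^T P = x t_z - f t_x$. This linear form minimizes an algebraic error rather than the reprojection error exactly. The function names, the shared focal length, and the synthetic cameras are assumptions.

```python
import numpy as np


def reprojection_error(focal, Rs, ts, P, x, y):
    """Bundle-adjustment objective: total squared reprojection error.

    Rs, ts: per-frame rotations (3 x 3) and translations (3,); P: 3 x p points;
    x, y: observed image coordinates of shape (f, p).
    """
    total = 0.0
    for i, (R, t) in enumerate(zip(Rs, ts)):
        Pc = R @ P + t[:, None]  # points in camera i's coordinates
        total += np.sum((focal * Pc[0] / Pc[2] - x[i]) ** 2)
        total += np.sum((focal * Pc[1] / Pc[2] - y[i]) ** 2)
    return float(total)


def triangulate_point(focal, Rs, ts, xj, yj):
    """Point half-step via linear least squares: two equations per frame."""
    A, b = [], []
    for R, t, u, v in zip(Rs, ts, xj, yj):
        A.append(focal * R[0] - u * R[2])
        b.append(u * t[2] - focal * t[0])
        A.append(focal * R[1] - v * R[2])
        b.append(v * t[2] - focal * t[1])
    return np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]


def rot_y(angle):
    """Rotation about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Assumed synthetic setup: three cameras 5 units from the origin, focal length 500.
rng = np.random.default_rng(6)
focal = 500.0
Rs = [rot_y(a) for a in (-0.2, 0.0, 0.2)]
ts = [np.array([0.0, 0.0, 5.0]) for _ in Rs]
P = rng.uniform(-1.0, 1.0, size=(3, 10))
x = np.array([focal * (R[0] @ P + t[0]) / (R[2] @ P + t[2]) for R, t in zip(Rs, ts)])
y = np.array([focal * (R[1] @ P + t[1]) / (R[2] @ P + t[2]) for R, t in zip(Rs, ts)])

# One "given R_i, t_i, solve for P_j" half-step, point by point.
P_est = np.column_stack([triangulate_point(focal, Rs, ts, x[:, j], y[:, j])
                         for j in range(P.shape[1])])
```

In practice the camera half-step, or a joint nonlinear least-squares solver, would then refine $R_i, \mathbf{t}_i$. Neither is shown here.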

Bundler (photo-tourism) (Snavely et al.)

Bundler (photo-tourism)
- Given images, identify feature points and describe them with SIFT descriptors
- Match SIFT descriptors, accepting a match $p_i \leftrightarrow p_j$ only if its score is at least twice that of any other match $p_i \leftrightarrow p_k$
- For every pair of images with sufficiently many matches, use RANSAC to recover the essential matrix
- Start with two images and add one image at a time: use the essential matrix to recover depth, then apply bundle adjustment
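The match-acceptance rule can be sketched as a nearest-neighbour ratio test. Here "score" is interpreted as inverse Euclidean descriptor distance. That interpretation is an assumption, and it turns "at least twice any other match" into $d_{\text{best}} \le 0.5\, d_{\text{second}}$. Bundler itself may use a different threshold or score. The function `ratio_test_matches` and the random 128-dimensional descriptors (SIFT's length) are illustrative.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio=0.5):
    """Match rows of desc_a (n x d) to rows of desc_b (m x d, m >= 2).

    Keeps (i, j) only if the nearest neighbour j satisfies d_best <= ratio * d_second,
    i.e. its inverse-distance score is at least 1 / ratio times that of any other match.
    """
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)  # n x m
    order = np.argsort(dists, axis=1)
    matches = []
    for i, (j1, j2) in enumerate(order[:, :2]):
        if dists[i, j1] <= ratio * dists[i, j2]:
            matches.append((i, int(j1)))
    return matches


# Assumed data: two distinctive query descriptors and one ambiguous one.
rng = np.random.default_rng(9)
desc_b = rng.normal(size=(10, 128))
desc_a = np.vstack([desc_b[3] + 0.01 * rng.normal(size=128),
                    desc_b[7] + 0.01 * rng.normal(size=128),
                    0.5 * (desc_b[0] + desc_b[1])])  # equidistant from two candidates
matches = ratio_test_matches(desc_a, desc_b)
```

The third query sits halfway between two database descriptors, so its best and second-best distances are equal and the test rejects it.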

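Simultaneous solutions
- $E_{ij}$: essential matrix between $I_i$ and $I_j$, $i, j = 1, \dots, f$, with $E_{ij} = [\mathbf{t}_{ij}]_\times R_{ij}$ (available on a subset of image pairs; $[\mathbf{t}]_\times$ denotes the cross-product matrix of $\mathbf{t}$)
- Objective: recover the camera orientations $R_i$ and locations $\mathbf{t}_i$ relative to a global coordinate system
$$\min_{\{R_i\}} \sum_{(i,j)} \|R_{ij} - R_i R_j^T\|_F$$
- This can be solved in various ways, for example
$$\min_{\{R_i\}} \sum_{(i,j)} \|R_{ij} R_j - R_i\|_F^2$$
which is a linear least-squares problem if we ignore the orthonormality constraints on $R_i$. One rotation must be fixed (e.g., $R_1 = I$) to exclude the trivial solution $R_i = 0$, and the results are typically projected back onto the nearest rotation matrices.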
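The relaxed least-squares formulation can be sketched as one linear system. The unknown rotations are stacked into a $3n\times 3$ matrix, each pair $(i, j)$ contributes the block equation $R_{ij}R_j - R_i = 0$, and the first rotation is fixed to $I$ as the gauge. Each $3\times 3$ block of the solution is then projected onto the nearest rotation via the SVD. The function names, 0-based camera indices, and the chosen subset of pairs are assumptions for illustration.

```python
import numpy as np


def project_to_so3(X):
    """Nearest rotation matrix to X in the Frobenius norm (via SVD)."""
    U, _, Vt = np.linalg.svd(X)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt


def average_rotations(n, rel):
    """Linear rotation averaging: min sum ||R_ij R_j - R_i||_F^2, orthonormality ignored.

    n: number of cameras; rel: dict {(i, j): R_ij} with R_ij ~ R_i R_j^T (0-based).
    The gauge is fixed by R_0 = I, which rules out the trivial solution R_i = 0.
    """
    B = np.zeros((3 * len(rel), 3 * n))
    for k, ((i, j), Rij) in enumerate(rel.items()):
        B[3 * k:3 * k + 3, 3 * j:3 * j + 3] = Rij         # + R_ij R_j
        B[3 * k:3 * k + 3, 3 * i:3 * i + 3] = -np.eye(3)  # - R_i
    # Unknowns form a 3n x 3 matrix; move the known block R_0 = I to the right side.
    rhs = -B[:, :3]
    X, *_ = np.linalg.lstsq(B[:, 3:], rhs, rcond=None)
    blocks = [np.eye(3)] + [X[3 * k:3 * k + 3] for k in range(n - 1)]
    return [project_to_so3(Xk) for Xk in blocks]


def random_rotation(rng):
    """Random proper rotation from a sign-corrected QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    return q if np.linalg.det(q) > 0 else -q


rng = np.random.default_rng(7)
n = 5
R_true = [random_rotation(rng) for _ in range(n)]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (1, 4)]  # assumed subset of image pairs
rel = {(i, j): R_true[i] @ R_true[j].T for i, j in pairs}
R_est = average_rotations(n, rel)
```

With noise-free relative rotations, the estimates equal $R_i R_0^T$, the true orientations expressed in the frame of camera 0.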

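Essential matrix in global coordinates
Corresponding points $p$ and $q$ in images $I_i$ and $I_j$ satisfy
$$p^T R_i^T \left([\mathbf{t}_i]_\times - [\mathbf{t}_j]_\times\right) R_j\, q = 0$$
This generalizes the formula for the essential matrix: plugging in $R_i = I$ and $\mathbf{t}_i = \mathbf{0}$ gives $-p^T [\mathbf{t}_j]_\times R_j\, q = 0$, equivalently $p^T [\mathbf{t}_j]_\times R_j\, q = 0$. Once the camera orientations $R_i$ are known, the constraint is linear in the camera locations $\mathbf{t}_i$, so we can solve for them. The solution suffers from shrinkage problems.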
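The global constraint can be checked numerically. The slide does not state its convention, so this sketch assumes one under which the formula holds. $R_k$ maps camera-$k$ coordinates to world coordinates and $\mathbf{t}_k$ is the camera center, so camera $k$ sees a world point $X$ along the ray $R_k^T(X - \mathbf{t}_k)$, up to scale. Under that assumption, $R_i p$, $R_j q$, and $\mathbf{t}_j - \mathbf{t}_i$ are coplanar, and the residual vanishes. The names `skew` and `global_epipolar_residual` are hypothetical.

```python
import numpy as np


def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def global_epipolar_residual(p, q, Ri, ti, Rj, tj):
    """Left-hand side p^T R_i^T ([t_i]_x - [t_j]_x) R_j q of the constraint."""
    return p @ Ri.T @ (skew(ti) - skew(tj)) @ Rj @ q


def random_rotation(rng):
    """Random proper rotation from a sign-corrected QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    return q if np.linalg.det(q) > 0 else -q


# Assumed convention: camera k has orientation R_k (camera-to-world) and center t_k.
rng = np.random.default_rng(8)
Ri, Rj = random_rotation(rng), random_rotation(rng)
ti, tj = rng.normal(size=3), rng.normal(size=3)
X = 3.0 * rng.normal(size=3)
p = Ri.T @ (X - ti)  # homogeneous image ray in camera i (scale does not matter)
q = Rj.T @ (X - tj)  # homogeneous image ray in camera j
residual = global_epipolar_residual(p, q, Ri, ti, Rj, tj)
```

Because the constraint is homogeneous in $p$ and $q$, normalizing the rays (for example dividing by the third coordinate) would not change whether it holds.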

Reconstruction example