Global Alignment and Structure from Motion Computer Vision CSE455, Winter 2008 Noah Snavely.


Global Alignment and Structure from Motion Computer Vision CSE455, Winter 2008 Noah Snavely

Readings
- Snavely, Seitz, and Szeliski. Photo Tourism: Exploring Photo Collections in 3D. SIGGRAPH.
- Supplementary reading: Szeliski and Kang. Recovering 3D shape and motion from image streams using non-linear least squares. J. Visual Communication and Image Representation.

Problem: Drift
[Figure: panorama strip with a copy of the first image at (x_1, y_1) and again at the end at (x_n, y_n)]
- add another copy of the first image at the end
- this gives a constraint: y_n = y_1
- there are a bunch of ways to solve this problem:
  - add a displacement of (y_1 - y_n)/(n - 1) to each image after the first
  - compute a global warp: y' = y + ax
  - run a big optimization problem, incorporating this constraint
    - best solution, but more complicated
    - known as "bundle adjustment"
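The first fix listed above (spreading the accumulated drift evenly over the sequence) takes only a few lines. A minimal numpy sketch, assuming the per-image y-offsets have already been estimated and that the last entry corresponds to the repeated copy of the first image:

```python
import numpy as np

def remove_drift(y_offsets):
    """Spread the accumulated drift evenly across the sequence.
    y_offsets[j] is the estimated y-translation of image j; the last
    entry is the repeated copy of the first image, so after correction
    the constraint y_n = y_1 holds."""
    y = np.asarray(y_offsets, dtype=float)
    n = len(y)
    drift = y[-1] - y[0]                       # total accumulated error
    return y - drift * np.arange(n) / (n - 1)  # linear ramp correction
```

Note this applies a linear ramp, which is one reading of "add a displacement of (y_1 - y_n)/(n - 1) to each image after the first"; the global-warp and bundle-adjustment options handle drift more gracefully.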

Global optimization
Minimize a global energy function:
- What are the variables? The translation t_j = (x_j, y_j) for each image
- What is the objective function? We have a set of matched features p_{i,j} = (u_{i,j}, v_{i,j}). For each point match (p_{i,j}, p_{i,j+1}): p_{i,j+1} - p_{i,j} = t_{j+1} - t_j
[Figure: images I_1 ... I_4 with matched features p_{1,1} ... p_{4,4}]

Global optimization
[Figure: images I_1 ... I_4 with matched features p_{i,j}]
p_{1,2} - p_{1,1} = t_2 - t_1
p_{1,3} - p_{1,2} = t_3 - t_2
p_{2,3} - p_{2,2} = t_3 - t_2
...
v_{4,1} - v_{4,4} = y_1 - y_4
minimize sum_{i,j} w_{i,j} ||(t_{j+1} - t_j) - (p_{i,j+1} - p_{i,j})||^2
where w_{i,j} = 1 if feature i is visible in images j and j+1, 0 otherwise

Global optimization
Stacking all the constraints gives a linear system A x = b, where A is 2m x 2n, x is 2n x 1, and b is 2m x 1 (m point matches, n images).
[Figure: images I_1 ... I_4 with matched features p_{i,j}]

Global optimization
Defines a least squares problem: minimize ||Ax - b||^2
Solution: x satisfies the normal equations A^T A x = A^T b
Problem: there is no unique solution for x! (det(A^T A) = 0)
We can add a global offset to a solution and get the same error
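This least squares setup can be sketched directly in numpy. A minimal sketch for one axis (the x and y translations decouple), with an extra row fixing the first translation to zero to remove the global-offset ambiguity; the `(j, d)` match-list format is an illustrative assumption:

```python
import numpy as np

def solve_translations(matches, n_images):
    """Least-squares translations from pairwise constraints.
    matches: list of (j, d) pairs meaning a feature seen in images j and
    j+1 moved by d, giving the constraint t[j+1] - t[j] = d.
    A final row fixes t[0] = 0 to remove the gauge ambiguity."""
    rows, b = [], []
    for j, d in matches:
        r = np.zeros(n_images)
        r[j], r[j + 1] = -1.0, 1.0   # one row of A per constraint
        rows.append(r)
        b.append(d)
    r0 = np.zeros(n_images)
    r0[0] = 1.0                      # gauge-fixing row: t[0] = 0
    A = np.vstack(rows + [r0])
    t, *_ = np.linalg.lstsq(A, np.array(b + [0.0]), rcond=None)
    return t
```

Without the gauge-fixing row, `A^T A` is singular, exactly as the slide warns.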

Ambiguity in global location
Each of these solutions has the same error
Called the gauge ambiguity
Solution: fix the position of one image (e.g., make the origin of the 1st image (0, 0))
[Figure: the same reconstruction shifted to (0, 0), (-100, -100), and (200, -200)]

Solving for camera parameters
Projection equation: x ~ K [R | t] X
Recap: a camera is described by several parameters
- Translation t of the optical center from the origin of world coordinates
- Rotation R of the image plane
- focal length f, principal point (x'_c, y'_c), pixel size (s_x, s_y)
- blue parameters are called "extrinsics," red are "intrinsics"

Solving for camera parameters
The projection matrix Pi = K [R | t] models the cumulative effect of all of these parameters (extrinsics R and t, intrinsics K)
Useful to decompose it into a series of operations: intrinsics x projection x rotation x translation (padded out with an identity matrix)
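The composition above can be written out in a few lines. A sketch assuming square pixels and zero skew (simplifications of the full parameter list):

```python
import numpy as np

def projection_matrix(f, cx, cy, R, t):
    """P = K [R | t]: intrinsics times extrinsics, as decomposed above.
    Assumes square pixels and zero skew for brevity."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Apply P to 3D points (n x 3) and do the perspective division."""
    Xh = np.column_stack([X, np.ones(len(X))])  # homogeneous coordinates
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```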

Solving for camera rotation Instead of spherically warping the images and solving for translation, we can directly solve for the rotation R j of each camera Can handle tilt / twist

Solving for camera rotation
What if we want to handle tilt / twist?
- [images here]
Instead of doing a spherical warp and solving for translations, directly solve for the rotation R_i of each camera
A pixel (u, v) in an image with focal length f corresponds to the 3D ray (u, v, f); the rotation R maps it to a world direction (x, y, z)

Solving for rotations
[Figure: images I_1, I_2 with rotations R_1, R_2, focal length f, and matched points (u_11, v_11), (u_12, v_12)]
(u_11, v_11, f) = p_11
After rotating, corresponding rays should coincide: R_1 p_11 = R_2 p_12

Solving for rotations
minimize sum over all matches of ||R_j p_{i,j} - R_k p_{i,k}||^2 (rotated rays for the same feature should agree)

3D rotations
How many degrees of freedom are there? (three)
How do we represent a rotation?
- Rotation matrix (too many degrees of freedom)
- Euler angles (e.g., yaw, pitch, and roll)
- Quaternions (4-vector on the unit sphere)
Usually involves non-linear optimization
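One concrete minimal representation often used inside the optimizer is axis-angle (three parameters: rotation axis scaled by angle). A sketch of Rodrigues' formula converting it to a rotation matrix:

```python
import numpy as np

def rotation_from_axis_angle(w):
    """Rodrigues' formula: w = axis * angle (3 DOF) -> 3x3 rotation.
    A minimal parameterization, convenient inside non-linear solvers."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)             # near-zero rotation
    k = np.asarray(w, dtype=float) / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])   # cross-product matrix [k]_x
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
```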

Solving for rotations and translations Structure from motion (SfM) Unlike with panoramas, we often need to solve for structure (3D point positions) as well as motion (camera parameters)

Structure from motion
[Figure: Cameras 1-3 with poses (R_1, t_1), (R_2, t_2), (R_3, t_3) observing 3D points p_1 ... p_7]
minimize f(R, T, P)

[Figure: 3D points x_1 ... x_7 projecting to observations p_{1,1}, p_{1,2}, p_{1,3}, ... in Images 1-3 with poses (R_1, t_1), (R_2, t_2), (R_3, t_3)]

Structure from motion
Input: images with points in correspondence p_{i,j} = (u_{i,j}, v_{i,j})
Output:
- structure: 3D location x_i for each point p_i
- motion: camera parameters R_j, t_j
Objective function: minimize reprojection error
[Figure: reconstruction shown from the side and from the top]


SfM objective function
Given a point x and a rotation and translation R, t, let P(x, R, t) denote the predicted image location of x
Minimize the sum of squared reprojection errors:
g(X, R, T) = sum_i sum_j w_{i,j} ||P(x_i, R_j, t_j) - (u_{i,j}, v_{i,j})||^2
where P(x_i, R_j, t_j) is the predicted image location and (u_{i,j}, v_{i,j}) is the observed image location
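The objective g can be written out directly. A numpy sketch; the array layouts for the observations `obs` and visibility weights `w` are illustrative assumptions, not part of the slide:

```python
import numpy as np

def reprojection_error(X, Rs, ts, K, obs, w):
    """g(X, R, T) = sum_i sum_j w_ij || P(x_i, R_j, t_j) - (u_ij, v_ij) ||^2.
    X: n x 3 points; Rs, ts: m camera poses; obs: n x m x 2 observed
    pixel locations; w: n x m visibility mask."""
    g = 0.0
    for j, (R, t) in enumerate(zip(Rs, ts)):
        cam = (K @ (X @ R.T + t).T).T     # points in image j's coordinates
        pred = cam[:, :2] / cam[:, 2:3]   # perspective division
        for i in range(len(X)):
            if w[i, j]:
                g += np.sum((pred[i] - obs[i, j]) ** 2)
    return g
```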

Solving structure from motion
Minimizing g is difficult:
- g is non-linear due to rotations and perspective division
- lots of parameters: 3 for each 3D point, 6 for each camera
- difficult to initialize
- gauge ambiguity: error is invariant to a similarity transform (translation, rotation, uniform scale)
Many techniques use non-linear least squares (NLLS) optimization (bundle adjustment)
- Levenberg-Marquardt is one common algorithm for NLLS
- Lourakis, The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm.
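To make the NLLS machinery concrete, here is a minimal Levenberg-Marquardt loop in numpy, demonstrated on a toy one-parameter curve fit rather than a full bundle adjustment (real solvers add sparsity exploitation, robust losses, and better damping schedules):

```python
import numpy as np

def levenberg_marquardt(residual, jac, x0, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt loop for minimizing ||r(x)||^2.
    residual(x) -> residual vector r, jac(x) -> Jacobian J."""
    x = np.asarray(x0, dtype=float)
    cost = np.sum(residual(x) ** 2)
    for _ in range(iters):
        r, J = residual(x), jac(x)
        # damped normal equations: (J^T J + lam I) dx = -J^T r
        dx = np.linalg.solve(J.T @ J + lam * np.eye(len(x)), -J.T @ r)
        new_cost = np.sum(residual(x + dx) ** 2)
        if new_cost < cost:
            x, cost, lam = x + dx, new_cost, lam * 0.5   # accept, relax damping
        else:
            lam *= 10.0                                  # reject, damp harder
    return x
```

Small damping makes a step behave like Gauss-Newton; large damping like a short gradient step, which is what lets LM cope with the non-linearity noted above.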

Extensions to SfM
- Can also solve for intrinsic parameters (focal length, radial distortion, etc.)
- Can use a more robust function than squared error, to avoid fitting to outliers
For more information, see: Triggs et al., "Bundle Adjustment – A Modern Synthesis", Vision Algorithms 2000.

Photo Tourism Structure from motion on Internet photo collections

Photo Tourism

Photo Tourism overview
Input photographs -> Scene reconstruction -> Photo Explorer
The reconstruction produces relative camera positions and orientations, a point cloud, and sparse correspondence
[Note: change to Trevi for consistency]

Scene reconstruction
Feature detection -> Pairwise feature matching -> Correspondence estimation -> Incremental structure from motion

Feature detection Detect features using SIFT [Lowe, IJCV 2004]
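As a flavor of what SIFT's first stage does, here is a toy difference-of-Gaussians detector. This is an illustration only: real SIFT adds scale-space pyramids, subpixel refinement, edge suppression, orientation assignment, and descriptors, and the constants below are arbitrary assumptions:

```python
import numpy as np

def dog_keypoints(img, s1=1.0, s2=1.6, thresh=0.02):
    """Toy difference-of-Gaussians detector: blur at two scales,
    subtract, and keep strong local extrema of the response."""
    def blur(im, sigma):
        r = int(3 * sigma) + 1
        x = np.arange(-r, r + 1)
        k = np.exp(-x ** 2 / (2 * sigma ** 2))
        k /= k.sum()
        # separable Gaussian: convolve rows, then columns
        im = np.apply_along_axis(np.convolve, 1, im, k, mode='same')
        return np.apply_along_axis(np.convolve, 0, im, k, mode='same')
    d = blur(img, s2) - blur(img, s1)
    kps = []
    for y in range(1, d.shape[0] - 1):
        for x in range(1, d.shape[1] - 1):
            p = d[y - 1:y + 2, x - 1:x + 2]
            # keep strong local extrema of the DoG response
            if abs(d[y, x]) > thresh and d[y, x] in (p.max(), p.min()):
                kps.append((x, y))
    return kps
```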


Feature matching Match features between each pair of images

Feature matching
Refine the matching using RANSAC [Fischler & Bolles 1987] to estimate a fundamental matrix between each image pair
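The RANSAC loop itself is short: hypothesize a model from a minimal sample, score it by inlier count, keep the best, and refit. A sketch on a toy translation model rather than the fundamental matrix (the real pipeline hypothesizes F from 7- or 8-point minimal samples the same way):

```python
import numpy as np

def ransac_translation(pts1, pts2, iters=200, tol=1.0, seed=0):
    """RANSAC on a toy translation model between matched point sets.
    Returns the best translation and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_t, best_in = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(pts1))            # minimal sample: one match
        t = pts2[i] - pts1[i]                  # hypothesized model
        err = np.linalg.norm(pts2 - (pts1 + t), axis=1)
        inliers = err < tol                    # score by inlier count
        if inliers.sum() > best_in.sum():
            best_t, best_in = t, inliers
    # refit the model on all inliers of the best hypothesis
    best_t = (pts2[best_in] - pts1[best_in]).mean(axis=0)
    return best_t, best_in
```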

Incremental structure from motion

Problem size
Trevi Fountain collection: 466 input photos + more than 100,000 3D points = a very large optimization problem

Photo Tourism overview Scene reconstruction Photo Explorer Input photographs

Photo Explorer

Demo

Overhead map

Prague Old Town Square

Annotations

Reproduced with permission of Yahoo! Inc. © 2005 by Yahoo! Inc. YAHOO! and the YAHOO! logo are trademarks of Yahoo! Inc.

Annotations

Yosemite

Two-view structure from motion
Simpler case: we can consider motion independently of structure
Let's first consider the case where the intrinsic matrix K is known
- Each image point (u_{i,j}, v_{i,j}, 1) can be multiplied by K^-1 to form a 3D ray
- We call this the calibrated case

Notes on two-view geometry
How can we express the epipolar constraint?
Answer: there is a 3x3 matrix E such that p'^T E p = 0
E is called the essential matrix
[Figure: epipolar plane through p and p', with the epipolar line in each image]

Properties of the essential matrix
p'^T E p = 0
- Ep is the epipolar line associated with p
- e and e' are called epipoles: Ee = 0 and E^T e' = 0
- E can be solved for with 5 point matches
  - see Nister, An efficient solution to the five-point relative pose problem. PAMI.
[Figure: epipolar geometry with epipolar line Ep and epipoles e, e']
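For a known relative pose the standard construction is E = [t]_x R, which makes the properties above easy to check numerically. A short numpy sketch (the pose and test point below are arbitrary):

```python
import numpy as np

def essential_from_pose(R, t):
    """Standard construction E = [t]_x R for relative pose (R, t);
    calibrated rays p, p' then satisfy p'^T E p = 0."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])   # cross-product matrix [t]_x
    return tx @ R
```

A quick check confirms the epipolar constraint and that E is singular with two equal non-zero singular values.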


The fundamental matrix
If K is not known, then we use a related matrix called the fundamental matrix, F
- Called the uncalibrated case
- F = K^-T E K^-1
- F can be solved for linearly with eight points, or non-linearly with six or seven points

More information
Paper: "Photo Tourism: Exploring photo collections in 3D."

Properties of the Essential Matrix
- E^T p' is the epipolar line associated with p'
- Ee = 0 and E^T e' = 0
- E is singular
- E has two equal non-zero singular values (Huang and Faugeras, 1989)

Fundamental matrix

Give example here

Eight-point algorithm
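A sketch of the linear eight-point algorithm in numpy. The Hartley-style point normalization is an addition beyond the bare linear method named by the slide title (it greatly improves conditioning), and rank 2 is enforced via SVD:

```python
import numpy as np

def eight_point(pts1, pts2):
    """Normalized eight-point algorithm: estimate F from >= 8 matches
    so that x2^T F x1 = 0 (F is recovered up to scale)."""
    def normalize(pts):
        # Hartley normalization: zero mean, average distance sqrt(2)
        mean = pts.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
        T = np.array([[scale, 0.0, -scale * mean[0]],
                      [0.0, scale, -scale * mean[1]],
                      [0.0, 0.0, 1.0]])
        h = np.column_stack([pts, np.ones(len(pts))])
        return (T @ h.T).T, T

    n1, T1 = normalize(pts1)
    n2, T2 = normalize(pts2)
    # one row per match: the constraint n2^T F n1 = 0, linear in F's entries
    A = np.column_stack([n2[:, :1] * n1, n2[:, 1:2] * n1, n1])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # enforce rank 2 by zeroing the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1   # undo the normalization
```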

Multi-view structure from motion

Factorization [Steal partly from Rick’s slides]
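A sketch of rank-3 affine factorization in the spirit of Tomasi-Kanade, assuming an orthographic/affine camera model; the metric upgrade (resolving the affine ambiguity with orthonormality constraints) is omitted:

```python
import numpy as np

def factorize(W):
    """Rank-3 affine factorization of a 2m x n measurement matrix W
    (m cameras, n points). Returns motion M (2m x 3) and structure
    X (3 x n), recovered up to an affine ambiguity."""
    W_c = W - W.mean(axis=1, keepdims=True)   # subtract per-row centroids
    U, S, Vt = np.linalg.svd(W_c, full_matrices=False)
    M = U[:, :3] * np.sqrt(S[:3])             # split the rank-3 part
    X = np.sqrt(S[:3])[:, None] * Vt[:3]      # between motion and structure
    return M, X
```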

Bundle adjustment [copy some from Rick’s slides]

What about bad matches?

What if the frames aren’t given in order?

Photo Tourism [describe the process here]

Photo Tourism [show some results here]