Lecture 12: Structure from motion CS6670: Computer Vision Noah Snavely.

Slides:

Advertisements

Similar presentations

Structure from motion.

Advertisements

Assignment 4 Cloning Yourself

The fundamental matrix F

Lecture 11: Two-view geometry

Building Rome in a Day Sameer Agarwal1 Noah Snavely2 Ian Simon1 Steven M. Seitz1 Richard Szeliski3 1University of Washington 2Cornell University 3Microsoft.

MASKS © 2004 Invitation to 3D vision Lecture 7 Step-by-Step Model Buidling.

Computer vision: models, learning and inference

Two-View Geometry CS Sastry and Yang

Two-view geometry.

Discrete-Continuous Optimization for Large-scale Structure from Motion David Crandall, Andrew Owens, Noah Snavely, Dan Huttenlocher Presented by: Rahul.

Camera calibration and epipolar geometry

Structure from motion.

Lecture 23: Structure from motion and multi-view stereo

Lecture 11: Structure from motion, part 2 CS6670: Computer Vision Noah Snavely.

Global Alignment and Structure from Motion

Epipolar geometry. (i)Correspondence geometry: Given an image point x in the first view, how does this constrain the position of the corresponding point.

Structure from motion. Multiple-view geometry questions Scene geometry (structure): Given 2D point matches in two or more images, where are the corresponding.

Lecture 21: Multiple-view geometry and structure from motion

Many slides and illustrations from J. Ponce

Lecture 11: Structure from motion CS6670: Computer Vision Noah Snavely.

Lecture 22: Structure from motion CS6670: Computer Vision Noah Snavely.

Lecture 16: Single-view modeling, Part 2 CS6670: Computer Vision Noah Snavely.

CS664 Lecture #19: Layers, RANSAC, panoramas, epipolar geometry Some material taken from:  David Lowe, UBC  Jiri Matas, CMP Prague

Single-view geometry Odilon Redon, Cyclops, 1914.

Camera parameters Extrinisic parameters define location and orientation of camera reference frame with respect to world frame Intrinsic parameters define.

Lecture 10: Robust fitting CS4670: Computer Vision Noah Snavely.

Global Alignment and Structure from Motion Computer Vision CSE455, Winter 2008 Noah Snavely.

CSCE 641 Computer Graphics: Image-based Modeling (Cont.) Jinxiang Chai.

CS 558 C OMPUTER V ISION Lecture IX: Dimensionality Reduction.

Mosaics CSE 455, Winter 2010 February 8, 2010 Neel Joshi, CSE 455, Winter Announcements  The Midterm went out Friday  See to the class.

Multi-view geometry. Multi-view geometry problems Structure: Given projections of the same 3D point in two or more images, compute the 3D coordinates.

776 Computer Vision Jan-Michael Frahm, Enrique Dunn Spring 2013.

Computer vision: models, learning and inference

Multi-view geometry.

Mosaics Today’s Readings Szeliski, Ch 5.1, 8.1 StreetView.

Image stitching Digital Visual Effects Yung-Yu Chuang with slides by Richard Szeliski, Steve Seitz, Matthew Brown and Vaclav Hlavac.

CSCE 643 Computer Vision: Structure from Motion

© 2005 Martin Bujňák, Martin Bujňák Supervisor : RNDr.

Announcements Project 3 due Thursday by 11:59pm Demos on Friday; signup on CMS Prelim to be distributed in class Friday, due Wednesday by the beginning.

Peripheral drift illusion. Multiple views Hartley and Zisserman Lowe stereo vision structure from motion optical flow.

CS4670 / 5670: Computer Vision KavitaBala Lecture 17: Panoramas.

Single-view geometry Odilon Redon, Cyclops, 1914.

EECS 274 Computer Vision Geometric Camera Calibration.

Two-view geometry. Epipolar Plane – plane containing baseline (1D family) Epipoles = intersections of baseline with image planes = projections of the.

Mosaics part 3 CSE 455, Winter 2010 February 12, 2010.

3D reconstruction from uncalibrated images

Bundle Adjustment A Modern Synthesis Bill Triggs, Philip McLauchlan, Richard Hartley and Andrew Fitzgibbon Presentation by Marios Xanthidis 5 th of No.

COS429 Computer Vision =++ Assignment 4 Cloning Yourself.

776 Computer Vision Jan-Michael Frahm & Enrique Dunn Spring 2013.

Lecture 9 Feature Extraction and Motion Estimation Slides by: Michael Black Clark F. Olson Jean Ponce.

Announcements No midterm Project 3 will be done in pairs same partners as for project 2.

EECS 274 Computer Vision Projective Structure from Motion.

Lecture 22: Structure from motion CS6670: Computer Vision Noah Snavely.

Multi-view geometry. Multi-view geometry problems Structure: Given projections of the same 3D point in two or more images, compute the 3D coordinates.

CS4670 / 5670: Computer Vision Kavita Bala Lecture 20: Panoramas.

Lecture 7: Image alignment

Modeling the world with photos

Structure from motion Input: Output: (Tomasi and Kanade)

Idea: projecting images onto a common plane

Lecture 23: Structure from motion 2

Multi-view geometry.

Lecture 8: Image alignment

Structure from motion.

Calibration and homographies

Structure from motion Input: Output: (Tomasi and Kanade)

Lecture 15: Structure from motion

CS5760: Computer Vision Lecture 9: RANSAC Noah Snavely

Image Stitching Linda Shapiro ECE/CSE 576.

Lecture 11: Image alignment, Part 2

Presentation transcript:

Lecture 12: Structure from motion CS6670: Computer Vision Noah Snavely

Readings Szeliski, Chapter 7.1 – 7.4

Announcements Project 2 due Sunday, March 13 – Two-person groups Quiz on Monday

Large-scale structure from motion Dubrovnik, Croatia. 4,619 images (out of an initial 57,845). Total reconstruction time: 23 hours Number of cores: 352

Structure from motion Given many images, how can we a) figure out where they were all taken from? b) build a 3D model of the scene? This is (roughly) the structure from motion problem

Structure from motion Input: images with points in correspondence p i,j = (u i,j,v i,j ) Output structure: 3D location x i for each point p i motion: camera parameters R j, t j possibly K j Objective function: minimize reprojection error Reconstruction (side) (top)

Camera calibration and triangulation Suppose we know 3D points – And have matches between these points and an image – How can we compute the camera parameters? Suppose we have know camera parameters, each of which observes a point – How can we compute the 3D location of that point?

Structure from motion SfM solves both of these problems at once A kind of chicken-and-egg problem – (but solvable)

Photo Tourism

First step: how to get correspondence? Feature detection and matching

Feature detection Detect features using SIFT [Lowe, IJCV 2004]

Feature detection Detect features using SIFT [Lowe, IJCV 2004]

Feature matching Match features between each pair of images

Feature matching Refine matching using RANSAC to estimate fundamental matrix between each pair

Image connectivity graph (graph layout produced using the Graphviz toolkit:

p 1,1 p 1,2 p 1,3 Image 1 Image 2 Image 3 x1x1 x4x4 x3x3 x2x2 x5x5 x6x6 x7x7 R1,t1R1,t1 R2,t2R2,t2 R3,t3R3,t3

Structure from motion Camera 1 Camera 2 Camera 3 R 1,t 1 R 2,t 2 R 3,t 3 p1p1 p4p4 p3p3 p2p2 p5p5 p6p6 p7p7 minimize f (R, T, P)f (R, T, P)

Problem size Trevi Fountain collection 466 input photos + > 100,000 3D points = very large optimization problem

Incremental structure from motion

Photo Explorer

Questions? 3-minute break

Related topic: Drift copy of first image (x n,y n ) (x 1,y 1 ) – add another copy of first image at the end – this gives a constraint: y n = y 1 – there are a bunch of ways to solve this problem add displacement of (y 1 – y n )/(n - 1) to each image after the first compute a global warp: y’ = y + ax run a big optimization problem, incorporating this constraint – best solution, but more complicated – known as “bundle adjustment”

Global optimization Minimize a global energy function: – What are the variables? The translation t j = (x j, y j ) for each image I j – What is the objective function? We have a set of matched features p i,j = (u i,j, v i,j ) – We’ll call these tracks For each point match (p i,j, p i,j+1 ): p i,j+1 – p i,j = t j+1 – t j I1I1 I2I2 I3I3 I4I4 p 1,1 p 1,2 p 1,3 p 2,2 p 2,3 p 2,4 p 3,3 p 3,4 p 4,4 p 4,1 track

Global optimization I1I1 I2I2 I3I3 I4I4 p 1,1 p 1,2 p 1,3 p 2,2 p 2,3 p 2,4 p 3,3 p 3,4 p 4,4 p 4,1 p 1,2 – p 1,1 = t 2 – t 1 p 1,3 – p 1,2 = t 3 – t 2 p 2,3 – p 2,2 = t 3 – t 2 … v 4,1 – v 4,4 = y 1 – y 4 minimize w ij = 1 if track i is visible in images j and j+1 0 otherwise

Global optimization I1I1 I2I2 I3I3 I4I4 p 1,1 p 1,2 p 1,3 p 2,2 p 2,3 p 2,4 p 3,3 p 3,4 p 4,4 p 4,1 A 2m x 2n 2n x 1 x 2m x 1 b

Global optimization Defines a least squares problem: minimize Solution: Problem: there is no unique solution for ! (det = 0) We can add a global offset to a solution and get the same error A 2m x 2n 2n x 1 x 2m x 1 b

Ambiguity in global location Each of these solutions has the same error Called the gauge ambiguity Solution: fix the position of one image (e.g., make the origin of the 1 st image (0,0)) (0,0) (-100,-100) (200,-200)

Solving for camera rotation Instead of spherically warping the images and solving for translation, we can directly solve for the rotation R j of each camera Can handle tilt / twist

Solving for rotations R1R1 R2R2 f I1I1 I2I2 p 12 = (u 12, v 12 ) p 11 = (u 11, v 11 ) (u 11, v 11, f) = p 11 R 1 p 11 R 2 p 22

Solving for rotations minimize

3D rotations How many degrees of freedom are there? How do we represent a rotation? – Rotation matrix (too many degrees of freedom) – Euler angles (e.g. yaw, pitch, and roll) – bad idea – Quaternions (4-vector on unit sphere) Usually involves non-linear optimization

p 1,1 p 1,2 p 1,3 Image 1 Image 2 Image 3 x1x1 x4x4 x3x3 x2x2 x5x5 x6x6 x7x7 R1,t1R1,t1 R2,t2R2,t2 R3,t3R3,t3

SfM objective function Given point x and rotation and translation R, t Minimize sum of squared reprojection errors: predicted image location observed image location

Solving structure from motion Minimizing g is difficult – g is non-linear due to rotations, perspective division – lots of parameters: 3 for each 3D point, 6 for each camera – difficult to initialize – gauge ambiguity: error is invariant to a similarity transform (translation, rotation, uniform scale) Many techniques use non-linear least-squares (NLLS) optimization (bundle adjustment) – Levenberg-Marquardt is one common algorithm for NLLS – Lourakis, The Design and Implementation of a Generic Sparse Bundle Adjustment Software Package Based on the Levenberg-Marquardt Algorithm, – Marquardt_algorithm Marquardt_algorithm

Extensions to SfM Can also solve for intrinsic parameters (focal length, radial distortion, etc.) Can use a more robust function than squared error, to avoid fitting to outliers For more information, see: Triggs, et al, “Bundle Adjustment – A Modern Synthesis”, Vision Algorithms 2000.

Questions?