Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 23: Structure from motion and multi-view stereo

Similar presentations


Presentation on theme: "Lecture 23: Structure from motion and multi-view stereo"— Presentation transcript:

1 Lecture 23: Structure from motion and multi-view stereo
CS6670: Computer Vision Noah Snavely Lecture 23: Structure from motion and multi-view stereo

2 Readings Szeliski, Chapter 7.1 – 7.4, 11.6

3 Announcements Project 2b due on Tuesday by 10:59pm
Final project proposals feedback soon

4 Structure from motion Reconstruction (side) (top) Input: images with points in correspondence pi,j = (ui,j,vi,j) Output structure: 3D location xi for each point pi motion: camera parameters Rj , tj possibly Kj Objective function: minimize reprojection error

5 Structure from motion g(X,R, T) minimize R1,t1 R3,t3 R2,t2 X4 X1 X3 X2
non-linear least squares p1,1 p1,3 Structure from motion solves the following problem: Given a set of images of a static scene with 2D points in correspondence, shown here as color-coded points, find… a set of 3D points P and a rotation R and position t of the cameras that explain the observed correspondences. In other words, when we project a point into any of the cameras, the reprojection error between the projected and observed 2D points is low. This problem can be formulated as an optimization problem where we want to find the rotations R, positions t, and 3D point locations P that minimize sum of squared reprojection errors f. This is a non-linear least squares problem and can be solved with algorithms such as Levenberg-Marquart. However, because the problem is non-linear, it can be susceptible to local minima. Therefore, it’s important to initialize the parameters of the system carefully. In addition, we need to be able to deal with erroneous correspondences. p1,2 Camera 1 Camera 3 R1,t1 R3,t3 Camera 2 R2,t2

6 is point i visible in image j ?
Structure from motion Minimize sum of squared reprojection errors: Minimizing this function is called bundle adjustment Optimized using non-linear least squares, e.g. Levenberg-Marquardt predicted image location observed image location indicator variable: is point i visible in image j ?

7 Incremental structure from motion
To help get good initializations for all of the parameters of the system, we reconstruct the scene incrementally, starting from two photographs and the points they observe.

8 Incremental structure from motion
To help get good initializations for all of the parameters of the system, we reconstruct the scene incrementally, starting from two photographs and the points they observe.

9 Incremental structure from motion
We then add several photos at a time to the reconstruction, refine the model, and repeat until no more photos match any points in the scene.

10 Photo Explorer

11 Demo

12

13

14 Why is this hard?

15

16 Extensions to SfM Can also solve for intrinsic parameters (focal length, radial distortion, etc.) Can use a more robust function than squared error, to avoid fitting to outliers For more information, see: Triggs, et al, “Bundle Adjustment – A Modern Synthesis”, Vision Algorithms 2000.

17 Questions?

18 Can we reconstruct entire cities?

19

20 _______ _ _____ ______ _ _____ _______ _______ ______ ____ _______ _ _________ ____

21 Gigantic matching problem
1,000,000 images  500,000,000,000 pairs Matching all of these on a 1,000-node cluster would take more than a year, even if we match 10,000 every second And involves TBs of data The vast majority (>99%) of image pairs do not match There are better ways of finding matching images (more on this later)

22 Gigantic SfM problem Largest problem size we’ve seen:
15,000 cameras 4 million 3D points more than 12 million parameters more than 25 million equations Huge optimization problem Requires sparse least squares techniques

23 Building Rome in a Day St. Peter’s Basilica Colosseum Trevi Fountain Rome, Italy. Reconstructed 150,000 in 21 hours on 496 machines

24 Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).
Total reconstruction time: 23 hours Number of cores: 352

25 Dubrovnik Dubrovnik, Croatia. 4,619 images (out of an initial 57,845).
Total reconstruction time: 23 hours Number of cores: 352

26 Total reconstruction time: 3 days. Number of cores: 496.
San Marco Square San Marco Square and environs, Venice. 14,079 photos, out of an initial 250,000. Total reconstruction time: 3 days. Number of cores: 496.

27 Multi-view stereo Stereo Multi-view stereo

28 Multi-view Stereo Point Grey’s Bumblebee XB3 Point Grey’s ProFusion 25
CMU’s 3D Room

29 Multi-view Stereo 29

30 Figures by Carlos Hernandez
Multi-view Stereo Input: calibrated images from several viewpoints Output: 3D object model Figures by Carlos Hernandez

31 Narayanan, Rander, Kanade
Fua Seitz, Dyer Narayanan, Rander, Kanade Faugeras, Keriven 1995 1997 1998 1998 Hernandez, Schmitt Pons, Keriven, Faugeras Furukawa, Ponce Goesele et al. 2004 2005 2006 2007

32 Stereo: basic idea error depth

33 Choosing the stereo baseline
all of these points project to the same pair of pixels width of a pixel Large Baseline Small Baseline What’s the optimal baseline? Too small: large depth error Too large: difficult search problem

34 The Effect of Baseline on Depth Estimation

35 z width of a pixel pixel matching score width of a pixel z

36

37 Multibaseline Stereo Basic Approach Limitations
Choose a reference view Use your favorite stereo algorithm BUT replace two-view SSD with SSSD over all baselines Limitations Only gives a depth map (not an “object model”) Won’t work for widely distributed views: Visibility!

38 Problem: visibility Some Solutions
Match only nearby photos [Narayanan 98] Use NCC instead of SSD, Ignore NCC values > threshold [Hernandez & Schmitt 03]

39 Popular matching scores
SSD (Sum Squared Distance) NCC (Normalized Cross Correlation) where what advantages might NCC have?

40 Questions?


Download ppt "Lecture 23: Structure from motion and multi-view stereo"

Similar presentations


Ads by Google