
1 Stereo and Projective Structure from Motion Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem 04/13/10 Many slides adapted from Lana Lazebnik, Silvio Saverese, Steve Seitz

2 This class. Recap of epipolar geometry. Recovering structure: generally, how can we estimate 3D positions for matched points in two images? (triangulation) If we have a moving camera, how can we recover 3D points? (projective structure from motion) If we have a calibrated stereo pair, how can we get dense depth estimates? (stereo fusion)

3 Basic Questions Why can’t we get depth if the camera doesn’t translate? Why can’t we get a nice panorama if the camera does translate?

4 Recap: Epipoles. A point x in the left image corresponds to an epipolar line l’ in the right image. Every epipolar line passes through the epipole (the intersection of the cameras’ baseline with the image plane).

5 Recap: Fundamental matrix. The fundamental matrix maps from a point in one image to a line in the other. If x and x’ correspond to the same 3D point X, then x’ᵀFx = 0.

6 Recap: Automatically relating projections. Assume we have matched points x ↔ x’ with outliers. Two cases: homography (no translation) and fundamental matrix (translation).

7 Recap: Automatically relating projections. Assume we have matched points x ↔ x’ with outliers. Homography (no translation), correspondence relation x’ = Hx: 1. normalize image coordinates; 2. RANSAC with 4 points; 3. de-normalize.

8 Recap: Automatically relating projections. Assume we have matched points x ↔ x’ with outliers. Homography (no translation), x’ = Hx: 1. normalize image coordinates; 2. RANSAC with 4 points; 3. de-normalize. Fundamental matrix (translation), x’ᵀFx = 0: 1. normalize image coordinates; 2. RANSAC with 8 points; 3. enforce rank 2 (det F = 0) by SVD; 4. de-normalize.
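
A minimal MATLAB sketch of the model-fitting step above (the 8-point fit run inside each RANSAC iteration), assuming x1 and x2 are 3xN matrices of matched homogeneous points that have already been normalized; the function name is illustrative, not the course code:

    function F = eight_point(x1, x2)
    % One row of A per correspondence: x2' * F * x1 = 0, with F
    % flattened row-major, so row i is kron(x2(:,i)', x1(:,i)').
    N = size(x1, 2);
    A = zeros(N, 9);
    for i = 1:N
      A(i, :) = kron(x2(:,i)', x1(:,i)');
    end
    [U, S, V] = svd(A);
    F = reshape(V(:, end), 3, 3)';   % least-squares solution for F
    [U, S, V] = svd(F);
    S(3, 3) = 0;                     % enforce rank 2 (det F = 0) by SVD
    F = U * S * V';
    end

Given the normalizing transforms T1 and T2 for the two images, the de-normalization step is F = T2' * F * T1.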

9 Recap: We can recover projection matrices P and P’ from F, up to a projective ambiguity. Code (from the Oxford VGG toolbox; comments added here):

    function P = vgg_P_from_F(F)
    % Second camera of the canonical pair P = [I | 0], P' = [[e]x F | e]:
    [U,S,V] = svd(F);
    e = U(:,3);                  % epipole in the second image: F' * e = 0
    P = [-vgg_contreps(e)*F e];  % vgg_contreps(e) is VGG's cross-product
                                 % matrix helper, so this is [e]x * F (up to sign)

See HZ p.

10 Recap: the “fundamental matrix song” (video).

11 Triangulation: Linear solution. Generally, the rays C → x and C’ → x’ will not exactly intersect at a single point X. Can solve via SVD, finding a least-squares solution to a system of equations. Further reading: HZ p.

12 Triangulation: Linear solution. Given P, P’, x, x’: 1. precondition points and projection matrices; 2. create the matrix A; 3. [U, S, V] = svd(A); 4. X = V(:, end). Pros and cons: works for any number of corresponding images, but is not projectively invariant.
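
A minimal MATLAB sketch of these steps for the two-view case (names illustrative; the preconditioning step is omitted for brevity):

    function X = triangulate_linear(P1, P2, x1, x2)
    % Each view contributes two rows to A, from the constraint
    % x cross (P * X) = 0, with x1, x2 given as (x, y) pixel coordinates.
    A = [ x1(1)*P1(3,:) - P1(1,:);
          x1(2)*P1(3,:) - P1(2,:);
          x2(1)*P2(3,:) - P2(1,:);
          x2(2)*P2(3,:) - P2(2,:) ];
    [U, S, V] = svd(A);
    X = V(:, end);    % least-squares solution: last column of V
    X = X / X(4);     % de-homogenize to get the 3D point
    end

More views simply append two more rows to A per image, which is why the method works for any number of corresponding images.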

13 Triangulation: Non-linear solution. Minimize reprojection error while satisfying the epipolar constraint x’ᵀFx = 0. Parameterizing the pencil of epipolar lines by t, the solution reduces to the roots of a degree-6 polynomial in t, minimizing the sum of squared distances from x and x’ to their epipolar lines. Further reading: HZ p. 318.

14 Projective structure from motion. Given: m images of n fixed 3D points, x_ij = P_i X_j, i = 1, …, m, j = 1, …, n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn corresponding points x_ij. Slides from Lana Lazebnik.

15 Projective structure from motion. Given: m images of n fixed 3D points, x_ij = P_i X_j, i = 1, …, m, j = 1, …, n. Problem: estimate the m projection matrices P_i and the n 3D points X_j from the mn corresponding points x_ij. With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: X → QX, P → PQ⁻¹. We can solve for structure and motion when 2mn ≥ 11m + 3n − 15: each observed point gives 2 equations, each camera has 11 unknowns, each point 3, and 15 degrees of freedom are lost to the projective ambiguity. For two cameras this gives 4n ≥ 22 + 3n − 15, so at least 7 points are needed.

16 Projective SFM: Two-camera case. Compute the fundamental matrix F between the two views. First camera matrix: [I | 0]. Second camera matrix: [A | e], where e is the epipole in the second image (Fᵀe = 0) and A = −[e]×F. F&P sec.

17 Sequential structure from motion. Initialize motion from two images using the fundamental matrix. Initialize structure by triangulation. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration.

18 Sequential structure from motion. Initialize motion from two images using the fundamental matrix. Initialize structure by triangulation. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration; refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation.

19 Sequential structure from motion. Initialize motion from two images using the fundamental matrix. Initialize structure by triangulation. For each additional view: determine the projection matrix of the new camera using all the known 3D points that are visible in its image – calibration (a sketch of this step follows below); refine and extend structure: compute new 3D points, re-optimize existing points that are also seen by this camera – triangulation. Finally, refine structure and motion together: bundle adjustment.
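
The “calibration” step above is camera resectioning: estimating the new camera’s projection matrix from already-reconstructed 3D points. A minimal DLT sketch, assuming x is 3xn (homogeneous image points), X is 4xn (homogeneous 3D points), and n >= 6; illustrative, not the slides’ code:

    function P = resection_dlt(x, X)
    n = size(X, 2);
    A = zeros(2*n, 12);
    for i = 1:n
      % Two equations per point from x ~ P * X (equality up to scale):
      A(2*i-1, :) = [X(:,i)', zeros(1,4), -(x(1,i)/x(3,i))*X(:,i)'];
      A(2*i,   :) = [zeros(1,4), X(:,i)', -(x(2,i)/x(3,i))*X(:,i)'];
    end
    [U, S, V] = svd(A);
    P = reshape(V(:, end), 4, 3)';   % stack the 12-vector back into 3x4
    end

In practice this fit would itself be wrapped in RANSAC over the 2D-3D matches to reject outliers.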

20 Bundle adjustment. Non-linear method for refining structure and motion together by minimizing the total reprojection error: E(P, X) = Σ_i Σ_j D(x_ij, P_i X_j)², the summed squared distance between each observed point x_ij and the projection of X_j into camera i.
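
The objective being minimized, as a minimal MATLAB sketch (P is a cell array of 3x4 cameras, X is 4xn, x{i,j} holds the observed 2x1 points, and vis(i,j) marks which points are visible in which images; all names illustrative):

    function err = reprojection_error(P, X, x, vis)
    err = 0;
    for i = 1:numel(P)
      for j = 1:size(X, 2)
        if vis(i, j)
          p = P{i} * X(:, j);                 % project point j into camera i
          p = p(1:2) / p(3);                  % perspective divide
          err = err + sum((p - x{i, j}).^2);  % squared reprojection error
        end
      end
    end
    end

Bundle adjustment minimizes this jointly over all P{i} and X(:, j), typically with a sparse non-linear least-squares solver such as Levenberg-Marquardt.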

21 Self-calibration. Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images. For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix K remains fixed for all the images: compute an initial projective reconstruction and find a 3D projective transformation matrix Q such that all camera matrices take the form P_i = K[R_i | t_i]. Can also use constraints on the form of the calibration matrix, such as zero skew.

22 Summary so far. From two images, we can: recover the fundamental matrix F; recover canonical cameras P and P’ from F; estimate the 3D position X of each pair of corresponding points x and x’. For a moving camera, we can: initialize by computing F, P, X for two images; sequentially add new images, computing each new P, refining X, and adding points; auto-calibrate, assuming a fixed calibration matrix, to upgrade the reconstruction from projective to similarity.

23 Photo synth. Noah Snavely, Steven M. Seitz, Richard Szeliski, “Photo tourism: Exploring photo collections in 3D,” SIGGRAPH 2006.

24 Photosynth.net. Based on Photo Tourism by Noah Snavely, Steve Seitz, and Rick Szeliski.

25 3D from multiple images Building Rome in a Day: Agarwal et al. 2009

26 Plug: Steve Seitz talk. Steve Seitz will talk about “Reconstructing the World from Photos on the Internet”: Monday, April 26th, 4pm in Siebel Center.

27 Special case: Dense binocular stereo. Fuse a calibrated binocular stereo pair (image 1, image 2) to produce a dense depth map. Many of these slides adapted from Steve Seitz and Lana Lazebnik.

28 Basic stereo matching algorithm For each pixel in the first image – Find corresponding epipolar line in the right image – Examine all pixels on the epipolar line and pick the best match – Triangulate the matches to get depth information Simplest case: epipolar lines are scanlines – When does this happen?

29 Simplest Case: Parallel images Image planes of cameras are parallel to each other and to the baseline Camera centers are at same height Focal lengths are the same

30 Simplest Case: Parallel images Image planes of cameras are parallel to each other and to the baseline Camera centers are at same height Focal lengths are the same Then, epipolar lines fall along the horizontal scan lines of the images

31 Essential matrix for parallel images. R = I, t = (T, 0, 0). The essential matrix is E = [t]×R = [0 0 0; 0 0 −T; 0 T 0], and the epipolar constraint is x’ᵀEx = 0.

32 Special case of the fundamental matrix. With R = I and t = (T, 0, 0), writing x = (x, y, 1) and x’ = (x’, y’, 1), the epipolar constraint x’ᵀEx = 0 reduces to T(y − y’) = 0. The y-coordinates of corresponding points are the same!

33 Depth from disparity. With camera centers O and O’ separated by baseline B, focal length f, and a scene point X at depth z, similar triangles give disparity = x − x’ = B·f / z. Disparity is inversely proportional to depth!
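
So, given a disparity map D, a focal length f (in pixels), and a baseline B, converting to depth is a one-liner; a sketch with illustrative names, treating non-positive disparities as unmatched:

    Z = f * B ./ D;    % depth from disparity: z = f*B / (x - x')
    Z(D <= 0) = NaN;   % no valid match -> unknown depth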

34 Stereo image rectification

35 Reproject image planes onto a common plane parallel to the line between the optical centers. Pixel motion is horizontal after this transformation. Requires two homographies (3x3 transforms), one per input image reprojection. C. Loop and Z. Zhang, Computing Rectifying Homographies for Stereo Vision, IEEE Conf. Computer Vision and Pattern Recognition, 1999.

36 Rectification example

37 Basic stereo matching algorithm. If necessary, rectify the two stereo images to transform epipolar lines into scanlines. For each pixel x in the first image: find the corresponding epipolar scanline in the right image; examine all pixels on the scanline and pick the best match x’; compute the disparity x − x’ and set depth(x) = f·B/(x − x’), i.e., depth ∝ 1/(x − x’).

38 Correspondence search. Slide a window along the right scanline and compare its contents against the reference window in the left image, recording the matching cost as a function of disparity. Matching cost: SSD or normalized correlation.
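
A minimal MATLAB sketch of this window-based search, assuming rectified grayscale images I1 (left) and I2 (right) of class double, a half-window size w, and a maximum disparity maxd; names illustrative:

    function D = stereo_ssd(I1, I2, w, maxd)
    [h, wd] = size(I1);
    D = zeros(h, wd);
    for y = 1+w : h-w
      for x = 1+w+maxd : wd-w
        ref = I1(y-w:y+w, x-w:x+w);   % reference window in the left image
        best = inf;
        for d = 0 : maxd              % slide along the right scanline
          cand = I2(y-w:y+w, x-d-w:x-d+w);
          cost = sum((ref(:) - cand(:)).^2);   % SSD matching cost
          if cost < best
            best = cost;
            D(y, x) = d;              % keep the disparity of the best match
          end
        end
      end
    end
    end

Varying w here reproduces the window-size tradeoff shown a few slides below: small windows preserve detail but are noisy, large windows give smoother but blurrier disparity maps.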

39 Correspondence search with the SSD cost along left/right scanlines (figure).

40 Correspondence search with normalized correlation along left/right scanlines (figure).
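
Normalized correlation replaces the SSD line in the sketch above with a score that is invariant to gain and offset changes in window intensity (maximize it, or minimize its negation as the cost); a sketch:

    % Normalized cross-correlation of two windows a and b:
    ncc = @(a, b) sum((a(:) - mean(a(:))) .* (b(:) - mean(b(:)))) / ...
                  (norm(a(:) - mean(a(:))) * norm(b(:) - mean(b(:))) + eps);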

41 Effect of window size (figure: W = 3 vs. W = 20). Smaller window: + more detail, − more noise. Larger window: + smoother disparity maps, − less detail.

42 Failures of correspondence search Textureless surfaces Occlusions, repetition Non-Lambertian surfaces, specularities

43 Results with window search (figure: input data, window-based matching, ground truth).

44 How can we improve window-based matching? So far, matches are independent for each point What constraints or priors can we add?

45 How can we improve window-based matching? The similarity constraint is local (each reference window is matched independently) Need to enforce non-local correspondence constraints or priors

46 Stereo constraints/priors Uniqueness – For any point in one image, there should be at most one matching point in the other image

47 Stereo constraints/priors Uniqueness – For any point in one image, there should be at most one matching point in the other image Ordering – Corresponding points should be in the same order in both views

48 Stereo constraints/priors. Uniqueness: for any point in one image, there should be at most one matching point in the other image. Ordering: corresponding points should be in the same order in both views. (Figure: a scene where the ordering constraint doesn’t hold.)

49 Non-local constraints Uniqueness – For any point in one image, there should be at most one matching point in the other image Ordering – Corresponding points should be in the same order in both views Smoothness – We expect disparity values to change slowly (for the most part)

50 Stereo matching as energy minimization. The energy combines a data term comparing image windows, E_data(D) = Σ_i (W1(i) − W2(i + D(i)))², with a smoothness term penalizing disparity changes between neighboring pixels, E_smooth(D) = Σ_{neighbors i, j} ρ(D(i) − D(j)): minimize E(D) = E_data(D) + λ·E_smooth(D). Energy functions of this form can be minimized using graph cuts. Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.

51 Stereo matching as energy minimization. Probabilistic interpretation: we want a maximum a posteriori (MAP) estimate of the disparity image D, maximizing P(D | I1, I2) ∝ P(I1, I2 | D)·P(D). Taking negative logs, this is equivalent to minimizing the energy above: the data term corresponds to the likelihood and the smoothness term to the prior.

52 Many of these constraints can be encoded in an energy function and solved using graph cuts (figure: graph-cuts result vs. ground truth). For the latest and greatest: Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.

53 “Shortest paths” for scan-line stereo. Matching the pixels of a left and right scanline can be posed as a shortest-path problem whose moves are correspondences (p, q) and left or right occlusions (s, t), and it can be implemented with dynamic programming (Ohta & Kanade ’85, Cox et al. ’96). Slide credit: Y. Boykov.
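
A minimal MATLAB sketch of scanline stereo by dynamic programming, using the simpler disparity-smoothness formulation rather than the Ohta-Kanade occlusion-path model: given a precomputed matching-cost matrix C (W pixels by nd disparity levels) and a smoothness weight lambda, it returns the minimum-energy disparity along one scanline; names illustrative:

    function d = dp_scanline(C, lambda)
    [W, nd] = size(C);
    E = zeros(W, nd);     % E(x,k): best energy of a path ending at (x,k)
    back = ones(W, nd);   % back-pointers for recovering the path
    E(1, :) = C(1, :);
    for x = 2:W
      for k = 1:nd
        [best, arg] = min(E(x-1, :) + lambda * abs((1:nd) - k));
        E(x, k) = C(x, k) + best;
        back(x, k) = arg;
      end
    end
    d = zeros(W, 1);      % backtrack the optimal path
    [~, d(W)] = min(E(W, :));
    for x = W-1:-1:1
      d(x) = back(x+1, d(x+1));
    end
    d = d - 1;            % map indices 1..nd to disparities 0..nd-1
    end

Because each scanline is solved independently, this produces exactly the streaking artifacts discussed on the next slide.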

54 Coherent stereo on a 2D grid. Scanline stereo generates streaking artifacts: we can’t use dynamic programming to find spatially coherent disparities/correspondences on a 2D grid.

55 Stereo matching as energy minimization. Note: the above formulation does not treat the two images symmetrically, does not enforce uniqueness, and does not take occlusions into account. It is possible to come up with an energy that does all these things, but it’s a bit more complex: it is defined over all possible sets of matches, not over all disparity maps with respect to the first image; it includes an occlusion term; and the smoothness term looks different and more complicated. V. Kolmogorov and R. Zabih, Computing Visual Correspondence with Occlusions using Graph Cuts, ICCV 2001.

56 Summary. Recap of epipolar geometry: epipoles are the intersections of the baseline with the image planes; the matching point in the second image lies on a line passing through its epipole; the fundamental matrix maps a point in one image to an epipolar line in the other; canonical camera matrices can be recovered from F (up to a projective ambiguity). Recovering structure: triangulation recovers the 3D position of two matched points in images with known projection matrices; a sequential algorithm recovers structure from a moving camera, followed by auto-calibration assuming fixed K; depth from a stereo pair comes from aligning via homographies (rectification) and matching along scanlines, with depth inversely proportional to disparity.

57 Next class KLT tracking Elegant SFM method using tracked points, assuming orthographic projection Optical flow

