Advanced Computer Vision
Chapter 7: Structure from Motion
Presented by Prof. Chiou-Shann Fuh & Pradnya Borade

What Is Structure from Motion?
1. The study of a visual perception process.
2. The process of finding the three-dimensional structure of an object by analyzing local motion signals over time.
3. A method for creating 3D models from 2D pictures of an object.

Example: Picture 1 and Picture 2 (two input photographs of the same object).

Example (cont.): the 3D model created from the two images.

7.1 Triangulation
The problem of estimating a point's 3D location when it is seen from multiple cameras is known as triangulation.

Triangulation (cont.): Find the 3D point p that lies closest to all of the 3D rays corresponding to the 2D matching feature locations {x_j}.

Triangulation (cont.): Find the 3D point p that lies closest to all of the 3D rays corresponding to the 2D matching feature locations {x_j} observed by cameras P_j = K_j [R_j | t_j], where t_j = -R_j c_j and c_j is the j-th camera center.

Triangulation (cont.): It is the converse of the pose estimation problem: given the projection matrices, 3D points can be computed from their measured image positions in two or more views.

Triangulation (cont.): The projection equation is x = P X, with P = K [R | t].

Triangulation (cont.): Figure 7.7: 3D point triangulation by finding the point p that lies nearest to all of the optical rays.

Triangulation (cont.): The rays originate at c_j in the direction v̂_j = N(R_j^{-1} K_j^{-1} x_j), where N(v) = v/||v|| normalizes a vector to unit length. The nearest point to p on this ray, denoted q_j, minimizes the distance ||c_j + d_j v̂_j - p||^2, which has a minimum at d_j = v̂_j · (p - c_j). Hence q_j = c_j + (v̂_j v̂_j^T)(p - c_j) = c_j + (p - c_j)_∥.

Triangulation (cont.): The squared distance between p and q_j is r_j^2 = ||(I - v̂_j v̂_j^T)(p - c_j)||^2 = ||(p - c_j)_⊥||^2. The optimal value for p, which lies closest to all of the rays, can be computed as a regular least squares problem by summing over all the r_j^2 and finding the optimal value of p: p = [Σ_j (I - v̂_j v̂_j^T)]^{-1} [Σ_j (I - v̂_j v̂_j^T) c_j].
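
A minimal numpy sketch of this closed-form ray-based least-squares solution (the function name and interface are illustrative, not from the slides; directions are assumed to be unit vectors):

```python
import numpy as np

def triangulate_rays(centers, directions):
    """Ray-based least-squares triangulation.

    centers:    list of camera centers c_j, each a length-3 array.
    directions: list of unit ray directions v_j (assumed already normalized).
    Returns the point p minimizing the sum of squared distances to all rays:
        p = [sum_j (I - v_j v_j^T)]^{-1} [sum_j (I - v_j v_j^T) c_j]
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in zip(centers, directions):
        M = np.eye(3) - np.outer(v, v)   # projector onto the plane normal to the ray
        A += M
        b += M @ c
    return np.linalg.solve(A, b)         # ill-conditioned if all rays are nearly parallel
```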

Triangulation (cont.): If we use homogeneous coordinates p = (X, Y, Z, W), the resulting set of equations is homogeneous and can be solved using singular value decomposition (SVD). If we set W = 1, we can use regular linear least squares, but the resulting system may be singular or poorly conditioned (e.g., when all of the viewing rays are nearly parallel).
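
A small sketch of the homogeneous (DLT-style) variant solved by SVD, assuming 3x4 projection matrices P_j and pixel measurements (u, v); the helper name is illustrative:

```python
import numpy as np

def triangulate_dlt(P_list, x_list):
    """Homogeneous triangulation: stack two equations per view from x ~ P p
    and take the right singular vector with the smallest singular value."""
    rows = []
    for P, (u, v) in zip(P_list, x_list):
        rows.append(u * P[2] - P[0])   # u * (p3 . X) - (p1 . X) = 0
        rows.append(v * P[2] - P[1])   # v * (p3 . X) - (p2 . X) = 0
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    p = Vt[-1]                         # homogeneous solution (X, Y, Z, W)
    return p[:3] / p[3]                # assumes the point is not at infinity
```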

Singular Value Decomposition (SVD): A = U Σ V^T; geometrically, a rotation, a scaling, and another rotation (figure).

Singular Value Decomposition (SVD): For the homogeneous system A p = 0, the solution is the eigenvector corresponding to the minimum eigenvalue of A^T A. With A = U Σ V^T, A^T A = V Σ^T U^T U Σ V^T = V (Σ^T Σ) V^T, so the solution is also the right singular vector of A corresponding to its smallest singular value.

Linear Least Squares Problem: Minimize F(X) with respect to the unknowns X_0, X_1 by setting the partial derivatives ∂F/∂X_0 and ∂F/∂X_1 to zero, then solve for X_0, X_1 by combining the two resulting equations.
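
A small numpy illustration of this recipe for a generic linear least-squares objective ||A x - b||^2 (the specific F(X) shown on the slide is not reproduced here; the example matrices are made up):

```python
import numpy as np

# Generic linear least squares: minimize F(x) = ||A x - b||^2.
# Setting the partial derivatives to zero gives the normal equations
#   A^T A x = A^T b,
# which is what "combining the two equations" amounts to for two unknowns.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # example design matrix
b = np.array([1.0, 2.0, 2.9])                         # example measurements

x_normal = np.linalg.solve(A.T @ A, A.T @ b)          # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)       # numerically safer route
assert np.allclose(x_normal, x_lstsq)
```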

7.2 Two-Frame Structure from Motion
So far in 3D reconstruction we have assumed that either the 3D point positions or the 3D camera poses are known in advance.

Two-Frame Structure from Motion (cont.): Figure 7.8: Epipolar geometry: the vectors t = c_1 - c_0, p - c_0, and p - c_1 are coplanar and define the basic epipolar constraint expressed in terms of the pixel measurements x_0 and x_1.

Two-Frame Structure from Motion (cont.): The figure shows a 3D point p being viewed from two cameras whose relative pose can be encoded by a rotation R and a translation t. Since we do not know anything about the camera positions, without loss of generality we can place the first camera at the origin, c_0 = 0, with a canonical orientation R_0 = I.

Two-Frame Structure from Motion (cont.): The observed location of point p in the first image, p_0 = d_0 x̂_0, is mapped into the second image by the transformation d_1 x̂_1 = R (d_0 x̂_0) + t, where x̂_j = K_j^{-1} x_j are the (local) ray direction vectors.

Two-Frame Structure from Motion (cont.): Taking the cross product of both sides with t, in order to annihilate it on the right-hand side, yields d_1 [t]_× x̂_1 = d_0 [t]_× R x̂_0. Taking the dot product of both sides with x̂_1 yields d_0 x̂_1^T ([t]_× R) x̂_0 = d_1 x̂_1^T [t]_× x̂_1.

Two-Frame Structure from Motion (cont.): The right-hand side is a triple product with two identical entries and is therefore zero. We thus arrive at the basic epipolar constraint x̂_1^T E x̂_0 = 0, where E = [t]_× R is the essential matrix.

Two-Frame Structure from Motion (cont.): The essential matrix E maps a point x̂_0 in image 0 to a line l_1 = E x̂_0 in image 1, since x̂_1^T l_1 = 0.

Two-Frame Structure from Motion (cont.): All such lines must pass through the second epipole e_1, which is therefore defined as the left singular vector of E with a 0 singular value, or, equivalently, the projection of the vector t into image 1. The transpose of these relationships gives us the epipolar line in the first image as l_0 = E^T x̂_1, and e_0 as the zero-value right singular vector of E.

Two-Frame Structure from Motion (cont.): Given the relationship x̂_1^T E x̂_0 = 0, if we have N corresponding measurements {(x_i0, x_i1)}, we can form N homogeneous equations in the nine elements of E = {e_00 ... e_22}.

Two-Frame Structure from Motion (cont.): Stacking these equations into an N × 9 matrix A, find min ||A e||; the solution e (the stacked entries of E) is the eigenvector of A^T A with the least eigenvalue. Variant E′: additionally enforce the rank-two constraint on E by setting its smallest singular value to zero.
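
A minimal sketch of this eight-point estimation step, assuming matched calibrated (K-normalized) points passed as N×2 arrays; names and the constraint-enforcement step shown here are the commonly used ones and are assumptions, not taken verbatim from the slides:

```python
import numpy as np

def estimate_essential(x0, x1):
    """Eight-point estimate of E from matched, calibrated points.

    x0, x1: (N, 2) arrays of normalized image coordinates (K already removed).
    Returns a rank-2 essential matrix minimizing ||A e|| in the least-squares sense.
    """
    u0, v0 = x0[:, 0], x0[:, 1]
    u1, v1 = x1[:, 0], x1[:, 1]
    ones = np.ones(len(x0))
    # Each row encodes x1_i^T E x0_i = 0 for the stacked 9-vector of E's entries.
    A = np.column_stack([u1*u0, u1*v0, u1, v1*u0, v1*v0, v1, u0, v0, ones])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)            # smallest right singular vector of A
    # Enforce the essential-matrix constraints: rank 2 with equal singular values.
    U, S, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt
```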

Two-Frame Structure from Motion (cont.): Since E = [t]_× R and [t]_× t = 0, we have t^T E = 0; under no noise, t is therefore the eigenvector of E E^T corresponding to the minimum (zero) eigenvalue. Given t, we can then estimate R from E = [t]_× R.

Two-Frame Structure from Motion (cont.): With the SVD E = U Σ V^T, under no noise Σ = diag(1, 1, 0). However, you can flip the signs of both U and V and still get a valid SVD, so the recovered R and t are only determined up to these sign choices; the correct combination is the one that places reconstructed points in front of both cameras.
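
A sketch of the standard SVD-based decomposition of E into its candidate (R, t) pairs (this W-matrix recipe is the commonly used construction and is assumed here, not quoted from the slides):

```python
import numpy as np

def decompose_essential(E):
    """Return the four candidate (R, t) pairs from an essential matrix.

    The correct pair is the one for which triangulated points end up
    in front of both cameras (cheirality test, not shown here).
    """
    U, _, Vt = np.linalg.svd(E)
    # Make sure we end up with proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                        # translation direction, up to sign and scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```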

If the measurements are noisy, the terms that are products of measurements have their noise amplified by the other element in the product, which leads to poorly scaled equations. To deal with this, it is suggested that the point coordinates be translated and scaled so that their centroid lies at the origin and their variance is unity, i.e.,

Two-Frame Structure from Motion (cont.): x̃_ij = T_j x_ij = (x_ij - μ_j)/σ_j, such that the transformed coordinates have zero mean and unit variance, where μ_j = (1/n) Σ_i x_ij, σ_j^2 = (1/n) Σ_i ||x_ij - μ_j||^2, and n = number of points. Once the essential matrix Ẽ has been computed from the transformed coordinates, the original essential matrix E can be recovered as E = T_1^T Ẽ T_0.
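
A small sketch of this normalization step (the exact scale convention varies between presentations; unit RMS distance from the centroid is assumed here, and the function name is illustrative):

```python
import numpy as np

def normalize_points(x):
    """Translate/scale (N, 2) points so the centroid is at the origin and
    the mean squared distance from it is 1. Returns (x_tilde, T) with
    x_tilde_h = T @ x_h in homogeneous coordinates."""
    mu = x.mean(axis=0)
    sigma = np.sqrt(np.mean(np.sum((x - mu) ** 2, axis=1)))
    T = np.array([[1/sigma, 0.0, -mu[0]/sigma],
                  [0.0, 1/sigma, -mu[1]/sigma],
                  [0.0, 0.0, 1.0]])
    x_tilde = (x - mu) / sigma
    return x_tilde, T

# Usage: estimate E_tilde from the normalized points, then undo the normalization:
#   E = T1.T @ E_tilde @ T0
```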

Projective Reconstruction: In the uncalibrated case, we do not know the calibration matrices K_j, so we cannot use the normalized ray directions x̂_j = K_j^{-1} x_j. We only have access to the image coordinates x_j, so the essential matrix generalizes to the fundamental matrix F = K_1^{-T} E K_0^{-1}, with the epipolar constraint x_1^T F x_0 = 0.

Projective Reconstruction (cont.): Just like the essential matrix, the fundamental matrix has rank 2 and can be written as F = [e_1]_× H̃, where e_1 is the epipole in the second image and H̃ is a homography (H̃ cannot be uniquely recovered from F).

Projective Reconstruction (cont.): As in the equations on slide 37, F can be decomposed by SVD, F = U Σ V^T; the slide then forms a modified singular value matrix with the smallest value replaced by the middle value. From F we can form a pair of projective camera matrices and reconstruct the scene by triangulation.
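
A sketch of one standard way to build such a projective camera pair from F: the Hartley-Zisserman canonical pair P0 = [I | 0], P1 = [[e1]_× F | e1]. This common construction is assumed here and is not necessarily the exact variant on the slide:

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def projective_cameras_from_F(F):
    """Canonical projective camera pair consistent with a fundamental matrix F."""
    # Epipole in image 1: left null vector of F (F^T e1 = 0).
    U, S, Vt = np.linalg.svd(F)
    e1 = U[:, 2]
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])      # P0 = [I | 0]
    P1 = np.hstack([skew(e1) @ F, e1.reshape(3, 1)])   # P1 = [[e1]_x F | e1]
    return P0, P1
```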

View Morphing: An application of basic two-frame structure from motion, also known as view interpolation. It is used to generate a smooth 3D animation from one view of a 3D scene to another. To create such a transition, smoothly interpolate the camera matrices, i.e., camera position, orientation, and focal length. A smoother effect is obtained by easing in and easing out the camera parameters. To generate the in-between frames, establish either a full set of 3D correspondences or a 3D model for each reference view.

View Morphing (cont.): Triangulate the set of matched feature points in each image. As the 3D points are re-projected into the intermediate views, pixels can be mapped from their original source images to the new views using an affine or projective mapping. The final image is then composited using a linear blend of the two reference images, as with regular morphing.

7.3 Factorization
n 3D points are seen in m views.
q = (u, v, 1): 2D image point
p = (x, y, z, 1): 3D scene point
Π: projection matrix
π: projection function
q_ij is the projection of the i-th point in image j
λ_ij is the projective depth of q_ij

Projection Models: orthographic projection and perspective projection (figures).

SFM under Orthographic Projection
In general: p is a 4x1 vector (x, y, z, 1), q is a 3x1 vector (u, v, 1), and Π is 3x4.
Assuming no translation: Π is 3x3, p is 3x1, q is 3x1.
Under orthographic projection: Π is 2x3, p is 3x1, q is 2x1.

SFM under Orthographic Projection (cont.)
- Choose the scene origin to be the centroid of the 3D points.
- Choose the image origins to be the centroids of the 2D points.
- This allows us to drop the camera translation: q_ij = Π_j p_i.

Factorization (cont.): Original input: the measured 2D points q_ij. Centroid: q̄_j = (1/n) Σ_i q_ij for each image j. Translation: subtracting the centroid from each measurement, q̃_ij = q_ij - q̄_j, removes the per-image translation, so q̃_ij = Π_j p_i.

Factorization (cont.): Stacking the centered measurements q̃_ij into a 2m × n measurement matrix W gives W = M S, where M (2m × 3) stacks the projection matrices Π_j and S (3 × n) stacks the points p_i; hence rank(W) <= 3.

Factorization (cont.): Apply singular value decomposition to W: W = U Σ V^T. To eliminate noise, keep only the three largest singular values: Σ_{n×n} → Σ′_{3×3} with rank(Σ′) <= 3, U_{2m×n} → U′_{2m×3}, V^T_{n×n} → V′^T_{3×n}, giving the rank-3 approximation W′ = U′ Σ′ V′^T = M′ S′.
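
A compact numpy sketch of this rank-3 factorization step (affine/orthographic factorization before the metric upgrade discussed next; the split M′ = U′Σ′^{1/2}, S′ = Σ′^{1/2}V′^T is one common convention assumed here):

```python
import numpy as np

def affine_factorization(W):
    """Tomasi-Kanade style factorization of a 2m x n measurement matrix.

    W should have each image's centroid subtracted (rows come in (u, v)
    pairs per camera). Returns M_prime (2m x 3) and S_prime (3 x n) such
    that M_prime @ S_prime is the best rank-3 approximation of W; they
    differ from the true motion/shape by an unknown 3x3 transform A.
    """
    W_centered = W - W.mean(axis=1, keepdims=True)     # safety: remove centroids
    U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]
    M_prime = U3 * np.sqrt(s3)                         # 2m x 3 "motion"
    S_prime = np.sqrt(s3)[:, None] * Vt3               # 3 x n  "shape"
    return M_prime, S_prime
```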

Factorization (cont.): S′ differs from the true shape S by a linear transformation A, since W = M′ S′ = (M′ A)(A^{-1} S′). Solve for A by enforcing metric constraints on M. Orthographic camera: the rows of each Π_j are orthonormal; therefore the corresponding rows of M (= M′ A) must be orthonormal → solve for A → recover M = M′ A.

Factorization (cont.): Assume Π = Π′ A. Solve first for G = A A^T by writing the orthonormality equations for every Π_i in M; then recover A from G = A A^T by SVD.

Factorization with Noisy Data
- SVD provides the optimal rank-3 approximation W′ of W.
- Estimate W′, then use the noise-free factorization of W′ as before.
- The result minimizes the SSD between the positions of the image features and the projections of the reconstruction.

Factorization with Missing Data (example measurement matrix with missing entries; figure).

Factorization with Missing Data (cont.): Apply the factorization to the full 6×4 submatrix W_{6×4} of known entries.

Factorization with Missing Data (cont.): Then solve for the missing camera rows i_4 and j_4.

Factorization with Missing Data (cont.): Disadvantages:
- Finding the largest full submatrix of a matrix with missing elements is NP-hard.
- The data are not used symmetrically, so these inaccuracies propagate into the computation of additional missing elements.

Projective Factorization: In the projective case each measurement satisfies λ_ij q_ij = Π_j p_i, so the rescaled measurement matrix W (with entries λ_ij q_ij) has rank at most 4.

Projective Factorization: For the p-th point, its projective depths in the i-th and j-th images are related by λ_ip = [ (e_ij × q_ip) · (F_ij q_jp) / ||e_ij × q_ip||^2 ] λ_jp, where F_ij is the fundamental matrix and e_ij the epipole between images i and j.

Projective Factorization (algorithm):
1. Normalize the image coordinates by applying transformations T_i.
2. Estimate the fundamental matrices and epipoles.
3. Determine the scale factors λ_ip.
4. Build the rescaled measurement matrix W.
5. Compute the SVD of W.
6. From the SVD, recover the projective motion and shape.
7. Adapt the projective motion to account for the normalization transformations T_i of step 1.

7.4 Bundle Adjustment
- Minimize the squared reprojection errors of the 2D points.
- Solve the resulting nonlinear least squares problem with the Levenberg-Marquardt method.
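
A toy sketch of this minimization using scipy's Levenberg-Marquardt solver, under simplifying assumptions (shared known intrinsics K, angle-axis rotations, no distortion); all function and parameter names here are illustrative:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, obs):
    """Residuals (projected - observed) for every 2D observation.

    params packs, per camera, an angle-axis rotation (3) and translation (3),
    followed by the 3D points (3 values each)."""
    cam = params[:6 * n_cams].reshape(n_cams, 6)
    pts = params[6 * n_cams:].reshape(n_pts, 3)
    R = Rotation.from_rotvec(cam[cam_idx, :3]).as_matrix()      # (n_obs, 3, 3)
    p_cam = np.einsum('nij,nj->ni', R, pts[pt_idx]) + cam[cam_idx, 3:]
    proj = (K @ p_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                           # perspective divide
    return (proj - obs).ravel()

def bundle_adjust(x0, n_cams, n_pts, K, cam_idx, pt_idx, obs):
    # 'lm' is Levenberg-Marquardt; for large sparse problems 'trf' with a
    # sparse Jacobian pattern is the usual choice instead.
    res = least_squares(reprojection_residuals, x0, method='lm',
                        args=(n_cams, n_pts, K, cam_idx, pt_idx, obs))
    return res.x
```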

Bundle Adjustment (cont.): Figure 7.14: (a) bipartite graph for a toy structure from motion problem, (b) its associated Jacobian J, and (c) Hessian A. Numbers indicate cameras. The dashed arcs and light blue squares indicate the fill-in that occurs when the structure (point) variables are eliminated.

Constrained Structure and Motion: Line-based technique: Pairwise epipolar geometry cannot be recovered from line matches alone, even if the cameras are calibrated. Consider projecting the set of lines in each image into a set of 3D planes in space. You can move the two cameras around into any configuration and still obtain a valid reconstruction for 3D lines.

Constrained Structure and Motion (cont.): When lines are visible in three or more views, the trifocal tensor can be used to transfer lines from one pair of images to another. The trifocal tensor can also be computed on the basis of line matches alone. For triples of images, the trifocal tensor is used to verify that the lines are in geometric correspondence before evaluating the correlations between line segments.

Constrained Structure and Motion (cont.): Camera matrices (3×4) for the three views: P = [I | 0], P′ = [A | a_4], P′′ = [B | b_4]. Here a_4 = e′ and b_4 = e′′ are the epipoles arising from the first camera center C, thus e′ = P′ C and e′′ = P′′ C.

Constrained Structure and Motion (cont.): The lines l ↔ l′ ↔ l′′ back-project to the planes π = P^T l = (l; 0), π′ = P′^T l′ = (A^T l′; a_4^T l′), π′′ = P′′^T l′′ = (B^T l′′; b_4^T l′′). The planes π, π′, and π′′ coincide in the 3D line L. This can be expressed algebraically by requiring that the 4×3 matrix M = [π, π′, π′′] have rank 2.

Constrained Structure and Motion (cont.): For the top three rows of M this gives (up to scale): l = (b_4^T l′′) A^T l′ - (a_4^T l′) B^T l′′ = (l′′^T b_4) A^T l′ - (l′^T a_4) B^T l′′. For the i-th element of l we have: l_i = l′^T (a_i b_4^T - a_4 b_i^T) l′′ = l′^T T_i l′′.

Constrained Structure and Motion (cont.): The set of the three matrices {T_1, T_2, T_3} constitutes the trifocal tensor in matrix notation.
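
A small numpy sketch of line transfer with these matrices, building T_i from the camera matrices P′ = [A | a_4] and P′′ = [B | b_4] as above (purely illustrative; a_i and b_i are taken as the i-th columns of A and B):

```python
import numpy as np

def trifocal_matrices(P1, P2):
    """Build T_1, T_2, T_3 from P' = [A | a4] and P'' = [B | b4]
    (the first camera is assumed to be P = [I | 0])."""
    A, a4 = P1[:, :3], P1[:, 3]
    B, b4 = P2[:, :3], P2[:, 3]
    return [np.outer(A[:, i], b4) - np.outer(a4, B[:, i]) for i in range(3)]

def transfer_line(T, l_prime, l_dprime):
    """Transfer matched lines l' (second view) and l'' (third view) back to
    the first view using l_i = l'^T T_i l''."""
    return np.array([l_prime @ Ti @ l_dprime for Ti in T])
```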
