A Global Linear Method for Camera Pose Registration

Presentation transcript:

A Global Linear Method for Camera Pose Registration. Nianjuan Jiang*¹, Zhaopeng Cui*², Ping Tan². ¹Advanced Digital Sciences Center, Singapore; ²National University of Singapore. *Joint first authors.

Structure from Motion (SfM): simultaneously recover both 3D scene points and camera poses.

SfM Pipeline. Input: images with matched feature points. Step 1. Epipolar geometry: compute the relative motion between 2 or 3 cameras. 5-point method [Nister 2004]; 6-point method [Quan 1995]; 7-point method [Torr & Murray 1997]; 8-point method (normalized) [Hartley 1997].

SfM Pipeline. Step 1. Epipolar geometry. Step 2. Camera registration: put all cameras in the same coordinate system (with auto-calibration if needed [Pollefeys et al. 1998]) [Fitzgibbon & Zisserman 1998] [Pollefeys et al. 2004].

SfM Pipeline. Step 1. Epipolar geometry. Step 2. Camera registration. Step 3. Bundle adjustment: optimize all cameras and points [Triggs et al. 1999].

“The Black Art”. Step 1. Epipolar geometry. Step 2. Camera registration. Step 3. Bundle adjustment. The state of the art: Steps 1 and 3 are very well studied, with elegant theories and algorithms. Step 2 is often ad hoc and heuristic. The camera registration to initialize bundle adjustment “… is still to some extent a black art…” (page 452, Chapter 18.6).

Typical Solutions. Hierarchical solution: iteratively merge sub-sequences [Fitzgibbon & Zisserman 1998] [Lhuillier & Quan 2005].

Typical Solutions. Hierarchical solution: iteratively merge sub-sequences [Fitzgibbon & Zisserman 1998] [Lhuillier & Quan 2005]. Incremental solution: iteratively add cameras one by one [Pollefeys et al. 2004] [Snavely et al. 2006].

Pain of Existing Solutions. [Figure: block diagram of the incremental solution.] Drawbacks: (1) Bundle adjustment is called repeatedly, which is inefficient: 90% of the total computation time is spent on bundle adjustment. (2) Some cameras are fixed before the others; this asymmetric formulation leads to inferior results. Our objective: simultaneously register all cameras to initialize the bundle adjustment.

Previous Works. Discrete-continuous optimization [Crandall et al. 2011]: requires coplanar cameras. Linear global solution to rotations [Govindu 2001] [Hartley et al. 2013]: cannot solve translations. Elegant quasi-convex optimization [Kahl 2005]: sensitive to outliers. Linear global solution to translations [Martinec et al. 2007] [Arie-Nachimson et al. 2012]: degenerate at collinear motion. Desirable features: solve both rotations & translations; linear & robust solution; no degeneracy.

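The Input: Epipolar Geometry. The essential matrix $E_{ij}$ encodes the relative motion between cameras $i$ and $j$, that is, the relative rotation $R_{ij}$ and the relative translation $t_{ij}$: $E_{ij} = [t_{ij}]_\times R_{ij}$.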
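As a small illustration of the relation $E_{ij} = [t_{ij}]_\times R_{ij}$, the sketch below builds an essential matrix from a made-up relative motion and checks the epipolar constraint on one synthetic point. The convention $X_j = R_{ij} X_i + t_{ij}$, the function names, and all numeric values are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def skew(t):
    """Return the cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def essential_from_motion(R_ij, t_ij):
    """Compose E_ij = [t_ij]_x R_ij from a relative rotation and translation."""
    return skew(t_ij) @ R_ij

# Hypothetical relative motion: 10 degree rotation about the y axis, baseline along x.
a = np.deg2rad(10.0)
R_ij = np.array([[np.cos(a), 0.0, np.sin(a)],
                 [0.0, 1.0, 0.0],
                 [-np.sin(a), 0.0, np.cos(a)]])
t_ij = np.array([1.0, 0.0, 0.0])
E_ij = essential_from_motion(R_ij, t_ij)

# Assumed convention: a point X_i in camera i's frame maps to X_j = R_ij X_i + t_ij.
X_i = np.array([0.3, -0.2, 4.0])
X_j = R_ij @ X_i + t_ij
x_i = X_i / X_i[2]          # normalized image coordinates in camera i
x_j = X_j / X_j[2]          # normalized image coordinates in camera j
residual = float(x_j @ E_ij @ x_i)   # epipolar constraint x_j^T E_ij x_i
print(residual)
```

Under the stated convention the residual vanishes up to floating-point error; with real, noisy matches it would only be close to zero.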

Rotation Registration [Martinec et al. 2007]. Write each global rotation by its columns, $R_i = [r_1^i, r_2^i, r_3^i]$. Every two cameras with a known relative rotation $R_{ij}$ give a linear equation $R_j = R_{ij} R_i$: $R_2 = R_{12} R_1$ for {cam 1, cam 2}, $R_3 = R_{23} R_2$ for {cam 2, cam 3}, ..., $R_n = R_{mn} R_m$ for {cam m, cam n}.
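The pairwise equations $R_j = R_{ij} R_i$ can be stacked and solved in one shot. The following is a minimal sketch of that idea, not the authors' implementation: it assumes noise-free or mildly noisy inputs, fixes the gauge by making camera 0 the identity, and projects each 3x3 block back to a rotation with an SVD. The function names and the synthetic test data are hypothetical.

```python
import numpy as np

def register_rotations(n, relative):
    """Solve R_j = R_ij R_i for all cameras at once.

    relative: dict {(i, j): R_ij} with 0-based camera indices.
    Returns a list of n rotations with camera 0 set to the identity.
    """
    A = np.zeros((3 * len(relative), 3 * n))
    for row, ((i, j), R_ij) in enumerate(relative.items()):
        A[3 * row:3 * row + 3, 3 * j:3 * j + 3] = np.eye(3)   # + R_j
        A[3 * row:3 * row + 3, 3 * i:3 * i + 3] = -R_ij       # - R_ij R_i
    # The stacked rotations span the 3-dimensional null space of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-3:].T                      # 3n x 3 basis of the null space
    X = X @ np.linalg.inv(X[:3])       # gauge fix: first block becomes I
    rotations = []
    for i in range(n):
        U, _, Wt = np.linalg.svd(X[3 * i:3 * i + 3])
        R = U @ Wt                     # nearest orthogonal matrix
        if np.linalg.det(R) < 0:       # keep a proper rotation
            R = U @ np.diag([1.0, 1.0, -1.0]) @ Wt
        rotations.append(R)
    return rotations

def rodrigues(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# Synthetic ground truth (camera 0 is the identity) and a small match graph.
truth = [rodrigues([0, 1, 0], 0.2 * i) @ rodrigues([1, 0, 0], 0.1 * i) for i in range(4)]
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
relative = {(i, j): truth[j] @ truth[i].T for i, j in edges}
estimated = register_rotations(4, relative)
```

Because every edge contributes equally, no camera is registered before the others; only the final gauge choice singles out camera 0.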

Translation Registration (3 cameras). Input: relative translations $c_{ij}$, $c_{ik}$, $c_{jk}$. Output: camera positions $c_i$, $c_j$, $c_k$. [Figure: triangle with vertices $c_i$, $c_j$, $c_k$ and edges $c_{ij}$, $c_{ik}$, $c_{jk}$.]

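Translation Registration (3 cameras). Suppose $c_i$, $c_j$ are known; $c_k$ can be computed by (1) rotating $c_{ij}$ to match the orientation of $c_{ik}$, with the rotation $R_i(\theta_i)$, and (2) shrinking or growing $c_{ij}$ to match the length of $c_{ik}$, with the scale $s_{ij}^{ik}$. Both are easy to compute. A linear equation: $c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$. [Figure: triangle $c_i$, $c_j$, $c_k$ with the angle $\theta_i$ at $c_i$ between $c_{ij}$ and $c_{ik}$.]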
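The equation $c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$ can be tried out numerically. The sketch below makes two assumptions that the slide does not spell out: $R_i(\theta_i)$ is taken as the rotation about the axis $c_{ij} \times c_{ik}$ by the angle between the two directions, and the scale is taken from the law of sines, $s_{ij}^{ik} = \sin\theta_j / \sin\theta_k$. All directions are unit vectors in a common global frame, and the helper names are hypothetical. The sine-based scale breaks down for collinear cameras, so this sketch covers only the non-collinear case.

```python
import numpy as np

def angle_between(u, v):
    """Angle in [0, pi] between two 3-vectors."""
    return np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v))

def rotation_about(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def locate_third_camera(c_i, c_j, c_ij, c_ik, c_jk):
    """Compute c_k from known c_i, c_j and the three unit baseline directions."""
    theta_i = angle_between(c_ij, c_ik)       # interior angle at c_i
    theta_j = angle_between(-c_ij, c_jk)      # interior angle at c_j
    theta_k = angle_between(-c_ik, -c_jk)     # interior angle at c_k
    R = rotation_about(np.cross(c_ij, c_ik), theta_i)   # turns c_ij onto c_ik
    s = np.sin(theta_j) / np.sin(theta_k)                # |c_k - c_i| / |c_j - c_i|
    return c_i + s * (R @ (c_j - c_i))

def unit(v):
    return v / np.linalg.norm(v)

# Synthetic triangle: the true c_k is used only to generate the directions.
c_i = np.array([0.0, 0.0, 0.0])
c_j = np.array([2.0, 0.0, 0.0])
c_k_true = np.array([1.0, 3.0, 0.5])
c_k = locate_third_camera(c_i, c_j,
                          unit(c_j - c_i), unit(c_k_true - c_i), unit(c_k_true - c_j))
```

Note that once the rotation and scale are fixed, the relation is linear in the unknown positions, which is what allows the equations of many triangles to be stacked into one system.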

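Translation Registration (3 cameras). A similar linear equation is obtained by matching $c_{ij}$ and $c_{jk}$: $c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$. [Figure: triangle $c_i$, $c_j$, $c_k$ with the angle $\theta_j$ at $c_j$ between $c_{ij}$ and $c_{jk}$.]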

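Translation Registration (3 cameras). A geometric explanation. $\pi_1$: the plane spanned by $c_{ij}$ and $c_{ik}$, in which $c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$ acts. $\pi_2$: the plane spanned by $c_{ij}$ and $c_{jk}$, in which $c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$ acts. $\pi_1$ and $\pi_2$ are non-coplanar. [Figure: planes $\pi_1$ and $\pi_2$ through $c_i$, $c_j$, $c_k$.]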

Translation Registration (3 cameras). A geometric explanation (see the derivation in the paper). The first equation, $c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$, gives $c_k = c_i + R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i) \approx A$. The second equation, $c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$, gives $c_k = c_j + R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j) \approx B$. $AB$ is the mutual perpendicular line, and $c_k$ is the middle point of $AB$. Our linear equations minimize an approximate geometric error! [Figure: planes $\pi_1$ and $\pi_2$ with the points $A$, $B$ and $c_k$.]
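To make the midpoint picture concrete, here is a minimal sketch that computes the mutual perpendicular of two lines and its middle point. It assumes that $A$ and $B$ are the closest points on the ray from $c_i$ along $c_{ik}$ and the ray from $c_j$ along $c_{jk}$; the slide does not state this explicitly, and the function name and the numbers are made up.

```python
import numpy as np

def mutual_perpendicular(p1, d1, p2, d2):
    """Closest points A (on p1 + t*d1) and B (on p2 + u*d2) of two non-parallel lines."""
    w = p2 - p1
    # Normal equations of min over (t, u) of |p1 + t*d1 - p2 - u*d2|^2.
    G = np.array([[d1 @ d1, -(d1 @ d2)],
                  [d1 @ d2, -(d2 @ d2)]])
    t, u = np.linalg.solve(G, np.array([w @ d1, w @ d2]))
    return p1 + t * d1, p2 + u * d2

# Two slightly skew rays, as noisy baseline directions would produce.
c_i = np.array([0.0, 0.0, 0.0])
c_j = np.array([2.0, 0.0, 0.0])
c_ik = np.array([1.0, 3.0, 0.1])     # direction from c_i towards c_k (not normalized)
c_jk = np.array([-1.0, 3.0, -0.1])   # direction from c_j towards c_k (not normalized)

A, B = mutual_perpendicular(c_i, c_ik, c_j, c_jk)
c_k = 0.5 * (A + B)                  # middle point of AB
```

With noise-free directions the two rays intersect, $A$ and $B$ coincide, and the midpoint is the exact camera position.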

Translation Registration (3 cameras). No degeneracy with collinear motion: $c_k - c_i = R_i(0)\, s_{ij}^{ik}\, (c_j - c_i)$ and $c_k - c_j = R_j(0)\, s_{ij}^{jk}\, (c_i - c_j)$. [Figure: $c_i$, $c_j$, $c_k$ on one line, with $c_{ij}$, $c_{ik}$, $c_{jk}$.]

Translation Registration (3 cameras). Suppose $c_i$, $c_k$ are known; $c_j$ can be computed by: $c_j - c_i = R_i(-\theta_i)\, s_{ik}^{ij}\, (c_k - c_i)$ and $c_j - c_k = R_k(\theta_k)\, s_{ik}^{jk}\, (c_i - c_k)$. [Figure: triangle $c_i$, $c_j$, $c_k$ with the angles $\theta_i$ and $\theta_k$.]

Translation Registration (3 cameras). Suppose $c_j$, $c_k$ are known; $c_i$ can be computed by: $c_i - c_k = R_k(-\theta_k)\, s_{jk}^{ik}\, (c_j - c_k)$ and $c_i - c_j = R_j(\theta_j)\, s_{jk}^{ij}\, (c_k - c_j)$. [Figure: triangle $c_i$, $c_j$, $c_k$ with the angles $\theta_j$ and $\theta_k$.]

Translation Registration (3 cameras). Collecting all six equations:
$c_k - c_i = R_i(\theta_i)\, s_{ij}^{ik}\, (c_j - c_i)$
$c_k - c_j = R_j(-\theta_j)\, s_{ij}^{jk}\, (c_i - c_j)$
$c_j - c_i = R_i(-\theta_i)\, s_{ik}^{ij}\, (c_k - c_i)$
$c_j - c_k = R_k(\theta_k)\, s_{ik}^{jk}\, (c_i - c_k)$
$c_i - c_j = R_j(\theta_j)\, s_{jk}^{ij}\, (c_k - c_j)$
$c_i - c_k = R_k(-\theta_k)\, s_{jk}^{ik}\, (c_j - c_k)$
In matrix form: $B_{ijk}\, (c_i^\top, c_j^\top, c_k^\top)^\top = 0$.

Translation Registration (n cameras). Generalize to n cameras. The match graph: each camera is a vertex, and two cameras are connected if their relative motion is known. Cameras can be non-coplanar. 1. Collect equations from all triangles in the match graph: $B_1 (c_1, c_2, c_3) = 0$, $B_2 (c_2, c_3, c_4) = 0$, $B_3 (c_3, c_4, c_6) = 0$, $B_4 (c_4, c_5, c_6) = 0$, $B_5 (c_5, c_6, c_7) = 0$, $B_6 (c_6, c_7, c_8) = 0$, $B_7 (c_7, c_8, c_9) = 0$. 2. Solve all equations jointly: $BY = 0$, with $Y = (c_1, c_2, c_3, c_4, c_5, c_6, c_7, c_8, c_9)^\top$.
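The two steps above (collect the triangle equations, then solve $BY = 0$) can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' code: baseline directions are unit vectors already expressed in the global frame, scales come from the law of sines, the six equations of each triangle are generated as the six orderings of its vertices, camera 0 is pinned to the origin, and the cameras are non-collinear and not all coplanar. All names and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def rotation_about(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def angle_between(u, v):
    return np.arctan2(np.linalg.norm(np.cross(u, v)), np.dot(u, v))

def register_positions(n, triangles, direction):
    """direction[(a, b)]: unit vector from camera a to camera b in the global frame."""
    def d(a, b):
        return direction[(a, b)] if (a, b) in direction else -direction[(b, a)]

    blocks = []
    for tri in triangles:
        for a, b, c in itertools.permutations(tri):
            # Equation c_c - c_a = M (c_b - c_a), with M = scale * rotation.
            theta_a = angle_between(d(a, b), d(a, c))
            theta_b = angle_between(d(b, a), d(b, c))
            theta_c = angle_between(d(c, a), d(c, b))
            M = (np.sin(theta_b) / np.sin(theta_c)) * rotation_about(
                np.cross(d(a, b), d(a, c)), theta_a)
            row = np.zeros((3, 3 * n))
            row[:, 3 * c:3 * c + 3] += np.eye(3)
            row[:, 3 * b:3 * b + 3] -= M
            row[:, 3 * a:3 * a + 3] += M - np.eye(3)
            blocks.append(row)
    B = np.vstack(blocks)
    # Gauge: drop camera 0's columns (c_0 = 0); the remaining null space is the scale.
    _, _, Vt = np.linalg.svd(B[:, 3:])
    Y = np.concatenate([np.zeros(3), Vt[-1]]).reshape(n, 3)
    # Fix scale and sign so the first baseline has unit length and the given direction.
    a, b = triangles[0][0], triangles[0][1]
    baseline = Y[b] - Y[a]
    return Y * (np.sign(baseline @ d(a, b)) / np.linalg.norm(baseline))

# Synthetic, non-coplanar cameras; the truth is used only to generate directions.
truth = np.array([[0, 0, 0], [2, 0, 0], [1, 2, 0.5], [3, 1.5, 2], [2, 3, 3]], float)
triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
direction = {}
for tri in triangles:
    for a, b in itertools.combinations(tri, 2):
        v = truth[b] - truth[a]
        direction[(a, b)] = v / np.linalg.norm(v)

Y = register_positions(5, triangles, direction)   # positions up to the chosen gauge
```

In this synthetic setup the recovered positions equal the ground truth scaled so that the first baseline has length one. With noisy inputs the smallest singular vector gives a least-squares solution instead of an exact one.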

Triangulation Once cameras are fixed, triangulate matched corners to generate 3D points.
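One common way to carry out this step is linear (DLT) triangulation; the slide does not say which triangulation method the authors use, so the sketch below is only an assumed stand-in. It uses normalized cameras $P = [R \mid -Rc]$ and made-up poses and point coordinates.

```python
import numpy as np

def camera_matrix(R, c):
    """Normalized projection matrix P = [R | -R c] for rotation R and center c."""
    return np.hstack([R, (-R @ c).reshape(3, 1)])

def project(P, X):
    """Project a 3D point to normalized image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(projections, observations):
    """Linear triangulation of one point from two or more views."""
    rows = []
    for P, (x, y) in zip(projections, observations):
        rows.append(x * P[2] - P[0])   # x * (p3 . X) - (p1 . X) = 0
        rows.append(y * P[2] - P[1])   # y * (p3 . X) - (p2 . X) = 0
    _, _, Vt = np.linalg.svd(np.array(rows))
    X = Vt[-1]                         # homogeneous solution
    return X[:3] / X[3]

# Two registered cameras (hypothetical poses) observing one synthetic point.
P1 = camera_matrix(np.eye(3), np.array([0.0, 0.0, 0.0]))
P2 = camera_matrix(np.eye(3), np.array([1.0, 0.0, 0.0]))
X_true = np.array([0.5, 0.2, 5.0])
X_est = triangulate([P1, P2], [project(P1, X_true), project(P2, X_true)])
```

The points found this way, together with the registered cameras, serve as the initialization that the final bundle adjustment refines.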

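Robustness Issues. Exclude unreliable triplets: for each triplet, check whether two versions of each relative translation $c_{ij}$, $c_{ik}$, $c_{jk}$ agree (the marks that distinguish the two sides of each equality are not recoverable from this transcript). More consistency checks are in the paper. [Figure: two triangles with edges $c_{ij}$, $c_{ik}$, $c_{jk}$.]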
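As an illustration of what a triplet filter can look like, the sketch below applies two generic geometric checks. These are assumptions chosen for illustration and are not necessarily the test shown on the slide or in the paper: the relative rotations should close the loop, $R_{ik} \approx R_{jk} R_{ij}$ (which follows from $R_j = R_{ij} R_i$), and the three baseline directions, expressed in one common frame, should be coplanar. The thresholds and all names are hypothetical.

```python
import numpy as np

def rotation_angle_deg(R):
    """Rotation angle of R in degrees."""
    return np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))

def triplet_is_consistent(R_ij, R_jk, R_ik, c_ij, c_ik, c_jk,
                          max_rot_deg=1.0, max_coplanarity=0.02):
    """Generic sanity checks on one camera triplet (thresholds are made up)."""
    loop = R_ik.T @ R_jk @ R_ij                       # identity for a perfect triplet
    rot_ok = rotation_angle_deg(loop) < max_rot_deg
    # Triple product of unit baseline directions: zero when they are coplanar.
    coplanar_ok = abs(np.dot(c_ij, np.cross(c_ik, c_jk))) < max_coplanarity
    return bool(rot_ok and coplanar_ok)

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def unit(v):
    return v / np.linalg.norm(v)

# A consistent synthetic triplet.
R_i, R_j, R_k = np.eye(3), rot_z(0.3), rot_z(-0.2)
R_ij, R_jk, R_ik = R_j @ R_i.T, R_k @ R_j.T, R_k @ R_i.T
c_i, c_j, c_k = np.array([0.0, 0, 0]), np.array([2.0, 0, 0]), np.array([1.0, 3, 0.5])
c_ij, c_ik, c_jk = unit(c_j - c_i), unit(c_k - c_i), unit(c_k - c_j)
good = triplet_is_consistent(R_ij, R_jk, R_ik, c_ij, c_ik, c_jk)

# The same triplet with one relative rotation corrupted by 5 degrees.
bad = triplet_is_consistent(R_ij, rot_z(np.deg2rad(5.0)) @ R_jk, R_ik, c_ij, c_ik, c_jk)
```

Triplets that fail such checks would simply contribute no equations to the linear system.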

Results. Accuracy evaluation: compare with recent methods on data with known ground truth. Errors are given for camera positions c (meters) and rotations R (degrees); "-" marks a value that is not given.

Method                         Fountain-P11        Herz-Jesu-P25       Castle-P30
                               c (m)    R (deg)    c (m)    R (deg)    c (m)    R (deg)
Ours                           0.0139   0.1954     0.0636   0.1880     0.2345   0.4800
[Arie-Nachimson et al. 2012]   0.0226   0.4211     0.0479   0.3125     -        -
[Sinha et al. 2010]            0.1317   -          0.2538   -          -        -
VisualSFM                      0.0364   0.2794     0.0551   0.2868     0.2639   0.3980

All results are after the final bundle adjustment.

Results. Efficiency evaluation. Each cell lists Our Method / Visual-SFM. "?" marks a value missing from this transcript; a single number means the transcript gives only one value for that dataset.

                            Building          Trevi Fountain      Pisa                Notre Dame
Total running time (s)*     17 / 62           49 / 479            69 / ?              135 / 1790
BA time (s)                 11 / 57           20 / 442            52 / 444            61 / 1715
Registration time (s)       6 / 5             29 / 37             12 / ?              74 / 75
# of reconstructed images   128               362 / 365           480                 1255 / 1253
# of reconstructed points   91,290 / 78,100   103,629 / 104,657   134,555 / 129,484   297,766 / 292,277

* The total running time excludes the time spent on feature matching and epipolar geometry computation.

Conclusions: a global solution for orientations & positions; linear, robust & geometrically meaningful; no degeneracy.

Thanks! code & data available at: http://www.ece.nus.edu.sg/stfpage/eletp/

Results: a large-scale scene. Quasi-dense points are generated by CMVS [Furukawa et al. 2010] for better visualization.