Geometry 3: Stereo Reconstruction

Introduction to Computer Vision
Ronen Basri, Weizmann Institute of Science

Material covered:
- Pinhole camera model, perspective projection
- Two-view geometry, general case: epipolar geometry, the essential matrix
- Camera calibration, the fundamental matrix
- Two-view geometry, degenerate cases: homography (planes, camera rotation)
- A taste of projective geometry
- Stereo vision: 3D reconstruction from two views
- Multi-view geometry, reconstruction through factorization

Summary of last lecture

| | Homography | Perspective (calibrated) | Perspective (uncalibrated) | Orthographic |
|---|---|---|---|---|
| Form | $q \propto Hp$ | $q^T E p = 0$ | $q^T F p = 0$ | |
| Properties | One-to-one (group) | Concentric epipolar lines | Concentric epipolar lines | Parallel epipolar lines |
| DOFs | 8 | 8 (5) | 8 (7) | 4 |
| Eqs/pnt | 2 | 1 | 1 | 1 |
| Minimal configuration | 4 | 5+ (8, linear) | 7+ (8, linear) | 4 |
| Depth | No | Yes, up to scale | Yes, projective structure | Affine structure (third view required for Euclidean structure) |

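Camera rotation
Images obtained by rotating the camera about its optical center are related by a homography: $q \propto R p$ ($t = 0$), with $p = (x, y, f)^T$.
Verify that $q$ does not depend on $Z$:
$$x' = \frac{f(r_{11}X + r_{12}Y + r_{13}Z)}{r_{31}X + r_{32}Y + r_{33}Z}, \quad y' = \frac{f(r_{21}X + r_{22}Y + r_{23}Z)}{r_{31}X + r_{32}Y + r_{33}Z}$$
Dividing the numerator and denominator by $Z/f$ (using $x = fX/Z$, $y = fY/Z$):
$$x' = \frac{f(r_{11}x + r_{12}y + r_{13}f)}{r_{31}x + r_{32}y + r_{33}f}, \quad y' = \frac{f(r_{21}x + r_{22}y + r_{23}f)}{r_{31}x + r_{32}y + r_{33}f}$$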
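A quick numerical check of the rotation homography may help. It is a minimal sketch under assumed values: the rotation angles, the focal length `f`, and the 3D point `P` are arbitrary illustrative choices. The sketch projects a point before and after a pure rotation and compares the result with $R$ applied to $(x, y, f)$. It also checks that moving the point along its viewing ray, which changes $Z$, does not change its image.

```python
import numpy as np

def rot_x(a):
    """Rotation by angle a (radians) about the X axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    """Rotation by angle a (radians) about the Y axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(P, f):
    """Pinhole projection (x, y) = f (X, Y) / Z."""
    return f * P[:2] / P[2]

f = 500.0
R = rot_y(0.3) @ rot_x(-0.2)           # arbitrary rotation; camera centre fixed (t = 0)
P = np.array([0.4, -0.1, 3.0])         # a 3D point in the first camera frame

x, y = project(P, f)
q_direct = project(R @ P, f)           # project the rotated point directly
h = R @ np.array([x, y, f])            # apply q ∝ R p to p = (x, y, f)
q_homography = f * h[:2] / h[2]        # rescale so the third coordinate equals f
q_farther = project(R @ (2.0 * P), f)  # same viewing ray, twice the depth
print(q_direct, q_homography, q_farther)
```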

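Planar scene
For a planar scene $q \propto Hp$, with
$$H = R + \frac{1}{d} t n^T,$$
where $Q = RP + t$ and the plane is $aX + bY + cZ = d$, i.e., $n = (a, b, c)^T$. Multiplying the plane equation by $f/Z$ gives $ax + by + cf = \frac{df}{Z}$.
$$x' = \frac{f(r_{11}X + r_{12}Y + r_{13}Z + t_x)}{r_{31}X + r_{32}Y + r_{33}Z + t_z}, \quad y' = \frac{f(r_{21}X + r_{22}Y + r_{23}Z + t_y)}{r_{31}X + r_{32}Y + r_{33}Z + t_z}$$
$$x' = \frac{f(r_{11}x + r_{12}y + r_{13}f + t_x f/Z)}{r_{31}x + r_{32}y + r_{33}f + t_z f/Z}, \quad y' = \frac{f(r_{21}x + r_{22}y + r_{23}f + t_y f/Z)}{r_{31}x + r_{32}y + r_{33}f + t_z f/Z}$$
Substituting $f/Z = (ax + by + cf)/d$ makes the numerator and denominator linear in $p = (x, y, f)^T$, which yields $H$.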
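The planar homography can be checked numerically in the same way. This is a sketch only: the rotation, the translation `t`, the plane coefficients and `f` are made-up values. For several points on the plane $aX + bY + cZ = d$, it compares the projection of $Q = RP + t$ with $H p$, where $H = R + \frac{1}{d} t n^T$.

```python
import numpy as np

def rot_z(angle):
    """Rotation by `angle` radians about the optical (Z) axis."""
    ca, sa = np.cos(angle), np.sin(angle)
    return np.array([[ca, -sa, 0.0], [sa, ca, 0.0], [0.0, 0.0, 1.0]])

f = 600.0
R = rot_z(0.1)                         # illustrative relative rotation
t = np.array([0.3, -0.05, 0.2])        # illustrative translation, Q = R P + t
a, b, c, d = 0.1, -0.2, 1.0, 4.0       # plane aX + bY + cZ = d
n = np.array([a, b, c])
H = R + np.outer(t, n) / d             # H = R + (1/d) t n^T

def check(X, Y):
    """Return (direct projection of Q, projection predicted by H) for a point on the plane."""
    Z = (d - a * X - b * Y) / c        # choose Z so that the point lies on the plane
    Q = R @ np.array([X, Y, Z]) + t
    q_direct = f * Q[:2] / Q[2]
    p = np.array([f * X / Z, f * Y / Z, f])   # first-image point (x, y, f)
    h = H @ p
    return q_direct, f * h[:2] / h[2]

pairs = [check(X, Y) for X, Y in [(0.5, 0.3), (-1.0, 0.8), (0.0, -0.6)]]
```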

Epipolar lines
$$p'^T E p = 0$$
[Figure: the epipolar plane through the baseline $OO'$ intersects each image in an epipolar line.]

Rectification
Rectification rotates and scales each camera's coordinate frame so that the two image planes become parallel to the baseline. The epipolar lines then become horizontal, and corresponding epipolar lines lie at equal heights. Rectification is achieved by applying a homography to each of the two images.

Rectification
[Figure: homographies $H_l$ and $H_r$ applied to the two images; baseline $OO'$.]
$$q'^T H_l^{-T} E H_r^{-1} q = 0$$
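The following sketch shows why a rectified rig has horizontal, equal-height epipolar lines. It assumes the convention $E = [t]_\times R$. After rectification, $R = I$ and $t = (b, 0, 0)$. For image points written as $(x, y, f)$, the epipolar constraint then reduces to $bf(y - y') = 0$. The baseline, focal length and pixel coordinates are illustrative values.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

f, b = 800.0, 0.5
t = np.array([b, 0.0, 0.0])           # rectified rig: translation along the baseline only
E = skew(t) @ np.eye(3)               # E = [t]_x R with R = I (assumed convention)

p = np.array([120.0, 35.0, f])        # image point (x, y, f) in one view
q_same_row = np.array([60.0, 35.0, f])
q_other_row = np.array([60.0, 40.0, f])
residual_same = q_same_row @ E @ p    # equals b f (y - y'): zero on the same row
residual_other = q_other_row @ E @ p  # nonzero when the rows differ
print(residual_same, residual_other)
```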

Cyclopean coordinates
Given a rectified stereo rig with baseline length $b$, we place the origin at the midpoint between the camera centers. A point $(X, Y, Z)$ is projected to:
Left image: $x_l = \frac{f(X - b/2)}{Z}$, $y_l = \frac{fY}{Z}$
Right image: $x_r = \frac{f(X + b/2)}{Z}$, $y_r = \frac{fY}{Z}$
Cyclopean coordinates:
$$X = \frac{b(x_r + x_l)}{2(x_r - x_l)}, \quad Y = \frac{b(y_r + y_l)}{2(x_r - x_l)}, \quad Z = \frac{fb}{x_r - x_l}$$
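The cyclopean formulas can be exercised as a round trip. This is a sketch with illustrative values for `f`, `b` and the point. It projects the point into the rectified rig using the projection equations above, then recovers $(X, Y, Z)$ from the two image points.

```python
def project_rectified(X, Y, Z, f, b):
    """Project (X, Y, Z) into a rectified rig whose origin is midway between the camera centres."""
    return f * (X - b / 2) / Z, f * Y / Z, f * (X + b / 2) / Z, f * Y / Z

def cyclopean(xl, yl, xr, yr, f, b):
    """Recover (X, Y, Z) from a correspondence using the cyclopean formulas."""
    disparity = xr - xl
    return (b * (xr + xl) / (2 * disparity),
            b * (yr + yl) / (2 * disparity),
            f * b / disparity)

f, b = 700.0, 0.12                     # focal length (pixels) and baseline (illustrative)
point = (0.3, -0.2, 2.5)
recovered = cyclopean(*project_rectified(*point, f, b), f, b)
print(recovered)
```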

Disparity
$$x_r - x_l = \frac{fb}{Z}$$
Disparity is inversely proportional to depth, so constant disparity ⟺ constant depth. A larger baseline gives a more stable reconstruction of depth, but it also produces more occlusions and makes correspondence harder. (Note that disparity is defined here for a rectified rig, in cyclopean coordinates.)
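The effect of the baseline can be quantified by differentiating $Z = fb/d$. This gives $|\partial Z / \partial d| = fb/d^2 = Z^2/(fb)$, so a fixed disparity error $\delta d$ causes a depth error of roughly $Z^2 \delta d / (fb)$. That error shrinks as $b$ grows. The sketch below compares this first-order estimate with the exact change for a 1-pixel error. All numbers are illustrative.

```python
def depth_from_disparity(d, f, b):
    """Z = f b / d."""
    return f * b / d

def depth_error(Z, f, b, disparity_error=1.0):
    """First-order depth uncertainty |dZ| ≈ Z^2 |dd| / (f b)."""
    return Z ** 2 * disparity_error / (f * b)

f, Z = 700.0, 10.0                     # focal length in pixels, depth in metres (illustrative)
for b in (0.1, 0.5):
    d = f * b / Z
    exact = depth_from_disparity(d - 1.0, f, b) - Z   # depth change for a 1-pixel disparity error
    print(b, depth_error(Z, f, b), exact)
```

Five times the baseline gives one fifth of the first-order depth error.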

The correspondence problem
Stereo matching is ill-posed:
- Matching ambiguity: different regions may look similar
- Specular reflectance: a pixel may have multiple depth values (the surface itself and the reflected scene)

Random dot stereogram
Depth is perceived from a pair of random dot images, even though neither image alone contains a visible shape. Stereo perception is therefore based solely on local (low-level) information.
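A random dot stereogram is easy to generate. In this sketch the sizes and shift are arbitrary, and the function name is a hypothetical helper. A central square of dots is copied into the right image shifted to the right, which follows the $x_r - x_l > 0$ convention above. The strip uncovered by the shift is refilled with fresh random dots, so each image on its own is just noise.

```python
import numpy as np

def random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Return (left, right) binary dot images; the central square has disparity `shift`."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(size, size))
    right = left.copy()
    lo = (size - square) // 2
    hi = lo + square
    # Shift the central square `shift` pixels to the right in the right image
    right[lo:hi, lo + shift:hi + shift] = left[lo:hi, lo:hi]
    # Fill the strip uncovered by the shift with new random dots
    right[lo:hi, lo:lo + shift] = rng.integers(0, 2, size=(square, shift))
    return left, right

left, right = random_dot_stereogram()
```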

Moving random dots

Compared elements for correspondence
- Single pixel intensities
- Pixel color
- Small window (e.g., 3×3 or 5×5), often using normalized correlation to offset gain
- Features and edges
- Mini segments
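The sketch below shows window-based matching with normalized correlation. It assumes a rectified pair, so the search runs along one row. It uses a simple winner-take-all choice of disparity, with the window size and disparity range chosen arbitrarily. The right image is given a different gain and offset to show that normalized correlation is insensitive to both.

```python
import numpy as np

def ncc(a, b, eps=1e-9):
    """Normalized cross-correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps))

def match_scanline(left, right, y, half=2, max_disp=8):
    """Winner-take-all disparity x_r - x_l for each pixel of row y, using (2*half+1)^2 windows."""
    H, W = left.shape
    disp = np.zeros(W, dtype=int)
    for x in range(half, W - half):
        win_l = left[y - half:y + half + 1, x - half:x + half + 1]
        best, best_d = -np.inf, 0
        for d in range(max_disp + 1):
            xr = x + d                              # candidate position in the right image
            if xr + half >= W:
                break
            win_r = right[y - half:y + half + 1, xr - half:xr + half + 1]
            score = ncc(win_l, win_r)
            if score > best:
                best, best_d = score, d
        disp[x] = best_d
    return disp

rng = np.random.default_rng(0)
left = rng.random((24, 48))
right = 2.0 * np.roll(left, 3, axis=1) + 10.0       # true disparity 3, different gain and offset
disp = match_scanline(left, right, y=10)
```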

Dynamic programming
- Each pair of epipolar lines is compared independently
- Local cost: the sum of a unary term and a binary term
  - Unary term: cost of a single match
  - Binary term: cost of a change of disparity (occlusion)
- Analogous to string matching ('diff' in Unix)

String matching
Swing → String
[Figure: matching grid with "String" along one axis and "Swing" along the other, from Start to End.]

String matching
Cost: #substitutions + #insertions + #deletions
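The string-matching cost above is the classic edit (Levenshtein) distance. This sketch computes it with dynamic programming row by row. Turning "Swing" into "String" takes one substitution (w → t) and one insertion (r).

```python
def edit_distance(s, t):
    """Minimum #substitutions + #insertions + #deletions that turn s into t."""
    prev = list(range(len(t) + 1))                    # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # match (free) or substitute
        prev = cur
    return prev[-1]

print(edit_distance("Swing", "String"))
```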

Stereo with dynamic programming
- Shortest path in a grid
- Diagonals: constant disparity
- Moving along the diagonal: pay the unary cost (cost of a pixel match)
- Moving sideways: pay the binary cost, i.e., a disparity change (occlusion, right or left)
- The cost prefers fronto-parallel planes; a penalty is paid for tilted planes

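Dynamic programming on a grid
$$T_{ij} = \min\left(T_{i-1,j} + C_{i-1,j \to i,j},\; T_{i-1,j-1} + C_{i-1,j-1 \to i,j},\; T_{i,j-1} + C_{i,j-1 \to i,j}\right)$$
with $T$ accumulated from Start. Complexity: each grid cell is filled once in constant time, so a pair of epipolar lines with $N$ pixels each costs $O(N^2)$.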
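Here is a minimal sketch of the grid recurrence for one pair of scanlines. The unary cost is assumed to be the squared intensity difference and the binary cost a constant occlusion penalty `occ`. Both choices and the function name are illustrative. Because the moves carry costs, the sketch takes the minimum over the three predecessor cells and then backtracks to recover the matched pixel pairs.

```python
import numpy as np

def dp_scanline(left, right, occ):
    """Shortest path through the (len(left)+1) x (len(right)+1) grid; returns (cost, matches)."""
    n, m = len(left), len(right)
    T = np.full((n + 1, m + 1), np.inf)
    T[0, 0] = 0.0                                   # Start
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:                     # diagonal: match left[i-1] with right[j-1]
                T[i, j] = min(T[i, j], T[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2)
            if i > 0:                               # sideways: left pixel occluded
                T[i, j] = min(T[i, j], T[i - 1, j] + occ)
            if j > 0:                               # sideways: right pixel occluded
                T[i, j] = min(T[i, j], T[i, j - 1] + occ)
    matches, i, j = [], n, m                        # backtrack from End
    while i > 0 or j > 0:
        if i > 0 and j > 0 and np.isclose(T[i, j], T[i - 1, j - 1] + (left[i - 1] - right[j - 1]) ** 2):
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and np.isclose(T[i, j], T[i - 1, j] + occ):
            i -= 1
        else:
            j -= 1
    return T[n, m], matches[::-1]

cost, matches = dp_scanline([0.0, 5.0, 6.0, 7.0], [5.0, 6.0, 7.0, 9.0], occ=1.0)
```

In the usage line, occluding the first left pixel and the last right pixel (total cost 2) is cheaper than forcing every pixel to match.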

Probability interpretation: the Viterbi algorithm
Markov chain; states: a discrete set of disparities.
$$P(d_1, \dots, d_n) = P_1(d_1) \prod_{i=2}^{n} P_i(d_i)\, P_{i-1,i}(d_{i-1}, d_i)$$
Log probabilities: product ⟹ sum.

$$-\log P(d_1, \dots, d_n) = -\log P_1(d_1) - \sum_{i=2}^{n} \left(\log P_i(d_i) + \log P_{i-1,i}(d_{i-1}, d_i)\right)$$
Maximum likelihood: minimize the sum of negative logs. The Viterbi algorithm is equivalent to finding a shortest path.
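The sketch below implements the Viterbi recursion over negative log probabilities. It assumes the pairwise term is the same for every neighboring pair, and it uses random costs purely for illustration. The result is checked against exhaustive search on a short chain.

```python
import itertools
import numpy as np

def viterbi(unary, pairwise):
    """Minimise sum_i unary[i, d_i] + sum_{i>=2} pairwise[d_{i-1}, d_i] over disparity sequences.

    unary[i, d] plays the role of -log P_i(d) (row 0 is -log P_1);
    pairwise[d, e] plays the role of -log P_{i-1,i}(d, e), assumed shared by all i.
    """
    n, L = unary.shape
    cost = unary[0].copy()                    # best prefix cost ending in each state
    back = np.zeros((n, L), dtype=int)        # back-pointers for recovering the path
    for i in range(1, n):
        total = cost[:, None] + pairwise      # total[prev, cur]
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[i]
    path = [int(cost.argmin())]
    for i in range(n - 1, 0, -1):             # follow back-pointers from the last pixel
        path.append(int(back[i, path[-1]]))
    return path[::-1], float(cost.min())

def chain_cost(d, unary, pairwise):
    """Energy of one disparity sequence, used here to check against exhaustive search."""
    return float(sum(unary[i, di] for i, di in enumerate(d))
                 + sum(pairwise[d[i - 1], d[i]] for i in range(1, len(d))))

rng = np.random.default_rng(0)
unary, pairwise = rng.random((5, 3)), rng.random((3, 3))
path, best = viterbi(unary, pairwise)
brute = min(chain_cost(d, unary, pairwise) for d in itertools.product(range(3), repeat=5))
```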

Dynamic programming: pros and cons
Advantages:
- Simple, efficient
- Achieves the global optimum (for each epipolar line)
- Generally works well
Disadvantages:
- Works separately on each epipolar line, so it does not enforce smoothness across epipolar lines
- Prefers fronto-parallel planes
- Too local? (considers only immediate neighbors)

Markov random field
Graph $G = (V, E)$. In our case the graph is a 4-connected grid representing one image.
States: disparities.
Minimize an energy of the form
$$E(\mathcal{D}) = \sum_{(p,q) \in E} V_{p,q}(d_p, d_q) + \sum_{p \in V} D_p(d_p)$$
interpreted as negative log probabilities.

Iterated conditional modes (ICM)
Initialize the states (= disparities) for every pixel. Then repeatedly update each pixel with the most likely disparity given the values currently assigned to its neighbors:
$$\min_{d_p} \sum_{q \in \mathcal{N}(p)} V_{p,q}(d_p, d_q) + D_p(d_p)$$
Markov blanket: the state of a pixel depends only on the states of its immediate neighbors.
Similar to Gauss-Seidel iterations.
Slow convergence to an (often bad) local minimum.
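A small sketch of ICM on the grid MRF follows. It makes the following assumptions: the data costs are random placeholders, the smoothness term is $V(d_p, d_q) = \lambda |d_p - d_q|$ (one illustrative choice among many), and each pixel starts at its best data term. Each update minimizes the terms involving one pixel with its neighbors held fixed, so the energy never increases.

```python
import numpy as np

def mrf_energy(labels, unary, lam):
    """E = sum_p D_p(d_p) + lam * sum over 4-neighbour pairs of |d_p - d_q|."""
    H, W = labels.shape
    data = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    smooth = np.abs(np.diff(labels, axis=0)).sum() + np.abs(np.diff(labels, axis=1)).sum()
    return float(data + lam * smooth)

def icm(unary, lam, max_sweeps=20):
    H, W, L = unary.shape
    labels = unary.argmin(axis=2)              # initialise with the best data term per pixel
    candidates = np.arange(L)
    for _ in range(max_sweeps):
        changed = False
        for y in range(H):
            for x in range(W):
                cost = unary[y, x].copy()      # D_p(d) for every candidate d
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        cost += lam * np.abs(candidates - labels[ny, nx])  # V_pq, neighbour fixed
                best = int(cost.argmin())
                if best != labels[y, x]:
                    labels[y, x] = best        # in-place update, Gauss-Seidel style
                    changed = True
        if not changed:                        # no pixel changed: a local minimum
            break
    return labels

rng = np.random.default_rng(0)
unary = rng.random((8, 8, 4))                  # placeholder data costs for 4 disparities
lam = 0.5
e_init = mrf_energy(unary.argmin(axis=2), unary, lam)
labels = icm(unary, lam)
e_icm = mrf_energy(labels, unary, lam)
```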

Graph cuts: expansion moves
Assume $D_p$ is non-negative and $V$ is a metric:
$V(x, x) = 0$, $V(x, y) = V(y, x)$, $V(x, y) \le V(x, z) + V(z, y)$
Then we can apply more semi-global moves using minimal s-t cuts, which converge faster to a better (local) minimum.

α-Expansion
In any one round, an expansion move allows each pixel either to change its state to α or to keep its previous state. Each round is implemented via max-flow/min-cut. One iteration applies expansion moves sequentially with all possible disparity values α. Repeat until convergence.

α-Expansion
- Every round achieves a globally optimal solution over one expansion move
- Energy decreases (is non-increasing) monotonically between rounds
- At convergence the energy is optimal with respect to all expansion moves, and within a constant factor of the global optimum:
$$E(\mathcal{D}_{expansion}) \le 2c\,E(\mathcal{D}^*), \quad c = \frac{\max_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}{\min_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}$$
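To illustrate the move space and the bound, the sketch below runs α-expansion on a tiny 1D chain. It makes one plain substitution: each expansion move is solved by exhaustive search over every keep/switch choice, not by a min cut. This gives the same optimal move but is exponential, so it only suits toy sizes. The Potts weight and random data costs are illustrative. For Potts, $c = 1$, so the result must lie within $2E(\mathcal{D}^*)$.

```python
import itertools
import numpy as np

def chain_energy(d, unary, V):
    """E(d) = sum_p D_p(d_p) + sum over neighbouring pairs of V(d_p, d_q) on a 1D chain."""
    data = sum(unary[i, di] for i, di in enumerate(d))
    smooth = sum(V(d[i], d[i + 1]) for i in range(len(d) - 1))
    return float(data + smooth)

def best_expansion(d, alpha, unary, V):
    """Optimal alpha-expansion move by exhaustive search (stand-in for the min-cut step)."""
    best, best_e = list(d), chain_energy(d, unary, V)
    for switch in itertools.product((False, True), repeat=len(d)):
        cand = [alpha if s else di for s, di in zip(switch, d)]   # each pixel: keep or take alpha
        e = chain_energy(cand, unary, V)
        if e < best_e - 1e-12:
            best, best_e = cand, e
    return best, best_e

def alpha_expansion(unary, V, max_iters=100):
    n, L = unary.shape
    d = [0] * n
    e = chain_energy(d, unary, V)
    for _ in range(max_iters):
        improved = False
        for alpha in range(L):                 # one iteration: try every label as alpha
            d_new, e_new = best_expansion(d, alpha, unary, V)
            if e_new < e - 1e-12:
                d, e, improved = d_new, e_new, True
        if not improved:                       # no expansion move lowers the energy
            break
    return d, e

def potts(a, b, weight=0.5):
    return 0.0 if a == b else weight

rng = np.random.default_rng(1)
unary = rng.random((6, 3))                     # non-negative data costs D_p
d_exp, e_exp = alpha_expansion(unary, potts)
e_star = min(chain_energy(c, unary, potts) for c in itertools.product(range(3), repeat=6))
```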

α-Expansion (1D example)
[Figure sequence: two neighboring pixels p and q with current disparities $d_p$ and $d_q$, and the graph built for one expansion move with label α.]
The cut costs for the four possible outcomes are:
- Both pixels switch to α: $D_p(\alpha) + D_q(\alpha) + V_{pq}(\alpha, \alpha)$, with $V_{pq}(\alpha, \alpha) = 0$
- Both pixels keep their states: $D_p(d_p) + D_q(d_q) + V_{pq}(d_p, d_q)$
- p keeps $d_p$, q switches to α: $D_p(d_p) + D_q(\alpha) + V_{pq}(d_p, \alpha)$
- p switches to α, q keeps $d_q$: $D_p(\alpha) + D_q(d_q) + V_{pq}(\alpha, d_q)$
[Final panel: the edges $V_{pq}(d_p, \alpha)$, $V_{pq}(\alpha, d_q)$ and $V_{pq}(d_p, d_q)$.] Such a cut cannot be obtained due to the triangle inequality:
$$V_{pq}(\alpha, d_q) \le V_{pq}(d_p, d_q) + V_{pq}(d_p, \alpha)$$

Common metrics
- Potts model: $V(x, y) = 0$ if $x = y$, $1$ if $x \ne y$
- $V(x, y) = |x - y|$
- $V(x, y) = (x - y)^2$ (not a metric: it violates the triangle inequality)
- Truncated $\ell_1$: $V(x, y) = |x - y|$ if $|x - y| < T$, $T$ otherwise
Truncated squared difference is not a metric.
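The three conditions listed for expansion moves ($V(x,x) = 0$, symmetry, triangle inequality) can be checked exhaustively on a small label set. This sketch does that for the functions above. The label set and the truncation values $T$ are illustrative. With $T = 4$ on labels 0 to 4, truncated squared difference fails because $V(0, 2) = 4 > V(0, 1) + V(1, 2) = 2$.

```python
import itertools

def is_metric(V, labels):
    """Check V(x, x) = 0, symmetry and the triangle inequality on a finite label set."""
    labels = list(labels)
    if any(V(x, x) != 0 for x in labels):
        return False
    if any(V(x, y) != V(y, x) for x, y in itertools.product(labels, repeat=2)):
        return False
    return all(V(x, y) <= V(x, z) + V(z, y)
               for x, y, z in itertools.product(labels, repeat=3))

def potts(x, y):
    return 0 if x == y else 1

def l1(x, y):
    return abs(x - y)

def squared(x, y):
    return (x - y) ** 2

def truncated_l1(x, y, T=2):
    return min(abs(x - y), T)

def truncated_squared(x, y, T=4):
    return min((x - y) ** 2, T)

candidates = {"Potts": potts, "l1": l1, "squared": squared,
              "truncated l1": truncated_l1, "truncated squared": truncated_squared}
results = {name: is_metric(V, range(5)) for name, V in candidates.items()}
print(results)
```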

Reconstruction with graph cuts
[Figure panels: original image, graph-cut result, ground truth.]

A different application: detect the skyline
- Input: one image, oriented with the sky above
- Objective: find the skyline in the image
- Graph: grid
- Two states: sky, ground
- Unary (data) term:
  - State = sky: low if blue, otherwise high
  - State = ground: high if blue, otherwise low
- Binary term for vertical connections:
  - If state(node) = sky, then state(node above) = sky (infinite cost otherwise)
  - If state(node) = ground, then state(node below) = ground
- Solve with an expansion move. Since this is a two-state problem, graph cut finds the global optimum in a single expansion move.
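Below is a self-contained sketch of the skyline labeling as a single s-t minimum cut. Several pieces are assumptions made for illustration:
- `blueness` is a hypothetical per-pixel score in [0, 1].
- The data costs are taken as $1 - \text{blueness}$ for sky and $\text{blueness}$ for ground.
- The "infinite" vertical cost is replaced by a large constant `BIG`.
- Max flow is computed with a plain Edmonds-Karp search on a dense capacity matrix, which is practical only for tiny images.

Since only vertical edges exist, each column can also be solved by trying every split row. The usage lines use this as a cross-check.

```python
from collections import deque
import numpy as np

BIG = 1e9   # stands in for the infinite cost of "ground above sky"

def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow on a dense capacity matrix; returns a mask of the source side of a min cut."""
    res = cap.astype(float)                 # residual capacities (astype makes a copy)
    n = len(res)
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:    # breadth-first search for an augmenting path
            u = queue.popleft()
            for v in np.nonzero(res[u] > 1e-12)[0]:
                if parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                 # no augmenting path: nodes reached form the source side
            return np.array([p != -1 for p in parent])
        bottleneck, v = np.inf, t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v], v])
            v = parent[v]
        v = t
        while v != s:                       # push flow along the path
            u = parent[v]
            res[u, v] -= bottleneck
            res[v, u] += bottleneck
            v = u

def skyline(blueness):
    """Label each pixel sky (True) or ground (False); row 0 is the top of the image."""
    H, W = blueness.shape
    n = H * W
    s, t = n, n + 1
    cap = np.zeros((n + 2, n + 2))
    for y in range(H):
        for x in range(W):
            p = y * W + x
            cap[s, p] = 1.0 - blueness[y, x]   # paid if p is labelled sky (sink side)
            cap[p, t] = blueness[y, x]         # paid if p is labelled ground (source side)
            if y > 0:
                cap[p - W, p] = BIG            # cut only if p is sky while the pixel above is ground
    return ~min_cut_source_side(cap, s, t)[:n].reshape(H, W)

rng = np.random.default_rng(0)
blueness = rng.random((5, 4))
sky = skyline(blueness)
energy = float(np.where(sky, 1.0 - blueness, blueness).sum())
column_optimum = sum(min((1.0 - blueness[:k, x]).sum() + blueness[k:, x].sum() for k in range(6))
                     for x in range(4))
```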