Geometry 3: Stereo Reconstruction


1 Geometry 3: Stereo Reconstruction
Introduction to Computer Vision Ronen Basri Weizmann Institute of Science

2 Material covered
Pinhole camera model, perspective projection
Two-view geometry, general case: epipolar geometry, the essential matrix; camera calibration, the fundamental matrix
Two-view geometry, degenerate cases: homography (planes, camera rotation); a taste of projective geometry
Stereo vision: 3D reconstruction from two views
Multi-view geometry, reconstruction through factorization

3 Summary of last lecture
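Cases compared: homography; perspective (calibrated); perspective (uncalibrated); orthographic
Form: homography $q \propto Hp$; calibrated $q^T E p = 0$; uncalibrated $q^T F p = 0$
Properties: homography is one-to-one (group); perspective gives concentric epipolar lines; orthographic gives parallel epipolar lines
DOFs: calibrated 8 (5); uncalibrated 8 (7); orthographic 4
Equations per point: homography 2; epipolar cases 1
Minimal configuration: calibrated 5+ points (8, linear); uncalibrated 7+ points (8, linear)
Depth: homography, no; calibrated, yes, up to scale; uncalibrated, yes, projective structure; orthographic, affine structure (third view required for Euclidean structure)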

4 Camera rotation
Images obtained by rotating the camera about its center of projection are related by a homography: $q \propto Rp$ ($t = 0$)
Verify that $q$ does not depend on $Z$:
$x' = \frac{f(r_{11}X + r_{12}Y + r_{13}Z)}{r_{31}X + r_{32}Y + r_{33}Z}$, $y' = \frac{f(r_{21}X + r_{22}Y + r_{23}Z)}{r_{31}X + r_{32}Y + r_{33}Z}$
Substituting $X = xZ/f$ and $Y = yZ/f$, the depth $Z$ cancels:
$x' = \frac{f(r_{11}x + r_{12}y + r_{13}f)}{r_{31}x + r_{32}y + r_{33}f}$, $y' = \frac{f(r_{21}x + r_{22}y + r_{23}f)}{r_{31}x + r_{32}y + r_{33}f}$
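As a quick numerical check of the claim above, the following sketch projects three points that lie on one viewing ray at different depths, rotates the camera, and compares the results with the image point predicted directly from $q \propto Rp$. The choice of a rotation about the Y axis, the focal length, and all function names are illustrative assumptions, not part of the lecture.

```python
import math

def project(P, f):
    """Pinhole projection of a 3D point P = (X, Y, Z) with focal length f."""
    X, Y, Z = P
    return (f * X / Z, f * Y / Z)

def rotate_y(P, angle):
    """Rotate P about the Y axis by `angle` radians (pure rotation, t = 0)."""
    X, Y, Z = P
    c, s = math.cos(angle), math.sin(angle)
    return (c * X + s * Z, Y, -s * X + c * Z)

def rotated_image_point(x, y, f, angle):
    """Map the image point (x, y) directly, using q proportional to R (x, y, f)^T."""
    qx, qy, qz = rotate_y((x, y, f), angle)
    return (f * qx / qz, f * qy / qz)

f, angle = 1.0, math.radians(10)
x, y = 0.2, -0.1                       # image point in the first view

results = []
for Z in (1.0, 5.0, 50.0):             # same viewing ray, three depths
    P = (x * Z / f, y * Z / f, Z)      # back-project (x, y) to depth Z
    results.append(project(rotate_y(P, angle), f))

direct = rotated_image_point(x, y, f, angle)
```

All three entries of `results` agree with `direct` up to rounding, which is the depth independence the slide asks the reader to verify.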

5 Planar scene
For a planar scene $q \propto Hp$, with $H = R + \frac{1}{d} t n^T$
$Q = RP + t$ and the plane is $aX + bY + cZ = d$, with normal $n = (a, b, c)^T$
Multiplying the plane equation by $f/Z$: $ax + by + cf = \frac{df}{Z}$
$x' = \frac{f(r_{11}X + r_{12}Y + r_{13}Z + t_x)}{r_{31}X + r_{32}Y + r_{33}Z + t_z}$, $y' = \frac{f(r_{21}X + r_{22}Y + r_{23}Z + t_y)}{r_{31}X + r_{32}Y + r_{33}Z + t_z}$
$x' = \frac{f(r_{11}x + r_{12}y + r_{13}f + t_x f/Z)}{r_{31}x + r_{32}y + r_{33}f + t_z f/Z}$, $y' = \frac{f(r_{21}x + r_{22}y + r_{23}f + t_y f/Z)}{r_{31}x + r_{32}y + r_{33}f + t_z f/Z}$
Substituting $\frac{f}{Z} = \frac{ax + by + cf}{d}$ eliminates $Z$ and yields the homography $H$ above

6 Epipolar lines
$p'^T E p = 0$
[Figure: camera centers O and O' joined by the baseline; the epipolar plane and the epipolar lines in the two images]

7 Rectification
Rectification: rotating and scaling each camera's coordinate frame so that the epipolar lines become horizontal and lie at equal heights, by making the two image planes parallel to the baseline
Rectification is achieved by applying a homography to each of the two images

8 Rectification
[Figure: homographies $H_l$ and $H_r$ applied to the two images; baseline between O and O']
$q'^T H_l^{-T} E H_r^{-1} q = 0$

9 Cyclopean coordinates
Given a rectified stereo rig with baseline length $b$, we place the origin at the midpoint between the camera centers. A point $(X, Y, Z)$ is projected to:
Left image: $x_l = \frac{f(X - b/2)}{Z}$, $y_l = \frac{fY}{Z}$
Right image: $x_r = \frac{f(X + b/2)}{Z}$, $y_r = \frac{fY}{Z}$
Cyclopean coordinates: $X = \frac{b(x_r + x_l)}{2(x_r - x_l)}$, $Y = \frac{b(y_r + y_l)}{2(x_r - x_l)}$, $Z = \frac{fb}{x_r - x_l}$
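The projection and reconstruction formulas above invert each other, which a short round trip makes concrete. This is a minimal sketch that follows the sign convention of the slide; the function names and the sample values for `f` and `b` are assumptions chosen for illustration.

```python
def project_rectified(X, Y, Z, f, b):
    """Project a point given in the cyclopean frame into the left and right images."""
    left = (f * (X - b / 2) / Z, f * Y / Z)
    right = (f * (X + b / 2) / Z, f * Y / Z)
    return left, right

def triangulate(left, right, f, b):
    """Recover cyclopean coordinates (X, Y, Z) from a matched pair of image points."""
    (xl, yl), (xr, yr) = left, right
    disparity = xr - xl
    if disparity == 0:
        raise ValueError("zero disparity: the point is at infinity")
    X = b * (xr + xl) / (2 * disparity)
    Y = b * (yr + yl) / (2 * disparity)
    Z = f * b / disparity
    return X, Y, Z

# Round trip: project a known point, then reconstruct it.
left, right = project_rectified(0.3, -0.2, 4.0, f=500.0, b=0.1)
X, Y, Z = triangulate(left, right, f=500.0, b=0.1)
```

The reconstructed `(X, Y, Z)` matches the input point up to floating-point rounding.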

10 Disparity
$x_r - x_l = \frac{fb}{Z}$
Disparity is inversely proportional to depth
Constant disparity ⟺ constant depth
Larger baseline: more stable reconstruction of depth (but more occlusions, and correspondence is harder)
(Note that disparity is defined in a rectified rig in a cyclopean coordinate frame)
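The remark that a larger baseline stabilizes depth can be made precise with a first-order error analysis. The derivation below is a sketch; the symbols $\delta$ (disparity) and $\varepsilon$ (error in the measured disparity) are introduced here for illustration.

```latex
\delta = x_r - x_l = \frac{fb}{Z}
\quad\Longrightarrow\quad
Z = \frac{fb}{\delta},
\qquad
\frac{\partial Z}{\partial \delta} = -\frac{fb}{\delta^{2}} = -\frac{Z^{2}}{fb},
\qquad
|\Delta Z| \approx \frac{Z^{2}}{fb}\,\varepsilon
```

For a fixed disparity error, the depth error grows quadratically with depth and shrinks in proportion to the baseline $b$.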

11 The correspondence problem
Stereo matching is ill-posed:
Matching ambiguity: different regions may look similar

12 The correspondence problem
Stereo matching is ill-posed:
Matching ambiguity: different regions may look similar
Specular reflectance: multiple depth values

13 Random dot stereogram
Depth is perceived from a pair of random dot images
Stereo perception is based solely on local information (low level)

14 Moving random dots

15 Compared elements for correspondence
Single pixel intensities
Pixel color
Small window (e.g. 3×3 or 5×5), often using normalized correlation to offset gain
Features and edges
Mini segments
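To illustrate why normalized correlation offsets gain, here is a minimal sketch of a zero-mean normalized cross-correlation score between two windows. The function name `ncc`, the flattened-list window representation, and the sample values are assumptions made for this example.

```python
import math

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized windows.

    Windows are given as flat lists of intensities. The score lies in [-1, 1].
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [v - mean_a for v in a]       # subtracting the mean removes an offset
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0                     # flat window: treat as uninformative
    return sum(p * q for p, q in zip(da, db)) / denom

# A 3x3 window and the same window seen with a different gain and offset.
window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
brighter = [2 * v + 5 for v in window]
score = ncc(window, brighter)
```

Although the raw intensities of `window` and `brighter` differ at every pixel, `score` is 1 up to rounding, because the normalization divides out the gain.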

16 Dynamic programming
Each pair of epipolar lines is compared independently
Local cost: sum of a unary term and a binary term
Unary term: cost of a single match
Binary term: cost of a change of disparity (occlusion)
Analogous to string matching ('diff' in Unix)

17 String matching
Swing → String
[Figure: grid aligning S t r i n g against S w i n g, from Start to End]

18 String matching
Cost: #substitutions + #insertions + #deletions
[Figure: alignment grid for the string "Swing"]
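The cost above is the classic edit distance, which is computed with the same kind of grid recursion that the stereo slides use next. The following is a minimal sketch with unit costs; the function name `edit_distance` is an assumption.

```python
def edit_distance(source, target):
    """Minimum number of substitutions, insertions and deletions
    needed to turn `source` into `target` (unit cost each)."""
    n, m = len(source), len(target)
    # T[i][j] = cost of turning source[:i] into target[:j]
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        T[i][0] = i                    # delete all i characters
    for j in range(m + 1):
        T[0][j] = j                    # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            T[i][j] = min(
                T[i - 1][j - 1] + substitution,  # diagonal: match or substitute
                T[i - 1][j] + 1,                 # deletion
                T[i][j - 1] + 1,                 # insertion
            )
    return T[n][m]

distance = edit_distance("Swing", "String")
```

For the slide's example the result is 2: one insertion and one substitution.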

19

20 Stereo with dynamic programming
Shortest path in a grid
Diagonals: constant disparity
Moving along the diagonal: pay the unary cost (cost of a pixel match)
Moving sideways: pay the binary cost, i.e. a disparity change (occlusion, right or left)
The cost prefers fronto-parallel planes; a penalty is paid for tilted planes

21 Dynamic programming on a grid
Starting from the Start node, fill the table with the cheapest cost of reaching each cell:
$T_{i,j} = \min\left(T_{i-1,j} + C_{i-1,j \to i,j},\; T_{i-1,j-1} + C_{i-1,j-1 \to i,j},\; T_{i,j-1} + C_{i,j-1 \to i,j}\right)$
Complexity?
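The grid recursion can be sketched for one pair of scanlines, written as a minimization of cost in keeping with the shortest-path reading. The assumptions here are mine: the match cost is the absolute intensity difference, occlusions pay a constant `occlusion_cost`, and the function name `scanline_dp` is hypothetical. The table has (n+1)(m+1) cells with constant work per cell, so this sketch runs in O(nm) time per scanline pair.

```python
def scanline_dp(left, right, occlusion_cost=10.0):
    """Align two scanlines; return (total cost, list of matched index pairs)."""
    n, m = len(left), len(right)
    INF = float("inf")
    T = [[INF] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    T[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:        # diagonal step: pay the unary match cost
                c = T[i - 1][j - 1] + abs(left[i - 1] - right[j - 1])
                if c < T[i][j]:
                    T[i][j], move[i][j] = c, "match"
            if i > 0:                  # left pixel has no partner (occlusion)
                c = T[i - 1][j] + occlusion_cost
                if c < T[i][j]:
                    T[i][j], move[i][j] = c, "skip_left"
            if j > 0:                  # right pixel has no partner (occlusion)
                c = T[i][j - 1] + occlusion_cost
                if c < T[i][j]:
                    T[i][j], move[i][j] = c, "skip_right"
    # Backtrack from the End node to recover the matches.
    matches, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            matches.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return T[n][m], matches

# The right scanline is the left one shifted by one pixel.
cost, matches = scanline_dp([5, 10, 50, 90], [10, 50, 90, 7])
```

Here the cheapest path occludes the first left pixel and the last right pixel and matches the remaining three pixels along one diagonal, i.e. at constant disparity.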

22 Probability interpretation: the Viterbi algorithm
Markov chain
States: discrete set of disparities
$P(d_1, \ldots, d_n) = P_1(d_1) \prod_{i=2}^{n} P_i(d_i)\, P_{i-1,i}(d_{i-1}, d_i)$
Log probabilities: product ⟹ sum

23 Probability interpretation: the Viterbi algorithm
Markov chain
States: discrete set of disparities
$-\log P(d_1, \ldots, d_n) = -\log P_1(d_1) - \sum_{i=2}^{n} \left( \log P_i(d_i) + \log P_{i-1,i}(d_{i-1}, d_i) \right)$
Maximum likelihood: minimize the sum of negative logs
Viterbi algorithm: equivalent to shortest path
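The equivalence with a shortest path can be illustrated with a small min-sum Viterbi pass over a chain. This is a sketch under assumed inputs: `unary[i][d]` stands for $-\log P_i(d)$, `pairwise(e, d)` stands for $-\log P_{i-1,i}(e, d)$ and is taken to be the same for every link, and the numbers are made up for the example.

```python
def viterbi(unary, pairwise):
    """Return (minimum total cost, optimal state sequence) for a chain."""
    n, k = len(unary), len(unary[0])
    cost = list(unary[0])              # best cost of ending in each state so far
    back = []                          # back[i-1][d] = best predecessor of d at i
    for i in range(1, n):
        new_cost, pointers = [], []
        for d in range(k):
            best_prev = min(range(k), key=lambda e: cost[e] + pairwise(e, d))
            new_cost.append(cost[best_prev] + pairwise(best_prev, d) + unary[i][d])
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)
    d = min(range(k), key=lambda e: cost[e])
    total = cost[d]
    path = [d]
    for pointers in reversed(back):    # follow the back-pointers to the start
        d = pointers[d]
        path.append(d)
    path.reverse()
    return total, path

# Four pixels, two candidate disparities, linear penalty on disparity change.
unary = [[0, 3], [2, 0], [3, 0], [0, 4]]
total, path = viterbi(unary, lambda e, d: 2 * abs(e - d))
```

Each pixel costs O(k^2) work for k disparities, so the whole chain costs O(nk^2), compared with k^n labelings for exhaustive search.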

24 Dynamic programming: pros and cons
Advantages:
Simple, efficient
Achieves global optimum
Generally works well
Disadvantages:

25 Dynamic programming: pros and cons
Advantages:
Simple, efficient
Achieves global optimum
Generally works well
Disadvantages:
Works separately on each epipolar line, does not enforce smoothness across epipolar lines
Prefers fronto-parallel planes
Too local? (considers only immediate neighbors)

26 Markov random field
Graph $G = (V, E)$
In our case: the graph is a 4-connected grid representing one image
States: disparity
Minimize an energy of the form $E(\mathcal{D}) = \sum_{(p,q) \in E} V_{p,q}(d_p, d_q) + \sum_{p \in V} D_p(d_p)$
Interpreted as negative log probabilities

27 Iterated conditional modes (ICM)
Initialize states (= disparities) for every pixel
Repeatedly update each pixel to the most likely disparity given the values assigned to its neighbors: $\min_{d_p} \sum_{q \in \mathcal{N}(p)} V_{p,q}(d_p, d_q) + D_p(d_p)$
Markov blanket: the state of a pixel depends only on the states of its immediate neighbors
Similar to Gauss-Seidel iterations
Slow convergence to an (often bad) local minimum
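The update rule can be sketched on a small 4-connected grid, together with the grid energy it decreases. Everything concrete here is an assumption for illustration: the names `energy`, `icm` and `potts`, the Potts weight of 2, and the 3×3 data costs in which only the center pixel weakly prefers label 1.

```python
def potts(a, b, weight=2):
    """Potts smoothness term: constant penalty when neighbors disagree."""
    return 0 if a == b else weight

def energy(labels, data_cost, V):
    """Sum of data terms plus pairwise terms over the 4-connected grid."""
    h, w = len(labels), len(labels[0])
    total = 0
    for r in range(h):
        for c in range(w):
            total += data_cost[r][c][labels[r][c]]
            if r + 1 < h:
                total += V(labels[r][c], labels[r + 1][c])
            if c + 1 < w:
                total += V(labels[r][c], labels[r][c + 1])
    return total

def icm(init, data_cost, num_labels, V, max_sweeps=20):
    """Iterated conditional modes: greedy per-pixel updates until no change."""
    labels = [row[:] for row in init]
    h, w = len(labels), len(labels[0])
    for _ in range(max_sweeps):
        changed = False
        for r in range(h):
            for c in range(w):
                nbrs = [labels[rr][cc]
                        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                        if 0 <= rr < h and 0 <= cc < w]

                def local(d):
                    return data_cost[r][c][d] + sum(V(d, e) for e in nbrs)

                best = min(range(num_labels), key=local)
                if local(best) < local(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels

data_cost = [[[0, 5] for _ in range(3)] for _ in range(3)]
data_cost[1][1] = [1, 0]               # the center weakly prefers label 1
initial = [[min(range(2), key=lambda d: data_cost[r][c][d]) for c in range(3)]
           for r in range(3)]
final = icm(initial, data_cost, 2, potts)
```

Starting from the per-pixel best labels, the isolated center pixel is flipped to agree with its neighbors, lowering the energy from 8 to 1. On less trivial inputs the result depends strongly on the initialization.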

28 Graph cuts: expansion moves
Assume $D(x)$ is non-negative and $V(x, y)$ is a metric:
$V(x, x) = 0$
$V(x, y) = V(y, x)$
$V(x, y) \le V(x, z) + V(z, y)$
We can apply more semi-global moves using minimal s-t cuts
Converges faster to a better (local) minimum
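The three metric conditions can be checked by brute force over a small label range. The sketch below tests several candidate smoothness terms; the function names, the label range 0..7 and the truncation threshold `T = 4` are assumptions made for the example.

```python
from itertools import product

def is_metric(V, labels):
    """Check V(x,x) = 0, symmetry, and the triangle inequality on `labels`."""
    for x, y, z in product(labels, repeat=3):
        if V(x, x) != 0:
            return False
        if V(x, y) != V(y, x):
            return False
        if V(x, y) > V(x, z) + V(z, y):
            return False
    return True

T = 4  # truncation threshold (illustrative value)

def potts(x, y):
    return 0 if x == y else 1

def l1(x, y):
    return abs(x - y)

def truncated_l1(x, y):
    return min(abs(x - y), T)

def squared(x, y):
    return (x - y) ** 2

def truncated_squared(x, y):
    return min((x - y) ** 2, T)

labels = range(8)
report = {V.__name__: is_metric(V, labels)
          for V in (potts, l1, truncated_l1, squared, truncated_squared)}
```

In `report`, the Potts, ℓ1 and truncated ℓ1 terms pass, while both squared variants fail the triangle inequality, for example with x = 0, z = 1, y = 2.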

29 α-Expansion
In any one round, an expansion move allows each pixel either to change its state to α or to keep its previous state
Each round is implemented via max flow/min cut
One iteration: apply expansion moves sequentially with all possible disparity values
Repeat until convergence

30 α-Expansion
Every round achieves a globally optimal solution over one expansion move
Energy is monotonically non-increasing between rounds
At convergence the energy is optimal with respect to all expansion moves, and within a constant factor of the global optimum:
$E(\mathcal{D}_{expansion}) \le 2c\,E(\mathcal{D}^*)$ where $c = \frac{\max_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}{\min_{\alpha \ne \beta \in \mathcal{D}} V(\alpha, \beta)}$
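The move space and the outer loop can be illustrated on a tiny 1D chain. Note the substitution: this sketch finds the best expansion move by exhaustive search over all keep-or-switch choices, which stands in for the min cut that a real implementation would use and is only feasible for a handful of pixels. The function names, the Potts weight and the data costs are assumptions.

```python
from itertools import product

def chain_energy(labels, data_cost, V):
    """Energy of a labeling on a 1D chain: data terms plus neighbor terms."""
    unary = sum(data_cost[p][d] for p, d in enumerate(labels))
    pairwise = sum(V(a, b) for a, b in zip(labels, labels[1:]))
    return unary + pairwise

def best_expansion_move(labels, alpha, data_cost, V):
    """Best labeling in which every pixel keeps its label or switches to alpha.
    Exhaustive search replaces the min cut used in practice."""
    best, best_e = list(labels), chain_energy(labels, data_cost, V)
    for switch in product((False, True), repeat=len(labels)):
        cand = [alpha if s else d for s, d in zip(switch, labels)]
        e = chain_energy(cand, data_cost, V)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e

def alpha_expansion(labels, num_labels, data_cost, V):
    """Cycle over all labels until no expansion move lowers the energy."""
    labels = list(labels)
    e = chain_energy(labels, data_cost, V)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            cand, cand_e = best_expansion_move(labels, alpha, data_cost, V)
            if cand_e < e:
                labels, e, improved = cand, cand_e, True
    return labels, e

def potts(a, b):
    return 0 if a == b else 2

data_cost = [[0, 4, 4], [3, 1, 4], [4, 0, 4], [4, 3, 0], [4, 4, 0]]
initial = [0] * 5
e_initial = chain_energy(initial, data_cost, potts)
final, e_final = alpha_expansion(initial, 3, data_cost, potts)
```

Because each accepted move strictly lowers the energy and the label space is finite, the loop terminates. For the Potts term $c = 1$, so the bound above guarantees an energy within a factor of 2 of the global optimum.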

31 α-Expansion (1D example)
[Figure: two neighboring pixels with current labels $d_p$ and $d_q$]

32 α-Expansion (1D example)
[Figure: s-t graph with terminal nodes labeled $\alpha$]

33 α-Expansion (1D example)
[Figure: cut with edge costs $D_p(\alpha)$, $D_q(\alpha)$ and $V_{pq}(\alpha, \alpha) = 0$]

34 α-Expansion (1D example)
[Figure: cut with edge costs $D_p(d_p)$ and $D_q(d_q)$]
But what about $V_{pq}(d_p, d_q)$?

35 α-Expansion (1D example)
[Figure: cut with edge costs $V_{pq}(d_p, d_q)$, $D_p(d_p)$ and $D_q(d_q)$]

36 α-Expansion (1D example)
[Figure: cut with edge costs $D_q(\alpha)$, $V_{pq}(d_p, \alpha)$ and $D_p(d_p)$]

37 α-Expansion (1D example)
[Figure: cut with edge costs $D_p(\alpha)$, $V_{pq}(\alpha, d_q)$ and $D_q(d_q)$]

38 α-Expansion (1D example)
[Figure: cut with edge costs $V_{pq}(d_p, \alpha)$, $V_{pq}(\alpha, d_q)$ and $V_{pq}(d_p, d_q)$]
Such a cut cannot be obtained, due to the triangle inequality: $V_{pq}(\alpha, d_q) \le V_{pq}(d_p, d_q) + V_{pq}(d_p, \alpha)$

39 Common metrics
Potts model: $V(x, y) = 0$ if $x = y$, $1$ if $x \ne y$
$\ell_1$: $V(x, y) = |x - y|$
Squared difference: $V(x, y) = |x - y|^2$
Truncated $\ell_1$: $V(x, y) = |x - y|$ if $|x - y| < T$, $T$ otherwise
Truncated squared difference is not a metric (nor is the untruncated squared difference: both violate the triangle inequality)

40 Reconstruction with graph-cuts
[Figure panels: Original, Result, Ground truth]

41 A different application: detect skyline
Input: one image, oriented with sky above
Objective: find the skyline in the image
Graph: grid
Two states: sky, ground
Unary (data) term:
State = sky: low if blue, otherwise high
State = ground: high if blue, otherwise low
Binary term for vertical connections:
If state(node) = sky then state(node above) = sky (infinite cost if not)
If state(node) = ground then state(node below) = ground
Solve with an expansion move. This is a two-state problem, so graph cut finds the global optimum in one expansion move.
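When the only binary terms are the vertical constraints listed above, each column decouples and its optimum is a single split row with sky above and ground below. The sketch below exploits that by direct enumeration instead of a graph cut; it would not apply if horizontal smoothness terms were added. The `blueness` input in [0, 1], the costs `1 - blueness` for sky and `blueness` for ground, and the function name `skyline` are all assumptions.

```python
def skyline(blueness):
    """For each column, return the number of rows labeled sky (counted from the top).

    blueness[r][c] is 1 for a clearly blue pixel and 0 for a clearly non-blue one.
    Assumed unary costs: sky costs 1 - blueness, ground costs blueness.
    """
    h, w = len(blueness), len(blueness[0])
    horizon = []
    for c in range(w):
        # Split 0: the whole column is ground.
        cost = sum(blueness[r][c] for r in range(h))
        best_cost, best_split = cost, 0
        for split in range(1, h + 1):
            # Row split-1 changes from ground to sky.
            b = blueness[split - 1][c]
            cost += (1.0 - b) - b
            if cost < best_cost:
                best_cost, best_split = cost, split
        horizon.append(best_split)
    return horizon

# 4x3 image; the blue pixel at the bottom of the middle column is noise.
blueness = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]
horizon = skyline(blueness)
```

The hard ordering constraint is what lets the isolated blue pixel below the horizon be labeled ground: making it sky would force every pixel above it to be sky as well, which costs more.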

