Slide 1: Stereo Vision (Ellen L. Walker)

Why?
- Two images provide enough information to extract (some) 3D structure
- We have good biological models (our own vision system)

Difficulties
- Matching information from left to right
  - ... but we have already looked at some matching techniques
  - ... and some can take advantage of expectation
- Calibrating the stereo rig
  - Some methods require careful calibration
  - Others avoid calibration entirely

Slide 2: Multiple Coordinate Frames

- World Frame (Euclidean): "arbitrary" origin, z usually vertical
- Camera Frame (Euclidean): the focal point of the camera is the origin; Z points away from the image plane and is aligned with the optical axis
- Image Frame (Euclidean): axes X and Y aligned with the camera frame; the origin is where the focal ray hits the image plane
- Image Frame (Affine): Y, Z same as the camera frame; X may not be perpendicular to Y (models non-rectangular pixels)

Slide 3: Perspective Projection Geometry (review)

[Figure: image plane at distance f from the focal point; a world point at height Y and depth Z projects to image height y]

By similar triangles:
  y / f = Y / Z
so
  y = f Y / Z    and    Z = f Y / y
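A worked version of the two formulas above, as a minimal Python sketch (function names are illustrative, not from the slides):

```python
def project(Y, Z, f):
    """Perspective projection by similar triangles: y/f = Y/Z, so y = f*Y/Z."""
    return f * Y / Z

def depth_from_height(Y, y, f):
    """Invert the projection when the world height Y is known: Z = f*Y/y."""
    return f * Y / y

# Example: a point 2 m above the axis at 10 m depth, with f = 0.05 m,
# projects to y = 0.01 m; the inverse recovers Z = 10 m.
y = project(2.0, 10.0, 0.05)
assert abs(depth_from_height(2.0, y, 0.05) - 10.0) < 1e-9
```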

Slide 4: Triangulation

- Given an image point x and the focal center c, all possible world points lie along a ray with direction vector v
- Find the intersection of these rays (one per camera) to get the 3D point (see section 7.1 for the least-squares formulation)

[Figure 7.1]
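The least-squares formulation the slide points to (section 7.1) can be sketched as follows: each ray p = c + t·v contributes the constraint that the reconstructed point lies on it, expressed with the projector I - v vᵀ. This is an illustrative implementation, not the book's exact code:

```python
import numpy as np

def triangulate(centers, directions):
    """Least-squares intersection of rays p = c + t*v (one ray per camera).
    Minimizes the summed squared distance from the point to each ray."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, v in zip(centers, directions):
        v = v / np.linalg.norm(v)
        P = np.eye(3) - np.outer(v, v)  # projects onto the plane normal to v
        A += P
        b += P @ c
    return np.linalg.solve(A, b)  # singular only if all rays are parallel
```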

Slide 5: Stereo Reconstruction

Epipolar geometry
- Every point in one image lies on a line in the other image
- The epipolar line is the image of the ray from the focal point through the point
- All epipolar lines pass through the epipole, which is the image of the other camera's focal point

So what?
- If the cameras are calibrated, make the epipolar lines line up with the scan lines, putting the epipole at infinity (this is the "canonical configuration")
- If the cameras are not calibrated, find the epipole and use it for calibration (8 point algorithm)

Slide 6: Epipolar Geometry

Figure 7.3: epipolar points (e0 and e1), epipolar lines (l0 and l1), and corresponding points (x0 and x1)

Slide 7: Epipolar Geometry Definitions

- c0, c1: camera centers of focus
- i0, i1: image planes
- p: point in space
- x0, x1: images of p
- e0: epipole 0 (image of c1 in i0), and vice versa for e1
- Epipolar lines l0 and l1 connect e0 and x0; e1 and x1
- The epipolar plane contains p, c0, and c1 (and all epipolar lines)
- Epipolar constraint: all images of a point lie on its epipolar line

Slide 8: Recovering the Epipolar Information

- Begin by assuming the two cameras are related by rotation R and translation T (we will not need to know R and T later)
- Then (x0, y0, w0)^T = P1 (X, Y, Z, 1)^T, where P1 = (Id | 0) and Id is the 3x3 identity matrix
- And (x1, y1, w1)^T = P2 (X, Y, Z, 1)^T, where P2 = (R | -RT) and R and T are the rotation and translation between the cameras
- The image of the line through two points (the camera origin c0 = (0, 0, 0, 1) and the point at infinity (x0, y0, z0, 0)) is epipolar line 2
- After some algebra (section 7.2), we get the important equation relating the points in the two images:
    (x0, y0, w0) E (x1, y1, w1)^T = 0
- E is the 3x3 essential matrix that relates the two images

Slide 9: Recovering the Epipolar Information (continued)

- The equation (x0, y0, w0) E (x1, y1, w1)^T = 0 is true for every point that is visible in both images!
- Since E is a 3x3 matrix, we would need 9 linear equations to recover all 9 elements
- But we can never recover absolute scale (moving the camera closer is entirely equivalent to making the objects bigger), so set E[3][3] = 1
- Use 8 correspondences to recover the remaining 8 elements: the "8 point algorithm" for epipolar constraint recovery
- Given E and p0 = (x0, y0, w0), the set of all points p1 for which p0 E p1^T = 0 is an equation for the epipolar line!
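A minimal, unnormalized sketch of the linear 8 point step in Python (in practice the points are usually normalized first for numerical conditioning; that step is omitted here):

```python
import numpy as np

def eight_point(p0, p1):
    """Solve p0 E p1^T = 0 for the 3x3 matrix E from >= 8 homogeneous
    correspondences (rows of p0 and p1). Each correspondence gives one
    linear equation in the 9 entries of E."""
    A = np.array([np.outer(a, b).ravel() for a, b in zip(p0, p1)])
    _, _, Vt = np.linalg.svd(A)       # least-squares null vector of A
    E = Vt[-1].reshape(3, 3)
    return E / E[2, 2]                # fix the free scale: set E[3][3] = 1

# Given E, the epipolar line of p0 in the other image is l1 = E^T p0.
```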

Slide 10: Value of Epipolar Information

- Recover the epipole to use as a constraint for correspondence matching
- Use epipolar information to warp images as they would appear in a calibrated rig (epipolar line -> x axis)
- Recognize "possible / impossible" relationships among points based on epipolar constraints
- Use the concept of "two views" in other ways:
  - Object and its shadow
  - Two copies of the same object (translational symmetry)
  - Surface of revolution (rotational symmetry of the boundary curve)

Slide 11: Stereo in a Calibrated Rig

Assume the cameras are aligned on the x axis, with baseline b and focal length f known. Given xl and xr (and disparity d = xl - xr), calculate Z:

  Xr = Xl - b,  Zr = Zl = Z
  xl = f Xl / Z
  xr = f Xr / Z = f (Xl - b) / Z
  xl - xr = (f / Z) (Xl - (Xl - b))
  xl - xr = f b / Z,  so  Z = f b / d
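The final line is the standard depth-from-disparity relation. A tiny sketch (names and units are illustrative):

```python
def depth_from_disparity(xl, xr, f, b):
    """Z = f*b / d for an axis-aligned calibrated rig, with d = xl - xr.
    With f in pixels and b in meters, Z comes out in meters."""
    d = xl - xr
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    return f * b / d

# Example: f = 700 px, b = 0.1 m, disparity of 35 px -> Z = 2 m
print(depth_from_disparity(135.0, 100.0, 700.0, 0.1))
```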

Slide 12: Disparity Image

- Given two rectified images (epipolar lines are horizontal or vertical), compute the disparity d at each point
- Disparity image (x, y, d): x and y from image 0, d is the disparity
- Distance is inversely proportional to disparity: brighter points are closer

Slide 13: Finding Disparities

This is a matching problem. Use knowledge of the camera setup to limit match locations:
- Along horizontal lines, for the calibrated setup described earlier
- Along epipolar lines, more generally

Matching strategies include:
- Correlation (e.g. random dot stereogram)
- (Point) feature extraction & matching
- Object recognition & matching [not used by human vision]
- Use of relational constraints (items don't trade places)

Slide 14: Sparse vs. Dense Stereo

Feature-based methods are sparse:
- First find matchable features, then compute disparities via the matches
- Less computationally intensive (historically important)
- Matches have high certainty

But we want dense 3D information:
- Necessary for modeling and rendering
- One way: use sparse matches as seeds, then fill in to make denser maps (analogs: region growing, thresholding with hysteresis)

Slide 15: Dense Stereo Taxonomy

Most methods perform the following steps:
1. Matching cost computation
2. Cost (support) aggregation
3. Disparity computation / optimization
4. Disparity refinement

"Cost" is generally with respect to an optimization framework (e.g. a penalty for non-smoothness).

Slide 16: Sum of Squared Difference (local)

- Matching cost is the squared difference of intensity at a given disparity (i.e. how different are the "matching" pixels?)
- Aggregation adds up the cost (at a given disparity) in a square window
- Disparity is selected based on minimum cost at each pixel
- (An optional disparity refinement step can be added; a sketch of the basic pipeline follows)
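A minimal sketch of this local pipeline with NumPy/SciPy (window size and disparity range are illustrative parameters):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ssd_disparity(left, right, max_disp=64, win=5):
    """Local SSD stereo on rectified float images: per-pixel squared
    difference (matching cost), box-filter aggregation over a square
    window, then winner-take-all disparity selection."""
    h, w = left.shape
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        diff = left[:, d:] - right[:, :w - d]        # shift right image by d
        cost[d, :, d:] = uniform_filter(diff * diff, size=win)
    return np.argmin(cost, axis=0)                   # minimum-cost disparity
```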

Slide 17: Optimization Algorithms (global)

- Choose a local matching cost (similarity measure)
- Apply global constraints (e.g. smoothness)
- Use an optimization technique (e.g. simulated annealing, dynamic programming) to solve the resulting constrained optimization problem
- A disparity refinement step can be added here

Slide 18: Dynamic Programming for Optimization

- Row is the left scanline, column is the right scanline
- Goal: generate the least-cost diagonal path through the matrix
- Moves: M = match, L = left only, R = right only (L and R have fixed costs; M depends on match quality); a sketch follows
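A sketch of the scanline DP (the occlusion cost is an assumed fixed penalty, and squared intensity difference stands in for "match quality"):

```python
import numpy as np

def dp_scanline(left_row, right_row, occ_cost=400.0):
    """Least-cost path through the left-scanline x right-scanline matrix.
    Moves: M = match (diagonal), L = left-only, R = right-only occlusion."""
    n, m = len(left_row), len(right_row)
    C = np.zeros((n + 1, m + 1))
    C[0, :] = occ_cost * np.arange(m + 1)
    C[:, 0] = occ_cost * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=np.uint8)   # 0=M, 1=L, 2=R
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = C[i-1, j-1] + (float(left_row[i-1]) - float(right_row[j-1])) ** 2
            moves = (match, C[i-1, j] + occ_cost, C[i, j-1] + occ_cost)
            back[i, j] = int(np.argmin(moves))
            C[i, j] = moves[back[i, j]]
    disp = np.zeros(n)            # backtrack: disparity at matched pixels
    i, j = n, m
    while i > 0 and j > 0:
        if back[i, j] == 0:       # match: disparity is the index offset
            disp[i-1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif back[i, j] == 1:     # left-only pixel (occluded in right)
            i -= 1
        else:                     # right-only pixel (occluded in left)
            j -= 1
    return disp
```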

Slide 19: Disparity Refinement

For rendering, prevent the "viewmaster" appearance, where objects seem to be aligned on fixed planes (like cardboard cutouts stacked behind each other):
- Interpolate ("subpixel") disparities to fit appropriate 3D curves and surfaces (see the sketch below)
- Determine areas of occlusion (& verify)
- Clean up noise with median filters, etc.
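One common subpixel scheme fits a parabola to the matching cost around the winning integer disparity (a sketch; it assumes the costs at d-1, d, and d+1 are available):

```python
def subpixel_disparity(cost, d):
    """Refine integer disparity d by placing a parabola through the costs
    at d-1, d, d+1 and returning the disparity of its vertex."""
    c0, c1, c2 = cost[d - 1], cost[d], cost[d + 1]
    denom = c0 - 2.0 * c1 + c2
    if denom <= 0:                # flat or degenerate cost curve: keep d
        return float(d)
    return d + 0.5 * (c0 - c2) / denom
```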

Slide 20: Segmentation Based Approach

- First, segment the image into coherent regions; oversegment to avoid mis-segmentation
- Then fit a local plane to each region, using an iterative optimization technique such as relaxation
- This allows arbitrary discontinuities between regions
- Techniques of this kind are among the best-ranked on the Middlebury stereo evaluation site

Slide 21: Variations on Stereo

Trinocular stereo
- Three calibrated cameras impose more constraints on correspondences

Multi-baseline stereo
- When b is large, Z determination is more accurate ("error diamonds" are not so elongated)
- When b is small, correspondences are easier to find
- A sliding camera, or 3 or more collinear cameras, allows both (the depth estimate from the small b constrains the search at the larger b)

Slide 22: Motion from 2D Image Sequences

Motion also gives multiple views:
- Multiple frames of translational motion are similar to multiple-baseline images
  - Correspondence between sequential frames (small baseline)
  - Reconstruction using the first and last frames (large baseline)
- A camera moving on a known path (e.g. into the scene) allows reconstruction of unmoving objects from optical flow
- A stable camera with a single moving object allows:
  - Motion segmentation
  - Trajectory estimation
  - Possible 3D reconstruction, depending on the complexity of the object and trajectory

Slide 23: Stationary Camera, Fixed Background

- One or more discrete "moving objects" in the scene
- Since most of the scene is stable, image subtraction will highlight the moving objects
  - What changes are the leading & trailing edges (the changes are of opposite sign)
  - The bounding box of a moving object is easy to determine
- For best results, filter small noise regions (a sketch of the pipeline follows):
  - Smoothing before subtraction
  - Removing small regions of motion after subtraction
  - Closing to fill small gaps in the moving objects' shapes
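A sketch of the subtraction-and-cleanup pipeline described above (the threshold and minimum region size are illustrative, not values from the slides):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_closing, label

def moving_object_boxes(prev, curr, thresh=25.0, min_pixels=50):
    """Highlight moving regions against a fixed background: smooth,
    subtract, threshold, close small gaps, drop tiny noise regions,
    and return one bounding box per remaining region."""
    a = gaussian_filter(prev.astype(np.float32), 1.0)  # smoothing before subtraction
    b = gaussian_filter(curr.astype(np.float32), 1.0)
    mask = np.abs(b - a) > thresh          # leading/trailing edges (opposite signs)
    mask = binary_closing(mask, structure=np.ones((3, 3)))
    labels, n = label(mask)
    boxes = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if len(ys) >= min_pixels:          # remove small noise regions
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes
```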

Slide 24: Optical Flow

- Assume that intensity is not changing
- Compute the motion vector of each visible point between frames; the set of vectors is the "optical flow field" (a local estimator is sketched below)

Issues:
- Computing point correspondences gives only a sparse field
- An additional constraint comes from assuming consistent motion
- A dense field is computed as an optimization with correlation and smoothness constraints
- When object edges are not visible, only the motion normal to the visible edges can be determined (the aperture problem), e.g. looking at a pole through a keyhole
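A minimal Lucas-Kanade style sketch of the local least-squares step (an illustration, not the slides' method: no pyramid, one point, float images, and the point must sit at least win//2 pixels from the border). The aperture problem shows up here as a (near-)rank-deficient system:

```python
import numpy as np

def flow_at_point(I0, I1, x, y, win=7):
    """Estimate the flow vector (u, v) at (x, y) between frames I0 -> I1
    from brightness constancy: Ix*u + Iy*v + It = 0 over a small window."""
    Iy, Ix = np.gradient(I0)          # spatial image gradients
    It = I1 - I0                      # temporal derivative
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    # lstsq handles the rank-deficient (aperture-problem) case gracefully
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv
```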

Slide 25: Interpreting the Optical Flow Field

- Mostly 0, with some regions of consistent vectors: translational object motion on a stable background
- Entire image is one consistent vector: translational camera motion in a stable scene
- Vectors pointing outward from a point: motion into the scene towards that point, or expansion
- Vectors pointing inward toward a point: motion away from that point, or contraction
- In all cases, larger vectors = faster motion

Slide 26: Range Sensing - Direct 3D

Structured light (visible, infrared, laser)
- Simple case: replace the second camera by a scanning laser - no correspondence problem!
- More efficient: use stripes aligned with rows/columns; use patterns to avoid scanning

Active sensing (radar, sonar, laser, touch?)
- Send out a signal & see how long it takes to bounce back (see the sketch below)
- Use the phase difference for more accurate data
- Act on the object and record the results (touch gives position and orientation of the surface)
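A worked version of the time-of-flight calculation (the numbers are illustrative):

```python
C = 299_792_458.0  # speed of light in m/s

def tof_range(round_trip_seconds):
    """The signal travels out and back, so range is half the round-trip
    time multiplied by the propagation speed."""
    return 0.5 * C * round_trip_seconds

# Example: a 20 ns round trip corresponds to roughly 3 m
print(tof_range(20e-9))  # ~2.998
```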