© 2008 by Davi Geiger — Computer Vision, March 2008. Binocular Stereo: Left Image, Right Image.


Slide 1: Binocular Stereo — Left Image, Right Image

Slide 2: Binocular Stereo

There are several methods of extracting relative depth from images. The "passive" ones are based on:
(i) the relative size of known objects,
(ii) texture variations,
(iii) occlusion cues, such as the presence of T-junctions,
(iv) motion information,
(v) focusing and defocusing,
(vi) relative brightness.

There are also active methods, such as (i) sonar, which uses beams of sound waves, and (ii) laser ranging, which uses beams of light. Stereo vision is unique because it is both passive and accurate.

Slide 3: Human Stereo — the Random Dot Stereogram

Julesz's random dot stereogram. The left image, a black-and-white image, is generated by a program that assigns a black or white value to each pixel according to a random number generator. The right image is constructed by copying the left image, but an imaginary square inside it is displaced a few pixels to the left, and the uncovered space is filled with black and white values chosen at random. When the stereo pair is shown, observers can identify/match the imaginary square in both images and consequently "see" a square floating in front of the background. This shows that stereo matching can occur without recognition.
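The construction described above can be sketched in a few lines of NumPy. The image size, square size, shift amount, and seed below are arbitrary illustrative parameters, not values from the slide:

```python
import numpy as np

def random_dot_stereogram(n=128, square=48, shift=4, seed=0):
    """Generate a Julesz-style random dot stereogram.

    The left image is pure random black/white dots.  The right image is a
    copy in which a central square is displaced `shift` pixels to the left;
    the strip it uncovers is refilled with fresh random dots.
    """
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(n, n))        # 0 = black, 1 = white
    right = left.copy()

    # displace the central square a few pixels to the left
    top = (n - square) // 2
    r0, r1 = top, top + square                    # rows of the square
    c0, c1 = top, top + square                    # columns of the square
    right[r0:r1, c0 - shift:c1 - shift] = left[r0:r1, c0:c1]

    # the uncovered strip on the right edge of the square gets random dots
    right[r0:r1, c1 - shift:c1] = rng.integers(0, 2, size=(square, shift))
    return left, right
```

Viewed as a stereo pair, the shifted square carries a uniform disparity relative to the background, even though neither image alone contains any visible square.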

Slide 4: Human Stereo — Illusory Contours

Not even the identification/matching of illusory contours is known prior to the stereo process. In the pairs shown, not only do the illusory figures in the left and right images not match, but stereo matching also yields illusory figures not seen in either image alone: stereo matching occurs in the presence of illusory contours. These pairs give evidence that the human visual system does not process illusory contours/surfaces before processing binocular vision. Accordingly, binocular vision will hereafter be described as a process that does not require any recognition or contour detection a priori.

Slide 5: Human Stereo — Half-Occlusions

An important aspect of stereo geometry is half-occlusion: there are regions of the left image that have no match in the right image, and vice versa. These unmatched regions, or half-occlusions, contain important information about the reconstruction of the scene. Even though they can be small, they affect the overall matching scheme, because the rest of the matching must reconstruct a scene that accounts for them. Leonardo da Vinci noted that the larger the depth discontinuity between two surfaces, the larger the half-occlusion. Nakayama and Shimojo (1991) first showed stereo pairs in which adding a single dot to one image, as above, thereby inducing an occlusion, affected the overall matching of the stereo pair.

Slide 6: Projective Camera

Let P_o = (X_o, Y_o, Z_o) be a point in the 3D world, represented in a "world" coordinate system. Let O be the center of projection of a camera, where a camera reference frame is placed. The camera coordinate system has its z axis perpendicular to the camera frame (where the image is produced), and the distance between the center and the camera frame is the focal length f. In this coordinate system the point is described by the vector P_o, and the projection of this point onto the image (the intersection of the line O P_o with the camera frame) is the point p_o = (x_o, y_o, f), where

  x_o = f X_o / Z_o,   y_o = f Y_o / Z_o.

[Figure: camera frame with axes x, y, z, center of projection O, point P_o = (X_o, Y_o, Z_o), its projection p_o = (x_o, y_o, f), and focal length f.]
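The pinhole projection above can be written as a minimal sketch, with the same symbols (point P in camera coordinates, focal length f):

```python
def project(P, f):
    """Pinhole projection of a 3D point P = (X, Y, Z), given in camera
    coordinates, onto the image plane at distance f (the focal length):
    p = (f*X/Z, f*Y/Z, f)."""
    X, Y, Z = P
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z, f)
```

For example, the point (2, 4, 2) with f = 1 projects to (1, 2, 1) on the image plane.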

Slide 7: Projective Camera and the Image Coordinate System

We have neglected the radial distortion of the lens, which would add further intrinsic parameters. The projection equation above can be described by a linear transformation from camera coordinates to pixel coordinates, whose parameters are the intrinsic parameters of the camera: s_x, s_y, the size of the pixels (say, in millimeters) along the x and y directions; the coordinates q_x, q_y, in pixels, of the image center (also called the principal point); and the focal length f of the camera.

[Figure: image plane with pixel-coordinate axes x, y and origin O.]
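The linear transformation can be collected into an intrinsic matrix K built from the parameters named above (f, s_x, s_y, q_x, q_y). This is a hedged sketch: sign conventions for the image axes vary between textbooks, and both axes are taken positive here:

```python
import numpy as np

def intrinsic_matrix(f, sx, sy, qx, qy):
    """Intrinsic camera matrix K.  f is the focal length, (sx, sy) the pixel
    sizes along x and y, and (qx, qy) the principal point in pixels."""
    return np.array([[f / sx, 0.0,    qx],
                     [0.0,    f / sy, qy],
                     [0.0,    0.0,    1.0]])

def to_pixels(K, P):
    """Project a camera-frame point P = (X, Y, Z) into pixel coordinates:
    normalize by depth, then apply K."""
    p = K @ (np.asarray(P, dtype=float) / P[2])
    return p[0], p[1]
```

With f = 2, unit pixel sizes, and principal point (100, 50), the point (1, 2, 2) lands at pixel (101, 52).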

Slide 8: Two Projective Cameras

A 3D point P = (X, Y, Z), viewed in the cyclopean coordinate system, is projected onto both cameras. The same point P is described as P_l in a coordinate system attached to the left eye and as P_r in a coordinate system attached to the right eye. The translation vector T brings the origin of the left coordinate system to the origin of the right coordinate system.

[Figure: left frame (O_l; x_l, y_l, z_l) with projection p_l, right frame (O_r; x_r, y_r, z_r) with projection p_r, both with focal length f, viewing P = (X, Y, Z).]

Slide 9: Two Projective Cameras — Transformations

The change of coordinate system from left to right is described by a rotation matrix R and a translation vector T. More precisely, a point P described as P_l in the left frame is described in the right frame as

  P_r = R (P_l − T).

[Figure: the same two-camera geometry as in the previous slide.]
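The transformation P_r = R(P_l − T) is one line of linear algebra; a minimal sketch with the symbols above:

```python
import numpy as np

def left_to_right(P_l, R, T):
    """Express a point given in the left-camera frame in the right-camera
    frame: P_r = R (P_l - T), where T translates the left origin onto the
    right origin and R rotates between the two frames."""
    return R @ (np.asarray(P_l, dtype=float) - np.asarray(T, dtype=float))
```

With no rotation (R = I) and a baseline T = (1, 0, 0), a point at (2, 0, 0) in the left frame is at (1, 0, 0) in the right frame.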

Slide 10: Two Projective Cameras — Epipolar Lines

Each 3D point P defines a plane through P, O_l and O_r. This plane intersects the two camera frames, creating two corresponding epipolar lines. The line O_l O_r intersects the camera planes at e_l and e_r, known as the epipoles. The line O_l O_r is common to every plane P O_l O_r, and thus the two epipoles belong to all pairs of epipolar lines (the epipoles are the "center", or common intersection, of all epipolar lines).

[Figure: the two camera frames, the point P, the projections p_l and p_r, the epipolar lines, and the epipoles e_l, e_r.]

Slide 11: Estimating Epipolar Lines and Epipoles

The two vectors T and P_l span a two-dimensional space (the epipolar plane). Their cross product T × P_l is perpendicular to this space, so (P_l − T) · (T × P_l) = 0. Writing P_l − T = R^T P_r, and the cross product as a matrix product T × P_l = S P_l with S the skew-symmetric matrix of T, gives

  P_r^T R S P_l = 0,   i.e.   p_r^T F p_l = 0,

where F (which folds in the intrinsic parameters when pixel coordinates are used) is known as the fundamental matrix and needs to be estimated.

Slide 12: Computing F (the Fundamental Matrix)

The "eight-point algorithm":
(i) Given two images, identify eight or more points in both images, i.e., provide n ≥ 8 points together with their correspondences. The points must be non-degenerate.
(ii) This yields n linear, homogeneous equations in the 9 unknown components of F. Since F need only be estimated up to a scale factor, there are only 8 unknowns to be computed from the n ≥ 8 linear homogeneous equations.
(iii) If n = 8 there is a unique solution (for non-degenerate points); if n > 8 the system is overdetermined and the SVD can be used to find the best-fit solution.
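The steps above can be sketched compactly with the SVD. This is a bare-bones version: Hartley's point normalization, advisable for numerical stability on real data, is deliberately omitted for brevity:

```python
import numpy as np

def eight_point(pts_l, pts_r):
    """Estimate the fundamental matrix from n >= 8 correspondences.
    Each row of pts_l / pts_r is a homogeneous image point (x, y, 1).

    Each correspondence gives one homogeneous linear equation
    p_r^T F p_l = 0 in the 9 entries of F.  The best-fit solution is the
    right singular vector of the design matrix with the smallest singular
    value; the rank-2 constraint is then enforced by zeroing the smallest
    singular value of F.
    """
    # row for one correspondence: outer(p_r, p_l) flattened, since
    # p_r^T F p_l = sum_ij F_ij * p_r_i * p_l_j
    A = np.array([np.outer(pr, pl).ravel() for pl, pr in zip(pts_l, pts_r)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # null vector of A (up to scale)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                        # enforce rank 2
    return U @ np.diag(S) @ Vt
```

On exact synthetic correspondences the recovered F satisfies p_r^T F p_l ≈ 0 for every pair and has rank 2 (zero determinant).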

Slide 13: Stereo Correspondence — Ambiguities

Each potential match is represented by a square. The black ones represent the most likely scene to "explain" the images, but other combinations (e.g., the red ones) could have given rise to the same images. What makes the set of black squares preferred/unique is that they have similar disparity values, satisfy the ordering constraint, and assign a unique match to each point. Any other set that could have given rise to the two images would have more widely varying disparity values, and would violate either the ordering constraint or uniqueness. The disparity values are inversely proportional to the depth values.
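The inverse relation between disparity and depth follows from the projection equations: for two parallel cameras with baseline T and focal length f, the disparity is d = f·T/Z. A small sketch under that parallel-camera assumption:

```python
def depth_from_disparity(d, f, T):
    """For parallel cameras with baseline T and focal length f, the
    disparity of a point at depth Z is d = f*T/Z, hence Z = f*T/d:
    disparity is inversely proportional to depth."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * T / d
```

Doubling the disparity halves the recovered depth, which is the inverse proportionality stated on the slide.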

Slide 14: Stereo Correspondence — the Matching Space

In the matching space, a point (or node) represents a match between a pixel in the left image and a pixel in the right image.

[Figure: matching-space diagram for scene points A–F, marking the left boundary with no match, the right boundary with no match, a left depth discontinuity, and a surface-orientation discontinuity.]

Note 1: Depth discontinuities and very tilted surfaces can yield the same images (with half-occluded pixels).
Note 2: Due to pixel discretization, points A and C in the right frame are neighbors.

Slide 15: The Cyclopean Eye

The cyclopean eye "sees" the world in 3D, where x represents the coordinate system of this eye and w is the disparity axis. To work with integer coordinate values, one can use the representation

  x = l + r,   w = r − l,

restricted to integer values. Thus, for l, r = 0, …, N−1 we have x = 0, …, 2N−2 and w = −N+1, …, 0, …, N−1.

Note: not every pair (x, w) corresponds to integer (l, r) when only integer coordinates are considered. For x + w even we get integer values for the pixels l and r, and for x + w odd we get subpixel locations. Thus the cyclopean coordinate system with integer (x, w) includes subpixel image resolution.

[Figure: matching space on a right epipolar line around l = 3, r = 5, w = 2.]
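The change of coordinates above is a two-line computation; a sketch that also exhibits the subpixel case for x + w odd:

```python
def to_cyclopean(l, r):
    """Map left/right pixel coordinates (l, r) on an epipolar line to the
    integer cyclopean coordinate x and disparity w.  With l, r = 0..N-1
    this gives x = 0..2N-2 and w = -(N-1)..N-1; x + w is even exactly when
    (l, r) are both integer pixel positions."""
    return l + r, r - l

def from_cyclopean(x, w):
    """Inverse map: l = (x - w) / 2, r = (x + w) / 2.  When x + w is odd
    these are half-integer (subpixel) positions."""
    return (x - w) / 2, (x + w) / 2
```

For example, the match l = 3, r = 5 used in the figures corresponds to x = 8, w = 2, and the odd case x = 7, w = 2 maps back to the subpixel pair l = 2.5, r = 4.5.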

Slide 16: Smoothness — Surface Constraints I

Smoothness: in nature, most surfaces are smooth in depth relative to their distance from the observer, but depth discontinuities also occur.

Given that l = 3 and r = 5 are matched (the blue square, x = 8, w = 2), the red squares represent violations of the ordering constraint, while the yellow squares represent smooth matches.

[Figure: left and right epipolar lines in the matching space around l = 3, r = 5, x = 8, w = 2.]

Slide 17: Surface Constraints — Discontinuities and Occlusions

Slide 18: The Uniqueness-Opaqueness Constraint

There should be only one disparity value (one depth value) associated with each cyclopean coordinate x (see figure). The assumption is that objects are opaque: a 3D point P, seen at cyclopean coordinate x with disparity value w, excludes all other disparity values at that x. Points closer than P along the same x coordinate must be transparent air, and points further away cannot be seen, since P has already been seen (opaqueness). However, multiple matches for left-eye points or right-eye points are allowed; this is in fact required to model tilted surfaces and occluding surfaces, as discussed later. This constraint is physically motivated and is most easily understood in the cyclopean coordinate system.

Given that l = 3 and r = 5 are matched (the blue square, x = 8, w = 2), the red squares represent violations of the uniqueness-opaqueness constraint, while the yellow squares represent matches that are unique in the cyclopean coordinate system but multiple in the left- or right-eye coordinate systems.

[Figure: matching space with the blue match at x = 8, w = 2; red squares marked "NO: uniqueness"; yellow squares marked "YES: multiple match for the left/right eye".]

Slide 19: Bayesian Formulation

The probability of a surface w(x, e) accounting for the left and right images can be described by Bayes' formula as

  P(w | I_l, I_r) = P(I_l, I_r | w) P(w) / P(I_l, I_r),

where e indexes the epipolar lines. Let us develop formulas for the two probability terms in the numerator; the denominator can be computed as the normalization constant that makes the probability sum to 1.

Slide 20: The Image Formation I (special case)

P(e, x, w) ∈ [0, 1], for x + w even (pixel-to-pixel matching), represents how similar the images are between pixel (e, l) in the left image and pixel (e, r) in the right image, given that they match. We use "left" and "right" windows to account for occlusions.

Slide 21: The Image Formation I (intensity)

We expand the previous formula to include the matching of windows with orientations other than θ = 0, and to any integer value of x + w. Note that when x + w is odd, the coordinates l and r are half-integers, so an interpolated/averaged intensity value must be computed. For half-integer values of l and r we use

  I(l) = ( I(⌊l⌋) + I(⌊l⌋ + 1) ) / 2,

where ⌊x⌋ is the floor of x. In the proposed score function, an extra constant 1 is introduced without any change in information; it is a convenient constant to have, so that a perfect match attains the minimum value 1 (and not 0).
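The half-integer interpolation can be sketched directly, assuming (as reconstructed above) that the averaged value of the two neighboring pixels is used:

```python
import math

def intensity_at(I, l):
    """Intensity at a possibly half-integer position l on an epipolar line:
    for integer l return I[l]; for a half-integer l return the average of
    the two neighboring pixels, (I[floor(l)] + I[floor(l)+1]) / 2."""
    lo = math.floor(l)
    if l == lo:                      # integer pixel: no interpolation needed
        return I[lo]
    return (I[lo] + I[lo + 1]) / 2   # subpixel: average the neighbors
```

So for the line I = [0, 10, 20, 30], the position l = 1.5 (an odd x + w case) yields the interpolated value 15.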

Slide 22: The Image Formation II (contrast)

Our criterion reflects the knowledge that matching opposite contrasts induces rivalry in the matching. For matching homogeneous regions, the two costs — the contrast one and the pixel one — give the same value of 2 (all the quantities are zero). The best cost for contrast, however, is less than 2, and corresponds to matching contrast information. Our data matching term then combines the pixel (intensity) cost and the contrast cost.

Slide 23: The Image Formation III (occlusions)

Occlusions are regions where no feature match occurs. To take them into account we introduce an "occlusion field" O(e, x), which is 1 if an occlusion occurs at location (e, x) and 0 otherwise. We also introduce a related binary matching field M(e, x, w), which is 1 if a match occurs at (e, x, w) and 0 otherwise. Thus, the likelihood of the left and right images given the feature matches must take into account whether an occlusion occurs, and we modify the data probability to include occlusions. Due to opacity, the binary matching field satisfies

  Σ_w M(e, x, w) ≤ 1,

and the occlusion field relates to the matching field as

  O(e, x) = 1 − Σ_w M(e, x, w).
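The relation between the two fields, as reconstructed above from the uniqueness-opaqueness constraint, can be sketched as a direct computation:

```python
def occlusion_field(M):
    """Given a binary matching field M[x][k] for one epipolar line (k indexes
    the allowed disparities), return the occlusion field
    O(x) = 1 - sum_w M(x, w).  The uniqueness-opaqueness constraint requires
    at most one match per cyclopean column x."""
    O = []
    for row in M:
        s = sum(row)
        assert s <= 1, "uniqueness-opaqueness violated: several matches at one x"
        O.append(1 - s)
    return O
```

A column with no match at any disparity is exactly a column with O = 1, which is where the occlusion cost of the later slides is charged.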

Slide 24: The Need for a Prior Model — Flat or (Doubly) Tilted?

(a) Flat plane vs. (b) tilted (or doubly tilted) planes: the images, and hence the data probabilities, for (a) and (b) are the same. Since we are not considering curvature preferences, the preference for flat surfaces must be built in as a prior model of surfaces.

[Figure: two matching-space diagrams (w = 2, x = 8, l = 1..3, r = 3..5) showing a flat-plane solution and a tilted/doubly-tilted solution that produce identical images.]

Slide 25: Flat or Occluded?

(a) Flat plane vs. (b) occluded surfaces. How should the occlusion probabilities for (b) be set? The occlusions are columns with no match, where O(e, x) = 1, and a penalty for O must be introduced. The occlusion solution in this case should be slightly better than the occlusion solution in the previous case, since with the red color replaced by blue there is contrast, improving the chances of a discontinuity in depth.

[Figure: matching-space diagrams comparing the flat solution with the occluded solution.]

Slide 26: Occluded or Tilted?

(a) Flat and occluded planes vs. (b) tilted planes. Occlusion and tilt solutions should have approximately the same cost, though it seems that humans may have a slight preference for tilt in these cases.

[Figure: matching-space diagram (w = 2, x = 8, l = 2, 3, r = 3..6) showing the two interpretations.]

Slide 27: Prior Model for Tilted Surfaces

[Figure: two matching-space diagrams on a right epipolar line (l = 3, r = 5, x = 8, w = 2) illustrating the tilt prior.]

Slide 28: A Model for Occluded Surfaces

Example of a jump/occlusion from (x = 9, w = 2) to (x' = 5, w' = −1): the x coordinates 8, 7, 6 have O(e, x) = 1, so an occlusion cost is added for each column with O(e, x) = 1, for a total of |w − w'| = 3 columns. There is a bias for discontinuities to occur where good contrast matches occur (for θ = 0). For |w − w'| = 1 the cost competes with a tilt solution. The neighbor set of (e, x, w) is the set of nodes (e, x', w') with w' = −D, …, D.

Slide 29: A Model for Occluded Surfaces (cont.)

Example of a jump/occlusion with w = 2 and w' = −1: three x coordinates have O(e, x) = 1, so an occlusion cost is added for each column with O(e, x) = 1, for a total of |w − w'| columns. This property shows that the prior model is essentially a cost proportional to the count of occluded columns. Our model is a modification of the one above, since we also add a bias toward jumps at nodes where good contrast matches occur.

Slide 30: Posterior Model

Given a match at (e, x, w), it belongs either to a flat/tilted surface or to an occluded surface.

[Figure: neighborhood structure of a node (e, x, w), consisting of flat, tilted, and occluded surface transitions.]

Slide 31: Limiting Disparity

The search is restricted to a disparity range of 2D + 1 values, i.e., |w| ≤ D. The rationale is:
(i) less computation;
(ii) larger disparity matches imply larger errors in the 3D estimation;
(iii) humans only fuse stereo images within a limit, called Panum's limit.

We may start the computation at x = D so as not to restrict the range of w values; in this case we also stop the computation at x = 2N − 2 − D.

Slide 32: Dynamic Programming

The nodes form a trellis of columns x = 1, 2, 3, …, 2N−2 and 2D + 1 states w = −D, …, D. The optimal cost satisfies the recursion

  Φ*_x[w + D] = C[x, w + D] + min_{w'} { Φ*_{x−1}[w' + D] + F[x, w + D, w' + D] },

where C is the data (match) cost and F the transition cost between disparities w' and w.

Slide 33: Dynamic Programming — Main Loop

loop for x = D, …, 2N−2−D
  loop for w = −D, …, 0, …, D
    Cost ← ∞
    loop for w' = −D, …, D
      if |w − w'| ≤ 1 {                        // flat or tilted transition
        F1[x, w+D, w'+D] = TiltCost · |w − w'|
        Temp = Φ*_{x−1}[w'+D] + F1[x, w+D, w'+D]
        if (Temp < Cost) { Cost = Temp; back[x, w+D] = [x−1, w'+D] }
      }
      else {                                   // occlusion transition
        set x' = x − 1 − |w − w'|
        F2[x, w+D, w'+D] = Occlusion
        Temp = Φ*_{x'}[w'+D] + F2[x, w+D, w'+D]
        if (Temp < Cost) { Cost = Temp; back[x, w+D] = [x', w'+D] }
      }
    end loop
    Φ*_x[w+D] = Cost + C[x, w+D]
  end loop
end loop

Slide 34: Epipolar-Line Interaction

Epipolar interaction: the larger the intensity edges, the lower the cost (the higher the probability) of disparity changes across epipolar lines.

[Figure: left and right images illustrating disparity changes across epipolar lines at intensity edges.]

Slide 35: Epipolar-Line Interaction (cont.)

[Figure: a further illustration of the same point — disparity changes across epipolar lines are cheaper at large intensity edges.]

Slide 36: Stereo Correspondence — Belief Propagation (BP)

We now have the posterior distribution over disparity surfaces {w(x, e)}, and we want to compute the marginals P(w(x, e) | I_l, I_r). Computed naively, these marginals require computation exponential in the size (length) N of the grid.

Slide 37: Kai Ju's Approximation to BP

Following Kai Ju's Ph.D. thesis, we approximate the (x, e) graph/lattice by "horizontal" and "vertical" belief trees, which are singly connected; exact computation of the marginals in these graphs can therefore be obtained in linear time. We then combine the probabilities obtained from the horizontal and vertical graphs at each lattice site by "picking" the "best" one — the one with the lower entropy, H = −Σ_w p(w) log p(w).
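The combination rule described above can be sketched directly; the function names are illustrative, and the entropy formula is the standard Shannon entropy reconstructed from the slide's truncated "where …" clause:

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_w p(w) log p(w) of a disparity marginal."""
    return -sum(q * math.log(q) for q in p if q > 0)

def pick_marginal(p_horizontal, p_vertical):
    """At each lattice site, keep whichever of the horizontal-tree and
    vertical-tree marginals has the lower entropy, i.e. the more
    confident (more peaked) one."""
    if entropy(p_horizontal) <= entropy(p_vertical):
        return p_horizontal
    return p_vertical
```

A peaked marginal such as [0.9, 0.1] has lower entropy than the uniform [0.5, 0.5] and is therefore the one kept.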

Slide 38: Result

Slide 39: Some Issues in Stereo

Junctions and their properties: false matches at junctions reveal information from vertical disparities (see Malik, ECCV 1994).

[Figure: regions A and B shown in the left and right images.]

Slide 40: Towards a Model for Occluded Surfaces

Example of a jump/occlusion from (x = 9, w = 2) to (x = 8, w' = −1); note that (x = 9, w = 2) lies on the surface in front. The idea is that when there is a good contrast match on the front surface, such as at l = 7/2 and r = 11/2, a jump/occlusion is more likely to happen.

[Figure: matching-space diagram (w = 2, l = 1..3, r = 3..6).]

Slide 41: Posterior Model

If there is a match at (e, x, w), then it belongs either to a flat/tilted surface or to an occluded surface. Opacity becomes a constraint on the matching field M.

[Figure: neighborhood structure in the matching space.]

Slide 42: Stereo-Matching DP(ImageLeft, ImageRight, D, e) — solved line by line

Create the graph (V(x, w), E(x, w, x−1, w')); its length is 2N−1 and its width is 2D+1.

/* precomputation of the match cost and transition cost, stored in arrays C and F */
loop for v = (x, w) in V (i.e., loop for x and loop for w)
  Set-Array C[x, w+D]                  // use the data matching term Ψ(e, x, w, s = 3)
  loop for w' = −D, …, D
    if |w − w'| > 1  F[x, w+D, w'+D] = Occlusion
    else             F[x, w+D, w'+D] = TiltCost · |w − w'|
end loop

Main loop
loop for x = D, …, 2N−2−D
  loop for w = −D, …, 0, …, D
    Cost ← ∞
    loop for w' = −D, …, D
      Temp = Φ*_{x−1}[w'+D] + F[x, w+D, w'+D]
      if (Temp < Cost) { Cost = Temp; back[x, w+D] = w'+D }
    end loop
    Φ*_x[w+D] = Cost + C[x, w+D]
  end loop
end loop
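A runnable Python sketch of this scanline dynamic program follows. It simplifies the slide's scheme: transitions are charged per column in the current trellis (no x' jump indexing), and the tilt_cost / occl_cost values are illustrative stand-ins for the slide's F1/F2 arrays, not Geiger's actual parameters:

```python
import numpy as np

def scanline_dp(C, tilt_cost=0.2, occl_cost=0.5):
    """Dynamic programming over one epipolar line.

    C[x, k] is the data cost of disparity w = k - D at cyclopean column x
    (shape (X, 2D+1)).  Transitions with |w - w'| <= 1 pay a tilt cost
    proportional to |w - w'|; larger jumps pay an occlusion cost per
    skipped disparity column.  Returns the minimizing disparity-index path.
    """
    X, K = C.shape
    phi = C[0].copy()                      # Phi*_0: best cost ending at (0, k)
    back = np.zeros((X, K), dtype=int)     # backpointers for the optimal path
    for x in range(1, X):
        new = np.empty(K)
        for k in range(K):
            # transition cost from each previous disparity index kp
            trans = np.array([tilt_cost * abs(k - kp) if abs(k - kp) <= 1
                              else occl_cost * abs(k - kp) for kp in range(K)])
            j = int(np.argmin(phi + trans))
            back[x, k] = j
            new[k] = phi[j] + trans[j] + C[x, k]
        phi = new
    # backtrack the optimal disparity path from the cheapest final state
    k = int(np.argmin(phi))
    path = [k]
    for x in range(X - 1, 0, -1):
        k = int(back[x, k])
        path.append(k)
    return list(reversed(path))
```

On a cost array where one disparity is uniformly cheapest, the recovered path is constant; where the cheap disparity jumps, the path trades the occlusion cost against tilt transitions, exactly the competition discussed in the prior-model slides.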

