CS 376b Introduction to Computer Vision 04 / 21 / 2008 Instructor: Michael Eckmann
Michael Eckmann - Skidmore College - CS 376b - Spring 2008
Today’s Topics
Comments/Questions
perspective projection
– I'll first draw figure 12.21 with real and “front” image planes
– our text then shows only the “front” image plane in later figures
stereo vision
– sparse depth map from stereo
perceiving 3D from 2D
– human depth cues
– shape from shading
Basically, this lecture goes in the following order: 12.5, 12.6, then 12.3.
Perspective projection
from Shapiro and Stockman, figure 12.22
Perspective projection
The image coordinates are related to the world coordinates by the following equations. Similar triangles yield:
$z_c / f = x_c / x_i$, which means $x_i = (f / z_c) \, x_c$
$z_c / f = y_c / y_i$, which means $y_i = (f / z_c) \, y_c$
So, if the camera is viewing a plane in the world that is parallel to the viewing plane, then the view on the image plane is a scaled version of the world plane.
Every 3D point along a ray corresponds to the same 2D point on the image plane.
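To make the two equations above concrete, here is a minimal sketch in Python. The function name `project` and the sample numbers are made up for illustration; the camera is assumed to sit at the origin looking down the z axis, with the "front" image plane at distance f.

```python
def project(point_c, f):
    """Perspective projection of a camera-frame point (x_c, y_c, z_c).

    Implements x_i = (f / z_c) * x_c and y_i = (f / z_c) * y_c.
    The point must be in front of the camera (z_c > 0).
    """
    x_c, y_c, z_c = point_c
    if z_c <= 0:
        raise ValueError("point must be in front of the camera (z_c > 0)")
    scale = f / z_c
    return (scale * x_c, scale * y_c)


# Two points on the same ray through the origin land on the same image point.
near = project((2.0, 4.0, 10.0), f=5.0)
far = project((4.0, 8.0, 20.0), f=5.0)
print(near, far)

# Points on a plane parallel to the image plane (all z_c = 10) are all
# scaled by the same factor f / z_c = 0.5.
for p in [(0.0, 0.0, 10.0), (2.0, 0.0, 10.0), (2.0, 2.0, 10.0)]:
    print(p, "->", project(p, f=5.0))
```

Notice that depth is lost: `near` and `far` are indistinguishable in the image, which is why we need a second view (stereo) or other cues.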
Stereo
from Shapiro and Stockman, figure 12.24
On the board, let me show the steps that get skipped in going from (2) to (3) for x and z. The equation for y follows exactly as it did for perspective projection.
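For reference, here is a sketch of the kind of algebra involved, under the usual aligned-camera assumptions: the left camera is at the origin, the right camera is shifted by the baseline $b$ along the x axis, both have focal length $f$, and the world point $P = (x, y, z)$ appears at $(x_l, y_l)$ in the left image and $(x_r, y_r)$ in the right image. These symbol names are my assumptions, so check them against the figure in the text.

```latex
\begin{align*}
\frac{z}{f} &= \frac{x}{x_l}, \qquad \frac{z}{f} = \frac{x - b}{x_r}
  && \text{(similar triangles, one per camera)} \\
x &= \frac{z \, x_l}{f} = b + \frac{z \, x_r}{f}
  && \text{(solve both for } x \text{ and equate)} \\
\frac{z \, (x_l - x_r)}{f} &= b \\
z &= \frac{f \, b}{x_l - x_r} = \frac{f \, b}{d}
  && \text{(with disparity } d = x_l - x_r \text{)} \\
x &= \frac{x_l \, z}{f}, \qquad y = \frac{y_l \, z}{f}
\end{align*}
```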
Stereo
So, as disparity increases, the distance to the world point decreases, and vice versa. Think about close objects vs. far objects and their expected disparities.
When the baseline increases, there is less chance of finding correspondences (less overlap in what the two cameras view). If we decrease the baseline, however, small errors in the corresponding image points result in larger errors in determining where P is in the world.
Stereo
Therefore, computing the depth of a particular world point is easy as long as we know
– the baseline (distance between the cameras)
– the focal length of the camera
– the disparity in the images of the corresponding points
The hardest part in all this is
– getting correct correspondences
– consider what happens if the corresponding points are off by even a little (assuming baseline << distance to world points)
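To get a feel for how much a small correspondence error matters, here is a rough sketch. It assumes the standard aligned-camera relation z = f * b / d (d is the disparity), and the focal length, baseline, and disparities below are invented numbers, not values from the text.

```python
def depth_from_disparity(f, b, d):
    """Depth of a world point for two aligned cameras: z = f * b / d.

    f: focal length (in pixels here), b: baseline, d: disparity (pixels).
    The result is in the same units as the baseline.
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * b / d


f_px = 800.0   # assumed focal length in pixels
baseline = 0.1  # assumed baseline in meters

# For each true disparity, see what a one-pixel matching error does to the depth.
for d in (40.0, 8.0, 2.0):
    z_true = depth_from_disparity(f_px, baseline, d)
    z_off = depth_from_disparity(f_px, baseline, d + 1.0)
    print(f"disparity {d:5.1f} px: depth {z_true:6.2f} m, "
          f"with 1 px error {z_off:6.2f} m, change {abs(z_true - z_off):6.2f} m")
```

The same one-pixel error costs a few centimeters for the close point (large disparity) but many meters for the distant point (small disparity), which is the baseline << distance situation mentioned above.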
Stereo
Cross correlation is often used to match pixels in the left image to pixels in the right image
– can compute the depth at all (?) pixels
Other possibilities include
– finding good features (those that are localizable) in the left image
– then finding correspondences in the right image, either by cross correlation or by some other matching scheme
We end up with a sparse depth map: we only have depth calculations at places where we found a good feature AND that feature found a correspondence.
Not every feature in the left image will find its match in the right image (why do you think?).
Given a sparse depth map, we'll need to do some error-prone interpolation between computed depth points to fill in the depth map.
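Here is a toy sketch of matching by cross correlation, just to show the idea. It uses normalized cross correlation on a single image row (so it assumes aligned cameras), and the function names, window size, search range, and score threshold are all my own choices for illustration.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equal-length intensity windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # a flat window has no texture to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


def match_on_row(left_row, right_row, x_left, half=2, max_disp=10, min_score=0.9):
    """Find the column in right_row that best matches the window around x_left.

    Only the same row is searched, at disparities 0..max_disp.
    Returns (x_right, score), or None when no candidate scores well enough.
    Assumes both rows have the same length.
    """
    if x_left - half < 0 or x_left + half >= len(left_row):
        return None
    window = left_row[x_left - half:x_left + half + 1]
    best = None
    for d in range(max_disp + 1):
        x_right = x_left - d
        if x_right - half < 0:
            break
        candidate = right_row[x_right - half:x_right + half + 1]
        score = ncc(window, candidate)
        if best is None or score > best[1]:
            best = (x_right, score)
    if best is None or best[1] < min_score:
        return None
    return best


left_row = [10, 10, 10, 10, 10, 10, 50, 90, 40, 10, 10, 10, 10, 10]
right_row = left_row[3:] + [10, 10, 10]  # same scene shifted by 3 pixels

print(match_on_row(left_row, right_row, 7))  # textured spot: a match is found
print(match_on_row(left_row, right_row, 2))  # flat spot: no reliable match
```

The flat region returns no match at all, which is one reason the resulting depth map is sparse.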
Stereo
Can take advantage of the epipolar constraint
– says that a point in the left image can only appear on an epipolar line in the right image (reduces the search space from a potentially large region to a line)
– define epipolar lines, epipolar plane, epipoles (see next slides)
Can take advantage of the ordering constraint
– says that two points which are on a continuous surface in the 3D world will appear in the same order in the left image as in the right image
– the problem is that you might incorrectly assume that two points belong to the same surface when in actuality they do not, and then the constraint doesn't hold, e.g. consider a small object occluding a larger object which is farther away (let me draw on the board)
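As a small illustration of the ordering constraint, here is a sketch that flags pairs of matches on one epipolar line whose left-to-right order flips between the two images. The function name and the sample matches are hypothetical.

```python
def ordering_violations(matches):
    """Return pairs of matches whose order differs between the two images.

    matches: list of (x_left, x_right) column pairs found on one epipolar line.
    """
    ordered = sorted(matches)  # sort by position in the left image
    violations = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            (xl1, xr1), (xl2, xr2) = ordered[i], ordered[j]
            # Left image says match i comes first, right image says it comes second.
            if xl1 < xl2 and xr1 > xr2:
                violations.append((ordered[i], ordered[j]))
    return violations


matches = [(10, 5), (20, 14), (30, 12)]
print(ordering_violations(matches))
```

A flagged pair is not automatically a bad match. It may be exactly the small-object-in-front case, where the two points are not on the same surface and the constraint simply does not apply.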
from Shapiro and Stockman, figure 12.27 Epipolar geometry for aligned cameras
Stereo from Shapiro and Stockman, figure 12.28 Epipolar geometry for unaligned cameras
Focus, blur and depth of field
Let's look at figures 12.29 and 12.30.
A point will be blurred into a larger spot when the image plane is displaced in either direction from the ideal location at which the point would be in focus.
The depth of field is the range of depths that will be in focus (or within some acceptable level of blur).
– e.g. our text computes the depth of field assuming a maximum blur of an area the size of one pixel
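To play with blur numerically, here is a rough sketch based on the thin-lens equation 1/f = 1/u + 1/v (u is the object distance, v is the image distance). The thin-lens model, the function names, and the lens numbers are my assumptions and are not taken from the text's figures.

```python
def image_distance(f, u):
    """Thin lens: distance v behind the lens where an object at distance u focuses."""
    if u <= f:
        raise ValueError("object must be farther from the lens than the focal length")
    return f * u / (u - f)


def blur_diameter(f, aperture, u_focus, u_point):
    """Diameter of the blur spot for a point at u_point.

    The image plane is placed where objects at u_focus are sharp. Light from
    the point converges in a cone from the aperture to its own focus distance,
    so by similar triangles the spot diameter at the image plane is
    aperture * |v_plane - v_point| / v_point.
    """
    v_plane = image_distance(f, u_focus)
    v_point = image_distance(f, u_point)
    return aperture * abs(v_plane - v_point) / v_point


# Assumed numbers, all in millimeters: 50 mm lens, 10 mm aperture, focused at 2 m.
for u in (1000.0, 1500.0, 2000.0, 4000.0):
    print(f"object at {u:6.0f} mm -> blur {blur_diameter(50.0, 10.0, 2000.0, u):.4f} mm")
```

The depth of field is then the range of object distances whose blur diameter stays under whatever limit you pick, for example the width of one pixel.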
3D cues from 1 2D image
We just considered stereo, which used two images to get 3D information. We can gather some 3D information from a single image with various depth cues (see section 12.3)
– occlusion
– changing texture density on a plane
– changing size of an object which extends throughout various depths
These cues can be used to get relative (not absolute) depths between various objects/surfaces.
3D cues from 1 2D image
Interposition --- if an object A occludes another object B, then object A is closer
Perspective scaling --- distance of an object is inversely proportional to its size in the image plane
Foreshortening --- viewing an object at an acute angle causes the object to be compressed (in a perspective manner)
Motion parallax --- when an observer moves, stationary closer objects move faster than stationary farther objects
3D cues from 1 2D image figure 12.10 from Shapiro and Stockman
3D cues from 1 2D image
Shape from shading example: figure 12.16 in Shapiro and Stockman (original credit: courtesy of D. Trytten.)
Compute surface normals based on intensities, known properties of the light (energy, position, direction), and surface properties (how light is reflected, absorbed).
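To show how intensity relates to surface orientation, here is a minimal sketch that assumes a Lambertian (perfectly diffuse) surface lit by a single distant light. That reflectance model is my assumption for illustration; the slide does not commit to one, and the function names are made up.

```python
import math


def _unit(v):
    """Scale a 3D vector to length 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def lambertian_intensity(normal, light_dir, albedo=1.0, energy=1.0):
    """Brightness of a diffuse surface patch: albedo * energy * cos(theta).

    theta is the angle between the surface normal and the direction to the light.
    Patches facing away from the light get intensity 0.
    """
    n = _unit(normal)
    l = _unit(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, l))
    return albedo * energy * max(0.0, cos_theta)


def incidence_angle_from_intensity(intensity, albedo=1.0, energy=1.0):
    """Invert the model: the angle (degrees) between normal and light direction."""
    cos_theta = min(1.0, max(0.0, intensity / (albedo * energy)))
    return math.degrees(math.acos(cos_theta))


i = lambertian_intensity(normal=(0.0, 0.0, 1.0), light_dir=(1.0, 0.0, 1.0))
print(i, incidence_angle_from_intensity(i))
```

One intensity only gives the angle between the normal and the light, so the normal could lie anywhere on a cone around the light direction. That is why shape from shading needs additional constraints (such as surface smoothness) to pin down the normals.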
3D cues from 1 2D image
So, if one is able to control the lighting in an environment, one can do various things to help infer 3D structure, such as projecting light stripes onto the scene as in figure 12.14 in Shapiro and Stockman (original credit: courtesy of Gongzhu Hu.)