1 Computer Vision and Human Perception: a brief intro from an AI perspective. (Stockman, MSU/CSE, Fall 2008)

2 Computer Vision and Human Perception. What are the goals of CV? What are the applications? How do humans perceive the 3D world via images? Some methods of processing images.

3 Goal of computer vision. Make useful decisions about real physical objects and scenes based on sensed images. Alternative (Aloimonos and Rosenfeld): the goal is the construction of scene descriptions from images. How do you find the door to leave? How do you determine if a person is friendly or hostile?.. an elder?.. a possible mate?

4 Critical issues. Sensing: how do sensors obtain images of the world? Information: how do we obtain color, texture, shape, motion, etc.? Representations: what representations should/does a computer [or brain] use? Algorithms: what algorithms process image information and construct scene descriptions?

5 Images: 2D projections of 3D. The 3D world has color, texture, surfaces, volumes, light sources, objects, motion, betweenness, adjacency, connections, etc. A 2D image is a projection of a scene from a specific viewpoint; many 3D features are captured, some are not. Brightness or color = g(x,y) or f(row, column) for a certain instant of time. Images indicate familiar people, moving objects or animals, and the health of people or machines.

6 Image receives reflections. Light reaches surfaces in 3D; the surfaces reflect it; a sensor element receives the light energy. Intensity matters; angles matter; material matters.

7 A CCD camera has discrete elements. The lens collects light rays; CCD elements replace the chemicals of film; the number of elements is smaller than with film (so far).

8 Intensities near the center of the eye.

9 Camera + programs = display. The camera inputs to a frame buffer; a program can interpret the data, add graphics, and add imagery.

10 Some image format issues: spatial resolution; intensity resolution; image file format.

11 Resolution is "pixels per unit of length". Resolution decreases by one half in the cases at left. Human faces can be recognized at 64 x 64 pixels per face.

12 Features detected depend on the resolution. We can tell hearts from diamonds, and can tell the face value. Generally we need 2 pixels across a line or small region (such as an eye).

13 The human eye as a spherical camera. 100M sensing elements in the retina: rods sense intensity, cones sense color. The fovea has tightly packed elements, more cones; the periphery has more rods. The focal length is about 20 mm. The pupil/iris controls light entry. The eye scans, or saccades, to bring image details onto the fovea. The 100M sensing cells funnel to 1M optic-nerve connections to the brain.

14 Image processing operations: thresholding; edge detection; motion field computation.

15 Find regions via thresholding. A region has a brighter, darker, or redder color, etc. If pixel > threshold then pixel = 1 else pixel = 0.
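The thresholding rule above can be sketched in a few lines of Python (a minimal illustration; the image is assumed to be a list of rows of grayscale values, and the threshold of 100 is an arbitrary choice):

```python
# Minimal sketch of binary thresholding on a grayscale image
# stored as a list of rows of intensity values (0-255).
def threshold(image, t):
    """Return a binary image: 1 where pixel > t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

img = [[ 10, 200,  40],
       [180,  90, 220]]
print(threshold(img, 100))  # [[0, 1, 0], [1, 0, 1]]
```

Picking a good threshold t is the hard part in practice, as a later slide notes.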

16 Example: red blood cell image. Many blood cells are separate objects, but many touch, which is bad! Thresholding leaves salt-and-pepper noise. How usable is this data?

17 A robot vehicle must see a stop sign (MATLAB):
sign = imread('Images/stopSign.jpg','jpg');   % read the RGB image
% keep pixels with a strong red channel and weak green and blue
red = (sign(:,:,1) > 120) & (sign(:,:,2) < 100) & (sign(:,:,3) < 80);
out = uint8(red) * 200;                       % scale the binary mask to a visible gray level
imwrite(out, 'Images/stopRed120.jpg', 'jpg');

18 Thresholding is usually not trivial.

19 Pixels can be clustered by color similarity and by adjacency. (Figure: the original RGB image, and the color clusters found by k-means.)
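A toy sketch of the color-similarity part with k-means (grouping by adjacency would be a separate connected-components pass; initializing from the first and last pixel is a simplification of the usual random or k-means++ initialization and assumes those two pixels differ):

```python
def kmeans_colors(pixels, k=2, iters=10):
    """Cluster RGB tuples by color similarity; return the final cluster centers."""
    # Simplified init: first and last pixel (assumed to be different colors).
    centers = [pixels[0], pixels[-1]] if k == 2 else list(pixels[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # assign each pixel to the nearest center (squared RGB distance)
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # move each center to the mean of its assigned pixels
        centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[c]
                   for c, cl in enumerate(clusters)]
    return centers

pixels = [(250, 10, 10), (10, 10, 250), (240, 20, 15), (20, 30, 240)]
red_center, blue_center = kmeans_colors(pixels)  # center order follows the init
```

With well-separated colors like these, the centers converge to the mean red and mean blue in one or two iterations.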

20 Some image processing ops: finding contrast in an image; using neighborhoods of pixels; detecting motion across 2 images.

21 Differentiate to find object edges. For each pixel, compute its contrast; one option is the max difference with its 8 neighbors. This detects intensity change across the boundary of adjacent regions. The LOG filter comes later on.
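The per-pixel contrast measure described here (maximum absolute difference against the 8 neighbors) can be sketched as:

```python
def contrast(image, r, c):
    """Max absolute intensity difference between pixel (r, c) and its 8 neighbors."""
    best = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the center pixel itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                best = max(best, abs(image[r][c] - image[rr][cc]))
    return best

img = [[10, 10, 10],
       [10, 10, 200],
       [10, 10, 10]]
print(contrast(img, 1, 1))  # 190: the bright neighbor dominates
```

Uniform regions give a contrast of 0; pixels on a region boundary give a large value.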

22 The 4- and 8-neighbors of a pixel. The 4-neighbors are at multiples of 90 degrees:

   N
 W * E
   S

The 8-neighbors are at every multiple of 45 degrees:

 NW N NE
 W  *  E
 SW S SE

23 Detect motion via subtraction. With a constant background, a moving object produces pixel differences at its boundary, revealing the moving object and its shape. The differences are computed over time rather than over space.
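Frame subtraction as described above can be sketched as follows (a minimal version; the frames are lists of rows of grayscale values, and the change threshold of 20 is an arbitrary noise margin):

```python
def motion_mask(frame_a, frame_b, t=20):
    """Mark pixels whose intensity changed by more than t between two frames."""
    return [[1 if abs(a - b) > t else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# A bright object on a constant background of 50 moves one pixel right:
f1 = [[50, 200, 50]]
f2 = [[50, 50, 200]]
print(motion_mask(f1, f2))  # [[0, 1, 1]]: differences appear where the object was and now is
```

The background pixel is unchanged and stays 0; the object's old and new positions both light up, outlining the motion.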

24 Two frames of aerial imagery. Video frames N and N+1 show slight movement: most pixels are the same, just in different locations.

25 Best matching blocks between video frames N+1 and N (motion vectors). The bulk of the vectors show the true motion of the airplane taking the pictures. The long vectors are incorrect motion vectors, but they still work well for compression of image I2! Best matches from the 2nd to the 1st image are shown as vectors overlaid on the 2nd image. (Work by Dina Eldin.)
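The block matching behind those motion vectors can be sketched as an exhaustive sum-of-absolute-differences search (a toy version for illustration; real video coders use pyramids and fast search patterns rather than brute force):

```python
def best_match(prev, cur, r, c, bs, search):
    """Displacement (dr, dc) minimizing the sum of absolute differences between
    the bs x bs block at (r, c) in `cur` and candidate blocks in `prev`."""
    def sad(dr, dc):
        return sum(abs(cur[r + i][c + j] - prev[r + dr + i][c + dc + j])
                   for i in range(bs) for j in range(bs))
    best, best_cost = (0, 0), sad(0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            # keep the candidate block inside the previous frame
            if 0 <= r + dr and r + dr + bs <= len(prev) and \
               0 <= c + dc and c + dc + bs <= len(prev[0]):
                cost = sad(dr, dc)
                if cost < best_cost:
                    best, best_cost = (dr, dc), cost
    return best

prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for i in (1, 2):
    for j in (1, 2):
        prev[i][j] = 255       # bright 2x2 block in frame N
        cur[i][j + 1] = 255    # same block, shifted one pixel right, in frame N+1
print(best_match(prev, cur, 1, 2, 2, 2))  # (0, -1): the block came from one pixel to the left
```

The returned vector points from the block's position in the current frame back to its best match in the previous frame, which is exactly what a compressor stores.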

26 Gradient from a 3x3 neighborhood. Estimate both the magnitude and the direction of the edge.

27 Prewitt versus Sobel masks. The Sobel mask uses weights of 1,2,1 and -1,-2,-1 in order to give more weight to the center estimate. The scaling factor is thus 1/8 and not 1/6.
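A sketch of the Sobel gradient estimate with the 1/8 scaling noted above (Python; the pixel is assumed to be an interior one so the full 3x3 neighborhood exists):

```python
import math

# Sobel masks: 1,2,1 / -1,-2,-1 weighting, scaled by 1/8.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def gradient(image, r, c):
    """Gradient (magnitude, direction in radians) at interior pixel (r, c)."""
    gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3)) / 8
    gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3)) / 8
    return math.hypot(gx, gy), math.atan2(gy, gx)

# Vertical step edge of height 100: gradient points in +x with magnitude 100/2 = 50.
img = [[0, 0, 100]] * 3
print(gradient(img, 1, 1))  # (50.0, 0.0)
```

Swapping in the Prewitt masks (all weights 1, scale 1/6) changes only the two mask tables and the divisor.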

28 Computational shortcuts.

29 Two rows of intensity vs. difference.

30 "Masks" show how to combine neighborhood values. Multiply the mask by the image neighborhood to get the first derivatives of intensity versus x and versus y.

31 Curves of contrasting pixels.

32 Boundaries are not always found well.

33 The Canny boundary operator.

34 The LOG filter creates zero crossings at step edges (2nd derivative of a Gaussian). A 3x3 mask is applied at each image position. It detects spots and step edges. Marr-Hildreth theory of edge detection.

35 G(x,y): the Mexican hat filter.

36 Positive center, negative surround.

37 Properties of the LOG filter. It has zero response on a constant region and zero response on an intensity ramp. It exaggerates a step edge by making it larger. It responds to a step in any direction across the "receptive field", and to a spot about the size of the center.
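The first two properties are easy to verify with a small center-surround mask (here a 3x3 Laplacian approximation used as a stand-in for the sampled LOG: positive center, negative surround, weights summing to zero):

```python
# 3x3 center-surround mask: positive center, negative surround, zero sum.
LOG = [[ 0, -1,  0],
       [-1,  4, -1],
       [ 0, -1,  0]]

def apply_mask(nbhd, mask=LOG):
    """Correlate the mask with one 3x3 neighborhood."""
    return sum(mask[i][j] * nbhd[i][j] for i in range(3) for j in range(3))

flat = [[7, 7, 7]] * 3                     # constant region
ramp = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]   # intensity ramp
spot = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]   # bright spot the size of the center
print(apply_mask(flat), apply_mask(ramp), apply_mask(spot))  # 0 0 36
```

Because the weights sum to zero, constant regions and ramps give no response, while a spot or a step across the mask gives a strong one.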

38 A human receptive field is analogous to a mask. X[j] are the image intensities; W[j] are the gains (weights) in the mask.

39 Human receptive fields amplify contrast.

40 A 3D neural network in the brain (levels j and j+1).

41 The Mach band effect shows human bias.

42 Human bias and illusions support the receptive-field theory of edge detection.

43 The human brain as a network. 100B neurons, or nodes; half are involved in vision. 10 trillion connections; a neuron can have a "fanout" of 10,000. The visual cortex is highly structured to process 2D signals in multiple ways.

44 Color and shading. Both are used heavily in human vision. Color is a pixel property, making some recognition problems easy. The visible spectrum for humans is 400 nm (blue) to 700 nm (red). Machines can "see" much more; e.g., X-rays, infrared, radio waves.

45 Imaging process (review).

46 Factors that affect perception. Light: the spectrum of energy that illuminates the object surface. Reflectance: the ratio of reflected light to incoming light. Specularity: highly specular (shiny) vs. matte surface. Distance: the distance to the light source. Angle: the angle between the surface normal and the light source. Sensitivity: how sensitive the sensor is.

47 Some physics of color. White light is composed of all visible wavelengths (400-700 nm). Ultraviolet and X-rays have much shorter wavelengths; infrared and radio waves have much longer wavelengths.

48 Models of reflectance. We need models of the physics of illumination and reflection that will (1) help computer vision algorithms extract information about the 3D world, and (2) help computer graphics algorithms render realistic images of model scenes. Physics-based vision is the subarea of computer vision that uses physical models to understand image formation in order to better analyze real-world images.

49 The Lambertian model: diffuse surface reflection. A diffuse reflecting surface reflects light uniformly in all directions, giving uniform brightness for all viewpoints of a planar surface.
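A minimal sketch of the Lambertian model: brightness depends only on the angle between the surface normal and the light direction, not on the viewpoint (both vectors are assumed to be unit length):

```python
def lambertian(normal, light_dir, albedo=1.0):
    """Diffuse brightness: albedo * max(0, n . l); independent of the viewpoint."""
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return albedo * max(0.0, n_dot_l)

print(lambertian((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))   # 1.0: light hits head-on
print(lambertian((0.0, 0.0, 1.0), (0.0, 0.0, -1.0)))  # 0.0: surface faces away from the light
```

Because no view direction appears in the formula, every viewpoint sees the same brightness, which is exactly the matte behavior the slide describes.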

50 Real matte objects. The light comes from a ring around the camera lens.

51 Specular reflection is highly directional and mirrorlike. R is the ray of reflection; V is the direction from the surface toward the viewpoint; alpha is the shininess parameter.
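A sketch of a Phong-style specular term consistent with this slide: the response peaks when the viewing direction V lines up with the reflection ray R, and the shininess exponent alpha sharpens the highlight (unit vectors assumed):

```python
def specular(reflect, view, alpha):
    """Specular brightness (R . V)^alpha; a large alpha gives a tight, mirrorlike highlight."""
    r_dot_v = max(0.0, sum(r * v for r, v in zip(reflect, view)))
    return r_dot_v ** alpha

print(specular((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 20))  # 1.0: the viewer is on the reflection ray
print(specular((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 20))  # 0.0: the viewer is 90 degrees off the ray
```

Unlike the Lambertian term, this depends on the view direction, which is why specular highlights move as the viewpoint moves.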

52 CV: perceiving 3D from 2D. Many cues from 2D images enable interpretation of the structure of the 3D world producing them.

53 Many 3D cues. How can humans and other machines reconstruct the 3D nature of a scene from 2D images? What other world knowledge needs to be added in the process?

54 Labeling image contours interprets the 3D scene structure. The scene: an egg and a thin cup on a table top, lighted from the top right. The logo on the cup is a "mark" on the material; the "shadow" relates to illumination, not material.

55 An "intrinsic image" stores 3D info in its "pixels", not intensity. For each point of the image, we want the depth to the 3D surface point, the surface normal at that point, the albedo of the surface material, and the illumination of that surface point.

56 3D scene versus 2D image. Creases, corners, faces, occlusions (for some viewpoint); edges, junctions, regions; blades, limbs, T's.

57 Labeling of simple polyhedra. Labeling of a block floating in space: BJ and KI are convex creases; blades AB, BC, CD, etc. model the occlusion of the background; junction K is a convex trihedral corner; junction D is a T-junction modeling the occlusion of blade CD by blade JE.

58 Trihedral blocks world: only 16 image junction cases! Only 16 possible junctions can appear in 2D when 3D corners formed by 3 planes are viewed from a general viewpoint. From top to bottom: L-junctions, arrows, forks, and T-junctions.

59 How do we obtain the catalog? Think about solid/empty assignments to the 8 octants about the X-Y-Z origin; think about non-accidental viewpoints; account for all possible topologies of junctions and edges; then handle T-junction occlusions.

60 Blocks-world labeling. Left: a block floating in space. Right: a block glued to a wall at the back.

61 Try labeling these: interpret the 3D structure, then label the parts. What does it mean if we can't label them? If we can label them?

62 In 1975, researchers were very excited: the labels place very strong constraints on interpretations, and the catalogue grows to several hundred junctions when cracks and shadows are allowed (Waltz), whose algorithm works very well with them. But the world is not made of blocks! Later on, curved-blocks-world work was done, but it was not as interesting.

63 Backtracking, or the interpretation tree.

64 The "Necker cube" has multiple interpretations. A human staring at one of these cubes typically experiences changing interpretations. The interpretation of the two forks (G and H) flip-flops between "front corner" and "back corner". What is the explanation? Label the different interpretations.

65 Depth cues in 2D images.

66 The "interposition" cue. Definition: interposition occurs when one object occludes another object, indicating that the occluding object is closer to the viewer than the occluded object.

67 Interposition. T-junctions indicate occlusion: the top of the T is the occluding edge, while the stem (bar) is the occluded edge. The bench occludes the lamp post; a leg occludes the bench; the lamp post occludes the fence; the railing occludes the trees; the trees occlude the steeple.

68 Perspective scaling: the railing looks smaller at the left; the bench looks smaller at the right; the 2 steeples are far away. Foreshortening: the bench is sharply angled relative to the viewpoint, and its image length is affected accordingly.

69 A texture gradient reveals surface orientation. Texture gradient: a change of image texture along some direction, often corresponding to a change in distance or orientation in the 3D world containing the objects creating the texture. (In East Lansing, we call it "corn", not "maize".) Note also that the rows appear to converge in 2D.

70 3D cues from perspective.

71 3D cues from perspective.

72 More 3D cues: virtual lines; falsely perceived interposition.

73 Irving Rock, The Logic of Perception (1982). Summarized an entire career in visual psychology. Concluded that the human visual system acts as a problem-solver. The triangle is unlikely to be accidental; it must be an object in front of the background, and it must be brighter since it is closer.

74 More 3D cues. 2D alignment usually means 3D alignment. 2D image curves create the perception of a 3D surface.

75 "Structured light" can enhance surfaces in industrial vision. Examples: a sculpted object; potatoes with light stripes.

76 Models of reflectance. We need models of the physics of illumination and reflection that will (1) help computer vision algorithms extract information about the 3D world, and (2) help computer graphics algorithms render realistic images of model scenes. Physics-based vision is the subarea of computer vision that uses physical models to understand image formation in order to better analyze real-world images.

77 The Lambertian model: diffuse surface reflection. A diffuse reflecting surface reflects light uniformly in all directions, giving uniform brightness for all viewpoints of a planar surface.

78 Shape (normals) from shading. A cylinder with white paper and pen stripes; the intensities are plotted as a surface. Clearly, intensity encodes shape in this case.

79 Shape (normals) from shading. A plot of the intensity of one image row reveals the 3D shape of these diffusely reflecting objects.

80 Specular reflection is highly directional and mirrorlike. R is the ray of reflection; V is the direction from the surface toward the viewpoint; alpha is the shininess parameter.

81 What about models for recognition? "Recognition" = to know again. How does memory store models of faces, rooms, chairs, etc.?

82 Human capability is extensive. A child of age 6 might recognize 3000 words and 30,000 objects. A junkyard robot must recognize nearly all objects: hundreds of styles of lamps, chairs, tools, …

83 Some methods of recognition: via geometric alignment (CAD); via a trained neural net; via the parts of objects and how they join; via the function/behavior of an object.

84 Side-view classes of a Ford Taurus (Chen and Stockman). These were made in the PRIP Lab from a scale model. Viewpoints in between can be generated from the x and y curvature stored on the boundary. Viewpoints are matched to real image boundaries via optimization.

85 Matching image edges to model limbs. This could recognize the car model at a stoplight or gate.

86 Object as parts + relations. Parts have size, color, and shape; they connect together at concavities. Relations are: connect, above, right of, inside of, …

87 Functional models. Inspired by J. J. Gibson's theory of affordances: an object is what an object does. Container: holds stuff. Club: hits stuff. Chair: supports humans.

88 Louise Stark: chair model. Dozens of CAD models of chairs. The program analyzed each for: a stable pose; a seat of the right size; the right height off the ground; no obstruction to the body on the seat. The program would accept a trash can (which could also pass as a container).

89 Minsky's theory of frames (Schank's theory of scripts). Frames are learned expectations: a frame for a room, a car, a party, an argument, … A frame is evoked by the current situation; how? (hard). The human "fills in" the details of the current frame (easier).

90 Summary. Images have many low-level features. We can detect uniform regions and contrast, and can organize regions and boundaries. Human vision uses several simultaneous channels: color, edge, motion. The use of models/knowledge is diverse and difficult. The last 2 issues are difficult in computer vision.

