CSE 484: Computer Vision Fatoş T. Yarman Vural Atıl İşçen

CSE 484: Computer Vision Fatoş T. Yarman Vural Atıl İşçen Güzelyurt, Ankara Spring 2009

Textbook: L. Shapiro and G. Stockman, Computer Vision. Recommended Books: R. Szeliski, Computer Vision: Algorithms and Applications, Dec 23, 2008; D. Forsyth, J. Ponce, Computer Vision: A Modern Approach; B. Jähne, H. Haußecker, Computer Vision and Applications

Grading: Midterm: 30% Final: 40% Homework: 30% with your partner

What is Computer Vision?
Make the computer SEE SEE: Extracting Visual information from any sensed data Goal : Make useful decisions about objects and scenes based on sensed data

OBJECT perceptible material thing vision

Object According to Plato
Things consisting of forms and matter. Forms are proper subjects of philosophical investigation, for they have the highest degree of reality. Matter is the ordinary substance.

OBJECTS: ANIMALS (VERTEBRATE: MAMMALS: TAPIR, BOAR; BIRDS: GROUSE; …), PLANTS, INANIMATE (NATURAL; MAN-MADE: CAMERA; …)

How many object categories are there?
~10,000 to 30,000 Biederman 1987

SCENE Consists of multiple objects
Goal : Make useful decisions about objects and scenes based on sensed data

Bruegel, 1564

Sensed Data: Images. All sorts of sensor data carrying visual info: Optic, Thermal, IR, MR, SAR, …
Goal : Make useful decisions about objects and scenes based on sensed data

IMAGES: Satellite, CT, SAR, Thermal, scientific

Useful Decisions: Recognize, classify, detect, localize, retrieve, annotate, verify. Goal : Make useful decisions about objects and scenes based on sensed data

So what does recognition involve?

Verification: is that a lamp?

Detection: are there people?

Identification: is that Potala Palace?

Object categorization
mountain tree building banner street lamp vendor people

Scene and context categorization
outdoor city …

Application Domains of Computer Vision

Traffic: Assisted driving
Pedestrian and car detection, lane detection, collision warning systems with adaptive cruise control, lane departure warning systems, rear object detection systems

Retrieval: Improving online search
Query: STREET Digital Album

Similarity Retrieval of Brain Data

Image Databases: Content-Based Retrieval
Images from my Ground-Truth collection. What categories of image databases exist today?

Abstract Regions for Object Recognition
Original Images Color Regions Texture Regions Line Clusters Caption!

Insect Identification for Ecology Studies
Doroneuria (Dor) Calineuria (Cal) Yoraperla (Yor)

Document Analysis

Surveillance: Object and Event Recognition in Aerial Videos
Original Video Frame Color Regions Structure Regions

Video Analysis What are the objects? What are the events?

3D Reconstruction of the Blood Vessel Tree

Recognition of 3D Object Classes from Range Data

3D Scanning Scanning Michelangelo’s “The David”
The Digital Michelangelo Project - UW Prof. Brian Curless, collaborator 2 BILLION polygons, accuracy to .29mm

The Digital Michelangelo Project, Levoy et al.

Tasks in Computer Vision
Segment an image into useful regions Perform measurements on certain areas Determine what object(s) are in the scene Calculate the precise location(s) of objects Visually inspect a manufactured object Construct a 3D model of the imaged object Find “interesting” events in a video liver kidney spleen

HISTORY OF COMPUTER VISION

Why is it Difficult? What are the Challenges

Challenges 1: view point variation
Michelangelo

Challenges 2: illumination
slide credit: S. Ullman

Challenges 3: occlusion
Magritte, 1957

Challenges 4: scale

Challenges 5: deformation
Xu, Beihong 1943

Challenges 6: background clutter
Klimt, 1913

Challenges 7: intra-class variation

The Three Stages of Computer Vision
low-level: image → image; mid-level: image → features; high-level: features → analysis

Low-Level sharpening blurring

Low-Level Mid-Level Canny original image edge image ORT data structure
circular arcs and line segments

Mid-level K-means clustering (followed by connected component
analysis) regions of homogeneous color original color image data structure

Low- to High-Level edge image consistent line clusters
low-level edge image mid-level consistent line clusters high-level Building Recognition

Recognition Scale / orientation range to search over Speed Context

Course content Image representation Matrices, functions
Image file formats Binary Image Analysis Pixel and neighborhood Masks and convolution Counting and labeling Morphological operations

Thresholding Object Recognition concepts Representation Classification Measures Gray-level Image Analysis Gray level mapping Noise removal, Smoothing

Color and shading Color spaces Shades Texture Texels, texture description Texture measure Segmentation Clustering Region Growing Content Based Image retrieval

Imaging and Image Representation Ch:2 Shapiro et al.

Classical Imaging Process
Light reaches surfaces in 3D Surfaces reflect Sensor element receives light energy Intensity counts Angles count Material counts What are radiance and irradiance?

Radiometry and Computer Vision*
Radiometry is a branch of physics that deals with the measurement of the flow and transfer of radiant energy. Radiance is the power of light that is emitted from a unit surface area into some spatial angle; the corresponding photometric term is brightness. Irradiance is the amount of energy that an image-capturing device gets per unit of an efficient sensitive area of the camera. Quantizing it gives image gray tones. From Sonka, Hlavac, and Boyle, Image Processing, Analysis, and Machine Vision, ITP, 1999.

Sensors: Image acquisition Devices
CCD (Charge-Coupled Device), X-Ray Devices, Microwave Devices, UV Devices, Thermal Cameras, IR Devices, 3-D scanners

CCD type camera: Commonly used in industrial applications
Array of small fixed elements Each element converts the light energy to electric charge 1x1 cm Can add refracting elements to get color in 2x2 neighborhoods 8-bit intensity common

Computer Vision Algorithms Main concern of CV is to develop Algorithms

LIDAR also senses surfaces
Single sensing element scans scene Laser light reflected off surface and returned Phase shift codes distance Brightness change codes albedo (surface reflectance) Stockman MSU/CSE Fall 2008

2.5D face image from Minolta Vivid 910 scanner
A rotating mirror scans a laser stripe across the object. 320x240 rangels obtained in about 2 seconds. [x,y,z,R,G,B] image. Stockman MSU/CSE Fall 2008

3D scanning technology 3D image of voxels obtained
Usually computationally expensive reconstruction of 3D from many 2D scans (CAT computer-aided-tomography) Stockman MSU/CSE Fall 2008

Magnetic Resonance Imaging
Sense density of certain chemistry S slices x R rows x C columns Volume element (voxel) about 2mm per side At left is shaded 2D image created by “volume rendering” a 3D volume: darkness codes depth Stockman MSU/CSE Fall 2008

Single slice through human head
MRIs are computed structures, computed from many views. At left is MRA (angiograph), which shows blood flow. CAT scans are computed in much the same manner from X-ray transmission data. Stockman MSU/CSE Fall 2008

Problems in Image Acquisition

Human eye as a spherical camera
million Rods sense intensity 6-7 million Cones sense color Fovea has tightly packed area, more cones Periphery has more rods Focal length is about 20mm Pupil/iris controls light entry Eye scans, or saccades to image details on fovea 100M sensing cells funnel to 1M optic nerve connections to the brain Stockman MSU/CSE Fall 2008

RODS AND CONES

Image Formation

Problems in HVS Mach Band Effect

Contrast

Illusions

Images: 2D projections of 3D
The 3D world has color, texture, surfaces, volumes, light sources, temperature, reflectance, … A 2D image is a projection of a scene from a specific viewpoint.

Digital Images form arrays

Digitizing: Sampling

Quantization

Digital Image: Sampled and quantized

Sampling at different resolution

Sampling

Quantization

What are the appropriate sampling and quantization rates?

Resolution
resolution: precision of the sensor; nominal resolution: size of a single pixel in scene coordinates (e.g. meters, mm); common use of resolution: num_rows x num_cols (e.g. 515 x 480); field of view (FOV): size of the scene a sensor can sense

Images as Functions g(x,y) = val or f(row, col) = val
A gray-tone image is a function: g(x,y) = val or f(row, col) = val A color image is just three functions or a vector-valued function: f(row,col) =(r(row,col), g(row,col), b(row,col)) Multi-spectral Image: f(row,col) =(f1(row,col), f2(row,col),…, fn(row,col))
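The function view above can be sketched in a few lines of Python. This is only an illustration: the pixel values and the names `gray`, `f` and `f_color` are made up for the example, and nested lists stand in for a real image array.

```python
# A gray-tone image stored as a list of rows; f(row, col) returns one value.
gray = [
    [10, 20, 30],
    [40, 94, 60],
    [70, 80, 90],
]


def f(row, col):
    """Gray-tone image as a function of (row, col)."""
    return gray[row][col]


# A color image is three such functions, or one vector-valued function.
red = [[255, 0], [0, 0]]
green = [[0, 255], [0, 0]]
blue = [[0, 0], [255, 0]]


def f_color(row, col):
    """Color image as a vector-valued function returning (r, g, b)."""
    return (red[row][col], green[row][col], blue[row][col])


print(f(1, 1))        # value of the center pixel of the gray image
print(f_color(0, 1))  # (r, g, b) triple at row 0, column 1
```

A multi-spectral image follows the same pattern with n bands instead of three.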

Gray-tone Image as Function

Image vs Matrix There are many different file formats.

Digital Image Terminology:
pixel (with value 94) its 3x3 neighborhood region of medium intensity resolution (7x7) binary image gray-scale (or gray-tone) image color image multi-spectral image range image labeled image

Image File Formats Portable Gray Map (PGM) older form
GIF was early commercial version JPEG (JPG) is modern version MPEG for motion Many others exist: header plus data Do they handle color? Do they provide for compression? Are there good packages that use them or at least convert between them?

Compression: Reduce the redundancy
Lossy Lossless

Run Coding, for one row: Code 1: 3(0)1(1)2(0)1(1)6(0), or Code 2: (4,4)(7,7)
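The two run codes on this slide can be reproduced with a short sketch. It assumes that Code 1 lists (run length, value) pairs over the whole row and that Code 2 lists 1-based (start, end) columns of each run of 1s; the function names and the 13-pixel row are illustrative, chosen so that both codes match the slide.

```python
def run_code_1(row):
    """Code 1: (run_length, value) pairs covering the whole row."""
    runs = []
    for v in row:
        if runs and runs[-1][1] == v:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, v])       # start a new run
    return [(n, v) for n, v in runs]


def run_code_2(row):
    """Code 2: 1-based inclusive (start, end) columns of each run of 1s."""
    runs, start = [], None
    for col, v in enumerate(row, start=1):
        if v == 1 and start is None:
            start = col
        elif v != 1 and start is not None:
            runs.append((start, col - 1))
            start = None
    if start is not None:             # a run of 1s reaches the end of the row
        runs.append((start, len(row)))
    return runs


row = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(run_code_1(row))  # [(3, 0), (1, 1), (2, 0), (1, 1), (6, 0)]
print(run_code_2(row))  # [(4, 4), (7, 7)]
```

Both codes are lossless: the original row can be rebuilt exactly from either one, given the row length for Code 2.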

PGM image with ASCII info.
P2 means ASCII gray Comments W=16; H=8 192 is max intensity Can be made with editor Large images are usually not stored as ASCII
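Because a P2 file is plain text, it can be read and written with a few lines of code. The sketch below is a simplified reader and writer, not a complete implementation of the format; the function names and the tiny 4x2 sample image are hypothetical.

```python
def parse_pgm_ascii(text):
    """Parse an ASCII (P2) PGM string into (width, height, maxval, rows)."""
    tokens = []
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop comments, which start with '#'
        tokens.extend(line.split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII PGM (P2) file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("pixel count does not match header")
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows


def write_pgm_ascii(rows, maxval):
    """Serialize rows of gray values as an ASCII (P2) PGM string."""
    height, width = len(rows), len(rows[0])
    lines = ["P2", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


sample = """P2
# tiny hypothetical example
4 2
192
0 64 128 192
192 128 64 0
"""
width, height, maxval, rows = parse_pgm_ascii(sample)
print(width, height, maxval)
print(write_pgm_ascii(rows, maxval))
```

The binary variants (P4, P5, P6) use the same kind of header but store the pixel data as raw bytes, which is why large images are usually kept in those forms.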

PBM/PGM/PPM Codes P1: ascii binary (PBM) P2: ascii grayscale (PGM)
P3: ascii color (PPM) P4: byte binary (PBM) P5: byte grayscale (PGM) P6: byte color (PPM)

JPG current popular form
Public standard Allows for image compression; often 10:1 or :1 are easily possible 8x8 intensity regions are fit with basis of cosines Error in cosine fit coded as well Parameters then compressed with Huffman coding Common for most digital cameras

From 3D Scenes to 2D Images
Object World Camera Real Image Pixel Image

Binary Image Analysis

Binary image analysis consists of a set of image analysis operations that are used to produce or process binary images, usually images of 0’s and 1’s. 0 represents the background 1 represents the foreground

Binary Image Analysis is used in a number of practical applications, e.g. part inspection riveting fish counting document processing

What kinds of operations?
Separate objects from background and from one another Aggregate pixels for each object Compute features for each object

Example: red blood cell image
Many blood cells are separate objects Many touch – bad! Salt and pepper noise from thresholding How useable is this data?

Results of analysis 63 separate objects detected
Single cells have area about 50 Noise spots Gobs of cells

Useful Operations 1. Thresholding a gray-tone image
2. Determining good thresholds 3. Connected components analysis 4. Binary mathematical morphology 5. All sorts of feature extractors (area, centroid, circularity, …)

1. Thresholding Convert gray level or color image into binary image
Use histogram
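A minimal thresholding sketch follows. It assumes the foreground is brighter than the threshold; the comparison would be reversed for dark objects on a bright background. The function name and sample values are illustrative.

```python
def threshold(gray, t):
    """Return a binary image: 1 where the gray value exceeds t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in gray]


gray = [
    [10, 200],
    [120, 90],
]
print(threshold(gray, 100))
```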

Histogram Background is black Healthy cherry is bright
Bruise is medium dark Histogram shows two cherry regions (black background has been removed) pixel counts 256 gray-tone values
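A gray-tone histogram is just a table of pixel counts per gray value. The sketch below assumes integer gray values in the range 0 to levels-1; the function name and the small 4-level image are made up for the example.

```python
def histogram(gray, levels=256):
    """Count how many pixels take each gray-tone value in [0, levels)."""
    counts = [0] * levels
    for row in gray:
        for v in row:
            counts[v] += 1
    return counts


image = [
    [0, 0, 3],
    [3, 3, 1],
]
print(histogram(image, levels=4))
```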

Histogram-Directed Thresholding
How can we use a histogram to separate an image into 2 (or several) different regions? Is there a single clear threshold? 2? 3?

Automatic Thresholding: Otsu’s Method
Assumption: the histogram is bimodal, so a threshold t splits it into two groups (Grp 1 and Grp 2). Method: find the threshold t that minimizes the weighted sum of within-group variances for the two groups that result from separating the gray tones at value t.
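The criterion on this slide can be written as a direct, brute-force search over all thresholds. This sketch favors clarity over speed and assumes group 1 holds gray tones <= t and group 2 holds gray tones > t; the function name and the 8-level sample histogram are illustrative.

```python
def otsu_threshold(hist):
    """Return the t that minimizes the weighted sum of within-group variances.

    Group 1 holds gray tones <= t, group 2 holds gray tones > t.
    """
    total = sum(hist)
    best_t, best_score = None, None
    for t in range(len(hist) - 1):
        g1 = list(enumerate(hist[:t + 1]))               # (tone, count) pairs
        g2 = list(enumerate(hist[t + 1:], start=t + 1))
        n1 = sum(c for _, c in g1)
        n2 = sum(c for _, c in g2)
        if n1 == 0 or n2 == 0:
            continue                                     # one group is empty
        m1 = sum(i * c for i, c in g1) / n1
        m2 = sum(i * c for i, c in g2) / n2
        v1 = sum(c * (i - m1) ** 2 for i, c in g1) / n1
        v2 = sum(c * (i - m2) ** 2 for i, c in g2) / n2
        score = (n1 * v1 + n2 * v2) / total              # weighted within-group variance
        if best_score is None or score < best_score:
            best_t, best_score = t, score
    return best_t


# A clearly bimodal 8-level histogram: dark tones 0-1, bright tones 6-7.
hist = [5, 5, 0, 0, 0, 0, 5, 5]
print(otsu_threshold(hist))
```

Any threshold between the two modes gives the same score here, so the search returns the first one it meets. Faster formulations update the group statistics incrementally instead of recomputing them for every t.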

Thresholding Example original gray tone image binary thresholded image

3. Connected Components Labeling
Once you have a binary image, you can identify and then analyze each connected set of pixels. The connected components operation takes in a binary image and produces a labeled image in which each pixel has the integer label of either the background (0) or a component. connected components binary image after morphology
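The input/output behavior described above can be sketched with a simple breadth-first flood fill. This is not the row-by-row algorithm from the text; it is a compact illustration that assumes 4-connectivity, and the function name and sample image are made up.

```python
from collections import deque


def label_components(binary):
    """Label 4-connected components of 1-pixels; background stays 0."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                next_label += 1                  # found an unlabeled component
                labels[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label


binary = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labeled, count = label_components(binary)
print(count)
for row in labeled:
    print(row)
```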

Methods for CC Analysis
Recursive Tracking (almost never used) Parallel Growing (needs parallel hardware) Row-by-Row (most common) Classical Algorithm (see text) Efficient Run-Length Algorithm (developed for speed in real industrial applications)

Equivalent Labels Original Binary Image

Equivalent Labels The Labeling Process
1 ≡ 2, 1 ≡ 3

Run-Length Data Structure
[Figure: a binary image, its Row Index table (Rstart, Rend for each row), and its Runs table (row, scol, ecol, label); entry 0 is marked UNUSED.]

Run-Length Algorithm
Procedure run_length_classical {
    initialize Run-Length and Union-Find data structures
    count <- 0
    /* Pass 1 (by rows) */
    for each current row and its previous row
        move pointer P along the runs of current row
        move pointer Q along the runs of previous row

Case 1: No Overlap
    P's run ends before Q's run begins:
        /* new label */
        count <- count + 1
        label(P) <- count
        P <- P + 1
    Q's run ends before P's run begins:
        /* check Q's next run */
        Q <- Q + 1

Case 2: Overlap
    Subcase 1: P's run has no label yet
        label(P) <- label(Q)
        move pointer(s)
    Subcase 2: P's run has a label that is different from Q's run
        union(label(P), label(Q))
        move pointer(s)
}

Pass 2 (by runs)
/* Relabel each run with the name of the equivalence class of its label */
For each run M {
    label(M) <- find(label(M))
}
where union and find refer to the operations of the Union-Find data structure, which keeps track of sets of equivalent labels.
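The two passes can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than the textbook's exact procedure: runs are held in plain lists instead of the Row Index and Runs tables, overlap means shared columns (4-connectivity), a run that already has a label is not given a new one, and all function names are hypothetical.

```python
def find(parent, x):
    """Return the representative label of x's equivalence class."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving keeps trees shallow
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the equivalence classes of labels a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def runs_of_row(pixels, r):
    """Runs of 1s in one row as [row, scol, ecol, label]; label 0 = none yet."""
    runs, start = [], None
    for c, v in enumerate(pixels):
        if v == 1 and start is None:
            start = c
        elif v != 1 and start is not None:
            runs.append([r, start, c - 1, 0])
            start = None
    if start is not None:
        runs.append([r, start, len(pixels) - 1, 0])
    return runs


def run_length_label(binary):
    """Return (row, scol, ecol, label) for every run after both passes."""
    rows = [runs_of_row(px, r) for r, px in enumerate(binary)]
    parent = [0]                        # parent[k] for label k; index 0 unused
    prev = []
    for cur in rows:                    # Pass 1 (by rows)
        p = q = 0
        while p < len(cur):
            P = cur[p]
            if q >= len(prev) or P[2] < prev[q][1]:
                # Case 1: no Q left, or P ends before Q begins
                if P[3] == 0:
                    parent.append(len(parent))   # create a new label
                    P[3] = len(parent) - 1
                p += 1
            elif prev[q][2] < P[1]:
                q += 1                  # Case 1: Q ends before P begins
            else:
                Q = prev[q]             # Case 2: the runs overlap
                if P[3] == 0:
                    P[3] = Q[3]
                elif P[3] != Q[3]:
                    union(parent, P[3], Q[3])
                if P[2] < Q[2]:         # advance whichever run ends first
                    p += 1
                elif Q[2] < P[2]:
                    q += 1
                else:
                    p += 1
                    q += 1
        prev = cur
    # Pass 2 (by runs): replace each label by its equivalence-class name
    return [(r, s, e, find(parent, lab)) for row in rows for r, s, e, lab in row]


u_shape = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
]
for run in run_length_label(u_shape):
    print(run)
```

In the U-shaped example the two arms first receive different labels, the bottom row records their equivalence through union, and Pass 2 gives every run the same final label.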

Labeling shown as Pseudo-Color
connected components of 1’s from thresholded image connected components of cluster labels

Mathematical Morphology
Binary mathematical morphology consists of two basic operations dilation and erosion and several composite relations closing and opening conditional dilation . . .

Dilation Dilation expands the connected sets of 1s of a binary image.
It can be used for 1. growing features 2. filling holes and gaps

Erosion Erosion shrinks the connected sets of 1s of a binary image.
It can be used for 1. shrinking features 2. Removing bridges, branches and small protrusions

Structuring Elements A structuring element is a shape mask used in
the basic morphological operations. They can be any shape and size that is digitally representable, and each has an origin. box disk hexagon something box(length,width) disk(diameter)

Dilation with Structuring Elements
The arguments to dilation and erosion are a binary image B and a structuring element S. dilate(B,S) takes binary image B, places the origin of structuring element S over each 1-pixel, and ORs the structuring element S into the output image at the corresponding position. [Figure: B, S with its origin marked, and the dilation result B ⊕ S]
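The definition of dilate(B,S) translates almost word for word into code. In this sketch the structuring element is represented as a list of (row, column) offsets of its 1-cells relative to its origin, and pixels that would fall outside the image are dropped; both choices are assumptions made for the example.

```python
def dilate(B, S):
    """OR the structuring element S into the output at every 1-pixel of B.

    S is a list of (drow, dcol) offsets of its 1-cells relative to its origin.
    """
    rows, cols = len(B), len(B[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if B[r][c] == 1:
                for dr, dc in S:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = 1
    return out


B = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
line3 = [(0, -1), (0, 0), (0, 1)]   # 1x3 horizontal element, origin in the middle
for row in dilate(B, line3):
    print(row)
```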

Erosion with Structuring Elements
erode(B,S) takes a binary image B, places the origin of structuring element S over every pixel position, and ORs a binary 1 into that position of the output image only if every position of S (with a 1) covers a 1 in B. [Figure: B, S with its origin marked, and the erosion result B ⊖ S]
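Erosion can be sketched the same way. As an assumption for this example, the structuring element is a list of (row, column) offsets of its 1-cells relative to its origin, and positions outside the image count as 0, so objects touching the border are eroded there.

```python
def erode(B, S):
    """Output 1 at a pixel only if every 1-cell of S, placed there, covers a 1 in B.

    S is a list of (drow, dcol) offsets of its 1-cells relative to its origin.
    """
    rows, cols = len(B), len(B[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if all(0 <= r + dr < rows and 0 <= c + dc < cols
                   and B[r + dr][c + dc] == 1
                   for dr, dc in S):
                out[r][c] = 1
    return out


B = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
line3 = [(0, -1), (0, 0), (0, 1)]   # 1x3 horizontal element, origin in the middle
for row in erode(B, line3):
    print(row)
```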

Example to Try S B 1 1 1 erode dilate with same structuring element

Opening and Closing Closing is the compound operation of dilation followed by erosion (with the same structuring element) Opening is the compound operation of erosion followed by dilation (with the same structuring element)
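Opening and closing are compositions of the two basic operations, as the sketch below shows. It defines compact versions of dilate and erode so that it runs on its own; the offset-list representation of the structuring element, the zero padding outside the image, and the sample images are assumptions made for the example.

```python
def dilate(B, S):
    """Set every output pixel reached by placing S on a 1-pixel of B."""
    rows, cols = len(B), len(B[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if B[r][c] == 1:
                for dr, dc in S:
                    if 0 <= r + dr < rows and 0 <= c + dc < cols:
                        out[r + dr][c + dc] = 1
    return out


def erode(B, S):
    """Keep a pixel only if S, placed there, lies entirely on 1-pixels of B."""
    rows, cols = len(B), len(B[0])
    return [[1 if all(0 <= r + dr < rows and 0 <= c + dc < cols
                      and B[r + dr][c + dc] == 1 for dr, dc in S) else 0
             for c in range(cols)] for r in range(rows)]


def opening(B, S):
    """Erosion followed by dilation with the same structuring element."""
    return dilate(erode(B, S), S)


def closing(B, S):
    """Dilation followed by erosion with the same structuring element."""
    return erode(dilate(B, S), S)


box3 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3x3 box, centered origin

# Opening removes the isolated pixel at the right but keeps the 3x3 block.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
for row in opening(noisy, box3):
    print(row)
```

Closing does the opposite job: applied to a block with a one-pixel hole, it fills the hole while leaving the outline of the block unchanged.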

Use of Opening Original Opening Corners
What kind of structuring element was used in the opening? How did we get the corners?

Gear Tooth Inspection original binary image How did they do it?
detected defects

Some Details

Region Properties Properties of the regions can be used to recognize objects. geometric properties (Ch 3) gray-tone properties color properties texture properties shape properties (a few in Ch 3) motion properties relationship properties (1 in Ch 3)

Geometric and Shape Properties
area, centroid, perimeter, perimeter length, circularity, elongation, mean and standard deviation of radial distance, bounding box, extremal axis length from bounding box, second order moments (row, column, mixed), lengths and orientations of axes of best-fit ellipse. Which are statistical? Which are structural?
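A few of these properties can be computed directly from a labeled image. The sketch below covers area, centroid, bounding box and the second-order moments, taking the moments as averages of squared deviations from the centroid; that normalization, the function name, and the sample image are assumptions made for the example.

```python
def region_properties(labels, target):
    """Compute simple geometric properties of the region with label `target`."""
    pixels = [(r, c)
              for r, row in enumerate(labels)
              for c, v in enumerate(row) if v == target]
    area = len(pixels)
    if area == 0:
        raise ValueError("no pixels carry this label")
    rbar = sum(r for r, _ in pixels) / area      # centroid row
    cbar = sum(c for _, c in pixels) / area      # centroid column
    return {
        "area": area,
        "centroid": (rbar, cbar),
        # bounding box as (min_row, min_col, max_row, max_col)
        "bbox": (min(r for r, _ in pixels), min(c for _, c in pixels),
                 max(r for r, _ in pixels), max(c for _, c in pixels)),
        # second-order moments about the centroid: row, column, mixed
        "mu_rr": sum((r - rbar) ** 2 for r, _ in pixels) / area,
        "mu_cc": sum((c - cbar) ** 2 for _, c in pixels) / area,
        "mu_rc": sum((r - rbar) * (c - cbar) for r, c in pixels) / area,
    }


labels = [
    [0, 1, 1],
    [0, 1, 1],
    [2, 0, 0],
]
props = region_properties(labels, 1)
for name, value in props.items():
    print(name, value)
```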

Region Adjacency Graph
A region adjacency graph (RAG) is a graph in which each node represents a region of the image and an edge connects two nodes if the regions are adjacent. 1 1 2 2 4 3 4 3
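A RAG can be built from a labeled image in a single scan. This sketch treats two regions as adjacent when they share a pixel edge (4-adjacency) and includes every label it sees as a node, background included; the function name and the sample labeling are illustrative.

```python
def region_adjacency_graph(labels):
    """Map each region label to the set of labels of its 4-adjacent regions."""
    rows, cols = len(labels), len(labels[0])
    graph = {v: set() for row in labels for v in row}
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Checking only the pixel below and the pixel to the right
            # visits every neighboring pair exactly once.
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        graph[a].add(b)
                        graph[b].add(a)
    return graph


labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
]
rag = region_adjacency_graph(labels)
for node in sorted(rag):
    print(node, sorted(rag[node]))
```

In this example regions 1 and 4 touch only at a corner, so under 4-adjacency they are not connected by an edge.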
