Goals of Computer Vision


1 Goals of Computer Vision
To make useful decisions based on sensed images.
To construct 3D structure from 2D images.

2 Related Areas of Computer Vision
Computer Graphics
Image Processing
Pattern Recognition
Artificial Intelligence
Virtual Reality

3 Applications of Computer Vision
Human Face Identification
Image Database Query
Inspecting Products
Examining the Inside of a Human Head (MRI)
Optical Character Recognition (OCR)
Satellite Image Analysis
Unmanned Autonomous Vehicles (UAV)

4 Human Visual Perception

5 Human eye

6 Human eye is more sensitive to luminance than to chrominance
How do we see an object? Light, object, and eye.
Luminance → Lightness → Rods
Chrominance → Color → Cones
The human eye is more sensitive to luminance than to chrominance.

7 Cones & Rods (day & night)

8 Machine Visual Perception

9 Digital Image

10 Sampling & Quantization

11 Types of Images
Analog Image
Digital Image
Binary Image
Gray-scale Image
Color (Multispectral) Image

12 Image Formats
Vector Image
Bitmap Image
RAW → no header
RLE (Run-Length Encoding)
PGM, PPM, PNM (Portable Gray/Pixel/Any Map)
GIF (Graphics Interchange Format) → no more than 256 colors
TIF (Tagged Image File Format) → scanners
EPS (Encapsulated PostScript) → printers
JPEG (Joint Photographic Experts Group) → high compression ratio
MPEG (Moving Picture Experts Group) → video

13 Comparison of Image Formats

14 Chap 3: Binary Image Analysis

15 Counting Foreground Objects

16 Example

17 Counting Object Algorithm

18 Counting Background Objects

19 Connected Component Labeling

20 1. Recursive Connected Components Algorithm

21 Example

22 2. Classical Connected Components Algorithm

23 Size Filter
To remove small-size noise: after connected component labeling, all components whose size is below a threshold T are removed by changing the corresponding pixels to 0.
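A minimal sketch of such a size filter in Python, assuming a binary numpy image and 8-connectivity (the threshold T is a parameter):

```python
import numpy as np
from scipy import ndimage

def size_filter(binary_img: np.ndarray, T: int) -> np.ndarray:
    """Remove connected components smaller than T pixels."""
    # Label connected components (8-connectivity structuring element).
    structure = np.ones((3, 3), dtype=int)
    labels, num = ndimage.label(binary_img, structure=structure)
    # Count the pixels in each component (index 0 is the background).
    sizes = np.bincount(labels.ravel())
    # Zero out every labeled component whose size is below T.
    small = np.isin(labels, np.where(sizes < T)[0]) & (labels > 0)
    out = binary_img.copy()
    out[small] = 0
    return out
```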

24 Pepper & Salt Noise Reduction
Change a pixel from 0 to 1 if all of its neighboring pixels are 1.
Change a pixel from 1 to 0 if all of its neighboring pixels are 0.
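A direct numpy sketch of this rule, assuming a binary image and 8-neighborhoods; border pixels are left unchanged:

```python
import numpy as np

def reduce_salt_pepper(img: np.ndarray) -> np.ndarray:
    """Flip isolated pixels that disagree with all 8 neighbors."""
    out = img.copy()
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            nb = img[r-1:r+2, c-1:c+2].sum() - img[r, c]  # 8-neighbor sum
            if img[r, c] == 0 and nb == 8:    # 0 pixel surrounded by 1s
                out[r, c] = 1
            elif img[r, c] == 1 and nb == 0:  # 1 pixel surrounded by 0s
                out[r, c] = 0
    return out
```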

25 Expanding & Shrinking

26 Example 1

27 Example 2

28 Morphological Filter

29 Example

30 Example

31 Closing & Opening

32 Opening Example

33 Morphological Filter Example 1

34 Structure Element Example 1

35 Morpho-logical Filter Example 2

36 Structure Element Example 2

37 Conditional Dilation

38 Conditional Dilation Example

39 Area & Centroid

40 Perimeter

41 Circularity

42 Second Moment

43 Orientation

44 Bounding Box

45 Thresholding

46 P-Tile Method

47 Mode Method

48 Mode Algorithm

49 Iterative Method

50 Adaptive Method

51 Adaptive Method Example

52 Variable Thresholding Example

53 Double Thresholding Method

54 Double Thresholding Example

55 Recursive Histogram Clustering

56 Chapter 4: Pattern Recognition

57 Classification
Classification is a process that assigns a label to an object according to some representation of the object's properties. A classifier is a device or algorithm that inputs an object representation and outputs a class label. The reject class is a generic class for objects that cannot be placed in any of the designated known classes.

58 Error & Reject rate Empirical error rate of a classification system is the number of errors made on independent test data divided by the number of classifications attempted. Empirical reject rate of a classification system is the number of rejects made on independent test data divided by the number of classifications attempted.

59 False Alarm & Miss Detection
Two-class problem example: whether a person has a disease or not.
False Alarm (False Positive): the system incorrectly says that the person has the disease.
Miss Detection (False Negative): the system incorrectly says that the person does not have the disease.

60 Receiver Operating Curve (ROC)

61 Precision & Recall Example: the objective of document retrieval (image retrieval) is to retrieve interesting objects and not too many uninteresting objects according to features supplied in a user’s query. Precision is the number of relevant documents retrieved divided by the total number of documents retrieved. Recall is the number of relevant documents retrieved by the system divided by the total number of relevant documents in the database.

62 Example Suppose an image database contains 200 sunset images.
Suppose an automatic retrieval system retrieves 150 of those 200 relevant images and 100 other images. Then precision = 150/250 = 60% and recall = 150/200 = 75%.

63 Features used for representation
Area of the character in units of black pixels
Height and width of the bounding box of its pixels
Number of holes inside the character
Number of strokes forming the character
Center (centroid) of the set of pixels
Best axis direction (orientation) through the pixels, as the axis of least inertia
Second moments of the pixels about the axes of least and most inertia

64 Example Features

65 Classification using nearest mean

66 Euclidean Distance

67 Example

68 Classification using nearest neighbors
A brute-force approach computes the distance from x to all samples in the database and remembers the minimum distance. Then x is classified into the same category as its nearest neighbor.
Advantage: new labeled samples can be added to the database at any time.
A better approach is the k-nearest-neighbors rule.
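A minimal numpy sketch of the k-nearest-neighbors rule (the array names and the choice k=3 are illustrative; labels are assumed to be non-negative integers):

```python
import numpy as np

def knn_classify(x, samples, labels, k=3):
    """Classify x by majority vote among its k nearest samples."""
    # Euclidean distance from x to every stored sample.
    dists = np.linalg.norm(samples - x, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels.
    votes = np.bincount(labels[nearest])
    return np.argmax(votes)
```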

69 Structural Pattern Recognition
Graph matching algorithm can be used to perform structural pattern recognition.

70 Two characters with the same global features but different structure
Lid : a virtual line segment that closes up a bay. Left : specifies that one lid lies on the left of another. Right : specifies that one lid lies on the right of another.

71 Confusion Matrix Reject Rate = Error Rate =

72 Decision Tree 1

73 Decision Tree 2

74 Automatic Construction of a Decision Tree
Information content I(C;F) of the class variable C with respect to the feature variable F is defined by the formula below. The feature variable F with maximum I(C;F) will be selected as the first feature to be tested.
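The formula itself was an image on the original slide; assuming it follows the standard mutual-information form used for decision-tree induction, it would read:

```latex
I(C;F) = \sum_{c}\sum_{f} p(c,f)\,\log_2 \frac{p(c,f)}{p(c)\,p(f)}
```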

75 Example I(C;X) = I(C;Y) = I(C;Z) =

76 General Case At any branch node of the tree when the selected feature does not completely separate a set of training samples into the proper classes  the tree construction algorithm is invoked recursively for the subsets of training samples at each of the child nodes.

77 Bayesian Decision Making

78 Bayesian classifier
A Bayesian classifier classifies an object into the class to which it is most likely to belong, based on the observed features. In other words, it makes the classification decision w_i that maximizes P(w_i|x). Since p(x) is the same for all classes, comparing p(x|w_i)P(w_i) is enough. Poisson, Exponential, and Normal (Gaussian) distributions are commonly used for p(x|w_i).
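A toy sketch of this decision rule with Gaussian class-conditionals; the densities, priors, and values here are illustrative assumptions, not from the slides:

```python
import numpy as np
from scipy.stats import norm

def bayes_classify(x, params, priors):
    """Pick the class w_i maximizing p(x|w_i) * P(w_i)."""
    # params[i] = (mean, std) of the Gaussian class-conditional p(x|w_i).
    scores = [norm.pdf(x, mu, sd) * prior
              for (mu, sd), prior in zip(params, priors)]
    return int(np.argmax(scores))

# Two classes with different means and equal priors; 1.8 is closer to class 1.
print(bayes_classify(1.8, params=[(0.0, 1.0), (2.0, 1.0)], priors=[0.5, 0.5]))
```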

79 Chapter 5. Filtering Images

80 Image Enhancement & Restoration
Image Enhancement operators improve the detectability of important image details or objects by man or machine. Image Restoration attempts to restore a degraded image to an ideal condition.

81 Histogram Stretching

82 Histogram Equalization
The output image should use all available gray levels The output image has approximately the same number of pixels of each gray level.

83 Histogram Equalization Example

84 Steps to perform Histogram Equalization
1. Calculate the cumulative histogram.
2. Normalize by dividing by the total number of pixels.
3. Multiply these values by the maximum gray-level value, then round the result to the closest integer.
4. Map the original values to the results from step 3.
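A compact numpy sketch of these four steps, assuming an 8-bit gray-scale image:

```python
import numpy as np

def equalize(img: np.ndarray) -> np.ndarray:
    """Histogram equalization following the four steps above."""
    hist = np.bincount(img.ravel(), minlength=256)
    cum = np.cumsum(hist)                            # step 1: cumulative histogram
    norm = cum / img.size                            # step 2: normalize by pixel count
    mapping = np.round(norm * 255).astype(np.uint8)  # step 3: scale and round
    return mapping[img]                              # step 4: map original values
```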

85 Practice
Original Gray Level | Number of Pixels | Step 1 | Step 2 | Step 3
0 | 10 | | |
1 | 8  | | |
2 | 9  | | |
3 | 4  | | |
4 | 14 | | |
5 |    | | |
6 |    | | |
7 |    | | |

86 Removal of small regions
Removal of Salt-and-Pepper Noise Removal of Small Components

87 Convolution process Kernel or Mask

88 Convolution

89 Image Smoothing
Mean Filtering
Gaussian Filtering
Median Filtering
Goals: smooth uniform regions while preserving edge structure.
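A brief sketch comparing the three filters with scipy.ndimage (kernel sizes and the stand-in image are illustrative):

```python
import numpy as np
from scipy import ndimage

img = np.random.randint(0, 256, (64, 64)).astype(float)  # stand-in image

mean_f   = ndimage.uniform_filter(img, size=3)    # mean (box) filtering
gauss_f  = ndimage.gaussian_filter(img, sigma=1)  # Gaussian filtering
median_f = ndimage.median_filter(img, size=3)     # median filtering
```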

90 Mean Filtering Example

91 Gaussian Filtering Masks

92 Properties of smoothing masks
The amount of smoothing and noise reduction is proportional to the mask size. Step edges are blurred in proportion to the mask size.

93 Median Filtering Example

94 Detecting Edges

95 Edge Detection Masks

96 Properties of derivative masks
The sum of the coefficients of a derivative mask is zero, so that a zero response is obtained on constant regions. First derivative masks produce high absolute values at points of high contrast. Second derivative masks produce zero-crossings at points of high contrast.
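As an illustration of these properties, a sketch using the Sobel masks (whose coefficients sum to zero) to compute edge magnitude and orientation:

```python
import numpy as np
from scipy import ndimage

# Sobel masks: the coefficients sum to zero,
# so the response on constant regions is zero.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = sobel_x.T

def edge_magnitude_orientation(img):
    gx = ndimage.convolve(img.astype(float), sobel_x)
    gy = ndimage.convolve(img.astype(float), sobel_y)
    mag = np.hypot(gx, gy)    # edge magnitude
    ori = np.arctan2(gy, gx)  # edge orientation (radians)
    return mag, ori
```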

97 Edge Magnitude & Orientation

98 Edge Detection Example

99 Laplacian Of Gaussian (LOG)

100 Two equivalent methods
1. Convolve the image with a Gaussian smoothing filter and compute the Laplacian of the result.
2. Convolve the image with the linear filter that is the Laplacian of the Gaussian filter.
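A sketch checking the equivalence numerically with scipy.ndimage; the value of sigma is illustrative, and the two results agree only up to discretization error:

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64)

# Method 1: Gaussian smoothing, then a discrete Laplacian.
method1 = ndimage.laplace(ndimage.gaussian_filter(img, sigma=2))

# Method 2: single convolution with the Laplacian-of-Gaussian filter.
method2 = ndimage.gaussian_laplace(img, sigma=2)

print(np.abs(method1 - method2).max())  # small, up to discretization error
```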

101 Zero crossing detection
A zero crossing at a pixel implies that the values of the two opposing neighboring pixels in some direction have different signs. There are four cases to test: up/down, left/right, up-left/down-right, and up-right/down-left.

102 Gaussian Equations

103 Gaussian Plots

104 Gaussian Properties
Symmetric.
95% of the total weight is contained within 2σ of the center.
In the first derivative of the 1D Gaussian, the extreme points are located at −σ and +σ.
In the second derivative of the 1D Gaussian, the zero crossings are located at −σ and +σ.
The LOG filter responds well to:
small blobs coinciding with the center lobe
large step edges very close to the center lobe

105 LOG Masks

106 LOG Example

107 Chapter 6. Color & Shading

108 Perception of objects

109 Perception of objects
The spectrum (energy) of the light source.
The spectral reflectance of the object surface.
The spectral sensitivity of the sensor.

110 Human eye is more sensitive to luminance than to chrominance
How do we see an object? Light, object, and eyes.
Luminance → Lightness
Chrominance → Color
The human eye is more sensitive to luminance than to chrominance.

111 Light Spectrum

112 Chromaticity Diagram

113 RGB Model

114 RGB signals from a video camera

115 RGB Colors
Colors specify a mixture of red, green, and blue light, with values between 0.0 (none) and 1.0 (lots):

Color   | Red | Green | Blue
White   | 1.0 | 1.0   | 1.0
Black   | 0.0 | 0.0   | 0.0
Yellow  | 1.0 | 1.0   | 0.0
Magenta | 1.0 | 0.0   | 1.0
Cyan    | 0.0 | 1.0   | 1.0

116 Normalized RGB r+g+b=1

117 HSI Model

118 Light vs. Pigment

119 CMY Model

120 YIQ Model
TV transmission:
digital space → YCbCr
analog space → YIQ (NTSC), YUV (PAL)

121 YUV & YCBCR Model

122 TV Broadcast

123 Color Histogram
Color histograms are relatively invariant to translation, rotation, and scaling.
Simple methods for color histogram construction:
Concatenate the two high-order bits of each RGB color code → 64 bins.
Compute three separate R, G, and B histograms (4 bits each) and concatenate them into one → 48 bins.
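A sketch of the first construction, assuming an 8-bit RGB numpy image:

```python
import numpy as np

def color_histogram_64(rgb: np.ndarray) -> np.ndarray:
    """64-bin histogram from the two high-order bits of R, G, and B."""
    r = rgb[..., 0] >> 6            # top two bits of each channel
    g = rgb[..., 1] >> 6
    b = rgb[..., 2] >> 6
    code = (r << 4) | (g << 2) | b  # 6-bit color code: rrggbb
    return np.bincount(code.ravel(), minlength=64)
```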

124 Similarity measure for histogram matching
It is common to smooth the histogram before matching, to accommodate minor shifts of the reflectance spectrum.

125 Color Segmentation A plot of pixels (r,g) taken from different images containing faces. (r,g) : normalized red and green values

126 Face Detection
Face region classification (R > G > B).
Connected component labeling.
Select the largest component as the face object, assuming there is only one face in the image.
Discard the remaining components or merge them with the face object.
Compute the locations of the eyes and nose.

127 Shading

128 Three types of Material Reflection
Diffuse → color of reflected light from diffuse reflection (light scattered randomly)
Ambient → amount of background light the surface reflects
Specular → color of reflected light from specular reflection (light reflected in a regular manner)

129 Diffuse Reflection

130 Specular Reflection

131 Darkening with Distance

132 Complications The above models of illumination and reflection are simplified. Some objects reflect light as well as emit light. For example: light bulbs. In uncontrolled scenes, such as outdoor scenes, it is much more difficult to account for the different phenomena.

133 Chapter 7 Texture

134 Texture Analysis Structural approach Statistical approach

135 Structural approach

136 Shape from texture Size Shape Density

137 Statistical approach
Edge density and direction
Local binary pattern
Co-occurrence matrices and features
Laws texture energy measures
Autocorrelation

138 Edge density and direction
Gradient magnitude: Mag(p)
Gradient direction: Dir(p)

139 Histogram based texture description
Magnitude histogram: Hmag(R)
Direction histogram: Hdir(R)

140 Example

141 L1 Distance between two histograms
Another similarity measure for histogram matching

142 Local binary pattern
For each pixel p in the image, the eight neighbors are examined to see if their intensity is greater than that of p. The results from the eight neighbors are used to construct an eight-digit binary number b1b2b3b4b5b6b7b8, where bi = 0 if the intensity of the i-th neighbor is less than or equal to that of p, and bi = 1 otherwise. A histogram of these numbers is used to represent the texture of the image.
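A direct, unoptimized sketch of this descriptor for a gray-scale numpy image; border pixels are skipped for simplicity:

```python
import numpy as np

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """256-bin histogram of 8-neighbor local binary patterns."""
    # Clockwise offsets of the 8 neighbors, starting at the top-left.
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    hist = np.zeros(256, dtype=int)
    for r in range(1, img.shape[0] - 1):
        for c in range(1, img.shape[1] - 1):
            code = 0
            for dr, dc in offsets:
                # b_i = 1 if the neighbor is strictly brighter than p.
                code = (code << 1) | int(img[r+dr, c+dc] > img[r, c])
            hist[code] += 1
    return hist
```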

143 Practice

144 Co-occurrence matrices

145 Practice

146 Two variations of the co-occurrence matrix
Normalized co-occurrence matrix Symmetric co-occurrence matrix

147 Features derived from co-occurrence matrix

148 Laws texture energy measures
Remove the effects of illumination by moving a small window around the image and subtracting the local average from each pixel. Nine different 5x5 masks are applied to the preprocessed image, producing nine filtered images. Smooth the filtered images using an absolute-value mean filter. The output is a single image with a vector of nine texture attributes at each pixel.

149 Laws nine 5x5 masks
L5 (Level)  = [ 1  4  6  4  1]
E5 (Edge)   = [-1 -2  0  2  1]
S5 (Spot)   = [-1  0  2  0 -1]
R5 (Ripple) = [ 1 -4  6 -4  1]
The nine masks: L5E5/E5L5, L5S5/S5L5, L5R5/R5L5, E5S5/S5E5, E5R5/R5E5, S5R5/R5S5, E5E5, S5S5, R5R5.
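A sketch of building one 2D mask (L5E5) and its texture energy image following the steps above; the 15x15 window size is an illustrative choice:

```python
import numpy as np
from scipy import ndimage

L5 = np.array([ 1,  4, 6,  4,  1], dtype=float)
E5 = np.array([-1, -2, 0,  2,  1], dtype=float)

# A 2D Laws mask is the outer product of two 1D vectors, e.g. L5E5.
L5E5 = np.outer(L5, E5)

def laws_energy(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # Remove illumination by subtracting a local 15x15 average.
    pre = img - ndimage.uniform_filter(img.astype(float), size=15)
    filtered = ndimage.convolve(pre, mask)
    # Texture energy: smoothed absolute filter response.
    return ndimage.uniform_filter(np.abs(filtered), size=15)
```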

150 Autocorrelation To describe the fineness/coarseness of the texture.

151 Texture Segmentation

152 Chapter 8 Content-Based Image Retrieval

153 Image Database Queries
Query By Keyword: some textual attributes (keywords) should be maintained for each image. The images can be indexed according to these attributes so that they can be rapidly retrieved when a query is issued. This type of query can be expressed in Structured Query Language (SQL).
Query By Example (QBE): the user shows the system a sample image, and the system should return similar images or images containing similar objects.

154 Image Distance & Similarity Measures
Color Similarity Texture Similarity Shape Similarity Object & Relationship similarity

155 Color Similarity
Color percentage matching: e.g. R: 20%, G: 50%, B: 30%.
Color histogram matching: Dhist(I,Q) = (h(I) − h(Q))^T A (h(I) − h(Q)), where A is a similarity matrix; colors that are very similar should have similarity values close to one.
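A one-function sketch of this quadratic-form distance, assuming h(I), h(Q), and A are numpy arrays of compatible shape:

```python
import numpy as np

def hist_distance(h_i: np.ndarray, h_q: np.ndarray, A: np.ndarray) -> float:
    """Quadratic-form histogram distance (h(I)-h(Q))^T A (h(I)-h(Q))."""
    d = h_i - h_q
    return float(d @ A @ d)
```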

156 Color Similarity
Color layout matching compares each grid square of the query to the corresponding grid square of a potential matching image and combines the results into a single image distance, where CI(g) represents the color in grid square g of a database image I and CQ(g) represents the color in the corresponding grid square g of the query image Q. Some suitable representations of color are:
Mean
Mean and standard deviation
Multi-bin histogram

157 IQ based on Color Layout

158 Texture Similarity
Pick and click: suppose T(I) is a texture description vector, a vector of numbers that summarizes the texture in a given image I (for example, Laws texture energy measures); the texture distance measure is then defined in terms of T(I).
Texture layout

159 IQ based on Pick and Click

160 Shape Similarity Shape Histogram Boundary Matching Sketch Matching

161 1. Shape Histogram: Projection Matching
Horizontal & vertical projection: each row and each column becomes a bin in the histogram. The count stored in a bin is the number of 1-pixels that appear in that row or column.
Diagonal projection: an alternative is to define the bins from the top left to the bottom right of the shape.
Size invariant → the number of row bins and the number of column bins in the bounding box can be fixed, and histograms can be normalized before matching.
Translation invariant.
Rotation invariant → compute the axis of the best-fitting ellipse and rotate the shape.

162 Horizontal and vertical projections

163 Diagonal projection

164 1. Shape Histogram: Orientation Histogram
Construct a histogram over the tangent angle at each pixel on the boundary of the shape.
Size invariant → histograms can be normalized before matching.
Translation invariant.
Rotation invariant → choose the bin with the largest count to be the first bin.
Starting point invariant.

165 2. Boundary Matching 1D Fourier Transform on the boundary

166 Fourier Descriptors
If only the first M coefficients (a0, a1, …, aM−1) are used, then the reconstruction is an approximation of un. The coefficients (a0, a1, …, aM−1) are called Fourier descriptors. The Fourier distance measure is defined as shown below.

167 Properties of Fourier Descriptors
Simple geometric transformations of a boundary, such as translation, rotation, and scaling, are related to simple operations of the boundary’s Fourier descriptors.

168 A secret formula A formula for Fourier Descriptor that is invariant to translation, scaling, rotation, and starting point.

169 IQ based on Boundary Matching

170 3. Sketch Matching
Affine transformation to a specified size, followed by median filtering.
Edge detection using a gradient-based edge-finding algorithm → refined edge image.
Thinning and shrinking → abstract image.
The images are divided into grid squares, and matching is performed based on local correlation.

171 3. Sketch Matching
The sketch distance measure is the inverse of the sum of the local correlations, where I(g) refers to grid square g of the abstract image I and Q(g) refers to grid square g of the linear sketch resulting from query image Q.

172 Object and Relational Similarity
Face finding: neural net classifier.
Flesh finding: threshold based on

173 Chapter 9. Motion from 2D Image Sequence

174 Cases of motion
Still camera, still object
Still camera, moving object
Moving camera, still object
Moving camera, moving object
Single object vs. multiple objects

175 Applications
Many safety and security applications, for example:
Automatically switch on a light upon detection of significant motion.
Track the players and ball in a tennis match and provide an analysis of the elements of the game.

176 Uses of motion
Create more observations than are available from a single viewpoint.
Provide information to compute the relative depth of objects, since the images of close objects change faster than the images of remote objects.
Shape from Motion → the multiple viewpoints allow for a triangulating computation similar to binocular stereo.

177 Image Subtraction
Subtract the image I_t from the previous image I_{t-1}.
Subtract the image I from the background image B.
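A minimal sketch covering both variants; the reference frame can be the previous image or a background image, and the threshold value is illustrative:

```python
import numpy as np

def change_mask(curr: np.ndarray, ref: np.ndarray, thresh: float = 25.0):
    """Binary mask of pixels that changed between two gray-scale frames.

    `ref` can be the previous frame I_{t-1} or a background image B.
    """
    diff = np.abs(curr.astype(float) - ref.astype(float))
    return (diff > thresh).astype(np.uint8)
```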

178 Algorithm for Image Subtraction

179 Motion Vectors
Motion field
Focus of expansion (FOE)
Focus of contraction (FOC)
Image flow

180 Decathlete Game

181 Detecting interesting points

182 Search Rectangle

183 Algorithm for Motion Vectors

184 Computing the paths of moving points
Track by motion only; no texture or color information is available for the moving objects.
Assumptions:
The location of a physical object changes smoothly over time.
The velocity of a physical object changes smoothly over time (including both speed and direction).
An object can be at only one location in space at a given time.
Two objects cannot occupy the same location at the same time.

185 Trajectories of two objects

186 Smoothness
If an object i is observed at time instants t = 1, 2, …, n, then the sequence of image points T_i = (p_{i,1}, p_{i,2}, …, p_{i,n}) is called the trajectory of i.

187 Greedy-Exchange Algorithm

188 Example of Greedy-Exchange Algorithm

189 Detecting significant changes in videos
Scene change
Shot change
Camera pan
Camera zoom
Camera effects: fade, dissolve, and wipe
Goal: segment and store video subsequences in digital libraries for random access.

190 Segmenting Video Sequences
The transitions can be used to segment the video and can be detected by large changes in the features of the images over time.

191 Similarity measure by histogram

192 Similarity measure by likelihood ratio
Break the image into larger blocks and test to see if a majority of the blocks are essentially the same in both images.

193 Moving cursor by tracking face

194 Demo

195 Chapter 10. Image Segmentation

196 10.1 Identifying Regions
Clustering
Region Growing

197 Clustering

198 Iterative K-Means Clustering

199 Isodata Clustering

200 Region Growing (Split & Merge Algorithm)
1. Split the image into equally sized regions.
2. Calculate the gray-level variance for each region.
3. If the gray-level variance is larger than a threshold, split the region; otherwise, try to merge the region with its neighbors.
4. Repeat steps 2 and 3.
Gray-level variance:
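The variance formula on the slide was an image; assuming the usual definition over a region R with |R| pixels, it reads:

```latex
\sigma^2 = \frac{1}{|R|} \sum_{(r,c) \in R} \left( I(r,c) - \bar{I}_R \right)^2,
\qquad \bar{I}_R = \frac{1}{|R|} \sum_{(r,c) \in R} I(r,c)
```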

201 Region Growing

202 Recursive Histogram Clustering

203 10.2 Representing Regions
Overlays
Labeled Images
Boundary Coding
Chain Code
Polygonal Approximation
Quadtrees
Property Tables

204 Overlaying

205 Boundary Coding

206 Quadtrees

207 10.3 Identifying Contours
Tracking Existing Region Boundaries (labeled image as input)
Canny Edge Detector (gray-scale image as input)
Aggregating Consistent Neighboring Edges into Curves (binary edge image as input)
Hough Transform (gray-scale image as input)

208 Finding the Borders of Labeled Regions

209 Canny Edge Detector

210 Canny Edge Detector Example

211 Tracking Edges of a Binary Edge Image

212 Hough Transform

213 Accumulator array for Hough Transform

214 Hough Transform for Accumulating Straight Lines

215 Hough Transform Example

216 Hough Transform for Extracting Straight Lines

217 Chap 12. Perceiving 3D from 2D Images

218 Geometric model

219 Epipolar geometry (trivial)

220 Epipolar geometry (general)

221 Epipolar lines and Epipolar curves
Figure 10: Epipolar constraint, showing epipoles e1 and e2 in panels (a) and (b).

