Goals of Computer Vision


Goals of Computer Vision
To make useful decisions based on sensed images.
To construct 3D structure from 2D images.

Related Areas of Computer Vision Computer Graphics Image Processing Pattern Recognition Artificial Intelligence Virtual Reality

Applications of Computer Vision
Human Face Identification
Image Database Query
Inspecting Products
Examining the Inside of a Human Head (MRI)
Optical Character Recognition (OCR)
Analyzing Satellite Images
Unmanned Autonomous Vehicles (UAV)

Human Visual Perception

Human eye

How do we see an object? Light reflects off the object and reaches the eye.
Luminance (lightness) is sensed by the rods; chrominance (color) is sensed by the cones.
The human eye is more sensitive to luminance than to chrominance.

Cones & Rods ( day & night )

Machine Visual Perception

Digital Image

Sampling & Quantization

Types of Images Analog Image Digital Image Binary Image Gray-scale Image Color (Multispectral) Image

Image Formats
Vector Image
Bitmap Image:
RAW: no header
RLE (Run-Length Encoding)
PGM, PPM, PNM (Portable Gray Map)
GIF (Graphics Interchange Format): no more than 256 colors
TIF (Tag Image File Format): scanner
EPS (Encapsulated PostScript): printer
JPEG (Joint Photographic Experts Group): compression ratio
MPEG (Motion Picture Experts Group): video

Comparison of Image Formats

Chap 3 : Binary Image Analysis

Counting Foreground Objects

Example

Counting Object Algorithm

Counting Background Objects

Connected Component Labeling

1. Recursive Connected Components Algorithm

Example

2. Classical Connected Components Algorithm

Size Filter
To remove small noise regions: after connected component labeling, all components below size T are removed by changing the corresponding pixels to 0.

Pepper & Salt Noise Reduction
Change a pixel from 0 to 1 if all of its neighboring pixels are 1.
Change a pixel from 1 to 0 if all of its neighboring pixels are 0.
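This rule can be sketched for a binary NumPy image (my own minimal illustration, not from the slides; it assumes the 8-neighborhood and leaves border pixels untouched):

```python
import numpy as np

def remove_salt_and_pepper(img):
    """Flip isolated pixels: a 0 surrounded by 1s becomes 1, and vice versa."""
    out = img.copy()
    rows, cols = img.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Sum of the eight neighbors of pixel (r, c).
            nbrs = img[r-1:r+2, c-1:c+2].sum() - img[r, c]
            if img[r, c] == 0 and nbrs == 8:
                out[r, c] = 1      # pepper pixel inside a foreground region
            elif img[r, c] == 1 and nbrs == 0:
                out[r, c] = 0      # salt pixel inside the background
    return out

img = np.ones((5, 5), dtype=int)
img[2, 2] = 0                      # one pepper pixel in a white region
print(remove_salt_and_pepper(img)[2, 2])   # 1
```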

Expanding & Shrinking

Example 1

Example 2

Morphological Filter

Example

Example

Closing & Opening

Opening Example

Morphological Filter Example 1

Structure Element Example 1

Morphological Filter Example 2

Structure Element Example 2

Conditional Dilation

Conditional Dilation Example

Area & Centroid

Perimeter

Circularity

Second Moment

Orientation

Bounding Box

Thresholding

P-Tile Method

Mode Method

Mode Algorithm

Iterative Method

Adaptive Method

Adaptive Method Example

Variable Thresholding Example

Double Thresholding Method

Double Thresholding Example

Recursive Histogram Clustering

Chapter 4: Pattern Recognition

Classification
Classification is a process that assigns a label to an object according to some representation of the object's properties. A classifier is a device or algorithm that inputs an object representation and outputs a class label. The reject class is a generic class for objects that cannot be placed in any of the designated known classes.

Error & Reject rate Empirical error rate of a classification system is the number of errors made on independent test data divided by the number of classifications attempted. Empirical reject rate of a classification system is the number of rejects made on independent test data divided by the number of classifications attempted.

False Alarm & Miss Detection
Two-class problem example: whether or not a person has a disease.
False Alarm (False Positive): the system incorrectly says that the person has the disease.
Miss Detection (False Negative): the system incorrectly says that the person does not have the disease.

Receiver Operating Curve (ROC)

Precision & Recall Example: the objective of document retrieval (image retrieval) is to retrieve interesting objects and not too many uninteresting objects according to features supplied in a user’s query. Precision is the number of relevant documents retrieved divided by the total number of documents retrieved. Recall is the number of relevant documents retrieved by the system divided by the total number of relevant documents in the database.

Example Suppose an image database contains 200 sunset images. Suppose an automatic retrieval system retrieves 150 of those 200 relevant images and 100 other images.
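Working through this example with the definitions above (a sketch I added, not on the slide; plain Python):

```python
def precision_recall(relevant_retrieved, total_retrieved, total_relevant):
    """Precision and recall as defined above."""
    precision = relevant_retrieved / total_retrieved
    recall = relevant_retrieved / total_relevant
    return precision, recall

# 150 relevant sunset images retrieved, plus 100 other images,
# out of 200 relevant images in the database.
p, r = precision_recall(150, 150 + 100, 200)
print(p, r)   # precision 0.6, recall 0.75
```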

Features used for representation Area of the character in units of black pixels Height and Width of the bounding box of its pixels Number of holes inside the character Number of strokes forming the character Center (Centroid) of the set of pixels Best axis direction (Orientation) through the pixels as the axis of least inertia Second moments of the pixels about the axis of least inertia and most inertia

Example Features

Classification using nearest mean

Euclidean Distance

Example

Classification using nearest neighbors
A brute-force approach computes the distance from x to all samples in the database and remembers the minimum distance. Then, x is classified into the same category as its nearest neighbor. Advantage: new labeled samples can be added to the database at any time. A better approach is the k-nearest-neighbors rule.
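The brute-force 1-NN rule can be sketched as follows (my own minimal illustration, assuming NumPy; the sample data is made up):

```python
import numpy as np

def nearest_neighbor_classify(x, samples, labels):
    """Brute-force 1-NN: compare x to every stored sample."""
    dists = np.linalg.norm(samples - x, axis=1)   # Euclidean distances
    return labels[int(np.argmin(dists))]

samples = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
labels = ["A", "A", "B", "B"]
print(nearest_neighbor_classify(np.array([4.9, 5.1]), samples, labels))   # B
```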

Structural Pattern Recognition Graph matching algorithm can be used to perform structural pattern recognition.

Two characters with the same global features but different structure Lid : a virtual line segment that closes up a bay. Left : specifies that one lid lies on the left of another. Right : specifies that one lid lies on the right of another.

Confusion Matrix Reject Rate = Error Rate =

Decision Tree 1

Decision Tree 2

Automatic Construction of a Decision Tree
The information content I(C;F) of the class variable C with respect to the feature variable F measures how much knowing F reduces uncertainty about C. The feature variable F with maximum I(C;F) will be selected as the first feature to be tested.

Example I(C,X) = I(C,Y) = I(C,Z) =

General Case
When the selected feature at a branch node does not completely separate a set of training samples into the proper classes, the tree construction algorithm is invoked recursively for the subsets of training samples at each of the child nodes.

Bayesian Decision Making

Bayesian classifier
A Bayesian classifier classifies an object into the class to which it is most likely to belong based on the observed features. In other words, it makes the classification decision wi that maximizes P(wi|x). Since p(x) is the same for all classes, comparing p(x|wi)P(wi) is enough. Poisson, Exponential, and Normal (Gaussian) distributions are commonly used for p(x|wi).
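A hedged sketch of this decision rule for a single scalar feature with Gaussian class-conditional densities (my own illustration; the class parameters and priors below are made up):

```python
import math

def bayes_classify(x, class_params, priors):
    """Pick the class w_i maximizing p(x|w_i) P(w_i), with Gaussian p(x|w_i)."""
    def gaussian(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    scores = {w: gaussian(x, mu, sigma) * priors[w]
              for w, (mu, sigma) in class_params.items()}
    return max(scores, key=scores.get)

params = {"w1": (0.0, 1.0), "w2": (4.0, 1.0)}
priors = {"w1": 0.5, "w2": 0.5}
print(bayes_classify(3.5, params, priors))   # w2: closer to that class mean
```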

Chapter 5. Filtering Images

Image Enhancement & Restoration Image Enhancement operators improve the detectability of important image details or objects by man or machine. Image Restoration attempts to restore a degraded image to an ideal condition.

Histogram Stretching

Histogram Equalization
The output image should use all available gray levels.
The output image has approximately the same number of pixels at each gray level.

Histogram Equalization Example

Steps to perform Histogram Equalization
1. Calculate the cumulative histogram.
2. Normalize by dividing by the total number of pixels.
3. Multiply these values by the maximum gray-level value, then round the result to the closest integer.
4. Map the original values to the results from Step 3.
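The four steps can be sketched in NumPy (my own illustration; `levels` is an assumed parameter for the number of gray levels):

```python
import numpy as np

def equalize(img, levels=256):
    """Histogram equalization following the four steps above."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cum = np.cumsum(hist)                      # step 1: cumulative histogram
    norm = cum / img.size                      # step 2: normalize by pixel count
    mapping = np.round(norm * (levels - 1)).astype(img.dtype)  # step 3
    return mapping[img]                        # step 4: map original values

img = np.array([[0, 0, 1], [1, 2, 3]], dtype=np.uint8)
print(equalize(img, levels=4))
```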

Practice Original Gray-Level Number of pixels Step 1 Step 2 Step 3 10 10 1 8 2 9 3 4 14 5 6 7

Removal of small regions Removal of Salt-and-Pepper Noise Removal of Small Components

Convolution process Kernel or Mask

Convolution
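The convolution process can be sketched directly (my own unoptimized NumPy version; no border padding, so only the valid region is returned):

```python
import numpy as np

def convolve2d(img, kernel):
    """Direct 2D convolution: flip the kernel, then slide it over the image."""
    k = np.flipud(np.fliplr(kernel))           # convolution flips the mask
    kh, kw = k.shape
    ih, iw = img.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (img[r:r+kh, c:c+kw] * k).sum()
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
mean_mask = np.ones((3, 3)) / 9.0              # 3x3 mean (smoothing) mask
print(convolve2d(img, mean_mask))
```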

Image Smoothing
Mean Filtering, Gaussian Filtering, Median Filtering.
Goals: smooth uniform regions while preserving edge structure.

Mean Filtering Example

Gaussian Filtering Masks

Properties of smoothing masks The amount of smoothing and noise reduction is proportional to the mask size. Step edges are blurred in proportion to the mask size.

Median Filtering Example

Detecting Edges

Edge Detection Masks

Properties of derivative masks
The sum of the coefficients of a derivative mask is zero, so that a zero response is obtained on constant regions.
First derivative masks produce high absolute values at points of high contrast.
Second derivative masks produce zero-crossings at points of high contrast.

Edge Magnitude & Orientation

Edge Detection Example

Laplacian Of Gaussian (LOG)

Two equivalent methods
1. Convolve the image with a Gaussian smoothing filter and compute the Laplacian of the result.
2. Convolve the image with the linear filter that is the Laplacian of the Gaussian filter.

Zero crossing detection
A zero crossing at a pixel implies that the values of the two opposing neighboring pixels in some direction have different signs. There are four cases to test: up/down, left/right, up-left/down-right, up-right/down-left.

Gaussian Equations

Gaussian Plots

Gaussian Properties
Symmetric.
95% of the total weight is contained within 2σ of the center.
In the first derivative of a 1D Gaussian, the extreme points are located at −σ and +σ.
In the second derivative of a 1D Gaussian, the zero crossings are located at −σ and +σ.
The LOG filter responds well to:
small blobs coinciding with the center lobe;
large step edges very close to the center lobe.

LOG Masks

LOG Example

Chapter 6. Color & Shading

Perception of objects

Perception of objects
The spectrum (energy) of the light source.
The spectral reflectance of the object surface.
The spectral sensitivity of the sensor.

How do we see an object? Light reflects off the object and reaches the eyes.
Luminance gives lightness; chrominance gives color.
The human eye is more sensitive to luminance than to chrominance.

Light Spectrum

Chromaticity Diagram

RGB Model

RGB signals from a video camera

RGB Colors
Colors specify a mixture of red, green, and blue light, with values between 0.0 (none) and 1.0 (lots).
Color    Red  Green  Blue
White    1.0  1.0    1.0
Black    0.0  0.0    0.0
Yellow   1.0  1.0    0.0
Magenta  1.0  0.0    1.0
Cyan     0.0  1.0    1.0

Normalized RGB r+g+b=1

HSI Model

Light vs. Pigment

CMY Model

YIQ Model
TV transmission:
digital space: YCbCr
analog space: YIQ (NTSC) or YUV (PAL)

YUV & YCbCr Models

TV Broadcast

Color Histogram
Color histograms are relatively invariant to translation, rotation, and scaling.
Simple methods for color histogram construction:
Concatenate the two high-order bits of each RGB color code: 64 bins.
Compute three separate R, G, and B histograms (4 bits each) and concatenate them into one: 48 bins.
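The first construction method can be sketched as follows (my own NumPy illustration; it assumes an H×W×3 uint8 RGB image):

```python
import numpy as np

def color_histogram_64(img):
    """64-bin histogram: concatenate the top two bits of R, G, and B."""
    r = img[..., 0] >> 6              # keep the two high-order bits of each channel
    g = img[..., 1] >> 6
    b = img[..., 2] >> 6
    bins = (r << 4) | (g << 2) | b    # 6-bit code, 0..63
    return np.bincount(bins.ravel(), minlength=64)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = [255, 0, 0]               # pure red falls in bin 0b110000 = 48
h = color_histogram_64(img)
print(h[48], h[0])                    # one red pixel, three black pixels
```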

Similarity measure for histogram matching
It is common to smooth the histogram before matching, to adapt to minor shifts of the reflectance spectrum.

Color Segmentation A plot of pixels (r,g) taken from different images containing faces. (r,g) : normalized red and green values

Face Detection
1. Face region classification (R > G > B).
2. Connected component labeling.
3. Select the largest component as the face object, assuming there is only one face in the image.
4. Discard the remaining components or merge them with the face object.
5. Compute the location of the eyes and nose.

Shading

Three types of Material Reflection
Diffuse: color of reflected light from diffuse reflection (light scattered randomly).
Ambient: amount of background light the surface reflects.
Specular: color of reflected light from specular reflection (light reflected in a regular manner).

Diffuse Reflection

Specular Reflection

Darkening with Distance

Complications The above models of illumination and reflection are simplified. Some objects reflect light as well as emit light. For example: light bulbs. In uncontrolled scenes, such as outdoor scenes, it is much more difficult to account for the different phenomena.

Chapter 7 Texture

Texture Analysis Structural approach Statistical approach

Structural approach

Shape from texture Size Shape Density

Statistical approach Edge density and direction Local binary partition Co-occurrence matrices and features Laws texture energy measures Autocorrelation

Edge density and direction Gradient magnitude : Mag(p) Gradient direction : Dir(p)

Histogram based texture description Magnitude histogram: Hmag(R) Direction histogram: Hdir(R)

Example

L1 Distance between two histograms Another similarity measure for histogram matching

Local binary partition
For each pixel p in the image, the eight neighbors are examined to see if their intensity is greater than that of p. The results from the eight neighbors are used to construct an eight-digit binary number b1b2b3b4b5b6b7b8, where bi = 0 if the intensity of the ith neighbor is less than or equal to that of p, and bi = 1 otherwise. A histogram of these numbers is used to represent the texture of the image.
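A minimal sketch of this texture descriptor (my own; it examines interior pixels only, with the eight neighbors visited clockwise from the top-left, which is one of several possible orderings):

```python
import numpy as np

def local_binary_pattern(img):
    """8-bit code per interior pixel: bit i is 1 iff neighbor i is brighter than p."""
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    rows, cols = img.shape
    codes = np.zeros((rows - 2, cols - 2), dtype=int)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            code = 0
            for dr, dc in offsets:
                code = (code << 1) | int(img[r+dr, c+dc] > img[r, c])
            codes[r-1, c-1] = code
    return np.bincount(codes.ravel(), minlength=256)   # texture histogram

# A uniform image: every interior pixel gets code 0.
print(local_binary_pattern(np.full((4, 4), 5))[0])   # 4
```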

Practice

Co-occurrence matrices

Practice

Two variations of the co-occurrence matrix Normalized co-occurrence matrix Symmetric co-occurrence matrix

Features derived from co-occurrence matrix

Laws texture energy measures Remove the effects of illumination by moving a small window around the image, and subtracting the local average from each pixel. Nine different 5x5 masks are applied to the preprocessed image, producing 9 images. Smooth the images using absolute mean filter. The output is a single image with a vector of nine texture attributes at each pixel.

Laws' nine 5x5 masks
L5 (Level)  = [ 1  4  6  4  1]
E5 (Edge)   = [-1 -2  0  2  1]
S5 (Spot)   = [-1  0  2  0 -1]
R5 (Ripple) = [ 1 -4  6 -4  1]
The nine masks: L5E5/E5L5, L5S5/S5L5, L5R5/R5L5, E5S5/S5E5, E5R5/R5E5, S5R5/R5S5, E5E5, S5S5, R5R5.
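Each 5x5 Laws mask is the outer product of two of the 1D vectors above; a small sketch (my own):

```python
import numpy as np

# The four 1D vectors named on the slide.
L5 = np.array([ 1,  4, 6,  4,  1])   # Level
E5 = np.array([-1, -2, 0,  2,  1])   # Edge
S5 = np.array([-1,  0, 2,  0, -1])   # Spot
R5 = np.array([ 1, -4, 6, -4,  1])   # Ripple

def laws_mask(a, b):
    """A 5x5 Laws mask is the outer product of two 1D vectors."""
    return np.outer(a, b)

# e.g. the E5L5 mask from the E5L5/L5E5 pair:
print(laws_mask(E5, L5))
```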

Autocorrelation To describe the fineness/coarseness of the texture.

Texture Segmentation

Chapter 8 Content-Based Image Retrieval

Image Database Queries
Query By Keyword: some textual attributes (keywords) should be maintained for each image. The images can be indexed according to these attributes, so that they can be rapidly retrieved when a query is issued. This type of query can be expressed in Structured Query Language (SQL).
Query By Example (QBE): the user just shows the system a sample image, and the system should be able to return similar images or images containing similar objects.

Image Distance & Similarity Measures Color Similarity Texture Similarity Shape Similarity Object & Relationship similarity

Color Similarity
Color percentages matching: R: 20%, G: 50%, B: 30%.
Color histogram matching: Dhist(I,Q) = (h(I) − h(Q))^T A (h(I) − h(Q)), where A is a similarity matrix: colors that are very similar should have similarity values close to one.

Color Similarity
Color layout matching: compare each grid square of the query to the corresponding grid square of a potential matching image, and combine the results into a single image distance, where CI(g) represents the color in grid square g of a database image I and CQ(g) represents the color in the corresponding grid square g of the query image Q. Some suitable representations of color are:
Mean
Mean and standard deviation
Multi-bin histogram

IQ based on Color Layout

Texture Similarity
Pick and click: suppose T(I) is a texture description vector, i.e. a vector of numbers that summarizes the texture in a given image I (for example, Laws texture energy measures); the texture distance measure is then defined from these vectors.
Texture layout

IQ based on Pick and Click

Shape Similarity Shape Histogram Boundary Matching Sketch Matching

1. Shape Histogram
Projection matching:
Horizontal & vertical projection: each row and each column becomes a bin in the histogram. The count that is stored in a bin is the number of 1-pixels that appear in that row or column.
Diagonal projection: an alternative is to define the bins from the top left to the bottom right of the shape.
Size invariant: the number of row bins and the number of column bins in the bounding box can be fixed, and histograms can be normalized before matching.
Translation invariant.
Rotation invariant: compute the axis of the best-fitting ellipse and rotate the shape.

Horizontal and vertical projections

Diagonal projection

1. Shape Histogram
Orientation histogram: construct a histogram over the tangent angle at each pixel on the boundary of the shape.
Size invariant: histograms can be normalized before matching.
Translation invariant.
Rotation invariant: choose the bin with the largest count to be the first bin.
Starting point invariant.

2. Boundary Matching 1D Fourier Transform on the boundary

Fourier Descriptors
If only the first M coefficients (a0, a1, …, aM−1) are used, the reconstruction is an approximation of un. The coefficients (a0, a1, …, aM−1) are called the Fourier descriptors, and the Fourier distance measure is defined from them.

Properties of Fourier Descriptors Simple geometric transformations of a boundary, such as translation, rotation, and scaling, are related to simple operations of the boundary’s Fourier descriptors.

A secret formula A formula for Fourier Descriptor that is invariant to translation, scaling, rotation, and starting point.

IQ based on Boundary Matching

3. Sketch Matching
1. Affine transformation to a specified size, followed by median filtering.
2. Edge detection using a gradient-based edge-finding algorithm: refined edge image.
3. Thinning and shrinking: abstract image.
4. The images are divided into grid squares, and matching is performed based on local correlation.

3. Sketch Matching
The sketch distance measure is the inverse of the sum of the local correlations, where I(g) refers to grid square g of the abstract image I, and Q(g) refers to grid square g of the linear sketch resulting from query image Q.

Object and Relational Similarity Face finding: Neural net classifier Flesh finding: then threshold based on

Chapter 9. Motion from 2D Image Sequence

Cases of motion Still camera, still object Still camera, moving object Moving camera, still object Moving camera, moving object Single object Multiple objects

Applications
Many safety and security applications:
Automatically switch on a light upon detection of significant motion.
Track the players and ball in a tennis match and provide an analysis of the elements of the game.

Uses of motion Create more observations than available from a single viewpoint. Provide information to compute relative depth of objects since the images of close objects change faster than the images of remote objects. Shape from Motion  the multiple viewpoints allow for a triangulating computation similar to binocular stereo.

Image Subtraction Subtract the image It from the previous image It-1 Subtract the image I from the background image B

Algorithm for Image Subtraction
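The subtraction step can be sketched as follows (my own minimal NumPy version; the threshold of 20 is an arbitrary choice):

```python
import numpy as np

def changed_pixels(curr, prev, threshold=20):
    """Binary change mask: 1 where |I_t - I_{t-1}| exceeds the threshold."""
    diff = np.abs(curr.astype(int) - prev.astype(int))
    return (diff > threshold).astype(np.uint8)

prev = np.zeros((3, 3), dtype=np.uint8)
curr = prev.copy()
curr[1, 1] = 200                   # a moving object brightens one pixel
print(changed_pixels(curr, prev))
```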

Motion Vectors Motion field Focus of expansion (FOE) Focus of contraction (FOC) Image flow

Decathlete Game

Detecting interesting points

Search Rectangle

Algorithm for Motion Vectors

Computing the paths of moving points
Track by motion only; no texture or color information for moving objects.
Assumptions:
The location of a physical object changes smoothly over time.
The velocity of a physical object changes smoothly over time (including both speed and direction).
An object can be at only one location in space at a given time.
Two objects cannot occupy the same location at the same time.

Trajectories of two objects

Smoothness If an object i is observed at time instants t=1,2,…,n, then the sequence of image points Ti=(pi,1, pi,2, …, pi,n) is called the trajectory of i.

Greedy-Exchange Algorithm

Example of Greedy-Exchange Algorithm

Detecting significant changes in videos
Scene change, shot change, camera pan, camera zoom, camera effects (fade, dissolve, and wipe).
Goal: segment and store video subsequences in digital libraries for random access.

Segmenting Video Sequences The transitions can be used to segment the video and can be detected by large changes in the features of the images over time.

Similarity measure by histogram

Similarity measure by likelihood ratio Break the image into larger blocks and test to see if a majority of the blocks are essentially the same in both images.

Moving cursor by tracking face

Demo

Chapter 10. Image Segmentation

10.1 Identifying Regions Clustering Region Growing

Clustering

Iterative K-Means Clustering
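A minimal sketch of the iterative K-means procedure (my own illustration, assuming NumPy, float-valued points, and random initialization from the data; the sample points are made up):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Iterative K-means: assign each point to the nearest mean, then re-estimate."""
    rng = np.random.default_rng(seed)
    means = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every cluster mean.
        d = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                means[j] = points[labels == j].mean(axis=0)
    return labels, means

pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, means = kmeans(pts, k=2)
print(labels)   # two well-separated clusters
```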

Isodata Clustering

Region Growing (Split & Merge Algorithm)
1. Split the image into equally sized regions.
2. Calculate the gray-level variance for each region.
3. If the gray-level variance is larger than a threshold, split the region; otherwise, try to merge the region with its neighbors.
4. Repeat Steps 2 and 3.
Gray-level variance: the variance of the pixel gray levels within the region.

Region Growing

Recursive Histogram Clustering

10.2 Representing Regions Overlays Labeled Images Boundary Coding Chain Code Polygonal Approximation Quadtrees Property Tables

Overlaying

Boundary Coding

Quadtrees

10.3 Identifying Contours Tracking Existing Region Boundaries (labeled image as input) Canny Edge Detector (gray-scale image as input) Aggregating Consistent Neighboring Edges into Curves (binary edge image as input) Hough Transform (gray-scale image as input)

Finding the Borders of Labeled Regions

Canny Edge Detector

Canny Edge Detector Example

Tracking Edges of a Binary Edge Image

Hough Transform

Accumulator array for Hough Transform

Hough Transform for Accumulating Straight Lines
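The accumulation step for straight lines can be sketched using the ρ = x cos θ + y sin θ parameterization (my own illustration; 1-degree θ bins and 1-pixel ρ bins are arbitrary choices):

```python
import numpy as np

def hough_lines(edge_img, n_theta=180):
    """Accumulate votes in (rho, theta) space for each edge pixel."""
    ys, xs = np.nonzero(edge_img)
    diag = int(np.ceil(np.hypot(*edge_img.shape)))   # largest possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=int)
    for x, y in zip(xs, ys):
        for t, theta in enumerate(thetas):
            rho = int(round(x * np.cos(theta) + y * np.sin(theta)))
            acc[rho + diag, t] += 1                  # offset so negative rho fits
    return acc, diag

# A horizontal line y = 2 in a small binary edge image:
img = np.zeros((5, 5), dtype=int)
img[2, :] = 1
acc, diag = hough_lines(img)
# Peak at theta = 90 degrees, rho = 2 (x cos90 + y sin90 = y = 2).
print(acc[2 + diag, 90])   # all five edge pixels vote here
```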

Hough Transform Example

Hough Transform for Extracting Straight Lines

Chap 12. Perceiving 3D from 2D Images

Geometric model

Epipolar geometry (trivial)

Epipolar geometry (general)

Epipolar lines and Epipolar curves
Figure 10: Epipolar constraint, showing epipoles e1 and e2 in views (a) and (b).