Sampling, Template Matching and Pyramids

Sampling, Template Matching and Pyramids
T-11 Computer Vision, University of Ioannina
Christophoros Nikou
Images and slides from:
James Hays, Brown University, Computer Vision course
Svetlana Lazebnik, University of North Carolina at Chapel Hill, Computer Vision course
D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2011.
R. Gonzalez and R. Woods. Digital Image Processing, Prentice Hall, 2008.

Jean Baptiste Joseph Fourier (1768-1830)
"...the manner in which the author arrives at these equations is not exempt of difficulties and...his analysis to integrate them still leaves something to be desired on the score of generality and even rigour."
Laplace, Lagrange, Legendre
Had a crazy idea (1807): any univariate function can be rewritten as a weighted sum of sines and cosines of different frequencies.
Don't believe it? Neither did Lagrange, Laplace, Poisson and other bigwigs. Not translated into English until 1878!
But it's (mostly) true! It is called the Fourier series, and there are some subtle restrictions.

A sum of sines
Any function that periodically repeats itself can be expressed as a sum of sines and cosines of different frequencies, each multiplied by a different coefficient. Add many terms to approximate any signal.

Other signals
We can also think of all kinds of other signals the same way. (Image: xkcd.com)

The Fourier Transform
Represent a function on a new basis.
Think of functions as vectors with many components. We apply a linear transformation to change the basis: take the dot product with each basis element.
In the Fourier transform, u and v select the basis element, so a function of x and y becomes a function of u and v.
For a fixed pair of frequencies (u,v), the basis elements have the form
$$e^{-i 2\pi (ux + vy)} = \cos\big(2\pi(ux+vy)\big) - i\,\sin\big(2\pi(ux+vy)\big)$$

The Fourier Transform
The FT is a complex function having a magnitude and a phase for each (u,v) pair.
The FT is linear.
It "measures" how much of the sinusoid at spatial frequency (u,v) is carried by the image.
It may be discretized to provide the DFT.
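The properties above can be checked numerically with the DFT. The following is a minimal sketch, assuming NumPy and a synthetic 8x8 test image (the image, its size and the variable names are illustrative and not from the slides). Note that `np.fft.fft2` indexes its result by row frequency first, so a pattern that varies along the columns shows up in row 0.

```python
import numpy as np

# Hypothetical test image: a cosine with 2 cycles across the width, constant down the rows.
N = 8
x = np.arange(N)
image = np.tile(np.cos(2 * np.pi * 2 * x / N), (N, 1))

F = np.fft.fft2(image)   # complex DFT, one value per frequency pair
magnitude = np.abs(F)    # how much of each sinusoid the image carries
phase = np.angle(F)      # where each sinusoid is positioned

# Linearity: the DFT of a weighted sum equals the weighted sum of the DFTs.
other = np.ones((N, N))
lhs = np.fft.fft2(2.0 * image + 3.0 * other)
rhs = 2.0 * F + 3.0 * np.fft.fft2(other)
print(np.allclose(lhs, rhs))
```

For this image the magnitude is non-zero only at the two bins that correspond to plus and minus 2 cycles across the width, which is what "measuring" the sinusoid content means in practice.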

The Fourier Transform
The real part of some basis elements (complex exponentials), shown for (u,v) = (0, 0.4), (u,v) = (1, 2) and (u,v) = (10, -5).

The Fourier Transform
Figure panels: image, FT magnitude, FT phase.

The Fourier Transform
Figure panels: phase of zebra with magnitude of tiger; phase of tiger with magnitude of zebra.
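A phase and magnitude swap of this kind can be sketched as follows, assuming NumPy and two grayscale arrays of equal size. The function name `swap_phase_magnitude` and the random stand-in images are hypothetical; this is an illustration, not the code used to produce the slide figures.

```python
import numpy as np

def swap_phase_magnitude(img_a, img_b):
    """Return an image with the FT magnitude of img_a and the FT phase of img_b."""
    Fa = np.fft.fft2(img_a)
    Fb = np.fft.fft2(img_b)
    mixed = np.abs(Fa) * np.exp(1j * np.angle(Fb))
    # For real inputs the mixed spectrum stays conjugate-symmetric,
    # so the imaginary part of the inverse is only rounding noise.
    return np.real(np.fft.ifft2(mixed))

# Random arrays stand in for the zebra and tiger images.
rng = np.random.default_rng(0)
a = rng.random((16, 16))
b = rng.random((16, 16))
hybrid = swap_phase_magnitude(a, b)   # magnitude of a, phase of b
same = swap_phase_magnitude(a, a)     # sanity check: reproduces a
```

With real photographs, the result usually resembles the image that contributed the phase.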

The Convolution Theorem
The Fourier transform of the convolution of two functions is the product of their Fourier transforms.
The inverse Fourier transform of the product of two Fourier transforms is the convolution of the two inverse Fourier transforms.
Convolution in the spatial domain is equivalent to multiplication in the frequency domain!
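For the DFT, the theorem holds for circular (wrap-around) convolution. The sketch below, assuming NumPy and two small random arrays, compares a direct circular convolution with the product of DFTs. The helper `circular_convolve2d` is written here for illustration only.

```python
import numpy as np

def circular_convolve2d(f, g):
    """Direct circular convolution of two equally sized 2D arrays."""
    H, W = f.shape
    out = np.zeros((H, W))
    for m in range(H):
        for n in range(W):
            for k in range(H):
                for l in range(W):
                    out[m, n] += f[k, l] * g[(m - k) % H, (n - l) % W]
    return out

rng = np.random.default_rng(0)
f = rng.random((6, 6))
g = rng.random((6, 6))

spatial = circular_convolve2d(f, g)
via_fft = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(g)))
print(np.allclose(spatial, via_fft))
```

To reproduce ordinary (non-circular) convolution this way, both arrays would first need to be zero-padded to the size of the full output.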

The Fourier Transform
The value of the FT at a particular frequency pair (u,v) depends on the whole image. A local change in the image affects all the values of the FT, so it is difficult to use the FT alone as a local image representation.
The magnitudes of the FT of different images tend to be similar; the phase component is what seems to differ.
The FT helps us to explain the difference between a continuous image and its discrete version.

Sampling
Why does a lower resolution image still make sense to us? What do we lose?

The procedure: subsampling by a factor of 2
Throw away every other row and column to create a 1/2 size image
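In array terms this procedure is a strided slice. A minimal sketch, assuming NumPy and a grayscale image stored as a 2D array (the function name and test arrays are illustrative):

```python
import numpy as np

def subsample_by_2(image):
    """Keep every other row and column (no pre-filtering)."""
    return image[::2, ::2]

image = np.arange(36).reshape(6, 6)
half = subsample_by_2(image)          # shape (3, 3)

# A checkerboard with one-pixel checks collapses to a constant image:
# every kept sample lands on a square of the same colour.
board = np.indices((8, 8)).sum(axis=0) % 2
board_half = subsample_by_2(board)
print(half.shape, np.unique(board_half))
```

The checkerboard case shows why dropping samples without smoothing can misrepresent the signal.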

Sampling
Top left: the sampling of the board seems reasonable.
Top right: also reasonable, although it is sparser.
Bottom left: the samples give an all-black (dubious) signal.
Bottom right: the samples give checks that are too big.

Aliasing problem
1D example (sine wave). Source: S. Marschner

Sampling
Aliasing:
Wagon wheels rolling the wrong way in movies.
Checkerboards misrepresented.
Striped shirts look funny on color television.
The Nyquist theorem says that we should sample at a rate of at least twice the maximum frequency carried by the continuous signal.
If this frequency is not known, remove some high frequencies before sampling. This loses information, but it is better than aliasing.
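A 1D illustration of aliasing, assuming NumPy. The specific numbers (a 9 Hz sine sampled at 10 Hz) are chosen for illustration and are not from the slides. Because 10 Hz is below the 18 Hz required by the Nyquist criterion, the samples are indistinguishable from those of a 1 Hz sine with inverted sign.

```python
import numpy as np

fs = 10.0                 # sampling rate in Hz (too low for a 9 Hz signal)
n = np.arange(20)
t = n / fs                # sample times in seconds

high = np.sin(2 * np.pi * 9.0 * t)     # samples of the 9 Hz sine
alias = -np.sin(2 * np.pi * 1.0 * t)   # samples of an inverted 1 Hz sine

def min_sampling_rate(f_max):
    """Lowest rate that satisfies the Nyquist criterion for bandwidth f_max."""
    return 2.0 * f_max

print(np.allclose(high, alias), min_sampling_rate(9.0))
```

The wagon-wheel effect is the same phenomenon in time: the frame rate is too low for the rotation frequency of the spokes.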

Sampling
A common and interesting case is when we want to halve the width and height of an image (recursively).
A Gaussian filter is generally applied first to remove high frequencies and avoid aliasing.
Remember that the FT of a Gaussian of standard deviation σ is also a Gaussian, with standard deviation 1/σ in angular frequency, so a larger σ removes more high frequencies.
The selection of the filter standard deviation is therefore important.
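To see why the choice of σ matters, one can evaluate the Gaussian's frequency response at the new Nyquist limit. This sketch assumes a continuous, unit-area Gaussian with σ in pixels and frequency u in cycles per pixel, for which the gain is exp(-2π²σ²u²). In these units the standard deviation is 1/(2πσ); the 1/σ relation holds in angular frequency. A sampled, truncated kernel will deviate slightly from these values, and the function name is illustrative.

```python
import math

def gaussian_frequency_gain(sigma, u):
    """Gain of a unit-area Gaussian (std sigma, in pixels) at u cycles per pixel."""
    return math.exp(-2.0 * (math.pi * sigma * u) ** 2)

# After halving the resolution, the highest representable frequency
# drops from 0.5 to 0.25 cycles per original pixel.
new_nyquist = 0.25
for sigma in (1.0, 1.4):
    gain = gaussian_frequency_gain(sigma, new_nyquist)
    print(f"sigma={sigma}: gain at new Nyquist limit = {gain:.3f}")
```

Under these assumptions σ=1 still passes roughly 29% of the amplitude at the new limit, while σ=1.4 passes roughly 9%: less aliasing, but more blur.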

Sampling
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer.

Sampling
Sampling without smoothing. Notice the aliasing at the coarse resolution levels.
Figure panels: image, FT magnitude.

Sampling
Sampling with smoothing by a Gaussian with σ=1. Aliasing is reduced (along with some high frequencies).
Figure panels: image, FT magnitude.

Sampling
Sampling with smoothing by a Gaussian with σ=1.4. This reduces aliasing but removes more high-frequency components than σ=1.
Figure panels: image, FT magnitude.

Subsampling without pre-filtering
Figure panels: 1/2, 1/4 (2x zoom), 1/8 (4x zoom). Slide by Steve Seitz

Subsampling with pre-filtering
Figure panels: Gaussian 1/2, G 1/4, G 1/8. Slide by Steve Seitz

Application: Hybrid Images
People may appear sad, up close, but step back a few meters and look at the expressions again. A. Oliva, A. Torralba and P. G. Schyns. Hybrid images. SIGGRAPH 2006.

Salvador Dali invented Hybrid Images?
“Gala Contemplating the Mediterranean Sea, which at 30 meters becomes the portrait of Abraham Lincoln”, 1976

Clues from Human Perception
Early processing in humans filters for various orientations and scales of frequency.
Perceptual cues in the mid-high frequencies dominate perception.
When we see an image from far away, we are effectively subsampling it.
Figure: early visual processing, multi-scale edge and blob filters.

Campbell-Robson contrast sensitivity curve
Perceptual cues in the mid-high frequencies dominate perception

Filters as Templates
Applying a filter at some point can be seen as taking a dot product between the image patch at that point and the filter, both considered as vectors. The response is strong at locations where these vectors are parallel.
Filtering the image is a set of dot products.
Insight: filters find effects that look like them (they have a large positive response at these effects).
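The dot-product view can be written out directly. This is a minimal sketch, assuming NumPy; the helper `correlate_valid`, the zero-mean vertical-bar filter and the toy image are all illustrative choices. It evaluates the filter only where it fits entirely inside the image.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Response map: one dot product per location where the kernel fits."""
    kh, kw = kernel.shape
    g = kernel.ravel()
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            patch = image[m:m + kh, n:n + kw].ravel()
            out[m, n] = np.dot(g, patch)   # filter and patch as vectors
    return out

# A zero-mean filter that looks like a bright vertical bar.
bar_filter = np.array([[-1.0, 2.0, -1.0]] * 3)

# Toy image containing a vertical bar in column 2.
image = np.zeros((5, 5))
image[:, 2] = 1.0

response = correlate_valid(image, bar_filter)
print(response)
```

The response is largest where the patch is centred on the bar, and negative where the bar falls on the filter's negative side.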

Filters as Templates
Figure panels: image and filter; positive responses; zero-mean image (-max:max).

Template matching
Goal: find the template in the image.
Main challenge: what is a good similarity or distance measure between two patches?
Correlation
Zero-mean correlation
Sum of squared differences
Normalized cross-correlation
Slide: Hoiem

Matching with filters
Goal: find the template in the image.
Method 0: filter the image with the eye patch:
$$h[m,n] = \sum_{k,l} g[k,l]\, f[m+k,\, n+l]$$
where f = image and g = filter.
What went wrong? The response is stronger for higher intensity, so the value may be large only because of locally high intensities. Something should be changed.
Figure panels: input, filtered image. Slide: Hoiem

Method 1: filter the image with the zero-mean eye patch:
$$h[m,n] = \sum_{k,l} \big(g[k,l] - \bar{g}\big)\, f[m+k,\, n+l]$$
where \bar{g} is the mean of the template g.
The result contains true detections and false detections.
The filter likes bright pixels where the filter is above average and dark pixels where the filter is below average.
Problems: the response is sensitive to gain/contrast; pixels in the filter that are near the mean have little effect; the method does not require pixel values in the image to be near or proportional to values in the filter.
Figure panels: input, filtered image (scaled), thresholded image. Slide: Hoiem

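Matching with filters
Goal: find the template in the image.
Method 2: SSD
$$h[m,n] = \sum_{k,l} \big(g[k,l] - f[m+k,\, n+l]\big)^2$$
True detections.
Problem: SSD is sensitive to average intensity.
Figure panels: input, 1 - sqrt(SSD), thresholded image. Slide: Hoiem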

Matching with filters
Can SSD be implemented with linear filters?
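One way to approach the question is to expand the square: the SSD between a template g and the patch of f at (m,n) equals sum(g²) - 2·sum(g·f) + sum(f²). The first term is a constant, the second is a correlation of the image with the template, and the third is a box filter applied to the squared image. So SSD needs one pointwise squaring plus two linear filters. The sketch below, assuming NumPy and illustrative helper names, checks this against a direct computation.

```python
import numpy as np

def correlate_valid(image, kernel):
    """Cross-correlation at positions where the kernel fits inside the image."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum(kernel * image[m:m + kh, n:n + kw])
    return out

def ssd_direct(image, template):
    """Sum of squared differences computed patch by patch."""
    th, tw = template.shape
    out = np.empty((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            out[m, n] = np.sum((template - image[m:m + th, n:n + tw]) ** 2)
    return out

def ssd_via_filters(image, template):
    template_energy = np.sum(template ** 2)                           # constant
    cross = correlate_valid(image, template)                          # linear filter on f
    patch_energy = correlate_valid(image ** 2, np.ones_like(template))  # box filter on f^2
    return template_energy - 2.0 * cross + patch_energy

rng = np.random.default_rng(0)
image = rng.random((8, 8))
template = image[2:5, 3:6].copy()     # template cut from a known location

ssd = ssd_via_filters(image, template)
best = np.unravel_index(np.argmin(ssd), ssd.shape)
print(best)
```

The minimum of the SSD map falls at the location the template was cut from.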

Matching with filters
Goal: find the template in the image.
Method 2: SSD
What's the potential downside of SSD? It is sensitive to local contrast changes.
Problem: SSD is sensitive to average intensity.
Figure panels: input, 1 - sqrt(SSD). Slide: Hoiem

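Method 3: Normalized cross-correlation
$$h[m,n] = \frac{\sum_{k,l} \big(g[k,l]-\bar{g}\big)\big(f[m+k,\,n+l]-\bar{f}_{m,n}\big)}{\Big(\sum_{k,l} \big(g[k,l]-\bar{g}\big)^2 \; \sum_{k,l} \big(f[m+k,\,n+l]-\bar{f}_{m,n}\big)^2\Big)^{1/2}}$$
where \bar{g} is the mean of the template and \bar{f}_{m,n} is the mean of the image patch at (m,n).
Invariant to mean and scale of intensity.
Matlab: normxcorr2(template, im)
Slide: Hoiem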

Method 3: Normalized cross-correlation
True detections.
Figure panels: input, normalized cross-correlation, thresholded image. Slide: Hoiem

Normalized Cross Correlation
Filters as dot products: NCC is the cosine of the angle between the template and the image patch, each with its mean subtracted and considered as a vector.
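The cosine interpretation can be sketched for a single patch as follows, assuming NumPy. The function name `ncc` and the convention of returning 0 for a flat patch are choices made here for illustration; this is not a reimplementation of MATLAB's normxcorr2.

```python
import numpy as np

def ncc(patch, template):
    """Cosine of the angle between the zero-mean patch and the zero-mean template."""
    p = patch.ravel() - patch.mean()
    g = template.ravel() - template.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(g)
    if denom == 0.0:
        return 0.0   # flat patch or template: the angle is undefined, 0 by convention here
    return float(np.dot(p, g) / denom)

rng = np.random.default_rng(0)
template = rng.random((5, 5))

brighter = 3.0 * template + 10.0   # same pattern with a gain and an offset
inverted = -template               # contrast-reversed pattern

print(ncc(brighter, template), ncc(inverted, template))
```

A gain and offset change leaves the score at 1, which is the invariance to mean and scale of intensity, while a contrast reversal gives -1.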

Application: Controlling the TV by Finding Hands
A system responding to human gesture.
The computer vision system needs to determine whether one of a small set of events has occurred, or nothing at all. An open hand turns the TV on.
Robust system: the distance from the camera is fairly constant, the hand is up and open, the hand size is known, and normalized correlation is used.
W. Freeman et al. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, 1998.

Application: Controlling the TV by Finding Hands
Other operations are possible (volume control, etc.) W. Freeman et al. Computer vision for interactive computer graphics. IEEE Computer Graphics and Applications, 1998.

Scale and Image Pyramids
Images look different at different scales.
A zebra may be described in terms of:
individual hairs (small scale oriented filters)
stripes (large scale oriented filters)
A practical approach is to apply small filters to smoothed and resampled versions of the image.
Figure: image pyramid representation at different scales.

The Gaussian Pyramid
Each layer is smoothed by a symmetric Gaussian filter and resampled to get the next layer. The smallest image is the most heavily smoothed.
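A Gaussian pyramid along these lines can be sketched with NumPy alone. The choices of σ=1, a kernel truncated at 3σ, reflective border padding and a resampling factor of 2 are assumptions made for this example, and the function names are illustrative.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1D Gaussian, truncated at 3 sigma (an assumption of this sketch)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian smoothing with reflective padding; output has the input's shape."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(image, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def gaussian_pyramid(image, levels, sigma=1.0):
    """Level 0 is the input; each further level is smoothed, then subsampled by 2."""
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        smoothed = gaussian_blur(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])
    return pyramid

rng = np.random.default_rng(0)
pyramid = gaussian_pyramid(rng.random((32, 32)), levels=4)
print([level.shape for level in pyramid])
```

Each level has been smoothed once more than the level above it, so the smallest image is the most heavily smoothed.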

The Gaussian Pyramid
Scale-space representation. A bar is:
a hair (large images)
the whole nose (small images)

Template Matching using the Gaussian Pyramid
Input: Image, Template
1. Match template at current scale
2. Downsample image
3. Repeat 1-2 until image is very small
4. Take responses above some threshold, perhaps with non-maxima suppression
Slide: Hoiem
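The steps above can be sketched as follows, assuming NumPy. Two simplifications are made here and are not from the slides: the image is downsampled with a 2x2 block average instead of a Gaussian filter, and non-maxima suppression is omitted. The function names and the 0.9 threshold are illustrative.

```python
import numpy as np

def ncc_map(image, template):
    """Normalized cross-correlation at every position where the template fits."""
    th, tw = template.shape
    g = template - template.mean()
    g_norm = np.sqrt(np.sum(g ** 2))
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for m in range(out.shape[0]):
        for n in range(out.shape[1]):
            p = image[m:m + th, n:n + tw]
            p = p - p.mean()
            denom = g_norm * np.sqrt(np.sum(p ** 2))
            if denom > 1e-12:
                out[m, n] = np.sum(g * p) / denom
    return out

def downsample(image):
    """Halve each dimension by averaging 2x2 blocks (a simple box pre-filter)."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    img = image[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def match_over_scales(image, template, threshold=0.9):
    """Return (level, row, col, score) for every response above the threshold."""
    detections = []
    level = 0
    current = np.asarray(image, dtype=float)
    while current.shape[0] >= template.shape[0] and current.shape[1] >= template.shape[1]:
        scores = ncc_map(current, template)                  # step 1
        for m, n in zip(*np.nonzero(scores > threshold)):    # step 4, without suppression
            detections.append((level, int(m), int(n), float(scores[m, n])))
        current = downsample(current)                        # step 2
        level += 1                                           # step 3: repeat
    return detections

rng = np.random.default_rng(1)
image = rng.random((32, 32))
template = image[10:15, 7:12].copy()   # template cut from the full-resolution image
print(match_over_scales(image, template))
```

Keeping the template fixed while shrinking the image means a match at level k corresponds to an object about 2^k times larger in the original image.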

Coarse-to-fine Image Registration
1. Compute Gaussian pyramid
2. Align with coarse pyramid level
3. Successively align with finer pyramids, searching a smaller range of parameters
Why is this faster? A single pixel in a 4x4 image corresponds to a 256x256 block of pixels in a 1024x1024 image.
Do we get the same result?
Slide: Hoiem

Major applications of image pyramids
Compression
Object detection
Detection of stable interest points
Image registration
Visual tracking

Filters, convolution and template matching – crucial points
A filter is a pattern of weights.
Convolution applies the filter to the image.
The output measures similarity between the filter and the image patch, but only as a rough estimate.
Normalized correlation gives improved pattern detection.
