Representation, Description and Matching

Name: Representation, Description and Matching
Uploaded: 2017-07-15T16:47:54+00:00
Duration: PTM30S49
Description: Representation, Description and Matching

Representation, Description and Matching
CS485/685 Computer Vision Dr. George Bebis

Matching Given covariant region descriptors, what information should we use for matching corresponding regions in different images? ? 2

Simplest approach: correlation
Directly compare intensities using “sum of squared differences” or “normalized cross-correlation”

Simplest approach: correlation (cont’d)
Works satisfactorily when we matching corresponding regions related mostly by translation. e.g., stereo pairs, video sequence assuming small camera motion 4

Simplest approach: correlation (cont’d)
Sensitive to small variations with respect to: Location Pose Scale Intra-class variability Poorly distinctive! We will discuss a powerful descriptor called SIFT

Region Detection Steps: Review
Feature Extraction Extract affine regions Normalize regions Eliminate rotational ambiguity Compute appearance descriptors SIFT (Lowe ’04)

Scale Invariant Feature Transform (SIFT)
Remember how we resolved the orientation ambiguity? Eliminate rotational ambiguity Compute appearance descriptors Extract affine regions Normalize regions ? SIFT (Lowe ’04)

Find dominant gradient direction using the histograms of gradient direction. Dominant direction of gradient 2 p (36 bins) 8

Same theory, except that we use 16 histograms (8 bins each). 16 histograms x 8 orientations = 128 features Main idea: Take a 16 x16 window around detected interest point Divide into a 4x4 grid of cells Compute histogram in each cell

Properties of SIFT Highly distinctive! Scale and rotation invariant.
A single feature can be correctly matched with high probability against a large database of features from many images. Scale and rotation invariant. Partially invariant to 3D camera viewpoint Can tolerate up to about 60 degree out of plane rotation Can be computed fast and efficiently 10

Properties of SIFT (cont’d)
Partially invariant to changes in illumination 11

SIFT – Main Steps (1) Scale-space extrema detection
Extract scale and rotation invariant interest points (i.e., keypoints). (2) Keypoint localization Determine location and scale for each interest point. (3) Orientation assignment Assign one or more orientations to each keypoint. (4) Keypoint descriptor Use local image gradients at the selected scale. D. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 60(2):91-110, 2004. Cited 9589 times (as of 3/7/2011)

Scale-space Extrema Detection
y  Harris   LoG  Harris-Laplacian Find local maxima of: Harris detector in space LoG in scale SIFT Find local maxima of: DoG in space DoG in scale scale x y  Hessian   DoG  σn =knσ0 (k=2)

1. Scale-space Extrema Detection (cont’d)
DoG images are grouped by octaves -An octave corresponds to doubling the value of σ Fixed number of scales (i.e., levels) per octave 22σ0 Down-sample 2σ0 σ0

2σ0 (ks=2) Images separated by a constant factor k If each octave is divided in s intervals, we have s+1 DoG images/octave where: ks=2 or k=21/s Note: need (s+1)+2 = s+3 blurred images … k2σ0 kσ0 σ0 15

Choosing SIFT parameters
Experimentally using a matching task: 32 real images (outdoor, faces, aerial etc.) Images subjected to a wide range of transformations (i.e., rotation, scaling, shear, change in brightness, noise). Keypoints are detected in each image. Parameters are chosen based on keypoint repeatability, localization, and matching accuracy. 16

How many scales sampled per octave? # of keypoints increases but they are not stable! 3 scales

Smoothing is applied to the first level of each octave. How to choose σ? (i.e., integration scale) σ =1.6

2σ Pre-smoothing discards high frequencies. Double the size of the input image (i.e., using linear interpolation) prior to building the first level of the DoG pyramid. Increases the number of stable keypoints by a factor of 4. … k2σ kσ σ 19

Extract local extrema (i.e., minima or maxima) in DoG pyramid. Compare each point to its 8 neighbors at the same level, 9 neighbors in the level above, and 9 neighbors in the level below (i.e., 26 total).

2. Keypoint Localization
Determine the location and scale of keypoints to sub-pixel and sub-scale accuracy by fitting a 3D quadratic function at each keypoint. Substantial improvement to matching and stability! 21 21

Use Taylor expansion of D(x,y,σ) (i.e., DoG function) around the sample point : where is the offset from this point. 22

To find the extrema of D(ΔX): ΔX can be computed by solving a 3x3 linear system: use finite differences: 23 23

2. Keypoint Localization (cont’d)
Sub-pixel, sub-scale interpolated estimate: If in any dimension, repeat. 24

Reject keypoints having low contrast. i.e., sensitive to noise If reject keypoint i.e., assumes that image values have been normalized in [0,1]

Reject points lying on edges (or being close to edges) Harris uses the 2nd order moment matrix: R(AW) = det(AW) – α trace2(AW) or R(AW) = λ1 λ2- α (λ1+ λ2)2

SIFT uses the Hessian matrix for efficiency. i.e., encodes principal curvatures α: largest eigenvalue (λmax) β: smallest eigenvalue (λmin) (proportional to principal curvatures) (r = α/β) (SIFT uses r = 10)

(a) 233x189 image (b) 832 DoG extrema (c) 729 left after low contrast threshold (d) 536 left after testing ratio based on Hessian

3. Orientation Assignment
Create histogram of gradient directions, within a region around the keypoint, at selected scale (i.e., scale invariance): 2 p 36 bins (i.e., 10o per bin) Histogram entries are weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 1.5 times the scale of the keypoint.

3. Orientation Assignment (cont’d)
Assign canonical orientation at peak of smoothed histogram (fit parabola to better localize peak). In case of peaks within 80% of highest peak, multiple orientations assigned to keypoints. About 15% of keypoints has multiple orientations assigned. Significantly improves stability of matching. 2 p

3. Orientation Assignment (cont’d)
Stability of location, scale, and orientation (within 15 degrees) under noise.

4. Keypoint Descriptor Orientation histogram of gradient magnitudes
Have achieved invariance to location, scale, and orientation. Next, tolerate illumination and viewpoint changes. Orientation histogram of gradient magnitudes 8 bins

4. Keypoint Descriptor (cont’d)
Take a 16 x16 window around detected interest point. Divide into a 4x4 grid of cells. Compute histogram in each cell. (8 bins) 16 histograms x 8 orientations = 128 features

Each histogram entry is weighted by (i) gradient magnitude and (ii) a Gaussian function with σ equal to 0.5 times the width of the descriptor window.

Partial Voting: distribute histogram entries into adjacent bins (i.e., additional robustness to shifts) Each entry is added to all bins, multiplied by a weight of 1-d, where d is the distance from the bin it belongs. 35

Descriptor depends on two parameters: (1) number of orientations r (2) n x n array of orientation histograms rn2 features SIFT: r=8, n=4 128 features

Invariance to affine (linear) illumination changes: Normalization to unit length is sufficient. 128 features

Non-linear illumination changes : Saturation affects gradient magnitudes more than orientations Threshold gradient magnitudes to be no larger than 0.2 and renormalize to unit length (i.e., emphasizes gradient orientations than magnitudes) 128 features

Robustness to viewpoint changes
Match features after random change in image scale and orientation, with 2% image noise, and affine distortion. Find nearest neighbor in database of 30,000 features. Additional robustness can be achieved using affine invariant region detectors.

Distinctiveness Vary size of database of features, with 30 degree affine change, 2% image noise. Measure % correct for single nearest neighbor match.

Matching SIFT features
Given a feature in I1, how to find the best match in I2? Define distance function that compares two descriptors. Test all the features in I2, find the one with min distance. I2 I1 41

Matching SIFT features (cont’d)
How to define the distance between two features f1, f2? Simple approach: SSD(f1, f2) (i.e., sum of squared differences) Can give good scores to very ambiguous (bad) matches I1 I2 f1 f2 42

SSD(f1,f2) < t How to set t?

How to define the difference between two features f1, f2? Better approach: SSD(f1, f2) / SSD(f1, f2’) f2 is best SSD match to f1 in I2 f2’ is 2nd best SSD match to f1 in I2 I1 I2 f1 f2 f2' 44

Accept a match if SSD(f1, f2) / SSD(f1, f2’) < t t=0.8 has given good results in object recognition. Eliminated 90% of false matches. Discarded less than 5% of correct matches 45

How to evaluate the performance of a feature matcher? 50 75 200 46

Threshold t affects # of correct/false matches 50 75 200 false match true match True positives (TP) = # of detected matches that are correct False positives (FP) = # of detected matches that are incorrect 47

Matching SIFT features(cont’d)
0.7 1 FP rate TP rate 0.1 ROC Curve - Generated by computing (FP, TP) for different thresholds. - Need to maximize area under the curve (AUC) 1 48

Applications of SIFT Object recognition Object categorization
Location recognition Robot localization Image retrieval Image panoramas 49

Object Recognition Object Models

Object Categorization
51

Location recognition 52

Robot Localization 53

Map continuously built over time
54

Image retrieval – Example 1
CS4495/ 4/17/2017 Image retrieval – Example 1 … > 5000 images change in viewing angle Local Invariant Features 55 55

Matches 22 correct matches CS4495/7495 2004 4/17/2017
Local Invariant Features 56 56

Image retrieval – Example 2
CS4495/ 4/17/2017 Image retrieval – Example 2 … > 5000 images change in viewing angle + scale change Local Invariant Features 57 57

Matches 33 correct matches CS4495/7495 2004 4/17/2017
Local Invariant Features 58 58

Image panoramas from an unordered image set
CS4495/ 4/17/2017 Image panoramas from an unordered image set Local Invariant Features 59 59

PCA-SIFT Steps 1-3 are the same; Step 4 is modified.
Take a 41 x 41 patch at the given scale, centered at the keypoint, and normalized to a canonical direction. Yan Ke and Rahul Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors”, Computer Vision and Pattern Recognition, 2004 60 60

PCA-SIFT Instead of using weighted histograms, concatenate the horizontal and vertical gradients into a long vector. Normalize vector to unit length. 2 x 39 x 39 = 3042 vector Yan Ke and Rahul Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors”, Computer Vision and Pattern Recognition, 2004 61 61

PCA-SIFT Reduce the dimensionality of the vector using Principal Component Analysis (PCA) e.g., from 3042 to 36 Proved to be less discriminatory than SIFT. PCA N x 1 K x 1 62 62

SURF: Speeded Up Robust Features
Fast implementation of a variation of the SIFT descriptor. Key idea: fast approximation of (i) Hessian matrix and (ii) descriptor using “integral images”. What is an “integral image”? Herbert Bay, Tinne Tuytelaars, and Luc Van Gool, “SURF: Speeded Up Robust Features”, European Computer Vision Conference (ECCV), 2006. 63 63

Integral Image The integral image IΣ(x,y) of an image I(x, y) represents the sum of all pixels in I(x,y) of a rectangular region formed by (0,0) and (x,y). . Using integral images, it takes only four array references to calculate the sum of pixels over a rectangular region of any size. 64 64

SURF: Speeded Up Robust Features (cont’d)
Approximate Lxx, Lyy, and Lxy using box filters. Can be computed very fast using integral images! (box filters shown are 9 x 9 – good approximations for a Gaussian with σ=1.2) derivative approximation derivative approximation 65 65

In SIFT, images are repeatedly smoothed with a Gaussian and subsequently sub-sampled in order to achieve a higher level of the pyramid. 66 66

Alternatively, we can use filters of larger size on the original image. Due to using integral images, filters of any size can be applied at exactly the same speed! (see paper for details on how the filter size and octaves are computed) 67 67

Approximation of H: Using DoG Using box filters 68 68

Instead of using a different measure for selecting the location and scale of interest points (like in SIFT(, SURF uses the determinant of to find both. Determinant elements must be weighted to obtain a good approximation of det(H): 69 69

Once interest points have been localized both in space and scale, the next steps are: (1) Orientation assignment (2) Keypoint descriptor 70 70

Orientation assignment Circular neighborhood of radius 6σ around the interest point (σ = the scale at which the point was detected) 600 angle x response y response Haar wavelets (responses weighted with Gaussian) Side length = 4σ Can be computed very fast using integral images! 71 71

Keypoint descriptor (square region of size 20σ) Sum the response over each sub-region for dx and dy separately. To bring in information about the polarity of the intensity changes, extract the sum of absolute value of the responses too. Feature vector size: 4 x 16 = 64 4 x 4 grid 72 72

The sum of dx and |dx| are computed separately for points where dy < 0 and dy >0 Similarly for the sum of dy and |dy| More discriminatory! 73 73

SURF: Speeded Up Robust Features
Has been reported to be 3 times faster than SIFT. Less robust to illumination and viewpoint changes compared to SIFT. K. Mikolajczyk and C. Schmid,"A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp , 2005. 74 74

Gradient location-orientation histogram (GLOH)
Compute SIFT using a log-polar location grid: 3 bins in radial direction (i.e., radius 6, 11, and 15) 8 bins in angular direction Gradient orientation quantized in 16 bins. Total: (2x8+1)*16=272 bins PCA. K. Mikolajczyk and C. Schmid,"A Performance Evaluation of Local Descriptors", IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp , 2005.

Shape Context A 3D histogram of edge point locations and orientations.
Edges are extracted by the Canny edge detector. Location is quantized into 9 bins (using a log-polar coordinate system). Orientation is quantized in 4 bins (i.e., (horizontal, vertical, and two diagonals). Total number of features: 4 x 9 = 36 K. Mikolajczyk and C. Schmid,"A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp , 2005.

Spin image A histogram of quantized pixel locations and intensity values. A normalized histogram is computed for each of five rings centered on the region. The intensity of a normalized patch is quantized into 10 bins. Total number of features: 5 x 10 = 50 K. Mikolajczyk and C. Schmid,"A Performance Evaluation of Local Descriptors", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp , 2005.

Differential Invariants
“Local jets” of derivatives obtained by convolving the image with Gaussian derivates. Derivates are computed at different orientations by rotating the image patches. Example: some Gaussian derivatives up to fourth order invariants 78

Bank of Filters (e.g., Gabor filters) 79

Moment Invariants Moments are computed for derivatives of an image patch using: where p and q is the order, a is the degree, and Id is the image gradient in direction d. Derivatives are computed in x and y directions. 80 80

Bank of Filters: Steerable Filters
81

Representation, Description and Matching

Similar presentations

Presentation on theme: "Representation, Description and Matching"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Representation, Description and Matching

Similar presentations

Presentation on theme: "Representation, Description and Matching"— Presentation transcript:

Similar presentations

About project

Feedback