Feature Detection and Descriptors


1 Feature Detection and Descriptors
Charles Hatt Nisha Kiran Lulu Zhang

2 Overview
Background: motivation, timeline and related work
SIFT / SIFT extensions: PCA-SIFT, GLOH
DAISY
Performance evaluation

3 Scope We cover local descriptors.
Basic procedure: find patches or keypoints, compute a descriptor for each, and match it to points in other images.
Local vs. global: local descriptors are robust to occlusion and clutter and stable under image transforms.
The scope of our presentation is the various types of local descriptors. In the past global descriptors have also been used, but local descriptors are preferable in many situations because they are robust to occlusion and clutter and stable under image transformations, illumination changes, and so on.

4 Color Histogram: A Global Descriptor
Color histograms are an example of a global descriptor: every RGB value present in the image is counted into a histogram. The image on the right is a scrambled version of the pixels of the image on the left; the two yield identical color histograms yet are obviously not similar in the slightest.
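To make this concrete, here is a small NumPy sketch (not from the original slides) that builds a coarse joint RGB histogram and checks that scrambling the pixels leaves it unchanged; the choice of 8 bins per channel and the random test image are arbitrary.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram: count RGB values into bins**3 buckets.

    `image` is an HxWx3 uint8 array; the result ignores all spatial layout.
    """
    # Quantize each channel into `bins` levels, then build a joint histogram.
    quantized = (image.astype(np.int64) * bins) // 256          # values in [0, bins)
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    return np.bincount(codes.ravel(), minlength=bins ** 3)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Scramble the pixels: same colors, completely different image.
flat = img.reshape(-1, 3)
scrambled = flat[rng.permutation(len(flat))].reshape(img.shape)

assert np.array_equal(color_histogram(img), color_histogram(scrambled))
```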

5 Motivation The next few slides show some of the applications that motivate feature detection and description.

6 Object Recognition This is Honda's Asimo robot, which uses object recognition to extract features of interest in order to make decisions such as whether to grasp an object. Asimo can also recognize its charging station and plug itself in.

7 Robot Self Localization
DARPA sponsors its Grand Challenge and Urban Challenge for driverless vehicles; in the Urban Challenge, cars must recognize street signs and four-way stops before they can drive successfully in urban traffic.

8 Image Retrieval Can we have online search engines that accept images as input instead of text? Say, for instance, we wanted to Google ourselves with our picture instead of our name.

9 Image Retrieval You can try this at a website called TinEye.com: it takes any image and tries to find similar images on the web. It appears to work with global descriptors; if someday it worked with local descriptors it would be even more powerful, and we could upload a picture of an object and find that object under wildly different poses and backgrounds.

10 Tracking The Kinect for Xbox 360, released this month, features gesture, facial, and voice recognition. It is reportedly capable of tracking six players using feature extraction, extracting 20 features (joints) per player.

11 Things We Did in Class Image stitching, image alignment
Project 1 images had to be aligned, and of course we used SIFT for project 3.

12 Good Descriptors are Invariant to…
The slide lists the transformations a good descriptor should tolerate; the trickiest change among these is the viewpoint change, where the position of the camera actually moves.

13 Timeline
Cross correlation, Canny edge detector (1986), Harris corner detector, moment invariants, SIFT, shape context, PCA-SIFT, spin images, GLOH, DAISY.
In this class we have already studied the Canny edge detector and the Harris corner detector. We will now cover four classes of older techniques (in red on the slide) in feature detection to give you background on some approaches before covering the methods presented in the papers.

14 Cross Correlation Correlation is a relatively old technique that simply passes a kernel over an image. To simplify the problem, the image is read in and converted to grayscale, sometimes normalized to the range [-1, 1] or [0, 1]. Wherever the kernel overlaps the image, each kernel value is multiplied by the corresponding pixel intensity; the products are summed and the result stored.
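As a rough illustration of the procedure just described, the following sketch uses SciPy's correlate2d on a made-up scene and template; the pattern, noise level, and normalization choice are assumptions for the example, not values from the slides.

```python
import numpy as np
from scipy.signal import correlate2d

# Toy example: find where a small template best matches a noisy grayscale scene.
rng = np.random.default_rng(1)
template = np.array([[0., 1., 0.],
                     [1., 2., 1.],
                     [0., 1., 0.]])
scene = rng.normal(0.0, 0.1, size=(40, 40))
scene[20:23, 10:13] += template                  # plant the pattern at rows 20-22, cols 10-12

# Pass the kernel over the image: at each position, multiply the overlapping
# pixels, sum, and store the result.  Zero-meaning the scene first (a common
# normalization) keeps flat bright areas from dominating the response.
response = correlate2d(scene - scene.mean(), template, mode="same")
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)                                      # near (21, 11), the center of the planted pattern
```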

15 Cross Correlation The peaks of the result are the places with the highest correlation. Preprocessing, such as morphological opening and closing, is often required to get good results from correlation.

16 Moment Invariants a = degree, p + q = order
Id(x, y) = image gradient in direction d, where d is horizontal or vertical. Moments are invariant to convolution, blurring, and affine transforms; any order or degree can be computed, but higher orders are sensitive to small photometric distortions. This is the general equation for a moment; in the paper we present it is limited to 2nd degree and 2nd order, with image gradients computed only in the horizontal and vertical directions. On the board: describe the 2x10 descriptor and emphasize its small size (a = 1 or 2; p + q = 2; d = horizontal or vertical).
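For illustration only, here is a sketch that computes such gradient moments, assuming the moment has the common form M^a_pq = sum_x sum_y x^p y^q * I_d(x, y)^a (the exact formula is on the slide image and is not reproduced in this transcript); the coordinate normalization is also an assumption.

```python
import numpy as np

def gradient_moments(patch, max_order=2, max_degree=2):
    """Generalized moments of the patch's gradient images.

    Assumed form: M^a_pq = sum_x sum_y x^p y^q * I_d(x, y)^a, with a the
    degree, p + q the order, and I_d the horizontal or vertical gradient.
    With max_order = max_degree = 2 this yields 10 values per direction,
    i.e. the small 2x10 descriptor mentioned above.
    """
    gy, gx = np.gradient(patch.astype(float))           # vertical, horizontal gradients
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ys, xs = ys / (h - 1), xs / (w - 1)                  # normalize coordinates to [0, 1]

    moments = []
    for grad in (gx, gy):                                # d = horizontal, vertical
        for a in range(1, max_degree + 1):               # degree
            for p in range(max_order + 1):
                for q in range(max_order + 1 - p):
                    if p + q == 0:
                        continue                         # skip the trivial order-0 term
                    moments.append(np.sum(xs ** p * ys ** q * grad ** a))
    return np.array(moments)
```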

17 Spin Images (Johnson 97) It took me forever to figure out that the spin images referred to in the paper we present are not exactly the ones introduced by Johnson and Hebert in '97, but Lazebnik's 2005 adaptation to 2D images, which was inspired by Johnson and Hebert. The basic idea: for each model in the library, construct a set of spin images.

18 Spin Images (Johnson 97) Each model is turned into many small spin images, which are then used to look for matches in the scene.

19 Spin Images (Lazebnik 05)
The normalized patch makes the descriptor invariant to intensity changes, and the construction is invariant to rotation. Usually there are 10 bins for intensity and 5 bins for distance from the patch center, so the descriptor has 50 elements.
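Here is a hard-binned sketch of that 2-D spin image (10 intensity bins x 5 distance bins = 50 elements); the soft binning and other details of Lazebnik's implementation are omitted, so treat it as an illustration of the idea only.

```python
import numpy as np

def spin_image(patch, intensity_bins=10, distance_bins=5):
    """2-D spin-image descriptor: a joint histogram of pixel intensity
    vs. distance from the patch center (10 x 5 = 50 elements)."""
    patch = patch.astype(float)
    # Normalize intensities to [0, 1] so the descriptor tolerates affine
    # intensity changes of the patch.
    patch = (patch - patch.min()) / (patch.max() - patch.min() + 1e-12)

    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    dist = dist / (dist.max() + 1e-12)                   # normalized radius in [0, 1]

    hist, _, _ = np.histogram2d(
        patch.ravel(), dist.ravel(),
        bins=[intensity_bins, distance_bins], range=[[0, 1], [0, 1]])
    return (hist / hist.sum()).ravel()                   # 50-element descriptor
```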

20 Shape Context In this scheme, points are selected on the edges of the patch (preprocessing required), and for each edge point a histogram is built of the positions of all the other edge points relative to it. If the histograms are similar, a match is found. Note that you do not want this to be rotation invariant, or you could not differentiate a 6 from a 9.

21 Scale Invariant Features

22 Characteristics of good features
Repeatability: the same feature can be found in several images despite geometric and photometric transformations.
Saliency: each feature has a distinctive description.
Compactness and efficiency: many fewer features than image pixels.
Locality: features occupy a very small area of the image, making them robust to clutter and occlusion.
Now we look at the characteristics that features should have to be considered good. Features must be repeatable, i.e., given two images related by geometric and photometric transformations, we should be able to identify the same features in both. They must also be salient, i.e., the descriptor of feature A must clearly distinguish it from feature B. Compactness and efficiency mean we should have far fewer features than image pixels. Finally, because the descriptors are local, they are robust to clutter and occlusion.

23 Good features - Corners in image
Harris corner detector. Key idea: in the region around a corner, the image gradient has two or more dominant directions.
Invariant to rotation. Partially invariant to affine intensity change: I → I + b is handled (only derivatives are used), I → a*I is not. Not invariant to scale.
We have heard of corners a number of times, saw the Harris corner detector, and went over all the math for finding corners. Corners are good feature points because they are distinctive and repeatable. Corners are invariant to rotation because the eigenvalues stay the same if you rotate the image. The detector is only partially invariant to affine intensity changes: if we scale the intensity values we get different eigenvalues and hence a different response. It is not invariant to scale because a corner viewed at a larger scale looks locally like an edge, so the points get classified as edges.
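A minimal sketch of the Harris response, assuming the usual R = det(M) - k * trace(M)^2 form with an arbitrary k and Gaussian window; it is here only to make the intensity-invariance remarks concrete.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma=1.5, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    Gaussian-weighted second-moment matrix of the image gradients.

    Adding a constant to the image (I -> I + b) leaves R unchanged because
    only derivatives enter; scaling the image (I -> a*I) scales R, which is
    why the detector is only partially invariant to affine intensity change.
    """
    image = image.astype(float)
    gy, gx = np.gradient(image)
    # Second-moment (structure tensor) entries, smoothed over a window.
    sxx = gaussian_filter(gx * gx, sigma)
    syy = gaussian_filter(gy * gy, sigma)
    sxy = gaussian_filter(gx * gy, sigma)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2          # large positive R => corner
```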

24 Not invariant to scale
If you have a scaled-up version of the same corner, everything under the small window on the left gets classified as edges, while at the right scale the window sees the corner. D'oh! So what do we do next?

25 Scale Invariant Detection
Consider regions (e.g., circles) of different sizes around a point; regions of corresponding sizes will look the same in both images.
An easy solution to this problem is to consider regions of different sizes around the point, since regions of corresponding sizes look the same in both images. But now we need some mechanism to select the best scale for a given image of a corner.

26 Scale invariant feature detection
Goal: independently detect corresponding regions in scaled versions of the same image. We need a scale selection mechanism for finding a characteristic region size that is covariant with the image transformation.
We saw that the corner detection method is not invariant to scale. We want the features we detect to be scale invariant, i.e., if we have two images at different scales we want to identify the same feature points. So if we have a close-up of a leaf and an image of the whole tree, we would like to identify the same leaf in both images. To ensure this we need a mechanism to select the scale, finding a characteristic region size that is covariant with the image transformation. Now that we understand scale invariance at least in part, let us work through the scale selection mechanism for finding the characteristic region size. But first let's review some basic facts and terms so you don't get lost! :P

27 Recall: Edge detection
Convolution with the derivative of a Gaussian => edge at the maximum of the derivative. Convolution with the second derivative of a Gaussian => edge at the zero crossing.

28 Figure: signal f, derivative of Gaussian dg/dx, and the response f*dg/dx; the edge is at the maximum of the derivative.

29 Figure: signal f and its convolution with the second derivative of the Gaussian (Laplacian); the edge is at the zero crossing of the second derivative.

30 Scale selection Define the characteristic scale as the scale that produces the peak of the Laplacian response; this is the mechanism we use to select the scale for a region.

31 Example of characteristic scale selection
Here is an example of what we mean by the previous statement: we have a signal which we convolve with the Laplacian at various scales, and we set the characteristic scale to the one that produces the peak of the Laplacian response. So far we know what scale invariance is and how to get the characteristic scale; we are now ready to look at the stages of SIFT.
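A small sketch of that scale selection, assuming SciPy's gaussian_laplace for the Laplacian of Gaussian and the usual sigma^2 scale normalization; the probe scales and keypoint location in the usage comment are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def characteristic_scale(image, x, y, sigmas):
    """Pick the scale whose scale-normalized Laplacian response peaks at
    pixel (y, x): response(sigma) = sigma^2 * |LoG_sigma(I)(y, x)|."""
    responses = []
    for sigma in sigmas:
        log = gaussian_laplace(image.astype(float), sigma)
        responses.append(sigma ** 2 * abs(log[y, x]))    # sigma^2 normalization
    best = int(np.argmax(responses))
    return sigmas[best], responses

# Hypothetical usage: probe scales from 1 to 16 at some keypoint location.
# sigma_star, curve = characteristic_scale(img, x=120, y=80,
#                                          sigmas=np.geomspace(1, 16, 20))
```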

32 SIFT stages Scale space extrema detection, keypoint localization,
orientation assignment, keypoint descriptor. Here are the stages of the SIFT algorithm. The first stage constructs the scale space for the input image, which essentially means convolving the image with Laplacian-like kernels at different scales, and then detects extrema in that scale space using a neighborhood around each pixel. The extrema give a set of potential keypoints. We then find the sub-pixel location of each keypoint and discard low-contrast and unstable keypoints, so after the first two stages we have the keypoint locations. In the third stage we make each keypoint rotation invariant by assigning it an orientation. Finally, once we have the location, scale, and orientation, we build the keypoint descriptor. Now let's go over the stages one by one.

33 Scale space extrema detection
Approximate the Laplacian of Gaussian with the difference of Gaussians: computationally less intensive and invariant to scale. Images of the same size form an octave; each octave contains a certain number of blurred images.
To construct the scale space we don't use the scale-normalized Laplacian; instead we convolve the image with Gaussian kernels at different scales sigma and take the difference between consecutive convolved images. The difference of Gaussians is computationally less intensive and is scale invariant, unlike the Laplacian, which must be multiplied by sigma squared to achieve true scale invariance. In the equation on the slide, L is the image obtained by convolving with a Gaussian at scale sigma, and D is the difference of the Gaussian-blurred images at scales k*sigma and sigma. Images at the same resolution form an octave, and each octave has a certain number of blur levels; the SIFT implementation uses 5 blurred images per octave and 4 octaves in total. Let's look at a slide showing this.
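A minimal sketch of one octave of this construction, assuming 5 blur levels and a scale step of 2^(1/4) so that the last image sits at 2*sigma, as described above; real SIFT implementations differ in details such as the initial blur.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, sigma=1.6, levels=5):
    """One octave of the scale space: `levels` Gaussian-blurred images at
    scales sigma, k*sigma, k^2*sigma, ... and their pairwise differences.

    With 5 blurred images per octave this yields 4 DoG images; the next
    octave is started by downsampling the image blurred at 2*sigma."""
    k = 2.0 ** (1.0 / (levels - 1))                  # scale step within the octave
    blurred = [gaussian_filter(image.astype(float), sigma * k ** i)
               for i in range(levels)]
    dogs = [blurred[i + 1] - blurred[i] for i in range(levels - 1)]
    return blurred, dogs

# Hypothetical next octave: downsample the 2*sigma image and repeat.
# blurred, dogs = dog_octave(img)
# next_base = blurred[-1][::2, ::2]
```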

34 In this slide, each octave has five blurred images; to get the next octave we simply downsample the image at scale 2*sigma in the previous octave and continue the procedure. The images on the right are the differences between successive Gaussian-blurred images. Now that we have approximated the Laplacian of Gaussian, we would like to detect the potential keypoints and their scales. To do this we need to find the peak of the response, if you remember!

35 Maxima/Minima selection in DoG
SIFT finds local maxima (and minima) of the difference of Gaussians in both space and scale. At a scale sigma we consider a 3x3x3 neighborhood, and if the current pixel (marked X) is larger or smaller than all 26 of its neighbors, we mark it as a potential keypoint. This stage detects many keypoints, and some are not good in the sense that they may lie on an edge or have low contrast. Also, the detected positions may not be the true keypoint locations, since the actual extrema can lie at sub-pixel positions.
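A sketch of the 26-neighbor test, assuming `dogs` is the list of same-size DoG images of one octave and that (level, y, x) is not on a border.

```python
import numpy as np

def is_scale_space_extremum(dogs, level, y, x):
    """True if DoG pixel (y, x) at scale index `level` is larger (or smaller)
    than all 26 neighbors in its 3x3x3 space-and-scale neighborhood."""
    value = dogs[level][y, x]
    cube = np.stack([dogs[level + dl][y - 1:y + 2, x - 1:x + 2]
                     for dl in (-1, 0, 1)])          # 3 x 3 x 3 neighborhood
    others = np.delete(cube.ravel(), 13)             # drop the center pixel itself
    return bool(np.all(value > others) or np.all(value < others))
```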

36 Keypoint localization
A lot of keypoints are detected, so we apply: sub-pixel localization for accurate keypoint locations, elimination of points with low contrast, and elimination of edge responses.
In short, we want to find the exact sub-pixel location of each keypoint and get rid of low-contrast points and edge responses.

37 Sub pixel localization
This slide shows the idea behind finding the sub-pixel location of a keypoint: we approximate the difference of Gaussians around the detected location with a quadratic and take its extremum as the refined keypoint position. So now we have the location and the scale of each keypoint; the next stage is to keep only the stable ones.
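For the curious, a sketch of that quadratic (Taylor-expansion) refinement using finite differences of the DoG; this follows the standard SIFT formulation (solve H * offset = -g), which the slide only describes pictorially.

```python
import numpy as np

def subpixel_offset(dogs, level, y, x):
    """Quadratic refinement of a detected extremum: solve H * offset = -g
    for the sub-pixel offset in (x, y, scale), with g and H estimated by
    finite differences of the DoG stack `dogs`.

    Offsets larger than 0.5 in any dimension mean the true extremum lies
    closer to a neighboring sample and the fit should be repeated there."""
    D = lambda l, yy, xx: float(dogs[l][yy, xx])
    # First derivatives (central differences).
    dx = (D(level, y, x + 1) - D(level, y, x - 1)) / 2.0
    dy = (D(level, y + 1, x) - D(level, y - 1, x)) / 2.0
    ds = (D(level + 1, y, x) - D(level - 1, y, x)) / 2.0
    # Second derivatives.
    dxx = D(level, y, x + 1) - 2 * D(level, y, x) + D(level, y, x - 1)
    dyy = D(level, y + 1, x) - 2 * D(level, y, x) + D(level, y - 1, x)
    dss = D(level + 1, y, x) - 2 * D(level, y, x) + D(level - 1, y, x)
    dxy = (D(level, y + 1, x + 1) - D(level, y + 1, x - 1)
           - D(level, y - 1, x + 1) + D(level, y - 1, x - 1)) / 4.0
    dxs = (D(level + 1, y, x + 1) - D(level + 1, y, x - 1)
           - D(level - 1, y, x + 1) + D(level - 1, y, x - 1)) / 4.0
    dys = (D(level + 1, y + 1, x) - D(level + 1, y - 1, x)
           - D(level - 1, y + 1, x) + D(level - 1, y - 1, x)) / 4.0
    g = np.array([dx, dy, ds])
    H = np.array([[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]])
    return np.linalg.solve(H, -g)        # (offset_x, offset_y, offset_scale)
```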

38 Eliminating extra keypoints
If the magnitude of the DoG value at the current pixel (the one being checked for a maximum/minimum) is less than a certain threshold, the point is rejected as low contrast. Edge responses are removed with an idea similar to the Harris corner detector.
These are the two ideas behind eliminating the extra keypoints.

39 Until now, we have seen scale invariance; now let's make the keypoints rotation invariant.
We now have a set of good keypoints, and we want to assign each keypoint a dominant orientation. This ensures rotational invariance.

40 Orientation assignment
Key idea: collect gradient directions and magnitudes around each keypoint, figure out the most prominent orientations in that region, and assign them to the keypoint. The size of the collection region depends on the scale: the bigger the scale, the bigger the region.

41 Peak of the histogram taken as the keypoint orientation
Compute the gradient magnitude and orientation for each pixel in a region around the keypoint, then construct a histogram; the peak of the histogram is taken as the keypoint orientation.
We take a region around the keypoint (with a radius of roughly 1.5 times sigma, I think) and compute the gradient magnitudes and orientations for each pixel. We then divide the 360 degrees into orientation bins (the SIFT paper uses 36 bins of 10 degrees each) and build a histogram: a pixel with gradient orientation 45 degrees and magnitude 5 adds 5 to the 45-degree bin. The peak of this histogram is taken as the keypoint orientation.
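A bare-bones sketch of the orientation histogram, assuming 36 bins of 10 degrees; the Gaussian weighting of magnitudes and the parabolic interpolation of the peak used by full SIFT are omitted.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Histogram of gradient orientations in a region around the keypoint;
    the peak bin gives the keypoint orientation."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360.0
    # Each pixel votes into its orientation bin with weight = gradient magnitude.
    hist, _ = np.histogram(orientation, bins=num_bins, range=(0, 360),
                           weights=magnitude)
    peak = int(np.argmax(hist))
    return (peak + 0.5) * (360.0 / num_bins)          # bin center, in degrees
```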

42 Keypoint descriptor So what do we have now? The keypoint location, scale, and orientation. Now we want a descriptor for this keypoint, and we want it to be distinctive. The way it is done in SIFT is to take a 16x16 patch around the keypoint, oriented in the direction of the dominant gradient, and divide it into 4x4 subregions. For each subregion an 8-bin orientation histogram is built, and the histograms of all subregions are concatenated, giving a 4*4*8 = 128-element vector.

43 Based on a 16x16 patch, 4x4 subregions, and 8 bins per subregion: 4*4*8 = 128 dimensions in total.
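A simplified sketch of that 128-element construction, assuming the 16x16 patch has already been extracted at the keypoint's scale and rotated to its orientation; trilinear interpolation, Gaussian weighting, and the final clipping step of real SIFT are left out.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Build a 4x4x8 = 128-element descriptor from a 16x16 patch."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 360.0

    descriptor = []
    for by in range(4):                  # 4x4 grid of 4x4-pixel subregions
        for bx in range(4):
            sl = (slice(4 * by, 4 * by + 4), slice(4 * bx, 4 * bx + 4))
            hist, _ = np.histogram(orientation[sl], bins=8, range=(0, 360),
                                   weights=magnitude[sl])
            descriptor.extend(hist)
    descriptor = np.array(descriptor)
    return descriptor / (np.linalg.norm(descriptor) + 1e-12)   # 128-D, unit norm
```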

44 PCA-SIFT PCA-SIFT is a modification of SIFT that changes how the keypoint descriptors are constructed. Basic idea: use PCA (Principal Component Analysis) to represent the gradient patch around the keypoint. PCA-SIFT has two stages: computing the projection matrix, and constructing the PCA-SIFT descriptor. Now that we have seen all the stages of the SIFT algorithm, let's look at PCA-SIFT. It modifies only the way keypoint descriptors are constructed and does not touch the keypoint detection stages of SIFT.

45 Computing projection matrix
Select a representative set of pictures and detect all keypoints in them. For each keypoint: extract a 41x41 image patch around it and calculate the horizontal and vertical gradients, giving a vector of size 39*39*2 = 3042. Put all these vectors into a k x 3042 matrix A, where k is the number of keypoints detected, and calculate the covariance matrix of A.
To construct the projection matrix, we select a representative set of pictures and detect all the keypoints in them. For each keypoint we extract a 41x41 patch around it, oriented in the direction of the keypoint's dominant gradient. We then calculate the horizontal and vertical gradients at all pixels in the patch except the top and bottom rows and the first and last columns, giving 39*39*2 gradient values. All these vectors go into a k x 3042 matrix A, where k is the number of keypoints, and we compute the covariance of A.

46 Contd. Compute the eigenvectors and eigenvalues of cov(A).
Select the eigenvectors corresponding to the n largest eigenvalues; the projection matrix is the n*3042 matrix composed of these eigenvectors, which form a compact basis for the space of keypoint gradient patches. The projection matrix is computed only once and saved.

47 Dimension reduction through PCA
Now let's look at why PCA gives good dimensionality reduction here. The keypoint patches are a highly restricted set: they passed the first three stages of SIFT. From the graph we see that the gradient patches of such keypoints compress much better than random gradient patches. The image patches do not span the entire space of pixel values, nor even the smaller space of patches from natural images; they are a highly restricted set of patches that passed the first three stages of SIFT.

48 Constructing PCA-SIFT descriptor
Input: keypoint location, scale, and orientation. Extract a 41x41 patch around the keypoint at the given scale, rotated to its orientation. Calculate the 39x39 horizontal and vertical gradients, giving a vector of size 3042. Multiply this vector by the precomputed n*3042 projection matrix. The result is a PCA-SIFT descriptor of size n.
So the second stage is the actual construction of the descriptor: form the 39*39*2 gradient vector for the keypoint and multiply it by the precomputed projection matrix, which gives the PCA-SIFT descriptor of size n.
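The two PCA-SIFT stages can be sketched as below; the eigendecomposition-of-the-covariance route and n = 36 are assumptions for the example (the evaluation discussed later found 36 to work well), and the mean-centering of the descriptor vector used by the original method is omitted for brevity.

```python
import numpy as np

def patch_gradient_vector(patch41):
    """41x41 patch -> horizontal and vertical gradients on the inner 39x39
    pixels, flattened to a 39*39*2 = 3042-element vector."""
    gy, gx = np.gradient(patch41.astype(float))
    inner = (slice(1, 40), slice(1, 40))              # drop the border row/column
    return np.concatenate([gx[inner].ravel(), gy[inner].ravel()])

def pca_projection_matrix(training_patches, n=36):
    """Projection matrix from a representative set of keypoint patches:
    eigenvectors of the covariance of the k x 3042 matrix A (top n of them).
    Computed once and saved."""
    A = np.stack([patch_gradient_vector(p) for p in training_patches])
    cov = np.cov(A, rowvar=False)                      # 3042 x 3042
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:n]          # n largest eigenvalues
    return eigenvectors[:, order].T                    # n x 3042

def pca_sift_descriptor(patch41, projection):
    """PCA-SIFT descriptor of size n: project the 3042-vector with the
    precomputed projection matrix."""
    return projection @ patch_gradient_vector(patch41)
```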

49 Now let us compare the performance of SIFT with PCA-SIFT
In the paper the descriptors are tested on images with added noise, rotation and scale changes, projective warps, and illumination changes, and PCA-SIFT does better than standard SIFT. Recall vs. 1-precision curves are used for the comparison.

50 Eigenspace construction
This graph shows how PCA-SIFT's performance depends on the images used to construct the eigenspace.

51 Effect of PCA dimension
Now let us look at how PCA-SIFT's performance varies as we change the PCA dimension n. As the graph shows, not much improvement is gained as n increases. The hypothesis given in the paper is that the first several components of the PCA subspace are sufficient for encoding the variations caused by keypoint identity, while the later components represent details that are not useful, or potentially detrimental, such as distortion from projective warp.

52 Gradient Location Orientation Histogram (GLOH)
Another SIFT extension. Gradients are quantized into 16 orientation bins over a log-polar location grid: 3 bins for radius (6, 11, 15) and 8 bins for direction (0, π/4, π/2, …, 7π/4). The method differs from SIFT in its sampling pattern and is, of course, robust to rotation.

53 GLOH 17 location bins, 16 gradient bins per location bin:
272 elements, reduced to 128 with PCA.
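A sketch of the log-polar location binning just described (1 central bin + 2 rings x 8 directions = 17 bins); the exact handling of the central region and the bin boundaries are assumptions.

```python
import numpy as np

def gloh_location_bin(dx, dy, radii=(6.0, 11.0, 15.0)):
    """Map a pixel offset (dx, dy) from the keypoint to one of GLOH's 17
    log-polar location bins: a single central bin (r < 6) plus two rings
    (6..11 and 11..15), each cut into 8 direction bins of 45 degrees.

    Returns None for pixels outside the outer radius."""
    r = np.hypot(dx, dy)
    if r >= radii[2]:
        return None
    if r < radii[0]:
        return 0                                   # central bin, not split by angle
    ring = 0 if r < radii[1] else 1
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    direction = int(angle // (np.pi / 4))          # 8 direction bins
    return 1 + ring * 8 + direction

# A full GLOH descriptor would accumulate gradient magnitudes into a
# 17 x 16 = 272 histogram (16 orientation bins per location bin) and then
# reduce it to 128 dimensions with PCA, as the slide above describes.
```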

54 GLOH Results 192 correct, 208 false positives… not as bad as it sounds.
Yellow ovals are correctly matched features, blue are incorrectly matched features, using nearest-neighbor matching. Now for something completely different from SIFT: DAISY.

55 DAISY An efficient dense local descriptor, similar to SIFT and GLOH.
The descriptor is fundamentally based on pixel gradient histograms, but has key differences that make it much faster for dense matching. Original application: wide-baseline stereo. Other applications: face recognition.

56 SIFT vs. GLOH*
SIFT: + good performance; – not suitable for dense computation.
GLOH: + good performance, + better localization; – not suitable for dense computation.
However, SIFT can be made more suitable by changing its weighting kernel to a Gaussian and using circular regions, which the Gaussians realize naturally. GLOH has a better record in localization, but it is also unsuitable for dense computation because of its grid shape and weighting.
* K. Mikolajczyk and C. Schmid. A Performance Evaluation of Local Descriptors. PAMI '04.

57 DAISY
+ Suitable for dense computation, + improved performance,* + precise localization, + rotational robustness.
We propose to merge these two ideas: circular regions defined by Gaussians whose variance increases with distance from the point center. In a recent publication, a grid similar to our DAISY has been shown to outperform SIFT for sparse feature matching as well, but our motivation for using a grid like this also stems from its computational advantages.

58 DAISY Parameters:

59

60 Computation Steps Compute H (the number of histogram bins) orientation maps G_i, one for each gradient orientation. G_o(u, v) is the gradient norm in direction o at pixel (u, v); if the gradient in direction o is negative, G_o(u, v) = 0.

61 Computation Steps Each orientation map is then repeatedly convolved with a Gaussian kernel to obtain “convolved” orientation maps.

62 Computation Steps There are a total of Q (the number of ‘rings’ in the DAISY) levels of convolution.

63 Computation Steps Each pixel now has Q vectors, each H long, of the form shown on the slide. Each of these vectors is normalized to unit norm, which helps preserve viewpoint invariance.
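Putting the computation steps together, here is a much-simplified sketch of a DAISY descriptor; the ring radii, Gaussian sigmas, sampling without interpolation, and the use of one smoothing level per ring are assumptions, and the real implementation convolves incrementally and shares histograms across pixels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def daisy_orientation_maps(image, H=8):
    """H orientation maps G_o(u, v): the positive part of the image gradient
    projected onto each of H equally spaced directions (negative values are
    clipped to zero, as described above)."""
    gy, gx = np.gradient(image.astype(float))
    maps = []
    for o in range(H):
        theta = 2 * np.pi * o / H
        directional = gx * np.cos(theta) + gy * np.sin(theta)
        maps.append(np.maximum(directional, 0.0))
    return maps

def daisy_descriptor(image, y, x, H=8, T=8, Q=3, radius=15.0, sigmas=(2.5, 5.0, 7.5)):
    """Sketch of a DAISY descriptor at pixel (y, x): unit-normalized H-vectors
    read from Gaussian-convolved orientation maps at the center and at T points
    on each of Q rings (length Q*T*H + H = 200 for H = T = 8, Q = 3).

    Assumes (y, x) is at least `radius` pixels away from the image border."""
    maps = daisy_orientation_maps(image, H)
    convolved = [[gaussian_filter(m, s) for m in maps] for s in sigmas]

    def histogram(level, yy, xx):
        h = np.array([convolved[level][o][int(round(yy)), int(round(xx))]
                      for o in range(H)])
        return h / (np.linalg.norm(h) + 1e-12)          # unit norm per histogram

    pieces = [histogram(0, y, x)]                        # center histogram
    for q in range(Q):
        r = radius * (q + 1) / Q
        for t in range(T):
            angle = 2 * np.pi * t / T
            pieces.append(histogram(q, y + r * np.sin(angle), x + r * np.cos(angle)))
    return np.concatenate(pieces)                        # length Q*T*H + H
```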

64 Full Descriptor Total size: Q*T*H + H; in this case (H = 8, T = 8, Q = 3) that is 3*8*8 + 8 = 200.

65 Computational Complexity
DAISY (H = 8, T = 8, Q = 3, S = 25): 122 multiplications/pixel, 119 summations/pixel, 25 samplings/pixel.
SIFT (16 4x4 arrays, 8 bins): 1280 multiplications/pixel, 512 summations/pixel, 256 samplings/pixel.

66 DAISY vs SIFT Why is DAISY's computation time lower?
DAISY descriptors share histograms; the computation pipeline enables an efficient memory access pattern, with histogram layers separated early; and it is easily parallelized.

67 Performance with parallel cores
Computation time falls almost linearly.

68 Choosing the Best DAISY
Winder et al. tested a wide variety of gradient- and steerable-filter-based configurations for calculating image gradients at each pixel and found the best parameters for each, identifying the best configurations for different applications.

69 Choosing the Best DAISY
Real-time applications: a DAISY configuration with 1 or 2 rings, 4 bins, rectification of image gradients to length one, no PCA, and quantization of histogram values to a bit depth of 2-3.

70 Choosing the Best DAISY
Applications requiring good discrimination: 2nd-order steerable filters at two spatial scales, with PCA applied.
Large-database applications (low storage requirements and computational burden): steerable filters with H = 4 histogram bins, Q = 2 rings, T = 8 segments; rectified gradients with 4 histogram bins, Q = 1 ring and T = 8 segments.

71 Reported Applications of DAISY
Wide-baseline stereo Face recognition

72 Depth Map Estimation DAISY descriptors are used to measure similarities across images; a graph-cut based reconstruction algorithm generates the depth maps, and occlusion masks are used to deal properly with occlusions.

73 Occlusion maps

74 Depth Map Accuracy Ground truth – Laser scan

75 Depth Map Results

76 Face Recognition Dense descriptor computation is necessary for recognizing faces because of the wide-baseline nature of facial images. DAISY descriptors are calculated and matched using a recursive grid search, and the match distances are vectorized and fed to a Support Vector Machine (SVM).

77

78 Recognition Rate compared to previous, similar methods
Olivetti Research Lab database; FERET database: fafb (varying facial expressions) and fafc (varying illumination).

79 Local Descriptor Matching
Methods for matching descriptor vectors: exhaustive search, recursive grid search, kd-trees.

80 Recursive Grid Search Find the local descriptor for each section of the template image on a grid (DT) and the local descriptor for the corresponding section of the query image (DQ). The distance is computed between DT and DQ, as well as between DT and the descriptors of DQ's neighbors at a distance d. The point with the minimum distance (DT2) is considered for further analysis: descriptors of DT2's neighbors at distance d/2 are calculated, and so on recursively.

81 Recursive Grid Search

82 KD Trees Search for the nearest neighbor of an n-dimensional point.
A balanced kd-tree is guaranteed to have depth log2(n); nearest-neighbor search has been shown to run in O(log n) average time, and pre-processing (building the tree) takes O(n log n).
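A usage sketch with SciPy's cKDTree; the descriptor dimensions, counts, and the distance threshold are arbitrary toy values. (In very high dimensions exact kd-tree search loses much of its advantage, but the interface is the same.)

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical matching of query descriptors against template descriptors
# with a kd tree: build once in O(n log n), then each nearest-neighbor
# query runs in O(log n) on average.
rng = np.random.default_rng(2)
template_descriptors = rng.random((10_000, 128))   # e.g. 128-D SIFT vectors
query_descriptors = rng.random((500, 128))

tree = cKDTree(template_descriptors)
distances, indices = tree.query(query_descriptors, k=1)   # nearest neighbor of each query

# Keep only matches whose nearest-neighbor distance is below a threshold
# (4.5 is arbitrary for this toy data).
threshold = 4.5
matches = [(q, int(i)) for q, (d, i) in enumerate(zip(distances, indices)) if d < threshold]
```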

83 KD Trees

84–94 KD Trees (animation frames stepping through kd-tree construction and the nearest-neighbor search)

95 Performance Evaluation
Mikolajczyk and Schmid, 2005: detecting, normalizing, describing, matching, graphing. Before presenting the results from Mikolajczyk and Schmid, we must establish the procedure they used to test all the feature descriptors, from detecting the regions to be described, through normalizing all the patches, actually describing them, and verifying matches, to graphing the results.

96 Detecting Ten descriptors will be tested, but first, what will they be tested on? Five different detectors are used to find the regions in the first place: Harris points, Harris-Laplace, Hessian-Laplace, Harris-Affine, and Hessian-Affine.

97 Normalizing With respect to size: 41 pixels.
Orientation: dominant gradient. Illumination: normalize the standard deviation and mean of the pixel intensities. 41 pixels is an arbitrary size, perhaps chosen partly because Harris points have a fixed support region of 41 pixels. Dominant gradients are calculated with a histogram method: build the histogram and take the highest peak as the dominant orientation. Illumination is easy to normalize; a simple example is Matlab's histeq() function, which recenters and spreads the histogram around the center of the intensity range.
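The illumination normalization amounts to the one-liner below (zero mean, unit standard deviation), assuming the patch has already been resampled to 41x41 and rotated to its dominant orientation.

```python
import numpy as np

def normalize_patch(patch, size=41):
    """Normalize a patch's illumination by shifting to zero mean and
    scaling to unit standard deviation."""
    patch = patch.astype(float)
    assert patch.shape == (size, size)
    return (patch - patch.mean()) / (patch.std() + 1e-12)
```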

98 Demonstration of Matlab equalizing a histogram. After we normalize all our patches, we apply every descriptor to them to get all our descriptor vectors.

99 Matching For histogram-based methods, Euclidean distance;
for non-histogram-based methods, Mahalanobis distance (with S the covariance matrix). After the distance is calculated, two regions match if D < threshold; a stricter variant also requires the nearest-neighbor relation. Finally, we use these distances to determine whether one patch is a match for another. The formula depends on whether the descriptor is histogram based, such as SIFT, or not, such as moment invariants. After we calculate the distance between all the features being compared, we set a threshold, and if the distance is below the threshold we have a match. A more stringent matching scheme additionally requires the candidate to be the nearest neighbor of the source vector.
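A sketch of both distance options and both matching rules; the all-pairs distance computation is the naive O(N^2) route, and the covariance matrix is assumed to be supplied by the caller.

```python
import numpy as np

def match(descriptors_a, descriptors_b, threshold, cov=None, nearest_neighbor=True):
    """Match descriptors in A to descriptors in B.

    Histogram-based descriptors (e.g. SIFT) use Euclidean distance; for the
    others a Mahalanobis distance with covariance `cov` is used. A pair
    matches if its distance is below `threshold`, optionally also requiring
    that B is the nearest neighbor of A."""
    diff = descriptors_a[:, None, :] - descriptors_b[None, :, :]
    if cov is None:
        dist = np.linalg.norm(diff, axis=-1)                                  # Euclidean
    else:
        inv_cov = np.linalg.inv(cov)
        dist = np.sqrt(np.einsum("abi,ij,abj->ab", diff, inv_cov, diff))      # Mahalanobis

    matches = []
    for i in range(len(descriptors_a)):
        if nearest_neighbor:
            j = int(np.argmin(dist[i]))
            if dist[i, j] < threshold:
                matches.append((i, j))
        else:
            matches.extend((i, int(k)) for k in np.flatnonzero(dist[i] < threshold))
    return matches
```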

100 Data Set The original image is subjected to:
rotations around the optical axis; scale changes (camera zoom of 2 – 2.5x); blur (defocusing); viewpoint change (frontal to foreshortened); light change (varying the aperture); compression (JPEG at 5% quality). Here are the specifics of how the original images were distorted.

101 Evaluation Criteria For each patch, compute the distance; is d < t?
Compare to the ground truth, count the number of correct and false matches, and plot recall vs. 1-precision graphs. To build the curves, change t and repeat. For every type of descriptor and every patch, compute the distance from the formulas, compare to the ground truth, and record the numbers of correct and false matches. Varying t and repeating gives a set of (1-precision, recall) pairs that can be used to build the graphs.
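A small sketch of how those curves are assembled, assuming `distances` holds the descriptor distances of candidate matches and `is_correct` the ground-truth labels; the threshold sweep is what generates the (1-precision, recall) pairs.

```python
import numpy as np

def recall_precision_curve(distances, is_correct, thresholds):
    """For each threshold t, count matches with distance < t, split them into
    correct and false using the ground truth, and return (1-precision, recall)
    pairs for plotting."""
    total_correspondences = int(np.sum(is_correct))    # ground-truth positives
    curve = []
    for t in thresholds:
        accepted = distances < t
        correct = int(np.sum(accepted & is_correct))
        false = int(np.sum(accepted & ~is_correct))
        recall = correct / max(total_correspondences, 1)
        one_minus_precision = false / max(correct + false, 1)
        curve.append((one_minus_precision, recall))
    return curve
```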

102 Notes If recall = 1 for any precision, we have a perfect descriptor.
A slowly increasing curve means the descriptor is affected by the type of noise or transformation applied. Generally, if the curve for one descriptor is higher than another's, it is more robust to that type of transformation. These are general guidelines for interpreting recall vs. 1-precision graphs: perfect recall means a perfect descriptor, a slowly rising curve means the descriptor is sensitive to the distortion applied to the images, and a higher curve means higher recall, which is better for that particular transformation.

103 Hessian-Affine detector on a Structured Scene
These are the results for all descriptors applied to features extracted by the Hessian-Affine detector, under both threshold-based and nearest-neighbor matching, on data subjected to a viewpoint change of fifty degrees on a structured scene, meaning a scene with homogeneous regions and distinctive edge boundaries; this is the most challenging distortion. Note that if you want fewer false positives you should use nearest-neighbor matching, but that is of course more costly if you plan to apply it to a large dataset.

104 These two graphs not only compare the effectiveness of histogram-based versus non-histogram-based methods; having established the superiority of their proposed GLOH descriptor over PCA-SIFT, the authors also compare performance for different sizes n in the principal component analysis reduction. Recall that the PCA-SIFT paper presented by Nisha earlier also varied n and determined that 36 was best; this graph shows that GLOH beats PCA-SIFT for any n. Cross correlation is included to emphasize just how large the gap is.

105 On the left is the same graph as two slides ago (results for a structured scene); on the right are the results for a textured scene. GLOH is better for structured scenes, SIFT is better for textured ones. Recall is generally better for textured scenes, but more regions of interest are detected in a structured scene.

106 Compared to viewpoint changes, recognizing features that have undergone scale and rotation changes is much easier, so recall is uniformly higher in these graphs. Surprisingly, in this extensive survey PCA-SIFT does not fare very well at all, and SIFT's performance is again better for textured scenes.

107 Hessian-Laplace Regions
These are the results for a blurred scene, again demonstrating SIFT's dominance on textured scenes. Recall is uniformly low for all descriptors; none is robust to a blurred textured scene. Blur especially hurts the performance of shape context because edges are no longer dependable.

108 PCA-SIFT is best in the region where 1-precision is very low, i.e., precision is very high.

109

110 Conclusion Feature detection and descriptors through the ages
Hopefully we have introduced this rich topic to you, shown how it has been done and how it is being done, and conveyed why it is important.

