Published by Sabina Leonard. Modified over 9 years ago.
1
A Tutorial on using SIFT Presented by Jimmy Huff (Slightly modified by Josiah Yoder for Winter 2014-2015)
2
Introduction The Scale-Invariant Feature Transform (SIFT) by David Lowe is useful in many applications of object recognition. Our objective in this presentation is to understand how to extract SIFT descriptors from an image.
3
Introduction To extract SIFT keypoints, we use a cascaded filtering algorithm with the following four steps of filtering:
– Scale-Space Extrema Detection
– Keypoint Localization
– Orientation Assignment
– Keypoint Descriptor
This algorithm is efficient because its more expensive operations are performed only on a small subset of the initial image input.
4
Scale-Space Extrema Detection – Get the Points! In order to have scale-invariant features, we must have a way to extract features from an image across all scales. This can be done using a continuous function of scale known as scale-space (Witkin, 1983). Under reasonable assumptions, the only possible scale-space kernel is the Gaussian function. Lowe proposed using the Difference of Gaussians (DOG), the difference between two Gaussian-smoothed copies of the image at nearby scales, and collecting its extrema as interest points.
5
Scale-Space Extrema Detection – Get the Points! Scale-space is divided into octaves, each spanning S intervals. The smoothing is done incrementally, with σ growing by a constant factor k = 2^(1/S) from one image to the next, so that σ of the (S + 1)th image in the octave is twice that of the first image.
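The smoothing schedule for one octave can be sketched as follows. This is a minimal illustration, not the presenter's code: the function name `octave_sigmas` is hypothetical, the values σ = 1.6 and S = 3 are only example inputs, and the choice of S + 3 images per octave follows Lowe's paper rather than anything stated on the slide.

```python
def octave_sigmas(sigma0, S):
    """Return the blur level of each Gaussian image in one octave.

    sigma0: sigma of the first image in the octave (example input).
    S: number of intervals per octave.
    """
    k = 2.0 ** (1.0 / S)  # constant ratio between consecutive images
    # Assumption (from Lowe's paper, not the slide): S + 3 images are built
    # so that the S + 2 DOG images allow extrema to be found at S scales.
    return [sigma0 * k ** i for i in range(S + 3)]


sigmas = octave_sigmas(1.6, 3)
# sigmas[3], the (S + 1)th image, has twice the sigma of sigmas[0]
```

Each DOG image is then the pixel-wise difference of two consecutive images in this list.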
6
Scale-Space Extrema Detection – Get the Points!
7
DOG is used for its efficiency. Using the images to the right, we may now find the extrema for this octave.
8
Scale-Space Extrema Detection – Get the Points! If a point is greater than all or less than all of its 26 neighbors (8 in the same DOG image and 9 in each of the two adjacent DOG images), it is regarded as an extreme point. This is a relatively inexpensive step, as most points are rejected after only a few comparisons. Note that this comparison cannot be done on the boundaries of an image or on the top and bottom DOG images of an octave.
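The 26-neighbor comparison can be sketched in plain Python. This is an illustrative sketch under assumptions: the DOG stack is stored as nested lists indexed `dog[s][y][x]`, and the names `is_extremum` and `find_extrema` are hypothetical. The early `return False` is what keeps most points cheap.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbors."""
    center = dog[s][y][x]
    could_be_max = could_be_min = True
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the point itself
                value = dog[s + ds][y + dy][x + dx]
                if value >= center:
                    could_be_max = False
                if value <= center:
                    could_be_min = False
                if not (could_be_max or could_be_min):
                    return False  # early exit: neither a max nor a min
    return True


def find_extrema(dog):
    """Scan interior points only: image borders and top/bottom DOG are skipped."""
    n_scales, height, width = len(dog), len(dog[0]), len(dog[0][0])
    return [
        (x, y, s)
        for s in range(1, n_scales - 1)
        for y in range(1, height - 1)
        for x in range(1, width - 1)
        if is_extremum(dog, s, y, x)
    ]


# Tiny example: a 3 x 3 x 3 stack of zeros with a single peak in the middle
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 1.0
extrema = find_extrema(dog)
```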
9
Scale-Space Extrema Detection – Get the Points! Each octave is processed separately. Each octave starts with a σ twice that of the previous octave, and σ continues to increase within the octave. As sample points are collected, they are stored as a three-vector p = (x, y, σ), where σ is the scale at which the point was found.
10
Refine the Points! If we were to stop after the first steps, we would have too many interest points to be effective. In this second step, we eliminate points of low contrast. [Ignoring localization of “real” SIFT here…] Can you see the truck??
11
Refine the Points! Only keep points where the magnitude of the DOG exceeds some threshold (e.g. 3% of the maximum intensity in the original image). The magnitude is used because minima of the DOG are negative.
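A minimal sketch of this contrast filter is shown below. The function name `filter_low_contrast` and its arguments are hypothetical, and comparing the absolute DOG value (so that strong minima are kept as well as strong maxima) is an assumption about what the slide intends.

```python
def filter_low_contrast(points, dog_values, max_intensity, fraction=0.03):
    """Keep points whose DOG response is strong enough.

    points: list of (x, y, sigma) sample points.
    dog_values: DOG value at each point, in the same order.
    max_intensity: maximum intensity of the original image.
    fraction: threshold as a fraction of max_intensity (3% as on the slide).
    """
    threshold = fraction * max_intensity
    # Assumption: use |DOG| so that negative minima are not discarded.
    return [p for p, d in zip(points, dog_values) if abs(d) > threshold]


points = [(1, 1, 1.6), (2, 2, 1.6), (3, 3, 1.6)]
kept = filter_low_contrast(points, [0.5, -10.0, 9.0], max_intensity=255)
```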
12
Refine the Points! By applying this to our previous image, with 8714 sample points… We reduce the number of sample points to 362
13
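Further Refine the Points! We may further refine the sample points by removing those that lie on edges. First, we take the 2 x 2 Hessian matrix of the DOG function D, computed at the location and scale of the keypoint:
$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$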
14
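The eigenvalues of the matrix H are proportional to the principal curvatures of D. If a point is on an edge, the ratio of its eigenvalues will be very high (recall the Harris Corner Detector). Let α be the larger eigenvalue and β the smaller one. Since we are only concerned with their ratio, we do not need to compute them explicitly: we may set a threshold r, where α = rβ, and use
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r + 1)^2}{r}.$$
Therefore, if
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} > \frac{(r + 1)^2}{r},$$
the point is ignored.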
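The edge test, which rejects a point when Tr(H)^2 / Det(H) exceeds (r + 1)^2 / r, can be sketched with finite differences on a single DOG image. This is an illustration under assumptions: the image is a nested list indexed `D[y][x]`, the function name `passes_edge_test` is hypothetical, and the default r = 10 is the value suggested in Lowe's paper, not one given on the slide.

```python
def passes_edge_test(D, y, x, r=10.0):
    """Return True if the point at (x, y) is NOT edge-like.

    D: one DOG image as a nested list; (x, y) must be an interior pixel.
    r: maximum allowed ratio of principal curvatures (assumed default).
    """
    # Second derivatives estimated from neighboring pixel differences
    dxx = D[y][x + 1] - 2.0 * D[y][x] + D[y][x - 1]
    dyy = D[y + 1][x] - 2.0 * D[y][x] + D[y - 1][x]
    dxy = (D[y + 1][x + 1] - D[y + 1][x - 1]
           - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4.0

    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures have opposite signs (or one is zero): reject the point
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


blob = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # isolated peak: kept
ridge = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]  # vertical edge: rejected
results = (passes_edge_test(blob, 1, 1), passes_edge_test(ridge, 1, 1))
```

Working with the trace and determinant avoids computing the eigenvalues themselves.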
15
Further Refine the Points! By applying this to our previous image, with 362 sample points… We reduce the number of sample points to 240
16
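Orientation Assignment In order to be rotation invariant, each point must have a reference angle based on its neighboring points. We find the gradient magnitude m and orientation θ of every pixel in the Gaussian-smoothed image L at the keypoint's scale using pixel differences:
$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$
$$\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$$
We are concerned only with the pixels in the region around the keypoint.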
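A minimal sketch of the per-pixel gradient computation using central pixel differences. The function name `gradient` and the nested-list image layout `L[y][x]` are assumptions made for illustration; `math.atan2` is used instead of a plain arctangent so that the angle covers the full circle.

```python
import math


def gradient(L, y, x):
    """Gradient magnitude and orientation at an interior pixel of image L."""
    dx = L[y][x + 1] - L[y][x - 1]  # horizontal pixel difference
    dy = L[y + 1][x] - L[y - 1][x]  # vertical pixel difference
    magnitude = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)      # radians, in (-pi, pi]
    return magnitude, theta


L = [[0, 0, 0],
     [0, 0, 3],
     [0, 4, 0]]
m, theta = gradient(L, 1, 1)
```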
17
Orientation Assignment The magnitudes are weighted according to a Gaussian function centered at the keypoint.
18
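Orientation Assignment We then use the weighted magnitudes to populate a histogram of 36 bins, each covering 10 degrees of the 360 degree range of orientations. Each pixel adds its weighted magnitude to the bin that contains its gradient orientation.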
19
Orientation Assignment A parabola is fit to the highest histogram bin and its two neighboring bins. The location of the maximum of this parabola gives us the angle θ. The point now has four components: p = (x, y, σ, θ).
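The histogram and parabola fit can be sketched as follows. This is an illustrative sketch, not the presenter's code: the function names are hypothetical, angles are assumed to be in degrees, magnitudes are assumed to be already Gaussian-weighted, and each bin's orientation is taken to be its center.

```python
def orientation_histogram(samples, n_bins=36):
    """samples: iterable of (weighted_magnitude, angle_in_degrees) pairs."""
    hist = [0.0] * n_bins
    width = 360.0 / n_bins
    for magnitude, angle in samples:
        index = int((angle % 360.0) // width) % n_bins
        hist[index] += magnitude
    return hist


def peak_orientation(hist):
    """Fit a parabola to the highest bin and its two neighbors.

    Returns the angle (degrees) of the parabola's maximum.
    """
    n = len(hist)
    i = max(range(n), key=hist.__getitem__)
    left, center, right = hist[(i - 1) % n], hist[i], hist[(i + 1) % n]
    denom = left - 2.0 * center + right
    # Vertex of the parabola through (-1, left), (0, center), (1, right)
    offset = 0.0 if denom == 0 else 0.5 * (left - right) / denom
    width = 360.0 / n
    return ((i + 0.5 + offset) * width) % 360.0


hist = orientation_histogram([(1.0, 45.0), (2.0, 45.0), (2.0, 55.0)])
theta = peak_orientation(hist)  # lies between the centers of bins 4 and 5
```

The histogram is treated as circular, so the neighbors of bin 0 are bins 35 and 1.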
20
Keypoint Descriptor We now assign a descriptor to the sample point. The two points shown above represent sample points, with the red arrow being each point's orientation assignment. By assigning a keypoint descriptor, we will know whether these two are alike or not.
21
Keypoint Descriptor We again use gradients of neighboring pixels to determine the descriptor. The size of the region is a Gaussian window proportional to the scale of the keypoint.
22
Keypoint Descriptor We must first rotate the neighboring pixels' gradient vectors relative to the keypoint's angle θ.
23
Keypoint Descriptor This step is done to ensure rotation invariance. Notice that, once it is done, these two are (most likely) a match!
24
Keypoint Descriptor We then group the rotated gradient vectors into a 2 x 2 set of regions, each summarized by an orientation histogram with 8 bins. However, experimentation has shown it is best to use a 4 x 4 set with 8 bins each for maximum effectiveness and efficiency. This gives a feature vector with 4 x 4 x 8 = 128 elements.
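A simplified sketch of how the 128-element vector might be assembled is given below. It is only an illustration under several assumptions: the function name `build_descriptor` is hypothetical, the region is taken to be a 16 x 16 pixel window split into 4 x 4 pixel cells, each sample is given as an offset from the keypoint plus a (weighted) magnitude and an angle in degrees, and the interpolation between neighboring bins used in full SIFT implementations is omitted.

```python
import math


def build_descriptor(samples, theta_deg, grid=4, n_bins=8, cell=4):
    """Build a grid x grid x n_bins histogram descriptor.

    samples: iterable of (dx, dy, magnitude, angle_deg), where (dx, dy) is
             the pixel offset from the keypoint.
    theta_deg: the keypoint's orientation in degrees.
    """
    desc = [0.0] * (grid * grid * n_bins)
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    half = grid * cell / 2.0
    for dx, dy, magnitude, angle in samples:
        # Rotate the offset by -theta into the keypoint's reference frame
        u = cos_t * dx + sin_t * dy
        v = -sin_t * dx + cos_t * dy
        col = math.floor((u + half) / cell)
        row = math.floor((v + half) / cell)
        if not (0 <= row < grid and 0 <= col < grid):
            continue  # sample falls outside the descriptor window
        # Gradient angle relative to the keypoint's orientation
        relative = (angle - theta_deg) % 360.0
        b = int(relative // (360.0 / n_bins)) % n_bins
        desc[(row * grid + col) * n_bins + b] += magnitude
    return desc


descriptor = build_descriptor([(1, 1, 2.0, 100.0)], theta_deg=90.0)
```

Because both the sample positions and the gradient angles are expressed relative to θ, rotating the image changes neither the cell nor the bin a sample falls into.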
25
Keypoint Descriptor Because the gradient vectors of the neighboring pixels are coarsely grouped into 8 orientation bins, the descriptor is resilient to changes in 3D perspective.
26
Keypoint Descriptor In order to be resilient to differences in illumination, we normalize the feature vector to unit length. This makes the descriptor invariant to changes in contrast or brightness. In order to be resilient to non-linear changes in illumination, such as camera saturation, we reduce the effect of large gradient vectors by thresholding the feature vector so that no value is larger than 0.2. We then re-normalize.
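The normalize, clamp, and re-normalize sequence can be sketched as below. The function name `normalize_descriptor` is hypothetical; the 0.2 limit is the one given on the slide, and the descriptor entries are assumed to be non-negative histogram values.

```python
import math


def normalize_descriptor(vec, clamp=0.2):
    """Normalize to unit length, cap large entries, then re-normalize."""

    def to_unit_length(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)

    v = to_unit_length(vec)          # removes overall contrast changes
    v = [min(x, clamp) for x in v]   # limits the influence of large gradients
    return to_unit_length(v)         # restore unit length after clamping


raw = [10.0] + [1.0] * 127
descriptor = normalize_descriptor(raw)
```

Note that after the final re-normalization individual entries may again exceed 0.2; the cap is applied only between the two normalizations.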
27
Rotation Invariance
28
Scale Invariance
29
3D Perspective Resilience
30
Occlusion – with outliers
31
Occlusion
32
Tracking