
1 Advanced Features Jana Kosecka CS223b Slides from: S. Thrun, D. Lowe, Forsyth and Ponce

2 CS223b 2 Advanced Features: Topics Template matching SIFT features Haar features

3 CS223b 3 Features for Object Detection/Recognition Want to find … in here

4 CS223b 4 Template Convolution Pick a template - a rectangular/square region of an image. Goal - find it in the same image, or in images of the same scene taken from a different viewpoint.

5 CS223b 5 Convolution with Templates

% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 255;
imshow(bw)

% apply FFT and invert it to check the round trip
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a simple difference kernel
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel in the frequency domain and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch and subtract its mean
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - mean(patch(:));

% embed the patch in a full-size kernel
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the patch kernel and threshold the normalized response
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(result(:));
result = result > 0.5;
imshow(result)

% alternative: direct spatial-domain convolution
imshow(conv2(bw, patch, 'same'))

6 CS223b 6 Template Convolution

7 CS223b 7 Aside: Convolution Theorem Fourier transform of g: F is invertible. Convolution in the spatial domain is multiplication in the frequency domain - often more efficient when a fast FFT is available.
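The theorem on this slide can be checked numerically. Below is a tiny Python/numpy sketch (a stand-in for the deck's MATLAB) showing that circular convolution with the `[1, -1]` difference kernel from the previous slide equals an inverse FFT of the product of FFTs:

```python
import numpy as np

# Convolution theorem sketch: conv(f, g) == IFFT(FFT(f) * FFT(g))
# for circular (periodic) convolution, which is what fft2-based
# filtering computes.
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, -1.0, 0.0, 0.0])   # the [1, -1] kernel, zero-padded

# Direct circular convolution in the spatial domain
direct = np.array([sum(f[k] * g[(n - k) % 4] for k in range(4))
                   for n in range(4)])

# Pointwise product in the frequency domain
via_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

assert np.allclose(direct, via_fft)
print(direct)  # [-3.  1.  1.  1.]: neighbour differences, wrapped circularly
```

The wrap-around in the first entry is the periodic boundary the MATLAB demo inherits by using `fft2` rather than padded spatial convolution.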

8 CS223b 8 Convolution with Templates (code identical to slide 5)

9 CS223b 9 Feature Matching with Templates Given a template, find the region in the image with the highest matching score - the location where the convolution result is maximal (or use SSD, SAD, or NCC similarity measures). Given a rotated, scaled, perspectively distorted version of the image, can we find the same patch? (We want invariance!) Scaling? Rotation? Illumination? Perspective projection?

10 CS223b 10 Feature Matching with Templates Given a template, find the region in the image with the highest matching score - the location where the convolution result is maximal (or use SSD, SAD, or NCC similarity measures). Given a rotated, scaled, perspectively distorted version of the image, can we find the same patch? (We want invariance!) Scaling - NO. Rotation - NO. Illumination - depends. Perspective projection - NO.
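The matching scores mentioned above (SSD and normalized cross-correlation) are easy to state in code. A hypothetical Python sketch - the function names `ssd` and `ncc` and the toy image are mine, not from the course code:

```python
import numpy as np

def ssd(patch, tmpl):
    """Sum of squared differences: best match MINIMIZES this."""
    return float(np.sum((patch - tmpl) ** 2))

def ncc(patch, tmpl):
    """Normalized cross-correlation: mean-subtracted, length-normalized
    dot product; best match MAXIMIZES this (robust to brightness/gain)."""
    p = patch - patch.mean()
    t = tmpl - tmpl.mean()
    return float(np.sum(p * t) / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-12))

# Slide every placement of a 2x2 template over a toy image and score it
img = np.zeros((8, 8))
img[3:5, 4:6] = 1.0                      # a bright 2x2 blob
tmpl = np.ones((2, 2))
scores = {(r, c): ssd(img[r:r+2, c:c+2], tmpl)
          for r in range(7) for c in range(7)}
best = min(scores, key=scores.get)
print(best)  # (3, 4): SSD is exactly zero where the blob sits
```

As the slide warns, none of these scores survives rotation or scaling of the patch; NCC only buys invariance to affine brightness changes.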

11 CS223b 11 Scale Invariance: Image Pyramid

12 CS223b 12 Aliasing Effects Constructing a pyramid by simply taking every second pixel leads to layers that badly misrepresent the top (full-resolution) layer. Slide credit: Gary Bradski

13 CS223b 13 “Drop” vs “Smooth and Drop” Left: drop every second pixel - aliasing problems. Right: smooth first, then drop every second pixel.
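The aliasing problem can be shown in one dimension: subsampling the highest-frequency pattern without smoothing destroys it entirely, while blurring first preserves its average energy. A small numpy sketch (the binomial kernel `[0.25, 0.5, 0.25]` stands in for a Gaussian):

```python
import numpy as np

# "Drop" vs "smooth and drop" on a 1-D stripe pattern at the Nyquist limit.
x = np.array([0.0, 1.0] * 8)          # alternating 0,1,0,1,... (16 samples)

drop = x[::2]                          # take every second sample
smooth = np.convolve(x, [0.25, 0.5, 0.25], mode='same')  # small blur first
smooth_drop = smooth[::2]

print(drop)         # all zeros: the stripes aliased away completely
print(smooth_drop)  # ~0.5 everywhere: the pattern's energy is kept as an average
```

This is exactly why the pyramid layers on the previous slide misrepresent the image when pixels are dropped without smoothing.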

14 CS223b 14 Improved Invariance Handling Want to find … in here

15 CS223b 15 SIFT Features Invariances: Scaling - yes; Rotation - yes; Illumination - yes; Deformation - not really. Provides good localization - yes.

16 CS223b 16 SIFT Reference Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp. 91-110. SIFT = Scale Invariant Feature Transform

17 CS223b 17 Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters SIFT Features

18 CS223b 18 Advantages of invariant local features Locality: features are local, so robust to occlusion and clutter (no prior segmentation) Distinctiveness: individual features can be matched to a large database of objects Quantity: many features can be generated for even small objects Efficiency: close to real-time performance Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness

19 CS223b 19 SIFT On-A-Slide
1. Enforce invariance to scale: compute difference-of-Gaussian extrema for many different scales; apply non-maximum suppression and keep local maxima: keypoint candidates.
2. Localizable corner: for each maximum, fit a quadratic function and compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: compute the ratio of eigenvalues (principal curvatures) and drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: compute the dominant gradient orientation in the smoothed image (possibly multiple orientations) and rotate the patch so that this orientation points up.
5. Compute feature signature: compute a "gradient histogram" over each 4x4-pixel cell of the local region, for a 4x4 grid of such cells; orient so that the largest gradient points up (possibly multiple solutions). Result: a feature vector with 128 values (16 histograms, 8 orientation bins each).
6. Enforce invariance to illumination change and camera saturation: normalize the vector to unit length to increase invariance to illumination, then threshold all gradients to become invariant to camera saturation.

20 CS223b 20 Finding “Keypoints” (Corners) Idea: find corners, but with scale invariance. Approach: run a linear filter (difference of Gaussians) at different resolutions of an image pyramid.

21 CS223b 21 Difference of Gaussians [One Gaussian] minus [a narrower Gaussian] equals [the DoG kernel]. Approximates the Laplacian (see filtering lecture).
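The "approximates the Laplacian" claim can be verified numerically: for nearby scales, the difference of two Gaussians is nearly proportional to the Laplacian of Gaussian (this follows from the heat equation, dG/dsigma = sigma * Laplacian(G)). A 1-D Python sketch - the scale step 1.0 to 1.1 is my choice for illustration:

```python
import numpy as np

# Compare a difference of Gaussians (sigma = 1.0 vs 1.1) with the
# analytic 1-D Laplacian of Gaussian at sigma = 1 (up to a scale factor).
x = np.linspace(-10, 10, 201)
g = lambda s: np.exp(-x**2 / (2 * s * s)) / (s * np.sqrt(2 * np.pi))

dog = g(1.1) - g(1.0)           # wider Gaussian minus narrower Gaussian
lap = (x**2 - 1.0) * g(1.0)     # d^2/dx^2 of a Gaussian, times sigma^2

# Cosine similarity between the two profiles: close to 1
corr = np.dot(dog, lap) / (np.linalg.norm(dog) * np.linalg.norm(lap))
print(round(corr, 3))
```

Both curves have the same center-surround shape (negative lobe at the origin, positive ring around it), which is why DoG filtering behaves like blob detection.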

22 CS223b 22 Difference of Gaussians

surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))

im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
    % difference of Gaussians with sigmas 2*i and i
    gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
    res = abs(conv2(bw, gaussD, 'same'));
    res = res / max(res(:));
    imshow(res); title(['\bf i = ' num2str(i)]); drawnow
end

23 CS223b 23 Gaussian Kernel Size i=1

24 CS223b 24 Gaussian Kernel Size i=2

25 CS223b 25 Gaussian Kernel Size i=3

26 CS223b 26 Gaussian Kernel Size i=4

27 CS223b 27 Gaussian Kernel Size i=5

28 CS223b 28 Gaussian Kernel Size i=6

29 CS223b 29 Gaussian Kernel Size i=7

30 CS223b 30 Gaussian Kernel Size i=8

31 CS223b 31 Gaussian Kernel Size i=9

32 CS223b 32 Gaussian Kernel Size i=10

33 CS223b 33 Keypoint localization In D. Lowe’s paper the image is decomposed into octaves (consecutively sub-sampled versions of the same image). Instead of convolving with ever larger kernels, the image is sub-sampled between octaves, so the kernels within an octave are kept the same size. Detect maxima and minima of the difference-of-Gaussian in scale space by examining each sample's 3x3x3 neighbourhood in scale and space.
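The 3x3x3 extremum test above is simple to state in code: a sample is a candidate if it is larger (or smaller) than all 26 neighbours - 8 at its own scale plus 9 in the scale above and 9 below. A hypothetical Python sketch (`is_extremum` and the toy DoG stack are mine):

```python
import numpy as np

def is_extremum(dog, s, r, c):
    """True iff dog[s, r, c] beats all 26 neighbours in its 3x3x3 block
    (dog is a stack of DoG images indexed as [scale, row, col])."""
    block = dog[s-1:s+2, r-1:r+2, c-1:c+2]    # 3 scales x 3 rows x 3 cols
    centre = dog[s, r, c]
    others = np.delete(block.ravel(), 13)      # flat index 13 is the centre
    return bool(np.all(centre > others) or np.all(centre < others))

# Toy stack: a single peak in the middle scale
dog = np.zeros((3, 5, 5))
dog[1, 2, 2] = 1.0

print(is_extremum(dog, 1, 2, 2))  # True: beats all 26 neighbours
print(is_extremum(dog, 1, 2, 3))  # False: its neighbour (1, 2, 2) is larger
```

In Lowe's pipeline these candidates are then refined to sub-pixel accuracy via the quadratic fit from the "SIFT On-A-Slide" steps.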

34 CS223b 34 Example of keypoint detection (a) 233x189 image (b) 832 DOG extrema (c) 729 above threshold

35 CS223b 35 SIFT On-A-Slide (steps repeated from slide 19)

36 CS223b 36 Example of keypoint detection Threshold on the value at the DOG peak and on the ratio of principal curvatures (Harris approach). (c) 729 left after the peak value threshold (from 832). (d) 536 left after testing the ratio of principal curvatures.

37 CS223b 37 SIFT On-A-Slide (steps repeated from slide 19)

38 CS223b 38 Select canonical orientation Create a histogram of local gradient directions computed at the selected scale. Assign the canonical orientation at the peak of the smoothed histogram. Each key then specifies stable coordinates (x, y, scale, orientation).
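The orientation-histogram step above can be sketched in a few lines: bin the gradient directions around the keypoint, weight each vote by gradient magnitude, and take the peak bin. The toy gradients below are mine; Lowe's paper uses 36 bins of 10 degrees, as here:

```python
import numpy as np

# Toy gradient field: three pixels point at 45 degrees, one at 0 degrees.
gx = np.array([[1.0, 1.0], [1.0, 1.0]])
gy = np.array([[1.0, 1.0], [1.0, 0.0]])

angles = np.degrees(np.arctan2(gy, gx)) % 360.0   # direction per pixel
mags = np.hypot(gx, gy)                            # magnitude per pixel

# Magnitude-weighted orientation histogram, 36 bins of 10 degrees
hist, edges = np.histogram(angles, bins=36, range=(0, 360), weights=mags)

dominant = edges[np.argmax(hist)]   # lower edge of the winning bin
print(dominant)  # 40.0 -> the 40-50 degree bin wins
```

In the full method the histogram is smoothed first, and any secondary peak within 80% of the maximum spawns an additional keypoint with its own orientation.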

39 CS223b 39 SIFT On-A-Slide (steps repeated from slide 19)

40 CS223b 40 SIFT vector formation Thresholded image gradients are sampled over a 16x16 array of locations in scale space. Create an array of orientation histograms: 8 orientations x a 4x4 histogram array = 128 dimensions.
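The descriptor assembly above can be sketched directly: split the 16x16 patch into a 4x4 grid of 4x4-pixel cells, take an 8-bin orientation histogram per cell, concatenate, and normalize. The random gradients below are toy data, and this omits Lowe's Gaussian weighting and trilinear interpolation:

```python
import numpy as np

# Toy gradient orientations/magnitudes for a 16x16 patch around a keypoint
rng = np.random.default_rng(0)
angles = rng.uniform(0, 360, (16, 16))
mags = rng.uniform(0, 1, (16, 16))

desc = []
for i in range(0, 16, 4):
    for j in range(0, 16, 4):                      # one 4x4-pixel cell
        h, _ = np.histogram(angles[i:i+4, j:j+4], bins=8, range=(0, 360),
                            weights=mags[i:i+4, j:j+4])
        desc.extend(h)                             # 8 numbers per cell

desc = np.asarray(desc)
desc /= np.linalg.norm(desc)     # unit length: illumination invariance
desc = np.minimum(desc, 0.2)     # clamp large gradients: camera saturation
desc /= np.linalg.norm(desc)     # renormalize, as in Lowe's paper

print(desc.shape)  # (128,) = 4 x 4 cells x 8 orientation bins
```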

41 CS223b 41 Nearest-neighbor matching to feature database Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database. SIFT uses the best-bin-first modification of the k-d tree algorithm (Beis & Lowe, 1997): a heap data structure identifies bins in order of their distance from the query point. Result: a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time.
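As a brute-force stand-in for the k-d tree / best-bin-first search, here is the matching logic itself in Python, including Lowe's distance-ratio test (a match is kept only if the nearest descriptor is clearly better than the second nearest; 0.8 is the threshold from Lowe's paper). The 2-D vectors are toy data in place of 128-D descriptors:

```python
import numpy as np

# Tiny "database" of descriptors and one query descriptor
db = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query = np.array([0.5, 0.0])

d = np.linalg.norm(db - query, axis=1)   # distance to every database vector
first, second = np.argsort(d)[:2]        # indices of 1st and 2nd nearest
accept = d[first] < 0.8 * d[second]      # Lowe's ratio test

print(first, accept)  # 0 True: db[0] is near and unambiguous
```

Best-bin-first replaces the exhaustive distance computation with a heap-ordered traversal of k-d tree bins, but the accept/reject logic is the same.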

42 CS223b 42 3D Object Recognition Extract outlines with background subtraction

43 CS223b 43 3D Object Recognition Only 3 keys are needed for recognition, so extra keys provide robustness Affine model is no longer as accurate

44 CS223b 44 Recognition under occlusion

45 CS223b 45 Test of illumination invariance Same image under differing illumination 273 keys verified in final match

46 CS223b 46 Examples of view interpolation

47 CS223b 47 Location recognition

48 CS223b 48 SIFT Invariances: Scaling - yes; Rotation - yes; Illumination - yes; Perspective projection - maybe. Provides good localization - yes. State-of-the-art in invariant feature matching! Alternative detectors/descriptors/references can be found at http://www.robots.ox.ac.uk/~vgg/software/

49 CS223b 49 SOFTWARE for Matlab (at UCLA)

50 CS223b 50 SIFT demos Run sift_compile, then sift_demo2.

51 CS223b 51 Advanced Features: Topics SIFT Features Learning with Many Simple Features

52 CS223b 52 A totally different idea Use many very simple features. Learn a cascade of tests for the target object. Efficient if the features are easy to compute and the cascade is short.

53 CS223b 53 Using Many Simple Features Viola-Jones / (generalized) Haar features: rectangular blocks, white or black. Three feature types: two rectangles (horizontal/vertical), three rectangles, four rectangles. In a 24x24 window: 180,000 possible features.

54 CS223b 54 Integral Image Def: the integral image at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive. It can be calculated in a single pass with the recurrences s(x,y) = s(x,y-1) + i(x,y) and ii(x,y) = ii(x-1,y) + s(x,y). Slide credit: Gyozo Gidofalvi

55 CS223b 55 Efficient Computation of Rectangle Value Using the integral image representation, the sum over any rectangle can be computed in constant time. Example: sum over rectangle D = ii(4) + ii(1) - ii(2) - ii(3), where ii(1)..ii(4) are the integral-image values at the rectangle's four corners. As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively. Slide credit: Gyozo Gidofalvi
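The single-pass construction and the four-corner rule can be sketched in Python (numpy's cumulative sums do the one pass; the function names are illustrative):

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[0..r, 0..c], built in one pass over the image."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0..r1, c0..c1] from four corner lookups:
    ii(4) + ii(1) - ii(2) - ii(3), with border cases handled."""
    total = ii[r1, c1]
    if r0 > 0:
        total -= ii[r0 - 1, c1]
    if c0 > 0:
        total -= ii[r1, c0 - 1]
    if r0 > 0 and c0 > 0:
        total += ii[r0 - 1, c0 - 1]
    return total

img = np.arange(16).reshape(4, 4)   # pixel values 0..15
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))     # 5 + 6 + 9 + 10 = 30
```

Because every rectangle sum costs four lookups regardless of size, a Haar feature's value is the same constant-time computation at any scale - the key to evaluating thousands of features per sub-window.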

56 CS223b 56 Idea 1: Linear Separator Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

57 CS223b 57 Linear Separator for Image features (highly related to Vapnik’s Support Vector Machines) Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

58 CS223b 58 Problem How to find the hyperplane? How to avoid evaluating all 180,000 features? Answer: Boosting [AdaBoost, Freund/Schapire]. Finds a small set of features that are “sufficient”. Generalizes very well (a lot of max-margin theory). Requires positive and negative examples.

59 CS223b 59 AdaBoost Idea (in Viola/Jones): Given a set of “weak” classifiers, pick the best one. Reweight the training examples so that misclassified images have larger weight. Reiterate; then linearly combine the resulting classifiers. Weak classifiers: Haar features.
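The pick-best / reweight / combine loop above can be sketched with threshold "stumps" standing in for Haar features. This is a minimal illustration on separable toy data, not the Viola-Jones implementation:

```python
import numpy as np

# Toy 1-D data: negatives below 2.5, positives above
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, -1, 1, 1, 1])
w = np.ones(6) / 6.0                      # uniform example weights

stumps, alphas = [], []
for _ in range(3):
    # 1. pick the stump (threshold t, sign s) with least weighted error
    t, s = min(((t, s) for t in X for s in (1, -1)),
               key=lambda ts: np.sum(w[np.sign(ts[1]*(X - ts[0]) + 1e-9) != y]))
    pred = np.sign(s * (X - t) + 1e-9)
    err = max(float(np.sum(w[pred != y])), 1e-9)
    alpha = 0.5 * np.log((1 - err) / err)  # vote weight for this stump
    # 2. reweight: misclassified examples get exponentially larger weight
    w = w * np.exp(-alpha * y * pred)
    w /= w.sum()
    stumps.append((t, s)); alphas.append(alpha)

# 3. final classifier: sign of the alpha-weighted linear combination
F = np.sign(sum(a * np.sign(s * (X - t) + 1e-9)
                for (t, s), a in zip(stumps, alphas)))
print(F)  # matches y
```

In Viola-Jones each "stump" is a Haar feature value compared against a learned threshold, so boosting doubles as feature selection.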

60 CS223b 60 AdaBoost Idea (in Viola/Jones): We will discuss the classification in detail later. Sneak preview of AdaBoost and results on face and car detection … to be continued when discussing object detection and recognition.

61 CS223b 61 AdaBoost [Illustration: weak classifier 1; weights of misclassified examples increased; weak classifier 2; weak classifier 3.] The final classifier is a linear combination of the weak classifiers. Freund & Schapire

62 CS223b 62 AdaBoost Algorithm Freund & Schapire

63 CS223b 63 AdaBoost gives an efficient classifier: features = weak classifiers. Each round selects the optimal feature given the previously selected features and the exponential loss. AdaBoost surprise: generalization error keeps decreasing even after all training examples are classified 100% correctly (margin-maximization phenomenon).

64 CS223b 64 Boosted Face Detection: Image Features “Rectangle filters” Unique Binary Features Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

65 CS223b 65 Example Classifier for Face Detection A classifier with 200 rectangle features was learned using AdaBoost: 95% correct detection on the test set with 1 in 14,084 false positives. [ROC curve for the 200-feature classifier.] Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce

66 CS223b 66 Classifiers are Efficient Given a nested set of classifier hypothesis classes, each image sub-window passes through a chain of classifiers: any classifier that rejects the sub-window (F) labels it NON-FACE immediately; only sub-windows accepted (T) by every classifier are labeled FACE. Each stage's detection vs. false-positive trade-off is determined by its ROC operating point. Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce

67 CS223b 67 Cascaded Classifier [Diagram: IMAGE SUB-WINDOW -> 1 feature (50% FP) -> 5 features (20% cumulative FP) -> 20 features (2% cumulative FP) -> FACE; any rejection -> NON-FACE.] A 1-feature classifier achieves a 100% detection rate with about a 50% false-positive rate. A 5-feature classifier achieves a 100% detection rate and a 40% false-positive rate (20% cumulative), using data from the previous stage. A 20-feature classifier achieves a 100% detection rate with a 10% false-positive rate (2% cumulative). Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce
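The cumulative rates on this slide are just products of the per-stage rates, which is why a cascade of weak stages can be strong overall. The arithmetic, spelled out:

```python
# Cascade arithmetic from the slide: detection and false-positive rates
# multiply along the chain of stages.
stage_det = [1.0, 1.0, 1.0]    # each stage keeps 100% of the faces
stage_fp = [0.5, 0.4, 0.1]     # per-stage false-positive rates

cum_det = cum_fp = 1.0
for d, f in zip(stage_det, stage_fp):
    cum_det *= d
    cum_fp *= f

print(cum_det, round(cum_fp, 3))  # 1.0 0.02
```

The same multiplication explains the design pressure: every stage must keep the detection rate very close to 100%, because detection losses compound just like false-positive gains.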

68 CS223b 68 Output of Face Detector on Test Images Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce

69 CS223b 69 Solving other “Face” Tasks: facial feature localization, demographic analysis, profile detection. Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce

70 CS223b 70 Face Localization Features Learned features reflect the task Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

71 CS223b 71 Face Profile Detection Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce

72 CS223b 72 Face Profile Features

73 CS223b 73 Finding Cars (DARPA Urban Challenge) Hand-labeled images of generic car rear-ends: 1,100 images. Training time: ~5 hours, offline. Credit: Hendrik Dahlkamp

74 CS223b 74 Generating even more examples A generic classifier finds all cars in recorded video; this is computed offline and stored in a database: 28,700 images. Credit: Hendrik Dahlkamp

75 CS223b 75 Results - Video

76 CS223b 76 Summary Viola-Jones Many simple features: generalized Haar features (multi-rectangles), easy and efficient to compute. Discriminative learning finds a small subset for object recognition, using AdaBoost. Result: a feature cascade running at 15 fps on a 700 MHz laptop (= fast!). Applications: face detection, car detection, many others.

