Advanced Features Jana Kosecka CS223b Slides from: S. Thrun, D. Lowe, Forsyth and Ponce.

CS223b 2 Advanced Features: Topics Template matching SIFT features Haar features

CS223b 3 Features for Object Detection/Recognition Want to find … in here

CS223b 4 Template Convolution Pick a template - rectangular/square region of an image Goal - find it in the same image/images of the same scene from Different viewpoint

CS223b 5 Convolution with Templates
% read image and convert the first channel to doubles in [0, 1]
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 255;
imshow(bw)
% apply FFT and invert it again as a sanity check
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)
% define a kernel (horizontal difference)
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);
% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)
% select an image patch and subtract its mean
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - mean(patch(:));
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);
% apply the kernel and keep responses above half the maximum
% (this product is a circular convolution; FFTim .* conj(FFTkernel) would give correlation)
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result > 0.5);
imshow(result)
% alternative convolution
imshow(conv2(bw, patch, 'same'))

CS223b 6 Template Convolution

CS223b 7 Aside: Convolution Theorem. The Fourier transform F of g is invertible. Convolution in the spatial domain is multiplication in the frequency domain, F(f * g) = F(f) F(g), which is often more efficient when an FFT is available.
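
The theorem can be checked numerically on a short 1-D signal. The sketch below is an illustration only, not the lecture's code: it uses a naive O(n^2) DFT written out by hand (the names `dft`, `idft` and `circular_convolve` are made up for this example), and it assumes circular (wrap-around) convolution, which is what a product of discrete Fourier transforms computes.

```python
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]

def circular_convolve(f, g):
    """Direct circular convolution in the spatial domain."""
    n = len(f)
    return [sum(f[m] * g[(k - m) % n] for m in range(n)) for k in range(n)]

f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
g = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # difference kernel, as in the MATLAB demo

direct = circular_convolve(f, g)
via_fourier = [v.real for v in idft([a * b for a, b in zip(dft(f), dft(g))])]
max_error = max(abs(a - b) for a, b in zip(direct, via_fourier))
```

Both routes give the same signal up to rounding error. The speed advantage mentioned on the slide only appears with a real FFT, which replaces the quadratic loops used here.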

CS223b 10 Feature Matching with Templates. Given a template, find the region in the image with the highest matching score. Matching score: the location where the result of convolution is maximal (or use the SSD, SAD, or NSS similarity measures). Given a rotated, scaled, or perspectively distorted version of the image, can we find the same patch? We want invariance! Scaling: no. Rotation: no. Illumination: depends. Perspective projection: no.
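
As a concrete instance of the SSD score mentioned above, here is a minimal brute-force search over all template positions. It is a sketch under simple assumptions: images are plain nested lists, the helper names `ssd` and `best_match` are invented for this example, and the lowest SSD counts as the best match.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and one image window."""
    total = 0.0
    for r, row in enumerate(template):
        for c, t in enumerate(row):
            d = image[top + r][left + c] - t
            total += d * d
    return total

def best_match(image, template):
    """Return (score, row, col) of the window with the smallest SSD."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    candidates = [(ssd(image, template, r, c), r, c)
                  for r in range(ih - th + 1)
                  for c in range(iw - tw + 1)]
    return min(candidates)

image = [
    [0, 0, 0, 0, 0],
    [0, 9, 1, 0, 0],
    [0, 2, 8, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 1], [2, 8]]
score, row, col = best_match(image, template)
```

The search only slides the template over translations, which is why the slide answers "no" for scaling, rotation and perspective: none of those transformations is among the candidates being scored.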

CS223b 11 Scale Invariance: Image Pyramid

CS223b 12 Aliasing Effects Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer Slide credit: Gary Bradski

CS223b 13 “Drop” vs “Smooth and Drop”: drop every second pixel; smooth and drop every second pixel. Aliasing problems.
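
The difference between the two strategies shows up even on a 1-D signal. The sketch below assumes a [1, 2, 1] / 4 smoothing filter with replicated borders, which is a common choice but is not stated on the slide. The function names are hypothetical.

```python
def drop(signal):
    """Keep every second sample without any filtering."""
    return signal[::2]

def smooth_and_drop(signal):
    """Blur with a [1, 2, 1] / 4 filter (assumed), then keep every second sample."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        left = signal[max(i - 1, 0)]        # replicate the border sample
        right = signal[min(i + 1, n - 1)]
        smoothed.append((left + 2 * signal[i] + right) / 4)
    return smoothed[::2]

stripes = [0, 1] * 8              # finest possible stripe pattern
dropped = drop(stripes)           # all zeros: the stripes vanish entirely
averaged = smooth_and_drop(stripes)
```

Plain dropping turns the half-bright stripe pattern into a uniformly black signal, while smoothing first yields values near the true average of 0.5 (apart from the border sample).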

CS223b 14 Improved Invariance Handling Want to find … in here

CS223b 15 SIFT Features Invariances: Scaling Rotation Illumination Deformation Provides Good localization Yes Not really Yes

CS223b 16 SIFT Reference Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004), pp SIFT = Scale Invariant Feature Transform

CS223b 17 Invariant Local Features Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters SIFT Features

CS223b 18 Advantages of invariant local features. Locality: features are local, so robust to occlusion and clutter (no prior segmentation). Distinctiveness: individual features can be matched to a large database of objects. Quantity: many features can be generated for even small objects. Efficiency: close to real-time performance. Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness.

CS223b 19 SIFT On-A-Slide
1. Enforce invariance to scale: Compute difference-of-Gaussian responses for many different scales; apply non-maximum suppression to find local maxima: keypoint candidates.
2. Localizable corner: For each maximum, fit a quadratic function. Compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: Compute the ratio of eigenvalues (principal curvatures); drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: Compute the orientation, to achieve rotation invariance, by finding the dominant gradient direction in the smoothed image (possibly multiple orientations). Rotate the patch so that the orientation points up.
5. Compute feature signature: Compute a "gradient histogram" of the local image region in a 4x4 pixel region. Do this for 4x4 regions of that size. Orient so that the largest gradient points up (possibly multiple solutions). Result: feature vector with 128 values (16 fields, 8 gradients).
6. Enforce invariance to illumination change and camera saturation: Normalize to unit length to increase invariance to illumination. Then threshold all gradients, to become invariant to camera saturation.

CS223b 20 Finding “Keypoints” (Corners). Idea: find corners, but with scale invariance. Approach: run a linear filter (difference of Gaussians) at different resolutions of the image pyramid.

CS223b 21 Difference of Gaussians. [Figure: one Gaussian minus another Gaussian equals the difference-of-Gaussians kernel.] Approximates the Laplacian (see filtering lecture).

CS223b 22 Difference of Gaussians
% plot a narrow Gaussian, a wide Gaussian, and their difference
surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))
% filter the image with DoG kernels of increasing width
im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
    gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
    res = abs(conv2(bw, gaussD, 'same'));
    res = res / max(max(res));
    imshow(res); title(['\bf i = ' num2str(i)]); drawnow
end
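
Two properties of the difference-of-Gaussians kernel explain why it behaves like a blob and edge detector. The 1-D sketch below follows the same "wide minus narrow" convention as the MATLAB loop, with the wide sigma set to twice the narrow one. The function names and the radius of 20 samples are choices made for this illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian, normalised to sum to one."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def dog_kernel(sigma, radius):
    """Wide Gaussian (2 * sigma) minus narrow Gaussian (sigma)."""
    narrow = gaussian_kernel(sigma, radius)
    wide = gaussian_kernel(2 * sigma, radius)
    return [w - n for w, n in zip(wide, narrow)]

kernel = dog_kernel(sigma=2.0, radius=20)
kernel_sum = sum(kernel)      # approximately zero
centre = kernel[20]           # most negative entry
```

Because the entries sum to (nearly) zero, constant image regions produce no response. The strongly negative centre surrounded by positive flanks is the inverted "Mexican hat" shape that approximates the Laplacian.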

CS223b 23-32 Gaussian Kernel Size: difference-of-Gaussians filter responses for i = 1, 2, ..., 10 (one figure per slide).

CS223b 33 Key point localization. In D. Lowe’s paper the image is decomposed into octaves (consecutively sub-sampled versions of the same image). Instead of convolving with ever larger kernels, the same kernels are reused in every octave. Detect maxima and minima of the difference-of-Gaussian in scale space. Compare each sample with its 3x3 neighbourhood in space and in the adjacent scales.

CS223b 34 Example of keypoint detection (a) 233x189 image (b) 832 DOG extrema (c) 729 above threshold

CS223b 36 Example of keypoint detection. Threshold on the value at the DOG peak and on the ratio of principal curvatures (Harris approach). (c) 729 left after peak value threshold (from 832). (d) 536 left after testing ratio of principal curvatures.
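
The curvature-ratio test can be evaluated without computing eigenvalues explicitly. The sketch below follows the formulation in Lowe's paper as far as I recall it: with the 2x2 Hessian entries of the DoG image at the keypoint, the keypoint is kept if trace^2 / det < (r + 1)^2 / r. The default r = 10 is the value reported in that paper, not something stated on the slide, and the function name is hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a flat direction): not a stable peak.
        return False
    return trace * trace / det < (r + 1) ** 2 / r

blob_like = passes_edge_test(dxx=-4.0, dyy=-3.5, dxy=0.2)   # similar curvatures
edge_like = passes_edge_test(dxx=-8.0, dyy=-0.1, dxy=0.0)   # one curvature dominates
```

A blob has comparable curvature in both directions and passes. An edge has a large curvature across it and almost none along it, so the ratio explodes and the keypoint is dropped.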

CS223b 38 Select canonical orientation Create histogram of local gradient directions computed at selected scale Assign canonical orientation at peak of smoothed histogram Each key specifies stable 2D coordinates (x, y, scale, orientation)
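
A simplified version of the orientation histogram might look as follows. This is a sketch with stated simplifications: it uses 36 bins of 10 degrees (the number used in Lowe's paper), weights each sample by gradient magnitude only, and returns the centre of the strongest bin. The Gaussian weighting, histogram smoothing and secondary peaks of the full method are left out, and `dominant_orientation` is a made-up name.

```python
import math

def dominant_orientation(gradients, bins=36):
    """gradients: iterable of (dx, dy); returns the centre angle of the peak bin in degrees."""
    hist = [0.0] * bins
    width = 360.0 / bins
    for dx, dy in gradients:
        magnitude = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 360.0
        hist[int(angle // width) % bins] += magnitude
    peak = max(range(bins), key=lambda b: hist[b])
    return (peak + 0.5) * width

grads = [(1.0, 1.0), (2.0, 2.1), (0.9, 1.0), (-0.2, 0.1)]
angle = dominant_orientation(grads)
```

Three of the four sample gradients point at roughly 45 degrees, so the 40 to 50 degree bin wins. Rotating the patch by this angle before computing the descriptor is what gives rotation invariance.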

CS223b 40 SIFT vector formation Thresholded image gradients are sampled over 16x16 array of locations in scale space Create array of orientation histograms 8 orientations x 4x4 histogram array = 128 dimensions
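
The final normalisation of the 128-dimensional vector (4 x 4 histograms times 8 orientations) can be sketched as below. The clamp value of 0.2 is taken from Lowe's paper rather than from the slides, and the input vector here is synthetic, chosen only to show the effect.

```python
import math

def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalise, clamp large entries, then unit-normalise again."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else list(v)
    clipped = [min(x, clamp) for x in unit(vec)]
    return unit(clipped)

raw = [1.0] * 127 + [50.0]          # 128 entries, one dominated by a huge gradient
desc = normalize_descriptor(raw)
brighter = normalize_descriptor([3.0 * x for x in raw])   # same patch, 3x contrast
```

Scaling all gradients by a constant leaves the descriptor unchanged, which covers multiplicative illumination changes. The clamp limits the influence of a few very large gradients, such as those caused by saturation.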

CS223b 41 Nearest-neighbor matching to feature database. Hypotheses are generated by approximate nearest neighbor matching of each feature to vectors in the database. SIFT uses the best-bin-first (Beis & Lowe, 97) modification to the k-d tree algorithm. Use a heap data structure to identify bins in order by their distance from the query point. Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time.
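
For reference, the exact search that best-bin-first approximates is a linear scan over the whole database. The sketch below shows only that baseline, not the k-d tree or the heap-ordered bin search. The three-dimensional vectors stand in for 128-dimensional descriptors, and the function name is hypothetical.

```python
def nearest_neighbor(query, database):
    """Exact linear scan; returns (index, squared Euclidean distance)."""
    best_index, best_dist = -1, float("inf")
    for i, vec in enumerate(database):
        dist = sum((a - b) ** 2 for a, b in zip(query, vec))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist

database = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
index, dist = nearest_neighbor([0.9, 0.1, 0.0], database)
```

The cost of this scan grows linearly with the database size for every query, which is the motivation for trading a small loss in accuracy for the large speedup quoted on the slide.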

CS223b 42 3D Object Recognition Extract outlines with background subtraction

CS223b 43 3D Object Recognition Only 3 keys are needed for recognition, so extra keys provide robustness Affine model is no longer as accurate

CS223b 44 Recognition under occlusion

CS223b 45 Test of illumination invariance Same image under differing illumination 273 keys verified in final match

CS223b 46 Examples of view interpolation

CS223b 47 Location recognition

CS223b 48 SIFT Invariances: Scaling Rotation Illumination Perspective Projection Provides Good localization Yes Maybe Yes State-of-the-art in invariant feature matching! Alternative detectors/descriptors/references can be found at

CS223b 49 SOFTWARE for Matlab (at UCLA)

CS223b 50 SIFT demos Run sift_compile sift_demo2

CS223b 51 Advanced Features: Topics SIFT Features Learning with Many Simple Features

CS223b 52 A totally different idea Use many very simple features Learn cascade of tests for target object Efficient if: features easy to compute cascade short

CS223b 53 Using Many Simple Features Viola Jones / Haar Features (Generalized) Haar Features: rectangular blocks, white or black 3 types of features: two rectangles: horizontal/vertical three rectangles four rectangles in 24x24 window: 180,000 possible features
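
The size of the feature pool can be counted directly. The sketch below assumes five upright shapes (two-rectangle horizontal and vertical, three-rectangle horizontal and vertical, four-rectangle), every integer scaling of each shape, and every position inside the window. Under those assumptions the total is somewhat below the rounded figure on the slide. The exact number depends on which shapes and scales are counted.

```python
def count_features(window, base_w, base_h):
    """Count placements of all integer scalings of a base_w x base_h shape."""
    count = 0
    for w in range(base_w, window + 1, base_w):
        for h in range(base_h, window + 1, base_h):
            count += (window - w + 1) * (window - h + 1)
    return count

shapes = {
    "two-rectangle horizontal": (2, 1),
    "two-rectangle vertical": (1, 2),
    "three-rectangle horizontal": (3, 1),
    "three-rectangle vertical": (1, 3),
    "four-rectangle": (2, 2),
}
counts = {name: count_features(24, w, h) for name, (w, h) in shapes.items()}
total = sum(counts.values())
```

Either way the pool is far larger than the 576 pixels in the window, so evaluating every feature for every sub-window is out of the question.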

CS223b 54 Integral Image. Def: The integral image at location (x,y) is the sum of the pixel values above and to the left of (x,y), inclusive. We can calculate the integral image representation of the image in a single pass: s(x,y) = s(x,y-1) + i(x,y) and ii(x,y) = ii(x-1,y) + s(x,y), where i is the input image, s is the cumulative sum along y, and ii is the integral image, with the origin (0,0) at the top-left corner. Slide credit: Gyozo Gidofalvi
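
The two recurrences translate almost line for line into code. In this sketch the image is a nested list indexed as `img[y][x]`, and out-of-range terms (s(x,-1) and ii(-1,y)) are taken to be zero. The function name is chosen for this example.

```python
def integral_image(img):
    """Single-pass integral image using s(x,y) and ii(x,y) from the slide."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0                                   # s(x, y): running sum down column x
        for y in range(h):
            s += img[y][x]                      # s(x,y) = s(x,y-1) + i(x,y)
            ii[y][x] = s + (ii[y][x - 1] if x > 0 else 0)   # ii(x,y) = ii(x-1,y) + s(x,y)
    return ii

result = integral_image([[1, 2, 3],
                         [4, 5, 6]])
```

Each pixel is visited exactly once, and the bottom-right entry of the result equals the sum of the whole image.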

CS223b 55 Efficient Computation of Rectangle Value. Using the integral image representation one can compute the value of any rectangular sum in constant time. Example: the sum over rectangle D is ii(4) + ii(1) - ii(2) - ii(3), where 1 to 4 are the integral image values at the four corners of D. As a result two-, three-, and four-rectangle features can be computed with 6, 8 and 9 array references respectively. Slide credit: Gyozo Gidofalvi
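
The four-lookup rule is shown below on a small example. To avoid special cases at the border, this sketch pads the integral image with a leading row and column of zeros. That padding is a convenience assumed here, not something the slide prescribes, and the helper names are hypothetical.

```python
def padded_integral_image(img):
    """Integral image with an extra zero row and column at the top and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle from four lookups: ii(4) + ii(1) - ii(2) - ii(3)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] + ii[top][left] - ii[top][right] - ii[bottom][left]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = padded_integral_image(img)
d = rect_sum(ii, top=1, left=1, height=2, width=2)              # 6 + 7 + 10 + 11
two_rect = rect_sum(ii, 0, 0, 3, 2) - rect_sum(ii, 0, 2, 3, 2)  # left half minus right half
```

The two-rectangle feature is written here as two separate four-lookup sums for clarity. Because the two rectangles share an edge, two of the eight lookups coincide, which is where the count of 6 array references comes from.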

CS223b 56 Idea 1: Linear Separator Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 57 Linear Separator for Image features (highly related to Vapnik’s Support Vector Machines) Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 58 Problem. How to find the hyperplane? How to avoid evaluating 180,000 features? Answer: Boosting [AdaBoost, Freund/Schapire]. Finds a small set of features that are “sufficient”. Generalizes very well (a lot of max-margin theory). Requires positive and negative examples.

CS223b 59 AdaBoost Idea (in Viola/Jones): Given set of “weak” classifiers: Pick best one Reweight training examples, so that misclassified images have larger weight Reiterate; then linearly combine resulting classifiers Weak classifiers: Haar features
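
The loop described above (pick the best weak classifier, reweight, repeat, combine linearly) can be sketched on a toy problem. Note the substitution: the weak classifiers here are simple thresholds on a single number, not Haar features, and the data set is a small hand-made example. The weight update uses the standard discrete AdaBoost formulas. All function names are chosen for this illustration.

```python
import math

def train_adaboost(xs, ys, rounds):
    """Discrete AdaBoost with threshold stumps on scalar inputs; labels are +1 / -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []                                   # entries: (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for thr in sorted(set(xs)):                 # pick the stump with lowest weighted error
            for pol in (1, -1):
                preds = [pol if x >= thr else -pol for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol, preds)
        err, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, pol))
        # Misclassified examples (y * p == -1) get larger weights.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = train_adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

No single threshold separates this data (the best one misclassifies three of ten points), yet the weighted vote of three thresholds classifies all ten correctly.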

CS223b 60 AdaBoost Idea (in Viola/Jones): We will discuss the classification later. Sneak preview of AdaBoost and results on face and car detection … to be continued when discussing object detection and recognition.

CS223b 61 AdaBoost. [Figure: weak classifier 1; weights increased; weak classifier 2; weak classifier 3.] Final classifier is a linear combination of weak classifiers. Freund & Schapire

CS223b 62 AdaBoost Algorithm. Freund & Schapire

CS223b 63 AdaBoost gives an efficient classifier: features = weak classifiers. Each round selects the optimal feature given the previously selected features and the exponential loss. AdaBoost surprise: generalization error decreases even after all training examples are 100% correctly classified (margin maximization phenomenon).

CS223b 64 Boosted Face Detection: Image Features “Rectangle filters” Unique Binary Features Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 65 Example Classifier for Face Detection. A classifier with 200 rectangle features was learned using AdaBoost. 95% correct detection on test set with 1 in false positives. [Figure: ROC curve for 200 feature classifier.] Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 66 Classifiers are Efficient. Given a nested set of classifier hypothesis classes. [Figure: ROC curve (% detection vs. % false positives; false positives vs. false negatives) and cascade diagram: IMAGE SUB-WINDOW -> Classifier 1 -> Classifier 2 -> Classifier 3 -> FACE; each F branch exits to NON-FACE.] Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 67 Cascaded Classifier. [Figure: IMAGE SUB-WINDOW -> 1 feature (50%) -> 5 features (20%) -> 20 features (2%) -> FACE; each F branch exits to NON-FACE.] A 1 feature classifier achieves 100% detection rate and about 50% false positive rate. A 5 feature classifier achieves 100% detection rate and 40% false positive rate (20% cumulative) using data from the previous stage. A 20 feature classifier achieves 100% detection rate with 10% false positive rate (2% cumulative). Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce
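
The cumulative figures follow from multiplying the per-stage rates, since a sub-window must pass every stage to be reported as a face. The sketch below reproduces that arithmetic with the stage rates quoted on the slide. It treats the stages as independent, which is a simplifying assumption.

```python
def cumulative_rates(stages):
    """stages: list of (detection_rate, false_positive_rate) for each stage."""
    detection, false_pos = 1.0, 1.0
    history = []
    for d, f in stages:
        detection *= d
        false_pos *= f
        history.append((detection, false_pos))
    return history

stages = [(1.0, 0.5), (1.0, 0.4), (1.0, 0.1)]   # 1, 5 and 20 feature classifiers
history = cumulative_rates(stages)
final_detection, final_false_pos = history[-1]
```

After three stages only about 2% of non-face windows survive, while half of them were already rejected by the single-feature stage. That early rejection is what keeps the average cost per window low.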

CS223b 68 Output of Face Detector on Test Images Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 69 Solving other “Face” Tasks: Facial Feature Localization, Demographic Analysis, Profile Detection. Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 70 Face Localization Features Learned features reflect the task Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 71 Face Profile Detection Slide credit: Frank Dellaert, Paul Viola, Forsyth&Ponce

CS223b 72 Face Profile Features

CS223b 73 Finding Cars (DARPA Urban Challenge) Hand-labeled images of generic car rear-ends Training time: ~5 hours, offline 1100 images Credit: Hendrik Dahlkamp

CS223b 74 Generating even more examples Generic classifier finds all cars in recorded video. Compute offline and store in database images Credit: Hendrik Dahlkamp

CS223b 75 Results - Video

CS223b 76 Summary: Viola-Jones. Many simple features: generalized Haar features (multi-rectangles), easy and efficient to compute. Discriminative learning: finds a small subset for object recognition, uses AdaBoost. Result: feature cascade, 15 fps on a 700 MHz laptop (= fast!). Applications: face detection, car detection, many others.