Stanford CS223B Computer Vision, Winter 2007. Lecture 5: Advanced Image Filters. Professors Sebastian Thrun and Jana Košecká. CAs: Vaibhav Vaish and David Stavens.


Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features

Features in Matlab

im = imread('bridge.jpg');
bw = rgb2gray(im);
edge(bw, 'sobel')   % (almost) linear
edge(bw, 'canny')   % not local, no closed form

Sobel Operator
Two 3x3 kernels give derivative estimates S1 and S2 in the two image directions.
Edge magnitude = sqrt(S1^2 + S2^2)
Edge direction = tan^-1(S1 / S2)

Sobel in Matlab

edge(bw, 'sobel')

Canny Edge Detector

edge(bw, 'canny')

Comparison: Canny vs. Sobel

Canny Edge Detection
Steps:
1. Apply derivative of Gaussian
2. Non-maximum suppression: thin multi-pixel-wide "ridges" down to single-pixel width
3. Linking and thresholding: low and high edge-strength thresholds; accept all edges over the low threshold that are connected to an edge over the high threshold

Non-Maximum Suppression
Select the single maximum point across the width of an edge.

Linking to the Next Edge Point
Assume the marked point q is an edge point. Take the normal to the gradient at that point and use it to predict continuation points (either r or p).

Edge Hysteresis
- Hysteresis: a lag or momentum factor
- Idea: maintain two thresholds k_high and k_low
  - Use k_high to find strong edges that start an edge chain
  - Use k_low to find weak edges that continue an edge chain
- Typical ratio of thresholds: roughly k_high / k_low = 2
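The hysteresis step can be sketched in a few lines of Python. This is a simplified illustration on a toy gradient-magnitude grid; the grid values and thresholds are invented for the example, and real Canny runs this on the thinned magnitude image.

```python
from collections import deque

def hysteresis(mag, k_low, k_high):
    """Keep pixels above k_low that connect (8-neighborhood) to a pixel above k_high."""
    h, w = len(mag), len(mag[0])
    keep = [[False] * w for _ in range(h)]
    q = deque((y, x) for y in range(h) for x in range(w) if mag[y][x] >= k_high)
    for y, x in q:                      # seed the chains with strong edges
        keep[y][x] = True
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not keep[ny][nx] and mag[ny][nx] >= k_low:
                    keep[ny][nx] = True  # weak edge continues a chain
                    q.append((ny, nx))
    return keep

# Toy magnitudes: the weak pixels connected to the strong pixel (value 12) survive;
# the isolated weak pixel at (2, 0) does not.
mag = [[0, 5, 5, 0],
       [0, 0, 12, 0],
       [5, 0, 0, 0]]
edges = hysteresis(mag, k_low=4, k_high=10)   # note k_high / k_low ≈ 2
```

The chain-following is exactly the "strong edges start, weak edges continue" rule from the slide.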

Canny Edge Detection (Example)
[Figure, courtesy of G. Loy: original image; strong edges only; weak edges; strong + connected weak edges (the gap is gone).]

Canny Edge Detection (Example)
Using Matlab with default thresholds

Bridge Example Again

edge(bw, 'canny')

Corner Effects

Summary: Canny Edge Detection
- Most commonly used method
- Traces edges, accommodates variations in contrast
- Not a linear filter!
- Problems with corners

Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features

Towards Global Features
Local versus global

Vanishing Points

Vanishing Points

Vanishing Points
A. Canaletto [1740], Arrival of the French Ambassador in Venice

Vanishing Points...?
A. Canaletto [1740], Arrival of the French Ambassador in Venice

From Edges to Lines

Hough Transform
[Figure: a line through points in (x, y) image space.]

Hough Transform: Quantization
Detect lines by finding maxima / clustering in the quantized (m, b) parameter space.

Hough Transform: Algorithm
- For each image point, determine:
  - the most likely line parameters b, m (from the direction of the gradient)
  - strength (magnitude of the gradient)
- Increment the parameter counter by the strength value
- Cluster in parameter space, pick local maxima
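A minimal sketch of the voting scheme in Python, assuming an integer (m, b) grid and unit-strength votes. A real implementation bins b = y - m*x for each quantized m and weights votes by gradient magnitude, as the algorithm above says; the points here are invented for the example.

```python
def hough_lines(points, m_range, b_range):
    """Vote in (m, b) space: each point (x, y) votes for every line y = m*x + b it lies on."""
    acc = {}
    for m in m_range:
        for b in b_range:
            for x, y in points:
                if y == m * x + b:          # quantized membership test
                    acc[(m, b)] = acc.get((m, b), 0) + 1
    return acc

points = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 2)]   # four collinear points plus an outlier
acc = hough_lines(points, range(-3, 4), range(-3, 4))
best = max(acc, key=acc.get)
# the line y = 2x + 1 collects the most votes, despite the outlier
```

Clustering in the accumulator (here, taking the arg-max) is what turns local edge evidence into a global line, which is exactly why the method tolerates the outlier.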

Hough Transform: Results
[Figure: image; edge detection; Hough transform.]

Summary: Hough Transform
- Smart counting
  - Local evidence for global features
  - Organized in a table
  - Careful with the parameterization!
- Problem: curse of dimensionality
  - Works great for simple features with 3 unknowns
  - Will fail for complex objects
- Problem: not a local algorithm

Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features

Features for Object Detection/Recognition
Want to find ... in here

Templates
- Find an object in an image!
- We want invariance!
  - Scaling
  - Rotation
  - Illumination
  - Perspective Projection

Convolution with Templates

% read image
im = imread('bridge.jpg');
bw = double(im(:,:,1)) ./ 256;
imshow(bw)

% apply FFT
FFTim = fft2(bw);
bw2 = real(ifft2(FFTim));
imshow(bw2)

% define a kernel
kernel = zeros(size(bw));
kernel(1, 1) = 1;
kernel(1, 2) = -1;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = real(ifft2(FFTresult));
imshow(result)

% select an image patch
patch = bw(221:240, 351:370);
imshow(patch)
patch = patch - (sum(sum(patch)) / size(patch,1) / size(patch,2));
kernel = zeros(size(bw));
kernel(1:size(patch,1), 1:size(patch,2)) = patch;
FFTkernel = fft2(kernel);

% apply the kernel and check out the result
FFTresult = FFTim .* FFTkernel;
result = max(0, real(ifft2(FFTresult)));
result = result ./ max(max(result));
result = (result .^ 1 > 0.5);
imshow(result)

% alternative convolution
imshow(conv2(bw, patch, 'same'))

Template Convolution

Template Convolution
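The idea behind template convolution, independent of the FFT machinery in the Matlab code above, can be sketched in plain Python: subtract the patch mean (as the Matlab code does) and slide the patch over the image, keeping the position of highest correlation. The toy image and patch are invented for illustration.

```python
def match_template(image, patch):
    """Slide a zero-mean patch over the image; return the (row, col) of highest correlation."""
    ph, pw = len(patch), len(patch[0])
    mean = sum(sum(row) for row in patch) / (ph * pw)
    zp = [[v - mean for v in row] for row in patch]   # zero-mean patch
    best, best_pos = float("-inf"), None
    for y in range(len(image) - ph + 1):
        for x in range(len(image[0]) - pw + 1):
            score = sum(zp[j][i] * image[y + j][x + i]
                        for j in range(ph) for i in range(pw))
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos

image = [[0, 0, 0, 0, 0],
         [0, 9, 1, 0, 0],
         [0, 1, 9, 0, 0],
         [0, 0, 0, 0, 0]]
patch = [[9, 1],
         [1, 9]]
pos = match_template(image, patch)
```

The FFT version in the slide computes the same correlation scores for all positions at once, via the convolution theorem.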

Aside: Convolution Theorem
F(f * g) = F(f) . F(g), where F denotes the Fourier transform: convolution in the image domain is pointwise multiplication in the frequency domain. F is invertible, so f * g = F^-1(F(f) . F(g)).


Convolution with Templates
- Invariances:
  - Scaling: No
  - Rotation: No
  - Illumination: No
  - Perspective Projection: No
- Provides:
  - Good localization: Yes

Scale Invariance: Image Pyramid

Pyramid Convolution with Templates
- Invariances:
  - Scaling: Yes
  - Rotation: No
  - Illumination: No
  - Perspective Projection: No
- Provides:
  - Good localization: Yes

Pyramid Warning: Aliasing

Aliasing Effects
Constructing a pyramid by taking every second pixel leads to layers that badly misrepresent the top layer. (Slide credit: Gary Bradski)

Solution to Aliasing
- Convolve with a Gaussian before subsampling
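A sketch of an anti-aliased pyramid in Python, using a [1, 2, 1]/4 binomial kernel as a cheap stand-in for the Gaussian, on a 1-D signal. The signal is chosen as a worst case: it alternates at the Nyquist frequency, so naive every-second-sample subsampling would see a constant 0 (or constant 8) and badly misrepresent it.

```python
def blur(signal):
    """Approximate Gaussian smoothing with a [1, 2, 1] / 4 binomial kernel (edges replicated)."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
            for i in range(n)]

def pyramid(signal, num_levels):
    """Blur before taking every second sample, so high frequencies do not alias."""
    out = [signal]
    for _ in range(num_levels - 1):
        signal = blur(signal)[::2]
        out.append(signal)
    return out

levels = pyramid([0, 8, 0, 8, 0, 8, 0, 8], 3)
# the smaller levels hold values near the true local mean of 4, not a spurious constant
```

Real pyramids use a wider Gaussian, but the order of operations (smooth, then subsample) is the whole point of the slide.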

Templates with Image Pyramid
- Invariances:
  - Scaling: Yes
  - Rotation: No (maybe rotate the template?)
  - Illumination: Not really
  - Perspective Projection: No
- Provides:
  - Good localization: No

Template Matching, Commercial

Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features

Improved Invariance Handling
Want to find ... in here

SIFT Features
- Invariances:
  - Scaling: Yes
  - Rotation: Yes
  - Illumination: Yes
  - Deformation: Not really
- Provides:
  - Good localization: Yes

SIFT Reference
Distinctive image features from scale-invariant keypoints. David G. Lowe, International Journal of Computer Vision, 60, 2 (2004).
SIFT = Scale Invariant Feature Transform

Invariant Local Features
- Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters (SIFT features).

Advantages of Invariant Local Features
- Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: many features can be generated for even small objects
- Efficiency: close to real-time performance
- Extensibility: can easily be extended to a wide range of differing feature types, with each adding robustness

SIFT On-A-Slide
1. Enforce invariance to scale: compute difference-of-Gaussian maxima at many different scales; non-maximum suppression; find local maxima: keypoint candidates.
2. Localizable corner: for each maximum, fit a quadratic function. Compute the center with sub-pixel accuracy by setting the first derivative to zero.
3. Eliminate edges: compute the ratio of eigenvalues; drop keypoints for which this ratio is larger than a threshold.
4. Enforce invariance to orientation: compute an orientation, to achieve rotation invariance, by finding the strongest second-derivative direction in the smoothed image (possibly multiple orientations). Rotate the patch so that the orientation points up.
5. Compute feature signature: compute a "gradient histogram" of the local image region in a 4x4 pixel region. Do this for 4x4 regions of that size. Orient so that the largest gradient points up (possibly multiple solutions). Result: a feature vector with 128 values (16 fields, 8 gradients each).
6. Enforce invariance to illumination change and camera saturation: normalize to unit length to increase invariance to illumination; then threshold all gradients, to become invariant to camera saturation.


Finding "Keypoints" (Corners)
Idea: find corners, but with scale invariance.
Approach:
- Run a linear filter (difference of Gaussians)
- Do this at different resolutions of the image pyramid

Difference of Gaussians
[Figure: one Gaussian minus a narrower Gaussian equals the DoG kernel.]

Difference of Gaussians

surf(fspecial('gaussian',40,4))
surf(fspecial('gaussian',40,8))
surf(fspecial('gaussian',40,8) - fspecial('gaussian',40,4))

Find Corners with DiffOfGauss

im = imread('bridge.jpg');
bw = double(im(:,:,1)) / 256;
for i = 1 : 10
  gaussD = fspecial('gaussian',40,2*i) - fspecial('gaussian',40,i);
  res = abs(conv2(bw, gaussD, 'same'));
  res = res / max(max(res));
  imshow(res);
  title(['\bf i = ' num2str(i)]);
  drawnow
end

Gaussian Kernel Size
[Figures: DoG filter responses for i = 1 through i = 10; larger kernels respond to progressively larger image structure.]

Key Point Localization
- Detect maxima and minima of the difference-of-Gaussian in scale space
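The extremum test can be sketched in Python: a sample is a keypoint candidate only if it beats all 26 neighbors in the 3x3x3 block around it in scale space (8 at its own scale, 9 at the scale above, 9 below). The three-level DoG stack below is a toy with invented values.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater or strictly smaller than all 26
    neighbors in the 3x3x3 scale-space block around it."""
    c = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    return all(c > n for n in neighbors) or all(c < n for n in neighbors)

# Toy 3-scale DoG stack with one clear peak at scale 1, position (1, 1)
dog = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]],
       [[0, 1, 0], [1, 9, 1], [0, 1, 0]],
       [[0, 0, 0], [0, 2, 0], [0, 0, 0]]]
```

In the full method this test is followed by the sub-pixel quadratic fit and the edge-ratio test from the summary slide.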

Example of Keypoint Detection
(a) 233x189 image
(b) 832 DoG extrema
(c) 729 above threshold


Example of Keypoint Detection
Threshold on the value at the DoG peak and on the ratio of principal curvatures (Harris approach):
(c) 729 left after the peak-value threshold (from 832)
(d) 536 left after testing the ratio of principal curvatures


Select Canonical Orientation
- Create a histogram of local gradient directions computed at the selected scale
- Assign the canonical orientation at the peak of the smoothed histogram
- Each key specifies stable 2D coordinates (x, y, scale, orientation)
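A sketch of the orientation histogram in Python, with 36 bins of 10 degrees and magnitude-weighted votes as in Lowe's method; the histogram smoothing and the handling of secondary peaks are omitted, and the toy gradient lists are invented.

```python
import math

def dominant_orientation(dxs, dys, bins=36):
    """Histogram gradient directions (weighted by magnitude); return the peak bin's center angle."""
    hist = [0.0] * bins
    for dx, dy in zip(dxs, dys):
        angle = math.atan2(dy, dx) % (2 * math.pi)
        mag = math.hypot(dx, dy)
        hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    peak = max(range(bins), key=lambda b: hist[b])
    return (peak + 0.5) * 360.0 / bins   # bin center, in degrees

# Gradients mostly pointing up (90 degrees), with two weak horizontal outliers
dxs = [0, 0, 0, 1, -1]
dys = [2, 3, 2, 0, 0]
angle = dominant_orientation(dxs, dys)
```

Rotating the patch by this angle before building the descriptor is what makes the final feature rotation-invariant.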


SIFT Vector Formation
- Thresholded image gradients are sampled over a 16x16 array of locations in scale space
- Create an array of orientation histograms
- 8 orientations x 4x4 histogram array = 128 dimensions
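The 128-dimensional layout can be sketched in Python. This toy version skips the Gaussian weighting, the trilinear interpolation between bins, and the 0.2 gradient clamp of the full descriptor; the uniform test patch is invented.

```python
import math

def sift_descriptor(dx, dy):
    """Build a 4x4 grid of 8-bin orientation histograms over a 16x16 gradient patch,
    then normalize to unit length: a 128-dimensional vector."""
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            angle = math.atan2(dy[y][x], dx[y][x]) % (2 * math.pi)
            mag = math.hypot(dx[y][x], dy[y][x])
            cell = (y // 4) * 4 + (x // 4)          # which of the 4x4 regions
            b = int(angle / (2 * math.pi) * 8) % 8  # which of the 8 orientation bins
            desc[cell * 8 + b] += mag
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]              # unit length, per step 6 of the summary

dx = [[1.0] * 16 for _ in range(16)]   # toy patch: uniform rightward gradient
dy = [[0.0] * 16 for _ in range(16)]
d = sift_descriptor(dx, dy)
```

With a uniform rightward gradient, exactly one orientation bin fires in each of the 16 cells.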

Nearest-Neighbor Matching to Feature Database
- Hypotheses are generated by approximate nearest-neighbor matching of each feature to vectors in the database
  - SIFT uses the best-bin-first (Beis & Lowe, 97) modification to the k-d tree algorithm
  - Uses a heap data structure to identify bins in order by their distance from the query point
- Result: can give a speedup by a factor of 1000 while finding the nearest neighbor (of interest) 95% of the time
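Lowe's system uses a best-bin-first k-d tree for speed; the sketch below shows only the matching logic, brute force with a heap, including the ratio test that rejects ambiguous matches. The 0.8 ratio and the toy 2-D descriptors are for illustration only.

```python
import heapq

def match(query, database, ratio=0.8):
    """Return the index of the nearest database descriptor, or None if the match is
    ambiguous (nearest distance not clearly smaller than the second nearest)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    two = heapq.nsmallest(2, range(len(database)), key=lambda i: dist2(query, database[i]))
    d1, d2 = dist2(query, database[two[0]]), dist2(query, database[two[1]])
    # ratio test on distances; squared distances, hence ratio**2
    return two[0] if d1 < (ratio ** 2) * d2 else None

db = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]]
good = match([0.1, 0.1], db)       # clearly closest to db[0]
ambiguous = match([0.5, 0.0], db)  # equidistant from db[0] and db[1] -> rejected
```

Rejecting ambiguous matches is what keeps the generated hypotheses clean enough for the later geometric verification.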

3-D Object Recognition
- Extract outlines with background subtraction

3-D Object Recognition
- Only 3 keys are needed for recognition, so extra keys provide robustness
- The affine model is no longer as accurate

Recognition Under Occlusion

Test of Illumination Invariance
- Same image under differing illumination
- 273 keys verified in the final match

Examples of View Interpolation

Location Recognition

SIFT
- Invariances:
  - Scaling: Yes
  - Rotation: Yes
  - Illumination: Yes
  - Perspective Projection: Maybe
- Provides:
  - Good localization: Yes

SIFT for Matlab (at UCLA)

SIFT Demos
Run:
sift_compile
sift_demo2

Summary SIFT
The six steps of "SIFT On-A-Slide" (scale invariance, sub-pixel localization, edge elimination, orientation assignment, the 128-value signature, and illumination normalization) define the state of the art in invariant feature matching!

Advanced Features: Topics
- Advanced Edge Detection
- Global Image Features (Hough Transform)
- Templates, Image Pyramid
- SIFT Features
- Learning with Many Simple Features

A Totally Different Idea
- Use many very simple features
- Learn a cascade of tests for the target object
- Efficient if:
  - features are easy to compute
  - the cascade is short

Using Many Simple Features
- Viola/Jones: (generalized) Haar features: rectangular blocks, white or black
- 3 types of features:
  - two rectangles (horizontal/vertical)
  - three rectangles
  - four rectangles
- In a 24x24 window: 180,000 possible features

Integral Image
Def: the integral image at location (x, y) is the sum of the pixel values above and to the left of (x, y), inclusive. We can calculate the integral image representation of the image in a single pass:
s(x, y) = s(x, y-1) + i(x, y)
ii(x, y) = ii(x-1, y) + s(x, y)
(Slide credit: Gyozo Gidofalvi)

Efficient Computation of Rectangle Value
Using the integral image representation, one can compute the value of any rectangular sum in constant time. Example: rectangle D = ii(4) + ii(1) - ii(2) - ii(3). As a result, two-, three-, and four-rectangle features can be computed with 6, 8, and 9 array references respectively. (Slide credit: Gyozo Gidofalvi)
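The two slides above can be condensed into a short Python sketch: one pass builds the integral image from the running row sum s(x, y), and any rectangle sum then costs four array references. The 3x3 test image is invented.

```python
def integral_image(img):
    """ii(x, y) = sum of img over all pixels above and to the left, inclusive (one pass)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0  # s(x, y): cumulative sum along the current row
        for x in range(w):
            row += img[y][x]
            ii[y][x] = row + (ii[y - 1][x] if y > 0 else 0)
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over rectangle [top..bottom] x [left..right] via ii(4) + ii(1) - ii(2) - ii(3)."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
s = rect_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

A two-rectangle Haar feature is then just the difference of two such rectangle sums.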

Idea 1: Linear Separator
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Linear Separator for Image Features
(Highly related to Vapnik's Support Vector Machines.)
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Problem
- How to find the hyperplane?
- How to avoid evaluating 180,000 features?
- Answer: Boosting [AdaBoost, Freund/Schapire]
  - Finds a small set of features that are "sufficient"
  - Generalizes very well (a lot of max-margin theory)
  - Requires positive and negative examples

AdaBoost
Idea (in Viola/Jones), with Haar features as the weak classifiers:
- Given a set of "weak" classifiers:
  - Pick the best one
  - Reweight the training examples, so that misclassified images have larger weight
  - Reiterate; then linearly combine the resulting classifiers

AdaBoost
[Figure: weak classifier 1; weights of misclassified examples increased; weak classifier 2; weights increased again; weak classifier 3]
- The final classifier is a linear combination of the weak classifiers.
(Freund & Schapire)

AdaBoost Algorithm
(Freund & Schapire)
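The loop on this slide can be sketched with threshold-based decision stumps as the weak learners. This is a hedged illustration, not the exact Viola-Jones code: each hypothetical stump (j, theta, p) predicts +1 when p * x[j] < p * theta, matching the weak classifiers used in the paper.

```python
import numpy as np

def adaboost(X, y, stumps, T):
    """Discrete AdaBoost sketch. X: (n, d) feature matrix, y in {-1, +1},
    stumps: candidate (feature_index, threshold, polarity) triples."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # uniform initial example weights
    ensemble = []
    for _ in range(T):
        best = None
        for j, theta, p in stumps:        # pick the lowest weighted error
            pred = np.where(p * X[:, j] < p * theta, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, theta, p, pred)
        err, j, theta, p, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this round
        w *= np.exp(-alpha * y * pred)             # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, theta, p))
    return ensemble

def predict(ensemble, X):
    """Final classifier: sign of the weighted vote of the weak classifiers."""
    score = sum(a * np.where(p * X[:, j] < p * theta, 1, -1)
                for a, j, theta, p in ensemble)
    return np.sign(score)
```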

AdaBoost gives an efficient classifier
- Features = weak classifiers
- Each round selects the optimal feature given:
  - the previously selected features
  - the exponential loss
- AdaBoost surprise:
  - Generalization error keeps decreasing even after all training examples are classified 100% correctly (margin-maximization phenomenon)

Boosted Face Detection: Image Features
- "Rectangle filters"
- Unique binary features
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Example Classifier for Face Detection
- A classifier with 200 rectangle features was learned using AdaBoost.
- 95% correct detection on the test set, with 1 in … false positives.
[Figure: ROC curve for the 200-feature classifier]
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Classifiers are Efficient
- Given a nested set of classifier hypothesis classes, the trade-off between detections and false negatives is determined by the % false positives and % detection chosen at each stage.
[Figure: each image sub-window passes through classifiers 1, 2, 3 in turn; a "T" (true) forwards the window to the next classifier, an "F" (false) rejects it as NON-FACE; windows surviving all stages are labeled FACE]
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Cascaded Classifier
[Figure: image sub-window → 1-feature classifier (50% pass) → 5-feature classifier (20% cumulative) → 20-feature classifier (2% cumulative) → FACE; any "F" rejects as NON-FACE]
- A 1-feature classifier achieves a 100% detection rate and about a 50% false positive rate.
- A 5-feature classifier achieves a 100% detection rate and a 40% false positive rate (20% cumulative), using data from the previous stage.
- A 20-feature classifier achieves a 100% detection rate with a 10% false positive rate (2% cumulative).
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)
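The cumulative percentages above are just products of per-stage rates, since a window must pass every stage. A tiny sketch, with the stage rates taken from this slide:

```python
def cascade_rates(stages):
    """Cumulative (detection, false-positive) rates of a classifier cascade.
    Each stage is (detection_rate, false_positive_rate); because a window
    must pass every stage, the per-stage rates multiply."""
    d, f = 1.0, 1.0
    for det, fp in stages:
        d *= det
        f *= fp
    return d, f

# Stages from the slide: 1-feature (100% det, 50% FP),
# 5-feature (100%, 40%), 20-feature (100%, 10%).
d, f = cascade_rates([(1.0, 0.5), (1.0, 0.4), (1.0, 0.1)])
print(d, f)   # cumulative false-positive rate: 0.02, i.e. 2%
```

This multiplicative structure is also why the cascade is fast: the cheap early stages reject most sub-windows, so the expensive later stages run on only a small fraction of the image.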

Output of Face Detector on Test Images
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Solving Other "Face" Tasks
- Facial feature localization
- Demographic analysis
- Profile detection
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Face Localization Features
- The learned features reflect the task.
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Face Profile Detection
(Slide credit: Frank Dellaert, Paul Viola, Forsyth & Ponce)

Face Profile Features

Finding Cars (DARPA Urban Challenge)
- Hand-labeled images of generic car rear-ends (1100 images)
- Training time: ~5 hours, offline
(Credit: Hendrik Dahlkamp)

Generating Even More Examples
- The generic classifier finds all cars in recorded video.
- Computed offline and stored in an image database.
(Credit: Hendrik Dahlkamp)

Results - Video

Summary: Viola-Jones
- Many simple features
  - Generalized Haar features (multi-rectangles)
  - Easy and efficient to compute
- Discriminative learning
  - Finds a small subset of features for object recognition
  - Uses AdaBoost
- Result: feature cascade
  - 15 fps on a 700 MHz laptop (fast!)
- Applications
  - Face detection
  - Car detection
  - Many others