Object Recognition using Local Invariant Features
Claudio Scordino
July 5th, 2006

Object Recognition
Widely used in industry for:
- Inspection
- Registration
- Manipulation
- Robot localization and mapping
Current commercial systems use correlation-based template matching:
- Computationally infeasible when object rotation, scale, illumination and 3D pose vary
- Even less feasible with partial occlusion
Alternative: local image features

Local Image Features
Unaffected by:
- Nearby clutter
- Partial occlusion
Invariant to:
- Illumination
- 3D projective transforms
- Common object variations
...but, at the same time, sufficiently distinctive to identify specific objects among many alternatives!

Related work
- Grouping of line segments, edges and regions: detection not reliable enough for recognition
- Detection of peaks in local image variations (e.g., the Harris corner detector): the image is examined at only a single scale, so different key locations are found as the image scale changes
- Eigenspace matching, color and receptive field histograms: successful on isolated objects, but do not extend to cluttered and partially occluded images

SIFT Method
- Scale Invariant Feature Transform (SIFT)
- Staged filtering approach
- Identifies stable points (image "keys")
- Computation time: less than 2 seconds

SIFT Method (2)
Local features are:
- Invariant to image translation, scaling and rotation
- Partially invariant to illumination changes and 3D projection (up to 20° of rotation)
- Minimally affected by noise
- Similar in their properties to the neurons in the Inferior Temporal cortex used for object recognition in primate vision

First stage
- Input: original image (512 x 512 pixels)
- Goal: key localization and image description
- Output: SIFT keys, i.e. feature vectors describing the local image region sampled relative to its scale-space coordinate frame

First stage (2)
Description:
- Represents blurred image gradient locations in multiple orientation planes and at multiple scales
- Approach based on a model of cells in the cerebral cortex of mammalian vision
- Less than 1 second of computation time
Builds a pyramid of images:
- Images are difference-of-Gaussian (DOG) functions
- Resampling between each level

Key localization
Algorithm: expand the original image by a factor of 2 using bilinear interpolation, then for each pyramid level:
1. Smooth the input image through a convolution in the horizontal direction with the 1D Gaussian function
   g(x) = (1 / (√(2π)·σ)) · e^(−x² / (2σ²)), with σ = √2,
   obtaining Image A

Key localization (2)
2. Smooth Image A through a further convolution with the 1D Gaussian function (vertical direction), obtaining Image B
3. The DOG image of this level is B − A
4. Resample Image B using bilinear interpolation with pixel spacing 1.5 in each direction and use the result as the input image of the next pyramid level; each new sample is a constant linear combination of 4 adjacent pixels
A sketch of these steps is given below.
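A minimal sketch of this pyramid construction in Python with NumPy and SciPy, under the assumption of a grayscale input image; the function names (gaussian_kernel_1d, dog_pyramid) are illustrative, not from the paper:

    import numpy as np
    from scipy.ndimage import convolve1d, zoom

    def gaussian_kernel_1d(sigma, radius=4):
        # Sampled 1D Gaussian, normalized to sum to 1.
        x = np.arange(-radius, radius + 1)
        g = np.exp(-x**2 / (2 * sigma**2))
        return g / g.sum()

    def dog_pyramid(image, levels=4, sigma=np.sqrt(2)):
        # Step 0: expand the original image by a factor of 2 (bilinear).
        img = zoom(image.astype(float), 2, order=1)
        g = gaussian_kernel_1d(sigma)
        pyramid = []
        for _ in range(levels):
            a = convolve1d(img, g, axis=1)  # step 1: horizontal pass -> Image A
            b = convolve1d(a, g, axis=0)    # step 2: vertical pass   -> Image B
            pyramid.append(b - a)           # step 3: DOG image = B - A
            # Step 4: resample B with 1.5-pixel spacing (bilinear).
            img = zoom(b, 1 / 1.5, order=1)
        return pyramid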

Key localization (3)
Find the maxima and minima of the DOG images: each pixel is compared with its neighbours at its own level and at the adjacent pyramid levels.
[Figure: extrema detection across the 1st and 2nd pyramid levels]
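A sketch of the extremum test on a single DOG level (NumPy/SciPy); for brevity it checks only the 3x3 neighbourhood at the same level, whereas the full method also compares against the closest pixels at the adjacent levels:

    import numpy as np
    from scipy.ndimage import maximum_filter, minimum_filter

    def dog_extrema(dog):
        # A pixel is a candidate key if it equals the max or min of
        # its 3x3 neighbourhood (the pixel itself included).
        mx = maximum_filter(dog, size=3)
        mn = minimum_filter(dog, size=3)
        return np.argwhere((dog == mx) | (dog == mn))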

Key orientation
1. Extract image gradients and orientations at each pyramid level. For each pixel A_ij compute the gradient magnitude M_ij and orientation R_ij:
   M_ij = √( (A_ij − A_(i+1,j))² + (A_ij − A_(i,j+1))² )
   R_ij = atan2( A_ij − A_(i+1,j), A_(i,j+1) − A_ij )
2. M_ij is thresholded at a value of 0.1 times the maximum possible gradient value, which provides robustness to illumination changes
[Figure: image gradient magnitude and image gradient orientation]
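In code, the two formulas above translate directly to pixel differences; the √2 bound on the maximum possible gradient assumes intensities in [0, 1], an assumption of this sketch rather than something stated in the slides:

    import numpy as np

    def gradient_mag_ori(A):
        dv = A[:-1, :-1] - A[1:, :-1]   # A_ij - A_(i+1,j)
        dh = A[:-1, 1:] - A[:-1, :-1]   # A_(i,j+1) - A_ij
        M = np.hypot(dv, dh)            # gradient magnitude M_ij
        R = np.arctan2(dv, dh)          # gradient orientation R_ij
        # Threshold at 0.1 times the maximum possible gradient value
        # (sqrt(2) if intensities lie in [0, 1]).
        M[M < 0.1 * np.sqrt(2)] = 0.0
        return M, R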

Key orientation (2)
3. Create an orientation histogram using a circular Gaussian-weighted window with σ = 3 times the current smoothing scale:
- The weights are multiplied by M_ij
- The histogram is smoothed prior to peak selection
- The orientation is determined by the peak in the histogram
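A sketch of the histogram step, assuming the M and R maps from the previous slide; the 36-bin resolution and the simple circular smoothing are illustrative choices, not specified in the slides:

    import numpy as np

    def dominant_orientation(M, R, ci, cj, sigma, nbins=36):
        # Gaussian-weighted orientation histogram around key (ci, cj),
        # with sigma = 3x the current smoothing scale as in the text.
        rad = int(3 * sigma)
        hist = np.zeros(nbins)
        for i in range(max(ci - rad, 0), min(ci + rad + 1, M.shape[0])):
            for j in range(max(cj - rad, 0), min(cj + rad + 1, M.shape[1])):
                w = np.exp(-((i - ci)**2 + (j - cj)**2) / (2 * sigma**2))
                b = int(nbins * (R[i, j] + np.pi) / (2 * np.pi)) % nbins
                hist[b] += w * M[i, j]
        # Smooth the histogram prior to peak selection (circular wrap).
        hist = np.convolve(np.r_[hist[-1], hist, hist[0]],
                           np.ones(3) / 3, mode='valid')
        # The orientation is the peak of the smoothed histogram.
        return 2 * np.pi * np.argmax(hist) / nbins - np.pi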

Experimental results
[Figure: original image, and keys on the image after rotation (15°), scaling (90%), horizontal stretching (110%), change of brightness (−10%) and contrast (90%), and addition of pixel noise; 78% of the keys are still matched]

Experimental results (2)
Image transformation        | Location and scale match | Orientation match
Decrease contrast by 1.2    | 89.0 %                   | 86.6 %
Decrease intensity by 0.2   | 88.5 %                   | 85.9 %
Rotate by 20°               | 85.4 %                   | 81.0 %
Scale by 0.7                | 85.1 %                   | 80.3 %
Stretch by 1.2              | 83.5 %                   | 76.1 %
Stretch by 1.5              | 77.7 %                   | 65.0 %
Add 10% pixel noise         | 90.3 %                   | 88.4 %
All previous                | 78.6 %                   | 71.8 %
(20 different images, around 15,000 keys)

Image description
- Approach suggested by the response properties of complex neurons in the visual cortex: a feature position is allowed to vary over a small region, while orientation and spatial frequency are maintained
- The image is described through 8 orientation planes
- Keys are inserted into the planes according to their orientations
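A rough sketch of the orientation-planes idea, assuming the gradient maps M and R from the earlier stage; the blur width is an illustrative parameter (blurring each plane is what lets a feature's position vary over a small region while its orientation is maintained):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def orientation_planes(M, R, nplanes=8, blur=2.0):
        # Assign each gradient sample to one of 8 planes by orientation.
        sector = (nplanes * (R + np.pi) / (2 * np.pi)).astype(int) % nplanes
        planes = np.zeros((nplanes,) + M.shape)
        for p in range(nplanes):
            plane = np.where(sector == p, M, 0.0)
            planes[p] = gaussian_filter(plane, blur)  # tolerate small shifts
        return planes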

Second stage
- Goal: identify candidate object matches
- The best candidate match is the nearest neighbour, i.e. the key with minimum Euclidean distance between descriptor vectors
- The exact solution for high-dimensional vectors is known to have high complexity

Second stage (2)
Algorithm: approximate Best-Bin-First (BBF) search method (Beis and Lowe)
- A modification of the k-d tree algorithm
- Identifies the nearest neighbours with high probability and little computation
- The keys generated at the larger scale are given twice the weight of those at the smaller scale: this improves recognition by giving more weight to the least-noisy scale
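The slides do not give the BBF code itself; as a stand-in with the same flavour, SciPy's k-d tree supports approximate nearest-neighbour queries, where eps > 0 lets the search stop early, much like BBF's bounded number of bin checks:

    import numpy as np
    from scipy.spatial import cKDTree

    def match_keys(db_desc, query_desc, eps=0.5):
        # Approximate nearest neighbour in Euclidean distance between
        # descriptor vectors (db_desc: database keys, query_desc: image keys).
        tree = cKDTree(db_desc)
        dist, idx = tree.query(query_desc, k=1, eps=eps)
        return dist, idx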

Third stage
- Description: final verification
- Algorithm: low-residual least-squares fit, solving the linear system x = (AᵀA)⁻¹ Aᵀ b
- When at least 3 keys agree with low residual, there is strong evidence for the presence of the object
- Since there are dozens of keys in the image, this also works with partial occlusion
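In code, the normal-equations solution x = (AᵀA)⁻¹ Aᵀ b is what a standard least-squares solver computes (more stably); how the pose parameters are packed into A and b is not spelled out in the slides, so this sketch shows only the solver step:

    import numpy as np

    def fit_pose(A, b):
        # Low-residual least-squares fit: minimizes ||Ax - b||^2, which
        # equals x = (A^T A)^{-1} A^T b when A has full column rank.
        x, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
        return x, residuals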

Perspective projection

Partial occlusion
Computation time: 1.5 seconds on a Sun Sparc 10 (0.9 seconds for the first stage)

Connections to human vision
- The performance of human vision is obviously far superior to that of current computer vision...
- The brain uses a highly computationally-intensive parallel process instead of a staged filtering approach

Connections to human vision (2)
However... the results are much the same. Recent research in neuroscience has shown that the neurons of the Inferior Temporal cortex:
- Recognize shape features of roughly the same complexity as SIFT features
- Also recognize color and texture properties in addition to shape
Further research: 3D structure of objects; additional feature types for color and texture

Augmented Reality (AR)
Registration of virtual objects into a live video sequence.
Current AR systems:
- Rely on markers strategically placed in the environment
- Need manual camera calibration

Related work
- Harris corner detector and Kanade-Lucas-Tomasi (KLT) tracker: not enough feature invariance
- Tracking of parallelogram-shaped and elliptical image regions: requires planar structures in the viewed scene
- Pre-built, user-supplied CAD object models: not always available, and limited to objects that can be easily modelled
- Off-line batch processing of the entire video

AR using SIFT
Flexible, automated AR. Not needed:
- Camera pre-calibration
- Prior knowledge of scene geometry
- Manual initialization of the tracker
- Placement of special markers
- Special tools or equipment (just a camera)
Short time and small effort to set up; robust tracking with 6 degrees of freedom

AR using SIFT (2)
Needs only a set of reference images acquired by a handheld, uncalibrated camera from unknown, spatially separated, arbitrary viewpoints:
- At least two images; typically 5 to 20 images, separated by at most 45°
- Used to build a 3D model of the viewed scene

AR using SIFT (3)
First (off-line) stage:
1. Extract SIFT features from the reference images
2. Establish multi-view correspondences
3. Build a metric model of the real world
4. Compute calibration parameters and camera poses
5. The user places the virtual object: the placement is achieved by anchoring the object projection in the first image; then a second projection is adjusted in the second image; finally, the user fine-tunes position, orientation and size

AR using SIFT (4)
Second (on-line) stage:
1. Features are detected in the current frame
2. Features are matched to those of the model using the BBF algorithm
3. The matches are used to compute the current pose of the camera
4. The solution is stabilized by using the values computed for the previous frame
A sketch of this loop is given below.
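A schematic of the on-line loop; extract, match and solve_pose stand for the stages described above and must be supplied by the caller (hypothetical names, shown only to make the data flow explicit):

    def track(frames, model_desc, model_pts, extract, match, solve_pose):
        pose, poses = None, []
        for frame in frames:
            keys, desc = extract(frame)        # 1. detect features
            _, idx = match(model_desc, desc)   # 2. match to the model (BBF)
            # 3-4. compute the camera pose, seeded with the previous one.
            pose = solve_pose(keys, model_pts[idx], init=pose)
            poses.append(pose)
        return poses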

AR using SIFT: prototype
Software:
- C programming language
- OpenGL and GLUT libraries
Hardware:
- IBM ThinkPad, Pentium 4-M processor (1.8 GHz)
- Logitech QuickCam Pro 4000 camera

Operation                | Computation time
Feature extraction       | 150 msec
Feature matching         | 40 msec
Camera pose computation  | 25 msec

Overall: 4 FPS

AR using SIFT: drawbacks
- The tracker is very slow: 4 FPS (frames per second), too slow for real-time operation (25 FPS); the main bottleneck is feature extraction
- Unable to handle occlusion of the inserted virtual content by real objects: a full model of the observed scene would be required

AR using SIFT: examples
Videos: mug, tabletop

Conclusions
Object recognition using SIFT:
- Reliable recognition
- Several characteristics in common with human vision
Augmented reality using SIFT:
- Very flexible
- Not yet possible in real time due to high computation times; possible in the future with faster processors

References
- David G. Lowe, "Object recognition from local scale-invariant features", International Conference on Computer Vision, Corfu, Greece (September 1999).
- Stephen Se, David G. Lowe and Jim Little, "Vision-based mobile robot localization and mapping using scale-invariant features", Proceedings of the IEEE International Conference on Robotics and Automation, Seoul, Korea (May 2001).
- Iryna Gordon and David G. Lowe, "Scene modelling, recognition and tracking with invariant image features", International Symposium on Mixed and Augmented Reality (ISMAR), Arlington, VA (November 2004).

For any question...
David Lowe
Computer Science Department
2366 Main Mall
University of British Columbia
Vancouver, B.C., V6T 1Z4, Canada