Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit.


Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit

Real-Time 3D Object Detection Runs at 15 Hz

3 Keypoint Recognition. The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification, with one class per keypoint: the set of the keypoint's possible appearances under various perspective, lighting, noise... Pre-processing makes the actual classification easier: a nearest-neighbor search in the database.

4 [Diagram: a training phase builds the classifier, which is used at run-time to recognize the keypoints.]

5 A New Classifier: Ferns Joint Work with Mustafa Özuysal

6 We are looking for the class c that maximizes P(C = c | patch). If the patch can be represented by a set of image features { f_i }, this is proportional to P(f_1, ..., f_N | C = c), but a complete representation of the joint distribution is infeasible. Naive Bayesian ignores the correlations: P(f_1, ..., f_N | C = c) = Prod_i P(f_i | C = c). Compromise: group the features into M small sets F_k (the "ferns") and ignore only the correlations between sets: P(f_1, ..., f_N | C = c) = Prod_k P(F_k | C = c).
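The grouped factorization above can be sketched as follows. This is an illustrative reimplementation, not Lepetit's code: the table layout prob[fern][output][class] and all names are my assumptions. The per-fern conditional probabilities are combined multiplicatively, so their logarithms are summed:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Semi-naive Bayes over ferns: the N = M*S binary features are split into
// M ferns of S features each. A fern's S bits form an index z in [0, 2^S),
// and the class score is the product over ferns of P(F_k = z_k | C = c),
// accumulated in log space to avoid underflow.
int classify(const std::vector<int>& bits,                               // M*S feature values (0/1)
             const std::vector<std::vector<std::vector<double>>>& prob,  // prob[fern][z][class]
             int M, int S, int H) {
    std::vector<double> score(H, 0.0);
    for (int k = 0; k < M; ++k) {
        int z = 0;
        for (int j = 0; j < S; ++j)           // pack the fern's S bits into an index
            z = (z << 1) | bits[k * S + j];
        for (int c = 0; c < H; ++c)
            score[c] += std::log(prob[k][z][c]);  // multiplicative combination
    }
    int best = 0;
    for (int c = 1; c < H; ++c)
        if (score[c] > score[best]) best = c;
    return best;
}
```

With S around 10 the tables stay small enough to store per fern, while still capturing the correlations between the S features inside each fern.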

Presentation on an Example

Ferns: Training. The tests compare the intensities of two pixels around the keypoint: f_j = 1 if I(p_j,1) < I(p_j,2), and 0 otherwise. This is invariant to light change by any increasing function. The posterior probabilities P(F_k | C = c) are estimated from training samples.
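A minimal sketch of such a test; the patch layout and coordinates are assumptions, not taken from the talk. Since only the ordering of two intensities matters, applying any increasing remapping to the image (gain, gamma, etc.) leaves the outcome unchanged:

```cpp
#include <cassert>

// f = 1 iff the intensity at (x1, y1) is smaller than at (x2, y2).
// Comparing two intensities is invariant under any increasing remapping
// of the gray levels, which is what makes the tests robust to light changes.
int binary_test(const unsigned char* patch, int stride,
                int x1, int y1, int x2, int y2) {
    return patch[y1 * stride + x1] < patch[y2 * stride + x2] ? 1 : 0;
}
```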

Ferns: Training

Ferns: Training

Ferns: Training Results

Ferns: Recognition

It Really Works

14 Ferns outperform Trees. 500 classes, no orientation or perspective correction. [Plot: recognition rate vs. number of structures, ferns vs. trees.] Ferns responses are combined multiplicatively (Naive Bayesian rule); trees responses are combined additively (average).

Optimized Locations versus Random Locations: We Can Use Random Tests. [Plot: comparison of the recognition rates for 200 keypoints vs. number of trees, information-gain optimization vs. randomness.]

16 We Can Use Random Tests For a small number of classes we can try several tests, and retain the best one according to some criterion.

17 We Can Use Random Tests For a small number of classes we can try several tests, and retain the best one according to some criterion. When the number of classes is large any test does a decent job:

18 Another Graphical Interpretation

19 Another Graphical Interpretation

20 Another Graphical Interpretation

21 Another Graphical Interpretation

22 Another Graphical Interpretation

23 Another Graphical Interpretation

24 We Can Use Random Tests: Why It Is Interesting. Building the ferns takes no time (except for the posterior probabilities estimation); simplifies the classifier structure; allows incremental learning.

25 Comparison with SIFT: recognition rate. [Plot: number of inliers vs. frame index, ferns vs. SIFT.]

26 Comparison with SIFT: computation time. SIFT: 1 ms to compute the descriptor of a keypoint (not including convolution); Ferns: 13.5 microseconds to classify one keypoint into 200 classes.

27 Keypoint Recognition in Ten Lines of Code

 1: for(int i = 0; i < H; i++) P[i] = 0.;
 2: for(int k = 0; k < M; k++) {
 3:   int index = 0, * d = D + k * 2 * S;
 4:   for(int j = 0; j < S; j++) {
 5:     index <<= 1;
 6:     if (*(K + d[0]) < *(K + d[1]))
 7:       index++;
 8:     d += 2; }
 9:   p = PF + k * shift2 + index * shift1;
10:   for(int i = 0; i < H; i++) P[i] += p[i]; }

Very simple to implement; no need for orientation nor perspective correction; (almost) no parameters to tune; very fast.
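For readers who want to run the snippet, here is one compilable reading of it. The meanings of the variables (H classes, M ferns, S tests per fern, K the patch pixels, D the per-fern test pixel offsets, PF the per-fern score tables) are inferred from the talk, so treat this as a sketch rather than the reference implementation:

```cpp
#include <cassert>
#include <vector>

// Expanded reading of the ten-line slide snippet.
// K: patch pixels; D: for each fern, S pairs of pixel offsets into K;
// PF: per-fern score tables laid out as PF[fern][index][class].
std::vector<float> recognize(const unsigned char* K, const int* D,
                             const float* PF, int H, int M, int S) {
    const int shift1 = H;            // one table row per fern output 'index'
    const int shift2 = (1 << S) * H; // one table block per fern
    std::vector<float> P(H, 0.f);
    for (int k = 0; k < M; ++k) {
        int index = 0;
        const int* d = D + k * 2 * S;
        for (int j = 0; j < S; ++j) {
            index <<= 1;
            if (K[d[0]] < K[d[1]]) index++;  // binary intensity test
            d += 2;
        }
        const float* p = PF + k * shift2 + index * shift1;
        for (int i = 0; i < H; ++i) P[i] += p[i];  // sum per-fern scores (log-posteriors)
    }
    return P;
}
```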

28 Ferns Tuning. The number of ferns and the number of tests per fern can be tuned to adapt to the hardware, in terms of CPU power and memory size.
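The dominant memory cost is the posterior tables: with M ferns, S tests per fern, and H classes, M * 2^S * H entries are stored, so S trades memory for per-fern discriminative power while M trades CPU time for robustness. A back-of-the-envelope sketch (the 4-bytes-per-entry figure below is my assumption, not from the talk):

```cpp
#include <cassert>
#include <cstddef>

// Number of stored table entries for M ferns of S binary tests over H classes:
// each fern has 2^S possible outputs, each with one entry per class.
std::size_t fern_table_entries(int M, int S, int H) {
    return static_cast<std::size_t>(M) * (std::size_t(1) << S) * H;
}
```

At 4 bytes per entry, 30 ferns of 10 tests over 200 classes need about 24.6 MB; dropping S from 10 to 8 divides the tables by 4, at some cost in recognition rate.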

Feature Harvesting Estimate the posterior probabilities from a training video sequence:

Feature Harvesting. [Loop diagram: detect object in current frame -> matches -> training examples -> update classifier.] With the ferns, we can easily: add a class; remove a class; add samples of a class to refine the classifier. => Incremental learning: no need to store image patches, and we can select the keypoints the classifier can recognize.

Test Sequence

Handling Light Changes


Low Complexity Keypoint Recognition and Pose Estimation

37 EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem. Joint Work with Francesc Moreno-Noguer

38 The Perspective-n-Point (PnP) Problem: the internal parameters and n 2D/3D correspondences are known; recover the rotation and translation. Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...]. How to take advantage of the internal parameters?

39 A Stable Algorithm. [Plots: mean and median rotation error (%) vs. number of points used to estimate pose.] LHM: Lu-Hager-Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images, PAMI'00 (alternately optimizes over rotation and translation); EPnP: our method.

40 A Fast Algorithm. [Plot: median rotation error (%) vs. computation time (sec), logarithmic scale.]

41 General Approach. Estimate the coordinates of the 3D points in the camera coordinate system; the rotation and translation then follow [Lu et al. PAMI00].

42 Introducing Control Points. The 3D points are expressed as weighted sums of four control points: p_i = Sum_j a_ij c_j, with Sum_j a_ij = 1. => 12 unknowns: the coordinates of the control points in the camera coordinate system.
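The barycentric relation can be sketched as below (names are mine; computing the weights a_ij in the first place amounts to solving a small 4x4 linear system, omitted here). Because the weights sum to 1, the same weights reconstruct the point in any frame the control points are expressed in:

```cpp
#include <cassert>

// Reconstruct a 3D point from its four barycentric weights a[0..3]
// and four control points c[4][3]. With sum(a) = 1, translating all
// control points by t translates the reconstructed point by t as well.
void weighted_sum(const double a[4], const double c[4][3], double p[3]) {
    for (int d = 0; d < 3; ++d) {
        p[d] = 0.0;
        for (int j = 0; j < 4; ++j) p[d] += a[j] * c[j][d];
    }
}
```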

43 The Point Reprojections Give a Linear System. For each correspondence i: w_i [u_i, v_i, 1]^T = A Sum_j a_ij c_j^c, with A the internal calibration matrix. Rewriting and concatenating the equations from all the correspondences: M x = 0, where x is the 12-vector of the control point coordinates.

44 The Solution as a Weighted Sum of Eigenvectors. M x = 0 => M^T M x = 0 => x belongs to the null space of M^T M: x = Sum_i beta_i v_i, with v_i the eigenvectors of matrix M^T M associated to null eigenvalues. Computing M^T M is the most costly operation, and it is linear in n, the number of correspondences.

45 From 12 Unknowns to 1, 2, 3, or 4. The beta_i are our N new unknowns; N is the dimension of the null space of M^T M. Without noise: N = 1 (scale ambiguity). In practice there are no exactly zero eigenvalues, but several very small ones, and N >= 1 (depending on the noise on the 2D locations). We found that only the cases N = 1, 2, 3, and 4 must be considered.

46 How the Control Points Vary with the beta_i. [Animation: reprojections in the image and the corresponding 3D points, when varying the beta_i.]

47 Imposing the Rigidity Constraint. The distances between the control points must be preserved: ||c_a - c_b||^2 = ||c_a^w - c_b^w||^2 => 6 quadratic equations in the beta_i.

48 The Case N = 1. x = beta_1 v_1, and 6 quadratic equations. beta_1 can easily be computed: its absolute value is the solution of a linear system, and its sign is chosen so that the handedness of the control points is preserved.
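One way to read "its absolute value is solution of a linear system": with x = beta_1 v_1, the six distance constraints become beta_1 ||v_a - v_b|| = ||c_a^w - c_b^w||, an overdetermined scalar problem whose least-squares solution is a ratio of sums. A sketch under that reading (the closed form below is my derivation, not lifted verbatim from the paper):

```cpp
#include <cassert>
#include <cmath>

// Least-squares scale for the case N = 1 (x = beta * v).
// dv[i]: the 6 inter-control-point distances measured on the eigenvector v;
// dw[i]: the corresponding known world-frame distances.
// |beta| minimizes sum_i (beta * dv[i] - dw[i])^2, giving a closed form.
double scale_n1(const double dv[6], const double dw[6]) {
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 6; ++i) {
        num += dv[i] * dw[i];
        den += dv[i] * dv[i];
    }
    return num / den;
}
```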

49 The Case N = 2. x = beta_1 v_1 + beta_2 v_2, and 6 quadratic equations. We use the linearization technique: it gives 6 linear equations in beta_11 = beta_1^2, beta_12 = beta_1 beta_2, and beta_22 = beta_2^2.

50 The Case N = 3. x = beta_1 v_1 + beta_2 v_2 + beta_3 v_3, and 6 quadratic equations. The same linearization technique gives 6 linear equations for the 6 unknowns beta_ab = beta_a beta_b.

51 The Case N = 4. Six quadratic equations in beta_1, beta_2, beta_3, and beta_4. The linearization introduces 10 products beta_ab = beta_a beta_b => not enough equations anymore! => Relinearization: the beta_ab are expressed as a linear combination of eigenvectors.

52 Algorithm Summary
1. The control point coordinates are the (12) unknowns;
2. The 3D points should project on the given corresponding 2D locations: a linear system in the control point coordinates;
3. The control point coordinates can be expressed as a linear combination of the null eigenvectors of this linear system: the weights (the beta_i) are the new unknowns (no more than 4);
4. Adding the rigidity constraints gives quadratic equations in the beta_i;
5. Solving for the beta_i depends on their number (linearization or relinearization).

53 Results

54 Thank you. Questions?

55 Estimating P(F = z | C = c) from Samples. It is easy to prove that this probability can be estimated as P(F = z | C = c) = (N_z,c + u) / (N_c + K u), if we model the prior as a constant value, with N_z,c the number of samples of class c that verify F = z, N_c the total number of samples of class c, K the number of possible fern outputs, and u a positive constant (in practice u = 1).
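Read this way, the estimator is the standard regularized frequency count: the constant u acts like a uniform prior so that fern outputs never seen during training keep a nonzero probability. A sketch under that assumption (names are mine):

```cpp
#include <cassert>
#include <cmath>

// Regularized empirical posterior: N_zc samples of class c produced fern
// output z, out of N_c samples of class c in total; K = 2^S is the number
// of possible fern outputs; u > 0 (u = 1 in practice) keeps unseen outputs
// at a small nonzero probability. Summing over all z gives exactly 1.
double posterior(int N_zc, int N_c, int K, double u) {
    return (N_zc + u) / (N_c + K * u);
}
```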

The Tree Structure is not Needed when the Features are Taken at Random. [Diagram: features f_1, f_2, f_3.]

57 Ferns outperform Trees. [Plot: recognition rate vs. number of classes, ferns vs. trees.]

58 Linking the Two Approaches. The Ferns consider Prod_k P(F_k = z_k | C = c), while the Trees consider (1/M) Sum_k P(C = c | F_k = z_k), with z_k the output of the k-th structure on the patch.

59 Linking the Two Approaches. It can be proved that the two methods are equivalent when the deviations from the uniform distribution are small: writing the per-structure responses as proportional to 1 + eps_k, Prod_k (1 + eps_k) ~ 1 + Sum_k eps_k to first order.
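The first-order argument can be checked numerically. This is my illustration of the approximation Prod(1 + eps_k) ~ 1 + Sum eps_k, not code from the talk:

```cpp
#include <cassert>
#include <cmath>

// Multiplicative (fern-style) combination of responses written as 1 + e_k.
double product_form(const double* e, int n) {
    double p = 1.0;
    for (int i = 0; i < n; ++i) p *= 1.0 + e[i];
    return p;
}

// Additive (tree-style) combination of the same responses, up to the
// common normalization: 1 + sum_k e_k, the first-order expansion above.
double sum_form(const double* e, int n) {
    double s = 1.0;
    for (int i = 0; i < n; ++i) s += e[i];
    return s;
}
```

For small deviations the two forms agree to first order, which is why multiplying and averaging rank the classes similarly when no single structure is very confident.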

60 The Point Reprojections Give a Linear System. From point reprojection, for each correspondence i: w_i [u_i, v_i, 1]^T = A Sum_j a_ij c_j^c. Let's expand: each correspondence gives two linear equations in the 12 control point coordinates. Concatenating the equations from all the correspondences: M x = 0.