Presented by: Chang Jia For: Pattern Recognition

Presentation transcript:

Object Recognition using Boosted Discriminants. Shyjan Mahamud, Martial Hebert, and Jianbo Shi. Presented by: Chang Jia. For: Pattern Recognition. Instructor: Prof. George Bebis. 04-28-2006

Outline: Introduction; Review of basic concepts; Object discrimination method; The Loss Function; Boosting Discriminants; Learning an Efficient Code; Experimental results; Conclusions and future work.

Object Recognition: Recognize an object class or identify an individual object in given images: face detection, recognizing animals, cars, etc. Possible for both instances and object classes (Mona Lisa vs. faces, or Beetle vs. cars).

Object Recognition: The hard part: the same object can look incredibly different in different images due to differences in viewpoint. A robust recognizer must be tolerant to changes in pose, expression, illumination, occlusion, etc.

COIL Object Database

Code Space for Objects: Illustration of a "code"-space for objects. Each image of an object of interest has a "code-word" in terms of its responses to a set of binary discriminants, as illustrated at the top of the figure. The bottom shows a 2D embedding of such code-words for a sample of images from various object classes A, B, C, D. The goal is to find codes that cluster together images from the same object class while separating images from different classes as much as possible.

Boosting Algorithm: Boosting is an algorithm for constructing a strong classifier out of a linear combination of simple weak classifiers. It provides a method for choosing the weak classifiers and setting their weights. Terminology.
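For reference, a boosted strong classifier has the standard form below (this is the generic boosting formula behind the terminology above, not notation taken from the paper):

    F(x) = \operatorname{sign}\Big( \sum_{k=1}^{K} \alpha_k\, h_k(x) \Big)

where the h_k are the weak classifiers (here, the binary discriminants) and the alpha_k are their learned weights.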

Example: combination of linear classifiers

Correlation Function: A distance measure in code space, related to the Hamming distance when the weights are all set to 1. Given an input image, the class label corresponding to the training image that has the highest correlation with the input image is reported. The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ. For example: the Hamming distance between 1011101 and 1001001 is 2, between 2143896 and 2233796 is 3, and between "toned" and "roses" is 3.
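As a concrete illustration, here is a minimal sketch of code-space classification with a weighted correlation, assuming each image's code is a vector in {-1, +1}^K and alpha holds the per-discriminant weights; the function names and NumPy usage are illustrative, not the paper's implementation:

    import numpy as np

    def correlation(code_a, code_b, alpha):
        # Weighted agreement between two binary codes in {-1, +1}^K.
        # With alpha = 1 for every k, this equals K minus twice the Hamming distance.
        return float(np.sum(alpha * code_a * code_b))

    def classify(test_code, train_codes, train_labels, alpha):
        # Report the label of the training code with the highest correlation.
        scores = [correlation(test_code, c, alpha) for c in train_codes]
        return train_labels[int(np.argmax(scores))]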

Proposed Method: Idea: various candidate discriminants are constructed by optimizing a pair-wise formulation of a generalization of the Fisher criterion. The candidate discriminant that reduces the total loss the most is chosen. The discriminants chosen so far are weighted and combined to give the final correlation function to be used at run-time.

The Loss Function: Two candidates are the exponential loss function and the logistic loss function; to simplify the presentation, the exponential loss is used. In this framework, it can be shown that the exponential loss function is the optimal choice among all loss functions when unnormalized models are sought, while the logistic loss is optimal when conditional probability models are sought.
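A sketch of the two losses in this pairwise setting, assuming pair labels y_{ij} in {+1, -1} and a margin given by the weighted agreement of the two images' codes (the notation is reconstructed and may differ from the paper's):

    L_{\exp} = \sum_{i,j} \exp\!\Big( -y_{ij} \sum_k \alpha_k\, h_k(x_i)\, h_k(x_j) \Big),
    \qquad
    L_{\log} = \sum_{i,j} \log\!\Big( 1 + \exp\Big( -y_{ij} \sum_k \alpha_k\, h_k(x_i)\, h_k(x_j) \Big) \Big)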

Boosting Discriminants: Goal: learning a good code. This requires finding good discriminants h_k and the associated weights. Assume that we are given a continuous feature space; for example, the pixel intensities in a localized m x m window around a given location in an input image lie in a continuous feature space. We would like to find a discriminant in the feature space that satisfies some specific criteria.

Finding Good Discriminants: Criteria for good discriminants: focus on pairs of training images that have been difficult to classify so far (i.e., pairs that currently carry high weight); pairs of training images from the same object class (i.e., y_ij = +1) should be put in the same partition induced by the discriminant, while pairs of training images from different object classes (i.e., y_ij = -1) should be put in different partitions; the training images are partitioned into two well-separated groups, each of which is tightly clustered.

Discriminant Function: the pairwise formulation is built from a between-class scatter, a within-class scatter, the Fisher Linear Discriminant, a distance function (a kernel), and indicator variables, leading to the final Fisher discriminant function (equations shown on the slide).
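For reference, the classical Fisher criterion that the slide's pairwise formulation generalizes (this is the standard textbook form, not a reconstruction of the slide's missing equations):

    J(l) = \frac{l^{T} S_B\, l}{l^{T} S_W\, l}

where S_B is the between-class scatter and S_W the within-class scatter; the maximizing direction l is the leading generalized eigenvector of S_B\, l = \lambda\, S_W\, l.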

Iterative Optimization: Maximizing J keeping l fixed: solve for s in the continuous interval [-1, +1] instead of the binary values {-1, +1}. Maximizing J keeping s fixed: returns a value in [-1, +1].

Pseudo-code for finding optimal discriminants: Alternate between maximizing J w.r.t. s and l by solving the corresponding eigenvector problems, until convergence.
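A minimal sketch of the alternating optimization in Python, under simplifying assumptions: with the assignments s fixed, the direction l is taken as the leading generalized eigenvector of scatter matrices built from soft side memberships, and with l fixed each class receives one shared, relaxed assignment in [-1, +1]; the helper names, the scatter construction, and the update for s are illustrative, not the paper's exact formulation:

    import numpy as np
    from scipy.linalg import eigh

    def fisher_direction(X, s):
        # Given soft side assignments s in [-1, +1] (one per example), maximize
        # the Fisher quotient over the projection direction l.
        w_pos, w_neg = (1 + s) / 2, (1 - s) / 2           # soft memberships
        m_pos = (w_pos @ X) / w_pos.sum()
        m_neg = (w_neg @ X) / w_neg.sum()
        Sb = np.outer(m_pos - m_neg, m_pos - m_neg)       # between-class scatter
        Sw = np.zeros((X.shape[1], X.shape[1]))           # within-class scatter
        for x, wp, wn in zip(X, w_pos, w_neg):
            Sw += wp * np.outer(x - m_pos, x - m_pos)
            Sw += wn * np.outer(x - m_neg, x - m_neg)
        Sw += 1e-6 * np.eye(X.shape[1])                   # regularize
        _, vecs = eigh(Sb, Sw)                            # generalized eigenproblem
        return vecs[:, -1]                                # leading eigenvector

    def alternate(X, classes, n_iter=20, seed=0):
        # Alternate between updating l (via the eigenproblem above) and the
        # class-shared relaxed assignments s, for a fixed iteration budget.
        X, classes = np.asarray(X, float), np.asarray(classes)
        s = np.random.default_rng(seed).uniform(-1, 1, size=len(X))
        for _ in range(n_iter):
            l = fisher_direction(X, s)
            proj = X @ l
            for c in np.unique(classes):                  # one assignment per class
                mask = classes == c
                s[mask] = np.tanh(proj[mask].mean() - proj.mean())
            s = np.clip(s, -1, 1)
        return l, s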

Illustration on a synthetic example in a continuous 2D feature space: There are two training examples for every class (connected by a dashed line for each class). Both training examples in each class share the same indicator variable in the iteration. The algorithm converged to the optimal discriminant (approximately horizontal) in a few iterations, even though the initialization was far from the optimal solution. Also, the final partition found (denoted by o and x) is consistent with what one would expect the optimal partition to be. Note that the variation within classes (approximately along the vertical direction) is larger on average than the variation across classes (mostly along the horizontal direction). Thus, if we had not specified the class membership of training examples through shared indicator variables, the optimal discriminant found would be almost orthogonal to the one shown in the figure, since that would be the direction that maximizes the Fisher quotient.

Choosing Threshold: Finding the optimal threshold θ is a one-dimensional problem along the discriminant hyperplane l. Use a simple brute-force search: the optimal value for θ is the one that minimizes the total loss. Determine θ as follows: sort the projections onto the optimal l of all the v_i's, find the total loss for each candidate value of θ taken at the mid-points (for robustness at run-time) between successive sorted projections, and choose the θ that gives the minimum. The total loss changes only when θ crosses a vector v_i projected onto l.
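A minimal sketch of this brute-force search, where loss_at(theta) is a placeholder for evaluating the total pairwise loss at a candidate threshold (not the paper's code):

    import numpy as np

    def best_threshold(projections, loss_at):
        # Sort the projections onto l, evaluate the loss at mid-points between
        # successive projections, and keep the minimizer.
        p = np.sort(projections)
        midpoints = (p[:-1] + p[1:]) / 2
        losses = [loss_at(theta) for theta in midpoints]
        return float(midpoints[int(np.argmin(losses))])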

Learning an Efficient Code: Composing Discriminants: compose discriminants into a "tree" T_k to be more powerful in practice. Partition function: takes the value +1 if T_k maps both images x_i and x_j to the same partition (i.e., the same leaf node of T_k), and -1 otherwise. Corresponding loss function: defined as before, with the tree's partition value in place of the single discriminant's output (equations shown on the slide).

Composing Discriminants: Composing simple discriminants into a tree of discriminants. On the right is shown a tree T composed of two discriminants, and on the left the partition induced on the image space X. Also shown is the path taken in the tree by an example image (x) and the corresponding partition that it belongs to.

Optimizing Parameter: Optimizing: W+ is the total loss of all pairs of training examples that were correctly classified by the kth discriminant, while W- is the total loss of all incorrectly classified pairs. Smoothing: in practice, due to limited training data, the optimal estimate can be large in value, so a regularization constant is introduced.
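A plausible reconstruction of the missing weight formulas, written in the standard boosting form with regularization constant gamma (the paper's exact smoothing may differ):

    \alpha_k = \frac{1}{2} \ln\frac{W_+}{W_-}
    \qquad\longrightarrow\qquad
    \alpha_k = \frac{1}{2} \ln\frac{W_+ + \gamma}{W_- + \gamma}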

Overall Scheme (diagram shown on the slide).

Experimental Data: FERET database. Training Data: pairs of frontal images of 41 individuals. Test Data: also pairs of frontal images of the same individuals, but taken around a month apart from the training images, with differences in hair, lighting, and expressions. Faces are rigidly aligned.

Results: Used the prominent regions around the eyes, nose, and mouth as features.

Results: Eigenspace-based method: both training and testing data were projected onto the first 50 PCA components, and a search for the nearest training image for each testing image was performed; the resulting recognition rate was 92.6%. Presented method: after training, a test image was classified by finding the training image most correlated with it using the correlation function output, in other words, the nearest neighbor in code-space; the resulting recognition rate was 95.2%.
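A minimal sketch of the eigenspace baseline described above, assuming the training and test images are flattened into rows of X_train and X_test with labels y_train and y_test; this is the standard PCA plus nearest-neighbor recipe with scikit-learn, not the authors' code:

    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    def eigenspace_baseline(X_train, y_train, X_test, y_test):
        # Project onto the first 50 principal components, then search for the
        # nearest training image for each test image (1-NN).
        pca = PCA(n_components=50).fit(X_train)
        knn = KNeighborsClassifier(n_neighbors=1)
        knn.fit(pca.transform(X_train), y_train)
        return knn.score(pca.transform(X_test), y_test)  # recognition rate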

Results: Parameters to be set: the total number of discriminants (set as twice the number of discriminants that gives a training error of 0), and a regularization constant of γ = 1 used to smooth the weights. Time: the training time for the approach was around 6 hours, while the run-time was around 2 seconds.

Conclusions and Future Work: Presented an approach to learning good discriminants that can be thought of as learning good codes. Good discriminants are determined sequentially, focusing on the training images that are currently hard to classify. These discriminants are weighted and combined in an energy minimization scheme. Future work: explore feature spaces in which distance measures can be non-linear, by using more powerful non-linear kernels.

Thank you!