Face Detection Using Large Margin Classifiers
Ming-Hsuan Yang, Dan Roth, Narendra Ahuja
Presented by Kiang “Sean” Zhou
Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL 61801

Overview
Large margin classifiers have demonstrated success in visual learning: the Support Vector Machine (SVM) and the Sparse Network of Winnows (SNoW).
The aim is to present a theoretical account of their success and suitability for visual recognition, via a theoretical and empirical analysis of the two classifiers in the context of face detection, along two dimensions:
Generalization error: the expected error on test data.
Efficiency: the computational capability to represent features.

Face Detection
Goal: identify and locate human faces in an image (usually gray scale), regardless of their position, scale, in-plane rotation, orientation, pose, and illumination. This is the first step for any automatic face recognition system, and a very difficult problem.
The first aim is to detect upright frontal faces, with some ability to detect faces under varying pose, scale, and illumination.
See “Detecting Faces in Images: A Survey” by M.-H. Yang, D. Kriegman, and N. Ahuja, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence.
Where are the faces, if any?
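For concreteness, here is a minimal sketch of the multi-scale sliding-window scan that an appearance-based detector of this kind typically performs; the 20×20 window, scale step, stride, and stub classifier are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def classify_window(patch):
    """Stub for a trained face/non-face classifier (SNoW or SVM).
    Placeholder only -- a real detector would apply the learned w, b here."""
    return False

def detect_faces(image, win=20, step=4, scale_step=1.2):
    """Scan a grayscale intensity image at multiple scales with a fixed-size window."""
    detections = []
    scale = 1.0
    img = image.astype(float)
    while min(img.shape) >= win:
        h, w = img.shape
        for y in range(0, h - win + 1, step):
            for x in range(0, w - win + 1, step):
                patch = img[y:y + win, x:x + win]
                if classify_window(patch):
                    # Report coordinates in the original image frame.
                    detections.append((int(x * scale), int(y * scale), int(win * scale)))
        # Downsample the image (nearest neighbor) instead of growing the window.
        scale *= scale_step
        new_h, new_w = int(h / scale_step), int(w / scale_step)
        ys = (np.arange(new_h) * scale_step).astype(int)
        xs = (np.arange(new_w) * scale_step).astype(int)
        img = img[ys][:, xs]
    return detections
```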

Large Margin Classifiers
Both are based on a linear decision surface (hyperplane) f: w^T x + b = 0, with w and b computed from samples.
SNoW: based on Winnow, with a multiplicative update rule.
SVM: based on the Perceptron, with an additive update rule.
Though SVMs can be developed independently of their relation to the Perceptron, we view both as large margin classifiers for the sake of the theoretical analysis.
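The contrast between the two update styles can be written in a few lines. This is a generic sketch of the additive (Perceptron-style) and multiplicative (Winnow-style) corrections to w after a mistake; the learning rate and promotion factor are illustrative placeholders.

```python
import numpy as np

def perceptron_update(w, b, x, y, eta=1.0):
    """Additive update (Perceptron / SVM family): shift w toward the mistaken example."""
    return w + eta * y * x, b + eta * y

def winnow_update(w, x, y, alpha=2.0):
    """Multiplicative update (Winnow / SNoW family): rescale weights of active features."""
    return w * np.power(alpha, y * x)   # promotes when y = +1, demotes when y = -1 (x in {0, 1})

# One mistake on a sparse binary example x with label y = +1:
w = np.ones(8); b = 0.0
x = np.array([1, 0, 0, 1, 0, 0, 0, 1], dtype=float)
w_add, b_add = perceptron_update(w, b, x, +1)
w_mul = winnow_update(w, x, +1)
print(w_add, b_add)
print(w_mul)
```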

Sparse Network of Winnows (SNoW)
(Architecture diagram: a layer of feature nodes connected to target nodes.)
An online, mistake-driven algorithm based on Winnow.
Attribute (feature) efficient: the allocation of nodes and links is data driven, and time complexity depends on the number of active features.
Provides mechanisms for discarding irrelevant features.
Allows tasks to be combined hierarchically.
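A minimal sketch of the sparse, data-driven bookkeeping this implies: weights exist only for (target, feature) links that have actually been observed active, and an update touches only the active features of an example. The two-target setup (face / non-face) and the parameter values are illustrative assumptions, not the published SNoW configuration.

```python
from collections import defaultdict

class SparseWinnowNetwork:
    """Sparse network of Winnow units: one target node per class,
    with weights allocated lazily only for features seen active."""

    def __init__(self, targets=("face", "nonface"), alpha=1.35, beta=0.8, theta=1.0):
        self.alpha, self.beta, self.theta = alpha, beta, theta   # promotion, demotion, threshold
        self.w = {t: defaultdict(lambda: 1.0) for t in targets}  # unseen features default to 1

    def score(self, target, active_features):
        # Only active features contribute -- cost is O(#active), not O(#all features).
        return sum(self.w[target][f] for f in active_features)

    def predict(self, active_features):
        return max(self.w, key=lambda t: self.score(t, active_features))

    def update(self, target, active_features, is_positive):
        """Mistake-driven Winnow step for one target node."""
        activation = self.score(target, active_features)
        if is_positive and activation <= self.theta:          # missed positive: promote
            for f in active_features:
                self.w[target][f] *= self.alpha
        elif not is_positive and activation > self.theta:     # false alarm: demote
            for f in active_features:
                self.w[target][f] *= self.beta

# Illustrative use: features are ids of active (position, intensity) pairs.
net = SparseWinnowNetwork()
net.update("face", active_features=[10342, 20871, 55120], is_positive=True)
print(net.predict([10342, 20871, 55120]))
```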

Winnow Update Rule
Multiplicative weight update algorithm: on a mistake, the weights of the active features are either promoted, w_i ← α·w_i, or demoted, w_i ← w_i/α.
The number of mistakes in training is O(k log n), where k is the number of relevant features of the concept and n is the number of features.
Tolerates a large number of features: the mistake bound is logarithmic in the number of features.
Advantageous when the function space is sparse.
Robust in the presence of noisy features.
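To make the O(k log n) behavior concrete, here is a small, self-contained toy experiment (with made-up parameters, not an experiment from the paper): it runs the basic Winnow rule on a disjunction of k relevant features out of n and counts mistakes as n grows.

```python
import numpy as np

def winnow_mistakes(n_features, k_relevant, n_examples, alpha=2.0, seed=0):
    """Train Winnow on a k-of-n disjunction and return the number of mistakes made."""
    rng = np.random.default_rng(seed)
    relevant = rng.choice(n_features, size=k_relevant, replace=False)
    w = np.ones(n_features)            # Winnow starts with uniform weights
    theta = n_features                 # standard threshold for the basic rule
    mistakes = 0
    for _ in range(n_examples):
        x = (rng.random(n_features) < 0.05).astype(float)    # sparse boolean example
        y = 1 if x[relevant].any() else 0                     # disjunction of relevant features
        pred = 1 if w @ x >= theta else 0
        if pred != y:
            mistakes += 1
            if y == 1:
                w[x > 0] *= alpha      # promote active features on a missed positive
            else:
                w[x > 0] /= alpha      # demote active features on a false positive
    return mistakes

# Mistake counts should grow roughly logarithmically with the number of features n.
for n in (100, 1000, 10000):
    print(n, winnow_mistakes(n_features=n, k_relevant=5, n_examples=5000))
```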

Support Vector Machine (SVM)
Can be viewed as a perceptron with maximum margin; grounded in statistical learning theory.
Extends to nonlinear SVMs via the kernel trick: computationally efficient, with an expressive representation using nonlinear features.
Has demonstrated excellent empirical results in visual recognition tasks.
Training can be time consuming, though fast algorithms have been developed.
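As a point of reference, a minimal sketch of training the kind of classifier compared later (a second-degree polynomial-kernel SVM over raw intensity vectors) using scikit-learn; the library choice, the random stand-in data, and the parameter values are assumptions for illustration, not the setup used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

# X: rows are 20x20 face/non-face windows flattened to 400-dim intensity vectors,
# y: +1 for face, -1 for non-face.  Random data here only to keep the sketch runnable.
rng = np.random.default_rng(0)
X = rng.random((200, 400))
y = np.where(rng.random(200) < 0.5, 1, -1)

# Second-degree polynomial kernel: implicit pairwise (conjunctive) pixel features
# without ever forming them explicitly -- the "kernel trick".
clf = SVC(kernel="poly", degree=2, C=1.0)
clf.fit(X, y)

window = rng.random((1, 400))
print("face" if clf.predict(window)[0] == 1 else "non-face")
```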

Generalization Error Bounds: SVM
Theorem 1: If the data is L2-norm bounded, ||x||_2 ≤ b, and the family of hyperplanes w satisfies ||w||_2 ≤ a, then for any margin γ > 0, with probability 1 − δ over n random samples, the misclassification error err(w) is bounded by k_γ/n plus a complexity term whose dominant factor is a^2 b^2 / γ^2 (see the summary slide below), where k_γ = |{i : w^T x_i y_i < γ}| is the number of samples with margin less than γ.

Generalization Error Bounds: SNoW
Theorem 2: If the data is L∞-norm bounded, ||x||_∞ ≤ b, and the family of hyperplanes w satisfies ||w||_1 ≤ a together with an entropy-type constraint on the weights, Σ_j ln(·) ≤ c, then for any margin γ > 0, with probability 1 − δ over n random samples, the misclassification error err(w) is bounded by k_γ/n plus a complexity term whose dominant factor is 2 ln(2n) · a^2 b^2 / γ^2 (see the summary slide below), where k_γ = |{i : w^T x_i y_i < γ}| is the number of samples with margin less than γ.

Generalization Error Bounds
In summary, the dominant terms of the two bounds are:
SVM (additive): E_a ∝ ||w||_2^2 · max_i ||x_i||_2^2
SNoW (multiplicative): E_m ∝ 2 ln(2n) · ||w||_1^2 · max_i ||x_i||_∞^2
SNoW has lower generalization error if the data is L∞-norm bounded and there is a hyperplane with small L1 norm.
SVM has lower generalization error if the data is L2-norm bounded and there is a hyperplane with small L2 norm.
Hence SNoW performs better than SVM when the data has small L∞ norm but large L2 norm.
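A small numeric sketch of how these dominant terms can be compared on data: the arrays below are random stand-ins for the face-window vectors and a sparse hyperplane, so this illustrates the comparison itself rather than reproducing the paper's numbers.

```python
import numpy as np

def dominant_terms(X, w):
    """Dominant factors of the two margin-based bounds from the summary slide."""
    n = X.shape[0]
    svm_term = np.linalg.norm(w, 2) ** 2 * np.max(np.linalg.norm(X, 2, axis=1)) ** 2
    snow_term = 2 * np.log(2 * n) * np.linalg.norm(w, 1) ** 2 * np.max(np.linalg.norm(X, np.inf, axis=1)) ** 2
    return svm_term, snow_term

# Random stand-ins: 1000 "windows" of 400 intensities in [0, 1], and a sparse w.
rng = np.random.default_rng(0)
X = rng.random((1000, 400))
w = np.zeros(400)
w[rng.choice(400, size=20, replace=False)] = rng.normal(size=20)  # sparse hyperplane

svm_term, snow_term = dominant_terms(X, w)
print(f"SVM-style term  (||w||_2^2 * max ||x||_2^2):            {svm_term:.1f}")
print(f"SNoW-style term (2 ln 2n * ||w||_1^2 * max ||x||_inf^2): {snow_term:.1f}")
# For intensity data, ||x||_2 is typically much larger than ||x||_inf (cf. the
# Discussion slide), which favors the SNoW bound when w is sparse (small L1 norm).
```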

Efficiency
Features in nonlinear SVMs are more expressive than linear features (and remain efficient thanks to the kernel trick).
SNoW can use conjunctive features as nonlinear features: the co-occurrence (conjunction) of the intensity values of m pixels within a window is represented by a new feature.
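A minimal sketch of building such conjunctive features for m = 2: quantized pixel intensities are paired within a small neighborhood, and each (position, intensity, position, intensity) conjunction is mapped to a feature id. The neighborhood size, quantization, and id scheme are illustrative assumptions rather than the encoding used in the paper.

```python
import numpy as np
from itertools import product

def conjunctive_features(window, levels=16, radius=2):
    """Active feature ids for pairwise (m = 2) pixel-intensity conjunctions.

    Each feature encodes: pixel p1 has quantized intensity v1 AND pixel p2
    (within `radius` of p1) has quantized intensity v2.
    """
    q = (window.astype(float) / 256.0 * levels).astype(int).clip(0, levels - 1)
    h, w = q.shape
    active = set()
    for y, x in product(range(h), range(w)):
        for dy, dx in product(range(0, radius + 1), range(-radius, radius + 1)):
            if (dy, dx) <= (0, 0):           # visit each unordered pixel pair once
                continue
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                # Encode the conjunction (p1, v1, p2, v2) as a single integer id.
                p1, p2 = y * w + x, y2 * w + x2
                active.add(((p1 * levels + q[y, x]) * h * w + p2) * levels + q[y2, x2])
    return active

window = np.random.default_rng(0).integers(0, 256, size=(20, 20))
feats = conjunctive_features(window)
print(len(feats), "active conjunctive features")
```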

Experiments
Training set: 6,977 upright, frontal 20×20 images: 2,429 faces and 4,548 nonfaces.
Appearance-based approach: each image is histogram equalized and converted to a vector of intensity values.
Test set: 24,045 images: 472 faces and 23,573 nonfaces.
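A minimal sketch of that preprocessing step (histogram equalization followed by flattening to an intensity vector), implemented directly with NumPy; the 8-bit assumption and the equalization details are generic choices, not necessarily those used in the experiments.

```python
import numpy as np

def equalize_histogram(window):
    """Histogram-equalize an 8-bit grayscale window."""
    hist, _ = np.histogram(window.ravel(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    cdf = cdf / cdf[-1]                      # normalize the CDF to [0, 1]
    # Map each original intensity through the CDF to spread the histogram.
    equalized = np.interp(window.ravel(), np.arange(256), cdf * 255.0)
    return equalized.reshape(window.shape).astype(np.uint8)

def to_feature_vector(window):
    """Equalize and flatten a 20x20 window into a 400-dim intensity vector."""
    return equalize_histogram(window).ravel().astype(float)

window = np.random.default_rng(0).integers(0, 256, size=(20, 20), dtype=np.uint8)
x = to_feature_vector(window)
print(x.shape)   # (400,)
```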

Empirical Results
SNoW with local features performs better than linear SVM.
SVM with a 2nd-order polynomial kernel performs better than SNoW with conjunctive features.
(Plot legend: SNoW with local features; SVM with linear features; SVM with 2nd-order polynomial kernel; SNoW with conjunctive features.)

Discussion
Studies have shown that the target hyperplane function in visual pattern recognition is usually sparse, i.e., the L2 and L1 norms of w are usually small; hence the Perceptron does not have any theoretical advantage over Winnow (or SNoW).
In the experiments, the L2 norm of the data is on average 10.2 times larger than the L∞ norm.
The empirical results conform to the theoretical analysis.
(Plot legend: SNoW with local features; SVM with linear features; SVM with 2nd-order polynomial kernel; SNoW with conjunctive features.)

Conclusion
Theoretical and empirical arguments suggest that the SNoW-based learning framework has important advantages for visual learning tasks.
SVMs have nice computational properties for representing nonlinear features as a result of the kernel trick.
Future work will focus on efficient methods (similar to the kernel trick) for representing nonlinear features in the SNoW-based learning framework.