
1 Face Detection - Advanced Topics in Information Processing Junhong Liu
Agenda:
- Introduction of Face Detection
- Detecting Faces in A Single Image
- Face Image Databases
- A Bayesian Discriminating Features Method
- Conclusions

2 Introduction of Face Detection
Definition: Given an arbitrary image, the goal of face detection is to determine whether or not there are any faces in the image and, if present, return the image location and extent of each face. Numerous methods have been proposed to detect faces in single intensity or color images; Yang et al. surveyed face detection methods published before 2001 [YKA]. To introduce more recent methods and applications, this presentation collects techniques, mainly published after 2001, from IEEE Transactions on Pattern Analysis and Machine Intelligence, the journal Pattern Recognition, Pattern Recognition Letters, and Proc. Pattern Recognition. Challenges: face detection in a single image is a challenging task because of variability in scale, location, orientation, pose, etc. The challenges associated with face detection can be attributed to the following factors:

3 Introduction of Face Detection (Cont.)
(1) Pose. The images of a face vary with the relative camera-face pose (frontal, 45°, profile, upside down), and some facial features such as an eye or the nose may become partially or wholly occluded.
(2) Presence or absence of structural components. Facial features such as beards, mustaches, and glasses may or may not be present, and there is a great deal of variability among these components, including shape, color, and size.
(3) Facial expression. The appearance of a face is directly affected by the person's facial expression.
(4) Occlusion. Faces may be partially occluded by other objects.
(5) Image orientation. Face images vary directly with rotation about the camera's optical axis.
(6) Imaging conditions. When the image is formed, factors such as lighting (spectra, source distribution, and intensity) and camera characteristics (sensor response, lenses) affect the appearance of a face.

4 Introduction of Face Detection (Cont.)
Closely related problems of face detection:
- Face localization aims to determine the image position of a single face.
- Facial feature detection detects the presence and location of features: eyes, nose, nostrils, eyebrows, mouth, lips, ears, etc.
- Face recognition (face identification) compares an input image against a database and reports a match, if any.
- Face authentication verifies the claimed identity of an individual in an input image.
- Face tracking methods estimate the location, and possibly the orientation, of a face in an image sequence in real time.
- Facial expression recognition concerns identifying the affective states (happy, sad, disgusted, etc.) of humans.

5 Detecting Faces in A Single Image
Recent techniques for detecting faces in a single image can be classified into four categories:
- Knowledge-based methods. These rule-based methods encode human knowledge of what constitutes a typical face. The rules capture the relationships between facial features. These methods are mainly for face localization.
- Feature invariant approaches. These algorithms aim to find structural features that exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are mainly for face localization.
- Template matching methods. Several standard patterns of a face are stored to describe the face as a whole or the facial features separately. The correlations between an input image and the stored patterns are computed for detection. These methods have been used for both localization and detection.
- Appearance-based methods. Models (or templates) are learned from a set of training images that should capture the representative variability of facial appearance. The learned models are then used for detection. These methods are designed mainly for face detection.

6 Detecting Faces In A Single Image: Knowledge-Based Top-Down Methods (1)
In this approach, face detection methods are developed from rules derived from the researcher's knowledge of human faces. Simple rules describe the features of a face and their relationships; e.g., a face often appears in an image with two eyes symmetric to each other, a nose, and a mouth. The relationships between features can be represented by their relative distances and positions. Facial features in an input image are extracted first; face candidates are then identified based on the coded rules; and a verification process is usually applied to reduce false detections.

7 Knowledge-Based Top-Down Methods: a triangle-based approach (1.1)
Lin and Fan used a triangle-based approach to detect human faces [LF]. The system is composed of two primary parts. (1) The first part searches for potential face regions and consists of four steps: (1.1) Read in an image and convert it to a binary image; (1.2) Label all 4-connected components in the image to form blocks and find the centre of each block; (1.3) Detect any 3 centres of 3 different blocks that form an isosceles triangle (frontal view) or a right triangle (side view); (1.4) Clip the blocks that satisfy the triangle criteria, “the combination of two eyes and a mouth” or “the combination of one eye, one ear hole, and one mouth”, as the potential face regions. (2) The second part performs face verification for each potential face region. An efficient weighting mask function, applied to decide whether a potential face region contains a face, uses three steps:

8 Knowledge-Based Top-Down Methods: a triangle-based approach (1.1 cont.)
(2.1) Normalize the size of all potential facial regions; (2.2) Feed every normalized potential facial region into the weighting mask function and calculate the weight; (2.3) Perform the verification task by thresholding the weight obtained in the previous step. The triangle-based approach can automatically locate multiple faces with various orientations in complicated backgrounds. It can handle different sizes, dissimilar lighting conditions, varying pose and expression, and noise and defocus problems. In addition to coping with partial occlusion of the mouth and with sunglasses, the system can detect faces presented in side view. The triangle-based segmentation process can remove up to 97% of the background in a cluttered image, which significantly speeds up the subsequent face detection procedure because only 3-9% of the original image is left for further processing.
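A minimal sketch of the isosceles-triangle test in step (1.3), assuming the block centres have already been extracted from the labeled binary image; the tolerance value is an illustrative assumption, not a parameter from the paper:

```python
import numpy as np

def forms_isosceles_triangle(c1, c2, c3, rel_tol=0.25):
    """Check whether three block centres (x, y) roughly form an isosceles
    triangle, as in the 'two eyes and a mouth' criterion (sketch)."""
    sides = sorted(np.linalg.norm(np.subtract(a, b))
                   for a, b in ((c1, c2), (c2, c3), (c1, c3)))
    # Reject degenerate (coincident or collinear) centres first.
    if sides[0] == 0 or sides[0] + sides[1] <= sides[2]:
        return False
    # Isosceles: some pair of side lengths is nearly equal within the tolerance.
    return (abs(sides[0] - sides[1]) / sides[1] < rel_tol
            or abs(sides[1] - sides[2]) / sides[2] < rel_tol)
```

For example, forms_isosceles_triangle((10, 10), (30, 10), (20, 35)) returns True for an eye-eye-mouth layout, while three collinear centres are rejected.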

9 Detecting Faces in A Single Image: Bottom-Up Feature-Based Methods (2)
Humans can effortlessly detect faces and objects in different poses and lighting conditions, so there must exist properties or features that are invariant over this variability. Researchers have proposed methods that (1) first detect facial features and then infer the presence of a face, where facial features such as eyebrows, eyes, nose, mouth, and hairline are commonly extracted using edge detectors; and (2) build a statistical model from the extracted features to describe their relationships and to verify the existence of a face. One problem with these methods is that the image features can be severely corrupted by illumination, noise, and occlusion. Feature boundaries can be weakened for faces, while shadows can cause numerous strong edges that together render perceptual grouping algorithms useless.

10 Bottom-Up Feature-Based Methods: Facial Features (2.1)
Wong et al. proposed an approach based on a genetic algorithm and the eigenface technique for gray-level images [WLS]. (1) Possible eye candidates are obtained by detecting valley points using morphological operators. (2) Based on a pair of eye candidates, possible face regions are generated by means of the genetic algorithm; each possible face candidate is normalized by approximating the shirring angle due to head movement. (3) The lighting effect is reduced by transforming the candidate's histogram into the histogram of a reference face image. (4) The fitness value of a face candidate is calculated by projecting it onto the eigenfaces. Selected face candidates are then further verified by measuring their symmetries and determining the existence of the different facial features. The advantage of this method is that a tilted human face can still be detected robustly even if the face is shirred, under shadow, of a different scale, under bad lighting conditions, or wearing glasses.

11 Bottom-Up Feature-Based Methods: Feature-Texture (2.2)
Fan and Sung presented a hybrid method for varying-pose face detection in color images [FS]. (1) In the first stage, a skin-color Gaussian model is used to identify possible varying-pose face regions. (2) In the second stage, each face candidate is compared with a varying-pose face model using a combined feature-texture similarity measure (FTSM); setting an appropriate FTSM threshold eliminates false detections from the first stage. The proposed method can achieve reliable face detection and feature registration under various conditions, including different poses, face appearances, and lighting conditions.

12 Bottom-Up Feature-Based Methods: Skin Color (2.3)
Yao and Gao investigated the composition of color in color images, obtained the relationship between chrominance and color components, and established a set of coordinate transformations able to enhance the chrominance of objects of interest [YG]. With these coordinates, a new method of human face detection and location based on skin-chrominance and lip-chrominance transformation of color images is presented. It is not influenced by the pose of objects or their complex background.

13 Bottom-Up Feature-Based Methods: Skin Color and Multiple Features (2.4)
Hsu et al. proposed an algorithm for color images using a skin-tone color model and facial features, which contains two major modules: (1) face localization for finding face candidates, and (2) facial feature detection for verifying detected face candidates [HAJ]. The method first estimates and corrects the color bias using a lighting compensation technique, then nonlinearly transforms the corrected red, green, and blue color components into the YCbCr color space. The skin-tone pixels are detected using an elliptical skin model in the transformed space; the parametric ellipse corresponds to a contour of constant Mahalanobis distance under the assumption of a Gaussian distribution of skin-tone color. The detected skin-tone pixels are iteratively segmented using local color variance into connected components, which are then grouped into face candidates based on both the spatial arrangement of these components and the similarity of their color. The size of a face candidate can range from 13*13 pixels to about three fourths of the input image size. The facial feature detection module rejects face candidate regions that do not contain any facial features such as eyes, mouth, and face boundary.
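A minimal sketch of the constant-Mahalanobis-distance skin test, assuming the chrominance planes have already been lighting-compensated and transformed; the mean, covariance, and distance threshold below are illustrative assumptions, not the paper's fitted values:

```python
import numpy as np

def skin_mask(cb, cr, mean=np.array([120.0, 155.0]),
              cov=np.array([[80.0, 15.0], [15.0, 50.0]]), max_dist=2.5):
    """Mark pixels whose (Cb, Cr) chrominance falls inside an ellipse of
    constant Mahalanobis distance around the skin-tone mean (sketch)."""
    x = np.stack([cb.ravel(), cr.ravel()], axis=1) - mean
    inv_cov = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each pixel's chrominance to the mean.
    d2 = np.einsum('ij,jk,ik->i', x, inv_cov, x)
    return (d2 < max_dist ** 2).reshape(cb.shape)
```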

14 Skin Color and Multiple Features (2.4 cont.)
Hsieh et al. presented an algorithm combining the color and shape properties of faces [HFL]. The steps are: (1) a color classification algorithm is applied to classify the skin and non-skin pixels in natural scenery images; (2) a clustering-based splitting algorithm is used to separate real human faces from other skin-color regions; (3) an elliptical face model is designed to test the shape of a real face; (4) finally, a statistics-based verification procedure is utilized to confirm the candidate faces. This algorithm can tolerate a wider range of lighting conditions. It is still difficult to segment faces occluded by objects, so the faces to be tested should not be occluded; only frontal faces without large turnaround (about 45°) or tilt (about 12°) are allowed, the intensity of the face must be larger than a threshold, and the color of the background adjoining a face should be non-skin, or a skin color with a perceptible difference.

15 Skin Color and Multiple Features (2.4 cont.)
Araki et al. proposed an algorithm to detect faces in complex backgrounds where face position, size, and pose are arbitrary [ASS]. Face features, as candidates for six face components (both brows, both eyes, the nostrils, and a mouth), are extracted, and their likelihoods as each face component are calculated. Faces are detected using geometrical relations, and the number of combinations of all labels of all features is reduced using the Hough transform. Wang and Yuan proposed an approach to the detection, segmentation, and localization of human faces in color images under complex backgrounds [WY]. (1) A number of evolutionary agents are uniformly distributed in the 2-D image environment to detect skin-like pixels and segment each face-like region by activating their evolutionary behaviors; both normalized RGB and HSV color spaces are used. (2) A face is represented using a group of at least four feature lines. (3) Wavelet decomposition is applied to each region to detect the possible facial features, and a three-layer BP neural network is used to detect the eyes among the features. This method may miss a face or falsely detect a non-face because it is sensitive to illumination.

16 Detecting Faces in A Single Image: Template Matching (3)
In template matching, a standard face pattern (usually frontal) is manually predefined or parameterised by a function. Given an input image, the correlation values with the standard patterns are computed for the face contour, eyes, nose, and mouth independently. The existence of a face is determined based on the correlation values. This approach has the advantage of being simple to implement. However, it has proven inadequate for face detection because it cannot effectively deal with variation in scale, pose, and shape. Multiresolution, multiscale, subtemplate, and deformable-template methods have subsequently been proposed to achieve scale and shape invariance.
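A minimal sketch of correlation-based matching with a single grayscale template; the normalized cross-correlation measure and the threshold are common generic choices, not taken from any cited method:

```python
import numpy as np

def ncc(window, template):
    """Normalized cross-correlation between an image window and the template."""
    w = window - window.mean()
    t = template - template.mean()
    denom = np.sqrt((w * w).sum() * (t * t).sum())
    return float((w * t).sum() / denom) if denom > 0 else 0.0

def match_template(image, template, threshold=0.6):
    """Slide the template over the image and keep locations whose
    correlation with the stored pattern exceeds the threshold."""
    th, tw = template.shape
    hits = []
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            if ncc(image[i:i + th, j:j + tw], template) > threshold:
                hits.append((i, j))
    return hits
```

Run at a single scale with a rigid template, this is exactly the limitation the text criticizes; multiresolution variants rerun the scan over resized images.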

17 Template Matching: Deformable Templates by Perlibakas (3.1)
Perlibakas's method is based on mathematical morphology and the variational calculus for detecting a face contour in still grayscale images [P]. The facial features (eyes and lips) are detected using mathematical morphology and heuristic rules. Using these features, an image is filtered and an edge map is prepared. The face contour is detected using an active contour model (a variational snake) by minimizing its internal and external energy. The internal energy is defined by the contour tension and rigidity. The external energy is defined using the generalized gradient vector flow field of the image edge map with the detected face features discarded. The initial contour is calculated using the detected face features. The contour detection experiments were performed using a database of 427 face images. Automatically detected contours were compared with manually labelled contours using area- and Euclidean distance-based error measures. Filtering the image and edge map with respect to the detected face features reduces the number of edges that could cause contour detection errors.

18 Template Matching: Deformable Templates by Wang and Tan (3.2)
A method based on shape information [WT]: histogram equalization is used to enhance the image, followed by edge detection based on a multiple-scale filter. The extracted edges are then linked using a method based on an energy function. The face contour is finally extracted from the direction information of the linked edges by a deformable template (an elliptical ring). Results show good performance in detecting all faces in images with simple backgrounds, but the template does not include enough information to distinguish faces in very complex backgrounds.

19 Detecting Faces in A Single Image: Appearance-Based Methods (4)
In general, appearance-based methods rely on techniques from statistical analysis and machine learning to find the relevant characteristics of face and nonface images. The learned characteristics take the form of distribution models or discriminant functions, which are then used for face detection. Dimensionality reduction is usually carried out for the sake of computational efficiency and detection efficacy. Templates are still used, but they are learned from example images.

20 Appearance-Based Methods (4 cont.)
Many appearance-based methods can be understood in a probabilistic framework: an image, or a feature vector derived from an image, is viewed as a random variable x, and this random variable is characterized for faces and nonfaces by the class-conditional density functions p(x|face) and p(x|nonface). Bayesian classification or maximum likelihood can then be used to classify a candidate image location as face or nonface. A straightforward implementation of Bayesian classification is infeasible because of the high dimensionality of x, because p(x|face) and p(x|nonface) are multimodal, and because it is not yet understood whether there are natural parameterized forms for them. Hence, much of the work on appearance-based methods concerns empirically validated parametric and nonparametric approximations to p(x|face) and p(x|nonface).
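For concreteness, the Bayesian classification step amounts to a maximum a posteriori comparison (a standard formulation implied by the text, not a quotation from any cited paper):

```latex
x \text{ is classified as face} \iff
p(x \mid \mathrm{face})\, P(\mathrm{face}) \;>\; p(x \mid \mathrm{nonface})\, P(\mathrm{nonface})
```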

21 Appearance-Based Methods (4 cont.)
Another approach is to find a discriminant function (i.e., decision surface, separating hyperplane, threshold function) between face and nonface classes. Conventionally, image patterns are projected to a lower dimensional space and then a discriminant function is formed (usually based on distance metrics) for classification, or a nonlinear decision surface can be formed using multilayer neural networks. Support vector machines and other kernel methods have been proposed, which implicitly project patterns to a higher dimensional space and then form a decision surface between the projected face and nonface patterns.

22 Appearance-Based Methods: Neural networks (4.1)
Face detection can be treated as a two-class pattern recognition problem, and various neural network architectures have been proposed for it. The advantage of using neural networks for face detection is the feasibility of training a system to capture the complex class-conditional density of face patterns. One drawback is that the network architecture has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to get exceptional performance. Garcia and Delakis proposed a convolutional neural network architecture to recognize strongly variable face patterns directly from pixel images with no preprocessing [GD].

23 Appearance-Based Methods: Support Vector Machines (4.2)
Support Vector Machines (SVMs) can be considered a paradigm for training polynomial function, neural network (NN), or radial basis function (RBF) classifiers. While most methods for training a classifier (e.g., Bayesian, NN, and RBF) are based on minimizing the training error, i.e., the empirical risk, SVMs operate on another induction principle, called structural risk minimization, which aims to minimize an upper bound on the expected generalization error. An SVM classifier is a linear classifier whose separating hyperplane is chosen to minimize the expected classification error on unseen patterns. This optimal hyperplane is defined by a weighted combination of a small subset of the training vectors, called support vectors. Ma and Ding [MD] and Sahbi and Boujemaa [SB] each proposed methods based on hierarchical SVMs. Ai et al. presented a subspace approach with SVMs [AYX]. Xi and Lee developed a coordinate system and several SVMs to detect faces and extract facial features [XL].
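A minimal, self-contained sketch of training an SVM face/nonface classifier; the synthetic data, patch size, and polynomial kernel degree stand in for real normalized image windows and are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for flattened, normalized 16*16 patches: "faces" and
# "nonfaces" drawn from different distributions (assumption for illustration).
X = np.vstack([rng.normal(0.6, 0.10, size=(100, 256)),
               rng.normal(0.4, 0.20, size=(100, 256))])
y = np.array([1] * 100 + [0] * 100)   # 1 = face, 0 = nonface

clf = SVC(kernel='poly', degree=2)    # polynomial-kernel SVM classifier
clf.fit(X, y)

# The decision surface is defined by a small subset of the training vectors.
print(clf.support_vectors_.shape[0], 'support vectors out of', len(X))
```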

24 Appearance-Based Methods: Ranklets (4.3)
Smeraldi introduced a family of multiscale, orientation-selective, non-parametric features (“ranklets”) modeled on Haar wavelets [S]. He clarified their relation to the Wilcoxon rank-sum test and the rank transform, and provided a computation scheme based on the Mann-Whitney statistic. It was claimed that these ranklets outperform other rank features, Haar wavelets, SNoW, and linear SVMs in face detection experiments over the 24,045 test images in the MIT-CBCL database.

25 Appearance-Based Methods: Distribution-Based Methods (4.4)
Pattern detection problems require separating two classes, Target and Clutter, where the probability of the former is substantially smaller than that of the latter. Elad et al. proposed the maximal rejection classifier (MRC), based on successive linear rejection operations [EHK]. It requires that the Clutter class and the convex hull of the Target class be disjoint. It is an iterative, rejection-based classification algorithm: at each iteration a linear projection is applied, followed by thresholding, with the projection vector and the corresponding thresholds chosen so that the MRC maximizes the number of rejected Clutter samples. The process continues with the remaining Clutter samples, again searching for a linear projection vector and thresholds that maximize the rejection of Clutter points from the remaining set, and is repeated until few or none of the Clutter points remain. The samples remaining at the final stage are considered Targets. False alarms may occur because this suboptimal approach neglects multidimensional moments higher than the second.

26 Appearance-Based Methods: Region-Based Face Detection by Ayinde and Yang (4.5)
The method uses few positive and negative training images and a predefined model of the face pattern, and can handle multiple faces [AY]. The initial size of the window used to scan the image is N*N pixels. The system scans for faces at one-pixel increments horizontally and vertically. Whenever a face pattern is detected, the size and location of the current search window are noted. The window is gradually enlarged to scan the image at different scales until it reaches the maximum size; at every enlargement step, the size is increased by a factor horizontally and vertically. Each window to be analysed for the presence or absence of a face pattern is normalized in three steps: resizing, based on linear interpolation, from the original (M*M)-pixel window to an (N*N)-pixel window; histogram equalization, compensating for changes in illumination brightness and differences in camera input gains; and masking with a binary circular mask to eliminate the pixels most likely affected by background information.
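A minimal sketch of the three normalization steps for a square grayscale window; the nearest-neighbour resize and rank-based equalization are simplifications of the linear interpolation and standard histogram equalization described above, and the window size is an assumed value:

```python
import numpy as np

def normalize_window(window, n=19):
    """Normalize one candidate search window in three steps (sketch).

    window: square grayscale patch (M*M); n: target size N (assumed value).
    """
    m = window.shape[0]
    # Step 1: resize M*M -> n*n (nearest-neighbour stand-in for linear
    # interpolation).
    idx = np.arange(n) * m // n
    w = window[np.ix_(idx, idx)].astype(float)
    # Step 2: histogram equalization via pixel ranks, compensating for
    # illumination brightness and camera gain differences.
    ranks = np.argsort(np.argsort(w.ravel()))
    w = (ranks / (n * n - 1)).reshape(n, n)
    # Step 3: binary circular mask to suppress corner/background pixels.
    yy, xx = np.mgrid[:n, :n]
    mask = (yy - (n - 1) / 2) ** 2 + (xx - (n - 1) / 2) ** 2 <= (n / 2) ** 2
    return w * mask
```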

27 Appearance-Based Methods: Region-Based Face Detection (4.5 cont.)
The positive training set includes upright faces and slightly tilted faces, capturing the shapes and patterns of commonly found faces. The negative training examples are obtained from some of the false detections produced by the system during preliminary runs; 29 positive and 124 negative training examples are used. Correlation values between the search-window pattern and the training samples are calculated, and a symmetry measure is used for face classification. The pattern defined for the face makes use of the assumption that the upper part of the window, corresponding to the eyes and eyebrows, is darker than the region corresponding to the cheeks. The merging process averages windows that do not have significant displacements with respect to one another; different sets of windows can be merged to give overlapping windows in the final output. The elimination process discards windows that would distort the detection result if averaged with other overlapping windows. The main problem with this approach is the processing time, because of the sliding windows.

28 Appearance-Based Methods: Bayes Classifier (4.6)
Chengjun Liu presented a Bayesian Discriminating Features (BDF) method for multiple frontal face detection [L]. The BDF method, trained on images from only one database, works on test images from diverse sources and displays robust generalization performance. It integrates the discriminating feature analysis of the input image, the statistical modeling of the face and nonface classes, and the Bayes classifier for multiple frontal face detection: (1) feature analysis derives a discriminating feature vector by combining the input image, its 1D Haar wavelet representation, and its amplitude projections; (2) statistical modeling estimates the conditional probability density functions (PDFs) of the face and nonface classes; the face class is modeled as a multivariate normal distribution, while the nonface class includes “the rest of the world”, so a subset of the nonfaces that lies closest to the face class is derived and then modeled as a multivariate normal distribution; (3) the Bayes classifier applies the estimated conditional PDFs to detect multiple frontal faces in an image. Experimental results using 887 images (containing a total of 1,034 faces) from diverse image sources show the feasibility of the BDF method. In particular, the BDF method achieves 98.5 percent face detection accuracy with one false detection.

29 Appearance-Based Methods: Information-Theoretical Approach (4.7)
The spatial property of the face pattern can be modeled through different aspects. The contextual constraint, among others, is a powerful one and has often been applied to texture segmentation. The contextual constraints in a face pattern are usually specified by a small neighbourhood of pixels. Markov random field (MRF) theory provides a convenient and consistent way to model context-dependent entities such as image pixels and correlated features. This is achieved by characterizing mutual influences among such entities using conditional MRF distributions. Dass et al. proposed MRFs as viable stochastic models for the spatial distribution of gray levels in images of human faces, using first- and second-order neighbourhood systems [DJL]. These models are trained using databases of face and nonface images, then used for detecting human faces. Pham et al. presented aggregated Bayesian network classifiers for face detection using forest-structured Bayesian networks, which maximize the Kullback-Leibler divergence between the class-conditional probability distribution functions of the face and nonface classes [PWS].

30 Face Image Databases
Most face detection methods require a training data set of face images. The face image databases used by the referenced papers are: (1) BioID 2003 [BI]; (2) the Champion Database [CD]; (3) the Database of Faces [DF]; (4) MIT-CBCL [MC]; (5) PICS 2003 [PIC]; (6) the FERET database [PWHR]; (7) Yahoo News Photos [YNP].

31 A Bayesian Discriminating Features Method (BDF) – one of the appearance-based methods, developed by Chengjun Liu
- Discriminating Feature Analysis
- Statistical Modeling of Face and Nonface Classes
- The Bayesian Classifier for Multiple Frontal Face Detection
- Experiments

32 Discriminating Feature Analysis
Fig. 1 Face and natural images. (a) Some examples of the training faces, normalized to the standard resolution of 16*16. (b) An example natural image.

33 Discriminating Feature Analysis (Cont.)
The discriminating feature analysis derives a feature vector with enhanced discriminating power for face detection by combining the input image, its 1D Haar wavelet representation, and its amplitude projections.
Fig. 2 Discriminating feature analysis of the mean face and the mean nonface. (a) The first image is the mean face, the second and third images are its 1D Haar wavelet representation, and the last two bar graphs are its amplitude projections. (b) The mean nonface, its 1D Haar wavelet representation, and its amplitude projections.

34 Discriminating Feature Analysis (Cont.)
Let I(i,j)m*n represent an input image, and Xmn be the vector formed by concatenating the rows (or columns) of I(i,j). The 1D Harr representation of I(i,j) yields two images, Ih(i,j)(m-1)*n and Iv(i,j)m*(n-1), corresponding to the horizontal and vertical difference images, respectively. Ih(i,j) = I(i+1,j) - I(i,j), 1≤ i < m, 1 ≤ j ≤ n (1) Iv(i,j) = I(i,j+1) - I(i,j), 1 ≤ i ≤ m, 1 ≤ j < n. (2) Let Xh(m-1)n and Xvm(n-1) be the vectors formed by concatenating the rows (or columns) of Ih(i,j) and Iv(i,j). The amplitude projections of I(i,j) along its rows and columns form the horizontal (row) and vertical (column) projections, Xrm and Xcn respectively.

35 Discriminating Feature Analysis (Cont.)
The vectors $X$, $X_h$, $X_v$, $X_r$, and $X_c$ are each normalized by subtracting the mean of their components and dividing by their standard deviation, yielding the normalized vectors $\hat{X}$, $\hat{X}_h$, $\hat{X}_v$, $\hat{X}_r$, and $\hat{X}_c$. A new feature vector $\tilde{Y}$ is defined as the concatenation of the normalized vectors:

$$\tilde{Y} = \left( \hat{X}^t, \hat{X}_h^t, \hat{X}_v^t, \hat{X}_r^t, \hat{X}_c^t \right)^t \in \mathbb{R}^N$$

where $t$ is the transpose operator and $N = 3mn$ is the dimensionality of the feature vector $\tilde{Y}$.

36 Discriminating Feature Analysis (Cont.)
The normalized version $Y$ of $\tilde{Y}$ defines the discriminating feature vector, $Y \in \mathbb{R}^N$, which is the feature vector for the multiple frontal face detection system and combines the input image, its 1D Haar wavelet representation, and its amplitude projections for enhanced discriminating power:

$$Y = \frac{\tilde{Y} - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and the standard deviation of the components of $\tilde{Y}$.
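Continuing the sketch, the full feature vector can be assembled as follows, using discriminating_parts from the previous snippet; for 16*16 windows this yields N = 3*16*16 = 768:

```python
import numpy as np

def zscore(v):
    """Normalize a vector to zero mean and unit standard deviation."""
    return (v - v.mean()) / v.std()

def bdf_feature_vector(I):
    """Discriminating feature vector Y for one window (sketch): normalized
    image, Haar differences, and amplitude projections, concatenated and
    then normalized once more."""
    Ih, Iv, Xr, Xc = discriminating_parts(I)
    Ytilde = np.concatenate([zscore(I.ravel().astype(float)),
                             zscore(Ih.ravel()), zscore(Iv.ravel()),
                             zscore(Xr), zscore(Xc)])
    return zscore(Ytilde)   # Y = (Ytilde - mu) / sigma
```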

37 Statistical Modeling of Face and Nonface Classes
The main objective of statistical modeling of the face and nonface classes is to estimate the conditional probability density functions (PDFs) of these two classes. The face class contains only faces; the nonface class encompasses all other objects. The BDF method derives a subset of nonfaces that lies closest to the face class, and models the faces and this particular subset of nonfaces as multivariate normal distributions.

38 Statistical Modeling of Face and Nonface Classes: Face Class Modeling
The conditional density function of the face class, $\omega_f$, is modeled as a multivariate normal distribution:

$$p(Y \mid \omega_f) = \frac{1}{(2\pi)^{N/2}\, |\Sigma_f|^{1/2}} \exp\!\left[ -\tfrac{1}{2} (Y - M_f)^t \Sigma_f^{-1} (Y - M_f) \right]$$

where $M_f \in \mathbb{R}^N$ and $\Sigma_f \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of the face class. Taking the natural logarithm of both sides, we have

$$\ln p(Y \mid \omega_f) = -\tfrac{1}{2} \left[ (Y - M_f)^t \Sigma_f^{-1} (Y - M_f) + \ln |\Sigma_f| + N \ln 2\pi \right] \qquad (8)$$

39 Face Class Modeling (Cont.)
The covariance matrix $\Sigma_f \in \mathbb{R}^{N \times N}$ can be factorized into the following form using principal component analysis (PCA):

$$\Sigma_f = \Phi_f \Lambda_f \Phi_f^t, \qquad \Phi_f \Phi_f^t = \Phi_f^t \Phi_f = I_N \qquad (9)$$

where $\Phi_f \in \mathbb{R}^{N \times N}$ is an orthogonal eigenvector matrix, $\Lambda_f \in \mathbb{R}^{N \times N}$ is a diagonal eigenvalue matrix with diagonal elements (eigenvalues) in decreasing order ($\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_N$), and $I_N \in \mathbb{R}^{N \times N}$ is an identity matrix. An important property of PCA is its optimal signal reconstruction in the sense of minimum mean-square error when only a subset of principal components is used to represent the original signal.

40 Face Class Modeling (Cont.)
The principal components are defined by the following vector $Z \in \mathbb{R}^N$:

$$Z = \Phi_f^t (Y - M_f) \qquad (10)$$

It then follows from (8), (9), and (10) that

$$\ln p(Y \mid \omega_f) = -\tfrac{1}{2} \left[ \sum_{i=1}^{N} \frac{z_i^2}{\lambda_i} + \sum_{i=1}^{N} \ln \lambda_i + N \ln 2\pi \right] \qquad (11)$$

Applying the optimal signal reconstruction property of PCA, only the first $M$ ($M \le N$) principal components are used to estimate the conditional density function. Following a model by Moghaddam and Pentland, the remaining $N-M$ eigenvalues $(\lambda_{M+1}, \lambda_{M+2}, \dots, \lambda_N)$ are estimated by their average:

$$\rho = \frac{1}{N-M} \sum_{i=M+1}^{N} \lambda_i \qquad (12)$$

41 Face Class Modeling (Cont.)
It then follows from (11) and (12) that

$$\ln p(Y \mid \omega_f) \approx -\tfrac{1}{2} \left[ \sum_{i=1}^{M} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho} + \sum_{i=1}^{M} \ln \lambda_i + (N - M) \ln \rho + N \ln 2\pi \right] \qquad (13)$$

where $\|\cdot\|$ denotes the norm operator and the $z_i$ are the components of $Z$ defined by (10). Eq. (13) states that the conditional density function of the face class can be estimated using the first $M$ principal components, the input image, the mean face, and the eigenvalues of the face class.
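A NumPy sketch of this modeling step; the eigendecomposition and the approximation follow (9)-(13) above, while estimating the covariance directly from the sample vectors is an implementation assumption, not necessarily the paper's procedure:

```python
import numpy as np

def fit_class(samples, M=10):
    """Estimate (mean, first-M eigenvectors, first-M eigenvalues, rho)
    of one class from training feature vectors (rows of `samples`)."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    lam, phi = np.linalg.eigh(cov)          # eigenvalues in ascending order
    lam, phi = lam[::-1], phi[:, ::-1]      # reorder to decreasing, per (9)
    rho = lam[M:].mean()                    # (12): average of the tail eigenvalues
    return mean, phi[:, :M], lam[:M], rho

def log_density(Y, mean, phi, lam, rho, N):
    """Approximate ln p(Y | class) via (13) (and, analogously, (18))."""
    z = phi.T @ (Y - mean)                  # first M principal components, (10)
    residual = np.sum((Y - mean) ** 2) - np.sum(z ** 2)
    M = lam.shape[0]
    return -0.5 * (np.sum(z ** 2 / lam) + residual / rho
                   + np.sum(np.log(lam)) + (N - M) * np.log(rho)
                   + N * np.log(2 * np.pi))
```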

42 Statistical Modeling of Face and Nonface Classes: Nonface Class Modeling
The nonface class modeling starts with the generation of nonface samples by applying (13) to natural images that do not contain any human faces. The subimages of the natural scenes that lie closest to the face class are chosen as training samples for estimating the conditional density function of the nonface class, $\omega_n$, which is also modeled as a multivariate normal distribution:

$$p(Y \mid \omega_n) = \frac{1}{(2\pi)^{N/2}\, |\Sigma_n|^{1/2}} \exp\!\left[ -\tfrac{1}{2} (Y - M_n)^t \Sigma_n^{-1} (Y - M_n) \right] \qquad (14)$$

where $M_n \in \mathbb{R}^N$ and $\Sigma_n \in \mathbb{R}^{N \times N}$ are the mean and the covariance matrix of the nonface class.

43 Nonface Class Modeling (Cont.)
Factorize the covariance matrix $\Sigma_n$ using PCA:

$$\Sigma_n = \Phi_n \Lambda_n \Phi_n^t, \qquad \Phi_n \Phi_n^t = \Phi_n^t \Phi_n = I_N \qquad (15)$$

where $\Phi_n \in \mathbb{R}^{N \times N}$ is an orthogonal eigenvector matrix, $\Lambda_n \in \mathbb{R}^{N \times N}$ a diagonal eigenvalue matrix with diagonal elements in decreasing order, and $I_N \in \mathbb{R}^{N \times N}$ an identity matrix. The principal components are defined by the following vector $U \in \mathbb{R}^N$:

$$U = \Phi_n^t (Y - M_n) \qquad (16)$$

Estimate the remaining $N-M$ eigenvalues by their average:

$$\rho_n = \frac{1}{N-M} \sum_{i=M+1}^{N} \lambda_i \qquad (17)$$

44 Nonface Class Modeling (Cont.)
The conditional density function of the nonface class can then be estimated as follows:

$$\ln p(Y \mid \omega_n) \approx -\tfrac{1}{2} \left[ \sum_{i=1}^{M} \frac{u_i^2}{\lambda_i} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\rho_n} + \sum_{i=1}^{M} \ln \lambda_i + (N - M) \ln \rho_n + N \ln 2\pi \right] \qquad (18)$$

where the $u_i$ are the components of $U$ defined by (16) and the $\lambda_i$ are the nonface eigenvalues. Eq. (18) states that the conditional density function of the nonface class can be estimated using the first $M$ principal components, the input image, the mean nonface, and the eigenvalues of the nonface class.

45 The Bayesian Classifier for Multiple Frontal Face Detection
Let YN be the discriminating feature vector constructed from an input pattern, i.e., a subimage of some test image. Let the a posteriori probabilities of face class and nonface class given Y be P(f|Y) and P(n|Y). The pattern is classified to the face class or the nonface class according to the Bayes decision rule for minimum error:

46 The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)
The a posteriori probabilities $P(\omega_f \mid Y)$ and $P(\omega_n \mid Y)$ can be computed from the conditional PDFs using Bayes' theorem:

$$P(\omega_f \mid Y) = \frac{p(Y \mid \omega_f)\, P(\omega_f)}{p(Y)}, \qquad P(\omega_n \mid Y) = \frac{p(Y \mid \omega_n)\, P(\omega_n)}{p(Y)} \qquad (20)$$

where $P(\omega_f)$ and $P(\omega_n)$ are the a priori probabilities of the face and nonface classes, and $p(Y)$ is the mixture density function.

47 The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)
From (13), (18), and (20), the Bayes decision rule for face detection is then defined as follows:

$$Y \in \begin{cases} \omega_f & \text{if } \delta_f(Y) + \tau < \delta_n(Y) \\ \omega_n & \text{otherwise} \end{cases} \qquad (21)$$

where $\delta_f$, $\delta_n$, and $\tau$ are as follows (substituting (13) and (18) into (20) and cancelling the common $N \ln 2\pi$ term):

$$\delta_f(Y) = \sum_{i=1}^{M} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M_f\|^2 - \sum_{i=1}^{M} z_i^2}{\rho} + \sum_{i=1}^{M} \ln \lambda_i + (N - M) \ln \rho$$

$$\delta_n(Y) = \sum_{i=1}^{M} \frac{u_i^2}{\lambda_i} + \frac{\|Y - M_n\|^2 - \sum_{i=1}^{M} u_i^2}{\rho_n} + \sum_{i=1}^{M} \ln \lambda_i + (N - M) \ln \rho_n$$

$$\tau = 2 \ln \frac{P(\omega_n)}{P(\omega_f)}$$

with the $\lambda_i$ in $\delta_f$ and $\delta_n$ taken from the face and nonface classes, respectively.

48 The Bayesian Classifier for Multiple Frontal Face Detection (Cont.)
$\delta_f$ and $\delta_n$ can be calculated from the input pattern $Y$, the face class parameters (the mean face, the first $M$ eigenvectors, and the eigenvalues), and the nonface class parameters (the mean nonface, the first $M$ eigenvectors, and the eigenvalues). $\tau$ is a constant that functions as a control parameter: the larger its value, the fewer the false detections. To further control the false detection rate, the BDF method introduces another control parameter, $\theta$, to the face detection system, such that a pattern is classified to the face class only if, in addition to (21), $\delta_f(Y) < \theta$. The control parameters $\tau$ and $\theta$ are empirically chosen for the face detection system.
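A sketch tying the pieces together, reusing log_density from the earlier snippet; expressing the rule through the deltas and the default tau/theta values follows the slides, while the exact bookkeeping is an assumption:

```python
import numpy as np

def delta(Y, mean, phi, lam, rho, N):
    """delta = -2 * ln p(Y | class) - N*ln(2*pi), per (13)/(18)."""
    return -2.0 * log_density(Y, mean, phi, lam, rho, N) - N * np.log(2 * np.pi)

def classify_window(Y, face_params, nonface_params, tau=300.0, theta=500.0):
    """Bayes decision (21) with the additional theta test (sketch).

    face_params / nonface_params: (mean, phi, lam, rho) tuples from fit_class.
    """
    N = Y.shape[0]
    df = delta(Y, *face_params, N)
    dn = delta(Y, *nonface_params, N)
    # Face iff the face class wins by the margin tau AND the pattern is
    # itself close enough to the face class (delta_f below theta).
    return (df + tau < dn) and (df < theta)
```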

49 Experiments: Data
The training data for the BDF method consist of 600 FERET frontal face images from Batch 15 [PWHR] and nine natural images. The face class thus contains 1,200 face samples for training after including the mirror images of the FERET data. The nonface class consists of 4,500 nonface samples, generated by choosing the subimages that lie closest to the face class from the nine natural images. The BDF method is applied to detect frontal faces from three testing data sets: SET1, SET2, and SET3. SET1, consisting of all frontal face images of Batches 12, 13, and 14 from the FERET database, contains mainly head or head-and-shoulder pictures.

50 Data (Cont.)
SET2, consisting of all frontal face images from FERET Batch 2, contains upper-body pictures and faces with glasses, some having bright reflections. SET1 and SET2 consist of 511 and 296 images, respectively; each image contains only one face (Fig. 3). SET3 is created from the MIT-CMU test sets [RBK], which contain frontal faces from diverse sources (Fig. 4): the World Wide Web, photographs and newspaper pictures, and broadcast television. It consists of 80 images with a total of 227 faces, including many different-sized faces: rotated faces, very large faces, very small faces, low-quality face images, partially occluded faces, and slightly pose-angled faces.

51 Data (Cont.)
Fig. 3(a, b) Face detection examples. A square indicates a successfully detected face region. The resolution of the images is 256*384, and the faces are detected at different scales. (a) From SET1. (b) From SET2: some images contain faces with glasses having bright reflections.

52 Data (Cont.)
Fig. 4(a-g) Face detection samples from SET3. (a) Multiple frontal faces; (b) multiple frontal faces with rotations; (c) large frontal face; (d) small frontal face; (e) face in a low-quality image; (f) partially occluded face; (g) slightly pose-angled face.

53 Experiments: Statistical Learning of the BDF Method
Learning the face class parameters: the statistical modeling of the face and nonface classes requires estimating the parameters of these two classes from the training images. They are calculated as follows: (1) normalize the 600 FERET images to a spatial resolution of 16*16 based on fixed eye locations and interocular distance (Fig. 1a); (2) add the mirror images of the 600 FERET faces to the face training set, increasing the number of training samples to 1,200; (3) derive the discriminating feature vectors; (4) derive the face class parameters: the mean face, the face class eigenvectors and eigenvalues, and M. A good choice of M balances face detection performance against computational complexity; M = 10 is chosen empirically.
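Steps (1)-(4) expressed with the earlier sketches (bdf_feature_vector and fit_class); the random arrays stand in for the normalized FERET faces and exist only to make the snippet self-contained:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for step (1): 600 faces normalized to 16*16 (synthetic here).
faces = [rng.random((16, 16)) for _ in range(600)]
# Step (2): add mirror images, growing the training set to 1,200 samples.
faces += [np.fliplr(f) for f in faces]
# Step (3): discriminating feature vectors, each of length N = 3*16*16 = 768.
Yf = np.array([bdf_feature_vector(f) for f in faces])
# Step (4): mean face, leading eigenvectors/eigenvalues, and rho, with M = 10.
mean_f, phi_f, lam_f, rho_f = fit_class(Yf, M=10)
```

The nonface parameters would be obtained the same way from the 4,500 nonface samples described on the next slide.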

54 Statistical Learning of BDF (Cont.)
Learning the nonface class parameters starts with the generation of nonface samples from the nine natural images (Fig. 1b). The nonface images, chosen from the subimages of these nine natural images, have the standard spatial resolution of 16*16 and lie closest to the face class; 4,500 nonface images are generated from the nine natural images (Fig. 2b). After the generation of the nonface samples, the nonface class parameters are calculated in the same way as the face class parameters. Setting the two control parameters τ and θ: to control the false detection rate, these two control parameters are empirically chosen and set to τ = 300 and θ = 500.

55 Experiments: Testing Performance of BDF
The BDF method successfully detects 507 faces from the 511 images in SET1 without any false detection, and 290 of the 296 faces in SET2 with no false detection (Fig. 3). SET3 is used to test the generalization performance of the BDF method; Fig. 4 shows part of the detection results. In Fig. 4a, three faces are successfully detected at scales 20 and 26 (scale 20 means that the original image is resized by the ratio 16/20). Note that one face with a large pose is not detected, since the BDF method is trained to detect multiple frontal faces. The BDF method, trained only on upright frontal faces, can also detect rotated faces by rotating the test images to a number of predefined angles, such as ±5°, ±10°, ±15°, and ±20°. In Fig. 4b, two scales (30, 38) and one rotation of -20° are required.

56 Testing Performance of BDF (Cont.)
The BDF method is also tested on images that contain very large or very small faces; Figs. 4c-d show the detection performance on these test images. The generalization performance of the BDF method is further tested using low-quality face images, partially occluded faces, and slightly pose-angled faces (Figs. 4e-g). The successful performance shows the robustness of the BDF method in real face detection. In SET3, six faces are not detected by the BDF method: three pose-angled faces, a baby face, a masked face, and one face in a low-quality image. Fig. 5 shows some examples of missed faces and false detection: a low-resolution face in Fig. 5a and a slightly pose-angled face in Fig. 5b; a false detection also occurs in Fig. 5b.

57 Testing Performance of BDF (Cont.)
Fig. 5 Examples of missed faces and false detection: a low-resolution face in (a) and a slightly pose-angled face in (b). A false detection also occurs in (b).

58 Testing Performance of BDF (Cont.)
The experimental results using 80 test images (containing in total 227 faces) from the MIT-CMU test sets show that the BDF method detects 221 of the 227 faces in these images with one false detection. The following table summarizes the detection performance of the BDF method on the testing data sets SET1, SET2, and SET3. The overall face detection performance of the BDF method, over 887 images containing a total of 1,034 faces, is a 98.5 percent correct face detection rate with one false detection.

Data set | Sources                      | Images | Faces | Detected | False detections
SET1     | FERET Batches 12, 13, and 14 | 511    | 511   | 507      | 0
SET2     | FERET Batch 2                | 296    | 296   | 290      | 0
SET3     | MIT-CMU test sets            | 80     | 227   | 221      | 1
Total    |                              | 887    | 1,034 | 1,018    | 1

59 Conclusions
I have attempted to provide a survey of research on face detection after 2001, classifying the methods from about 20 papers from several sources into four main categories. However, some methods can be classified into more than one category. For example, template matching methods usually use a face model and subtemplates to extract facial features [P], and then use these features to locate or detect faces. Furthermore, the boundary between knowledge-based methods and some template matching methods is blurry, since the latter usually implicitly apply human knowledge to define the face templates [P]. I have detailed one method while reporting the performance of other related methods. However, there is a lack of uniformity in how methods are evaluated, so it is imprudent to declare explicitly which methods indeed have the lowest error rates. Although significant progress has been made, there is still work to be done, and a robust face detection system should remain effective under full variation in: (1) lighting conditions; (2) orientation, pose, and partial occlusion; (3) facial expression; and (4) presence of structural components, facial hair, and a variety of hairstyles.

