
Zhimin Cao, The Chinese University of Hong Kong
Qi Yin, ITCS, Tsinghua University
Xiaoou Tang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China
Jian Sun, Microsoft Research Asia

1. Introduction
2. Overview of framework
3. Learning-based descriptor extraction
4. Pose-adaptive matching
5. Experimental results
6. Conclusion and discussion

• LBP, SIFT, and HOG are effective descriptors that use handcrafted encoding.
• However, existing handcrafted encoding methods suffer from two drawbacks:
  ▪ Manually finding an optimal encoding method is difficult.
  ▪ Handcrafted codes are usually unevenly distributed.
[Figure: distribution of code emergence frequency in 1,000 face images]

• The learning-based encoding method uses unsupervised learning to encode the local microstructures of the face into a set of discrete codes.
• PCA and a proper normalization mechanism are applied to improve the discriminative ability of the code histogram.
• A set of pose-specific classifiers (one for each pose combination) is trained to make the final decision.
[Figure caption: 1,000 face images]

[Figure: the “pose-adaptive face matching” pipeline and the “learning-based descriptor” pipeline]

• Sampling and normalization
  ▪ Sample r × 8 neighboring pixels at even intervals on the ring of radius r to form a low-level feature vector.
  ▪ Normalize the sampled feature vector to unit length.
• Sampling patterns:
  (1) R1 = 1, with center
  (2) R1 = 1, R2 = 2, with center
  (3) R1 = 3, no center
  (4) R1 = 4, R2 = 7, no center
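The ring sampling described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`bilinear`, `sample_rings`, `normalize_unit`) are hypothetical, and the use of bilinear interpolation for off-grid sample positions is an assumption, since the slides do not say how sub-pixel positions are read.

```python
import math

def bilinear(img, x, y):
    """Bilinearly interpolate img (a list of rows) at (x, y), clamping to the border.
    Interpolation is an assumption; the slides do not specify it."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def sample_rings(img, cx, cy, radii, with_center):
    """Sample r * 8 evenly spaced points on each ring of radius r around (cx, cy)."""
    values = [img[cy][cx]] if with_center else []
    for r in radii:
        n = 8 * r
        for k in range(n):
            angle = 2.0 * math.pi * k / n
            values.append(bilinear(img, cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return values

def normalize_unit(vec):
    """Scale the vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Pattern (2) from the slide: R1 = 1, R2 = 2, with center, on a toy 9 x 9 image.
img = [[float(x + y) for x in range(9)] for y in range(9)]
feature = normalize_unit(sample_rings(img, 4, 4, radii=(1, 2), with_center=True))
```

With pattern (2), the vector has 1 + 8 + 16 = 25 entries: the center pixel plus 8 samples on the ring of radius 1 and 16 on the ring of radius 2.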

• Learning-based encoding and histogram representation
• Three unsupervised learning methods:
  ▪ K-means
  ▪ PCA tree
  ▪ Random-projection tree
• The learned encoder maps each normalized feature vector to a discrete code; the learned codes form a codebook of local filter responses.
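As a rough illustration of the K-means option, the sketch below learns a small codebook with plain Lloyd iterations and encodes each vector as the index of its nearest codeword. The helper names (`sq_dist`, `encode`, `kmeans`), the iteration count, and the toy data are assumptions for illustration; the PCA tree and random-projection tree variants are not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebook):
    """Return the index of the nearest codeword, i.e. the discrete code for vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def kmeans(vectors, k, iters=20, seed=0):
    """Learn k codewords with Lloyd's algorithm (a simple K-means variant)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[encode(v, codebook)].append(v)
        for i, members in enumerate(groups):
            if members:  # keep the old codeword if a cluster ends up empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

# Toy "normalized feature vectors" forming two obvious groups.
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
codebook = kmeans(train, k=2)
codes = [encode(v, codebook) for v in train]
```

In the setting of the slides, `train` would hold the normalized ring-sampled vectors and `k` would be the code number (for example 256).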

• After encoding, the input image is turned into a “code” image.
• Divide the encoded image into a grid of patches and compute a histogram of the LE codes for each patch.
  ▪ e.g. 5×7 patches for the holistic face (84×96)
• Concatenate all patch histograms to form the descriptor of the whole face image.
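The histogram step can be sketched as below. The function name `patch_histograms` is hypothetical, and the sketch assumes that 84×96 means 84 pixels wide by 96 pixels tall and that 5×7 means 5 columns by 7 rows; the slides do not state the orientation. The code image here is synthetic.

```python
def patch_histograms(code_img, grid_rows, grid_cols, num_codes):
    """Split a code image into a grid and concatenate the per-patch code histograms."""
    h, w = len(code_img), len(code_img[0])
    descriptor = []
    for gr in range(grid_rows):
        # Integer boundaries cover every pixel even when h is not divisible by grid_rows.
        y0, y1 = gr * h // grid_rows, (gr + 1) * h // grid_rows
        for gc in range(grid_cols):
            x0, x1 = gc * w // grid_cols, (gc + 1) * w // grid_cols
            hist = [0] * num_codes
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[code_img[y][x]] += 1
            descriptor.extend(hist)
    return descriptor

# Synthetic code image: 96 rows x 84 columns, codes in [0, 256).
code_img = [[(x + y) % 256 for x in range(84)] for y in range(96)]
descriptor = patch_histograms(code_img, grid_rows=7, grid_cols=5, num_codes=256)
```

With 256 codes and 35 patches the descriptor has 256 × 35 = 8,960 entries, and its entries sum to the number of pixels, 84 × 96 = 8,064.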

• Select 1,000 images from the LFW training set.
• LE descriptors start to beat existing descriptors when the code number reaches 32.

• PCA dimension reduction
  ▪ The resulting face feature may be too large.
  ▪ e.g. 256 codes × 35 patches = 8,960 dimensions, reduced to 400 dimensions
  ▪ Applying normalization after the PCA compression improves performance.
• Multiple LE descriptors
  ▪ Training a linear SVM to combine the similarity scores generated by different LE descriptors generally achieves better results.

• We choose 256 codes and 400 PCA dimensions as the default setting.
• PCA followed by L1 or L2 normalization can achieve a higher recognition rate than either no PCA or PCA alone.
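The two normalizations named on this slide can be sketched as follows, applied to a vector that stands in for the PCA-compressed descriptor. The PCA projection itself is assumed to have been computed elsewhere, and the two-element vector is a toy stand-in for the 400-dimensional output.

```python
import math

def l1_normalize(vec):
    """Scale so the absolute values sum to 1 (left unchanged if all zeros)."""
    total = sum(abs(v) for v in vec)
    return [v / total for v in vec] if total > 0 else list(vec)

def l2_normalize(vec):
    """Scale to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

# Toy stand-in for a PCA-compressed descriptor (the default setting uses 400 dimensions).
compressed = [3.0, -4.0]
l1_version = l1_normalize(compressed)
l2_version = l2_normalize(compressed)
```

Here `l2_version` has unit Euclidean length and the absolute values of `l1_version` sum to 1; either version would then be compared between two faces to produce a similarity score.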

• The combination of four LE descriptors obtained the best performance on LFW.

• Component-level face alignment
  ▪ Replace holistic alignment with the alignment of 9 face components, each aligned separately using a similarity transform.
  ▪ The face similarity score is the sum of the similarities between corresponding components.
  ▪ Each component can be aligned more accurately without balancing across the whole face, and the negative effect of landmark error is also reduced.
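The component-level score can be sketched as a sum over matching components. The component names and the use of cosine similarity are assumptions made for illustration; the slides only state that the face score is the sum of the per-component similarities.

```python
import math

def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the per-component similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def face_similarity(components_a, components_b):
    """Sum the similarities between corresponding components of two faces."""
    return sum(cosine(components_a[name], components_b[name]) for name in components_a)

# Nine hypothetical components per face, each with a toy 2-D descriptor.
names = ["component_%d" % i for i in range(9)]
face_a = {name: [1.0, 0.0] for name in names}
face_b = {name: [1.0, 0.0] for name in names}
face_c = {name: [0.0, 1.0] for name in names}

score_same = face_similarity(face_a, face_b)
score_diff = face_similarity(face_a, face_c)
```

Identical component descriptors give the maximum score of 9 under this measure, while orthogonal ones give 0.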

• Pose-adaptive matching
  ▪ Each component contributes differently to recognition when the pose combination of the matching pair is different.
  ▪ e.g. the right eye is less effective when we match a frontal face and a right-turned face
• Categorize the pose of the input face as one of three poses: frontal (F), left (L), or right (R).
• Select three gallery images from the Multi-PIE dataset and measure the similarity between the probe face and each of them.
• The pose label of the most similar gallery image is assigned to the probe face.
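The pose assignment amounts to a nearest-gallery lookup, sketched below. The sketch assumes one gallery descriptor per pose and uses cosine similarity as a placeholder measure; the function names and the toy descriptors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; an assumed placeholder for the similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def assign_pose(probe, gallery):
    """Return the pose label of the gallery descriptor most similar to the probe."""
    return max(gallery, key=lambda pose: cosine(probe, gallery[pose]))

# Assumption: one gallery descriptor per pose (toy 3-D vectors).
gallery = {"F": [1.0, 0.0, 0.0], "L": [0.0, 1.0, 0.0], "R": [0.0, 0.0, 1.0]}
pose = assign_pose([0.2, 0.9, 0.1], gallery)
```

The toy probe is closest to the left-pose gallery entry, so it receives the label L.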

• The pose combination of a face pair can be one of {FF, LL, RR, LR (RL), LF (FL), RF (FR)}.
• For each pose combination, a linear SVM classifier is trained on the subset of training pairs with that pose combination.
• The final pose-adaptive classifier consists of 6 linear SVM classifiers.
• The “best-fit” classifier, the one with the same pose combination as the input matching pair, makes the final decision.
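Selecting the “best-fit” classifier can be sketched as a dictionary lookup on an order-independent pose key. The weights and bias below are hypothetical placeholders rather than trained SVM parameters, and the use of nine component scores as classifier input is an assumption; training is not shown.

```python
def pose_key(pose_a, pose_b):
    """Order-independent key, so that LR and RL select the same classifier."""
    return "".join(sorted((pose_a, pose_b)))

def decide(scores, pose_a, pose_b, classifiers):
    """Apply the linear classifier for this pose combination: w . scores + b > 0."""
    weights, bias = classifiers[pose_key(pose_a, pose_b)]
    return sum(w * s for w, s in zip(weights, scores)) + bias > 0

POSES = ("F", "L", "R")
ALL_KEYS = sorted({pose_key(a, b) for a in POSES for b in POSES})

# Hypothetical placeholder parameters: one (weights, bias) pair per pose combination.
classifiers = {key: ([1.0] * 9, -4.5) for key in ALL_KEYS}

# Nine assumed component similarity scores for a right-turned / left-turned pair.
same_person = decide([0.8] * 9, "R", "L", classifiers)
```

The nine ordered pose pairs collapse to the six combinations listed on the slide, each with its own classifier.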

• Randomly sample 3,000 intra-/extra-personal pairs from LFW for each pose combination.
  ▪ e.g. the number of pairs is 3,000 × 6 = 18,000
Before: 76.20% ± 0.41%
After: 78.30% ± 0.42%

• Results on the LFW benchmark

• Results on Multi-PIE
• The default descriptors trained on the LFW benchmark are adopted in the experiments.
• Randomly generate 10 subsets of face images from Multi-PIE, each with 300 intra-personal and 300 extra-personal image pairs.

• Face recognition using the learning-based (LE) descriptor and pose-adaptive matching performs well on the LFW benchmark.
• The approach shows excellent generalization ability on Multi-PIE.
• Replacing manually designed sampling patterns with an automated procedure may produce a more powerful descriptor for face recognition.
