Presentation on theme: "Oral Defense by Sunny Tang 15 Aug 2003"— Presentation transcript:
1 Oral Defense by Sunny Tang 15 Aug 2003 Face Recognition Committee Machine: Methodology, Experiments and A System ApplicationGood afternoon, Prof. Lyu and Prof. Sun. Thanks for your coming. My name is Sunny Tang. Today I am going to present my thesis titled “Face Recognition Committee Machine: Methodology Experiments and A System Application”Oral Defense by Sunny Tang15 Aug 2003
2 Outline Introduction Face Recognition Committee Machine Problems and ObjectivesFace Recognition Committee MachineCommittee MembersResult, Confidence and WeightStatic and Dynamic StructureHere is the presentation outline, I will first give a brief introduction of face recognition, and point out the problems of current face recognition algorithms and the objective of this thesis. Then I will introduce the face recognition committee machine, which is a committee machine for different face recognition algorithms. We explain the committee members, the elements in the committee machine: Result, Confidence and Weight. Then, we introduce the two structures: Static and Dynamic.
3 Outline Face Recognition System Experimental Results Conclusion Q & A System ArchitectureFace Recognition ProcessDistributed ArchitectureExperimental ResultsConclusionQ & AAfter that, we will talk about our implementation, a face recognition system, which is able to provide real-time face recognition. We will give a introduction of the overall system architecture, and explain the whole face recognition process and finally we will point out some problems of the system and propose a distributed architecture for the system. The distributed architecture is a feasible solution to enable face recognition application on mobile device now. Finally, we will show the experimental results on four face databases, and a conclusion is given.
4 Introduction: Face Recognition DefinitionA recognition process that analyzes facial characteristicsTwo modes of recognitionIdentification: “Who is this”Verification: “Is this person who she/he claim to be?”Briefly speaking, face recognition is a recognition process that analyzes our facial characteristics. There exists two modes of recognition, the first mode is identification, which is answering the question “Who is this”, the other mode of recognition is verification, just like answering “Is this person who she or he claim to be?”
5 Face Recognition Applications SecurityAccess control systemLaw enforcementMultimedia databaseVideo indexingHuman search engineFace recognition can be applied in wide area. The main application is security such as the access control system to keep our place away from intruder. Or we can have a law enforcement application to check the criminal’s identity. Besides security, face recognition can also be applied in some multimedia database such as video indexing to index huge amount of video according to the person insides, or we can search for a particular person’s information by providing the face or face image to the search engine.
6 Problems & Objectives Current problems of existing algorithms No objective comparisonAccuracy not satisfactoryCannot handle all kinds of variationsObjectivesProvide thorough and objectively comparisonPropose a framework to integrate different algorithms for better performanceImplement a real-time face recognition systemAfter discussing the potential applications of face recognition, we would like to stress the current problems. There are many face recognition algorithms proposed by researchers before. Most algorithms are claimed to have high accuracy. However there is still no comparison for the algorithms now. Even some algorithms are tested on certain well-known face database, due to different pre-processing of the database images, they are still lack of an objective comparison. Besides, as I said before, face recognition is still not so accurate as fingerprints. And finally, most face recognition can only recognize a face image in general situations. If there is some variations in face like strong lighting or large variation in expression, the algorithms may not have good performance. Currently, there is some algorithms which can handle a specific situation, but this kinds of algorithms perform poorly on other cases. Therefore, the main objectives for this thesis is to first study several face recognition algorithms and implement them for an objective and thorough comparison. Besides, we propose a framework called Face Recognition Committee Machine to integrate different algorithms for better performance. And finally, we implement a real-time face recognition system for different applications
7 Face Recognition Committee Machine (FRCM) MotivationAchieve better accuracy by combining predictions of different expertsTwo structures of FRCMStatic structure (SFRCM)Dynamic structure (DFRCM)Face recognition committee machine is a framework to integrate different face recognition algorithms. It combines predictions of different experts in the committee to achieve better accuracy. We propose two structure of FRCM, one is static structure and the other is dynamic structure.
8 Static vs. Dynamic Static structure Dynamic structure Ignore input signalsFixed weightsDynamic structureEmploy input signal to improve the classifiersVariable weightsThe main difference for both structure is the input image and the weight. For static structure, input signals are ignored in the determination of weight. And the weights are fixed. For dynamic structure, input signal are involved in a gating network to improve the overall result and the weight are varied.
9 Committee Members Template matching approach Machine learning approach EigenfaceFisherfaceElastic Graph Matching (EGM)Machine learning approachSupport Vector Machines (SVM)Neural Networks (NN)In the FRCM, we include five heterogeneous face recognition algorithms as committee members. The first three approaches, Eigenface, Fishferface and Elastic Graph Matching are template matching approach, i.e., a set of training images is stored as templates, for recognition, the test image is compared to each of the template to get the result. The last two approaches Support Vector Machine and Neural Networks are machine learning approaches. For these two approaches, test image is not compared to each template image, instead, the model will report its result.
10 Review: Eigenface & Fisherface Feature spaceEigenface: Principal Component Analysis (PCA)Fisherface: Fisher’s Linear Discriminant (FLD)Training & RecognitionProject images on feature spaceCompare Euclidean distance and choose the closest projectionWe will give a brief review of the five members now. Eigenface and Fisherface are similar approaches, both create a feature space to project images for comparison. Eigenface uses Principal Component Analysis (PCA) for better compression of the images while the Fisherface use Fisher’s Linear Discriminant (FLD) for better representation of images in terms of classification. For training and recognition, both methods project the images on the feature space found, the projected images are then compared by their Euclidean distance and the choose the closest projection.
11 Review: Elastic Graph Matching Based on dynamic link architectureExtract facial feature by Gabor wavelet transformFace is represented by a graph consists of nodes of jetsCompare graphs by cost functionEdge similarity Se and vertex similarity SvCost functionElastic Graph Matching is an approach that makes use of dynamic link architecture. It extracts some facial feature such as our features on eye, nose and month by Gabor wavelet transformation. A face is represented by a graph consists of nodes of jets which the edge distance and vertex information are stored in templates. To compare face images, a cost function is defined in terms of edge similarity and vertex similarity as shown in this equation. The minimum cost is selected as the recognized image.
12 Review: SVM & Neural Networks Look for a separating hyperplane which separates the data with the largest marginNeural NetworksAdjust neuron weights to minimize prediction error between the target and outputSVM and neural networks are machine learning methods that SVM look for a separating hyperplane which separates the data with the largest margin. The images are then classified by checking which side the images lies on. For neural networks, it works by adjusting the neuron weights to minimize the prediction error between the target and output
13 Result, Confidence & Weight Result of expertConfidenceConfidence of expert on its resultWeightWeight of expert’s result in ensembleFor the committee machine, we define three main elements to ensemble experts’ results. The first one is result, which means the result of an expert, the second one is confidence, which is a measure of the confidence of the expert on its result. The last one is weight, which is the weight of the expert’s result in the ensemble.
14 SFRCM ArchitectureLet’s take a look on the Static structure face recognition committee machine, we called SFRCM. As the diagram shown, input image is given to each of the expert separately, and the expert calculate their own result and confidence for that image, which are then given to the voting machine for ensemble. In the voting machine, weights are given to each expert result in the ensemble and a final recognized class ( the ensemble result) is reported.
15 Result & Confidence (1) Eigenface, Fisherface & EGM Result: Identification:Verification:Confidence:As the committee members are heterogeneous in nature, we have to define different ways to find the result and confidence of the experts accordingly. For template matching approaches, Eigenface, Fisherface and EGM, we use K-nearest neighbor classifier in identification, five nearest neighbor in the templates to the test image are selected and a vote is given to the class of that neighbor. The class with highest vote v are selected as the identification result, and the confidence is defined as the number of vote of the result class divided by K (i.e. 5) in our implementation. For verification, each of the claimed identity’s template image is compared to the test image, if the Euclidean distance or similarity is within a threshold, the template is regarded as authorized. If the total number of authorized templates is greater than half of the template images, the expert authorized that test image, and rejects it otherwise. The confidence of verification is defined as the number of template passed the threshold divided by the total number of images.
16 Result & Confidence (2) SVM One-against-one approach Result: Identification: SVM resultVerification: direct matchingConfidence:For SVM, as this approach is originally designed for two class classification, we use one-against-one approach. jC2 SVMs are constructed, the maximum number of vote class is selected as the SVM result and the confidence is defined as the number of vote for the class divided by J-1. For verification, we compared the SVM classification result to the identity, if the result matches the identity, the SVM authorized the identity and rejects it otherwise. The confidence is the same as identification.
17 Result & Confidence (3) Neural network A binary vector of size J for target representationResult:Identification:Verification:Confidence: output value ojFor neural network, we construct a binary vector of size J for target representation, as shown in the right diagram, the index of the target class is set to 1 and 0 for the others. For identification, the class with maximum output value is selected as the identification result. For verification, if the output value of the claimed class is greater than 0.5, the neural networks authorized the identity and rejects it otherwise. The confidence of the neural networks is the output value of the resulted class.
18 Weight Derived from performance of expert: Amplify the difference of the performance:Normalize in range [0, 1]:We now define the weight of the experts, which is derived from the performance of experts, the number of correct recognition divided by the total number of test images. However, as the performance of the experts may only differ by several percents, say 5-10%, if we simply take the performance as weight, the difference is not significant. Therefore, we amplify the difference by applying a scaling factor alpha and an exponential function. The alpha value scales the performance such that it is greater than one, which is suitable for the exponential function to maximize the difference. Finally, the weights are normalized to the range between 0 and 1.
19 Voting Machine Assemble result and confidence Score of expert’s result:Ensemble result:The result and confidence are assembled in the voting machine. The score of result class of each experts is calculated by this equation, which is the weight of the expert multiplied by its confidence . The final ensemble result, is the class with maximum score.
20 SFRCM Drawbacks Fixed weights under all situations No update mechanism The weights of the experts are fixed no matter which images are given.No update mechanismThe weights cannot be updated once the system is trainedIn this static committee machine mentioned, there are two drawbacks. The first one is the weights are fixed under all situation. The weights of the experts are fixed no matter which images are given. However, some algorithms may specialize in certain situations and performs poorly for others. We should give higher weight to that expert under that specialized situation and lower weight for other situations. Fixed weight is not desirable in this sense. Besides, there is no update mechanism for SFRCM, the weights cannot be updated once the system is trained. For SFRCM, the performance are determined by preliminary experiments. However, for real applications, the performance should be determined at the place where the system situated. A update mechanism can help to keep adjusting the performance of each experts.
21 DFRCM Architecture Gating network is included Image is involved in determination of weightTo tackle the two problems mentioned, we propose a dynamic structure committee machine, called DFRCM. This diagram gives its architecture. The main difference is the use of a gating network, which is a neural network. It accepts the input image and determine the situation of the image taken, then it gives the corresponding weight to the experts.
22 Gating NetworkKeep the performance of experts on different face databasesDetermine the database of input imageGive the corresponding weights of the experts for that databaseIn our experiment, we use different databases to model different situations. And we keep the performance of the experts on different face databases. The gating network then determines the database of input image and gives the corresponding weights of the experts for that database.
23 Feedback Mechanism Initialize ni,j and ti,j to 0 Train each expert i on different database jWhile TESTINGDetermine j for each test imageRecognize the image in each expert iIf ti,j != 0 then Calculate pi,jElse Set pi,j = 0Calculate wi,jDetermine ensemble resultIf FEEDBACK then Update ni,j and ti,jEnd whileAs I mentioned before, the dynamic FRCM also has a feedback mechanism to keep updating the weight of the experts. The feedback mechanism is simple. Firstly, we initialized all the number of correct prediction for expert i on database j ni,j and the total number of test image ti,j to zero, then we train each expert i separately on each database j. Through the testing, the gating network determine the database j for each test image and each expert recognize that image accordingly. And then calculate the performance pij and weight for that expert. If there is a feedback request, the system will update ni,j and ti,j in the testing until the testing finish. With this mechanism, we can keep update the performance of the expert.
24 Implementation: Face Recognition System Real-time face recognition systemImplementation of FRCMFace processingFace trackingFace detectionFace recognitionWe have implemented the FRCM in visual C++ under windows platform. We call it as Face Recognition System, it is a real-time face recognition system which is capable for face tracking, face detection and face recognition. As presented by Harvest, the face tracking and detection is implemented by Harvest and I ‘m responsible for the recognition modules.
25 System ArchitectureHere is the overall system architecture, an image is captured by the capture device, say a web camera. The image is then passed to the face detection module which will go through the color model, skin segmentation to extract any possible face region for the SVM to validate. Face image found is then normalized and passed to the face recognition module for recognition. The module is the implementation of FRCM, user can selectively choose any experts in the committee for recognition and the identity of the user will be reported.
26 Face Recognition Process EnrollmentCollect face images to train the expertsRecognitionIdentificationVerificationWe are now briefly introduce the face recognition process, to use the system, user should first provides eight images. These images should provide enough variations such as rotation, expression such that the template matching approaches can match the user even though the user has this kinds of variations in testing. The collected images would be used to train the experts in the committee separately. After the training, the system is able to provide the two mode of recognition, identification and verification.
27 System Snapshots Identification Verification Here is a system snapshots for identification and verification. The left hand side images show the identification result of the captured image on the right. In this system, the user can select the experts and the system reports the identity of the user. The right hand side is the verification, user should provide an identity to the system, the system reports whether to authorize or non-authorize the user.
28 Problems of FRCM on mobile device Memory limitationLittle memory for mobile devicesRequirement for recognitionCPU power limitationTime and storage overhead of FRCMThe system works fine for a desktop computer with high processing power. However, if we want to further apply the system on mobile devices, we face some problems. The first problem is the memory limitation, currently most mobile device only got limited memory, say several megabytes. However, face recognition requires quite a lots of memory spaces for some algorithms. This table gives a rough idea of the memory required to store the templates / models for each algorithm, especially for the SVM which stores a lot of information in the model. Another problem is the CPU power limitation. Committee machine needs much processing time than a single algorithms, which is the sum of the processing time of all the algorithms. And most mobile devices now do not have such high processing power to provide a real-time application.
29 Distributed Architecture ClientCapture imageEnsemble resultsServerRecognitionTherefore, we propose a distributed architecture for the face recognition system. We adopt the client and server approach to separate the load of the whole recognition process. As shown in this diagram, the client only needs to first capture image from a camera and then send the image to the server side, which consists of the experts in the committee for recognition. The result is then send back to the client for assembling and the client will display the final result. In this architecture, the mobile device only needs to handle some light weight process and the server will handle all the CPU and memory consuming tasks. Therefore, it is feasible to implement the system on the mobile devices nowadays.
30 Distributed System: Evaluation ImplementationDesktop (1400MHz), notebook (300MHz)S: Startup, R: RecognitionDistinct servers:For the distributed system, we do not implement any mobile client yet. Instead, we test the distributed architecture on two computer, the first one is a desktop machine with 1400MHz and a notebook with 300MHz CPU. The table lists the time required for startup , which includes loading of the models or templates from storage and recognition. In the recognition time, we do not see much improvement here. The distributed architecture and the notebook uses 2 second for recognition. However, we can still see the improvement in the startup time for the distributed architecture, the startup time for the distributed system is only 16s while the time for a notebook needs 93 second. We can also further improve the processing time by using distinct servers , that means we use one server for one expert so that recognition can be processed concurrently. The time required would be much shorter which is the sum of the transmission time and the max of the processing time for the expert i.
31 Experimental Results Databases used: Cross validation testing ORL from AT&T LaboratoriesYale from Yale UniversityAR from Computer Vision Center at U.A.BHRL from Harvard Robotics LaboratoryCross validation testingAnd here we come to the experimental part, we evaluate the five algorithms and the face recognition committee machines on four well-known face database. The first one is ORL from AT&T Laboratories, the second is Yale from Yale university, and AR from computer vision center at U.A.B, the final one is HRL from Harvard Robotics Laboratory. We use cross validation testing to evaluate their processing, we partition the data into three subset, one for training, one for validation to get their performance and the final subset for testing.
32 Preprocessing Apply median filter to reduce noise in background Apply Sobel filter for edge detectionCovert to a binary imageApply horizontal and vertical projectionFind face boundaryObtain the center of the face region.Crop the face region and resize itBefore we process the images, we need to preprocess the database images as some of them contains not only just the face but with some simple background. Therefore, we apply a simple ad-hoc preprocessing approach to get the face from the database image. We first apply a median filter to reduce some noise in the background and then we apply the sobel filter for edge detection and convert the image to a binary image. We get the face region by applying horizontal and vertical projection and get the center of the face region for extraction.
33 ORL Result ORL Face database 400 images 40 people Variations Position RotationScaleExpressionThis is the result of the experts and the committee machine on the ORL database. We can see that the Fisherface and SVM is quite accurate among the five experts and the DFRCM and SFRCM has comparable performance to the best experts in this testing.
34 Yale Result Yale Face Database 165 images 15 people Variations ExpressionLightingFor the yale testing, the DFRCM also gets higher accuracy than the others in the testing. And the DFRCM also achieves higher accuracy than the others.
35 AR Result AR Face Database 1300 images 130 people Variations ExpressionLightingOcclusionsFor the AR face database, most experts have poor result in this testing because this database contains a large variations in lighting and especially occlusion. However, Fisherface still gets high accuracy in this testing due to its better classification ability. In this testing, the dynamic structure gets the same result as the fisherface as it has much better results in the validation, the gating network gives higher weight for fisherface in the testing and thus the fisherface dominant the result in the committee. And SFRCM also gets better accuracy than most experts.
36 HRL Result HRL Face Database 345 images 5 people Variation Lighting For HRL testing, the DFRCM gets higher accuracy in this testing.
37 Average Running Time & Results And here comes the processing time required for the experiments and the average results of the experts on all testing, we can see that although we use the committee machine which will have certain overhead, the processing time for each image on the testing is still within one second which is acceptable for a real-time application. And in overall, we can see the improvement of the FRCMs over all the experts on average.Average experimental results
38 ConclusionMake a thorough comparison of five face recognition algorithmsPropose FRCM to integrate different face recognition algorithmsImplement a face recognition system for real-time applicationPropose a distributed architecture for mobile deviceAnd here we come to a conclusion, we make a thorough comparison of five recognition algorithms on four face databases. We propose a framework , FRCM to integrate different face recognition algorithms to improve the overall accuracy. We also implement a face recognition system for real-time application. And finally, we propose a distributed architecture for the system to make face recognition application feasible on mobile device. That’s all for my presentation.