Real-Time Face Detection and Tracking Using Multiple Cameras
RIT Computer Engineering Senior Design Project
John Ruppert, Justin Hnatow, Jared Holsopple
This project detects and tracks human faces. Using two cameras with different zoom levels, one viewing an entire scene and one zoomed in on a human face, it is able to work through partial occlusion and slight illumination changes. Because of the color space that was used, the system is able to track people of all races. Using multiple cameras in conjunction enables a more robust detection and tracking environment, at the cost of a more complex design. The key elements of the design are the graphical user interface, the communications algorithm, the face detection algorithm, the tracking algorithm, and the camera view correspondence.

Panel titles: Hardware Configuration, Face Detection, Face Tracking, Camera View Correspondence, Software Configuration, Capture

Face Tracking
To perform object tracking in real time, an algorithm that was not very computationally expensive was desired. The algorithm chosen for this task was the Continuously Adaptive Mean Shift (CAMSHIFT) tracking algorithm, a modified version of the mean shift algorithm.

Camera View Correspondence
The cameras are modeled using the pinhole camera model. A coordinate system is introduced for translating region of interest information between cameras. 3D depth information is extracted from 2D images based on the relationship between average face size and distance from the camera. The parameters that drive the pan and tilt angles of the Object View Camera (OVC) to the pixel center of the region of interest of the Scene View Camera (SVC) are computed using geometric transformations and pixel-to-millimeter mapping information extracted from test images.

Face Detection
The face detection was done using a Support Vector Machine (SVM), a learning machine that is able to classify complicated information, such as faces.
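The camera view correspondence described above relies on the pinhole relationship between apparent face size and distance. A minimal sketch of that idea follows, under stated assumptions: the average face width of 150 mm, the focal length expressed in pixels, and all function names are illustrative and do not come from the project. The two cameras are also assumed to have parallel optical axes and to differ only by a translation, which is a simplification of the geometric transformations mentioned in the text.

```python
import math

# Assumed average adult face width in millimeters (illustrative value, not from the poster).
AVG_FACE_WIDTH_MM = 150.0


def estimate_distance_mm(face_width_px, focal_length_px, face_width_mm=AVG_FACE_WIDTH_MM):
    """Pinhole model: face_width_px / focal_length_px = face_width_mm / distance_mm."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return focal_length_px * face_width_mm / face_width_px


def pixel_to_point_mm(cx, cy, image_w, image_h, focal_length_px, distance_mm):
    """Back-project the pixel (cx, cy) to a 3D point in the first camera's frame."""
    x = (cx - image_w / 2.0) * distance_mm / focal_length_px
    y = (cy - image_h / 2.0) * distance_mm / focal_length_px
    return (x, y, distance_mm)


def pan_tilt_degrees(point_mm, camera_offset_mm=(0.0, 0.0, 0.0)):
    """Pan and tilt that aim a second camera, displaced by camera_offset_mm, at the point."""
    x = point_mm[0] - camera_offset_mm[0]
    y = point_mm[1] - camera_offset_mm[1]
    z = point_mm[2] - camera_offset_mm[2]
    pan = math.degrees(math.atan2(x, z))
    # Image y grows downward, so a point above the optical axis needs a positive tilt.
    tilt = math.degrees(math.atan2(-y, math.hypot(x, z)))
    return pan, tilt
```

With these assumptions, a face that appears 100 pixels wide through a lens with an 800-pixel focal length is estimated to be 1200 mm away. In practice the focal length and the pixel-to-millimeter mapping would have to be calibrated from test images, as the text notes.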
The face detection was first trained using approximately 150 images of both faces and non-faces. Before classification and training, each image was converted to grayscale and resized to 20x20 pixels. The histogram of the image was then equalized to normalize brightness and increase contrast. Finally, the image was masked at the edges to reduce background noise.

Figure (preprocessing stages): Original, Grayscale, Resized, Equalized, Masked

Processing flow: Capture, Segment, Face Detect, Track with SVC, Track with OVC

Hardware Configuration
From the camera to the PC, the hardware utilized was:
Sony EVI D100 Color Video Cameras
    SVC: Scene View Camera with wide-angle view
    OVC: Object View Camera with narrow-angle view
2 PCs, each with:
    Osprey 200 frame grabber card
    2 GB RAM
    Dual 2.8 GHz Intel Xeons

Software Configuration
Each PC was running Gentoo Linux. The Intel OpenCV libraries were installed on both PCs. A GUI was created using OpenCV. The GUI displays the current image with detection and tracking information, as well as a command window listing the available user interrupts.

Contributors and Resources
Contributors:
Dr. Czernikowski: Thank you for your advice.
Dr. Savakis: Thank you for the project idea, equipment, and advice.
Paul Mezzanini: Thank you for administering our computers.
Yuriy Luzanov: Thank you for your guidance.
All the people who allowed us to take their pictures.
Resources:
Intel OpenCV Library: http://www.intel.com/research/mrl/research/opencv/
SVM Light: http://svmlight.joachims.org/

Capture
First, the image is captured from the camera by the frame grabber card. Using OpenCV functions, the image is then converted from its native RGB color space to the HSI color space. This is done because the hue of human skin, for all people with skin pigment, falls within a well-defined range. After the image has been captured and converted, the image processing begins. The image first goes through skin color filtering with the skin segmentation algorithm.
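To make the color space conversion described above concrete, here is a small sketch of an RGB to HSI conversion using the common textbook definition of HSI. It is only an illustration: the project used OpenCV functions whose exact conversion is not stated, and the helper names and the `low`/`high` hue bounds are hypothetical parameters that would have to be measured from skin samples.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation in 0..1, intensity in 0..1)."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # pure black: hue and saturation are undefined
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        hue = 0.0  # gray pixel: hue is undefined, report 0 by convention
    else:
        # Clamp to guard against rounding just outside acos's domain.
        ratio = max(-1.0, min(1.0, numerator / denominator))
        theta = math.degrees(math.acos(ratio))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


def hue_in_range(hue_deg, low, high):
    """True when the hue lies in [low, high]; the bounds are supplied by the caller."""
    return low <= hue_deg <= high
```

Because hue is separated from intensity, a single hue interval can cover skin seen under different brightness levels, which is the property the text relies on.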
Segment
Skin segmentation is done using a 2-dimensional histogram of hue and saturation values that was generated from a sample set of skin images. After everything but skin tones is filtered from the image, a scaled black-and-white image is created. This density map is 4x smaller than the original image. Each of its pixels is set to black or white depending on the percentage of skin pixels in the corresponding 4x4 region. A connected components algorithm is then run on the density map to bound the regions of skin, producing rectangular regions of skin tones.

Face Detect
Each of these rectangular regions is run through the face detection algorithm to determine whether it is a region of interest. The face detection algorithm classifies each region as either a face or a non-face. The region that it classifies as a face with the highest confidence is passed to the tracking algorithm on the SVC.

Track with SVC
The SVC tracks the face until it leaves the scene or becomes occluded. Once the SVC begins to track the face, it transmits the coordinates of the face to the OVC.

Track with OVC
The OVC converts the coordinates to its own coordinate system, moves to find the face, and begins tracking the face at a higher zoom level. If it loses the face, it notifies the SVC and waits for a packet containing the latest coordinates of the face. It then re-centers the camera's view on the face and begins tracking once again.

The CAMSHIFT tracking algorithm utilizes a 3-dimensional histogram based on the hue, saturation, and intensity values of a training set. Based on the histogram, a grayscale image is generated in which each pixel represents the probability that the pixel contains skin. This image is used to find and resize a search window that, through successive frames, tracks the object of interest.

Figures: System Setup, SVC View, OVC View, Original, Backprojection
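The search-window idea behind the tracking step can be illustrated with a plain mean shift iteration over a skin-probability image. This is a simplified sketch, not the project's code: it only re-centers a fixed-size window on the centroid of the probability mass, whereas CAMSHIFT additionally resizes the window between frames. The function name and the list-of-lists image representation are assumptions made for the example.

```python
import math


def mean_shift(prob, window, max_iter=10):
    """Re-center a search window on the local centroid of a probability image.

    prob is a list of rows of skin probabilities; window is (x, y, w, h) in pixels.
    Returns the window position after convergence or max_iter iterations.
    """
    x, y, w, h = window
    rows, cols = len(prob), len(prob[0])
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0  # zeroth and first image moments inside the window
        for yy in range(max(0, y), min(rows, y + h)):
            for xx in range(max(0, x), min(cols, x + w)):
                p = prob[yy][xx]
                m00 += p
                m10 += p * xx
                m01 += p * yy
        if m00 == 0:
            break  # no probability mass under the window: the target is lost
        cx, cy = m10 / m00, m01 / m00
        # Move the window so that its center sits on the centroid (round half up).
        new_x = math.floor(cx - (w - 1) / 2.0 + 0.5)
        new_y = math.floor(cy - (h - 1) / 2.0 + 0.5)
        # Keep the window inside the image.
        new_x = max(0, min(cols - w, new_x))
        new_y = max(0, min(rows - h, new_y))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return x, y, w, h
```

Starting each frame from the window found in the previous frame is what lets this kind of tracker follow a face with little computation, since only the pixels under the window are examined.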