Bayesian Decision Theory Case Studies CS479/679 Pattern Recognition Dr. George Bebis
Case Study I A. Madabhushi and J. Aggarwal, "A Bayesian approach to human activity recognition", 2nd International Workshop on Visual Surveillance, pp , June 1999.
Human activity recognition Recognize human actions using visual information. – Useful for monitoring of human activity in department stores, airports, high-security buildings etc. Building systems that can recognize any type of action is a difficult and challenging problem.
Goal Build a system that is capable of recognizing the following 10 (ten) actions, from a frontal or lateral view: sitting down, standing up, bending down, getting up, hugging, squatting, rising from a squatting position, bending sideways, falling backward, walking.
Rationale and Approach Rationale – People sit, stand, walk, bend down, and get up in a more or less similar fashion. – Human actions can be recognized by tracking various body parts. Head motion trajectory – The head of a person moves in a characteristic fashion during these actions. Recognition is formulated as Bayesian classification using the movement of the head over consecutive frames.
Strengths and Weaknesses Strengths – The system can recognize actions where the gait of the subject in the test sequence differs considerably from the training sequences. – Also, it can recognize actions for people of varying physical structure (i.e., tall, short, fat, thin etc.). Weaknesses – Only actions in the frontal or lateral view can be recognized successfully by this system. – Certain assumptions might not be valid.
Main Steps (figure: block diagram of the processing steps, from input to output)
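Action Representation Estimate the centroid of the head in each frame: $(x_t, y_t)$, $t = 1, \dots, n+1$. Find the absolute differences in successive frames: $dx_t = |x_{t+1} - x_t|$, $dy_t = |y_{t+1} - y_t|$, $t = 1, \dots, n$, which form the feature vectors $X = (dx_1, \dots, dx_n)$ and $Y = (dy_1, \dots, dy_n)$.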
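The action representation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `head_motion_features` and the sample centroids are hypothetical, and the centroids are assumed to be given as (x, y) pixel coordinates, one per frame.

```python
import numpy as np

def head_motion_features(centroids):
    """Return (X, Y): absolute frame-to-frame differences of the head centroid."""
    c = np.asarray(centroids, dtype=float)   # shape (n_frames, 2): columns are x and y
    if c.ndim != 2 or c.shape[1] != 2 or c.shape[0] < 2:
        raise ValueError("expected at least two (x, y) centroids")
    diffs = np.abs(np.diff(c, axis=0))       # shape (n_frames - 1, 2)
    return diffs[:, 0], diffs[:, 1]

# Hypothetical centroids for a head moving mostly downward (image y grows downward).
track = [(100, 50), (101, 58), (100, 70), (102, 85)]
X, Y = head_motion_features(track)
print(X, Y)
```

With 10 frames per sequence, each of X and Y would have 9 entries.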
Head Detection and Tracking The centroid of the head is tracked from frame to frame. Accurate head detection and tracking are crucial. – Detection was performed manually here.
Bayesian Formulation Given an input sequence, the posterior probability of each action $\omega_i$ is computed using Bayes' rule: $P(\omega_i \mid X, Y) = \frac{p(X, Y \mid \omega_i)\, P(\omega_i)}{p(X, Y)}$ Assumption:
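Probability Density Estimation Feature vectors X and Y are assumed to be independent (valid?), each following a multivariate Gaussian distribution with class-specific parameters: $p(X \mid \omega_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_X|^{1/2}} \exp\left(-\tfrac{1}{2}(X - \mu_X)^T \Sigma_X^{-1} (X - \mu_X)\right)$, where $n$ is the dimension of the feature vector, and likewise for $Y$ with $\mu_Y$ and $\Sigma_Y$, so that $p(X, Y \mid \omega_i) = p(X \mid \omega_i)\, p(Y \mid \omega_i)$.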
Probability Density Estimation (contd) The sample covariance matrices are used to estimate $\Sigma_X$ and $\Sigma_Y$. Two distributions are estimated for each action, corresponding to the frontal and lateral views (i.e., 20 densities in total).
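A sketch of the density estimation step for one class and one feature vector, assuming each training sequence contributes one row of frame-to-frame differences. The helper names, the synthetic training data, and the small ridge term added to the covariance are assumptions for illustration; the slides do not say how ill-conditioned covariance matrices were handled.

```python
import numpy as np

def fit_gaussian(samples, ridge=1e-6):
    """Estimate the mean and sample covariance from rows of `samples`."""
    s = np.asarray(samples, dtype=float)
    mu = s.mean(axis=0)
    # rowvar=False treats columns as variables; np.cov normalizes by (n - 1).
    sigma = np.atleast_2d(np.cov(s, rowvar=False))
    # Assumption: a small ridge keeps sigma invertible when sequences are few.
    sigma = sigma + ridge * np.eye(sigma.shape[0])
    return mu, sigma

def log_gaussian(x, mu, sigma):
    """Log of the multivariate Gaussian density N(x; mu, sigma)."""
    d = np.asarray(x, dtype=float) - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = d @ np.linalg.solve(sigma, d)     # squared Mahalanobis distance
    return -0.5 * (len(mu) * np.log(2 * np.pi) + logdet + maha)

# Synthetic stand-in for training data: 38 sequences, 9 differences each.
rng = np.random.default_rng(0)
train_X = rng.normal(loc=2.0, scale=0.5, size=(38, 9))
mu_X, sigma_X = fit_gaussian(train_X)
print(log_gaussian(train_X[0], mu_X, sigma_X))
```

Working with log-densities avoids numerical underflow when the densities of X and Y are later multiplied.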
Recognition Given an input sequence, the posterior probabilities are computed for each of the stored actions (i.e., 20 values). The input is assigned to the action with the highest posterior probability: $\omega^* = \arg\max_i P(\omega_i \mid X, Y)$.
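The recognition rule can be sketched as follows. The two toy models, their labels, and the function names are hypothetical, and equal priors over all classes are an assumption made here so that the largest likelihood is also the largest posterior.

```python
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log of the multivariate Gaussian density N(x; mu, sigma)."""
    d = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (len(d) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(sigma, d))

def classify(X, Y, models):
    """`models` maps a label to ((mu_X, sigma_X), (mu_Y, sigma_Y))."""
    scores = {}
    for label, ((mu_x, sig_x), (mu_y, sig_y)) in models.items():
        # X and Y are treated as independent, so their log-likelihoods add.
        scores[label] = log_gaussian(X, mu_x, sig_x) + log_gaussian(Y, mu_y, sig_y)
    # Assumption: equal priors, so argmax of the likelihood equals argmax of the posterior.
    return max(scores, key=scores.get), scores

# Two hypothetical (action, view) models with 3-dimensional features.
eye = np.eye(3)
models = {
    ("sitting down", "frontal"): (([1, 1, 1], eye), ([10, 10, 10], eye)),
    ("walking", "lateral"): (([10, 10, 10], eye), ([1, 1, 1], eye)),
}
best, scores = classify([1, 2, 1], [9, 11, 10], models)
print(best)
```

In the system described here, `models` would hold 20 entries, one per action and view.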
Discriminating Similar Actions In some actions, the head moves in a similar fashion, making it difficult to distinguish these actions from one another; for example: (1) The head moves downward without much sideward deviation in the following actions: * squatting * sitting down * bending down
Discriminating Similar Actions (contd) (2) The head moves upward without much sideward deviation in the following actions: * standing up * rising * getting up A number of heuristics are used to distinguish among these actions. – e.g., when bending down, the head goes much lower than when sitting down.
Training A fixed CCD camera working at 2 frames per second was used to obtain the training sequences. People of diverse physical appearance were used to model the actions. Subjects were asked to perform the actions at a comfortable pace.
Training (contd) To train the system, 38 sequences were taken of each person performing all the actions of interest in both the frontal and lateral views. It was found that each action can be completed within 10 frames. Only the first 10 frames from each sequence were used for training/testing (i.e., 5 seconds)
Testing For testing, 39 sequences were used. Of the 39 sequences, 31 were classified correctly. Of the 8 sequences classified incorrectly, 6 were assigned to the correct action but to the wrong view.
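As a worked check of these counts: the first rate follows directly from the reported numbers, while the second (counting a wrong-view result as a correct action) is derived here and is not stated on the slide.

```latex
\[
\text{accuracy (action and view)} = \frac{31}{39} \approx 79.5\%,
\qquad
\text{accuracy (action only)} = \frac{31 + 6}{39} = \frac{37}{39} \approx 94.9\%.
\]
```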
Practical Issues How would you find the first and last frames of an action in general (segmentation)? Is the system robust to recognizing an action from incomplete sequences (i.e., assuming that several frames are missing)? Current system is unable to recognize several actions at the same time.
Extension J. Usabiaga, G. Bebis, A. Erol, Mircea Nicolescu, and Monica Nicolescu, "Recognizing Simple Human Actions Using 3D Head Trajectories", Computational Intelligence, vol. 23, no. 4, pp , 2007.
Case Study II J. Yang and A. Waibel, "A Real-time Face Tracker", Proceedings of WACV'96, 1996.
Goal and Steps Goal – Build a system that can detect and track a person's face while the person moves freely in a room. Main Steps (1) Detect arbitrary human faces in various environments using a generic skin-color model. (2) Track the face of interest by controlling the camera position and zoom. (3) Adapt skin-color model parameters based on individual appearance and lighting conditions.
System Components A probabilistic model to characterize skin-color distributions of human faces. A motion model to estimate human motion and to predict the search window in the next frame. A camera model to predict camera motion (needed because the camera's response was much slower than the frame rate).
Why Use Skin Color for Face Detection? Traditional systems performed face detection using template matching or facial features. Using skin color leads to a faster and more robust approach than template matching or facial feature extraction.
Challenges Using Skin Color Human skin colors differ from person to person. The color representation of a face obtained by a camera is influenced by many factors (e.g., ambient light, motion etc.) Different cameras produce significantly different color values, even for the same person under the same lighting conditions.
Chromatic Color Space RGB is not the best color representation for characterizing skin color, because it encodes brightness as well as color. Represent skin color in the chromatic space, which is defined from the RGB space as follows: $r = \frac{R}{R+G+B}$, $g = \frac{G}{R+G+B}$ (the normalized blue component $b = \frac{B}{R+G+B}$ is redundant since $r + g + b = 1$).
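A minimal sketch of the conversion to chromatic coordinates. The function name is hypothetical, and mapping pure black pixels (R + G + B = 0) to (0, 0) is an assumption, since their chromaticity is undefined.

```python
import numpy as np

def to_chromatic(rgb):
    """Map RGB values (last axis of length 3) to normalized (r, g) coordinates."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    # Assumption: black pixels have no defined chromaticity; map them to (0, 0).
    safe = np.where(total == 0, 1.0, total)
    return rgb[..., :2] / safe

bright = to_chromatic([200, 100, 100])
dark = to_chromatic([100, 50, 50])     # same color at half the brightness
print(bright, dark)
```

Both pixels map to the same (r, g) point, which shows how the normalization discards brightness.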
Skin-Color Clustering Skin colors do not fall randomly in chromatic color space but form clusters at specific points.
Skin-Color Clustering (contd) Distributions of skin-colors of different people are clustered in chromatic color space – i.e., they differ much less in color than in brightness (skin-color distribution of 40 people - different races)
Skin-Color Model Experiments (i.e., assuming different lighting conditions and different persons) have shown that the skin-color distribution has a regular shape. Idea: represent the skin-color distribution using a Gaussian with mean μ and covariance Σ: $x = (r, g)^T \sim N(\mu, \Sigma)$.
Parameter Estimation Select skin-color regions from a set of face images. Estimate the mean and covariance of the skin-color distribution using the sample mean and covariance.
Face detection using the skin-color model Each pixel x in the input image is converted into the chromatic color space and compared with the distribution of the skin-color model.
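Parameter estimation and per-pixel matching could look roughly like this. The slides say only that each pixel is "compared with the distribution", so thresholding the squared Mahalanobis distance is an assumption, as are the threshold value, the function names, and the synthetic skin samples.

```python
import numpy as np

def to_chromatic(rgb):
    """Map RGB values (last axis of length 3) to normalized (r, g) coordinates."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    return rgb[..., :2] / np.where(total == 0, 1.0, total)

def fit_skin_model(skin_rgb):
    """Sample mean and covariance of skin pixels in chromatic space."""
    rg = to_chromatic(skin_rgb).reshape(-1, 2)
    return rg.mean(axis=0), np.cov(rg, rowvar=False)

def skin_mask(image_rgb, mu, sigma, max_mahalanobis_sq=9.0):
    """True where a pixel lies within the (assumed) distance threshold of the model."""
    d = to_chromatic(image_rgb) - mu
    m2 = np.einsum("...i,ij,...j->...", d, np.linalg.inv(sigma), d)
    return m2 <= max_mahalanobis_sq

# Synthetic "skin" samples scattered around one hypothetical RGB value.
rng = np.random.default_rng(0)
samples = np.array([180.0, 120.0, 90.0]) + rng.normal(0.0, 8.0, size=(500, 3))
mu, sigma = fit_skin_model(samples)

image = [[[180, 120, 90], [90, 60, 45]],    # skin tone, same tone but darker
         [[0, 0, 255], [60, 200, 60]]]      # blue, green
mask = skin_mask(image, mu, sigma)
print(mask)
```

The darker pixel is still accepted because its chromatic coordinates match the model, which is the point of working in normalized color.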
Dealing with skin-color-like objects It is impossible in general to detect only faces simply from the result of color matching – e.g., background may contain skin colors
Dealing with skin-color-like objects (contd) Additional information should be used for rejecting false positives (e.g., geometric features, motion etc.)
Skin-color model adaptation If a person is moving, the apparent skin colors change as the person's position relative to the camera or light changes. Idea: adapt model parameters to handle these changes.
Skin-color model adaptation (contd) N determines how long the past parameters will influence the current parameters. The weighting factors $a_i$, $b_i$, $c_i$ determine how much the past parameters will influence the current parameters.
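The update equations themselves are not preserved in this transcript. One plausible reading, sketched below, is that each adapted parameter is a weighted combination of its last N per-frame estimates. The function name, the normalization of the weights, and the example values are all assumptions.

```python
import numpy as np

def adapt(history, weights):
    """Weighted combination of the last N estimates in `history` (most recent last)."""
    n = len(weights)                               # N: how far back the past counts
    recent = np.asarray(history[-n:], dtype=float)
    w = np.asarray(weights[-len(recent):], dtype=float)
    w = w / w.sum()                                # assumption: weights sum to 1
    return np.tensordot(w, recent, axes=1)

# Hypothetical per-frame estimates of the mean (r, g), oldest first.
means = [np.array([0.40, 0.30]), np.array([0.44, 0.32]), np.array([0.48, 0.34])]
adapted_mean = adapt(means, weights=[1, 2, 3])     # most recent frame weighted highest
print(adapted_mean)
```

The same function accepts a history of 2x2 covariance matrices, so separate weight sets could be applied to each parameter.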
System initialization Automatic mode – A general skin-color model is used to identify skin-color regions. – Motion and shape information is used to reject non-face regions. – The largest face region is selected (face closest to the camera). – Skin-color model is adapted to the face being tracked.
System initialization (contd) Interactive mode – The user selects a point on the face of interest using the mouse. – The tracker searches around the point to find the face using a general skin-color model. – Skin-color model is adapted to the face being tracked.