
Hand Signals Recognition from Video Using 3D Motion Capture Archive
Tai-Peng Tian, Stan Sclaroff
Computer Science Department, Boston University


1. Introduction

Motivation: Hand signals are commonly used for communication in noisy environments or when people are out of voice range. Examples include directing an airplane to the runway for takeoff, controlling traffic flow, and basketball referee signals.

Figure 1: Basketball referee hand signal.

Problem Definition: Given a sequence of tracked 2D feature locations, find the best matching 3D motion capture sequence from an archive.

Assumptions: We focus on the recognition part of the algorithm, so we assume that the video sequence has been temporally segmented and that the desired 2D feature locations can be reliably tracked over the whole sequence. We further assume that each sequence of 2D features contains only one hand signal.

2. Algorithm Overview: 2D vs 3D sequence alignment using Dynamic Time Warping

Dissimilarity score: Given at least six pairs of 2D to 3D correspondences in a frame, the projection matrix M can be estimated. Given M, the back-projection error of the 3D points is used as the dissimilarity score.

2D vs 3D alignment: Once we can compute the dissimilarity between a frame of 2D features and a frame of 3D features, the Dynamic Time Warping (DTW) algorithm can proceed as usual. The DTW algorithm finds the optimal alignment by minimizing the dissimilarity cost.

Figure 2: An example of a DTW matching. Panel labels: 2D feature locations in image sequence; 3D feature locations in motion capture sequence.

Equation 2: Recursive solution for the DTW alignment.

Table 1: Confusion matrix. Each row contains the outcomes of classifying queries drawn from the same category. Diagonal entries represent correct classifications.

Classifier: Experiments are conducted using the nearest neighbor classifier.
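The dissimilarity score, DTW alignment, and nearest neighbor step described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions. The poster does not say how M is estimated, so the sketch uses the standard Direct Linear Transform. It takes the mean Euclidean reprojection error as the back-projection error and uses the common three-neighbor DTW recursion. Either choice may differ from the poster's Equations 1 and 2. All function names are hypothetical.

```python
import numpy as np


def estimate_projection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix M from at least six 2D-3D pairs.

    Assumption: Direct Linear Transform (DLT); the poster does not name
    the estimation method.
    """
    pts3d = np.asarray(pts3d, dtype=float)
    pts2d = np.asarray(pts2d, dtype=float)
    if len(pts3d) < 6 or len(pts3d) != len(pts2d):
        raise ValueError("need at least six matching 2D-3D pairs")
    X = np.hstack([pts3d, np.ones((len(pts3d), 1))])  # homogeneous 3D points
    zeros = np.zeros(4)
    rows = []
    for Xi, (u, v) in zip(X, pts2d):
        # Each correspondence gives two linear equations in the entries of M.
        rows.append(np.concatenate([Xi, zeros, -u * Xi]))
        rows.append(np.concatenate([zeros, Xi, -v * Xi]))
    # The right singular vector of the smallest singular value solves A m = 0.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 4)


def frame_dissimilarity(pts2d, pts3d):
    """Back-projection error of the 3D points under the estimated M.

    Assumption: mean Euclidean distance between projected and observed points.
    """
    pts2d = np.asarray(pts2d, dtype=float)
    pts3d = np.asarray(pts3d, dtype=float)
    M = estimate_projection(pts3d, pts2d)
    X = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    proj = X @ M.T
    proj = proj[:, :2] / proj[:, 2:3]  # divide by the homogeneous coordinate
    return float(np.mean(np.linalg.norm(proj - pts2d, axis=1)))


def dtw_cost(seq2d, seq3d):
    """Total cost of the optimal DTW alignment between a 2D and a 3D sequence."""
    n, m = len(seq2d), len(seq3d)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dissimilarity(seq2d[i - 1], seq3d[j - 1])
            # Assumed recursion: extend the cheapest of the three predecessors.
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])


def classify(seq2d, archive):
    """Nearest neighbor: label of the archive sequence with the lowest DTW cost.

    `archive` is a hypothetical dict mapping a label to a 3D sequence.
    """
    return min(archive, key=lambda label: dtw_cost(seq2d, archive[label]))
```

Each sequence is a list of per-frame point arrays, with 2D frames of shape (k, 2) and 3D frames of shape (k, 3), and row i of both referring to the same feature. The DLT step needs 3D points that are not all coplanar.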
Under this classifier, given a sequence of 2D features, the 3D motion sequence with the lowest alignment score is deemed the best match.

Equation 1: Dissimilarity measure. P(.) applies the projection matrix.

Diagram labels: 2D Features; 3D motion capture sequence from an archive; 2D-3D Matching Algorithm.

Why DTW? The algorithm provides an optimal alignment between sequences, so we do not have to worry about variations in the speed of the motion.

Why a 3D motion capture archive? The representation is more complete than a 2D representation, as there is no need to sample the motion from multiple views.

Contribution: No direct 3D structure estimation is needed. The most relevant work is Parameswaran and Chellappa [1]. We propose a simpler alternative for the 2D-3D motion matching problem that also offers viewpoint invariance.

3. Experiments and Results

Data: 45 motion capture sequences of basketball referee gestures from http://mocap.cs.cmu.edu. 2D image features were synthesized from the 3D motion capture sequences using a frontal view and scaled to unit height. Approximately half of the data were used as prototypes in the archive and the other half for testing.

Description of Experiments: Three sets of experiments are conducted with different sets of features at different noise levels. The first experiment uses all 31 features shown in Fig. 1, with increasing noise. The second experiment uses a more realistic set of feature points, indicated by the shaded points in Fig. 1. The last experiment uses only the shaded points in the upper body in Fig. 1.

Figure 4: Classifier performance with respect to increasing noise.

Significance of Noise Parameter: In the synthesized images, the person is of unit height. Suppose we are tracking a person 300 pixels tall. An error margin of 0.06 in normalized coordinates (0.06 × 300 = 18 pixels) then simulates a tracker that reports tracked points within a 36 pixel radius, twice that margin, 95% of the time.

4. References

[1] V. Parameswaran and R. Chellappa. View invariants for human action recognition. In CVPR 2003.
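The pixel figures in the Significance of Noise Parameter paragraph can be checked with a short calculation. The sketch below assumes that the 0.06 error margin is the standard deviation of zero-mean Gaussian noise applied to each coordinate. The poster does not state the noise distribution, so this reading is an assumption, and the function name is hypothetical.

```python
import math


def noise_in_pixels(sigma_normalized, person_height_px, k=2.0):
    """Convert a normalized noise level to pixels.

    Assumption: sigma_normalized is a per-coordinate Gaussian standard
    deviation, in units where the person has height 1.
    Returns (sigma in pixels, k-sigma bound in pixels, fraction of
    per-coordinate errors that fall within k sigma).
    """
    sigma_px = sigma_normalized * person_height_px
    bound_px = k * sigma_px
    # For a 1D Gaussian, P(|x| <= k * sigma) = erf(k / sqrt(2)).
    coverage = math.erf(k / math.sqrt(2.0))
    return sigma_px, bound_px, coverage


sigma_px, bound_px, coverage = noise_in_pixels(0.06, 300)
print(f"sigma = {sigma_px:.1f} px, bound = {bound_px:.1f} px, "
      f"coverage = {coverage:.3f}")
```

Under this reading, the 36 pixel figure is two standard deviations for a person 300 pixels tall, which is where a coverage of roughly 95% per coordinate comes from.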
Future Work: Currently there are no temporal constraints on computing the projection matrix from frame to frame. Temporal consistency can be enforced during the matching process to improve robustness.

