
1 Online Motion Capture Marker Labeling for Multiple Articulated Interacting Targets
Qian Yu 1,2, Qing Li 1, Zhigang Deng 1
1 University of Houston  2 University of Southern California

2 Motivations
Passive optical motion capture system
- Calibrated cameras
- Strongly retro-reflective markers
- 3D marker positions are automatically reconstructed
Limitations
- Sporadic marker dropout
- Capturing multiple interacting subjects is challenging due to occlusion and interaction
First, I briefly describe the widely used passive optical motion capture system, which is also used in our work. The subjects wear strongly retro-reflective markers in front of a set of calibrated cameras working together; the cameras track and reconstruct the 3D positions of these markers. The system has two limitations: first, markers sporadically go missing during the capture process; second, when capturing multiple interacting subjects, maintaining marker correspondence across frames becomes a challenge. In this paper, we propose a novel method to solve these problems.

3 Related Work
Assume correct marker labeling is given
- Novel human motion synthesis [Rose et al. 98, Pullen and Bregler 02, Arikan and Forsyth 03]
- Motion data reuse and editing [Witkin and Popovic 95, Gleicher 98, Arikan and Forsyth 02, Kovar et al. 02, Bregler et al. 02]
Marker labeling for a single subject [Herda et al. 00, Lasenby and Ringer 02]
- A multiple hypothesis tracking algorithm
Recovering joint positions [Silaghi et al. 98, Kirk et al. 05, Lasenby and Ringer 02]
- Spectral clustering for one target
Multiple subjects but little interaction [Guerra 05]
- Closest-point heuristic rule
In the graphics community, most research efforts concerning motion capture target novel motion synthesis or motion data reuse and editing; these works assume correct marker correspondences are given in the data. Some previous work recovers joint positions for a single articulated body, and some recovers marker correspondences for a single human or for multiple humans with little interaction. All of these methods rely on a closest-point heuristic to determine marker correspondence, so they cannot handle complex motion capture scenarios.

4 Our Work
Track and label markers for multiple interacting subjects
- Online marker labeling
- Tolerant to sporadic marker missing
Here are the distinctions of our work. First, our method can track and label markers for multiple interacting subjects. Second, our algorithm labels markers online. Third, it is tolerant to missing markers. The left picture is a snapshot of an input frame in which ten people are hugging together; the right picture shows the animation result after the markers are labeled by our approach.

5 Two Cues
Spatial cue (structure model)
- Subjects are made of rigid bodies
- The relative positions of markers on a rigid body are fixed, so the standard deviation of each marker-pair distance is close to zero
- Rigid bodies are automatically constructed from the input sequences
Motion cue (motion model)
- Motion smoothness of each marker
- Narrows down the legitimate candidates for each marker in the next frame
In our approach, we apply two cues to infer marker correspondences across frames. For the spatial cue, the captured subjects can be approximately regarded as combinations of rigid bodies. Ideally, the relative positions of the markers on a rigid body stay fixed over time, which means the standard deviation of the distance between any two markers on a rigid body is close to zero. Based on this, our approach automatically constructs a set of rigid bodies from the data, and the rigidity of these bodies is updated over time. The motion cue is the smoothness of each marker's motion over a short time span, owing to the high capture rate; it is used to narrow down the legitimate candidates for each marker in the next frame.

6 Approach overview
One motion capture frame → model training (structure model, motion model) → online labeling algorithm → labeled markers
Here is the overview of our algorithm, which has two stages: a model-training stage and an online labeling stage. In the training stage, we construct the structure model and the motion model. During the online labeling stage, we label the markers frame by frame, and we also keep updating the two models.

7 Structure model
Two matrices (updated when a new frame is available)
- D: distance matrix, where D_ij is the average distance between the i-th marker and the j-th marker
- A: standard deviation matrix, where A_ij is the standard deviation of that distance
The structure model is composed of two matrices: the distance matrix D and the standard deviation matrix A. D_ij is the average distance between the i-th marker and the j-th marker, and A_ij is the corresponding standard deviation. With f_i^k denoting the 3D position of marker i at frame k, and τ a constant term representing the size of the training motion block, they are computed as
D_ij = (1/τ) Σ_{k=1}^{τ} ||f_i^k − f_j^k||
A_ij = sqrt( (1/τ) Σ_{k=1}^{τ} (||f_i^k − f_j^k|| − D_ij)² )
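The two matrices described above can be sketched in a few lines of NumPy. This is an illustrative sketch only; the function and variable names are mine, not from the paper:

```python
import numpy as np

def structure_model(frames):
    """Compute the distance matrix D and standard-deviation matrix A
    from a training block of motion capture frames.

    frames: (tau, n, 3) array of 3D positions of n markers over tau frames.
    Returns D (n, n) mean pairwise distances and A (n, n) their std dev.
    """
    # Pairwise displacement per frame: shape (tau, n, n, 3)
    diff = frames[:, :, None, :] - frames[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (tau, n, n) marker-pair distances
    D = dist.mean(axis=0)                 # average distance over the block
    A = dist.std(axis=0)                  # deviation over the block
    return D, A
```

For markers on the same rigid body, A_ij stays near zero, which is exactly the property the clustering step exploits.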

8 Structure model
A rigid body is typically composed of several markers (three to eight)
How are the two matrices used?
- A matrix: construct rigid bodies
- D matrix: validate marker correspondences between consecutive frames
We cluster markers into a number of rigid bodies based on the standard deviation matrix: groups of markers with small group-internal standard deviation form a rigid body. The distance matrix is used to validate marker correspondences between consecutive frames, since the marker-pair distances within one rigid body are fixed over a short time span. The left image is a skeleton of one subject; the right one visualizes the deviation of the marker-pair distances, where brighter regions indicate marker pairs whose distance has smaller deviation. The different colors indicate different rigid bodies.
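A minimal greedy clustering sketch over the A matrix. The paper's exact clustering criterion is not spelled out in this transcript, so the simple thresholding rule and all names below are illustrative assumptions:

```python
import numpy as np

def cluster_rigid_bodies(A, threshold=0.01):
    """Greedily group markers into rigid bodies: a marker joins a group
    only if its distance deviation to every member is below `threshold`.
    A: (n, n) standard-deviation matrix. Returns a list of index lists."""
    unassigned = list(range(A.shape[0]))
    bodies = []
    while unassigned:
        seed = unassigned.pop(0)
        body = [seed]
        for m in unassigned[:]:           # iterate over a copy while removing
            if all(A[m, b] < threshold for b in body):
                body.append(m)
                unassigned.remove(m)
        bodies.append(body)
    return bodies
```

In practice the resulting groups would be restricted to the three-to-eight marker range the slide mentions.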

9 Motion Model
Build the candidate set for each marker in the next frame
- Estimate the marker's position in the next frame with a Kalman filter
- Gate unlabeled points by their Mahalanobis distance to the estimated position
- Maximum candidate number per marker (K); experimentally, it is usually small (two to four)
The motion model is used to build the candidate set for each marker in the next frame, as the figure illustrates. First, we estimate the position of the marker in the next frame using a Kalman filter. Then we compute the Mahalanobis distance between the estimated position and each unlabeled point in the next frame; if the distance is less than a threshold, that point is added to the marker's candidate set of legitimate candidates.
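The gating step can be sketched as follows, assuming the Kalman prediction and its covariance are already available. The names, the gate value, and the K cap are illustrative, not the paper's:

```python
import numpy as np

def candidate_set(pred_pos, pred_cov, unlabeled_points, gate=3.0, k_max=4):
    """Gate unlabeled 3D points against a marker's predicted position.

    pred_pos: (3,) position predicted by the Kalman filter
    pred_cov: (3, 3) covariance of that prediction
    Returns indices of at most k_max points whose Mahalanobis distance
    to the prediction is below `gate`, nearest first."""
    inv_cov = np.linalg.inv(pred_cov)
    scored = []
    for idx, p in enumerate(unlabeled_points):
        r = p - pred_pos                       # innovation vector
        scored.append((float(r @ inv_cov @ r), idx))
    scored.sort()                              # nearest candidates first
    return [idx for d2, idx in scored[:k_max] if d2 < gate ** 2]
```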

10 Training Stage Initial short sequence (first 50 frames)
No interaction among multiple subjects Build structure model

11 On-line labeling stage
Fitting-rigid-bodies algorithm
Input
- Labels of previous frames
- 3D positions of all markers in the current frame
- The trained structure and motion models
Output
- Marker labeling result of the current frame
- The updated structure and motion models
We use the fitting-rigid-bodies algorithm to label the markers during the online labeling stage. Its input is the previous labeling result, the 3D positions of all markers in the new frame, and the trained structure and motion models; its output is the labeling result for the new frame together with the updated structure and motion models.

12 On-line labeling stage
Fitting-rigid-bodies algorithm (pseudo code)
Construct a candidate set for each marker
Set the flag of every rigid body to "unlabeled"
While at least one rigid body is "unlabeled" do
  For each "unlabeled" rigid body r
    - Enumerate and evaluate all possible assignments (based on the candidate sets)
    - Keep the optimum assignment MaxScore(r) for the rigid body r
  End for
  Select the rigid body with the best assignment over all r, argmax(MaxScore(r))
  Set the flag of this rigid body to "labeled"
  Update the candidate sets of the other unassigned markers
  Update the structure and motion models
End while
The pseudo code shows how this algorithm works. When a new frame arrives, we first construct a candidate set for each marker and mark every rigid body as "unlabeled". For each "unlabeled" rigid body, we enumerate and evaluate all possible assignments based on the candidate sets and keep the optimum one. We then select the rigid body with the best assignment, mark it "labeled", update the candidate sets of the remaining unassigned markers, and update the structure and motion models. We repeat this loop until every rigid body is labeled.
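The loop above can be turned into a small Python sketch. The data layout (candidate dictionaries, a pluggable score function) is my own simplification, and the model-update step inside the loop is omitted:

```python
from itertools import product

def fit_rigid_bodies(bodies, candidates, score_fn):
    """Greedy global labeling loop from the pseudo code (simplified).

    bodies: list of marker-id lists (one list per rigid body)
    candidates: marker-id -> list of candidate point ids
    score_fn(body, assignment) -> fitness, higher is better
    Returns a marker-id -> point-id labeling."""
    labeling = {}
    unlabeled = list(bodies)
    while unlabeled:
        best = None
        for body in unlabeled:
            # Enumerate all assignments of candidate points to this body
            for combo in product(*(candidates[m] for m in body)):
                if len(set(combo)) != len(combo):   # one point per marker
                    continue
                s = score_fn(body, combo)
                if best is None or s > best[0]:
                    best = (s, body, combo)
        if best is None:
            break   # no feasible assignment left (e.g. missing markers)
        _, body, combo = best
        unlabeled.remove(body)
        for m, p in zip(body, combo):
            labeling[m] = p
        # A claimed point is no longer available to the other markers
        used = set(combo)
        for m in candidates:
            if m not in labeling:
                candidates[m] = [p for p in candidates[m] if p not in used]
    return labeling
```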

13 On-line labeling stage
Measure how well a rigid body fits a marker assignment
- The distances between marker pairs in a rigid body are consistent over a short time span
Now I describe how we evaluate the assignments for each rigid body. We compute a score for every possible assignment, based on the fact that the distances between marker pairs in a rigid body are consistent over a short time span. The score, shown on the slide, involves the distance between markers i and j in the structure model, the distance between markers i and j under the candidate assignment, the standard deviation between markers i and j in the structure model, and the number of links in the rigid body. The smaller this deviation score is, the better the assignment; we choose the assignment with the smallest score.
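The exact formula appears only as an image on the original slide, so the sketch below is a hedged reconstruction from its labels: average, over the body's links, of each link's length deviation from the structure model, normalized by that link's standard deviation:

```python
import numpy as np

def assignment_score(D, A, body, positions, eps=1e-6):
    """Deviation score for one candidate assignment of a rigid body.

    D, A: structure-model distance and std-deviation matrices
    body: list of marker ids in the rigid body
    positions: marker id -> assigned 3D point for this assignment
    Lower is better. (Reconstructed from the slide labels, not verbatim.)"""
    links = [(i, j) for a, i in enumerate(body) for j in body[a + 1:]]
    total = 0.0
    for i, j in links:
        d = np.linalg.norm(positions[i] - positions[j])
        total += abs(d - D[i, j]) / (A[i, j] + eps)  # std-normalized deviation
    return total / len(links)
```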

14 Missing Marker Recovery
- Rigid bodies remaining "unlabeled" after the fitting-rigid-bodies algorithm are regarded as rigid bodies enclosing missing markers
- The displacement vectors between markers in a rigid body are fixed
- These displacement vectors are known from previous frames
- The positions of missing markers can therefore be estimated from the other markers
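The slides state the idea via fixed displacement vectors known from previous frames; one concrete way to apply them (used here only as an illustrative sketch, not necessarily the paper's method) is to fit the rigid transform mapping the body's visible markers from the previous frame to the current one and apply it to the missing marker's last known position:

```python
import numpy as np

def recover_missing(prev_pts, curr_pts, prev_missing):
    """Estimate a missing marker's position from its rigid body's
    visible markers, via the Kabsch rigid-transform fit.

    prev_pts, curr_pts: (m, 3) visible markers in previous/current frame
    prev_missing: (3,) missing marker's last known position."""
    pc, cc = prev_pts.mean(axis=0), curr_pts.mean(axis=0)
    H = (prev_pts - pc).T @ (curr_pts - cc)   # cross-covariance of the body
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    # Rotate the stored displacement and re-anchor at the current centroid
    return R @ (prev_missing - pc) + cc
```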

15 Result & Evaluation
Motion capture sequences (with 5 and 10 subjects)
- Recorded at 120 frames/sec
- 45-49 markers on each subject
- Each subject has a different marker layout and a different total number of markers
- First 50 frames used for initial training (no interaction)
- Frame-by-frame online labeling
We applied our algorithm to three motion capture sequences. The original capture rate is 120 frames/sec, and 45-49 markers are on each subject. The input is a set of unlabeled 3D points. We use the first 50 frames to train the structure and motion models, then label the points frame by frame. Two snapshots of the input sequences are shown here: the left one is from the 5-person dog-pile sequence, and the right one is from the 10-person hug sequence.

16 Result & Evaluation
Input frame / the closest-point method / our method
Here are some labeling results. The first is a frame from the 5-person dog-pile sequence, and the second is a frame from the 10-person hug sequence. For each, we show the input frame, the result of the closest-point method, and the result of our method.

17 Result & Evaluation
Our algorithm vs. the closest-point approach
- X axis: down-sampling rate
- Y axis: number of wrong marker labelings
We compared our approach with the closest-point approach, which assumes the correct correspondence is the closest point in the next frame. We down-sampled the capture rate, varying it from 120 frames/sec to 30 frames/sec. As the comparison shows, the labeling results of our approach (the red curves) are much better than those of the closest-point approach, and the gap widens as more down-sampling is applied.

18 Result & Evaluation
Missing marker recovery experiment
- Randomly remove several markers in the middle of the mocap sequences (up to 20 continuous frames)
- X axis: length of the missing span
- Y axis: error over the maximum distance in the rigid body
To evaluate the accuracy of missing marker recovery, we randomly removed some markers in the middle of the motion sequences and applied our approach to recover them. The different colors of the curves represent markers missing on different parts of the subject; we also varied the number of continuous missing frames. As the curves show, the smaller the average marker distance in the rigid body the missing marker belongs to, the lower the error ratio; and as the number of continuous missing frames increases, the error ratio rises.

19 Conclusions
Adaptive
- Tracks and labels markers for multiple interacting subjects
- Adaptively clusters markers into rigid bodies
Efficient
- Online marker labeling, frame by frame
Robust
- Automatically detects and recovers sporadically missing markers
- Little error propagation (compared with the closest-point approach)
To summarize our work: first, our method tracks and labels markers for multiple interacting subjects; second, it labels markers online; third, it can recover missing markers.

20 Limitations
- If most of the markers in a rigid body are missing, they are hard to recover, because the current missing-marker recovery mechanism depends on the other markers in the same rigid body
- Not real-time on a single computer: currently 227 milliseconds per subject per frame on a PC (Intel Xeon 3.0 GHz, 4 GB memory)
Two limitations: first, if most of the markers in a rigid body are missing, they are hard to recover; second, the algorithm does not run in real time on a single computer.

21 Future work
- Introduce specific human motion models to eliminate candidates that conflict with plausible human motion
- Add a sample-based method to avoid enumerating all labelings for each rigid body
- Improve algorithm efficiency to achieve real-time performance (GPU-accelerated parallel computing)
- Test more complex motion capture scenarios
For future work, we want to introduce specific human motion models and add a sample-based method to improve the efficiency of this algorithm, in order to achieve real-time labeling. We will also apply the algorithm to more complex motion capture scenarios, and we would like to extend it to facial motion capture labeling.

22 Acknowledgement
Vicon Motion Capture Inc.: provided experimental motion capture data and relevant software support
University of Houston

23 Thank you!

