Real-Time Human Pose Recognition in Parts from Single Depth Images

1 Real-Time Human Pose Recognition in Parts from Single Depth Images
Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.

3 Outline
Introduction; Data; Body Part Inference and Joint Proposals; Experiments; Discussion.

4 Introduction
Robust interactive human body tracking has applications in gaming, human-computer interaction, security, telepresence, and health care. Even with real-time depth cameras, existing systems track from frame to frame but struggle to re-initialize quickly, and so are not robust. Our focus is per-frame initialization, to be combined with a tracking algorithm: we focus on pose recognition in parts, producing 3D position candidates for each skeletal joint.

5 Introduction
The per-frame proposals are meant to complement an appropriate tracking algorithm, for example:
Tracking people with twists and exponential maps (CVPR 1998).
Tracking loose limbed people (CVPR 2004).
Nonlinear body pose estimation from depth images (DAGM 2005).
Real-time hand-tracking with a color glove (ACM 2009).
Real time motion capture using a single time-of-flight camera (CVPR 2010).

6 Introduction
The approach is inspired by recent object recognition work that divides objects into parts:
Object class recognition by unsupervised scale-invariant learning [CVPR 2003].
The layout consistent random field for recognizing and segmenting partially occluded objects [CVPR 2006].
Two key design goals: computational efficiency and robustness.

7 Introduction
A single depth image is segmented into a dense probabilistic body part labeling, with the parts defined to be spatially localized near skeletal joints. The labeled parts are then used to generate 3D joint proposals.

8 Introduction
We treat the segmentation into body parts as a per-pixel classification task, evaluating each pixel separately. Training data: we generate realistic synthetic depth images and train a deep randomized decision forest classifier while avoiding overfitting.

9 Introduction Overfitting
The classifier uses simple, discriminative depth comparison image features, which maintain high computational efficiency.

10 Introduction
For further speed, the classifier can be run in parallel on each pixel on a GPU. Mean shift is then applied to the inferred per-pixel distributions, resulting in the 3D joint proposals.

11 What is Mean Shift?
A tool for finding modes in a set of data samples that manifest an underlying probability density function (PDF) in R^N. The PDF lives in a feature space: color space, scale space, or any feature space you can conceive.
Non-parametric density estimation: data → discrete PDF representation.
Non-parametric density gradient estimation (mean shift): data → PDF analysis.

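12 Intuitive Description
Objective: find the densest region in a distribution of identical billiard balls. Take a region of interest, compute the center of mass of the balls inside it, and move the region along the mean shift vector, which points from the region's center to that center of mass. Repeat until the region stops moving.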
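The billiard-ball picture can be sketched in a few lines of Python. This is a minimal illustration with a flat (uniform) kernel, not the weighted Gaussian variant used later for joint proposals; the function name `mean_shift_mode` and its parameters are assumptions made for this example.

```python
import math

def mean_shift_mode(points, start, radius, max_iter=100, tol=1e-6):
    """Move a circular region of interest towards the densest area of `points`."""
    center = tuple(start)
    for _ in range(max_iter):
        # Points that fall inside the current region of interest.
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            break
        # Center of mass of the points inside the region.
        mass_center = tuple(sum(c) / len(inside) for c in zip(*inside))
        # Length of the mean shift vector (old center -> center of mass).
        shift = math.dist(mass_center, center)
        center = mass_center
        if shift < tol:
            break
    return center

# Four nearby points and one distant outlier.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
mode = mean_shift_mode(points, start=(0.5, 0.5), radius=1.0)
```

Starting from (0.5, 0.5), the region settles on the center of the four-point cluster and ignores the outlier.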

19 Main contribution
Treat pose estimation as object recognition, using a novel intermediate body parts representation designed to spatially localize joints at low computational cost and high accuracy.

20 Experiments
(i) Synthetic depth training data is an excellent proxy for real data.
(ii) Scaling up the learning problem with varied synthetic data is important for high accuracy.
(iii) Our parts-based approach generalizes better than even an oracular exact nearest neighbor.

21 Data
Depth imaging and motion capture data. Pose estimation research has often focused on techniques to overcome the lack of training data. Two problems: variability in color and variability in pose. Depth imaging greatly reduces the first.

22 Depth image
Depth cameras offer several advantages over traditional intensity sensors: working in low light levels, giving a calibrated scale estimate, and resolving silhouette ambiguities in pose. The camera delivers 640x480 images at 30 frames per second. Real mocap data is retargeted to a variety of base character models to synthesize a large, varied dataset.

23 Motion capture data
We capture a large database of motion capture (mocap) of human actions: approximately 500k frames (driving, dancing, kicking, running, navigating menus). We need not record mocap with variation in rotation about the vertical axis, mirroring left-right, scene position, body shape and size, or camera pose, all of which can be added in (semi-)automatically.

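24 Motion capture data
The classifier uses no temporal information, so we want static poses, not motion. Changes in pose from one frame to the next are often so small as to be insignificant, so similar poses are discarded using a 'furthest neighbor' clustering algorithm, where the distance between poses p_1 and p_2 is
d(p_1, p_2) = \max_j \| p_{1j} - p_{2j} \|_2
with j ranging over body joints and p_{ij} the position of joint j in pose i. The retained poses are more than 5 cm apart under this distance.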
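One way to read the pose subsampling step is as a greedy farthest-point selection. The sketch below is an assumption about how such a selection could be done, not the authors' implementation; the names `pose_distance` and `subsample_poses` are hypothetical, and poses are taken to be lists of 3D joint positions in metres.

```python
import math

def pose_distance(p1, p2):
    """Maximum Euclidean distance over corresponding body joints j."""
    return max(math.dist(a, b) for a, b in zip(p1, p2))

def subsample_poses(poses, min_dist=0.05):
    """Greedy farthest-point selection.

    Repeatedly keeps the pose furthest from those already kept and stops
    once every remaining pose is within min_dist of a kept pose.
    Returns the indices of the kept poses.
    """
    if not poses:
        return []
    kept = [0]
    # Distance from each pose to its nearest kept pose.
    nearest = [pose_distance(p, poses[0]) for p in poses]
    while True:
        idx = max(range(len(poses)), key=nearest.__getitem__)
        if nearest[idx] <= min_dist:
            break
        kept.append(idx)
        nearest = [min(d, pose_distance(p, poses[idx]))
                   for d, p in zip(nearest, poses)]
    return kept

# Three two-joint poses: the second is only 1 cm from the first.
poses = [
    [(0.00, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.01, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.20, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
kept = subsample_poses(poses)
```

Here the near-duplicate second pose is dropped, while the first and third poses are kept.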

25 Motion capture data
It was necessary to iterate the process of motion capture, sampling from our model, training the classifier, and testing joint prediction accuracy. Early experiments used the CMU mocap database.

26 Generating synthetic data
We build a randomized rendering pipeline from which we can sample fully labeled training images. Goals: realism and variety.

27 Generating synthetic data
First, the pipeline randomly samples a set of parameters. It then uses standard computer graphics techniques to render depth and body part images from texture-mapped 3D meshes. Autodesk MotionBuilder is used to retarget the mocap. Slight random variation in height and weight gives extra coverage of body shapes. Other parameters are randomized as well.

28 Generating synthetic data

29 Body Part Inference and Joint Proposals
Body part labeling; depth image features; randomized decision forests; joint position proposals.

30 Body part labeling
An intermediate body part representation, shown color-coded. Some parts directly localize particular skeletal joints; others fill the gaps. This representation transforms the problem into one that can readily be solved by efficient classification algorithms.

31 Body part labeling The parts are specified in a texture map

32 Body part labeling
31 body parts: LU/RU/LW/RW head, neck, L/R shoulder, LU/RU/LW/RW arm, L/R elbow, L/R wrist, L/R hand, LU/RU/LW/RW torso, LU/RU/LW/RW leg, L/R knee, L/R ankle, L/R foot (Left, Right, Upper, loWer).

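33 Depth image features
f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)
d_I(x) is the depth at pixel x in image I. θ = (u, v) describes the offsets u and v. The normalization of the offsets by 1/d_I(x) ensures the features are depth invariant.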
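The depth comparison feature is small enough to write out directly. The sketch below assumes the depth image is a list of rows of depths in metres, that offsets are rounded to whole pixels, and that probes falling outside the image return a large constant; these choices and the names `depth_at` and `depth_feature` are assumptions for illustration.

```python
BACKGROUND_DEPTH = 1e6  # assumed large value for probes outside the image

def depth_at(depth, x, y):
    """Depth at pixel (x, y), or a large constant outside the image."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]):
        return depth[y][x]
    return BACKGROUND_DEPTH

def depth_feature(depth, x, y, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x))."""
    d = depth_at(depth, x, y)
    # Dividing the offsets by the depth at x makes the probe pattern
    # shrink for far-away bodies and grow for close ones.
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)

depth = [
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 4.0],
    [2.0, 2.0, 2.0],
]
response = depth_feature(depth, x=1, y=1, u=(2.0, 0.0), v=(0.0, 0.0))
```

With a depth of 2 m at the center pixel, the offset u = (2, 0) is scaled to one pixel to the right, so the feature compares the 4 m neighbor against the 2 m center.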

34 Depth image features
Individually, these features provide only a weak signal, but in combination in a decision forest they are sufficient to accurately disambiguate all trained parts.

35 Depth image features
The design of these features was strongly motivated by their computational efficiency: no preprocessing is needed; each feature reads at most 3 image pixels and performs at most 5 arithmetic operations; and the features are straightforwardly implemented on the GPU.

36 Randomized decision forests
Fast and effective multi-class classifiers, implemented efficiently on the GPU.
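As a rough sketch of how a decision forest classifies one pixel: each tree is descended by comparing a feature response against a threshold at every split node, and the class distributions stored at the reached leaves are averaged over the trees. The dictionary-based tree layout, the left/right convention, and the name `classify_pixel` are assumptions made for this example.

```python
def classify_pixel(forest, feature):
    """Average the leaf distributions reached in each tree of the forest.

    `feature(theta)` returns the feature response at the pixel being
    classified for feature parameters theta.
    """
    total = None
    for node in forest:
        # Descend until a leaf: go left when the response is below the threshold.
        while "leaf" not in node:
            go_left = feature(node["theta"]) < node["tau"]
            node = node["left"] if go_left else node["right"]
        dist = node["leaf"]
        total = list(dist) if total is None else [a + b for a, b in zip(total, dist)]
    return [p / len(forest) for p in total]

# Toy forest over two body part classes; theta is a plain number here.
forest = [
    {"theta": 1.0, "tau": 2.0,
     "left": {"leaf": [1.0, 0.0]},
     "right": {"leaf": [0.0, 1.0]}},
    {"leaf": [0.5, 0.5]},
]
probabilities = classify_pixel(forest, feature=lambda theta: theta)
```

In a real system `feature` would evaluate the depth comparison feature at the pixel; the identity function is used here only to keep the toy example self-contained.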

37 Randomized decision forests

38 Randomized decision forests

39 Joint position proposals
Generate reliable proposals for the positions of 3D skeletal joints. These proposals are the final output of our algorithm and can be used by a tracking algorithm to self-initialize and recover from failure.

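40 Joint position proposals
A local mode-finding approach based on mean shift with a weighted Gaussian kernel. The density estimator per body part c is
f_c(\hat{x}) \propto \sum_{i=1}^{N} w_{ic} \exp\left( -\left\| \frac{\hat{x} - \hat{x}_i}{b_c} \right\|^2 \right)
where \hat{x}_i is the reprojection of image pixel x_i into world space given depth d_I(x_i), w_{ic} is a pixel weighting, and b_c is a learned per-part bandwidth.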
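A weighted Gaussian mean shift iteration can be sketched as follows. Each step replaces the current estimate with the kernel-weighted average of the reprojected pixels; the function name, stopping rule, and iteration cap are assumptions, and the points are taken to be already reprojected into world space.

```python
import math

def weighted_mean_shift(points, weights, start, bandwidth, iters=50, tol=1e-7):
    """Climb to a local mode of sum_i w_i * exp(-||(x - x_i) / b||^2)."""
    x = tuple(start)
    for _ in range(iters):
        # Kernel response of every point at the current estimate.
        k = [w * math.exp(-(math.dist(x, p) / bandwidth) ** 2)
             for p, w in zip(points, weights)]
        total = sum(k)
        if total == 0.0:
            break
        # Mean shift update: kernel-weighted average of the points.
        new_x = tuple(sum(ki * p[d] for ki, p in zip(k, points)) / total
                      for d in range(len(x)))
        moved = math.dist(new_x, x)
        x = new_x
        if moved < tol:
            break
    return x

# Two nearby surface points and one far outlier, all at 2 m depth.
points = [(-0.01, 0.0, 2.0), (0.01, 0.0, 2.0), (1.0, 0.0, 2.0)]
weights = [1.0, 1.0, 1.0]
mode = weighted_mean_shift(points, weights, start=(0.02, 0.0, 2.0), bandwidth=0.065)
```

With a bandwidth of 0.065 m the outlier one metre away contributes essentially nothing, and the estimate converges to the midpoint of the two nearby points.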

41 Non-Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF. Data point density implies PDF value. (Figure: assumed underlying PDF and real data samples.)

44 Parametric Density Estimation
Assumption: the data points are sampled from an underlying PDF of an assumed parametric form, whose parameters are estimated from the data. (Figure: assumed underlying PDF and real data samples.)

45 Joint position proposals
The weight w_{ic} considers both the inferred body part probability at the pixel and the world surface area of the pixel:
w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2
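The pixel weight and the reprojection into world space can be sketched together. The weight multiplies the inferred part probability by the squared depth, since the surface area covered by one pixel grows with the square of its distance. The pinhole camera model and the intrinsics `focal`, `cx`, `cy` are assumptions for this example and are not given in the slides.

```python
def pixel_weight(part_probability, depth):
    """w_ic = P(c | I, x_i) * d_I(x_i)^2."""
    return part_probability * depth ** 2

def reproject(px, py, depth, focal, cx, cy):
    """Back-project pixel (px, py) at `depth` metres with an assumed pinhole model."""
    return ((px - cx) * depth / focal, (py - cy) * depth / focal, depth)

# A pixel 100 columns right of the assumed principal point, 2 m away.
w = pixel_weight(part_probability=0.5, depth=2.0)
world = reproject(420, 240, 2.0, focal=500.0, cx=320, cy=240)
```

The same probability at twice the depth yields four times the weight, which compensates for distant bodies covering fewer pixels.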

46 Joint position proposals
The detected modes lie on the surface of the body. Each mode is pushed back into the scene by a learned z offset to produce a final joint position proposal. Parameters set by grid search on a set of 5000 images: bandwidth b_c = 0.065 m, probability threshold λ_c = 0.14, and a z offset (value missing from the slide).

47 Joint position proposals

48 Experiments
Further results are provided in the supplementary material. Default training settings: 3 trees, 20 deep; 300k training images per tree; 2000 training example pixels per image; 2000 candidate features θ; 50 candidate thresholds τ per feature.
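To show what the candidate features and thresholds are used for, here is a sketch of choosing one split for one tree node by information gain over Shannon entropy. The slides only list the candidate counts, so the selection criterion shown, the function names, and the data layout (one precomputed response per candidate feature per example pixel) are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of body part labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(responses, labels, thresholds):
    """Pick the (feature, threshold) pair with the largest information gain.

    responses[k][i] is the response of candidate feature k at example pixel i.
    Returns (gain, feature index, threshold); indices are None if nothing helps.
    """
    base, n = entropy(labels), len(labels)
    best = (0.0, None, None)
    for k, resp in enumerate(responses):
        for tau in thresholds:
            left = [l for r, l in zip(resp, labels) if r < tau]
            right = [l for r, l in zip(resp, labels) if r >= tau]
            if not left or not right:
                continue  # a split that sends everything one way is useless
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / n
            if gain > best[0]:
                best = (gain, k, tau)
    return best

labels = ["hand", "hand", "head", "head"]
responses = [
    [0.1, 0.9, 0.2, 0.8],  # feature 0 mixes the two parts
    [0.1, 0.2, 0.8, 0.9],  # feature 1 separates them
]
gain, feature_index, tau = best_split(responses, labels, thresholds=[0.5])
```

In the toy data, feature 1 with threshold 0.5 separates the two parts perfectly, so it is selected with a gain of one bit.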

49 Experiments
Test data: challenging synthetic and real depth images are used to evaluate our approach. Synthetic test set: 5000 synthesized depth images. Real test set: 8808 frames of real depth images, 15 different subjects, 7 upper body joint positions.

50 Experiments
Error metrics: we quantify both classification and joint prediction accuracy. Classification accuracy is the average of the diagonal of the confusion matrix between the ground truth part label and the most likely inferred part label. For joint prediction accuracy, we generate recall-precision curves as a function of confidence threshold and quantify accuracy as average precision per joint.
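The classification metric can be computed directly from a confusion matrix of pixel counts. The sketch below normalizes each row by its total so that the diagonal holds per-class accuracies, then averages them; the row/column convention and the function name are assumptions.

```python
def average_per_class_accuracy(confusion):
    """Mean of the diagonal of the row-normalized confusion matrix.

    confusion[t][p] counts pixels with true part t predicted as part p.
    Classes with no ground truth pixels are skipped.
    """
    per_class = [row[t] / sum(row)
                 for t, row in enumerate(confusion) if sum(row) > 0]
    return sum(per_class) / len(per_class)

# Two parts: the first is right 8 times out of 10, the second 5 out of 10.
confusion = [
    [8, 2],
    [5, 5],
]
accuracy = average_per_class_accuracy(confusion)
```

Averaging per class rather than per pixel gives small parts such as hands the same influence as large parts such as the torso.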

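51 Experiments
Error metric: the first joint proposal within D metres of the ground truth position counts as a true positive, while other proposals also within D metres count as false positives. This penalizes multiple spurious detections near the correct position, which might slow a downstream tracking algorithm. We set D = 0.1 m, close to the accuracy of the ground truth of the real test data.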
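The true positive rule can be made concrete with a small average precision routine for one joint. This is a sketch under assumptions: detections are ranked by confidence, precision is sampled at each true positive, and the sum is divided by the number of ground truth joints. The exact interpolation used in the original evaluation is not stated in the slides, and the function name is hypothetical.

```python
import math

def average_precision(detections, ground_truth, max_dist=0.1):
    """Average precision for one joint.

    detections: list of (confidence, frame_id, position).
    ground_truth: dict mapping frame_id to the true joint position.
    """
    matched, tp, precisions = set(), 0, []
    ranked = sorted(detections, key=lambda d: -d[0])
    for rank, (_, frame, pos) in enumerate(ranked, start=1):
        truth = ground_truth.get(frame)
        close = truth is not None and math.dist(pos, truth) <= max_dist
        if close and frame not in matched:
            # First proposal within max_dist of the truth in this frame.
            matched.add(frame)
            tp += 1
            precisions.append(tp / rank)
        # Anything else is a false positive: too far away, or a
        # duplicate near a joint that was already matched.
    return sum(precisions) / len(ground_truth) if ground_truth else 0.0

ground_truth = {0: (0.0, 0.0, 2.0), 1: (1.0, 0.0, 2.0)}
detections = [
    (0.9, 0, (0.05, 0.0, 2.0)),  # true positive
    (0.8, 0, (0.02, 0.0, 2.0)),  # duplicate near the same joint
    (0.7, 1, (1.0, 0.05, 2.0)),  # true positive
]
ap = average_precision(detections, ground_truth)
```

The duplicate in frame 0 lowers the precision measured at the second true positive from 1 to 2/3, which is how spurious detections near the correct position are penalized.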

52 Experiments

53 Experiments

54 Experiments

55 Experiments

56 Experiments

57 Experiments
Comparison with: Real time motion capture using a single time-of-flight camera [CVPR 2010].

58 Discussion
Accurate proposals for the 3D locations of body joints, at super real-time rates, from single depth images. Body part recognition serves as an intermediate representation. A highly varied synthetic training set allows training very deep decision forests, using depth invariant features, without overfitting.

59 Future work
Further study of the variability in the source mocap data. The generative model underlying the synthesis pipeline. A similarly efficient approach that directly regresses joint positions. Removing ambiguities in local pose estimates.

60 Thank you

