Published by Johnathon Pool. Modified over 2 years ago.

1
**Real-Time Human Pose Recognition in Parts from Single Depth Images**

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake. Microsoft Research Cambridge & Xbox Incubation. CVPR 2011 Best Paper.

3
**Outline**

Introduction; Data; Body Part Inference and Joint Proposals; Experiments; Discussion

4
**Introduction**

Robust interactive human body tracking has applications in gaming, human-computer interaction, security, telepresence, and health care. With real-time depth cameras, existing systems track from frame to frame but struggle to re-initialize quickly, and so are not robust. Our focus is per-frame initialization, to be combined with a tracking algorithm. We concentrate on pose recognition in parts, producing 3D position candidates for each skeletal joint.

5
**Introduction: appropriate tracking algorithms**

Tracking people with twists and exponential maps (CVPR 1998); Tracking loose limbed people (CVPR 2004); Nonlinear body pose estimation from depth images (DAGM 2005); Real-time hand-tracking with a color glove (ACM 2009); Real time motion capture using a single time-of-flight camera (CVPR 2010).

6
**Introduction**

Inspired by recent object recognition work that divides objects into parts: Object class recognition by unsupervised scale-invariant learning [CVPR 2003]; The layout consistent random field for recognizing and segmenting partially occluded objects [CVPR 2006]. Two key design goals: computational efficiency and robustness.

7
**Introduction**

A dense probabilistic body part labeling, with the parts spatially localized near skeletal joints. Pipeline: depth image, then segmentation into body parts, then generation of 3D joint proposals.

8
**Introduction: training data**

We treat the segmentation into body parts as a per-pixel classification task, evaluating each pixel separately. For training data, we generate realistic synthetic depth images and train a deep randomized decision forest classifier; the large, varied training set is what lets the classifier avoid overfitting.

9
**Introduction: overfitting and features**

Simple, discriminative depth comparison image features are used, maintaining high computational efficiency.

10
**Introduction**

For further speed, the classifier can be run in parallel on each pixel on a GPU. Mean shift is then applied to the per-pixel results, yielding the 3D joint proposals.

11
**What is Mean Shift?**

A tool for finding modes in a set of data samples that manifest an underlying probability density function (PDF) in R^N. The PDF can live in any feature space: color space, scale space, or actually any feature space you can conceive. Two routes lead from the data to PDF analysis: non-parametric density estimation, which gives a discrete PDF representation, and non-parametric density gradient estimation, which is mean shift.

12
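**Intuitive Description**

Objective: find the densest region in a distribution of identical billiard balls. Start with a region of interest and compute the center of mass of the balls inside it. The mean shift vector points from the center of the region to that center of mass. Move the region along the vector and repeat until it stops moving.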
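The billiard-ball picture can be sketched in a few lines of NumPy. This is a minimal illustration with a flat (uniform) window, not the weighted Gaussian variant used later for joint proposals; the function name `mean_shift_mode`, the window radius, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def mean_shift_mode(points, start, radius, max_iter=100, tol=1e-6):
    """Follow mean shift vectors from `start` until the window stops moving."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Region of interest: all samples within `radius` of the current center.
        in_window = np.linalg.norm(points - center, axis=1) <= radius
        if not in_window.any():
            break
        # Center of mass of the samples inside the window.
        center_of_mass = points[in_window].mean(axis=0)
        # Mean shift vector: from the current center to the center of mass.
        shift = center_of_mass - center
        center = center_of_mass
        if np.linalg.norm(shift) < tol:
            break
    return center

# Hypothetical data: one dense cluster near (5, 5) plus uniform clutter.
rng = np.random.default_rng(0)
dense = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(500, 2))
clutter = rng.uniform(0.0, 10.0, size=(100, 2))
points = np.vstack([dense, clutter])
mode = mean_shift_mode(points, start=[4.0, 4.0], radius=2.0)
```

Starting the window elsewhere may lead to a different local mode, which is why mean shift is usually run from several starting points.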

19
**Main contribution**

Treat pose estimation as object recognition, using a novel intermediate body parts representation designed to spatially localize joints at low computational cost and with high accuracy.

20
**Experiments**

The experiments show that (i) synthetic depth training data is an excellent proxy for real data; (ii) scaling up the learning problem with varied synthetic data is important for high accuracy; (iii) our parts-based approach generalizes better than even an oracular exact nearest neighbor.

21
**Data**

Depth imaging and motion capture data. Pose estimation research has often focused on techniques to overcome a lack of training data. Two problems cause the shortage: variability in color and variability in pose. Depth images largely remove the first.

22
**Depth image**

Use real mocap data, retargeted to a variety of base character models, to synthesize a large, varied dataset. The depth camera provides 640x480 images at 30 frames per second. Advantages of depth cameras over traditional intensity sensors: they work in low light levels, give a calibrated scale estimate, and resolve silhouette ambiguities in pose.

23
**Motion capture data**

Capture a large database of motion capture (mocap) of human actions: approximately 500k frames (driving, dancing, kicking, running, navigating menus). There is no need to record mocap with variation in rotation about the vertical axis, mirroring left-right, scene position, body shape and size, or camera pose, all of which can be added in (semi-)automatically.

24
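**Motion capture data**

The classifier uses no temporal information: it sees static poses, not motion. Changes in pose from one frame to the next are often so small as to be insignificant, so redundant poses are discarded using a ‘furthest neighbor’ clustering algorithm, where the distance between poses $p_1$ and $p_2$ is

$$\max_j \lVert p_{1j} - p_{2j} \rVert_2$$

Here $j$ indexes the body joints and $p_i$ denotes pose $i$. The retained poses are more than 5 cm apart under this distance.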
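One way to read the furthest-neighbor step is as a greedy selection: repeatedly keep the pose that is furthest from everything kept so far, and stop once no remaining pose is more than 5 cm away. The sketch below follows that reading; the function names, the choice of the first frame as seed, and the synthetic frames are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

def pose_distance(pose_a, pose_b):
    """Maximum Euclidean distance over corresponding joints (poses are J x 3, in meters)."""
    return np.linalg.norm(pose_a - pose_b, axis=-1).max(axis=-1)

def furthest_neighbor_subset(poses, min_dist=0.05):
    """Greedily keep the pose furthest from the kept set until none is further than min_dist."""
    poses = np.asarray(poses, dtype=float)
    kept = [0]  # assumption: seed the subset with the first frame
    # Distance from every pose to its nearest kept pose.
    nearest = pose_distance(poses, poses[0])
    while True:
        candidate = int(nearest.argmax())
        if nearest[candidate] <= min_dist:
            break
        kept.append(candidate)
        nearest = np.minimum(nearest, pose_distance(poses, poses[candidate]))
    return kept

# Hypothetical data: three distinct poses, each repeated over 10 nearly identical frames.
rng = np.random.default_rng(0)
base = rng.uniform(-1.0, 1.0, size=(3, 15, 3))
frames = np.repeat(base, 10, axis=0) + rng.normal(scale=0.001, size=(30, 15, 3))
kept = furthest_neighbor_subset(frames, min_dist=0.05)
```

By construction every pair of kept poses is more than `min_dist` apart, and every discarded frame lies within `min_dist` of some kept pose.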

25
**Motion capture data**

It was necessary to iterate the process of motion capture, sampling from our model, training the classifier, and testing joint prediction accuracy. Early experiments employed the CMU mocap database.

26
**Generating synthetic data**

Build a randomized rendering pipeline from which fully labeled training images can be sampled. Goals: realism and variety.

27
**Generating synthetic data**

First, the pipeline randomly samples a set of parameters. It then uses standard computer graphics techniques to render depth and body part images from texture-mapped 3D meshes. Autodesk MotionBuilder is used in the process. Slight random variation in height and weight gives extra coverage of body shapes. Other parameters are randomized as well.

29
**Body Part Inference and Joint Proposals**

Body part labeling; depth image features; randomized decision forests; joint position proposals.

30
**Body part labeling**

An intermediate body part representation, shown as color-coded parts. Some parts directly localize particular skeletal joints; others fill the gaps. This representation transforms the problem into one that can readily be solved by efficient classification algorithms.

31
**Body part labeling**

The parts are specified in a texture map.

32
**Body part labeling**

31 body parts: LU/RU/LW/RW head, neck, L/R shoulder, LU/RU/LW/RW arm, L/R elbow, L/R wrist, L/R hand, LU/RU/LW/RW torso, LU/RU/LW/RW leg, L/R knee, L/R ankle, L/R foot (L = left, R = right, U = upper, W = lower).

33
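**Depth image features**

At a given pixel $x$, the feature is a difference of two depth probes:

$$f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$$

where $d_I(x)$ is the depth at pixel $x$ in image $I$ and $\theta = (u, v)$ describes the offsets $u$ and $v$. Normalizing the offsets by $1/d_I(x)$ ensures the features are depth invariant.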

34
**Depth image features**

Individually these features provide only a weak signal, but in combination in a decision forest they are sufficient to accurately disambiguate all trained parts.

35
**Depth image features**

The design of these features was strongly motivated by their computational efficiency: no preprocessing is needed; each feature reads at most 3 image pixels and performs at most 5 arithmetic operations; and the features can be straightforwardly implemented on the GPU.
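A depth comparison feature of this kind can be sketched as follows: read the depth at the pixel, scale two offsets by its inverse, and subtract the two probed depths. That is three pixel reads and five arithmetic operations (two divisions, two additions, one subtraction), matching the budget quoted above. The names `depth_feature` and `probe_depth`, the (row, column) pixel convention, and the large constant returned for probes that fall outside the image are assumptions made for this example.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed large constant for probes that leave the image

def probe_depth(depth, pixel):
    """Read the depth at a (row, col) position, rounding to the nearest pixel."""
    row, col = int(round(pixel[0])), int(round(pixel[1]))
    if 0 <= row < depth.shape[0] and 0 <= col < depth.shape[1]:
        return depth[row, col]
    return BACKGROUND_DEPTH

def depth_feature(depth, x, u, v):
    """Difference of two depth probes whose offsets shrink as the pixel gets further away.

    Assumes x is a foreground pixel with strictly positive depth.
    """
    x = np.asarray(x, dtype=float)
    d = depth[int(x[0]), int(x[1])]
    return probe_depth(depth, x + np.asarray(u) / d) - probe_depth(depth, x + np.asarray(v) / d)

# Hypothetical 20 x 20 depth map: a surface at 2 m with a further surface at 4 m on the right.
depth = np.full((20, 20), 2.0)
depth[:, 12:] = 4.0
response = depth_feature(depth, x=(10, 10), u=(0, 8), v=(0, 0))   # probes columns 14 and 10
off_image = depth_feature(depth, x=(10, 10), u=(0, 100), v=(0, 0))
```

Because the offsets are divided by the depth at `x`, the same `u` and `v` cover a smaller pixel distance when the surface is further from the camera.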

36
**Randomized decision forests**

Randomized decision forests are fast and effective multi-class classifiers that can be implemented efficiently on the GPU.
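As a rough illustration of how such a forest classifies one pixel, the sketch below walks each tree from root to leaf by comparing a feature response with a threshold, then averages the class distributions stored at the leaves. The dictionary layout of the nodes, the key names, and the toy two-tree forest are hypothetical; the slides do not specify a data structure.

```python
import numpy as np

def classify_pixel(forest, feature_fn):
    """Average the leaf distributions reached in each tree.

    forest: list of root nodes. A split node is a dict with keys
            "theta", "threshold", "left", "right"; a leaf is {"dist": probabilities}.
    feature_fn: callable mapping theta to the feature response at the pixel.
    """
    dists = []
    for node in forest:
        # Descend until a leaf is reached.
        while "dist" not in node:
            go_left = feature_fn(node["theta"]) < node["threshold"]
            node = node["left"] if go_left else node["right"]
        dists.append(node["dist"])
    return np.mean(dists, axis=0)

# Hypothetical forest of two depth-1 trees over two classes.
forest = [
    {"theta": 0, "threshold": 0.5,
     "left": {"dist": np.array([0.9, 0.1])}, "right": {"dist": np.array([0.2, 0.8])}},
    {"theta": 1, "threshold": 0.0,
     "left": {"dist": np.array([0.6, 0.4])}, "right": {"dist": np.array([0.1, 0.9])}},
]
responses = {0: 0.3, 1: 2.0}  # stand-in feature responses at one pixel
probs = classify_pixel(forest, responses.get)
```

Each pixel is classified independently of its neighbors, which is what makes the per-pixel evaluation easy to parallelize.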

39
**Joint position proposals**

Generate reliable proposals for the positions of 3D skeletal joints. These proposals are the final output of our algorithm and can be used by a tracking algorithm to self-initialize and recover from failure.

40
**Joint position proposals**

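A local mode-finding approach based on mean shift with a weighted Gaussian kernel. The density estimator for body part $c$ is

$$f_c(\hat{x}) \propto \sum_{i=1}^{N} w_{ic} \exp\left(-\left\lVert \frac{\hat{x} - \hat{x}_i}{b_c} \right\rVert^2\right)$$

where $\hat{x}$ is a coordinate in 3D world space, $N$ is the number of image pixels, $w_{ic}$ is a pixel weighting, $\hat{x}_i$ is the reprojection of image pixel $x_i$ into world space given depth $d_I(x_i)$, and $b_c$ is a learned per-part bandwidth.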
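The reprojection of a pixel into world space can be illustrated with a standard pinhole camera model. This is an assumption made for the example: the slides do not state the camera model, and the focal length and principal point used below are hypothetical values, not calibration data from the talk.

```python
import numpy as np

def reproject(pixel, depth, focal_px, principal_point):
    """Back-project a (col, row) pixel with known depth into 3D camera coordinates.

    Pinhole model: X = (col - cx) * Z / f, Y = (row - cy) * Z / f, Z = depth.
    """
    col, row = pixel
    cx, cy = principal_point
    return np.array([
        (col - cx) * depth / focal_px,
        (row - cy) * depth / focal_px,
        depth,
    ])

# Hypothetical intrinsics for a 640x480 depth image.
FOCAL_PX = 575.0
PRINCIPAL_POINT = (320.0, 240.0)

center_point = reproject((320.0, 240.0), 2.0, FOCAL_PX, PRINCIPAL_POINT)
offset_point = reproject((377.5, 240.0), 2.0, FOCAL_PX, PRINCIPAL_POINT)
```

A pixel at the principal point maps onto the optical axis, and a fixed pixel offset corresponds to a larger metric offset as depth grows.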

41
**Non-Parametric Density Estimation**

Assumption: the data points are sampled from an underlying PDF. Data point density implies PDF value. (Figure: the assumed underlying PDF and the real data samples.)

44
**Parametric Density Estimation**

Assumption: the data points are sampled from an underlying PDF. Estimate the assumed underlying PDF from the real data samples. (Figure: the assumed underlying PDF and the real data samples.)

45
**Joint position proposals**

The pixel weighting $w_{ic}$ considers both the inferred body part probability at the pixel and the world surface area of the pixel:

$$w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$$

46
**Joint position proposals**

The detected modes lie on the surface of the body. Each mode is therefore pushed back into the scene by a learned z offset to produce a final joint position proposal. Parameters, set by grid search on a set of 5000 images: bandwidth $b_c$ = 0.065 m, threshold $\lambda_c$ = 0.14, and the z offset (in m).
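The pieces above (pixel weights from part probability and squared depth, a Gaussian kernel with per-part bandwidth, and the push-back along z) can be combined in a short sketch. It runs weighted mean shift from a single starting point. The function name, the fixed iteration count, the synthetic points, and in particular the z offset value used in the example are assumptions for illustration only.

```python
import numpy as np

def joint_proposal(points, part_prob, depth, bandwidth, z_offset, start, iters=50):
    """Weighted Gaussian mean shift for one body part, then a push-back along z.

    points:    N x 3 world-space positions of the pixels
    part_prob: N inferred probabilities of this body part
    depth:     N pixel depths, used as depth squared for the surface-area term
    """
    weights = part_prob * depth ** 2
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Gaussian kernel response of every pixel at the current estimate.
        kernel = weights * np.exp(-np.sum(((points - x) / bandwidth) ** 2, axis=1))
        # Mean shift update: kernel-weighted mean of the points.
        x = kernel @ points / kernel.sum()
    # The mode lies on the body surface; push it back into the scene.
    return x + np.array([0.0, 0.0, z_offset])

# Hypothetical data: a confident patch of surface points plus low-probability clutter.
rng = np.random.default_rng(0)
surface = rng.normal([0.1, 0.2, 2.0], 0.02, size=(200, 3))
clutter = rng.uniform(-1.0, 1.0, size=(20, 3)) + np.array([0.0, 0.0, 2.0])
points = np.vstack([surface, clutter])
part_prob = np.concatenate([np.full(200, 0.9), np.full(20, 0.05)])
proposal = joint_proposal(points, part_prob, depth=points[:, 2],
                          bandwidth=0.065, z_offset=0.04,  # z_offset is a made-up value
                          start=surface[0])
```

In a full system one would run this from many starting pixels per part and keep the distinct modes; the single start here keeps the example short.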

48
**Experiments**

Further results are provided in the supplementary material. Training parameters: 3 trees, 20 deep; 300k training images per tree; 2000 training example pixels per image; 2000 candidate features θ; 50 candidate thresholds ζ per feature.

49
**Experiments: test data**

We use challenging synthetic and real depth images to evaluate our approach. Synthetic test set: 5000 synthesized depth images. Real test set: 8808 frames of real depth images over 15 different subjects, with 7 upper body joint positions.

50
**Experiments: error metrics**

We quantify both classification and joint prediction accuracy. Classification accuracy: the average of the diagonal of the confusion matrix between the ground truth part label and the most likely inferred part label. Joint prediction accuracy: generate recall-precision curves as a function of confidence threshold, and quantify accuracy as average precision per joint.
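The classification metric can be sketched as the mean of the diagonal of a row-normalized confusion matrix, so that every body part counts equally regardless of how many pixels it covers. The function name and the toy labels below are assumptions for the example.

```python
import numpy as np

def average_per_class_accuracy(true_labels, predicted_labels, num_classes):
    """Mean of the diagonal of the row-normalized confusion matrix."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    confusion = np.zeros((num_classes, num_classes))
    # Rows index the ground truth part, columns the most likely inferred part.
    np.add.at(confusion, (true_labels, predicted_labels), 1)
    row_totals = confusion.sum(axis=1)
    present = row_totals > 0  # ignore classes absent from the ground truth
    per_class = np.diag(confusion)[present] / row_totals[present]
    return per_class.mean()

# Hypothetical pixel labels: a large class 0 and a small class 1.
true_labels = [0, 0, 0, 0, 0, 0, 1, 1]
predicted = [0, 0, 0, 0, 0, 0, 1, 0]
accuracy = average_per_class_accuracy(true_labels, predicted, num_classes=2)
```

Here class 0 is classified perfectly and class 1 is right half the time, so the average per-class accuracy is 0.75 even though 7 of the 8 pixels are correct.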

51
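**Experiments: error metrics**

The first joint proposal within D meters of the ground truth position counts as a true positive; other proposals, including further ones within D meters, count as false positives. This penalizes multiple spurious detections near the correct position, which might slow a downstream tracking algorithm. D = 0.1 m in the results below, close to the accuracy of the real test data.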
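The matching rule for one joint in one frame can be sketched as below: proposals are visited in order of decreasing confidence, the first one within D of the ground truth is a true positive, and every other proposal is a false positive. Visiting in confidence order is an assumption of this sketch, as are the function name and the toy proposals.

```python
import numpy as np

def label_proposals(proposals, confidences, ground_truth, max_dist=0.1):
    """Return a boolean array marking which proposals count as true positives."""
    proposals = np.asarray(proposals, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    order = np.argsort(-np.asarray(confidences, dtype=float))
    labels = np.zeros(len(proposals), dtype=bool)
    matched = False
    for idx in order:
        close = np.linalg.norm(proposals[idx] - ground_truth) <= max_dist
        # Only the first close proposal is rewarded; later ones are spurious.
        if close and not matched:
            labels[idx] = True
            matched = True
    return labels

# Hypothetical proposals for one joint: two near the ground truth, one far away.
proposals = [[0.00, 0.0, 2.0], [0.05, 0.0, 2.0], [0.50, 0.0, 2.0]]
confidences = [0.9, 0.8, 0.3]
labels = label_proposals(proposals, confidences, ground_truth=[0.02, 0.0, 2.0])
precision = labels.sum() / len(labels)
```

The second proposal is within 0.1 m of the ground truth but still counts as a false positive, which is how duplicate detections are penalized.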

57
**Experiments**

Comparison with: Real time motion capture using a single time-of-flight camera [CVPR 2010].

58
**Discussion**

Accurate proposals for the 3D locations of body joints, computed in super real-time from single depth images. Body part recognition serves as an intermediate representation. A highly varied synthetic training set makes it possible to train very deep decision forests using depth-invariant features without overfitting.

59
**Future work**

Study the variability in the source mocap data and the generative model underlying the synthesis pipeline. Investigate whether a similarly efficient approach can directly regress joint positions. Remove ambiguities in local pose.

60
Thank you
