1 Real-Time Human Pose Recognition in Parts from Single Depth Images
COMP Human Pose Recognition. Presenter: Jae Sung Park

2 Introduction · Data Acquisition · Body Part Inference · Joint Position Proposal · Results
In today's talk, I am going to start with an introduction, then go over the details of the paper, and finally I will show some results.

3 Poselets [Bourdev & Malik, 2009]
Motivation Human body tracking has many applications gaming, telepresence, security, … RGB or intensity camera? Higher computational cost Human joint tracking can be used in many applications. Gaming is one of these applications like this. And the authors mention that using RGB or intensity camera is not appropriate For real time applications Because of higher computational cost. One example of human pose recognition using RGB camera Is an algorithm called Poselets, Poselets [Bourdev & Malik, 2009] use RGB camera Kinect game

4 Motivation: Kinect
The Kinect sensor gets color & depth images at a frame rate of ~30 Hz. (Figure: the color sensor and depth sensor produce a color image and a depth image, which combine into a point cloud carrying position and color.) This is how the Kinect sensor gets color and depth images. The frame rate is about 30 frames per second, designed for real-time applications.

5 Goal: Body joint position proposal from single depth images
The goal of this paper is to find 3D joint positions from a single depth image. For training, depth images are synthesized or captured with a real Kinect, body parts are labelled by per-pixel operations, and finally the 3D positions are proposed using a mean-shift algorithm.

6 Introduction · Data Acquisition · Body Part Inference · Joint Position Proposal · Results
The first step of this algorithm is data acquisition.

7 Need for Data Synthesis: Lack of real-world data
There is great variation in clothing, hair, body shape, and camera position. Generally, it is difficult to build a huge dataset from a real Kinect: it would be a really tedious task for a human to label each body part in every image. Also, clothing, hair style, body shape, and camera position all produce different-looking depth images, so to cover every aspect the dataset must be large.

8 Generation of Depth Images
(Pipeline figure: CMU Mocap → retarget to character models → depth rendering, plus color rendering via texture mapping, from varied camera positions.) So they synthesized depth images and body-part labels from 3D character models and motions. CMU motion capture data (CMU mocap) is used for motion generation; the motions are defined by the joint angles of a human. They used various character models with various clothing and hair styles, then retargeted the mocap data onto them. They also used different virtual camera locations to get depth images and body-part-labelled color images.

9 Introduction · Data Acquisition · Body Part Inference · Joint Position Proposal · Results
Now I am going to talk about the body part inference method.

10 Body Part Labeling
31 body parts: 5 for head and neck, 16 for the upper body, 10 for the lower body. The labelling is variable depending on the application, and should be sufficiently small but not too large. They used 31 different body-part labels. The labelling scheme can differ depending on the application. The parts should be sufficiently small in order to localize body joints, but not too numerous, because a large number of body-part labels would waste the capacity of the classifier.

11 Depth Image Features
$$f_\theta(I, \mathbf{x}) = d_I\left(\mathbf{x} + \frac{\mathbf{u}}{d_I(\mathbf{x})}\right) - d_I\left(\mathbf{x} + \frac{\mathbf{v}}{d_I(\mathbf{x})}\right)$$
where $f_\theta$ is the feature function, $d_I(\mathbf{x})$ is the depth value at pixel $\mathbf{x}$ in image $I$, $\theta = (\mathbf{u}, \mathbf{v})$ are two offsets in pixel space, and $1/d_I(\mathbf{x})$ is a normalization factor. They defined the feature by this equation: $d_I(\mathbf{x})$ is the depth value in the depth image at 2D pixel location $\mathbf{x}$, $\mathbf{u}$ and $\mathbf{v}$ are 2D offset vectors in pixel space, and $1/d_I(\mathbf{x})$ is used for normalization, as sketched below.
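To make the arithmetic concrete, here is a minimal NumPy sketch of the feature (my own illustration, not the authors' code); I assume `depth` is a 2D array of depth values in meters and, as in the paper, probes that land off the image read as a large constant:

```python
import numpy as np

BACKGROUND = 1e6  # large constant depth for probes off the image/body

def probe(depth, px, py):
    """Depth at pixel (px, py); out-of-bounds probes read as background."""
    h, w = depth.shape
    if 0 <= py < h and 0 <= px < w:
        return depth[py, px]
    return BACKGROUND

def feature(depth, px, py, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x)).

    u and v are (dx, dy) offsets in pixel space; dividing them by the
    depth at x shrinks the probe distance for far-away (smaller-looking)
    people, which is what makes the feature depth invariant.
    """
    d = depth[py, px]
    f1 = probe(depth, px + int(u[0] / d), py + int(u[1] / d))
    f2 = probe(depth, px + int(v[0] / d), py + int(v[1] / d))
    return f1 - f2
```

Note the cost: three pixel reads and a handful of arithmetic operations, matching the efficiency claim made on the properties slides.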

12 Depth Image Features: Examples
For example, the yellow cross denotes $\mathbf{x}$, and the two arrows for $\theta_1$ and $\theta_2$ denote the offset vectors $\mathbf{u}$ and $\mathbf{v}$. The feature measures the depth difference between the two circled probe points. The left figure shows large responses because the probes measure the depth difference between a human pixel and the background; the right figure shows small responses because the difference between two human-body pixels, or between two background pixels, is small.

13 Depth Image Features: Properties
1. Depth invariant: the normalization factor $1/d_I(\mathbf{x})$ ensures depth invariance. One property of this feature is that it is depth invariant. To achieve this, the offsets $\mathbf{u}$ and $\mathbf{v}$ are multiplied by the normalization factor; as you can see in the figure, the offset length changes with the depth of the person.

14 Depth Image Features: Properties
2. Translation invariant: it measures a depth difference and uses two offset vectors. 3. Computationally efficient: 3 image pixel reads and 5 arithmetic operations, with direct implementation on the GPU. The feature is translation invariant because, first, it measures a depth difference and, second, it uses two offset vectors relative to $\mathbf{x}$. Most importantly, the feature is computationally efficient: it needs only 3 image pixel fetches and 5 arithmetic operations, and can be implemented directly on the GPU.

15 Randomized Decision Forests
Each internal node has a feature and a threshold; each leaf node has a distribution over body-part labels. They used randomized decision forests for classification. At each internal node, evaluate the feature $f_\theta$; if the value is less than the threshold $\tau$, move to the left child, otherwise move to the right child, and repeat recursively until reaching a leaf node. The leaf node stores the probability distribution $P(c \mid I, \mathbf{x})$ over body-part labels $c$ given pixel $\mathbf{x}$.
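As an illustration of this structure, a hypothetical Python layout (reusing the `feature` sketch above; the authors' GPU implementation is different) could be:

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    dist: dict    # body-part label c -> probability P(c | I, x)

@dataclass
class Split:
    u: tuple      # offset u of theta = (u, v)
    v: tuple      # offset v
    tau: float    # threshold
    left: object  # subtree taken when f_theta(I, x) < tau
    right: object # subtree taken otherwise

def classify_pixel(node, depth, px, py):
    """Descend one tree until a leaf; return its label distribution."""
    while isinstance(node, Split):
        if feature(depth, px, py, node.u, node.v) < node.tau:
            node = node.left
        else:
            node = node.right
    return node.dist
```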

16 Randomized Decision Forests: Learning
1. Pick a random subset of pixels. 2. Pick a random subset of splitting candidates $\phi = (\theta, \tau)$, where $\theta = (\mathbf{u}, \mathbf{v})$ and $\tau$ is a threshold. 3. Partition the pixels into left/right subsets: $Q_l(\phi) = \{(I, \mathbf{x}) \mid f_\theta(I, \mathbf{x}) < \tau\}$ and $Q_r(\phi) = Q \setminus Q_l(\phi)$. The learning process is as follows: first, pick a random subset of pixel locations; second, pick random candidate pairs of $\theta$ and $\tau$; third, partition the pixels into left and right subsets.

17 Randomized Decision Forests: Learning
4. Compute the $\phi^* = \operatorname{argmax}_\phi G(\phi)$ giving the largest information gain, where $G(\phi) = H(Q) - \sum_{s \in \{l, r\}} \frac{|Q_s(\phi)|}{|Q|} H(Q_s(\phi))$ and $H(Q)$ is the Shannon entropy of the body-part label distribution in $Q$. 5. Recurse for the left/right subsets. 6. Repeat steps 1–5 to generate several trees. The next step is to find the best pair of $\theta$ and $\tau$, the one that gives the largest information gain. These steps are repeated until a terminating condition is met, for example reaching the maximum depth; finally, multiple trees are generated. A sketch of this split selection follows below.
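Here is a toy sketch of steps 2–4 (my own simplification: I assume each candidate's feature values are precomputed as a NumPy array over the sampled pixels, and `labels` is the matching array of ground-truth body parts; the actual training is distributed over a cluster):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Q) of the empirical body-part label distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain(values, labels, tau):
    """G(phi) = H(Q) - sum over s in {l, r} of |Q_s|/|Q| * H(Q_s),
    where the left subset holds the pixels with f_theta < tau."""
    mask = values < tau
    left, right = labels[mask], labels[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def best_split(candidates, labels):
    """candidates: list of ((theta, tau), values) pairs, where values[i]
    is f_theta evaluated at the i-th sampled pixel.
    Returns the (theta, tau) with the largest information gain."""
    best_phi, best_g = None, -1.0
    for (theta, tau), values in candidates:
        g = gain(values, labels, tau)
        if g > best_g:
            best_phi, best_g = (theta, tau), g
    return best_phi
```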

18 Randomized Decision Forests: Inference
Starting from each root, move to the left or right node according to the feature and threshold until reaching a leaf node, then average the distributions over all trees. Body-part label inference proceeds as described: for each tree, evaluate the feature value and decide which direction to move, then average the probability distributions over all trees.
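Continuing the hypothetical sketch above, averaging over the forest is then just:

```python
def classify_forest(trees, depth, px, py):
    """Average the leaf distributions over all trees; the predicted
    body part is the label with the highest averaged probability."""
    avg = {}
    for tree in trees:
        for label, p in classify_pixel(tree, depth, px, py).items():
            avg[label] = avg.get(label, 0.0) + p / len(trees)
    return avg
```

The final per-pixel prediction is `max(avg, key=avg.get)`.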

19 Inference Results: Synthetic data / Real data
These are some inference results. The colors show, at each pixel, the body-part label with the highest probability.

20 Inference Results

21 Introduction · Data Acquisition · Body Part Inference · Joint Position Proposal · Results
The final step is joint position proposal.

22 Joint Position Proposal
“Local mode-finding approach based on mean shift with weighted Gaussian kernel.” Mean shift is an algorithm for locating the maxima of a density function. The per-part density is
$$f_c(\hat{\mathbf{x}}) \propto \sum_{i=1}^{N} w_{ic} \exp\left(-\left\lVert \frac{\hat{\mathbf{x}} - \hat{\mathbf{x}}_i}{b_c} \right\rVert^2\right)$$
where $\hat{\mathbf{x}}_i$ is the 3D reprojection of image pixel $i$, $w_{ic}$ is a per-pixel weight for body part $c$, and $b_c$ is a pre-learned per-part bandwidth.
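Below is a minimal NumPy sketch of one weighted mean-shift ascent under that density (my own illustration, not the authors' implementation; I assume `points` holds the 3D pixel reprojections, `weights` the per-pixel weights, and a single starting point, whereas the paper starts from many high-probability pixels and ranks the resulting modes by confidence):

```python
import numpy as np

def mean_shift_mode(points, weights, b_c, start, iters=30, tol=1e-5):
    """Climb the weighted Gaussian-kernel density f_c to a local mode.

    points:  (N, 3) array of 3D world-space pixel reprojections
    weights: (N,) per-pixel weights w_ic for body part c
    b_c:     pre-learned per-part bandwidth
    """
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        # Gaussian kernel responses around the current estimate
        k = weights * np.exp(-np.sum(((points - x) / b_c) ** 2, axis=1))
        # Mean-shift update: kernel-weighted mean of the points
        x_new = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x
```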

23 Joint Position Proposal
The density modes lie on the surface of the body, so each mode is pushed back in the z direction by a pre-learned per-part parameter $\zeta_c$ to produce the final joint position (e.g., $\zeta_c$ = 0.039 m, with bandwidth $b_c$ = 0.065 m).
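As a hypothetical one-step helper (assuming the mode is expressed in camera coordinates with z pointing away from the camera):

```python
import numpy as np

def joint_from_mode(mode, zeta_c):
    """The mode sits on the body surface; push it along +z, into the
    body, by the learned per-part offset zeta_c (e.g. ~0.039 m)."""
    joint = np.asarray(mode, dtype=float).copy()
    joint[2] += zeta_c
    return joint
```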

24 Introduction · Data Acquisition · Body Part Inference · Joint Position Proposal · Results

25 Speed
Learning: 1 day on a 1000-core cluster, training 3 trees of depth 20 from 1 million images. Inference: under 5 ms per frame on the Xbox 360 GPU.

26 Depth of trees

27 Maximum Probe Offset
Maximum probe offset = the maximum value allowed for the offsets $\mathbf{u}$ and $\mathbf{v}$.

28 Comparison with Other Methods
Comparison with [Ganapathi et al., 2010]

