Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests. Tsz-Ho Yu, Danhang Tang, T-K Kim. Sponsored by.


1 Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests
Tsz-Ho Yu, Danhang Tang, T-K Kim
Sponsored by


3 Motivation
» Multiple cameras with inverse kinematics [Bissacco et al. CVPR2007] [Yao et al. IJCV2012] [Sigal IJCV2011]
» Learning-based (regression) [Navaratnam et al. BMVC2006] [Andriluka et al. CVPR2010]
» Specialized hardware (e.g. structured light sensor, TOF camera) [Shotton et al. CVPR’11] [Baak et al. ICCV2011] [Ye et al. CVPR2011] [Sun et al. CVPR2012]

4 Motivation
Discriminative approaches (RF) have achieved great success in human body pose estimation.
» Efficient: real-time
» Accurate: works on a per-frame basis and does not rely on tracking
» Require a large dataset to cover many poses
» Train on synthetic data, test on real data
» Do not exploit kinematic constraints
Examples: Shotton et al. CVPR’11, Girshick et al. ICCV’11, Sun et al. CVPR’12

5 Challenges for Hand?
» Labeling is difficult and tedious!
» Viewpoint changes and self occlusions
» Discrepancy between synthetic and real data is larger than for the human body

6 Our method
» Hierarchical Hybrid Forest
» Transductive Learning
» Semi-supervised Learning
Challenges:
» Labeling is difficult and tedious!
» Viewpoint changes and self occlusions
» Discrepancy between synthetic and real data is larger than for the human body

7 Existing Approaches
Generative approaches
» Model-fitting
» No training is required
» Slow
» Needs initialisation and tracking
» Oikonomidis et al. ICCV2011, De La Gorce et al. PAMI2010, Hamer et al. ICCV2009
» Motion capture: Ballan et al. ECCV 2012
Discriminative approaches
» Similar solutions to human body pose estimation
» Performance on real data remains challenging
» Wang et al. SIGGRAPH2009, Stenger et al. IVC 2007, Keskin et al. ECCV2012, Xu and Cheng ICCV 2013

8 Our method
» Hierarchical Hybrid Forest
Challenges:
» Labeling is difficult and tedious!
» Viewpoint changes and self occlusions
» Discrepancy between synthetic and real data is larger than for the human body

9 Hierarchical Hybrid Forest
STR forest:
» $Q_a$: viewpoint classification quality (information gain)
Viewpoint Classification: $Q_a$
$Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$

10 Hierarchical Hybrid Forest
STR forest:
» $Q_a$: viewpoint classification quality (information gain)
» $Q_p$: joint label classification quality (information gain)
Viewpoint Classification: $Q_a$
Finger Joint Classification: $Q_p$
$Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$

11 Hierarchical Hybrid Forest
STR forest:
» $Q_a$: viewpoint classification quality (information gain)
» $Q_p$: joint label classification quality (information gain)
» $Q_v$: compactness of voting vectors (determinant of covariance trace)
Viewpoint Classification: $Q_a$
Finger Joint Classification: $Q_p$
Pose Regression: $Q_v$
$Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$

12 Hierarchical Hybrid Forest
STR forest:
» $Q_a$: viewpoint classification quality (information gain)
» $Q_p$: joint label classification quality (information gain)
» $Q_v$: compactness of voting vectors (determinant of covariance trace)
» $(\alpha, \beta)$: margin measures of viewpoint labels and joint labels
Viewpoint Classification: $Q_a$
Finger Joint Classification: $Q_p$
Pose Regression: $Q_v$
$Q_{apv} = \alpha Q_a + (1-\alpha)\beta Q_p + (1-\alpha)(1-\beta) Q_v$
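The combined quality function on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the compactness term is assumed to be the inverse of one plus the size-weighted trace of the vote covariance in each child, and alpha and beta are passed in as plain numbers rather than derived from label margins.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of discrete labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, go_left):
    """Entropy reduction when `labels` is split by the boolean mask `go_left`."""
    n = len(labels)
    left, right = labels[go_left], labels[~go_left]
    children = (len(left) * entropy(left) + len(right) * entropy(right)) / n
    return entropy(labels) - children

def vote_spread(votes):
    """Trace of the covariance of the vote vectors (0 for fewer than 2 samples)."""
    if len(votes) < 2:
        return 0.0
    return float(np.trace(np.atleast_2d(np.cov(votes, rowvar=False))))

def compactness(votes, go_left):
    """Assumed form of Q_v: approaches 1 as the votes in both children get tighter."""
    n = len(votes)
    left, right = votes[go_left], votes[~go_left]
    weighted = (len(left) * vote_spread(left) + len(right) * vote_spread(right)) / n
    return 1.0 / (1.0 + weighted)

def q_apv(view, joint, votes, go_left, alpha, beta):
    """Q_apv = alpha*Q_a + (1-alpha)*beta*Q_p + (1-alpha)*(1-beta)*Q_v."""
    q_a = information_gain(view, go_left)    # viewpoint classification
    q_p = information_gain(joint, go_left)   # finger joint classification
    q_v = compactness(votes, go_left)        # pose regression
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v
```

With alpha close to 1 the split is scored almost purely on viewpoint; as alpha falls and beta stays high, joint classification dominates; with both low, the regression term takes over. That ordering matches the viewpoint, joint, pose hierarchy the slide describes.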

13 Our method
» Transductive Learning
» Semi-supervised Learning
Challenges:
» Labeling is difficult and tedious!
» Viewpoint changes and self occlusions
» Discrepancy between synthetic and real data is larger than for the human body

14 Transductive learning
Training data D = {R_l, R_u, S}:
» Realistic data R (target space): captured from a Primesense depth sensor. A small part of R, R_l, is labeled manually; the rest forms the unlabeled set R_u.
» Synthetic data S (source space): generated from an articulated hand model. All labeled.

15 Transductive learning
Training data D = {R_l, R_u, S}:
» Realistic data R (target space): captured from Kinect. A small part of R, R_l, is labeled manually; the rest forms the unlabeled set R_u.
» Synthetic data S (source space): generated from an articulated hand model, where |S| >> |R|.

16 Transductive learning
Training data D = {R_l, R_u, S}:
» Similar data points in R_l and S are paired; a split function that separates a pair is penalized.
Source space: synthetic data S. Target space: realistic data R.
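One way to picture the pairing penalty is as the fraction of real-synthetic pairs that a candidate split keeps on the same side. The sketch below assumes exactly that; the function name, the index-pair representation, and the choice to return 1.0 when there are no pairs are illustrative assumptions, and the slides do not state the exact form of the term.

```python
import numpy as np

def transductive_quality(pairs, go_left):
    """Fraction of associated (real, synthetic) pairs that the split keeps together.

    pairs   : (m, 2) integer array; each row holds the index of a labeled real
              sample and the index of the synthetic sample paired with it.
    go_left : boolean array over all samples; True means the sample goes left.
    """
    pairs = np.asarray(pairs)
    if len(pairs) == 0:
        return 1.0  # nothing to separate, so no penalty (assumed convention)
    same_side = go_left[pairs[:, 0]] == go_left[pairs[:, 1]]
    return float(same_side.mean())
```

A split that sends every pair to the same child scores 1.0, and each separated pair lowers the score, which is the penalty behaviour the slide describes.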

17 Semi-supervised learning
Training data D = {R_l, R_u, S}:
» Similar data points in R_l and S are paired; a split function that separates a pair is penalized.
» A semi-supervised term makes use of unlabeled real data when evaluating a split function.
Source space: synthetic data S. Target space: realistic data R.
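The slides do not say what the semi-supervised term looks like. As a rough illustration only, the sketch below assumes a label-free score that rewards splits whose children are tight in feature space, so unlabeled real samples can contribute without annotations. The function name and the variance-based formula are assumptions, not the authors' definition.

```python
import numpy as np

def unsupervised_quality(features, go_left):
    """Assumed label-free term: higher when each child is tight in feature space.

    features : (n, d) array of appearance features for real samples,
               labeled or unlabeled.
    go_left  : boolean array of length n; True means the sample goes left.
    """
    def spread(x):
        # Sum of per-dimension variances; 0 for fewer than 2 samples
        return float(x.var(axis=0).sum()) if len(x) > 1 else 0.0

    n = len(features)
    left, right = features[go_left], features[~go_left]
    weighted = (len(left) * spread(left) + len(right) * spread(right)) / n
    return 1.0 / (1.0 + weighted)
```

A split that separates two well-formed clusters of unlabeled samples scores close to 1, while a split that mixes them scores lower. How such a term would be weighted against the pairing penalty is not stated in the slides.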

18 Kinematic refinement

19 Experiment settings
Evaluation data: three different testing sequences
1. Sequence A: single viewpoint (450 frames)
2. Sequence B: multiple viewpoints, with slow hand movements (1000 frames)
3. Sequence C: multiple viewpoints, with fast hand movements (240 frames)
Training data:
» Synthetic data (337.5K images)
» Real data (81K images, <1.2K labeled)


21 Self comparison experiment
Self comparison (Sequence A):
» This graph shows the joint classification accuracy on Sequence A.
» Realistic and synthetic baselines produced similar accuracies.
» Using the transductive term is better than simply combining real and synthetic data.
» All terms together achieve the best results.

22 Multi-view experiments
Multi-view experiment (Sequence C):

23 Conclusion
A 3D hand pose estimation algorithm
STR forest: semi-supervised and transductive regression forest
A data-driven refinement scheme to rectify the shortcomings of the STR forest
» Real-time (25Hz on an Intel i7 PC without CPU/GPU optimisation)
» Works better than state-of-the-art methods
» Makes use of unlabelled data and requires less manual annotation
» More accurate in real scenarios

24 Video demo

25 Thank you!

