Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions. Taipeng Tian, Rui Li and Stan Sclaroff, Computer Science Dept., Boston University.

Presentation transcript:

1 Articulated Pose Estimation in a Learned Smooth Space of Feasible Solutions Taipeng Tian, Rui Li and Stan Sclaroff Computer Science Dept. Boston University

2 Introduction. Motivating application: gesture recognition with a fixed gesture lexicon. For example: aircraft signaler hand gestures, traffic controller hand signals, basketball referee hand signals.

3 Pose Estimation Problem Definition. Input (observation): silhouette (Alt moments). Output: 2D projected marker positions.

4 Related Work: Pose Estimation from a Single Image. Geometry based: Taylor, CVIU ’01; Barron & Kakadiaris, IVC ’01; Parameswaran & Chellappa, CVPR ’04. Learning based: Rosales & Sclaroff, HUMO ’00; Agarwal & Triggs, CVPR ’04. Others: Lee & Cohen, CVPR ’04; Shakhnarovich, Viola & Darrell, ICCV ’03; Mori, Ren, Efros & Malik, CVPR ’04; many more.

5 Idea 1: Learning Mappings (image features → pose). Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]; Relevance Vector Regression [Agarwal and Triggs CVPR ’04].

6 Idea 1: Learning Mappings (image features → pose). Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]; Relevance Vector Regression [Agarwal and Triggs CVPR ’04].

7 Idea 2: Exploring the Solution Space. Simulated Annealing [Deutscher et al. CVPR ’00]; Markov Chain Monte Carlo [Lee and Cohen CVPR ’04]; etc.

8 Idea 2: Exploring the Solution Space. Simulated Annealing [Deutscher et al. CVPR ’00]; Markov Chain Monte Carlo [Lee and Cohen CVPR ’04]; etc. These methods use an accurate body model, typically with many degrees of freedom (DOF), and explore the pose space for a solution consistent with the observations. The search is difficult at high DOF and computationally intensive.

9 Key Observations. We have a constrained set of poses, so it is not necessary to explore the full parameter space. Combine the two ideas: learn mappings, and explore a constrained space (i.e. a learned model of body poses). Example lexicons: aircraft signaler hand gestures, traffic controller hand signals, basketball referee hand signals.

10 Overview of Framework. Learning phase (two steps): learn the rendering function Φ(·), and learn a model of human body poses (Y: training data, X: latent space). Pose inference phase: input silhouette → output pose.

11 Learning a Model of Human Poses. The Gaussian Process Latent Variable Model (GPLVM) [Neil Lawrence NIPS ’04] is used. The GPLVM was originally used for visualizing high-dimensional data. Grochow et al. (SIGGRAPH ’04) use it to solve the inverse kinematics problem for human motion animation. Here we use it for automated articulated body pose inference.

12 Gaussian Process Latent Variable Model (GPLVM) Overview. [Figure: a probabilistic mapping between the higher-dimensional data space and the lower-dimensional latent space.]

13 GPLVM Training: Learning a Model of Body Poses. Given: a training set of 2D projected marker positions {y_i} (each y_i is D-dimensional). Goal: learn the parameters, namely the latent variable value corresponding to each training data point and the variables related to the kernel.

14 Kernel Function. Also known as the covariance function. Measures the similarity of the latent variables x and x'. For a data set of size N, we form an N-by-N kernel matrix K, in which K_{i,j} = k(x_i, x_j).
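The kernel itself appears only as an image on the slide. As a minimal sketch, assuming the RBF-plus-noise kernel commonly used with the GPLVM, the kernel matrix could be formed as follows. The parameter names `alpha`, `gamma` and `beta` are assumptions, chosen to match the three roles listed on the backup kernel slide (overall correlation, spread of the function, noise in the prediction).

```python
import numpy as np

def rbf_kernel_matrix(X, alpha=1.0, gamma=1.0, beta=100.0):
    """N-by-N kernel matrix with K[i, j] = k(x_i, x_j).

    Assumed kernel form (not stated in the transcript):
        k(x, x') = alpha * exp(-gamma / 2 * ||x - x'||^2) + delta(x, x') / beta
    """
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance between every pair of latent points.
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = alpha * np.exp(-0.5 * gamma * sq_dist)
    # The noise term only contributes on the diagonal.
    K += np.eye(X.shape[0]) / beta
    return K

# Three hypothetical 2-D latent points; the first two are close together.
X_latent = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
K = rbf_kernel_matrix(X_latent)
```

Nearby latent points get a kernel value close to `alpha`, while distant points get a value close to zero, which is what "similarity of x and x'" means here.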

15 GPLVM Training: Learning a Model of Body Poses. For a single dimension, the likelihood of y given the Gaussian Process (GP) model parameters is: [equation]. The joint likelihood for D dimensions is: [equation].
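The two likelihood equations were images and are not in the transcript. The standard zero-mean GP forms are written out below as an assumption about what the slide showed. Here Y is the N-by-D matrix of centred training poses, y_d is its d-th column, K is the kernel matrix over the latent points X, and θ stands for the kernel parameters.

```latex
% Single output dimension d:
p(\mathbf{y}_d \mid X, \theta)
  = \frac{1}{(2\pi)^{N/2}\,|K|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}\,\mathbf{y}_d^{\top} K^{-1} \mathbf{y}_d\right)

% Joint likelihood, assuming the D dimensions are independent given X:
p(Y \mid X, \theta)
  = \prod_{d=1}^{D} p(\mathbf{y}_d \mid X, \theta)
  = \frac{1}{(2\pi)^{ND/2}\,|K|^{D/2}}
    \exp\!\left(-\tfrac{1}{2}\,\mathrm{tr}\!\left(K^{-1} Y Y^{\top}\right)\right)
```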

16 To learn the GPLVM from the training set {y_i}, we place priors on the parameters and maximize the resulting posterior: [equation]. Equivalently, we minimize its negative log: [equation].

17 To learn the GPLVM from the training set {y_i}, we maximize the following posterior: [equation], i.e. minimize its negative log. This is computationally intensive, so a subset of the training poses is chosen to compute the kernel matrix. This subset of poses is called the active set.
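The posterior and its negative log are missing from the transcript. The sketch below evaluates one common version of the objective, under stated assumptions: an RBF-plus-noise kernel, a unit Gaussian prior on each latent point, and a prior proportional to 1/(alpha·gamma·beta) on the kernel parameters. The function and parameter names are hypothetical, and the code uses the full training set instead of an active set.

```python
import numpy as np

def rbf_kernel_matrix(X, alpha, gamma, beta):
    # Assumed kernel: alpha * exp(-gamma/2 * ||x - x'||^2) plus diagonal noise 1/beta.
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq_dist) + np.eye(len(X)) / beta

def gplvm_neg_log_posterior(X, Y, alpha, gamma, beta):
    """Negative log posterior, up to an additive constant.

    X: (N, q) latent points; Y: (N, D) centred training poses.
    Assumed priors: x_i ~ N(0, I) and p(alpha, gamma, beta) proportional
    to 1 / (alpha * gamma * beta).
    """
    N, D = Y.shape
    K = rbf_kernel_matrix(X, alpha, gamma, beta)
    # Cholesky factor gives a stable log-determinant and linear solve.
    L = np.linalg.cholesky(K)
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    # D/2 * ln|K| + 1/2 * tr(K^{-1} Y Y^T)
    data_term = 0.5 * D * log_det_K + 0.5 * np.sum(Y * K_inv_Y)
    latent_prior = 0.5 * np.sum(X ** 2)
    hyper_prior = np.log(alpha * gamma * beta)
    return data_term + latent_prior + hyper_prior

# Hypothetical toy data: 5 poses of dimension 3, latent dimension 2.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 2))
Y_toy = rng.normal(size=(5, 3))
Y_toy -= Y_toy.mean(axis=0)
value = gplvm_neg_log_posterior(X_toy, Y_toy, alpha=1.0, gamma=1.0, beta=100.0)
```

Training would minimize this value jointly over the latent points and the kernel parameters. Building the kernel matrix from an active set only, as the slide describes, shrinks the N-by-N factorisation that dominates the cost.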

18 For a new pair (x, y), we can predict using: [equation]. This equation can be used to solve for x given y, or for y given x, via gradient descent.
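The prediction equation is missing from the transcript. As a rough sketch, assuming the form used in Gaussian-process inverse kinematics, the GP predictive mean f(x) and variance σ²(x) can be combined into an objective L(x, y) that is low when the pose y agrees with what the model predicts at x. The kernel form, the hyperparameter values, and all function names below are assumptions.

```python
import numpy as np

def kernel(A, B, alpha=1.0, gamma=1.0):
    """Assumed RBF kernel between the rows of A and the rows of B (noise-free part)."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-0.5 * gamma * sq_dist)

def predict(x, X_active, Y_active, alpha=1.0, gamma=1.0, beta=100.0):
    """GP predictive mean f(x) and variance sigma^2(x) at one latent point x."""
    mean_Y = Y_active.mean(axis=0)
    K = kernel(X_active, X_active, alpha, gamma) + np.eye(len(X_active)) / beta
    k_x = kernel(X_active, x[None, :], alpha, gamma)[:, 0]
    K_inv_k = np.linalg.solve(K, k_x)
    f = mean_Y + (Y_active - mean_Y).T @ K_inv_k
    var = alpha + 1.0 / beta - k_x @ K_inv_k
    return f, var

def pair_objective(x, y, X_active, Y_active, **hyper):
    """Assumed L(x, y) = ||y - f(x)||^2 / (2 sigma^2) + D/2 ln sigma^2 + ||x||^2 / 2."""
    f, var = predict(x, X_active, Y_active, **hyper)
    D = y.shape[0]
    return (np.sum((y - f) ** 2) / (2.0 * var)
            + 0.5 * D * np.log(var)
            + 0.5 * np.sum(x ** 2))

# Hypothetical active set: 3 latent points (2-D) with 4-D poses.
X_active = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
Y_active = np.array([[0.0, 0.0, 1.0, 1.0],
                     [1.0, 0.0, 2.0, 1.0],
                     [0.0, 1.0, 1.0, 2.0]])
f0, var0 = predict(X_active[0], X_active, Y_active)
good = pair_objective(X_active[0], Y_active[0], X_active, Y_active)
bad = pair_objective(X_active[0], Y_active[1], X_active, Y_active)
```

In this toy setup the matching pair scores lower than the mismatched one, so minimizing the objective over x for a fixed y (or over y for a fixed x) pulls the pair toward agreement with the learned model.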

19 GPLVM

20 GPLVM

21 GPLVM. Silhouettes with the left hand raised tend to cluster together.

22 GPLVM. It does not always do a good job.

23 About GPLVM. Allows mapping to and from the lower-dimensional space. Allows a smooth parameterization (i.e. derivatives are available) in the lower-dimensional space. Two dimensions work well for our data set (Grochow et al. use 2 to 5).

24 Learning the Forward/Rendering Function. Input: 2D pose. Output: silhouettes (represented using Alt moments). Similar to Rosales and Sclaroff.

25 Overview of Framework. Learning phase (two steps): learn the rendering function Φ(·), and learn a model of human body poses (Y: training data, X: latent space). Pose inference phase: input silhouette → output pose.

26 Pose Inference. Typical regularization (also used by Agarwal and Triggs): [equation].

27 Data Term. Forward function (rendering function): 2D projected marker positions → silhouette (Alt moments).

28 Regularization Term. Replace it with a prior knowledge term (i.e. the learned model of poses). This term is independent of the features s.

29 Pose Inference. The solution is obtained using conjugate gradient, with initialization from the active set. (Speaker note: also need to talk about initialization.)
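The inference step can be sketched as minimizing a data term plus a prior term from several starting points and keeping the best result. This is only an illustration under loud assumptions: plain gradient descent with finite-difference gradients is swapped in for conjugate gradient, and `decode`, `render` and `prior` are hypothetical toy stand-ins for the learned pose model, the learned rendering function Φ, and the learned prior term.

```python
import numpy as np

def infer_pose(s, decode, render, prior, inits, lam=0.1, step=0.05, iters=500):
    """Minimise E(x) = ||render(decode(x)) - s||^2 + lam * prior(x) from several starts."""
    def energy(x):
        return np.sum((render(decode(x)) - s) ** 2) + lam * prior(x)

    def num_grad(x, eps=1e-5):
        # Central finite differences; a real system would use analytic derivatives.
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            g[i] = (energy(x + e) - energy(x - e)) / (2 * eps)
        return g

    best_x, best_e = None, np.inf
    for x0 in inits:  # e.g. latent points taken from the active set
        x = np.array(x0, dtype=float)
        for _ in range(iters):
            x -= step * num_grad(x)
        if energy(x) < best_e:
            best_x, best_e = x, energy(x)
    return decode(best_x), best_x, best_e

# Toy stand-ins (all hypothetical): linear "pose model" and "rendering function".
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # latent (2-D) -> pose (3-D)
B = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])    # pose (3-D) -> features (2-D)
decode = lambda x: A @ x
render = lambda y: B @ y
prior = lambda x: 0.5 * np.sum(x ** 2)

x_true = np.array([0.5, -0.25])
s_obs = render(decode(x_true))
active_set = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 0.5])]
pose, x_hat, e_hat = infer_pose(s_obs, decode, render, prior, active_set)
```

Because the search runs in the low-dimensional latent space, each start needs only a handful of gradient evaluations per step, and starting from active-set points keeps every run near poses the model has already seen.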

30 Data Collection. 12 gestures in the flight director lexicon. Synthesize 6000 (silhouette, pose) pairs using Poser: 3000 for training (male model) and 3000 for testing (female model). [Figure: 3D pose and synthesized silhouettes.] Views are sampled uniformly over the frontal view-sphere.

31 Experiments (Synthetic Data). (a) Silhouette images generated by Poser 5 (test set). (b) Estimation from SMA (Specialized Mapping Architecture). (c) Our approach. (d) Ground truth.

32 Comparison with SMA

33 Additional Constraints. Additional constraints, e.g. temporal consistency, can be added to obtain a more accurate estimate.

34 Experiments (Real Data). (a) Silhouette images of a real person. (b) SMA (Specialized Mapping Architecture). (c) Our approach (without temporal consistency). (d) Our approach (with temporal consistency).

35 Experiments (Real Data). (a) Silhouette images of a real person. (b) SMA (Specialized Mapping Architecture). (c) Our approach (without temporal consistency). (d) Our approach (with temporal consistency).

36 Conclusion. Proposed a novel method for pose estimation for a pre-defined gesture lexicon. It is interesting to note that two dimensions are enough in our case. The technique is fast (about 0.1 sec per frame in Matlab). Tracking as an extension. [video]

37 Thank You

38 Comments after the talk
– Related work: give bullets or a summary of strengths vs. weaknesses. Why do we need this work?
– Include the year of publication for the related work (e.g. the Rosales and Sclaroff work is not mentioned; the Sminchisescu work is not mentioned).
– Order the related work temporally?
– Include an introduction slide and a motivating slide. How to motivate this work? "The state of the art is so and so… We found this common weakness, so we proposed this work."
– Human pose is not mentioned in the intro.
– At the end of the talk, say why to use this work over the others.
– Why GPLVM and not other reduction techniques, such as LLE, PCA or ISOMAP?
– Give a top-level overview of the algorithm. A flow chart view?
– Explain the L(x, y) mapping using an illustration, like the mapping between two planes. Clearly say what the high-dimensional y and the low-dimensional x are.
– Give a reference for GPLVM or a website link.
– Add a slide on the math of GPLVM.
– The Tikhonov regularization approach minimizes ||Φ(y) − s|| plus a regularization term. Usually the regularization term is ||Dx||, but here we chose L(x, y). Explain why.
– Add a slide to talk about the temporal constraint.
– Why learn the rendering function? Because we want to take the derivative…
– Give the numbers for the training set; this gives an idea of how good the quantitative results are.

39 Related Work. Model based: Simulated Annealing [Deutscher et al. CVPR ’00]; Kinematic Jump Processes [Sminchisescu and Triggs CVPR ’03]; Markov Chain Monte Carlo [Lee and Cohen CVPR ’04]; etc. Learning based: Specialized Mapping Architecture (SMA) [Rosales and Sclaroff NIPS ’01]; Relevance Vector Regression [Agarwal and Triggs CVPR ’04]; Parameter Sensitive Hashing [Shakhnarovich et al. ICCV ’03]; etc.

40 To learn the GPLVM from the training set {y_i}, we maximize the following posterior: [equation], i.e. minimize its negative log: [equation].

41 Overview of Framework (Learning Phase). Two steps: learn the rendering function Φ(·), and learn a model of human body poses (using GPLVM).

42 Overview of Framework (Estimation Phase). Input silhouette → output pose. Search over the learned model of human body poses for a solution consistent with the observation.

43 Kernel Function. Measures the similarity of the latent variables x and x'. For a data set of size N, we can form an N-by-N kernel matrix K, in which K_{i,j} = k(x_i, x_j). [Equation, with three annotated parameters: how correlated x and x' are in general; the spread of the function; the noise in the prediction.]

44 GPLVM Training: Learning a Model of Body Poses. To learn the parameters of the GPLVM from the training set {y_i}, we place priors on the parameters and maximize the resulting posterior: [equation].

45 Gaussian Process Latent Variable Model (GPLVM). [Figure labels: low-dimensional parameterization; original space representation; a function expressing how well the two values match; space of feasible poses.]

46 For a new pair (x, y), we can predict using: [equation].