Real-Time Human Pose Recognition in Parts from Single Depth Images Presented by: Mohammad A. Gowayyed.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Is Random Model Better? -On its accuracy and efficiency-
Random Forest Predrag Radenković 3237/10
Human Identity Recognition in Aerial Images Omar Oreifej Ramin Mehran Mubarak Shah CVPR 2010, June Computer Vision Lab of UCF.
The Shape Boltzmann Machine S. M. Ali Eslami Nicolas Heess John Winn CVPR 2012 Providence, Rhode Island A Strong Model of Object Shape.
Víctor Ponce Miguel Reyes Xavier Baró Mario Gorga Sergio Escalera Two-level GMM Clustering of Human Poses for Automatic Human Behavior Analysis Departament.
Real-Time Human Pose Recognition in Parts from Single Depth Images
Semantic Texton Forests for Image Categorization and Segmentation We would like to thank Amnon Drory for this deck הבהרה : החומר המחייב הוא החומר הנלמד.
Po-Hsiang Chen Advisor: Sheng-Jyh Wang 2/13/2012.
Face Alignment with Part-Based Modeling
Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit.
3D Human Body Pose Estimation from Monocular Video Moin Nabi Computer Vision Group Institute for Research in Fundamental Sciences (IPM)
1.Introduction 2.Article [1] Real Time Motion Capture Using a Single TOF Camera (2010) 3.Article [2] Real Time Human Pose Recognition In Parts Using a.
AdaBoost & Its Applications
Human-Computer Interaction Human-Computer Interaction Segmentation Hanyang University Jong-Il Park.
Kinect Case Study CSE P 576 Larry Zitnick
Real-time Embedded Face Recognition for Smart Home Fei Zuo, Student Member, IEEE, Peter H. N. de With, Senior Member, IEEE.
Efficient Moving Object Segmentation Algorithm Using Background Registration Technique Shao-Yi Chien, Shyh-Yih Ma, and Liang-Gee Chen, Fellow, IEEE Hsin-Hua.
Ensemble Learning: An Introduction
Evaluating Hypotheses
1 MACHINE LEARNING TECHNIQUES IN IMAGE PROCESSING By Kaan Tariman M.S. in Computer Science CSCI 8810 Course Project.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
Dorin Comaniciu Visvanathan Ramesh (Imaging & Visualization Dept., Siemens Corp. Res. Inc.) Peter Meer (Rutgers University) Real-Time Tracking of Non-Rigid.
1 Accurate Object Detection with Joint Classification- Regression Random Forests Presenter ByungIn Yoo CS688/WST665.
Face Processing System Presented by: Harvest Jang Group meeting Fall 2002.
Jacinto C. Nascimento, Member, IEEE, and Jorge S. Marques
R OBERTO B ATTITI, M AURO B RUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014.
Radial-Basis Function Networks
Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.
Sean Ryan Fanello. ^ (+9 other guys. )
Bag of Video-Words Video Representation
Methods in Medical Image Analysis Statistics of Pattern Recognition: Classification and Clustering Some content provided by Milos Hauskrecht, University.
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Flow Based Action Recognition Papers to discuss: The Representation and Recognition of Action Using Temporal Templates (Bobbick & Davis 2001) Recognizing.
Miguel Reyes 1,2, Gabriel Dominguez 2, Sergio Escalera 1,2 Computer Vision Center (CVC) 1, University of Barcelona (UB) 2
Lecture 7. Outline 1. Overview of Classification and Decision Tree 2. Algorithm to build Decision Tree 3. Formula to measure information 4. Weka, data.
Texture We would like to thank Amnon Drory for this deck הבהרה : החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.
Part 2: Change detection for SAR Imagery (based on Chapter 18 of the Book. Change-detection methods for location of mines in SAR imagery, by Dr. Nasser.
#MOTION ESTIMATION AND OCCLUSION DETECTION #BLURRED VIDEO WITH LAYERS
Texture We would like to thank Amnon Drory for this deck הבהרה : החומר המחייב הוא החומר הנלמד בכיתה ולא זה המופיע / לא מופיע במצגת.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Ensembles. Ensemble Methods l Construct a set of classifiers from training data l Predict class label of previously unseen records by aggregating predictions.
CLASSIFICATION: Ensemble Methods
BAGGING ALGORITHM, ONLINE BOOSTING AND VISION Se – Hoon Park.
Vision-based human motion analysis: An overview Computer Vision and Image Understanding(2007)
Chapter 4: Pattern Recognition. Classification is a process that assigns a label to an object according to some representation of the object’s properties.
School of Engineering and Computer Science Victoria University of Wellington Copyright: Peter Andreae, VUW Image Recognition COMP # 18.
Human pose recognition from depth image MS Research Cambridge.
1 Research Question  Can a vision-based mobile robot  with limited computation and memory,  and rapidly varying camera positions,  operate autonomously.
Expectation-Maximization (EM) Case Studies
3d Pose Detection Used by Kinect
Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P.
Hand Gesture Recognition Using Haar-Like Features and a Stochastic Context-Free Grammar IEEE 高裕凱 陳思安.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
COMP24111: Machine Learning Ensemble Models Gavin Brown
11/05/15 How the Kinect Works Computational Photography Derek Hoiem, University of Illinois Photo frame-grabbed from:
Using decision trees to build an a framework for multivariate time- series classification 1 Present By Xiayi Kuang.
Presenter: Jae Sung Park
Mustafa Gokce Baydogan, George Runger and Eugene Tuv INFORMS Annual Meeting 2011, Charlotte A Bag-of-Features Framework for Time Series Classification.
Shape2Pose: Human Centric Shape Analysis CMPT888 Vladimir G. Kim Siddhartha Chaudhuri Leonidas Guibas Thomas Funkhouser Stanford University Princeton University.
Microsoft Kinect Jason Wong Pierce Nichols Rick Berggreen Tri Le.
A Forest of Sensors: Using adaptive tracking to classify and monitor activities in a site Eric Grimson AI Lab, Massachusetts Institute of Technology
Computer Vision Lecture 12: Image Segmentation II
Real-Time Human Pose Recognition in Parts from Single Depth Image
Dynamical Statistical Shape Priors for Level Set Based Tracking
Eric Grimson, Chris Stauffer,
Roberto Battiti, Mauro Brunato
Grouping/Segmentation
Multiple Organ detection in CT Volumes Using Random Forests - Week 5
Presentation transcript:

Real-Time Human Pose Recognition in Parts from Single Depth Images Presented by: Mohammad A. Gowayyed

Pose Estimation Input: image or video containing people Output: joints locations (pose)

Applications Gaming Human-computer interaction Security Telepresence Health-care

Related Work No previous “interactive” pose estimation from depth cameras.

Approach Focus: detecting from a single depth image a small set of 3D position candidates for each skeletal joint Treat the segmentation into body parts as a per-pixel classification task Training data: generate realistic synthetic depth images of humans of many shapes and sizes in highly varied poses sampled from a large motion capture database

Approach Train a deep randomized decision forest classifier which avoids overfitting by using hundreds of thousands of training images (can use GPUs to speed up the classification) Spatial modes of the inferred per-pixel distributions are computed using mean shift resulting in the 3D joint proposals. 200 Frame/Second on Xbox 360 GPU

Synthetic data Use Motion capture data (mocap) The database consists of approximately 500k frames in a few hundred sequences of driving, dancing, kicking, running, navigating menus, etc. Use a subset of 100k poses such that no two poses are closer than 5cm. uses standard computer graphics techniques to render depth

Synthetic data

Depth Image Features Simple depth comparison features

Depth Image Features

Randomized Decision Forests

A forest is an ensemble of T decision trees, each consisting of split and leaf nodes. Each split node consists of a feature f Ɵ and a threshold T. To classify pixel x in image I, one starts at the root and repeatedly evaluates Eq. 1, branching left or right according to the comparison to threshold. At the leaf node reached in tree t, a learned distribution P t (c|I, x) over body part labels c is stored. The distributions are averaged together for all trees in the forest to give the final classification

Training the forest Each tree is trained on a different set of randomly synthesized images. A random subset of 2000 example pixels from each image is chosen to ensure a roughly even distribution across body parts.

Training the forest

To keep the training times down they employ a distributed implementation. Training 3 trees to depth 20 from 1 million images takes about a day on a 1000 core cluster.

Joint positions proposals Body part recognition as described above infers per-pixel information. This information must now be pooled across pixels to generate reliable proposals for the positions of 3D skeletal joints. A simple option is to accumulate the global 3D centers of probability mass for each part, using the known calibrated depth. However, outlying pixels severely degrade the quality of such a global estimate.

Joint positions proposals Instead they employ a local mode-finding approach based on mean shift with a weighted Gaussian kernel. They define a density estimator per body part as:

Joint positions proposals

Experiments and Discussion Still