Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction
Atul Kanaujia, CBIM, Rutgers; Cristian Sminchisescu, TTI-C; Dimitris Metaxas, CBIM, Rutgers

Presentation transcript:

3D Human Pose Inference Difficulties
Towards automatic monocular methods
- Background clutter
- Geometric transforms
  – Scale & viewpoint change
- Illumination changes
- Fast motions
- (Self-)occlusion
- Variability in the human body proportions

Standard Discriminative Approach
- Train a structured model to predict 3D human poses given image descriptor inputs
  – Multi-valued predictor necessary for multiple plausible pose interpretations
- Train using images and corresponding 3D human poses
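A multi-valued predictor of the kind this slide calls for can be sketched as a mixture of experts: each expert proposes one pose, and a gate scores how plausible each proposal is for the given descriptor. The sketch below is a minimal illustration and not the authors' model; linear experts, a softmax gate, and the names `predict_multivalued`, `gate_weights` and `expert_weights` are all assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1D array of gate scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_multivalued(r, gate_weights, expert_weights):
    """Return one pose hypothesis per expert, with its gate probability.

    r              : (d,) image descriptor
    gate_weights   : (M, d) hypothetical linear gate parameters
    expert_weights : (M, p, d) hypothetical linear expert parameters
    """
    gates = softmax(gate_weights @ r)   # (M,) plausibility of each expert
    poses = expert_weights @ r          # (M, p) one pose per expert
    return gates, poses

# Two experts that disagree on the pose for the same descriptor.
r = np.array([1.0, 0.0])
gate_weights = np.array([[2.0, 0.0], [0.0, 0.0]])
expert_weights = np.array([[[1.0, 0.0]], [[-1.0, 0.0]]])
gates, poses = predict_multivalued(r, gate_weights, expert_weights)
```

Keeping every expert's output, rather than averaging them, is what lets the predictor represent several plausible 3D interpretations of one image.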

Problems… we address
- Lack of typical training data (lab, quasi-real)
- Image descriptors are unstable w.r.t. geometric deformations and background clutter
- Predictor cannot generalize!
[Figure: example images labeled Clean, Quasi-real (QReal), Real]

This Talk
1. Learning hierarchical image descriptors
  – Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set
  – Metrics for noise suppression (clutter removal)
2. Semi-supervised generalization to multi-valued prediction
[Diagram labels: Hierarchical Image Encodings; Distance Metric Learning; Multi-Valued Semi-supervised Learning]

Why do we need better features?
- Global histograms (bag of features) are robust to local deformation but sensitive to background clutter
- Regular grid descriptors can be made robust to clutter but are sensitive to training set misalignments & local deformations

Hierarchical Image Descriptors
- Coarse-to-fine encodings designed to represent multiple degrees of selectivity & invariance
- Progressively relax rigid local spatial encodings to weaker models of geometry accumulated over larger regions
  – Layered encodings (e.g. spatial pyramid) or bottom-up, hierarchical aggregation based on successive template matching + max pooling (e.g. HMAX)

Hierarchical Image Descriptors
HMAX (Poggio et al, )
[Figure: HMAX pipeline with layers S1, C1, S2, C2. Recoverable labels: 16 Gabor filter responses; MAX over a region Ω; select patches / object parts to match against results from the previous layer; encoding]
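The alternation of template matching and max pooling that HMAX is built on can be sketched in a few lines. This is a minimal NumPy illustration, not the HMAX implementation used in the talk: the function names, the Gaussian similarity, the non-overlapping pooling and the toy input are all assumptions.

```python
import numpy as np

def max_pool(responses, size):
    """C-layer style step: max over non-overlapping size x size regions."""
    h, w = responses.shape
    h_trim, w_trim = h - h % size, w - w % size   # drop any ragged border
    blocks = responses[:h_trim, :w_trim].reshape(
        h_trim // size, size, w_trim // size, size)
    return blocks.max(axis=(1, 3))

def patch_similarity(layer, patch, sigma=1.0):
    """S-layer style step: Gaussian similarity of a stored patch at every position."""
    ph, pw = patch.shape
    h, w = layer.shape
    out = np.empty((h - ph + 1, w - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            diff = layer[i:i + ph, j:j + pw] - patch
            out[i, j] = np.exp(-np.sum(diff ** 2) / (2 * sigma ** 2))
    return out

# Toy response map standing in for one filter output.
s1 = np.arange(16.0).reshape(4, 4)
c1 = max_pool(s1, 2)                 # local max gives tolerance to small shifts
patch = s1[1:3, 1:3].copy()          # a stored "object part"
s2 = patch_similarity(s1, patch)     # match the part against the previous layer
c2 = s2.max()                        # global max: one position-invariant feature
```

Taking the maximum rather than the average keeps the strongest match regardless of where it occurs, which is the source of the invariance the slide refers to.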

Multilevel Image Descriptors
Spatial Pyramid (Lazebnik et al, 2006)
- SIFT descriptor, followed by vector quantization
- Votes accumulation in a spatial region
- Concatenate to descriptor
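The three steps on this slide (quantize, accumulate votes per region, concatenate) can be sketched as follows. This is an illustrative sketch under stated assumptions: features arrive already quantized to codeword indices, positions are normalized to [0, 1), and the per-level weights of the original spatial pyramid kernel are omitted. The name `spatial_pyramid` and its parameters are hypothetical.

```python
import numpy as np

def spatial_pyramid(xy, words, n_words, levels=2):
    """Concatenate per-cell codeword histograms over grids of 1x1, 2x2, 4x4, ...

    xy      : (n, 2) feature positions, normalized to [0, 1)
    words   : (n,) codeword index of each feature (after vector quantization)
    n_words : codebook size
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each feature along x and y at this level.
        cx = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        # One histogram bin per (cell, codeword) pair.
        flat = (cy * cells + cx) * n_words + words
        parts.append(np.bincount(flat, minlength=cells * cells * n_words))
    return np.concatenate(parts)

xy = [[0.1, 0.1], [0.9, 0.9], [0.9, 0.1]]
words = [0, 1, 1]
descriptor = spatial_pyramid(xy, words, n_words=2, levels=1)
```

The first block of the descriptor is a global histogram; each later level adds finer spatial resolution, so the concatenation carries both the invariant and the selective view of the image.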

Dealing with Background Clutter
- Multi-level encodings are still perturbed by background
- Need to suppress noise
- Feature selection based on e.g. sparse linear regression tends to be ineffective for global descriptors

Distance Metric Learning
- Learn a Mahalanobis distance that maximizes similarity within chunklets
- Chunklets = sets of images of people in similar poses, but differently proportioned and placed on different backgrounds
  – Relevant Component Analysis (Hillel et al. 2003)
[Figure: example image sets labeled Chunklet 1 and Chunklet 2]
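In outline, Relevant Component Analysis estimates the covariance of descriptors within chunklets and uses its inverse as the Mahalanobis metric, so directions that vary inside a chunklet (background, body proportions) are down-weighted. The sketch below is a minimal NumPy version under assumptions: the function names, the small ridge term added for invertibility, and the toy data are illustrative, not taken from the talk.

```python
import numpy as np

def rca_metric(chunklets, ridge=1e-6):
    """Inverse of the pooled within-chunklet covariance.

    chunklets : list of (n_j, d) arrays, each holding descriptors of
                images that should be treated as similar
    ridge     : assumed regularizer that keeps the covariance invertible
    """
    centered = np.vstack([c - c.mean(axis=0) for c in chunklets])
    cov = centered.T @ centered / centered.shape[0]
    cov += ridge * np.eye(cov.shape[0])
    return np.linalg.inv(cov)

def mahalanobis(a, b, metric):
    """Distance between descriptors a and b under the learned metric."""
    diff = a - b
    return float(np.sqrt(diff @ metric @ diff))

# Toy data: dimension 0 varies a lot inside each chunklet (nuisance),
# dimension 1 barely varies (pose-relevant).
chunklets = [np.array([[0.0, 0.0], [10.0, 0.1]]),
             np.array([[5.0, 3.0], [-5.0, 3.1]])]
metric = rca_metric(chunklets)
origin = np.zeros(2)
d_nuisance = mahalanobis(origin, np.array([1.0, 0.0]), metric)
d_relevant = mahalanobis(origin, np.array([0.0, 1.0]), metric)
```

A unit step along the nuisance dimension ends up much shorter than a unit step along the pose-relevant one, which is the noise-suppression effect the slide describes.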

Suppressing Background Clutter
- The distance between the learned descriptors computed on different backgrounds is diminished
[Figure: panels labeled Clean, Quasi-real (QReal), Real]

This talk …
Flexible training
1. Learning hierarchical image descriptors
  – Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set
  – Metrics for noise suppression & clutter removal
2. Semi-supervised generalization to multi-valued prediction
[Diagram labels: Hierarchical Image Encodings; Distance Metric Learning; Multi-Valued Semi-supervised Learning]

Semi-supervised Multi-valued Prediction
x – 3D Human Pose, r – Image Descriptor
Manifold Assumption
- If two image descriptors are close in their intrinsic geometry (e.g. encoded by the graph Laplacian), their 3D outputs should vary smoothly
[Figure: descriptors r1, r2 mapped to poses x1, x2]
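The manifold assumption is commonly expressed as a penalty tr(XᵀLX), where L is the graph Laplacian built over labeled and unlabeled descriptors; it equals half the weighted sum of squared differences between outputs of neighboring descriptors. The sketch below illustrates that penalty only. The fully connected Gaussian graph, the bandwidth `sigma`, and the function names are assumptions, and the talk's actual graph construction is not given in this transcript.

```python
import numpy as np

def graph_laplacian(R, sigma=1.0):
    """Unnormalized Laplacian L = D - W of a Gaussian-weighted descriptor graph."""
    sq = np.sum((R[:, None, :] - R[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)            # no self-edges
    return np.diag(W.sum(axis=1)) - W, W

def smoothness(X, L):
    """Penalty tr(X^T L X): large when nearby descriptors get distant outputs."""
    return float(np.trace(X.T @ L @ X))

# Four descriptors forming two tight pairs.
R = np.array([[0.0], [0.1], [5.0], [5.1]])
L, W = graph_laplacian(R)
X_smooth = np.array([[0.0], [0.0], [1.0], [1.0]])   # neighbors agree
X_rough = np.array([[0.0], [1.0], [0.0], [1.0]])    # neighbors disagree
penalty_smooth = smoothness(X_smooth, L)
penalty_rough = smoothness(X_rough, L)
```

Because the penalty needs only descriptors to build L, unlabeled images contribute to training even though their 3D poses are unknown.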

Semi-supervised Multi-valued Prediction
x – 3D Human Pose, r – Image Descriptor
Expert Ranking Assumption (Mixture of experts)
- If two image descriptors are close in their intrinsic image geometry (graph Laplacian), their 3D outputs should be smooth only if predicted by the same expert (prevent smoothing across partitions)
[Figure: descriptors r1, r2, r3 mapped to poses x1, x2, x3]
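One simple way to keep smoothing from crossing expert partitions is to drop graph edges whose endpoints are assigned to different experts before forming the Laplacian. The sketch below shows that idea only; it is an assumption about how the constraint could be realized, not the formulation used in the talk, and `expert_masked_laplacian` and `expert_of` are hypothetical names.

```python
import numpy as np

def expert_masked_laplacian(W, expert_of):
    """Laplacian that keeps only edges between points owned by the same expert.

    W         : (n, n) symmetric edge weights between image descriptors
    expert_of : (n,) index of the expert assigned to each descriptor
    """
    expert_of = np.asarray(expert_of)
    same = expert_of[:, None] == expert_of[None, :]
    W_masked = np.where(same, W, 0.0)    # cut edges across partitions
    return np.diag(W_masked.sum(axis=1)) - W_masked

# Three mutually close descriptors; the third belongs to another expert
# and maps to a very different pose.
W = np.ones((3, 3)) - np.eye(3)
X = np.array([[0.0], [0.0], [5.0]])
L_full = np.diag(W.sum(axis=1)) - W
L_masked = expert_masked_laplacian(W, [0, 0, 1])
penalty_full = float(np.trace(X.T @ L_full @ X))
penalty_masked = float(np.trace(X.T @ L_masked @ X))
```

With the full graph, the third point is penalized for differing from its neighbors; with the masked graph it is left alone, so two valid pose interpretations of similar images are not averaged together.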

Experiments
- Multi-level encodings
  – 5 hierarchical descriptors
  – ~1500d image descriptor
- Dataset of human poses
  – 56d human joint angle state vector
  – 5 motions obtained with motion capture: Walk, Pantomime, Bending Pickup, Dancing and Running
  – 3247 x 3 unlabeled images
- Multi-valued predictor uses 5 experts

Prediction Accuracy
Multilevel vs. Global Descriptors
- Multilevel / hierarchical descriptors perform significantly better than global histograms or single-layer (fine) grids of local descriptors

Prediction Accuracy
Before / after Metric Learning
- Metric learning improves the prediction error for global histogram descriptors

Run Lola Run Movie
Automatic 3D Pose Reconstruction
- Integrated scanning detection window + 3D prediction
- Notice: scale change, occlusions (trees), self-occlusion, fast motion

Run Lola Run Movie
3D Pose Reconstruction
- Integrated scanning detection window + 3D prediction
- Notice: scale change, fast motion, transparencies

We have argued for
1. Learning of hierarchical image descriptors for better generalization under shape variability and background clutter
2. Semi-supervised generalization to multi-valued prediction
Ongoing work
  – Jointly learn the features and the predictor
  – Scaling to large datasets (> 500K samples)