Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct. 2010.

Slides:

Advertisements

Similar presentations

Contributions A people dataset of 8035 images. Three layer attribute classification framework using poselets. 1 2.

Advertisements

Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang.

A Discriminative Key Pose Sequence Model for Recognizing Human Interactions Arash Vahdat, Bo Gao, Mani Ranjbar, and Greg Mori ICCV2011.

Pose Estimation and Segmentation of People in 3D Movies Karteek Alahari, Guillaume Seguin, Josef Sivic, Ivan Laptev Inria, Ecole Normale Superieure ICCV.

Recognizing Human Actions by Attributes CVPR2011 Jingen Liu, Benjamin Kuipers, Silvio Savarese Dept. of Electrical Engineering and Computer Science University.

Limin Wang, Yu Qiao, and Xiaoou Tang

Ľubor Ladický1 Phil Torr2 Andrew Zisserman1

Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.

Lecture 31: Modern object recognition

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

Parsing Clothing in Fashion Photographs

Bangpeng Yao and Li Fei-Fei

Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,

Silhouette Lookup for Automatic Pose Tracking N ICK H OWE.

Human Action Recognition by Learning Bases of Action Attributes and Parts.

2D Human Pose Estimation in TV Shows Vittorio Ferrari Manuel Marin Andrew Zisserman Dagstuhl Seminar July 2008.

Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.

Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.

Large-Scale Object Recognition with Weak Supervision

Detecting Pedestrians by Learning Shapelet Features

More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.

Retrieving Actions in Group Contexts Tian Lan, Yang Wang, Greg Mori, Stephen Robinovitch Simon Fraser University Sept. 11, 2010.

Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.

Poselets Michael Krainin CSE 590V Oct 18, Person Detection Dalal and Triggs ‘05 – Learn to classify pedestrians vs. background – HOG + linear SVM.

Good morning, everyone, thank you for coming to my presentation.

Unsupervised Learning of Categorical Segments in Image Collections *California Institute of Technology **Technion Marco Andreetto*, Lihi Zelnik-Manor**,

LARGE-SCALE NONPARAMETRIC IMAGE PARSING Joseph Tighe and Svetlana Lazebnik University of North Carolina at Chapel Hill CVPR 2011Workshop on Large-Scale.

Spatial Pyramid Pooling in Deep Convolutional

Generic object detection with deformable part-based models

ICCV 2003UC Berkeley Computer Vision Group Recognizing Action at a Distance A.A. Efros, A.C. Berg, G. Mori, J. Malik UC Berkeley.

Ranking with High-Order and Missing Information M. Pawan Kumar Ecole Centrale Paris Aseem BehlPuneet DokaniaPritish MohapatraC. V. Jawahar.

Describing People: A Poselet-Based Approach to Attribute Classification Lubomir Bourdev 1,2 Subhransu Maji 1 Jitendra Malik 1 1 EECS U.C. Berkeley 2 Adobe.

School of Electronic Information Engineering, Tianjin University Human Action Recognition by Learning Bases of Action Attributes and Parts Jia pingping.

Loss-based Learning with Latent Variables M. Pawan Kumar École Centrale Paris École des Ponts ParisTech INRIA Saclay, Île-de-France Joint work with Ben.

Watch, Listen and Learn Sonal Gupta, Joohyun Kim, Kristen Grauman and Raymond Mooney -Pratiksha Shah.

Ranking with High-Order and Missing Information M. Pawan Kumar Ecole Centrale Paris Aseem BehlPuneet KumarPritish MohapatraC. V. Jawahar.

Category Discovery from the Web slide credit Fei-Fei et. al.

Yao, B., and Fei-fei, L. IEEE Transactions on PAMI(2012)

Object Detection with Discriminatively Trained Part Based Models

Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.

Representations for object class recognition David Lowe Department of Computer Science University of British Columbia Vancouver, Canada Sept. 21, 2006.

Deformable Part Model Presenter ： Liu Changyu Advisor ： Prof. Alex Hauptmann Interest ： Multimedia Analysis April 11 st, 2013.

INTRODUCTION Heesoo Myeong and Kyoung Mu Lee Department of ECE, ASRI, Seoul National University, Seoul, Korea Tensor-based High-order.

Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.

MSRI workshop, January 2005 Object Recognition Collected databases of objects on uniform background (no occlusions, no clutter) Mostly focus on viewpoint.

Epitomic Location Recognition A generative approach for location recognition K. Ni, A. Kannan, A. Criminisi and J. Winn In proc. CVPR Anchorage,

Efficient Discriminative Learning of Parts-based Models M. Pawan Kumar Andrew Zisserman Philip Torr

Project 3 Results.

Grouplet: A Structured Image Representation for Recognizing Human and Object Interactions Bangpeng Yao and Li Fei-Fei Computer Science Department, Stanford.

Discussion of Pictorial Structures Pedro Felzenszwalb Daniel Huttenlocher Sicily Workshop September, 2006.

Discriminative Sub-categorization Minh Hoai Nguyen, Andrew Zisserman University of Oxford 1.

Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations ZUO ZHEN 27 SEP 2011.

Describing People: A Poselet-Based Approach to Attribute Classification.

More sliding window detection: Discriminative part-based models

Max-Margin Training of Upstream Scene Understanding Models Jun Zhu Carnegie Mellon University Joint work with Li-Jia Li *, Li Fei-Fei *, and Eric P. Xing.

PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang 1,2 Manohar Paluri 1 Marć Aurelio Ranzato 1 Trevor Darrell 2 Lumbomir Boudev 1 1 Facebook.

Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.

1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.

Object detection with deformable part-based models

Data Driven Attributes for Action Detection

Action Recognition ECE6504 Xiao Lin.

Recognizing Humans: Action Recognition

CS 1674: Intro to Computer Vision Scene Recognition

Rob Fergus Computer Vision

Bilinear Classifiers for Visual Recognition

“The Truth About Cats And Dogs”

Weakly Supervised Action Recognition

KFC: Keypoints, Features and Correspondences

Part-based visual tracking with online latent structural learning -Rui Yao et al. ICCV 2013 Cvlab Jung ilchae.

Presentation transcript:

Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct. 2010

Problem Definition How can we recognize human action from a single Image? ?

Problem Definition Pose as a Latent Valiable

Application News/sports image retrieval and analysis An important cue for video-based action recognition

Previous Works Global template-based representation HOG by Dalal and Triggs. And, Ikizler-Cinbis et al. ICCV09 5 Action Label Pose estimation -> action recognition e.g. Ramanan and Forsyth NIPS03, Ferrari et al. CVPR09

Our Work Examplar based representation Using Poselet as a new definition of a part 6 Pose estimation + action recognition

Discriminative Pose Golfing? Walking? All elements of pose are not equally important Develop integrated learning framework to estimate pose for action recognition

Pose Representation 8 We use a coarse non-parametric pose representation – An action-specific variant of the poselet [Bourdev&Malik ICCV09] A poselet is a set of patches not only with similar pose configuration, but also from the same action class.

Poselets 9 Poselets obtained by clustering ground-truth joint positions of body parts for each action

Poselets Visualization of the Poselets for Running images For Every Action Class: 1.Devide annotation to 4 parts 2.Cluster on normalized x,y 3.Remove small clusters 4.Crop that part of image Learn SVM for every Poselet with HoG +: From that action -:same part from other action 5 (Actions) x 4 (Parts) x 5 (Clusters) = 100 – 10 (Remove) = 90 = 26 (leg) + 20 (L-arm) + 20 (R-arm) + 24 (Upper body)

Model Formation ⌂ Using Pictorial Structure Model of Pedro Felzenswalb Training: Test:

Model Formation Develop a scoring function – Should have high score for correct action label – Low score for other action labels – Model parameters

Model Formation 13 Pose Action Label Image Choose best pose L

Model Formation 14 Pose Action Label Image Running Large score for

Model Formation 15 Pose Action Label Image Sitting Small score for

Model Formulation Pairwise Relation Part Appearance

Part Appearance Potential 17 Pose Action Label Image Poselet matching

Pairwise Potential 18 Pose Action Label Image Relative body part locations

Full Model 19 Pose Action Label Image Model parameters learned using max-margin

Learning and Inference

Latent SVM We Should Minimize Loss Function !

Latent SVM

Results 23 Still image action dataset (Internet Image) – Five action categories – 2458 images total – Train using 1/3 of images from each category

Visualization of latent pose 24 Successful classification examples Unsuccessful classification examples

Any question ?