Human Action Recognition by Learning Bases of Action Attributes and Parts Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and.

Presentation transcript:

Human Action Recognition by Learning Bases of Action Attributes and Parts Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei 1 Stanford University

2 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011.

3 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes), e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …

4 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes), e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …; objects.

5 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes), e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …; parts (objects and human poses).

6 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes), e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …; parts (objects and human poses); contexts of attributes & parts.

7 Action Classification in Still Images. Example action: Riding bike. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes), e.g. riding a bike, wearing a helmet, pedaling the pedals, sitting on a bike seat; parts (objects and human poses); contexts of attributes & parts. Related work on high-level representations: Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011; Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010; Yang et al., 2010; Maji et al., 2011; Liu et al., 2011. Benefits: incorporates human knowledge; more understanding of image content; more discriminative classifier.

8 Outline. Intuition: Action Attributes and Parts. Algorithm: Learning Bases of Attributes and Parts. Experiments: PASCAL VOC & Stanford 40 Actions. Conclusion.

10 Action Attributes and Parts Attributes: …… semantic descriptions of human actions

11 Action Attributes and Parts. Attributes: …… semantic descriptions of human actions. Each attribute is learned from positive and negative examples (Riding bike / Not riding bike) with a discriminative classifier, e.g. SVM (Lampert et al., 2009; Berg et al., 2010).
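
As an illustration of the "discriminative classifier, e.g. SVM" step, here is a minimal sketch of a linear SVM trained by stochastic subgradient descent on the hinge loss. It is not the authors' implementation: the function names, the toy 2-D features, and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions chosen for illustration.

```python
import random


def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) by stochastic subgradient descent.

    features: list of equal-length lists of floats (hypothetical image descriptors).
    labels:   list of +1 (attribute present) or -1 (attribute absent).
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Shrink the weights: subgradient of the L2 penalty.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge loss is active only when the margin is below 1.
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def attribute_score(w, b, x):
    """Signed confidence that the attribute is present in descriptor x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data: three "riding bike" descriptors and three "not riding bike" ones.
positives = [[2.0, 2.0], [3.0, 1.0], [2.0, 3.0]]
negatives = [[-2.0, -1.0], [-1.0, -3.0], [-3.0, -2.0]]
w, b = train_linear_svm(positives + negatives, [1, 1, 1, -1, -1, -1])
```

The signed score returned by `attribute_score` is the kind of per-attribute confidence that later slides collect into the image feature vector.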

12 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… Each part is a pre-trained detector: Object Bank (Li et al., 2010) for objects; Poselet (Bourdev & Malik, 2009) for poselets.

13 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… Attribute classification, object detection, and poselet detection produce a, the image feature vector.
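
A minimal sketch of how the three kinds of outputs could be stacked into the image feature vector a. The assumption here (not stated on the slide) is that each attribute classifier and each part detector contributes exactly one confidence score per image; the function name and the zero-valued placeholder scores are hypothetical.

```python
def build_feature_vector(attribute_scores, object_scores, poselet_scores):
    """Concatenate per-image confidences into one vector a.

    Order (an arbitrary choice for this sketch): attributes, objects, poselets.
    """
    return list(attribute_scores) + list(object_scores) + list(poselet_scores)


# Sizes taken from the PASCAL VOC 2010 slide later in the talk:
# 14 attributes, 27 objects, 150 poselets.
a = build_feature_vector([0.0] * 14, [0.0] * 27, [0.0] * 150)
```

Under this one-score-per-detector assumption, the VOC 2010 setup gives a vector of length 14 + 27 + 150 = 191.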

14 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Φ: action bases.

15 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Φ: action bases.

16 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Φ: action bases.

17 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Φ: action bases. w: bases coefficients.

18 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Φ: action bases. w: bases coefficients. Properties: sparse; encodes context; robust to initially weak detections.
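
Read as a formula, the relation among a, Φ, and w on these slides is a sparse linear reconstruction. The notation below is a sketch: M (the number of bases) and Φ_j (the j-th basis, a column of Φ) are symbols assumed here for illustration.

```latex
a \;\approx\; \Phi w \;=\; \sum_{j=1}^{M} w_j \, \Phi_j ,
\qquad \text{with only a few } w_j \neq 0 .
```

Because each basis Φ_j spans several attributes and parts at once, a nonzero w_j can support entries of a whose individual detections were weak, which is one way to read "encodes context" and "robust to initially weak detections".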

19 Outline. Intuition: Action Attributes and Parts. Algorithm: Learning Bases of Attributes and Parts. Experiments: PASCAL VOC & Stanford 40 Actions. Conclusion.

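20 Bases of Attributes & Parts: Training. Input: image feature vectors a. Output: action bases Φ (sparse) and coefficients W. Jointly estimate Φ and W so that ΦW is an accurate approximation of the inputs, with L1 regularization for the sparsity of W and an elastic net for the sparsity of Φ [Zou & Hastie, 2005]. Optimization: stochastic gradient descent.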
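
The equations on this slide did not survive extraction. One objective that is consistent with the three labels on the slide (accurate approximation, L1 sparsity of W, elastic-net sparsity of Φ) is sketched below. It is a reconstruction under assumptions, not a quotation: N (number of training images), λ and γ (regularization weights), and the choice to express the elastic net as a per-basis constraint are all assumed here.

```latex
\min_{\Phi,\, W} \;\; \sum_{i=1}^{N}
  \left( \tfrac{1}{2} \left\| a_i - \Phi w_i \right\|_2^2
         + \lambda \left\| w_i \right\|_1 \right)
\quad \text{s.t.} \quad
  \left\| \Phi_j \right\|_1 + \tfrac{\gamma}{2} \left\| \Phi_j \right\|_2^2 \le 1
  \;\; \text{for every basis } \Phi_j .
```

The squared-error term is the "accurate approximation" label, the λ term gives the sparsity of W, and the constraint on each Φ_j is the elastic net of Zou & Hastie applied to the bases.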

21 Bases of Attributes & Parts: Testing. Input: a (with the learned Φ fixed). Output: sparse w. Estimate w so that Φw is an accurate approximation of a, with L1 regularization for the sparsity of w. Optimization: stochastic gradient descent.
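
A minimal sketch of the test-time step: with Φ fixed, estimate a sparse w for one image by minimizing ½‖a − Φw‖² + λ‖w‖₁. The slide names stochastic gradient descent as the optimizer; this sketch swaps in plain proximal gradient descent (ISTA) because it is short and deterministic. The function names, `lam`, and `steps` are assumptions for illustration, and the code assumes Φ has at least one nonzero entry.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(a, bases, lam=0.1, steps=500):
    """Estimate sparse coefficients w such that sum_j w[j] * bases[j] ~ a.

    a:     list of floats (image feature vector).
    bases: list of basis vectors (the columns of Phi), each the same length as a.
    """
    n, m = len(a), len(bases)
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # Phi^T Phi, so 1 / that value is a safe (conservative) step size.
    step = 1.0 / sum(x * x for phi in bases for x in phi)
    w = [0.0] * m
    for _ in range(steps):
        recon = [sum(w[j] * bases[j][i] for j in range(m)) for i in range(n)]
        resid = [r - ai for r, ai in zip(recon, a)]
        grad = [sum(bases[j][i] * resid[i] for i in range(n)) for j in range(m)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(m)]
    return w


# Toy example with three orthonormal bases: the weak entry 0.05 is zeroed out.
identity_bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w = sparse_code([1.0, 0.05, 0.0], identity_bases, lam=0.1)
```

With orthonormal bases the minimizer is simply the soft-thresholded input, so the toy call converges to approximately [0.9, 0.0, 0.0].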

22 Outline. Intuition: Action Attributes and Parts. Algorithm: Learning Bases of Attributes and Parts. Experiments: PASCAL VOC & Stanford 40 Actions. Conclusion.

23 PASCAL VOC 2010 Action Dataset (figure credit: Ivan Laptev). 9 classes, trainval / testing images per class. 14 attributes, trained from the trainval images; 27 objects, taken from Li et al., NIPS 2010; 150 poselets, taken from Bourdev & Malik, ICCV 2009.

24 VOC 2010: Classification Result. [Bar chart: average precision per class for Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Methods: Our method, use "a"; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

25 VOC 2010: Classification Result. [Same bar chart with one more method: Our method, use "w".]

26 VOC 2010: Analysis of Bases. [Same bar chart, shown with a visualization of the 400 action bases over attributes, objects, and poselets.]

27 VOC 2010: Analysis of Bases. [Same bar chart, shown with a visualization of the 400 action bases over attributes, objects, and poselets.]

28 VOC 2010: Analysis of Bases. [Same bar chart, shown with a visualization of the 400 action bases over attributes, objects, and poselets.]

29 VOC 2010: Control Experiment. [Chart: mean average precision when using "a" vs. using "w", for combinations of A (attribute), O (object), and P (poselet).]
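
The result slides report average precision (AP) per class and mean average precision across classes. A minimal sketch of both follows. It uses the simple non-interpolated definition (mean of the precision at each rank where a positive appears), which is an assumption made for brevity and is not necessarily the exact PASCAL VOC evaluation protocol; the function names are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: classifier confidences, one per test image.
    labels: 1 if the image belongs to the class, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """per_class: list of (scores, labels) pairs, one pair per action class."""
    aps = [average_precision(scores, labels) for scores, labels in per_class]
    return sum(aps) / len(aps)


# Positives ranked 1st and 3rd: AP = (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In this toy call the two positives sit at ranks 1 and 3, so the precisions averaged are 1/1 and 2/3.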

30 PASCAL VOC 2011 Result. [Chart: per-class results for Jumping, Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking. Series: others' best in comp9; others' best in comp10; our method.] Our method ranks first in nine out of ten classes in comp10.

31 PASCAL VOC 2011 Result. [Same chart.] Our method achieves the best performance in five out of ten classes if we consider both comp9 and comp10.

32 Stanford 40 Actions. 40 action classes, 9532 real-world images from Google, Flickr, etc. Classes: Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper.

33 Stanford 40 Actions. Same class list as slide 32. Highlighted classes: Riding bike, Fixing bike.

34 Stanford 40 Actions. Same class list as slide 32. Highlighted classes: Writing on board, Writing on paper.

35 Stanford 40 Actions. Same class list as slide 32. Highlighted classes: Drinking, Gardening, Smoking cigarette.

36 Stanford 40 Actions: Result. We use 45 attributes, 81 objects, and 150 poselets, and compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline. [Chart: average precision.]

37 Stanford 40 Actions: Result Average precision

38 Outline. Intuition: Action Attributes and Parts. Algorithm: Learning Bases of Attributes and Parts. Experiments: PASCAL VOC & Stanford 40 Actions. Conclusion.

39 Conclusion Attributes: …… Parts-Objects: …… Parts-Poselets: …… … Action bases Bases coefficients w Φ a : Image feature vector

40 Acknowledgement