School of Electronic Information Engineering, Tianjin University
Human Action Recognition by Learning Bases of Action Attributes and Parts
Jia pingping



Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion

Action Classification in Still Images
Low-level feature → high-level representation → action label (e.g. "Riding bike")
High-level representation:
- Semantic concepts (attributes): riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Objects and human poses (parts)
- Contexts of attributes & parts
Benefits: incorporates human knowledge; gives more understanding of image content; yields a more discriminative classifier.


Action Attributes and Parts
- Attributes: semantic descriptions of human actions, ……
- Each attribute is recognized with a discriminative classifier, e.g. an SVM that separates "riding bike" from "not riding bike".

Action Attributes and Parts
- Attributes: …… (attribute classification)
- Parts-Objects: …… (object detection)
- Parts-Poselets: …… (poselet detection)
- Parts are found with a pre-trained detector.
- a: image feature vector, built from the attribute classification, object detection, and poselet detection outputs.
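The construction of the image feature vector a can be sketched as a simple concatenation. This is a minimal illustration, assuming each attribute classifier and each part detector returns one confidence score per image; the function name `build_feature_vector` and the random placeholder scores are hypothetical, and the sizes 14, 27, and 150 follow the PASCAL setup given in the experiments.

```python
import numpy as np

def build_feature_vector(attr_scores, obj_scores, poselet_scores):
    """Concatenate per-image confidence scores into one feature vector a."""
    return np.concatenate([
        np.asarray(attr_scores, dtype=float),     # attribute classification
        np.asarray(obj_scores, dtype=float),      # object detection
        np.asarray(poselet_scores, dtype=float),  # poselet detection
    ])

rng = np.random.default_rng(0)
# Placeholder scores standing in for real classifier and detector outputs
a = build_feature_vector(rng.normal(size=14), rng.normal(size=27), rng.normal(size=150))
print(a.shape)  # (191,)
```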

Action Attributes and Parts
- Attributes: ……; Parts-Objects: ……; Parts-Poselets: ……
- a: image feature vector
- Φ: action bases
- w: bases coefficients, passed to an SVM that predicts the action (e.g. "Riding bike")


Bases of Attributes & Parts: Training
- Input: image feature vectors a
- Output: action bases Φ and sparse coefficients W
- Jointly estimate Φ and W: …
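The joint estimation of Φ and W can be sketched as an alternating minimization. This is only an illustration, assuming a standard L1-penalized reconstruction objective, 0.5·||A − WΦ||² + λ||W||₁, which may differ from the objective and constraints on the slide. Rows of `A` are feature vectors a, rows of `Phi` are bases, and the function names, the ISTA solver, and the toy sizes are all hypothetical choices.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_codes(A, Phi, lam, n_iter=100):
    """ISTA for min_W 0.5*||A - W @ Phi||^2 + lam*||W||_1 with Phi fixed."""
    L = np.linalg.norm(Phi, 2) ** 2 + 1e-12  # Lipschitz constant of the smooth term
    W = np.zeros((A.shape[0], Phi.shape[0]))
    for _ in range(n_iter):
        grad = (W @ Phi - A) @ Phi.T
        W = soft_threshold(W - grad / L, lam / L)
    return W

def learn_bases(A, n_bases, lam=0.1, n_outer=20, seed=0):
    """Alternate between sparse coding (W) and a least-squares update of Phi."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(n_bases, A.shape[1]))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)
    for _ in range(n_outer):
        W = sparse_codes(A, Phi, lam)
        Phi = np.linalg.lstsq(W, A, rcond=None)[0]
        # Rescale each base to unit length (unused bases stay at zero)
        Phi /= np.maximum(np.linalg.norm(Phi, axis=1, keepdims=True), 1e-12)
    return Phi, sparse_codes(A, Phi, lam)

rng = np.random.default_rng(1)
A = rng.normal(size=(60, 8))         # 60 toy feature vectors with 8 dimensions
Phi, W = learn_bases(A, n_bases=16)  # 16 bases, a toy size
print(Phi.shape, W.shape, float(np.mean(W == 0)))
```

The final call to `sparse_codes` recomputes W against the rescaled bases, so the returned pair is consistent.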

Bases of Attributes & Parts: Testing
- Input: image feature vector a (with the learned bases Φ)
- Output: sparse coefficients w
- Estimate w: …
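At test time the bases are fixed and only w is estimated. A minimal sketch, assuming the same L1-penalized reconstruction objective as a stand-in for the slide's formula; `estimate_w` and the toy bases are hypothetical, and ISTA is one of several solvers that could be used.

```python
import numpy as np

def estimate_w(a, Phi, lam=0.1, n_iter=200):
    """ISTA for min_w 0.5*||a - w @ Phi||^2 + lam*||w||_1 with Phi fixed."""
    L = np.linalg.norm(Phi, 2) ** 2  # Lipschitz constant of the smooth term
    w = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        z = w - ((w @ Phi - a) @ Phi.T) / L                    # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

# Toy bases: 3 orthonormal bases in a 4-dimensional feature space
Phi = np.eye(3, 4)
a = np.array([1.0, 0.05, -0.5, 0.0])
w = estimate_w(a, Phi)
print(w)  # approximately [0.9, 0.0, -0.4]
```

With orthonormal bases the solution is the soft-thresholded projection of a, so the weak second component is set exactly to zero. That is the sparsity the slide refers to.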


1. PASCAL Action Dataset
- Contains 9 classes, with 21,738 images in total;
- 50% of each class is randomly selected for training/validation, and the remaining images are used for testing;
- 14 attributes, 27 objects, 150 poselets;
- The number of action bases is set to 400 and 600 respectively. The λ and γ values are set to 0.1 and 0.15.
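The per-class random split can be sketched as follows. This is an illustrative helper rather than the presenter's code: `split_per_class`, the seed, and the toy labels are assumptions.

```python
import random
from collections import defaultdict

def split_per_class(labels, train_fraction=0.5, seed=0):
    """Return (train_idx, test_idx), sampling train_fraction of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * train_fraction))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)

# Toy labels; the real dataset has 9 action classes
labels = ["phoning"] * 6 + ["reading"] * 4 + ["running"] * 8
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))  # 9 9
```

Splitting inside each class keeps the class proportions the same in both halves, which a single global shuffle would not guarantee.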

Classification Result (figure: average precision per PASCAL action class)
- Classes: Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking
- Methods compared: our method using "a"; our method using "w"; Poselet (Maji et al., 2011); SURREY_MK; UCLEAR_DOSP
- Setting: 400 action bases (attributes, objects, poselets)
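The results are reported as average precision per class. A minimal sketch of the non-interpolated form of that metric is given below; the official PASCAL evaluation uses an interpolated precision-recall curve, so its numbers can differ slightly. The function name and the toy scores are illustrative.

```python
import numpy as np

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.asarray(labels)[order] == 1
    if not hits.any():
        return 0.0
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(precision_at_k[hits].mean())

scores = [0.9, 0.8, 0.7, 0.6]  # classifier confidence for one action class
labels = [1, 0, 1, 0]          # 1 = image shows the action
print(average_precision(scores, labels))  # (1/1 + 2/3) / 2, about 0.833
```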

Control Experiment (figure): use "a" vs. use "w", for combinations of A: attribute, O: object, P: poselet

2. Stanford 40 Actions
Classes: Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper

2. Stanford 40 Actions
- Contains 40 diverse daily human actions;
- 180∼300 images for each class, 9532 real-world images in total;
- All the images are obtained from Google, Bing, and Flickr;
- Large variations in human pose, appearance, and background clutter.

Result:
- 100 images in each class are randomly selected for training, and the remaining images are used for testing.
- 45 attributes, 81 objects, 150 poselets.
- The number of action bases is set to 400 and 600 respectively. The λ and γ values are set to 0.1 and …
- Our method is compared with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline (figure: average precision).

Control Experiment (figure): use "a" vs. use "w", for combinations of A: attribute, O: object, P: poselet


Partwise Bag-of-Words (PBoW) Representation:
- Local feature
- Body part localization
- PBoW generation: head-wise BoW, limb-wise BoW, leg-wise BoW, foot-wise BoW
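The PBoW generation step can be sketched as one bag-of-words histogram per body part. This is a toy illustration, assuming local descriptors have already been extracted and assigned to a part by the localization step; the codebook, descriptor values, and function names are hypothetical.

```python
import numpy as np

PARTS = ("head", "limb", "leg", "foot")

def bow_histogram(descriptors, codebook):
    """L1-normalised histogram of nearest-codeword assignments."""
    hist = np.zeros(len(codebook))
    if len(descriptors):
        d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        hist += np.bincount(d2.argmin(axis=1), minlength=len(codebook))
        hist /= hist.sum()
    return hist

def pbow(descriptors, part_of, codebook):
    """One BoW histogram per body part, keyed by part name."""
    part_of = np.asarray(part_of)
    return {p: bow_histogram(descriptors[part_of == p], codebook) for p in PARTS}

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 toy visual words
descriptors = np.array([[0.1, 0.0], [0.9, 1.0], [0.0, 0.9], [1.0, 0.9]])
part_of = ["head", "head", "leg", "foot"]  # assumed output of part localization
hists = pbow(descriptors, part_of, codebook)
print(hists["head"])  # half of the head descriptors fall on word 0, half on word 1
```

A part with no descriptors (here "limb") gets an all-zero histogram, so every sample ends up with the same four fixed-length vectors.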

Local Action Attribute Method:
1. Label the action samples according to different parts. For each part, we define a new set of low-level semantics to re-classify the training action samples:
- Head: static, vertical move, horizontal move
- Limb: static, swing, …
- Leg: static, …
- Foot: static, …

Local Action Attribute Method:
2. For each part, train a set of attribute classifiers according to the set of semantics defined for that part.

Local Action Attribute Method:
3. For each action sample, map its low-level representation to a middle-level representation through the following framework: one action sample → head-wise BoW, limb-wise BoW, leg-wise BoW, foot-wise BoW → combine these four parts to build a new histogram representation of the sample.

Local Action Attribute Method:
4. Thus, based on local action attributes, we construct a new descriptor of action samples, which can be used for classification (training set / testing set, with an SVM or K-NN classifier).
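The K-NN option for classifying the new descriptors can be sketched as below. This is a minimal illustration with made-up two-dimensional descriptors and hypothetical labels ("walk", "jump"); the choice of Euclidean distance and k = 3 is an assumption.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, test_x, k=3):
    """Label each test descriptor by majority vote among its k nearest training descriptors."""
    preds = []
    for x in test_x:
        dist = np.linalg.norm(train_x - x, axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        preds.append(Counter(train_y[i] for i in nearest).most_common(1)[0][0])
    return preds

train_x = np.array([[0.0, 0.1], [0.1, 0.0], [0.2, 0.1],
                    [1.0, 0.9], [0.9, 1.0], [1.0, 1.0]])
train_y = ["walk", "walk", "walk", "jump", "jump", "jump"]
test_x = np.array([[0.1, 0.1], [0.95, 0.95]])
print(knn_predict(train_x, train_y, test_x))  # ['walk', 'jump']
```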

School of Electronic Information Engineering, Tianjin University Thank you