School of Electronic Information Engineering, Tianjin University. Human Action Recognition by Learning Bases of Action Attributes and Parts. Jia Pingping

Outline:
1 Action Classification in Still Images
2 Intuition: Action Attributes and Parts
3 Algorithm: Learning Bases of Attributes and Parts
4 Experiments: PASCAL & Stanford 40 Actions
5 Conclusion

Action Classification in Still Images
Low-level feature → High-level representation → Riding bike
The high-level representation is built from:
- Attributes: semantic concepts
- Parts: objects and human poses
- Contexts of attributes & parts
Example for riding bike: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
Benefits: incorporates human knowledge; more understanding of image content; more discriminative classifier.

Action Attributes and Parts
Attributes: …… semantic descriptions of human actions (e.g. riding bike vs. not riding bike), each learned with a discriminative classifier, e.g. SVM.
Parts-Objects: …… and Parts-Poselets: …… each detected with a pre-trained detector.
Attribute classification, object detection and poselet detection → a: image feature vector.
a is expressed through the action bases Φ and the bases coefficients w; an SVM then predicts the action (e.g. riding bike).
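The construction of a can be sketched as follows. This is a minimal illustration, assuming a is simply the concatenation of the attribute-classifier, object-detector and poselet-detector scores; the function name `build_feature_vector` and the sample scores are hypothetical and do not come from the slides.

```python
from typing import List, Sequence


def build_feature_vector(attribute_scores: Sequence[float],
                         object_scores: Sequence[float],
                         poselet_scores: Sequence[float]) -> List[float]:
    """Concatenate the three score groups into one vector a (assumed layout)."""
    return list(attribute_scores) + list(object_scores) + list(poselet_scores)


# Hypothetical scores for one image: 2 attributes, 2 objects, 3 poselets.
a = build_feature_vector([0.9, 0.1], [0.8, 0.0], [0.3, 0.7, 0.2])
print(len(a))  # 7
```

Under this assumption the length of a equals the number of attributes plus objects plus poselets.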

Bases of Attributes & Parts: Training. Input: a. Output: Φ and W, sparse. Jointly estimate Φ and W.
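Joint estimation of this kind is usually posed as an optimization problem. As a hedged sketch, and not necessarily the objective used in the presentation, a standard sparse dictionary-learning formulation is shown below. The N training vectors a_i, the penalty weight λ and the unit-norm constraint on each basis Φ_j are assumptions made for illustration.

```latex
\min_{\Phi,\, W} \; \sum_{i=1}^{N} \left( \tfrac{1}{2} \lVert a_i - \Phi w_i \rVert_2^2
  + \lambda \lVert w_i \rVert_1 \right)
\quad \text{s.t.} \quad \lVert \Phi_j \rVert_2 \le 1 \;\; \forall j,
\qquad W = [w_1, \dots, w_N].
```

The ℓ1 term is what drives each w_i toward sparsity. The norm constraint only keeps the bases from growing without bound, and the presentation's own constraint may differ.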

Bases of Attributes & Parts: Testing. Input: a (with the learned Φ). Output: w, sparse. Estimate w.
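With Φ fixed, estimating w for a test image is a sparse-coding step. The sketch below assumes the objective ½‖a − Φw‖² + λ‖w‖₁ and solves it with ISTA (iterative soft-thresholding), a standard solver swapped in here for illustration. The names `estimate_w` and `lam` and the toy Φ are hypothetical.

```python
from typing import List


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t (proximal operator of t * |x|)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def estimate_w(phi: List[List[float]], a: List[float],
               lam: float, n_iter: int = 500) -> List[float]:
    """Approximately minimise 0.5 * ||a - phi w||^2 + lam * ||w||_1 with ISTA.

    phi[i][j] is entry i of basis j, so phi has len(a) rows.
    """
    n_rows, n_bases = len(phi), len(phi[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient, so its inverse is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for row in phi for v in row)
    w = [0.0] * n_bases
    for _ in range(n_iter):
        residual = [sum(phi[i][j] * w[j] for j in range(n_bases)) - a[i]
                    for i in range(n_rows)]
        grad = [sum(phi[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_bases)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam)
             for j in range(n_bases)]
    return w


# Toy example with two orthonormal bases (hypothetical values).
phi = [[1.0, 0.0],
       [0.0, 1.0]]
w = estimate_w(phi, a=[1.0, 0.05], lam=0.1)
print([round(v, 3) for v in w])  # [0.9, 0.0]
```

In the toy example the second coefficient is driven exactly to zero, which is the sparsity the slide refers to.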

1. PASCAL Action Dataset http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/

1. PASCAL Action Dataset: contains 9 classes, with 21,738 images in total; 50% of each class is randomly selected for training/validation and the remaining images are used for testing; 14 attributes, 27 objects, 150 poselets; the number of action bases is set to 400 and 600, respectively (for the PASCAL and Stanford 40 Actions experiments); the two regularization parameters are set to 0.1 and 0.15.
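The per-class random split described above can be sketched as follows. This is a minimal illustration using only the standard library; the function name `split_per_class`, the fixed seed and the toy labels are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def split_per_class(labels: Sequence[Hashable], train_fraction: float = 0.5,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign a fraction of each class to training, the rest to testing."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


# Toy labels: 4 images of one class, 6 of another.
labels = ["phoning"] * 4 + ["reading"] * 6
train_idx, test_idx = split_per_class(labels)
print(len(train_idx), len(test_idx))  # 5 5
```

Splitting inside each class keeps the class proportions identical in both subsets, which a single global shuffle would not guarantee.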

Classification Result
[Figure: average precision per class (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking) for our method using “a”, our method using “w”, Poselet (Maji et al., 2011), SURREY_MK and UCLEAR_DOSP. Annotation: 400 action bases; attributes, objects, poselets.]
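Average precision, the metric reported in these results, can be computed from a ranked list of classifier scores. The sketch below uses the plain, non-interpolated definition, which is an assumption: the PASCAL evaluation protocol may use an interpolated variant. The function name and the sample scores are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float],
                      is_positive: Sequence[bool]) -> float:
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, is_positive), key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Positives ranked 1st and 3rd: (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(round(ap, 3))  # 0.833
```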

Control Experiment
[Figure: use “a” vs. use “w”. Legend: A: attribute; O: object; P: poselet.]

2. Stanford 40 Actions http://vision.stanford.edu/Datasets/40actions.html
Classes: Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper.

2. Stanford 40 Actions: contains 40 diverse daily human actions; 180 to 300 images for each class, 9,532 real-world images in total; all the images are obtained from Google, Bing, and Flickr; large variations in human pose, appearance, and background clutter.
[Figure: example images, including Drinking, Gardening and Smoking cigarette.]

Result: 100 images in each class are randomly selected for training, and the remaining images are used for testing. 45 attributes, 81 objects, 150 poselets. The number of action bases is set to 400 and 600, respectively (for the PASCAL and Stanford 40 Actions experiments). The two regularization parameters are set to 0.1 and 0.15. Our method is compared with the Locality-constrained Linear Coding baseline (LLC; Wang et al., CVPR 2010).
[Figure: average precision.]

Control Experiment
[Figure: use “a” vs. use “w”. Legend: A: attribute; O: object; P: poselet.]

Partwise Bag-of-Words (PBoW) Representation:
- Local feature
- Body part localization
- PBoW generation: head-wise BoW, limb-wise BoW, leg-wise BoW, foot-wise BoW
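The PBoW generation step can be sketched as follows. This is a minimal illustration that assumes each local feature has already been quantised to a visual-word index and assigned to a body part by the localization step; the function name `partwise_bow`, the vocabulary size and the sample features are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

PARTS = ("head", "limb", "leg", "foot")


def partwise_bow(features: Iterable[Tuple[str, int]],
                 vocab_size: int) -> Dict[str, List[int]]:
    """Build one visual-word histogram per body part.

    Each feature is a (part_name, visual_word_index) pair.
    """
    histograms = {part: [0] * vocab_size for part in PARTS}
    for part, word in features:
        histograms[part][word] += 1
    return histograms


# Hypothetical features for one action sample, vocabulary of 3 words.
sample = [("head", 0), ("head", 2), ("leg", 1), ("head", 0)]
bows = partwise_bow(sample, vocab_size=3)
print(bows["head"], bows["leg"])  # [2, 0, 1] [0, 1, 0]
```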

Local Action Attribute Method: 1. Label the action samples according to different parts. For each part, we define a new set of low-level semantics to re-classify the training action samples. Head: static, vertical move, horizontal move. Limb: static, swing, … Leg: static, … Foot: static, …

Local Action Attribute Method: 2. For each part, train a set of attribute classifiers according to the set of semantics defined for that part.

Local Action Attribute Method: 3. For each action sample, map its low-level representation to a middle-level representation through the following framework: one action sample → head-wise BoW, limb-wise BoW, leg-wise BoW, foot-wise BoW. Combine these four parts to build a new histogram representation of the sample.
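One way to picture this mapping is sketched below. It assumes that every part-wise BoW is scored by that part's attribute classifiers and that the scores of the four parts are concatenated in a fixed order. The scorers here are stand-in functions rather than trained classifiers, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def middle_level_representation(
        part_bows: Dict[str, Sequence[float]],
        part_classifiers: Dict[str, List[Scorer]],
        part_order: Sequence[str] = ("head", "limb", "leg", "foot"),
) -> List[float]:
    """Score each part's BoW with its attribute classifiers and concatenate."""
    descriptor: List[float] = []
    for part in part_order:
        bow = part_bows[part]
        descriptor.extend(clf(bow) for clf in part_classifiers[part])
    return descriptor


def total(bow: Sequence[float]) -> float:
    """Stand-in scorer: sum of the histogram bins."""
    return float(sum(bow))


def peak(bow: Sequence[float]) -> float:
    """Stand-in scorer: largest histogram bin."""
    return float(max(bow))


classifiers = {"head": [total, peak], "limb": [total],
               "leg": [peak], "foot": [total]}
part_bows = {"head": [1, 2], "limb": [0, 3], "leg": [4, 1], "foot": [0, 0]}
print(middle_level_representation(part_bows, classifiers))
# [3.0, 2.0, 3.0, 4.0, 0.0]
```

The length of the resulting descriptor equals the total number of attribute classifiers over all parts, independent of the BoW vocabulary size.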

Local Action Attribute Method: 4. Thus, based on local action attributes, we construct a new descriptor of action samples, which can be used for classification. Classifiers: SVM and K-NN, each with a training set and a testing set.
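The K-NN option can be sketched as follows. This is a minimal illustration, assuming Euclidean distance between descriptors and a majority vote; the function name `knn_predict`, the value of k and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(train_x: Sequence[Sequence[float]],
                train_y: Sequence[Hashable],
                query: Sequence[float], k: int = 3) -> Hashable:
    """Return the majority label among the k nearest training descriptors."""
    nearest: List[int] = sorted(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy descriptors with hypothetical action labels.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["walking", "walking", "running", "running"]
print(knn_predict(train_x, train_y, [0.2, 0.1]))  # walking
```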

Thank you
