Human Action Recognition by Learning Bases of Action Attributes and Parts
Jia Pingping
School of Electronic Information Engineering, Tianjin University
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Action Classification in Still Images
Low-level feature → High-level representation:
- Semantic concepts – Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Objects
- Human poses – Parts
- Contexts of attributes & parts
A high-level representation incorporates human knowledge, gives more understanding of image content, and yields a more discriminative classifier.
→ Riding bike
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Action Attributes and Parts
Attributes: semantic descriptions of human actions (e.g. riding bike vs. not riding bike), learned with a discriminative classifier, e.g. an SVM.
Action Attributes and Parts
Parts – Objects and Parts – Poselets: each obtained with a pre-trained detector.
The outputs of attribute classification, object detection, and poselet detection are concatenated into an image feature vector a.
a is then decomposed over a set of action bases Φ with sparse bases coefficients w, and an SVM on w predicts the action label (e.g. riding bike).
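A minimal sketch of this pipeline on synthetic data (all sizes, parameter values, and data are illustrative, not the ones from the slides): given bases Φ, each feature vector a is sparsely encoded into coefficients w, and an SVM is trained on the w's.

```python
import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_bases, dim = 20, 50                    # illustrative; the slides use 400/600 bases
Phi = rng.normal(size=(n_bases, dim))    # action bases, one basis per row
A = rng.normal(size=(100, dim))          # feature vectors a for 100 images
y = rng.integers(0, 2, size=100)         # binary action labels (riding bike or not)

# sparse coefficients W such that each row of A is approximated by W @ Phi,
# with an L1 penalty encouraging sparsity
W = sparse_encode(A, Phi, algorithm='lasso_lars', alpha=0.1)

clf = LinearSVC(dual=False).fit(W, y)    # discriminative classifier on w
```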
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Bases of Attributes & Parts: Training
Input: image feature vectors a.
Output: action bases Φ and sparse coefficients W.
Jointly estimate Φ and W.
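The joint estimation of Φ and W is standard dictionary learning, alternating between updating the coefficients and the bases. A sketch using scikit-learn's `DictionaryLearning` on synthetic data (sizes and parameters are illustrative, not the slides' settings):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 50))       # training feature vectors a (illustrative)

# jointly estimate bases Phi and sparse coefficients W by alternating
# minimization of sum_i ||a_i - w_i Phi||^2 + alpha * ||w_i||_1
dl = DictionaryLearning(n_components=20, alpha=0.1, max_iter=50, random_state=0)
W = dl.fit_transform(A)              # sparse coefficients, one row per image
Phi = dl.components_                 # learned action bases
```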
Bases of Attributes & Parts: Testing
Input: image feature vector a; the learned bases Φ are fixed.
Output: sparse coefficients w.
Estimate w.
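With Φ fixed, estimating w for one test vector a is a Lasso problem. A sketch on synthetic data (sizes and the test vector are made up; a is built near basis 3, so the recovered w should concentrate there):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 50))            # learned bases, fixed at test time
a = Phi[3] + 0.01 * rng.normal(size=50)    # synthetic test vector close to basis 3

lasso = Lasso(alpha=0.05, max_iter=10000)
lasso.fit(Phi.T, a)                        # solve a ~ Phi.T @ w with an L1 penalty
w = lasso.coef_                            # sparse coefficients for this image
```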
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
1. PASCAL Action Dataset
1. PASCAL Action Dataset
- Contains 9 classes; 21,738 images in total.
- Randomly select 50% of each class for training/validation and the remaining images for testing.
- 14 attributes, 27 objects, 150 poselets.
- The numbers of action bases are set to 400 and 600, respectively.
- The two regularization parameters are set to 0.1 and 0.15.
Classification Result
(Bar chart: average precision on the classes phoning, playing instrument, reading, riding bike, riding horse, running, taking photo, using computer, and walking, comparing our method using "a", our method using "w", Poselet (Maji et al., 2011), SURREY_MK, and UCLEAR_DOSP; 400 action bases built from attributes, objects, and poselets.)
Control Experiment
(Chart: classification using "a" vs. "w" over different combinations of A: attribute, O: object, P: poselet.)
2. Stanford 40 Actions
Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper
2. Stanford 40 Actions
- Contains 40 diverse daily human actions;
- 180–300 images per class, 9,532 real-world images in total;
- All images are obtained from Google, Bing, and Flickr;
- Large variations in human pose, appearance, and background clutter.
(Example images: Drinking, Gardening, Smoking cigarette.)
Result:
- Randomly select 100 images in each class for training, and the remaining images for testing.
- 45 attributes, 81 objects, 150 poselets.
- The numbers of action bases are set to 400 and 600, respectively. The two regularization parameters are set to 0.1 and …
- Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.
(Chart: average precision.)
Control Experiment
(Chart: classification using "a" vs. "w" over different combinations of A: attribute, O: object, P: poselet.)
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Partwise Bag-of-Words (PBoW) Representation:
local feature → body part localization → PBoW generation (head-wise BoW, limb-wise BoW, leg-wise BoW, foot-wise BoW)
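A sketch of PBoW generation under simple assumptions (hard vector quantization against a given codebook; the function name and all shapes are hypothetical):

```python
import numpy as np

def partwise_bow(descriptors, part_ids, codebook, n_parts=4):
    """Quantize each local descriptor against the codebook and accumulate
    one BoW histogram per body part (head, limb, leg, foot), then
    concatenate the normalized histograms."""
    k = codebook.shape[0]
    hist = np.zeros((n_parts, k))
    for d, p in zip(descriptors, part_ids):
        word = np.argmin(np.linalg.norm(codebook - d, axis=1))  # nearest codeword
        hist[p, word] += 1
    sums = np.maximum(hist.sum(axis=1, keepdims=True), 1)       # avoid divide-by-zero
    return (hist / sums).reshape(-1)                            # concatenated PBoW
```

Descriptors localized on the head contribute only to the head-wise histogram, and likewise for the other parts.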
Local Action Attribute Method:
1. Label the action samples according to different parts. For each part, we define a new set of low-level semantic labels to re-label the training action samples:
- Head: static, vertical move, horizontal move, …
- Limb: static, swing, …
- Leg: static, …
- Foot: static, …
Local Action Attribute Method:
2. For each part, train a set of attribute classifiers according to the semantic labels we defined.
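Step 2 can be sketched as one binary one-vs-rest classifier per (part, semantic label) pair; the label sets and training data here are hypothetical placeholders:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_samples, bow_dim = 60, 30
# hypothetical per-part semantic label sets from step 1
part_labels = {'head': ['static', 'vertical move', 'horizontal move'],
               'limb': ['static', 'swing']}

classifiers = {}
for part, labels in part_labels.items():
    X = rng.random((n_samples, bow_dim))        # part-wise BoW histograms
    for lbl in labels:
        y = rng.integers(0, 2, n_samples)       # 1 iff a sample carries this label
        # one-vs-rest attribute classifier for this (part, label) pair
        classifiers[(part, lbl)] = LinearSVC(dual=False).fit(X, y)
```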
Local Action Attribute Method:
3. For each action sample, map its low-level representation to a middle-level representation: compute the head-wise, limb-wise, leg-wise, and foot-wise BoWs, then combine these four parts to build a new histogram representation of the sample.
Local Action Attribute Method:
4. Thus, based on local action attributes, we construct a new descriptor of action samples, which can be used for classification (e.g. SVM or K-NN over the training and testing sets).
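Steps 3 and 4 can be sketched end-to-end: concatenate the per-part attribute scores into the new mid-level descriptor and feed it to an SVM or a K-NN classifier (all sizes and data are synthetic placeholders):

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_train, n_test, n_scores = 80, 20, 10   # hypothetical sizes

# mid-level descriptor: concatenated attribute scores over the four parts
X_train = rng.normal(size=(n_train, 4 * n_scores))
X_test = rng.normal(size=(n_test, 4 * n_scores))
y_train = rng.integers(0, 3, n_train)    # action labels

svm_pred = LinearSVC(dual=False).fit(X_train, y_train).predict(X_test)
knn_pred = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)
```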
School of Electronic Information Engineering, Tianjin University Thank you