Structural Human Action Recognition from Still Images Moin Nabi Computer Vision Lab. ©IPM - Oct. 2010
Problem Definition How can we recognize human action from a single Image? ?
Problem Definition Pose as a Latent Valiable
Application News/sports image retrieval and analysis An important cue for video-based action recognition
Previous Works Global template-based representation HOG by Dalal and Triggs. And, Ikizler-Cinbis et al. ICCV09 5 Action Label Pose estimation -> action recognition e.g. Ramanan and Forsyth NIPS03, Ferrari et al. CVPR09
Our Work Examplar based representation Using Poselet as a new definition of a part 6 Pose estimation + action recognition
Discriminative Pose Golfing? Walking? All elements of pose are not equally important Develop integrated learning framework to estimate pose for action recognition
Pose Representation 8 We use a coarse non-parametric pose representation – An action-specific variant of the poselet [Bourdev&Malik ICCV09] A poselet is a set of patches not only with similar pose configuration, but also from the same action class.
Poselets 9 Poselets obtained by clustering ground-truth joint positions of body parts for each action
Poselets Visualization of the Poselets for Running images For Every Action Class: 1.Devide annotation to 4 parts 2.Cluster on normalized x,y 3.Remove small clusters 4.Crop that part of image Learn SVM for every Poselet with HoG +: From that action -:same part from other action 5 (Actions) x 4 (Parts) x 5 (Clusters) = 100 – 10 (Remove) = 90 = 26 (leg) + 20 (L-arm) + 20 (R-arm) + 24 (Upper body)
Model Formation ⌂ Using Pictorial Structure Model of Pedro Felzenswalb Training: Test:
Model Formation Develop a scoring function – Should have high score for correct action label – Low score for other action labels – Model parameters
Model Formation 13 Pose Action Label Image Choose best pose L
Model Formation 14 Pose Action Label Image Running Large score for
Model Formation 15 Pose Action Label Image Sitting Small score for
Model Formulation Pairwise Relation Part Appearance
Part Appearance Potential 17 Pose Action Label Image Poselet matching
Pairwise Potential 18 Pose Action Label Image Relative body part locations
Full Model 19 Pose Action Label Image Model parameters learned using max-margin
Learning and Inference
Latent SVM We Should Minimize Loss Function !
Latent SVM
Results 23 Still image action dataset (Internet Image) – Five action categories – 2458 images total – Train using 1/3 of images from each category
Visualization of latent pose 24 Successful classification examples Unsuccessful classification examples
Any question ?