Describing objects by their attributes, A. Farhadi, I. Endres, D. Hoiem, D. Forsyth, CVPR 2009. Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer, C. Lampert, H. Nickisch, S. Harmeling, CVPR 2009. Many others.
Computer Vision image features classification [.1 -.9.1.231 -.1] ?
Computer Vision image features classification What feature representation should we use?
Computer Vision image features classification [.1 -.9.1.231 -.1] Now we can talk… Attributes: has hair, has skin, has ear, has eye, has arms
Attributes: Properties shared by many objects. Explicit semantics. Facilitate human-computer communication. Materials (glass, fur, wood, etc.). Parts (has wheel, has tail, etc.). Shape (boxy, cylindrical, etc.). Based on a slide by David Forsyth
Example Attributes: FaceTracer image search, "Smiling Asian Men With Glasses" (Kumar et al., 2008)
Example Attributes: Lampert et al. 2009. Slide credit: Devi Parikh
Example Attributes: Welinder et al. 2010. Slide credit: Devi Parikh
Attribute Models: Classifiers for binary attributes. Kumar et al. 2010. Slide credit: Devi Parikh
Why attributes? How humans naturally describe visual concepts. Image search: "I want elegant silver sandals with high heels." Slide credit: Devi Parikh
Example Attributes Verification classifier SAME Kumar et al., 2010
Why attributes? "An okapi is a mammal with a reddish dark back, with striking horizontal white stripes on the front and back legs." (Wikipedia)
Zero-shot Learning. Aye-ayes: are nocturnal, live in trees, have large eyes, have long middle fingers. Which one of these is an aye-aye? Humans can learn from descriptions (zero examples). Slide adapted from Christoph Lampert by Devi Parikh
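One way to make the idea of learning from a description concrete is to score each class by how well its attribute description matches the attributes predicted for an image. The sketch below is only an illustration under stated assumptions: the attribute names come from the aye-aye slide, while the class signatures, the probabilities, and the function names (`class_log_likelihood`, `zero_shot_classify`) are hypothetical and assume independent attribute predictions.

```python
import math

# Attribute vocabulary taken from the aye-aye description above.
ATTRIBUTES = ["nocturnal", "lives_in_trees", "large_eyes", "long_middle_fingers"]

# Class descriptions as binary attribute signatures (illustrative values only).
CLASS_SIGNATURES = {
    "aye-aye": [1, 1, 1, 1],
    "giraffe": [0, 0, 0, 0],
    "owl":     [1, 1, 1, 0],
}

def class_log_likelihood(attr_probs, signature, eps=1e-6):
    """Log-probability that independent attribute predictions match a signature."""
    total = 0.0
    for p, s in zip(attr_probs, signature):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += math.log(p if s == 1 else 1.0 - p)
    return total

def zero_shot_classify(attr_probs, signatures=CLASS_SIGNATURES):
    """Return the class whose description best explains the predicted attributes."""
    return max(signatures, key=lambda c: class_log_likelihood(attr_probs, signatures[c]))

# Made-up attribute classifier outputs P(attribute present | image).
image_attr_probs = [0.9, 0.8, 0.95, 0.85]
prediction = zero_shot_classify(image_attr_probs)
```

No training image of the target class is needed here: only the attribute classifiers are trained, and a new class is added by writing down its signature.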
Is this a giraffe? No. Is this a giraffe? Yes. Is this a giraffe? No. Slide credit: Devi Parikh
"I think this is a giraffe. What do you think?" "No, its neck is too short for it to be a giraffe." "Ah! These must not be giraffes either then." [Animals with even shorter necks] Current belief. Focused feedback. Knowledge of the world. Feedback on one, transferred to many. Learner learns better from its mistakes. Accelerated discriminative learning with few examples. Parkash and Parikh, 2012. Slide credit: Devi Parikh
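The "feedback on one, transferred to many" step can be sketched as a simple filter over attribute scores. This is a minimal illustration, not the method of Parkash and Parikh: the function name `transfer_negative_feedback`, the image ids, and the neck-length scores are all hypothetical, and the scores are assumed to come from some attribute-strength predictor.

```python
def transfer_negative_feedback(candidates, rejected, attribute_scores):
    """Return every candidate ruled out by 'rejected is too low on this attribute'.

    candidates: list of image ids that are currently unlabeled
    rejected: the image id the feedback was given on
    attribute_scores: dict image id -> predicted attribute strength (e.g. neck length)
    """
    threshold = attribute_scores[rejected]
    # Anything with an equal or weaker attribute than the rejected image
    # cannot belong to the class either.
    return [c for c in candidates if attribute_scores[c] <= threshold]

# Made-up neck-length scores from a hypothetical attribute predictor.
neck_length = {"img_a": 0.9, "img_b": 0.4, "img_c": 0.2, "img_d": 0.35, "img_e": 0.7}
unlabeled = ["img_a", "img_c", "img_d", "img_e"]

# Feedback: "img_b is not a giraffe; its neck is too short."
new_negatives = transfer_negative_feedback(unlabeled, "img_b", neck_length)
```

One piece of feedback on `img_b` yields two extra negative examples, which is the source of the accelerated learning claimed on the slide.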
Which Attributes to Describe? [Figure: panels (a) to (f)] "Please choose a person to the left of the person who is frowning." Sadovnik et al. 2013
Related Work: Describing objects by attributes. Learn semantic attributes for object classification [Farhadi et al., 2009]. Clothing recognition with collar, sleeve length, placket, etc. [Zhang et al. 2008]
Related Work: Person identification with clothing. Bounding box under face [Anguelov, 2007]. Clothing segmentation [Gallagher, 2008]
Dataset Preparation 1856 people from the web. Images are unconstrained.
Dataset Preparation $400 spent for collecting 283,107 labels on Amazon Mechanical Turk (AMT).
The System. Stages: pose estimation; feature extraction & quantization; attribute classifiers 1 … M; multi-attribute CRF inference. Attribute classifier detail: Feature 1 … Feature N, SVM 1 … SVM N, combine features, SVM. Predictions: blue, solid pattern, outerwear, wear scarf, long sleeve. CRF diagram: A = attribute, F = feature (nodes A1 … A4 with features F1 … F4).
Pose Estimation [Eichner et al., 2010]: Perform upper body detection using complementary results from a face detector and deformable part models. Highlight the foreground within the enlarged upper body bounding box. Parse the upper body into head, torso, and the upper and lower parts of the left and right arms.
Feature Extraction: SIFT descriptors are extracted over the sampling grid. A similar procedure is used for the arm regions.
[Figure: an RGB image and the features computed from it: Maximum Response Filters (MRF bank) [Varma 2005], LAB color, skin probability]
Feature Extraction: Raw features are quantized using soft K-means (K=5 in our implementation). Quantized features are aggregated over various body regions, by max or average pooling. For learning color attributes, the feature is LAB color aggregated from non-skin regions.
Feature type: SIFT, texture, color, skin probability.
Region: torso, left upper arm, right upper arm, left lower arm, right lower arm.
Pooling method: average, max.
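The quantize-then-pool step can be sketched as follows. The slide does not spell out the exact soft K-means scheme, so this sketch makes an assumption: each descriptor votes for its `k` nearest codewords with Gaussian weights. The function names `soft_assign` and `pool`, the `sigma` parameter, and the toy codebook are hypothetical.

```python
import math

def soft_assign(descriptor, codebook, k=5, sigma=1.0):
    """Soft-quantize one descriptor over its k nearest codewords.

    Returns a vector of length len(codebook) whose entries sum to 1.
    """
    dists = [sum((d - c) ** 2 for d, c in zip(descriptor, word)) for word in codebook]
    nearest = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]
    d_min = dists[nearest[0]]  # subtract the minimum for numerical stability
    weights = {i: math.exp(-(dists[i] - d_min) / (2.0 * sigma ** 2)) for i in nearest}
    total = sum(weights.values())
    return [weights.get(i, 0.0) / total for i in range(len(codebook))]

def pool(codes, method="average"):
    """Aggregate per-descriptor codes from one body region into one vector."""
    columns = list(zip(*codes))
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError("method must be 'max' or 'average'")

# Toy 2-D descriptors and a toy 3-word codebook (real SIFT descriptors are 128-D).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
torso_descriptors = [[0.1, 0.0], [0.9, 0.1]]
codes = [soft_assign(d, codebook, k=2) for d in torso_descriptors]
torso_avg = pool(codes, "average")
torso_max = pool(codes, "max")
```

Each (feature type, region, pooling method) combination from the lists above would produce one such pooled vector.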
Feature Fusion: SVM is a kernel-based classification technique. Feature fusion solution: a combined SVM is trained using a weighted sum of the kernels. Combining features consistently outperforms the single best feature. [Diagram: kernels K1 … KN with SVM 1 … SVM N and prediction accuracies 1 … N; kernels K1 … KN combined into one SVM that gives the attribute prediction]
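A weighted sum of kernels can be sketched directly on Gram matrices. The weighting rule below (normalizing the per-feature prediction accuracies) is an assumption suggested by the diagram, not something the slide states; `combine_kernels`, `accuracy_weights`, and the toy matrices are hypothetical.

```python
def combine_kernels(kernels, weights):
    """Weighted sum of same-sized Gram matrices (lists of lists)."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for gram, w in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * gram[i][j]
    return combined

def accuracy_weights(accuracies):
    """Assumed weighting rule: normalize per-feature accuracies to sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

# Toy Gram matrices for two feature types over two training images.
K_sift = [[1.0, 0.2], [0.2, 1.0]]
K_color = [[1.0, 0.6], [0.6, 1.0]]

weights = accuracy_weights([0.6, 0.9])  # roughly [0.4, 0.6]
K_combined = combine_kernels([K_sift, K_color], weights)
```

The combined matrix can then be handed to any SVM trainer that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.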
Recap. Stages: pose estimation; feature extraction & quantization; attribute classifiers 1 … M; multi-attribute CRF inference. Attribute classifier detail: Feature 1 … Feature N, SVM 1 … SVM N, combine features, SVM. Predictions: blue, solid pattern, outerwear, wear scarf, long sleeve. CRF diagram: A = attribute, F = feature (nodes A1 … A4 with features F1 … F4).
Attribute Inference with CRF: Each attribute is a node. All nodes are pairwise connected. The edge connecting two nodes corresponds to the joint probability of these two attributes. A_i: attribute i. F_i: features for A_i. [Diagram: fully connected nodes A1 … A6, each with its features F1 … F6]
CRF for Attribute Learning: For a fully connected CRF, we maximize: [equation image with node potential and edge potential terms]. The CRF potential is maximized using the standard belief propagation technique [Tappen et al. 2003]. [Following CRF model] [Diagram: nodes A1, A2, …, AM with features F1, F2, …, FM]
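To show how node and edge potentials interact, here is a minimal sketch of MAP inference on a tiny fully connected model. It assumes the objective has the usual pairwise form (sum of log node potentials plus sum of log edge potentials), and it swaps in exhaustive search for belief propagation, which is only practical for a handful of binary attributes. The function `map_attributes` and all probabilities are hypothetical.

```python
import itertools
import math

def map_attributes(node_prob, pair_prob):
    """Exhaustive MAP inference for a fully connected pairwise model on binary attributes.

    node_prob[i][a]         : classifier score P(A_i = a | F_i), a in {0, 1}
    pair_prob[(i, j)][a][b] : co-occurrence score P(A_i = a, A_j = b), i < j
    """
    m = len(node_prob)
    best_score, best_labels = -math.inf, None
    for labels in itertools.product((0, 1), repeat=m):
        # Node terms: how well each label agrees with its own classifier.
        score = sum(math.log(node_prob[i][labels[i]]) for i in range(m))
        # Edge terms: how compatible each pair of labels is.
        score += sum(
            math.log(pair_prob[(i, j)][labels[i]][labels[j]])
            for i in range(m) for j in range(i + 1, m)
        )
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels

# Attributes: 0 = wear necktie, 1 = has collar, 2 = suit (made-up numbers).
node_prob = [[0.55, 0.45], [0.10, 0.90], [0.20, 0.80]]
uniform = [[0.25, 0.25], [0.25, 0.25]]
pair_prob = {
    (0, 1): uniform,
    (0, 2): [[0.45, 0.10], [0.05, 0.40]],  # neckties and suits tend to co-occur
    (1, 2): uniform,
}
labels = map_attributes(node_prob, pair_prob)
```

Taken alone, the necktie classifier leans toward "no necktie" (0.55), but the strong suit prediction and the necktie-suit edge flip the joint answer to "wear necktie".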
Example 1: No necktie (Wear necktie), Has collar, Men's, Has placket, Low exposure, No scarf, Solid pattern, Black, Short sleeve (Long sleeve), V-shape neckline, Dress (Suit).
Example 2: Wear necktie, Has collar, Men's, Has placket, High exposure (Low exposure), No scarf, Solid pattern, Gray & black, Long sleeve, V-shape neckline, Suit.
Example 3: No necktie, Has collar, Men's, Has placket, Low exposure, Wear scarf, Solid pattern, Brown & black, No sleeve (Long sleeve), V-shape neckline, Tank top (Outerwear).
Experimental Results Questions that we are interested in: Does combining features improve performance? Does the pose model help? Does the CRF work?
Pose vs. No Pose: Experiment Setup. Positive and negative examples are balanced. SVM classification. Chi-squared kernel. Leave-one-out cross-validation. Comparison with attribute learning without the pose model: features are extracted within a scaled clothing mask under the face (the clothing mask of [Gallagher 2008]). Evaluation is performed under the same experiment settings.
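The evaluation protocol can be sketched with a chi-squared kernel and a leave-one-out loop. Two assumptions are made loudly here: the kernel uses the common exponential form, which the slide does not specify, and a nearest-neighbour rule stands in for the SVM to keep the example short. The names `chi2_kernel` and `leave_one_out_accuracy` and the toy histograms are hypothetical.

```python
import math

def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel for non-negative histograms (assumed form)."""
    dist = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * dist)

def leave_one_out_accuracy(samples, labels, kernel=chi2_kernel):
    """Hold out each sample once and predict it from the most similar remaining sample.

    The nearest-neighbour rule is a stand-in for the SVM used in the talk.
    """
    correct = 0
    for i, x in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        nearest = max(others, key=lambda j: kernel(x, samples[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(samples)

# Toy balanced set: two histograms per class (made-up values).
samples = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
labels = [1, 1, 0, 0]
acc = leave_one_out_accuracy(samples, labels)
```

Running the same loop on features from the pose model and on features from the clothing mask keeps the comparison under identical settings, as the slide requires.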
Steve Jobs: solid pattern, men's clothing, black color, long sleeves, round neckline, outerwear, wearing scarf
The predicted dressing style at weddings. Male: solid pattern, suit, long sleeves, V-shape neckline, wearing necktie, wearing scarf, has collar, has placket. Female: high skin exposure, no sleeves, dress, other neckline shapes, white, >2 colors, floral pattern.
Gender Recognition Face-based: Project faces in the Fisher space. Clothing-based: The gender output of our system. Better gender recognition is achieved by combining face and clothing.
Conclusions Clothing attributes can be better learned with a human pose model. CRF offers improved performance by exploring attribute relations. Proposed novel applications that exploit the predicted attributes.
Future Work: Expect even better performance by using the (almost) ground truth pose estimated by Kinect sensors [Shotton et al., Best Paper CVPR 2011]. Incorporate clothing information in person identification.
The Loop: Images and Computer Vision. What we know about people.
The Loop: This talk. Examples of how social data has helped understand images of people. Some things I've learned about people from computer vision.
What Is Computer Vision? "Vision is the process of discovering from images what is present in the world, and where it is." -- David Marr, Vision (1982)
What Is Computer Vision? Vision deals with: Uncertainty and Probability (what is present). Geometry (where it is). Humans are really good at this!