Using Attributes to Describe What People Wear

Presentation transcript:

Using Attributes to Describe What People Wear Andy Gallagher October 14, 2013 with Huizhong Chen and Bernd Girod

Objective: attribute learning
List of attributes: Men’s; Black color; Sweater; Long sleeve; Solid pattern; Low skin exposure; …

Outline
Attributes
Describing Clothing with Attributes
Miscellaneous Topics

Attributes

Attributes
Describing Objects by Their Attributes. A. Farhadi, I. Endres, D. Hoiem, D. Forsyth. CVPR 2009.
Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. C. Lampert, H. Nickisch, S. Harmeling. CVPR 2009.
Many others.

Computer Vision image features classification

Computer Vision image ? [ .1 -.9 .1 .231 -.1] features classification

Computer Vision image features classification
What feature representation should we use?

Computer Vision image features attributes classification
Features: [ .1 -.9 .1 .231 -.1]
Attributes: has hair, has skin, has ear, has eye, has arms
Now we can talk…

Attributes
Properties shared by many objects
Explicit semantics
Facilitate human-CPU communication
Materials (glass, fur, wood, etc.)
Parts (has wheel, has tail, etc.)
Shape (boxy, cylindrical, etc.)
Based on a slide by David Forsyth

Example Attributes: FaceTracer image search for “Smiling Asian Men With Glasses” (Kumar et al., 2008)

Example Attributes Farhadi et al. 2009

Example Attributes Lampert et al. 2009 Slide credit: Devi Parikh

Example Attributes Welinder et al. 2010 Slide credit: Devi Parikh

Attribute Models Classifiers for binary attributes Kumar et al. 2010 Slide credit: Devi Parikh

Why attributes?
How humans naturally describe visual concepts
Image search: “I want elegant silver sandals with high heels”
Slide credit: Devi Parikh

Example Attributes Verification classifier SAME Kumar et al., 2010

Why attributes? An okapi is a mammal with a reddish dark back, with striking horizontal white stripes on the front and back legs. (Wikipedia)

Zero-shot Learning
Aye-ayes: are nocturnal, live in trees, have large eyes, have long middle fingers.
Which one of these is an aye-aye?
Humans can learn from descriptions (zero examples).
Slide adapted from Christoph Lampert by Devi Parikh
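The idea of recognizing a class from its description alone can be sketched in a few lines. This is only an illustration, assuming we already have per-attribute probabilities for an image from some attribute classifiers; the attribute names, the second class ("cow"), the probability values, and the scoring rule (sum of log-probabilities that each described attribute matches) are all assumptions made for the example and are not taken from the slides.

```python
import math

# Hypothetical class descriptions: 1 = attribute present, 0 = absent.
# No images of either class are needed to write these down.
CLASS_SIGNATURES = {
    "aye-aye": {"nocturnal": 1, "lives_in_trees": 1, "large_eyes": 1, "long_middle_finger": 1},
    "cow": {"nocturnal": 0, "lives_in_trees": 0, "large_eyes": 0, "long_middle_finger": 0},
}


def class_log_score(attribute_probs, signature):
    """Log-probability that the image's attributes match a class description."""
    eps = 1e-9  # keeps log() finite when a probability is exactly 0 or 1
    score = 0.0
    for name, present in signature.items():
        p = min(max(attribute_probs[name], eps), 1.0 - eps)
        score += math.log(p if present else 1.0 - p)
    return score


def predict_unseen_class(attribute_probs, signatures=CLASS_SIGNATURES):
    """Pick the described class whose attribute signature fits the image best."""
    return max(signatures, key=lambda c: class_log_score(attribute_probs, signatures[c]))


# Assumed outputs of attribute classifiers for one image (illustrative values).
image_attribute_probs = {
    "nocturnal": 0.8,
    "lives_in_trees": 0.7,
    "large_eyes": 0.9,
    "long_middle_finger": 0.6,
}
print(predict_unseen_class(image_attribute_probs))
```

The attribute classifiers are trained on other classes; only the description table changes when a new class is added.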

Is this a giraffe? No. Is this a giraffe? Yes. Is this a giraffe? No.
In the traditional active learning setting, after looking at a few examples, the learner identifies an image that is confusing and asks the teacher for a label: “Is this a giraffe?” In this case, the teacher says no. The learner updates its model, identifies another confusing image, and asks, “Is this a giraffe?” The teacher says yes.
Slide credit: Devi Parikh

Learner learns better from its mistakes (Parkash and Parikh, 2012)
Learner (current belief): “I think this is a giraffe. What do you think?”
Teacher (focused feedback, knowledge of the world): “No, its neck is too short for it to be a giraffe.”
Learner: “Ah! These [animals with even shorter necks] must not be giraffes either then.”
Feedback on one, transferred to many.
Learner learns better from its mistakes. Accelerated discriminative learning with few examples.
The learner picks a confusing image. Then, instead of just demanding a label for the image, the learner gives the example some thought, determines its belief about the example, and communicates it to the teacher. If it thinks this image is a giraffe, it says, “I think this is a giraffe. What do you think?” The teacher says, “No, this is not a giraffe, because its neck is too short for it to be a giraffe.” With this, the learner realizes that if this animal’s neck is too short for it to be a giraffe, then all animals with even shorter necks than the query image must not be giraffes either. This results in a much better understanding of giraffes.
At a high level, in our proposed active learning paradigm, the learner conveys its current belief about an actively chosen query. If the belief is wrong, the supervisor provides focused feedback that conveys the teacher’s knowledge about the world. The learner takes the feedback provided on one image and transfers it to many previously unlabeled images. This results in (1) the classifier learning better from its mistakes and (2) accelerated learning with few labeled examples.
Slide credit: Devi Parikh
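The "feedback on one, transferred to many" step can be sketched as follows. This is a minimal illustration, assuming each unlabeled image already has a predicted strength for the attribute named in the feedback (here a hypothetical "neck length" score); the function name, image ids, and scores are invented for the example and are not the authors' implementation.

```python
def transfer_negative_feedback(query_id, attribute_score, unlabeled_ids):
    """Return the unlabeled images that inherit the 'not this class' label.

    The teacher said the query's attribute is too weak for the class, so every
    unlabeled image with an even weaker attribute cannot be that class either.
    """
    threshold = attribute_score[query_id]
    return [i for i in unlabeled_ids if attribute_score[i] < threshold]


# Hypothetical predicted neck-length scores (higher means longer neck).
neck_length = {"query": 0.40, "img_a": 0.90, "img_b": 0.20, "img_c": 0.35}

# Teacher: "No, its neck is too short for it to be a giraffe."
new_negatives = transfer_negative_feedback("query", neck_length, ["img_a", "img_b", "img_c"])
print(new_negatives)
```

One piece of feedback labels the query and, in this toy case, two further images as non-giraffes, while the long-necked image stays unlabeled.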

Which Attributes to Describe?
“Please choose a person to the left of the person who is frowning”
Sadovnik et al. 2013

Describing Clothing with Attributes

Objective: attribute learning
List of attributes: Men’s; Black color; Sweater; Long sleeve; Solid pattern; Low skin exposure; …

Recommend and Analyze Recommendations Formal Sport

Person Identification

Related Work
Describing objects by attributes
Learn semantic attributes for object classification [Farhadi et al., 2009]
Clothing recognition with collar, sleeve length, placket, etc. [Zhang et al., 2008]
Gender recognition: use AdaBoost and random forest with HOG features to classify male/female

Related Work
Person identification with clothing
Bounding box under face [Anguelov, 2007]
Clothing segmentation [Gallagher, 2008]

Dataset Preparation 1856 people from the web. Images are unconstrained.

Dataset Preparation $400 spent for collecting 283,107 labels on Amazon Mechanical Turk (AMT).

Dataset Statistics: 23 binary attributes, 3 multiclass attributes

The System
Pose estimation
Feature extraction & quantization (Feature 1 … Feature N)
One SVM per feature (SVM 1 … SVM N), then combine features into one SVM per attribute (Attribute classifier 1 … Attribute classifier M)
Multi-attribute CRF inference (A: attribute, F: feature)
Predictions: Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve

Pose Estimation [Eichner et al., 2010]
Perform upper body detection, using complementary results from a face detector and deformable part models.
Foreground highlighting within the enlarged upper body bounding box.
Parse the upper body into head, torso, and upper and lower parts of the left and right arms.
The upper body detector is based on the successful part-based object detection framework [1] and contains a model to detect near-frontal upper bodies.
[1] Object Detection with Discriminatively Trained Part Based Models, PAMI 2009

Feature Extraction SIFT descriptor extracted over the sampling grid. Similar procedure for the arm regions.

Feature Extraction
Maximum Response Filters [Varma 2005]
LAB color
Skin probability
The RFS filter bank consists of 2 anisotropic filters (an edge and a bar filter, at 6 orientations and 3 scales) and 2 rotationally symmetric ones (a Gaussian and a Laplacian of Gaussian).
Figure labels: RGB image, skin probability, MRF bank

Feature Extraction
Raw features are quantized using soft K-means (K=5 in our implementation).
Quantized features are aggregated over various body regions by max or average pooling.
For learning color attributes, the feature is LAB color aggregated from non-skin regions.
Feature type: SIFT, Texture, Color, Skin probability
Region: Torso, Left upper arm, Right upper arm, Left lower arm, Right lower arm
Pooling method: Average, Max
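Soft quantization followed by region pooling can be sketched like this. The slides only state "soft K-means" with K=5 and max or average pooling; the exponential weighting, the `beta` parameter, the function names, and the toy 2-D data with K=2 are assumptions chosen to keep the example small.

```python
import math


def soft_assign(descriptor, centers, beta=1.0):
    """Soft-assign one descriptor to K centers; returns K weights that sum to 1."""
    sq_dists = [sum((x - c) ** 2 for x, c in zip(descriptor, center)) for center in centers]
    d_min = min(sq_dists)  # subtracting the minimum keeps exp() well behaved
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def pool_region(codes, method="average"):
    """Aggregate the soft codes of all descriptors in one body region."""
    columns = list(zip(*codes))  # one column per cluster center
    if method == "max":
        return [max(col) for col in columns]
    if method == "average":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unknown pooling method: {method}")


# Toy example: K=2 centers in 2-D (the slides use K=5 on real descriptors).
centers = [[0.0, 0.0], [1.0, 1.0]]
torso_descriptors = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
codes = [soft_assign(d, centers) for d in torso_descriptors]
torso_avg = pool_region(codes, "average")
torso_max = pool_region(codes, "max")
print(torso_avg, torso_max)
```

Each region then contributes one fixed-length vector per feature type, regardless of how many descriptors fall inside it.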

Feature Fusion
SVM is a kernel-based classification technique.
Feature fusion solution: a combined SVM is trained using a weighted sum of the kernels.
Combining features consistently outperforms the single best feature.
Diagram: SVM 1 … SVM N, with kernels K1 … KN and prediction accuracies 1 … N, feed the combined SVM, which outputs the attribute prediction.
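A weighted sum of kernels can be sketched as below. The slides do not say how the weights are chosen; using each single-feature SVM's accuracy, normalized to sum to 1, is an assumption suggested by the diagram labels, and the function name and toy matrices are hypothetical.

```python
def combine_kernels(kernels, accuracies):
    """Weighted sum of N precomputed kernel matrices (lists of lists).

    Assumption: weights are the single-feature accuracies, normalized to sum to 1.
    """
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, kernel in zip(weights, kernels):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * kernel[i][j]
    return combined


# Toy kernels for two training samples and two feature types.
k_sift = [[1.0, 0.5], [0.5, 1.0]]
k_color = [[1.0, 0.1], [0.1, 1.0]]
combined = combine_kernels([k_sift, k_color], accuracies=[0.6, 0.9])
print(combined)
```

The combined matrix can then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.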

Recap
Pose estimation
Feature extraction & quantization (Feature 1 … Feature N)
One SVM per feature (SVM 1 … SVM N), then combine features into one SVM per attribute (Attribute classifier 1 … Attribute classifier M)
Multi-attribute CRF inference (A: attribute, F: feature)
Predictions: Blue, Solid pattern, Outerwear, Wear scarf, Long sleeve

Attribute Dependencies Necktie and T-Shirt?

Attribute Inference with CRF
Each attribute is a node.
All nodes are pairwise connected.
The edge connecting 2 nodes corresponds to the joint probability of these 2 attributes.
Ai: attribute i; Fi: features for Ai (the diagram shows A1 to A6 with F1 to F6)

CRF for Attribute Learning
[Diagram: attribute nodes A1, A2, …, AM, each with its features F1, F2, …, FM]
For a fully connected CRF, we maximize: [equation not recovered; it is labeled with a node potential term and an edge potential term]
The CRF potential is maximized using the standard belief propagation technique [Tappen et al., 2003].
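To make the node and edge terms concrete, here is a small sketch of inference on a fully connected model over binary attributes. It is not the authors' method: it replaces belief propagation with exhaustive enumeration (only practical for a handful of attributes), and it assumes the objective is a sum of log node potentials (per-attribute classifier probabilities) and log edge potentials (pairwise joint probabilities). All numbers and attribute names are illustrative.

```python
import itertools
import math


def map_assignment(node_logp, edge_logp):
    """Exhaustively find the binary labeling with the highest total log-potential.

    node_logp[i][a]         log node potential of attribute i taking value a
    edge_logp[(i, j)][a][b] log edge potential of attributes i, j taking values a, b
    """
    n = len(node_logp)
    best_labels, best_score = None, -math.inf
    for labels in itertools.product((0, 1), repeat=n):
        score = sum(node_logp[i][labels[i]] for i in range(n))
        score += sum(table[labels[i]][labels[j]] for (i, j), table in edge_logp.items())
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels, best_score


ATTRIBUTES = ["necktie", "collar"]  # hypothetical two-attribute model

# Assumed classifier outputs: weakly "necktie", fairly confident "no collar".
node_logp = [
    [math.log(0.45), math.log(0.55)],  # necktie: P(no), P(yes)
    [math.log(0.80), math.log(0.20)],  # collar:  P(no), P(yes)
]
# Assumed joint statistics: a necktie without a collar is very rare.
edge_logp = {
    (0, 1): [
        [math.log(0.40), math.log(0.40)],  # necktie=0 with collar=0, collar=1
        [math.log(0.01), math.log(0.19)],  # necktie=1 with collar=0, collar=1
    ]
}

labels, score = map_assignment(node_logp, edge_logp)
print(dict(zip(ATTRIBUTES, labels)), round(score, 3))
```

Taken independently, the classifiers would predict "necktie, no collar"; the edge term makes that pair so unlikely that the joint optimum becomes "no necktie, no collar".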

Example 1 (man in dress): No necktie (Wear necktie), Has collar, Men’s, Has placket, Low exposure, No scarf, Solid pattern, Black, Short sleeve (Long sleeve), V-shape neckline, Dress (Suit)
Example 2 (suit but high skin exposure): Wear necktie, High exposure (Low exposure), Gray & black, Long sleeve, Suit
Example 3 (no sleeve but wearing scarf): No necktie, Wear scarf, Brown & black, No sleeve (Long sleeve), Tank top (Outerwear)

Experimental Results
Questions that we are interested in:
Does combining features improve performance?
Does the pose model help?
Does the CRF work?

Pose vs. No Pose: Experiment Setup
Positive and negative examples are balanced.
SVM classification with a chi-squared kernel.
Leave-one-out cross-validation.
Comparison with attribute learning without the pose model: features are extracted within a scaled clothing mask under the face (the clothing mask of [Gallagher 2008]).
Evaluation is performed under the same experiment settings.
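For reference, a chi-squared kernel on histogram features can be written in a few lines. The slides do not say which variant was used; the exponential form below, with a free `gamma` parameter, is an assumption, and the histograms are toy values.

```python
import math


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel between two non-negative histograms."""
    dist = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)


hist_a = [0.5, 0.5]
hist_b = [1.0, 0.0]
print(chi2_kernel(hist_a, hist_a), chi2_kernel(hist_a, hist_b))
```

Identical histograms give a kernel value of 1, and the value decays toward 0 as the histograms diverge.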

Multiclass Confusion Matrix

Unbalanced data classification: G-mean
Recap: our CRF model uses the priors of the attributes.
Evaluating CRF performance on the full dataset requires unbalanced data classification.
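The slide names the G-mean but does not define it. The sketch below assumes the usual definition for binary labels, the geometric mean of the true-positive rate and the true-negative rate; the labels are toy data.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of true-positive rate and true-negative rate (binary labels)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("G-mean needs at least one positive and one negative example")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


# Unbalanced toy data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(g_mean(y_true, always_negative))
```

A classifier that always predicts the majority class reaches 80% accuracy on this toy data but a G-mean of 0, which is why the measure suits unbalanced attributes.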

Steve Jobs: “solid pattern, men’s clothing, black color, long sleeves, round neckline, outerwear, wearing scarf”

The predicted dressing style of weddings: Male: “solid pattern, suit, long-sleeves, V-shape neckline, wearing necktie, wearing scarf, has collar, has placket” Female: “high skin exposure, no sleeves, dress, other neckline shapes, white, >2 colors, floral pattern”

Gender Recognition Face-based: Project faces in the Fisher space. Clothing-based: The gender output of our system. Better gender recognition is achieved by combining face and clothing.

Conclusions Clothing attributes can be better learned with a human pose model. CRF offers improved performance by exploring attribute relations. Proposed novel applications that exploit the predicted attributes.

Miscellaneous

What do you have?

AutoCropping

AutoCropping Auction Probability: 97%

AutoCropping Eigenvector Quantized Eigenvector

How do photos affect value? Angled, high contrast: ~$115

How do photos affect value? Frontal, flash reflection: ~$88

Thank You!

Future Work
Expect even better performance by using the (almost) ground-truth pose estimated by Kinect sensors [Shotton et al., Best Paper CVPR 2011].
Incorporate clothing information in person identification.

The Loop
What we know about people
Images and Computer Vision
What do we mean by “context”? We can interpret the H or A based on context. Example from “Cognition in Action”, Smyth, Collins, Morris, Levy, 1994, LEA Publishers.

The Loop: This talk
Examples of how social data has helped understand images of people
Some things I’ve learned about people from computer vision

What Is Computer Vision?
Vision is the process of discovering from images what is present in the world, and where it is. -- David Marr, Vision (1982)
Humans can perceive and interpret images very fast and accurately.

What Is Computer Vision?
Vision deals with:
Uncertainty and Probability (What is present)
Geometry (Where it is)
Humans are really good at this!

Measurement vs. Perception
The visual system tries to decompose the measured brightness into reflectance and illumination, and to estimate the reflectance that is inherent to the object.

Measurement vs. Perception

Measurement vs. Perception
Müller-Lyer Illusion
Our perception of geometric properties is affected by our interpretation.
http://www.michaelbach.de/ot/sze_muelue/index.html

What is context?
What do we mean by “context”? We can interpret the H or A based on context. Example from “Cognition in Action”, Smyth, Collins, Morris, Levy, 1994, LEA Publishers.

Context
We ourselves are susceptible to clutter as well. This is a problem where a computer might be faster than a human.

Which monster is larger?
Shepard RN (1990) Mind Sights: Original Visual Illusions, Ambiguities, and other Anomalies, New York: WH Freeman and Company
We can’t help but integrate perspective cues into our interpretation of the image.

Your brain specializes in faces

Find The Face
In the beans: we ourselves are susceptible to clutter as well. This is a problem where a computer might be faster than a human.
http://www.michaelbach.de/ot/sze_muelue/index.html

Understanding images of people
We use many different clues to discover identity and make inferences about people.
What cues do we use to understand this image? How do we know this is a family? What we call “intuition” is often data that exists in the public domain. This thesis describes the progress we’ve made towards the objective of providing the computer with the same information that we have when understanding images. The goal of this thesis is to provide computers with the same intuition that humans have.