Parsing Clothing in Fashion Photographs


Parsing Clothing in Fashion Photographs
Kota Yamaguchi, M. Hadi Kiapour, Luis E. Ortiz, Tamara L. Berg
Stony Brook University, Stony Brook, NY 11794, USA
{kyamagu, mkiapour, leortiz, tlberg}@cs.stonybrook.edu

Outline
1. Introduction
2. Fashionista Dataset & Labeling Tools
3. Clothing parsing
4. Experimental Results
5. Conclusions

Introduction
1. A novel dataset for studying clothing parsing, consisting of 158,235 fashion photos with associated text annotations, and web-based tools for labeling.
2. An effective model to recognize and precisely parse pictures of people into their constituent garments.
3. Initial experiments on how clothing prediction might improve state-of-the-art models for pose estimation.
4. A prototype visual garment retrieval application that can retrieve matches independent of pose.

Fashionista Dataset
158,235 photographs collected from Chictopia.com.

Labeling Tools
As a training and evaluation set, we select 685 photos with good visibility of the full body and covering a variety of clothing items. Annotations are collected through two Amazon Mechanical Turk jobs:
1. The first Turk job gathers ground-truth pose annotations for the usual 14 body parts [27].
2. The second Turk job gathers ground-truth clothing labels on superpixel regions.

Clothing parsing
The clothing parsing pipeline proceeds as follows:
1. Obtain superpixels {s_i} from an image I
2. Estimate pose configuration X using P(X|I)
3. Predict clothes L using P(L|X, I)
4. Optionally, re-estimate pose configuration X using model P(X|L, I)
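The four pipeline steps can be sketched as a small driver function. This is only an illustrative skeleton: `parse_clothing`, `ParseResult`, and the four callables passed in are hypothetical names introduced here, and the stub models in the usage lines stand in for the real segmentation, pose, and labeling components.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ParseResult:
    superpixels: Sequence[Any]  # regions {s_i}
    pose: Any                   # pose configuration X
    labels: Sequence[Any]       # one clothing label per superpixel (L)


def parse_clothing(
    image: Any,
    segment: Callable[[Any], Sequence[Any]],
    estimate_pose: Callable[[Any], Any],
    predict_labels: Callable[[Sequence[Any], Any, Any], Sequence[Any]],
    reestimate_pose: Optional[Callable[[Sequence[Any], Any], Any]] = None,
) -> ParseResult:
    """Run the four steps in order; every model is supplied by the caller."""
    superpixels = segment(image)                       # step 1: {s_i} from I
    pose = estimate_pose(image)                        # step 2: uses P(X|I)
    labels = predict_labels(superpixels, pose, image)  # step 3: uses P(L|X, I)
    if reestimate_pose is not None:                    # step 4 is optional
        pose = reestimate_pose(labels, image)          # uses P(X|L, I)
    return ParseResult(superpixels, pose, labels)


# Usage with stub models (all values are made up for illustration).
result = parse_clothing(
    image="photo.jpg",
    segment=lambda img: ["s0", "s1", "s2"],
    estimate_pose=lambda img: {"head": (10, 5)},
    predict_labels=lambda sp, pose, img: ["hair", "shirt", "skirt"],
    reestimate_pose=lambda labels, img: {"head": (11, 5)},
)
```

Passing the models as callables keeps the ordering of the steps separate from any particular choice of segmenter or pose estimator.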

Superpixels
We use a recent image segmentation algorithm [1] to obtain superpixels. This process typically yields between a few hundred and a thousand regions per image, depending on the complexity of the person and background appearance.
[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. PAMI, 33(5):898–916, 2011.

Pose estimation
We begin our pipeline by estimating pose using P(X|I), taking the most probable configuration X̂ = argmax_X P(X|I). The scoring function used to evaluate pose follows [27].
[27] Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. In CVPR, pages 1385–1392, 2011.
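As a hedged sketch only (the slide's own equation is not reproduced here), a tree-structured part model of the kind described in [27] scores a pose X = {x_p} as a sum of per-part appearance terms and pairwise deformation terms. The symbols w_p, w_pq, φ, ψ, E, and Z below are assumed notation, and the mixture (part-type) indices used in [27] are omitted for brevity:

```latex
\ln P(X \mid I) \;=\; \sum_{p} w_p^{\top}\, \phi(x_p, I)
\;+\; \sum_{(p,q) \in E} w_{pq}^{\top}\, \psi(x_p - x_q)
\;-\; \ln Z
```

Here φ is an appearance feature extracted at part location x_p, ψ is a feature of the relative displacement between connected parts, E is the set of edges of the part tree, and Z normalizes the distribution.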

Clothing labeling
Once we obtain the initial pose estimate X̂, we can proceed to estimating the clothing labeling L̂ = argmax_L P(L|X̂, I). We model the probability distribution P(L|X, I) with a second-order conditional random field (CRF).
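To make the CRF idea concrete, the sketch below scores a labeling of superpixels as a sum of per-superpixel (unary) terms and neighbor (pairwise) terms, then finds the best labeling by brute force. It is a toy under stated assumptions: "second order" is read here as unary plus pairwise terms, the function names, the Potts-style `potts` term, the single `weight`, and all numbers are hypothetical, and exhaustive search is only feasible for a handful of regions.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def crf_log_score(
    labels: Sequence[int],
    unary: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    pairwise: Callable[[int, int], float],
    weight: float = 1.0,
) -> float:
    """Unnormalized log-score: unary terms plus weighted pairwise terms."""
    score = sum(unary[i][label] for i, label in enumerate(labels))
    score += weight * sum(pairwise(labels[i], labels[j]) for i, j in edges)
    return score


def map_labeling(
    num_labels: int,
    unary: Sequence[Sequence[float]],
    edges: Sequence[Tuple[int, int]],
    pairwise: Callable[[int, int], float],
    weight: float = 1.0,
) -> List[int]:
    """Exhaustive MAP search over all labelings (toy-sized problems only)."""
    candidates = itertools.product(range(num_labels), repeat=len(unary))
    best = max(
        candidates,
        key=lambda ls: crf_log_score(ls, unary, edges, pairwise, weight),
    )
    return list(best)


def potts(a: int, b: int) -> float:
    """Hypothetical smoothing term: penalize neighbors with different labels."""
    return 0.0 if a == b else -1.0


# Three superpixels in a chain, two labels; unary[i][l] is a made-up log-score.
unary = [[0.0, -1.0], [-0.6, -0.5], [0.0, -1.0]]
edges = [(0, 1), (1, 2)]

smoothed = map_labeling(2, unary, edges, potts, weight=1.0)
independent = map_labeling(2, unary, edges, potts, weight=0.0)
```

With the pairwise term switched off, the middle superpixel takes its own slightly preferred label; with it on, agreement with both neighbors wins, which is the kind of label consistency the pairwise terms are meant to encourage.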

Pose re-estimation
Given the predicted clothing labels L̂, we try to improve our prior MAP pose assignment X̂ by computing the posterior MAP pose conditioned on L̂ in the pose model (1), i.e. by maximizing P(X|L̂, I) over X.

Training
Training of our clothing parser includes:
1. Parameter learning of the pose estimators P(X|I) and P(X|L, I)
2. Learning of the potential functions in P(L|X, I)
3. Learning of the CRF parameters in (4)

Experimental Results Clothing Parsing Accuracy

Clothing parsing with garment meta-data and without meta-data

Pose Re-Estimation Accuracy

Retrieving Visually Similar Garments

Conclusions
This paper proposes an effective method to produce an intricate and accurate parse of a person's outfit. Two scenarios are explored: parsing with garment tags provided by meta-data, and parsing with unconstrained label sets. A large novel dataset and labeling tools are also introduced. We demonstrate intriguing initial experiments on using clothing estimates to improve human pose prediction, and a prototype application for visual garment search.