PANDA: Pose Aligned Networks for Deep Attribute Modeling Ning Zhang1;2, Manohar Paluri1, Marc’Aurelio Ranzato1, Trevor Darrell2, Lubomir Bourdev1 1: Facebook.

Presentation transcript:

PANDA: Pose Aligned Networks for Deep Attribute Modeling. Ning Zhang (1, 2), Manohar Paluri (1), Marc’Aurelio Ranzato (1), Trevor Darrell (2), Lubomir Bourdev (1). 1: Facebook AI Research, 2: EECS, UC Berkeley.

Outline Introduction, Related work, Pose Aligned Networks for Deep Attribute Modeling, Datasets, Results, Conclusion

Introduction Recognizing human attributes, such as gender, age, hair style, and clothing style, has many applications. The signal associated with some attributes is subtle and the image is dominated by the effects of pose and viewpoint.

Introduction Deep learning methods, and in particular convolutional nets [20], have achieved very good performance on several tasks. Moreover, Donahue et al. [8] show that features extracted from the deep convolutional network trained on large datasets are generic and can help in other visual recognition problems. [20] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to hand-written zip code recognition. In Neural Computation. [8] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In arXiv.

Introduction We conjecture that available training data, even ImageNet-scale, is presently insufficient for learning pose normalization in a CNN. Part-based methods have gained significant recent attention as a method to deal with pose variation and are the state-of-the-art method for attribute prediction today.

Introduction Our method can also use other kinds of parts, and we show the performance using DPM [12] as well. We demonstrate the effectiveness of PANDA on attribute classification problems and present state-of-the-art experimental results on three datasets. [12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645.

Related work 2.1. Attribute classification Attributes are used as an intermediate representation for knowledge transfer in [17, 10] for object recognition tasks. There is also related work on automatic attribute discovery: Berg et al. [1] proposed discovering attribute vocabularies automatically by mining unlabeled text and image data sampled from the web. [10] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing Objects by their Attributes. In CVPR. [17] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to Detect Unseen Object Classes by Between-Class Attribute Transfer. In CVPR. [1] T. L. Berg, A. C. Berg, and J. Shih. Automatic attribute discovery and characterization from noisy web data. In ECCV.

Related work In [16], facial attributes such as gender, mouth shape, and facial expression are learned for face verification and image search tasks. A very closely related work on attribute prediction is Bourdev et al. [4], which is a three-layer feed-forward classification system. [16] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classifiers for face verification. In ICCV. [4] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV.

Related work 2.2. Deep learning The most popular deep learning method for vision is the convolutional neural network (CNN). Although very successful when provided with very large labeled datasets, convolutional nets usually generalize poorly on smaller datasets.

Pose Aligned Networks for Deep Attribute modeling We explore part-based models, specifically poselets, and deep learning. Our goal is to use poselets for part localization and incorporate these normalized parts into deep convolutional nets in order to extract pose normalized representations.

Pose Aligned Networks for Deep Attribute modeling Towards this goal, we leverage both the power of convolutional nets for learning discriminative features from data and the ability of poselets to simplify the learning task by decomposing the objects into their canonical poses. Specifically, we start from poselet patches, resize them to 64x64 pixels (Figure 3), randomly jitter each patch and flip it horizontally with probability 0.5 to improve generalization, and train a CNN for each poselet.
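The augmentation described on this slide can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the nearest-neighbour resize, the edge-replicating shift used as jitter, the `max_shift` default, and all function names are assumptions. Only the 64x64 target size and the 0.5 flip probability come from the slide.

```python
import random

PATCH_SIZE = 64  # from the slide: poselet patches are resized to 64x64


def resize_nearest(patch, size=PATCH_SIZE):
    """Nearest-neighbour resize of a 2-D list of pixel values to size x size.

    The interpolation scheme is an assumption; the slide does not name one.
    """
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def jitter(patch, max_shift, rng):
    """Shift the patch by a random (dy, dx) offset, replicating edge pixels.

    The jitter scheme and its range are hypothetical stand-ins.
    """
    h, w = len(patch), len(patch[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)

    def clamp(v, limit):
        return max(0, min(limit - 1, v))

    return [[patch[clamp(r + dy, h)][clamp(c + dx, w)] for c in range(w)]
            for r in range(h)]


def augment(patch, rng, max_shift=4, flip_prob=0.5):
    """Resize, jitter, then flip horizontally with probability flip_prob."""
    out = jitter(resize_nearest(patch), max_shift, rng)
    if rng.random() < flip_prob:
        out = [row[::-1] for row in out]  # horizontal flip
    return out


# Example: a tiny 6x8 "patch" becomes a 64x64 augmented training sample.
example = [[r * 10 + c for c in range(8)] for r in range(6)]
sample = augment(example, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing per-poselet networks trained on the same patches.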

Pose Aligned Networks for Deep Attribute modeling

The whole network is trained jointly by standard backpropagation of the error [24] and stochastic gradient descent [2], using as a loss function the sum of the log-losses of each attribute for each training sample. The details of the layers are given in Figure 2 and further implementation details can be found in [15]. [24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. In Nature. [2] L. Bottou. Stochastic Gradient Descent Tricks. In G. Montavon, G. Orr, and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, volume 7700 of Lecture Notes in Computer Science, pages 421–436. Springer Berlin Heidelberg. [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS.
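The loss named on this slide, a sum of per-attribute log-losses over training samples, could look roughly like the sketch below. It assumes each attribute is binary with a label in {0, 1} and that the network outputs one probability per attribute; the function names and the `eps` clipping are illustrative choices, not details taken from the slides.

```python
import math


def attribute_log_loss(probs, labels, eps=1e-12):
    """Sum of binary log-losses over all attributes of one training sample.

    probs:  predicted probability that each attribute is present (assumed form)
    labels: 1 if the attribute is present, 0 otherwise (assumed encoding)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


def batch_loss(batch_probs, batch_labels):
    """Sum the per-sample attribute losses over a batch of samples."""
    return sum(attribute_log_loss(p, y)
               for p, y in zip(batch_probs, batch_labels))


# Two samples with two attributes each.
loss = batch_loss([[0.9, 0.2], [0.6, 0.7]], [[1, 0], [1, 1]])
```

Because the loss is a plain sum over attributes, every attribute contributes its own gradient to the shared layers, which is what allows the whole network to be trained jointly.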

Pose Aligned Networks for Deep Attribute modeling The overall convolutional net architecture is shown in Figure 2.

Pose Aligned Networks for Deep Attribute modeling Based on our experiments, we find a more complex net is needed for the whole-person region than for the part regions. We extract deep convolutional features from the model trained on ImageNet [15] using the open source package provided by [8] as our deep representation of the full image patch. [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS. [8] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In arXiv.
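One plausible way to combine the per-part features with the whole-person feature is plain concatenation, sketched below. The slides do not spell out the combination step, so the concatenation itself, the zero-filling for poselets that were not detected, and every name in the code are assumptions made for illustration only.

```python
def build_descriptor(part_features, global_feature, num_parts, part_dim):
    """Concatenate per-poselet features with the whole-person feature.

    part_features:  dict mapping poselet index -> list of part_dim floats,
                    containing only the poselets that fired (hypothetical layout)
    global_feature: list of floats from the whole-person network
    Missing poselets are filled with zeros (an assumed convention).
    """
    descriptor = []
    for k in range(num_parts):
        feat = part_features.get(k)
        if feat is None:
            feat = [0.0] * part_dim  # poselet k was not detected
        if len(feat) != part_dim:
            raise ValueError("poselet %d has wrong feature length" % k)
        descriptor.extend(feat)
    descriptor.extend(global_feature)
    return descriptor


# Three poselets with 2-D features; only poselet 1 fired.
desc = build_descriptor({1: [1.0, 2.0]}, [9.0], num_parts=3, part_dim=2)
```

Keeping a fixed slot per poselet means the descriptor has the same length for every person, whichever parts were detected, so a single downstream classifier can consume it.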

Datasets 4.1. The Berkeley Human Attributes Dataset We tested our method on the Berkeley Human Attributes Dataset [4]. This dataset consists of 4013 training and 4022 test images collected from the PASCAL and H3D datasets. [4] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV.

Datasets 4.2. Attributes 25K Dataset Unfortunately, the training portion of the Berkeley dataset is not large enough for training our deep-net models (they severely overfit when trained just on these images). We collected an additional dataset of people from Facebook, split into 8737 training, 8737 validation and 7489 test examples (24963 in total).

Results [4] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV. [27] N. Zhang, R. Farrell, F. Iandola, and T. Darrell. Deformable Part Descriptors for Fine-grained Recognition and Attribute Prediction. In ICCV.

Results

Conclusion We presented a method for attribute classification of people that improves performance compared with previously published methods. Our feature representation is generic and we achieve state-of-the-art results on the Berkeley Attributes of People dataset and on LFW even if we train our CNNs on a different dataset.