Parsing Natural Scenes and Natural Language with Recursive Neural Networks
International Conference on Machine Learning (ICML 2011)
Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, Christopher D. Manning

Outline
- Introduction
- Mapping Segments and Words into Syntactico-Semantic Space
- Recursive Neural Networks (RNN) for Structure Prediction
- Experiments
- Conclusion

Introduction
Convolutional Neural Networks (CNN) vs. Recursive Neural Networks (RNN)

Introduction
(figure slide)


Recovering such recursive structure helps in understanding and classifying scene images.

Introduction: Natural Scenes
Images are oversegmented into small regions, which often represent parts of objects or background. From these regions we extract visual features and map them into a "semantic" space using a neural network. A recursive neural network then computes, for pairs of neighboring regions:
- (i) a score that is higher when the neighboring regions should be merged into a larger region,
- (ii) a new semantic feature representation for this larger region, and
- (iii) its class label.
After regions with the same object label are merged, neighboring objects are merged to form the full scene image. (A sketch of one merge step follows below.)
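To make the three outputs concrete, here is a minimal sketch of one merge step in NumPy. Everything beyond the three outputs named above is an assumption for illustration: the tanh nonlinearity, the weight names W, b, W_score, W_label, and their shapes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

def merge(c1, c2, W, b, W_score, W_label):
    """One RNN merge of two neighboring child vectors (image segments
    or words): returns (i) a merge score, (ii) the parent representation,
    and (iii) a class-label distribution for the merged node."""
    p = np.tanh(W @ np.concatenate([c1, c2]) + b)  # (ii) parent vector
    score = float(W_score @ p)                     # (i) scalar merge score
    label = softmax(W_label @ p)                   # (iii) class distribution
    return score, p, label
```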

Introduction: Natural Language Sentences
Words are first mapped into a semantic space and then merged into phrases in a syntactically and semantically meaningful order. The RNN computes the same three outputs and attaches them to each node in the parse tree.
- The class labels are phrase types such as noun phrase (NP) or verb phrase (VP).

Introduction: Contributions
This is the first deep learning method to achieve state-of-the-art results on segmentation and annotation of complex scenes. For scene classification, our learned features outperform state-of-the-art methods such as Gist descriptors [1].

[1] Oliva, A. and Torralba, A. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. IJCV, 42, 2001.

Outline
- Introduction
- Mapping Segments and Words into Syntactico-Semantic Space
- Recursive Neural Networks (RNN) for Structure Prediction
- Experiments
- Conclusion

Mapping Segments
This slide describes how image segments are mapped into the space in which the RNN operates. We closely follow the procedure described in Gould et al. (2009) [2] to compute image features. First, we oversegment an image x into superpixels (also called segments) using the algorithm from Comaniciu & Meer (2002) [3]. Instead of computing multiple oversegmentations, we choose only one set of parameters. In our dataset, this results in an average of 78 segments per image. We compute 119 features for each segment.

[2] Gould, S., Fulton, R., and Koller, D. Decomposing a Scene into Geometric and Semantically Consistent Regions. In ICCV, 2009.
[3] Comaniciu, D. and Meer, P. Mean shift: a robust approach toward feature space analysis. IEEE PAMI, May 2002.

Mapping Segments
Next, we use a simple neural network layer to map the features F_i of each segment i = 1, …, N_segs into the "semantic" n-dimensional space in which the RNN operates, yielding activation vectors {a_1, …, a_N_segs}:

a_i = f(W^sem F_i + b^sem),

where W^sem is a weight matrix, b^sem is a bias term, and f is a nonlinearity.
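A minimal sketch of this layer, assuming a sigmoid for f and n = 100; the random weights are placeholders, and only the 119 features per segment come from the slide.

```python
import numpy as np

n_features, n = 119, 100                 # 119 features per segment; n is assumed
rng = np.random.default_rng(0)
W_sem = 0.01 * rng.standard_normal((n, n_features))  # W^sem (placeholder values)
b_sem = np.zeros(n)                                  # bias b^sem

def map_segment(F_i):
    """a_i = f(W^sem F_i + b^sem), with f assumed to be a sigmoid."""
    return 1.0 / (1.0 + np.exp(-(W_sem @ F_i + b_sem)))
```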

Mapping Words
Neural language models (Bengio et al., 2003 [4]; Collobert & Weston, 2008 [5]) map words to a vector representation. Each word i = 1, …, N_words has an associated vocabulary index k into the columns of the embedding matrix L, and its vector is retrieved with a binary (one-hot) vector e_k:

a_i = L e_k.

[4] Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. JMLR, 3, 2003.
[5] Collobert, R. and Weston, J. A unified architecture for natural language processing: deep neural networks with multitask learning. In ICML, 2008.
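Since e_k is a one-hot (binary) vector, the product L e_k simply selects column k of the embedding matrix; a sketch with illustrative sizes:

```python
import numpy as np

n, n_words = 100, 10000                  # embedding size and vocabulary (assumed)
L = 0.01 * np.random.default_rng(0).standard_normal((n, n_words))  # embeddings

def map_word(k):
    """a_i = L e_k: the one-hot vector e_k picks out column k of L."""
    return L[:, k]
```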

Outline
- Introduction
- Mapping Segments and Words into Syntactico-Semantic Space
- Recursive Neural Networks (RNN) for Structure Prediction
- Experiments
- Conclusion

Recursive Neural Networks (RNN)
In our discriminative parsing architecture, the goal is to learn a function f : X → Y, where Y is the set of all possible binary parse trees. An input x consists of:
- (i) a set of activation vectors {a_1, …, a_N_segs}, which represent input elements such as image segments or words of a sentence, and
- (ii) a symmetric adjacency matrix A, where A(i, j) = 1 if segment i neighbors segment j. This matrix defines which elements can be merged.
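For concreteness, the five-segment example used on the next slides corresponds to this adjacency matrix (the neighbor pairs are read off the candidate-pair set shown below):

```python
import numpy as np

# Segment 1 neighbors 2 and 3; segments 2 and 3 both neighbor 4; 4 neighbors 5.
A = np.zeros((5, 5), dtype=int)
for i, j in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1    # symmetric; stored 0-indexed
```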

Recursive Neural Networks (RNN)
When training the visual parser, we have labels l for all segments. Using these labels, we can define an equivalence set of correct trees Y(x, l): a visual tree is correct if all adjacent segments that belong to the same class are merged into one super segment.

Recursive Neural Networks (RNN)
We use the max-margin estimation framework of Taskar et al. (2004) [6].

[6] Taskar, B., Klein, D., Collins, M., Koller, D., and Manning, C. Max-margin parsing. In EMNLP, 2004.

Recursive Neural Networks (RNN)
We want to learn a function f with small expected loss on unseen inputs. The score of a tree y is high if the algorithm is confident that the structure of the tree is correct. For all training instances (x_i, l_i), i = 1, …, n, the score of the correct tree should therefore be larger, up to a margin, than the score of any incorrect tree.

Recursive Neural Networks (RNN)
Minimizing this objective maximizes the correct tree's score and minimizes (up to a margin) the score of the highest-scoring but incorrect tree.
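Reconstructed from the surrounding definitions (the exact notation is an assumption), the regularized max-margin objective has the form:

```latex
J(\theta) = \frac{1}{n}\sum_{i=1}^{n} r_i(\theta) + \frac{\lambda}{2}\lVert\theta\rVert^2,
\qquad
r_i(\theta) = \max_{\hat{y} \in T(x_i)} \bigl( s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y}) \bigr)
            - \max_{y \in Y(x_i, l_i)} s(x_i, y)
```

Here T(x_i) is the set of all binary trees over input x_i, Y(x_i, l_i) is the set of correct trees defined above, and Δ is a structured margin loss that grows with the number of incorrect subtrees.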

Recursive Neural Networks (RNN)
Using the adjacency matrix A, the algorithm finds the pairs of neighboring segments and adds their concatenated activations to a set of potential child node pairs. Example:
{[a1; a2], [a1; a3], [a2; a1], [a2; a4], [a3; a1], [a3; a4], [a4; a2], [a4; a3], [a4; a5], [a5; a4]}
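A sketch of the pair enumeration (0-indexed); with the adjacency matrix A from the earlier sketch it reproduces the ten ordered pairs above:

```python
def candidate_pairs(A):
    """All ordered pairs (i, j) of neighboring elements according to A."""
    n = A.shape[0]
    return [(i, j) for i in range(n) for j in range(n) if A[i, j] == 1]
```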

Recursive Neural Networks (RNN)
(figure slides)

Recursive Neural Networks (RNN)
Example: the network greedily merges the highest-scoring pair and replaces the two children with the new parent in the candidate set. Starting from
{[a1; a2], [a1; a3], [a2; a1], [a2; a4], [a3; a1], [a3; a4], [a4; a2], [a4; a3], [a4; a5], [a5; a4]},
merging [a4; a5] into the parent p(4,5) gives
{[a1; a2], [a1; a3], [a2; a1], [a2; p(4,5)], [a3; a1], [a3; p(4,5)], [p(4,5); a2], [p(4,5); a3]}.
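Putting the pieces together, a sketch of the greedy parsing loop; the scoring follows the merge sketch above, and the bookkeeping details (node ids, pair updates) are assumptions:

```python
import numpy as np

def greedy_parse(activations, A, W, b, W_score):
    """Repeatedly merge the highest-scoring neighboring pair until a single
    node remains, mirroring the pair-set updates shown on this slide."""
    nodes = {i: a for i, a in enumerate(activations)}
    pairs = {(i, j) for i in nodes for j in nodes if A[i, j]}
    while len(nodes) > 1:
        def scored(pair):                 # parent p and its merge score
            i, j = pair
            p = np.tanh(W @ np.concatenate([nodes[i], nodes[j]]) + b)
            return float(W_score @ p), p
        (i, j), (_, p) = max(((pr, scored(pr)) for pr in pairs),
                             key=lambda t: t[1][0])
        new = max(nodes) + 1              # e.g. merging [a4; a5] -> p(4,5)
        nodes[new] = p
        del nodes[i], nodes[j]
        # The parent inherits the neighbors of both children.
        inherited = {a for (a, c) in pairs if c in (i, j) and a not in (i, j)}
        pairs = {(a, c) for (a, c) in pairs if i not in (a, c) and j not in (a, c)}
        pairs |= {(a, new) for a in inherited} | {(new, a) for a in inherited}
    return next(iter(nodes.values()))     # representation of the root node
```

With five segments this performs four merges, reproducing the pair-set update shown above at the first step.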

Recursive Neural Networks (RNN)
(figure slide)

Recursive Neural Networks (RNN)
We can leverage this representation by adding to each RNN parent node (after removing the scoring layer) a simple softmax layer to predict class labels, such as visual or syntactic categories:
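The equation on the slide is most likely the label layer; as a reconstruction:

```latex
\text{label}_p = \operatorname{softmax}\left( W^{\text{label}} \, p \right)
```

That is, the parent representation p is fed through a softmax classifier over the class labels.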

Outline
- Introduction
- Mapping Segments and Words into Syntactico-Semantic Space
- Recursive Neural Networks (RNN) for Structure Prediction
- Experiments
- Conclusion

Experiments
(figure slides with quantitative results, including a comparison with Superparsing [7])

[7] Tighe, J. and Lazebnik, S. Superparsing: scalable nonparametric image parsing with superpixels. In ECCV, 2010.

Outline
- Introduction
- Mapping Segments and Words into Syntactico-Semantic Space
- Recursive Neural Networks (RNN) for Structure Prediction
- Experiments
- Conclusion

Conclusion
We have introduced a recursive neural network architecture that can successfully merge image segments or natural language words based on deep learned semantic transformations of their original features. Our method outperforms state-of-the-art approaches in segmentation, annotation, and scene classification.