
1 Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning Slides & Speech: Rui Zhang

2 Outline
Motivation & Contribution
Recursive Neural Network
Scene Segmentation using RNN
Learning and Optimization
Language Parsing using RNN
Experiments

3 Motivation
Data naturally contains recursive structures
Image: scenes split into objects, and objects split into parts
Language: a noun phrase may contain a clause that contains noun phrases of its own

4 Motivation
The recursive structure helps to:
Identify the components of the data
Understand how the components interact to form the whole

5 Contribution
First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation
Learned deep features outperform hand-crafted ones (e.g., Gist)
Can be generalized to other tasks, e.g., language parsing

6 Recursive Neural Network
Similar to a one-layer fully-connected network
Models the transformation from child nodes to their parent node
Applied recursively over a tree structure: the parent at one layer becomes a child at the layer above
Parameters are shared across layers
[Figure: child features c1 and c2 are combined through W_recur into the parent feature h]
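The parent computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the names `combine`, `W_recur`, and `b` are assumptions matching the slide's notation.

```python
import numpy as np

def combine(W_recur, b, c1, c2):
    """Compute the parent feature from two child features.

    The parent h = tanh(W_recur [c1; c2] + b) has the same
    dimensionality as each child, so the same weights can be
    reused at every level of the tree (shared parameters).
    """
    x = np.concatenate([c1, c2])      # stack the children into one vector
    return np.tanh(W_recur @ x + b)   # one fully-connected layer

# Tiny usage example with a 4-dimensional semantic space.
rng = np.random.default_rng(0)
n = 4
W_recur = rng.standard_normal((n, 2 * n))
b = np.zeros(n)
c1, c2 = rng.standard_normal(n), rng.standard_normal(n)

h = combine(W_recur, b, c1, c2)                       # parent of c1 and c2
h3 = combine(W_recur, b, h, rng.standard_normal(n))   # same weights one level up
```

Because the parent lives in the same space as its children, the second call reuses `W_recur` unchanged, which is exactly what lets the network recurse over trees of arbitrary depth.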

7 Recursive vs. Recurrent NN
Two different models are both called RNN: Recursive and Recurrent
Similar: both have shared parameters that are applied recursively
Different: Recursive NN applies to trees, while Recurrent NN applies to sequences
A Recurrent NN can be considered a Recursive NN on a chain-structured (one-way) tree

8 Scene Segmentation Pipeline
Over-segment the image into superpixels
Extract features from each superpixel
Map the features onto the semantic space
Enumerate the possible merges (pairs of adjacent nodes)
Compute a score for each merge with the RNN
Merge the pair of nodes with the highest score
Repeat until only one node is left

9 Input Data Representation
The image is over-segmented into superpixels
Hand-crafted features are extracted for each superpixel
The features are mapped onto the semantic space by one fully-connected layer to obtain a feature vector
Each superpixel has a class label

10 Tree Construction
Scene parse trees are constructed bottom-up
Leaf nodes are the over-segmented superpixels
Hand-crafted features are extracted and mapped onto the semantic space by one fully-connected layer
Each leaf has a feature vector
An adjacency matrix A records neighboring relations:
A_ij = 1 if i and j are neighbors, and 0 otherwise
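Building that adjacency matrix is straightforward; a minimal sketch, assuming the superpixel neighbor pairs have already been computed by the segmentation step (`adjacency_matrix` and `neighbor_pairs` are illustrative names, not from the slides):

```python
import numpy as np

def adjacency_matrix(num_segments, neighbor_pairs):
    """Build the symmetric matrix A with A[i, j] = 1 iff
    superpixels i and j share a boundary, else 0."""
    A = np.zeros((num_segments, num_segments), dtype=int)
    for i, j in neighbor_pairs:
        A[i, j] = A[j, i] = 1     # neighboring is symmetric
    return A

# Four superpixels arranged in a 2x2 grid (diagonals are not adjacent).
A = adjacency_matrix(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```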

11 Greedy Merging
Nodes are merged in a greedy style; in each iteration:
Enumerate all possible merges (pairs of adjacent nodes)
Compute a score for each possible merge: a fully-connected transformation applied to the parent feature h
Merge the pair with the highest score:
c1 and c2 are replaced by the new node c12
h12 becomes the feature of c12
The union of the neighbors of c1 and c2 becomes the neighbors of c12
Repeat until only one node is left
[Figure: c1 and c2 are combined through W_recur into h12, and W_score computes the merge score]
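The iteration above can be sketched as a small greedy loop. This is a simplified illustration of the procedure the slide describes, not the authors' implementation; `greedy_parse`, `score_fn`, and `combine_fn` are hypothetical names standing in for the RNN layers:

```python
import numpy as np

def greedy_parse(features, neighbors, score_fn, combine_fn):
    """Greedily merge adjacent nodes until one root remains.

    features:   dict node_id -> feature vector
    neighbors:  dict node_id -> set of adjacent node ids
    score_fn:   maps a parent feature h to a scalar merge score
    combine_fn: maps two child features to the parent feature h
    Returns the merge history as (child_a, child_b, parent) triples.
    """
    features = dict(features)
    neighbors = {k: set(v) for k, v in neighbors.items()}
    next_id = max(features) + 1
    history = []
    while len(features) > 1:
        # Enumerate all pairs of adjacent nodes and score each merge.
        pairs = {(a, b) for a in neighbors for b in neighbors[a] if a < b}
        a, b = max(pairs, key=lambda p: score_fn(
            combine_fn(features[p[0]], features[p[1]])))
        h = combine_fn(features[a], features[b])
        # The new node inherits the union of its children's neighbors.
        merged = (neighbors[a] | neighbors[b]) - {a, b}
        for n in (a, b):
            del features[n], neighbors[n]
        for n in merged:
            neighbors[n] -= {a, b}
            neighbors[n].add(next_id)
        features[next_id], neighbors[next_id] = h, merged
        history.append((a, b, next_id))
        next_id += 1
    return history

# Usage: three leaves in a chain, with toy stand-ins for the RNN.
feats = {0: np.array([1.0]), 1: np.array([2.0]), 2: np.array([3.0])}
adj = {0: {1}, 1: {0, 2}, 2: {1}}
combine = lambda c1, c2: (c1 + c2) / 2   # stand-in for the W_recur layer
score = lambda h: float(h.sum())         # stand-in for the W_score layer
hist = greedy_parse(feats, adj, score, combine)
```

Three leaves always produce exactly two merges; with these toy functions the pair (1, 2) scores highest and is merged first.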

12 Training (1)
Max-margin estimation
Structured margin loss Δ:
Penalizes merging a segment with one of a different label before it has merged with all of its neighbors of the same label
Measured as the number of subtrees that do not appear in the correct trees
Tree score s: sum of the merge scores over all non-leaf nodes
Class label: softmax over the node feature vector
Correct trees: adjacent nodes with the same label are merged first
One image may have more than one correct tree

13 Training (2)
Intuition: we want the score of the highest-scoring correct tree to be larger than the score of any other tree by a margin Δ
Formulation: the margin loss r_i(θ) is minimized
Notation:
d is a node in the parse tree
N(·) is the set of nodes of a tree
θ is the set of all model parameters
i is the index of a training image
x_i is training image i
l_i is the set of labels of x_i
Y(x_i, l_i) is the set of correct trees of x_i
T(x_i) is the set of all possible trees of x_i
Τ(x_i) is written T(x_i) above
s(·) is the tree score function
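Written out in the slide's notation, the margin loss takes the following form (a reconstruction consistent with this description; κ is a scaling hyperparameter on the per-node penalty):

```latex
r_i(\theta) \;=\; \max_{\hat{y} \,\in\, T(x_i)}
    \bigl( s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y}) \bigr)
\;-\; \max_{y \,\in\, Y(x_i, l_i)} s(x_i, y)
```

with the structured margin counting the subtrees that do not appear in any correct tree:

```latex
\Delta(x_i, l_i, \hat{y}) \;=\; \kappa \sum_{d \,\in\, N(\hat{y})}
    \mathbf{1}\{\, \mathrm{subTree}(d) \notin Y(x_i, l_i) \,\}
```

Minimizing r_i(θ) pushes the best correct tree's score above every incorrect tree's score by at least its margin Δ.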

14 Training (3)
The label of a node is predicted by a softmax over its feature vector
The margin Δ is not differentiable, so only a subgradient is computed
∂s/∂θ is obtained by back-propagation
The gradient of the label prediction is also obtained by back-propagation
[Figure: c1 and c2 are combined through W_recur into h12; W_score computes the merge score and W_label predicts the label]
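The label prediction is an ordinary softmax layer on top of the node feature. A minimal sketch, assuming the slide's `W_label` maps the semantic space to class scores (the function names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_label(W_label, h):
    """Class probability distribution for a node with feature vector h."""
    return softmax(W_label @ h)

# Usage: an 8-class softmax over a 4-dimensional node feature.
rng = np.random.default_rng(1)
n_classes, n = 8, 4
W_label = rng.standard_normal((n_classes, n))
h = rng.standard_normal(n)
p = predict_label(W_label, h)
```

Because this head is a smooth function of h, its gradient flows back through the same tree as ∂s/∂θ during back-propagation.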

15 Language Parsing
Language parsing is similar to scene parsing
Differences:
The input is a natural language sentence
Adjacency is strictly left and right
Class labels are syntactic classes at the word, phrase, and clause level
Each sentence has only one correct tree

16 Experiments Overview
Image:
Scene Segmentation and Annotation
Scene Classification
Nearest Neighbor Scene Subtrees
Language:
Supervised Language Parsing
Nearest Neighbor Phrases

17 Scene Segmentation and Annotation
Dataset: Stanford Background Dataset
Task: segment the image and label foreground and the different types of background, pixelwise
Result: 78.1% pixelwise accuracy, 0.6% above the previous state of the art

18 Scene Classification
Dataset: Stanford Background Dataset
Task: three classes: city, countryside, sea-side
Method:
Feature: the average of all node features, or the top node feature only
Classifier: linear SVM
Result:
88.1% accuracy with the average feature, 4.1% above Gist, the state-of-the-art hand-crafted feature
71.0% accuracy with the top feature only
Discussion:
The learned RNN features better capture the semantic content of a scene
The top feature alone loses some lower-level information

19 Nearest Neighbor Scene Subtrees
Dataset: Stanford Background Dataset
Task: retrieve similar segments from all images; a subtree whose nodes all have the same label corresponds to a segment
Method:
Feature: top node feature of the subtree
Metric: Euclidean distance
Result: similar segments are retrieved
Discussion: the RNN features capture segment-level characteristics

20 Supervised Language Parsing
Dataset: Penn Treebank Wall Street Journal section
Task: generate parse trees with labeled nodes
Result: unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser

21 Nearest Neighbor Phrases
Dataset: Penn Treebank Wall Street Journal section
Task: retrieve the nearest neighbors of a given sentence
Method:
Feature: top node feature
Metric: Euclidean distance
Result: similar sentences are retrieved

22 Discussion
Understanding the semantic structure of data is essential for applications like fine-grained search or captioning
Recursive NN predicts the tree structure along with node labels in an elegant way
Recursive NN can be incorporated with CNNs
If we can jointly learn the Recursive NN with

