
1 Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Richard Socher Cliff Chiung-Yu Lin Andrew Y. Ng Christopher D. Manning Slides & Speech: Rui Zhang

2 Outline
Motivation & Contribution
Recursive Neural Network
Scene Segmentation using RNN
Learning and Optimization
Language Parsing using RNN
Experiments

3 Motivation
Data naturally contains recursive structures
Image: scenes split into objects, and objects split into parts
Language: a noun phrase may contain a clause that contains noun phrases of its own

4 Motivation
The recursive structure helps to:
Identify the components of the data
Understand how the components interact to form the whole

5 Contribution
First deep learning method to achieve state-of-the-art performance on scene segmentation and annotation
Learned deep features outperform hand-crafted ones (e.g., Gist)
Can be generalized to other tasks, e.g., language parsing

6 Recursive Neural Network
Similar to a one-layer fully-connected network
Models the transformation from child nodes to their parent node
Applied recursively over a tree structure: the parent at one layer becomes a child at the layer above
Parameters are shared across layers
[Figure: child features c1 and c2 are combined through W_recur into the parent feature h]
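The parent computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the names `combine`, `W_recur`, and `b` are assumptions matching the slide's notation.

```python
import numpy as np

def combine(W_recur, b, c1, c2):
    """Compute the parent feature from two child features.

    The parent h = tanh(W_recur [c1; c2] + b) has the same
    dimensionality as each child, so the same weights can be
    reused at every level of the tree (shared parameters).
    """
    x = np.concatenate([c1, c2])      # stack the children into one vector
    return np.tanh(W_recur @ x + b)   # one fully-connected layer

# Tiny usage example with a 4-dimensional semantic space.
rng = np.random.default_rng(0)
n = 4
W_recur = rng.standard_normal((n, 2 * n))
b = np.zeros(n)
c1, c2 = rng.standard_normal(n), rng.standard_normal(n)

h = combine(W_recur, b, c1, c2)                       # parent of c1 and c2
h3 = combine(W_recur, b, h, rng.standard_normal(n))   # same weights one level up
```

Because the parent lives in the same space as its children, the second call reuses `W_recur` unchanged, which is exactly what lets the network recurse over trees of arbitrary depth.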

7 Recursive vs. Recurrent NN
Two different models are both called RNN: Recursive and Recurrent
Similar: both have shared parameters that are applied recursively
Different: Recursive NN applies to trees, while Recurrent NN applies to sequences
A Recurrent NN can be considered a Recursive NN on a chain-structured (one-way) tree

8 Scene Segmentation Pipeline
Over-segment the image into superpixels
Extract features from each superpixel
Map the features onto the semantic space
Enumerate the possible merges (pairs of adjacent nodes)
Compute a score for each merge with the RNN
Merge the pair of nodes with the highest score
Repeat until only one node is left

9 Input Data Representation
The image is over-segmented into superpixels
Hand-crafted features are extracted for each superpixel
The features are mapped onto the semantic space by one fully-connected layer to obtain a feature vector
Each superpixel has a class label

10 Tree Construction
Scene parse trees are constructed bottom-up
Leaf nodes are the over-segmented superpixels
Hand-crafted features are extracted and mapped onto the semantic space by one fully-connected layer
Each leaf has a feature vector
An adjacency matrix A records neighboring relations:
A_ij = 1 if i and j are neighbors, and 0 otherwise
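Building that adjacency matrix is straightforward; a minimal sketch, assuming the superpixel neighbor pairs have already been computed by the segmentation step (`adjacency_matrix` and `neighbor_pairs` are illustrative names, not from the slides):

```python
import numpy as np

def adjacency_matrix(num_segments, neighbor_pairs):
    """Build the symmetric matrix A with A[i, j] = 1 iff
    superpixels i and j share a boundary, else 0."""
    A = np.zeros((num_segments, num_segments), dtype=int)
    for i, j in neighbor_pairs:
        A[i, j] = A[j, i] = 1     # neighboring is symmetric
    return A

# Four superpixels arranged in a 2x2 grid (diagonals are not adjacent).
A = adjacency_matrix(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```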

11 Greedy Merging
Nodes are merged in a greedy style; in each iteration:
Enumerate all possible merges (pairs of adjacent nodes)
Compute a score for each possible merge: a fully-connected transformation applied to the parent feature h
Merge the pair with the highest score:
c1 and c2 are replaced by the new node c12
h12 becomes the feature of c12
The union of the neighbors of c1 and c2 becomes the neighbors of c12
Repeat until only one node is left
[Figure: c1 and c2 are combined through W_recur into h12, and W_score computes the merge score]
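The iteration above can be sketched as a small greedy loop. This is a simplified illustration of the procedure the slide describes, not the authors' implementation; `greedy_parse`, `score_fn`, and `combine_fn` are hypothetical names standing in for the RNN layers:

```python
import numpy as np

def greedy_parse(features, neighbors, score_fn, combine_fn):
    """Greedily merge adjacent nodes until one root remains.

    features:   dict node_id -> feature vector
    neighbors:  dict node_id -> set of adjacent node ids
    score_fn:   maps a parent feature h to a scalar merge score
    combine_fn: maps two child features to the parent feature h
    Returns the merge history as (child_a, child_b, parent) triples.
    """
    features = dict(features)
    neighbors = {k: set(v) for k, v in neighbors.items()}
    next_id = max(features) + 1
    history = []
    while len(features) > 1:
        # Enumerate all pairs of adjacent nodes and score each merge.
        pairs = {(a, b) for a in neighbors for b in neighbors[a] if a < b}
        a, b = max(pairs, key=lambda p: score_fn(
            combine_fn(features[p[0]], features[p[1]])))
        h = combine_fn(features[a], features[b])
        # The new node inherits the union of its children's neighbors.
        merged = (neighbors[a] | neighbors[b]) - {a, b}
        for n in (a, b):
            del features[n], neighbors[n]
        for n in merged:
            neighbors[n] -= {a, b}
            neighbors[n].add(next_id)
        features[next_id], neighbors[next_id] = h, merged
        history.append((a, b, next_id))
        next_id += 1
    return history

# Usage: three leaves in a chain, with toy stand-ins for the RNN.
feats = {0: np.array([1.0]), 1: np.array([2.0]), 2: np.array([3.0])}
adj = {0: {1}, 1: {0, 2}, 2: {1}}
combine = lambda c1, c2: (c1 + c2) / 2   # stand-in for the W_recur layer
score = lambda h: float(h.sum())         # stand-in for the W_score layer
hist = greedy_parse(feats, adj, score, combine)
```

Three leaves always produce exactly two merges; with these toy functions the pair (1, 2) scores highest and is merged first.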

12 Training (1)
Max-margin estimation
Structured margin loss Δ:
Penalizes merging a segment with one of a different label before it has merged with all of its neighbors of the same label
Measured as the number of subtrees that do not appear in the correct trees
Tree score s: sum of the merge scores over all non-leaf nodes
Class label: softmax over the node feature vector
Correct trees: adjacent nodes with the same label are merged first
One image may have more than one correct tree

13 Training (2)
Intuition: we want the score of the highest-scoring correct tree to be larger than the score of any other tree by a margin Δ
Formulation: the margin loss r_i(θ) is minimized
Notation:
d is a node in the parse tree
N(·) is the set of nodes of a tree
θ is the set of all model parameters
i is the index of a training image
x_i is training image i
l_i is the set of labels of x_i
Y(x_i, l_i) is the set of correct trees of x_i
T(x_i) is the set of all possible trees of x_i
Τ(x_i) is written T(x_i) above
s(·) is the tree score function
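Written out in the slide's notation, the margin loss takes the following form (a reconstruction consistent with this description; κ is a scaling hyperparameter on the per-node penalty):

```latex
r_i(\theta) \;=\; \max_{\hat{y} \,\in\, T(x_i)}
    \bigl( s(x_i, \hat{y}) + \Delta(x_i, l_i, \hat{y}) \bigr)
\;-\; \max_{y \,\in\, Y(x_i, l_i)} s(x_i, y)
```

with the structured margin counting the subtrees that do not appear in any correct tree:

```latex
\Delta(x_i, l_i, \hat{y}) \;=\; \kappa \sum_{d \,\in\, N(\hat{y})}
    \mathbf{1}\{\, \mathrm{subTree}(d) \notin Y(x_i, l_i) \,\}
```

Minimizing r_i(θ) pushes the best correct tree's score above every incorrect tree's score by at least its margin Δ.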

14 Training (3)
The label of a node is predicted by a softmax over its feature vector
The margin Δ is not differentiable, so only a subgradient is computed
∂s/∂θ is obtained by back-propagation
The gradient of the label prediction is also obtained by back-propagation
[Figure: c1 and c2 are combined through W_recur into h12; W_score computes the merge score and W_label predicts the label]
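The label prediction is an ordinary softmax layer on top of the node feature. A minimal sketch, assuming the slide's `W_label` maps the semantic space to class scores (the function names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_label(W_label, h):
    """Class probability distribution for a node with feature vector h."""
    return softmax(W_label @ h)

# Usage: an 8-class softmax over a 4-dimensional node feature.
rng = np.random.default_rng(1)
n_classes, n = 8, 4
W_label = rng.standard_normal((n_classes, n))
h = rng.standard_normal(n)
p = predict_label(W_label, h)
```

Because this head is a smooth function of h, its gradient flows back through the same tree as ∂s/∂θ during back-propagation.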

15 Language Parsing
Language parsing is similar to scene parsing
Differences:
The input is a natural language sentence
Adjacency is strictly left and right
Class labels are syntactic classes at the word, phrase, and clause level
Each sentence has only one correct tree

16 Experiments Overview
Image:
Scene Segmentation and Annotation
Scene Classification
Nearest Neighbor Scene Subtrees
Language:
Supervised Language Parsing
Nearest Neighbor Phrases

17 Scene Segmentation and Annotation
Dataset: Stanford Background Dataset
Task: segment the image and label foreground and the different types of background, pixelwise
Result: 78.1% pixelwise accuracy, 0.6% above the previous state of the art

18 Scene Classification
Dataset: Stanford Background Dataset
Task: three classes: city, countryside, sea-side
Method:
Feature: the average of all node features, or the top node feature only
Classifier: linear SVM
Result:
88.1% accuracy with the average feature, 4.1% above Gist, the state-of-the-art hand-crafted feature
71.0% accuracy with the top feature only
Discussion:
The learned RNN features better capture the semantic content of a scene
The top feature alone loses some lower-level information

19 Nearest Neighbor Scene Subtrees
Dataset: Stanford Background Dataset
Task: retrieve similar segments from all images; a subtree whose nodes all have the same label corresponds to a segment
Method:
Feature: top node feature of the subtree
Metric: Euclidean distance
Result: similar segments are retrieved
Discussion: the RNN features capture segment-level characteristics

20 Supervised Language Parsing
Dataset: Penn Treebank Wall Street Journal section
Task: generate parse trees with labeled nodes
Result: unlabeled bracketing F-measure of 90.29%, comparable to the 91.63% of the Berkeley Parser

21 Nearest Neighbor Phrases
Dataset: Penn Treebank Wall Street Journal section
Task: retrieve the nearest neighbors of a given sentence
Method:
Feature: top node feature
Metric: Euclidean distance
Result: similar sentences are retrieved

22 Discussion
Understanding the semantic structure of data is essential for applications like fine-grained search or captioning
Recursive NN predicts the tree structure along with node labels in an elegant way
Recursive NN can be incorporated with CNNs
If we can jointly learn the Recursive NN with

