Recursive Neural Networks

Recursive Neural Networks Hazem Nomer

Introduction Recursive neural networks are non-linear adaptive models that learn deep, structured information. They operate on structured input (e.g., binary trees, graphs, sequences). Put simply, the hidden units of the network take the same shape as the input structure (the tree). They are a generalization of recurrent neural networks and have been applied to parsing, sentiment analysis, and protein structure prediction.

Recurrent vs. Recursive The operation of a recursive neural network: black, orange, and red dots represent the input, hidden, and output layers, respectively. We begin by computing the representation of each word vector (the leaf nodes) and then compute the internal nodes. Figure (C) shows a recurrent neural network unfolded through time.

Recurrent vs. Recursive Recurrent neural networks are feed-forward neural networks with recurrent edges that span adjacent time steps: the activation of a unit at one time step is stored and fed back as an input at the next time step.
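
As a concrete illustration of that recurrent update, here is a minimal NumPy sketch (the names recurrent_step, W_xh, W_hh are illustrative, not from the slides): the hidden state produced at one step is stored and fed back in at the next step.

```python
import numpy as np

def recurrent_step(x_t, h_prev, W_xh, W_hh, b):
    # The previous hidden state h_prev is the stored activation that is
    # fed back as an extra input alongside the current input x_t.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b)

def run_rnn(xs, W_xh, W_hh, b):
    # Unroll over a sequence: each step reuses the state from the previous one.
    h = np.zeros(W_hh.shape[0])
    for x_t in xs:
        h = recurrent_step(x_t, h, W_xh, W_hh, b)
    return h
```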

Recursive Neural Network Given a positional directed acyclic graph, a recursive neural network visits the nodes in topological order and recursively applies transformations to generate further representations from the previously computed representations of the children. Given a binary tree structure whose leaves carry initial representations, e.g. a parse tree with word vector representations at the leaves, a recursive neural network computes the representation at each internal node η as

x_η = f(W_L x_{l(η)} + W_R x_{r(η)} + b)

where l(η) and r(η) are the left and right children of η, W_L and W_R are the weight matrices that connect the left and right children to the parent, b is a bias vector, and f is an elementwise nonlinearity.
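
To make the composition concrete, here is a minimal NumPy sketch of the bottom-up computation, assuming a binary tree whose leaves already carry word vectors (the Node class and parameter names are illustrative, not from the slides):

```python
import numpy as np

class Node:
    def __init__(self, vector=None, left=None, right=None):
        self.vector = vector          # set for leaves (e.g., a word vector)
        self.left, self.right = left, right

def compose(node, W_L, W_R, b, f=np.tanh):
    """Compute x_eta = f(W_L x_{l(eta)} + W_R x_{r(eta)} + b) bottom-up."""
    if node.left is None and node.right is None:
        return node.vector            # leaf: initial representation
    x_left = compose(node.left, W_L, W_R, b, f)
    x_right = compose(node.right, W_L, W_R, b, f)
    node.vector = f(W_L @ x_left + W_R @ x_right + b)
    return node.vector
```

Calling compose on the root fills in the representation of every internal node and returns the root vector.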

Recursive Neural Network The previous definition shows that the initial representations and the intermediate representations lie in the same space. A task-specific output layer then sits above the representation layer. As an example, for sentiment classification, y_η is the predicted sentiment label of the phrase given by the subtree rooted at η. Thus, during supervised learning, external errors are incurred at the outputs y and backpropagated from the root toward the leaves.

Untying Leaves and Internals The previous definition treats leaves and internal nodes the same way (words and phrases lie in the same meaning space). We can instead use an untied version that distinguishes between leaf nodes and internal nodes. Benefits: small but powerful models can be trained using pretrained word vectors with a large dimensionality, and separating leaves from internal nodes allows rectifiers to be used in a more natural manner. A sketch of this untied variant follows.
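
A hedged sketch of the untied variant, reusing the Node class from the earlier sketch: leaves project a (possibly high-dimensional, pretrained) word vector into the hidden space with their own matrix, while internal nodes compose their children as before. W_v, c, and the use of rectifiers here are illustrative choices, not notation from the slides.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def compose_untied(node, W_L, W_R, W_v, b, c):
    """Untied composition: leaves and internal nodes use separate weights."""
    if node.left is None and node.right is None:
        node.hidden = relu(W_v @ node.vector + c)        # leaf: project word vector
    else:
        h_l = compose_untied(node.left,  W_L, W_R, W_v, b, c)
        h_r = compose_untied(node.right, W_L, W_R, W_v, b, c)
        node.hidden = relu(W_L @ h_l + W_R @ h_r + b)    # internal node
    return node.hidden
```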

Deep Recursive Neural Network Recursive neural networks are deep in structure, but they lack a hierarchical interpretation of the data. In stacked deep learners, depth means a hierarchy among hidden representations: every hidden layer lies in a different representation space and is a more abstract representation of the input than the previous layer. A deep recursive neural network is constructed by stacking multiple layers of individual recursive nets:

h_η^(i) = f(W_L^(i) h_{l(η)}^(i) + W_R^(i) h_{r(η)}^(i) + V^(i) h_η^(i−1) + b^(i))

where i indexes the stacked layers and V^(i) is the weight matrix that connects the (i − 1)th hidden layer to the ith hidden layer. For prediction, the output layer is connected only to the final hidden layer. Learning in a deep recursive neural network is done by backpropagation through structure: a node receives error terms both from its parent (through structure) and from its counterpart in the higher layer (through space).
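
A sketch of the stacked computation, under the assumptions that leaves feed their word vectors into the first layer through V and that every node keeps one hidden vector per layer; parameter names and the rectifier choice are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def deep_compose(node, layers):
    """layers: list of dicts with matrices W_L, W_R, V and bias b, one per
    stacked layer. node.hidden becomes a list with one vector per layer."""
    is_leaf = node.left is None
    if not is_leaf:
        deep_compose(node.left, layers)
        deep_compose(node.right, layers)
    node.hidden = []
    below = node.vector if is_leaf else None   # word vector feeds a leaf's first layer
    for i, L in enumerate(layers):
        h = L["b"].copy()
        if below is not None:
            h = h + L["V"] @ below             # connection from the layer below (V^(i))
        if not is_leaf:                        # children's same-layer hidden vectors
            h = h + L["W_L"] @ node.left.hidden[i] + L["W_R"] @ node.right.hidden[i]
        node.hidden.append(relu(h))
        below = node.hidden[i]                 # feeds the next layer up the stack
    return node.hidden[-1]
```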

Deep Recursive Neural Network

Deep vs. Shallow Recursive Neural Networks In a shallow recursive neural network, a single layer is responsible for learning a representation of composition that is both useful and sufficient for the final decision. In a deep recursive neural network, a layer can learn some parts of the composition and pass this intermediate representation to the next layer, which carries out further processing for the remaining parts of the overall composition. Irsoy and Cardie (2014) showed that deep recursive neural networks outperform shallow recursive nets of the same size on the fine-grained sentiment prediction task on the Stanford Sentiment Treebank, and outperform multiplicative recursive neural network variants, achieving new state-of-the-art performance on the task.

Deep Recursive Neural Networks for Natural Language Compositionality The Stanford Sentiment Treebank (SST) includes labels for 215,154 phrases in the parse trees of 11,855 sentences, with an average sentence length of 19.1 tokens. Real-valued sentiment labels are converted to integer ordinal labels in {0, . . . , 4} by simple thresholding, so the supervised task is posed as a 5-class classification problem. The output layer uses the standard softmax activation, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), and the hidden layers use the rectifier linear activation f(x) = max{0, x}. Experimentally, the rectifier activation gives better performance, faster convergence, and sparse representations.
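
The two activations named above, plus a per-node prediction step, in a short NumPy sketch (U and c denote the output-layer weights and bias; the names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def relu(x):
    return np.maximum(0.0, x)         # f(x) = max{0, x}

def predict_label(h, U, c):
    # 5-class sentiment prediction from a node's hidden vector h.
    return int(np.argmax(softmax(U @ h + c)))
```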

Deep Recursive Neural Networks for Natural Language Compositionality Regularization: dropout is used, with the dropout rate (the probability of a neuron being dropped) chosen from {0, 0.1, 0.5}. Dropout prevents learned features from co-adapting. Note that dropped units are shared: for a single sentence and a given layer, the same units of the hidden layer are dropped at every node. Training: stochastic gradient descent with a learning rate of 0.01 and AdaGrad. The recursive weights within a layer (W_hh) are initialized as 0.5I + ε, where ε is small uniformly random noise. This means that, initially, the representation of each node is approximately the mean of its two children. All other weights are initialized with small random values.
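
A sketch of the initialization and the shared dropout mask described above; the noise magnitude, the uniform distribution for the other weights, and the inverted-dropout scaling are assumptions, not values given on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_recursive_weights(d, noise=0.01):
    # W_hh = 0.5*I + epsilon: a node initially looks like the mean of its
    # two children, 0.5*x_left + 0.5*x_right. (Noise scale is an assumption.)
    return 0.5 * np.eye(d) + rng.uniform(-noise, noise, size=(d, d))

def init_other_weights(fan_out, fan_in, scale=0.01):
    # Small random values for the remaining matrices (distribution assumed).
    return rng.uniform(-scale, scale, size=(fan_out, fan_in))

def shared_dropout_mask(d, rate):
    # One mask per (sentence, layer): the same hidden units are dropped at
    # every node of the tree. Inverted-dropout scaling is a common convention.
    if rate == 0.0:
        return np.ones(d)
    return (rng.uniform(size=d) >= rate).astype(float) / (1.0 - rate)
```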

Deep Recursive Neural Network

Deep Recursive Neural Network

Long Short-Term Memory Over Recursive Structures Introduced by Hochreiter and Schmidhuber, the LSTM overcomes the problem of vanishing gradients. It is the same as a standard recurrent neural network with a hidden layer, except that each ordinary node in the hidden layer is replaced by a memory cell. Each memory cell contains a node with a self-connected recurrent edge of fixed weight one, ensuring that the gradient can pass across many time steps without vanishing or exploding. The LSTM model thus introduces an intermediate type of storage via the memory cell.

Long Short-Term Memory Over Recursive Structures On the left, the long short-term memory cell as introduced by Hochreiter and Schmidhuber. On the right, a long short-term memory cell with a forget gate, as introduced by Gers et al.; the forget gate is used to flush the contents of the internal state.
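
For reference, a minimal NumPy sketch of one step of the sequential LSTM with a forget gate; stacking the gate parameters into single matrices W, U and bias b is an implementation convenience, not notation from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the stacked parameters of the input (i),
    forget (f), output (o) gates and the candidate update (g), i.e. 4*d rows."""
    z = W @ x_t + U @ h_prev + b
    d = h_prev.shape[0]
    i = sigmoid(z[0:d])
    f = sigmoid(z[d:2*d])          # forget gate (Gers et al.) can flush the cell
    o = sigmoid(z[2*d:3*d])
    g = np.tanh(z[3*d:4*d])
    c = f * c_prev + i * g         # self-connected cell: additive path across time
    h = o * np.tanh(c)
    return h, c
```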

Long Short-Term Memory Over Recursive Structures A recurrent neural network with a hidden layer consisting of two memory cells. The network is shown unfolded across two time steps.

Long Short-Term Memory Over Recursive Structures Recursion is a fundamental process associated with many problems; a recursive process and the structure it forms are common across different modalities. The semantics of sentences in human languages is arguably not just a linear concatenation of words: sentences often have structure. Image understanding, as another example, may benefit from recursive modeling over structures. Zhu et al. extended the LSTM to tree structures, learning memory blocks that can reflect the history memories of multiple child cells and multiple descendant cells. They call the model S-LSTM.

Long Short-Term Memory Over Recursive Structures An example of S-LSTM, a long short-term memory network over tree structures. A tree node can consider information from multiple descendants; information from the other nodes (shown in white) is blocked. The small circle ("◦") or short line ("−") at each arrowhead indicates a pass or block of information, respectively; in the real model the gating is soft rather than hard.

Long Short-Term Memory Over Recursive Structures Each node in the network is composed of an S-LSTM memory block. Each memory block contains one input gate and one output gate; the number of forget gates depends on the structure (the number of children of a node). The hidden vectors of the two children, denoted h^L_{t−1} for the left child and h^R_{t−1} for the right child, are inputs to the current block. The input gate i_t considers four sources of information: the hidden vectors and the cell vectors of the two children. The left and right forget gates can be controlled independently, allowing selective pass-through of information from the children's cell vectors.

Long Short-Term Memory Over Recursive Structures The output gate o_t considers the hidden vectors from the children and the current cell vector. The hidden vector h_t and the cell vector c_t of the current block are passed to the parent and are used depending on whether the current block is the left or right child of its parent. Because the memory cell is formed by merging the gated cell vectors of the children, it can reflect multiple direct or indirect descendant cells, so long-distance interplay over the structure can be captured.
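
Putting the gates together, here is a hedged NumPy sketch of one S-LSTM memory block in the spirit of Zhu et al.; the parameter dictionary P and its key names are illustrative, not the paper's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def s_lstm_block(hL, cL, hR, cR, P):
    """One S-LSTM memory block: one input gate, one forget gate per child,
    and one output gate. Both children contribute hidden and cell vectors."""
    i  = sigmoid(P["Wi_hL"] @ hL + P["Wi_hR"] @ hR +
                 P["Wi_cL"] @ cL + P["Wi_cR"] @ cR + P["bi"])
    fL = sigmoid(P["WfL_hL"] @ hL + P["WfL_hR"] @ hR +
                 P["WfL_cL"] @ cL + P["WfL_cR"] @ cR + P["bfL"])
    fR = sigmoid(P["WfR_hL"] @ hL + P["WfR_hR"] @ hR +
                 P["WfR_cL"] @ cL + P["WfR_cR"] @ cR + P["bfR"])
    u  = np.tanh(P["Wu_hL"] @ hL + P["Wu_hR"] @ hR + P["bu"])   # candidate input
    c  = fL * cL + fR * cR + i * u     # merge gated children cells (Hadamard products)
    o  = sigmoid(P["Wo_hL"] @ hL + P["Wo_hR"] @ hR + P["Wo_c"] @ c + P["bo"])
    h  = o * np.tanh(c)
    return h, c                        # passed up to the parent block
```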

Long Short-Term Memory Over Recursive Structures An S-LSTM memory block, consisting of an input gate, two forget gates, and an output gate. Hidden vectors h^∗_{t−1} and cell vectors c^∗_{t−1} from the left (red arrows) and right (blue arrows) children are used to compute c_t and h_t. ⊗ denotes a Hadamard product.

Long Short-Term Memory Over Recursive Structures

Long Short-Term Memory Over Recursive Structures During training, the gradient of the objective function with respect to each parameter can be computed efficiently via backpropagation over structures. Training uses LSTM-like backpropagation, except that, unlike in a regular LSTM, the error passed down must be distributed between the left and right children.