Recurrent Neural Networks for Natural Language Processing


1 Recurrent Neural Networks for Natural Language Processing
Presenter: Haotian

2 Roadmap
What are RNNs?
What can RNNs do?
How to train RNNs?
Extensions of RNNs

3 What are RNNs?
The idea behind RNNs is to make use of sequential information.
If we want to predict the next word in a sentence, it helps to know which words came before it.
We assume that:
Inputs are not independent of each other
Outputs depend on previous information
This is different from a feed-forward NN/CNN.
In other words, RNNs have "memory"!

4 What are RNNs?
x_t: input at step t
o_t: output at step t
s_t: hidden state at step t, which acts as the network's memory
The input x_t could be a vector representation of a word.
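To make the recurrence concrete, here is a minimal NumPy sketch of one vanilla RNN step, using one common formulation, s_t = tanh(U x_t + W s_(t-1)) and o_t = softmax(V s_t); the parameter names U, V, W match the later training slides, while all sizes and values are illustrative:

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    """One step of a vanilla RNN: update the hidden state, then emit an output distribution."""
    s_t = np.tanh(U @ x_t + W @ s_prev)   # new hidden state (the "memory")
    logits = V @ s_t
    o_t = np.exp(logits - logits.max())
    o_t /= o_t.sum()                      # softmax over the vocabulary
    return s_t, o_t

# Illustrative sizes: vocabulary of 10 words, hidden state of size 4.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 10))
W = rng.normal(scale=0.1, size=(4, 4))
V = rng.normal(scale=0.1, size=(10, 4))

s = np.zeros(4)                           # initial memory
x = np.eye(10)[3]                         # one-hot vector for word index 3
s, o = rnn_step(x, s, U, W, V)
print(o.shape, o.sum())                   # (10,) 1.0
```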

5 What can RNNs do?
Language Modeling and Generating Text
RNNs allow us to measure how likely a sentence is and to predict missing/next words as generative models; during training the target output is the next input word, i.e. o_t = x_(t+1).
Machine Translation
Given a sequence of words in a source language (e.g., German), we want to output a sequence of words in a target language (e.g., English).
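As an illustration (the sentence and tokens are made up), the language-modeling targets are simply the input sequence shifted by one position:

```python
# Build (input, target) pairs for language-model training by shifting the
# sequence: the target at step t is the word at step t+1.
sentence = ["<s>", "the", "cat", "sat", "down", "</s>"]
inputs  = sentence[:-1]   # ['<s>', 'the', 'cat', 'sat', 'down']
targets = sentence[1:]    # ['the', 'cat', 'sat', 'down', '</s>']
for x_t, y_t in zip(inputs, targets):
    print(f"input: {x_t:6s} -> target: {y_t}")
```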

6 What can RNNs do?
Speech Recognition
Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.
Generating Image Descriptions
Together with CNNs, RNNs have been used as part of a model to generate descriptions for unlabeled images.

7 How to train RNNs?
Backpropagation Through Time (BPTT)
Loss function: cross-entropy loss
Our goal is to calculate the gradients of the error with respect to our parameters U, V and W, and then learn good parameters using Stochastic Gradient Descent (SGD).

8 How to train RNNs?
Backpropagation Through Time (BPTT)
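As an illustration (not taken from the slides), the following NumPy sketch unrolls a small vanilla RNN with one-hot inputs, tanh hidden state, softmax outputs and cross-entropy loss, then walks backwards through the unrolled steps to accumulate the gradients for U, W and V, finishing with one SGD update. All sizes and names are illustrative:

```python
import numpy as np

def forward(x_idx, U, W, V):
    """Unroll the RNN over a sequence of word indices; return hidden states and softmax outputs."""
    H = W.shape[0]
    s = np.zeros((len(x_idx) + 1, H))            # s[-1] serves as the initial zero state
    o = []
    for t, x in enumerate(x_idx):
        s[t] = np.tanh(U[:, x] + W @ s[t - 1])   # U[:, x] is U applied to a one-hot input
        logits = V @ s[t]
        e = np.exp(logits - logits.max())
        o.append(e / e.sum())
    return s, np.array(o)

def bptt(x_idx, y_idx, U, W, V):
    """Backpropagation Through Time: gradients of the cross-entropy loss w.r.t. U, W, V."""
    s, o = forward(x_idx, U, W, V)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    ds_next = np.zeros(W.shape[0])               # gradient flowing back from later steps
    for t in reversed(range(len(x_idx))):
        delta_o = o[t].copy()
        delta_o[y_idx[t]] -= 1.0                 # d loss / d logits for softmax + cross-entropy
        dV += np.outer(delta_o, s[t])
        ds = V.T @ delta_o + ds_next             # gradient w.r.t. hidden state s_t
        dpre = ds * (1.0 - s[t] ** 2)            # backprop through tanh
        dU[:, x_idx[t]] += dpre
        dW += np.outer(dpre, s[t - 1])
        ds_next = W.T @ dpre                     # pass gradient on to step t-1
    return dU, dW, dV

# Tiny illustrative setup: vocabulary of 8 words, hidden size 4.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 8))
W = rng.normal(scale=0.1, size=(4, 4))
V = rng.normal(scale=0.1, size=(8, 4))
x_idx, y_idx = [1, 4, 2, 7], [4, 2, 7, 0]        # inputs and next-word targets
dU, dW, dV = bptt(x_idx, y_idx, U, W, V)
lr = 0.1
U, W, V = U - lr * dU, W - lr * dW, V - lr * dV  # one SGD step
```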

9 An example of language modeling
Language modeling on Reddit comments
Given Reddit comments, train a model to generate comments.

10 An example of language modeling
Vocabulary size = 8000
Hidden layer size = 100 (the "memory")
One-hot encoding of words
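For concreteness, these sizes give the following parameter shapes for the vanilla RNN of the earlier slides; the matrix names follow those slides and the code is purely illustrative:

```python
import numpy as np

vocab_size, hidden_size = 8000, 100

# One-hot encoding: each word becomes a vector of length 8000 with a single 1.
word_index = 42
x = np.zeros(vocab_size)
x[word_index] = 1.0

# Parameter shapes for the vanilla RNN:
U = np.zeros((hidden_size, vocab_size))   # input -> hidden:   100 x 8000
W = np.zeros((hidden_size, hidden_size))  # hidden -> hidden:  100 x 100
V = np.zeros((vocab_size, hidden_size))   # hidden -> output: 8000 x 100
print(U.size + W.size + V.size)           # 1,610,000 parameters in total
```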

11 An example of language modeling
Loss function: cross-entropy loss
Our goal is to find U, V and W that minimize the loss function for our training data.
Here we have N (15000) training examples and C (8000) classes.
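The usual cross-entropy loss for this setup (the formula is not taken from the slide) averages the negative log probability of the correct word over all N examples: L(y, o) = -(1/N) * sum_n y_n log o_n, where y_n is the true one-hot label and o_n the predicted distribution. A small NumPy illustration, with made-up numbers:

```python
import numpy as np

def cross_entropy(predictions, labels):
    """Mean cross-entropy: -1/N times the summed log probability of the true class."""
    n = len(labels)
    return -np.mean(np.log(predictions[np.arange(n), labels]))

# With C = 8000 classes, a model that predicts uniformly at random assigns
# probability 1/8000 to every word, giving a baseline loss of ln(8000) ~= 8.99;
# a trained model should do noticeably better.
C, N = 8000, 4
uniform = np.full((N, C), 1.0 / C)
labels = np.array([3, 17, 8, 4999])
print(cross_entropy(uniform, labels))   # ~8.987
```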

12 An example of language modeling
Generated text. Some good sentences:
"Anyway, to the city scene you're an idiot teenager."
"What ? ! ! ! ! ignore!"
"You're saying: https"
"Thanks for the advice to keep my thoughts around girls."
"Yep, please disappear with the terrible generation."
The vanishing gradient problem
A classic RNN is not able to learn dependencies between words that are several steps away.
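To see why gradients vanish: by the chain rule, the gradient reaching a step k positions in the past is a product of k per-step Jacobians, and when their norms are below 1 the product shrinks exponentially with distance. A small illustrative NumPy sketch (all numbers are made up, and the same Jacobian is reused at every step for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 4
W = rng.normal(scale=0.1, size=(H, H))   # small recurrent weights
s = rng.uniform(-0.9, 0.9, size=H)       # some hidden state values

# Jacobian of one tanh RNN step w.r.t. the previous hidden state:
# d s_t / d s_(t-1) = diag(1 - s_t^2) @ W
J = np.diag(1.0 - s ** 2) @ W

grad = np.ones(H)                         # pretend gradient at the last step
for k in range(1, 21):
    grad = J.T @ grad                     # push the gradient one step further back
    if k % 5 == 0:
        print(f"{k:2d} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```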

13 Extensions of RNNs: Long Short-Term Memory (LSTM)

14 Extensions of RNNs: Long Short-Term Memory
Core of the LSTM: its memory cell

15 Extensions of RNNs: Long Short-Term Memory
Forget gate: how much of the previous state you want to let through or forget.
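In one common LSTM formulation (notation assumed, not taken from the slide), the forget gate is computed as f_t = sigmoid(W_f x_t + U_f h_(t-1) + b_f); values near 0 drop the corresponding memory component, values near 1 keep it.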

16 Extensions of RNNs: Long Short-Term Memory
Input gate: how much of the newly computed state for the current input you want to let through.
New candidate: could be added to the current state.
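In the same formulation (again assumed, not from the slide), the input gate is i_t = sigmoid(W_i x_t + U_i h_(t-1) + b_i) and the new candidate is g_t = tanh(W_g x_t + U_g h_(t-1) + b_g).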

17 Extensions of RNNs: Long Short-Term Memory
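Combining the gates, the memory cell is then typically updated as c_t = f_t * c_(t-1) + i_t * g_t (elementwise products): the forget gate scales the old memory and the input gate scales the new candidate.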

18 Extensions of RNNs: Long Short-Term Memory
Output gate: how much of the internal state you want to expose to the external network.
Prediction: made from this exposed state.
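Putting the pieces together, here is a minimal NumPy sketch of one LSTM cell step under the formulation sketched above; the stacked weight layout and all sizes are illustrative choices, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step. Wx: (4H, D), Wh: (4H, H), b: (4H,) hold all four gates stacked."""
    H = h_prev.shape[0]
    z = Wx @ x_t + Wh @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2*H])          # input gate: how much new candidate to write
    g = np.tanh(z[2*H:3*H])        # new candidate values
    o = sigmoid(z[3*H:4*H])        # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # update the memory cell
    h_t = o * np.tanh(c_t)         # exposed hidden state, used for the prediction
    return h_t, c_t

# Illustrative sizes: input dimension 10, hidden size 4.
rng = np.random.default_rng(0)
D, H = 10, 4
Wx = rng.normal(scale=0.1, size=(4 * H, D))
Wh = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
x = np.eye(D)[3]                   # one-hot input, as in the earlier slides
h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape, c.shape)            # (4,) (4,)
```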

19 Extensions of RNNs: Bidirectional RNNs
Core idea: the output at time t may depend not only on the previous elements in the sequence, but also on future elements.
For example, to predict a missing word in a sequence, you want to look at both the left and the right context.
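A minimal sketch of the idea (all names and sizes are illustrative): run one vanilla RNN left-to-right and a second one right-to-left, then concatenate the two hidden states at each position so that every position sees both its left and right context. For text classification, as on the next slide, these concatenated states (or a pooled or final state) can be fed to a classifier.

```python
import numpy as np

def rnn_states(xs, U, W):
    """Hidden states of a vanilla tanh RNN over a sequence of input vectors."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

def bidirectional_states(xs, U_f, W_f, U_b, W_b):
    """Concatenate forward states with backward states computed on the reversed sequence."""
    forward = rnn_states(xs, U_f, W_f)
    backward = rnn_states(xs[::-1], U_b, W_b)[::-1]   # re-reverse to align positions
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

# Illustrative sizes: 5 one-hot words from a 10-word vocabulary, hidden size 4 per direction.
rng = np.random.default_rng(0)
xs = [np.eye(10)[i] for i in [1, 4, 2, 7, 3]]
U_f, W_f = rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))
U_b, W_b = rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))
states = bidirectional_states(xs, U_f, W_f, U_b, W_b)
print(len(states), states[0].shape)   # 5 (8,) -- each position sees left and right context
```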

20 Extensions of RNNs: Bidirectional RNNs
Bidirectional RNN for text classification

21 Extensions of RNNs: Deep (Bidirectional) RNNs
Core idea: multiple layers per time step give us a higher learning capacity.
And more training time...
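A minimal illustrative sketch of stacking: each layer's hidden-state sequence becomes the input sequence of the layer above it (the helper reuses the vanilla tanh RNN from the earlier sketches; all sizes are made up):

```python
import numpy as np

def rnn_states(xs, U, W):
    """Hidden states of a vanilla tanh RNN over a sequence of input vectors."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

def deep_rnn_states(xs, layers):
    """Stack RNN layers: each layer consumes the previous layer's hidden states."""
    seq = xs
    for U, W in layers:
        seq = rnn_states(seq, U, W)
    return seq

# Two layers: 10-dim one-hot inputs -> hidden size 4 -> hidden size 4.
rng = np.random.default_rng(0)
xs = [np.eye(10)[i] for i in [1, 4, 2, 7, 3]]
layers = [
    (rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))),  # layer 1
    (rng.normal(scale=0.1, size=(4, 4)),  rng.normal(scale=0.1, size=(4, 4))),  # layer 2
]
top_states = deep_rnn_states(xs, layers)
print(len(top_states), top_states[0].shape)   # 5 (4,)
```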

22 What’s more?

23 Any questions?

