Recurrent Neural Networks for Natural Language Processing


1 Recurrent Neural Networks for Natural Language Processing
Presenter: Haotian

2 Roadmap
What are RNNs?
What can RNNs do?
How to train RNNs?
Extensions of RNNs

3 What are RNNs?
The idea behind RNNs is to make use of sequential information.
If we want to predict the next word in a sentence, it helps to know which words came before it.
We assume that:
Inputs are not independent of each other
Outputs depend on previous information
This is different from a feed-forward NN/CNN.
In other words, RNNs have "memory"!

4 What are RNNs?
x_t: input at step t
o_t: output at step t
s_t: hidden state at step t, which acts as the network's memory
The input x_t could be a vector representation of a word.
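To make the recurrence concrete, here is a minimal NumPy sketch of one vanilla RNN step, using one common formulation, s_t = tanh(U x_t + W s_(t-1)) and o_t = softmax(V s_t); the parameter names U, V, W match the later training slides, while all sizes and values are illustrative:

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V):
    """One step of a vanilla RNN: update the hidden state, then emit an output distribution."""
    s_t = np.tanh(U @ x_t + W @ s_prev)   # new hidden state (the "memory")
    logits = V @ s_t
    o_t = np.exp(logits - logits.max())
    o_t /= o_t.sum()                      # softmax over the vocabulary
    return s_t, o_t

# Illustrative sizes: vocabulary of 10 words, hidden state of size 4.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 10))
W = rng.normal(scale=0.1, size=(4, 4))
V = rng.normal(scale=0.1, size=(10, 4))

s = np.zeros(4)                           # initial memory
x = np.eye(10)[3]                         # one-hot vector for word index 3
s, o = rnn_step(x, s, U, W, V)
print(o.shape, o.sum())                   # (10,) 1.0
```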

5 What can RNNs do?
Language Modeling and Generating Text
RNNs allow us to measure how likely a sentence is and to predict missing/next words as generative models; during training the target output is the next input word, i.e. o_t = x_(t+1).
Machine Translation
Given a sequence of words in a source language (e.g., German), we want to output a sequence of words in a target language (e.g., English).
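As an illustration (the sentence and tokens are made up), the language-modeling targets are simply the input sequence shifted by one position:

```python
# Build (input, target) pairs for language-model training by shifting the
# sequence: the target at step t is the word at step t+1.
sentence = ["<s>", "the", "cat", "sat", "down", "</s>"]
inputs  = sentence[:-1]   # ['<s>', 'the', 'cat', 'sat', 'down']
targets = sentence[1:]    # ['the', 'cat', 'sat', 'down', '</s>']
for x_t, y_t in zip(inputs, targets):
    print(f"input: {x_t:6s} -> target: {y_t}")
```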

6 What can RNNs do?
Speech Recognition
Given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities.
Generating Image Descriptions
Together with CNNs, RNNs have been used as part of a model to generate descriptions for unlabeled images.

7 How to train RNNs?
Backpropagation Through Time (BPTT)
Loss function: cross-entropy loss
Our goal is to calculate the gradients of the error with respect to our parameters U, V and W, and then learn good parameters using Stochastic Gradient Descent (SGD).

8 How to train RNNs?
Backpropagation Through Time (BPTT)
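As an illustration (not taken from the slides), the following NumPy sketch unrolls a small vanilla RNN with one-hot inputs, tanh hidden state, softmax outputs and cross-entropy loss, then walks backwards through the unrolled steps to accumulate the gradients for U, W and V, finishing with one SGD update. All sizes and names are illustrative:

```python
import numpy as np

def forward(x_idx, U, W, V):
    """Unroll the RNN over a sequence of word indices; return hidden states and softmax outputs."""
    H = W.shape[0]
    s = np.zeros((len(x_idx) + 1, H))            # s[-1] serves as the initial zero state
    o = []
    for t, x in enumerate(x_idx):
        s[t] = np.tanh(U[:, x] + W @ s[t - 1])   # U[:, x] is U applied to a one-hot input
        logits = V @ s[t]
        e = np.exp(logits - logits.max())
        o.append(e / e.sum())
    return s, np.array(o)

def bptt(x_idx, y_idx, U, W, V):
    """Backpropagation Through Time: gradients of the cross-entropy loss w.r.t. U, W, V."""
    s, o = forward(x_idx, U, W, V)
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    ds_next = np.zeros(W.shape[0])               # gradient flowing back from later steps
    for t in reversed(range(len(x_idx))):
        delta_o = o[t].copy()
        delta_o[y_idx[t]] -= 1.0                 # d loss / d logits for softmax + cross-entropy
        dV += np.outer(delta_o, s[t])
        ds = V.T @ delta_o + ds_next             # gradient w.r.t. hidden state s_t
        dpre = ds * (1.0 - s[t] ** 2)            # backprop through tanh
        dU[:, x_idx[t]] += dpre
        dW += np.outer(dpre, s[t - 1])
        ds_next = W.T @ dpre                     # pass gradient on to step t-1
    return dU, dW, dV

# Tiny illustrative setup: vocabulary of 8 words, hidden size 4.
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(4, 8))
W = rng.normal(scale=0.1, size=(4, 4))
V = rng.normal(scale=0.1, size=(8, 4))
x_idx, y_idx = [1, 4, 2, 7], [4, 2, 7, 0]        # inputs and next-word targets
dU, dW, dV = bptt(x_idx, y_idx, U, W, V)
lr = 0.1
U, W, V = U - lr * dU, W - lr * dW, V - lr * dV  # one SGD step
```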

9 An example of language modeling
Language modeling on Reddit comments
Given Reddit comments, train a model to generate comments.

10 An example of language modeling
Vocabulary size = 8000
Hidden layer size = 100 (the "memory")
One-hot encoding of words
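For concreteness, these sizes give the following parameter shapes for the vanilla RNN of the earlier slides; the matrix names follow those slides and the code is purely illustrative:

```python
import numpy as np

vocab_size, hidden_size = 8000, 100

# One-hot encoding: each word becomes a vector of length 8000 with a single 1.
word_index = 42
x = np.zeros(vocab_size)
x[word_index] = 1.0

# Parameter shapes for the vanilla RNN:
U = np.zeros((hidden_size, vocab_size))   # input -> hidden:   100 x 8000
W = np.zeros((hidden_size, hidden_size))  # hidden -> hidden:  100 x 100
V = np.zeros((vocab_size, hidden_size))   # hidden -> output: 8000 x 100
print(U.size + W.size + V.size)           # 1,610,000 parameters in total
```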

11 An example of language modeling
Loss function: cross-entropy loss
Our goal is to find U, V and W that minimize the loss function for our training data.
Here we have N (15000) training examples and C (8000) classes.
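The usual cross-entropy loss for this setup (the formula is not taken from the slide) averages the negative log probability of the correct word over all N examples: L(y, o) = -(1/N) * sum_n y_n log o_n, where y_n is the true one-hot label and o_n the predicted distribution. A small NumPy illustration, with made-up numbers:

```python
import numpy as np

def cross_entropy(predictions, labels):
    """Mean cross-entropy: -1/N times the summed log probability of the true class."""
    n = len(labels)
    return -np.mean(np.log(predictions[np.arange(n), labels]))

# With C = 8000 classes, a model that predicts uniformly at random assigns
# probability 1/8000 to every word, giving a baseline loss of ln(8000) ~= 8.99;
# a trained model should do noticeably better.
C, N = 8000, 4
uniform = np.full((N, C), 1.0 / C)
labels = np.array([3, 17, 8, 4999])
print(cross_entropy(uniform, labels))   # ~8.987
```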

12 An example of language modeling
Generated text. Some good sentences:
"Anyway, to the city scene you're an idiot teenager."
"What ? ! ! ! ! ignore!"
"You're saying: https"
"Thanks for the advice to keep my thoughts around girls."
"Yep, please disappear with the terrible generation."
The vanishing gradient problem
A classic RNN is not able to learn dependencies between words that are several steps away.
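To see why gradients vanish: by the chain rule, the gradient reaching a step k positions in the past is a product of k per-step Jacobians, and when their norms are below 1 the product shrinks exponentially with distance. A small illustrative NumPy sketch (all numbers are made up, and the same Jacobian is reused at every step for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 4
W = rng.normal(scale=0.1, size=(H, H))   # small recurrent weights
s = rng.uniform(-0.9, 0.9, size=H)       # some hidden state values

# Jacobian of one tanh RNN step w.r.t. the previous hidden state:
# d s_t / d s_(t-1) = diag(1 - s_t^2) @ W
J = np.diag(1.0 - s ** 2) @ W

grad = np.ones(H)                         # pretend gradient at the last step
for k in range(1, 21):
    grad = J.T @ grad                     # push the gradient one step further back
    if k % 5 == 0:
        print(f"{k:2d} steps back: gradient norm = {np.linalg.norm(grad):.2e}")
```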

13 Extensions of RNNs: Long Short-Term Memory (LSTM)

14 Extensions of RNNs: Long Short-Term Memory
Core of the LSTM: its memory cell

15 Extensions of RNNs: Long Short-Term Memory
Forget gate: how much of the previous state you want to let through or forget.
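In one common LSTM formulation (notation assumed, not taken from the slide), the forget gate is computed as f_t = sigmoid(W_f x_t + U_f h_(t-1) + b_f); values near 0 drop the corresponding memory component, values near 1 keep it.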

16 Extensions of RNNs: Long Short-Term Memory
Input gate: how much of the newly computed state for the current input you want to let through.
New candidate: could be added to the current state.
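In the same formulation (again assumed, not from the slide), the input gate is i_t = sigmoid(W_i x_t + U_i h_(t-1) + b_i) and the new candidate is g_t = tanh(W_g x_t + U_g h_(t-1) + b_g).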

17 Extensions of RNNs: Long Short-Term Memory
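Combining the gates, the memory cell is then typically updated as c_t = f_t * c_(t-1) + i_t * g_t (elementwise products): the forget gate scales the old memory and the input gate scales the new candidate.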

18 Extensions of RNNs: Long Short-Term Memory
Output gate: how much of the internal state you want to expose to the external network.
Prediction: made from this exposed state.
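Putting the pieces together, here is a minimal NumPy sketch of one LSTM cell step under the formulation sketched above; the stacked weight layout and all sizes are illustrative choices, not taken from the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, Wx, Wh, b):
    """One LSTM step. Wx: (4H, D), Wh: (4H, H), b: (4H,) hold all four gates stacked."""
    H = h_prev.shape[0]
    z = Wx @ x_t + Wh @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2*H])          # input gate: how much new candidate to write
    g = np.tanh(z[2*H:3*H])        # new candidate values
    o = sigmoid(z[3*H:4*H])        # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # update the memory cell
    h_t = o * np.tanh(c_t)         # exposed hidden state, used for the prediction
    return h_t, c_t

# Illustrative sizes: input dimension 10, hidden size 4.
rng = np.random.default_rng(0)
D, H = 10, 4
Wx = rng.normal(scale=0.1, size=(4 * H, D))
Wh = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
x = np.eye(D)[3]                   # one-hot input, as in the earlier slides
h, c = lstm_step(x, h, c, Wx, Wh, b)
print(h.shape, c.shape)            # (4,) (4,)
```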

19 Extensions of RNNs: Bidirectional RNNs
Core idea: the output at time t may depend not only on the previous elements in the sequence, but also on future elements.
For example, to predict a missing word in a sequence, you want to look at both the left and the right context.
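A minimal sketch of the idea (all names and sizes are illustrative): run one vanilla RNN left-to-right and a second one right-to-left, then concatenate the two hidden states at each position so that every position sees both its left and right context. For text classification, as on the next slide, these concatenated states (or a pooled or final state) can be fed to a classifier.

```python
import numpy as np

def rnn_states(xs, U, W):
    """Hidden states of a vanilla tanh RNN over a sequence of input vectors."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

def bidirectional_states(xs, U_f, W_f, U_b, W_b):
    """Concatenate forward states with backward states computed on the reversed sequence."""
    forward = rnn_states(xs, U_f, W_f)
    backward = rnn_states(xs[::-1], U_b, W_b)[::-1]   # re-reverse to align positions
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

# Illustrative sizes: 5 one-hot words from a 10-word vocabulary, hidden size 4 per direction.
rng = np.random.default_rng(0)
xs = [np.eye(10)[i] for i in [1, 4, 2, 7, 3]]
U_f, W_f = rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))
U_b, W_b = rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))
states = bidirectional_states(xs, U_f, W_f, U_b, W_b)
print(len(states), states[0].shape)   # 5 (8,) -- each position sees left and right context
```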

20 Extensions of RNNs: Bidirectional RNNs
Bidirectional RNN for text classification

21 Extensions of RNNs: Deep (Bidirectional) RNNs
Core idea: multiple layers per time step give us a higher learning capacity.
And more training time...
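A minimal illustrative sketch of stacking: each layer's hidden-state sequence becomes the input sequence of the layer above it (the helper reuses the vanilla tanh RNN from the earlier sketches; all sizes are made up):

```python
import numpy as np

def rnn_states(xs, U, W):
    """Hidden states of a vanilla tanh RNN over a sequence of input vectors."""
    s = np.zeros(W.shape[0])
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)
        states.append(s)
    return states

def deep_rnn_states(xs, layers):
    """Stack RNN layers: each layer consumes the previous layer's hidden states."""
    seq = xs
    for U, W in layers:
        seq = rnn_states(seq, U, W)
    return seq

# Two layers: 10-dim one-hot inputs -> hidden size 4 -> hidden size 4.
rng = np.random.default_rng(0)
xs = [np.eye(10)[i] for i in [1, 4, 2, 7, 3]]
layers = [
    (rng.normal(scale=0.1, size=(4, 10)), rng.normal(scale=0.1, size=(4, 4))),  # layer 1
    (rng.normal(scale=0.1, size=(4, 4)),  rng.normal(scale=0.1, size=(4, 4))),  # layer 2
]
top_states = deep_rnn_states(xs, layers)
print(len(top_states), top_states[0].shape)   # 5 (4,)
```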

22 What’s more?

23 Any questions?

