Recurrent Neural Networks for Natural Language Processing

Recurrent Neural Networks for Natural Language Processing Presenter: Haotian

Roadmap What are RNNs? What can RNNs do? How to train RNNs? Extensions of RNNs

What are RNNs? The idea behind RNNs is to make use of sequential information. If we want to predict the next word in a sentence, it helps to know which words came before it. We assume that inputs are not independent of each other and that outputs depend on previous information, which is different from feed-forward NNs/CNNs. In other words, RNNs have "memory"!

What are RNNs? x_t: the input at step t (e.g., a vector representation of a word). s_t: the hidden state at step t, which acts as the network's memory. o_t: the output at step t.
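As a rough illustration (not from the slides), one step of this vanilla RNN can be sketched in NumPy; U, W, and V are assumed to be the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices that the training slides refer to:

    import numpy as np

    def rnn_step(x_t, s_prev, U, W, V):
        """One step of a vanilla RNN: update the memory, then predict an output."""
        s_t = np.tanh(U @ x_t + W @ s_prev)            # new hidden state s_t ("memory")
        logits = V @ s_t
        o_t = np.exp(logits) / np.sum(np.exp(logits))  # softmax over the vocabulary
        return s_t, o_t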

What can RNNs do? Language Modeling and Generating Text: RNNs allow us to measure how likely a sentence is and, used as generative models, to predict missing/next words; during training the target output is the next input word, i.e. o_t = x_{t+1}. Machine Translation: given a sequence of words in a source language (e.g., German), we want to output a sequence of words in a target language (e.g., English).
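As a small illustration of the o_t = x_{t+1} setup (hypothetical token IDs, not from the slides), the training targets are simply the inputs shifted by one position:

    tokens  = [5, 42, 7, 13, 2]   # one tokenized sentence (hypothetical word IDs)
    inputs  = tokens[:-1]         # x_1 ... x_{T-1}
    targets = tokens[1:]          # at step t the model is trained to output x_{t+1}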

What can RNNs do? Speech Recognition: given an input sequence of acoustic signals from a sound wave, we can predict a sequence of phonetic segments together with their probabilities. Generating Image Descriptions: together with CNNs, RNNs have been used as part of models that generate descriptions for unlabeled images.

How to train RNNs? Backpropagation Through Time (BPTT). Loss function: cross-entropy loss. Our goal is to calculate the gradients of the error with respect to our parameters U, V, and W, and then learn good parameters using Stochastic Gradient Descent.
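A minimal sketch of the quantity BPTT differentiates, assuming the rnn_step function and the shifted inputs/targets from the earlier sketches, with one-hot encoded words:

    def sequence_loss(input_ids, target_ids, U, W, V, vocab_size=8000, hidden_size=100):
        """Average cross-entropy loss of one sequence under the RNN language model."""
        s, loss = np.zeros(hidden_size), 0.0
        for x_id, y_id in zip(input_ids, target_ids):
            x_t = np.zeros(vocab_size)
            x_t[x_id] = 1.0                     # one-hot encoding of the current word
            s, o_t = rnn_step(x_t, s, U, W, V)
            loss -= np.log(o_t[y_id])           # negative log-probability of the true next word
        return loss / len(input_ids)

    # BPTT computes dloss/dU, dloss/dW, dloss/dV by unrolling this loop in time,
    # and SGD then updates each parameter, e.g. U -= learning_rate * dU.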

How to train RNNs? Backpropagation Through Time (BPTT)

An example of language modeling: language modeling on Reddit comments. Given 15,000 Reddit comments, train a model to generate new comments.

An example of language modeling. Vocabulary size = 8000; hidden layer size = 100 (the memory); inputs are one-hot encoded.
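Under these sizes, the parameter shapes work out as in this sketch (the initialization scheme is an assumption, not from the slides):

    vocab_size, hidden_size = 8000, 100
    U = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden,  100 x 8000
    W = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden, 100 x 100
    V = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden -> output, 8000 x 100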

An example of language modeling. Loss function: cross-entropy loss. Our goal is to find U, V, and W that minimize the loss function on our training data, where we have N (15,000) training examples and C (8,000) classes. https://jamesmccaffrey.wordpress.com/2013/11/05/why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training/
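The loss referred to above is the standard cross-entropy; in the notation of the slide, with y_{n,c} the true (one-hot) label and o_{n,c} the predicted probability:

    L(y, o) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} y_{n,c} \log o_{n,c}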

An example of language modeling: generated text. Some of the better sentences: "Anyway, to the city scene you're an idiot teenager." "What ? ! ! ! ! ignore!" "You're saying: https" "Thanks for the advice to keep my thoughts around girls." "Yep, please disappear with the terrible generation." The vanishing gradient problem: a classic RNN is not able to learn dependencies between words that are several steps apart.
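A tiny numeric illustration of why gradients vanish (the 0.9 factor is an arbitrary assumption): backpropagating through T steps multiplies roughly T Jacobian factors, so if each factor shrinks the gradient a little, the contribution of distant words decays exponentially:

    # If each backprop step scales the gradient by ~0.9, a word 50 steps back
    # contributes almost nothing to the parameter update:
    print(0.9 ** 50)   # about 0.005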

Extensions of RNNs Long Short-Term Memory

Extensions of RNNs Long Short-Term Memory The core of the LSTM is its memory cell.

Extensions of RNNs Long Short-Term Memory Forget gate: how much of the previous state you want to let through (and how much to forget).

Extensions of RNNs Long Short-Term Memory Input gate: how much of the newly computed state for the current input you want to let through. New candidate: a newly computed value that could be added to the current state.

Extensions of RNNs Long Short-Term Memory

Extensions of RNNs Long Short-Term Memory Output gate: how much of the internal state you want to expose to the external network, i.e., to the prediction.
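Putting the forget, input, and output gates from the last few slides together, here is a minimal NumPy sketch of one LSTM step (the single stacked weight matrix W_lstm and bias b are notational assumptions):

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W_lstm, b):
        """One LSTM step: gate the old memory, the new candidate, and the output."""
        z = W_lstm @ np.concatenate([h_prev, x_t]) + b
        f, i, o, g = np.split(z, 4)
        f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
        g = np.tanh(g)                                 # new candidate state
        c_t = f * c_prev + i * g                       # keep part of the old memory, add part of the new
        h_t = o * np.tanh(c_t)                         # expose part of the memory for the prediction
        return h_t, c_t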

Extensions of RNNs Bidirectional RNNs Core idea: the output at time t may depend not only on the previous elements in the sequence, but also on future elements. For example, to predict a missing word in a sequence, you want to look at both the left and the right context.

Extensions of RNNs Bidirectional RNNs Bidirectional RNN for text classification
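A rough sketch of the bidirectional idea, reusing the rnn_step function from earlier; params_fwd and params_bwd are assumed to be separate (U, W, V) triples for the two directions:

    def bidirectional_states(inputs, params_fwd, params_bwd, hidden_size=100):
        """Concatenate a forward and a backward pass over the same sequence."""
        fwd, s = [], np.zeros(hidden_size)
        for x_t in inputs:                         # left-to-right pass
            s, _ = rnn_step(x_t, s, *params_fwd)
            fwd.append(s)
        bwd, s = [], np.zeros(hidden_size)
        for x_t in reversed(inputs):               # right-to-left pass
            s, _ = rnn_step(x_t, s, *params_bwd)
            bwd.append(s)
        bwd.reverse()
        # each position now sees both its left and its right context
        return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]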

Extensions of RNNs Deep (Bidirectional) RNNs Core idea: multiple layers per time step give us higher learning capacity, and more training time...
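A sketch of the deep (stacked) variant, where each layer's new hidden state becomes the input to the layer above (the parameter layout is an assumption):

    def deep_rnn_step(x_t, states, layers):
        """One time step of a stacked RNN; `layers` holds (U, W) pairs per layer."""
        inp, new_states = x_t, []
        for (U, W), s_prev in zip(layers, states):
            s = np.tanh(U @ inp + W @ s_prev)   # same recurrence, one layer higher
            new_states.append(s)
            inp = s                             # this layer's state feeds the next layer
        return new_states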

What’s more?

Any questions?