Deep Learning and Applications to Natural Language Processing

Topic 3: Deep Learning and Applications to Natural Language Processing
Huy V. Nguyen, 9/19/2018

Outline
- Deep learning overview
  - Deep vs. shallow architectures
  - Representation learning
  - Breakthroughs
  - Learning principle: greedy layer-wise training
  - Tera-scale: data, models, resources
  - Deep learning success
- Deep learning in NLP
  - Neural network language models
  - POS, NER, parsing, sentiment, paraphrase
  - Concerns

Deep vs. shallow overview (Bengio 2009, Bengio et al. 2013)
- Human information processing mechanisms suggest deep architectures (e.g., vision, speech, audition, language understanding)
  - Input percepts are represented at multiple levels of abstraction
- Most machine learning techniques exploit shallow architectures (e.g., GMM, HMM, CRF, MaxEnt, SVM, logistic regression)
  - Linear models cannot capture the complexity; kernel tricks are still not deep enough
- Attempts to train multi-layer neural networks were unsuccessful for decades (before 2006)
  - Feed-forward neural nets with back-propagation: non-convex loss function, local optima

Deep learning and its advantages
- A wide class of machine learning techniques and architectures
- Hierarchical in nature
  - Multi-stage processing through multiple non-linear layers
  - Feature re-use for multi-task learning
- Distributed representation (information is not localized in a particular parameter, as in a one-hot representation)
- Multiple levels of representation: abstraction and invariance
  - More abstract concepts are constructed from less abstract ones
  - More abstract representations are invariant to most local changes of the input

Representation learning
- Representation learning (feature learning): learning transformations of the data that turn it into useful information for classifiers or other predictors
- Traditional machine learning deployment: hand-crafted feature extractor + "simple" trainable classifier
  - Unable to extract and organize the discriminative information from the data
- End-to-end learning (less dependent on feature engineering): trainable feature extractor + trainable classifier
  - Hierarchical representations for invariance and feature re-use
- Deep learning learns these intermediate representations; much of its success comes from unsupervised representation learning

Encoder-decoder
- A representation is complete if the input can be reconstructed from it
- Unsupervised learning for feature/representation extraction: an encoder followed by a decoder
  - The encoder maps the input vector to a code vector
  - The decoder maps the code vector to a reconstruction
  - Training minimizes a loss between the input and its reconstruction (see the sketch below)
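A minimal sketch of this encoder-decoder (autoencoder) setup in numpy; the layer sizes, the sigmoid nonlinearity, and the squared-error loss are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy autoencoder: encode a 20-d input into an 8-d code, then reconstruct it.
W_enc = rng.normal(scale=0.1, size=(8, 20))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(20, 8))   # decoder weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x):
    return sigmoid(W_enc @ x)         # input vector -> code vector

def decode(h):
    return W_dec @ h                  # code vector -> reconstruction

def reconstruction_loss(x):
    x_hat = decode(encode(x))
    return np.mean((x - x_hat) ** 2)  # the quantity training would minimize over W_enc, W_dec

x = rng.normal(size=20)
print(reconstruction_loss(x))
```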

Breakthroughs
- Deep architectures are desirable but difficult to learn: non-convex loss function, local optima
- 2006: breakthrough initiated by Hinton et al. (2006)
  - Deep belief network (DBN) with 3 hidden layers
  - Greedy layer-wise unsupervised pre-training
  - Fine-tuning with the up-down algorithm
  - MNIST digits error rate: DBN 1.25%, SVM 1.4%, NN 1.51%

Greedy layer-wise training
- Deep architectures need a good training algorithm
- Greedy layer-wise unsupervised pre-training helps to optimize deep networks
- Supervised training then fine-tunes all the layers
- A general principle that applies beyond DBNs (Bengio et al. 2007); a sketch of the recipe follows below
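A hedged sketch of the general recipe, using plain autoencoder layers instead of the RBMs of a DBN (PyTorch; layer sizes, optimizer settings, and the random stand-in data are illustrative assumptions):

```python
import torch
import torch.nn as nn

sizes = [784, 256, 64]                      # input -> hidden 1 -> hidden 2 (illustrative)
layers = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]
X = torch.rand(1024, 784)                   # unlabeled data (random stand-in)

h = X
for layer in layers:                        # pre-train one layer at a time, bottom-up
    decoder = nn.Linear(layer.out_features, layer.in_features)
    opt = torch.optim.Adam(list(layer.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(100):
        code = torch.sigmoid(layer(h))                   # encode this layer's input
        loss = nn.functional.mse_loss(decoder(code), h)  # try to reconstruct it
        opt.zero_grad(); loss.backward(); opt.step()
    h = torch.sigmoid(layer(h)).detach()    # the codes become the next layer's input

# Supervised fine-tuning of the whole stack plus a classifier head.
y = torch.randint(0, 10, (1024,))           # fake labels, just for the sketch
model = nn.Sequential(layers[0], nn.Sigmoid(), layers[1], nn.Sigmoid(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    loss = nn.functional.cross_entropy(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
```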

Why greedy layer-wise training works (Bengio 2009, Erhan et al. 2010)
- Regularization hypothesis
  - Pre-training "constrains" the parameters to a region relevant to the unsupervised data
  - Representations that better describe unlabeled data are more discriminative for labeled data (better generalization)
- Optimization hypothesis
  - Unsupervised training initializes lower-level parameters near better local minima than random initialization can

Tera-scale deep learning (Le et al. 2011)
- Trains on 10M unlabeled 200x200 images from YouTube
- 1K machines (16K cores) for 3 days
- 9-layer network with 3 sparse auto-encoders, 1.15B parameters
- ImageNet dataset for testing: 14M images, 22K categories
  - Previous state of the art: 9.3% accuracy; proposed: 15.8% accuracy
►Scales up the dataset, the model, and the computational resources

Deep learning success
- So far, the examples have been deep learning in vision, where it achieves state-of-the-art results
- Deep learning has had an even more impressive impact in speech
  - The shared view of 4 research groups: U. Toronto, Microsoft Research, Google, and IBM Research (Hinton et al. 2012)
  - Commercialized!

Deep learning in NLP
- Current obstacles for NLP systems:
  - Handcrafting features is time-consuming and usually difficult
  - Symbolic representations (grammar rules) make NLP systems fragile
- Advantages brought by deep learning:
  - Distributed representations are more (computationally) efficient than the one-hot vector representations usually used in NLP (see the comparison below)
  - Learning from unlabeled data
  - Learning multiple levels of abstraction: word, phrase, sentence
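To make the one-hot vs. distributed contrast concrete, a small sketch; the vocabulary size and embedding dimension are made-up numbers:

```python
import numpy as np

V, d = 50_000, 300                      # vocabulary size, embedding dimension (illustrative)
word_id = 1234

# One-hot: a length-V vector with a single 1; its size grows with the vocabulary.
one_hot = np.zeros(V)
one_hot[word_id] = 1.0

# Distributed: a dense length-d vector looked up from a learned embedding table.
E = np.random.default_rng(0).normal(scale=0.1, size=(V, d))
embedding = E[word_id]                  # 300 numbers instead of 50,000

print(one_hot.shape, embedding.shape)   # (50000,) (300,)
```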

Neural network language models
- Learn a distributed representation for each word (a word embedding) to address the curse of dimensionality
- First proposed by Bengio et al. (2003): a 2-hidden-layer neural network trained with back-propagation
►Jointly learns the language model and the word representations; the latter turn out to be even more useful (a sketch follows below)
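A minimal sketch in the spirit of the Bengio et al. (2003) model: embed the n-1 previous words, concatenate the embeddings, pass them through a hidden layer, and score every word in the vocabulary for the next position (PyTorch; all sizes are illustrative assumptions, not those of the original paper):

```python
import torch
import torch.nn as nn

V, d, n, hidden = 10_000, 100, 4, 128       # vocab size, embedding dim, n-gram order, hidden units

class NNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, d)     # word embeddings, learned jointly with the model
        self.hidden = nn.Linear((n - 1) * d, hidden)
        self.out = nn.Linear(hidden, V)     # scores for every candidate next word

    def forward(self, context):             # context: (batch, n-1) ids of the previous words
        e = self.embed(context).flatten(1)  # concatenate the n-1 embedding vectors
        return self.out(torch.tanh(self.hidden(e)))

model = NNLM()
context = torch.randint(0, V, (32, n - 1))  # a random batch of 3-word contexts (stand-in data)
next_word = torch.randint(0, V, (32,))
loss = nn.functional.cross_entropy(model(context), next_word)
loss.backward()                             # gradients also update the embedding table
```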

Neural network language models (2)
- Factored restricted Boltzmann machine (Mnih & Hinton 2007)
- Convolutional architecture (Collobert & Weston 2008)
- Recurrent neural network (Mikolov et al. 2010)
- Comparison of different word representations on NLP tasks (chunking and NER) (Turian et al. 2010): word embeddings help improve existing supervised models
►The proven effective setting: semi-supervised learning with task-specific information, jointly inducing word representations and learning class labels

NNLM for basic tasks
- SENNA (Collobert et al. 2011)
- Convolutional neural network with feature sharing for multi-task learning
- POS tagging, chunking, NER, SRL
- Runs faster (16x to 122x) while using less memory (25x)

Basic tasks (2)
[Figure: SENNA's architecture]

Basic tasks (3)
- Syntactic and semantic regularities (Mikolov et al. 2013), where <x> is the learned vector representation of word x:
  - <apple> - <apples> ≈ <car> - <cars>
  - <man> - <woman> ≈ <king> - <queen>
►Word representations not only help NLP tasks but also carry semantics internally
- The representation is distributed, in vector form: a natural input for computational systems
- Computational semantics?
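These regularities can be checked with simple vector arithmetic and cosine similarity; the sketch below assumes a hypothetical dict `vec` mapping words to their learned embedding vectors (not something provided by the slides):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(vec, a, b, c, topn=1):
    """Find word(s) d such that <a> - <b> ≈ <d> - <c>, e.g. king - man + woman ≈ queen."""
    target = vec[a] - vec[b] + vec[c]
    candidates = [(w, cosine(target, v)) for w, v in vec.items() if w not in (a, b, c)]
    return sorted(candidates, key=lambda t: t[1], reverse=True)[:topn]

# e.g. analogy(vec, "king", "man", "woman") is expected to rank "queen" highest
```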

Beyond word representation
- Word representations are not the only thing we need: they are the first layer towards building NLP systems
- We need a deep architecture on top to take care of NLP tasks
- Recursive neural networks (RNNs) are a good fit
  - Work with variable-size input
  - Have a tree structure that can be learned greedily from data
  - Each node is an auto-encoder that learns an inner representation (see the sketch below)
- Applications: paraphrase detection (Socher et al. 2011), sentiment distributions (Socher et al. 2011b), parsing (Socher et al. 2013)
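A hedged sketch of the node-level composition in such a recursive autoencoder: two child vectors are merged into a parent vector of the same size, and the reconstruction error decides greedily which adjacent pair to merge next (numpy; dimensions and random weights are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50                                            # dimension shared by words and phrases (illustrative)
W_comp = rng.normal(scale=0.1, size=(d, 2 * d))   # composition (encoder) weights
W_rec = rng.normal(scale=0.1, size=(2 * d, d))    # reconstruction (decoder) weights

def compose(c1, c2):
    """Encode two children into a parent representation of the same size."""
    return np.tanh(W_comp @ np.concatenate([c1, c2]))

def reconstruction_error(c1, c2):
    """How well the parent can reconstruct its children; used to pick which pair to merge."""
    parent = compose(c1, c2)
    rec = W_rec @ parent
    return np.sum((np.concatenate([c1, c2]) - rec) ** 2)

# Greedy tree building: repeatedly merge the adjacent pair with the lowest reconstruction error.
nodes = [rng.normal(size=d) for _ in range(5)]    # word vectors for a 5-word sentence (random stand-ins)
while len(nodes) > 1:
    errors = [reconstruction_error(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)]
    i = int(np.argmin(errors))
    nodes[i:i + 2] = [compose(nodes[i], nodes[i + 1])]
print(nodes[0].shape)                             # a single vector for the whole sentence
```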

Deep learning in NLP: the concerns
- A great variety of not-really-related tasks; many deep architectures, algorithms, and variants
- Competitive performance, but not state-of-the-art
- Not obvious how to combine with existing NLP; not easy to encode prior knowledge of language structure
- No longer symbolic, so results are not easy to interpret
- Neural language models are difficult and time-consuming to train
►Open to more research: is deep learning the future of NLP? Very promising results: unsupervised, big data, across domains, languages, and tasks

Conclusions
- Deep learning = learning hierarchical representations
  - Unsupervised greedy layer-wise pre-training followed by a fine-tuning algorithm
- Promising results in many applications: vision, audition, natural language understanding
- Neural network language models play a crucial role in NLP tasks
  - They jointly learn word representations and classification tasks
- Different tasks benefit from different deep architectures
  - In NLP: recursive neural networks and convolutional networks
  - Which architecture is best for a given NLP task remains an open question