Stanford CS224S Spring 2014 CS 224S / LINGUIST 285 Spoken Language Processing Andrew Maas Stanford University Spring 2014 Lecture 16: Acoustic Modeling with Deep Neural Networks (DNNs)

Stanford CS224S Spring 2014 Logistics Poster session Tuesday! – Gates building back lawn – We will provide poster boards and easels (and snacks) Please help your classmates collect data! – Android phone users – Background app to grab 1 second audio clips – Details at

Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 Acoustic Modeling with GMMs Transcription: Samson Pronunciation: S – AE – M – S – AH – N Sub-phones: 942 – 6 – 37 – 8006 – 4422 … Hidden Markov Model (HMM) over the sub-phone states Acoustic Model: GMM models P(x|s), where x is the input features and s is the HMM state Audio Input: per-frame feature vectors, each scored against its aligned HMM state (942, 942, 6, …)
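For reference, the GMM emission density for a state s has the standard form p(x|s) = Σ_m c_{s,m} · N(x; μ_{s,m}, Σ_{s,m}), a weighted sum of Gaussians whose mixture weights c_{s,m} sum to one for each state.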

Stanford CS224S Spring 2014 DNN Hybrid Acoustic Models Transcription: Samson Pronunciation: S – AE – M – S – AH – N Sub-phones: 942 – 6 – 37 – 8006 – 4422 … Hidden Markov Model (HMM) over the sub-phone states Acoustic Model: a DNN approximates P(s|x) for each frame's features x_1, x_2, x_3, …, each frame aligned to an HMM state (942, 942, 6, …) Apply Bayes' Rule: P(x|s) = P(s|x) * P(x) / P(s), i.e. DNN output * constant / state prior
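Spelled out: p(x|s) = P(s|x) · p(x) / P(s), and because p(x) is the same for every state at a given frame, decoding can simply use the scaled likelihood P(s|x) / P(s) in place of the GMM likelihood.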

Stanford CS224S Spring 2014 Not Really a New Idea Renals, Morgan, Bourlard, Cohen, & Franco

Stanford CS224S Spring 2014 Hybrid MLPs on Resource Management Renals, Morgan, Bourlard, Cohen, & Franco

Stanford CS224S Spring 2014 Modern Systems use DNNs and Senones Dahl, Yu, Deng & Acero

Stanford CS224S Spring 2014 Hybrid Systems Now Dominate ASR Hinton et al.

Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 Neural Network Basics: Single Unit Logistic regression as a "neuron" (diagram: inputs x1, x2, x3 and a bias +1, weighted by w1, w2, w3 and b, summed and squashed to produce the output) Slides from Awni Hannun (CS221 Autumn 2013)
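A minimal NumPy sketch of this single unit (variable names and example values are illustrative, not the course's notation):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_unit(x, w, b):
        # weighted sum of the inputs plus a bias, squashed to (0, 1)
        return sigmoid(np.dot(w, x) + b)

    # example with three inputs, matching the diagram
    x = np.array([0.5, -1.2, 0.3])
    w = np.array([0.4, 0.1, -0.7])
    print(logistic_unit(x, w, b=0.1))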

Stanford CS224S Spring 2014 Single Hidden Layer Neural Network Stack many logistic units to create a Neural Network (diagram: Layer 1 / input with x1, x2, x3 and +1; Layer 2 / hidden layer with units a1, a2 and +1, connected by weights w11, w21, …; Layer 3 / output) Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Notation Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Forward Propagation (diagram: inputs x1, x2, x3 and bias +1 feed the hidden layer through weights w11, …) Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Forward Propagation (diagram: Layer 1 / input with x1, x2, x3 and +1; Layer 2 / hidden layer with +1; Layer 3 / output) Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Forward Propagation with Many Hidden Layers... (diagram: the activations of layer l, plus a bias unit +1, feed layer l+1, and so on through the stack) Slides from Awni Hannun (CS221 Autumn 2013)
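A minimal NumPy sketch of forward propagation through a stack of sigmoid layers (shapes and names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, weights, biases):
        # propagate the input through each layer in turn;
        # the activation of layer l feeds layer l+1
        a = x
        for W, b in zip(weights, biases):
            a = sigmoid(W.dot(a) + b)
        return a

    # example: a 3-dimensional input through two hidden layers of 5 units
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((5, 3)), rng.standard_normal((5, 5))]
    biases = [np.zeros(5), np.zeros(5)]
    h = forward(rng.standard_normal(3), weights, biases)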

Stanford CS224S Spring 2014 Forward Propagation as a Single Function Gives us a single non-linear function of the input But what about multi-class outputs? – Replace output unit for your needs – “Softmax” output unit instead of sigmoid
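For K classes (here, HMM states), the softmax unit referred to above maps the output-layer activations z to P(s = k | x) = exp(z_k) / Σ_j exp(z_j), a proper distribution over the classes.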

Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 Objective Function for Learning Supervised learning: minimize our classification errors Standard choice: cross-entropy loss function – a straightforward extension of the logistic loss for binary classification This is a frame-wise loss; we use a label for each frame from a forced alignment Other loss functions are possible and can give deeper integration with the HMM or with word error rate
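In symbols, the frame-wise cross-entropy objective is J(θ) = − Σ_t log P(s_t | x_t; θ), where s_t is the forced-alignment label for frame t.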

Stanford CS224S Spring 2014 The Learning Problem Find the optimal network weights How do we do this in practice? – Non-convex – Gradient-based optimization – Simplest is stochastic gradient descent (SGD) – Many choices exist. Area of active research
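A bare-bones SGD loop, assuming hypothetical helpers sample_minibatch and compute_gradients (illustration only, not the course's code):

    def sgd(params, data, learning_rate=0.01, num_steps=100000):
        # params: list of NumPy weight/bias arrays, updated in place
        for step in range(num_steps):
            x_batch, y_batch = sample_minibatch(data)            # assumed helper
            grads = compute_gradients(params, x_batch, y_batch)  # assumed helper
            for p, g in zip(params, grads):
                p -= learning_rate * g  # step against the gradient
        return params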

Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 Computing Gradients: Backpropagation Backpropagation: an algorithm to compute the derivative of the loss function with respect to the parameters of the network Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Chain Rule Recall our NN as a single function: (diagram: x feeds g, whose output feeds f) Slides from Awni Hannun (CS221 Autumn 2013)

Stanford CS224S Spring 2014 Chain Rule (diagram: x feeds two functions g1 and g2, whose outputs both feed f) CS221: Artificial Intelligence (Autumn 2013)

Stanford CS224S Spring 2014 Chain Rule (diagram: x feeds g1 through gn, whose outputs all feed f) CS221: Artificial Intelligence (Autumn 2013)

Stanford CS224S Spring 2014 Backpropagation Idea: apply the chain rule recursively (diagram: x passes through f1, f2, f3 with parameters w1, w2, w3; error terms δ(3) and δ(2) flow backwards) CS221: Artificial Intelligence (Autumn 2013)

Stanford CS224S Spring 2014 Backpropagation (diagram: the network with inputs x1, x2, x3 and +1; the loss at the output yields δ(3), which is propagated back through the layers) CS221: Artificial Intelligence (Autumn 2013)
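A sketch of backpropagation for one hidden layer with sigmoid units, a softmax output, and cross-entropy loss (NumPy; names and shapes are illustrative, not the course's code). It uses the standard result that the output-layer error is the predicted distribution minus the one-hot target:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def backprop(x, y_onehot, W1, b1, W2, b2):
        # forward pass
        a1 = sigmoid(W1.dot(x) + b1)
        probs = softmax(W2.dot(a1) + b2)
        # backward pass: delta at the output layer (softmax + cross-entropy)
        delta2 = probs - y_onehot
        grad_W2 = np.outer(delta2, a1)
        grad_b2 = delta2
        # propagate the error back through the hidden layer
        delta1 = W2.T.dot(delta2) * a1 * (1.0 - a1)
        grad_W1 = np.outer(delta1, x)
        grad_b1 = delta1
        return grad_W1, grad_b1, grad_W2, grad_b2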

Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 What’s Different in Modern DNNs? Fast computers = run many experiments Many more parameters Deeper nets improve on shallow nets Architecture choices (easiest is replacing sigmoid) Pre-training does not matter. Initially we thought this was the new trick that made things work

Stanford CS224S Spring 2014 Scaling up NN acoustic models in 1999 [Ellis & Morgan, 1999] 0.7M total NN parameters

Stanford CS224S Spring 2014 Adding More Parameters 15 Years Ago Size matters: An empirical study of neural network training for LVCSR. Ellis & Morgan, ICASSP 1999. Hybrid NN. 1 hidden layer. 54 HMM states. 74-hour broadcast news task. “…improvements are almost always obtained by increasing either or both of the amount of training data or the number of network parameters … We are now planning to train an 8000 hidden unit net on 150 hours of data … this training will require over three weeks of computation.”

Stanford CS224S Spring 2014 Adding More Parameters Now Comparing total number of parameters (in millions) of previous work versus our new experiments Maas, Hannun, Qi, Lengerich, Ng, & Jurafsky. In submission.

Stanford CS224S Spring 2014 Sample of Results 2,000 hours of conversational telephone speech Kaldi baseline recognizer (GMM) DNNs take 1-3 weeks to train Results table (columns: Acoustic Model, Training hours, Dev CrossEnt, Dev Acc (%), FSH WER; numeric entries other than the GMM WER omitted): GMM – 2,000 hours, FSH WER 32.3; DNN 36M; DNN 200M; DNN 36M – 2,000 hours; DNN 200M – 2,000 hours Maas, Hannun, Qi, Lengerich, Ng, & Jurafsky. In submission.

Stanford CS224S Spring 2014 Depth Matters (Somewhat) Yu, Seltzer, Li, Huang, Seide Warning! Depth can also act as a regularizer because it makes optimization more difficult. This is why you will sometimes see very deep networks perform well on TIMIT or other small tasks.

Stanford CS224S Spring 2014 Architecture Choices: Replacing Sigmoids Rectified Linear (ReL) [Glorot et al., AISTATS 2011] Leaky Rectified Linear (LReL)
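The two activations written out in NumPy (the leak coefficient 0.01 here is illustrative; the paper's setting may differ):

    import numpy as np

    def relu(z):
        # rectified linear: zero for negative inputs, identity otherwise
        return np.maximum(0.0, z)

    def leaky_relu(z, alpha=0.01):
        # leaky rectified linear: small slope alpha for negative inputs
        return np.where(z > 0, z, alpha * z)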

Stanford CS224S Spring 2014 Rectifier DNNs on Switchboard Results table (columns: Model, Dev CrossEnt, Dev Acc (%), Switchboard WER, Callhome WER, Eval 2000 WER; numeric entries omitted): GMM Baseline; Tanh, ReLU, and LReLU networks at several depths; Sigmoid CE [MSR]; Sigmoid MMI [IBM] Maas, Hannun, & Ng.


Stanford CS224S Spring 2014 Outline Hybrid acoustic modeling overview – Basic idea – History – Recent results Deep neural net basic computations – Forward propagation – Objective function – Computing gradients What’s different about modern DNNs? Extensions and current/future work

Stanford CS224S Spring 2014 Convolutional Networks Slide your filters along the frequency axis of filterbank features Great for spectral distortions (e.g., shortwave radio) Sainath, Mohamed, Kingsbury, & Ramabhadran
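A toy sketch of the idea, sliding a single filter along the frequency bins of one filterbank frame and max-pooling over positions (real systems learn many filters; names are illustrative):

    import numpy as np

    def conv_over_frequency(fbank_frame, filt):
        # slide one filter along the frequency bins of a single frame
        n, k = len(fbank_frame), len(filt)
        responses = np.array([np.dot(fbank_frame[i:i + k], filt)
                              for i in range(n - k + 1)])
        return responses.max()  # max-pool over frequency positions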

Stanford CS224S Spring 2014 Recurrent DNN Hybrid Acoustic Models Transcription: Samson Pronunciation: S – AE – M – S – AH – N Sub-phones: 942 – 6 – 37 – 8006 – 4422 … Hidden Markov Model (HMM) over the sub-phone states Acoustic Model: a recurrent DNN produces P(s|x_1), P(s|x_2), P(s|x_3), … from the per-frame features, each frame aligned to an HMM state (942, 942, 6, …)

Stanford CS224S Spring 2014 Other Current Work Changing the DNN loss function, typically using discriminative training ideas already used in ASR Reducing dependence on high-quality alignments; in the limit you could train a hybrid system from a flat start with no alignments Multilingual acoustic modeling Low-resource acoustic modeling

Stanford CS224S Spring 2014 End More on deep neural nets: – – – MSR video: Class logistics: – Poster session Tuesday! 2-4pm on Gates building back lawn – We will provide poster boards and easels (and snacks)