Christoph Prinz / Automatic Speech Recognition Research Progress Hits the Road.



Traditional Automatic Speech Recognition Paradigms: Statistical Models

Model building: parameters are estimated from recorded audio and transcriptions.
Recognition: find the most likely words given the observed acoustic speech signal and the syntax of the language. The most likely sequence maximizes P(W|A), where W is the word sequence and A is the acoustic speech signal.
Acoustic Models (AM): assign probabilities to acoustic information using Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs). HMMs model variance in the temporal and spectral dimensions.
Language Models (LM): assign probabilities to word sequences using n-gram counts represented as Weighted Finite State Transducers (WFSTs).
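The search for the most likely sequence described above can be sketched with Viterbi decoding over an HMM. This is a minimal illustration, not the slides' actual system: the uniform start distribution and the array shapes are assumptions, and the emission scores stand in for GMM log-likelihoods.

```python
import numpy as np

def viterbi(log_trans, log_emit):
    """Most likely HMM state sequence for one utterance.

    log_trans: (S, S) log transition probabilities between states.
    log_emit:  (T, S) log emission scores per frame (e.g. from GMMs).
    Returns the best state path, analogous to maximizing P(W|A).
    """
    T, S = log_emit.shape
    delta = np.full((T, S), -np.inf)   # best log score ending in each state
    back = np.zeros((T, S), dtype=int)  # backpointers for path recovery
    delta[0] = log_emit[0]              # uniform start, for simplicity
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # (S, S): prev -> cur
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

In a full recognizer this dynamic program runs over a composed WFST of HMM states, pronunciations, and the n-gram language model rather than a plain state lattice.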

Traditional Automatic Speech Recognition Paradigms: Statistical Models (with DNNs)

Model building: parameters are estimated from recorded audio and transcriptions.
Recognition: find the most likely words given the observed acoustic speech signal and the syntax of the language. The most likely sequence maximizes P(W|A), where W is the word sequence and A is the acoustic speech signal.
Acoustic Models (AM): assign probabilities to acoustic information using Deep Neural Networks (DNNs). DNNs have multiple hidden layers and can compose features from lower layers, which gives them a huge learning capacity and thus the potential to model complex patterns of speech (deep learning).
Language Models (LM): assign probabilities to word sequences using n-gram counts represented as Weighted Finite State Transducers (WFSTs).
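The DNN acoustic model above can be sketched as a small multilayer perceptron: hidden layers compose features from lower layers, and in the common hybrid DNN-HMM setup the softmax posteriors are divided by the state priors to obtain scaled likelihoods for HMM decoding. The layer sizes and function names here are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=-1, keepdims=True)

def dnn_posteriors(frames, weights, biases):
    """Forward pass of a toy MLP acoustic model.

    frames: (T, D) acoustic feature vectors (e.g. filterbank frames).
    Each hidden layer composes features from the layer below;
    the softmax output gives P(HMM state | frame).
    """
    h = frames
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)  # ReLU hidden layers
    return softmax(h @ weights[-1] + biases[-1])

def scaled_log_likelihoods(posteriors, state_priors):
    """Hybrid trick: P(x|state) is proportional to P(state|x) / P(state)."""
    return np.log(posteriors) - np.log(state_priors)
```

These scaled log-likelihoods simply replace the GMM emission scores in the HMM decoder; the rest of the pipeline is unchanged.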

Comparison & Short History of Deep Neural Networks (DNNs)

Compared to HMM/GMM systems, DNNs are observed to reduce word error rates (WER) on average by 25% relative; i.e., an HMM system at 80% correct (20% WER) reaches 85% correct (15% WER) with DNNs. This is the most substantial single ASR technology step of the last 10 years.
DNN theory is old, with experiments dating back to the 1990s, but DNNs only started appearing in the field around 2010 (ICASSP & Interspeech conferences, 2010-2014).
Used by all major commercial ASR systems, e.g. Microsoft Cortana, Xbox, Skype Translator, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products.
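The "25%" figure above is a relative reduction, which is easy to misread as an absolute one. The arithmetic behind the 20% to 15% example can be checked directly:

```python
def relative_wer_reduction(wer_baseline, wer_new):
    """Relative word-error-rate reduction.

    A drop from 20% WER to 15% WER is a 25% relative reduction,
    even though the absolute improvement is only 5 points.
    """
    return (wer_baseline - wer_new) / wer_baseline
```

So "reduce WER by 25%" means the new error rate is three quarters of the old one, not that accuracy rises by 25 points.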

Artificial Intelligence & Semantic Understanding

Traditional ASR output is a weighted combination of acoustic (DNN) and language model (n-gram) scores, producing hypotheses with different probabilities. We (desperately) need semantic and contextual understanding, world knowledge, and basic reasoning in order to re-rank these hypotheses: "How to wreck a nice beach" and "How to recognize speech" may both be valid depending on context.
And not even that is enough for real-world applications: subtitling transcribers clean up spoken language, remove offensive terms, …
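Re-ranking hypotheses with an extra context signal, as called for above, can be sketched as a weighted N-best rescoring. The weights, the tuple layout, and the source of the context score are all assumptions; real systems would derive the context score from a semantic or world-knowledge model.

```python
def rerank(hypotheses, am_weight=1.0, lm_weight=0.8, ctx_weight=0.5):
    """Re-rank N-best ASR hypotheses with an added context/semantic score.

    Each hypothesis is (text, am_logprob, lm_logprob, context_score).
    Higher combined score wins; the weights are hypothetical tuning knobs.
    """
    def combined(h):
        _, am, lm, ctx = h
        return am_weight * am + lm_weight * lm + ctx_weight * ctx
    return sorted(hypotheses, key=combined, reverse=True)
```

With a context model that knows the user is asking about speech technology, the acoustically plausible "wreck a nice beach" reading drops below "recognize speech".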

Mariannengasse 14 · 1090 Vienna · Austria