
Speech Recognition Part 3: Back-end processing

Speech recognition simplified block diagram: Speech Capture → Feature Extraction → Pattern Matching (with Training Models feeding the matcher) → Process Results → Text

Building a phone model
– Annotate the speech input
– Split it and create feature vectors for each segment

Praat: semi-automatic annotation

Probability-based state machine: Hidden Markov Models (HMMs)
– Transition probabilities and output probabilities
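To make this concrete, here is a minimal sketch (illustrative values, not from the slides) of a 3-state left-to-right phone HMM with a transition matrix, discrete output probabilities, and the forward algorithm for scoring an observation sequence:

```python
import numpy as np

# Transition probabilities: row i holds P(next state = j | current state = i).
# Left-to-right topology: each state can only stay put or move forward.
A = np.array([
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
])

# Output probabilities: a toy discrete distribution over 4 quantised
# feature-vector symbols per state (real ASR systems use Gaussian mixtures).
B = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])

pi = np.array([1.0, 0.0, 0.0])  # always start in the first state

def forward_likelihood(obs, A, B, pi):
    """Forward algorithm: P(observation sequence | model)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(forward_likelihood([0, 0, 1, 2, 3], A, B, pi))
```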

One HMM per phone (monophone)
– Example: th, r, iy, h#
– 45 phones in British English + silence ($)

One HMM per two phones (biphone)
– $-th or th+r, th-r or r+iy, r-iy or iy+h#, iy-h# or h#+$
– Each model is associated with its left or right phone
– Up to 45 × 46 + $ = 2,071 models

One HMM per three phones (triphone)
– $-th+r, th-r+iy, r-iy+h#, iy-h#+$
– Each model is associated with both its left and right phone
– Up to 45 × 46 × 46 + $ = 95,221 models
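The model counts can be checked directly; a quick sketch (assuming, as in the biphone count above, that "+ $" adds one extra silence model):

```python
PHONES = 45            # phones in British English
CONTEXTS = PHONES + 1  # each neighbouring context may also be silence ($)

monophones = PHONES + 1                       # 46, including silence
biphones = PHONES * CONTEXTS + 1              # 45*46 + $ = 2,071
triphones = PHONES * CONTEXTS * CONTEXTS + 1  # 45*46*46 + $ = 95,221

print(monophones, biphones, triphones)  # 46 2071 95221
```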

Training HMMs
– HMMs are presented with the feature vector (FV) sequence of each triphone and "learn" the sound to be recognised using the Baum-Welch algorithm
– Feature vectors are stepped past and presented to the model

– Millions of utterances from different people are used to train the models (see the sketch after this list)
– Feature vectors from each phoneme are presented to the HMM in training mode
– HMM states model temporal variability: e.g. one feature vector "sound" may last longer than another, so the HMM may stay in that state for longer
– What is the probability of the current FV being in state T? What is the probability of the transition from state T to state T, T+1 or T+2?
– After many samples, the state and transition probabilities stop improving
– Not all models need to be created, as not all triphone combinations are sensible for a particular language
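A sketch of this training step, using the third-party hmmlearn library as a stand-in for a full toolchain such as HTK (hmmlearn's fit() runs Baum-Welch/EM re-estimation internally; the MFCC data here is random placeholder data, not real speech):

```python
import numpy as np
from hmmlearn import hmm

# Placeholder training data: 100 utterances of one triphone, each a
# (frames x 13) matrix of MFCC-style feature vectors. Real data would
# come from the annotated, segmented speech described above.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(rng.integers(8, 20), 13)) for _ in range(100)]
X = np.concatenate(utterances)
lengths = [len(u) for u in utterances]

# 3 emitting states with diagonal-covariance Gaussians; fit() re-estimates
# state and transition probabilities until the likelihood stops improving.
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
model.fit(X, lengths)

print(model.transmat_.round(2))  # learned transition probabilities
```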

Language Model
– Determines the likelihood of word sequences by analysing large amounts of text from newspapers and other popular textual sources
– Typically uses trigrams: the frequency with which a word follows one word and precedes another (a counting sketch follows this list)
– Trigrams for the whole ASR vocabulary (if enough training data is available) are stored for look-up to determine probabilities
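A minimal sketch of gathering trigram statistics from text (tokenisation and smoothing are far more involved in practice):

```python
from collections import Counter

def trigram_counts(corpus_sentences):
    """Count trigrams (and their bigram prefixes) over tokenised sentences."""
    tri, bi = Counter(), Counter()
    for sent in corpus_sentences:
        words = ["<s>", "<s>"] + sent.lower().split() + ["</s>"]
        for i in range(len(words) - 2):
            bi[(words[i], words[i + 1])] += 1
            tri[(words[i], words[i + 1], words[i + 2])] += 1
    return tri, bi

tri, bi = trigram_counts(["the cat sat", "the cat ran"])
# P(sat | the, cat) ~= count(the cat sat) / count(the cat) = 1/2
print(tri[("the", "cat", "sat")] / bi[("the", "cat")])
```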

Trigram Probability
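In its standard maximum-likelihood form, estimated from counts in the training text:

```latex
P(w_i \mid w_{i-2}, w_{i-1}) \approx
  \frac{\text{count}(w_{i-2}\, w_{i-1}\, w_i)}{\text{count}(w_{i-2}\, w_{i-1})}
```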

Recognition problem
With the phone-based HMMs trained, consider the recognition problem: an utterance consists of a sequence of words, W = w_1, w_2, …, w_n, and the Large Vocabulary Recognition (LVR) system needs to find the most probable word sequence W_max given the observed acoustic signal Y. In other words:
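Assuming the usual formulation:

```latex
W_{\max} = \arg\max_{W} P(W \mid Y)
```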

Bayes' rule
– Need to maximise the probability of W given utterance Y, which is too complex to do directly
– Rewrite using Bayes' rule to create two more solvable problems
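Bayes' rule rewrites this as:

```latex
P(W \mid Y) = \frac{P(Y \mid W)\, P(W)}{P(Y)}
```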

– Maximise the probability of W given Y; P(Y) can be ignored as it is independent of W
– P(W) is independent of Y, so it can be found using written text to form a language model
– P(Y|W): for a given word (or words), the probability that the current series of speech vectors corresponds to that word. This comes from the acoustic models of phonemes concatenated into words

Recognition goal
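Combining the pieces above and dropping P(Y), the goal in its usual form:

```latex
W_{\max} = \arg\max_{W} P(Y \mid W)\, P(W)
```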

Typical recogniser

Dictionary
– Often called a lexicon. Contains the pronunciations of the vocabulary at a phonetic level (see the toy example below)
– There may be more than one pronunciation stored for the same word if it is said differently in different contexts ("the end" vs "thee end") or dialects ("Man-chest-er" vs "Mon-chest-oh")
– There may be one entry for more than one word, e.g. "red" and "read" (the language model is needed to sort this out)
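A lexicon can be thought of as a mapping from words to one or more phone sequences. A toy sketch, with hypothetical entries and phone symbols:

```python
# Toy lexicon: word -> list of possible pronunciations (phone sequences).
# Entries and phone symbols are illustrative, not from a real dictionary.
LEXICON = {
    "the":   [["dh", "ax"], ["dh", "iy"]],          # "the end" vs "thee end"
    "red":   [["r", "eh", "d"]],
    "read":  [["r", "eh", "d"], ["r", "iy", "d"]],  # past vs present tense
    "three": [["th", "r", "iy"]],
}

def pronunciations(word):
    return LEXICON.get(word.lower(), [])

print(pronunciations("read"))  # two entries; the language model disambiguates
```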

Decoding
– Complex: processor intensive and memory intensive
– Consider the simplest (but most costly) method:
– Find the start (after a quiet bit) and hypothesise all possible start words based on the current potential words available from the HMMs, such as "I", "eye", "high", etc.
– Determine the probability of the potential HMMs being one of these

Phoneme recognition
– Feature vectors are presented to all models. Each HMM generates a feature vector, which is compared to the current input feature vector. The output probabilities determine the most likely match.
– Which phonemes are tested is determined by the language model and lexicon.
– Example output probabilities for the current feature vectors: P(th) = 0.51, P(r) = 0.05, P(iy) = 0.35, P(h#) = P(p) = 0.03, P(k) = 0.015
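A sketch of how such scores might be used, keeping only the best candidates for the search (values taken from the slide above, reading the chained equality as P(h#) = P(p) = 0.03):

```python
# Per-phone output probabilities for the current feature vectors,
# using the illustrative values from the slide.
scores = {"th": 0.51, "r": 0.05, "iy": 0.35, "h#": 0.03, "p": 0.03, "k": 0.015}

# Keep the most likely candidates for the next stage of the search.
best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(best)  # [('th', 0.51), ('iy', 0.35), ('r', 0.05)]
```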

Concatenate phonemes to make words
– Example path scores: th (0.01), r (0.5), iy (0.065), h# (0.001), p (0.03), k (0.015)

– Probabilities are multiplied to get a total score
– Need some way of reducing complexity, as there are obviously too many combinations:
– th + r + iy + h# is OK
– th + z + z + d is not
– The tree is pruned by discarding the least likely paths, based on probability and lexical rules (as sketched below)
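A beam-pruning sketch (in practice log probabilities are summed rather than raw probabilities multiplied, to avoid numerical underflow; the beam width and scores here are illustrative):

```python
import math

def extend_and_prune(paths, candidates, beam=3):
    """Extend each partial path with each candidate phone and keep
    only the `beam` highest-scoring hypotheses.

    paths:      list of (phone_sequence, log_score) tuples
    candidates: dict mapping phone -> probability for the next step
    """
    extended = [
        (seq + [ph], score + math.log(p))
        for seq, score in paths
        for ph, p in candidates.items()
        if p > 0
    ]
    return sorted(extended, key=lambda x: x[1], reverse=True)[:beam]

# Start from "th" and extend with the next set of candidate phones.
paths = [(["th"], math.log(0.51))]
paths = extend_and_prune(paths, {"r": 0.5, "z": 0.01, "iy": 0.065})
print(paths[0])  # best hypothesis so far: (['th', 'r'], ...)
```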

– Check the validity of phoneme sequences using the lexicon
– Continue to build the tree until an end of utterance is detected (silence, or the user may say "full stop" as a command)
– From the language model, check the probability of the current possible words based on the previous two words

Probability tree
– A probability tree of possible word combinations is built and the best paths are calculated
– The tree can become very large; based on processing power and memory requirements, the least likely paths are dropped and the tree is pruned

Sentence decomposition