IRCS/CCN Summer Workshop, June 2003: Speech Recognition


Why is perception hard?
Task: available signals → a model of the world around us
– signals are mostly accidental, inadequate
– sometimes disguised or falsified
– always mixed-up and ambiguous
Reasoning about the source of signals:
– Integration of context: what do you expect?
– "Sensor fusion": integration of vision, sound, smell, etc.
– Source (and noise) separation: there's more than one thing out there
– Variable perspective, source variation, etc., depending on the type of signal and the type of object
Much harder than chess or calculus!

Bayesian probability estimation
Thomas Bayes (c. 1701–1761)
– Minister of the Presbyterian Chapel at Tunbridge Wells
– Amateur mathematician
– "Essay towards solving a problem in the doctrine of chances", published (posthumously) in 1764
Crucial idea: background (prior) knowledge about the plausibility of different theories can be combined with knowledge about the relation of theories to evidence, in a mathematically well-defined way, even if all knowledge is uncertain, to reason about the most likely explanation of the available evidence.
Bayes' theorem is
– "the most important equation in the history of mathematics" (?)
– a simple consequence of basic definitions, or
– a still-controversial recipe for the probability of alternative causes for a given event, or
– the implicit foundation of human reasoning, or
– a general framework for solving the problems of perception
Tutorial on Bayes' Theorem
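The slide presents Bayes' theorem as "a simple consequence of basic definitions"; for reference, here is that one-line derivation, using hypothesis/evidence notation (H, E) that is ours rather than the slide's:

```latex
P(H \cap E) = P(H \mid E)\,P(E) = P(E \mid H)\,P(H)
\;\Longrightarrow\;
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},
\qquad
P(E) = \sum_i P(E \mid H_i)\,P(H_i).
```

The denominator is the same for every candidate hypothesis, which is why the later slides can work with the proportional form P(W|S) ∝ P(S|W)P(W).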

Fundamental theorem of speech recognition
P(W|S) ∝ P(S|W) P(W)
where
– W is "Word(s)", i.e. the message text
– S is "Sound(s)", i.e. the speech signal
"Noisy channel model" of communications engineering, due to Shannon (1949)
New algorithms, especially relevant to speech recognition, due to L. E. Baum et al. around 1970
Applied to speech recognition by Jim Baker (CMU PhD 1975) and Fred Jelinek (IBM speech group, 1975 onward)
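Written out as a decoding rule, the proportionality is used like this (the expansion below is standard Bayes'-rule algebra, not quoted from the slides):

```latex
\hat{W} \;=\; \arg\max_{W} P(W \mid S)
        \;=\; \arg\max_{W} \frac{P(S \mid W)\,P(W)}{P(S)}
        \;=\; \arg\max_{W} P(S \mid W)\,P(W),
```

since P(S) does not depend on the candidate word sequence W: the acoustic model supplies P(S|W) and the language model supplies P(W).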

Motivations for a Bayesian approach
– A consistent framework for integrating previous experience and current evidence
– A quantitative model for "abduction" = reasoning about the best explanation
– A general method for turning a generative model into an analytic one = "analysis by synthesis"; helpful where |categories| << |signals|
These motivations apply both in engineering practice and in the evolution of biological systems.

Basic architecture of standard speech recognition technology
1. Bayes' Rule: P(W|S) ∝ P(S|W) P(W)
2. Approximate P(S|W) P(W) as a Hidden Markov Model: a probabilistic function [to get P(S|W)] of a Markov chain [to get P(W)]
3. Use the Baum-Welch (= EM) algorithm to "learn" the HMM parameters
4. Use Viterbi decoding to find the most probable W given S in terms of the estimated HMM
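To make "a probabilistic function of a Markov chain" concrete, here is a minimal generative sketch of a discrete-output HMM in Python; the two states, three symbols, and all probabilities are invented for illustration (real recognizers use thousands of triphone states with Gaussian-mixture output densities):

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 discrete acoustic symbols. All numbers are made up.
pi = np.array([0.6, 0.4])           # initial state probabilities
A = np.array([[0.7, 0.3],           # A[i, j] = P(state j at t+1 | state i at t)
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],      # B[i, k] = P(symbol k | state i)
              [0.1, 0.3, 0.6]])

def sample(T, rng=np.random.default_rng(0)):
    """Run the Markov chain for T steps and emit one observation per state."""
    states, obs = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        states.append(int(s))
        obs.append(int(rng.choice(3, p=B[s])))
        s = rng.choice(2, p=A[s])
    return states, obs

print(sample(5))
```

The state sequence is the Markov chain; the observations are the probabilistic function of it, and only the observations are visible to the recognizer.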

HMM parameter estimation given labelled/aligned training data...
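The figures from this slide are not reproduced in the transcript, but the underlying idea is simple: when every training frame is already labelled with its HMM state, maximum-likelihood estimation reduces to counting transitions and emissions and normalizing. A toy discrete-output sketch (the function name and smoothing constant are ours):

```python
import numpy as np

def estimate_hmm(state_seqs, obs_seqs, n_states, n_symbols, smooth=1e-3):
    """ML estimates of (A, B, pi) from state-aligned data by count-and-normalize.
    Toy discrete-output version; real recognizers fit Gaussian mixtures per state."""
    A = np.full((n_states, n_states), smooth)    # transition counts (+ smoothing)
    B = np.full((n_states, n_symbols), smooth)   # emission counts
    pi = np.full(n_states, smooth)               # initial-state counts
    for states, obs in zip(state_seqs, obs_seqs):
        pi[states[0]] += 1
        for t in range(len(states)):
            B[states[t], obs[t]] += 1
            if t + 1 < len(states):
                A[states[t], states[t + 1]] += 1
    # Normalize counts into probability distributions.
    return (A / A.sum(axis=1, keepdims=True),
            B / B.sum(axis=1, keepdims=True),
            pi / pi.sum())
```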

Viterbi decoding given HMM & observed signal...
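A compact sketch of the Viterbi recursion for a discrete-output HMM, in the log domain (the variable names and NumPy formulation are ours):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable state path given observations; pi: (N,), A: (N, N), B: (N, M)."""
    logpi, logA, logB = np.log(pi), np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))              # best log-score of any path ending in each state
    back = np.zeros((T, N), dtype=int)    # backpointers to the best predecessor
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA     # scores[i, j]: best path via i into j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]              # backtrace from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```

In a full recognizer the same dynamic program runs over a decoding graph built from the language model and pronouncing dictionary, so the recovered state path also yields the word sequence W.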

Sketch of Baum-Welch (EM) algorithm for estimating HMM parameters given unaligned (or even unlabelled) training data
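A minimal single-sequence sketch of one Baum-Welch (EM) iteration for a discrete-output HMM: the forward-backward pass computes state and transition posteriors (E-step), and re-normalizing the expected counts gives new parameters (M-step). This is only meant to show the structure of the update; production systems use scaled or log-domain probabilities and Gaussian-mixture outputs:

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM iteration on a single observation sequence (unscaled, illustrative)."""
    T, N = len(obs), len(pi)
    alpha, beta = np.zeros((T, N)), np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # forward pass
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0                                      # backward pass
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    like = alpha[-1].sum()                              # P(obs | current model)
    gamma = alpha * beta / like                         # gamma[t, i] = P(state i at t | obs)
    xi = np.zeros((N, N))                               # expected transition counts
    for t in range(T - 1):
        xi += np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * A / like
    # M-step: expected counts -> probabilities
    new_pi = gamma[0]
    new_A = xi / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, like
```

Iterating this step never decreases the likelihood of the training data, which is what lets the HMM parameters be learned without frame-level alignment.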

Other typical details: complex elaborations of the basic ideas
HMM states ← triphones ← words
– each triphone → 3-5 states + connection pattern
– phone sequence from a pronouncing dictionary
– clustering for estimation
Acoustic features
– RASTA-PLP etc.
– vocal tract length normalization, speaker clustering
Output pdf for each state as a mixture of Gaussians
Language model as an N-gram model over words
– recency/topic effects
Empirical weighting of language vs. acoustic models (see the sketch below)
etc.
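The last bullet, empirical weighting of the language model against the acoustic model, is usually realized as a log-linear combination of scores plus a word-insertion penalty. A hypothetical sketch (the weight, penalty, function name, and example scores are all invented for illustration):

```python
import math

def combined_score(acoustic_logprob, lm_logprob, n_words,
                   lm_weight=12.0, word_penalty=-0.5):
    """Score used to rank hypotheses: log P(S|W) + lm_weight*log P(W) + word_penalty*|W|.
    lm_weight and word_penalty are tuned empirically on held-out data."""
    return acoustic_logprob + lm_weight * lm_logprob + word_penalty * n_words

# Re-rank two hypothetical hypotheses for the same utterance:
hyps = [("recognize speech",   -2500.0, math.log(1e-7),  2),
        ("wreck a nice beach", -2480.0, math.log(1e-11), 4)]
best = max(hyps, key=lambda h: combined_score(h[1], h[2], h[3]))
print(best[0])  # the language model outweighs the slightly better acoustic score
```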

Some limitations of the standard architecture
– Problems with Markovian assumptions
– Modeling trajectory effects
– Variable coordination of articulatory dimensions
– ...