IRCS/CCN Summer Workshop June 2003 Speech Recognition.


1 IRCS/CCN Summer Workshop June 2003 Speech Recognition

2 Why is perception hard?
Task: available signals → model of the world around
–signals are mostly accidental, inadequate
–sometimes disguised or falsified
–always mixed-up and ambiguous
Reasoning about the source of signals:
–Integration of context: what do you expect?
–“Sensor fusion”: integration of vision, sound, smell etc.
–Source (and noise) separation: there’s more than one thing out there
–Variable perspective, source variation etc.
  depends on the type of signal
  depends on the type of object
Much harder than chess or calculus!

3 Bayesian probability estimation
Thomas Bayes (1702-1761)
–Minister of the Presbyterian Chapel at Tunbridge Wells
–Amateur mathematician
–Essay towards solving a problem in the doctrine of chances, published (posthumously) in 1764
Crucial idea: background (prior) knowledge about the plausibility of different theories
can be combined with knowledge about the relation of theories to evidence
in a mathematically well-defined way, even if all knowledge is uncertain,
to reason about the most likely explanation of the available evidence
Bayes’ theorem
–“the most important equation in the history of mathematics” (?)
–a simple consequence of basic definitions, or
–a still-controversial recipe for the probability of alternative causes for a given event, or
–the implicit foundation of human reasoning
–a general framework for solving the problems of perception
Tutorial on Bayes’ Theorem
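The "crucial idea" above can be made concrete with a small numerical sketch. The theories and numbers below are invented for illustration and do not come from the slides; the function simply multiplies each prior by its likelihood and renormalizes.

```python
def posterior(priors, likelihoods):
    """Return P(theory | evidence) for each theory via Bayes' theorem.

    priors[t]      = P(t), plausibility of theory t before seeing evidence
    likelihoods[t] = P(evidence | t), how well theory t predicts the evidence
    """
    joint = {t: priors[t] * likelihoods[t] for t in priors}
    evidence = sum(joint.values())  # P(evidence), the normalizing constant
    return {t: joint[t] / evidence for t in joint}


# Hypothetical numbers: two competing explanations for wet grass.
priors = {"rain": 0.2, "sprinkler": 0.8}
likelihoods = {"rain": 0.9, "sprinkler": 0.3}  # P(wet grass | theory)
post = posterior(priors, likelihoods)
```

The evidence favors "rain" three to one, yet the stronger prior for "sprinkler" keeps it the more probable explanation overall.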

4

5

6

7

8 Fundamental theorem of speech recognition
P(W|S) ∝ P(S|W)P(W)
where W is “Word(s)” (i.e. message text)
      S is “Sound(s)” (i.e. speech signal)
“Noisy channel model” of communications engineering due to Shannon 1949
New algorithms, especially relevant to speech recognition, due to L.E. Baum et al. ~ 1965-1970
Applied to speech recognition by Jim Baker (CMU PhD 1975), Fred Jelinek (IBM speech group >>1975)
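A minimal sketch of how the proportionality is used for recognition: since P(S) is the same for every candidate W, the recognizer only needs to maximize P(S|W)P(W), usually in the log domain. The candidate strings and probabilities below are made up for illustration.

```python
import math


def decode(log_acoustic, log_prior):
    """Return argmax over W of log P(S|W) + log P(W)."""
    return max(log_acoustic, key=lambda w: log_acoustic[w] + log_prior[w])


# Hypothetical scores for one utterance S and two candidate word strings W.
log_acoustic = {  # log P(S|W): how well each W explains the sound
    "recognize speech": math.log(0.40),
    "wreck a nice beach": math.log(0.60),
}
log_prior = {  # log P(W): how plausible each W is as a message
    "recognize speech": math.log(0.010),
    "wreck a nice beach": math.log(0.0001),
}

best = decode(log_acoustic, log_prior)
acoustic_only = max(log_acoustic, key=log_acoustic.get)
```

With these numbers the acoustic score alone prefers the second candidate, and the prior over word strings reverses the decision.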

9 Motivations for a Bayesian approach
A consistent framework for integrating previous experience and current evidence
A quantitative model for “abduction” = reasoning about the best explanation
A general method for turning a generative model into an analytic one = “analysis by synthesis”
–helpful where |categories| << |signals|
These motivations apply both in engineering practice and in the evolution of biological systems

10 Basic architecture of standard speech recognition technology
1. Bayes’ Rule: P(W|S) ∝ P(S|W)P(W)
2. Approximate P(S|W)P(W) as a Hidden Markov Model: a probabilistic function [to get P(S|W)] of a Markov chain [to get P(W)]
3. Use Baum/Welch (=EM) algorithm to “learn” HMM parameters
4. Use Viterbi decoding to find the most probable W given S in terms of the estimated HMM

11 HMM parameter estimation given labelled/aligned training data...
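The figure for this slide is not preserved in the transcript. As a hedged sketch of the usual approach when every observation is already labelled with its state, the maximum-likelihood estimates are relative frequencies. The discrete observations, state names, and data below are assumptions for illustration; real systems use continuous acoustic features.

```python
from collections import Counter, defaultdict


def estimate_hmm(sequences):
    """Relative-frequency HMM estimates from aligned (state, observation) pairs.

    Each sequence is a non-empty list of (state, observation) tuples.
    Returns (start probabilities, transition probabilities, emission probabilities).
    """
    init = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for seq in sequences:
        init[seq[0][0]] += 1                       # which state starts the sequence
        for state, obs in seq:
            emit[state][obs] += 1                  # state -> observation counts
        for (prev, _), (nxt, _) in zip(seq, seq[1:]):
            trans[prev][nxt] += 1                  # state -> next-state counts

    def normalize(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    return (
        normalize(init),
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


# Hypothetical aligned training data: two short labelled sequences.
data = [
    [("s", "hiss"), ("s", "hiss"), ("a", "hum")],
    [("s", "hiss"), ("a", "hum"), ("a", "hum")],
]
start_p, trans_p, emit_p = estimate_hmm(data)
```

Unseen events get probability zero under this estimate, which is one reason practical systems add smoothing or clustering.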

12 Viterbi decoding given HMM & observed signal...
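The slide's illustration is missing from the transcript, so what follows is a generic textbook sketch of Viterbi decoding for a discrete-observation HMM, not a reconstruction of the original figure. The two-state model and its probabilities are invented for the example.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log-probability, path) of the most probable state sequence."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log-prob of the best path ending in state s, that path)
    best = {s: (log(start_p[s]) + log(emit_p[s][obs[0]]), [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor r for state s at this time step.
            score, path = max(
                ((best[r][0] + log(trans_p[r][s]), best[r][1]) for r in states),
                key=lambda pair: pair[0],
            )
            new_best[s] = (score + log(emit_p[s][o]), path + [s])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])


# Hypothetical two-state model with discrete observations.
states = ["s", "a"]
start_p = {"s": 0.6, "a": 0.4}
trans_p = {"s": {"s": 0.7, "a": 0.3}, "a": {"s": 0.4, "a": 0.6}}
emit_p = {"s": {"hiss": 0.8, "hum": 0.2}, "a": {"hiss": 0.1, "hum": 0.9}}

log_prob, path = viterbi(["hiss", "hiss", "hum"], states, start_p, trans_p, emit_p)
```

Working in log probabilities avoids numerical underflow on long utterances; storing whole paths keeps the sketch short, where a production decoder would keep back-pointers instead.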

13 Sketch of Baum-Welch (EM) algorithm for estimating HMM parameters given unaligned (or even unlabelled) training data
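The sketch itself did not survive in the transcript. Below is a hedged, minimal version of one Baum-Welch re-estimation step for a single discrete observation sequence, using unscaled forward and backward probabilities (fine for short toy sequences only). All names, the model, and the data are assumptions for illustration.

```python
def forward(obs, states, start, trans, emit):
    """alpha[t][s] = P(obs[0..t], state at time t is s)."""
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                      for s in states})
    return alpha


def backward(obs, states, trans, emit):
    """beta[t][s] = P(obs[t+1..] | state at time t is s)."""
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
                        for s in states})
    return beta


def baum_welch_step(obs, states, symbols, start, trans, emit):
    """One EM iteration; returns new parameters and the likelihood of the old ones."""
    alpha = forward(obs, states, start, trans, emit)
    beta = backward(obs, states, trans, emit)
    likelihood = sum(alpha[-1][s] for s in states)
    n = len(obs)
    # E-step: expected state occupancies and expected transitions.
    gamma = [{s: alpha[t][s] * beta[t][s] / likelihood for s in states}
             for t in range(n)]
    xi = [{r: {s: alpha[t][r] * trans[r][s] * emit[s][obs[t + 1]] * beta[t + 1][s]
               / likelihood for s in states} for r in states}
          for t in range(n - 1)]
    # M-step: re-estimate parameters from the expected counts.
    new_start = dict(gamma[0])
    new_trans = {r: {s: sum(x[r][s] for x in xi) / sum(g[r] for g in gamma[:-1])
                     for s in states} for r in states}
    new_emit = {s: {v: sum(g[s] for g, o in zip(gamma, obs) if o == v)
                    / sum(g[s] for g in gamma) for v in symbols} for s in states}
    return new_start, new_trans, new_emit, likelihood


# Hypothetical unlabelled data and an arbitrary initial guess.
obs = ["hiss", "hiss", "hum", "hum", "hiss", "hum"]
states, symbols = ["s", "a"], ["hiss", "hum"]
start = {"s": 0.6, "a": 0.4}
trans = {"s": {"s": 0.7, "a": 0.3}, "a": {"s": 0.4, "a": 0.6}}
emit = {"s": {"hiss": 0.8, "hum": 0.2}, "a": {"hiss": 0.1, "hum": 0.9}}

step1 = baum_welch_step(obs, states, symbols, start, trans, emit)
step2 = baum_welch_step(obs, states, symbols, step1[0], step1[1], step1[2])
```

Each call returns the likelihood of the parameters it was given, so comparing `step2[3]` with `step1[3]` shows the effect of one re-estimation; EM guarantees this value does not decrease.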

14 Other typical details: Complex elaborations of the basic ideas
HMM states ← triphones ← words
–each triphone → 3-5 states + connection pattern
–phone sequence from pronouncing dictionary
–clustering for estimation
Acoustic features
–RASTA-PLP etc.
–Vocal tract length normalization, speaker clustering
Output pdf for each state as mixture of Gaussians
Language model as N-gram model over words
–recency/topic effects
Empirical weighting of language vs. acoustic models
etc.
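As one illustration of these details, the per-state output pdf as a mixture of Gaussians can be sketched as follows. The diagonal covariance is an assumption made here for simplicity (the slide does not specify the covariance structure), and the parameter values are invented.

```python
import math


def gmm_log_pdf(x, weights, means, variances):
    """Log density of feature vector x under a diagonal-covariance Gaussian mixture.

    weights[k]   = mixture weight of component k (weights sum to 1)
    means[k]     = mean vector of component k
    variances[k] = per-dimension variances of component k (assumed diagonal)
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_gauss = sum(
            -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mu, var)
        )
        log_terms.append(math.log(w) + log_gauss)
    # log-sum-exp over components, for numerical stability
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


# Hypothetical two-component mixture over a 2-dimensional feature vector.
score = gmm_log_pdf(
    x=[0.5, -1.0],
    weights=[0.3, 0.7],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 2.0]],
)
```

In the decoders sketched for discrete observations, a value like `score` would take the place of the log emission probability for a state.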

15 Some limitations of the standard architecture
Problems with Markovian assumptions
Modeling trajectory effects
Variable coordination of articulatory dimensions....

