1
Speech Recognition

2
What makes speech recognition hard?

3
Speech Recognition Task: Identify the sequence of words uttered by a speaker, given the acoustic waveform.
Uncertainty is introduced by noise, speaker error, variation in pronunciation, homonyms (words that sound alike), etc.
Speech recognition is therefore viewed as a problem of probabilistic inference.

4
Example: “I’m firsty, um, can I haf somefing to dwink?” From Russell and Norvig, Artificial Intelligence

5
Speech Recognition System Architecture (from Buchsbaum & Giancarlo paper)
Here, “lattice” means “Hidden Markov Model”
Acoustic feature extraction
Acoustic Features -> Phones model
Phones -> Word pronunciation model
Language model

6
Acoustic feature extraction
(From Russell and Norvig, Artificial Intelligence)

8
Hidden Markov Models
Markov model: Given state X_t, what is the probability of transitioning to the next state X_{t+1}? E.g., word bigram probabilities give P(word_{t+1} | word_t).
Hidden Markov model: There are observations (e.g., the signal S) and “hidden” states (e.g., words). The HMM specifies the transition probabilities between hidden states and the probability of each observation given the hidden state; inference then yields the probabilities of the hidden states given the observations.
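
The bigram probabilities mentioned above can be estimated by counting adjacent word pairs. The following is a minimal sketch, assuming a plain relative-frequency estimate with no smoothing; the function name bigram_probabilities and the toy corpus are made up for illustration.

```python
from collections import Counter


def bigram_probabilities(words):
    """Estimate P(word_{t+1} | word_t) by relative frequency (no smoothing)."""
    pair_counts = Counter(zip(words, words[1:]))
    # Count a word only where it has a successor, so that the
    # probabilities conditioned on each word sum to 1.
    prev_counts = Counter(words[:-1])
    return {
        (prev, nxt): count / prev_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }


# Toy corpus, invented for this example.
corpus = "can i have something to drink can i have water".split()
probs = bigram_probabilities(corpus)

print(probs[("can", "i")])           # 1.0: "can" is always followed by "i"
print(probs[("have", "something")])  # 0.5: "have" is followed by "something" or "water"
```

A practical system would smooth these estimates, since any word pair missing from the training data gets probability zero here.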

9
Phone model
P(phone | frame features) = α P(frame features | phone) P(phone), where α = 1 / P(frame features) is a normalizing constant.
P(frame features | phone) is often represented by a Gaussian mixture model.
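
The Bayes-rule computation above can be sketched in a few lines. This example assumes a single scalar feature per frame (real systems use feature vectors) and a one-dimensional Gaussian mixture per phone; the phone labels, mixture parameters, and priors are all hypothetical values chosen for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_likelihood(x, components):
    """P(feature | phone) for a mixture given as (weight, mean, variance) triples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def phone_posterior(x, gmms, priors):
    """P(phone | feature) via Bayes' rule, normalized over all phones."""
    unnormalized = {ph: gmm_likelihood(x, gmms[ph]) * priors[ph] for ph in gmms}
    total = sum(unnormalized.values())  # equals P(feature), i.e. 1 / alpha
    return {ph: p / total for ph, p in unnormalized.items()}


# Hypothetical mixtures and priors for two phones.
gmms = {
    "t": [(0.6, 1.0, 0.5), (0.4, 2.0, 1.0)],
    "d": [(1.0, 3.0, 0.5)],
}
priors = {"t": 0.5, "d": 0.5}

posterior = phone_posterior(1.2, gmms, priors)
print(posterior)
```

With these made-up parameters, a feature value of 1.2 lies much closer to the "t" mixture than to the "d" mixture, so "t" receives most of the posterior mass.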

10
Acoustic Features -> Phones model
(From Russell and Norvig, Artificial Intelligence)

11
Word Pronunciation model
Now we want P(words | phones_{1:t}) = α P(phones_{1:t} | words) P(words), where α is a normalizing constant.
Represent P(phones_{1:t} | words) as an HMM.
Phones -> Word pronunciation model
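
To illustrate how P(phones | word) can be read off such a model, here is a minimal sketch of a pronunciation network for a single word. It assumes each state emits exactly one phone, so the model reduces to a Markov chain over labeled states. The network for "tomato" with two equally likely pronunciations, the state names, and the 0.5 probabilities are hypothetical and are not taken from the slides.

```python
# Hypothetical pronunciation network for the word "tomato".
# Each state maps to its successors and their transition probabilities.
TOMATO = {
    "start": {"t1": 1.0},
    "t1": {"ow1": 1.0},
    "ow1": {"m": 1.0},
    "m": {"ey": 0.5, "aa": 0.5},  # the two pronunciations diverge here
    "ey": {"t2": 1.0},
    "aa": {"t2": 1.0},
    "t2": {"ow2": 1.0},
    "ow2": {"end": 1.0},
}

# The phone emitted by each state.
PHONE_OF = {"t1": "t", "ow1": "ow", "m": "m", "ey": "ey",
            "aa": "aa", "t2": "t", "ow2": "ow"}


def pronunciation_probability(phones, transitions, phone_of):
    """P(phones | word): total probability of start-to-end paths spelling out phones."""
    forward = {"start": 1.0}  # state -> probability of reaching it so far
    for phone in phones:
        reached = {}
        for state, prob in forward.items():
            for succ, p in transitions.get(state, {}).items():
                if phone_of.get(succ) == phone:
                    reached[succ] = reached.get(succ, 0.0) + prob * p
        forward = reached
    # Only paths that can step into the final state count as a complete word.
    return sum(prob * transitions.get(state, {}).get("end", 0.0)
               for state, prob in forward.items())


print(pronunciation_probability("t ow m ey t ow".split(), TOMATO, PHONE_OF))  # 0.5
print(pronunciation_probability("t ow m".split(), TOMATO, PHONE_OF))          # 0.0
```

A full recognizer would also let each phone state emit noisy acoustic observations, which is what makes the phone states hidden.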

12
Example of Phones -> Word pronunciation model
(From Russell and Norvig, Artificial Intelligence)

13
Language model

15
To build a speech recognition system, need:
Lots of data
Acoustic signal processing tools
Methods for learning various probability models
Methods for “maximum likelihood” calculation (i.e., search or “decoding”):
Suppose we have observations (features from the acoustic signal) O = (o_1 o_2 o_3 … o_n). We want to find W* = (w_1 w_2 w_3 … w_n) such that
W* = argmax_W P(W | O) = argmax_W P(O | W) P(W)
where:
P(W): Language model
P(O | W): Combine phone models, segmentation models, word pronunciation models
argmax_W: Search or “decoding” method
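
One common way to carry out the search or “decoding” step is the Viterbi algorithm. The slides do not name a specific algorithm, so using Viterbi here is an assumption. The sketch below finds the most likely hidden-state sequence for a short observation sequence; the two hidden states, the observation symbols x, y, z, and all probabilities are made-up toy values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in s at time t,
    #               predecessor of s on that path)
    first = observations[0]
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((r, best[-1][r][0] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Toy model: two hidden phones and three quantized acoustic symbols.
states = ["th", "f"]
start_p = {"th": 0.6, "f": 0.4}
trans_p = {"th": {"th": 0.7, "f": 0.3}, "f": {"th": 0.4, "f": 0.6}}
emit_p = {"th": {"x": 0.5, "y": 0.4, "z": 0.1},
          "f": {"x": 0.1, "y": 0.3, "z": 0.6}}

print(viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p))
# ['th', 'th', 'f']
```

Log-probabilities are used so that long observation sequences do not underflow. This sketch requires every probability to be nonzero, since math.log(0) raises an error.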

17
Emotion recognition in speech (by OES high-school students!) http://www.youtube.com/watch?v=NnbsGyViN3Y
