Automated Speech Recognition. By: Amichai Painsky.

1 Automated Speech Recognition By: Amichai Painsky

2 Automated Speech Recognition – setup Input – speech waveform. Preprocessing. Modeling. Output – transcription: "The boy is in the red house"

3 ASR – basics Observations O representing a speech signal; a vocabulary V of different words. Our goal – find the most likely word sequence: W* = argmax_W P(W | O). Since P(W | O) = P(O | W) P(W) / P(O), this is equivalent to W* = argmax_W P(O | W) P(W), where P(W) is the language model and P(O | W) is the acoustic model.

4 Observations preprocessing A sampled waveform is converted into a sequence of parameter vectors at a certain frame rate. A frame rate of 10 ms is usually chosen, because a speech signal is assumed to be stationary over spans of about 10 ms. Many different ways to extract meaningful features have been developed, some based on acoustic concepts, knowledge of the human vocal tract, or psychophysical knowledge of human perception.
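The framing step described above can be sketched as follows. This is an illustrative sketch, not code from the slides: the function name is invented, and the 25 ms window with a 10 ms hop is a common convention (the slide itself mentions only the 10 ms frame rate).

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, hop_ms=10):
    """Split a waveform into overlapping frames.

    A new frame starts every hop_ms (the 10 ms frame rate from the
    slide); each frame spans frame_ms, over which speech is assumed
    quasi-stationary.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    frames = np.stack([signal[i * hop_len : i * hop_len + frame_len]
                       for i in range(n_frames)])
    # Taper frame edges before any spectral analysis
    return frames * np.hamming(frame_len)

# 1 second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
frames = frame_signal(np.sin(2 * np.pi * 440 * t), sr)
```

Feature extraction (e.g. MFCCs) would then operate on each 400-sample frame independently.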

5 Language modeling Most generally, the probability of a sequence of m words is P(w_1, ..., w_m) = Π_{i=1..m} P(w_i | w_1, ..., w_{i-1}). Language is highly structured, and limited histories are capable of capturing quite a bit of this structure. Bigram models: P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-1}). More powerful two-word-history (trigram) models: P(w_i | w_{i-2}, w_{i-1}). Longer history -> exponentially increasing number of models -> more data required to train, more parameters, more overfitting. Partial-matching models.
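A maximum-likelihood bigram model can be estimated by simple counting. This is a minimal sketch with an invented toy corpus; real language models add smoothing for unseen bigrams, which is omitted here:

```python
from collections import Counter

def bigram_probs(corpus):
    """Maximum-likelihood bigram estimates P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # count each word as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

corpus = ["the boy is in the red house", "the boy is happy"]
p = bigram_probs(corpus)
# "the" occurs 3 times as a history and is followed by "boy" twice,
# so P(boy | the) = 2/3
```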

6 Acoustic modeling Determines what sound is pronounced when a given sentence is uttered. The number of possibilities is infinite! (It depends on the speaker, the environment, microphone placement, etc.) Possible solution – a parametric model in the form of a Hidden Markov Model (HMM). Note that other solutions may also apply (for example, neural networks).

7 Hidden Markov Model A simple example of an HMM:

8 Hidden Markov Model

9 Hidden Markov Model – Forward Algorithm Given an observation sequence O (for example, 10110), what is the probability that it was generated by a given HMM (for example, the HMM from the previous slides)? Summing over all possible state paths Q: P(O) = Σ_Q P(O, Q) = Σ_Q P(O | Q) P(Q).
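The sum over all paths can be written directly, at exponential cost. The two-state HMM parameters below are assumptions for illustration; the slide's actual numbers (shown as a figure) are not reproduced in the transcript:

```python
from itertools import product

# Assumed two-state HMM with binary outputs (illustrative parameters)
pi = [0.5, 0.5]                      # initial state probabilities
A = [[0.6, 0.4], [0.3, 0.7]]         # A[i][j] = P(next = j | current = i)
B = [[0.7, 0.3], [0.2, 0.8]]         # B[i][o] = P(emit o | state = i)

def brute_force_likelihood(obs):
    """P(obs) by summing P(path, obs) over every state path: O(N^T) paths."""
    total = 0.0
    for path in product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

likelihood = brute_force_likelihood([1, 0, 1, 1, 0])   # the "10110" example
```

With 2 states and T observations this enumerates 2^T paths, which is exactly what the forward algorithm on the next slides avoids.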

10 Hidden Markov Model – Forward Algorithm A more efficient approach – the forward algorithm

11 Hidden Markov Model – Forward Algorithm The forward algorithm calculates the probabilities of all subsequences at each time step, reusing the results of the previous step (dynamic programming).

12 Hidden Markov Model – Forward Algorithm

13 Hidden Markov Model – Forward Algorithm
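The recursion on the slides above can be sketched as follows. The two-state parameters are assumed for illustration (the slide's own numbers are not in the transcript):

```python
# Assumed two-state HMM with binary outputs (illustrative parameters)
pi = [0.5, 0.5]                      # initial state probabilities
A = [[0.6, 0.4], [0.3, 0.7]]         # A[i][j] = P(next = j | current = i)
B = [[0.7, 0.3], [0.2, 0.8]]         # B[i][o] = P(emit o | state = i)

def forward(obs, pi, A, B):
    """Forward algorithm: alpha[t][j] = P(o_1..o_t, state_t = j).

    Each step reuses the previous step's partial sums (dynamic
    programming), so the cost is O(T * N^2) instead of O(N^T).
    """
    alpha = [[pi[j] * B[j][obs[0]] for j in range(len(pi))]]
    for t in range(1, len(obs)):
        alpha.append([
            sum(alpha[-1][i] * A[i][j] for i in range(len(pi))) * B[j][obs[t]]
            for j in range(len(pi))
        ])
    return sum(alpha[-1])            # marginalize over the final state

likelihood = forward([1, 0, 1, 1, 0], pi, A, B)   # the "10110" example
```

This returns the same value as summing over all paths explicitly, but in time linear in the sequence length.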

14 Hidden Markov Model – Viterbi algorithm Previously: given an observation sequence (for example, 10110), what is the probability that it was generated by a given HMM? We now ask: given an observation sequence, what is the sequence of states that is most likely to have generated it?

15 Hidden Markov Model – Viterbi algorithm We solve this with the same forward recursion as before, but this time with maximization instead of summation.

16 Hidden Markov Model – Viterbi algorithm

17 Hidden Markov Model – Viterbi algorithm, example Observation sequence: 101
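A sketch of the Viterbi recursion, run on the slide's 101 sequence. As before, the two-state parameters are assumed for illustration, so the resulting path is for the assumed model, not necessarily the slide's:

```python
# Assumed two-state HMM with binary outputs (illustrative parameters)
pi = [0.5, 0.5]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.2, 0.8]]

def viterbi(obs, pi, A, B):
    """Most likely state path: the forward recursion with max instead of sum."""
    n = len(pi)
    delta = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    back = []                        # back[t][j] = best predecessor of state j
    for t in range(1, len(obs)):
        row, ptrs = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[-1][i] * A[i][j])
            row.append(delta[-1][best_i] * A[best_i][j] * B[j][obs[t]])
            ptrs.append(best_i)
        delta.append(row)
        back.append(ptrs)
    # Trace back from the most likely final state
    state = max(range(n), key=lambda j: delta[-1][j])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1], max(delta[-1])

path, prob = viterbi([1, 0, 1], pi, A, B)
```

For these assumed parameters, state 1 strongly favors emitting 1 and tends to stay put, so the best path stays in state 1 throughout.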

18 Hidden Markov Model – model fitting We are interested in the maximum-likelihood parameters λ* = argmax_λ P(O | λ). There is no analytical maximum-likelihood solution, so we turn to the Baum-Welch (forward-backward) algorithm. Basic idea – count the expected visits to each state and the expected number of transitions, and use them to derive probability estimators.

19 Hidden Markov Model – model fitting Define the backward variable β_t(i) = P(o_{t+1}, ..., o_T | q_t = i, λ) – the probability of the remaining observations given state i at time t, computed recursively backward in time.

20 Hidden Markov Model – model fitting Define γ_t(i) = P(q_t = i | O, λ). Therefore, the probability of being in state i at time t, given the entire observation sequence and the model, is simply γ_t(i) = α_t(i) β_t(i) / P(O | λ).

21 Hidden Markov Model – model fitting Define ξ_t(i, j), the probability of being in state i at time t and state j at time t+1, given the model and the observation sequence: ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ).

22 Hidden Markov Model – model fitting We are now ready to introduce the parameter estimators. Transition probability estimator: a_ij = Σ_{t=1..T-1} ξ_t(i, j) / Σ_{t=1..T-1} γ_t(i) – the expected number of transitions from state i to state j, normalized by the expected number of visits to state i.

23 Hidden Markov Model – model fitting Observation probability estimator: b_j(k) = Σ_{t : o_t = k} γ_t(j) / Σ_{t=1..T} γ_t(j) – the expected number of times state j emits symbol k, normalized by the expected number of visits to state j.

24 Hidden Markov Model – model fitting Notice that the parameters we wish to estimate actually appear on both sides of the equations: γ and ξ are computed from the current parameters, while the new estimates of the parameters are computed from γ and ξ.

25 Hidden Markov Model – model fitting Since the parameters we wish to estimate appear on both sides of the equations, we use an iterative procedure: starting with an initial guess for the parameters, we update them at each iteration and terminate once the parameter changes fall below a certain threshold.
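One iteration of this procedure can be sketched for a discrete-emission HMM and a single observation sequence. The parameters below are the same illustrative assumptions used earlier; a full fit would repeat `baum_welch_step` until convergence:

```python
pi = [0.5, 0.5]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.7, 0.3], [0.2, 0.8]]

def forward_backward(obs, pi, A, B):
    """alpha[t][j] = P(o_1..o_t, q_t=j); beta[t][i] = P(o_{t+1}..o_T | q_t=i)."""
    n, T = len(pi), len(obs)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(n)]]
    for t in range(1, T):
        alpha.append([sum(alpha[-1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                      for j in range(n)])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta

def baum_welch_step(obs, pi, A, B):
    """One EM re-estimation step: gamma/xi expected counts -> new pi, A, B."""
    n, T = len(pi), len(obs)
    alpha, beta = forward_backward(obs, pi, A, B)
    likelihood = sum(alpha[-1])                     # P(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0]
    # Expected transitions i->j, normalized by expected visits to i
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    # Expected emissions of symbol k from state i, normalized by visits to i
    n_symbols = len(B[0])
    new_B = [[sum(g[i] for t, g in enumerate(gamma) if obs[t] == k) /
              sum(g[i] for g in gamma)
              for k in range(n_symbols)] for i in range(n)]
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step([1, 0, 1, 1, 0], pi, A, B)
```

Each update keeps the parameters valid probability distributions, and each iteration is guaranteed (by EM) not to decrease the likelihood of the training sequence.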

26 Hidden Markov Model – model fitting For continuous (Gaussian) emissions, we estimate the mean and variance for each state j as γ-weighted averages: μ_j = Σ_t γ_t(j) o_t / Σ_t γ_t(j) and σ²_j = Σ_t γ_t(j) (o_t − μ_j)² / Σ_t γ_t(j).
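The γ-weighted updates can be written compactly for scalar observations. This is an illustrative sketch with an invented toy example; in the hard-assignment case (each γ row is 0/1) the formulas reduce to ordinary per-state sample statistics:

```python
import numpy as np

def gaussian_reestimate(frames, gamma):
    """Gamma-weighted mean and variance per state (Gaussian-emission EM update).

    frames: (T,) observed scalar features; gamma: (T, N) state posteriors.
    """
    weights = gamma.sum(axis=0)                          # expected visits per state
    means = (gamma * frames[:, None]).sum(axis=0) / weights
    deviations = frames[:, None] - means[None, :]
    variances = (gamma * deviations ** 2).sum(axis=0) / weights
    return means, variances

# Toy check: hard assignments (state 0 gets the first two frames,
# state 1 the last two) recover the per-state sample mean and variance
frames = np.array([0.0, 2.0, 10.0, 12.0])
gamma = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
means, variances = gaussian_reestimate(frames, gamma)
```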

27 Conclusions and final remarks We learned how to: I. Estimate HMM parameters from a sequence of observations. II. Determine the probability of observing a sequence given an HMM. III. Determine the most likely sequence of states, given an HMM and a sequence of observations. Note that the states may represent words, syllables, phonemes, etc. – this is up to the system architect to decide. For example, words are more informative than syllables, but result in more states and less accurate probability estimation (curse of dimensionality).

28 Questions? Thank you!

