Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya.


1 Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya

2 Introduction
Two issues for n-gram
  – Generalizability & adaptability
Generalizability
  – Word class / parsing
  – Measure similarity in the continuous space
Adaptability
  – Larger parameter numbers for LM
  – Use continuous space to reduce parameter numbers

3 Approach
Word
  – Word vector of dimensions
  – New word vector of dimensions
History: concatenation of words
  – History vector: N-1 words, dimensions
  – New history vector
[Figure: word vectors u_h1, u_h2, …, u_hn-1 (labels M, N-1) and history vector y (label L)]

4 Approach (cont.)
Probability density for history y given the word w
Probability of word w given history y
  – Smoothed n-gram or smoothed clustered n-gram
*Exponents can be used to control the dynamic ranges of n-gram and Gaussian mixture probabilities
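The slide's equations did not survive in the transcript. As a rough illustration only, the step from the density of history y given word w to the probability of word w given history y can be sketched with Bayes' rule. The diagonal covariances, the use of a prior P(w), and all names below (`log_gmm_density`, `word_posteriors`, the toy words) are assumptions for this sketch, not the authors' formulation.

```python
import numpy as np

def log_gaussian_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian at y (assumed form)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def log_gmm_density(y, weights, means, variances):
    """Log p(y | w) for one word's Gaussian mixture, computed stably."""
    comps = [np.log(c) + log_gaussian_diag(y, m, v)
             for c, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)

def word_posteriors(y, gmms, priors):
    """Bayes' rule: P(w | y) is proportional to P(w) * p(y | w)."""
    words = list(gmms)
    log_joint = np.array([np.log(priors[w]) + log_gmm_density(y, *gmms[w])
                          for w in words])
    log_post = log_joint - np.logaddexp.reduce(log_joint)
    return dict(zip(words, np.exp(log_post)))

# Hypothetical toy models: one mixture per predicted word, 2-d history vectors.
gmms = {
    "cat": (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [1.0, 1.0]]), np.ones((2, 2))),
    "dog": (np.array([1.0]), np.array([[5.0, 5.0]]), np.ones((1, 2))),
}
priors = {"cat": 0.5, "dog": 0.5}
post = word_posteriors(np.array([0.2, 0.1]), gmms, priors)
```

Working in the log domain keeps the normalization stable when the mixture densities are very small, which is common for high-dimensional history vectors.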

5 Implementation
Word co-occurrence matrix E
  – Word i follows word j
  – SVD, 100 dimensions
To create a trigram
  – Two words are stacked to form a 200-d vector
LDA + MLLT
  – Reduce dimensionality to 50
GMM training
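The first two implementation steps can be sketched as follows, under stated assumptions: the co-occurrence counts are random toy data, the word vectors are taken as the left singular vectors scaled by the singular values (the slide does not say which scaling is used), and the dimensions are shrunk from 100 to 3 so the example runs instantly. The LDA + MLLT projection and GMM training are not shown. Function names are hypothetical.

```python
import numpy as np

def word_vectors_from_cooccurrence(E, dim):
    """Map each word (row of E) to a `dim`-dimensional vector via truncated SVD."""
    U, s, _ = np.linalg.svd(E, full_matrices=False)
    return U[:, :dim] * s[:dim]  # scaling by singular values is an assumption

def history_vector(vectors, history_ids):
    """Stack the vectors of the history words into one long vector."""
    return np.concatenate([vectors[i] for i in history_ids])

# Toy 6-word vocabulary; each entry of E counts one word following another.
rng = np.random.default_rng(0)
E = rng.poisson(1.0, size=(6, 6)).astype(float)

vecs = word_vectors_from_cooccurrence(E, dim=3)  # 100 on the slide
h = history_vector(vecs, [2, 5])                 # trigram: two history words
```

With 100-dimensional word vectors, stacking the two history words of a trigram gives the 200-d vector mentioned on the slide.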

6 Experimental results 5-best rescoring

7 A discriminative training framework using n-best speech recognition transcriptions and scores for spoken utterance classification Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alex Acero

8 Introduction
Conventionally, a two-phase approach is adopted for the SUC (spoken utterance classification) task
  – ASR transcription
  – Semantic classification
It has been reported that reductions in WER (word error rate) do not necessarily translate into reductions in CER (classification error rate)
A novel discriminative training framework for learning the language and classification models is proposed

9 DT framework Using the N-best Lists
As long as enough words are recognized to trigger the correct salient phrase, the correct meaning is assigned to the utterance
Using ME Classifier
Joint association score

10 DT framework Using the N-best Lists (cont.)
The sentence most likely to yield the correct class is first extracted from the N-best list, based on the joint association score
The remaining sentences in the N-best list are then assigned to classes
Assigning the sentences in the N-best list to classes is an effective mechanism for discriminating the sentence that is most likely to yield the correct class from those that are more likely to yield other, wrong classes
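The selection and assignment steps above might look roughly like the sketch below. The transcript does not give the formula for the joint association score, so the scores here are made-up numbers, and the rule that each remaining sentence goes to its highest-scoring competing class is an assumption of this sketch rather than the authors' stated procedure. The function name `assign_nbest` and the class labels are hypothetical.

```python
def assign_nbest(nbest, correct_class):
    """Pick the hypothesis that best supports the correct class, then
    assign every other hypothesis to its best competing class (assumed rule).

    nbest: list of {"text": str, "scores": {class_name: joint score}}
    """
    best_idx = max(range(len(nbest)),
                   key=lambda i: nbest[i]["scores"][correct_class])
    assignments = {}
    for i, hyp in enumerate(nbest):
        if i == best_idx:
            assignments[i] = correct_class
        else:
            competitors = {c: s for c, s in hyp["scores"].items()
                           if c != correct_class}
            assignments[i] = max(competitors, key=competitors.get)
    return best_idx, assignments

# Hypothetical 3-best list with made-up joint scores per class.
nbest = [
    {"text": "show me flights", "scores": {"flight": -1.0, "fare": -3.0, "time": -4.0}},
    {"text": "show me fares",   "scores": {"flight": -2.5, "fare": -1.2, "time": -5.0}},
    {"text": "show the times",  "scores": {"flight": -3.5, "fare": -4.0, "time": -1.5}},
]
best_idx, assignments = assign_nbest(nbest, correct_class="flight")
```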

11 DT framework Using the N-best Lists (cont.)
Discriminant function & loss function
Approximation loss

12 DT framework Using the N-best Lists (cont.)
Assignment of class

13 DT framework Using the N-best Lists (cont.)
DT of LM parameters
DT of classifier parameters

14 Experimental Results

15 Conclusions
A new discriminative training framework for spoken utterance classification was proposed
The use of N-best transcriptions is motivated by the fact that the same class is often associated with many variants of spoken utterances

