Gaussian Mixture Language Models for Speech Recognition Mohamed Afify, Olivier Siohan and Ruhi Sarikaya.

Introduction
Two issues for n-gram
– Generalizability & adaptability
Generalizability
– Word class / parsing
– Measure similarity in the continuous space
Adaptability
– Larger parameter numbers for LM
– Use continuous space to reduce parameter numbers

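Approach
Word
– Word vector of dimensions
– New word vector of dimensions
History: concatenation of words
– History vector: N-1 words, dimensions
– New history vector
[Figure: the N-1 history word vectors u_h1, u_h2, …, u_h(n-1), each labeled M, are concatenated and mapped to a history vector y labeled L]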

Approach (cont.)
Probability density for history y given the word w
Probability of word w given history y
– Smoothed n-gram or smoothed clustered n-gram
* Exponents can be used to control the dynamic ranges of the n-gram and Gaussian mixture probabilities
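The slide's formulas are not preserved in this transcript, so the following is only a minimal sketch of the two quantities it names: a Gaussian mixture density p(y | w) for the history vector, and the word probability P(w | y) obtained from it by Bayes' rule. Diagonal covariances, the use of a plain prior P(w), the exponent names `alpha` and `beta`, and the toy data are all assumptions made for illustration.

```python
import math

def log_gaussian_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian at point y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(y, mean, var)
    )

def log_gmm(y, components):
    """Log mixture density; components is a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(y, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def word_posteriors(y, gmms, priors, alpha=1.0, beta=1.0):
    """P(w | y) by Bayes' rule, proportional to p(y | w)^alpha * P(w)^beta.

    alpha and beta are hypothetical exponents that rescale the dynamic
    ranges of the Gaussian mixture and prior terms.
    """
    scores = {
        w: alpha * log_gmm(y, gmms[w]) + beta * math.log(priors[w])
        for w in gmms
    }
    top = max(scores.values())
    norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {w: math.exp(s - norm) for w, s in scores.items()}

# Toy setup (hypothetical): two words, 2-d history vectors.
gmms = {
    "cat": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "dog": [(0.5, [3.0, 3.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
}
priors = {"cat": 0.5, "dog": 0.5}
posteriors = word_posteriors([0.2, -0.1], gmms, priors)
```

Working in the log domain keeps the normalization stable when the mixture densities differ by many orders of magnitude, which is the same concern the exponents on the slide address.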

Implementation
Word co-occurrence matrix E
– Word i follows word j
– SVD, 100 dimensions
To create a trigram
– Two words are stacked to form a 200-d vector
LDA + MLLT
– Reduce dimensionality to 50
GMM training
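The first and third steps above can be sketched in a few lines. This is an illustration only: it counts how often word i follows word j and stacks the vectors of two history words, but it skips the SVD and LDA + MLLT projections, using raw rows of E as stand-in word vectors. The function names and the toy corpus are hypothetical.

```python
def cooccurrence_matrix(tokens, vocab):
    """Build E, where E[i][j] counts how often word i directly follows word j."""
    index = {w: k for k, w in enumerate(vocab)}
    E = [[0] * len(vocab) for _ in vocab]
    for prev, cur in zip(tokens, tokens[1:]):
        E[index[cur]][index[prev]] += 1
    return E

def history_vector(history, embeddings):
    """Stack the vectors of the history words into one long vector."""
    stacked = []
    for w in history:
        stacked.extend(embeddings[w])
    return stacked

# Toy corpus (hypothetical).
tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
E = cooccurrence_matrix(tokens, vocab)

# Stand-in word vectors: raw rows of E. In the slides these would instead be
# 100-d SVD projections, so two stacked words give a 200-d history vector.
embeddings = {w: E[k] for k, w in enumerate(vocab)}
y = history_vector(["the", "cat"], embeddings)
```

For a trigram the history has two words, so the stacked vector is twice the word-vector length; here that is 12 rather than 200 because the toy vocabulary has six words.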

Experimental results 5-best rescoring

A discriminative training framework using n-best speech recognition transcriptions and scores for spoken utterance classification Sibel Yaman, Li Deng, Dong Yu, Ye-Yi Wang, Alex Acero

Introduction
Conventionally, a two-phase approach is adopted for the SUC (spoken utterance classification) task
– ASR transcription
– Semantic classification
It has been reported that reductions in WER (word error rate) do not necessarily translate into reductions in CER (classification error rate)
A novel discriminative training framework for learning the language and classification models is proposed

DT framework Using the N-best Lists
As long as enough words are recognized to trigger the correct salient phrase, the correct meaning is assigned to the utterance
Using ME Classifier
Joint association score

DT framework Using the N-best Lists (cont.)
The sentence most likely to yield the correct class is first extracted from the N-best list, based on the joint association score
The remaining sentences in the N-best list are then assigned to classes
Assigning the sentences in the N-best list to classes is an effective mechanism for discriminating the sentence that is most likely to yield the correct class from those that are more likely to yield wrong classes
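The selection step can be sketched as follows. The transcript does not preserve the definition of the joint association score, so this sketch assumes it is the sum of an ASR log score and the classifier's log posterior for a class. It also assumes that each remaining sentence goes to the competing class it scores highest for. Both rules, the function names, and the toy N-best list are illustrative assumptions.

```python
import math

def joint_score(entry, cls):
    """Assumed joint score: ASR log score plus classifier log posterior."""
    _, asr_log_score, class_log_post = entry
    return asr_log_score + class_log_post[cls]

def split_nbest(nbest, correct_class):
    """Pick the sentence best matching the correct class, then assign the
    remaining sentences to competing classes (assumed rule: best wrong class)."""
    best = max(nbest, key=lambda e: joint_score(e, correct_class))
    assignments = {}
    for entry in nbest:
        if entry is best:
            continue
        wrong = [c for c in entry[2] if c != correct_class]
        assignments[entry[0]] = max(wrong, key=lambda c: joint_score(entry, c))
    return best, assignments

def log_post(**probs):
    """Helper turning class probabilities into log posteriors."""
    return {c: math.log(p) for c, p in probs.items()}

# Toy 3-best list (hypothetical): (sentence, ASR log score, class log posteriors).
nbest = [
    ("book a flight to boston", -10.0, log_post(flight=0.8, hotel=0.1, car=0.1)),
    ("book a night in boston", -9.5, log_post(flight=0.2, hotel=0.7, car=0.1)),
    ("look a flight to austin", -12.0, log_post(flight=0.7, hotel=0.1, car=0.2)),
]
best, assignments = split_nbest(nbest, "flight")
```

Note that the second hypothesis has the best ASR score but is not selected, because its classifier posterior for the correct class is low. This is the sense in which the joint score, rather than recognition quality alone, drives the selection.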

DT framework Using the N-best Lists (cont.)
Discriminant function & loss function
Approximation loss
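The slide's own discriminant and loss formulas are not preserved in this transcript. As a rough illustration of what a smooth approximation to the classification error can look like, the sketch below uses a formulation common in minimum classification error training: a misclassification measure that compares the correct class score against a soft maximum over the competitors, passed through a sigmoid. This is an assumption, not necessarily the authors' definition; `eta`, `gamma`, and the toy scores are hypothetical.

```python
import math

def misclassification_measure(scores, correct, eta=1.0):
    """Soft maximum of competitor scores minus the correct class score.

    Negative values mean the correct class wins; positive values mean an error.
    """
    competitors = [s for c, s in scores.items() if c != correct]
    top = max(competitors)
    mean_exp = sum(math.exp(eta * (s - top)) for s in competitors) / len(competitors)
    soft_max = top + math.log(mean_exp) / eta
    return soft_max - scores[correct]

def sigmoid_loss(d, gamma=1.0):
    """Differentiable approximation of the 0-1 loss on the measure d."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Toy discriminant scores per class (hypothetical).
scores = {"flight": 2.0, "hotel": 0.0, "car": -1.0}
loss_when_correct = sigmoid_loss(misclassification_measure(scores, "flight"))
loss_when_wrong = sigmoid_loss(misclassification_measure(scores, "car"))
```

Because the sigmoid is differentiable, a loss of this shape can be minimized by gradient descent with respect to both language model and classifier parameters, unlike the raw error count.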

DT framework Using the N-best Lists (cont.)
Assignment of class

DT framework Using the N-best Lists (cont.)
DT of LM parameters
DT of classifier parameters

Experimental Results

Conclusions
A new discriminative training framework for spoken utterance classification was proposed
The use of N-best transcriptions is motivated by the fact that the same class is often associated with many variants of spoken utterances
