PHONE MODELING AND COMBINING DISCRIMINATIVE TRAINING FOR MANDARIN-ENGLISH BILINGUAL SPEECH RECOGNITION. Yanmin Qian, Jia Liu, ICASSP 2010. Presenter: Pei-Ning Chen, CSIE, NTNU
Outline
Introduction
Phone Clustering Modeling Approaches
  – Traditional phone clustering methods
  – New state-time-alignment clustering method
Bilingual Speech Recognition Combined Discriminative training
  – Review of two discriminative training methods
Experimental Results
Conclusion
Introduction Discriminative acoustic training techniques, such as discriminative model training (MPE) and discriminative feature training (fMPE), have led to significant accuracy improvements on many monolingual systems. A novel State-Time-Alignment (STA) method is proposed to balance the performance and the complexity of the bilingual speech recognition system, and it is compared with the acoustic-likelihood method.
Traditional phone clustering methods Current approaches to obtaining an accurate global phone inventory fall into two categories: knowledge-based and data-driven. The data-driven clustering method merges phones across Mandarin and English based on an acoustic likelihood measure.
New state-time-alignment clustering method They first roughly divide the phone inventory into several subsets based on pronunciation rules, and then constrain merging so that it happens only within the same phonetic subset.
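The subset constraint can be sketched in a few lines of Python. This is only an illustration: the function name `candidate_pairs`, the subset labels, and the phone symbols are hypothetical, and the slides do not say how the subsets are actually stored.

```python
from itertools import product


def candidate_pairs(mandarin_subsets, english_subsets):
    """List cross-language phone pairs that are allowed to merge.

    Both arguments map a phonetic subset name (hypothetical labels such as
    "nasal") to the phones of one language in that subset. A Mandarin phone
    is only paired with English phones from the subset of the same name.
    """
    pairs = []
    for subset_name, mandarin_phones in mandarin_subsets.items():
        english_phones = english_subsets.get(subset_name, [])
        pairs.extend(product(mandarin_phones, english_phones))
    return pairs


# Illustrative subsets; the phone symbols are made up for the example.
mandarin = {"nasal": ["m", "n"], "vowel": ["a"]}
english = {"nasal": ["M"], "vowel": ["AA", "IY"]}
pairs = candidate_pairs(mandarin, english)
```

Restricting the candidates this way keeps the later distance computation from ever comparing, say, a nasal with a vowel.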
New STA clustering method (cont.) Taking a Mandarin phone model i and an English phone model j as an example:
New STA clustering method (cont.) Define a count as the number of times t at which model i has its state m active and model j has its state n active. Calculate the distance between state m of model i and state n of model j in sub-segment q using the Bhattacharyya distance measure.
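A minimal Python sketch of these two quantities follows. It rests on assumptions not stated in the slides: both models are aligned against the same frames, each alignment is a list of active state indices, and each state is a single diagonal-covariance Gaussian (real systems typically use mixtures). The function names are hypothetical.

```python
import math
from collections import Counter


def cooccurrence_counts(align_i, align_j):
    """Count the frames t where model i is in state m and model j in state n.

    Each alignment is a list with one active state index per frame.
    Returns a Counter keyed by (m, n).
    """
    if len(align_i) != len(align_j):
        raise ValueError("both alignments must cover the same frames")
    return Counter(zip(align_i, align_j))


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                        # averaged variance
        dist += 0.125 * (m1 - m2) ** 2 / v         # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return dist


# Toy alignments over four frames.
counts = cooccurrence_counts([0, 0, 1, 2], [0, 1, 1, 2])
```

The distance is zero for identical Gaussians and grows both with the separation of the means and with the mismatch between the variances.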
New STA clustering method (cont.) The distance between model i and model j is then obtained as the weighted sum of the sub-distances. The two models with the minimum distance are merged, and the parameters of the new model are updated.
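The weighting and update formulas are not preserved in this transcript, so the sketch below uses stand-ins and should not be read as the authors' equations. It assumes the state co-occurrence counts serve as the weights, normalizes by the total count, and merges two single diagonal Gaussians by occupancy-weighted moment matching. All function names are hypothetical.

```python
def model_distance(counts, sub_distances):
    """Weighted sum of state-pair sub-distances (assumed count weighting).

    counts        -- dict mapping a state pair (m, n) to its co-occurrence count
    sub_distances -- dict mapping the same pairs to their sub-distance
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no co-occurring frames")
    return sum(c * sub_distances[pair] for pair, c in counts.items()) / total


def merge_gaussians(occ1, mean1, var1, occ2, mean2, var2):
    """Merge two diagonal Gaussians by moment matching (assumed update rule).

    occ1 and occ2 are occupancy counts used as merge weights.
    """
    total = occ1 + occ2
    mean, var = [], []
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        m = (occ1 * m1 + occ2 * m2) / total
        # Second moment of the two-component mixture, minus the squared mean.
        second = (occ1 * (v1 + m1 ** 2) + occ2 * (v2 + m2 ** 2)) / total
        mean.append(m)
        var.append(second - m ** 2)
    return mean, var


distance = model_distance({(0, 0): 1, (1, 1): 3}, {(0, 0): 2.0, (1, 1): 4.0})
merged_mean, merged_var = merge_gaussians(1.0, [0.0], [1.0], 1.0, [2.0], [1.0])
```

Under these assumptions, state pairs that are rarely active together contribute little to the model distance, and the merged Gaussian keeps the overall mean and variance of the two originals.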
Review of two discriminative training methods MPE: the model parameters are trained to optimize the MPE objective function. fMPE: a high-dimensional feature transformation is trained, and the original feature vector for frame t is transformed by it.
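For reference, the forms of these two expressions commonly given in the MPE and fMPE literature are sketched below. The symbols are assumptions and may differ from the notation on the original slide: O_r is the r-th training utterance, s_r its reference transcription, s a competing hypothesis, A(s, s_r) the raw phone accuracy of s against s_r, kappa an acoustic scaling factor, x_t the original feature vector at frame t, h_t a high-dimensional vector of Gaussian posteriors, and M the projection matrix being trained.

```latex
% MPE objective: expected phone accuracy over R training utterances
\[
\mathcal{F}_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R}
    \frac{\sum_{s} p_{\lambda}(O_r \mid s)^{\kappa}\, P(s)\, A(s, s_r)}
         {\sum_{s'} p_{\lambda}(O_r \mid s')^{\kappa}\, P(s')}
\]

% fMPE feature transformation applied to each frame
\[
y_t = x_t + M h_t
\]
```

In fMPE the matrix M is trained with the same MPE objective, so the two methods can be combined by first training the feature transform and then training the model parameters on the transformed features.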
Discriminative training methods in bilingual system They trained a pure Chinese language model and a pure English language model to apply to the Chinese and English numerator lattices, respectively, and a mixed language model to apply to the denominator lattices. All of these language models are trained on the transcriptions of the training data.
Conclusion They investigated phone clustering methods and proposed a novel method, STA, to obtain a more effective, compact and accurate bilingual phone inventory. Experimental results showed that STA outperforms existing approaches. The contributions from the discriminative training methods are quite significant, and the combined approach (fMPE+MPE) obtains the largest improvement.