Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton.

Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition
Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton St., Cambridge
Reporter: Chang Chih Hao

Introduction
LDA and HLDA
–Better classification accuracy
–Some common limitations
  None of them assumes any prior knowledge of confusable hypotheses
  Their objective functions do not directly relate to the word error rate (WER)
Minimum Phoneme Error (MPE)
–Minimizes phoneme errors in lattice-based training frameworks
–Because this criterion is closely related to WER, MPE-HLDA tends to be more robust than other projection methods, which makes it potentially better suited to a wider variety of features.

MPE Objective Function
MPE-HLDA model
MPE-HLDA aims at minimizing the expected number of phoneme errors introduced by the MPE-HLDA model in a given hypothesis lattice, or equivalently maximizing the function
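The idea of "expected phoneme accuracy" can be illustrated with a minimal sketch. It assumes a toy list of hypotheses in place of a real lattice, each carrying a log score and a raw phoneme accuracy; the function name and the data layout are hypothetical and are not taken from the paper.

```python
import math


def expected_phoneme_accuracy(hypotheses):
    """Posterior-weighted average of raw phoneme accuracy.

    hypotheses: list of (log_score, raw_phoneme_accuracy) pairs.
    This toy list stands in for a hypothesis lattice (an assumption).
    """
    # Subtract the maximum log score before exponentiating for numerical stability.
    max_log = max(score for score, _ in hypotheses)
    weights = [math.exp(score - max_log) for score, _ in hypotheses]
    total = sum(weights)
    # Each hypothesis contributes its accuracy weighted by its posterior.
    return sum(w / total * acc for w, (_, acc) in zip(weights, hypotheses))


# Two equally likely hypotheses with accuracies 5 and 3.
toy_value = expected_phoneme_accuracy([(-10.0, 5.0), (-10.0, 3.0)])
```

Raising the posterior of the more accurate hypothesis raises this value, which is the quantity an MPE-style criterion tries to maximize.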

MPE Objective Function

MPE Objective Function
It can be shown that the derivative of (4) with respect to A is

MPE Objective Function

MPE Objective Function
Therefore, Eq. (6) can be rewritten as
(matrix dimensions noted on the slide: 39×39 and 39×162)

MPE-HLDA Implementation
In theory, the derivative of the MPE-HLDA objective function can be computed based on Eq. (12), via a single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory.
Two steps
–First, run a forward-backward pass over the training lattices to accumulate statistics
–Second, use these statistics together with the full covariance matrices to synthesize the derivative.
The paper used gradient descent to update the projection matrix.
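The update loop itself can be sketched as follows. This is only an illustration of a gradient-based update of a projection matrix: the gradient function here is a stand-in toy quadratic, not the MPE-HLDA derivative, and the names update_projection, toy_gradient, and target are hypothetical. Because the objective is maximized, the step follows the gradient, which is the same as gradient descent on the negated objective.

```python
import numpy as np


def update_projection(A, grad_fn, learning_rate=0.1, num_iters=100):
    """Repeatedly step the projection matrix A along the objective's gradient.

    grad_fn(A) must return dF/dA with the same shape as A. In MPE-HLDA this
    derivative would be synthesized from accumulated lattice statistics;
    here it is supplied by the caller (an assumption of this sketch).
    """
    A = A.copy()
    for _ in range(num_iters):
        A += learning_rate * grad_fn(A)
    return A


# Stand-in objective F(A) = -||A - target||^2, maximized at A = target.
target = np.eye(2, 3)


def toy_gradient(A):
    return -2.0 * (A - target)


A_final = update_projection(np.zeros((2, 3)), toy_gradient)
```

With this toy objective the distance to the maximum shrinks by a constant factor at every step, so A_final ends up very close to target.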

MPE-HLDA Implementation

Experimental Framework
Global feature projection: an l×1 input vector is mapped by L (n×l) and then A (p×n) to a p×1 output vector
–There is more useful information in longer contexts
–Reduces the computational cost

Experimentation
Conversational Telephone Speech (CTS)
–2300 hours of training data
  800 hours: training the initial ML model
  1500 hours: held-out training data
–Lattice generation
–Discriminative training
–MPE-HLDA: only 370 hours
–Testing sets
  Eval03
  Dev04

Experimentation
Conversational Telephone Speech (CTS)
–Features
  Frame-concatenated PLP cepstra
–15 frames, l = 225, n = 130, p = 60
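The dimensions above can be checked with a short sketch. It assumes the shapes given on the Experimental Framework slide (L is n×l, A is p×n) and infers 15 coefficients per frame from 225 / 15; the random matrices are placeholders for illustration, not trained projections.

```python
import numpy as np

NUM_FRAMES = 15
l, n, p = 225, 130, 60
dims_per_frame = l // NUM_FRAMES  # 225 / 15 = 15 coefficients per frame (inferred)

rng = np.random.default_rng(0)

# Concatenate 15 consecutive frames into one l x 1 input vector.
frames = rng.standard_normal((NUM_FRAMES, dims_per_frame))
x = frames.reshape(l, 1)

# Placeholder projections with the shapes from the slides.
L = rng.standard_normal((n, l))  # first projection, n x l
A = rng.standard_normal((p, n))  # second projection, p x n

# Apply L first, then A, giving a p x 1 feature vector.
y = A @ (L @ x)
```

Since both steps are linear, the two matrices can also be multiplied once into a single p×l matrix and applied directly to x.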

Experimentation

Broadcast News (BN)
–600 hours: training the initial model (Hub4 and TDT)
–330 hours: held-out data

Thanks
