AN EXPECTATION MAXIMIZATION APPROACH FOR FORMANT TRACKING USING A PARAMETER-FREE NON-LINEAR PREDICTOR Issam Bazzi, Alex Acero, and Li Deng Microsoft Research.


1 AN EXPECTATION MAXIMIZATION APPROACH FOR FORMANT TRACKING USING A PARAMETER-FREE NON-LINEAR PREDICTOR Issam Bazzi, Alex Acero, and Li Deng Microsoft Research One Microsoft Way Redmond, WA, USA 2003

2 Outline
Ts'ai, Chung-Ming, Speech Lab, NTUST, 2007
- Introduction
- The Model
- EM Training
- Formant Tracking
- Experiment Results
- Conclusion

3 Introduction
- Traditional methods use LPC analysis or matching against stored templates of spectral cross sections
- In either case, formant tracking is error-prone because there are too few candidates or templates
- This paper uses a predictor codebook of MFCCs to represent formant relationships
- The method also explores the complete formant space, avoiding the premature elimination of candidates that occurs in LPC or template matching

4 The Model
o_t = F(x_t) + r_t
- o_t is the observed MFCC vector
- x_t is the vocal tract resonances (VTRs) and their corresponding bandwidths
- F(x_t) is the MFCC vector predicted from the quantized formant frequencies and bandwidths; the set of these predictions is called the predictor codebook
- r_t is the residual signal

5 Constructing F(x)
- All-pole model
- Assume there are I formants: x = (F_1, B_1, F_2, B_2, ..., F_I, B_I)
- Then use the z-transform to get the transfer function H(z)
- Finally, each quantized VTR x can be transformed into an MFCC vector F(x)
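
The expression for H(z) did not survive in this transcript. A minimal sketch, assuming the common form in which each formant (F_i, B_i) contributes one second-order resonator with pole radius exp(-pi * B_i / fs) and pole angle 2 * pi * F_i / fs, is given below. The function name `all_pole_response` and the sampling rate `fs` are assumptions made for illustration, and the step from this spectrum to MFCCs (mel filterbank, log, DCT) is not shown.

```python
import cmath
import math


def all_pole_response(formants, freq_hz, fs=16000.0, gain=1.0):
    """Magnitude |H(e^{jw})| of a cascade of second-order resonators.

    formants: list of (F_i, B_i) pairs in Hz (centre frequency, bandwidth).
    freq_hz:  frequency at which to evaluate the response.
    fs:       sampling rate in Hz (an assumed value, not from the slides).
    """
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^{-1} on the unit circle
    h = complex(gain)
    for f_i, b_i in formants:
        radius = math.exp(-math.pi * b_i / fs)   # pole radius from bandwidth
        theta = 2.0 * math.pi * f_i / fs         # pole angle from frequency
        denom = (1.0
                 - 2.0 * radius * math.cos(theta) * z_inv
                 + radius * radius * z_inv * z_inv)
        h /= denom
    return abs(h)


# Hypothetical three-formant configuration, for illustration only.
example_x = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
peak = all_pole_response(example_x, 500.0, fs=8000.0)
valley = all_pole_response(example_x, 1000.0, fs=8000.0)
```

Under this form the response peaks near each F_i and dips between formants, which is the spectral shape the predictor F(x) is meant to encode.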

6 EM Training (1/2)
- Use a single Gaussian to model r_t
- The utterance has T frames; θ denotes the parameters (mean and covariance) of the Gaussian
- Assume the formant values x are uniformly distributed and can take any of C quantized values
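
Under the assumptions on this slide, the per-frame likelihood and the utterance log-likelihood can be written out as follows. This is a sketch derived from the stated model, not a transcription of the slide's equations. The symbols \mu and \Sigma (mean and covariance of r_t) and x[i] (the i-th codebook entry) are notation introduced here, and frames are assumed independent.

```latex
p(o_t \mid x_t = x[i], \theta) = \mathcal{N}\big(o_t;\ F(x[i]) + \mu,\ \Sigma\big),
\qquad \theta = (\mu, \Sigma)

\log p(o_1, \dots, o_T \mid \theta)
  = \sum_{t=1}^{T} \log \left[ \frac{1}{C} \sum_{i=1}^{C}
      \mathcal{N}\big(o_t;\ F(x[i]) + \mu,\ \Sigma\big) \right]
```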

7 EM Training (2/2)
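
The update equations on this slide are not preserved in the transcript. The sketch below shows what one EM iteration could look like for the model o_t = F(x_t) + r_t with a uniform prior over the C codebook entries. It assumes a diagonal covariance for simplicity (the slides only say "mean and covariance"), and the names `posteriors`, `em_step`, and `var_floor` are hypothetical.

```python
import math


def log_gauss_diag(v, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector v."""
    return sum(-0.5 * (math.log(2.0 * math.pi * s) + (a - m) ** 2 / s)
               for a, m, s in zip(v, mean, var))


def posteriors(o_t, codebook, mean, var):
    """E-step for one frame: P(x = x[i] | o_t) under a uniform prior on x."""
    logs = [log_gauss_diag([o - f for o, f in zip(o_t, fx)], mean, var)
            for fx in codebook]
    top = max(logs)                      # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def em_step(frames, codebook, mean, var, var_floor=1e-6):
    """One EM iteration; returns the re-estimated residual mean and variance."""
    dim, n_frames = len(mean), len(frames)
    gammas = [posteriors(o, codebook, mean, var) for o in frames]
    # M-step, mean: posterior-weighted average of the residuals o_t - F(x[i]).
    new_mean = [0.0] * dim
    for o, gamma in zip(frames, gammas):
        for g, fx in zip(gamma, codebook):
            for d in range(dim):
                new_mean[d] += g * (o[d] - fx[d])
    new_mean = [m / n_frames for m in new_mean]
    # M-step, variance: posterior-weighted spread around the new mean.
    new_var = [0.0] * dim
    for o, gamma in zip(frames, gammas):
        for g, fx in zip(gamma, codebook):
            for d in range(dim):
                new_var[d] += g * (o[d] - fx[d] - new_mean[d]) ** 2
    new_var = [max(v / n_frames, var_floor) for v in new_var]
    return new_mean, new_var
```

Here `codebook` holds the precomputed predictions F(x[i]), one list of floats per entry. Iterating `em_step` until the parameters stop changing gives the trained residual model. With a codebook as large as the one in the experiments, a practical implementation would need vectorized arithmetic rather than nested Python loops.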

8 Formant Tracking (1/2)
Frame-by-Frame Tracking
- Formants in each frame are estimated independently
- Maximum a posteriori (MAP)
- Minimum mean squared error (MMSE)
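
The difference between the two frame-level estimators can be illustrated with a short sketch. It assumes the per-frame posterior over codebook entries has already been computed; the function names and the toy values are hypothetical.

```python
def map_estimate(codebook_x, gamma):
    """MAP: return the single codebook entry with the highest posterior."""
    best = max(range(len(gamma)), key=gamma.__getitem__)
    return codebook_x[best]


def mmse_estimate(codebook_x, gamma):
    """MMSE: return the posterior-weighted average of all codebook entries."""
    dim = len(codebook_x[0])
    return tuple(sum(g * x[d] for g, x in zip(gamma, codebook_x))
                 for d in range(dim))


# Toy example: two candidate (F1, F2) pairs in Hz and their posteriors.
candidates = [(500.0, 1500.0), (600.0, 1700.0)]
posterior = [0.25, 0.75]
print(map_estimate(candidates, posterior))   # (600.0, 1700.0)
print(mmse_estimate(candidates, posterior))  # (575.0, 1650.0)
```

MAP always returns a value that is on the quantization grid, while MMSE can fall between grid points.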

9 Formant Tracking (2/2)
Tracking with Continuity Constraints
- First-order state model: x_t = x_{t-1} + w_t
- w_t is modeled as a zero-mean Gaussian with diagonal covariance Σ_w
- The MAP estimate can be computed with a Viterbi search
- The MMSE estimate is much more complex; the paper obtains it with an approximate method, which is not described in detail here
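
A minimal sketch of the Viterbi search for the MAP track under the first-order state model follows. It assumes the per-frame emission log-likelihoods log p(o_t | x[i]) are already available and that the initial state is uniformly distributed; `viterbi_track` and its arguments are hypothetical names.

```python
import math


def log_transition(x_prev, x_cur, var_w):
    """log N(x_cur - x_prev; 0, diag(var_w)) for the model x_t = x_{t-1} + w_t."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (c - p) ** 2 / v)
               for p, c, v in zip(x_prev, x_cur, var_w))


def viterbi_track(states, emission_logp, var_w):
    """Return the most likely sequence of state indices.

    states:        list of quantized state vectors x[i]
    emission_logp: emission_logp[t][i] = log p(o_t | x[i])
    var_w:         diagonal of the covariance of w_t
    """
    n = len(states)
    score = list(emission_logp[0])       # uniform prior adds only a constant
    backpointers = []
    for frame in emission_logp[1:]:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor i for landing in state j at this frame.
            cand = [score[i] + log_transition(states[i], states[j], var_w)
                    for i in range(n)]
            best_i = max(range(n), key=cand.__getitem__)
            pointers.append(best_i)
            new_score.append(cand[best_i] + frame[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(range(n), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

This search costs on the order of C squared operations per frame, so with a large codebook some form of pruning would be needed in practice. The slides do not say how the paper handles this.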

10 Experiment Settings
- Track 3 formants
  - Frequencies are first mapped to the mel scale, then uniformly quantized
  - Bandwidths are simply uniformly quantized
  - F1 < F2 < F3, giving 767500 entries in the codebook in total
- Gain = 1
- MFCCs are 12-dimensional, without C0
- 20 utterances from one male speaker are used for EM
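
The frequency quantization can be sketched as follows. The slides do not state which mel formula, frequency ranges, or level counts were used, so the 2595 * log10(1 + f / 700) mapping and the numbers in the example are assumptions for illustration only.

```python
import math


def hz_to_mel(f_hz):
    """A common mel-scale mapping (assumed; the slides do not specify one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_uniform_levels(f_lo, f_hi, n_levels):
    """Quantization levels in Hz that are equally spaced on the mel scale."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n_levels - 1)
    return [mel_to_hz(m_lo + k * step) for k in range(n_levels)]


# Hypothetical range and level count for one formant.
levels = mel_uniform_levels(200.0, 900.0, 5)
```

Levels produced this way are denser at low frequencies and sparser at high frequencies when measured in Hz.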

11 Experiment Results, “they were what”

12 Experiment Results, with bandwidth

13 Experiment Results, residual

14 Conclusion
- The method is fully unsupervised and needs no labeling
- Works well in unvoiced frames
- No gross errors
- May be applied to speech recognition systems

