HIWIRE MEETING Nancy, July 6-7, 2006 José C. Segura, Ángel de la Torre.


1 HIWIRE MEETING Nancy, July 6-7, 2006 José C. Segura, Ángel de la Torre

2 HIWIRE Meeting – Nancy, 6-7 June, 2006
Schedule
• Non-linear feature normalization for mobile platform
  - Integration scheme
  - Results and discussion
• Rapid speaker adaptation
  - Combination of adaptation at signal level and acoustic model level
  - Results and discussion
• Assessment of two non-linear techniques for feature normalization
  - Non-linear parametric equalization
  - Model based feature compensation (VTS)
• New improvements in robust VAD
  - Model based VAD


5 Non-linear Parametric Equalization
• Feature normalization
• Motivation of PEQ:
  - Limitation of linear methods:
    - Cepstral Mean Normalization
    - Cepstral Mean and Variance Normalization
  - Limitation of non-linear methods (HEQ, OSEQ):
    - Speech/non-speech ratio
    - Estimation problems
• Parametric Equalization (PEQ):
  - Two-Gaussian model (speech / non-speech)
  - Training of clean Gaussians; estimation of noisy Gaussians
  - Non-linear transformation: combination of two linear transformations (one for speech, one for non-speech)
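The two-Gaussian idea on this slide can be illustrated with a minimal sketch. It assumes a single scalar feature, that each class is mapped by a shift-and-scale onto its clean Gaussian, and that the two linear maps are blended by the class posteriors given the observation. The function names, the prior `p_speech`, and all statistics below are hypothetical and are not taken from the presentation.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of a one-dimensional Gaussian evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def peq_transform(y, noisy, clean, p_speech=0.5):
    """Equalize one feature value with a two-class (non-speech / speech) model.

    noisy, clean: dicts mapping "nonspeech" and "speech" to (mean, std).
    Each class has its own linear map onto the clean Gaussian; blending the
    two maps with the class posteriors makes the overall map non-linear.
    """
    priors = {"nonspeech": 1.0 - p_speech, "speech": p_speech}
    weights = {c: priors[c] * gaussian_pdf(y, *noisy[c]) for c in priors}
    total = sum(weights.values())
    if total == 0.0:
        # Both densities underflowed; fall back to the priors.
        weights, total = priors, 1.0
    x_hat = 0.0
    for c in priors:
        (mu_y, sd_y), (mu_x, sd_x) = noisy[c], clean[c]
        linear = mu_x + (y - mu_y) * sd_x / sd_y  # per-class linear map
        x_hat += (weights[c] / total) * linear
    return x_hat

# Hypothetical statistics for one log-energy-like feature.
CLEAN_STATS = {"nonspeech": (2.0, 1.0), "speech": (10.0, 2.0)}
NOISY_STATS = {"nonspeech": (6.0, 0.5), "speech": (11.0, 1.5)}

print(peq_transform(6.0, NOISY_STATS, CLEAN_STATS))   # near the clean non-speech mean
print(peq_transform(11.0, NOISY_STATS, CLEAN_STATS))  # near the clean speech mean
```

In this sketch only four means and four standard deviations are estimated per feature, which is one way to read the slide's point about avoiding the estimation problems of full histogram-based methods.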

6 Non-linear Parametric Equalization
• Aurora-2 results:

             Aver. WER   Relative improv.
  BASELINE   34.1 %      0.0 %
  OSEQ       17.5 %      48.6 %
  PEQ        18.6 %      45.3 %

• Aurora-4 results:

             Aver. WER   Relative improv.
  BASELINE   45.6 %      0.0 %
  OSEQ       37.5 %      17.8 %
  PEQ        31.5 %      30.1 %

7 Non-linear Parametric Equalization
• Additional problem of non-linear transformations:
  - Once the transformation is estimated, it is an “instantaneous transformation”
  - Temporal correlations are not exploited
• Temporal Smoothing (TES):
  - Each equalized cepstrum is time-filtered with an ARMA filter that restores the autocorrelation of clean data
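As a rough illustration of the time-filtering step, the sketch below applies a generic ARMA difference equation to one cepstral trajectory. The slide does not give the filter order or how the coefficients are designed to match the clean-data autocorrelation, so the function name and the coefficients `B` and `A` here are purely hypothetical.

```python
def arma_filter(track, b, a):
    """Filter one cepstral trajectory with an ARMA difference equation:

        a[0]*y[t] = sum_i b[i]*x[t-i] - sum_{j>=1} a[j]*y[t-j]

    Samples before the start of the trajectory are treated as zero.
    """
    out = []
    for t in range(len(track)):
        acc = sum(b[i] * track[t - i] for i in range(len(b)) if t - i >= 0)
        acc -= sum(a[j] * out[t - j] for j in range(1, len(a)) if t - j >= 0)
        out.append(acc / a[0])
    return out

# Hypothetical coefficients: a one-pole smoother with unit DC gain.
B, A = (0.5,), (1.0, -0.5)
equalized_c1 = [1.0, 1.0, 1.0]  # made-up equalized cepstral values
print(arma_filter(equalized_c1, B, A))
```

In an actual TES setup the coefficients would presumably be fitted per cepstral coefficient from clean training data; that design step is not shown here.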

8 Non-linear Parametric Equalization
• Aurora-2 results:

                                      TES
             Aver. WER   Improv.     Aver. WER   Improv.
  BASELINE   34.1 %      0.0 %       31.6 %      6.5 %
  OSEQ       17.5 %      48.6 %      15.5 %      54.3 %
  PEQ        18.6 %      45.3 %      -           -

• Aurora-4 results:

                                      TES
             Aver. WER   Improv.     Aver. WER   Improv.
  BASELINE   45.6 %      0.0 %       43.4 %      4.9 %
  OSEQ       37.5 %      17.8 %      35.5 %      22.2 %
  PEQ        31.5 %      30.1 %      30.7 %      32.6 %

9 Model Based Feature Compensation (VTS)
• VTS feature normalization:
  - Performed in the log-FBE domain (prior to the DCT)
  - Based on a Gaussian mixture model trained with clean speech
  - Allows feature compensation and uncertainty estimation
• Summary of VTS (vector Taylor series approach):
  1. Given the noisy conditions, VTS provides a noisy Gaussian from each clean Gaussian
  2. The noisy Gaussian mixture model allows the computation of the probabilities P(k|y)
  3. An estimate of the clean speech x is then possible
  4. An estimate of the uncertainty is also possible

10 Model Based Feature Compensation (VTS)
• Step 1: Estimation of a noisy Gaussian from a clean Gaussian:
  [equations not preserved in the transcript]
  where the functions g_0, f_0 and h_0 are evaluated at the mean of the clean Gaussian and at the mean of the noise.
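Since the slide's equations did not survive, the following is only a sketch of a common first-order VTS approximation, not necessarily the authors' formulation. It assumes additive noise only, one log filter-bank channel at a time (diagonal covariances), and the mismatch function y = x + log(1 + exp(n - x)). The names `mismatch` and `vts_noisy_gaussian`, and the use of `g0` and `f0` for the mismatch value and its derivative, are assumptions; the slide's h_0 term is not modeled.

```python
import math

def mismatch(x, n):
    """g(x, n) = log(1 + exp(n - x)), so that y = x + g(x, n) in the log-FBE domain."""
    d = n - x
    # Numerically stable evaluation of log(1 + exp(d)).
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))

def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order Taylor expansion around (mu_x, mu_n) for one channel.

    Returns the mean and variance of the noisy Gaussian derived from a clean
    Gaussian (mu_x, var_x) and a Gaussian noise model (mu_n, var_n).
    """
    g0 = mismatch(mu_x, mu_n)
    f0 = math.exp(-g0)            # dy/dx = 1 / (1 + exp(mu_n - mu_x))
    mu_y = mu_x + g0
    var_y = f0 ** 2 * var_x + (1.0 - f0) ** 2 * var_n
    return mu_y, var_y

# Made-up values: speech and noise at the same level, then speech 20 units above noise.
print(vts_noisy_gaussian(0.0, 4.0, 0.0, 1.0))
print(vts_noisy_gaussian(20.0, 4.0, 0.0, 1.0))
```

At high SNR the noisy Gaussian stays close to the clean one, while at low SNR its mean and variance drift toward those of the noise, which is the behaviour the adapted mixture is meant to capture.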

11 Model Based Feature Compensation (VTS)
• Step 2: Estimation of P(k|y):
  [equation not preserved in the transcript]
  where [symbol not preserved] is the k-th Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of the Gaussian.
• Step 3: Estimation of clean speech:
  [equation not preserved in the transcript]
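A minimal sketch of these two steps, under stated assumptions: P(k|y) follows from Bayes' rule over the noisy Gaussians, and the clean-speech estimate subtracts the posterior-weighted mismatch correction of each Gaussian from the observation. The second part is a common VTS-style estimator and may differ from the slide's exact equation. The code handles a single scalar feature; all function names and numbers are hypothetical.

```python
import math

def log_gauss(y, mean, var):
    """Log-density of a one-dimensional Gaussian at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def posteriors(y, priors, means_y, vars_y):
    """P(k|y) by Bayes' rule over the noisy Gaussians (log domain for stability)."""
    logs = [math.log(p) + log_gauss(y, m, v)
            for p, m, v in zip(priors, means_y, vars_y)]
    top = max(logs)
    weights = [math.exp(value - top) for value in logs]
    total = sum(weights)
    return [w / total for w in weights]

def clean_estimate(y, post, corrections):
    """x_hat = y - sum_k P(k|y) * g_k, with g_k the correction of Gaussian k."""
    return y - sum(p * g for p, g in zip(post, corrections))

# Made-up two-Gaussian noisy model and per-Gaussian corrections.
post = posteriors(5.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
print(post)
print(clean_estimate(5.0, post, [1.0, 3.0]))
```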

12 Model Based Feature Compensation (VTS)
• Step 4: Estimation of uncertainty:
  - The uncertainty of the clean speech can be estimated as: [equation not preserved in the transcript]
  - and, from the estimation of the clean speech: [equation not preserved in the transcript]
  - assuming small values of the variance of the noise: [equation not preserved in the transcript]
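One plausible reading of this step, offered only as a sketch because the equations are missing: treat the uncertainty as the posterior variance E[x^2|y] - (E[x|y])^2, and, under the small-noise-variance assumption, approximate each Gaussian's contribution by its point estimate y - g_k. The function name and the inputs below are hypothetical.

```python
def estimate_with_uncertainty(y, post, corrections):
    """Return (x_hat, variance) for one scalar feature.

    post[k]        : P(k|y) for each Gaussian of the noisy mixture
    corrections[k] : mismatch correction g_k of Gaussian k (assumed given)
    The per-Gaussian conditional variance is neglected, which is this
    sketch's interpretation of the small-noise-variance assumption.
    """
    per_gaussian = [y - g for g in corrections]
    x_hat = sum(p * x for p, x in zip(post, per_gaussian))
    second_moment = sum(p * x * x for p, x in zip(post, per_gaussian))
    variance = max(second_moment - x_hat ** 2, 0.0)  # clip rounding noise
    return x_hat, variance

# Made-up example: two Gaussians that disagree about the correction.
print(estimate_with_uncertainty(5.0, [0.5, 0.5], [1.0, 3.0]))
```

Under this reading the uncertainty is large when several Gaussians with different corrections share the posterior mass, and it vanishes when one Gaussian dominates.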

13 Model Based Feature Compensation (VTS)
• Aurora-2 results:

                           Aver. WER   Relative improv.
  BASELINE                 34.1 %      0.0 %
  VTS + MVN                14.0 %      58.9 %
  VTS + MVN + UNCERT.      13.5 %      60.0 %

• Some considerations about VTS:
  - Computational load
  - Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
  - Estimation of noise is critical
  - There are some approximations in the formulation
  - Uncertainty: small improvement (insert., substit., delet.)
  - Alternative: model-based compensation based on numerical integration of pdfs


15 Model-based VAD
• Fundamentals of model-based VAD:
  - Gaussian mixture model in the log-FBE domain
  - Gaussian mixture model trained with clean speech
  - VTS provides a noisy version of the GMM
  - From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
  - The a-priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data
  - Then, the probability P(V|y) of the noisy observation y being speech is given by: [equation not preserved in the transcript]
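The quantities listed on this slide combine naturally as P(V|y) = sum_k P(V|k) P(k|y). The sketch below assumes that form, and also assumes that P(V|k) is estimated by soft counts over labelled training frames; neither detail is confirmed by the transcript, and all names and numbers are hypothetical.

```python
def speech_prior_per_gaussian(train_posteriors, train_labels):
    """Soft-count estimate of P(V|k) from labelled training frames.

    train_posteriors[t][k] : P(k|x_t) for clean training frame t
    train_labels[t]        : 1 if frame t is speech, 0 otherwise
    """
    n_gauss = len(train_posteriors[0])
    speech = [0.0] * n_gauss
    total = [0.0] * n_gauss
    for post, label in zip(train_posteriors, train_labels):
        for k, p in enumerate(post):
            total[k] += p
            speech[k] += p * label
    return [s / t if t > 0.0 else 0.0 for s, t in zip(speech, total)]

def speech_probability(post, p_v_given_k):
    """P(V|y) = sum_k P(V|k) * P(k|y)."""
    return sum(p * v for p, v in zip(post, p_v_given_k))

# Made-up training frames for a two-Gaussian model.
p_v = speech_prior_per_gaussian(
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    [1, 1, 0, 1],
)
print(p_v)
print(speech_probability([0.5, 0.5], p_v))
```

A frame-level decision could then compare P(V|y) with a threshold; the threshold value is a tuning choice that the slides do not specify.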

16 Model-based VAD
• Some considerations about model-based VAD:
  - VAD decision relies on a Gaussian mixture model trained with clean speech (based on speech events observed in the training database)
  - Not based on energy...
  - Based on observations in the log-FBE domain
  - VTS adapts the Gaussian mixture to noisy conditions: the performance of the VAD is expected to be stable for a wide range of SNRs
  - Computational load

17 Model-based VAD
• Model-based VAD for different SNRs: [figure not preserved in the transcript]

18-19 Model-based VAD
• Comparison with other VADs: HR1 and HR0 evaluated for AURORA-2 [figures not preserved in the transcript]
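The slides do not define HR1 and HR0. Assuming the usual convention in VAD evaluation, where HR1 is the fraction of speech frames detected as speech and HR0 is the fraction of non-speech frames detected as non-speech, they could be computed from frame-level labels as in this sketch. The function name and the example labels are hypothetical.

```python
def hit_rates(reference, decisions):
    """Return (HR0, HR1) from per-frame labels, 0 = non-speech and 1 = speech.

    HR1: correctly detected speech frames / reference speech frames
    HR0: correctly detected non-speech frames / reference non-speech frames
    (definitions assumed, not taken from the slides)
    """
    n1 = sum(1 for r in reference if r == 1)
    n0 = len(reference) - n1
    hit1 = sum(1 for r, d in zip(reference, decisions) if r == 1 and d == 1)
    hit0 = sum(1 for r, d in zip(reference, decisions) if r == 0 and d == 0)
    hr1 = hit1 / n1 if n1 else float("nan")
    hr0 = hit0 / n0 if n0 else float("nan")
    return hr0, hr1

# Made-up reference labels and VAD decisions for four frames.
print(hit_rates([1, 1, 0, 0], [1, 0, 0, 0]))
```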

20 Model-based VAD
• Aurora-2 recognition results (WAcc):

             WF        WF+FD
  G.729      57.1 %    57.8 %
  AMR.1      66.3 %    65.0 %
  AMR.2      78.3 %    78.5 %
  AFE        75.3 %    79.0 %
  VTS-VAD    78.4 %    80.2 %

  Baseline: 60.5 % (no VAD, no WF, no FD)


