1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0 Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff SSLI Lab Department of Electrical Engineering.

1 LOW-RESOURCE NOISE-ROBUST FEATURE POST-PROCESSING ON AURORA 2.0
Chia-Ping Chen, Jeff Bilmes and Katrin Kirchhoff
SSLI Lab, Department of Electrical Engineering, University of Washington
Presenter: Shih-Hsiang (士翔)
ICSLP 2002

2 Introduction
- The performance of ASR systems often degrades dramatically as the noise level increases.
  - The degradation is minor when the signal-to-noise ratio (SNR) is high, but quite significant at low SNR.
- A variety of techniques have been proposed in the past:
  - Principal component analysis and a discriminative neural network (Ellis et al. 2001)
  - Missing data theory (Cooke et al. 2001)
  - Voice activity detection (VAD) and variable frame rate, used to drop noisy feature vectors and reduce insertion errors (John et al. 2001)
  - Nonlinear spectral subtraction, noise masking, feature filters, and model adaptation (Lieb et al. 2001)
  - Data-driven temporal filters, on-line mean and variance normalization, voice activity detection, and server-side discriminative features, integrated to improve noise robustness (Morgan et al. 2001)
  - etc.

3 Literature Review
- Ellis et al. 2001
- John et al. 2001
  - Variable frame rate (VFR) processing: an observation vector is discarded if it does not differ much from the previous observation vector. In the cited implementation of VFR, frame-to-frame variation is estimated as the Euclidean norm of the sub-vector corresponding to the delta-cepstrum.
  - Voice activity detection
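The VFR rule described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `variable_frame_rate`, the `delta_slice` argument (which columns hold the delta-cepstrum), and the `threshold` value are hypothetical and do not come from the slides or the cited paper.

```python
import numpy as np

def variable_frame_rate(frames, delta_slice, threshold):
    """Drop frames whose delta-cepstrum sub-vector has a small Euclidean norm.

    frames      : (T, D) array, one feature vector per row.
    delta_slice : slice selecting the delta-cepstrum columns (assumed layout).
    threshold   : hypothetical tuning parameter; frames below it are discarded.
    """
    frames = np.asarray(frames, dtype=float)
    # Frame-to-frame variation, estimated as the norm of the delta sub-vector.
    variation = np.linalg.norm(frames[:, delta_slice], axis=1)
    keep = variation >= threshold
    return frames[keep], keep

# Toy input: column 3 plays the role of a one-dimensional delta-cepstrum.
demo = np.array([[1.0, 0.0, 0.0, 0.50],
                 [1.0, 0.0, 0.0, 0.01],
                 [2.0, 0.0, 0.0, 1.00]])
kept_frames, keep_mask = variable_frame_rate(demo, slice(3, 4), threshold=0.1)
```

In this toy input the middle frame barely changes relative to its neighbor, so it is the one discarded.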

4 Literature Review
- Morgan et al. (2001)

5 Proposed method
- The first step is standard mean subtraction (MS).
- The second step is variance normalization (VN).
- The third step is autoregressive moving average (ARMA) filtering.
Symbols in the slide's equations: the feature vector (cepstral coefficients) and the order of the ARMA filter.
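The three steps can be sketched as follows. The slide's equations did not survive in this transcript, so the ARMA form used here is an assumption: each output frame is the average of the previous M filtered frames, the current frame, and the next M input frames, divided by 2M + 1. The function names, the per-utterance statistics, and the handling of edge frames (left unfiltered) are likewise illustrative choices, not the authors' stated implementation.

```python
import numpy as np

def mean_var_normalize(features):
    """Steps 1 and 2: per-utterance mean subtraction and variance normalization."""
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0          # avoid dividing by zero on constant dimensions
    return x / std

def arma_filter(x, order):
    """Step 3 (assumed form): average M past outputs, the current input,
    and M future inputs. The first and last M frames are left unfiltered."""
    x = np.asarray(x, dtype=float)
    M = order
    y = x.copy()
    for t in range(M, len(x) - M):
        past_outputs = y[t - M:t].sum(axis=0)
        current_and_future_inputs = x[t:t + M + 1].sum(axis=0)
        y[t] = (past_outputs + current_and_future_inputs) / (2 * M + 1)
    return y

def mva(features, order=2):
    """Mean subtraction, variance normalization, then ARMA filtering."""
    return arma_filter(mean_var_normalize(features), order)

rng = np.random.default_rng(0)
utterance = rng.normal(loc=3.0, scale=2.0, size=(50, 13))   # toy (T, D) features
processed = mva(utterance, order=2)
```

The default `order=2` is only a placeholder for the example; the choice of M is the subject of the following slides.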

6 Choosing a proper order M of the filter
- The number of zeros in the frequency response of the ARMA filter is approximately proportional to its order.
- This suggests that a large M will perform poorly, since it could filter out important speech information.
- The transfer function is:
- The frequency response of the ARMA filter of order M is:
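The relationship between order and zeros can be checked numerically. The sketch below assumes the ARMA difference equation y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1), which is not reproduced in this transcript. Under that assumption the transfer function is H(z) = (1 + z + ... + z^M) / ((2M + 1) - z^-1 - ... - z^-M), evaluated on the unit circle at z = exp(jω). The function name `arma_frequency_response` is hypothetical.

```python
import numpy as np

def arma_frequency_response(order, omega):
    """Evaluate the assumed ARMA transfer function at z = exp(j * omega)."""
    M = order
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    k_num = np.arange(0, M + 1)      # exponents of z in the numerator
    k_den = np.arange(1, M + 1)      # exponents of z^-1 in the denominator
    numerator = np.exp(1j * np.outer(omega, k_num)).sum(axis=1)
    denominator = (2 * M + 1) - np.exp(-1j * np.outer(omega, k_den)).sum(axis=1)
    return numerator / denominator

omega = np.linspace(0.0, np.pi, 512)
response = arma_frequency_response(2, omega)
gain = np.abs(response)              # magnitude response
phase = np.angle(response)           # phase shift in radians
```

With this form the numerator vanishes at ω = 2πn / (M + 1) for n = 1, ..., M, so the number of zeros grows with M, which matches the first bullet above.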

7 Gain and phase shifts of the ARMA filter

8 The time sequences of the cepstral coefficient c1 for the digit string 5376869 corrupted with different levels of noise

9 Evaluation
- Evaluated on the Aurora 2.0 noisy digits database
  - Two training sets and three test sets
    - Training sets: clean speech only / multi-condition speech
    - Test sets: stationary noise / non-stationary noise / convolutional noise
  - 7 different noise levels
    - Clean, 20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB
- Recognizer
  - Simple HMM-based system using whole-word models
    - "zero" to "nine" and "oh": 16 states per word, 3 Gaussian mixture components per state
    - Silence: 3 states

10 Recognition results
Word accuracies (as percentages). Top: multi-condition training. Bottom: clean training.

11 A comparison of different orders of the ARMA filtering
- A small M will retain the short-term cepstral information but is more vulnerable to noise.
- A large M will make the processed features less corrupted by noise, but the short-term cepstral information will be lost.
Top: multi-condition training. Bottom: clean training.

12 Testing the effectiveness of the proposed technique
- The results show that while variance normalization and mean subtraction improve performance over the baseline, the addition of the ARMA filter provides significant further improvements.

13 Comparison of different filters
- Causal ARMA filter
- Non-causal MA filter
- Causal MA filter
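The transcript does not preserve the definitions of these filters, so the sketch below covers only the two moving-average variants and rests on assumed definitions: the non-causal MA averages the 2M + 1 inputs centered on the current frame, and the causal MA averages the current frame and the M frames before it. The function names are hypothetical, and edge frames without a full window are passed through unchanged.

```python
import numpy as np

def noncausal_ma(x, order):
    """Assumed non-causal MA: mean of inputs x[t-M] ... x[t+M]."""
    x = np.asarray(x, dtype=float)
    M = order
    y = x.copy()
    for t in range(M, len(x) - M):
        y[t] = x[t - M:t + M + 1].mean(axis=0)
    return y

def causal_ma(x, order):
    """Assumed causal MA: mean of inputs x[t-M] ... x[t], no look-ahead."""
    x = np.asarray(x, dtype=float)
    M = order
    y = x.copy()
    for t in range(M, len(x)):
        y[t] = x[t - M:t + 1].mean(axis=0)
    return y

ramp = np.arange(10.0).reshape(-1, 1)    # toy one-dimensional feature track
centered = noncausal_ma(ramp, order=2)
lagged = causal_ma(ramp, order=2)
```

On a linear ramp the centered average reproduces the input, while the causal average lags it by M / 2 frames, which illustrates the delay that a causal filter trades for not needing future frames.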

14 Comparison of different filters (cont.)
Top: multi-condition training. Bottom: clean training.
