Advances in WP1 and WP2, Paris Meeting, 11 Feb. 2005, www.loquendo.com

1 Advances in WP1 and WP2

2 Advances in WP1

3 WP1: Environment & Sensor Robustness
T1.2 Noise Independence
Voice Activity Detection:
– A model-based approach using NNs (neural networks) to discriminate two classes (noise and voice) will be explored;
– NN inputs could be standard features (cepstral coefficients, energy) computed after noise reduction, possibly complemented by other features (pitch/voicing) produced by other partners (IRST);
– The training set will be multi-style, covering several types of noise conditions and several languages.
Noise Reduction:
– Several noise reduction techniques will be tested on the test sets selected as benchmarks for the project: Spectral Subtraction (standard, Wiener and SNR-dependent) and Spectral Attenuation (Ephraim-Malah SA, standard and SNR-dependent);
– New techniques for non-stationary noises.
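As a rough illustration of the model-based VAD idea (a trained classifier that labels each frame as voice or noise), the sketch below fits a single logistic unit, the smallest possible neural network, by batch gradient descent. It is not Loquendo's VAD: the features (frame log-energy and zero-crossing rate, standing in for cepstral, energy and pitch/voicing features), the helper names and the synthetic training frames are all assumptions made for the example.

```python
import math
import random

def log_energy(frame):
    """Natural log of the frame's mean power (small constant avoids log(0))."""
    return math.log(sum(v * v for v in frame) / len(frame) + 1e-10)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return changes / (len(frame) - 1)

def features(frame):
    # Stand-in features (assumption): log-energy and zero-crossing rate.
    return [log_energy(frame), zero_crossing_rate(frame)]

def sigmoid(z):
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_vad(xs, ys, lr=0.1, epochs=500):
    """Fit one logistic unit (voice=1, noise=0) by batch gradient descent on cross-entropy."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = score(w, b, x) - y  # dLoss/dz for sigmoid + cross-entropy
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def is_voice(w, b, frame, threshold=0.5):
    return score(w, b, features(frame)) > threshold

# Synthetic training data (assumption): 200-sample frames (25 ms at 8 kHz).
rng = random.Random(0)
FRAME = 200

def noise_frame():
    return [rng.gauss(0.0, 0.05) for _ in range(FRAME)]

def voice_frame():
    # Toy "voiced" frame: a 300 Hz tone on top of the same background noise.
    return [math.sin(2 * math.pi * 300 * t / 8000) + v for t, v in enumerate(noise_frame())]

frames = [voice_frame() for _ in range(20)] + [noise_frame() for _ in range(20)]
labels = [1] * 20 + [0] * 20
w, b = train_vad([features(f) for f in frames], labels)
print(is_voice(w, b, voice_frame()), is_voice(w, b, noise_frame()))
```

A real system would use a multi-layer network, richer features and multi-condition, multi-language training data, as the slide proposes; the training loop keeps the same shape.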

4 WP1: Speech Databases for Noise Reduction
Aurora 2 - Connected digits
TIdigits data downsampled to 8 kHz, filtered with a G.712 characteristic, with noise artificially added at several SNRs (20 dB, 15 dB, 10 dB, 5 dB, 0 dB, -5 dB). There are three test sets:
– A: same noises as in training: subway, babble, car, exhibition hall;
– B: 4 different noises: restaurant, street, airport, train station;
– C: same noises as A, but filtered with a different microphone characteristic.
Aurora 3 - Connected digits recorded in a car environment
Signals collected by hands-free (ch1) and close-talk (ch0) microphones. In HIWIRE we use the Italian and Spanish recordings. There are two test sets:
– WM: ch0 and ch1 recordings used in both the training and testing lists;
– HM: ch0 for training and ch1 for testing.
Aurora 4 - Continuous speech, 5k vocabulary
WSJ0 5K with 6 kinds of added noise: car, babble, restaurant, street, airport, train station. It uses the standard bigram language model.
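Adding noise "at a given SNR", as done for Aurora 2, amounts to scaling the noise so that the speech-to-noise power ratio reaches the target value. A minimal sketch, assuming plain global mean power for both signals; the official Aurora 2 tools may measure the speech level and filter the signals differently, and the function names here are hypothetical:

```python
import math
import random

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g*noise, with g chosen so that 10*log10(P_speech / P_scaled_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # g^2 * P_noise = P_speech / 10^(snr/10)
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]

rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]  # 1 s tone at 8 kHz
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
for snr in (20, 15, 10, 5, 0, -5):
    noisy = add_noise_at_snr(speech, noise, snr)
    added = [y - s for y, s in zip(noisy, speech)]
    print(snr, round(10 * math.log10(power(speech) / power(added)), 2))
```

Each printed pair shows the requested SNR next to the SNR measured on the result.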

5 Denoising Techniques for baseline evaluations
Spectral Subtraction (SS) operates in the frequency domain and computes a denoised estimate of the power spectrum. Wiener spectral subtraction is defined in terms of the time frame m, the frequency bin k, an estimate of the noise power spectrum, the noisy power spectrum, the estimate of the clean spectrum, a noise overestimation factor and a flooring factor, the last two being functions of m.
The standard case assumes that flooring and overestimation are constant in time. The best results are obtained when the flooring and overestimation parameters depend on the estimated global Signal-to-Noise Ratio at time m, SNR(m), through piecewise linear functions.
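A minimal sketch of SNR-dependent spectral subtraction for one frame m, assuming a common power-domain form that is not necessarily the one used in the project: |Ŝ(m,k)|² = max(1 - α(m)·N̂(m,k)/|Y(m,k)|², β(m))·|Y(m,k)|², with α(m) the overestimation and β(m) the flooring. With α = 1 the gain 1 - N̂/|Y|² equals the Wiener gain ξ/(1+ξ) when the a-priori SNR ξ is estimated as (|Y|² - N̂)/N̂. The frame-SNR definition, the breakpoints (-5 dB, 20 dB) and the end values of α and β are illustrative guesses, not tuned project parameters.

```python
import math

def piecewise_linear(x, x_lo, x_hi, y_lo, y_hi):
    """y_lo for x <= x_lo, y_hi for x >= x_hi, linear interpolation in between."""
    if x <= x_lo:
        return y_lo
    if x >= x_hi:
        return y_hi
    return y_lo + (y_hi - y_lo) * (x - x_lo) / (x_hi - x_lo)

def frame_snr_db(noisy_power, noise_power, eps=1e-12):
    """Global SNR estimate of one frame, (sum|Y|^2 - sum N) / sum N in dB (assumed definition)."""
    p_noisy = sum(noisy_power)
    p_noise = max(sum(noise_power), eps)
    return 10.0 * math.log10(max(p_noisy - p_noise, eps) / p_noise)

def subtract_frame(noisy_power, noise_power, eps=1e-12):
    """SNR-dependent power spectral subtraction over all bins k of one frame m."""
    snr = frame_snr_db(noisy_power, noise_power)
    # Hypothetical breakpoints: strong overestimation/flooring at low SNR, mild at high SNR.
    alpha = piecewise_linear(snr, -5.0, 20.0, 4.0, 1.0)
    beta = piecewise_linear(snr, -5.0, 20.0, 0.1, 0.01)
    clean = []
    for y, n in zip(noisy_power, noise_power):
        # Wiener-type gain 1 - alpha*N/|Y|^2, never allowed below the floor beta.
        gain = max(1.0 - alpha * n / max(y, eps), beta)
        clean.append(gain * y)
    return clean, alpha, beta

noisy = [10.0, 1.0, 0.5]  # |Y(m,k)|^2 for three bins of one frame
noise = [0.5, 0.5, 0.5]   # noise power estimate for the same bins
clean, alpha, beta = subtract_frame(noisy, noise)
print(round(alpha, 2), round(beta, 4), [round(c, 3) for c in clean])
```

In this toy frame only the first bin is clearly above the noise estimate; the other two bins fall to the floor β(m)·|Y|², which is what prevents negative power estimates and limits "musical noise".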

6 Baseline evaluations of Loquendo ASR on the Aurora2 speech database

7 Baseline Performance evaluations
This test was performed with the Loquendo ASR using the CLEAN / MULTI_CONDITION models trained on the Aurora2 training lists, and tested on the A/B/C testing lists. Performance is given as Word Accuracy (%), with Error Reduction (%) in parentheses.

CLEAN models      | Test A      | Test B      | Test C      | A-B-C
RPLP              | 75.6        | 77.5        | 75.3        | 76.3
+ Wiener SNR dep. | 84.0 (34.4) | 84.4 (30.7) | 83.3 (32.4) | 84.0 (32.5)

MULTI models      | Test A      | Test B      | Test C      | A-B-C      | Avg.
RPLP              | 93.5        | 91.1        | 90.2        | 91.9       | 84.1
+ Wiener SNR dep. | 93.9 (6.1)  | 92.1 (11.2) | 90.5 (3.1)  | 92.5 (7.4) | 88.2 (25.8)

LASR models       | Test A      | Test B      | Test C      | A-B-C
RPLP              | 80.9        | 83.3        | 77.6        | 81.2
+ Wiener SNR dep. | 88.1 (37.7) | 88.3 (29.9) | 86.2 (38.4) | 87.8 (35.1)
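The Error Reduction figures are consistent with the relative reduction of the word error rate (100 - WA): ER = 100·(WA_new - WA_base)/(100 - WA_base). For CLEAN models on Test A, for instance, (84.0 - 75.6)/(100 - 75.6) ≈ 34.4%. The Avg. column of the MULTI results also matches the mean of the CLEAN and MULTI A-B-C accuracies, (76.3 + 91.9)/2 = 84.1. A small check with a hypothetical helper:

```python
def error_reduction(wa_base, wa_new):
    """Relative reduction (%) of the word error rate 100 - WA when going from wa_base to wa_new."""
    return 100.0 * (wa_new - wa_base) / (100.0 - wa_base)

# CLEAN models: RPLP vs. + Wiener SNR dep. (word accuracies taken from the table)
clean_rows = {"Test A": (75.6, 84.0), "Test B": (77.5, 84.4),
              "Test C": (75.3, 83.3), "A-B-C": (76.3, 84.0)}
for name, (base, new) in clean_rows.items():
    print(name, round(error_reduction(base, new), 1))
```

Because the base is the remaining error rather than the accuracy, a 0.6-point gain on MULTI models (91.9 to 92.5) still counts as a 7.4% error reduction.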

8 Baseline evaluations of Loquendo ASR on the Aurora3 speech database

9 Baseline Performance evaluations
This test was performed with the Loquendo ASR and models trained on the Aurora3 training lists, and tested on the Well Matched (WM) and High Mismatch (HM) testing lists. Performance is given as Word Accuracy (%), with Error Reduction (%) in parentheses.

Aurora3 models    | Ita WM      | Ita HM      | Spa WM      | Spa HM
RPLP              | 98.2        | 46.6        | 97.3        | 74.6
+ Wiener SNR dep. | 98.3 (5.5)  | 77.5 (59.4) | 97.6 (11.1) | 89.9 (60.2)

LASR models       | Ita WM      | Ita HM      | Spa WM      | Spa HM
RPLP              | -           | 56.4        | -           | 79.4
+ Wiener SNR dep. | -           | 74.6 (41.7) | -           | 84.9 (26.6)

10 Baseline evaluations of Loquendo ASR on the Aurora4 speech database (work in progress)

11 WP1: Workplan
– Selection of suitable benchmark databases (m6);
– Completion of LASR baseline experiments with Spectral Subtraction (Wiener, SNR-dependent) (m12);
– Discriminative VAD (m16);
– Spectral Attenuation (Ephraim-Malah SA, SNR-dependent) (m18);
– Noise estimation and reduction for non-stationary noises (m24).

12 Advances in WP2

13 WP2: User Robustness
T2.2 Speaker Adaptation
Acoustic model adaptation:
– Loquendo ASR is based on hybrid HMM-NN;
– Hybrid HMM-NN is an alternative to HMM modeling that exploits the discriminative training of an MLP to estimate the likelihoods of the acoustic units; it is also very efficient for open vocabularies;
– Unlike for HMMs, little has been done in the literature on the adaptation of NNs.
State-of-the-art NN adaptation methods:
– The Linear Input Network (LIN) method has been proposed for speaker adaptation with promising results [Neto 1996] [Mana 2002];
– The principle of LIN adaptation is to learn, through error back-propagation, the parameters of a linear transformation of the input space;
– The speaker-independent acoustic model (MLP) is kept fixed.
Innovative NN adaptation methods:
– Other innovative techniques for NN adaptation will be proposed and tested, including regularization techniques and rotations of the NN hidden-unit activations.
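A toy sketch of LIN adaptation as described above: a trainable linear layer x' = Wx + b, initialised to the identity, sits in front of a fixed speaker-independent MLP, and only W and b are updated by back-propagating the cross-entropy error through the frozen network. The tiny network, its hand-set weights, the two-dimensional "features", the learning rate and all names are made up for illustration; a real hybrid HMM-NN system would use a large MLP over acoustic feature frames.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]

class FixedMLP:
    """Speaker-independent MLP (one tanh hidden layer, softmax output); never updated."""

    def __init__(self, A, c, B, d):
        self.A, self.c, self.B, self.d = A, c, B, d

    def forward(self, x):
        h = [math.tanh(a + ci) for a, ci in zip(matvec(self.A, x), self.c)]
        p = softmax([o + di for o, di in zip(matvec(self.B, h), self.d)])
        return h, p

    def input_grad(self, x, y):
        """Back-propagate the cross-entropy error for target class y down to the input x."""
        h, p = self.forward(x)
        dz = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]
        dh = [sum(self.B[i][j] * dz[i] for i in range(len(dz))) for j in range(len(h))]
        da = [g * (1.0 - hj * hj) for g, hj in zip(dh, h)]  # tanh derivative
        return [sum(self.A[j][k] * da[j] for j in range(len(da))) for k in range(len(x))]

class LinearInputNetwork:
    """Trainable x' = W x + b placed before the fixed MLP, initialised to the identity."""

    def __init__(self, dim):
        self.W = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def apply(self, x):
        return [v + bi for v, bi in zip(matvec(self.W, x), self.b)]

def mean_loss(lin, mlp, data):
    return sum(-math.log(mlp.forward(lin.apply(x))[1][y]) for x, y in data) / len(data)

def adapt(lin, mlp, data, lr=0.05, epochs=200):
    """Batch gradient descent on W and b only; the MLP weights stay fixed."""
    dim, n = len(lin.b), len(data)
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(dim)]
        gb = [0.0] * dim
        for x, y in data:
            g = mlp.input_grad(lin.apply(x), y)  # dLoss/dx'
            for i in range(dim):
                gb[i] += g[i]
                for j in range(dim):
                    gW[i][j] += g[i] * x[j]  # x'_i = sum_j W[i][j] x_j + b_i
        for i in range(dim):
            lin.b[i] -= lr * gb[i] / n
            for j in range(dim):
                lin.W[i][j] -= lr * gW[i][j] / n

si_mlp = FixedMLP(A=[[3.0, 0.0], [0.0, 1.0]], c=[0.0, 0.0],
                  B=[[-2.0, 0.0], [2.0, 0.0]], d=[0.0, 0.0])
# Toy "new speaker": class 1 now lies at x0 < 0, so the SI model mostly misclassifies it.
adapt_data = [([-0.5, 0.2], 1), ([-1.5, -0.1], 0), ([-0.4, 0.0], 1), ([-1.6, 0.3], 0)]
lin = LinearInputNetwork(2)
before = mean_loss(lin, si_mlp, adapt_data)
adapt(lin, si_mlp, adapt_data)
after = mean_loss(lin, si_mlp, adapt_data)
print(f"loss {before:.3f} -> {after:.3f}")
```

The adaptation loss should drop while `si_mlp` is left untouched: only the small input transform is speaker-specific, which keeps the number of adapted parameters low when adaptation data are scarce.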

14 LOQUENDO activity in the first year
The first activity was the selection of suitable benchmark databases: the WSJ0 adaptation component and the WSJ1 Spoke-3 component.
The second activity was the set-up of experimental baselines for these databases, using standard LASR without adaptation.
In the meantime, the LIN adaptation method has been implemented; experiments on the benchmarks are under way and will be presented at M12.

15 Speech Databases for Speaker Adaptation
WSJ0 (standard ARPA, 1993, LDC, $1000):
– Large-vocabulary (5K words) continuous speech database;
– Test set: 8 speakers, ~40 utterances each, read speech, bigram LM;
– Adaptation set: the same 8 speakers, 40 utterances each.
WSJ1 (1994, LDC, $1500):
– Similar to WSJ0, same vocabulary and LM;
– SPOKE-3: standard case study of adaptation to non-native speakers;
– 10 speakers, 40 adaptation utterances, 40 test utterances.
HIWIRE Non-Native Speaker database:
– Collected within the project;
– 80 speakers, each reading 100 utterances.

16 WSJ0 baseline
The WSJ0 SI test set consists of 8 speakers with ~40 sentences each, recorded with two microphones (WV1: Sennheiser; WV2: others). Vocabulary: 5K words, with a standard bigram LM.
The WSJ0 adaptation component consists of the same 8 speakers as the SI test, with 40 adaptation sentences each. Only the adaptation and test data recorded with the matching microphone (Sennheiser, WV1) have been used.

Adaptation model | WV1_440 | WV1_441 | WV1_442 | WV1_443 | WV1_444 | WV1_445 | WV1_446 | WV1_447 | Average
No adaptation    | 83.6    | 79.0    | 80.7    | 87.1    | 79.7    | 82.2    | 88.5    | 82.0    | 82.8

17 WSJ1 SPOKE-3 baseline
Spoke-3 is the standard WSJ1 case study for evaluating adaptation to non-native speakers. There are 10 non-native speakers, each with 40 adaptation sentences and ~40 test sentences. The vocabulary is 5K words, with a standard bigram LM. Standard LASR for US English has been used.

Adaptation model | 4N0  | 4N1  | 4N3  | 4N4  | 4N5  | 4N8  | 4N9  | 4NA  | 4NB  | 4NC  | Average
No adaptation    | 19.8 | 24.3 | 34.0 | 29.0 | 56.2 | 77.7 | 71.1 | 71.3 | 60.6 | 57.8 | 49.7

"THE FEMALE PRODUCES A LITTER OF TWO TO FOUR YOUNG IN NOVEMBER AND DECEMBER"

18 Workplan
– Selection of suitable benchmark databases (m6);
– Baseline set-up for the selected databases (m8);
– LIN adaptation method implemented and tested on the benchmarks (m12);
– Regularization methods implemented and tested on the benchmarks (m12);
– Innovative NN adaptation methods for acoustic modeling (m24).

