Slide 1: Effects of Explicitly Modeling Noise Words
Chia-lin Kao, Owen Kimball, Spyros Matsoukas

Slide 2: Outline
- Motivation
- BBN’s standard training procedure without noise words
- Effect of noise words in ML training
- Effect of noise words in discriminative training
- Conclusions

Slide 3: Motivation
- BBN’s English CTS system does not train with noise words in transcripts
- For the RT04 non-English CTS systems, we found that using noise words helped
  - [LAUGH], [NOISE], [SIGH], etc., appear in the transcripts used to train the non-English ML models
  - Levantine system: 1.6% gain on the unadapted LevAr.Dev04 test
  - Mandarin system: 1.0% gain on the unadapted Man.CTS.Dev04 test
- Do these results hold for English? For discriminative training?
  - Success would simplify the preparation of Fisher training transcripts: no need to change transcripts and re-segment

Slide 4: Noise Words in English Transcripts
- The MSU Jan 2000 Switchboard I transcripts include [laughter], [noise], and [vocalized-noise]
- For RT02, BBN switched to the CU-HTK training transcripts, in which the explicit noise words were removed from the MSU transcripts
  - Found no significant difference in performance compared with the previous BBN transcripts
  - Assumed noise words were a no-op, but there were other differences and we did not test which ones helped or hurt
- The WordWave Fisher transcripts include [LAUGH], [NOISE], [MN], [COUGH], [LIPSMACK], and [SIGH]
- The BBN RT04 CTS English system removes noise words from the transcripts and relies on the silence HMM to model them

Slide 5: Training Procedure without Noise Words
- Process the training transcripts (a minimal code sketch follows this list)
  - Drop utterances containing only noise words
  - Map noise words to silence
- Train initial ML models and generate word alignments
- Remove long silences
  - Using the alignment information, chop utterances containing silences longer than two seconds
- Train final ML models using the processed transcripts and segmentation
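A minimal Python sketch of the transcript-processing step above, assuming the Fisher noise-word inventory from slide 4 and a hypothetical <SIL> token for silence; it illustrates the mapping only and is not BBN's actual tooling.

# Sketch of the "no noise words" transcript preprocessing (illustrative only).
# NOISE_WORDS and the "<SIL>" silence token are assumptions, not BBN's internal names.
NOISE_WORDS = {"[LAUGH]", "[NOISE]", "[MN]", "[COUGH]", "[LIPSMACK]", "[SIGH]"}
SILENCE_TOKEN = "<SIL>"

def preprocess_utterance(words):
    """Map noise words to silence; return None if the utterance contains only noise words."""
    if words and all(w in NOISE_WORDS for w in words):
        return None  # drop noise-only utterances
    return [SILENCE_TOKEN if w in NOISE_WORDS else w for w in words]

def preprocess_corpus(utterances):
    """Apply the mapping to a list of utterances (each a list of word tokens)."""
    cleaned = (preprocess_utterance(u) for u in utterances)
    return [u for u in cleaned if u is not None]

# Example:
# preprocess_corpus([["[LAUGH]"], ["yeah", "[NOISE]", "right"]])
# returns [["yeah", "<SIL>", "right"]]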

Slide 6: Effect of Noise Words in ML Training
- Comparison experiments, Fisher training
  - Train ML models on 330 hours of automatically segmented Fisher data, with and without noise words in the transcripts
- Validation experiments, Switchboard training
  - Train ML models on 180 hours of Switchboard data
    - With noise words: MSU’s original transcripts
    - Without noise words: CU’s processed transcripts
- Test the models on the combined Eval03 and Dev04 test set

Slide 7: Fisher Training Experiments
- Without noise words: train as described two slides back
- With noise words: use four phonemes to model the six noise words; transcripts and segmentation are unaltered (a lexicon sketch follows the table)

  Noise word      Phonetic spelling
  [COUGH]         COF-COF
  [LAUGHTER]      LAF-LAF
  [NOISE]         AMN-AMN
  [MN]            BRN-BRN
  [SIGH]          BRN-BRN
  [LIPSMACK]      BRN-BRN
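For readers who prefer it in code form, here is the same noise-word lexicon as a Python dictionary; the dictionary layout is an illustrative assumption, not BBN's lexicon format.

# Noise-word pronunciations from the table above (format is illustrative only).
NOISE_LEXICON = {
    "[COUGH]":    ["COF", "COF"],
    "[LAUGHTER]": ["LAF", "LAF"],
    "[NOISE]":    ["AMN", "AMN"],
    "[MN]":       ["BRN", "BRN"],
    "[SIGH]":     ["BRN", "BRN"],
    "[LIPSMACK]": ["BRN", "BRN"],
}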

Slide 8: Fisher Experiment Results

  Noise words in transcripts    Unadapted WER (Eval03+Dev04)
  No                            26.2
  Yes                           25.4

- Noise words in the acoustic modeling (AM) and language modeling (LM) transcripts give a 0.8% absolute WER gain

Slide 9: Diagnostic Experiments
- Is the gain with noise words due to better acoustic or better language modeling?
- Expt I: explicit noise words in the transcripts but modeled as silence: spell all noise words using the silence phoneme (a code sketch of this lexicon follows the table)
- Expt II: test the acoustic models from Expt I using LMs trained on transcripts without noise words

  Expt    Noise words in AM transcripts?    Noise phones    Noise words in LM transcripts?    Unadapted WER (Eval03+Dev04)
  --      No                                --              No                                26.2
  --      Yes                               Noise           Yes                               25.4
  I       Yes                               Silence         Yes                               25.5
  II      Yes                               Silence         No                                25.6
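A minimal sketch of the Expt I lexicon variant, assuming a silence phone named SIL (the slide does not give the actual phone name): the noise words stay in the transcripts, but each one is spelled with the silence phone, so they all share the silence acoustic model.

# Expt I lexicon sketch: every noise word is spelled with the (assumed) silence phone "SIL".
NOISE_WORDS = ["[COUGH]", "[LAUGHTER]", "[NOISE]", "[MN]", "[SIGH]", "[LIPSMACK]"]
EXPT1_LEXICON = {w: ["SIL"] for w in NOISE_WORDS}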

Slide 10: Diagnostic Experiments, cont’d
- Including or excluding noise words from the LM training has no significant effect on performance
- Noise words in the transcripts improve performance whether they are trained as noise models or as silence
  ==> Acoustic model initialization improves when noise words are explicitly marked in the transcripts

Slide 11: ML Training on the Switchboard Corpus
1. Use 2385 Swbd I conversations (160 hours), processed and segmented by CU, from the Eval03 training set
2. Use the same 2385 conversations from the original MSU Swbd I Jan 2000 release (180 hours)
3. Apply the auto-segmentation process to the MSU version of the conversations, producing 180 hours

  Noise words in training?    Segmentation    Unadapted WER (Eval03+Dev04)
  No                          CU              28.4
  Yes                         MSU manual      27.7
  Yes                         BBN auto        27.6

Slide 12: Effects with Discriminative Training
- Trained SI-MPE models using the baseline 330-hour Fisher ML models as the seed models

  Noise words in transcripts    Unadapted WER (Eval03+Dev04)
  No                            23.6
  Yes                           23.4

- Noise words still yield better models, but the gain is only 0.2%

Slide 13: Conclusions
- Including noise words in the transcripts results in better model initialization in acoustic training
- The discriminative training procedure overcomes most of the poor initial estimates obtained when noise words are not explicitly marked in the transcripts
- We can directly use the Fisher transcripts as output by BBN / WordWave, i.e., there is no need to map noise words and re-segment

