Speech recognition in MUMIS Judith Kessens, Mirjam Wester & Helmer Strik.

2 Speech recognition in MUMIS
Judith Kessens, Mirjam Wester & Helmer Strik

3 Manual transcriptions
Transcriptions made by SPEX:
– orthographic transcriptions
– transcriptions at chunk level (2-3 sec.)
Formats:
– *.TextGrid (Praat)
– XML derivatives:
  *.pri – no time information
  *.skp – time information

4 Manual transcriptions
Total number of transcribed matches on the FTP site (including the demo matches):
– Dutch: 6 matches
– German: 21 matches
– English: 3 matches
File-name extensions: Dutch (_N), German (_G), English (_E)

5 Automatic speech recognition
1. Acoustic preprocessing: acoustic signal → features
2. Speech recognition, using:
– acoustic models
– language models
– lexicon
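The acoustic preprocessing step above (acoustic signal → features) can be sketched as follows. This is an illustrative front end only, not the project's actual feature extractor: the frame sizes, Hamming window, and plain log magnitude spectrum are assumptions (real front ends typically add pre-emphasis, mel filterbanks, and a cepstral transform).

```python
import numpy as np

def frame_features(signal, sample_rate=16000, frame_ms=25, shift_ms=10):
    """Cut a waveform into overlapping windowed frames and compute a
    log magnitude spectrum per frame (illustrative sketch only)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
    shift = int(sample_rate * shift_ms / 1000)       # samples per frame shift
    n_frames = max(0, 1 + (len(signal) - frame_len) // shift)
    window = np.hamming(frame_len)
    feats = []
    for i in range(n_frames):
        frame = signal[i * shift: i * shift + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))
        feats.append(np.log(spectrum + 1e-10))       # log compression
    return np.array(feats)
```

For one second of 16 kHz audio this yields 98 frames of 201 spectral bins each; the recognizer then decodes these feature vectors using the acoustic models, language model, and lexicon.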

6 Automatic transcriptions
Problem with the recorded data: commentary and stadium noise are mixed, resulting in very high noise levels. Recognition of such extremely noisy data is very difficult.

7 Examples of data (Yug-Ned match)
Dutch: "op _t ogenblik wordt in dit stadion de opstelling voorgelezen" ("at the moment the line-up is being read out in this stadium")
English: "and they wanna make the change before the corner"
German: "und die beiden Tore die die Hollaender bekommen hat haben" ("and the two goals that the Dutch have conceded", with a speech repair)

8 Examples of data (Eng-Dld match)
Dutch: "geeft nu een vrije trap in _t voordeel van Ince" ("now gives a free kick in Ince's favour")
English: "and phil neville had to really make about three yards to stop pulling it down and playing it"
German: "wurde von allen englischen Zeitungen aus der Mannschaft" (incomplete: "was ... out of the team by all the English newspapers")

9 Evaluation of automatic transcriptions

WER(%) = (insertions + deletions + substitutions) / (number of words) x 100

Note: the WER can be larger than 100%!
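The WER above is computed from a word-level Levenshtein alignment between the reference and the hypothesis. A minimal sketch (not the project's actual scoring tool):

```python
def wer(reference, hypothesis):
    """Word error rate in percent:
    (insertions + deletions + substitutions) / reference length x 100,
    computed via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                               # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)
```

Because insertions are counted against the reference length, a hypothesis much longer than the reference can push the WER above 100%: for example, a one-word reference recognized as three wrong words gives a WER of 300%.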

10 WERs (all words)

          Dutch   English   German
Yug-Ned   84.5    84.5      77.4
Eng-Dld   83.2    83.3      90.8

11 WERs (all words vs. player names)

          Dutch          English        German
          all / names    all / names    all / names
Yug-Ned   84.5 / 53.0    84.5 / 48.2    77.4 / 40.9
Eng-Dld   83.2 / 55.0    83.3 / 56.2    90.8 / 77.4

12 WERs versus SNR

          Dutch        English      German
          WER / SNR    WER / SNR    WER / SNR
Yug-Ned   84.5 / 9     84.5 / 12    77.4 / 19
Eng-Dld   83.2 / 8     83.3 / 11    90.8 / 7

13 Automatic transcriptions
The language model (LM) and lexicon are adapted to a specific match:
– start with a general LM and lexicon
– add the player names of the specific match
– expand the general LM and lexicon as more data becomes available
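The lexicon part of this adaptation can be sketched as follows. The function name `adapt_lexicon` and the toy letter-to-symbol `g2p` stand-in are hypothetical, not the project's actual tools; in practice the pronunciations for new player names would come from a real grapheme-to-phoneme converter or manual entry.

```python
def adapt_lexicon(general_lexicon, player_names, g2p):
    """Extend a general pronunciation lexicon (word -> phone string)
    with the player names of a specific match. `g2p` is any
    grapheme-to-phoneme function supplying pronunciations for
    names that are not yet in the lexicon."""
    lexicon = dict(general_lexicon)          # leave the general lexicon intact
    for name in player_names:
        if name not in lexicon:              # only add genuinely new entries
            lexicon[name] = g2p(name)
    return lexicon
```

The same idea applies to the language model: the general model stays fixed, and match-specific vocabulary is merged in on top of it.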

14 WERs for various amounts of data

15 Oracle experiments - ICSLP'02
Due to the limited amount of material, we started off with oracle experiments:
– language models are trained on the target match
– acoustic models are trained on part of the target match, or on another match
Result: much lower WERs

16 Summary of results
Acoustic model training:
– leaving out non-speech chunks does not hurt recognition performance
– using more training data is beneficial, but more importantly: the SNRs of the training and test data should be matched
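Matching training and test SNRs presupposes an SNR estimate per recording. A minimal sketch, under the assumption that a noise-only segment (e.g. crowd noise between commentary utterances) is available and the noise is roughly stationary; this is not the project's actual SNR measurement procedure:

```python
import numpy as np

def estimate_snr_db(mixed, noise_only):
    """Estimate the SNR in dB from a speech+noise segment and a
    noise-only segment, by subtracting the noise power from the
    mixed segment's power (assumes stationary noise)."""
    p_mixed = np.mean(np.asarray(mixed, dtype=float) ** 2)
    p_noise = np.mean(np.asarray(noise_only, dtype=float) ** 2)
    p_speech = max(p_mixed - p_noise, 1e-12)   # clamp to avoid log of <= 0
    return 10.0 * np.log10(p_speech / p_noise)
```

With estimates like this, training material can be selected (or noise added) so that its SNR distribution matches that of the test match.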

17 Summary of results
WERs are SNR-dependent (tested on the Yug-Ned match)

18 Summary of results
Splitting the words into categories (function words, content words, and football players' names) gives:
WER(function words) > WER(content words) > WER(names)
(tested on the Yug-Ned match)

19 Summary of results
Noise reduction tool (FTNR): small improvement

20 Ongoing work
Techniques to lower WERs:
– tuning of the generic language model:
  – defining different classes
  – reducing OOV words in the lexicon and the language model (using more material)
– speaker adaptation in HTK (note: all other experiments are carried out using Phicos)

21 Ongoing work
Noise robustness:
– extension of the acoustic models by using double deltas
– histogram normalization and FTNR
– SNR-dependent acoustic models
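Double deltas, as mentioned above, append first- and second-order time derivatives to each feature frame. A simple two-frame-difference sketch; real systems (HTK included) typically use a wider regression window, so this is illustrative only:

```python
import numpy as np

def add_deltas(feats):
    """Append delta and double-delta coefficients to a
    (frames x dims) feature matrix, tripling its width."""
    # Pad by repeating the edge frames so every frame has neighbours.
    padded = np.pad(feats, ((1, 1), (0, 0)), mode="edge")
    delta = (padded[2:] - padded[:-2]) / 2.0        # first derivative
    padded_d = np.pad(delta, ((1, 1), (0, 0)), mode="edge")
    ddelta = (padded_d[2:] - padded_d[:-2]) / 2.0   # second derivative
    return np.hstack([feats, delta, ddelta])
```

The deltas capture how the spectrum changes over time, which tends to be more robust to stationary background noise than the static coefficients alone.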

22 Recommendations
Acoustic modeling:
– record commentary and stadium noise separately
– speaker adaptation:
  – transcribe characteristics of the commentator
  – collect more speech data from the commentator

23 Recommendations
Lexicon and language modeling:
– collect orthographic transcriptions of spoken material, instead of written material:
  – subtitles
  – closed captions
