Speech recognition in MUMIS
Eric Sanders (KUN)
March 2003
People involved at KUN: Helmer Strik, Judith Kessens, Mirjam Wester, Janienke Sturm, Eric Sanders, Febe de Wet, Paul Tielen
Overview
- Speech data
- Baseline recognition
- Adding data
- Noise robustness
- Word types
- Conclusions
Examples of Data (from Yugoslavia – The Netherlands)
Dutch: “op _t ogenblik wordt in dit stadion de opstelling voorgelezen” (“at the moment the line-up is being read out in this stadium”)
English: “and they wanna make the change before the corner”
German: “und die beiden Tore die die Hollaender bekommen hat haben” (“and the two goals that the Dutch conceded”; disfluent in the original)
Speech Data: all data

                 Dutch   English    German
  # matches          6         3        21
  # words        40,296    34,684   127,265
Speech Data: test data (# words)

  Match                          Dutch   English   German
  Yugoslavia – The Netherlands   5,922    10,188    3,998
  England – Germany              5,798    13,488    7,280
Baseline recognition
- Phone models (PMs): trained on the other test match
- Lexicon: based on the other test set, with match-specific words added
- Language model (LM): category LM, based on the other test match, with match-specific words added
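The category LM on this slide can be sketched as a class-based bigram model: match-specific words such as player names are mapped to a shared class token, so n-gram statistics learned on one match carry over to another. Everything below is illustrative; the player names, the `<PLAYER>` class token, and the toy counts are assumptions, not taken from the MUMIS data or tools.

```python
from collections import Counter

# Sketch of a class-based ("category") bigram LM. CATEGORIES maps
# match-specific words onto a shared class; all names are made up.
CATEGORIES = {"kluivert": "<PLAYER>", "mihajlovic": "<PLAYER>"}

def to_classes(tokens):
    return [CATEGORIES.get(t, t) for t in tokens]

def train_bigrams(sentences):
    """Count class-level unigrams and bigrams over tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + to_classes(sent) + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    # Unsmoothed maximum-likelihood estimate at the class level.
    prev, word = CATEGORIES.get(prev, prev), CATEGORIES.get(word, word)
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
```

Because “mihajlovic” maps to the same class as “kluivert”, it inherits the class bigram probabilities even if it never occurred in the training transcripts; this is what makes adding match-specific words to a category LM cheap.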
Baseline recognition (results figure)
Adding Data
Extra training data:
- Dutch: 4 matches
- English: 1 match
- German: 19 matches
The added data is used to train the lexicon and the language models; the phone models remain trained on 1 match.
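One way to read the lexicon part of this step is that the word list is simply rebuilt from the larger transcript pool, with the match-specific words added on top. A minimal sketch under that assumption; the helper name and the frequency cutoff are hypothetical, not the MUMIS procedure:

```python
from collections import Counter

# Hypothetical sketch: rebuild the recognition lexicon from extra
# transcripts, keeping words above a frequency cutoff, then add the
# match-specific words (e.g. player names) unconditionally.
def build_lexicon(transcripts, match_specific_words, min_count=1):
    counts = Counter(w for t in transcripts for w in t.split())
    frequent = {w for w, c in counts.items() if c >= min_count}
    return frequent | set(match_specific_words)
```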
Adding Data: German (results figure)
Noise Robustness: Dutch, English, German (figure)
Noise Robustness (results figure)
Possible solutions:
- Matching the acoustic properties of training and test material
- Training SNR-dependent phone models
- Applying noise-robust feature extraction: histogram normalisation & FTNR
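Histogram normalisation, the first of the noise-robust front-ends named above, can be sketched as quantile-mapping each feature dimension onto a reference distribution so that training and test features share the same statistics. This is a generic sketch, not the MUMIS implementation; the choice of a standard normal as target and the per-utterance application are assumptions:

```python
from statistics import NormalDist

# Generic sketch of histogram normalisation for one feature dimension:
# each value is replaced by the standard-normal quantile of its empirical
# CDF position. Applied independently to every feature dimension (e.g.
# each MFCC coefficient) over the frames of an utterance.
def histogram_normalise(column):
    """column: one feature dimension over all frames of an utterance."""
    n = len(column)
    norm = NormalDist()
    order = sorted(range(n), key=lambda i: column[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = norm.inv_cdf((rank + 0.5) / n)  # empirical CDF -> normal quantile
    return out
```

The mapping is monotone, so the rank order of the original feature values is preserved; only their distribution changes.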
Noise Robustness: Yugoslavia – The Netherlands, very noisy (figure)
Word Types
Not all words are equally important for an information retrieval task.
Categories:
- function words (prepositions, pronouns)
- application-specific words (player names)
- other content words
WERs are computed for the different categories.
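Per-category WERs like these can be computed by Levenshtein-aligning the reference and hypothesis transcripts and bucketing the errors by the reference word's category. A sketch under assumed, illustrative category word lists (the Dutch function words and player names below are made up for the example):

```python
from collections import Counter

FUNCTION_WORDS = {"de", "het", "in", "op"}   # assumed examples
PLAYER_NAMES = {"kluivert", "davids"}        # assumed examples

def category(word):
    if word in FUNCTION_WORDS:
        return "function"
    if word in PLAYER_NAMES:
        return "application"
    return "content"

def align(ref, hyp):
    """Levenshtein alignment; yields (ref_word_or_None, hyp_word_or_None)."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None)); i -= 1          # deletion
        else:
            pairs.append((None, hyp[j - 1])); j -= 1          # insertion
    return pairs[::-1]

def wer_by_category(ref, hyp):
    """Error rate per category; insertions are charged to the hyp word's category."""
    errors, counts = Counter(), Counter()
    for r, h in align(ref, hyp):
        if r is not None:
            counts[category(r)] += 1
            if r != h:
                errors[category(r)] += 1
        else:
            errors[category(h)] += 1
    return {c: errors[c] / counts[c] for c in counts}
```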
Word Types (results figure)
Conclusions
- SNR values explain the WERs to a large extent
- More data is not necessarily better
- Applying noise-robust features gives the best results
- Overall WERs are very high, but application-specific words are recognised relatively well
The end