
1. ATraNoS: Automatic Transcription and Normalisation of Speech
Jacques Duchateau, Patrick Wambacq, Johan Depoortere, Jean-Pierre Martens, Vincent Vandeghinste, Frank Van Eynde, Erik Tjong Kim Sang, Walter Daelemans
DARTS 2002, 8 October 2002

2. Outline
• Project overview
• Tasks + results
• Conclusions

3. ATraNoS
• Automatic Transcription and Normalisation of Speech
• IWT-STWW TOP project, 2 x 2 years, €1.25M
• Started 1 October 2000
• Partners: ESAT/KULeuven, ELIS/UGent, CCL/KULeuven, CNTS/UIA

4. Project aims
• Automatic transcription of spontaneous speech
• Conversion of the transcriptions according to the application, e.g. subtitling (the test vehicle in this project)

5. Work packages
• WP1: segmentation of the audio stream into homogeneous segments (ELIS):
  – preprocessor for the speech decoder
  – segments containing a single type of signal (wideband speech, telephone speech, background, etc.)
  – label segments, cluster speakers
  – induce only a small delay

6. WP1 results: speech/non-speech segmentation using GMMs (Gaussian Mixture Models)
• 65% of the non-speech is removed while preserving more than 98% of the speech
• The mean duration of the speech segments is 40 seconds (already easy to handle)
• Performance in accordance with the literature
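
To make the GMM approach concrete, here is a minimal sketch of speech/non-speech frame classification with two Gaussian mixture models, one trained on speech and one on non-speech audio. The MFCC features, mixture sizes, file names and the majority-vote smoothing window are illustrative assumptions, not the project's actual configuration.

```python
# A minimal sketch of GMM-based speech/non-speech frame classification,
# assuming labelled training audio and MFCC features.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Load audio and return one MFCC vector per frame (frames x n_mfcc)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Hypothetical training material: files known to contain only speech or
# only non-speech (music, noise, silence).
speech_X = np.vstack([mfcc_frames(f) for f in ["speech1.wav", "speech2.wav"]])
nonspeech_X = np.vstack([mfcc_frames(f) for f in ["music1.wav", "noise1.wav"]])

gmm_speech = GaussianMixture(n_components=32, covariance_type="diag").fit(speech_X)
gmm_nonspeech = GaussianMixture(n_components=32, covariance_type="diag").fit(nonspeech_X)

def classify(path, win=50):
    """Label each frame, then smooth with a majority vote over `win` frames."""
    X = mfcc_frames(path)
    llr = gmm_speech.score_samples(X) - gmm_nonspeech.score_samples(X)
    raw = llr > 0.0                                   # True = speech frame
    smooth = np.convolve(raw.astype(float), np.ones(win) / win, mode="same") > 0.5
    return smooth                                     # boolean mask per frame
```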

7. WP1 results: segmentation of speech segments using BIC (Bayesian Information Criterion)
• Recall = 65%: detection of 72.5% of the speaker changes and 24.3% of the acoustic condition changes, with 19.0% false alarms
• Recall = 72%: detection of 78.5% of the speaker changes and 37.4% of the acoustic condition changes, with 41.3% false alarms
• Results competitive with the literature
• Very fast algorithm (1 minute of computation per hour of audio)
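
The BIC criterion compares modelling a window of feature vectors with one Gaussian versus two Gaussians split at a candidate change point. Below is a minimal sketch of that test for a single window; the penalty weight, window size and the random placeholder features are assumptions for illustration, and the real system slides this test along the whole signal.

```python
# A minimal sketch of the BIC test for one candidate speaker-change point.
import numpy as np

def delta_bic(X, t, lam=1.0):
    """X: (N, d) feature window, t: candidate change index.
    Positive values favour a change point at t."""
    N, d = X.shape
    X1, X2 = X[:t], X[t:]
    logdet = lambda A: np.linalg.slogdet(np.cov(A, rowvar=False))[1]
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return (0.5 * N * logdet(X)
            - 0.5 * len(X1) * logdet(X1)
            - 0.5 * len(X2) * logdet(X2)
            - lam * penalty)

# Hypothetical usage: test every candidate point in a window, keep the
# maximum, and declare a change if the score exceeds zero.
X = np.random.randn(500, 13)              # placeholder feature window
scores = [delta_bic(X, t) for t in range(50, 450)]
best = int(np.argmax(scores)) + 50
if scores[best - 50] > 0:
    print("speaker change hypothesised at frame", best)
```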

8. Work packages (cont'd)
• WP2: detection and handling of OOV words:
  – extension of the lexicon (CCL): a compounding module → reduced OOV rate
  – augment recognition results with confidence measures (ESAT): OOV detection
  – phoneme-to-grapheme conversion (CNTS): transcribe OOV words

9. Architecture
Speech recognizer (input: speech, output: text) → confidence threshold → suspected OOV words → phoneme recognizer → phoneme string → P2G converter → spelling → spelling correction with a large vocabulary (the P2G converter is built from training data)
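
As a rough illustration of how the components on this slide fit together, the sketch below wires up placeholder functions for the recognizer, phoneme recognizer and P2G converter. The function names, the confidence threshold value and the difflib-based spelling correction are assumptions, not the project's implementation.

```python
# A minimal sketch of the OOV-handling pipeline, with placeholder components.
def transcribe(audio, asr, phoneme_rec, p2g, lexicon, threshold=0.5):
    """Replace low-confidence words with spellings produced by
    phoneme recognition followed by phoneme-to-grapheme conversion."""
    output = []
    for word, confidence, segment in asr(audio):      # word hypotheses + scores
        if confidence >= threshold:
            output.append(word)
        else:                                          # suspected OOV word
            phonemes = phoneme_rec(segment)            # free phoneme decoding
            spelling = p2g(phonemes)                   # grapheme hypothesis
            output.append(spell_correct(spelling, lexicon))
    return " ".join(output)

def spell_correct(word, lexicon):
    """Snap the P2G output to the closest entry in a large word list,
    or keep it as-is if nothing is close enough (toy similarity check)."""
    from difflib import get_close_matches
    matches = get_close_matches(word, lexicon, n=1, cutoff=0.8)
    return matches[0] if matches else word
```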

10. WP2 results: detection and handling of out-of-vocabulary (OOV) words
• Compounding module in combination with ASR: recognition accuracy does not drop because of the shorter lexical units; after recomposition there is a 10 to 20% relative improvement in OOV rate compared with the baseline
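
A minimal sketch of what a lexicon-driven compounding module can look like: an unknown word is split into parts that the recognizer lexicon does know, optionally joined by a Dutch linking element. The linking elements, minimum part length and toy lexicon are assumptions for illustration.

```python
# A minimal sketch of lexicon-driven compound splitting.
LINKERS = ("", "s", "en")   # assumed Dutch linking elements

def split_compound(word, lexicon, min_len=3):
    """Return a list of lexicon words covering `word`, or None."""
    if word in lexicon:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head not in lexicon:
            continue
        for link in LINKERS:
            tail = word[i:]
            if tail.startswith(link) and tail[len(link):] in lexicon:
                return [head, tail[len(link):]]
    return None

# Toy usage: "voetbalwedstrijd" splits into parts the recognizer does know,
# so it no longer counts as an out-of-vocabulary word.
lexicon = {"voetbal", "wedstrijd", "nieuws", "lezer"}
print(split_compound("voetbalwedstrijd", lexicon))   # ['voetbal', 'wedstrijd']
```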

11. WP2 results: detection and handling of out-of-vocabulary (OOV) words
• Confidence measures with ASR: based on a combination of measures from the literature, plus own work
• Phoneme-to-grapheme conversion based on machine learning methods
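
To illustrate the machine-learning view of phoneme-to-grapheme conversion, the sketch below treats it as per-position classification over phoneme context windows, in the spirit of memory-based learning. The hand-aligned toy training pair, window size and back-off rule are assumptions; the real system derives its alignments and uses far more data.

```python
# A minimal sketch of phoneme-to-grapheme conversion as memory-based
# classification over phoneme context windows.
from collections import Counter, defaultdict

def windows(symbols, size=1, pad="_"):
    """Fixed-size context window around each position."""
    padded = [pad] * size + list(symbols) + [pad] * size
    return [tuple(padded[i:i + 2 * size + 1]) for i in range(len(symbols))]

class P2G:
    def __init__(self, size=1):
        self.size = size
        self.memory = defaultdict(Counter)     # window -> grapheme counts

    def train(self, aligned_pairs):
        """aligned_pairs: (phoneme sequence, grapheme sequence) of equal length."""
        for phons, graphs in aligned_pairs:
            for win, g in zip(windows(phons, self.size), graphs):
                self.memory[win][g] += 1

    def convert(self, phons):
        out = []
        for win in windows(phons, self.size):
            if win in self.memory:
                out.append(self.memory[win].most_common(1)[0][0])
            else:
                out.append(win[self.size])     # back off to the phoneme itself
        return "".join(out)

# Toy usage with one hand-aligned pair: /s x i p/ -> "s ch i p" (schip).
p2g = P2G()
p2g.train([(["s", "x", "i", "p"], ["s", "ch", "i", "p"])])
print(p2g.convert(["s", "x", "i", "p"]))   # prints "schip"; unseen contexts back off
```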

12. P2G converter results
Performance:
                   all words   OOVs
  grapheme level      75.9     63.8
  word level          44.0      7.6
Spelling correction: net effect 8.6 (on OOVs)
(Simulated) interaction with the speech recognizer: increases WER, but improves readability

13. Work packages (cont'd)
• WP3: spontaneous speech problems:
  – detection of disfluencies (ELIS): use acoustic/prosodic features; supply this information to the HMM recognizer
  – statistical language model (ESAT): extend the traditional trigram LM to incorporate hesitations, filled pauses, self-corrections and repetitions → a sequence of clean speech islands (a sketch of this idea follows below)
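
The sketch below illustrates the idea of a trigram language model whose history skips over disfluency tokens, so that probabilities are estimated from the surrounding clean speech islands. The disfluency marker set and the add-one smoothing are assumptions for illustration, not the ESAT model.

```python
# A minimal sketch of a trigram LM whose history skips disfluency markers.
from collections import Counter

DISFLUENT = {"<uh>", "<um>", "<repeat>"}   # assumed marker set

class SkipTrigramLM:
    def __init__(self):
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = set()

    def train(self, sentences):
        for sent in sentences:
            clean = [w for w in sent if w not in DISFLUENT]
            words = ["<s>", "<s>"] + clean + ["</s>"]
            self.vocab.update(words)
            for a, b, c in zip(words, words[1:], words[2:]):
                self.tri[(a, b, c)] += 1
                self.bi[(a, b)] += 1

    def prob(self, history, word):
        """P(word | last two clean words of history), add-one smoothed."""
        clean = [w for w in history if w not in DISFLUENT]
        a, b = (["<s>", "<s>"] + clean)[-2:]
        return (self.tri[(a, b, word)] + 1) / (self.bi[(a, b)] + len(self.vocab))

# Toy usage: the filled pause <uh> does not break the trigram context.
lm = SkipTrigramLM()
lm.train([["the", "match", "starts", "tonight"]])
print(lm.prob(["the", "<uh>", "match"], "starts"))
```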

14. Work packages (cont'd)
• WP4: subtitling:
  – data collection and automatic alignment (CNTS)
  – input/output specifications (CCL): linguistic characteristics
  – subtitling: statistical approach (CNTS); a toy compression sketch follows below
  – subtitling: linguistic approach (CCL)
  – hybrid system possible?
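
As a toy illustration of the statistical view of subtitling as sentence compression, the sketch below drops words until a line fits a length limit, using a stand-in scoring function where the real system would use a model trained on autocue/subtitle pairs. The length limit, function-word list and scoring heuristic are assumptions.

```python
# A minimal sketch of subtitle compression by iterative word deletion.
def compress(words, drop_score, max_chars=70):
    """Drop the most droppable words until the sentence fits."""
    kept = list(words)
    while len(" ".join(kept)) > max_chars and len(kept) > 1:
        idx = max(range(len(kept)), key=lambda i: drop_score(kept, i))
        kept.pop(idx)
    return " ".join(kept)

# Toy drop scorer: prefer dropping short function-like words (a stand-in
# for a learned model trained on autocue/subtitle pairs).
FUNCTION_WORDS = {"the", "a", "an", "this", "that", "very", "really"}
score = lambda ws, i: (2.0 if ws[i].lower() in FUNCTION_WORDS else 0.0) + 1.0 / len(ws[i])

print(compress("The prime minister announced the new measures this afternoon".split(),
               score, max_chars=40))
```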

15. Data collection and alignment
News autocues and subtitles are captured (semi-)automatically and aligned (semi-)automatically; after linguistic annotation, the aligned autocue/subtitle pairs serve as training data for a machine learner that produces a classifier.
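
A minimal sketch of the alignment step that turns captured autocues and subtitles into training pairs: each autocue sentence is greedily matched to its most similar subtitle. The difflib similarity measure and the threshold are assumptions; the actual alignment is only semi-automatic.

```python
# A minimal sketch of autocue-to-subtitle alignment for training data.
from difflib import SequenceMatcher

def align(autocue_sents, subtitle_sents, threshold=0.6):
    """Greedily pair each autocue sentence with its most similar subtitle."""
    pairs = []
    for a in autocue_sents:
        scored = [(SequenceMatcher(None, a.lower(), s.lower()).ratio(), s)
                  for s in subtitle_sents]
        score, best = max(scored)
        if score >= threshold:
            pairs.append((a, best))        # (full sentence, condensed subtitle)
    return pairs

# Toy usage with one sentence pair and a lenient threshold.
autocues = ["The prime minister announced new measures this afternoon."]
subtitles = ["PM announces new measures."]
print(align(autocues, subtitles, threshold=0.3))
```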

16. Conclusions

