Dirk Van Compernolle, Atranos Workshop, Leuven, 12 April 2002. Automatic Transcription of Natural Speech - A Broader Perspective – Dirk Van Compernolle ESAT.


1 Automatic Transcription of Natural Speech - A Broader Perspective – Dirk Van Compernolle, ESAT - KULeuven

2 The Goal of Speech Recognition may be defined as “to allow for a natural and non-intrusive user option between either speech input or text input for any application”

3 Applications of Speech Recognition
INFORMATION RETRIEVAL
  – GOAL: To understand a spoken query
  – ROLE of ASR: “speech recognition” is a tool; full accuracy may not be required as long as the essential elements are understood
TRANSCRIPTION
  – GOAL: To make a written version of a spoken document
  – ROLE of ASR: “speech recognition” is a goal in itself
AUDIO MINING, SPEECH TRANSLATION
  – Are in theory different from their textual cousins
  – In practice use a combination of ‘speech recognition for transcription’ and text mining, text translation, ...

4 Specialization in Speech Recognition
Is natural because the requirements depend on the application area
Is required because speech recognition technology is still far from perfect
Is achieved by limiting
  – Vocabulary size
  – Task complexity
  – Speaking style
  – Acoustic variability
  – Speaker variability
  – Language variability

5 Speech Recognition Problems
[Figure: tasks placed by vocabulary size (axis ticks 2, 20, 200, 2000, 20000) and speaking style (Spontaneous, Fluent, Read, Connected, Isolated), with an arrow labeled “More difficult”. Tasks shown: Voice commands, Name dialing, Directory assistance, Dialog systems, Office dictation, Natural conversation.]

6 Gradually increasing task complexity in research projects
Small vocabulary (1970s-1980s)
  – isolated words, spelling
  – digit strings (TIdigits)
  – discrete dictation
Towards large vocabulary read speech (1987-1996)
  – medium vocabulary continuous tasks (RM, ATIS)
  – large vocabulary read speech (WSJ)
Towards unconstrained speech (1996-…)
  – Transcription of Broadcast news (ABN)
  – Mixed environmental conditions
  – Mixed speaking styles & speakers
  – Spontaneous speech (Switchboard, CallHome)

7 Research Benchmarks: Humans vs Machines
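
The figure for this slide did not survive in the transcript, so the metric it plotted is not known here. As an illustration only: recognition benchmarks of this kind are commonly scored by word error rate (WER), the word-level edit distance between a reference transcript and the recognizer output, divided by the reference length. The sketch below is a minimal version under that assumption; the function name `word_error_rate` and the whitespace tokenization are choices made for this example, not something taken from the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b", "a x b y")` counts two insertions against a two-word reference and returns 1.0, which shows that WER can reach or exceed 100%.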

8 Why are we shifting the evaluation paradigm?
We prefer to do research on what is difficult, but not on what is impossible
The older (easier) tasks are more artificial in nature and do not sufficiently reflect how speech recognition will perform in real life
Progress is hard to measure once task performance becomes reasonable; industry should take over at that point
Implicit over-training on specific test material

9 Transcription vs. Speech Recognition
Speech recognition always provides a transcription, but the requirements placed on that transcription may vary:
  – Conceptual / Keyword Transcription
  – Textual transcription
  – Transcription of non-verbal events
      Hesitations, background noise …
      Speaker turns …
      Intonation, hidden intentions …
  – Edited / Normalized transcription
      Correction of restarts, misspeaks, …
      Grammatical corrections
      Shortened transcription for closed captioning
      Markup for document layout
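
The step from a textual transcription to an ‘Edited / Normalized transcription’ listed above can be sketched in a few lines. This is a toy illustration, not the method used in any of the projects mentioned: the filler list `FILLERS` and the function `normalize_transcript` are hypothetical, and only two cases are handled (dropping hesitation fillers and collapsing an immediately repeated word, as a crude stand-in for restart correction).

```python
# Hypothetical filler inventory; a real system would need a
# language-specific and much richer list.
FILLERS = {"uh", "um", "eh", "er"}


def _key(token: str) -> str:
    """Case- and punctuation-insensitive form used for comparisons."""
    return token.lower().strip(",.")


def normalize_transcript(verbatim: str) -> str:
    """Return an edited transcript without fillers or immediate repeats."""
    edited = []
    for token in verbatim.split():
        if _key(token) in FILLERS:
            continue  # drop hesitations
        if edited and _key(edited[-1]) == _key(token):
            continue  # drop an immediately repeated word
        edited.append(token)
    return " ".join(edited)
```

Calling `normalize_transcript("I uh I think the the meeting um starts at nine")` returns `"I think the meeting starts at nine"`. Genuine repetitions ("very very good") are also collapsed by this rule, which is one reason real normalization needs more than string matching.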

10 Applications of ‘Transcription per se’
Transcription of ‘Dictated Speech’
  – Document Generation
      Commercial packages available for the public at large and for specific professional groups (doctors, lawyers)
      Cooperative users, controlled acoustic conditions
      Works well if sufficiently close to ‘read speech’
Transcription of ‘Available Speech’
  – Meeting Transcription
      Examples: parliament, court hearings, …
      Spontaneous & non-formal, multi-speaker
      Attention required to non-verbal information
  – Broadcast transcription
      Multi-speaker & multi-environment
      Read speech & spontaneous dialogues
      Attention required to specific usage

11 “Transcription of Natural Speech”
[Figure: the same chart as slide 5, with tasks placed by vocabulary size (axis ticks 2, 20, 200, 2000, 20000) and speaking style (Spontaneous, Fluent, Read, Connected, Isolated), and an arrow labeled “More difficult”. Tasks shown: Voice commands, Name dialing, Directory assistance, Dialog systems, Office dictation, Natural conversation.]

12 Research Projects world-wide
DARPA sponsored projects on ‘broadcast transcription’
  – Started in 1995
  – Common test sets
  – Typically fewer than 10 labs participate in the official benchmark tests
  – Dominated by the English language
‘Other’ projects
  – Many ‘national’ projects focusing on local languages
  – Focus on different aspects of the ‘transcription problem’
  – Examples today: ALERT, DRUID, ATRANOS

