
2 Spoken Language Technologies: A review of application areas and research issues
Analysis and synthesis of F0 contours
Agnieszka Wagner
Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań
Humboldt-Kolleg, Słubice, 13-15 November 2008

3 Spoken Language Technologies: Introduction (1)
The need for and increasing interest in SLT systems:
- oral information is more efficient than a written message
- speech is the easiest and fastest way of communication (man – man, man – machine)
Progress in the field:
- technological advances in computer science
- availability of specialized speech analysis and processing tools
- collection and management of large speech corpora
- investigation of acoustic dimensions of speech signals: fundamental frequency (F0), duration, intensity and spectral characteristics

4 Spoken Language Technologies: Introduction (2)
The tasks of SLT systems (TTS and ASR)
Speech synthesis (TTS, text-to-speech) systems
- generate a speech signal for a given input text
- example: BOSS (Polish module developed at the Dept. of Phonetics in cooperation with IKP, Uni Bonn)
- ECESS (European Centre of Excellence in Speech Synthesis): standards for the development of language resources, tools, modules and systems
Automatic speech recognition (ASR) systems
- provide the text of the input speech signal
- example: Jurisdic (first Polish ASR system for the needs of the Police, Public Prosecutors and Administration of Justice)

5 Spoken Language Technologies: Application areas
Speech synthesis
- telecommunications (access to textual information over the telephone)
- information retrieval
- measurement and control systems
- fundamental & applied research on speech and language
- a communication tool, e.g. for the visually handicapped
Speech recognition & related technologies
- text dictation
- information retrieval & management
- man-machine communication (together with speech synthesis): dialogue systems, speech-to-speech translation, Computer Assisted Language Learning, CALL (e.g. the AZAR tutoring system developed within the EURONOUNCE project)

6 Spoken Language Technologies: Performance of TTS and ASR systems
Speech synthesis
- high intelligibility and naturalness in limited domains (e.g. broadcast news)
Speech recognition
- the best results are obtained for small-vocabulary tasks
- state-of-the-art speaker-independent LVCSR systems achieve a word-error rate of 3%
Generally, output quality is high as regards generation/recognition of the linguistic propositional content of speech.
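The word-error rate quoted on this slide is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that definition only; the function name and the whitespace tokenization are assumptions made for illustration and are not taken from the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: tokenization is plain whitespace splitting,
    which is an assumption, not the scoring protocol of any real system.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this ratio can exceed 1.0 for very poor hypotheses.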

7 Spoken Language Technologies: Limitations of TTS and ASR systems
- insufficient knowledge about methods for processing the non-verbal content of speech, i.e. affective information: the speaker’s attitude, emotional state, mood, interpersonal stances & personality traits
Speech synthesis
- lack of variability in speaking style, which encodes affective information, can be detrimental to communication (e.g. in speech-to-speech translation)
- the data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly
Speech recognition
- transcription of conversational and expressive speech yields a substantially higher word-error rate

8 Spoken Language Technologies: Progress in the field (1)
- the need to model the non-verbal content of speech, i.e. affective information
Applications:
- high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)
- commerce: monitoring of agent-customer interactions, information retrieval and management (e.g. QA5)
- public security, criminology: secured area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)

9 Spoken Language Technologies: Progress in the field (2)
Prosodic features: fundamental frequency (F0, the central acoustic variable that underlies intonation), intensity, duration and voice quality -> encoding and decoding of affective information
Emotion: Anger, Fear, Elation
- higher mean F0
- higher F0 variability
- higher intensity
- increased speaking rate
Emotion: Sadness, Boredom
- lower mean F0
- lower F0 variability
- lower intensity
- decreased speaking rate
Intonation models: hierarchical, sequential, acoustic-phonetic, phonological, etc.
- linguistic variation: well handled
- affective, emotional variation: unaccounted for
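Two of the cues listed on this slide, mean F0 and F0 variability, can be read directly off an F0 track. The sketch below is illustrative only: it assumes the track is a list of per-frame F0 values in Hz with 0 marking unvoiced frames (a common convention, but an assumption here), and it uses the standard deviation and the range as two possible measures of variability. The function name and the returned keys are hypothetical.

```python
import statistics


def f0_summary(f0_track_hz):
    """Summarize an F0 track given as per-frame values in Hz.

    Assumption: frames with a value of 0 (or below) are unvoiced and
    are excluded from all statistics.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean_f0_hz": statistics.fmean(voiced),
        # Sample standard deviation as one measure of F0 variability.
        "f0_sd_hz": statistics.stdev(voiced),
        # Range (max minus min) as a second, cruder measure.
        "f0_range_hz": max(voiced) - min(voiced),
    }
```

Comparing such summaries between utterances only makes sense for the same speaker, or after normalizing for the speaker's pitch range, since absolute F0 differs widely between speakers.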

10 The comprehensive intonation model: Components
- a module of F0 contour analysis
- a module of F0 contour synthesis
- description of intonation: discrete tonal categories (higher-level, access to the meaning of the utterance)
[Diagram labels: acoustic parameters (low-level); intonation description; F0 generation (decoding); analysis (encoding)]

11 The comprehensive intonation model: Analysis and Synthesis
Automatic analysis of F0 contours
Summary
- results comparable to inter-labeler consistency in manual annotation of intonation
- high accuracy achieved using small vectors of acoustic features
- statistical modeling techniques
- application: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
Automatic synthesis of F0 contours
Summary
- estimation of F0 values with a regression model
- results comparable to those reported in the literature
- natural F0 contours (similar to the original ones) for the synthesis of high-quality, comprehensible speech (confirmed in perception tests)
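The slide states only that F0 values are estimated with a regression model; it does not name the model type or the predictors. As a purely illustrative sketch of the general idea, the code below fits an ordinary least-squares line with a single predictor. The choice of simple linear regression, the function names, and the notion of using syllable position as the predictor are all assumptions and should not be read as the author's method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Hypothetical use: xs could be syllable positions within a phrase
    and ys the F0 targets in Hz observed at those positions.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_f0(slope, intercept, x):
    """Estimate an F0 value in Hz for predictor value x."""
    return intercept + slope * x
```

A realistic F0 model would use many predictors, for example tonal category, stress and position in the phrase, but the fit-then-predict pattern stays the same.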

12 The comprehensive intonation model: Synthesis example (1)
Audio (1): Mean opinion in the perception test: no audible difference

13 The comprehensive intonation model: Synthesis example (2)
Audio (2): Mean opinion in the perception test: very good quality

14 Spoken Language Technologies: Future research issues
Extensive and systematic investigation of the mechanisms in voice production and perception of affective speech:
- contribution from other knowledge domains (psychology)
- affective speech data collection
- classification of affective states
- types of acoustic parameters
- measurement of affective inferences

15 THANK YOU FOR YOUR ATTENTION!

