Spoken Language Technologies: A review of application areas and research issues Analysis and synthesis of F0 contours Agnieszka Wagner Department of Phonetics, Institute of Linguistics, Adam Mickiewicz University in Poznań Humboldt-Kolleg, Słubice November 2008

Spoken Language Technologies: Introduction (1)
The need for and increasing interest in SLT systems:
 spoken information is often more efficient than a written message
 speech is the easiest and fastest way of communication (man-man, man-machine)
Progress in the field:
 technological advances in computer science
 availability of specialized speech analysis and processing tools
 collection and management of large speech corpora
 investigation of the acoustic dimensions of the speech signal: fundamental frequency (F0), duration, intensity and spectral characteristics

Spoken Language Technologies: Introduction (2)
The tasks of SLT systems (TTS and ASR)
Speech synthesis (TTS, text-to-speech) systems
 generate a speech signal for a given input text
 example: BOSS (Polish module developed at the Dept. of Phonetics in cooperation with IKP, University of Bonn)
 ECESS (European Centre of Excellence in Speech Synthesis): standards for the development of language resources, tools, modules and systems
Automatic speech recognition (ASR) systems
 transcribe the input speech signal into text
 example: Jurisdic (the first Polish ASR system for the needs of the Police, Public Prosecutors and the Administration of Justice)

Spoken Language Technologies: Application areas
Speech synthesis
 telecommunications (access to textual information over the telephone)
 information retrieval
 measurement and control systems
 fundamental & applied research on speech and language
 a tool of communication, e.g. for the visually handicapped
Speech recognition & related technologies
 text dictation
 information retrieval & management
 man-machine communication (together with speech synthesis): dialogue systems, speech-to-speech translation, Computer Assisted Language Learning (CALL), e.g. the AZAR tutoring system developed within the EURONOUNCE project

Spoken Language Technologies: Performance of TTS and ASR systems
Speech synthesis
 high intelligibility and naturalness in limited domains (e.g. broadcast news)
Speech recognition
 the best results for small-vocabulary tasks
 state-of-the-art speaker-independent LVCSR systems achieve a word-error rate of 3%
Generally, the output quality is high as regards the generation/recognition of the linguistic, propositional content of speech
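The word-error rate quoted for LVCSR systems is the standard ASR metric: the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the reference length. A minimal sketch in Python (the example sentences are invented for illustration):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # one deletion -> 1/6
```

Conversational and expressive speech, discussed on the next slides, drives this number up substantially.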

Spoken Language Technologies: Limitations of TTS and ASR systems
 insufficient knowledge about methods for processing the non-verbal content of speech, i.e. affective information: the speaker's attitude, emotional state, mood, interpersonal stances & personality traits
Speech synthesis
 lack of variability in speaking style, which encodes affective information, can be detrimental to communication (e.g. in speech-to-speech translation)
 the data-driven approach to conversational, expressive speech synthesis is inflexible and quite costly
Speech recognition
 transcription of conversational and expressive speech yields a substantially higher word-error rate

Spoken Language Technologies: Progress in the field (1)
 the need for modeling the non-verbal content of speech, i.e. affective information
Applications:
 high-quality conversational and emotional speech synthesis (for dialogue or speech-to-speech translation systems)
 commerce: monitoring of agent-customer interactions, information retrieval and management (e.g. QA5)
 public security, criminology: secured-area access control (speaker verification), truth-detection investigation (e.g. Computer Voice Stress Analyzer, Layered Voice Analysis)

Spoken Language Technologies: Progress in the field (2)
Prosodic features: fundamental frequency (F0, the central acoustic variable underlying intonation), intensity, duration and voice quality -> encoding and decoding of affective information
 anger, fear, elation: higher mean F0, higher F0 variability, higher intensity, increased speaking rate
 sadness, boredom: lower mean F0, lower F0 variability, lower intensity, decreased speaking rate
Intonation models (hierarchical, sequential, acoustic-phonetic, phonological, etc.): linguistic variation is well handled, but affective, emotional variation remains unaccounted for
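The global prosodic statistics contrasted above (mean F0, F0 variability, and so on) can be extracted directly from an F0 contour. A minimal sketch, assuming one F0 value per 10-ms analysis frame with 0 marking unvoiced frames (both conventions are assumptions for the example, not part of the slides):

```python
import statistics

def prosodic_profile(f0_contour_hz, frame_shift_s=0.01):
    """Summarize an F0 contour (one value per frame, in Hz; 0 = unvoiced)
    with the global statistics used to characterize affective speech."""
    voiced = [f for f in f0_contour_hz if f > 0]
    return {
        "mean_f0": statistics.mean(voiced),        # higher in anger/fear/elation
        "f0_sd": statistics.stdev(voiced),         # F0 variability
        "f0_range": max(voiced) - min(voiced),
        "voiced_duration_s": len(voiced) * frame_shift_s,
    }

print(prosodic_profile([0.0, 200.0, 220.0, 0.0, 180.0]))
```

Comparing such profiles across utterances by the same speaker is one simple way to see the higher-F0, higher-variability pattern associated with high-arousal emotions.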

The comprehensive intonation model: Components
 a module of F0 contour analysis (encoding): maps the F0 contour onto a description of intonation
 a module of F0 contour synthesis (decoding): generates F0 from the intonation description
 a two-level description of intonation: discrete tonal categories (higher-level, giving access to the meaning of the utterance) and acoustic parameters (low-level)
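The analysis module's mapping from low-level acoustic parameters to discrete tonal categories can be illustrated with a toy classifier that labels a pitch movement as a rise, fall or level tone by its excursion size in semitones; the 1.5-semitone threshold is an illustrative assumption, not a value taken from the model described here:

```python
import math

def classify_tone(f0_start_hz, f0_end_hz, threshold_st=1.5):
    """Toy analysis (encoding) step: map a pitch movement onto a
    discrete tonal category by its size in semitones.
    threshold_st is an illustrative assumption."""
    semitones = 12 * math.log2(f0_end_hz / f0_start_hz)
    if semitones > threshold_st:
        return "rise"
    if semitones < -threshold_st:
        return "fall"
    return "level"

print(classify_tone(200.0, 250.0))  # prints "rise"
```

The synthesis (decoding) direction would invert this: given a category and an anchor F0, generate target F0 values for the contour.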

The comprehensive intonation model: Analysis and Synthesis
Automatic analysis of F0 contours (summary):
 results comparable to inter-labeler consistency in manual annotation of intonation
 high accuracy achieved using small vectors of acoustic features and statistical modeling techniques
 applications: 1) automatic labeling of speech corpora, 2) lexical & semantic content, 3) ambiguous parses, 4) estimation of F0 targets
Automatic synthesis of F0 contours (summary):
 estimation of F0 values with a regression model
 results comparable to those reported in the literature
 natural F0 contours (similar to the original ones) for the synthesis of high-quality, comprehensible speech (confirmed in perception tests)
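The regression-based estimation of F0 values mentioned above can be sketched as an ordinary least-squares fit from per-syllable features to F0 targets. The feature set (phrase position, accent flag, vowel duration) and the data points below are invented for illustration and do not reproduce the model in the slides:

```python
import numpy as np

# Hypothetical per-syllable features: (position in phrase, accented?, vowel duration in s)
X = np.array([[0.00, 1.0, 0.12],
              [0.25, 0.0, 0.08],
              [0.50, 1.0, 0.10],
              [0.75, 0.0, 0.09],
              [1.00, 0.0, 0.11]])
y = np.array([220.0, 190.0, 205.0, 180.0, 160.0])  # observed F0 targets (Hz)

# Least-squares fit with an intercept column
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_f0(features):
    """Predict an F0 target (Hz) for one syllable's feature vector."""
    return float(coef[0] + np.dot(coef[1:], features))

print(predict_f0([0.0, 1.0, 0.12]))
```

A real system would use richer features and a larger corpus, but the principle of mapping symbolic/positional features to continuous F0 targets is the same.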

The comprehensive intonation model: Synthesis example (1)
Audio (1): mean opinion in the perception test: no audible difference

The comprehensive intonation model: Synthesis example (2)
Audio (2): mean opinion in the perception test: very good quality

Spoken Language Technologies: Future research issues
Extensive and systematic investigation of the mechanisms of voice production and perception of affective speech:
 contributions from other knowledge domains (e.g. psychology)
 affective speech data collection
 classification of affective states
 types of acoustic parameters
 measurement of affective inferences

THANK YOU FOR YOUR ATTENTION!