SIPCom 8-4 Speech Processing, MM7 - Speech Synthesis - Speech Recognition (Part 1 of 3)
Børge Lindberg, lindberg@kom.aau.dk

Text-to-speech synthesis
- Stages: text analysis, prosody generation, sound generation, synthetic speech (output)
- Accompanying labels in the diagram: lexicon & rules; pitch & duration (stød); diphone database

Why is it so difficult?
- Text normalisation: “kl 12-14”, “8-3=5”, “8-4-1997”, “mio”, “USA”
- Morphological analysis: “periferien” vs. “skoleferien”, “hul”
- Syntactic analysis: “en mand med hul røst dør bag en dør med hul i” (“a man with a hollow voice dies behind a door with a hole in it”)
- Semantic analysis: “The man fed her dog biscuits”
- Sound generation: transitions, time and pitch scaling
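The text normalisation examples on this slide all look alike on the surface (digits and hyphens) but must be read differently. The sketch below illustrates that point only: the pattern table, the category names and the `classify` helper are assumptions made for illustration, not the rules used in the system described in the slides.

```python
import re

# Hypothetical patterns, checked from most to least specific.
PATTERNS = [
    ("date", re.compile(r"^\d{1,2}-\d{1,2}-\d{4}$")),        # 8-4-1997
    ("arithmetic", re.compile(r"^\d+-\d+=\d+$")),            # 8-3=5
    ("time_range", re.compile(r"^kl\.? ?\d{1,2}-\d{1,2}$")),  # kl 12-14
    ("acronym", re.compile(r"^[A-Z]{2,}$")),                 # USA
]

# Hypothetical abbreviation table; "mio" is short for "millioner" (millions).
ABBREVIATIONS = {"mio": "millioner"}

# How the hyphen has to be read in each category.
HYPHEN_ROLE = {"date": "separator", "arithmetic": "minus", "time_range": "range"}


def classify(token: str) -> str:
    """Return a normalisation category for one input token."""
    if token in ABBREVIATIONS:
        return "abbreviation"
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return "word"


if __name__ == "__main__":
    for token in ["kl 12-14", "8-3=5", "8-4-1997", "mio", "USA"]:
        label = classify(token)
        print(token, label, HYPHEN_ROLE.get(label, "-"))
```

A real normaliser would also need context (for instance, a date and a subtraction can be indistinguishable in isolation), which is part of why the task is hard.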

Concatenative synthesis
test = /tEsd/ = /#t/ + /tE/ + /Es/ + /sd/ + /d#/
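The decomposition on this slide can be sketched in a few lines: pad the phoneme string with a boundary symbol and take every adjacent pair. The function name `to_diphones` and the list-of-strings representation are assumptions for illustration; the slides do not show how the system implements this step.

```python
def to_diphones(phonemes, boundary="#"):
    """Split a phoneme sequence into diphone names, including the
    transitions from and to silence (marked with the boundary symbol)."""
    padded = [boundary] + list(phonemes) + [boundary]
    # Each diphone spans two adjacent symbols, so n phonemes give n + 1 units.
    return [left + right for left, right in zip(padded, padded[1:])]


if __name__ == "__main__":
    units = to_diphones(["t", "E", "s", "d"])
    print(" + ".join(f"/{u}/" for u in units))
    # prints: /#t/ + /tE/ + /Es/ + /sd/ + /d#/
```

Each resulting name would then be looked up in the unit database and the corresponding waveform segments joined.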

Di-(tri)phone database
- Database recorded from a male speaker
- Approx. 2600 subword units (di- and triphones)
- Requires pitch, diphone and triphone segmentation

Input to the sound generator

Effect of scaling
- No scaling
- Time scaled
- + pitch scaled
- + energy
- + stød

More examples
- Normal High speaking rate, normal pitch (aalb.wav)
- High speaking rate, normal pitch (fast.wav)
- Low speaking rate, normal pitch (slow.wav)
- Normal speaking rate, high pitch (light.wav)
- Normal speaking rate, low pitch (dark.wav)
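The rate and pitch variants listed here can be thought of as two independent scale factors applied to the prosody targets before sound generation. The sketch below shows only that bookkeeping; the `Target` record, the `scale_prosody` function and the numbers in the usage example are hypothetical, and the signal-processing method the system actually uses to modify the waveform is not stated in the slides.

```python
from dataclasses import dataclass


@dataclass
class Target:
    unit: str           # unit name, e.g. a diphone such as "tE"
    duration_ms: float  # target duration of the unit
    f0_hz: float        # target fundamental frequency


def scale_prosody(targets, rate=1.0, pitch=1.0):
    """Return new targets with speaking rate and pitch scaled.

    rate > 1 means faster speech (shorter units); pitch > 1 means a
    higher voice. The two factors are independent of each other.
    """
    if rate <= 0 or pitch <= 0:
        raise ValueError("rate and pitch must be positive")
    return [Target(t.unit, t.duration_ms / rate, t.f0_hz * pitch) for t in targets]


if __name__ == "__main__":
    # Illustrative values only, not measurements from the slides.
    normal = [Target("#t", 80.0, 100.0), Target("tE", 120.0, 110.0)]
    fast = scale_prosody(normal, rate=2.0)    # high speaking rate, normal pitch
    light = scale_prosody(normal, pitch=1.5)  # normal speaking rate, high pitch
    print(fast[0].duration_ms, light[0].f0_hz)
```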

Evaluation - intelligibility
- 32 test persons
- 156 stimuli in the carrier sentence “Det er <keyword>, de siger” (“It is <keyword>, they say”)
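A carrier-sentence test of this kind can be sketched as follows: each keyword is embedded in the fixed sentence, and the listeners' transcriptions of the keyword are compared with the intended word. The slides do not state how the responses were scored, so the percent-correct measure, the function names and the example keywords and responses below are assumptions for illustration.

```python
CARRIER = "Det er {keyword}, de siger"


def make_stimuli(keywords):
    """Embed each keyword in the carrier sentence."""
    return [CARRIER.format(keyword=k) for k in keywords]


def percent_correct(keywords, responses):
    """Percentage of keyword responses that match the intended keyword.

    `responses` holds one list per listener, aligned with `keywords`.
    Matching ignores case and surrounding whitespace.
    """
    total = 0
    correct = 0
    for listener in responses:
        if len(listener) != len(keywords):
            raise ValueError("each listener needs one response per keyword")
        for keyword, response in zip(keywords, listener):
            total += 1
            if response.strip().lower() == keyword.lower():
                correct += 1
    if total == 0:
        raise ValueError("no responses to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    keywords = ["hul", "dør"]  # example words only
    print(make_stimuli(keywords)[0])
    # Two hypothetical listeners; the second mishears one keyword.
    print(percent_correct(keywords, [["hul", "dør"], ["hul", "bør"]]))
```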

Evaluation - naturalness
- 32 test persons
- 155 stimuli