SPEECH SYNTHESIS
PC Pandey, EE Dept, IIT Bombay
ISTE, IITB, Mumbai, 28 March 2003

Presentation transcript:


Speech units
  Sentences & phrases
  Words
  Syllables
  Phonemes
  Subphonemic acoustic segments
Speech features
  Prosodic (suprasegmental) features
    Intensity variation
    Pitch variation
  Phonemic features
    Articulatory
    Acoustic
    Perceptual

Classification of phonemes
  Vowels
    Pure vowels
    Diphthongs
  Consonants
    Semivowels
    Whisper
    Stops
    Nasals
    Fricatives
    Affricates

Speech production system

Schematic of speech production

Vowel spectrum

Speech synthesis
Generation of speech by a machine
Applications
  Voice response systems (limited vocabulary)
  Text-to-speech synthesis (unlimited vocabulary)
  Analysis-by-synthesis (speech research)
  Generation of speech-like test signals
  Analysis-synthesis systems
    * channel capacity reduction
    * secure communication
    * speech enhancement
    * voice transformation
    * processing for hearing aids

Development of speech synthesizers
  Mechanical / electro-mechanical ( )
  Electronic analog with keyboard input (1930s)
  Electronic analog analysis-synthesis systems ( )
  Digital synthesizer (1950..)
    * software based
    * hardware based

Mechanical synthesizers
  Von Kempelen, 1780
  Wheatstone’s speaking machine

Riesz, 1930s: Speaking machine

Dudley, 1930s: Voder
  Electronic analog synthesizer with mechanical keyboard

Fant, 1950s: OVE

Holmes, 1960s: Parallel formant synthesizer

Klatt, 1970s: Cascade/parallel formant synthesizer

Modern synthesis approaches
  Waveform based
    high quality natural output
    limited vocabulary
    large storage requirement
  Speech model based
    unlimited speech synthesis with small storage
    difficulty in parameter generation & concatenation
  Text-to-speech synthesis
    Text pre-processing & phonetic transcription
    Parsing for syntactic & semantic structure
    Prosodic information & sound units
    Speech waveform generation

Speech model based approaches
  Articulatory
  Source-filter
    * channel vocoder
    * LPC vocoder
    * homomorphic vocoder
    * formant-based synthesizer
  Acoustic
    * phase vocoder
    * sinusoidal model
    * harmonic plus noise model (HNM)
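
To make the source-filter idea concrete, here is a minimal sketch of a cascade formant synthesizer: an impulse train (the source) is passed through a chain of second-order digital resonators (the filter), one per formant. The function names, the 8 kHz sampling rate, and the formant frequencies and bandwidths are illustrative assumptions, not values from the slides, and the impulse-train excitation is a deliberate simplification of a real glottal source.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2], unity gain at DC."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, fs):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, fs)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, n_samples, fs):
    """Crude voiced excitation: one unit impulse per pitch period."""
    period = max(1, int(round(fs / f0_hz)))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def synthesize_vowel(f0_hz, formants, n_samples, fs=8000):
    """Cascade the resonators: the output of one feeds the next."""
    signal = impulse_train(f0_hz, n_samples, fs)
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, fs)
    return signal


# Hypothetical (frequency, bandwidth) pairs in Hz for an /a/-like vowel.
vowel = synthesize_vowel(100.0, [(700.0, 90.0), (1200.0, 110.0), (2600.0, 160.0)], 800)
```

Changing the formant list changes the vowel quality while the pitch stays fixed by the impulse spacing, which is the separation of source and filter that this family of models relies on.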

HARMONIC PLUS NOISE MODEL (Stylianou, 1995; 2001)
Speech signal divided into:
  harmonic part
  noise part
Parameters:
  Harmonic amplitudes and phases
  max. voiced frequency
  V/UV & pitch
  noise parameters
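
As a rough illustration of how these parameters combine, the sketch below builds one frame as a sum of harmonics of the pitch, kept only up to the maximum voiced frequency, plus a noise term. It is a simplification under stated assumptions: the function names and the 8 kHz sampling rate are made up for the example, amplitudes and phases are held constant over the frame, and the noise is plain scaled white noise rather than the spectrally shaped, time-modulated noise a full HNM implementation would use.

```python
import math
import random


def harmonic_part(f0_hz, amplitudes, phases, max_voiced_hz, n_samples, fs):
    """Sum of harmonics k*f0, keeping only those at or below max_voiced_hz."""
    out = []
    for n in range(n_samples):
        t = n / fs
        sample = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            if k * f0_hz > max_voiced_hz:
                break  # higher harmonics are left to the noise part
            sample += amp * math.cos(2.0 * math.pi * k * f0_hz * t + phase)
        out.append(sample)
    return out


def noise_part(noise_gain, n_samples, seed=0):
    """Assumption: white Gaussian noise with a single gain, no spectral shaping."""
    rng = random.Random(seed)
    return [noise_gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def synthesize_frame(voiced, f0_hz, amplitudes, phases, max_voiced_hz,
                     noise_gain, n_samples, fs=8000):
    """Unvoiced frames are noise only; voiced frames are harmonic plus noise."""
    noise = noise_part(noise_gain, n_samples)
    if not voiced:
        return noise
    harm = harmonic_part(f0_hz, amplitudes, phases, max_voiced_hz, n_samples, fs)
    return [h + x for h, x in zip(harm, noise)]


# Illustrative values only: 100 Hz pitch, three harmonics, 250 Hz voiced limit.
frame = synthesize_frame(True, 100.0, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                         250.0, 0.0, 80)
```

With a 250 Hz voiced limit only the first two harmonics survive, so the third amplitude in the example has no effect on the harmonic part.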

IMPLEMENTATION OF HNM

ANALYSIS

SYNTHESIS

SEGMENT CONCATENATION
For generation of longer units from smaller ones.
Steps:
  1) Parsing of phonetic transcript
  2) Fetching the parameters of required units
  3) Pitch and intensity modifications for prosody
  4) Smoothing of the parameter tracks at unit boundaries
  5) Interpolation of the parameters over the frame length from end point values
  6) Synthesis
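
Steps 4 and 5 can be sketched as follows for a single parameter track, for example the pitch value per frame. This is an assumed, minimal realization: the slides do not say which smoothing or interpolation rule was used, so a 3-point moving average around the join and straight linear interpolation stand in for them, and the function names are hypothetical.

```python
def smooth_boundary(left, right, width=2):
    """Join two per-frame parameter tracks and smooth `width` frames on each
    side of the join with a 3-point moving average (assumed smoothing rule)."""
    track = list(left) + list(right)
    join = len(left)
    lo = max(1, join - width)
    hi = min(len(track) - 1, join + width)
    smoothed = track[:]
    for i in range(lo, hi):
        # Average over the unsmoothed neighbours so the result is order-independent.
        smoothed[i] = (track[i - 1] + track[i] + track[i + 1]) / 3.0
    return smoothed


def interpolate_frame(start_value, end_value, n_samples):
    """Linearly interpolate a parameter across one frame from its end point values."""
    if n_samples < 2:
        return [start_value] * n_samples
    step = (end_value - start_value) / (n_samples - 1)
    return [start_value + step * i for i in range(n_samples)]


# Illustrative pitch tracks (Hz) of two units with a 30 Hz jump at the join.
pitch = smooth_boundary([100.0, 100.0, 100.0, 100.0], [130.0, 130.0, 130.0, 130.0])
ramp = interpolate_frame(0.0, 1.0, 5)
```

In the example the abrupt 100 to 130 Hz step becomes 100, 110, 120, 130 across the boundary frames, and the ramp runs from 0.0 to 1.0 in equal steps.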

RESULTS
  All VCV syllables and vowels are natural & intelligible if synthesized using the harmonic part only, except /a∫a/ and /asa/
  HNM preserves the speaking styles (anger, high articulatory rate)
  Synthesized /a∫a/
  Synthesized /asa/

RESULTS (continued)
  GCIs from glottal signal give better synthesis.
  Pitch contours for "/ap kΛhœn ja rΛhE hœn/"
    From glottal signal
    From speech (Childers and Hu, 1994)
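
A pitch contour follows directly from a list of GCIs, read here as glottal closure instants: each pitch period is the gap between two consecutive instants, and the pitch is its reciprocal. The sketch below assumes the instants are already available as times in seconds. How they are detected, from the glottal signal or from speech, is outside its scope, and the function name is hypothetical.

```python
def pitch_contour_from_gcis(gci_times_s):
    """Return (time, f0_hz) pairs, one per pair of consecutive GCIs.
    Each pitch value is placed at the midpoint of its period."""
    contour = []
    for t0, t1 in zip(gci_times_s, gci_times_s[1:]):
        period = t1 - t0
        if period <= 0.0:
            raise ValueError("GCI times must be strictly increasing")
        contour.append((0.5 * (t0 + t1), 1.0 / period))
    return contour


# Illustrative instants: two 10 ms periods followed by one 12 ms period.
contour = pitch_contour_from_gcis([0.0, 0.010, 0.020, 0.032])
```

Because every pitch value depends on only two instants, an error in locating a single GCI shows up directly as a spike in the contour.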

RESULTS (continued)
  Good quality of the larger units constructed from parameters of the smaller units.
  Recorded /Λbʰ Imani/
  Synthesized from /Λbʰ I/, /Ima/, /ani/

DEMONSTRATIONS (items labelled R, HN, H on the slide)
  Cardinal vowels: /a/, /I/, /u/
  Stops: /a k a/, /a g a/, /a kʰ a/, /a gʰ a/
  Fricatives: /a ∫ a/, /a s a/
  Affricates: /a t∫ a/, /a dz a/, /a t∫ʰ a/, /a dzʰ a/
  Word (R, HN): /Λbʰ Imani/
  Sentence (R, HN): /mœn dzΛmmu ja rΛha hun/

Further developments
  High quality multilingual / multi-dialect text-to-speech synthesis
  Voice transformations
  Processing for aids for the hearing impaired
