Frequency Domain Analysis/Synthesis


1 Frequency Domain Analysis/Synthesis
Concerned with reproducing the frequency spectrum of the speech waveform, with less concern for amplitude variation (i.e., the time domain).
A mathematical model of the frequency spectrum is stored and used to control an electronic model of the human vocal tract. By contrast, time-domain methods digitize the speech waveform on a one-to-one analog-to-digital conversion basis.
Two methods are employed:
- Linear Predictive Coding (LPC)
- Formant analysis/synthesis

2 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC)
Example: the Speak & Spell educational toy by Texas Instruments.
The speech waveform is digitized by an ADC using SPCM, then analyzed to extract the frequency, intensity, and other vocal-tract variables needed to reconstruct the waveform mathematically.
The extracted speech data are then coded into a series of linear-equation parameters, called LPC codes, that model the frequency characteristics of the spoken waveform.
The synthesizer circuit is designed as a model of the human vocal tract.
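
The analysis step can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion, which the slides do not name, so treat it as one plausible way to obtain the "linear-equation parameters" rather than the method used in the Speak & Spell. The function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    s[n] is approximated by sum(a[k] * s[n - k] for k in 1..order).
    Returns (coefficients, residual_energy)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break                    # silent frame: nothing left to predict
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # reflection coefficient for stage i
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error


# Usage: recover the coefficients of a known two-pole filter
# from its (decaying) impulse response.
impulse_response = [1.0, 1.3]
for _ in range(2, 200):
    impulse_response.append(1.3 * impulse_response[-1] - 0.4 * impulse_response[-2])

r = autocorrelation(impulse_response, 2)
coefficients, residual = levinson_durbin(r, 2)
print([round(c, 4) for c in coefficients])   # approximately [1.3, -0.4]
```

In a real coder the same analysis would be repeated for each short frame of speech, alongside separate pitch and voicing estimates.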

3 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont)
The synthesizer circuit can be divided into three major sections:
- Excitation source
- Multistage digital filter
- DAC

4 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont)
Excitation source
- Periodic pulse generator: emulates the action of the vocal cords by producing periodic, voiced sound frequencies. The rate at which the vocal cords vibrate determines the pitch of the synthesized sound.
- White noise generator: produces unvoiced sounds (the result of air turbulence in the vocal cavity) by generating a random frequency pattern that is heard as a hissing noise.
- Electronic switch: combines voiced and unvoiced sounds by switching electronically between the two sound generators.
- Amplifier: amplifies the selected sound and passes it to the multistage digital filter circuit.
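
A software analogue of the excitation source might look like the sketch below. The function name, the unit-height pulse, and the uniform noise distribution are assumptions chosen for illustration; the slide describes a hardware circuit and does not specify them.

```python
import random


def excitation(num_samples, voiced, pitch_period, gain, rng=None):
    """Generate one block of excitation samples.

    voiced=True  -> periodic pulse train, one pulse every `pitch_period` samples
    voiced=False -> white noise (the hissing, unvoiced source)
    `gain` plays the role of the amplifier stage.
    """
    rng = rng or random.Random(0)    # fixed seed keeps the sketch repeatable
    if voiced:
        # Periodic pulse generator: a shorter period gives a higher pitch.
        samples = [1.0 if n % pitch_period == 0 else 0.0
                   for n in range(num_samples)]
    else:
        # White noise generator.
        samples = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    # The if/else above is the "electronic switch"; scaling is the amplifier.
    return [gain * s for s in samples]


print(excitation(10, voiced=True, pitch_period=4, gain=2.0))
# [2.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0, 0.0]
```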

5 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont)
Multistage digital filter: shapes, or modulates, the excitation signal the same way the throat, tongue, teeth, and lips modulate vocal cavity sounds.
DAC: converts the digital speech signal to an analog speech signal.
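
The filtering stage can be sketched as an all-pole recursive filter driven by the excitation signal. This direct-form version is an assumption made for readability; hardware synthesizers often use an equivalent multistage lattice structure instead, and the function name is hypothetical.

```python
def all_pole_filter(excitation, coefficients):
    """Shape an excitation signal with predictor coefficients a[1..p]:

        y[n] = x[n] + a[1]*y[n-1] + a[2]*y[n-2] + ... + a[p]*y[n-p]
    """
    output = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(coefficients, start=1):
            if n - k >= 0:           # samples before the start count as zero
                y += a * output[n - k]
        output.append(y)
    return output


# A single pulse through a one-stage filter decays geometrically.
print(all_pole_filter([1.0, 0.0, 0.0, 0.0], [0.5]))
# [1.0, 0.5, 0.25, 0.125]
```

The output samples would then be handed to the DAC to produce the analog speech signal.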

6 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont)
The LPC codes control the following circuit functions:
- Pitch of the voiced sounds
- Selection between voiced and unvoiced sounds
- Amplitude of the excitation signal
- Control of the digital filter, by supplying the filter coefficients

7 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont)
Weakness: it can take several minutes on a large computer to convert just a few seconds of speech to the required LPC format.
Advantages:
- Once coded, the LPC data rate required to reproduce speech is less than 2,400 bps (10 seconds of speech can be stored in less than 2.9 KB of memory).
- Retains all the pitch and accent characteristics.
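
As a quick check on the storage figure, 10 seconds of speech in about 2.9 KB works out to a little under 2,400 bits per second. The comparison with uncompressed PCM at 8 kHz and 8 bits per sample is an assumption added here for scale; the slides do not give a PCM rate.

```python
def bit_rate_bps(storage_bytes, seconds):
    """Bit rate implied by storing `seconds` of speech in `storage_bytes`."""
    return storage_bytes * 8 / seconds


def storage_bytes(bit_rate, seconds):
    """Bytes needed to store `seconds` of speech at `bit_rate` bits per second."""
    return bit_rate * seconds / 8


lpc_rate = bit_rate_bps(2.9 * 1024, 10)     # about 2,376 bps
lpc_bytes = storage_bytes(2400, 10)         # 3,000 bytes, roughly 2.9 KB
pcm_bytes = storage_bytes(8000 * 8, 10)     # 80,000 bytes (assumed 8 kHz, 8-bit PCM)

print(round(lpc_rate), lpc_bytes, pcm_bytes)
```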

10 Frequency Domain Analysis/Synthesis: Formant Analysis/Synthesis
Similar to LPC: it is based on the frequency spectrum found in natural speech and uses the same synthesizer circuit.
Formant analysis/synthesis attempts to generate speech by reconstructing the formants.
Formant: any of several frequency regions of relatively great intensity in a sound spectrum, which together determine the characteristic quality of a vowel sound.
Formant frequencies shift constantly to produce different sounds as you speak.
The formant frequency characteristics of a spoken waveform can be digitally coded and used to control the frequency generators and filters in an electronic synthesizer to reproduce the original speech waveform.
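
One way to picture a single formant in software is as a two-pole digital resonator tuned to the formant frequency. This is a standard textbook construction, not something the slides specify; the function names and the example frequencies are hypothetical.

```python
import math


def resonator_coefficients(formant_hz, bandwidth_hz, sample_rate_hz):
    """Feedback coefficients (b1, b2) of a two-pole resonator:

        y[n] = x[n] + b1*y[n-1] + b2*y[n-2]
    """
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)     # pole radius
    theta = 2.0 * math.pi * formant_hz / sample_rate_hz        # pole angle
    return 2.0 * r * math.cos(theta), -r * r


def resonate(excitation, b1, b2):
    """Run an excitation signal through the resonator."""
    y1 = y2 = 0.0
    output = []
    for x in excitation:
        y = x + b1 * y1 + b2 * y2
        output.append(y)
        y2, y1 = y1, y
    return output


# A formant at 2000 Hz with 100 Hz bandwidth, sampled at 8000 Hz (illustrative values).
b1, b2 = resonator_coefficients(2000.0, 100.0, 8000.0)
ringing = resonate([1.0] + [0.0] * 7, b1, b2)
```

Shifting `formant_hz` from frame to frame mimics the constantly moving formants described above; several resonators, one per formant, would be combined for a vowel.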

11 Frequency Domain Analysis/Synthesis: Formant Analysis/Synthesis (cont)
Original speech formants can be coded and synthesized one word at a time. Individual words are stored and played back to produce connected speech. This is called stored-word or dictionary synthesis.
Weakness: the vocabulary is fixed and limited by the available memory.
Advantage: less complex and more economical.

12 Phoneme Speech Synthesis

13 Phoneme Speech Synthesis (cont)
Most phoneme synthesizers are really LPC synthesizers.
A phoneme synthesizer can be divided into three major sections:
- Lookup ROM: translates each phoneme code into a set of LPC parameters that is applied to the excitation source and the digital filter. The LPC parameters control which excitation source is selected, its pitch, and the filter settings required to produce the given phoneme.
- Excitation source
- Multistage digital filter
A phoneme speech synthesizer can be used in one of two ways:
- Direct phoneme synthesis
- Text-to-speech synthesis
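
The lookup ROM can be pictured as a table keyed by phoneme code. Every code and parameter value in this sketch is invented for illustration; real synthesizer chips define their own phoneme codes and parameter formats.

```python
# Hypothetical ROM contents: phoneme code -> LPC parameter set.
LOOKUP_ROM = {
    0x01: {"voiced": True,  "pitch_period": 80, "gain": 1.0, "coefficients": [0.9, -0.2]},
    0x02: {"voiced": False, "pitch_period": 0,  "gain": 0.3, "coefficients": [0.1, -0.5]},
}


def lookup_parameters(phoneme_codes):
    """Translate a phoneme code string into the LPC parameter sets that
    drive the excitation source and the digital filter."""
    return [LOOKUP_ROM[code] for code in phoneme_codes]


frames = lookup_parameters([0x01, 0x02, 0x01])
print([frame["voiced"] for frame in frames])   # [True, False, True]
```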

14 Phoneme Speech Synthesis (cont): Direct Phoneme Synthesis
The phoneme codes for a given phrase must be determined by the programmer. This sequence of codes is called a phoneme string and is usually stored as part of a speech subroutine in RAM or ROM.
The subroutine is executed when the programmed phrase must be spoken.
For example, a robot might be programmed to say "low voltage" when its battery needs recharging. The phrase is spoken when the voltage-sensing circuit detects the low-voltage condition.

15 Phoneme Speech Synthesis (cont): Direct Phoneme Synthesis (cont)
Developing a phoneme string:
- Determine the phoneme symbols required for each word in the phrase.
- Provide pauses between syllables and words as needed for timing and rhythm.
- Provide intonation for the individual words as well as for the entire phrase.
- Convert the phoneme symbol string to a phoneme code string.
- Execute the phoneme string, listen to the result, and modify it accordingly.
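
The conversion from phoneme symbols to a phoneme code string, and the robot example from the previous slide, can be sketched as follows. The symbol set, the numeric codes, the pause placement, and the voltage threshold are all assumptions made for illustration.

```python
# Hypothetical symbol-to-code table (real synthesizers define their own codes).
PHONEME_CODES = {
    "PAUSE": 0x00, "L": 0x01, "OW": 0x02, "V": 0x03,
    "T": 0x04, "IH": 0x05, "JH": 0x06,
}

# "low voltage" written as phoneme symbols, with a pause between the words.
LOW_VOLTAGE = ["L", "OW", "PAUSE", "V", "OW", "L", "T", "IH", "JH"]


def to_code_string(symbols):
    """Convert a phoneme symbol string to a phoneme code string."""
    return bytes(PHONEME_CODES[symbol] for symbol in symbols)


def battery_message(voltage, threshold):
    """Return the phoneme code string to speak, or an empty string if the
    battery voltage is still above the threshold."""
    if voltage < threshold:
        return to_code_string(LOW_VOLTAGE)
    return b""


print(list(battery_message(10.5, threshold=11.0)))
# [1, 2, 0, 3, 2, 1, 4, 5, 6]
```

The final step in the list, listening and adjusting, stays manual: the programmer edits the symbol list and reruns the conversion.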

20 Phoneme Speech Synthesis (cont): Text to Speech Conversion
Phrases are entered into a computer by keyboard, and the computer performs the code conversion.
Since most computers represent letters and symbols in ASCII code, the program's task reduces to converting ASCII code to phoneme code.
Example of usage: aids for people who are blind or unable to speak.
Written text can be converted to a phoneme code string in three ways:
- Word lookup
- Morpheme lookup
- Phoneme lookup

21 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont)
Word Lookup
Also known as the dictionary method.
The software looks for the ASCII representation of a space to divide the phrase into individual words.
Each word is compared with the dictionary until a match is found. If there is a match, the lookup table produces the phoneme code string required to pronounce the word.
Phoneme code strings are either passed sequentially to a phoneme synthesizer for immediate speech reproduction or stored temporarily in a phoneme memory buffer for subsequent playback.
Weaknesses:
- Less flexible, and needs a large memory.
- A large dictionary requires too much search time.
- Abbreviations, misspelled words, and unusual words might never be found.
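
A minimal sketch of the dictionary method is shown below. The two-word dictionary and its phoneme symbols are hypothetical; a practical system would hold thousands of entries and store numeric phoneme codes.

```python
# Hypothetical dictionary: word -> phoneme symbols.
DICTIONARY = {
    "LOW": ["L", "OW"],
    "VOLTAGE": ["V", "OW", "L", "T", "IH", "JH"],
}


def word_lookup(text):
    """Split text on spaces and look each word up in the dictionary.

    Returns (phonemes, missing): the phoneme symbols for the words that were
    found, and the words that had no dictionary entry.
    """
    phonemes, missing = [], []
    for word in text.upper().split(" "):
        if not word:
            continue                 # skip empty pieces from repeated spaces
        entry = DICTIONARY.get(word)
        if entry is None:
            missing.append(word)
        else:
            phonemes.extend(entry)
    return phonemes, missing


print(word_lookup("low voltage"))
print(word_lookup("low volts"))      # "VOLTS" is reported as missing
```

The `missing` list makes the weakness concrete: anything outside the dictionary simply cannot be spoken.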

22 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont)
Morpheme Lookup
A morpheme is any word or word segment that conveys meaning. Examples: sun in sundown, ortho in orthopedic, blue in blueberry.
Works like the word lookup system in that the morphs are stored in memory.
Weaknesses:
- Text must be dissected and analyzed to produce the appropriate morph string.
- Requires a relatively large amount of computer time and is inefficient: the software must look at all possible ways a given word can be broken up in order to find its morphs.

23 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont)
Morpheme Lookup (cont)
Advantages:
- More flexible than word lookup.
- Only 8,000 or so morphs need to be stored to obtain a very large English vocabulary.
- New and unusual words rarely need to be added to the dictionary, since in most cases they consist of existing morphs.
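
The search over "all possible ways a word can be broken up" can be sketched with a short recursive function. The morph list here is a tiny hypothetical sample built from the examples above.

```python
# Hypothetical morph dictionary (a real one would hold thousands of entries).
MORPHS = {"sun", "down", "blue", "berry"}


def segmentations(word):
    """Return every way of splitting `word` into known morphs.

    Each result is a list of morphs; an empty result list means the word
    cannot be built from the stored morphs.
    """
    if not word:
        return [[]]                  # one way to split nothing: no morphs
    results = []
    for end in range(1, len(word) + 1):
        prefix = word[:end]
        if prefix in MORPHS:
            for rest in segmentations(word[end:]):
                results.append([prefix] + rest)
    return results


print(segmentations("sundown"))      # [['sun', 'down']]
print(segmentations("sunberry"))     # [['sun', 'berry']]
print(segmentations("sunset"))       # []
```

The made-up word "sunberry" is handled without a new dictionary entry, which is the flexibility the slide describes; the nested search is also why the method costs more computer time than word lookup.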

24 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont)
Phoneme Lookup
The most efficient and flexible method.
Also known as letter-to-phoneme lookup, because the software attempts to convert each individual text letter or symbol to its corresponding phoneme.
A system developed by the Naval Research Laboratory (NRL) uses production rules to convert written text into phonemes:
IF <left context> (text) <right context> THEN <phonemes>
Context symbols:
# Context must be one or more vowels
: Context must be zero or more consonants
! Context must be a non-alphanumeric character (e.g. space, punctuation mark, mathematical symbol)

25 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont)
Phoneme Lookup (cont)
Example: IF #: (AL)! THEN UH, L
The left context #: means that the text before AL must be one or more vowels followed by zero or more consonants, reading from left to right.
The right context is a single exclamation mark (!), so AL must be followed by a non-alphanumeric character.
The word FICTIONAL, for example, satisfies the rule: AL is preceded by the vowels IO and the consonant N, and is followed by the space or punctuation that ends the word.
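
The context test in such a rule can be sketched with regular expressions. Treating A, E, I, O, U as the vowel set, and padding the text with spaces so that the start and end of the input count as non-alphanumeric, are assumptions of this sketch rather than details taken from the NRL system.

```python
import re

# Context symbols from the rule notation, expressed as regex fragments.
CONTEXT_SYMBOLS = {
    "#": "[AEIOU]+",                      # one or more vowels (assumed vowel set)
    ":": "[BCDFGHJKLMNPQRSTVWXYZ]*",      # zero or more consonants
    "!": "[^A-Z0-9]",                     # one non-alphanumeric character
}


def context_to_regex(context):
    """Translate a context string such as '#:' into a regex fragment."""
    return "".join(CONTEXT_SYMBOLS.get(ch, re.escape(ch)) for ch in context)


def rule_matches(text, position, left, target, right):
    """Check whether `target` occurs at `position` in `text` with the given
    left and right contexts."""
    padded = " " + text.upper() + " "     # input boundaries act like spaces
    start = position + 1
    if not padded.startswith(target, start):
        return False
    left_ok = re.search(context_to_regex(left) + "$", padded[:start]) is not None
    right_ok = re.match(context_to_regex(right), padded[start + len(target):]) is not None
    return left_ok and right_ok


# IF #: (AL)! THEN UH, L
print(rule_matches("FICTIONAL", 7, "#:", "AL", "!"))   # True
print(rule_matches("FINALLY", 3, "#:", "AL", "!"))     # False: AL is followed by L
print(rule_matches("ALSO", 0, "#:", "AL", "!"))        # False: no vowel before AL
```

A full converter would try an ordered list of such rules at each position and emit the phonemes of the first rule that matches.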
