1 Frequency Domain Analysis/Synthesis: Concerned with reproducing the frequency spectrum within the speech waveform, with less concern for amplitude variation (i.e., the time domain). A mathematical model of the frequency spectrum is stored and used to control an electronic model of the human vocal tract (as opposed to the time-domain approach, which digitizes the speech waveform on a one-to-one analog-to-digital conversion basis). Two methods are employed: Linear Predictive Coding (LPC) and formant analysis/synthesis.
2 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC). Example: the Speak & Spell educational toy by Texas Instruments. The speech waveform is digitized with an ADC using SPCM. The waveform is then analyzed to extract the frequency, intensity, and other vocal-tract variables needed to reconstruct the waveform mathematically. The extracted speech data are then coded into a series of linear equation parameters, called LPC codes, that model the frequency characteristics of the spoken waveform. The synthesizer circuit is designed as a model of the human vocal tract.
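A common way to perform the analysis step described above is the autocorrelation method followed by the Levinson-Durbin recursion. The sketch below assumes that method; the slides do not say which algorithm was used, and the function names are hypothetical. It returns predictor coefficients a[1..p] so that each sample is approximated by a weighted sum of the previous p samples. Practical analyzers usually apply a window (e.g., Hamming) to each short frame first; that step is omitted here.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    x[n] is approximated by a[1]*x[n-1] + ... + a[order]*x[n-order].

    Returns (coefficients, reflection_coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    reflection = []
    error = r[0]                     # prediction error energy at order 0
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error              # i-th reflection coefficient
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k         # error shrinks as the order grows
    return a[1:], reflection, error


# Usage: a decaying signal x[n] = 0.9 * x[n-1] should give a[1] close to 0.9
# and a[2] close to 0, because one previous sample already predicts it.
frame = [0.9 ** n for n in range(200)]
coeffs, ks, err = levinson_durbin(autocorrelation(frame, 2), 2)
print([round(c, 3) for c in coeffs])
```

In a coder, the coefficients (or the reflection coefficients, which map directly onto a lattice filter) are what get quantized and stored frame by frame.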
3 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont). The synthesizer circuit can be divided into three major sections: the excitation source, the multistage digital filter, and the DAC.
4 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont), Excitation source.

Periodic pulse generator: emulates the action of the vocal cords by producing periodic voiced sound frequencies. The rate at which the vocal cords vibrate determines the pitch of the synthesized sound.

White noise generator: produces unvoiced sounds, which arise from air turbulence in the vocal cavity. It does so by generating a random frequency pattern that results in a hissing type of noise.

Electronic switch: combines the voiced and unvoiced sounds by switching electronically between the two sound generators.

Amplifier: amplifies the sound and passes it through the multistage digital filter circuit.
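The excitation source can be sketched in a few lines. This is a minimal illustration, not a model of any particular chip. The function name and parameters are hypothetical. The pulse generator is an impulse train with one pulse every pitch_period samples; at an assumed 8 kHz sampling rate, a period of 80 samples gives a 100 Hz pitch. The noise generator uses Python's random module, and gain stands in for the amplifier.

```python
import random


def excitation(num_samples, voiced, pitch_period, gain, rng=None):
    """Hypothetical excitation source for one frame of samples.

    voiced=True  -> periodic pulse generator (vocal-cord pulses)
    voiced=False -> white noise generator (turbulent, hissing sound)
    gain         -> amplifier applied to whichever source is selected
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so results are repeatable
    if voiced:
        # Switch set to the pulse generator: one unit pulse per pitch period.
        source = [1.0 if n % pitch_period == 0 else 0.0
                  for n in range(num_samples)]
    else:
        # Switch set to the noise generator: random values in [-1, 1].
        source = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    return [gain * s for s in source]


SAMPLE_RATE = 8000  # assumed sampling rate in Hz
voiced = excitation(160, True, 80, 0.5)     # 100 Hz pitch at 8 kHz
unvoiced = excitation(160, False, 0, 0.5)   # pitch_period unused here
```

Shortening pitch_period raises the pitch. This mirrors the slide's point that the vibration rate of the vocal cords sets the pitch.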
5 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont).

Multistage digital filter: shapes, or modulates, the excitation signal the same way the throat, tongue, teeth, and lips modulate vocal cavity sounds.

DAC: converts the digital speech signal to an analog speech signal.
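One way to realize a multistage digital filter is an all-pole lattice, in which each stage is controlled by one reflection coefficient. The sketch below assumes that structure and the predictor sign convention H(z) = 1 / (1 - a1 z^-1 - ... - ap z^-p). The slides do not specify the filter topology, and the function name is hypothetical. Each pass through the inner loop is one stage of the filter.

```python
def lattice_synthesis(excitation, k):
    """All-pole lattice filter.

    excitation: input samples from the excitation source
    k: reflection coefficients, k[0] for stage 1 ... k[p-1] for stage p
    """
    p = len(k)
    b_prev = [0.0] * (p + 1)  # backward signal of each stage at sample n-1
    out = []
    for u in excitation:
        f = u                  # forward signal entering the last stage
        b_new = [0.0] * (p + 1)
        for i in range(p, 0, -1):          # stage p down to stage 1
            f = f + k[i - 1] * b_prev[i - 1]
            b_new[i] = b_prev[i - 1] - k[i - 1] * f
        b_new[0] = f           # stage-0 backward signal equals the output
        out.append(f)
        b_prev = b_new
    return out


# Impulse response of a two-stage filter; it starts 1.0, 0.65, 0.1225, ...
response = lattice_synthesis([1.0, 0.0, 0.0, 0.0, 0.0], [0.5, -0.3])
```

With these sign conventions, k = [0.5, -0.3] is equivalent to the direct-form recursion y[n] = u[n] + 0.65 y[n-1] - 0.3 y[n-2]. Keeping every reflection coefficient strictly between -1 and 1 guarantees a stable filter.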
6 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont). The LPC code controls the following circuit functions:
- the pitch of the voiced sounds;
- the selection between voiced and unvoiced sounds;
- the amplitude of the excitation signal;
- the digital filter, by supplying the filter coefficients.
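To show how one block of LPC code drives the circuit, the sketch below bundles the four controlled quantities into a hypothetical LPCFrame record and synthesizes a sequence of frames. It rests on several assumptions:
- the field names are invented;
- the filter is a direct-form all-pole filter using predictor coefficients;
- 160 samples per frame is an assumed frame length;
- the DAC stage is left out, so the output is a list of sample values.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class LPCFrame:
    """One frame of hypothetical LPC code."""
    voiced: bool         # selects the pulse generator (True) or noise (False)
    pitch_period: int    # samples between pulses; used only when voiced
    amplitude: float     # gain of the excitation signal
    coeffs: List[float]  # filter coefficients a[1..p]


def synthesize(frames, frame_len=160, seed=0):
    """Return sample values for a list of LPCFrame objects."""
    rng = random.Random(seed)
    history = [0.0] * max(len(f.coeffs) for f in frames)  # y[n-1], y[n-2], ...
    out = []
    for frame in frames:
        for _ in range(frame_len):
            t = len(out)  # global sample index
            if frame.voiced:
                e = 1.0 if t % frame.pitch_period == 0 else 0.0
            else:
                e = rng.uniform(-1.0, 1.0)
            # All-pole filter: y[n] = amplitude*e[n] + sum_k a[k]*y[n-k]
            y = frame.amplitude * e + sum(
                a * h for a, h in zip(frame.coeffs, history))
            history = [y] + history[:-1]
            out.append(y)
    return out


samples = synthesize([
    LPCFrame(voiced=True, pitch_period=80, amplitude=1.0, coeffs=[0.5]),
    LPCFrame(voiced=False, pitch_period=0, amplitude=0.2, coeffs=[0.5]),
])
```

The filter memory (history) carries over from one frame to the next, so changing parameters at a frame boundary does not reset the output to silence.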
7 Frequency Domain Analysis/Synthesis: Linear Predictive Coding (LPC) (cont).

Weakness: it can take several minutes on a large computer just to convert a few seconds of speech to the required LPC format.

Advantages: once coded, the data rate required to reproduce the speech is less than 2,400 bps, so 10 seconds of speech can be stored in less than 2.9 KB of memory. The speech also retains all of its pitch and accent characteristics.
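The storage figure on this slide can be checked directly: memory in bytes equals bit rate times duration divided by 8. The helper below is only a worked calculation, and its name is hypothetical. At 2,400 bps, 10 seconds of speech is 24,000 bits, or 3,000 bytes. That is about 2.9 KB when 1 KB is taken as 1,024 bytes. At 24,000 bps the same 10 seconds would need 30,000 bytes.

```python
def storage_bytes(bit_rate_bps, seconds):
    """Memory needed to hold `seconds` of speech coded at `bit_rate_bps`."""
    return bit_rate_bps * seconds / 8


low_rate = storage_bytes(2400, 10)    # 3000.0 bytes
high_rate = storage_bytes(24000, 10)  # 30000.0 bytes
print(low_rate / 1024)                # 2.9296875 (KB, with 1 KB = 1,024 bytes)
```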
10 Frequency Domain Analysis/Synthesis: Formant Analysis/Synthesis. This method is similar to LPC: it is based on the frequency spectrum found in natural speech and uses the same synthesizer circuit. Formant analysis/synthesis attempts to generate speech by reconstructing the formants.

Formant: any of several frequency regions of relatively great intensity in a sound spectrum, which together determine the characteristic quality of a vowel sound.

Formant frequencies shift constantly to produce different sounds as you speak. The formant frequency characteristics of a spoken waveform can be digitally coded. The codes then control the frequency generators and filters in an electronic synthesizer to reproduce the original speech waveform.
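A widely used way to realize one formant digitally is a two-pole resonator, the building block of Klatt-style formant synthesizers. The slides do not name a specific design, so treat this as one possible sketch. Each formant is given by a center frequency and a bandwidth, and several resonators in cascade shape the excitation. The function names are hypothetical, and the formant values in the usage line are illustrative only.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, sample_rate):
    """Pass a signal through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        y2, y1 = y1, y
        out.append(y)
    return out


def formant_filter(signal, formants, sample_rate):
    """Cascade one resonator per (frequency, bandwidth) pair."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative formant set (Hz); shifting these values changes the vowel.
vowel_formants = [(700, 130), (1220, 70), (2600, 160)]
step = formant_filter([1.0] * 4000, vowel_formants, 8000)
```

Coding speech this way means storing a few frequency and bandwidth values per frame and updating them as the formants shift.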
11 Frequency Domain Analysis/Synthesis: Formant Analysis/Synthesis (cont). The formants of the original speech can be coded and synthesized one word at a time. Individual words are stored and played back to produce connected speech; this is called the stored-word, or dictionary, method.

Weakness: the vocabulary is fixed and limited by the memory available.

Advantage: less complex and more economical.
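The stored-word method amounts to a table lookup followed by concatenation. In the sketch below, the vocabulary table, the frame labels, and the pause marker are all hypothetical. The point is that a word outside the stored vocabulary simply cannot be spoken, which is the weakness noted on the slide.

```python
# Hypothetical stored vocabulary: each word maps to its coded formant frames.
WORD_STORE = {
    "low": ["low-1", "low-2"],
    "voltage": ["voltage-1", "voltage-2", "voltage-3"],
}
PAUSE = ["pause"]  # short silence inserted between words


def play_phrase(phrase, store=WORD_STORE):
    """Return the frame sequence for a phrase built from stored words."""
    frames = []
    for i, word in enumerate(phrase.lower().split()):
        if word not in store:
            raise KeyError(f"{word!r} is not in the stored vocabulary")
        if i > 0:
            frames.extend(PAUSE)
        frames.extend(store[word])
    return frames


print(play_phrase("Low voltage"))
# ['low-1', 'low-2', 'pause', 'voltage-1', 'voltage-2', 'voltage-3']
```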
13 Phoneme Speech Synthesis (cont). Most phoneme synthesizers are really LPC synthesizers. A phoneme synthesizer can be divided into three major sections:
- Lookup ROM: translates each phoneme code into a set of LPC parameters, which are applied to the excitation sources and the digital filter. The LPC parameters control which excitation source is selected, its pitch, and the filter settings required to produce the given phoneme.
- Excitation source.
- Multistage digital filter.

A phoneme speech synthesizer can be used in one of two ways: direct phoneme synthesis or text-to-speech synthesis.
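The lookup ROM can be modeled as a table keyed by phoneme code. Everything in the sketch below is hypothetical: the codes, the parameter values, and the duration field (how many frames each phoneme lasts) are invented for illustration. None of them correspond to any real chip.

```python
from typing import List, NamedTuple


class PhonemeParams(NamedTuple):
    voiced: bool          # which excitation source to select
    pitch_period: int     # pitch of the pulse generator (ignored if unvoiced)
    amplitude: float      # excitation gain
    coeffs: List[float]   # digital filter settings
    frames: int           # how many frames the phoneme lasts (assumed field)


# Hypothetical ROM contents.
LOOKUP_ROM = {
    0x00: PhonemeParams(False, 0, 0.0, [], 2),            # pause
    0x15: PhonemeParams(True, 80, 0.8, [1.2, -0.6], 6),   # a vowel
    0x1F: PhonemeParams(False, 0, 0.3, [0.2, 0.1], 4),    # an "s"-like hiss
}


def rom_translate(codes):
    """Expand a phoneme code string into one parameter set per frame."""
    per_frame = []
    for code in codes:
        params = LOOKUP_ROM[code]
        per_frame.extend([params] * params.frames)
    return per_frame


frames = rom_translate([0x1F, 0x15, 0x00])
```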
14 Phoneme Speech Synthesis (cont): Direct Phoneme Synthesis. The phoneme code for a given phrase must be determined by the programmer. This code is called a phoneme string and is usually stored as part of a speech subroutine in RAM or ROM. The subroutine is executed when the programmed phrase must be spoken. For example, a robot might be programmed to say “low voltage” when its battery needs recharging. The phrase is spoken when the voltage-sensing circuit detects the low-voltage condition.
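Direct phoneme synthesis can be sketched as a stored phoneme string plus a subroutine that a sensor check calls. The phoneme symbols below are ARPAbet-style labels chosen for readability, not the codes of a real synthesizer. The threshold value and the speak callback are hypothetical.

```python
# Phoneme string for "low voltage", fixed by the programmer in advance.
LOW_VOLTAGE_PHRASE = ["L", "OW", "PAUSE", "V", "OW", "L", "T", "IH", "JH"]


def check_battery(voltage, speak, threshold=11.0):
    """Speech subroutine: speak the stored phrase when the voltage is low.

    `speak` is whatever routine sends a phoneme string to the synthesizer;
    `threshold` (in volts) is an assumed value for illustration.
    """
    if voltage < threshold:
        speak(LOW_VOLTAGE_PHRASE)
        return True
    return False


spoken = []
check_battery(12.6, spoken.append)  # battery fine: nothing is said
check_battery(10.4, spoken.append)  # low battery: phrase is queued
```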
15 Phoneme Speech Synthesis (cont): Direct Phoneme Synthesis (cont). Developing a phoneme string:
1. Determine the phoneme symbols required for the words within the phrase.
2. Provide pauses between syllables and words as needed for timing and rhythm.
3. Provide intonation for the individual words as well as the entire phrase.
4. Convert the phoneme symbol string to a phoneme code string.
5. Execute the phoneme string, listen to the result, and modify it accordingly.
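The first four steps of that procedure (choosing symbols, adding pauses, adding intonation, converting to codes) can be sketched as a small conversion routine. The last step, listening and adjusting, stays a manual loop. The symbol-to-code table, the pause symbol, and the packing of a 2-bit pitch level into the top bits of each 8-bit code are all hypothetical choices for illustration.

```python
# Hypothetical 6-bit phoneme codes; real synthesizers use their own tables.
SYMBOL_TO_CODE = {
    "PA": 0x00,  # pause
    "L": 0x2D, "OW": 0x35, "V": 0x0F, "T": 0x0D, "IH": 0x0C, "JH": 0x0A,
}


def build_code_string(words, pitches, pause="PA"):
    """Convert phoneme symbols to a code string.

    words: list of words, each a list of phoneme symbols
    pitches: one intonation level (0-3) per word
    A pause code is inserted between words, and each phoneme code carries
    its word's pitch level in the top two bits (a hypothetical format).
    """
    codes = []
    for i, (word, pitch) in enumerate(zip(words, pitches)):
        if not 0 <= pitch <= 3:
            raise ValueError("pitch level must be 0-3")
        if i > 0:
            codes.append(SYMBOL_TO_CODE[pause])
        for symbol in word:
            codes.append((pitch << 6) | SYMBOL_TO_CODE[symbol])
    return codes


# "low voltage": higher pitch on "low", lower on "voltage".
codes = build_code_string([["L", "OW"], ["V", "OW", "L", "T", "IH", "JH"]], [2, 1])
print([hex(c) for c in codes])
```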
20 Phoneme Speech Synthesis (cont): Text to Speech Conversion. The phrase is entered into a computer by means of a keyboard, and the computer performs the code conversion. Since most computers represent letters and symbols using ASCII code, the program's task reduces to converting ASCII code to phoneme code. Example uses: aids for people who have lost their sight or who cannot speak. Written text can be converted to a phoneme code string in three ways: word lookup, morpheme lookup, and phoneme lookup.
21 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont), Word Lookup. This is also known as the dictionary method.

The software looks for the ASCII representation of a space to divide the phrase into individual words. Each word is compared with the dictionary until a match is found; when there is a match, the lookup table produces the phoneme code string required to pronounce the word. The phoneme code strings are then either passed sequentially to a phoneme synthesizer for immediate speech reproduction or stored temporarily in a phoneme memory buffer for later playback.

Weaknesses:
- It is less flexible and needs a large memory.
- A large dictionary requires too much search time.
- Abbreviations, misspelled words, or unusual words might never be found.
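The word lookup (dictionary) method reduces to splitting on the ASCII space character (0x20) and looking each word up in a table. In this sketch the dictionary contents are hypothetical ARPAbet-style entries. Unknown words are collected rather than spoken, mirroring the weakness that misspelled or unusual words are never found. A Python dict finds entries by hashing rather than by comparing words one at a time. That sidesteps the search-time problem of a sequential compare, but not the memory cost.

```python
# Tiny hypothetical pronunciation dictionary (ARPAbet-style symbols).
DICTIONARY = {
    "LOW": ["L", "OW"],
    "VOLTAGE": ["V", "OW", "L", "T", "IH", "JH"],
    "BATTERY": ["B", "AE", "T", "ER", "IY"],
}


def word_lookup(text, dictionary=DICTIONARY):
    """Return (phoneme buffer, words not found) for a line of ASCII text."""
    buffer, missing = [], []
    for word in text.upper().split(" "):  # ASCII space (0x20) separates words
        if not word:
            continue  # skip empty strings left by repeated spaces
        phonemes = dictionary.get(word)
        if phonemes is None:
            missing.append(word)  # not in the dictionary: cannot be spoken
        else:
            buffer.append(phonemes)  # held for playback in the phoneme buffer
    return buffer, missing


buffer, missing = word_lookup("Low  battery voltge")  # note the misspelling
```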
22 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont), Morpheme Lookup. A morpheme is any word or word segment that conveys meaning, for example sun in sundown, ortho in orthopedic, and blue in blueberry. Morpheme lookup works like the word lookup system in that the morphs are stored in memory.

Weaknesses: the text must be dissected and analyzed to produce the appropriate morph string. This requires a relatively large amount of computer time and is inefficient, because the software must examine all possible ways a given word can be broken up in order to find its morphs.
23 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont), Morpheme Lookup (cont). Advantages:
- It is more flexible than word lookup.
- Only about 8,000 morphs (for English) need to be stored to obtain a very large vocabulary.
- New and unusual words rarely need to be added to the dictionary, since in most cases they consist of existing morphs.
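The morph-splitting step can be sketched as a search over every possible way to cut a word into stored morphs. That search is exactly the work the slides identify as the method's cost. The sketch makes these assumptions:
- a tiny hypothetical morph dictionary with ARPAbet-style pronunciations;
- when several splits exist, the one with the fewest morphs wins;
- partial results are memoized;
- morph pronunciations are simply concatenated, with no adjustment for sound changes at the boundaries.

```python
from functools import lru_cache

# Hypothetical morph dictionary (ARPAbet-style symbols).
MORPHS = {
    "SUN": ["S", "AH", "N"],
    "DOWN": ["D", "AW", "N"],
    "BLUE": ["B", "L", "UW"],
    "BERRY": ["B", "EH", "R", "IY"],
}


def segment(word, morphs=MORPHS):
    """Split `word` into dictionary morphs, trying every split point.

    Returns the segmentation with the fewest morphs, or None if none exists.
    """
    word = word.upper()

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(word):
            return ()
        options = []
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in morphs:
                rest = best(end)
                if rest is not None:
                    options.append((piece,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None


def morph_phonemes(word, morphs=MORPHS):
    """Concatenate the pronunciations of the morphs found in `word`."""
    parts = segment(word, morphs)
    if parts is None:
        return None
    return [p for m in parts for p in morphs[m]]


print(segment("sundown"))  # ['SUN', 'DOWN']
```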
24 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont), Phoneme Lookup. This is the most efficient and flexible method. It is also known as letter-to-phoneme lookup, because the software attempts to convert each individual text letter or symbol to its corresponding phoneme. A system developed by the Naval Research Laboratory (NRL) uses production rules of the form IF left-context (text) right-context THEN phonemes to convert written text into phonemes. The context symbols are:
- # means the context must be one or more vowels;
- : means the context must be zero or more consonants;
- ! means the context must be a non-alphanumeric character (e.g., a space, punctuation mark, or mathematical symbol).
25 Phoneme Speech Synthesis (cont): Text to Speech Conversion (cont), Phoneme Lookup (cont). Example: IF #: (AL)! THEN UH, L. The left context #: means that the text before AL must consist of one or more vowels followed by zero or more consonants, reading left to right. The right context is a single exclamation mark (!), meaning the character after AL must be non-alphanumeric. Therefore the word FICTIONAL, followed by a space or punctuation mark, satisfies the rule IF #: (AL)! THEN UH, L: "IO" supplies the vowels and "N" the consonant, so the final AL is pronounced UH, L.
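The rule in this example can be checked mechanically. The sketch below translates the three context symbols into regular expressions and tests whether a rule applies at a given position. It assumes upper-case input, treats only A, E, I, O and U as vowels, and pads the word with spaces so that the ! context can match at the end. It handles a single rule; a full letter-to-phoneme converter would scan the text letter by letter and try many rules, and that rule set is not shown here. The function and variable names are hypothetical.

```python
import re

# Context symbols as defined on the slide. Treating only A, E, I, O, U as
# vowels is an assumption of this sketch.
VOWELS = "[AEIOU]"
CONSONANTS = "[B-DF-HJ-NP-TV-Z]"
CONTEXT_SYMBOLS = {
    "#": VOWELS + "+",       # one or more vowels
    ":": CONSONANTS + "*",   # zero or more consonants
    "!": "[^A-Z0-9]",        # one non-alphanumeric character
}


def context_regex(pattern):
    """Translate a context pattern such as '#:' into a regular expression."""
    return "".join(CONTEXT_SYMBOLS.get(ch, re.escape(ch)) for ch in pattern)


def rule_matches(text, pos, rule):
    """Check a rule (left, target, right, phonemes) at text[pos:]."""
    left, target, right, _phonemes = rule
    if not text.startswith(target, pos):
        return False
    before, after = text[:pos], text[pos + len(target):]
    # Left context must end exactly where the target begins.
    left_ok = re.search("(?:" + context_regex(left) + r")\Z", before)
    # Right context must start exactly where the target ends.
    right_ok = re.match(context_regex(right), after)
    return left_ok is not None and right_ok is not None


# IF #: (AL)! THEN UH, L
RULE_AL = ("#:", "AL", "!", ["UH", "L"])

word = " FICTIONAL "            # padded with spaces so "!" can match
pos = word.index("AL")
if rule_matches(word, pos, RULE_AL):
    print("AL ->", RULE_AL[3])  # AL -> ['UH', 'L']
```

The same rule rejects PAL (no vowel before AL) and FICTIONALLY (a letter, not a non-alphanumeric character, follows AL).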