Introduction to Speech Synthesis

1 Introduction to Speech Synthesis
● Key terms and definitions
● Key processes in synthetic speech production
  ● Text-To-Phones
  ● Phones to Synthesizer parameters
  ● Synthesis
● Speech synthesizers
  ● How they work
  ● An articulatory synthesizer
● Phones to Synthesizer parameters
  ● Sampling
  ● Interpolation methods

2 Key Terms and Definitions
● Phone = The basic unit used to describe an utterance; a basic speech sound.
● Phoneme = A category of phones
  ● Two phones may sound different but still be classified in the same category (e.g., the aspirated [pʰ] in "pin" and the unaspirated [p] in "spin" both belong to the English phoneme /p/)
● Vowel = A speech sound produced with a relatively unconstricted vocal tract
● Voicing = A kind of excitation of the vocal tract produced by forcing air through the vocal folds, causing them to vibrate
● Frication = A kind of excitation of the vocal tract produced by forcing air through a narrow constriction
● Aspiration = The escape of breath through a relatively unconstricted vocal tract without accompanying vibration of the vocal folds
● Uvula = The small fleshy projection that hangs down from the back of the soft palate, at the back of the throat
● Posture = A configuration of the vocal tract. A particular configuration will typically produce sounds within a particular phoneme category

3 Key Processes – Unrestricted Text-To-Speech
Plain Text → Text To Postures Conversion → Postures → Posture to Synthesizer Parameter Conversion → Synthesizer Control Parameters → Speech Synthesizer → Audio Out

4 Text to Postures Conversion
● There are two main approaches to converting plain text to a phonetic or posture representation:
  ● Rule-based (letter-to-sound rules)
  ● Dictionary-based
  ● A combination of the two can also be used
● Not a simple problem to solve:
  ● Different accents
  ● Dialects
  ● Homographs (words spelled alike but pronounced differently):
    ● lead (verb) / lead (the metal)
    ● read (present tense) / read (past tense)
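
A minimal sketch of the combined approach, assuming a toy lexicon and a handful of made-up letter-to-sound rules. LEXICON, LTS_RULES, letter_to_sound and text_to_postures are hypothetical names, and the ARPAbet-style symbols are illustrative rather than the course's posture set. Known words are looked up in the dictionary first; anything else falls back to greedy longest-match rules.

```python
# Hypothetical mini-lexicon: word -> posture symbols. The symbols are
# ARPAbet-style labels chosen for illustration, not the course posture set.
LEXICON = {
    "the": ["dh", "ax"],
    "read": ["r", "iy", "d"],  # a lookup alone cannot choose past-tense "r eh d"
    "lead": ["l", "iy", "d"],  # ...or the metal, "l eh d"
}

# Hypothetical letter-to-sound rules; two-letter patterns are tried first.
LTS_RULES = {
    "sh": ["sh"], "ch": ["ch"], "th": ["th"], "ee": ["iy"], "oo": ["uw"],
    "a": ["ae"], "b": ["b"], "c": ["k"], "d": ["d"], "e": ["eh"],
    "f": ["f"], "g": ["g"], "h": ["hh"], "i": ["ih"], "j": ["jh"],
    "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "o": ["aa"],
    "p": ["p"], "q": ["k"], "r": ["r"], "s": ["s"], "t": ["t"],
    "u": ["ah"], "v": ["v"], "w": ["w"], "x": ["k", "s"], "y": ["y"],
    "z": ["z"],
}


def letter_to_sound(word):
    """Rule-based fallback: greedy longest match over LTS_RULES."""
    phones, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in LTS_RULES:
                phones.extend(LTS_RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule matches this character; skip it
    return phones


def text_to_postures(text):
    """Dictionary lookup first, letter-to-sound rules for unknown words."""
    postures = []
    for token in text.lower().split():
        word = "".join(ch for ch in token if ch.isalpha())  # drop punctuation
        if word:
            postures.extend(LEXICON.get(word) or letter_to_sound(word))
    return postures
```

Note that the dictionary still returns a single pronunciation for "read" and "lead"; choosing between homograph readings needs context (for example, part-of-speech information), which this sketch does not attempt.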

5 Speech Synthesizers
● Early synthesizers were based on a spectral analysis of acoustic waveforms.
● The synthesis procedure then used filters to reproduce waveforms containing similar acoustic features. Examples:
  ● DECtalk
  ● MITalk
  ● The Klatt synthesizer
● Articulatory synthesis instead uses a physical model of the vocal tract to simulate its resonances.
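
The filter-based approach can be illustrated with the second-order digital resonator used in Klatt-style formant synthesizers: y[n] = A*x[n] + B*y[n-1] + C*y[n-2], with C = -exp(-2*pi*BW*T), B = 2*exp(-pi*BW*T)*cos(2*pi*F*T) and A = 1 - B - C, where F is the formant frequency, BW its bandwidth and T the sampling period. The sketch below passes an impulse train (a crude stand-in for the glottal source) through three resonators in cascade. The function names are hypothetical, and the formant frequencies and bandwidths are illustrative values for a neutral vowel, not parameters taken from DECtalk, MITalk or the Klatt synthesizer.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate):
    """Coefficients (A, B, C) of y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to exactly 1
    return a, b, c


def resonate(signal, freq_hz, bw_hz, sample_rate):
    """Run one second-order resonator over a list of samples."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, sample_rate)
    y1 = y2 = 0.0  # previous two outputs
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, sample_rate):
    """Apply resonators in series, one per (frequency, bandwidth) pair."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, sample_rate)
    return signal


def impulse_train(f0_hz, sample_rate, n_samples):
    """One unit impulse per pitch period; a crude glottal source."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Illustrative neutral-vowel formants: (frequency Hz, bandwidth Hz).
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
vowel = cascade(impulse_train(100.0, 10000.0, 1000), FORMANTS, 10000.0)
```

Choosing A = 1 - B - C gives each resonator unity gain at 0 Hz, so a constant input eventually produces the same constant output; the formant peaks then shape the spectrum of the source.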

6 The articulatory synthesizer developed at the U of C
● Developed by Dr. Leonard Manzara
● Two main types of parameters:
  ● Utterance rate parameters
    ● Set once per utterance
  ● Sample rate parameters
    ● Set every sample period (usually 2-4 ms)
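
A sketch of how the two tiers might be represented, assuming Python dataclasses. The class and field names (including control_period_ms) are hypothetical and cover only a subset of the parameters listed on the next two slides; frame_count and samples_per_frame show the bookkeeping that a per-period update implies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UtteranceParams:
    """Utterance rate parameters: fixed for the whole utterance (subset only)."""
    output_sample_rate: float = 44100.0  # Hz
    control_period_ms: int = 4           # hypothetical field: time between frames
    tube_length_cm: float = 17.5


@dataclass
class ControlFrame:
    """Sample rate parameters: one record per control period (subset only)."""
    pitch: float
    glottal_volume_db: float
    radii_cm: List[float]  # R1..R8
    velum_cm: float


def frame_count(duration_ms: int, period_ms: int) -> int:
    """Number of control frames needed to cover the utterance (integer ceiling)."""
    return -(-duration_ms // period_ms)


def samples_per_frame(params: UtteranceParams) -> float:
    """Audio samples generated between two consecutive control frames."""
    return params.output_sample_rate * params.control_period_ms / 1000.0


header = UtteranceParams()                       # set once
n = frame_count(1000, header.control_period_ms)  # one second of speech
frames = [ControlFrame(0.0, 60.0, [1.0] * 8, 0.0) for _ in range(n)]  # set per period
```

At a 4 ms period, one second of speech needs 250 control frames, and each frame spans 176.4 output samples at 44.1 kHz, which is not a whole number of samples.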

7 Utterance Rate Parameters
2 ; output file format (0 = AU, 1 = AIFF, 2 = WAVE)
44100.0 ; output sample rate (22050.0, 44100.0)
4.0 ; input control rate (1 - 1000 Hz)
60.0 ; master volume (0 - 60 dB)
2 ; number of sound output channels (1 or 2)
0.0 ; stereo balance (-1 to +1)
0 ; glottal source waveform type (0 = pulse, 1 = sine)
40.0 ; glottal pulse rise time (5 - 50 % of GP period)
16.0 ; glottal pulse fall time minimum (5 - 50 % of GP period)
32.0 ; glottal pulse fall time maximum (5 - 50 % of GP period)
0.00 ; glottal source breathiness (0 - 10 % of GS amplitude)
17.5 ; nominal tube length (10 - 20 cm)
32 ; tube temperature (25 - 40 degrees Celsius)
1.50 ; junction loss factor (0 - 5 % of unity gain)
3.05 ; aperture scaling radius (3.05 - 12 cm)
5000.0 ; mouth aperture coefficient (100 - Nyquist Hz)
5000.0 ; nose aperture coefficient (100 - Nyquist Hz)
1.35 ; radius of nose section 1 (0 - 3 cm)
1.96 ; radius of nose section 2 (0 - 3 cm)
1.91 ; radius of nose section 3 (0 - 3 cm)
1.3 ; radius of nose section 4 (0 - 3 cm)
0.73 ; radius of nose section 5 (0 - 3 cm)
1500.0 ; throat lowpass frequency cutoff (50 - Nyquist Hz)
6.0 ; throat volume (0 - 48 dB)
1 ; pulse modulation of noise (0 = off, 1 = on)
48.0 ; noise crossmix offset (30 - 60 dB)
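
A minimal sketch for reading this "value ; description (range)" format, assuming one entry per line as in the parameter list on this slide; parse_utterance_params and the other function names are hypothetical. To illustrate how two of the values relate to the acoustics, it then estimates the first resonances of a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c / 4L. The uniform-tube formula and the linear speed-of-sound approximation are textbook idealizations swapped in here; they are not necessarily how this synthesizer computes its output.

```python
def parse_utterance_params(text):
    """Map each parameter's description (text before '(') to its value.

    Assumes one 'value ; description (range)' entry per line.
    """
    params = {}
    for line in text.splitlines():
        if ";" not in line:
            continue  # skip blank or non-parameter lines
        value_part, comment = line.split(";", 1)
        if not value_part.strip():
            continue  # comment-only line
        name = comment.split("(", 1)[0].strip()
        params[name] = float(value_part)  # float() ignores surrounding spaces
    return params


def speed_of_sound_cm_per_s(temp_c):
    """Linear approximation for dry air: c = 331.3 + 0.606 * T m/s."""
    return (331.3 + 0.606 * temp_c) * 100.0


def uniform_tube_resonances(length_cm, temp_c, count=3):
    """Quarter-wave resonances F_n = (2n - 1) c / (4 L) of a uniform tube
    closed at one end (glottis) and open at the other (lips)."""
    c = speed_of_sound_cm_per_s(temp_c)
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


HEADER = """\
17.5 ; nominal tube length (10 - 20 cm)
32 ; tube temperature (25 - 40 degrees Celsius)
"""

params = parse_utterance_params(HEADER)
formants = uniform_tube_resonances(params["nominal tube length"],
                                   params["tube temperature"])
```

For a 17.5 cm tube at 32 degrees Celsius this gives roughly 501, 1503 and 2505 Hz, close to the commonly quoted neutral-vowel formants of about 500, 1500 and 2500 Hz.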

8 Sample Rate Parameters
● Pitch (0 = middle C, approx. 256 Hz)
● Glottal Volume (0 - 60 dB)
● Aspiration Volume (0 - 60 dB)
● Frication Volume (0 - 60 dB)
● Frication Position (0 - 7)
● Frication Center Frequency (100 Hz - 20000 Hz)
● Frication Bandwidth (250 Hz - 20000 Hz)
● R1, R2, R3, R4, R5, R6, R7, R8 (0 - 3 cm each)
● Velum (0 - 1.5 cm)
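
A sketch of one frame as a fixed-order list of 16 values, with the ranges from the slide used to clip out-of-range data. FRAME_PARAMS, clamp_frame and pitch_to_hz are hypothetical names, and the velum bound assumes centimetres like R1-R8. pitch_to_hz assumes pitch is measured in semitones relative to middle C and uses the equal-tempered value of 261.63 Hz; the slide only says 0 corresponds to middle C at approximately 256 Hz, so both the unit and the reference frequency are assumptions.

```python
# (name, minimum, maximum) for the 16 per-frame parameters, in slide order.
# None means the slide gives no bound (pitch is only described relative to middle C).
FRAME_PARAMS = (
    [("pitch", None, None),
     ("glottal_volume_db", 0.0, 60.0),
     ("aspiration_volume_db", 0.0, 60.0),
     ("frication_volume_db", 0.0, 60.0),
     ("frication_position", 0.0, 7.0),
     ("frication_center_hz", 100.0, 20000.0),
     ("frication_bandwidth_hz", 250.0, 20000.0)]
    + [(f"r{i}_cm", 0.0, 3.0) for i in range(1, 9)]
    + [("velum_cm", 0.0, 1.5)]  # unit assumed to be cm
)

MIDDLE_C_HZ = 261.63  # equal-tempered middle C; the slide quotes roughly 256 Hz


def clamp_frame(values):
    """Clip one frame of 16 values into the ranges listed on the slide."""
    if len(values) != len(FRAME_PARAMS):
        raise ValueError(f"expected {len(FRAME_PARAMS)} values, got {len(values)}")
    clipped = []
    for value, (_, low, high) in zip(values, FRAME_PARAMS):
        if low is not None:
            value = max(value, low)
        if high is not None:
            value = min(value, high)
        clipped.append(value)
    return clipped


def pitch_to_hz(pitch):
    """Assumes pitch is in semitones relative to middle C (0 = middle C)."""
    return MIDDLE_C_HZ * 2.0 ** (pitch / 12.0)
```

Under the semitone assumption, a pitch of 12 is one octave above middle C, about 523 Hz.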

9 Assignment Number 1
● Your job will be to take a list of phones/postures and convert them to synthesizer control parameters.
● You will have to provide at least two interpolation methods.
● All of the posture data has been made available to you in both HTML and XML form.
● Vowels will be relatively easy to synthesize.
● Consonants will be more difficult to synthesize.
● You will be expected to apply topics discussed in class to your assignment:
  ● Refactoring
  ● Reflection (possibly)
  ● Polymorphism
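
One way the two required interpolation methods and the polymorphism topic could fit together, sketched under stated assumptions: postures are represented as dicts of parameter targets, every transition gets the same number of frames, and Interpolator, LinearInterpolator, CosineInterpolator and postures_to_frames are hypothetical names rather than part of the provided posture data.

```python
import math
from abc import ABC, abstractmethod


class Interpolator(ABC):
    """Blends two posture targets; subclasses only define the easing curve."""

    @abstractmethod
    def weight(self, t):
        """Map progress t in [0, 1] to a blend weight in [0, 1]."""

    def blend(self, start, end, t):
        return start + (end - start) * self.weight(t)


class LinearInterpolator(Interpolator):
    def weight(self, t):
        return t


class CosineInterpolator(Interpolator):
    """Smooth start and stop: the weight has zero slope at t = 0 and t = 1."""

    def weight(self, t):
        return (1.0 - math.cos(math.pi * t)) / 2.0


def postures_to_frames(targets, frames_per_transition, interpolator):
    """Expand a list of posture targets into control frames.

    targets: non-empty list of dicts mapping parameter name -> target value
    (hypothetical posture data; every dict must have the same keys).
    """
    frames = []
    for start, end in zip(targets, targets[1:]):
        for k in range(frames_per_transition):
            t = k / frames_per_transition
            frames.append({name: interpolator.blend(start[name], end[name], t)
                           for name in start})
    frames.append(dict(targets[-1]))  # finish exactly on the last target
    return frames
```

Because postures_to_frames only calls blend(), switching from LinearInterpolator() to CosineInterpolator() changes the shape of every transition without modifying the frame-generation loop.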

