5-Text To Speech (TTS) Speech Synthesis


1 5-Text To Speech (TTS) Speech Synthesis
Speech Synthesis Concept
Phone Units
Phone Sequence To Speech
Speech Naturalness
Concatenative Approaches
Rule-Based Approaches

2 Speech Synthesis Concept
Text -> Text to Phone Sequence (Natural Language Processing, NLP) -> Phone Sequence to Speech (Speech Processing) -> Speech

3 Phone Units
Paragraph ( )
Sentence ( )
Word (depends on the language; usually more than 100,000)
Syllable
Diphone & Triphone
Phoneme (between 10 and 100)

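4 Phone Units (Cont’d)
Diphone: we model the transition between two adjacent phonemes.
In the phoneme sequence p1 p2 p3 p4 p5, each phoneme is one unit and each diphone spans the transition from one phoneme to the next.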
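
The relation between a phoneme sequence and its diphones can be sketched in a few lines of Python. This is only an illustration: the function name `to_diphones` is hypothetical, and the sketch ignores the silence units that a real inventory may add at utterance boundaries.

```python
def to_diphones(phonemes):
    # Pair each phoneme with its successor: one diphone per transition.
    return [(left, right) for left, right in zip(phonemes, phonemes[1:])]


diphones = to_diphones(["p1", "p2", "p3", "p4", "p5"])
print(diphones)
```

A sequence of five phonemes therefore yields four diphones, one per transition.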

5 Phone Units (Cont’d)
Farsi has 30 phonemes, so theoretically there are 30*30 = 900 diphones.
In practice, the only diphone that does not occur in Farsi is /zho/.
The theoretical number of triphones is larger still, but only part of them occurs in practice in Farsi.
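
The theoretical counts follow directly from the phoneme count. The short sketch below works through the arithmetic, assuming (as the slide does for diphones) that every ordered combination of phonemes counts as a unit. The triphone figure is derived here under that same assumption and is not quoted from the slide.

```python
PHONEME_COUNT = 30  # number of Farsi phonemes given on the slide

# Every ordered pair / triple of phonemes is counted as a possible unit.
theoretical_diphones = PHONEME_COUNT ** 2
theoretical_triphones = PHONEME_COUNT ** 3

print(theoretical_diphones, theoretical_triphones)
```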

6 Phone Units (Cont’d)
Syllable = Onset (Consonant) + Rhyme
A syllable is a set of phonemes that contains exactly one vowel.
Syllables in Farsi: CV, CVC, CVCC
Farsi has about 4000 syllables.
Syllables in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, . . .
The number of syllables in English is very large.
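
Because every Farsi pattern listed above starts with exactly one consonant followed by the vowel, a consonant/vowel string can be split into syllables mechanically: each syllable begins at the consonant just before a vowel. The sketch below assumes the input is already a string of `C` and `V` symbols. The function name `syllabify_cv` is hypothetical, and the rule does not carry over to English, where onsets can hold several consonants.

```python
ALLOWED = {"CV", "CVC", "CVCC"}  # Farsi syllable patterns from the slide


def syllabify_cv(pattern):
    # Each syllable starts at the consonant immediately before a vowel.
    starts = [i - 1 for i, ch in enumerate(pattern) if ch == "V"]
    if not starts or starts[0] != 0:
        raise ValueError("pattern must begin with a CV onset")
    bounds = starts + [len(pattern)]
    syllables = [pattern[a:b] for a, b in zip(bounds, bounds[1:])]
    if any(s not in ALLOWED for s in syllables):
        raise ValueError("pattern does not fit CV / CVC / CVCC")
    return syllables


print(syllabify_cv("CVCCVC"))
print(syllabify_cv("CVCVCC"))
```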

7 Phone Sequence To Speech
Concatenative approaches: a trade-off between naturalness, memory usage, and the variety of desired functions.
Rule-based approaches: the most important rule-based approach is the Klatt method.

8 Phone Sequence To Speech (Cont’d)
Text -> Text to Phone Sequence (NLP) -> Phone Sequence to primitive utterance -> primitive utterance to Natural Speech (Speech Processing) -> Speech

9 Speech Naturalness
Removal of undesirable noise, distortion, and discontinuity from the speech
Prosody generation: speech energy, duration, pitch, intonation, stress

10 Speech Naturalness (Cont’d)
Intonation and stress have a strong effect on speech naturalness.
Intonation: the variation of pitch frequency over the course of an utterance.
Stress: an increase of the pitch frequency at a specific time.
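
The two definitions can be illustrated with a toy pitch contour. In the sketch below, intonation is modeled as a linear glide in pitch and stress as a local raise in chosen frames. The function name, the frame-based representation, the frequencies, and the gain of 1.2 are all illustrative assumptions, not values from the presentation.

```python
def pitch_contour(n_frames, start_hz, end_hz, stress_frames=(), stress_gain=1.2):
    # Intonation: pitch moves linearly from start_hz to end_hz.
    step = (end_hz - start_hz) / max(n_frames - 1, 1)
    contour = [start_hz + step * i for i in range(n_frames)]
    # Stress: raise the pitch only in the selected frames.
    for i in stress_frames:
        contour[i] *= stress_gain
    return contour


contour = pitch_contour(5, 120.0, 100.0, stress_frames=[2])
print(contour)
```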

11 Concatenative Approaches
In these approaches we store units of natural speech and use them to reconstruct the desired speech.
We can select the phone unit that is appropriate for the synthesis task.
We can store compressed parameters instead of the original waveform.
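
A minimal sketch of the idea: stored units are looked up by name and joined in order. The unit names and sample values below are made up for illustration, and a real system would store recorded (or coded) speech and smooth the joins rather than simply appending samples.

```python
# Hypothetical unit inventory: each unit name maps to its stored samples.
unit_store = {
    "sa": [0.0, 0.2, 0.4],
    "lam": [0.3, 0.1, -0.1, -0.2],
}


def synthesize(unit_names, store):
    # Look up each requested unit and append its samples in order.
    samples = []
    for name in unit_names:
        samples.extend(store[name])
    return samples


print(synthesize(["sa", "lam"], unit_store))
```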

12 Concatenative Approaches (Cont’d)
Benefits of storing compressed parameters instead of the original waveform:
Less memory use
A general representation instead of one specific stored utterance
Prosody is easier to generate

13 Concatenative Approaches (Cont’d)
Phone Unit: Paragraph, Sentence, Word, Syllable, Diphone, Phoneme
Type of Storing: Main Waveform, Coded/Main Waveform, Coded Waveform

14 Concatenative Approaches (Cont’d)
Pitch Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing the transitions between phonemes.
Overlap-add is a standard DSP method.
PSOLA is a basic operation for voice conversion.
In the analysis stage of this method, we select frames that are synchronous with the pitch markers.
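
The analysis and overlap-add steps can be sketched as follows. This is a simplified illustration, not a full PSOLA implementation: it assumes a constant pitch period and already-known pitch marks, and all function names are hypothetical. Each frame is two periods long, Hann-weighted, and centred on a pitch mark. Adding the frames back at the same marks reproduces the signal between the first and last mark.

```python
import math


def hann(length):
    # Periodic Hann window; copies shifted by length/2 sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def extract_frames(signal, pitch_marks, period):
    # Analysis: one two-period weighted frame centred on each pitch mark.
    window = hann(2 * period)
    frames = []
    for mark in pitch_marks:
        frame = []
        for n in range(2 * period):
            idx = mark - period + n
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            frame.append(sample * window[n])
        frames.append(frame)
    return frames


def overlap_add(frames, new_marks, period, out_length):
    # Synthesis: centre each frame on its new mark and sum the overlaps.
    out = [0.0] * out_length
    for frame, mark in zip(frames, new_marks):
        for n, value in enumerate(frame):
            idx = mark - period + n
            if 0 <= idx < out_length:
                out[idx] += value
    return out


period = 20
signal = [math.sin(2 * math.pi * n / period) for n in range(200)]
marks = list(range(period, 200 - period + 1, period))
frames = extract_frames(signal, marks, period)
same = overlap_add(frames, marks, period, len(signal))
```

Placing the same frames at marks that are closer together or further apart changes the spacing of the pitch pulses, which is how this family of methods modifies pitch.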

15 Rule-Based Approach Stages
Determine the speech model and its parameters.
Determine the type of phone units.
Determine the parameter values for each phone unit.
Substitute the sequence of phone units by its equivalent parameter sequence.
Feed the parameter sequence into the speech model.
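
The substitution stage can be sketched as a table lookup. In the example below, the phone labels, the formant targets, and the fixed duration of three frames per phone are illustrative placeholders, not values from the presentation. The resulting track is what would then be fed, frame by frame, into the speech model.

```python
# Illustrative placeholder targets (F1, F2 in Hz); not measured values.
PHONE_PARAMS = {
    "a": {"F1": 700, "F2": 1200},
    "i": {"F1": 300, "F2": 2300},
}
FRAMES_PER_PHONE = 3  # assumed fixed duration for every phone unit


def phones_to_parameter_track(phones):
    # Replace each phone unit by a run of identical parameter frames.
    track = []
    for phone in phones:
        track.extend(dict(PHONE_PARAMS[phone]) for _ in range(FRAMES_PER_PHONE))
    return track


track = phones_to_parameter_track(["a", "i"])
print(len(track), track[0], track[-1])
```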

16 KLATT 80 Model

17 KLATT 88 Model

18 THE KLSYN88 CASCADE PARALLEL FORMANT SYNTHESIZER
[Figure: block diagram of the KLSYN88 cascade/parallel formant synthesizer. The laryngeal sound sources (filtered impulse train, KLGLOTT88 model (default), modified LF model), together with a spectral tilt low-pass resonator and an aspiration noise generator, feed a cascade vocal tract model (nasal pole-zero pair, tracheal pole-zero pair, first to fifth formant resonators with frequency and bandwidth parameters F1/B1 to F5/B5) and a parallel vocal tract model for laryngeal sound sources (normally not used). A frication noise generator feeds a parallel vocal tract model for frication sound sources (second to sixth formant resonators and a bypass path).]
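
Each formant resonator in the diagram is controlled by a frequency and a bandwidth (F1 and B1 for the first formant, and so on). A minimal sketch of such a block is the two-pole digital resonator y[n] = A x[n] + B y[n-1] + C y[n-2], with coefficients computed from frequency and bandwidth so that the gain at 0 Hz is 1. The cascade branch chains several of these. The formant values and the 10 kHz sample rate in the example are illustrative assumptions, and the function names are hypothetical.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    # Two-pole resonator coefficients with unity gain at 0 Hz.
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz):
    # y[n] = a*x[n] + b*y[n-1] + c*y[n-2]
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(samples, formants, sample_rate_hz):
    # Cascade branch: the output of one resonator feeds the next.
    for freq_hz, bandwidth_hz in formants:
        samples = resonate(samples, freq_hz, bandwidth_hz, sample_rate_hz)
    return samples


# Illustrative (frequency, bandwidth) pairs in Hz; not from the slides.
formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]
step_response = cascade([1.0] * 3000, formants, 10000.0)
```

Because each resonator has unity gain at 0 Hz, a constant input settles to the same constant at the output of the cascade, which is a convenient sanity check.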

19 Three Voicing Source Model In KLATT 88
The old KLSYN impulsive source
The KLGLOTT88 model
The modified LF model
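
The simplest of the three, an impulsive source, can be sketched as an impulse train at the fundamental frequency followed by a low-pass filter. The one-pole filter below is only a stand-in chosen for brevity and is not the filter used in the Klatt synthesizers. The function names and all numeric values are illustrative assumptions.

```python
def impulse_train(f0_hz, sample_rate_hz, n_samples):
    # One unit impulse at the start of every pitch period.
    period = round(sample_rate_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def one_pole_lowpass(samples, coeff=0.9):
    # y[n] = (1 - coeff) * x[n] + coeff * y[n-1]
    out, prev = [], 0.0
    for x in samples:
        prev = (1.0 - coeff) * x + coeff * prev
        out.append(prev)
    return out


train = impulse_train(100.0, 10000, 300)
source = one_pole_lowpass(train)
```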

