A Text-to-Speech Synthesis System


1 A Text-to-Speech Synthesis System
Presented by: Michael Beddaoui and Abdel-Aziz El-Solh

2 Presentation Outline
Introduction
Background
The three components of a TTS system:
  Text Pre-processing (Aziz)
  Prosody (Mike)
  Concatenation (Mike)
Summary
What has been done / Future Work
Conclusion
Questions

3 What is a TTS System?
Definition: a system that takes a sequence of words as input and converts it to speech
Applications:
  Services for the hearing impaired
  Reading aloud
Existing TTS systems:
  Festival
  Bell Labs TTS

4 Different TTS Systems
Phoneme-Based TTS System
Phonemes are:
  The minimal distinctive phonetic units
  Relatively small in number (39 phonemes in English)
Disadvantage: phonemes ignore the transitional sound between them!

5 Different TTS Systems (cont’d)
Diphone-Based TTS System
Diphones:
  Are made up of 2 phonemes
  Incorporate the transitional sound
  Make for better-sounding speech
Disadvantage: over 1500 diphones in the English language!

6 Fundamental Components
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation

7 Text Pre-Processing
Input: string of characters (sentence)
Output: string of diphone symbols
Objective:
  Perform sentence-level analysis (punctuation marks, pauses between words)
  Convert all input to corresponding diphones

8 Text Pre-Processing (Block Diagram)
[Block diagram: Number Converter (highlighted) -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS, with the Diphone Dictionary feeding the translator]

9 Number Converter
Replace numerals with their textual versions (example output: one hundred)
Handle fractional and decimal numbers (example output: point two five)
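The number-conversion step can be sketched as below. This is a minimal illustration, not the presenters' code: the function names, the supported range (0 to 999,999), and the choice to read decimal digits one at a time after "point" are all assumptions made for the example.

```python
# Hypothetical sketch of a number converter; names and range are assumptions.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def int_to_words(n):
    """Spell out a non-negative integer below one million."""
    if n < 0 or n >= 1000000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + int_to_words(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + int_to_words(rest)
    return int_to_words(thousands) + " thousand" + tail


def number_to_words(token):
    """Convert a numeral such as '100' or '.25' to its spoken form."""
    whole, dot, fraction = token.partition(".")
    words = []
    if whole:
        words.append(int_to_words(int(whole)))
    if dot and fraction:
        words.append("point")
        # Digits after the decimal point are read one at a time.
        words.extend(ONES[int(digit)] for digit in fraction)
    return " ".join(words)
```

For instance, number_to_words("100") gives "one hundred" and number_to_words(".25") gives "point two five", matching the two outputs named on the slide.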

10 Text Pre-Processing (Block Diagram)
[Block diagram as on slide 8; highlighted block: Acronym Converter]

11 Acronym Converter
Replace acronyms with single-letter components: A.B.C -> A B C
Change abbreviations to full textual format: Mr -> Mister
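A small sketch of the two rules on this slide follows. The abbreviation table holds only the "Mr" entry from the slide; the table name, the regular expression, and the function name are assumptions for illustration, and a real system would need a much larger table.

```python
import re

# Hypothetical abbreviation table; only "Mr" comes from the slide.
ABBREVIATIONS = {"mr": "Mister"}

# Assumed acronym shape: single letters separated by periods, e.g. "A.B.C".
ACRONYM = re.compile(r"[A-Za-z](?:\.[A-Za-z])+\.?")


def expand_token(token):
    """Expand one token: abbreviation -> full word, acronym -> letters."""
    key = token.rstrip(".").lower()
    if key in ABBREVIATIONS:
        return ABBREVIATIONS[key]
    if ACRONYM.fullmatch(token):
        # Replace the periods with spaces so each letter is spoken alone.
        return " ".join(token.replace(".", " ").split())
    return token
```

Tokens that match neither rule, such as ordinary words, are returned unchanged.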

12 Text Pre-Processing (Block Diagram)
[Block diagram as on slide 8; highlighted block: Word Segmenter]

13 Word Segmenter
Divide sentence into word segments
Special delimiter to separate segments (i.e. ‘||’)
Segments can be:
  A single word
  An acronym
  A numeral
Identify punctuation marks
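One way the segmenter could work is sketched below, using the '||' delimiter from the slide. The token pattern, the set of punctuation marks, and both function names are assumptions; the slides do not say how the presenters' segmenter recognizes tokens.

```python
import re

# Assumed token shapes: a word, numeral, or dotted acronym, or one punctuation mark.
TOKEN = re.compile(r"[A-Za-z0-9']+(?:\.[A-Za-z0-9]+)*|[.,;:!?]")
PUNCTUATION = set(".,;:!?")


def segment(sentence, delimiter="||"):
    """Join the segments of a sentence with the special delimiter."""
    return delimiter.join(TOKEN.findall(sentence))


def punctuation_marks(segmented, delimiter="||"):
    """Return the punctuation segments, in order of appearance."""
    return [part for part in segmented.split(delimiter) if part in PUNCTUATION]
```

Keeping punctuation as its own segment lets a later stage turn it into pauses without re-scanning the raw text.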

14 Text Pre-Processing (Block Diagram)
[Block diagram as on slide 8; highlighted block: Word to Diphone Translator (Phonetization)]

15 Word To Diphone Converter (Phonetization)
Purpose: translate words to their diphone representations
Resource: dictionary of words and their diphones (derived from CMU phoneme database)
Over 175,000 words supported
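To illustrate how a diphone representation relates to a phoneme sequence, the sketch below pairs each phoneme with its neighbour. The underscore naming follows the 'C_A' label that appears later in the slides; the '#' silence symbol, the one-entry pronunciation table, and the function name are assumptions, and the presenters' dictionary may encode diphones differently.

```python
# Hypothetical boundary symbol marking silence at the word edges.
SILENCE = "#"

# Hypothetical one-entry table with CMU-style phoneme symbols.
PRONUNCIATIONS = {"cat": ["K", "AE", "T"]}


def phonemes_to_diphones(phonemes):
    """Pair each phoneme with the next one, padding with silence at both ends."""
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    return [left + "_" + right for left, right in zip(padded, padded[1:])]
```

Each diphone spans the transition from one phoneme into the next, which is why a word with three phonemes yields four diphones once the silent edges are counted.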

16 W-to-D Converter Cont’d
Implementation: Binary Search Algorithm in C
  Start with the whole dictionary as the search range (start index, end index, middle index)
  If the target word is alphabetically less than the middle word, ignore the second half (i.e. end index = middle index); otherwise ignore the first half (i.e. start index = middle index)
  Repeat until the word is found or the range contains zero words
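The search described on this slide can be sketched as follows. The presenters' implementation was in C; this Python version is an illustration only, and the entry layout (a sorted list of word/diphone pairs) and the function name are assumptions. It uses a half-open range and moves the start index to middle + 1, which guarantees the range shrinks on every pass.

```python
def find_diphones(entries, target):
    """Binary search over (word, diphones) pairs sorted by word.

    Returns the diphone list for target, or None if the word is absent.
    """
    start, end = 0, len(entries)  # half-open search range [start, end)
    while start < end:
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if word == target:
            return diphones
        if target < word:
            end = middle          # ignore the second half
        else:
            start = middle + 1    # ignore the first half and the middle word
    return None                   # range contains zero words
```

With a dictionary of about 175,000 words, the loop runs at most 18 times, since each pass halves the range.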

17 W-to-D Converter Cont’d
Advantages:
  Fast search times: the search range halves with each iteration (max of 1 sec currently)
  Less complicated to implement than indexing the dictionary or importing the dictionary into an internal structure

18 Text Pre-Processing (Block Diagram)
[Block diagram as on slide 8; highlighted block: MLDS]

19 The Multi-Level Data Structure
Contains all necessary data for the next sub-system:
  Word
  Diphone representation
  Prosodic parameters for each diphone
This reflects both word-level and sentence-level prosody
Allows for modularization
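A possible shape for such a structure is sketched below. The three levels (sentence, word, diphone) follow the slide, but every class and field name is hypothetical, and the choice of pitch, duration, and amplitude scale factors as the prosodic parameters is an assumption based on the alterations covered later in the presentation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    """Lowest level: one diphone and its (assumed) prosodic parameters."""
    symbol: str
    pitch_scale: float = 1.0
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    """Middle level: a word and its diphone representation."""
    word: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class SentenceEntry:
    """Top level: the words of a sentence, in order."""
    words: List[WordEntry] = field(default_factory=list)


def build_word(word, symbols):
    """Create a WordEntry with neutral prosody for each diphone symbol."""
    return WordEntry(word, [DiphoneEntry(symbol) for symbol in symbols])
```

Because the next sub-system only reads this structure, the text pre-processing and prosody stages can be developed and tested separately.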

20 Prosody
[Flowchart. Blocks: MLDS, Diphone Retrieval, Diphone Database, Acoustic Manipulation, a "done" decision (yes / no), Concatenation]

21 Diphone Retrieval
Database of recorded diphones
Every diphone matched with a text file:
  Distinguished by type (CC, CV, VC, VV)
  References to specific components within the waveform
Store diphone waveform and prosodic parameters in variables

22 Properties of Speech Signals
e.g. cat.wav: c (non-periodic), a (periodic), t (non-periodic)

23 Acoustic Manipulation - MATLAB
Recognizes wave files (.WAV): load, play, write
Vast array of signal processing tools
Built-in functions
Ease of debugging
GUI-capable

24 Pitch/Duration/Amplitude Alteration
Pitch (vowels only):
  As pitch increases, pitch period shrinks
  As pitch decreases, pitch period expands
  Need to alter the length between pitch marks in order to alter the pitch of the speech signal

25 Altering Pitch
[Figure: original diphone ‘C_A’; extracted pitch period x Hanning window = Hanned pitch period]
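The windowing step in this figure can be sketched as below. The presenters worked in MATLAB; this plain-Python version is an illustration only, with assumed function names, and it uses the symmetric Hann formula so that the window tapers to zero at both ends of the extracted pitch period.

```python
import math


def hann_window(length):
    """Symmetric Hann (Hanning) window: zero at both ends, one at the centre."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def window_pitch_period(samples):
    """Multiply an extracted pitch period by a Hann window of the same length."""
    window = hann_window(len(samples))
    return [sample * weight for sample, weight in zip(samples, window)]
```

Tapering the edges to zero is what lets neighbouring pitch periods be overlapped and added without clicks at the joins.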

26 Altering Pitch Cont’d
PSOLA: Pitch Synchronous Overlap and Add
[Figure: Hanned pitch periods combined with 50% overlap and add]
Pitch up: overlap > 50%
Pitch down: overlap < 50%
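The overlap-and-add idea can be sketched as follows. This is a deliberately simplified illustration, not full PSOLA and not the presenters' MATLAB code: it repeats a single windowed pitch period at a fixed spacing, and the function name and parameters are assumptions. A larger overlap places the copies closer together, which shortens the spacing between pitch marks and so raises the pitch.

```python
def overlap_add(period, copies, overlap):
    """Overlap-add `copies` repetitions of one windowed pitch period.

    overlap is the fraction of the period shared with the next copy
    (0.5 means 50% overlap); it must lie in [0, 1).
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    length = len(period)
    # Spacing between the starts of successive copies, in samples.
    hop = max(1, round(length * (1.0 - overlap)))
    output = [0.0] * (hop * (copies - 1) + length)
    for k in range(copies):
        start = k * hop
        for i, sample in enumerate(period):
            output[start + i] += sample  # add where copies overlap
    return output
```

With an 8-sample period, a 50% overlap spaces the copies 4 samples apart, while a 75% overlap spaces them 2 samples apart.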

27 Altering Pitch Cont’d
[Figure: signal x Kaiser window, repeated x 12]
Naturally spoken vowels contain 12-18 pitch marks

28 Altering Duration / Altering Amplitude
Altering Duration
  Increase the number of PSOLA iterations (overlaps) to increase duration
  Decrease the number of PSOLA iterations (overlaps) to decrease duration
Altering Amplitude
  Multiply the signal by a constant
  If constant > 1, amplitude increases
  If constant < 1, amplitude decreases
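Both alterations reduce to simple arithmetic, sketched below under stated assumptions: the function names are hypothetical, and the duration formula assumes that pitch periods of equal length are overlap-added at a fixed spacing (the hop), which is a simplification of PSOLA.

```python
def scale_amplitude(samples, constant):
    """Multiply every sample by a constant (> 1 louder, < 1 quieter)."""
    return [constant * sample for sample in samples]


def synthesized_length(period_length, iterations, hop):
    """Number of output samples after overlap-adding `iterations` pitch periods.

    hop is the assumed fixed spacing, in samples, between successive periods.
    """
    if iterations < 1:
        return 0
    return hop * (iterations - 1) + period_length
```

Each extra iteration adds one hop of samples, so duration grows with the iteration count while the spacing between pitch marks, and therefore the pitch, stays the same.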

29 Concatenation
Diphones -> Words
  Using PSOLA at the joining ends
  Ensures smooth transition
Words -> Sentence
  Straight joining at the end points, due to the presence of pauses

30 Summary
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation
System modularized

31 Progress
Work Completed / Current Status
  Text pre-processing and prosodic manipulation for a multi-syllable word
  Diphone concatenation
  200+ diphones in database
  Fully functional GUI implemented
Work To Be Done
  Sentence-level synthesis
  Expand diphone database
  Fine-tuning and enhancing
  Prepare for Poster Fair
  Write final report

32 Questions?
Contact Information: Michael Beddaoui, Abdel-Aziz El-Solh
