A Text-to-Speech Synthesis System

Name: A Text-to-Speech Synthesis System
Uploaded: 2017-08-11T19:34:00+00:00
Duration: PTM10S41
Channel: Cecil Ross
Description: A Text-to-Speech Synthesis System

A Text-to-Speech Synthesis System
Presented By: Michael Beddaoui, Abdel-Aziz El-Solh

Presentation Outline
Introduction
Background
3 Components of TTS System
  Text Pre-processing (Aziz)
  Prosody (Mike)
  Concatenation (Mike)
Summary
What has been done / Future Work
Conclusion
Questions

What is a TTS System?
Definition: A system which takes a sequence of words as input and converts it to speech
Applications:
  Services for the hearing impaired
  Reading aloud
Commercial TTS Systems:
  Festival
  Bell Labs TTS

Different TTS Systems
Phoneme-Based TTS System
Phonemes are:
  The minimal distinctive phonetic units
  Relatively small in number (39 phonemes in English)
Disadvantage: Phonemes ignore transitional sound!

Different TTS Systems (cont’d)
Diphone-Based TTS System
Diphones:
  Are made up of 2 phonemes
  Incorporate transitional sound
  Make for better sounding speech
Disadvantage: Over 1500 diphones in the English language!

Fundamental Components
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation

Text Pre-Processing
Input: String of characters (sentence)
Output: String of diphone symbols
Objective:
  Perform sentence-level analysis (punctuation marks, pauses between words)
  Convert all input to corresponding diphones

Text Pre-Processing (Block Diagram)
Number Converter -> Acronym Converter -> Word Segmenter -> Word to Diphone Translator (Phonetization) -> MLDS
The Word to Diphone Translator reads from the Diphone Dictionary

Number Converter
Replace numerals with their textual versions (e.g. 100 -> one hundred)
Handle fractional and decimal numbers (e.g. .25 -> point two five)
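A minimal Python sketch of the number-conversion step, written only to illustrate the two examples on the slide ("one hundred", "point two five"). The function names and the 0-999 range limit are assumptions for illustration; the presenters' actual converter is not shown in the slides.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def integer_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 (assumed limit for this sketch)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + integer_to_words(rest)


def number_to_words(token: str) -> str:
    """Convert a numeral such as '100' or '.25' to words.

    Digits after the decimal point are read out one at a time.
    """
    whole, _, fraction = token.partition(".")
    parts = []
    if whole:
        parts.append(integer_to_words(int(whole)))
    if fraction:
        parts.append("point")
        parts.extend(ONES[int(digit)] for digit in fraction)
    return " ".join(parts)
```

For example, `number_to_words("100")` gives "one hundred" and `number_to_words(".25")` gives "point two five".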

Acronym Converter
Replace acronyms with single letter components (e.g. A.B.C -> A B C)
Change abbreviations to full textual format (e.g. Mr -> Mister)
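The two rules on this slide can be sketched in Python as follows. This is an illustrative sketch only: the abbreviation table is hypothetical apart from the "Mr" entry taken from the slide, and the regular expression is one possible way to recognise dotted acronyms.

```python
import re

# Hypothetical abbreviation table; only "Mr" -> "Mister" comes from the slide.
ABBREVIATIONS = {"Mr": "Mister", "Dr": "Doctor"}

# A dotted acronym: letter-dot pairs, optionally ending in a bare letter.
ACRONYM = re.compile(r"^(?:[A-Za-z]\.)+[A-Za-z]\.?$")


def expand_token(token: str) -> str:
    """Expand one whitespace-separated token."""
    if ACRONYM.match(token):
        # "A.B.C" -> "A B C": keep the letters, drop the dots.
        return " ".join(ch for ch in token if ch.isalpha())
    key = token.rstrip(".")
    if key in ABBREVIATIONS:
        return ABBREVIATIONS[key]
    return token


def convert_acronyms(sentence: str) -> str:
    """Apply acronym and abbreviation expansion to every token."""
    return " ".join(expand_token(token) for token in sentence.split())
```

With this sketch, `convert_acronyms("Mr Smith joined A.B.C today")` returns "Mister Smith joined A B C today".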

Word Segmenter
Divide sentence into word segments
Special delimiter to separate segments (i.e. ‘||’)
Segments can be:
  A single word
  An acronym
  A numeral
Identify punctuation marks
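One way to picture the segmenter is a tokenizer that classifies each segment and records punctuation separately. The Python sketch below assumes a regular-expression tokenizer and a particular return format; neither is taken from the slides, which only name the ‘||’ delimiter and the three segment kinds.

```python
import re

DELIMITER = "||"  # the segment delimiter named on the slide

# Assumed token patterns for this sketch, tried in this order.
TOKEN = re.compile(
    r"(?P<acronym>(?:[A-Za-z]\.){2,}[A-Za-z]?)"
    r"|(?P<numeral>\d+(?:\.\d+)?|\.\d+)"
    r"|(?P<word>[A-Za-z']+)"
    r"|(?P<punct>[.,;:!?])"
)


def segment(sentence: str):
    """Split a sentence into (kind, text) segments plus punctuation marks.

    Each punctuation mark is stored with the number of segments that
    precede it, so a later stage can place a pause at that position.
    """
    segments, punctuation = [], []
    for match in TOKEN.finditer(sentence):
        kind = match.lastgroup
        if kind == "punct":
            punctuation.append((len(segments), match.group()))
        else:
            segments.append((kind, match.group()))
    return segments, punctuation


def join_segments(segments) -> str:
    """Render the segments as one delimiter-separated string."""
    return DELIMITER.join(text for _, text in segments)
```

For the input "A.B.C paid 100, then left." the joined output is `A.B.C||paid||100||then||left`, with the comma and full stop recorded after segments 3 and 5.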

Word To Diphone Converter (Phonetization)
Purpose: Translate words to their diphone representations
Resource: Dictionary of words and their diphones (derived from CMU phoneme database)
Over 175,000 words supported
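The slides do not say how the diphone dictionary was derived from the CMU phoneme data. As a rough illustration, the Python sketch below assumes each diphone is simply a pair of adjacent phonemes, written with an underscore in the style of the ‘C_A’ label used later in the deck. The function name and the stress-digit handling are assumptions.

```python
def phonemes_to_diphones(phonemes):
    """Pair up adjacent phonemes into diphone symbols.

    Assumes CMU-style phoneme symbols, where vowels may carry a trailing
    stress digit (0, 1 or 2) that is dropped here.
    """
    cleaned = [symbol.rstrip("012") for symbol in phonemes]
    return [first + "_" + second for first, second in zip(cleaned, cleaned[1:])]
```

Under these assumptions, the phoneme sequence `["K", "AE1", "T"]` for "cat" becomes `["K_AE", "AE_T"]`.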

W-to-D Converter Cont’d
Implementation: Binary Search Algorithm in C
Start with whole dictionary as search range (start index, end index, middle index)
If target word is alphabetically less than middle word, then ignore second half (i.e. end index = middle index)
Else ignore first half (i.e. start index = middle index + 1)
Repeat until word found or range contains zero words
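The search described on this slide can be sketched as follows. The presenters implemented it in C; this Python version is only an illustration, and the dictionary layout (a sorted list of word and diphone-list pairs) and the sample entries are assumptions.

```python
def find_diphones(entries, target):
    """Binary search over entries sorted alphabetically by word.

    `entries` is a list of (word, diphones) pairs. Returns the diphone
    list for `target`, or None if the word is not in the dictionary.
    """
    start, end = 0, len(entries)  # search range is [start, end)
    while start < end:            # stop when the range contains zero words
        middle = (start + end) // 2
        word, diphones = entries[middle]
        if word == target:
            return diphones
        if target < word:
            end = middle          # ignore the second half
        else:
            start = middle + 1    # ignore the first half, including middle
    return None


# Hypothetical sample entries, sorted by word.
SAMPLE = [
    ("bat", ["B_AE", "AE_T"]),
    ("cat", ["K_AE", "AE_T"]),
    ("sat", ["S_AE", "AE_T"]),
]
```

Moving `start` past the middle entry guarantees the range shrinks on every pass, so the loop always terminates.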

W-to-D Converter Cont’d
Advantages:
  Fast search times: the search range halves with each iteration, so a 175,000-word dictionary needs at most 18 comparisons (max of 1 sec currently)
  Less complicated to implement, compared to indexing the dictionary or importing the dictionary into an internal structure

The Multi-Level Data Structure
Contains all necessary data for the next sub-system:
  Word
  Diphone representation
  Prosodic parameters for each diphone
This reflects both word-level and sentence-level prosody
Allows for modularization
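The slides do not give the layout of the MLDS, so the following Python sketch is only one plausible shape for it: a sentence holding words, each word holding diphones, each diphone carrying prosodic parameters. All class and field names are hypothetical; the three parameters mirror the pitch, duration and amplitude alterations discussed later in the deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiphoneEntry:
    """One diphone with its prosodic parameters (hypothetical fields)."""
    symbol: str
    pitch_scale: float = 1.0
    duration_scale: float = 1.0
    amplitude_scale: float = 1.0


@dataclass
class WordEntry:
    """Word level: the text and its diphone representation."""
    text: str
    diphones: List[DiphoneEntry] = field(default_factory=list)


@dataclass
class SentenceEntry:
    """Sentence level: the ordered words of one sentence."""
    words: List[WordEntry] = field(default_factory=list)


def build_word(text: str, symbols: List[str]) -> WordEntry:
    """Create a word entry with neutral prosody for each diphone."""
    return WordEntry(text, [DiphoneEntry(symbol) for symbol in symbols])
```

Keeping every level in one structure is what lets the prosody stage run without calling back into text pre-processing.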

Prosody (Block Diagram)
Elements: MLDS, Diphone Retrieval (reads from the Diphone Database), Acoustic Manipulation, a "done?" decision (yes / no), Concatenation

Diphone Retrieval
Database of recorded diphones
Every diphone matched with txt file
Distinguished by type (CC, CV, VC, VV)
References to specific components within waveform
Store diphone waveform and prosodic parameters in variables
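The four diphone types combine consonant (C) and vowel (V) halves. A small Python sketch of that classification follows. It assumes diphone symbols are two ARPAbet phonemes joined by an underscore, which the slides do not confirm; the vowel set is the standard ARPAbet vowel inventory.

```python
# ARPAbet vowel symbols (without stress digits).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}


def diphone_type(symbol: str) -> str:
    """Classify a diphone symbol such as 'K_AE' as CC, CV, VC or VV."""
    first, second = symbol.split("_")
    return "".join("V" if phoneme in VOWELS else "C"
                   for phoneme in (first, second))
```

For example, `diphone_type("K_AE")` returns "CV" and `diphone_type("AE_T")` returns "VC".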

Properties of Speech Signals
e.g. cat.wav: c (non-periodic), a (periodic), t (non-periodic)

Acoustic Manipulation - MATLAB
Recognizes wave files (.WAV): load, play, write
Vast array of signal processing tools
Built-in functions
Ease of debugging
GUI-capable

Pitch/Duration/Amplitude Alteration
Pitch: vowels only
  As pitch increases, pitch period shrinks
  As pitch decreases, pitch period expands
  Need to alter length between pitch marks in order to alter pitch of speech signal

Altering Pitch
[Figure: pitch period extracted from original diphone ‘C_A’ x Hanning window = Hanned pitch period]
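The windowing step in the figure multiplies an extracted pitch period by a Hanning (Hann) window, sample by sample. The presenters worked in MATLAB; the plain-Python sketch below only illustrates the operation, and the function names are made up for this example.

```python
import math


def hann_window(length: int):
    """Return a symmetric Hann window of the given length."""
    if length == 1:
        return [1.0]
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
            for n in range(length)]


def apply_hann(pitch_period):
    """Multiply a pitch period (list of samples) by a Hann window."""
    window = hann_window(len(pitch_period))
    return [sample * weight for sample, weight in zip(pitch_period, window)]
```

The window tapers both ends of the pitch period to zero, which is what allows neighbouring periods to be overlapped and added without clicks at the boundaries.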

Altering Pitch Cont’d
PSOLA: Pitch Synchronous Overlap and Add
[Figure: Hanned pitch periods combined with 50% overlap + add]
Pitch up: overlap > 50%
Pitch down: overlap < 50%
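The overlap-and-add step can be sketched in Python as below. This is a simplified illustration, not the presenters' MATLAB code: it assumes all windowed pitch periods have the same length and a single overlap fraction applies throughout, and the function name is hypothetical.

```python
def overlap_add(periods, overlap):
    """Overlap-add equal-length windowed pitch periods.

    `overlap` is the fraction of each period shared with the next one
    (0.5 means 50%). A larger overlap places the periods closer together.
    """
    if not periods:
        return []
    length = len(periods[0])
    if any(len(period) != length for period in periods):
        raise ValueError("this sketch expects equal-length periods")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, round(length * (1.0 - overlap)))  # spacing between periods
    output = [0.0] * (hop * (len(periods) - 1) + length)
    for index, period in enumerate(periods):
        offset = index * hop
        for n, sample in enumerate(period):
            output[offset + n] += sample
    return output
```

With more than 50% overlap the periods start closer together, so the spacing between pitch marks shrinks and the pitch rises; with less than 50% the spacing grows and the pitch falls, matching the rule on the slide.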

Altering Pitch Cont’d
[Figure: signal x Kaiser window, x 12]
Naturally spoken vowels contain 12-18 pitch marks

Altering Duration
Increase number of PSOLA iterations (overlaps) to increase duration
Decrease number of PSOLA iterations (overlaps) to decrease duration
Altering Amplitude
Multiply the signal by a constant
If constant > 1, amplitude increases
If constant < 1, amplitude decreases
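Both alterations come down to simple arithmetic, sketched here in Python. The function names and the fixed-length pitch period are assumptions for illustration; the sample count follows from overlap-add with a constant spacing between periods.

```python
def scale_amplitude(signal, constant):
    """Multiply every sample by a constant (>1 louder, <1 quieter)."""
    return [constant * sample for sample in signal]


def psola_output_length(period_length, iterations, overlap=0.5):
    """Number of output samples after overlap-adding `iterations` periods.

    Assumes every period has `period_length` samples and the same overlap
    fraction, so each extra iteration adds one hop of samples.
    """
    if iterations < 1:
        raise ValueError("need at least one iteration")
    hop = max(1, round(period_length * (1.0 - overlap)))
    return hop * (iterations - 1) + period_length
```

For a 100-sample pitch period at 50% overlap, 12 iterations give 650 samples and 18 iterations give 950, so adding iterations lengthens the vowel without changing the spacing between pitch marks.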

Concatenation
Diphones -> Words
  Using PSOLA at the joining ends
  Ensures smooth transition
Words -> Sentence
  Straight joining at the end points due to presence of pauses
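The two joining strategies can be contrasted with a short Python sketch. Note the substitution: a plain linear crossfade stands in for the PSOLA-based join used by the presenters, purely to show the idea of blending at the joining ends. Function names and the pause length parameter are hypothetical.

```python
def crossfade_join(left, right, overlap):
    """Join two sample lists, blending `overlap` samples at the seam.

    Linear crossfade used as a simplified stand-in for a PSOLA join.
    """
    overlap = min(overlap, len(left), len(right))
    if overlap <= 0:
        return list(left) + list(right)
    blended = []
    for n in range(overlap):
        fade_in = (n + 1) / (overlap + 1)
        tail_sample = left[len(left) - overlap + n]
        blended.append(tail_sample * (1.0 - fade_in) + right[n] * fade_in)
    return list(left[:len(left) - overlap]) + blended + list(right[overlap:])


def join_words(words, pause_samples):
    """Join word waveforms end to end with silence between them."""
    sentence = []
    for index, word in enumerate(words):
        if index > 0:
            sentence.extend([0.0] * pause_samples)
        sentence.extend(word)
    return sentence
```

Because words are separated by pauses, the silence hides any discontinuity at word boundaries, which is why straight joining is enough at the sentence level.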

Summary
TTS System: words -> Text Pre-processing -> Prosody -> Concatenation
System modularized

Progress
Work Completed / Current Status
  Text pre-processing and prosodic manipulation for a multi-syllable word
  Diphone concatenation
  200+ diphones in database
  Fully functional GUI implemented
Work To Be Done
  Sentence level synthesis
  Expand diphone database
  Fine-tuning and enhancing
  Prepare for Poster Fair
  Write final report

Questions?
Contact Information: Michael Beddaoui, Abdel-Aziz El-Solh
