Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pronouncing Words in TTS Systems

Similar presentations


Presentation on theme: "Pronouncing Words in TTS Systems"— Presentation transcript:

1 Pronouncing Words in TTS Systems
Julia Hirschberg CS 4706 9/21/2018

2 Today Motivation Improve TTS intelligibility and naturalness
An application: Language Learning Challenges for automatic word pronunciation Standard methods Pronunciation by rule Pronouncing dictionaries Innovative solutions Pronunciation by language origin Pronunciation by rhyming analogy Expanding the lexicon using Active Learning techniques 9/21/2018

3 Motivation Intelligibility Naturalness
Applications to language learning Unlimited vocabulary Type a word or phrase and hear it spoken in your target language To imitate To learn to recognize 9/21/2018

4 Converting Text to Phonemes
Pronouncing numbers in different contexts Identifying proper names Expanding abbreviations and acronyms correctly 9/21/2018

5 Numbers Pronouncing numbers in different contexts
In 1996 she sold 1995 shares and deposited $42 in her 401(k). The number is That cc # is Visa , expiration 2/07. Conventions: Years Money Phone numbers Money amounts 9/21/2018

6 Cultural Dependence Russia:
Article 3 of the rules attached to the Moscow Telephone Network Subscribers Directory, 1916: “Numbers over a hundred are to be pronounced as follows: 1.23—one twenty three, 9.72—nine seventy two, 70.09—seventy zero nine. In numbers over 10,000 every figure of a hundred should be pronounced separately, for example, —one twenty forty eight, —two zero eight thirty five, —three thirty five twenty nine, —four forty nine fifty two, —five fifteen eighty six etc., not one hundred and twenty forty eight, two hundred and eight thirty five etc.” 9/21/2018

7 In France A French phone number is 10 digits given in series of two:
"Zéro un, quarante-trois, quarante-huit, douze, quatre-vingt-cinq". Numbers in addresses are always pronounced as a full number: Chambre 823, 240 rie Rivoli Chambre huit-cent-vingt-trois. Deux-cent-quarante, rue de Rivoli 9/21/2018

8 Pronouncing Words Part-of-speech: use, close, dove, multiply, coax
Origin: shoe (ME shoo), phoenix (Gr) mole, attaches, resume Morphological analysis: ferryboat, ferryboats Popemobile Letter-to-sound correspondences: oo, th, qu, e (beet, bet, bite, weigh,…) 9/21/2018

9 Conventions for numbers and symbols: &c, evalu8, cu, tsp
Genre: , chat, recipe, want ad, software license… 9/21/2018

10 Word Sense Ambiguity and Pronunciation
Homographs bass/bass Nice/nice desert/desert Homograph disambiguation 9/21/2018

11 Letter-to-Sound Rules
E.g. I _{C}e$  /ai/ rise Else I  /ih/ rip Advantages Pronounces anything, seen or not Disadvantages Must be built by hand How to encode all the exceptions: E.g. ripen/risen/riser/river 9/21/2018

12 Exceptions dictionary
Proper names: Nice, Ramirez, Ribeiro, Rise, Infiniti Solutions More complex rules Exceptions dictionary Consulted first But how handle morphological analysis? Rise’s hat 9/21/2018

13 Dictionary-based Approaches
Rely on very large dictionary Disadvantages Hand labor to create entries Redundancy Cat, cats, cat’s, cats’ Out-of-vocabulary items Proper names: covering all U.K. surnames would require >5,000,000 entries New words: fax, , mudd, … Technical terms: liposuction, anova, bernaise Foreign borrowings: frappe, ciao, louche 9/21/2018

14 Morphological preprocessing before dictionary look-up
Solutions Morphological preprocessing before dictionary look-up Fall back to L2Sound rules if no dictionary ‘hit’ 9/21/2018

15 More Innovative Approaches
Pronouncing OOV words Handling proper names Inferring country of origin: Takashita, Leroy, Kirov, Lima, Infiniti Pronunciation by analogy Analog/dialog Risible/visible Proper names: Alifano/Califano 9/21/2018

16 Bootstrapping Phonetic Lexicons (Maskey et al ’04)
For some languages, online pronouncing lexicons exist – but for others….e.g. Nepali How to minimize effort in creating lexicons? Idea Given a native speaker and a large amount of online text in the language… Native speaker builds small lexicon by hand for seed set of N most common words in text, e.g. is: /izh/ the: /dhax/ 9/21/2018

17 Derive L2S rules from lexicon automatically, e.g.
is  ih{zh} the  {dh}ax … Loop: Choose the next N most common set of words from the text and use the lexicon + L2S rules to predict pronunciations, e.g. telephone -> /telaxfown/ He -> /hax/? Rise -> /rihzhax/? Assign a confidence score to each prediction by comparing each word to all words in lexicon If is -> /ihzh} in lexicon and no other orthographically similar words are pronounced differently, new rule his -> /hihzh/ scores high For low confidence pronunciations, Active Learning step: Inspect and calculate error rate Hand correct errors and add all to lexicon 9/21/2018

18 Results English: German: Nepali
Build a new set of L2S rules from augmented lexicon Iterate from Loop until performance stabilizes Results English: 94% success on test set after 23 iterations, 16K entry lexicon Performance comparable to CMUDict and 1/7 the size German: 90% accuracy after 13 iterations, 28K lexicon Nepali 9/21/2018

19 94.6% accuracy after 16 iterations, 5K lexicon
9/21/2018

20 Improving Pronunciation Dictionary Coverage (Fackrell and Skut ’04)
Idea: Many proper names have more than one spelling (e.g. More, Moore; Smith Smythe) Find a mapping between OOV (Out of Vocabulary) spellings and alternate spellings – a ‘fuzzy’ match Identify spelling alternations that are ‘pronunciation-neutral’ in an existing lexicon to produce rewrite rules for OOV words 9/21/2018

21 How do current systems do on pronunciation?
Loquendo (temporarily unavailable) CEPSTRAL AT&T Naturally Speaking 9/21/2018

22 Next Class Accenting and information status 9/21/2018


Download ppt "Pronouncing Words in TTS Systems"

Similar presentations


Ads by Google