Applying the Pronunciation Lexicon Specification to ASR & TTS. Patrizio Bergallo. Monday, August 20, 2007. SpeechTEK ASTS - Advances in Text-to-Speech.

Presentation transcript:

1 Applying the Pronunciation Lexicon Specification to ASR & TTS
Patrizio Bergallo
Monday, August 20, 2007
SpeechTEK ASTS - Advances in Text-to-Speech Processing

2 Agenda
Loquendo Today
Introduction to PLS
  –Reference Scenario
  –Pronunciation Lexicons
  –International Phonetic Alphabet
Overview of PLS
  –How does TTS use PLS?
  –How does ASR use PLS?
Examples of Use
Latest Improvements

3 Loquendo Today
Global company of the Telecom Italia group, leader in Europe and South America in the speech technologies market
Founded in 2001 from Telecom Italia Labs, benefiting from know-how gained in more than 30 years of research
Complete set of multilingual speech technologies on a wide spectrum of devices; 25 patents, 50 voices and 20 languages
Full support for international standards (MRCPv1/v2, VoiceXML 2.0/2.1, CCXML, SSML, SRGS, SISR)
Ready for challenging future scenarios: multimodality, security
100 employees, with strong growth throughout 2007
HQ in Turin; offices in the US, Spain, Germany and France; a worldwide network of partners

4 Reference Scenario
Many speech applications need to specify pronunciation for words and phrases
  –Surnames, locations, company names
  –Acronyms
  –Names in specific contexts (restaurants, sports, movie titles, etc.)
  –Foreign words, mixed languages
Pronunciation is critical for both TTS and ASR
  –Improves reading of prompts by TTS
  –Improves ASR performance
VoiceXML 2.0/2.1 applications are the reference scenario
  –Prompts are based on SSML 1.0 (or, in future, SSML 1.1)
  –Recognition grammars are based on SRGS 1.0

5 Pronunciation Lexicons
Pronunciation Lexicon
  –A mapping between words (or short phrases), their written representations, and their pronunciations, suitable for use by an ASR engine or a TTS engine
Pronunciation lexicons are not only useful for voice browsers
  –They have also proven to be effective mechanisms for supporting accessibility for the differently abled, as well as greater usability for all users
  –They are used to good effect in screen readers and user agents supporting multimodal interfaces
The W3C Pronunciation Lexicon Specification (PLS) Version 1.0 is designed to enable interoperable specification of pronunciation lexicons

6 Pronunciation Lexicon Specification
W3C specification status
  –Second Last Call Working Draft (26 October 2006)
  –The Implementation Report Plan and the Disposition of Comments are currently under development (all public comments have been addressed)
  –Candidate Recommendation expected in 3Q07
Part of the first version of the Speech Interface Framework (Larson, 2000)
[Diagram legend: W3C Recommendation; W3C Last Call Working Draft]

7 International Phonetic Alphabet
Pronunciation is represented by a phonetic alphabet
  –Standard phonetic alphabets: International Phonetic Alphabet (IPA)
  –Well-known phonetic alphabets: SAMPA (ASCII based, simple to write), Pinyin (Mandarin Chinese), JEITA (Japanese), etc.
  –Proprietary phonetic alphabets
International Phonetic Alphabet (IPA)
  –Created by the International Phonetic Association (founded in 1886), a collaborative effort by all the major phoneticians around the world
  –Universally agreed system of notation for the sounds of languages
  –Covers all languages
  –Requires Unicode to write it
  –Normatively referenced by PLS

8 Overview of PLS
A PLS document is a container (<lexicon>) of several lexical entries (<lexeme>)
Each lexical entry contains
  –One or more spellings (<grapheme>)
  –One or more pronunciations (<phoneme>) or substitutions (<alias>)
Each PLS document is related to a single language (xml:lang)
SSML 1.0 and SRGS 1.0 documents can reference one or more PLS documents
The current version doesn't include morphological, syntactic or semantic information associated with pronunciations

9 PLS Example
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon2007@@@@/pls.xsd"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
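As a rough illustration of how a processor might load such a document, the Python sketch below parses a small PLS lexicon with the standard-library xml.etree.ElementTree module and builds a table from spellings to pronunciations and substitutions. The function name load_lexicon and the dictionary layout are assumptions made for this example; PLS itself does not prescribe any in-memory representation.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Trimmed copy of the slide's example (schema location omitted for brevity).
SAMPLE_PLS = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>Sepulveda</grapheme>
    <phoneme>səˈpʌlvɪdə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
"""


def load_lexicon(pls_text):
    """Hypothetical helper: return (language, table), where table maps each
    grapheme to the phonemes and aliases of the lexemes that contain it."""
    root = ET.fromstring(pls_text)
    ns = {"pls": PLS_NS}
    table = {}
    for lexeme in root.findall("pls:lexeme", ns):
        phonemes = [p.text for p in lexeme.findall("pls:phoneme", ns)]
        aliases = [a.text for a in lexeme.findall("pls:alias", ns)]
        # Every pronunciation in a lexeme applies to every grapheme in it.
        for grapheme in lexeme.findall("pls:grapheme", ns):
            entry = table.setdefault(grapheme.text, {"phonemes": [], "aliases": []})
            entry["phonemes"].extend(phonemes)
            entry["aliases"].extend(aliases)
    return root.get("{%s}lang" % XML_NS), table


language, lexicon = load_lexicon(SAMPLE_PLS)
print(language, lexicon["Sepulveda"]["phonemes"], lexicon["W3C"]["aliases"])
```

Keeping document order inside each list matters, because the TTS selection rule described later in the deck depends on it.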

10 How does TTS use PLS?
SSML 1.0
  The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Benigni.
PLS 1.0
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>ˈlɑ ˈviːɾə ˈʔeɪ ˈbɛlə</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>

11 How does ASR use PLS?
SRGS 1.0
  <one-of>
    <item>Terminator 2: Judgment Day</item>
    <item>Pluto's Judgement Day</item>
  </one-of>
PLS 1.0
  <lexeme>
    <grapheme>judgment</grapheme>
    <grapheme>judgement</grapheme>
    <phoneme>ˈdʒʌdʒ.mənt</phoneme>
  </lexeme>

12 Examples of Use
Multiple pronunciations for the same orthography
Multiple orthographies
Homophones
Homographs
Acronyms, Abbreviations, etc.

13 Multiple pronunciations for the same orthography
Multiple pronunciations are represented by more than one <phoneme> or <alias> element
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>ˈnjuːtən</phoneme>
    <phoneme>ˈnuːtən</phoneme>
  </lexeme>

14 Multiple orthographies
Alternative textual representations for the same word or phrase are represented by more than one <grapheme> inside the same <lexeme>
All the pronunciations given within the <lexeme> apply to each and every <grapheme> within the <lexeme>
  <lexeme>
    <grapheme>nihongo</grapheme>
    <grapheme>日本語</grapheme>
    <grapheme>にほんご</grapheme>
    <phoneme>ɲihoŋo</phoneme>
  </lexeme>

15 Homophones
Words with the same pronunciation but different meanings are represented as different lexemes
  <lexeme>
    <grapheme>cede</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>seed</grapheme>
    <phoneme>siːd</phoneme>
  </lexeme>

16 Homographs (1/2)
Words with the same spelling but pronounced in different ways are represented using the role attribute of the <lexeme> element
This mechanism allows defined taxonomies of word classes (part of speech, meaning, etc.) to be referenced
<lexicon version="1.0"
    xmlns:claws="http://www.example.com/claws7tags"
    alphabet="x-myorganization-pinyin" xml:lang="zh-CN">
  <lexeme role="…">
    <grapheme>处</grapheme>
    <phoneme>chu3</phoneme>
  </lexeme>
  <lexeme role="…">
    <grapheme>处</grapheme>
    <phoneme>chu4</phoneme>
  </lexeme>
</lexicon>

17 Homographs (2/2)
<lexicon uri="http://www.example.com/lexicon.pls" type="application/pls+xml" xml:id="mylex"/>
  他这个人很不好相处。 (He is a very hard person to get along with.)
  此处不准照相。 (No photography here.)
SSML 1.1 will support the role attribute
Currently PLS doesn't define or mandate any taxonomy
PLS generally defines role values as qualified names (QNames)
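One way to picture role-based homograph selection is the Python sketch below. It is only an illustration: the role names "ex:verb" and "ex:noun" are hypothetical stand-ins (PLS mandates no taxonomy), the tuple layout is invented for this example, and falling back to the first lexeme in document order when no role matches is an assumption rather than something stated on the slides.

```python
# Each entry mirrors one <lexeme>: (set of role QNames, grapheme, pronunciation).
# The role names are hypothetical placeholders, not values from the original slide.
LEXEMES = [
    ({"ex:verb"}, "处", "chu3"),  # as in 相处 ("to get along")
    ({"ex:noun"}, "处", "chu4"),  # as in 此处 ("this place")
]


def pronounce(grapheme, role=None):
    """Pick a pronunciation for a grapheme, preferring a lexeme whose role matches."""
    candidates = [(roles, pron) for roles, g, pron in LEXEMES if g == grapheme]
    if not candidates:
        return None  # not in the lexicon: left to the processor's own rules
    if role is not None:
        for roles, pron in candidates:
            if role in roles:
                return pron
    # Assumed fallback: first matching lexeme in document order.
    return candidates[0][1]


print(pronounce("处", "ex:verb"))
print(pronounce("处", "ex:noun"))
```

In a real deployment the role would come from the referencing document (for example the SSML 1.1 role attribute mentioned above), and QName prefixes would have to be resolved against their namespace declarations before comparison.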

18 Acronyms, Abbreviations, etc.
Pronunciations expressed as a sequence of other orthographies (acronyms, abbreviations, etc.) are represented by the <alias> element
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
  <lexeme>
    <grapheme>101</grapheme>
    <alias>one hundred and one</alias>
  </lexeme>

19 Latest Improvements
The W3C Last Call Working Draft stage allows public comments to be addressed
  –The large majority were clarifications
  –New functionality was deferred to a future version of the PLS specification
Major clarifications concerned
  –<alias> recursion
  –Multiple pronunciations
Changes are subject to formal approval by the Working Group
Next Steps
  –PLS 1.0 is very close to Candidate Recommendation stage
  –SSML 1.1 will provide more complete support for PLS 1.0

20 <alias> recursion
Pronunciations of the <alias> element contents MUST be generated by the processor, using pronunciations described by the <phoneme> element of any constituent graphemes in the PLS document, and without invoking recursive access to the PLS document on the <alias> elements of any constituent graphemes
  <lexeme>
    <grapheme>GNU</grapheme>
    <alias>GNU is Not Unix</alias>
    <phoneme>gəˈnuː</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Unix</grapheme>
    <grapheme>UNIX</grapheme>
    <alias>a multiplexed information and computing service</alias>
    <phoneme>ˈjuːnɪks</phoneme>
  </lexeme>
The GNU alias is pronounced: gəˈnuː is Not ˈjuːnɪks
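The non-recursive rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a conforming implementation: the dictionary layout and the function name expand_alias are invented here, and whitespace tokenization stands in for whatever matching of constituent graphemes a real processor performs.

```python
# Hypothetical in-memory form of the two lexemes from the slide.
UNIX_ENTRY = {
    "alias": "a multiplexed information and computing service",
    "phoneme": "ˈjuːnɪks",
}
LEXICON = {
    "GNU": {"alias": "GNU is Not Unix", "phoneme": "gəˈnuː"},
    "Unix": UNIX_ENTRY,
    "UNIX": UNIX_ENTRY,
}


def expand_alias(alias_text):
    """Resolve each token of an alias through <phoneme> entries only.

    The <alias> of a constituent grapheme is never followed, so the
    self-referencing GNU entry cannot cause infinite expansion.
    """
    result = []
    for token in alias_text.split():  # assumption: whitespace tokenization
        entry = LEXICON.get(token)
        if entry is not None and entry.get("phoneme"):
            result.append(entry["phoneme"])
        else:
            result.append(token)  # left to the processor's default rules
    return result


print(" ".join(expand_alias(LEXICON["GNU"]["alias"])))
```

Because only phoneme entries are consulted, "Unix" inside the GNU alias resolves to its phoneme and not to its own alias text.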

21 Multiple pronunciations (1/2)
ASR
  –If more than one pronunciation for a given <lexeme> is specified, an ASR processor MUST consider each of them as valid pronunciations for the <grapheme>
TTS
  –If more than one pronunciation for a given <lexeme> is specified, a TTS processor MUST use the first one in document order that has the prefer attribute set to "true"
  –If none of the pronunciations has prefer set to "true", the TTS processor MUST use the first one in document order, unless the TTS processor is documented as having a method of selecting pronunciations, in which case the processor MUST use any one of the pronunciations
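The two selection rules can be written down as a short Python sketch. It assumes a simple Pronunciation record invented for this example, and it deliberately leaves out the exception for TTS processors that document their own selection method. The prefer flag on the second Newton pronunciation is an assumption added for illustration; the Newton example in this deck carries no such flag.

```python
from typing import List, NamedTuple


class Pronunciation(NamedTuple):
    value: str
    prefer: bool = False  # mirrors the prefer attribute


def asr_pronunciations(prons: List[Pronunciation]) -> List[str]:
    """ASR: every listed pronunciation is a valid alternative."""
    return [p.value for p in prons]


def tts_pronunciation(prons: List[Pronunciation]) -> str:
    """TTS: first in document order with prefer set, else first in document order."""
    for p in prons:
        if p.prefer:
            return p.value
    return prons[0].value


# Document order is list order; the prefer flag here is hypothetical.
newton = [Pronunciation("ˈnjuːtən"), Pronunciation("ˈnuːtən", prefer=True)]
print(asr_pronunciations(newton))
print(tts_pronunciation(newton))
```

With no prefer flag on either entry, tts_pronunciation would return the first pronunciation, matching the fallback described above.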

22 Multiple pronunciations (2/2)
An ASR processor will recognize both pronunciations, whereas a TTS processor will only use the first one (because it is the first in document order that has prefer set to "true").
  lead led liːd led

23 References
PLS 1.0 Second Last Call Working Draft (26 October 2006)
  –http://www.w3.org/TR/2006/WD-pronunciation-lexicon-20061026/
Voice Browser Activity Page (VoiceXML, SSML, SRGS, …)
  –http://www.w3.org/Voice/
International Phonetic Association
  –http://www.arts.gla.ac.uk/IPA/
VoiceXML Forum
  –http://www.voicexml.org/

24 Final Remarks
THANK YOU
For more information, please
  –Visit Loquendo's booth #509
  –Keep an eye on: www.loquendo.com
  –Contact us: patrizio.bergallo@loquendo.com

