Pronunciation Lexicon Background
Paolo Baggia, Loquendo
W3C SSML Workshop, Beijing – 2-3 Nov 2005


Overview
– Introduction to Pronunciation Lexicon
– Pronunciation Alphabets
– The PLS language
– Issues for the workshop

Introduction to the Pronunciation Lexicon Specification
The PLS spec is about a "Pronunciation Lexicon":
– How to pronounce words and phrases
– How to deal with the variability of pronunciations by country, region, person, etc.
– How to expand abbreviations and acronyms
Two main uses:
– Speech Synthesis (SSML documents)
– Speech Recognition (SRGS grammars)
– Other uses are possible (embedded in or referenced from other markup)

The TTS perspective
A TTS engine's job is to transform an "input text" into speech. This involves a lot of processing, including:
– Text normalization
– Word pronunciation (lexical stress, phonetic transcription)
– Sentence structure (intonation, rhythm)
– Sentence-level modification of the phonetic transcription (co-articulation)
– Computation of prosodic parameters
– Generation of the acoustic signal
SSML documents enable TTS enhancement, acting on several levels of processing through SSML markup elements.
PLS enhances SSML at the text normalization and phonetic transcription levels.

An SSML example document
This is a simple SSML document:
    The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Roberto Benigni.
This is an enhancement of the same example:
    The title of the movie is: La vita è bella (Life is beautiful), which is directed by Roberto Benigni.
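One way to enhance this document, sketched under stated assumptions: the SSML 1.0 <phoneme> element gives the Italian title and the director's name an explicit pronunciation, so an English TTS voice does not have to guess it. The IPA strings below are approximate Italian pronunciations written for illustration only; they are not taken from the original slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <p>
    The title of the movie is:
    <!-- Italian title: pronunciation given inline in IPA (approximate) -->
    <phoneme alphabet="ipa" ph="la ˈvita ɛ ˈbɛlla">La vita è bella</phoneme>
    (Life is beautiful),
    which is directed by
    <!-- Director's name: approximate Italian pronunciation -->
    <phoneme alphabet="ipa" ph="roˈbɛrto beˈniɲɲi">Roberto Benigni</phoneme>.
  </p>
</speak>
```

Removing the two <phoneme> wrappers gives back a plain SSML document. The drawback of this approach is that every document must repeat the inline pronunciations, which is exactly what an external PLS document avoids.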

An SSML example with PLS
This is a simple SSML document that references an external Pronunciation Lexicon:
    The title of the movie is: "La vita è bella" (Life is beautiful), which is directed by Roberto Benigni.
– PLS factors all the pronunciation changes out into an external document
– The TTS engine loads the PLS document(s) and applies them transparently to the SSML document
– An application may define contextual PLS documents to be used at different points of the interaction
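A minimal sketch of the same document referencing an external lexicon through the SSML 1.0 <lexicon> element, which must come before all other content of <speak>. The URI http://www.example.com/movie_lexicon.pls is a hypothetical placeholder; application/pls+xml is the media type for PLS documents.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Must precede all other content of <speak>; the URI is hypothetical -->
  <lexicon uri="http://www.example.com/movie_lexicon.pls"
           type="application/pls+xml"/>
  <p>
    The title of the movie is:
    "La vita è bella" (Life is beautiful),
    which is directed by Roberto Benigni.
  </p>
</speak>
```

The text itself carries no pronunciation markup: the TTS engine is expected to match "La vita è bella" and "Roberto Benigni" against the <grapheme> entries of the loaded lexicon.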

The ASR perspective
An ASR engine's job is to transform an audio signal into text or into a semantic representation of the meaning of the sentence.
– Using SRGS grammars constrains the sentences to be recognized and improves ASR performance
– PLS further improves ASR performance by allowing multiple pronunciations of words and phrases, and by handling abbreviations and text normalization

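An SRGS example grammar
This is a very simple SRGS grammar:
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             xml:lang="en-US" version="1.0" root="city_state" mode="voice">
      <rule id="city" scope="public">
        <one-of>
          <item>Boston</item>
          <item>Miami</item>
          <item>Fargo</item>
        </one-of>
      </rule>
      <rule id="state" scope="public">
        <one-of>
          <item>Florida</item>
          <item>North Dakota</item>
          <item>Massachusetts</item>
        </one-of>
      </rule>
      <rule id="city_state" scope="public">
        <ruleref uri="#city"/> <ruleref uri="#state"/>
      </rule>
    </grammar>
The grammar recognizes sentences like:
– "Boston Massachusetts" or "Miami Florida"
but also:
– "Boston Florida" or "Fargo Massachusetts"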

An SRGS example with PLS
This is the same SRGS grammar, referencing an external Pronunciation Lexicon. The lexicon allows different pronunciations of words, to accommodate many different speakers.
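A minimal sketch of how the city/state grammar could reference an external lexicon. In SRGS 1.0 the <lexicon> element goes in the grammar header, before the rules, and the rules themselves stay unchanged. The URI is a hypothetical placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" root="city_state" mode="voice">
  <!-- Hypothetical URI of a PLS document with extra pronunciations -->
  <lexicon uri="http://www.example.com/city_state.pls"
           type="application/pls+xml"/>
  <rule id="city" scope="public">
    <one-of>
      <item>Boston</item>
      <item>Miami</item>
      <item>Fargo</item>
    </one-of>
  </rule>
  <rule id="state" scope="public">
    <one-of>
      <item>Florida</item>
      <item>North Dakota</item>
      <item>Massachusetts</item>
    </one-of>
  </rule>
  <rule id="city_state" scope="public">
    <ruleref uri="#city"/> <ruleref uri="#state"/>
  </rule>
</grammar>
```

Because the pronunciations live in the lexicon, the same grammar can be paired with different lexicons (for example per speaker population) without editing the rules.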

PLS allows you…
– to create Pronunciation Lexicons to be used by both ASR and TTS
– to take into account different usages:
  – For TTS: to improve the reading of proper names
  – For ASR: to give multiple pronunciations
  – For TTS/ASR: to expand abbreviations and acronyms
– to exchange Pronunciation Lexicons between different applications (interoperability)
– to use contextual Pronunciation Lexicons at different points of the application
PLS is a W3C standard language!
PLS saves application developers time and money in creating good speech applications!

Phonetic Alphabets
To describe the pronunciation of a word or phrase, you need a phonetic alphabet.
An alphabet contains symbols that represent speech sounds, just like in a dictionary, e.g.:
    Cracked /krakt/ adj. 1 having cracks. 2 (predic.) slang crazy
The PLS spec suggests using either:
– a standard pronunciation alphabet, such as IPA (defined by the International Phonetic Association, see: http://www2.arts.gla.ac.uk/IPA/index.html)
– other alphabets: SAMPA (an ASCII encoding of IPA) and X-SAMPA, Pinyin, JEITA, etc.
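To illustrate how a dictionary-style transcription is carried into markup, a sketch using the SSML 1.0 <phoneme> element with the /krakt/ transcription from the dictionary entry (written without the slashes). The surrounding sentence and the en-GB language tag are chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-GB">
  <!-- alphabet="ipa" selects IPA; ph holds the transcription without slashes -->
  The vase was
  <phoneme alphabet="ipa" ph="krakt">cracked</phoneme>.
</speak>
```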

IPA – Chart
– The International Phonetic Association was founded in 1886
– It is the major international association of phoneticians
– The IPA alphabet provides symbols that make possible the phonemic transcription of all known languages
– IPA characters can be encoded in Unicode by supplementing ASCII with characters from other ranges, particularly:
  – IPA Extensions (0250–02AF)
  – Spacing Modifier Letters (02B0–02FF), which include the stress mark ˈ and the length mark ː
  – Latin Extended-A (0100–017F)
See the detailed charts at: http://www.unicode.org/charts

SAMPA – SAM Phonetic Alphabet
– Developed for phonetic transcription in an EU-funded project called Speech Assessment Methods (SAM)
– It is ASCII-based (easy to write): an "ASCII-ization" of IPA
– Recently, Prof. John C. Wells proposed an extended alphabet called "X-SAMPA", which encodes all the IPA symbols in ASCII
A few examples:
– "thin": IPA /θɪn/, X-SAMPA /TIn/
– "thing": IPA /θɪŋ/, X-SAMPA /TIN/
– "flabbergasted": IPA /ˈflæbəgɑːstɪd/, X-SAMPA /"fl{b@gA:stId/
– "Weltanschauung": IPA /ˈvɛltʔanˌʃaʊʊŋ/, X-SAMPA /"vElt?an%SaUUN/
– en-GB "vice versa": IPA /vaɪsə ˈvɜːsə/, X-SAMPA /vaIs@ "v3:s@/
– it-IT "vice versa": IPA /ˈviʧe ˈvɛrsa/, X-SAMPA /"vitSe "vErsa/

Phonetic Alphabets – Issues
– How can pronunciations be written in a reliable and easy way? There are problems with fonts, word processors, and browsers
– There are very few tools to help with writing pronunciations and to let you listen to what you have written
– The standardization process may push the creation of tools and improve coverage in word processors
– Does IPA have any uses for Asian languages?
– Are there standard phonetic alphabets for Asian languages, such as Pinyin, Jyutping or JEITA? Should they be referenced in a standard way, like "ipa"?

The PLS language
PLS is an XML language. The container element is <lexicon>, with attributes:
– version (required): "1.0"
– xmlns (required): "http://www.w3.org/2005/pronunciation-lexicon"
– alphabet (optional): "ipa" (default value)
– xml:lang (optional): "en-US" or "zh-CN" or "ja"
Example:
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="zh-CN">
The current PLS is monolingual!
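A minimal sketch of a complete PLS document built on the <lexicon> root element: the kind of external lexicon the movie example earlier in this presentation could reference. Since the lexicon is monolingual, this sketch tags it with the language of the SSML document (en-US) even though the entries are Italian names. The IPA strings are approximate and only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- One lexeme per entry; graphemes are matched against the input text -->
  <lexeme>
    <grapheme>La vita è bella</grapheme>
    <phoneme>la ˈvita ɛ ˈbɛlla</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Roberto Benigni</grapheme>
    <phoneme>roˈbɛrto beˈniɲɲi</phoneme>
  </lexeme>
</lexicon>
```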

The PLS language – metadata
Metadata (annotations of the document for other uses, …) can be of two varieties:
– the <meta> element (for compatibility with other markup, like SRGS and SSML)
– the <metadata> element (which contains annotations in RDF format or other formats)
Example of metadata:
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="en-US">
      <metadata>
        <rdf:RDF
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:dc="http://purl.org/dc/elements/1.1/">
          <rdf:Description rdf:about=""
              dc:title="Pronunciation lexicon for W3C terms"
              dc:description="This lexicon contains common pronunciations for many W3C acronyms and abbreviations, such as I18N, WSDL or WAI"
              dc:publisher="W3C"
              dc:language="en-US"
              dc:date="2005-11-29"
              dc:rights="Copyright 2002 W3C"
              dc:format="application/pls+xml">
            <dc:creator>The W3C Voice Browser Working Group</dc:creator>
          </rdf:Description>
        </rdf:RDF>
      </metadata>
    </lexicon>
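For the simpler <meta> element, a sketch assuming the same name / http-equiv / content attributes that <meta> has in SSML 1.0; the property names and values are illustrative. The single lexeme expands I18N, one of the abbreviations mentioned in the metadata description.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <!-- Name/value metadata, as in SSML and SRGS; values are illustrative -->
  <meta name="seeAlso" content="http://www.example.com/w3c-terms-info.xml"/>
  <meta http-equiv="Cache-Control" content="no-cache"/>
  <lexeme>
    <grapheme>I18N</grapheme>
    <alias>internationalization</alias>
  </lexeme>
</lexicon>
```

<meta> suits simple name/value pairs shared with SSML and SRGS processors, while <metadata> carries structured descriptions such as the RDF/Dublin Core block above.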

The PLS language – <lexeme>
The <lexeme> element is the container of a lexicon entry. It is composed of:
– One or more <grapheme> elements that indicate the words/phrases to be matched in the input
– One or more <phoneme> or <alias> elements that indicate the possible pronunciations or expansions, respectively
First considerations:
– More than one <grapheme> element may be present → all of them share the same pronunciations
– More than one <phoneme> element may be present → the pronunciations are alternatives
– A mixture of <phoneme> and <alias> elements may be present → a preference mechanism chooses a single one for TTS
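A sketch of the preference mechanism: a lexeme with one grapheme and two alternative <phoneme> elements, where prefer="true" marks the one a TTS engine should read, while an ASR engine may accept both. This assumes true/false values for the prefer attribute; the word and its transcriptions are chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>either</grapheme>
    <!-- Preferred pronunciation, chosen by TTS -->
    <phoneme prefer="true">ˈiːðər</phoneme>
    <!-- Alternative pronunciation, still usable by ASR -->
    <phoneme>ˈaɪðər</phoneme>
  </lexeme>
</lexicon>
```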

The PLS language – <grapheme>
The <grapheme> element contains CDATA that represents orthographies:
– Regional spelling variations, e.g. "colour" and "color"
– Free spelling variations, e.g. "judgment" and "judgement"
– Traditional vs. modern spellings, e.g. in German it is common to replace "ö" with "oe"
– Alternate writing systems, e.g. Japanese uses a mixture of Han ideographs (Kanji) and syllabic scripts (Katakana or Hiragana) to represent the orthography of a word or phrase
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          xml:lang="ja" alphabet="ipa">
      <lexeme>
        <grapheme>nihongo</grapheme>
        <grapheme>日本語</grapheme>
        <grapheme>にほんご</grapheme>
        <!-- Here you can insert the pronunciation of "nihongo".
             In IPA it could be: "nɪhɒŋɒ" -->
      </lexeme>
    </lexicon>
Is an explicit "orthography" attribute useful? Is it redundant?

The PLS language – <phoneme>
<phoneme> elements are contained inside a <lexeme>. A <phoneme> contains CDATA specifying the pronunciation in a given pronunciation alphabet:
– An "alphabet" attribute may be specified to override the alphabet of the whole lexicon
– A "prefer" attribute may be present to indicate precedence among pronunciations
Example of a lexeme for "Sepulveda":
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="en-US">
      <lexeme>
        <grapheme>Sepulveda</grapheme>
        <phoneme>səˈpʌlvɪdə</phoneme>
      </lexeme>
    </lexicon>
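A sketch of the alphabet override: the lexicon defaults to IPA, and one <phoneme> switches to X-SAMPA. The value "x-sampa" is an assumption: as in SSML 1.0, alphabet values other than "ipa" are vendor-specific names starting with "x-", so the exact string depends on the engine. The transcriptions are the "thin" and "thing" examples from the SAMPA slide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>thin</grapheme>
    <!-- Uses the lexicon-wide alphabet (IPA) -->
    <phoneme>θɪn</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>thing</grapheme>
    <!-- Overrides the alphabet for this pronunciation only;
         "x-sampa" is an assumed vendor-specific value -->
    <phoneme alphabet="x-sampa">TIN</phoneme>
  </lexeme>
</lexicon>
```

An ASCII alphabet avoids the font and editor problems listed among the phonetic alphabet issues, at the cost of depending on a non-standard alphabet name.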

The PLS language – Other examples
Example of more than one pronunciation for the word "huge":
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          xml:lang="en-US" alphabet="ipa">
      <lexeme>
        <grapheme>huge</grapheme>
        <phoneme>hjuːʤ</phoneme>
        <phoneme>juːʤ</phoneme>
      </lexeme>
    </lexicon>
Example of the Japanese word "nihongo" with different spellings:
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          xml:lang="ja" alphabet="ipa">
      <lexeme>
        <grapheme>nihongo</grapheme>
        <grapheme>日本語</grapheme>
        <grapheme>にほんご</grapheme>
        <phoneme>nɪhɒŋɒ</phoneme>
      </lexeme>
    </lexicon>

The PLS language – <alias>
<alias> elements are contained inside a <lexeme>. An <alias> is used to indicate the pronunciation of an acronym or an abbreviated term in the form of another orthography.
– An <alias> may contain a "prefer" attribute to indicate precedence among pronunciations
– Both <phoneme> and <alias> may occur in a <lexeme>
Example of a lexeme with an <alias>:
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="en">
      <lexeme>
        <grapheme>W3C</grapheme>
        <alias>World Wide Web Consortium</alias>
      </lexeme>
    </lexicon>
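A hedged sketch of a lexeme in which <phoneme> and <alias> both occur: the alias expands "W3C", the phoneme spells it out letter by letter, and prefer="true" (assuming true/false values) tells a TTS engine which reading to use. The IPA string is an illustrative transcription.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en">
  <lexeme>
    <grapheme>W3C</grapheme>
    <!-- Letter-by-letter reading, preferred for TTS -->
    <phoneme prefer="true">ˈdʌbəljuː θriː siː</phoneme>
    <!-- Expansion into another orthography; also useful for ASR -->
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```

For ASR, keeping both entries lets users say either the letters or the full name; for TTS, prefer selects a single output.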

Use Cases/Future Issues
The current version of PLS can deal with:
– Multiple pronunciations for ASR
– Homographs
– Abbreviations
But it cannot deal with:
– Homophones
– Part-of-speech annotations (and other contextual information)
– Grouping lexemes and external references
→ These tasks are too challenging to be solved in PLS version 1.0

Issues for the workshop
– Monolingual lexicon?
– Orthography attribute: useful or redundant?
– Mandate new phonetic alphabets?

Quick demo of SSML+PLS
A mobile device (with embedded TTS) connects to a server over GPRS:
– News is downloaded from a news site (RSS)
– It is transformed into SSML
– It is returned to the mobile device
The device then:
– Shows the news on the screen
– Reads the SSML document (which includes a lexicon) using the TTS engine

Use Cases – Multiple pronunciations
More than one pronunciation for a word (very common for ASR).
Example of two pronunciations for the word "Newton":
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="en">
      <lexeme>
        <grapheme>Newton</grapheme>
        <phoneme>ˈnjuːtən</phoneme>
        <phoneme>ˈnuːtən</phoneme>
      </lexeme>
    </lexicon>

Use Cases – Multiple orthographies
More than one orthography for a word (common for ASR and TTS).
Example of two orthographies for colour/color:
    <lexicon version="1.0"
          xmlns="http://www.w3.org/2005/pronunciation-lexicon"
          alphabet="ipa" xml:lang="en">
      <lexeme>
        <grapheme>color</grapheme>
        <grapheme>colour</grapheme>
        <phoneme>ˈkʌlə</phoneme>
      </lexeme>
    </lexicon>

Final Remarks
The usage of PLS:
– Simplifies the development of speech applications
– Improves the performance of speech recognition (in a standard way)
– Enhances TTS output
A standard language for PLS enables the exchange of pronunciations between applications.
