SSML extensions for multi-language usage
Davide Bonardo
W3C Workshop on Internationalizing SSML
Crete, 30-31 May 2006
2 About Loquendo
R&D of speech technology
Over 30 years of experience (from the CSELT laboratories)
Technologies:
– TTS (text-to-speech)
– ASR (automatic speech recognition) and SV (speaker verification)
Solutions:
– Easy integration of speech technologies
– Speech servers (MRCPv1 and MRCPv2 protocols)
– Speech platforms (VoiceXML and CCXML interpreters)
– Embedded solutions (for many operating systems and devices)
3 Ideas for SSML extensions
<say-as> element
– Extension of the values for the “interpret-as” attribute
New element
4 Proposal 1: <say-as> extension (1/3)
Problem:
– How to interpret a part of an input text
– Different dialog contexts require different interpretations
– The interpretation may be language dependent
Many contexts could be defined: SMS, e-mail, news, applications for rescue operations, …
TTS engines may use context information to activate the best configuration for:
– reading acronyms
– expanding abbreviations
– applying customized prosodic phrasing
– activating a special reading style
5 Proposal 1: <say-as> extension (2/3)
Proposal: extend the “interpret-as” attribute with new values, for instance:
– sms
– e-mail
– news
– banking
– navigation
– …
6 Proposal 1: <say-as> extension (3/3)
Examples
I call you asap.
I call you asap Mtfbwu
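To make the idea concrete, here is a minimal Python sketch of how an engine front end might use a context value such as "sms" to choose an abbreviation table. Everything in it is an assumption made for illustration: the `EXPANSIONS` tables, the `expand` and `render` helpers, and the sample markup are hypothetical, the SSML namespace declaration is omitted for brevity, and "sms" is the proposed value, not one defined by SSML 1.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-context abbreviation tables. A real engine would use
# much richer, language-dependent resources.
EXPANSIONS = {
    "sms": {"asap": "as soon as possible"},
    "banking": {"acct": "account"},
}

# Sample input: the same phrase outside and inside a <say-as> element that
# carries the proposed "sms" context value.
SSML = (
    '<speak version="1.0" xml:lang="en-US">'
    'Plain text: I call you asap. '
    '<say-as interpret-as="sms">I call you asap.</say-as>'
    '</speak>'
)

def expand(text, context):
    """Expand known abbreviations for the given context, word by word."""
    table = EXPANSIONS.get(context, {})
    words = []
    for word in text.split():
        core = word.rstrip(".,;:!?")   # word without trailing punctuation
        tail = word[len(core):]        # the punctuation that was stripped
        words.append(table.get(core.lower(), core) + tail)
    return " ".join(words)

def render(ssml):
    """Return the text an engine would read after context-based expansion."""
    root = ET.fromstring(ssml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "say-as":
            parts.append(expand(child.text or "", child.get("interpret-as")))
        else:
            parts.append(child.text or "")
        parts.append(child.tail or "")
    return "".join(parts)

print(render(SSML))
```

Only the text inside the element with the "sms" context is expanded; the same abbreviation outside it is left untouched, which is the behavior the proposal aims to make controllable.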
7 Proposal 2: New element (1/3)
Problem 1: activating the correct language knowledge at a specific point of the text
The “xml:lang” attribute is currently available on the <speak>, <voice>, <p> and <s> elements
The engine's behavior may differ by element:
– On the root <speak> element, “xml:lang” defines the language of the whole document, but for the engine it implies the selection of a voice
– On the <voice> element, it is a strong recommendation for loading the correct voice
– On the <p> and <s> elements, it is mainly language information; the engine, if capable, can keep the same voice but switch to a different language knowledge (e.g. phonetic mapping)
Problem 2: a language change may need to be specified for a text unit smaller than a sentence.
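The following Python sketch illustrates the current situation under SSML 1.0, where the sentence element `<s>` is the smallest unit that can carry “xml:lang”. The helper `sentences_with_language` and the sample document are hypothetical, the SSML namespace declaration is omitted for brevity, and the inheritance logic is a simplified reading of how “xml:lang” scopes over child elements.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SSML = (
    '<speak version="1.0" xml:lang="en-US">'
    '<p><s>This sentence is in English.</s>'
    '<s xml:lang="it-IT">Questa frase è in italiano.</s></p>'
    '</speak>'
)

def sentences_with_language(element, inherited=None):
    """Yield (language, text) for each <s>, applying xml:lang inheritance."""
    lang = element.get(XML_LANG, inherited)
    if element.tag == "s":
        yield lang, "".join(element.itertext())
    for child in element:
        yield from sentences_with_language(child, lang)

for lang, text in sentences_with_language(ET.fromstring(SSML)):
    print(lang, "|", text)
```

A whole sentence can be switched to another language this way, but a single foreign title inside an English sentence cannot, which is the gap described in Problem 2.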
8 Proposal 2: New element (2/3)
Proposal:
Introduce a new element
Extend the use of the “xml:lang” attribute to this element
Advantages:
It is a generic element
It is extensible
– Without attributes, it can give information on segmentation where needed
– With other attributes, it can specify new information for the token (e.g. part of speech)
9 Proposal 2: New element (3/3)
Examples
The movie is the product of Italian comic sensation Roberto Benigni, who wore three hats for "La vita è bella": director, co-writer, and star.
The movie is the product of Italian comic sensation Roberto Benigni, who wore three hats for "La vita è bella" : director, co-writer, and star.
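One way such an example might be marked up and consumed is sketched below in Python. The element name `token` is a placeholder chosen for this sketch and is not taken from the slides; the helper `language_runs` and the shortened sample sentence are likewise hypothetical, and the SSML namespace declaration is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# "token" is a placeholder name for the proposed sub-sentence element.
SSML = (
    '<speak version="1.0" xml:lang="en-US"><s>'
    'who wore three hats for '
    '<token xml:lang="it-IT">La vita è bella</token>'
    ': director, co-writer, and star.'
    '</s></speak>'
)

def language_runs(element, inherited=None):
    """Yield (language, text) runs in document order."""
    lang = element.get(XML_LANG, inherited)
    if element.text:
        yield lang, element.text
    for child in element:
        yield from language_runs(child, lang)
        # Text after a child belongs to the parent, so it keeps the
        # parent's language rather than the child's.
        if child.tail:
            yield lang, child.tail

for lang, text in language_runs(ET.fromstring(SSML)):
    print(lang, "|", text)
```

With this markup the engine could keep the English voice for the whole sentence and apply Italian language knowledge only to the film title.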
10 Conclusions
Proposal 1:
– Increase the number of “interpret-as” values to identify new speech contexts
Proposal 2:
– Introduce a new element that carries specific information (e.g. the language) for a single word, a phrase, and so on.