VoiceXML
Team 1: Matt Ganis, Jonathan Hill, Henry Wong, Anne I. Mannette-Wright
April 8, 2006 | Team 1 | VoiceXML
Agenda
– History of Voice Applications and VoiceXML
– Related Voice Type Languages
– Advantages of VoiceXML
– Architecture of VoiceXML
– Paper 1
– Paper 2
– Paper 3
– Demonstration
– VoiceXML 2.0
– Differences between VoiceXML 1.0 and 2.0
– The Future: VoiceXML 2.1
History of Voice Applications
Voice technologies emerged in the 1990s:
– Automatic Speech Recognition (ASR)
  – Small vocabulary and speech recognition problems were solved
– Text-to-Speech systems
  – Can generate speech responses on the fly
– Interactive Voice Response (IVR) applications
History of Voice Applications
IVRs became programmable, but programmable IVRs:
– Are difficult to program (call scripting is often vendor-specific), so each vendor had to “reinvent the wheel”
– Do not allow an application to be moved easily from one IVR to another, because IVRs are proprietary
History of VoiceXML
– 1995: AT&T started work on Phone Markup Language (PML)
– Oct. 1998: Motorola developed VoxML (Voice Markup Language)
– Feb. 1999: IBM developed SpeechML technology
– Mar. 1999: VoiceXML Forum was formed by IBM, AT&T, Lucent, and Motorola
  – Mission was to design a standard dialog design language that developers could use to build conversational applications
– Mar. 2000: VoiceXML Forum released VoiceXML 1.0 to the general public
– May 2000: VoiceXML 1.0 accepted by the W3C
W3C Speech Interface Framework
[Figure: W3C Speech Interface Framework] From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from
Related Voice Type Languages
Related to VoiceXML:
– Grammar XML (grXML)
  – Provides speech grammars used by speech recognition engines
– Speech Synthesis Markup Language (SSML)
  – The SSML specification is based upon the JSML (JSpeech Markup Language) and JSGF (JSpeech Grammar Format) specifications, which are owned by Sun
  – Introduced in September 2004; currently a W3C standard at Version 1.0
  – Standardized way of specifying how text is rendered as speech; includes tags for pronunciation, tone, inflection, etc.
  – Often embedded in VoiceXML scripts to drive interactive telephony systems
Related Voice Type Languages
Related to VoiceXML (Continued):
– Call Control XML (CCXML)
  – W3C standard markup language for controlling telephony and telephony equipment; currently at Version 1.0
  – Performs tasks such as setting up conference calls, transferring incoming calls, etc.
  – Works hand-in-hand with VoiceXML
Architecture of VoiceXML
[Figure: VoiceXML architecture] From: Voice eXtensible Markup Language (VoiceXML™) version 1.0
Advantages of VoiceXML
VoiceXML is a markup language that:
– Minimizes client/server interactions by specifying multiple interactions per document
– Shields application authors from low-level and platform-specific details
– Separates user interaction code (in VoiceXML) from service logic (e.g. CGI scripts)
– Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers
– Is easy to use for simple interactions, and yet provides language features to support complex dialogs
Paper 1
Authored by Bruce Lucas: “VoiceXML for Web-based Distributed Conversational Applications”
– Presents an introduction to VoiceXML
– Comparison to HTML
– Support for Natural Dialogue
Paper 1
VoiceXML is an XML application, which results in the following benefits:
– Allows the reuse and easy retooling of existing tools for creating, transforming, and parsing XML documents
– Allows VoiceXML to make use of other complementary XML-based standards
  – Example: Java Speech Markup Language for speech synthesis
A form is VoiceXML’s basic dialogue unit:
– Contains a set of inputs (fields)
– Specifies what to do with a set of fields after data is collected
A field includes a prompt and a specification of what the user is allowed to say
Paper 1 – VoiceXML Code Example
[VoiceXML markup lost; surviving prompt text:]
– Say one of: Sports scores, Weather information, Log in
– Please say your complete phone number
– Please say your PIN code
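The prompts on this slide read like a menu followed by a login form. A minimal sketch of how such a document could look is given below as a VoiceXML string, parsed with Python's standard `xml.etree.ElementTree` only to confirm it is well-formed and to list its choices and fields. The element names (`menu`, `choice`, `enumerate`, `form`, `field`, `block`, `submit`) are standard VoiceXML; the URLs, the `login` id, and the field names are hypothetical placeholders, not taken from the paper.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}

# Hypothetical reconstruction: URLs, ids, and field names are placeholders.
VXML = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>Say one of: <enumerate/></prompt>
    <choice next="http://example.com/sports.vxml">Sports scores</choice>
    <choice next="http://example.com/weather.vxml">Weather information</choice>
    <choice next="#login">Log in</choice>
  </menu>
  <form id="login">
    <field name="phone_number" type="phone">
      <prompt>Please say your complete phone number</prompt>
    </field>
    <field name="pin_code" type="digits">
      <prompt>Please say your PIN code</prompt>
    </field>
    <block>
      <submit next="http://example.com/login" namelist="phone_number pin_code"/>
    </block>
  </form>
</vxml>"""

root = ET.fromstring(VXML)  # raises ParseError if the markup is not well-formed

# Menu choices, in the order <enumerate/> would read them out.
choices = [c.text.strip() for c in root.findall("v:menu/v:choice", NS)]

# Fields of the login form, with their built-in types.
fields = [(f.get("name"), f.get("type")) for f in root.findall("v:form/v:field", NS)]

print(choices)
print(fields)
```

The menu routes the caller to another document or to the form in the same document (`#login`); the form then collects its two fields in order and submits them.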
Paper 1
VoiceXML includes support for common field types, including numbers, digits, phone, date and time, and for user-specified fields using grammars
[VoiceXML markup lost; surviving prompt and grammar text:]
– What would you like to drink?
  – coffee | tea | orange juice | milk | nothing
– What sandwich would you like?
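To illustrate a user-specified field, the sketch below places the drink alternatives from this slide in an inline grammar, in the `a | b | c` style the slide text suggests. It is a hedged reconstruction, not the paper's listing: the form id, the field names, and the external grammar file `sandwich.gram` are assumptions, and the small `recognize` helper is a toy stand-in for a speech recognizer.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the id, field names, and sandwich.gram are placeholders.
FORM = """<form id="order">
  <field name="drink">
    <prompt>What would you like to drink?</prompt>
    <grammar>coffee | tea | orange juice | milk | nothing</grammar>
  </field>
  <field name="sandwich">
    <prompt>What sandwich would you like?</prompt>
    <grammar src="sandwich.gram"/>
  </field>
</form>"""

form = ET.fromstring(FORM)
drink = form.find("field[@name='drink']")

# The inline grammar lists the phrases the caller is allowed to say.
alternatives = [alt.strip() for alt in drink.findtext("grammar").split("|")]


def recognize(utterance):
    """Return the matched alternative, or None (a 'nomatch' in VoiceXML terms)."""
    said = utterance.strip().lower()
    return said if said in alternatives else None


print(alternatives)
print(recognize("Orange juice"))
print(recognize("beer"))
```

A field with a built-in type (such as `phone` or `digits`) needs no grammar of its own; a field like `drink` supplies one inline, and a larger vocabulary such as the sandwich list would more plausibly live in an external grammar file.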
Paper 1 – The Distributed Model
VoiceXML provides support for advanced features such as:
– Local validation and processing
– Audio playback and recording
– Support for context-specific and tapered help and reusable subdialogues
[Figure: the distributed model] From: Lucas, Bruce, “VoiceXML for Web-Based Distributed Conversational Applications”, Communications of the ACM, Vol. 43, No. 9, September 2000.
Paper 1 – VoiceXML compared with HTML
– An HTML document is a single unit specified by a URI and presented to the user all at once
  – A VoiceXML document contains a number of dialogue units (menus or forms) presented sequentially
– An HTML document has no markup to identify distinct units
  – A VoiceXML document is structured to reflect the sequential nature of the voice medium
– An HTML document is like one single dialogue
  – A VoiceXML document requires dialogue elements so they can be presented one at a time
  – VoiceXML has application logic for sequencing among dialogue units
Paper 1 – Support for Natural Dialogue
VoiceXML supports “directed” and “mixed initiative” dialogues
– “Directed” dialogues: the computer directs the conversation at each step by prompting the user for the next piece of information
  – Example:
    C: On what date do you wish to fly?
    H: May 6th
– “Mixed initiative” dialogues: each participant can take the initiative in leading the conversation. VoiceXML does this by allowing input grammars to be specified at the form level
    C: How can I help you?
    H: I’d like to fly from New York on May 8th
    C: Where would you like to fly to?
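The mixed-initiative exchange above can be sketched as slot filling: a form-level grammar lets one utterance fill several fields at once, and the system then prompts only for whatever is still missing. The Python below is a toy model under stated assumptions, not VoiceXML's actual interpretation algorithm: regular expressions stand in for speech grammars, and the slot names and the origin prompt are hypothetical.

```python
import re

# Toy "form-level grammar": one pattern per field (regexes stand in for speech grammars).
FIELD_PATTERNS = {
    "origin": re.compile(r"\bfrom ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "destination": re.compile(r"\bto ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "date": re.compile(r"\bon ([A-Z][a-z]+ \d{1,2}(?:st|nd|rd|th)?)"),
}

PROMPTS = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to fly to?",
    "date": "On what date do you wish to fly?",
}


def fill_from_utterance(slots, utterance):
    """Fill every still-empty slot that the utterance mentions."""
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(utterance)
        if match and slots[name] is None:
            slots[name] = match.group(1)


def next_prompt(slots):
    """Return the prompt for the first unfilled slot, or None when the form is complete."""
    for name, value in slots.items():
        if value is None:
            return PROMPTS[name]
    return None


slots = {"origin": None, "destination": None, "date": None}
fill_from_utterance(slots, "I'd like to fly from New York on May 8th")
print(slots)
print(next_prompt(slots))
```

A directed dialogue corresponds to calling `next_prompt` after every answer and accepting input for that one field only; the mixed-initiative case differs in that a single reply may fill several slots before the next prompt is chosen.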
Paper 2
Concepts of Programming by Voice:
– Motivated by the need to program without typing, thereby preventing repetitive stress injuries (RSI), a common injury among those who spend long hours typing
– Voice-activated software for the disabled is a prime motivator in development
– The paper proposes a system that creates an environment for voice-activated programming
Paper 2
The cost of such software has fallen dramatically:
– $7500 in 1998
– $100 in 2005
Products include:
– Dragon NaturallySpeaking
– IBM ViaVoice
– Lernout & Hauspie Voice Xpress
Paper 2
The authors developed a generator called VocalGenerator using Dragon NaturallySpeaking with MS Visual C++
– Input = a context-free grammar compatible with most programming languages
– Output = an environment in which a voice recognition, syntax-directed program can be written by voice input alone
– Allows for better recognition and selection of sections of code
Paper 2
Evaluation of the product:
– Programming is faster using a syntax-directed voice recognition system than a natural language DVR
– A programmer suffering from repetitive stress injuries will be able to program at a speed sufficient to ‘maintain competitive employment’
Paper 3
– Focuses on ‘V-commerce’ through a survey of VoiceXML applications for business communication
– Looks at the inherent risks in human-to-human communication and the challenges these pose to human-to-computer communication
– Examines speech recognition
– Seeks to leverage the predominance of telephone usage globally
Paper 3
Utilizes the W3C Voice Browser Working Group design criteria, including:
– Consistency
– Interoperability
– Generality
– Internationalization
– Generalization and Readability
– Implementation
Paper 3
Looks at the potential for a voice-activated Web interface
Looks at a transactional communication method with six phases:
– Sender has an idea
– Sender transforms the idea into a message
– Sender transmits the message
– Receiver gets the message
– Receiver interprets the message
– Receiver reacts and sends feedback
Paper 3
Challenges include:
– Unproven business models
– Business process change requirements
– Channel conflicts
– Technology hurdles
– Legal issues
– Security & privacy
Paper 3
Conclusions:
– Speech is natural, flexible and efficient
– Voice technology will improve
– Voice recognition capabilities will improve
– The intersection of voice recognition, telecom and Web technologies may lead to a large market for products that take advantage of this intersection
Demo Using TellMe Studio (http://studio.tellme.com)
TellMe Studio provides you with resources to:
– Build and test your own Internet-powered "phone sites" with nothing but your Web browser and an ordinary telephone, in the following ways:
  – Type VoiceXML directly into an area called the “Scratchpad” and then call the phone number to preview the code
  – Publish the VoiceXML and audio files on a publicly accessible Web server, point Studio at the URL for your application's "home page", and once again call the Studio phone number to preview the application
– Browse and leverage an extensive library of sample code, grammars, audio, and VoiceXML documentation
– Participate in the Voice Web development community through open newsgroups
Demo (Continued)
This demo, Drink Recipes I, will use one of the “prebuilt” VoiceXML scripts available from the TellMe Studio Code Library
This version of Drink Recipes:
– Asks the caller for a drink name
– In response, plays back the drink's ingredients list and mixing instructions
– Demonstrates the use of large grammars and how to create data-driven applications
VoiceXML 2.0
[Figure] From: McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from
Differences Between VoiceXML 1.0 and 2.0
Differences between VoiceXML 1.0 and 2.0:
– Interoperability
– Functional Completeness
– Clarity
VoiceXML 2.0 Interoperability
VoiceXML 2.0 mandates the following formats, which assure developers that their applications will run on any VoiceXML platform conforming to the VoiceXML 2.0 specification:
– input: the XML format of the Speech Recognition Grammar Specification is used for speech and DTMF input; VoiceXML 1.0 did not require any particular speech grammar format
– output: Speech Synthesis Markup Language (SSML) is used for text-to-speech and audio output; VoiceXML 1.0 did not use SSML, and its speech markup elements are not supported in VoiceXML 2.0
VoiceXML 2.0 Interoperability (Continued)
– protocol: support for the HTTP protocol for fetching documents and resources is required; VoiceXML 1.0 did not require support for HTTP
– audio: audio formats recommended for support in VoiceXML 1.0 are now required in VoiceXML 2.0
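As an illustration of the input and output formats named above, the sketch below shows a small grammar in the XML form of the Speech Recognition Grammar Specification and a prompt using two SSML elements. Both fragments are hypothetical examples written for this purpose (the rule id `drink` and the prompt wording are assumptions); Python's standard `xml.etree.ElementTree` is used only to check that they are well-formed and to read back their contents.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Input side: an XML-format speech grammar (hypothetical drink rule).
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="drink">
  <rule id="drink">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>orange juice</item>
      <item>milk</item>
      <item>nothing</item>
    </one-of>
  </rule>
</grammar>"""

# Output side: a VoiceXML prompt whose content uses SSML elements.
PROMPT = """<prompt>
  You ordered <emphasis>coffee</emphasis>.
  <break time="500ms"/>
  Anything else?
</prompt>"""

grammar = ET.fromstring(GRAMMAR)
root_rule = grammar.get("root")  # the rule the recognizer starts from
items = [item.text for item in grammar.iter("{%s}item" % SRGS_NS)]

prompt = ET.fromstring(PROMPT)
ssml_tags = [child.tag for child in prompt]  # SSML elements used in the prompt

print(root_rule, items)
print(ssml_tags)
```

Because both formats are fixed by the specification, the same grammar and prompt markup should be accepted by any conforming platform, which is the portability point the slide makes.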
VoiceXML 2.0 Functional Completeness
New elements, attributes and variables have been added in VoiceXML 2.0 so that developers can be sure the key aspects of the cycle of generating system output, interpreting user input and transitioning from one dialog to another are described.
NOTE: VoiceXML 1.0 contained “gaps”, for example: when prompts were played to the user
Some of the new/enhanced elements, variables and support include:
– application.lastresult$ variable: provides info about the last recognition in the application
– <log> element: generates a debug message
– [two elements, names missing]: enhanced to provide more info
– [element name missing]: enhanced with an “expr” attribute
– [element name missing]: enhanced with an “accept” attribute
– Enhanced support for greater control over universal grammars
VoiceXML 2.0 Clarity
VoiceXML 2.0 provides a clear description and interpretation of ALL elements (and their attributes), how they interact with one another, and their expected behavior.
NOTE: VoiceXML 1.0 contains omissions and contradictions in this respect
Some clarification changes include:
– Subdialogs: description clarified
– Root and leaf document definitions explicitly defined
– Prompt queueing and input collection: relationship between these two clarified
– Relationship between VoiceXML 2.0 and ECMAScript variables clarified
– VoiceXML 2.0 clarifies conformance between VoiceXML documents and VoiceXML processors
– Alignment of VoiceXML 2.0 with the Speech Grammar and Speech Synthesis specifications
VoiceXML 2.1
VoiceXML 2.1 was released on June 13, 2005 by the W3C as a “candidate” recommendation
VoiceXML 2.1 proposes 8 enhancements to VoiceXML 2.0:
– Referencing grammars dynamically
– Referencing scripts dynamically
– Using <mark> to detect barge-in during prompt playback
– Using <data> to fetch XML without requiring a dialog transition
– Concatenating prompts dynamically using <foreach>
– Recording user utterances while attempting recognition
– Adding namelist to <disconnect>
– Adding type to <transfer>
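A short sketch of how three of these enhancements might appear together in one document: `<data>` fetching XML in place, `<foreach>` building a prompt from an array, and `namelist` on `<disconnect>`. The document is an illustrative assumption (the form id, variable names, and the `http://example.com/menu.xml` URL are hypothetical); Python's standard library is used only to check well-formedness and to pick out the 2.1 elements.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/2001/vxml"

# Hypothetical VoiceXML 2.1 document: ids, variable names, and the URL are placeholders.
VXML21 = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="specials">
    <var name="drinks" expr="['coffee', 'tea', 'milk']"/>
    <block>
      <!-- Fetch XML without leaving the current dialog -->
      <data name="menu" src="http://example.com/menu.xml"/>
      <prompt>
        Today we have
        <!-- Concatenate one prompt segment per array entry -->
        <foreach item="drink" array="drinks">
          <value expr="drink"/> <break/>
        </foreach>
      </prompt>
      <!-- Return variables to the calling context on hang-up -->
      <disconnect namelist="drinks"/>
    </block>
  </form>
</vxml>"""

root = ET.fromstring(VXML21)

# Elements in this sketch that VoiceXML 2.1 introduced as new elements.
NEW_IN_21 = {"data", "foreach"}
used = sorted({el.tag.split("}")[1] for el in root.iter()} & NEW_IN_21)

disconnect = root.find(".//{%s}disconnect" % NS)

print(used)
print(disconnect.get("namelist"))
```

Before 2.1, fetching server data typically meant transitioning to a new document; `<data>` keeps the caller in the same dialog while the application reads the fetched XML.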
References
1. Ali, Sanwar, Albohali, Mohamed, Wibowo, Kustim, “VoiceXML for Business Applications: A Survey”, First Annual ABIT Conference, May 3-5, 2001, Pittsburgh, Pennsylvania.
2. Arnold, Stephen A., Mark, Leo and Goldthwaite, John, “Programming by Voice, VocalProgramming”, ASSETS’00, November 13-15, Arlington, Virginia.
3. Lucas, Bruce, “VoiceXML for Web-based Distributed Conversational Applications”, Communications of the ACM, September 2000, Vol. 43, No. 9, pp.
4. http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML) version 1.0
5. http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML) version 2.0
6. http://www.w3.org/TR/voicexml/ Voice eXtensible Markup Language (VoiceXML) version 2.1
7. https://studio.tellme.com/vxml2/ovw/migrating21.html
8. http://www.voicexmlreview.org/Dec2001/features/inside-full.html
9. McGlashan, Dr. Scott, “VoiceXML 2.0 from the Inside”, retrieved from