XML Brief Introduction:-

1) Understanding XML Documents:-
An XML document comprises one or more named elements organized into a nested hierarchy. An element consists of an opening tag, some data, and a closing tag. A tag is an element name preceded by a less-than symbol (<) and followed by a greater-than symbol (>). For any given element, the name in the opening tag must match the name in the closing tag. A closing tag is identical to an opening tag except that the less-than symbol is immediately followed by a forward slash (/). Tag names are case-sensitive. For example, an element named message that contains the text "Welcome to IVR" is written <message>Welcome to IVR</message>.

2) Using Comments:-
A comment begins with the characters "<!--" and ends with the characters "-->". Comments may appear as a child of any element in an XML document. They can also appear before or after the root element. A comment may span multiple lines but cannot be nested. Ex: <!-- This is a comment -->
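The rules above can be sketched in a small document. This is only an illustration: the element names greeting and message are made up for the example and are not part of any standard vocabulary.

```xml
<?xml version="1.0"?>
<!-- A comment may appear before the root element. -->
<greeting>
  <!-- A comment may also appear inside an element
       and may span several lines. -->
  <message>Welcome to IVR</message>
</greeting>
```

Note that the opening and closing tag names match exactly, including case; writing </Message> to close <message> would make the document ill-formed.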
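Reserved characters and CDATA sections:-
XML reserves certain characters, including less-than (<), greater-than (>), and ampersand (&). To express these characters in your document data, use the equivalent character entity:
Less-than (<): &lt;
Greater-than (>): &gt;
Ampersand (&): &amp;
Alternatively, enclose the data in a CDATA section, which begins with <![CDATA[ and ends with ]]>. Text inside a CDATA section is not parsed as markup, so reserved characters can appear in it unescaped.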
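As a brief sketch of the two approaches, the document below writes the same text once with character entities and once inside a CDATA section. The element names examples, escaped, and raw are hypothetical and chosen only for this illustration.

```xml
<?xml version="1.0"?>
<examples>
  <!-- Character entities stand in for the reserved characters. -->
  <escaped>if (a &lt; b &amp;&amp; b &gt; c)</escaped>
  <!-- Inside a CDATA section the same text needs no escaping. -->
  <raw><![CDATA[if (a < b && b > c)]]></raw>
</examples>
```

An XML parser reports the same character data for both elements. CDATA sections tend to be more readable when a block contains many reserved characters, such as inline script code.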
VoiceXML:-
VoiceXML is a markup language derived from XML for writing telephone-based speech applications. Users call applications by telephone. They listen to spoken instructions and questions instead of viewing a screen display; they provide input using the spoken word and the touchtone keypad instead of entering information with a keyboard or mouse.

Just as a web browser renders HTML documents visually, a VoiceXML interpreter renders VoiceXML documents audibly. You can think of the VoiceXML interpreter as a telephone-based voice browser. As with HTML documents, VoiceXML documents have web URIs and can be located on any web server. However, a standard web browser runs locally on your machine, whereas the VoiceXML interpreter runs remotely, for example at the VoiceXML hosting site, and you use your telephone to access it.
Environment:-
To support a telephone interface, the VoiceXML interpreter runs within an execution environment that includes a telephony component, a text-to-speech (TTS) speech-synthesis component, and a speech-recognition component. The VoiceXML interpreter transparently interacts with these infrastructure components as needed. For example:
* Text strings in output elements are rendered using TTS.
* Connection issues (picking up the incoming call, detecting a hang-up, transferring a call) are handled by the telephony component.
* Listening to spoken input from the user and identifying its meaning is handled by the speech-recognition component.
VoiceXML Interpreter:-
Like any XML-based language, VoiceXML needs an interpreter that processes its elements. This interpreter is the foundation of dialog management in interactive voice response (IVR) systems. The main advantage of an interpreted language is that its code need not be compiled. This is especially useful when VoiceXML pages are generated dynamically to present frequently changing content.
Application structure:-
A VoiceXML application consists of a set of VoiceXML documents, and each VoiceXML document contains one or more dialogs describing a specific interaction with the user. Dialogs may present the user with information or prompt the user to provide information. When complete, they can redirect the flow of control to another dialog in that document, to a dialog in another document in the same application, or to a dialog in another application entirely.

At the root of every VoiceXML document is the vxml element. This element should contain one or more elements representing dialogs. VoiceXML 2.x provides two types of dialogs: 1) form and 2) menu.
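A minimal sketch of this structure, assuming a VoiceXML 2.0 interpreter, might look like the following. The dialog ids welcome and goodbye and the prompt wording are invented for the example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- First dialog: presents information, then passes control on. -->
  <form id="welcome">
    <block>
      <prompt>Welcome to IVR.</prompt>
      <goto next="#goodbye"/>
    </block>
  </form>
  <!-- Second dialog in the same document. -->
  <form id="goodbye">
    <block>
      <prompt>Thank you for calling.</prompt>
      <exit/>
    </block>
  </form>
</vxml>
```

The interpreter starts with the first dialog in the document. The goto element then transfers control to the dialog whose id matches the fragment after the # sign.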
Tags and Elements:-
VoiceXML uses markup tags and plain text. A tag is a keyword enclosed by the angle bracket characters (< and >). A tag may have attributes inside the angle brackets. Each attribute consists of a name and a value separated by an equal sign (=), and the value must be enclosed in quotes. Tags occur in pairs; corresponding to the start tag is the end tag. Between the start and end tag, other tags and text may appear. Everything from the start tag to the end tag is called an element. For example, the following three lines constitute a prompt element:
<prompt>
What is your telephone number?
</prompt>
If there are no other tags or text between the start and end tag, a syntactic shorthand is permitted. You can precede the closing angle bracket (>) of the start tag with a slash (/) and omit the end tag. For example, instead of writing a value element as <value expr="x"></value>, where x stands for any expression, you can use the shorthand notation <value expr="x"/>.
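To show these forms in context, here is a small sketch of a complete document. The variable name phone and its value are assumptions made for the example; only the prompt wording comes from the text above.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- "phone" is a hypothetical variable used only for this sketch. -->
    <var name="phone" expr="'5551234'"/>
    <block>
      <!-- A prompt element written on three lines. -->
      <prompt>
        What is your telephone number?
      </prompt>
      <!-- A value element with an explicit end tag. -->
      <prompt>You said <value expr="phone"></value></prompt>
      <!-- The same element in shorthand notation. -->
      <prompt>You said <value expr="phone"/></prompt>
    </block>
  </form>
</vxml>
```

The last two prompts are equivalent. In this sketch, var has the attributes name and expr, and each attribute value is enclosed in quotes.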
If one element contains another, the containing element is called the parent element of the contained element. The contained element is called a child element of its containing element. The parent element may also be called a container.

Terminologies and Concepts:-
* Documents
* Applications
* Dialogs
Documents:-
An executable VoiceXML file is called a document. The VoiceXML interpreter loads a document file to execute it. Every VoiceXML document must start with header information that conforms to the XML standard:
<?xml version="1.0"?>
The <?xml?> tag indicates that the document is an XML document. This tag is required. The first characters of any XML file (including a VoiceXML document) must be <?xml, with nothing before them, not even white space.
Applications:-
A VoiceXML application consists of one or more documents. Any multidocument application has a single application root document. Each document in an application identifies the application root document with the application attribute of the <vxml> tag, for example:
<vxml version="2.0" application="myAppRoot.vxml">
Here "myAppRoot.vxml" is the application root document. Whenever the interpreter executes a document, it loads that document. If the document specifies an application root document, that document is also loaded.
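The following sketch shows a document that names myAppRoot.vxml as its application root. It assumes, purely for illustration, that the root document declares a variable called company; variables declared in the root document are visible in application scope.

```xml
<?xml version="1.0"?>
<!-- Leaf document: names myAppRoot.vxml as its application root. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="myAppRoot.vxml">
  <form id="greet">
    <block>
      <!-- "company" is a hypothetical variable assumed to be declared
           with a var element in myAppRoot.vxml. -->
      <prompt>Welcome to <value expr="application.company"/>.</prompt>
    </block>
  </form>
</vxml>
```

Because every document in the application points to the same root, shared values such as this one need to be declared only once.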
Dialogs:-
Within a document, a user interacts with dialogs, in which the application produces auditory output, typically asking for information. The user provides input by speaking or pressing keys on the telephone. User speech must be recognized and its meaning interpreted. Telephone key input is interpreted as a sequence of tones in the Dual Tone Multifrequency (DTMF) signalling system. VoiceXML has two kinds of dialogs: 1) forms, and 2) menus.

The main elements of a document (within the <vxml> element) are forms. VoiceXML forms are analogous to web forms; you use them to collect input from the user. A form interacts with the user to fill in a number of fields. Every field has an associated variable, called its input-item variable, or just input variable. Initially, the variable has a value of undefined. It is filled in when the speech-recognition engine recognizes a valid response in a user utterance. The <form> tag defines a form and the <field> tag defines a field in a form. You specify the name of the input variable with the name attribute of the <field> tag.
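A form with a single field could be sketched as follows. The form id getPhone and the field name phone are invented for the example, and the sketch assumes the platform supports the builtin digits field type, which accepts spoken or keyed digits.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="getPhone">
    <!-- The input variable "phone" is undefined until it is filled. -->
    <field name="phone" type="digits">
      <prompt>What is your telephone number?</prompt>
      <!-- Runs once the recognizer has filled in the field. -->
      <filled>
        <prompt>You entered <value expr="phone"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The interpreter keeps visiting the field until its input variable has a value, then executes the filled element.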
Menus:-
A menu presents the user with a number of choices; it transitions to a different dialog based on the user's selection. The <menu> tag defines a menu; each choice consists of a <choice> element. The next attribute of a <choice> element specifies the destination dialog to which the interpreter should transition when the user selects that choice. If a <form> or <menu> element is to be the destination of a transition, the id attribute for the destination dialog should specify a unique identifier. For example, a menu might prompt the user with "Please choose one of" and offer three choices: local movies, local radio stations, and national TV listings.
The prompt in this menu includes an <enumerate> tag. This tag lets you set up a template for an automatically generated description of the choices. By default, the template simply lists all the choices. In the previous example, the prompt is "Please choose one of local movies, local radio stations, national TV listings."

The destination dialog specified by the next attribute can be in the current document or in a different document:
* If the user says "local movies", the interpreter transitions to the dialog named MovieForm in the same document.
* If the user says "local radio stations", the interpreter transitions to the dialog named RadioForm in the document localBroadcast.vxml.
* If the user says "national TV listings", the interpreter transitions to the first dialog in the document tv.vxml in the national TV web site.
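The menu described above can be reconstructed roughly as follows. This is a sketch rather than the original listing: the menu id, the host name tv.example.com, and the body of MovieForm are placeholders, since the text gives only the dialog names and document names.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="listings">
    <prompt>
      Please choose one of <enumerate/>
    </prompt>
    <!-- A dialog in the same document. -->
    <choice next="#MovieForm">local movies</choice>
    <!-- A named dialog in another document. -->
    <choice next="localBroadcast.vxml#RadioForm">local radio stations</choice>
    <!-- The first dialog of a document on another site.
         Placeholder URL: the real address is not given in the text. -->
    <choice next="http://tv.example.com/tv.vxml">national TV listings</choice>
  </menu>
  <form id="MovieForm">
    <block>
      <prompt>Here are the local movies.</prompt>
    </block>
  </form>
</vxml>
```

A next value with only a fragment stays in the current document, a value with a document name and a fragment selects a named dialog elsewhere, and a value without a fragment selects the first dialog of the target document.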
Handling Events:-
An event is a significant occurrence in a system or application. In a desktop application, events occur as a result of a user action such as a key press or a mouse click.

Understanding event types:
VoiceXML places events in two categories: 1) pre-defined events, and 2) application-defined events. Pre-defined events are thrown by the platform and are divided into two subcategories: normal events and error events. Application-defined events are custom events that are both thrown and caught by the voice application.
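An application-defined event might be thrown and caught as in the sketch below. The event name com.example.account.locked is hypothetical; any name that does not collide with a pre-defined event could be used.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level handler for a custom event. -->
  <catch event="com.example.account.locked">
    <prompt>Your account is locked.</prompt>
    <exit/>
  </catch>
  <form id="check">
    <block>
      <!-- The application itself throws the event. -->
      <throw event="com.example.account.locked"/>
    </block>
  </form>
</vxml>
```

Here the application plays both roles: the throw element raises the event and the catch element handles it.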
1.1 Pre-defined normal events:
The following table describes the pre-defined normal events defined in the VoiceXML 2.0 specification.
help: Thrown when the user requests help.
noinput: Thrown within an interactive call state when the user has said nothing within the timeout period.
nomatch: Thrown when the user utters something outside the active grammars.
connection.disconnect.hangup: Thrown when the user disconnects or is disconnected. Post-hang-up processing is limited to 5 seconds. If you need to do extensive processing, your event handler should execute a submit to your Web server to perform any additional processing. Prior to Revision 1, this event was telephone.disconnect.hangup.
cancel: Thrown when the user has requested that the current prompt be canceled.
exit: Thrown when the user has asked to exit.
maxspeechtimeout: Thrown when the user input exceeds the limit set by the 'maxspeechtimeout' property.
connection.disconnect.transfer: Thrown when the user has been transferred unconditionally to another line and will not return. Prior to Revision 1, this event was telephone.disconnect.transfer.
1.2 Pre-defined error events:
The following table describes the pre-defined error events defined in the VoiceXML 2.0 specification.
error.semantic: Thrown when the VoiceXML interpreter detects a run-time error.
error.badfetch: Thrown when an HTTP request for a VoiceXML document, external grammar, or external script fails. If the VoiceXML document that required the fetch contains semantic errors, the badfetch event is thrown to the container element that attempted to navigate to the document. Because the Tellme VoiceXML interpreter loads scripts on demand, a badfetch event that results from the fetch of a script is not thrown until the script element is encountered by the VoiceXML interpreter.
error.noauthorization: Thrown when the user is not authorized to perform the requested operation.
error.unsupported.format: Thrown when the requested resource is in an unsupported format.
error.unsupported.language: Thrown when the platform doesn't support the specified language.
error.unsupported.element: Thrown when the platform doesn't support the given VoiceXML element.
Handling events:-
To handle events, use the catch element and set its event attribute to the name of the desired event. For example, a nomatch handler can queue the audio "I'm sorry. I didn't get that." and then listen for user input again. Similarly, a handler for error events can play "I'm sorry, but an error has occurred." and report the application error.
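The two handlers mentioned above might be written as in this sketch. The form, the field, and the use of the log element with the _event variable are assumptions; only the prompt wording is taken from the text. The sketch also assumes the platform supports the builtin boolean field type.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="ask">
    <field name="answer" type="boolean">
      <prompt>Would you like to continue?</prompt>
      <!-- Runs when the caller says something outside the active grammars. -->
      <catch event="nomatch">
        <prompt>I'm sorry. I didn't get that.</prompt>
        <reprompt/>
      </catch>
      <!-- Runs for any event whose name begins with "error". -->
      <catch event="error">
        <prompt>I'm sorry, but an error has occurred.</prompt>
        <!-- _event holds the name of the event being handled. -->
        <log>Application error: <value expr="_event"/></log>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>
```

The nomatch handler returns control to the field, so the interpreter listens again. The error handler ends the session instead, because the application may not be able to recover.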
Understanding event handler selection:
In the previous section you learned that the VoiceXML interpreter executes an event handler whose name matches that of the event that was thrown. This is an oversimplified explanation. The interpreter uses a number of additional criteria when selecting an event handler. These include:
* the location of the handlers in the document hierarchy, also known as scope.
* the location of the handlers relative to one another within a single container, also known as document source order.
* the value of the count attribute of the event handlers and the number of times the event has fired.
* the value of the cond attribute of the event handlers.
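The effect of the count attribute can be sketched as follows. The form, the field, and the prompt wording are invented for the example, and the cond attribute is not shown. The sketch assumes the platform supports the builtin digits field type.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <field name="pin" type="digits">
      <prompt>Please enter your PIN.</prompt>
      <!-- No count attribute means count 1: selected the first
           and second time noinput is thrown in this field. -->
      <catch event="noinput">
        <prompt>I didn't hear anything.</prompt>
        <reprompt/>
      </catch>
      <!-- Selected from the third occurrence onward. -->
      <catch event="noinput" count="3">
        <prompt>I still didn't hear anything. Goodbye.</prompt>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>
```

Among the matching handlers, the interpreter uses the highest count that does not exceed the number of times the event has occurred. This lets an application escalate its response after repeated failures.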
Understanding event scoping:
In the introductory section on handling events, you learned to declare a handler within a field element to catch events such as noinput or nomatch. The field defines an anonymous scope, and the event handler has access to all the variables, scripts, and data declared within that scope. It also has access to the scopes defined by containing elements: the form (dialog scope), the vxml element (document scope), and the application root document (application scope).

In addition to defining variables, scripts, and data at dialog, document, and application scope, you can also define event handlers at these scopes. When the VoiceXML interpreter begins the event handler selection process, it considers the handlers defined within the current anonymous scope as well as the outer containing scopes. The following diagram illustrates the scopes the VoiceXML interpreter searches for an appropriate event handler when an event is thrown from within a field. From innermost to outermost, these are the anonymous scope of the field, dialog scope, document scope, and application scope.
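As an illustration of handlers at different scopes, consider the sketch below. The dialog, the field, and the prompt wording are hypothetical, and the sketch assumes the platform supports the builtin boolean field type.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: applies to every dialog in this document. -->
  <catch event="noinput">
    <prompt>Document-level handler.</prompt>
    <reprompt/>
  </catch>
  <form id="order">
    <!-- Dialog scope: applies to every field in this form. -->
    <catch event="nomatch">
      <prompt>Form-level handler.</prompt>
      <reprompt/>
    </catch>
    <field name="confirm" type="boolean">
      <prompt>Shall I place the order?</prompt>
      <!-- Anonymous scope of the field: considered first. -->
      <catch event="nomatch">
        <prompt>Field-level handler. Please say yes or no.</prompt>
        <reprompt/>
      </catch>
    </field>
  </form>
</vxml>
```

A nomatch thrown in the confirm field is handled by the field-level handler, because the innermost scope is searched first. A noinput thrown there finds no handler in the field or the form, so the document-level handler runs.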
Building VoiceXML applications:-
A voice application is a collection of one or more VoiceXML documents. A VoiceXML document is composed of one or more dialogs. A single VoiceXML document serves as the application entry point. This is the VoiceXML document that the Tellme VoiceXML interpreter fetches when a customer dials the telephone number associated with your voice application. Upon execution of that document, the interpreter fetches dependencies including grammars, scripts, recorded audio, XML data, and additional VoiceXML documents as required by your application.

The documents in a voice application share a common application root document. An application root document is a VoiceXML document that can contain variable declarations, scripts, links, grammars, event handlers, and properties that are available throughout the application.
The following diagram depicts a simple voice application. The application consists of a main document, "doc1", that serves as the application entry point; an application root document, "app_root", that provides shared application resources; and two additional VoiceXML documents, "doc2" and "doc3".
* "doc1" references the external grammar located in the file "gram1".
* "doc2" refers to the external grammars located in the files "gram2" and "gram3".
* "doc3" refers to the external grammar located in the file "gram4".
* The application flows from "doc1" to "doc2" and from "doc2" to "doc3".
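The entry point "doc1" of such an application could be sketched as follows. The file extensions, the grammar media type, the field name, and the prompt are all assumptions; the text gives only the names doc1, doc2, app_root, and gram1.

```xml
<?xml version="1.0"?>
<!-- doc1: the application entry point. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="app_root.vxml">
  <form id="start">
    <field name="choice">
      <!-- External grammar; the file name and type are assumed. -->
      <grammar src="gram1.grxml" type="application/srgs+xml"/>
      <prompt>How can I help you?</prompt>
      <filled>
        <!-- The application flows from doc1 to doc2. -->
        <goto next="doc2.vxml"/>
      </filled>
    </field>
  </form>
</vxml>
```

The documents doc2 and doc3 would follow the same pattern, each naming the same application root and referencing its own grammars.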
Using Subdialogs:-
A subdialog is a self-contained, reusable VoiceXML component. To maintain encapsulation, subdialogs do not have access to the execution context of their caller. The execution context includes the variables defined in the calling dialog as well as the resources defined in the application root document pointed to by the VoiceXML document containing the calling dialog.

The subdialog element allows you to call a self-contained VoiceXML dialog. You can pass parameters to the called dialog via param elements, and the called dialog can return values to the caller via the return element. Whether or not you return data from a subdialog, you should always include the return element in the dialog called by the subdialog element to properly return to the execution context of the caller.

Just before the subdialog is executed, the VoiceXML interpreter suspends the execution context of the caller, including all event handlers, grammars, links, scripts, and variables in any scope (dialog, document, application). The VoiceXML interpreter creates a new execution context for the subdialog while it executes. This context is destroyed when the subdialog returns.
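A call to a subdialog with one parameter and one return value might look like the sketch below. All names (main, askAge, greeting, age, result) are invented for the example, and both dialogs are placed in one document only to keep the sketch short. It also assumes the platform supports the builtin number field type.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Calling dialog. -->
  <form id="main">
    <subdialog name="result" src="#askAge">
      <!-- Passed to the subdialog; it must be declared there with var. -->
      <param name="greeting" expr="'Hello.'"/>
      <filled>
        <!-- Returned values appear as properties of "result". -->
        <prompt>You are <value expr="result.age"/> years old.</prompt>
      </filled>
    </subdialog>
  </form>
  <!-- Called dialog; it runs in its own execution context. -->
  <form id="askAge">
    <var name="greeting"/>
    <field name="age" type="number">
      <prompt><value expr="greeting"/> How old are you?</prompt>
      <filled>
        <!-- Hands "age" back and resumes the caller. -->
        <return namelist="age"/>
      </filled>
    </field>
  </form>
</vxml>
```

The called dialog sees only the parameter it was given, not the caller's variables. The caller sees only the values named in the return element.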