© 2007 IBM Corporation SpeechTEK, August 21, 2007 Jan Sedivy IBM, Voice Technologies and Systems, Czech Republic, Prague Architecture for Web Multimodal.


1 Architecture for Web Multimodal Application
Jan Sedivy, IBM, Voice Technologies and Systems, Czech Republic, Prague
SpeechTEK, August 21, 2007
© 2007 IBM Corporation

2 Introduction - need
- Design a simple multimodal architecture.
- The architecture supports all kinds of multimodal applications, from simple form filling to interactive movies with animation.
- Small resource requirements: runs on a PDA and on the Internet.
- Use open standards where possible.
- No compromises in multimodality: let the user switch freely between voice (VUI) and GUI.
- Simple and fast development.

3 Key Components - approach
- IBM Embedded ViaVoice
- Embedded VoiceXML Browser (EVB), a research prototype
- Standard HTML browser: Internet Explorer or Firefox
- The Adobe Flash Player
- An (XML) protocol that enables control of the browser by an external application

4 Embedded ViaVoice overview
Embedded ViaVoice® delivers IBM speech technology to mobile devices and automobile components.
- Robust speech recognition with a low error rate, and text-to-speech
- SLM (statistical language model) and action classification supporting freeform commands: no need for a user's manual
- Embedded grammars or large lists of over 100,000 words
- N-best, confidence score, out-of-vocabulary detection
- Speaker and noisy-environment adaptation
- Push-to-activate button, automatic gain control, automatic end-of-utterance detection, transient noise detection
- Broad range of languages
- Eclipse-based, easy-to-use developer toolkit
- C/C++ code: highly portable, scalable, small footprint, low CPU MIPS
- IBM provides porting, integration, testing and consulting services, along with customized development workshops
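The n-best and confidence-score features listed above are typically combined into an accept / disambiguate / reject decision. The following is only a minimal sketch of that idea in Python. The `Hypothesis` type, the `decide` function, the 0.0 to 1.0 confidence scale and both thresholds are assumptions made for illustration; they are not part of the Embedded ViaVoice API, which the slides do not show.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of a recognizer's n-best list (hypothetical shape)."""
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def decide(nbest: List[Hypothesis], accept: float = 0.6, margin: float = 0.15) -> str:
    """Return 'accept', 'disambiguate' or 'reject' for an n-best list.

    - reject: nothing was recognized, or the best score is below `accept`
      (treated here as a likely out-of-vocabulary utterance)
    - disambiguate: the top two scores are closer than `margin`
    - accept: the best hypothesis is a clear winner
    """
    if not nbest:
        return "reject"
    ranked = sorted(nbest, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    if best.confidence < accept:
        return "reject"
    if len(ranked) > 1 and best.confidence - ranked[1].confidence < margin:
        return "disambiguate"
    return "accept"
```

For example, `decide([Hypothesis("call home", 0.8), Hypothesis("call Rome", 0.75)])` asks for disambiguation, because the two candidates score too close to each other under the assumed margin.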

5 IBM Embedded VoiceXML Browser overview
- Small, fast, and portable Embedded VoiceXML Browser (EVB)
- VoiceXML 2.0 compliant
- Written in plain C++ (no templates, etc.)
- Compact and portable code
- Targeted at small portable devices: PDAs, handhelds, set-top boxes, etc.
- Runs on top of IBM's Embedded Speech Engine and TTS
- Ported to Win32, WinCE (iPAQ), and Linux
- Runs as a viewer: VoiceXML snippets are pushed to the EVB

6 Flash Player - overview
The Adobe Flash Player is a widely distributed multimedia and application player created and distributed by Macromedia (a division of Adobe Systems). Flash Player runs SWF files, which can be created with the Adobe Flash authoring tool, with Adobe Flex, or with a number of other Macromedia and third-party tools.
Flash Player supports an embedded scripting language called ActionScript (AS), which is based on ECMAScript. ActionScript matured from a script without variables to one that supports object-oriented code.

7 HTML Browsers - overview
- HTML browsers: MS IE 6, IE 7, Firefox
- Browsers support add-ons

8 PDA architecture
- GUI: Adobe Flash Player
- VUI: Embedded VoiceXML Browser (EVB) in viewer mode
- Application control: ActionScript
- ActionScript synchronizes the GUI and VUI and generates:
  - VoiceXML snippets of code
  - Dynamic grammars, grammars, prompts (links)
  - All other dialog parameters
- Result processing (n-best, disambiguation, similarity, OOV, ...)
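To make "VoiceXML snippets" and "dynamic grammars" concrete, here is a minimal sketch of generating a one-field VoiceXML 2.0 form whose inline grammar is built from a list of choices at run time. In the architecture described, this generation is done in ActionScript; Python is used here only for illustration. The function name `build_snippet`, the form and rule ids, and the example choices are hypothetical, and the exact snippet shape expected by the EVB is not given in the slides.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_snippet(field_name: str, prompt: str, choices: List[str]) -> str:
    """Build a VoiceXML 2.0 document with one field and an inline grammar.

    The grammar is "dynamic" in the sense that its items come from
    `choices`, which the application can change on every dialog turn.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": field_name + "_form"})
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt

    # Inline XML grammar: one rule listing every allowed utterance.
    grammar = ET.SubElement(
        field, "grammar", {"version": "1.0", "root": "choice", "mode": "voice"}
    )
    rule = ET.SubElement(grammar, "rule", {"id": "choice"})
    one_of = ET.SubElement(rule, "one-of")
    for choice in choices:
        ET.SubElement(one_of, "item").text = choice

    # ElementTree escapes special characters such as '&' in text nodes.
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_snippet("color", "Which color?", ["red", "green", "blue"]))
```

Building the snippet through an XML library rather than string concatenation keeps prompts and grammar items correctly escaped, which matters when the choices come from user or application data.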

9 Internet Extensions
- EVB Life-Cycle Manager add-on:
  - starts, initializes, runs and shuts down the browser
  - prevents multiple VXML browsers from running at the same time
  - version policy mechanism that provides new-version notification
- The Security Server:
  - permits opening a socket in a different domain
  - communicates with the EVB
[Diagram labels: Life Cycle Manager, Security Server]
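Two of the Life-Cycle Manager duties above, preventing multiple browser instances and notifying about new versions, can be sketched as follows. This is not the add-on's actual implementation, which the slides do not describe. It swaps in a common technique (holding a localhost TCP port as a single-instance lock), and the function names and the dotted numeric version format are assumptions.

```python
import socket
from typing import Optional, Tuple


def acquire_instance_lock(port: int) -> Optional[socket.socket]:
    """Try to become the single running instance by holding a local port.

    Returns the listening socket on success (keep it open for the lifetime
    of the process), or None if another instance already holds the port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen(1)
    except OSError:
        sock.close()
        return None
    return sock


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0); assumes purely numeric dotted parts."""
    return tuple(int(part) for part in version.split("."))


def update_available(installed: str, latest: str) -> bool:
    """True if `latest` is newer than `installed`.

    Assumes both strings have the same number of dotted parts.
    """
    return parse_version(latest) > parse_version(installed)
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "1.10.0" would sort before "1.2.0".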

10 Internet Architecture
[Diagram; labels: Life cycle manager, Security server, EVB, Add-ons, Browser, Client, Internet]

11 Sample application - Literacy Tutor
IBM, Corporate Citizenship & Corporate Affairs
Project goals
- Use speech recognition technology, over the web, to help children and adults improve their literacy skills
Value to customer
- Gain literacy skills through practice and positive reinforcement
- Improve pronunciation in a private setting
- Interaction with the tutor character introduces 'fun' and increases computer skills
- Web = anywhere/anytime access:
  - Can resume where left off
  - Can share progress with family
  - Build and share books on the web
www.readingcompanion.org

12 Home page

13 Functionality
- Practice Reading: main application
  - Flash application that uses EVB+EVV (Embedded VoiceXML Browser plus Embedded ViaVoice) to decode speech
  - Flash animates a tutor character that interacts with the reader
- Reporting: performance reports for teachers indicating strengths as well as problem areas for students
- Book Library: add/remove books from a classroom, rate books, book browser
- Classroom Management: add/delete students, adjust reading level, add/delete classrooms as well as teachers and schools
- Book Authoring: separate tool to author new books

14 Bookshelf

15 Children's book/character

16 Adult book/character

17 Student Performance

18 Reading Companion - summary
- We currently have more than 200 schools and not-for-profit organizations participating in the grant program, involving more than 11,000 users (children and adults) in 9 countries: Canada, United States, Spain, United Kingdom, Ireland, South Africa, Mexico, Venezuela, India.
- Community relations managers are reviewing proposals from prospective organizations, since we hope to expand the program this year to 100 more sites.
- Market value: US$10,000 per site (regardless of number of users)

19 Thank You!
