Speech Interface to Virtual Reality Applications  Reporter: Chun-Feng Liao  Authors: K. Wauchope, S. Everett, D. Tate, T. Maney; M. Cernak, A. Sannier

References  This report discusses two implementations of a speech interface to virtual reality applications.  M. Cernak, A. Sannier, "Command Speech Interface to Virtual Reality Applications," Technical Report, Virtual Reality Applications Center, Iowa State University of Science and Technology, June 2002.  Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.

Agenda  Introduction  Paper I  Paper II  Conclusion  System design Discussion

Introduction  Both papers are newly published.(2002,2003)  These 2 papers address technical details of Speech-VR integration.\  The 2 nd paper take more modern approach.  Both of them use similar architecture.(and are also similar to ours!) Ex:Choosing VRML + Java Speech API platform and encountered serveral difficult problems such as java security constraint and were force to use a “brwoser as an application ” instead of “browser as an applet”

Paper I  M.Cernak, A.Sannier,Technical Report, “Command Speech Interface to Virtual Reality Applications”,Virtual Reality Applications Center at Iowa State University of Science and Technology, June 2002.

Purposes of this paper  Describe an approach to controlling VR applications with a multimodal command speech interface (CSI) based on dialog modeling.  The CSI is used to improve the usability of VRAC’s C6. VRAC: Virtual Reality Applications Center. C6 is a virtual reality system developed by VRAC.

Multimodal Interaction  U :MoleBio  S :Yes  U :(Targeting the atom 512 by mouse)  U :Go There !  S :OK (goto Atom number 512 ). U: User, S: System Command Addressing,used to trigger system start to record user’s voice for recognition.

System Architecture (diagram)  Dialog Management and Speech Facilities  VR System

System Architecture  VR : VRAC’s C6  TTS : Festival  SR : CSLU Toolkit  Platform : Windows OS on PII 400

Three Main Components(1)  Speech Synthesis (TTS) : Festival.

Three Main Components(2)  CSLU Toolkit: Dialog Modeling, Speech Recognition and Natural Language Processing.  The CSLU Toolkit is implemented in C and Tcl/Tk and was developed by the Center for Spoken Language Understanding (CSLU) at OGI (Oregon Graduate Institute).

Three Main Components(3)  Communication bridge to the VR application.  It integrates CSLU (speech) with C6 (VR).

How to Integrate CSLU and C6  Initial attempt: CORBA, since C6 supports CORBA. The authors tried “Combat”, a Tcl extension, as the CORBA client, but it failed. They then tried “Tcl Blend” (Tcl -> Java -> CORBA -> C6), which had efficiency problems.  Result: a TCP socket is used instead.
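A minimal sketch of what a socket bridge of this kind can look like. The slides do not describe the wire format, so the newline-terminated text commands (for example `GOTO 512`) and the `OK ...` reply are assumptions, and `handle_one_command` is only a stand-in for the VR side.

```python
import socket


def send_command(sock: socket.socket, command: str) -> str:
    """Send one newline-terminated command and return the one-line reply."""
    sock.sendall((command + "\n").encode("utf-8"))
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:  # peer closed the connection
            break
        reply += chunk
    return reply.decode("utf-8").strip()


def handle_one_command(sock: socket.socket) -> None:
    """Stand-in for the VR side: read one command line and acknowledge it."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(1024)
        if not chunk:
            break
        data += chunk
    # A real VR application would act on the command here (hypothetical).
    sock.sendall(b"OK " + data.strip() + b"\n")
```

On the speech side, the socket would typically come from `socket.create_connection((host, port))`, with the host and port of the VR application supplied by the deployment.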

Natural Language Processing  Instead of using standard JSGF, the authors defined a custom grammar and wrote a dedicated parser to evaluate it.  The custom grammar is very similar to JSGF.  We will not discuss the custom grammar in detail here.
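To give a feel for what evaluating a JSGF-like rule involves, here is a small sketch. It is not the authors' grammar or parser: it handles only an assumed subset of JSGF-style syntax (literal words, `( a | b )` alternatives, `[ optional ]` groups), and the function names are hypothetical.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a tiny JSGF-like rule into a compiled regular expression."""
    tokens = re.findall(r"[()\[\]|]|[^\s()\[\]|]+", rule)
    parts = []
    for tok in tokens:
        if tok in ("(", "["):
            parts.append("(?:")       # open a group
        elif tok == ")":
            parts.append(")")         # close a required group
        elif tok == "]":
            parts.append(")?")        # close an optional group
        elif tok == "|":
            parts.append("|")         # alternative
        else:
            # Each word must be followed by exactly one space.
            parts.append(re.escape(tok.lower()) + " ")
    return re.compile("".join(parts))


def matches(rule: str, utterance: str) -> bool:
    """Return True if the whole utterance is accepted by the rule."""
    # Normalize to lowercase words, each followed by a single space.
    normalized = "".join(word + " " for word in utterance.lower().split())
    return rule_to_regex(rule).fullmatch(normalized) is not None
```

For example, the rule `go [to] (there | home)` accepts "go there" and "go to home" but rejects "go" alone.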

CSI Test Environment  A RAD (GUI) tool that helps developers quickly build the dialog flow.

Paper I Conclusion  The major advantage of this system is quick deployment.  The problematic area is the poor speech recognition accuracy (provided by CSLU).  The US Navy has also developed a speech interface to a VR system; the authors plan to improve interaction with VR along the lines of that method.

Future Work  Change TTS and SR to IBM ViaVoice. Support JSAPI(Java Speech API) Java is easier to communicate with C6 via CORBA.

Paper II  Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14- 17, 2003, pp. 70-83.

Introduction  This paper intruduce 2 systems which help newly-aboard crews of US Navy ships to be familiar with their environment quickly. User : Tell me where is Rom 101 !

Motivation  Architects of US Navy Ships heavily use CAD tools to design ship models.  CAD file can be transferred to 3D model format with little effort.  Accroding to author’s previous research,this Virtual Envirionment did shorten crews’ learning time.

Systems introduced  2 Systems MSFT(Multimodal Ship Familiarization Tool) ISFS(Interactive Ship Familiarization System)  ISFS is a recent transition fo MSFT.

System Architecture: MSFT (diagram)  Components run as separate processes.

MSFT  VE veiwer component and speech interface run as two separate processes.  Speech interface : using a total IBM solution : ViaVoice. IBM’s SMAPI. IBM’s SRCL grammar. Platform : PIII 500MHz

ISFS  A recent transistion of MSFT.  Using VRML as 3D modeling language.  Using JSAPI as interface to speech engine. ViaVoice totally support JSAPI. VRML support Java as a scripting language  Other structure is identical to MSFT system. Platform : Xeon 2.0GHz ->Need more computing power!

Why Choose a Standalone VRML Browser?  Security limitations (details are discussed later).  VM limitations (details are discussed later).  It provides opportunities to customize the interface to the VRML browser. In my personal experience, the system usually becomes unstable when the speech engine works with a VRML plug-in via EAI’s Java interface.

Security Limitations  JRE imposes security limitations on Java Applets.  JSAPI was unable to establish a connection with speech engine unless we explicitly reconfig the security settings.

Limited VM  Most VRML Browser ‘s EAI were implemented using ActiveX thus only support Microsoft’s old VM which dosen’t support most modern functions of Java. Ex:This may force us to use Java AWT instead of swing which provide better GUI.

Providing GUI as VUI Fallback  The GUI provides a fallback in case the speech recognizer has trouble accurately transcribing the user’s voice.  The GUI is adjusted dynamically to keep a one-to-one correspondence with the VUI.
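One way to keep such a one-to-one correspondence is to drive both interfaces from a single command table. The sketch below only illustrates that idea; it is not taken from the paper, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List


class CommandSet:
    """One table of commands that feeds both the GUI and the speech grammar."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def set_active(self, handlers: Dict[str, Callable[[], str]]) -> None:
        """Replace the active commands, e.g. when the dialog context changes."""
        self._handlers = dict(handlers)

    def gui_buttons(self) -> List[str]:
        """Labels for the buttons the GUI should currently show."""
        return sorted(self._handlers)

    def speech_phrases(self) -> List[str]:
        """Phrases the speech recognizer should currently accept."""
        return sorted(self._handlers)

    def invoke(self, phrase: str) -> str:
        """Run a command, whether it was clicked or spoken."""
        return self._handlers[phrase]()
```

Because buttons and phrases come from the same table, changing the active commands updates both at once, and a click and an utterance end up in the same handler.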

Paper II Conclusion  The speech interface is needed because the GUI and the VE viewer both rely on direct manipulation and keep the user’s hands too busy.  As HCI becomes increasingly multimodal, care must be taken to integrate the modalities in a natural manner.

Future Work  VRML is more close to Object –oriented and tree-structured.  It is hard to represent them in RDBMS.  Must find some way to store model data easily and efficiently. Personal thought : Using XML Database.

Switchable! Discussions
