
Component Description: Multimodal Interface
Carnegie Mellon University
Prepared by: Michael Bett (mbett@cs.cmu.edu), 3/26/99

1 - Overview

Description of the Multimodal Toolkit (MMI). What MMI is:

- Integrated speech, handwriting, and gesture recognizers
- Java-based API
- Integrated recording feature
- Plug-n-play recognizer interface: allows recognizers to be replaced (a sketch of such an interface follows this list)
- Internet-enabled interface: recognizers may run remotely over the internet
- Simultaneous multiple-user support
- Supports natural interface development
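The plug-n-play interface could look like the following minimal Java sketch (written in modern Java for readability). The names Recognizer and Hypothesis are invented for illustration; the slides do not show the actual MMI API.

    // Hypothetical plug-n-play contract: replacing a recognizer means
    // supplying a different implementation of this interface.
    public interface Recognizer {
        // Modality served by this recognizer, e.g. "speech", "pen", "gesture".
        String getModality();

        // Return an n-best list of scored hypotheses for one chunk of input.
        java.util.List<Hypothesis> recognize(byte[] rawInput);
    }

    // A single recognition result with its score.
    class Hypothesis {
        final String text;
        final double score;

        Hypothesis(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

A recognizer running remotely over the internet would implement the same interface and forward recognize() calls over the network, which is what would make replacement transparent to applications.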

2 - Architecture Overview

[Architecture diagram: a sample application (a Multimodal Applet that uses multimodal error repair) talks to the Multimodal Server, which routes speech to the Janus speech recognizer (vocabulary, acoustic model, language model), handwriting to the handwriting recognizer, and gestures to the gesture recognizer.]

MMI is a toolkit that allows multiple modalities to be easily integrated into applications. Applications can mix modalities (speech, gesture, and handwriting). The Java-based API communicates directly with each recognizer. The multimodal applet is the user interface; the applet window presents a view onto a domain-dependent representation of application data and state, in the form of objects to be manipulated.

3 - Component Description

Each modality has the following level of support in the multimodal toolkit:

4 - External Interfaces

The user defines their grammar using six probabilistically weighted node types (a construction sketch follows this list):

- A Toplevel represents an entire input model and contains one or more sequences, each of which contains exactly one AFrame.
- An AFrame represents an action frame and contains one or more sequences, each of which consists of one or more PSlots.
- A PSlot represents a parameter slot and contains one or more UnimodalNodes (at most one for each input modality).
- A UnimodalNode specifies a sub-grammar for a single input modality and has the same structure as a NonTerm, with the addition of a label specifying the modality.
- A NonTerm is a non-terminal node consisting of one or more sequences, each of which contains zero or more NonTerms or Literals.
- A Literal is a terminal node containing a text string representing one or more input tokens.
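To make the containment rules concrete, here is a toy input model for a spoken "move <object> here" command that can also be satisfied by a pointing gesture. Only the six node types come from the slide; the constructors and the addSequence/add methods are assumptions made for this sketch.

    // Builds a toy Toplevel: one action frame whose single parameter
    // slot can be filled by speech or by a pointing gesture.
    Toplevel buildInputModel() {
        // NonTerm expanding to one of several object names.
        NonTerm objectName = new NonTerm();
        objectName.addSequence(new Literal("the tank"));
        objectName.addSequence(new Literal("the convoy"));

        // One sub-grammar per input modality.
        UnimodalNode speechCmd = new UnimodalNode("speech");
        speechCmd.addSequence(new Literal("move"), objectName, new Literal("here"));
        UnimodalNode pointing = new UnimodalNode("gesture");
        pointing.addSequence(new Literal("<point>"));

        // A PSlot holds at most one UnimodalNode per modality.
        PSlot moveSlot = new PSlot();
        moveSlot.add(speechCmd);
        moveSlot.add(pointing);

        // The AFrame groups the slots of one action; the Toplevel
        // holds every action frame in the input model.
        AFrame moveFrame = new AFrame();
        moveFrame.addSequence(moveSlot);
        Toplevel inputModel = new Toplevel();
        inputModel.addSequence(moveFrame);
        return inputModel;
    }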

4 - External Interfaces (continued)

The Multimodal Server sends a series of points to the pen and gesture recognizers, and the audio to the speech recognizer. The pen, gesture, and speech recognizers return their hypotheses to the multimodal toolkit, which is responsible for integrating the results in an optimizing dynamic programming search [Minh Tue Vo, Ph.D. dissertation, CMU, 1998].
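As an illustration only, the integration step can be approximated by scoring every pairing of a speech hypothesis with a gesture hypothesis and keeping the best joint result. The real toolkit searches over the multimodal grammar rather than a flat cross-product; the weights below and the Hypothesis class (from the earlier sketch) are illustrative.

    import java.util.List;

    public class Fusion {
        // Naive joint scoring of two n-best lists; a simplified stand-in
        // for the optimizing search described on this slide.
        static Hypothesis fuse(List<Hypothesis> speech, List<Hypothesis> gesture,
                               double wSpeech, double wGesture) {
            Hypothesis best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Hypothesis s : speech) {
                for (Hypothesis g : gesture) {
                    double joint = wSpeech * s.score + wGesture * g.score;
                    if (joint > bestScore) {
                        bestScore = joint;
                        best = new Hypothesis(s.text + " / " + g.text, joint);
                    }
                }
            }
            return best; // null if either list is empty
        }
    }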

5 - Existing Software "Bridges"

The multimodal toolkit provides a Java API that allows applets or applications to incorporate multimodal functionality (a usage sketch follows).
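A hypothetical applet-side usage might look as follows. MMIClient, ResultListener, and their methods are invented for this sketch; only the applet-plus-server arrangement comes from the slides.

    import java.applet.Applet;

    public class MapApplet extends Applet {
        public void init() {
            // Connect to a (hypothetical) multimodal server endpoint.
            MMIClient mmi = new MMIClient(getParameter("server"), 9000);
            mmi.loadGrammar(getParameter("grammar")); // the input model
            mmi.addResultListener(new ResultListener() {
                public void resultReceived(String hypothesis) {
                    showStatus("Recognized: " + hypothesis); // update the UI
                }
            });
            mmi.start(); // begin capturing speech, pen, and gesture input
        }
    }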

6 - Information Flow

Part 1 - Specify how other CPOF components can send and receive data to your system (please be explicit): Components may directly interface with the multimodal server.
Part 2 - What are the inputs to your system? (Please specify formats and protocol; provide details.) The multimodal grammar.
Part 3 - What are the outputs of your system? (Please specify format and protocol; provide details.) Hypotheses according to the multimodal grammar.

7 - Plug-n-play

Part 1 - We have not currently identified how our components interact with other CPOF components. Please present a diagram that shows this interaction: TBD
Part 2 - Are there components in your system that are functionally "similar" to another CPOF component? TBD
Part 3 - Are any of your components complementing other CPOF components (e.g., ZUI and Sage/Visage)? TBD

8 - Operating Environments and COTS

Component Name     | Required Hardware | Operating System            | Language  | Required COTS
Multimodal Server  | PC or Sun         | Independent                 | Java      | JDK 1.1.*
Janus              | Sun Ultra 60      | Solaris 2.5.1               | Tcl/Tk, C | Tcl/Tk
NPen++             | Sun or PC         | Solaris 2.5.1 or Windows NT | C++       | None
Gesture Recognizer | Sun or PC         | Solaris 2.5.1 or Windows NT | C++       | None

9 - Hardware Platform Requirement

Specify the hardware required to support your system: MMI can run on a PC with a minimum of 32 MB of RAM and a 200 MHz processor. The speech recognizer requires a dual-processor Sun Ultra 60 with a minimum of 500 MB of RAM. (The recognizer currently under development will require a 500 MHz Pentium III with 128 MB minimum, 256 MB preferred.) Video capture cards, SoundBlaster-compatible sound cards, tabletop and lapel microphones, and pan-tilt and stationary cameras are also required.

