A comprehensive framework for multimodal meaning representation Ashwani Kumar Laurent Romary Laboratoire Loria, Vandoeuvre Lès Nancy.


1 A comprehensive framework for multimodal meaning representation Ashwani Kumar Laurent Romary Laboratoire Loria, Vandoeuvre Lès Nancy

2 Overview - 1
Context: conception phase of the EU IST/MIAMM project (Multidimensional Information Access using Multiple Modalities, with DFKI, TNO, Sony and Canon)
- Study of the design factors for a future haptic PDA-like device
- Underlying application: multidimensional access to a musical database

3 Overview - 2
Objectives:
- Design and implementation of a unified representation language within the MIAMM demonstrator
- MMIL: Multimodal Interface Language
- "Blind" application of (Bunt & Romary 2002)

4 Methodology
- Basic components: represent the general organization of any semantic structure, parameterized by data categories taken from a common registry
  - Application-specific data categories
  - General mechanisms: to make the thing work
  - General categories: descriptive categories available to all formats
- Plus strict conformance to existing standards

5 MIAMM - wheel mode

6 MIAMM architecture dependencies
[Architecture diagram: a Dialogue Manager (MultiModal Fusion (MMF), Dialogue History, Action Planner (AP)) linked to the MiaDoMo Database; a haptic-visual branch (Visual-Haptic Processing (VisHapTac), Haptic-Visual Generation and Interpretation, Haptic Processor, Visualization/Display, Haptic Device); a speech generation branch (Language Generation, Speech Synthesis, Speaker, scheduling information); and a speech analysis branch (Microphone (Headset), Continuous Speech Recognizer producing a Word/Phoneme Lattice, Structural Analysis (SPIN) over the Word/Phoneme sequence)]

7 Various processing steps - 1
- Reco: provides word lattices; out of our scope (MPEG-7 word and phone lattice module)
- SPIN: template-based (en/de) or TAG-based (fr) dependency structures; low-level semantic constructs

8 Various processing steps - 2
- MMF (Multimodal Fusion): fully interpreted structures; referential (MMILId) and temporal anchoring; dialogue history update
- AP (Action Planner): generates MIAMM internal actions; requests to MiaDoMo; actions to be generated (Language + VisHapTac)

9 Various processing steps - 3
- VisHapTac: informs MMF of the current graphical and haptic configuration (hierarchies of objects, focus, selection)
- MMIL must answer all those needs, but not all at the same time

10 Main characteristics of MMIL
- Basic ontology: events and participants (organized as hierarchies); restrictions on events and participants; relations among these
- Additional mechanisms: temporal anchoring of events; ranges and alternatives
- Representation: flat meta-model
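As an illustration of these characteristics, a minimal MMIL-style structure might pair an event with a participant and relate them. This is a sketch only: all element and attribute names below are assumed, not taken from the MIAMM specification.

```xml
<!-- Hypothetical MMIL fragment: names are illustrative, not from the spec -->
<mmil>
  <!-- an event, with a restriction (evtType) and temporal anchoring -->
  <event id="e0">
    <evtType>play</evtType>
    <tempSpan startPoint="2002-01-20T14:12:06"/>
  </event>
  <!-- a participant, with its own restrictions -->
  <participant id="p0">
    <objType>tune</objType>
  </participant>
  <!-- a relation linking event and participant -->
  <relation type="object" source="e0" target="p0"/>
</mmil>
```

Note how the flat meta-model keeps events, participants and relations as siblings rather than nesting them deeply.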

11 MMIL meta-model (UML)

12

13 An overview of data categories
- Underlying ontology for a variety of formats
- Distinction between abstract definition and implementation (e.g. in XML)
- Standardization objective: implementing a reference registry for NLP applications
- A wider set of DatCats than just semantics
- ISO 11179 (metadata registries) as a reference standard for implementing such a registry

14 DatCat example: /Addressee/
- Definition: the entity that is the intended hearer of a speech event; the scope of this data category is extended to cover any multimodal communication event (e.g. haptic and tactile)
- Source: (implicitly) an event whose evtType should be /Speak/
- Target: a participant (user or system)

15 Styles and vocabularies
- Style: design choice to implement a data category as an XML element, a database field, etc.
- Vocabulary: the names to be provided for a given style
- E.g. for /Addressee/: Style: Element; Vocabulary: {"addressee"}
- Note: multilingualism
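Applying that style/vocabulary choice, the /Addressee/ category from the previous slide could surface in XML roughly as follows; the surrounding element names are assumptions made for this sketch.

```xml
<!-- Hypothetical rendering of /Addressee/ as an element named "addressee" -->
<event id="e0">
  <evtType>speak</evtType>
  <!-- source: the speak event; target: the participant who is the intended hearer -->
  <addressee target="p1"/>
</event>
<participant id="p1">
  <objType>user</objType>
</participant>
```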

16 Time stamping
/Starting point/
- Def: indicates the beginning of the event
- Values: dateTime
- Anchor: time level
- Style: attribute
- Vocabulary: {"startPoint"}
Example (yearPeriod 1991):
<tempSpan startPoint="1991-01-01T00:00:00" endPoint="1991-12-31T23:59:59"/>

17 Application: a family of formats
- Openness: a requirement for MIAMM
- Specific formats for the input and output of each module
- Each format is defined within the same generic MMIL framework: the same meta-model for all, a specific DatCat specification for each

18 The MIAMM family of formats
SPIN-O, MMF-O, AP-O, VisHapTac-O, MMF-I, MMIL+
The specifications provide typing information for all these formats.

19 SPIN-O example
"Spiel mir das Lied bitte vor" (Please play the song)
[Dependency graph: events e0 (evtType=speak, dialogueAct=request) and e1 (evtType=play, lex=vorspielen), participants p1 (objectType=user) and p2 (objType=tune, refType=definite, refStatus=pending), linked by the relations speaker, propContent, destination and object]

20 [MMIL XML encoding of the SPIN-O example; the markup was stripped in transcription, leaving only the values: speak, request, play, vorspielen, user, tune, definite, pending]
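The XML markup of this slide was stripped in transcription; a speculative reconstruction, with element names inferred from the feature labels of the SPIN-O example on slide 19, might read:

```xml
<!-- Speculative reconstruction: element names inferred, not from the original slide -->
<event id="e0">
  <evtType>speak</evtType>
  <dialogueAct>request</dialogueAct>
</event>
<event id="e1">
  <evtType>play</evtType>
  <lex>vorspielen</lex>
</event>
<participant id="p1">
  <objectType>user</objectType>
</participant>
<participant id="p2">
  <objType>tune</objType>
  <refType>definite</refType>
  <refStatus>pending</refStatus>
</participant>
```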

21 The use of perceptual grouping
Reference domains and visual contexts:
« these three objects » → {△, ○, ○}
« the triangle » → {△}
« the two circles » → {○, ○}
The use of salience

22 VisHapTac-O
[Diagram: visual-haptic state e0 (description) over a participant setting containing set 1 (s1, s2, … s25) and set 2; sub-divisions s2-1, s2-2, s2-3; properties inFocus and inSelection]

23 VisHapTac output - 1
[XML fragment, markup partly lost in transcription: HGState "galaxy"]
<tempSpan startPoint="2000-01-20T14:12:06" endPoint="2002-01-20T14:12:13"/>
…

24 VisHapTac output - 2
[XML fragment, markup lost in transcription: a set "Let it be" marked inFocus, containing "Lady Madonna" … and "Revolution 9" (marked inSelection) …]
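Likewise stripped of its markup, the focus/selection state on this slide can only be sketched; the element and attribute names below are assumed, not recovered from the original.

```xml
<!-- Speculative sketch of the VisHapTac focus/selection output -->
<participant id="s2" inFocus="true">
  <objType>set</objType>
  <name>Let it be</name>
  <participant id="s2-1">
    <name>Lady Madonna</name>
  </participant>
  <participant id="s2-2" inSelection="true">
    <name>Revolution 9</name>
  </participant>
</participant>
```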

25 Conclusion
Most of the properties we wanted are fulfilled: uniformity, incrementality, partiality, openness and extensibility
Discussion point, semantic adequacy:
- Not a direct input to an inference system (except for the underlying ontology)
- Semantics provided through the specification

