Wolfgang Wahlster
SmartKom: Modality Fusion for a Mobile Companion based on Semantic Web Technologies
Cyber Assist Consortium Second International Symposium - Information Environment for Mobile and Ubiquitous Computing Era - Tokyo, 25 March 2003
German Research Center for Artificial Intelligence, DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162, fax: (+49 681) 302-5341
e-mail: wahlster@dfki.de, WWW: http://www.dfki.de/~wahlster

© W. Wahlster
Multimodal UMTS Systems
Intelligent Interaction with Mobile Internet Services
- Access to web content and web services anywhere and anytime
- Access to corporate networks and virtual private networks from any device
- Access to edutainment and infotainment services
- Access to all messages (voice, email, multimedia, MMS) from any single device
- Personalization
- Localization

SmartKom: A Highly Portable Multimodal Dialogue System
MM Dialogue Back-Bone
Application Layer:
- SmartKom-Mobile: Car and Pedestrian Navigation
- SmartKom-Public: Cinema, Phone, Fax, Mail, Biometrics
- SmartKom-Home/Office: Consumer Electronics, EPG

SmartKom's SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm
Anthropomorphic Interface = Dialogue Partner
[Figure labels: User; specifies goal; delegates task; cooperate on problems; asks questions; presents results; Personalized Interaction Agent; Webservices (Service 1, Service 2, Service 3)]
See: Wahlster et al. 2001, Eurospeech

The SmartKom Consortium
SmartKom: Intuitive Multimodal Interaction
Main Contractor: DFKI Saarbrücken; Scientific Director: W. Wahlster
Partners named on the slide: MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen
Sites shown on the map: Saarbrücken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, Ulm
Project Budget: € 25.5 million, funded by BMBF (Dr. Reuse) and industry
Project Duration: 4 years (September 1999 – September 2003)

Outline of the Talk
1. The Markup Language Layer Model of SmartKom
2. Modality Fusion in SmartKom
3. The Role of the Semantic Web Language M3L
4. Providing Coherence in Multimodal Dialogs by Ontology-based Overlay
5. Conclusions

Personalization: Mapping Web Content Onto a Variety of Structures and Layouts
From the "one-size-fits-all" approach of static webpages to the "perfect personal fit" approach of adaptive webpages
Content: M3L
Structure: XML 1, XML 2, up to XML n
Layout: HTML 11 to HTML 1m, HTML 21 to HTML 2o, HTML 31 to HTML 3p

The Markup Language Layer Model of SmartKom
M3L: MultiModal Markup Language
OIL: Ontology Inference Layer
XMLS: eXtensible Markup Language Schema
RDFS: Resource Description Framework Schema
XML: eXtensible Markup Language
RDF: Resource Description Framework
HTML: Hypertext Markup Language

Symbolic and Subsymbolic Fusion of Multiple Modes
Input modes: Speech Recognition, Gesture Recognition, Prosody Recognition, Facial Expression Recognition, Lip Reading
Subsymbolic Fusion:
- Neural Networks
- Hidden Markov Models
Symbolic Fusion:
- Graph Unification
- Bayesian Networks
Reference Resolution and Disambiguation
Modality-Free Semantic Representation

Example: Multimodal Access to Electronic Program Guides for TV
Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens)
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.

Using Facial Expression Recognition for Affective Personalization
Processing ironic or sarcastic comments
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That’s great. [facial expression icon]
(3) Smartakus: I’ll show you the program of another channel for tonight.
(2’) User: That’s great. [facial expression icon]
(3’) Smartakus: Which of these features do you want to see?

Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom
Inputs to Modality Fusion:
- Word Hypothesis Graph with Acoustic Scores
- Clause and Sentence Boundaries with Prosodic Scores
- Scored Hypotheses about the User's Emotional State
- Gesture Hypothesis Graph with Scores of Potential Reference Objects
Modality Fusion: Mutual Disambiguation, Reduction of Uncertainty
Result: Intention Hypotheses Graph
Intention Recognizer: Selection of Most Likely Interpretation
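
The idea of mutual disambiguation can be illustrated with a minimal Python sketch. It is not SmartKom's implementation: the class names, the example hypotheses, and the rule of multiplying scores are assumptions made for illustration, since the slide does not say how scores are combined.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SpeechHypothesis:
    text: str
    expected_type: str  # type of object the utterance refers to (hypothetical field)
    score: float        # acoustic score in [0, 1]


@dataclass(frozen=True)
class GestureHypothesis:
    object_id: str
    object_type: str
    score: float        # score of the potential reference object in [0, 1]


def fuse(speech, gestures):
    """Return type-compatible (speech, gesture, joint score) triples, best first.

    Assumption: the joint score is the product of the two scores.
    """
    fused = [
        (s, g, s.score * g.score)
        for s, g in product(speech, gestures)
        if s.expected_type == g.object_type
    ]
    return sorted(fused, key=lambda triple: triple[2], reverse=True)


speech = [
    SpeechHypothesis("record this channel", "channel", 0.6),
    SpeechHypothesis("record this film", "film", 0.4),
]
gestures = [
    GestureHypothesis("film_3", "film", 0.7),
    GestureHypothesis("channel_cnn", "channel", 0.3),
]
best_speech, best_gesture, best_score = fuse(speech, gestures)[0]
```

In this toy input the best speech hypothesis taken alone is "record this channel", but the fused ranking prefers the film reading because the gesture evidence supports it. Each mode reduces the uncertainty of the other.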

SmartKom's Computational Mechanisms for Modality Fusion and Fission
M3L: Modality-Free Semantic Representation
Modality Fusion and Modality Fission draw on:
- Ontological Inferences
- Unification
- Overlay Operations
- Planning
- Constraint Propagation

The Role of the Semantic Web Language M3L
- M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom
- M3L is partitioned into 40 XML schema definitions covering SmartKom's discourse domains
- The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom's multimodal dialogs

Using Ontologies to Extract Information from the Web
Mapping of Metadata
MyOnto-Movie: :title, :description, :actors
MyOnto-Person: :name, :birthday, :director
Film.de-Movie: :title, :description
Kinopolis.de-Movie: :name, :critics, :o-title, :main actor
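
A metadata mapping of this kind can be sketched in Python as a per-source lookup table. The slide names the fields but not which field maps to which, so the correspondences below (for example Kinopolis.de `name` to `title`) are assumptions, as are the function name and the sample record.

```python
# Hypothetical field correspondences from source schemas to MyOnto-Movie slots.
# The slide lists the field names only; the pairings are assumed for illustration.
FIELD_MAPPINGS = {
    "Film.de-Movie": {"title": "title", "description": "description"},
    "Kinopolis.de-Movie": {"name": "title", "main actor": "actors"},
}


def to_my_onto_movie(source, record):
    """Translate a source record into MyOnto-Movie slots, dropping unmapped fields."""
    mapping = FIELD_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


kinopolis_record = {
    "name": "Example Film",
    "o-title": "Original Example Title",
    "main actor": "A. Actor",
}
movie = to_my_onto_movie("Kinopolis.de-Movie", kinopolis_record)
```

Fields without a known counterpart, such as `o-title` here, are simply dropped. A fuller mapping would need a decision for each of them.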

SmartKom’s Multimodal Dialogue Back-Bone
Legend: Communication Blackboards, Data Flow, Context Dependencies
Analyzers: Speech, Gestures, Facial Expressions
Dialogue Manager: Modality Fusion, Discourse Modeling, Action Planning, Modality Fission
Generators: Speech, Graphics, Gestures
External Services

SmartKom's Three-Tiered Discourse Model
System: This [gesture] is a list of films showing in Heidelberg.
User: Please reserve a ticket for the first one.
Modality Layer: LO 2, LO 3, ..., LO 4, LO 5, LO 6, VO 1, GO 1, ...
Discourse Layer: DO 1, DO 2, DO 3, DO 9, DO 10, DO 11, DO 12
Domain Layer: DomainObject 1, DomainObject 2
Words linked to objects in the figure: list, heidelberg, reserve, ticket, first
DO = Discourse Object, LO = Linguistic Object, GO = Gestural Object, VO = Visual Object
cf. M. Löckelt et al. 2002, N. Pfleger 2002

SmartKom’s Domain Model based on M3L
- Used for communication in the back-bone
- Frame-based ontology; representation as Typed Feature Structures in M3L (XML)
- Application objects composed of subobjects
- Slots: feature paths meaningful for the dialogue (entities that can be talked about or referred to), e.g. movie:director:lastName in a CinemaReservation object
- Slots can recursively contain other slots
Example frame:
CinemaReservation
  theater: MovieTheater
    address: Address
    seats: SeatStructure
    …
  movie: Movie
    name: String
    director: Person
      firstName: String
      lastName: String
      …
    cast: PersonList
    yearOfProduction: PositiveInteger
    …
  reservationNumber: PositiveInteger
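
Feature paths such as movie:director:lastName can be resolved against a nested structure, as in the following minimal Python sketch. Plain dictionaries stand in for typed feature structures here, and the function name and all slot values are invented for illustration.

```python
def resolve(frame, path):
    """Follow a colon-separated feature path through nested frames."""
    node = frame
    for feature in path.split(":"):
        node = node[feature]  # raises KeyError if the slot is not filled
    return node


# A CinemaReservation frame as nested dictionaries (values are hypothetical).
reservation = {
    "movie": {
        "name": "Example Film",
        "director": {"firstName": "Ada", "lastName": "Example"},
        "yearOfProduction": 2002,
    },
    "reservationNumber": 42,
}

last_name = resolve(reservation, "movie:director:lastName")
```

Because slots can recursively contain other slots, the same lookup works at any depth. The sketch leaves out the type information that a real typed feature structure carries on every node.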

Overlay Operations Using the Discourse Model
Augmentation and Validation
- Compare with a number of previous discourse states:
  - fill in consistent information
  - compute a score
- For each hypothesis-background pair: Overlay(covering, background)
[Figure labels: Covering; Background; Intention Hypothesis Lattice; Selected Augmented Hypothesis Sequence]

The Overlay Operation Versus the Unification Operation
- Nonmonotonic and noncommutative unification-like operation
- Inherit (non-conflicting) background information
- Two sources of conflicts:
  - conflicting atomic values: overwrite background (old) with covering (new)
  - type clash: assimilate background to the type of covering; recursion
[Figure: Unification versus Overlay]
cf. J. Alexandersson, T. Becker 2001

"Formal" Definition of Overlay
Let
- co be the covering
- bg be the background
Step 1: Assimilate(co, bg)
[Figure: type hierarchy with top type T above bg and co]
Step 2: Overlay(co, Assimilate(co, bg))
- If co and bg are frames: recursion
- If co is empty: use bg
- If bg is empty: use co
- If conflict: use co
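
Step 2 can be sketched in a few lines of Python. This is a simplification under stated assumptions: frames are plain dictionaries, "empty" is represented by `None` or a missing feature, and Step 1 (assimilation in the type hierarchy) is left out. The example values are hypothetical.

```python
def overlay(co, bg):
    """Lay the covering `co` over the background `bg` (None means empty)."""
    if co is None:   # covering is empty: use the background
        return bg
    if bg is None:   # background is empty: use the covering
        return co
    if isinstance(co, dict) and isinstance(bg, dict):
        # Both are frames: recurse over the union of their features.
        features = list(co) + [f for f in bg if f not in co]
        return {f: overlay(co.get(f), bg.get(f)) for f in features}
    return co        # conflict: the covering wins


background = {"show": {"medium": "tv", "time": "tonight"}}
covering = {"show": {"medium": "cinema"}}

result = overlay(covering, background)
swapped = overlay(background, covering)
```

Here `result` keeps the new medium from the covering and inherits the time from the background. `swapped` differs from `result`, which shows that the operation is noncommutative.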

Domain Models with Multiple Inheritance
Assimilate(co, bg):
- Compute the set of minimal upper bounds (MUB)
- Specialize the MUBs
- Unify the specialized MUBs
Overlay remains untouched
[Figure: type hierarchy with top type T, the types of co and bg, and their MUB]
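
The first step, computing the minimal upper bounds, can be sketched in Python as follows. The type hierarchy is invented for illustration and is not SmartKom's ontology; the later steps (specializing and unifying the MUBs) are not covered.

```python
# Hypothetical type hierarchy with multiple inheritance: type -> direct supertypes.
PARENTS = {
    "Top": [],
    "Entertainment": ["Top"],
    "Broadcast": ["Top"],
    "Performance": ["Entertainment"],
    "TvShow": ["Entertainment", "Broadcast"],
    "LiveConcert": ["Performance", "Broadcast"],
}


def ancestors(type_name):
    """Return all supertypes of `type_name`, including the type itself."""
    seen = {type_name}
    stack = [type_name]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def minimal_upper_bounds(a, b):
    """Common supertypes of a and b that have no more specific common supertype."""
    common = ancestors(a) & ancestors(b)
    return {
        t for t in common
        if not any(other != t and t in ancestors(other) for other in common)
    }


mubs = minimal_upper_bounds("TvShow", "LiveConcert")
```

With single inheritance the result is always one type. With multiple inheritance, as in this toy hierarchy, two types can have several minimal upper bounds, which is why the slide speaks of a set.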

Overlay - Scoring
Four fundamental scoring parameters:
- Number of features from Covering (co)
- Number of features from Background (bg)
- Number of type clashes (tc)
- Number of conflicting atomic values (cv)
Codomain: [-1, 1]
A higher score indicates a better fit (a score of 1 means that overlay(c,b) is equivalent to unify(c,b))
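
The slide gives the four parameters and the range but not the formula. One formula that is consistent with both, assumed here rather than taken from the slide, is (co + bg - (tc + cv)) / (co + bg + tc + cv). A minimal Python sketch with hypothetical counts:

```python
def overlay_score(co, bg, tc, cv):
    """Score an overlay result from its four counts.

    Assumed formula (not stated on the slide):
        (co + bg - (tc + cv)) / (co + bg + tc + cv)
    It yields 1 when there are no conflicts and -1 when there are only conflicts.
    """
    total = co + bg + tc + cv
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    return (co + bg - (tc + cv)) / total


no_conflicts = overlay_score(co=5, bg=3, tc=0, cv=0)
only_conflicts = overlay_score(co=0, bg=0, tc=2, cv=1)
mixed = overlay_score(co=3, bg=1, tc=1, cv=1)
```

Under this assumption a conflict-free overlay, which behaves like unification, scores 1, and every type clash or conflicting atomic value pulls the score towards -1.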

Example: Enrichment and Validation
Discourse context:
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U4:

Example: Enrichment and Validation
Discourse context:
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U6: Overlay(U6, U4)
Result: (Score: 0.8666)

Example: Enrichment and Validation
Discourse context:
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Analysis of U6: Overlay(U6, U2)
Result: (Score: -1)

Animation of Scoring Parameters
[Figure: Background and Covering structures]
- Number of features from Covering (co)
- Number of features from Background (bg)
- Number of type clashes (tc)
- Number of conflicting atomic values (cv)
Result: 12 1 2 0

SmartKom's Presentation Planner
The Presentation Planner generates a Presentation Plan by applying a set of Presentation Strategies to the Presentation Goal.
[Figure: plan tree with nodes GlobalPresent, Present, AddSmartakus, DoLayout, EvaluatePersonaNode, Inform, TryToPresentTVOverview, ShowTVOverview, SetLayoutData, PersonaAction, SendScreenCommand, GenerateText, Speak, ...]
Plan results: Generation of Layout, Smartakus Actions
cf. J. Müller, P. Poller, V. Tschernomas 2002

Salient Characteristics of SmartKom
- Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
- Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
- Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
- Adaptive generation of coordinated, cohesive and coherent multimodal presentations
- Semi- or fully automatic completion of user-delegated tasks through the integration of information services
- Intuitive personification of the system through a presentation agent

Conclusions
- Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom's modality fusion and fission components.
- The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results based on a three-tiered representation of multimodal discourse.
- We have shown that a multimodal dialogue system must not only understand and represent the user's input, but also its own multimodal output.

First International Conference on Perceptive & Multimodal User Interfaces (PMUI’03)
November 5-7, 2003, Delta Pinnacle Hotel, Vancouver, B.C., Canada
Conference Chair: Sharon Oviatt, Oregon Health & Science Univ., USA
Program Chairs: Wolfgang Wahlster, DFKI, Germany; Mark Maybury, MITRE, USA
PMUI’03 is sponsored by ACM and will be co-located in Vancouver with ACM’s UIST’03. This meeting follows three successful Perceptive User Interface Workshops (with PUI’01 held in Florida) and three International Multimodal Interface Conferences initiated in Asia (with ICMI’02 held in Pittsburgh).

http://smartkom.dfki.de/
