1 German Research Center for Artificial Intelligence DFKI GmbH Stuhlsatzenhausweg 3 66123 Saarbruecken, Germany phone: (+49 681) 302-5252/4162 fax: (+49 681) 302-5341 e-mail: WWW: Wolfgang Wahlster Towards Symmetric Multimodality: Fusion and Fission of Speech, Gesture and Facial Expression 26th Annual German Conference on Artificial Intelligence (KI 2003) 16 September 2003, Hamburg

2 © W. Wahlster SmartKom: Merging Various User Interface Paradigms. Spoken Dialogue, Graphical User Interfaces, Gestural Interaction, Facial Expressions and Biometrics merge into Multimodal Interaction.

3 © W. Wahlster The Need for Mobile Multimodal Dialogue Systems. Broadband mobile Internet access technologies via UMTS or mobile hotspots pave the way for a wide spectrum of added-value web services, but the user must input more and more complex commands to specify his information needs. PDAs and smartphones with tiny keyboards and mice are useless for mobile settings. Multimodal Dialogue Systems for Mobile Systems

4 © W. Wahlster The Fusion of Multimodal Input. Uncertainty of Signal Interpretation in Perceptive User Interfaces: Speech Recognition, Prosody Recognition, Gesture Recognition, Facial Expression Recognition. Multiple modalities increase the uncertainty of interpretation.

5 © W. Wahlster The Fusion of Multimodal Input. Speech Recognition, Prosody Recognition, Gesture Recognition, Facial Expression Recognition: multiple modalities increase the uncertainty of interpretation, but the semantic fusion of multiple modalities in the dialog context ensures an unambiguous interpretation. Dialog Context: fusion with mutual reduction of uncertainties by the exclusion of nonsensical combinations.
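The idea of reducing uncertainty by excluding nonsensical combinations can be illustrated with a minimal Python sketch. The hypothesis labels, the scores, the compatibility table and the product-of-scores ranking are all assumptions made for illustration; the slides do not say how SmartKom scores joint hypotheses.

```python
from itertools import product

# Hypothetical scored hypotheses from two recognizers (illustrative values only).
speech_hyps = [("reserve_seat", 0.6), ("record_film", 0.4)]
gesture_hyps = [("film_title", 0.7), ("cinema_seat", 0.3)]

# Hypothetical stand-in for the dialog context: which kind of object
# each action can sensibly be applied to.
COMPATIBLE = {
    ("reserve_seat", "cinema_seat"),
    ("record_film", "film_title"),
}


def fuse(speech, gesture, compatible):
    """Return the sensible (action, object, score) triples, best first."""
    joint = [
        (action, obj, s_score * g_score)  # assumed joint score: simple product
        for (action, s_score), (obj, g_score) in product(speech, gesture)
        if (action, obj) in compatible  # exclude nonsensical combinations
    ]
    return sorted(joint, key=lambda item: item[2], reverse=True)


ranked = fuse(speech_hyps, gesture_hyps, COMPATIBLE)
best_action, best_object, best_score = ranked[0]
```

In this toy setup the speech recognizer alone prefers `reserve_seat`, but the pointing gesture most likely hit a film title, so the fused result prefers `record_film`: each modality constrains the other.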

6 © W. Wahlster The SmartKom Consortium MediaInterface European Media Lab Univ. of Munich Univ. of Stuttgart Saarbrücken Aachen Dresden Berkeley Stuttgart Munich Univ. of Erlangen Heidelberg Main Contractor DFKI Saarbrücken Project duration: September 1999 – September 2003 Final presentation focusing on the mobile version: 5th September, Stuttgart Ulm

7 © W. Wahlster SmartKom‘s Major Scientific Goals. Explore and design new symbolic and statistical methods for the seamless fusion and mutual disambiguation of multimodal input on semantic and pragmatic levels. Generalize advanced discourse models for spoken dialogue systems so that they can capture a broad spectrum of multimodal discourse phenomena. Explore and design new constraint-based and plan-based methods for multimodal fission and adaptive presentation layout. Integrate all these multimodal capabilities in a reusable, efficient and robust dialogue shell that guarantees flexible configuration, domain independence and plug-and-play functionality.

8 © W. Wahlster Outline of the Talk 1. Towards Symmetric Multimodality 2. SmartKom: A Flexible and Adaptive Multimodal Dialogue Shell 3. Perception and Action under Multimodal Conditions 4. Multimodal Fusion and Fission in SmartKom 5. Ontological Inferences and the Three-Tiered Discourse Model of SmartKom 6. The Economic and Scientific Impact of SmartKom 7. Conclusions

9 © W. Wahlster SmartKom Provides Full Symmetric Multimodality. USER Input: Speech, Gestures, Facial Expressions (Multimodal Fusion). SYSTEM Output: Speech, Gestures, Facial Expressions (Multimodal Fission). Symmetric multimodality means that all input modes (speech, gesture, facial expression) are also available for output, and vice versa. Challenge: A dialogue system with symmetric multimodality must not only understand and represent the user's multimodal input, but also its own multimodal output. The modality fission component provides the inverse functionality of the modality fusion component.

10 © W. Wahlster SmartKom Covers the Full Spectrum of Multimodal Discourse Phenomena Multimodal Discourse Phenomena mutual disambiguation of modalities multimodal deixis resolution and generation crossmodal reference resolution and generation multimodal turn-taking and backchannelling multimodal ellipsis resolution and generation multimodal anaphora resolution and generation Symmetric multimodality is a prerequisite for a principled study of these discourse phenomena.

11 © W. Wahlster SmartKom’s Multimodal Input and Output Devices: Infrared Camera for Gestural Input, Tilting CCD Camera for Scanning, Video Projector; Microphone; Multimodal Control of TV-Set; Multimodal Control of VCR/DVD Player; Camera for Facial Analysis; Projection Surface; Speakers for Speech Output. 3 dual Xeon 2.8 GHz processors with 1.5 GB main memory

12 © W. Wahlster SmartKom: A Flexible and Adaptive Shell for Multimodal Dialogues. MM Dialogue Back-Bone. Application Layer: Home: Consumer Electronics, EPG; Public: Cinema, Phone, Fax, Mail, Biometrics; Mobile: Car and Pedestrian Navigation. SmartKom-Mobile: Mobile Travel Companion that helps with navigation. SmartKom-Public: Communication Companion that helps with phone, fax, email, and authentication. SmartKom-Home/Office: Infotainment Companion that helps select media content.

13 © W. Wahlster Here is a map with movie theatres. Generating Maps, Animations and Information Displays on the Fly

14 © W. Wahlster I would like to see this movie. Reference Resolution is based on a Symbolic Representation of the Smart Graphics Output

15 © W. Wahlster The route from Palais Moraß to Kino im Karlstor is marked on the map. Synchronization of Map Update and Character Behaviour

16 © W. Wahlster Please place your hand with spread fingers on the marked area. Interactive Biometric Authentication by Hand Contour Recognition

17 © W. Wahlster SmartKom-Home as an Infotainment Companion that helps select media content and runs on a tablet PC

18 © W. Wahlster SmartKom-Public Biometric Authentication, Telephony and Document Scanning and Forwarding in a Multimodal Dialogue

19 © W. Wahlster SmartKom-Mobile as a Travel Companion in the Car

20 © W. Wahlster SmartKom-Mobile as a Travel Companion for Pedestrians

21 © W. Wahlster The High-Level Control Flow of SmartKom

35 © W. Wahlster SmartKom‘s Language Model and Lexicon are Augmented on the Fly with Named Entities. SmartKom‘s Basic Vocabulary: 5500 words. Cinema Info (movie titles, actor names): e.g. all cinemas in one city > 200 new words. TV Info (names of TV features, actor names): e.g. TV program of one day > 200 new words. Geographic Info (street names, names of points-of-interest): e.g. one city > 500 new names. After a short dialogue sequence the lexicon includes > 10 000 words.
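On-the-fly lexicon augmentation can be pictured with a small sketch: whenever an information service returns named entities, their words are merged into the active vocabulary. The function name, the whitespace tokenization and the sample entries below are assumptions for illustration; the slide does not describe SmartKom's actual mechanism, which also has to update the recognizer's language model.

```python
def augment_lexicon(lexicon, named_entities):
    """Add the words of each named entity to `lexicon` and return the new words."""
    new_words = set()
    for entity in named_entities:
        for word in entity.lower().split():  # naive whitespace tokenization
            if word not in lexicon:
                new_words.add(word)
    lexicon |= new_words  # in-place update of the active vocabulary
    return new_words


# Tiny stand-in for the basic vocabulary.
base = {"show", "me", "the", "movie", "enemy"}
added = augment_lexicon(base, ["Enemy of the State", "Kino im Karlstor"])
```

Only words that were not already known are reported as new, which is what makes the lexicon grow from the basic vocabulary as the dialogue touches new cities, cinemas and TV listings.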

36 © W. Wahlster Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom Word Hypothesis Graph with Acoustic Scores Clause and Sentence Boundaries with Prosodic Scores Scored Hypotheses about the User‘s Emotional State Gesture Hypothesis Graph with Scores of Potential Reference Objects Intention Recognizer Selection of Most Likely Interpretation Modality Fusion Mutual Disambiguation Reduction of Uncertainty Intention Hypotheses Graph

37 © W. Wahlster M3L Representation of an Intention Lattice Fragment. User: I would like to know more about this. Fragment (markup lost in extraction): […] acoustic 0.96448 (Confidence in the Speech Recognition Result), gesture 0.99791 (Confidence in the Gesture Recognition Result), understanding 0.91667 (Confidence in the Speech Understanding Result), set epg_info (Planning Act) […] featureFilm Enemy of the State (Object Reference) […]

38 © W. Wahlster Please reserve these three seats. SmartKom Understands Complex Encircling Gestures

39 © W. Wahlster Using Facial Expression Recognition for Affective Personalization. Processing ironic or sarcastic comments. (1) Smartakus: Here you see the CNN program for tonight. (2) User: That’s great. [facial expression icon] (3) Smartakus: I’ll show you the program of another channel for tonight. (2’) User: That’s great. [facial expression icon] (3’) Smartakus: Which of these features do you want to see?

40 © W. Wahlster Fusing Symbolic and Statistical Information in SmartKom Early Fusion on the Signal Processing Level Face Camera Microphone Facial Expressions Affective User State Emotional Prosody - anger - joy Multiple Recognizers for a Single Modality time-stamped and scored hypotheses Speech Signal Boundary Prosody Emotional Prosody Speech Recognition

41 © W. Wahlster SmartKom‘s Computational Mechanisms for Modality Fusion and Fission Modality Fusion Modality Fission Ontological Inferences Unification Overlay Operations Planning Constraint Propagation M3L: Modality-Free Semantic Representation

42 © W. Wahlster The Markup Language Layer Model of SmartKom. M3L: MultiModal Markup Language. OIL: Ontology Inference Layer. XMLS: eXtensible Markup Language Schema. RDFS: Resource Description Framework Schema. XML: eXtensible Markup Language. RDF: Resource Description Framework. HTML: Hypertext Markup Language.

43 © W. Wahlster March 2003 ISBN 0-262-06232-1 8 x 9, 392 pp., 98 illus. $40.00/£26.95 (CLOTH) Edited by Dieter Fensel, James A. Hendler, Henry Lieberman and Wolfgang Wahlster Foreword by Tim Berners-Lee Spinning the Semantic Web

44 © W. Wahlster Personalization Mapping Digital Content Onto a Variety of Structures and Layouts From the “one-size fits-all“ approach of static presentations to the “perfect personal fit“ approach of adaptive multimodal presentations Structure XML 1 XML 2 XML n Content M3L Layout HTML 11 HTML 1m HTML 21 HTML 2o HTML 31 HTML 3p

45 © W. Wahlster The Role of the Semantic Web Language M3L. M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom. M3L is partitioned into 40 XML schema definitions covering SmartKom‘s discourse domains. The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom‘s multimodal dialogs.

46 © W. Wahlster OIL2XSD: Using XSLT Stylesheets to Convert an OIL Ontology to an XML Schema

47 © W. Wahlster Using Ontologies to Extract Information from the Web MyOnto-Movie :title :description :actors MyOnto-Person :name :birthday :director :title :description :name :critics :o-title :main actor Mapping of Metadata

48 © W. Wahlster M3L as a Meaning Representation Language for the User‘s Input. User: I would like to send an email to Dr. Reuse.

49 © W. Wahlster Exploiting Ontological Knowledge to Understand and Answer the User‘s Queries. User: Which movies with Schwarzenegger are shown on the Pro7 channel? M3L fragment (markup lost in extraction): 2002-05-10T10:25:46, name: Schwarzenegger, Pro7

50 © W. Wahlster SmartKom’s Multimodal Dialogue Back-Bone. Communication Blackboards, Data Flow, Context Dependencies. Analyzers (Speech, Gestures, Facial Expressions), External Services, Dialogue Manager (Modality Fusion, Discourse Modeling, Action Planning, Modality Fission), Generators (Speech, Graphics, Gestures).
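A blackboard-style backbone of this kind can be sketched as a small publish/subscribe hub on which each module listens to one pool and writes to the next. The class, the pool names and the message shapes are hypothetical; the slide names only the components, not their interfaces.

```python
from collections import defaultdict


class Blackboard:
    """Minimal publish/subscribe hub (illustrative, not SmartKom's middleware)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, pool, callback):
        self._subscribers[pool].append(callback)

    def publish(self, pool, message):
        for callback in list(self._subscribers[pool]):
            callback(message)


board = Blackboard()
trace = []  # records the order in which modules run


def modality_fusion(msg):
    trace.append("fusion")
    board.publish("intention.hypotheses", {"fused": msg})


def discourse_modeling(msg):
    trace.append("discourse")
    board.publish("intention.selected", msg)


def action_planning(msg):
    trace.append("action")
    board.publish("presentation.goal", {"present": msg})


def modality_fission(msg):
    trace.append("fission")


# Hypothetical pool names wiring analyzers -> fusion -> discourse -> action -> fission.
board.subscribe("analyzer.results", modality_fusion)
board.subscribe("intention.hypotheses", discourse_modeling)
board.subscribe("intention.selected", action_planning)
board.subscribe("presentation.goal", modality_fission)
board.publish("analyzer.results", {"speech": "...", "gesture": "..."})
```

Because modules only know pool names, a component can be swapped or added without touching the others, which is one plausible reading of the plug-and-play goal stated earlier in the talk.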

51 © W. Wahlster A Fragment of a Presentation Goal, as specified in M3L (markup lost in extraction; surviving values): list, epg_browse, now, 2003-03-20T19:42:32, 2003-03-20T22:00:00, 2003-03-20T19:50:00, 2003-03-20T19:55:00, Today’s Stock News, ARD, ...

52 © W. Wahlster Today's Stock News Everybody Loves Raymond The King of Queens Evening News Still Standing Yes, Dear Crossing Jordan Bonanza Passions Mr. Personality Down to Earth Weather Forecast Today Here is a listing of tonight's TV broadcasts. A Dynamically Generated Multimodal Presentation based on a Presentation Goal

53 © W. Wahlster An Excerpt from SmartKom’s Three-Tiered Multimodal Discourse Model. Domain Layer: OO1 TV broadcasts on 20/3/2003; OO2 Broadcast of „The King of Queens“ on 20/3/2003. Discourse Layer: DO 1, DO 11, DO 12, DO 13, DO 2, DO 3, DO 4, DO 5. Modality Layer: LO 1 listing, LO 2 tonight, LO 3 TV broadcast, LO 4 tape, LO 5 third one, VO 1, GO 1 here (pointing).
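The three tiers can be sketched as linked records: modality objects (linguistic, visual, gestural mentions) point to discourse objects, which in turn point to domain objects. The class names, fields and the ordinal resolver are assumptions for illustration; the object labels and the "third one" example follow the excerpt, and the first two list entries are taken from the TV listing shown earlier in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class DomainObject:
    """Domain layer: the thing being talked about."""
    description: str


@dataclass
class DiscourseObject:
    """Discourse layer: a referent introduced in the dialogue."""
    name: str
    domain: DomainObject
    parts: list = field(default_factory=list)  # e.g. the entries of a list


@dataclass
class ModalityObject:
    """Modality layer: one linguistic (LO), visual (VO) or gestural (GO) mention."""
    kind: str
    surface: str
    refers_to: DiscourseObject


ORDINALS = {"first one": 0, "second one": 1, "third one": 2}


def resolve_ordinal(phrase, context):
    """Resolve an ordinal phrase against the parts of a list-like discourse object."""
    return context.parts[ORDINALS[phrase]]


titles = ["Today's Stock News", "Everybody Loves Raymond", "The King of Queens"]
entries = [
    DiscourseObject(f"DO1{i}", DomainObject(f'Broadcast of "{t}" on 20/3/2003'))
    for i, t in enumerate(titles, start=1)
]
listing = DiscourseObject("DO1", DomainObject("TV broadcasts on 20/3/2003"), entries)
mention = ModalityObject("LO", "listing", listing)

# "Tape the third one": the ordinal is resolved inside the listing's discourse object.
referent = resolve_ordinal("third one", mention.refers_to)
third = ModalityObject("LO", "third one", referent)
```

Keeping the mention separate from the referent is what lets a spoken phrase, a displayed list item and a pointing gesture all resolve to the same discourse object.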

54 © W. Wahlster Overlay Operations Using the Discourse Model. Augmentation and Validation: compare with a number of previous discourse states; fill in consistent information; compute a score for each hypothesis-background pair: Overlay(covering, background). Diagram labels: Covering, Background, Intention Hypothesis Lattice, Selected Augmented Hypothesis Sequence.

55 © W. Wahlster The Overlay Operation Versus the Unification Operation. Overlay is a nonmonotonic and noncommutative unification-like operation. It inherits (non-conflicting) background information. Two sources of conflicts: (a) conflicting atomic values: overwrite background (old) with covering (new); (b) type clash: assimilate background to the type of covering; recursion. Diagram labels: Unification, Overlay. cf. J. Alexandersson, T. Becker 2001

56 © W. Wahlster Example for Overlay. User: "What films are on TV tonight?" System: [presents list of films] User: "That‘s a boring program, I‘d rather go to the movies." How do we inherit “tonight”?

57 © W. Wahlster Overlay Simulation. Covering: Go to the movies. Background: Films on TV tonight. Assimilation
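The overlay rules and the "tonight" example can be approximated with nested dictionaries. This is a simplified sketch, not the operation defined by Alexandersson and Becker: the type names, the `FEATURES` table and the rule "assimilation drops background features that the covering's type does not define" are assumptions standing in for a proper typed feature structure formalism with a type hierarchy.

```python
# Hypothetical type signatures: which features each type may carry.
FEATURES = {
    "WatchTV": {"time", "channel", "genre"},
    "GoToCinema": {"time", "cinema", "genre"},
}


def overlay(covering, background):
    """Overlay `covering` (new) onto `background` (old); dicts carry a "type" key."""
    if not isinstance(covering, dict) or not isinstance(background, dict):
        return covering  # conflicting atomic values: the covering wins
    result_type = covering.get("type", background.get("type"))
    allowed = FEATURES.get(result_type)
    result = {}
    for key, old in background.items():
        # Type clash: assimilate the background by keeping only the
        # features that the covering's type defines.
        if key != "type" and (allowed is None or key in allowed):
            result[key] = old
    for key, new in covering.items():
        if key == "type":
            continue
        # Recurse where both sides have the feature, otherwise take the new value.
        result[key] = overlay(new, result[key]) if key in result else new
    if result_type is not None:
        result["type"] = result_type
    return result


background = {"type": "WatchTV", "time": "tonight", "channel": "Pro7"}
covering = {"type": "GoToCinema"}

merged = overlay(covering, background)
swapped = overlay(background, covering)  # argument order matters
```

Here `merged` keeps `time: tonight` from the TV request while dropping the channel, which answers the question of how "tonight" is inherited; `swapped` differs, showing why the operation is noncommutative.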

58 © W. Wahlster Overlay - Scoring. Four fundamental scoring parameters: number of features from Covering (co); number of features from Background (bg); number of type clashes (tc); number of conflicting atomic values (cv). Codomain [-1,1]. Higher score indicates better fit (a score of 1 corresponds to overlay(c,b) = unify(c,b)).
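The slide lists the parameters and the codomain but not the scoring formula itself. One formula that is consistent with those properties is sketched below as an assumption: agreeing features count positively, conflicts count negatively, and the sum is normalized by the total count.

```python
def overlay_score(co, bg, tc, cv):
    """Assumed score in [-1, 1]: (co + bg - (tc + cv)) / (co + bg + tc + cv)."""
    total = co + bg + tc + cv
    if total == 0:
        raise ValueError("at least one parameter must be positive")
    return (co + bg - (tc + cv)) / total


no_conflicts = overlay_score(co=3, bg=2, tc=0, cv=0)    # behaves like unification
only_conflicts = overlay_score(co=0, bg=0, tc=1, cv=1)  # nothing fits
mixed = overlay_score(co=2, bg=1, tc=1, cv=0)
```

With no type clashes and no conflicting values the score reaches its maximum, matching the slide's remark that the best score corresponds to the case where overlay and unification coincide.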

59 © W. Wahlster SmartKom‘s Presentation Planner. The Presentation Planner generates a Presentation Plan by applying a set of Presentation Strategies to the Presentation Goal. Plan nodes shown: GlobalPresent, Present, AddSmartakus, DoLayout, EvaluatePersonaNode, Inform, TryToPresentTVOverview, ShowTVOverview, SetLayoutData, PersonaAction, SendScreenCommand, GenerateText, Speak, ... Outputs: Generation of Layout, Smartakus Actions. cf. J. Müller, P. Poller, V. Tschernomas 2002
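Applying presentation strategies to a goal can be pictured as hierarchical decomposition: each strategy rewrites a goal into subgoals until only primitive actions remain. The node names below are borrowed from the slide, but the parent/child relations in `STRATEGIES` are guesses made for illustration, since the tree structure did not survive in the transcript.

```python
# Hypothetical decomposition table: goal -> ordered subgoals.
STRATEGIES = {
    "GlobalPresent": ["Present", "AddSmartakus"],
    "Present": ["Inform"],
    "Inform": ["TryToPresentTVOverview"],
    "TryToPresentTVOverview": ["ShowTVOverview", "GenerateText"],
    "ShowTVOverview": ["SetLayoutData", "SendScreenCommand"],
    "GenerateText": ["Speak"],
    "AddSmartakus": ["PersonaAction"],
}


def expand(goal, strategies):
    """Depth-first expansion of a goal into a sequence of primitive actions."""
    subgoals = strategies.get(goal)
    if subgoals is None:
        return [goal]  # no strategy applies: treat the goal as primitive
    plan = []
    for sub in subgoals:
        plan.extend(expand(sub, strategies))
    return plan


plan = expand("GlobalPresent", STRATEGIES)
```

A real planner would also check applicability conditions and backtrack when a strategy fails (the name TryToPresentTVOverview hints at this); the sketch covers only the decomposition step.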

60 © W. Wahlster Adaptive Layout and Plan-Based Animation in SmartKom‘s Multimodal Presentation Generator

61 © W. Wahlster Salient Characteristics of SmartKom. Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels. Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input. Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models. Adaptive generation of coordinated, cohesive and coherent multimodal presentations. Semi- or fully automatic completion of user-delegated tasks through the integration of information services. Intuitive personification of the system through a presentation agent.

62 © W. Wahlster The Economic and Scientific Impact of SmartKom. Economic Impact: 51 patents + 29 spin-off products; 13 speech recognition; 10 dialogue management; 6 biometrics; 3 video-based interaction; 2 multimodal interfaces; 2 emotion recognition. Scientific Impact: 246 publications; 117 keynotes / invited talks; 66 masters and doctoral theses; 27 new projects use results; 5 tenured professors; 10 TV features; 81 press articles.

63 © W. Wahlster An Example of Technology Transfer: The Virtual Mouse. The virtual mouse has been installed in a cell phone with a camera. When the user holds a normal pen about 30 cm in front of the camera, the system recognizes the tip of the pen as a mouse pointer. A red point then appears at the tip on the display.

64 © W. Wahlster Former Employees of DFKI and Researchers from the SmartKom Consortium have Founded Five Start-up Companies: Eyeled, CoolMuseum GmbH, Mineway GmbH, Sonicson GmbH, Quadox AG. Areas shown: Location-aware mobile information systems; Multimodal systems for music retrieval; Agent-based middleware.

65 © W. Wahlster SmartKom’s Impact on International Standardization SmartKom‘s Multimodal Markup Language M3L Standard for Multimodal Content Representation Scheme ISO, TC37, SC4 Standard for Natural Markup Language ISO W3C

66 © W. Wahlster SmartKom‘s Impact on Software Tools and Resources for Research on Multimodality MULTIPLATFORM Software Framework... 15 Sites all over Europe COMIC, EU, FP5 Conversational Multimodal Interaction with Computers 1.6 Terabytes 448 WOZ Sessions - audio transcripts - gesture and emotion labeling BAS ELRA LDC Germany Europe World

67 © W. Wahlster Burning Research Issues in Multimodal Dialogue Systems. Multimodality: from alternate modes of interaction towards mutual disambiguation and synergistic combinations. Discourse Models: from information-seeking dialogs towards argumentative dialogs and negotiations. Domain Models: from closed world assumptions towards the open world of web services. Dialog Behaviour: from automata models towards a combination of probabilistic and plan-based models.

68 © W. Wahlster Conclusions. Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom‘s modality fusion and fission components. The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results based on a three-tiered representation of multimodal discourse. We have shown that a multimodal dialogue system must not only understand and represent the user‘s input, but also its own multimodal output.

69 Further Presentations about SmartKom at KI 2003: Adelhardt, Shi, Frank, Zeißler, Batliner, Nöth, Niemann: Multimodal User State Recognition in a Modern Dialogue System, p. 591-605. Müller, Poller, Tschernomas: A Multimodal Fission Approach with a Presentation Agent in the Dialog System SmartKom, p. 633-645. URL of this Presentation:

70 © W. Wahlster © 2003 DFKI Design by R.O. Thank you very much for your attention
