
1 Symmetric Multimodality in an Adaptive and Reusable Dialogue Shell
Wolfgang Wahlster
German Research Center for Artificial Intelligence (DFKI GmbH), Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
Phone: (+49 681) 302-5252/4162; fax: (+49 681) 302-5341
E-mail: wahlster@dfki.de; WWW: http://www.dfki.de/~wahlster
SmartKom: Dialog-based Human Computer Interaction by Coordinated Analysis and Generation of Multiple Modalities
BMBF Status Conference "Human Computer Interaction", June 3, 2003, Berlin

2 SmartKom: Merging Various User Interface Paradigms
© W. Wahlster
Spoken dialogue, graphical user interfaces, gestural interaction, facial expressions, and biometrics are merged into multimodal interaction.

3 The SmartKom Consortium
Main contractor: DFKI Saarbrücken
Partners named on the slide: MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen
Locations on the map: Saarbrücken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, Ulm
Project duration: September 1999 – September 2003
Final presentation, focusing on the mobile version: September 5, Stuttgart

4 SmartKom's Major Scientific Goals
- Explore and design new symbolic and statistical methods for the seamless fusion and mutual disambiguation of multimodal input on semantic and pragmatic levels.
- Generalize advanced discourse models for spoken dialogue systems so that they can capture a broad spectrum of multimodal discourse phenomena.
- Explore and design new constraint-based and plan-based methods for multimodal fission and adaptive presentation layout.
- Integrate all these multimodal capabilities in a reusable, efficient and robust dialogue shell that guarantees flexible configuration, domain independence and plug-and-play functionality.

5 Outline of the Talk
1. Towards Symmetric Multimodality
2. SmartKom: A Flexible and Adaptive Multimodal Dialogue Shell
3. Perception and Action under Multimodal Conditions
4. Multimodal Fusion and Fission in SmartKom
5. Ontological Inferences and the Three-Tiered Discourse Model of SmartKom
6. The Economic and Scientific Impact of SmartKom
7. Conclusions

6 SmartKom Provides Full Symmetric Multimodality
User input: speech, gestures, facial expressions, integrated by multimodal fusion
System output: speech, gestures, facial expressions, produced by multimodal fission
Symmetric multimodality means that all input modes (speech, gesture, facial expression) are also available for output, and vice versa.
Challenge: A dialogue system with symmetric multimodality must not only understand and represent the user's multimodal input, but also its own multimodal output.
The modality fission component provides the inverse functionality of the modality fusion component.

7 SmartKom Covers the Full Spectrum of Multimodal Discourse Phenomena
- mutual disambiguation of modalities
- multimodal deixis resolution and generation
- crossmodal reference resolution and generation
- multimodal turn-taking and backchannelling
- multimodal ellipsis resolution and generation
- multimodal anaphora resolution and generation
Symmetric multimodality is a prerequisite for a principled study of these discourse phenomena.

8 SmartKom's Multimodal Input and Output Devices
- infrared camera for gestural input
- tilting CCD camera for scanning
- video projector and projection surface
- microphone
- camera for facial analysis
- speakers for speech output
- multimodal control of the TV set and the VCR/DVD player
Hardware: 3 dual Xeon 2.8 GHz processors with 1.5 GB main memory

9 SmartKom's Control Panel

10 SmartKom: A Flexible and Adaptive Shell for Multimodal Dialogues
A common multimodal dialogue backbone supports three application layers:
- Home: consumer electronics, EPG (electronic program guide). SmartKom-Home/Office is an infotainment companion that helps select media content.
- Public: cinema, phone, fax, mail, biometrics. SmartKom-Public is a communication companion that helps with phone, fax, email, and authentication.
- Mobile: car and pedestrian navigation. SmartKom-Mobile is a mobile travel companion that helps with navigation.

11 SmartKom's SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm
Anthropomorphic interface = dialogue partner: the user specifies a goal and delegates the task to a personalized interaction agent. The user and the agent cooperate on problems, and the agent asks questions and presents results. To do so, the agent draws on web services (service 1, service 2, service 3).
See: Wahlster et al. 2001, Eurospeech

12 SmartKom's Language Model and Lexicon Are Augmented on the Fly with Named Entities
SmartKom's basic vocabulary: 5500 words
- Cinema info (movie titles, actor names): e.g. all cinemas in one city add more than 200 new words
- TV info (names of TV features, actor names): e.g. the TV program of one day adds more than 200 new words
- Geographic info (street names, names of points of interest): e.g. one city adds more than 500 new names
After a short dialogue sequence the lexicon includes more than 10,000 words.
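The following minimal Python sketch shows the lexicon side of this idea. The lexicon is a fixed base vocabulary plus named-entity classes that fill up at runtime as external services deliver data. Several parts are assumptions for illustration and do not come from SmartKom:
- the class name `DynamicLexicon`, its methods and the sample words are hypothetical;
- the language-model update that the slide also mentions is not modeled.

```python
from typing import Dict, Iterable, Set


class DynamicLexicon:
    """Hypothetical sketch: a fixed base vocabulary plus named-entity classes
    (e.g. movie titles, street names) that are filled at runtime."""

    def __init__(self, base_words: Iterable[str]) -> None:
        self.base: Set[str] = set(base_words)
        self.entities: Dict[str, Set[str]] = {}

    def add_entities(self, entity_class: str, names: Iterable[str]) -> int:
        """Add names to an entity class and return how many were actually new."""
        bucket = self.entities.setdefault(entity_class, set())
        before = len(bucket)
        bucket.update(names)
        return len(bucket) - before

    def __contains__(self, word: str) -> bool:
        return word in self.base or any(word in b for b in self.entities.values())

    def size(self) -> int:
        # Distinct entries across the base vocabulary and all entity classes.
        return len(self.base.union(*self.entities.values()))


# Usage: extend the lexicon when cinema and map data arrive (sample data only).
lex = DynamicLexicon(["show", "movie", "tonight"])
lex.add_entities("movie_title", ["Enemy of the State"])
lex.add_entities("street_name", ["Stuhlsatzenhausweg"])
```

The sketch keeps entity classes separate from the base vocabulary. That way, one class (for example, yesterday's TV program) could later be dropped or replaced without touching the rest.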

13 The German Federal President E-mailing a Scanned Image with SmartKom's Help
System: "Now you can remove the document."

14 Interactive Biometric Authentication by Hand Contour Recognition
System: "Please place your hand with spread fingers on the marked area."

15 Scanning a Document and Sending the Captured Image as an Email Attachment
SmartKom bridges the full loop from multimodal perception to physical action:
User: "My name is Norbert Reithinger."
System: "I require authentication from you. I have found the record of Norbert Reithinger. I require a signature authentication for Norbert Reithinger. Please sign in the write-in field."
System: "The authentication was successful."
User: "I would like to send a document to Wolfgang Wahlster."
System: "I have found the record for Wolfgang Wahlster. Please place the document on the marked area."
System: "Please remove it now. The document was successfully scanned."
System: "The document has now been sent."

16 Adaptive Perceptual Feedback on the System State

17 Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom
Inputs to modality fusion:
- word hypothesis graph with acoustic scores
- clause and sentence boundaries with prosodic scores
- scored hypotheses about the user's emotional state
- gesture hypothesis graph with scores of potential reference objects
Modality fusion (mutual disambiguation, reduction of uncertainty) produces an intention hypotheses graph; the intention recognizer selects the most likely interpretation.
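The following minimal Python sketch illustrates mutual disambiguation between speech and gesture hypotheses. Only type-compatible (speech, gesture) pairs survive, and the best-scoring surviving pair wins. As a result, a weaker gesture hypothesis can beat the top one when speech requires it. The example hypotheses are invented. All names (`SpeechHyp`, `GestureHyp`, `fuse`) are hypothetical, and so is the scoring rule: the joint score is the product of the two confidences, an independence assumption made for this sketch. It is not SmartKom's actual unification of scored hypothesis graphs.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass
class SpeechHyp:
    text: str
    score: float   # recognition/understanding confidence in [0, 1]
    wants: str     # semantic type the deictic "this" must refer to


@dataclass
class GestureHyp:
    referent: str
    referent_type: str
    score: float   # confidence that the pointing gesture meant this object


def fuse(speech: List[SpeechHyp],
         gestures: List[GestureHyp]) -> Optional[Tuple[SpeechHyp, GestureHyp, float]]:
    """Return the best type-compatible (speech, gesture) pair and its joint score.

    The joint score is the product of both confidences (an assumption of this sketch).
    """
    best: Optional[Tuple[SpeechHyp, GestureHyp, float]] = None
    for s, g in product(speech, gestures):
        if s.wants != g.referent_type:
            continue  # incompatible pair: pruned, this is the disambiguation step
        joint = s.score * g.score
        if best is None or joint > best[2]:
            best = (s, g, joint)
    return best


# Invented example: the top gesture hypothesis (a person) loses, because the
# top speech hypothesis needs a broadcast.
speech = [SpeechHyp("record this", 0.6, "broadcast"),
          SpeechHyp("call this", 0.4, "person")]
gestures = [GestureHyp("address book entry", "person", 0.55),
            GestureHyp("The King of Queens", "broadcast", 0.45)]
best = fuse(speech, gestures)
```

Here, "record this" combined with "The King of Queens" scores about 0.27. That beats "call this" with the address book entry (about 0.22), even though the address book entry is the gesture recognizer's first choice.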

18 M3L Representation of an Intention Lattice Fragment
User: "I would like to know more about this"
[M3L markup not preserved; surviving values:]
- confidence in the speech recognition result: acoustic 0.96448
- confidence in the gesture recognition result: gesture 0.99791
- confidence in the speech understanding result: understanding 0.91667
- planning act: set epg_info
- object reference: featureFilm "Enemy of the State"
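Only the values of the original M3L fragment survive in this transcript. The sketch below rebuilds a comparable XML fragment with Python's standard `xml.etree.ElementTree` and then reads the confidences back, as a consuming module might. Every element and attribute name (`intentionHypothesis`, `scores`, `score`, `source`, `act`, `object`, `title`) is hypothetical and is not the actual M3L schema; only the values are taken from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; only the values come from the slide.
hyp = ET.Element("intentionHypothesis")
scores = ET.SubElement(hyp, "scores")
ET.SubElement(scores, "score", source="acoustic").text = "0.96448"
ET.SubElement(scores, "score", source="gesture").text = "0.99791"
ET.SubElement(scores, "score", source="understanding").text = "0.91667"
act = ET.SubElement(hyp, "act", type="set")
ET.SubElement(act, "content").text = "epg_info"
ref = ET.SubElement(hyp, "object", type="featureFilm")
ET.SubElement(ref, "title").text = "Enemy of the State"

# Serialize to a string, as it would be exchanged between modules.
xml_text = ET.tostring(hyp, encoding="unicode")

# A consuming module parses the message and reads the confidences back.
parsed = ET.fromstring(xml_text)
confidences = {s.get("source"): float(s.text) for s in parsed.iter("score")}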

19 Fusing Symbolic and Statistical Information in SmartKom
Early fusion on the signal processing level:
- face camera → facial expressions
- microphone → speech signal → speech recognition, boundary prosody, emotional prosody (e.g. anger, joy)
- facial expressions and emotional prosody → affective user state
Multiple recognizers for a single modality deliver time-stamped and scored hypotheses.

20 SmartKom's Computational Mechanisms for Modality Fusion and Fission
Mechanisms: ontological inferences, unification, overlay operations, planning, constraint propagation
Shared representation: M3L, a modality-free semantic representation

21 The Markup Language Layer Model of SmartKom
Layers:
- M3L: MultiModal Markup Language
- OIL: Ontology Inference Layer
- XMLS: XML Schema; RDFS: Resource Description Framework Schema
- XML: eXtensible Markup Language; RDF: Resource Description Framework
- HTML: Hypertext Markup Language

22 Personalization: Mapping Digital Content onto a Variety of Structures and Layouts
From the "one-size-fits-all" approach of static presentations to the "perfect personal fit" approach of adaptive multimodal presentations
Content (M3L) → structure (XML 1, XML 2, …, XML n) → layout (HTML 1,1 … HTML 1,m; HTML 2,1 … HTML 2,o; HTML 3,1 … HTML 3,p)

23 The Role of the Semantic Web Language M3L
- M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom.
- M3L is partitioned into 40 XML schema definitions covering SmartKom's discourse domains.
- The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom's multimodal dialogues.

24 OIL2XSD: Using XSLT Stylesheets to Convert an OIL Ontology to an XML Schema
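OIL2XSD performs this conversion with XSLT stylesheets. The sketch below does not use XSLT. It shows the same kind of mapping in plain Python with `xml.etree.ElementTree`, under these assumptions:
- the input is a hypothetical, much simplified ontology (class to slots with ranges), not real OIL syntax;
- each class becomes an `xs:complexType`, and each slot becomes an element in a sequence;
- built-in ranges map to `xs:` datatypes, and class ranges refer to that class's type.

The class and slot names reuse those from slide 25.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical simplified ontology: class -> {slot: range}.
ONTOLOGY: Dict[str, Dict[str, str]] = {
    "Person": {"name": "string", "birthday": "date"},
    "Movie": {"title": "string", "description": "string", "actors": "Person"},
}


def ontology_to_xsd(ontology: Dict[str, Dict[str, str]]) -> ET.Element:
    """Map each class to an xs:complexType whose slots form an xs:sequence."""
    schema = ET.Element(f"{{{XS}}}schema")
    for cls, slots in ontology.items():
        ctype = ET.SubElement(schema, f"{{{XS}}}complexType", name=cls)
        seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
        for slot, rng in slots.items():
            # Class ranges refer to the class's complexType; others are XSD built-ins.
            xsd_type = rng if rng in ontology else f"xs:{rng}"
            ET.SubElement(seq, f"{{{XS}}}element", name=slot, type=xsd_type)
    return schema


schema = ontology_to_xsd(ONTOLOGY)
xsd_text = ET.tostring(schema, encoding="unicode")
```

A fuller conversion would also handle slot cardinalities and class hierarchies. This sketch leaves both out.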

25 Using Ontologies to Extract Information from the Web
Mapping of metadata from source-specific schemas to SmartKom's own ontology:
- MyOnto-Movie: :title, :description, :actors
- MyOnto-Person: :name, :birthday, :director
- Film.de-Movie: :title, :description
- Kinopolis.de-Movie: :name, :critics, :o-title, :main actor
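The following minimal Python sketch shows how per-source slot mappings could normalize web records into the MyOnto-Movie structure. The slide does not say which slots correspond. The correspondences in `MAPPINGS` (for example, Kinopolis `:name` to `:title`, `:main actor` to `:actors`) and the function `to_myonto_movie` are hypothetical. Slots without a correspondence, such as `:critics` and `:o-title`, are simply dropped here.

```python
from typing import Any, Dict

# Hypothetical slot correspondences from source schemas to MyOnto-Movie.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "Film.de-Movie": {"title": "title", "description": "description"},
    "Kinopolis.de-Movie": {"name": "title", "main actor": "actors"},
}


def to_myonto_movie(source: str, record: Dict[str, str]) -> Dict[str, Any]:
    """Rename source slots to MyOnto-Movie slots; unmapped slots are dropped."""
    mapping = MAPPINGS[source]
    movie: Dict[str, Any] = {}
    for slot, value in record.items():
        target = mapping.get(slot)
        if target is None:
            continue  # e.g. Kinopolis :critics has no counterpart in this sketch
        if target == "actors":
            movie.setdefault("actors", []).append(value)  # :actors holds a list
        else:
            movie[target] = value
    return movie


kino = to_myonto_movie("Kinopolis.de-Movie",
                       {"name": "Enemy of the State", "main actor": "Will Smith",
                        "critics": "(review text)"})
film = to_myonto_movie("Film.de-Movie",
                       {"title": "Enemy of the State", "description": "A thriller"})
```

Because both records end up with the same `title` slot, the extracted information can be merged and queried uniformly, whichever site it came from.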

26 M3L as a Meaning Representation Language for the User's Input
User: "I would like to send an email to Dr. Reuse."
[M3L markup not preserved in this transcript]

27 Exploiting Ontological Knowledge to Understand and Answer the User's Queries
User: "Which movies with Schwarzenegger are shown on the Pro7 channel?"
[M3L markup not preserved; surviving values: 2002-05-10T10:25:46, Schwarzenegger (name), Pro7]

28 SmartKom's Multimodal Dialogue Backbone
Modules communicate through communication blackboards; the slide distinguishes data flow from context dependencies.
Components shown:
- analyzers for speech, gestures, facial expressions
- modality fusion, discourse modeling, action planning, modality fission
- dialogue manager
- external services
- generators for speech, graphics, gestures
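The following minimal Python sketch illustrates communication through named blackboards. Modules never call each other directly; they publish to and subscribe to boards. Several details are assumptions:
- the sketch is a synchronous, single-process simplification and not the API of MULTIPLATFORM (slide 44);
- the class `Blackboards`, all board names and the message contents are hypothetical.

The wiring follows the processing chain from slide 28: analysis, modality fusion, discourse modeling, and action planning.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Blackboards:
    """Hypothetical sketch of named communication blackboards (publish/subscribe)."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.log: List[str] = []  # order in which boards received messages

    def subscribe(self, board: str, handler: Handler) -> None:
        self.subscribers[board].append(handler)

    def publish(self, board: str, message: dict) -> None:
        self.log.append(board)
        for handler in list(self.subscribers[board]):
            handler(message)


bb = Blackboards()
# Modality fusion: turns speech hypotheses into an intention hypothesis.
bb.subscribe("speech.hypotheses",
             lambda m: bb.publish("intention.hypotheses",
                                  {"intent": "epg_browse", "utterance": m["text"]}))
# Discourse modeling: marks the intention as interpreted in context.
bb.subscribe("intention.hypotheses",
             lambda m: bb.publish("intention.in_context", {**m, "resolved": True}))
# Action planning: derives a presentation goal for modality fission.
bb.subscribe("intention.in_context",
             lambda m: bb.publish("presentation.goal",
                                  {"act": "list", "content": m["intent"]}))
goals: List[dict] = []
bb.subscribe("presentation.goal", goals.append)  # stands in for modality fission

bb.publish("speech.hypotheses", {"text": "What films are on TV tonight?"})
```

Decoupling modules through boards is what allows analyzers or generators to be added or swapped without changing their neighbors.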

29 A Fragment of a Presentation Goal, as specified in M3L
[M3L markup not preserved; surviving values: list, epg_browse, now, 2003-03-20T19:42:32, 2003-03-20T22:00:00, 2003-03-20T19:50:00, 2003-03-20T19:55:00, Today's Stock News, ARD, …]

30 A Dynamically Generated Multimodal Presentation Based on a Presentation Goal
System: "Here is a listing of tonight's TV broadcasts."
Broadcasts shown: Today's Stock News; Everybody Loves Raymond; The King of Queens; Evening News; Still Standing; Yes, Dear; Crossing Jordan; Bonanza; Passions; Mr. Personality; Down to Earth; Weather Forecast Today

31 An Excerpt from SmartKom's Three-Tiered Multimodal Discourse Model
Domain layer:
- OO1: TV broadcasts on 20/3/2003
- OO2: Broadcast of "The King of Queens" on 20/3/2003
Discourse layer: DO1, DO11, DO12, DO13, DO2, DO3, DO4, DO5
Modality layer:
- LO1 "listing", LO2 "tonight", LO3 "TV broadcast", LO4 "tape", LO5 "third one"
- VO1
- GO1 "here" (pointing)
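The following minimal Python sketch shows how the three tiers work together to resolve "the third one". It makes these assumptions:
- LO, VO and GO are read as linguistic, visual and gesture objects, and DO as discourse objects (an interpretation of the abbreviations);
- the class names and `resolve_ordinal` are hypothetical;
- the link structure is simplified: the list items are DO11 to DO13, whereas on the slide OO2 is linked to DO2 to DO5.

The first three titles come from the listing on slide 30, where "The King of Queens" is indeed the third broadcast.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DomainObject:
    """Domain layer: an instance described in the ontology."""
    description: str


@dataclass
class ModalityObject:
    """Modality layer: one concrete mention or display of an object."""
    kind: str      # "linguistic", "visual" or "gesture" (assumed labels)
    surface: str


@dataclass
class DiscourseObject:
    """Discourse layer: links one domain object to all of its modality objects."""
    name: str
    domain: Optional[DomainObject] = None
    mentions: List[ModalityObject] = field(default_factory=list)
    children: List["DiscourseObject"] = field(default_factory=list)


ORDINALS = {"first": 0, "second": 1, "third": 2}


def resolve_ordinal(phrase: str, focus: DiscourseObject) -> Optional[DiscourseObject]:
    """Resolve an ordinal phrase such as 'third one' against the list in focus."""
    words = phrase.split()
    index = ORDINALS.get(words[0]) if words else None
    if index is None or index >= len(focus.children):
        return None
    target = focus.children[index]
    target.mentions.append(ModalityObject("linguistic", phrase))  # record new mention
    return target


titles = ["Today's Stock News", "Everybody Loves Raymond", "The King of Queens"]
listing = DiscourseObject(
    "DO1",
    DomainObject("TV broadcasts on 20/3/2003"),
    [ModalityObject("linguistic", "listing"), ModalityObject("gesture", "here (pointing)")],
    [DiscourseObject(f"DO1{i}", DomainObject(f'Broadcast of "{t}" on 20/3/2003'))
     for i, t in enumerate(titles, start=1)],
)
hit = resolve_ordinal("third one", listing)
```

Keeping mentions on the modality layer separate from discourse objects means a later "tape it" can reach the same broadcast through DO13, whether it was first introduced by speech, graphics, or pointing.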

32 Overlay Operations Using the Discourse Model
Augmentation and validation:
- compare with a number of previous discourse states: fill in consistent information
- compute a score for each hypothesis-background pair: Overlay(covering, background)
Covering: the intention hypothesis lattice. Background: previous discourse states. Result: the selected augmented hypothesis sequence.

33 The Overlay Operation Versus the Unification Operation
Overlay is a nonmonotonic and noncommutative unification-like operation:
- It inherits (non-conflicting) background information.
- There are two sources of conflicts:
  - conflicting atomic values: overwrite the background (old) with the covering (new)
  - type clash: assimilate the background to the type of the covering; recursion
cf. J. Alexandersson, T. Becker 2001

34 Example for Overlay
User: "What films are on TV tonight?"
System: [presents list of films]
User: "That's a boring program, I'd rather go to the movies."
How do we inherit "tonight"?

35 Overlay Simulation
Covering: "Go to the movies"; background: "Films on TV tonight"; the background is assimilated to the type of the covering.
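The following minimal Python sketch implements the overlay rules from slide 33 on nested dicts. The rules are:
- non-conflicting background features are inherited;
- on a conflicting atomic value, the covering wins;
- on a type clash, the background is first assimilated to the covering's type, with recursion into substructures.

Applied to the example on slide 34, "tonight" is inherited when a TV request is overlaid by a cinema request. Some parts are simplifications and assumptions:
- assimilation here keeps only the background features that the covering's type allows, according to a hypothetical `FEATURES` table;
- the types `TVBroadcast`, `CinemaPerformance` and `Show` are hypothetical;
- the published operation works on typed feature structures with a full type hierarchy.

```python
from typing import Any, Dict, Set

FeatureStructure = Dict[str, Any]

# Hypothetical type signatures: which features each type may carry.
FEATURES: Dict[str, Set[str]] = {
    "TVBroadcast": {"time", "channel", "show"},
    "CinemaPerformance": {"time", "cinema", "show"},
    "Show": {"title", "genre"},
}


def assimilate(background: FeatureStructure, target_type: str) -> FeatureStructure:
    """Coerce the background to target_type, keeping only features that type allows."""
    allowed = FEATURES.get(target_type, set())
    result = {k: v for k, v in background.items() if k in allowed}
    result["type"] = target_type
    return result


def overlay(covering: Any, background: Any) -> Any:
    """Overlay the covering (new information) on the background (old information)."""
    if not isinstance(covering, dict) or not isinstance(background, dict):
        return covering  # atomic values: the covering wins any conflict
    cov_type = covering.get("type")
    if cov_type is not None and cov_type != background.get("type"):
        background = assimilate(background, cov_type)  # type clash
    result = dict(background)  # inherit non-conflicting background information
    for feature, value in covering.items():
        result[feature] = overlay(value, result[feature]) if feature in result else value
    return result


# Slide 34: "What films are on TV tonight?" ... "I'd rather go to the movies."
background = {"type": "TVBroadcast", "time": "tonight", "channel": "ARD",
              "show": {"type": "Show", "genre": "film"}}
covering = {"type": "CinemaPerformance"}
result = overlay(covering, background)
# "tonight" and the film genre are inherited; "channel" does not fit a cinema visit.
```

Swapping the arguments gives a different result, which illustrates that overlay, unlike unification, is not commutative.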

36 Overlay Scoring
Four fundamental scoring parameters:
- number of features from the covering (co)
- number of features from the background (bg)
- number of type clashes (tc)
- number of conflicting atomic values (cv)
Codomain: [-1, 1]. A higher score indicates a better fit; a score of 1 means overlay(c, b) = unify(c, b).
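The slide lists the four parameters and the codomain but not the formula itself. The following minimal Python sketch assumes the normalized form score = (co + bg - (tc + cv)) / (co + bg + tc + cv). This form has the properties stated on the slide:
- it always lies in [-1, 1];
- it equals 1 exactly when there are no type clashes and no conflicting values, which is the case where overlay coincides with unification;
- it reaches -1 only when nothing is contributed and everything conflicts.

The function name `overlay_score` is hypothetical.

```python
def overlay_score(co: int, bg: int, tc: int, cv: int) -> float:
    """Normalized overlay score in [-1, 1] (assumed form, see the lead-in).

    co, bg: numbers of features contributed by the covering and the background;
    tc, cv: numbers of type clashes and conflicting atomic values.
    """
    counts = (co, bg, tc, cv)
    if any(n < 0 for n in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return (co + bg - (tc + cv)) / total
```

For example, overlay_score(2, 1, 1, 0) gives (3 - 1) / 4 = 0.5. Among several background candidates, the one with the highest score would then be selected.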

37 SmartKom's Presentation Planner
The presentation planner generates a presentation plan by applying a set of presentation strategies to the presentation goal.
Plan excerpt: GlobalPresent, PresentAddSmartakus, DoLayout, EvaluatePersonaNode, Inform, TryToPresentTVOverview, ShowTVOverview, SetLayoutData, PersonaAction, SendScreenCommand, …, GenerateText, …, Speak
Outputs: generation of the layout and of Smartakus actions
cf. J. Müller, P. Poller, V. Tschernomas 2002
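The following minimal Python sketch shows goal decomposition by presentation strategies. Each strategy pairs an applicability test with an ordered list of subgoals. The planner expands a goal with the first applicable strategy until only primitive actions remain. Several parts are assumptions:
- the node names are reused from the slide, but the decomposition table `STRATEGIES` and the `screen` context flag are hypothetical;
- the real SmartKom plan tree differs;
- the real planner also handles constraints and timing, which this sketch omits.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, bool]
# A strategy: (applicability test, ordered subgoals). Hypothetical decomposition.
Strategy = Tuple[Callable[[Context], bool], List[str]]

STRATEGIES: Dict[str, List[Strategy]] = {
    "GlobalPresent": [(lambda ctx: True, ["Inform"])],
    "Inform": [(lambda ctx: True, ["TryToPresentTVOverview"])],
    # Prefer a graphical overview; fall back to speech only when there is no screen.
    "TryToPresentTVOverview": [
        (lambda ctx: ctx.get("screen", False), ["DoLayout", "ShowTVOverview", "GenerateText"]),
        (lambda ctx: True, ["GenerateText"]),
    ],
    "DoLayout": [(lambda ctx: True, ["PresentAddSmartakus", "SetLayoutData"])],
    "ShowTVOverview": [(lambda ctx: True, ["SendScreenCommand", "PersonaAction"])],
    "GenerateText": [(lambda ctx: True, ["Speak"])],
}


def plan(goal: str, ctx: Context) -> List[str]:
    """Expand a goal into primitive actions, using the first applicable strategy."""
    strategies = STRATEGIES.get(goal)
    if strategies is None:
        return [goal]  # no strategy: the goal is a primitive action
    for applicable, subgoals in strategies:
        if applicable(ctx):
            actions: List[str] = []
            for sub in subgoals:
                actions.extend(plan(sub, ctx))
            return actions
    raise ValueError(f"no applicable strategy for {goal}")


with_screen = plan("GlobalPresent", {"screen": True})
speech_only = plan("GlobalPresent", {"screen": False})
```

With a screen, the plan yields layout, screen and persona actions followed by speech. Without a screen, it falls back to speech alone. Ordered alternatives like these are one simple way to make a presentation adapt to the available output devices.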

38 Adaptive Layout and Plan-Based Animation in SmartKom's Multimodal Presentation Generator

39 Salient Characteristics of SmartKom
- Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
- Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
- Context-sensitive interpretation of dialogue interaction on the basis of dynamic discourse and context models
- Adaptive generation of coordinated, cohesive and coherent multimodal presentations
- Semi- or fully automatic completion of user-delegated tasks through the integration of information services
- Intuitive personification of the system through a presentation agent

40 The Economic and Scientific Impact of SmartKom
Economic impact: 51 patents + 29 spin-off products
- 13 speech recognition
- 10 dialogue management
- 6 biometrics
- 3 video-based interaction
- 2 multimodal interfaces
- 2 emotion recognition
Scientific impact:
- 246 publications
- 117 keynotes / invited talks
- 66 master's and doctoral theses
- 27 new projects use results
- 5 tenured professors
- 10 TV features
- 81 press articles

41 An Example of Technology Transfer: The Virtual Mouse
The virtual mouse has been installed in a cell phone with a camera. When the user holds a normal pen about 30 cm in front of the camera, the system recognizes the tip of the pen as a mouse pointer. A red point then appears at the tip on the display.

42 Former Employees of DFKI and Researchers from the SmartKom Consortium Have Founded Five Start-up Companies
- Eyeled (www.eyeled.com)
- CoolMuseum GmbH (www.coolmuseum.de)
- Mineway GmbH (www.mineway.de)
- Sonicson GmbH (www.sonicson.com)
- Quadox AG (www.quadox.com)
Fields include location-aware mobile information systems, multimodal systems for music retrieval, and agent-based middleware.

43 SmartKom's Impact on International Standardization
SmartKom's Multimodal Markup Language M3L has had an impact on:
- ISO (TC37, SC4): Standard for Multimodal Content Representation Scheme
- W3C: Natural Language Semantics Markup Language (w3.org/TR/nl-spec)

44 SmartKom's Impact on Software Tools and Resources for Research on Multimodality
- MULTIPLATFORM software framework...; 15 sites all over Europe
- COMIC (EU, FP5): Conversational Multimodal Interaction with Computers
- Data: 1.6 terabytes, 448 Wizard-of-Oz (WOZ) sessions with audio transcripts and gesture and emotion labeling; BAS (Germany), ELRA (Europe), LDC (world)

45 Conclusions
- Various types of unification, overlay, constraint processing, planning and ontological inferences are the fundamental processes involved in SmartKom's modality fusion and fission components.
- The key function of modality fusion is the reduction of the overall uncertainty and the mutual disambiguation of the various analysis results based on a three-tiered representation of multimodal discourse.
- We have shown that a multimodal dialogue system must not only understand and represent the user's input, but also its own multimodal output.

