
1 Cyber Assist International Symposium 2001, Tokyo, March 6, 2001. Prof. Wolfgang Wahlster. SmartKom: Multimodal Dialogs with Mobile Web Users. German Research Center for Artificial Intelligence, DFKI GmbH, Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany; phone: (+49 681) 302-5252/4162; fax: (+49 681) 302-5341; e-mail: wahlster@dfki.de; WWW: http://www.dfki.de/~wahlster

2 © W. Wahlster Merging Various User Interface Paradigms: Natural Language Dialog, Graphical User Interfaces, and Gestural Interaction merge into Multimodal Interaction.

3 © W. Wahlster Code, Media and Modalities. CODE (systems of symbols): language, graphics, gesture, mimics (facial expressions). MEDIA (physical information carriers): input channels, output channels, storage (HD drive, CD-ROM). MODALITIES (human senses): visual, tactile, auditory, haptic. The diagram relates System and User.

4 © W. Wahlster SmartKom: Intuitive Multimodal Interaction. The SmartKom Consortium: Main Contractor DFKI Saarbrücken. Project Budget: € 25 M. Project Duration: 4 years. Partners and sites on the map: MediaInterface, European Media Lab, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen; Saarbrücken, Aachen, Dresden, Berkeley, Stuttgart, Munich, Heidelberg, Ulm.

5 © W. Wahlster SmartKom: A Transportable and Transmutable Interface Agent. Three scenarios: SmartKom-Home/Office: A Versatile Agent-based Interface; SmartKom-Public: A Multimodal Communication Booth; SmartKom-Mobile: A Handheld Communication Assistant. Kernel of SmartKom Interface Agent: Media Analysis, Interaction Management, Application Management, Media Design.

6 © W. Wahlster The Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998). Components: User(s); Input Processing; Media Analysis; Media Fusion; Interaction Management (Discourse Modeling, Intention Recognition, User Modeling, Presentation Design); Media Design; Output Rendering; Application Interface (Information, Applications, People). Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models. Input side: Language, Graphics, Gesture, Biometrics. Output side: Language, Graphics, Gesture, Animated Presentation Agent.

7 © W. Wahlster SmartKom-Mobile: A Handheld Communication Assistant. Components: Camera; GPS; Microphone; Loudspeaker; Stylus-Activated Sketch Pad; Wearable Compute Server; Docking Station for Car PC; Biosensor for Authentication & Emotional Feedback; GSM for Telephone, Fax, Internet Connectivity.

8 © W. Wahlster SmartKom-Public: A Multimodal Communication Booth. Components: Smartcard/Credit Card for authentication and billing; Docking station for PDA/Notebook/Camcorder; High-speed, broad-bandwidth Internet connectivity; High-resolution scanner; Loudspeaker; Room microphone; Face-tracking camera; Virtual touchscreen protected against vandalism; Multipoint video conferencing.

9 © W. Wahlster SmartKom-Home/Office: Versatile Agent-based Interface. Components: SpeechMike; Virtual Touchscreen; Natural Gesture Recognition.

10 © W. Wahlster Integration of Speech and Gesture. Advantages: For the sender: economic specification of referents (the description becomes shorter and may be underspecified). For the recipient: fast recognition of referents (speech processing and orientation in an intended direction are performed simultaneously). Speech and gesture input disambiguate each other. Disadvantages: Employing gestures leads to an increase of elliptic utterances, so speech analysis becomes more complex. Multiple pointing gestures in one utterance may lead to reference problems.
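The reference problem with several pointing gestures in one utterance can be pictured as a temporal alignment task. The following is a minimal sketch and not the method used in SmartKom or XTRA: the class names, timestamps, and referent identifiers are hypothetical, and the greedy nearest-in-time pairing is an assumption chosen for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Deictic:
    """A deictic word ("this", "here") with its time in the utterance (hypothetical)."""
    word: str
    time: float  # seconds from utterance start (assumed unit)


@dataclass(frozen=True)
class Pointing:
    """A pointing gesture with the referent it selected (hypothetical)."""
    referent: str
    time: float  # seconds from utterance start (assumed unit)


def align(deictics: List[Deictic], gestures: List[Pointing]) -> List[Tuple[str, Optional[str]]]:
    """Pair each deictic word, in temporal order, with the closest unused gesture.

    A deictic word with no gesture left is paired with None, which marks
    an unresolved reference.
    """
    remaining = sorted(gestures, key=lambda g: g.time)
    pairs: List[Tuple[str, Optional[str]]] = []
    for d in sorted(deictics, key=lambda d: d.time):
        if not remaining:
            pairs.append((d.word, None))
            continue
        best = min(remaining, key=lambda g: abs(g.time - d.time))
        remaining.remove(best)
        pairs.append((d.word, best.referent))
    return pairs


# Two deictic words and two pointing gestures in one utterance.
words = [Deictic("this", 0.4), Deictic("here", 0.9)]
points = [Pointing("unit_7", 0.5), Pointing("point_B", 1.0)]
pairs = align(words, points)
```

With one gesture missing, the second deictic word stays unresolved, which is the kind of reference problem the slide names.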

11 © W. Wahlster XTRA: Interpretation of pointing gestures (eXpert TRAnslator, Wahlster et al. 1986)

12 © W. Wahlster Multimodal Input and Output in the SmartKom System

13 © W. Wahlster Unification-based Media Fusion “MOVE THIS HERE” Source: Michael Johnston

14 © W. Wahlster Unification-based Media Fusion “MOVE THIS HERE” Source: Michael Johnston
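The idea behind unification-based fusion for "MOVE THIS HERE" can be sketched with plain dictionaries. This is a simplified, untyped illustration and not the implementation cited on the slide: the feature names, the `unit_7` identifier, and the coordinates are hypothetical. Speech contributes a command frame with open slots, gesture contributes the referents, and unification either merges them or fails on a conflict.

```python
from typing import Any


class UnificationFailure(Exception):
    """Raised when two feature structures carry conflicting values."""


def unify(a: Any, b: Any) -> Any:
    """Unify two feature structures represented as nested dicts.

    None marks an unfilled slot and unifies with anything.
    Atomic values unify only when they are equal.
    """
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, b_value in b.items():
            # A key missing from `a` behaves like an unfilled slot.
            merged[key] = unify(merged.get(key), b_value)
        return merged
    if a == b:
        return a
    raise UnificationFailure(f"{a!r} conflicts with {b!r}")


# Speech: the verb fixes the action and the expected types of both slots.
speech = {
    "action": "move",
    "object": {"type": "unit", "id": None},
    "destination": {"type": "point", "coords": None},
}

# Gesture: two pointing acts supply the concrete referents.
gesture = {
    "object": {"type": "unit", "id": "unit_7"},
    "destination": {"type": "point", "coords": (3, 4)},
}

fused = unify(speech, gesture)
```

If the first gesture had selected something of type `area` instead of `unit`, `unify` would raise `UnificationFailure`. That is how the two modalities constrain each other in this sketch.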

15 © W. Wahlster

16 Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Web Access. Mobile Dialog with a Virtual Tourist Guide for the Heidelberg Castle; Location-adaptive Query Interpretation.

17 © W. Wahlster Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Web Access. Multimodal Route Description; Mobile Speech Translation and Multilingual Information Access.

18 © W. Wahlster Augmented Reality: Combining Speech, Gestures and Graphics for Mobile Web Access. Speech-based Access to 3D Virtual Views; Multimodal Output from a Digital Library and Speech-based Access to Internet Content.

19 © W. Wahlster Multimodal Input and Output in SmartKom
          Input by the User    Output by the Presentation Agent
Speech            +                           +
Gesture           +                           +
Mimics            +                           +

20 © W. Wahlster Modality-Specific Representation Languages as an Intermediate Representation before Media Fusion. M3L based on XML. Speech Input: Parsing, Semantic Representation Language. Mimics: Mimics Analysis, Mimics Generation, Mimics Description Language. Gestures: Gesture Analysis, Gesture Generation, Gesture Description Language. Shared components: Ontologies; Knowledge Representation Language; Inference Component; DBMS/KBMS/WWW.

21 © W. Wahlster The SmartKom Control GUI

22 © W. Wahlster SmartKom's Data Collection of Multimodal Dialogs. Setup labels: User Side-view Camera Face-tracking Camera with Microphone Environmental Noise Microphone Array Screen Projected Webpage Face-tracking Camera Loudspeaker Microphone Array User Bird's-eye Camera LCD Beamer SIVIT-Camera

23 © W. Wahlster ANVIL: Multi-Track Annotation of Video and Language. Annotation Tool for Multimodal Interaction. Tracks: transliterated speech; rhetorical relations; theme-rheme; postures, gestures. http://www.dfki.de/~kipp/research/anvil.html...

24 © W. Wahlster Mobile Presentation Unit for SmartKom-Public: 2 Sony DSR-PD100AP Video Cameras; LCD-Beamer ASK C5; SIVIT Gesture Recognition Unit; Microphones (Microphone Array); Speakers; 3 Dual Pentiums III, 500

25 © W. Wahlster Combination of Speech and Gesture in SmartKom
User: Which feature films are shown tonight on TV?
SmartKom: I show you a survey of tonight's TV films.
User: I can't find anything interesting. Then I'll go to the movies.
SmartKom: Here you see a programme listing of the movies shown in Heidelberg today.
User: This one I would like to see. Where is it shown?
SmartKom: On this map all movie theatres are highlighted that are showing "A Little Christmas Story".

26 © W. Wahlster Three Levels of Mark-up Languages for the Web. WWW Document: Content (OIL/M3L), Structure (XML), Form (HTML). Content : Structure : Form = 1 : n : m

27 © W. Wahlster M3L Integrates Three Language Families. Frame Languages: Object-oriented Modelling Primitives. Concept Languages/Terminological Logics: Formal Semantics, Subsumption, Inferences. Web Languages: XML and RDF Syntax.

28 © W. Wahlster M3L Representation of the Multimodal Discourse Context. Blackboard with Presentation Context of the Previous Dialog Turn: [...] cinema_17a Europa 225 230 [...] 0.5542 0.1950 0.9892 0.7068 pid1234 [...]

29 © W. Wahlster M3L Representation of the Word Lattice Produced by the Speech Recognizer for “There [pointing gesture] I would like to get a reservation.” 2000-12-07T13:44:37.900Z shortPause [...] 5 7 gern 6.51343 PT0.57S PT0.84S 5 7 gerne 6.19579 PT0.57S PT0.84S [...]
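The lattice fragment above can be held in a small data structure. The sketch below uses only the values that survive on the slide. The field names are hypothetical, since the M3L element names are not preserved here, and reading the two `PT...S` values as begin and end offsets is an assumption. No claim is made about whether a higher or lower score is better.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


def parse_seconds(duration: str) -> float:
    """Parse a seconds-only ISO 8601 duration such as 'PT0.57S'."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(match.group(1))


@dataclass(frozen=True)
class Edge:
    """One word hypothesis between two lattice nodes (field names hypothetical)."""
    start_node: int
    end_node: int
    word: str
    score: float
    begin: float  # assumed: offset of the word start, in seconds
    end: float    # assumed: offset of the word end, in seconds


edges = [
    Edge(5, 7, "gern", 6.51343, parse_seconds("PT0.57S"), parse_seconds("PT0.84S")),
    Edge(5, 7, "gerne", 6.19579, parse_seconds("PT0.57S"), parse_seconds("PT0.84S")),
]


def competing(lattice: List[Edge]) -> Dict[Tuple[int, int], List[str]]:
    """Group word hypotheses that span the same pair of nodes."""
    groups: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for edge in lattice:
        groups[(edge.start_node, edge.end_node)].append(edge.word)
    return dict(groups)


alternatives = competing(edges)
```

Here "gern" and "gerne" span the same nodes 5 and 7, so they are competing hypotheses for the same stretch of speech that later analysis has to choose between.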

30 © W. Wahlster Gesture Recognition and Gesture Analysis. “There [pointing gesture] I would like to get a reservation.” Gesture Lattice as Result of Gesture Recognition: 2000-12-07T14:45:03.125 PT0.040S 2000-12-07T14:45:03.125 PT0.040S 0.872641 0.477261 tarrying dynamic. Result of Gesture Analysis: [...] tarrying dynStructId30 1 dynStructId28 2 [...] cinema_17a Europa 225 230 [...]

31 © W. Wahlster Language Analysis and Media Fusion. Turn 8: “There [pointing gesture] I would like to get a reservation.” [...] acoustic 60.95448 understanding 0.928571 reserve cinema_17a Europa [...] Callouts: Confidence in the Speech Recognition Result; Confidence in the Speech Understanding Result; Planning Act; Object Reference.

32 © W. Wahlster Result of the Action Planner: Presentation Tasks and Presentation Results. list add [...] 20:00 [...]

33 © W. Wahlster Input into the Language Generator: list Meine Braut, ihr Vater und ich Europa [...]

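34 © W. Wahlster Language Generation: [...] die Anfangszeiten [...] Generated output: "Auf der Übersicht sehen Sie die Anfangszeiten des Films Schmalspurganoven im Kino Europa" (English: "In the overview you see the start times of the film Schmalspurganoven at the Europa cinema").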

35 © W. Wahlster Output Synchronization: Speech, Gesture, Graphics, Animation. 11 declarative [...] eine 2.1539 2.2829 Übersicht 2.2829 3.2997 [...]
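The per-word timings on this slide suggest how a gesture or graphic could be lined up with speech output. The sketch below is an assumption-laden illustration, not SmartKom's synchronization mechanism. It treats the two numbers after each word as start and end times in seconds, and the `schedule_gesture` helper and its `lead` parameter are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class WordTiming(NamedTuple):
    """A synthesized word with its assumed start and end time in seconds."""
    word: str
    start: float
    end: float


# Values taken from the slide; the interpretation as seconds is assumed.
timings = [
    WordTiming("eine", 2.1539, 2.2829),
    WordTiming("Übersicht", 2.2829, 3.2997),
]


def schedule_gesture(
    words: List[WordTiming], anchor_word: str, lead: float = 0.0
) -> Tuple[float, float]:
    """Return the (start, end) window for a gesture that accompanies anchor_word.

    `lead` starts the gesture slightly before the word, clamped at time zero.
    """
    for timing in words:
        if timing.word == anchor_word:
            return (max(0.0, timing.start - lead), timing.end)
    raise KeyError(anchor_word)


# A pointing gesture at the overview while "Übersicht" is being spoken.
window = schedule_gesture(timings, "Übersicht")
```

A presentation agent could use such a window to begin its pointing movement when the word starts and hold it until the word ends.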
