The Architecture Dream Team, Schloss Dagstuhl, Germany, October 2001.

2 The Architecture Dream Team, Schloss Dagstuhl, Germany, October 2001

3 Page 2 Would you build your dream house without a blueprint?

4 Page 3 What you hope to get

5 Page 4 … what you might get

6 Page 5 Today’s Conventional Architecture
Layers: User(s); Presentation; Dialog Control; Application Interface; Information, Applications, People

7 Page 6 CHAMELEON Platform (IntelliMedia WorkBench), Paul McKevitt
Components: speech synthesizer, speech recognizer, laser pointer, blackboard, NL parser, microphone array, domain model, gesture recognizer, dialogue manager, frame semantics, Topsy

8 Page 7 Microsoft Derek Jacoby MIPAD Architecture A Typical DrWho App

9 Page 8 Harry Bunt
Components: Context, Input Interpretation, Output Synthesis, Context Management, Dialogue Management, API, Application, Pending Context
Context dimensions: linguistic, semantic, physical, perceptual, cognitive, social

10 Page 9 Art Exploration, Oliviero Stock
Inputs: explicit input (e.g., pointing), implicit input (e.g., movement)
Processing: input analyzer, composer engine, presentation
Models: physical space model, hypermedia information, visitor models, interaction history
Outputs: audio message to headphone; links and image to UI

11 Page 10 COLLAGEN Sidner et al.

12 Page 11 IBM’s Responsive Information Architect (RIA), Michelle Zhou
User input: speech, gesture
Components: Multimodal Interpreter, Conversational Facilitator, Presentation Broker, Media Producer, Visual Designer, Language Designer, IRIS Info Server
Models of: Design, Domain, User, Conversation, Environment

13 Page 12 Interact, Kristiina Jokinen
Components: Input Manager, Presentation Manager, Dialogue Manager, Task Agents/Acts, Dialogue Agents/Acts (e.g., Q, A, State), Information Storage, Database, ASR, Language Understanding, Topic Recognition, TTS, Generator Agents

14 Page 13 EMBASSI Conceptual Architecture
- Z-axis:
  - Underlying HW infrastructure
  - Software infrastructure (agent / distributed computing middleware)
  - Functional building blocks of the conceptual architecture (Multimodal Assistant Componentware, MAC)
  - Application-level assistants (not shown)
- XY-plane of MAC:
  - Dialogic assistance
  - Effectual assistance
  - Situational assistance
  - Explicit and implied generic (= application-independent) ontologies, defining component interfaces

15 Page 14 EMBASSI Architecture
Example utterance: "Ich will das auf dem da aufnehmen!" ("I want to record that on that one there!")
Levels: unimodal I/O devices; "lexical" level; multimodal data preparation; "syntactic" level; dialogue management; assistance methods; "semantic" level; strategy level; device control level; execution components; device infrastructure
Components:
- I1 GUI input, I2 speech recognition, I3 gesture recognition, I4 gaze direction recognition
- O1 audio output, O2 display, O3 avatar renderer
- F1 GUI analysis, F2 speech analysis, F2 device selection
- PMI (media fusion), PMO (presentation)
- R1 avatar controller, R2 GUI renderer, R3 text generation
- D dialogue manager
- A_n assistant
- X1 EMBASSI control, X2 tuner control, X3 EPG control, X4 VCR control, X5 display control, ..., X_i
- G1 VCR, G2 set-top box, G3 display, ..., G_i
- S1 biometrics, S2 environment sensor
- Context manager
- Databases: environment/situation DB, user DB, application DB, resource DB

16 Page 15 SMARTKOM Wolfgang Wahlster

17 Page 16

18 Page 17 DARPA Galaxy Communicator
Servers connected to the Hub: Language Generation, Text-to-Speech Conversion, Audio Server, Dialogue Management, Application Backend, Context Tracking, Frame Construction, Speech Recognition
The Galaxy Communicator Software Infrastructure (GCSI) is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems.
Open source and documentation available at [URL missing] and [URL missing].
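To make the hub-and-spoke idea concrete, here is a minimal sketch in Python. It is not the GCSI API: the `Hub` class, the fixed `route` list, and the three canned servers are all hypothetical, and frames are stood in for by plain dictionaries. The only point it illustrates is that servers never call each other directly; every message goes through the hub.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a plain dict stands in for a Communicator frame.
Frame = Dict[str, object]
Handler = Callable[[Frame], Optional[Frame]]


class Hub:
    """Toy hub (hypothetical): routes frames to named servers and logs each hop."""

    def __init__(self) -> None:
        self._servers: Dict[str, Handler] = {}
        self.log: List[str] = []

    def register(self, name: str, handler: Handler) -> None:
        self._servers[name] = handler

    def send(self, server: str, frame: Frame) -> Optional[Frame]:
        # Every message passes through here, so the hub sees all traffic.
        self.log.append(server)
        return self._servers[server](frame)

    def run(self, route: List[str], frame: Frame) -> Frame:
        """Pass the frame through the servers in order, merging each reply."""
        for server in route:
            reply = self.send(server, frame)
            if reply:
                frame = {**frame, **reply}
        return frame


# Canned stand-ins for three of the servers named on the slide.
def speech_recognition(frame: Frame) -> Frame:
    return {"text": "flights to boston"}


def frame_construction(frame: Frame) -> Frame:
    return {"destination": str(frame["text"]).split()[-1]}


def dialogue_management(frame: Frame) -> Frame:
    return {"reply_text": f"Looking up flights to {frame['destination']}."}


hub = Hub()
hub.register("speech_recognition", speech_recognition)
hub.register("frame_construction", frame_construction)
hub.register("dialogue_management", dialogue_management)

result = hub.run(
    ["speech_recognition", "frame_construction", "dialogue_management"],
    {"audio": b"..."},
)
print(result["reply_text"])
```

Because all traffic crosses the hub, a server can be swapped for another implementation without touching the others, which is the property the slide attributes to the hub-and-spoke design.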

19 Page 18 An Example: Communicator-Compliant Emergency Management Interface
Servers connected to the Hub:
- MITRE I/O podium: displays input and output text
- MIT phone connectivity: connects audio to a telephone line
- Speech recognition: converts speech to text (MIT SUMMIT engine and wrapper)
- Frame construction: extracts information from input text (Colorado Phoenix engine, MITRE wrapper)
- MITRE dialogue management: tracks information, decides what to do, and formulates answers
- MITRE SQL generation: converts abstract requests to SQL
- Database (open source PostGres engine, MITRE wrapper)
- Text-to-speech: converts output text to audio (CMU Festival engine, Colorado wrapper)

20 Page 19 Communicator Protocol
- All communication is in terms of objects, which bear a message type and an object type
- Messages are encoded in XDR (public domain data representation)
Message types: broker object, broker start, broker end, new message, message reply, error reply, destroy reply, postponement, disconnect
Object types: string, integer, float, frame, list, integer array (8, 16, 32, 64 bits), float array (32, 64 bits)
Diagram labels: message type, broker connection, message size, object type, object data
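As a rough illustration of what XDR encoding of a typed object looks like, the sketch below packs a message as message type, message size, object type, and object data. The XDR rules it uses are standard (4-byte big-endian integers; strings as a length word followed by bytes padded to a multiple of four). Everything else is an assumption: the numeric codes, the exact header order, and the function names are hypothetical and are not taken from the GCSI wire format.

```python
import struct

# Hypothetical numeric codes: the slide names the types but not their wire values.
MESSAGE_TYPES = {"new message": 0, "message reply": 1, "error reply": 2}
OBJECT_TYPES = {"string": 0, "integer": 1}


def xdr_uint(value: int) -> bytes:
    """XDR unsigned integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_int(value: int) -> bytes:
    """XDR signed integer: 4 bytes, big-endian, two's complement."""
    return struct.pack(">i", value)


def xdr_string(text: str) -> bytes:
    """XDR string: length word, then bytes zero-padded to a multiple of 4."""
    data = text.encode("ascii")
    padding = (-len(data)) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def encode_message(message_type: str, object_type: str, value) -> bytes:
    """Assumed layout: message type, message size, object type, object data."""
    if object_type == "string":
        payload = xdr_string(value)
    elif object_type == "integer":
        payload = xdr_int(value)
    else:
        raise ValueError(f"unsupported object type: {object_type}")
    header = (
        xdr_uint(MESSAGE_TYPES[message_type])
        + xdr_uint(len(payload))
        + xdr_uint(OBJECT_TYPES[object_type])
    )
    return header + payload


def decode_message(data: bytes):
    """Inverse of encode_message for the two supported object types."""
    message_code, size, object_code = struct.unpack(">III", data[:12])
    payload = data[12:12 + size]
    if object_code == OBJECT_TYPES["string"]:
        (length,) = struct.unpack(">I", payload[:4])
        value = payload[4:4 + length].decode("ascii")
    else:
        (value,) = struct.unpack(">i", payload)
    return message_code, object_code, value


wire = encode_message("new message", "string", "travel")
print(wire.hex())
print(decode_message(wire))
```

The remaining object types on the slide (float, frame, list, arrays) would need their own encoders; they are left out to keep the sketch short.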

21 Page 20 Frames and Messages
- A frame is an attribute-value structure consisting of a name, a frame type (always a clause), and a collection of pairs of keys and associated typed values (string, integer, float, frame, list, etc.)
- Frames can be constructed using API calls or parsed from a string representation
  {c main :utterance_id 0 :domain "travel" }
  Here c is the frame type, main is the name, :utterance_id and :domain are keys, 0 is an integer value, and "travel" is a string value.
- A message is a frame passed between the Hub and a server
  - A message can be new (initiating an action) or a reply
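The string representation on this slide is simple enough to parse by hand. The following Python sketch reads a frame of the form {type name :key value ...} into a small data class and prints it back. It is an illustration only, not the Communicator library's parser: the `Frame` class and the function names are hypothetical, only integer, float, string, and nested-frame values are handled, and the inline syntax assumed for nested frames is a guess, since the slide shows no nested example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Value = Union[int, float, str, "Frame"]


@dataclass
class Frame:
    frame_type: str                      # "c" (clause) in the slide's example
    name: str                            # "main" in the slide's example
    pairs: Dict[str, Value] = field(default_factory=dict)


# Tokens: braces, double-quoted strings, or runs of other non-space characters.
TOKEN = re.compile(r'\s*(\{|\}|"(?:[^"\\]|\\.)*"|[^\s{}]+)')


def tokenize(text: str) -> List[str]:
    return TOKEN.findall(text)


def parse_value(tokens: List[str], pos: int) -> Tuple[Value, int]:
    token = tokens[pos]
    if token == "{":                     # assumed syntax for a nested frame
        return parse_frame(tokens, pos)
    if token.startswith('"'):
        return token[1:-1], pos + 1
    try:
        return int(token), pos + 1
    except ValueError:
        return float(token), pos + 1


def parse_frame(tokens: List[str], pos: int = 0) -> Tuple[Frame, int]:
    """Parse one frame starting at tokens[pos]; return it and the next position."""
    if tokens[pos] != "{":
        raise ValueError("a frame must start with '{'")
    frame = Frame(frame_type=tokens[pos + 1], name=tokens[pos + 2])
    pos += 3
    while tokens[pos] != "}":
        key = tokens[pos]
        if not key.startswith(":"):
            raise ValueError(f"expected a :key, got {key!r}")
        value, pos = parse_value(tokens, pos + 1)
        frame.pairs[key] = value
    return frame, pos + 1


def format_frame(frame: Frame) -> str:
    """Render a frame back into the string representation."""
    parts = [f"{{{frame.frame_type} {frame.name}"]
    for key, value in frame.pairs.items():
        if isinstance(value, Frame):
            rendered = format_frame(value)
        elif isinstance(value, str):
            rendered = f'"{value}"'
        else:
            rendered = str(value)
        parts.append(f"{key} {rendered}")
    return " ".join(parts) + " }"


source = '{c main :utterance_id 0 :domain "travel" }'
frame, _ = parse_frame(tokenize(source))
print(frame)
print(format_frame(frame))
```

Keeping the colon in the stored key names means a frame can be written back out without any extra bookkeeping.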

22 Page 21 Definitions
- Abstract Architecture
  - Components, connections (protocols), and constraints (IEEE definition)
  - Data/knowledge structures, data flow and protocols, control flow
  - Consider use cases, e.g.,
    - In-car navigation system
    - Desktop, kiosk, mobile device interaction
    - Media conversion

23 Page 22 Requirements
- Functional
  - Modality integration (input and output)
  - Situation-appropriate (user, task, application) real-time sensing/response (e.g., supporting barge-in, perceptual sensing/feedback)
  - Representation of level of granularity (modules and data structures)
  - Manage feedback, local and global: when and where?
  - Support incremental processing
  - Support incremental development (and scalability)
- System/Technical
  - Support for processing/fusing multimodal input (e.g., parallel processing)
  - Modular, composable (possibly distributed processing)
  - Efficient implementation
  - Time scale, temporal and spatial resolution
  - Accessible (even partial) data structures
  - Open and extensible protocols

24 Page 23 Components
- Media/mode Analysis
  - Multimodal fusion
  - Mutual disambiguation and reference resolution
- Media/mode Design
  - Content selection, media design, allocation, coordination, layout
- Discourse Management
  - Attention management
  - Selects dialogue act/interpretation
  - Error handling
- Context Management
  - Physical/spatial, temporal state
- User Modeling
  - Capabilities, beliefs, intentions
  - User ID
- Knowledge sources, states, histories available to all processes
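One simple way to picture multimodal fusion with reference resolution is to bind each deictic word in the speech stream to the pointing gesture closest to it in time. The Python sketch below does exactly that and nothing more. It is one possible technique, not a description of any system in this deck: the `Word` and `Gesture` types, the deictic word list, the one-second window, and the example data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    time: float      # seconds since the start of the utterance


@dataclass
class Gesture:
    target: str      # object the pointing gesture landed on
    time: float


# Assumption: a tiny fixed set of deictic words.
DEICTICS = {"this", "that", "there"}


def resolve_references(
    words: List[Word], gestures: List[Gesture], window: float = 1.0
) -> List[Tuple[str, Optional[str]]]:
    """Bind each deictic word to the closest-in-time unused gesture.

    A gesture further than `window` seconds from the word is ignored,
    and each gesture is consumed by at most one word.
    """
    unused = list(gestures)
    resolved: List[Tuple[str, Optional[str]]] = []
    for word in words:
        referent = None
        if word.text in DEICTICS and unused:
            best = min(unused, key=lambda g: abs(g.time - word.time))
            if abs(best.time - word.time) <= window:
                referent = best.target
                unused.remove(best)
        resolved.append((word.text, referent))
    return resolved


# Hypothetical input: "record that on that" with two pointing gestures.
words = [Word("record", 0.0), Word("that", 0.4), Word("on", 0.8), Word("that", 1.2)]
gestures = [Gesture("programme_listing", 0.5), Gesture("vcr", 1.3)]

for text, referent in resolve_references(words, gestures):
    print(text, "->", referent)
```

Mutual disambiguation in the fuller sense would also let the gesture evidence revise uncertain speech hypotheses and vice versa; this sketch only covers the speech-to-gesture direction.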

25 Architecture of the SmartKom Agent (cf. Maybury/Wahlster 1998)
- User(s); Information, Applications, People
- Presentation, Dialog Control, Application Interface
- Media Input Processing; Media Output Rendering
- Media/Mode Analysis: Language, Graphics, Gesture, Biometrics
- Processing: Media Fusion, Interaction Management, Intention Recognition, Discourse Modeling, User Modeling, Presentation Design
- Media/Mode Design: Language, Graphics, Gesture, Animated Presentation Agent
- Representation and Inference: User Model, Discourse Model, Domain Model, Task Model, Media Models

26 Architecture
- User(s); Information, Applications, People
- Media Input Processing; Media Output Rendering
- Media/Mode Analysis: Language, Graphics, Gesture, Sound
- Media/Mode Design: Language, Graphics, Gesture, Sound, Animated Presentation Agent
- Design steps: Select Content, Design, Allocate, Coordinate, Layout
- Processes: User Modeling, Discourse Management, Intention Recognition, Interaction Management, Context Management, Lexicon Management, User ID, Biometrics, Mode Coordination, Presentation Design, Multimodal Reference Resolution, Multimodal Fusion, Reference Resolution, Action Planning
- Application Interface: Integrate, Respond, Request, Terminate, Initiate
- Representation and Inference, States and Histories: User Model, Discourse Model, Domain Model, Media Models, Task Model, Application Models, Context Model
- Modality labels: T A V G G (analysis), A A V G G (design)

27 The Architecture Dream Team, Schloss Dagstuhl, Germany, October 2001

28 Page 27 Media Fusion
Media/Mode Analysis: Spoken Language, Lip Reading, Gesture
Modality labels: S, V, V

29 Page 28 COLLAGEN, Sidner et al.
Diagram labels: speech interpretation, planning and discourse, agent, application, user, speech, window events, student model, Mel, ViaVoice
