
KTH speech platform
Generic framework
– for building demonstrators
– for research
– built mostly on in-house components
Two major components
– Atlas – speech-technology platform
– SesaME – generic dialogue manager

KTH multimodal dialogue systems: Waxholm, Olga, Gulan, August, AdApt

The Waxholm system
[Figure: Waxholm architecture with speech input and output, ASR, NLP, dialogue management, lexicon, databases, graphics, TTS & multimodal agent, and a "Wizard of Oz" component.]

Common features
– built on in-house components
– under continuous development
– limited reuse of software resources
During development:
– expert knowledge is required
– highly labor-intensive

Atlas

Flat model
[Figure: the application/dialog engine connects directly to each component: ASR, ASV, TTS, desktop audio, audio device, audio coder, animated agent, SQL database.]

Single-layer model
[Figure: as in the flat model, but the application/dialog engine reaches ASR, ASV, TTS, the audio devices, the animated agent and the SQL database through a layer of component APIs.]

Multi-layer model (1)
[Figure: a speech-tech API layer is added between the application/dialog engine and the component APIs.]

Multi-layer model (2)
[Figure: the speech-tech API is divided into layers, from bottom to top: component interaction, services, high-level primitives, dialog components. The component APIs sit below and the application/dialog engine on top.]

Components
[Figure: the component APIs bridge to ASR engines of different kinds: a (J)SAPI ASR, a Broker/CORBA ASR and a Communicator ASR. The bridges are labeled "pseudo ASR" and "ASR stub", and one link is marked "?".]
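The bridging idea can be read as an adapter pattern. Every engine is wrapped so that the application sees one component API, whatever the engine's native interface. The Python sketch below shows this under stated assumptions: `ASR`, `PseudoASR` and the fake backends are hypothetical names, not the Atlas classes. A real backend would be a local engine, a CORBA stub or a Communicator connection.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ASR(ABC):
    """Hypothetical component API that every recognizer must expose."""

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Return the best text hypothesis for an utterance."""


class PseudoASR(ASR):
    """Adapter that wraps an engine with a foreign interface.

    `backend` stands in for a local engine, a CORBA stub or a
    Communicator connection; here it is any callable bytes -> str.
    """

    def __init__(self, name: str, backend: Callable[[bytes], str]) -> None:
        self.name = name
        self._backend = backend

    def recognize(self, audio: bytes) -> str:
        # Translate the component-API call into the backend's own call.
        return self._backend(audio)


def fake_local_engine(audio: bytes) -> str:
    return "local:" + str(len(audio))


def fake_remote_engine(audio: bytes) -> str:
    # A real remote engine would marshal `audio` over the network here.
    return "remote:" + str(len(audio))


engines = [PseudoASR("internal", fake_local_engine),
           PseudoASR("broker", fake_remote_engine)]
results = [e.recognize(b"\x00\x01\x02") for e in engines]
```

The application code only ever calls `recognize`, so engines can be swapped without touching the dialogue logic.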

Middleware levels (1)
Component interaction
– resource handling (create, monitor, allocate, ...)
– media streams (connect, disconnect, split)
– representing information (text hypotheses, syntactic and semantic info, speaker info, ...)
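The three concerns of the component-interaction level are sketched below in Python. They are a resource pool (create, allocate, monitor), a media stream that can be connected to and split between several sinks, and a hypothesis record. All class and field names are hypothetical illustrations, not the Atlas `atlas.rc` API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    """Information passed between components (fields are illustrative)."""
    text: str
    score: float
    semantics: Dict[str, str] = field(default_factory=dict)
    speaker: Optional[str] = None


class ResourcePool:
    """Create, monitor and allocate named resources (e.g. ASR engines)."""

    def __init__(self) -> None:
        self._owner: Dict[str, Optional[str]] = {}

    def create(self, name: str) -> None:
        self._owner[name] = None

    def allocate(self, name: str, session: str) -> bool:
        # Succeeds only if the resource exists and is currently free.
        if name in self._owner and self._owner[name] is None:
            self._owner[name] = session
            return True
        return False

    def release(self, name: str) -> None:
        if name in self._owner:
            self._owner[name] = None

    def status(self) -> Dict[str, Optional[str]]:
        # Monitoring: which session (if any) holds each resource.
        return dict(self._owner)


class MediaStream:
    """A source whose data can be connected to, and split between, sinks."""

    def __init__(self) -> None:
        self.sinks: List[list] = []

    def connect(self, sink: list) -> None:
        self.sinks.append(sink)

    def disconnect(self, sink: list) -> None:
        # Compare by identity: two empty sinks are equal but distinct.
        self.sinks = [s for s in self.sinks if s is not sink]

    def push(self, chunk: bytes) -> None:
        # Splitting: every connected sink receives the chunk.
        for sink in self.sinks:
            sink.append(chunk)
```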

Middleware levels (2)
Services
– resource access
– play: load and send media data, make the media device(s) render it, log the action
– say: TTS, send media data to the media device(s), make the media device(s) render it, log the action
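The `play` and `say` services differ only in where the media data comes from: a stored resource or TTS. The minimal Python sketch below follows the listed steps. `MediaDevice`, `SpeechServices` and the injected `tts`/`load` callables are hypothetical stand-ins for real components.

```python
from typing import Callable, List


class MediaDevice:
    """Hypothetical output device that buffers data and 'renders' it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.buffer: List[bytes] = []
        self.rendered: List[bytes] = []

    def send(self, data: bytes) -> None:
        self.buffer.append(data)

    def render(self) -> None:
        self.rendered.extend(self.buffer)
        self.buffer.clear()


class SpeechServices:
    def __init__(self, devices: List[MediaDevice],
                 tts: Callable[[str], bytes],
                 load: Callable[[str], bytes]) -> None:
        self.devices = devices
        self.tts = tts      # text -> audio; stands in for a TTS component
        self.load = load    # resource id -> audio; stands in for file access
        self.log: List[str] = []

    def _output(self, data: bytes) -> None:
        # Send the media data to every device, then make each render it.
        for d in self.devices:
            d.send(data)
        for d in self.devices:
            d.render()

    def play(self, resource: str) -> None:
        self._output(self.load(resource))
        self.log.append("play " + resource)

    def say(self, text: str) -> None:
        self._output(self.tts(text))
        self.log.append("say " + text)
```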

Middleware levels (3)
Services
– listen: engage media processors (ASR, ASV, parser, ...); make the media device record data; detect the utterance; send data in the right format to processor(s), file(s) and other objects; make the processors work; wait for the processors to finish; fuse the results and deliver the "answer"; log actions and results
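The `listen` steps map onto a record, endpoint, fan-out and join pattern, sketched here in Python with a thread pool. The endpointing rule (an amplitude threshold) and the fusion rule (keep each processor's label if its confidence is high enough) are assumptions for illustration. The slide does not say how Atlas detects utterances or fuses results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical processor signature: utterance samples -> (label, confidence).
Processor = Callable[[List[int]], Tuple[str, float]]


def detect_utterance(samples: List[int], threshold: int = 10) -> List[int]:
    """Crude endpointing: keep samples from the first to the last loud one."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    return samples[loud[0]:loud[-1] + 1] if loud else []


def listen(record: Callable[[], List[int]],
           processors: Dict[str, Processor],
           log: List[str],
           min_conf: float = 0.5) -> Dict[str, str]:
    utterance = detect_utterance(record())        # record, detect utterance
    with ThreadPoolExecutor() as pool:            # make processors work
        futures = {name: pool.submit(proc, utterance)
                   for name, proc in processors.items()}
        results = {name: f.result() for name, f in futures.items()}  # wait
    # Assumed fusion rule: keep each processor's label if confident enough.
    answer = {name: label for name, (label, conf) in results.items()
              if conf >= min_conf}
    log.append("listen " + repr(results))
    return answer
```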

Middleware levels (4)
High-level primitives
– ask: 'say' the prompt, 'listen' to the answer, give the caller full access to the processors and their results, log actions and results
– askSimple: same as ask, but returns the fused results only
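The two primitives can be sketched as thin compositions of the `say` and `listen` services. In this hypothetical Python version, `ask_simple` mirrors `askSimple`. Fusion is assumed to mean "return the highest-confidence hypothesis". `SpeechPrimitives` is an illustrative name, not the Atlas `SpeechActs` class.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed result format: all hypotheses from all processors, (text, confidence).
Hypotheses = List[Tuple[str, float]]


@dataclass
class SpeechPrimitives:
    say: Callable[[str], None]          # output service (e.g. TTS)
    listen: Callable[[], Hypotheses]    # input service with full results
    log: List[str] = field(default_factory=list)

    def ask(self, prompt: str) -> Hypotheses:
        """'say' the prompt, 'listen' to the answer, return all results."""
        self.say(prompt)
        results = self.listen()
        self.log.append("ask %r -> %r" % (prompt, results))
        return results

    def ask_simple(self, prompt: str) -> str:
        """Like ask, but return only the fused result (here: best hypothesis)."""
        results = self.ask(prompt)
        if not results:
            return ""
        return max(results, key=lambda h: h[1])[0]
```

Keeping `ask` as the general form lets dialog components inspect every processor result, while simple callers use `ask_simple`.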

Middleware levels (5)
Dialog components
– user interaction for a special purpose
– have domain knowledge
– error handling/recovery: no answer; invalid amount, account, etc.; re-ask with formulation variation
– can provide help
– database lookup
– cf. Nuance "SpeechObjects", Philips "Speech Blocks", ...
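A dialog component bundles domain knowledge with error recovery. The hypothetical Python sketch below asks for an account name. It re-asks on no answer or an invalid account, varies the formulation on each attempt and lists the valid choices when the user asks for help. The account list stands in for a database lookup. The class name, prompts and the "help" keyword are all assumptions.

```python
from typing import Callable, List, Optional


class AccountQuestion:
    """Hypothetical dialog component that obtains a valid account name."""

    PROMPTS = ["Which account?",
               "Please name the account, for example 'savings'.",
               "Say the name of one of your accounts."]

    def __init__(self, ask: Callable[[str], Optional[str]],
                 accounts: List[str], max_tries: int = 3) -> None:
        self.ask = ask              # high-level primitive: prompt -> answer
        self.accounts = accounts    # stands in for a database lookup
        self.max_tries = max_tries

    def run(self) -> Optional[str]:
        prefix = ""
        for attempt in range(self.max_tries):
            # Formulation variation: a different wording on each re-ask.
            prompt = prefix + self.PROMPTS[min(attempt, len(self.PROMPTS) - 1)]
            prefix = ""
            answer = self.ask(prompt)
            if answer == "help":
                # Help: list the valid choices before re-asking.
                prefix = "Your accounts are " + ", ".join(self.accounts) + ". "
            elif answer is not None and answer in self.accounts:
                return answer       # valid: the lookup succeeded
            # No answer (None) or an invalid account: re-ask.
        return None
```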

Middleware levels (6)
Dialog components (cont.)
– login procedure: one or more operations (steps); each step produces or validates speaker hypotheses; the procedure returns a speaker hypothesis with a status; includes database lookup, etc.
– enrollment procedure: a special case of the login procedure; the enrollment operation is iterative when asking for data
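The login procedure reads as a pipeline of steps, each taking and returning a speaker hypothesis. Enrollment adds an iterative data-collection loop. This Python sketch is illustrative only. The status values, the early stop on rejection and the sample-count criterion are assumptions, not the Atlas `atlas.login` design.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpeakerHypothesis:
    speaker: Optional[str]
    status: str            # assumed values: "accepted", "rejected", "unknown"


# A step takes the current hypothesis and returns a new or updated one.
Step = Callable[[SpeakerHypothesis], SpeakerHypothesis]


def login(steps: List[Step]) -> SpeakerHypothesis:
    """Run the steps in order; stop early once a step rejects the speaker."""
    hyp = SpeakerHypothesis(None, "unknown")
    for step in steps:
        hyp = step(hyp)
        if hyp.status == "rejected":
            break
    return hyp


def enroll(ask_sample: Callable[[int], Optional[bytes]],
           needed: int, max_tries: int = 10) -> List[bytes]:
    """Enrollment: iteratively ask until enough usable samples are collected."""
    samples: List[bytes] = []
    tries = 0
    while len(samples) < needed and tries < max_tries:
        tries += 1
        sample = ask_sample(len(samples))
        if sample:                 # skip missing or empty recordings
            samples.append(sample)
    return samples
```

A typical chain would be a step that claims an identity (ASR plus database lookup) followed by a step that verifies it (ASV).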

Middleware levels (7)
Dialog components (cont.)
– "complex question", e.g. in CTT-bank: money amount, account name, yes/no

ATLAS (layers from bottom to top, with their packages)
– component APIs [atlas.rc.api]
– component interaction [atlas.rc / media / rc.audio / uinfo]
– services [atlas.app.SpeechIO / rc.api.AppResources]
– high-level primitives [atlas.app.SpeechActs]
– dialog components [atlas.login, ...]
– speech-tech API
– application, dialog engine (atlas.app)
Also shown: [atlas.internal.rc], [atlas.broker.rc], [atlas.communicator.rc]

Core packages (ATLAS)
atlas.basic, atlas.uinfo, atlas.media, atlas.rc, atlas.rc.audio, atlas.rc.api, atlas.app, atlas.terminal

System model
[Figure: Terminals 1 to N, Application, Session, Resources.]

Project packages
atlas.*, atlas.internal.*, atlas.broker.*, cttbank.*, per.*, broker.*

[Figure: ATLAS as the common platform. Speech-technology components (ASR, ASV, TTS, desktop audio, audio device, audio coder, animated agent, SQL database) sit behind the component APIs and the speech-tech API, with the CTT-bank and PER applications on top. Open question: generic dialogue management?]

SesaME

SesaME – the playground
Focus on simple task-oriented dialogues
– accessing information (personal, public)
– controlling appliances & services
Hypothesis: task-oriented dialogues can be described in a formalised way

Common platform
[Figure: speech-technology components (ASR, ASV, TTS, audio devices and coder, animated agent, SQL database) at the bottom, ATLAS with its speech-tech API above them, the generic dialogue manager SesaME driven by dialogue descriptions, and the application/service platform on top.]

SesaME – goals
Platform for research & demonstrators
Dialogue management
– task-oriented
– generic, dynamic
– asynchronous
Support for
– multi-domain approach
– adaptation & personalisation
– user modeling
– situation awareness

SesaME features
– dynamic plug & play dialogues
– modular, agent-based architecture
– information state approach
– event-based dialogue management
– domain descriptions based on extended VoiceXML descriptions

Major components
Interaction manager (IM)
– controls the information flow
– interaction management with the system components and the user
Dialog engine (DE)
– dialogue interpretation
Application interface (AI)
– application-specific component
– communication with the application/service

On start
– the AI collects all available Dialogue Descriptions (DD), represented in an extended VoiceXML formalism: seminar.vxml, meeting.vxml, curs.vxml, visitor.vxml
– the IM builds a register over the available DDs, the Dialogue Description Collection (DDC): a vector is built of topics and associated keywords, e.g. "seminarium" (seminar), "möte" (meeting), "besök" (visit), ...
– the IM controls the activation of the DDs
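Building the DDC amounts to scanning each description for its topic keywords and indexing keyword to description. The slides do not say where the keywords live inside the extended VoiceXML. This Python sketch assumes a standard VoiceXML `<meta name="keywords" content="...">` element, and the tiny documents are hypothetical stand-ins for three of the four files named above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical minimal descriptions; real SesaME files use extended VoiceXML.
DESCRIPTIONS = {
    "seminar.vxml": """<vxml version="2.0">
        <meta name="keywords" content="seminarium"/>
        <form id="seminar"/></vxml>""",
    "meeting.vxml": """<vxml version="2.0">
        <meta name="keywords" content="möte"/>
        <form id="meeting"/></vxml>""",
    "visitor.vxml": """<vxml version="2.0">
        <meta name="keywords" content="besök"/>
        <form id="visitor"/></vxml>""",
}


def build_ddc(descriptions: Dict[str, str]) -> Dict[str, List[str]]:
    """Register keyword -> names of the dialogue descriptions declaring it."""
    ddc: Dict[str, List[str]] = {}
    for name, source in descriptions.items():
        root = ET.fromstring(source)
        for el in root.iter():
            # Assumption: topic keywords sit in a <meta name="keywords"> tag;
            # endswith() also matches namespaced tags like {ns}meta.
            if el.tag.endswith("meta") and el.get("name") == "keywords":
                for kw in el.get("content", "").split():
                    ddc.setdefault(kw, []).append(name)
    return ddc
```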

New utterance: "Jag vill gå på Mats Blombergs seminarium." ("I want to go to Mats Blomberg's seminar.")
– prediction of the most plausible DD through topic prediction: "seminarium" (seminar); other mechanisms are planned (context, user models)
– the DE activates the chosen DD: seminar.vxml; internal data structures are created
– the DE performs the dialogue interpretation
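Keyword-based topic prediction can be sketched as a vote: each word of the utterance that appears in the DDC votes for the descriptions registered under it. This Python version is an illustration under assumptions. The DDC literal is hypothetical, and ties go to the first DD counted. As the slide notes, context and user models would refine the choice.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical keyword register (DDC) as built at start-up.
DDC: Dict[str, List[str]] = {
    "seminarium": ["seminar.vxml"],
    "möte": ["meeting.vxml"],
    "besök": ["visitor.vxml"],
}


def predict_dd(utterance: str, ddc: Dict[str, List[str]]) -> Optional[str]:
    """Return the DD whose keywords occur most often, or None if none match."""
    words = re.findall(r"\w+", utterance.lower())   # \w is Unicode-aware
    votes: Counter = Counter()
    for word in words:
        for dd in ddc.get(word, []):
            votes[dd] += 1
    if not votes:
        return None    # no topic found; other mechanisms would be needed
    return votes.most_common(1)[0][0]
```

For the utterance above, only "seminarium" matches, so `seminar.vxml` is chosen.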

Interaction Manager
– controls and synchronises the components
– priority structures
– topic prediction: predicts which DD to use
– supervises the DE: may suggest plausible parameters based on the context & user models
– supervises the interaction with the user: error detection and management, deadline management, etc.

Interaction Manager – How?
Event-based autonomous modules (software agents)
– carry out one atomic task each
– are triggered by a set of preconditions
– high level of parallelism, concurrency, cooperation
Centralised information management: blackboard
– all information is available to all modules
– information is not destroyed
– information handling through a subscribe – notify – fetch mechanism
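The blackboard and its subscribe, notify and fetch cycle can be sketched as follows. The store is append-only, subscribers are told only that a key changed, and agents fetch the data themselves once their preconditions hold. For simplicity this Python sketch runs agents synchronously inside `post`, whereas the described system runs them in parallel. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class Blackboard:
    """Append-only store: entries are never destroyed, only added."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, object]] = []
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, key: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(key, []).append(callback)

    def post(self, key: str, value: object) -> None:
        self._entries.append((key, value))
        # Notify: subscribers learn *that* something changed, not what.
        for callback in self._subscribers.get(key, []):
            callback(key)

    def fetch(self, key: str) -> List[object]:
        # Fetch: all values ever posted under the key, oldest first.
        return [v for k, v in self._entries if k == key]


class Agent:
    """Performs one atomic task once all its precondition keys are present."""

    def __init__(self, board: Blackboard, needs: List[str],
                 task: Callable[[Blackboard], None]) -> None:
        self.board, self.needs, self.task = board, needs, task
        for key in needs:
            board.subscribe(key, self.on_notify)

    def on_notify(self, _key: str) -> None:
        if all(self.board.fetch(k) for k in self.needs):
            self.task(self.board)
```

For example, a keyword-handler agent can watch for utterances and post a topic, and an activator agent can watch for topics and post the DD to activate. Neither agent needs to know about the other.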

[Figure: SesaME architecture. Components: ATLAS Speech Technology API, blackboard, Interaction Manager, Dialogue Engine, A-Agent, keyword handler, dialogue bridge, dialog interpreter, VoiceXML activator (JAXB translator), Dialogue description collection (plug & play VoiceXML dialogues), Application Interface. Modules communicate via "notify".]

Dialogue Engine
Internal parallel slot structures
– system prompt, acceptable answers, reprompts, etc.
Parallel system slots
– used for predictions, available for UM, CM
Parallel application-specific slots
– related information available for DKM
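One slot and the parallel structures around it might be represented as follows in Python. A task slot carries its prompt, an acceptance test and reprompts. The system and application slots are kept as plain dictionaries beside the task slots. Field names and the reprompt rotation are hypothetical choices for illustration, not the SesaME data model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Slot:
    """One slot of a dialogue description (field names are illustrative)."""
    name: str
    prompt: str                           # system prompt
    accepts: Callable[[str], bool]        # acceptable answers
    reprompts: List[str] = field(default_factory=list)
    value: Optional[str] = None

    def prompt_for(self, attempt: int) -> str:
        """First attempt uses the prompt; later ones cycle through reprompts."""
        if attempt == 0 or not self.reprompts:
            return self.prompt
        return self.reprompts[(attempt - 1) % len(self.reprompts)]

    def fill(self, answer: str) -> bool:
        if self.accepts(answer):
            self.value = answer
            return True
        return False


@dataclass
class DialogueState:
    """Task slots plus the parallel system and application slot structures."""
    task: List[Slot]
    system: dict = field(default_factory=dict)       # e.g. predictions
    application: dict = field(default_factory=dict)  # related information
```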

Interpretation
Go to the next empty slot:
– ask the prompt
– interpret the answer
– fill the slot ... or re-prompt
If all slots are filled: successful transaction
– the AI sends the required parameters and commands to the application
– a possible next DD is activated
Unsuccessful transaction:
– the DD with all its parameters is saved
– a specific DD for error management is activated (error management)
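The interpretation loop is a form-filling algorithm: visit each empty slot, ask, validate, fill or re-prompt, then branch on success. In this minimal Python sketch, the slot tuple format, the reprompt limit and the callbacks standing in for the AI and the error-management DD are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical slot: (name, prompt, validator).
SlotSpec = Tuple[str, str, Callable[[str], bool]]


def interpret(slots: List[SlotSpec],
              ask: Callable[[str], Optional[str]],
              max_reprompts: int = 2) -> Tuple[bool, Dict[str, str]]:
    """Fill slots in order; report success and the parameters collected."""
    filled: Dict[str, str] = {}
    for name, prompt, valid in slots:            # go to next empty slot
        for _ in range(1 + max_reprompts):       # ask, then re-prompt
            answer = ask(prompt)
            if answer is not None and valid(answer):
                filled[name] = answer            # fill the slot
                break
        else:
            return False, filled                 # unsuccessful transaction
    return True, filled                          # all slots filled


def run_dd(slots: List[SlotSpec],
           ask: Callable[[str], Optional[str]],
           send_to_app: Callable[[Dict[str, str]], None],
           saved: List[Dict[str, str]],
           on_error: Callable[[], None]) -> bool:
    ok, params = interpret(slots, ask)
    if ok:
        send_to_app(params)      # AI passes the parameters to the application
    else:
        saved.append(params)     # save the DD's (partial) parameters
        on_error()               # activate the error-management DD
    return ok
```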

What is left to be done?
– NLP analysis to be integrated in Atlas and SesaME
– NLP generation in SesaME
– a more elaborate dialogue management formalism in SesaME
– support for adaptation and personalisation
– enabling conversational dialogues

The End