KTH speech platform


1 KTH speech platform
Generic framework
– for building demonstrators
– for research
– built mostly on in-house components
Two major components
– Atlas: speech-technology platform
– SesaME: generic dialogue manager

2 KTH multimodal dialogue systems
Waxholm, Olga, Gulan, August, AdApt

3 The Waxholm system
[Diagram. Labels: NLP, DIALOGUE MANAGEMENT, GRAPHICS, ASR, TTS & MULTIMODAL AGENT, “WIZARD OF OZ”, IN, OUT, LEXICON, DATABASES, SPEECH]

4 Common features
built on in-house components
– under continuous development
limited reuse of software resources
during development:
– expert knowledge is required
– highly labor intensive

5 Atlas

6 Flat model
[Diagram. Labels: application, dialog engine; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

7 Single-layer model
[Diagram. Labels: application, dialog engine; component APIs; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

8 Multi-layer model (1)
[Diagram. Labels: application, dialog engine; speech-tech API; component APIs; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

9 Multi-layer model (2)
[Diagram. Labels: application, dialog engine; speech-tech API; dialog components; high-level primitives; services; component interaction; component APIs; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

10 Components
[Diagram. Labels: component APIs, bridge, (J)SAPI, ASR, pseudo ASR, ASR stub, Broker, CORBA, Communicator, ?]

11 Middleware levels (1)
Component interaction
– resource handling (create, monitor, allocate, ...)
– media streams (connect, disconnect, split)
– representing information (text hypotheses, syntactic and semantic info, speaker info, ...)

12 Middleware levels (2)
Services
– resource access
– play
    load and send media data
    make media device(s) render it
    log the action
– say
    TTS
    send media data to media device(s)
    make media device(s) render it
    log the action

13 Middleware levels (3)
Services
– listen
    engage media processors (ASR, ASV, parser, …)
    make media device record data
    detect utterance
    send data in the right format to processor(s), file(s), and other objects
    make processors work
    wait for processors to finish
    fuse results and deliver the “answer”
    log actions and results
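
The steps of "listen" can be sketched as follows. This is a hypothetical Python illustration, not Atlas code: the function name `listen`, its parameters, and the use of a thread pool to run the processors side by side are all assumptions made for the example. Utterance detection is folded into the `record` callable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def listen(record: Callable[[], bytes],
           processors: Dict[str, Callable[[bytes], str]],
           fuse: Callable[[Dict[str, str]], str],
           log: List[str]) -> str:
    """Record one utterance, run every processor on it, fuse and log the results."""
    audio = record()                      # media device records (and detects) an utterance
    log.append(f"recorded {len(audio)} bytes")
    with ThreadPoolExecutor() as pool:
        # Send the data to all processors and make them work concurrently.
        futures = {name: pool.submit(proc, audio) for name, proc in processors.items()}
        # Wait for every processor to finish.
        results = {name: fut.result() for name, fut in futures.items()}
    answer = fuse(results)                # fuse results into one "answer"
    log.append(f"results={results} answer={answer}")
    return answer


# Stand-in components (hypothetical): a fake recorder, a fake ASR and a fake ASV.
log: List[str] = []
answer = listen(
    record=lambda: b"\x00" * 320,
    processors={"asr": lambda audio: "500 kronor", "asv": lambda audio: "anna"},
    fuse=lambda r: f"{r['asv']}: {r['asr']}",
    log=log,
)
```

The caller only sees the fused answer here; the log keeps the per-processor results.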

14 Middleware levels (4)
High-level primitives
– ask
    ‘say’ prompt
    ‘listen’ to answer
    give caller full access to processors and their results
    log actions and results
– askSimple
    same as ask, but returns fused results only
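
A minimal sketch of how "ask" and "askSimple" could be layered on the "say" and "listen" services. The class below is a hypothetical Python stand-in, not the real atlas.app.SpeechActs; `ListenResult` and all field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ListenResult:
    """Everything one 'listen' call produced (hypothetical structure)."""
    processor_results: Dict[str, str]   # raw output per processor, e.g. {"asr": "..."}
    fused: str                          # the single fused "answer"


@dataclass
class SpeechActs:
    """Toy stand-in for the high-level primitives layer."""
    say: Callable[[str], None]          # service: render a prompt
    listen: Callable[[], ListenResult]  # service: record, process, fuse
    log: List[str] = field(default_factory=list)

    def ask(self, prompt: str) -> ListenResult:
        # 'say' the prompt, then 'listen' to the answer, logging both steps.
        self.say(prompt)
        self.log.append(f"say: {prompt}")
        result = self.listen()
        self.log.append(f"listen: {result.fused}")
        return result                   # caller gets all processor results

    def ask_simple(self, prompt: str) -> str:
        # Same as ask, but only the fused result is returned.
        return self.ask(prompt).fused


spoken: List[str] = []
acts = SpeechActs(
    say=spoken.append,
    listen=lambda: ListenResult({"asr": "five hundred", "asv": "speaker-7"}, "500"),
)
```

Calling `acts.ask_simple("How much?")` returns only the fused string, while `acts.ask(...)` exposes the per-processor results as well.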

15 Middleware levels (5)
Dialog components
– user interaction for a special purpose
– has domain knowledge
– error handling/recovery
    no answer
    invalid amount, account, etc.
    re-ask, formulation variation
– can provide help
– database lookup
– cf. Nuance “SpeechObjects”, Philips “Speech Blocks”, ...
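
The error-handling behaviour of a dialog component (no answer, invalid value, re-ask with a varied formulation) might look like the sketch below. `ask_amount` and its validation rule are hypothetical; the only thing taken from the slides is the recovery pattern itself.

```python
from typing import Callable, List, Optional


def ask_amount(ask_simple: Callable[[str], Optional[str]],
               prompts: List[str],
               max_attempts: int = 3) -> Optional[int]:
    """Ask for a money amount; re-ask with varied wording on silence or bad input."""
    for attempt in range(max_attempts):
        # Formulation variation: move to the next wording, reuse the last one if we run out.
        prompt = prompts[min(attempt, len(prompts) - 1)]
        answer = ask_simple(prompt)
        if answer is None:
            continue                      # no answer: re-ask
        if answer.isdigit() and int(answer) > 0:
            return int(answer)            # valid amount
        # invalid amount: fall through and re-ask
    return None                           # give up; the caller decides what happens next


# Simulated user: silent first, then an invalid answer, then a valid one.
answers = iter([None, "lots", "250"])
asked: List[str] = []


def fake_ask(prompt: str) -> Optional[str]:
    asked.append(prompt)
    return next(answers)


amount = ask_amount(fake_ask, ["How much?", "Please say the amount in kronor."])
```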

16 Middleware levels (6)
Dialog components (cont.)
– login procedure
    one or more operations (steps)
    each step produces or validates speaker hypotheses
    procedure returns a speaker hypothesis with status
    includes database lookup, etc.
– enrollment procedure
    special case of login procedure
    enrollment operation is iterative when asking for data
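
One way to picture the login procedure is as a chain of steps that each take the current speaker hypothesis and return an updated one. The sketch below is hypothetical: the status values, the `ACCOUNTS` dictionary standing in for the database, and the trivial "voice check" are all assumptions, not the atlas.login implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class SpeakerHypothesis:
    speaker_id: Optional[str]
    status: str                 # assumed values: "unknown", "claimed", "verified", "rejected"


Step = Callable[[SpeakerHypothesis], SpeakerHypothesis]

ACCOUNTS: Dict[str, str] = {"1234": "anna"}   # stand-in for the database lookup


def claim_identity(account_number: str) -> Step:
    """Step that produces a hypothesis from a spoken account number."""
    def step(hyp: SpeakerHypothesis) -> SpeakerHypothesis:
        name = ACCOUNTS.get(account_number)
        return SpeakerHypothesis(name, "claimed" if name else "rejected")
    return step


def verify_voice(accepted_speaker: str) -> Step:
    """Step that validates a hypothesis; a real system would use an ASV score here."""
    def step(hyp: SpeakerHypothesis) -> SpeakerHypothesis:
        ok = hyp.speaker_id == accepted_speaker
        return SpeakerHypothesis(hyp.speaker_id, "verified" if ok else "rejected")
    return step


def login(steps: List[Step]) -> SpeakerHypothesis:
    """Run the steps in order and return the final hypothesis with its status."""
    hyp = SpeakerHypothesis(None, "unknown")
    for step in steps:
        hyp = step(hyp)
        if hyp.status == "rejected":
            break                # no point in continuing after a rejection
    return hyp


result = login([claim_identity("1234"), verify_voice("anna")])
```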

17 Middleware levels (7)
Dialog components (cont.)
– “complex question”:
– in CTT-bank
    money amount
    account name
    yes/no

18 ATLAS
application, dialog engine (atlas.app)
speech-tech API
– dialog comp. [atlas.login, ..]
– high-level prim. [atlas.app.SpeechActs]
– services [atlas.app.SpeechIO / rc.api.AppResources]
– component interaction [atlas.rc / media / rc.audio / uinfo]
– component APIs [atlas.rc.api]
    [atlas.internal.rc] [atlas.broker.rc] [atlas.communicator.rc]

19 Core packages
ATLAS: atlas.basic, atlas.uinfo, atlas.media, atlas.rc, atlas.rc.audio, atlas.rc.api, atlas.app, atlas.terminal

20 System model
[Diagram. Labels: Terminal 1, Terminal 2, Terminal N; Application; Session; Resources]

21 Project packages
atlas.*, atlas.internal.*, atlas.broker.*, cttbank.*, per.*, broker.*

22 Common platform
[Diagram. Labels: CTT-bank, PER; Generic dialogue management?; ATLAS; speech-tech API; component APIs; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

23 SesaME

24 SesaME – the playground
focus on simple task-oriented dialogues
– accessing information (personal, public)
– controlling appliances & services
hypothesis: task-oriented dialogues can be described in a formalised way

25 Common platform
[Diagram. Labels: Application / Service platform; dialogue descriptions; Generic dialogue manager - SesaME; ATLAS; speech-tech API; ASR, TTS, ASV, audio device, desktop audio, audio coder, animated agent, SQL database]

26 SesaME – goals
platform for research & demonstrators
dialogue management
– task oriented
– generic, dynamic
– asynchronous
support for
– multi-domain approach
– adaptations & personalisation
– user modeling
– situation awareness

27 SesaME features:
– dynamic plug & play dialogues
– modular, agent-based architecture
– information state approach
– event-based dialogue management
– domain descriptions are based on extended VoiceXML descriptions

28

29 Major components
Interaction manager (IM)
– controls the information flow
– interaction management with
    system components
    user
Dialog engine (DE)
– dialogue interpretation
Application interface (AI)
– application-specific component
– communication with the application/service

30 On start
AI collects all available Dialogue Descriptions (DD)
Dialogue Descriptions are represented in an extended VoiceXML formalism
– seminar.vxml, meeting.vxml, curs.vxml, visitor.vxml
IM builds a register of the available DDs
– the Dialogue Description Collection (DDC)
– a vector is built from topics and associated keywords
– ”seminarium” (seminar), ”möte” (meeting), ”besök” (visit), ...
IM controls the activation of the DDs

31 New utterance
”Jag vill gå på Mats Blombergs seminarium.” (“I want to attend Mats Blomberg's seminar.”)
Prediction of the most plausible DD
– through topic prediction: ”seminarium”
– other mechanisms are planned (context, user models)
DE activates the chosen DD
– seminar.vxml
– internal data structures are created
DE performs the dialogue interpretation
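
Keyword-based topic prediction over the Dialogue Description Collection could be sketched as below. This is a hypothetical Python illustration: the pairing of ”möte” with meeting.vxml and ”besök” with visitor.vxml is an assumption (the slides only confirm ”seminarium” and seminar.vxml), and simple keyword counting stands in for whatever scoring SesaME actually uses.

```python
from typing import Dict, Optional, Set

# Hypothetical DDC: each dialogue description with its topic keywords.
DDC: Dict[str, Set[str]] = {
    "seminar.vxml": {"seminarium"},
    "meeting.vxml": {"möte"},
    "visitor.vxml": {"besök"},
}


def predict_dd(utterance: str, ddc: Dict[str, Set[str]]) -> Optional[str]:
    """Return the dialogue description whose keywords best match the utterance."""
    # Lower-case the words and strip surrounding punctuation.
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    best, best_hits = None, 0
    for dd, keywords in ddc.items():
        hits = len(words & keywords)      # number of keywords found in the utterance
        if hits > best_hits:
            best, best_hits = dd, hits
    return best                           # None when no keyword matched


chosen = predict_dd("Jag vill gå på Mats Blombergs seminarium.", DDC)
```

With the example utterance from the slide, `chosen` is `"seminar.vxml"`; an utterance without any known keyword yields `None`, which is where context or user models would have to take over.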

32 Interaction Manager
controls and synchronises the components
– priority structures
topic prediction
– predicts which DD to use
supervises the DE
– may suggest plausible parameters based on the context & user models
supervises the interaction with the user
– error detection, management
– deadline management
– etc.

33 Interaction Manager – How?
event-based
autonomous modules (software agents)
– carry out one atomic task each
– are triggered by a set of preconditions
– high level of parallelism
– concurrency
– cooperation
centralised information management: blackboard
– all information is available to all modules
– information is not destroyed
– information handling through a subscribe – notify – fetch mechanism
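
The subscribe, notify, fetch cycle on a blackboard can be illustrated with a small sketch. Everything here is hypothetical (class name, method names, the key strings), and for brevity the agents run synchronously in one thread, whereas the slide describes parallel, concurrent modules.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Blackboard:
    """Append-only shared store: entries are never overwritten or deleted."""

    def __init__(self) -> None:
        self._entries: DefaultDict[str, List[Any]] = defaultdict(list)
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, key: str, callback: Callable[[str], None]) -> None:
        self._subscribers[key].append(callback)

    def post(self, key: str, value: Any) -> None:
        self._entries[key].append(value)          # information is not destroyed
        for callback in list(self._subscribers[key]):
            callback(key)                         # notify with the key only, no payload

    def fetch(self, key: str) -> List[Any]:
        return list(self._entries[key])           # copy, so callers cannot alter history


board = Blackboard()


def keyword_agent(key: str) -> None:
    """One atomic task: look at the newest utterance and post a topic if one is found."""
    latest = board.fetch(key)[-1]
    if "seminarium" in latest.lower():
        board.post("topic", "seminar.vxml")


board.subscribe("utterance", keyword_agent)
board.post("utterance", "Jag vill gå på Mats Blombergs seminarium.")
```

Because agents are only told that something changed and must fetch the data themselves, any module can read any entry, which matches the "all information is available to all modules" property.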

34 [Architecture diagram. Labels: ATLAS Speech Technology API; Blackboard; Interaction Manager (A-Agent, Keyword handler, VoiceXML notify, Dialogue bridge); Dialogue Engine (Dialog interpreter, VoiceXML activator (JAXB translator)); Dialogue description collection; Plug & play dialogues; Application Interface]

35 Dialogue Engine
Internal parallel slot structures
– system prompt
– acceptable answers
– reprompts
– etc.
Parallel system slots
– used for predictions, available for UM, CM
Parallel application-specific slots
– related information available for DKM

36 Interpretation
go to next empty slot
– ask the prompt
– interpret the answer
– fill the slot … or re-prompt
if all slots are filled: successful transaction
– AI sends the required parameters and commands to the application
– a possible next DD is activated
unsuccessful transaction
– the DD with all parameters is saved
– a specific DD for error management is activated
– error management
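
The slot-filling loop above can be sketched roughly as follows. The `Slot` fields mirror the slot structure from the Dialogue Engine slide (prompt, acceptable answers, reprompt); the function name, the re-prompt limit and the sample slots are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Slot:
    name: str
    prompt: str
    acceptable: Set[str]
    reprompt: str
    value: Optional[str] = None


def interpret(slots: List[Slot],
              ask_simple: Callable[[str], str],
              max_reprompts: int = 2) -> Optional[Dict[str, str]]:
    """Fill every empty slot in order; return the parameters, or None on failure."""
    for slot in slots:
        if slot.value is not None:
            continue                              # already filled, go to next empty slot
        answer = ask_simple(slot.prompt)          # ask the prompt
        tries = 0
        while answer not in slot.acceptable and tries < max_reprompts:
            answer = ask_simple(slot.reprompt)    # re-prompt
            tries += 1
        if answer not in slot.acceptable:
            return None                           # unsuccessful transaction
        slot.value = answer                       # fill the slot
    # All slots filled: these are the parameters handed on to the application.
    return {slot.name: slot.value for slot in slots}


# Simulated user: one wrong answer for the second slot, then a valid one.
answers = iter(["blomberg", "someday", "friday"])
asked: List[str] = []


def fake_ask(prompt: str) -> str:
    asked.append(prompt)
    return next(answers)


seminar_slots = [
    Slot("speaker", "Which speaker?", {"blomberg"}, "Sorry, which speaker?"),
    Slot("day", "Which day?", {"friday"}, "Sorry, which day?"),
]
params = interpret(seminar_slots, fake_ask)
```

A `None` return is the point where the slides' error-management DD would be activated; that hand-over is left out of the sketch.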

37 What is left to be done?
NLP analysis to be integrated in Atlas and SesaME
NLP generation in SesaME
more elaborate dialogue management formalism in SesaME
support for adaptation and personalisation
enabling conversational dialogues

38 The End

