Published by Lillie Abbot. Modified over 9 years ago.
1
MIDOS Architecture Meeting Aaron Adler March 2, 2009
2
MIDOS: Multimodal Interactive DialOgue System Architecture and code overview ~45k lines of Java code in the package ~6k lines of C# code Feel free to ask questions
3
Outline Quick motivation High level architecture More details Code layout
4
Multimodal Interaction
5
Motivation Many unspecified parameters – Large cognitive effort to specify all parameters Multimodal interaction – Symmetric Dynamic dialogue – Driven by qualitative physics
6
Motivation Sketching is great for communicating designs – Some things are hard to sketch Use speech – Some details are uncertain Have a conversation Multimodal dialogue – speech and sketching – makes the computer more of a partner
7
User Study 1: Multimodal Device Descriptions Empirical study of informal speech Key results – Disfluencies – Pauses -> topic change – Talking and sketching about the same topics
8
User Study 2: Human Multimodal Dialogue Experimenter, participant Speech, sketching recording, shared sketching surface Key results: – Pen color: refer back, new topic, real world – Disfluent speech – Concurrent speech/sketching about same topics – Simple questions
9
Project Sketch
10
Color Correspondence
11
Color Paths
12
Sketch Modification
13
Architecture
C#: Sketch input, Sketch output, Speech recognition, Speech synthesis
Passed between C# and Java: Commands (open, save, etc.), Sketching (himetric units), Speech
Java: Physics calculations, Question selection, Speech / Sketch generation, File opening / saving
14
Information Request Flow
[Diagram: Physics generates Information Requests. One Information Request is selected and a Question is generated from it. The system asks the Question and the user gives a Reply. The ink that was used in the question and reply is erased. The Reply becomes a Statement, which leads to Velocity Updates.]
15
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output
16
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output Analyze shapes: Motion paths, Conflicting motion, Spring directions
17
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output Analyze physics: Collisions, Missing information, Under-specification
18
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input 1) Pick request 2) Form question 3) Ask the question
19
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Sketch Output Speech Output 1) User reply 2) Compare to expectations 3) Determine top score 4) Update physics
20
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output
21
Physics Simulator Qualitative physics – Don’t have exact numbers, do have angles Modest – Won’t be a complete simulation Generate sensible questions – If something is ambiguous can ask the user Goal is to generate dialogues with the user
22
Simulation Method Two possible approaches: – Calculation approach: calculate how far things move directly – Step approach: use small, incremental movements Using the calculation approach
23
Motion Objects can have translational motion OR rotational motion, but not both Translational motion – Velocity angle, or set of angles Rotational motion – Clockwise, counter clockwise, none, unknown Stores velocity and acceleration for each – Acceleration merged with velocity at each time step – Gravity is just a downward acceleration
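The slide does not say how a direction-only acceleration is merged into a direction-only velocity. The following is a minimal sketch of one possibility, assuming directions are angles in degrees measured counterclockwise from the +x axis: a body at rest takes the acceleration's direction, and a moving body's new direction lies somewhere on the arc between its old velocity and the acceleration, because no magnitudes are known. The class and method names are hypothetical and are not taken from the MIDOS code.

```java
import java.util.OptionalDouble;

public class QualitativeMotionSketch {
    /** Gravity as a direction only: straight down (assumed angle convention). */
    static final double GRAVITY = 270.0;

    /**
     * Merges a direction-only acceleration into a direction-only velocity.
     * Returns {from, to}: the arc of possible new directions, swept
     * counterclockwise from 'from' to 'to'. Angles are assumed to be in [0, 360).
     * An empty velocity means the body is at rest.
     */
    static double[] merge(OptionalDouble velocity, double acceleration) {
        if (!velocity.isPresent()) {
            // At rest: the body starts moving along the acceleration.
            return new double[] {acceleration, acceleration};
        }
        double v = velocity.getAsDouble();
        // Signed shortest rotation from velocity to acceleration, in [-180, 180).
        double diff = ((acceleration - v) % 360 + 540) % 360 - 180;
        // Exactly opposite directions are ambiguous; this sketch simply
        // returns one of the two half-turns in that case.
        return diff >= 0
                ? new double[] {v, acceleration}
                : new double[] {acceleration, v};
    }

    public static void main(String[] args) {
        double[] falling = merge(OptionalDouble.empty(), GRAVITY);
        double[] thrown = merge(OptionalDouble.of(0.0), GRAVITY);
        System.out.println(falling[0] + " to " + falling[1]);
        System.out.println(thrown[0] + " to " + thrown[1]);
    }
}
```

An arc with more than one direction in it is the kind of ambiguity the system could resolve by asking the user.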
24
Torque Possibilities ?? Counterclockwise Clockwise Ambiguous – Ask the user
25
Shapes All the objects are turned into polygons – Used for the collision detection and calculations Even ellipses are approximated as polygons Uses polylines for ropes, springs Seems like a reasonable approximation Reduces the number of special cases
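As an illustration of approximating an ellipse by a polygon, here is a minimal sketch that samples evenly spaced parameter values around an axis-aligned ellipse. The vertex count and the names are assumptions made for this example; the slides do not say how MIDOS chooses its vertices.

```java
public class EllipsePolygonSketch {
    /**
     * Returns n vertices {x, y} on an axis-aligned ellipse centered at
     * (cx, cy) with radii rx and ry, in counterclockwise order.
     */
    static double[][] approximate(double cx, double cy, double rx, double ry, int n) {
        double[][] vertices = new double[n][2];
        for (int i = 0; i < n; i++) {
            double t = 2 * Math.PI * i / n; // evenly spaced parameter values
            vertices[i][0] = cx + rx * Math.cos(t);
            vertices[i][1] = cy + ry * Math.sin(t);
        }
        return vertices;
    }

    public static void main(String[] args) {
        double[][] polygon = approximate(0, 0, 40, 20, 16);
        System.out.println(polygon.length + " vertices, first at ("
                + polygon[0][0] + ", " + polygon[0][1] + ")");
    }
}
```

Once every shape is a polygon, collision detection only has to handle polygon-polygon and polygon-polyline cases, which is the reduction in special cases the slide mentions.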
26
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output
27
Generating Questions Physics engine determines what information is unknown or incomplete – Information Requests are generated Rank and pick the Information Request to process Generate a question from the Information Request – Generate speech and sketching – Questions have expected speech and sketch responses
28
What Question to Ask? Question applicability governed by: – Current arrangement/state of the shapes – Facts that are currently known – Previous questions (to follow up on conflicts, etc.) Some questions are dependent on facts from answer to previous question – If we know that a collision occurs – Then we can ask about which block hits the other one (the specifics of the collision)
29
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input
30
Multimodal Output Coordinate speech and sketching outputs – AT&T Natural Voices speech synthesizer – Computer generated strokes Simple language to generate output timing – Allow strokes to be drawn during a particular part of the speech utterance
31
Multimodal Language Examples
What direction does (this shape) rotate in? [Draw one stroke]
Which of {these directions} does this shape move in? [Draw one or more strokes]
32
Multimodal Language Examples
33
(These two) (bodies) collide (here.) Where on (this) body does the contact occur? Pause Clear all the strokes
34
Multimodal Output Coordinate speech and sketching outputs – Complex to do by hand – Generate human-like sketch output No low-level access to the speech synthesizer Calculate word and phrase timings ahead of time Language to generate output timing
35
Multimodal Output Language (): match one stroke {}: match one or more strokes automatic: adjust for number agreement Allows the computer to pause where needed to keep speech and sketching synchronized
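The markup can be read mechanically. The sketch below splits an annotated utterance into plain-text spans, `()` spans (one stroke), and `{}` spans (one or more strokes). It is an illustration only: the class names are hypothetical, nested brackets are not handled, and the number-agreement step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutputMarkupSketch {
    enum Kind { TEXT, ONE_STROKE, ONE_OR_MORE_STROKES }

    static final class Segment {
        final Kind kind;
        final String words;
        Segment(Kind kind, String words) {
            this.kind = kind;
            this.words = words;
        }
    }

    // Group 1 captures the words inside (...), group 2 the words inside {...}.
    private static final Pattern SPAN =
            Pattern.compile("\\(([^)]*)\\)|\\{([^}]*)\\}");

    static List<Segment> parse(String utterance) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SPAN.matcher(utterance);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                segments.add(new Segment(Kind.TEXT, utterance.substring(last, m.start())));
            }
            if (m.group(1) != null) {
                segments.add(new Segment(Kind.ONE_STROKE, m.group(1)));
            } else {
                segments.add(new Segment(Kind.ONE_OR_MORE_STROKES, m.group(2)));
            }
            last = m.end();
        }
        if (last < utterance.length()) {
            segments.add(new Segment(Kind.TEXT, utterance.substring(last)));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : parse("Which of {these directions} does (this shape) move in?")) {
            System.out.println(s.kind + ": \"" + s.words + "\"");
        }
    }
}
```

Each stroke-bearing segment marks the part of the utterance during which its strokes should be drawn.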
36
Multimodal Output How to time the outgoing speech and sketching Previously used hand coded timings Too hard to do with complicated or dynamic output
37
Multimodal Output No low-level access to the speech synthesizer – Don’t know when it’s done speaking Calculate word and phrase timings ahead of time – Can’t even just add word timing together Build a little language to help
38
Underestimate or Overestimate Speech Times? Best to underestimate the word timing – why? – If you overestimate, possibly start speaking again too soon – If you underestimate, speech won’t overlap May try to start talking sooner, but it can’t output more than one thing at once
39
Overestimating Speech Estimated Speech Actual Speech Sketching
40
Underestimating Speech Estimated Speech Sketching Actual Speech Estimated Speech
41
Speech Synthesizer AT&T Natural Voices Synthesizer – Easy to integrate – Sounds reasonable Java and C# interfaces – Using C# interface Has documentation and samples
42
Sketch Output Timed display of strokes and speech – Should allow strokes to be drawn during a particular part of the speech utterance Computer-generated strokes – Can be drawn in a specified duration – Circle objects – Points moved so that strokes appear more like human-drawn strokes
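A minimal sketch of a computer-generated circling stroke follows. It assumes each point carries a timestamp spread evenly over the requested duration and that a small random offset is enough to make the stroke look less mechanical. The names, the uniform jitter model, and the fixed seed are assumptions for illustration, not the MIDOS stroke generator.

```java
import java.util.Random;

public class GeneratedStrokeSketch {
    /**
     * Builds a closed circular stroke of n points (n >= 2) around (cx, cy).
     * Each point is {x, y, tMillis}. Timestamps run from 0 to durationMs, and
     * each coordinate is moved by at most 'jitter' units.
     */
    static double[][] circleStroke(double cx, double cy, double radius, int n,
                                   double durationMs, double jitter, long seed) {
        Random rng = new Random(seed);
        double[][] points = new double[n][3];
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * i / (n - 1); // last point closes the circle
            points[i][0] = cx + radius * Math.cos(angle) + (rng.nextDouble() * 2 - 1) * jitter;
            points[i][1] = cy + radius * Math.sin(angle) + (rng.nextDouble() * 2 - 1) * jitter;
            points[i][2] = durationMs * i / (n - 1);
        }
        return points;
    }

    public static void main(String[] args) {
        double[][] stroke = circleStroke(100, 100, 30, 20, 1000, 1.5, 42L);
        System.out.println(stroke.length + " points over "
                + stroke[stroke.length - 1][2] + " ms");
    }
}
```

Playing the points back at their timestamps draws the circle over the requested duration, so the stroke can be fitted to a particular part of the spoken utterance.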
43
System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Sketch Output Speech Output
44
Multimodal Input Sketching Input – N-best interpretations (line, arc, polyline, etc.) – Use context to determine type – Compare to expected stroke Speech Input – Microsoft speech recognizer in dictation mode – N-best results list – Compare to expected speech
45
Processing Steps
1. Ask the user a question
2. Match and score the user speech against expected speech
3. Match and score the user sketching against the expected sketching
4. Pick the best scoring combination
5. Evaluate the best scoring combination
   1. If successful, continue
   2. If not, ask a follow-up question, return to first step
6. Generate statements about the new information and update the system state
46
Speech Recognizer Tried various recognizers Microsoft recognizer proves easiest to get running – Reasonable results – N-best list of results – Easy API from C#
47
Sketch Input Using C# allows the capture of pen pressure data – C# handles the rendering of the strokes Passing the data to Java allows the reuse of code – Saving to XML – Existing sketch recognition code (Simple Classifier)
48
Scoring the Reply, Speech
Uses n-best speech results – Decreasing score as you move down the n-best list and decreasing score as you match less of the expected speech
Expected: "Yes", "Yeah"
N-best result    Rank score    Score after matching
Yes it is        100           100
yes it is        90            90
this is          80            0
as it is         70            0
U.S. is          60            0
yes is           50            50
U.S. news        40            0
U.S. in his      30            0
yes in his       20            20
as in his        10            0
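The numbers in this example can be reproduced with a short sketch. It assumes the rank score starts at 100 and drops by 10 per position, and that a hypothesis keeps its rank score only if it contains one of the expected words. The slide also mentions partial credit for matching less of the expected speech; that part is not modeled here. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SpeechScoreSketch {
    /** Scores each n-best hypothesis; index 0 is the recognizer's top result. */
    static int[] score(List<String> nBest, List<String> expected) {
        int[] scores = new int[nBest.size()];
        for (int i = 0; i < nBest.size(); i++) {
            int rankScore = Math.max(0, 100 - 10 * i); // assumed rank penalty
            List<String> words = Arrays.asList(
                    nBest.get(i).toLowerCase(Locale.ROOT).split("\\s+"));
            boolean matched = false;
            for (String e : expected) {
                if (words.contains(e.toLowerCase(Locale.ROOT))) {
                    matched = true;
                    break;
                }
            }
            scores[i] = matched ? rankScore : 0;
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> nBest = Arrays.asList("Yes it is", "yes it is", "this is",
                "as it is", "U.S. is", "yes is", "U.S. news", "U.S. in his",
                "yes in his", "as in his");
        System.out.println(Arrays.toString(score(nBest, Arrays.asList("Yes", "Yeah"))));
    }
}
```

Matching whole words rather than substrings keeps "this is" from counting as a hit for a word such as "is" embedded in another token.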
49
Scoring the Reply, Sketching
Uses scores from the sketch recognition results – Decreasing score for worse matches
Not expecting particular shapes; expecting a selection, path, or location interpretation
Shape recognition scores: Polyline 100, Line 80, Arc 60, Complex 55, Ellipse 0
Interpretation scores: Path 90, Path 70, Location 60, Selection 20, …
50
Interesting Complications – Optional or additional speech and sketching User provides additional, redundant, extraneous, or conflicting information – Responses that answer multiple questions or provide extra information For example, “This shape moves in this direction, and so does this one.” or “Yes these two shapes collide, and this shape collides into the other one.”
51
Combining Input Modalities Pick best scoring inputs Input timing is critical Recognize conflicting or missing information Update physics or ask for clarification
52
Spring Direction Input Example “It moves in this direction” “It contracts” “It expands”
53
Yes/No Input Table
54
Multimodal Input Table Details Possible operators: – Identity, Not, Param, Star Two lists of operators: – Required, Optional Possible results: – Success, Conflict, Insufficient, None
55
Multimodal Input Table Revisiting user input handling Need to: – Handle different multimodal combinations of input – Recognize conflicting or missing information – Look for input that matches expectations – Look at timing of input Generalizes code and makes it easier to add new questions
56
Spring Direction Input Table
57
Dynamic Dialogue Computer driven interaction – No predetermined conversation – Questions determined by physics – Differs from other systems where user asks questions
61
Possible Additions Fine tuning modality combination – How long to leave strokes on the screen? Focus the user’s attention Length of the user’s reply Speech lattice? Dialogue management – Picking questions, colors, gestures to use? Initial sketch input (LADDER?)
62
Code
In “converse” package
Backend: core system, lots of I/O code here
Dialogue: core system
Display: Java debugging UI
Information: the main information request code
  – Table: code for the input checking table
  – Requests: the specific information request classes
Physics: the qualitative physics simulator, hopefully you don’t need too much of this
Sketch: sketch related code
Speech: speech related code
63
Code Data all stored in MultimodalActionHistory – Can technically store the audio wav files too? – Stores all the sketching – Could be replayed, haven’t tried though – Could back up to a previous state as all states are stored, but haven't done that either Initializer (Java) times all the speech output and stores to a file – If you add new outgoing speech you’ll get an error if you haven’t run this
64
Code Multimodal output alignment Input speech / sketching in chunks Matching is greedy, can’t match strokes after a {} Chunking gets around this problem. New step: determine number / singular / plural word forms (separate file) Timing is a separate step – Takes into account the time the strokes will take to draw etc., how long the speech will take to speak
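The timing step can be pictured with a minimal sketch. It assumes the output has already been split into segments, each with an estimated speech duration and an estimated stroke-drawing duration, and that speech pauses whenever a segment's strokes need more time than its words. The names and the per-segment model are assumptions for illustration only.

```java
public class AlignmentTimingSketch {
    /** Start time in ms of each segment; a segment lasts max(speech, strokes). */
    static long[] startTimes(long[] speechMs, long[] strokeMs) {
        long[] starts = new long[speechMs.length];
        long clock = 0;
        for (int i = 0; i < speechMs.length; i++) {
            starts[i] = clock;
            clock += Math.max(speechMs[i], strokeMs[i]);
        }
        return starts;
    }

    /** Pause in ms to insert after each segment's speech so the strokes can finish. */
    static long[] pausesAfterSpeech(long[] speechMs, long[] strokeMs) {
        long[] pauses = new long[speechMs.length];
        for (int i = 0; i < speechMs.length; i++) {
            pauses[i] = Math.max(0, strokeMs[i] - speechMs[i]);
        }
        return pauses;
    }

    public static void main(String[] args) {
        long[] speech = {800, 400, 600};  // estimated from pre-timed speech
        long[] strokes = {0, 900, 0};     // only the middle segment draws strokes
        System.out.println(java.util.Arrays.toString(startTimes(speech, strokes)));
        System.out.println(java.util.Arrays.toString(pausesAfterSpeech(speech, strokes)));
    }
}
```

Computing the schedule ahead of time fits the constraint that the synthesizer gives no signal when it has finished speaking.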
65
Java Code
display.DesignAssistant: main class, starts backend
backend: processes incoming and outgoing messages; sends messages to C#, gets messages back
dialogue: speech and stroke scoring, output alignment, dialogue management
display:
  – "fact" display (really statements; the code still calls them facts)
  – Java editing operations, coordinate conversion
66
Java Code
information: types of input, types of responses
information.QuestionGenerator
  – actually generates the questions
  – stores the text for the questions
  – Pie areas, strokes, offset speech and sketching classes help with timing
information.Reply: class for the user reply
information.requests: all the information requests
information.table: table operators (identity, not, param, star)
67
Java Code (physics)
physics
  – lots of angle stuff, rotation, translation, directions
  – constraints
physics.facts: these might need to get moved, perhaps made more general
physics.filters: various sorts for looking at more global properties, collisions, etc.
physics.modifiers: updating physics
physics.projectors: predicting the next state
68
Java Code Sketch – sketched shapes, etc. – Some domain specific stuff in here Speech – recognized / chosen speech
69
C# Code
Converse UI solution
Communications: file IO / sketch / speech communication with Java
ConverseUI: the UI itself, DesignAssistant (main class)
DataProcessing: Queues, Catalog, MUGPoint, MUGStroke, Projection, SolidDraw, etc.
  – Also has the code that does the stroke generation
  – Catalog keeps track of what should be displayed and the different types of displayed objects
PointConverter: getting the right timing on incoming stroke points
Speech: interface with MS speech recognizer
70
Egg Cracker Walkthrough
71
S: U: Hi. Motivation: Get the user used to talking / hearing the system. Verify that it has started up.
72
S: Are all these bodies the same? U: No, just these two. Motivation: Asking about similar objects – detect objects of same size and shape or same components
73
S: Are any of these fixed? U: Only this one. Motivation: Asking if stationary/fixed – Objects are not attached to anything
74
S: ?? I’ve updated the diagram to show that the body is anchored Action: Updated screen to reflect that the block is anchored.
75
S: So is this body touching anything? U: Yes, this one and that one. S: And this one? U: No.
76
S: Is this touching these bodies? U: Yes it is. Motivation: Asking about touching objects – unconnected object, not fixed, close to other objects.
77
S: I’ve updated the position of the body. Action: Update the position of the block so that it is touching the other blocks.
78
S: The block has a pivot and torques in both directions. Is it initially balanced? U: Yes. [Could also ask this later on in the conversation, need to try it out]
79
S: How does this work? U: First this block is pulled up. Motivation: Signaling that the preliminary questions are over; time to talk about the rest of it.
80
S: How far does it move? U: Until the bottom is here. S: Ok. Motivation: motion indicated with no objects blocking indicated path. Need to clarify motion range
81
S: I’ve updated the position of the body. Does the body move this way or that way? Action: Update block position. Motivation: Change of contact from last update
82
U: This way.
83
S: So the body hits this body pushing it to the right, does it hit this surface? U: Maybe – it doesn’t matter. It goes down and hits this platform. Motivation: Confirming assumption derived from physics which was indicated by the collision. Then asking about a surface that is in the path of motion.
84
Note: Round ball going off edge – not sure how the DF change there would work
85
S: So after the collision, it rotates counterclockwise? U: Yes.
86
S: Do these two collide? U: Yes Motivation: Verify collisions
87
S: What direction does this rotate in initially? U: It is balanced to start. And so is this one. Motivation: Unknown rotation/balance
88
S: The collision causes this to rotate. How far does it rotate? U: It rotates at least this far so that the block falls into here. This rotates too. Motivation: Unknown rotation/balance
89
S: Does this body collide with this one? U: Yes. Motivation: Verify predicted collision
90
S: Does the body bounce or stop after the collision? U: It stops. Motivation: Ambiguous collision
91
S: What about this body? Where does it end up? U: It doesn’t matter. Motivation: Ambiguous collision
92
S: Is there anything else? U: No that’s it. Action: Moved body off screen