MIDOS Architecture Meeting Aaron Adler March 2, 2009.


1 MIDOS Architecture Meeting Aaron Adler March 2, 2009

2 MIDOS: Multimodal Interactive DialOgue System Architecture and code overview ~45k lines of Java code in the package ~6k lines of C# code Feel free to ask questions

3 Outline Quick motivation High level architecture More details Code layout

4 Multimodal Interaction

5 Motivation Many unspecified parameters – Large cognitive effort to specify all parameters Multimodal interaction – Symmetric Dynamic dialogue – Driven by qualitative physics

6 Motivation Sketching is great for communicating designs – Some things are hard to sketch Use speech – Some details are uncertain Have a conversation Multimodal dialogue – speech and sketching – makes the computer more of a partner

7 User Study 1: Multimodal Device Descriptions Empirical study of informal speech Key results – Disfluencies – Pauses -> topic change – Talking and sketching about the same topics

8 User Study 2: Human Multimodal Dialogue Experimenter, participant Speech, sketching recording, shared sketching surface Key results: – Pen color: refer back, new topic, real world – Disfluent speech – Concurrent speech/sketching about same topics – Simple questions

9 Project Sketch

10 Color Correspondence

11 Color Paths

12 Sketch Modification

13 Architecture C# side: Sketch input, Sketch output, Speech recognition, Speech synthesis. Messages between C# and Java: Commands (open, save, etc.), Sketching (himetric units), Speech. Java side: Physics calculations, Question selection, Speech / Sketch generation, File opening / saving.

14 Information Request Flow (diagram) Physics generates Information Requests; select and generate a question; ask the question; reply; erase the ink that was used in the question and reply; statement; velocity updates; new Information Requests

15 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output

16 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output Analyze shapes: Motion paths, Conflicting motion, Spring directions

17 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output Analyze physics: Collisions, Missing information, Under-specification

18 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input 1) Pick request 2) Form question 3) Ask the question

19 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Sketch Output Speech Output 1) User reply 2) Compare to expectations 3) Determine top score 4) Update physics

20 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output

21 Physics Simulator Qualitative physics – Don’t have exact numbers, do have angles Modest – Won’t be a complete simulation Generate sensible questions – If something is ambiguous can ask the user Goal is to generate dialogues with the user

22 Simulation Method Two possible approaches: – Calculation approach: calculate how far things move directly – Step approach: use small, incremental movements Using the calculation approach

23 Motion Objects can have translational motion OR rotational motion, but not both Translational motion – Velocity angle, or set of angles Rotational motion – Clockwise, counter clockwise, none, unknown Stores velocity and acceleration for each – Acceleration merged with velocity at each time step – Gravity is just a downward acceleration

24 Torque Possibilities ?? Counterclockwise Clockwise Ambiguous – Ask the user
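
The torque cases on this slide can be sketched as a small qualitative combination rule. This is a minimal illustration, not the MIDOS implementation: the `Rotation` values follow the Motion slide, while `TorqueCombiner` and its `combine` method are hypothetical names. The assumption is that opposing torques cannot be resolved without magnitudes, so the result is treated as ambiguous and the system would ask the user.

```java
import java.util.List;

/** Qualitative rotation values listed on the Motion slide. */
enum Rotation { CLOCKWISE, COUNTERCLOCKWISE, NONE, UNKNOWN }

/** Hypothetical helper; not a class from the MIDOS code base. */
final class TorqueCombiner {
    private TorqueCombiner() {}

    /** Combines the qualitative torques acting on one body. */
    static Rotation combine(List<Rotation> torques) {
        boolean clockwise = torques.contains(Rotation.CLOCKWISE);
        boolean counterclockwise = torques.contains(Rotation.COUNTERCLOCKWISE);
        boolean unknown = torques.contains(Rotation.UNKNOWN);
        // Opposing or unknown torques are ambiguous: ask the user.
        if (unknown || (clockwise && counterclockwise)) {
            return Rotation.UNKNOWN;
        }
        if (clockwise) {
            return Rotation.CLOCKWISE;
        }
        if (counterclockwise) {
            return Rotation.COUNTERCLOCKWISE;
        }
        return Rotation.NONE;
    }
}
```

An `UNKNOWN` result is the point at which an information request about the rotation direction would be generated.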

25 Shapes All the objects are turned into polygons – Used for the collision detection and calculations Even ellipses are approximated as polygons Uses polylines for ropes, springs Seems like a reasonable approximation Reduces the number of special cases
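
One way to approximate an ellipse by a polygon is to sample evenly spaced angles around its center. The sketch below assumes an axis-aligned ellipse and a caller-chosen number of sides; `EllipseApproximation` and `toPolygon` are hypothetical names, and the slides do not say how MIDOS picks the vertices.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not a class from the MIDOS code base. */
final class EllipseApproximation {
    private EllipseApproximation() {}

    /**
     * Returns the vertices {x, y} of a polygon inscribed in the
     * axis-aligned ellipse with center (cx, cy) and radii rx, ry.
     */
    static List<double[]> toPolygon(double cx, double cy,
                                    double rx, double ry, int sides) {
        if (sides < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 sides");
        }
        List<double[]> vertices = new ArrayList<>(sides);
        for (int i = 0; i < sides; i++) {
            double angle = 2 * Math.PI * i / sides;
            vertices.add(new double[] {
                cx + rx * Math.cos(angle),
                cy + ry * Math.sin(angle)
            });
        }
        return vertices;
    }
}
```

Once every shape is a vertex list, the same polygon collision code can serve boxes and ellipses alike, which is the reduction in special cases the slide mentions.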

26 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input Sketch Output Speech Output

27 Generating Questions Physics engine determines what information is unknown or incomplete – Information Requests are generated Rank and pick the Information Request to process Generate a question from the Information Request – Generate speech and sketching – Questions have expected speech and sketch responses

28 What Question to Ask? Question applicability governed by: – Current arrangement/state of the shapes – Facts that are currently known – Previous questions (to follow up on conflicts, etc.) Some questions are dependent on facts from answer to previous question – If we know that a collision occurs – Then we can ask about which block hits the other one (the specifics of the collision)
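
The dependency between questions can be sketched as a filter followed by a ranking. This is an illustration under stated assumptions: `DemoRequest`, `DemoQuestionSelector`, the integer `priority`, and the string-valued facts are all hypothetical, and the real ranking criteria are not given on the slides.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for an information request. */
final class DemoRequest {
    final String question;
    final int priority;               // assumed ranking value
    final Set<String> requiredFacts;  // facts that must be known first

    DemoRequest(String question, int priority, Set<String> requiredFacts) {
        this.question = question;
        this.priority = priority;
        this.requiredFacts = requiredFacts;
    }

    /** A request applies only once all facts it depends on are known. */
    boolean isApplicable(Set<String> knownFacts) {
        return knownFacts.containsAll(requiredFacts);
    }
}

/** Hypothetical selector: keep applicable requests, pick the top ranked. */
final class DemoQuestionSelector {
    private DemoQuestionSelector() {}

    static Optional<DemoRequest> pick(List<DemoRequest> pending,
                                      Set<String> knownFacts) {
        return pending.stream()
                .filter(request -> request.isApplicable(knownFacts))
                .max(Comparator.comparingInt((DemoRequest request) -> request.priority));
    }
}
```

With this shape, "Which block hits the other one?" stays ineligible until the fact that a collision occurs has been established, however highly it is ranked.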

29 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Speech Input Sketch Input

30 Multimodal Output Coordinate speech and sketching outputs – AT&T Natural Voices speech synthesizer – Computer generated strokes Simple language to generate output timing – Allow strokes to be drawn during a particular part of the speech utterance

31 Multimodal Language Examples "What direction does (this shape) rotate in?" – Draw one stroke "Which of {these directions} does this shape move in?" – Draw one or more strokes

32 Multimodal Language Examples

33 (These two) (bodies) collide (here.) Where on (this) body does the contact occur? Pause Clear all the strokes

34 Multimodal Output Coordinate speech and sketching outputs – Complex to do by hand – Generate human-like sketch output No low-level access to the speech synthesizer Calculate word and phrase timings ahead of time Language to generate output timing

35 Multimodal Output Language (): match one stroke {}: match one or more strokes automatic: adjust for number agreement Allows the computer to pause where needed to keep speech and sketching synchronized
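
The bracket notation can be illustrated with a small parser that splits a template into speech-only text and stroke-aligned phrases. This is a sketch only: `OutputSegment` and `OutputTemplateParser` are hypothetical names, nesting is not handled, and number agreement and timing are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** One piece of an output template (hypothetical type). */
final class OutputSegment {
    enum Kind { SPEECH_ONLY, ONE_STROKE, ONE_OR_MORE_STROKES }

    final Kind kind;
    final String words;

    OutputSegment(Kind kind, String words) {
        this.kind = kind;
        this.words = words;
    }
}

/** Hypothetical parser for the () and {} notation. */
final class OutputTemplateParser {
    private OutputTemplateParser() {}

    static List<OutputSegment> parse(String template) {
        List<OutputSegment> segments = new ArrayList<>();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == '(' || c == '{') {
                // (...) is spoken while one stroke is drawn, {...} while one or more are.
                char close = (c == '(') ? ')' : '}';
                int end = template.indexOf(close, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unclosed '" + c + "' at index " + i);
                }
                OutputSegment.Kind kind = (c == '(')
                        ? OutputSegment.Kind.ONE_STROKE
                        : OutputSegment.Kind.ONE_OR_MORE_STROKES;
                add(segments, kind, template.substring(i + 1, end));
                i = end + 1;
            } else {
                int next = nextOpen(template, i);
                add(segments, OutputSegment.Kind.SPEECH_ONLY, template.substring(i, next));
                i = next;
            }
        }
        return segments;
    }

    /** Index of the next '(' or '{' at or after from, or the string length. */
    private static int nextOpen(String s, int from) {
        int paren = s.indexOf('(', from);
        int brace = s.indexOf('{', from);
        if (paren < 0) paren = s.length();
        if (brace < 0) brace = s.length();
        return Math.min(paren, brace);
    }

    private static void add(List<OutputSegment> out, OutputSegment.Kind kind, String words) {
        String trimmed = words.trim();
        if (!trimmed.isEmpty()) {
            out.add(new OutputSegment(kind, trimmed));
        }
    }
}
```

Parsing "What direction does (this shape) rotate in?" yields three segments, with "this shape" marked as the phrase during which a single stroke is drawn.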

36 Multimodal Output How to time the outgoing speech and sketching Previously used hand coded timings Too hard to do with complicated or dynamic output

37 Multimodal Output No low-level access to the speech synthesizer – Don’t know when it’s done speaking Calculate word and phrase timings ahead of time – Can’t even just add word timing together Build a little language to help
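
Because phrase durations cannot simply be summed from word durations, one option is to measure each outgoing phrase once and look it up later. The sketch below assumes an in-memory map keyed by the full phrase; `PhraseTimings` is a hypothetical name, and the actual file format and measuring procedure are not described on the slides.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup of precomputed phrase durations. */
final class PhraseTimings {
    private final Map<String, Long> durationsMs = new HashMap<>();

    /** Stores the measured duration of a complete outgoing phrase. */
    void record(String phrase, long durationMs) {
        durationsMs.put(phrase, durationMs);
    }

    /**
     * Returns the stored duration. A phrase that was never timed is
     * reported as an error rather than estimated from its words.
     */
    long durationOf(String phrase) {
        Long duration = durationsMs.get(phrase);
        if (duration == null) {
            throw new IllegalStateException("No precomputed timing for: " + phrase);
        }
        return duration;
    }
}
```

Failing loudly on a missing phrase mirrors the behavior described later in the deck, where new outgoing speech causes an error until the timing step has been rerun.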

38 Underestimate or Overestimate Speech Times? Best to underestimate the word timing – why? – If you overestimate, possibly start speaking again too soon – If you underestimate, speech won’t overlap May try to start talking sooner, but it can’t output more than one thing at once

39 Overestimating Speech Estimated Speech Actual Speech Sketching

40 Underestimating Speech Estimated Speech Sketching Actual Speech Estimated Speech

41 Speech Synthesizer AT&T Natural Voices Synthesizer – Easy to integrate – Sounds reasonable Java and C# interfaces – Using C# interface Has documentation and samples

42 Sketch Output Timed display of strokes and speech – Should allow strokes to be drawn during a particular part of the speech utterance Computer-generated strokes – Can be drawn in a specified duration – Circle objects – Points moved so that strokes appear more like human-drawn strokes
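
A computer-generated circling stroke with a fixed drawing duration might look like the following. Everything here is an assumption for illustration: `TimedPoint`, `StrokeGenerator`, the uniform radial jitter, and the evenly spaced timestamps are hypothetical, since the slides do not say how MIDOS perturbs points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** A stroke point with the time (ms from stroke start) at which it is drawn. */
final class TimedPoint {
    final double x;
    final double y;
    final long timeMs;

    TimedPoint(double x, double y, long timeMs) {
        this.x = x;
        this.y = y;
        this.timeMs = timeMs;
    }
}

/** Hypothetical generator for a circling stroke. */
final class StrokeGenerator {
    private StrokeGenerator() {}

    static List<TimedPoint> circle(double cx, double cy, double radius,
                                   int points, long durationMs,
                                   double jitter, long seed) {
        if (points < 2) {
            throw new IllegalArgumentException("need at least 2 points");
        }
        Random random = new Random(seed);
        List<TimedPoint> stroke = new ArrayList<>(points);
        for (int i = 0; i < points; i++) {
            double angle = 2 * Math.PI * i / (points - 1);
            // Move each point slightly in or out so the circle looks hand drawn.
            double r = radius + (random.nextDouble() * 2 - 1) * jitter;
            // Spread timestamps evenly so the stroke takes durationMs to draw.
            long timeMs = durationMs * i / (points - 1);
            stroke.add(new TimedPoint(cx + r * Math.cos(angle),
                                      cy + r * Math.sin(angle), timeMs));
        }
        return stroke;
    }
}
```

The per-point timestamps are what let a stroke be scheduled against a particular part of the speech utterance.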

43 System Components Dialogue Manager Interpret and Score Question Generation Qualitative Physics Simulator Information Request Generator Sketch Output Speech Output

44 Multimodal Input Sketching Input – N-best interpretations (line, arc, polyline, etc.) – Use context to determine type – Compare to expected stroke Speech Input – Microsoft speech recognizer in dictation mode – N-best results list – Compare to expected speech

45 Processing Steps 1. Ask the user a question 2. Match and score the user speech against expected speech 3. Match and score the user sketching against the expected sketching 4. Pick the best scoring combination 5. Evaluate the best scoring combination: (a) if successful, continue; (b) if not, ask a follow-up question and return to the first step 6. Generate statements about the new information and update the system state

46 Speech Recognizer Tried various recognizers Microsoft recognizer proves easiest to get running – Reasonable results – N-best list of results – Easy API from C#

47 Sketch Input Using C# allows the capture of pen pressure data – C# handles the rendering of the strokes Passing the data to Java allows the reuse of code – Saving to XML – Existing sketch recognition code (Simple Classifier)

48 Scoring the Reply, Speech Uses n-best speech results – Decreasing score as you move down the n-best list and decreasing score as you match less of the expected speech Expected speech: "Yes", "Yeah". N-best list with rank scores: Yes it is 100, yes it is 90, this is 80, as it is 70, U.S. is 60, yes is 50, U.S. news 40, U.S. in his 30, yes in his 20, as in his 10. Scores after matching against the expected speech: Yes it is 100, yes it is 90, this is 0, as it is 0, U.S. is 0, yes is 50, U.S. news 0, U.S. in his 0, yes in his 20, as in his 0.
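
The numbers in this example are consistent with a simple rule: each hypothesis starts from a score that drops by 10 per rank, and it keeps that score only if it contains an expected word. The sketch below implements just that rule. `SpeechScorer` is a hypothetical name, and the second factor from the slide (a lower score when less of the expected speech is matched) is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical scorer reproducing the slide's example values. */
final class SpeechScorer {
    private SpeechScorer() {}

    /**
     * Scores each n-best hypothesis: 100 for the first entry, 10 less
     * for each later one, and 0 if no expected word appears in it.
     * Expected words must be given in lower case.
     */
    static List<Integer> score(List<String> nBest, Set<String> expectedWords) {
        List<Integer> scores = new ArrayList<>(nBest.size());
        for (int rank = 0; rank < nBest.size(); rank++) {
            int rankScore = Math.max(0, 100 - 10 * rank);
            String[] words = nBest.get(rank).toLowerCase(Locale.ROOT).split("\\s+");
            boolean matches = Arrays.stream(words).anyMatch(expectedWords::contains);
            scores.add(matches ? rankScore : 0);
        }
        return scores;
    }
}
```

Applied to the ten hypotheses above with the expected words "yes" and "yeah", only the four entries that contain "yes" keep their rank scores.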

49 Scoring the Reply, Sketching Uses scores from the sketch recognition results – Decreasing score for worse matches Not expecting particular shapes; expecting a selection, path, or location interpretation Recognition results: Polyline 100, Line 80, Arc 60, Complex 55, Ellipse 0. Interpretation scores: Path 90, Path 70, Location 60, Selection 20, …

50 Interesting Complications – Optional or additional speech and sketching User provides additional, redundant, extraneous, or conflicting information – Responses that answer multiple questions or provide extra information For example, “This shape moves in this direction, and so does this one.” or “Yes these two shapes collide, and this shape collides into the other one.”

51 Combining Input Modalities Pick best scoring inputs Input timing is critical Recognize conflicting or missing information Update physics or ask for clarification

52 Spring Direction Input Example “It moves in this direction” “It contracts” “It expands”

53 Yes/No Input Table

54 Multimodal Input Table Details Possible operators: – Identity, Not, Param, Star Two lists of operators: – Required, Optional Possible results: – Success, Conflict, Insufficient, None
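
The four result values can be illustrated with a simplified table row that has a required list and an optional list. This is a loose sketch and not the MIDOS table: `TableResult`, `DemoTableRow`, and the string-valued inputs are hypothetical, the Identity, Not, Param, and Star operators are not modeled, and the rules for choosing each result are assumptions.

```java
import java.util.Set;

/** The four outcomes named on the slide. */
enum TableResult { SUCCESS, CONFLICT, INSUFFICIENT, NONE }

/** Hypothetical table row with required and optional inputs. */
final class DemoTableRow {
    private final Set<String> required;
    private final Set<String> optional;

    DemoTableRow(Set<String> required, Set<String> optional) {
        this.required = required;
        this.optional = optional;
    }

    private boolean isRelevant(String input) {
        return required.contains(input) || optional.contains(input);
    }

    /**
     * matched: inputs the user gave that agree with expectations.
     * contradicted: inputs the user gave that disagree with each other
     * or with expectations (assumed to be detected elsewhere).
     */
    TableResult evaluate(Set<String> matched, Set<String> contradicted) {
        if (contradicted.stream().anyMatch(this::isRelevant)) {
            return TableResult.CONFLICT;      // ask for clarification
        }
        if (matched.stream().noneMatch(this::isRelevant)) {
            return TableResult.NONE;          // nothing usable in the reply
        }
        return matched.containsAll(required)
                ? TableResult.SUCCESS         // update the physics
                : TableResult.INSUFFICIENT;   // ask a follow-up question
    }
}
```

Under these assumptions, a reply containing only an optional input is `INSUFFICIENT`, which corresponds to asking a follow-up question instead of updating the physics.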

55 Multimodal Input Table Revisiting user input handling Need to: – Handle different multimodal combinations of input – Recognize conflicting or missing information – Look for input that matches expectations – Look at timing of input Generalizes code and makes it easier to add new questions

56 Spring Direction Input Table

57 Dynamic Dialogue Computer driven interaction – No predetermined conversation – Questions determined by physics – Differs from other systems where user asks questions


61 Possible Additions Fine tuning modality combination – How long to leave strokes on the screen? Focus the user’s attention Length of the user’s reply Speech lattice? Dialogue management – Picking questions, colors, gestures to use? Initial sketch input (LADDER?)

62 Code In “converse” package Backend: core system, lots of I/O code here Dialogue: core system Display: Java debugging UI Information: the main information request code – Table: code for the input checking table – Requests: the specific information request classes Physics: the qualitative physics simulator, hopefully you don’t need too much of this Sketch: sketch related code Speech: speech related code

63 Code Data all stored in MultimodalActionHistory – Can technically store the audio wav files too? – Stores all the sketching – Could be replayed, haven’t tried though – Could back up to a previous state as all states are stored, but haven't done that either Initializer (Java) times all the speech output and stores to a file – If you add new outgoing speech you’ll get an error if you haven’t run this

64 Code Multimodal output alignment Input speech / sketching in chunks Matching is greedy, can’t match strokes after a {} Chunking gets around this problem. New step: determine number / singular / plural word forms (separate file) Timing is a separate step – Takes into account the time the strokes will take to draw etc., how long the speech will take to speak

65 Java Code display.DesignAssistant – main class, starts backend backend – process incoming and outgoing messages, send messages to C#, get messages back dialogue – speech and stroke scoring, output alignment, dialogue management display – "fact" display (really statements; the code still calls them facts) – Java editing operations, coordinate conversion

66 Java Code information – types of input, types of responses. information.QuestionGenerator – actually generates the questions – stores the text for the questions – Pie areas, strokes, offset speech and sketching classes help with timing information.Reply – Class for the user reply information.requests – all the information requests information.table – table operators: identity, not, param, star

67 Java Code (physics) physics – lots of angle stuff, rotation, translation, directions – constraints physics.facts – these might need to get moved – perhaps made more general physics.filters – various sorts for looking at more global properties, collisions, etc physics.modifiers – updating physics physics.projectors – predicting the next state

68 Java Code Sketch – sketched shapes, etc. – Some domain specific stuff in here Speech – recognized / chosen speech

69 C# Code Converse UI solution Communications – file IO / sketch / speech comm with java ConverseUI – the UI itself DesignAssistant (main class) DataProcessing – Queues, Catalog, MUGPoint, MUGStroke, Projection, SolidDraw, etc. – Also has the code that does the stroke generation – Catalog keeps track of what should be displayed and the different types of displayed objects PointConverter – Getting the right timing on incoming stroke points. Speech – Interface with MS speech recognizer

70 Egg Cracker Walkthrough

71 S: U: Hi. Motivation: Get the user used to talking / hearing the system. Verify that it has started up.

72 S: Are all these bodies the same? U: No, just these two. Motivation: Asking about similar objects – detect objects of same size and shape or same components

73 S: Are any of these fixed? U: Only this one. Motivation: Asking if stationary/fixed – Objects are not attached to anything

74 S: ?? I’ve updated the diagram to show that the body is anchored Action: Updated screen to reflect that the block is anchored.

75 S: So is this body touching anything? U: Yes, this one and that one. S: And this one? U: No.

76 S: Is this touching these bodies? U: Yes it is. Motivation: Asking about touching objects – unconnected object, not fixed, close to other objects.

77 S: I’ve updated the position of the body. Action: Update the position of the block so that it is touching the other blocks.

78 S: The block has a pivot and torques in both directions. Is it initially balanced? U: Yes. [Could also ask this later on in the conversation, need to try it out]

79 S: How does this work? U: First this block is pulled up. Motivation: Signaling that the preliminary questions are over; time to talk about the rest of it.

80 S: How far does it move? U: Until the bottom is here. S: Ok. Motivation: motion indicated with no objects blocking indicated path. Need to clarify motion range

81 S: I’ve updated the position of the body. Does the body move this way or that way? Action: Update block position. Motivation: Change of contact from last update

82 U: This way.

83 S: So the body hits this body pushing it to the right, does it hit this surface? U: Maybe – it doesn’t matter. It goes down and hits this platform. Motivation: Confirming assumption derived from physics which was indicated by the collision. Then asking about a surface that is in the path of motion.

84 Note: Round ball going off edge – not sure how the DF change there would work

85 S: So after the collision, it rotates counterclockwise? U: Yes.

86 S: Do these two collide? U: Yes Motivation: Verify collisions

87 S: What direction does this rotate in initially? U: It is balanced to start. And so is this one. Motivation: Unknown rotation/balance

88 S: The collision causes this to rotate. How far does it rotate? U: It rotates at least this far so that the block falls into here. This rotates too. Motivation: Unknown rotation/balance

89 S: Does this body collide with this one? U: Yes. Motivation: Verify predicted collision

90 S: Does the body bounce or stop after the collision? U: It stops. Motivation: Ambiguous collision

91 S: What about this body? Where does it end up? U: It doesn’t matter. Motivation: Ambiguous collision

92 S: Is there anything else? U: No that’s it. Action: Moved body off screen

