
1 Language Technology II
Language-Based Interaction: Dialogue design and evaluation. Manfred Pinkal. Course website: www.coli.uni-saarland.de/courses/late2

2 Outline
- The Software Development Cycle
- Dialogue Design
- Wizard-of-Oz Experiments
- Dialogue System Evaluation
LaTeII: Language-based Interaction Manfred Pinkal

3 The Software Development Cycle
- Requirements Analysis
- Design
- Implementation
- Testing and Evaluation
- Integration
- Maintenance

5 Outline
- The Software Development Cycle
- Dialogue Design
- Wizard-of-Oz Experiments
- Dialogue System Evaluation

6 Dialogue Design: Overall Aims
- Effectiveness (Task Success)
- Efficiency
- User Satisfaction

7 Dialogue Design: General Steps
The following slides are compiled from slides by Rolf Schwitters and Bernd Plannerer.
1. Make sure you understand what you are trying to achieve (use scenarios and build a conceptual model).
2. See if you can decompose the task into smaller meaningful subtasks.
3. Identify the information tokens you need for each task or subtask.
4. Decide how you will obtain this information from the user.
5. Sketch a dialogue model that captures this information.
6. Test your dialogue model.
7. Revise the dialogue model and repeat Step 6 …
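Steps 3 to 5 can be illustrated with a minimal form-filling sketch. It is not taken from the slides: the `Subtask` class, its slot names, and the prompts are hypothetical, and a real dialogue model would also need grounding and error handling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Subtask:
    """One subtask together with the information tokens (slots) it needs."""
    name: str
    prompts: Dict[str, str]  # slot name -> system prompt that asks for it
    values: Dict[str, str] = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> None:
        """Store a value the user supplied for one of the known slots."""
        if slot not in self.prompts:
            raise KeyError(f"unknown slot: {slot}")
        self.values[slot] = value

    def next_prompt(self) -> Optional[str]:
        """Return the prompt for the first unfilled slot, or None when done."""
        for slot, prompt in self.prompts.items():
            if slot not in self.values:
                return prompt
        return None


# Hypothetical banking subtask, in the spirit of the ABC Bank prompts.
transfer = Subtask(
    name="transfer_funds",
    prompts={
        "amount": "How much would you like to transfer?",
        "target_account": "Which account should receive the money?",
    },
)
print(transfer.next_prompt())
```

Testing the model (Step 6) then amounts to walking through scenarios and checking that every slot can be filled by the prompts provided.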

8 Dialogue Design: Principal Decisions
- Specification of target group and supported languages
  - Frequency of usage
  - Regional / national
  - Monolingual / multilingual / foreign-language speakers
  - Age
- Environment
  - Quiet environment: home, office
  - Noisy environment: car, outdoor, noisy working environments
- Choice of persona and voice
- Dialogue structure

9 Dialogue Design: Practical Tips
- Guide the user towards responses that maximize clarity and unambiguousness.
- Allow for the user not knowing the active vocabulary, not knowing the answer to a question, or not understanding a question.
- Guide users toward natural ‘in vocabulary’ responses.
  - Version 1: Welcome to ABC Bank. How can I help you?
  - Version 2: Welcome to ABC Bank. What would you like to do?
  - Version 3: Welcome to ABC Bank. You can check an account balance, transfer funds, or pay a bill. What would you like to do?

10 More Practical Tips
- Do not give too many options at once (maximum 5).
- Keep prompts brief to encourage the user to be brief.
- Supply confirmation messages frequently, especially when the cost or likelihood of a recognition error is high.
- Prefer implicit over explicit grounding.
- Use recognizer confidence values to avoid unnecessary grounding steps.
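The last three tips can be combined into a simple decision rule. The sketch below is only an illustration: the function name, the two thresholds, and the `high_cost` flag are assumptions, and suitable values would have to be tuned for a given recognizer and application.

```python
def choose_grounding(confidence: float,
                     high_cost: bool = False,
                     accept_at: float = 0.9,
                     doubt_below: float = 0.5) -> str:
    """Pick a grounding strategy from a recognizer confidence value in [0, 1].

    Returns "explicit" (ask the user to confirm), "implicit" (echo the
    understood value in the next prompt), or "none" (no grounding step).
    The thresholds are hypothetical defaults.
    """
    if high_cost or confidence < doubt_below:
        # Cost or likelihood of a recognition error is high: confirm explicitly.
        return "explicit"
    if confidence >= accept_at:
        # Very confident: skip the grounding step altogether.
        return "none"
    # Otherwise prefer implicit over explicit grounding.
    return "implicit"


print(choose_grounding(0.95))                  # confident, cheap to undo
print(choose_grounding(0.70))                  # medium confidence
print(choose_grounding(0.95, high_cost=True))  # e.g. a funds transfer
```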

11 More Practical Tips
- Assume a frequent user will have a rapid learning curve.
- Allow shortcuts:
  - Switch to expert mode / command level.
  - Combine different steps in one.
  - Barge-in
- Assume errors are the fault of the recognizer, not the user.
- Allow the user to access (context-sensitive) help at any state.
- Provide escape commands.
- Design graceful recovery when the recognizer makes an error.

12 Outline
- The Software Development Cycle
- Dialogue Design
- Wizard-of-Oz Experiments
- Dialogue System Evaluation

13 Dialogue Design: General Steps
1. Make sure you understand what you are trying to achieve (use scenarios and build a conceptual model).
2. See if you can decompose the task into smaller meaningful subtasks.
3. Identify the information tokens you need for each task or subtask.
4. Decide how you will obtain this information from the user.
5. Sketch a dialogue model that captures this information.
6. Test your dialogue model.
7. Revise the dialogue model and repeat Step 6 …

14 Wizard-of-Oz Experiments
Central parts of the system are simulated by a human "wizard".
Experimental WoZ systems make it possible to test a dialogue system (to some extent) before it has been (fully) implemented, thus uncovering basic problems of the dialogue model.
They also make it possible to collect data at an early stage:
- data about the dialogue behavior of subjects
- the syntax and lexicon used (to hand-code language models)
- speech data (to train statistical language models)

15 Wizard-of-Oz Experiments
The WoZ is not just a person in a box. The WoZ system must:
- perform as poorly as a computer: "artificial" speech output by typing and a TTS system; simulation of shortcomings in recognition: the wizard sees typed input (no prosody), maybe even with simulated recognition failure (e.g., by randomly overwriting words in the typed input).
- perform as efficiently as a computer: support for quick database access and complex real-time decisions, e.g., by displaying a dialogue-flow diagram, marking the current state, and offering menus with contextually appropriate dialogue moves and system prompts.
- impose constraints on the options of the wizard (to support the impression of artificiality), and allow those constraints to be varied (to test different dialogue strategies).
- log all kinds of data in an appropriate and easily accessible form.
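Simulated recognition failure by randomly overwriting words could look roughly like the sketch below. The function name, the `error_rate` parameter, and the replacement vocabulary are assumptions for illustration; the slides do not say how the overwriting was actually done.

```python
import random
from typing import List, Optional


def simulate_recognition_errors(text: str,
                                error_rate: float,
                                vocabulary: List[str],
                                rng: Optional[random.Random] = None) -> str:
    """Overwrite each word with a random vocabulary word with probability error_rate."""
    if rng is None:
        rng = random.Random()
    noisy_words = []
    for word in text.split():
        if rng.random() < error_rate:
            # Simulated misrecognition: substitute an arbitrary in-vocabulary word.
            noisy_words.append(rng.choice(vocabulary))
        else:
            noisy_words.append(word)
    return " ".join(noisy_words)


# The typist's transcript is degraded before the wizard sees it.
typed = "please play the song from the selected album"
print(simulate_recognition_errors(typed, 0.2, ["list", "artist", "stop"],
                                  rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps an experimental session reproducible, which helps when the logs are analysed later.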

16 Wizard-of-Oz Experiments
- Ideally, a WoZ system is set up in a modular way, so that functions contributed by humans can be replaced step by step in the course of system implementation.
- Gradual transition between WoZ and fully artificial system.
- An example: the DiaMant tool, run in WoZ mode.

17 Motivations for WoZ experiments
The original motivation:
- Early testing, avoiding time-consuming and expensive programming.
- Studying dialogues while disregarding the bottleneck of unreliable speech recognition.
Changing conditions:
- Configuration and design of dialogue systems is becoming comfortable, and recognizers are becoming pretty reliable: are WoZ experiments necessary?
- Dialogue interaction is becoming increasingly flexible, adaptive, and complex: are WoZ experiments feasible?
A shift in motivation:
- From: exploration of the user's behavior, given constrained and schematic system behavior.
- To: exploration of alternative behavior of the wizard, who is given a range of freedom for his/her reaction.

18 An example
- A WoZ study in the TALK project, Spring 2005
- MP3 player
- Multi-modal dialogue, language: German
- In-car/in-home scenario
- Saarland University, DFKI, CLT

19 Tasks for the Subjects
- MP3 domain
  - “in-car” with primary task: Lane Change Task (LCT)
  - “in-home” without LCT
- Tasks for the subject:
  - Play a song from the album "New Adventures in Hi-Fi" by REM.
  - Find a song with “believe” in the title and play it.
- Task for the wizard: help the user reach their goals (deliberately vague!)

20 Goals of WOZ MP3 Experiment
- Gather pilot data on human multi-modal turn planning
- Collect wizard dialogue strategies
- Collect wizard media allocation decisions
- Collect wizard speech data
- Collect user data (speech signals and spontaneous speech)

21 User View
- Primary task: driving
- Secondary task on second screen: MP3 player

22 Video Recording

23 DFKI/USAAR WOZ system
System features:
- 14 components communicating via OAA, distributed over 5 machines (3 Windows, 2 Linux)
- Plus the LCT on a separate machine
People involved in running an experiment: 5
- 1 experiment leader
- 1 wizard
- 1 subject
- 2 typists

24 Data Flow
[Diagram: data flow between wizard, subject, and two typists; channels labeled graphics, synthesized audio data, text, and audio data]

25 A Walk Through the Final Turns
Wizard: “Ich zeige Ihnen die Liste an.” (I am displaying the list.)
User: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.” (Ok. Please show me that song (“Believe”) from the selected album and play it.)

26 A Walk Through the Final Turns
Wizard's actions:
- Database search
- Select “album presentation” (vs. songs or artists)
- Select “list presentation” (vs. tables or textual summary)
- Utterance: “Ich zeige Ihnen die Liste an.” (I am displaying the list.)
- Audio is sent to typist
- Text is sent to speech synthesis
User: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.” (Ok. Please show me that song (“Believe”) from the selected album and play it.)

27 Example (1)
Wizard says: “Ich zeige Ihnen die Liste an.” (I am displaying the list.) and clicks on the list presentation.

30 Options presenter with User-Tab

31 Data Flow
[Diagram: data flow between wizard, subject, and two typists; channels labeled graphics and audio data]

32 Example (2)
The wizard typist types the wizard’s spoken text: I am displaying the list.

33 Data Flow
[Diagram: data flow between wizard, subject, and two typists; channels labeled graphics, synthesized audio data, and audio data]

34 Example (3)
The user listens to the wizard's text, synthesized by Mary, and receives the selected list presentation.

36 Example (4)
The user selects one album and says: “Ok. Zeige mir bitte das Lied aus dem ausgewählten Album und spiel das vor.” (Ok. Please show me that song (“Believe”) from the selected album and play it.)

37 Automatically updated wizard screen with check

38 Data Flow
[Diagram: data flow between wizard, subject, and two typists; channels labeled graphics, synthesized audio data, text, and audio data]

39 Example (5)
The user typist types the user’s spoken text: Ok. Please show me that song (“Believe”) from the selected album and play it.

40 Data Flow
[Diagram: data flow between wizard, subject, and two typists; channels labeled graphics, synthesized audio data, text, and audio data]

41 Example (6)
The wizard gets a correspondingly updated TextBox window.

42 The current experimental setup
Usability Lab, Building C7 4

43 GUI Development
[Screenshots: old and new GUI]

44 Outline
- The Software Development Cycle
- Dialogue Design
- Wizard-of-Oz Experiments
- Dialogue System Evaluation

45 Different levels of evaluation
- Technical evaluation
- Usability evaluation
- Customer evaluation
According to: L. Dybkjaer, N. Bernsen, W. Minker, "Overview of evaluation and usability", in: W. Minker et al., Spoken multimodal human-computer dialogue in mobile environments, Springer 2005.

46 Different levels of evaluation
- Technical evaluation
  - Typically component evaluation (ASR, TTS, grammar), but also, e.g., system robustness
  - Quantitative and objective, to some extent
- Usability evaluation
- Customer evaluation

47 Evaluation of ASR Systems
- Word error rate (WER)
- Speed (real-time performance)
- Size of lexicon
- Perplexity
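WER is commonly computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, with the error counts taken from a minimum edit-distance alignment. The sketch below follows that definition; the function name and the whitespace tokenization are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between hypothesis and reference, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("play the song", "play a song"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.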

48 Evaluation of TTS
- Intuitive evaluation by users with respect to
  - intelligibility
  - pleasantness
  - naturalness
- No objective (though quantitative) criteria, but extremely important for user satisfaction

49 Different levels of evaluation
- Technical evaluation
- Usability evaluation
  - Evaluation of user satisfaction
  - Typically end-to-end evaluation
  - Mostly subjective and qualitative measures
- Customer evaluation

50 Different levels of evaluation
- Technical evaluation
- Usability evaluation
- Customer evaluation, including aspects like:
  - Costs
  - Platform compatibility
  - Maintenance

51 Usability Evaluation
Mostly soft criteria: "Usability Guidelines", best-practice rules, form the basis of expert evaluation or user questionnaires.

52 Usability Guidelines
… from Dybkjaer et al.:
- Feedback adequacy: The user must feel confident that the system has understood the information input in the way it was intended
- …
- Naturalness of the dialogue structure
- Sufficiency of interaction guidance
- Sufficiency of adaptation to user differences

53 Usability Evaluation
- Mostly soft criteria: "Usability Guidelines", best-practice rules, form the basis of expert evaluation or user questionnaires.
- Hard, measurable criteria often contradict each other: systems with high task success may lack efficiency, and vice versa.
- Is it possible to evaluate usability in an objective, predictive, and general way?
- Is there one (maybe parametrized) measure for User Satisfaction?

54 PARADISE
An attempt to provide an objective, quantitative, operational basis for qualitative user assessments.
M. Walker, D. Litman, C. Kamm, A. Abella: "PARADISE: A framework for evaluating spoken dialogue agents", Proc. of ACL 1997

55 PARADISE: The Idea
- The top criterion for usability evaluation is user satisfaction. It is an intuitive criterion which cannot be measured directly, but is only accessible through qualitative user judgments.
- User satisfaction is
  - correlated with task success (effectiveness)
  - inversely correlated with the dialogue costs.
- There are features that can be easily and objectively extracted from dialogue logfiles, which approximate both task success and dialogue costs.

56 PARADISE: The Idea
- Take a set of dialogues produced by interaction of a dialogue system A with different subjects.
- Let the users assess their satisfaction with the dialogue.
- Calculate the task success, and read the different measures for dialogue costs off the log files.
- Compute the correlation between satisfaction assessment and quantitative measures (via multiple linear regression).
Results:
- Prediction of user satisfaction for new individual dialogues with system A, or for dialogues with a modified system A'.
- Comparison of different dialogue systems A and B with respect to user satisfaction.
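The regression step can be sketched in plain Python as ordinary least squares over normalized features. This is an illustration under assumptions, not the PARADISE authors' code: the function names, the z-score normalization with the population standard deviation, the added intercept, and the feature names in the usage example are all choices made here, and the numbers are invented.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_normalize(values: List[float]) -> List[float]:
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("feature has no variance")
    return [(v - m) / s for v in values]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features: Dict[str, List[float]],
                satisfaction: List[float]) -> Dict[str, float]:
    """Least-squares weights of the normalized features, plus an intercept."""
    names = list(features)
    columns = [z_normalize(features[name]) for name in names]
    columns.append([1.0] * len(satisfaction))  # intercept column
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * y for p, y in zip(ci, satisfaction)) for ci in columns]
    return dict(zip(names + ["intercept"], solve_linear_system(xtx, xty)))


# Invented per-dialogue data for five dialogues with one system.
weights = fit_weights(
    {"task_success": [0.2, 0.5, 0.9, 0.4, 0.7],
     "elapsed_time": [120.0, 60.0, 90.0, 150.0, 80.0]},
    satisfaction=[18.0, 27.0, 33.0, 20.0, 30.0],
)
print(weights)
```

A positive weight on task success and a negative weight on a cost feature would match the correlations the slides describe; with real data, only features with a significant contribution would normally be kept.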

57 PARADISE: The Structure
- Maximise user satisfaction
  - Maximise task success
  - Minimise costs
    - Efficiency measures
    - Qualitative measures

58 Efficiency and Quality Measures
Efficiency measures:
- Elapsed time
- System turns
- User turns
Quality measures:
- # of timeout prompts
- # of rejects
- # of helps
- # of cancels
- # of barge-ins
- Mean ASR score

59 A Measure for Task Success
- Option 1: Yes/No evaluation for the complete dialogue
- Option 2, available for dialogue systems using the form-filling paradigm: let task success be determined by the fields in the form filled with correct values.
This and the following 3 slides will not be part of the exam.

60 Tasks as Attribute-Value Matrices

61 An Instance

62 A Measure for Task Success
Identify task success with the $\kappa$ value for agreement between actual and intended values for the AVM ($\kappa$ is usually employed for measuring inter-annotator agreement).
$$\kappa = \frac{P(A) - P(E)}{1 - P(E)}$$
P(A) is the actual relative frequency of coincidence between values, P(E) the expected frequency.
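As a rough sketch, the agreement value can be computed from a confusion matrix whose rows are the values the system ended up with and whose columns are the intended values. The function name is hypothetical, and estimating P(E) from the squared column proportions is an assumption made here, not something the slide specifies.

```python
from typing import List


def kappa_from_confusion(matrix: List[List[int]]) -> float:
    """Kappa = (P(A) - P(E)) / (1 - P(E)) for a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # P(A): proportion of cases where actual and intended value coincide.
    p_a = sum(matrix[i][i] for i in range(size)) / total
    # P(E): chance agreement, estimated here from the column totals.
    column_totals = [sum(row[j] for row in matrix) for j in range(size)]
    p_e = sum((t / total) ** 2 for t in column_totals)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when P(E) equals 1")
    return (p_a - p_e) / (1 - p_e)


# Two possible values for one attribute, 20 observations in total.
print(kappa_from_confusion([[8, 2], [2, 8]]))
```

A value of 1 means every field was filled as intended, while 0 means the agreement is no better than chance.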

63 PARADISE: The Structure
- Maximise user satisfaction
  - Maximise task success
  - Minimise costs
    - Efficiency measures
    - Qualitative measures

64 User Satisfaction
Measured by adding the scores assigned to 8 questions by the subjects.

65 A user satisfaction questionnaire
- Was the system easy to understand?
- Did the system understand what you said?
- Was it easy to find the information you wanted?
- Was the pace of interaction with the system appropriate?
- Did you know what you could say at each point in the dialogue?

66 A user satisfaction questionnaire
- How often was the system sluggish and slow to reply to you?
- Did the system work the way you expected it to?
- From your current experience with using the system, do you think you would use the system regularly?

67 A hypothetical example
This and the following slide will not be part of the exam.

68 The Performance Function
$$\text{Performance} = \alpha \cdot N(\kappa) - \sum_{i=1}^{n} w_i \cdot N(c_i)$$
- $N$ is a normalisation function, based on standard deviation,
- $N(\kappa)$ is normalised task success,
- $N(c_i)$ are the normalised cost factors,
- $\alpha$ and $w_i$ are weights on $\kappa$ and the $c_i$, respectively.
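A minimal sketch of how the performance function could be evaluated per dialogue is given below. It assumes that N is the z-score over the dialogues in the sample (using the population standard deviation); the function names, the cost feature name, and the numbers are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(values: List[float]) -> List[float]:
    """N(x): subtract the mean and divide by the standard deviation."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot normalise a constant feature")
    return [(v - m) / s for v in values]


def performance(kappas: List[float],
                costs: Dict[str, List[float]],
                alpha: float,
                weights: Dict[str, float]) -> List[float]:
    """alpha * N(kappa) minus the weighted sum of N(c_i), for each dialogue."""
    norm_kappa = z_scores(kappas)
    norm_costs = {name: z_scores(values) for name, values in costs.items()}
    return [
        alpha * norm_kappa[d]
        - sum(weights[name] * norm_costs[name][d] for name in costs)
        for d in range(len(kappas))
    ]


# Two dialogues: the second has higher task success but also takes longer.
print(performance(
    kappas=[0.0, 1.0],
    costs={"elapsed_time": [10.0, 20.0]},
    alpha=2.0,
    weights={"elapsed_time": 1.0},
))
```

Normalising first puts task success and the different cost measures on one scale, so the weights can be compared with each other.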

69 Comments on PARADISE
- The criterion for feature selection is the easy availability of features through log files. Is it really the interesting features that are selected?
- There is no strong theoretical foundation for the choice of questions in the user questionnaire.
- Does the methodology extend to more complex dialogue applications in real-world environments?

70 General Comments
- A trade-off between precision/objectivity and usefulness:
  - PARADISE: (more or less) precise and objective, but of limited practical use.
  - Evaluation guidelines: of some practical use, but not really objective.
- The most useful device is intuition, if it is, at least in part, an artist's intuition: dialogue design is art as well as technology.

