Belief Updating in Spoken Dialog Systems Dan Bohus Computer Science Department Carnegie Mellon University Pittsburgh, PA


Belief Updating in Spoken Dialog Systems Dan Bohus Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15217

2 problem spoken language interfaces lack robustness when faced with understanding errors.  stems mostly from speech recognition  spans most domains and interaction types

3 more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm, arrives Seoul at …

4 non-understandings and misunderstandings (the slide labels each problematic turn below as a NON-understanding or a MIS-understanding)
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry, I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm, arrives Seoul at …

5 approaches for increasing robustness  gracefully handle errors through interaction  fix recognition  detect the problems  develop a set of recovery strategies  know how to choose between them ( policy )

6 six not-so-easy pieces … a grid of {misunderstandings, non-understandings} × {detection, strategies, policy}

7 belief updating (detection of misunderstandings)  construct more accurate beliefs by integrating information over multiple turns
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}

8 belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?
destination = {seoul/0.65}
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {?}
 given:  an initial belief P_initial(C) over concept C  a system action SA  a user response R
 construct an updated belief:  P_updated(C) ← f(P_initial(C), SA, R)

9 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work related work : restricted version : data : user response analysis : experiment & results : caveats & future work

10 confidence annotation + heuristic updates  confidence annotation  traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]  more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]  machine learning approach; results fairly good, but not perfect  heuristic updates  explicit confirmation: no → don’t trust; yes → trust  implicit confirmation: no → don’t trust; o/w → trust  suboptimal for several reasons
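The heuristic update rules above can be sketched as a small function. All names here (`heuristic_update`, the action and response strings) are illustrative placeholders, not taken from the actual system:

```python
def heuristic_update(confidence, system_action, user_response):
    """Heuristic belief update after a confirmation action.

    Mirrors the rules on the slide: after an explicit confirmation,
    "no" means don't trust the hypothesis and "yes" means trust it;
    after an implicit confirmation, anything other than "no" is
    trusted. Behavior for other cases is assumed here.
    """
    if system_action == "explicit_confirm":
        if user_response == "no":
            return 0.0          # don't trust
        if user_response == "yes":
            return 1.0          # trust
        return confidence       # other responses: leave belief as-is (assumed)
    if system_action == "implicit_confirm":
        return 0.0 if user_response == "no" else 1.0
    return confidence           # no confirmation action: no update
```

The slide calls these rules suboptimal: collapsing the belief to 0.0 or 1.0 discards the graded evidence that a learned model could exploit.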

11 correction detection  detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]  machine learning approach  features from different knowledge sources in the system  results fairly good, but not perfect

12 integration  confidence annotation and correction detection are useful tools  but separately, neither solves the problem  bring them together in a unified approach to accurately track beliefs

13 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work

14 belief updating: general form  given:  an initial belief P_initial(C) over concept C  a system action SA  a user response R  construct an updated belief:  P_updated(C) ← f(P_initial(C), SA, R)

15 restricted version: 2 simplifications  compact belief  single vs. multiple recognition results: system unlikely to “hear” more than 3 or 4 values  in our data: max = 3 values, only 6.9% have >1 value  track only the confidence score of the top hypothesis  updates only after confirmation actions  reduced problem  ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)
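Under these two simplifications, belief updating reduces to mapping a single confidence score, the confirmation action, and features of the user response to a new score. A minimal sketch of that reduced interface; `update_conf_top`, `model`, and the action strings are hypothetical names, not the actual implementation:

```python
def update_conf_top(conf_top, system_action, response_features, model):
    """Restricted belief update: track only the top hypothesis'
    confidence score, and update only after confirmation actions.

    `model` stands in for any learned predictor returning
    P(value is correct | features).
    """
    confirmations = ("explicit_confirm", "implicit_confirm",
                     "unplanned_implicit_confirm")
    if system_action not in confirmations:
        return conf_top  # outside the restricted version's scope
    features = {"initial_confidence": conf_top,
                "system_action": system_action,
                **response_features}
    return model(features)
```

The point of the interface is that the heuristic rules and any learned model are interchangeable values of `model`.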

16 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work

17 data  collected with RoomLine  a phone-based mixed-initiative spoken dialog system  conference room reservation: search and negotiation  explicit and implicit confirmations  confidence threshold model (+ some exploration)  unplanned implicit confirmations  I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?

18 corpus  user study  46 participants (naïve users)  10 scenario-based interactions each  compensated per task success  corpus  449 sessions, 8848 user turns  orthographically transcribed  rich annotation: correct concepts, corrections, etc.

19 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work

20 user response types  following Krahmer and Swerts  study on a Dutch train-table information system  3 user response types  YES: yes, right, that’s right, correct, etc.  NO: no, wrong, etc.  OTHER  cross-tabulated against correctness of confirmations

21 user responses to explicit confirmations
 from transcripts [numbers in brackets from Krahmer & Swerts]
            YES        NO         Other
CORRECT     94% [93%]  0% [0%]    5% [7%]
INCORRECT   1% [6%]    72% [57%]  27% [37%]
~10%
 from decoded speech
            YES   NO    Other
CORRECT     87%   1%    12%
INCORRECT   1%    61%   38%

22 other responses to explicit confirmations
 ~70% of users repeat the correct value
 ~15% of users don’t address the question  attempt to shift conversation focus
            User does not correct   User corrects
CORRECT     1159                    0
INCORRECT   29 [10% of incor]       250 [90% of incor]

23 user responses to implicit confirmations
 from transcripts [numbers in brackets from Krahmer & Swerts]
            YES       NO        Other
CORRECT     30% [0%]  7% [0%]   63% [100%]
INCORRECT   6% [0%]   33% [15%] 61% [85%]
 from decoded speech
            YES   NO    Other
CORRECT     28%   5%    67%
INCORRECT   7%    27%   66%

24 ignoring errors in implicit confirmations
            User does not correct   User corrects
CORRECT     552                     2
INCORRECT   118 [51% of incor]      111 [49% of incor]
 users correct later (40% of the 118)
 users interact strategically: correct only if essential
            ~correct later   correct later
~critical   55               2
critical    14               47

25 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work

26 machine learning approach  need good probability outputs  low cross-entropy between model predictions and reality  cross-entropy = negative average log posterior  logistic regression  sample efficient  stepwise approach → feature selection  logistic model tree for each action  root splits on response-type
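The evaluation metric named above, cross-entropy as the negative average log posterior, can be written down directly. A minimal sketch, assuming binary labels (value correct / incorrect) and predicted probabilities of correctness:

```python
import math

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative average log posterior assigned to the true outcome.

    y_true: 1 if the concept value was in fact correct, else 0.
    p_pred: the model's predicted probability that it is correct.
    Lower is better; a well-calibrated, confident model scores low.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return -total / len(y_true)
```

For instance, a model that always predicts 0.5 scores log 2 ≈ 0.693 nats, while an oracle that always predicts the true outcome approaches 0.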

27 features. target.  initial situation  initial confidence score  concept identity, dialog state, turn number  system action  other actions performed in parallel  features of the user response  acoustic / prosodic features  lexical features  grammatical features  dialog-level features  target: was the value correct?

28 baselines  initial baseline  accuracy of system beliefs before the update  heuristic baseline  accuracy of heuristic rule currently used in the system  oracle baseline  accuracy if we knew exactly when the user is correcting the system

29 results: explicit confirmation [charts: hard error (%) and soft error]

30 results: implicit confirmation [charts: hard error (%) and soft error]

31 results: unplanned implicit confirmation [charts: hard error (%) and soft error]

32 informative features  initial confidence score  prosody features  barge-in  expectation match  repeated grammar slots  concept id

33 outline  related work  a restricted version  data  user response analysis  experiments and results  some caveats and future work

34 eliminate simplification 1  current restricted version  belief = confidence score of top hypothesis  only 6.9% of cases had more than 1 hypothesis  extend to  N hypotheses + 1 (other), where N is a small integer (2 or 3)  approach: multinomial generalized linear model  use information from multiple recognition hypotheses
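The proposed multinomial generalized linear model ultimately produces a distribution over the N hypotheses plus an "other" outcome via a softmax. A sketch of just that final step, with illustrative names; in a real model the scores would come from learned feature weights rather than being passed in directly:

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    z = sum(exps)
    return [e / z for e in exps]

def update_k_plus_other(hyp_scores, other_score=0.0):
    """Updated belief over N recognition hypotheses plus 'other'.

    hyp_scores: one linear-model score per hypothesis.
    Returns a distribution whose last entry is P(other).
    """
    return softmax(list(hyp_scores) + [other_score])
```

With N = 2 or 3 as on the slide, the output is a 3- or 4-way distribution, generalizing the single top-hypothesis confidence of the restricted version.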

35 eliminate simplification 2  current restricted version  only updates following system confirmation actions  extend to  updates after all system actions  users might correct the system at any point

36 shameless self promotion: detection / strategies / policy × misunderstandings / non-understandings - rejection threshold adaptation - nonu impact on performance [Interspeech-05] - comparative analysis of 10 recovery strategies [SIGdial-05] - wizard experiment - towards learning nonu recovery policies [SIGdial-05]

37 shameless CMU promotion  Ananlada (Moss) Chotimongkol  automatic concept and task structure acquisition  Antoine Raux  turn-taking, conversation micro-management  Jahanzeb Sherwani  multimodal personal information management  Satanjeev Banerjee  meeting understanding  Stefanie Tomko  universal speech interface  Thomas Harris  multi-participant dialog  DoD / Young Researchers’ Roundtable

38 thank you!

39 a more subtle caveat  distribution of training data  confidence annotator + heuristic update rules  distribution of run-time data  confidence annotator + learned model  always a problem when interacting with the world  hopefully, distribution shift will not cause large degradation in performance  remains to validate empirically  maybe a bootstrap approach?