Belief Updating in Spoken Dialog Systems Dan Bohus Computer Science Department Carnegie Mellon University Pittsburgh,

Belief Updating in Spoken Dialog Systems Dan Bohus www.cs.cmu.edu/~dbohus dbohus@cs.cmu.edu Computer Science Department Carnegie Mellon University Pittsburgh, PA, 15217

2 problem
spoken language interfaces lack robustness when faced with understanding errors.
- stems mostly from speech recognition
- spans most domains and interaction types

3 more concretely …
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at …

4 non- and misunderstandings
(same dialog as on slide 3, with its turns labeled NON-understanding and MIS-understanding)

5 approaches for increasing robustness
- fix recognition
- gracefully handle errors through interaction
  - detect the problems
  - develop a set of recovery strategies
  - know how to choose between them (policy)

6 six not-so-easy pieces …
detection, strategies, policy; each for misunderstandings and for non-understandings

7 belief updating
(detection / misunderstandings)
- construct more accurate beliefs by integrating information over multiple turns
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
   destination = {?}

8 belief updating: problem statement
   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
   destination = {?}
- given:
  - an initial belief P_initial(C) over concept C
  - a system action SA
  - a user response R
- construct an updated belief:
  - P_updated(C) ← f(P_initial(C), SA, R)

9 outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
related work : restricted version : data : user response analysis : experiment & results : caveats & future work

10 confidence annotation + heuristic updates
- confidence annotation
  - traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
  - more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
  - machine learning approach
  - results fairly good, but not perfect
- heuristic updates
  - explicit confirmation: no → don’t trust; yes → trust
  - implicit confirmation: no → don’t trust; otherwise → trust
  - suboptimal for several reasons
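
The heuristic update rules on this slide can be sketched as follows. This is only an illustration, not the system's actual code: the function name `heuristic_update`, the string labels, the numeric stand-ins for "trust" (1.0) and "don't trust" (0.0), and the choice to leave the confidence unchanged after an OTHER response to an explicit confirmation are all assumptions.

```python
def heuristic_update(initial_conf: float, action: str, response: str) -> float:
    """Apply the slide's trust / don't-trust rules to a confidence score.

    action:   "explicit" or "implicit" confirmation (labels are assumed)
    response: "YES", "NO", or "OTHER" (labels are assumed)
    """
    TRUST, DONT_TRUST = 1.0, 0.0  # assumed numeric stand-ins

    if action == "explicit":
        if response == "YES":
            return TRUST
        if response == "NO":
            return DONT_TRUST
        # The slide does not cover this case; keeping the score is an assumption.
        return initial_conf
    if action == "implicit":
        # no -> don't trust; anything else -> trust
        return DONT_TRUST if response == "NO" else TRUST
    raise ValueError(f"unknown confirmation action: {action!r}")
```

For example, `heuristic_update(0.65, "implicit", "OTHER")` trusts the value outright, which is one way such rules can be suboptimal when the user simply ignores an error.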

11 correction detection
- detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
- machine learning approach
  - features from different knowledge sources in the system
- results fairly good, but not perfect

12 integration
- confidence annotation and correction detection are useful tools
- but separately, neither solves the problem
- bridge together in a unified approach to accurately track beliefs

14 belief updating: general form
- given:
  - an initial belief P_initial(C) over concept C
  - a system action SA
  - a user response R
- construct an updated belief:
  - P_updated(C) ← f(P_initial(C), SA, R)

15 restricted version: 2 simplifications
- compact belief
  - system unlikely to “hear” more than 3 or 4 values
    - single vs. multiple recognition results
  - in our data: max = 3 values, only 6.9% have >1 value
  - confidence score of top hypothesis
- updates after confirmation actions
- reduced problem
  - ConfTop_updated(C) ← f(ConfTop_initial(C), SA, R)
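
The "compact belief" simplification can be illustrated with a small sketch that reduces a belief over several values to the confidence of its top hypothesis. The dictionary representation and the name `conf_top` are assumptions made for this example, not the system's data structures.

```python
from typing import Dict, Optional, Tuple


def conf_top(belief: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Return (value, confidence) for the top hypothesis, or None if empty."""
    if not belief:
        return None
    # Pick the value with the highest confidence score.
    value = max(belief, key=belief.get)
    return value, belief[value]


# Hypothetical belief over the destination concept, using values from the slides.
print(conf_top({"seoul": 0.65}))
```

The restricted update then only has to map this single score, the system action, and the user response to a new score.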

17 data
- collected with RoomLine
  - a phone-based mixed-initiative spoken dialog system
  - conference room reservation search and negotiation
- explicit and implicit confirmations
- confidence threshold model (+ some exploration)
- unplanned implicit confirmations
  - I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?

18 corpus
- user study
  - 46 participants (naïve users)
  - 10 scenario-based interactions each
  - compensated per task success
- corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - rich annotation: correct concepts, corrections, etc.

20 user response types
- following Krahmer and Swerts
  - study on Dutch train-table information system
- 3 user response types
  - YES: yes, right, that’s right, correct, etc.
  - NO: no, wrong, etc.
  - OTHER
- cross-tabulated against correctness of confirmations
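
A minimal sketch of how responses might be sorted into the three types is shown below. The marker words come from the slide; the rule of matching only the opening words of the utterance, and the names `classify_response`, `YES_MARKERS`, and `NO_MARKERS`, are assumptions for illustration and not the annotation procedure used in the study.

```python
import re

# Marker phrases taken from the slide; the matching rule is an assumption.
YES_MARKERS = ("that's right", "yes", "right", "correct")
NO_MARKERS = ("no", "wrong")


def _tokens(text: str) -> list:
    # Normalize curly apostrophes, lowercase, and keep word-like tokens only.
    return re.findall(r"[a-z']+", text.replace("\u2019", "'").lower())


def _starts_with(tokens: list, phrase: str) -> bool:
    words = phrase.split()
    return tokens[:len(words)] == words


def classify_response(text: str) -> str:
    """Label a user response as YES, NO, or OTHER by its opening words."""
    tokens = _tokens(text)
    if any(_starts_with(tokens, m) for m in NO_MARKERS):
        return "NO"
    if any(_starts_with(tokens, m) for m in YES_MARKERS):
        return "YES"
    return "OTHER"


print(classify_response("no no I'm traveling to Birmingham"))
```

A response such as "Urbana Champaign" falls into OTHER under this rule, even though it may carry a correction.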

21 user responses to explicit confirmations
- from transcripts [numbers in brackets from Krahmer & Swerts]
            YES         NO          Other
CORRECT     94% [93%]   0% [0%]     5% [7%]
INCORRECT   1% [6%]     72% [57%]   27% [37%]
(slide callout: ~10%)
- from decoded
            YES   NO    Other
CORRECT     87%   1%    12%
INCORRECT   1%    61%   38%

22 other responses to explicit confirmations
- ~70% users repeat the correct value
- ~15% users don’t address the question
  - attempt to shift conversation focus
            User does not correct   User corrects
CORRECT     1159                    0
INCORRECT   29 [10% of incor]       250 [90% of incor]

23 user responses to implicit confirmations
- from transcripts [numbers in brackets from Krahmer & Swerts]
            YES        NO          Other
CORRECT     30% [0%]   7% [0%]     63% [100%]
INCORRECT   6% [0%]    33% [15%]   61% [85%]
- from decoded
            YES   NO    Other
CORRECT     28%   5%    67%
INCORRECT   7%    27%   66%

24 ignoring errors in implicit confirmations
            User does not correct   User corrects
CORRECT     552                     2
INCORRECT   118 [51% of incor]      111 [49% of incor]
- users correct later (40% of 118)
- users interact strategically
  - correct only if essential
            ~correct later   correct later
~critical   55               2
critical    14               47
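
The shares quoted on this slide can be checked with a few lines of arithmetic. The helper name `row_shares` is hypothetical, and the second table is read here as 55, 2, 14, and 47, an assumed split of the flattened cells that is consistent with the total of 118 uncorrected errors.

```python
def row_shares(counts):
    """Return each count as a fraction of the row total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("row has no observations")
    return [c / total for c in counts]


# Incorrect implicit confirmations: not corrected vs. corrected right away.
print(row_shares([118, 111]))

# Assumed reading of the second table: [not corrected later, corrected later].
print(row_shares([55, 2]))    # non-critical concepts
print(row_shares([14, 47]))   # critical concepts
```

Under this reading, users later correct about 77% of the critical errors (47 of 61) but under 4% of the non-critical ones (2 of 57), which is in line with the "correct only if essential" observation.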

26 machine learning approach
- need good probability outputs
  - low cross-entropy between model predictions and reality
  - cross-entropy = negative average log posterior
- logistic regression
  - sample efficient
  - stepwise approach → feature selection
- logistic model tree for each action
  - root splits on response-type
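
The cross-entropy criterion named on this slide can be sketched as below, alongside a thresholded error rate. The function names, the 0.5 decision threshold, the use of natural logarithms, and the mapping of these two quantities onto the "soft error" and "hard error" labels of the results slides are assumptions made for illustration.

```python
import math


def soft_error(probs, labels):
    """Cross-entropy: negative average log-probability of the true outcome.

    probs:  predicted probability that each value is correct
    labels: 1 if the value was actually correct, else 0
    """
    eps = 1e-12  # guards against log(0) for overconfident predictions
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y else 1.0 - p
        total += math.log(max(p_true, eps))
    return -total / len(labels)


def hard_error(probs, labels, threshold=0.5):
    """Fraction of cases where the thresholded prediction is wrong."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


# Hypothetical predictions for three updated beliefs.
predicted = [0.9, 0.2, 0.6]
actual = [1, 0, 0]
print(hard_error(predicted, actual), soft_error(predicted, actual))
```

Unlike the hard error, the cross-entropy penalizes a confident wrong prediction more than a hesitant one, which is why it suits a model whose probability outputs are used directly as beliefs.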

27 features. target.
- initial situation
  - initial confidence score
  - concept identity, dialog state, turn number
- system action
  - other actions performed in parallel
- features of the user response
  - acoustic / prosodic features
  - lexical features
  - grammatical features
  - dialog-level features
- target: was the value correct?

28 baselines
- initial baseline
  - accuracy of system beliefs before the update
- heuristic baseline
  - accuracy of heuristic rule currently used in the system
- oracle baseline
  - accuracy if we knew exactly when the user is correcting the system

29 results: explicit confirmation
[figure: hard error (%) and soft error; chart values not preserved in the transcript]

30 results: implicit confirmation
[figure: hard error (%) and soft error; chart values not preserved in the transcript]

31 results: unplanned implicit confirmation
[figure: hard error (%) and soft error; chart values not preserved in the transcript]

32 informative features
- initial confidence score
- prosody features
- barge-in
- expectation match
- repeated grammar slots
- concept id

33 outline
- related work
- a reduced version. approach
- data
- user response analysis
- experiments and results
- some caveats and future work

34 eliminate simplification 1
- current restricted version
  - belief = confidence score of top hypothesis
  - only 6.9% of cases had more than 1 hypothesis
- extend to
  - N hypotheses + 1 (other), where N is a small integer (2 or 3)
  - approach: multinomial generalized linear model
  - use information from multiple recognition hypotheses
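
One common instance of a multinomial generalized linear model is multinomial logistic regression, where a softmax turns one score per class into a probability distribution. The sketch below shows only that final step for N hypotheses plus an "other" class. The name `softmax_belief` and the example scores are hypothetical, and in a fitted model each score would be a learned linear function of the features rather than a hand-set number.

```python
import math
from typing import Dict


def softmax_belief(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-class scores into a belief that sums to 1."""
    # Subtract the maximum score for numerical stability.
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    norm = sum(exps.values())
    return {k: e / norm for k, e in exps.items()}


# Hypothetical scores for N = 2 hypotheses plus the "other" class.
belief = softmax_belief({"seoul": 0.4, "berlin": 1.1, "other": 0.0})
print(belief)
```

Keeping an explicit "other" class lets the belief place mass on values the recognizer never produced, such as the user's true destination in the running example.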

35 eliminate simplification 2
- current restricted version
  - only updates following system confirmation actions
- extend to
  - updates after all system actions
  - users might correct the system at any point

36 shameless self promotion
(grid: detection, strategies, policy; each for misunderstandings and for non-understandings)
- rejection threshold adaptation
- non-understanding impact on performance [Interspeech-05]
- comparative analysis of 10 recovery strategies [SIGdial-05]
- wizard experiment
- towards learning non-understanding recovery policies [SIGdial-05]

37 shameless CMU promotion
- Ananlada (Moss) Chotimongkol
  - automatic concept and task structure acquisition
- Antoine Raux
  - turn-taking, conversation micro-management
- Jahanzeb Sherwani
  - multimodal personal information management
- Satanjeev Banerjee
  - meeting understanding
- Stefanie Tomko
  - universal speech interface
- Thomas Harris
  - multi-participant dialog
- DoD / Young Researchers’ Roundtable

38 thank you!

39 a more subtle caveat
- distribution of training data
  - confidence annotator + heuristic update rules
- distribution of run-time data
  - confidence annotator + learned model
- always a problem when interacting with the world
- hopefully, distribution shift will not cause large degradation in performance
  - remains to validate empirically
  - maybe a bootstrap approach?
