Belief Updating in Spoken Dialog Systems
Dan Bohus
www.cs.cmu.edu/~dbohus
dbohus@cs.cmu.edu
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15217
problem
- spoken language interfaces lack robustness when faced with understanding errors
- the problem stems mostly from speech recognition
- it spans most domains and interaction types
more concretely … (bracketed text: what the speech recognizer heard)
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I'm not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………
non- and misunderstandings
the same dialog, annotated with two types of understanding errors:
- NON-understanding: the system fails to obtain an interpretation of the user's turn, e.g. Urbana Champaign [OKAY IN THAT SAME PAY], after which the system asks the same question again
- MIS-understanding: the system obtains an incorrect interpretation and acts on it, e.g. Huntsville [SEOUL], after which the system continues with "traveling to Seoul"
approaches for increasing robustness
- fix recognition
- gracefully handle errors through interaction:
  - detect the problems
  - develop a set of recovery strategies
  - know how to choose between them (policy)
six not-so-easy pieces …
{detection, strategies, policy} × {misunderstandings, non-understandings}
belief updating (detection of misunderstandings)
- construct more accurate beliefs by integrating information over multiple turns
(the number after "/" is the confidence score)
S: Where would you like to go?
U: Huntsville [SEOUL / 0.65]
   destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I'm traveling to Birmingham [THE TRAVELING TO BERLIN P_M / 0.60]
   destination = {?}
belief updating: problem statement
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M / 0.60]
destination = {seoul/0.65} → destination = {?}
given:
- an initial belief $P_{\text{initial}}(C)$ over concept $C$
- a system action $SA$
- a user response $R$
construct an updated belief:
$P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
outline
- related work
- a restricted version
- data
- user response analysis
- experiments and results
- some caveats and future work
related work : restricted version : data : user response analysis : experiment & results : caveats & future work
confidence annotation + heuristic updates
- confidence annotation
  - traditionally focused on word-level errors [Chase, Cox, Bansal, Ravishankar]
  - more recently: semantic confidence annotation [Walker, San-Segundo, Bohus]
  - machine learning approach
  - results fairly good, but not perfect
- heuristic updates
  - explicit confirmation: no → don't trust; yes → trust
  - implicit confirmation: no → don't trust; otherwise → trust
  - suboptimal for several reasons
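The heuristic update rules above can be written down directly. A minimal Python sketch, assuming "trust" maps to a confidence of 1.0 and "don't trust" to 0.0 (the slide only says trust / don't trust), and assuming that an explicit confirmation answered with something other than yes or no leaves the belief unchanged, a case the slide does not cover. The enum and function names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    EXPLICIT_CONFIRM = "explicit"
    IMPLICIT_CONFIRM = "implicit"


class Response(Enum):
    YES = "yes"
    NO = "no"
    OTHER = "other"


def heuristic_update(conf: float, action: Action, response: Response) -> float:
    """Heuristic belief update after a confirmation (hypothetical encoding).

    Assumption: "trust" -> 1.0, "don't trust" -> 0.0.
    """
    if action is Action.EXPLICIT_CONFIRM:
        if response is Response.YES:
            return 1.0   # yes -> trust
        if response is Response.NO:
            return 0.0   # no -> don't trust
        return conf      # assumption: no rule given for OTHER, keep the belief as-is
    # implicit confirmation: no -> don't trust, otherwise -> trust
    return 0.0 if response is Response.NO else 1.0


print(heuristic_update(0.65, Action.IMPLICIT_CONFIRM, Response.OTHER))  # 1.0
```

Written this way, one reason the rule is suboptimal becomes visible: the initial confidence score is either ignored or passed through unchanged, and nothing in the user response beyond the yes/no decision is used.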
correction detection
- detect if the user is trying to correct the system [Litman, Swerts, Hirschberg, Krahmer, Levow]
- machine learning approach
- features from different knowledge sources in the system
- results fairly good, but not perfect
integration
- confidence annotation and correction detection are useful tools
- but separately, neither solves the problem
- bridge them together in a unified approach to accurately track beliefs
belief updating: general form
given:
- an initial belief $P_{\text{initial}}(C)$ over concept $C$
- a system action $SA$
- a user response $R$
construct an updated belief:
$P_{\text{updated}}(C) \leftarrow f(P_{\text{initial}}(C), SA, R)$
restricted version: 2 simplifications
- simplification 1: compact belief
  - the system is unlikely to "hear" more than 3 or 4 values (single vs. multiple recognition results)
  - in our data: max = 3 values, only 6.9% have >1 value
  - belief = confidence score of the top hypothesis
- simplification 2: updates after confirmation actions only
reduced problem:
$\text{ConfTop}_{\text{updated}}(C) \leftarrow f(\text{ConfTop}_{\text{initial}}(C), SA, R)$
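Simplification 1 can be pictured as a reduction from a small set of candidate values to a single number. A sketch, assuming a compact belief is stored as a value-to-confidence mapping (a hypothetical representation, not necessarily RoomLine's); `conf_top` and `multi_value_rate` are illustrative names.

```python
from typing import Dict, List, Tuple

# Hypothetical compact belief: candidate value -> confidence score,
# e.g. {"seoul": 0.65} for the destination concept.
Belief = Dict[str, float]


def conf_top(belief: Belief) -> Tuple[str, float]:
    """Reduce a compact belief to its top hypothesis and that hypothesis's confidence."""
    value = max(belief, key=lambda v: belief[v])
    return value, belief[value]


def multi_value_rate(beliefs: List[Belief]) -> float:
    """Fraction of beliefs that hold more than one candidate value."""
    return sum(len(b) > 1 for b in beliefs) / len(beliefs)


print(conf_top({"seoul": 0.65}))  # ('seoul', 0.65)
```

The reduction discards nothing when a belief holds a single value, which per the numbers above is the case for roughly 93% of the data (100% minus 6.9%).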
data
- collected with RoomLine
  - a phone-based mixed-initiative spoken dialog system
  - conference room reservation
  - search and negotiation
- explicit and implicit confirmations
  - confidence threshold model (+ some exploration)
- unplanned implicit confirmations, e.g.:
  "I found 10 rooms for Friday between 1 and 3 p.m. Would you like a small room or a large one?"
corpus
- user study
  - 46 participants (naïve users)
  - 10 scenario-based interactions each
  - compensated per task success
- corpus
  - 449 sessions, 8848 user turns
  - orthographically transcribed
  - rich annotation: correct concepts, corrections, etc.
user response types
- following Krahmer and Swerts' study on a Dutch train-table information system
- 3 user response types:
  - YES: yes, right, that's right, correct, etc.
  - NO: no, wrong, etc.
  - OTHER
- cross-tabulated against correctness of confirmations
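The three response types can be approximated with a keyword match over the cue phrases listed above. A naive Python sketch, assuming whole-word matching and a "NO wins over YES" ordering; both are assumptions, and the cue lists are incomplete by construction since the slide's lists end in "etc.". This is not the annotation procedure used in the study.

```python
import re

# Cue phrases taken from the slide (incomplete: the original lists end in "etc.").
YES_CUES = ("yes", "right", "that's right", "correct")
NO_CUES = ("no", "wrong")


def response_type(utterance: str) -> str:
    """Label a user response as YES, NO or OTHER by whole-word keyword matching.

    NO cues are checked first, so "no that's not right" is labeled NO.
    """
    text = utterance.lower()

    def has(cue: str) -> bool:
        return re.search(r"\b" + re.escape(cue) + r"\b", text) is not None

    if any(has(cue) for cue in NO_CUES):
        return "NO"
    if any(has(cue) for cue in YES_CUES):
        return "YES"
    return "OTHER"


print(response_type("no no I'm traveling to Birmingham"))  # NO
```

A bare keyword match is brittle: it labels "that's not right" as YES, for instance. It also says nothing about the content of OTHER responses, which, as the next slides show, often repeat the correct value.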
user responses to explicit confirmations
(rows: was the confirmed value correct?; columns: user response type; row percentages)

from transcripts [numbers in brackets from Krahmer & Swerts]
             YES        NO          OTHER
CORRECT      94% [93%]  0% [0%]     5% [7%]
INCORRECT    1% [6%]    72% [57%]   27% [37%]
~10%

from decoded
             YES   NO    OTHER
CORRECT      87%   1%    12%
INCORRECT    1%    61%   38%
other responses to explicit confirmations
- ~70% of users repeat the correct value
- ~15% of users don't address the question: they attempt to shift the conversation focus

             user does not correct   user corrects
CORRECT      1159                    0
INCORRECT    29 [10% of incorrect]   250 [90% of incorrect]
user responses to implicit confirmations
(rows: was the confirmed value correct?; columns: user response type; row percentages)

from transcripts [numbers in brackets from Krahmer & Swerts]
             YES        NO          OTHER
CORRECT      30% [0%]   7% [0%]     63% [100%]
INCORRECT    6% [0%]    33% [15%]   61% [85%]

from decoded
             YES   NO    OTHER
CORRECT      28%   5%    67%
INCORRECT    7%    27%   66%
ignoring errors in implicit confirmations
             user does not correct    user corrects
CORRECT      552                      2
INCORRECT    118 [51% of incorrect]   111 [49% of incorrect]
- users correct later (40% of the 118)
- users interact strategically: they correct only if essential

                not corrected later   corrected later
not critical    55                    2
critical        14                    47
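The percentages on this slide can be re-derived from the counts in its tables. A short Python check using those counts; the last two lines also express the criticality table as proportions, which makes the "correct only if essential" point concrete.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Implicit confirmations of incorrect values: ignored vs. corrected by the user.
IGNORED, CORRECTED = 118, 111

# The 118 ignored errors, by criticality and by whether they were corrected later.
LATER = {"not critical": 2, "critical": 47}
NOT_LATER = {"not critical": 55, "critical": 14}
assert sum(LATER.values()) + sum(NOT_LATER.values()) == IGNORED

print(share(IGNORED, IGNORED + CORRECTED))    # 51.5
print(share(sum(LATER.values()), IGNORED))    # 41.5
for kind in ("not critical", "critical"):
    print(kind, share(LATER[kind], LATER[kind] + NOT_LATER[kind]))
# not critical 3.5
# critical 77.0
```

Read this way, about 77% of the ignored errors on critical concepts were corrected later, against about 3.5% on non-critical ones, consistent with users correcting only when it matters.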
machine learning approach
- need good probability outputs
  - low cross-entropy between model predictions and reality
  - cross-entropy = negative average log posterior
- logistic regression
  - sample efficient
  - stepwise approach → feature selection
- logistic model tree for each action
  - root splits on response type
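The two ingredients above, a cross-entropy criterion and a logistic regression model, fit in a few lines. A dependency-free Python sketch: `cross_entropy` computes the negative average log posterior of the true labels, and `fit_logistic` minimizes it by plain batch gradient descent. The optimizer, learning rate and epoch count are illustrative choices; the stepwise feature selection and the logistic model tree are not reproduced here, and in practice a library implementation (for example scikit-learn's `LogisticRegression`) would be the usual choice.

```python
import math
from typing import List


def cross_entropy(probs: List[float], labels: List[int]) -> float:
    """Negative average log posterior of the true labels.

    probs[i] is the predicted probability that value i is correct;
    labels[i] is 1 if it actually was correct, else 0.
    """
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += math.log(max(p_true, eps))
    return -total / len(labels)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], x: List[float]) -> float:
    """P(value is correct | x); w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(xs: List[List[float]], ys: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit logistic regression weights by batch gradient descent on cross-entropy."""
    w = [0.0] * (len(xs[0]) + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y          # d(cross-entropy)/d(score) for one example
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Made-up toy data: a single feature, the initial confidence score.
xs = [[0.9], [0.8], [0.7], [0.3], [0.2], [0.1]]
ys = [1, 1, 1, 0, 0, 0]
w = fit_logistic(xs, ys)
print(cross_entropy([predict(w, x) for x in xs], ys))
```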
features and target
- initial situation
  - initial confidence score
  - concept identity, dialog state, turn number
- system action
  - other actions performed in parallel
- features of the user response
  - acoustic / prosodic features
  - lexical features
  - grammatical features
  - dialog-level features
- target: was the value correct?
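A logistic model tree whose root splits on response type amounts, at depth one, to a separate logistic model per response type, each fed features from the groups above. A sketch of that shape; the feature names, the choice of features and all weights are hypothetical illustrations (barge-in and expectation match appear among the informative features listed later), not values learned from the RoomLine data, and a learned tree may well be deeper.

```python
import math
from dataclasses import dataclass


@dataclass
class UpdateFeatures:
    """A few features from the groups above; field names are hypothetical."""
    initial_conf: float        # initial confidence score of the top hypothesis
    response_type: str         # "YES", "NO" or "OTHER" (used for the root split)
    barge_in: bool             # did the user barge in on the system prompt?
    expectation_match: bool    # did the response match what the dialog state expected?


# Hypothetical leaf weights: (bias, initial_conf, barge_in, expectation_match).
# Made-up numbers for illustration only.
LEAF_WEIGHTS = {
    "YES": (-1.0, 6.0, 0.0, 0.5),
    "NO": (-3.0, 2.0, -1.0, 0.0),
    "OTHER": (-2.0, 4.0, -1.5, 1.0),
}


def updated_confidence(f: UpdateFeatures) -> float:
    """Root split on response type, then the logistic model stored in that leaf."""
    b, w_conf, w_barge, w_match = LEAF_WEIGHTS[f.response_type]
    z = (b + w_conf * f.initial_conf
         + w_barge * float(f.barge_in) + w_match * float(f.expectation_match))
    return 1.0 / (1.0 + math.exp(-z))


print(updated_confidence(UpdateFeatures(0.65, "NO", False, True)))
```

Splitting first on response type lets each leaf learn how much the remaining features matter for that kind of answer, instead of forcing one set of weights to cover yes, no and other responses alike.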
baselines
- initial baseline: accuracy of system beliefs before the update
- heuristic baseline: accuracy of the heuristic rule currently used in the system
- oracle baseline: accuracy if we knew exactly when the user is correcting the system
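The three baselines can be scored the same way as a learned model. A sketch, assuming (the slides do not define these terms) that "hard error" is the percentage of turns where the belief, thresholded at 0.5, disagrees with the target, and "soft error" is the cross-entropy of the targets. The heuristic baseline uses the implicit-confirmation rule, and the oracle is modeled as "don't trust if the user corrects, otherwise trust". The `Turn` records are made up for illustration.

```python
import math
from typing import Callable, List, NamedTuple


class Turn(NamedTuple):
    initial_conf: float   # confidence of the value before the update
    response: str         # "YES", "NO" or "OTHER"
    user_corrects: bool   # annotation: was the user correcting the system?
    correct: bool         # target: was the value correct?


def initial_baseline(t: Turn) -> float:
    return t.initial_conf                        # belief before the update


def heuristic_baseline(t: Turn) -> float:
    return 0.0 if t.response == "NO" else 1.0    # implicit-confirmation rule


def oracle_baseline(t: Turn) -> float:
    return 0.0 if t.user_corrects else 1.0       # knows exactly when the user corrects


def hard_error(preds: List[float], turns: List[Turn]) -> float:
    """Percent of turns where the thresholded belief (>= 0.5) disagrees with the target."""
    wrong = sum((p >= 0.5) != t.correct for p, t in zip(preds, turns))
    return 100.0 * wrong / len(turns)


def soft_error(preds: List[float], turns: List[Turn], eps: float = 1e-6) -> float:
    """Cross-entropy of the targets; beliefs are clipped so 0/1 values avoid log(0)."""
    total = 0.0
    for p, t in zip(preds, turns):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p if t.correct else 1.0 - p)
    return -total / len(turns)


TURNS = [  # made-up turns, not corpus data
    Turn(0.9, "OTHER", False, True),
    Turn(0.6, "NO", True, False),
    Turn(0.7, "OTHER", False, False),   # an error the user ignores
    Turn(0.3, "OTHER", False, True),
    Turn(0.8, "OTHER", True, False),    # user corrects by repeating the right value
]
BASELINES: List[Callable[[Turn], float]] = [initial_baseline, heuristic_baseline, oracle_baseline]
for baseline in BASELINES:
    preds = [baseline(t) for t in TURNS]
    print(f"{baseline.__name__}: hard {hard_error(preds, TURNS):.1f}%, "
          f"soft {soft_error(preds, TURNS):.3f}")
```

On these five made-up turns the hard errors come out at 80%, 40% and 20%, yet the heuristic's soft error is far worse than the initial baseline's, because its 0/1 beliefs are confidently wrong twice. That gap is the reason to ask for good probability outputs rather than hard decisions.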
results: explicit confirmation
[figure: hard error (%) and soft error]
results: implicit confirmation
[figure: hard error (%) and soft error]
results: unplanned implicit confirmation
[figure: hard error (%) and soft error]
informative features
- initial confidence score
- prosody features
- barge-in
- expectation match
- repeated grammar slots
- concept id
eliminate simplification 1
- current restricted version
  - belief = confidence score of the top hypothesis
  - only 6.9% of cases had more than 1 hypothesis
- extend to N hypotheses + 1 (other), where N is a small integer (2 or 3)
- approach: multinomial generalized linear model
  - use information from multiple recognition hypotheses
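A multinomial generalized linear model with a softmax (multinomial logit) link yields a distribution over the N hypotheses plus the "other" class in one step. A Python sketch; the feature vector and the weight matrix are hypothetical placeholders, with one row of weights per output class.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize real-valued scores into probabilities (max subtracted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multinomial_belief(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """P(class k | x) = softmax_k(W[k] . x + b[k]).

    Classes are hypothesis 1..N followed by 'other', so W and b have N + 1 rows.
    """
    scores = [bk + sum(wkj * xj for wkj, xj in zip(wk, x)) for wk, bk in zip(W, b)]
    return softmax(scores)


# Hypothetical example with N = 2: features are the two hypotheses' initial
# confidences and a 0/1 flag for a NO response; the weights are made up.
x = [0.55, 0.30, 1.0]
W = [[4.0, 0.0, -2.0],   # hypothesis 1
     [0.0, 4.0, 0.5],    # hypothesis 2
     [0.0, 0.0, 1.0]]    # other
b = [0.0, 0.0, -1.0]
print(multinomial_belief(x, W, b))  # three probabilities summing to 1
```

With N = 1, the two-class softmax equals a sigmoid of the score difference, so the restricted version's binary logistic regression is a special case of this model.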
eliminate simplification 2
- current restricted version
  - only updates following system confirmation actions
- extend to updates after all system actions
  - users might correct the system at any point
shameless self promotion
{detection, strategies, policy} × {misunderstandings, non-understandings}
- rejection threshold adaptation
- non-understanding impact on performance [Interspeech-05]
- comparative analysis of 10 recovery strategies [SIGdial-05]
- wizard experiment
- towards learning non-understanding recovery policies [SIGdial-05]
shameless CMU promotion
- Ananlada (Moss) Chotimongkol: automatic concept and task structure acquisition
- Antoine Raux: turn-taking, conversation micro-management
- Jahanzeb Sherwani: multimodal personal information management
- Satanjeev Banerjee: meeting understanding
- Stefanie Tomko: universal speech interface
- Thomas Harris: multi-participant dialog
DoD / Young Researchers' Roundtable
thank you!
a more subtle caveat
- distribution of training data: confidence annotator + heuristic update rules
- distribution of run-time data: confidence annotator + learned model
- always a problem when interacting with the world
- hopefully, the distribution shift will not cause a large degradation in performance
  - this remains to be validated empirically
  - maybe a bootstrap approach?