A “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213


1/25 a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)

2/25 motivation
- spoken language interfaces are still very brittle [Parade, Sunday, March 26]

3/25
S: What city are you leaving from?
U: Chicago → [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: Huntsville → [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: no no I’m traveling to Birmingham → [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: the tenth of August → [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham → [flight destination mr WEEKEND]
S: traveling on Saturday, August 12th… I have a flight departing Chicago at 1:40pm arrives Seoul at ……
(what the user said → recognition result in brackets / confidence score; the two remaining scores on the slide, 0.58 and 0.28, belong to the last two user turns)
misunderstandings: arrival = {Seoul / 0.65}

4/25 [same dialog as slide 3, with confidence scores 0.72 / 0.35 / 0.58 / 0.65 / 0.28 and the misunderstandings highlighted]
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }   departure = { … }

5/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
- given
  - an initial belief B_initial(C) over concept C
  - a system action SA(C)
  - a user response R
- construct an updated belief
  - B_updated(C) ← f(B_initial(C), SA(C), R)

6/25 outline
- introduction
- current solutions
- approach
- experimental results
- effects on global performance
- conclusion and future work

7/25 current solutions
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
- confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
- detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow] / 0.72
arrival = {Seoul / 0.65} → f → arrival = ?
- track single values
- use simple heuristic belief updating rules
  - explicit confirmations: yes / no
  - implicit confirmations: new values overwrite old values
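
To make the single-value heuristics concrete, here is a minimal sketch of such rules. It is an illustration under stated assumptions, not the system's code: the `Response` record, the action strings, and the treatment of actions other than confirmations are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical, simplified view of a user turn."""
    answer: Optional[str] = None     # "yes", "no", or None if neither was detected
    new_value: Optional[str] = None  # a concept value heard in the response, if any


def heuristic_update(value: Optional[str], action: str, response: Response) -> Optional[str]:
    """Single-value heuristic update; the rules follow the slide, details are assumed."""
    if action == "explicit_confirm":
        # Explicit confirmation: "yes" keeps the value, "no" discards it.
        if response.answer == "yes":
            return value
        if response.answer == "no":
            return None
        return value  # neither yes nor no detected: leave the value unchanged
    # Implicit confirmation (and, by assumption, any other action):
    # a newly heard value overwrites the old one.
    return response.new_value if response.new_value is not None else value


# The slide's example: the response to "traveling to Seoul..." yields no arrival value,
# so the heuristic keeps the misrecognized "Seoul".
print(heuristic_update("Seoul", "implicit_confirm", Response()))
```

This prints `Seoul`: a single tracked value with hard overwrite/keep rules has no way to use the low confidence of the earlier turn or the hint that the user was correcting it.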

9/25 belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
- given an initial belief B_initial(C) over concept C, a system action SA(C), and a user response R
- construct an updated belief B_updated(C) ← f(B_initial(C), SA(C), R)

10/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- probability distribution over the set of possible values
  - e.g. departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; YUMA, AZ
- however, the system “hears” only a small number of conflicting values for a concept throughout a session
  - max = 3 conflicting values heard

11/25 belief representation
B_updated(C) ← f(B_initial(C), SA(C), R)
- compressed belief representation: k hypotheses + other
- dynamically add and drop hypotheses
  - remember m hypotheses, add n new ones (m + n = k)
- B_…(C) is a multinomial variable of degree k+1
example: departure_city [k=3, m=2, n=1]
  initial belief: Austin, Boston, Houston, other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN]  → new hypothesis: Aspen
  updated belief: Boston, Austin (remembered) + Aspen (new) + other
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT]  → new hypothesis: Ø
  updated belief: Boston, Aspen (remembered) + Ø + other
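
A sketch of the compression step, assuming the updated probabilities of every value heard so far are already available (in the talk they come from the learned model). `CompressedBelief`, `compress`, and the example probabilities are hypothetical; the sketch only shows how m remembered and n new hypotheses plus an "other" bucket form a (k+1)-way distribution.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CompressedBelief:
    """k explicit hypotheses plus an 'other' bucket: a (k+1)-way multinomial."""
    hyps: dict[str, float]
    other: float


def compress(prev: list[str], new: list[str], probs: dict[str, float],
             m: int, n: int) -> CompressedBelief:
    """Keep the m most likely previous hypotheses and the n most likely new ones."""
    def prob(value: str) -> float:
        return probs.get(value, 0.0)

    remembered = sorted(prev, key=prob, reverse=True)[:m]
    # Values heard in the response that are not already among the remembered ones.
    candidates = [v for v in new if v not in remembered]
    added = sorted(candidates, key=prob, reverse=True)[:n]
    hyps = {v: prob(v) for v in remembered + added}
    # Everything else (dropped hypotheses and never-heard values) goes to "other".
    other = max(0.0, 1.0 - sum(hyps.values()))
    return CompressedBelief(hyps, other)


# departure_city with k=3, m=2, n=1; the probabilities are made up for illustration.
belief = compress(
    prev=["Austin", "Boston", "Houston"],
    new=["Aspen"],
    probs={"Austin": 0.1, "Boston": 0.2, "Houston": 0.05, "Aspen": 0.5},
    m=2,
    n=1,
)
print(belief)
```

With these made-up numbers the result keeps Boston and Austin, adds Aspen, and puts the remaining 0.2 (including Houston's dropped 0.05) into other, so the belief never grows beyond k+1 entries however many city names the concept can take.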

12/25 system action
B_updated(C) ← f(B_initial(C), SA(C), R)
- request
  S: When would you like to take this flight?
  U: Friday [FRIDAY] / 0.65
- explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes [GUEST] / 0.30
- implicit confirmation
  S: A flight for Friday… at what time?
  U: At ten a.m. [AT TEN A_M] / 0.86
- no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user [THIS TUESDAY] / 0.55

13/25 user response
B_updated(C) ← f(B_initial(C), SA(C), R)
- acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
- lexical: number of words, presence of words highly correlated with corrections or acknowledgements
- grammatical: number of slots (new and repeated), goodness-of-parse scores
- dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
- priors: priors for concept values
- confusability: how confusable concept values are

14/25 approach
- multinomial regression problem
  - multinomial generalized linear model
  - sample efficient
- stepwise approach
  - feature selection
- one separate model for each system action
  - B_updated(C) ← f(B_initial(C), SA(C), R) becomes B_updated(C) ← f_SA(C)(B_initial(C), R)
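
The "one model per system action" design can be sketched as a dispatch table of multinomial logistic (softmax) models. The sketch assumes k=2 with outcomes `initial` (the remembered hypothesis, used as the reference category with logit 0), `new_1`, and `other`; the feature names and weights are invented for illustration and are not the trained models.

```python
from __future__ import annotations

import math

# Hypothetical weights, one table per system action. Each non-reference outcome has its
# own weight vector; the reference outcome "initial" always has logit 0.
MODELS = {
    "explicit_confirm": {
        "new_1": {"bias": -2.0, "said_no": 1.5, "new_hyp_heard": 3.0},
        "other": {"bias": -1.0, "said_no": 2.5, "new_hyp_heard": -0.5},
    },
    "implicit_confirm": {
        "new_1": {"bias": -2.5, "marked_disconfirm": 0.5, "new_hyp_heard": 3.5},
        "other": {"bias": -1.5, "marked_disconfirm": 2.0},
    },
}


def update_belief(action: str, features: dict[str, float]) -> dict[str, float]:
    """Return P(initial), P(new_1), P(other) from the model for this system action."""
    x = {"bias": 1.0, **features}  # constant term added for every turn
    logits = {"initial": 0.0}
    for outcome, weights in MODELS[action].items():
        logits[outcome] = sum(weights.get(name, 0.0) * value for name, value in x.items())
    # Numerically stable softmax over the k+1 outcomes.
    top = max(logits.values())
    exp = {o: math.exp(z - top) for o, z in logits.items()}
    total = sum(exp.values())
    return {o: e / total for o, e in exp.items()}


print(update_belief("explicit_confirm", {}))
print(update_belief("explicit_confirm", {"said_no": 1.0, "new_hyp_heard": 1.0}))
```

Handling another action only changes which weight table is used, which is what splitting f into one f_SA(C) per action amounts to; each table can then be trained and feature-selected on that action's turns alone.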

16/25 data
- RoomLine
  - conference room reservations
  - explicit and implicit confirmations
- user study
  - 46 participants
  - 10 scenario-based interactions each
- corpus
  - 449 sessions, 8848 user turns
  - transcribed & annotated: misunderstandings, corrections, correct concept values

17/25 model performance (error rate, %)
- initial baseline (i): error before update
- heuristic baseline (h): error after heuristic update
- model (M): k=2, all features
- correction baseline (c): error if we had perfect correction detection

system action        i      h      M      c
explicit confirm     30.8   16.1   5.0    6.2
implicit confirm     30.3   26.0   15.0   21.5
request              98.2   9.5    5.7    n/a
no action            79.7   44.8   14.8   n/a
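
The chart values can be turned into relative error reductions with a few lines of arithmetic. The numbers are the ones read from the slide's bars (so they inherit any reading error), and choosing the heuristic baseline as the reference is our assumption.

```python
# Error rates (%) read from the slide's bar charts: (initial, heuristic, model).
ERRORS = {
    "explicit confirm": (30.8, 16.1, 5.0),
    "implicit confirm": (30.3, 26.0, 15.0),
    "request": (98.2, 9.5, 5.7),
    "no action": (79.7, 44.8, 14.8),
}


def relative_reduction(baseline: float, model: float) -> float:
    """Relative error reduction of the model over a baseline, in percent."""
    return 100.0 * (baseline - model) / baseline


for action, (_initial, heuristic, model) in ERRORS.items():
    print(f"{action:16s} vs. heuristic: {relative_reduction(heuristic, model):5.1f}%")
```

Against the heuristic baseline this gives about 68.9% (explicit confirm), 42.3% (implicit confirm), 40.0% (request), and 67.0% (no action).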

19/25 a new user study…
- implemented models in the system
- 2nd, between-subjects experiment
  - control: using heuristic update rules
  - treatment: using belief updating models
- 40 participants, non-native users
improvements more likely at high word-error-rates

20/25 effect on task success
- logistic ANOVA on task success
  logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition (p = 0.009)
[figure: probability of task success (0-100%) vs. word error rate (0-100%), treatment vs. control; at 30% word error rate: 78% (treatment) vs. 64% (control); control reaches 78% only at 16% word error rate]
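
The fitted task-success model can be evaluated directly to recover the numbers in the figure. This assumes WER is measured in percentage points and Condition is coded 1 for treatment and 0 for control; with those assumptions the model reproduces the 78% / 64% values at 30% WER.

```python
import math


def p_success(wer: float, treatment: bool) -> float:
    """Task-success probability from logit = 2.09 - 0.05*WER + 0.69*Condition."""
    condition = 1.0 if treatment else 0.0  # assumed coding: 1 = treatment, 0 = control
    logit = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


print(round(p_success(30, treatment=True), 2))   # treatment at 30% WER
print(round(p_success(30, treatment=False), 2))  # control at 30% WER
print(round(p_success(16, treatment=False), 2))  # control at 16% WER
print(round(0.69 / 0.05, 1))  # WER points with the same logit effect as Condition
```

The four lines print 0.78, 0.64, 0.78 and 13.8. The last value is the WER reduction (in points) that moves the logit as much as switching to the belief updating models, which is why control at 16% WER and treatment at 30% WER land at about the same success rate.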

21/25 how about efficiency?
- ANOVA on task duration for successful tasks
  Duration ← -0.21 + 0.013∙WER - 0.106∙Condition
- significant improvement (p = 0.0003)
  - equivalent to 7.9% absolute reduction in word-error
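
The "equivalent to a 7.9% reduction in word-error" statement is the ratio of the Condition and WER coefficients. A quick check, assuming WER is measured in percentage points and Condition is coded 1 for treatment:

```python
# Coefficients of the duration model as printed on the slide (rounded).
WER_COEF = 0.013         # Duration change per WER percentage point
CONDITION_COEF = -0.106  # Duration change for treatment vs. control

# WER reduction with the same effect on Duration as switching to the treatment.
equivalent_wer_reduction = -CONDITION_COEF / WER_COEF
print(round(equivalent_wer_reduction, 1))
```

With the rounded coefficients shown on the slide the ratio is about 8.2 points; the slide's 7.9% was presumably computed from the unrounded estimates.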

23/25 summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon. Okay, what day would you be departing Chicago?
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }   departure = { … }
- approach for constructing accurate beliefs
  - integrate information across multiple turns
- significant gains in task success and efficiency

24/25 other advantages
- learns from data
  - tuned to the domain in which it operates
- sample efficient / scalable
  - local one-turn optimization, concepts are independent
  - RoomLine operates with 29 concepts, with cardinalities from 2 to several hundred
- portable
  - decoupled from dialog task specification
  - no assumptions about dialog management

25/25 future work
- integrate information from n-best list
- integrate other high-level knowledge
  - domain-specific constraints
  - inter-concept dependencies
- investigate technique in other domains

26/25 thank you! questions…

27/25 improvements at different WER
[figure: absolute improvement in task success vs. word-error-rate]
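
The plot on this slide did not survive extraction. As a stand-in, the sketch below derives the predicted absolute improvement across WER from the slide-20 logistic model, under the same assumptions (WER in percentage points, Condition = 1 for treatment). This is the model's curve, not the plotted data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def improvement(wer: float) -> float:
    """Treatment-minus-control success probability under the slide-20 model (WER in %)."""
    base = 2.09 - 0.05 * wer  # control logit
    return sigmoid(base + 0.69) - sigmoid(base)


grid = list(range(0, 101, 10))
for w in grid:
    print(f"WER {w:3d}%: +{100 * improvement(w):4.1f} points")

# sigmoid(x + a) - sigmoid(x) is symmetric about x = -a/2, so the gap is largest
# where 2.09 + 0.69 / 2 - 0.05 * WER = 0.
peak_wer = (2.09 + 0.69 / 2) / 0.05
print(round(peak_wer, 1))
```

Under this model the improvement peaks near 48.7% WER (about 17 points of task success) and shrinks toward both ends of the WER range, consistent with the note that improvements are more likely at high word-error-rates.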

28/25 user study
- 10 scenarios, fixed order
- presented graphically (explained during briefing)
- participants compensated per task success

29/25 informative features
- priors and confusability
- initial confidence scores
- concept identity
- barge-in
- expectation match
- repeated grammar slots

30/25 Models (k=2, runtime features)
# The model for the explicit confirm action
LR_MODEL(EC)                        new_1     other
k                            =     -15.96      3.61
answer_type[YES]             =     -12.67     -5.90
answer_type[NO]              =       4.55      3.15
answer_type[OTHER]           =       1.20     -0.75
concept_id(equip)            =       6.96      4.42
i_th_confusability           =      -3.67     -4.80
ih_diff_lexical_one_word     =     -15.99     -1.17
lexw1[SMALL]                 =      17.63     20.26
response_new_hyps_in_selh    =      18.85      0.41
END

31/25 Models (k=2, runtime features)
# The model for the implicit confirm action
LR_MODEL(IC)                        new_1     other
mark_confirm                 =       0.31     -1.74
mark_disconfirm              =       3.39      1.57
i_th_conf                    =       0.39     -3.63
i_th_confusability           =      -4.17     -4.54
k                            =     -16.83      3.75
lex[THREE]                   =      -2.25     -2.68
response_new_hyps_in_selh    =      20.88      1.70
turn_number                  =       0.01      0.03
END

32/25 Models (k=2, runtime features)
# The model for the request action
LR_MODEL(REQ)                                new_1     other
k                                    =       -0.78      3.56
barge_in                             =       -2.07     -1.40
concept_id(date)                     =       11.29      9.80
concept_id(user_name)                =        1.93    -13.91
dialog_state[RequestSpecificTimes]   =       13.29     14.26
ih_diff_lexical                      =       -1.54      0.17
initial_num_hyps_>_0                 =      -21.70     -2.71
total_num_parses                     =       -1.06     -0.40
ur_selh_new_1_conf                   =        4.09      1.76
ur_selh_new_1_confusability          =        5.81      1.70
ur_selh_new_1_prior                  =        0.67      0.98
ur_selh_new_1_prior_>_1              =       -1.00     -6.38
END
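
A reading aid for the listings above: a small parser for the "name = new_1 other" lines and a softmax scorer. The listings do not say how they are applied, so several things here are loud assumptions: the unlisted outcome is taken to be the initial hypothesis with logit 0, `k` is treated as a constant term set to 1, the answer types are 0/1 indicators, and every feature not supplied is 0. `parse_listing` and `score` are hypothetical helpers, not part of the system.

```python
from __future__ import annotations

import math
import re

# Excerpt of the explicit-confirm listing (slide 30): "name = new_1 other".
EC_LISTING = """
k =-15.96 3.61
answer_type[YES] =-12.67 -5.90
answer_type[NO] = 4.55 3.15
answer_type[OTHER] = 1.20 -0.75
"""

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")  # also splits fused pairs such as "1.93-13.91"


def parse_listing(text: str) -> dict[str, tuple[float, float]]:
    """Map each feature name to its (new_1, other) coefficients."""
    coefs = {}
    for line in text.strip().splitlines():
        name, rhs = line.split("=", 1)
        new_1, other = (float(v) for v in NUMBER.findall(rhs))
        coefs[name.strip()] = (new_1, other)
    return coefs


def score(coefs: dict[str, tuple[float, float]], features: dict[str, float]) -> dict[str, float]:
    """Softmax over (initial, new_1, other), assuming "initial" is the reference (logit 0)."""
    z_new = sum(coefs[f][0] * x for f, x in features.items())
    z_other = sum(coefs[f][1] * x for f, x in features.items())
    logits = {"initial": 0.0, "new_1": z_new, "other": z_other}
    top = max(logits.values())
    exp = {o: math.exp(z - top) for o, z in logits.items()}
    total = sum(exp.values())
    return {o: e / total for o, e in exp.items()}


coefs = parse_listing(EC_LISTING)
# Assumptions: k acts as a constant term (set to 1); all other features are 0.
print(score(coefs, {"k": 1.0, "answer_type[YES]": 1.0}))
print(score(coefs, {"k": 1.0, "answer_type[NO]": 1.0}))
```

For a plain "yes" this puts about 0.91 of the mass on the initial hypothesis, and for a plain "no" almost all of it on other, which is at least consistent with the column labels; it is not a reproduction of the system's runtime behavior.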

