1 Sorry, I didn’t catch that …
Non-understandings and recovery in spoken dialog systems
Part II: Sources & impact of non-understandings, performance of various recovery strategies
Dan Bohus
Sphinx Lunch Talk, Carnegie Mellon University, March 2005

2 Non-understandings
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
NON-understanding
• System cannot extract any meaningful information from the user’s turn
• How can we prevent non-understandings?
• How can we recover from them?
  - Detection
  - Set of recovery strategies
  - Policy for choosing between them
review : sources : impact : strategy performance

3 Issues under investigation
• Data Collection
• Detection / Diagnosis
  - What are the main causes (sources) of non-understandings?
  - What is their impact on global performance?
  - Can we diagnose non-understandings at run-time?
  - Can we optimize the rejection process in a more principled way?
• Set of recovery strategies
  - What is the relative performance of different recovery strategies?
  - Can we refine current strategies and find new ones?
• Policy for choosing between them
  - Can we improve performance by making smarter choices?
  - If so, can we learn how to make these smarter choices?

4 Data Collection: Experimental Design
• Subjects interact over the telephone with RoomLine
• Performed 10 scenario-based tasks
• Between-subjects experiment, 2 groups:
  - Control: system uses a random (uniform) policy for engaging the non-understanding recovery strategies
  - Wizard: policy is determined at runtime by a human (wizard)
• 46 subjects, balanced by gender × native/non-native speaker
• 449 sessions; 8278 user turns
• Sessions transcribed & annotated
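
The control condition's random (uniform) policy can be illustrated with a minimal sketch. It assumes the policy simply draws one of the ten strategies listed on the strategies slide with equal probability; the function name and the fixed seed are illustrative choices, not part of RoomLine.

```python
import random

# Abbreviations of the ten recovery strategies named in the talk.
STRATEGIES = ["MOVE", "YCS", "TYCS", "HELP", "AREP",
              "ARPH", "NTFY", "YLD", "RP", "DRP"]


def choose_strategy_uniform(rng: random.Random) -> str:
    """Pick one recovery strategy, each with probability 1/10 (assumed policy)."""
    return rng.choice(STRATEGIES)


# Fixed seed only so that the sketch is reproducible.
rng = random.Random(0)
picks = [choose_strategy_uniform(rng) for _ in range(10_000)]
counts = {s: picks.count(s) for s in STRATEGIES}
```

Over many draws every strategy should be engaged roughly equally often, which is what makes the control data usable for comparing strategies against each other.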

5 Non-understanding Strategies
Group labels: REPROMPT, NOTIFY, MOVE-ON, HELP, REPEAT
S: For when do you need the room?
U: [non-understanding]
1. MOVE-ON (MOVE): Sorry, I didn’t catch that. For which day do you need the room?
2. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
3. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
4. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
5. ASK REPEAT (AREP): Could you please repeat that?
6. ASK REPHRASE (ARPH): Could you please try to rephrase that?
7. NOTIFY (NTFY): Sorry, I didn’t catch that...
8. YIELD TURN (YLD): …
9. REPROMPT (RP): For when do you need the conference room?
10. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …

7 Communication [Clark, Horvitz, Paek]
[Diagram between User and System. Levels: Conversation, Intention, Signal, Channel. Stages: Channel, Recognition, Parsing, Interpretation, End-pointing. Representations: Goal, Semantics, Text, Audio.]

8 Modeling and Breakdowns
[Diagram between User and System. Levels: Conversation, Intention, Signal, Channel. Stages: Channel, Recognition, Parsing, Interpretation, End-pointing. Representations: Goal, Semantics, Text, Audio.]

9 “Location” & “types” of errors
[Diagram between User and System. Stages: Channel, Recognition, Parsing, Interpretation, End-pointing. Representations: Goal, Semantics, Text, Audio.]
Error types: Out-of-domain, Out-of-application, False rejections, Out-of-grammar, Out-of-relevance, ASR errors (accents, noises), End-pointer errors

10 % of non-understandings
[Chart: share of non-understandings by error type. Labels: Out-of-domain, Out-of-application, False rejections, Out-of-grammar, Out-of-relevance, ASR errors (accents, noises), End-pointer errors. Values, in extracted order: 12.89%, 18.59%, 8.02%, 3.21%, 56.05%, 3.91%, 0.14%]

11 Out-of-application (13% of non-understandings)
• 2 main classes, about equally split
• Requests for nonexistent task functionality
  - “A room Monday or Tuesday”
  - “do you have anything anytime Thursday afternoon?”
• Requests for nonexistent “meta” functionality
  - Corrections: “Can I change the date” “You got the time wrong” “Wrong day”
• Q: How to better convey system boundaries?
• Q: Extend system language for corrections?

12 Out-of-grammar (8% of non-understandings)
• Imperfect grammar coverage
  - “Doesn’t matter”
  - “It doesn’t matter”
  - “Internet connection”
  - “Network connection”
  - “Vaguely”
  - “So so” / “Generally” / etc.
• Q: Bring users into the grammar?
  - Carefully craft & use the “You Can Say” prompts
• Q: Extend the grammar?
  - Online & in an unsupervised fashion?

13 Grammaticality - Summary
• It’s important: 25% of non-understandings
• Stems (about equally) from:
  - Requests for nonexistent task functionality
  - Requests for nonexistent meta/corrections functionality
  - Lack of grammar coverage
• Solutions
  - Offline: enlarge grammar, include correction language
  - Online:
    - Carefully design “You Can Say”
    - All You Can Say [Collagen / USI]
    - Unsupervised learning of new grammar expressions

14 All You Can Say
• How much of the system functionality is actually used? [under work]
• Certain “task” and “meta” aspects of functionality are very rarely or never used
[Diagram labels: User, System]

17 Impact on system performance
• Logistic regression model
  - Task success ← % non-understandings per session
  - Native speakers are more likely to succeed at the same non-understanding rate
  - (So are participants in the wizard condition)
• 2nd model (also uses misunderstandings)
  - Task success ← % non-understandings + % misunderstandings
  - Better fit
  - Adding native-speaker information does not improve the model
  - A non-understanding is on average half as costly as a misunderstanding
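
The shape of the second model can be sketched as follows. The talk does not report fitted coefficients, so the values below are hypothetical placeholders; the only property carried over from the slide is that the non-understanding coefficient is half the misunderstanding coefficient.

```python
import math


def success_probability(nonu_rate: float, mis_rate: float,
                        b0: float, b_nonu: float, b_mis: float) -> float:
    """Logistic model: P(task success) from per-session error rates (0..1)."""
    z = b0 + b_nonu * nonu_rate + b_mis * mis_rate
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only for illustration (not from the talk).
B0 = 3.0
B_MIS = -10.0
B_NONU = B_MIS / 2  # "half as costly" expressed as half the coefficient

p_clean = success_probability(0.0, 0.0, B0, B_NONU, B_MIS)
p_nonu = success_probability(0.2, 0.0, B0, B_NONU, B_MIS)
p_mis = success_probability(0.0, 0.1, B0, B_NONU, B_MIS)
```

Under these placeholder values, a session with 20% non-understandings is predicted to succeed about as often as one with 10% misunderstandings, which is one way to read "half as costly".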

21 How to evaluate performance?
• Recovery
  - Next turn is okay (not a non-understanding, not a misunderstanding)
• Finer-grained recovery
  - Next turn CER
  - Next turn concept transfer (dialog cost)
• Time (+recovery) ??
  - Time lost: 0 if next turn okay, time lost otherwise
  - Time to recovery (has some problems)
• [More stuff under construction]
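
Two of these metrics, recovery rate and time lost, can be computed per strategy along the following lines. This is a sketch under assumptions: the `RecoveryEvent` record, its field names, and the sample events are hypothetical and are not taken from the RoomLine corpus.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecoveryEvent:
    strategy: str         # strategy the system engaged after a non-understanding
    next_turn_ok: bool    # True if the next turn was correctly understood
    seconds_spent: float  # hypothetical: duration of the recovery exchange


def time_lost(event: RecoveryEvent) -> float:
    """0 if the next turn is okay, otherwise the time spent on the exchange."""
    return 0.0 if event.next_turn_ok else event.seconds_spent


def summarize(events):
    """Per-strategy recovery rate and mean time lost."""
    totals = defaultdict(lambda: [0, 0, 0.0])  # count, recovered, time lost
    for e in events:
        t = totals[e.strategy]
        t[0] += 1
        t[1] += int(e.next_turn_ok)
        t[2] += time_lost(e)
    return {s: {"recovery_rate": ok / n, "mean_time_lost": lost / n}
            for s, (n, ok, lost) in totals.items()}


# Made-up events, for illustration only.
events = [
    RecoveryEvent("MOVE", True, 6.0),
    RecoveryEvent("MOVE", False, 8.0),
    RecoveryEvent("AREP", False, 5.0),
]
summary = summarize(events)
```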

22 Which strategies are better?

23 Which strategies are better?
• Recovery performance ranked list, based on pairwise t-tests:

Class  Rank  Strategy  MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
MOVE   1     MOVE      -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
HELP   2     HELP      -     -     -     -     -     -     1.55  1.64  1.73  1.87
HELP   3     TYCS      -     -     -     -     -     -     1.5   1.58  1.68  1.81
SIG    4     RP        -     -     -     -     -     -     -     -     1.46  1.58
HELP   5     YCS       -     -     -     -     -     -     -     -     1.44  1.55
SIG    6     ARPH      -     -     -     -     -     -     -     -     1.42  1.53
SIG    ?     DRP       -     -     -     -     -     -     -     -     -     -
SIG    ?     NTFY      -     -     -     -     -     -     -     -     -     -
SIG    ?     AREP      -     -     -     -     -     -     -     -     -     -
SIG    ?     YLD       -     -     -     -     -     -     -     -     -     -

• CER evaluation shows similar results
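
A pairwise comparison of two strategies can be sketched as a t statistic over binary recovery outcomes (1 = next turn okay, 0 = not). The talk does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the outcome lists are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


# Made-up outcomes: strategy A recovers 7 of 10 times, strategy B 3 of 10.
outcomes_a = [1] * 7 + [0] * 3
outcomes_b = [1] * 3 + [0] * 7
t_ab = welch_t(outcomes_a, outcomes_b)
t_ba = welch_t(outcomes_b, outcomes_a)
```

A positive statistic favors the first strategy. Turning it into a significance decision also needs the degrees of freedom and a t distribution, which are left out of this sketch.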

24 Which strategies are better?
MoveOn ≥ Help > Signal

Rank  Class    MOVE  C1_HELP  C1_SIG
1     MOVE     -     1.19*    1.65
2     C1_HELP  -     -        1.38
3     C1_SIG   -     -        -

* p = 0.1089

25 What is the Impact on User Response?
• Labeled user responses in 5 classes: [same tagging scheme as Shin, Choularton]
  - Answer (1st)
  - Repeat
  - Rephrase
  - Change
  - Contradict
  - Other
  - Hang-up

26 What is the Impact on User Response?
• Labeled user responses in 5 classes: [same tagging scheme as Shin, Choularton]
  - Answer (1st)
  - Repeat
  - Rephrase
  - Change
  - Contradict
  - Other
  - Hang-up
[Chart values, in extracted order: 17.95%, 44.30%, 30.70%, 3.63%, 3.13%]

27 Comparing with other systems

28 What responses are the best?
• Recovery as a function of response type
  - Answer (1st)
  - Repeat
  - Rephrase
  - Change
  - Contradict
  - Other
  - Hang-up
[Chart values, in extracted order: 45.45%, 39.33%, 63.29%, 19.05%]

29 More to come …
• Per-strategy analysis
• Barge-in & impact on recovery

31 Refining the current set of strategies
• Introduce more alternative dialog plans
  - Opportunities for Move-On
• “You Can Say”
  - Carefully tune the prompts
  - Smarter barge-in control
• “All You Can Say”
• “Speak shorter”
  - Anecdotal evidence, to be corroborated by analysis
• “Speak louder / go to a quieter place”
  - Not so much in these experiments, but there is evidence from Let’s Go!
• More prevention measures
  - If someone is having trouble, the system can give the YCS prompts without waiting for a non-understanding to happen

32 Thank You!!

