Sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies
Dan Bohus (www.cs.cmu.edu/~dbohus), Alexander I. Rudnicky (www.cs.cmu.edu/~air)

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, 15213

Slide 2: Systems often do not understand correctly
Non-understandings and misunderstandings:
- NON-understanding: the system cannot extract any meaningful information from the user’s turn
  S: What city are you leaving from?
  U: Urbana Champaign [OKAY IN THAT SAME PAY]
- MIS-understanding: the system extracts incorrect information from the user’s turn
  S: What city are you leaving from?
  U: Birmingham [BERLIN PM]

Slide 3: Systems often do not understand correctly
NON-understanding: the system cannot extract any meaningful information from the user’s turn
  S: What city are you leaving from?
  U: Urbana Champaign [OKAY IN THAT SAME PAY]
- detection: typically trivial, although diagnosis is not
- strategies: large space of strategies; tradeoffs between them not well understood
- policy (knowing how to engage the strategies): simple heuristics such as “incremental prompting”
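The slide names “incremental prompting” as the typical simple policy heuristic. A minimal sketch of such a policy, assuming (hypothetically) that the system escalates from a plain reprompt toward fuller help as consecutive non-understandings accumulate; the escalation order and the function name `incremental_prompting` are illustrative, not the Roomline implementation.

```python
# Hypothetical escalation ladder: the strategy names come from the list of
# recovery strategies, but this particular ordering is an assumption.
ESCALATION = ["REPROMPT", "DETAILED_REPROMPT", "YOU_CAN_SAY", "FULL_HELP"]


def incremental_prompting(consecutive_nonunderstandings: int) -> str:
    """Pick a recovery strategy from the number of consecutive
    non-understandings (1 for the first failure in a row)."""
    if consecutive_nonunderstandings < 1:
        raise ValueError("call only after at least one non-understanding")
    # Escalate one step per additional failure, then stay at the last step.
    index = min(consecutive_nonunderstandings - 1, len(ESCALATION) - 1)
    return ESCALATION[index]


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, incremental_prompting(n))
```

The policy depends only on a failure counter, which is what makes it a "simple heuristic": it ignores the cause of the non-understanding and how the user responded.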

4 questions under investigation  what are the main causes of non-understandings?  how large is their impact on performance?  how do various recovery strategies compare to each other?  what are the relationships between strategies and user behaviors?  can we improve global dialog performance by using a smarter policy?  if yes, can we learn a better policy from data?  data

5 data collection  Roomline  phone-based, mixed-initiative system  conference room reservations  experimental design  control group: uninformed recovery policy  wizard group: recovery policy implemented by wizard  46 participants, first-time users  tasks & experimental procedure  up to 10 scenario-driven interactions

Slide 6: Non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT (AREP): Could you please repeat that?
2. ASK REPHRASE (ARPH): Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that...
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON (MOVE): Sorry, I didn’t catch that. For which day do you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
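In the control group the system used an uninformed recovery policy. A minimal sketch, assuming “uninformed” means choosing uniformly at random among the ten strategies above regardless of dialog state; the short codes follow the abbreviations in the strategy list, and `uninformed_policy` is a hypothetical name.

```python
import random

# Short codes for the ten non-understanding recovery strategies.
STRATEGIES = ["AREP", "ARPH", "NTFY", "YLD", "RP", "DRP",
              "MOVE", "YCS", "TYCS", "HELP"]


def uninformed_policy(rng: random.Random) -> str:
    """Choose a recovery strategy uniformly at random, ignoring context
    (assumed reading of "uninformed")."""
    return rng.choice(STRATEGIES)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    print([uninformed_policy(rng) for _ in range(5)])
```

Passing the random generator in explicitly keeps runs reproducible, which matters when the resulting sessions are later used to compare strategies.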

7 corpus statistics  449 sessions  8278 user turns  utterances transcribed and checked  manual annotations  misunderstandings  correct concept values at each turn  sources of understanding errors  user response-types to recovery strategies

8 questions under investigation  data  what are the main causes of non-understandings?  how large is their impact on performance?  how do various recovery strategies compare to each other?  what are the relationships between strategies and user behaviors?

Slide 9: Causes of non-understandings
[Figure: levels of communication between user and system: conversation level (user goal; system interpretation), intention level (semantics; parsing), signal level (text; recognition), channel level (audio; end-pointing)]

Slide 10: Causes of non-understandings
- conversation level: out-of-application (16%)
- intention level: out-of-grammar (16%)
- signal level: ASR error (62%)
- channel level: endpointer error
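The percentages on this slide are shares of annotated non-understandings by source. A minimal sketch of that tally, assuming each non-understanding carries exactly one source label from the manual annotation; the label strings and the toy input are hypothetical placeholders, not Roomline data.

```python
from collections import Counter


def cause_distribution(labels: list[str]) -> dict[str, float]:
    """Return the percentage of non-understandings attributed to each source."""
    if not labels:
        return {}
    counts = Counter(labels)
    total = len(labels)
    return {cause: 100.0 * n / total for cause, n in counts.items()}


if __name__ == "__main__":
    # Toy labels for illustration only.
    toy = ["asr-error", "asr-error", "asr-error", "out-of-grammar"]
    print(cause_distribution(toy))  # {'asr-error': 75.0, 'out-of-grammar': 25.0}
```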

11 questions under investigation  data  what are the main causes of non-understandings?  how large is their impact on performance?  how do various recovery strategies compare to each other?  what are the relationships between strategies and user behaviors? data : causes of non-understandings : impact on performance : strategy comparison : user behaviors

Slide 12: Modeling impact on performance
- logistic regression: $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{FNON})}}$
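The model on this slide is an ordinary logistic regression of per-session task success on a single predictor. A minimal sketch, assuming FNON is a per-session frequency of non-understandings and fitting α and β by plain gradient ascent on the mean log-likelihood; the toy data, learning rate, and step count are hypothetical placeholders rather than the study's values, and a statistics package would normally be used instead.

```python
import math


def p_success(alpha: float, beta: float, x: float) -> float:
    """P(Task Success) = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Fit alpha, beta by gradient ascent on the mean log-likelihood.

    xs: per-session predictor values (assumed: frequency of non-understandings)
    ys: 1 if the session's task succeeded, else 0
    """
    alpha, beta = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - p_success(alpha, beta, x)  # residual for this session
            grad_a += err
            grad_b += err * x
        alpha += lr * grad_a / n
        beta += lr * grad_b / n
    return alpha, beta


if __name__ == "__main__":
    # Toy sessions: (non-understanding frequency, task success)
    xs = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.3]
    ys = [1, 1, 1, 0, 0, 0, 0, 1]
    a, b = fit_logistic(xs, ys, steps=20000)
    print(f"alpha={a:.3f} beta={b:.3f}")
```

A negative fitted β means that sessions with more frequent non-understandings are less likely to succeed; the size of β is what gives the quantitative assessment of impact.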

Slide 14: Strategy performance: recovery rate
- overall logistic ANOVA: significant differences in mean recovery rates
- all-pairs comparison (corrected using FDR)
[Figure: recovery rate by strategy, ranked: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
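The pairwise comparisons are corrected for multiple testing with a false discovery rate (FDR) procedure. The slide does not say which one; a minimal sketch of the Benjamini-Hochberg step-up procedure, the most common FDR correction, assuming one p-value per strategy pair and a hypothetical target level q = 0.05.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether it is rejected at FDR level q."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) whose p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.02, 0.024, 0.5]))  # [True, True, False]
```

The step-up rule matters in the example: 0.02 misses its own threshold (0.05/3), but it is still rejected because a larger p-value (0.024) passes its threshold at rank 2.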

16 user response types  tagging scheme by Shin  also used by Choularton, Raux  5 categories  repeat  rephrase  contradict  change  other

Slide 17: [Figure: response types after non-understanding (rephrase, repeat, contradict, change, other; 0% to 50% of responses), compared across Pizza (Choularton & Dale), Communicator (Shin et al.) and Roomline (this study)]

Slide 18: User response types by strategy
[Figure: share of rephrase, change, repeat and other responses (0% to 100%) for each strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]

19  sources of non-understandings  impact on performance  strategy comparison  user responses summary  can we improve global dialog performance by using a smarter policy?  can we learn a better policy from data?  asr, but also “language” errors → more shaping strategies …  regression model allows better quantitative assessment  help, “move-on” → further investigate “move-on”  margin for improving control over user responses  yes  preliminary results promising …

Slide 20: Thank you! Questions …

Slide 21: Rejections
[Figure 3. Misunderstandings and non-understandings before and after rejections; legend: before rejection mechanism, after rejection mechanism, false rejections, correct rejections]

22 strategy performance assessment  recovery rate  recovery utility  weighted sum of correctly and incorrectly acquired concepts  weights are determined in a data-driven fashion  recovery efficiency  also takes time to recovery into account

23 experimental design: scenarios  10 scenarios, fixed order  presented graphically (explained during briefing)

24 strategy pair-wise comparison  recovery performance ranked list, based on pair-wise t-tests: RNKMOVEHELPTYCSRPYCSARPHDRPNTFYAREPYLD MOVE1MOVE: ---1.311.331.351.711.81.912.06 HELP2HELP: ------1.551.641.731.87 HELP3TYCS: ------1.51.581.681.81 SIG4RP: --------1.461.58 HELP5YCS: --------1.441.55 SIG6ARPH: --------1.421.53 SIG?DRP: ---------- SIG?NTFY: ---------- SIG?AREP: ---------- SIG?YLD: ----------  CER evaluation shows similar results

Slide 25: Recovery for various response types

Slide 27: Impact of recovery rate on performance
- $P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{RecoveryRate})}}$
- recovery = next turn is correctly understood
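Recovery is defined here as the next turn being correctly understood. A minimal sketch of computing per-strategy recovery rates under that definition, assuming each non-understanding is logged as a hypothetical `(strategy, next_turn_correct)` pair; the event format and toy events are illustrative, not the corpus annotation format.

```python
from collections import defaultdict


def recovery_rates(events):
    """Map each strategy to the fraction of its uses that were followed
    by a correctly understood user turn.

    events: iterable of (strategy, next_turn_correct) pairs
    """
    used = defaultdict(int)
    recovered = defaultdict(int)
    for strategy, next_turn_correct in events:
        used[strategy] += 1
        if next_turn_correct:
            recovered[strategy] += 1
    return {s: recovered[s] / used[s] for s in used}


if __name__ == "__main__":
    toy = [("MOVE", True), ("MOVE", False), ("YLD", False), ("HELP", True)]
    print(recovery_rates(toy))  # {'MOVE': 0.5, 'YLD': 0.0, 'HELP': 1.0}
```

The per-session version of this rate is what would feed the RecoveryRate predictor in the logistic model on this slide.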
