Slide 2: Increased Robustness in Spoken Dialog Systems (roadmap to a thesis proposal)

Dan Bohus, SPHINX Lunch, May 2003

Slide 3: The problem

S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]

Slide 4: Is this really a big problem? And some statistics

Lots of anecdotal evidence, plus CMU Communicator numbers:
- 66% of sessions contain a serious misunderstanding
- 26% result in a complete breakdown in interaction
- The remaining 40% are still frustrating experiences...

[Chart: 66% of sessions contain misunderstandings; 26% of sessions failed]

Slide 5: More statistics... USC study [Shin et al (1)]

- Labeled errors and user behavior on the (multi-site) Communicator corpus
- 1.66 error segments per session, on average
- 78% of error segments get back on track
- 37% of sessions have errors leading to a complete breakdown in interaction

[Chart: 37% of sessions failed]

Slide 6: Yet more statistics... Utterance-level understanding error rates

  System             Error rate   Sessions with misunderstandings   Reference
  CMU Communicator   32.4%        66% of sessions                   [Rudnicky, Bohus et al (2)]
  CU Communicator    27.5%        ... of sessions                   [Segundo (3)]
  HMIHY (AT&T)       36.5%        ... of sessions                   [Walker (4)]
  Jupiter (MIT)      28.5%        ... of sessions                   [Hazen (5)]

Slide 7: It is a significant problem!

Roughly:
- 60-70% of sessions contain misunderstandings
- 10-30% lead to interaction breakdowns

Slide 8: Goal of proposed work

Reduce the number of sessions containing misunderstandings and the number of interaction breakdowns.

Slide 9: Outline

  The problem
→ Sources of the problem
  The approach
  Infrastructure: the RavenClaw framework
  Proposed work, in detail
  Discussion

Slide 10: The problems... in more detail

(The dialog from slide 3, annotated with the specific problems:)
- Low accuracy of speech recognition / recognition errors
- The system incorrectly initiates a disambiguation
- The system is unable to handle the user's response to the disambiguation
- The implicit verification is confusing
- Only one recovery strategy is available to the system: ask the user to repeat

Slide 11: Three contributing factors

1. Low accuracy of speech recognition
2. Inability to assess the reliability of beliefs
3. Lack of efficient error recovery and prevention mechanisms

Slide 12: Factor 1: Low recognition accuracy

ASR is still imperfect at best:
- variability: environmental, speaker
- 10-30% WER in spoken language systems
- tradeoff: accuracy vs. system flexibility

Effect: the main source of errors in SDS
- WER is the most important predictor of user satisfaction [Walker et al (6,7)]
- Users prefer less flexible, more accurate systems [Walker et al (8)]

Slide 13: Factor 2: Inability to assess the reliability of beliefs

Errors typically propagate to the upper levels of the system, leading to:
- non-understandings
- misunderstandings

Effect: misunderstandings are taken as facts and acted upon
- at best: extra turns, user-initiated repairs, frustration
- at worst: complete breakdown in interaction

Slide 14: Factor 3: Lack of recovery mechanisms

- Small number of strategies: implicit and explicit verifications are the most popular
- Sub-optimal implementations
- Triggered in an ad-hoc / heuristic manner
- The problem is often regarded as an add-on: non-uniform, domain-specific treatment

Effect: systems prone to complete breakdowns in interaction

Slide 15: Outline

  The problem
  Sources of the problem
→ The approach
  Infrastructure: the RavenClaw framework
  Proposed work, in detail
  Discussion

Slide 16: Three contributing factors...

1. Low accuracy of speech recognition
2. Inability to assess the reliability of beliefs
3. Lack of efficient error recovery and prevention mechanisms

Slide 17: Approach 1

1. Low accuracy of speech recognition
2. Inability to assess the reliability of beliefs
3. Lack of efficient error recovery and prevention mechanisms

(Approach 1 targets factor 1: improve speech recognition accuracy.)

Slide 18: Approach 2

1. Low accuracy of speech recognition
2. Inability to assess the reliability of beliefs
3. Lack of efficient error recovery and prevention mechanisms

(Approach 2 targets factors 2 and 3: assess the reliability of beliefs and recover from errors at the dialog level.)

Slide 19: Why not just fix ASR?

- ASR performance is improving, but requirements are increasing too
- ASR will not become perfect anytime soon
- ASR is not the only source of errors

Approach 2: ensure robustness under a large variety of conditions.

Slide 20: Proposed solution

Assuming the inputs are unreliable:
A. Make systems able to assess the reliability of their beliefs
B. Optimally deploy a set of error prevention and recovery strategies

Slide 21: Proposed solution, more precisely

Assuming the inputs are unreliable:
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation + updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
B. Optimally deploy a set of error prevention and recovery strategies

Slide 22: Proposed solution, more precisely

Assuming the inputs are unreliable:
1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation + updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point

Do it in a domain-independent manner!

Slide 23: Outline

  The problem
  Sources of the problem
  The approach
→ Infrastructure: the RavenClaw framework
  Proposed work, in detail
  Discussion

Slide 24: The RavenClaw DM framework

- Dialog management framework for complex, task-oriented dialog systems
- Separation between the Dialog Task and Generic Conversational Skills:
  - the developer focuses only on the Dialog Task description
  - the Dialog Engine automatically ensures a minimum set of conversational skills
  - the Dialog Engine automatically ensures the grounding behaviors

Slide 25: RavenClaw architecture

- Dialog Task implemented by a hierarchy of agents
- Information captured in concepts:
  - probability distributions over sets of values
  - support for belief assessment & grounding mechanisms

[Figure: Communicator dialog task tree (Welcome, Login, Travel, Locals, Bye; sub-agents such as AskRegistered, AskName, GreetUser, GetProfile, Leg1, DepartLocation, ArriveLocation) with concepts [UserName], [Registered], [Profile], [Departure], [Arrival]]
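To make the notion of a concept concrete, here is a minimal illustrative sketch (Python for readability, not RavenClaw's actual C++ classes) of a concept that stores a probability distribution over candidate values; the class and method names are assumptions for illustration.

# Illustrative sketch (not RavenClaw code) of a "concept" that holds a
# probability distribution over candidate values, which is what the belief
# assessment and grounding mechanisms operate on.

class Concept:
    def __init__(self, name):
        self.name = name
        self.belief = {}  # candidate value -> probability / confidence

    def add_hypothesis(self, value, confidence):
        # Record a new hypothesized value with its annotated confidence.
        self.belief[value] = confidence

    def top_hypothesis(self):
        # Most likely value and its confidence, or (None, 0.0) if unfilled.
        if not self.belief:
            return None, 0.0
        value = max(self.belief, key=self.belief.get)
        return value, self.belief[value]

# Example: the [Departure] concept after a noisy recognition result.
departure = Concept("Departure")
departure.add_hypothesis("pittsburgh", 0.6)
departure.add_hypothesis("hilton", 0.3)
print(departure.top_hypothesis())  # ('pittsburgh', 0.6)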

Slide 26: Dialog task plus domain-independent grounding

[Figure: RoomLine dialog task (RoomLine > Login, GetQuery, ExecuteQuery, DiscussResults, Bye) coupled with a domain-independent grounding layer: grounding state indicators feed a grounding decision model, which selects the optimal action from a set of strategies / grounding actions]

Slide 27: RavenClaw-based systems

- LARRI [Symphony]: Language-based Assistant for Retrieval of Repair Information
- IPA [NASA Ames]: Intelligent Procedure Assistant
- BusLine [Let's Go!]: Pittsburgh bus route information
- RoomLine: conference room reservation at CMU
- TeamTalk [11-754]: spoken command and control for a team of robots

Slide 28: Outline

  The problem
  Sources of the problem
  The approach
  Infrastructure: the RavenClaw framework
→ Proposed work, in detail
  Discussion

Slide 29: Previous/Proposed Work Overview

1. Compute grounding state indicators
   - reliability of beliefs (confidence annotation + updating)
   - correction detection
   - goodness-of-dialog metrics
   - other: user models, etc.
2. Define the grounding actions
   - error prevention and recovery strategies
3. Create a grounding decision model
   - decides upon the optimal strategy to employ at a given point

Slide 30: Proposed Work, in Detail - Outline
(Transition slide repeating the three-part plan from slide 29; next up: grounding state indicators, starting with the reliability of beliefs.)

Slide 31: Reliability of beliefs

Continuously assess the reliability of beliefs. Two sub-problems:
- Computing the initial confidence in a concept (the confidence annotation problem)
- Updating confidence based on events in the dialog:
  - user reactions to implicit or explicit verifications
  - domain reasoning

Slide 32: Confidence annotation

- Traditionally focused on ASR [Chase (9), ...]
- More recently, interest in confidence annotation geared towards use in SDS [Walker (4), Segundo (3), Hazen (5), Rudnicky, Bohus et al (2)]
  - utterance-level and concept-level confidence annotation
  - integrating multiple features:
    - ASR: acoustic & LM scores, lattice, n-best
    - parser: various measures of parse goodness
    - dialog management: state, expectations, history, etc.
  - 50% relative improvement in classification error
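As an illustration of the multi-feature idea, here is a minimal hypothetical sketch of a concept-level confidence annotator: a logistic-regression classifier over a few ASR, parser, and dialog-state features. The feature names, the data format, and the use of scikit-learn are assumptions for illustration, not the annotator described in the cited work.

# Hypothetical confidence annotator sketch: a binary classifier that maps
# per-concept features (ASR, parser, dialog state) to P(concept is correct).
# Feature names and the training data format are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = [
    "acoustic_score",    # ASR acoustic model score (normalized)
    "lm_score",          # ASR language model score (normalized)
    "nbest_agreement",   # fraction of n-best hypotheses containing the value
    "parse_coverage",    # fraction of the utterance covered by the parse
    "slot_expected",     # 1 if the dialog manager expected this slot, else 0
    "turn_number",       # position in the dialog
]

def to_matrix(samples):
    """samples: list of dicts mapping feature name -> value."""
    return np.array([[s[f] for f in FEATURES] for s in samples])

def train_annotator(samples, labels):
    """labels[i] = 1 if the hypothesized concept value was correct, else 0."""
    model = LogisticRegression(max_iter=1000)
    model.fit(to_matrix(samples), np.asarray(labels))
    return model

def annotate(model, sample):
    """Return an initial confidence score in [0, 1] for one concept hypothesis."""
    return float(model.predict_proba(to_matrix([sample]))[0, 1])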

Slide 33: Confidence annotation: to-do list

- Improve accuracy even more
  - more features / fewer features / better features
- Study transferability across domains
  - Q: Can we identify a set of features that transfer well?
  - Q: Can we use un- or semi-supervised learning, or bootstrap from little data and an annotator in a different domain?

Slide 34: Confidence updating

S: Where are you flying from?

  Pittsburgh/0.6 + San Francisco/0.8          = ?
  Pittsburgh/0.6 + Pittsburgh/0.7             = ?
  Pittsburgh/0.6 + No/0.8                     = ?
  Pittsburgh/0.6 + Impl.Verif + NoContr       = ?
  Pittsburgh/0.6 + Impl.Verif + BargeIn       = ?
  Pittsburgh/0.6 + Impl.Verif + CorrDet/0.68  = ?
  ...

To my knowledge, not really* studied yet!

Slide 35: Confidence updating: approaches

- Naïve Bayesian updating
  - assumptions do not match reality
- Analytical model
  - set of heuristic / probabilistic rules
- Data-driven model
  - define events as features
  - learning task: Initial Conf. + E1 + E2 + E3 ... → Current Conf. {1/0}
- Bypass confidence updating
  - keep all events as grounding state indicators (doesn't lose that much information)
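For concreteness, here is a minimal sketch of the naïve Bayesian-updating idea applied to the examples on slide 34. The event likelihoods are made-up assumptions, and, as the slide notes, the independence assumptions behind this scheme are a poor match for real dialog data.

# Minimal Bayesian-style confidence update sketch (illustrative only).
# P(event | value correct) and P(event | value incorrect); values are made up.
EVENT_LIKELIHOODS = {
    "explicit_confirm_yes":            (0.90, 0.15),
    "explicit_confirm_no":             (0.05, 0.80),
    "implicit_verif_no_contradiction": (0.70, 0.40),
    "implicit_verif_barge_in":         (0.20, 0.60),
}

def update_confidence(conf, event):
    """Treat the current confidence as P(correct) and apply Bayes' rule."""
    p_e_correct, p_e_incorrect = EVENT_LIKELIHOODS[event]
    numerator = conf * p_e_correct
    denominator = numerator + (1.0 - conf) * p_e_incorrect
    return numerator / denominator

# Example: [Departure] = Pittsburgh/0.6, then the user lets an implicit
# verification pass without contradicting it.
conf = update_confidence(0.6, "implicit_verif_no_contradiction")
print(round(conf, 2))  # ~0.72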

Slides 36-37: Proposed Work, in Detail - Outline
(Transition slides repeating the three-part plan from slide 29; next up: correction detection.)

Slide 38: Correction detection

- Automatically detect, at run-time, correction sites or aware sites
  - another data-driven classification task
  - prosodic features, bag-of-words features, lexical markers [Litman (10), Bosch (11), Swerts (12), Levow (13)]
- Useful for:
  - implementing implicit / explicit verifications
  - belief assessment / updating
  - as a direct indicator for grounding decisions
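A minimal hypothetical sketch of such a data-driven aware-site detector is shown below; the marker words, prosodic features, and classifier choice are illustrative assumptions rather than the detectors from the cited work.

# Hypothetical aware-site / correction detector sketch: extract a few lexical
# and prosodic features from a user turn and score it with a trained binary
# classifier. All names and features here are illustrative assumptions.
from dataclasses import dataclass
from sklearn.tree import DecisionTreeClassifier

CORRECTION_MARKERS = {"no", "not", "wrong", "i said", "start over"}

@dataclass
class Turn:
    words: list          # recognized word sequence
    f0_mean: float       # mean pitch (Hz)
    f0_prev_mean: float  # mean pitch of the previous user turn
    duration: float      # seconds
    barge_in: bool       # did the user barge in on the system prompt?

def features(turn: Turn):
    text = " ".join(turn.words)
    return [
        sum(m in text for m in CORRECTION_MARKERS),  # lexical markers present
        turn.f0_mean - turn.f0_prev_mean,            # hyperarticulation proxy
        turn.duration,
        1.0 if turn.barge_in else 0.0,
    ]

def train_detector(turns, labels):
    """labels[i] = 1 if turn i was a correction / aware site, else 0."""
    clf = DecisionTreeClassifier(max_depth=4)
    clf.fit([features(t) for t in turns], labels)
    return clf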

Slide 39: Correction detection: to-do list

- Build an aware-site detector
  - Q: Can we identify what the user is correcting?
- Study transferability across domains
  - Q: Can we identify a set of features that transfer well?
  - Q: Can we use un- or semi-supervised learning, or bootstrap from little data and a detector in a different domain?

Slides 40-41: Proposed Work, in Detail - Outline
(Transition slides repeating the three-part plan from slide 29; next up: goodness-of-dialog metrics.)

Slide 42: Goodness-of-dialog indicators

Assessing how well a conversation is advancing:
- Non-understandings
  - Q: Can we identify the cause?
  - Q: Can we relate a non-understood utterance to a dialog expectation?
- Dialog-state-related indicators / Stay_Here
  - Q: Can we expand this to some "distance to optimal dialog trace"?
- Overall confidence in beliefs within a topic
  - Q: How to aggregate? Entropy-based measures?
- Allow for task-specific metrics of goodness
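One way the entropy-based aggregation question could be answered is sketched below. This is an assumption about a possible metric, not one proposed on the slide: average the normalized entropy of the belief distribution of each concept in the current topic.

# Illustrative entropy-based aggregate: low values mean the system's beliefs
# within the topic are concentrated (good); values near 1.0 mean they are
# close to uniform (bad). The metric itself is an assumption for illustration.
import math

def normalized_entropy(belief):
    """belief: dict mapping candidate values -> probabilities (summing to 1)."""
    if len(belief) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in belief.values() if p > 0)
    return h / math.log(len(belief))

def topic_uncertainty(concept_beliefs):
    """concept_beliefs: list of belief distributions for the concepts in a topic."""
    if not concept_beliefs:
        return 0.0
    return sum(normalized_entropy(b) for b in concept_beliefs) / len(concept_beliefs)

# Example: one confident concept, one ambiguous one.
print(topic_uncertainty([
    {"pittsburgh": 0.9, "hilton": 0.1},
    {"athens_ga": 0.5, "athens_greece": 0.5},
]))  # ~0.73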

Slides 43-44: Proposed Work, in Detail - Outline
(Transition slides repeating the three-part plan from slide 29; next up: the grounding actions.)

Slide 45: Grounding actions

Design and evaluate a rich set of strategies for preventing and recovering from errors (both misunderstandings and non-understandings).

Current status: few strategies used / analyzed
- Explicit verification: "Did you say Pittsburgh?"
- Implicit verification: "traveling from Pittsburgh... when do you want to leave?"

Slide 46: Explicit & implicit verifications

Analysis of user behavior following these two strategies [Krahmer (10), Swerts (11)]:
- User behavior is rich; correction detectors are important!
- Design is important!
  - "Did you say Pittsburgh?"
  - "Did you say Pittsburgh? Please respond 'yes' or 'no'."
  - "Do you want to fly from Pittsburgh?"
- Correct implementation & adequate support are important!
  - Users discovering errors through implicit confirmations are less likely to get back on track... hmm

Slide 47: Strategies for misunderstandings

- Explicit verification (with variants)
- Implicit verification (with variants)
- Disambiguation
  - "I'm sorry, are you flying out of Pittsburgh or San Francisco?"
- Rejection
  - "I'm not sure I understood what you said. Can you tell me again where are you flying from?"

Slide 48: Strategies for non-understandings (I)

- Lexically entrain
  - "Right now I need you to tell me the departure city... You can say, for instance, 'I'd like to fly from Pittsburgh'."
- Ask repeat
  - "I'm not sure I understood you. Can you repeat that please?"
- Ask reformulate
  - "Can you please rephrase that?"
- Diagnose
  - If the source of the non-understanding can be known or estimated, give that information to the user: "I can't hear you very well. Can you please speak closer to the microphone?"

Slide 49: Strategies for non-understandings (II)

- Select an alternative plan: domain-specific strategies
  - e.g. try to get the state name first, then the city name
- Establish context (& confirm-context variant)
  - "Right now I'm trying to gather enough information to make a room reservation. So far I know you want a room on Tuesday. Now I need to know for what time you need the room."
- Give targeted help
  - give help on the topic / focus of the conversation / estimated user goal
- Constrain the language model / recognition

Slide 50: Strategies for non-understandings (III)

- Switch input modality (e.g. DTMF, pen, etc.)
- Restart topic / back up the dialog
- Start over
- Switch to an operator
- Terminate the session
- ...

Slide 51: Grounding strategies: to-do list

Design, implement, analyze, iterate:
- Human-human dialog analysis
- Design the strategies, with variants and appropriate support
- Implement in the RavenClaw framework
- Perform data-driven analysis:
  - Q: user behaviors
  - Q: applicability conditions
  - Q: costs, success rates

Slide 52: Proposed Work, in Detail - Outline
(Transition slide repeating the three-part plan from slide 29; next up: the grounding decision model.)

Slide 53: Grounding decision model

Decide which is the best grounding action to take at a given time.

Goals / desired properties:
- Domain-independent
- Adaptive
  - learn and target any dialog performance metric
  - adjust to large variations in the reliability of the inputs
  - accept new strategies on the fly
- Scalable

Slide 54: Previous work

- Conversation as action under uncertainty [Horvitz (14), Paek (15)]
  - Bayesian decision theory with assumed utilities
- Reinforcement learning in spoken dialog systems [Kearns (16), Singh (17), Pieraccini (18), Litman (19), Walker (20)]
  - learning dialog policies
- Heuristic approaches [add refs]
  - predominant in today's systems

Slide 55: Grounding: a decision-theoretic approach

Given:
- a set of states S = {s} and a probabilistic model of the state given some evidence e, P(s|e)  → grounding state indicators
- a set of actions A = {a}  → grounding actions
- a model describing the utility of each action from each state, U(s,a)  → grounding model

Take the action that maximizes expected utility:

  EU(a|e) = Σ_{s∈S} U(a,s) · P(s|e)

Slide 56: The missing ingredient: utilities

Utilities matrix (S x A):

  State        Explicit Verification   Implicit Verification   No Action
  Correct      0                       0                       10
  Incorrect    10                      5                       -10
  Unavailable  x                       x                       10

Handcraft, or learn from data.
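To tie the two slides together, here is a minimal sketch (an illustrative assumption about how the pieces would fit, not RavenClaw code) that plugs the handcrafted utility matrix above into the expected-utility rule from slide 55 and returns the action with the highest expected utility.

# Minimal expected-utility sketch: EU(a|e) = sum over states s of U(a,s) * P(s|e).
# Utility values are the handcrafted ones from the slide; the example beliefs
# are made-up stand-ins for what grounding state indicators might produce.

STATES = ["correct", "incorrect", "unavailable"]

# U(a, s); None marks the inapplicable ("x") entries from the slide.
UTILITIES = {
    "explicit_verification": {"correct": 0,  "incorrect": 10,  "unavailable": None},
    "implicit_verification": {"correct": 0,  "incorrect": 5,   "unavailable": None},
    "no_action":             {"correct": 10, "incorrect": -10, "unavailable": 10},
}

def expected_utility(action, belief):
    """belief: dict mapping state -> P(s|e), as estimated by the state indicators."""
    total = 0.0
    for s in STATES:
        u = UTILITIES[action][s]
        if u is None:
            if belief.get(s, 0.0) > 0.0:
                return float("-inf")  # action not applicable in a possible state
            continue
        total += u * belief.get(s, 0.0)
    return total

def choose_action(belief):
    return max(UTILITIES, key=lambda a: expected_utility(a, belief))

# Example: the confidence annotator says the concept value is probably right.
print(choose_action({"correct": 0.8, "incorrect": 0.2, "unavailable": 0.0}))  # no_action
print(choose_action({"correct": 0.5, "incorrect": 0.5, "unavailable": 0.0}))  # explicit_verification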

Slide 57: Learning utilities

[Figure: decision diagram over states C / IC, actions EV / IV / NGA, and a utility node U]

Essentially a POMDP problem:
- Hidden state: belief dictated by the grounding state indicator models
- Actions: the strategies
- Rewards: the targeted optimization measures
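For reference, a structural sketch of how the grounding POMDP could be written down is given below, together with the standard belief-update step. The component names are generic assumptions; in the proposed work the belief would instead be dictated by the grounding state indicator models, and the solver is left unspecified here.

# Sketch of the grounding problem as a POMDP (illustrative structure only).
from dataclasses import dataclass, field

@dataclass
class GroundingPOMDP:
    states: list          # hidden states, e.g. ["correct", "incorrect"]
    actions: list         # grounding actions, e.g. ["explicit_verification", ...]
    observations: list    # e.g. ["confirm", "deny", "no_reaction"]
    # transition[a][s][s'], observation[a][s'][o], reward[a][s]
    transition: dict = field(default_factory=dict)
    observation: dict = field(default_factory=dict)
    reward: dict = field(default_factory=dict)

def belief_update(pomdp, belief, action, obs):
    """Standard POMDP belief update: b'(s') ∝ O(o|s',a) * Σ_s T(s'|s,a) * b(s)."""
    new_belief = {}
    for s_next in pomdp.states:
        predicted = sum(pomdp.transition[action][s][s_next] * belief[s]
                        for s in pomdp.states)
        new_belief[s_next] = pomdp.observation[action][s_next][obs] * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()} if norm > 0 else belief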

Slide 58: A possible overall architecture

Two types of grounding models:
- for misunderstandings: one grounding model per concept
- for non-understandings: one grounding model per agent

[Figure: the Communicator dialog task tree from slide 25]

Slide 59: A possible overall architecture

Q: How to combine the decisions?
- Identify a small set of rules, e.g.: concepts first, then agents, focused-to-top
- Hierarchical POMDP approaches? [Roy, Pineau, Thrun]

[Figure: the Communicator dialog task tree from slide 25]
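The following sketch is one possible reading of the "concepts first, then agents, focused-to-top" rule (an assumption for illustration, not RavenClaw's actual policy): concept-level grounding suggestions win over agent-level ones, and agent-level suggestions are consulted from the focused agent upward.

# Hypothetical combination rule: ground unreliable concepts first; otherwise
# walk the agent stack from the focused agent up to the root and take the
# first agent-level grounding action suggested along the way.

def combine_decisions(focused_agent_path, concept_suggestions, agent_suggestions):
    """
    focused_agent_path: agents from the focused agent up to the root,
                        e.g. ["DepartLocation", "Leg1", "Travel", "Communicator"].
    concept_suggestions: dict concept -> suggested action or None.
    agent_suggestions: dict agent -> suggested action or None.
    """
    # 1. Concepts first.
    for concept, action in concept_suggestions.items():
        if action is not None:
            return ("concept", concept, action)
    # 2. Then agents, focused-to-top.
    for agent in focused_agent_path:
        action = agent_suggestions.get(agent)
        if action is not None:
            return ("agent", agent, action)
    return None  # no grounding action needed; continue with the dialog task

# Example: nothing to ground at the concept level, but the focused agent
# suggests targeted help after a non-understanding.
print(combine_decisions(
    ["DepartLocation", "Leg1", "Travel", "Communicator"],
    {"[Departure]": None},
    {"DepartLocation": "give_targeted_help"},
))  # ('agent', 'DepartLocation', 'give_targeted_help')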

Slide 60: A possible overall architecture

Q: Formulate a parallel learning problem
- large numbers of small models are good in principle
- need to clearly identify the assumptions
- or formulate a hierarchical learning problem

[Figure: the Communicator dialog task tree from slide 25]

Slide 61: Proposed Work, in Detail - Outline
(Transition slide repeating the three-part plan from slide 29; next up: evaluation.)

Slide 62: Evaluation

(Recap of the goal from slide 8: reduce the number of sessions containing misunderstandings and the number of interaction breakdowns.)

Slide 63: Evaluation

- Evaluate the proposed framework across a large variety of domains
  - RoomLine, BusLine, LARRI, TeamTalk, etc.
- Grounding state indicators evaluation
  - internal metrics, e.g. accuracy, etc.
- Grounding strategies analysis
  - empirical analysis
  - quantitative assessments: costs, success rates
  - qualitative insights: user behaviors, best variants

Slide 64: Evaluation

Grounding model / framework evaluation (in terms of the chosen performance metric):
- against an expert heuristic strategy
- against a smaller number of strategies
- against a non-adaptive system
