
Planning, Optimizing, and Characterizing


1 Planning, Optimizing, and Characterizing
Planning, Optimizing, and Characterizing: Three approaches to dialogue management
Presented by Lee Becker, October 21, 2009

2 Introduction “The real art of conversation is not only to say the right thing at the right place but to leave unsaid the wrong thing at the tempting moment.” – Dorothy Neville

3 Sample Dialogue
1 Tutor: Hi, how are you?
2 Student: Good
3 Tutor: Excellent, let's talk a bit about your experiences with science. Tell me about what you have been doing in science most recently.
4 Student: We've been learning about circuits and how to work light bulbs
5 Tutor: Circuits and working light bulbs, cool. Tell me more about circuits

4 The Dialogue Management Problem
Giving an appropriate response
Understanding what was said and how it fits into the overall conversational context
Responding with intention
Obeying social norms: turn-taking, feedback / acknowledgment

5 Dialogue as Planning “Words are also actions, and actions are a kind of words.” – Ralph Waldo Emerson

6 System View of Dialogue Management
[Diagram: User Utterance -> Dialogue Manager -> Response]
This is more of a tool view with I/O. An agent can allow for richer interaction.

7 Planning Agents
Maintain a state of the world (beliefs)
Predetermined wants (desires) specify how the world should look
Select goals (intentions)
Build/execute a plan
Belief monitoring
BDI architecture

8 Blocks World Example
Init(On(A, Table) ^ On(B, Table) ^ On(C, Table) ^ Block(A) ^ Block(B) ^ Block(C) ^ Clear(A) ^ Clear(B) ^ Clear(C))
Goal(On(A, B) ^ On(B, C))
Action(Move(b, x, y))
  Precondition: On(b, x) ^ Clear(b) ^ Clear(y) ^ Block(b) ^ (b != x) ^ (b != y) ^ (x != y)
  Effect: On(b, y) ^ Clear(x) ^ ~On(b, x) ^ ~Clear(y)
Action(MoveToTable(b, x))
  Precondition: On(b, x) ^ Clear(b) ^ Block(b) ^ (b != x)
  Effect: On(b, Table) ^ Clear(x) ^ ~On(b, x)
This planning example shows the initial state, the goal state, and the actions. Using forward or backward state-space search, you can find the plan necessary to solve the problem.
[Diagram: initial and goal block configurations]
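To make the search concrete, here is a minimal forward state-space search sketch for this problem (my own illustration, not from the slides). A state maps each block to what it sits on, and Clear(b) holds when nothing sits on b.

from collections import deque

BLOCKS = ("A", "B", "C")
TABLE = "Table"

def clear(state, x):
    # Clear(x): no block is currently sitting on x
    return all(support != x for support in state.values())

def successors(state):
    # Apply the Move / MoveToTable schemas to generate (action, next_state) pairs
    for b in BLOCKS:
        if not clear(state, b):
            continue
        if state[b] != TABLE:
            yield (("MoveToTable", b), {**state, b: TABLE})
        for y in BLOCKS:
            if y != b and y != state[b] and clear(state, y):
                yield (("Move", b, y), {**state, b: y})

def plan(init, goal):
    # Breadth-first forward search over states; returns a list of actions
    frontier = deque([(init, [])])
    seen = {tuple(sorted(init.items()))}
    while frontier:
        state, actions = frontier.popleft()
        if all(state[b] == support for b, support in goal.items()):
            return actions
        for action, nxt in successors(state):
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                frontier.append((nxt, actions + [action]))
    return None

init = {b: TABLE for b in BLOCKS}   # On(A,Table) ^ On(B,Table) ^ On(C,Table), all Clear
goal = {"A": "B", "B": "C"}         # On(A,B) ^ On(B,C)
print(plan(init, goal))

Running it prints a two-step plan such as [('Move', 'B', 'C'), ('Move', 'A', 'B')].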

9 Speech acts and planning
Planning is intuitive for physical actions
How can utterances fit into a plan?
"Can you give me the directions to The Med?"
"Did you take out the trash?"
"I will try my best to be at home for dinner"
"I name this ship the Queen Elizabeth"
Speech Acts (Austin, Searle)
Illocutionary force
Performative action

10 Speech Acts Meet AI
Allen, Cohen, and Perrault
Speech acts expressed in terms of preconditions and effects
Related to changes in agents' mental states
Plans are sequences of speech acts

11 Example Speech Acts
REQUEST(speaker, hearer, act):
  effect: speaker WANT hearer DO act
INFORM(speaker, hearer, proposition):
  effect: KNOW(hearer, proposition)
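As a rough illustration of how such operators plug into a planner, the sketch below encodes speech acts as objects with precondition and effect sets over mental-state facts. The specific acts, predicates, and preconditions are my own simplified assumptions, not the Allen/Cohen/Perrault formalism itself.

from dataclasses import dataclass

@dataclass(frozen=True)
class SpeechAct:
    name: str
    preconditions: frozenset   # facts that must hold before the act
    effects: frozenset         # facts added by the act

def applicable(act, state):
    return act.preconditions <= state

def perform(act, state):
    return state | act.effects

# INFORM(S, H, P): speaker must know P; afterwards the hearer knows P
inform = SpeechAct(
    name="INFORM(S, H, train_leaves_at_3)",
    preconditions=frozenset({("KNOW", "S", "train_leaves_at_3")}),
    effects=frozenset({("KNOW", "H", "train_leaves_at_3")}),
)
# REQUEST(S, H, act): afterwards it holds that S wants H to do the act
request = SpeechAct(
    name="REQUEST(S, H, load_oranges)",
    preconditions=frozenset(),
    effects=frozenset({("WANT", "S", ("DO", "H", "load_oranges"))}),
)

state = frozenset({("KNOW", "S", "train_leaves_at_3")})
for act in (inform, request):
    if applicable(act, state):
        state = perform(act, state)
print(("KNOW", "H", "train_leaves_at_3") in state,
      ("WANT", "S", ("DO", "H", "load_oranges")) in state)   # True True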

12 TRAINS
A descendant of the Allen, Cohen, and Perrault BDI + speech acts tradition
A conversational agent for logistics and planning
Users converse with a "Manager" to develop a plan of action in the TRAINS domain.

13 Sample TRAINS Scenario
User: (1) We have to make OJ. (2) There are oranges at I. (3) And an OJ factory at B. (4) Engine E3 is scheduled to arrive at I at 3PM. (5) Shall we ship the oranges?
System: (6) Yes. (7) Shall I start loading the oranges in the empty car at I?
(8) Yes, and we'll have E3 pick it up. (9) OK? (10) OK
[Map: City B (OJ factory); City I (orange source, empty car, Engine E3); City G (banana source, empty car)]

14 Deliberative Agent

15 Communicative Agent

16 Discourse Obligations
BDI does not account for what compels one speaker to answer another
Two Strangers example:
A: Do you know the time?
B: Sure. It's half past two.
Answering Person A's question does not help Person B attain his own goals.

17 Discourse Obligations
Obligations, like speech acts, produce observable effects on the speaker.
Source of obligation                   | Obliged action
S1 Accept or Promise A                 | S1 achieve A
S1 Request A                           | S2 address Request: accept or reject A
S1 Y/N-Question whether P              | S2 Answer-if P
S1 Wh-Question                         | S2 Inform-ref x
Utterance not understood or incorrect  | Repair utterance
S1 Initiate discourse unit             | S2 acknowledge discourse unit
Request Repair of P                    | Repair P
Request Acknowledgement of P           | Acknowledge P

18 Discourse Obligations
Inherent tension between obligations and goals
Approaches:
  Perform all obligated actions
  Perform only actions that will lead to a desired state
  A blend of the other two approaches

19 TRAINS Discourse Obligations
loop
  if system has obligations then
    address obligations
  else if system has performable intentions then
    perform actions
  else
    deliberate on goals
  end if
end loop
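A minimal Python rendering of this loop, with toy stand-ins for the TRAINS reasoning components (the class, attribute names, and example items below are my own assumptions):

class DialogueAgent:
    def __init__(self):
        self.obligations = []   # pending discourse obligations, e.g. answer a question
        self.intentions = []    # adopted actions that are currently performable
        self.goals = []         # longer-term goals to deliberate over

    def step(self):
        # One pass of the loop: obligations first, then intentions, then goals
        if self.obligations:
            return "address " + self.obligations.pop(0)
        if self.intentions:
            return "perform " + self.intentions.pop(0)
        if self.goals:
            return "deliberate on " + self.goals[0]
        return "idle"

agent = DialogueAgent()
agent.goals.append("build a shipping plan for the oranges")
agent.obligations.append("answer: 'Shall we ship the oranges?'")
print(agent.step())   # address answer: 'Shall we ship the oranges?'
print(agent.step())   # deliberate on build a shipping plan for the oranges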

20 TRAINS Discourse Obligations
Ensure system cooperation even if the response is in conflict with the user's goals
Aids in developing mixed initiative:
  Goal-driven actions -> speaker-led initiative
  Obligation-driven actions -> other-led initiative

21 Mutual Belief and Grounding
Conversational agents do not act in isolation
Mental states should account for:
  Private beliefs
  Shared beliefs
In TRAINS, shared belief is needed for:
  Modeling the domain plan under construction
  Common understanding

22 Mutual Belief and Grounding
Extended conversation acts:
Act Type         | Description                                                 | Sample Acts
Turn-Taking      | Maintain, release, or take the turn in the dialogue         | take-turn, keep-turn
Grounding        | Deal with establishing shared knowledge about the dialogue  | repair, acknowledge
Core Speech Acts | Original illocutionary acts                                 | inform, yes-no-question, accept, request
Argumentation    | Characterize the relationship between utterances            | elaborate, Q&A

23 The TRAINS Approach
Attempts to capture the processes underlying dialogue via:
  Speech acts
  Discourse obligations
  Mutual belief and grounding
Potentially rigid
Rules and logic are handcrafted

24 Dialogue as a Markov Decision Process
“When one admits that nothing is certain, one must, I think, also admit that some things are much more nearly certain than others.” – Bertrand Russell

25 Flexible Dialogue
Qualities of robust dialogue:
  Flexible conversation flow
  Adapted to users' preferences / skill levels
  Resilient to errors in understanding
The dialogue author's dilemma: robustness vs. effort
Other issues:
  Noisy channels: ASR, NLU
  Evaluation: what is an optimal decision?

26 Dialogue with uncertainty
Markov Decision Process (MDP)
A probabilistic framework
Able to model planning and decision-making over time
Based on the Markov assumption:
  Future states depend only on the current state
  Future states are independent of earlier states

27 Markov Decision Processes
Markov chains with choice!

28 Markov Decision Processes
An agent-based process defined by a 4-tuple (S, A, T, R):
  S: a set of states describing the agent's world
  A: a set of actions the agent may take
  T: a set of transition probabilities, P_a(s, s') = P(s'|s, a) = P(s_{t+1} = s' | s_t = s, a_t = a)
  R: a set of rewards, where r(s, a) is the expected immediate reward the agent receives for taking action a in state s

29 Markov Decision Processes
Policy function π(s): a mapping of states to actions
The optimal policy π*(s) yields the highest possible cumulative reward
An MDP with a fixed policy is a Markov chain
Rewards: cumulative reward
The Bellman equation: the expected cumulative reward for a given state/action pair is the immediate reward for the current state plus the expected discounted utility of all possible next states s', weighted by the probability of moving to each state s', assuming that once there we take the optimal action a' [Jurafsky, Martin 2008]
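In symbols, a standard statement of the Bellman optimality equation over Q-values (with discount factor \gamma; the exact notation used on the original slide is not preserved in this transcript):

  Q^*(s, a) = r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \max_{a'} Q^*(s', a')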

30 Solving an MDP
Approaches: value iteration, policy iteration, Q-learning
Ideally:
  Encode the state space with relevant features and rewards
  Compute state transition and reward probabilities directly from a corpus of annotated dialogues
In practice:
  Reduce the state space and do random exploration
  Simulate a user and produce a corpus
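For concreteness, here is a compact value-iteration sketch over a tabular MDP (my own illustration; the toy two-state "dialogue", its transition probabilities, and its rewards are invented for the example):

def value_iteration(states, actions, T, R, gamma=0.9, tol=1e-6):
    # T[s][a] is a list of (next_state, probability) pairs; R[s][a] is the immediate reward
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a]) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a]))
              for s in states}
    return V, policy

# Toy example: in an "unconfirmed" state, confirming is safe but slow, assuming is risky
states, actions = ["unconfirmed", "done"], ["confirm", "assume"]
T = {"unconfirmed": {"confirm": [("done", 1.0)],
                     "assume": [("done", 0.7), ("unconfirmed", 0.3)]},
     "done": {"confirm": [("done", 1.0)], "assume": [("done", 1.0)]}}
R = {"unconfirmed": {"confirm": -1.0, "assume": -0.5},
     "done": {"confirm": 0.0, "assume": 0.0}}
print(value_iteration(states, actions, T, R))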

31 Reinforcement Learning for Dialogue Policy Design
NJFun system [Singh et al., 2002]

32 NJFun Sample Dialogue
S1: Welcome to NJFun. How may I help you?
U1: I'd like to find um winetasting in Lambertville in the morning. (ASR: I'd like to find out wineries the in the Lambertville in the morning)
S2: Did you say you are interested in Lambertville?
U2: Yes.
S3: Did you say you want to go in the morning?
U3: Yes.
S4: I found a winery near Lambertville that is open in the morning. It is the Poor Richard's Winery in Lambertville. Thank you for using the system. Please give me feedback by saying 'good', 'so-so', or 'bad'.
U4: Good.
S5: Goodbye and have a nice day.

33 NJFun – State Spaces
Full state space: encode everything
  State explosion quickly makes optimization intractable
State-space estimator: encode sufficient information to learn good decisions

34 NJFun – State Space Estimator
Limited policy optimization for two types of decisions:
Initiative – direct vs. open-ended
  System initiative: "Please say the name of the town or city that you are interested in."
  User initiative: "Please give me more information."
Confirmation – verify vs. assume
  "Did you say you are interested in <location>?"

35 NJFun State Features & Values
Feature                   | Values        | Explanation
Greet (G)                 | 0, 1, 2       | Whether the system has greeted the user
Attribute (A)             | 1, 2, 3, 4    | Which attribute is being worked on
Confidence/Confirmed (C)  | 0, 1, 2, 3, 4 | 0, 1, 2 for low, medium, and high ASR confidence; 3, 4 for explicitly confirmed and disconfirmed
Value (V)                 | 0, 1          | Whether a value has been obtained for the current attribute
Tries (T)                 | 0, 1, 2       | How many times the current attribute has been asked
Grammar (M)               | 0, 1          | Whether a non-restrictive or restrictive grammar was used
History (H)               | 0, 1          | Whether there was trouble on any previous attribute
(Grammar = ASR language model)

36 State Space Estimator
Features yield 62 possible dialogue states
42 choice states, each with 2 actions per state:
  Confirm / not confirm
  System / user initiative
In total, 2^42 unique dialogue trajectories

37 Finding an Optimal Policy
Gathering training data:
  New system built with a randomized dialogue policy
  Deployed to 54 users, each assigned 6 tasks
  311 dialogues in total
Reward measure: binary task completion
  +1: dialogues that queried for the exact set of attributes (activity type, location, time of day, etc.)
  -1: otherwise
Reinforcement learning

38 Finding an Optimal Policy
RL-learned policy:
Initiative:
  Begin with user initiative
  Back off to mixed or system initiative
Confirmation:
  In general, confirm at lower ASR confidence values
Other features describe more complex interventions

39 Evaluating the Optimal Policy
System with the optimal policy tested on an additional 21 users
124 test dialogues
Did not significantly outperform the baseline on the binary completion measure (p = 0.059)
Statistically significant improvement using weak completion and ASR measures

40 Limited Observability
MDPs assume the world is fully observable
However:
  Not all errors or states are directly observable
  Undetected errors may propagate
  Evidence may not indicate an error

41 Limited Observability
Turn | Utterance [ASR output]
S1   | Are you a registered user?
U2   | No, I'm not [NO I'M NOT]
S3   | What is your full name?
U4   | <Username> [ATHENS IN AKRON]
S5   | There is an Athens in Georgia and in Greece. What destination did you want?
U6   | Neither [NEITHER]
S7   | I'm sorry, I did not catch that. There is an Athens in Georgia and in Greece. Which destination do you want?
U8   | Georgia [GEORGIA]
S9   | A flight from Athens... Where do you want to go?
U10  | Start over [START OVER]

42 Partially Observable Markov Decision Processes (POMDPS)
Intuition:
  Maintain parallel hypotheses of what was said
  Backpedal or switch strategies when a hypothesis becomes sufficiently unlikely

43 POMDP Example
Each turn contrasts the POMDP belief state (a distribution over sizes S/M/L, shown as a bar chart on the slide) with the traditional method's single frame:
Initial state | [belief over S/M/L] | Traditional: Order { size: <null> }
S: How can I help you?  U: A small pepperoni pizza. [A small pepperoni pizza.] | [belief over S/M/L] | Traditional: Order { size: small }
S: Ok, what toppings?  U: A small pepperoni [A small pepperoni] | [belief over S/M/L] | Traditional: Order { size: small }
S: And what type of crust?  U: Uh just normal [large normal] | [belief over S/M/L] | Traditional: Order { size: large[?] }

44 A comparison of Markov Models
Are states completely observable? \ Do we have control over state transitions? | No | Yes
Yes | Markov chain | MDP
No  | HMM          | POMDP
Table courtesy of

45 POMDPs Extend the MDP Model
O: a set of observations the agent can receive about the world
Z: observation probabilities
b(s): the belief state, the probability of being in state s
The agent is not in a fixed known state; instead it maintains a probability distribution over all possible states

46 POMDPs Belief Monitoring
Shifting probability mass to match observations Optimal action depends only on the agent’s current belief state
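A small numerical sketch of belief monitoring (my own illustration; the toy states, probabilities, and observation labels are invented): the new belief b'(s') is proportional to Z(o | s', a) * sum_s T(s' | s, a) * b(s), i.e. probability mass shifts toward the states that best explain the observation.

def update_belief(belief, action, observation, T, Z):
    # belief: dict state -> prob; T[(s, a)]: dict next_state -> prob; Z[(s2, a)]: dict obs -> prob
    new_belief = {}
    for s2 in belief:
        predicted = sum(T[(s, action)].get(s2, 0.0) * p for s, p in belief.items())
        new_belief[s2] = Z[(s2, action)].get(observation, 0.0) * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Toy example: is the user's goal a "small" or a "large" pizza?
states = ["small", "large"]
T = {(s, "ask"): {s: 1.0} for s in states}   # asking does not change the user's goal
Z = {("small", "ask"): {"heard_small": 0.8, "heard_large": 0.2},
     ("large", "ask"): {"heard_small": 0.3, "heard_large": 0.7}}
belief = {"small": 0.5, "large": 0.5}
belief = update_belief(belief, "ask", "heard_small", T, Z)
print(belief)   # mass shifts toward "small" (about 0.73 vs 0.27)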

47 POMDPs Influence Diagram
R: reward, A: action, S: state, O: observation
Decision cycle:
  Given the current belief state b, execute the action a = π*(b)
  Receive observation o
  Set the current belief state to forward(b, a, o) and repeat
forward = belief state monitoring [Norvig, 2006]

48 POMDPs for Spoken Dialogue Systems
SDS-POMDP [Williams and Young, 2007]
Claim: POMDPs perform better for SDS because they:
  Maintain parallel dialogue states
  Can incorporate ASR confidence scores directly in the belief state update

49 SDS-POMDP Architecture
[Architecture diagram labels: Action; Audio signal; ~A recognized action; C confidence; S_d dialogue history; S_u internal user state]

50 SDS-POMDP Components
Component            | Standard POMDP | SDS-POMDP
State set            | S              | (S_u, A_u, S_d)
Observation set      | O              | (~A_u, C)
Action set           | A              | (A_m)
Transition function  | p(s'|s, a)     | p(s'_u|s_u, a_m) p(a'_u|s'_u, a_m) p(s'_d|a'_u, s_d, a_m)
Observation function | p(o'|s', a)    | p(~a'_u, c'|a'_u)
Reward function      | r(s, a)        | r(s_u, a_u, s_d, a_m)
Belief state         | b(s)           | b(s_u, a_u, s_d)
p(s'_u|s_u, a_m): the user goal model, indicating how the user's goal changes at each time step
p(a'_u|s'_u, a_m): the user action model, indicating what actions the user is likely to take
p(s'_d|a'_u, s_d, a_m): the dialogue history model, tracking the relevant historical information of the dialogue
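Putting these pieces together, the belief update over the factored state takes roughly the following form (my paraphrase, derived from the factored transition and observation functions above; the normalization constant k and the exact arrangement of the sums are assumptions, so consult Williams and Young (2007) for the precise expression):

  b'(s'_u, a'_u, s'_d) = k \cdot p(\tilde{a}', c' \mid a'_u) \, p(a'_u \mid s'_u, a_m) \sum_{s_u} p(s'_u \mid s_u, a_m) \sum_{s_d} p(s'_d \mid a'_u, s_d, a_m) \sum_{a_u} b(s_u, a_u, s_d)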

51 SDS-POMDP Experiments
Travel-domain test-bed simulation
Users are asked a series of questions and then finalize a ticket purchase
16 available actions (greet, ask-from, ask-to, confirm-to-x, confirm-from-x, submit-x-y, ...), as well as a fail action to start over
1945 total dialogue states

52 SDS-POMDP Experiments
Reward function based on task completion and "dialogue appropriateness":
  Confirming an item before the user has referenced it: -3
  Aborting the dialogue: -5
  Issuing the correct submit-x-y query: +10
  Issuing an incorrect submit-x-y query: -10
  All other actions: -1

53 SDS-POMDP Experiments
Finding a policy:
  Created user simulations:
    Handcrafted, with probabilities chosen to make the user cooperative but varied
    A model based on real data (10,000 turns)
  Trained using Perseus (Spaan and Vlassis, 2005), a variant of point-based value iteration
Significantly outperformed an MDP trained for the same domain

54 Markov Decision Processes
MDPs and their derivatives provide a rich representation for automatic dialogue planning
Probabilistic underpinnings accommodate uncertainty
Challenges remain in scaling to more complex scenarios

55 Learning Dialogue Structure
“Every discourse, even a poetic or oracular sentence, carries with it a system of rules for producing analogous things and thus an outline of methodology.” – Jacques Derrida

56 Learning Dialogue Structure
Approaches thus far:
  Do not address the effort required to author dialogue states
  Are highly tuned to a specific task
  Have not addressed similarities from dialogue to dialogue or task to task

57 Learning Dialogue Structure
[Bangalore et al., 2006]
Characterize dialogue structure
Move towards data-driven creation of SDS components
Learn models for predicting dialogue acts and sub-task structure from several corpora of task-driven human-human dialogues

58 Dialogue Tree Structure
Dialogue is an incremental process

59 Utterance analysis and classification
Lowest level of the dialogue structure hierarchy
A segmenter splits utterances into individual clauses
Syntactic annotation is done via a supertagger [Bangalore and Joshi, 1999] and Tree-Adjoining Grammar (TAG) [Joshi, 1987] operations, which give:
  A dependency analysis
  Predicate-argument structure

60 Utterance analysis and classification
Dialogue act annotation:
  DAMSL [Core, 1998] is too general
  Utilized acts specific to the customer service domain
  Dialogue acts: ask, explain, conversational, request
  Sub-types: info, order_info, product_info, hello, thanks, repeat, order_status, ...
Several corpora used to train a dialogue act tagger
Features:
  Speaker information
  Current and previous utterances (word trigrams)
  Supertagged utterance

61 Modeling Subtask Structure
Dialogue acts and utterances hint at the conversational context
Knowing how the utterance fits into the overall flow and sequencing of tasks can help in deciding the next action
Two approaches to deriving structure:
  Chunking
  Parsing

62 Modeling Subtask Structure
Chunk Model Parse Model

63 Modeling Subtask Structure
Chunk model:
  A BIO (beginning, inside, outside) sequence classifier
  Given a sequence of utterances U = u_1, u_2, ..., u_n, find the best subtask labels ST, drawn from {st_B, st_I, st_O}:
  ST* = argmax_ST P(ST | U)
Parse model:
  Like incremental parsing: find the most likely plan tree PT:
  PT* = argmax_PT P(PT | U)
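A tiny sketch of how BIO subtask labels over a sequence of utterances are read back as subtask segments, as in the chunk model above (my own example; the subtask names "opening", "order_item", and "closing" are invented, not labels from the corpora):

def bio_to_segments(labels):
    # labels: list of (tag, subtask) pairs with tag in {"B", "I", "O"}
    segments, current = [], None
    for i, (tag, subtask) in enumerate(labels):
        if tag == "B":                  # start a new subtask segment
            if current:
                segments.append(current)
            current = (subtask, [i])
        elif tag == "I" and current:    # continue the open segment
            current[1].append(i)
        else:                           # "O": outside any subtask
            if current:
                segments.append(current)
                current = None
    if current:
        segments.append(current)
    return segments

labels = [("B", "opening"), ("I", "opening"),
          ("B", "order_item"), ("I", "order_item"), ("I", "order_item"),
          ("B", "closing")]
print(bio_to_segments(labels))
# [('opening', [0, 1]), ('order_item', [2, 3, 4]), ('closing', [5])]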

64 Modeling Subtask Structure
Sequence-prediction performance of the chunk and parse models was comparable
The chunk model's efficiency is better suited to dialogue's real-time demands
The parse model's extra structure provides little additional information

65 Data-Driven Dialogue Characterization
Advantages:
  Does not suffer from issues of scalability or tractability
  Rapid prototyping and domain adaptation
Drawbacks:
  Cost of collecting and annotating large corpora
  Generality of the technique is still unknown

66 Conclusions “We demand rigidly defined areas of doubt and uncertainty.” – Douglas Adams

67 Conclusions
Takeaways:
Planning / TRAINS:
  Breakdown of conversational tasks and units
  Preconditions/effects of utterances
MDPs:
  Statistical mechanism for optimizing dialogue behavior
  Mathematically grounded model
Data-driven dialogue characterization:
  Attempt to understand the global flow and structure of conversation

68 Conclusions
Drawbacks:
Planning:
  Handcrafted
  Rigid
MDPs:
  State space quickly explodes
  Training data is problematic
  Rewards are handcrafted
Data-driven dialogue characterization:
  Does not address the decision-making process
  Annotation may be expensive

69 Conclusions
Moving forward:
  Perhaps sub-optimal policies are sufficient
  Can we move beyond fine-tuning individual dialogue systems?
  Do the lessons learned from these approaches extend beyond task-oriented dialogue?
  How does the semantic content of the utterance influence decision making?

70 References
[Allen et al., 1995] Allen, J. F., Schubert, L. K., Ferguson, G., Heeman, P., Hwang, C. H., Kato, T., Light, M., Martin, N. G., Miller, B. W., Poesio, M., and Traum, D. R. (1995). The TRAINS project: A case study in building a conversational planning agent. Journal of Experimental and Theoretical AI, 7.
[Bangalore et al., 2006] Bangalore, S., Di Fabbrizio, G., and Stent, A. (2006). Learning the structure of task-driven human-human dialogs. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 201–208, Morristown, NJ, USA. Association for Computational Linguistics.
[Singh et al., 2002] Singh, S., Litman, D., Kearns, M., and Walker, M. (2002). Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system. Journal of Artificial Intelligence Research, 16:105–133.
[Traum, 1996] Traum, D. R. (1996). Conversational agency: The TRAINS-93 dialogue manager. In Susann LuperFoy, Anton Nijholt, and Gert Veldhuijzen van Zanten, editors, Proceedings of the Twente Workshop on Language Technology (TWLT 11), pages 1–11.
[Williams and Young, 2007] Williams, J. D. and Young, S. (2007). Partially observable Markov decision processes for spoken dialog systems. Computer Speech and Language, 21(2):393–422.

71 References
[Allen and Perrault, 1980] Allen, J. F. and Perrault, C. R. (1980). Analyzing intention in utterances. Artificial Intelligence, 15(3):143–178.
[Austin, 1962] Austin, J. (1962). How to Do Things with Words. Harvard University Press.
[Bangalore and Joshi, 1999] Bangalore, S. and Joshi, A. K. (1999). Supertagging: An approach to almost parsing. Computational Linguistics, 25(2):237–265.
[Bratman et al., 1988] Bratman, M. E., Israel, D., and Pollack, M. (1988). Plans and resource-bounded practical reasoning. Computational Intelligence, 4:349–355.
[Cohen and Perrault, 1979] Cohen, P. R. and Perrault, C. R. (1979). Elements of a plan-based theory of speech acts. Cognitive Science, 3:177–212.
[Core, 1998] Core, M. G. (1998). Analyzing and predicting patterns of DAMSL utterance tags. In Working Notes of the AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, pages 18–24.
[Joshi, 1987] Joshi, A. K. (1987). Introduction to tree adjoining grammars. In Manaster-Ramer, A., editor, Mathematics of Language. John Benjamins, Amsterdam.
[Russell and Norvig, 2003] Russell, S. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Pearson Education, Inc.
[Searle, 1975] Searle, J. R. (1975). A Taxonomy of Illocutionary Acts.
[Traum and Allen, 1994] Traum, D. R. and Allen, J. F. (1994). Discourse obligations in dialogue processing. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 1–8.
[Turing, 1950] Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236):433–460.
[Weizenbaum, 1966] Weizenbaum, J. (1966). ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45.

72 SDS-Components: Observation Function
The observation function p(o'|s', a) = p(~a'_u, c'|a'_u) is impossible to calculate directly from data
Instead, estimate it with a model parameterized by p_err, the probability of a speech recognition error
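One common simplification (an assumption on my part; the transcript does not preserve the estimate actually shown on the slide) is a uniform confusion model over the user action set A_u:

  p(\tilde{a}' \mid a'_u) = 1 - p_err                  if \tilde{a}' = a'_u
  p(\tilde{a}' \mid a'_u) = p_err / (|A_u| - 1)        otherwise

The ASR confidence c' can then be modeled separately, for example with different densities for correct and incorrect recognitions.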

