Emergence of Gricean Maxims from Multi-agent Decision Theory Adam Vogel Stanford NLP Group Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky.


1 Emergence of Gricean Maxims from Multi-agent Decision Theory Adam Vogel Stanford NLP Group Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky

2 Decision-Theoretic Pragmatics Gricean cooperative principle: Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.

3 Decision-Theoretic Pragmatics Gricean Maxims: Be truthful: speak with evidence Be relevant: speak in accordance with goals Be clear: be brief and avoid ambiguity Be informative: say exactly as much as needed

4 Emergence of Gricean Maxims Co-operative principle Be truthful Be relevant Be clear Be informative ??? Approach: Operationalize the co-operative principle Tool: Multi-agent decision theory Goal: Maxims emerge from rational behavior Joint utility Rationality

5 Related Work One-shot reference tasks – Generating spatial referring expressions [Golland et al. 2010] – Predicting pragmatic reasoning in language games [Stiller et al. 2011] Interpreting natural language instructions – Learning to read help guides [Branavan et al. 2009] – Learning to follow navigational directions [Vogel and Jurafsky 2010] [Artzi and Zettlemoyer 2013] [Chen and Mooney 2011] [Tellex et al. 2011]

6 CARDS Task

7 Outline Spatial semantics ListenerBot: single-agent advice taker – Can accept advice, never gives it DialogBot: multi-agent decision maker – Gives advice by tracking the other player’s beliefs

8 Spatial Semantics “in the top left of the board” → BOARD(top;left) “on the left side” → BOARD(left) “right in the middle” → BOARD(middle) MaxEnt classifier w/ bag of words, estimated from corpus data
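The mapping from utterances to BOARD(...) meanings can be illustrated with a minimal sketch of a MaxEnt (multinomial logistic regression) classifier over bag-of-words features. The three training pairs are the slide's own examples; the function names, learning rate, and epoch count are assumptions made for illustration, and the real model was estimated from corpus data rather than three sentences.

```python
import math
from collections import defaultdict

LABELS = ["BOARD(top;left)", "BOARD(left)", "BOARD(middle)"]

def predict_proba(weights, words, labels):
    """Softmax over summed bag-of-words weights for each label."""
    scores = {l: sum(weights[(l, w)] for w in words) for l in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {l: math.exp(s - top) for l, s in scores.items()}
    z = sum(exps.values())
    return {l: e / z for l, e in exps.items()}

def train_maxent(examples, labels, epochs=200, lr=0.1):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)  # (label, word) -> weight
    for _ in range(epochs):
        for words, gold in examples:
            probs = predict_proba(weights, words, labels)
            for label in labels:
                grad = (1.0 if label == gold else 0.0) - probs[label]
                for w in words:
                    weights[(label, w)] += lr * grad
    return weights

# Toy training set: the three utterances shown on the slide.
examples = [
    ("in the top left of the board".split(), "BOARD(top;left)"),
    ("on the left side".split(), "BOARD(left)"),
    ("right in the middle".split(), "BOARD(middle)"),
]
weights = train_maxent(examples, LABELS)
probs = predict_proba(weights, ["middle"], LABELS)
best = max(probs, key=probs.get)
```

The resulting probabilities can serve as the message likelihoods that a listener folds into its belief about the card location.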

9 Complexity Ahoy Approximate decision making only feasible for problems with <10k states!

10 Semantic State Representation Divide board into 16 regions Cluster squares based on meanings
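A minimal sketch of the coarse-graining idea, assuming a hypothetical 8x8 square board cut into a 4x4 layout of equal blocks. The talk clusters squares by the meanings of messages, so the fixed blocks here are only a stand-in that shows how 64 squares collapse to 16 region states; `region_of` and the board size are illustrative names and values, not from the slides.

```python
def region_of(row, col, board_size=8, regions_per_side=4):
    """Map a board square to one of regions_per_side**2 coarse regions."""
    block = board_size // regions_per_side  # squares per region along one axis
    return (row // block) * regions_per_side + (col // block)

# Every square of the hypothetical 8x8 board falls into one of 16 regions.
regions = {region_of(r, c) for r in range(8) for c in range(8)}
```

Planning over regions instead of squares is what keeps the state count within reach of approximate solvers.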

11 Spatial semantics ListenerBot: single-agent advice taker – Can accept advice, never gives it DialogBot: multi-agent decision maker – Gives advice by tracking the other player’s beliefs Outline

12 Partially Observable Markov Decision Process (POMDP) Or: An HMM you get to drive!

13 State space S: hidden configuration of the world Location of card Location of player

14 Action space A: what we can do Move around the board Search for the card

15 Observations : sensor information + messages Whether we are on top of the card BOARD(right;top) etc.

16 Observation Model : sensor model We see the card if we search for it and are on it For messages

17 Reward R(s,a): value of an action in a state Large reward if in the same square as the card Every action adds small negative reward

18 Transition T(s’|a,s): dynamics of the world Travel actions change player location Card never moves

19 Initial belief state : distribution over S Uniform distribution over card location Known initial player location
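The components on slides 13 to 19 can be written down for a toy problem. This is a minimal sketch on a hypothetical four-square, one-dimensional board; the action names, reward magnitudes, and helper names are assumptions chosen for illustration, not values from the talk.

```python
from itertools import product

N_SQUARES = 4  # hypothetical tiny 1-D board

# State space S: (player location, card location); the card location is hidden.
STATES = list(product(range(N_SQUARES), range(N_SQUARES)))

# Action space A: move around the board or search for the card.
ACTIONS = ["LEFT", "RIGHT", "SEARCH"]

def transition(state, action):
    """T(s'|a,s): travel actions change the player location; the card never moves."""
    player, card = state
    if action == "LEFT":
        player = max(0, player - 1)
    elif action == "RIGHT":
        player = min(N_SQUARES - 1, player + 1)
    return (player, card)

def reward(state, action):
    """R(s,a): large reward on the card's square, small cost for every action."""
    player, card = state
    return (10.0 if player == card else 0.0) - 1.0  # assumed magnitudes

def initial_belief(player_start=0):
    """Uniform over card locations, with a known initial player location."""
    return {s: (1.0 / N_SQUARES if s[0] == player_start else 0.0) for s in STATES}
```

The transition here is deterministic, so it returns a single next state rather than a distribution.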

20 Belief Update: Action: SEARCH Observation: (Card not here, )

21 Belief Update:

22 Action: SEARCH Observation: (Card not here, “left side”)

23 Belief Update:
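The belief updates on slides 20 to 23 follow Bayes' rule: multiply the prior over card locations by the likelihood of the sensor reading and of any message, then renormalize. Below is a minimal sketch on a hypothetical four-square board; the function name and the "left side" likelihood values are made up for illustration.

```python
def belief_update(belief, player_pos, saw_card, message_likelihood=None):
    """Bayes update of a belief over card locations after a SEARCH action."""
    posterior = {}
    for loc, prior in belief.items():
        # Deterministic sensor: we see the card iff we search on its square.
        sensor = 1.0 if (loc == player_pos) == saw_card else 0.0
        # Optional message term: P(message | card at loc).
        msg = 1.0 if message_likelihood is None else message_likelihood.get(loc, 0.0)
        posterior[loc] = prior * sensor * msg
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return {loc: p / total for loc, p in posterior.items()}

prior = {loc: 0.25 for loc in range(4)}

# SEARCH on square 3, card not here, no message.
after_search = belief_update(prior, player_pos=3, saw_card=False)

# Same search, but the other player also says "left side" (assumed likelihoods).
left_side = {0: 0.45, 1: 0.45, 2: 0.05, 3: 0.05}
after_message = belief_update(prior, 3, False, left_side)
```

A failed search only rules out the current square, while the message shifts most of the remaining mass toward the squares it describes.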

24 Decision Making Choose policy Goal: Maximize expected reward (immediate reward + expected future reward) Solution: Perseus, an approximate value iteration algorithm [Spaan et al. 2005] Computational complexity: PSPACE!
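The reward labels on this slide appear to annotate the standard POMDP value recursion. For reference, its usual textbook form is given below; the discount factor \gamma and the notation are assumptions, since the slide's own equation did not survive extraction.

```latex
V^{*}(b) = \max_{a \in A} \Big[
    \underbrace{\sum_{s \in S} b(s)\, R(s, a)}_{\text{immediate reward}}
    + \gamma \underbrace{\sum_{o} P(o \mid b, a)\, V^{*}(b^{a,o})}_{\text{expected future reward}}
\Big]
```

Here b^{a,o} denotes the belief obtained from b after taking action a and receiving observation o. Point-based solvers such as Perseus approximate V^{*} on a sampled set of beliefs rather than over the whole belief simplex.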

25 Spatial semantics ListenerBot: single-agent advice taker – Can accept advice, never gives it DialogBot: multi-agent decision maker – Gives advice by tracking the other player’s beliefs Outline

26 DialogBot (Approximately) tracks beliefs of other player Speech actions change beliefs of other player Model: Decentralized POMDP (Dec-POMDP) – Problem: NEXP-hard!! Top!

27 Each agent selects its own action

28 Each agent receives its own observation

29 Transition depends on both actions

30 Reward is shared between agents Formalization of the co-operative principle
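Slides 27 to 30 describe one step of a Dec-POMDP: separate actions, separate observations, a joint transition, and a single shared reward. A minimal sketch on a hypothetical four-square board follows; the function name, action names, and reward values are assumptions for illustration only.

```python
def joint_step(state, action_1, action_2):
    """One Dec-POMDP step: joint transition, private observations, shared reward."""
    pos_1, pos_2, card = state
    moves = {"LEFT": -1, "RIGHT": 1, "SEARCH": 0}

    # The transition depends on both agents' actions; the card never moves.
    pos_1 = min(3, max(0, pos_1 + moves[action_1]))
    pos_2 = min(3, max(0, pos_2 + moves[action_2]))
    next_state = (pos_1, pos_2, card)

    # Each agent receives only its own observation.
    obs_1 = action_1 == "SEARCH" and pos_1 == card
    obs_2 = action_2 == "SEARCH" and pos_2 == card

    # One reward for the team: assumed bonus for a find, unit cost per action.
    shared_reward = (10.0 if obs_1 or obs_2 else 0.0) - 2.0
    return next_state, (obs_1, obs_2), shared_reward
```

Because both agents are scored by the same number, helping the partner find the card is in each agent's own interest, which is the sense in which a shared reward formalizes the cooperative principle.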

31 Exact Multi-agent Belief Update Time

32 Approximate Multi-agent Belief Update Time

33 Single-agent POMDP Approximation Other agent belief transition model World transition model Resulting POMDP has states

34 What to say?

35 “Top”

36 “Middle”

37 “Right”
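One way to picture the choice among candidate messages such as "Top", "Middle", and "Right" is to simulate the listener's belief update for each one and pick the message that puts the most posterior mass on the true card location. This is a simplified stand-in, not DialogBot's actual procedure, which selects speech actions through its POMDP policy; the location names and likelihood values are made up.

```python
def listener_posterior(prior, likelihood):
    """Listener's belief over card locations after hearing a message."""
    unnorm = {loc: prior[loc] * likelihood[loc] for loc in prior}
    total = sum(unnorm.values())
    return {loc: p / total for loc, p in unnorm.items()}

def best_message(listener_prior, semantics, card_location):
    """Pick the message that most raises the listener's belief in the truth."""
    def score(message):
        posterior = listener_posterior(listener_prior, semantics[message])
        return posterior[card_location]
    return max(semantics, key=score)

LOCATIONS = ["top-left", "top-right", "middle", "bottom-right"]

# Assumed message likelihoods for each card location (illustrative numbers).
SEMANTICS = {
    "top": {"top-left": 0.45, "top-right": 0.45, "middle": 0.05, "bottom-right": 0.05},
    "middle": {"top-left": 0.05, "top-right": 0.05, "middle": 0.85, "bottom-right": 0.05},
    "right": {"top-left": 0.05, "top-right": 0.45, "middle": 0.05, "bottom-right": 0.45},
}

uniform = {loc: 0.25 for loc in LOCATIONS}
choice = best_message(uniform, SEMANTICS, "middle")
```

The speaker needs an estimate of the listener's prior to run this simulation, which is why DialogBot tracks the other player's beliefs.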

38

39 Return to Grice Be truthful Be relevant Be clear Be informative

40 Cooperating DialogBots Middle of the board

41 Cooperating DialogBots Middle of the board

42 Adolescent DialogBots Top

43 Return to Grice Be truthful: DialogBot speaks with evidence Be relevant: DialogBot gives advice to help win the game Be clear Be informative

44 Experimental Results Evaluate pairs of agents from 197 random initial states Agents have 50 high-level moves to find the card
Bots | % Success | Average High-Level Actions
ListenerBot & ListenerBot | 84.4% | 19.8
ListenerBot & DialogBot | 87.2% | 17.5
DialogBot & DialogBot | 90.6% | 16.6

45 Emergent Gricean Behavior Be truthful: DialogBot speaks with evidence Be relevant: DialogBot gives advice to help win Be clear: need variable costs on messages Be informative: requires levels of specificity ACL 2013: Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs From joint reward, not hard coded Future Work: intentions, joint plans, deeper belief nesting Thanks!

