
LECTURE 6: MULTIAGENT INTERACTIONS


1 LECTURE 6: MULTIAGENT INTERACTIONS
An Introduction to MultiAgent Systems

2 What are Multiagent Systems?

3 MultiAgent Systems
Thus a multiagent system contains a number of agents:
- which interact through communication;
- are able to act in an environment;
- have different “spheres of influence” (which may coincide);
- and will be linked by other (organizational) relationships.

4 Utilities and Preferences
Assume we have just two agents: Ag = {i, j}
Agents are assumed to be self-interested: they have preferences over how the environment is.
Assume W = {w1, w2, …} is the set of “outcomes” that agents have preferences over.
We capture preferences by utility functions:
    ui : W → ℝ
    uj : W → ℝ
Utility functions lead to preference orderings over outcomes:
    w ≽i w’ means ui(w) ≥ ui(w’)
    w ≻i w’ means ui(w) > ui(w’)

5 What is Utility? Utility is not money (but it is a useful analogy)
Typical relationship between utility & money:

6 Multiagent Encounters
We need a model of the environment in which these agents will act…
- agents simultaneously choose an action to perform, and as a result of the actions they select, an outcome in W will result
- the actual outcome depends on the combination of actions
- assume each agent has just two possible actions that it can perform, C (“cooperate”) and D (“defect”)
Environment behavior is given by a state transformer function mapping a pair of actions (one per agent) to an outcome in W.
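The state transformer idea can be sketched in code. This is a hypothetical illustration: the names tau and outcome are mine, and the particular outcome labels are arbitrary.

```python
# A state transformer maps the pair of simultaneously chosen actions,
# one per agent, to an outcome in W. Here it is given extensionally.
C, D = "cooperate", "defect"

# tau : Ac_i x Ac_j -> W  (illustrative table; labels w1..w4 are arbitrary)
tau = {
    (C, C): "w1",
    (C, D): "w2",
    (D, C): "w3",
    (D, D): "w4",
}

def outcome(action_i, action_j):
    """Resolve the joint action to the resulting outcome."""
    return tau[(action_i, action_j)]

print(outcome(C, D))  # -> w2
```

Because both agents' actions index the table, this environment is "sensitive to the actions of both agents" in the sense of the next slide.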

7 Multiagent Encounters
Here is a state transformer function: (This environment is sensitive to actions of both agents.) Here is another: (Neither agent has any influence in this environment.) And here is another: (This environment is controlled by j.)

8 Rational Action Suppose we have the case where both agents can influence the outcome, and they have utility functions as follows: With a bit of abuse of notation: Then agent i’s preferences are: “C” is the rational choice for i. (Because i prefers all outcomes that arise through C over all outcomes that arise through D.)

9 Payoff Matrices We can characterize the previous scenario in a payoff matrix: Agent i is the column player Agent j is the row player

10 What is game theory?
Game theory is a formal way to analyze strategic interaction among a group of rational players (or agents).
Game theory has applications in:
- Economics
- Politics
- etc.

11 Game Examples: The battle of the sexes
Coordination game with some elements of conflict.
At their separate workplaces, Chris and Pat must choose to attend either an opera or a football game in the evening.
Both Chris and Pat know the following: both would like to spend the evening together, but Chris prefers the game and Pat prefers the opera.

Payoffs (Chris, Pat):

                       Pat: Football   Pat: Opera
    Chris: Football        2, 1           0, 0
    Chris: Opera           0, 0           1, 2

12 Example: Matching pennies
Zero-sum game (a true game of conflict)!
Each of the two players has a penny. The two players must simultaneously choose whether to show the Head or the Tail.
Both players know the following rules: if the two pennies match (both heads or both tails), then player 2 wins player 1’s penny; otherwise, player 1 wins player 2’s penny.

Payoffs (Player 1, Player 2):

                       Player 2: Head   Player 2: Tail
    Player 1: Head        -1, 1            1, -1
    Player 1: Tail         1, -1          -1, 1

13 Example: Chicken Anti-coordination game
Two teenagers drive on a narrow road in opposite directions.
No one wants to get out of the way: whoever “chickens” out loses his pride, the tough guy wins.
If both are tough, they break their bones.

Payoffs (Player 1, Player 2):

                        Player 2: Tough   Player 2: Chicken
    Player 1: Tough        -10, -10           10, 0
    Player 1: Chicken        0, 10             5, 5

14 Classic Example: Prisoner’s Dilemma
The mother of all cooperation games.
Two suspects held in separate cells are charged with a major crime; however, there is not enough evidence.
Both suspects are told the following policy:
- If neither confesses, then both will be convicted of a minor offense and sentenced to one month in jail.
- If both confess, then both will be sentenced to jail for six months.
- If one confesses but the other does not, then the confessor will be released but the other will be sentenced to jail for nine months.

Payoffs (Prisoner 1, Prisoner 2), in months of jail time (negated):

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

Examples of Prisoner’s Dilemma: arms race, missile defence, driving an SUV.

15 Normal-form (strategic-form) representation
The normal-form (or strategic-form) representation of a game G specifies:
- a finite set of players {1, 2, ..., n},
- the players’ strategy spaces S1, S2, ..., Sn, and
- their payoff functions u1, u2, ..., un, where ui : S1 × S2 × ... × Sn → R.
All combinations of the strategies are considered. A combination of strategies (a strategy profile) is a set of strategies, one for each player.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6
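The normal-form ingredients (players, strategy spaces, payoff functions) can be encoded as plain data. The sketch below uses the Prisoner’s Dilemma payoffs from the slide; the function and variable names are my own, not part of the lecture.

```python
from itertools import product

# Strategy spaces S1, S2 for players 1 and 2
S = {1: ["Mum", "Confess"], 2: ["Mum", "Confess"]}

# Payoff table: profile (s1, s2) -> (u1, u2), as on the slide
PAYOFFS = {("Mum", "Mum"): (-1, -1), ("Mum", "Confess"): (-9, 0),
           ("Confess", "Mum"): (0, -9), ("Confess", "Confess"): (-6, -6)}

def u(i, profile):
    """ui : S1 x S2 -> R, player i's payoff for a strategy profile."""
    return PAYOFFS[profile][i - 1]

# All combinations of strategies (one strategy per player)
profiles = list(product(S[1], S[2]))
print(profiles)   # the four strategy profiles of the game
print(u(1, ("Mum", "Confess")))  # -> -9
```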

16 Static games of complete information
Static (or simultaneous-move) game of complete information Each player’s strategies* and payoff function are common knowledge among all the players. Each player i chooses his/her strategy si without knowledge of others’ choices. Then each player i receives his/her payoff ui(s1, s2, ..., sn). The game ends. * “strategies” (in Game Theory) = actions / moves

17 Normal-form representation: 2-player game
Bi-matrix representation.
2 players: Player 1 and Player 2. Each player has a finite number of strategies.
Example: S1 = {s11, s12, s13}, S2 = {s21, s22}

                          s21                          s22
    s11    u1(s11,s21), u2(s11,s21)    u1(s11,s22), u2(s11,s22)
    s12    u1(s12,s21), u2(s12,s21)    u1(s12,s22), u2(s12,s22)
    s13    u1(s13,s21), u2(s13,s21)    u1(s13,s22), u2(s13,s22)

(Player 1 is the row player, Player 2 the column player.)

18 Classic example: Prisoners’ Dilemma: normal-form representation
Set of players: {Prisoner 1, Prisoner 2}
Sets of strategies: S1 = S2 = {Mum, Confess}
Payoff functions: u1(M, M) = -1, u1(M, C) = -9, u1(C, M) = 0, u1(C, C) = -6; u2(M, M) = -1, u2(M, C) = 0, u2(C, M) = -9, u2(C, C) = -6

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

19 Example: Matching pennies
Normal (or strategic) form representation:
Set of players: {Player 1, Player 2}
Sets of strategies: S1 = S2 = {Head, Tail}
Payoff functions: u1(H, H) = -1, u1(H, T) = 1, u1(T, H) = 1, u1(T, T) = -1; u2(H, H) = 1, u2(H, T) = -1, u2(T, H) = -1, u2(T, T) = 1

                       Player 2: Head   Player 2: Tail
    Player 1: Head        -1, 1            1, -1
    Player 1: Tail         1, -1          -1, 1

20 Experiment 1: What strategy would you play in the Prisoner’s dilemma?
                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

21 Solving Prisoners’ Dilemma
Confess always does better whatever the other player chooses: it is a dominant strategy.
Mum is a dominated strategy: there exists another strategy (Confess) which always does better regardless of the other player’s choices.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

22 Experiment 2: Choose a strategy for player 1 (the row player) in the game below
                    Player 2: A   Player 2: B   Player 2: C   Player 2: D
    Player 1: A        5, 2          2, 6          1, 4          0, 4
    Player 1: B        0, 0          3, 2          2, 1          1, 1
    Player 1: C        7, 0          2, 2          1, 5          5, 1
    Player 1: D        9, 5          1, 3          0, 2          4, 8

23 Dominant Strategies
Given any particular strategy (either C or D) of agent i, there will be a number of possible outcomes.
We say s1 dominates s2 if every outcome possible by i playing s1 is preferred over every outcome possible by i playing s2 (independently of what player j plays).
A rational agent will never play a dominated strategy.
So in deciding what to do, we can delete dominated strategies.
Unfortunately, there isn’t always a unique undominated strategy.

24 Definition: strictly dominated strategy
Strategy si″ strictly dominates si′ if si″ is strictly better than si′ regardless of the other players’ choices.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

25 Example Two firms, Reynolds and Philip, share some market
- Each firm earns $60 million from its customers if neither does advertising.
- Advertising costs a firm $20 million.
- Advertising captures $30 million from the competitor.

                         Philip: No Ad   Philip: Ad
    Reynolds: No Ad         60, 60         30, 70
    Reynolds: Ad            70, 30         40, 40

26 2-player game with finite strategies
S1 = {s11, s12, s13}, S2 = {s21, s22}
s11 is strictly dominated by s12 if u1(s11,s21) < u1(s12,s21) and u1(s11,s22) < u1(s12,s22).
s21 is strictly dominated by s22 if u2(s1i,s21) < u2(s1i,s22), for i = 1, 2, 3.

                          s21                          s22
    s11    u1(s11,s21), u2(s11,s21)    u1(s11,s22), u2(s11,s22)
    s12    u1(s12,s21), u2(s12,s21)    u1(s12,s22), u2(s12,s22)
    s13    u1(s13,s21), u2(s13,s21)    u1(s13,s22), u2(s13,s22)
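The strict-dominance test translates directly to code. A minimal sketch (the function name is mine), checked against the Prisoner’s Dilemma payoffs from the slides:

```python
def strictly_dominated(u_row, s, t, opponent_strategies):
    """True if row strategy s is strictly dominated by row strategy t:
    t yields strictly more than s against every opponent strategy."""
    return all(u_row[(s, c)] < u_row[(t, c)] for c in opponent_strategies)

# Prisoner 1's payoffs from the Prisoner's Dilemma table
u1 = {("Mum", "Mum"): -1, ("Mum", "Confess"): -9,
      ("Confess", "Mum"): 0, ("Confess", "Confess"): -6}

print(strictly_dominated(u1, "Mum", "Confess", ["Mum", "Confess"]))  # True
print(strictly_dominated(u1, "Confess", "Mum", ["Mum", "Confess"]))  # False
```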

27 Definition: weakly dominated strategy
Strategy si″ weakly dominates si′ if si″ is at least as good as si′ regardless of the other players’ choices.

                      Player 2: L   Player 2: R
    Player 1: U         1, 1           2, 0
    Player 1: B         0, 2           2, 2

28 Strictly and weakly dominated strategy
A rational player never chooses a strictly dominated strategy. Hence, any strictly dominated strategy can be eliminated. A rational player may choose a weakly dominated strategy.

29 Iterated elimination of strictly dominated strategies
- If a strategy is strictly dominated, eliminate it.
- The size and complexity of the game is reduced.
- Eliminate any strictly dominated strategies from the reduced game.
- Continue doing so successively.
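The elimination loop can be sketched as follows. This is an illustrative implementation of my own (not a standard library routine), run on the 2×3 example used on the next slide.

```python
def iesds(S1, S2, u1, u2):
    """Iterated elimination of strictly dominated strategies for a
    two-player game; u1, u2 map (row, col) profiles to payoffs."""
    S1, S2 = list(S1), list(S2)
    changed = True
    while changed:
        changed = False
        # alternate between the row player and the column player
        for S, opp, u, is_row in ((S1, S2, u1, True), (S2, S1, u2, False)):
            for s in list(S):
                for t in S:
                    if t == s:
                        continue
                    key = (lambda a, b: (a, b)) if is_row else (lambda a, b: (b, a))
                    # s strictly dominated by t: worse against every opponent move
                    if all(u[key(s, o)] < u[key(t, o)] for o in opp):
                        S.remove(s)
                        changed = True
                        break
    return S1, S2

u1 = {("Up","Left"):1, ("Up","Middle"):1, ("Up","Right"):0,
      ("Down","Left"):0, ("Down","Middle"):0, ("Down","Right"):2}
u2 = {("Up","Left"):0, ("Up","Middle"):2, ("Up","Right"):1,
      ("Down","Left"):3, ("Down","Middle"):1, ("Down","Right"):0}
print(iesds(["Up","Down"], ["Left","Middle","Right"], u1, u2))
# (['Up'], ['Middle'])
```

Elimination order here: Right (dominated by Middle for Player 2), then Down, then Left.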

30 Iterated elimination of strictly dominated strategies: an example
Original game:

                       Player 2: Left   Player 2: Middle   Player 2: Right
    Player 1: Up           1, 0              1, 2               0, 1
    Player 1: Down         0, 3              0, 1               2, 0

Reduced game (after eliminating Right, which is strictly dominated by Middle for Player 2):

                       Player 2: Left   Player 2: Middle
    Player 1: Up           1, 0              1, 2
    Player 1: Down         0, 3              0, 1

31 Example: Tourists & Natives
Only two bars (bar 1, bar 2) in a city.
Each can charge a price of $2, $4, or $5.
6000 tourists pick a bar randomly; 4000 natives select the lowest-price bar.
Example 1: Both charge $2: each gets 5,000 customers and $10,000.
Example 2: Bar 1 charges $4, Bar 2 charges $5:
Bar 1 gets 3,000 + 4,000 = 7,000 customers and $28,000; Bar 2 gets 3,000 customers and $15,000.
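The customer-splitting arithmetic can be checked with a short sketch (the function name revenues is mine): tourists split evenly, natives all go to the cheaper bar, and a tie splits the natives too.

```python
def revenues(p1, p2, tourists=6000, natives=4000):
    """Return (bar 1 revenue, bar 2 revenue) in dollars for prices p1, p2."""
    t1 = t2 = tourists / 2          # tourists pick a bar at random
    if p1 < p2:                     # natives go to the cheaper bar
        n1, n2 = natives, 0
    elif p2 < p1:
        n1, n2 = 0, natives
    else:
        n1 = n2 = natives / 2       # tie: natives split evenly
    return (p1 * (t1 + n1), p2 * (t2 + n2))

print(revenues(2, 2))  # (10000.0, 10000.0)  -- Example 1
print(revenues(4, 5))  # (28000.0, 15000.0)  -- Example 2
```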

32 Example: Tourists & Natives
Payoffs are in thousands of dollars.

                    Bar 2: $2   Bar 2: $4   Bar 2: $5
    Bar 1: $2        10, 10      14, 12      14, 15
    Bar 1: $4        12, 14      20, 20      28, 15
    Bar 1: $5        15, 14      15, 28      25, 25

Reduced game (after eliminating the strictly dominated strategy $2 for each bar):

                    Bar 2: $4   Bar 2: $5
    Bar 1: $4        20, 20      28, 15
    Bar 1: $5        15, 28      25, 25

33 Nash Equilibrium
In general, we will say that two strategies s1 and s2 are in Nash equilibrium if:
- under the assumption that agent i plays s1, agent j can do no better than play s2; and
- under the assumption that agent j plays s2, agent i can do no better than play s1.
Neither agent has any incentive to deviate from a Nash equilibrium.
Unfortunately:
- Not every interaction scenario has a Nash equilibrium in pure strategies.
- Some interaction scenarios have more than one Nash equilibrium.

34 New solution concept: Nash equilibrium
                    Player 2: L   Player 2: C   Player 2: R
    Player 1: T        0, 4          4, 0          5, 3
    Player 1: M        4, 0          0, 4          5, 3
    Player 1: B        3, 5          3, 5          6, 6

The combination of strategies (B, R) has the following property:
- Player 1 CANNOT do better by choosing a strategy different from B, given that player 2 chooses R.
- Player 2 CANNOT do better by choosing a strategy different from R, given that player 1 chooses B.

35 New solution concept: Nash equilibrium
                    Player 2: L’   Player 2: C’   Player 2: R’
    Player 1: T’       0, 4           4, 0           3, 3
    Player 1: M’       4, 0           0, 4           3, 3
    Player 1: B’       3, 3           3, 3           3.5, 3.6

The combination of strategies (B’, R’) has the following property:
- Player 1 CANNOT do better by choosing a strategy different from B’, given that player 2 chooses R’.
- Player 2 CANNOT do better by choosing a strategy different from R’, given that player 1 chooses B’.

36 Nash Equilibrium: idea
A set of strategies, one for each player, such that each player’s strategy is best for her, given that all other players are playing their corresponding strategies; in other words, a stable situation from which no player would like to deviate if the others stick to it.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

(Confess, Confess) is a Nash equilibrium.

37 Definition: Nash Equilibrium
A strategy profile (s1*, …, sn*) is a Nash equilibrium if, for every player i, ui(si*, s-i*) ≥ ui(si, s-i*) for all si in Si.
In other words: given the others’ choices, player i cannot be better off if she deviates from si*.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6

38 2-player game with finite strategies
S1 = {s11, s12, s13}, S2 = {s21, s22}
(s11, s21) is a Nash equilibrium if u1(s11,s21) ≥ u1(s12,s21), u1(s11,s21) ≥ u1(s13,s21), and u2(s11,s21) ≥ u2(s11,s22).

                          s21                          s22
    s11    u1(s11,s21), u2(s11,s21)    u1(s11,s22), u2(s11,s22)
    s12    u1(s12,s21), u2(s12,s21)    u1(s12,s22), u2(s12,s22)
    s13    u1(s13,s21), u2(s13,s21)    u1(s13,s22), u2(s13,s22)

39 Finding a Nash equilibrium: cell-by-cell inspection
Original game:

                       Player 2: Left   Player 2: Middle   Player 2: Right
    Player 1: Up           1, 0              1, 2               0, 1
    Player 1: Down         0, 3              0, 1               2, 0

Reduced game (after eliminating Right):

                       Player 2: Left   Player 2: Middle
    Player 1: Up           1, 0              1, 2
    Player 1: Down         0, 3              0, 1

40 Example: Tourists & Natives
Payoffs are in thousands of dollars.

                    Bar 2: $2   Bar 2: $4   Bar 2: $5
    Bar 1: $2        10, 10      14, 12      14, 15
    Bar 1: $4        12, 14      20, 20      28, 15
    Bar 1: $5        15, 14      15, 28      25, 25

Reduced game:

                    Bar 2: $4   Bar 2: $5
    Bar 1: $4        20, 20      28, 15
    Bar 1: $5        15, 28      25, 25

But this is also the solution of the game which we obtained using the method of removing dominated strategies!!
In fact, any dominant-strategy equilibrium is also a Nash equilibrium!

41 Using best response function to find Nash equilibrium
In a 2-player game, (s1, s2) is a Nash equilibrium if and only if player 1’s strategy s1 is her best response to player 2’s strategy s2, and player 2’s strategy s2 is her best response to player 1’s strategy s1.
Method: for each strategy of each player, find the other player’s best response. A cell in which both strategies are best responses to each other is a Nash equilibrium.

                            Prisoner 2: Mum   Prisoner 2: Confess
    Prisoner 1: Mum             -1, -1              -9, 0
    Prisoner 1: Confess          0, -9              -6, -6
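The best-response method amounts to cell-by-cell inspection, which is easy to automate. An illustrative sketch (names are mine), run on the Prisoner’s Dilemma:

```python
def pure_nash(S1, S2, u1, u2):
    """Return all pure-strategy Nash equilibria of a bimatrix game:
    cells where each strategy is a best response to the other."""
    equilibria = []
    for s1 in S1:
        for s2 in S2:
            best1 = all(u1[(s1, s2)] >= u1[(t, s2)] for t in S1)
            best2 = all(u2[(s1, s2)] >= u2[(s1, t)] for t in S2)
            if best1 and best2:
                equilibria.append((s1, s2))
    return equilibria

# Prisoner's Dilemma payoffs from the slide
u1 = {("Mum","Mum"):-1, ("Mum","Confess"):-9, ("Confess","Mum"):0, ("Confess","Confess"):-6}
u2 = {("Mum","Mum"):-1, ("Mum","Confess"):0, ("Confess","Mum"):-9, ("Confess","Confess"):-6}
print(pure_nash(["Mum","Confess"], ["Mum","Confess"], u1, u2))
# [('Confess', 'Confess')]
```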

42 Using best response function to find Nash equilibrium: example
                    Player 2: L’   Player 2: C’   Player 2: R’
    Player 1: T’       0, 4           4, 0           3, 3
    Player 1: M’       4, 0           0, 4           3, 3
    Player 1: B’       3, 3           3, 3           3.5, 3.6

M’ is Player 1’s best response to Player 2’s strategy L’.
T’ is Player 1’s best response to Player 2’s strategy C’.
B’ is Player 1’s best response to Player 2’s strategy R’.
L’ is Player 2’s best response to Player 1’s strategy T’.
C’ is Player 2’s best response to Player 1’s strategy M’.
R’ is Player 2’s best response to Player 1’s strategy B’.

43 Example: Tourists & Natives
Payoffs are in thousands of dollars.

                    Bar 2: $2   Bar 2: $4   Bar 2: $5
    Bar 1: $2        10, 10      14, 12      14, 15
    Bar 1: $4        12, 14      20, 20      28, 15
    Bar 1: $5        15, 14      15, 28      25, 25

Use the best response function to find the Nash equilibrium.

44 Example: The battle of the sexes
Payoffs (She, He):

                       He: Football   He: Opera
    She: Football         2, 1           0, 0
    She: Opera            0, 0           1, 2

Opera is Player 1’s best response to Player 2’s strategy Opera.
Opera is Player 2’s best response to Player 1’s strategy Opera.
Hence, (Opera, Opera) is a Nash equilibrium.
Football is Player 1’s best response to Player 2’s strategy Football.
Football is Player 2’s best response to Player 1’s strategy Football.
Hence, (Football, Football) is a Nash equilibrium.
But which equilibrium will actually happen?
If She thinks that He is going to pick Football, and therefore picks Football, while He thinks She is going to pick Opera, and so picks Opera, they will end up in different places with payoffs of 0! This situation happens often in COORDINATION PROBLEMS.

45 Example: The New York Game
                      John: Empire   John: Union St
    Nick: Empire         1, 1            0, 0
    Nick: Union St       0, 0            1, 1

This coordination game has two Nash equilibria, and there is no sense in which one is better.
“Tipping point” (Schelling, 1961).
“Focal point”: a NE resulting from a strategy that stands out and is played frequently in one-shot games. Can become a social norm, e.g. setting country borders.
Example: driving on the right side of the road.
Experiment: Coordinate on a set of 4 actions.

46 Example: Matching pennies
                       Player 2: Head   Player 2: Tail
    Player 1: Head        -1, 1            1, -1
    Player 1: Tail         1, -1          -1, 1

Head is Player 1’s best response to Player 2’s strategy Tail.
Tail is Player 2’s best response to Player 1’s strategy Tail.
Tail is Player 1’s best response to Player 2’s strategy Head.
Head is Player 2’s best response to Player 1’s strategy Head.
Hence, NO Nash equilibrium in pure strategies.

47 Competitive and Zero-Sum Interactions
Where preferences of agents are diametrically opposed we have strictly competitive scenarios.
Zero-sum encounters are those where utilities sum to zero: ui(w) + uj(w) = 0 for all w ∈ W.
Zero sum implies strictly competitive.
Zero-sum encounters in real life are very rare… but people tend to act in many scenarios as if they were zero sum.

48 Another example of zero-sum games
Cournot market (Augustin Cournot, 1840)
Two companies, Perrier and Apollinaris, sell mineral water. Each company has a fixed cost of $5000 per period, regardless of whether they sell anything or not.
The two companies are competing for the same market, and each firm must choose a high price ($2 per bottle) or a low price ($1 per bottle). Here are the rules of the game:
1) At a price of $2, 5000 bottles can be sold for a total revenue of $10000.
2) At a price of $1, 10000 bottles can be sold for a total revenue of $10000.
3) If both companies charge the same price, they split the sales evenly between them.
4) If one company charges a higher price, the company with the lower price sells the whole amount and the company with the higher price sells nothing.
5) Payoffs are profits: revenue minus the $5000 fixed cost.

49 Cournot game
Here is the payoff table for these two companies (payoffs are profits):

                           Perrier: $1      Perrier: $2
    Apollinaris: $1           0, 0          5000, -5000
    Apollinaris: $2       -5000, 5000          0, 0

SOLUTION: Maximin criterion. For a two-person, zero-sum game it is rational for each player to choose the strategy that maximizes the minimum payoff, and the pair of strategies and payoffs such that each player maximizes her minimum payoff is the “solution to the game.”
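The maximin criterion can be sketched in code (an illustrative helper of my own), using Apollinaris’ payoffs from the table above:

```python
def maximin(strategies, payoff):
    """payoff(mine, theirs) -> my payoff.
    Return (strategy with the largest worst-case payoff, that guaranteed value)."""
    return max(((s, min(payoff(s, t) for t in strategies)) for s in strategies),
               key=lambda pair: pair[1])

# Apollinaris' profits from the table (prices in dollars per bottle)
table = {(1, 1): 0, (1, 2): 5000, (2, 1): -5000, (2, 2): 0}
print(maximin([1, 2], lambda mine, theirs: table[(mine, theirs)]))
# (1, 0): charging $1 guarantees Apollinaris a payoff of at least 0
```

Since the game is zero-sum, the same calculation for Perrier is symmetric, so (Price = $1, Price = $1) is the maximin solution.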

50 So is there a solution to matching pennies?
It appears that there isn’t a maximin solution: the minimum payoff of each pure strategy is -1.
But a player can have more than two strategies. In addition to the two obvious strategies, Head and Tail, a player can “randomize” her strategy by offering either a head or a tail, at random, with specific probabilities: a “mixed strategy”.
The game of matching pennies has a solution in mixed strategies, and it is to offer heads or tails at random with probabilities 0.5 each way.
Von Neumann showed that every two-person zero-sum game has a maximin solution, in mixed if not in pure strategies.
But these games are very rare in the real world: they seem to exist only as “games” like baseball, bridge, poker, chess, etc. And even there one can include external payoffs: e.g. the thrill of gambling, the Bank in all Casino games, etc.
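The indifference argument behind the 50/50 mix can be verified numerically. A sketch (names are mine): against an opponent who mixes 50/50, both pure strategies give the same expected payoff, so no deviation helps; against any biased mix, one pure strategy does strictly better.

```python
# Player 1's matching-pennies payoffs (player 1 wins on a mismatch)
u1 = {("H", "H"): -1, ("H", "T"): 1, ("T", "H"): 1, ("T", "T"): -1}

def expected_u1(own, p_head):
    """Player 1's expected payoff for a pure strategy when player 2
    shows Head with probability p_head."""
    return p_head * u1[(own, "H")] + (1 - p_head) * u1[(own, "T")]

print(expected_u1("H", 0.5), expected_u1("T", 0.5))  # 0.0 0.0 -- indifferent
print(expected_u1("H", 0.7))  # < 0: if player 2 over-plays Head,
                              # player 1 gains by switching to Tail
```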

51 Back to the Prisoner’s Dilemma: is cooperation irrational?
The individually rational action is to defect.
Defection guarantees a payoff of no worse than 2, whereas cooperating risks a payoff as low as 1.
So defection is the best response to all possible strategies: both agents defect, and get payoff = 2.
But intuition says this is not the best outcome: surely they should both cooperate and each get a payoff of 3!

52 The Prisoner’s Dilemma
This apparent paradox is the fundamental problem of multi-agent interactions. It appears to imply that cooperation will not occur in societies of self-interested agents.
Real-world examples:
- nuclear arms reduction (“why don’t I keep mine…”)
- free-rider systems: public transport; in the UK, television licenses
The prisoner’s dilemma is ubiquitous. Can we recover cooperation?

53 Arguments for Recovering Cooperation
Conclusions that some have drawn from this analysis:
- the game-theoretic notion of rational action is wrong!
- somehow the dilemma is being formulated wrongly
Arguments to recover cooperation:
- We are not all Machiavelli! (but every existing system gets exploited by cheaters)
- The other prisoner is my twin! (but then he is not a real autonomous self-interested player)
- The shadow of the future…

54 The Iterated Prisoner’s Dilemma
One answer: play the game more than once.
If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate.
Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma. (Hurrah!)

55 Backwards Induction
But… suppose you both know that you will play the game exactly n times (rounds 0, …, n – 1).
On round n – 1, you have an incentive to defect, to gain that extra bit of payoff…
But this makes round n – 2 the last “real” round, and so you have an incentive to defect there, too.
This is the backwards induction problem.
Playing the prisoner’s dilemma with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy.

56 Axelrod’s Tournament
Suppose you play the iterated prisoner’s dilemma against a range of opponents…
What strategy should you choose, so as to maximize your overall payoff?
Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma.
Try to play an evolutionary game at:

57 Strategies in Axelrod’s Tournament
ALLD: “Always defect” (the hawk strategy)
TIT-FOR-TAT:
- On round u = 0, cooperate.
- On round u > 0, do what your opponent did on round u – 1.
TESTER: On the 1st round, defect. If the opponent retaliated, then play TIT-FOR-TAT. Otherwise, intersperse cooperation and defection.
JOSS: As TIT-FOR-TAT, except periodically defect.
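The simpler strategies are easy to sketch in code. These are hypothetical implementations for illustration, not Axelrod’s original tournament entries; C = cooperate, D = defect.

```python
def all_d(my_history, their_history):
    """ALLD: the hawk strategy, defect unconditionally."""
    return "D"

def tit_for_tat(my_history, their_history):
    """Cooperate on round 0, then copy the opponent's previous move."""
    return "C" if not their_history else their_history[-1]

def play(strategy_a, strategy_b, rounds=6):
    """Run an iterated game; return each player's sequence of moves."""
    ha, hb = [], []
    for _ in range(rounds):
        a = strategy_a(ha, hb)
        b = strategy_b(hb, ha)
        ha.append(a)
        hb.append(b)
    return "".join(ha), "".join(hb)

print(play(tit_for_tat, all_d))  # ('CDDDDD', 'DDDDDD')
```

TIT-FOR-TAT loses only the first round to ALLD and then punishes every defection, which is the behavior the next slide’s “retaliate appropriately” rule rewards.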

58 Recipes for Success in Axelrod’s Tournament
Axelrod suggests the following rules for succeeding in his tournament:
- Don’t be envious: don’t play as if it were zero sum!
- Be nice: start by cooperating, and reciprocate cooperation.
- Retaliate appropriately: always punish defection immediately, but use “measured” force; don’t overdo it.
- Don’t hold grudges: always reciprocate cooperation immediately.

59 More arguments about cooperation…
The world is multi-dimensional; any state has multiple aspects, and it is impossible to reduce it to a single reward/payoff function: winning one thing usually means losing something else.
Utility functions in reality are very complex and change over time. Different people place different values on outcomes, and they don’t know others’ utilities.
Acts of “altruism” can actually be self-interested on another dimension (maximizing a hidden utility). If one focuses on the right aspect when interpreting the cooperative behaviour of a person, one usually finds that it is self-interested (“will be rewarded in Heaven”, “does it for glory”, “acting according to high moral principles leads to inner satisfaction”).
A game is just a very simple, one-dimensional model…

