(CS/SS 241) Introduction to SISL: Topics in Algorithmic Game Theory. Adam Wierman – 258 Jorgensen; John Ledyard – 102 Baxter; Jason R. Marden – 335 Moore. January 8, 2008. Introduction to game theoretic approaches to distributed optimization and control
2 Course Outline Topics: 1-2 weeks on each of the following topics - Routing, scheduling and load balancing games - Facility location games - Network formation games - Inefficiency of Equilibria – Price of Anarchy - Distributed Optimization - Learning in games - Mechanism design - Sponsored search - Prediction markets The course is designed to be highly interactive!
3 Course Structure Student Presentations: (50%) - 1 or 2 for each topic - paper selected by professor - required to consult with a professor before presenting Homework: (40%) - 4 or 5 for the quarter Class Participation: (10%) Class Policy: - Designed as a graduate level class - Work in groups
4 Motivating Problem #1: Enemy Submarine Detection (Orincon/Lockheed Martin - Hawaii) Goal: Find enemy submarines in unknown waters within 24 hours using search agents. How should the agents search the water?
5 Motivating Problem #1: Enemy Submarine Detection (2) Search agents have different capabilities/strategies. A “black box” genetic algorithm gives a “good, not optimal” plan: 3 hours for a 24 hour mission.
6 Motivating Problem #1: Enemy Submarine Detection (3) The mission is executed “open loop”. [Figure: value of remaining planned mission versus mission time, with a critical level at which the plan is halted and reformulated] Problems: - Not robust to uncertainties. - Does not use information to improve the plan. Why is this problem challenging? Optimizing over a very large strategy space! 24 hour MT, 3 hour CT
7 Motivating Problem #2: Routing Over a Network Resources/links: - congestion/cost function Agents: - large number - large strategy sets Goals: - Efficiently use network - Satisfy agent constraints
10 What if agents made decisions by themselves? Selfish Agents: - Local independent objectives that may be in conflict with other agents - Robustness - Satisfy agent constraints - Local information Questions: - Can we achieve the global objective? - How do players make decisions? - How much do players need to know about the global objective? - How much do players need to know about the strategies of other players? Game theory analyzes the phenomena that emerge when self-interested players interact.
11 What is a game? Players: Actions: Real Valued Payoffs or Utilities: Can previous examples be modeled as games?
12 Motivating Problem #1: Enemy Submarine Detection Players: –Search agents Strategies: –Search trajectories Real Valued Payoff or Utility: –Ability to assess “value” of chosen trajectory
13 Motivating Problem #2: Routing Over a Network Players: –Drivers Strategies: –Available routes connecting source and destination Real Valued Payoff or Utility: –Measure of congestion/time
15 Motivation: Multiagent Systems Players: large number, autonomous Actions: Desired behavior: GOALS: 1. Design autonomous agents (Utility Function, Learning Dynamics) - What should agents optimize? - How should they optimize? 2. When agents selfishly pursue their own independent objectives, they also collectively accomplish the desired behavior. Competition breeds Cooperation
16 Background Game Theory: Example of a Game (Rock / Paper / Scissors) Players: I, II Actions: Rock / Paper / Scissors Utilities: Player 1’s payoff, Player 2’s payoff [Figure: payoff matrix over R, P, S for players I and II]
17 Pure Nash Equilibrium No player can improve his utility by a unilateral deviation. [Figure: two 2x2 payoff matrices for players I and II with actions A and B]
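The definition above can be checked mechanically for small games. The following is a minimal sketch, not taken from the slides: the function name `pure_nash_equilibria` and the coordination-game payoffs are hypothetical, chosen only to illustrate the "no profitable unilateral deviation" test.

```python
from itertools import product

def pure_nash_equilibria(payoffs):
    """Return all pure Nash equilibria of a two-player game.

    payoffs maps an action profile (a1, a2) to a payoff pair (u1, u2).
    """
    actions1 = sorted({a1 for a1, _ in payoffs})
    actions2 = sorted({a2 for _, a2 in payoffs})
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        # Player I cannot gain by switching rows while player II stays put.
        best1 = all(payoffs[(d, a2)][0] <= u1 for d in actions1)
        # Player II cannot gain by switching columns while player I stays put.
        best2 = all(payoffs[(a1, d)][1] <= u2 for d in actions2)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria

# Hypothetical 2x2 coordination game (payoffs are an assumption, not from the slide).
coordination = {
    ("A", "A"): (2, 2), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (1, 1),
}
print(pure_nash_equilibria(coordination))  # [('A', 'A'), ('B', 'B')]
```

Both (A, A) and (B, B) pass the test, which shows that a game may have several pure equilibria of different quality.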
18 Nash Equilibrium No player can improve his utility by a unilateral deviation. [Figure: 2x2 payoff matrix for players I and II with actions A and B] No pure NE exists in this game. A mixed NE exists. A mixed NE exists in any finite game.
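To illustrate a game with no pure equilibrium but a mixed one, here is a small sketch using matching pennies. The game and the helper names `expected_payoff` and `is_mixed_nash` are assumptions for illustration; the matrix on the original slide was not preserved. It relies on the standard fact that checking pure deviations is enough to certify a mixed equilibrium.

```python
def expected_payoff(matrix, p, q):
    """Expected payoff when row player mixes with p and column player with q."""
    return sum(p[i] * q[j] * matrix[i][j]
               for i in range(len(p)) for j in range(len(q)))

def is_mixed_nash(u1, u2, p, q, tol=1e-9):
    """True if no pure unilateral deviation improves either player's expected payoff."""
    v1 = expected_payoff(u1, p, q)
    v2 = expected_payoff(u2, p, q)
    for i in range(len(p)):
        pure = [1.0 if k == i else 0.0 for k in range(len(p))]
        if expected_payoff(u1, pure, q) > v1 + tol:
            return False
    for j in range(len(q)):
        pure = [1.0 if k == j else 0.0 for k in range(len(q))]
        if expected_payoff(u2, p, pure) > v2 + tol:
            return False
    return True

# Matching pennies (assumed example): player I wants to match, player II to mismatch.
u1 = [[1, -1], [-1, 1]]
u2 = [[-1, 1], [1, -1]]
uniform = [0.5, 0.5]
pure_profiles = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0]),
                 ([0.0, 1.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]

print(is_mixed_nash(u1, u2, uniform, uniform))                 # True
print(any(is_mixed_nash(u1, u2, p, q) for p, q in pure_profiles))  # False
```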
19 Non-cooperative Game Formulation A set of “self-interested” agents: Action sets: Player utility or objective functions Global objective function Potential games
20 Types of Games Identical Interest Game: each player’s utility is perfectly aligned with the objective! Potential Game: D. Monderer and L. Shapley, “Potential Games,” Games and Economic Behavior, vol. 14, 1996.
21 Example Potential Game [Figure: 2x2 payoff matrix for players I and II with actions A and B, next to the corresponding 2x2 potential] Observations: 1. Existence of a pure Nash equilibrium. 2. The maximizer of the potential is a pure Nash equilibrium.
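The two observations can be verified numerically on a small game. This sketch assumes the exact-potential condition of Monderer and Shapley (a player's payoff change under a unilateral deviation equals the change in the potential). The prisoner's-dilemma payoffs and the potential `phi` are hypothetical values, not the matrices from the slide.

```python
from itertools import product

def is_exact_potential(utilities, potential, action_sets):
    """Check u_i(a_i', a_-i) - u_i(a) == phi(a_i', a_-i) - phi(a) for all deviations."""
    for profile in product(*action_sets):
        for i, actions in enumerate(action_sets):
            for dev in actions:
                deviated = profile[:i] + (dev,) + profile[i + 1:]
                du = utilities[i][deviated] - utilities[i][profile]
                dphi = potential[deviated] - potential[profile]
                if abs(du - dphi) > 1e-9:
                    return False
    return True

# Assumed prisoner's dilemma payoffs (C = cooperate, D = defect).
u1 = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
u2 = {("C", "C"): 3, ("C", "D"): 5, ("D", "C"): 0, ("D", "D"): 1}
# Candidate potential, built by accumulating unilateral payoff differences.
phi = {("C", "C"): 0, ("C", "D"): 2, ("D", "C"): 2, ("D", "D"): 3}
action_sets = [("C", "D"), ("C", "D")]

print(is_exact_potential([u1, u2], phi, action_sets))  # True
print(max(phi, key=phi.get))                           # ('D', 'D')
```

The potential maximizer (D, D) is the game's pure Nash equilibrium, yet it is not the best outcome for the players, which is one reason equilibrium efficiency is studied separately.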
22 Alternative Example Potential Game A congestion model: –Set of players: –Set of facilities: –Road specific costs: –Set of actions: A congestion game: congestion game = potential game
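The formulas on this slide were not preserved. As a hedged reconstruction, the standard (Rosenthal) congestion game can be written as follows; the symbols $N$, $R$, $c_r$, $A_i$, $\sigma_r$ and $\phi$ are conventional notation assumed here, not necessarily the notation used in the lecture.

```latex
% Congestion model (standard notation, assumed)
\begin{align*}
  &\text{Players: } N = \{1, \dots, n\}, \qquad
   \text{Facilities (roads): } R, \\
  &\text{Road-specific costs: } c_r(k) \text{ when } k \text{ players use road } r, \qquad
   \text{Actions: } A_i \subseteq 2^{R}. \\[4pt]
  % Number of players using road r under the joint action a
  &\sigma_r(a) = \bigl|\{\, j \in N : r \in a_j \,\}\bigr|, \\
  % Each player's utility is the negative of the total cost on the chosen roads
  &U_i(a) = -\sum_{r \in a_i} c_r\bigl(\sigma_r(a)\bigr), \\
  % Rosenthal potential: a unilateral deviation changes U_i and \phi by the same amount
  &\phi(a) = -\sum_{r \in R} \sum_{k=1}^{\sigma_r(a)} c_r(k).
\end{align*}
```

When player $i$ switches routes, only the top term $c_r(\sigma_r)$ of each affected road enters or leaves the inner sum, which is exactly the change in $U_i$; this is why every congestion game is a potential game.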
23 HW #1: Congestion Game A congestion model: –Set of players: –Set of facilities: –Road specific costs: –Set of actions: Consider the following Social Welfare Function: Question: You are the global planner and are responsible for distributing vehicles over the network to maximize social welfare. How would you do this?
24 Can players learn to play an equilibrium? Can players learn to play equilibrium in a game when they start from out-of-equilibrium conditions? How much information do they need, and how “rational” do they need to be? The learning problem
25 Learning and Efficiency of Equilibria How do we get to an equilibrium? How good is an equilibrium? I II A B A B I A B A B
26 Can players learn to play an equilibrium? Learning in games is especially difficult because learning is interactive. One agent’s act of learning changes what has to be learned by all the others. The learning problem
27 Learning in Games and Multiagent Systems Players: I, II Actions: Rock / Paper / Scissors Utilities: [Figure: payoff matrix over R, P, S] Learning in Games: PROCESS, LEARNING RULE (info up to time k) Learning Rules, Asymptotic Behavior, Results
28 Learning in Games Model players’ interaction as a repeated game Time –For each player Play –Strategy: –Action: –Payoff: Learning –Strategy Update: –Desired characteristics of learning algorithms Computational feasibility Convergence to desirable operating condition (e.g., Nash equilibrium)
29 Existing Learning Algorithms and Results
Infinite Memory Algorithms (all past actions are relevant)
–Fictitious play: Nash equilibrium in potential games (Monderer & Shapley, 1996)
–Regret matching: Coarse correlated equilibrium in all games (Hart & Mas-Colell, 2000); Nash equilibrium in two-player potential games (Hart & Mas-Colell, 2003)
–Joint Strategy Fictitious Play with Inertia: Nash equilibrium in potential games (Marden et al., 2005)
–Regret-Based Dynamics: Nash equilibrium in potential games (Marden et al., 2007)
Finite Memory Algorithms (fixed number of past actions are relevant)
–Adaptive play: Nash equilibrium in weakly acyclic games (Young, 1993)
–Better reply process with finite memory and inertia: Nash equilibrium in weakly acyclic games (Young, 2005)
–Spatial Adaptive Play: Optimal Nash equilibrium in potential games (Young, 1998)
–Payoff-Based Dynamics: Nash equilibrium in weakly acyclic games (Marden et al., 2007)
Different algorithms, different demands.
30 Classes of Learning Algorithms Full Information: –Observe complete action profile –Aware of structural form of utility –Examples: fictitious play Demanding for large-scale games. Every day, Homer needs to know - Route Ned took - Route Burns took - Route Apu Nahasapeemapetilon took - and on… and on… and on...
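As an illustration of the full-information class, here is a minimal two-player fictitious play sketch. Each player best-responds to the empirical frequency of the opponent's past actions, which is why it needs to observe the complete action history. The function name, the tie-breaking rule (lowest index), the initial actions and the coordination-game payoffs are all assumptions made for this example.

```python
def fictitious_play(u1, u2, steps=50):
    """Two-player fictitious play; u1[i][j], u2[i][j] are payoffs for actions (i, j)."""
    n, m = len(u1), len(u1[0])
    counts1 = [0] * n          # how often player I has played each action
    counts2 = [0] * m          # how often player II has played each action
    a1, a2 = 0, m - 1          # arbitrary (assumed) initial actions
    history = []
    for _ in range(steps):
        counts1[a1] += 1
        counts2[a2] += 1
        history.append((a1, a2))
        # Best response to the opponent's empirical play; counts are
        # proportional to frequencies, so no normalization is needed.
        next_a1 = max(range(n),
                      key=lambda i: sum(counts2[j] * u1[i][j] for j in range(m)))
        next_a2 = max(range(m),
                      key=lambda j: sum(counts1[i] * u2[i][j] for i in range(n)))
        a1, a2 = next_a1, next_a2
    return history

# Hypothetical identical-interest (hence potential) coordination game.
u = [[2, 0], [0, 1]]
history = fictitious_play(u, u)
print(history[:3])   # [(0, 1), (1, 0), (0, 0)]
print(history[-1])   # (0, 0)
```

In this small example play settles on the better equilibrium after two miscoordinated rounds; with many players, tracking every opponent's history is what makes this class demanding.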
31 Classes of Learning Algorithms Virtual Payoff Based: –cannot observe action profile –unaware of structural form –ability to assess alternatives –Examples: regret matching Every day, Homer needs to know - congestion on route 1 - congestion on route 2 - congestion on all routes Homer could take
32 Classes of Learning Algorithms Payoff Based: –Only observe action played and utility received Every day, Homer needs to know - Congestion only on route taken
33 Classes of Learning Algorithms Payoff Based: –Only observe action played and utility received Full Information: –Observe complete action profile –Aware of structural form of utility –Examples: fictitious play Virtual Payoff Based: –cannot observe action profile –unaware of structural form –ability to assess alternatives –Examples: regret matching
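34 Spatial Adaptive Play (SAP) [Figure: update rule at time t, with a parameter expressing “willingness to optimize”; repeat at time t + 1] Theorem [Young, 1998]: SAP has a unique stationary distribution [formula not preserved in this transcript].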
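A hedged sketch of SAP as it is commonly stated (log-linear learning): at each step one player is picked uniformly at random and chooses action $a_i$ with probability proportional to $e^{\beta U_i(a_i, a_{-i})}$, holding the others fixed. In a potential game with potential $\phi$, the standard result is that the stationary distribution is $\mu(a) \propto e^{\beta \phi(a)}$, so large $\beta$ concentrates play on potential maximizers. The parameter name `beta` is assumed here to correspond to the slide's “willingness to optimize”; the function name and the example game are hypothetical.

```python
import math
import random

def spatial_adaptive_play(utilities, action_sets, beta, steps, seed=0):
    """Run SAP; utilities[i] is a function of the full action profile (a tuple)."""
    rng = random.Random(seed)
    profile = [rng.choice(actions) for actions in action_sets]
    for _ in range(steps):
        i = rng.randrange(len(action_sets))      # one player updates per step
        payoffs = []
        for a in action_sets[i]:
            trial = profile[:i] + [a] + profile[i + 1:]
            payoffs.append(utilities[i](tuple(trial)))
        top = max(payoffs)                       # subtract max for numerical stability
        weights = [math.exp(beta * (u - top)) for u in payoffs]
        profile[i] = rng.choices(action_sets[i], weights=weights)[0]
    return tuple(profile)

# Hypothetical identical-interest game: both players receive phi(profile).
phi = {("A", "A"): 2, ("A", "B"): 0, ("B", "A"): 0, ("B", "B"): 1}
utilities = [lambda p: phi[p], lambda p: phi[p]]
actions = [["A", "B"], ["A", "B"]]
print(spatial_adaptive_play(utilities, actions, beta=2.0, steps=500))
```

Note that the updating player only needs the payoff each of its own actions would yield against the current actions of the others, not the other players' utility functions.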
35 Where is Spatial Adaptive Play? Payoff Based: –Only observe action played and utility received Full Information: –Observe complete action profile –Aware of structural form of utility –Examples: fictitious play Virtual Payoff Based: –can not observe action profile –unaware of structural form –ability to assess alternatives –Examples: regret matching
36 Sensor Coverage Problem C. G. Cassandras and W. Li, “Sensor networks and cooperative control,” European Journal of Control, vol. 11, no. 4–5, pp. 436–463, 2005 Mission Space Autonomous Sensors Global Objective: Maximize Probability of Detection Non-Cooperative Game Formulation (1) Design Utility Functions (2) Apply Learning Dynamics (3) Limiting behavior = desirable R(x) X
37 Sensor Coverage Problem: Sensor Model Limited Coverage: Detection Probability: Joint Detection Probability: point of interest i th sensor location
38 Sensor Coverage Problem: Global Objective by choosing Optimize Total Rewards: Pictorially, place circles to maximize weighted sum R(x) X Can we learn this optimal allocation pattern?
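The objective on this slide lost its formulas in extraction. A hedged reconstruction in the spirit of the cited Cassandras and Li model is given below; the symbols $s_i$, $p_i$, $P$ and $G$ are assumed notation, and the product form assumes sensors detect independently.

```latex
\begin{align*}
  % Detection probability of sensor i, located at s_i, for a point of interest x
  &p_i(x, s_i) \in [0, 1], \qquad p_i(x, s_i) = 0 \text{ outside sensor } i\text{'s coverage range}, \\
  % Joint detection probability: x is missed only if every sensor misses it
  &P(x, s) = 1 - \prod_{i=1}^{n} \bigl(1 - p_i(x, s_i)\bigr), \\
  % Global objective: reward-weighted detection, maximized over sensor locations
  &\max_{s_1, \dots, s_n} \; G(s) = \sum_{x \in X} R(x)\, P(x, s).
\end{align*}
```

For a continuous mission space the sum over $x \in X$ would be replaced by an integral of $R(x)\,P(x, s)$ over the mission space.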
39 Sensor Coverage Problem: Utility Design Equally Shared Utility: Local # sensors scanning Problem: not aligned with global objective Simplify Sensor Model Cost of Anarchy in Sensor Coverage Inefficiency of Equilibrium
40 Sensor Coverage Problem: Utility Design Wonderful Life Utility: Identical Interests: local marginal contribution null action Not local Low sensitivity Aligned
41 Sensor Coverage Problem: Utility Design Wonderful Life Utility: Wonderful Life Utility = Potential Game, with the global objective as the potential. The maximizer of the global objective is a Nash equilibrium.
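A minimal sketch of the Wonderful Life Utility as a marginal contribution, $U_i(a) = G(a) - G(a_i^0, a_{-i})$, where $a_i^0$ is the null action. The simplified sensor model here (each sensor picks one cell and covers it with certainty), the reward values and the function names are assumptions made for illustration.

```python
def coverage_objective(profile, reward):
    """Global objective G: total reward of cells covered by at least one sensor."""
    covered = {cell for cell in profile if cell is not None}
    return sum(reward[cell] for cell in covered)

def wonderful_life_utility(i, profile, reward):
    """Marginal contribution of sensor i: G(a) minus G with sensor i switched off."""
    without_i = profile[:i] + (None,) + profile[i + 1:]   # None is the null action
    return coverage_objective(profile, reward) - coverage_objective(without_i, reward)

reward = {"x": 5, "y": 3}          # hypothetical cell rewards R(x)

overlap = ("x", "x")               # both sensors on the same cell
spread = ("x", "y")                # sensors on different cells

print(wonderful_life_utility(1, overlap, reward))   # 0
print(wonderful_life_utility(1, spread, reward))    # 3
print(coverage_objective(spread, reward) - coverage_objective(overlap, reward))  # 3
```

Sensor 1's gain from moving to cell y (3) equals the gain in the global objective (3), which is the alignment property that makes the resulting game a potential game.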
42 Sensor Coverage without Failures
43 Sensor Coverage with Failures
44 Example: Sudoku Sudoku Challenge –Fill in all boxes with 1 – 9 –No repetition in rows, columns, 3x3 squares Global Objective –Solve puzzle Model as Non-cooperative game –Set of agents –Action sets –Utility functions?
45 Example: Sudoku (2) Utility Potential Sudoku is a potential game! Sudoku solved
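The slide's utility and potential formulas were not preserved, so the following is only one possible design, stated as an assumption: each cell is an agent whose utility is minus the number of peers (same row, column or box) holding the same digit, and the potential is minus the number of conflicting pairs. With this choice a unilateral change alters the agent's utility and the potential by the same amount, and the puzzle is solved exactly when the potential reaches 0. A 4x4 grid is used to keep the example short.

```python
import math

def peers(r, c, n):
    """Cells sharing a row, column or box with (r, c), excluding (r, c) itself."""
    box = math.isqrt(n)
    cells = {(r, k) for k in range(n)} | {(k, c) for k in range(n)}
    br, bc = box * (r // box), box * (c // box)
    cells |= {(br + i, bc + j) for i in range(box) for j in range(box)}
    cells.discard((r, c))
    return cells

def utility(grid, r, c):
    """Assumed utility: minus the number of peers showing the same digit."""
    n = len(grid)
    return -sum(grid[pr][pc] == grid[r][c] for pr, pc in peers(r, c, n))

def potential(grid):
    """Minus the number of conflicting pairs (each pair is counted twice in the sum)."""
    n = len(grid)
    return sum(utility(grid, r, c) for r in range(n) for c in range(n)) // 2

solved = [[1, 2, 3, 4],
          [3, 4, 1, 2],
          [2, 1, 4, 3],
          [4, 3, 2, 1]]
print(potential(solved))            # 0

broken = [row[:] for row in solved]
broken[0][0] = 2                    # unilateral deviation by the agent at (0, 0)
print(utility(broken, 0, 0))        # -2
print(potential(broken))            # -2
```

The deviation lowers the agent's utility by 2 and the potential by 2, as the exact-potential property requires.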
46 Example: Sudoku (3) Spatial Adaptive Play – Guaranteed to find optimal Key: Recognizing that Sudoku can be modeled as a Potential Game
47 Recap / Motivation: Learning in Games and Multiagent Systems Economic Approach: - Analyze players’ behavior in a repeated game, e.g., rock/paper/scissors - Model behavior (descriptive), e.g., fictitious play - Prove limiting behavior of models and generalize results for classes of games Engineering Approach: - Have a large number of agents and a global objective - Design agent utilities and use learning algorithms as a prescriptive control approach - Emphasis not on rationality, but on implementation in MAS
48 Next Lecture and Beyond Next Lecture: –Adam Wierman –Congestion games –Load Balancing Problem –Inefficiency of Equilibrium And Beyond: –Student Presentations –Inefficiency of Equilibrium in Congestion Games –Tentative Date: January 22, 2008 –Any Volunteers?