Building Agents for the Lemonade Game Using a Cognitive Hierarchy Population Model. Michael Wunder, Michael Kaisers, Michael Littman, John Yaros
Overview of Method: In the Lemonade-Stand Game (LG), players are rewarded for finding a partner quickly, to avoid becoming the odd man out. As a result, complicated prediction-optimization learners are at a disadvantage. Using heuristics, an agent can identify (and attract) potential partners. Population-based models are useful for determining the best heuristics in the LG.
Example: p-beauty contest. Keynes proposed that the stock market is like a beauty contest in which judges try to guess the contestant (or stock, or strategy) that the other judges like. In the p-beauty contest, n players each submit a number $x_i$ between 0 and 100, and the winner is the player closest to a fraction of the average guess, $p \cdot \frac{1}{n} \sum_i x_i$, where p is a fraction between 0 and 1, e.g. 2/3.
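The winner rule above can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the sample guesses are hypothetical and do not come from the presentation.

```python
def beauty_contest_winner(guesses, p=2/3):
    """Return (target, index of the winning guess) for one round.

    The target is p times the average guess; the winner is the player
    whose guess is closest to that target. Ties go to the first such
    player, which is an assumption of this sketch.
    """
    target = p * sum(guesses) / len(guesses)
    winner = min(range(len(guesses)), key=lambda i: abs(guesses[i] - target))
    return target, winner


# Hypothetical guesses from four players.
guesses = [50, 33, 22, 10]
target, winner = beauty_contest_winner(guesses)
print(f"target={target:.2f}, winning guess={guesses[winner]}")
```

Note that the winning guess here lies well below the average, which is why a player who expects others to guess high should still guess lower than they do.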
P-beauty game explained: The Nash strategy is to play 0, because it cannot be outplayed. However, first-time players do not reach this outcome. Why? (From Behavioral Game Theory by Colin Camerer.)
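One way to see why 0 is the endpoint, and why players who reason only a few steps stop short of it, is to iterate the reasoning. The sketch below assumes that a level-0 player guesses uniformly at random on [0, 100] (mean 50) and that a level-k player best-responds to a population made up entirely of level k-1 players, ignoring their own effect on the average. Both assumptions are illustrative and are not stated on the slide.

```python
def level_k_guess(k, p=2/3, level0_mean=50.0):
    """Guess of a level-k player under the assumptions stated above.

    Each step of reasoning multiplies the previous level's guess by p,
    so the guesses shrink geometrically toward the Nash guess of 0.
    """
    return level0_mean * p ** k


for k in range(6):
    print(f"level {k}: guess {level_k_guess(k):.2f}")
```

Only in the limit of many reasoning steps does the guess reach 0, so a population of shallow reasoners produces averages well above the Nash outcome.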
How a Cognitive Hierarchy Works: Level 0: no reasoning, random action or simple rule. Level 1: reacts to the base strategy at Level 0 only. Level k: reacts to Level k-1 … Poor predictable Bart, always picks Rock. Good ol' Rock. Homer can't beat that. Good ol' Rock. Nothing beats that.
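The Rock example can be turned into a tiny hierarchy. In this sketch, level 0 is the "simple rule" of always playing Rock, and each higher level plays whatever beats the level below it. The names `BEATS` and `level_k_action` are hypothetical.

```python
# Maps each action to the action that beats it.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def level_k_action(k, level0_action="rock"):
    """Action of a level-k player who best-responds to level k-1."""
    action = level0_action
    for _ in range(k):
        action = BEATS[action]  # exploit the level directly below
    return action


for k in range(4):
    print(f"level {k}: {level_k_action(k)}")
```

In Rock-Paper-Scissors the levels cycle with period three, so reasoning deeper is only useful if it matches the depth of the opponents actually present.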
Population-based Reasoning: Steps of the CH technique: 1. Identify base strategies (random, static). 2. Derive processes for steps of reasoning. A step of reasoning, in this case, is the strategy that can exploit the one before. 3. Recursively apply steps to each level k. 4. These levels form the hierarchy according to some distribution f(k). 5. Select a strategy that does well against the desired population.
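The five steps can be sketched end to end on Rock-Paper-Scissors. The slide leaves f(k) unspecified; the sketch below assumes a truncated Poisson distribution, which is a common choice in the cognitive hierarchy literature, with a hypothetical parameter `tau`. All function names are illustrative.

```python
import math

# Step 1: base strategy (level 0 always plays rock, an assumption).
# Step 2: one step of reasoning = play the action that beats the level below.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
ACTIONS = list(BEATS)


def level_k_action(k, level0_action="rock"):
    """Step 3: apply the reasoning step recursively up to level k."""
    action = level0_action
    for _ in range(k):
        action = BEATS[action]
    return action


def poisson_levels(tau, max_level):
    """Step 4: f(k) as a Poisson distribution truncated at max_level."""
    weights = [math.exp(-tau) * tau ** k / math.factorial(k)
               for k in range(max_level + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def payoff(mine, theirs):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if mine == theirs:
        return 0
    return 1 if BEATS[theirs] == mine else -1


def best_against_population(f):
    """Step 5: pick the action with the best expected payoff against f."""
    def expected(action):
        return sum(fk * payoff(action, level_k_action(k))
                   for k, fk in enumerate(f))
    return max(ACTIONS, key=expected)


f = poisson_levels(tau=1.5, max_level=2)
print(f, best_against_population(f))
```

The selected action depends on the whole distribution rather than on any single level, which is the point of reasoning about a population.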
Lemonade-Stand Game Levels: LG yields elegant level heuristics. L0-U: uniformly random action. L0-C: constant action. L0-X: constant with probability X, otherwise choose randomly. L1: move Across from the most stable player (the one with the highest X). This is also optimal against L1, and the move is the cooperative equilibrium.
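A rough sketch of the L0-X and L1 heuristics follows. It assumes the usual 12-position circular board for the Lemonade-Stand Game (the transcript does not state the board size) and estimates an opponent's X as the fraction of turns on which they repeated their previous action. That estimator and all names are hypothetical, not the authors' implementation.

```python
import random

N_SPOTS = 12  # assumption: 12 positions arranged in a circle


def l0_x_action(previous, x, rng=random):
    """L0-X: repeat the previous action with probability x, else pick uniformly."""
    if previous is not None and rng.random() < x:
        return previous
    return rng.randrange(N_SPOTS)


def stability(history):
    """Estimate X as the fraction of turns that repeat the previous action."""
    if len(history) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(history, history[1:]))
    return repeats / (len(history) - 1)


def l1_action(opponent_histories):
    """L1: move directly Across from the most stable opponent."""
    most_stable = max(opponent_histories, key=stability)
    return (most_stable[-1] + N_SPOTS // 2) % N_SPOTS


# One opponent has stayed at 3; the other has wandered.
print(l1_action([[3, 3, 3, 3], [1, 7, 2, 9]]))
```

With x = 1 the L0-X player reduces to L0-C, and with x = 0 it reduces to L0-U, so the three base strategies form one family.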
Lemonade Game Levels, Contd. L2: Stay Constant for at least one turn, in case the opponents are two L1s. If the current location is disadvantageous, move somewhere else, perhaps Across from a good partner. L3: Together with another L3, Sandwich a constant or L2 player, and become Across from each other if that player moves. Can we classify contestants by level?
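To make "Across" and "Sandwich" concrete, the sketch below scores a single round. It assumes the payoff rule as the game is commonly described: 12 positions on a circle, with each stand earning the distance to its nearest neighbor clockwise plus the distance to its nearest neighbor counterclockwise. The transcript does not state this rule, and the sketch deliberately leaves out the case where two players pick the same spot.

```python
N_SPOTS = 12  # assumption: 12 positions arranged in a circle


def payoffs(positions):
    """Per-player payoff for one round, for distinct positions only."""
    if len(set(positions)) != len(positions):
        raise ValueError("this sketch does not cover shared positions")
    scores = []
    for i, me in enumerate(positions):
        others = [p for j, p in enumerate(positions) if j != i]
        clockwise = min((p - me) % N_SPOTS for p in others)
        counterclockwise = min((me - p) % N_SPOTS for p in others)
        scores.append(clockwise + counterclockwise)
    return scores


# Across: players at 0 and 6 partner up; the player at 3 is the odd one out.
print(payoffs([0, 6, 3]))

# Sandwich: players at 1 and 11 squeeze the constant player at 0.
print(payoffs([0, 1, 11]))
```

Under this rule the payoffs always sum to 24, so whatever the sandwiching pair gains comes directly from the player in the middle.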
Actual Competition Results: Using idealized agents from each of these levels, find the score of each contestant against populations of adjacent levels.
Actual Competition Results: The x-axis is a ratio of the nearby levels. For example, Level 1.2 is a population of 80% L1 and 20% L2.
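The fractional-level convention can be written down directly. The sketch below converts a value such as 1.2 into mixture weights over the two adjacent levels and draws opponents from that mixture. The function names and the default of two opponents (for a three-player game) are assumptions of this sketch.

```python
import math
import random


def population_mix(level):
    """Map a fractional level to weights over the two adjacent integer levels."""
    lower = math.floor(level)
    upper_share = level - lower
    if upper_share == 0:
        return {lower: 1.0}
    return {lower: 1.0 - upper_share, lower + 1: upper_share}


def sample_opponent_levels(level, n_opponents=2, rng=random):
    """Draw the levels of the opponents from the mixed population."""
    mix = population_mix(level)
    levels = list(mix)
    weights = [mix[k] for k in levels]
    return rng.choices(levels, weights=weights, k=n_opponents)


print(population_mix(1.2))
print(sample_opponent_levels(1.2))
```

Sweeping `level` from one integer to the next then traces out a contestant's score as the population shifts from one level to the one above it.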
Actual Competition Results: This population construction method allows for clear distinctions between levels, but other possibilities exist.
Conclusion: Our agent (RL3) contains elements of all three levels, which is not optimal against this population of competitors. The model that emerges from LG does predict the outcome fairly well. The model predicts that subsequent repetitions would generally move the population up the hierarchy. CH has implications for larger games (e.g. TAC).