Evaluation Through Conflict
Martin Zinkevich, Yahoo! Inc.
http://martin.zinkevich.org/lemonade

Who Was I
Worked with the University of Alberta Computer Poker Research Group
– Designed the Counterfactual Regret algorithm
– Theory behind DIVAT
Worked on the AAAI Computer Poker Competition
– 2006 as lead programmer, 2007 as chair
Work used in Man vs. Machine

Who Am I
Run the Lemonade Stand Game Competition
Work with the Yahoo! Anti-Abuse Team

AAAI Computer Poker Competition
5 years running
Now the ANNUAL Computer Poker Competition
Latest competition: 11 universities, among other entrants

An Old Idea
Think about learning in the presence of other intelligent agents.
Prove cool stuff about your learning algorithm given:
– constraints on the adversary
– constraints on the game

Solving the Unsolvable
In current competitions, people often apply techniques that are effective in solvable games, even when the game is not solvable.
In which competitions is it useless to approximate the game as solvable?

Axelrod’s Iterated Prisoner’s Dilemma
A tournament among many competitors. One entry: tit-for-tat (Anatol Rapoport)
– Nice (initially)
– Retaliating
– Forgiving
– Non-envious
Learned that cooperation has value, but:
– Cooperate with whom?
– How do we cooperate?
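The following is a minimal sketch of tit-for-tat. It assumes the conventional prisoner's dilemma payoffs: 3 each for mutual cooperation, 1 each for mutual defection, and 5 versus 0 when one player defects against a cooperator. The payoff table and the names `tit_for_tat`, `always_defect`, and `play` are illustrative assumptions, not Axelrod's tournament code. The sketch shows the listed properties. The strategy opens by cooperating (nice). It copies a defection on the next move (retaliating). It returns to cooperation as soon as the opponent does (forgiving). It never tries to outscore the particular opponent it faces (non-envious).

```python
from typing import Callable, List, Tuple

# Assumed payoffs (first player, second player); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A strategy sees the opponent's past moves and returns its next move.
Strategy = Callable[[List[str]], str]


def tit_for_tat(opponent_moves: List[str]) -> str:
    if not opponent_moves:
        return "C"  # nice: cooperate first, so it is never the first to defect
    return opponent_moves[-1]  # retaliate after a defection, forgive after cooperation


def always_defect(opponent_moves: List[str]) -> str:
    return "D"


def play(a: Strategy, b: Strategy, rounds: int) -> Tuple[int, int]:
    """Play an iterated match and return the two total scores."""
    moves_a: List[str] = []
    moves_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both moves are chosen before either is recorded (simultaneous play).
        move_a, move_b = a(moves_b), b(moves_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(move_a)
        moves_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
print(play(tit_for_tat, always_defect, 10))  # (9, 14)
```

Under these payoffs, tit-for-tat loses its head-to-head match against an unconditional defector, 9 to 14. It does well when it meets partners that cooperate.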

What Is the Lemonade Stand Game?
Every round, for 100 rounds:
– each player privately selects an action (a position on a circle)
– then the actions are revealed
A player's score is the distance clockwise to the next player plus the distance counterclockwise to the next player.
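Here is a minimal scoring sketch for one round. The slide states neither the size of the circle nor what happens when players choose the same spot. The code therefore assumes 12 evenly spaced positions, like the hours on a clock. It also assumes a collision rule: two players on the same spot get 6 each and the third player gets 12, and three players on one spot get 8 each. Treat both as assumptions. `N_SPOTS`, `cw`, and `scores` are hypothetical names.

```python
from typing import Tuple

N_SPOTS = 12  # assumption: 12 evenly spaced positions around the circle


def cw(a: int, b: int) -> int:
    """Clockwise distance from position a to position b."""
    return (b - a) % N_SPOTS


def scores(pos: Tuple[int, int, int]) -> Tuple[int, ...]:
    """Per-round scores for three players at positions pos[0], pos[1], pos[2]."""
    a, b, c = pos
    # Assumed collision rule: all three on one spot -> 8 each;
    # two on one spot -> 6 each for them, 12 for the player alone.
    if a == b == c:
        return (8, 8, 8)
    if a == b:
        return (6, 6, 12)
    if a == c:
        return (6, 12, 6)
    if b == c:
        return (12, 6, 6)
    result = []
    for i, me in enumerate(pos):
        others = [p for j, p in enumerate(pos) if j != i]
        # Distance clockwise to the next player ...
        to_next_cw = min(cw(me, o) for o in others)
        # ... plus distance counterclockwise to the next player.
        to_next_ccw = min(cw(o, me) for o in others)
        result.append(to_next_cw + to_next_ccw)
    return tuple(result)


print(scores((0, 4, 8)))  # (8, 8, 8)
print(scores((0, 6, 3)))  # (9, 9, 6)
```

Both example rounds total 24. With distinct positions this always holds. Each gap between neighbors is counted once by the player on each side of it, so the scores sum to twice the circumference. Under these assumptions, that is why the game is constant-sum.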

Key Observations
A constant-sum game between 3 players
– For every gain, someone has to lose
Possibilities for cooperation
– Opposite sides of the circle, “sandwiching”
Not a “solvable game” (Nash, 1951)
– Playing equilibrium strategies is not advisable
Easy to set a “table image”
– The constant strategy often evokes cooperative behavior
Existing techniques fail
– Experts algorithms lose to the constant strategy
Strategy #1: Play Constant
Strategy #2: Play Opposite
Strategy #3: Sandwich
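A brute-force check can show the value of the "play opposite" form of cooperation. This sketch uses the same assumptions as a plain scoring rule: a 12-position circle, 6/6/12 when two players share a spot, and 8 each when all three coincide. None of these are stated on the slide. `scores`, `third_player_payoffs`, and `pair_totals` are hypothetical helpers. The check shows that once two players sit directly opposite each other, the third player scores exactly 6 wherever it goes. The pair therefore always splits the remaining 18.

```python
from typing import Set, Tuple

N = 12  # assumption: 12 positions around the circle


def scores(pos: Tuple[int, int, int]) -> Tuple[int, ...]:
    """Per-round scores; the collision rule is an assumption (see lead-in)."""
    if len(set(pos)) == 1:
        return (8, 8, 8)  # all three share a spot
    if len(set(pos)) == 2:
        # the two players sharing a spot get 6 each, the lone player gets 12
        return tuple(6 if pos.count(p) == 2 else 12 for p in pos)
    result = []
    for i, me in enumerate(pos):
        others = [p for j, p in enumerate(pos) if j != i]
        cw = min((o - me) % N for o in others)   # clockwise gap to next player
        ccw = min((me - o) % N for o in others)  # counterclockwise gap
        result.append(cw + ccw)
    return tuple(result)


def third_player_payoffs(a: int) -> Set[int]:
    """Every score the third player can get when the others sit at a and a + N/2."""
    b = (a + N // 2) % N
    return {scores((a, b, x))[2] for x in range(N)}


def pair_totals(a: int) -> Set[int]:
    """Every combined score of the opposite pair, over all third-player positions."""
    b = (a + N // 2) % N
    return {sum(scores((a, b, x))[:2]) for x in range(N)}


print(third_player_payoffs(0))  # {6}
print(pair_totals(0))           # {18}
```

Under these assumptions, the opposite pair averages 9 points per player per round. That is above the 8-point fair share of a 24-point constant-sum round, which is one way to see why cooperation has value here.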

Competition Structure
Every set of three players played 180 matches of 100 rounds each (1.5 million rounds total)
Highest total score wins
Each player's mean score and its standard error can be calculated
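The "1.5 million rounds" figure can be reproduced if we assume one entry per team on the competitors slide, which gives 9 entrants. There are C(9, 3) = 84 distinct triples, and 84 × 180 × 100 = 1,512,000 rounds. The sketch below also computes a mean and its standard error. It assumes each match's total score is treated as one independent sample, because the slide does not say what unit was used. `total_rounds` and `mean_and_stderr` are hypothetical helpers, not the competition's own tooling.

```python
import math
import statistics


def total_rounds(n_entrants: int, matches_per_triple: int = 180,
                 rounds_per_match: int = 100) -> int:
    """Rounds played if every set of three entrants meets matches_per_triple times."""
    return math.comb(n_entrants, 3) * matches_per_triple * rounds_per_match


def mean_and_stderr(samples):
    """Sample mean and standard error of the mean (needs at least 2 samples)."""
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, stderr


print(total_rounds(9))  # 1512000
```

The standard error shrinks with the square root of the number of samples. The large number of repeated matches is what makes differences between entrants' totals measurable.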

Competitors
28 players, 9 teams
– University of Southampton/Imperial College London (Soton)
– Yahoo! Inc. (Pujara)
– Rutgers University (RL3)
– Brown University (Brown)
– Carnegie Mellon University (2 teams: Waugh, ACTR)
– University of Michigan (FrozenPontiac)
– Princeton University (Schapire)
– (Greg Kuhlmann)

The High Level
Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.

Lofty Goals
Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.
– behavior: a fully specified strategy
– used: actually leveraged

Practical Concessions
Phenomenal Intelligence: the observed behavior used by a set of people at a point in time for some task.
– a set of people: not any intelligent agent
– at a point in time: not any time (people change)
– for some task: not any task (context matters)