Reinforcement Learning Presentation
Markov Games as a Framework for Multi-agent Reinforcement Learning
Mike L. Littman
Jinzhong Niu
March 30, 2004


Slide 2: Overview
An MDP can describe only single-agent environments. A new mathematical framework, Markov games, is needed to support multi-agent reinforcement learning. A single step in this direction is explored: 2-player zero-sum Markov games.

Slide 3: Definitions
Markov Decision Process (MDP)
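The formal definition on this slide is not preserved in the transcript. A standard formulation is sketched below; the symbols S, A, T, R, and γ, and the notation PD(S) for the set of probability distributions over S, are assumed here rather than taken from the slide.

```latex
\[
\text{MDP} = \langle S, A, T, R \rangle, \qquad
T : S \times A \to PD(S), \qquad
R : S \times A \to \mathbb{R}
\]
\[
\text{The agent maximizes } \;
E\Big\{ \sum_{j=0}^{\infty} \gamma^{j} r_{t+j} \Big\}, \qquad 0 \le \gamma < 1 .
\]
```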

Slide 4: Definitions (cont.)
Markov Game (MG)
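The slide's formula is missing from the transcript. A common way to write a k-agent Markov game is sketched below, assuming each agent i has its own action set A_i and reward function R_i; these names are illustrative.

```latex
\[
\text{MG} = \langle S, A_1, \dots, A_k, T, R_1, \dots, R_k \rangle
\]
\[
T : S \times A_1 \times \dots \times A_k \to PD(S), \qquad
R_i : S \times A_1 \times \dots \times A_k \to \mathbb{R}
\]
\[
\text{Agent } i \text{ maximizes } \;
E\Big\{ \sum_{j=0}^{\infty} \gamma^{j} r_{i,t+j} \Big\} .
\]
```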

Slide 5: Definitions (cont.)
Two-player zero-sum Markov Game (2P-MG)
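The slide's formula is missing from the transcript. In the two-player zero-sum case a single reward function suffices, because one player's gain is the other's loss. The sketch below assumes A is the agent's action set and O the opponent's, matching the |O| notation used on the next slide.

```latex
\[
\text{2P-MG} = \langle S, A, O, T, R \rangle, \qquad
T : S \times A \times O \to PD(S), \qquad
R : S \times A \times O \to \mathbb{R}
\]
\[
\text{The agent maximizes, and the opponent minimizes, } \;
E\Big\{ \sum_{j=0}^{\infty} \gamma^{j} r_{t+j} \Big\} .
\]
```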

Slide 6: Is 2P-MG Capable?
Yes, although it precludes cooperation! It generalizes MDPs (when |O| = 1): the opponent has a constant behavior, which may be viewed as part of the environment. It generalizes matrix games (when |S| = 1): the environment does not hold any information, and rewards are determined entirely by the actions.

Slide 7: Matrix Games
Example: “rock, paper, scissors”
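The payoff matrix on this slide is not preserved in the transcript. The sketch below writes out the usual rock-paper-scissors payoffs (+1 win, 0 tie, -1 loss) and evaluates a policy by its worst-case expected reward. The row/column orientation and the helper name `worst_case_value` are assumptions made for illustration.

```python
from fractions import Fraction

ACTIONS = ["rock", "paper", "scissors"]

# Payoff to the agent. Rows: agent's action; columns: opponent's action.
# (Assumed orientation; the slide's own table may be transposed.)
R = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def worst_case_value(policy, payoff):
    """Minimum, over opponent actions, of the agent's expected reward."""
    n_opp = len(payoff[0])
    return min(
        sum(p * payoff[a][o] for a, p in enumerate(policy))
        for o in range(n_opp)
    )


uniform = [Fraction(1, 3)] * 3   # mixed policy: each action equally likely
always_rock = [1, 0, 0]          # deterministic policy

print(worst_case_value(uniform, R))      # worst case of the mixed policy
print(worst_case_value(always_rock, R))  # worst case of the deterministic policy
```

Under these payoffs the uniform policy cannot be exploited, while any deterministic policy loses to the opponent's best reply, which is why an optimal policy in a matrix game may need to be probabilistic.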

Slide 8: What exactly does ‘optimality’ mean?
MDP: A stationary, deterministic, and undominated optimal policy always exists.
MG: The performance of a policy depends on the opponent’s policy, so policies cannot be evaluated without context.
New definition of ‘optimality’ from game theory: a policy is optimal if its worst-case performance is the best among all policies. At least one optimal policy exists, which may or may not be deterministic because the agent is uncertain of its opponent’s move.

Slide 9: Finding Optimal Policy - Matrix Games
The optimal agent’s minimum expected reward should be as large as possible. Use V to express the minimum value, then consider how to maximize it.
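Maximizing the minimum expected reward can be written as a linear program. The formulation below is a sketch: π_a denotes the probability the agent assigns to action a and R(a, o) the agent's payoff when the opponent plays o, both names assumed here.

```latex
\[
\begin{aligned}
\max_{\pi,\, V} \quad & V \\
\text{s.t.} \quad & \sum_{a \in A} \pi_a \, R(a, o) \;\ge\; V \qquad \text{for every } o \in O, \\
& \sum_{a \in A} \pi_a = 1, \qquad \pi_a \ge 0 \quad \text{for every } a \in A .
\end{aligned}
\]
```

Each constraint says that the policy earns at least V against one particular opponent action, so the largest feasible V is the policy's worst-case expected reward.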

Slide 10: Finding Optimal Policy - MDP
Value of a state
Quality of a state-action pair
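The two equations on this slide are missing from the transcript. In the standard formulation they read as follows, with T(s, a, s') assumed to denote the probability of moving from s to s' under action a:

```latex
\[
V(s) = \max_{a' \in A} Q(s, a')
\]
\[
Q(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s') \, V(s')
\]
```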

Slide 11: Finding Optimal Policy - 2P-MG
Value of a state
Quality of an s-a-o (state, action, opponent action) triple
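The slide's equations are missing from the transcript. A reconstruction consistent with the matrix-game and MDP cases replaces the max over actions with a maximin over mixed policies. Here π ranges over PD(A), the probability distributions over the agent's actions (notation assumed):

```latex
\[
V(s) = \max_{\pi \in PD(A)} \; \min_{o' \in O} \; \sum_{a' \in A} Q(s, a', o') \, \pi_{a'}
\]
\[
Q(s, a, o) = R(s, a, o) + \gamma \sum_{s' \in S} T(s, a, o, s') \, V(s')
\]
```

With |O| = 1 the min disappears and the equations reduce to the MDP case; with |S| = 1 and no future term they reduce to the matrix game.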

Slide 12: Learning Optimal Policies
Q-learning
minimax-Q learning
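The two learning rules differ only in how the value of the next state is computed. The sketch below assumes a learning rate α and an observed transition from s to s' with reward r; the slide's exact notation is not preserved.

```latex
\[
\text{Q-learning:} \quad
Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \big( r + \gamma V(s') \big), \qquad
V(s') = \max_{a'} Q(s', a')
\]
\[
\text{minimax-Q:} \quad
Q(s, a, o) \leftarrow (1 - \alpha)\, Q(s, a, o) + \alpha \big( r + \gamma V(s') \big), \qquad
V(s') = \max_{\pi \in PD(A)} \min_{o'} \sum_{a'} Q(s', a', o') \, \pi_{a'}
\]
```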

Slide 13: Minimax-Q Algorithm
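The algorithm listing on this slide is not preserved in the transcript. The following is a minimal sketch of a minimax-Q learner, not the presenter's code. It makes two loud assumptions: the maximin step is approximated with multiplicative weights instead of the linear program, and the parameter defaults (`gamma`, `explor`, `decay`) and the names `MinimaxQ` and `approx_maximin` are illustrative.

```python
import math
import random
from collections import defaultdict


def approx_maximin(payoff, iters=2000, eta=0.05):
    """Approximate max over mixed pi of min over o of sum_a pi[a] * payoff[a][o].

    Assumption: multiplicative weights against a best-responding opponent
    stands in for the linear program, so the result is only approximate.
    """
    n_a, n_o = len(payoff), len(payoff[0])

    def expected(pi, o):
        return sum(pi[a] * payoff[a][o] for a in range(n_a))

    weights = [1.0] * n_a
    avg = [0.0] * n_a
    for _ in range(iters):
        total = sum(weights)
        pi = [w / total for w in weights]
        o = min(range(n_o), key=lambda j: expected(pi, j))  # opponent's best reply
        weights = [w * math.exp(eta * payoff[a][o]) for a, w in enumerate(weights)]
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
        avg = [x + p for x, p in zip(avg, pi)]
    pi = [x / iters for x in avg]  # time-averaged policy
    return pi, min(expected(pi, j) for j in range(n_o))


class MinimaxQ:
    def __init__(self, n_actions, n_opp_actions, gamma=0.9, explor=0.2, decay=0.9999):
        self.n_actions = n_actions
        self.gamma, self.explor, self.decay = gamma, explor, decay
        self.alpha = 1.0  # learning rate, decayed after every update
        # Q[s][a][o], V[s], and pi[s] are created lazily for unseen states.
        self.Q = defaultdict(lambda: [[1.0] * n_opp_actions for _ in range(n_actions)])
        self.V = defaultdict(lambda: 1.0)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)

    def choose_action(self, s):
        # Explore uniformly with probability explor, else sample from pi[s].
        if random.random() < self.explor:
            return random.randrange(self.n_actions)
        return random.choices(range(self.n_actions), weights=self.pi[s])[0]

    def update(self, s, a, o, r, s_next):
        q = self.Q[s]
        target = r + self.gamma * self.V[s_next]
        q[a][o] = (1 - self.alpha) * q[a][o] + self.alpha * target
        # Re-solve the matrix game defined by Q[s] to refresh pi[s] and V[s].
        self.pi[s], self.V[s] = approx_maximin(q)
        self.alpha *= self.decay
```

A training loop would call `choose_action`, observe the opponent's action, the reward, and the next state, then call `update`. Replacing `approx_maximin` with an exact linear-programming solver gives the version the slides describe.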

Slide 14: Experiment - Problem
Soccer

Slide 15: Experiment - Training
4 agents trained through 10^6 steps:
minimax-Q learning vs. random opponent: MR; vs. itself: MM
Q-learning vs. random opponent: QR; vs. itself: QQ

Slide 16: Experiment - Testing
Test 1: QR > MR?
Test 2: QR << QQ?
Test 3: QR, QQ – 100% loser?

Slide 17: Contributions
A solution to 2-player Markov games with a modified Q-learning method in which minimax takes the place of max. Minimax can also be used in single-agent environments to avoid risky behavior.

Slide 18: Future Work
Possible performance improvement of the minimax-Q learning method: linear programming incurs a large computational cost. Iterative methods may produce sufficiently good approximate solutions to minimax much faster.
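One example of such an iterative method is fictitious play, sketched below for a single matrix game. This is a swapped-in illustration; the slides do not name a specific method. The function name `fictitious_play` and the iteration count are assumptions. The returned lower and upper bounds bracket the game value, and their gap measures how approximate the solution still is.

```python
def fictitious_play(payoff, iters=5000):
    """Return (pi, lower, upper) after `iters` rounds of fictitious play.

    pi is the agent's empirical policy; lower and upper bracket the game value.
    payoff[a][o] is the agent's reward for action a against opponent action o.
    """
    n_a, n_o = len(payoff), len(payoff[0])
    agent_counts = [0] * n_a
    opp_counts = [0] * n_o
    a, o = 0, 0  # arbitrary first moves
    for _ in range(iters):
        agent_counts[a] += 1
        opp_counts[o] += 1
        # Each side best-responds to the other's empirical action counts.
        a = max(range(n_a),
                key=lambda i: sum(payoff[i][j] * opp_counts[j] for j in range(n_o)))
        o = min(range(n_o),
                key=lambda j: sum(payoff[i][j] * agent_counts[i] for i in range(n_a)))
    pi = [c / iters for c in agent_counts]
    # What pi guarantees against the worst opponent action.
    lower = min(sum(pi[i] * payoff[i][j] for i in range(n_a)) for j in range(n_o))
    # The most any agent action earns against the opponent's empirical mix.
    upper = max(sum(payoff[i][j] * opp_counts[j] for j in range(n_o))
                for i in range(n_a)) / iters
    return pi, lower, upper


rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
pi, lower, upper = fictitious_play(rps)
print(pi, lower, upper)
```

Each round costs only a few sums over the payoff matrix, so the accuracy can be traded against running time by changing `iters`.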

Slide 19: Discussions
The paper claims that the training is not sufficient for MR and MM to attain the optimal policy. How soon would it be possible for them to do so?
It is claimed that MR and MM should break even against even the strongest opponent. Why?
After training and before testing, the agents’ policies are fixed. What if they were not fixed and the agents kept learning? We could then examine how they adapt over the long run, for example how their winning rates change.
What is a “slow enough exponentially weighted average”?
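As background for the last question, an exponentially weighted average is the running estimate produced by an update of the form estimate = (1 - alpha) * estimate + alpha * sample, the same shape as a Q-learning update. The sketch below is illustrative only: the function name `ewa` and the sample values are assumptions, and it does not settle what "slow enough" means in the paper.

```python
def ewa(samples, alpha, initial=0.0):
    """Exponentially weighted average: older samples are discounted by (1 - alpha)."""
    estimate = initial
    for x in samples:
        estimate = (1 - alpha) * estimate + alpha * x
    return estimate


# A small alpha changes the estimate slowly; alpha = 1 keeps only the last sample.
print(ewa([1, 1, 1], alpha=0.5))
print(ewa([3, 7, 5], alpha=1.0))
```

A sample that arrived k updates ago carries weight alpha * (1 - alpha)**k, so a smaller alpha averages over a longer history.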
