Learning in Multi-agent System


1 Learning in Multi-agent System
Zhai Yuqing

2 Outline
Agent Learning
Multi-agent Learning
Reinforcement Learning & Multi-agent Reinforcement Learning

3 Agent Learning

4 Why Learning?
Learning is essential for unknown environments, i.e., when the designer lacks omniscience
Learning is useful as a system-construction method, i.e., expose the agent to reality rather than trying to write everything down
Learning modifies the agent's decision mechanisms to improve performance

5 Why Learning?
It is difficult to hand-design behaviours that act optimally (or even close to it)
Agents can optimize themselves using reinforcement learning
The agent does not learn new concepts (or behaviours); rather, given a set of states and actions, it finds the best policy
Is this just adaptive control?
Learning can be done on-line and continuously throughout the lifetime of the agent, adapting to (slowly) changing situations

6 Learning
[Diagram: the world (state) sends rewards and observations/sensations to the learning algorithm, which updates the policy; the policy sends actions back to the world]

7 Learning to act in the world
[Diagram: as above, but the environment now contains other agents (possibly learning); rewards and observations reach the learning algorithm, which updates the policy that selects actions in the world]

8 Learning Agent Architecture
A learning agent can be thought of as containing a performance element that decides what actions to take and a learning element that modifies the performance element so that it makes better decisions

9 Learning Agent Architecture



12 Multi-agent Learning

13 Learning in Multiagent Systems
Intersection of DAI and ML
Why bring them together?
There is a strong need to equip multiagent systems with learning abilities
The extended view of ML as multiagent learning is qualitatively different from traditional ML and can lead to novel ML techniques and algorithms

14 Multi-Agent Learning Problem
Each agent tries to solve its learning problem while the other agents in the environment are trying to solve their own learning problems, which makes the environment non-stationary and challenging
Main scenarios: (1) cooperative; (2) self-interested (many deep issues swept under the rug)
An agent may know very little about the other agents: their payoffs may be unknown, and their learning algorithms unknown
Traditional method of solution: game theory (which uses several questionable assumptions)

15 Multi-agent Learning Problem
An agent tries to solve its own learning problem while other agents in the environment try to solve theirs
Larger state space: an agent might have to include the state of other robots in its own state
Problems of multi-agent RL:
All of the problems from the single-agent case
Other agents are unpredictable or non-stationary
Should reinforcement be local or global?
Was the robot trying to achieve the goal, or reacting to other robots, when it took a good action?

16 Learning in Multi-Agent Systems
No doubt learning is of great importance for MAS!
Challenge: the multi-agent learning problem. The optimal policy changes, because the other agents are learning too.
Can we have a unifying framework in which this learning can be understood?
Challenging MAS domains: robotic soccer, traffic, robotic rescue, trading agents and e-commerce, automated driving





21 General Characterization
Principal categories of learning
The features in which learning approaches may differ
The fundamental learning problem known as the credit-assignment problem

22 Principal Categories
Centralized learning (isolated learning):
Learning is executed by a single agent, with no interaction with other agents
Several centralized learners may pursue different or identical goals at the same time

23 Principal Categories
Decentralized learning (interactive learning):
Several agents are engaged in the same learning process
Several groups of agents may pursue different or identical learning goals at the same time
A single agent may be involved in several centralized and/or decentralized learning processes at the same time

24 Learning and Activity Coordination
Previous research on coordination focused on off-line design of behavioral rules, negotiation protocols, etc.
Agents operating in open, dynamic environments must be able to adapt to changing demands and opportunities
How can agents learn to appropriately coordinate their activities?

25 Learning about and from Other Agents
Agents learn to improve their individual performance
They can better capitalize on available opportunities by predicting the behavior of other agents (preferences, strategies, intentions, etc.)

26 Learning Organizational Roles
Assume agents have the capability of playing one of several roles in a situation
Agents need to learn role assignments to effectively complement each other
Setting: a cooperative problem-solving domain

27 Learning Organizational Roles
The framework maintains Utility, Probability and Cost (UPC) estimates for each role adopted in a particular situation:
Utility: the worth of the desired final state if the agent adopts the given role in the current situation
Probability: the likelihood of reaching a successful final state (given the role and situation)
Cost: the associated computational cost incurred
Potential: the usefulness of a role in discovering pertinent global information

28 Learning Organizational Roles: Theoretical Framework
Let S^k and R^k be the sets of situations and roles for agent k
An agent maintains a vector of UPC estimates for each (situation, role) pair
During the learning phase, the agent rates a role by combining the component measures, e.g. via a function f(U, P, C, Potential)

29 Learning Organizational Roles: Theoretical Framework
After the learning phase is over, the role to be played in situation s is the one that maximizes the combined rating: r* = argmax_r f(U_{s,r}, P_{s,r}, C_{s,r}, Potential_{s,r})
UPC values are learned using reinforcement learning; write U^n_{s,r}, P^n_{s,r}, etc. for the estimates after n updates

30 Learning Organizational Roles: Updating the Utility
Let S be the set of situations encountered between adopting role r in situation s and reaching a final state F with utility U_F
The utility values for all roles chosen in each situation in S are updated as an exponentially weighted average: U^{n+1}_{s,r} = (1 - α) U^n_{s,r} + α U_F
α is the learning rate

31 Learning Organizational Roles: Updating the Probability
Let O(F) return 1 if the final state F is successful and 0 otherwise
The update rule for probability: P^{n+1}_{s,r} = (1 - α) P^n_{s,r} + α O(F)
α is the learning rate

32 Learning Organizational Roles: Updating the Potential
Let Conflict(F) return 1 if, on the path to the final state, conflicts were detected and resolved by information exchange, and 0 otherwise
The update rule for potential: Potential^{n+1}_{s,r} = (1 - α) Potential^n_{s,r} + α Conflict(F)
α is the learning rate
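Taken together, the three updates above are exponentially weighted running averages. A minimal Python sketch, assuming that update form and an illustrative combination function (the variable names, numbers, and rating formula are assumptions, not the original framework's definitions):

```python
# Sketch of UPC-style estimates for one (situation, role) pair.
ALPHA = 0.1  # learning rate

def update(old, target, alpha=ALPHA):
    """Exponentially weighted running estimate: (1 - a) * old + a * target."""
    return (1 - alpha) * old + alpha * target

estimates = {"U": 0.0, "P": 0.5, "Potential": 0.0}

# Suppose an episode ends in a successful final state with utility 10,
# and a conflict was resolved by information exchange along the way.
final_utility, success, conflict_resolved = 10.0, 1, 1

estimates["U"] = update(estimates["U"], final_utility)
estimates["P"] = update(estimates["P"], success)
estimates["Potential"] = update(estimates["Potential"], conflict_resolved)

def rate(e, cost=1.0):
    # One plausible combination f: expected utility discounted by cost,
    # plus the potential term (an assumption for illustration).
    return e["U"] * e["P"] / cost + e["Potential"]
```

After the learning phase, the agent would pick, in each situation, the role whose rating is highest.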

33 Learning to Exploit an Opponent: Model-Based Approach
The most prominent approach in AI for developing playing strategies is the minimax algorithm
It assumes that the opponent will always choose the move that is worst for us
An accurate model of the opponent can be used to develop better strategies
(Markovitch and Carmel's work)

34 Learning to Exploit an Opponent: Model-Based Approach
The main problem of RL is its slow convergence
The model-based approach tries to reduce the number of interaction examples needed for learning
It performs a deeper analysis of past interaction experience
(Markovitch and Carmel's work)

35 Model-Based Approach
The learning process is split into two separate stages:
Infer a model of the other agent based on past experience
Utilize the learned model to design an effective interaction strategy for the future

36 Reducing Communication by Learning
Learning is a method for reducing the communication load among agents
Consider the contract-net approach:
Broadcasting of task announcements is assumed
This causes scalability problems when the number of managers/tasks increases

37 Reducing Communication in Contract-Net
A flexible learning-based mechanism called addressee learning
It enables agents to acquire knowledge about other agents' task-solving abilities
Tasks may then be assigned directly rather than broadcast

38 Reducing Communication in Contract-Net
Case-based reasoning is used for knowledge acquisition and refinement
Humans often solve problems using solutions that worked well for similar problems
Construct cases: problem-solution pairs

39 Case-Based Reasoning in Contract Net
Each agent maintains its own case base
A case consists of:
The task specification
Information about which agent has already solved the task, and the quality of the solution
A similarity measure for tasks is needed

40 Case-Based Reasoning in Contract Net
The distance between two attribute values is domain-specific
The similarity between two tasks is an aggregate of the attribute-wise distances
For a given task, the set of similar tasks is those whose similarity exceeds a threshold

41 Case-Based Reasoning in Contract Net
When an agent has to assign a task to another agent, it selects the most appropriate agents by computing their suitability from the cases involving similar tasks
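A toy sketch of this suitability computation, assuming numeric task attributes in [0, 1] and a simple one-minus-mean-distance similarity (the attribute names, threshold, and weighting are illustrative assumptions, not the mechanism's actual definitions):

```python
# Illustrative addressee learning via case-based reasoning.

def similarity(task_a, task_b):
    """Similarity in [0, 1]: 1 minus the mean absolute attribute distance."""
    keys = task_a.keys() & task_b.keys()
    if not keys:
        return 0.0
    dist = sum(abs(task_a[k] - task_b[k]) for k in keys) / len(keys)
    return max(0.0, 1.0 - dist)

def suitability(new_task, case_base, threshold=0.5):
    """Score each agent by its past solution quality, weighted by similarity."""
    scores = {}
    for case in case_base:
        sim = similarity(new_task, case["task"])
        if sim >= threshold:
            scores[case["agent"]] = scores.get(case["agent"], 0.0) + sim * case["quality"]
    return scores

case_base = [
    {"task": {"size": 0.2, "urgency": 0.9}, "agent": "A1", "quality": 0.8},
    {"task": {"size": 0.3, "urgency": 0.8}, "agent": "A2", "quality": 0.6},
    {"task": {"size": 0.9, "urgency": 0.1}, "agent": "A1", "quality": 0.3},
]
```

The task would then be offered to the highest-scoring agent directly, instead of being broadcast to everyone.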

42 Improving Learning by Communication
Two forms of improving learning by communication are distinguished:
Learning based on low-level communication (e.g. exchanging missing information)
Learning based on high-level communication (e.g. mutual explanation)

43 Improving Learning by Communication
Example: the predator-prey domain
Predators are Q-learners
Each predator has limited visual perception
Predators exchange sensor data (low-level communication)
Experiments show that this clearly leads to improved learning results

44 Knowledge exchange in MAS
More sophisticated implementations provide knowledge-exchange capabilities
Agents exchange the strongest rules they have learned
Example: Multi-agent Mutual Learning (MAML)

45 Some Open Questions…
What are the unique requirements and conditions for multiagent learning?
Do centralized and decentralized learning qualitatively differ from each other?
Development of theoretical foundations of decentralized learning
Applications of multiagent learning in complex real-world environments

46 Reinforcement Learning & Multi-agent Reinforcement Learning

49 Reinforcement Learning Approach
Feature: The Reward won’t be given immediately after agent’s action. Usually, it will be given only after achieving the goal. This delayed reward is the only clue to agent’s learning. State Recognizer Action Selector LookUp Table W ( S, a ) Learner Agent Input Environment Reward E Overview: TD [Sutton 88], Q-learning [Watkins 92] Agent can estimate a model of state transition probabilities of E(Environment), if E has a fixed state transition probability (; E is a MDPs) . Profit sharing [Grefensttette 88] Agent can estimate a model of state transition probabilities of E, even though E does not have a fixed state transition probability. c.f. Dynamic programming Agent needs to have a perfect model of state transition probabilities of E.








57 Reinforcement Learning Scenario
[Diagram: at each step the agent, in state s_t, takes action a_t; the environment returns reward r_{t+1} and next state s_{t+1}]
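The interaction loop in this scenario can be sketched in a few lines of Python; the toy chain environment and the policy below are illustrative, not from the slides:

```python
# Minimal agent-environment RL loop on a toy chain MDP.
N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)    # step left or right

def env_step(state, action):
    """Toy deterministic environment: reward 1 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def run_episode(policy, max_steps=100):
    """Run one episode: observe state, act, receive reward and next state."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env_step(state, action)
        total += reward
        if done:
            break
    return total

always_right = lambda s: +1
```

A learning agent would replace `always_right` with a policy that is updated from the observed (s_t, a_t, r_{t+1}, s_{t+1}) transitions.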







64 Example
With discount factor γ = 0.9 and immediate reward 0, the Q-values for the three actions are:
Q(s, a_red)   = 0 + 0.9 × 81  = 72.9
Q(s, a_green) = 0 + 0.9 × 100 = 90
Q(s, a_blue)  = 0 + 0.9 × 100 = 90
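These numbers follow from the one-step Q-learning backup with discount factor γ = 0.9 (inferred from 0.9 × 100 = 90; the successor-state values 81 and 100 are taken from the example):

```python
# One-step deterministic Q-learning target: Q(s, a) = r + gamma * max_a' Q(s', a')
GAMMA = 0.9

def q_backup(reward, next_state_q_values, gamma=GAMMA):
    """Backed-up value of an action given the successor state's Q-values."""
    return reward + gamma * max(next_state_q_values)

q_red   = q_backup(0.0, [81.0])    # 0 + 0.9 * 81
q_green = q_backup(0.0, [100.0])   # 0 + 0.9 * 100
q_blue  = q_backup(0.0, [100.0])   # 0 + 0.9 * 100
```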












76 Multi-agent RL: Basic Idea
Combine the learning process in an unknown environment with the interactive decision process of multiple agents
There is no single utility function to optimize
Each agent has a different objective, and its payoff is determined by the joint action of multiple agents

77 Challenges in Multi-agent RL
Curse of dimensionality:
The number of parameters to be learned increases dramatically with the number of agents
Partial observability:
The states and actions of the other agents, which an agent needs in order to make decisions, are not fully observable
Inter-agent communication is usually costly
Note: partially observable Markov decision processes (POMDPs) have been used to model partial observability in probabilistic AI
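To make the curse of dimensionality concrete: a tabular joint-action Q-function needs one entry per (state, joint action) pair, i.e. |S| · |A|^n entries for n agents with |A| actions each (the numbers below are illustrative):

```python
# Size of a tabular joint-action Q-function: |S| * |A| ** n_agents.
def q_table_size(n_states, n_actions, n_agents):
    return n_states * n_actions ** n_agents

# With 100 states and 5 actions per agent:
sizes = {n: q_table_size(100, 5, n) for n in (1, 2, 4)}
# one agent: 500 entries; two agents: 2,500; four agents: 62,500
```

Every additional agent multiplies the table by another factor of |A|, which is why joint-action learning quickly becomes intractable.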

78 The Multi-Agent Reinforcement Learning (MARL) Model
Multiple self-interested agents in a stationary, dynamic environment
The environment is modeled as a Stochastic (a.k.a. Markov) Game (SG or MG)
Transitions and payoffs are functions of all agents' actions

79 The MARL Model (cont.)
Transition probabilities and payoffs are initially unknown to the agents
Each agent's goal: maximize its return

80 Typical Multi-agent RL methods
Value iteration learning [Sutton and Barto]
Methods based on different equilibrium concepts from game theory:
Minimax-based learning in zero-sum stochastic games [Littman]
Nash equilibrium-based learning [Hu and Wellman]: extends Littman's algorithm to general-sum games
Correlated equilibrium-based learning [Greenwald and Hall]: considers the possibility of action correlation among agents

81 Typical Multi-agent RL methods
Methods based on multiple-person decision theory:
Assume that each agent plays a best response against stationary opponents
Require the joint action of the agents to converge to a Nash equilibrium in self-play
Learn quickly while losing and slowly while winning
Learn a best response when the opponents are stationary; otherwise move toward equilibrium

82 Typical Multi-agent RL methods
Integrating RL with coordination learning:
Joint-action learners: learn values over the joint action space
Independent learners: ignore the existence of the other agents and just apply RL in the classic sense
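An independent learner can be sketched as ordinary (here stateless, bandit-style) Q-learning that simply ignores the other agent; the 2x2 common-payoff coordination game and all parameters below are illustrative assumptions:

```python
# Two independent Q-learners in a repeated 2x2 coordination game:
# both agents are rewarded 1 only when they pick the same action.
import random

PAYOFF = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}
ALPHA, EPS = 0.2, 0.1  # learning rate, exploration rate

def choose(q, rng):
    """Epsilon-greedy action selection over the agent's OWN actions only."""
    if rng.random() < EPS:
        return rng.randrange(2)
    return max(range(2), key=lambda a: q[a])

rng = random.Random(0)
q1, q2 = [0.0, 0.0], [0.0, 0.0]
for _ in range(2000):
    a1, a2 = choose(q1, rng), choose(q2, rng)
    r = PAYOFF[(a1, a2)]            # both agents receive the same reward
    q1[a1] += ALPHA * (r - q1[a1])  # stateless Q-update, no joint-action model
    q2[a2] += ALPHA * (r - q2[a2])
```

Despite each agent treating the other as part of the environment, the two learners typically settle on a coordinated joint action in a game like this; in general, though, independent learning has no convergence guarantee because each agent makes the other's environment non-stationary.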

83 Typical Multi-agent RL methods
Hierarchical multi-agent RL:
Each agent is given an initial hierarchical decomposition of the overall task
Cooperative subtasks are defined as those subtasks in which coordination among agents has a significant effect on the performance of the overall task
Cooperative subtasks are usually defined at the highest level(s) of the hierarchy

84 MAL Foundations
The game-theoretic concepts of stochastic games and Nash equilibria
Learning algorithms use stochastic games as a natural extension of Markov decision processes (MDPs) to multiple agents
Equilibrium learners: Nash-Q, Minimax-Q, Friend-or-Foe-Q
Gradient-ascent learners
Best-response learners

85 Multiagent Q-learning desiderata
"Performs well" against arbitrarily adapting other agents (an exact best response is probably impossible)
Does not need a correct model of the other agents' learning algorithms, but modeling is fair game
Does not need to know the other agents' payoffs
Estimates the other agents' strategies from observation; does not assume game-theoretic play
No assumption of a stationary outcome: the population may never reach equilibrium, and agents may never stop adapting
Self-play: convergence to a repeated-game Nash equilibrium would be nice but is not necessary (it is unreasonable to seek convergence to a one-shot Nash)

86 Finding Nash equilibrium
A game-theoretic approach that presupposes complete knowledge of the reward structure of the underlying game by all the agents
Each agent calculates an equilibrium using mathematical programming
It supposes that the other agents are rational

87 Potential applications of MARL
E-commerce: agents buying and selling over the Internet
Autonomous computing, e.g., automatic fault recovery
Exploration of environments that are inaccessible to humans: the ocean floor, space, etc.

88 The End
