
1 Multi-Agent Strategic Modeling in a Robotic Soccer Domain
Andraz Bezek, Matjaz Gams, Department of Intelligent Systems, Jozef Stefan Institute, {andraz.bezek, matjaz.gams}@ijs.si
Ivan Bratko, Faculty of Computer and Information Science, University of Ljubljana, bratko@fri.uni-lj.si

2 Talk Outline
- Overview of the Problem
- Multi-Agent Strategy Discovering Algorithm
- Results on the RoboCup Domain
- Results on the 3vs2 Keepaway Domain* (*not in the paper; latest results)

3 Schema of the Multi-Agent Strategy Discovering Algorithm (MASDA)
- Input: multi-agent action sequence (e.g. a RoboCup game)
- Input: basic domain knowledge (e.g. basic soccer and RoboCup domain knowledge)
- Output: strategic concepts (e.g. describing a specific RoboCup game)

4 An example MAS problem: a RoboCup attack

5 Goal: Human-readable description of a strategic action concept
- the left forward player dribbles from the left half of the middle third into the penalty box
- the left forward makes a pass into the penalty box
- the center forward in the center of the penalty box successfully shoots into the right part of the goal

6 Multi-Agent Strategy Discovering Algorithm (MASDA): increasing abstraction
Numeric data (~3,000,000) → Symbolic data (~150,000) → Action graph (~6,500) → Abstract action graph (~1,000) → Strategic action descriptions (~100) → Strategic concepts (~10)
Steps I.1-I.3 (data preprocessing), II.1-II.3 (graphical description) and III.1-III.3 (symbolic description learning) move the data from one representation to the next.

7 Step I. Data preprocessing: I.1. Detection of actions in raw data

8 Step I. Data preprocessing: I.2. Action sequence generation
  t | agent1 | agent2
  0 | dash   | turn
  1 | turn   | dash
  2 | turn   | dash
  3 | dash   | kick
  4 | turn   | dash
  5 | dash   | turn
  ...
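The slide only shows the resulting action table, so below is a minimal Python sketch of how such a per-agent action sequence could be extracted from raw numeric data (steps I.1-I.2). The thresholds and field names (x, y, heading, kicked) are illustrative assumptions, not the authors' implementation.

import math
from typing import Dict, List

def detect_action(prev: Dict[str, float], cur: Dict[str, float]) -> str:
    """Classify one timestep as 'kick', 'dash' or 'turn' from two raw samples."""
    if cur.get("kicked"):                                  # assumed kick flag in the log
        return "kick"
    moved = math.hypot(cur["x"] - prev["x"], cur["y"] - prev["y"])
    return "dash" if moved > 0.1 else "turn"               # assumed movement threshold

def action_sequence(trace: List[Dict[str, float]]) -> List[str]:
    """Map one agent's raw trace to a symbolic action sequence."""
    return [detect_action(a, b) for a, b in zip(trace, trace[1:])]

# Tiny synthetic trace for one agent.
trace = [
    {"x": 0.0, "y": 0.0, "heading": 0.0},
    {"x": 0.5, "y": 0.0, "heading": 0.0},                  # dash
    {"x": 0.5, "y": 0.0, "heading": 30.0},                 # turn
    {"x": 0.6, "y": 0.1, "heading": 30.0, "kicked": 1},    # kick
]
print(action_sequence(trace))                              # ['dash', 'turn', 'kick']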

9 Step I. Data preprocessing: I.3. Introduction of domain knowledge
  t | Left midfielder | Center midfielder
  0 | creating space  | dribble
  1 |                 |
  2 |                 |
  3 | attack support  | pass to player
  4 |                 |
  5 | dribble         | creating space
  ...
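As a rough illustration of step I.3, the sketch below relabels a window of low-level actions with soccer-level actions using simple hand-written domain rules. The rules and flags are assumptions for illustration only; the paper's domain knowledge is richer (roles, player relations, field areas).

from typing import List

def soccer_action(window: List[str], has_ball: bool, ball_reaches_teammate: bool) -> str:
    """Map a window of low-level actions to a high-level soccer action."""
    if has_ball and "kick" in window and ball_reaches_teammate:
        return "pass to player"
    if has_ball and "kick" in window:
        return "dribble"                 # repeated short kicks while moving
    if "dash" in window:
        return "creating space"          # off-ball movement
    return "positioning"

print(soccer_action(["dash", "kick", "dash"], has_ball=True, ball_reaches_teammate=False))
# -> dribble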

10 Step II: Graphical description: II.1. Action graph creation
The action table from step I.3 is converted into graph nodes of the form role: action, e.g.:
L-MF: creating space, C-MF: dribble, L-MF: attack support, C-MF: pass to player, C-MF: creating space, L-MF: dribble
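A minimal sketch of how the action graph of step II.1 could be built: nodes are (role, action) pairs and directed edges count how often one node immediately follows another in the game trace. This is an assumed reading of the slide, not the authors' code.

from collections import Counter
from typing import List, Tuple

Node = Tuple[str, str]                     # (role, action), e.g. ("L-MF", "dribble")

def build_action_graph(sequence: List[Node]) -> Counter:
    """Return edge counts between consecutive (role, action) nodes."""
    edges: Counter = Counter()
    for src, dst in zip(sequence, sequence[1:]):
        edges[(src, dst)] += 1
    return edges

sequence = [("L-MF", "creating space"), ("C-MF", "dribble"),
            ("L-MF", "attack support"), ("C-MF", "pass to player"),
            ("L-MF", "dribble"), ("C-MF", "creating space")]
for (src, dst), n in build_action_graph(sequence).items():
    print(f"{src[0]}: {src[1]}  ->  {dst[0]}: {dst[1]}   (count={n})")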

11 Step II: Graphical description: II.1. Action graph creation

12 Step II: Graphical description: II.2. Abstraction process
[figure: action graph shown at abstraction levels 0-16]

13 Step II: Graphical description: II.3. Strategy selection
[figure: selected strategy within the abstracted action graph, abstraction levels 0-16]

14 Step III: Symbolic description learning: III.1. Generation of action descriptions
LTeam.R-FW: Long dribble
LTeam.R-FW: Pass to space
LTeam.C-MF: Successful shoot
LTeam.MF: Pass to player

15 Step III: Symbolic description learning: III.2. Generation of learning examples
  class                       | feature1 | feature2 | ...
  LTeam.MF:Pass_to_player     | T        | F        | ... F T
  LTeam.R-FW:Long_dribble     | T        | F        | ... F T
  LTeam.R-FW:Pass_to_space    | T        | F        | ... F T
  LTeam.C-MF:Successful_shoot | T        | F        | ... F T
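The table above suggests that each action instance becomes one boolean feature vector labelled with its graph edge (class). The sketch below shows that construction under assumed feature names (opponent_behind, in_penalty_box, teammate_close), which are not from the paper.

from typing import Dict, List

def make_example(action_instance: Dict, edge_class: str,
                 feature_names: List[str]) -> Dict:
    """Build one boolean-feature learning example for a given edge (class)."""
    row = {name: bool(action_instance.get(name, False)) for name in feature_names}
    row["class"] = edge_class
    return row

features = ["opponent_behind", "in_penalty_box", "teammate_close"]
inst = {"opponent_behind": True, "in_penalty_box": False, "teammate_close": True}
print(make_example(inst, "LTeam.MF:Pass_to_player", features))
# {'opponent_behind': True, 'in_penalty_box': False, 'teammate_close': True,
#  'class': 'LTeam.MF:Pass_to_player'}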

16 Step III: Symbolic description learning: III.3. Rule induction
- Each edge in a strategy represents one class.
- 2-class learning problem:
  - positive examples: action instances for the given edge
  - negative examples: all other action instances
- Induce rules for the positive class (i.e. the edge).
- Repeat for all edges in the strategy (a sketch of this setup follows below).
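To make the one-vs-rest setup concrete, here is a deliberately tiny sketch: for each edge it greedily picks the single boolean feature whose rule "IF feature THEN edge" has the highest precision. A real rule inducer (the slide does not name one) would learn conjunctive rules; only the positive/negative example setup is faithful to the slide.

from typing import Dict, List

def best_single_condition(examples: List[Dict], edge: str, features: List[str]) -> str:
    """Return the feature whose rule 'IF feature THEN edge' has the highest precision."""
    def precision(f: str) -> float:
        covered = [e for e in examples if e[f]]            # instances the rule fires on
        if not covered:
            return 0.0
        return sum(e["class"] == edge for e in covered) / len(covered)
    return max(features, key=precision)

examples = [
    {"in_penalty_box": True,  "opponent_behind": False, "class": "FW:Successful_shoot"},
    {"in_penalty_box": True,  "opponent_behind": True,  "class": "FW:Successful_shoot"},
    {"in_penalty_box": False, "opponent_behind": True,  "class": "MF:Pass_to_player"},
    {"in_penalty_box": False, "opponent_behind": False, "class": "MF:Pass_to_player"},
]
for edge in ["FW:Successful_shoot", "MF:Pass_to_player"]:
    f = best_single_condition(examples, edge, ["in_penalty_box", "opponent_behind"])
    print(f"IF {f} THEN {edge}")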

17 Testing on the RoboCup Simulated League Domain
Input:
- 10 RoboCup games: a fixed team vs. various opponent teams
- Basic soccer knowledge (no knowledge about strategy, tactics, or the rules of the game):
  - soccer roles (e.g. left-forward)
  - soccer actions (e.g. control dribble)
  - relations between players (e.g. behind)
  - playing-field areas (e.g. penalty box)
Output:
- strategic concepts (shown on the next slide)
http://www.robocup.org/

18 RoboCup Domain: an example strategic concept
LTeam.FW:Pass to player: RTeam.R-FB:Immediate
LTeam.FW:Long dribble: RTeam.C-MF:Moving-away-slow ∧ RTeam.L-FB:Still ∧ RTeam.R-FB:Short-distance
LTeam.FW:Successful shoot: RTeam.C-FW:Moving-away ∧ LTeam.R-FW:Short-distance
LTeam.FW:Successful shoot (end): RTeam.RC-FB:Left ∧ RTeam.RC-FB:Moving-away-fast ∧ RTeam.R-FB:Long-distance

19 RoboCup Domain: testing methodology
- Create a reference strategic concept on 10 RoboCup games
- Leave-one-out cross-validation to generate 10 learning tasks (learn: 9 games, test: 1 game; sketched below)
  - positive examples: examples matching the reference strategic concept
  - negative examples: all other examples
- Generate strategic concepts on the 9 learning games and test on the remaining game
- Measure accuracy, recall and precision for a given strategy using:
  - only action descriptions
  - only generated rules
  - both
- Varying level of abstraction: 1-20
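A sketch of the leave-one-out protocol above: hold out one game at a time, learn a concept on the rest, and score accuracy, recall and precision on the held-out game. learn_concept and match are placeholders standing in for MASDA; only the evaluation loop and the standard metric definitions are shown.

from typing import Callable, Dict, List, Tuple

def prf(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float, float]:
    """Accuracy, recall and precision from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return acc, recall, precision

def leave_one_out(games: List[List[Dict]],
                  learn_concept: Callable[[List[List[Dict]]], object],
                  match: Callable[[object, Dict], bool]) -> None:
    for i, test_game in enumerate(games):
        concept = learn_concept(games[:i] + games[i + 1:])
        tp = fp = fn = tn = 0
        for example in test_game:          # example["positive"]: matches the reference concept
            predicted = match(concept, example)
            actual = example["positive"]
            tp += predicted and actual
            fp += predicted and not actual
            fn += (not predicted) and actual
            tn += (not predicted) and (not actual)
        print(f"test game {i}: acc/recall/precision = {prf(tp, fp, fn, tn)}")

# Toy usage: two "games", a concept that predicts positive whenever f1 is true.
games = [[{"f1": True, "positive": True}, {"f1": False, "positive": False}],
         [{"f1": True, "positive": False}]]
leave_one_out(games, learn_concept=lambda train: None,
              match=lambda concept, ex: ex["f1"])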

20 RoboCup Domain: analysis of 10 RoboCup games

21 3vs2 Keepaway Domain
Motivation:
- RoboCup is too complex to play with learned concepts
- In the 3vs2 Keepaway domain we are able to play with learned concepts
Basic domain info: 5 agents, 3 high-level agent actions, 13 state variables
http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/ (Peter Stone et al.)

22 3vs2 Keepaway Domain
- Measure average episode duration
- Two hand-coded reference strategies:
  - good strategy: hand (14 s) - hold the ball until the nearest opponent is within 5 m, then pass to the most open player (see the sketch below)
  - random: rand (5.2 s) - randomly choose among the possible actions
- Our task: learn rules for the reference strategies and play as similarly as possible
- MASDA remains identical
- Only the domain knowledge is modified:
  - roles (K1, K2, K3, T1, T2)
  - actions (hold, passK2, passK3)
  - 13 domain attributes
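The hand strategy can be written down directly from its description. The sketch below is one possible reading, not the original agent code: hold while the nearest taker is farther than 5 m, otherwise pass to the more "open" keeper, where openness is approximated by the smallest angle between the pass line and a taker (an assumption).

import math
from typing import Dict, Tuple

Point = Tuple[float, float]

def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def openness(ball_holder: Point, keeper: Point, takers: Dict[str, Point]) -> float:
    """Smallest angle (deg) between the K1->keeper pass line and a taker, seen from K1."""
    def angle(p: Point) -> float:
        return math.degrees(math.atan2(p[1] - ball_holder[1], p[0] - ball_holder[0]))
    pass_dir = angle(keeper)
    return min(abs((angle(t) - pass_dir + 180) % 360 - 180) for t in takers.values())

def hand_policy(pos: Dict[str, Point]) -> str:
    """Hold until the nearest taker is within 5 m, then pass to the more open keeper."""
    takers = {k: v for k, v in pos.items() if k.startswith("T")}
    nearest_taker = min(dist(pos["K1"], t) for t in takers.values())
    if nearest_taker > 5.0:
        return "hold"
    open_k2 = openness(pos["K1"], pos["K2"], takers)
    open_k3 = openness(pos["K1"], pos["K3"], takers)
    return "passK2" if open_k2 >= open_k3 else "passK3"

pos = {"K1": (0, 0), "K2": (10, 0), "K3": (0, 10), "T1": (3, 0), "T2": (6, 6)}
print(hand_policy(pos))   # nearest taker is 3 m away -> pass; K3 is more open -> passK3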

23 Testing Methodology
Reference game with a known strategy → MASDA (rule induction) → game with the learned strategy
- The learned rules are hand-coded into the program
- Compute the average episode duration for the reference game and for the learned game
- Compare the episode durations

24 Episode duration comparison of reference and learned game

25 Visual comparison of reference and learned games
- reference game: random (rand.avi)
- reference game: handcoded (hand.avi)
- learned random (rand-pass4.avi)
- learned handcoded (hand-holdpass2.avi)

26 Comparison of handcoded strategy and learned rules
Learned rules:
- DistK1T1 ∈ [6, 16) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1C ∈ [6, 12) ∧ MinAngK3K1T1T2 ∈ [0, 90) => hold
- DistK1T1 ∈ [6, 12) ∧ DistK1T2 ∈ [6, 16) ∧ DistK1K3 ∈ [10, 14) ∧ DistK1K2 ∈ [8, 14) => hold
- MinDistK2T1T2 ∈ [12, 16) ∧ DistK3C ∈ [8, 16) ∧ DistK1T2 ∈ [2, 10) ∧ DistK1T1 ∈ [0, 6) ∧ MinAngK2K1T1T2 ∈ [15, 135) => pass to K2
- DistK1T1 ∈ [2, 6) ∧ MinDistK3T1T2 ∈ [10, 16) ∧ DistK1K2 ∈ [10, 16) ∧ DistK2C ∈ [4, 14) ∧ DistK1T2 ∈ [2, 8) ∧ MinAngK2K1T1T2 ∈ [0, 15) => pass to K3
Handcoded strategy:
- if dist(K1, T1) > 5 m => hold
- if dist(K1, T1) <= 5 m and player K2 is not free => pass to K3
- if dist(K1, T1) <= 5 m and player K2 is free => pass to K2
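To show how such learned rules could actually drive an agent, the sketch below encodes two of them as conjunctions of interval tests over the state variables and fires the first matching rule. The rule list is abbreviated and the execution scheme (first match wins, default hold) is an assumption, not part of the paper.

from typing import Dict, List, Tuple

Rule = Tuple[List[Tuple[str, float, float]], str]   # ([(attr, lo, hi), ...], action)

RULES: List[Rule] = [
    ([("DistK1T1", 6, 16), ("DistK1T2", 6, 16), ("DistK1C", 6, 12),
      ("MinAngK3K1T1T2", 0, 90)], "hold"),
    ([("DistK1T1", 2, 6), ("MinDistK3T1T2", 10, 16), ("DistK1K2", 10, 16),
      ("DistK2C", 4, 14), ("DistK1T2", 2, 8), ("MinAngK2K1T1T2", 0, 15)], "passK3"),
]

def choose_action(state: Dict[str, float], default: str = "hold") -> str:
    """Fire the first rule whose interval conditions all hold; otherwise use the default."""
    for conditions, action in RULES:
        if all(lo <= state.get(attr, float("nan")) < hi for attr, lo, hi in conditions):
            return action
    return default

state = {"DistK1T1": 8, "DistK1T2": 10, "DistK1C": 7, "MinAngK3K1T1T2": 30}
print(choose_action(state))   # the first rule fires -> hold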

27 Conclusion
- We have designed a domain-independent strategy-learning algorithm (MASDA), which learns from an action trace and basic domain knowledge
- Successfully applied to:
  - the RoboCup domain, evaluated by a human expert and by cross-validation
  - the 3vs2 Keepaway domain, evaluated against two reference strategies through episode duration, visual comparison and rule inspection

28 Questions http://dis.ijs.si/andraz/logalyzer/

29 RoboCup Domain: successful attack strategies
- R-FW: pass to player → FW: control dribble → FW: shoot
- R-FW: dribble → R-FW: pass to player → FW: shoot
- FW: pass to player → L-FW: control dribble → L-FW: shoot
- L-FW: long dribble → L-FW: pass → FW: shoot
- L-FW: pass to player → FW: dribble → FW: shoot
- C-FW: long dribble → C-FW: pass → FW: dribble → FW: shoot

