
1 Co-evolution: time, changing environments, agents

2 Static search space
- solutions are determined by the optimization process alone, through variation and selection
- evaluation determines the fitness of each solution
BUT…

3 Dynamic search space
- what happens if the state of the environment changes between actions of the optimization process
- AND / OR other processes are concurrently trying to change the state as well

4 Agents to act in changing environments
- the search is now for an agent strategy that acts successfully in a changing environment
- example: the agent is defined by an algorithm based on a set of parameters
  - 'learning' means searching for parameter settings that make the algorithm successful in the environment
  - the agent algorithm takes the current state of the environment as input and outputs an action to change the environment

5 Optimizing an agent
- procedural optimization
- recall 7.1.3 finite state machines (recognize a string) and 7.1.4 symbolic expressions (control operation of a cart), which are functions

6 The environment of activity
- parameter space, e.g. a game:
  - nodes are game states
  - edges are legal moves that transform one state to another
- example: tic-tac-toe, rock-scissors-paper

7 Agent in the environment
- a procedure that takes the current state as input and determines an action to alter the state
- evaluated according to its success at achieving goals in the environment
  - external goal: environment in a desirable state (win the game)
  - internal goal: maintain internal state (survive)

8 Models of agent implementation
1. look-up table: state / action
2. deterministic algorithm: hard-coded intelligence
   - ideal if a perfect strategy is known; otherwise,
   - parameterized algorithm: 'tune' the parameters to optimize performance
3. neural network

9 Optimizing agent procedure

initialize agent parameters
evaluate agent success in environment
repeat
    modify parameters
    evaluate modified agent success
    if improved success, retain modified parameters
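
A minimal sketch of this hill-climbing loop in Python, assuming hypothetical helpers evaluate(params) (returns the agent's success score in the environment) and mutate(params) (returns a randomly modified copy of the parameters):

    def optimize_agent(evaluate, mutate, init_params, iterations=1000):
        # initialize agent parameters and score them once
        params = init_params
        best_score = evaluate(params)
        for _ in range(iterations):
            # modify parameters and evaluate the modified agent
            candidate = mutate(params)
            score = evaluate(candidate)
            # retain the modified parameters only if success improved
            if score > best_score:
                params, best_score = candidate, score
        return params, best_score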

10 Environment features
1. other agents?
2. state changes independent of agent(s)?
3. randomness?
4. discrete or continuous?
5. sequential or parallel?

11 Environment features: game examples
1. other agents? solitaire, two-person, multi-player

12 Environment features: game examples
2. state changes independent of agent(s)?
   YES: simulators (weather, resources)
   NO: board games

13 Environment features: game examples
3. randomness?
   NO: chess, sudoku(!)
   YES: dice, card games

14 Environment features: game examples
4. discrete or continuous?
   continuous: video games
   discrete: board games

15 Environment features: game examples
5. sequential or parallel? turn-taking vs. simultaneous
   which sports? marathon, javelin, hockey, tennis

16 Models of agent optimization
- evolving agent vs environment
- evolving agent vs skilled player
- co-evolving agents
  - competing as equals, e.g., playing a symmetric game
  - competing as non-equals (predator - prey), e.g., playing an asymmetric game
  - co-operative

17 Example: Game Theory
invented by John von Neumann
- models concurrent* interaction between agents
  - each player has a set of choices
  - players concurrently make a choice
  - each player receives a payoff based on the outcome of play: the vector of choices
- e.g. paper-scissors-rock
  - 2 players, with 3 choices each
  - 3 outcomes: win, lose, draw
*there are sequential Game Theory models also

18 Payoff matrix
- tabulates payoffs for all outcomes, e.g. paper-scissors-rock

P1 \ P2     paper         scissors      rock
paper       (draw,draw)   (lose,win)    (win,lose)
scissors    (win,lose)    (draw,draw)   (lose,win)
rock        (lose,win)    (win,lose)    (draw,draw)
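
A quick sketch of such a payoff matrix as a Python dictionary keyed by the vector of choices (the names and layout here are illustrative, not from the slides):

    PAYOFFS = {  # (P1 choice, P2 choice) -> (P1 payoff, P2 payoff)
        ("paper", "paper"): ("draw", "draw"),
        ("paper", "scissors"): ("lose", "win"),
        ("paper", "rock"): ("win", "lose"),
        ("scissors", "paper"): ("win", "lose"),
        ("scissors", "scissors"): ("draw", "draw"),
        ("scissors", "rock"): ("lose", "win"),
        ("rock", "paper"): ("lose", "win"),
        ("rock", "scissors"): ("win", "lose"),
        ("rock", "rock"): ("draw", "draw"),
    }

    print(PAYOFFS[("paper", "rock")])   # ('win', 'lose'): paper covers rock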

19 Two-player two-choice ordinal games (2 x 2 games)
- four outcomes
- each player has four ordered payoffs
example game:

P1 \ P2   L     R
U         1,2   3,1
D         2,4   4,3

20 2 x 2 games
- 144 distinct games, based on the relation of payoffs to outcomes: (4! x 4!) / (2 x 2)
- model many real-world encounters
- environment for studying agent strategies: how do players decide what choice to make?

P1 \ P2   L     R
U         1,2   3,1
D         2,4   4,3

21 2 x 2 games
player strategy:
- minimax - minimize losses
  - pick the row / column with the largest minimum
  - assumes nothing about the other player

P1 \ P2   L     R
U         1,2   3,1
D         2,4   4,3
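
A minimal Python sketch of picking the minimax (maximin) row for P1 in the example game above, assuming P1's payoff is listed first in each cell:

    # rows are P1's choices, columns are P2's choices, cells are (P1 payoff, P2 payoff)
    game = {
        "U": {"L": (1, 2), "R": (3, 1)},
        "D": {"L": (2, 4), "R": (4, 3)},
    }

    def minimax_choice(game):
        # for each row, find P1's worst-case (minimum) payoff,
        # then pick the row whose worst case is largest
        return max(game, key=lambda row: min(p1 for (p1, _) in game[row].values()))

    print(minimax_choice(game))  # 'D': its worst payoff (2) beats U's worst (1)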

22 2 x 2 games
player strategies:
- dominant strategy
  - preferred choice, whatever the opponent does
  - does not determine a choice in all games

P1 \ P2   L     R
U         1,3   4,2
D         3,4   2,1

P1 \ P2   L     R
U         1,2   4,3
D         3,4   2,1

P1 \ P2   L     R
U         2,1   1,2
D         4,3   3,4

23 2 x 2 games
some games are dilemmas: the "prisoner's dilemma", C = 'cooperate', D = 'defect'

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

24 2 x 2 games - playing strategies
- what happens if players use different strategies?
- how do/should players decide in difficult games?
- modeling players and evaluating them in games

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

25 iterated games
- can players analyze games after playing and determine a better strategy for the next encounter?
- iterated play, e.g., prisoner's dilemma: can players learn to trust each other and get better payoffs?

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

26 iterated prisoner's dilemma
- player strategy: the next choice as a function of previous choices and outcomes

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

game   1   2   ...  i-1   i    i+1  ...
P1     C   C   ...  D     D    ?
P2     D   C   ...  C     D    ?

P1: choice_{i+1,P1} = f( choice_{i,P1}, choice_{i,P2} )
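
One concrete instance of such a function f, sketched in Python (tit-for-tat, discussed on a later slide, reacting only to the opponent's previous choice):

    def tit_for_tat(my_last, opponent_last):
        # first game: no history yet, start by cooperating
        if opponent_last is None:
            return "C"
        # otherwise repeat the opponent's previous choice
        return opponent_last

    # choice_{i+1,P1} = f( choice_{i,P1}, choice_{i,P2} )
    print(tit_for_tat("C", "D"))  # 'D': punish last round's defection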

27 iterated prisoner's dilemma
- evaluating strategies based on payoffs in iterated play

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

P1: choice_{n+1,P1} = f( choice_{n,P1}, choice_{n,P2} )

game         1   2   ...  i-1   i    i+1  ...  n-1   n    Σ
P1 payoff    1   3   ...  4     2    3    ...  4     3    298
P1 choice    C   C   ...  D     D    C    ...  D     C
P2 choice    D   C   ...  C     D    C    ...  C     C
P2 payoff    4   3   ...  1     2    3    ...  1     3    343

28 tournament play
- set of K strategies for a game, e.g., iterated prisoner's dilemma
- evaluate strategies by performance against all other strategies
- round-robin tournament
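
A sketch of round-robin scoring in Python, assuming a hypothetical play_iterated_game(s1, s2) that returns the two players' total payoffs from one iterated match; as in the example table that follows, each strategy also meets itself:

    def round_robin(strategies, play_iterated_game):
        # total payoff of each strategy (the row player) against every strategy
        totals = {}
        for a, strat_a in strategies.items():
            total = 0
            for b, strat_b in strategies.items():
                payoff_a, _ = play_iterated_game(strat_a, strat_b)
                total += payoff_a
            totals[a] = total
        # rank strategies by total payoff, best first
        return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)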

29 example of tournament play

          P1    P2    ...  P(K-1)  PK    Σ
P1        308   298   ...  340     278   7140
P2        343   302   ...  297     254   6989
...       ...   ...   ...  ...     ...   ...
P(K-1)    280   288   ...  301     289   6860
PK        312   322   ...  297     277   7045

30 tournament play

          P1    P2    ...  P(K-1)  PK    Σ
P1        308   298   ...  340     278   7140
P2        343   302   ...  297     254   6989
...       ...   ...   ...  ...     ...   ...
P(K-1)    280   288   ...  301     289   6860
PK        312   322   ...  297     277   7045

- order players by total payoff
- change 'weights' (proportion of population) based on success
- repeat tournament until weights stabilize
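
One way the weight update could look, as a hedged Python sketch (a simple replicator-style rule; the exact update used in the slides is not specified):

    def update_weights(weights, totals):
        # grow a player's share of the population in proportion to its
        # weighted total tournament payoff (replicator-style update)
        fitness_sum = sum(weights[p] * totals[p] for p in weights)
        return {p: weights[p] * totals[p] / fitness_sum for p in weights}

    # repeat the tournament, re-weighting until the weights stop changing
    weights = {"P1": 0.25, "P2": 0.25, "P3": 0.25, "P4": 0.25}
    totals = {"P1": 7140, "P2": 6989, "P3": 6860, "P4": 7045}
    weights = update_weights(weights, totals)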

31 tournament survivors
- final weights of players define who is successful
- typical distribution:
  - 1 dominant player
  - a few minor players
  - most players eliminated

32 variations on fitness evaluation
- spatially distributed agents
  - random players on an n x n grid
  - each player plays 4 iterated games against its neighbours and computes its total payoff
  - each player compares its total payoff with its 4 neighbours; it is replaced by the neighbour with the best payoff
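
A minimal Python sketch of one step of this spatial scheme, assuming a hypothetical iterated_payoff(a, b) for one iterated match; wrapping the grid at the edges is an assumption, not stated on the slide:

    def spatial_step(grid, iterated_payoff):
        # grid[r][c] holds a strategy; neighbours are up/down/left/right (wrapping)
        n = len(grid)
        def neighbours(r, c):
            return [((r - 1) % n, c), ((r + 1) % n, c), (r, (c - 1) % n), (r, (c + 1) % n)]
        # each player plays 4 iterated games against its neighbours
        payoff = [[sum(iterated_payoff(grid[r][c], grid[nr][nc])
                       for nr, nc in neighbours(r, c))
                   for c in range(n)] for r in range(n)]
        # each player is replaced by the neighbour (or itself) with the best total payoff
        new_grid = [[None] * n for _ in range(n)]
        for r in range(n):
            for c in range(n):
                best = max(neighbours(r, c) + [(r, c)], key=lambda rc: payoff[rc[0]][rc[1]])
                new_grid[r][c] = grid[best[0]][best[1]]
        return new_grid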

33 spatial distribution

34 defining agents, e.g., prisoner's dilemma
- algorithms from experts: Axelrod 1978
- exhaustive variations on a model, e.g. react to the previous choices of both players
  need choices for:
    first move
    after (C,C)    // (self, opponent)
    after (C,D)
    after (D,C)
    after (D,D)
- 5-bit representation of a strategy: 32 possible strategies, 32 players in a tournament

tit-for-tat:
  first: C
  (C,C): C
  (C,D): D
  (D,C): C
  (D,D): D
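
A sketch of the 5-bit encoding in Python; the bit order is an assumption (first move, then the responses after (C,C), (C,D), (D,C), (D,D), with 0 = C and 1 = D):

    def decode(bits):
        # bits: string like "00101"; decode into a first move plus a response table
        moves = ["C" if b == "0" else "D" for b in bits]
        first = moves[0]
        table = dict(zip([("C", "C"), ("C", "D"), ("D", "C"), ("D", "D")], moves[1:]))
        return first, table

    first, table = decode("00101")     # tit-for-tat: start with C, then copy the opponent
    print(first, table[("C", "D")])    # C D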

35 tit-for-tat
- interpreting the strategy
  - start by trusting
  - repeat the opponent's last behaviour
    - cooperate after cooperation
    - punish defection
  - don't hold a grudge
BUT... Axelrod's payoffs became the standard:

tit-for-tat:
  first: C
  (C,C): C
  (C,D): D
  (D,C): C
  (D,D): D

P1 \ P2   C     D
C         3,3   0,5
D         5,0   1,1

36 problems with the research
- if the payoffs are changed but the prisoner's dilemma pattern remains, tournaments can produce different winners --> relative payoff does matter

P1 \ P2   C     D
C         3,3   1,4
D         4,1   2,2

P1 \ P2   C     D
C         3,3   0,5
D         5,0   1,1

game         1   2   ...  i-1   i    i+1  ...  n-1   n    Σ
P1 payoff    1   3   ...  4     2    3    ...  4     3    298
P1 choice    C   C   ...  D     D    C    ...  D     C
P2 choice    D   C   ...  C     D    C    ...  C     C
P2 payoff    4   3   ...  1     2    3    ...  1     3    343

37 agent is a function of more than the previous strategy
- choice_{P1,n} influenced by:

  variable                       how many?
  previous choices               2, 4, 6, ..?
  payoffs - of both players      4, 8?
  individual's goals             1

- same or different measures?
  - maximum payoff
  - maximum difference
  - altruism
  - equality
  - social utility
- other factors??

38 agent research (Robinson & Goforth)
- choice_{P1,n} influenced by:
  - previous choices - how many
  - payoffs - of both players
- repeated tournaments, sampling the space of payoffs - demonstrated that payoffs matter
  - example of discovery: an extra level of trust beyond tit-for-tat
- effect of players' goals on outcomes and payoffs, and on classic strategies
- 3 x 3 games: ~3 billion games -> grid computing

39 general model
- environment is a payoff space where encounters take place
- multiple agents interact in the space: repeated encounters where
  - agents get payoffs
  - agents 'display' their own strategies and
  - agents 'learn' the strategies of others
- goals of research
  - reproduce real-world behaviour by simulating strategies
  - develop effective strategies

40 Developing effective agent strategies
- co-evolution: bootstrapping
- symmetric competition: games, symmetric game theory
- asymmetric competition: asymmetric game theory
- cooperation: common goal, social goal

41 Coevolution in symmetric competition
- Fogel's checkers player: evolving player strategies
- search space of strategies
  - fitness is determined by game play
  - variation by random change to parameters
  - selection by ranking in play against others
  - initial strategies randomly generated

42 Checkers game representation
- 32-element vector for the playable squares on the board (one colour)
- domain of each element: {-K, -1, 0, 1, K}
- e.g., start state of the game:
  {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, 0,0,0,0,0,0,0,0, 1,1,1,1,1,1,1,1,1,1,1,1}
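
A sketch of this representation in Python; the sign convention (+1/+K for the evolving player's men and kings, -1/-K for the opponent's, 0 for empty) and the value of K are assumptions for illustration:

    K = 1.4                      # value of a king; in Fogel's work this is itself evolved

    def start_board():
        # 32 playable squares: 12 opponent men, 8 empty squares, 12 own men
        return [-1] * 12 + [0] * 8 + [1] * 12

    board = start_board()
    assert len(board) == 32 and all(v in (-K, -1, 0, 1, K) for v in board)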

43 Game play algorithm

board.initialize()
while (not board.gameOver())
    player1.makeMove(board)
    if (board.gameOver()) break
    player2.makeMove(board)
end while
board.assignPoints(player1, player2)

44 makeMove algorithm

makeMove(board) {
    boardList = board.getNextBoards(this)
    bestBoardVal = -infinity    // below any evaluate() output, which lies in [-1,1]
    bestBoard = null
    forAll (tempBoard in boardList)
        boardVal = evaluate(tempBoard)
        if (boardVal > bestBoardVal)
            bestBoardVal = boardVal
            bestBoard = tempBoard
    board = bestBoard
}

45 evaluate algorithm

evaluate(board) {
    // a function of the 32 elements of the board
    // parameters are the variables of the search space, including K
    // output is in the range [-1,1], used for comparing boards
}

what to put in the evaluation?

46 Chellapilla and Fogel's evaluation
- no knowledge of checkers rules
  - the board generates legal moves and updates the state
- no 'look-ahead' to future moves
- implicitly a pattern recognition system
  BUT only relational patterns are predefined - see diagram (neural network)

47 Neural network
[diagram: input layer, hidden layer(s), output layer]
  h1 = w11*i1 + w12*i2 + w13*i3
  o1 = w31*g1 + w32*g2 + w33*g3

48 Neural network agent
- 32 inputs - board state
- 1st hidden layer: 91 nodes
  - subsquares of the board: 3x3 (36), 4x4 (25), 5x5 (16), 6x6 (9), 7x7 (4), 8x8 (1)
- 2nd hidden layer: 40 nodes
- 3rd hidden layer: 10 nodes
- 1 output - evaluation of the board, in [-1, 1]
- parameters - weights of the neural network plus K (value of the king)
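
A simplified forward-pass sketch in Python with NumPy; only the layer sizes 32 -> 91 -> 40 -> 10 -> 1 come from the slide. The fully connected layers, random initialization, and tanh squashing are assumptions (in the actual design the first hidden layer connects each node only to one subsquare of the board):

    import numpy as np

    sizes = [32, 91, 40, 10, 1]                    # layer widths from the slide
    rng = np.random.default_rng(0)
    weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

    def evaluate(board, weights):
        # board: length-32 vector over {-K, -1, 0, 1, K}
        x = np.asarray(board, dtype=float)
        for w in weights:
            x = np.tanh(x @ w)                     # tanh keeps values in [-1, 1]
        return float(x[0])                         # scalar evaluation of the board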

49 Genetic algorithm
- population: 15 player agents
- one child agent from each parent: variation by random changes in K and the weights
- fitness based on score in a tournament among the 30 agents (win: 1, loss: -2, draw: 0)
- agents with the best 15 scores selected for the next generation
- after 840 generations: an effective checkers player in competition against humans
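
A sketch of this evolutionary loop in Python, assuming hypothetical helpers mutate(agent) (random changes to K and the weights) and tournament_scores(agents) (round-robin play scored win 1, loss -2, draw 0):

    def evolve(population, mutate, tournament_scores, generations=840):
        for _ in range(generations):
            # each of the 15 parents produces one mutated child (30 agents in total)
            children = [mutate(parent) for parent in population]
            pool = population + children
            # fitness: tournament score among all agents in the pool
            scores = tournament_scores(pool)
            # keep the 15 best-scoring agents for the next generation
            ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
            population = [pool[i] for i in ranked[:len(population)]]
        return population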

