Machine Learning and Games Simon M. Lucas Centre for Computational Intelligence University of Essex, UK.

1 Machine Learning and Games Simon M. Lucas Centre for Computational Intelligence University of Essex, UK

2 Overview
Games: dynamic, uncertain, open-ended
– Ready-made test environments
– 21 billion dollar industry: space for more machine learning…
Agent architectures
– Where the Computational Intelligence fits
– Interfacing the Neural Nets etc
– Choice of learning machine (WPC, neural network, NTuple systems)
Training algorithms
– Evolution / co-evolution
– TDL
– Hybrids
Methodology: strong belief in open competitions

3 My Angle Machine learning – How well can systems learn – Given complex semi-structured environment – With indirect reward schemes

4 Sample Games Car Racing Othello Ms Pac-Man – Demo

5 Agent Basics
Two main approaches
– Action selector
– State evaluator
Each of these has strengths and weaknesses
For any given problem, no hard and fast rules
– Experiment!
Success or failure can hinge on small details!

6 Co-evolution
Evolutionary algorithm: rank the candidate players using a league

7 (Co) Evolution v. TDL
Temporal Difference Learning
– Often learns much faster
– But less robust
– Learns during game-play
– Uses information readily available (i.e. current observable game-state)
Evolution / Co-evolution (vanilla form)
– Information from game result(s)
– Easier to apply
– But wasteful
Both can learn game strategy from scratch

8 In Pictures…

9 Simple Example: Mountain Car
Often used to test TD learning methods
Accelerate a car to reach goal at top of incline
Engine force weaker than gravity
(DEMO)

10 State Value Function
Actions are applied to current state to generate set of future states
State value function is used to rate these
Choose action that leads to highest state value
Discrete set of actions
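
The selection loop on this slide can be sketched in a few lines of Java. This is only an illustration, not code from the talk: the class name `GreedySelector`, the `model` function that produces the successor state, and the `value` function are all assumed names.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.ToDoubleFunction;

/** One-ply greedy choice driven by a state value function (hypothetical helper). */
final class GreedySelector {
    /** Returns the action whose successor state is rated highest by the value function. */
    static <S, A> A best(S state, List<A> actions,
                         BiFunction<S, A, S> model, ToDoubleFunction<S> value) {
        if (actions.isEmpty()) throw new IllegalArgumentException("no actions available");
        A bestAction = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (A action : actions) {
            S next = model.apply(state, action);   // apply the action to the current state
            double v = value.applyAsDouble(next);  // rate the resulting future state
            if (bestAction == null || v > bestValue) {
                bestValue = v;
                bestAction = action;
            }
        }
        return bestAction;
    }
}
```

Ties go to the first action encountered; a real player might break ties randomly instead.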

11 Action Selector
A decision function selects an output directly based on current state of system
Action may be a discrete choice, or continuous outputs

12 TDL – State Value Learned

13 Evolution : Learns Policy, not Value

14 Example Network Found by NEAT+Q (Whiteson and Stone, JMLR 2006)
EvoTDL Hybrid
They used a different input coding, so results are not directly comparable

15 ~Optimal State Value Policy Function f = abs(v)

16 Action Controller Directly connect velocity to output Simple network! One neuron! One connection! Easy to interpret! vs
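
A minimal sketch of such a one-connection controller: push in the direction of the current velocity. The slides do not give the simulation details, so the dynamics and constants below are an assumption, taken from the common textbook formulation of Mountain Car; the class and method names are hypothetical.

```java
/** Mountain Car driven by a velocity-only action selector (illustrative sketch). */
final class MountainCarSketch {
    // Assumed constants from the usual textbook version of the task.
    static final double MIN_X = -1.2, GOAL_X = 0.5, MAX_V = 0.07;

    /** The whole controller: one input (velocity), one output (push direction). */
    static int action(double v) {
        return v >= 0 ? 1 : -1;
    }

    /** Runs one episode; returns the number of steps to reach the goal, or -1 on timeout. */
    static int run(double x, double v, int maxSteps) {
        for (int t = 0; t < maxSteps; t++) {
            if (x >= GOAL_X) return t;
            int a = action(v);
            v += 0.001 * a - 0.0025 * Math.cos(3 * x);  // weak engine plus gravity along the slope
            v = Math.max(-MAX_V, Math.min(MAX_V, v));   // clamp velocity
            x += v;
            if (x < MIN_X) { x = MIN_X; v = 0; }        // inelastic wall on the left
        }
        return -1;
    }
}
```

Calling `MountainCarSketch.run(-0.5, 0.0, 1000)` starts the car at rest near the valley floor; because the controller always adds energy, the car rocks back and forth with growing amplitude until it clears the incline.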

17 Othello With Thomas Runarsson, University of Iceland

18 Volatile Piece Difference
[Plot; x-axis: move]

19 Setup
Use weighted piece counter
– Fast to compute (can play billions of games)
– Easy to visualise
– See if we can beat the ‘standard’ weights
Limit search depth to 1-ply
– Enables billions of games to be played
– For a thorough comparison
Focus on machine learning rather than game-tree search
Force random moves (with prob. 0.1)
– Get a more robust evaluation of playing ability
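
A rough sketch of this setup, assuming a board encoded as +1 (own piece), -1 (opponent) and 0 (empty), and assuming the legal moves have already been expanded into successor boards. The class name `WpcSketch` and both method names are illustrative, not taken from the talk.

```java
import java.util.Random;

/** Weighted piece counter with 1-ply search and forced random moves (sketch). */
final class WpcSketch {
    /** Weighted piece counter: dot product of the weights with the board contents. */
    static double evaluate(double[] weights, int[] board) {
        double sum = 0;
        for (int i = 0; i < board.length; i++) sum += weights[i] * board[i];
        return sum;
    }

    /**
     * Returns the index of the successor board to play: a uniformly random one
     * with probability epsilon, otherwise the best one under the 1-ply evaluation.
     */
    static int chooseMove(double[] weights, int[][] afterStates, double epsilon, Random rng) {
        if (afterStates.length == 0) throw new IllegalArgumentException("no legal moves");
        if (rng.nextDouble() < epsilon) return rng.nextInt(afterStates.length);
        int best = 0;
        double bestValue = evaluate(weights, afterStates[0]);
        for (int m = 1; m < afterStates.length; m++) {
            double v = evaluate(weights, afterStates[m]);
            if (v > bestValue) { bestValue = v; best = m; }
        }
        return best;
    }
}
```

With `epsilon = 0.1` this matches the forced-random-move probability on the slide; setting it to 0 gives a purely greedy player.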

20 Standard “Heuristic” Weights (lighter = more advantageous)

21 CEL Algorithm
Evolution Strategy (ES)
– (1, 10) (non-elitist worked best)
Gaussian mutation
– Fixed sigma (not adaptive)
– Fixed works just as well here
Fitness defined by full round-robin league performance (e.g. 1, 0, -1 for w/d/l)
Parent child averaging
– Defeats noise inherent in fitness evaluation
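
One generation of this kind of algorithm might look like the sketch below. Several details are assumptions rather than facts from the slides: the league is played among the children only, the `play` function returns +1, 0 or -1 from the first player's point of view, and parent-child averaging is written as a step of size `beta` from the parent toward the best child.

```java
import java.util.Random;
import java.util.function.ToIntBiFunction;

/** One generation of a (1, lambda) ES with league fitness (illustrative sketch). */
final class CelSketch {
    static double[] generation(double[] parent, int lambda, double sigma, double beta,
                               ToIntBiFunction<double[], double[]> play, Random rng) {
        int n = parent.length;
        // Gaussian mutation with a fixed sigma.
        double[][] kids = new double[lambda][n];
        for (double[] kid : kids)
            for (int i = 0; i < n; i++) kid[i] = parent[i] + sigma * rng.nextGaussian();
        // Full round-robin league: every ordered pair plays once, scored 1 / 0 / -1.
        int[] score = new int[lambda];
        for (int a = 0; a < lambda; a++)
            for (int b = 0; b < lambda; b++)
                if (a != b) {
                    int result = play.applyAsInt(kids[a], kids[b]);
                    score[a] += result;
                    score[b] -= result;
                }
        int best = 0;
        for (int k = 1; k < lambda; k++) if (score[k] > score[best]) best = k;
        // Non-elitist: the parent never competes. Averaging damps the noisy selection
        // by moving the parent only a fraction beta toward the league winner.
        double[] next = new double[n];
        for (int i = 0; i < n; i++) next[i] = parent[i] + beta * (kids[best][i] - parent[i]);
        return next;
    }
}
```

Calling `generation` with `lambda = 10` corresponds to the (1, 10) strategy on the slide; a small `beta` trades learning speed for robustness against noisy game results.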

22 TDL Algorithm
Nearly as simple to apply as CEL
public interface TDLPlayer extends Player {
    void inGameUpdate(double[] prev, double[] next);
    void terminalUpdate(double[] prev, double tg);
}
Reward signal only given at game end
Initial alpha and alpha cooling rate tuned empirically
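
As a hedged illustration of how the two update methods could be filled in for a weighted piece counter, the sketch below uses a tanh-squashed linear value and a gradient-style TD(0) step. The update rule, the empty `Player` placeholder and the class name `WpcTdlSketch` are assumptions; only the two method signatures come from the slide.

```java
/** Placeholder for the Player type named on the slide; its real methods are not shown there. */
interface Player {}

interface TDLPlayer extends Player {
    void inGameUpdate(double[] prev, double[] next);
    void terminalUpdate(double[] prev, double tg);
}

/** Weighted piece counter trained by TD(0) (illustrative sketch). */
final class WpcTdlSketch implements TDLPlayer {
    final double[] w;
    double alpha;  // learning rate; the slide says it is cooled over time

    WpcTdlSketch(int n, double alpha) {
        this.w = new double[n];
        this.alpha = alpha;
    }

    /** State value in (-1, 1): tanh of the weighted piece count. */
    double value(double[] x) {
        double sum = 0;
        for (int i = 0; i < w.length; i++) sum += w[i] * x[i];
        return Math.tanh(sum);
    }

    /** During play the target is the value of the next state. */
    public void inGameUpdate(double[] prev, double[] next) {
        update(prev, value(next));
    }

    /** At game end the target is the actual result tg (e.g. 1, 0 or -1). */
    public void terminalUpdate(double[] prev, double tg) {
        update(prev, tg);
    }

    private void update(double[] prev, double target) {
        double v = value(prev);
        double delta = alpha * (target - v) * (1 - v * v);  // (1 - v^2) is the tanh derivative
        for (int i = 0; i < w.length; i++) w[i] += delta * prev[i];
    }
}
```

A training loop would call `inGameUpdate` after each move and `terminalUpdate` once with the game result, reducing `alpha` between games.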

23 TDL in Java

24 CEL (1,10) v. Heuristic

25 TDL v. Random and Heuristic

26 TDL + CEL v. Heuristic (1 run)

27 Can we do better?
Enforce symmetry
– This speeds up learning
Use trusty old friend: N-Tuple System

28 NTuple Systems
W. Bledsoe and I. Browning. Pattern recognition and reading by machine. In Proceedings of the EJCC, pages 225–232, December 1959.
Sample n-tuples of input space
Map sampled values to memory indexes
– Training: adjust values there
– Recognition / play: sum over the values
Superfast
Related to:
– Kernel trick of SVM (non-linear map to high dimensional space; then linear model)
– Kanerva’s sparse memory model
– Also similar to Buro’s look-up table
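
The sample, index and sum steps can be sketched as follows, assuming a board whose squares hold -1, 0 or +1 so that each n-tuple addresses a table of 3^n entries. The class `NTupleSketch` and its `adjust` method are illustrative names; the slides do not spell out the actual training rule, so `adjust` simply adds a caller-supplied amount to each addressed entry.

```java
/** N-tuple value function over a three-valued board (illustrative sketch). */
final class NTupleSketch {
    final int[][] tuples;  // each row lists the board squares sampled by one n-tuple
    final double[][] lut;  // one look-up table per n-tuple, 3^n entries each

    NTupleSketch(int[][] tuples) {
        this.tuples = tuples;
        this.lut = new double[tuples.length][];
        for (int t = 0; t < tuples.length; t++) {
            int size = 1;
            for (int i = 0; i < tuples[t].length; i++) size *= 3;
            lut[t] = new double[size];
        }
    }

    /** Maps the sampled square values (-1, 0, +1) to a base-3 table index. */
    static int index(int[] tuple, int[] board) {
        int idx = 0;
        for (int square : tuple) idx = idx * 3 + (board[square] + 1);
        return idx;
    }

    /** Recognition / play: sum the table entries addressed by the board. */
    double value(int[] board) {
        double sum = 0;
        for (int t = 0; t < tuples.length; t++) sum += lut[t][index(tuples[t], board)];
        return sum;
    }

    /** Training: adjust the addressed entries (amount chosen by the caller). */
    void adjust(int[] board, double delta) {
        for (int t = 0; t < tuples.length; t++) lut[t][index(tuples[t], board)] += delta;
    }
}
```

Evaluation costs one table look-up per n-tuple regardless of table size, which is where the speed comes from.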

29 Symmetric N-Tuple Sampling

30 3-tuple Example

31 N-Tuple System
Results used 30 random n-tuples
Snakes created by a random 6-step walk
– Duplicate squares deleted
System typically has around 15000 weights
Simple training rule:

32 NTuple System (TDL) total games = 1250

33 Learned strategy…

34 Web-based League (snapshot before CEC 2006 Competition)

35 Results versus CEC 2006 Champion (a manual EVO / TDL hybrid)

36 N-Tuple Summary
Stunning results compared to other game-learning architectures such as MLP
How might this hold for other problems?
How easy are N-Tuples to apply to other domains?

37 Screen Capture Mode: Ms Pac-Man Challenge

38 Robotic Car Racing

39 Conclusions
Games are great for CI research
– Intellectually challenging
– Fun to work with
Agent learning for games is still a black art
Small details can make big differences!
– Which inputs to use
Big details also! (NTuple versus MLP)
Grand challenge: how can we design more efficient game learners?
EvoTDL hybrids are the way forward.

40 CIG 2008: Perth, WA;
