
1
Machine Learning and Games
Simon M. Lucas
Centre for Computational Intelligence, University of Essex, UK

2
Overview
Games: dynamic, uncertain, open-ended
– Ready-made test environments
– 21-billion-dollar industry: space for more machine learning
Agent architectures
– Where the computational intelligence fits
– Interfacing the neural nets etc.
– Choice of learning machine (WPC, neural network, N-tuple systems)
Training algorithms
– Evolution / co-evolution
– TDL
– Hybrids
Methodology: strong belief in open competitions

3
My Angle
Machine learning:
– How well can systems learn,
– given a complex, semi-structured environment,
– with indirect reward schemes?

4
Sample Games
– Car racing
– Othello
– Ms Pac-Man (demo)

5
Agent Basics
Two main approaches:
– Action selector
– State evaluator
Each of these has strengths and weaknesses. For any given problem there are no hard and fast rules: experiment! Success or failure can hinge on small details.

6
Co-evolution
Evolutionary algorithm: rank the individuals by playing them against each other in a league.

7
(Co-)Evolution v. TDL
Temporal Difference Learning:
– Often learns much faster
– But less robust
– Learns during game-play
– Uses information readily available (the current observable game state)
Evolution / co-evolution (vanilla form):
– Uses only information from game result(s)
– Easier to apply
– But wasteful
Both can learn game strategy from scratch.

8
In Pictures…

9
Simple Example: Mountain Car
Often used to test TD learning methods. Accelerate a car to reach the goal at the top of an incline; the engine force is weaker than gravity, so the car must first reverse to build up momentum. (Demo)
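The task can be sketched with the classic mountain-car dynamics; the constants below are the standard Sutton and Barto ones, an assumption rather than something stated on the slide:

```java
// Minimal mountain-car simulator (classic formulation, assumed constants).
public class MountainCar {
    public double pos = -0.5, vel = 0.0;   // start at the valley bottom

    // action: -1 (reverse), 0 (coast), +1 (forward)
    public void step(int action) {
        vel += 0.001 * action - 0.0025 * Math.cos(3 * pos);
        vel = Math.max(-0.07, Math.min(0.07, vel));
        pos += vel;
        pos = Math.max(-1.2, Math.min(0.5, pos));
        if (pos <= -1.2) vel = 0;          // inelastic left wall
    }

    public boolean atGoal() { return pos >= 0.5; }

    public static void main(String[] args) {
        MountainCar car = new MountainCar();
        // Hold the throttle forward: the engine alone cannot climb the hill.
        for (int t = 0; t < 1000; t++) car.step(+1);
        System.out.println("full throttle reaches goal: " + car.atGoal());
    }
}
```

Holding the throttle forward never reaches the goal, which is exactly why the problem is a useful learning testbed.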

10
State Value Function
Each action from a discrete set is applied to the current state to generate a set of candidate future states. The state value function rates these, and the agent chooses the action that leads to the highest-valued state.
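The selection loop just described can be written generically; all names below are illustrative, not from the talk:

```java
import java.util.function.ToDoubleFunction;

// Greedy one-step lookahead over a state value function: apply each
// discrete action to the current state, rate each successor, and pick
// the action whose successor scores highest.
public class GreedySelector {
    interface State { State apply(int action); }

    // Toy state for the demo: a single number; actions shift it by -1/0/+1.
    record Num(double x) implements State {
        public State apply(int a) { return new Num(x + (a - 1)); }
    }

    static int selectAction(State s, int nActions, ToDoubleFunction<State> value) {
        int best = 0;
        double bestV = Double.NEGATIVE_INFINITY;
        for (int a = 0; a < nActions; a++) {
            double v = value.applyAsDouble(s.apply(a)); // rate the successor
            if (v > bestV) { bestV = v; best = a; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Value function prefers states near x = 10, so from x = 3 the
        // selector picks the action that increases x (action index 2).
        int a = selectAction(new Num(3), 3, s -> -Math.abs(((Num) s).x() - 10));
        System.out.println(a);
    }
}
```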

11
Action Selector
A decision function selects an action directly from the current state of the system. The action may be a discrete choice or a set of continuous outputs.

12
TDL – State Value Learned

13
Evolution: Learns Policy, not Value

14
Example Network Found by NEAT+Q (Whiteson and Stone, JMLR 2006)
An EvoTDL hybrid. They used a different input coding, so the results are not directly comparable.

15
~Optimal State Value Function: f = abs(v)

16
Action Controller
Directly connect velocity to the output. A simple network: one neuron, one connection, easy to interpret!
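The one-neuron controller can be written out directly: with a single positive weight from velocity to a sign-thresholded output, the policy is simply "push in the direction you are already moving" (the exact weight and activation are assumptions, but any positive weight gives the same policy):

```java
// One neuron, one connection: the action-space twin of the
// near-optimal state value f = abs(v).
public class OneNeuronController {
    static int act(double velocity) {
        // positive weight, sign activation: accelerate with the motion
        return velocity >= 0 ? +1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(act(0.03));   // moving right -> push right
        System.out.println(act(-0.01));  // moving left  -> push left
    }
}
```

Driving with the motion pumps energy into the system every step, which is why this trivial policy solves the mountain-car task.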

17
Othello
With Thomas Runarsson, University of Iceland

18
Volatile Piece Difference

19
Setup
Use a weighted piece counter:
– Fast to compute (can play billions of games)
– Easy to visualise
– See if we can beat the 'standard' weights
Limit search depth to 1-ply:
– Enables billions of games to be played, for a thorough comparison
– Focus on machine learning rather than game-tree search
Force random moves (with probability 0.1):
– Gives a more robust evaluation of playing ability
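The weighted piece counter itself is a one-line evaluation: a dot product of a 64-entry weight vector with the board, encoded as +1 (own piece), -1 (opponent), 0 (empty). The weights below are illustrative placeholders, not the standard heuristic set:

```java
// Weighted piece counter (WPC) sketch for an 8x8 Othello board.
public class Wpc {
    static double evaluate(int[] board, double[] weights) {
        double sum = 0;
        for (int i = 0; i < 64; i++) sum += weights[i] * board[i];
        return sum;
    }

    public static void main(String[] args) {
        double[] w = new double[64];
        java.util.Arrays.fill(w, 0.25);       // placeholder weight everywhere
        w[0] = w[7] = w[56] = w[63] = 1.0;    // corners valued highly
        int[] board = new int[64];
        board[0] = 1;      // we hold a corner
        board[27] = -1;    // opponent holds a centre square
        System.out.println(evaluate(board, w)); // 1.0 - 0.25 = 0.75
    }
}
```

Because the evaluation is a single dot product, a 1-ply player built on it really can be run for billions of games.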

20
Standard “Heuristic” Weights (lighter = more advantageous)

21
CEL Algorithm
Evolution Strategy (ES): (1, 10); non-elitist worked best
Gaussian mutation: fixed sigma (not adaptive); fixed works just as well here
Fitness defined by full round-robin league performance (e.g. 1, 0, -1 for win/draw/loss)
Parent-child averaging: defeats the noise inherent in the fitness evaluation
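The ingredients above can be sketched as follows. The fitness function is left abstract (a full round-robin league in the talk; a deterministic toy in the demo), and the learning constants are illustrative:

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// (1,10)-ES sketch: ten Gaussian-mutated children per generation,
// non-elitist selection of the best child, then parent-child averaging
// to damp noise in the fitness evaluation.
public class Cel {
    static final Random rnd = new Random(1);

    static double[] evolve(double[] parent, int generations, double sigma,
                           ToDoubleFunction<double[]> fitness) {
        for (int g = 0; g < generations; g++) {
            double[] bestChild = null;
            double bestFit = Double.NEGATIVE_INFINITY;
            for (int c = 0; c < 10; c++) {                  // lambda = 10
                double[] child = parent.clone();
                for (int i = 0; i < child.length; i++)
                    child[i] += sigma * rnd.nextGaussian(); // fixed sigma
                double f = fitness.applyAsDouble(child);
                if (f > bestFit) { bestFit = f; bestChild = child; }
            }
            for (int i = 0; i < parent.length; i++)         // parent-child averaging
                parent[i] = 0.5 * (parent[i] + bestChild[i]);
        }
        return parent;
    }

    public static void main(String[] args) {
        // Toy fitness: negative distance to the target weights (3, -1).
        double[] w = evolve(new double[]{0, 0}, 200, 0.1,
                v -> -(Math.abs(v[0] - 3) + Math.abs(v[1] + 1)));
        System.out.printf("w = [%.2f, %.2f]%n", w[0], w[1]);
    }
}
```

Averaging halves each step, so progress is slower but far less sensitive to a lucky (noisy) fitness estimate.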

22
TDL Algorithm
Nearly as simple to apply as CEL:

  public interface TDLPlayer extends Player {
      void inGameUpdate(double[] prev, double[] next);
      void terminalUpdate(double[] prev, double tg);
  }

The reward signal is only given at the game's end. The initial alpha and the alpha cooling rate were tuned empirically.

23
TDL in Java
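A minimal sketch of what a TD(0) implementation of the TDLPlayer interface could look like for a weighted piece counter; the tanh squashing, fixed learning rate, and update details are assumptions, not taken from the slide:

```java
// TD(0) for a weighted piece counter: nudge the value of the previous
// state toward the value of the next state (or the game result).
public class WpcTdl {
    final double[] w = new double[64];
    double alpha = 0.01;                     // fixed learning rate (assumed)

    double value(double[] board) {
        double s = 0;
        for (int i = 0; i < 64; i++) s += w[i] * board[i];
        return Math.tanh(s);                 // squash into [-1, 1]
    }

    // in-game update: target is the value of the successor state
    public void inGameUpdate(double[] prev, double[] next) {
        update(prev, value(next));
    }

    // terminal update: target is the game result tg (e.g. +1 / 0 / -1)
    public void terminalUpdate(double[] prev, double tg) {
        update(prev, tg);
    }

    private void update(double[] prev, double target) {
        double v = value(prev);
        double delta = alpha * (target - v) * (1 - v * v); // tanh gradient
        for (int i = 0; i < 64; i++) w[i] += delta * prev[i];
    }

    public static void main(String[] args) {
        WpcTdl p = new WpcTdl();
        double[] board = new double[64];
        board[0] = 1;
        for (int k = 0; k < 100; k++) p.terminalUpdate(board, 1.0);
        System.out.println(p.value(board) > 0);  // value moved toward the win
    }
}
```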

24
CEL (1,10) v. Heuristic

25
TDL v. Random and Heuristic

26
TDL + CEL v. Heuristic (1 run)

27
Can we do better?
Enforce symmetry (this speeds up learning), and use a trusty old friend: the N-tuple system.

28
N-Tuple Systems
W. Bledsoe and I. Browning, "Pattern recognition and reading by machine", in Proceedings of the EJCC, December 1959.
Sample n-tuples of the input space and map the sampled values to memory indexes:
– Training: adjust the values stored there
– Recognition / play: sum over the stored values
Superfast. Related to:
– The kernel trick of the SVM (non-linear map to a high-dimensional space, then a linear model)
– Kanerva's sparse distributed memory model
– Buro's look-up tables (also similar)
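The sample-index-sum pipeline can be sketched as follows. The base-3 square coding (0 empty, 1 own, 2 opponent) and table sizing follow the usual n-tuple setup and are assumptions, not read off the slide:

```java
// N-tuple evaluation: each n-tuple is a fixed list of board squares;
// the square contents form a base-3 index into that tuple's look-up
// table, and the board value is the sum over all tuples.
public class NTupleEval {
    final int[][] tuples;   // tuples[t] = square indices sampled by tuple t
    final double[][] lut;   // lut[t] = value table of size 3^len(tuples[t])

    NTupleEval(int[][] tuples) {
        this.tuples = tuples;
        lut = new double[tuples.length][];
        for (int t = 0; t < tuples.length; t++)
            lut[t] = new double[(int) Math.pow(3, tuples[t].length)];
    }

    int index(int t, int[] board) {
        int idx = 0;
        for (int sq : tuples[t]) idx = idx * 3 + board[sq];  // base-3 code
        return idx;
    }

    double evaluate(int[] board) {
        double sum = 0;
        for (int t = 0; t < tuples.length; t++) sum += lut[t][index(t, board)];
        return sum;
    }

    public static void main(String[] args) {
        NTupleEval e = new NTupleEval(new int[][]{{0, 1}});  // one 2-tuple
        int[] board = new int[64];
        board[0] = 1; board[1] = 2;        // own piece, opponent piece
        e.lut[0][e.index(0, board)] = 0.7; // set the value of that pattern
        System.out.println(e.evaluate(board)); // 0.7
    }
}
```

Evaluation costs one table lookup per tuple, which is where "superfast" comes from.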

29
Symmetric N-Tuple Sampling
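Symmetric sampling can be sketched as replicating each n-tuple under the 8 symmetries of the board (4 rotations times reflection), with all 8 copies sharing one look-up table so every weight is trained 8 times as often. The index arithmetic below is a generic construction, not taken from the slide:

```java
// The 8 symmetries of an 8x8 board, generated by transpose and the
// two axis mirrors; expand() replicates one n-tuple under all of them.
public class BoardSymmetry {
    static final int N = 8;

    // square index -> its image under symmetry sym (0..7)
    static int transform(int sq, int sym) {
        int r = sq / N, c = sq % N;
        if ((sym & 4) != 0) { int t = r; r = c; c = t; }  // transpose
        if ((sym & 1) != 0) c = N - 1 - c;                // mirror columns
        if ((sym & 2) != 0) r = N - 1 - r;                // mirror rows
        return r * N + c;
    }

    static int[][] expand(int[] tuple) {
        int[][] out = new int[8][tuple.length];
        for (int s = 0; s < 8; s++)
            for (int i = 0; i < tuple.length; i++)
                out[s][i] = transform(tuple[i], s);
        return out;
    }

    public static void main(String[] args) {
        // The corner square maps onto the set of four corners.
        for (int s = 0; s < 8; s++)
            System.out.print(transform(0, s) + " ");
        System.out.println();
    }
}
```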

30
3-tuple Example

31
N-Tuple System Results
Results used 30 random n-tuples. The 'snakes' were created by random 6-step walks, with duplicate squares deleted. The system typically has around … weights. Trained with a simple update rule.

32
N-Tuple System (TDL): total games = 1250

33
Learned strategy…

34
Web-based League (snapshot before the CEC 2006 Competition)

35
Results versus CEC 2006 Champion (a manual EVO / TDL hybrid)

36
N-Tuple Summary
Stunning results compared to other game-learning architectures such as the MLP. How might this carry over to other problems? How easy are n-tuples to apply to other domains?

37
Screen Capture Mode: Ms Pac-Man Challenge

38
Robotic Car Racing

39
Conclusions
Games are great for CI research:
– Intellectually challenging
– Fun to work with
Agent learning for games is still a black art. Small details can make big differences (which inputs to use), and big details too (N-tuple versus MLP)!
Grand challenge: how can we design more efficient game learners? EvoTDL hybrids are the way forward.

40
CIG 2008: Perth, WA
