
Slide 1: Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan
University of Virginia, Department of Computer Science

Slide 2: Simulated Soccer
- How does an agent decide what to do with the ball?
- Complexities: continuous inputs, high dimensionality

Slide 3: Reinforcement Learning (RL)
- Learning to associate utility values with state-action pairs
- The agent incrementally updates the value of each state-action pair based on its interaction with the environment (Russell & Norvig)
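The incremental update the slide describes can be sketched as a tabular value-learning step; the learning rate, discount factor, and state/action names below are illustrative assumptions, not values from the slides.

```python
# Minimal sketch of tabular utility learning: nudge Q(s, a) toward the
# observed reward plus the discounted estimate of the successor pair.
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed for illustration)
GAMMA = 0.9   # discount factor (assumed for illustration)

Q = defaultdict(float)  # maps (state, action) -> utility estimate

def update(state, action, reward, next_state, next_action):
    """One incremental update from a single interaction with the environment."""
    target = reward + GAMMA * Q[(next_state, next_action)]
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# Example interaction: a reward of 1.0 moves the estimate up from 0.
update("s0", "kick", 1.0, "s1", "dribble")
```

With everything initialized to zero, one update moves Q("s0", "kick") from 0 to 0.1, a small step toward the observed return.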

Slide 4: Problems
- The state space explodes exponentially with dimensionality
- Current methods of managing the state-space explosion lack automation
- RL does not scale well to problems with the complexities of simulated soccer…

Slide 5: Quantization
- Divide the state space into regions of interest
- Tile coding (Sutton & Barto, 1998): no automated method for choosing the regions' granularity, heterogeneity, or location
- Prefer a learned abstraction of the state space
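Tile coding, the hand-designed quantization the slide contrasts with a learned abstraction, can be sketched as follows; the tiling count, resolution, and input range are illustrative assumptions.

```python
# Minimal sketch of tile coding (Sutton & Barto): several slightly offset
# uniform grids ("tilings") each map a continuous point to one active tile,
# so nearby points share many features while distant points share few.

def tile_indices(x, y, n_tilings=4, tiles_per_dim=8, low=0.0, high=1.0):
    """Return the active tile (tiling, ix, iy) in each tiling for a 2-D point."""
    width = (high - low) / tiles_per_dim
    active = []
    for t in range(n_tilings):
        offset = t * width / n_tilings           # each tiling is shifted slightly
        ix = int((x - low + offset) / width)
        iy = int((y - low + offset) / width)
        # offset tilings need one extra tile at the upper edge; clamp there
        active.append((t, min(ix, tiles_per_dim), min(iy, tiles_per_dim)))
    return active

features = tile_indices(0.52, 0.13)
```

The granularity (`tiles_per_dim`) and placement (`offset`) are fixed by hand here, which is exactly the lack of automation the slide objects to.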

Slide 6: Kohonen Networks
- A clustering algorithm: data-driven, inducing a Voronoi diagram over the state space
- Example cluster: agent near opponent goal, teammate near opponent goal, no nearby opponents
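A single Kohonen (self-organizing map) training step can be sketched as below; the 1-D grid size, learning rate, and neighborhood radius are illustrative assumptions, not parameters from the slides.

```python
# Minimal sketch of a Kohonen network update: find the best-matching unit
# (BMU) for an input, then pull it and its grid neighbors toward the input.
# After training, each unit's weight vector is a cluster prototype, and the
# set of prototypes induces a Voronoi partition of the input space.
import math
import random

random.seed(0)
N_UNITS = 10   # units on a 1-D grid (assumed size)
DIM = 2        # toy 2-D inputs
weights = [[random.random() for _ in range(DIM)] for _ in range(N_UNITS)]

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def som_step(x, lr=0.5, radius=1.0):
    """One training step: move the winner and its grid neighbors toward x."""
    bmu = min(range(N_UNITS), key=lambda i: dist2(weights[i], x))
    for i in range(N_UNITS):
        h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))  # neighborhood weight
        for d in range(DIM):
            weights[i][d] += lr * h * (x[d] - weights[i][d])
    return bmu

bmu = som_step([0.9, 0.1])
```

Because the update is driven entirely by training samples, the resulting regions are data-driven rather than hand-placed.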

Slide 7: State Space Reduction
- 90 continuous-valued inputs describe the state of a soccer game
- Naive discretization: 2^90 states
- Filtering out unnecessary inputs: still 2^18 states
- Clustering algorithm: only 5000 states. Big win!
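The slide's back-of-envelope arithmetic can be checked directly (assuming a crude two-bin discretization per input, as the powers of two suggest):

```python
# State counts from the slide: 90 binary-discretized inputs, 18 retained
# inputs after filtering, and roughly 5000 learned clusters.
naive = 2 ** 90        # every input discretized to 2 bins
filtered = 2 ** 18     # after dropping unnecessary inputs: 262,144 states
clustered = 5000       # states after Kohonen clustering
reduction = filtered / clustered   # roughly a 52x further reduction
```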

Slide 8: Two-Pass Algorithm
- Pass 1: use a Kohonen network and a large training set to learn the state space
- Pass 2: use reinforcement learning (SARSA) to learn utilities for the states
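A compact sketch of how the two passes fit together, with made-up numbers: the prototype list stands in for Pass 1's trained Kohonen network, and Pass 2 runs SARSA over the discrete cluster indices. Prototypes, actions, and rates are all illustrative assumptions.

```python
# Pass 1 output (pretend, 1-D for brevity): learned cluster prototypes.
prototypes = [0.1, 0.5, 0.9]
ACTIONS = ["pass", "shoot"]
ALPHA, GAMMA = 0.1, 0.9
Q = {(s, a): 0.0 for s in range(len(prototypes)) for a in ACTIONS}

def quantize(x):
    """Pass 1 at run time: map a continuous observation to its nearest prototype."""
    return min(range(len(prototypes)), key=lambda i: abs(prototypes[i] - x))

def sarsa_update(s, a, r, s2, a2):
    """Pass 2: on-policy SARSA update over discrete cluster indices."""
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])

# One on-policy transition through the quantized state space.
s, a = quantize(0.48), "shoot"
s2, a2 = quantize(0.85), "pass"
sarsa_update(s, a, 1.0, s2, a2)
```

Separating the passes means SARSA only ever sees a few thousand discrete states, regardless of how many continuous inputs Pass 1 consumed.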

Slide 9: Fragility of Learned Actions
- What happens to the attacker's utility if the goalie crosses the dotted line?

Slide 10: Unresolved Issues
- Increased generalization leads to frequency aliasing, which becomes a sampling problem
- Few samples vs. many samples (example: a Riemann sum)

Slide 11: Aliasing & Sampling
- The utility function is not band-limited, so how can we sample to reduce error?
- Uniformly increase the sampling rate? (not the best idea)
- Adaptively super-sample?
- Choose sample points based on special criteria?
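The Riemann-sum analogy from the previous slide can be made concrete: approximating an integral with few versus many uniform samples shows how the sampling rate controls error. The integrand and sample counts are illustrative.

```python
# Midpoint Riemann sum: more uniform samples shrink the approximation
# error, which is the sampling-rate trade-off the slide raises for the
# (non-band-limited) utility function.

def riemann(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b] with n samples."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x * x              # true integral over [0, 1] is 1/3
few = abs(riemann(f, 0.0, 1.0, 4) - 1 / 3)
many = abs(riemann(f, 0.0, 1.0, 256) - 1 / 3)
```

Uniformly raising `n` works here, but for a utility function that is expensive to sample and not band-limited, the slide's adaptive options are more attractive.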

Slide 12: Forcing Functions
- Use a forcing function to sample an action in a state only when the action is likely to be effective (valleys are ignored)
- This reduces the variance in the experienced reward for each state-action pair
- How do we create such a forcing function?
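One way to picture the idea is as a predicate that gates which actions get sampled in a state; the predicate and threshold below are hypothetical stand-ins, since the slides leave open how a real forcing function is built.

```python
# Hedged sketch of the forcing-function idea: only sample (and thus only
# gather reward experience for) an action in states where a predicate says
# it is likely to be effective, ignoring the "valleys".

def shot_likely_effective(state):
    """Hypothetical forcing function: only shoot near the opponent goal."""
    return state["dist_to_goal"] < 20.0

def candidate_actions(state, forcing=shot_likely_effective):
    """Gate 'shoot' behind the forcing function; always allow 'pass'."""
    actions = ["pass"]
    if forcing(state):
        actions.append("shoot")
    return actions

near = candidate_actions({"dist_to_goal": 8.0})
far = candidate_actions({"dist_to_goal": 45.0})
```

Because hopeless shots are never sampled, the rewards observed for the "shoot" pair come only from plausible attempts, which lowers their variance.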

Slide 13: Results
- Three systems evaluated: control (random action selection), SARSA, and forcing function
- Evaluation criteria: goals scored and time of possession

Slide 14: Cumulative Score

Slide 15: Time of Possession

Slide 16: Team with Forcing Functions

Slide 17: With Forcing vs. Without

Slide 18: Summary
- A two-pass learning algorithm for simulated soccer
- State-space abstraction is automated and data-driven
- Improved the state of the art for simulated soccer

Slide 19: Future Work
- A learned distance metric
- Additional automation in the process
- Better generalization
