
Slide 1: Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan, University of Virginia, Department of Computer Science

Slide 2: Simulated Soccer
How does an agent decide what to do with the ball?
Complexities: continuous inputs, high dimensionality

Slide 3: Reinforcement Learning (RL)
Learning to associate utility values with state-action pairs.
The agent incrementally updates the value of each state-action pair based on interaction with the environment (Russell & Norvig).

Slide 4: Problems
The state space explodes exponentially with dimensionality.
Current methods of managing the state-space explosion lack automation.
RL does not scale well to problems with the complexities of simulated soccer.

Slide 5: Quantization
Divide the state space into regions of interest.
Tile coding (Sutton & Barto, 1998): no automated method for choosing region granularity, heterogeneity, or location.
Prefer a learned abstraction of the state space.
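To make the criticism concrete, here is a minimal sketch of tile coding with a single fixed tiling; the `tile_size` and field `width` values are illustrative choices, not from the deck — and choosing them by hand is exactly the manual step the slide objects to.

```python
def tile_index(x, y, tile_size=10.0, width=100.0):
    """Map a continuous (x, y) position to a fixed grid tile index.

    The granularity (tile_size) and extent (width) must be chosen by
    hand; nothing here adapts the tiling to the data.
    """
    tiles_per_row = int(width / tile_size)
    col = min(int(x / tile_size), tiles_per_row - 1)
    row = min(int(y / tile_size), tiles_per_row - 1)
    return row * tiles_per_row + col
```

Every continuous position collapses to one of `tiles_per_row ** 2` discrete states, regardless of where the interesting structure in the data actually lies.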

Slide 6: Kohonen Networks
A data-driven clustering algorithm; the learned prototypes induce a Voronoi diagram over the state space.
Example cluster: agent near opponent goal, teammate near opponent goal, no nearby opponents.
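The deck does not give the network's details, so the following is only a minimal winner-take-all sketch of Kohonen-style training (a full self-organizing map also pulls the winner's neighbors toward each sample); the unit count, epochs, and learning-rate schedule are illustrative.

```python
import random

def train_kohonen(data, n_units=50, epochs=20, lr0=0.5, seed=0):
    """Train a simple winner-take-all Kohonen layer.

    Each unit's weight vector becomes a cluster prototype; the
    nearest-prototype partition they induce is a Voronoi diagram.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # initialize prototypes from random training samples
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # decaying learning rate
        for x in data:
            # find the best-matching unit (the "winner")
            win = min(range(n_units),
                      key=lambda i: sum((units[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            # move the winner toward the sample
            for d in range(dim):
                units[win][d] += lr * (x[d] - units[win][d])
    return units

def quantize(x, units):
    """Map a continuous state to the index of its nearest prototype."""
    return min(range(len(units)),
               key=lambda i: sum((units[i][d] - x[d]) ** 2
                                 for d in range(len(x))))
```

After training, `quantize` is the abstraction: every continuous game state is replaced by one of `n_units` discrete cluster indices.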

Slide 7: State Space Reduction
90 continuous-valued inputs describe the state of a soccer game.
Naive discretization: 2^90 states.
Filtering out unnecessary inputs: still 2^18 states.
Clustering algorithm: only 5000 states. Big win!
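The slide's counts can be checked directly (assuming, as the slide implies, one bit per discretized input):

```python
# State-space sizes under each scheme from the slide.
naive = 2 ** 90      # naive binary discretization of 90 inputs
filtered = 2 ** 18   # after filtering down to 18 inputs
clustered = 5000     # Kohonen cluster prototypes

print(naive)                  # 1237940039285380274899124224 (~1.2e27)
print(filtered)               # 262144
print(filtered // clustered)  # 52: each prototype covers ~52 filtered states
```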

Slide 8: Two-Pass Algorithm
Pass 1: use a Kohonen network and a large training set to learn the state space.
Pass 2: use reinforcement learning (SARSA) to learn utilities for the states.
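Pass 2's update rule is standard SARSA; a tabular sketch is below, where states are the cluster indices produced in pass 1 (so the table stays small, e.g. 5000 states times the number of actions). The step size and discount values are illustrative.

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9):
    """One SARSA step: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)).

    s and s2 are discrete cluster indices from the Kohonen pass;
    a2 is the action actually chosen in s2 (on-policy).
    """
    td_error = r + gamma * Q[(s2, a2)] - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return Q[(s, a)]

# usage: Q = defaultdict(float), then call sarsa_update after each step
```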

Slide 9: Fragility of Learned Actions
What happens to the attacker's utility if the goalie crosses the dotted line?

Slide 10: Unresolved Issues
Increased generalization leads to frequency aliasing; this becomes a sampling problem.
(Figure: few samples vs. many samples; example: Riemann sum.)

Slide 11: Aliasing & Sampling
The utility function is not band-limited. How can we sample to reduce error?
Uniformly increase the sampling rate? (not the best idea)
Adaptively super-sample?
Choose sample points based on special criteria?

Slide 12: Forcing Functions
Use a forcing function to sample an action in a state only when it is likely to be effective (valleys are ignored).
This reduces variance in the experienced reward for each state-action pair.
How do we create such a forcing function?
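The deck never defines the forcing function itself, so this is only a hypothetical sketch of how one could gate action selection: `forcing` is a caller-supplied predicate that is true on the "peaks" (states where the action is likely effective) and false in the valleys the slide says are ignored.

```python
def forced_action(state, actions, Q, forcing):
    """Choose greedily among the actions the forcing function allows.

    forcing(state, action) -> bool is a hypothetical predicate; actions
    it rejects are never sampled, so their noisy rewards never enter Q.
    """
    allowed = [a for a in actions if forcing(state, a)]
    if not allowed:      # nothing promising: fall back to all actions
        allowed = actions
    return max(allowed, key=lambda a: Q.get((state, a), 0.0))
```

Because rejected actions are never tried, the rewards recorded for each sampled state-action pair come only from situations where the action had a real chance of working, which is the variance reduction the slide describes.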

Slide 13: Results
Three systems evaluated: control (random action selection), SARSA, forcing function.
Evaluation criteria: goals scored and time of possession.

Slide 14: Cumulative Score (results chart)

Slide 15: Time of Possession (results chart)

Slide 16: Team with Forcing Functions

Slide 17: With Forcing vs. Without

Slide 18: Summary
A two-pass learning algorithm for simulated soccer.
State-space abstraction is automated by a data-driven technique.
Improved the state of the art for simulated soccer.

Slide 19: Future Work
Learned distance metric, additional automation in the process, better generalization.
