Reinforcement Learning in Simulated Soccer with Kohonen Networks Chris White and David Brogan University of Virginia Department of Computer Science.

Simulated Soccer: How does an agent decide what to do with the ball? Complexities: continuous inputs, high dimensionality.

Reinforcement Learning (RL): Learning to associate utility values with state-action pairs. The agent incrementally updates the value associated with each state-action pair based on its interaction with the environment (Russell & Norvig).

Problems: The state space grows exponentially with its dimensionality. Current methods of managing state space explosion lack automation. RL does not scale well to problems with the complexities of simulated soccer…

Quantization: Divide the state space into regions of interest. Tile coding (Sutton & Barto, 1998). No automated method for choosing the regions' granularity, heterogeneity, or location. Prefer a learned abstraction of the state space.
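
To make the tile-coding point concrete, here is a minimal sketch of tile coding over a two-dimensional input. The function name `tile_indices` and the parameters `num_tilings` and `tiles_per_dim` are illustrative assumptions, not taken from the slides. Note that the granularity has to be picked by hand, which is the lack of automation the slide refers to.

```python
def tile_indices(x, y, num_tilings=4, tiles_per_dim=8, lo=0.0, hi=1.0):
    """Return one active tile index per tiling for a point in [lo, hi]^2.

    Hypothetical helper: the number of tilings and tiles per dimension
    are hand-chosen, which is exactly what tile coding does not automate.
    """
    width = (hi - lo) / tiles_per_dim
    side = tiles_per_dim + 1  # one extra tile per axis absorbs the offset
    active = []
    for t in range(num_tilings):
        # Each tiling is shifted by a fraction of one tile width.
        offset = t * width / num_tilings
        col = min(int((x - lo + offset) / width), side - 1)
        row = min(int((y - lo + offset) / width), side - 1)
        active.append(t * side * side + row * side + col)
    return active


# Two nearby points share some tiles, which is how tile coding generalizes.
a = tile_indices(0.50, 0.50)
b = tile_indices(0.51, 0.50)
shared = set(a) & set(b)
```

Every tiling uses the same uniform grid here, so the resolution is identical everywhere in the space regardless of where the interesting states actually are.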

Kohonen Networks: Clustering algorithm. Data driven. Voronoi diagram (figure; example regions: agent near opponent goal, teammate near opponent goal, no nearby opponents).
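
As a rough illustration of how a Kohonen network clusters data, the sketch below trains a small self-organizing map in pure Python. The 1-D chain of units, the Gaussian neighbourhood, the linear decay schedules, and the toy two-cluster data are all assumptions made for this example; the slides do not specify the network used in the work.

```python
import math
import random


def nearest_unit(units, x):
    """Index of the unit whose weight vector is closest to x (its Voronoi cell)."""
    return min(range(len(units)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(units[j], x)))


def train_som(samples, num_units=10, epochs=20, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D Kohonen map; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each unit from a randomly chosen training sample.
    units = [list(rng.choice(samples)) for _ in range(num_units)]
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = nearest_unit(units, x)               # best-matching unit
            for j, w in enumerate(units):
                # Units close to the winner on the chain move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return units


# Toy data: two clusters in the unit square (hypothetical, not soccer data).
data_rng = random.Random(1)
samples = [(data_rng.uniform(0.05, 0.15), data_rng.uniform(0.05, 0.15)) for _ in range(50)]
samples += [(data_rng.uniform(0.85, 0.95), data_rng.uniform(0.85, 0.95)) for _ in range(50)]
units = train_som(samples)
cell = nearest_unit(units, (0.1, 0.1))  # discrete state for a continuous input
```

After training, `nearest_unit` maps any continuous input to the index of its closest unit, and the set of inputs assigned to each unit forms one cell of the Voronoi diagram mentioned on the slide.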

State Space Reduction: 90 continuous-valued inputs describe the state of a soccer game. Naïve discretization → 2^90 states. Filtering out unnecessary inputs → still 2^18 states. Clustering algorithm → only 5000 states. Big win!
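
The arithmetic behind these counts can be checked directly. The sketch assumes the 2^90 and 2^18 figures come from splitting each input into two bins, which the slide implies but does not state.

```python
# Assumption: each continuous input is discretized into 2 bins.
bins_per_input = 2

naive_states = bins_per_input ** 90     # all 90 inputs, roughly 1.24e27
filtered_states = bins_per_input ** 18  # 18 inputs kept after filtering
clustered_states = 5000                 # number of learned clusters

reduction = filtered_states / clustered_states
print(f"naive:     {naive_states:.3e}")
print(f"filtered:  {filtered_states}")
print(f"clustered: {clustered_states} ({reduction:.1f}x fewer than filtered)")
```

Even the filtered table has 262,144 states, about 52 times more than the clustered representation.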

Two-Pass Algorithm: Pass 1: Use a Kohonen network and a large training set to learn the state space. Pass 2: Use reinforcement learning (SARSA) to learn utilities for the states.
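
The two passes can be sketched as follows, assuming pass 1 has already produced a list of cluster prototypes. The standard tabular SARSA update is used for pass 2; the action names, the values of `alpha` and `gamma`, and the helper names are hypothetical and not taken from the slides.

```python
import random
from collections import defaultdict

ACTIONS = ["shoot", "pass", "dribble"]  # hypothetical action set


def nearest_cluster(prototypes, state):
    """Pass 1 output in use: map a continuous state to its nearest prototype index."""
    return min(range(len(prototypes)),
               key=lambda j: sum((p - v) ** 2 for p, v in zip(prototypes[j], state)))


def epsilon_greedy(q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(s, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Pass 2: standard SARSA update on the discrete (cluster, action) table."""
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])


# One illustrative transition with stand-in prototypes and a made-up reward.
rng = random.Random(0)
prototypes = [(0.0, 0.0), (1.0, 1.0)]
q = defaultdict(float)
s = nearest_cluster(prototypes, (0.2, 0.1))
a = "shoot"
s_next = nearest_cluster(prototypes, (0.9, 0.8))
a_next = epsilon_greedy(q, s_next, 0.1, rng)
sarsa_update(q, s, a, 1.0, s_next, a_next)
```

Because the table is indexed by cluster rather than by raw input, its size is the number of clusters times the number of actions, independent of the input dimensionality.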

Fragility of Learned Actions: What happens to the attacker's utility if the goalie crosses the dotted line?

Unresolved Issues: Increased generalization leads to frequency aliasing… This becomes a sampling problem… Example: Riemann sum, few samples vs. many samples.
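
The Riemann sum analogy can be made concrete with a small sketch. The integrand sin(20x) is an arbitrary stand-in for a rapidly varying utility function and is an assumption of this example, not something from the slides.

```python
import math


def midpoint_riemann(f, a, b, n):
    """Approximate the integral of f over [a, b] with n midpoint samples."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f(x):
    # Hypothetical fast-varying function standing in for a utility surface.
    return math.sin(20.0 * x)


exact = (1.0 - math.cos(20.0)) / 20.0  # closed-form integral over [0, 1]
err_few = abs(midpoint_riemann(f, 0.0, 1.0, 4) - exact)
err_many = abs(midpoint_riemann(f, 0.0, 1.0, 400) - exact)
print(f"error with   4 samples: {err_few:.6f}")
print(f"error with 400 samples: {err_many:.6f}")
```

With only a few samples the estimate misses most of the oscillation, in the same way that a coarse state abstraction averages over states whose utilities differ sharply.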

Aliasing & Sampling: The utility function is not band limited. How can we sample to reduce error? Uniformly increase the sampling rate (not the best idea)? Adaptively supersample? Choose sample points based on special criteria?

Forcing Functions: Use a forcing function to sample an action in a state only when it is likely to be effective (valleys are ignored). This reduces the variance in experienced reward for a state-action pair. How do we create such a forcing function?

Results: Evaluate three systems: Control (random action selection), SARSA, and Forcing Function. Evaluation criteria: goals scored and time of possession.

Cumulative Score

Time of Possession

Team with Forcing Functions

With Forcing vs. Without

Summary: Two-pass learning algorithm for simulated soccer. State space abstraction is automated. Data-driven technique. Improved state of the art for simulated soccer.

Future Work: Learned distance metric. Additional automation in the process. Better generalization.
