Reinforcement Learning in Simulated Soccer with Kohonen Networks
Chris White and David Brogan
University of Virginia, Department of Computer Science

Simulated Soccer
- How does the agent decide what to do with the ball?
- Complexities: continuous inputs, high dimensionality

Reinforcement Learning (RL)
- Learning to associate utility values with state-action pairs
- The agent incrementally updates the value associated with each state-action pair based on its interaction with the environment (Russell & Norvig)

Problems
- The state space explodes exponentially with dimensionality
- Current methods of managing the state space explosion lack automation
- RL does not scale well to problems with the complexities of simulated soccer

Quantization
- Divide the state space into regions of interest
- Tile coding (Sutton & Barto, 1998): no automated method for choosing region granularity, heterogeneity, or location
- Prefer a learned abstraction of the state space

Kohonen Networks
- Clustering algorithm
- Data-driven
- Partitions the state space into a Voronoi diagram
- Example cluster: agent near opponent goal, teammate near opponent goal, no nearby opponents
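
The slides do not show the network itself; the following is a minimal sketch of how a Kohonen network (self-organizing map) could quantize continuous soccer-state vectors into clusters. All names and hyperparameters here (train_kohonen, n_clusters, lr0, epochs) are illustrative assumptions, not details from the presentation.

```python
import numpy as np

def train_kohonen(samples, n_clusters=5000, epochs=10, lr0=0.5, seed=0):
    """Sketch of a 1-D Kohonen map: learn prototype vectors that partition
    continuous state vectors (e.g. 90-D soccer states) into clusters."""
    rng = np.random.default_rng(seed)
    # Initialize prototypes from random training samples (assumes len(samples) >= n_clusters).
    protos = samples[rng.choice(len(samples), n_clusters, replace=False)].copy()
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)                                 # decaying learning rate
        radius = max(1, int(0.01 * n_clusters * (1.0 - epoch / epochs)))  # shrinking neighborhood
        for x in samples[rng.permutation(len(samples))]:
            winner = int(np.argmin(np.linalg.norm(protos - x, axis=1)))
            lo, hi = max(0, winner - radius), min(n_clusters, winner + radius + 1)
            protos[lo:hi] += lr * (x - protos[lo:hi])   # pull winner and its neighbors toward x
    return protos

def quantize(protos, x):
    """Map a continuous state vector to the index of its nearest prototype."""
    return int(np.argmin(np.linalg.norm(protos - x, axis=1)))
```

The index of the winning prototype then serves as the discrete state id handed to the reinforcement learner in the second pass.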

State Space Reduction
- 90 continuous-valued inputs describe the state of a soccer game
- Naïve discretization → 2^90 states
- Filter out unnecessary inputs → still 2^18 states
- Clustering algorithm → only 5000 states
- Big win!

Two-Pass Algorithm
- Pass 1: Use a Kohonen network and a large training set to learn the state space
- Pass 2: Use reinforcement learning to learn utilities for the states (SARSA)
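
A minimal sketch of the second pass, assuming the trained Kohonen network supplies a quantize(obs) -> cluster-id function and that the environment exposes a simple reset/step interface; the interface names and hyperparameters are assumptions, not details from the slides.

```python
import numpy as np
from collections import defaultdict

def run_sarsa(env, quantize, n_actions, episodes=1000,
              alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular SARSA over Kohonen-cluster states (hypothetical env interface)."""
    Q = defaultdict(lambda: np.zeros(n_actions))

    def policy(s):
        if np.random.rand() < epsilon:
            return int(np.random.randint(n_actions))   # explore
        return int(np.argmax(Q[s]))                    # exploit

    for _ in range(episodes):
        s = quantize(env.reset())
        a = policy(s)
        done = False
        while not done:
            obs, reward, done = env.step(a)
            s2 = quantize(obs)
            a2 = policy(s2)
            # On-policy TD update: the target uses the action actually taken next.
            Q[s][a] += alpha * (reward + gamma * Q[s2][a2] * (not done) - Q[s][a])
            s, a = s2, a2
    return Q
```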

Fragility of Learned Actions
- What happens to the attacker's utility if the goalie crosses the dotted line (in the slide's field diagram)?

Unresolved Issues
- Increased generalization leads to frequency aliasing
- This becomes a sampling problem
- (Figure: Riemann sum example comparing few samples vs. many samples)
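
To make the sampling analogy concrete, here is a small, purely illustrative sketch showing how a left Riemann sum's error shrinks as the number of sample points grows; the integrand is arbitrary and not taken from the slides.

```python
import numpy as np

def left_riemann_sum(f, a, b, n):
    """Approximate the integral of f over [a, b] with n uniform left samples."""
    xs = np.linspace(a, b, n, endpoint=False)
    return float(np.sum(f(xs)) * (b - a) / n)

f = lambda x: x * np.sin(5 * x)
exact = np.pi / 5                       # integral of x*sin(5x) over [0, pi]
for n in (4, 16, 256):
    approx = left_riemann_sum(f, 0.0, np.pi, n)
    print(f"n={n:>3}  approx={approx: .4f}  error={abs(approx - exact):.4f}")
```

With only a few samples the estimate misses most of the function's structure; more samples drive the error down, which is exactly the tension the next slide addresses.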

Aliasing & Sampling
- The utility function is not band-limited
- How can we sample to reduce error?
  - Uniformly increase the sampling rate? (not the best idea)
  - Adaptively super-sample?
  - Choose sample points based on special criteria?

Forcing Functions
- Use a forcing function to sample an action in a state only when it is likely to be effective (valleys are ignored)
- Reduces variance in the experienced reward for a state-action pair
- How do we create such a forcing function?
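
The slides do not specify the forcing function's form; the sketch below shows one way such a gate could be wired into action selection, with forcing(state, action) -> bool as a hypothetical placeholder predicate.

```python
import numpy as np

def gated_action(Q, state, forcing, n_actions, epsilon=0.1):
    """Sample only actions the (hypothetical) forcing function marks as likely
    to be effective in this state, so low-utility 'valleys' are rarely sampled.
    Falling back to all actions when nothing passes the gate is an added assumption."""
    candidates = [a for a in range(n_actions) if forcing(state, a)]
    if not candidates:
        candidates = list(range(n_actions))
    if np.random.rand() < epsilon:
        return int(np.random.choice(candidates))           # explore within the gate
    return max(candidates, key=lambda a: Q[state][a])      # exploit within the gate
```

Because gated actions are tried only where they are expected to work, the rewards observed for each state-action pair cluster more tightly, which is the variance reduction the slide describes.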

Results
Evaluate three systems:
- Control: random action selection
- SARSA
- Forcing function
Evaluation criteria:
- Goals scored
- Time of possession

Cumulative Score

Time of Possession

Team with Forcing Functions

With Forcing vs. Without

Summary
- Two-pass learning algorithm for simulated soccer
- State space abstraction is automated
- Data-driven technique
- Improved the state of the art for simulated soccer

Future Work
- Learned distance metric
- Additional automation in the process
- Better generalization