Soar-RL Discussion: Future Directions, Open Questions, and Why You Should Use It (31st Soar Workshop)

Presentation transcript:

Soar-RL Discussion: Future Directions, Open Questions, and Why You Should Use It
31st Soar Workshop

Soar-RL Today
Robust framework for integrated, online RL
Big features:
– Hierarchical RL
– Internal actions
– Architectural constraints (generalization, time)
Active research pushing in those directions

Soar-RL Tomorrow
Future research directions
– Additional architectural constraints
– Fundamental advances in RL
Increased adoption
– Usage cases
– Understanding barriers to use

Why You Should Use Soar-RL
Changing environments
Optimized policies over environment actions
Balanced exploration and exploitation
Why I would use Soar-RL if I were you

Non-stationary Environments
Dynamic environment regularities:
– Adversarial opponent in a game
– Weather or seasons in a simulated world
– Variations between simulated and embodied tasks
– Limited transfer learning, or tracking
Agents that persist for long periods of time

Pragmatic Soar-RL
[Figure: operator hierarchy – Get-all-blocks, Go-to-room, Plan-path (op1, op2, op3), Drive-to-gateway (Turn-left, Turn-right, Go-forward); see the sketch below]
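To make the hierarchy concrete, here is a minimal, hypothetical sketch of hierarchical action selection over the operators named in the figure. The dictionary layout, the epsilon value, and the selection function are illustrative only; Soar-RL actually stores such values as numeric preferences on proposed operators in substates, not in a Python table.

```python
import random

# Hypothetical task hierarchy taken from the slide's figure.
HIERARCHY = {
    "get-all-blocks": ["go-to-room"],
    "go-to-room": ["plan-path", "drive-to-gateway"],
    "plan-path": ["op1", "op2", "op3"],
    "drive-to-gateway": ["turn-left", "turn-right", "go-forward"],
}

# Each parent task keeps its own Q-values over its children.
Q = {task: {child: 0.0 for child in children}
     for task, children in HIERARCHY.items()}

def select(task: str, epsilon: float = 0.1) -> str:
    """Epsilon-greedy choice among a task's children."""
    children = HIERARCHY[task]
    if random.random() < epsilon:
        return random.choice(children)
    return max(children, key=lambda c: Q[task][c])

# Descend the hierarchy until a primitive action is reached.
task = "get-all-blocks"
while task in HIERARCHY:
    task = select(task)
print("primitive action:", task)
```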

Optimizing External Actions

Balanced Exploration & Exploitation
Stationary, but stochastic actions
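A minimal sketch of the two standard ways to balance exploration against exploitation over learned Q-values, epsilon-greedy and Boltzmann (softmax) selection; Soar's indifferent-selection mechanism offers policies of this kind. The action names and values below are made up for illustration.

```python
import math
import random

def epsilon_greedy(q_values: dict, epsilon: float = 0.1) -> str:
    """With probability epsilon pick a random action, otherwise the best one."""
    if random.random() < epsilon:
        return random.choice(list(q_values))
    return max(q_values, key=q_values.get)

def boltzmann(q_values: dict, temperature: float = 0.5) -> str:
    """Sample an action with probability proportional to exp(Q / temperature)."""
    actions = list(q_values)
    weights = [math.exp(q_values[a] / temperature) for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

q = {"move-forward": 0.4, "move-left": 0.1, "move-right": -0.2}
print(epsilon_greedy(q), boltzmann(q))
```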

Practical Reasons
Programmer time is expensive
Learn in simulation, near-transfer to embodied agent
If hand-coded behavior can guarantee doctrine, then so can RL behavior
– Mix symbolic and adaptive behaviors

Biggest Barrier to Using Soar-RL

Other Barriers to Adoption?

I Know What You’re Thinking

Future Directions for Soar-RL
Parameter-free framework
Issues of function approximation
Scaling to general intelligence
MDP characterization

Parameter-free Framework
Want fewer free parameters
– Less developer time finding best settings
– Stronger architectural commitments
Parameters are set initially, and can evolve over time
[Figure: Policies – exploration, gap handling, learning algorithm; Knobs – learning rate, exploration rate, initial values, eligibility decay (see the sketch below)]
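To make the knobs concrete, here is a hedged sketch of a plain tabular SARSA(lambda) learner exposing the four parameters named on the slide. The class, parameter names, and default values are illustrative, not Soar-RL's actual interface.

```python
from collections import defaultdict
import random

class SarsaLambda:
    def __init__(self, actions, learning_rate=0.3, exploration_rate=0.1,
                 initial_value=0.0, eligibility_decay=0.9, discount=0.9):
        self.actions = actions
        self.alpha = learning_rate                   # knob 1: learning rate
        self.epsilon = exploration_rate              # knob 2: exploration rate
        self.q = defaultdict(lambda: initial_value)  # knob 3: initial values
        self.lam = eligibility_decay                 # knob 4: eligibility trace decay
        self.gamma = discount
        self.traces = defaultdict(float)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, reward, s_next, a_next):
        """One SARSA(lambda) step: apply the TD error to every eligible pair."""
        delta = reward + self.gamma * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.traces[(s, a)] += 1.0
        for key in list(self.traces):
            self.q[key] += self.alpha * delta * self.traces[key]
            self.traces[key] *= self.gamma * self.lam
```

Collapsing or fixing any of these knobs architecturally is exactly the "parameter-free" direction the slide argues for.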

Soar-RL Value
Q-value: expected future reward after taking action a in state s
[Table: state / action / Q-value rows, with actions such as move-forward and move-left in successive states]
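Written out, this is the standard definition of the action-value function under a policy pi (nothing Soar-specific):

```latex
Q^{\pi}(s,a) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\; a_t = a\right]
```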

Soar-RL Value Function Approximation
Typically agent WM has lots of knowledge
– Some knowledge irrelevant to RL state
– Generalizing over relevant knowledge can improve learning performance
Soar-RL factorization is non-trivial
– Independent rules, but dependent features
– Linear combination of rules & values (sketched below)
– Rules that fire for more than one operator proposal
Soar-RL is a good framework for exploration
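A hedged sketch of what "linear combination of rules & values" means: each RL rule that matches the current operator proposal contributes its stored value, and the temporal-difference error is shared among the contributing rules. The rule names and the equal-split update below are illustrative assumptions; Soar-RL's actual crediting scheme may differ.

```python
# Illustrative RL-rule values (names are made up).
rule_values = {
    "rl*move-forward*clear-ahead": 0.2,
    "rl*move-forward*goal-north": 0.5,
    "rl*move-left*blocked-ahead": -0.1,
}

def operator_value(matching_rules):
    """Q(s, op) as the sum of the values of all RL rules matching the proposal."""
    return sum(rule_values[r] for r in matching_rules)

def td_update(matching_rules, reward, next_value, alpha=0.3, gamma=0.9):
    """Distribute one TD error across the rules that produced the value."""
    delta = reward + gamma * next_value - operator_value(matching_rules)
    share = alpha * delta / len(matching_rules)  # equal split (an assumption)
    for r in matching_rules:
        rule_values[r] += share

fired = ["rl*move-forward*clear-ahead", "rl*move-forward*goal-north"]
td_update(fired, reward=1.0, next_value=0.0)
print(rule_values)
```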

Scaling to General Intelligence
Actively extending Soar to long-term agents
– Scaling long-term memories
– Long runs with chunking
Can RL contribute to long-term agents?
– Intrinsic motivation
– Origin of RL rules
– Correct factorization (rules, feature space, time)
What is learned with Soar-RL and what is hand-coded
– Learning at the evolutionary time scale

MDP Characterization
MDP: Markov Decision Process
Interesting problems are non-Markovian
– Markovian problems are solvable
Goal: use Soar to tackle interesting problems
Are SARSA / Q-learning the right algorithms?
(The catch: the basal ganglia appears to perform temporal-difference updates)
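For reference, the two update rules the question points at, in their standard forms (not Soar-specific): SARSA is on-policy and bootstraps from the action actually taken next, while Q-learning is off-policy and bootstraps from the greedy action.

```latex
\text{SARSA:} \quad Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\big[r_{t+1} + \gamma\, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\big]
\\[4pt]
\text{Q-learning:} \quad Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\big[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\big]
```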
