ADITI BHAUMICK ab3585

The goal is to use a reinforcement learning algorithm with function approximation. Feature-based state representations are implemented, using a broad characterization of the level of congestion as low, medium, or high. Reinforcement learning is used because it allows the optimal signal-timing strategy to be learnt without assuming any model of the system.

On high-dimensional state-action spaces, function approximation techniques are used to achieve computational efficiency. A form of reinforcement learning called Q-learning is used to develop a traffic light control (TLC) algorithm that incorporates function approximation.

The resulting QTLC-FA algorithm is implemented, along with a number of other algorithms against which Q-learning with function approximation is compared. The algorithms are implemented on an open-source, Java-based simulator called the Green Light District (GLD). The single-stage cost function is k(s,a), with the weights set to r1 = s1 = 0.5.
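For concreteness, a minimal sketch of a weighted single-stage cost of this kind is shown below. The particular combination of queue lengths and elapsed waiting times, and the helper name single_stage_cost, are illustrative assumptions; the exact cost used in the GLD experiments may differ.

```python
# Illustrative sketch (not the GLD implementation): a single-stage cost k(s, a)
# that mixes total queue length and total elapsed waiting time with weights
# r1 and s1.  The weighting r1 = s1 = 0.5 follows the presentation; the exact
# form of the cost in the original work may differ.

def single_stage_cost(queue_lengths, elapsed_times, r1=0.5, s1=0.5):
    """Weighted combination of congestion (queue lengths) and delay (elapsed times)."""
    congestion = sum(queue_lengths)   # total number of waiting vehicles
    delay = sum(elapsed_times)        # total time vehicles have waited
    return r1 * congestion + s1 * delay

# Example: three lanes with 4, 7 and 2 waiting cars, waiting 10, 25 and 5 seconds.
cost = single_stage_cost([4, 7, 2], [10, 25, 5])
print(cost)  # 0.5 * 13 + 0.5 * 40 = 26.5
```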

The basis for the new algorithm is the Markov decision process (MDP) framework. A stochastic process {Xn} taking values in a set S is called an MDP if its evolution is governed by a control-valued sequence {Zn}. The Q-Bellman equation, or Bellman equation of optimality, is defined as follows.
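A standard form of this equation for a cost-minimising MDP, written with the single-stage cost k(s,a), transition probabilities p(s'|s,a), and a discount factor γ, is the textbook reconstruction below; the notation on the original slide may differ.

```latex
% Bellman equation of optimality for the Q-function (cost-minimisation form);
% a standard reconstruction, not copied verbatim from the slides.
Q^{*}(s,a) \;=\; k(s,a) \;+\; \gamma \sum_{s' \in S} p(s' \mid s, a)\,
                \min_{a' \in A(s')} Q^{*}(s', a'),
\qquad s \in S,\; a \in A(s).
```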

The Q-learning-based TLC with function approximation updates a parameter θ, which is a d-dimensional quantity. Here, instead of solving a system with |S × A(S)| variables (one per state-action pair), a system in only d variables is solved.
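A minimal sketch of such a parameterised update is shown below, using the standard Q-learning rule with a linear approximation Q_θ(s,a) = θᵀσ(s,a). The feature map σ, step size α, and discount factor γ are illustrative assumptions rather than the exact QTLC-FA pseudocode.

```python
import numpy as np

# Illustrative Q-learning update with linear function approximation
# (standard textbook form, not the exact QTLC-FA pseudocode).
# Q_theta(s, a) = theta . sigma(s, a), where sigma is a d-dimensional feature map,
# so only the d entries of theta are learned instead of one value per (s, a) pair.

def q_value(theta, sigma_sa):
    return float(np.dot(theta, sigma_sa))

def q_learning_step(theta, sigma_sa, cost, next_features, alpha=0.1, gamma=0.9):
    """One update of theta after observing cost k(s, a) and the next state s'.

    next_features: feature vectors sigma(s', a') for every action a' available
    in s'.  Because k(s, a) is a cost, the greedy choice is a minimum.
    """
    best_next = min(q_value(theta, f) for f in next_features)
    target = cost + gamma * best_next
    td_error = target - q_value(theta, sigma_sa)
    return theta + alpha * td_error * sigma_sa

# Example with d = 3 features.
theta = np.zeros(3)
sigma_sa = np.array([1.0, 0.0, 1.0])
next_feats = [np.array([0.0, 1.0, 1.0]), np.array([1.0, 1.0, 0.0])]
theta = q_learning_step(theta, sigma_sa, cost=26.5, next_features=next_feats)
print(theta)  # [2.65 0.   2.65]
```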

The metrics used for this algorithm are:
- the number of lanes, N
- the thresholds on queue lengths, L1 and L2
- the threshold on elapsed time, T1

None of these metrics depends on the full state representation of the entire network.
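As a rough illustration of how threshold-based, per-lane features could be formed from these quantities (the encoding and the example threshold values below are assumptions, not the presentation's definition):

```python
# Illustrative, threshold-based feature extraction for one lane.
# Queue length is bucketed into low / medium / high congestion using the
# thresholds L1 and L2, and elapsed time is compared against T1.
# The threshold values passed in the example are arbitrary illustrative choices.

def lane_features(queue_length, elapsed_time, L1, L2, T1):
    if queue_length < L1:
        congestion = "low"
    elif queue_length < L2:
        congestion = "medium"
    else:
        congestion = "high"
    waited_long = elapsed_time > T1
    return congestion, waited_long

# One such tuple per lane keeps the representation at O(N) for N lanes,
# independent of the full joint state of the network.
print(lane_features(queue_length=9, elapsed_time=150, L1=6, L2=14, T1=130))
# ('medium', True)
```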

The QTLC-FA algorithm outperforms all the other algorithms it is compared with. Future work would involve the application of other efficient RL algorithms with function approximation. Effects of driver behavior could also be incorporated into the framework.