Application of Reinforcement Learning in Network Routing
By Chaopin Zhu

Machine Learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning

Supervised Learning
- Feature: learning with a teacher
- Phases: training phase, testing phase
- Applications: pattern recognition, function approximation

Unsupervised Learning
- Feature: learning without a teacher
- Applications: feature extraction, other preprocessing

Reinforcement Learning
- Feature: learning with a critic
- Applications: optimization, function approximation

Elements of Reinforcement Learning
- Agent
- Environment
- Policy
- Reward function
- Value function
- Model of the environment (optional)

Reinforcement Learning Problem
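The figure on this slide (presumably the standard agent-environment interaction loop) did not survive extraction. As a stand-in, here is a minimal sketch of that loop in Python; the environment interface (`reset`, `step`) and all names are assumptions, not part of the original deck:

```python
# Minimal agent-environment loop (illustrative sketch, assumed interface).
# At each step t the agent observes state x_t, picks action a_t from its
# policy, and the environment returns reward r_{t+1} and next state x_{t+1}.

def run_episode(env, policy, max_steps=1000):
    x = env.reset()                      # initial state x_0
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(x)                    # action given by the policy for x
        x_next, r, done = env.step(a)    # environment transition and reward
        total_reward += r
        x = x_next
        if done:
            break
    return total_reward
```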

Markov Decision Process (MDP)
Definition: a reinforcement learning task that satisfies the Markov property.
Transition probabilities: P(x' | x, a) = Pr{ x_{t+1} = x' | x_t = x, a_t = a }

An Example of MDP
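The example on this slide was a diagram that did not survive extraction. As a stand-in, here is a hypothetical two-state MDP written out as transition tables; the states, actions, and numbers are invented for illustration:

```python
# A tiny hypothetical MDP: states 'A' and 'B', actions 'stay' and 'move'.
# P[x][a] is a list of (probability, next_state, reward) triples.
P = {
    'A': {
        'stay': [(1.0, 'A', 0.0)],
        'move': [(0.9, 'B', 1.0), (0.1, 'A', 0.0)],   # move can fail
    },
    'B': {
        'stay': [(1.0, 'B', 2.0)],
        'move': [(1.0, 'A', 0.0)],
    },
}
GAMMA = 0.9  # discount factor
```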

Markov Decision Process (cont.)
Parameters: expected rewards R(x, a, x') = E[ r_{t+1} | x_t = x, a_t = a, x_{t+1} = x' ] and the discount factor γ.
Value functions:
- State-value: V^π(x) = E_π[ Σ_{k≥0} γ^k r_{t+k+1} | x_t = x ]
- Action-value: Q^π(x, a) = E_π[ Σ_{k≥0} γ^k r_{t+k+1} | x_t = x, a_t = a ]

Elementary Methods for the Reinforcement Learning Problem
- Dynamic programming
- Monte Carlo methods
- Temporal-difference learning

Bellman's Equations
V^π(x) = Σ_a π(x, a) Σ_{x'} P(x' | x, a) [ R(x, a, x') + γ V^π(x') ]
Bellman optimality equation: V*(x) = max_a Σ_{x'} P(x' | x, a) [ R(x, a, x') + γ V*(x') ]

Dynamic Programming Methods
- Policy evaluation
- Policy improvement

Dynamic Programming (cont.)
Policy iteration alternates the two steps: π_0 --E--> V^{π_0} --I--> π_1 --E--> V^{π_1} --I--> ... --I--> π* (E = policy evaluation, I = policy improvement); a sketch follows.
Value iteration folds the evaluation sweep and the improvement step into a single update.
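A compact sketch of policy iteration for a tabular MDP, assuming the `P` and `GAMMA` layout from the hypothetical MDP above (a minimal illustration, not the deck's implementation):

```python
def policy_evaluation(P, pi, V, gamma, theta=1e-6):
    """Sweep the Bellman equation for the fixed policy pi until V converges."""
    while True:
        delta = 0.0
        for x in P:
            v = sum(p * (r + gamma * V[y]) for p, y, r in P[x][pi[x]])
            delta = max(delta, abs(v - V[x]))
            V[x] = v
        if delta < theta:
            return V

def policy_improvement(P, V, gamma):
    """Make pi greedy with respect to the current value estimates."""
    return {x: max(P[x], key=lambda a: sum(p * (r + gamma * V[y])
                                           for p, y, r in P[x][a]))
            for x in P}

def policy_iteration(P, gamma):
    pi = {x: next(iter(P[x])) for x in P}        # arbitrary initial policy
    V = {x: 0.0 for x in P}
    while True:
        V = policy_evaluation(P, pi, V, gamma)   # E step
        new_pi = policy_improvement(P, V, gamma) # I step
        if new_pi == pi:                         # stable policy = optimal
            return pi, V
        pi = new_pi
```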

Monte Carlo Methods
Features:
- Learning from experience
- Do not need complete transition probabilities
Idea (see the sketch below):
- Partition experience into episodes
- Average the sample returns
- Update on an episode-by-episode basis
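A minimal first-visit Monte Carlo evaluation sketch along those lines; the episode format (a list of (state, reward) pairs) is an assumption for illustration:

```python
from collections import defaultdict

def mc_evaluate(episodes, gamma):
    """First-visit Monte Carlo: average sample returns, episode by episode."""
    returns = defaultdict(list)
    for episode in episodes:               # episode = [(x_0, r_1), (x_1, r_2), ...]
        G = 0.0
        visited = {}
        # Walk backward so G accumulates the return following each step.
        for t in reversed(range(len(episode))):
            x, r = episode[t]
            G = r + gamma * G
            visited[x] = G                 # later writes keep the first-visit return
        for x, G in visited.items():
            returns[x].append(G)
    return {x: sum(g) / len(g) for x, g in returns.items()}
```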

Temporal-Difference Learning
Features (a combination of Monte Carlo and DP ideas):
- Learn from experience (Monte Carlo)
- Update estimates based in part on other learned estimates (DP)
The TD(λ) algorithm seamlessly integrates TD and Monte Carlo methods.

TD(0) Learning
Initialize V(x) arbitrarily and π to the policy to be evaluated
Repeat (for each episode):
    Initialize x
    Repeat (for each step of episode):
        a ← action given by π for x
        Take action a; observe reward r and next state x'
        V(x) ← V(x) + α[ r + γ V(x') − V(x) ]
        x ← x'
    until x is terminal
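The same algorithm as a runnable Python sketch, reusing the assumed environment interface from the loop earlier:

```python
from collections import defaultdict

def td0_evaluate(env, policy, episodes, alpha=0.1, gamma=0.9):
    """TD(0) policy evaluation: bootstrap V(x) from the estimate at x'."""
    V = defaultdict(float)                 # V(x) initialized arbitrarily (to 0)
    for _ in range(episodes):
        x = env.reset()
        done = False
        while not done:
            a = policy(x)
            x_next, r, done = env.step(a)
            # TD(0) update: move V(x) toward the one-step target r + gamma*V(x').
            V[x] += alpha * (r + gamma * V[x_next] - V[x])
            x = x_next
    return V
```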

Q-Learning
Initialize Q(x, a) arbitrarily
Repeat (for each episode):
    Initialize x
    Repeat (for each step of episode):
        Choose a from x using a policy derived from Q (e.g., ε-greedy)
        Take action a; observe reward r and next state x'
        Q(x, a) ← Q(x, a) + α[ r + γ max_{a'} Q(x', a') − Q(x, a) ]
        x ← x'
    until x is terminal
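A runnable tabular version of the same procedure (a sketch under the same assumed environment interface; terminal-state Q-values stay at zero, so the target reduces to r there):

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)                 # Q(x, a) initialized arbitrarily (to 0)
    for _ in range(episodes):
        x = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection from the current Q estimates.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(x, act)])
            x_next, r, done = env.step(a)
            target = r + gamma * max(Q[(x_next, a2)] for a2 in actions)
            Q[(x, a)] += alpha * (target - Q[(x, a)])
            x = x_next
    return Q
```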

Q-Routing
- Q_x(y, d): estimated time for a packet to reach the destination node d from the current node x via x's neighbor node y
- T_y(d): y's estimate of the time remaining in the trip
- q_y: queuing time in node y
- T_xy: transmission time between x and y

Algorithm of Q-Routing
1. Set initial Q-values for each node.
2. Get the first packet from the packet queue of node x.
3. Choose the best neighbor node ŷ = argmin_y Q_x(y, d) and forward the packet to node ŷ.
4. Get the estimated value T_ŷ(d) from node ŷ.
5. Update Q_x(ŷ, d) ← Q_x(ŷ, d) + η [ q_ŷ + T_xŷ + T_ŷ(d) − Q_x(ŷ, d) ].
6. Go to 2.
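A sketch of steps 3-5 in Python. The data layout (`Q[x][y][d]` nested dictionaries, `queue_time`, `trans_time`) and the learning-rate name `eta` are assumptions; the update follows the standard Q-routing rule of Boyan and Littman, which this deck appears to use:

```python
def q_routing_step(Q, x, d, neighbors, queue_time, trans_time, eta=0.5):
    """One Q-routing decision at node x for a packet bound for destination d.

    Q[x][y][d] is node x's estimate of delivery time to d via neighbor y;
    queue_time[y] and trans_time[(x, y)] stand in for measured delays.
    """
    # Step 3: pick the neighbor with the smallest estimated delivery time.
    y = min(neighbors[x], key=lambda n: Q[x][n][d])
    # Step 4: neighbor y reports its best remaining-time estimate T_y(d).
    t_y = min(Q[y][n][d] for n in neighbors[y]) if y != d else 0.0
    # Step 5: move Q_x(y, d) toward the measured-plus-estimated delay.
    sample = queue_time[y] + trans_time[(x, y)] + t_y
    Q[x][y][d] += eta * (sample - Q[x][y][d])
    return y                               # forward the packet to node y
```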

Dual Reinforcement Q-Routing
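The body of this slide did not survive extraction. In the published dual reinforcement Q-routing scheme (Kumar and Miikkulainen), each packet also carries the sender's time estimates backward, so the receiving node updates its own table in the reverse direction. A hedged sketch of that extra update, reusing the assumed layout above; the estimate carried in the packet is written `t_back` here:

```python
def dual_backward_update(Q, x, y, s, t_back, queue_time, trans_time, eta=0.5):
    """Backward half of dual reinforcement Q-routing (sketch, assumed layout).

    When node y receives a packet from x that originated at source s, it uses
    the estimate t_back = T_x(s) carried in the packet to refine Q_y(x, s).
    """
    sample = queue_time[x] + trans_time[(y, x)] + t_back
    Q[y][x][s] += eta * (sample - Q[y][x][s])
```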

Network Model

Network Model (cont.)

Node Model

Routing Controller

Initialization/Termination Procedures
Initialization:
- Initialize and/or register global variables
- Initialize the routing table
Termination:
- Destroy the routing table
- Release memory

Arrival Procedure
Data packet arrival:
- Update the routing table
- Route the packet onward with control information, or destroy it if it has reached its destination
Control-information packet arrival:
- Update the routing table
- Destroy the packet

Departure Procedure (a combined sketch of the arrival and departure handlers follows)
- Set all fields of the packet
- Get a shortest route
- Send the packet according to the route
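A hypothetical sketch of how the arrival and departure procedures above might look as node event handlers. The original deck implemented these inside a network simulator; every name and data layout below is an assumption for illustration:

```python
class Node:
    """Hypothetical node skeleton for the arrival/departure procedures above."""

    def __init__(self, node_id, neighbors):
        self.node_id = node_id
        self.neighbors = neighbors
        self.routing_table = {}            # filled in by the init procedure

    def on_arrival(self, packet):
        # Both packet kinds update the routing table from carried estimates.
        self.routing_table.update(packet.get('estimates', {}))
        if packet['kind'] == 'control' or packet['dest'] == self.node_id:
            return None                    # destroy the packet
        return self.on_departure(packet)   # otherwise route it onward

    def on_departure(self, packet):
        packet['src'] = self.node_id       # set the packet fields
        # Pick the neighbor currently judged fastest toward the destination
        # (missing entries default to 0.0, i.e., optimistic initialization).
        next_hop = min(self.neighbors,
                       key=lambda y: self.routing_table.get((y, packet['dest']), 0.0))
        return next_hop                    # send the packet along that route
```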

References
[1] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction.
[2] Chengan Guo, Applications of Reinforcement Learning in Sequence Detection and Network Routing.
[3] Simon Haykin, Neural Networks: A Comprehensive Foundation.