1 Application of Reinforcement Learning in Network Routing
By Chaopin Zhu

2 Machine Learning
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

3 Supervised Learning
Feature: learning with a teacher
Phases:
- Training phase
- Testing phase
Applications:
- Pattern recognition
- Function approximation

4 Unsupervised Learning
Feature: learning without a teacher
Applications:
- Feature extraction
- Other preprocessing

5 Reinforcement Learning
Feature: learning with a critic
Applications:
- Optimization
- Function approximation

6 Elements of Reinforcement Learning
- Agent
- Environment
- Policy
- Reward function
- Value function
- Model of the environment (optional)

7 Reinforcement Learning Problem
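In the reinforcement learning problem, the agent repeatedly observes the state of the environment, selects an action according to its policy, and receives a reward and the next state. A minimal Python sketch of that interaction loop (the toy chain environment and random policy are invented for illustration, not part of the original slides):

import random

class Environment:
    """Toy environment: states 0..4, action +1/-1 moves along a chain, reward 1 at state 4."""
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    return random.choice([-1, +1])

env = Environment()
x = env.reset()
done = False
while not done:
    a = random_policy(x)           # policy maps state to action
    x_next, r, done = env.step(a)  # environment returns reward and next state
    x = x_next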

8 Markov Decision Process (MDP)
Definition: a reinforcement learning task that satisfies the Markov property
Transition probabilities: P^a_{xx'} = Pr{ x_{t+1} = x' | x_t = x, a_t = a }
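For a finite MDP, the transition probabilities and expected rewards can be written down explicitly as tables. A small sketch, assuming a made-up two-state MDP just for illustration:

# P[x][a][x'] = Pr{ x_{t+1} = x' | x_t = x, a_t = a }; R[x][a][x'] = expected reward
P = {
    0: {'stay': {0: 0.9, 1: 0.1}, 'go': {0: 0.2, 1: 0.8}},
    1: {'stay': {0: 0.0, 1: 1.0}, 'go': {0: 0.5, 1: 0.5}},
}
R = {
    0: {'stay': {0: 0.0, 1: 1.0}, 'go': {0: 0.0, 1: 1.0}},
    1: {'stay': {0: 0.0, 1: 0.0}, 'go': {0: 2.0, 1: 0.0}},
}

# Each row of P must sum to 1 for the transition probabilities to be well defined
for x in P:
    for a in P[x]:
        assert abs(sum(P[x][a].values()) - 1.0) < 1e-9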

9 An Example of MDP

10 Markov Decision Process (cont.)
Parameters: transition probabilities P^a_{xx'} and expected rewards R^a_{xx'} = E{ r_{t+1} | x_t = x, a_t = a, x_{t+1} = x' }
Value functions:
- State-value function: V^π(x) = E_π{ Σ_k γ^k r_{t+k+1} | x_t = x }
- Action-value function: Q^π(x,a) = E_π{ Σ_k γ^k r_{t+k+1} | x_t = x, a_t = a }
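Both value functions are expectations of the discounted return G_t = r_{t+1} + γ r_{t+2} + γ^2 r_{t+3} + ... A short sketch of computing such a return from one sampled reward sequence (the numbers are arbitrary examples):

def discounted_return(rewards, gamma=0.9):
    """G_t = sum_k gamma^k * r_{t+k+1} for one sampled episode."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

print(discounted_return([0.0, 0.0, 1.0]))  # 0.9**2 * 1.0 = 0.81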

11 Elementary Methods for the Reinforcement Learning Problem
- Dynamic programming
- Monte Carlo methods
- Temporal-difference learning

12 Bellman's Equations
Bellman equation for V^π: V^π(x) = Σ_a π(x,a) Σ_{x'} P^a_{xx'} [ R^a_{xx'} + γ V^π(x') ]
Bellman optimality equation: V*(x) = max_a Σ_{x'} P^a_{xx'} [ R^a_{xx'} + γ V*(x') ]
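Treated as an update rule, the Bellman equation for V^π gives iterative policy evaluation. A sketch that reuses the hypothetical P and R tables from the MDP example above, with a fixed stochastic policy:

def policy_evaluation(P, R, policy, gamma=0.9, theta=1e-8):
    """Sweep V(x) <- sum_a pi(x,a) sum_x' P[x][a][x'] (R[x][a][x'] + gamma V(x'))."""
    V = {x: 0.0 for x in P}
    while True:
        delta = 0.0
        for x in P:
            v_new = 0.0
            for a, pi_xa in policy[x].items():
                for x_next, p in P[x][a].items():
                    v_new += pi_xa * p * (R[x][a][x_next] + gamma * V[x_next])
            delta = max(delta, abs(v_new - V[x]))
            V[x] = v_new
        if delta < theta:
            return V

uniform = {0: {'stay': 0.5, 'go': 0.5}, 1: {'stay': 0.5, 'go': 0.5}}
# print(policy_evaluation(P, R, uniform))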

13 Dynamic Programming Methods
- Policy evaluation
- Policy improvement

14 Dynamic Programming (cont.)
(E: policy evaluation, I: policy improvement)
- Policy iteration
- Value iteration
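Policy iteration simply alternates the E and I steps until the policy stops changing. A sketch building on the policy_evaluation helper above (same hypothetical MDP tables):

def greedy_improvement(P, R, V, gamma=0.9):
    """I step: in each state, pick the action with the largest one-step lookahead value."""
    policy = {}
    for x in P:
        best_a = max(P[x], key=lambda a: sum(p * (R[x][a][x2] + gamma * V[x2])
                                             for x2, p in P[x][a].items()))
        policy[x] = {a: (1.0 if a == best_a else 0.0) for a in P[x]}
    return policy

def policy_iteration(P, R, gamma=0.9):
    policy = {x: {a: 1.0 / len(P[x]) for a in P[x]} for x in P}
    while True:
        V = policy_evaluation(P, R, policy, gamma)       # E step
        new_policy = greedy_improvement(P, R, V, gamma)  # I step
        if new_policy == policy:                         # policy stable -> optimal
            return policy, V
        policy = new_policy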

15 Monte Carlo Methods
Features:
- Learning from experience
- Do not need complete transition probabilities
Idea:
- Partition experience into episodes
- Average the sample returns
- Update on an episode-by-episode basis
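First-visit Monte Carlo prediction makes this concrete: after every complete episode, the sampled return following the first visit to each state is averaged into the value estimate. A sketch, assuming episodes are given as lists of (state, reward) pairs:

from collections import defaultdict

def mc_first_visit(episodes, gamma=0.9):
    """Average the sampled return following the first visit to each state."""
    returns_sum = defaultdict(float)
    returns_cnt = defaultdict(int)
    for episode in episodes:               # episode = [(x0, r1), (x1, r2), ...]
        first_visit = {}
        for t, (x, _) in enumerate(episode):
            if x not in first_visit:
                first_visit[x] = t
        # compute returns backwards, then credit each state's first visit
        gs = [0.0] * len(episode)
        g = 0.0
        for t in reversed(range(len(episode))):
            g = episode[t][1] + gamma * g
            gs[t] = g
        for x, t in first_visit.items():
            returns_sum[x] += gs[t]
            returns_cnt[x] += 1
    return {x: returns_sum[x] / returns_cnt[x] for x in returns_sum}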

16 Temporal-Difference Learning
Features (a combination of Monte Carlo and DP ideas):
- Learn from experience (Monte Carlo)
- Update estimates based in part on other learned estimates (DP)
The TD(λ) algorithm seamlessly integrates TD and Monte Carlo methods

17 TD(0) Learning
Initialize V(x) arbitrarily, and π to the policy to be evaluated
Repeat (for each episode):
    Initialize x
    Repeat (for each step of episode):
        a ← action given by π for x
        Take action a; observe reward r and next state x'
        V(x) ← V(x) + α [ r + γ V(x') - V(x) ]
        x ← x'
    until x is terminal

18 Q-Learning
Initialize Q(x,a) arbitrarily
Repeat (for each episode):
    Initialize x
    Repeat (for each step of episode):
        Choose a from x using a policy derived from Q (e.g., ε-greedy)
        Take action a; observe r, x'
        Q(x,a) ← Q(x,a) + α [ r + γ max_a' Q(x',a') - Q(x,a) ]
        x ← x'
    until x is terminal
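As a concrete version of the Q-learning pseudocode above, here is a short Python sketch run against the toy chain environment from the earlier interaction-loop example (the step size, discount, and ε values are arbitrary example choices):

import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    actions = [-1, +1]
    Q = defaultdict(float)                   # Q[(x, a)], initialized arbitrarily at 0
    for _ in range(episodes):
        x = env.reset()
        done = False
        while not done:
            # epsilon-greedy policy derived from Q
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(x, act)])
            x_next, r, done = env.step(a)
            best_next = 0.0 if done else max(Q[(x_next, act)] for act in actions)
            Q[(x, a)] += alpha * (r + gamma * best_next - Q[(x, a)])
            x = x_next
    return Q

# Q = q_learning(Environment())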

19 Q-Routing
Q_x(y,d): estimated time for a packet to reach destination node d from current node x via x's neighbor node y
T_y(d): y's estimate of the time remaining in the trip
q_y: queuing time in node y
T_xy: transmission time between x and y

20 Algorithm of Q-Routing
1. Set initial Q-values for each node
2. Get the first packet from the packet queue of node x
3. Choose the best neighbor node y (the one with the smallest Q_x(y,d)) and forward the packet to node y
4. Get the estimated value T_y(d) back from node y
5. Update Q_x(y,d) ← Q_x(y,d) + η [ (q_y + T_xy + T_y(d)) - Q_x(y,d) ]
6. Go to 2
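A sketch of steps 3 to 5 at a single node, using the quantities Q_x(y,d), T_y(d), q_y, and T_xy defined on the previous slide; the function and helper names are assumptions made for the example, not taken from the slides:

def forward_and_update(Q, x, d, neighbors, get_T, get_q, get_Txy, eta=0.5):
    """One Q-routing step at node x for a packet destined to node d.

    Q[x][y][d]   : x's estimate of delivery time to d via neighbor y
    get_T(y, d)  : T_y(d), y's own best remaining-time estimate (min_z Q[y][z][d])
    get_q(y)     : q_y, queuing delay at node y
    get_Txy(x,y) : T_xy, transmission delay on link x-y
    """
    # 3. choose the best neighbor (smallest estimated delivery time)
    y = min(neighbors, key=lambda n: Q[x][n][d])
    # 4. neighbor y returns its estimate of the remaining trip time
    new_estimate = get_q(y) + get_Txy(x, y) + get_T(y, d)
    # 5. move Q_x(y,d) toward the new estimate
    Q[x][y][d] += eta * (new_estimate - Q[x][y][d])
    return y  # the packet is forwarded to node y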

21 Dual Reinforcement Q-Routing
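Dual reinforcement Q-routing also learns in the backward direction: a packet travelling from x to y carries x's estimate toward the packet's source, so y can update its own Q-value for routing back to that source, and each hop produces learning in both directions. The sketch below is an assumed form of that backward step, complementing the forward update above:

def backward_update(Q, x, y, s, q_x, T_yx, T_x_s, eta=0.5):
    """Backward (dual) update performed at the receiving node y.

    s      : source node of the packet that just arrived from x
    T_x_s  : x's best estimate of the remaining time from x back to source s,
             carried along with the packet (min_z Q[x][z][s])
    q_x    : queuing delay at node x
    T_yx   : transmission delay on link y-x
    """
    new_estimate = q_x + T_yx + T_x_s
    Q[y][x][s] += eta * (new_estimate - Q[y][x][s])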

22 Network Model

23 Network Model (cont.)

24 Node Model

25 Routing Controller

26 Initialization / Termination Procedures
Initialization:
- Initialize and/or register global variables
- Initialize the routing table
Termination:
- Destroy the routing table
- Release memory

27 Arrival Procedure
Data packet arrival:
- Update the routing table
- Route the packet onward with control information, or destroy it if it has reached its destination
Control information packet arrival:
- Update the routing table
- Destroy the packet

28 Departure Procedure
- Set all fields of the packet
- Get the shortest route
- Send the packet according to the route
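The arrival and departure procedures describe the node process in the OPNET model; the sketch below is a framework-free illustration of the same control flow (the class layout and packet fields are invented for the example, not OPNET code):

class Node:
    def __init__(self, node_id, Q, neighbors):
        self.id = node_id
        self.Q = Q                 # routing table: Q[x][y][d]
        self.neighbors = neighbors

    def on_arrival(self, pkt):
        if pkt.is_control:
            self.update_routing_table(pkt)   # control packet: learn, then discard
            return None
        self.update_routing_table(pkt)
        if pkt.dest == self.id:              # reached destination: destroy
            return None
        return self.on_departure(pkt)        # otherwise route it onward

    def on_departure(self, pkt):
        pkt.src_hop = self.id                          # set packet fields
        next_hop = min(self.neighbors,                 # shortest estimated route
                       key=lambda y: self.Q[self.id][y][pkt.dest])
        return next_hop                                # send along that route

    def update_routing_table(self, pkt):
        pass  # Q-routing / dual-reinforcement update would go here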

29 References
[1] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction.
[2] Chengan Guo, Applications of Reinforcement Learning in Sequence Detection and Network Routing.
[3] Simon Haykin, Neural Networks: A Comprehensive Foundation.

