1 Autonomic Joint Session Admission Control using Reinforcement Learning
Beijing University of Posts and Telecommunications, China
Session 2a, 10th June 2008, ICT-MobileSummit 2008. Copyright 2008 - E3 project, BUPT

2 Motivation, problem area
The interworking between radio sub-networks, especially tight cooperation among them, is of great interest when operating multiple RATs, for higher system performance and spectrum efficiency as well as a better user experience. The emergence of end-to-end reconfigurability further facilitates joint radio resource management (JRRM). Joint session admission control (JOSAC) is one of the JRRM functions: it admits or denies a session to a certain RAT so as to achieve an optimal allocation of resources. For an operator running several access networks with different RATs and numerous base stations (BS) or access points (AP) in an urban area, it is most desirable that the joint control among those RATs be self-managed, adapting to varying traffic demand without costly human-driven planning and maintenance.

3 Research Objectives
To realize this self-management, we appeal to the autonomic learning mechanisms that play an important role in cognitive radio. With respect to JOSAC, such intelligence requires the agent to learn the optimal policy from its online operations, which falls squarely within the field of reinforcement learning (RL). In this paper, we formulate the JOSAC problem in a multi-radio environment as a distributed RL process. Our contribution is to realize the autonomy of JOSAC by applying the RL methodology directly to the joint admission control decision. Our objective is to achieve lower blocking probability and handover dropping probability via the autonomic "trial-and-error" learning process of the JOSAC agent. Proper service allocation and high network revenue are also obtained from this process.

4 Research approach, Methodology
Distributed RL model
- Standard RL model:
  State set: S = {s1, s2, ..., sn}
  Action set: A = {a1, a2, ..., am}
  Policy: π: S → A
  Reward: r(s, a)
  Target: find the policy that maximizes the expected cumulative discounted reward
- Iterative process: Observation → Calculation → Decision → Evaluation → Update
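The iterative process above can be sketched as a generic agent-environment loop. The toy environment and placeholder agent below are illustrative assumptions, not part of the paper; only the loop structure mirrors the slide's five steps.

```python
class ToyEnv:
    """Illustrative two-state environment: action 1 taken in state 0 earns reward 1."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        r = 1.0 if (self.s == 0 and a == 1) else 0.0
        self.s = 1 - self.s          # states alternate 0, 1, 0, 1, ...
        return self.s, r

class PlaceholderAgent:
    """Stands in for a learning agent; always picks action 1 and tallies reward."""
    def __init__(self):
        self.total = 0.0

    def select_action(self, s):
        return 1

    def update(self, s, a, r, s_next):
        self.total += r

def run_episode(env, agent, steps=10):
    s = env.reset()                   # Observation
    for _ in range(steps):
        a = agent.select_action(s)    # Calculation + Decision
        s_next, r = env.step(a)       # Evaluation (reward from environment)
        agent.update(s, a, r, s_next) # Update learned values
        s = s_next
```

In the paper's setting the "environment" is the network status and arriving sessions, and the agent is the per-terminal JOSAC agent described on the later slides.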

5 Research approach, Methodology
Distributed RL model
- Q-learning, a popular RL algorithm, can learn the optimal policy through simple Q-value iterations without knowing or modeling the reward function R(s,a) or the transition probabilities P_{s,s'}(a).
- Iteration rule: Q(s,a) ← (1 − α)·Q(s,a) + α·[r + γ·max_{a'} Q(s',a')]
- Optimal policy: π*(s) = argmax_{a} Q(s,a)
[Figure: a small worked example of Q-value iteration on a five-state transition graph, with values such as Q(0,A) = 12, Q(0,B) = -988, Q(1,A) = 2, Q(1,B) = 11, Q(2,A) = -990, Q(3,A) = 1, Q(4,A) = 10]
[Figure: the Q table, with rows s1...sn and columns a1...am]
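The standard tabular Q-learning iteration can be sketched as follows; the chain environment at the bottom is a made-up toy, not the graph from the slide, and uses the α = 0.5, γ = 0.5 values listed later in the simulation configuration.

```python
import random

def q_learning(transitions, n_states, n_actions, alpha=0.5, gamma=0.5,
               episodes=2000, seed=0):
    """Tabular Q-learning: Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a')).

    transitions[(s, a)] = (s_next, reward); states with no entries are terminal.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while any((s, a) in transitions for a in range(n_actions)):
            actions = [a for a in range(n_actions) if (s, a) in transitions]
            a = rng.choice(actions)                  # explore uniformly
            s_next, r = transitions[(s, a)]
            best_next = max(Q[s_next])               # max_{a'} Q(s', a')
            Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * best_next)
            s = s_next
    return Q

# Toy deterministic chain: 0 --a0 (r=1)--> 1 --a0 (r=10)--> 2 (terminal)
T = {(0, 0): (1, 1.0), (1, 0): (2, 10.0)}
Q = q_learning(T, n_states=3, n_actions=1)
# Converges to Q(1,a0) = 10 and Q(0,a0) = 1 + 0.5*10 = 6
```

No model of R(s,a) or P_{s,s'}(a) appears anywhere in the update, which is exactly the property the slide highlights.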

6 Research approach, Methodology
Distributed RL model
- Distributed RL architecture: to collect information in real time and capture the user experience precisely, we assign each terminal a JOSAC agent that handles the JOSAC decision by itself. Q-value depositories are placed on the network side to store the learning results of all agents in the form of Q-values, for experience sharing and memory savings. The number of agents can change dynamically, with negligible influence on the convergence results.

7 Research approach, Methodology
Problem Formulation
- Learning agent: the distributed JOSAC agents
- Environment: network status and arriving sessions
- State: redirected or not, coverage, new session or handover, service type, load distribution
- Action: the session is rejected, or accepted by one RAT
- Reward, built from:
  Δt: the session duration
  η(v,k): the service revenue coefficient, which embodies the suitability of RAT k for traffic type v
  β(h): the reward gain coefficient for handover, which gives a higher reward to handover sessions to reduce the handover dropping proportion
  δ: the sharing coefficient between the original RAT and the target RAT
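The reward formula itself did not survive transcription. The sketch below is only one plausible way to combine the listed coefficients: the multiplicative form and the δ split between RATs are assumptions, not the paper's equation.

```python
def session_reward(dt, eta, is_handover, beta_h=10.0, delta=0.2):
    """Hypothetical reward combining the slide's coefficients (assumed form).

    dt          : session duration (Delta-t)
    eta         : eta(v, k), revenue coefficient of RAT k for service type v
    is_handover : True for a handover session
    beta_h      : beta(h), extra gain for handovers (taken as 1 for new sessions)
    delta       : sharing coefficient between original and target RAT

    Returns (reward_to_target_rat, reward_to_original_rat); applying the
    delta split only to handovers is an assumed interpretation.
    """
    gain = beta_h if is_handover else 1.0
    r = eta * dt * gain
    if is_handover:
        return (1.0 - delta) * r, delta * r
    return r, 0.0
```

The default values beta_h = 10 and delta = 0.2 are taken from the simulation configuration on slide 9.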

8 Research approach, Methodology
Algorithm Implementation
(1) Initialization
(2) Q-value acquisition and update
(3) Action selection and execution
(4) Reward calculation
(5) Parameter update
(6) Return to (2)
Under state s, the agent chooses action a with probability given by the Boltzmann distribution: P(a|s) = e^{Q(s,a)/T} / Σ_{a'} e^{Q(s,a')/T}, where T is the "temperature" coefficient, which gradually decreases to 0 as the iteration proceeds.
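A minimal sketch of Boltzmann (softmax) action selection with a decaying temperature; the e^{Q/T} form is the standard one and is assumed here, since the slide's formula image is not in the transcript.

```python
import math
import random

def boltzmann_select(q_values, temperature, rng=random):
    """Pick an action index with probability proportional to exp(Q/T)."""
    # Subtract the max Q for numerical stability before exponentiating.
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    x, acc = rng.random(), 0.0
    for a, p in enumerate(probs):
        acc += p
        if x < acc:
            return a
    return len(probs) - 1

def decayed_temperature(t0, step, decay=0.99):
    """Geometric cooling schedule: as T -> 0 the policy becomes greedy."""
    return t0 * decay ** step
```

With a high temperature the agent explores nearly uniformly; as T shrinks toward 0 the selection concentrates on argmax_a Q(s,a), matching the slide's description of T gradually reducing along the iteration.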

9 Major Outcomes/Results
Performance Evaluation
Simulation configuration:

                       GSM/GPRS    UMTS        WLAN
  Cell capacity        200 kbps    800 kbps    2,000 kbps
  η(v,k), voice        3           5           1
  η(v,k), data         3           1           5
  Area distribution    A: 10%      B: 10%      C: 80%

β(h): 1 for a new session, 10 for a handover
δ = 0.2, γ = 0.5, T0 = 10, α0 = 0.5, N = 2, K = 3
Iteration times: 20,000

10 Major Outcomes/Results
Performance Evaluation
[Figure: the blocking probability and handover dropping probability performance]
[Figure: the load difference between voice and data service in each RAT (λ/μ = 20)]
[Figure: average revenue per hour performance]
[Figure: the convergence process of the iteration]

11 Conclusion and outlook
A novel JOSAC algorithm based on reinforcement learning has been presented and evaluated in a multi-radio environment. It solves the joint admission control problem in an autonomic fashion using the Q-learning method. Compared to the Non-JOSAC and LB-JOSAC algorithms, the proposed distributed RL-JOSAC provides optimized admission control policies that reduce the overall blocking probability while achieving a lower handover dropping probability and higher revenue. Its learning behavior provides the opportunity to improve online performance by exploiting past experience. This is a great advantage for handling the dynamic and complex situations of the B3G environment intelligently, without much human effort.

