Session 2a, 10 June 2008, ICT-MobileSummit 2008. Copyright 2008, E3 project, BUPT. Autonomic Joint Session Admission Control using Reinforcement Learning.


Autonomic Joint Session Admission Control using Reinforcement Learning
Beijing University of Posts and Telecommunications, China

Motivation, problem area
Interworking between radio sub-networks, and especially tight cooperation between them, is of great interest when operating multiple RATs, as it improves system performance, spectrum efficiency and user experience. The emergence of end-to-end reconfigurability further facilitates joint radio resource management (JRRM). Joint session admission control (JOSAC) is a JRRM function that admits or denies a session on a given RAT so as to achieve an optimal allocation of resources. For an operator running several access networks with different RATs and numerous base stations (BSs) or access points (APs) in an urban area, it is highly desirable that the joint control among those RATs be self-managed, adapting to varying traffic demand without costly human-driven planning and maintenance.

Research Objectives
To realize this self-management, we appeal to autonomic learning mechanisms, which play an important role in cognitive radio. With respect to JOSAC, such intelligence requires an agent that can learn the optimal policy from its online operations, a task that falls squarely within the field of reinforcement learning (RL). In this paper, we formulate the JOSAC problem in a multi-radio environment as a distributed RL process. Our contribution is to realize autonomic JOSAC by applying the RL methodology directly to the joint admission control decision. Our objective is to achieve lower blocking and handover dropping probabilities through the autonomic "trial-and-error" learning of the JOSAC agent; proper service allocation and high network revenue are also obtained from this process.

Research approach, Methodology
Distributed RL model
– Standard RL model:
  State set: S = {s1, s2, …, sn}
  Action set: A = {a1, a2, …, am}
  Policy: π: S → A
  Reward: r(s, a)
  Target: maximize the expected discounted return, E[Σt γ^t r_t]
– Iterative process: Observation → Calculation → Decision → Evaluation → Update (sketched below)
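As a concrete illustration of this iterative process, here is a minimal Python sketch; the names (Agent, run, env, n_steps) are illustrative assumptions, not from the paper:

```python
# Minimal sketch of the Observation -> Calculation -> Decision ->
# Evaluation -> Update loop described above.
class Agent:
    def decide(self, state):
        """Calculation + Decision: pick an action a in A for state s."""
        raise NotImplementedError

    def update(self, state, action, reward, next_state):
        """Update: improve the policy from the observed reward r(s, a)."""
        raise NotImplementedError

def run(agent, env, n_steps):
    state = env.observe()                      # Observation
    for _ in range(n_steps):
        action = agent.decide(state)           # Decision
        reward, next_state = env.step(action)  # Evaluation
        agent.update(state, action, reward, next_state)  # Update
        state = next_state
```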

Research approach, Methodology
Distributed RL model
– Q-learning, a popular RL algorithm, can learn the optimal policy through simple Q-value iterations, without knowing or modeling the reward function R(s, a) or the transition probabilities Ps,s'(a).
– Iteration rule (one-step Q-learning): Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_{a'} Q(s', a')]
– Optimal policy: π*(s) = argmax_{a∈A} Q(s, a)
(Slide figure: a worked example on a small state graph with converged Q-values, e.g. Q(0, A) = 12, Q(0, B) = −988, Q(1, A) = 2, Q(1, B) = 11, Q(2, A) = −990, Q(3, A) = 1, Q(4, A) = 10, together with the Q-table Q(si, aj) over states s1…sn and actions a1…am.)
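A minimal tabular sketch of this iteration rule in Python; alpha and gamma default to the α0 = 0.5 and γ = 0.5 of the simulation configuration, while the state/action encoding is an illustrative assumption:

```python
from collections import defaultdict

# Tabular Q-learning sketch matching the iteration rule above.
Q = defaultdict(float)  # Q[(state, action)], implicitly 0 before learning

def q_update(s, a, r, s_next, actions, alpha=0.5, gamma=0.5):
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))"""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

def greedy_policy(s, actions):
    """Optimal policy estimate: pi*(s) = argmax_a Q(s, a)."""
    return max(actions, key=lambda a: Q[(s, a)])
```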

Research approach, Methodology
Distributed RL model
– Distributed RL architecture: to collect information in real time and track the user experience more precisely, we assign each terminal a JOSAC agent that handles the JOSAC problem by itself. Q-value depositories are placed on the network side to store the learning results of all agents in the form of Q-values, enabling experience sharing and saving terminal memory. The number of agents can change dynamically, with negligible influence on the convergence results.
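An illustrative sketch of this architecture (not the paper's code): per-terminal agents learn locally while a shared network-side depository pools everyone's Q-values; the class and method names are assumptions:

```python
import threading
from collections import defaultdict

class QRepository:
    """Network-side depository of Q-values shared by all JOSAC agents."""
    def __init__(self):
        self._q = defaultdict(float)
        self._lock = threading.Lock()  # many agents update concurrently

    def get(self, state, action):
        with self._lock:
            return self._q[(state, action)]

    def put(self, state, action, value):
        with self._lock:
            self._q[(state, action)] = value

class JOSACAgent:
    """Terminal-side agent; its learning results live in the shared
    depository, so agents can join or leave without losing experience."""
    def __init__(self, depository):
        self.depository = depository
```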

Research approach, Methodology
Problem Formulation
– Learning agent: the distributed JOSAC agents
– Environment: network status and arriving sessions
– State: redirected or not, coverage, new session or handover, service type, load distribution
– Action: the session is rejected, or accepted by one RAT
– Reward: a function (given on the original slide) of the following quantities:
  Δt is the session duration;
  η(v, k) is the service revenue coefficient, embodying the suitability of RAT k for traffic type v;
  β(h) is the handover reward gain coefficient, which gives a higher reward to handover sessions so as to reduce the handover dropping proportion;
  δ is the sharing coefficient between the original RAT and the target RAT.
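The exact reward formula on the slide is not recoverable from this transcript, so the following Python sketch is a hypothetical combination of the quantities listed above (multiplicative, with a δ-weighted share credited to the original RAT on handover), for illustration only:

```python
# Hypothetical reward sketch; the specific combination is an assumption.
def reward(duration, service, rat, is_handover, eta,
           beta_handover=10.0, beta_new=1.0, delta=0.2):
    """duration: session duration (delta-t); eta[(service, rat)]: revenue
    coefficient eta(v, k); beta(h): handover reward gain; delta: sharing
    coefficient between the original and target RAT."""
    beta = beta_handover if is_handover else beta_new
    r = duration * eta[(service, rat)] * beta
    share_original = delta * r if is_handover else 0.0
    return share_original, r - share_original  # (original RAT, target RAT)
```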

Research approach, Methodology
Algorithm Implementation
– (1) Initialization
– (2) Q-value acquisition and update
– (3) Action selection and execution
– (4) Reward calculation
– (5) Parameter update
– (6) Return to (2)
– In state s, the agent chooses action a with a probability given by the Boltzmann (softmax) rule, P(a | s) = exp(Q(s, a)/T) / Σ_{a'∈A} exp(Q(s, a')/T), where T is the "temperature" coefficient, which gradually decreases to 0 over the iterations (see the sketch below).
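A short Python sketch of this temperature-based action selection; T0 = 10 matches the simulation configuration, while the decay schedule itself is an illustrative assumption:

```python
import math
import random

def boltzmann_select(q_values, temperature):
    """q_values: {action: Q(s, a)}. High T gives near-uniform exploration;
    as T -> 0 the selection becomes greedy (pure exploitation)."""
    m = max(q_values.values())  # subtract the max for numerical stability
    actions = list(q_values)
    weights = [math.exp((q_values[a] - m) / temperature) for a in actions]
    return random.choices(actions, weights=weights)[0]

def temperature(iteration, t0=10.0, decay=0.999):
    """T gradually reduces toward 0 as the iteration count grows."""
    return t0 * decay ** iteration
```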

Major Outcomes/Results
Performance Evaluation: simulation scenario and configuration

                     GSM/GPRS    UMTS        WLAN
Cell capacity        200 kbps    800 kbps    2,000 kbps
η(v, k), voice       3           5           1
η(v, k), data        3           1           5

Area distribution: A: 10%, B: 10%, C: 80%
β(h): new session 1, handover 10
δ = 0.2, γ = 0.5, T0 = 10, α0 = 0.5, N = 2, K = 3
Iteration times: 20,000

Major Outcomes/Results
Performance Evaluation (figures on slide):
– Blocking probability and handover dropping probability
– Load difference between voice and data services in each RAT (λ/μ = 20)
– Average revenue per hour
– Convergence of the iteration process

Conclusion and outlook
A novel JOSAC algorithm based on reinforcement learning has been presented and evaluated in a multi-radio environment. It solves the joint admission control problem in an autonomic fashion using the Q-learning method. Compared with the Non-JOSAC and LB-JOSAC algorithms, the proposed distributed RL-JOSAC provides optimized admission control policies that reduce the overall blocking probability while achieving a lower handover dropping probability and higher revenue. Its learning behavior offers the opportunity to improve online performance by exploiting past experience, a significant advantage for handling the dynamic and complex situations of the B3G environment intelligently, without much human effort.