Published by Destin Nelson. Modified over 3 years ago.

1
Network Utility Maximization over Partially Observable Markov Channels

[Figure: users 1, 2, 3 with unknown channel states: Channel State 1 = ?, Channel State 2 = ?, Channel State 3 = ?]

Restless Multi-Arm Bandit

2
This work is from the following papers:*
- Li, Neely, WiOpt 2010
- Li, Neely, arXiv 2010, submitted for conference
- Neely, Asilomar 2010

Chih-Ping Li is graduating and is currently looking for post-doc positions!

*The above paper titles are given below, and are available at: http://www-bcf.usc.edu/~mjneely/
- C. Li and M. J. Neely, “Exploiting Channel Memory for Multi-User Wireless Scheduling without Channel Measurement: Capacity Regions and Algorithms,” Proc. WiOpt 2010.
- C. Li and M. J. Neely, “Network Utility Maximization over Partially Observable Markovian Channels,” arXiv:1008.3421, Aug. 2010.
- M. J. Neely, “Dynamic Optimization and Learning for Renewal Systems,” Proc. Asilomar Conf. on Signals, Systems, and Computers, Nov. 2010.

3
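Restless Multi-Arm Bandit with vector rewards

[Figure: users 1, 2, 3 with unknown channel states $S_1(t) = ?$, $S_2(t) = ?$, $S_3(t) = ?$]

- $N$-user wireless system.
- Timeslots $t \in \{0, 1, 2, \ldots\}$.
- Choose one channel for transmission every slot $t$.
- Channels $S_i(t)$ are ON/OFF Markov; the current states $S_i(t)$ are unknown.

[Figure: two-state Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ (ON to OFF) and $\delta_i$ (OFF to ON)]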
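The channel model above can be simulated directly. A minimal sketch, assuming $\varepsilon_i$ is the ON-to-OFF and $\delta_i$ the OFF-to-ON transition probability (consistent with the belief updates later in the deck); the function name `simulate_channel` and the parameter values are hypothetical.

```python
import random


def simulate_channel(eps: float, delta: float, num_slots: int, seed: int = 0) -> list[bool]:
    """Simulate one ON/OFF Markov channel S_i(t) for num_slots slots.

    eps   -- Pr[ON -> OFF] in one slot (epsilon_i)
    delta -- Pr[OFF -> ON] in one slot (delta_i)
    Returns the list of states, True meaning ON.
    """
    rng = random.Random(seed)
    # Start from the stationary distribution Pr[ON] = delta / (delta + eps).
    state = rng.random() < delta / (delta + eps)
    states = []
    for _ in range(num_slots):
        states.append(state)
        if state:
            state = rng.random() >= eps   # stay ON with probability 1 - eps
        else:
            state = rng.random() < delta  # turn ON with probability delta
    return states


states = simulate_channel(eps=0.2, delta=0.1, num_slots=200_000)
print(sum(states) / len(states))  # close to 0.1 / (0.1 + 0.2) = 1/3
```

The scheduler never sees `states` directly; it only learns $S_i(t)$ for the channel it serves, through ACK/NACK feedback.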

4
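Restless Multi-Arm Bandit with vector rewards

[Figure: users 1, 2, 3 with unknown channel states $S_1(t) = ?$, $S_2(t) = ?$, $S_3(t) = ?$]

Suppose we serve channel $i$ on slot $t$:

[Figure: ON/OFF Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ and $\delta_i$]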

5
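Restless Multi-Arm Bandit with vector rewards

Suppose we serve channel $i$ on slot $t$:
- If $S_i(t) = \text{ON}$: ACK, and the reward vector is $r(t) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in entry $i$.

[Figure: users 1, 2, 3 with unknown channel states; an ACK on the served channel (channel 2 in the figure) gives $r(t) = (0, 1, 0)$. ON/OFF Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ and $\delta_i$.]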

6
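Restless Multi-Arm Bandit with vector rewards

Suppose we serve channel $i$ on slot $t$:
- If $S_i(t) = \text{ON}$: ACK, reward vector $r(t) = (0, \ldots, 0, 1, 0, \ldots, 0)$.
- If $S_i(t) = \text{OFF}$: NACK, reward vector $r(t) = (0, \ldots, 0, 0, 0, \ldots, 0)$.

[Figure: users 1, 2, 3 with unknown channel states; a NACK on the served channel gives $r(t) = (0, 0, 0)$. ON/OFF Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ and $\delta_i$.]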
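The reward rule can be written as a one-line function. A sketch only: the name `reward_vector` is hypothetical and channels are 0-indexed here, so `served = 1` is the second channel.

```python
def reward_vector(num_users: int, served: int, is_on: bool) -> list[int]:
    """Reward vector r(t) when channel `served` (0-indexed) is served in slot t.

    ON  -> ACK:  a 1 in position `served`, zeros elsewhere.
    OFF -> NACK: the all-zero vector.
    """
    r = [0] * num_users
    if is_on:
        r[served] = 1
    return r


print(reward_vector(3, 1, True))   # [0, 1, 0]  (ACK on the second channel)
print(reward_vector(3, 1, False))  # [0, 0, 0]  (NACK)
```

Time-averaging these vectors gives the per-channel throughputs, which is why the problem has vector rather than scalar rewards.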

7
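Restless Multi-Arm Bandit with vector rewards

Let $\omega_i(t) = \Pr[S_i(t) = \text{ON}]$, given the past ACK/NACK history. If we serve channel $i$, we update:

$$\omega_i(t+1) = \begin{cases} 1 - \varepsilon_i & \text{if we get ACK} \\ \delta_i & \text{if we get NACK} \end{cases}$$

[Figure: ON/OFF Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ and $\delta_i$]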
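A minimal sketch of the observed-channel update. The name `belief_after_observation` and the parameter values ($\varepsilon_i = 0.25$, $\delta_i = 0.125$) are hypothetical.

```python
def belief_after_observation(eps: float, delta: float, ack: bool) -> float:
    """omega_i(t+1) for a channel that was served in slot t.

    An ACK reveals S_i(t) = ON, so the channel is ON next slot w.p. 1 - eps.
    A NACK reveals S_i(t) = OFF, so it is ON next slot w.p. delta.
    """
    return 1.0 - eps if ack else delta


print(belief_after_observation(0.25, 0.125, ack=True))   # 0.75
print(belief_after_observation(0.25, 0.125, ack=False))  # 0.125
```

The old belief $\omega_i(t)$ does not appear. Serving a channel resets its belief to one of only two values, which keeps the set of reachable beliefs countable.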

8
Restless Multi-Arm Bandit with vector rewards

Let $\omega_i(t) = \Pr[S_i(t) = \text{ON}]$. If we do not serve channel $i$, we update:

$$\omega_i(t+1) = \omega_i(t)(1 - \varepsilon_i) + (1 - \omega_i(t))\,\delta_i$$

[Figure: ON/OFF Markov process $S_i(t)$ for channel $i$, with transition probabilities $\varepsilon_i$ and $\delta_i$]
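Rewriting the update as $\omega_i(t+1) - \pi_i = (1 - \varepsilon_i - \delta_i)(\omega_i(t) - \pi_i)$, with $\pi_i = \delta_i/(\delta_i + \varepsilon_i)$, gives a closed form for $k$ consecutive unobserved slots. This is a derivation from the update above, not a formula stated on the slide. A sketch with hypothetical function names and parameters that checks the two forms agree:

```python
def belief_unobserved(omega: float, eps: float, delta: float) -> float:
    """omega_i(t+1) for a channel NOT served in slot t (one-step Markov prediction)."""
    return omega * (1.0 - eps) + (1.0 - omega) * delta


def belief_after_k_unobserved(omega: float, eps: float, delta: float, k: int) -> float:
    """Closed form after k unobserved slots:
    omega(t+k) = pi + (1 - eps - delta)**k * (omega(t) - pi), pi = delta / (delta + eps).
    """
    pi = delta / (delta + eps)
    return pi + (1.0 - eps - delta) ** k * (omega - pi)


# Start from 1 - eps = 0.75 (right after an ACK) and iterate five unobserved slots.
omega = 0.75
for _ in range(5):
    omega = belief_unobserved(omega, eps=0.25, delta=0.125)
print(omega, belief_after_k_unobserved(0.75, 0.25, 0.125, 5))  # the two values agree
```

As $k \to \infty$ the belief converges to the steady-state value $\pi_i$ whenever $|1 - \varepsilon_i - \delta_i| < 1$.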

9
We want to:

1) Characterize the capacity region $\Lambda$ of the system:
$$\Lambda = \{\text{all stabilizable input rate vectors } (\lambda_1, \ldots, \lambda_N)\} = \{\text{all possible time-average reward vectors}\}$$

2) Perform concave utility maximization over $\Lambda$:
$$\text{Maximize: } g(r_1, \ldots, r_N) \qquad \text{Subject to: } (r_1, \ldots, r_N) \in \Lambda$$

[Figure: queues 1, 2, 3 with arrival rates $\lambda_1$, $\lambda_2$, $\lambda_3$]
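To make goal 2 concrete, here is a toy instance. $\Lambda$ is unknown in the time-correlated case, so for illustration it is replaced by the convex hull of a few hypothetical achievable rate vectors, and $g(r) = \sum_i \log r_i$ is used as one concave utility. The solver is Frank-Wolfe (conditional gradient), a generic method swapped in here for illustration. It is not the technique of these slides, which use a drift-plus-penalty approach. The function name and vertices are hypothetical.

```python
def maximize_log_utility(vertices: list[tuple[float, ...]], iters: int = 2000) -> tuple[float, ...]:
    """Frank-Wolfe sketch: maximize g(r) = sum_i log r_i over the convex hull of `vertices`.

    Each vertex is a hypothetical achievable time-average reward vector; the
    convex hull models time sharing between the corresponding policies.
    """
    n = len(vertices[0])
    # Start at the centroid (strictly positive if every coordinate is positive in some vertex).
    r = [sum(v[i] for v in vertices) / len(vertices) for i in range(n)]
    for k in range(iters):
        grad = [1.0 / r_i for r_i in r]  # gradient of sum_i log r_i
        # Linear maximization oracle: the vertex best aligned with the gradient.
        best = max(vertices, key=lambda v: sum(g * x for g, x in zip(grad, v)))
        # Diminishing step kept below 1, so r stays strictly positive.
        step = 2.0 / (k + 3)
        r = [(1 - step) * a + step * b for a, b in zip(r, best)]
    return tuple(r)


print(maximize_log_utility([(1.0, 0.0), (0.0, 1.0)]))  # near (0.5, 0.5)
print(maximize_log_utility([(2.0, 0.0), (0.0, 1.0)]))  # near (1.0, 0.5)
```

Each oracle call only needs to score a finite set of candidate points. This is also why a region described as a convex hull of simple policies is convenient to optimize over.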

10
What is known about such systems?

1) If $(S_1(t), \ldots, S_N(t))$ is known every slot: the capacity region is known, and greedy “Max-Weight” scheduling is optimal [Tassiulas, Ephremides 1993]. The capacity region is the same, and Max-Weight remains optimal, for both i.i.d. vectors and time-correlated Markov vectors.

2) If $(S_1(t), \ldots, S_N(t))$ is unknown but i.i.d. over slots: the capacity region is known, and greedy Max-Weight decisions are optimal [Gopalan, Caramanis, Shakkottai, Allerton 2007] [Li, Neely, CDC 2007, TMC 2010].

3) If $(S_1(t), \ldots, S_N(t))$ is unknown and time-correlated: the capacity region is unknown. The problem seems to be an intractable multi-dimensional Markov Decision Problem (MDP), because current decisions affect the future probability vectors $(\omega_1(t), \ldots, \omega_N(t))$.
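A minimal sketch of the greedy Max-Weight choice in cases 1 and 2 for this single-server ON/OFF model, assuming one packet is delivered per ACK. The function names and inputs are hypothetical. In case 3 one could plug $\omega_i(t)$ into the same rule, but that ignores how the current choice changes future beliefs, which is the difficulty item 3 describes.

```python
def max_weight_known(queues: list[float], states_on: list[bool]) -> int:
    """Case 1 (states known): serve a channel maximizing Q_i(t) * S_i(t)."""
    return max(range(len(queues)), key=lambda i: queues[i] * (1 if states_on[i] else 0))


def max_weight_iid_unknown(queues: list[float], p_on: list[float]) -> int:
    """Case 2 (states unknown, i.i.d. over slots): maximize Q_i(t) * Pr[S_i(t) = ON]."""
    return max(range(len(queues)), key=lambda i: queues[i] * p_on[i])


print(max_weight_known([5, 3, 9], [True, True, False]))     # 0: the longest ON queue
print(max_weight_iid_unknown([5, 3, 9], [0.5, 0.9, 0.25]))  # 1: 3 * 0.9 = 2.7 is largest
```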

11
Our Contributions:
1) We construct an operational capacity region (inner bound).
2) We construct a novel frame-based technique for utility maximization over this region.

12
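Assume channels are positively correlated: $\varepsilon_i + \delta_i \le 1$.

[Figure: ON/OFF Markov chain for channel $i$, and a plot of $\omega_i(t)$ versus $t$ with levels $1 - \varepsilon_i$ (after ACK) and $\delta_i$ (after NACK)]

- After “ACK”: $\omega_i(t) \ge$ steady-state $\Pr[S_i(t) = \text{ON}] = \delta_i / (\delta_i + \varepsilon_i)$.
- After “NACK”: $\omega_i(t) \le$ steady-state $\Pr[S_i(t) = \text{ON}] = \delta_i / (\delta_i + \varepsilon_i)$.

This gives good intuition for scheduling decisions. For the special case of channel symmetry ($\varepsilon_i = \varepsilon$, $\delta_i = \delta$ for all $i$), “round-robin” maximizes the sum output rate [Ahmad, Liu, Javidi, Zhao, Krishnamachari, Trans. IT 2009].

How can we use this intuition to construct a capacity region (for possibly asymmetric channels)?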
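The one-step update can be rewritten as $\omega_i(t+1) - \pi_i = (1 - \varepsilon_i - \delta_i)(\omega_i(t) - \pi_i)$ with $\pi_i = \delta_i/(\delta_i + \varepsilon_i)$. So when $\varepsilon_i + \delta_i \le 1$ the belief stays on the same side of $\pi_i$ and relaxes toward it geometrically. A sketch checking this after an ACK, with hypothetical parameters and function name:

```python
def belief_trajectory_after_ack(eps: float, delta: float, slots: int) -> list[float]:
    """omega_i over `slots` slots following an ACK, with the channel left unobserved."""
    omega = 1.0 - eps  # belief right after an ACK
    traj = [omega]
    for _ in range(slots - 1):
        omega = omega * (1.0 - eps) + (1.0 - omega) * delta  # unobserved update
        traj.append(omega)
    return traj


eps, delta = 0.25, 0.125         # positively correlated: eps + delta <= 1
pi = delta / (delta + eps)       # steady-state Pr[ON] = 1/3
traj = belief_trajectory_after_ack(eps, delta, 10)
print(all(w >= pi for w in traj))                   # True: stays above steady state
print(all(a >= b for a, b in zip(traj, traj[1:])))  # True: decreases toward pi
```

The NACK case is symmetric: starting from $\delta_i \le \pi_i$, the belief stays below $\pi_i$ and increases toward it.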

13
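Inner Bound $\Lambda_{\text{int}}$ on $\Lambda$ (“Operational Capacity Region”):

[Figure: queues $1, 2, \ldots, N$ with arrival rates $\lambda_1, \lambda_2, \ldots, \lambda_N$; a variable-length frame that serves channels 3, 1, 7, 4 in turn]

Every frame, randomly pick a subset of channels and an ordering, according to some probability distribution over the $\approx N!\,2^N$ choices.

$\Lambda_{\text{int}}$ = convex hull of the rate vectors of all randomized round-robin policies.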
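Why randomizing over frame-based policies yields a convex hull: if each frame independently runs policy $k$ with probability $p_k$, the renewal-reward theorem gives the rate $\sum_k p_k R_k / \sum_j p_j T_j$. This equals $\sum_k w_k (R_k/T_k)$ with $w_k = p_k T_k / \sum_j p_j T_j$, where $T_k$ is the expected frame length and $R_k$ the expected reward vector per frame of policy $k$. The sketch below checks this identity. The per-policy numbers are hypothetical and are not derived from the channel model.

```python
def mixture_rate(probs, frame_lengths, frame_rewards):
    """Long-run reward rate when each frame independently runs policy k w.p. probs[k].

    frame_lengths[k] -- expected frame length E[T_k] (slots) of policy k
    frame_rewards[k] -- expected reward vector collected in one frame of policy k
    Renewal-reward: rate = E[reward per frame] / E[frame length].
    """
    mean_T = sum(p * T for p, T in zip(probs, frame_lengths))
    n = len(frame_rewards[0])
    return [sum(p * R[i] for p, R in zip(probs, frame_rewards)) / mean_T for i in range(n)]


def as_convex_combination(probs, frame_lengths, frame_rewards):
    """The same rate written as sum_k w_k * (R_k / T_k), with w_k = p_k T_k / sum_j p_j T_j."""
    mean_T = sum(p * T for p, T in zip(probs, frame_lengths))
    weights = [p * T / mean_T for p, T in zip(probs, frame_lengths)]
    n = len(frame_rewards[0])
    return [sum(w * R[i] / T for w, R, T in zip(weights, frame_rewards, frame_lengths))
            for i in range(n)]


# Two hypothetical frame policies (made-up numbers):
probs = [0.5, 0.5]
lengths = [4.0, 2.0]
rewards = [(2.0, 1.0), (0.0, 1.5)]
print(mixture_rate(probs, lengths, rewards))           # approx [0.3333, 0.4167]
print(as_convex_combination(probs, lengths, rewards))  # the same vector
```

Any target weights $w_k$ can be reached by choosing $p_k \propto w_k / T_k$. Randomizing over frames therefore reaches every point in the convex hull of the individual policies' rate vectors.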

14
Inner Bound Properties:
- The bound contains a huge number of policies.
- It touches the true capacity boundary as $N \to \infty$.
- It is a good bound even for $N = 2$.
- Efficient algorithms can be obtained for optimizing over this region! Let’s see how…

15
New Lyapunov Drift Analysis Technique:
- Lyapunov function: $L(t) = \sum_i Q_i(t)^2$
- $T$-slot drift for frame $k$: $\Delta[k] = L(t[k] + T[k]) - L(t[k])$
- New drift-plus-penalty ratio method on each frame:
$$\text{Minimize: } \frac{\mathbb{E}\{\Delta[k] + V \times \text{Penalty}[k] \mid Q(t[k])\}}{\mathbb{E}\{T[k] \mid Q(t[k])\}}$$

[Figure: variable-length frame $k$ from $t[k]$ to $t[k] + T[k]$, serving channels 3, 1, 7, 4]
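With a finite menu of candidate frame policies, minimizing the ratio reduces to evaluating it for each candidate. The sketch below works in that spirit and is not the authors' algorithm. It replaces $\mathbb{E}\{\Delta[k] \mid Q(t[k])\}$ by the standard upper bound $B + 2\sum_i Q_i(t[k])\,(\lambda_i \mathbb{E}\{T[k]\} - \mathbb{E}\{D_i[k]\})$, which follows from $L(t) = \sum_i Q_i(t)^2$ when arrivals are i.i.d. with rates $\lambda_i$ and independent of the frame length. Here $D_i[k]$ counts departures from queue $i$ during the frame. The constant $B$, the per-policy statistics, and the dictionary keys are hypothetical inputs.

```python
def choose_frame_policy(Q, lam, V, B, policies):
    """Pick the frame policy minimizing a drift-plus-penalty ratio bound.

    Q        -- backlogs Q_i(t[k]) at the start of frame k
    lam      -- arrival rates lambda_i (per slot)
    V        -- penalty weight
    B        -- constant from the drift bound (assumed known)
    policies -- list of dicts with hypothetical keys:
                'T'       expected frame length E[T]
                'D'       expected departures from each queue during one frame
                'penalty' expected penalty accumulated during one frame
    """
    def ratio(pol):
        # Upper bound on E[drift | Q]: B + 2 * sum_i Q_i * (lambda_i * E[T] - E[D_i]).
        drift = B + 2 * sum(q * (l * pol['T'] - d) for q, l, d in zip(Q, lam, pol['D']))
        return (drift + V * pol['penalty']) / pol['T']

    return min(range(len(policies)), key=lambda k: ratio(policies[k]))


pols = [{'T': 2.0, 'D': [1.5, 0.0], 'penalty': 0.0},
        {'T': 2.0, 'D': [0.0, 1.5], 'penalty': 0.0}]
print(choose_frame_policy([10.0, 0.0], [0.3, 0.3], V=1.0, B=1.0, policies=pols))  # 0
```

Dividing by the expected frame length lets policies with different frame lengths be compared per slot. A long frame that serves a lot is not automatically preferred over a short, efficient one.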

16
Tassiulas, Ephremides 1990, 1992, 1993 (queue stability)

17
Neely, Modiano 2003, 2005 (queue stability + utility optimization)

18
Li, Neely 2010 (queue stability + utility optimization for variable frames)

20
Conclusions:
- Multi-Armed Bandit Problem with Reward Vectors (complex MDP).
- Operational Capacity Region = Convex Hull over Frame-Based Randomized Round-Robin Policies.
- Stochastic Network Optimization via the Drift-Plus-Penalty Ratio method.

Quick Advertisement: New Book: M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
- PDF also available from the “Synthesis Lecture Series” (on digital library). Link available on Mike Neely’s homepage.
- Lyapunov Optimization theory (including renewal system problems).
- Detailed Examples and Problem Set Questions.
