Dynamic Optimization and Learning for Renewal Systems
Michael J. Neely, University of Southern California
Asilomar Conference on Signals, Systems, and Computers

Dynamic Optimization and Learning for Renewal Systems
Michael J. Neely, University of Southern California
Asilomar Conference on Signals, Systems, and Computers, Nov.
PDF of paper at:
Sponsored in part by the NSF Career grant CCF and the ARL Network Science Collaborative Tech. Alliance.
[Figure: timeline of renewal frames T[0], T[1], T[2]; a network coordinator with transmit/receive (T/R) nodes handling Task 1, Task 2, Task 3]

A General Renewal System
[Figure: timeline divided into renewal frames of durations T[0], T[1], T[2], with corresponding penalty vectors y[0], y[1], y[2]]
Renewal frames r in {0, 1, 2, …}.
π[r] = policy chosen on frame r.
P = abstract policy space (π[r] in P for all r).
Policy π[r] affects the frame size and penalty vector on frame r. These are random functions of π[r] (their distributions depend on π[r]):
y[r] = [y_0(π[r]), y_1(π[r]), …, y_L(π[r])] = penalty vector
T[r] = T(π[r]) = frame duration
Example realizations over successive frames: y[r] = [1.2, 1.8, …, 0.4] with T[r] = 8.1; then y[r] = [0.0, 3.8, …, -2.0] with T[r] = 12.3; then y[r] = [1.7, 2.2, …, 0.9] with T[r] = 5.6.

Example 1: Opportunistic Scheduling
All frames are one slot long.
S[r] = (S_1[r], S_2[r], S_3[r]) = channel states for slot r.
Policy π[r]: on frame r, first observe S[r], then choose a channel to serve (i.e., π[r] in {1, 2, 3}).
Example objectives: throughput, energy, fairness, etc.

Example 2: Markov Decision Problems
M(t) = recurrent Markov chain (continuous or discrete time).
Renewals are defined as recurrences to state 1.
T[r] = random inter-renewal frame size (frame r).
y[r] = penalties incurred over frame r.
π[r] = policy that affects the transition probabilities over frame r.
Objective: minimize the time average of one penalty subject to time average constraints on the others.

Example 3: Task Processing over Networks
[Figure: network coordinator with transmit/receive (T/R) nodes processing an incoming sequence of tasks (Task 1, Task 2, Task 3, …)]
Infinite sequence of tasks, e.g., query sensors and/or perform computations.
Renewal frame r = processing time for frame r.
Policy types:
Low level: specify transmission decisions over the network.
High level: {Backpressure1, Backpressure2, Shortest Path}.
Example objective: maximize quality of information per unit time subject to per-node power constraints.

Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame average of y_0[r] over the first R frames:
(1/R) Σ_{r=0}^{R-1} y_0[r]
The time average of y_0[r] (per unit time, rather than per frame) is then:
[Σ_{r=0}^{R-1} y_0[r]] / [Σ_{r=0}^{R-1} T[r]], taken as R → ∞.
*If behavior is i.i.d. over frames, by the LLN this is the same as E{y_0}/E{T}.

Pop Quiz: (10 points)
Let y_0[r] = energy expended on frame r. Time avg. power = (total energy used)/(total time). Suppose (for simplicity) behavior is i.i.d. over frames. To minimize time average power, which one should we minimize: (a) the ratio E{y_0[r]}/E{T[r]}, or (b) the expectation E{y_0[r]} alone?
Answer: the ratio E{y_0[r]}/E{T[r]}. By renewal-reward theory, time average power equals E{y_0}/E{T}, so minimizing the per-frame energy alone ignores the frame length.
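To make the distinction concrete, here is a minimal simulation sketch (the two policies and their numbers are hypothetical, chosen only for illustration): a policy can have a larger per-frame energy E{y_0} yet a smaller time average power E{y_0}/E{T}.

```python
import random

def time_avg_power(policy, frames=100_000, seed=0):
    """Time average power = (total energy)/(total time) over i.i.d. frames."""
    rng = random.Random(seed)
    total_energy = total_time = 0.0
    for _ in range(frames):
        energy, duration = policy(rng)
        total_energy += energy
        total_time += duration
    return total_energy / total_time

# Policy A: E{y_0} = 1.0 per frame, E{T} = 1.0  =>  ratio 1.0.
policy_a = lambda rng: (rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5))
# Policy B: E{y_0} = 1.5 per frame, E{T} = 3.0  =>  ratio 0.5.
policy_b = lambda rng: (rng.uniform(1.0, 2.0), rng.uniform(2.0, 4.0))

print(time_avg_power(policy_a))  # ~1.0
print(time_avg_power(policy_b))  # ~0.5: higher E{y_0}, lower time avg power
```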

Two General Problem Types (write ȳ_l for the time average of y_l per unit time, as above):
1) Minimize a time average subject to time average constraints:
Minimize: ȳ_0
Subject to: ȳ_l ≤ c_l for all l in {1, …, L}, with π[r] in P for all r.
2) Maximize a concave function φ(x_1, …, x_L) of the time averages:
Maximize: φ(ȳ_1, …, ȳ_L)
Subject to: ȳ_l ≤ c_l for all l in {1, …, L}, with π[r] in P for all r.

Solving the Problem (Type 1):
Define a "virtual queue" Z_l[r] for each inequality constraint ȳ_l ≤ c_l, with arrivals y_l[r] and service c_l T[r] on each frame:
Z_l[r+1] = max[Z_l[r] – c_l T[r] + y_l[r], 0]
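As a minimal sketch (function and variable names are mine, not from the slides), the per-frame virtual queue update is:

```python
def update_virtual_queues(Z, y, T, c):
    """One frame of updates: Z_l <- max(Z_l - c_l*T + y_l, 0) for l = 1..L.

    Z, y, c are length-L lists (queue backlogs, penalties, constraint levels);
    T is the realized frame duration. If every Z_l remains bounded, the time
    average of y_l per unit time is at most c_l.
    """
    return [max(Z_l - c_l * T + y_l, 0.0) for Z_l, y_l, c_l in zip(Z, y, c)]
```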

Lyapunov Function and "Drift-Plus-Penalty Ratio":
Scalar measure of queue sizes:
L[r] = Z_1[r]^2 + Z_2[r]^2 + … + Z_L[r]^2
Δ(Z[r]) = E{L[r+1] – L[r] | Z[r]} = "frame-based Lyapunov drift"
Algorithm technique: every frame r, observe Z_1[r], …, Z_L[r]. Then choose a policy π[r] in P to minimize the "drift-plus-penalty ratio":
[Δ(Z[r]) + V E{y_0[r] | Z[r]}] / E{T[r] | Z[r]}

The Algorithm Becomes:
Observe Z[r] = (Z_1[r], …, Z_L[r]). Choose π[r] in P to minimize:
[Δ(Z[r]) + V E{y_0[r] | Z[r]}] / E{T[r] | Z[r]}
Then update the virtual queues:
Z_l[r+1] = max[Z_l[r] – c_l T[r] + y_l[r], 0]
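Here is a minimal sketch of one step of this algorithm for a finite policy space, assuming the per-policy expectations E{y_l} and E{T} are known (names and data layout are my own; the learning slides later replace these expectations with sample averages). It minimizes the standard upper bound on the drift-plus-penalty ratio:

```python
def dpp_ratio_step(Z, policies, V, c):
    """Pick the policy minimizing
       [ sum_l Z_l * E{y_l - c_l*T} + V * E{y_0} ] / E{T},
    the usual bound on the drift-plus-penalty ratio.

    Each policy is a dict: p["Ey"][0] = E{y_0}, p["Ey"][l] = E{y_l} for
    l = 1..L, and p["ET"] = E{T}.
    """
    def ratio(p):
        drift_bound = sum(
            Z[l] * (p["Ey"][l + 1] - c[l] * p["ET"]) for l in range(len(Z))
        )
        return (drift_bound + V * p["Ey"][0]) / p["ET"]
    return min(policies, key=ratio)

# Hypothetical usage: two policies and one constraint with c_1 = 0.25.
policies = [
    {"Ey": [2.0, 0.2], "ET": 1.0},
    {"Ey": [1.0, 0.5], "ET": 2.0},
]
print(dpp_ratio_step(Z=[0.0], policies=policies, V=10.0, c=[0.25]))
```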

Theorem: Assume the constraints are feasible. Then, for all frames r in {1, 2, 3, …}, minimizing the drift-plus-penalty ratio
[Δ(Z[r]) + V E{y_0[r] | Z[r]}] / E{T[r] | Z[r]}
on every frame achieves:
(a) All time average constraints ȳ_l ≤ c_l are satisfied (all virtual queues are stable).
(b) The time average of y_0 is within O(1/V) of the optimal value.

Solving the Problem (Type 2):
We reduce it to a problem with the structure of Type 1 via:
Auxiliary variables γ[r] = (γ_1[r], …, γ_L[r]).
The following variation on Jensen's inequality: for any concave function φ(x_1, …, x_L) and any (arbitrarily correlated) vector of random variables (X_1, X_2, …, X_L, T), where T > 0, we have:
E{T φ(X_1, …, X_L)} / E{T} ≤ φ( E{T X_1}/E{T}, …, E{T X_L}/E{T} )
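A short derivation sketch of this inequality, which I am adding for clarity (it is weighted Jensen with the T-tilted distribution):

```latex
% Since T > 0, dQ = (T/\mathbb{E}\{T\})\,dP defines a probability measure.
% Ordinary Jensen under Q, for concave \varphi, then gives:
\frac{\mathbb{E}\{T\,\varphi(X_1,\dots,X_L)\}}{\mathbb{E}\{T\}}
  = \mathbb{E}_Q\{\varphi(X_1,\dots,X_L)\}
  \le \varphi\big(\mathbb{E}_Q\{X_1\},\dots,\mathbb{E}_Q\{X_L\}\big)
  = \varphi\!\left(\frac{\mathbb{E}\{T X_1\}}{\mathbb{E}\{T\}},\dots,
                   \frac{\mathbb{E}\{T X_L\}}{\mathbb{E}\{T\}}\right)
```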

The Algorithm (Type 2) Becomes:
On frame r, observe the virtual queues Z[r] = (Z_1[r], …, Z_L[r]) and G[r] = (G_1[r], …, G_L[r]).
(Auxiliary variables) Choose γ_1[r], …, γ_L[r] to maximize the deterministic problem:
V φ(γ_1[r], …, γ_L[r]) – Σ_l G_l[r] γ_l[r]
(Policy selection) Choose π[r] in P to minimize the drift-plus-penalty ratio, as in Type 1 but with the G_l[r] queues included (the y_l[r] penalties now enter with weight Z_l[r] – G_l[r], consistent with the updates below).
Then update the virtual queues:
Z_l[r+1] = max[Z_l[r] – c_l T[r] + y_l[r], 0]
G_l[r+1] = max[G_l[r] + γ_l[r]T[r] – y_l[r], 0]
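A minimal sketch of the auxiliary-variable step under this reading of the slide (a plain grid search over a candidate box; for many concave φ this separable maximization has a closed form instead, and all names here are mine):

```python
import itertools

def choose_aux(G, phi, V, grid):
    """Choose gamma maximizing V*phi(gamma) - sum_l G_l*gamma_l.

    G:    current backlogs (G_1, ..., G_L) of the auxiliary queues.
    phi:  concave utility function of L arguments.
    grid: list of L lists of candidate values for each gamma_l.
    """
    return max(
        itertools.product(*grid),
        key=lambda g: V * phi(*g) - sum(G_l * g_l for G_l, g_l in zip(G, g)),
    )
```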

Example Problem – Task Processing:
[Figure: network coordinator with T/R nodes and Tasks 1, 2, 3; each frame r consists of a setup phase, a transmit phase, and an idle period I[r]]
Every task reveals random task parameters η[r]:
η[r] = [(qual_1[r], T_1[r]), (qual_2[r], T_2[r]), …, (qual_5[r], T_5[r])]
Choose π[r] = [which node transmits, how much idle time] in {1, 2, 3, 4, 5} x [0, I_max].
Transmissions incur power. We use a quality distribution that tends to be better for higher-numbered nodes.
Objective: maximize quality/time subject to p_av ≤ 0.25 for all nodes.

Minimizing the Drift-Plus-Penalty Ratio:
Minimizing a pure expectation, rather than a ratio, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming). So, for a scalar θ, define the pure-expectation problem of minimizing E{(numerator of the ratio) – θ T[r]} over π in P.
"Bisection Lemma": the minimum value of this pure expectation is decreasing in θ and equals zero when θ equals the optimal ratio value; its sign therefore indicates whether θ is below or above the optimum, so the optimal ratio can be found by bisection on θ.
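A minimal sketch of that bisection (my own illustration; `min_expectation(theta)` stands for whatever routine solves the pure-expectation problem and returns its optimal value, and the initial bracket [lo, hi] is assumed to contain the optimal ratio):

```python
def bisect_optimal_ratio(min_expectation, lo=0.0, hi=100.0, tol=1e-6):
    """Find theta* where f(theta*) = 0, with
       f(theta) = min over pi in P of E{ numerator(pi) - theta * T(pi) }.
    Since T > 0, f is decreasing in theta: f(theta) > 0 means theta is below
    the optimal ratio, f(theta) < 0 means it is above.
    """
    while hi - lo > tol:
        theta = (lo + hi) / 2.0
        if min_expectation(theta) > 0:
            lo = theta  # optimal ratio lies above theta
        else:
            hi = theta  # optimal ratio lies at or below theta
    return (lo + hi) / 2.0
```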

Learning via Sampling from the Past:
Suppose the randomness is characterized by W past samples {η_1, η_2, …, η_W}.
We want to compute an expectation over the unknown random distribution of η, of the form E{g(π, η)}.
Approximate this via the W samples from the past:
(1/W) Σ_{w=1}^{W} g(π, η_w)
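As a sketch with hypothetical names, the resulting learning step minimizes the empirical average in place of the true expectation:

```python
def empirical_min(policies, samples, g):
    """Approximate min over pi in P of E{g(pi, eta)} from past samples.

    policies: iterable of candidate policies pi.
    samples:  the W stored realizations [eta_1, ..., eta_W].
    g:        per-sample cost, e.g. g(pi, eta) = y_0(pi, eta) - theta*T(pi, eta).
    """
    def empirical_cost(pi):
        return sum(g(pi, eta) for eta in samples) / len(samples)
    best = min(policies, key=empirical_cost)
    return best, empirical_cost(best)
```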

Simulation:
[Figure: quality of information per unit time versus sample size W, comparing the "Drift-Plus-Penalty Ratio Algorithm with Bisection" against an "Alternative Algorithm with Time Averaging"]

Concluding Sims (values for W = 10):
[Table: concluding simulation values for W = 10]
Quick advertisement: new book:
M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010. PDF also available from the "Synthesis Lectures" series (on the digital library).
Covers Lyapunov optimization theory (including these renewal system problems), with detailed examples and problem set questions.