
Sergiy Nesterko, Alice Gao

Outline
Introduction
Problem
Examples of applications

Introduction: the problem
Online decision/prediction: each period, we must pick an expert and follow its "advice".
We incur the cost associated with the expert we have picked.
The goal is to devise a strategy whose total cost is not much larger than the minimum total cost of any single expert.

Online decision problems
Shortest paths
Tree update problem
Spam prediction
Portfolio selection
Adaptive Huffman coding
etc.

Why not pick the best performing expert every time?
Suppose there are two experts with cost sequence (0,1), (1,0), (0,1), ...
With adversarial tie-breaking, following the current leader incurs a cost of about t by time t, whereas the best single expert incurs a cost of only about t/2.
The gap is aggravated when there are more experts; any deterministic rule is prone to adversarial action.
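The failure mode above can be checked in a few lines (a sketch; breaking ties toward the expert about to incur cost 1 is the adversarial assumption needed to realize the worst case):

```python
# Follow-the-leader (FTL) on the alternating cost sequence (0,1), (1,0), ...
# Ties are broken adversarially (toward the expert about to pay), which is
# what makes the deterministic leader rule pay in every round.

T = 1000
totals = [0.0, 0.0]  # cumulative cost of each expert
ftl_cost = 0.0

for t in range(T):
    costs = (0, 1) if t % 2 == 0 else (1, 0)
    # current leader: lowest cumulative cost, ties broken toward expert 1
    leader = 1 if totals[1] <= totals[0] else 0
    ftl_cost += costs[leader]
    totals = [totals[i] + costs[i] for i in range(2)]

print(ftl_cost)     # ~T: FTL pays almost every round
print(min(totals))  # ~T/2: the best single expert in hindsight
```

Running this with T = 1000 shows FTL paying on every round while each expert's own total is only T/2.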

Follow the perturbed leader
The main topic of the first paper we are considering today.
Differs from weighted majority in the way randomness is introduced.
Applies to a broader set of problems (for example, the tree update problem).
Is arguably more elegant.
The underlying idea is the same, however: give the leader(s) a better chance of being selected, and be random in your choice.

The algorithm, intuitive version
At time t, for each expert i, draw a perturbation p_t[i] ~ Exp(ε).
Choose the expert with minimal c_t[i] - p_t[i], where c_t[i] is the total cost of expert i so far.
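One round of this rule can be sketched as follows (the function name and the statistical check are mine; the paper tunes eps from the time horizon and cost range):

```python
import random

def fpl_choice(cum_costs, eps):
    """Follow the perturbed leader, one round: draw an independent
    Exp(eps) perturbation per expert and pick the expert minimizing
    cumulative cost minus perturbation."""
    perturbed = [c - random.expovariate(eps) for c in cum_costs]
    return min(range(len(cum_costs)), key=lambda i: perturbed[i])

random.seed(0)
# With a large cost gap the leader is chosen with high probability,
# but never deterministically -- that is the point of the perturbation.
picks = [fpl_choice([0.0, 10.0], 1.0) for _ in range(1000)]
print(picks.count(0))  # expert 0 wins almost all 1000 rounds
```

The exponential perturbation makes the choice random yet heavily biased toward the current leader, which is exactly the intuition stated above.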

Example: online shortest path problem
Choose a path from vertex a to vertex b in a graph so as to minimize travel time.
Each round, we must pick a path from a to b; only afterwards do we learn how much time was spent on each edge.
Online version: treat all possible paths as experts.

Online shortest path algorithm
Initially assign travel time 0 to every edge.
At every time t and for every edge j, draw an exponential perturbation p_t[j] and assign edge j the weight c_t[j] - p_t[j], where c_t[j] is the total time observed on edge j so far.
Pick the path with the smallest total perturbed travel time.
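A toy sketch of the edge-perturbation rule (brute-force path enumeration stands in for a shortest-path subroutine, since perturbed weights can be negative; the graph, names, and eps value are illustrative assumptions):

```python
import random

def all_paths(graph, src, dst, path=None):
    """Enumerate simple paths in a small directed graph {u: [v, ...]}."""
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for v in graph.get(src, []):
        if v not in path:
            yield from all_paths(graph, v, dst, path)

def fpl_shortest_path(graph, cum_time, eps, src, dst):
    """One FPL round: subtract an independent Exp(eps) draw from each
    edge's cumulative travel time, then return the path minimizing the
    perturbed total. Enumerating paths is only viable on toy graphs."""
    perturbed = {e: c - random.expovariate(eps) for e, c in cum_time.items()}
    cost = lambda p: sum(perturbed[(u, v)] for u, v in zip(p, p[1:]))
    return min(all_paths(graph, src, dst), key=cost)

random.seed(1)
graph = {"a": ["b", "c"], "c": ["b"]}  # two routes from a to b
cum_time = {("a", "b"): 5.0, ("a", "c"): 1.0, ("c", "b"): 1.0}
path = fpl_shortest_path(graph, cum_time, 1.0, "a", "b")
print(path)  # usually the historically cheaper route via c
```

Note the efficiency point implicit in the slide: the perturbation is per edge, not per path, so a standard shortest-path computation can replace the enumeration when weights stay nonnegative.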

The experts problem: why following the perturbed leader works
To build intuition, we can assume a single perturbation p[i] is drawn for each expert and reused in all periods.
If so, expert i is the leader exactly when p[i] > v, for some threshold v that depends on the other experts' costs and perturbations.
After expert i incurs an additional cost c[i], it remains the leader only if p[i] > v + c[i].
We can then bound the probability that i stays the leader:
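The bound referred to here can be reconstructed from the memoryless property of the exponential distribution (a sketch, assuming perturbations p[i] ~ Exp(ε) with density ε e^{-εx}):

```latex
\Pr\big(p[i] > v + c[i] \,\big|\, p[i] > v\big)
  = \frac{e^{-\varepsilon (v + c[i])}}{e^{-\varepsilon v}}
  = e^{-\varepsilon\, c[i]}
  \;\ge\; 1 - \varepsilon\, c[i]
```

So with probability at least 1 - ε c[i] the leader does not change, which is what drives the low-regret analysis.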

Follow the regularized leader (FTRL)
Similar to the follow-the-perturbed-leader algorithm.
Instead of adding a randomized perturbation, add a regularizer function to stabilize the decisions made, which leads to low regret.
Choose the decision vector that minimizes cumulative cost + regularization term.
The resulting regret bound is sublinear, so the average regret -> 0 as T -> +infinity.
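The regret bound itself did not survive extraction; a standard FTRL bound (a reconstruction under the usual assumptions: convex costs f_t, regularizer R, learning rate η, decisions x_t in K) reads:

```latex
\mathrm{Regret}_T \;\le\; \sum_{t=1}^{T} \big( f_t(x_t) - f_t(x_{t+1}) \big)
  \;+\; \frac{1}{\eta}\Big( \max_{u \in K} R(u) - \min_{u \in K} R(u) \Big)
```

Balancing the stability term against the range of the regularizer, e.g. with η proportional to 1/√T, gives regret on the order of √T, hence vanishing average regret.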

Main idea for proving the regret bound:
The hypothetical Be-The-Leader (BTL) algorithm has no regret.
If FTRL chooses decisions close to BTL's, then FTRL has low regret.
Tradeoff in choosing a regularizer:
If the range of the regularizer is too small, we cannot achieve sufficient stability.
If the range of the regularizer is too large, we are too far from choosing the optimal decision.
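The "BTL has no regret" claim is the standard induction on T (a sketch): the hypothetical player that, in round t, already plays the leader x_{t+1} = argmin_{x in K} Σ_{s ≤ t} f_s(x) satisfies

```latex
\sum_{t=1}^{T} f_t(x_{t+1}) \;\le\; \min_{u \in K} \sum_{t=1}^{T} f_t(u)
```

so any algorithm whose decisions stay close to x_{t+1} inherits low regret.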

Weighted majority
Can be interpreted as an FTRL algorithm with an entropic regularizer and a multiplicative update rule.
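The regularizer and update rule referred to here are standard (a reconstruction, assuming the probability simplex as decision set and linear costs c_t): the negative-entropy regularizer and the multiplicative-weights update,

```latex
R(x) = \sum_{i} x_i \ln x_i, \qquad
x_{t+1}[i] \;=\; \frac{x_t[i]\; e^{-\eta\, c_t[i]}}{\sum_{j} x_t[j]\; e^{-\eta\, c_t[j]}}
```

which recovers the familiar exponential down-weighting of costly experts.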

Gradient descent
Can be interpreted as an FTRL algorithm with the squared Euclidean norm as regularizer and a projected-gradient update rule.
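For this case the regularizer is R(x) = ||x||²/2 and the update is a gradient step followed by projection back onto the decision set. A minimal sketch (the box-shaped decision set and all names are illustrative assumptions, chosen so the projection is one line):

```python
def ogd_step(x, grad, eta, lo=-1.0, hi=1.0):
    """One online-gradient-descent update, i.e. FTRL with the
    squared-norm regularizer: step against the last cost's gradient,
    then project each coordinate back onto the box [lo, hi]."""
    return [min(hi, max(lo, xi - eta * gi)) for xi, gi in zip(x, grad)]

x = [0.5, -0.5]
x = ogd_step(x, [1.0, -1.0], eta=0.1)
print(x)  # moved toward lower cost: roughly [0.4, -0.4]
```

For general convex sets the clipping is replaced by a Euclidean projection onto K, but the structure of the update is the same.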

Online convex optimization
At iteration t, the decision maker chooses x_t in a convex set K.
A convex cost function f_t : K -> R is then revealed, and the player incurs the cost f_t(x_t).
The regret of algorithm A at time T is the total cost incurred minus the cost of the best single decision in hindsight.
Goal: regret sublinear in T, i.e. in terms of average per-period regret, the algorithm performs as well as the best single decision in hindsight.
Examples: the experts problem, online shortest paths.
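In symbols, the regret definition and the goal just stated (using the notation above):

```latex
\mathrm{Regret}_T(\mathcal{A}) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x),
\qquad \mathrm{Regret}_T(\mathcal{A}) = o(T)
```

so dividing by T, the average per-round cost approaches that of the best fixed decision.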
