Follow the regularized leader

1 Follow the regularized leader
Sergiy Nesterko, Alice Gao

2 Outline
Introduction
  Problem
  Examples of applications
Follow the ??? leader
  Follow the leader
  Follow the perturbed leader
  Follow the regularized leader
Online learning algorithms
  Weighted majority
  Gradient descent
Online convex optimization

3 Introduction - problem
Online decision/prediction: each period we must pick an expert and follow its advice, incurring the cost associated with the expert we picked. The goal is to devise a strategy whose total cost is not much larger than the minimum total cost of any single expert.

4 Online decision problems
Shortest paths, the tree update problem, spam prediction, portfolio selection, adaptive Huffman coding, etc.

5 Why not pick the best performing expert every time?
Suppose there are two experts with cost sequence (0,1), (1,0), (0,1), ... With adversarial tie-breaking, picking the leader every time incurs cost 1 each period, for a total cost of t at time t, whereas the best single expert would have incurred a cost of about t/2. The problem is aggravated when there are more experts, and the deterministic rule is prone to adversarial manipulation.
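The failure above can be checked directly. This is a minimal sketch (the function name and the adversarial tie-breaking rule are our own illustration): follow-the-leader on the alternating cost sequence, with ties broken against the algorithm.

```python
# Follow-the-(unperturbed)-leader on the alternating sequence from the
# slide.  With adversarial tie-breaking, FTL pays 1 every round, while
# the best single expert pays only about t/2.

def follow_the_leader(costs, tie_break=1):
    totals = [0.0, 0.0]
    paid = 0.0
    for c in costs:
        # leader = expert with smallest cumulative cost so far;
        # ties broken adversarially toward `tie_break`
        if totals[0] < totals[1]:
            leader = 0
        elif totals[1] < totals[0]:
            leader = 1
        else:
            leader = tie_break
        paid += c[leader]
        totals[0] += c[0]
        totals[1] += c[1]
    return paid, min(totals)

T = 100
costs = [(0, 1) if t % 2 == 0 else (1, 0) for t in range(T)]
ftl_cost, best_expert_cost = follow_the_leader(costs)
```

Running this gives FTL a total cost of T while the best expert pays T/2, matching the slide's claim.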

6 Instead, follow the perturbed leader
This is the main topic of the first paper we are considering today. It differs from weighted majority in the way randomness is introduced, applies to a broader set of problems (for example, the tree update problem), and is arguably more elegant. The underlying idea is the same, however: give the leader(s) a higher chance of being selected, and be random in your choice.

7 The algorithm, intuitive version
At time t, for each expert i, draw a perturbation p_t[i] ~ Exp(eps). Choose the expert minimizing c_t[i] - p_t[i], where c_t[i] is the total cost of expert i so far.
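The rule above can be sketched in a few lines. This is our own minimal illustration (function names and the choice eps = 0.1 are assumptions, not from the slides), run on the adversarial sequence from slide 5:

```python
import random

# Follow-the-perturbed-leader: each round, perturb every expert's
# cumulative cost with a fresh Exp(eps) draw and follow the minimizer
# of c_t[i] - p_t[i].

def fpl_choose(cumulative_costs, eps, rng):
    perturbed = [c - rng.expovariate(eps) for c in cumulative_costs]
    return min(range(len(perturbed)), key=perturbed.__getitem__)

rng = random.Random(0)
cum = [0.0, 0.0]
paid = 0.0
for t in range(1000):
    cost = (0.0, 1.0) if t % 2 == 0 else (1.0, 0.0)  # adversarial sequence
    i = fpl_choose(cum, eps=0.1, rng=rng)
    paid += cost[i]
    cum = [cum[0] + cost[0], cum[1] + cost[1]]
```

On this sequence the randomized choice pays roughly T/2 in expectation instead of T, i.e. it tracks the best expert up to lower-order terms.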

8 Example: online shortest path problem
Choose a path from vertex a to vertex b in a graph so as to minimize travel time. Each period we must commit to a path from a to b, and only afterwards do we learn how much time was spent on each edge. In the online version, every possible path is treated as an expert.

9 Online shortest path algorithm
Initially assign travel time 0 to all edges. At every time t, for every edge j, draw a perturbation p_t[j] ~ Exp(eps) and assign edge j the weight c_t[j] - p_t[j], where c_t[j] is the total time observed on edge j so far. Pick the path with the smallest total perturbed travel time.
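The edge-level version of the algorithm can be sketched on a toy graph. Everything here (the 4-edge graph with vertices a, u, v, b, the adversarial timing sequence, and eps = 0.1) is our own hypothetical example, not from the slides:

```python
import random

# FPL on a tiny online shortest-path instance: perturb cumulative edge
# times, then pick the a->b path with the smallest perturbed weight.

EDGES = ("a-u", "u-b", "a-v", "v-b")
PATHS = (("a-u", "u-b"), ("a-v", "v-b"))   # the two a->b paths

def fpl_path(cum_time, eps, rng):
    # perturbed weight per edge, then the lightest path overall
    w = {e: cum_time[e] - rng.expovariate(eps) for e in EDGES}
    return min(PATHS, key=lambda p: sum(w[e] for e in p))

rng = random.Random(1)
cum_time = {e: 0.0 for e in EDGES}
total_time = 0.0
for t in range(500):
    # adversary: the two routes alternate between slow and fast
    times = {"a-u": float(t % 2), "u-b": 0.0,
             "a-v": float(1 - t % 2), "v-b": 0.0}
    path = fpl_path(cum_time, eps=0.1, rng=rng)
    total_time += sum(times[e] for e in path)
    for e in EDGES:
        cum_time[e] += times[e]
```

Note that perturbing edges rather than whole paths is what lets the method scale to exponentially many paths: a standard shortest-path routine on the perturbed weights replaces the brute-force enumeration used here.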

10 The experts problem - why following the perturbed leader works
To build intuition, assume a single perturbation p[i] is drawn for each expert and reused in all periods. Then expert i is the leader if p[i] > v, for some threshold v that depends on the other experts' costs and perturbations, and expert i stays the leader if p[i] > v + c[i]. By the memorylessness of the exponential distribution, we can bound the probability that i stays the leader: P(p[i] > v + c[i] | p[i] > v) = e^(-eps * c[i]) >= 1 - eps * c[i].

11 Follow the regularized leader (1/2)
Similar to the follow-the-perturbed-leader algorithm, but instead of adding a random perturbation, we add a regularizer function that stabilizes the decisions made, and thus leads to low regret. At each step, choose the decision vector that minimizes cumulative cost + regularization term. Regret bound: sublinear in T (for a suitable regularizer, O(sqrt(T))), so the average regret -> 0 as T -> +infinity.

12 Follow the regularized leader (2/2)
Main idea for proving the regret bound: the hypothetical Be-The-Leader (BTL) algorithm has no regret, so if FTRL chooses decisions close to BTL's, then FTRL has low regret. Tradeoff in choosing a regularizer: if its range is too small, we cannot achieve sufficient stability; if its range is too large, we move too far from the optimal decision.

13 Weighted majority
Can be interpreted as an FTRL algorithm with the entropic regularizer R(x) = (1/eta) * sum_i x_i log x_i. Update rule: w_{t+1}[i] proportional to w_t[i] * e^(-eta * c_t[i]), where c_t[i] is expert i's cost in round t.
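A sketch of this reduction, assuming the standard entropic regularizer (our own function name and example costs): over the simplex, the FTRL minimizer works out to a softmax of the negated, scaled cumulative costs.

```python
import math

# FTRL with the entropic regularizer on the probability simplex:
# the minimizer is w[i] proportional to exp(-eta * c[i]), i.e. the
# multiplicative-weights / weighted-majority update.

def ftrl_entropy_weights(cumulative_costs, eta):
    logits = [-eta * c for c in cumulative_costs]
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

w = ftrl_entropy_weights([10.0, 12.0, 30.0], eta=0.5)
```

Experts with lower cumulative cost receive exponentially more weight; the ratio of two weights depends only on the difference of their cumulative costs.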

14 Gradient descent
Can be interpreted as an FTRL algorithm with the squared-norm regularizer R(x) = ||x||^2 / (2 * eta). Update rule: x_{t+1} = x_t - eta * g_t, where g_t is the gradient of the cost at x_t.
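A sketch of this equivalence for linear costs f_t(x) = <g_t, x> (the function name and example gradients are ours): with the squared-norm regularizer, the FTRL minimizer has a closed form that coincides with unprojected online gradient descent started at 0.

```python
# FTRL with R(x) = ||x||^2 / (2*eta) on linear losses:
# x_{t+1} = argmin_x <g_1 + ... + g_t, x> + ||x||^2/(2*eta)
#         = -eta * (g_1 + ... + g_t),
# i.e. each step subtracts eta * g_t, exactly gradient descent.

def ftrl_l2(gradients, eta):
    g_sum = [0.0] * len(gradients[0])
    history = []
    for g in gradients:
        g_sum = [s + gi for s, gi in zip(g_sum, g)]
        history.append([-eta * s for s in g_sum])  # closed-form minimizer
    return history

hist = ftrl_l2([(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0)], eta=0.1)
```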

15 Online convex optimization
At iteration t, the decision maker chooses x_t in a convex set K. A convex cost function f_t: K -> R is then revealed, and the player incurs the cost f_t(x_t). The regret of algorithm A at time T is the total cost incurred minus the cost of the best single decision: Regret_T(A) = sum_{t=1}^T f_t(x_t) - min_{x in K} sum_{t=1}^T f_t(x). Goal: regret sublinear in T, i.e. in terms of average per-period regret, the algorithm performs as well as the best single decision in hindsight. Examples: the experts problem, online shortest paths.

16 Online convex optimization
Two approaches: the follow-the-regularized-leader algorithm, and the primal-dual approach.

17 The primal-dual approach
Perform updates and optimization in the dual space defined by the regularizer, then map the dual solution y_t back to a primal solution x_t using a Bregman projection. For linear cost functions, the primal-dual approach is equivalent to the FTRL algorithm.
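The Bregman divergence that drives this projection can be made concrete. This is our own illustration using the negative-entropy regularizer as an example (not necessarily the regularizer used in the paper):

```python
import math

# Bregman divergence  D_R(x, y) = R(x) - R(y) - <grad R(y), x - y>.
# For R(x) = sum_i x_i log x_i on the simplex, D_R reduces to the
# KL divergence KL(x || y).

def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def bregman(x, y):
    grad_y = [math.log(yi) + 1.0 for yi in y]   # gradient of neg-entropy
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_y, x, y))
    return neg_entropy(x) - neg_entropy(y) - inner

d = bregman([0.5, 0.5], [0.9, 0.1])
```

Different regularizers induce different geometries: the squared norm gives the ordinary Euclidean distance, while the entropic regularizer gives KL, which is why the corresponding FTRL updates look so different.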

18 Discussion
Could we connect FTRL algorithms (e.g. weighted majority) to market scoring rules? These algorithms strive to match the single best expert's performance; what if even the best expert is not very good? The tradeoff between speed of execution and performance of the experts for a given problem would also be interesting to explore.

