Published by Joseph Nash. Modified over 3 years ago.

1
Blind online optimization: Gradient descent without a gradient. Abie Flaxman (CMU), Adam Tauman Kalai (TTI), Brendan McMahan (CMU)

2
Standard convex optimization. Convex feasible set $S \subseteq \mathbb{R}^d$. Concave function $f : S \to \mathbb{R}$. Goal: find $x$ with $f(x) \ge \max_{z \in S} f(z) - \epsilon = f(x^*) - \epsilon$.

3
Steepest ascent. Move in the direction of steepest ascent. Compute $f'(x)$ ($\nabla f(x)$ in higher dimensions). Works for convex optimization (and many other problems). [Figure: iterates $x_1, x_2, x_3, x_4$]
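As a minimal sketch of the steepest-ascent step described on this slide, the loop below repeatedly moves along the derivative of a one-dimensional concave function. The function name `steepest_ascent`, the fixed step size, and the example objective are assumptions made for illustration; the slides do not specify them.

```python
def steepest_ascent(grad, x0, step=0.1, iters=200):
    """Repeatedly move in the direction of the derivative (fixed step size assumed)."""
    x = x0
    for _ in range(iters):
        x = x + step * grad(x)
    return x


# Illustrative concave objective f(x) = -(x - 3)^2, with derivative f'(x) = -2 (x - 3).
x_final = steepest_ascent(lambda x: -2.0 * (x - 3.0), x0=0.0)
```

With this step size the distance to the maximizer shrinks by a constant factor each iteration, so `x_final` ends up very close to 3.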

4
Typical application. Company produces certain numbers of cars per month. Vector $x \in \mathbb{R}^d$ (#Corollas, #Camrys, …). Profit of company is a concave function of the production vector. Maximize total (equivalently, average) profit. PROBLEMS

5
Problem definition and results. Sequence of unknown concave functions. In period $t$: pick $x_t \in S$ (convex), find out only $f_t(x_t)$. Theorem:

6
Online model. Holds for arbitrary sequences. Stronger than stochastic model: $f_1, f_2, \ldots$ i.i.d. from $D$; $x^* = \arg\max_{x \in S} \mathbb{E}_D[f(x)]$. Expected regret.
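The "expected regret" label can be made concrete with the usual definition from online optimization. This is offered as an assumption about what the slide's lost formula expressed, not as a recovery of it. The horizon $n$ is a symbol introduced here for illustration, and the comparison point is the best fixed $x \in S$ in hindsight:

```latex
\mathbb{E}[\text{regret}] \;=\; \max_{x \in S} \sum_{t=1}^{n} f_t(x) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{n} f_t(x_t)\right]
```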

7
Outline: Problem definition; Simple algorithm; Analysis sketch; Variations; Related work & applications

8
First try. Zinkevich 03: If we could only compute gradients… [Figure: PROFIT vs. #CAMRYS, showing functions $f_1, \ldots, f_4$, points $x_1, \ldots, x_4$ with values $f_t(x_t)$, and the optimum $x^*$]

9
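Idea: one point gradient. With probability ½, estimate $= f(x + \delta)/\delta$. With probability ½, estimate $= -f(x - \delta)/\delta$. $\mathbb{E}[\text{estimate}] \approx f'(x)$. [Figure: PROFIT vs. #CAMRYS, showing $x$, $x + \delta$, $x - \delta$]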
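The one-point idea can be sketched in a few lines. Each call evaluates the function once, at a randomly chosen side of x, and the average of the two possible outcomes is the central difference. The offset name `delta`, the helper names, and the quadratic test function are assumptions for illustration only.

```python
import random


def one_point_estimate(f, x, delta, rng):
    """Single-evaluation derivative estimate: one random side of x is probed."""
    if rng.random() < 0.5:
        return f(x + delta) / delta
    return -f(x - delta) / delta


def expected_estimate(f, x, delta):
    """Exact mean of the estimator: the central difference (f(x+d) - f(x-d)) / (2d)."""
    return 0.5 * f(x + delta) / delta - 0.5 * f(x - delta) / delta


def profit(x):
    # Illustrative concave function with derivative -2 (x - 3).
    return -(x - 3.0) ** 2


rng = random.Random(0)
sample = one_point_estimate(profit, 1.0, 0.1, rng)  # one noisy estimate
mean = expected_estimate(profit, 1.0, 0.1)          # matches f'(1) for a quadratic
```

A single sample is far from the true derivative (its magnitude scales like 1/delta), which is why the analysis works with the expectation and not with any one estimate.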

10
d-dimensional online algorithm. [Figure: feasible set $S$ with iterates $x_1, x_2, x_3, x_4$]
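Since the slide itself survives only as a figure, the following is a hedged sketch of how a d-dimensional version might look. It combines a one-point gradient estimate along a random unit direction with a projected ascent step. Everything concrete here is an assumption for illustration: the feasible set is taken to be a Euclidean ball, and the names `radius`, `delta`, `eta`, and `seed` are hypothetical parameters, not values from the talk.

```python
import math
import random


def unit_vector(d, rng):
    """Uniformly random direction, obtained by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(c * c for c in g)) or 1.0  # guard against a zero draw
    return [c / n for c in g]


def project_ball(y, radius):
    """Euclidean projection onto the ball of the given radius around the origin."""
    n = math.sqrt(sum(c * c for c in y))
    return list(y) if n <= radius else [c * radius / n for c in y]


def blind_online_ascent(fs, d, radius=1.0, delta=0.1, eta=0.01, seed=0):
    """Play one point per round, seeing only the payoff f_t(x_t) at that point."""
    rng = random.Random(seed)
    y = [0.0] * d
    played, payoffs = [], []
    for f in fs:
        u = unit_vector(d, rng)
        x = [yi + delta * ui for yi, ui in zip(y, u)]  # point actually played
        value = f(x)                                   # the only feedback
        played.append(x)
        payoffs.append(value)
        step = eta * (d / delta) * value               # one-point gradient scale
        # Project onto a ball shrunk by delta so the next played point stays in S.
        y = project_ball([yi + step * ui for yi, ui in zip(y, u)], radius - delta)
    return played, payoffs
```

Projecting onto a slightly shrunken set is one simple way to guarantee that the perturbed point stays feasible; other choices of feasible set would need their own projection routine.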

11
Outline: Problem definition; Simple algorithm; Analysis sketch; Variations; Related work & applications

12
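Analysis ingredients. $\mathbb{E}[\text{1-point estimate}]$ is the gradient of a smoothed version of $f$, and the gap between $f$ and its smoothed version is small. Online gradient ascent analysis [Z03]. Online expected gradient ascent analysis. (Hidden complications)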

13
1-pt gradient analysis. [Figure: PROFIT vs. #CAMRYS, showing $x + \delta$ and $x - \delta$]

14
1-pt gradient analysis (d-dim). $\mathbb{E}[\text{1-point estimate}]$ is the gradient of a smoothed version of $f$, and the gap between $f$ and its smoothed version is small.
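A small numerical check can illustrate the d-dimensional claim. The sketch below assumes the estimator takes the form (d / delta) * f(x + delta * u) * u for a uniformly random unit vector u; that form and the helper names are assumptions for illustration, since the slide's formula is not recoverable. For a linear function, smoothing changes nothing, so the average of many estimates should approach the true gradient.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in g))
        if norm > 0.0:
            return [c / norm for c in g]


def one_point_gradient(f, x, delta, rng):
    """Gradient estimate built from a single evaluation of f near x."""
    d = len(x)
    u = random_unit_vector(d, rng)
    value = f([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


def average_estimate(f, x, delta, samples, seed=0):
    """Monte Carlo mean of the one-point estimator."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = one_point_gradient(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


# Illustrative linear function with gradient a = (1, -2, 0.5).
a = [1.0, -2.0, 0.5]
linear = lambda z: sum(ai * zi for ai, zi in zip(a, z))
approx_gradient = average_estimate(linear, [0.0, 0.0, 0.0], 0.5, samples=20000)
```

The individual estimates are noisy, so `approx_gradient` only matches `a` up to sampling error.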

15
Online gradient ascent [Z03] (concave, bounded gradient)
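For readers unfamiliar with [Z03], online gradient ascent can be sketched as a gradient step followed by projection back onto the feasible set. The version below assumes a box-shaped feasible set and a step size decaying like 1/sqrt(t); the names `project_box`, `eta0`, `lo`, and `hi` are hypothetical and chosen only for this illustration.

```python
def project_box(x, lo, hi):
    """Projection onto the box [lo, hi]^d (an assumed feasible set)."""
    return [min(max(xi, lo), hi) for xi in x]


def online_gradient_ascent(grads, x0, lo, hi, eta0=0.5):
    """Play x_t, observe the gradient of f_t at x_t, step uphill, then project."""
    x = list(x0)
    played = []
    for t, grad in enumerate(grads, start=1):
        played.append(list(x))
        g = grad(x)
        eta = eta0 / t ** 0.5  # decaying step size (assumed schedule)
        x = project_box([xi + eta * gi for xi, gi in zip(x, g)], lo, hi)
    return played


# Illustrative sequence: every f_t(x) = -(x - 3)^2, restricted to S = [0, 2].
grads = [lambda x: [-2.0 * (x[0] - 3.0)]] * 50
played = online_gradient_ascent(grads, x0=[0.0], lo=0.0, hi=2.0)
```

In this example the unconstrained maximizer lies outside S, so the iterates are pushed to the boundary point 2 and stay there.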

16
Expected gradient ascent analysis. Regular deterministic gradient ascent on $g_t$ (concave, bounded gradient).

17
Adaptive adversary…

18
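Hidden complication… [Figure: feasible set $S$]

19
[Figure: feasible set $S$]

20
[Figure: feasible set $S$]

21
Thin sets are bad. [Figure: feasible set $S$]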

22
Hidden complication… Round sets are good: reshape into isotropic position [LV03].

23
Outline: Problem definition; Simple algorithm; Analysis sketch; Variations; Related work & applications

24
Variations. Works against an adaptive adversary, who chooses $f_t$ knowing $x_1, x_2, \ldots, x_{t-1}$. Also works if we only get a noisy estimate of $f_t(x_t)$, i.e. $\mathbb{E}[h_t(x_t) \mid x_t] = f_t(x_t)$. [Formula with labels: diameter, gradient bound]

25
Related convex optimization
Columns: Sighted (see entire function(s)) | Blind (evaluations only)
Regular (single f): Sighted: Gradient descent, ... | Blind: Ellipsoid, Random walk [BV02], Sim. annealing [KV05], Finite difference
Stochastic (dist over fs or dist over errors): Sighted: Gradient descent (stoch.) | Blind: 1-pt. gradient appx. [G89, S97], Finite difference
Online ($f_1, f_2, f_3, \ldots$): Sighted: Gradient descent (online) [Z03] | Blind: 1-pt. gradient appx. [BKM04], Finite difference [Kleinberg04]

26
Related discrete optimization: linear function(s) over a discrete set
Columns: Sighted (see entire function(s)) | Blind, aka bandit (evaluations only)
Regular (single f): Shortest path, max, …
Stochastic (dist over fs): Huffman trees, …
Online ($f_1, f_2, f_3, \ldots$): Weighted majority, …; Online linear optimization [Hannan57, KV03]; Adversarial bandits; Blind linear optimization [AK04, MB04 (adaptive adversary)]

27
Switching lanes (experts). [Figure: numeric payoffs per lane, set $S$]

28
Multi-armed bandit (experts) [R52, ACFS95, …]. [Figure: numeric payoffs per arm, set $S$]

29
Driving to work (online routing). Exponentially many paths… Exponentially many slot machines? Finite dimensions. Exploration/exploitation tradeoff. [TW02, KV02, AK04, BM04] [Figure: set $S$]

30
Online product design

31
High dimensions. One-dimensional problem easy: discretize, special case of multi-armed bandit problem; $1/\epsilon$ slot machines; no need for convexity. d-dimensional problem harder: discretizing at granularity $\epsilon$ gives exponentially many ($1/\epsilon^d$) slot machines $\Rightarrow$ exponential regret.
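The blow-up from discretization is easy to see numerically. The sketch below counts grid points when each axis of a unit cube is cut at a fixed granularity; the name `eps` for the granularity and the unit-cube setting are assumptions made for illustration.

```python
import math


def num_slot_machines(eps, d):
    """Grid points ("slot machines") when each of d axes is cut into pieces of width eps."""
    per_axis = math.ceil(1.0 / eps)
    return per_axis ** d


one_dim = num_slot_machines(0.25, 1)    # 4 arms in one dimension
ten_dim = num_slot_machines(0.25, 10)   # 4**10 arms in ten dimensions
```

Even at this coarse granularity, ten dimensions already require over a million arms, which is why treating the d-dimensional problem as a plain multi-armed bandit does not scale.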

32
Non-linear applications

33
Conclusions and future work. Can learn to optimize a sequence of unrelated functions from evaluations. Answer to: What is the sound of one hand clapping? Applications: cholesterol, paper airplanes, advertising. Future work: many players using the same algorithm (game theory).
