# Blind online optimization: Gradient descent without a gradient. Abie Flaxman (CMU), Adam Tauman Kalai (TTI), Brendan McMahan (CMU)

## Presentation transcript

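Standard convex optimization
- Convex feasible set $S \subset \mathbb{R}^d$
- Concave function $f : S \to \mathbb{R}$
- Goal: find $x$ with $f(x) \ge \max_{z \in S} f(z) - \epsilon = f(x^*) - \epsilon$
- [Figure: the set $S$ in $\mathbb{R}^d$ with the maximizer $x^*$]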

Steepest ascent
- Move in the direction of steepest ascent
- Compute $f'(x)$ ($\nabla f(x)$ in higher dimensions)
- Works for convex optimization (and many other problems)
- [Figure: successive iterates $x_1, x_2, x_3, x_4$]
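The steepest-ascent idea on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the objective, the step size, and the iteration count are assumptions, and the sketch ignores the feasible set $S$ (no projection step).

```python
import numpy as np

def gradient_ascent(grad, x0, step=0.1, iters=200):
    """Repeatedly move a small step in the direction of the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x + step * grad(x)
    return x

# Hypothetical concave objective f(x) = -||x - c||^2, maximized at c.
c = np.array([1.0, -2.0])
grad_f = lambda x: -2.0 * (x - c)  # gradient of the assumed objective

x_final = gradient_ascent(grad_f, x0=[0.0, 0.0])
```

Here `x_final` ends up close to `c`. Note that the method needs `grad_f`, which is exactly what the blind setting in the rest of the talk does not provide.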

Typical application
- Company produces certain numbers of cars per month
- Vector $x \in \mathbb{R}^d$ (#Corollas, #Camrys, …)
- Profit of company is a concave function of the production vector
- Maximize total (equivalently, average) profit
- [Slide overlay: PROBLEMS]

Problem definition and results
- Sequence of unknown concave functions on a convex set $S$
- Period $t$: pick $x_t \in S$, find out only $f_t(x_t)$
- Theorem: [statement not preserved in the transcript]

Online model
- Holds for arbitrary sequences
- Stronger than stochastic model:
  - $f_1, f_2, \dots$ i.i.d. from $D$
  - $x^* = \arg\max_{x \in S} E_D[f(x)]$
- [Figure label: expected regret]
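The slide mentions regret without spelling it out in the transcript. A common definition, assumed here, compares the total payoff earned with the best single point in hindsight. The sketch below approximates that best point by searching a grid; the function name `regret`, the quadratic profit functions, and the grid are illustrative assumptions.

```python
import numpy as np

def regret(fs, xs, candidates):
    """Best fixed candidate's total payoff minus the payoff actually earned."""
    earned = sum(f(x) for f, x in zip(fs, xs))
    best_fixed = max(sum(f(z) for f in fs) for z in candidates)
    return best_fixed - earned

# Hypothetical 1-D example: concave profit functions whose peaks move around.
peaks = [0.2, 0.8, 0.5, 0.6]
fs = [lambda x, p=p: -(x - p) ** 2 for p in peaks]
xs = [0.5, 0.5, 0.5, 0.5]            # the points played in periods 1..4
grid = np.linspace(0.0, 1.0, 101)    # candidate fixed points in S = [0, 1]

r = regret(fs, xs, grid)
```

Because the functions may be arbitrary rather than i.i.d., the comparison point is chosen after seeing the whole sequence, which is what makes the online guarantee stronger than the stochastic one.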

Outline
- Problem definition
- Simple algorithm
- Analysis sketch
- Variations
- Related work & applications

First try
- [Figure: profit vs. #Camrys, showing the functions $f_1, \dots, f_4$, the plays $x_1, \dots, x_4$ with their values $f_t(x_t)$, and the optimum $x^*$]
- Zinkevich 03: If we could only compute gradients…

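Idea: one point gradient
- With probability ½, estimate $= f(x + \delta)/\delta$
- With probability ½, estimate $= -f(x - \delta)/\delta$
- $E[\text{estimate}] \approx f'(x)$
- [Figure: profit vs. #Camrys, showing $x$, $x + \delta$, and $x - \delta$]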
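A small numerical check of the one-point idea, as a sketch under assumptions: the perturbation size is called `delta` here, and the profit function `f` is a made-up concave quadratic. Averaging over the coin flip gives the central difference $(f(x+\delta) - f(x-\delta)) / (2\delta)$, which approximates $f'(x)$.

```python
import random

def one_point_estimate(f, x, delta, rng):
    """Evaluate f at a single random point and return a gradient estimate."""
    if rng.random() < 0.5:
        return f(x + delta) / delta
    return -f(x - delta) / delta

def expected_estimate(f, x, delta):
    """Exact expectation of the estimate over the fair coin flip."""
    return 0.5 * f(x + delta) / delta - 0.5 * f(x - delta) / delta

# Hypothetical concave profit function with f'(x) = -2 (x - 3).
f = lambda x: 10.0 - (x - 3.0) ** 2

rng = random.Random(0)
samples = [one_point_estimate(f, 1.0, 0.5, rng) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)
exact = expected_estimate(f, 1.0, 0.5)
```

Each individual sample is far from the true slope (the two possible values have opposite signs), so the estimate is only useful on average. That is why the analysis works with expected gradients.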

d-dimensional online algorithm
[Figure: the feasible set $S$ with successive plays $x_1, x_2, x_3, x_4$]
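The transcript keeps only the figure for this slide, so the following is a hedged sketch of what a $d$-dimensional version of the one-point idea might look like, not the presenters' exact algorithm. It assumes $S$ is a Euclidean ball, and the names `blind_gradient_ascent`, `project_to_ball`, `delta`, and `step` are illustrative. The center `y` is kept inside a slightly shrunken ball so that the perturbed play `x` stays feasible.

```python
import numpy as np

def project_to_ball(y, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def blind_gradient_ascent(fs, d, radius=1.0, delta=0.2, step=0.01, seed=0):
    """Play one point per period, seeing only the payoff at that point."""
    rng = np.random.default_rng(seed)
    y = np.zeros(d)
    plays, payoffs = [], []
    for f in fs:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)            # uniform random direction
        x = y + delta * u                 # the point actually played
        value = f(x)                      # the only feedback received
        g = (d / delta) * value * u       # one-point gradient estimate
        y = project_to_ball(y + step * g, radius - delta)
        plays.append(x)
        payoffs.append(value)
    return np.array(plays), np.array(payoffs)

# Hypothetical usage: the same concave payoff in every period.
c = np.array([0.3, 0.0, 0.0])
fs = [lambda x: 1.0 - np.sum((x - c) ** 2)] * 500
plays, payoffs = blind_gradient_ascent(fs, d=3)
```

The trajectory is noisy, since each step uses a single evaluation. The parameter values above are arbitrary and would need tuning against the number of periods.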


Analysis ingredients
- E[1-point estimate] is the gradient of a smoothed version $\hat f$ of $f$
- $|\hat f - f|$ is small
- Online gradient ascent analysis [Z03]
- Online expected gradient ascent analysis
- (Hidden complications)

1-pt gradient analysis
[Figure: profit vs. #Camrys, with the points $x + \delta$ and $x - \delta$]

1-pt gradient analysis (d-dim)
- E[1-point estimate] is the gradient of a smoothed version $\hat f$ of $f$
- $|\hat f - f|$ is small
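One way to write the two ingredients on this slide, as a sketch under assumptions the transcript does not preserve: $\delta > 0$ is the perturbation size, $u$ is uniform on the unit sphere, $v$ is uniform in the unit ball, and $f$ is $L$-Lipschitz. The symbols $\hat f$, $\delta$, $u$, $v$, and $L$ are introduced here for illustration.

```latex
\hat f(x) = \mathbb{E}_{v}\left[ f(x + \delta v) \right],
\qquad
\nabla \hat f(x) = \frac{d}{\delta}\, \mathbb{E}_{u}\left[ f(x + \delta u)\, u \right],
\qquad
\left| \hat f(x) - f(x) \right| \le L \delta .
```

In one dimension the unit sphere is $\{-1, +1\}$, so the middle identity reduces to $\big(f(x+\delta) - f(x-\delta)\big) / (2\delta)$, the expectation of the one-point rule.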

Hidden complication…
[Figures: the feasible set $S$; the rest of these slides is not preserved in the transcript]

Hidden complication…
- Round sets are good
- …reshape into isotropic position [LV03]


Variations
- Works against an adaptive adversary
  - Chooses $f_t$ knowing $x_1, x_2, \dots, x_{t-1}$
- Also works if we only get a noisy estimate $h_t(x_t)$ of $f_t(x_t)$, i.e. $E[h_t(x_t) \mid x_t] = f_t(x_t)$
- [Formula labels: diameter, gradient bound]

Related convex optimization

| | Sighted (see entire function(s)) | Blind (evaluations only) |
|---|---|---|
| Regular (single f) | Gradient descent, ... | Ellipsoid, Random walk [BV02], Sim. annealing [KV05], Finite difference |
| Stochastic (dist over fs or dist over errors) | Gradient descent (stoch.) | 1-pt. gradient appx. [G89,S97], Finite difference |
| Online ($f_1, f_2, f_3, \dots$) | Gradient descent (online) [Z03] | 1-pt. gradient appx. [BKM04], Finite difference [Kleinberg04] |

Related discrete optimization

Linear function(s) over discrete set

| | Sighted (see entire function(s)) | Blind aka bandit (evaluations only) |
|---|---|---|
| Regular (single f) | Shortest path, max, … | |
| Stochastic (dist over fs) | Huffman trees, … | |
| Online ($f_1, f_2, f_3, \dots$) | Weighted majority, …, Online linear optimization [Hannan57,KV03] | Adversarial bandits, Blind linear optimization [AK04, MB04 (adaptive adversary)] |

Switching lanes (experts)
[Figure: numeric table for the lanes; set $S$]

Multi-armed bandit (experts) [R52,ACFS95,…]
[Figure: numeric table; set $S$]

Driving to work (online routing) [TW02,KV02, AK04,BM04]
- Exponentially many paths…
- Exponentially many slot machines?
- Finite dimensions
- Exploration/exploitation tradeoff
- [Figure: routing example; set $S$]

Online product design

High dimensions
- One-dimensional problem easy
  - Discretize: special case of multi-armed bandit problem
  - $1/\epsilon$ slot machines
  - No need for convexity
- $d$-dimensional problem harder
  - Discretizing at granularity $\epsilon$
  - Exponentially many, $(1/\epsilon)^d$, slot machines ⇒ exponential regret
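The blow-up from discretizing can be made concrete with a little arithmetic. The sketch below assumes the feasible set is the unit cube $[0,1]^d$ and that the granularity is called `eps`; both are illustrative choices, as is the helper name `num_slot_machines`.

```python
import math

def num_slot_machines(eps, d):
    """Grid cells needed to cover [0, 1]^d at spacing eps: (1/eps)^d."""
    per_axis = math.ceil(1.0 / eps)
    return per_axis ** d

# Arms needed at granularity 0.25 as the dimension grows.
counts = {d: num_slot_machines(0.25, d) for d in (1, 2, 5, 10)}
```

Even at this coarse granularity the number of arms grows from 4 in one dimension to over a million in ten, which is why treating each grid cell as a separate slot machine stops being practical.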

Non-linear applications

Conclusions and future work
- Can learn to optimize a sequence of unrelated functions from evaluations
- Answer to: What is the sound of one hand clapping?
- Applications
  - Cholesterol
  - Paper airplanes
  - Advertising
- Future work
  - Many players using same algorithm (game theory)