
2 Optimization, Learnability, and Games: From the Lens of Smoothed Analysis Shang-Hua Teng, Computer Science@Viterbi School of Engineering@USC. Joint work with Daniel Spielman (Yale), Heiko Röglin (Maastricht University), Adam Kalai (Microsoft New England Lab), Alex Samorodnitsky (Hebrew University), Xi Chen (USC), and Xiaotie Deng (City University of Hong Kong)

3 This Talk Part I: Overview of Smoothed Analysis Part II: Multiobjective Optimization Part III: Machine Learning Part IV: Games, Markets, and Equilibria Part V: Discussions

4 Practical Performance of Algorithms “While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing on certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in the algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms.” -- Challenges for Theory of Computing: Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science (Condon, Edelsbrunner, Emerson, Fortnow, Haber, Karp, Leivant, Lipton, Lynch, Parberry, Papadimitriou, Rabin, Rosenberg, Royer, Savage, Selman, Smith, Tardos, and Vitter), 1999

5 Linear Programming & Simplex Method max cᵀx s.t. Ax ≤ b. Worst case: exponential. Average case: polynomial. Widely used in practice.

6 Smoothed Analysis of Simplex Method (Spielman + Teng, 2001) Theorem: For all A, b, c, the simplex method (with the shadow-vertex pivot rule) takes expected time polynomial in the problem size and 1/σ to solve the perturbed program max cᵀx s.t. (A + σG)x ≤ b, where G is a Gaussian random matrix.
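A minimal numerical sketch of the smoothed model (illustrative only: it perturbs A with a Gaussian matrix and calls SciPy's general LP solver rather than the shadow-vertex simplex method that the theorem analyzes; the sizes and σ below are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog

def solve_smoothed_lp(A, b, c, sigma, rng):
    """Solve max c^T x s.t. (A + sigma*G) x <= b, where G has i.i.d.
    standard Gaussian entries -- the perturbed instance of the
    Spielman-Teng smoothed model (a sketch, not their analysis)."""
    G = rng.standard_normal(A.shape)
    # linprog minimizes, so negate c to maximize c^T x.
    # Variables are unbounded; the perturbed LP may be infeasible or
    # unbounded, which res.status reports.
    return linprog(-c, A_ub=A + sigma * G, b_ub=b,
                   bounds=[(None, None)] * len(c), method="highs")

rng = np.random.default_rng(0)
n, m = 10, 30                       # illustrative sizes
A = rng.standard_normal((m, n))
b = np.ones(m)
c = rng.standard_normal(n)
res = solve_smoothed_lp(A, b, c, sigma=0.1, rng=rng)
print(res.status, res.fun)
```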

7 Smoothed Complexity Interpolates between worst case and average case Considers a neighborhood of every input If the smoothed complexity is low, all bad inputs are unstable: a small perturbation destroys them Data in practice are not arbitrary; they may be generated with noise and imprecision
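In symbols (the standard definition; T is the running time, g a standard Gaussian perturbation, and σ its magnitude):

$$ \mathrm{Smoothed}_\sigma T(n) \;=\; \max_{\bar{x}\,:\,|\bar{x}| = n} \; \mathbb{E}_{g}\bigl[\, T(\bar{x} + \sigma g) \,\bigr] $$

As σ → 0 this recovers worst-case complexity; as σ grows, the perturbation swamps the input and the measure approaches average-case complexity.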

8 Optimization: Single Criterion & Multiobjective min f(x) subject to x ∈ S. Examples: Linear Programming Shortest path Minimum spanning tree TSP Set cover

9 Optimization: Single Criterion & Multiobjective Real-life logistical problems often involve multiple objectives Travel time, fare, departure time Delay, cost, reliability Profit and risk

10 Optimization: Single Criterion & Multiobjective min f₁(x), …, min f_d(x) subject to x ∈ S. There may not be a solution that is simultaneously optimal for all fᵢ. Question: What can we do algorithmically to support a decision maker?

11 Pareto-Optimal Solutions x ∈ S dominates y ∈ S iff ∀i: fᵢ(x) ≤ fᵢ(y) and ∃i: fᵢ(x) < fᵢ(y)
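A brute-force sketch of dominance and the Pareto filter (toy data; all objectives are minimized):

```python
def dominates(x, y):
    """x dominates y iff x is <= y in every objective and < in at least one."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def pareto_set(points):
    """O(n^2) filter returning the non-dominated (Pareto-optimal) points."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Example: (travel time, fare) pairs; minimize both.
options = [(3, 50), (2, 80), (4, 40), (3, 60), (5, 35)]
print(pareto_set(options))   # (3,60) is dominated by (3,50); the rest survive
```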

12 Pareto-Optimal Solutions


14 Pareto Curve

15 Pareto Surface

16 Decision Makers Only Choose Pareto-Optimal Solutions Fact: Every monotone function, e.g., λ₁f₁(x) + … + λ_d f_d(x), is optimized by a Pareto-optimal solution. Computational Problem: Return the Pareto curve (surface, set)
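A one-line justification of the Fact (the standard argument, with weights λᵢ ≥ 0): if y dominates x, then

$$ \lambda_1 f_1(y) + \cdots + \lambda_d f_d(y) \;\le\; \lambda_1 f_1(x) + \cdots + \lambda_d f_d(x), $$

so any minimizer of the aggregate that is not Pareto-optimal can be replaced by a dominating Pareto-optimal solution that is at least as good.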

17 Decision Makers Only Choose Pareto-Optimal Solutions Return the Pareto curve (surface, set) Central Question: How large is the Pareto set?

18 A Concrete Model S: the set of feasible solutions; it can encode arbitrary combinatorial structure. Examples: all paths from s to t, all Hamiltonian cycles, all spanning trees, …

19 How Large Can a Pareto Set Be? Worst case: exponential. In practice: usually smaller. Müller-Hannemann & Weihe (2001): train connections, with objectives travel time, fare, and number of train changes.

20 Smoothed Models

21 Pareto Set is Usually Small (Röglin-Teng) d = 2 [Beier-Röglin-Vöcking, 2007]: O(n²φ)
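Spelled out (the usual smoothed setting for this bound, as I read the slide): the adversary fixes S ⊆ {0,1}ⁿ and one objective arbitrarily; each coefficient of the other, linear, objective is an independent random variable whose density is bounded by φ. Then

$$ \mathbb{E}\bigl[\, \#\{\text{Pareto-optimal solutions}\} \,\bigr] \;=\; O(n^2 \varphi). $$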

22 How Many Pareto Points in an ε-Interval?

23 The Winner

24 The Losers and their Gaps

25 A Non-Concentration Lemma

26 Putting Together
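The slide's figure is lost; a compressed rendering of the counting argument as it is standardly presented (assume the perturbed objective is f₂(x) = Σᵢ wᵢxᵢ with wᵢ ∈ [0,1], so its range has length at most n): split the range into intervals of width ε; the winner/loser-gap analysis together with the non-concentration lemma bounds the probability that some Pareto point lands in a fixed interval by φnε, so

$$ \mathbb{E}[\#\mathrm{PO}] \;\le\; 1 + \frac{n}{\varepsilon}\cdot \varphi n \varepsilon \;=\; 1 + \varphi n^2, $$

a bound that is independent of ε, so letting ε → 0 costs nothing.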

27 Nearly Tight Smoothed Bounds for 2D: Many Moments

28 Three or More Objectives

29 Not So Tight Yet: But Polynomial Smoothed Bound for Fixed Dimensions

30 This Talk Part I: Overview of Smoothed Analysis Part II: Multiobjective Optimization Part III: Machine Learning Part IV: Games, Markets, and Equilibria Part V: Discussions

31 P.A.C. Learning !? X = {0,1}ⁿ, f: X → {–1,+1}. PAC assumption: the target is from a particular concept class (for example, an AND, e.g., f(x) = “Bank” & “Adam” & “Free”). Input: noiseless training data ⟨(x_j drawn from D, f(x_j))⟩_{j≤m}. [Training-data table: examples x₁,…,x₅ with YES/NO values for the word features NIGERIA, BANK, VIAGRA, ADAM, LASER, SALE, FREE, IN; x₁, x₄, x₅ are labeled SPAM and x₂, x₃ LEGIT.] [Valiant84]

32 P.A.C. Learning Poly-time learning algorithm – Succeeds with prob. ≥ 1–δ (e.g., 0.99) – m = # examples = poly(n/ε) Output: h: X → {–1,+1} with err(h) = Pr_{x←D}[ h(x) ≠ f(x) ] ≤ ε OPTIONAL: “Proper” learning: h comes from the same concept class as the target.
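For a finite concept class C, the textbook calculation behind “m = poly” (a standard realizable-case bound, not specific to this talk): a learner that outputs any hypothesis consistent with the data achieves err(h) ≤ ε with probability ≥ 1–δ whenever

$$ m \;\ge\; \frac{1}{\varepsilon}\left( \ln |\mathcal{C}| + \ln \frac{1}{\delta} \right). $$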

33 Agnostic P.A.C. Learning !? X = {0,1}ⁿ, f: X → {–1,+1}. No PAC assumption that the target is from a particular concept class. Input: training data ⟨(x_j drawn from D, f(x_j))⟩_{j≤m}. Poly-time learning algorithm – Succeeds with prob. ≥ 1–δ (e.g., 0.99) – m = # examples = poly(n/ε) Output: h: X → {–1,+1} with err(h) = Pr_{x←D}[ h(x) ≠ f(x) ] ≤ ε + min_{g from the class} err(g) [Kearns Schapire Sellie 92]

34 Computational Learning Theory Computation is the limiting resource – learning is “easy” if computation is ignored – YET: children learn many things computers can’t – worst-case poly-time algorithms? Open: PAC-learning DNF, decision trees, juntas; learning parity with noise.

35 Some Smoothed Results in Learning (Kalai-Samorodnitsky-Teng) PAC learn decision trees over smoothed (constant-bounded) product distributions PAC learn DNFs over smoothed (constant-bounded) product distributions Agnostically learn decision trees over smoothed (constant-bounded) product distributions

36 A Formal Statement of the First Result For μ ∈ [0,1]ⁿ, let π_μ be the product distribution on {0,1}ⁿ in which the entries of μ are the means of the Boolean variables. Theorem 1: Concept Function: a decision tree f: {0,1}ⁿ → {–1,+1} of size s. Distribution: π_μ defined by μ ∈ ν + [–.01,.01]ⁿ, where ν ∈ [.02,.98]ⁿ. Data: m = poly(ns/ε) training examples ⟨(x_j, f(x_j))⟩_{j≤m}, with the x_j i.i.d. from π_μ. Learning Algorithm: a polynomial-time algorithm. Output: a function h. Quality: Pr_{x←π_μ}[ sgn(h(x)) ≠ f(x) ] ≤ ε.

37 Fourier over Product Distributions For x ∈ {0,1}ⁿ and μ ∈ [0,1]ⁿ, the μ-biased basis functions are χ_S(x) = ∏_{i∈S} (x_i – μ_i)/√(μ_i(1 – μ_i)) for S ⊆ [n]; they are orthonormal under π_μ, and every f: {0,1}ⁿ → R expands as f(x) = Σ_S f̂(S) χ_S(x) with f̂(S) = E_{x←π_μ}[ f(x) χ_S(x) ].
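The expansion suggests a direct sampling estimator for the coefficients; a minimal sketch (the target function, sizes, and seed are invented for illustration):

```python
import numpy as np

def chi(S, xs, mu):
    """mu-biased basis function chi_S evaluated on each sample row:
    product over i in S of (x_i - mu_i)/sqrt(mu_i(1-mu_i))."""
    num = xs[:, S] - mu[S]
    den = np.sqrt(mu[S] * (1.0 - mu[S]))
    return np.prod(num / den, axis=1)

def fourier_coefficient(S, f_vals, xs, mu):
    """Estimate \\hat f(S) = E_{x ~ pi_mu}[ f(x) * chi_S(x) ] from samples."""
    return np.mean(f_vals * chi(S, xs, mu))

rng = np.random.default_rng(1)
n, m = 8, 200_000
mu = np.full(n, 0.5) + rng.uniform(-0.01, 0.01, n)    # smoothed means
xs = (rng.random((m, n)) < mu).astype(float)          # samples from pi_mu
f_vals = np.where(xs[:, 0] * xs[:, 1] > 0, -1.0, 1.0) # toy target: AND of x1,x2
print(fourier_coefficient(np.array([0, 1]), f_vals, xs, mu))
```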

38 Non-Concentration Bound on Fourier Structures For any f: {0,1}ⁿ → {–1,+1}, α, β > 0, and d ≥ 1, the μ-biased Fourier coefficients of f satisfy a non-concentration bound for smoothed μ. It rests on a continuous generalization of the Schwartz-Zippel theorem: Let p: Rⁿ → R be a degree-d multi-linear polynomial with a leading coefficient of 1, e.g., p(x) = x₁x₂x₉ + .3x₇ – 0.2. Then, for any ε > 0, Pr_{x∈[0,1]ⁿ}[ |p(x)| ≤ ε ] ≤ poly(d) · ε^{1/d}.
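A quick Monte-Carlo sanity check of the non-concentration phenomenon on the slide's example polynomial (the ε^{1/d} rate in the comment reflects the reconstructed bound above, so treat it as indicative):

```python
import numpy as np

# p(x) = x1*x2*x9 + 0.3*x7 - 0.2 over x ~ Uniform([0,1]^n);
# only coordinates 1..9 matter, so we sample those.
rng = np.random.default_rng(2)
m = 1_000_000
x = rng.random((m, 9))
p = x[:, 0] * x[:, 1] * x[:, 8] + 0.3 * x[:, 6] - 0.2

for eps in (0.1, 0.01, 0.001):
    frac = np.mean(np.abs(p) <= eps)
    # Expect frac to shrink roughly like eps^(1/d) with d = 3 here.
    print(f"eps={eps:7.3f}  Pr[|p| <= eps] ~ {frac:.4f}")
```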

39 Some Related Work Decision trees: P.A.C. with membership queries over uniform distributions [Kushilevitz-Mansour’91; Goldreich-Levin’89; Bshouty’94]; agnostic with membership queries over uniform D [Gopalan-Kalai-Klivans’08]. DNF: P.A.C. with membership queries + uniform D [Jackson’94].

40 Some Smoothed Results in Learning (Kalai-Samorodnitsky-Teng) PAC learn decision trees over smoothed (constant-bounded) product distributions PAC learn DNFs over smoothed (constant-bounded) product distributions Agnostically learn decision trees over smoothed (constant-bounded) product distributions

41 Games and Optimization

42 Optimization President: U_USA(x_USA, x_CA, x_MA, …) Global optimum Local optimum Approximation

43 Multi-Objective Optimization President: U_USA(x_USA, x_CA, x_MA, …), U_CA(x_USA, x_CA, x_MA, …), U_MA(x_USA, x_CA, x_MA, …) Pareto optimum [Approximation]

44 Multi-Player Games President: U_USA(x_USA, x_CA, x_MA, …) Governor of CA: U_CA(x_USA, x_CA, x_MA, …) Governor of MA: U_MA(x_USA, x_CA, x_MA, …) Best response Nash equilibrium

45 “Is the smoothed complexity of another classic algorithm, Lemke-Howson for two-player games, polynomial?” BIMATRIX games, mixed strategies; the example payoff matrix on the slide: rows (0 1 1), (1 0 1), (1 1 0).

46 Nash Equilibria in Two-Player Games A mixed equilibrium always exists (Nash, 1950). Search problem: find an equilibrium.
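A small sketch of the verification side of the search problem: checking that a candidate mixed-strategy pair is a Nash equilibrium via the support/best-response condition. The game (A, Aᵀ) built from the slide's 0/1 matrix is an illustrative choice:

```python
import numpy as np

def is_nash(A, B, x, y, tol=1e-9):
    """Check that (x, y) is a mixed Nash equilibrium of the bimatrix game
    (A, B): every pure strategy in each player's support must attain the
    maximum expected payoff against the opponent's mixed strategy."""
    row_payoffs = A @ y          # row player's payoff per pure strategy
    col_payoffs = B.T @ x        # column player's payoff per pure strategy
    row_ok = np.all(row_payoffs[x > tol] >= row_payoffs.max() - tol)
    col_ok = np.all(col_payoffs[y > tol] >= col_payoffs.max() - tol)
    return bool(row_ok and col_ok)

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
B = A.T                           # illustrative pairing with the slide's matrix
u = np.ones(3) / 3
print(is_nash(A, B, u, u))        # uniform play is an equilibrium here -> True
```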

47 Exchange Economies Traders and goods. Initial endowments: wᵢ for each trader i. Utilities: uᵢ for each trader i.

48 Arrow-Debreu Equilibrium Price A price vector p. Distributed exchange: every trader – sells her initial endowment to the “market” (to get a budget) – buys from the “market” to optimize her individual utility. Market-clearing price.
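In symbols (the standard Arrow-Debreu conditions, with wᵢ and uᵢ from the previous slide's setup): p is an equilibrium price vector iff each trader buys an optimal affordable bundle and demand does not exceed supply:

$$ x_i \;\in\; \arg\max\{\, u_i(x) \;:\; p\cdot x \le p\cdot w_i \,\}, \qquad \sum_i x_i \;\le\; \sum_i w_i . $$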

49 Smoothed Model

50 Complexity of Nash Equilibria [Daskalakis-Goldberg-Papadimitriou, 2005] For any constant k ≥ 3, k-player NASH is PPAD-hard. [Chen-Deng, 2005] 2-player NASH is PPAD-complete. [Chen-Deng-Teng, 2006] If PPAD is not in P, then 2-player NASH does not have a fully polynomial-time approximation scheme.

51 Smoothed Complexity of Equilibria [Chen-Deng-Teng, 2006] There is NO smoothed polynomial-time bound for Lemke-Howson or any other BIMATRIX algorithm, unless the computation of game and market equilibria and Brouwer fixed points is in randomized P! [Huang-Teng, 2006] Computing Arrow-Debreu equilibria in Leontief exchange economies is not in smoothed P, unless …

52 Complexity Classes and Complete Problems [diagram relating P, PLS, PPAD, NP, and PSPACE]

53 Tale of Two Types of Equilibria
Local search (potential games) – intuitive: linear programming – P; simplex method – smoothed P; class PLS – FPTAS.
Fixed-point computation (matrix games) – intuitive to some: 2-player Nash equilibrium – unknown; Lemke-Howson algorithm – if in P, then NASH in RP; class PPAD – if an FPTAS exists, then NASH in RP.

54 A Basic Question Is fixed point computation fundamentally harder than local search?

55 Random Separation of Local Search and Fixed-Point Computation Aldous (1983): randomization helps local search. Chen & Teng (2007): randomization doesn’t help fixed-point computation! … in the black-box query model.

56 Open Questions How hard is PPAD? Non-concentration of multi-linear polynomials Optimal smoothed bound for Pareto Sets

57 Non-Concentration of Multi-linear Polynomials Continuous Schwartz-Zippel Conjecture: Let p: Rⁿ → R be a degree-d multi-linear polynomial with a constant coefficient of 1. Then, for any ε > 0, Pr_{x∈[0,1]ⁿ}[ |p(x)| ≤ ε ] ≤ poly(d) · ε^{1/d}.

