
1 Approximating Submodular Functions, Part 2. Nick Harvey, University of British Columbia, Department of Computer Science. July 12th, 2015. Joint work with Nina Balcan (CMU).

2 Valuation Functions. n items: {1,2,…,n} = [n]; a valuation is a set function f : 2^[n] → ℝ. Focus on combinatorial settings: individuals have valuation functions giving their utility for different outcomes or events. This talk: learning valuation functions from a distribution on the data. Motivating application: bundle pricing.

3 Submodular valuations. [n] = {1,…,n}; a function f : 2^[n] → ℝ is submodular if f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for all S, T ⊆ [n]. Equivalent to decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S) (adding x to the smaller set T gives a large improvement; adding it to the larger set S gives a small improvement).

4 Submodular valuations: examples. Concave functions: let h : ℝ → ℝ be concave; for each S ⊆ [n], let f(S) = h(|S|). Vector spaces: let V = {v_1,…,v_n}, each v_i ∈ ℝ^d; for each S ⊆ [n], let f(S) = dim span{ v_i : i ∈ S }. Both satisfy decreasing marginal returns: for T ⊆ S, x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
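Both examples can be sanity-checked by brute force on a small ground set. The following is a minimal Python sketch (mine, not from the talk) that verifies the submodular inequality f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for a concave-of-cardinality function and for a vector-rank function; the specific h and the random vectors are arbitrary choices for illustration.

```python
from itertools import chain, combinations
import math
import numpy as np

def subsets(n):
    items = range(n)
    return chain.from_iterable(combinations(items, k) for k in range(n + 1))

def is_submodular(f, n, tol=1e-9):
    # Check f(S) + f(T) >= f(S | T) + f(S & T) for every pair of subsets of [n].
    sets = [frozenset(s) for s in subsets(n)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T) - tol
               for S in sets for T in sets)

n = 5
# Example 1: concave function of cardinality, f(S) = sqrt(|S|).
f_concave = lambda S: math.sqrt(len(S))

# Example 2: rank of a set of vectors, f(S) = dim span{ v_i : i in S }.
rng = np.random.default_rng(0)
V = rng.standard_normal((n, 3))
f_rank = lambda S: np.linalg.matrix_rank(V[list(S)]) if S else 0

print(is_submodular(f_concave, n))  # True
print(is_submodular(f_rank, n))     # True
```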

5 Passive supervised learning. Data source: a distribution D on 2^[n]. Expert / oracle: a target function f : 2^[n] → ℝ₊. The learning algorithm receives labeled examples (S_1, f(S_1)),…,(S_k, f(S_k)) with S_1,…,S_k drawn from D, and outputs a hypothesis g : 2^[n] → ℝ₊.

6 PMAC model for learning real-valued functions. Same setup: distribution D on 2^[n], target f : 2^[n] → ℝ₊. The algorithm sees (S_1, f(S_1)),…,(S_k, f(S_k)) with the S_i i.i.d. from D, and produces g : 2^[n] → ℝ₊. Probably Mostly Approximately Correct: with probability ≥ 1 − δ, Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1 − ε. (The classical PAC model is the Boolean {0,1}-valued special case.)
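As a concrete reading of the PMAC guarantee, here is a minimal Python sketch (my own, not part of the talk) that estimates the "mostly approximately correct" probability Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] by sampling; the sampler shown for D is an arbitrary illustrative choice.

```python
import random

def pmac_success_rate(f, g, alpha, sampler, trials=10_000):
    """Estimate Pr_S[ g(S) <= f(S) <= alpha * g(S) ] with S drawn via sampler()."""
    hits = 0
    for _ in range(trials):
        S = sampler()
        if g(S) <= f(S) <= alpha * g(S):
            hits += 1
    return hits / trials

# Example distribution D: each item of [n] included independently with prob. 1/2.
n = 20
uniform_sampler = lambda: frozenset(i for i in range(n) if random.random() < 0.5)
```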

7 Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

8 Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).

9 Computing linear separators. Given {+,–}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates the +s from the –s. Easily solved by linear programming.
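A minimal Python sketch of the LP formulation (my own; the slide only asserts that linear programming suffices): we ask for c and b with c·x ≥ b + 1 on the + points and c·x ≤ b − 1 on the − points, a pure feasibility problem.

```python
import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(X_pos, X_neg):
    """Find (c, b) with c.x >= b+1 on positives and c.x <= b-1 on negatives."""
    n = X_pos.shape[1]
    # Variables z = (c_1..c_n, b); linprog enforces A_ub @ z <= b_ub.
    A_pos = np.hstack([-X_pos, np.ones((len(X_pos), 1))])   # -(c.x - b) <= -1
    A_neg = np.hstack([X_neg, -np.ones((len(X_neg), 1))])   #   c.x - b  <= -1
    A_ub = np.vstack([A_pos, A_neg])
    b_ub = -np.ones(len(A_ub))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.success else None

# Tiny example: two separable clusters in the plane.
pos = np.array([[2.0, 2.0], [3.0, 2.5]])
neg = np.array([[0.0, 0.0], [-1.0, 0.5]])
print(separating_hyperplane(pos, neg))
```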

10 Learning linear separators. Given a random sample of {+,–}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates most of the +s from the –s (a small number of errors is allowed). A classic machine-learning problem.

11 Learning linear separators. Classic theorem [Vapnik–Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε.

12 Approximating submodular functions. Existential result from last time: given a non-negative, monotone, submodular function f, there exists a linear function g such that f²(S) ≤ g(S) ≤ n·f²(S) for all S.

13 Approximating submodular functions. [Figure: the curves f² and n·f², with the linear function g sandwiched between them over all subsets of V.]

14 Approximating submodular functions. Randomly sample {S_1,…,S_k} from the distribution. Create a + labeled point for f²(S_i) and a – labeled point for n·f²(S_i). Now just learn a linear separator! [Figure: + points on the f² curve, – points on the n·f² curve, with the separator g between them.]
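Here is a minimal Python sketch of this reduction. The slide only says to label f²(S_i) with + and n·f²(S_i) with −; the specific embedding (two points in ℝ^{n+1} per sample, a homogeneous separator found with an off-the-shelf SVM, and skipping samples with f(S) = 0) is my own filling-in of details, so treat it as an assumption rather than the authors' exact algorithm.

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_submodular(samples, f, n):
    """Each sample S contributes (chi_S, f(S)^2) labeled +1 and (chi_S, n*f(S)^2)
    labeled -1; a homogeneous separator (w, -z) then yields a linear g with
    f(S)^2 <~ g(S) <~ n*f(S)^2 on most of the distribution."""
    X, y = [], []
    for S in samples:
        if f(S) == 0:          # zero-valued sets are skipped in this sketch
            continue
        chi = np.zeros(n)
        chi[list(S)] = 1.0
        X.append(np.append(chi, f(S) ** 2))
        y.append(+1)
        X.append(np.append(chi, n * f(S) ** 2))
        y.append(-1)
    clf = LinearSVC(fit_intercept=False, C=1e4).fit(np.array(X), np.array(y))
    w, z = clf.coef_[0][:n], -clf.coef_[0][n]
    # The learned linear function g(S) = (w . chi_S) / z.
    return lambda S: float(w[list(S)].sum()) / z
```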

15 Approximating submodular functions. Theorem: g approximates f to within a factor O(n^{1/2}) on a 1 − ε fraction of the distribution.

16 Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

17 Yesterday's lower bound. Yesterday's construction: a simple function with a single hidden "valley", where the value drops from about n^{1/2} down to about log n at a hidden set A_1. The distribution is uniform on sets of size n^{1/2}. Is this example hard to learn?

18 Lower bound for learning submodular functions. Can we have multiple "valleys" A_1, A_2, …? If there are exponentially many valleys, the algorithm can't learn all of them from polynomially many queries.

19 Matroids. Ground set V, family of independent sets I. Axioms: ∅ ∈ I ("nonempty"); J ⊂ I ∈ I ⇒ J ∈ I ("downwards closed"); J, I ∈ I and |J| < |I| ⇒ ∃ x ∈ I \ J s.t. J + x ∈ I ("maximum-size sets can be found greedily"). Rank function: r(S) = max{ |I| : I ∈ I and I ⊆ S }.
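The greedy characterization in the third axiom gives an immediate way to evaluate the rank function from an independence oracle alone. A minimal Python sketch (my own), illustrated on the vector matroid from slide 4:

```python
import numpy as np

def matroid_rank(S, is_independent):
    """r(S): grow an independent subset of S greedily; by the exchange axiom
    the greedy result is a maximum-size independent subset of S."""
    basis = set()
    for x in S:
        if is_independent(basis | {x}):
            basis.add(x)
    return len(basis)

# Example: the vector matroid, independence = linear independence of rows of V.
V = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)
lin_indep = lambda I: len(I) == 0 or np.linalg.matrix_rank(V[sorted(I)]) == len(I)
print(matroid_rank({0, 1, 2, 3}, lin_indep))  # 3
```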

20 Partition matroid. If V = A_1 ∪ ⋯ ∪ A_k is a partition into disjoint blocks, then I = { S ⊆ V : |S ∩ A_i| ≤ b_i for all i } is a matroid, called a partition matroid. [Figure: V split into disjoint blocks A_1, A_2 with at most 2 elements allowed from each.]

21 Intersecting A_i's. Our lower bound considers the question: what if the A_i's intersect? [Figure: V = {a, b, …, l} with overlapping blocks A_1, A_2 and bound ≤ 2 on each.] Then I is not a matroid: for example, {a,b,k,l} and {f,g,h} are both maximal sets in I, and maximal independent sets of different sizes violate the exchange axiom.
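"Checking a few cases" (as on the next slide) can be automated. Below is a minimal Python sketch (my own) that brute-forces the three matroid axioms on a tiny ground set; the particular blocks and bounds are a scaled-down analogue of the example above, chosen for illustration, and the overlapping case indeed fails the exchange axiom while disjoint blocks pass.

```python
from itertools import combinations

def is_matroid(family):
    """Brute-force check of the three axioms: nonempty, downward closed, exchange."""
    fam = {frozenset(S) for S in family}
    if frozenset() not in fam:
        return False
    if any(S - {x} not in fam for S in fam for x in S):          # downward closed
        return False
    return all(any(J | {x} in fam for x in I - J)                # exchange axiom
               for I in fam for J in fam if len(J) < len(I))

def bounded_family(V, blocks, bounds):
    """All S with |S & A_i| <= b_i for every block."""
    return [set(S) for k in range(len(V) + 1) for S in combinations(V, k)
            if all(len(set(S) & A) <= b for A, b in zip(blocks, bounds))]

V = 'abcdef'
print(is_matroid(bounded_family(V, [set('abc'), set('def')], [2, 2])))    # True: disjoint blocks
print(is_matroid(bounded_family(V, [set('abcd'), set('cdef')], [2, 2])))  # False: overlapping blocks
```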

22 A fix. [Figure: the same V and overlapping blocks, with the rank additionally truncated to 3.] After truncating the rank to 3, {a,b,k,l} ∉ I. Checking a few cases shows that I is now a matroid.

23 A general fix (for two A_i's). [Figure: overlapping blocks A_1, A_2 with bounds ≤ b_1 and ≤ b_2.] This works for any A_1, A_2 and bounds b_1, b_2 (unless b_1 + b_2 − |A_1 ∩ A_2| < 0). Summary: there is a matroid that behaves like a partition matroid, provided the b_i's are large relative to |A_1 ∩ A_2|.

24 The main question. Let V = A_1 ∪ ⋯ ∪ A_k and b_1,…,b_k ∈ ℕ. Is there a matroid such that r(A_i) ≤ b_i for all i, while r(S) is "as large as possible" for sets S not contained in the A_i's (this is not formal)? If the A_i's are disjoint, the solution is the partition matroid. If the A_i's are "almost disjoint", can we find a matroid that's "almost" a partition matroid? Next: formalize this.

25 Lossless expander graphs. Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if every u ∈ U has degree D, and |Γ(S)| ≥ (1 − ε)·D·|S| for all S ⊆ U with |S| ≤ K, where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }. "Every small left-set has a nearly-maximal number of right-neighbors."

26 Lossless expander graphs. Same definition: every u ∈ U has degree D, and |Γ(S)| ≥ (1 − ε)·D·|S| for all S ⊆ U with |S| ≤ K. Equivalently: "neighborhoods of left-vertices are K-wise almost disjoint." Why "lossless"? Spectral techniques cannot obtain ε < 1/2.

27 Trivial example: disjoint neighborhoods. If the left-vertices have disjoint neighborhoods, the definition is satisfied with ε = 0 and K = ∞.
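A minimal Python sketch (my own) that checks the definition by brute force; the tiny graph at the end is an arbitrary example of the disjoint-neighborhood case above.

```python
from itertools import combinations

def is_lossless_expander(neighbors, D, K, eps):
    """neighbors: dict mapping each left vertex to its set of right neighbors.
    Checks degree D and |Gamma(S)| >= (1 - eps) * D * |S| for all |S| <= K."""
    if any(len(N) != D for N in neighbors.values()):
        return False
    U = list(neighbors)
    return all(len(set.union(*(neighbors[u] for u in S))) >= (1 - eps) * D * len(S)
               for k in range(1, min(K, len(U)) + 1) for S in combinations(U, k))

# Disjoint neighborhoods: eps = 0 works for every K.
disjoint = {u: {3 * u, 3 * u + 1, 3 * u + 2} for u in range(4)}
print(is_lossless_expander(disjoint, D=3, K=4, eps=0.0))  # True
```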

28 Main theorem: trivial case. Suppose G = (U ∪ V, E) has disjoint left-neighborhoods. Let A = {A_1,…,A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1,…,b_k be non-negative integers. Theorem: I = { S ⊆ V : |S ∩ A_i| ≤ b_i for all i } is the family of independent sets of a matroid (the partition matroid). [Figure: U on the left, V on the right, blocks A_1, A_2 with bounds ≤ b_1, ≤ b_2.]

29 Main theorem (setup). Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1,…,A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1,…,b_k satisfy b_i ≥ 4εD for all i.

30 Main theorem. Same setup: G a (D, K, ε)-lossless expander, A = { Γ(u) : u ∈ U }, and b_i ≥ 4εD for all i. "Wishful thinking": I is a matroid, where I = { S ⊆ V : |S ∩ A_i| ≤ b_i for all i }, exactly as in the disjoint case.

31 Main theorem. Same setup: G a (D, K, ε)-lossless expander, A = { Γ(u) : u ∈ U }, and b_i ≥ 4εD for all i. Theorem: I is a matroid, for a suitably corrected version of the wishful-thinking family (generalizing the two-set fix above by accounting for the overlaps among the A_i's).

32 Main theorem: sanity check. In the trivial case where G has disjoint left-neighborhoods, i.e., K = ∞ and ε = 0, the corrected family reduces to the partition-matroid family { S : |S ∩ A_i| ≤ b_i for all i }.

33 Lower bound for learning submodular functions. Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander, where A_i = Γ(u_i) and: |V| = n; |U| = n^{log n}; D = K = n^{1/3}; ε = log²(n)/n^{1/3}. So there are n^{log n} valleys, and the depth of each valley is 1/ε = n^{1/3}/log²(n). [Figure: lattice from ∅ to V with valleys at A_1, A_2, …; typical value n^{1/3}, valley value log² n.]

34 Lower bound for learning submodular functions. Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander, where A_i = Γ(u_i) and |V| = n, |U| = n^{log n}, D = K = n^{1/3}, ε = log²(n)/n^{1/3}. Lower bound using (D, K, ε)-lossless expanders:
– Delete each node in U with probability ½, then use the main theorem to get a matroid.
– If u_i ∈ U was not deleted, then r(A_i) ≤ b_i = 4εD = O(log² n).
– Claim: if u_i was deleted, then A_i ∈ I (needs a proof), so r(A_i) = |A_i| = D = n^{1/3}.
– The distribution is uniform on the points { A_i : i ∈ U }.
– Since the number of A_i's is |U| = n^{log n}, no algorithm can learn a significant fraction of the r(A_i) values in polynomial time.

35 Lemma (slight extension of [Edmonds '70, Thm 15]): Let I = { I : |I ∩ C| ≤ f(C) for all C ∈ C }, where C is a family of subsets and f : C → ℤ is some function. For any I ∈ I, let T(I) = { C ∈ C : |I ∩ C| = f(C) } be the "tight sets" for I. Suppose that the union of any two tight sets is again tight (so T(J) has a unique maximal element). Then I is the family of independent sets of a matroid.
Proof: Let J, I ∈ I with |J| < |I|. We must show ∃ x ∈ I \ J s.t. J + x ∈ I. Let C be the maximal set in T(J). Then |I ∩ C| ≤ f(C) = |J ∩ C|. Since |I| > |J|, ∃ x ∈ I \ (C ∪ J). We must have J + x ∈ I, because every C' ∋ x satisfies C' ∉ T(J), hence |(J + x) ∩ C'| ≤ f(C'). So J + x ∈ I. ∎

36 Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

37 Gross Substitutes Functions Class of utility functions commonly used in mechanism design [Kelso-Crawford ‘82, Gul-Stacchetti ‘99, Milgrom ‘00, …] Intuitively, increasing the prices for some items does not decrease demand for the other items. Question: [Blumrosen-Nisan, Bing-Lehman-Milgrom] Do GS functions have a concise representation?

38 Gross substitutes functions. Class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …]. Question [Blumrosen-Nisan, Bing-Lehman-Milgrom]: do GS functions have a concise representation? Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f s.t. f is a matroid rank function, and poly(n) bits of information do not suffice to predict the value of f on samples from D, even to within a factor õ(n^{1/3}). Fact: every matroid rank function is GS. Corollary: the answer to the question is no.

39 Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

40 Learning submodular functions under a product distribution. Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1). Hypotheses: Pr_{X∼D}[ X = x ] = ∏_i Pr[ X_i = x_i ] ("product distribution"); f({i}) ∈ [0,1] for all i ∈ [n] ("Lipschitz function"), or the stronger condition f({i}) ∈ {0,1} for all i ∈ [n].

41 Technical theorem: for any ε > 0, there exists a concave function h : [0,n] → ℝ s.t. for every k ∈ [n], and for a 1 − ε fraction of S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k). In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.
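Since h(k) is just an expectation, it can be estimated directly. A minimal Python sketch (my own); the example function at the end is an arbitrary concave-of-cardinality valuation, for which h(k) = √k exactly.

```python
import math
import random

def estimate_h(f, n, k, trials=2000):
    """Monte Carlo estimate of h(k) = E[f(S)] over uniformly random k-subsets of [n]."""
    return sum(f(frozenset(random.sample(range(n), k)))
               for _ in range(trials)) / trials

print(estimate_h(lambda S: math.sqrt(len(S)), n=100, k=25))  # 5.0, since h(k) = sqrt(k)
```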

42 Concentration. Let f : 2^[n] → ℝ be the linear function f(S) = Σ_{i∈S} a_i, where each a_i ∈ [0,1], and let X have a product distribution on [n] (each i included in X independently). Chernoff bound: for any α ∈ [0,1], Pr[ |f(X) − E[f(X)]| > α·E[f(X)] ] ≤ 2·exp(−Ω(α²·E[f(X)])). What if f is an arbitrary Lipschitz function? Azuma's inequality only gives Pr[ |f(X) − E[f(X)]| > α·E[f(X)] ] ≤ 2·exp(−Ω(α²·E[f(X)]²/n)), which is useless unless E[f(X)] ≥ n^{1/2}.
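A minimal Python simulation (my own) of the point being made: for a linear function under a product distribution, relative deviations of size α are exponentially rare in E[f(X)], so with 400 items the empirical rate below is tiny.

```python
import random

def deviation_rate(a, p, alpha, trials=20_000):
    """Fraction of samples with |f(X) - E[f(X)]| > alpha * E[f(X)],
    where f(X) = sum of a_i over included items and each item is in X w.p. p."""
    n = len(a)
    mean = sum(a) * p
    big = 0
    for _ in range(trials):
        val = sum(a[i] for i in range(n) if random.random() < p)
        if abs(val - mean) > alpha * mean:
            big += 1
    return big / trials

print(deviation_rate(a=[1.0] * 400, p=0.5, alpha=0.2))  # essentially 0, per Chernoff
```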

43 Talagrand's inequality. Definition: the function f : 2^[n] → ℝ is certifiable if, whenever f(S) ≥ x, there exists a set I ⊆ S with |I| ≤ x such that f(T) ≥ x whenever I ⊆ T. Theorem [Talagrand]: for any Lipschitz, certifiable function f and any α ∈ [0,1], f(X) enjoys Chernoff-type concentration (deviations of order α·E[f(X)], not α·n^{1/2}). Suppose f is a matroid rank function. Is it certifiable? Yes! Just let I be a maximal independent subset of S.

44 Concentration for matroid rank functions. Linear functions: let f : 2^[n] → ℝ be f(S) = Σ_{i∈S} a_i with each a_i ∈ [0,1]; Chernoff gives, for any α ∈ [0,1], Pr[ |f(X) − E[f(X)]| > α·E[f(X)] ] ≤ 2·exp(−α²·E[f(X)]/3). Matroid rank functions: let f : 2^[n] → ℝ be a matroid rank function. Theorem: for any α ∈ [0,1], the same form of bound holds with the constant 3 in the exponent replaced by 422. Chekuri–Vondrák–Zenklusen '10 improve 422 to 3.
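A minimal Python simulation (my own) of the matroid-rank case, using a partition matroid (whose rank is r(S) = Σ_i min(|S ∩ A_i|, b_i)) as a concrete matroid rank function; the block sizes and inclusion probability are arbitrary illustrative choices.

```python
import random
import statistics

n, block, b, p = 300, 10, 2, 0.5
blocks = [set(range(j, j + block)) for j in range(0, n, block)]
rank = lambda S: sum(min(len(S & A), b) for A in blocks)  # partition-matroid rank

# Sample X from the product distribution and look at how tightly rank(X) clusters.
samples = [rank({i for i in range(n) if random.random() < p}) for _ in range(5000)]
mu = statistics.mean(samples)
print(mu, max(abs(x - mu) for x in samples) / mu)  # relative deviations stay small
```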

45 Conclusions
Learning-theoretic view of submodular functions.
Structural properties:
– Very "bumpy" under arbitrary distributions.
– Very "smooth" under product distributions.
Learnability in the PMAC model:
– O(n^{1/2}) approximation algorithm.
– Ω̃(n^{1/3}) inapproximability.
– O(1) approximation for Lipschitz functions and product distributions.
No concise representation for gross substitutes functions.

