1
Approximating Submodular Functions, Part 2
Nick Harvey, University of British Columbia, Department of Computer Science
July 12th, 2015
Joint work with Nina Balcan (CMU)
2
Valuation Functions
n items, {1,2,…,n} = [n]. A valuation function f : 2^[n] → ℝ assigns a value to each subset of items. Focus on combinatorial settings: individuals have valuation functions giving utility for different outcomes or events.
This talk: learning valuation functions from a distribution on the data.
Motivating application: bundle pricing.
3
Submodular valuations
[n] = {1,…,n}. A function f : 2^[n] → ℝ is submodular if for all S, T ⊆ [n]: f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T).
Equivalent to decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S). (Adding x to the smaller set T gives a large improvement; adding it to the larger set S gives a small improvement.)
4
Submodular valuations: examples
Concave functions: let h : ℝ → ℝ be concave; for each S ⊆ [n], let f(S) = h(|S|).
Vector spaces: let V = {v_1, …, v_n}, each v_i ∈ ℝ^d; for each S ⊆ [n], let f(S) = dim span{ v_i : i ∈ S }.
Both examples satisfy decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
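Both example classes are easy to check directly on a small ground set. A minimal sketch (not part of the talk); the concave function h(x) = √x and the random vectors in ℝ³ are illustrative choices:

import itertools
import math
import numpy as np

n = 5
ground = range(n)

def f_concave(S):
    # f(S) = h(|S|) for a concave h; here h(x) = sqrt(x)
    return math.sqrt(len(S))

rng = np.random.default_rng(0)
vectors = rng.standard_normal((n, 3))   # v_1, ..., v_n in R^3

def f_rank(S):
    # f(S) = dim span{ v_i : i in S }
    if not S:
        return 0
    return np.linalg.matrix_rank(vectors[list(S)])

def is_submodular(f):
    # Brute-force check of f(S) + f(T) >= f(S ∪ T) + f(S ∩ T) over all pairs
    subsets = [frozenset(c) for k in range(n + 1)
               for c in itertools.combinations(ground, k)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T) - 1e-9
               for S in subsets for T in subsets)

print(is_submodular(f_concave), is_submodular(f_rank))  # True True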
5
Passive Supervised Learning
Data source: distribution D on 2^[n]. Expert / oracle: f : 2^[n] → ℝ⁺.
Labeled examples (S_1, f(S_1)), …, (S_k, f(S_k)), with S_1, …, S_k drawn from D, are given to the learning algorithm, which outputs g : 2^[n] → ℝ⁺.
6
PMAC model for learning real-valued functions
Data source: distribution D on 2^[n]. Expert / oracle: f : 2^[n] → ℝ⁺.
The algorithm sees (S_1, f(S_1)), …, (S_k, f(S_k)), with the S_i i.i.d. from D, and produces g : 2^[n] → ℝ⁺.
Probably Mostly Approximately Correct: with probability ≥ 1 − δ, we have Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1 − ε.
(The classical PAC model is the Boolean {0,1} special case.)
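A minimal sketch (not from the talk) of how the PMAC success criterion could be estimated empirically; f, g, sample_set, and alpha are placeholders for a target function, a hypothesis, a sampler for D, and an approximation factor:

def pmac_success_rate(f, g, sample_set, alpha, num_samples=10_000):
    hits = 0
    for _ in range(num_samples):
        S = sample_set()                      # S drawn from D
        if g(S) <= f(S) <= alpha * g(S):      # g under-estimates, alpha*g over-estimates
            hits += 1
    return hits / num_samples                 # PMAC requires this to be >= 1 - eps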
7
Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^(1/2)).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^(1/3)).
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.
8
Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^(1/2)).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^(1/3)).
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
9
Computing Linear Separators
Given {+,–}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates the +s from the –s. Easily solved by linear programming.
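A minimal sketch (not from the talk) of this LP, assuming the points are strictly separable: we solve the feasibility program c·x ≥ b + 1 for + points and c·x ≤ b − 1 for − points.

import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(pos, neg):
    """pos, neg: arrays of shape (m, n) holding the + and - labeled points."""
    n = pos.shape[1]
    # Variables: c (n entries) followed by b (1 entry).
    #   + point x:  -(c.x - b) <= -1        - point x:  c.x - b <= -1
    A_ub = np.vstack([np.hstack([-pos, np.ones((len(pos), 1))]),
                      np.hstack([ neg, -np.ones((len(neg), 1))])])
    b_ub = -np.ones(len(pos) + len(neg))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))   # pure feasibility LP
    if not res.success:
        return None                      # not linearly separable
    return res.x[:n], res.x[n]           # (c, b)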
10
Learning Linear Separators
Given a random sample of {+,–}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates most of the +s from the –s (a few errors are allowed). A classic machine learning problem.
11
Learning Linear Separators
Classic Theorem [Vapnik-Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε.
12
Approximating Submodular Functions
Existential result from last time, in other words: given a non-negative, monotone, submodular function f, there exists a linear function g such that f(S)² ≤ g(S) ≤ n · f(S)² for all S.
13
Approximating Submodular Functions
(Figure: over the subsets of V, the linear function g lies between f² and n·f².)
14
Randomly sample {S_1, …, S_k} from the distribution. Create a + point for f²(S_i) and a – point for n·f²(S_i). Now just learn a linear separator!
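A minimal sketch (not from the talk) of this reduction: each sample S is mapped to two points in ℝ^(n+1), its indicator vector paired with f(S)² (labeled +) and with n·f(S)² (labeled −). A separating hyperplane then yields a linear function sandwiched between f² and n·f² on most of the distribution; separating_hyperplane is the LP routine sketched earlier.

import numpy as np

def build_reduction(samples, f, n):
    pos, neg = [], []
    for S in samples:
        chi = np.zeros(n)
        chi[list(S)] = 1.0                         # indicator vector of S
        pos.append(np.append(chi, f(S) ** 2))      # (chi_S, f(S)^2)    labeled +
        neg.append(np.append(chi, n * f(S) ** 2))  # (chi_S, n*f(S)^2)  labeled -
    return np.array(pos), np.array(neg)

# pos, neg = build_reduction(samples, f, n)
# c, b = separating_hyperplane(pos, neg)   # read off the linear predictor from (c, b)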
15
Theorem: g approximates f to within a factor of √n on a 1 − ε fraction of the distribution.
16
Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^(1/2)).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^(1/3)).
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.
17
Yesterday’s Lower Bound
(Figure: the subset lattice from ∅ to V with a single hidden valley at A_1; value scales n^(1/2) and log n.)
Yesterday’s construction: a simple function with a hidden “valley.” The distribution is uniform on sets of size n^(1/2). Is this example hard to learn?
18
LB for Learning Submodular Functions
Can we have multiple “valleys”? If there are exponentially many valleys, the algorithm can’t learn all of them with polynomially many queries.
19
Matroids
Ground set V, family of independent sets I. Axioms:
– ∅ ∈ I ("nonempty")
– J ⊂ I ∈ I ⇒ J ∈ I ("downwards closed")
– J, I ∈ I and |J| < |I| ⇒ ∃ x ∈ I \ J s.t. J + x ∈ I ("maximum-size sets can be found greedily")
Rank function: r(S) = max{ |I| : I ∈ I and I ⊆ S }.
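A minimal sketch (not from the talk) of the rank function r(S) computed greedily from an independence oracle; the third axiom is exactly what makes this greedy computation correct:

def rank(S, is_independent):
    I = set()
    for x in S:
        if is_independent(I | {x}):   # extend the independent set whenever possible
            I.add(x)
    return len(I)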
20
Partition Matroid
(Figure: V split into disjoint blocks A_1 and A_2, each with capacity ≤ 2.) This is a matroid. In general, if V = A_1 ∪ ⋯ ∪ A_k with the A_i disjoint, then I = { I : |I ∩ A_i| ≤ b_i for all i } is a partition matroid.
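A minimal sketch (not from the talk) of the partition-matroid independence test |I ∩ A_i| ≤ b_i, with blocks and capacities given as parallel lists:

def partition_matroid_independent(I, blocks, capacities):
    return all(len(I & A) <= b for A, b in zip(blocks, capacities))

# Example: V = {0,...,5}, blocks A_1 = {0,1,2}, A_2 = {3,4,5}, capacities b = (2, 2).
blocks = [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
print(partition_matroid_independent({0, 1, 3}, blocks, (2, 2)))  # True
print(partition_matroid_independent({0, 1, 2}, blocks, (2, 2)))  # False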
21
Intersecting A_i’s
(Figure: ground set V = {a, b, …, l}, with overlapping sets A_1, A_2 and capacity bound ≤ 2.)
Our lower bound considers the question: what if the A_i’s intersect? Then I is not a matroid. For example, {a,b,k,l} and {f,g,h} are both maximal sets in I.
22
A fix
After truncating the rank to 3, {a,b,k,l} ∉ I. Checking a few cases shows that I is a matroid.
23
A general fix (for two A_i’s)
(Figure: capacities ≤ b_1 on A_1 and ≤ b_2 on A_2, with the overall rank truncated.)
This works for any A_1, A_2 and bounds b_1, b_2 (unless b_1 + b_2 − |A_1 ∩ A_2| < 0).
Summary: there is a matroid that’s like a partition matroid, provided the b_i’s are large relative to |A_1 ∩ A_2|.
24
The Main Question
Let V = A_1 ∪ ⋯ ∪ A_k and b_1, …, b_k ∈ ℕ. Is there a matroid s.t.
– r(A_i) ≤ b_i for all i, and
– r(S) is “as large as possible” for S ⊄ A_i (this is not formal)?
If the A_i’s are disjoint, the solution is a partition matroid. If the A_i’s are “almost disjoint,” can we find a matroid that’s “almost” a partition matroid? Next: formalize this.
25
Lossless Expander Graphs
Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
– every u ∈ U has degree D, and
– |Γ(S)| ≥ (1 − ε) · D · |S| for all S ⊆ U with |S| ≤ K,
where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }.
“Every small left-set has a nearly-maximal number of right-neighbors.”
26
Lossless Expander Graphs
Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
– every u ∈ U has degree D, and
– |Γ(S)| ≥ (1 − ε) · D · |S| for all S ⊆ U with |S| ≤ K,
where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }.
“Neighborhoods of left-vertices are K-wise-almost-disjoint.”
Why “lossless”? Spectral techniques cannot obtain ε < 1/2.
27
Trivial Example: Disjoint Neighborhoods
Definition (repeated): G = (U ∪ V, E) is a (D, K, ε)-lossless expander if every u ∈ U has degree D and |Γ(S)| ≥ (1 − ε) · D · |S| for all S ⊆ U with |S| ≤ K, where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }.
If left-vertices have disjoint neighborhoods, this gives an expander with ε = 0 and K = ∞.
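A minimal sketch (not from the talk): a brute-force check of the (D, K, ε)-lossless expansion property, feasible only for tiny graphs; neighbors maps each left-vertex to its set of right-neighbors.

from itertools import combinations

def is_lossless_expander(neighbors, D, K, eps):
    if any(len(N) != D for N in neighbors.values()):
        return False                                  # left degrees must equal D
    left = list(neighbors)
    for k in range(1, min(K, len(left)) + 1):
        for S in combinations(left, k):
            gamma = set().union(*(neighbors[u] for u in S))
            if len(gamma) < (1 - eps) * D * k:        # |Γ(S)| >= (1-eps) * D * |S|
                return False
    return True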
28
Main Theorem: Trivial Case
Suppose G = (U ∪ V, E) has disjoint left-neighborhoods. Let A = {A_1, …, A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1, …, b_k be non-negative integers.
Theorem: I = { I : |I ∩ A_i| ≤ b_i for all i } is the family of independent sets of a matroid.
29
Main Theorem
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1, …, A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1, …, b_k satisfy b_i ≥ 4εD for all i.
30
Main Theorem
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1, …, A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1, …, b_k satisfy b_i ≥ 4εD for all i.
“Wishful thinking”: I = { I : |I ∩ A_i| ≤ b_i for all i } is a matroid, just as in the disjoint case.
31
Main Theorem
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1, …, A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1, …, b_k satisfy b_i ≥ 4εD for all i.
Theorem: I is a matroid, where I is defined by a suitably corrected version of these constraints (the precise definition is given on the slide).
32
Main Theorem
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander. Let A = {A_1, …, A_k} be defined by A = { Γ(u) : u ∈ U }, and let b_1, …, b_k satisfy b_i ≥ 4εD for all i.
Theorem: I is a matroid (with I as defined on the previous slide).
Trivial case: G has disjoint neighborhoods, i.e., K = ∞ and ε = 0; the correction terms vanish and we recover the trivial case above.
33
LB for Learning Submodular Functions
(Figure: the subset lattice from ∅ to V with many valleys A_1, A_2, …; value scales n^(1/3) and log² n.)
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander, where A_i = Γ(u_i) and
– |V| = n, |U| = n^(log n)
– D = K = n^(1/3), ε = log²(n)/n^(1/3)
So there are n^(log n) valleys. The depth of each valley is 1/ε = n^(1/3)/log²(n).
34
LB for Learning Submodular Functions
Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander, where A_i = Γ(u_i) and
– |V| = n, |U| = n^(log n)
– D = K = n^(1/3), ε = log²(n)/n^(1/3)
Lower bound using (D, K, ε)-lossless expanders:
– Delete each node in U with probability ½, then use the main theorem to get a matroid.
– If u_i ∈ U was not deleted, then r(A_i) ≤ b_i = 4εD = O(log² n).
– Claim: if u_i was deleted, then A_i ∈ I (needs a proof), so r(A_i) = |A_i| = D = n^(1/3).
– The distribution is uniform on the points { A_i : u_i ∈ U }.
– Since the number of A_i’s is |U| = n^(log n), no algorithm can learn a significant fraction of the r(A_i) values in polynomial time.
35
Lemma (slight extension of [Edmonds ’70, Thm 15]): Let I = { I : |I ∩ C| ≤ f(C) for all C ∈ C }, where f : C → ℤ is some function. For any I ∈ I, let T(I) = { C ∈ C : |I ∩ C| = f(C) } be the “tight sets” for I. Suppose that the union of any two tight sets is again tight (so T(I) has a unique maximal element). Then I is the family of independent sets of a matroid.
Proof: Let J, I ∈ I with |J| < |I|. We must show ∃ x ∈ I \ J s.t. J + x ∈ I. Let C be the maximal set in T(J). Then |I ∩ C| ≤ f(C) = |J ∩ C|. Since |I| > |J|, there exists x ∈ I \ (C ∪ J). We must have J + x ∈ I, because every C’ ∋ x has C’ ∉ T(J) (by maximality of C), so |(J + x) ∩ C’| ≤ f(C’). Hence J + x ∈ I.
36
Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^(1/2)).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^(1/3)).
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.
37
Gross Substitutes Functions
A class of utility functions commonly used in mechanism design [Kelso-Crawford ’82, Gul-Stacchetti ’99, Milgrom ’00, …]. Intuitively, increasing the prices of some items does not decrease demand for the other items.
Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: Do GS functions have a concise representation?
38
Gross Substitutes Functions
A class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …].
Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: Do GS functions have a concise representation?
Theorem (main lower bound construction): There is a distribution D and a randomly chosen function f s.t. f is a matroid rank function, and poly(n) bits of information do not suffice to predict the value of f on samples from D, even to within a factor õ(n^(1/3)).
Fact: Every matroid rank function is GS.
Corollary: The answer to the question is no.
39
Learning submodular functions
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^(1/2)).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^(1/3)).
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.
40
Learning submodular functions
Hypotheses:
– Pr_{X∼D}[ X = x ] = ∏_i Pr[ X_i = x_i ] (“product distribution”)
– f({i}) ∈ [0,1] for all i ∈ [n] (“Lipschitz function”)
– f({i}) ∈ {0,1} for all i ∈ [n] (a stronger condition!)
Theorem (product distributions): Lipschitz, monotone, submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
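A minimal sketch (not from the talk) of sampling a random set X from a product distribution on [n], where element i is included independently with probability p[i]:

import random

def sample_product(p):
    return frozenset(i for i, pi in enumerate(p) if random.random() < pi)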
41
Technical Theorem: For any ε > 0, there exists a concave function h : [0,n] → ℝ s.t. for every k ∈ [n], and for a 1 − ε fraction of S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k).
In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.
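A minimal sketch (not from the talk): estimating h(k) = E[ f(S) : |S| = k ] by sampling uniform size-k subsets, and measuring how often a random size-k set satisfies the sandwich h(k) ≤ f(S) ≤ C·h(k) for a given factor C.

import random

def estimate_h(f, n, k, trials=2000):
    return sum(f(frozenset(random.sample(range(n), k))) for _ in range(trials)) / trials

def sandwich_fraction(f, n, k, C, trials=2000):
    hk = estimate_h(f, n, k)
    good = sum(hk <= f(frozenset(random.sample(range(n), k))) <= C * hk
               for _ in range(trials))
    return good / trials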
42
Concentration
Let f : 2^[n] → ℝ be the function f(S) = Σ_{i∈S} a_i, where each a_i ∈ [0,1]. Let X have a product distribution on [n] (each i is included in X independently).
Chernoff bound: for any α ∈ [0,1], f(X) deviates from E[f(X)] by a factor α with probability exponentially small in α²·E[f(X)].
What if f is an arbitrary Lipschitz function? Azuma’s inequality gives a bound that is only exponentially small in α²·E[f(X)]²/n, which is useless unless E[f(X)] ≥ n^(1/2). (Standard forms of both bounds are sketched below.)
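The exact inequalities are in the slide images; the standard forms (assumed here, consistent with the remarks above) are roughly:

\[
  \Pr\bigl[\, |f(X) - \mathbb{E}[f(X)]| \ge \alpha\,\mathbb{E}[f(X)] \,\bigr]
  \;\le\; 2\exp\!\bigl( -\alpha^{2}\,\mathbb{E}[f(X)]/3 \bigr)
  \qquad \text{(Chernoff, $f$ linear)}
\]
\[
  \Pr\bigl[\, |f(X) - \mathbb{E}[f(X)]| \ge \alpha\,\mathbb{E}[f(X)] \,\bigr]
  \;\le\; 2\exp\!\bigl( -\alpha^{2}\,\mathbb{E}[f(X)]^{2}/(2n) \bigr)
  \qquad \text{(Azuma, $f$ 1-Lipschitz)}
\]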
43
Talagrand’s Inequality
Def: The function f : 2^[n] → ℝ is certifiable if, whenever f(S) ≥ x, there exists a set I ⊆ S with |I| ≤ x such that f(T) ≥ x whenever I ⊆ T.
Theorem [Talagrand]: Any Lipschitz, certifiable function f satisfies, for any α ∈ [0,1], a strong (exponential) concentration bound.
Suppose f is a matroid rank function. Is it certifiable? Yes! Just let I be a maximal independent subset of S.
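A minimal sketch (not from the talk) of the certificate for a matroid rank function r: if r(S) ≥ x (x an integer), greedily pick an independent subset I of S of size x; then r(T) ≥ |I| = x for every T ⊇ I, which is exactly the certifiability condition with |I| ≤ x. is_independent is an independence oracle.

def certificate(S, x, is_independent):
    I = set()
    for e in S:
        if len(I) == x:
            break
        if is_independent(I | {e}):
            I.add(e)
    return I if len(I) == x else None   # None means r(S) < x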
44
Concentration for Matroid Rank Functions
Linear functions: let f : 2^[n] → ℝ be f(S) = Σ_{i∈S} a_i, where each a_i ∈ [0,1]. Chernoff: for any α ∈ [0,1], f(X) is exponentially concentrated around its mean.
Matroid rank functions: let f : 2^[n] → ℝ be a matroid rank function. Theorem: for any α ∈ [0,1], a bound of the same shape holds, with a weaker constant (422) in the exponent. Chekuri-Vondrák-Zenklusen ’10 improve 422 to 3.
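The bounds on this slide are in images; a plausible common shape (an assumption, not taken verbatim from the slide) is the following, where c is the constant in question (c = 3 for the Chernoff case, c = 422 for matroid rank functions, improved to 3 by Chekuri-Vondrák-Zenklusen ’10):

\[
  \Pr\bigl[\, |f(X) - \mathbb{E}[f(X)]| \ge \alpha\,\mathbb{E}[f(X)] \,\bigr]
  \;\le\; 2\exp\!\Bigl( -\tfrac{\alpha^{2}\,\mathbb{E}[f(X)]}{c} \Bigr).
\]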
45
Conclusions
A learning-theoretic view of submodular functions.
Structural properties:
– very “bumpy” under arbitrary distributions
– very “smooth” under product distributions
Learnability in the PMAC model:
– O(n^(1/2)) approximation algorithm
– Ω̃(n^(1/3)) inapproximability
– O(1) approximation for Lipschitz functions & product distributions
No concise representation for gross substitutes.