Presentation on theme: "Submodular Functions Learnability, Structure & Optimization Nick Harvey, UBC CS Maria-Florina Balcan, Georgia Tech."— Presentation transcript:

1 Submodular Functions: Learnability, Structure & Optimization. Nick Harvey (UBC CS), Maria-Florina Balcan (Georgia Tech)

2 Who studies submodular functions? Operations research and optimization, machine learning, algorithmic game theory and economics, and approximation algorithms in CS.

3 Valuation Functions. A first step in economic modeling: individuals have valuation functions, f(outcome) → R, giving their utility for different outcomes or events.

4 Valuation Functions. A first step in economic modeling: individuals have valuation functions giving utility for different outcomes or events. Focus on combinatorial settings: n items, {1,2,…,n} = [n], and f : 2^[n] → R.

5 Learning Valuation Functions. This talk: learning valuation functions from past data. Examples: package travel deals, bundle pricing.

6 Submodular valuations. Let [n] = {1,…,n}. A function f : 2^[n] → R is submodular if f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for all S, T ⊆ [n]. This is equivalent to decreasing marginal returns: for all T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S); adding x to the smaller set T gives a large improvement, adding x to the larger set S gives a small improvement.
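To make the two definitions concrete, here is a minimal brute-force checker (my sketch, not from the talk; the function names and the example f(S) = min{|S|, 2} are illustrative assumptions only):

```python
from itertools import combinations

def subsets(ground):
    ground = list(ground)
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_submodular_lattice(f, ground):
    # f(S) + f(T) >= f(S | T) + f(S & T) for all S, T
    subs = subsets(ground)
    return all(f(S) + f(T) >= f(S | T) + f(S & T) - 1e-9
               for S in subs for T in subs)

def has_decreasing_marginals(f, ground):
    # f(T + x) - f(T) >= f(S + x) - f(S) whenever T is a subset of S and x lies outside S
    ground = frozenset(ground)
    return all(f(T | {x}) - f(T) >= f(S | {x}) - f(S) - 1e-9
               for S in subsets(ground) for T in subsets(S)
               for x in ground - S)

if __name__ == "__main__":
    ground = {1, 2, 3, 4}
    f = lambda S: min(len(S), 2)   # a simple monotone submodular function
    print(is_submodular_lattice(f, ground), has_decreasing_marginals(f, ground))
```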

7 Submodular valuations: examples. Concave functions: let h : R → R be concave and, for each S ⊆ [n], let f(S) = h(|S|). Vector spaces: let V = {v_1,…,v_n} with each v_i ∈ F^n and, for each S ⊆ [n], let f(S) = rank({v_i : i ∈ S}). Both exhibit decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S).
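A small sketch of both examples (mine, not the talk's; the choice h(t) = sqrt(t) and the sample vectors are assumptions for illustration):

```python
import math
import numpy as np

def concave_of_cardinality(S):
    # f(S) = h(|S|) with h concave; here h(t) = sqrt(t) as an illustrative choice
    return math.sqrt(len(S))

def make_rank_function(vectors):
    # vectors: a list of 1-D numpy arrays v_1, ..., v_n over the reals
    def f(S):
        if not S:
            return 0
        return int(np.linalg.matrix_rank(np.stack([vectors[i] for i in S])))
    return f

vs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])]
rank_f = make_rank_function(vs)
print(concave_of_cardinality({0, 1}), rank_f({0, 1, 2}))   # ~1.414 and 2
```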

8 Passive Supervised Learning. A data source draws sets S_1,…,S_k from a distribution D on 2^[n]; an expert/oracle labels them with a target function f : 2^[n] → R+; the learning algorithm receives the labeled examples (S_1, f(S_1)),…,(S_k, f(S_k)) and outputs a hypothesis g : 2^[n] → R+.

9 PMAC model for learning real-valued functions. As before, the data source is a distribution D on 2^[n] and the target is f : 2^[n] → R+. The algorithm sees (S_1, f(S_1)),…,(S_k, f(S_k)) with the S_i drawn i.i.d. from D, and produces a hypothesis g : 2^[n] → R+. Probably Mostly Approximately Correct: with probability ≥ 1-δ over the samples, Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1-ε. (The PAC model is the Boolean {0,1}-valued special case.)
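Written as a single formula, my rendering of the guarantee just stated is:

```latex
\Pr_{S_1,\dots,S_k \sim D}\Bigl[\ \Pr_{S \sim D}\bigl[\, g(S) \le f(S) \le \alpha\, g(S) \,\bigr] \ge 1-\epsilon\ \Bigr] \ \ge\ 1-\delta .
```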

10 Learning submodular functions.
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

11 Learning submodular functions.
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

12 Computing Linear Separators. Given {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates the +s from the –s. Easily solved by linear programming.
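A minimal sketch of this LP (mine, not the talk's; the margin-1 normalization and the helper name are choices made here for illustration), using scipy's linprog as a feasibility solver:

```python
import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(pos, neg):
    """Find (c, b) with c.x >= b + 1 on `pos` and c.x <= b - 1 on `neg`, or return None."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    n = pos.shape[1]
    # Variables z = (c_1, ..., c_n, b); write both constraint families as A z <= rhs.
    A_pos = np.hstack([-pos, np.ones((len(pos), 1))])    # -c.x + b <= -1
    A_neg = np.hstack([neg, -np.ones((len(neg), 1))])    #  c.x - b <= -1
    A = np.vstack([A_pos, A_neg])
    rhs = -np.ones(len(A))
    res = linprog(c=np.zeros(n + 1), A_ub=A, b_ub=rhs,
                  bounds=[(None, None)] * (n + 1))       # pure feasibility LP
    return (res.x[:n], res.x[n]) if res.success else None

# Points on either side of the line x1 + x2 = 1 in R^2 are separable.
print(separating_hyperplane([[2, 2], [1.5, 1]], [[0, 0], [0.2, 0.1]]))
```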

13 Learning Linear Separators. Given a random sample of {+,–}-labeled points in R^n, find a hyperplane c^T x = b that separates most of the +s from the –s (a small number of errors is allowed). A classic machine learning problem.

14 Learning Linear Separators. Classic Theorem [Vapnik-Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε.

15 Submodular Functions are Approximately Linear. Let f be non-negative, monotone and submodular: non-negativity means f(S) ≥ 0 for all S ⊆ V; monotonicity means f(S) ≤ f(T) for all S ⊆ T; submodularity means f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S, T ⊆ V. Claim: f can be approximated to within a factor n by a linear function g. Proof sketch: let g(S) = Σ_{s ∈ S} f({s}); then f(S) ≤ g(S) by decreasing marginal returns, and g(S) ≤ n·f(S) since each f({s}) ≤ f(S) by monotonicity.
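For completeness, the standard argument behind the proof sketch, written out in my own rendering:

```latex
% Order S = \{s_1,\dots,s_m\} and telescope.  Decreasing marginal returns with
% T = \emptyset \subseteq \{s_1,\dots,s_{j-1}\} bounds each marginal gain by
% f(\{s_j\}) - f(\emptyset), and non-negativity gives f(\emptyset) \ge 0, so
f(S) = f(\emptyset) + \sum_{j=1}^{m}\bigl[f(\{s_1,\dots,s_j\}) - f(\{s_1,\dots,s_{j-1}\})\bigr]
     \le f(\emptyset) + \sum_{j=1}^{m}\bigl[f(\{s_j\}) - f(\emptyset)\bigr]
     \le \sum_{j=1}^{m} f(\{s_j\}) = g(S).
% Monotonicity gives f(\{s\}) \le f(S) for every s \in S, hence
g(S) = \sum_{s\in S} f(\{s\}) \le |S|\, f(S) \le n\, f(S).
```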

16 Submodular Functions are Approximately Linear. [Figure: over the lattice from ∅ to V, the linear function g is sandwiched between f and n·f.]

17 [Figure, continuing slide 16.] Randomly sample {S_1,…,S_k} from the distribution; create a + point for each f(S_i) and a – point for each n·f(S_i). Now just learn a linear separator!
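A schematic of this step (my sketch; the sample sets and values are made up for illustration): each example (S_i, f(S_i)) becomes one + point and one – point in R^{n+1}, which can then be fed to a linear-separator solver such as the slide-12 LP sketch.

```python
import numpy as np

def indicator(S, n):
    x = np.zeros(n)
    x[list(S)] = 1.0
    return x

def reduction_points(samples, n):
    """samples: list of (S_i, f(S_i)) pairs, each S_i a subset of range(n)."""
    pos = [np.append(indicator(S, n), val) for S, val in samples]       # "+" at height f(S_i)
    neg = [np.append(indicator(S, n), n * val) for S, val in samples]   # "-" at height n*f(S_i)
    return pos, neg

samples = [({0, 1}, 2.0), ({2}, 1.0), ({0, 1, 2}, 2.5)]
pos, neg = reduction_points(samples, n=4)
# A separator c^T (x, y) = b between `pos` and `neg` (e.g. from the slide-12 LP
# sketch) induces a linear function of x lying between f(S) and n*f(S) on most sets.
```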

18 Theorem: g approximates f to within a factor n on a 1-ε fraction of the distribution.

19 Can improve to O(n^{1/2}): in fact f² and n·f² are separated by a linear function [Goemans et al. '09]. John's ellipsoid theorem: any centrally symmetric convex body is approximated by an ellipsoid to within a factor n^{1/2}.

20 Learning submodular functions.
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

21 f(S) = min{ |S|, k }; equivalently, f(S) = |S| if |S| ≤ k, and f(S) = k otherwise. [Figure: the lattice from ∅ to V.]

22 Add a "bump" at a set A: f(S) = |S| (if |S| ≤ k and S ≠ A), k-1 (if S = A), k (otherwise).

23 More bumps: let A = {A_1,…,A_m} with |A_i| = k, and set f(S) = |S| (if |S| ≤ k and S ∉ A), k-1 (if S ∈ A), k (otherwise). Claim: f is submodular if |A_i ∩ A_j| ≤ k-2 for all i ≠ j.
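A sketch (mine) of this family of functions, with the claim spot-checked by brute force on a tiny ground set; the particular sets A_1, A_2 are illustrative assumptions:

```python
from itertools import combinations

def subsets(ground):
    ground = list(ground)
    return [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]

def is_submodular(f, ground):
    subs = subsets(ground)
    return all(f(S) + f(T) >= f(S | T) + f(S & T) for S in subs for T in subs)

def make_bumpy(k, bumps):
    bumps = {frozenset(A) for A in bumps}   # the family A = {A_1, ..., A_m}, each |A_i| = k
    def f(S):
        S = frozenset(S)
        return k - 1 if S in bumps else min(len(S), k)
    return f

ground = {0, 1, 2, 3, 4}
# k = 3 and |A_1 ∩ A_2| = 1 <= k - 2, so the claim predicts f is submodular.
f = make_bumpy(3, [{0, 1, 2}, {0, 3, 4}])
print(is_submodular(f, ground))   # expected: True
```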

24 Now delete half of the bumps at random: f(S) = |S| (if |S| ≤ k), k-1 (if S ∈ A and its bump wasn't deleted), k (otherwise). Then f is very unconcentrated on A, so any algorithm to learn f has additive error 1: an algorithm that sees examples only from some of the A_i cannot predict f on the others.

25 Can we force a bigger error with bigger bumps? Yes, if the A_i's are very "far apart". This can be achieved by picking them randomly.

26 Theorem (main lower bound construction): There is a distribution D and a randomly chosen function f s.t. f is monotone and submodular, and knowing the value of f on poly(n) random samples from D does not suffice to predict the value of f on future samples from D, even to within a factor õ(n^{1/3}).
Plan: Choose two values, High = n^{1/3} and Low = O(log² n). Choose random sets A_1,…,A_m ⊆ [n] with |A_i| = High and m = n^{log n}. Let D be the uniform distribution on {A_1,…,A_m}. Create a function f : 2^[n] → R: for each i, randomly set f(A_i) = High or f(A_i) = Low, then extend f to a monotone, submodular function on 2^[n].
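A sketch (mine) of only the sampling steps in this plan; the parameters below are stand-ins, and the hard step of extending f to a monotone submodular function (slide 27) is not attempted here:

```python
import math
import random

def sample_lower_bound_instance(n, m=1000, seed=0):
    """Sample the family {A_1,...,A_m} and the High/Low values from the plan above.
    Here m is a small stand-in; the plan uses superpolynomially many sets, and the
    extension of f to a monotone submodular function (slide 27) is not implemented."""
    rng = random.Random(seed)
    high = round(n ** (1.0 / 3.0))
    low = round(math.log(n) ** 2)                      # stand-in for the O(log^2 n) value
    family = [frozenset(rng.sample(range(n), high)) for _ in range(m)]
    values = {A: rng.choice([high, low]) for A in family}   # f(A_i) = High or Low
    return family, values                              # D is uniform on `family`

family, values = sample_lower_bound_instance(n=10**9)
```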

27 Creating the function f. We choose f to be a matroid rank function: such functions have a rich combinatorial structure and are always submodular. The randomly chosen A_i's form an expander (the expansion condition is stated in terms of H = { j : f(A_j) = High }), and the expansion property can be leveraged to ensure f(A_i) = High or f(A_i) = Low as desired.

28 Learning submodular functions.
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

29 Gross Substitutes Functions. A class of utility functions commonly used in mechanism design [Kelso-Crawford '82, Gul-Stacchetti '99, Milgrom '00, …]. Intuitively, increasing the prices of some items does not decrease demand for the other items. Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: do GS functions have a concise representation?

30 Gross Substitutes Functions. A class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …]. Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: do GS functions have a concise representation? Fact: every matroid rank function is GS. Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f s.t. f is a matroid rank function, and poly(n) bits of information do not suffice to predict the value of f on samples from D, even to within a factor õ(n^{1/3}). Corollary: the answer to the question is no.

31 Learning submodular functions.
Theorem (our general upper bound): Monotone, submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): Monotone, submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: Gross substitutes functions do not have a concise, approximate representation.

32 Learning submodular functions. Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1). Hypotheses: Pr_{X∼D}[X = x] = Π_i Pr[X_i = x_i] ("product distribution"), and f({i}) ∈ [0,1] for all i ∈ [n] ("Lipschitz function"); the condition f({i}) ∈ {0,1} for all i ∈ [n] is stronger still.

33 Technical Theorem: for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n] and for a 1-ε fraction of the sets S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k). In fact, h(k) is just E[f(S)], where S is uniform on sets of size k.

34 Technical Theorem (restated): for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n] and for a 1-ε fraction of S ⊆ V with |S| = k, h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k); here h(k) = E[f(S)] for S uniform on sets of size k. Algorithm: let μ = Σ_{i=1}^{m} f(x_i)/m and let g be the constant function with value μ. This achieves approximation factor O(log²(1/ε)) on a 1-ε fraction of points, with high probability, which proves the product-distribution theorem: Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
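A sketch (mine) of the constant-hypothesis algorithm just described; the sample pairs below are made-up illustrations:

```python
def learn_constant_hypothesis(samples):
    """samples: list of (S_i, f(S_i)) pairs drawn from the product distribution."""
    mu = sum(value for _, value in samples) / len(samples)   # empirical mean of f
    return lambda S: mu                                       # g(S) = mu for every S

# By the technical theorem, with high probability this g is within an
# O(log^2(1/eps)) factor of f on a 1-eps fraction of sets.
g = learn_constant_hypothesis([({0, 1}, 1.3), ({2}, 0.9), ({0, 2, 3}, 1.8)])
print(g({5, 7}))   # every query returns the same estimate
```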

35 Technical Theorem (restated): for any ε > 0, there exists a concave function h : [0,n] → R s.t. for every k ∈ [n] and for a 1-ε fraction of S ⊆ V with |S| = k, h(k) ≤ f(S) ≤ O(log²(1/ε)) · h(k); here h(k) = E[f(S)] for S uniform on sets of size k. Concentration Lemma: let X have a product distribution; for any α ∈ [0,1], [concentration inequality not reproduced in the transcript]. Proof: based on Talagrand's concentration inequality.

36 Follow-up work. Subadditive & XOS functions [Badanidiyuru et al., Balcan et al.]: O(n^{1/2}) approximation, Ω(n^{1/2}) inapproximability. Symmetric submodular functions [Balcan et al.]: O(n^{1/2}) approximation, Ω(n^{1/3}) inapproximability.

37 Conclusions. A learning-theoretic view of submodular functions. Structural properties: very "bumpy" under arbitrary distributions, very "smooth" under product distributions. Learnability in the PMAC model: an O(n^{1/2}) approximation algorithm, Ω(n^{1/3}) inapproximability, and an O(1) approximation for Lipschitz functions under product distributions. No concise representation for gross substitutes functions.

