Submodular Functions: Learnability, Structure & Optimization
Nick Harvey (UBC CS) and Maria-Florina Balcan (Georgia Tech)

Who studies submodular functions? OR and optimization; machine learning; AGT and economics; CS and approximation algorithms.

Valuation Functions. A first step in economic modeling: individuals have valuation functions giving a real-valued utility for different outcomes or events.

Valuation Functions. Focus on combinatorial settings: n items, {1,2,…,n} = [n], and a valuation function f : 2^[n] → ℝ giving utility for each subset of items.

Learning Valuation Functions. This talk: learning valuation functions from past data. Examples: package travel deals, bundle pricing.

Submodular valuations. [n] = {1,…,n}; a function f : 2^[n] → ℝ is submodular if f(S) + f(T) ≥ f(S ∪ T) + f(S ∩ T) for all S, T ⊆ [n]. Equivalently, decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S) (adding x to the smaller set T gives a large improvement, adding it to the larger set S a small one).

Submodular valuations. Decreasing marginal returns: for T ⊆ S and x ∉ S, f(T ∪ {x}) − f(T) ≥ f(S ∪ {x}) − f(S). Examples: Concave functions: let h : ℝ → ℝ be concave and, for each S ⊆ [n], let f(S) = h(|S|). Vector spaces: let V = {v_1,…,v_n} with each v_i ∈ F^n and, for each S ⊆ [n], let f(S) = rank({ v_i : i ∈ S }).
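To make these two examples concrete, here is a minimal sketch (the instances are mine, not from the talk) that builds a concave-of-cardinality function and a vector-space rank function, and brute-force checks the decreasing-marginal-returns property on a small ground set.

```python
from itertools import combinations
import math

import numpy as np

n = 4
ground = set(range(n))

def f_concave(S):
    # Concave of cardinality: h(x) = sqrt(x) is concave, so f(S) = sqrt(|S|).
    return math.sqrt(len(S))

# Vector-space rank: f(S) = rank of the vectors indexed by S (4 vectors in R^3).
V = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]], dtype=float)

def f_rank(S):
    return int(np.linalg.matrix_rank(V[sorted(S)])) if S else 0

def has_decreasing_marginal_returns(f, tol=1e-9):
    """Check f(T+x) - f(T) >= f(S+x) - f(S) for all T <= S and x outside S."""
    for r in range(n + 1):
        for S in map(set, combinations(ground, r)):
            for rT in range(len(S) + 1):
                for T in map(set, combinations(sorted(S), rT)):
                    for x in ground - S:
                        if f(T | {x}) - f(T) < f(S | {x}) - f(S) - tol:
                            return False
    return True

print(has_decreasing_marginal_returns(f_concave))  # True
print(has_decreasing_marginal_returns(f_rank))     # True
```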

Passive Supervised Learning. A data source draws sets S_1,…,S_k from a distribution D on 2^[n]; an expert/oracle f : 2^[n] → ℝ₊ labels them; the learning algorithm sees the labeled examples (S_1, f(S_1)),…,(S_k, f(S_k)) and outputs a hypothesis g : 2^[n] → ℝ₊.

PMAC model for learning real-valued functions. Same setup: the algorithm sees (S_1, f(S_1)),…,(S_k, f(S_k)), with the S_i drawn i.i.d. from a distribution D on 2^[n] and f : 2^[n] → ℝ₊, and produces g : 2^[n] → ℝ₊. Probably Mostly Approximately Correct: with probability ≥ 1 − δ, we have Pr_S[ g(S) ≤ f(S) ≤ α·g(S) ] ≥ 1 − ε. (The PAC model is the Boolean-valued special case, f : 2^[n] → {0,1}.)
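As a small illustration of what the PMAC criterion measures, here is a sketch (the distribution, target f, and hypothesis g are toy choices of mine) that estimates the fraction of fresh samples on which g(S) ≤ f(S) ≤ α·g(S) fails.

```python
import random

def pmac_failure_rate(f, g, sample_from_D, alpha, num_samples=10_000):
    """Estimate Pr_S[ not (g(S) <= f(S) <= alpha * g(S)) ]; PMAC asks for <= epsilon."""
    bad = 0
    for _ in range(num_samples):
        S = sample_from_D()
        if not (g(S) <= f(S) <= alpha * g(S)):
            bad += 1
    return bad / num_samples

# Toy example: D uniform over subsets of [n], target f(S) = |S|,
# hypothesis g(S) = |S| / 2, approximation factor alpha = 2.
n = 20
sample = lambda: {i for i in range(n) if random.random() < 0.5}
print(pmac_failure_rate(f=len, g=lambda S: len(S) / 2, sample_from_D=sample, alpha=2))  # prints 0.0
```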

Learning submodular functions
Theorem (our general upper bound): monotone submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): monotone submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: gross substitutes functions do not have a concise, approximate representation.

Computing Linear Separators. Given {+,−}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates the +s from the −s. Easily solved by linear programming.
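For concreteness, a minimal sketch (mine, not from the talk) of that LP: find coefficients (c, b) with c·x ≥ b + 1 on the + points and c·x ≤ b − 1 on the − points, solved here with SciPy's linprog.

```python
import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(pos, neg):
    """pos, neg: arrays of shape (k, n).  Returns (c, b) with c.x >= b + 1 on pos
    and c.x <= b - 1 on neg, or None if the points are not linearly separable."""
    n = pos.shape[1]
    # Variables v = (c_1..c_n, b); constraints written as A_ub @ v <= b_ub.
    A_ub = np.vstack([np.hstack([-pos, np.ones((len(pos), 1))]),    # -c.x + b <= -1
                      np.hstack([neg, -np.ones((len(neg), 1))])])   #  c.x - b <= -1
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=-np.ones(len(A_ub)),
                  bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.success else None

pos = np.array([[2.0, 2.0], [3.0, 1.0]])
neg = np.array([[0.0, 0.0], [-1.0, 1.0]])
print(separating_hyperplane(pos, neg))
```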

Learning Linear Separators. Given a random sample of {+,−}-labeled points in ℝ^n, find a hyperplane c^T x = b that separates most of the +s from the −s (a few points may end up on the wrong side). A classic machine learning problem.

Learning Linear Separators. Classic theorem [Vapnik-Chervonenkis 1971]: Õ(n/ε²) samples suffice to get error ε.

Submodular Functions are Approximately Linear. Let f be non-negative, monotone and submodular: non-negativity: f(S) ≥ 0 for all S ⊆ V; monotonicity: f(S) ≤ f(T) for all S ⊆ T; submodularity: f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T) for all S, T ⊆ V. Claim: f can be approximated to within a factor n by a linear function g. Proof sketch: let g(S) = Σ_{s∈S} f({s}); then f(S) ≤ g(S) ≤ n·f(S).
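A quick numerical check of the claim (the matroid-rank instance is mine): the additive surrogate g(S) = Σ_{s∈S} f({s}) is sandwiched between f(S) and n·f(S) on every subset.

```python
from itertools import combinations
import numpy as np

V = np.array([[1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)  # f = rank of the selected rows
n = len(V)

def f(S):
    return int(np.linalg.matrix_rank(V[sorted(S)])) if S else 0

def g(S):
    return sum(f({s}) for s in S)  # the additive (linear) surrogate

for r in range(n + 1):
    for S in map(set, combinations(range(n), r)):
        assert f(S) <= g(S) <= n * f(S)
print("f(S) <= g(S) <= n*f(S) holds on all", 2 ** n, "subsets")
```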

Submodular Functions are Approximately Linear. (Figure: over the domain 2^V, the linear function g lies between f and n·f.)

Randomly sample {S_1,…,S_k} from the distribution. Create a + point for f(S_i) and a − point for n·f(S_i). Now just learn a linear separator! (Figure: the + points lie on f, the − points on n·f, and the learned separator g runs between them.)
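A schematic sketch of this reduction (my own simplification, not the paper's exact construction; it assumes f(S) > 0 on every sample and glosses over margin details): embed each sample S as a + point at height f(S) and a − point at a height proportional to n·f(S) in ℝ^{n+1}, find a separating hyperplane by LP, and read the predictor g off that hyperplane.

```python
import numpy as np
from scipy.optimize import linprog

def pmac_learn_by_separation(samples, n):
    """samples: list of (set S, value f(S)) pairs, assumed to have f(S) > 0."""
    chi = lambda S: np.array([1.0 if i in S else 0.0 for i in range(n)])
    # '+' points at height f(S), '-' points at height (n+1)*f(S); the factor
    # (n+1) rather than n just leaves slack so strict separation is possible.
    pos = np.array([np.append(chi(S), v) for S, v in samples])
    neg = np.array([np.append(chi(S), (n + 1) * v) for S, v in samples])
    # Feasibility LP in variables (c, b): c.p >= b + 1 on '+' points, c.q <= b - 1 on '-' points.
    A_ub = np.vstack([np.hstack([-pos, np.ones((len(pos), 1))]),
                      np.hstack([neg, -np.ones((len(neg), 1))])])
    res = linprog(c=np.zeros(n + 2), A_ub=A_ub, b_ub=-np.ones(len(A_ub)),
                  bounds=[(None, None)] * (n + 2))
    assert res.success, "LP infeasible; this sketch assumes f(S) > 0 on all samples"
    c, b = res.x[:n + 1], res.x[n + 1]
    # g(S) = the height at which the hyperplane c.(x, height) = b crosses x = chi(S).
    return lambda S: (b - c[:n] @ chi(S)) / c[n]
```

The factor-n guarantee on a 1 − ε fraction of the distribution then comes from the sample-complexity bound for linear separators quoted above.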

Theorem: g approximates f to within a factor n on a 1 − ε fraction of the distribution.

Can improve to O(n^{1/2}): in fact, f² and n·f² are separated by a linear function [Goemans et al. '09]. (John's ellipsoid theorem: any centrally symmetric convex body is approximated by an ellipsoid to within a factor n^{1/2}.)

Learning submodular functions
Theorem (our general upper bound): monotone submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): monotone submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: gross substitutes functions do not have a concise, approximate representation.

f(S) = min{ |S|, k }: that is, f(S) = |S| if |S| ≤ k, and f(S) = k otherwise.

Now put a "bump" at a single set A: f(S) = k − 1 if S = A; |S| if |S| ≤ k and S ≠ A; k otherwise.

More bumps: let A = {A_1,…,A_m} with |A_i| = k, and set f(S) = k − 1 if S ∈ A; |S| if |S| ≤ k and S ∉ A; k otherwise. Claim: f is submodular if |A_i ∩ A_j| ≤ k − 2 for all i ≠ j.
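A brute-force sanity check of this claim on a tiny instance of my own choosing (n = 6, k = 3, three bumps whose pairwise intersections have size at most k − 2 = 1):

```python
from itertools import combinations

n, k = 6, 3
ground = frozenset(range(n))
A = [frozenset({0, 1, 2}), frozenset({2, 3, 4}), frozenset({0, 4, 5})]
assert all(len(X & Y) <= k - 2 for X, Y in combinations(A, 2))

def f(S):
    S = frozenset(S)
    if S in A:
        return k - 1            # a "bump": value lowered by one
    return min(len(S), k)       # otherwise the truncated cardinality

def is_submodular(f):
    subsets = [frozenset(S) for r in range(n + 1) for S in combinations(ground, r)]
    return all(f(S) + f(T) >= f(S | T) + f(S & T) for S in subsets for T in subsets)

print(is_submodular(f))  # True
```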

Delete half of the bumps at random, so that f(S) = k − 1 if S ∈ A and its bump wasn't deleted; |S| if |S| ≤ k otherwise; k for larger sets. Then f is very unconcentrated on A, so any algorithm to learn f has additive error 1: if the algorithm sees examples only from some of the bumps, it cannot predict f on the rest.

Can we force a bigger error with bigger bumps? Yes, if the A_i's are very "far apart". This can be achieved by picking them randomly.

Plan: choose two values, High = n^{1/3} and Low = O(log² n). Choose random sets A_1,…,A_m ⊆ [n] with |A_i| = High and m = n^{log n}. Let D be the uniform distribution on {A_1,…,A_m}. Create a function f : 2^[n] → ℝ: for each i, randomly set f(A_i) = High or f(A_i) = Low, then extend f to a monotone, submodular function on 2^[n].
Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f such that f is monotone and submodular, yet knowing the value of f on poly(n) random samples from D does not suffice to predict the value of f on future samples from D, even to within a factor õ(n^{1/3}).

Creating the function f. We choose f to be a matroid rank function; such functions have a rich combinatorial structure and are always submodular. The randomly chosen A_i's form an expander (the relevant expansion condition involves the set H = { j : f(A_j) = High }), and this expansion property can be leveraged to ensure f(A_i) = High or f(A_i) = Low, as desired.

Learning submodular functions
Theorem (our general upper bound): monotone submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): monotone submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: gross substitutes functions do not have a concise, approximate representation.

Gross Substitutes Functions. A class of utility functions commonly used in mechanism design [Kelso-Crawford '82, Gul-Stacchetti '99, Milgrom '00, …]. Intuitively, increasing the prices of some items does not decrease the demand for the other items. Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: do GS functions have a concise representation?

Gross Substitutes Functions. A class of utility functions commonly used in mechanism design [Kelso, Crawford, Gul, Stacchetti, …]. Question [Blumrosen-Nisan, Bing-Lehmann-Milgrom]: do GS functions have a concise representation?
Theorem (main lower bound construction): there is a distribution D and a randomly chosen function f such that f is a matroid rank function, yet poly(n) bits of information do not suffice to predict the value of f on samples from D, even to within a factor õ(n^{1/3}).
Fact: every matroid rank function is GS.
Corollary: the answer to the question is no.

Learning submodular functions
Theorem (our general upper bound): monotone submodular functions can be PMAC-learned (w.r.t. an arbitrary distribution) with approximation factor α = O(n^{1/2}).
Theorem (our general lower bound): monotone submodular functions cannot be PMAC-learned with approximation factor õ(n^{1/3}).
Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Corollary: gross substitutes functions do not have a concise, approximate representation.

Theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
Hypotheses:
– Pr_{X∼D}[ X = x ] = ∏_i Pr[ X_i = x_i ] ("product distribution")
– f({i}) ∈ [0,1] for all i ∈ [n] ("Lipschitz function"); requiring f({i}) ∈ {0,1} for all i ∈ [n] would be a stronger condition.

Technical Theorem: for any ε > 0, there exists a concave function h : [0,n] → ℝ such that for every k ∈ [n], and for a 1 − ε fraction of the sets S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k). In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.
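An empirical illustration of the shape this theorem describes, on a toy coverage function of my own choosing: the level averages h(k) = E[ f(S) ] over uniform S with |S| = k grow concavely in k.

```python
import random
from statistics import mean

random.seed(1)
n, universe = 30, 50
covers = [set(random.sample(range(universe), 8)) for _ in range(n)]  # item i covers 8 elements

def f(S):
    # Coverage functions are monotone and submodular.
    return len(set().union(*(covers[i] for i in S))) if S else 0

def h(k, trials=2000):
    return mean(f(set(random.sample(range(n), k))) for _ in range(trials))

hk = [h(k) for k in range(n + 1)]
# The increments h(k+1) - h(k) should be (approximately) non-increasing.
print(all(hk[k + 1] - hk[k] <= hk[k] - hk[k - 1] + 0.5 for k in range(1, n)))  # True
```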

Technical Theorem (restated): for any ε > 0, there exists a concave function h : [0,n] → ℝ such that for every k ∈ [n], and for a 1 − ε fraction of the sets S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k); in fact h(k) = E[ f(S) ] for S uniform on sets of size k.
Algorithm: let μ = (1/m)·Σ_{i=1}^m f(x_i) over the m training samples, and let g be the constant function with value μ. This achieves approximation factor O(log²(1/ε)) on a 1 − ε fraction of points, with high probability, which proves the theorem (product distributions): Lipschitz, monotone submodular functions can be PMAC-learned under a product distribution with approximation factor O(1).
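A minimal sketch of this algorithm (the Lipschitz submodular target and the product distribution are toy choices of mine): average the observed values, predict that constant, and check how tightly fresh values concentrate around it.

```python
import math
import random

random.seed(0)
n = 200
w = [random.random() for _ in range(n)]

def f(S):
    # Square root of a nonnegative additive function: monotone, submodular,
    # and Lipschitz (every singleton value is at most 1).
    return math.sqrt(sum(w[i] for i in S))

def sample():
    # Product distribution: each item included independently with probability 1/2.
    return {i for i in range(n) if random.random() < 0.5}

m = 1000
mu = sum(f(sample()) for _ in range(m)) / m
g = lambda S: mu  # the constant hypothesis

ratios = [f(sample()) / mu for _ in range(1000)]
print(round(mu, 2), round(min(ratios), 3), round(max(ratios), 3))  # ratios cluster near 1
```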

Technical Theorem (restated): for any ε > 0, there exists a concave function h : [0,n] → ℝ such that for every k ∈ [n], and for a 1 − ε fraction of the sets S ⊆ V with |S| = k, we have h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k); in fact h(k) = E[ f(S) ] for S uniform on sets of size k.
Concentration Lemma: let X have a product distribution; then for any α ∈ [0,1], f(X) is concentrated around its expectation. Proof: based on Talagrand's concentration inequality.

Follow-up work
Subadditive & XOS functions [Badanidiyuru et al., Balcan et al.]:
– O(n^{1/2}) approximation
– Ω̃(n^{1/2}) inapproximability
Symmetric submodular functions [Balcan et al.]:
– O(n^{1/2}) approximation
– Ω̃(n^{1/3}) inapproximability

Conclusions
Learning-theoretic view of submodular functions.
Structural properties:
– very "bumpy" under arbitrary distributions
– very "smooth" under product distributions
Learnability in the PMAC model:
– O(n^{1/2}) approximation algorithm
– Ω̃(n^{1/3}) inapproximability
– O(1) approximation for Lipschitz functions and product distributions
No concise representation for gross substitutes functions.