
1 Randomized Composable Core-sets for Submodular Maximization. Morteza Zadimoghaddam and Vahab Mirrokni, Google Research, New York

2 Outline: Core-sets and their applications. Why submodular functions? A simple algorithm for monotone and non-monotone submodular functions. Improving the approximation factor for monotone submodular functions.

3 Processing Big Data. The input is too large to fit in one machine. Extract and process a compact representation:
– Sampling: focus only on a small subset of the data.
– Sketching: compute a small summary of the data, e.g. mean, variance, …
– Mergeable summaries: multiple summaries can be merged while preserving accuracy; see [Ahn, Guha, McGregor PODS, SODA 2012], [Agarwal et al. PODS 2012].
– Composable core-sets [Indyk et al. 2014].

4 Composable Core-sets. Partition the input into several parts T_1, T_2, …, T_m. In each part, select a subset S_i ⊆ T_i. Take the union of the selected sets: S = S_1 ∪ S_2 ∪ … ∪ S_m. Solve the problem on S. Evaluation: we want the set S to represent the original big input well and to preserve the optimum solution approximately.

5 Core-set Applications. Diversity and coverage maximization: Indyk, Mahabadi, Mahdian, Mirrokni [PODS 2014]. k-means and k-median: Balcan et al. [NIPS 2013]. Distributed balanced clustering: Bateni et al. [NIPS 2014].

6 Submodular Functions. A non-negative set function f defined on subsets of a ground set N, i.e. f: 2^N → R⁺ ∪ {0}. f is submodular iff for any two subsets A and B: f(A) + f(B) ≥ f(A ∪ B) + f(A ∩ B). Alternative definition (diminishing returns): f is submodular iff for any two subsets A ⊆ B and any element x ∉ B: f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B).
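To make the diminishing-returns definition concrete, here is a small sanity check on a coverage function (coverage functions are monotone submodular). This is illustrative only, not from the talk; the instance SETS is made up.

```python
from itertools import combinations

# Coverage function: f(A) = number of distinct elements covered by the sets named in A.
SETS = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}}

def f(A):
    return len(set().union(*(SETS[name] for name in A))) if A else 0

def marginal(x, A):
    # Delta(x, A) = f(A ∪ {x}) - f(A)
    return f(A | {x}) - f(A)

# Diminishing returns: for every A ⊆ B and every x outside B, Delta(x, A) >= Delta(x, B).
names = set(SETS)
subsets = [set(c) for r in range(len(names) + 1) for c in combinations(names, r)]
for A in subsets:
    for B in subsets:
        if A <= B:
            for x in names - B:
                assert marginal(x, A) >= marginal(x, B)
print("diminishing returns holds on this instance")
```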

7 Submodular Maximization with Cardinality Constraints. Given a cardinality constraint k, find a size-k subset of N with maximum f value. Machine learning applications: exemplar-based clustering, active set selection, graph cuts, … [Mirzasoleiman, Karbasi, Sarkar, Krause NIPS 2013]. Applications for search engines: coverage utility functions.

8 Submodular Maximization via MapReduce. Fast greedy algorithms in MapReduce and streaming: Kumar et al. [SPAA 2013]. They obtain a 1/2 approximation factor with a super-constant number of rounds.

9 Searched Keywords with Multiple Meanings

10 Formal Definition of Composable Core-sets. Define f_k(T) := max{f(A) : A ⊆ T, |A| ≤ k}; e.g. f_k(N) is the value of the optimum solution. Define ALG(T) to be the output of algorithm ALG on input set T, and suppose |ALG(T)| ≤ k. ALG is an α-approximate composable core-set iff for any collection of sets T_1, T_2, …, T_m we have f_k(ALG(T_1) ∪ ALG(T_2) ∪ … ∪ ALG(T_m)) ≥ α · f_k(T_1 ∪ T_2 ∪ … ∪ T_m).
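For intuition, the definition can be checked by brute force on toy inputs. A minimal sketch (not from the talk; the modular f and the pretend ALG outputs are made-up placeholders):

```python
from itertools import combinations

def f_k(f, T, k):
    # Brute-force f_k(T) = max f(A) over A ⊆ T with |A| <= k (exponential; toy sizes only).
    T = list(T)
    return max(f(set(c)) for r in range(min(k, len(T)) + 1) for c in combinations(T, r))

f = lambda A: sum(A)              # a toy modular (hence submodular) function
T1, T2 = {1, 2, 3}, {4, 5, 6}     # the parts
S1, S2 = {2, 3}, {5, 6}           # pretend S_i = ALG(T_i) with k = 2
k = 2
ratio = f_k(f, S1 | S2, k) / f_k(f, T1 | T2, k)
print(ratio)  # ALG is alpha-approximate only if such ratios are >= alpha on every input
```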

11 Applications of Composable Core-sets. Distributed approximation:
– Distribute the input between m machines,
– ALG selects the set S_i = ALG(T_i) on machine i, 1 ≤ i ≤ m,
– Gather the union of the selected items, S_1 ∪ S_2 ∪ … ∪ S_m, on a single machine, and select k elements from it.
Streaming models: partition the sequence of elements and simulate the above procedure. Also: a class of nearest neighbor search problems.

12 Bad News! Theorem [Indyk et al. 2014]: there exists no better than O(1/√k)-approximate composable core-set for submodular maximization.

13 Randomization Comes to the Rescue. Instead of working with a worst-case collection of sets T_1, T_2, …, T_m, suppose we have a random partitioning of the input. We say ALG is an α-approximate randomized composable core-set iff E[f_k(ALG(T_1) ∪ ALG(T_2) ∪ … ∪ ALG(T_m))] ≥ α · f_k(N), where the expectation is taken over the random choice of {T_1, T_2, …, T_m}.

14 General Framework (diagram). The input set N is randomly partitioned into T_1, T_2, …, T_m across m machines. Run ALG in each machine to obtain the selected elements S_i = ALG(T_i). Run ALG′ on the union of the selected elements to find the final size-k output set.
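The two-round framework is simple enough to sketch in code. This is a hedged sketch, not the authors' implementation: distributed_submodular_max and its parameters are made-up names, and alg / alg_final stand in for ALG and ALG′ (Greedy plays both roles later in the talk; see the greedy sketch after slide 17).

```python
import random

def distributed_submodular_max(f, N, k, m, alg, alg_final, seed=0):
    # Round 0: randomly partition the input among m machines.
    rng = random.Random(seed)
    parts = [set() for _ in range(m)]
    for x in N:
        parts[rng.randrange(m)].add(x)
    # Round 1: each machine i selects S_i = alg(T_i) (run in parallel in practice).
    union = set()
    for T in parts:
        union |= alg(f, T, k)
    # Round 2: a single machine runs alg_final on S_1 ∪ … ∪ S_m.
    return alg_final(f, union, k)
```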

15 Good News! Theorem [Mirrokni, Z]: there exists a class of O(1)-approximate randomized composable core-sets for monotone and non-monotone submodular maximization. In particular, algorithm Greedy is a 1/3-approximate randomized core-set for monotone f, and (1/3 − 1/(3m))-approximate for non-monotone f.

16 Family of β-nice Algorithms. ALG is β-nice if for any set T and any element x ∈ T \ ALG(T) we have:
– ALG(T) = ALG(T \ {x}), and
– Δ(x, ALG(T)) is at most β·f(ALG(T))/k,
where Δ(x, A) is the marginal value of adding x to the set A, i.e. Δ(x, A) = f(A ∪ {x}) − f(A).
Theorem: a β-nice algorithm is a (1/(2+β))-approximate randomized composable core-set for monotone f, and ((1 − 1/m)/(2+β))-approximate for non-monotone f.

17 Algorithm Greedy. Given an input set T, Greedy returns a size-k output set S as follows: start with an empty set S; for k iterations, find an item x ∈ T with maximum marginal value Δ(x, S) and add x to S. Remark: Greedy is a 1-nice algorithm. In the rest of the talk, we analyze algorithm Greedy for a monotone submodular function f.
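A minimal sketch of this greedy rule (the standard textbook greedy, not the authors' code), which also plugs into the framework sketch above:

```python
def greedy(f, T, k):
    # k rounds; each round adds the element with the largest marginal gain Delta(x, S).
    S, candidates = set(), set(T)
    for _ in range(min(k, len(candidates))):
        x = max(candidates, key=lambda y: f(S | {y}) - f(S))
        S.add(x)
        candidates.remove(x)
    return S

# Usage with the earlier framework sketch: Greedy in both rounds.
# distributed_submodular_max(f, N, k, m, alg=greedy, alg_final=greedy)
```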

18 Analysis. Let OPT be the size-k subset with maximum value of f. Let OPT′ be OPT ∩ (S_1 ∪ S_2 ∪ … ∪ S_m), and OPT′′ be OPT \ OPT′. We prove that E[max{f(OPT′), f(S_1), f(S_2), …, f(S_m)}] ≥ f(OPT)/3.

19 Linearizing Marginal Contributions of Elements in OPT. Consider an arbitrary permutation π on the elements of OPT. For each x ∈ OPT, define OPT_x to be the set of elements of OPT that appear before x in π. By the definition of the Δ values (a telescoping sum): f(OPT) = Σ_{x ∈ OPT} Δ(x, OPT_x).
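The telescoping identity is easy to verify numerically; a toy check (the coverage instance is made up, not from the talk):

```python
SETS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
f = lambda A: len(set().union(*(SETS[x] for x in A))) if A else 0

OPT = ["b", "a", "c"]                      # a fixed permutation pi of OPT
total = 0
for i, x in enumerate(OPT):
    OPT_x = set(OPT[:i])                   # elements of OPT before x in pi
    total += f(OPT_x | {x}) - f(OPT_x)     # Delta(x, OPT_x)
assert total == f(set(OPT))                # the marginals telescope to f(OPT)
```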

20 Lower Bounding f(OPT′). By telescoping within OPT′: f(OPT′) = Σ_{x ∈ OPT′} Δ(x, OPT_x ∩ OPT′). Using submodularity (since OPT_x ∩ OPT′ ⊆ OPT_x): Δ(x, OPT_x ∩ OPT′) ≥ Δ(x, OPT_x). Therefore f(OPT′) ≥ Σ_{x ∈ OPT′} Δ(x, OPT_x), and it suffices to upper bound Σ_{x ∈ OPT′′} Δ(x, OPT_x).

21 Proof Scheme (diagram: OPT split into OPT′ = selected and OPT′′ = not selected). We have f(OPT) = Σ_{x ∈ OPT} Δ(x, OPT_x) and f(OPT′) ≥ Σ_{x ∈ OPT′} Δ(x, OPT_x), so it suffices to upper bound Σ_{x ∈ OPT′′} Δ(x, OPT_x). Since Greedy is 1-nice, for each x ∈ T_i ∩ OPT′′ we have Δ(x, S_i) ≤ f(S_i)/k, hence Σ_{1 ≤ i ≤ m} Σ_{x ∈ OPT′′ ∩ T_i} Δ(x, S_i) ≤ max_i{f(S_i)}. The remaining question: how large can Δ(x, OPT_x) − Δ(x, S_i) be? Goal: lower bound max{f(OPT′), f(S_1), f(S_2), …, f(S_m)}.

22 Upper Bounding the Δ Reductions. By submodularity: Δ(x, OPT_x) − Δ(x, S_i) ≤ Δ(x, OPT_x) − Δ(x, OPT_x ∪ S_i). Summing over all of OPT (telescoping again): Σ_{x ∈ OPT} [Δ(x, OPT_x) − Δ(x, OPT_x ∪ S_i)] = f(OPT) − (f(OPT ∪ S_i) − f(S_i)) ≤ f(S_i). In the worst case: Σ_{1 ≤ i ≤ m} Σ_{x ∈ OPT′′ ∩ T_i} [Δ(x, OPT_x) − Δ(x, S_i)] ≤ Σ_{1 ≤ i ≤ m} f(S_i). In expectation over the random partition: Σ_{1 ≤ i ≤ m} Σ_{x ∈ OPT′′ ∩ T_i} [Δ(x, OPT_x) − Δ(x, S_i)] ≤ Σ_{1 ≤ i ≤ m} f(S_i)/m. Conclusion: E[f(OPT′)] ≥ f(OPT) − max_i{f(S_i)} − Average_i{f(S_i)}, so Greedy is a 1/3-approximate randomized core-set.

23 Distributed Approximation Factor (diagram as before: Greedy in each machine, then Greedy on the union S_1 ∪ S_2 ∪ … ∪ S_m). There exists a solution among the selected elements with value f(OPT)/3, so running Greedy in the second stage finds a size-k output set with value ≥ (1 − 1/e)·f(OPT)/3. Take the maximum of max_i{f(S_i)} and Greedy(S_1 ∪ S_2 ∪ … ∪ S_m) to achieve a 0.27 approximation factor.

24 Improving Approximation Factors for Monotone Submodular Functions. Hardness result [Mirrokni, Z]: with output sizes |S_i| ≤ k, Greedy and locally optimal algorithms are no better than 1/2-approximate randomized core-sets. Can we increase the output sizes and get better results?

25 A (2 − √2)-approximate Randomized Core-set. Positive result [Mirrokni, Z]: if we increase the output sizes to 4k, Greedy is a (2 − √2 − o(1)) ≥ 0.585-approximate randomized core-set for a monotone submodular function. Remark: in this result, we send each item to C random machines instead of one; as a result, the approximation factor is reduced by an O(ln(C)/C) term.

26 0.545 Distributed Approximation Factor. If we simply run Greedy again in the second stage, the distributed approximation factor is 0.585·(1 − 1/e) < 0.37. Positive distributed result [Mirrokni, Z]: we propose a new algorithm, PseudoGreedy, for the second stage, with distributed approximation factor 0.545. We still use Greedy in the first stage to return 4k elements from each machine.

27 Improved Distributed Approximation Factor (diagram). The input set N is partitioned into T_1, …, T_m. Run Greedy in each machine and return 4k items: the selected elements S_1, …, S_m. There exists a size-k subset of the union with value ≥ 0.585·f(OPT). Run PseudoGreedy on the union to find the final size-k output set with value ≥ 0.545·f(OPT).

28 Analysis: Greedy is a (2 − √2)-approximate Core-set. Define OPT1 := {x | x ∈ OPT and x ∈ Greedy(T_1 ∪ {x})} with K_1 := |OPT1|, and OPT2 := {x | x ∈ OPT and x ∉ Greedy(T_1 ∪ {x})} with K_2 := |OPT2|. Assume that OPT1 ⊆ OPT′ holds, up to an O(ln(C)/C) error term. Use OPT2 to lower bound the marginal gains of the items selected in the first machine, S_1.

29 Analysis (cont'd). Greedy starts with an empty set and adds items y_1, y_2, …, y_4k to S_1; let Z_i = {y_1, y_2, …, y_i}, so S_1 = Z_4k. The marginal gain of y_i is ≥ Σ_{x ∈ OPT2} Δ(x, Z_{i−1})/K_2 ≥ Σ_{x ∈ OPT2} Δ(x, OPT_x ∪ Z_{i−1})/K_2. Adding y_i reduces the Δ values: the drops in Σ_{x ∈ OPT2} Δ(x, OPT_x ∪ Z_i) and in Σ_{x ∈ OPT1} Δ(x, OPT_x ∪ Z_i) as i grows become the linear program variables a_i and b_i below.

30 Analysis (cont'd): Factor-Revealing LP. For each 1 ≤ i ≤ 4k, define a_i := Σ_{x ∈ OPT2} Δ(x, OPT_x ∪ Z_{i−1}) − Σ_{x ∈ OPT2} Δ(x, OPT_x ∪ Z_i), and b_i := Σ_{x ∈ OPT1} Δ(x, OPT_x ∪ Z_{i−1}) − Σ_{x ∈ OPT1} Δ(x, OPT_x ∪ Z_i). Assume f(OPT) = 1, and set α := Σ_{x ∈ OPT2} Δ(x, OPT_x) and β := f_k(S_1 ∪ OPT1). Linear program LP_{k,K_2}: minimize β subject to
– a_i + b_i ≥ (α − Σ_{1 ≤ i′ ≤ i} a_{i′})/K_2 for all 1 ≤ i ≤ 4k,
– 1 − α ≥ Σ_{1 ≤ i ≤ 4k} b_i,
– β ≥ 1 − α + Σ_{i ∈ L} a_i for every set L ⊆ {1, 2, …, 4k} with |L| = K_2.

31 Analysis: Wrapping Up. LP_{k,K_2} is lower bounded by β in the following mathematical program, with variables 0 ≤ α, β, λ, r ≤ 1:
– β ≥ 1 − α + (1 − e^{−r})·α + (1 − r)·λ
– 1 − α ≥ (e^{−r}·α − λ)²/(2λ)
The minimum occurs at r = 0, λ = 1 − √2/2, α = √2/2, and β = 2 − √2.
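As a sanity check on the filled-in optimum (my derivation, not from the slides): at r = 0 the program reduces to minimizing β = 1 − α + λ subject to 1 − α ≥ (α − λ)²/(2λ).

```latex
% Stationarity of the Lagrangian gives \alpha = 1 - \lambda; the binding constraint
% 1 - \alpha = (\alpha - \lambda)^2 / (2\lambda) then yields (\alpha - \lambda)^2 = 2\lambda^2:
\[
\alpha = (1+\sqrt{2})\,\lambda, \qquad 1-\lambda = (1+\sqrt{2})\,\lambda
\;\Longrightarrow\; \lambda = \tfrac{1}{2+\sqrt{2}} = 1-\tfrac{\sqrt{2}}{2},\quad
\alpha = \tfrac{\sqrt{2}}{2},\quad
\beta = 1-\alpha+\lambda = 2-\sqrt{2}\approx 0.5858.
\]
```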

32 Improved Distributed Approximation Factor (diagram, repeated). Run Greedy in each machine and return 4k items; there exists a size-k subset of S_1 ∪ … ∪ S_m with value ≥ 0.585·f(OPT); run PseudoGreedy on the union to find the final size-k output set with value ≥ 0.545·f(OPT).

33 Algorithm PseudoGreedy. For all 1 ≤ K_2 ≤ k:
– Set K′ := K_2/4 and K_1 := k − K_2.
– Partition the first 8K′ items of S_1 into sets {A_1, …, A_8}.
– For each L ⊆ {1, …, 8}:
  – Let S′ be the union of the A_i with i ∈ L.
  – Among the selected items, greedily insert K_1 + (4 − |L|)·K′ items into S′.
  – If f(S′) > f(S), then S := S′.
Return S.
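A hedged Python sketch of one reading of this pseudocode (not the authors' code): pseudo_greedy is a made-up name, pool stands for the union of all first-round selections S_1 ∪ … ∪ S_m (the slide only says "selected items"), and the integer rounding of K′ is an assumption.

```python
from itertools import chain, combinations

def pseudo_greedy(f, S1, pool, k):
    S1 = list(S1)                            # assumed kept in Greedy's selection order
    best, best_val = set(), float("-inf")
    for K2 in range(1, k + 1):
        Kp, K1 = K2 // 4, k - K2
        blocks = [S1[j * Kp:(j + 1) * Kp] for j in range(8)]   # first 8K' items of S_1
        for L in chain.from_iterable(combinations(range(8), r) for r in range(9)):
            Sp = set().union(*(blocks[i] for i in L)) if L else set()
            for _ in range(max(0, K1 + (4 - len(L)) * Kp)):    # top S' up toward k items
                cands = pool - Sp
                if not cands:
                    break
                x = max(cands, key=lambda y: f(Sp | {y}) - f(Sp))
                Sp.add(x)
            if f(Sp) > best_val:
                best, best_val = Sp, f(Sp)
    return best
```

Note the sizes are self-consistent: |L|·K′ seed items plus K_1 + (4 − |L|)·K′ greedy insertions is K_1 + 4K′ ≈ K_1 + K_2 = k.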

34 Small Size Core-sets. What if each machine can only select k′ << k items? The talk reported the resulting distributed and core-set approximation factors for random partitions, and separately the (weaker) factors for worst-case partitions.

35 Thank you! Questions?

