# A threshold of ln(n) for approximating set cover By Uriel Feige Lecturer: Ariel Procaccia.

## Presentation on theme: "A threshold of ln(n) for approximating set cover By Uriel Feige Lecturer: Ariel Procaccia."— Presentation transcript:

A threshold of ln(n) for approximating set cover By Uriel Feige Lecturer: Ariel Procaccia

Set Cover We are given an (unweighted!) collection F of subsets of U={1,…,n}. We must find as few as possible subsets from F such that their union covers U. Hypergraph vertex cover (lecture of 10/3) is a special case of set cover.

Approximation algorithms A greedy algorithm:  Recall algorithms 1…  At each stage, add to the cover the subset which maximizes the number of new elements.  This algorithm gives an approximation of ln(n)- lnln(n)+O(1). An approximation of ln(n) can also be achieved using a linear programming relaxation.

A bad example for the greedy algorithm U={(x,y): 1 ≤ x,y ≤ n} S 1 = {(x,y): x ≤ n/2} S 2 = {(x,y): x ≥ n/2} T 1 = {(x,y): 1 ≤ y ≤ n/2} T 2 = {(x,y): n/2 ≤ y ≤ 3n/4} … 32 16 8 8 Ratio: logn/2

Overview (1) We wish to prove Theorem 8: If a poly time algorithm can approximate set cover within (1- ε)ln(n), then NP is a subset of DTIME(n O(loglog(n)) ). The strategy: a reduction from a k-prover proof system for an NP-complete problem. Familiarwith IP?

Overview (2) Parts of the lecture:  MAX 3SAT-5.  The k-prover proof system.  Partition systems.  The reduction to set cover.  Thundering applause. May the force be with us. Similar to the lecture of 17/3.

MAX 3SAT-5 Input: A CNF formula with n variables and 5n/3 clauses, in which every clause contains exactly three literals, every variable appears in exactly 5 clauses, and a variable does not appear in a clause more than once. Theorem 1: For some  >0, it is NP-hard to distinguish between 3CNF-5 formulas where OPT=1 or OPT≤(1-  ). Reduction from MAX- 3SAT-B

Two prover proof system for 3SAT- 5 Presented as to provide intuition and help in the analysis of the k-prover system. One round, two prover system for 3SAT-5. Protocol:  V selects an index of a clause, sends it to the first prover, selects a random var in the clause, and sends it to the second prover.  First prover returns 3 bits, second returns 1 bit.  V accepts if the following conditions hold: Clause check: the assignment sent by the first prover satisfies the clause. Consistency check: the assignment sent by the second prover is identical to the assignment for the same variable sent by the first prover. Assgn. To clause and var

Proposition 2: If OPT=1- ε, then under the optimal strategy of the provers, V accepts with probability (1-ε/3). Proof:  The strategy of the second prover defines an assignment A to the variables.  If V selects a clause that is not satisfied by A (prob. ε), the first prover must set one of the variables differently from A.  The consistency check fails with prob. ≥ 1/3. ■ There is a strategy with acceptance prob ε/3

Parallel repetition We would like to lower the error. Modified proof system: V sends to the first prover l clauses, from each chooses a var, and sends these l vars to the second prover. Theorem 3 (Ran Raz): If a one round two prover system is repeated l times independently in parallel, then the error is 2 -cl, where c>0 is a constant that depends only on the original proof system. The error of the modified two prover system is ≤ 2 -cl, for some universal constant c. C for subtle

The k-prover proof system Binary code with k codewords, each of length l and weight l/2, with Hamming distance at least l/3. Each prover is associated with a codeword. The Protocol:  The verifier selects l clauses C 1,…,C l, then selects a var from each clause: the distinguished variables x 1,…,x l.  Prover P i receives C j for those coordinates in its codeword that are 1, and x j for the coordinates that are 0, and replies with 2l bits.  The answer of the prover induces an assignment to the distinguished variables.  Acceptance predicate: Weak: at least one pair of provers is consistent. Strong: every pair of provers is consistent. P 1 : 0011P 2 : 0101P 3 : 1100 V c 1 c 2 v 3 v 4 100,010,1,0

Lemma 4: If OPT=1 then the provers have a strategy that causes V to always strongly accept. If OPT≤(1- ε), then the verifier weakly accepts with probability at most k 2 2 -cl, where c>0 is a constant that depends only on ε. Proof:  If OPT=1, the provers can base their answers on a satisfying assignment.  Assume OPT=(1- ε), and that V weakly accepts with prob. ≥ δ. Then with respect to P i and P j, V accepts with probability δ/k 2.  There are ≥ l/6 coordinates on which P i receives a clause, and P j a var in this clause. Other coordinates: for free.   The provers have a strategy that succeeds with prob. ≥ δ/k 2 on l/6 parallel repetitions of the original proof system.   δ/k 2 <2 -cl. ■

Partition systems A partition system B(m,L,k,d) has the following properties. 1. There exists a ground set B of m distinct points. 2. There is a collection of L distinct partitions p 1,…,p L. 3. For 1≤i≤L, partition p i is a collection of k disjoint subsets of B whose union is B. 4. Any cover of the m points by subsets that appear in pairwise different partitions requires at least d subsets. Lemma 6: For every c≥0 and m sufficiently large, there is a partition system B(m,L,k,d) whose parameters satisfy the following inequalities: 1. L ≈ (log(m)) c. 2. K can be chosen arbitrarily as long as k < ln(m)/3ln(ln(m)). 3. d = (1-f(k))k∙ln(m), where f(k)→∞ as k→∞. Proof: a randomized construction usually works. Looks familiar? L=4 k = 4

The reduction R=(5n) l : num of r. r↔B r (m,L,k,d): m=n Θ(l), L=2 l, d=(1-f(k))k∙ln(m). Partition↔dist. vars, subset↔prover. B(r,j,i) = i’th subset, partition j, partition system r. Subsets are S(q,a,i): all r s.t. (q,i)@r, extract from a an assignment a r to dist. vars. S(q,a,i) is the union of all subsets B(r,a r,i), for all r s.t. (q,i)@r. Q=n l/2 (5n/3) l/2 (questions to P i ). r=1 r=2 r=R L=2 l m points, k subsets 00 01 10 11 B(2,4,3) S(q,a,1) |U|?

Lemma 5: 1. Completeness: OPT=1  there is a set cover of size kQ. 2. Soundness: OPT=(1- ε)  more than (1- 2f(k))kQln(m) subsets are required. Proof of Completeness:  If OPT=1, the provers answer consistently with the satisfying assignment.  For any r, consider S(q 1,a 1,r),…,S(q k,a k,r) s.t. (q i,i)@r, and a i is the appropriate answer.  B r (m,L,k,d) is covered by these k sets.  Similar for every r. The number of subsets is kQ. ■ a r =01

Proof of soundness: the saga begins Assume OPT≤1-ε, and that there exists C that covers U, such that |C|=(1-δ)kQln(m), δ=2f(k). q to prover P i ↔ weight w(q,i)=number of answers a s.t. S(q,a,i) is in C. Σ q,i w(q,i) = |C|. r↔w(r)= Σ (q,i)@r w(q,i). This weight = number of subsets that participate in covering B r (m,L,k,d). Call r good if w(r)<(1-δ/2)k∙ln(m).

The good, the bad, and the random Proposition 6: The fraction of good r is at least δ/2. Proof:  Assume otherwise, then:  On the other hand,  Hence |C|>(1- δ)kQln(m) – contradiction. ■ r is good if: w(r)= Σ (q,i)@r w(q,i)<(1-δ/2)k∙ln(m)

Proposition 7: C covers U, |C|=(1-δ)kQln(m)  for some strategy for the k provers, V weakly accepts with prob. ≥ 2δ/(k∙ln(m)) 2. This proves soundness, since 2δ/(k∙ln(m)) 2 >k 2 2 -cl, for l=Θ(loglogn). Proof of proposition 7:  Randomized strategy: on q to P i, select a from the set of a s.t. S(q,a,i) is in C.  For a fixed r: sets B(r,p,i) in the cover of B r (m,L,k,d) ↔ sets S(q,a,i) in C.  For a good r: C used two subsets from p in the cover of B r (m,L,k,d).  Denote B(r,p,i) and B(r,p,j) ↔ S(q i,a i,i) and S(q j,a j,j).  Denote: a is in A r,i iff S(q i,a,i) is in C.  r is good  |A r,i |+|A r,j |<k∙ln(m).  The prob. that the provers answer a i and a j ≥ 4/(k∙ln(m)) 2. Answers are consistent with p  V weakly accepts.  The prob. that V chooses a good r ≥ δ/2. ■ Fix coin tosses for provers

Theorem 8: If there is some ε>0 such that a polynomial time algorithm can approximate set cover within (1-ε)ln(n), then NP is a subset of DTIME(n O(loglog(n)) ). Proof:  Assume there is a poly time algorithm A that approximates set cover within (1-ε)ln(m).  Reduce GAP-3SAT-5 to set cover as described, with k s.t. f(k)<ε/4, and m=(5n) 2l/ε.  m,R,Q=n O(loglogn)  time to perform the reduction is n O(loglogn).  ln(m)>(1-ε/2)ln(N), where N=mR.  By lemma 5, for a YES instance all points can be covered by kQ subsets, and for a NO instance all points cannot be covered by (1-2f(k))kQln(m).  The ratio is (1-2f(k))ln(m)>(1-ε)lnN. ■

Closing remarks (1) Max k-cover:  Given U and F as before, select k subsets that such that their union has maximum cardinality.  The obvious greedy algorithm approximates max k- cover within a ratio of at least 1-1/e ≈ 0.632.  Theorem 8 can be directly used to prove: if max k- cover can be approximated in a polynomial time within a ration of (1 - 1/e + ε) for some ε>0, then NP is a subset of DTIME(n O(loglogn) ).

Closing remarks (2) Refinements:  In our hardness of approximation result for set cover, ε is a constant. We may strengthen our assumption so as to improve our result.  ZTIME=class of languages that have a probabilistic algorithm that runs in expected time t (with zero error).  If for some η>0, NP is not a subset of ZTIME(2 n^η), then for some constant c’>0 there is no polynomial time algorithm that approximates set cover within ln(n) –c’(lnln(n)) 2.

Questions and cool animations ?

Download ppt "A threshold of ln(n) for approximating set cover By Uriel Feige Lecturer: Ariel Procaccia."

Similar presentations