Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, 08540 04/20/2013.


1 Improved approximation for k-median Shi Li Department of Computer Science Princeton University Princeton, NJ, 08540 04/20/2013

2 [Figure: facilities and clients labeled with the dollar amounts $100, $130, $10, $20, $50, $30] minimize: maintenance cost + transportation cost

3 Facility Location Problem
BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.
KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.
STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.
STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.

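4 Uncapacitated Facility Location (UFL)
F : potential facility locations
C : set of clients
f_i, i ∈ F : cost for opening i
d : metric over F ∪ C
find S ⊆ F, minimize facility cost + connection cost, i.e. Σ_{i∈S} f_i + Σ_{j∈C} d(j, S), where d(j, S) is the distance from j to its nearest facility in S
[Figure: facilities and clients, with labels $30, $100, $20, $100]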
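As a small illustration of the UFL objective (facility cost plus connection cost), the following Python sketch evaluates the cost of a set S and finds the cheapest S by brute force. The function names `ufl_cost` and `best_ufl` and the two-facility instance are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def ufl_cost(S, f, clients, d):
    """Opening cost of S plus each client's distance to its nearest open facility."""
    return sum(f[i] for i in S) + sum(min(d[i][j] for i in S) for j in clients)

def best_ufl(f, clients, d):
    """Try every non-empty subset of facilities (exponential; tiny inputs only)."""
    facilities = list(f)
    candidates = (S for r in range(1, len(facilities) + 1)
                  for S in combinations(facilities, r))
    return min(candidates, key=lambda S: ufl_cost(S, f, clients, d))

# Hypothetical instance: two facilities "A" and "B", three clients 1, 2, 3.
f = {"A": 100, "B": 130}
d = {"A": {1: 10, 2: 20, 3: 50}, "B": {1: 50, 2: 30, 3: 10}}
S = best_ufl(f, [1, 2, 3], d)
print(S, ufl_cost(S, f, [1, 2, 3], d))
```

Brute force is only meant to make the objective concrete; the approximation algorithms listed in the talk exist precisely because this enumeration does not scale.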

5 Wal-mart Stores in New Jersey Question : Suppose you have budget for 50 stores, how will you select 50 locations?

6 k-median
F : potential facility locations
C : set of clients
d : metric over F ∪ C
f_i, i ∈ F : cost for opening i (from UFL; k-median has no opening costs)
k : number of facilities to open
find S ⊆ F with |S| = k, minimize the total connection cost
[Figure: facilities and clients]
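To make the k-median definition concrete, here is a minimal Python sketch that computes the total connection cost of a set of k facilities and picks the best set by exhaustive search. The names `kmedian_cost` and `best_kmedian` and the points-on-a-line instance are assumptions made for illustration.

```python
from itertools import combinations

def kmedian_cost(S, clients, d):
    """Sum over clients of the distance to the nearest facility in S."""
    return sum(min(d(i, j) for i in S) for j in clients)

def best_kmedian(facilities, clients, d, k):
    """Exhaustive search over all k-subsets (for tiny instances only)."""
    return min(combinations(facilities, k),
               key=lambda S: kmedian_cost(S, clients, d))

# Hypothetical instance: facilities and clients are points on a line.
def line(x, y):
    return abs(x - y)

facilities = [0, 5, 10]
clients = [0, 1, 9, 10]
S = best_kmedian(facilities, clients, line, 2)
print(S, kmedian_cost(S, clients, line))
```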

7 k -median clustering

8 Known Results: UFL
O(log n)-approximation [Hoc82]
constant approximations: 3.16 [STA98], 2.41 [GK99], 3 [JV99], 1.853 [CG99], 1.728 [CG99], 5+ε [Kor00], 1.861 [MMSV01], 1.736 [CS03], 1.61 [JMS02], 1.582 [Svi02], 1.52 [MYZ02], 1.50 [Byr07], 1.488 [Li11]
1.463-hardness of approximation [GK98]

9 Chapter 4, Deterministic rounding of linear programs: 4.5 The uncapacitated facility location problem
Chapter 5, Random sampling and randomized rounding of linear programs: 5.8 The uncapacitated facility location problem
Chapter 7, The primal-dual method: 7.6 The uncapacitated facility location problem
Chapter 9, Further uses of greedy and local search algorithms: 9.1 A local search algorithm for the uncapacitated facility location problem; 9.4 A greedy algorithm for the uncapacitated facility location problem
Chapter 12, Further uses of random sampling and randomized rounding of linear programs: 12.1 The uncapacitated facility location problem

10 Known Results: k-median
• pseudo-approximation
  • 1-approx. with O(k log n) facilities [Hoc82]
  • 2(1+ε)-approx. with (1+1/ε)k facilities [LV92]
• super-constant approximation
  • O(log n loglog n) [Bar96, Bar98]
  • O(log k loglog k) [CCGS98]

11 Known Results: k-median
• constant approximation
  • LP rounding: 6.667 [CGTS99], 3.25 [CL12]
  • Primal-Dual: 6 [JV99], 4 [CG99], 4 [JMS03]
  • Local Search: 3+ε [AGK+01]
  • 1+√3+ε [LS13]
• (1+2/e)-hardness of approximation [JMS03]

12 Lloyd Algorithm [Lloyd82]
• k-means clustering : minimize total squared distances
• k-means vs. k-median clustering:
  • k-means is more often used
  • Walmart example: k-median is more appropriate
  • approximation: k-median is “easier”

13 Local Search
• Can we improve the solution by p swaps?
  • No : stop
  • Yes : swap and repeat
• Approximation :
  • k-median : 3+2/p [AGK+01]
  • k-means : (3+2/p)^2 [KMN+02]
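The local search loop can be sketched in Python for the simplest case p = 1 (single swaps), where the bound quoted above is 3+2/1 = 5. This is a simplified sketch under stated assumptions: it accepts any strictly improving swap and starts from an arbitrary set, whereas polynomial-time variants typically require each swap to improve the cost by a minimum fraction. The names `kmedian_cost` and `local_search` are illustrative.

```python
def kmedian_cost(S, clients, d):
    """Sum over clients of the distance to the nearest facility in S."""
    return sum(min(d(i, j) for i in S) for j in clients)

def local_search(facilities, clients, d, k):
    """Single-swap local search: repeat while some swap strictly lowers the cost."""
    S = set(facilities[:k])  # arbitrary starting solution (an assumption)
    improved = True
    while improved:
        improved = False
        current = kmedian_cost(S, clients, d)
        for out in list(S):
            for inn in facilities:
                if inn in S:
                    continue
                T = (S - {out}) | {inn}
                if kmedian_cost(T, clients, d) < current:
                    S, improved = T, True  # take the swap and restart the scan
                    break
            if improved:
                break
    return S

# Hypothetical instance: points on a line.
def line(x, y):
    return abs(x - y)

result = local_search([0, 5, 10], [0, 1, 9, 10], line, 2)
print(result)
```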

14 LP for k-median
y_i : whether to open i
x_{i,j} : whether to connect j to i
constraints: open at most k facilities; client j must be connected; client j can only be connected to an open facility
integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)
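The formulas for this LP did not survive in the transcript. The block below is the standard k-median LP relaxation written with the slide's variables y_i and x_{i,j}; it is offered as a reconstruction under that assumption, with each constraint matched to the slide's annotation.

```latex
\begin{align*}
\min \quad & \sum_{i \in F} \sum_{j \in C} d(i,j)\, x_{i,j} \\
\text{s.t.} \quad
  & \sum_{i \in F} y_i \le k
    && \text{(open at most $k$ facilities)} \\
  & \sum_{i \in F} x_{i,j} = 1 \quad \forall j \in C
    && \text{(client $j$ must be connected)} \\
  & x_{i,j} \le y_i \quad \forall i \in F,\ j \in C
    && \text{(only to an open facility)} \\
  & x_{i,j},\ y_i \ge 0 \quad \forall i \in F,\ j \in C
\end{align*}
```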

15 (1+√3+ε)-approximation on k-median

16 k-median and UFL
f = cost of a facility [Figure: f against #open facilities]
Given a black-box α-approximation A for UFL
Naïve try : find an f such that A opens k facilities
α-approximation for k-median? No. Proof : α ≈ 1.488 for UFL, α > 1.736 for k-median

17 k-median and UFL
Naïve try : find an f such that A opens k facilities
2 issues with naïve try :
1. need LMP α-approximation for UFL
α-approximation versus LMP α-approximation
LMP = Lagrangian Multiplier Preserving
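The two definitions compared on this slide lost their formulas in the transcript. The usual definitions from the literature are given below as an assumption about what was shown; here f(S) and c(S) are shorthand introduced for this note, denoting the total facility cost and total connection cost of the returned solution S, and OPT is the optimal UFL cost.

```latex
\begin{align*}
\text{$\alpha$-approximation:} \quad & f(S) + c(S) \le \alpha \cdot \mathrm{OPT} \\
\text{LMP $\alpha$-approximation:} \quad & \alpha \cdot f(S) + c(S) \le \alpha \cdot \mathrm{OPT}
\end{align*}
```

The second condition is stronger: the facility cost is charged at its full factor α, which is what allows the facility cost f to play the role of a Lagrange multiplier for the constraint |S| = k.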

18 k-median and UFL
Naïve try : find an f such that A opens k facilities
2 issues with naïve try :
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities
S_1 : set of k_1 < k facilities
S_2 : set of k_2 > k facilities
bi-point solution

19 k-median and UFL
2 issues with naïve try :
1. need LMP α-approximation for UFL
2. cannot find f s.t. A opens exactly k facilities
LMP approx. factor : 3 [JV], 2 [JMS], 2 (our result; do not know how to improve this)
bi-point → integral : × 2 (this factor of 2 is tight !!)
final ratio for k-median : 6 [JV], 4 [JMS]

20 bi-point solution
k_1 = |S_1| < k ≤ |S_2| = k_2
a, b : a k_1 + b k_2 = k, a + b = 1
bi-point solution : a S_1 + b S_2
cost(a S_1 + b S_2) = a cost(S_1) + b cost(S_2)
[Figure: S_1 and S_2]
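Solving the two equations a k_1 + b k_2 = k and a + b = 1 gives b = (k − k_1)/(k_2 − k_1) and a = 1 − b. The short Python sketch below computes these weights and the bi-point cost with exact fractions; the function names and the sample numbers are illustrative assumptions.

```python
from fractions import Fraction

def bipoint_weights(k1, k, k2):
    """Return (a, b) with a + b = 1 and a*k1 + b*k2 = k."""
    assert k1 < k <= k2
    b = Fraction(k - k1, k2 - k1)
    return 1 - b, b

def bipoint_cost(k1, cost1, k2, cost2, k):
    """Cost of the bi-point solution a*S1 + b*S2."""
    a, b = bipoint_weights(k1, k, k2)
    return a * cost1 + b * cost2

# Hypothetical numbers: |S1| = 1, |S2| = 5, k = 3.
a, b = bipoint_weights(1, 3, 5)
print(a, b, bipoint_cost(1, 10, 5, 2, 3))
```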

21 gap-2 instance
[Figure labels: 1, 0, k+1; S_1 and S_2]
k_1 = 1, k_2 = k+1
cost(S_1) = k+1, cost(S_2) = 0
cost of integral solution = 2
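The gap of 2 follows from the numbers on this slide by direct arithmetic. The worked computation below uses only the stated values k_1 = 1, k_2 = k+1, cost(S_1) = k+1, cost(S_2) = 0 and integral cost 2, together with the definition of a and b for a bi-point solution.

```latex
\begin{align*}
a + b = 1,\quad a \cdot 1 + b\,(k+1) = k
  \;&\Longrightarrow\; b = \frac{k-1}{k},\quad a = \frac{1}{k} \\
\mathrm{cost}(aS_1 + bS_2)
  &= \frac{1}{k}\,(k+1) + \frac{k-1}{k}\cdot 0 = \frac{k+1}{k} \\
\frac{\text{integral cost}}{\text{bi-point cost}}
  &= \frac{2}{(k+1)/k} = \frac{2k}{k+1} \;\longrightarrow\; 2
  \quad (k \to \infty)
\end{align*}
```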

22 k-median and UFL
LMP approx. factor : 3 [JV], 2 [JMS], 2 (our result)
bi-point → integral : × 2 (this factor of 2 is tight !!)
final ratio for k-median : 6 [JV], 4 [JMS]
our result : bi-point → pseudo-integral
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities

23 Main Lemma 1
A : black-box α-approximation with k+c open facilities
A' : (α+ε)-approximation with k open facilities
A' calls A n^{O(c/ε)} times.
bad instance : with k+1 open facilities, cost = 0; with k open facilities, cost huge

24 Dense Facility
B_i : set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A
this instance : i is A-dense for A ≈ opt
[Figure: facility i and its ball B_i]

25 Dense Facility
Reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)
can reduce to such an instance in n^{O(t)} time
[Figure: facility i and its ball B_i]

26 Lemma 1 from [ABS]
[Awasthi-Blum-Sheffet] : ε, δ > 0 constants, OPT_{k-1} ≥ (1+δ) OPT_k → can find a (1+ε)-approximation
• k-median clustering is easy in practice
• reason : there is a “meaningful” clustering
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities

27 A : α-approximation algorithm for k-median with k+c medians
[ABS] : OPT_{k-1} ≥ (1+δ) OPT_k → (1+ε)-approximation
Algorithm
• Apply A to (k-c, F, C, d) → solution with k facilities of cost ≤ α OPT_{k-c}
• Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
• Output the best of the c+1 solutions
Proof
• If OPT_{k-c} ≤ (1+ε) OPT_k, then done.
• Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}
• [ABS] on (k-i, F, C, d) → solution of cost (1+ε) OPT_{k-i} ≤ (1+ε)^2 OPT_k
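The case split in the proof can be checked numerically. The Python sketch below takes a table of optimal values (a hypothetical input `opt`, where `opt[m]` stands for OPT_m; in reality these values are unknown and the algorithm simply tries every i) and finds the smallest i with OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}. Because every earlier step grew by less than (1+ε)^{1/c}, it follows that OPT_{k-i} ≤ (1+ε)^{i/c} OPT_k ≤ (1+ε) OPT_k, which is the inequality used in the last line of the proof.

```python
def smallest_gap_index(opt, k, c, eps):
    """Smallest i in [0, c) with opt[k-i-1] >= (1+eps)**(1/c) * opt[k-i], else None.

    `opt` is a hypothetical table mapping a number of facilities m to OPT_m.
    """
    step = (1 + eps) ** (1 / c)
    for i in range(c):
        if opt[k - i - 1] >= step * opt[k - i]:
            return i
    return None  # no gap: OPT_{k-c} < (1+eps) * OPT_k, the first case of the proof

# Hypothetical optimal values for k = 5, c = 3, eps = 0.3.
k, c, eps = 5, 3, 0.3
opt = {2: 50.0, 3: 20.0, 4: 10.5, 5: 10.0}
i = smallest_gap_index(opt, k, c, eps)
print(i, opt[k - i] <= (1 + eps) * opt[k])
```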

28 Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
[JV] : bi-point solution of cost C → solution of cost 2C
• based on improving [JV] algorithm

29 JV algorithm
given : bi-point solution a S_1 + b S_2
τ_i = nearest facility of i
select S'_2 ⊆ S_2, |S'_2| = |S_1| = k_1
with prob. a, open S_1; with prob. b, open S'_2
randomly open k-k_1 facilities in S_2 \ S'_2
guarantee : either i is open, or τ_i is open
[Figure: S_1, S_2, a facility i]
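The rounding steps on this slide can be sketched in Python as follows. Several details are assumptions not stated in the transcript: τ_i is taken to be the facility of S_2 nearest to i ∈ S_1, S'_2 is built from the distinct τ_i and padded with arbitrary facilities of S_2 up to size k_1, and S_1 and S_2 are assumed to be disjoint. The name `jv_round` is illustrative.

```python
import random

def jv_round(S1, S2, k, dist, rng=random):
    """Round the bi-point solution a*S1 + b*S2 to exactly k open facilities."""
    k1, k2 = len(S1), len(S2)
    assert k1 < k <= k2
    b = (k - k1) / (k2 - k1)
    a = 1 - b
    # tau[i]: facility of S2 nearest to i (assumed reading of the slide)
    tau = {i: min(S2, key=lambda f: dist(i, f)) for i in S1}
    S2p = list(dict.fromkeys(tau.values()))       # distinct nearest neighbours
    chosen = set(S2p)
    rest = [f for f in S2 if f not in chosen]
    need = k1 - len(S2p)                          # pad S'_2 up to size k1
    S2p, rest = S2p + rest[:need], rest[need:]
    # with prob. a open S1, with prob. b open S'_2
    opened = set(S1) if rng.random() < a else set(S2p)
    # plus k - k1 random facilities of S2 \ S'_2
    opened |= set(rng.sample(rest, k - k1))
    return opened, tau

# Hypothetical instance: facilities are points on a line.
S1, S2, k = [0, 10], [1, 2, 11, 12, 20], 3
opened, tau = jv_round(S1, S2, k, lambda x, y: abs(x - y), random.Random(0))
print(sorted(opened))
```

Each facility of S_2 \ S'_2 is opened with probability (k − k_1)/(k_2 − k_1) = b, so every facility of S_2 is open with probability b and every facility of S_1 with probability a, matching the bi-point weights.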

30 Analysis of JV algorithm
[Figure: client j at distance d_1 from i_1 and d_2 from i_2; d(i_1, i_3) ≤ d_1 + d_2]
i_1 ∈ S_1, i_3 ∈ S'_2; either i_1 or i_3 is open
If i_2 is open, connect j to i_2
Otherwise, if i_1 is open, connect j to i_1
Otherwise connect j to i_3
E[cost of j] ≤ 2 × [cost of j in a S_1 + b S_2]
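The factor 2 can be derived in a few lines. The derivation below is a sketch of the standard argument rather than a transcription of the slide. It assumes that i_2 is the facility of S_2 nearest to j, that d(j, i_3) ≤ d_1 + d(i_1, i_3) ≤ 2d_1 + d_2, and that i_2 ∉ S'_2, so that i_2 is open with probability b independently of the choice between S_1 and S'_2. If instead i_2 ∈ S'_2, exactly one of i_1 and i_2 is open and the expected cost is at most a d_1 + b d_2.

```latex
\begin{align*}
\mathbb{E}[\text{cost of } j]
  &\le b\,d_2 + (1-b)\bigl[a\,d_1 + b\,(2d_1 + d_2)\bigr] \\
  &= a(a + 2b)\,d_1 + b(1 + a)\,d_2
     && \text{(using } 1 - b = a\text{)} \\
  &= a(1 + b)\,d_1 + b(1 + a)\,d_2
     && \text{(using } a + b = 1\text{)} \\
  &\le 2\,(a\,d_1 + b\,d_2)
\end{align*}
```

Here a d_1 + b d_2 is the cost of j in the bi-point solution a S_1 + b S_2.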

31 Our Algorithm
on average, d_1 >> d_2
[Figure: client j at distance d_1 from i_1 and d_2 from i_2; an edge of length ≤ d_1 + d_2 leading to i_3]
d(j, i_3) ≤ d_1 + 2 d_2 (in place of 2 d_1 + d_2)
If i_2 is open, connect j to i_2
Otherwise, if i_1 is open, connect j to i_1
Otherwise connect j to i_3
E[cost of j] ≤ 2 × [cost of j in a S_1 + b S_2]

32 Our Algorithm
need to guarantee : either i is open, or τ_i is open [Figure: i and τ_i]
for a star, either the center is open, or all leaves are open
first try : open each star independently? with prob. a, open the center; with prob. b, open the leaves
problem : cannot bound the number of open facilities
idea :
• big stars : always open the center, open each leaf with prob. ≈ b
• group small stars of the same size, dependent rounding
• for each group, open 3 more facilities than expected

33 small stars
small star : star of size ≤ 2/(abε)
M_h : set of stars of size h, m = |M_h|
Roughly, for am stars, open the center; for bm stars, open the leaves
More accurately, permute the stars and the facilities; open top centers, open bottom leaves

34 big stars
size h > 2/(abε)
always open the center
randomly open ≈ bh leaves for a big star

35 Lemma : we open at most k + 6/(abε) facilities. for a big star of size h, FRAC : a+bh ALG : for a group of m small stars of size h FRAC : m(a+bh) ALG : there are at most 2/(abε) groups

36 Summary
LMP approx. factor : 3 [JV], 2 [JMS], 2 (our result)
bi-point → integral : × 2
final ratio for k-median : 6 [JV], 4 [JMS]
our result : bi-point → pseudo-integral
Main Lemma 1 : it suffices to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities

37 Open Problems
• gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
• tight analysis?
• algorithm works for k-means?

