
1 Approximations and Streaming Algorithms for Geometric Problems
Piotr Indyk, MIT

2 Computational Model
- Single* pass over the data: $e_1, e_2, \ldots, e_n$
- Bounded storage
- Fast processing time per element
* For the purpose of this talk

3 Streaming Data Types
- Vector problems: the stream defines an array of numbers; maintain statistics of the array, e.g., the median
- Metric problems: clustering
- Graph problems, text problems
- Geometric problems [this talk]

4 Geometric Data Stream Algorithms as Data Structures
Data structures that support:
- Insert(p) to P
- Possibly: Delete(p) from P
- Compute(P)
Use space that is sub-linear in |P|

5 Insertions-only

6 Clustering in Geometric Spaces
Problems:
- k-center [Charikar-Chekuri-Feder-Motwani’97]
- k-median [Guha-Mishra-Motwani-O’Callaghan’00, Meyerson’01, Charikar-O’Callaghan-Panigrahy’03]
Bounds:
- poly(k, log n) space
- O(1)-approximation

7 k-median/k-center
- k is given
- Goal: choose k medians/centers to minimize:
  - k-median: the sum of the distances from each point to its nearest median
  - k-center: the max distance from any point to its nearest center
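The two objectives can be written out directly. The following is a minimal sketch for points in Euclidean space; the function names `k_median_cost` and `k_center_cost` are illustrative and do not come from the talk.

```python
import math

def k_median_cost(points, centers):
    """Sum, over all points, of the distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def k_center_cost(points, centers):
    """Largest distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

Both functions only evaluate a given set of centers; choosing the best k centers is the hard part that the streaming algorithms approximate.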

8 Geometric Space
Bounds:
- poly(k, log n) space
- $(1+\varepsilon)$-approximation
Problems:
- Diameter, Minimum Enclosing Ball [Agarwal-Har-Peled’01, Feigenbaum-Kannan-Zhang’02, Cormode-Muthukrishnan’02, Hershberger-Suri’04]
- k-center [Agarwal-Har-Peled’01, Agarwal-Har-Peled-Varadarajan’04]
- k-median [Har-Peled-Mazumdar’04]
- Range searching via $\varepsilon$-approximations: [Suri-Toth-Zhou’04], [Bagchi-Chaudhary-Eppstein-Goodrich’04]
- …

9 Dominant Approach: Merge and Reduce
Main ideas:
- Design an (off-line) algorithm that converts the input into a “sketch”:
  - Small size
  - Sufficient to solve the problem
  - A sketch of sketches is a sketch
- Partition the input in a tree-like fashion
- Simulate the tree computation in small space
The technique can be traced back to ancient times, i.e., the ’80s [Munro-Paterson’80]

10 Tree Computation
[Figure: computation tree whose leaves are the input points $p_1, \ldots, p_{16}$]
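The tree computation can be simulated with one stored sketch per tree level, in the style of a binary counter. The sketch below is a minimal illustration under stated assumptions: `merge_and_reduce`, `min_max_sketch` and `block_size` are hypothetical names, sketches are plain lists, and the example reduction (keep the minimum and maximum of 1D points) is exact, whereas approximate sketches would accumulate error at each level.

```python
def min_max_sketch(items):
    """Toy reduction: the min and max of 1D points (enough for the 1D diameter)."""
    return [min(items), max(items)]

def merge_and_reduce(stream, reduce, block_size):
    """Summarize a stream while keeping at most one sketch per tree level."""
    levels = []   # levels[i] is None or a sketch covering block_size * 2**i items
    buffer = []   # raw items of the current, not yet full, leaf block

    def push(sketch):
        # Carry upward like binary addition: two sketches at the same level
        # are merged, reduced, and moved one level up.
        i = 0
        while True:
            if i == len(levels):
                levels.append(None)
            if levels[i] is None:
                levels[i] = sketch
                return
            sketch = reduce(levels[i] + sketch)
            levels[i] = None
            i += 1

    for item in stream:
        buffer.append(item)
        if len(buffer) == block_size:
            push(reduce(buffer))
            buffer = []

    # Final answer: combine the surviving sketches and the partial block.
    merged = [x for s in levels if s is not None for x in s]
    merged.extend(buffer)
    return reduce(merged) if merged else []
```

For example, `merge_and_reduce(range(1000), min_max_sketch, 8)` returns `[0, 999]` while holding only a logarithmic number of sketches at any time.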

11 Analysis
- Space: (sketch size) · log n
- Time: sketch computation time
- Question: where do sketches come from?

12 Idea 1: solution = sketch
- Consider k-median [GMMO’00]: an approximate k-median of approximate weighted k-medians is an approximate k-median
- Result:
  - Constant-depth tree
  - Space: $k n^{\varepsilon}$, $\varepsilon > 0$
  - O(1)-approximation
  - Works for any metric space
[Figure: example with k = 3]

13 Use the solution, ctd.
- $\varepsilon$-approximations: find a subset $S \subset P$ such that for any rectangle/halfspace/etc. $R$, $|R \cap S|/|S| = |R \cap P|/|P| \pm \varepsilon$
- [Matousek]: an approximation of a union of approximations is an approximation
- [BCEG’04]: convert it into a streaming algorithm, applications; $\approx 1/\varepsilon^2$ space
- [STZ’04]: better/optimal bounds for rectangles and halfspaces

14 Idea 2: Core-Sets [AHP’01]
- Assume we want to minimize $C_P(o)$
- $S \subset P$ is an $\varepsilon$-core-set for $P$ if, for any $o$ and any set $T$: $C_{P \cup T}(o) = (1 \pm \varepsilon)\, C_{S \cup T}(o)$
- Note: this must hold for all $o$, not just the optimal one $o_P$

15 Example: Core-set for MEB
Compute extremal points:
- Choose “densely” spaced directions $v_1, \ldots, v_k$
  - I.e., for any $u$ there is $v_i$ such that $u \cdot v_i \ge \|u\|_2 / (1+\varepsilon)$
- For each direction maintain an extremal point
- $k = O(1/\varepsilon)^{(d-1)/2}$ suffice
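The extremal-point idea can be illustrated in the plane (d = 2). The following is a minimal sketch, not the construction from [AHP’01]: it assumes k equally spaced directions over a half-circle, keeps the lowest and highest projection for each, and estimates the diameter from the stored points. The class name `DirectionalExtremes` is hypothetical.

```python
import math

class DirectionalExtremes:
    """Insertion-only summary: extremal points along k directions in 2D."""

    def __init__(self, k):
        # Directions at angles 0, pi/k, ..., (k-1)pi/k; tracking both the min
        # and the max projection covers the opposite half-circle as well.
        self.dirs = [(math.cos(math.pi * j / k), math.sin(math.pi * j / k))
                     for j in range(k)]
        self.lo = [None] * k   # (projection, point) with the smallest projection
        self.hi = [None] * k   # (projection, point) with the largest projection

    def insert(self, p):
        for j, (dx, dy) in enumerate(self.dirs):
            proj = p[0] * dx + p[1] * dy
            if self.lo[j] is None or proj < self.lo[j][0]:
                self.lo[j] = (proj, p)
            if self.hi[j] is None or proj > self.hi[j][0]:
                self.hi[j] = (proj, p)

    def core_set(self):
        """The stored extremal points (at most 2k distinct points)."""
        return {p for _, p in self.lo + self.hi if p is not None} if self.lo[0] else set()

    def diameter_estimate(self):
        pts = list(self.core_set())
        return max((math.dist(p, q) for p in pts for q in pts), default=0.0)
```

With this spacing every direction is within angle π/(2k) of a tracked one, so the estimate never exceeds the true diameter and is at least cos(π/(2k)) times it. That loss is about 1/k², which matches the slide's count of roughly $1/\sqrt{\varepsilon}$ directions for d = 2.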

16 Stream Algorithms via Core-sets
- Diameter/MEB/width: $O(1/\varepsilon)^{(d-1)/2} \log n$ space [AHP’01]
- k-center: $O(k/\varepsilon^d) \log n$ [HP’01]
- k-median:
  - $O(k/\varepsilon^d) \log n$ [HPM’04]
  - $O(k^2/\varepsilon^d)$ [HPK’05]
  - $O(k^2 d \log^6 n / \varepsilon)$ [Chen’05]
  - $O(d^3/\varepsilon^7)$, k = 1 [Indyk’05]
- Faster algorithms and other results

17 Limitations
- Small core-sets might not exist
- Do not support deletions

18 Insertions and Deletions

19 Insertions and Deletions
Technique:
- Reduction of geometric problems to vector problems
- Use of randomized linear embeddings
Problems:
- Maintaining histograms of the data
- Classic geometric problems (matching, MST, clustering, etc.)

20 Streaming Algorithms for Vector Problems
Norm estimation:
- Stream elements: (i, b), i = 1…m
- Interpretation: $x_i = x_i + b$
- Want to maintain $\|x\|_p$
Why? Examples:
- $\|x\|_p^p = \sum_i |x_i|^p \to$ #non-zero coordinates in x, as $p \to 0$
- …
How?

21 Dimensionality reduction
- x is an m-dimensional vector
- A is a “random” k × m matrix, k “small”
- Store Ax
- Recover $(1 \pm \varepsilon)\|x\|_2$ from Ax (with prob. 1 − 1/N)
[Alon-Matias-Szegedy’96]
- Estimator: $\mathrm{median}\left[(A_1 x)^2 + \cdots + (A_c x)^2,\ (A_{c+1} x)^2 + \cdots + (A_{2c} x)^2,\ \ldots\right]^{1/2}$, with $c = 1/\varepsilon^2$, $k = c \log N$
- A: constructed from 4-wise independent random variables
[Johnson-Lindenstrauss’85]
- Estimator: $\|Ax\|_2$
- A: each entry independently drawn from, e.g., a Gaussian distribution
  - constructed using Nisan’s PRG [Indyk’00]
[Indyk’00]
- Estimator: $\mathrm{median}\left[|A_1 x|, \ldots, |A_k x|\right]$
- A: as above
- Works for $\|x\|_p$ for any $p \in (0, 2]$ (using p-stable distributions)

22 What it means
- To know $\|x\|_2$, it suffices to know Ax
- Can maintain Ax when the coordinates are incremented: $A(x + b e_i) = Ax + b A e_i$
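The linearity above is what makes both insertions and deletions cheap. The following is a minimal sketch under stated assumptions: the class `L2Sketch` is hypothetical, A is stored explicitly as a k × m Gaussian matrix (which costs O(km) space and is done here only for clarity, whereas the talk generates A pseudorandomly), and the estimate is normalized by $\sqrt{k}$ because the entries have unit variance.

```python
import math
import random

class L2Sketch:
    """Maintains Ax under updates x_i += b and estimates ||x||_2 from Ax."""

    def __init__(self, m, k, seed=0):
        rng = random.Random(seed)
        # Explicit Gaussian matrix: an illustrative shortcut, not small-space.
        self.A = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]
        self.Ax = [0.0] * k

    def update(self, i, b):
        # A(x + b e_i) = Ax + b * (column i of A); b may be negative (deletion).
        for r in range(len(self.Ax)):
            self.Ax[r] += b * self.A[r][i]

    def estimate(self):
        # E[(A_r x)^2] = ||x||_2^2 for unit-variance Gaussian rows.
        k = len(self.Ax)
        return math.sqrt(sum(v * v for v in self.Ax) / k)
```

A call such as `sk.update(3, 5.0)` followed by `sk.update(3, -5.0)` leaves the sketch where it started, which is the property the geometric algorithms with deletions rely on.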

23 Applications of Vector Approach
- Histograms/wavelet approximation
- Classic geometric problems (matching, MST, clustering, etc.)

24 Histograms
- View x as a function $x: [1 \ldots n] \to [1 \ldots M]$
- Approximate it using a piecewise constant function h, with B pieces (buckets)
- The problem can be formulated in 2D as well (buckets become rectangular tiles)
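To make the target concrete, the error of the best B-bucket histogram can be computed off-line with a standard dynamic program. This is not the streaming algorithm from the talk; it is a minimal sketch that stores all of x, and the names `bucket_sse` and `best_histogram_error` are illustrative. Each bucket is approximated by its mean, which minimizes the squared error for that bucket.

```python
import math

def bucket_sse(prefix, prefix_sq, i, j):
    """Squared error of approximating x[i:j] by its mean."""
    s = prefix[j] - prefix[i]
    q = prefix_sq[j] - prefix_sq[i]
    return q - s * s / (j - i)

def best_histogram_error(x, B):
    """L2 error ||x - h||_2 of the best histogram h with at most B buckets."""
    n = len(x)
    prefix, prefix_sq = [0.0], [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    inf = float("inf")
    # err[b][j]: least squared error covering x[:j] with exactly b buckets.
    err = [[inf] * (n + 1) for _ in range(B + 1)]
    err[0][0] = 0.0
    for b in range(1, B + 1):
        for j in range(1, n + 1):
            for i in range(b - 1, j):
                if err[b - 1][i] < inf:
                    cand = err[b - 1][i] + bucket_sse(prefix, prefix_sq, i, j)
                    if cand < err[b][j]:
                        err[b][j] = cand
    best = min(err[b][n] for b in range(1, B + 1))
    return math.sqrt(max(0.0, best))
```

The running time is O(n² B) and the space is linear in n, which is exactly what a streaming algorithm has to avoid.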

25 Results: 1D
[Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss’02]:
- Maintains h with B pieces such that $\|x-h\|_2 \le (1+\varepsilon)\|x-h_{OPT}\|_2$
- Under increments/decrements of x
- Space: poly(B, $1/\varepsilon$, log n)
- Time: poly(B, $1/\varepsilon$, log n)

26 Results: 2D
[Thaper-Guha-Indyk-Koudas’02]:
- Maintains h with $B \log(nM)$ tiles such that $\|x-h\|_2 \le (1+\varepsilon)\|x-h_{OPT}\|_2$
- Under increments/decrements of x
- Space/update time: poly(B, $1/\varepsilon$, log n)
- Histogram reconstruction time: poly(B, $1/\varepsilon$, n)
[Muthukrishnan-Strauss’03]:
- Maintains h with 4B tiles
- Time: poly(B, $1/\varepsilon$, log(nM))

27 Minimum Weight Bi-chromatic Matching
- Estimate the cost of MWBM

28 Minimum Weight Matching
- Estimate the cost of MWM

29 Minimum Spanning Tree
- Estimate the cost of MST

30 Facility Location
- Goal: choose a set F of facilities to minimize the sum of the distances from each point to its nearest facility, plus the number of facilities times f
- Again, report the cost

31 Approach [Indyk’04]
- Assume $P \subset \{1 \ldots \Delta\}^2$
- Reduce to vector problems
- Impose square grids $G_0, \ldots, G_k$, with side lengths $2^0, 2^1, \ldots, 2^k$, shifted at random
- For each square cell c in $G_i$, let $n_P(c)$ be the number of points from P in c
- The algorithms will maintain certain statistics over $n_P(\cdot)$, which will allow them to approximately solve the problems
[Figure: example with per-cell point counts]

32 Estimators
- MST: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) > 0]$
- MWM: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$
- MWBM: $\sum_i 2^i \sum_{c \in G_i} |n_G(c) - n_B(c)|$
- Fac. Loc.: $\sum_i 2^i \sum_{c \in G_i} \min[n_P(c), T_i]$
- K-median: $\sum_i 2^i \sum_{c \in G_i - B(Q, 2^i)} n_P(c)$ (given medians Q)
- Maintain #non-zero entries in $n_P$ [FM’85]
- Maintain $L_1$ difference [I’00]
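The MST and MWM sums can be evaluated literally once the per-cell counts are available. The sketch below is an off-line illustration only: it stores exact counts in a `Counter`, whereas the streaming algorithms replace them with small-space sketches, and it applies no normalization beyond what the slide shows. The names `grid_counts` and `grid_estimators`, and the choice of one common random shift for all levels, are assumptions.

```python
import random
from collections import Counter

def grid_counts(points, level, shift):
    """n_P(c) for every non-empty cell c of the grid with side length 2**level."""
    side = 1 << level
    return Counter(((x + shift[0]) // side, (y + shift[1]) // side)
                   for x, y in points)

def grid_estimators(points, delta, seed=0):
    """Evaluate the MST and MWM sums for integer points in {1..delta}^2."""
    rng = random.Random(seed)
    k = max(1, (delta - 1).bit_length())   # smallest k >= 1 with 2**k >= delta
    shift = (rng.randrange(1 << k), rng.randrange(1 << k))
    mst = mwm = 0
    for level in range(k + 1):
        counts = grid_counts(points, level, shift)
        # MST term: number of non-empty cells at this level, times 2**level.
        mst += (1 << level) * sum(1 for n in counts.values() if n > 0)
        # MWM term: number of cells holding an odd number of points.
        mwm += (1 << level) * sum(1 for n in counts.values() if n % 2 == 1)
    return mst, mwm
```

Because each term depends on the counts only through "non-zero" or "odd", a stream of insertions and deletions can be handled by any sketch that tracks those statistics of the count vector.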

33 Results
Problem      Appr.
MST          $\log \Delta$ → $1+\varepsilon$
MWM          $\log \Delta$
MWBM         $\log \Delta$
Fac. Loc.    $\log^2 \Delta$
K-median     XYZ → $1+\varepsilon$
Space: $(\log \Delta + \log n + K)^{O(1)}$
[Frahling-Indyk-Sohler’05] [Frahling-Sohler’05] […, Charikar’02, …]

34 XYZ
Space: $(K + \log \Delta + \log n)^{O(1)}$
Computation time                                                Approximation
[?]$^{O(k)}$ poly($\log n + 1/\varepsilon$)                     $1+\varepsilon$
[?]$^{2}$ poly($\log n + \log \Delta + k$)                      O(1)
poly($\log n + \log \Delta + k$)                                [$1+\varepsilon$, $\log n \log \Delta$]

35 Probabilistic embeddings into HSTs
Known [Bartal’96, Charikar-Chekuri-Goel-Guha-Plotkin’98]:
- $\|p-q\| \le D_{tree}(p,q)$
- $E[D_{tree}(p,q)] \le \|p-q\| \cdot O(\log \Delta)$
[Figure: tree T with point counts at its nodes]

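36 MST
- $E[\mathrm{Cost}(\text{MST in } T)] \le O(\log \Delta) \cdot \mathrm{Cost}(\text{MST})$
- $\mathrm{Cost}(\text{MST in } T) \approx \mathrm{Cost}(T)$
- How to compute Cost(T)?
  - Sum over all levels i of the #nodes at level i, times $2^i$
  - Node c exists iff $n_i(c) > 0$
[Figure: tree T with point counts at its nodes]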

37 Matching
Algorithm:
- Match what you can at the current level
- Odd leftovers wait for the next level
- Repeat
Optimal on the HST
Cost = $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$
[Figure: parities of the cell counts]
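The level-by-level rule can also produce an explicit matching, not only its cost. The following is a minimal off-line sketch that stores all points, which a streaming algorithm would not do; the function name `hierarchical_matching` and its `levels` and `shift` parameters are hypothetical.

```python
from collections import defaultdict

def hierarchical_matching(points, levels, shift=(0, 0)):
    """Greedy matching on nested grids with side lengths 2**0 .. 2**levels.

    Returns (pairs, unmatched): pairs of point indices, and the indices
    still unmatched after the coarsest level.
    """
    unmatched = list(range(len(points)))
    pairs = []
    for level in range(levels + 1):
        side = 1 << level
        cells = defaultdict(list)
        for idx in unmatched:
            x, y = points[idx]
            cells[((x + shift[0]) // side, (y + shift[1]) // side)].append(idx)
        unmatched = []
        for members in cells.values():
            # Match what you can inside the cell ...
            while len(members) >= 2:
                pairs.append((members.pop(), members.pop()))
            # ... an odd leftover waits for the next, coarser level.
            unmatched.extend(members)
    return pairs, unmatched
```

At every level each cell passes up at most one point, precisely when its count is odd, which is where the indicator in the cost formula comes from.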

38 Conclusions
Algorithms for geometric data streams:
- Insertions-only: merge and reduce, core-sets
- Insertions and deletions: randomized linear embeddings

39 Open Problems
Matchings, Facility Location, etc.:
- Replace $\log \Delta$ by O(1) or even $1+\varepsilon$
- Possible for:
  - MST [Frahling-Indyk-Sohler’05]
  - k-median [Frahling-Sohler’05]
- Related to computing bi-chromatic matching [Agarwal-Varadarajan’04]
- Min-sum clustering?

40 Open Problems
High dimensions:
- Diameter:
  - $2^{1/2}$-approx, $O(d^2 n^{1/2})$ space, follows from [Goel-Indyk-Varadarajan’01]
  - c-approx, $O(d\, n^{1/(c^2-1)})$ [Indyk’03]
  - Conjecture: [?] $2^{1/2}$-approx, O(d polylog n) space
- Min-width cylinder:
  - 18-approx, O(d) space [Chan’04]

41 Open Problems
Range queries:
- General lower bounds? (Not just for $\varepsilon$-approximations)
  - An $\Omega(1/\varepsilon^2)$-bit bound for general queries follows from the LB for dot product [Indyk-Woodruff’03] and is tight (for randomized algorithms)
  - What about, e.g., half-space queries? $O(1/\varepsilon^{4/3})$ is known [STZ’04]
- Other problems [STZ’04]

