Navigating Nets: Simple algorithms for proximity search
Robert Krauthgamer (IBM Almaden)
Joint work with James R. Lee (UC Berkeley)

A classical problem
Fix a metric space (X,d): X = set of points, d = distance function over X.
Near-neighbor search (NNS) [Minsky-Papert]:
1. Preprocess a given n-point subset S ⊆ X.
2. Given a query point q ∈ X, quickly compute the closest point to q among S.
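In the oracle model used throughout this talk, the only operation available is evaluating d. A minimal brute-force baseline, making one oracle call per point of S, might look as follows (the function and variable names are illustrative, not from the talk):

```python
import math

def nearest_neighbor(S, q, d):
    """Exact NNS by brute force: one oracle call d(q, p) per point of S.

    This is the O(n)-query baseline that the data structure in this talk
    improves on; d is accessed only as a black-box distance oracle.
    """
    best, best_dist = None, math.inf
    for p in S:
        dist = d(q, p)
        if dist < best_dist:
            best, best_dist = p, dist
    return best, best_dist

# Example on the path metric {1, ..., 10} with d(x, y) = |x - y|:
S = list(range(1, 11))
point, dist = nearest_neighbor(S, 6.3, lambda x, y: abs(x - y))
```

Here `point` is 6 and `dist` is about 0.3; the point is only that every query inspects all n points, which is what the net hierarchy below avoids.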

Variations on NNS
(1+ε)-approximate nearest neighbor search: find a ∈ S such that d(q,a) ≤ (1+ε)·d(q,S).
Dynamic case: allow updates to S (insertions and deletions).
Distributed case: no central index (e.g., nodes in a network); other cost measures (e.g., communication, stretch, load).

General metrics
Only oracle access to the distance function d(·,·). Models a complicated metric or on-demand measurement. No “hashing of coordinates” or tuning for a specific metric.
Goal: efficient query (sublinear or polylog time).
Impossible in general, even if the data set S is the path metric 1, 2, …, n−1, n: an exact algorithm may be forced to compute all n distances to q. What about approximate NNS?

Approximate NNS
Hard even for (near-)uniform metrics: d(x,y) = 1 for all distinct x,y ∈ S. But many data sets lack large uniform subsets. Can we quantify this?

Abstract dimension
The doubling constant λ_X of a metric (X,d) is the minimum λ such that every ball can be covered by λ balls of half the radius. The metric is doubling if λ_X = O(1). The (abstract) dimension is dim(X) = log₂ λ_X.
Immediate properties:
dim(ℝ^d, ‖·‖₂) = O(d).
dim(X′) ≤ dim(X) for all X′ ⊆ X.
dim(X) ≤ log₂ |X|. (Equality for a uniform metric.)
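For a finite metric given by an oracle, the doubling constant can be upper-bounded empirically by greedily covering each ball with half-radius balls. A rough sketch (the names are mine, and the greedy cover only upper-bounds the true λ):

```python
def half_radius_cover(points, center, r, d):
    """Greedily pick (r/2)-separated centers inside B(center, r). Every point
    of the ball is then within r/2 of some pick, so the picks are centers of
    a cover by balls of radius r/2; returns the size of that cover."""
    ball = [p for p in points if d(center, p) <= r]
    picks = []
    for p in ball:
        if all(d(p, c) > r / 2 for c in picks):
            picks.append(p)
    return max(1, len(picks))

def doubling_constant_bound(points, radii, d):
    """Upper bound on the doubling constant: worst half-radius cover size
    over the sampled balls."""
    return max(half_radius_cover(points, x, r, d)
               for x in points for r in radii)

# A uniform metric on 4 points: every unit ball needs all 4 half-radius
# balls, so the bound is |X| = 4, i.e. dim = log2(4) = 2 = log2 |X|,
# matching the "equality for a uniform metric" remark above.
uniform = list(range(4))
d_unif = lambda x, y: 0.0 if x == y else 1.0
```

On the path metric the same routine returns a small constant independent of n, as the doubling property predicts.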

Illustrations of doubling metrics: a grid with a missing piece; a low-dimensional manifold (bounded curvature); a union of curves in Euclidean space.

Embedding doubling metrics
Theorem [Assouad 1983; Gupta–Krauthgamer–Lee 2003]: Fix 0 < ε < 1, and let (X,d) be a doubling metric. Then (X, d^ε) can be embedded with O(1) distortion into ℓ₂ of dimension O(1). Not true for ε = 1 [Semmes 1996].
Motivation: embed S and then apply Euclidean NNS.

Our results
A simple data structure for maintaining S:
(1+ε)-NNS query time: (1/ε)^O(dim(S)) · log Δ (for ε < ½), where Δ = d_max/d_min is the normalized diameter of S (typically Δ = n^O(1)).
Space: n · 2^O(dim(S)).
Dynamic maintenance of S — insertion/deletion time: 2^O(dim(S)) · log Δ · log log Δ.
Additional properties:
Best possible dependence on dim(S) (in a certain model).
Oblivious to dim(S) and robust against “bad localities”.
Matches/improves known (more specialized) results.

Nets
Definition: an r-net of X is a subset Y with
1. d(y1,y2) ≥ r for all y1,y2 ∈ Y.
2. d(x,Y) < r for all x ∈ X \ Y.
(I.e., a maximal r-separated subset. Not to be confused with an ε-net.)
Running example — the path metric 1, …, 16, with its 4-net, 8-net, and 16-net.
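The definition suggests an immediate greedy construction: scan the points and keep each one that is far from everything kept so far. A small sketch on the running example (the names are mine):

```python
def r_net(points, r, d):
    """Greedy r-net: keep a point iff it is at distance >= r from every
    previously kept point. The result is r-separated (condition 1), and
    every discarded point lies within distance < r of some net point
    (condition 2) -- i.e., a maximal r-separated subset."""
    net = []
    for p in points:
        if all(d(p, y) >= r for y in net):
            net.append(p)
    return net

# Running example: the path metric on 1, ..., 16 with d(x, y) = |x - y|.
path = list(range(1, 17))
d = lambda x, y: abs(x - y)
```

For instance, `r_net(path, 4, d)` keeps every fourth point, `r_net(path, 8, d)` every eighth, and `r_net(path, 16, d)` a single point, mirroring the 4-, 8-, and 16-nets of the running example.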

More nets
The same definition, illustrated on a planar point set: the net points Y are pairwise r-separated, and their radius-r balls cover X.

The data structure
For every r = 2^i, let Y_r be an r-net of S. Only O(log Δ) values of r are non-trivial.
For every y ∈ Y_r, maintain a navigation list L_{y,r} = {z ∈ Y_{r/2} : d(y,z) ≤ 2r}.
(In the running example: a 16-net, an 8-net, a 4-net.)

More on the data structure
For every r = 2^i, Y_r is an r-net of S, and only O(log Δ) values of r are non-trivial. The navigation list of y ∈ Y_r contains exactly the points of Y_{r/2} within distance 2r of y.
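Putting the two slides together, the whole structure is just one greedy net per power-of-two scale plus the navigation lists between consecutive scales. A sketch under the simplifying assumptions that the points of S are distinct, |S| ≥ 2, and the nets are nested (each net is seeded with the previous one); all names are mine:

```python
import math

def build_navigating_nets(S, d):
    """Build nets Y_r for r = 2^i (nested, via seeding) and navigation lists
    L[(y, r)] = {z in Y_{r/2} : d(y, z) <= 2r}. Only the O(log Delta) scales
    between the diameter and the minimum distance are materialized."""
    def greedy_net(r, seed):
        net = list(seed)            # seed points are 2r-separated, hence r-separated
        for p in S:
            if all(d(p, y) >= r for y in net):
                net.append(p)
        return net

    diam = max(d(a, b) for a in S for b in S)
    r = 2.0 ** math.ceil(math.log2(diam))   # top scale: Y_r is a single point
    nets, scales, prev = {}, [], ()
    while True:
        nets[r] = greedy_net(r, prev)
        scales.append(r)
        if len(nets[r]) == len(S):          # below d_min the net is all of S
            break
        prev = nets[r]
        r /= 2
    lists = {}
    for hi, lo in zip(scales, scales[1:]):
        for y in nets[hi]:
            lists[(y, hi)] = [z for z in nets[lo] if d(y, z) <= 2 * hi]
    return nets, lists

nets, lists = build_navigating_nets(list(range(1, 17)), lambda x, y: abs(x - y))
```

On the running example this produces Y_16 = {1}, Y_8 = {1, 9}, Y_4 = {1, 5, 9, 13}, and so on down to Y_1 = S.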

Space requirement
Lemma: |L_{y,r}| ≤ 2^O(dim(S)) for all y ∈ Y_r, r ≥ 0.
Proof: L_{y,r} is contained in a ball of radius 2r. This ball can be covered by λ_S³ balls of radius r/4 (halving the radius three times: 2r → r → r/2 → r/4). Every point of L_{y,r} ⊆ Y_{r/2} must be covered by a distinct ball. Hence |L_{y,r}| ≤ λ_S³ = 2^(3·dim(S)). ∎
Corollary: total space is 2^O(dim(S)) · n · log Δ. We actually improve it to 2^O(dim(S)) · n.

Back to the running example
The 16-net, 8-net, and 4-net of the path 1, …, 16.

Navigating nets
Let q denote the query point. Initially z_16 = the only point in Y_16. Find z_8 = the closest Y_8 point to q; then z_4 = the closest Y_4 point to q; etc.

How to find z_{r/2}?
Let a be the nearest neighbor of q in S. Suppose each z_r ∈ Y_r were the closest Y_r point to a (instead of to q). Then d(z_r, z_{r/2}) ≤ r + r/2 = 3r/2, so z_{r/2} must appear in z_r’s navigation list L_{z_r,r}.
In fact, for z_r to be the closest Y_r point to q it suffices that d(q,a) ≤ r/4, and then z_r’s list contains z_{r/2}. Note: d(q,z_r) ≤ 3r/2.

Stopping point
If we find a point z_r with d(q,z_r) ≤ 3r/2, but no point z_{r/2} with d(q,z_{r/2}) ≤ 3r/4, then we know d(q,S) > r/4, yielding a 6-NNS with query time 2^O(dim(S)) · log Δ.
This can be extended to (1+ε)-NNS. Similar principles yield insertions and deletions.
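For the running example, the descent and stopping rule above translate into a short loop. This is a simplified single-candidate sketch (the actual data structure tracks a set of candidates per scale; all names here are mine), with the hierarchy for the path 1, …, 16 written out by hand:

```python
# Nets of the path metric 1..16 at scales 16, 8, 4, 2, 1, and the
# navigation lists L[(y, r)] = {z in Y_{r/2} : d(y, z) <= 2r}.
nets = {16: [1], 8: [1, 9], 4: [1, 5, 9, 13],
        2: [1, 3, 5, 7, 9, 11, 13, 15], 1: list(range(1, 17))}
lists = {(y, r): [z for z in nets[r // 2] if abs(y - z) <= 2 * r]
         for r in (16, 8, 4, 2) for y in nets[r]}

def approx_nns(q, d, nets, lists, top=16):
    """Descend scale by scale: replace the current candidate z_r with the
    closest point of its navigation list. If even that point is farther
    than 3(r/2)/2 from q, the stopping argument gives d(q, S) > r/4 while
    d(q, z) <= 3r/2, so returning z yields a 6-approximate neighbor."""
    z, r = nets[top][0], top
    while r > 1:
        z = min(lists[(z, r)], key=lambda p: d(q, p))  # z stays in its own list
        if d(q, z) > 3 * (r // 2) / 2:
            break                                      # stopping rule
        r //= 2
    return z
```

For example, `approx_nns(6.3, lambda x, y: abs(x - y), nets, lists)` descends 1 → 9 → 5 → 7 → 6 and returns 6, the true nearest neighbor of 6.3, after inspecting only the short lists along the way.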

Near-optimality
The basic idea: consider a uniform metric on a set of points, and let the query point be at distance 1 from all of them, except for one point at distance 1−ε. Finding this point requires (in the oracle model) computing all distances to q. This can happen at every distance scale r, giving a lower bound of 2^Ω(dim(S)) · log Δ.

Related work — general metrics
Let K_X be the smallest K such that |B(x,r)| ≤ K · |B(x,r/2)| for all x ∈ X, r ≥ 0, and define the KR-dimension as dim_KR(X) = log₂ K_X.
Randomized exact NNS [Karger–Ruhl ’02, Hildrum et al. ’04]: space n · 2^O(dim_KR(S)) · log Δ; query time 2^O(dim_KR(S)) · log Δ. If dim_KR(S) = O(1), the log Δ term is actually O(log n).
Our results extend to this setting:
1. KR-metrics are doubling: dim(X) ≤ 4·dim_KR(X).
2. Our algorithms actually give exact NNS.
See also: assumptions on the query distribution [Clarkson ’99].

Related work — Euclidean metrics
Exact NNS in ℝ^d: O(d⁵ log n) query time and O(n^(d+ε)) space [Meiser ’93].
(1+ε)-NNS in ℝ^d: O((d/ε)^d · log n) query time and O(dn) space via quadtree-like decompositions [AMNSW ’94]. Our algorithm achieves similar bounds.
O(d · polylog(dn)) query time and (dn)^O(1) space, useful in higher dimensions [IM ’98, KOR ’98].

Concluding remarks
Our approach: a “decision tree” that is not really a tree (saves space).
In progress: a different (static) scheme where log Δ is replaced by log n; bounds on the help of “ambient” space points.
Our data structure yields a spanner of the metric:
Immediate: O(1) stretch with average degree 2^O(dim(S)).
More work: O(1) stretch with maximum degree 2^O(dim(S)).
[Guibas ’04] applied the nets data structure to moving points in the plane.
