A Note on Finding the Nearest Neighbor in Growth-Restricted Metrics Kirsten Hildrum John Kubiatowicz Sean Ma Satish Rao
Peer-to-Peer Networks A B C D
Challenge: Efficiency Low number of links per node Short paths –Hops –Network distance Challenging –Nodes connect via the Internet –Can’t see the “big picture” when choosing overlay links
Overlay Routing Source Destination
One Way (prefix routing) Level 1 Level 2 Level 3
Maintaining Structure Make assumptions about network Note: –Linear space search for sequential case –Really, tree for every destination This talk: Routing structure can be used to find nearest neighbor (or nearest-at-level)
Metric space Basic technique distance-halving Need to have some point at half-distance Points roughly evenly-spaced good query bad query
Growth Restriction Growth Restriction: –Ball of radius 3r contains only a factor of c more nodes than ball of radius r. –Stronger than “polynomial growth” Average tree degree more than c –Sample less than 1/c r 3r
Algorithm Idea, in pictures new
The Trimming Lemma Circle of radius r around new with at least one level i node. To find level (i-1) must find its parent. Level (i-1) node in little circle must point to a level i node in 3r of new <2r r 3r
How Many? Keep fixed k. Expect about c nodes in big circle, O(log n) with high probability Closest O(log n) level i nodes, have all within circle, w.h.p. r 3r
Counting Messages Only O(log n) per level needed. Total is O(log 2 n). –(proved previously) We show: Only O(log n) total need to be contacted.
Certificate of “neighborliness” A “proof” that the end node found really is the closest (all the nodes that had to be contacted) Recursive, use level i to verify level (i-1) For every level, keep nodes required by the trimming lemma and no more Then bound overall size of certificate (Algorithm is closely connected)
Adaptive k Before, saved closest k Only need ones required by trimming lemma c, in expectation log n levels, each constant expectation, total is O( log n) r 3r
My little lie Trimming lemma fails if level (i-1) node was outside circle Must expand circle & keep more level i nodes More level i means more level i+1… if sample rate < 1/c, level (i-1)’s overall effect is still constant r 3r
Algorithm Keep what you think you need for trimming lemma Get more later if needed Touch only nodes in certificate –Return nodes on a level in order of distance from query node –Use get-level-i-in-order to implement get-level-(i-1)- in-order r 3r
(Some) Related Work Karger and Ruhl, STOC02, find nearest neighbor with O(log n) queries HKRZ, fixed k technique, O(log 2 n) Krauthgamer and Lee, SODA04 –deterministic, wider class of metric spaces –not peer-to-peer
Final Notes O(log n) neighbor search w/ applications to peer-to-peer networks –Prefix routing: PRR97, Pastry, Tapestry, LAND Question Is stronger metric assumption necessary for load balance?