Optimal Data-Dependent Hashing for Approximate Near Neighbors


1 Optimal Data-Dependent Hashing for Approximate Near Neighbors
Ilya Razenshteyn (CSAIL MIT), Alexandr Andoni (Simons Institute)

2 Approximate Near Neighbors (ANN)
Dataset: n points in d dimensions
Query: a point within distance 1 of some data point
Want: report a data point within distance 2 of the query
Regime: d ≈ log n
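For concreteness (this is not from the slides), here is a minimal brute-force sketch of the guarantee the data structure must satisfy, with r = 1 and c = 2 as above; the function name and the use of NumPy are illustrative assumptions. Brute force costs O(nd) per query, which is exactly what hashing-based data structures aim to beat.

import numpy as np

def ann_query_bruteforce(data, query, r=1.0, c=2.0):
    # data: (n, d) array of points; query: (d,) array.
    # (c, r)-ANN guarantee: if some data point lies within distance r of
    # the query, return some data point within distance c*r; otherwise
    # returning nothing is acceptable. Brute force costs O(n*d) per query.
    dists = np.linalg.norm(data - query, axis=1)
    nearest = int(np.argmin(dists))
    return nearest if dists[nearest] <= c * r else None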

3 Locality-Sensitive Hashing (LSH)
Hash family on R^d: points at distance ≤ 1 collide more often (with probability ≥ p1) than points at distance ≥ 2 (with probability ≤ p2)
(Indyk, Motwani 1998): LSH can be used to solve ANN with space O(n^(1+ρ)) and query time O(n^ρ), where ρ = log(1/p1) / log(1/p2) depends on the gap between p1 and p2
Question: what is the best possible ρ?
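As one concrete data-oblivious example (a hedged sketch, not necessarily the family used in the works cited on these slides), a classical LSH family for Euclidean distance is the p-stable family of Datar, Immorlica, Indyk and Mirrokni (2004): h(x) = floor((<a, x> + b) / w), with a random Gaussian direction a and a random offset b. The bucket width w = 4.0 below is an arbitrary illustrative choice.

import numpy as np

class PStableHash:
    # One hash function h(x) = floor((<a, x> + b) / w) from the p-stable
    # LSH family for Euclidean distance: nearby points (distance <= 1)
    # fall into the same bucket with probability >= p1, while far points
    # (distance >= 2) collide with probability <= p2 < p1.
    def __init__(self, d, w=4.0, rng=None):
        rng = rng or np.random.default_rng()
        self.a = rng.standard_normal(d)   # random projection direction
        self.b = rng.uniform(0.0, w)      # random offset in [0, w)
        self.w = w                        # bucket width (tuning parameter)

    def __call__(self, x):
        return int(np.floor((self.a @ x + self.b) / self.w))

In the Indyk-Motwani framework one concatenates k such functions into a single hash to widen the gap between p1 and p2, builds roughly L = n^ρ independent hash tables, and answers a query by inspecting its bucket in each table.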

4 Bounds on LSH
(Indyk, Motwani 1998): ρ = 1/2 (defined LSH and gave the first constructions)
(Andoni, Indyk 2006): ρ = 1/4 (a better LSH construction)
(O’Donnell, Wu, Zhou 2011): ρ = 1/4 is optimal for LSH!
(These exponents are for the gap between distances 1 and 2 used throughout, i.e., approximation factor c = 2.)
Can one improve upon LSH for ANN?

5 Beyond LSH
(Andoni, Indyk, Nguyen, R 2014), (Andoni, R 2015): ρ = 1/7 for ANN (the best LSH gives only ρ = 1/4)
The main idea:
LSH is data-oblivious: it must give good collision probabilities for every pair of points from R^d
For ANN, one of the two points may be assumed to lie in the dataset
Data-dependent hashing: the hash family is chosen depending on the data points
This achieves an improvement for every dataset
(Andoni, R 2015): ρ = 1/7 is optimal for data-dependent hashing
See the arXiv preprint (intricate proof, but a very simple algorithm!)
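The slide only quotes the c = 2 exponents; the general formulas for Euclidean ANN with approximation factor c (ρ = 1/c^2 for the best data-oblivious LSH and ρ = 1/(2c^2 - 1) for data-dependent hashing) are consistent with the 1/4 and 1/7 above. A small sketch comparing the two, with the general formulas taken as an assumption stated here rather than on the slide:

# Exponents for Euclidean ANN as a function of the approximation factor c.
# These general formulas are not on the slide, but they reproduce the
# quoted c = 2 values: 1/c^2 = 1/4 and 1/(2c^2 - 1) = 1/7.
for c in (1.5, 2.0, 3.0):
    rho_lsh = 1.0 / c**2                            # best data-oblivious LSH
    rho_data_dependent = 1.0 / (2.0 * c**2 - 1.0)   # data-dependent hashing
    print(f"c = {c}: LSH rho = {rho_lsh:.3f}, data-dependent rho = {rho_data_dependent:.3f}")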

6 Open problems
Dynamic case
Instance-optimal data-dependent hashing

