Summer School on Hashing’14 Locality Sensitive Hashing Alex Andoni (Microsoft Research)

1 Summer School on Hashing’14 Locality Sensitive Hashing Alex Andoni (Microsoft Research)

2 Nearest Neighbor Search (NNS)

3 Approximate NNS (c-approximate) [Figure: query q, point p within distance r, approximation radius cr]
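For reference, here is the standard statement of the c-approximate near neighbor problem (radius $r$, approximation factor $c > 1$) that the figure labels $q$, $p$, $r$, $cr$ suggest. It is the textbook formulation, assumed here rather than recovered from the slide:

```latex
\text{Given } P \subset \mathbb{R}^d,\ |P| = n, \text{ and a query } q:\quad
\text{if } \exists\, p^* \in P \text{ with } \|q - p^*\| \le r,
\text{ report some } p \in P \text{ with } \|q - p\| \le c\,r .
```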

4 Heuristic for Exact NNS [Figure: query q, point p within distance r, radius cr; c-approximate]

5 Locality-Sensitive Hashing [Indyk-Motwani’98] [Figure: points q and p; collision probability is “not-so-small” for close pairs]
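A common way to state the property pictured here, following [Indyk-Motwani’98]. The symbols $\mathcal{H}$, $P_1$, $P_2$ and $\rho$ are the usual notation, assumed rather than taken from the slide:

```latex
\Pr_{h \in \mathcal{H}}[h(p) = h(q)] \ge P_1 \quad \text{if } \|p - q\| \le r, \qquad
\Pr_{h \in \mathcal{H}}[h(p) = h(q)] \le P_2 \quad \text{if } \|p - q\| \ge c\,r, \qquad
\rho = \frac{\log(1/P_1)}{\log(1/P_2)} .
```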

6 Locality sensitive hash functions

7 Full algorithm
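The slide's details were not preserved, so the following is a minimal sketch of the usual scheme, assuming Hamming space and bit sampling ($h(p) = p_i$ for a random coordinate $i$) as the hash family. Each of $L$ tables is keyed by a concatenation $g_j$ of $k$ sampled coordinates, and a query scans its $L$ buckets. The class and method names are hypothetical, and the sketch omits the usual cap on how many colliding points are checked.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, ...]  # a 0/1 vector in Hamming space {0,1}^d


def hamming(p: Sequence[int], q: Sequence[int]) -> int:
    """Number of coordinates in which p and q differ."""
    return sum(a != b for a, b in zip(p, q))


class BitSamplingLSH:
    """Hypothetical sketch: L tables, table j keyed by g_j(p) = (p[i_1], ..., p[i_k])."""

    def __init__(self, d: int, k: int, L: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Coordinates i_1..i_k (sampled with replacement) for each of g_1..g_L.
        self.coords: List[List[int]] = [
            [rng.randrange(d) for _ in range(k)] for _ in range(L)
        ]
        self.tables: List[Dict[Tuple[int, ...], List[Point]]] = [
            defaultdict(list) for _ in range(L)
        ]

    def _key(self, j: int, p: Sequence[int]) -> Tuple[int, ...]:
        return tuple(p[i] for i in self.coords[j])

    def insert(self, p: Point) -> None:
        # Store p in bucket g_j(p) of every table j.
        for j, table in enumerate(self.tables):
            table[self._key(j, p)].append(p)

    def query(self, q: Point, r: float, c: float) -> Optional[Point]:
        # Scan q's bucket in each table; return the first point within distance c*r.
        for j, table in enumerate(self.tables):
            for p in table.get(self._key(j, q), []):
                if hamming(p, q) <= c * r:
                    return p
        return None  # no c-approximate answer among the colliding points
```

For instance, an index built with `BitSamplingLSH(d=8, k=3, L=5)` that stores `(0,)*8` and `(1,)*8` answers the query `(1,)*8` with `(1,)*8`, while a query at distance 4 from both stored points returns `None` for `r=1, c=2`.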

8 Analysis of LSH Scheme

9 Analysis: Correctness
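The slide's formulas were not preserved. The textbook correctness calculation goes roughly as follows, for $L$ tables each keyed by a concatenation of $k$ functions from a family with parameters $P_1$, $P_2$ and $\rho = \log(1/P_1)/\log(1/P_2)$ (standard notation, assumed here), with $p^*$ a point within distance $r$ of the query $q$:

```latex
k = \left\lceil \log_{1/P_2} n \right\rceil \;\Rightarrow\; P_2^{k} \le \frac{1}{n},
\qquad P_1^{k} \approx P_1^{\log_{1/P_2} n} = n^{-\rho},
\\
\Pr\big[p^* \text{ and } q \text{ collide in none of the } L \text{ tables}\big]
\approx \big(1 - n^{-\rho}\big)^{L} \le e^{-L n^{-\rho}} = e^{-1}
\quad \text{for } L = n^{\rho}.
```

By the same choice of $k$, the expected number of far points (distance at least $cr$) colliding with $q$ in one table is at most $n P_2^{k} \le 1$.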

10 Analysis: Runtime
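A small calculator for the textbook parameter choice ($k = \lceil \log_{1/P_2} n \rceil$, $L = \lceil n^{\rho} \rceil$) makes the runtime concrete. The function names and the cost model (L hashes of k coordinates each, plus about one expected far-point check of cost d per table) are assumptions for illustration, not the slide's exact accounting. With bit sampling in Hamming space, $P_1 = 1 - r/d$ and $P_2 = 1 - cr/d$, so $d = 100$, $r = 5$, $c = 2$ gives $P_1 = 0.95$, $P_2 = 0.90$.

```python
import math
from typing import Tuple


def lsh_parameters(n: int, p1: float, p2: float) -> Tuple[float, int, int]:
    """Hypothetical helper: (rho, k, L) for k = ceil(log_{1/p2} n), L = ceil(n^rho)."""
    if not 0 < p2 < p1 < 1:
        raise ValueError("need 0 < p2 < p1 < 1")
    rho = math.log(1 / p1) / math.log(1 / p2)
    k = max(1, math.ceil(math.log(n) / math.log(1 / p2)))
    L = math.ceil(n ** rho)
    return rho, k, L


def query_cost_estimate(n: int, d: int, p1: float, p2: float) -> int:
    """Rough operation count under an assumed cost model: each of the L tables
    costs k coordinate reads to hash q, plus about one far-point distance check
    of cost d (the choice of k makes n * p2**k <= 1)."""
    _, k, L = lsh_parameters(n, p1, p2)
    return L * (k + d)
```

For $n = 10^6$ and the bit-sampling numbers above, $\rho \approx 0.487$, so a few hundred tables suffice instead of a linear scan over a million points.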

11 NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni’04]
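A minimal sketch of a Euclidean hash in the style of [Datar-Immorlica-Indyk-Mirrokni’04]: project onto a random Gaussian direction $a$, shift by $b$ drawn uniformly from $[0, w)$, and cut the line into intervals of width $w$, i.e. $h(p) = \lfloor (a \cdot p + b)/w \rfloor$. The class name, the helper that measures collision rates, and the default values of `d`, `w` and `trials` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


class PStableHash:
    """One hash h(p) = floor((a . p + b) / w); a has i.i.d. N(0, 1) entries, b ~ U[0, w)."""

    def __init__(self, d: int, w: float, rng: random.Random) -> None:
        self.a: List[float] = [rng.gauss(0.0, 1.0) for _ in range(d)]
        self.b: float = rng.uniform(0.0, w)
        self.w = w

    def __call__(self, p: Sequence[float]) -> int:
        proj = sum(ai * pi for ai, pi in zip(self.a, p))
        return math.floor((proj + self.b) / self.w)


def empirical_collision_rate(dist: float, d: int = 20, w: float = 4.0,
                             trials: int = 2000, seed: int = 0) -> float:
    """Hypothetical helper: fraction of random hashes under which two points
    at Euclidean distance `dist` land in the same interval."""
    rng = random.Random(seed)
    p = [0.0] * d
    q = [dist] + [0.0] * (d - 1)  # q lies at distance `dist` from p along axis 0
    hits = 0
    for _ in range(trials):
        h = PStableHash(d, w, rng)
        hits += h(p) == h(q)
    return hits / trials
```

Because $a \cdot (p - q)$ is Gaussian with standard deviation $\|p - q\|$, close pairs usually fall in the same interval and far pairs rarely do, which is exactly the locality-sensitive property.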

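12 Optimal* LSH [A-Indyk’06]
Regular grid → grid of balls: a point p can hit empty space, so take more such grids until p is in a ball. In high dimension this needs (too) many grids of balls, so start by projecting to dimension $t$. Analysis gives an exponent $\rho$ that tends to $1/c^2$ as $t$ grows. Choice of the reduced dimension $t$? It trades off the number of hash tables, $n^{\rho}$, against the time to hash, $t^{O(t)}$. Total query time: $d\, n^{1/c^2+o(1)}$.
[Figure: 2D grid of balls containing p; projection of p to $\mathbb{R}^t$]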
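The grid-of-balls idea can be sketched as follows, assuming the construction of balls of radius $w$ centered on a randomly shifted grid with spacing $4w$ (so the balls are disjoint). The point is projected to $\mathbb{R}^t$ with a Gaussian matrix, and the hash is the first (grid, cell) whose ball contains it. The class name, the parameter `U` (number of grids tried) and the `None` fallback are illustrative assumptions, not the paper's exact procedure.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


class BallCarvingHash:
    """Hypothetical sketch: project to R^t, then try U shifted grids of balls."""

    def __init__(self, d: int, t: int, w: float, U: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # t x d Gaussian projection matrix (dimension reduction to R^t).
        self.A: List[List[float]] = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(t)]
        # One random shift per grid; the grid spacing is 4w, ball radius is w.
        self.shifts: List[List[float]] = [[rng.uniform(0.0, 4 * w) for _ in range(t)] for _ in range(U)]
        self.w = w

    def project(self, p: Sequence[float]) -> List[float]:
        return [sum(aij * pj for aij, pj in zip(row, p)) for row in self.A]

    def __call__(self, p: Sequence[float]) -> Optional[Tuple[int, Tuple[int, ...]]]:
        x = self.project(p)
        side = 4 * self.w
        for u, shift in enumerate(self.shifts):
            # Cell of the shifted grid containing x; its ball sits at the cell's center.
            cell = tuple(math.floor((xi + si) / side) for xi, si in zip(x, shift))
            center = [(ci + 0.5) * side - si for ci, si in zip(cell, shift)]
            if math.dist(x, center) <= self.w:
                return (u, cell)
        return None  # x fell into empty space in every grid tried
```

Each grid covers only a $\mathrm{vol}(B_t(w))/(4w)^t$ fraction of $\mathbb{R}^t$, which shrinks exponentially in $t$; this is why the number of grids (and the hashing time) grows so quickly with $t$.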

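13 Proof idea
Claim: $\rho = \log(1/P(1)) / \log(1/P(c)) \approx 1/c^2$, where $P(r)$ is the probability of collision when $\|p-q\| = r$.
Intuitive proof: the projection approximately preserves distances [JL]. $P(r)$ = intersection / union of the balls around p and q, and $P(r) \approx$ the probability that a random point u of the ball lies beyond the dashed line. Fact (high dimensions): the x-coordinate of u has a nearly Gaussian distribution, so $P(r) \approx \exp(-A \cdot r^2)$.
[Figure: balls around p and q at distance r along the x-axis; random point u beyond a dashed line]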
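Plugging the Gaussian-tail estimate into $\rho$ shows where $1/c^2$ comes from. This is a heuristic sketch that sets the near distance to $r = 1$ and ignores the lower-order terms absorbed by the $o(1)$:

```latex
P(r) \approx e^{-A r^{2}}
\;\Longrightarrow\;
\rho = \frac{\log\big(1/P(1)\big)}{\log\big(1/P(c)\big)}
\approx \frac{A \cdot 1^{2}}{A \cdot c^{2}} = \frac{1}{c^{2}} .
```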

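14 LSH Zoo
Example: the documents “To be or not to be” and “To Simons or not to Simons”. Over the vocabulary (be, not, or, Simons, to) they become the word-count vectors …21102… and …01122…, the 0/1 vectors …11101… and …01111…, or the sets {be,not,or,to} and {not,or,to,Simons}.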
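For the set representation, one classic member of the zoo is min-hash, chosen here as an illustration (the slide's own hash functions are not recoverable). Under a random ordering of the vocabulary, two sets share the same first element with probability equal to their Jaccard similarity $|A \cap B| / |A \cup B|$, which for the two example sets is $3/5$. The function names and the number of hashes are assumptions.

```python
import random
from typing import Hashable, List, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """|A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def minhash_signature(s: Set[str], vocab: List[str], num_hashes: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: for each random permutation of vocab, keep the first word of s.
    Using the same seed for every set gives every set the same permutations."""
    rng = random.Random(seed)
    sig = []
    for _ in range(num_hashes):
        order = vocab[:]
        rng.shuffle(order)
        sig.append(next(word for word in order if word in s))
    return sig


doc1 = {"be", "not", "or", "to"}
doc2 = {"not", "or", "to", "Simons"}
vocab = sorted(doc1 | doc2)

sig1 = minhash_signature(doc1, vocab, num_hashes=500)
sig2 = minhash_signature(doc2, vocab, num_hashes=500)
# Fraction of permutations on which the two min-hashes agree estimates the Jaccard similarity.
estimate = sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

Here `jaccard(doc1, doc2)` is exactly 0.6, and `estimate` should land close to it.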

15 LSH in the wild [Figure labels: “safety not guaranteed”, “fewer false positives”, “fewer tables”]

16 Time-Space Trade-offs
[Table: Space vs. query Time (levels low / medium / high), with Comment and Reference columns. References in the table: [KOR’98, IM’98, Pan’06], [AIP’06], [IM’98], [DIIM’04, AI’06], [AI’06], [Ind’01, Pan’06], [MNP’06, OWZ’11], [PTW’08, PTW’10]. Comments include “ω(1) memory lookups” (twice) and “1 mem lookup”.]

17 LSH is tight… leave the rest to cell-probe lower bounds?

18 Data-dependent Hashing! [A-Indyk-Nguyen-Razenshteyn’14]

19 A look at LSH lower bounds [O’Donnell-Wu-Zhou’11]

20 Why not NNS lower bound?

21 Intuition

22 Nice Configuration: “sparsity”

23 Reduction: into spherical LSH

24 Two-level algorithm

25 Details
Inside a bucket, we need to ensure the “sparse” case:
1) drop all “far pairs”
2) find the minimum enclosing ball (MEB)
3) partition by “sparsity” (distance from the center)
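Steps 2 and 3 can be sketched as below. This is an illustration under assumptions, not the algorithm from [A-Indyk-Nguyen-Razenshteyn’14]: the MEB center is approximated with the Bădoiu–Clarkson iteration (swapped in here; it repeatedly steps toward the current farthest point), "sparsity" is read as equal-width shells of distance from that center, and step 1 is omitted because its distance threshold is not given on the slide. Function names and `num_shells` are hypothetical.

```python
import math
from typing import List, Sequence


def approx_meb_center(points: Sequence[Sequence[float]], iters: int = 200) -> List[float]:
    """Step 2 (approximate): Badoiu-Clarkson iteration c <- c + (far - c) / (i + 1),
    where far is the point currently farthest from c."""
    c = list(points[0])
    for i in range(1, iters + 1):
        far = max(points, key=lambda p: math.dist(c, p))
        c = [ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far)]
    return c


def partition_by_distance(points: Sequence[Sequence[float]], center: Sequence[float],
                          num_shells: int) -> List[List[Sequence[float]]]:
    """Step 3 (one reading of 'partition by sparsity'): group points into
    num_shells equal-width shells by their distance from the center."""
    dists = [math.dist(center, p) for p in points]
    radius = max(dists)
    shells: List[List[Sequence[float]]] = [[] for _ in range(num_shells)]
    for p, dist in zip(points, dists):
        idx = 0 if radius == 0 else min(num_shells - 1, int(num_shells * dist / radius))
        shells[idx].append(p)
    return shells
```

After $1/\varepsilon^2$ iterations the Bădoiu–Clarkson center gives a ball within a $(1+\varepsilon)$ factor of the optimal radius, which is usually enough when the center is only used to measure distances.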

26 1) Far points

27 2) Minimum Enclosing Ball

28 3) Partition by “sparsity”

29 Practice of NNS
Data-dependent partitions…
Practice: trees (kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…), often with no guarantees.
Theory? Assuming more about the data, PCA-like algorithms “work” [Abdullah-A-Kannan-Krauthgamer’14].

30 Finale

31 Open question: $\Pr[\text{needle of length } 1 \text{ is not cut}] \ge \Pr[\text{needle of length } c \text{ is not cut}]^{1/c^2}$

