
1 Reciprocal Hash Tables for Nearest Neighbor Search
Xianglong Liu 1, Junfeng He 2,3, and Bo Lang 1
1 Beihang University, Beijing, China
2 Columbia University, New York, NY, USA
3 Facebook, Menlo Park, CA, USA
© 2009 IBM Corporation, IBM Research

2 Outline
Introduction
– Nearest Neighbor Search
– Motivation
Reciprocal Hash Tables
– Formulation
– Solutions
Experiments
Conclusion

3 Introduction: Nearest Neighbor Search (1)

4 Introduction: Nearest Neighbor Search (2)
Hash based nearest neighbor search
– Locality sensitive hashing [Indyk and Motwani, 1998]: close points in the original space have similar hash codes
[Figure: points x1, ..., x5 hashed by functions h1, h2, ..., hk into binary codes 010…, 100…, 111…, 001…, 110…]
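
The locality-sensitive property can be illustrated with a minimal sketch. It assumes one common LSH family, random-hyperplane (sign) hashing; the slide does not say which family is used, and the helper names make_hyperplanes, hash_code and hamming are hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its positive side, else 0."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) > 0 else 0
        for w in hyperplanes
    )

def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(p != q for p, q in zip(a, b))

# Toy data (assumed for illustration): a point, a close point, and its mirror image.
planes = make_hyperplanes(num_bits=32, dim=3, seed=0)
x = [1.0, 2.0, 3.0]
near = [1.01, 2.0, 2.99]
far = [-1.0, -2.0, -3.0]

code_x = hash_code(x, planes)
code_near = hash_code(near, planes)
code_far = hash_code(far, planes)
print(hamming(code_x, code_near), hamming(code_x, code_far))
```

With sign hashing, the mirrored point flips every bit, while a point at a small angle to x is expected to differ in only a few bits.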

5 Introduction: Nearest Neighbor Search (3)
Hash based nearest neighbor search
– Compressed storage: binary codes
– Efficient computations: hash table lookup or Hamming distance ranking based on binary operations
[Figure: projections up to wk produce bits 1 or 0/-1; labels: Hashing, Hash Table, Bucket, Indexed Image]
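
The two search modes named above can be sketched as follows. This is a toy illustration, assuming codes are stored as Python integers; the 4-bit codes and the function names are made up for the example.

```python
from collections import defaultdict

def hamming(a, b):
    """Hamming distance between two integer codes via XOR and a bit count."""
    return bin(a ^ b).count("1")

def build_table(codes):
    """Map each binary code (bucket key) to the indices of items stored there."""
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[code].append(idx)
    return table

def lookup(table, query_code):
    """Hash table lookup: return the items in the query's bucket only."""
    return list(table.get(query_code, []))

def hamming_rank(codes, query_code):
    """Hamming ranking: order all items by distance to the query code."""
    return sorted(range(len(codes)), key=lambda i: hamming(codes[i], query_code))

# Toy 4-bit codes (assumed for illustration).
codes = [0b0101, 0b1001, 0b1110, 0b0011, 0b1100, 0b0101]
table = build_table(codes)
print(lookup(table, 0b0101))
print(hamming_rank(codes, 0b0101))
```

Lookup touches a single bucket and may return nothing, whereas ranking scans every code but always yields an ordering.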

6 Introduction: Motivation
Problems
– Build multiple hash tables and probe multiple buckets to improve the search performance [Gionis, Indyk, and Motwani, 1999; Lv et al. 2007]
– Few studies address a general strategy for multiple hash table construction
– Random selection: the widely used general strategy, which usually needs a large number of hash tables
Motivation
– Similar to the well-studied feature selection problem: select the most informative and independent hash functions
– Support various types of hashing algorithms, different data sets and scenarios, etc.
[Figure: search results]
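
The random selection baseline can be sketched roughly as below. The sketch assumes each point already has one bit per pooled hash function; the bit matrix, the pool size and all function names are illustrative assumptions, not details from the slides.

```python
import random
from collections import defaultdict

def random_tables(pool_size, num_tables, bits_per_table, seed=0):
    """For each table, pick bits_per_table hash functions at random from the pool."""
    rng = random.Random(seed)
    return [rng.sample(range(pool_size), bits_per_table) for _ in range(num_tables)]

def build(bits, selections):
    """Build one hash table per selection; the key is the tuple of selected bits."""
    tables = []
    for sel in selections:
        table = defaultdict(list)
        for idx, row in enumerate(bits):
            table[tuple(row[j] for j in sel)].append(idx)
        tables.append(table)
    return tables

def probe(tables, selections, query_bits):
    """Probe the query's bucket in every table and merge the candidates."""
    found = set()
    for table, sel in zip(tables, selections):
        found.update(table.get(tuple(query_bits[j] for j in sel), []))
    return found

# Toy pool of 6 hash functions; rows are per-point bits (assumed for illustration).
bits = [
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
]
selections = random_tables(pool_size=6, num_tables=3, bits_per_table=2)
tables = build(bits, selections)
print(probe(tables, selections, bits[0]))
```

Because the functions are drawn blindly, different tables may end up redundant, which is one reason this baseline tends to need many tables.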

7 Reciprocal Hash Tables: Formulation (1)

8 Reciprocal Hash Tables: Formulation (2)

9 Reciprocal Hash Tables: Formulation (3)

10 Reciprocal Hash Tables: Solutions (1)

11 Reciprocal Hash Tables: Solutions (2)

12 Reciprocal Hash Tables: Solutions (3)

13 Sequential Strategy: Boosting
Boosting style: try to correct the previous mistakes by updating weights on neighbor pairs in each round
[Figure: three panels over points x1, ..., x7 and x_l1, x_l2, x_l3, ...: similarities (> 0, < 0, = 0), prediction error, updated similarities]
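
The idea of reweighting pairs after each round can be sketched as follows. The slide text does not give the update rule, so this uses a generic AdaBoost-style exponential update as an assumption; the labels (+1 neighbor, -1 non-neighbor, 0 unlabeled), the prediction convention (+1 if the current table puts the pair in the same bucket, -1 otherwise) and the parameter alpha are all hypothetical.

```python
import math

def update_pair_weights(weights, labels, predictions, alpha=0.5):
    """Raise the weight of mispredicted pairs and lower it for correct ones.

    Generic exponential reweighting (an assumption, not the slide's formula):
    w <- w * exp(-alpha * label * prediction), followed by normalization.
    """
    updated = {}
    for pair, w in weights.items():
        margin = labels[pair] * predictions[pair]  # > 0 correct, < 0 wrong, 0 unlabeled
        updated[pair] = w * math.exp(-alpha * margin)
    total = sum(updated.values())
    return {pair: w / total for pair, w in updated.items()}

# Toy pairs (assumed for illustration).
weights = {(0, 1): 0.25, (0, 2): 0.25, (1, 2): 0.25, (2, 3): 0.25}
labels = {(0, 1): 1, (0, 2): 1, (1, 2): -1, (2, 3): 0}
predictions = {(0, 1): 1, (0, 2): -1, (1, 2): -1, (2, 3): 1}

new_weights = update_pair_weights(weights, labels, predictions)
print(new_weights)
```

In this toy run the neighbor pair (0, 2), which the current table separates, ends up with the largest weight, so the next table is pushed to fix that mistake.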

14 Reciprocal Hash Tables: Solutions (4)

15 Experiments
Datasets
– SIFT-1M: 1 million 128-D SIFT descriptors
– GIST-1M: 1 million 960-D GIST descriptors
Baselines
– Random selection
Setting
– 10,000 training samples and 1,000 queries on each set
– 100 neighbors and 200 non-neighbors for each training sample
– The ground truth for each query is defined as the top 5 nearest neighbors based on Euclidean distances
– Average performance of 10 independent runs
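
The ground-truth definition above can be sketched on toy data. The brute-force search, the precision helper, and the choice to score an empty result as 0.0 are assumptions made for illustration; the slides do not describe the evaluation code.

```python
def euclidean_sq(a, b):
    """Squared Euclidean distance (same ordering as the true distance)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def ground_truth(query, database, k=5):
    """Indices of the k nearest database points to the query, by brute force."""
    order = sorted(range(len(database)), key=lambda i: euclidean_sq(query, database[i]))
    return order[:k]

def precision(retrieved, truth):
    """Fraction of retrieved items that are true neighbors (0.0 if nothing retrieved)."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(truth)) / len(retrieved)

# Toy 1-D database (assumed for illustration).
database = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
query = [0.2]
truth = ground_truth(query, database, k=5)
print(truth, precision([0, 1, 6], truth))
```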

16 Experiments: Over Basic Hashing Algorithms (1)
Hash Lookup Evaluation
– The precision of RAND decreases dramatically with more hash tables, while DHF and RDHF first increase their performance and attain significant gains over RAND
– Both methods reliably improve the performance over RAND in terms of hash lookup

17 Experiments: Over Basic Hashing Algorithms (2)
Hamming Ranking Evaluation
– DHF and RDHF achieve the best performance over LSH, KLSH and RMMH in most cases
– RDHF gains significant performance improvements over DHF

18 Experiments: Over Multiple Hashing Algorithms
– Build multiple hash tables using different hashing algorithms with different settings, because many hashing algorithms cannot be used directly to construct multiple tables due to the upper limit on the number of hash functions
– Double bit (DB) quantization [Liu et al. 2011] on PCA-based Random Rotation Hashing (PCARDB) and Iterative Quantization (ITQDB) [Gong and Lazebnik 2011]

19 Conclusion
Summary and contributions
– A unified strategy for hash table construction supporting different hashing algorithms and various scenarios
– Two important selection criteria for hashing performance
– Formalize the selection as the dominant set problem in a vertex- and edge-weighted graph representing all pooled hash functions
– A reciprocal strategy based on boosting to reduce the redundancy between hash tables
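
A dominant set in a weighted graph is commonly approximated with replicator dynamics. The sketch below shows that standard technique on a toy, edge-weighted affinity matrix; how the vertex weights of the pooled hash functions enter the matrix is not stated in the slide text, so the matrix, the threshold and the function names are assumptions.

```python
def replicator_dynamics(A, iters=1000, tol=1e-9):
    """Iterate x_i <- x_i * (A x)_i / (x^T A x) from the uniform distribution."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x[i] * Ax[i] for i in range(n))
        if total == 0:
            break
        new_x = [x[i] * Ax[i] / total for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x

def dominant_set(A, threshold=1e-4):
    """Vertices that keep non-negligible mass after the dynamics settle."""
    x = replicator_dynamics(A)
    return [i for i, v in enumerate(x) if v > threshold]

# Toy symmetric affinity matrix with zero diagonal (assumed for illustration):
# vertices 0, 1, 2 are mutually coherent, vertex 3 is only weakly tied to them.
A = [
    [0.0, 0.9, 0.9, 0.1],
    [0.9, 0.0, 0.9, 0.1],
    [0.9, 0.9, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
print(dominant_set(A))
```

Here the weakly connected vertex loses its mass and only the coherent group survives; in the selection setting, the surviving vertices would correspond to the hash functions chosen for one table.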

20 Thank you!
