Download presentation
Presentation is loading. Please wait.
Published byDarren Fox Modified over 9 years ago
1
Dictionary search Exact string search Paper on Cuckoo Hashing
2
Exact String Search Given a dictionary D of K strings, of total length N, store them in a way that we can efficiently support searches for a pattern P over them. Hashing
3
Hashing with chaining
4
Key issue: a good hash function Basic assumption: Uniform hashing Avg #keys per slot = n * (1/m) = n/m = (load factor)
5
Search cost m = (n)
6
In practice A trivial hash function is: prime
7
A “provably good” hash is Each a i is selected at random in [0,m) k0k0 k1k1 k2k2 krkr ≈log 2 m r ≈ L / log 2 m a0a0 a1a1 a2a2 arar K a prime l = max string len m = table size not necessarily: (...mod p) mod m
8
Cuckoo Hashing ABC ED 2 hash tables, and 2 random choices where an item can be stored
9
ABC ED F A running example
10
ABFC ED
11
ABFC ED G
12
EGBFC AD
13
Cuckoo Hashing Examples ABC ED F G Random (bipartite) graph: node=cell, edge=key
14
Natural Extensions More than 2 hashes (choices) per key. Very different: hypergraphs instead of graphs. Higher memory utilization 3 choices : 90+% in experiments 4 choices : about 97% 2 hashes + bins of B-size. Balanced allocation and tightly O(1)-size bins Insertion sees a tree of possible evict+ins paths but more insert time (and random access) more memory...but more local
15
Dictionary search Making one-side errors Paper on Bloom Filter
16
Crawling How to keep track of the URLs visited by a crawler? URLs are long Check should be very fast No care about small errors (≈ page not crawled) Bloom Filter over crawled URLs
17
Searching with errors...
20
Problem: false positives
21
TTT 2
22
Not perfectly true but...
23
m/n = 8 Opt k = 5.45... We do have an explicit formula for the optimal k
26
Dictionary search Prefix-string search Reading 3.1 and 5.2
27
Prefix-string Search Given a dictionary D of K strings, of total length N, store them in a way that we can efficiently support prefix searches for a pattern P over them.
28
Trie: speeding-up searches 1 2 2 0 4 5 6 7 2 3 y s 1 z stile zyg 5 etic ial ygy aibelyite czecin omo Pro: O(p) search time Cons: edge + node labels and tree structure
29
Front-coding: squeezing strings http://checkmate.com/All_Natural/ http://checkmate.com/All_Natural/Applied.html http://checkmate.com/All_Natural/Aroma.html http://checkmate.com/All_Natural/Aroma1.html http://checkmate.com/All_Natural/Aromatic_Art.html http://checkmate.com/All_Natural/Ayate.html http://checkmate.com/All_Natural/Ayer_Soap.html http://checkmate.com/All_Natural/Ayurvedic_Soap.html http://checkmate.com/All_Natural/Bath_Salt_Bulk.html http://checkmate.com/All_Natural/Bath_Salts.html http://checkmate.com/All/Essence_Oils.html http://checkmate.com/All/Mineral_Bath_Crystals.html http://checkmate.com/All/Mineral_Bath_Salt.html http://checkmate.com/All/Mineral_Cream.html http://checkmate.com/All/Natural/Washcloth.html... 0 http://checkmate.com/All_Natural/ 33 Applied.html 34 roma.html 38 1.html 38 tic_Art.html 34 yate.html 35 er_Soap.html 35 urvedic_Soap.html 33 Bath_Salt_Bulk.html 42 s.html 25 Essence_Oils.html 25 Mineral_Bath_Crystals.html 38 Salt.html 33 Cream.html 33 45% 0 http://checkmate.com/All/Natural/Washcloth.html... ….systile syzygetic syzygial syzygy…. 2 55 Gzip may be much better...
30
….70systile 92zygeti c85ial 65y 110szaibelyite 82czecin92omo…. systile szaielyite CT on a sample 2-level indexing Disk Internal Memory A disadvantage: Trade-off ≈ speed vs space (because of bucket size) 2 advantages: Search ≈ typically 1 I/O Space ≈ Front-coding over buckets
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.