Dictionary search Exact string search Paper on Cuckoo Hashing.


1 Dictionary search Exact string search Paper on Cuckoo Hashing

2 Exact String Search: Given a dictionary D of K strings, of total length N, store them so that we can efficiently support searches for a pattern P over them. Approach: hashing.

3 Hashing with chaining

4 Key issue: a good hash function. Basic assumption: uniform hashing. Avg #keys per slot = n * (1/m) = n/m = α (load factor)

5 Search cost: with chaining and uniform hashing, a search takes O(1 + α) time on average, which is O(1) when m = Θ(n)
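The chaining scheme of slides 3-5 can be sketched as follows. This is a minimal illustration, not the course's implementation: the class name `ChainedHashTable` and its methods are hypothetical, and Python's built-in `hash` merely stands in for a uniform hash function.

```python
class ChainedHashTable:
    """Hash table with chaining: each of the m slots holds a list of keys."""

    def __init__(self, m):
        self.m = m                               # number of slots
        self.slots = [[] for _ in range(m)]
        self.n = 0                               # number of stored keys

    def _slot(self, key):
        # Assumption: the built-in hash behaves like a uniform hash function.
        return hash(key) % self.m

    def insert(self, key):
        chain = self.slots[self._slot(key)]
        if key not in chain:                     # keep each key once
            chain.append(key)
            self.n += 1

    def search(self, key):
        # Cost is O(1 + chain length), i.e. O(1 + alpha) on average.
        return key in self.slots[self._slot(key)]

    def load_factor(self):
        return self.n / self.m                   # alpha = n / m


table = ChainedHashTable(8)
for word in ["systile", "syzygy", "systile"]:
    table.insert(word)
print(table.search("syzygy"), table.load_factor())
```

Keeping m proportional to n (for example by doubling the table when the load factor exceeds a threshold) is what keeps the average chain length constant.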

6 In practice: a trivial hash function takes the key modulo a prime, e.g. h(k) = k mod p with p prime

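7 A “provably good” hash is $h_a(k) = \left(\sum_{i=0}^{r} a_i k_i\right) \bmod m$. The key k is split into chunks $k_0, k_1, k_2, \dots, k_r$ of ≈ $\log_2 m$ bits each, so r ≈ L / $\log_2 m$, where L = max string length. Each $a_i$ is selected at random in [0,m), giving $a = (a_0, a_1, a_2, \dots, a_r)$. m = table size, a prime (not necessarily: use (... mod p) mod m)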
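A minimal sketch of this chunk-and-multiply hash for byte strings, under stated assumptions: `make_universal_hash` and its parameters are hypothetical names, m is assumed prime, chunks are made slightly shorter than log2 m bits so every chunk value stays below m, and a sentinel byte is prepended so that keys of different lengths map to different chunk sequences.

```python
import random


def make_universal_hash(m, max_len_bytes, seed=None):
    """Return h(key) = (sum_i a_i * k_i) mod m for keys of bounded length."""
    rng = random.Random(seed)
    chunk_bits = max(1, m.bit_length() - 1)      # chunk values are < 2**chunk_bits <= m
    total_bits = 8 * (max_len_bytes + 1)         # +1 byte for the sentinel
    r = -(-total_bits // chunk_bits)             # number of chunks (ceiling division)
    a = [rng.randrange(m) for _ in range(r)]     # each a_i random in [0, m)
    mask = (1 << chunk_bits) - 1

    def h(key: bytes) -> int:
        if len(key) > max_len_bytes:
            raise ValueError("key longer than max_len_bytes")
        # Assumption: a leading 0x01 byte keeps b"a" and b"\x00a" distinct.
        x = int.from_bytes(b"\x01" + key, "big")
        total = 0
        for a_i in a:
            total += a_i * (x & mask)            # a_i * k_i
            x >>= chunk_bits                     # move to the next chunk
        return total % m

    return h


h = make_universal_hash(m=101, max_len_bytes=16, seed=1)
print(h(b"systile"), h(b"syzygy"))
```

Drawing a fresh vector a (a new seed) gives a different function from the same family, which is what the probabilistic guarantee relies on.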

8 Cuckoo Hashing: 2 hash tables, and 2 random choices where an item can be stored. [Figure: cells holding keys A, B, C, E, D]

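9 A running example. [Figure: cells hold A B C | E D; key F is to be inserted]

10 [Figure: cells hold A B F C | E D]

11 [Figure: cells hold A B F C | E D; key G is to be inserted]

12 [Figure: cells hold E G B F C | A D]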
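The evict-and-reinsert behaviour of the running example can be sketched as below. This is an illustrative sketch, not the construction from the paper: the class name, the `max_kicks` bound, the salted use of Python's `hash` as the two hash functions, and the doubling rehash are all assumptions.

```python
import random


class CuckooHashTable:
    """Two tables; every key has one candidate cell in each (None is not a valid key)."""

    def __init__(self, size, max_kicks=32, seed=0):
        self.size = size
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self.tables = [[None] * size, [None] * size]
        self._new_salts()

    def _new_salts(self):
        # Assumption: salting the built-in hash simulates two random hash functions.
        self.salts = [self.rng.getrandbits(64), self.rng.getrandbits(64)]

    def _pos(self, t, key):
        return hash((self.salts[t], key)) % self.size

    def contains(self, key):
        # At most two cells are probed.
        return any(self.tables[t][self._pos(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if self.contains(key):
            return
        t = 0
        for _ in range(self.max_kicks):
            i = self._pos(t, key)
            # Place the key and pick up whatever occupied the cell.
            key, self.tables[t][i] = self.tables[t][i], key
            if key is None:
                return                           # the cell was free
            t = 1 - t                            # evicted key tries its other table
        self._rehash(key)                        # too many evictions: rebuild

    def _rehash(self, pending):
        keys = [k for tab in self.tables for k in tab if k is not None]
        keys.append(pending)
        self.size *= 2
        self.tables = [[None] * self.size, [None] * self.size]
        self._new_salts()
        for k in keys:
            self.insert(k)


table = CuckooHashTable(size=4)
for key in "ABCDEFG":
    table.insert(key)
print(all(table.contains(k) for k in "ABCDEFG"))
```

Lookups stay at two probes in the worst case; the cost is pushed to insertions, which may trigger a chain of evictions or a full rebuild.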

13 Cuckoo Hashing Examples. Random (bipartite) graph: node = cell, edge = key. [Figure: cells holding A, B, C, E, D, with keys F and G to be inserted]

14 Natural Extensions. (a) More than 2 hashes (choices) per key: very different analysis (hypergraphs instead of graphs); higher memory utilization (3 choices: 90+% in experiments; 4 choices: about 97%), but more insert time (and random access). (b) 2 hashes + bins of size B: balanced allocation and tightly O(1)-size bins; insertion sees a tree of possible evict+insert paths; more memory... but more local.

15 Dictionary search: making one-sided errors. Paper on Bloom Filter

16 Crawling: How to keep track of the URLs visited by a crawler? URLs are long. The check should be very fast. Small errors are tolerable (≈ a page not crawled). Solution: Bloom Filter over crawled URLs
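A Bloom filter over crawled URLs could look like the sketch below. The class and method names are hypothetical, and deriving the k positions from SHA-256 of the item prefixed by the hash index is an assumption made for illustration, not something the slides prescribe.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits with k hash functions; no false negatives."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Assumption: k hash functions simulated by hashing "i:item" with SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str):
        # True may be a false positive; False is always correct.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


visited = BloomFilter(m=8 * 1000, k=5)
visited.add("http://checkmate.com/All_Natural/Aroma.html")
print("http://checkmate.com/All_Natural/Aroma.html" in visited)
```

For the crawler, a false positive means a page is wrongly considered already visited and is skipped, which is the small error the slide accepts.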

17 Searching with errors...

18

19

20 Problem: false positives


22 Not perfectly true but...

23 m/n = 8: Opt k = (m/n) ln 2 = 5.545... We do have an explicit formula for the optimal k
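As a worked version of the formula for the optimal k, under the usual approximation that bit positions are set independently (m bits, n keys, k hash functions):

\[
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\mathrm{opt}} = \frac{m}{n}\ln 2,
\qquad
p_{\min} \approx \left(\tfrac{1}{2}\right)^{k_{\mathrm{opt}}} \approx 0.6185^{\,m/n}.
\]

For m/n = 8 this gives

\[
k_{\mathrm{opt}} = 8 \ln 2 \approx 5.545,
\qquad
p_{\min} \approx 0.6185^{8} \approx 0.0214 .
\]

Since k must be an integer, one would pick k = 5 or k = 6; both give a false-positive rate of about 0.0217 and 0.0216 respectively under the same approximation.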

24

25

26 Dictionary search Prefix-string search Reading 3.1 and 5.2

27 Prefix-string Search Given a dictionary D of K strings, of total length N, store them in a way that we can efficiently support prefix searches for a pattern P over them.

28 Trie: speeding up searches. [Figure: compacted trie with edge labels s, y, z, stile, zyg, etic, ial, ygy, aibelyite, czecin, omo] Pro: O(p) search time. Cons: edge + node labels and tree structure
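A prefix search over a trie can be sketched as follows. For brevity this is an uncompacted trie with one character per edge (the slide's figure is a compacted one), and the `Trie` class with its method names is hypothetical.

```python
class Trie:
    """Uncompacted trie: nested dicts, one character per edge."""

    END = ""                                     # marker key: a string ends at this node

    def __init__(self):
        self.root = {}

    def insert(self, s):
        node = self.root
        for ch in s:
            node = node.setdefault(ch, {})
        node[Trie.END] = True

    def with_prefix(self, p):
        """Return all stored strings starting with p, in sorted order."""
        node = self.root
        for ch in p:                             # O(p) descent along the pattern
            if ch not in node:
                return []
            node = node[ch]
        out, stack = [], [(node, p)]
        while stack:                             # collect the whole subtree
            cur, s = stack.pop()
            for label, child in cur.items():
                if label == Trie.END:
                    out.append(s)
                else:
                    stack.append((child, s + label))
        return sorted(out)


trie = Trie()
for word in ["systile", "syzygetic", "syzygial", "syzygy", "szaibelyite"]:
    trie.insert(word)
print(trie.with_prefix("syz"))   # ['syzygetic', 'syzygial', 'syzygy']
```

The descent costs O(p) regardless of the dictionary size; the space overhead of nodes and labels is the drawback the slide points out.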

29 Front-coding: squeezing strings

Sorted URLs:
http://checkmate.com/All_Natural/
http://checkmate.com/All_Natural/Applied.html
http://checkmate.com/All_Natural/Aroma.html
http://checkmate.com/All_Natural/Aroma1.html
http://checkmate.com/All_Natural/Aromatic_Art.html
http://checkmate.com/All_Natural/Ayate.html
http://checkmate.com/All_Natural/Ayer_Soap.html
http://checkmate.com/All_Natural/Ayurvedic_Soap.html
http://checkmate.com/All_Natural/Bath_Salt_Bulk.html
http://checkmate.com/All_Natural/Bath_Salts.html
http://checkmate.com/All/Essence_Oils.html
http://checkmate.com/All/Mineral_Bath_Crystals.html
http://checkmate.com/All/Mineral_Bath_Salt.html
http://checkmate.com/All/Mineral_Cream.html
http://checkmate.com/All/Natural/Washcloth.html...

Front-coded (length of the prefix shared with the previous string, then the remaining suffix):
0 http://checkmate.com/All_Natural/
33 Applied.html
34 roma.html
38 1.html
38 tic_Art.html
34 yate.html
35 er_Soap.html
35 urvedic_Soap.html
33 Bath_Salt_Bulk.html
42 s.html
24 /Essence_Oils.html
25 Mineral_Bath_Crystals.html
38 Salt.html
33 Cream.html
0 http://checkmate.com/All/Natural/Washcloth.html...

Slide annotation: 33 → 45%

On dictionary words: ….systile syzygetic syzygial syzygy…. share prefixes of length 2, 5, 5 with their predecessors.

Gzip may be much better...
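Front-coding itself is short enough to sketch directly. The function names `front_encode` and `front_decode` are hypothetical; the input is assumed to be already sorted, and each string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix.

```python
def front_encode(sorted_strings):
    """Encode each string as (shared-prefix length with the previous one, suffix)."""
    out = []
    prev = ""
    for s in sorted_strings:
        lcp = 0
        limit = min(len(prev), len(s))
        while lcp < limit and prev[lcp] == s[lcp]:
            lcp += 1
        out.append((lcp, s[lcp:]))
        prev = s
    return out


def front_decode(pairs):
    """Rebuild the strings by scanning the pairs from the start."""
    out = []
    prev = ""
    for lcp, suffix in pairs:
        s = prev[:lcp] + suffix
        out.append(s)
        prev = s
    return out


words = ["systile", "syzygetic", "syzygial", "syzygy"]
encoded = front_encode(words)
print(encoded)   # [(0, 'systile'), (2, 'zygetic'), (5, 'ial'), (5, 'y')]
assert front_decode(encoded) == words
```

Decoding a string requires scanning from the last fully stored string, which is why the strings are usually grouped into buckets whose first entry is stored uncompressed.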

30 2-level indexing. Disk: the front-coded strings, split into buckets: ….70systile 92zygetic 85ial 65y 110szaibelyite 82czecin 92omo…. Internal memory: CT on a sample (systile, szaibelyite). 2 advantages: search ≈ typically 1 I/O; space ≈ front-coding over buckets. A disadvantage: trade-off ≈ speed vs space (because of bucket size)
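The two-level idea can be illustrated with a small sketch. Several simplifications are assumptions here: the in-memory structure is a sorted list searched with `bisect` rather than a trie, buckets are plain Python lists rather than front-coded disk blocks, and the function names are hypothetical.

```python
from bisect import bisect_right


def build_index(sorted_strings, bucket_size):
    """Split the sorted dictionary into buckets and sample each bucket's first string."""
    buckets = [sorted_strings[i:i + bucket_size]
               for i in range(0, len(sorted_strings), bucket_size)]
    sample = [bucket[0] for bucket in buckets]   # kept in internal memory
    return sample, buckets


def lookup(sample, buckets, pattern):
    """Exact search: locate the bucket in memory, then scan that single bucket."""
    i = bisect_right(sample, pattern) - 1        # last bucket whose first string <= pattern
    if i < 0:
        return False                             # pattern precedes every stored string
    return pattern in buckets[i]                 # stands in for the single disk read


words = ["systile", "syzygetic", "syzygial", "syzygy", "szaibelyite"]
sample, buckets = build_index(words, bucket_size=2)
print(sample)                                    # ['systile', 'syzygial', 'szaibelyite']
print(lookup(sample, buckets, "syzygy"))         # True
```

Larger buckets shrink the in-memory sample and improve compression, but make the scan inside a bucket longer, which is the speed-versus-space trade-off noted on the slide.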

