An Approach to Generalized Hashing
Michael Klipper, with Dan Blandford and Guy Blelloch

Hashing techniques currently available
- Many hashing algorithms exist: separate chaining, cuckoo hashing, FKS perfect hashing.
- Many hash functions have also been designed, including several universal families.
- Good: O(1) expected amortized time for updates, and many have O(1) worst-case time for searches.
- Bad: they require fixed-length keys and fixed-length data.

What’s so bad about fixed length?
- It is easy to waste a lot of space: every hash bucket must be as large as the largest item to be stored in the table. This is a serious problem for sparsely filled tables, or for tables where large items occur infrequently.
- Hash tables are often building blocks of more complicated structures, so optimizing them pays off in many places.

Example: A Graph Layout Where We Store Edges in a Hash Table
Let u be a vertex of degree d with neighbors v_1, ..., v_d, and let v_0 = v_{d+1} = u by convention. Then the entry representing the edge (u, v_i) has key (u, v_i) and data (v_{i-1}, v_{i+1}).
[Figure: a vertex u of degree 4 with neighbors v_1 through v_4, shown next to the hash table entries for its edges. An extra entry “starts” the list.]
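The layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain `dict` stands in for the hash table, the names `build_edge_table` and `neighbors` are hypothetical, the graph has no self-loops, and the extra "start" entry is assumed to be keyed by (u, u) with data (v_d, v_1), which is a reading of the slide's figure rather than something the text states.

```python
def build_edge_table(adj):
    """Build the edge table from adjacency lists {u: [v_1, ..., v_d]}.

    Assumption: a dict stands in for the hash table, and the extra
    entry that "starts" each list is keyed by (u, u).
    """
    table = {}
    for u, nbrs in adj.items():
        if not nbrs:
            continue
        ring = [u] + list(nbrs) + [u]          # v_0 = v_{d+1} = u
        for i in range(1, len(nbrs) + 1):
            # key (u, v_i) -> data (v_{i-1}, v_{i+1})
            table[(u, ring[i])] = (ring[i - 1], ring[i + 1])
        # Assumed form of the extra entry: (last neighbor, first neighbor).
        table[(u, u)] = (nbrs[-1], nbrs[0])
    return table


def neighbors(table, u):
    """Walk the neighbor list of u by following the 'next' pointers."""
    if (u, u) not in table:
        return []
    out = []
    v = table[(u, u)][1]        # first neighbor v_1
    while v != u:               # v_{d+1} = u ends the walk
        out.append(v)
        v = table[(u, v)][1]    # move to v_{i+1}
    return out


adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
table = build_edge_table(adj)
print(neighbors(table, 0))
```

Each edge lookup and each step of the neighbor walk is a single hash table access, which is why the cost of the table itself matters so much for this layout.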

An Idea for Compression
- Instead of ((u, v_i), (v_{i-1}, v_{i+1})) in the table, we will store ((u, v_i - u), (v_{i-1} - u, v_{i+1} - u)).
- With this representation, we need O(kn) space, where k = Σ_{(u,v) ∈ E} log |u - v|.
- A good labeling of the vertices will make many of these differences small, but not all of them. The following paper has details:
  D. Blandford, G. E. Blelloch, and I. Kash. Compact Representations of Separable Graphs. In SODA, 2003, pages 342-351.
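As a small illustration of the difference encoding, the sketch below converts one table entry to and from its compressed form and evaluates the sum of log |u - v| over an edge list. The function names are hypothetical, base-2 logarithms are an assumption (the slide does not name the base), and edges are assumed to join distinct vertices.

```python
import math


def compress_entry(u, v, prev, nxt):
    """Store neighbor labels as differences from u."""
    return (u, v - u), (prev - u, nxt - u)


def decompress_entry(key, data):
    """Recover (u, v, prev, nxt) from a difference-encoded entry."""
    u, dv = key
    dprev, dnxt = data
    return u, u + dv, u + dprev, u + dnxt


def labeling_cost(edges):
    """Sum of log2 |u - v| over all edges (assumes u != v on every edge)."""
    return sum(math.log2(abs(u - v)) for u, v in edges)


key, data = compress_entry(100, 103, 98, 105)
print(key, data)
print(decompress_entry(key, data))

# The same path graph under two labelings: the second has larger differences.
print(labeling_cost([(0, 1), (1, 2), (2, 3)]))
print(labeling_cost([(0, 2), (2, 1), (1, 3)]))
```

The differences are small numbers only if neighboring vertices receive nearby labels, and small numbers pay off only if the table can store them in fewer bits, which is what motivates variable-length entries.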

First, a simpler problem
- Variable-length data stored in arrays.
- It’s like a hash table, except that the indices are now in the fixed range 0…n-1 for the n items in the array.
We’ll use the following data for our example in these slides:
(0, 10110) (1, 0110) (2, 11111) (3, 0101) (4, 1100) (5, 010) (6, 11011) (7, 00001111)
We’ll assume that the word size of the machine is 2 bytes.

Key Idea: BLOCKS
- Multiple data items can be crammed into a word, so let’s take advantage of that.
- Each block has two words: one holding the data, one marking off the separations between strings.
- If the first index in a block is i, we’ll label the block b_i.
[Figure: block b_0, which contains strings s_0 through s_2 from our example. One word holds the data bits 10110 0110 11111; the other holds the separator marks 1000010001000010.]
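A minimal sketch of packing strings into a block follows. It assumes a 16-bit word, represents each word as a Python string of '0'/'1' characters for readability, and sets a mark at the start of every string plus one mark just past the last string. The helper names are hypothetical, and requiring the data to leave room for the final mark is a simplification of this sketch, not something the slides state.

```python
W = 16  # word size in bits (2 bytes, as assumed in the example)


def pack_block(strings, w=W):
    """Pack nonempty bit strings into (data_word, mark_word).

    Assumption of this sketch: the total length is below w, so the
    mark just past the last string still fits in the word.
    """
    total = sum(len(s) for s in strings)
    if total >= w:
        raise ValueError("strings do not fit together with the end mark")
    data = "".join(strings).ljust(w, "0")   # pad unused bits with 0
    marks = ["0"] * w
    pos = 0
    for s in strings:
        marks[pos] = "1"                    # start of this string
        pos += len(s)
    marks[pos] = "1"                        # end of the last string
    return data, "".join(marks)


def unpack_block(data, marks):
    """Split the data word at the marked positions."""
    cuts = [i for i, bit in enumerate(marks) if bit == "1"]
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


data, marks = pack_block(["10110", "0110", "11111"])
print(data)
print(marks)
print(unpack_block(data, marks))
```

A real implementation would keep both words as machine integers and locate the j-th mark with word-level operations or lookup tables instead of scanning character by character.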

Organization of Blocks
- Index structure (a regular array): A[i] = 1 if and only if string i starts a block.
- Hash table (one of the regular kind): if string i starts a block, H(i) = address of b_i.
- Key size invariant: the sizes of two adjacent blocks (like b_0 and b_3 in the example) must sum to more than the word size of the machine.
- Note that it is easy to split and merge blocks.
[Figure: blocks b_0, b_3, and b_7, reached through H(0), H(3), and H(7); the index array A has a 1 at each index that starts a block.]
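The organization above can be sketched as follows, under several assumptions: a 16-bit word, a greedy left-to-right fill that starts a new block whenever the next string would overflow the word, a Python list standing in for each block, and a `dict` standing in for the regular hash table H. The names `build`, `lookup`, and `invariant_holds` are hypothetical, and the backward scan over A is a stand-in for the constant-time word operations a real implementation would use.

```python
W = 16  # word size in bits


def build(strings, w=W):
    """Greedily group strings into blocks; return (A, H).

    A[i] == 1 iff string i starts a block; H[i] holds block b_i
    (a list of strings here, standing in for a block address).
    """
    A = [0] * len(strings)
    H = {}
    start, used = None, 0
    for i, s in enumerate(strings):
        if start is None or used + len(s) > w:
            start, used = i, 0          # string i starts a new block
            A[i] = 1
            H[i] = []
        H[start].append(s)
        used += len(s)
    return A, H


def lookup(A, H, i):
    """Return string i: find its block start, then index into the block."""
    start = i
    while A[start] == 0:                # scan back to the block's first index
        start -= 1
    return H[start][i - start]


def invariant_holds(H, w=W):
    """Check that adjacent block sizes sum to more than the word size."""
    sizes = [sum(len(s) for s in H[k]) for k in sorted(H)]
    return all(a + b > w for a, b in zip(sizes, sizes[1:]))


data = ["10110", "0110", "11111", "0101", "1100", "010", "11011", "00001111"]
A, H = build(data)
print(A)
print(sorted(H))
print(lookup(A, H, 5), invariant_holds(H))
```

With the example data this greedy fill happens to produce the same blocks as the slides (starting at indices 0, 3, and 7). Greedy filling satisfies the invariant because a new block is opened only when its first string would not fit in the previous one.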

A Rough Look at Space and Time Bounds for this Array Structure
Let’s say we have n items, and w is the word size of the machine in bits. WLOG all data strings are nonempty. Let m = Σ_i |s_i|.
- Each block holds at most w strings, so consecutive 1s in A are at most w apart.
- Lookup tables cut the time for finding a block and the string inside it down to constant time, since they can operate on entire words at once.
- Indexing structures + hash table use O(w) bits per block.
- On average each block is at least half full due to the invariant, so there are O(m/w + 1) blocks.
- O(m + w) bits are used and operations take O(1) time!
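The block count can be worked out from the size invariant alone. The short derivation below is a sketch under the slide's assumptions; the symbols B (number of blocks) and |b_j| (number of data bits in the j-th block, in left-to-right order) are introduced here for illustration and do not appear in the slides.

```latex
% Invariant: adjacent blocks satisfy |b_j| + |b_{j+1}| > w.
% Pair the blocks as (1,2), (3,4), ...; there are \lfloor B/2 \rfloor pairs.
\begin{align*}
m = \sum_{j=1}^{B} |b_j|
  &\ge \sum_{j=1}^{\lfloor B/2 \rfloor} \bigl(|b_{2j-1}| + |b_{2j}|\bigr)
   \ge w \left\lfloor \frac{B}{2} \right\rfloor, \\
B &\le 2 \left\lfloor \frac{B}{2} \right\rfloor + 1
   \le \frac{2m}{w} + 1 = O\!\left(\frac{m}{w} + 1\right), \\
\text{total space} &= O(w) \cdot B = O(m + w) \text{ bits}.
\end{align*}
```

The last line uses the fact that each block costs O(w) bits in total: two words for the block itself plus O(w) bits of indexing and hash table overhead.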

Briefly, how we proceed from there
- We can finally implement our generalized hash table, using an array of the type we just described as the hash table.
- There are more details, which the following paper explains:
  D. Blandford and G. E. Blelloch. Storing Variable-Length Keys in Arrays, Sets, and Dictionaries, with Applications. In Symposium on Discrete Algorithms (SODA), 2005 (hopefully).

Great, but if there’s already a paper written on the subject, then what do I do?
- A lot of this code isn’t written yet. We haven’t yet checked that the code we have meets the theoretical bounds, since we have to make sure that any corner-cutting done for the programming is theoretically safe.
- My job is to get a lot of this running and to look for optimizations.
- Also, once this is running, we’ll want to run experiments to see how well it performs, especially in modeling graphs.
