# CPSC 335 Dr. Marina Gavrilova Computer Science University of Calgary Canada.


Slide 2: Outline

- Extendible hashing
- Expandable and dynamic hashing
- Virtual hashing
- Summary

Slide 3: Hash Functions for Extendible Hashing

- Standard hashing works on a file of fixed size.
- What if we add or delete many keys? What if the file size changes significantly?
- Then separate techniques are needed. There are two types:
  - Directory schemes
  - Directoryless schemes

Slide 4: Extendible Hashing

- Keys are stored in buckets.
- Each bucket can hold only a fixed number of items.
- The index is an extendible table; h(x) hashes a key value x to a bit map (a string of bits), and only a portion of the bit map is used to build the directory.

[Figure: example of adding a key k_n with h(k_n) = 11011. The table has entries 00, 01, 10, 11. Before the insertion the buckets are b00 (00011, 00110, 00101), b01 (01100, 01011) and b1 (10011, 11110, 11111). After the insertion b1 is replaced by b10 (10011) and b11 (11011, 11110, 11111).]

Slide 5: Hash Functions for Extendible Hashing

- Directory schemes
  - Extendible hashing (Fagin et al. 1979)
  - Expandable hashing (Knott 1971)
  - Dynamic hashing (Larson 1978)
- Directoryless schemes
  - Virtual hashing (Litwin 1978)

Slide 6: Extendible Hashing

- Size of a bucket = maximum number of pseudokeys (3 in our example).
- Once a bucket is full, split it into two.

Two situations are possible:

- The directory stays the same size, and only the pointers to the buckets are adjusted.
- The directory grows from 2^k to 2^(k+1) entries, i.e. the directory size can be 1, 2, 4, 8, 16, etc. (8 is shown in the figure, with entries 000, 001, 010, 011, 100, 101, 110, 111).

The number of buckets does not double with the directory, i.e. some references point to the same bucket. Finally, one can use the bit map to build the index but store the actual key in the bucket.
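The two split cases above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's reference implementation: the names `Bucket`, `ExtendibleHash`, `local_depth`, `capacity` and `width` are hypothetical, pseudokeys are 5-bit integers as in the slide 4 example, and the directory is indexed by the leftmost bits of the pseudokey.

```python
class Bucket:
    def __init__(self, local_depth):
        # Number of leading bits shared by every pseudokey in this bucket
        # (an assumed bookkeeping field, not named on the slides).
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=3, width=5):
        self.capacity = capacity        # max pseudokeys per bucket
        self.width = width              # number of bits in a pseudokey
        self.depth = 0                  # directory has 2**depth entries
        self.directory = [Bucket(0)]

    def _index(self, pseudokey):
        # Use only the `depth` leftmost bits of the pseudokey.
        return pseudokey >> (self.width - self.depth)

    def insert(self, pseudokey):
        bucket = self.directory[self._index(pseudokey)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(pseudokey)
            return
        if bucket.local_depth == self.width:
            raise OverflowError("bucket full and no bits left to split on")
        if bucket.local_depth == self.depth:
            # Case 2: the directory doubles; each old entry is duplicated,
            # so both copies still point to the same bucket.
            self.directory = [b for b in self.directory for _ in range(2)]
            self.depth += 1
        # Case 1 (and the rest of case 2): split the bucket, adjust pointers.
        self._split(bucket)
        self.insert(pseudokey)

    def _split(self, bucket):
        new_depth = bucket.local_depth + 1
        low, high = Bucket(new_depth), Bucket(new_depth)
        shift = self.width - new_depth
        for key in bucket.keys:
            (high if (key >> shift) & 1 else low).keys.append(key)
        for i, entry in enumerate(self.directory):
            if entry is bucket:
                bit = (i >> (self.depth - new_depth)) & 1
                self.directory[i] = high if bit else low


table = ExtendibleHash(capacity=3, width=5)
for bits in ["00011", "00110", "00101", "01100",
             "01011", "10011", "11110", "11111"]:
    table.insert(int(bits, 2))

table.insert(0b11011)          # bucket for prefix 1 splits, directory keeps its size
print(len(table.directory))    # 4
table.insert(0b00001)          # bucket 00 is as deep as the directory, so it doubles
print(len(table.directory))    # 8
```

With the pseudokeys from slide 4, inserting 11011 only redirects pointers, while the later insertion of 00001 (a value chosen here for illustration) forces the directory to grow from 4 to 8 entries. After that growth several entries still share a bucket.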

Slide 7: Extendible Hashing

1. Use as much space as needed.
2. Input the file name and the number of words to insert. Use bucket size: 128.
3. Use any function h(k) that returns a string of up to 32 bits (an integer type can be used).
4. Bucket: char array.
5. Main idea: only the FIRST bits of the mask are used for the search.

Slide 8: Extendible Hashing

Assume that a hashing technique is applied to a dynamically changing file composed of buckets, and each bucket can hold only a fixed number of items. Extendible hashing accesses the data stored in buckets indirectly through an index that is dynamically adjusted to reflect changes in the file. The characteristic feature of extendible hashing is the organization of the index, which is an expandable table.

Slide 9: Extendible Hashing

- A hash function applied to a certain key indicates a position in the index and not in the file (or table of keys). Values returned by such a hash function are called pseudokeys.
- The file requires no reorganization when data are added to or deleted from it, since these changes are indicated in the index. Only one hash function h can be used, but depending on the size of the index, only a portion of the address h(K) is utilized.
- A simple way to achieve this effect is to look at the address as a string of bits from which only the i leftmost bits can be used. The number i is the depth of the directory. In Figure 1(a) (on the next slide), the depth is equal to two.
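As a small illustration of using only the i leftmost bits, the sketch below extracts a directory position from a pseudokey. The function name `directory_index` and the fixed pseudokey width of 5 bits are assumptions made for this example.

```python
def directory_index(pseudokey: int, depth: int, width: int = 5) -> int:
    """Return the `depth` leftmost bits of a `width`-bit pseudokey."""
    if not 0 <= depth <= width:
        raise ValueError("depth must be between 0 and width")
    # Shifting right drops the (width - depth) rightmost bits.
    return pseudokey >> (width - depth)


pseudokey = 0b11011
print(format(directory_index(pseudokey, 2), "02b"))  # 11
print(format(directory_index(pseudokey, 3), "03b"))  # 110
```

The same pseudokey maps to entry 11 in a directory of depth two and to entry 110 in a directory of depth three, so the hash function itself never changes when the index grows.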

Slide 10: Extendible Hashing

Figure 1. An example of extendible hashing (Drozdek textbook).

Slide 11: Expandable & Dynamic Hashing

Expandable hashing

- Similar in idea to extendible hashing, but a binary tree is used to store the index on the buckets.

Dynamic hashing

- Multiple binary trees are used.
- Outcome: the search is shortened.
- The key determines which tree to search.
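A binary tree index over buckets can be sketched as a trie on the bits of the pseudokey. The slides give no details of Knott's scheme, so this is a generic illustration under assumptions: the names `Leaf`, `Node` and `TrieIndex` are hypothetical, leaves are the buckets, and a full leaf is replaced by an internal node with two child buckets.

```python
class Leaf:
    """A bucket holding pseudokeys."""
    def __init__(self):
        self.keys = []


class Node:
    """An internal index node; children[0] follows bit 0, children[1] bit 1."""
    def __init__(self, zero, one):
        self.children = [zero, one]


class TrieIndex:
    def __init__(self, capacity=3, width=5):
        self.capacity = capacity   # max pseudokeys per bucket
        self.width = width         # number of bits in a pseudokey
        self.root = Leaf()

    def _bit(self, pseudokey, level):
        # Bit number `level`, counted from the left, starting at 0.
        return (pseudokey >> (self.width - 1 - level)) & 1

    def find_leaf(self, pseudokey):
        """Walk the tree bit by bit; return the bucket and its depth."""
        node, level = self.root, 0
        while isinstance(node, Node):
            node = node.children[self._bit(pseudokey, level)]
            level += 1
        return node, level

    def insert(self, pseudokey):
        self.root = self._insert(self.root, pseudokey, 0)

    def _insert(self, node, pseudokey, level):
        if isinstance(node, Node):
            b = self._bit(pseudokey, level)
            node.children[b] = self._insert(node.children[b], pseudokey, level + 1)
            return node
        if len(node.keys) < self.capacity:
            node.keys.append(pseudokey)
            return node
        if level == self.width:
            raise OverflowError("bucket full and no bits left to split on")
        # Full bucket: replace it by an internal node and redistribute.
        split = Node(Leaf(), Leaf())
        for key in node.keys + [pseudokey]:
            self._insert(split, key, level)
        return split


index = TrieIndex(capacity=3, width=5)
for bits in ["00011", "00110", "00101", "01100", "01011",
             "10011", "11110", "11111", "11011"]:
    index.insert(int(bits, 2))

bucket, depth = index.find_leaf(0b11011)
print(depth, sorted(format(k, "05b") for k in bucket.keys))
```

Unlike the directory of extendible hashing, the tree grows only along the path of the bucket that overflowed, at the cost of following one pointer per bit during a search.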

Slide 12: Dynamic Hashing

- Larson's method.
- The index is simplified and represented as a set of binary trees.
- The height of each tree is limited.
- h(x) is searched in ALL trees.
- Time: with m trees and at most k keys in each, the overall cost is m * lg k.
- Advantage: shorter search time in the index file.

Slide 13: Virtual Hashing

Litwin’s virtual hashing

- Expand the buckets in a linear fashion.
- Store them contiguously in memory.
- No table is needed, and the procedure is simple.
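The slide does not give the algorithm, so the sketch below is only one way to realize "linear expansion without a table". It assumes a linear-hashing-style split pointer: buckets sit in one contiguous list, and one bucket at a time is split in address order. The names `LinearFile`, `next_split` and `level` are hypothetical, and the hash function is simply the key modulo the current file size.

```python
class LinearFile:
    def __init__(self, initial_buckets=2, capacity=3):
        self.n0 = initial_buckets
        self.capacity = capacity
        self.level = 0          # number of completed doublings of the file
        self.next_split = 0     # next bucket to split; advances linearly
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        size = self.n0 * 2 ** self.level
        addr = key % size
        if addr < self.next_split:
            # This bucket was already split in the current round,
            # so one more bit of the key decides between its two halves.
            addr = key % (2 * size)
        return addr

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)      # a list may briefly exceed capacity (overflow)
        if len(bucket) > self.capacity:
            self._split()

    def _split(self):
        size = self.n0 * 2 ** self.level
        old = self.buckets[self.next_split]
        # The new bucket is appended at the end of the contiguous list,
        # at address size + next_split.
        self.buckets.append([k for k in old if k % (2 * size) != self.next_split])
        self.buckets[self.next_split] = [k for k in old
                                         if k % (2 * size) == self.next_split]
        self.next_split += 1
        if self.next_split == size:     # every bucket of this round is split
            self.level += 1
            self.next_split = 0

    def contains(self, key):
        return key in self.buckets[self._address(key)]


f = LinearFile(initial_buckets=2, capacity=3)
for key in range(20):
    f.insert(key)
print(len(f.buckets))                        # 8
print(all(f.contains(k) for k in range(20)))  # True
```

The address of a key is computed from two counters only, which is why no directory table is required. The trade-off is that the bucket being split is the next one in order, not necessarily the one that overflowed.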

Slide 14: Summary

Extendible hashing advantages:

- The initially allocated space can increase indefinitely.
- Locating the bucket to which a key belongs requires only a very fast bit comparison.
- The bucket size can be chosen very flexibly, and buckets can be stored on disk or accessed in remote memory.

Extendible hashing disadvantages:

- Increased algorithm complexity.
- Extra memory overhead to store the index inside the bucket.