Hash Table indexing and Secondary Storage Hashing.

In Memory
An array of B buckets, indexed from 0 to B-1.
Each bucket is the head of a linked list.
The bucket index is determined by a hash function h(k), where k is the key.
A common hash function is h(k) = k mod B.
4/6/2009 COMP2631 - Mount Allison University
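As a minimal sketch of the in-memory structure described above, assuming integer keys and using a Python list in place of each linked list (the class and method names are illustrative, not from the slides):

```python
class ChainedHashTable:
    """In-memory hash table: B buckets, each holding a chain of (key, value) pairs."""

    def __init__(self, num_buckets):
        self.B = num_buckets
        # A Python list stands in for each bucket's linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        # Bucket index in 0..B-1 (integer keys assumed).
        return key % self.B

    def insert(self, key, value):
        chain = self.buckets[self.h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:            # key already present: overwrite its value
                chain[idx] = (key, value)
                return
        chain.append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self.h(key)]:
            if k == key:
                return v
        return None


mem_table = ChainedHashTable(5)
mem_table.insert(12, "a")
mem_table.insert(7, "b")   # 12 % 5 == 7 % 5 == 2, so both keys share bucket 2
```

Only the chain of one bucket is scanned on a lookup, which is why the choice of B and of h(k) matters for performance.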

Secondary Storage Hashing
Static hash table: fixed number of buckets.
Dynamic hash table: the number of buckets can grow.

Static Hash Table
The bucket array consists of blocks, rather than pointers to linked lists.
Records that the hash function maps to a certain bucket are stored in the block of that bucket.
If there is no more room in the block, a chain of overflow blocks can be added to the bucket.
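The following sketch models a static hash table with overflow chains. It assumes integer keys, h(k) = k mod B, and a block capacity of 2 records; all names and the capacity are assumptions made for illustration, and blocks are ordinary Python objects rather than disk blocks.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)


class Block:
    def __init__(self):
        self.records = []
        self.overflow = None  # next block in this bucket's overflow chain


class StaticHashTable:
    def __init__(self, num_buckets):
        self.B = num_buckets                      # fixed for the table's lifetime
        self.buckets = [Block() for _ in range(num_buckets)]

    def insert(self, key):
        block = self.buckets[key % self.B]
        # Walk the chain until a block with free space is found,
        # adding a new overflow block at the end if necessary.
        while len(block.records) >= BLOCK_CAPACITY:
            if block.overflow is None:
                block.overflow = Block()
            block = block.overflow
        block.records.append(key)

    def blocks_read(self, key):
        """Return (found, number of blocks examined during the search)."""
        block, reads = self.buckets[key % self.B], 0
        while block is not None:
            reads += 1
            if key in block.records:
                return True, reads
            block = block.overflow
        return False, reads


static_table = StaticHashTable(4)
for key in (1, 5, 9):       # all three keys hash to bucket 1
    static_table.insert(key)
```

Here the third key lands in an overflow block, so finding it costs two block reads instead of one. Long overflow chains are the cost of keeping B fixed.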

Dynamic Hash Tables
The number of buckets B is kept approximately equal to the number of records divided by the number of records that fit in a block, i.e. there is about one block per bucket.
Extensible hashing: B grows by doubling.
Linear hashing: B grows by 1.

Extensible Hashing
There is an array of pointers to blocks that represent the buckets, instead of an array consisting of the data blocks themselves.
The length of the array is always a power of 2, so in a growing step the number of buckets doubles.
There is not necessarily a data block for every bucket; some buckets can share a block if the total number of records in those buckets fits in one block.

Extensible Hashing
The hash function computes a sequence of k bits for each key.
The bucket numbers use only some of those k bits, say the i most significant bits.
Therefore the bucket array has 2^i entries.
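Taking the i most significant bits of a k-bit hash value is a single right shift. A small sketch, where the function name `bucket_index` is chosen here for illustration:

```python
def bucket_index(hash_bits, k, i):
    """Return the i most significant bits of a k-bit hash value."""
    return hash_bits >> (k - i)


# With 4-bit hash values:
one_bit = bucket_index(0b1010, 4, 1)    # leading bit 1      -> bucket 1
two_bits = bucket_index(0b1010, 4, 2)   # leading bits 10    -> bucket 2
other = bucket_index(0b0111, 4, 2)      # leading bits 01    -> bucket 1
```

Because the leading bits are used, raising i by one turns bucket m into buckets 2m and 2m+1, which is what makes doubling the array straightforward.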

Extensible Hashing
Advantage: when looking for a record, we never need to search more than one data block.
Disadvantage: for large i, doubling the array size is a substantial amount of work.

Extensible Hashing
Disadvantage: for large i, the bucket array may no longer fit in memory.
Example: with i = 32, the array has 2^32 (about 4 billion) entries. If every pointer is 32 bits, or 4 bytes, the array takes 4 bytes x 4 billion = 16 GB.
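The arithmetic in this example can be checked directly. The variable names are illustrative; the figures are those given on the slide.

```python
i = 32                      # bits used to index the bucket array
pointer_bytes = 4           # one 32-bit pointer per entry

entries = 2 ** i                        # 4,294,967,296 entries
total_bytes = entries * pointer_bytes   # 17,179,869,184 bytes
total_gib = total_bytes / 2 ** 30       # size in units of 2^30 bytes
```

The result is exactly 16 when a gigabyte is taken as 2^30 bytes, matching the slide's rounded figure of 4 billion entries times 4 bytes.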

Extensible Hashing
1. Every key has 4 bits; the most significant bit is used to determine the bucket number (i = 1).
2. The number 1 appearing in the nub of each block (let's call it j) indicates the number of bits used to determine membership of records in that block.

Extensible Hashing
Insertion, when the target block B is full:
If j = i, increment i by 1 and double the length of the bucket array to 2^(i+1) entries. After that j < i, so the next case applies.
If j < i, split block B into two, distribute the records in B between the two blocks based on their (j+1) most significant bits, set j to j+1 in both blocks, and adjust the pointers in the bucket array to point to the proper blocks.

Extensible Hashing
1. Let's insert 1010 into this structure. It has to go to block 1, but there is no room.
2. So we have to split the block.
3. Since i = j, we first increment i and double the size of the bucket array.
4. Then we can split block 1 into two blocks.

Extensible Hashing
1. Now block 1 is split into blocks 10 and 11.
2. We now use two bits to determine the proper block for every record.
3. Note that the first block still uses one bit, so both buckets 00 and 01 point to it.
4. If we insert 0000, it goes to the block pointed to by buckets 00 and 01.
5. If we insert 0111, then based on i = 2 it has to go to the same block, and there is no room.
6. Since j < i, we can simply split that block into two and adjust the bucket pointers.
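The insertion walk-through above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: the key itself serves as the 4-bit hash value, a block holds 2 records, keys are distinct, and the starting contents (0001 in block 0; 1001 and 1100 in block 1) are assumed because the slide's figure is not part of the transcript. All class and method names are hypothetical.

```python
KEY_BITS = 4        # every key is a 4-bit value, as in the slides
BLOCK_CAPACITY = 2  # records per block (assumed)


class Block:
    def __init__(self, j):
        self.j = j          # number of bits that determine membership in this block
        self.records = []


class ExtensibleHashTable:
    def __init__(self):
        self.i = 1                              # bits used by the bucket array
        self.directory = [Block(1), Block(1)]   # 2^i pointers to blocks

    def _index(self, key):
        return key >> (KEY_BITS - self.i)       # i most significant bits

    def insert(self, key):
        block = self.directory[self._index(key)]
        while len(block.records) >= BLOCK_CAPACITY:
            if block.j == self.i:
                # Double the bucket array: old entry m becomes entries 2m and 2m+1.
                self.directory = [b for b in self.directory for _ in (0, 1)]
                self.i += 1
            self._split(block)                  # now j < i holds
            block = self.directory[self._index(key)]
        block.records.append(key)

    def _split(self, block):
        j = block.j + 1
        zero, one = Block(j), Block(j)
        for rec in block.records:
            # Redistribute on the j-th most significant bit of each record.
            bit = (rec >> (KEY_BITS - j)) & 1
            (one if bit else zero).records.append(rec)
        for idx, b in enumerate(self.directory):
            if b is block:
                # Re-point each directory entry using its own j-th leading bit.
                bit = (idx >> (self.i - j)) & 1
                self.directory[idx] = one if bit else zero


ext = ExtensibleHashTable()
for key in (0b0001, 0b1001, 0b1100):   # assumed starting contents
    ext.insert(key)
ext.insert(0b1010)   # block 1 is full and j == i: double the array, then split
shared = ext.directory[0] is ext.directory[1]   # buckets 00 and 01 share one block
ext.insert(0b0000)
ext.insert(0b0111)   # the shared block is full and j < i: split without doubling
```

The sketch has no overflow handling, so it relies on distinct keys: repeated identical keys could never be separated by splitting.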

Linear Hashing
The number of buckets B is always chosen so that the average number of records per bucket is a fixed fraction, say 80%, of the number of records that fill one block.
Since blocks cannot always be split, overflow blocks are permitted.

Linear Hashing
The number of bits used to number the entries of the bucket array is i = ceiling(log2 B), where B is the current number of buckets.
These bits are always taken from the right (low-order) end of the bit sequence produced by the hash function.
We treat those bits as a binary integer m:
If m < B, bucket m exists and the record goes there.
If B <= m < 2^i, bucket m does not exist yet, and we place the record in bucket m - 2^(i-1).
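The addressing rule above fits in a few lines. In this sketch the function name `bucket_for` is an assumption, and `(B - 1).bit_length()` is used as an exact integer way to compute ceiling(log2 B) for B >= 1.

```python
def bucket_for(hash_bits, B):
    """Map a hash value to one of the B existing buckets (numbered 0..B-1)."""
    i = (B - 1).bit_length()           # ceiling(log2 B) without floating point
    m = hash_bits & ((1 << i) - 1)     # the i low-order bits, as an integer
    if m < B:
        return m                       # bucket m exists
    return m - (1 << (i - 1))          # bucket m does not exist yet: clear its leading bit


first = bucket_for(0b0101, 2)    # i = 1, m = 1, bucket 1 exists
second = bucket_for(0b1010, 3)   # i = 2, m = 2, bucket 2 exists
third = bucket_for(0b0111, 3)    # i = 2, m = 3 >= B, so redirect to 3 - 2 = 1
```

Subtracting 2^(i-1) simply drops the leading 1 of m, which sends the record to the bucket it would have used before the bucket array started growing into i bits.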

Linear Hashing
1. i is the number of bits used to address the buckets; here only the rightmost bit is used (i = 1).
2. n is the number of buckets.
3. r is the number of records.
4. We keep r/n <= 1.7, so the average occupancy of a bucket does not exceed 85% of the capacity of a block.

Linear Hashing
1. To insert 0101: since the bit sequence ends in 1, the record goes to bucket 1.
2. There is room, so it can go there.
3. However, r/n now exceeds 1.7, so we raise n to 3, and then i = ceiling(log2 3) = 2. The new bucket 10 takes the records of bucket 00 whose keys end in 10.

Linear Hashing
1. Now we insert 0001. It has to go to bucket 01, since its last two bits are 01.
2. However, that bucket is full.
3. We add an overflow block.
4. The ratio of records to buckets is 5/3, still less than 1.7, so we don’t create a new bucket.

Linear Hashing
1. Now let's insert 0111. It has to go to bucket m = binary 11.
2. m = binary 11 = decimal 3 = n (the number of buckets), so the bucket doesn’t exist.
3. We place it in bucket m - 2^(i-1), i.e. 3 - 2 = decimal 1 = binary 01.
4. However, the ratio r/n now exceeds 1.7, so we create a new bucket, 11.

Linear Hashing
Lookup, with i = 2 and n = 3:
1. Suppose we look for 1010.
2. Since i = 2, we look in bucket m = binary 10 = decimal 2.
3. Since m < n, the bucket exists.
A second lookup:
1. Now let's look for 1011.
2. It should be in bucket 11.
3. But binary 11 = decimal 3 = n, so the bucket doesn’t exist.
4. We redirect to bucket binary 01 = decimal 1 (remember m - 2^(i-1)).
5. If it is not there, the record is not in the table.
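The linear hashing insertions and lookups above can be traced with a short sketch. It rests on several assumptions: the key itself is the hash value, each bucket is a Python list standing in for a block plus its overflow chain, and the table starts with 0000 and 1010 in bucket 0 and 1111 in bucket 1 (the slide's figure is not part of the transcript). Class and method names are hypothetical.

```python
MAX_RATIO = 1.7  # upper bound on r/n, as in the slides


class LinearHashTable:
    def __init__(self):
        self.i = 1                # number of low-order bits in use
        self.buckets = [[], []]   # n = 2 buckets; a list models block + overflow chain
        self.r = 0                # number of records

    @property
    def n(self):
        return len(self.buckets)

    def _address(self, key):
        m = key & ((1 << self.i) - 1)
        # If bucket m does not exist yet, redirect to m - 2^(i-1).
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.r += 1
        if self.r / self.n > MAX_RATIO:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                       # number of the bucket being created
        if new >= (1 << self.i):           # i bits can no longer name it
            self.i += 1
        old = new - (1 << (self.i - 1))    # bucket that held the new bucket's records
        mask = (1 << self.i) - 1
        stay = [k for k in self.buckets[old] if (k & mask) != new]
        move = [k for k in self.buckets[old] if (k & mask) == new]
        self.buckets[old] = stay
        self.buckets.append(move)

    def contains(self, key):
        return key in self.buckets[self._address(key)]


lin = LinearHashTable()
for key in (0b0000, 0b1010, 0b1111):   # assumed starting contents: i = 1, n = 2, r = 3
    lin.insert(key)
lin.insert(0b0101)   # r/n = 4/2 > 1.7: bucket 10 is added and i becomes 2
lin.insert(0b0001)   # goes to bucket 01; r/n = 5/3 stays within the limit
found_1010 = lin.contains(0b1010)   # bucket 10 exists and holds the record
found_1011 = lin.contains(0b1011)   # bucket 11 does not exist; bucket 01 is searched
lin.insert(0b0111)   # redirected to bucket 01; r/n = 6/3 > 1.7, so bucket 11 is added
```

Note that the bucket that gets split is determined by the number of the new bucket, not by which bucket happened to overflow.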
