External Memory Hashing


1 External Memory Hashing

2 Hash Tables
Hash function h: search key → [0…B-1]. Buckets are blocks, numbered [0…B-1]. Big idea: if a record with search key K exists, then it must be in bucket h(K). Lookup costs one disk I/O if there is only one block per bucket. Example: h(d) = 0, h(e) = h(c) = 1, … Hash-table lookup: for record(s) with search key K, compute h(K) and search that bucket.

3 Hash-Table Insertion
Put the record in bucket h(K) if it fits; otherwise create an overflow block. Overflow blocks are part of the bucket. Example: insert a record with search key g, where h(g) = 1. Delete c, where h(c) = 1?
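The lookup and insertion rules above can be sketched in a few lines of Python. This is only an illustration: the class name `StaticHashTable`, the choice of B = 4 buckets, and the block capacity of 2 are assumptions. The toy hash values for d, e, c and g are the ones quoted on the slides. Each bucket is modelled as a list of blocks, and `lookup` counts the blocks it reads as a stand-in for disk I/Os.

```python
BLOCK_CAPACITY = 2  # records per block (assumed for illustration)

# Toy hash values taken from the slides: h(d) = 0, h(e) = h(c) = h(g) = 1.
TOY_HASH = {"d": 0, "e": 1, "c": 1, "g": 1}


class StaticHashTable:
    def __init__(self, num_buckets, hash_fn):
        self.num_buckets = num_buckets
        self.hash_fn = hash_fn
        # Each bucket starts with one (empty) primary block.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        blocks = self.buckets[self.hash_fn(key) % self.num_buckets]
        for block in blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        blocks.append([key])  # no room anywhere: chain an overflow block

    def lookup(self, key):
        """Return (found, number of blocks read)."""
        blocks = self.buckets[self.hash_fn(key) % self.num_buckets]
        for blocks_read, block in enumerate(blocks, start=1):
            if key in block:
                return True, blocks_read
        return False, len(blocks)


table = StaticHashTable(4, TOY_HASH.__getitem__)
for key in "dceg":
    table.insert(key)
print(table.buckets[1])   # bucket 1 now has a primary and an overflow block
print(table.lookup("g"))  # g sits in the overflow block, so two blocks are read
```

Once a bucket has overflow blocks, lookups in it no longer cost a single I/O, which is the motivation for the dynamic schemes that follow.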

4 What if the File Grows too Large?
Efficiency is highest if actual #records < #buckets × max #(records/bucket). If the file grows, we need a dynamic hashing method to maintain this relationship. Extensible hashing: double the number of buckets when needed. Linear hashing: add one more bucket as appropriate.

5 Dynamic Hashing Framework
Hash function h produces a sequence of k bits. Only some of the bits are used at any time to determine placement of keys in buckets.
Extensible Hashing (buckets may share blocks!): Keep parameter i = number of bits from the beginning of h(K) that determine the bucket. The bucket array now holds pointers to buckets. A block can serve several buckets. For each block, a parameter j ≤ i tells how many bits of h(K) determine membership in the block; i.e., a block represents 2^(i-j) buckets that share the first j bits of their number.
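To make the roles of i and j concrete, here is a small sketch. The helper names `first_bits` and `entries_served` are hypothetical. The first function extracts the leading i bits of a k-bit hash value; the second lists the bucket-array entries that a block with a given j-bit prefix serves.

```python
def first_bits(hash_value, k, i):
    """Leading i bits of a k-bit hash value, as an integer."""
    return hash_value >> (k - i)


def entries_served(prefix, j, i):
    """Bucket-array indices (i bits each) whose first j bits equal `prefix`."""
    span = 1 << (i - j)  # a block with j bits represents 2^(i-j) buckets
    return list(range(prefix * span, (prefix + 1) * span))


print(first_bits(0b1010, k=4, i=2))        # leading two bits of 1010
print(entries_served(0b0, j=1, i=3))       # entries 000, 001, 010, 011
print(entries_served(0b10, j=2, i=3))      # entries 100, 101
```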

6 Example An extensible hash table when i=1:

7 Extensible Hash-Table Insert
If the record with key K fits in the block B pointed to by h(K), put it there. If not, let this block B represent j bits.
1. j = i:
   a. Set i := i+1.
   b. Double the bucket array, so that it now has 2^i entries (with the new i).
   c. Let w be an old array entry. Both of the new entries, w0 and w1, point to the same block that w used to point to.
   d. Split B into two and distribute the records of B according to their (j+1)st bit; set j := j+1; fix the pointers in the bucket array, so that entries that formerly pointed to B now point either to B or to the new block, depending on their (j+1)st bit.
2. j < i: Do as in 1.d.
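The insertion procedure can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class names `Block` and `ExtensibleHashTable` are hypothetical, hash values are passed in directly instead of keys, and the constants (4-bit hash values, 2 keys per block) follow the assumptions stated on the later example slides. When every hash bit has been used and a block still overflows, the sketch simply raises an error instead of chaining overflow blocks.

```python
K_BITS = 4          # h(K) is 4 bits
BLOCK_CAPACITY = 2  # 2 keys per block


class Block:
    def __init__(self, j):
        self.j = j         # number of leading bits shared by all records here
        self.records = []


class ExtensibleHashTable:
    def __init__(self):
        self.i = 1
        self.directory = [Block(1), Block(1)]  # the bucket array

    def _index(self, hv):
        return hv >> (K_BITS - self.i)  # first i bits of the hash value

    def lookup(self, hv):
        return hv in self.directory[self._index(hv)].records

    def insert(self, hv):
        while True:
            block = self.directory[self._index(hv)]
            if len(block.records) < BLOCK_CAPACITY:
                block.records.append(hv)
                return
            if block.j == K_BITS:
                raise OverflowError("all hash bits used; overflow block needed")
            if block.j == self.i:
                # Double the array: old entry w becomes w0 and w1, both
                # pointing to the block that w pointed to.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.i += 1
            self._split(block)

    def _split(self, block):
        block.j += 1
        new_block = Block(block.j)
        shift = K_BITS - block.j  # position of the (old j + 1)st bit
        old_records = block.records
        block.records = [r for r in old_records if not (r >> shift) & 1]
        new_block.records = [r for r in old_records if (r >> shift) & 1]
        # Entries that pointed to `block` and have that bit set move over.
        dir_shift = self.i - block.j
        for idx, b in enumerate(self.directory):
            if b is block and (idx >> dir_shift) & 1:
                self.directory[idx] = new_block


table = ExtensibleHashTable()
for hv in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1011]:
    table.insert(hv)
print(table.i, [format(r, "04b") for r in table.directory[0b101].records])
```

The insertion sequence in the usage lines is the one from the example slides (start with 0001, 1001, 1100; then insert 1010, 0111, 0000 and 1011). The `while` loop retries after each split because a single split does not guarantee that the new record fits.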

8 Example
Insert record with h(K) = 1010. [Figure: the table before and after the insertion.]

9 Example: Next
Next: records with h(K) = 0000 and h(K) = 0111. The bucket for 0... gets split, but i stays at 2. Then: record with h(K) = 1000. It overflows the bucket for 10..., so raise i to 3. [Figure: the table currently and after the insertions.]

10 More on Extensible Hashing
Two ideas:
(a) Use i of the k bits of h(Key); i grows over time.
(b) Use a directory, the Bucket Address Table (BAT), which maps h(Key)[i] (the first i bits of h(Key)) to a bucket.

11 Example – Extensible Hashing – Insert
h(K) is 4 bits; 2 keys/bucket. Insert 1010. [Figure: directory with i = 1 over the keys 0001, 1001, 1100; the insertion produces a new directory with i = 2 and entries 00, 01, 10, 11.]

12 Example – Extensible Hashing – Insert
h(K) is 4 bits; 2 keys/bucket. Insert: 0111, 0000. [Figure: directory with i = 2 and entries 00, 01, 10, 11; keys 0001, 1001, 1010, 1100 before the insertions.]

13 Example – Extensible Hashing – Insert
h(K) is 4 bits; 2 keys/bucket. Insert: 1011. [Figure: the directory grows from i = 2 (entries 00 to 11) to i = 3 (entries 000 to 111); keys 0000, 0001, 0111, 1001, 1010, 1100 before the insertion.]

14 Extensible Hash Tables:
Advantages: On lookup, we never search more than one data block. We hope that the bucket array fits in main memory.
Defects: Doubling the bucket array could make it too large to fit in main memory. Skewed key distributions are a problem. E.g., let 1 block = 2 records. Suppose that three records have hash values that happen to agree in the first 20 bits. In that case we would have i = 21 and 2^21 ≈ two million bucket-array entries, even though we have only 3 records!
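The skew example can be checked with a short calculation. The sketch below assumes 32-bit hash values and picks three arbitrary values that agree in their first 20 bits and differ at bit 21; the helper name `prefix_bits_needed` and the specific values are illustrative only. It finds the smallest i for which no i-bit prefix is shared by more records than fit in one block.

```python
from collections import Counter


def prefix_bits_needed(hash_values, k_bits, capacity):
    """Smallest i such that no i-bit prefix is shared by > capacity values."""
    for i in range(k_bits + 1):
        counts = Counter(v >> (k_bits - i) for v in hash_values)
        if max(counts.values()) <= capacity:
            return i
    return None  # even all k bits cannot separate the values


K_BITS = 32
common = 0xABCDE << 12          # 20 shared leading bits (arbitrary choice)
values = [
    common,                     # bit 21 is 0
    common | (1 << 11),         # bit 21 is 1
    common | (1 << 11) | 1,     # bit 21 is 1
]
i = prefix_bits_needed(values, K_BITS, capacity=2)
print(i, 2 ** i)  # bits used and resulting number of bucket-array entries
```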

15 Linear Hashing
The number of buckets n is always chosen so that the number of records r < p × (number of buckets × bucket capacity), where p is typically between 0.5 and 0.9. Overflow blocks are permitted. If r > p × (number of buckets × bucket capacity), one new bucket is allocated and one old bucket needs to be rehashed. Number of bits i needed to index n buckets = ⌈log2(n)⌉. Buckets are numbered [0…n-1], where 2^(i-1) < n ≤ 2^i. [Figure: example table with i = 1, n = 2 (number of buckets), r = 3 (number of records); these parameters are also part of the structure.]

16 Linear Hashing
Use i bits from the right (low-order) end of h(K). Buckets are numbered [0…n-1], where 2^(i-1) < n ≤ 2^i. Let the last i bits of h(K) be a_(i-1) a_(i-2) … a_0 = integer m. If m < n, then the record belongs to bucket m. If n ≤ m < 2^i, then the record belongs to bucket m - 2^(i-1), that is, the bucket we would get if we changed a_(i-1) (which must be 1) to 0. [Figure: example table with i = 1, n = 2 (number of buckets), r = 3 (number of records); these parameters are also part of the structure.]
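The bucket-selection rule translates directly into code. In this sketch the function name `bucket_for` is hypothetical, and the hash value is assumed to be available as an integer.

```python
def bucket_for(hash_value, i, n):
    """Bucket number for a hash value in a linear hash table with n buckets."""
    m = hash_value & ((1 << i) - 1)  # the last i bits of h(K)
    if m < n:
        return m
    # Bucket m does not exist yet: clear the leading bit a_(i-1).
    return m - (1 << (i - 1))


# With i = 2 and n = 3, only buckets 00, 01 and 10 exist.
print(bucket_for(0b1010, i=2, n=3))  # last bits 10 -> bucket 10
print(bucket_for(0b1011, i=2, n=3))  # last bits 11 -> falls back to bucket 01
```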

17 Lookup in Linear Hash Table
For record(s) with search key K, compute h(K) and search the corresponding bucket, chosen by the same bucket-selection procedure that insertion uses. If the record we wish to look up isn’t there, it can’t be anywhere else. E.g., look up a key which hashes to 1010, and then a key which hashes to 1011. [Figure: table with i = 2, n = 3, r = 4.]

18 Linear Hash-Table Insert
Pick an upper limit on capacity, e.g., 85% (1.7 records/bucket in our example). If an insertion exceeds the capacity limit, set n := n + 1. If the new n is 2^i + 1, set i := i + 1. No change in bucket numbers is needed; just imagine a leading 0. We then need to split bucket (old n) - 2^(i-1), computed with the updated i, because there is now a bucket numbered (old) n.

19 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. [Figure: before and after the first insertion; afterwards r = 1, n = 1, i = 1 and the single bucket holds 0000.]

20 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. Capacity limit exceeded; increment n. [Figure: before, r = 1, n = 1, i = 1 with 0000 in the only bucket; after, r = 2, n = 2, i = 1 with 0000 and 1010 in bucket 0 and a new bucket 1.]

21 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. [Figure: before, r = 2, n = 2, i = 1 with 0000 and 1010 in bucket 0; after, r = 3, n = 2, i = 1 with 1111 added to bucket 1.]

22 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. Capacity limit exceeded; increment n, which causes incrementing i as well. [Figure: before, r = 3, n = 2, i = 1; after, r = 4, n = 3, i = 2 with bucket 00: 0000; bucket 01: 1111, 0101; bucket 10: 1010.]

23 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. As long as the capacity limit is not exceeded, we can add overflow blocks. [Figure: before, r = 4, n = 3, i = 2 with bucket 00: 0000; bucket 01: 1111, 0101; bucket 10: 1010. After, r = 5, n = 3, i = 2 with 0001 added to bucket 01 in an overflow block.]

24 Example
Insert records with h(K) = 0000, 1010, 1111, 0101, 0001, 1100. Capacity limit exceeded; increment n. [Figure: before, r = 5, n = 3, i = 2 with bucket 00: 0000; bucket 01: 1111, 0101, 0001; bucket 10: 1010. After, r = 6, n = 4, i = 2 with bucket 00: 0000, 1100; bucket 01: 0001, 0101; bucket 10: 1010; bucket 11: 1111.]
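The insertion rule and the worked example above can be replayed with a short simulation. This is a sketch under the example's assumptions (2 records per block, 85% capacity limit, table starting with n = 1 and i = 1). The class name `LinearHashTable` is hypothetical, hash values are inserted directly, and overflow blocks are modelled simply by letting a bucket's list grow beyond the block capacity.

```python
class LinearHashTable:
    def __init__(self, capacity=2, limit=0.85):
        self.capacity = capacity  # records per block
        self.limit = limit        # capacity limit p
        self.i, self.n, self.r = 1, 1, 0
        self.buckets = [[]]       # one list of hash values per bucket

    def _bucket_for(self, hv):
        m = hv & ((1 << self.i) - 1)  # last i bits
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, hv):
        self.buckets[self._bucket_for(hv)].append(hv)
        self.r += 1
        if self.r > self.limit * self.n * self.capacity:
            self._grow()

    def _grow(self):
        old_n = self.n
        self.n += 1
        if self.n == (1 << self.i) + 1:
            self.i += 1
        self.buckets.append([])  # the new bucket, numbered old_n
        # Split bucket old_n - 2^(i-1): rehash its records with the new n, i.
        split = old_n - (1 << (self.i - 1))
        records, self.buckets[split] = self.buckets[split], []
        for hv in records:
            self.buckets[self._bucket_for(hv)].append(hv)


table = LinearHashTable()
for hv in [0b0000, 0b1010, 0b1111, 0b0101, 0b0001, 0b1100]:
    table.insert(hv)
    print(f"r={table.r} n={table.n} i={table.i}",
          [[format(x, "04b") for x in b] for b in table.buckets])
```

Each printed line corresponds to one "After" state of the example: the values of r, n and i, followed by the contents of buckets 0 through n-1.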

25 Exercise Suppose we want to insert keys with hash values: 0000…1111 in a linear hash table with 100% capacity threshold. Assume that a block can hold three records.

