Presentation is loading. Please wait.

Presentation is loading. Please wait.

Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function.

Similar presentations


Presentation on theme: "Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function."— Presentation transcript:

1

2 Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function to be modified dynamically –When the hash function takes modulo 10 in the previous example, the number of buckets is fixed to 10 –The trick is how to change the hash function so that the number of buckets can change –And at the same time, without the need of rehashing the existing records! –Imagine if you change modulo 10 to modulo 13, then every existing record has to be rehashed – not a good idea

3 Department of Computer Science and Engineering, HKUST Slide 2 Extendable Hashing Extendable hashing - one form of dynamic hashing –Hashing function generates values over a large range - typically b-bit integers, with b = 32. At any time, use only a prefix of the b-bit integers to index into a table of bucket addresses. Let the length of the prefix be i bits, 0 < i < 32 Initially i = 1, meaning that it can index at most 2 buckets When the 2 buckets are full, we can use 2 bits (i = 2), meaning that we can now index at most 4 buckets, and so on and so forth…. i grows and shrinks as the size of the database grows and shrinks. Actual number of buckets is < 2 i, which may change due to bucket merging and splitting b-10 i bits

4 Department of Computer Science and Engineering, HKUST Slide 3 Extendable Hash Structure  General Ideas Initially, i = 1, use 1 bit in the hash key, resulting in two entries in the hash address table Suppose we start with only 1 or 2 records, we need only 1 bucket initially Both entries in the hash address table point to the same bucket i 0 = 0 means no bit had been used to separate records in the bucket (I.e., records are all hashed into bucket 0 irrespective of the any bit setting in the hash key values) New record

5 Department of Computer Science and Engineering, HKUST Slide 4 Extendable Hash Structure  Bucket Expansion New record Suppose bucket 0 is full and a new record arrives Create a new bucket, rehash the three records (two existing ones and the new record) into buckets 0 and, according to the last bit of their hash keys

6 Department of Computer Science and Engineering, HKUST Slide 5 General Extendable Hash Structure Note: why do we need to keep i, i 0 and i 1 ? i is the maximum number of bits used in hashing so far; i 0 and i 1 are the number of bits used for these particular buckets Record originally in bucket 0 New record

7 Department of Computer Science and Engineering, HKUST Slide 6 General Extendable Hash Structure A new record 1)Upon inserting of a new (red) record, bucket 0 is full again 2)Bucket 2 is created, and the three records (two existing ones and the new one) are rehashed among buckets 0 and 2 based on the second bit

8 Department of Computer Science and Engineering, HKUST Slide 7 General Extendable Hash structure Bucket 1 not changed Bucket 2 is new use the first 2 bits from the hash key to address the 4 entries in the table. 2 bits from the hash key had been use to hash the records use the 3 rd bit in next split 1 bit from the hash key had been use to hash the records use the 2nd bit in next split

9 Department of Computer Science and Engineering, HKUST Slide 8 Extendable Hash Structure – Properties Every expansion doubles the number of entries in the table Multiple entries in the bucket address table may point to the same bucket. It means that the bucket hasn’t been expanded while other buckets had been expanded multiple times Each bucket j stores a value i j ; entries in the same bucket have the same values on the first i j bits of the hash keys To locate the bucket containing search-key K j : 1. Compute h(K j ) = X 2. Use the first i high order bits of X to look up the hash address table, and follow the pointer to appropriate bucket To insert a record with search-key value K j, look up the bucket where it should belong, say j. If there is room in bucket j insert record in the bucket, else the bucket must be split and insertion re-attempted.

10 Department of Computer Science and Engineering, HKUST Slide 9 Split in Extendable hash Structure To split a bucket j when inserting record with search-key value K j ; If i > i j (more than one pointer to bucket j) –allocate a new bucket z, and set i j and i z to the old i j +1. –make the second half of the bucket address table entries pointing to j to point to z –remove and reinsert each record in bucket j. –recompute new bucket for K j and insert record in the bucket (further splitting is required if the bucket is still full) could be any number > 1

11 Department of Computer Science and Engineering, HKUST Slide 10 Split in Extendable hash Structure To split a bucket j when inserting record with search-key value K j ; If i = i j (only one pointer to bucket j) –increment i and double the size of the bucket address table. –Replace each entry in the table by two entries that point to the same bucket. –Re-compute new bucket address table entry for K j, now i > i j, so use the first case above. i0=2i0=2 i1=2i1=2 i2=2i2=2 i3=2i3=2 new record

12 Department of Computer Science and Engineering, HKUST Slide 11 Example: Use of Extendable Hash Structure Branch-nameh(branch-name) Brighton Downtown Mianus Perryridge Redwood Round hill Initial Hash structure, Bucket size=2 Bucket 0 hash address table 0

13 Department of Computer Science and Engineering, HKUST Slide 12 Example Insert: no bit is needed from the hash value (i=0) Brighton, A-217, 750 0

14 Department of Computer Science and Engineering, HKUST Slide 13 Example Insert: no bit is needed from the hash value (i=0) Downtown, A-101, 500 Brighton, A-217, Downtown, A-101, 500 Insert: bucket full, split records according to first bit (i=1) Downtown, A-101, 600

15 Department of Computer Science and Engineering, HKUST Slide Example Brighton Downtown Brighton, A-217, 750 Downtown, A-101, 500 Downtown, A-101, 600 Insert: Hash into bucket 1, which is full Mianus, A-215, 700

16 Department of Computer Science and Engineering, HKUST Slide Brighton, A-217, Example Downtown, A-101, 500 Downtown, A-101, 600 Mianus, A-215, 700 Note: directory size is doubled 1 bit had been used to allocate records 2 bits had been used to allocate records Mianus Downtown

17 Department of Computer Science and Engineering, HKUST Slide Brighton, A-217, Example Downtown, A-101, 500 Downtown, A-101, 600 Mianus, A-215, 700 Insert: Perryridge, A-102, 400 Insert: Perryridge, A-201, 900 Bucket 2 overflows again!

18 Department of Computer Science and Engineering, HKUST Slide Brighton, A-217, Downtown, A-101, 500 Downtown, A-101, 600 Mianus, A-215, 700 Perryridge, A-102, 400 Example Expand hash address table, renumber the entries by adding one more bit to the left Bucket 10 becomes bucket 10x, where x could be 0 or 1 Bucket 11 becomes bucket 110 and 111, because it is split; new bucket is added; local level is updated Perryridge, A-201, 900 Redistribute overflow and new record

19 Department of Computer Science and Engineering, HKUST Slide 18 Example Insert: Perryridge, A-218, 700 Bucket 3 overflows again! 1 2 Brighton, A-217, Downtown, A-101, 500 Downtown, A-101, Mianus, A-215, 700 Perryridge, A-102, 400 Perryridge, A-201, 900 Perryridge, A-218, 700

20 Department of Computer Science and Engineering, HKUST Slide 19 Example 1 2 Brighton, A-217, Downtown, A-101, 500 Downtown, A-101, Mianus, A-215, 700 Perryridge, A-102, 400 Perryridge, A-201, 900 Perryridge, A-218, 700 Round Hill, A-305, 350 Redwood, A-222, 700 Done!

21 Department of Computer Science and Engineering, HKUST Slide 20 Updates in Extendable Hash Structure When inserting a value, if the bucket is full after several splits (that is, i reaches some limits b) create an overflow bucket instead of splitting bucket entry table further. To delete a key value, locate it in its bucket and remove it. The bucket itself can be removed if it because empty (with appropriate updates to the bucket address table). Coalescing of buckets and decreasing bucket address table size is also possible.

22 Department of Computer Science and Engineering, HKUST Slide 21 Extendible Hashing is not Pure Hashing Pure hashing maps a key value directly to the bucket where the record containing the key value can be found. Extendible hashing maps a key value to the entry in the hash prefix table which contains a pointer to the bucket where the record containing the key value can be found. The hash-prefix table can be considered as a complete binary tree; that is, extendible hashing is a combination of tree and hashing!


Download ppt "Department of Computer Science and Engineering, HKUST Slide 1 Dynamic Hashing Good for database that grows and shrinks in size Allows the hash function."

Similar presentations


Ads by Google