Presentation is loading. Please wait.

Presentation is loading. Please wait.

DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals.

Similar presentations


Presentation on theme: "DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals."— Presentation transcript:

1 DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

2 DBMS 2001Notes 4.2: Hashing2 Hashing? Locating the storage block of a record by the hash value h(k) of its key k Normally really fast –records (often) located by a single disk access

3 DBMS 2001Notes 4.2: Hashing3 key  h(key) Hashing Buckets (typically 1 disk block)

4 DBMS 2001Notes 4.2: Hashing Two alternatives records key  h(key) (1) Hash value determines the storage block directly to implement a primary index

5 DBMS 2001Notes 4.2: Hashing5 key  h(key) Index record key 1 Two alternatives for a secondary index (2) Records located indirectly via index buckets

6 DBMS 2001Notes 4.2: Hashing6 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h = (x 1 + x 2 + … + x n ) mod b  {0, 1, …, b-1}

7 DBMS 2001Notes 4.2: Hashing7  This may not be best function … Good hash Expected number of function: keys/bucket is the same for all buckets  Read Knuth Vol. 3 if you really need to select a good function.

8 DBMS 2001Notes 4.2: Hashing8 Next: example to illustrate inserts, overflows, deletes h(K)

9 DBMS 2001Notes 4.2: Hashing9 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = d a c b h(e) = 1 e

10 DBMS 2001Notes 4.2: Hashing a b c e d EXAMPLE: deletion Delete: e f f g maybe move “g” up c d

11 DBMS 2001Notes 4.2: Hashing11 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

12 DBMS 2001Notes 4.2: Hashing12 How do we cope with growth? Overflows and reorganizations Dynamic hashing: # of buckets may vary Extensible Linear also others...

13 DBMS 2001Notes 4.2: Hashing13 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i  grows over time… For example, b=32

14 DBMS 2001Notes 4.2: Hashing14 (b) Use directory h(K)[i ] to bucket Directory contains 2 i pointers to buckets, and stores i. Each bucket stores j, indicating #bits used for placing the records in this block (j  i)

15 DBMS 2001Notes 4.2: Hashing15 Extensible Hashing: Insertion If there's room in bucket h(k)[i], place record there; Otherwise … If j=i, set i=i+1 and double the directory If j

16 DBMS 2001Notes 4.2: Hashing16 Example: h(k) is 4 bits; 2 keys/block i = Insert New directory i = 2 2 (j)

17 DBMS 2001Notes 4.2: Hashing Insert: i = Example continued

18 DBMS 2001Notes 4.2: Hashing i = Insert: 1001 Example continued i = 3 3

19 DBMS 2001Notes 4.2: Hashing19 Extensible hashing: deletion Reverse insert procedure … Example: Walk thru insert example in reverse!

20 DBMS 2001Notes 4.2: Hashing20 Extensible hashing Can handle growing files - without full reorganizations Summary + Indirection (Not bad if directory in memory) Directory doubles in size (First it fits in memory, then it does not  sudden performance degradation) - - Only one data block examined +

21 DBMS 2001Notes 4.2: Hashing21 Linear hashing: grow # of buckets by one  No bucket directory needed Two ideas: (a) Use i low order bits of hash grows b i (b) File grows linearly

22 DBMS 2001Notes 4.2: Hashing22 Linear Hashing: Parameters n: number of buckets in use –buckets numbered 0…n-1 i: number of bits of h(k) used to address buckets r: number of records in hash table –ratio r/n limited to fit an avg bucket in a block –next example: r  1.7n, and block holds 2 records => AVG bucket occupancy is  1.7/2 = 0.85 of a block

23 DBMS 2001Notes 4.2: Hashing23 Example: 2 keys/block, b=4 bits, n=2, i = If h(k)[i ] = (a 1 … a i ) 2 < n, then look at bucket h(k)[i ]; else look at bucket h(k)[i ] - 2 i -1 = (0a 2 … a i ) 2 Rule insert now r=4 >1.7n  get new bucket 10 and distribute keys btw buckets 00 and 10

24 DBMS 2001Notes 4.2: Hashing24  n=3, i =2; distribute keys btw buckets 00 and 10:

25 DBMS 2001Notes 4.2: Hashing25 n=3, i =2; insert 0001: can have overflow chains!

26 DBMS 2001Notes 4.2: Hashing26 n=3, i = insert 0111 bucket 11 not in use  redirect to now r=6 > 1.7n -> get new bucket 11

27 DBMS 2001Notes 4.2: Hashing27 n=4, i =2; distribute keys btw 01 and

28 DBMS 2001Notes 4.2: Hashing28 Example Continued: How to grow beyond this? m = 11 (max used block) i =

29 DBMS 2001Notes 4.2: Hashing29 Linear Hashing Can handle growing files - without full reorganizations No indirection directory of extensible hashing Can have overflow chains - but probability of long chains can be kept low by controlling the r/n fill ratio (?) Summary + + -

30 DBMS 2001Notes 4.2: Hashing30 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary

31 DBMS 2001Notes 4.2: Hashing31 Next: Indexing vs Hashing Index definition in SQL

32 DBMS 2001Notes 4.2: Hashing32 Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 Indexing vs Hashing

33 DBMS 2001Notes 4.2: Hashing33 INDEXING (Including B-Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 Indexing vs Hashing

34 DBMS 2001Notes 4.2: Hashing34 Index definition in SQL Create index name on rel (attr) Create unique index name on rel (attr) defines candidate key Drop INDEX name

35 DBMS 2001Notes 4.2: Hashing35 CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS ( e.g. Load Factor, Size of Hash,...)... at least in SQL … Oracle and IBM DB2 UDB provide a PCTFREE clause to inditate the proportion of B-tree blocks initially left unfilled Oracle: “Hash clusters” with built-in or DBA- specified hash function Note

36 DBMS 2001Notes 4.2: Hashing36 The BIG picture…. Chapters 2 & 3: Storage, records, blocks... Chapter 4: Access Mechanisms - Indexes - B trees - Hashing Chapters 6 & 7: Query Processing NEXT


Download ppt "DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals."

Similar presentations


Ads by Google