Download presentation

Presentation is loading. Please wait.

Published byArturo Grays Modified over 2 years ago

1
DBMS 2001Notes 4.2: Hashing1 Principles of Database Management Systems 4.2: Hashing Techniques Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)

2
DBMS 2001Notes 4.2: Hashing2 Hashing? Locating the storage block of a record by the hash value h(k) of its key k Normally really fast –records (often) located by a single disk access

3
DBMS 2001Notes 4.2: Hashing3 key h(key) Hashing...... Buckets (typically 1 disk block)

4
DBMS 2001Notes 4.2: Hashing4...... Two alternatives records...... key h(key) (1) Hash value determines the storage block directly to implement a primary index

5
DBMS 2001Notes 4.2: Hashing5 key h(key) Index record key 1 Two alternatives for a secondary index (2) Records located indirectly via index buckets

6
DBMS 2001Notes 4.2: Hashing6 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h = (x 1 + x 2 + … + x n ) mod b {0, 1, …, b-1}

7
DBMS 2001Notes 4.2: Hashing7 This may not be best function … Good hash Expected number of function: keys/bucket is the same for all buckets Read Knuth Vol. 3 if you really need to select a good function.

8
DBMS 2001Notes 4.2: Hashing8 Next: example to illustrate inserts, overflows, deletes h(K)

9
DBMS 2001Notes 4.2: Hashing9 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b h(e) = 1 e

10
DBMS 2001Notes 4.2: Hashing10 01230123 a b c e d EXAMPLE: deletion Delete: e f f g maybe move “g” up c d

11
DBMS 2001Notes 4.2: Hashing11 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

12
DBMS 2001Notes 4.2: Hashing12 How do we cope with growth? Overflows and reorganizations Dynamic hashing: # of buckets may vary Extensible Linear also others...

13
DBMS 2001Notes 4.2: Hashing13 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K) use i grows over time…. 00110101 For example, b=32

14
DBMS 2001Notes 4.2: Hashing14 (b) Use directory h(K)[i ] to bucket............ Directory contains 2 i pointers to buckets, and stores i. Each bucket stores j, indicating #bits used for placing the records in this block (j i)

15
DBMS 2001Notes 4.2: Hashing15 Extensible Hashing: Insertion If there's room in bucket h(k)[i], place record there; Otherwise … If j=i, set i=i+1 and double the directory If j*
{
"@context": "http://schema.org",
"@type": "ImageObject",
"contentUrl": "http://images.slideplayer.com/13/3952374/slides/slide_15.jpg",
"name": "DBMS 2001Notes 4.2: Hashing15 Extensible Hashing: Insertion If there s room in bucket h(k)[i], place record there; Otherwise … If j=i, set i=i+1 and double the directory If j
*

16
DBMS 2001Notes 4.2: Hashing16 Example: h(k) is 4 bits; 2 keys/block i = 1 1 1 0001 1001 1100 Insert 1010 1 110010 New directory 2 00 01 10 11 i = 2 2 (j)

17
DBMS 2001Notes 4.2: Hashing17 1 0001 2 1001 10 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 00 0001 2 2

18
DBMS 2001Notes 4.2: Hashing18 00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 000 001 010 011 100 101 110 111 3 i = 3 3

19
DBMS 2001Notes 4.2: Hashing19 Extensible hashing: deletion Reverse insert procedure … Example: Walk thru insert example in reverse!

20
DBMS 2001Notes 4.2: Hashing20 Extensible hashing Can handle growing files - without full reorganizations Summary + Indirection (Not bad if directory in memory) Directory doubles in size (First it fits in memory, then it does not sudden performance degradation) - - Only one data block examined +

21
DBMS 2001Notes 4.2: Hashing21 Linear hashing: grow # of buckets by one No bucket directory needed Two ideas: (a) Use i low order bits of hash 01110101 grows b i (b) File grows linearly

22
DBMS 2001Notes 4.2: Hashing22 Linear Hashing: Parameters n: number of buckets in use –buckets numbered 0…n-1 i: number of bits of h(k) used to address buckets r: number of records in hash table –ratio r/n limited to fit an avg bucket in a block –next example: r 1.7n, and block holds 2 records => AVG bucket occupancy is 1.7/2 = 0.85 of a block

23
DBMS 2001Notes 4.2: Hashing23 Example: 2 keys/block, b=4 bits, n=2, i =1 00 01 1111 0000 1010 If h(k)[i ] = (a 1 … a i ) 2 < n, then look at bucket h(k)[i ]; else look at bucket h(k)[i ] - 2 i -1 = (0a 2 … a i ) 2 Rule insert 0101 0101 now r=4 >1.7n get new bucket 10 and distribute keys btw buckets 00 and 10

24
DBMS 2001Notes 4.2: Hashing24 n=3, i =2; distribute keys btw buckets 00 and 10: 00 01 10 0101 1111 00 1010

25
DBMS 2001Notes 4.2: Hashing25 n=3, i =2; insert 0001: 00 01 10 0101 1111 00 0001 10 can have overflow chains!

26
DBMS 2001Notes 4.2: Hashing26 n=3, i =2 00 01 10 0101 1111 00 10 0001 insert 0111 bucket 11 not in use redirect to 01 0111 now r=6 > 1.7n -> get new bucket 11

27
DBMS 2001Notes 4.2: Hashing27 n=4, i =2; distribute keys btw 01 and 11 00 01 1011 0101 1111 00 10 0001 011111 0001

28
DBMS 2001Notes 4.2: Hashing28 Example Continued: How to grow beyond this? 00 01 1011 111110100101 0000 m = 11 (max used block) i = 2 0000 101 110 111 3... 100 101 0101

29
DBMS 2001Notes 4.2: Hashing29 Linear Hashing Can handle growing files - without full reorganizations No indirection directory of extensible hashing Can have overflow chains - but probability of long chains can be kept low by controlling the r/n fill ratio (?) Summary + + -

30
DBMS 2001Notes 4.2: Hashing30 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary

31
DBMS 2001Notes 4.2: Hashing31 Next: Indexing vs Hashing Index definition in SQL

32
DBMS 2001Notes 4.2: Hashing32 Hashing good for probes given key e.g., SELECT … FROM R WHERE R.A = 5 Indexing vs Hashing

33
DBMS 2001Notes 4.2: Hashing33 INDEXING (Including B-Trees) good for Range Searches: e.g., SELECT FROM R WHERE R.A > 5 Indexing vs Hashing

34
DBMS 2001Notes 4.2: Hashing34 Index definition in SQL Create index name on rel (attr) Create unique index name on rel (attr) defines candidate key Drop INDEX name

35
DBMS 2001Notes 4.2: Hashing35 CANNOT SPECIFY TYPE OF INDEX (e.g. B-tree, Hashing, …) OR PARAMETERS ( e.g. Load Factor, Size of Hash,...)... at least in SQL … Oracle and IBM DB2 UDB provide a PCTFREE clause to inditate the proportion of B-tree blocks initially left unfilled Oracle: “Hash clusters” with built-in or DBA- specified hash function Note

36
DBMS 2001Notes 4.2: Hashing36 The BIG picture…. Chapters 2 & 3: Storage, records, blocks... Chapter 4: Access Mechanisms - Indexes - B trees - Hashing Chapters 6 & 7: Query Processing NEXT

Similar presentations

OK

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

CS 245Notes 51 CS 245: Database System Principles Hector Garcia-Molina Notes 5: Hashing and More.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on eye os online Ppt on led tv technology Ppt on porter's five forces examples Ppt on power generation using footsteps in the dark Ppt on next generation 2-stroke engine stops Download ppt on teamviewer 6 Ppt on writing book reviews Ppt on rag picking Download ppt on statistics for class 10th Ppt on sanskritization and westernization