Hashing by Rafael Jaffarove, CS157b

1 Hashing by Rafael Jaffarove CS157b

2 Motivation
- Fast data access: search, insertion, deletion
- Ideal seek time is O(1)

3 Types of Organization
- File organization: the hash of the search key points directly to the disk block that holds the desired record.
- Index organization: the search key is stored together with a pointer in a hash table. The pointer leads to the bucket where the record is stored.

4 Types of Hashing
- Static hashing: fixed file size
- Dynamic hashing: extendable hashing

5 Problems with Static Hashing
- Databases tend to grow over time.
- The number of buckets must be fixed in advance.
  - If the number is too large, space is wasted.
  - If the number is too small, there are too many collisions.
- Bucket overflow

6 Handling Bucket Overflow
- Provide overflow buckets: if the initial bucket is full, a new bucket is allocated. If the second bucket is full, a 3rd bucket is allocated, and so on.
- The additional buckets are chained together in a linked list.
- Problems:
  - Searches and insertions might take linear time.
  - Deletions are difficult to perform.
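
The overflow-bucket scheme can be sketched as follows. This is a minimal illustration, not the slides' own code: the class names, the integer keys, and the `key % num_buckets` hash are assumptions made for the example.

```python
class Bucket:
    """Fixed-capacity bucket with a link to an optional overflow bucket."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the overflow chain, if any


class StaticHashFile:
    """Static hashing: the number of primary buckets never changes."""

    def __init__(self, num_buckets, capacity=2):
        self.capacity = capacity
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _chain(self, key):
        """Yield the primary bucket for key, then each overflow bucket."""
        bucket = self.buckets[key % len(self.buckets)]  # assumed hash
        while bucket is not None:
            yield bucket
            bucket = bucket.overflow

    def insert(self, key):
        for bucket in self._chain(key):
            if len(bucket.records) < bucket.capacity:
                bucket.records.append(key)
                return
            last = bucket
        # Every bucket in the chain is full: link a new overflow bucket.
        last.overflow = Bucket(self.capacity)
        last.overflow.records.append(key)

    def contains(self, key):
        # Worst case walks the whole chain, hence the linear search time.
        return any(key in bucket.records for bucket in self._chain(key))

    def chain_length(self, key):
        return sum(1 for _ in self._chain(key))


f = StaticHashFile(num_buckets=2, capacity=2)
for k in [0, 2, 4, 6, 8]:  # all even, so all land in primary bucket 0
    f.insert(k)
print(f.chain_length(0))  # 3 (one primary bucket plus two overflow buckets)
print(f.chain_length(1))  # 1 (primary bucket only, still empty)
```

With a skewed key set like this one, the chain for bucket 0 keeps growing while bucket 1 stays empty, which is the wasted-space and linear-search trade-off listed above.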

7 Dynamic Hashing
- Extendable hashing: buckets are created as needed.
- Example of extendable hashing:
  - Insert the following countries into the database: England, France, China, Germany, Egypt, Australia.
  - The hash function is the sum of the ASCII codes of all characters in a name.
  - Assumption: a bucket can't hold more than 2 records.
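
A possible sketch of this example in Python is shown below. The hash function (sum of ASCII codes) and the bucket capacity of 2 come from the slide. Everything else is an assumption: the slide figures are not available, so this sketch indexes the directory by the low-order bits of the hash, and the class and method names are hypothetical.

```python
def ascii_sum(name):
    """Hash function from the slide: sum of the character codes."""
    return sum(ord(ch) for ch in name)


class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.keys = []


class ExtendableHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 0
        self.directory = [Bucket(0)]  # 2**global_depth entries

    def _index(self, key):
        # Assumption: use the low-order global_depth bits of the hash.
        return ascii_sum(key) & ((1 << self.global_depth) - 1)

    def insert(self, key):
        # Note: loops forever if more than `capacity` keys share one hash.
        while True:
            bucket = self.directory[self._index(key)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(key)
                return
            self._split(bucket)

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; entries i and i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        # Redistribute the old bucket's keys between the two buckets.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)

    def contains(self, key):
        return key in self.directory[self._index(key)].keys


countries = ["England", "France", "China", "Germany", "Egypt", "Australia"]
table = ExtendableHash(capacity=2)
for name in countries:
    table.insert(name)

print({name: ascii_sum(name) for name in countries})
print(table.global_depth)
```

Under these assumptions the six names hash to 697, 591, 483, 723, 521 and 934, and the directory ends up using 3 bits (8 entries pointing at 4 distinct buckets). A prefix-bit variant would give a different bucket layout.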

8 Extendable Hashing Example (contd.)

13 Extendable Hashing
- Problem with dynamic hashing: an additional level of indirection on each lookup

14 Hash Function
- Choosing the right hash function is important.
- A uniform hash function distributes the data evenly.
- The table size should be a prime number.
- There is no perfect hash function in general, so collisions are possible.
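
One way to see why a prime table size helps is to hash keys that share a common factor with a non-prime size. The sketch below assumes a simple `key % table_size` hash and a made-up key set (multiples of 10); neither comes from the slides.

```python
def slots_used(keys, table_size):
    """Count how many distinct slots the keys occupy under key % table_size."""
    return len({key % table_size for key in keys})


keys = range(0, 200, 10)  # 20 keys, all multiples of 10

print(slots_used(keys, 10))  # 1: every key collides in slot 0
print(slots_used(keys, 11))  # 11: the keys spread over every slot
```

With size 10 the pattern in the keys lines up with the table size and all 20 keys collide. With the prime size 11 the same keys reach every slot.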

15 Handling Collisions
- Linear probing
- Quadratic probing
- Double hashing
- Chaining

16 Linear Probing
- If a slot is used, take the next available one.
- If the next slot is used, continue until an empty slot is found.
- If the end of the table is reached, wrap around to the beginning.
- Problems:
  - Clustering of data
  - How far to go if there are no empty slots?
  - Deletion: deleting a key in the middle of a cluster
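
A minimal sketch of linear probing with wrap-around follows. It assumes integer keys and a `key % size` hash. The tombstone marker used for deletion is a common technique brought in for this example, not something the slides specify, and the probe stops after one full pass over the table.

```python
EMPTY = object()    # slot never used
DELETED = object()  # tombstone: slot was used, then deleted


class LinearProbingTable:
    def __init__(self, size=7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        """Yield each slot index once, starting at the home slot, wrapping."""
        start = key % self.size  # assumed hash for integer keys
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key):
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                if first_free is None:
                    first_free = idx
                break
            if slot is DELETED:
                if first_free is None:
                    first_free = idx  # reusable, but keep checking for key
            elif slot == key:
                return idx  # already present
        if first_free is None:
            raise OverflowError("table is full")
        self.slots[first_free] = key
        return first_free

    def find(self, key):
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return None  # an empty slot ends the cluster
            if slot is not DELETED and slot == key:
                return idx
        return None

    def delete(self, key):
        idx = self.find(key)
        if idx is not None:
            # A tombstone keeps later keys in the cluster reachable.
            self.slots[idx] = DELETED


t = LinearProbingTable(size=7)
positions = [t.insert(k) for k in (5, 12, 19)]  # all have home slot 5
print(positions)   # [5, 6, 0]: the third key wraps around to slot 0
t.delete(12)       # remove the key in the middle of the cluster
print(t.find(19))  # 0: still found, because slot 6 holds a tombstone
```

If the deleted slot were simply reset to empty, the search for 19 would stop at slot 6 and wrongly report the key as missing.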

17 Quadratic Probing
- To avoid clustering, do not take the next slot. Instead, probe at offsets 1^2, 2^2, 3^2, 4^2, etc. from the original slot.
- Problem:
  - Secondary clustering, since the same seek pattern is used whenever keys collide at the same slot
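
The probe pattern can be sketched as below, assuming integer keys, a `key % size` hash, and a prime table size of 11 (all assumptions for illustration). The helper names are hypothetical.

```python
def quadratic_probe_sequence(home, size, count):
    """Return the first `count` slots probed: home + i**2, wrapped."""
    return [(home + i * i) % size for i in range(count)]


def insert(table, key):
    """Place key in the first free slot along its quadratic probe sequence."""
    size = len(table)
    home = key % size  # assumed hash for integer keys
    for i in range(size):
        idx = (home + i * i) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise OverflowError("no free slot found along the probe sequence")


table = [None] * 11
print(quadratic_probe_sequence(3, 11, 5))          # [3, 4, 7, 1, 8]
print([insert(table, k) for k in (3, 14, 25)])     # [3, 4, 7]
```

The keys 3, 14 and 25 all have home slot 3, so they walk the same sequence 3, 4, 7, ... one after another. That shared path is the secondary clustering named above.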

18 Double Hashing
- In case of a collision, apply a second hash function to determine the probe step.
- Overall better performance than linear and quadratic probing
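
A small sketch of double hashing, under stated assumptions: integer keys, a prime table size, `h1(k) = k % size` and `h2(k) = 1 + (k % (size - 1))`. The second function is a common textbook choice, not one given in the slides. It is never zero, so the probe always moves.

```python
def h1(key, size):
    return key % size


def h2(key, size):
    # Assumed second hash: always in 1..size-1, so the step is never 0.
    return 1 + (key % (size - 1))


def insert(table, key):
    """Probe slots h1, h1 + h2, h1 + 2*h2, ... (mod size) until one is free."""
    size = len(table)
    start, step = h1(key, size), h2(key, size)
    for i in range(size):
        idx = (start + i * step) % size
        if table[idx] is None:
            table[idx] = key
            return idx
    raise OverflowError("table is full")


table = [None] * 11  # prime size, so every step visits all slots
print([insert(table, k) for k in (3, 14, 25)])  # [3, 8, 9]
```

All three keys share home slot 3, but their steps differ (4, 5 and 6), so after the first collision they scatter instead of following one shared path.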

19 Chaining
- Table entries are linked lists.
- In case of a collision, the new entry is added to the linked list for that slot.
- Problem:
  - When many keys collide in the same entry, searching its linked list takes linear time. Alternative data structures (e.g., B+-trees) are used to solve this problem.
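
Chaining can be sketched in a few lines. This example assumes integer keys and a `key % size` hash, and uses Python lists as a stand-in for the linked lists on the slide.

```python
class ChainedTable:
    def __init__(self, size=11):
        self.size = size
        self.chains = [[] for _ in range(size)]  # one chain per table entry

    def insert(self, key):
        chain = self.chains[key % self.size]  # assumed hash for integer keys
        if key not in chain:
            chain.append(key)

    def contains(self, key):
        # Linear scan of one chain: slow only when that chain grows long.
        return key in self.chains[key % self.size]

    def chain_length(self, slot):
        return len(self.chains[slot])


t = ChainedTable(size=11)
for k in (3, 14, 25, 4):
    t.insert(k)
print(t.chain_length(3))  # 3: keys 3, 14 and 25 share this entry
print(t.chain_length(4))  # 1
print(t.contains(25))     # True
```

The table never fills up, but a lookup in entry 3 has to scan every key that collided there.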

