Lecture 7: Collision Resolution Techniques (dsauet.weebly.com)

Azeem Iqbal University of Engineering and Technology, Lahore (Faisalabad Campus)

What is a Collision?
Ideally, a hash function maps each key to a different address, but in practice it is rarely possible to construct such a function. When two keys map to the same address, the result is called a collision: two records compete for the same location.

Collision Resolution Techniques
The process of finding an alternate location is called collision resolution. Despite collisions, hash tables are in many cases more efficient than other data structures such as search trees. There are a number of collision resolution techniques; the most popular are direct chaining and open addressing.

Collision Resolution Techniques
Direct Chaining: an application of an array of linked lists
- Separate chaining
Open Addressing: array-based implementation
- Linear probing (linear search)
- Quadratic probing (nonlinear search)
- Double hashing (uses two hash functions)

Separate Chaining
Collision resolution by chaining combines a linked representation with a hash table. When two or more records hash to the same location, they are linked into a singly linked list called a chain. The idea is to make each cell of the hash table point to a linked list of the records that have the same hash value.

Separate Chaining
Let us consider a simple hash function, "key mod 7", and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
Initial empty hash table: slots 0 to 6 are all empty.
Insert 50: 50 mod 7 = 1, so 50 goes into slot 1.
Insert 700 and 76: 700 mod 7 = 0 and 76 mod 7 = 6, so 700 goes into slot 0 and 76 into slot 6.

Separate Chaining
Keys: 50, 700, 76, 85, 92, 73, 101.
Insert 85: 85 mod 7 = 1. Collision occurs, add to chain. Slot 1 now holds 50 -> 85.
Insert 92: 92 mod 7 = 1. Collision occurs, add to chain. Slot 1 now holds 50 -> 85 -> 92.

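Separate Chaining
Keys: 50, 700, 76, 85, 92, 73, 101.
Insert 73: 73 mod 7 = 3, so 73 goes into slot 3.
Insert 101: 101 mod 7 = 3. Collision occurs, add to chain. Slot 3 now holds 73 -> 101.
Final table: slot 0: 700; slot 1: 50 -> 85 -> 92; slot 3: 73 -> 101; slot 6: 76; slots 2, 4 and 5 are empty.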
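The worked example above can be reproduced with a short sketch. This is a minimal illustration, not the lecture's implementation: it assumes integer keys, and it uses a Python list for each chain in place of a singly linked list. The name build_chained_table is made up for this example.

```python
def build_chained_table(keys, size=7):
    """Insert integer keys into `size` buckets using separate chaining."""
    # Assumption: a Python list stands in for each singly linked chain.
    table = [[] for _ in range(size)]
    for key in keys:
        # Colliding keys are appended to the end of the chain for that slot.
        table[key % size].append(key)
    return table


table = build_chained_table([50, 700, 76, 85, 92, 73, 101])
for index, chain in enumerate(table):
    print(index, chain)
# Slot 1 holds [50, 85, 92] and slot 3 holds [73, 101].
```

Appending to the tail keeps the chain in insertion order, which matches the diagrams; inserting at the head would also be valid and is O(1) for a real linked list.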

Separate Chaining
Advantages:
1) Simple to implement.
2) The hash table never fills up; we can always add more elements to a chain.
Disadvantages:
1) Wastage of space (some parts of the hash table are never used).
2) If a chain becomes long, search time can become O(n) in the worst case.
3) Uses extra space for links.

Open Addressing
In open addressing, all keys are stored in the hash table itself. This approach is also known as closed hashing. A collision is resolved by probing, that is, by examining other locations in the table until a free one is found.

Open Addressing - Linear Probing
The interval between probes is fixed at 1. In linear probing, we search the hash table sequentially, starting from the original hash location. If a location is occupied, we check the next location. We wrap around from the last table location to the first table location if necessary. The rehash function is: rehash(key) = (n + 1) % tablesize, where n is the location that was just probed.

Linear Probing
rehash(key) = (n + 1) % tablesize
- If slot hash(x) % tablesize is full, we try (hash(x) + 1) % tablesize.
- If (hash(x) + 1) % tablesize is also full, we try (hash(x) + 2) % tablesize.
- If (hash(x) + 2) % tablesize is also full, we try (hash(x) + 3) % tablesize.
- And so on, until an empty slot is found.
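The probe order described above can be listed explicitly. The sketch below assumes integer keys hashed as key % table_size; the function name linear_probe_sequence is illustrative and does not come from the lecture.

```python
def linear_probe_sequence(key, table_size):
    """Return the slots linear probing examines for `key`, in order."""
    home = key % table_size
    # Step 0 is the home slot; each later step moves one slot on, wrapping around.
    return [(home + step) % table_size for step in range(table_size)]


print(linear_probe_sequence(85, 7))  # [1, 2, 3, 4, 5, 6, 0]
```

Because the step is always 1, the sequence visits every slot exactly once before returning to the home slot.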

Linear Probing
Let us consider a simple hash function, "key mod 7", and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
Initial empty hash table: slots 0 to 6 are all empty.
Insert 50: 50 mod 7 = 1, so 50 goes into slot 1.
Insert 700 and 76: 700 mod 7 = 0 and 76 mod 7 = 6, so 700 goes into slot 0 and 76 into slot 6.

Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
Insert 85: 85 mod 7 = 1. Collision occurs, insert 85 at the next free slot, which is slot 2.
Table: slot 0 = 700, slot 1 = 50, slot 2 = 85, slot 6 = 76.

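Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
Insert 92: 92 mod 7 = 1. Collision occurs, as 50 is already at index 1. Slot 2 holds 85, so 92 is inserted at the next free slot, which is slot 3.
Table: slot 0 = 700, slot 1 = 50, slot 2 = 85, slot 3 = 92, slot 6 = 76.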

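Linear Probing
Keys: 50, 700, 76, 85, 92, 73, 101.
Insert 73 and 101: both hash to slot 3, which holds 92. 73 goes into the next free slot, slot 4, and 101 into slot 5.
Final table: slot 0 = 700, slot 1 = 50, slot 2 = 85, slot 3 = 92, slot 4 = 73, slot 5 = 101, slot 6 = 76.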
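The insertions in this example can be traced with a small sketch. It assumes integer keys, a fixed-size Python list as the table, and None as the marker for an empty slot; insert_linear is a name chosen for this illustration.

```python
EMPTY = None  # assumption: None marks a slot that holds no record


def insert_linear(table, key):
    """Insert `key` using linear probing and return the slot it lands in."""
    size = len(table)
    index = key % size
    for _ in range(size):
        if table[index] is EMPTY:
            table[index] = key
            return index
        index = (index + 1) % size  # rehash: move to the next slot, wrapping around
    raise RuntimeError("hash table is full")


table = [EMPTY] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    insert_linear(table, key)
print(table)  # [700, 50, 85, 92, 73, 101, 76]
```

The keys 85, 92, 73 and 101 all end up in one contiguous run after slot 1, which is the clustering effect that linear probing is known for.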

Linear Probing
One of the problems with linear probing is that table items tend to cluster together in the hash table. The table ends up containing groups of consecutively occupied locations, a phenomenon called primary clustering. Clusters can grow close to one another and merge into a larger cluster. Thus, one part of the table might be quite dense even though another part has relatively few items. Clustering causes long probe sequences and therefore decreases overall efficiency.

Linear Probing Insertion
The insertion algorithm is as follows:
- Use the hash function to find the index for a record.
- If that spot is already in use, use the next available spot at a "higher" index.
- Treat the hash table as if it were circular: if you hit the end of the hash table, go back to the front.
Each contiguous group of records (records in adjacent indices without any empty spots between them) in the table is called a cluster.

Linear Probing Searching
The search algorithm is as follows:
- Use the hash function to find the index where an item should be.
- If it isn't there, search the records after that hash location (remember to treat the table as circular) until either the item is found or an empty spot is found.
- If an empty spot is reached before the record is found, the record is not in the table.
NOTE: Do not search the whole array until you get back to the starting index. As soon as you see an empty spot, the search must stop.
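The search steps above might look like the following sketch. It assumes integer keys hashed as key % size and None as the empty marker; search_linear and the -1 "not found" return value are conventions chosen for this example.

```python
def search_linear(table, key):
    """Return the index of `key`, or -1 if it is not in the table."""
    size = len(table)
    index = key % size
    for _ in range(size):           # safety bound in case the table is completely full
        if table[index] is None:    # empty spot: the key cannot be further along
            return -1
        if table[index] == key:
            return index
        index = (index + 1) % size  # treat the table as circular
    return -1


# Table after inserting 50, 700, 76, 85, 92 with "key mod 7" and linear probing.
table = [700, 50, 85, 92, None, None, 76]
print(search_linear(table, 92))  # 3
print(search_linear(table, 8))   # -1 (probes slots 1, 2, 3, then stops at empty slot 4)
```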

Linear Probing Removal
The removal algorithm is a bit trickier because, after an object is removed, records in the same cluster with a higher index than the removed object may have to be moved. Otherwise the empty spot left by the removal will cause valid searches to fail. The algorithm is as follows:
- Find the record and remove it, making the spot empty.
- For each record that follows it in the cluster, do the following:
  - Determine the hash index of the record.
  - Determine whether the empty spot lies between the hash index and the current location of the record.
  - If it does, move the record to the empty spot; the record's old location is now the empty spot.
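One way to sketch this removal is shown below. It assumes integer keys, None as the empty marker, and a table that is not searched beyond one full pass. The "empty spot lies between hash index and current location" test is expressed here as a comparison of circular distances; that formulation and the name remove_linear are choices made for this example.

```python
def remove_linear(table, key):
    """Remove `key` and close the gap so later searches still succeed."""
    size = len(table)
    index = key % size
    for _ in range(size):                  # locate the record first
        if table[index] is None:
            return False
        if table[index] == key:
            break
        index = (index + 1) % size
    else:
        return False                       # full pass without finding the key

    table[index] = None
    hole = index
    j = (hole + 1) % size
    while table[j] is not None:            # walk the rest of the cluster
        home = table[j] % size
        # The record may move back if its home slot is not strictly after the hole.
        if (j - home) % size >= (j - hole) % size:
            table[hole] = table[j]
            table[j] = None
            hole = j
        j = (j + 1) % size
    return True


table = [700, 50, 85, 92, 73, 101, 76]
remove_linear(table, 85)
print(table)  # [700, 50, 92, 73, 101, None, 76]
```

After the removal, 92, 73 and 101 have each moved one slot closer to their home positions, so a search that stops at the first empty slot still finds them.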

Linear Probing Removal (More Efficient Algorithm)
[Figure: a hash table holding B, A, D and R, shown before and after Delete(D).]

Linear Probing
Unfortunately, simply blanking the deleted element has a negative side effect on the search operation. Since data retrieval relies on blank hash elements as the signal to stop probing, a deletion can make some data items unfindable. Consider a search for 'R' (which has the same hash code as 'A') after 'D' has been deleted: 'R' will never be found, because probing terminates too early at the now-blank element that used to store 'D' and used to keep the probing going.

Linear Probing
The solution to this problem is to define two different kinds of blank hash elements:
- a purely empty element, which has never stored data; and
- an empty but deleted element, which stored data that has since been deleted.

Linear Probing
These two kinds distinguish how a blank hash element came to exist, which is what is needed to make the hash search work again. When a data item is deleted, its element is not completely cleared but is instead given the "empty but deleted" mark. The search function must then be modified so that it terminates probing only on a purely empty element and continues probing when it encounters an "empty but deleted" element.
Table contents after Delete(D): B, A, [empty but deleted], R.

Linear Probing
Table contents after Delete(D): B, A, [empty but deleted], R.
An add operation can store data in the "empty but deleted" element. As the deleted flag is only needed to keep searches going, adding data to one of these elements makes it work like just another normal element again (as far as the probing algorithm is concerned).
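The two kinds of blank element can be sketched as follows. This is a minimal illustration under stated assumptions: integer keys hashed as key % size, None for a purely empty element, a unique sentinel object for "empty but deleted", and no check for duplicate keys on insertion. The names EMPTY, DELETED, insert, search and delete are chosen for this example.

```python
EMPTY = None        # purely empty: has never stored data
DELETED = object()  # empty but deleted: stored data that was later removed


def insert(table, key):
    """Store `key` in the first purely empty or deleted slot on its probe path."""
    size = len(table)
    for step in range(size):
        index = (key + step) % size
        if table[index] is EMPTY or table[index] is DELETED:
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


def search(table, key):
    """Return the index of `key`, or -1. Probing stops only at a purely empty slot."""
    size = len(table)
    for step in range(size):
        index = (key + step) % size
        if table[index] is EMPTY:
            return -1
        if table[index] is not DELETED and table[index] == key:
            return index
    return -1


def delete(table, key):
    """Mark the slot holding `key` as deleted instead of clearing it."""
    index = search(table, key)
    if index == -1:
        return False
    table[index] = DELETED
    return True


table = [EMPTY] * 7
for key in [50, 85, 92]:   # all three hash to slot 1
    insert(table, key)
delete(table, 85)          # slot 2 becomes "empty but deleted"
print(search(table, 92))   # 3: the search continues past the deleted slot
print(insert(table, 15))   # 2: the deleted slot is reused
```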

Open Addressing - Quadratic Probing
The problem of primary clustering can be eliminated if we use the quadratic probing method. In quadratic probing, we start from the original hash location $i$. If a location is occupied, we check the locations $i + 1^2,\ i + 2^2,\ i + 3^2,\ i + 4^2, \ldots$ We wrap around from the last table location to the first table location if necessary.

Quadratic Probing – Example 
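Table size 7, hash function key % 7:
Insert 76: 76 % 7 = 6; slot 6 is empty (1 probe).
Insert 40: 40 % 7 = 5; slot 5 is empty (1 probe).
Insert 48: 48 % 7 = 6; slot 6 is occupied, so try (6 + 1) % 7 = 0 (2 probes).
Insert 5: 5 % 7 = 5; slots 5 and (5 + 1) % 7 = 6 are occupied, so try (5 + 4) % 7 = 2 (3 probes).
Insert 55: 55 % 7 = 6; slots 6 and (6 + 1) % 7 = 0 are occupied, so try (6 + 4) % 7 = 3 (3 probes).
Final table: slot 0 = 48, slot 2 = 5, slot 3 = 55, slot 5 = 40, slot 6 = 76; slots 1 and 4 are empty.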

Quadratic Probing – Example 
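Table size 7, hash function key % 7:
Insert 76: 76 % 7 = 6; slot 6 is empty (1 probe).
Insert 93: 93 % 7 = 2; slot 2 is empty (1 probe).
Insert 40: 40 % 7 = 5; slot 5 is empty (1 probe).
Insert 35: 35 % 7 = 0; slot 0 is empty (1 probe).
Insert 47: 47 % 7 = 5; slot 5 is occupied, and the probes (5 + 1) % 7 = 6, (5 + 4) % 7 = 2 and (5 + 9) % 7 = 0 are all occupied too. Every later probe lands on slot 0, 2, 5 or 6 again, so the empty slots 1, 3 and 4 are never reached (∞ probes).
Table: slot 0 = 35, slot 2 = 93, slot 5 = 40, slot 6 = 76; 47 cannot be placed.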
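Both quadratic probing examples above can be traced with a short sketch. It assumes integer keys hashed as key % size, None as the empty marker, and probe locations (home + k * k) % size for k = 0, 1, 2, and so on. The name insert_quadratic and the convention of returning None on failure are choices made for this example.

```python
def insert_quadratic(table, key):
    """Insert `key` with quadratic probing; return the probe count, or None on failure."""
    size = len(table)
    home = key % size
    # k * k modulo size repeats after `size` steps, so no new slot appears beyond that.
    for k in range(size):
        index = (home + k * k) % size
        if table[index] is None:
            table[index] = key
            return k + 1
    return None


first = [None] * 7
print([insert_quadratic(first, key) for key in [76, 40, 48, 5, 55]])
# [1, 1, 2, 3, 3]

second = [None] * 7
print([insert_quadratic(second, key) for key in [76, 93, 40, 35, 47]])
# [1, 1, 1, 1, None]
```

In the second run the table has three free slots, yet 47 cannot be placed, because its probe sequence only ever visits slots 5, 6, 2 and 0.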

Quadratic Probing
With linear probing we know that we will always find an open spot if one exists (it might be a long search, but we will find it). However, this is not the case with quadratic probing unless the table size is chosen with care. To guarantee that quadratic probing will eventually find an available spot, the table must meet these requirements:
- Its size must be a prime number.
- It must never be more than half full (even by one element).

Quadratic Probing
Limitation: at most half of the table can be used as alternative locations to resolve collisions. This means that once the table is more than half full, it is difficult to find an empty spot. Quadratic probing also introduces a new problem known as secondary clustering: elements that hash to the same location always probe the same sequence of alternative cells.

Double Hashing
Double hashing works on a similar idea to linear and quadratic probing: use a big table and hash into it. Whenever a collision occurs, choose another spot in the table to put the value. The difference is that instead of taking the next opening, a second hash function is used to determine the location of the next spot.

Double Hashing
For example, given hash functions hash1 and hash2 and a key, do the following:
1) Check location hash1(key). If it is empty, put the record in it.
2) If it is not empty, calculate hash2(key).
3) Check whether hash1(key) + hash2(key) is open. If it is, put the record there.
4) Otherwise repeat with hash1(key) + 2·hash2(key), hash1(key) + 3·hash2(key) and so on (wrapping around the table), until an opening is found.
Note: You must take care in choosing hash2.
- hash2 must never return 0.
- hash2 must be chosen so that all cells will eventually be probed.
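The requirement that all cells are eventually probed can be checked by listing the probe sequence. The sketch below takes the home slot hash1(key) and the step hash2(key) as plain integers; the function name probe_sequence and the sample values are assumptions made for this illustration.

```python
def probe_sequence(home, step, table_size):
    """Slots visited by double hashing: home, home + step, home + 2*step, ... (mod size)."""
    return [(home + i * step) % table_size for i in range(table_size)]


# Prime table size 7: a non-zero step below 7 reaches every slot.
print(probe_sequence(5, 3, 7))  # [5, 1, 4, 0, 3, 6, 2]

# Table size 8 with step 4: the step shares a factor with the size,
# so only two slots are ever probed.
print(probe_sequence(5, 4, 8))  # [5, 1, 5, 1, 5, 1, 5, 1]
```

A prime table size is a common way to satisfy the note above, since every step between 1 and size - 1 is then coprime to the size.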

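Double Hashing – Example
One good choice is to pick a prime number R < Size and use: Hash2(x) = R - (x mod R)
With table size 7 and R = 5:
Insert 76: 76 % 7 = 6; slot 6 is empty (1 probe).
Insert 93: 93 % 7 = 2; slot 2 is empty (1 probe).
Insert 40: 40 % 7 = 5; slot 5 is empty (1 probe).
Insert 47: 47 % 7 = 5; slot 5 is occupied. Hash2(47) = 5 - (47 % 5) = 3, so try (5 + 3) % 7 = 1 (2 probes).
Table so far: slot 1 = 47, slot 2 = 93, slot 5 = 40, slot 6 = 76.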

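Double Hashing – Example
Insert 10: 10 % 7 = 3; slot 3 is empty (1 probe).
Insert 55: 55 % 7 = 6; slot 6 is occupied. Hash2(55) = 5 - (55 % 5) = 5, so try (6 + 5) % 7 = 4 (2 probes).
Final table: slot 1 = 47, slot 2 = 93, slot 3 = 10, slot 4 = 55, slot 5 = 40, slot 6 = 76; slot 0 is empty.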
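The double hashing example can be traced with a short sketch. It assumes integer keys, a table of size 7 with None as the empty marker, hash1(key) = key % 7, and Hash2(x) = R - (x mod R) with R = 5, as in the example. The names hash2 and insert_double are chosen for this illustration.

```python
R = 5  # prime smaller than the table size, as in the example


def hash2(key):
    """Second hash function: always in 1..R, so it never returns 0."""
    return R - (key % R)


def insert_double(table, key):
    """Insert `key` with double hashing and return the number of probes used."""
    size = len(table)
    home = key % size
    step = hash2(key)
    for i in range(size):
        index = (home + i * step) % size
        if table[index] is None:
            table[index] = key
            return i + 1
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 7
probes = [insert_double(table, key) for key in [76, 93, 40, 47, 10, 55]]
print(probes)  # [1, 1, 1, 2, 1, 2]
print(table)   # [None, 47, 93, 10, 55, 40, 76]
```

Keys 40 and 47 share the home slot 5 but have different steps (5 and 3), so they follow different probe sequences, which is what distinguishes double hashing from linear and quadratic probing.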
