Hashing: storing and retrieving an item by its key


1 Hashing
Problem: storing and retrieving an item using its key (for example, an ID number or a name).
A linked list takes O(N) time.
A binary search tree takes O(log N) time.
An array indexed directly by the key takes O(1) time.
Intro. to Data Structures, Chapter 5: Hashing. Veera Muangsin, Dept. of Computer Engineering, Chulalongkorn University

2 Array
[Figure: a large array indexed directly by ID (last index shown: 99999) holding two records: ID 4112041, Name: Somsri, Faculty: Science; and ID 4163490, Name: Sompong, Faculty: Engineering]
Problem: a lot of empty space.

3 Hashing
[Figure: the same two records (ID 4112041, Somsri, Science; ID 4163490, Sompong, Engineering) stored in a smaller array (last index shown: 9999)]
Map the key to a number between 0 and ArraySize-1.

4 Hashing
Map the key into an array position using a “hash function”:
    ArrayIndex = hash(key)
Takes O(1) time to access an item.
Much less empty space than with a directly indexed array.

5 Hash Function
Must return a valid array index.
Should be a 1-to-1 mapping: if key1 != key2 then hash(key1) != hash(key2).
A collision occurs when two distinct keys hash to the same location in the array.
Should distribute the keys evenly: any key value k is equally likely to hash to any of the m array locations.

6 Simple Hash Function
    ArrayIndex = key mod TableSize
Example: [figure]
TableSize should be a prime number for even distribution.
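
The mod-based hash above can be tried on the two student IDs from the earlier slides. This is only a sketch: the class name SimpleHashDemo and the table size 10007 (a prime picked for illustration) are assumptions, not part of the original deck, and keys are assumed to be non-negative.

```java
// Sketch only: SimpleHashDemo and the table size 10007 are illustrative assumptions.
class SimpleHashDemo {
    // ArrayIndex = key mod TableSize (key assumed non-negative).
    static int hash(int key, int tableSize) {
        return key % tableSize;
    }

    public static void main(String[] args) {
        int tableSize = 10007; // a prime, chosen here only as an example
        System.out.println(hash(4112041, tableSize)); // 9171
        System.out.println(hash(4163490, tableSize)); // 578
    }
}
```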

7 Another Hash Function
    ArrayIndex = (k0 + 37*k1 + 37^2*k2 + ...) mod TableSize
Example: 3-character key
    ArrayIndex = (k0 + 37*k1 + 37^2*k2) mod TableSize
               = (k0 + 37 * (k1 + 37 * k2)) mod TableSize

8 Hash Function
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;
        // evaluate the polynomial in 37 one character at a time
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );
        hashVal %= tableSize;
        if( hashVal < 0 )    // overflow made the value negative
            hashVal += tableSize;
        return hashVal;
    }

9 Collision
When an element is inserted, if it hashes to the same value as an already inserted element, then we have a collision.
Collision resolution techniques:
    Separate Chaining
    Open Addressing: Linear Probing, Quadratic Probing, Double Hashing

10 Separate Chaining
[Figure: separate chaining hash table (last index shown: 999)]

11 Separate Chaining
Load factor λ = number of elements / table size
Average length of a list = λ
A successful search costs about 1 + (λ/2) link traversals.
The cost depends on λ.
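
As a rough illustration of the load factor, the sketch below builds a small chained table with the standard java.util collections and reports the number of elements divided by the table size. The class name, the sample keys, and the table size 7 are all made up for this example; keys are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch only: names, keys, and table size are illustrative assumptions.
class ChainingLoadFactorDemo {
    // Builds a chained table and returns elements / table size.
    static double loadFactor(int[] keys, int tableSize) {
        List<LinkedList<Integer>> table = new ArrayList<>();
        for (int i = 0; i < tableSize; i++) table.add(new LinkedList<>());
        for (int key : keys) table.get(key % tableSize).add(key);

        int total = 0;
        for (LinkedList<Integer> chain : table) total += chain.size();
        return (double) total / tableSize; // also the average chain length
    }

    public static void main(String[] args) {
        int[] keys = {12, 25, 31, 44, 58, 63, 77, 80, 91, 105, 118, 120, 133, 146};
        double lambda = loadFactor(keys, 7);
        System.out.println("load factor = " + lambda); // 14 elements / 7 cells = 2.0
        System.out.println("successful search ~ " + (1 + lambda / 2) + " link traversals");
    }
}
```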

12 Separate Chaining: evenly distributed
[Figure: separate chaining hash table with keys spread evenly over the lists (last index shown: 999)]

13 Separate Chaining: last digit is zero
[Figure: separate chaining hash table with the indices 0, 10, 20, ..., 999 labeled]
Solution: make TableSize a prime number.
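
To see why a prime table size helps when every key ends in zero, the following sketch counts how many distinct cells the keys 10, 20, 30, ... occupy under key mod TableSize. The table sizes 10 and 11 and the class name are assumptions chosen for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: table sizes 10 and 11 are illustrative assumptions.
class PrimeTableSizeDemo {
    // Counts the distinct cells hit by the keys 10, 20, ..., 10 * count.
    static int cellsUsed(int count, int tableSize) {
        Set<Integer> cells = new HashSet<>();
        for (int k = 1; k <= count; k++) cells.add((10 * k) % tableSize);
        return cells.size();
    }

    public static void main(String[] args) {
        System.out.println(cellsUsed(100, 10)); // 1: every key lands in cell 0
        System.out.println(cellsUsed(100, 11)); // 11: all cells of the prime-sized table are used
    }
}
```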

14 Open Addressing
No linked lists. All items are stored in the array itself.
If a collision occurs, alternative locations are tried until an empty cell is found:
    try h_0(x), h_1(x), h_2(x), ...
    h_i(x) = (hash(x) + f(i)) mod TableSize
    f(i) is the collision resolution strategy
Requires a bigger table: the load factor λ should be below 0.5.

15 Linear Probing
If a collision occurs, try the next cell sequentially.
    f(i) = i
    h_i(x) = (hash(x) + i) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1) mod TableSize, (hash(x) + 2) mod TableSize, (hash(x) + 3) mod TableSize, ...

16 Linear Probing
Insert: 89, 18, 49, 58, 69
    Cell:  0   1   2   3   4   5   6   7   8   9
    Key:   49  58  69  -   -   -   -   -   18  89
89 is directly inserted into cell 9.
18 is directly inserted into cell 8.
49 has a collision at cell 9 and is finally put into cell 0.
58 has collisions at cells 8, 9, 0 and is finally put into cell 1.
69 has collisions at cells 9, 0, 1 and is finally put into cell 2.
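
The insertion sequence above can be replayed with a few lines of code. This is a minimal sketch, assuming hash(x) = x mod 10, non-negative integer keys, and -1 as a marker for an empty cell; the class and method names are invented for the example.

```java
import java.util.Arrays;

// Sketch only: EMPTY = -1 and hash(x) = x mod tableSize are assumptions.
class LinearProbingDemo {
    static final int EMPTY = -1;

    // Inserts each key using h_i(x) = (x + i) mod tableSize.
    static int[] insertAll(int[] keys, int tableSize) {
        int[] table = new int[tableSize];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int pos = key % tableSize;
            // Assumes the table never becomes completely full.
            while (table[pos] != EMPTY) pos = (pos + 1) % tableSize;
            table[pos] = key;
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(table));
        // [49, 58, 69, -1, -1, -1, -1, -1, 18, 89]
    }
}
```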

17 Primary Clustering
Blocks of occupied cells (called clusters) form in the table.
A collision occurs if a key is hashed to anywhere in a cluster. There may then be several attempts to resolve the collision before a free space is found, and the new data is added to the cluster.

18 Linear Probing
Problem: primary clustering.
Normal deletion cannot be performed: some later find operations would fail because the chain of collisions that leads to the data is cut. Use lazy deletion instead.
Insertion cost = number of probes to find an empty cell ≈ 1/(fraction of empty cells) = 1/(1 - λ), where λ is the load factor. This estimate treats the probes as independent; primary clustering makes the real cost of linear probing higher.
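
The deletion problem can be reproduced on a tiny linear probing table. The sketch below is not the deck's implementation: it assumes integer keys, hash(x) = x mod 10, and two marker values (EMPTY and DELETED, both names invented here) to contrast emptying a cell with marking it as deleted.

```java
import java.util.Arrays;

// Sketch only: the markers EMPTY and DELETED and all method names are assumptions.
class LazyDeletionDemo {
    static final int EMPTY = -1;
    static final int DELETED = -2; // lazy-deletion marker

    static int[] newTable(int size) {
        int[] table = new int[size];
        Arrays.fill(table, EMPTY);
        return table;
    }

    // Linear probing insert; assumes the table never becomes full.
    static void insert(int[] table, int key) {
        int pos = key % table.length;
        while (table[pos] != EMPTY) pos = (pos + 1) % table.length;
        table[pos] = key;
    }

    // Returns the cell holding key, or -1. The search stops at the first EMPTY cell
    // but walks past DELETED cells.
    static int find(int[] table, int key) {
        int pos = key % table.length;
        for (int probes = 0; probes < table.length && table[pos] != EMPTY; probes++) {
            if (table[pos] == key) return pos;
            pos = (pos + 1) % table.length;
        }
        return -1;
    }

    // Overwrites the key's cell with the given marker, if the key is present.
    static void remove(int[] table, int key, int marker) {
        int pos = find(table, key);
        if (pos >= 0) table[pos] = marker;
    }

    public static void main(String[] args) {
        int[] a = newTable(10);
        insert(a, 89);
        insert(a, 49);                      // collides at cell 9, lands in cell 0
        remove(a, 89, EMPTY);               // normal deletion empties cell 9
        System.out.println(find(a, 49));    // -1: the search stops at the emptied cell

        int[] b = newTable(10);
        insert(b, 89);
        insert(b, 49);
        remove(b, 89, DELETED);             // lazy deletion keeps the probe chain intact
        System.out.println(find(b, 49));    // 0
    }
}
```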

19 Quadratic Probing
Eliminates primary clustering.
    f(i) = i^2
    h_i(x) = (hash(x) + i^2) mod TableSize
Try hash(x) mod TableSize, (hash(x) + 1^2) mod TableSize, (hash(x) + 2^2) mod TableSize, (hash(x) + 3^2) mod TableSize, ...
The table must be at most half full and the table size must be prime; otherwise an insertion may fail (every probe hits an occupied cell).

20 Quadratic Probing
Insert: 89, 18, 49, 58, 69
    Cell:  0   1   2   3   4   5   6   7   8   9
    Key:   49  -   58  69  -   -   -   -   18  89
Insert 89, try cell 9
Insert 18, try cell 8
Insert 49, try cells 9, 0
Insert 58, try cells 8, 9, 2
Insert 69, try cells 9, 0, 3
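
The same trace can be checked with a small simulation. As before, this is a sketch under assumptions: hash(x) = x mod 10, non-negative integer keys, -1 for an empty cell, and invented class and method names. The loop gives up after TableSize probes because i^2 mod TableSize repeats from that point on.

```java
import java.util.Arrays;

// Sketch only: EMPTY = -1 and hash(x) = x mod tableSize are assumptions.
class QuadraticProbingDemo {
    static final int EMPTY = -1;

    // Inserts each key using h_i(x) = (x + i^2) mod tableSize.
    // Throws if none of the reachable cells is empty.
    static int[] insertAll(int[] keys, int tableSize) {
        int[] table = new int[tableSize];
        Arrays.fill(table, EMPTY);
        for (int key : keys) {
            int home = key % tableSize;
            boolean placed = false;
            for (int i = 0; i < tableSize && !placed; i++) {
                int pos = (home + i * i) % tableSize;
                if (table[pos] == EMPTY) {
                    table[pos] = key;
                    placed = true;
                }
            }
            if (!placed) throw new IllegalStateException("insertion failed for " + key);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] table = insertAll(new int[] {89, 18, 49, 58, 69}, 10);
        System.out.println(Arrays.toString(table));
        // [49, -1, 58, 69, -1, -1, -1, -1, 18, 89]
    }
}
```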

21 Quadratic Probing
Insert: 10, 20, 30, 40, 50, 60, 70
    Cell:  0   1   2   3   4   5   6   7   8   9
    Key:   10  20  -   -   30  60  50  -   -   40
Insert 10, try cell 0
Insert 20, try cells 0, 1
Insert 30, try cells 0, 1, 4
Insert 40, try cells 0, 1, 4, 9
Insert 50, try cells 0, 1, 4, 9, 6 (16)
Insert 60, try cells 0, 1, 4, 9, 6 (16), 5 (25)
Insert 70, try cells 0, 1, 4, 9, 6 (16), 5 (25), 6 (36), 9 (49), 4 (64), 1 (81), 0 (100), 1 (121), 4 (144), 9 (169), 6 (196), ...
The insertion of 70 never succeeds, even though cells 2, 3, 7 and 8 are still empty.
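
The failure above comes from the small set of cells that the quadratic sequence can reach. The sketch below lists that set for a home cell of 0; the class name and the comparison with table size 11 are additions made for illustration only.

```java
import java.util.TreeSet;

// Sketch only: the comparison with table size 11 is an illustrative assumption.
class QuadraticReachDemo {
    // Cells that h_i(x) = (home + i^2) mod tableSize can ever visit.
    static TreeSet<Integer> reachable(int home, int tableSize) {
        TreeSet<Integer> cells = new TreeSet<>();
        // i^2 mod tableSize repeats with period tableSize, so these probes cover every case.
        for (int i = 0; i < tableSize; i++) cells.add((home + i * i) % tableSize);
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(reachable(0, 10)); // [0, 1, 4, 5, 6, 9]
        System.out.println(reachable(0, 11)); // [0, 1, 3, 4, 5, 9]
    }
}
```

With table size 10, only six cells are reachable, and after six insertions that all start from cell 0 every one of them is taken. With the prime size 11 the sequence also reaches six cells, which is more than half of the table, so an insertion into a table that is at most half full still finds an empty cell.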

22 Quadratic Probing
Secondary clustering: elements that hash to the same position probe the same alternative cells and are put into the next available space, forming a cluster.
In the first example, inserting 89, 49, 69 forms a secondary cluster. Inserting 18, 58 forms another secondary cluster.

23 Double Hashing
    f(i) = i * hash2(x)
    h_i(x) = (hash(x) + i * hash2(x)) mod TableSize
Try hash(x) mod TableSize, (hash(x) + hash2(x)) mod TableSize, (hash(x) + 2*hash2(x)) mod TableSize, ...
Example: hash2(x) = R - (x mod R), where R is a prime number smaller than TableSize.

24 Double Hashing
Insert: 89, 18, 49, 58, 69, 23
    hash2(49) = 7 - (49 mod 7) = 7
    hash2(58) = 7 - (58 mod 7) = 5
    hash2(69) = 7 - (69 mod 7) = 1
    hash2(23) = 7 - (23 mod 7) = 5
Insert 49, try 9, (9 + 7) mod 10 = 6
Insert 58, try 8, (8 + 5) mod 10 = 3
Insert 69, try 9, (9 + 1) mod 10 = 0
Insert 23, try 3, (3 + 5) mod 10 = 8, (3 + 10) mod 10 = 3, (3 + 15) mod 10 = 8, ...
    Cell:  0   1   2   3   4   5   6   7   8   9
    Key:   69  -   -   58  -   -   49  -   18  89
23 can never be inserted: the probes alternate between cells 3 and 8 because the step 5 and the table size 10 share a common factor. A prime table size avoids this.
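
The double hashing example can be replayed as follows. This is a sketch under the example's own parameters (table size 10, R = 7, hash(x) = x mod 10); the class name, the -1 marker for empty cells, and the convention of returning -1 on failure are assumptions.

```java
import java.util.Arrays;

// Sketch only: EMPTY = -1 and the failure return value -1 are assumptions.
class DoubleHashingDemo {
    static final int EMPTY = -1;
    static final int R = 7; // prime smaller than the table size, as in the example

    static int hash2(int x) {
        return R - (x % R);
    }

    // Tries h_i(x) = (x + i * hash2(x)) mod size for i = 0, 1, ...
    // Returns the cell used, or -1 if every probed cell is occupied.
    static int insert(int[] table, int key) {
        int size = table.length;
        for (int i = 0; i < size; i++) {
            int pos = (key % size + i * hash2(key)) % size;
            if (table[pos] == EMPTY) {
                table[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] table = new int[10];
        Arrays.fill(table, EMPTY);
        for (int key : new int[] {89, 18, 49, 58, 69}) insert(table, key);
        System.out.println(Arrays.toString(table));
        // [69, -1, -1, 58, -1, -1, 49, -1, 18, 89]
        System.out.println(insert(table, 23)); // -1: probes alternate between cells 3 and 8
    }
}
```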

25 Rehashing
When the table is too full, create a new table at least twice as big (with a prime size), compute the new hash value of each element, and insert it into the new table.
Rehash when the table is half full, or when an insertion fails, or when a certain load factor is reached.
Because of lazy deletion, deleted cells are also counted when the load factor is calculated.
Rehashing takes O(N) time, but the cost is shared by the preceding N/2 insertions, so it adds only a constant cost to each insertion.

26
    public interface Hashable
    {
        int hash( int tableSize );
    }

    public class MyInteger implements Comparable, Hashable
    {
        // remaining members are not shown on the slide
        public int hash( int tableSize )
        {
            if( value < 0 )
                return -value % tableSize;
            else
                return value % tableSize;
        }
    }

27
    public static void main( String [ ] args )
    {
        SeparateChainingHashTable H = new SeparateChainingHashTable( );
        final int NUMS = 4000;
        final int GAP = 37;

        // insert the values 1 to NUMS-1, stepping by GAP modulo NUMS
        for( int i = GAP; i != 0; i = ( i + GAP ) % NUMS )
            H.insert( new MyInteger( i ) );
        // remove the odd values
        for( int i = 1; i < NUMS; i += 2 )
            H.remove( new MyInteger( i ) );
        // every even value should still be found
        for( int i = 2; i < NUMS; i += 2 )
            if( ((MyInteger)(H.find( new MyInteger( i ) ))).intValue( ) != i )
                System.out.println( "Find fails " + i );
    }

28
    public class SeparateChainingHashTable
    {
        private LinkedList[ ] theLists;
        public SeparateChainingHashTable( )
        public SeparateChainingHashTable( int size )
        public void insert( Hashable x )
        public void remove( Hashable x )
        public Hashable find( Hashable x )
        public void makeEmpty( )
        public static int hash( String key, int tableSize )
        private static final int DEFAULT_TABLE_SIZE = 101;
        private static int nextPrime( int n )
        private static boolean isPrime( int n )
    }

29
    public class SeparateChainingHashTable
    {
        public SeparateChainingHashTable( )
        {
            this( DEFAULT_TABLE_SIZE );
        }
        public SeparateChainingHashTable( int size )
        {
            // round the size up to a prime and create one empty list per cell
            theLists = new LinkedList[ nextPrime( size ) ];
            for( int i = 0; i < theLists.length; i++ )
                theLists[ i ] = new LinkedList( );
        }
        public void makeEmpty( )
        {
            for( int i = 0; i < theLists.length; i++ )
                theLists[ i ].makeEmpty( );
        }

30
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;
        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );
        hashVal %= tableSize;
        if( hashVal < 0 )    // overflow made the value negative
            hashVal += tableSize;
        return hashVal;
    }

31
    public void insert( Hashable x )
    {
        LinkedList whichList = theLists[ x.hash( theLists.length ) ];
        LinkedListItr itr = whichList.find( x );
        if( itr.isPastEnd( ) )    // x is not already in the list
            whichList.insert( x, whichList.zeroth( ) );
    }
    public void remove( Hashable x )
    {
        theLists[ x.hash( theLists.length ) ].remove( x );
    }
    public Hashable find( Hashable x )
    {
        return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
    }

32
    public class Employee implements Hashable
    {
        public int hash( int tableSize )
        {
            return SeparateChainingHashTable.hash( name, tableSize );
        }
        public boolean equals( Object rhs )
        {
            return name.equals( ((Employee)rhs).name );
        }
        private String name;
        private double salary;
        private int seniority;
    }

33
    public class QuadraticProbingHashTable
    {
        public static final int DEFAULT_TABLE_SIZE = 11;
        protected HashEntry [ ] array;
        private int currentSize;
        public QuadraticProbingHashTable( )
        public QuadraticProbingHashTable( int size )
        public void makeEmpty( )
        public Hashable find( Hashable x )
        public void insert( Hashable x )
        public void remove( Hashable x )
        public static int hash( String key, int tableSize )
    }

34
    class HashEntry
    {
        Hashable element;    // the element
        boolean isActive;    // false if deleted

        public HashEntry( Hashable e )
        {
            this( e, true );
        }
        public HashEntry( Hashable e, boolean i )
        {
            element = e;
            isActive = i;
        }
    }

35
    public class QuadraticProbingHashTable
    {
        public QuadraticProbingHashTable( )
        {
            this( DEFAULT_TABLE_SIZE );
        }
        public QuadraticProbingHashTable( int size )
        {
            allocateArray( size );
            makeEmpty( );
        }
        public void makeEmpty( )
        {
            currentSize = 0;
            for( int i = 0; i < array.length; i++ )
                array[ i ] = null;
        }
        private void allocateArray( int arraySize )
        {
            array = new HashEntry[ arraySize ];
        }

36
    public Hashable find( Hashable x )
    {
        int currentPos = findPos( x );
        return isActive( currentPos ) ? array[ currentPos ].element : null;
    }
    private int findPos( Hashable x )
    {
        int collisionNum = 0;
        int currentPos = x.hash( array.length );
        while( array[ currentPos ] != null &&
               !array[ currentPos ].element.equals( x ) )
        {
            currentPos += 2 * ++collisionNum - 1;    // move to the next quadratic probe
            if( currentPos >= array.length )
                currentPos -= array.length;
        }
        return currentPos;
    }
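
The increment 2 * ++collisionNum - 1 in findPos relies on the identity i^2 - (i-1)^2 = 2i - 1, so adding 2i - 1 at the i-th collision yields a total offset of i^2 without any multiplication by i. The sketch below (class and method names are invented) accumulates the offsets the same way and compares them with i^2. Note also that findPos wraps around with a single subtraction, which appears to rely on the increment staying below the table length; the half-full rule is presumably what keeps it so.

```java
// Sketch only: IncrementalSquareDemo and offsetAfter are illustrative names.
class IncrementalSquareDemo {
    // Total offset from the home cell after the given number of collisions,
    // accumulated the way findPos does it: add 2*i - 1 at the i-th collision.
    static int offsetAfter(int collisions) {
        int offset = 0;
        int collisionNum = 0;
        for (int step = 0; step < collisions; step++)
            offset += 2 * ++collisionNum - 1;
        return offset;
    }

    public static void main(String[] args) {
        for (int i = 0; i <= 5; i++)
            System.out.println(i + ": " + offsetAfter(i) + " == " + (i * i));
    }
}
```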

37
    private boolean isActive( int currentPos )
    {
        return array[ currentPos ] != null && array[ currentPos ].isActive;
    }
    public void insert( Hashable x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )    // x is already present
            return;
        array[ currentPos ] = new HashEntry( x, true );
        if( ++currentSize > array.length / 2 )    // more than half full
            rehash( );
    }
    public void remove( Hashable x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].isActive = false;    // lazy deletion
    }

38
    private void rehash( )
    {
        HashEntry [ ] oldArray = array;

        // Create a new double-sized, empty table
        allocateArray( nextPrime( 2 * oldArray.length ) );
        currentSize = 0;

        // Copy table over
        for( int i = 0; i < oldArray.length; i++ )
            if( oldArray[ i ] != null && oldArray[ i ].isActive )
                insert( oldArray[ i ].element );
        return;
    }

39
    private static int nextPrime( int n )
    {
        if( n % 2 == 0 )
            n++;
        // test successive odd numbers until a prime is found
        for( ; !isPrime( n ); n += 2 )
            ;
        return n;
    }
    private static boolean isPrime( int n )
    {
        if( n == 2 || n == 3 )
            return true;
        if( n == 1 || n % 2 == 0 )
            return false;
        // trial division by odd numbers up to the square root of n
        for( int i = 3; i * i <= n; i += 2 )
            if( n % i == 0 )
                return false;
        return true;
    }

40 Summary
Insert and find take constant time on average.
The load factor affects performance.
The load factor of separate chaining hashing should be close to 1.
The load factor of open addressing hashing should not exceed 0.5.

41 Summary
Hashing is good when ordering information is not required.
Applications: symbol tables, on-line spelling checkers.

