Hashing

Hashing as a Data Structure
Performs operations in constant average time, O(1)
–Insert
–Delete
–Find
Is not suitable for
–FindMin
–FindMax
–Sort or output as sorted

General Idea
Array of fixed size (TableSize)
Search is performed on some part of the item (the Key)
Each key is mapped into some number between 0 and TableSize-1
The mapping is called a hash function

Hash Functions
Easy to compute
–Key is of type Integer
–Key is of type String

Hash Function (Integer)
Simply return Key % TableSize
Choose TableSize carefully
–If TableSize is 10 and all keys end in zero, every key hashes to cell 0
To avoid such pitfalls, choose TableSize to be a prime number
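A minimal sketch of the integer hash function above. The class name IntHashDemo, the sample keys, and the adjustment for negative keys are illustrative assumptions, not part of the slides. It shows keys that all end in zero collapsing into one cell when TableSize is 10 and spreading out when TableSize is the prime 11.

```java
class IntHashDemo {
    // Key % TableSize, adjusted so negative keys still land in 0..tableSize-1
    // (the adjustment is an assumption; the slide only shows Key % TableSize).
    static int hash(int key, int tableSize) {
        int h = key % tableSize;
        return h < 0 ? h + tableSize : h;
    }

    public static void main(String[] args) {
        int[] keys = {10, 20, 30, 40, 50};   // illustrative keys, all ending in zero
        for (int key : keys)
            System.out.println(key + " -> cell " + hash(key, 10)
                    + " (TableSize 10), cell " + hash(key, 11) + " (TableSize 11)");
    }
}
```

With TableSize 10 every sample key lands in cell 0; with TableSize 11 they land in cells 10, 9, 8, 7 and 6.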

Hash Function I (String)
Adds up the ASCII values of the characters in the string
Advantage: Simple to implement and computes quickly
Disadvantage: If TableSize is large, the function does not distribute keys well
–Example: Keys are at most 8 characters. The maximum sum is 8*255 = 2040, but TableSize is 10007, so only about 20 percent of the table could be filled.
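Hash Function I could be sketched as follows. The class name AdditiveHash and the sample strings are assumptions made for illustration. The point is that even an 8-character key of high-valued letters produces a sum far below a TableSize of 10007.

```java
class AdditiveHash {
    // Hash Function I: add up the character values, then reduce modulo tableSize.
    static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal += key.charAt(i);
        return hashVal % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("ab", 10007));        // 97 + 98 = 195
        System.out.println(hash("zzzzzzzz", 10007));  // 8 * 122 = 976
    }
}
```

Every key of at most 8 characters ends up in the low cells of the table, so most of the 10007 cells are never used.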

Hash Function II (String)
Assumption: Key has at least 3 characters
Hash Function (26 letters + blank): (key[0] + 27 * key[1] + 729 * key[2]) % TableSize
Advantage: Distributes better than Hash Function I
Disadvantage: Although there are 26^3 = 17,576 possible three-letter combinations, English has only 2,851 different combinations
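A possible rendering of Hash Function II, assuming raw character codes are used for key[0..2]. The class name ThreeCharHash and the sample keys are illustrative. It also shows that only the first three characters influence the result.

```java
class ThreeCharHash {
    // Hash Function II: only the first three characters contribute.
    // Assumes key has at least 3 characters, as the slide states.
    static int hash(String key, int tableSize) {
        return (key.charAt(0) + 27 * key.charAt(1) + 729 * key.charAt(2)) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("abc", 10007));     // 74914 % 10007 = 4865
        System.out.println(hash("abcdef", 10007));  // same prefix, same cell: 4865
    }
}
```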

Hash Function III (String)
Idea: Computes a polynomial function of the Key’s characters
P(Key with n+1 characters) = Key[0] + 37*Key[1] + 37^2*Key[2] + ... + 37^n*Key[n]
If each power 37^k is computed separately and the terms are then summed, the complexity is O(n^2)
Using Horner’s rule the complexity drops to O(n):
((Key[n]*37 + Key[n-1])*37 + ... + Key[1])*37 + Key[0]

Hash Function III (String)
    /**
     * Hash a string with Horner's rule, using 37 as the multiplier.
     */
    public static int hash( String key, int tableSize )
    {
        int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key.charAt( i );

        hashVal %= tableSize;
        if( hashVal < 0 )    // int overflow can leave hashVal negative
            hashVal += tableSize;

        return hashVal;
    }
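The polynomial and its Horner form can be compared directly. The sketch below is an illustration only: the class name HornerDemo is hypothetical, and long arithmetic on a short key is used so that overflow does not obscure the comparison. It follows the formula as written (Key[0] gets the lowest power), whereas the hash method above feeds the characters in from index 0 and therefore gives Key[0] the highest power.

```java
class HornerDemo {
    // Naive evaluation: rebuild 37^k from scratch for every term, O(n^2) multiplications.
    static long naive(String key) {
        long sum = 0;
        for (int k = 0; k < key.length(); k++) {
            long power = 1;
            for (int j = 0; j < k; j++)
                power *= 37;
            sum += power * key.charAt(k);
        }
        return sum;
    }

    // Horner's rule: start from the last character, one multiplication per character, O(n).
    static long horner(String key) {
        long result = 0;
        for (int k = key.length() - 1; k >= 0; k--)
            result = result * 37 + key.charAt(k);
        return result;
    }

    public static void main(String[] args) {
        // 97 + 37*98 + 37^2*99 = 139254 for "abc"
        System.out.println(naive("abc") + " " + horner("abc"));
    }
}
```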

Collision
When an element being inserted hashes to the same value as an already inserted element, we have a collision.
Example: Hash function Key % 10 (keys 15 and 25 both hash to 5)

Solving Collisions
Separate Chaining
Open Addressing
–Linear Probing
–Quadratic Probing
–Double Hashing

Separate Chaining
Keep a list of all elements that hash to the same value
Each element of the hash table is a linked list
Example: x^2 % 10

Separate Chaining
    /**
     * Construct the hash table.
     */
    public SeparateChainingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size approximate table size.
     */
    public SeparateChainingHashTable( int size )
    {
        theLists = new LinkedList[ nextPrime( size ) ];
        for( int i = 0; i < theLists.length; i++ )
            theLists[ i ] = new LinkedList( );
    }

Separate Chaining
Find
–Use the hash function to determine which list to traverse
–Traverse the list to find the element
    public Hashable find( Hashable x )
    {
        return (Hashable)theLists[ x.hash( theLists.length ) ].find( x ).retrieve( );
    }

Separate Chaining
Insert
–Use the hash function to determine in which list to insert
–Insert the element at the front of the list
    public void insert( Hashable x )
    {
        LinkedList whichList = theLists[ x.hash( theLists.length ) ];
        LinkedListItr itr = whichList.find( x );

        // Insert only if x is not already in the list
        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );
    }

Separate Chaining
Delete
–Use the hash function to determine from which list to delete
–Search for the element in the list and delete it
    public void remove( Hashable x )
    {
        theLists[ x.hash( theLists.length ) ].remove( x );
    }
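The find, insert and delete operations above rely on the course's own LinkedList and Hashable classes. As a self-contained sketch of the same idea, the version below swaps in java.util.LinkedList and int keys. The class name ChainingSketch, the fixed table size of 10 and the choice of the first ten perfect squares as keys are assumptions made for illustration.

```java
import java.util.LinkedList;

class ChainingSketch {
    private final LinkedList<Integer>[] lists;

    @SuppressWarnings("unchecked")
    ChainingSketch(int tableSize) {
        lists = new LinkedList[tableSize];
        for (int i = 0; i < lists.length; i++)
            lists[i] = new LinkedList<>();
    }

    // The hash function picks the list; floorMod keeps negative keys in range.
    private LinkedList<Integer> listFor(int key) {
        return lists[Math.floorMod(key, lists.length)];
    }

    boolean contains(int key) {
        return listFor(key).contains(key);
    }

    // Insert at the front of the list unless the key is already present.
    void insert(int key) {
        LinkedList<Integer> list = listFor(key);
        if (!list.contains(key))
            list.addFirst(key);
    }

    void remove(int key) {
        listFor(key).remove(Integer.valueOf(key));
    }

    String bucket(int i) {
        return lists[i].toString();
    }

    public static void main(String[] args) {
        ChainingSketch table = new ChainingSketch(10);
        for (int x = 0; x < 10; x++)
            table.insert(x * x);                 // keys 0, 1, 4, 9, ..., 81
        for (int i = 0; i < 10; i++)
            System.out.println(i + ": " + table.bucket(i));
    }
}
```

With these keys, cells 1, 4, 6 and 9 each hold two elements (for example, cell 1 holds 81 and 1), while cells 2, 3, 7 and 8 stay empty.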

Separate Chaining
Advantages
–Solves the collision problem totally
–Elements can be inserted anywhere
Disadvantages
–All lists must be short to get constant, O(1), time complexity

Separate Chaining
Alternatives to linked lists
–Binary Trees
–Hash Tables

Open Addressing
Solves collisions without using any other data structure such as a linked list
If a collision occurs, alternative cells are tried until an empty cell is found
Cells h_0(x), h_1(x), ..., are tried in succession
h_i(x) = (hash(x) + f(i)) % TableSize

Open Addressing
Linear Probing
–f(i) = i
Quadratic Probing
–f(i) = i^2
Double Hashing
–f(i) = i * hash_2(x)
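To make the three choices of f(i) concrete, the sketch below lists the first few cells h_i(x) = (hash(x) + f(i)) % TableSize for each strategy. The class and method names, the home cell 9, the table size 10 and the second hash value 7 are all illustrative assumptions.

```java
import java.util.Arrays;

class ProbeSequences {
    // Linear probing: f(i) = i
    static int[] linear(int hash, int tableSize, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++)
            cells[i] = (hash + i) % tableSize;
        return cells;
    }

    // Quadratic probing: f(i) = i^2
    static int[] quadratic(int hash, int tableSize, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++)
            cells[i] = (hash + i * i) % tableSize;
        return cells;
    }

    // Double hashing: f(i) = i * hash2, where hash2 is the value of the second hash function
    static int[] doubleHashing(int hash, int hash2, int tableSize, int count) {
        int[] cells = new int[count];
        for (int i = 0; i < count; i++)
            cells[i] = (hash + i * hash2) % tableSize;
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linear(9, 10, 4)));           // [9, 0, 1, 2]
        System.out.println(Arrays.toString(quadratic(9, 10, 4)));        // [9, 0, 3, 8]
        System.out.println(Arrays.toString(doubleHashing(9, 7, 10, 4))); // [9, 6, 3, 0]
    }
}
```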

Linear Probing
Advantages
–Easy to compute
Disadvantages
–Table must be big enough to get a free cell
–Time to get a free cell may be quite large
–Primary Clustering: any key that hashes into the cluster will require several attempts to resolve the collision
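Primary clustering is easiest to see on a small table. The sketch below is an illustration under assumed inputs: the class name LinearProbingDemo, the table size 10 and the five sample keys are not taken from the slides. Each insertion returns the cell that was finally used.

```java
class LinearProbingDemo {
    // Inserts key into table (null marks an empty cell) and returns the cell used.
    // Assumes the table has at least one free cell; otherwise the loop would not end.
    static int insert(Integer[] table, int key) {
        int pos = Math.floorMod(key, table.length);
        while (table[pos] != null)
            pos = (pos + 1) % table.length;   // f(i) = i: try the next cell, wrapping around
        table[pos] = key;
        return pos;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[10];
        int[] keys = {89, 18, 49, 58, 69};    // illustrative keys
        for (int key : keys)
            System.out.println(key + " -> cell " + insert(table, key));
    }
}
```

The keys end up in cells 9, 8, 0, 1 and 2. Key 58 hashes to cell 8 but needs three extra probes because cells 8, 9 and 0 already form a cluster, and 69 then extends that cluster further.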

Quadratic Probing
Eliminates the Primary Clustering problem
Theorem: If quadratic probing is used, and the table size is prime, then a new element can always be inserted if the table is at least half empty
Secondary Clustering
–Elements that hash to the same position will probe the same alternative cells
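The theorem rests on the fact that, for a prime table size, the first half of the probe sequence never revisits a cell. The sketch below checks this numerically; the class name QuadraticProbeCheck and the table sizes 11 and 16 are assumptions chosen for illustration, and the check is a demonstration rather than a proof.

```java
import java.util.HashSet;
import java.util.Set;

class QuadraticProbeCheck {
    // Counts the distinct cells among the first ceil(tableSize / 2) probes
    // (i = 0, 1, 2, ...) of a key whose home cell is hash.
    static int distinctCells(int hash, int tableSize) {
        int probes = (tableSize + 1) / 2;
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < probes; i++)
            seen.add((hash + i * i) % tableSize);
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCells(0, 11));  // prime size: 6 probes, 6 distinct cells
        System.out.println(distinctCells(0, 16));  // non-prime size: 8 probes, only 4 distinct cells
    }
}
```

With size 11, a table that is at least half empty has at most 5 occupied cells, so one of the 6 distinct probes must hit a free cell. With size 16 the probes keep returning to the same 4 cells, which is why the prime table size matters.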

Quadratic Probing
    /**
     * Construct the hash table.
     */
    public QuadraticProbingHashTable( )
    {
        this( DEFAULT_TABLE_SIZE );
    }

    /**
     * Construct the hash table.
     * @param size the approximate initial size.
     */
    public QuadraticProbingHashTable( int size )
    {
        allocateArray( size );
        makeEmpty( );
    }

Quadratic Probing
    /**
     * Method that performs quadratic probing resolution.
     * @param x the item to search for.
     * @return the position where the search terminates.
     */
    private int findPos( Hashable x )
    {
        int collisionNum = 0;
        int currentPos = x.hash( array.length );

        while( array[ currentPos ] != null &&
                !array[ currentPos ].element.equals( x ) )
        {
            // i^2 - (i-1)^2 = 2i - 1, so step by the next odd number
            currentPos += 2 * ++collisionNum - 1;
            if( currentPos >= array.length )
                currentPos -= array.length;
        }

        return currentPos;
    }

Double Hashing
A poor choice of hash_2(x) could be disastrous (it must never evaluate to zero)
hash_2(x) = R - (x % R), where R is a prime smaller than TableSize
If double hashing is correctly implemented, simulations imply that the expected number of probes is almost the same as for a random collision resolution strategy
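A short sketch of the second hash function, assuming non-negative integer keys. The class name SecondHash, the choice R = 7 and the sample keys are illustrative. Because x % R lies in 0..R-1, the result always lies in 1..R and can never be zero, so every probe moves to a different cell.

```java
class SecondHash {
    // hash_2(x) = R - (x % R); assumes x >= 0 and R is a prime smaller than TableSize.
    static int hash2(int x, int r) {
        return r - (x % r);
    }

    public static void main(String[] args) {
        int r = 7;                                  // illustrative prime for a table of size 10
        System.out.println(hash2(49, r));           // 7 - 0 = 7
        System.out.println(hash2(58, r));           // 7 - 2 = 5
        System.out.println(hash2(69, r));           // 7 - 6 = 1

        // First alternative cell for key 49 in a table of size 10: (9 + 7) % 10 = 6
        System.out.println((49 % 10 + hash2(49, r)) % 10);
    }
}
```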

Rehashing
If the hash table gets too full
–Operations will start taking too long
–Insertions might fail for open addressing with quadratic probing
Solution: Rehashing

Rehashing
Build another table that is about twice as big
Associate a new hash function with it
Scan down the entire original hash table
Compute the new hash value for each element
Insert it in the new table

Rehashing
Expensive operation: O(N)
Not bad, since it occurs very infrequently
If the data structure is part of the program, the effect is not noticeable

Rehashing
When to apply rehashing?
–As soon as the table is half full
–Only when an insertion fails
–When the table reaches a certain load factor
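The load-factor policy could look like the sketch below. It is an illustration under several assumptions: linear probing on int keys, an initial size of 11, a threshold of 0.5, and hypothetical names (RehashingSketch, MAX_LOAD, capacity). The other two policies would differ only in the condition that triggers rehash.

```java
class RehashingSketch {
    private static final double MAX_LOAD = 0.5;   // assumed load-factor threshold
    private Integer[] cells = new Integer[11];    // null marks an empty cell
    private int size = 0;

    void insert(int key) {
        int pos = findPos(cells, key);
        if (cells[pos] != null)
            return;                               // key already present
        cells[pos] = key;
        if (++size > MAX_LOAD * cells.length)
            rehash();
    }

    boolean contains(int key) {
        return cells[findPos(cells, key)] != null;
    }

    int capacity() {
        return cells.length;
    }

    // Linear probing: stop at the key itself or at the first empty cell.
    private static int findPos(Integer[] table, int key) {
        int pos = Math.floorMod(key, table.length);
        while (table[pos] != null && table[pos] != key)
            pos = (pos + 1) % table.length;
        return pos;
    }

    // Build a table about twice as big and reinsert every element: O(N).
    private void rehash() {
        Integer[] old = cells;
        cells = new Integer[nextPrime(2 * old.length)];
        for (Integer key : old)
            if (key != null)
                cells[findPos(cells, key)] = key;
    }

    private static int nextPrime(int n) {
        while (!isPrime(n))
            n++;
        return n;
    }

    private static boolean isPrime(int n) {
        if (n < 2)
            return false;
        for (int d = 2; (long) d * d <= n; d++)
            if (n % d == 0)
                return false;
        return true;
    }
}
```

Starting from 11 cells, the sixth insertion pushes the load factor above 0.5 and the table grows to 23 cells, the next prime after 22.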

Rehashing
    private void allocateArray( int arraySize )
    {
        array = new HashEntry[ arraySize ];
    }

    private void rehash( )
    {
        HashEntry [ ] oldArray = array;

        // Create a new double-sized, empty table
        allocateArray( nextPrime( 2 * oldArray.length ) );
        currentSize = 0;

        // Copy table over, skipping empty and deleted cells
        for( int i = 0; i < oldArray.length; i++ )
            if( oldArray[ i ] != null && oldArray[ i ].isActive )
                insert( oldArray[ i ].element );
    }

Extendible Hashing (Why?)
The amount of data is too large to fit in main memory
The main consideration is the number of disk accesses required to get the data

Extendible Hashing (Why?)
If open addressing or separate chaining is used, collisions could cause several blocks to be examined
When the table gets too full, the rehashing step requires O(N) disk accesses

Extendible Hashing
Uses an idea from B-Trees
Choose M so large that the B-Tree has a depth of 1
Problem: The branching factor is so high that it takes too much time to determine which leaf the data is in
Extendible hashing reduces the time to perform this step
