DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.


2 REVIEW
We have investigated the following ADTs:
- Lists (array, linked list)
- Stacks
- Queues
- Trees (binary trees, binary search trees, AVL trees)
What about their running times?

3 Running times of important operations

                                  Insertion   Deletion   Find
    Array                         O(n)        O(n)       O(n)
    Linked list                   O(1)        O(n)       O(n)
    Tree (balanced, e.g. AVL)     O(log n)    O(log n)   O(log n)

Can we decrease the running times further?

4 ROAD MAP
HASHING
- General idea
- Hash function
- Separate chaining
- Open addressing
- Rehashing

5 Hashing
Hashing is the implementation of hash tables.
- A hash table is an array of elements with a fixed size, TableSize.
- Search is performed on a part of the item, called the key.
- Each key is mapped to a number in the range 0 to TableSize-1, which is used as an array index.
- The mapping is done by a hash function, which should be simple to compute and should ideally map any two distinct keys to different cells.
How can we perform insert, delete and find operations in O(1) average time?

6 An ideal hash table
- Each key is mapped to a different index. This is not always possible: there are many possible keys but only finitely many indexes.
- Keys should be distributed evenly over the table.
Considerations:
- Choose a hash function.
- Decide what to do when two keys hash to the same value.
- Decide on the table size.

7 Hash function
If keys are integers, the hash function returns Key mod TableSize.
Ex: with TableSize = 10 and Keys = 120, 330, 1000, every key ends in 0, so all three hash to cell 0.
TableSize should therefore be prime.

8 Hash function
If keys are strings, add the ASCII values of the characters.
Problem: if TableSize is large and the keys are short, the keys do not spread over the whole table. For example, with TableSize = 10000 and 8-character keys, the hash value is at most 127*8 = 1016 < 10000, so most cells are never used.

    int hash( const string & key, int tableSize )
    {
        int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal += key[ i ];

        return hashVal % tableSize;
    }

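9 Hash function
If keys are strings, use all the characters:
    $\sum_{i=0}^{KeySize-1} Key[KeySize - i - 1] \cdot 32^i$
- For long keys this is slow to compute, and the early characters do not count: they are shifted out of the result when the arithmetic overflows.
- Alternatives: use only some of the characters, for example the characters in odd positions.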

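10 Hash function
If keys are strings, use the first three characters:
    key[0] + 27*key[1] + 729*key[2]
(27 is the number of letters in the English alphabet plus the blank, and 729 = 27^2.)
If the keys are not random, some parts of the table are never used.

    // Assumes key has at least three characters.
    int hash( const string & key, int tableSize )
    {
        return ( key[ 0 ] + 27 * key[ 1 ] + 729 * key[ 2 ] ) % tableSize;
    }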

11 A good hash function (Horner's rule with multiplier 37)

    int hash( const string & key, int tableSize )
    {
        // Unsigned arithmetic: overflow wraps around instead of being
        // undefined behaviour, so hashVal can never become negative.
        unsigned int hashVal = 0;

        for( int i = 0; i < key.length( ); i++ )
            hashVal = 37 * hashVal + key[ i ];

        return hashVal % tableSize;
    }

12 Collision
Collision resolution is the main programming detail. If an element being inserted hashes to the same value as an already inserted element, there is a collision. There are several methods to deal with this problem:
- Separate chaining
- Open addressing

13 Separate Chaining Hash Table
Keep a list of all elements that hash to the same value.
(The example table on this slide uses TableSize = 10, which is not a good choice because 10 is not prime.)

14 Type declaration for separate chaining hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

      private:
        vector<List<HashedObj> > theLists;   // The array of Lists
        const HashedObj ITEM_NOT_FOUND;
    };

    int hash( const string & key, int tableSize );
    int hash( int key, int tableSize );

List and ListItr are a linked-list class and its iterator; neither is shown on these slides.

15
    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) )
    {
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        for( int i = 0; i < theLists.size( ); i++ )
            theLists[ i ].makeEmpty( );
    }

    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable & rhs )
    {
        if( this != &rhs )
            theLists = rhs.theLists;
        return *this;
    }

16
    // Remove item x from the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        theLists[ hash( x, theLists.size( ) ) ].remove( x );
    }

    // Find item x in the hash table.
    // Return the matching item, or ITEM_NOT_FOUND if not found.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        ListItr<HashedObj> itr;
        itr = theLists[ hash( x, theLists.size( ) ) ].find( x );
        if( itr.isPastEnd( ) )
            return ITEM_NOT_FOUND;
        else
            return itr.retrieve( );
    }

17
    // Insert item x into the hash table.
    // If the item is already present, do nothing.
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        List<HashedObj> & whichList = theLists[ hash( x, theLists.size( ) ) ];
        ListItr<HashedObj> itr = whichList.find( x );
        if( itr.isPastEnd( ) )
            whichList.insert( x, whichList.zeroth( ) );   // insert at the front
    }

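18 Analysis
Let the load factor $\lambda$ of a hash table be the number of elements divided by TableSize. $\lambda$ is the average length of a list.
- Successful find: about $1 + \lambda/2$ comparisons (the matching element plus, on average, half of the other elements in its list), plus the time to evaluate the hash function.
- Unsuccessful find and insert: $\lambda$ comparisons, plus the time to evaluate the hash function.
A good choice is $\lambda \approx 1$.
The disadvantage of separate chaining is that list nodes must be allocated and deallocated, which slows it down.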

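19 Open Addressing
If a collision occurs, try alternate cells $h_0(x), h_1(x), h_2(x), \ldots$ in turn, where
    $h_i(x) = (hash(x) + F(i)) \bmod TableSize$, with $F(0) = 0$.
All elements are stored inside the table, so $\lambda < 1$ is required; a good choice is $\lambda < 0.5$.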

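20 Linear Probing
F is a linear function of i, typically F(i) = i.
Example: insert keys {89, 18, 49, 58, 69} into a table with TableSize = 10.
- When 49 is inserted, a collision occurs (89 is already in cell 9), so 49 is put into the next available spot, cell 0.
- 58 collides with 18, 89 and 49 before it finds an empty cell.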

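21 Linear Probing
Problem: it is not easy to delete an element. The element may have caused a collision earlier, and removing it would break the probe sequence of elements inserted after it. Solution: mark the element as deleted (lazy deletion).
Problem: primary clustering.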

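22 Linear Probing Analysis
Problem: primary clustering. Blocks of occupied cells form, and any key that hashes into a block must probe to the end of it, which makes the block even larger.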

23 Quadratic Probing
F(i) is a quadratic function, e.g. $F(i) = i^2$.

24 Quadratic Probing
- When 49 collides with 89, the next position attempted is one cell away (cell 0), which is empty.
- 58 collides at position 8. The cell one away (cell 9) is tried and another collision occurs. 58 is then inserted into the cell $2^2 = 4$ away, cell 2.

25 Quadratic Probing
- Solves the primary clustering problem.
- Not all empty cells may be accessed: the probe sequence may loop over full cells, so an empty cell may not be found even though the table is not full.
- Theorem: if the table size is prime and $\lambda < 0.5$, a new element can always be inserted.
- Problem: secondary clustering. Elements that hash to the same position probe the same sequence of alternative cells.

26 Type declaration for open addressing hash table

    template <class HashedObj>
    class HashTable
    {
      public:
        explicit HashTable( const HashedObj & notFound, int size = 101 );
        HashTable( const HashTable & rhs )
          : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ),
            array( rhs.array ), currentSize( rhs.currentSize ) { }

        const HashedObj & find( const HashedObj & x ) const;

        void makeEmpty( );
        void insert( const HashedObj & x );
        void remove( const HashedObj & x );

        const HashTable & operator=( const HashTable & rhs );

        enum EntryType { ACTIVE, EMPTY, DELETED };

27 Type declaration for open addressing hash table (continued)

      private:
        struct HashEntry
        {
            HashedObj element;
            EntryType info;

            HashEntry( const HashedObj & e = HashedObj( ), EntryType i = EMPTY )
              : element( e ), info( i ) { }
        };

        vector<HashEntry> array;
        int currentSize;
        const HashedObj ITEM_NOT_FOUND;

        bool isActive( int currentPos ) const;
        int findPos( const HashedObj & x ) const;
        void rehash( );
    };

28
    // Construct the hash table.
    template <class HashedObj>
    HashTable<HashedObj>::HashTable( const HashedObj & notFound, int size )
      : ITEM_NOT_FOUND( notFound ), array( nextPrime( size ) )
    {
        makeEmpty( );
    }

    // Make the hash table logically empty.
    template <class HashedObj>
    void HashTable<HashedObj>::makeEmpty( )
    {
        currentSize = 0;
        for( int i = 0; i < array.size( ); i++ )
            array[ i ].info = EMPTY;
    }

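29
    // Find item x in the hash table.
    // Return the matching item, or ITEM_NOT_FOUND if not found.
    template <class HashedObj>
    const HashedObj & HashTable<HashedObj>::find( const HashedObj & x ) const
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return array[ currentPos ].element;
        else
            return ITEM_NOT_FOUND;
    }

    // Method that performs quadratic probing resolution.
    // Return the position where the search for x terminates.
    template <class HashedObj>
    int HashTable<HashedObj>::findPos( const HashedObj & x ) const
    {
        int collisionNum = 0;
        int currentPos = hash( x, array.size( ) );

        while( array[ currentPos ].info != EMPTY &&
               array[ currentPos ].element != x )
        {
            // i^2 - (i-1)^2 = 2i - 1, so the next quadratic probe is
            // reached by adding 2i - 1 to the previous position
            currentPos += 2 * ++collisionNum - 1;
            if( currentPos >= array.size( ) )   // cheaper than mod while lambda < 0.5
                currentPos -= array.size( );
        }

        return currentPos;
    }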

30
    // Return true if currentPos exists and is active.
    template <class HashedObj>
    bool HashTable<HashedObj>::isActive( int currentPos ) const
    {
        return array[ currentPos ].info == ACTIVE;
    }

    // Remove item x from the hash table (lazy deletion).
    template <class HashedObj>
    void HashTable<HashedObj>::remove( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            array[ currentPos ].info = DELETED;
    }

    // Insert routine with quadratic probing.
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;                         // already present
        array[ currentPos ] = HashEntry( x, ACTIVE );
    }

31
    // Deep copy.
    template <class HashedObj>
    const HashTable<HashedObj> &
    HashTable<HashedObj>::operator=( const HashTable & rhs )
    {
        if( this != &rhs )
        {
            array = rhs.array;
            currentSize = rhs.currentSize;
        }
        return *this;
    }

32 Double Hashing
Use a second hash function: $F(i) = i \cdot hash_2(x)$.
Poor example: $hash_2(x) = x \bmod 9$, $hash_1(x) = x \bmod 10$, TableSize = 10.
If x = 99, what happens? $hash_2(99) = 0$, so every probe lands on the same cell (cell 9).
Requirement: $hash_2(x) \ne 0$ for every x.

33 Double Hashing
Good choice: $hash_2(x) = R - (x \bmod R)$, where R is a prime smaller than TableSize.

34 Double Hashing
$hash_2(x) = 7 - (x \bmod 7)$

35 Analysis
Assume random collision resolution: probes are independent, so there is no clustering problem.
Unsuccessful search and insert:
- the cost is the number of probes until an empty cell is found;
- $1 - \lambda$ is the fraction of cells that are empty;
- so the expected number of probes is $1/(1 - \lambda)$.
Successful search:
- let P(X) be the number of probes made when element X was inserted;
- the expected number of probes is approximately $\frac{1}{N}\sum_{X} P(X)$.

36 Rehashing
If $\lambda$ gets large, the number of probes increases: operations start taking too long, and insertions might fail.
Solution: rehash into a table with a larger TableSize (usually about twice as large).
When to rehash:
- when $\lambda > 0.5$, or
- when an insertion fails.

37 Rehashing Example
Elements 13, 15, 24 and 6 are inserted into an open addressing hash table of size 7, with H(X) = X mod 7. Linear probing is used to resolve collisions.

38 Rehashing Example
If 23 is inserted, the table is over 70 percent full (5 of 7 cells), so a new table is created. 17 is the first prime larger than twice the old size (2 × 7 = 14), so $H_{new}(X) = X \bmod 17$.

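39 Rehashing
- Rehashing is an expensive operation: its running time is O(N).
- It frees the programmer from worrying about the table size.
- Amortized analysis: averaged over N operations, each operation takes O(1) time, because a rehash that moves N elements is preceded by about N/2 insertions that did not trigger one.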

40
    // Insert routine with quadratic probing (with rehashing).
    template <class HashedObj>
    void HashTable<HashedObj>::insert( const HashedObj & x )
    {
        int currentPos = findPos( x );
        if( isActive( currentPos ) )
            return;
        array[ currentPos ] = HashEntry( x, ACTIVE );

        // Rehash when the table becomes more than half full
        if( ++currentSize > array.size( ) / 2 )
            rehash( );
    }

    // Expand the hash table.
    template <class HashedObj>
    void HashTable<HashedObj>::rehash( )
    {
        vector<HashEntry> oldArray = array;

        // Create a new, empty table about twice as large
        array.resize( nextPrime( 2 * oldArray.size( ) ) );
        for( int j = 0; j < array.size( ); j++ )
            array[ j ].info = EMPTY;

        // Copy the active elements over
        currentSize = 0;
        for( int i = 0; i < oldArray.size( ); i++ )
            if( oldArray[ i ].info == ACTIVE )
                insert( oldArray[ i ].element );
    }

