CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
The Dictionary ADT Definition A dictionary is an ordered or unordered list of key-element pairs, where keys are used to locate elements in the list. Example:
Hashing General idea Hash function Separate Chaining Open Addressing
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
CS202 - Fundamental Structures of Computer Science II
Hashing CS 3358 Data Structures.
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Lecture 11 March 5 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
L. Grewe. Computing hash function for a string Horner’s rule: (( … (a 0 x + a 1 ) x + a 2 ) x + … + a n-2 )x + a n-1 ) int hash( const string & key )
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
1.  We’ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees.  The implementation of hash tables.
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Hashing Vishnu Kotrajaras, PhD. What do we want to do? Insert Delete find (constant time) No sorting No Findmin findmax.
Dictionaries and Hash Tables1 Hash Tables  
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
CS201: Data Structures and Discrete Mathematics I Hash Table.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
Lecture 17 April 11, 11 Chapter 5, Hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
1 Data Structures CSCI 132, Spring 2014 Lecture 34 Analyzing Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Hashing Vishnu Kotrajaras, PhD Nattee Niparnan, PhD.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
1 Designing Hash Tables Sections 5.3, 5.4, 5.5, 5.6.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Fundamental Structures of Computer Science II
Hashing.
Hashing Problem: store and retrieving an item using its key (for example, ID number, name) Linked List takes O(N) time Binary Search Tree take O(logN)
CMSC 341 Hashing.
CMSC 341 Hashing 12/2/2018.
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
CS202 - Fundamental Structures of Computer Science II
CMSC 341 Hashing 2/18/2019.
CMSC 341 Hashing.
CMSC 341 Hashing 4/11/2019.
CMSC 341 Hashing 4/27/2019.
Hashing Vishnu Kotrajaras, PhD.
Data Structures and Algorithm Analysis Hashing
CMSC 341 Lecture 12.
CS 144 Advanced C++ Programming April 23 Class Meeting
Presentation transcript:

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent University Computer Engineering Department

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 2 Hashing We will now see a data structure that will allow the following operations to be done in O(1) time:  Insertion  Deletion  Find This data structure,however, is not efficient in operation that require any ordering information among the elements Sort Find minimum Find maximum

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 3 Hashing Hashing requires a Hash Table to be used. There may various methods in implementing a hash table, but the most simplest one is using an array of some fixed size. The maximum size of the array is TableSize. The items that are stored in the array (hash table) are indexed by values from 0 to TableSize – 1. A stored item need to have a data member, called key, that will be used in computing the index value for the item. A stored item need to have a data member, called key, that will be used in computing the index value for the item. For student items, key could be the student ID or student Name For student items, key could be the student ID or student Name For account balance records in a bank, the key could be the account number of the customer, …. For account balance records in a bank, the key could be the account number of the customer, ….

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 4 Hashing- General Idea When an item needs to be inserted into the Hash Table: The key of the item is mapped into an index value (between 0 and TableSize -1) using a special function, called Hash Table. The item is then stored in the array at that index value. If more than one key (items) are mapped into the same index value, then we say that we have a collision. Collisions must be resolved. We need to handle collisions. When an item needs to be found (searched) The key of the item is mapped into an index value again and the item is tried to be retrieved from that index of the array. If the item could not be stored at the location (at that index), then appropriate error message could be returned.

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 5 Hash Function Hash function objectives: Should be simple to compute Theoretically: Should ensure (or try) that any two distinct keys are mapped into different hash table entries (cells). Practically: Should distribute the keys evenly among the cells. Hash Function keyIndex into hash table

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 6 Example Hash Function mary dave phil john Items Hash Table key Key could be an integer, a string, etc. Here it is a string. mary dave phil john 25000

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 7 Hash Function If the keys are integers: Hash(Key) = Key mod TableSize Table size of prime. If the keys are strings: It is more difficult. The keys need to be well-distributed. Table should be well utilized.  We will give 3 functions

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 8 Hash Function for Strings  Method: Sum up the ascii values of the characters of the key!  This method causes some part of the hash table to be unused. A case for which the functions do not work well:  Let say TableSize is 10,007 (a prime number).  Assume all keys are 8 characters long.  8 x 127 = 1016 different 8 characters combinations possible.  Only the hash values in the range will be generated.  The rest of the table will be unused. int hash(const string &key, int tableSize) { int hasVal = 0; for (int i = 0; i < key.length(); i++) hashVal += key[i]; return hashVal % tableSize; } int hash(const string &key, int tableSize) { int hasVal = 0; for (int i = 0; i < key.length(); i++) hashVal += key[i]; return hashVal % tableSize; } Function 1

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 9 Hash function for strings The functions uses the first 3 characters of the key string. Computes the number of English words that can be generated by the first 3 characters. 26 English alphabet character + 1 blank symbol = 27 symbols are represented by a character. Therefore there are 26 * 26 * 26 = different English words that can be theoretically generated. However, English dictionary contains 2851 three-character valid English words. Therefore, this function also does not utilize the entire hash table and does not distribute the keys randomly to cells. Some cells are always empty. int hash (const string &key, int tableSize) { return (key[0] + 27 * key[1] key[2]) % tableSize; } Function 2

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 10 Hash function for strings: int hash (const string &key, int tableSize) { int hashVal = 0; for (int i = 0; i < key.length(); i++) hashVal = 37 * hashVal + key[i]; hashVal %=tableSize; if (hashVal < 0) /* in case overflows occur in computation */ hashVal += tableSize; return hashVal; };

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 11 Hash function for strings: ali key KeySize = 3; hash(“ali”) = (105 * * *37 2 ) % 10,007 = i key[i] hash function ali …… ,006 (TableSize) “ali”

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 12 Resolving Collisions When two or more different keys are mapped to the same hash value (index), this is called collision.  We have to find some way of storing the items mapped to this hash value. We have a single array cell at the hash value index that can store only a single item There are various methods to resolve collisions.

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 13 Separate Chaining Method is Keep a list of items (elements) that hash to the same value. The array cell which has the hash-value as the index, will have a pointer to the first element of the list. When a new item is to be inserted into the list, it can be inserted to the front of the list  We don’t need to traverse to the end of the list. If duplicate items are to be inserted:  A reference count can be kept at each item node, that points how many items are present at that node.

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 14 Separate Chaining – an Example We have items (keys): 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 hask(key) = key % 10.

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 15 #include "vector.h" #include "mystring.h" #include "LinkedList.h“ template class HashTable { public: explicit HashTable( const HashedObj & notFound, int size = 101 ); HashTable( const HashTable & rhs ) : ITEM_NOT_FOUND( rhs.ITEM_NOT_FOUND ), theLists( rhs.theLists ) { } const HashedObj & find( const HashedObj & x ) const; void makeEmpty( ); void insert( const HashedObj & x ); void remove( const HashedObj & x ); const HashTable & operator=( const HashTable & rhs ); private: vector > theLists; // The array of Lists const HashedObj ITEM_NOT_FOUND; }; int hash( const string & key, int tableSize ); int hash( int key, int tableSize ); Hash Table Class for Separate Chaining

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 16 /* A hash routine for string objects. */ int hash( const string & key, int tableSize ) { int hashVal = 0; for( int i = 0; i < key.length( ); i++ ) hashVal = 37 * hashVal + key[ i ]; hashVal %= tableSize; if( hashVal < 0 ) hashVal += tableSize; return hashVal; } /** A hash routine for ints */ int hash( int key, int tableSize ) { if( key < 0 ) key = -key; return key % tableSize; } Hash Table Hash() Member function

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 17 /** * Construct the hash table. */ template HashTable ::HashTable( const HashedObj & notFound, int size ) : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) ) { } /** * Make the hash table logically empty. */ template void HashTable ::makeEmpty( ) { for( int i = 0; i < theLists.size( ); i++ ) theLists[ i ].makeEmpty( ); } Hash Table Constructor And makeEmpty operations

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 18 /** * Construct the hash table. */ template HashTable ::HashTable( const HashedObj & notFound, int size ) : ITEM_NOT_FOUND( notFound ), theLists( nextPrime( size ) ) { } /** * Make the hash table logically empty. */ template void HashTable ::makeEmpty( ) { for( int i = 0; i < theLists.size( ); i++ ) theLists[ i ].makeEmpty( ); } Hash Table Constructor And makeEmpty operations

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 19 /** * Insert item x into the hash table. If the item is * already present, then do nothing. */ template void HashTable ::insert( const HashedObj & x ) { List & whichList = theLists[ hash( x, theLists.size( ) ) ]; ListItr itr = whichList.find( x ); if( itr.isPastEnd( ) ) whichList.insert( x, whichList.zeroth( ) ); } Hash Table Insert operation

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 20 /** * Remove item x from the hash table. */ template void HashTable ::remove( const HashedObj & x ) { theLists[ hash( x, theLists.size( ) ) ].remove( x ); } Hash Table remove operation

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 21 /** * Find item x in the hash table. * Return the matching item or ITEM_NOT_FOUND if not found */ template const HashedObj & HashTable ::find( const HashedObj & x ) const { ListItr itr; itr = theLists[ hash( x, theLists.size( ) ) ].find( x ); if( itr.isPastEnd( ) ) return ITEM_NOT_FOUND; else return itr.retrieve( ); } Hash Table find operation

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 22 Linked list solution for collision resolution can be replaces by: Binary search tree An other hash table. …. Load factor definition: Ratio of number of elements (N) in a hash table to the hash TableSize.  = N/TableSize

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 23 The average length of a list is Effort required for search Unsuccessful:  We have to traverse the list, so we need to examine nodes on the average. Successful search:  Hash function computation (O(1)) – constant, 1 node processing for the item found, x nodes processing for other items in the list.  x = (N-1) / M = -1 / M (N: number of nodes, M: number of lists). M is large.  So, x ~=  On the average, we need to check half of the other nodes while searching for a certain element.  So the total cost is: 1 +  /2

CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University 24 General Rules for separate chaining Not the TableSize,but the load factor ( ) is important in defining the cost of operations. TableSize should be as large the number of expected elements in the hash table. To keep load factor around 1. TableSize should be prime for even distribution of keys to hash table cells.