WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1.

WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1

GOAL Develop a structure that will allow users to insert / delete / find records in constant average time (e.g O(1)) Structure will be a table (relatively small) Table completely contained in memory Implemented by an array Capitalizes on ability to access any element of the array in constant time 2 CE222 - Dr. Senem Kumova Metin - 2011/2012

General Idea A stored item needs to have a data member, called key, that will be used in computing the index value for the item. – Key could be an integer, a string, etc – e.g. a name or Id that is a part of a large employee structure If the size of the array is N, the items that are stored in the hash table are indexed by values from 0 to N – 1. Each key is mapped into some number in the range 0 to N – 1. The mapping is called a hash function. 3 CE222 - Dr. Senem Kumova Metin - 2011/2012

Example Hash Function mary 28200 dave 27500 joe 31250 linda 25000 Items Hash Table key dave 27500 0 1 2 3 4 5 6 7 8 9 mary 28200 joe 31250 linda 25000 4 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function Determines position of keys in the array (Maps items to cells in array) The hash function: – must be simple to compute. – must distribute the keys evenly among the cells. If all the keys are known, then it is possible to write perfect hash functions !!  not possible 5 CE222 - Dr. Senem Kumova Metin - 2011/2012

An example 1/2 Assume that keys are non-negative integers between 0 and MAX_INT and table size N is 5. x  key hash(x)  hashing function hash(x)= x mod(N)  hash(x)=x%5 6 CE222 - Dr. Senem Kumova Metin - 2011/2012

An example 2/2 hash(x)= x mod(N)  hash(x)=x%5 Assume that keys are 23,14, 25, 46 82 in order. Steps : 1. hash(23)=23%5=3 2. hash(14)=14%5=4 3. hash(25)=25%5=0 4. hash(46)=46%5=1 5. hash(82)=82%5=2 indexinitialafter 23after 14after 25after 46after 82 0 25 1 46 2 82 3 23 4 14 7 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Functions Problems: Keys may not be numeric. Number of possible keys is much larger than the space available in table. Different keys may map into same location ( What happens if keys are 25, 30 and 40 for previous example ?? ) – Hash function is not one-to-one => collision. – If there are too many collisions, the performance of the hash table will suffer dramatically. 8 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Functions If the input keys are integers then simply key mod TableSize is a general strategy. – Unless key happens to have some undesirable properties.  Make Table size a prime !!! (Assume that table size is 10 and all keys=10*i ??? ) 9 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Functions If the input keys are strings then hash function needs to convert keys into a numeric value. How to convert a string to a numeric value ?? – Use ASCII codes of chars (127 different chars) 10 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function for Strings 1 Add up the ASCII values of all characters of the key Example : tableSize= N and key =“john” hashVal= 106 + 111 + 104 +110 = 431 index= 431%N int hash(const string &key, int tableSize) { int hasVal = 0; for (int i = 0; i < key.length(); i++) hashVal += key[i]; return hashVal % tableSize; } 11 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function for Strings 1 Easy to implement !! However, if the table size is large, the function does not distribute the keys well. e.g. Table size =10000, key length <= 8, the hash function can assume values only between 0 and 8*127=1016 int hash(const string &key, int tableSize) { int hasVal = 0; for (int i = 0; i < key.length(); i++) hashVal += key[i]; return hashVal % tableSize; } 12 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function for Strings 2 Examine only the first 3 characters of the key. In English we have 26 different letters int hash (const string &key, int tableSize) { return (key[0]+27 * key[1] + 27 2 *key[2]) % tableSize; } In theory, 26 * 26 * 26 = 17576 different combinations (ignoring blanks) can be generated. However, English is not random, only 2851 different combinations are possible. Thus, this function although easily computable, is also not appropriate if the hash table is reasonably large. e.g TableSize=10007  without any collisions 28.4% (2851/10007) of table can be hashed to. 13 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function for Strings 3 int hash (const string &key, int tableSize) { int hashVal = 0; for (int i = 0; i < key.length(); i++) hashVal = 37 * hashVal + key[i]; hashVal %=tableSize; if (hashVal < 0) /* in case overflows occurs */ hashVal += tableSize; return hashVal; }; 14 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash function for Strings 3 ali key KeySize = 3; TableSize=10007 98 108 105 // hashVal =0; // for (int i = 0; i < key.length(); i++) //hashVal = 37 * hashVal + key[i]; hashVal=0; hashVal=37*0 +key[0]; // 0+98 hashVal=37*98 +key[1]; // 37*98+108 hashVal=37*(37*98+108)+key[2]; // 37*37*98+37*108+105 hash(“ali”) = (105 * 1 + 108*37 + 98*37 2 ) % 10,007 = 8172 0 1 2 i key[i] 15 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function : Collision Let hash(x) = x % 15 Then, if x = 25 129 35 2501 47 36 hash(x) = 10 9 5 11 2 6 Storing the keys in the array is straightforward: Thus, delete and find can be done in O(1), and also insert, except… 01234567891011121314 47 3536 129252501 16 CE222 - Dr. Senem Kumova Metin - 2011/2012

Hash Function : Collision What happens when you try to insert: x = 65 ? x = 65 hash(x) = 5 ??? If, when an element is inserted, it hashes to the same value as an already inserted element, this is called a collision. 01234567891011121314 47 3536 129252501 17 CE222 - Dr. Senem Kumova Metin - 2011/2012

Handling Collisions Separate Chaining Open Addressing – Linear Probing – Quadratic Probing – Double Hashing 18 CE222 - Dr. Senem Kumova Metin - 2011/2012

Separate Chaining The idea is to keep a list of all elements that hash to the same value. – The array elements are pointers to the first nodes of the lists. – A new item is inserted to the front of the list. Advantages: – Better space utilization for large items. – Simple collision handling: searching linked list. – Overflow: we can store more items than the hash table size. – Deletion is quick and easy: deletion from the linked list. 19 CE222 - Dr. Senem Kumova Metin - 2011/2012

Separate Chaining Example 0 1 2 3 4 5 6 7 8 9 0 81 1 64 4 25 36 16 49 9 Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81 hash(key) = key % 10. 20 CE222 - Dr. Senem Kumova Metin - 2011/2012

Separate Chaining : Operations Initialization: all entries are set to NULL Find: – locate the cell using hash function. – sequential search on the linked list in that cell. Insertion: – Locate the cell using hash function. – (If the item does not exist) insert it as the first item in the list. Deletion: – Locate the cell using hash function. – Delete the item from the linked list. 21 CE222 - Dr. Senem Kumova Metin - 2011/2012

Separate Chaining: Disadvantages Parts of the array might never be used. As chains get longer, search time increases to O(N) in the worst case. Constructing new chain nodes is relatively expensive (still constant time, but the constant is high). Is there a way to use the “unused” space in the array instead of using chains to make more space? 22 CE222 - Dr. Senem Kumova Metin - 2011/2012

Analysis of Separate Chaining Collisions are very likely. – How likely and what is the average length of lists? Load factor definition: – Ratio of number of elements (N) in a hash table to the hash TableSize. i.e. = N/TableSize – The average length of a list is also  – For chaining is not bound by 1; it can be > 1. 23 CE222 - Dr. Senem Kumova Metin - 2011/2012

Cost of searching 1/4 Search Time(or Cost) = Time to evaluate hash function + the time to traverse the list – Unsuccessful search or successful search ?? 24 CE222 - Dr. Senem Kumova Metin - 2011/2012

Cost of searching 2/4 Unsuccessful search: – We have to traverse the entire list, so we need to compare  nodes on the average 25 CE222 - Dr. Senem Kumova Metin - 2011/2012

Cost of searching 3/4 Successful search: – Successful search time to traverse the list = the node searched + half the expected # of other nodes) – N=# of elements; M= Number of Lists Expected # of other nodes = (N-1)/M =  /M  which is essentially, since M is presumed large) – On the average, we need to check half of the other nodes while searching for a certain element – Thus average search cost = 1 +  /2 26 CE222 - Dr. Senem Kumova Metin - 2011/2012

Cost of searching 4/4 Observation: Table size is not important but load factor is.  For separate chaining make λ ~ 1 27 CE222 - Dr. Senem Kumova Metin - 2011/2012

How to implement Hashing ? EXAMPLE CE222 - Dr. Senem Kumova Metin - 2011/2012 28

Implementation :Example p1/3 class Node { public : int key; // EASY  put all members to public Node(int a) {key=a; next=NULL;} Node * next; }; class List { public :Node * head; List() {head=NULL;} bool searchList(int x) { for(Node * p=head; p!=NULL && p->key !=x ;p=p->next); if(p==NULL) return false; else return true;} void insertList(int x) { if(head==NULL) head= new Node(x); else {Node * p=new Node(x); p->next=head; head=p; } } } 29

Implementation :Example p2/3 const int TABLE_SIZE = 5; int hash(int x); // hash function to generate an index number // between 0-TableSize class HashTable{ public: HashTable (); void makeEmpty(); // remove all entries in the table void insert(int x); // insert x to table void remove(int x); // remove x from table private: List table[TABLE_SIZE]; } 30

Implementation :Example p3/3 void HashTable:: insert(int x) { int value= hash(x); // table[value] is the head of corresponding list if(table[value].searchList(x)==false) table[value].insertList(x); } void HashTable:: remove(int x) { int value= hash(x); // table[value] is the head of corresponding list if(table[value].searchList()==false) cout<< “cannot remove”; else ???? } CE222 - Dr. Senem Kumova Metin - 2011/2012 31

WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1.

Similar presentations

Presentation on theme: "WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1.

Similar presentations

Presentation on theme: "WEEK 1 Hashing CE222 Dr. Senem Kumova Metin 2011-2012 1."— Presentation transcript:

Similar presentations

About project

Feedback