WEEK 1: Hashing. CE222, Dr. Senem Kumova Metin, 2011-2012



GOAL
Develop a structure that allows users to insert, delete, and find records in constant average time, i.e. O(1).
  – The structure is a (relatively small) table.
  – The table is completely contained in memory.
  – It is implemented by an array.
  – It capitalizes on the ability to access any element of the array in constant time.

General Idea
A stored item needs a data member, called the key, that is used to compute the index value for the item.
  – The key could be an integer, a string, etc.
  – e.g. a name or ID that is part of a large employee structure.
If the size of the array is N, the items stored in the hash table are indexed by values from 0 to N - 1.
Each key is mapped into some number in the range 0 to N - 1.
The mapping is called a hash function.

Example
[Figure: the items mary, dave, joe, and linda are passed, by key, through the hash function and placed in cells of the hash table.]

Hash Function
Determines the position of keys in the array (maps items to cells in the array).
The hash function:
  – must be simple to compute;
  – must distribute the keys evenly among the cells.
If all the keys are known in advance, it is possible to write a perfect hash function; in general the keys are not known, so this is not possible.

An example 1/2
Assume that keys are non-negative integers between 0 and MAX_INT and that the table size N is 5.
  x: key
  hash(x): hashing function
  hash(x) = x mod N, so hash(x) = x % 5

An example 2/2
hash(x) = x mod N, so hash(x) = x % 5
Assume that the keys are 23, 14, 25, 46, 82, in that order.
Steps:
  1. hash(23) = 23 % 5 = 3
  2. hash(14) = 14 % 5 = 4
  3. hash(25) = 25 % 5 = 0
  4. hash(46) = 46 % 5 = 1
  5. hash(82) = 82 % 5 = 2

  index  initial  after 23  after 14  after 25  after 46  after 82
  0      -        -         -         25        25        25
  1      -        -         -         -         46        46
  2      -        -         -         -         -         82
  3      -        23        23        23        23        23
  4      -        -         14        14        14        14
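The steps of this example can be traced with a short C++ sketch. The names `hashMod` and `placeKeys` are illustrative and do not come from the slides, and the sketch assumes that no two keys collide, which holds for these five keys.

```cpp
#include <array>
#include <vector>

// Table size N from the example.
constexpr int N = 5;

// hash(x) = x mod N, for non-negative integer keys.
int hashMod(int x) { return x % N; }

// Stores each key at index hashMod(key). Empty cells hold -1.
// Assumption: the keys do not collide (true for 23, 14, 25, 46, 82).
std::array<int, N> placeKeys(const std::vector<int>& keys) {
    std::array<int, N> table;
    table.fill(-1);
    for (int key : keys) table[hashMod(key)] = key;
    return table;
}
```

Calling `placeKeys({23, 14, 25, 46, 82})` should leave 25, 46, 82, 23, 14 in cells 0 to 4 respectively.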

Hash Functions: Problems
Keys may not be numeric.
The number of possible keys is much larger than the space available in the table.
Different keys may map into the same location (what happens if the keys are 25, 30 and 40 in the previous example?).
  – The hash function is not one-to-one => collision.
  – If there are too many collisions, the performance of the hash table suffers dramatically.

Hash Functions
If the input keys are integers, then simply computing key mod TableSize is a general strategy,
  – unless the keys happen to have some undesirable properties.
=> Make the table size a prime! (Suppose the table size is 10 and all keys have the form 10*i: every key then hashes to cell 0.)
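A small sketch can make the effect of the table size visible. It counts how many distinct cells the keys 0, 10, 20, ... reach under key mod TableSize; the function name `cellsUsed` and the choice of 11 as the comparison prime are assumptions made for illustration.

```cpp
#include <set>

// Counts how many distinct cells the keys 0, 10, 20, ..., 10*(count-1)
// occupy under hash(x) = x % tableSize.
int cellsUsed(int tableSize, int count) {
    std::set<int> cells;
    for (int i = 0; i < count; ++i) cells.insert((10 * i) % tableSize);
    return static_cast<int>(cells.size());
}
```

With 100 such keys, a table of size 10 uses a single cell, while a table of prime size 11 uses all 11 cells.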

Hash Functions
If the input keys are strings, the hash function needs to convert the keys into numeric values.
How can a string be converted to a numeric value?
  – Use the ASCII codes of its characters (values 0 to 127).

Hash Function for Strings 1
Add up the ASCII values of all characters of the key.
Example: tableSize = N and key = "john"
  hashVal = 106 + 111 + 104 + 110 = 431
  index = 431 % N

  int hash(const string &key, int tableSize)
  {
      int hashVal = 0;
      for (int i = 0; i < key.length(); i++)
          hashVal += key[i];
      return hashVal % tableSize;
  }

Hash Function for Strings 1
Easy to implement! However, if the table size is large, the function does not distribute the keys well.
e.g. with table size = 10000 and key length <= 8, the sum of the character codes can only take values between 0 and 8 * 127 = 1016, so most of the table is never used.

  int hash(const string &key, int tableSize)
  {
      int hashVal = 0;
      for (int i = 0; i < key.length(); i++)
          hashVal += key[i];
      return hashVal % tableSize;
  }

Hash Function for Strings 2
Examine only the first 3 characters of the key. English has 26 different letters; the multiplier 27 counts the blank as well.

  int hash(const string &key, int tableSize)
  {
      return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
  }

In theory, 26 * 26 * 26 = 17,576 different combinations (ignoring blanks) can be generated. However, English is not random: only 2,851 different combinations actually occur. Thus this function, although easily computable, is also not appropriate if the hash table is reasonably large.
e.g. TableSize = 10,007 => even without any collisions, only about 28% (2,851/10,007) of the table can be hashed to.

Hash Function for Strings 3

  int hash(const string &key, int tableSize)
  {
      int hashVal = 0;
      for (int i = 0; i < key.length(); i++)
          hashVal = 37 * hashVal + key[i];
      hashVal %= tableSize;
      if (hashVal < 0)        /* in case overflow occurs */
          hashVal += tableSize;
      return hashVal;
  }

Hash Function for Strings 3: example
key = "ali", KeySize = 3, TableSize = 10,007

  i       0    1    2
  key[i]  a    l    i
  ASCII   97   108  105

  // hashVal = 0;
  // for (int i = 0; i < key.length(); i++)
  //     hashVal = 37 * hashVal + key[i];
  hashVal = 0;
  hashVal = 37 * 0 + key[0];                // 0 + 97
  hashVal = 37 * 97 + key[1];               // 37*97 + 108
  hashVal = 37 * (37*97 + 108) + key[2];    // 37*37*97 + 37*108 + 105

hash("ali") = (105 * 1 + 108 * 37 + 97 * 37^2) % 10,007 = 136,894 % 10,007 = 6,803
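The polynomial hash can also be written as a standalone function for experimentation. The sketch below is a variant, not the slide's exact code: the name `hashString` is hypothetical, and it assumes unsigned arithmetic so that overflow wraps around instead of producing a negative value that must be corrected afterwards.

```cpp
#include <string>

// Horner-style evaluation: hashVal = 37 * hashVal + key[i] for each character.
// Assumption of this sketch: unsigned arithmetic, so overflow wraps modulo 2^32
// (for a 32-bit unsigned int) and the result is never negative.
unsigned int hashString(const std::string& key, unsigned int tableSize) {
    unsigned int hashVal = 0;
    for (unsigned char ch : key) hashVal = 37 * hashVal + ch;
    return hashVal % tableSize;
}
```

For the key "ali" and a table size of 10007, the character codes 97, 108, 105 give 37*37*97 + 37*108 + 105 = 136894, and 136894 % 10007 = 6803.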

Hash Function: Collision
Let hash(x) = x % 15.
[Table: example keys x and their values hash(x)]
Storing the keys in the array is straightforward:
[Figure: the array with each key stored at index hash(x)]
Thus delete and find can be done in O(1), and also insert, except…

Hash Function: Collision
What happens when you try to insert x = 65?
  x = 65, hash(x) = 65 % 15 = 5 ???
If an element, when inserted, hashes to the same value as an already inserted element, this is called a collision.

Handling Collisions
Separate Chaining
Open Addressing
  – Linear Probing
  – Quadratic Probing
  – Double Hashing

Separate Chaining
The idea is to keep a list of all elements that hash to the same value.
  – The array elements are pointers to the first nodes of the lists.
  – A new item is inserted at the front of the list.
Advantages:
  – Better space utilization for large items.
  – Simple collision handling: searching a linked list.
  – Overflow: we can store more items than the hash table size.
  – Deletion is quick and easy: deletion from the linked list.

Separate Chaining Example
Keys: 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
hash(key) = key % TableSize
[Figure: the hash table with the keys chained in linked lists]
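A minimal sketch of how these keys end up in chains, using `std::list` in place of a hand-written linked list. The function name `buildChains` is hypothetical, and the table size of 10 used in the note below is an assumption, since the value is not legible in the text.

```cpp
#include <list>
#include <vector>

// Builds a chained table: each cell holds a list of the keys that hash to it.
// A new key goes to the front of its list, as described for separate chaining.
std::vector<std::list<int>> buildChains(const std::vector<int>& keys, int tableSize) {
    std::vector<std::list<int>> table(tableSize);
    for (int key : keys) table[key % tableSize].push_front(key);
    return table;
}
```

With the ten keys above and an assumed table size of 10, cell 1 holds 81 then 1, cell 4 holds 64 then 4, cell 6 holds 36 then 16, cell 9 holds 49 then 9, cells 0 and 5 hold one key each, and cells 2, 3, 7, 8 stay empty.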

Separate Chaining: Operations
Initialization: all entries are set to NULL.
Find:
  – Locate the cell using the hash function.
  – Do a sequential search on the linked list in that cell.
Insertion:
  – Locate the cell using the hash function.
  – If the item does not already exist, insert it as the first item in the list.
Deletion:
  – Locate the cell using the hash function.
  – Delete the item from the linked list.

Separate Chaining: Disadvantages
Parts of the array might never be used.
As chains get longer, search time increases to O(N) in the worst case.
Constructing new chain nodes is relatively expensive (still constant time, but the constant is high).
Is there a way to use the "unused" space in the array, instead of using chains, to make more space?

Analysis of Separate Chaining
Collisions are very likely.
  – How likely are they, and what is the average length of the lists?
Load factor λ:
  – The ratio of the number of elements (N) in a hash table to the table size, i.e. λ = N / TableSize.
  – The average length of a list is also λ.
  – For chaining, λ is not bounded by 1; it can be > 1.

Cost of searching 1/4
Search time (or cost) = time to evaluate the hash function + time to traverse the list.
  – Unsuccessful search or successful search?

Cost of searching 2/4
Unsuccessful search:
  – We have to traverse the entire list, so we need to compare λ nodes on average.

Cost of searching 3/4
Successful search:
  – Time to traverse the list = the node searched for + half the expected number of other nodes.
  – With N = number of elements and M = number of lists, the expected number of other nodes = (N - 1)/M = λ - 1/M, which is essentially λ, since M is presumed large.
  – On average, we need to check half of the other nodes while searching for a certain element.
  – Thus the average search cost = 1 + λ/2.

Cost of searching 4/4
Observation: the table size itself is not important, but the load factor is.
=> For separate chaining, keep λ ≈ 1.
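The cost formulas for separate chaining can be collected into a few helper functions. This is only a sketch of the arithmetic; the function names are illustrative, and the costs count list nodes examined, ignoring the time to evaluate the hash function.

```cpp
// Load factor: number of stored elements divided by the table size.
double loadFactor(int numElements, int tableSize) {
    return static_cast<double>(numElements) / tableSize;
}

// Expected number of nodes compared in an unsuccessful search: lambda.
double unsuccessfulCost(double lambda) { return lambda; }

// Expected number of nodes compared in a successful search: 1 + lambda / 2.
double successfulCost(double lambda) { return 1.0 + lambda / 2.0; }
```

For example, 10 elements in a table of size 10 give λ = 1, so a successful search examines 1.5 nodes on average; doubling the number of elements to 20 raises that to 2 nodes.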

How to implement hashing? EXAMPLE

Implementation: Example p1/3

  class Node {
  public:                        // for simplicity, all members are public
      int key;
      Node *next;
      Node(int a) { key = a; next = NULL; }
  };

  class List {
  public:
      Node *head;
      List() { head = NULL; }

      // Returns true if x is stored in the list.
      bool searchList(int x) {
          Node *p = head;        // declared outside the loop so it can be tested afterwards
          while (p != NULL && p->key != x)
              p = p->next;
          return p != NULL;
      }

      // Inserts x at the front of the list (also correct when the list is empty).
      void insertList(int x) {
          Node *p = new Node(x);
          p->next = head;
          head = p;
      }
  };

Implementation: Example p2/3

  const int TABLE_SIZE = 5;

  // Hash function: generates an index between 0 and TABLE_SIZE - 1.
  int hash(int x);

  class HashTable {
  public:
      HashTable();
      void makeEmpty();       // remove all entries in the table
      void insert(int x);     // insert x into the table
      void remove(int x);     // remove x from the table
  private:
      List table[TABLE_SIZE];
  };

Implementation: Example p3/3

  // Requires <iostream> and access to std::cout.
  void HashTable::insert(int x)
  {
      int value = hash(x);    // table[value] is the corresponding list
      if (table[value].searchList(x) == false)
          table[value].insertList(x);
  }

  void HashTable::remove(int x)
  {
      int value = hash(x);    // table[value] is the corresponding list
      if (table[value].searchList(x) == false)
          cout << "cannot remove";
      else
          ;                   // ???? (the deletion step is left open on the slide)
  }
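The slide leaves the deletion step open. One possible way to fill it in is sketched below as a free function over a singly linked chain. The names `ChainNode` and `removeFromChain` are hypothetical and are not part of the slide's `Node` and `List` classes; the same pointer-to-pointer technique could be adapted as a member function of `List`.

```cpp
#include <cstddef>

// A minimal chain node, standing in for the slide's Node class.
struct ChainNode {
    int key;
    ChainNode* next;
};

// Unlinks and frees the first node holding x.
// Returns false (and changes nothing) if x is not in the chain.
bool removeFromChain(ChainNode*& head, int x) {
    ChainNode** link = &head;              // address of the pointer that leads to the current node
    while (*link != nullptr && (*link)->key != x)
        link = &(*link)->next;             // advance to the next link in the chain
    if (*link == nullptr) return false;    // reached the end: x is absent
    ChainNode* victim = *link;
    *link = victim->next;                  // bypass the node, whether it is the head or not
    delete victim;
    return true;
}
```

Working through a pointer to the link, rather than a pointer to the node, avoids a special case for deleting the head of the chain.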