# Hash Tables: Dr. Li Jiang, School of Computer Science, The University of Adelaide


Hash Tables
Dr. Li Jiang, School of Computer Science, The University of Adelaide

Greetings! So far we have covered abstract data types (with all of the C++ implementation of classes and templates), linked lists (pointer-based, template classes), stacks and queues. Today, we will talk about ...

Overview
- Hash Table ADT
  - Direct addressing and its problem
  - Hash table and hash table ADT operations
- Hash Function
  - Example of using a hash function
  - Benefit of using a hash function
  - Problem of using a hash function
- Collision and collision resolution
  - Collisions
  - Resolution
  - An example of using chaining

So far we have covered abstract data types (with all of the C++ implementation of classes and templates), linked lists (pointer-based, template classes), stacks and queues. I will also have spent a lecture talking about the fundamentals of algorithm efficiency from a Big-O perspective, so you can assume that background as you prepare to teach hash tables. In the past, I've generally done an array-based approach first so that we can talk about collisions, etc., and then worked up to the idea of a bucket hash table.

Learning Objectives
By the end of this lecture, you should be able to:
- Understand and interpret the concepts of hash table and hash function.
- Define a hash function and the hash table operations.
- Understand collisions and one of the collision resolution approaches, chaining.
- Use the chaining approach to resolve collisions.

An Example of A Table (Key, Value)

| Key | Associated Information (airport name or related information) |
|---|---|
| BHM | Birmingham International Airport |
| LGB | Long Beach |
| LAX | Los Angeles International Airport |
| OAK | Oakland |
| IAD | Washington, Dulles International Airport |
| HNL | Honolulu International Airport |
| BOS | Boston, Logan International Airport |
| ACY | Atlantic City International Airport |
| CLE | Cleveland |
| PDX | Portland International Airport |

Suppose that we want you to develop a program that saves this information, inserts information about new airports, deletes an airport, and answers queries about the name of an airport.

- A table is an abstract storage device that contains table entries.
- Each table entry contains a unique key K.
- Each table entry may also contain some information, I, associated with its key.
- A table entry is an ordered pair (K, I).


An Example of A Table (cont.)
To represent and implement the Table ADT, each entry is stored at an index:

| Index | Key | Associated Information |
|---|---|---|
| 1 | BHM | Birmingham International Airport |
| 2 | LGB | Long Beach |
| 3 | LAX | Los Angeles International Airport |
| 4 | OAK | Oakland |
| 5 | IAD | Washington, Dulles International Airport |
| 6 | HNL | Honolulu International Airport |
| 7 | BOS | Boston, Logan International Airport |
| 8 | ACY | Atlantic City International Airport |
| 9 | CLE | Cleveland |
| 10 | PDX | Portland International Airport |

Direct Addressing
Suppose there are n objects to store in the table:
- The range of keys is 0..n-1.
- Keys are distinct.

The idea of direct addressing:
- The table is represented with an array, e.g. airportInfo[0..n-1].
- To insert an object x into the airport information table: airportInfo[i] = x if x ∈ airportInfo and key[x] = i; airportInfo[i] = NULL otherwise.

Efficiency of the algorithms implementing the operations of the Table ADT:
- The insert operation takes O(1) time.
- The search and delete operations also take O(1) time when the key itself is the index; they cost O(n) only if the array has to be scanned for a key.
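The idea of direct addressing can be sketched in a few lines of C++. This is a minimal illustration rather than the lecture's own code: the class name `DirectAddressTable` and its members are hypothetical, the associated information is assumed to be a string, and a null pointer stands in for the NULL slot described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical direct-address table: the key itself is the array index.
class DirectAddressTable {
public:
    explicit DirectAddressTable(std::size_t n) : values_(n), used_(n, false) {}

    // airportInfo[key] = value; one array access, O(1).
    void insert(std::size_t key, const std::string& value) {
        assert(key < values_.size());  // keys must lie in 0..n-1
        values_[key] = value;
        used_[key] = true;
    }

    // Returns a pointer to the stored value, or nullptr for an empty slot; O(1).
    const std::string* search(std::size_t key) const {
        assert(key < values_.size());
        return used_[key] ? &values_[key] : nullptr;
    }

    // Marks the slot as empty again; O(1).
    void remove(std::size_t key) {
        assert(key < values_.size());
        used_[key] = false;
        values_[key].clear();
    }

private:
    std::vector<std::string> values_;  // one slot per possible key
    std::vector<bool> used_;           // false plays the role of NULL
};
```

Every operation touches exactly one slot, which is where the O(1) cost comes from; the price is that the array must have one slot for every possible key.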

If the number of objects and the size of the table are reasonably small, direct addressing is an efficient way to access the data: every operation on a direct-address table takes little time.

When the size of the table is very large: if a table T has size N and N is a large number (e.g. > 10000), direct addressing may be impractical given the memory available on a typical computer. The number of objects actually stored may be very small relative to the space allocated, so most of the space allocated for T would be wasted. When you are not sure how many data items there are, but you know the number will be big, the tendency is to create a much bigger table than needed.


An Example of Table
Assume that the data items of 400 airports need to be processed. Each bucket holds the data item of one airport.
- The key: an airport code with three letters, used to identify each airport.
- If the direct addressing approach is used, the number of different three-letter combinations is 26 × 26 × 26 = 17576 (the possible number of airports).
- The fraction of buckets actually needed: 400/17576 ≈ 2.3%.
- Percentage of the memory allocated for the table that is wasted: about 97.7%.
- Again, the operations on the table will take O(1) to O(n) time.

How can we represent and store the table so that all the operations related to the table can be executed efficiently?

Longer keys make the problem far worse. Pneumonoultramicroscopicsilicovolcanoconiosis (a lung-related disease, 45 letters) gives 26^45 ≈ 4.7e+63 possible keys, and Honorificabilitudinitatibus (27 letters, the longest word in Shakespeare's works) gives 26^27 ≈ 1.6e+38.

Another Example
Assume that a table is needed to store 50 students in a class. The key is a 9-digit Student Identification Number, used to identify each student. If the direct addressing approach is used, we will find that:
- The number of different 9-digit numbers is 10^9.
- The fraction of keys actually needed is 50/10^9 = 0.000005%.
- The percentage of the memory allocated for the table that is wasted is 99.999995%.

Apparently, this is not a good way to do it. A better way is necessary: the hash table.
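The arithmetic behind the airport and student examples can be checked with a short sketch. The function names `keySpace` and `usedFraction` are made up for this illustration, and the code assumes the key space fits in a 64-bit integer, which holds for both examples but not for very long keys.

```cpp
#include <cassert>

// Number of distinct keys of a given length over an alphabet,
// e.g. 26 letters and length 3, or 10 digits and length 9.
// Assumes the result fits in an unsigned long long.
unsigned long long keySpace(unsigned alphabetSize, unsigned keyLength) {
    unsigned long long count = 1;
    for (unsigned i = 0; i < keyLength; ++i) {
        count *= alphabetSize;
    }
    return count;
}

// Fraction of a direct-address table that is actually occupied.
double usedFraction(unsigned long long storedKeys, unsigned long long possibleKeys) {
    assert(possibleKeys > 0);
    return static_cast<double>(storedKeys) / static_cast<double>(possibleKeys);
}
```

With these helpers, `usedFraction(400, keySpace(26, 3))` is roughly 0.0228 and `usedFraction(50, keySpace(10, 9))` is 0.00000005, matching the percentages of a direct-address table that would actually be used.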

Hash Table ADT (Dictionary ADT)
The hash table is a table of elements that have keys, usually represented as (Key, Element) pairs.
- The element can be any type of object.
- A hash function is used for locating a position in the table: h( key ) gives the location of the object containing the key.
- Key ∈ S, where S is usually a huge set of possible keys.
- The elements of the table are the set of data acted upon by the hash table operations.
- A hash table maps a huge set of possible keys onto the indices of N buckets by applying a hash function to each hash code.

Notice: Ns = |S| is the cardinality of S and is much larger than N; n is the actual number of objects that are processed. Ideally, n = N or n = a × N + b, where a and b are small numbers.

The term "hash" comes by way of analogy with its non-technical meaning, to "chop and mix". Indeed, typical hash functions, like the mod operation, "chop" the input domain into many sub-domains that get "mixed" into the output range to improve the uniformity of the key distribution: the input is chopped into pieces that are put in different places, or buckets (the table entries). Notice that uniformity does not mean an injective mapping: because Ns is much larger than N, several keys must share a bucket.

Hash Functions
- The input into a hash function is a key value.
- The output from a hash function is an index of an array (the hash table) where the object containing the key is located.
- The most commonly used hash function is h( hashCode ) = hashCode mod N, where hashCode is the key of an element and N is the number of buckets actually used.
- Defining a hash function is not trivial for some applications: the hashCode is often not obvious, and building a model to compute it is often required.
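The mod-based hash function above can be written directly in C++. This is only a sketch: the function name `hashIndex` and the parameter types are assumptions, and the hash code is taken to be a non-negative integer that has already been computed from the key.

```cpp
#include <cassert>
#include <cstddef>

// h(hashCode) = hashCode mod N, where N is the number of buckets.
std::size_t hashIndex(unsigned long hashCode, std::size_t bucketCount) {
    assert(bucketCount > 0);  // a table needs at least one bucket
    return hashCode % bucketCount;
}
```

For example, `hashIndex(180, 101)` gives 79 and `hashIndex(214, 100)` gives 14. The result always lies in 0..N-1, so it can be used as an array index without further checks.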

Examples of Hash Functions
- h( k ) = k % 101, if k is an integer and it is the key for the associated element. The divisor is usually the size of the table; it is set to a prime when the keys contain a lot of 0s.
- h(CLE) = ? One of the answers will be: h(Airport_code) = p(firstChar) × p(secondChar) × p(thirdChar) % 400, where p is a position function which maps a character to its position value in the alphabet (A = 1, B = 2, C = 3, ..., L = 12, ...). This gives h(CLE) = 3 × 12 × 5 % 400 = 180.

How to define a hash function is not trivial for some applications.
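The airport-code hash from the example can be sketched as follows. The function names `letterPosition` and `airportHash` are hypothetical, and the code assumes three uppercase ASCII letters, with positions counted from A = 1 as the worked value for CLE implies.

```cpp
#include <cassert>
#include <string>

// p(c): position of an uppercase letter in the alphabet, A = 1 ... Z = 26.
int letterPosition(char c) {
    assert(c >= 'A' && c <= 'Z');
    return c - 'A' + 1;
}

// h(code) = p(first) * p(second) * p(third) % 400, as in the example above.
int airportHash(const std::string& code) {
    assert(code.size() == 3);
    return letterPosition(code[0]) * letterPosition(code[1]) * letterPosition(code[2]) % 400;
}
```

Here `airportHash("CLE")` evaluates 3 × 12 × 5 % 400 = 180. One property worth noticing is that a product ignores letter order, so any rearrangement of the same three letters hashes to the same value; this is one reason defining a good hash function is not trivial.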

Hash table ADT operations:
- Insert: to insert an element into a table
- Retrieve: to retrieve an element from the table
- Remove: to remove an element from the table
- Update: to update an element in the table
- Empty: to empty out the hash table

Inserting an Object in A HashTable
Pseudo-code for the insert operation:

    public: bool insert( key, object ) {
        1. Compute the key's hash code.
        2. Compute the hash function to determine the index of the bucket.
        3. Insert the object into the bucket's chain, using the bucket index obtained in step 2.
    }

Notice that the object goes into the bucket's chain, not into the bucket itself. Insertion is done in O( 1 ) time.
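The insert steps above could look like this in C++. It is a sketch under stated assumptions, not the course's implementation: the struct name `ChainedTable` is hypothetical, the hash code is assumed to be an already-computed non-negative integer (step 1 is left to the caller), the object is a string, and `std::list` stands in for the course's linked list class.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table: each bucket owns a linked list (its chain).
struct ChainedTable {
    std::vector<std::list<std::pair<int, std::string> > > buckets;

    explicit ChainedTable(std::size_t n) : buckets(n) { assert(n > 0); }

    // Step 2: map the hash code to a bucket index.
    // Step 3: push the object onto the front of that bucket's chain.
    bool insert(int hashCode, const std::string& object) {
        if (hashCode < 0) return false;  // no valid bucket for this code
        std::size_t index = static_cast<std::size_t>(hashCode) % buckets.size();
        buckets[index].push_front(std::make_pair(hashCode, object));  // O(1)
        return true;
    }
};
```

Pushing at the front of the chain never walks the list, which is why insertion stays O(1) no matter how long the chain already is.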

Inserting an Object in A HashTable
An example of the insert operation: an element (Cleveland) is inserted into a hash table. Suppose we only need to deal with 101 big airports.
- What will the hash function be? h( k ) = k % 101
- h(CLE) = h(180) = 180 % 101 = 79, so Cleveland is stored in bucket 79.
- To find where an element is to be inserted, use the hash function on its key: if the key value is 180, the element is to be stored at index 79 of the array.
- Insertion is done in O( 1 ) time.

A good hash function and implementation algorithm are essential for good hash table performance, but may be difficult to achieve.

Benefit of Using a Hash Function
- Using a hash table, we simply have a function which provides us with the index of the array where the object containing the key is located.
- The alternative is expensive: if we have a million objects with a (key, value) structure, it may take a long time to search a regular array or a linked list for a specific key (on average, we might compare 500,000 key values).

Problem of Using a Hash Function
Consider the hash function h( k ) = k % 100.
- Suppose that a key value of 214 is used for an object, and the object is stored at index 14.
- A key value of 114 is used for a second object; the result of the hash function is 14, but index 14 is already occupied.
- This is called a collision.

A collision is the circumstance where several keys hash to the same bucket. This happens when: h( hashCode1 ) == h( hashCode2 ). How shall we solve this problem?
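The collision condition h( hashCode1 ) == h( hashCode2 ) can be tested with a one-line check. The function name `collides` is made up for this sketch, and keys are assumed to be non-negative so that `%` behaves like the mathematical mod.

```cpp
#include <cassert>

// True when two keys land in the same bucket under h(k) = k % bucketCount.
bool collides(int key1, int key2, int bucketCount) {
    assert(bucketCount > 0);
    assert(key1 >= 0 && key2 >= 0);  // assumption: non-negative keys only
    return key1 % bucketCount == key2 % bucketCount;
}
```

With 100 buckets, `collides(214, 114, 100)` is true because both keys map to index 14, while `collides(214, 115, 100)` is false.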

How are Collisions Resolved?

Searching an Object in A HashTable
Pseudo-code for the retrieve (search, find) operation:

    public: bool retrieve( DataType & key ) {
        1. Hash the key: find the hash code and compute the hash function with the given key to obtain the index of the bucket.
        2. Search through the linked list specified by the bucket index number.
        3. If you find the entry with the right key, return it through the parameter and return true; otherwise return false.
    }

A search for an element can be done in O( 1 ) time on average, as long as the chains stay short.
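The retrieve steps can be sketched as a free function over a vector of chains. The names `Entry`, `Buckets` and `retrieve` are assumptions for this illustration, as is the choice to return a pointer (null when the key is absent) instead of the `bool` plus reference parameter used in the pseudo-code; keys are assumed to be non-negative integers.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, std::string> Entry;          // (key, associated information)
typedef std::vector<std::list<Entry> > Buckets;     // one chain per bucket

// Returns a pointer to the information stored under `key`, or nullptr if absent.
const std::string* retrieve(const Buckets& buckets, int key) {
    assert(!buckets.empty() && key >= 0);
    // Step 1: hash the key to obtain the bucket index.
    std::size_t index = static_cast<std::size_t>(key) % buckets.size();
    // Step 2: walk the chain that belongs to that bucket.
    for (const Entry& entry : buckets[index]) {
        // Step 3: stop at the entry with the right key.
        if (entry.first == key) return &entry.second;
    }
    return nullptr;  // Step 3: no entry with this key
}
```

The cost is one hash computation plus a walk along a single chain, so it depends on the chain length rather than on the total number of stored objects.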

Searching an Object in A HashTable
An example of the search operation:
- Suppose our hash function is: h( k ) = k % 100
- We wish to search for the object containing key value 214.
- If k is set to 214 in the hash function above, the result is 14.
- The object containing key 214 is stored at index 14 of the array (hash table).

An Example of HashTable Class

    template <class DataType>
    class HashTable
    {
    public:
        HashTable( int (*hf)(const DataType &), int s );
        bool insert( const DataType & newObject );  // returns true if successful;
                                                    // returns false if invalid index was returned from hash function
        bool retrieve( DataType & retrieved );      // retrieve the item for the given key
        bool remove( DataType & removed );          // remove the item for the given key
        bool update( DataType & updateObject );     // update the item for the given key
        void makeEmpty( );                          // empty out the hash table
    private:
        Array< LinkedList<DataType> > table;
        int (*hashfunc)(const DataType &);          // pointer to hash function
    };
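One way the member functions of this class might be filled in is sketched below. Several things are assumptions rather than the course's code: `std::vector` and `std::list` replace the course's `Array` and `LinkedList` classes, the hash function is taken to return the bucket index directly, `DataType` is assumed to support `==` on its key, `update` is left out for brevity, and `modSeven` is a made-up sample hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

template <class DataType>
class HashTable {
public:
    HashTable(int (*hf)(const DataType&), int s) : table(checkedSize(s)), hashfunc(hf) {}

    // Returns false if the hash function yields an index outside the table.
    bool insert(const DataType& newObject) {
        int index = hashfunc(newObject);
        if (!valid(index)) return false;
        table[static_cast<std::size_t>(index)].push_front(newObject);
        return true;
    }

    // On success, overwrites `retrieved` with the stored object that equals it.
    bool retrieve(DataType& retrieved) const {
        int index = hashfunc(retrieved);
        if (!valid(index)) return false;
        for (const DataType& item : table[static_cast<std::size_t>(index)]) {
            if (item == retrieved) { retrieved = item; return true; }
        }
        return false;
    }

    // Removes the first stored object equal to `removed`, copying it out first.
    bool remove(DataType& removed) {
        int index = hashfunc(removed);
        if (!valid(index)) return false;
        std::list<DataType>& chain = table[static_cast<std::size_t>(index)];
        for (typename std::list<DataType>::iterator it = chain.begin(); it != chain.end(); ++it) {
            if (*it == removed) { removed = *it; chain.erase(it); return true; }
        }
        return false;
    }

    // Empties every chain but keeps the buckets.
    void makeEmpty() {
        for (std::list<DataType>& chain : table) chain.clear();
    }

private:
    static std::size_t checkedSize(int s) {
        assert(s > 0);  // the table needs at least one bucket
        return static_cast<std::size_t>(s);
    }
    bool valid(int index) const {
        return index >= 0 && static_cast<std::size_t>(index) < table.size();
    }

    std::vector<std::list<DataType> > table;
    int (*hashfunc)(const DataType&);  // pointer to hash function
};

// Sample hash function for integer keys and a 7-bucket table (illustration only).
int modSeven(const int& key) { return key % 7; }
```

A table for the chaining walkthrough could then be created with `HashTable<int> t(modSeven, 7);`. Passing the hash function as a pointer keeps the class independent of how keys are hashed, at the cost of trusting the caller to return indices that fit the table size.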

An Example of Using Chaining
A hash table with buckets 0 to 6, which is initially empty. Every element is a LinkedList object. Only the start pointer of each LinkedList object is shown, which is set to NULL at the beginning. The hash function is: h( k ) = k % 7

An Example of Using Chaining (cont.)
The hash function is: h( k ) = k % 7. INSERT object with key 31: h(31) = 31 % 7 = 3, so the object goes into the list at index 3. Note: the whole object is stored but only the key value is shown. Table state: 3: 31.

Example Using Chaining (cont.)
The hash function is: h( k ) = k % 7. INSERT object with key 9: 9 % 7 is 2, so the object goes into the list at index 2. Table state: 2: 9; 3: 31.

Example Using Chaining (cont.)
INSERT object with key 36: 36 % 7 is 1, so the object goes into the list at index 1. Table state: 1: 36; 2: 9; 3: 31.

Example Using Chaining (cont.)
INSERT object with key 42: 42 % 7 is 0, so the object goes into the list at index 0. Table state: 0: 42; 1: 36; 2: 9; 3: 31.

Example Using Chaining (cont.)
INSERT object with key 46: 46 % 7 is 4, so the object goes into the list at index 4. Table state: 0: 42; 1: 36; 2: 9; 3: 31; 4: 46.

Example Using Chaining (cont.)
INSERT object with key 20: 20 % 7 is 6, so the object goes into the list at index 6. Table state: 0: 42; 1: 36; 2: 9; 3: 31; 4: 46; 6: 20.

Example Using Chaining (cont.)
INSERT object with key 2: 2 % 7 is 2. But an object (key 9) has already been inserted into the list at index 2, so a COLLISION occurs! How to resolve this? Chaining inserts the new element at the BEGINNING of the list at index 2. Table state: 0: 42; 1: 36; 2: 2 → 9; 3: 31; 4: 46; 6: 20.

Example Using Chaining (cont.)
INSERT object with key 24: 24 % 7 is 3. The list at index 3 already holds key 31, so the new object is inserted at the beginning of that list. Table state: 0: 42; 1: 36; 2: 2 → 9; 3: 24 → 31; 4: 46; 6: 20.
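The sequence of inserts in this walkthrough can be replayed with a short sketch. The function name `buildChains` is hypothetical, `std::list` stands in for the LinkedList objects on the slides, and only the keys are stored, mirroring the way the slides show only the key of each object.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Builds a chained table with h(k) = k % bucketCount, inserting each key
// at the BEGINNING of its bucket's list, as in the walkthrough.
std::vector<std::list<int> > buildChains(const std::vector<int>& keys, std::size_t bucketCount) {
    assert(bucketCount > 0);
    std::vector<std::list<int> > table(bucketCount);
    for (int key : keys) {
        assert(key >= 0);  // assumption: non-negative keys only
        table[static_cast<std::size_t>(key) % bucketCount].push_front(key);
    }
    return table;
}
```

Calling `buildChains({31, 9, 36, 42, 46, 20, 2, 24}, 7)` leaves the list at index 2 holding 2 then 9, and the list at index 3 holding 24 then 31, which matches the final table state of the example.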

Example Using Chaining (cont.)
Suppose that all objects have been stored in the linked lists. How do we find an object? E.g. FIND the object with key 9:
- 9 % 7 is 2, so we search the linked list at index 2 for the object with key 9. (Remember: the whole object is stored, only the key is shown.)
- Does the first object (key 2) contain key 9? No, so go on to the next object.
- Does the next object (key 9) contain key 9? YES, found it! Return the object.

Table state: 0: 42; 1: 36; 2: 2 → 9; 3: 24 → 31; 4: 46; 6: 20.
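The walk along one chain during FIND can be sketched as a small function that also counts how many objects are examined. The name `comparisonsToFind` and the convention of returning -1 for a missing key are choices made for this illustration only.

```cpp
#include <cassert>
#include <list>

// Walks one chain from the front. Returns how many objects were examined
// up to and including the match, or -1 if the chain does not hold the key.
int comparisonsToFind(const std::list<int>& chain, int key) {
    assert(key >= 0);  // assumption: keys in this example are non-negative
    int examined = 0;
    for (int stored : chain) {
        ++examined;                          // "Does this object contain the key?"
        if (stored == key) return examined;  // yes: found it
    }
    return -1;  // reached the end of the chain without a match
}
```

For the list at index 2 of the example (2 then 9), finding key 9 takes two comparisons and finding key 2 takes one. The search cost is set by the length of a single chain, which is why a hash function that spreads keys evenly matters.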

Summary
- Hash Table ADT
  - Direct addressing and its problem
  - Hash table and hash table ADT operations
- Hash Function
  - Example of using a hash function
  - Benefit of using a hash function
  - Problem of using a hash function
- Collision and collision resolution
  - Collisions
  - Resolution
  - An example of using chaining

END. Thank you! Look forward to seeing you again!