Course: Programming II - Abstract Data Types: Hash Tables


Slide 1: The ADT Hash Table

What is a table? A collection of items, each including several pieces of information, one of which is a search key. A table is another example of an ADT whose insertion, deletion and retrieval of items are made by value and not by position. Items may be ordered with respect to the search key, and search keys may or may not be unique.

Example (City is the search key):

    City      Country    Population
    Cairo     Egypt
    London    England
    Paris     France
    Rome      Italy

Retrieval is efficient when it is based on the search key value, e.g. "Retrieve the population of London."

Slide 2: Access Procedures for Tables

    i.   createTable()
         // post: Creates an empty table.
    ii.  isEmpty()
         // post: Determines whether a table is empty.
    iii. getSize()
         // post: Returns the number of entries in the table.
    iv.  add(key, newElem)
         // post: Adds the pair (key, newElem) to the table in its proper order according to its key.
    v.   remove(key)
         // post: Removes from the table the entry with the given key. Returns either the value
         // post: associated with the given key or null if no such entry exists.
    vi.  getValue(key)
         // post: Retrieves the value that corresponds to a given key from the table. Returns null
         // post: if no such entry exists in the table.
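The six access procedures can be written down as a Java interface. The sketch below is only an illustration: the names `Table` and `SortedTable` are hypothetical, createTable() is played by the constructor, and the implementation simply delegates to `java.util.TreeMap` to keep entries in key order, assuming search keys are unique (adding an existing key replaces its value).

```java
import java.util.TreeMap;

// Hypothetical Java rendering of access procedures (i)-(vi).
// createTable() corresponds to the constructor of an implementing class.
interface Table<K extends Comparable<K>, V> {
    boolean isEmpty();          // (ii)
    int getSize();              // (iii)
    void add(K key, V newElem); // (iv)  assumes unique keys: re-adding replaces
    V remove(K key);            // (v)   removed value, or null if absent
    V getValue(K key);          // (vi)  value for key, or null if absent
}

// Illustrative implementation that keeps entries ordered by key by
// delegating to java.util.TreeMap (a balanced binary search tree).
class SortedTable<K extends Comparable<K>, V> implements Table<K, V> {
    private final TreeMap<K, V> entries = new TreeMap<>();

    public boolean isEmpty() { return entries.isEmpty(); }
    public int getSize() { return entries.size(); }
    public void add(K key, V newElem) { entries.put(key, newElem); }
    public V remove(K key) { return entries.remove(key); }
    public V getValue(K key) { return entries.get(key); }
}
```

Because `TreeMap` is a balanced search tree, each operation here costs O(log n); hashing aims to do better.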

Slide 3: Introducing the ADT Hash Table

A binary search tree is a particular type of binary tree that enables easy search for specific items. However, the efficiency of its search depends on the shape of the tree: even in the best case (a balanced tree), retrieval, insertion and deletion in a tree of 10,000 items still require about log2(10,000) ≈ 13 steps, i.e. O(log n) time.

Is there a more efficient way of storing, searching and retrieving data?

Hashing, the basic idea: an address calculator takes an element (or its search key) and computes the array index where the element belongs.

The add operation:

    add(searchKey, newElem) {
        i = the array index that the address calculator gives us for the searchKey of newElem;
        table[i] = newElem;
    }

Method add is O(1): it requires constant time, provided the address calculator itself runs in constant time.

Slide 4: The Retrieve and Remove Operations

getValue(key) and remove(key) are also O(1) operations (they require constant time).

The retrieve operation (pseudocode):

    getValue(searchKey)
    // post: Returns the element that has a matching searchKey.
    // post: If not found, it returns null.
        i = the array index that the address calculator gives us for searchKey;
        if (table[i] != null && table[i].getKey() equals searchKey)
            return table[i].getValue();
        else
            return null;

The remove operation (pseudocode):

    remove(searchKey)
    // post: Deletes the element that has a matching searchKey
    // post: and returns true; otherwise returns false.
        i = the array index that the address calculator gives us for searchKey;
        if (table[i] != null && table[i].getKey() equals searchKey) {
            delete the element from table[i];   // i.e. table[i] = null
            return true;
        }
        else
            return false;
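The idealized address-calculator table of slides 3 and 4 can be sketched in Java as follows. This is a minimal sketch under a strong, hypothetical assumption: keys are non-negative ints smaller than the capacity, so the address calculator is just the identity and no two keys ever share a slot. The class and method names are illustrative only; storing the key next to the value lets getValue and remove perform the same key check as the pseudocode.

```java
// Minimal sketch of the idealized "address calculator" table.
// Hypothetical assumption: 0 <= key < capacity, so index = key.
class DirectAddressTable<V> {
    // Each occupied slot remembers its key so lookups can verify it.
    private static final class Entry<V> {
        final int key;
        final V value;
        Entry(int key, V value) { this.key = key; this.value = value; }
    }

    private final Entry<V>[] table;

    @SuppressWarnings("unchecked")
    DirectAddressTable(int capacity) {
        table = (Entry<V>[]) new Entry[capacity];
    }

    // Hypothetical address calculator: the key itself is the index.
    private int addressOf(int searchKey) { return searchKey; }

    // O(1): compute the index and store the element there.
    void add(int searchKey, V newElem) {
        table[addressOf(searchKey)] = new Entry<>(searchKey, newElem);
    }

    // O(1): returns the value stored under searchKey, or null.
    V getValue(int searchKey) {
        Entry<V> e = table[addressOf(searchKey)];
        return (e != null && e.key == searchKey) ? e.value : null;
    }

    // O(1): deletes the entry for searchKey; true if something was removed.
    boolean remove(int searchKey) {
        int i = addressOf(searchKey);
        if (table[i] != null && table[i].key == searchKey) {
            table[i] = null;
            return true;
        }
        return false;
    }
}
```

With the identity calculator the key check can never fail; it only starts to matter once the calculator is a real hash function that may send different keys to the same index.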

Slide 5: Definition

The ADT Hash Table is an array (the array table) of elements, each possibly associated with a search key that is unique to it, together with a hash function and access procedures.
- The hash function determines the location in the table of a new element from its value (or from its search key, if any). In the same way, it locates the position of an existing element.
- A hash table can also be empty: no element is stored in the array table.
- The access procedures include insertion, deletion and retrieval of an element by means of the hash function.

The hash function takes a search key and maps it into an integer array index. A perfect hash function is an ideal function that maps each search key into a unique array index.

Slide 6: Understanding How Hash Functions Work

Example: the table is a directory database of fewer than 10,000,000 people, each stored with his/her telephone number as the search key. The telephone number is of type int.

Storing each person at the index given by the full telephone number would require 10,000,000 memory locations.

However, numbers are regional: within one region all numbers share the same leading digits, so we can store the person whose number ends in 4567 in table[4567], which requires only 10,000 memory locations.

Hash function: takes a value of the search key and transforms it into an integer used as an array index.

Note: the above is an example of a perfect hash function, provided that all the numbers belong to the same region.
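A sketch of the regional-number idea, assuming (hypothetically) 7-digit numbers in one region that differ only in their last four digits; the hash function keeps those four digits with `% 10_000`. The class name `PhoneDirectory` and the sample number 5554567 are made up for illustration.

```java
// Hypothetical directory for one region: all numbers share the same
// leading digits, so the last four digits identify a person uniquely.
class PhoneDirectory {
    static final int TABLE_SIZE = 10_000;          // one slot per 4-digit suffix
    private final String[] table = new String[TABLE_SIZE];

    // Hash function: keep only the last four digits of the number.
    static int h(int phoneNumber) { return phoneNumber % TABLE_SIZE; }

    void add(int phoneNumber, String person) { table[h(phoneNumber)] = person; }

    // Returns the person stored for this number, or null if the slot is empty.
    String getValue(int phoneNumber) { return table[h(phoneNumber)]; }
}
```

The function is perfect only under the single-region assumption: two numbers from different regions with the same last four digits would land in the same slot.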

Slide 7: The Collision Problem

A perfect hash function is a hash function that maps each search key into a unique array index. Such a function is possible only if we know all the possible search key values in advance.

In practice, we don't know all possible search key values, so a hash function can map two or more search keys into the same integer array index: x ≠ y but h(x) = h(y) = i.

A collision occurs when the hash function tells two or more elements, with search keys x and y, to be stored in the same array location table[i], where i = h(x) = h(y). The two search keys x and y are said to have collided. One way of handling collisions is to provide appropriate collision-resolution schemes.

Basic requirements for a good hash function:
- be easy and fast to compute
- place elements evenly throughout the hash table (i.e. minimize collisions)
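To see a collision concretely, the sketch below uses h(x) = x mod 101 as an assumed hash function and groups keys by the index they hash to. The keys 7597 and 7698 differ by exactly 101, so both map to index 22. `CollisionDemo` is a hypothetical helper, not part of the course code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollisionDemo {
    static final int TABLE_SIZE = 101; // assumed table size

    // h(x) = x mod TABLE_SIZE, for non-negative keys
    static int h(int key) { return key % TABLE_SIZE; }

    // Groups keys by their hash index and keeps only the groups with more
    // than one key: each remaining group is a set of collided keys.
    static Map<Integer, List<Integer>> collisions(int[] keys) {
        Map<Integer, List<Integer>> byIndex = new HashMap<>();
        for (int k : keys) {
            byIndex.computeIfAbsent(h(k), i -> new ArrayList<>()).add(k);
        }
        byIndex.values().removeIf(group -> group.size() < 2);
        return byIndex;
    }
}
```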

Slide 8: Examples of Hash Functions (1)

Assume that hash functions take integers as search keys.

Selecting digits: given a search key composed of a certain number of digits, the hash function picks the digits at specific places in the key, e.g. h(001364825) = 35 (select the fourth and the last digits).

Folding: given a search key, the hash function defines the index by adding up all the digits of the key, e.g. h(001364825) = 0 + 0 + 1 + 3 + 6 + 4 + 8 + 2 + 5 = 29. Alternatively, it can first group the digits and then add up the groups, e.g. h(001364825) = 001 + 364 + 825 = 1190.

These functions are simple and fast, but generally they do not distribute the elements evenly throughout the hash table.

Note: you can apply more than one hash function to a search key.
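The digit-based functions can be sketched in Java as follows. Keys are passed as digit strings (an assumption) so that leading zeros, as in the key 001364825, are kept; the class and method names are hypothetical.

```java
class DigitHashes {
    // Selecting digits: the 4th and the last digit of the key.
    static int selectDigits(String key) {
        int fourth = key.charAt(3) - '0';
        int last = key.charAt(key.length() - 1) - '0';
        return fourth * 10 + last;
    }

    // Folding: add up every digit of the key.
    static int foldDigits(String key) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c - '0';
        }
        return sum;
    }

    // Folding with grouping: split the key, from the left, into groups of
    // groupSize digits and add the groups up as numbers.
    static int foldGroups(String key, int groupSize) {
        int sum = 0;
        for (int start = 0; start < key.length(); start += groupSize) {
            int end = Math.min(start + groupSize, key.length());
            sum += Integer.parseInt(key.substring(start, end));
        }
        return sum;
    }
}
```

Applying more than one function, for example `foldGroups(key, 3) % 101`, brings the folded value into the index range of a 101-slot table.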

Slide 9: Examples of Hash Functions (2)

Modulo arithmetic: given a search key x, the hash function defines the index as x modulo some fixed number, typically the table size: h(x) = x mod tableSize. We can distribute the elements more evenly in the table if tableSize is a prime number.

Converting a character string to an integer: if the search key is a string, we can first convert it into an integer and then apply a hash function to that integer. Different ways of converting strings into numbers give hash functions of different quality. For example, if we simply add up the ASCII values of the letters, h(NOTE) = 78 + 79 + 84 + 69 = 310 = h(TONE): every rearrangement of the same letters collides, so we get lots of collisions.
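The sketch below contrasts two string conversions. `sumAscii` is the naive conversion that sends NOTE and TONE to the same value; `polyHash` is a common alternative that we swap in here (it is not from the slides): it reads the string as a base-31 number and applies Horner's rule modulo tableSize, so the order of the characters matters. The names and the base 31 are assumptions.

```java
class StringHashes {
    // Modulo arithmetic: h(x) = x mod tableSize (x assumed non-negative).
    static int modHash(int key, int tableSize) { return key % tableSize; }

    // Naive conversion: add up the ASCII codes of the characters.
    // Anagrams such as NOTE and TONE always get the same value.
    static int sumAscii(String s) {
        int sum = 0;
        for (int k = 0; k < s.length(); k++) {
            sum += s.charAt(k);
        }
        return sum;
    }

    // Alternative (not from the slides): treat the string as digits in
    // base 31 and reduce mod tableSize at every step (Horner's rule), so
    // the intermediate value stays below 31 * tableSize + 65536.
    static int polyHash(String s, int tableSize) {
        int h = 0;
        for (int k = 0; k < s.length(); k++) {
            h = (h * 31 + s.charAt(k)) % tableSize;
        }
        return h;
    }
}
```

With tableSize = 101, `polyHash` sends NOTE to 5 and TONE to 98, whereas `sumAscii` gives 310 for both.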

Slide 10: Collision-Resolution Schemes

There are two main approaches:
1. Assign another location within the hash table to the new, collided element.
2. Change the structure of the hash table so that each table[i] can accommodate more than one element.

Approach 1, open addressing schemes: in case of a collision, probe for some other empty location in which to place the element. The sequence of locations probed by the add procedure must be efficiently reproducible by the remove and retrieve procedures.

Linear probing: with tableSize = 101, the key 7597 hashes to i = 7597 mod 101 = 22; if table[22] is occupied, probe i+1, i+2, i+3, ... (wrapping around to the start of the table).
- Table locations must be in one of three states: empty, deleted or occupied; otherwise, after a deletion, the retrieve operation might stop prematurely.
- Elements tend to cluster together: parts of the table become too dense while others remain relatively empty, which makes hashing less efficient.
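Here is a minimal Java sketch of linear probing with the three slot states, assuming non-negative int keys, String values and a fixed-size table without resizing; all names are hypothetical. The point to notice is that remove marks a slot DELETED rather than EMPTY, so a later search keeps probing past it.

```java
import java.util.Arrays;

class LinearProbingTable {
    private enum State { EMPTY, OCCUPIED, DELETED }

    private final int[] keys;
    private final String[] values;
    private final State[] states;

    LinearProbingTable(int tableSize) {
        keys = new int[tableSize];
        values = new String[tableSize];
        states = new State[tableSize];
        Arrays.fill(states, State.EMPTY);
    }

    private int h(int key) { return key % keys.length; }

    // Returns the slot holding key, or -1. Probing skips DELETED slots and
    // stops only at an EMPTY slot or after visiting every slot once.
    private int find(int key) {
        for (int step = 0; step < keys.length; step++) {
            int j = (h(key) + step) % keys.length;   // i, i+1, i+2, ... wrapping
            if (states[j] == State.EMPTY) return -1;
            if (states[j] == State.OCCUPIED && keys[j] == key) return j;
        }
        return -1;
    }

    // Inserts (or replaces) a value; returns false if the table is full.
    boolean add(int key, String value) {
        int existing = find(key);
        if (existing >= 0) { values[existing] = value; return true; }
        for (int step = 0; step < keys.length; step++) {
            int j = (h(key) + step) % keys.length;
            if (states[j] != State.OCCUPIED) {       // reuse EMPTY or DELETED
                keys[j] = key;
                values[j] = value;
                states[j] = State.OCCUPIED;
                return true;
            }
        }
        return false;
    }

    String getValue(int key) {
        int j = find(key);
        return j >= 0 ? values[j] : null;
    }

    boolean remove(int key) {
        int j = find(key);
        if (j < 0) return false;
        states[j] = State.DELETED;   // not EMPTY, so later probes continue past it
        values[j] = null;
        return true;
    }

    // Helper for illustration: the slot where key currently lives, or -1.
    int slotOf(int key) { return find(key); }
}
```

With a table of size 101, the keys 7597, 7698 and 7799 all hash to 22 and land in slots 22, 23 and 24. After removing 7698, a search for 7799 still succeeds because slot 23 is DELETED rather than EMPTY, and a new key that hashes to 22 reuses slot 23.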

Slide 11: Open Addressing Schemes (continued)

Double hashing: the probe sequence is not sequential but is defined using the search key. The hash function h computes the initial index, and a second function h' computes the size of the probing step from the same search key. The function h' must satisfy:
- h'(key) ≠ 0
- h' ≠ h

Example with a table of size 11: h(key) = key mod 11 and h'(key) = 7 - (key mod 7).
- 58: h(58) = 3, so 58 is stored in table[3].
- 14: h(14) = 3, a collision; h'(14) = 7, so the next probe is i = (3 + 7) mod 11 = 10, and 14 is stored in table[10].
- 91: h(91) = 3, a collision; h'(91) = 7, so the probes are (3 + 7) mod 11 = 10 (another collision) and then i = (3 + 7 + 7) mod 11 = 6, where 91 is stored.
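The double-hashing example can be replayed with the sketch below, which hard-codes h(key) = key mod 11 and h'(key) = 7 - (key mod 7) from the slide; deletion is omitted, and the class name is hypothetical. Because 11 is prime and every step lies between 1 and 7, the probe sequence visits every slot before giving up.

```java
class DoubleHashingTable {
    static final int TABLE_SIZE = 11;                          // prime, as in the example
    private final Integer[] table = new Integer[TABLE_SIZE];   // null = empty slot

    static int h(int key) { return key % TABLE_SIZE; }  // initial index
    static int h2(int key) { return 7 - (key % 7); }    // step size, 1..7, never 0

    // k-th probe: h(key) + k * h'(key), wrapped around the table.
    private static int probe(int key, int k) {
        return (h(key) + k * h2(key)) % TABLE_SIZE;
    }

    // Returns the index where key was stored, or -1 if no free slot was found.
    int add(int key) {
        for (int k = 0; k < TABLE_SIZE; k++) {
            int i = probe(key, k);
            if (table[i] == null) { table[i] = key; return i; }
        }
        return -1;
    }

    // Follows the same probe sequence; an empty slot means the key is absent.
    int indexOf(int key) {
        for (int k = 0; k < TABLE_SIZE; k++) {
            int i = probe(key, k);
            if (table[i] == null) return -1;
            if (table[i] == key) return i;
        }
        return -1;
    }
}
```

Inserting 58, 14 and 91 in that order returns the indices 3, 10 and 6.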

Slide 12: Restructuring the Hash Table

Alter the structure of the hash table so that more than one element can be stored at the same location.

Buckets: the array table is defined so that each location table[i] is itself an array, called a bucket. Limitation: how do we choose the size of each bucket?

Separate chaining: the array table is defined as an array of linked lists. Each location table[i] is a reference to a linked list, called the chain, of all the elements that have collided at the same integer i.

    public class ChainNode {
        private keyedElem elem;
        private ChainNode next;
        // …
    }

    public class HashTable {
        private final int TABLESIZE = 101;
        private ChainNode[] table;
        private int size;
        // …
    }

Slide 13: A Separate Chaining Structure

[Figure: the hash table, with its size and its array table, where each location of the hash table contains a reference to a linked list.]

Slide 14: Implementing the Hash Table with Separate Chaining

hashIndex is a protected procedure of the class HashTable that maps a search key to an array index.

getValue (pseudocode):

    getValue(searchKey) {
        i = hashIndex(searchKey);
        node = table[i];
        while ((node != null) && !(node.getElem().getKey() equals searchKey)) {
            node = node.getNext();
        }
        if (node != null)
            return node.getElem();
        else
            return null;
    }

add (pseudocode):

    add(key, newElem) {
        i = hashIndex(key);
        node = reference to a new node containing newElem;
        node.setNext(table[i]);   // the new node becomes the head of chain i
        table[i] = node;
        size++;
    }
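A runnable Java version of the chaining outline from slides 12 and 14 might look like the sketch below. It makes assumptions the slides leave open: nodes store the key and value directly instead of a keyedElem, hashIndex reduces `Object.hashCode()` modulo TABLESIZE with `Math.floorMod` (so negative hash codes still give a valid index), add does not check for duplicate keys, and remove follows slide 2 by returning the removed value or null.

```java
class ChainNode<K, V> {
    private final K key;
    private final V value;
    private ChainNode<K, V> next;

    ChainNode(K key, V value) { this.key = key; this.value = value; }

    K getKey() { return key; }
    V getValue() { return value; }
    ChainNode<K, V> getNext() { return next; }
    void setNext(ChainNode<K, V> next) { this.next = next; }
}

class HashTable<K, V> {
    private static final int TABLESIZE = 101;
    @SuppressWarnings("unchecked")
    private final ChainNode<K, V>[] table = (ChainNode<K, V>[]) new ChainNode[TABLESIZE];
    private int size;

    // floorMod keeps the index in 0..TABLESIZE-1 even for negative hash codes.
    protected int hashIndex(K searchKey) {
        return Math.floorMod(searchKey.hashCode(), TABLESIZE);
    }

    // Insert at the front of chain i: O(1).
    public void add(K key, V newElem) {
        int i = hashIndex(key);
        ChainNode<K, V> node = new ChainNode<>(key, newElem);
        node.setNext(table[i]);
        table[i] = node;
        size++;
    }

    // Walk chain i until the key matches or the chain ends.
    public V getValue(K searchKey) {
        ChainNode<K, V> node = table[hashIndex(searchKey)];
        while (node != null && !node.getKey().equals(searchKey)) {
            node = node.getNext();
        }
        return node != null ? node.getValue() : null;
    }

    // Unlink the first node with a matching key; return its value or null.
    public V remove(K searchKey) {
        int i = hashIndex(searchKey);
        ChainNode<K, V> prev = null;
        ChainNode<K, V> node = table[i];
        while (node != null && !node.getKey().equals(searchKey)) {
            prev = node;
            node = node.getNext();
        }
        if (node == null) return null;
        if (prev == null) table[i] = node.getNext();   // removing the head
        else prev.setNext(node.getNext());
        size--;
        return node.getValue();
    }

    public int getSize() { return size; }
    public boolean isEmpty() { return size == 0; }
}
```

Since `Integer.hashCode()` returns the integer itself, the keys 7 and 108 share chain 7 in a 101-slot table, which is a quick way to exercise the chain-walking code.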

Slide 15: Summary
- Hashing calculates where in an array a data element should be, rather than searching for it. It allows efficient retrievals, insertions and deletions.
- A hash function should be easy to compute and should scatter the elements evenly throughout the table.
- Collisions occur when two different search keys hash into the same array location.
- There are two strategies to resolve collisions: probing (open addressing) and chaining.

Slide 16: Conclusion

What is an Abstract Data Type?
- Introduce individual ADTs
- Understand the data type abstractly
- Define the specification of the data type
- Use the data type in small applications, based solely on its specification
- Implement the data type: static approach and dynamic approach
- Some fundamental algorithms for some ADTs: pre-order, in-order and post-order traversal, heapsort

ADTs covered: Lists, Stacks, Queues, Trees, Heaps, AVL Trees, Hash Tables


