CS203 Lecture 14. Hashing


1 CS203 Lecture 14

2 Hashing
An object may contain an arbitrary amount of data, and searching a data structure that contains many large objects is expensive. Suppose your collection of Strings stores the text of various books. When you add a book, you need to make sure you are preserving the Set definition, i.e., that no book already in the Set has the same text as the one you are adding. A hash function maps each datum to a value of a fixed and manageable size. This reduces the search space and makes searching less expensive. Hash functions must be deterministic, since when we search for an item we search for its hashed value: if an identical item is in the collection, it must have received the same hash value.
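The duplicate-book check above can be sketched in Java as follows. This is a minimal illustration, assuming each book is a plain `String` and using the real `String.hashCode()` method; the class `BookCollection` and its methods are hypothetical. It still scans every stored book, but it compares cheap `int` hash codes first and runs the expensive full-text comparison only when the codes match. Because `hashCode()` is deterministic, two strings with identical text always get the same code, so no duplicate can slip through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a Set-like collection of book texts that uses hash codes
// to avoid most full-text comparisons. Not the textbook's implementation.
class BookCollection {
    private final List<String> texts = new ArrayList<>();
    private final List<Integer> hashes = new ArrayList<>(); // hashes.get(i) is texts.get(i).hashCode()

    // Adds the text only if no identical text is already present (Set semantics).
    boolean add(String text) {
        if (contains(text)) {
            return false;
        }
        texts.add(text);
        hashes.add(text.hashCode());
        return true;
    }

    boolean contains(String text) {
        int h = text.hashCode(); // deterministic: equal texts always give equal codes
        for (int i = 0; i < texts.size(); i++) {
            // Cheap int comparison first; the long text comparison runs only on a hash match.
            if (hashes.get(i) == h && texts.get(i).equals(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        BookCollection books = new BookCollection();
        String a = "It was the best of times";
        String b = new String("It was the best of times"); // different object, same text
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(books.add(a));                 // true
        System.out.println(books.add(b));                 // false: identical text already present
    }
}
```

A hash table, discussed later in the lecture, goes one step further and uses the hash value to skip most of the stored books entirely.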

3 Hashing
Any function that maps larger data to smaller data must map more than one possible original datum to the same mapped value.
[Diagram from Wikipedia]
When more than one item in a collection receives the same hash value, a collision is said to occur. There are various ways to deal with this. The simplest is to create a list of all items with the same hash code and, when necessary, do a sequential search of that list (or a binary search, if the list is kept sorted).
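A small Java sketch of both points. Java's `String.hashCode()` maps the unbounded set of possible strings onto 32-bit `int` values, so distinct strings must sometimes collide; `"Aa"` and `"BB"` both hash to 2112 (65·31 + 97 = 66·31 + 66). The class `CollisionDemo` is hypothetical, and it uses `java.util.HashMap` only to keep the "list per hash code" bookkeeping short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep one list per hash code, then search that list sequentially.
class CollisionDemo {
    // Each hash code maps to the list of all stored items that share it.
    private final Map<Integer, List<String>> byHash = new HashMap<>();

    void add(String s) {
        byHash.computeIfAbsent(s.hashCode(), k -> new ArrayList<>()).add(s);
    }

    // Find the list for this hash code, then do a sequential search of it.
    boolean contains(String s) {
        List<String> candidates = byHash.get(s.hashCode());
        return candidates != null && candidates.contains(s);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112: a collision
        CollisionDemo d = new CollisionDemo();
        d.add("Aa");
        d.add("BB");
        System.out.println(d.byHash.get(2112)); // [Aa, BB]
        System.out.println(d.contains("BB"));   // true
        System.out.println(d.contains("Ab"));   // false
    }
}
```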

4 Hashing
Hash functions often use the data to calculate some numeric value, then perform a modulo operation. This results in a bounded hash value: if the calculation ends in % n and the value being reduced is non-negative, the maximum hash value is n-1, no matter how large the original data is. With a suitable n and a well-mixed underlying calculation, it also tends to spread hash values relatively uniformly over 0 to n-1, which helps minimize collisions. Modulo may be used again within a hash-based data structure to scale the hash values to the size of the table (the number of available slots).
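The following Java sketch shows a hash that "ends in % n". The character-sum scheme and the names `ModuloHash`, `charSumHash`, and `index` are illustrative assumptions, not a recommended hash function. One Java-specific detail matters here: `%` keeps the sign of its left operand, so a negative value such as a `String.hashCode()` result can give a negative remainder. The real library method `Math.floorMod` always returns a value in 0 to n-1 for positive n.

```java
// Hypothetical sketch of modulo-bounded hash values.
class ModuloHash {
    // Sum of character codes, reduced mod n. Reducing at every step keeps the
    // running total small (no overflow) and gives the same result as (total % n).
    static int charSumHash(String s, int n) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum = (sum + s.charAt(i)) % n;
        }
        return sum; // always in 0..n-1, however long s is
    }

    // String.hashCode() may be negative; floorMod maps it into 0..n-1 anyway.
    static int index(String s, int n) {
        return Math.floorMod(s.hashCode(), n);
    }

    public static void main(String[] args) {
        System.out.println(charSumHash("hashing", 10)); // 8 (character codes sum to 738)
        System.out.println(-7 % 5);                     // -2: % can be negative in Java
        System.out.println(Math.floorMod(-7, 5));       // 3
        System.out.println(index("a much longer string of text", 10));
    }
}
```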

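5 Hashing
Sparse: A, AA, AB, AC, AD, AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN, AO, AP, AQ, AR, AS, AT, AU, AV, AW, AX, AY, AZ
Not sparse: Juror 1, Juror 2, Juror 3, Juror 4
The more sparse the data, the more useful hashing is. Keys like Juror 1 through Juror 4 are consecutive and can be used directly as array indices, while strings such as those above are a few scattered points in an enormous space of possible strings, so an array indexed directly by every possible string would be impractically large.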
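A minimal Java sketch of the contrast, under the assumption that juror numbers run from 1 to 4. Dense keys map to array indices by simple arithmetic; sparse string keys are compressed into a small table by hashing. The class and method names are hypothetical, and collision handling is omitted here.

```java
// Hypothetical sketch: direct indexing for dense keys vs. hashing for sparse keys.
class DenseVsSparse {
    // Dense keys: juror numbers 1..4 map straight to indices 0..3.
    static int jurorIndex(int jurorNumber) {
        return jurorNumber - 1;
    }

    // Sparse keys: compress a string's hash code into a small table.
    static int sparseIndex(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        String[] seats = new String[4];
        for (int j = 1; j <= 4; j++) {
            seats[jurorIndex(j)] = "Juror " + j;
        }
        System.out.println(seats[jurorIndex(3)]); // Juror 3

        // A handful of strings from a huge key space share an 11-slot table.
        for (String k : new String[]{"A", "AA", "AQ", "AZ"}) {
            System.out.println(k + " -> " + sparseIndex(k, 11));
        }
    }
}
```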

6 Hashing
Hashing is used for many purposes in programming. The one we are interested in right now is that it makes it easier to look up data or memory addresses in a table.

7 Why Hashing?
The preceding units introduced search trees. An element can be found in O(log n) time in a well-balanced search tree. There is a more efficient way to search for an element in a container: you can use hashing to implement a map or a set to search, insert, and delete an element in an array in O(1) time on average. (In the worst case, when many keys collide, these operations can degrade to O(n).)

8 Map
Recall that a map stores entries. Each entry contains two parts: a key and a value. The key, also called a search key, is used to search for the corresponding value. For example, a dictionary can be stored in a map, where the words are the keys and the definitions of the words are the values. A map is also called a dictionary or an associative array, and, when it is implemented with hashing, a hash table.
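As an illustration of the dictionary example, here is a short sketch using Java's standard `java.util.HashMap`. The words and definitions are made-up sample entries, and `DictionaryDemo` is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: words are keys, definitions are values.
class DictionaryDemo {
    static Map<String, String> buildDictionary() {
        Map<String, String> dict = new HashMap<>();
        dict.put("hash", "to chop into small pieces");
        dict.put("bucket", "a container that holds multiple entries");
        return dict;
    }

    public static void main(String[] args) {
        Map<String, String> dict = buildDictionary();
        System.out.println(dict.get("bucket"));         // a container that holds multiple entries
        System.out.println(dict.containsKey("map"));    // false
        dict.put("hash", "a value computed from a key"); // existing key: its value is replaced
        System.out.println(dict.get("hash"));           // a value computed from a key
        System.out.println(dict.size());                // 2
    }
}
```

Note that `put` with an existing key replaces the old value instead of adding a second entry, because each key appears in a map at most once.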

9 What is Hashing?
If you know the index of an element in an array, you can retrieve the element using the index in O(1) time. So, can we store the values in an array and use the key as the index to find the value? The answer is yes, if you can map a key to an index. The array that stores the values is called a hash table. The function that maps a key to an index in the hash table is called a hash function. Hashing is a technique that retrieves the value using the index obtained from the key, without performing a search.
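A toy Java sketch of the idea: values live in an array, and each value's index is computed from its key, so a lookup is one hash computation plus one array access, with no loop over the table. The class `DirectTable` is hypothetical, and it deliberately does not handle collisions; it simply throws if two different keys land on the same index.

```java
// Hypothetical sketch: a hash table with no collision handling.
class DirectTable {
    private final String[] keys;
    private final String[] values;

    DirectTable(int capacity) {
        keys = new String[capacity];
        values = new String[capacity];
    }

    // The hash function: maps a key to an index in 0..capacity-1.
    private int hash(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    void put(String key, String value) {
        int i = hash(key);
        if (keys[i] != null && !keys[i].equals(key)) {
            throw new IllegalStateException("collision at index " + i); // not handled in this sketch
        }
        keys[i] = key;
        values[i] = value;
    }

    // No search: compute the index and check that one slot.
    String get(String key) {
        int i = hash(key);
        return key.equals(keys[i]) ? values[i] : null;
    }

    public static void main(String[] args) {
        DirectTable t = new DirectTable(101);
        t.put("apple", "a fruit");
        System.out.println(t.get("apple")); // a fruit
        System.out.println(t.get("pear"));  // null
    }
}
```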

10 Hash Function and Hash Codes
A typical hash function first converts a search key to an integer value called a hash code, and then compresses the hash code into an index into the hash table.
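The two steps can be sketched in Java with a hypothetical key type `Point`. Step 1 is the key's `hashCode()`, built here with the standard `java.util.Objects.hash`. Step 2 is a compression function, assumed here to be `Math.floorMod(hashCode, capacity)`. Other compression schemes exist, so treat this one as an illustration. `Point` also overrides `equals`, because equal keys must produce equal hash codes.

```java
import java.util.Objects;

// Hypothetical key type for the sketch.
final class Point {
    final int x, y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Step 1: convert the key to an integer hash code.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }
}

class TwoStepHash {
    // Step 2: compress a hash code into an index in 0..capacity-1.
    static int compress(int hashCode, int capacity) {
        return Math.floorMod(hashCode, capacity);
    }

    static int indexFor(Object key, int capacity) {
        return compress(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        System.out.println("hash code: " + p.hashCode());          // 1058
        System.out.println("index (capacity 16): " + indexFor(p, 16)); // 2
        // Equal keys give equal hash codes, hence the same index.
        System.out.println(indexFor(new Point(3, 4), 16) == indexFor(p, 16)); // true
    }
}
```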

11 What is Hashing?
A collision occurs when two or more keys are mapped to the same index in the hash table. There are several standard ways to deal with collisions. The first approach, open addressing, finds an available location elsewhere in the table. The simplest form of this is linear probing, which checks whether the key (or an empty slot) is at the expected position in the array. If not, it checks the next position, wrapping around to the start of the array when it reaches the end, and so on. There are more complex variants of open addressing, which you can read about in the textbook. The second approach, separate chaining, uses a list at each location in the hash table. When looking up a value, search only the list at the key's hash index.

12 Linear Probing
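A minimal Java sketch of open addressing with linear probing. Assumptions: `String` keys and `int` values, a fixed capacity with no resizing, no deletion, and `put` throwing when the table is full; `LinearProbingMap` is a hypothetical name. The demo uses `"Aa"` and `"BB"`, which share the hash code 2112, so in a table of 8 both start at index 0 and `"BB"` is placed one slot further on.

```java
// Hypothetical sketch of linear probing (fixed capacity, no deletion).
class LinearProbingMap {
    private final String[] keys;
    private final Integer[] values;

    LinearProbingMap(int capacity) {
        keys = new String[capacity];
        values = new Integer[capacity];
    }

    private int hash(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    void put(String key, int value) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null || keys[i].equals(key)) { // empty slot, or same key to update
                keys[i] = key;
                values[i] = value;
                return;
            }
            i = (i + 1) % keys.length; // check the next slot, wrapping around
        }
        throw new IllegalStateException("table is full");
    }

    Integer get(String key) {
        int i = hash(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (keys[i] == null) {
                return null; // an empty slot ends the probe sequence: key absent
            }
            if (keys[i].equals(key)) {
                return values[i];
            }
            i = (i + 1) % keys.length;
        }
        return null; // probed every slot without finding the key
    }

    public static void main(String[] args) {
        LinearProbingMap m = new LinearProbingMap(8);
        m.put("Aa", 1);
        m.put("BB", 2); // same start index as "Aa", so it moves to the next slot
        System.out.println(m.get("Aa")); // 1
        System.out.println(m.get("BB")); // 2
        System.out.println(m.get("CC")); // null
    }
}
```

Stopping `get` at the first empty slot is only safe because this sketch never deletes entries. Deletion under open addressing needs extra care, which the textbook's variants address.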

13 Handling Collisions Using Separate Chaining
The separate chaining scheme places all entries with the same hash index into the same location, rather than finding new locations. Each location in the separate chaining scheme is called a bucket. A bucket is a container that holds multiple entries.
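A minimal Java sketch of separate chaining, in which each bucket is a `java.util.LinkedList` of key/value entries. It is a hypothetical `ChainingMap`, not the textbook's `MyHashMap`. It assumes a fixed number of buckets (no resizing) and non-null keys. A lookup hashes the key to one bucket and searches only that bucket's list.

```java
import java.util.LinkedList;

// Hypothetical sketch of separate chaining with a fixed number of buckets.
class ChainingMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ChainingMap(int capacity) {
        buckets = new LinkedList[capacity]; // Java cannot create generic arrays directly
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Hash the key to the bucket that holds it (or would hold it).
    private LinkedList<Entry<K, V>> bucketFor(K key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = bucketFor(key);
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                e.value = value; // key already present: replace its value
                return;
            }
        }
        bucket.add(new Entry<>(key, value)); // new key: append to this bucket
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) { // search only this one bucket
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    int bucketSize(K key) {
        return bucketFor(key).size();
    }

    public static void main(String[] args) {
        ChainingMap<String, Integer> m = new ChainingMap<>(4);
        m.put("Aa", 1);
        m.put("BB", 2); // "Aa" and "BB" share a hash code, so they share a bucket
        System.out.println(m.get("BB"));        // 2
        System.out.println(m.bucketSize("Aa")); // 2
    }
}
```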

14 Implementing Map Using Hashing
Run TestMyHashMap; see also MyHashMap and MyMap.

