Hashing CS 110: Data Structures and Algorithms First Semester, 2010-2011.


1 Hashing CS 110: Data Structures and Algorithms First Semester, 2010-2011

2 Hashing ► In a dictionary, if it can be arranged such that the key is also the index to the array that stores the entries, searching and inserting items would be very fast ► Example: empdata[1000] index = employee ID number ► Search for employee with ID number 500 ► return empdata[500] ► Running Time: O(1)
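The direct-indexing idea on this slide can be sketched in Java as follows. This is only an illustration: the class name `EmployeeDirectory`, the use of a `String` for each entry, and the sample name are assumptions, not part of the original slides.

```java
// Direct addressing: the employee ID doubles as the array index.
class EmployeeDirectory {
    // Capacity follows the slide's empdata[1000]; valid IDs are 0..999.
    private final String[] empdata = new String[1000];

    void insert(int id, String name) {
        empdata[id] = name;      // O(1): a single array write
    }

    String find(int id) {
        return empdata[id];      // O(1): a single array read; null if absent
    }

    public static void main(String[] args) {
        EmployeeDirectory dir = new EmployeeDirectory();
        dir.insert(500, "Reyes");               // hypothetical employee
        System.out.println(dir.find(500));
    }
}
```

The sketch only works because every ID is a small non-negative integer that fits in the array.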

3 Hash Table ► A data structure implemented as an array of objects, where the search keys correspond to the array indices ► Insert and find operations involve straightforward array accesses: O(1) time complexity

4 About Hash Tables ► In the first example shown, it was relatively easy since employee number is an integer ► A few problems may arise in different situations

5 About Hash Tables ► Problem 1: the possible integer key values might be too large, so creating an array that covers all of them might be impractical ► Need to map large integer values to smaller array indices ► Problem 2: What if the key is a word written in the English alphabet (e.g. a last name)? ► Need to map names to integers (indices)

6 Large Values to Small Values ► Hash function: converts a number from a large range into a number from a smaller range (the range of array indices) ► Size of the array ► Rule of thumb: the array size should be about twice the size of the data set ► For 50,000 words, use an array of 100,000 elements

7 Hash Function and Modulo ► Simplest Hash Function: achieved by using the modulo function (returns the remainder) ► For example, 33 % 10 = 3 ► General Formula: LargeNumber % SmallRange
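A minimal Java sketch of the modulo hash function described here. The class and method names are hypothetical. Using `Math.floorMod` instead of the bare `%` operator is an added assumption that keeps the result non-negative when a key is negative; for non-negative keys it matches `LargeNumber % SmallRange`.

```java
class ModuloHash {
    // Maps a key from a large range onto an index in [0, smallRange).
    static int hash(long largeNumber, int smallRange) {
        // floorMod never returns a negative value for a positive divisor,
        // unlike %, which can for negative keys.
        return (int) Math.floorMod(largeNumber, (long) smallRange);
    }

    public static void main(String[] args) {
        System.out.println(hash(33, 10));   // the slide's example, 33 % 10
    }
}
```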

8 Hash Functions for Names ► Sum of Digits Method ► Map the letters A to Z to the numbers 1 to 26 (a=1, b=2, c=3, etc.) ► Add up the values of the letters ► For example, “cats” ► c=3, a=1, t=20, s=19, 3+1+20+19=43 ► “cats” will be stored using index 43 ► Use modulo to map to a smaller array
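The letter-sum method can be sketched in Java as below. The names `LetterSumHash`, `letterSum`, and `hash` are hypothetical, and skipping characters outside a to z is an assumption the slide does not address.

```java
import java.util.Locale;

class LetterSumHash {
    // Adds up letter values with a=1, b=2, ..., z=26.
    static int letterSum(String word) {
        int sum = 0;
        for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sum += c - 'a' + 1;
            }
        }
        return sum;
    }

    // Uses modulo to fold the sum into a table of the given size.
    static int hash(String word, int tableSize) {
        return letterSum(word) % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(letterSum("cats"));   // 3 + 1 + 20 + 19
    }
}
```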

9 Collisions ► Problem ► Too many words with the same index ► “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words add to 43 ► These are called collisions: case where two different search keys hash to the same index value

10 Collisions ► Can occur even when dealing with integers ► Suppose the size of the hash table is 100 ► Keys 158 and 358 hash to the same value when using the modulo hash function

11 Collision Resolution Policy ► Need to know what to do when a collision occurs, i.e. during an insert operation: what if the array slot is already occupied? ► Most common policy: go to the next available slot ► “Wrap around” the array if necessary

12 Collision Resolution Policy ► Consequence: when searching, compute the hash function and first check whether the element in that slot is the one you are looking for ► If not, try the next slots ► How do you know if the element is not in the array?

13 Probe Sequence ► The sequence of array indices (slots) that a key value may be mapped to ► The first index in the probe sequence is the home position: the value of the hash function ► The next indices are the alternative slots

14 Probe Sequence ► Suppose the array size is 10, and the hash function is h(K) = K % 10 ► The probe sequence for K=25 is: ► 5,6,7,8,9,0,1,2,3,4 ► Here, we assume the most common collision resolution policy of going to the next slot: p(K,i) = i ► Goal: exhaust array slots
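As a rough illustration, the probe sequence for the next-slot policy can be generated in Java like this. The class and method names are hypothetical, and the key is assumed to be non-negative.

```java
import java.util.Arrays;

class ProbeSequence {
    // Returns all m slots in the order they would be probed for the key,
    // using h(K) = K % m and the next-slot policy p(K, i) = i.
    static int[] linearProbe(int key, int m) {
        int home = key % m;                  // home position
        int[] seq = new int[m];
        for (int i = 0; i < m; i++) {
            seq[i] = (home + i) % m;         // wrap around at the end of the array
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(linearProbe(25, 10)));
        // prints [5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
    }
}
```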

15 Hash Table Operations ► Insert object Obj with key value K ►
home ← h(K)
for i ← 0 to M-1 do
    pos ← (home + p(K,i)) % M
    if HT[pos] is null then
        HT[pos] ← Obj
        return
    else if HT[pos].getKey() = K then
        throw exception “error” // or overwrite it
throw exception “table full” // every slot was probed and none was free

16 Hash Table Operations ► Finding an object with key value K ►
home ← h(K)
for i ← 0 to M-1 do
    pos ← (home + p(K,i)) % M
    if HT[pos] is null then
        throw exception “not found”
    else if HT[pos].getKey() = K then
        return HT[pos]
throw exception “not found” // every slot was probed without a match
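The insert and find operations can be sketched in runnable Java as below. This is one possible reading of the pseudocode, not the course's reference implementation. The class name `LinearProbingTable`, the `int` keys and `String` values, and the choice of exception types are assumptions. The sketch tests for an empty slot before comparing keys and takes the table size M from the constructor.

```java
import java.util.NoSuchElementException;

class LinearProbingTable {
    private final int m;               // table size M
    private final int[] keys;
    private final String[] values;
    private final boolean[] used;      // used[pos] is false for an empty slot

    LinearProbingTable(int m) {
        this.m = m;
        keys = new int[m];
        values = new String[m];
        used = new boolean[m];
    }

    private int h(int key) {
        return Math.floorMod(key, m);  // home position
    }

    void insert(int key, String value) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;  // p(K, i) = i
            if (!used[pos]) {
                keys[pos] = key;
                values[pos] = value;
                used[pos] = true;
                return;
            }
            if (keys[pos] == key) {
                throw new IllegalArgumentException("duplicate key " + key);
            }
        }
        throw new IllegalStateException("table is full");
    }

    String find(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (!used[pos]) {
                break;                 // an empty slot ends the search
            }
            if (keys[pos] == key) {
                return values[pos];
            }
        }
        throw new NoSuchElementException("key not found: " + key);
    }
}
```

With a table of size 100, keys 158 and 358 share home position 58, so the second one inserted lands in slot 59 and `find` reaches it after one extra probe.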

17 Hash Table Operations ► Although insert and find run in O(1) time under typical conditions, the time complexity in the worst case is O(n) ► Something to think about: characterize the worst-case scenarios for insert and find

18 Removing Elements ► Removing an element from a hash table during a delete operation poses a problem ► If we set the corresponding hash table entry to null, then succeeding find operations might not work properly ► Recall that for the find algorithm, seeing a null means the target element is not found, but in fact the element might be in a later slot

19 Removing Elements ► Solution: tombstone ► Arrange it so that deleted entries seem null when inserting, but don’t seem null when searching ► Requires a simple flag on the objects stored
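A minimal sketch of the tombstone idea in Java, assuming `int` keys and a per-slot state in place of a flag on the stored objects. The class name `TombstoneTable` and the three-state enum are illustrative choices, not taken from the slides.

```java
import java.util.Arrays;

class TombstoneTable {
    private enum State { EMPTY, OCCUPIED, DELETED }   // DELETED marks a tombstone

    private final int m;
    private final int[] keys;
    private final State[] states;

    TombstoneTable(int m) {
        this.m = m;
        keys = new int[m];
        states = new State[m];
        Arrays.fill(states, State.EMPTY);
    }

    private int h(int key) {
        return Math.floorMod(key, m);
    }

    boolean contains(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.EMPTY) return false;  // a real gap ends the search
            if (states[pos] == State.OCCUPIED && keys[pos] == key) return true;
            // A tombstone does not stop the search: keep probing.
        }
        return false;
    }

    void insert(int key) {
        int home = h(key);
        int free = -1;                                     // first reusable slot seen
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.OCCUPIED) {
                if (keys[pos] == key) return;              // already present
            } else {
                if (free < 0) free = pos;                  // tombstone or empty slot
                if (states[pos] == State.EMPTY) break;     // key cannot be further on
            }
        }
        if (free < 0) throw new IllegalStateException("table is full");
        keys[free] = key;
        states[free] = State.OCCUPIED;
    }

    void remove(int key) {
        int home = h(key);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (states[pos] == State.EMPTY) return;        // key is not in the table
            if (states[pos] == State.OCCUPIED && keys[pos] == key) {
                states[pos] = State.DELETED;               // leave a tombstone
                return;
            }
        }
    }
}
```

In this sketch, `insert` keeps probing past a tombstone until it reaches a truly empty slot, so it does not store a second copy of a key that already sits further along the probe sequence.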

20 Hash Tables in Java ► java.util.Hashtable ► Important methods for Hashtable class ► put(Object key, Object entry) ► Object get(Object key) ► remove(Object key) ► boolean containsKey(Object key)
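A short usage sketch of the four methods listed, written with generics. The class name `HashtableDemo` and the employee data are made up for illustration.

```java
import java.util.Hashtable;

class HashtableDemo {
    public static void main(String[] args) {
        // Keys are employee ID numbers, entries are names (hypothetical data).
        Hashtable<Integer, String> empdata = new Hashtable<>();

        empdata.put(500, "Reyes");                       // insert
        System.out.println(empdata.get(500));            // find: prints Reyes
        System.out.println(empdata.containsKey(500));    // prints true

        empdata.remove(500);                             // delete
        System.out.println(empdata.get(500));            // prints null: key is gone
    }
}
```

`Hashtable` is synchronized and rejects null keys and values. When thread safety is not needed, `java.util.HashMap` offers the same four operations.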

21 Summary ► Hash tables implement the dictionary data structure and enable O(1) insert, find, and remove operations ► Caveat: O(n) in the worst-case because of the possibility of collisions

22 Summary ► Requires a hash function (maps keys to array indices) and a collision resolution policy ► A probe sequence lists the array slots that an object could occupy, given its key ► In Java: use the Hashtable class

