
1 Dictionaries and Hash Tables Cmput 115 - Lecture 24 Department of Computing Science University of Alberta ©Duane Szafron 2000 Some code in this lecture is based on code from the book: Java Structures by Duane A. Bailey or the companion structure package Revised 3/28/00

2 About This Lecture
In this lecture we will study a container interface called Dictionary and an implementation class called HashTable.

3 Outline
Dictionary Interface
HashTable Class
Iterators
External Chaining

4 Dictionary
A Dictionary is an unordered container that contains key-value pairs. The keys are unique, but the values are not.
[Figure: the keys "Barney", "Wilma", "Betty" and "Fred", each paired with one of the values 45, 39, 37 and 39; the value 39 appears twice.]

5 Dictionary Hierarchy
In java.util, Dictionary is a class. In the structure package, the Dictionary interface is an extension of the Store interface. The class HashTable will implement the Dictionary interface.
[Figure: hierarchy showing Store, Dictionary and HashTable.]

6 Structure Interface - Store
public interface Store {
    public int size();
    // post: returns the number of elements contained in
    // the store.

    public boolean isEmpty();
    // post: returns true iff the store is empty.

    public void clear();
    // post: clears the store so that it contains no
    // elements.
}
code based on Bailey pg. 18

7 Structure Interface - Dictionary 1
public interface Dictionary extends Store {
    public Object put(Object key, Object value);
    // pre: key is non-null
    // post: puts the key-value pair in this Dictionary. If a
    // matching key was in this Dictionary, returns the old value.
    // Otherwise, returns null

    public Object get(Object key);
    // pre: key is non-null
    // post: returns the value with the given key or null if
    // no matching key is found

    public boolean contains(Object value);
    // pre: value is non-null
    // post: returns true iff the Dictionary contains the value
code based on Bailey pg. 268

8 Structure Interface - Dictionary 2
    public boolean containsKey(Object key);
    // pre: key is non-null
    // post: returns true iff the Dictionary contains the key

    public Object remove(Object key);
    // pre: key is non-null
    // post: removes a key-value pair whose key is "equal" to
    // the given key and returns the value. If no matching key
    // was found, then returns null

    public Iterator keys();
    // post: returns an Iterator for traversing all keys

    public Iterator elements();
    // post: returns an Iterator for traversing all values
}
code based on Bailey pg. 267

9 Dictionary - Obvious Implementations
We could implement a Dictionary using two parallel containers (Arrays, Vectors, Lists, etc.), one for the keys and one for the values. We could also implement a Dictionary using a single container that holds Associations.
In either case, the methods get(Object), put(Object, Object), contains(Object), containsKey(Object) and remove(Object) would each require O(n) calls to the equals(Object) method for Lists. If the keys are Comparable, we can keep them sorted and use binary search, which reduces the number of key comparisons to O(log n) for Arrays and Vectors.
Can we do better?
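As a rough illustration of the single-container approach, the sketch below stores key-value pairs in one list and finds a key by linear search. The class name `ListDictionary`, the nested `Pair` class (standing in for Association) and the `equalsCalls` counter are assumptions made for this example only; they are not part of the structure package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a dictionary backed by a single list of pairs.
class ListDictionary {
    // A simple key-value pair, standing in for the Association class.
    static class Pair {
        final Object key;
        Object value;
        Pair(Object key, Object value) { this.key = key; this.value = value; }
    }

    private final List<Pair> pairs = new ArrayList<>();
    int equalsCalls = 0; // counts calls to equals, for illustration only

    // Linear search: may call equals once per stored pair, so O(n).
    private Pair find(Object key) {
        for (Pair p : pairs) {
            equalsCalls++;
            if (p.key.equals(key)) return p;
        }
        return null;
    }

    // Adds the pair, or replaces the value of a matching key and returns the old value.
    public Object put(Object key, Object value) {
        Pair p = find(key);
        if (p == null) {
            pairs.add(new Pair(key, value));
            return null;
        }
        Object old = p.value;
        p.value = value;
        return old;
    }

    // Returns the value stored under key, or null if the key is absent.
    public Object get(Object key) {
        Pair p = find(key);
        return p == null ? null : p.value;
    }

    public int size() { return pairs.size(); }
}
```

Looking up the last key that was added forces the search to compare against every stored key, which is the O(n) cost described above.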

10 Dictionary - Parcel Analogy
Assume that you are about to leave a busy mall or amusement park and you are one of about a thousand people picking up a parcel at any time during the day. This is a Dictionary problem with names as keys and parcels as values. Assume the mall has 100 bins that each hold about 10 parcels. How should the mall organize these parcels to minimize waiting time?

11 Parcels - Using Bins
When you buy your item, you are asked for the last two digits of your phone number and your parcel is sent to that bin. When you pick up your parcel, the attendant asks for the last two digits of your phone number, goes to the matching bin (one of the 100 bins) and searches through the roughly 10 parcels in it to get the one with your name.
This is an example of hashing. Each item is assigned a hash number (or index) that is used to select a bin, which contains a small number of items that can be searched for your item.

12 Selecting Bin Numbers
Would the first two digits of a phone number be as good as the last two digits? There are only a few combinations of first two digits that most local residents share (for example, 42, 43, 44, 45, 46, 47, 48, 92, 96, 98 in Edmonton), so a few bins would overflow and others would be empty.
What about using the first two or last two letters of the name of the person? This would take 26*26 = 676 bins, but even so, some bins would be fuller than others.
For maximum efficiency, we want the keys to be uniformly distributed over the bin numbers.
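The two bin-selection rules can be compared with a small sketch. It assumes 7-digit local phone numbers stored as `int` values; the class name `ParcelBins` and the sample numbers are invented for illustration.

```java
// Hypothetical illustration of choosing a bin from a 7-digit phone number.
class ParcelBins {
    // Bin chosen from the last two digits: values 0 to 99.
    static int lastTwoDigits(int phone) {
        return phone % 100;
    }

    // Bin chosen from the first two digits of a 7-digit number.
    static int firstTwoDigits(int phone) {
        return phone / 100000;
    }

    public static void main(String[] args) {
        // Made-up numbers that share the same local prefix.
        int[] phones = {4321234, 4325678, 4329911, 4320042};
        for (int phone : phones) {
            System.out.println(phone + " -> last two: " + lastTwoDigits(phone)
                    + ", first two: " + firstTwoDigits(phone));
        }
    }
}
```

With these sample numbers, the last two digits spread the parcels over four different bins, while the first two digits send all of them to the same bin.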

13 Hash Functions
A hash function maps keys to bin values, for example h(“Paul”) = 28.
– It should map keys uniformly across all bins.
– It should be fast to compute.
– It should be applicable to all objects.
When two keys map to the same bin, we have a hash collision. When a collision occurs, a collision resolution algorithm is used to establish the locations of the colliding keys in the bin.
In some cases, when we know all of the key values in advance, we can construct a perfect hash function that maps each key to a different bin (no collisions).

14 Hash Tables
A hash table is a container (usually an Array or Vector) whose elements are used as bins. In the basic implementation, each entry in the hash table is a bin that holds a single element.
Example with hash function = length % 7, so “kiwi” hashes to 4:
index  element
0      “longest”
1
2      “to”
3
4      “kiwi”
5      “fifth”
6

15 Hash Tables Collisions
If there is a hash collision, the collision resolution algorithm selects a different bin for the new element to be inserted. This is called open addressing.
[Figure: the same table (“longest” at 0, “to” at 2, “kiwi” at 4, “fifth” at 5) with hash function = length % 7. The new element “poem” hashes to 4, which is already occupied by “kiwi”.]

16 Linear Probing
One open addressing algorithm is called linear probing:
– Locations are checked from the hash location to the end of the table and the element is placed in the first empty slot.
– If the bottom of the table is reached, checking “wraps around” to the start of the table.
Collision resolution modifies how a search is done, since the match for a search might not be at the hash location. For example, if linear probing is used, the search must continue down the table until a match or an empty location is found.

17 Linear Probing Example
hash function = length % 7, so “poem” hashes to 4. Slots 4 and 5 are occupied, so “poem” is placed in slot 6:
index  element
0      “longest”
1
2      “to”
3
4      “kiwi”
5      “fifth”
6      “poem”
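The example can be traced with a minimal sketch of linear probing over an array of strings, using the same hash function (length % 7). The class name `LinearProbingDemo` and its methods are hypothetical, and the sketch assumes the table never becomes completely full.

```java
// Hypothetical illustration of linear probing with hash = length % 7.
class LinearProbingDemo {
    static final int SIZE = 7;
    final String[] table = new String[SIZE];

    static int hash(String s) {
        return s.length() % SIZE;
    }

    // Places s in the first empty slot at or after its hash location,
    // wrapping around at the end of the table. Returns the slot used.
    // Assumes at least one slot is empty.
    int insert(String s) {
        int index = hash(s);
        while (table[index] != null) {
            index = (index + 1) % SIZE;
        }
        table[index] = s;
        return index;
    }

    // Searches from the hash location until a match or an empty slot.
    // Returns the slot holding s, or -1 if s is not in the table.
    int find(String s) {
        int index = hash(s);
        for (int probes = 0; probes < SIZE && table[index] != null; probes++) {
            if (table[index].equals(s)) return index;
            index = (index + 1) % SIZE;
        }
        return -1;
    }
}
```

Inserting “longest”, “to”, “kiwi”, “fifth” and then “poem” in that order leaves “poem” in slot 6, two slots past its hash location.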

18 Other Open Addressing Schemes
Linear probing has an offset value of 1. Instead, we can use a second hash function to generate a different offset from the first hash location. This is called double hashing.
Example: inserting “fred”.
first hash function = length % 7 -> 4 (occupied by “kiwi”)
second hash function = value(firstChar) -> 6
(4 + 6) % 7 -> 3, so “fred” is placed in slot 3:
index  element
0      “longest”
1
2      “to”
3      “fred”
4      “kiwi”
5      “fifth”
6      “poem”
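A minimal sketch of double hashing for this example follows. It assumes keys start with a lowercase letter and that value(firstChar) means the letter's position in the alphabet (a = 1). The class `DoubleHashingDemo` is hypothetical, and the guard against a zero offset is an addition that the slide does not show.

```java
// Hypothetical illustration of double hashing on a 7-slot table.
class DoubleHashingDemo {
    static final int SIZE = 7;

    // First hash: the starting slot.
    static int hash1(String s) {
        return s.length() % SIZE;
    }

    // Second hash: the probe offset, from the first letter (a = 1 ... z = 26).
    // Assumes a lowercase first letter. An offset of 0 would never move,
    // so it is replaced by 1 (an assumption added for safety).
    static int hash2(String s) {
        int offset = (s.charAt(0) - 'a' + 1) % SIZE;
        return offset == 0 ? 1 : offset;
    }

    // Inserts s, stepping by hash2(s) after each collision.
    // Returns the slot used, or -1 if no empty slot was found.
    static int insert(String[] table, String s) {
        int index = hash1(s);
        int offset = hash2(s);
        for (int probes = 0; probes < SIZE; probes++) {
            if (table[index] == null) {
                table[index] = s;
                return index;
            }
            index = (index + offset) % SIZE;
        }
        return -1;
    }
}
```

Because the table size 7 is prime, any non-zero offset eventually visits every slot, which is one reason prime table sizes are preferred.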

19 Element Deletion Problem
Open addressing affects element removal. When an element is removed, the “hole” may prevent us from finding another element that hashed to the same location.
[Figure: the table with “longest” at 0, “to” at 2, “kiwi” at 4, “fifth” at 5 and “poem” at 6; hash function = length % 7, so “poem” hashes to 4. After “fifth” is removed, a linear-probing search for “poem” reaches the hole at slot 5 and stops before finding “poem”.]

20 Element Deletion
Deletions can be handled in two ways:
– Mark the deleted location as Reserved. During insertion, a reserved location can be re-used.
– Move all of the elements that hashed to the same location as the removed element “up” in the hash table after a deletion.
[Figure: two versions of the table after the deletion. In the first, the deleted slot is marked Reserved and “poem” stays where it was. In the second, “poem” has been moved up into the vacated slot.]
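The Reserved strategy can be sketched as follows, again with linear probing and hash = length % 7. The class `TombstoneDemo` is hypothetical, and its `insert` is simplified: it assumes the element is not already present and that at least one slot is empty or reserved.

```java
// Hypothetical illustration of marking deleted slots as reserved.
class TombstoneDemo {
    static final int SIZE = 7;
    // A unique marker object; it is compared by identity, never by equals.
    static final String RESERVED = new String("reserved");
    final String[] table = new String[SIZE];

    static int hash(String s) {
        return s.length() % SIZE;
    }

    // Searches past reserved slots; only a truly empty slot ends the search.
    int find(String s) {
        int index = hash(s);
        for (int probes = 0; probes < SIZE && table[index] != null; probes++) {
            if (table[index] != RESERVED && table[index].equals(s)) return index;
            index = (index + 1) % SIZE;
        }
        return -1;
    }

    // Re-uses the first empty or reserved slot on the probe path.
    // Simplification: assumes s is not already stored and a slot is available.
    int insert(String s) {
        int index = hash(s);
        while (table[index] != null && table[index] != RESERVED) {
            index = (index + 1) % SIZE;
        }
        table[index] = s;
        return index;
    }

    // Replaces the element with the reserved marker instead of emptying the slot.
    boolean remove(String s) {
        int index = find(s);
        if (index < 0) return false;
        table[index] = RESERVED;
        return true;
    }
}
```

After “fifth” is removed, a search for “poem” passes over the reserved slot 5 and still reaches slot 6, and a later insertion that hashes to 5 can take over the reserved slot.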

21 Efficiency of HashTables
If the number of collisions is small, searching, adding and removing elements in a hash table requires O(1) (constant) time.
To reduce the number of collisions, in addition to using a good hash function, we must make sure the table does not get too full. The load factor of a hash table is the ratio of the number of occupied slots to the total number of slots in the table.
For best results, the load factor should not be above 0.6. If it gets higher, we should extend the hash table and re-hash all of its elements.
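A short sketch of the arithmetic involved, assuming the 0.6 limit above and a growth rule of doubling the capacity and adding one. The class `LoadFactorDemo` and its method names are hypothetical.

```java
// Hypothetical illustration of load-factor bookkeeping.
class LoadFactorDemo {
    static final double MAX_LOAD = 0.6;

    // Fraction of the table's slots that are occupied.
    static double loadFactor(int count, int capacity) {
        return (double) count / capacity;
    }

    // True if adding one more element would push the table past the limit.
    static boolean needsRehash(int count, int capacity) {
        return count + 1 > MAX_LOAD * capacity;
    }

    // New capacity when the table grows: double the old size and add one.
    static int grownCapacity(int capacity) {
        return capacity * 2 + 1;
    }
}
```

For a table of 11 slots the limit is 6.6 elements, so under this check a seventh insertion would grow the table to 23 slots first.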

22 Implementation of HashTable
We will use an array of Associations.
We will use the Reserved strategy for deletions.
We will grow the HashTable when the load factor gets too high.
We will cache the logical size to make it easier to determine when the load factor is too high.
The size of the HashTable should be a prime, but we will allow the user to specify the initial size and, when the table must be grown, we will double this size and add one. (For example, run some experiments using size 97 vs. 100.)

23 Example
hash = value(first char of key) * length(key), with a table of 11 slots (indices 0 to 10)
put “Aho”, prog-lang -> 1*3 % 11 = 3
put “Scott”, automata -> 19*5 % 11 = 7
put “Hopcroft”, automata -> 9
put “Backus”, prog-lang -> 1
put “von Neuman”, archit -> 10
put “Turing”, coding -> 10
put “Jacobsen”, softeng -> 3
[Figure: the 11-slot table holding Aho, Hopcroft, Backus, von Neuman, Scott, Jacobsen and Turing.]

24 Example
hash = value(first char of key) * length(key). Putting “McCarthy” triggers a rehash from 11 slots (indices 0 to 10) to 23 slots (indices 0 to 22). Each line shows the old index -> the new index.
put “Aho”, prog-lang -> 1*3 % 23 -> 3
put “Scott”, automata -> 19*5 % 23 -> 3
put “Hopcroft”, automata -> 9 -> 18
put “Backus”, prog-lang -> 1 -> 12
put “von Neuman”, archit -> 10 -> 13
put “Turing”, coding -> 10 -> 5
put “Jacobsen”, softeng -> 3 -> 11
put “McCarthy”, AI -> 13*8 % 23 -> 12
[Figure: the old 11-slot table and the new 23-slot table holding Scott, Aho, Hopcroft, Backus, von Neuman, Turing, Jacobsen and McCarthy.]
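The indices in the 23-slot table can be checked with a small sketch of the example's hash function. It assumes that the value of a letter is its position in the alphabet regardless of case (a = 1) and that the space in “von Neuman” counts toward the key length. The class `NameHashDemo` is hypothetical.

```java
// Hypothetical illustration of hash = value(first char) * length(key).
class NameHashDemo {
    // Alphabet position of a letter, ignoring case: a/A = 1 ... z/Z = 26.
    static int letterValue(char c) {
        return Character.toLowerCase(c) - 'a' + 1;
    }

    // The raw hash value from the example.
    static int hash(String key) {
        return letterValue(key.charAt(0)) * key.length();
    }

    // The hash location in a table with the given number of slots.
    static int index(String key, int tableSize) {
        return hash(key) % tableSize;
    }

    public static void main(String[] args) {
        String[] keys = {"Aho", "Scott", "Hopcroft", "Backus",
                         "von Neuman", "Turing", "Jacobsen", "McCarthy"};
        for (String key : keys) {
            System.out.println(key + " -> " + index(key, 23));
        }
    }
}
```

Note that these are hash locations only: “Aho” and “Scott” both hash to 3, and “Backus” and “McCarthy” both hash to 12, so the second key of each pair is placed by collision resolution.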

25 HashTable - State and Constructors
class HashTable implements Dictionary {
    protected static Association reserved =
        new Association("reserved", null);
    protected Association data[];
    protected int count;
    protected int capacity;
    protected final double loadFactor = 0.6;

    public HashTable(int initialCapacity) {
    // pre: initialCapacity > 0
    // post: constructs a HashTable with given initial size.
        this.data = new Association[initialCapacity];
        this.capacity = initialCapacity;
        this.count = 0;
    }

    public HashTable() {
    // post: constructs a HashTable with a default size.
        this(997);
    }
code based on Bailey pg. 270

26 HashTable - Store Interface
    /* Interface Store Methods */
    public int size() {
    // post: returns the number of elements in the store.
        return this.count;
    }

    public boolean isEmpty() {
    // post: returns true iff the store is empty.
        return this.size() == 0;
    }

    public void clear() {
    // post: clears the store so that it contains no elements.
        for (int index = 0; index < this.capacity; index++)
            this.data[index] = null;
        this.count = 0;
    }
code based on Bailey SPackage

27 HashTable - get
    public Object get(Object key) {
    // pre: key is non-null
    // post: returns the value with the given key or null if
    // no matching key is found
        int index;
        Association found;

        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved)
            return null;
        return found.value();
    }
code based on Bailey pg. 275

28 HashTable - put 1
    public Object put(Object key, Object value) {
    // pre: key is non-null
    // post: puts the key-value pair in this Dictionary. If a
    // matching key was in this Dictionary, returns the old value.
    // Otherwise, returns null
        int index;
        Association found;
        Object oldValue;

        // grow the table first if one more element would exceed the load factor
        if (count + 1 > this.loadFactor * capacity)
            this.rehash();
        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved) { // not found
            this.data[index] = new Association(key, value);
            this.count++;
            return null;
        }
code based on Bailey pg. 274

29 HashTable - put 2 and containsKey
        else { // found
            oldValue = found.value();
            found.setValue(value);
            return oldValue;
        }
    }

    public boolean containsKey(Object key) {
    // pre: key is non-null
    // post: returns true iff the Dictionary contains the key
        int index;

        index = this.locate(key); // locate does the work
        return this.data[index] != null &&
               this.data[index] != reserved;
    }
code based on Bailey pg. 275

30 HashTable - remove
    public Object remove(Object key) {
    // pre: key is non-null
    // post: removes a key-value pair whose key is "equal" to
    // the given key and returns the value. If no matching key
    // was found, then returns null
        int index;
        Association found;
        Object oldValue;

        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved) { // not found
            return null;
        }
        this.count--;
        oldValue = found.value();
        this.data[index] = reserved; // leave a marker so later probes continue
        return oldValue;
    }
code based on Bailey pg. 276

31 HashTable - locate 1
    protected int locate(Object key) {
    // pre: key is non-null
    // post: returns ideal index of key in table
        int index;
        int reservedIndex;
        Association found;
        Object oldValue;

        index = Math.abs(key.hashCode() % this.capacity);
        reservedIndex = -1;
code based on Bailey pg. 274

32 HashTable - locate 2
        while (this.data[index] != null) {
            if (this.data[index] == reserved) {
                // remember the first reserved slot seen
                if (reservedIndex == -1)
                    reservedIndex = index;
            }
            else if (key.equals(this.data[index].key()))
                return index; // we have located the key
            index = (index + 1) % this.capacity; // probe linearly
        }
        if (reservedIndex == -1)
            return index; // haven't hit reserved key so return index
        else // return first available (reserved) index
            return reservedIndex;
    }
code based on Bailey pg. 274

33 HashTable - rehash
    protected void rehash() {
    // post: resizes table and re-hashes all elements
        Association association;
        Iterator iterator;

        // the iterator keeps a reference to the old array
        iterator = new HashtableIterator(this.data);
        this.capacity = this.capacity * 2 + 1;
        this.data = new Association[this.capacity];
        this.count = 0;
        while (iterator.hasMoreElements()) {
            association = (Association) iterator.nextElement();
            put(association.key(), association.value());
        }
    }
code based on Bailey SPackage

34 Iterators
Create a HashtableIterator class whose elements are Associations. A HashtableIterator is used in rehash().
Also, let each KeyIterator or ValueIterator be a filter on a HashtableIterator (see textbook).

35 HashtableIterator - public 1
class HashtableIterator implements Iterator {
    protected int current;
    protected Association data[];

    public HashtableIterator(Association[] table) {
    // post: constructs a new hash table iterator
        this.data = table;
        this.reset();
    }

    public void reset() {
    // post: resets iterator to beginning of hash table
        this.current = 0;
        this.findNextElement();
    }

    public boolean hasMoreElements() {
    // post: returns true if there are unvisited elements
        return this.current < this.data.length;
    }
code based on Bailey SPackage

36 HashtableIterator - public 2
    public Object nextElement() {
    // pre: hasMoreElements()
    // post: returns current element, increments iterator
        Object result;

        result = this.data[this.current];
        this.current++; // step past the element just returned
        this.findNextElement();
        return result;
    }

    public Object value() {
    // pre: hasMoreElements()
    // post: returns current element (key and value)
        return this.data[this.current];
    }
code based on Bailey SPackage

37 HashtableIterator - findNextElement
    protected void findNextElement() {
    // post: moves current index to the next real element
    // (stays put if current already refers to a real element)
        while (this.current < this.data.length &&
               (this.data[this.current] == null ||
                this.data[this.current] == HashTable.reserved))
            this.current++;
    }
}
code based on Bailey SPackage

38 External Chaining
Instead of implementing a hash table whose entries are associations, we can have a hash table whose entries are containers for associations. Then when there is a hashing collision, we put all elements that collided into a common container.
Example (assuming hash function = length % 7, as in the earlier examples):
index  elements
0      “longest”, “largest”
1
2      “to”
3
4      “kiwi”, “fred”, “association”
5      “fifth”, “there”
6
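External chaining can be sketched with one linked list per bin. The sketch stores plain strings rather than Associations and uses hash = length % table size, as in the earlier examples; the class `ChainedTable` and its methods are hypothetical and are not taken from the structure package.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration of a hash table with external chaining.
class ChainedTable {
    // One list per bin; colliding elements share a list.
    private final List<LinkedList<String>> bins = new ArrayList<>();

    ChainedTable(int size) {
        for (int i = 0; i < size; i++) {
            bins.add(new LinkedList<>());
        }
    }

    int hash(String s) {
        return s.length() % bins.size();
    }

    // Adds s to its bin unless it is already there.
    void add(String s) {
        LinkedList<String> bin = bins.get(hash(s));
        if (!bin.contains(s)) {
            bin.add(s);
        }
    }

    // Only the one bin selected by the hash function is searched.
    boolean contains(String s) {
        return bins.get(hash(s)).contains(s);
    }

    // Removal needs no reserved marker: the element just leaves its list.
    boolean remove(String s) {
        return bins.get(hash(s)).remove(s);
    }

    // Exposes a bin so the chain lengths can be inspected.
    List<String> bin(int index) {
        return bins.get(index);
    }
}
```

With seven bins, adding the eight words from the figure puts three of them in bin 4, and removing one of those three does not disturb lookups for the other two.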

39 Some Principles from the Textbook
25. Provide a method for hashing the objects you implement.
26. Equivalent Objects should return equal hash codes.
principles from Bailey ch. 13
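Both principles can be illustrated with a small key class that overrides equals and hashCode together, so that two equivalent objects always produce the same hash code. The class `Customer` and its fields are invented for this example.

```java
import java.util.Objects;

// Hypothetical key class: two customers are equivalent when both fields match.
final class Customer {
    private final String name;
    private final int phone;

    Customer(String name, int phone) {
        this.name = name;
        this.phone = phone;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Customer)) return false;
        Customer c = (Customer) other;
        return phone == c.phone && name.equals(c.name);
    }

    // Built from the same fields that equals compares, so
    // equivalent objects return equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, phone);
    }
}
```

If hashCode were left at its default, two equivalent Customer objects could hash to different bins, and a lookup with one of them might not find an entry stored under the other.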

