Dictionaries and Hash Tables
CMPUT 115 - Lecture 24
Department of Computing Science, University of Alberta
©Duane Szafron 2000

Some code in this lecture is based on code from the book Java Structures by Duane A. Bailey or the companion structure package. Revised 3/28/00.

About This Lecture
In this lecture we will study a container interface called Dictionary and an implementation class called HashTable.

Outline
– Dictionary Interface
– HashTable Class
– Iterators
– External Chaining

Dictionary
A Dictionary is an unordered container that holds key-value pairs. The keys are unique, but the values need not be.
(Figure: keys "Barney", "Wilma", "Betty" and "Fred" paired with values such as 45.)

Dictionary Hierarchy
In java.util, Dictionary is an abstract class. In the structure package, Dictionary is an interface that extends the Store interface. The class HashTable will implement the Dictionary interface.
(Hierarchy: Store, extended by Dictionary, implemented by HashTable.)

Structure Interface - Store
public interface Store {
    public int size();
    // post: returns the number of elements contained in the store.

    public boolean isEmpty();
    // post: returns true iff the store is empty.

    public void clear();
    // post: clears the store so that it contains no elements.
}
(code based on Bailey pg. 18)

Structure Interface - Dictionary 1
public interface Dictionary extends Store {
    public Object put(Object key, Object value);
    // pre: key is non-null
    // post: puts the key-value pair in this Dictionary. If a matching key
    //       was in this Dictionary, returns the old value. Otherwise, returns null.

    public Object get(Object key);
    // pre: key is non-null
    // post: returns the value with the given key, or null if no matching key is found

    public boolean contains(Object value);
    // pre: value is non-null
    // post: returns true iff the Dictionary contains the value
(code based on Bailey pg. 268)

Structure Interface - Dictionary 2
    public boolean containsKey(Object key);
    // pre: key is non-null
    // post: returns true iff the Dictionary contains the key

    public Object remove(Object key);
    // pre: key is non-null
    // post: removes a key-value pair whose key is "equal" to the given key
    //       and returns the value. If no matching key was found, returns null.

    public Iterator keys();
    // post: returns an Iterator for traversing all keys

    public Iterator elements();
    // post: returns an Iterator for traversing all values
}
(code based on Bailey pg. 267)

Dictionary - Obvious Implementations
We could implement a Dictionary using two parallel containers (Arrays, Vectors, Lists, etc.), one for the keys and one for the values. We could also implement a Dictionary using a single container that holds Associations. In either case, the methods get(Object), put(Object, Object), contains(Object), containsKey(Object) and remove(Object) would each require O(n) calls to the equals(Object) method. If the keys are Comparable and kept in sorted order in an Array or Vector, binary search reduces key lookups to O(log n) comparisons (this does not help for linked Lists, and contains(Object), which searches values, stays O(n)). Can we do better?
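To make the O(n) cost concrete, here is a minimal sketch of the single-container approach in Java. It is not the structure package's code: `ListDictionary` and its nested `Pair` class are hypothetical stand-ins for a Dictionary backed by a List of Associations. Every lookup is a linear search that may call equals(Object) on every stored key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Dictionary backed by one List of key-value pairs.
final class ListDictionary<K, V> {
    // Stand-in for the structure package's Association.
    private static final class Pair<K, V> {
        final K key;
        V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<Pair<K, V>> pairs = new ArrayList<>();

    // Linear search: up to n calls to equals(Object).
    private Pair<K, V> find(K key) {
        for (Pair<K, V> p : pairs) {
            if (p.key.equals(key)) return p;
        }
        return null;
    }

    // Same contract as the lecture's put: returns the old value, or null.
    public V put(K key, V value) {
        Pair<K, V> p = find(key);
        if (p == null) {
            pairs.add(new Pair<>(key, value));
            return null;
        }
        V old = p.value;
        p.value = value;
        return old;
    }

    public V get(K key) {
        Pair<K, V> p = find(key);
        return p == null ? null : p.value;
    }

    public V remove(K key) {
        for (int i = 0; i < pairs.size(); i++) {
            if (pairs.get(i).key.equals(key)) return pairs.remove(i).value;
        }
        return null;
    }

    public int size() { return pairs.size(); }

    public static void main(String[] args) {
        ListDictionary<String, Integer> d = new ListDictionary<>();
        d.put("Fred", 45);
        d.put("Wilma", 45);
        System.out.println(d.get("Fred"));      // 45
        System.out.println(d.put("Fred", 46));  // 45 (the old value)
        System.out.println(d.remove("Wilma"));  // 45
        System.out.println(d.size());           // 1
    }
}
```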

Dictionary - Parcel Analogy
Assume that you are about to leave a busy mall or amusement park and you are one of about a thousand people picking up a parcel at any time during the day. This is a Dictionary problem with names as keys and parcels as values. Assume the mall has 100 bins that each hold about 10 parcels. How should the mall organize these parcels to minimize waiting time?

Parcels - Using Bins
When you buy your item, you are asked for the last two digits of your phone number, and your parcel is sent to that bin. When you pick up your parcel, the attendant asks for the last two digits of your phone number, goes to the matching bin and searches through the parcels in it (about 1 to 10) to find the one with your name. This is an example of hashing. Each item is assigned a hash number (or index) that selects a bin containing a small number of items, which can then be searched for your item.

Selecting Bin Numbers
Would the first two digits of a phone number be as good as the last two digits? No: most local residents share only a few combinations of first two digits (for example, 42, 43, 44, 45, 46, 47, 48, 92, 96, 98 in Edmonton), so a few bins would overflow and others would be empty. What about using the first two or last two letters of the person's name? This would take 26*26 = 676 bins, but even so, some bins would be fuller than others. For maximum efficiency, we want the keys to be uniformly distributed over the bin numbers.

Hash Functions
A hash function maps keys to bin values.
– It should map keys uniformly across all bins.
– It should be fast to compute.
– It should be applicable to all objects.
Example: h("Paul") = 28.
When two keys map to the same bin, we have a hash collision. When a collision occurs, a collision resolution algorithm is used to determine where the colliding keys are stored. In some cases, when we know all of the key values in advance, we can construct a perfect hash function that maps each key to a different bin (no collisions).
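One common way to satisfy "applicable to all objects" in Java is to start from `Object.hashCode()`, which every object has, and reduce it to a bin number. The sketch below assumes that approach; `BinHash.bin` is a hypothetical helper, and it uses `Math.floorMod` rather than `%` so that negative hash codes still land in a valid bin.

```java
// Hypothetical helper: reduce any object's hashCode() to a bin number.
final class BinHash {
    // Maps any non-null object to a bin in [0, bins).
    // Math.floorMod keeps the result non-negative even if hashCode() is negative.
    static int bin(Object key, int bins) {
        return Math.floorMod(key.hashCode(), bins);
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the int value itself, so these are easy to check by hand.
        System.out.println(bin(3, 7));   // 3
        System.out.println(bin(10, 7));  // 3: a collision with key 3
        System.out.println(bin(-3, 7));  // 4, not -3
        // Works for any object type, e.g. strings.
        System.out.println(bin("Paul", 100));
    }
}
```

Note that two different keys (3 and 10) landing in the same bin is exactly the hash collision described above.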

Hash Tables
A hash table is a container (usually an Array or Vector) whose elements are used as bins. In the basic implementation, each entry in the hash table is a bin that holds a single element.
Example (hash function = length % 7): "longest" -> 0, "to" -> 2, "kiwi" -> 4, "fifth" -> 5.
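A minimal sketch of the basic one-element-per-bin table, assuming String keys and the slide's length % 7 hash function. `DirectTable` is a hypothetical name; it does no collision resolution and simply reports when a key's bin is already taken.

```java
// Hypothetical basic hash table: each slot holds at most one key.
final class DirectTable {
    private final String[] slots;

    DirectTable(int capacity) { slots = new String[capacity]; }

    // The lecture's toy hash function: string length modulo table size.
    int hash(String key) { return key.length() % slots.length; }

    // Stores key in its home slot; returns false on a collision
    // (the slot already holds a different key).
    boolean insert(String key) {
        int i = hash(key);
        if (slots[i] != null && !slots[i].equals(key)) return false;
        slots[i] = key;
        return true;
    }

    String slot(int i) { return slots[i]; }

    public static void main(String[] args) {
        DirectTable t = new DirectTable(7);
        for (String w : new String[] {"longest", "to", "kiwi", "fifth"}) t.insert(w);
        System.out.println(t.slot(4));         // kiwi
        System.out.println(t.insert("poem"));  // false: "poem" also hashes to 4
    }
}
```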

Hash Table Collisions
If there is a hash collision, the collision resolution algorithm selects a different bin for the new element. This is called open addressing.
Example (hash function = length % 7): "poem" hashes to 4, which "kiwi" already occupies.

Linear Probing
One open addressing algorithm is called linear probing:
– Locations are checked from the hash location toward the end of the table, and the element is placed in the first empty slot.
– If the bottom of the table is reached, checking "wraps around" to the start of the table.
Collision resolution changes how a search is done, since the match for a search might not be at the hash location. For example, if linear probing is used, the search must continue down the table until a match or an empty location is found.

Linear Probing Example
With hash function = length % 7, "poem" hashes to 4. Slot 4 ("kiwi") and slot 5 ("fifth") are full, so "poem" is placed in slot 6.
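The probing rule can be sketched as follows, again assuming String keys and the length % 7 hash function (`LinearProbing` is a hypothetical class, not the lecture's HashTable). Insertion and search follow the same probe sequence, and the search stops at the first empty slot.

```java
// Hypothetical open-addressing table using linear probing.
final class LinearProbing {
    private final String[] slots;

    LinearProbing(int capacity) { slots = new String[capacity]; }

    int hash(String key) { return key.length() % slots.length; }

    // Probe home, home+1, ... (wrapping around) until an empty slot or the same key.
    // Returns the index used, or -1 if the table is full.
    int insert(String key) {
        int i = hash(key);
        for (int step = 0; step < slots.length; step++) {
            if (slots[i] == null || slots[i].equals(key)) {
                slots[i] = key;
                return i;
            }
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    // Same probe sequence; an empty slot means the key is not present.
    int find(String key) {
        int i = hash(key);
        for (int step = 0; step < slots.length && slots[i] != null; step++) {
            if (slots[i].equals(key)) return i;
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(7);
        for (String w : new String[] {"longest", "to", "kiwi", "fifth", "poem"}) t.insert(w);
        System.out.println(t.find("poem"));  // 6
        System.out.println(t.find("fred"));  // -1: stops at empty slot 1
    }
}
```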

Other Open Addressing Schemes
Linear probing uses an offset of 1. Instead, we can use a second hash function to generate a different offset from the first hash location; this is called double hashing.
Example: the first hash function is length % 7 and the second is value(firstChar), the letter's position in the alphabet. "fred" hashes to 4, which is full. Its offset is value('f') = 6, so the next location checked is (4 + 6) % 7 = 3, which is empty, and "fred" is placed there.
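A sketch of double hashing under the slide's two hash functions, assuming keys that begin with a lowercase letter. One assumption is not on the slide: a letter whose value is a multiple of the table size (such as 'g' with size 7) would give an offset of 0, so the sketch falls back to an offset of 1 in that case. With a prime table size and a nonzero offset, the probe sequence visits every slot. `DoubleHashing` is a hypothetical name.

```java
// Hypothetical open-addressing table using double hashing.
final class DoubleHashing {
    private final String[] slots;

    DoubleHashing(int capacity) { slots = new String[capacity]; }

    // First hash: home location.
    int hash1(String key) { return key.length() % slots.length; }

    // Second hash: alphabet position of the first letter (a = 1), used as the offset.
    // Assumption: a zero offset (letter value divisible by the size) becomes 1.
    int hash2(String key) {
        int step = (key.charAt(0) - 'a' + 1) % slots.length;
        return step == 0 ? 1 : step;
    }

    // Probe home, home+step, home+2*step, ... (mod size). Returns the index or -1.
    int insert(String key) {
        int i = hash1(key);
        int step = hash2(key);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[i] == null || slots[i].equals(key)) {
                slots[i] = key;
                return i;
            }
            i = (i + step) % slots.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        DoubleHashing t = new DoubleHashing(7);
        for (String w : new String[] {"longest", "to", "kiwi", "poem", "fifth"}) t.insert(w);
        System.out.println(t.insert("fred"));  // 3, since (4 + 6) % 7 = 3
    }
}
```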

Element Deletion Problem
Open addressing affects element removal. When an element is removed, the "hole" may prevent us from finding another element that hashed to the same location.
Example (hash function = length % 7): if "fifth" is removed from slot 5, a search for "poem" (hash 4) checks slot 4, finds slot 5 empty, and stops before reaching "poem" in slot 6.

Element Deletion
Deletions can be handled in two ways:
– Mark the deleted location as Reserved. During insertion, a reserved location can be re-used.
– After a deletion, move the elements that hashed to the same location as the removed element "up" in the hash table.
Example: after removing "fifth", the first approach leaves slot 5 marked Reserved with "poem" still in slot 6; the second moves "poem" up into slot 5.
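The Reserved strategy can be sketched on top of linear probing. This is an illustrative `ReservedTable` class (hypothetical name) that uses a unique sentinel object, compared with `==`, as the Reserved marker: searches skip over it, while insertions may reuse it.

```java
// Hypothetical linear-probing table that marks deleted slots as Reserved.
final class ReservedTable {
    // Unique sentinel object; compared by identity (==), never by equals().
    private static final String RESERVED = new String("reserved");
    private final String[] slots;

    ReservedTable(int capacity) { slots = new String[capacity]; }

    int hash(String key) { return key.length() % slots.length; }

    // Search: skip over Reserved slots; only an empty (null) slot ends the search.
    int find(String key) {
        int i = hash(key);
        for (int n = 0; n < slots.length && slots[i] != null; n++) {
            if (slots[i] != RESERVED && slots[i].equals(key)) return i;
            i = (i + 1) % slots.length;
        }
        return -1;
    }

    // Insert: reuse the first null or Reserved slot on the probe path.
    int insert(String key) {
        int existing = find(key);
        if (existing >= 0) return existing;
        int i = hash(key);
        for (int n = 0; n < slots.length; n++) {
            if (slots[i] == null || slots[i] == RESERVED) {
                slots[i] = key;
                return i;
            }
            i = (i + 1) % slots.length;
        }
        return -1; // table full
    }

    // Remove: replace the element with the marker instead of null.
    boolean remove(String key) {
        int i = find(key);
        if (i < 0) return false;
        slots[i] = RESERVED;
        return true;
    }

    public static void main(String[] args) {
        ReservedTable t = new ReservedTable(7);
        for (String w : new String[] {"longest", "to", "kiwi", "fifth", "poem"}) t.insert(w);
        t.remove("fifth");
        System.out.println(t.find("poem"));     // 6: the search skips Reserved slot 5
        System.out.println(t.insert("there"));  // 5: the Reserved slot is reused
    }
}
```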

Efficiency of HashTables
If the number of collisions is small, searching, adding and removing elements in a hash table takes O(1) (constant) time on average. To reduce the number of collisions, in addition to using a good hash function, we must make sure the table does not get too full. The load factor of a hash table is the ratio of the number of occupied entries to the table's capacity. For best results, the load factor should not be above 0.6. If it gets higher, we should extend the hash table and re-hash all of its elements.

Implementation of HashTable
– We will use an array of Associations.
– We will use the Reserved strategy for deletions.
– We will grow the HashTable when the load factor gets too high.
– We will cache the logical size to make it easier to determine when the load factor is too high.
– The size of the HashTable should be a prime, but we will allow the user to specify the initial size; when the table must be grown, we double its size and add one. (For example, run some experiments using size 97 vs. 100.)
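The load-factor test and growth rule can be written as small helper methods. `GrowthPolicy` is a hypothetical name; the threshold test matches the condition the lecture's put() uses, and the growth rule is the double-plus-one rule above.

```java
// Hypothetical helpers for the load-factor check and the growth rule.
final class GrowthPolicy {
    // Threshold suggested in the lecture.
    static final double MAX_LOAD = 0.6;

    // Load factor = occupied entries / capacity.
    static double loadFactor(int count, int capacity) {
        return (double) count / capacity;
    }

    // Same test as the lecture's put(): grow before an insertion that
    // would push the load factor above MAX_LOAD.
    static boolean mustGrow(int count, int capacity) {
        return count + 1 > MAX_LOAD * capacity;
    }

    // The lecture's growth rule: double the capacity and add one.
    static int grow(int capacity) {
        return capacity * 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(mustGrow(5, 11));  // false: 6 <= 6.6
        System.out.println(mustGrow(6, 11));  // true: 7 > 6.6
        System.out.println(grow(11));         // 23
        System.out.println(grow(23));         // 47
    }
}
```

Doubling and adding one keeps the size odd but does not guarantee a prime: 11, 23 and 47 are prime, but 2*47 + 1 = 95 = 5*19 is not.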

Example
hash = value(first char of key) * length(key) % capacity, where value('a') = 1, ..., value('z') = 26, and capacity = 11:
put "Aho", prog-lang -> 1*3 % 11 = 3
put "Scott", automata -> 19*5 % 11 = 7
put "Hopcroft", automata -> 8*8 % 11 = 9
put "Backus", prog-lang -> 2*6 % 11 = 1
put "von Neuman", archit -> 22*10 % 11 = 0
put "Turing", coding -> 20*6 % 11 = 10
put "Jacobsen", softeng -> 10*8 % 11 = 3 (collides with "Aho")

Example (after rehashing)
When the table grows, its capacity becomes 2*11 + 1 = 23 and every key is re-hashed:
"Aho" -> 1*3 % 23 = 3
"Scott" -> 19*5 % 23 = 3 (same slot as "Aho")
"Hopcroft" -> 8*8 % 23 = 18
"Backus" -> 2*6 % 23 = 12
"von Neuman" -> 22*10 % 23 = 13
"Turing" -> 20*6 % 23 = 5
"Jacobsen" -> 10*8 % 23 = 11
put "McCarthy", AI -> 13*8 % 23 = 12 (collides with "Backus")
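The hash values in the two examples can be checked with a short program. It assumes value('a') = 1 through value('z') = 26, ignores case, and counts the space in "von Neuman" toward its length (which is what gives 22*10 % 23 = 13 on the slide). `LectureHash` is a hypothetical name.

```java
// Hypothetical checker for the lecture's example hash function.
final class LectureHash {
    // hash = value(first char) * length(key) % capacity, value('a') = 1 ... value('z') = 26.
    // Assumptions: the first character is a letter (case is ignored) and
    // length() counts every character, including spaces.
    static int hash(String key, int capacity) {
        int value = Character.toLowerCase(key.charAt(0)) - 'a' + 1;
        return value * key.length() % capacity;
    }

    public static void main(String[] args) {
        String[] keys = {"Aho", "Scott", "Hopcroft", "Backus",
                         "von Neuman", "Turing", "Jacobsen", "McCarthy"};
        for (String k : keys) {
            System.out.println(k + ": " + hash(k, 11) + " (capacity 11), "
                    + hash(k, 23) + " (capacity 23)");
        }
    }
}
```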

HashTable - State and Constructors
class HashTable implements Dictionary {
    protected static Association reserved = new Association("reserved", null);
    protected Association data[];
    protected int count;
    protected int capacity;
    protected final double loadFactor = 0.6;

    public HashTable(int initialCapacity) {
        // pre: initialCapacity > 0
        // post: constructs a HashTable with the given initial size.
        this.data = new Association[initialCapacity];
        this.capacity = initialCapacity;
        this.count = 0;
    }

    public HashTable() {
        // post: constructs a HashTable with a default size.
        this(997);
    }
(code based on Bailey pg. 270)

HashTable - Store Interface
    /* Interface Store Methods */
    public int size() {
        // post: returns the number of elements in the store.
        return this.count;
    }

    public boolean isEmpty() {
        // post: returns true iff the store is empty.
        return this.size() == 0;
    }

    public void clear() {
        // post: clears the store so that it contains no elements.
        for (int index = 0; index < this.capacity; index++)
            this.data[index] = null;
        this.count = 0;
    }
(code based on Bailey SPackage)

HashTable - get
    public Object get(Object key) {
        // pre: key is non-null
        // post: returns the value with the given key, or null if
        //       no matching key is found
        int index;
        Association found;

        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved)
            return null;
        return found.value();
    }
(code based on Bailey pg. 275)

HashTable - put 1
    public Object put(Object key, Object value) {
        // pre: key is non-null
        // post: puts the key-value pair in this Dictionary. If a matching key
        //       was in this Dictionary, returns the old value. Otherwise, returns null.
        int index;
        Association found;
        Object oldValue;

        if (this.count + 1 > this.loadFactor * this.capacity)
            this.rehash(); // grow first if this insertion would exceed the load factor
        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved) { // not found
            this.data[index] = new Association(key, value);
            this.count++;
            return null;
        }
(code based on Bailey pg. 274)

HashTable - put 2 and containsKey
        else { // found: replace the value and return the old one
            oldValue = found.value();
            found.setValue(value);
            return oldValue;
        }
    }

    public boolean containsKey(Object key) {
        // pre: key is non-null
        // post: returns true iff the Dictionary contains the key
        int index;

        index = this.locate(key); // locate does the work
        return this.data[index] != null && this.data[index] != reserved;
    }
(code based on Bailey pg. 275)

HashTable - remove
    public Object remove(Object key) {
        // pre: key is non-null
        // post: removes a key-value pair whose key is "equal" to the given key
        //       and returns the value. If no matching key was found, returns null.
        int index;
        Association found;
        Object oldValue;

        index = this.locate(key); // locate does the work
        found = this.data[index];
        if (found == null || found == reserved) { // not found
            return null;
        }
        this.count--;
        oldValue = found.value();
        this.data[index] = reserved; // mark the slot Reserved rather than empty
        return oldValue;
    }
(code based on Bailey pg. 276)

HashTable - locate 1
    protected int locate(Object key) {
        // pre: key is non-null
        // post: returns the index of the key if it is in the table; otherwise
        //       returns the first Reserved slot on the probe path, or the
        //       empty slot where the search ended
        int index;
        int reservedIndex;

        index = Math.abs(key.hashCode() % this.capacity); // home location
        reservedIndex = -1; // no Reserved slot seen yet
(code based on Bailey pg. 274)

HashTable - locate 2
        while (this.data[index] != null) {
            if (this.data[index] == reserved) {
                if (reservedIndex == -1)
                    reservedIndex = index; // remember the first reusable slot
            } else if (key.equals(this.data[index].key())) {
                return index; // we have located the key
            }
            index = (index + 1) % this.capacity; // probe linearly
        }
        if (reservedIndex == -1)
            return index;          // haven't hit a reserved slot, so return the empty slot
        else
            return reservedIndex;  // return the first available (reserved) slot
    }
(code based on Bailey pg. 274)

HashTable - rehash
    protected void rehash() {
        // post: resizes the table and re-hashes all elements
        Association association;
        Iterator iterator;

        iterator = new HashtableIterator(this.data); // iterates over the old array
        this.capacity = this.capacity * 2 + 1;
        this.data = new Association[this.capacity];
        this.count = 0;
        while (iterator.hasMoreElements()) {
            association = (Association) iterator.nextElement();
            this.put(association.key(), association.value());
        }
    }
(code based on Bailey SPackage)

Iterators
Create a HashtableIterator class whose elements are Associations. A HashtableIterator is used in rehash(). Also, each KeyIterator or ValueIterator can be a filter on a HashtableIterator (see textbook).

HashtableIterator - public 1
class HashtableIterator implements Iterator {
    protected int current;
    protected Association data[];

    public HashtableIterator(Association[] table) {
        // post: constructs a new hash table iterator
        this.data = table;
        this.reset();
    }

    public void reset() {
        // post: resets the iterator to the beginning of the hash table
        this.current = 0;
        this.findNextElement();
    }

    public boolean hasMoreElements() {
        // post: returns true if there are unvisited elements
        return this.current < this.data.length;
    }
(code based on Bailey SPackage)

HashtableIterator - public 2
    public Object nextElement() {
        // pre: hasMoreElements()
        // post: returns the current element and advances the iterator
        Object result;

        result = this.data[this.current];
        this.current++;          // step past the element being returned
        this.findNextElement();  // then skip empty and Reserved slots
        return result;
    }

    public Object value() {
        // pre: hasMoreElements()
        // post: returns the current element (key and value)
        return this.data[this.current];
    }
(code based on Bailey SPackage)

HashtableIterator - findNextElement
    protected void findNextElement() {
        // post: moves the current index to the next real element
        while (this.current < this.data.length &&
               (this.data[this.current] == null ||
                this.data[this.current] == HashTable.reserved))
            this.current++;
    }
}
(code based on Bailey SPackage)
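For comparison, the same skip-the-empty-slots idea can be expressed with the standard `java.util.Iterator` interface (hasNext/next) instead of the structure package's hasMoreElements/nextElement. This sketch is an assumption, not the textbook's code: the hypothetical `Entry` class stands in for Association, and `keys` builds a key iterator as a filter over the entry iterator, as the Iterators slide suggests.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterators over an open-addressing table.
final class TableIterators {
    // Stand-in for the structure package's Association.
    static final class Entry {
        final String key;
        final int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    // Shared marker for removed slots, compared by identity.
    static final Entry RESERVED = new Entry("reserved", 0);

    // Visits the real entries of a table, skipping null and RESERVED slots.
    static final class EntryIterator implements Iterator<Entry> {
        private final Entry[] data;
        private int current = 0;

        EntryIterator(Entry[] data) {
            this.data = data;
            skipEmpty();
        }

        private void skipEmpty() {
            while (current < data.length
                    && (data[current] == null || data[current] == RESERVED)) {
                current++;
            }
        }

        @Override public boolean hasNext() { return current < data.length; }

        @Override public Entry next() {
            if (!hasNext()) throw new NoSuchElementException();
            Entry result = data[current];
            current++;    // move past the entry being returned
            skipEmpty();  // then past any empty or reserved slots
            return result;
        }
    }

    // A key iterator written as a filter over the entry iterator.
    static Iterator<String> keys(Entry[] table) {
        Iterator<Entry> entries = new EntryIterator(table);
        return new Iterator<String>() {
            @Override public boolean hasNext() { return entries.hasNext(); }
            @Override public String next() { return entries.next().key; }
        };
    }

    public static void main(String[] args) {
        Entry[] table = new Entry[5];
        table[1] = new Entry("Fred", 1);
        table[2] = RESERVED;
        table[4] = new Entry("Wilma", 2);
        Iterator<String> it = keys(table);
        while (it.hasNext()) System.out.println(it.next());  // Fred, then Wilma
    }
}
```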

External Chaining
Instead of implementing a hash table whose entries are Associations, we can have a hash table whose entries are containers for Associations. Then, when there is a hash collision, we put all the elements that collided into a common container.
Example (hash function = length % 7): bin 0 holds "longest" and "largest"; bin 2 holds "to"; bin 4 holds "kiwi", "fred" and "association"; bin 5 holds "fifth" and "there".
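A minimal sketch of external chaining, assuming String keys, the length % 7 hash function, and a `java.util.LinkedList` per bin (`ChainedTable` is a hypothetical name). Because colliding keys share a list, removal simply deletes from that list, and no Reserved marker is needed.

```java
import java.util.LinkedList;
import java.util.List;

// Hypothetical hash table where each bin is a list of colliding keys.
final class ChainedTable {
    private final List<String>[] bins;

    @SuppressWarnings("unchecked")
    ChainedTable(int capacity) {
        bins = new List[capacity];
        for (int i = 0; i < capacity; i++) bins[i] = new LinkedList<>();
    }

    int hash(String key) { return key.length() % bins.length; }

    // Add to the key's bin unless it is already there.
    void add(String key) {
        List<String> bin = bins[hash(key)];
        if (!bin.contains(key)) bin.add(key);
    }

    // Only the key's own bin is searched.
    boolean contains(String key) { return bins[hash(key)].contains(key); }

    // Removal deletes directly from the bin's list.
    boolean remove(String key) { return bins[hash(key)].remove(key); }

    List<String> bin(int i) { return bins[i]; }

    public static void main(String[] args) {
        ChainedTable t = new ChainedTable(7);
        for (String w : new String[] {"longest", "to", "kiwi", "fifth",
                                      "largest", "there", "fred", "association"}) {
            t.add(w);
        }
        System.out.println(t.bin(4));  // [kiwi, fred, association]
        System.out.println(t.bin(0));  // [longest, largest]
    }
}
```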

Some Principles from the Textbook
25. Provide a method for hashing the objects you implement.
26. Equivalent objects should return equal hash codes.
(principles from Bailey ch. 13)
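A minimal illustration of principle 26, assuming a hypothetical `PhoneNumber` key class and the standard `java.util.HashMap` rather than the lecture's HashTable. `hashCode()` is computed from exactly the fields that `equals()` compares, so two distinct but equal objects find the same entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class whose equals() and hashCode() agree.
final class PhoneNumber {
    private final int area;
    private final int exchange;
    private final int line;

    PhoneNumber(int area, int exchange, int line) {
        this.area = area;
        this.exchange = exchange;
        this.line = line;
    }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber p = (PhoneNumber) other;
        return area == p.area && exchange == p.exchange && line == p.line;
    }

    // Equal objects must produce equal hash codes, so hash
    // exactly the fields that equals() compares.
    @Override public int hashCode() {
        return Objects.hash(area, exchange, line);
    }

    public static void main(String[] args) {
        Map<PhoneNumber, String> parcels = new HashMap<>();
        parcels.put(new PhoneNumber(780, 492, 1234), "parcel for Fred");
        // A different but equal PhoneNumber object finds the same entry.
        System.out.println(parcels.get(new PhoneNumber(780, 492, 1234)));  // parcel for Fred
    }
}
```

If hashCode() were left as the default identity-based version, the second lookup would usually hash to a different bin and return null even though the keys are equal.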