Sets, Maps and Hash Tables


Sets: We have learned that different data structures have different advantages and drawbacks. Choosing the proper data structure depends on typical usage patterns. Array- and list-oriented data structures are appropriate when the order of elements matters, but that is not always the case. RHS – SWC

Sets: A Set is a data structure which can hold an unordered collection of elements. Not having to worry about ordering can improve the performance of other operations. On a Set, we want to be able to insert an element, delete an element, and check whether a given element is in the Set.

Sets:
    public interface Set<T> {
        void add(T element);
        void remove(T element);
        boolean contains(T element);
        Iterator<T> iterator();
    }

Sets: It turns out that insertion, deletion and the containment check can be done in O(log(n)), or even faster! This depends on the underlying implementation of the interface. In Java, the implementation is either HashSet (based on hash tables) or TreeSet (based on trees).
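
As a minimal sketch of the three operations, the snippet below runs the same code against both library implementations. The class name SetBasics and the car names are made up for illustration; HashSet and TreeSet are the java.util classes named on the slide.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class SetBasics {
    // Runs insert, delete and containment check against whichever Set is passed in.
    static boolean exercise(Set<String> set) {
        set.add("Audi");
        set.add("Volvo");
        set.add("Audi");       // duplicate: the set is left unchanged
        set.remove("Volvo");
        return set.contains("Audi") && !set.contains("Volvo") && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(exercise(new HashSet<>()));  // hash table based
        System.out.println(exercise(new TreeSet<>()));  // tree based
    }
}
```

Because exercise only sees the Set interface, the caller alone decides which implementation is used.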

Sets: A Set iterator is "simpler" than e.g. a List iterator. Elements will occur in "random" order. There is no add method; we just call add on the Set itself. There is no previous method, since it would not make sense. The Set iterator does, however, have a remove method (why?).
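
One plausible answer to the question above: removing through the iterator is the safe way to delete elements while looping over the Set. The sketch below assumes a hypothetical helper removeShort; Iterator.remove is the standard java.util method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class IteratorRemoval {
    // Deletes every string shorter than minLength while iterating over the set.
    static void removeShort(Set<String> words, int minLength) {
        Iterator<String> it = words.iterator();
        while (it.hasNext()) {
            String word = it.next();
            if (word.length() < minLength) {
                // Removing via the iterator keeps the iteration valid; calling
                // words.remove(word) here may throw ConcurrentModificationException.
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("a", "bb", "ccc"));
        removeShort(words, 2);
        System.out.println(words.size());
    }
}
```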

Sets – Quality tip: When using a Set, we must choose a specific implementation (HashSet or TreeSet). However, the definition should look like: Set<Car> cars = new HashSet<Car>();

Sets – Quality tip: Set<Car> cars = new HashSet<Car>(); Why? We should in general only refer to the interface, not the implementation. That makes it easy to switch implementation!

Maps: A Map is a data structure which stores associations between a collection of keys and a collection of values. Every key maps to a value. Keys are unique (values are not).

Maps: [Figure: keys K1, K2, K3, K4 mapped to values V1, V2, V3.]

Map:
    public interface Map<K,V> {
        void put(K key, V value);
        V get(K key);
        void remove(K key);
        Set<K> keySet();
    }

Map: The keySet method returns a Set containing all keys in the Map. You must then iterate through this Set in order to get all values stored in the Map.

Map:
    Map<String,Car> carMap = new HashMap<String,Car>();
    ...
    Set<String> regNumbers = carMap.keySet();
    for (String regNo : regNumbers) {
        Car aCar = carMap.get(regNo);
        ... // Do something with the Car object
    }
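
The simplified interface on the slides only exposes keySet, but the real java.util.Map also offers entrySet, which hands out key and value together and so avoids one lookup per key. A small sketch, assuming a made-up map from registration number to mileage:

```java
import java.util.HashMap;
import java.util.Map;

class MapIteration {
    // Sums all values by walking the entries directly instead of calling get per key.
    static int totalMileage(Map<String, Integer> mileageByRegNo) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : mileageByRegNo.entrySet()) {
            total += entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> mileage = new HashMap<>();
        mileage.put("AB123", 1000);
        mileage.put("CD456", 2500);
        System.out.println(totalMileage(mileage));
    }
}
```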

Hash Tables: A Set and a Map are both abstract data types; we need a concrete implementation in order to use them. In the Java library, two implementations of each are available. Sets: HashSet, TreeSet. Maps: HashMap, TreeMap.

Hash Tables: The implementations HashSet and HashMap are based on a Hash Table. A Hash Table is based on the following ideas: (1) Create an array of length N, which can store objects of some type T. (2) Find a mapping from T to the interval [0; N-1] (a hash function f). (3) Store an object t of type T in position f(t).

Hash Tables: [Figure: an array in which each car is stored at the index given by the hash function: f(Car1) = 3, f(Car2) = 0, f(Car3) = 2.]

Hash Tables: A Hash Table is thus "almost" an array. Instead of having an index directly available, we must calculate it. If the calculation can be done in constant time, then all basic operations (insert, delete, lookup) can be done in constant time! That is better than tree-based implementations, which take O(log(N)).

Hash Tables: However, there are some issues. How do we define a good mapping from the objects to [0; N-1]? What happens if we try to store two objects at the same position?

Hash Functions: Before finding a good mapping, i.e. a good hash function, we must consider the size of the array. For good performance, the array should be at least as large as the maximal number of objects stored. A rule of thumb is about 30% larger. The size should be a prime number (???).

Hash Functions: What if the expected number of objects is unknown in advance? We can expand a hash table dynamically. If the hash table is running out of space, double the capacity. Start out with a reasonably large array (space is cheap...).
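
The library classes follow the same strategy: java.util.HashSet and HashMap accept an initial capacity and a load factor, and enlarge the table on their own when it fills up. The sketch below uses a hypothetical helper fill and deliberately starts with a small table; the values 16 and 0.75 are illustrative choices.

```java
import java.util.HashSet;
import java.util.Set;

class PresizedSet {
    // Stores the integers 0..count-1, starting from a deliberately small table.
    static Set<Integer> fill(int count) {
        // 16 initial buckets, load factor 0.75: the table is enlarged
        // automatically once it is more than about 75% full.
        Set<Integer> numbers = new HashSet<>(16, 0.75f);
        for (int i = 0; i < count; i++) {
            numbers.add(i);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(fill(1000).size());
    }
}
```

If the expected number of elements is known, passing a larger initial capacity avoids the intermediate resizing steps.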

Hash Functions: Having handled the choice of N, how do we define a proper hash function? Properties of a hash function: it must map all objects of type T to the interval [0; N-1], and it should map objects as uniformly as possible to the interval [0; N-1].

Hash Functions: We can enforce the mapping to [0; N-1] by using the modulo operator: f(t) = g(t) % N. g(t) can then produce any non-negative integer value (in Java, % gives a negative result when g(t) is negative, so negative values need an adjustment). How do we achieve a uniform distribution? The theory for this is complicated, but there are some general rules to follow.
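
A minimal sketch of f(t) = g(t) % N in Java, taking g to be hashCode. The class and method names are made up; Math.floorMod is the standard library method and, unlike %, never returns a negative result for a positive N.

```java
class BucketIndex {
    // Maps any object to an array position in [0; n-1].
    static int indexFor(Object t, int n) {
        // hashCode() may be negative, and in Java -7 % 5 is -2,
        // whereas Math.floorMod(-7, 5) is 3.
        return Math.floorMod(t.hashCode(), n);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(Integer.valueOf(-7), 5));
        System.out.println(indexFor("some string", 5));
    }
}
```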

Hash Functions: A good hash function should be "almost random", but deterministic. "Almost random": values are well distributed in the interval. Deterministic: it always produces the same output for the same input.

Hash Functions: In Java, all objects have a hashCode method. It is defined in the Object class, can be overridden, and returns an integer (the hash code). We must apply modulo to the value ourselves.

Hash Functions: Hash function for integers: the number itself. Hash function for strings:
    final int HASH_MULTIPLIER = 31;
    int h = 0;
    for (int i = 0; i < s.length(); i++)
        h = (HASH_MULTIPLIER * h) + s.charAt(i);
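
Wrapped in a method, the string loop can be checked against String.hashCode, which is documented to compute the same polynomial with multiplier 31. The class name StringHash is an assumption made for this sketch.

```java
class StringHash {
    // Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic.
    static int hash(String s) {
        final int HASH_MULTIPLIER = 31;
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = HASH_MULTIPLIER * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Both calls should agree for any string.
        System.out.println(hash("Hello") == "Hello".hashCode());
    }
}
```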

Hash Functions: The hash code for an object can be calculated by combining the hash codes of its instance fields. Combine the values in a way similar to the algorithm used to find string hash codes.

Hash Functions:
    public int hashCode() {
        final int MULTIPLIER = 31;
        int h1 = regNo.hashCode();
        int h2 = mileage;
        int h3 = model.hashCode();
        int h = h1*MULTIPLIER + h2;
        h = h*MULTIPLIER + h3;
        return h;
    }

Hash Functions: But wait, what about numeric overflow? We multiply a "random" integer value by a number... It does not really matter. As long as the algorithm is deterministic, overflow is not a problem; it just helps "scramble" the value.
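
A small sketch of why overflow is harmless here: Java int arithmetic wraps around silently, and the wrapped result is the same on every run. The method scramble and its constants are made up for illustration.

```java
class OverflowDemo {
    // One multiply-and-add step of the kind used in the hash code examples.
    static int scramble(int h) {
        return 31 * h + 7;  // silently wraps around on overflow, never throws
    }

    public static void main(String[] args) {
        int first = scramble(Integer.MAX_VALUE);
        int second = scramble(Integer.MAX_VALUE);
        // Same input, same (wrapped) output.
        System.out.println(first == second);
        // The exact mathematical result does not fit in an int.
        System.out.println((long) first == 31L * Integer.MAX_VALUE + 7);
    }
}
```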

Hash Functions: Common pitfalls: Remember to define a hashCode method. If you forget, the hashCode implementation in Object is used. It is based on the identity of the object (typically derived from its memory location), not on its fields. Two objects with the same instance field values will therefore usually produce different hash codes.

Hash Functions: Common pitfalls: The hashCode method must be "compatible" with your equals method. If a.equals(b), it must hold that a.hashCode() == b.hashCode(). If not, a hash-based Set may accept duplicates! The reverse condition is not required; two different objects may have the same hash code.

Hash Functions: In general, you must remember to either define both the hashCode and the equals method, or define neither of them!
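
The pitfalls can be made concrete with two hypothetical car classes: PlainCar defines neither method, while Car defines both in the style of the earlier hashCode example. All names below are assumptions made for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

class HashContractDemo {
    // Defines neither equals nor hashCode: identity semantics from Object apply.
    static final class PlainCar {
        final String regNo;
        PlainCar(String regNo) { this.regNo = regNo; }
    }

    // Defines equals and hashCode together, based on the same fields.
    static final class Car {
        final String regNo;
        final int mileage;
        Car(String regNo, int mileage) { this.regNo = regNo; this.mileage = mileage; }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Car)) return false;
            Car c = (Car) other;
            return mileage == c.mileage && regNo.equals(c.regNo);
        }

        @Override
        public int hashCode() {
            final int MULTIPLIER = 31;
            return regNo.hashCode() * MULTIPLIER + mileage;
        }
    }

    // Two PlainCar objects with equal fields are still two distinct set elements.
    static int plainCount() {
        Set<PlainCar> cars = new HashSet<>();
        cars.add(new PlainCar("AB123"));
        cars.add(new PlainCar("AB123"));
        return cars.size();
    }

    // Two equal Car objects collapse into a single set element.
    static int carCount() {
        Set<Car> cars = new HashSet<>();
        cars.add(new Car("AB123", 1000));
        cars.add(new Car("AB123", 1000));
        return cars.size();
    }

    public static void main(String[] args) {
        System.out.println(plainCount());  // 2
        System.out.println(carCount());    // 1
    }
}
```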

Handling collisions: Even with a good hash function, we will still experience collisions. Collision: two different objects t1 and t2 have the same hash code. We will then try to store both objects in the same position in the array. Now what?

Handling collisions: What we store in each position in the array is not the objects themselves, but a linked list of objects. Objects with the same hash code h are stored in the linked list in position h. With a good hash function, the average length of non-empty lists is less than 2.

Handling collisions: [Figure: an array whose positions each hold a linked list of cars (Car1 to Car6).]

Handling collisions: Basic operations (insert, delete, lookup) follow this structure: (1) Calculate the hash code for the object. (2) Find the corresponding position in the array. (3) Insert: insert the element at the end of the list. Delete/Lookup: iterate through the list until the element is found or the end of the list is reached.
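
The steps above can be sketched as a small set with one linked list per array position. This is an illustrative sketch under simplifying assumptions (fixed capacity, no resizing), not how java.util.HashSet is actually implemented; the class name ChainedHashSet is made up.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashSet<T> {
    private final List<List<T>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Steps 1 and 2: calculate the hash code and find the position in the array.
    private List<T> bucketFor(T element) {
        return buckets.get(Math.floorMod(element.hashCode(), buckets.size()));
    }

    // Insert: append to the end of the list, unless the element is already there.
    boolean add(T element) {
        List<T> bucket = bucketFor(element);
        if (bucket.contains(element)) {
            return false;
        }
        bucket.add(element);
        size++;
        return true;
    }

    // Lookup: walk the list until the element is found or the list ends.
    boolean contains(T element) {
        return bucketFor(element).contains(element);
    }

    // Delete: walk the list and unlink the element if it is present.
    boolean remove(T element) {
        if (bucketFor(element).remove(element)) {
            size--;
            return true;
        }
        return false;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        ChainedHashSet<Integer> set = new ChainedHashSet<>(4);
        set.add(1);
        set.add(5);  // lands in the same position as 1, since 5 % 4 == 1 % 4
        System.out.println(set.contains(5) + " " + set.size());
    }
}
```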

Handling collisions: Basic operations are thus not done in truly constant time. However, if a proper hash function is used, running time is constant in practice. Use hash-based implementations unless special circumstances apply, e.g. it is hard to define a hash/equals function, or more functionality is required.