Sets and Maps (and Hashing)

Sets and Maps (and Hashing) Chapter 9

Chapter Objectives
- To understand the Java Map and Set interfaces and how to use them
- To learn about hash codes and how they are used to facilitate efficient search and retrieval
- To study two forms of hash tables, open addressing and chaining, and to understand their relative benefits and performance tradeoffs
Chapter 9: Sets and Maps

Chapter Objectives
- To learn how to implement both hash table forms
- To be introduced to the implementation of Maps and Sets
- To see how two earlier applications can be more easily implemented using Map objects for data storage

Review of Sets
- A set is unordered and has no duplicate elements
- Suppose A = {1,3,5,7,9,11} and B = {2,3,5,7,11,13}. Then:
  - A ∪ B = {1,2,3,5,7,9,11,13}
  - A ∩ B = {3,5,7,11}
  - A - B = {1,9}
  - B - A = {2,13}
- If C = {3,5,9}, then C ⊆ A
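The set operations above map onto the bulk methods of java.util.Set. The following is a minimal sketch, not code from the slides: the class name SetAlgebraDemo and its helper methods are made up for illustration, and TreeSet is chosen only so that the printed order is sorted.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class SetAlgebraDemo {
    // Each helper copies its first argument, so the input sets are not modified.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);      // keep everything from both sets
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);   // keep only elements that are also in b
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);   // drop elements that are in b
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new TreeSet<>(Arrays.asList(1, 3, 5, 7, 9, 11));
        Set<Integer> b = new TreeSet<>(Arrays.asList(2, 3, 5, 7, 11, 13));
        Set<Integer> c = new TreeSet<>(Arrays.asList(3, 5, 9));

        System.out.println(union(a, b));         // [1, 2, 3, 5, 7, 9, 11, 13]
        System.out.println(intersection(a, b));  // [3, 5, 7, 11]
        System.out.println(difference(a, b));    // [1, 9]
        System.out.println(difference(b, a));    // [2, 13]
        System.out.println(a.containsAll(c));    // true: C is a subset of A
    }
}
```

The subset test uses containsAll, which is part of the Collection interface that Set extends.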

Sets and the Set Interface
- The part of the Collection hierarchy that relates to sets includes three interfaces, two abstract classes, and two actual classes

The Set Abstraction
- A set is a collection that contains no duplicate elements and at most one null element
- In a set, the index of an element is meaningless
- If s is a set:
  - s.contains("apple") returns true or false
  - s.indexOf("apple") makes no sense
  - s.get(i) is also nonsensical

The Set Abstraction
- Operations on sets include:
  - Testing for membership
  - Adding (inserting) elements
  - Removing elements
  - Union
  - Intersection
  - Difference
  - Subset

The Set Interface and Methods
- Required methods for:
  - Testing set membership
  - Testing for an empty set
  - Determining set size
  - Creating an iterator over the set
- Two optional methods for:
  - Adding an element
  - Removing an element
- Constructors enforce the no-duplicates rule, and the add method does not insert a duplicate item

Comparison of Lists and Sets
- Duplicate elements
  - Allowed in a list
  - Not allowed in a set: Set.add returns false if you try to insert a duplicate element
- Get method
  - A list has a get method
  - A set has no get method (an index is meaningless)
- Iterators
  - Lists have iterators
  - You can also iterate through the elements of a set
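The difference in how add treats duplicates can be checked directly. This is a small sketch; the class ListVersusSetDemo and the helper addTwice are hypothetical names introduced here, not part of the slides.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class ListVersusSetDemo {
    // Adds the same item twice and reports what each add call returned.
    static boolean[] addTwice(Collection<String> collection, String item) {
        boolean first = collection.add(item);
        boolean second = collection.add(item);
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] listResult = addTwice(new ArrayList<>(), "apple");
        boolean[] setResult = addTwice(new HashSet<>(), "apple");
        System.out.println(listResult[1]);  // true: the list now holds two copies
        System.out.println(setResult[1]);   // false: the set rejected the duplicate
    }
}
```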

Maps
- A map relates one set to another set
- A map is a set of ordered pairs (x, y), where x is the key and y is the value (element)
- For example, this map is: {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}

Maps
- A map is a set of ordered pairs (x, y), where x is the key and y is the value (element)
- Keys must be unique
- Values need not be unique: several keys may map to the same value, so the mapping need not be one-to-one
- Each key "maps" to, or "corresponds" to, a particular value (element)
- Maps are used for very efficient storage and retrieval of information in tables
  - The key is used like an index into a list
  - But the key does not need to be an integer

Maps
- Suppose we have the map {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}, stored in aMap
- What does aMap.get("B2") return? "Bill"
- What does aMap.get("Bill") return? null, since nothing in aMap has the key "Bill"
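The lookups above can be reproduced with java.util.HashMap. This is a sketch under the assumption that the keys and values are plain strings; the class name MapLookupDemo and the method buildMap are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

class MapLookupDemo {
    // Builds the example map of (key, value) pairs from the slide.
    static Map<String, String> buildMap() {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill");   // a second key with the same value is allowed
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        return aMap;
    }

    public static void main(String[] args) {
        Map<String, String> aMap = buildMap();
        System.out.println(aMap.get("B2"));    // Bill
        System.out.println(aMap.get("Bill"));  // null: "Bill" is a value, not a key
    }
}
```

Lookup goes from key to value only; finding a key for a given value would require scanning the entries.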

Map Interface

Hash Tables
- For maps, we want to access an entry by its key, not its value
- A hash table is used for such access
- For efficiency, we want to access an element directly by its key, as opposed to searching for the key in an array
- Using a hash table, we can retrieve an item in constant time on average and in linear time in the worst case
  - That is, O(1) is expected, but O(n) is the worst case

Hash Codes and Index Calculation
- The hashing idea:
  - Transform an item's key value into an integer
  - Then use this integer as a numeric index

Hash Code Index Example
- Suppose we want to store the number of occurrences of each Unicode character in a file
- A Java char is a 16-bit value, so there are 65,536 possible char values
- What to do?
  - Could create an array of size 65,536 and store the count of character i in array element i
  - This will work, but it is very inefficient for a small file
  - Suppose the file has only 100 characters!
- Is there a better way?

Hash Code Index Calculation
- Same problem: store the number of occurrences of each character in a file of 100 characters, where a char can take 65,536 possible values
- Use a hash code for each character, but how do we compute the hash code?
- Could do the following: create an array of size 200 and compute the index as index = uniChar % 200
  - Good, since it uses less space
  - Bad if there are collisions, that is, if 2 or more characters in the file "hash" to the same value
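A short sketch of the index calculation, showing how two different characters can land on the same index. The class name ModIndexDemo and the two sample characters are choices made for this illustration.

```java
class ModIndexDemo {
    static final int TABLE_SIZE = 200;  // table size used on the slide

    // Maps a character to a table index; char values are never negative.
    static int indexFor(char uniChar) {
        return uniChar % TABLE_SIZE;
    }

    public static void main(String[] args) {
        char first = 'A';               // character code 65
        char second = (char) 265;       // character code 265, a different character
        System.out.println(indexFor(first));   // 65
        System.out.println(indexFor(second));  // 65, which collides with 'A'
    }
}
```

Any two character codes that differ by a multiple of 200 collide in this scheme, which is why a collision strategy is needed.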

Methods for Generating Hash Codes
- Usually, keys consist of strings of letters and/or digits
- The number of possible key values is much larger than the table size
- Generating a good hash code is something of an art; some experimentation (trial and error) may be required
- Desirable properties of a "hash function":
  - A "random" (uniform) distribution of values
  - A relatively simple function
  - Efficient to compute
- Collisions can always occur, so what should we do about them?

Java hashCode Method
- For strings, we could simply sum the int values of all characters
  - But this returns the same hash code for "sign" and "sing"
- The Java API algorithm accounts for the position of the characters
- String.hashCode() returns the integer calculated by the formula $s_0 \times 31^{n-1} + s_1 \times 31^{n-2} + \dots + s_{n-1}$, where $s_i$ is the $i$th character of the string and $n$ is the length of the string
- "Cat" has a hash code of $\text{'C'} \times 31^{2} + \text{'a'} \times 31 + \text{'t'}$
- Because 31 is a prime number, this tends to give fewer collisions
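The formula can be evaluated with Horner's rule, one multiplication per character. The sketch below uses made-up names (StringHashDemo, polynomialHash, charSum) and compares its result with the built-in String.hashCode(); int overflow simply wraps, as it does in the library method.

```java
class StringHashDemo {
    // Evaluates s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1) using Horner's rule.
    static int polynomialHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    // The naive alternative: position is ignored, so anagrams collide.
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("Cat"));                    // 67510
        System.out.println("Cat".hashCode());                         // 67510
        System.out.println(charSum("sign") == charSum("sing"));       // true
        System.out.println("sign".hashCode() == "sing".hashCode());   // false
    }
}
```

For "Cat" the arithmetic is 67 x 961 + 97 x 31 + 116 = 67510.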

Open Addressing
- We consider two ways to organize hash tables: open addressing and chaining
- For open addressing, linear probing can be used to deal with collisions
  - Compute the table index from the key's hash code and examine that element
  - If that element contains an item with a different key, increment the index by one
  - Keep incrementing until you find the key or a null entry
  - A null entry indicates that the key is not in the table

Open Addressing Algorithm

Table Wraparound and Search Termination
- As the index increases, it must wrap around to the start of the table (circular array)
- This leads to the potential for an infinite loop
- How do you know when to stop searching if the table is full and you have not found the correct value?
  - Stop when the index for the next probe is the same as the starting index (the hash code modulo the table size), or
  - Ensure that the table is never full by increasing its size after an insertion if its occupancy rate exceeds a specified threshold (a sparser table has fewer collisions)
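A linear-probing search with wraparound and the first stopping rule might look like the sketch below. It is not the algorithm figure from the slides: the class LinearProbeSearch and method find are hypothetical, the table holds bare String keys for brevity, and Math.floorMod is used instead of % because hashCode() can be negative.

```java
class LinearProbeSearch {
    // Returns the index of key in the table, or -1 if the key is absent.
    static int find(String[] table, String key) {
        int start = Math.floorMod(key.hashCode(), table.length);
        int index = start;
        do {
            if (table[index] == null) {
                return -1;                        // empty slot: key is not in the table
            }
            if (table[index].equals(key)) {
                return index;                     // found the key
            }
            index = (index + 1) % table.length;   // wrap around at the end
        } while (index != start);                 // back at the start: table is full
        return -1;
    }

    public static void main(String[] args) {
        String[] full = {"Dick", "Sam", "Pete", "Harry", "Tom"};
        System.out.println(find(full, "Sam"));    // 1
        System.out.println(find(full, "Zoe"));    // -1, even though no slot is null
    }
}
```

The do-while loop guarantees at most one pass over the table, so a full table cannot cause an infinite loop.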

Open Addressing Example
Suppose we have the following values and hash codes:

Name      hashCode    hashCode % 5   hashCode % 11
"Tom"     84274       4              3
"Dick"    2129869     4              5
"Harry"   69496448    3              10
"Sam"     82879       4              5
"Pete"    2484038     3              7

Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing.

Name      hashCode % 5
"Tom"     4
"Dick"    4
"Harry"   3
"Sam"     4
"Pete"    3

The table after each insertion (on a collision, the item moves to the next free index, wrapping around):

Index   Empty   +Tom    +Dick    +Harry    +Sam      +Pete
0       null    null    "Dick"   "Dick"    "Dick"    "Dick"
1       null    null    null     null      "Sam"     "Sam"
2       null    null    null     null      null      "Pete"
3       null    null    null     "Harry"   "Harry"   "Harry"
4       null    "Tom"   "Tom"    "Tom"     "Tom"     "Tom"

Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing.

Name      hashCode % 11
"Tom"     3
"Dick"    5
"Harry"   10
"Sam"     5
"Pete"    7

The table after each insertion (only "Sam" collides, and it moves from index 5 to index 6):

Index   Empty   +Tom    +Dick    +Harry    +Sam      +Pete
0       null    null    null     null      null      null
1       null    null    null     null      null      null
2       null    null    null     null      null      null
3       null    "Tom"   "Tom"    "Tom"     "Tom"     "Tom"
4       null    null    null     null      null      null
5       null    null    "Dick"   "Dick"    "Dick"    "Dick"
6       null    null    null     null      "Sam"     "Sam"
7       null    null    null     null      null      "Pete"
8       null    null    null     null      null      null
9       null    null    null     null      null      null
10      null    null    null     "Harry"   "Harry"   "Harry"
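Both worked examples can be reproduced by inserting the five names with linear probing. The sketch below is an illustration only: OpenAddressingExample and build are hypothetical names, the table stores bare keys, and duplicate names are not handled.

```java
import java.util.Arrays;

class OpenAddressingExample {
    // Inserts the names in order, using linear probing to resolve collisions.
    static String[] build(int size, String... names) {
        if (names.length > size) {
            throw new IllegalArgumentException("table too small for all names");
        }
        String[] table = new String[size];
        for (String name : names) {
            int index = Math.floorMod(name.hashCode(), size);
            while (table[index] != null) {   // a free slot exists, so this terminates
                index = (index + 1) % size;  // probe the next slot, wrapping around
            }
            table[index] = name;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Tom", "Dick", "Harry", "Sam", "Pete"};
        // [Dick, Sam, Pete, Harry, Tom]
        System.out.println(Arrays.toString(build(5, names)));
        // [null, null, null, Tom, null, Dick, Sam, Pete, null, null, Harry]
        System.out.println(Arrays.toString(build(11, names)));
    }
}
```

With 5 slots, three of the five insertions collide; with 11 slots only one does, which illustrates how a sparser table reduces probing.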

Hash Table Operations
- Iterating through a hash table gives the entries in "arbitrary" order
- Deleting from a hash table
  - Cannot just store a null in place of the deleted item. Why not? Null is used as the stopping (not found) condition
  - Can store a "dummy value" instead
  - So removing an item does not improve search time
- Reducing collisions
  - Expand the size of the hash table and rehash the elements
  - There is a tradeoff between table size and search efficiency
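The deletion rule can be sketched with a sentinel object standing in for the "dummy value". The names TombstoneDemo and DELETED are assumptions made for this example; the sentinel is compared by reference so that it can never be confused with a real key.

```java
class TombstoneDemo {
    // Sentinel for a removed entry; compared with ==, never with equals.
    static final String DELETED = new String("<deleted>");

    // Linear-probing search that skips over deleted markers.
    static int find(String[] table, String key) {
        int start = Math.floorMod(key.hashCode(), table.length);
        int index = start;
        do {
            String entry = table[index];
            if (entry == null) {
                return -1;                           // a true gap ends the search
            }
            if (entry != DELETED && entry.equals(key)) {
                return index;
            }
            index = (index + 1) % table.length;
        } while (index != start);
        return -1;
    }

    // Marks the slot as deleted instead of setting it to null.
    static boolean remove(String[] table, String key) {
        int index = find(table, key);
        if (index < 0) {
            return false;
        }
        table[index] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        String[] table = {"Dick", "Sam", "Pete", "Harry", "Tom"};
        remove(table, "Dick");
        System.out.println(find(table, "Sam"));   // 1: the probe passes the marker at index 0
        table[0] = null;                          // what a naive delete would have done
        System.out.println(find(table, "Sam"));   // -1: "Sam" is now unreachable
    }
}
```

Because markers still have to be probed past, a table with many deletions searches no faster until it is rehashed.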

Reducing Collisions by Quadratic Probing
- Linear probing tends to form clusters of keys in the table, causing longer search chains
- Quadratic probing can reduce the effect of clustering
  - The increments form a quadratic series (for example, offsets of 1, 4, 9, ... from the starting index)
- Disadvantages?
  - More work to calculate the next index (multiplication, addition, and modular division)
  - Not all table elements are examined when looking for an insertion index
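The last disadvantage can be seen by listing the slots a quadratic probe sequence actually reaches. This sketch assumes the common variant in which probe i examines (start + i*i) % tableSize; the class and method names are illustrative.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeDemo {
    // Collects the distinct slots reached by tableSize quadratic probes, in probe order.
    static Set<Integer> probedSlots(int start, int tableSize) {
        Set<Integer> visited = new LinkedHashSet<>();
        for (int i = 0; i < tableSize; i++) {
            visited.add((start + i * i) % tableSize);
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> slots = probedSlots(0, 11);
        System.out.println(slots);         // [0, 1, 4, 9, 5, 3]
        System.out.println(slots.size());  // 6 of the 11 slots
    }
}
```

With a table of size 11, only 6 distinct slots are ever probed, so an insertion can fail to find a free slot even when the table is not full.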

Chaining
- Chaining is an alternative to open addressing
- Each table element references a linked list that contains all of the items that hash to the same table index
  - The linked list is often called a bucket
  - The approach is sometimes called bucket hashing
- Only items that hash to the same table index are examined when looking for an object

Chaining
Recall hashCode % 5:

Name      hashCode % 5
"Tom"     4
"Dick"    4
"Harry"   3
"Sam"     4
"Pete"    3

- Chaining creates a linked list for the items that collide at each table index
- In this example:
  - One linked list for Tom, Dick, and Sam (index 4)
  - Another linked list for Harry and Pete (index 3)
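The bucket layout above can be sketched with a list of LinkedList objects. The names ChainingDemo, build, and contains are hypothetical, and every index gets an empty bucket up front for simplicity, whereas an implementation could leave unused indexes null.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingDemo {
    // Builds a chained table: each index holds a list of the names that hash there.
    static List<LinkedList<String>> build(int size, String... names) {
        List<LinkedList<String>> table = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>());
        }
        for (String name : names) {
            LinkedList<String> bucket = table.get(Math.floorMod(name.hashCode(), size));
            if (!bucket.contains(name)) {   // keys stay unique within a bucket
                bucket.add(name);
            }
        }
        return table;
    }

    // Searches only the one bucket that the key hashes to.
    static boolean contains(List<LinkedList<String>> table, String key) {
        return table.get(Math.floorMod(key.hashCode(), table.size())).contains(key);
    }

    public static void main(String[] args) {
        List<LinkedList<String>> table = build(5, "Tom", "Dick", "Harry", "Sam", "Pete");
        System.out.println(table.get(4));            // [Tom, Dick, Sam]
        System.out.println(table.get(3));            // [Harry, Pete]
        System.out.println(contains(table, "Sam"));  // true
    }
}
```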

Chaining
- Pluses?
  - Conceptually simple
  - Minimizes table size
  - Good search efficiency
- Minuses?
  - Overhead of linked lists (more storage)
  - More complex (perhaps)

Performance of Hash Tables
- The load factor is the number of filled cells divided by the table size
- The load factor has the greatest effect on performance
- The lower the load factor, the better the performance
  - Why? There is less chance of a collision in a sparsely populated table
- But the smaller the load factor, the more wasted space
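To put rough numbers on the effect of the load factor, the sketch below uses the standard textbook approximations (due to Knuth) for the expected number of comparisons in a successful search: (1 + 1/(1 - L)) / 2 for linear probing and 1 + L/2 for chaining, where L is the load factor. These formulas are not stated in the slide text, so treat them as an assumption brought in for illustration; the class and method names are also made up.

```java
class LoadFactorDemo {
    // Approximate comparisons for a successful search with linear probing (0 <= L < 1).
    static double linearProbingComparisons(double loadFactor) {
        return 0.5 * (1.0 + 1.0 / (1.0 - loadFactor));
    }

    // Approximate comparisons for a successful search with chaining.
    static double chainingComparisons(double loadFactor) {
        return 1.0 + loadFactor / 2.0;
    }

    public static void main(String[] args) {
        for (double load : new double[] {0.25, 0.5, 0.75, 0.9}) {
            System.out.printf("L=%.2f  linear probing=%.2f  chaining=%.2f%n",
                    load, linearProbingComparisons(load), chainingComparisons(load));
        }
    }
}
```

At L = 0.75 the approximations give 2.5 comparisons for linear probing and 1.375 for chaining; as L approaches 1, the linear probing estimate grows without bound.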

Maps and Hashing
- Hash-based maps (such as HashMap) use hash tables!
- Hashing converts the key into an index
  - The index is the place where the corresponding value is stored
- This makes it possible to search efficiently (recall: O(1) on average) without having an (explicit) index
- Of course, there is some additional overhead

Implementing a Hash Table

Implementation of Maps and Sets
- Class Object implements the methods hashCode and equals, so every class inherits them unless it overrides them
- Object.equals compares two objects based on their addresses (identity), not their contents
- Object.hashCode calculates an object's hash code based on its identity (typically its address), not its contents
- If you override the equals method, you should also override the hashCode method, so that equal objects have equal hash codes
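A minimal sketch of overriding both methods together. Person is a hypothetical class invented for this example; the point is only that hashCode is computed from the same fields that equals compares, so two equal objects land in the same bucket.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Hypothetical value class used only for this illustration.
    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Person)) return false;
            Person that = (Person) other;
            return age == that.age && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);  // built from the same fields as equals
        }
    }

    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Ann", 30));
        System.out.println(people.contains(new Person("Ann", 30)));  // true
        System.out.println(new Object().equals(new Object()));        // false: identity only
    }
}
```

If hashCode were left as inherited from Object, the contains call would most likely look in a different bucket and report false even though equals returns true.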

Implementing HashSetOpen

Implementing Java Map and Set Interfaces
- The Java API uses a hash table to implement both the Map and Set interfaces (in the classes HashMap and HashSet)
- The task of implementing the two interfaces is simplified by the inclusion of the abstract classes AbstractMap and AbstractSet in the Collection hierarchy

Nested Interface Map.Entry
- One requirement on the key-value pairs for a Map object is that they implement the interface Map.Entry<K, V>, which is a nested interface of interface Map
- An implementer of the Map interface must contain an inner class that provides code for the methods in the table below
- (The table is not captured in this transcript; Map.Entry<K, V> declares getKey, getValue, and setValue, among others.)
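From the user's side, Map.Entry objects are what entrySet hands back when iterating over a map. The sketch below uses a TreeMap only so that the iteration order is predictable (sorted by key); EntrySetDemo and describe are names made up for this example.

```java
import java.util.Map;
import java.util.TreeMap;

class EntrySetDemo {
    // Walks the entry set and formats each key-value pair.
    static String describe(Map<String, String> map) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : map.entrySet()) {
            if (out.length() > 0) {
                out.append(", ");
            }
            out.append(entry.getKey()).append("=").append(entry.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> aMap = new TreeMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill");
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        System.out.println(describe(aMap));  // B=Bill, B1=Bob, B2=Bill, J=Jane, S=Sam
    }
}
```

With a HashMap the same loop works, but the order of the entries is arbitrary.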

Additional Applications of Maps
- Can implement the phone directory using a map

Additional Applications of Maps
- Huffman coding problem
  - Use a map for creating an array of elements and for replacing each input character by its bit string code in the output file
  - Frequency table
  - The key will be the input character
  - The value is the character code string
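The encoding step, with the input character as the key and its code string as the value, might be sketched as follows. The three-entry code table here is invented for illustration and is not a Huffman code computed from real frequencies; CodeTableDemo and encode are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class CodeTableDemo {
    // Replaces each input character by its bit-string code from the map.
    static String encode(String input, Map<Character, String> codes) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            String code = codes.get(c);
            if (code == null) {
                throw new IllegalArgumentException("No code for character: " + c);
            }
            out.append(code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> codes = new HashMap<>();
        codes.put('a', "0");    // made-up prefix-free codes
        codes.put('b', "10");
        codes.put('c', "11");
        System.out.println(encode("abca", codes));  // 010110
    }
}
```

Each lookup is expected O(1), so encoding a file takes time proportional to its length.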

Chapter Review
- The Set interface describes an abstract data type that supports the same operations as a mathematical set
- The Map interface describes an abstract data type that enables a user to access information corresponding to a specified key
- A hash table uses hashing to transform an item's key into a table index so that insertions, retrievals, and deletions can be performed in expected O(1) time
- A collision occurs when two keys map to the same table index
- In open addressing, linear probing is often used to resolve collisions

Chapter Review
- The best way to avoid collisions is to keep the table load factor relatively low by rehashing when the load factor reaches a value such as 0.75
- In open addressing, you cannot simply clear an element's slot when you delete it; instead, you must mark the slot as deleted
- A set view of a hash table can be obtained through the method entrySet
- Two Java API implementations of the Map (Set) interface are HashMap (HashSet) and TreeMap (TreeSet)