Sets and Maps (and Hashing)

1 Sets and Maps (and Hashing)
Chapter 9

2 Chapter Objectives
To understand the Java Map and Set interfaces and how to use them. To learn about hash codes and how they are used to facilitate efficient search and retrieval. To study two forms of hash tables (open addressing and chaining) and to understand their relative benefits and performance tradeoffs.
Chapter 9: Sets and Maps

3 Chapter Objectives
To learn how to implement both hash table forms. To be introduced to the implementation of Maps and Sets. To see how two earlier applications can be implemented more easily using Map objects for data storage.

4 Review of Sets
A set is unordered and has no duplicate elements.
Suppose $A = \{1,3,5,7,9,11\}$ and $B = \{2,3,5,7,11,13\}$. Then $A \cup B = \{1,2,3,5,7,9,11,13\}$, $A \cap B = \{3,5,7,11\}$, $A - B = \{1,9\}$, and $B - A = \{2,13\}$. If $C = \{3,5,9\}$, then $C \subseteq A$.
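The set operations above correspond to standard java.util.Set methods: addAll for union, retainAll for intersection, removeAll for difference, and containsAll for the subset test. The following is a minimal sketch using the slide's values; the class and method names are illustrative, TreeSet is used only so results print in sorted order, and Set.of assumes Java 9 or later.

```java
import java.util.Set;
import java.util.TreeSet;

public class SetOperationsDemo {
    // Each helper copies its first argument, so the inputs are never modified.
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);      // add every element of b
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);   // keep only the elements also in b
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);   // drop every element that is in b
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 3, 5, 7, 9, 11);
        Set<Integer> b = Set.of(2, 3, 5, 7, 11, 13);
        Set<Integer> c = Set.of(3, 5, 9);

        System.out.println(union(a, b));        // [1, 2, 3, 5, 7, 9, 11, 13]
        System.out.println(intersection(a, b)); // [3, 5, 7, 11]
        System.out.println(difference(a, b));   // [1, 9]
        System.out.println(difference(b, a));   // [2, 13]
        System.out.println(a.containsAll(c));   // true: C is a subset of A
    }
}
```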

5 Sets and the Set Interface
The part of the Collection hierarchy that relates to sets includes three interfaces, two abstract classes, and two concrete classes.

6 The Set Abstraction
A set is a collection that contains no duplicate elements and at most one null element. In a set, the index of an element is meaningless. If s is a set, s.contains(“apple”) returns true or false, but s.indexOf(“apple”) makes no sense, and s.get(i) is also nonsensical.

7 The Set Abstraction
Operations on sets include testing for membership, adding (inserting) elements, removing elements, union, intersection, difference, and subset.

8 The Set Interface and Methods
The Set interface has required methods for testing set membership, testing for an empty set, determining set size, and creating an iterator over the set. It has two optional methods: one to add an element and one to remove an element. Constructors enforce the no-duplicates rule, and the add method does not insert a duplicate item.
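A short sketch of these methods on a java.util.HashSet, assuming Java 9 or later for Set.of and List.of. The class name and the fromList helper are illustrative. It also shows that an unmodifiable set rejects the optional add operation with UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetMethodsDemo {
    // Building a set from a collection keeps only one copy of each element.
    static Set<String> fromList(List<String> items) {
        return new HashSet<>(items);
    }

    public static void main(String[] args) {
        Set<String> fruits = fromList(Arrays.asList("apple", "pear", "apple", "fig"));

        System.out.println(fruits.size());            // 3
        System.out.println(fruits.contains("apple")); // true (membership test)
        System.out.println(fruits.isEmpty());         // false

        fruits.remove("pear");                        // optional operation; HashSet supports it
        for (String f : fruits) {                     // uses fruits.iterator(); order is unspecified
            System.out.println(f);
        }

        // An unmodifiable set does not support the optional mutators.
        try {
            Set.of("apple").add("fig");
        } catch (UnsupportedOperationException e) {
            System.out.println("add not supported");
        }
    }
}
```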

9 The Set Interface and Methods

10 Comparison of Lists and Sets
Duplicate elements: allowed in a list, but not in a set (Set.add returns false if you try to insert a duplicate element). Get method: List has a get method, but a set has no get method (an index is meaningless). Iterators: lists have iterators, and you can also iterate through the elements of a set.
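To make the comparison concrete, this sketch adds the same item twice to an ArrayList and to a HashSet and reports what each add call returned. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ListVsSetDemo {
    // Adds item twice and returns the results of the two add calls, in order.
    static boolean[] addTwice(Collection<String> c, String item) {
        return new boolean[] { c.add(item), c.add(item) };
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>();
        Set<String> set = new HashSet<>();

        boolean[] l = addTwice(list, "apple"); // {true, true}: duplicates allowed
        boolean[] s = addTwice(set, "apple");  // {true, false}: duplicate rejected
        System.out.println(l[1] + " " + s[1]); // true false

        System.out.println(list.get(1));       // apple: lists are indexed
        // set.get(1) would not compile, because Set declares no get method.

        for (String item : set) {              // a set can still be iterated
            System.out.println(item);
        }
    }
}
```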

11 Maps
A map relates one set to another set. A map is a set of ordered pairs (x, y), where x is the key and y is the value (element). For example, this map is: {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}

12 Maps
A map is a set of ordered pairs (x, y), where x is the key and y is the value (element). Keys must be unique, but values need not be (the mapping need not be one-to-one). Each key “maps” to, or “corresponds” to, a particular value (element). Maps are used for efficient storage and retrieval of information in tables. A key is used like an index into a list, but a key does not need to be an integer.

13 Maps
Suppose we have the map {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)} stored in “aMap”. What does aMap.get(“B2”) return? “Bill”. What does aMap.get(“Bill”) return? null, since nothing in aMap has the key “Bill”.
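The same lookups in Java, as a minimal sketch with java.util.HashMap; the class and method names are illustrative. It also shows that put on an existing key replaces the value and returns the old one.

```java
import java.util.HashMap;
import java.util.Map;

public class MapGetDemo {
    // Builds the example map {(J,Jane), (B,Bill), (B2,Bill), (S,Sam), (B1,Bob)}.
    static Map<String, String> buildExample() {
        Map<String, String> aMap = new HashMap<>();
        aMap.put("J", "Jane");
        aMap.put("B", "Bill");
        aMap.put("B2", "Bill");   // the same value may appear under several keys
        aMap.put("S", "Sam");
        aMap.put("B1", "Bob");
        return aMap;
    }

    public static void main(String[] args) {
        Map<String, String> aMap = buildExample();
        System.out.println(aMap.get("B2"));           // Bill
        System.out.println(aMap.get("Bill"));         // null: "Bill" is a value, not a key
        System.out.println(aMap.put("B", "William")); // Bill (the replaced value)
    }
}
```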

14 Map Interface

15 Hash Tables
For maps, we want to access an entry by its key, not its value, and a hash table is used for such access. For efficiency, we want to access an element directly by its key, as opposed to searching for the key value in an array. Using a hash table, we can retrieve an item in constant time on average and in linear time in the worst case; that is, O(1) is expected, but O(n) is the worst case.

16 Hash Codes and Index Calculation
Hashing idea: transform an item’s key value into an integer, then use this integer as a numeric index.

17 Hash Code Index Example
Suppose we want to store the number of occurrences of each Unicode character in a file. There are 65,536 Unicode characters representable by Java’s 16-bit char type. What to do? We could create an array of size 65,536 and store the count of character i in array element i. This will work, but it is very inefficient for a small file: suppose the file has only 100 characters! Is there a better way?

18 Hash Code Index Calculation
Suppose again that we want to store the number of occurrences of each Unicode character in a file of only 100 characters, out of 65,536 possible characters. Use a hash code for each character. But how do we compute the hash code? We could create an array of size 200 and compute the index as index = uniChar % 200. This is good because it uses less space, but bad if there are collisions: two or more characters in the file “hash” to the same value.
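A minimal sketch of the index = uniChar % 200 idea, assuming a table of 200 counters and deliberately no collision handling. The class and method names are illustrative. Characters 'A' (code 65) and the character with code 265 land in the same slot, which is exactly the collision the slide warns about.

```java
public class ModIndexDemo {
    static final int TABLE_SIZE = 200;   // the slide's table size

    // Hash index for a character: its 16-bit code value modulo the table size.
    static int indexFor(char uniChar) {
        return uniChar % TABLE_SIZE;
    }

    // Counts characters by index. Colliding characters share one counter,
    // which is the problem that collision handling must solve.
    static int[] countChars(String text) {
        int[] counts = new int[TABLE_SIZE];
        for (char c : text.toCharArray()) {
            counts[indexFor(c)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countChars("hello");
        System.out.println(counts[indexFor('l')]);  // 2
        System.out.println(indexFor('A'));          // 65
        System.out.println(indexFor((char) 265));   // 65: collides with 'A'
    }
}
```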

19 Methods for Generating Hash Codes
Usually, keys consist of strings of letters and/or digits, and the number of possible key values is much larger than the table size. Generating a good hash code is something of an art; some experimentation and trial and error may be required. Desirable properties of a “hash function”: a “random” (uniform) distribution of values, a relatively simple function, and efficient computation. Collisions can always occur, so what do we do about them?

20 Java hashCode Method
For strings, we could simply sum the int values of all the characters, but that returns the same hash code for “sign” and “sing”. The Java API algorithm accounts for the position of the characters: String.hashCode() returns the integer calculated by the formula $s_0 \times 31^{n-1} + s_1 \times 31^{n-2} + \cdots + s_{n-1}$, where $s_i$ is the $i$th character of the string and $n$ is the length of the string (the arithmetic is done in int, so overflow simply wraps around). “Cat” will have a hash code of $\texttt{'C'} \times 31^2 + \texttt{'a'} \times 31 + \texttt{'t'} = 67 \times 961 + 97 \times 31 + 116 = 67510$. Since 31 is a prime number, this tends to produce fewer collisions.
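The two approaches can be compared directly. This sketch (class and method names are illustrative) evaluates the position-weighted formula with Horner's rule, h = 31 * h + s_i, which expands to the same polynomial, and checks it against the real String.hashCode().

```java
public class StringHashDemo {
    // Sum of character codes: position is ignored, so anagrams collide.
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1), via Horner's rule,
    // in int arithmetic (overflow wraps around, as in String.hashCode).
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(sumHash("sign") == sumHash("sing"));   // true
        System.out.println(polyHash("sign") == polyHash("sing")); // false
        System.out.println(polyHash("Cat"));                      // 67510
        System.out.println(polyHash("Cat") == "Cat".hashCode());  // true
    }
}
```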

21 Open Addressing
We consider two ways to organize hash tables: open addressing and chaining. For open addressing, linear probing can be used to deal with collisions: if the table element at the computed index contains an item with a different key, increment the index by one. Keep incrementing until you find the key or a null entry; a null entry indicates that the item is not in the table.

22 Open Addressing Algorithm
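One way to write the linear-probing search is sketched below; it is not necessarily the textbook's exact algorithm. The class name LinearProbing and the method find are illustrative. The sketch folds a negative hashCode() into range and stops at the first null slot. It also returns -1 if the probe wraps back to its starting index, one of the termination rules discussed on the next slide.

```java
public class LinearProbing {
    // Linear-probing search in an open-addressing table.
    // Returns the index holding key, or the index of the first null slot
    // (where key could be inserted), or -1 if every slot was probed
    // without finding key or a null slot (the table is full).
    static int find(Object[] table, Object key) {
        int start = key.hashCode() % table.length;
        if (start < 0) {
            start += table.length;                  // hashCode() may be negative
        }
        int index = start;
        while (table[index] != null && !key.equals(table[index])) {
            index = (index + 1) % table.length;     // wrap around: circular array
            if (index == start) {
                return -1;                          // back where we began
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Object[] table = {"Dick", null, null, null, "Tom"};
        // "Sam".hashCode() % 5 is 4: probes 4 ("Tom") and 0 ("Dick"), then finds null at 1.
        System.out.println(find(table, "Sam")); // 1
    }
}
```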

23 Table Wraparound and Search Termination
As the index increases, it must wrap around to the start of the array (a circular array), which leads to the potential for an infinite loop. How do you know when to stop searching if the table is full and you have not found the correct value? Either stop when the index for the next probe is the same as the hash-code index for the object, or ensure that the table is never full by increasing its size after an insertion whenever its occupancy rate exceeds a specified threshold (a sparser table has fewer collisions).
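The second strategy (never let the table fill up) can be sketched as follows. The threshold 0.75, the initial capacity 5, and the growth rule 2n + 1 are assumptions chosen for illustration, and RehashSketch is a hypothetical class name. Every key must be reinserted after growing, because its index depends on the table length.

```java
public class RehashSketch {
    static final double LOAD_THRESHOLD = 0.75;   // assumed threshold

    private Object[] table = new Object[5];      // assumed initial capacity
    private int numKeys = 0;

    // Index where key is stored, or the first null slot on its probe path.
    // The loop terminates because add() never lets the table fill up.
    private static int slotFor(Object[] t, Object key) {
        int index = Math.floorMod(key.hashCode(), t.length);
        while (t[index] != null && !key.equals(t[index])) {
            index = (index + 1) % t.length;
        }
        return index;
    }

    public boolean contains(Object key) {
        return table[slotFor(table, key)] != null;
    }

    public boolean add(Object key) {
        int index = slotFor(table, key);
        if (table[index] != null) {
            return false;                        // already present
        }
        table[index] = key;
        numKeys++;
        if ((double) numKeys / table.length > LOAD_THRESHOLD) {
            rehash();                            // keep the table sparse
        }
        return true;
    }

    // Allocate a larger array (2n + 1, an assumed growth rule) and reinsert
    // every key at its new home index.
    private void rehash() {
        Object[] old = table;
        table = new Object[2 * old.length + 1];
        for (Object key : old) {
            if (key != null) {
                table[slotFor(table, key)] = key;
            }
        }
    }

    public int size() { return numKeys; }

    public int capacity() { return table.length; }

    public static void main(String[] args) {
        RehashSketch set = new RehashSketch();
        for (String name : new String[] {"Tom", "Dick", "Harry", "Sam"}) {
            set.add(name);
        }
        // The 4th key raises the load factor to 4/5 = 0.8 > 0.75, so the table grows.
        System.out.println(set.capacity()); // 11
    }
}
```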

24 Open Addressing Example
Suppose we have the following values and hash codes:
Name      hashCode   hashCode % 5   hashCode % 11
“Tom”        84274              4               3
“Dick”     2129869              4               5
“Harry”   69496448              3              10
“Sam”        82879              4               5
“Pete”     2484038              3               7

25 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Index: data. 0: null, 1: null, 2: null, 3: null, 4: null

26 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Insert “Tom” at its hash index, 4.
Index: data. 0: null, 1: null, 2: null, 3: null, 4: “Tom”

27 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Insert “Dick”: index 4 holds “Tom”, so wrap around to index 0.
Index: data. 0: “Dick”, 1: null, 2: null, 3: null, 4: “Tom”

28 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Insert “Harry” at its hash index, 3.
Index: data. 0: “Dick”, 1: null, 2: null, 3: “Harry”, 4: “Tom”

29 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Insert “Sam”: indexes 4 and 0 are occupied, so it goes to index 1.
Index: data. 0: “Dick”, 1: “Sam”, 2: null, 3: “Harry”, 4: “Tom”

30 Open Addressing Example
Suppose we use hashCode % 5 to create the hash table, using open addressing (hashCode % 5: “Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3).
Insert “Pete”: indexes 3, 4, 0, and 1 are occupied, so it goes to index 2.
Index: data. 0: “Dick”, 1: “Sam”, 2: “Pete”, 3: “Harry”, 4: “Tom”

31 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
All 11 entries (indexes 0 through 10) are null.

32 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
Insert “Tom” at index 3. Occupied: 3: “Tom”; all other entries are null.

33 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
Insert “Dick” at index 5. Occupied: 3: “Tom”, 5: “Dick”; all other entries are null.

34 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
Insert “Harry” at index 10. Occupied: 3: “Tom”, 5: “Dick”, 10: “Harry”; all other entries are null.

35 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
Insert “Sam”: index 5 holds “Dick”, so it goes to index 6. Occupied: 3: “Tom”, 5: “Dick”, 6: “Sam”, 10: “Harry”; all other entries are null.

36 Open Addressing Example
Suppose we use hashCode % 11 to create the hash table, using open addressing (hashCode % 11: “Tom” 3, “Dick” 5, “Harry” 10, “Sam” 5, “Pete” 7).
Insert “Pete” at index 7. Occupied: 3: “Tom”, 5: “Dick”, 6: “Sam”, 7: “Pete”, 10: “Harry”; all other entries are null. With the larger table, only “Sam” collides.
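Both walkthroughs can be reproduced by inserting the five names in order with linear probing. This is a minimal sketch; the class and method names are illustrative, and it assumes distinct names and no more names than slots (otherwise the probe loop would never end).

```java
import java.util.Arrays;

public class OpenAddressingExample {
    // Inserts the names in order using linear probing.
    // Assumes the names are distinct and there are no more names than slots.
    static String[] build(int size, String... names) {
        String[] table = new String[size];
        for (String name : names) {
            int index = Math.floorMod(name.hashCode(), size);
            while (table[index] != null) {
                index = (index + 1) % size;   // collision: try the next slot, wrapping
            }
            table[index] = name;
        }
        return table;
    }

    public static void main(String[] args) {
        String[] names = {"Tom", "Dick", "Harry", "Sam", "Pete"};
        for (String n : names) {
            // name, hashCode, hashCode % 5, hashCode % 11
            System.out.println(n + " " + n.hashCode() + " " + n.hashCode() % 5 + " " + n.hashCode() % 11);
        }
        System.out.println(Arrays.toString(build(5, names)));
        // [Dick, Sam, Pete, Harry, Tom]
        System.out.println(Arrays.toString(build(11, names)));
        // [null, null, null, Tom, null, Dick, Sam, Pete, null, null, Harry]
    }
}
```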

37 Hash Table Operations
Iterating through a hash table gives the entries in “arbitrary” order. Deleting from a hash table: you cannot just insert a null. Why not? Null is used as the stopping (not found) condition, so a null would cut off the search for any key stored beyond it. Instead, you can insert a “dummy value” that marks the entry as deleted, so removing does not improve search time. Reducing collisions: expand the size of the hash table and rehash the elements. There is a tradeoff between table size and search efficiency.
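A minimal sketch of deletion with a "dummy value", assuming a private sentinel object named DELETED (a hypothetical name). Searches probe past DELETED slots and stop only at null. To stay close to the slide, this sketch never reuses DELETED slots, so removal frees no space. The example reuses the hashCode % 5 collisions for “Tom”, “Dick”, and “Sam”.

```java
public class DeletionSketch {
    // Sentinel marking a removed slot. It is distinct from null, which still
    // means "never used" and therefore ends a search.
    private static final Object DELETED = new Object();

    private final Object[] table;

    public DeletionSketch(int capacity) {
        table = new Object[capacity];
    }

    private int home(Object key) {
        return Math.floorMod(key.hashCode(), table.length);
    }

    // Index of key, or -1. Probes past DELETED slots and stops at the first
    // null slot or after visiting every slot once.
    private int locate(Object key) {
        int index = home(key);
        for (int probes = 0; probes < table.length && table[index] != null; probes++) {
            if (table[index] != DELETED && key.equals(table[index])) {
                return index;
            }
            index = (index + 1) % table.length;
        }
        return -1;
    }

    // Assumes key is not already present and at least one null slot remains.
    public void add(Object key) {
        int index = home(key);
        while (table[index] != null) {
            index = (index + 1) % table.length;
        }
        table[index] = key;
    }

    public boolean contains(Object key) {
        return locate(key) >= 0;
    }

    public boolean remove(Object key) {
        int index = locate(key);
        if (index < 0) {
            return false;
        }
        table[index] = DELETED;   // null here would hide keys probed past this slot
        return true;
    }

    public static void main(String[] args) {
        DeletionSketch d = new DeletionSketch(5);
        d.add("Tom");    // home index 4
        d.add("Dick");   // home index 4, stored at 0
        d.add("Sam");    // home index 4, stored at 1
        d.remove("Dick");
        System.out.println(d.contains("Sam")); // true: the search probes past slot 0
    }
}
```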

38 Reducing Collisions by Quadratic Probing
Linear probing tends to form clusters of keys in the table, causing longer search chains. Quadratic probing can reduce the effect of clustering: the increments form a quadratic series (1, 4, 9, ...). Disadvantages? There is more work to calculate the next index (multiplication, addition, and modular division), and not all table elements are examined when looking for an insertion index.
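The probe sequence home, home + 1, home + 4, home + 9, ... (mod table size) and its second disadvantage can both be seen in a few lines. In this sketch (class and method names are illustrative), a home index of 0 in a table of size 11 reaches only 6 of the 11 slots.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class QuadraticProbing {
    // First `count` probe positions starting from hash index `home`
    // (assumed to satisfy 0 <= home < size): home, home + 1, home + 4, ...
    static int[] probes(int home, int size, int count) {
        int[] result = new int[count];
        for (int k = 0; k < count; k++) {
            result[k] = (int) ((home + (long) k * k) % size);   // long avoids overflow of k*k
        }
        return result;
    }

    // How many distinct slots the probe sequence can ever reach. k*k mod size
    // repeats with period size, so trying k = 0 .. size-1 is enough.
    static int distinctSlots(int home, int size) {
        Set<Integer> seen = new HashSet<>();
        for (int k = 0; k < size; k++) {
            seen.add((int) ((home + (long) k * k) % size));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(probes(3, 11, 4))); // [3, 4, 7, 1]
        System.out.println(distinctSlots(0, 11));              // 6 of 11 slots
    }
}
```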

39 Chaining
Chaining is an alternative to open addressing. Each table element references a linked list that contains all of the items that hash to the same table index. The linked list is often called a bucket, and the approach is sometimes called bucket hashing. Only items whose hash codes map to the same table index will be examined when looking for an object.

40 Chaining
Recall hashCode % 5 (“Tom” 4, “Dick” 4, “Harry” 3, “Sam” 4, “Pete” 3). Chaining creates a linked list for each table index that receives items. In this example, one linked list holds “Tom”, “Dick”, and “Sam” (index 4), and another holds “Harry” and “Pete” (index 3).
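The same five names in a chained table of size 5, as a minimal sketch. The class and method names are illustrative, and a List of LinkedList buckets stands in for an array of lists (Java does not allow generic array creation). New names are appended to the end of their bucket.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainingExample {
    // Builds a chained table: slot i holds a linked list ("bucket") of the
    // names whose hash index is i. New names are appended to the bucket.
    static List<List<String>> build(int size, String... names) {
        List<List<String>> table = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            table.add(new LinkedList<>());
        }
        for (String name : names) {
            int index = Math.floorMod(name.hashCode(), size);
            table.get(index).add(name);   // a collision just lengthens the bucket
        }
        return table;
    }

    // A search examines only the bucket at the key's own index.
    static boolean contains(List<List<String>> table, String key) {
        int index = Math.floorMod(key.hashCode(), table.size());
        return table.get(index).contains(key);
    }

    public static void main(String[] args) {
        List<List<String>> table = build(5, "Tom", "Dick", "Harry", "Sam", "Pete");
        System.out.println(table); // [[], [], [], [Harry, Pete], [Tom, Dick, Sam]]
    }
}
```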

41 Chaining

42 Chaining
Pluses? It is conceptually simple, it minimizes table size, and it has good search efficiency. Minuses? The overhead of the linked lists (more storage), and (perhaps) more complexity.

43 Performance of Hash Tables
The load factor is the number of filled cells divided by the table size. The load factor has the greatest effect on performance: the lower the load factor, the better the performance. Why? There is less chance of a collision in a sparsely populated table. But the smaller the load factor, the more space is wasted.

44 Performance of Hash Tables

45 Maps and Hashing
Hash-based maps such as HashMap use hash tables! Hashing converts the key into an index, and the index is the place where the corresponding value is stored. This makes it possible to search efficiently (recall, O(1) on average) without having an explicit index. Of course, there is some additional overhead.

46 Implementing a Hash Table

47 Implementing a Hash Table

48 Implementation of Maps and Sets
Class Object implements methods hashCode and equals, so every class can access these methods unless it overrides them. Object.equals compares two objects based on their addresses (identity), not their contents, and Object.hashCode calculates an object’s hash code based on its address, not its contents. The general contract of hashCode requires that equal objects have equal hash codes, so if you override the equals method, you should also override the hashCode method.
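A minimal sketch of overriding both methods together, using a hypothetical Point class whose equality depends on its coordinates. java.util.Objects.hash combines the fields into a hash code, so two equal points always produce the same code and a HashSet can find one via the other.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PointDemo {
    // Illustrative value class whose equality depends on its contents.
    public static final class Point {
        private final int x;
        private final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Point)) {
                return false;
            }
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        // Equal points must have equal hash codes; otherwise a HashSet may
        // look in the wrong slot and fail to find an equal point.
        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```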

49 Implementing HashSetOpen

50 Implementing Java Map and Set Interfaces
The Java API uses a hash table to implement both the Map and Set interfaces (in classes HashMap and HashSet). The task of implementing the two interfaces is simplified by the inclusion of the abstract classes AbstractMap and AbstractSet in the Collection hierarchy.
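To show how much AbstractSet provides, here is a sketch of a set built on a map. The class name MapBackedSet is hypothetical. Each element is stored as a key with a shared dummy value, which is also how HashSet's documentation describes itself ("backed by a hash table (actually a HashMap instance)"). Only iterator() and size() are abstract in AbstractSet. add is overridden to make the set modifiable, and contains and remove are overridden for speed.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative set built on a map: each element is a key with a shared dummy
// value. AbstractSet supplies equals, hashCode, and removeAll, and its
// superclass AbstractCollection supplies addAll, containsAll, and toString.
public class MapBackedSet<E> extends AbstractSet<E> {
    private static final Object PRESENT = new Object();
    private final Map<E, Object> map = new HashMap<>();

    @Override
    public boolean add(E e) {
        return map.put(e, PRESENT) == null;   // null: e was not present before
    }

    @Override
    public boolean contains(Object o) {
        return map.containsKey(o);
    }

    @Override
    public boolean remove(Object o) {
        return map.remove(o) != null;
    }

    @Override
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

    @Override
    public int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> s = new MapBackedSet<>();
        s.add("apple");
        System.out.println(s.add("apple")); // false: duplicate
        System.out.println(s);              // [apple]
    }
}
```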

51 Nested Interface Map.Entry
One requirement on the key-value pairs for a Map object is that they implement the interface Map.Entry<K, V>, which is a nested interface of interface Map. An implementer of the Map interface must contain an inner class that provides code for the methods in the table below.
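Client code usually meets Map.Entry through entrySet(), whose elements are the map's key-value pairs. This sketch (class and method names are illustrative) reads entries with getKey and getValue and updates them with setValue, which writes through to modifiable maps such as TreeMap and HashMap. The JDK also ships a ready-made entry class, AbstractMap.SimpleEntry. Map.of in the test assumes Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;

public class EntryDemo {
    // Each element of entrySet() is a Map.Entry<K, V>.
    static int total(Map<String, Integer> stock) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : stock.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    // setValue on an entry from entrySet() writes through to the map
    // (for modifiable maps such as TreeMap and HashMap).
    static void doubleAll(Map<String, Integer> stock) {
        for (Map.Entry<String, Integer> e : stock.entrySet()) {
            e.setValue(e.getValue() * 2);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> stock = new TreeMap<>();
        stock.put("apple", 3);
        stock.put("pear", 5);
        doubleAll(stock);
        for (Map.Entry<String, Integer> e : stock.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()); // apple -> 6, then pear -> 10
        }
        System.out.println(total(stock)); // 16
    }
}
```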

52 Additional Applications of Maps
We can implement the phone directory using a map.
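A minimal sketch of such a directory, assuming entries map a name to a phone number. The class name MapPhoneDirectory and its method names are hypothetical, not a fixed API. Each operation becomes a single map call (put, get, or remove), replacing an array plus a linear search.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical phone directory backed by a map from name to number.
// The method names are illustrative, not a fixed API.
public class MapPhoneDirectory {
    private final Map<String, String> directory = new HashMap<>();

    // Adds a new entry or changes an existing one; returns the old number,
    // or null if the name was not listed before.
    public String addOrChangeEntry(String name, String number) {
        return directory.put(name, number);
    }

    // Returns the number for name, or null if it is not listed.
    public String lookupEntry(String name) {
        return directory.get(name);
    }

    // Removes name; returns its number, or null if it was not listed.
    public String removeEntry(String name) {
        return directory.remove(name);
    }

    public static void main(String[] args) {
        MapPhoneDirectory d = new MapPhoneDirectory();
        d.addOrChangeEntry("Jane", "555-1234");
        System.out.println(d.addOrChangeEntry("Jane", "555-9999")); // 555-1234
        System.out.println(d.lookupEntry("Jane"));                  // 555-9999
    }
}
```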

53 Additional Applications of Maps
Huffman Coding Problem: use a map to create the frequency table of elements, where the key is the input character and the value is its number of occurrences. Then use a map to replace each input character by its bit-string code in the output file, where the key is the input character and the value is the character’s code string.
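Both maps are sketched below. The class and method names are illustrative. Map.merge builds the frequency table in one call per character. The code table is a hypothetical prefix code chosen by hand, not one derived from a Huffman tree; building the tree is outside this sketch. Map.of assumes Java 9 or later.

```java
import java.util.Map;
import java.util.TreeMap;

public class HuffmanMaps {
    // Frequency table: key = input character, value = number of occurrences.
    static Map<Character, Integer> frequencies(String text) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);   // insert 1, or add 1 to the existing count
        }
        return freq;
    }

    // Replaces each character by its bit-string code from the code table.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            out.append(codes.get(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(frequencies("abracadabra")); // {a=5, b=2, c=1, d=1, r=2}
        // Hypothetical prefix codes; a real coder would derive them from a Huffman tree.
        Map<Character, String> codes = Map.of('a', "0", 'b', "10", 'r', "110",
                                              'c', "1110", 'd', "1111");
        System.out.println(encode("abra", codes)); // 0101100
    }
}
```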

54 Chapter Review
The Set interface describes an abstract data type that supports the same operations as a mathematical set. The Map interface describes an abstract data type that enables a user to access information corresponding to a specified key. A hash table uses hashing to transform an item’s key into a table index so that insertions, retrievals, and deletions can be performed in expected O(1) time. A collision occurs when two keys map to the same table index. In open addressing, linear probing is often used to resolve collisions.

55 Chapter Review
The best way to reduce collisions is to keep the table load factor relatively low by rehashing when the load factor reaches a value such as 0.75. In open addressing, you can’t simply remove an element from the table when you delete it; you must mark it as deleted. A set view of a hash table can be obtained through method entrySet. Two Java API implementations of the Map (Set) interface are HashMap (HashSet) and TreeMap (TreeSet).

