ADSA: Hashing/10 1 241-423 Advanced Data Structures and Algorithms Objectives – –introduce hashing, hash functions, hash tables, collisions, linear probing,

Presentation transcript:

ADSA: Hashing/10 1
241-423 Advanced Data Structures and Algorithms
Objectives
- introduce hashing, hash functions, hash tables, collisions, linear probing, double hashing, bucket hashing, JDK hash classes
Semester 2, Hashing

ADSA: Hashing/10 2
Contents
1. Searching an Array
2. Hashing
3. Creating a Hash Function
4. Solution #1: Linear Probing
5. Solution #2: Double Hashing
6. Solution #3: Bucket Hashing
7. Table Resizing
8. Java's hashCode()
9. Hash Tables in Java
10. Efficiency
11. Ordered/Unordered Sets and Maps

ADSA: Hashing/10 3
1. Searching an Array
- If the array is not sorted, a search requires O(n) time.
- If the array is sorted, a binary search requires O(log n) time.
- If the array is organized using hashing, then it is possible to have constant time search: O(1).

ADSA: Hashing/10 4
2. Hashing
- A hash function takes a search item and returns its array location (its index position).
- The array + hash function is called a hash table.
- Hash tables support efficient insertion and search, in O(1) time on average
  - but the hash function must be carefully chosen

ADSA: Hashing/10 5
A Simple Hash Function
The simple hash function:
  hashCode("apple") = 5
  hashCode("watermelon") = 3
  hashCode("grapes") = 8
  hashCode("cantaloupe") = 7
  hashCode("kiwi") = 0
  hashCode("strawberry") = 9
  hashCode("mango") = 6
  hashCode("banana") = 2
Resulting table (index: item):
  0: kiwi
  1: (empty)
  2: banana
  3: watermelon
  4: (empty)
  5: apple
  6: mango
  7: cantaloupe
  8: grapes
  9: strawberry

ADSA: Hashing/10 6
A Table with Key-Value Pairs
- A hash table for a map storing (ID, Name) items
  - ID is a nine-digit integer
- The hash table is an array of size N = 10,000
- The hash function is hashCode(ID) = last four digits of the ID key
[Figure: hash table holding (ID, Name) entries; names shown are tim, ad, tim, jim]
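
The "last four digits" rule above can be sketched in a few lines of Java. This is only an illustration: the class name, method name, and sample ID are assumptions, not part of the original slides.

```java
// Sketch of the slide's hash function: the table index is the last
// four decimal digits of a nine-digit ID. Names are illustrative only.
class IdHash {
    static final int TABLE_SIZE = 10_000;   // N = 10,000 cells

    // Returns the last four digits of a non-negative ID as a table index.
    static int lastFourDigits(long id) {
        return (int) (id % TABLE_SIZE);
    }

    public static void main(String[] args) {
        // Hypothetical sample ID, chosen only for the demonstration.
        System.out.println(lastFourDigits(451_229_004L)); // prints 9004
    }
}
```

Two IDs that share their last four digits collide under this function, which is why the later slides on collision handling are needed.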

ADSA: Hashing/10 7
Applications of Hash Tables
- Small databases
- Compilers
- Web browser caches

ADSA: Hashing/10 8
3. Creating a Hash Function
- The hash function should return the table index where an item is to be placed
  - but in general it's not possible to write a perfect (collision-free) hash function
  - the best we can usually do is to write a hash function that tells us where to start looking for a location to place the item

ADSA: Hashing/10 9
A Real Hash Function
A more realistic hash function produces:
  hash("apple") = 5
  hash("watermelon") = 3
  hash("grapes") = 8
  hash("cantaloupe") = 7
  hash("kiwi") = 0
  hash("strawberry") = 9
  hash("mango") = 6
  hash("banana") = 2
  hash("honeydew") = 6
Table so far (index: item): 0: kiwi, 2: banana, 3: watermelon, 5: apple, 6: mango, 7: cantaloupe, 8: grapes, 9: strawberry
"honeydew" hashes to 6, which already holds "mango". Now what?

ADSA: Hashing/10 10
Collisions
- A collision occurs when two items hash to the same array location
  - e.g. "mango" and "honeydew" both hash to 6
- Where should we place the second and other items that hash to this same location?
- Three popular solutions:
  - linear probing, double hashing, bucket hashing (chaining)

ADSA: Hashing/10 11
4. Solution #1: Linear Probing
- Linear probing handles collisions by placing the colliding item in the next empty table cell (perhaps after cycling around the table)
- Each inspected table cell is called a probe
  - in the example, three probes were needed to insert key 32
[Figure: key 32 is to be added; hash(32) = 6]

ADSA: Hashing/10 12
Example 1
- A hash table with N = 13 and h(k) = k mod 13
- Insert keys: 18, 41, 22, 44, 59, 32, 31, 73
- Total number of probes:
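
Example 1 can be replayed with a short program. The sketch below is not from the slides; it assumes that the first cell inspected counts as one probe (matching the "three probes for key 32" remark), that keys are non-negative, and that the table never fills up. The class and method names are made up for illustration.

```java
import java.util.Arrays;

// Replays Example 1: N = 13, h(k) = k mod 13, linear probing.
class LinearProbingExample {
    // Inserts every key and returns the total number of cells inspected.
    // Assumes non-negative keys and at least one free cell per insertion.
    static int insertAll(int[] keys, Integer[] table) {
        int n = table.length;
        int totalProbes = 0;
        for (int key : keys) {
            int i = key % n;              // h(k) = k mod N
            int probes = 1;               // the home cell is the first probe
            while (table[i] != null) {    // occupied: move to the next cell
                i = (i + 1) % n;          // wrap around at the end
                probes++;
            }
            table[i] = key;
            totalProbes += probes;
        }
        return totalProbes;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[13];
        int total = insertAll(new int[] {18, 41, 22, 44, 59, 32, 31, 73}, table);
        System.out.println(Arrays.toString(table));
        System.out.println("total probes = " + total);
    }
}
```

Under these assumptions the per-key probe counts are 1, 1, 1, 2, 1, 3, 6, 4, for a total of 19, and keys 44, 59, 32, 22, 31, 73 end up in one contiguous run from cell 6 to cell 11.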

ADSA: Hashing/10 13
Example 2: Insertion
Suppose we want to add "seagull":
- hash(seagull) = 143
- 3 probes: table[143] is not empty; table[144] is not empty; table[145] is empty
- put seagull at location 145
[Figure: table holding robin, sparrow, hawk, seagull, bluejay, owl]

ADSA: Hashing/10 14
Searching
Look up "seagull":
- hash(seagull) = 143
- 3 probes: table[143] != seagull; table[144] != seagull; table[145] == seagull
- found seagull at location 145
[Figure: table holding robin, sparrow, hawk, seagull, bluejay, owl]

ADSA: Hashing/10 15
Searching Again
Look up "cow":
- hash(cow) = 144
- 3 probes: table[144] != cow; table[145] != cow; table[146] is empty
- "cow" is not in the table since the probes reached an empty cell
[Figure: table holding robin, sparrow, hawk, seagull, bluejay, owl]

ADSA: Hashing/10 16
Insertion Again
Add "hawk":
- hash(hawk) = 143
- 2 probes: table[143] != hawk; table[144] == hawk
- hawk is already in the table, so do nothing
[Figure: table holding robin, sparrow, hawk, seagull, bluejay, owl]

ADSA: Hashing/10 17
Insertion Again
Add "cardinal":
- hash(cardinal) = 147
- 3 or more probes: 147 and 148 are occupied; "cardinal" goes in location 0 (or 1, or 2, or ...)
[Figure: table holding robin, sparrow, hawk, seagull, bluejay, owl]

ADSA: Hashing/10 18
Search with Linear Probing
- Search hash table A
- get(k)
  - start at cell h(k)
  - probe consecutive locations until:
    - an item with key k is found, or
    - an empty cell is found, or
    - N cells have been unsuccessfully probed (N is the table size)

Algorithm get(k)
  i ← h(k)
  p ← 0                  // count number of probes
  repeat
    c ← A[i]
    if c = ∅             // empty cell
      return null
    else if c.key() = k
      return c.element()
    else                 // linear probing
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return null
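
The get(k) pseudocode translates fairly directly into Java. The following is a minimal sketch under stated assumptions: integer keys, String values, h(k) = k mod N, and a companion put() added only so the search has something to find. The class name ProbingMap and its layout (two parallel arrays) are illustrative choices, not the slides' implementation.

```java
// Minimal open-addressing map with linear probing (illustrative only).
class ProbingMap {
    private final Integer[] keys;    // null marks an empty cell
    private final String[] values;

    ProbingMap(int capacity) {
        keys = new Integer[capacity];
        values = new String[capacity];
    }

    private int h(int key) {
        return Math.floorMod(key, keys.length);   // h(k) = k mod N, never negative
    }

    // Stores the pair in the first empty cell (or overwrites the same key).
    // Returns false if all N cells were probed without success.
    boolean put(int key, String value) {
        int i = h(key);
        for (int p = 0; p < keys.length; p++) {
            if (keys[i] == null || keys[i] == key) {
                keys[i] = key;
                values[i] = value;
                return true;
            }
            i = (i + 1) % keys.length;
        }
        return false;
    }

    // Mirrors Algorithm get(k): stop on a match, an empty cell, or N probes.
    String get(int key) {
        int i = h(key);
        for (int p = 0; p < keys.length; p++) {
            if (keys[i] == null) return null;        // empty cell: key is absent
            if (keys[i] == key) return values[i];    // found
            i = (i + 1) % keys.length;               // linear probing
        }
        return null;                                 // N cells probed
    }
}
```

For example, with capacity 13, keys 18 and 44 both hash to cell 5; put() places 44 in cell 6, and get(44) finds it on its second probe.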

ADSA: Hashing/10 19
Lazy Deletion
- Deletions are done by marking a table cell as deleted, rather than emptying the cell.
- Deleted locations are treated as empty when inserting and as occupied during a search.

ADSA: Hashing/10 20
Updates with Lazy Deletion
- delete(k)
  - Start at cell h(k)
  - Probe consecutive cells until:
    - a cell with key k is found: put DELETED in the cell; return true
    - or an empty cell is found: return false
    - or N cells have been probed: return false
- insert(k, v)
  - Start at cell h(k)
  - Probe consecutive cells until:
    - a cell i is found that is either empty or contains DELETED: put v in the cell; return true
    - or N cells have been probed: return false
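
A hedged sketch of these lazy-deletion rules, for a set of integer keys with linear probing. The class name, the use of a sentinel object as the DELETED marker, and the contains() method are assumptions made for this illustration. As on the slide, insert() does not check whether the key already exists further along the probe sequence.

```java
// Set of int keys using linear probing with lazy deletion (illustrative).
class LazyDeletionTable {
    private static final Object DELETED = new Object();  // tombstone marker
    private final Object[] cells;  // null = empty, DELETED = tombstone, else an Integer key

    LazyDeletionTable(int capacity) {
        cells = new Object[capacity];
    }

    private int h(int key) {
        return Math.floorMod(key, cells.length);
    }

    // DELETED cells are treated as empty when inserting.
    boolean insert(int key) {
        int i = h(key);
        for (int p = 0; p < cells.length; p++) {
            if (cells[i] == null || cells[i] == DELETED) {
                cells[i] = key;
                return true;
            }
            i = (i + 1) % cells.length;
        }
        return false;                              // N cells probed
    }

    // DELETED cells are treated as occupied when searching.
    boolean contains(int key) {
        int i = h(key);
        for (int p = 0; p < cells.length; p++) {
            if (cells[i] == null) return false;    // truly empty: stop
            if (cells[i].equals(key)) return true; // DELETED never equals a key
            i = (i + 1) % cells.length;
        }
        return false;
    }

    boolean delete(int key) {
        int i = h(key);
        for (int p = 0; p < cells.length; p++) {
            if (cells[i] == null) return false;
            if (cells[i].equals(key)) {
                cells[i] = DELETED;                // mark, do not empty
                return true;
            }
            i = (i + 1) % cells.length;
        }
        return false;
    }
}
```

The marker matters because emptying the cell outright would cut the probe chain: a key stored past the deleted cell would become unreachable by search.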

ADSA: Hashing/10 21
Clustering
- Linear probing tends to form "clusters".
  - a cluster is a sequence of non-empty array locations with no empty cells among them
  - e.g. the cluster in Example 1 on slide 12
- The bigger a cluster gets, the more likely it is that new items will hash into that cluster, and make it ever bigger.
- Clusters reduce hash table efficiency
  - searching becomes sequential (O(n))
continued

ADSA: Hashing/10 22
- If the size of the table is large relative to the number of items, linear probing is fast (O(1))
  - a good hash function generates indices that are evenly distributed over the table range, and collisions will be minimal
- As the ratio of the number of items (n) to the table size (N) approaches 1, hashing slows down to the speed of a sequential search (O(n)).

ADSA: Hashing/10 23
5. Solution #2: Double Hashing
- In the event of a collision, compute a second hash function
  - its value is used as an offset from the collision location
- Linear probing always uses an offset of 1, which contributes to clustering.
- Double hashing makes the offset more random.

ADSA: Hashing/10 24
Example of Double Hashing
- A hash table with N = 13, h(k) = k mod 13, and d(k) = 7 - (k mod 7)
- Insert keys: 18, 41, 22, 44, 59, 32, 31, 73
- Total number of probes:
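
The double hashing example can be replayed the same way as Example 1. This sketch assumes the probe sequence h(k), h(k) + d(k), h(k) + 2d(k), ... (all mod N), counts the home cell as the first probe, and assumes non-negative keys and a table that never fills up. The class and method names are illustrative.

```java
import java.util.Arrays;

// Replays the double hashing example: N = 13, h(k) = k mod 13,
// d(k) = 7 - (k mod 7).
class DoubleHashingExample {
    // Inserts every key and returns the total number of cells inspected.
    static int insertAll(int[] keys, Integer[] table) {
        int n = table.length;
        int totalProbes = 0;
        for (int key : keys) {
            int i = key % n;               // home cell h(k)
            int step = 7 - key % 7;        // offset d(k), always in 1..7
            int probes = 1;
            while (table[i] != null) {     // collision: jump by the offset
                i = (i + step) % n;
                probes++;
            }
            table[i] = key;
            totalProbes += probes;
        }
        return totalProbes;
    }

    public static void main(String[] args) {
        Integer[] table = new Integer[13];
        int total = insertAll(new int[] {18, 41, 22, 44, 59, 32, 31, 73}, table);
        System.out.println(Arrays.toString(table));
        System.out.println("total probes = " + total);
    }
}
```

Under these assumptions only 44 (cells 5, 10) and 31 (cells 5, 9, 0) collide, so the total is 11 probes, compared with 19 for linear probing on the same keys.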

ADSA: Hashing/10 25
6. Solution #3: Bucket Hashing
- The previous solutions use open addressing:
  - all items go into the array
- In bucket hashing, an array cell points to a linked list of items that all hash to that cell.
  - also called chaining
[Figure: array cells pointing to linked lists holding robin, sparrow, hawk, bluejay, owl, seagull]
continued

ADSA: Hashing/10 26
- Chaining is generally faster than linear probing:
  - searching only examines items that hash to the same table location
- With linear probing and double hashing, the number of table items (n) is limited to the table size (N), whereas the linked lists in chaining can keep growing.
- To delete an element, just erase it from its list.
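
A minimal sketch of bucket hashing (chaining) for a set of strings. The class name ChainedSet and its method set are assumptions for illustration; the JDK classes discussed later are far more elaborate.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Set of strings where each table cell holds a linked list (illustrative).
class ChainedSet {
    private final List<LinkedList<String>> buckets = new ArrayList<>();
    private int size = 0;

    ChainedSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Maps the item's hash code to a non-negative table index.
    private LinkedList<String> bucketFor(String item) {
        int index = (item.hashCode() & Integer.MAX_VALUE) % buckets.size();
        return buckets.get(index);
    }

    boolean add(String item) {
        LinkedList<String> bucket = bucketFor(item);
        if (bucket.contains(item)) return false;   // already present
        bucket.addFirst(item);
        size++;
        return true;
    }

    // Only the items in one bucket are examined.
    boolean contains(String item) {
        return bucketFor(item).contains(item);
    }

    // Deletion just erases the item from its list; no marker is needed.
    boolean remove(String item) {
        boolean removed = bucketFor(item).remove(item);
        if (removed) size--;
        return removed;
    }

    int size() {
        return size;
    }
}
```

Even a table with only two cells can hold all six birds from the earlier examples, since n is not limited by N; the lists simply get longer.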

ADSA: Hashing/10 27
7. Table Resizing
- As the number of items in the hash table increases, search speed goes down.
- Increase the hash table size when the number of items in the table reaches a specified percentage of its size.
- Works with chaining also.
continued

ADSA: Hashing/10 28
- Create a new table with the specified size and cycle through the items in the original table.
- For each item, use the hash() value modulo the new table size to hash to a new index.
- Insert the item at the front of the linked list.
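
The resizing steps for a chained table might look like the sketch below. It assumes the table is a list of linked lists of strings and that String.hashCode() plays the role of hash(); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Rebuilds a chained hash table at a new size (illustrative sketch).
class TableResizer {
    static List<LinkedList<String>> rehash(List<LinkedList<String>> oldTable,
                                           int newSize) {
        // Create a new table with the specified size.
        List<LinkedList<String>> newTable = new ArrayList<>();
        for (int i = 0; i < newSize; i++) {
            newTable.add(new LinkedList<>());
        }
        // Cycle through the items in the original table.
        for (LinkedList<String> bucket : oldTable) {
            for (String item : bucket) {
                // Hash value modulo the NEW table size gives the new index.
                int index = (item.hashCode() & Integer.MAX_VALUE) % newSize;
                // Insert at the front of that cell's linked list.
                newTable.get(index).addFirst(item);
            }
        }
        return newTable;
    }
}
```

Every item has to be re-hashed because its index depends on the table size; copying buckets across unchanged would leave items where searches no longer look for them.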

ADSA: Hashing/10 29
8. Java's hashCode()
- public int hashCode() is defined in Object
  - by default it returns an identity-based value for the object (often described as its memory address, although this is implementation-dependent)
- hashCode() does not know the size of the hash table
  - the returned value must be adjusted, e.g. hashCode() % N
- hashCode() can be overridden in your classes

ADSA: Hashing/10 30
Coding your own hashCode()
Your hashCode() must:
- always return the same value for the same item
  - it can't use random numbers, or the time of day
- always return the same value for equal items
  - if o1.equals(o2) is true then hashCode() for o1 and o2 must be the same number
continued

ADSA: Hashing/10 31
A good hashCode() should:
- be fast to evaluate
- produce uniformly distributed hash values
  - this spreads the hash table indices around the table, which helps minimize collisions
- not assign similar hash values to similar items

ADSA: Hashing/10 32
String Hash Function
- In the majority of hash table applications, the key is a string.
  - combine the string's characters to form an integer

  // s is the string's array of characters
  public int hashCode()
  {
    int hash = 0;
    for (int i = 0; i < s.length; i++)
      hash = 31*hash + s[i];
    return hash;
  }
continued

ADSA: Hashing/10 33
  String strA = "and";
  String strB = "uncharacteristically";
  String strC = "algorithm";
  hashVal = strA.hashCode();   // hashVal =
  hashVal = strB.hashCode();   // hashVal =
  hashVal = strC.hashCode();   // hashVal =

- A hash function might overflow and return a negative number. The following code ensures that the table index is nonnegative.

  tableIndex = (hashVal & Integer.MAX_VALUE) % tableSize;
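
The two ideas on these slides (the 31-based string hash and the masking of negative values) can be tried out together. This is a small sketch with made-up class and method names; the table size of 13 in main is an arbitrary assumption.

```java
// Polynomial string hash plus a non-negative table index (illustrative).
class StringHashDemo {
    // Same scheme as the slide: hash = 31*hash + next character.
    // int overflow is allowed to wrap, so the result may be negative.
    static int polyHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);
        }
        return hash;
    }

    // Clears the sign bit before taking the remainder, so the index is
    // always in the range 0 .. tableSize - 1.
    static int tableIndex(int hashVal, int tableSize) {
        return (hashVal & Integer.MAX_VALUE) % tableSize;
    }

    public static void main(String[] args) {
        String[] words = {"and", "uncharacteristically", "algorithm"};
        for (String w : words) {
            int hashVal = polyHash(w);
            System.out.println(w + ": hash = " + hashVal
                    + ", index = " + tableIndex(hashVal, 13));
        }
    }
}
```

For "and" the hash works out to 97*31*31 + 110*31 + 100 = 96727, which matches the value String.hashCode() returns, since the JDK specifies the same formula for strings.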

ADSA: Hashing/10 34
Time24 Hash Function
- For the Time24 class, the hash value for an object is its time converted to minutes.
- Since each hour adds 60 minutes, and the minute value is between 0 and 59, each time of day has a unique hash value.

  public int hashCode()
  {
    return hour*60 + minute;
  }
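
To show how this hashCode() fits with the equals() rule from the earlier slide, here is a cut-down stand-in for the Time24 class. Only the hour and minute fields and the hashCode() body come from the slide; the constructor and equals() are assumptions written for this sketch and do not reproduce the original class.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for Time24 (illustrative; not the textbook class).
class Time24 {
    private final int hour;     // 0..23
    private final int minute;   // 0..59

    Time24(int hour, int minute) {
        this.hour = hour;
        this.minute = minute;
    }

    // Equal times must compare equal ...
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Time24)) return false;
        Time24 t = (Time24) other;
        return hour == t.hour && minute == t.minute;
    }

    // ... and equal times must return the same hash value.
    @Override
    public int hashCode() {
        return hour * 60 + minute;   // time converted to minutes
    }

    public static void main(String[] args) {
        Set<Time24> times = new HashSet<>();
        times.add(new Time24(9, 30));
        // A different object with the same time is still found.
        System.out.println(times.contains(new Time24(9, 30))); // prints true
    }
}
```

If equals() were overridden without hashCode(), the two 9:30 objects would usually land in different buckets and the lookup would fail.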

ADSA: Hashing/10 35
9. Hash Tables in Java
- Java provides HashSet, Hashtable and HashMap in java.util
  - HashSet is a set
  - Hashtable and HashMap are maps
continued

ADSA: Hashing/10 36
- Hashtable is synchronized; it can be accessed safely from multiple threads
  - Hashtable uses an open hash (in the JDK documentation's sense: colliding entries are stored in the same bucket), and has rehash() for resizing the table
- HashMap is newer, faster, and usually better, but it is not synchronized
  - HashMap uses a bucket hash, and has a remove() method

ADSA: Hashing/10 37
Hash Table Operations
- HashSet, Hashtable and HashMap have no-argument constructors, and constructors that take an integer table size (the initial capacity).
- HashSet has add(), contains(), remove(), iterator(), etc.
continued

ADSA: Hashing/10 38
Hashtable and HashMap include:
- public Object put(Object key, Object value)
  - returns the previous value for this key, or null
- public Object get(Object key)
- public void clear()
- public Set keySet()
  - dynamically reflects changes in the hash table
- many others

ADSA: Hashing/10 39
Using HashMap
A HashMap with Strings as keys and values (a telephone book):
  Key                   Value
  "Charles Nguyen"      "(531) "
  "Lisa Jones"          "(402) "
  "William H. Smith"    "(998) "

ADSA: Hashing/10 40
Coding a Map
  HashMap<String, String> phoneBook = new HashMap<String, String>();
  phoneBook.put("Charles Nguyen", "(531) ");
  phoneBook.put("Lisa Jones", "(402) ");
  phoneBook.put("William H. Smith", "(998) ");
  String phoneNumber = phoneBook.get("Lisa Jones");
  System.out.println( phoneNumber );
prints: (402)

ADSA: Hashing/10 41
  HashMap<String, String> h = new HashMap<String, String>(100,     /* capacity */
                                                          0.75f);  /* load factor */
  h.put( "WA", "Washington" );
  h.put( "NY", "New York" );
  h.put( "RI", "Rhode Island" );
  h.put( "BC", "British Columbia" );

ADSA: Hashing/10 42
Capacities and Load Factors
- HashMaps round capacities up to powers of two
  - e.g. 100 -> 128
  - default capacity is 16; load factor is 0.75
- The load factor is used to decide when it is time to double the size of the table
  - just after you have added 96 elements in this example
  - 128 * 0.75 == 96
- Hashtables work best with capacities that are prime numbers.
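
The arithmetic on this slide can be checked with a few lines of Java. This sketch only reproduces the calculation; it is not the JDK's internal code, and the class and method names are invented for the illustration. It assumes the requested capacity is between 1 and 2^30.

```java
// Reproduces the capacity and resize-threshold arithmetic (illustrative).
class CapacityMath {
    // Smallest power of two that is >= requested (1 <= requested <= 2^30).
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;          // double until large enough
        }
        return capacity;
    }

    // Number of entries at which the table is due to be doubled.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int capacity = roundUpToPowerOfTwo(100);
        System.out.println(capacity);                    // prints 128
        System.out.println(threshold(capacity, 0.75f));  // prints 96
        System.out.println(threshold(16, 0.75f));        // prints 12 (the defaults)
    }
}
```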

ADSA: Hashing/10 43
10. Efficiency
- Hash tables are efficient
  - until the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3
- The cost of insertion and access is O(1) on average
continued

ADSA: Hashing/10 44
- Even if the table is nearly full (leading to long searches), efficiency remains quite high.
- Hash tables work best when the table size (N) is a prime number.
- HashMaps use powers of 2 for N.
continued

ADSA: Hashing/10 45
- In the worst case, searches, insertions and removals on a hash table take O(n) time (n = no. of items)
  - the worst case occurs when all the keys inserted into the map collide
- The load factor λ = n/N affects the performance of a hash table (N = table size).
  - for linear probing, 0 ≤ λ ≤ 1
  - for chaining with lists, it is possible that λ > 1
continued

ADSA: Hashing/10 46
- Assume that the hash function uniformly distributes indices around the hash table.
  - we can expect λ = n/N elements in each cell
  - on average, an unsuccessful search makes λ comparisons before arriving at the end of a list and returning failure
  - mathematical analysis shows that the average number of probes for a successful search is approximately 1 + λ/2
  - so keep λ small!

ADSA: Hashing/10 47
11. Ordered/Unordered Sets and Maps
- Use an ordered set or map if an iteration should return items in order
  - average search time: O(log₂ n)
- Use an unordered set or map with hashing when fast access and updates are needed without caring about the ordering of items
  - average search time: O(1)

ADSA: Hashing/10 48
Timing Tests
SearchComp.java:
- read a file of randomly ordered words and insert each word into a TreeSet and a HashSet
- report the amount of time required to build both data structures
- shuffle the file input and time a search of the TreeSet and HashSet for each shuffled word
- report the total time required for both search techniques
continued
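
The original SearchComp.java is not reproduced in these slides. The sketch below follows the same four steps using the JDK's TreeSet and HashSet, with two assumptions: the words are generated randomly instead of being read from a file, and timing uses System.nanoTime(). All names are invented for this illustration, and the timings it prints vary from run to run and machine to machine.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

// Rough build/search timing comparison of TreeSet and HashSet (sketch).
class SearchTimingSketch {
    // Generates 8-letter lowercase words (stand-in for the word file).
    static List<String> randomWords(int count, long seed) {
        Random rnd = new Random(seed);
        List<String> words = new ArrayList<>();
        for (int w = 0; w < count; w++) {
            StringBuilder sb = new StringBuilder();
            for (int c = 0; c < 8; c++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
            words.add(sb.toString());
        }
        return words;
    }

    // Inserts every word and returns the elapsed time in nanoseconds.
    static long buildNanos(Set<String> set, List<String> words) {
        long start = System.nanoTime();
        for (String word : words) {
            set.add(word);
        }
        return System.nanoTime() - start;
    }

    // Looks up every word and returns the elapsed time in nanoseconds.
    static long searchNanos(Set<String> set, List<String> words) {
        long start = System.nanoTime();
        int found = 0;
        for (String word : words) {
            if (set.contains(word)) found++;
        }
        long elapsed = System.nanoTime() - start;
        if (found != words.size()) {
            throw new IllegalStateException("a word was not found");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> words = randomWords(100_000, 1L);
        Set<String> tree = new TreeSet<>();
        Set<String> hash = new HashSet<>();
        System.out.println("TreeSet build ns:  " + buildNanos(tree, words));
        System.out.println("HashSet build ns:  " + buildNanos(hash, words));
        Collections.shuffle(words, new Random(2L));
        System.out.println("TreeSet search ns: " + searchNanos(tree, words));
        System.out.println("HashSet search ns: " + searchNanos(hash, words));
    }
}
```

A single run like this is only indicative; JIT warm-up and garbage collection can easily dominate such short measurements.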

ADSA: Hashing/10 49
- The build and search times of Ford & Topp's HashSet are much better than those of their TreeSet.
continued

ADSA: Hashing/10 50
SearchJComp.java
- replace Ford & Topp's TreeSet and HashSet by the ones in the JDK
- the JDK HashSet and TreeSet run at much the same speed for searching