
The Map ADT and Hash Tables

2 The Map ADT  Map: An abstract data type where a value is "mapped" to a unique key  Need a key and a value to insert new mappings  Only need the key to find mappings  Only need the key to remove mappings

3 Key and Value
- With Java generics, you need to specify
  - the type of key
  - the type of value
- Here the key type is String and the value type is BankAccount

    Map<String, BankAccount> accounts = new HashMap<String, BankAccount>();

4 Put and get
- Add new mappings (a key mapped to a value):

    Map<String, BankAccount> accounts = new HashMap<String, BankAccount>();
    accounts.put("M", new BankAccount("Michel", 111.11));
    accounts.put("G", new BankAccount("Georgie", ));
    accounts.put("R", new BankAccount("Daniel", ));
    BankAccount current = accounts.get("M");
    assertEquals(111.11, current.getBalance(), 0.001);
    assertEquals("Michel", current.getID());
    current = accounts.get("R");
    // What is current.getID()? _______________
    // What is current.getBalance()? __________

5 keys must be unique
- put returns the replaced value if the key already existed
  - In this case, the same key is now mapped to a new value
- put returns null if the key did not exist

    Map<Integer, String> ranking = new HashMap<Integer, String>();
    assertNull(ranking.put(50, "Kim"));
    assertNull(ranking.put(25, "Li"));
    // The key 25 is already in the map
    assertNotNull(ranking.put(25, "Any Name"));

6 remove
- remove will return false if the key is not found
- remove will return true if the mapping (the key-value pair) was successfully removed from the collection

    assertTrue(accounts.remove("G"));
    assertFalse(accounts.remove("Not Here"));

7 get returns null
- get will return null if the key is not found

    assertNotNull(accounts.get("M"));
    assertTrue(accounts.remove("M"));
    assertNull(accounts.get("M"));

8 Generic
- Can have different types of keys and values
- However, keys must implement Comparable and override equals (use Integer and String for key type); a hash table also relies on hashCode being consistent with equals

    Map<Integer, String> ranking = new HashMap<Integer, String>();
    ranking.put(1, "Kim");
    ranking.put(2, "Li");
    ranking.put(3, "Sandeep");
    assertEquals("Kim", ranking.get(1));
    assertEquals("Li", ranking.get(2));
    assertEquals("Sandeep", ranking.get(3));
    assertNull(ranking.get(4));
    assertNotNull(ranking.get(1));
    assertTrue(ranking.remove(1));
    assertNull(ranking.get(1));
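For comparison, here is a minimal sketch of the same operations against the standard java.util.Map. The assertTrue(accounts.remove(...)) calls above suggest the slides use a course-specific Map type whose remove returns a boolean; that is an assumption, since the type is not shown. In java.util.Map, remove returns the removed value (or null) instead. The class name MapBasics and the extra entry "Ana" are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the slides.
final class MapBasics {
    // Builds the ranking map from the slide using java.util.HashMap.
    static Map<Integer, String> buildRanking() {
        Map<Integer, String> ranking = new HashMap<>();
        ranking.put(1, "Kim");
        ranking.put(2, "Li");
        ranking.put(3, "Sandeep");
        return ranking;
    }

    public static void main(String[] args) {
        Map<Integer, String> ranking = buildRanking();
        // put returns the previous value for an existing key, or null for a new key.
        System.out.println(ranking.put(2, "Any Name")); // Li
        System.out.println(ranking.put(4, "Ana"));      // null
        // java.util.Map.remove returns the removed value, or null if the key was absent.
        System.out.println(ranking.remove(1));          // Kim
        System.out.println(ranking.remove(99));         // null
        // get returns null once the mapping is gone.
        System.out.println(ranking.get(1));             // null
    }
}
```

With java.util.Map, a test that needs a boolean can compare the result of remove against null.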

9 Which data structure?  What data structures could we use to implement Map?  ________, _________, _______, _________  We will use …

Hash Tables
A "fast" implementation for Map ADTs
- Outline
  - What is a hash function?
    - translation of a string key into an integer
  - Consider a few strategies for implementing a hash table
    - linear probing
    - quadratic probing
    - separate chaining hashing

Big O using different data structures for a Map ADT?

    Data Structure          put     get     remove
    Unsorted Array          ____    ____    ____
    Sorted Array            ____    ____    ____
    Unsorted Linked List    ____    ____    ____
    Sorted Linked List      ____    ____    ____
    Binary Search Tree      ____    ____    ____

Hash Tables  Hash table: another data structure  Provides virtually direct access to objects based on a key (a unique String or Integer)  key could be your SID, your telephone number, social security number, account number, …  Must have unique keys  Each key is associated with–mapped to–a value

Hashing  Must convert keys such as " " into an integer index from 0 to some reasonable size  Elements can be found, inserted, and removed using the integer index as an array index  Insert (called put), find (get), and remove must use the same "address calculator"  which we call the Hash function

Hashing
- Can make String or Integer keys into integer indexes by "hashing"
  - Need to take hashCode % array size
  - Turn “S ” into an int in 0..students.length-1
- Ideally, every key has a unique hash
  - Then the hash value could be used as an array index
- However, the ideal is impossible: some keys will "hash" to the same integer index, known as a collision
  - Need a way to handle collisions!
  - "abc" may hash to the same integer as "cba"
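The "hashCode % array size" step can be sketched as follows. This is a minimal illustration, not the course's code: the class name HashIndex and the method indexFor are hypothetical. It uses Math.floorMod rather than % because hashCode() may be negative, in which case % would produce a negative index.

```java
// Hypothetical helper; the names are not from the slides.
final class HashIndex {
    // Maps any non-null key to an index in 0..tableSize-1.
    static int indexFor(Object key, int tableSize) {
        // floorMod always returns a value in 0..tableSize-1 for a positive tableSize.
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("abba", 309)); // 246
        System.out.println(indexFor("baab", 309)); // 0
        // Integer.hashCode() is the int itself, so this key has a negative hash code.
        System.out.println(indexFor(-7, 5));       // 3
    }
}
```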

Hash Tables: Runtime Efficient
- Average lookup time does not grow as n increases, provided the hash function spreads keys well and the table is not allowed to fill up
- A hash table supports, on average,
  - fast insertion O(1)
  - fast retrieval O(1)
  - fast removal O(1)
- Could use String keys: each ASCII character equals some unique integer
  - "able" = 97 + 98 + 108 + 101 == 404

Hash method works something like…
- Convert a String key into an integer that will be in the range of 0 through the maximum capacity-1
- Assume the array capacity is 9997
[Figure: 8-character string keys such as "AAAAAAAA" and "zzzzzzzz" are passed through hash(key); the diagram shows 8482 and 1273 as example results in that range.]

Hash method
- What if the ASCII values of the individual chars of the string key were added up? With 4 chars max, the sum ranges from 65 ("A") to possibly 488 ("zzzz")
- If the array has size = 309, mod the sum
- These array indices index these keys:

    key     sum    sum % TABLE_SIZE
    abba    390    81
    abcd    394    85
    able    404    95

A too simple hash

    public void testHash() {
        assertEquals(81, hash("abba"));
        assertEquals(81, hash("baab"));
        assertEquals(85, hash("abcd"));
        assertEquals(86, hash("abce"));
        assertEquals(308, hash("IKLT"));
        assertEquals(308, hash("KLMP"));
    }

    private final int TABLE_SIZE = 309;

    public int hash(String key) {
        // return an int in the range of 0..TABLE_SIZE-1
        int result = 0;
        int n = key.length();
        for (int j = 0; j < n; j++)
            result += key.charAt(j); // add up the chars
        return result % TABLE_SIZE;
    }

Collisions  A good hash method  executes quickly  distributes keys equitably  But you still have to handle collisions when two keys have the same hash value  the hash method is not guaranteed to return a unique integer for each key  example: simple hash method with "baab" and "abba"  There are several ways to handle collisions  let us first examine linear probing

Linear Probing Dealing with Collisions  Collision : When an element to be inserted hashes out to be stored in an array position that is already occupied.  Linear Probing : search sequentially for an unoccupied position  uses a wraparound (circular) array

A hash table after three insertions using the too simple (lousy) hash method
Insert objects with these three keys: "abba", "abcd", "abce"

    index   key
    [81]    "abba"
    ...
    [85]    "abcd"
    [86]    "abce"

Collision occurs while inserting "baab"
- Can't insert "baab" at [81], where it hashes to the same slot as "abba"
- Linear probe forward by 1, inserting it at the next available slot: try [81], put in [82]

    index   key
    [81]    "abba"
    [82]    "baab"
    ...
    [85]    "abcd"
    [86]    "abce"

Wrap around when collision occurs at end
- Insert "KLMP" and "IKLT", both of which have a hash value of 308
- The first one inserted takes [308], the last slot; the second collides there and wraps around to [0]

Find object with key "baab"
- "baab" still hashes to 81, but since [81] holds a different key ("abba"), linear probe to [82]
- At this point, you could return a reference or remove "baab"

HashMap put with linear probing

    public class HashTable<Key, Value> {

        private class HashTableNode {
            private Key key;
            private Value value;
            private boolean active;
            private boolean tombstoned; // Allows reuse

            public HashTableNode() {
                // All nodes in array will begin initialized this way
                key = null;
                value = null;
                active = false;
                tombstoned = false;
            }

            public HashTableNode(Key initKey, Value initData) {
                key = initKey;
                value = initData;
                active = true;
                tombstoned = false;
            }
        }
        // class HashTable continues on the next slide

Constructor and beginning of put

    private final static int TABLE_SIZE = 9;
    private Object[] table;

    public HashTable() {
        // Since HashTableNode uses the generic types, we cannot create
        // a new HashTableNode[], so use Object[]
        table = new Object[TABLE_SIZE];
        for (int j = 0; j < TABLE_SIZE; j++)
            table[j] = new HashTableNode();
    }

    public Value put(Key key, Value value) // TBA

put  Four possible states when looking at slots  the slot was never occupied, a new mapping  the slot is occupied and the key equals argument  will wipe out old value  the slot is occupied and key is not equal  proceed to next  the slot was occupied, but nothing there now removed  We could call this a tombStoned slot  It can be reused
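The four slot states can be sketched in a small linear-probing table. This is a minimal illustration under stated assumptions, not the course's HashTable class: the name ProbingMap is hypothetical, keys are assumed non-null, the capacity is fixed, and a shared TOMBSTONE marker object stands in for the tombstoned flag.

```java
// A minimal sketch; ProbingMap is a hypothetical name, not the slides' class.
final class ProbingMap<K, V> {
    private static final Object TOMBSTONE = new Object(); // marks a removed slot
    private final Object[] keys;   // null means "never occupied"
    private final Object[] values;

    ProbingMap(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int home(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Returns the slot that holds key, or -1 if the key is absent.
    private int find(Object key) {
        int start = home(key);
        for (int step = 0; step < keys.length; step++) {
            int i = (start + step) % keys.length;       // wrap around
            if (keys[i] == null) return -1;              // never occupied: stop
            if (keys[i] != TOMBSTONE && keys[i].equals(key)) return i;
            // occupied by another key, or tombstoned: proceed to next slot
        }
        return -1;
    }

    @SuppressWarnings("unchecked")
    V get(K key) {
        int i = find(key);
        return i < 0 ? null : (V) values[i];
    }

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        int i = find(key);
        if (i >= 0) {                                    // key present: replace value
            V old = (V) values[i];
            values[i] = value;
            return old;
        }
        int start = home(key);
        for (int step = 0; step < keys.length; step++) {
            int j = (start + step) % keys.length;
            if (keys[j] == null || keys[j] == TOMBSTONE) { // free or reusable slot
                keys[j] = key;
                values[j] = value;
                return null;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean remove(K key) {
        int i = find(key);
        if (i < 0) return false;
        keys[i] = TOMBSTONE;                             // keep the probe chain intact
        values[i] = null;
        return true;
    }
}
```

A removed slot is marked rather than cleared so that a later get for a key stored further along the same probe sequence does not stop early.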

A better hash function
- This is the actual hashCode() algorithm of java.lang.String (Integer's is, well, the int)

    s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

- Using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)

An implementation

    private static int TABLE_SIZE = 309;

    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
    // With "abba", index will be 246.
    // With "baab", index will be 0 (no collision).
    public int hashCode(String s) {
        int sum = 0; // stays 0 for the empty string
        for (int i = 0; i < s.length(); i++) {
            // Horner's rule in int arithmetic, so overflow wraps as in String.hashCode()
            sum = 31 * sum + s.charAt(i);
        }
        // floorMod keeps the index in 0..TABLE_SIZE-1 even when sum is negative
        return Math.floorMod(sum, TABLE_SIZE);
    }

Array based implementation has Clustering Problem
- Used slots tend to cluster with linear probing

Quadratic Probing
- Quadratic probing eliminates the primary clustering problem
- Assume hVal is the value of the hash function
- Instead of linear probing, which searches for an open slot in a linear fashion like this:
    hVal + 1, hVal + 2, hVal + 3, hVal + 4, ...
- add offsets that grow as successive squares:
    hVal + 1^2, hVal + 2^2, hVal + 3^2, hVal + 4^2, ...

Does it work?
- Quadratic probing works well if
  1) table size is prime
     - studies show the prime numbered table size removes some of the non-randomness of hash functions
  2) table is never more than half full
     - probes 1, 4, 9, 16, 25, 36, 49, ... slots away
- So make your table twice as big as you need
- insert, find, remove are O(1) on average
- A space (memory) tradeoff:
  - 4*n additional bytes required for unused array locations, assuming 4-byte references
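The "prime size, at most half full" rule can be checked with a short experiment. This is a sketch that assumes the standard quadratic offsets i*i and a non-negative hVal; the class QuadraticProbe and its methods are hypothetical names. For a prime table size p, the first (p+1)/2 probes land on distinct slots, and further probes only revisit them.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of the slides.
final class QuadraticProbe {
    // Index of the i-th probe (i = 0 is the home slot) using the offset i*i.
    static int probe(int hVal, int i, int tableSize) {
        return (int) ((hVal + (long) i * i) % tableSize);
    }

    // Counts how many distinct slots the first `probes` probes visit.
    static int distinctSlots(int hVal, int probes, int tableSize) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < probes; i++) {
            seen.add(probe(hVal, i, tableSize));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Table size 11 (prime), home slot 5.
        System.out.println(distinctSlots(5, 6, 11));  // 6: first (11+1)/2 probes are distinct
        System.out.println(distinctSlots(5, 11, 11)); // 6: later probes revisit the same slots
    }
}
```

Because only about half the slots are reachable from a given home slot, an insertion is guaranteed to find a free slot only while the table is at most half full.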

Separate Chaining  Separate Chaining is an alternative to probing  How? Maintain an array of lists  Hash to the same place always and insert at the beginning (or end) of the linked list.  The list needs add and remove methods

Array of LinkedLists Data Structure
- Each array element is a List
[Figure: an array of linked lists holding the keys “AB” and “BA”; the index label 9 is visible.]
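An array of linked lists can be sketched as below. This is an illustrative outline rather than the course's MyHashTable: the names ChainedMap and Entry are hypothetical, keys are assumed non-null, and the table is never resized.

```java
import java.util.LinkedList;

// A minimal separate-chaining sketch; names are hypothetical.
final class ChainedMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry<K, V>>[] buckets; // one list per array index

    @SuppressWarnings({"unchecked", "rawtypes"})
    ChainedMap(int size) {
        buckets = new LinkedList[size]; // generic arrays cannot be created directly
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    // Always hash to the same list for a given key.
    private LinkedList<Entry<K, V>> bucketFor(Object key) {
        return buckets[Math.floorMod(key.hashCode(), buckets.length)];
    }

    V put(K key, V value) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) {          // key exists: replace, return old value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        bucketFor(key).addFirst(new Entry<>(key, value)); // insert at the beginning
        return null;
    }

    V get(K key) {
        for (Entry<K, V> e : bucketFor(key)) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    boolean remove(K key) {
        // removeIf reports whether any entry was removed.
        return bucketFor(key).removeIf(e -> e.key.equals(key));
    }
}
```

Colliding keys simply share a list, so no probing or tombstones are needed; the cost is the extra list traversal within a bucket.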

Insert Six

    public void testPutAndGet() {
        MyHashTable<String, BankAccount> h = new MyHashTable<String, BankAccount>();
        BankAccount a1 = new BankAccount("abba", 100.00);
        BankAccount a2 = new BankAccount("abcd", 200.00);
        BankAccount a3 = new BankAccount("abce", 300.00);
        BankAccount a4 = new BankAccount("baab", 400.00);
        BankAccount a5 = new BankAccount("KLMP", 500.00);
        BankAccount a6 = new BankAccount("IKLT", 600.00);
        // Insert BankAccount objects using ID as the key
        h.put(a1.getID(), a1);
        h.put(a2.getID(), a2);
        h.put(a3.getID(), a3);
        h.put(a4.getID(), a4);
        h.put(a5.getID(), a5);
        h.put(a6.getID(), a6);
        System.out.println(h.toString());
    }

Lousy hash function and TABLE_SIZE==11

    0. [IKLT=IKLT $600.00, KLMP=KLMP $500.00]
    1. []
    2. []
    3. []
    4. []
    5. [baab=baab $400.00, abba=abba $100.00]
    6. []
    7. []
    8. []
    9. [abcd=abcd $200.00]
    10. [abce=abce $300.00]

With Java’s better hash method, collisions still happen

    0. [IKLT=IKLT $600.00]
    1. [abba=abba $100.00]
    2. [abcd=abcd $200.00]
    3. [baab=baab $400.00, abce=abce $300.00]
    4. [KLMP=KLMP $500.00]
    5. []
    6. []
    7. []
    8. []
    9. []
    10. []
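The two bucket listings can be reproduced by computing each key's index under both hash methods with a table size of 11. This is a small check, not the course's code; the class BucketCompare and its method names are made up for illustration.

```java
// Hypothetical demo class; not part of the slides.
final class BucketCompare {
    // The "too simple" hash: sum of the character codes, mod the table size.
    static int sumHash(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) sum += key.charAt(i);
        return sum % tableSize;
    }

    // Index derived from java.lang.String.hashCode().
    static int javaHash(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        String[] keys = {"abba", "abcd", "abce", "baab", "KLMP", "IKLT"};
        for (String k : keys) {
            System.out.println(k + ": sum -> " + sumHash(k, 11)
                    + ", hashCode -> " + javaHash(k, 11));
        }
    }
}
```

Under the sum hash, "abba" and "baab" share index 5 and "KLMP" and "IKLT" share index 0; under String.hashCode() the only collision among these six keys is "baab" with "abce" at index 3.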

Experiment Rick v. Java
- Rick's linear probing implementation, Array size was 75,007
    Time to construct an empty hashtable: seconds
    Time to build table of entries: 0.65 seconds
    Time to lookup each table entry once: 0.19 seconds
- 8000 arrays of Linked lists
    Time to construct an empty hashtable: 0.04 seconds
    Time to build table of entries: seconds
    Time to lookup each table entry once: seconds
- Java's HashMap
    Time to construct an empty hashtable: 0.0 seconds
    Time to build table of entries: seconds
    Time to lookup each table entry once: 0.11 seconds

Runtimes?  What are the runtimes in big-O for the linear probing of an array for method  get __________  put ____________  remove _____________

Hash Table Summary  Hashing involves transforming data to produce an integer in a fixed range ( 0..TABLE_SIZE-1 )  The function that transforms the key into an array index is known as the hash function  When two data values produce the same hash value, you get a collision—it happens!  Collision resolution may be done by searching for the next open slot at or after the position given by the hash function, wrapping around to the front of the table when you run off the end (known as linear probing)

Hash Table Summary
- Another common collision resolution technique is to store the table as an array of linked lists and to keep at each array index the list of values that yield that hash value (known as separate chaining)
- Most often the data stored in a hash table includes both a key field and a data field (e.g., social security number and student information).
- The key field determines where to store the value.
- A lookup on that key will then return the value associated with that key (if it is mapped in the table)