12 Hash-Table Data Structures  Hash-table principles  Searching  Insertion  Deletion  Hash-table design  Implementations of sets and maps using hash-tables.

Slides:



Advertisements
Similar presentations
CSE 1302 Lecture 23 Hashing and Hash Tables Richard Gesick.
Advertisements

CSCE 3400 Data Structures & Algorithm Analysis
Theory I Algorithm Design and Analysis (5 Hashing) Prof. Th. Ottmann.
Skip List & Hashing CSE, POSTECH.
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys. Operations.  get(theKey)  put(theKey, theElement)  remove(theKey) 5/2/20151.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Maps, Dictionaries, Hashtables
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
CS2110 Recitation Week 8. Hashing Hashing: An implementation of a set. It provides O(1) expected time for set operations Set operations Make the set empty.
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Hashing CS 105. Hashing Slide 2 Hashing - Introduction In a dictionary, if it can be arranged such that the key is also the index to the array that stores.
CS 202, Spring 2003 Fundamental Structures of Computer Science II Bilkent University1 Hashing CS 202 – Fundamental Structures of Computer Science II Bilkent.
Dictionaries Collection of pairs.  (key, element)  Pairs have different keys. Operations.  get(theKey)  put(theKey, theElement)  remove(theKey)
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
CS 61B Data Structures and Programming Methodology July 17, 2008 David Sun.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
10 Binary-Search-Tree Data Structure  Binary-trees and binary-search-trees  Searching  Insertion  Deletion  Traversal  Implementation of sets using.
The Map ADT and Hash Tables. 2 The Map ADT  Map: An abstract data type where a value is "mapped" to a unique key  Need a key and a value to insert new.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
12 Hash-Table Data Structures
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Chapter 11 Hash Anshuman Razdan Div of Computing Studies
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
Hashing CS 110: Data Structures and Algorithms First Semester,
11 Map ADTs  Map concepts  Map applications  A map ADT: requirements, contract.  Implementations of maps: using key-indexed arrays, entry arrays, linked-lists,
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
9-1 9 Set ADTs Set concepts. Set applications. A set ADT: requirements, contract. Implementations of sets: using arrays, linked lists, boolean arrays.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hash Tables From “Algorithms” (4 th Ed.) by R. Sedgewick and K. Wayne.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Sets and Maps Chapter 9.
11 Map ADTs Map concepts. Map applications.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Searching.
Hash functions Open addressing
Sets and Maps Chapter 9.
CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT
Presentation transcript:

12 Hash-Table Data Structures  Hash-table principles  Searching  Insertion  Deletion  Hash-table design  Implementations of sets and maps using hash-tables © 2008 David A Watt, University of Glasgow Algorithms & Data Structures (M)

12-2 Hash-table principles (1)  If a map’s keys are small integers, we can represent the map by a key-indexed array. Search, insertion, and deletion are then O(1).  Surprisingly, we can approach this performance with keys of any type!  Idea: Translate each key to a small integer, and then use that integer to index an array. This is called hashing.  A hash-table is an array of m buckets, together with a hash function hash(k) that translates each key k to a bucket index (in range 0…m–1).

12-3 Hash-table principles (2)  Illustration: keyvalue k1k1 v1v1 k2k2 v2v2 k3k3 v3v3 k4k4 v4v4 …… knkn vnvn m–1 m–2 hash-table (array of m buckets) hashing (translating keys to bucket indices) collision

12-4 Hash-table principles (3)  Each key k has a home bucket in the hash- table, namely the bucket with index hash(k).  To insert a new entry with key k into the hash- table, assign that entry to k ’ s home bucket. –If k ’ s home bucket is already occupied, we have a collision. This must be resolved somehow.  To search for an entry with key k in the hash- table, look in k ’ s home bucket.  To delete an entry with key k from the hash- table, look in k ’ s home bucket.

12-5 Hash-table principles (4)  The hash function must be consistent: k 1 = k 2 implies that hash(k 1 ) = hash(k 2 ).  In general, the hash function is many-to-one.  Therefore different keys may share the same home bucket: k 1  k 2 but hash(k 1 ) = hash(k 2 ). If this actually happens, it is called a collision.  Always prefer a hash function that is likely to make collisions infrequent.

12-6 Example: a hash function for words  Suppose that the keys are English words.  A possible hash function: m= 26 hash(k)= (initial letter of k) – ‘ A ’  All words with initial letter ‘ A ’ share bucket 0; … ; all words with initial letter ‘ Z ’ share bucket 25.  For illustrative purposes, this is a convenient choice.  For practical purposes, this is a poor choice: collisions are likely to be frequent in some buckets.

12-7 Hashing in Java (1)  Instance method in class Object : public int hashCode (); // Translate this object to an integer, such that // x.equals(y) implies that // x.hashcode() == y.hashcode().  Note that hashCode() is consistent.  We can use hashCode() to implement a hash function for a hash-table with m buckets: static int hash (Object k) { return Math.abs(k.hashCode()) % m ; } Math.abs returns a nonnegative integer. Modulo-m arithmetic then gives an integer in the range 0…m–1.

12-8 Hashing in Java (2)  Each subclass of Object should override hashCode.  Examples: Class of k Result of k.hashCode() String weighted sum of characters of k Integer integer value of k Date (high 32 bits of k) exclusive-or (low 32 bits of k) Note: a date k is represented by a long integer, the number of milliseconds since

12-9 Closed buckets vs open buckets  In an open-bucket hash-table: –each bucket may be occupied by at most one entry –whenever there is a collision, displace the new entry to an unoccupied bucket.  In a closed-bucket hash-table (CBHT): –each bucket may be occupied by several entries –buckets are completely separate. not covered here: see Watt & Brown

12-10 Closed-bucket hash-tables (1)  Simplest implementation of a CBHT: each bucket is an SLL.  In the following illustrations, the keys are names of chemical elements. For illustrative purposes, assume: m= 26 hash(k)= (initial letter of k) – ‘ A ’

12-11 Closed-bucket hash-tables (2)  Illustration (with no collisions): elementnumber F9 Ne10 Cl17 Ar18 Br35 Kr36 I53 Xe54 is represented by Xe54 F9F9 Ne10 Kr36 I53 Ar Cl17 Br35 13

12-12 Closed-bucket hash-tables (3)  Illustration (with collisions): elementnumber H1 He2 Li3 Be4 Na11 Mg12 K19 Ca20 Rb37 Sr38 Cs55 Ba56 is represented by Sr38 K19 Na11 Mg12 Li3 Ba56 Ca20 Be4 H1H1He2 Rb37 Cs

12-13 Closed-bucket hash-tables (4)  Java class implementing CBHTs: public class CBHT { private CBHT.Node[] buckets; public CBHT (int m) { buckets = new Node[m]; } … private int hash (Object key) { return Math.abs(key.hashCode()) % buckets.length; }

12-14 Closed-bucket hash-tables (5)  Java class (continued): //////// Inner class //////// private static class Node { // Each CBHT.Node object is a node of a CBHT. private Object key, value; private Node succ; private Node (Object key, Object val, Node succ) { … }}

12-15 CBHT search (1)  CBHT search algorithm: To find which if any node of a CBHT contains an entry whose key is equal to target-key: 1.Set b to hash(target-key). 2.Find which if any node of the SLL of bucket b contains an entry whose key is equal to target-key, and terminate with that node as answer.  Step 2 is SLL linear search.

12-16 CBHT search (2)  Implementation (in class CBHT ): public CBHT.Node search (Object targetKey) { int b = hash(targetKey); CBHT.Node curr = buckets[b]; while (curr != null) { if (targetKey.equals(curr.key)) return curr; curr = curr.succ; } return null; }

12-17 CBHT insertion (1)  CBHT insertion algorithm: To insert the entry (key, val) into a CBHT: 1.Set b to hash(key). 2.Insert the entry (key, val) into the SLL of bucket b, replacing any existing entry whose key is key. 3.Terminate.

12-18 CBHT insertion (2)  Implementation (in class CBHT ): public void insert (Object key, Object val) { int b = hash(key); CBHT.Node curr = buckets[b]; while (curr != null) { if (key.equals(curr.key)) { curr.value = val; return; } curr = curr.succ; } buckets[b] = new Node(key, val, buckets[b]); }

12-19 CBHT deletion (1)  CBHT deletion algorithm: To delete the entry (if any) whose key is equal to key from a CBHT: 1.Set b to hash(key). 2.Delete the entry (if any) whose key is equal to key from the SLL of bucket b. 3.Terminate.

12-20 CBHT deletion (2)  Implementation (in class CBHT ): public void delete (Object key) { int b = hash(key); CBHT.Node pred = null, curr = buckets[b]; while (curr != null) { if (key.equals(curr.key)) { if (pred == null) buckets[b] = curr.succ; else pred.succ = curr.succ; return; } pred = curr; curr = curr.succ; } }

12-21 CBHTs: analysis  Analysis of CBHT search, insertion, and deletion algorithms (counting comparisons): Let the number of entries be n.  In the best case, no bucket contains more than (say) 2 entries: Max. no. of comparisons = 2 Best-case time complexity is O(1).  In the worst case, one bucket contains all n entries: Max. no. of comparisons = n Worst-case time complexity is O(n).

12-22 CBHTs: design  CBHT design consists of: –choosing the number of buckets m –choosing the hash function hash.  Design aims: –collisions should be infrequent –entries should be distributed evenly among the buckets, such that few buckets contain more than (say) 2 entries.

12-23 CBHTs: choosing the number of buckets  The load factor of a hash-table is the average number of entries per bucket, n/m.  If n is (roughly) predictable, choose m such that the load factor is likely to stay in range 0.5 … –A load factor < 0.5 wastes space. –A load factor > 0.75 tends to result in some buckets having many entries.  Where possible, choose m to be a prime number. –Typically the hash function performs modulo-m arithmetic. If m is prime, the entries are more likely to be distributed evenly over the buckets, regardless of any pattern in the keys.

12-24 CBHTs: choosing the hash function  The hash function should be efficient (performing few arithmetic operations).  The hash function should distribute the entries evenly among the buckets, regardless of any patterns in the keys.  Possible compromise: –Speed up the hash function by using only part of the key. –But beware of any patterns in that part of the key.

12-25 Example: hash-table for words (1)  Suppose that a hash-table will contain about 1,000 common English words.  Known patterns in the keys: –Letters vary in frequency (A, E, I, N, S, T are common; Q, X, Z are uncommon). –Word lengths vary in frequency (lengths of 4 – 8 are common; other lengths are less common).

12-26 Example: hash-table for words (2)  hash(k) can depend on any of k ’ s letters and/or length.  Consider m = 20, hash(k) = length of k – 1. – Far too few buckets. Load factor = 1000/20 = 50. – Very uneven distribution.  Consider m = 26, hash(k) = initial letter of k – ‘ A ’. – Far too few buckets. Load factor = 1000/26  38. – Very uneven distribution.

12-27 Example: hash-table for words (3)  Consider m = 520, hash(k) = 26  (length of k – 1) + (1 st letter of k – ‘ A ’ ). – Too few buckets. Load factor = 1000/520  1.9. – Very uneven distribution. Words of length 4 are far more common than words of length 2, so buckets 78 – 103 will be far more densely populated than buckets 52 – 77. Words with 1 st letter ‘ A ’ are far more common than words with 1 st letter ‘ Z ’, so buckets 0, 26, 52, … will be far more densely populated than buckets 25, 51, 77, … And so on.

12-28 Example: hash-table for words (4)  Consider m = 1499, hash(k) = (weighted sum of letters of k) modulo m i.e., (c 1  1 st letter + c 2  2 nd letter + … ) modulo m +Good number of buckets. Load factor = 1000/1499  Reasonably even distribution.

12-29 Hash-tables in practice (1)  The following trials illustrate the performance of a CBHT with m buckets.  The hash function was chosen to distribute keys uniformly among the m buckets (i.e., the probability that a key is mapped to any particular bucket was 1/m).  In each trial, the CBHT was loaded with a set of n randomly-generated keys.

12-30 Hash-tables in practice (2)  Trials with m = 47: CBHT n = 23 (load factor  0.5) Few buckets have more than 1 entry. Few buckets have more than 2 entries. CBHT n = 36 (load factor  0.75)

12-31 Example: student records (1)  Consider a hypothetical university ’ s student records. About 1,500 new students register each year, and most students stay for 4 years.  Each student has a unique student-number of the form yydddd –yy are the last two digits of the year of first registration –dddd is a serial number.  Suppose that the student records will be held in a hash-table.  hash(k) can depend on any or all of the digits of the student-number k.

12-32 Example: student records (2)  Consider m = 100, hash(k) = first 2 digits of k. – Far too few buckets. Load factor  60. – Very uneven distribution. E.g., in academic year 2012 – 13, most student-numbers start with 09, 10, 11, or 12.  Consider m = 10,000, hash(k) = last 4 digits of k. +Good number of buckets. Load factor  0.6. – Uneven distribution. Most student-numbers end with 0000 …  Consider m = 9,997, hash(k) = k modulo m. +Good number of buckets. Load factor  Even distribution (since m is prime).

12-33 Implementation of maps using CBHTs  Summary of algorithms: OperationAlgorithmTime complexity bestworst get CBHT searchO(1)O(n) remove CBHT deletionO(1)O(n) put CBHT insertionO(1)O(n) putAll merge on corresponding buckets of both CBHTs O(m)O(n'n) equals equality test on corresponding buckets of both CBHTs O(m)O(n'n)

12-34 Implementation of sets using CBHTs  Summary of algorithms: OperationAlgorithmTime complexity bestworst contains CBHT search O(1)O(n) add CBHT insertion O(1)O(n) remove CBHT deletion O(1)O(n) equals equality test on corresponding buckets of both CBHTs O(m)O(n'n) containsAll subsumption test on corresponding buckets of both CBHTs O(m)O(n'n) addAll merge on corresponding buckets of both CBHTs O(m)O(n'n) removeAll variant of merge on corresponding buckets of both CBHTs O(m)O(n'n) retainAll variant of merge on corresponding buckets of both CBHTs O(m)O(n'n)