Hash Tables From “Algorithms” (4 th Ed.) by R. Sedgewick and K. Wayne.

Slides:



Advertisements
Similar presentations
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Advertisements

Hashing as a Dictionary Implementation
CS 171: Introduction to Computer Science II Hashing and Priority Queues.
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
1 Hash Tables Gordon College CS Hash Tables Recall order of magnitude of searches –Linear search O(n) –Binary search O(log 2 n) –Balanced binary.
Lecture 10 Sept 29 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
Hash Tables and Associative Containers CS-212 Dick Steflik.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
CSE 373 Data Structures and Algorithms Lecture 18: Hashing III.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashing The Magic Container. Interface Main methods: –Void Put(Object) –Object Get(Object) … returns null if not i –… Remove(Object) Goal: methods are.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
COSC 2007 Data Structures II
(c) University of Washingtonhashing-1 CSC 143 Java Hashing Set Implementation via Hashing.
1 Hash Tables  a hash table is an array of size Tsize  has index positions 0.. Tsize-1  two types of hash tables  open hash table  array element type.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
IS 2610: Data Structures Searching March 29, 2004.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
90-723: Data Structures and Algorithms for Information Processing Copyright © 1999, Carnegie Mellon. All Rights Reserved. 1 Lecture 9: Searching Data Structures.
CS 61B Data Structures and Programming Methodology July 17, 2008 David Sun.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
1 Introduction to Hashing - Hash Functions Sections 5.1, 5.2, and 5.6.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Author: Takdir, S.ST. © Sekolah Tinggi Ilmu Statistik.
“Never doubt that a small group of thoughtful, committed people can change the world. Indeed, it is the only thing that ever has.” – Margaret Meade Thought.
Hashing Fundamental Data Structures and Algorithms Margaret Reid-Miller 18 January 2005.
Symbol Tables From “Algorithms” (4 th Ed.) by R. Sedgewick and K. Wayne.
A Introduction to Computing II Lecture 11: Hashtables Fall Session 2000.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
COSC 1030 Lecture 10 Hash Table. Topics Table Hash Concept Hash Function Resolve collision Complexity Analysis.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Week 9 - Monday.  What did we talk about last time?  Practiced with red-black trees  AVL trees  Balanced add.
Hashing O(1) data access (almost) -access, insertion, deletion, updating in constant time (on average) but at a price… references: Weiss, Goodrich & Tamassia,
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
1 Data Structures CSCI 132, Spring 2014 Lecture 33 Hash Tables.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Java Methods A & AB Object-Oriented Programming and Data Structures Maria Litvin ● Gary Litvin Copyright © 2006 by Maria Litvin, Gary Litvin, and Skylight.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
1 the BSTree class  BSTreeNode has same structure as binary tree nodes  elements stored in a BSTree are a key- value pair  must be a class (or a struct)
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Data Structures and Algorithms I
Data Abstraction & Problem Solving with C++
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures and Algorithms I
Presentation transcript:

Hash Tables From “Algorithms” (4 th Ed.) by R. Sedgewick and K. Wayne

implementation NNNN/2N noequals() lg NNN N/2 yescompareTo() ST implementations: summary worst-case cost average-case cost (after N inserts) (after N random inserts) ordered key iteration? interface search insert delete search hit insert delete sequential search (unordered list) binary search (ordered array) BST N N N 1.38 lg N 1.38 lg N ? yes compareTo() splay tree lg N ** lg N ** lg N ** 1.00 lg N 1.00 lg N 1.00 lg N yes compareTo() ** amortized O(log N)

・ Equality test: Method for checking whether two keys are equal. ・ No time limitation: trivial collision resolution with sequential search. Hashing: basic plan Save items in a key-indexed table (index is a function of the key). Hash function. Method for computing array index from key. 0 1 hash("it") = "it" ?? 4 Issues. hash("times") = 3 5 ・ Computing the hash function. ・ Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index. Classic space-time tradeoff. ・ No space limitation: trivial hash function with key as index. ・ Space and time limitations: hashing (the real world).

・ Each table index equally likelyforeachkey. table ・ Better: last three digits. (assigned in chronological order within geographic region) ・ Better: last three digits. Computing the hash function Idealistic goal. Scramble the keys uniformly to produce a table index. ・ Efficiently computable. key thoroughly researched problem, still problematic in practical applications Ex 1. Phone numbers. ・ Bad: first three digits. index Ex 2. Social Security numbers. ・ Bad: first three digits. 573 = California, 574 = Alaska Practical challenge. Need different approach for each key type.

Java’s hash code conventions All Java classes inherit a method hashCode(), which returns a 32-bit int. Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()). Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()). x y x.hashCode() y.hashCode() Default implementation. Memory address of x. Legal (but poor) implementation. Always return 17. Customized implementations. Integer, Double, String, File, URL, Date, … User-defined types. Users are on their own.

・ If field is a primitive type, use wrapper type hashCode(). ・ If field is a reference type, use hashCode(). applies rule recursively Hash code design "Standard" recipe for user-defined types. ・ Combine each significant field using the 31x + y rule. ・ If field is null, return 0. ・ If field is an array, apply to each entry. or use Arrays.deepHashCode() In practice. Recipe works reasonably well; used in Java libraries. In theory. Keys are bitstring; "universal" hash functions exist. Basic rule. Need to use the whole key to compute hash code; consult an expert for state-of-the-art hash codes.

Modular hashing Hash code. An int between and Hash function. An int between 0 and M - 1 (for use as array index). typically a prime or power of 2 private int hash(Key key) { return key.hashCode() % M; } bug private int hash(Key key) { return Math.abs(key.hashCode()) % M; } 1-in-a-billion bug hashCode() of "polygenelubricants" is private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; } correct

Uniform hashing assumption Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1. Bins and balls. Throw balls uniformly at random into M bins Birthday problem. Expect two balls in the same bin after ~ π M / 2 tosses. Coupon collector. Expect every bin has ≥ 1 ball after ~ M ln M tosses. Load balancing. After M tosses, expect most loaded bin has Θ ( log M / log log M ) balls.

Uniform hashing assumption Uniform hashing assumption. Each key is equally likely to hash to an integer between 0 and M - 1. Bins and balls. Throw balls uniformly at random into M bins Hash value frequencies for words in Tale of Two Cities (M = 97) Java's String data uniformly distribute the keys of Tale of Two Cities

2 Collisions Collision. Two distinct keys hashing to same index. ・ Birthday problem ⇒ can't avoid collisions unless you have a ridiculous (quadratic) amount of memory. ・ Coupon collector + load balancing ⇒ collisions are evenly distributed. 0 1 hash("it") = 3 3 "it" ?? 4 hash("times") = 3 5 Challenge. Deal with collisions efficiently.

・ Insert: of i th chain putatfront(ifnotalreadythere). E06 Separate chaining symbol table Use an array of M < N linked lists. [H. P. Luhn, IBM 1953] ・ Hash: map key to integer i between 0 and M - 1. ・ Search: need to search only i th chain. key hash value S 2 0 E 0 1 A 8 E 12 A 0 2 R 4 3 st[] null C H X 7 S 0 3 X A 0 8 L 11 P 10 M 4 9 P 3 10 M 9 H 5 C 4 R 3 L 3 11 E 0 12

Separate chaining ST: Java implementation public class SeparateChainingHashST { private int M = 97; // number of chains array doubling and private Node[] st = new Node[M]; // array of chains halving code omitted private static class Node { private Object key; no generic array creation private Object val; ( declare key and value of type Object) private Node next;... } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; } public Value get(Key key) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null; }

Separate chaining ST: Java implementation public class SeparateChainingHashST { private int M = 97; // number of chains private Node[] st = new Node[M]; // array of chains private static class Node { private Object key; private Object val; private Node next;... } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; } public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) { x.val = val; return; } st[i] = new Node(key, val, st[i]); }

.125 M times faster than ・ M too small ⇒ chains too long. Analysis of separate chaining Proposition. Under uniform hashing assumption, prob. that the number of keys in a list is within a constant factor of N / M is extremely close to 1. Pf sketch. Distribution of list size obeys a binomial distribution. (10, ) Binomial distribution ( N = 10 4, M = 10 3, ! = 10 ) equals() and hashCode() Consequence. Number of probes for search/insert is proportional to N / M. ・ M too large ⇒ too many empty chains. sequential search ・ Typical choice: M ~ N / 5 ⇒ constant-time ops.

implementation iteration?interface (unordered list) (ordered array) separate chaininglg N * 3-5 * no ST implementations: summary worst-case cost average case (after N inserts) (after N random inserts) ordered key search insert delete search hit insert delete sequential search N N N N/2 N N/2 no equals() binary search lg N N N lg N N/2 N/2 yes compareTo() BST N N N 1.38 lg N 1.38 lg N ? yes compareTo() splay tree lg N ** lg N ** lg N ** 1.00 lg N 1.00 lg N 1.00 lg N yes compareTo() equals() hashCode() * under uniform hashing assumption ** amortized O(log N)

・ No effective alternative for unordered keys. ・ Better system support in Java for strings (e.g.,cachedhashcode). ・ Support for ordered ST operations. ・ Hash tables: java.util.HashMap, java.util.IdentityHashMap. Hash tables vs. balanced search trees Hash tables. ・ Simpler to code. ・ Faster for simple keys (a few arithmetic ops versus log N compares). Balanced search trees. ・ Stronger performance guarantee. ・ Easier to implement compareTo() correctly than equals() and hashCode(). Java system includes both. ・ Red-black BSTs: java.util.TreeMap, java.util.TreeSet.