CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables.
Hash Tables CIS 606 Spring 2010.
Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Hashing CS 3358 Data Structures.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
CSE 250: Data Structures Week 12 March 31 – April 4, 2008.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 9 Hash Tables (continued) Reminder Examples.
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Hashing General idea: Get a large array
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Spring 2015 Lecture 6: Hash Tables
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
Fundamental Structures of Computer Science II
Hashing (part 2) CSE 2011 Winter March 2018.
Data Structures Using C++ 2E
Hash table CSC317 We have elements with key and satellite data
Hashing Alexandra Stefan.
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing - resolving collisions
Hashing Alexandra Stefan.
Hashing - Hash Maps and Hash Functions
Handling Collisions Open Addressing SNSCT-CSE/16IT201-DS.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash functions Open addressing
Hash Table.
Hashing and Hash Tables
Instructor: Lilian de Greef Quarter: Summer 2017
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Data Structures and Algorithms
Collision Resolution Neil Tang 02/18/2010
Hashing.
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Data Structures and Algorithms
Hash Tables – 2 Comp 122, Spring 2004.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
Collision Resolution Neil Tang 02/21/2008
CS 3343: Analysis of Algorithms
Slide Sources: CLRS “Intro. To Algorithms” book website
EE 312 Software Design and Implementation I
Hash Tables – 2 1.
Lecture-Hashing.
Presentation transcript:

CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331

Dictionary Operations Given a universe of elements U Need to store some keys Need to perform the following for keys Insert Search Delete Let’s call this a dictionary. CSE 2331/5331

Not great if universe size First Try: Use an array T of the size of universe. Each operation: O(1). Not great if universe size is much larger than the number of keys ever needed. CSE 2331/5331

Hash Table U : universe T[0 … m-1] : a hash table of size m 𝑚≪ 𝑈 Hash functions ℎ:𝑈→{0, 1, …, 𝑚−1} ℎ 𝑘 is called the hash value of key k. Given a key k, we will store it in location h(k) of hash table T. CSE 2331/5331

Collisions Since the size of hash table is smaller than the universe: Multiple keys may hash to the same slot. How to handle collisions? Chaining Open addressing CSE 2331/5331

Collision Resolved by Chaining T[j]: a pointer to the head of the linked list of all stored elements that hash to j Nil otherwise CSE 2331/5331

Dictionary Operations Chained-Hash-Insert (T, x) Insert x at the head of list T[h(key(x))] Chained-Hash-Search(T, k) Search for an element with key k in list T[h(k)] Chained-Hash-Delete(T, x) Delete x from the list T[h(key(x))] O(1) O(length(T[h(k)]) O(length(T[h(key(x))]) CSE 2331/5331

Average-case Analysis n: # elements in the table m: size of table (# slots in the table) Load factor: 𝛼= 𝑛 𝑚 : average number of elements per linked list Intuitively the optimal time needed Individual operation can be slow (O(n) time) Under certain assumption of the distribution of keys, analyze expected performance. CSE 2331/5331

Simple Uniform Hashing Simple uniform hashing assumption: any given element is equally likely to hash into any of the m slots in T Let nj be length of list T[j] 𝑛= 𝑛 0 + 𝑛 1 +⋯+ 𝑛 𝑚−1 Under simple uniform hashing assumption: expected value 𝐸 𝑛 𝑗 =𝛼= 𝑛 𝑚 Why? CSE 2331/5331

Let 𝑘 1 , 𝑘 2 ,…, 𝑘 𝑛 be the set of keys Goal: Estimate E(nj) Let 𝑋 𝑖 =1 if ℎ 𝑘 𝑖 =𝑗 0 otherwise Note: 𝑛 𝑗 = 𝑖=1 𝑛 𝑋 𝑖 ! Hence 𝐸 𝑛 𝑗 =𝐸 𝑖=1 𝑛 𝑋 𝑖 = 𝑖=1 𝑛 𝐸 𝑋 𝑖 = 𝑖=1 𝑛 1 𝑚 = 𝑛 𝑚 𝐸 𝑋 𝑖 = Pr ℎ 𝑘 𝑖 =𝑗 ∗1 = 1 𝑚 CSE 2331/5331

Search Complexity – Case 1 If search is unsuccessful Based on simple uniform hashing, a new key is equally likely to be in any slot 𝐸𝑇 ℎ 𝑘 = 𝑗=1 𝑚 1 𝑚 𝐸[ 𝑛 𝑗 ] =𝛼= 𝑛 𝑚 Expected search time: Θ 1+𝛼 CSE 2331/5331

Search Complexity – Case 2 If the search of k is successful Note, simple uniform hashing assumption does not necessarily implies that there is a equal chance for k in any slot. Assume: k is equally likely to be any of the n elements already stored in the hash table. Theorem: Under simple uniform hashing assumption, the search procedure takes Θ 1+𝛼 expected time, when using collision resolution by chaining. CSE 2331/5331

Hash Functions Ideally, Hash function satisfies the assumption of simple uniform hashing Hard to achieve without knowledge of distribution where keys are drawn from Give a few heuristic examples CSE 2331/5331

Division Method ℎ 𝑘 =𝑘 𝑚𝑜𝑑 𝑚 Choice of m is important e.g, ℎ 𝑘 =𝑘 𝑚𝑜𝑑 701 Choice of m is important Power of 2 not very good Depends only on few least significant bits Higher bits not used A good choice is a prime number not too close to exact power of 2 Related: ℎ 𝑘 =(𝑘 𝑝) 𝑚𝑜𝑑 𝑚 Power of 2 means that we simply take the least significant p bits of k CSE 2331/5331

Multiplication Method Choose some 0<𝐴<1 ℎ 𝑘 = 𝑚 𝑘 𝐴 𝑚𝑜𝑑 1 Slower than division method, but choice of m not so critical One reasonable choice of A: 𝐴≈ 5 −1 2 ≈0.6180339887… CSE 2331/5331

Open-address Hashing All keys are stored in the table itself No extra pointers Each slot is either a key or NIL To hash a key k: In the ith iteration, compute h(k, i) If h(k,i) is taken (not NIL) Go to next iteration If h(k, i) is free Store k here in this slot. Terminate. CSE 2331/5331

Pseudo-code CSE 2331/5331

Search CSE 2331/5331

Re-hashing Functions ℎ:𝑈× 0, 1, …, 𝑚−1 → 0, 1, …, 𝑚−1 Possible choices Linear probing: ℎ 𝑘, 𝑖 = ℎ ′ 𝑘 +𝑖 𝑚𝑜𝑑 𝑚 Quadratic probing: ℎ 𝑘, 𝑖 = ℎ ′ 𝑘 + 𝑐 1 𝑖+ 𝑐 2 𝑖 2 𝑚𝑜𝑑 𝑚 Double hashing ℎ 𝑘, 𝑖 = ℎ 1 𝑘 +𝑖 ℎ 2 𝑘 𝑚𝑜𝑑 𝑚 Probe number Slot hashed to Tend to cause primary clustering Draw a pic, and given an example Secondary clustering: if h(k_1, 0) = h(k_2, 0), then h(k_1, i) = h(k_2, i). Secondary clustering CSE 2331/5331

Remarks Advantages: But: no pointer, no memory allocation during the course But: Load factor 𝛼= 𝑛 𝑚 <1 Need resizing strategy when n > m CSE 2331/5331

Summary Hash Table Very practical data structure for dictionary operations Especially when the number of keys necessary is much smaller than the size of universe Need to choose hash functions properly There exist more intelligent hashing schemes CSE 2331/5331