Randomized Algorithms CS648

Randomized Algorithms CS648 Lecture 11 Hashing - I

Problem Definition
𝑼 = {1, 2, …, 𝑚}, called the universe; 𝑺 ⊆ 𝑼 with 𝑠 = |𝑺| and 𝑠 ≪ 𝑚.
Example: 𝑚 = 10^18, 𝑠 = 10^3.
Aim: Maintain a data structure storing 𝑺 that supports the search query "Does 𝑖 ∈ 𝑺?" for any given 𝑖 ∈ 𝑼.

Solutions
Solutions with worst case guarantees:
Static 𝑺: an array storing 𝑺 in sorted order.
Dynamic 𝑺: height balanced search trees (AVL trees, Red-Black trees, …).
Time per operation: O(log 𝑠), space: O(𝑠).
Alternative: time per operation: O(1), space: O(𝑚) (an array indexed by all of 𝑼).
Solution used in practice, with no worst case guarantees: hashing.
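The static solution above can be sketched in a few lines (an illustrative sketch, not from the lecture; the helper names are my own):

```python
# Static S: store the elements sorted, answer "Does i belong to S?" by
# binary search in O(log s) time using O(s) space.
import bisect

def build(S):
    """Preprocess: sort the s elements of S (O(s log s) time)."""
    return sorted(S)

def query(arr, i):
    """Binary search for i in the sorted array: O(log s) comparisons."""
    k = bisect.bisect_left(arr, i)
    return k < len(arr) and arr[k] == i

arr = build({17, 3, 42, 8})
print(query(arr, 42))   # True
print(query(arr, 5))    # False
```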

Hashing
Hash table 𝑻: an array of size 𝒏.
Hash function 𝒉: 𝑼 → [𝒏].
Answering a query "Does 𝑖 ∈ 𝑺?": 𝑘 ← 𝒉(𝑖); search the list stored at 𝑻[𝑘].
Properties of 𝒉:
𝒉(𝑖) computable in O(1) time.
Space required by 𝒉: O(1).
Question: How many bits are needed to encode 𝒉?
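The scheme above (hashing with chaining) might look like the following sketch; the class name is illustrative, and the hash function 𝒉(𝑖) = 𝑖 mod 𝑛 is the simple one discussed later in the lecture:

```python
# Hashing with chaining: T is an array of n lists; a query hashes i
# and scans only the chain stored at T[h(i)].
class ChainedHashTable:
    def __init__(self, n):
        self.n = n
        self.T = [[] for _ in range(n)]   # hash table: n empty chains

    def h(self, i):
        return i % self.n                 # O(1) time; O(1) space to store

    def insert(self, i):
        k = self.h(i)
        if i not in self.T[k]:
            self.T[k].append(i)

    def query(self, i):
        # "Does i belong to S?": search the single chain at T[h(i)]
        return i in self.T[self.h(i)]

t = ChainedHashTable(7)
t.insert(10)
t.insert(17)                              # 10 and 17 collide: both hash to 3
print(t.query(17))                        # True
print(t.query(3))                         # False
```

The search cost is exactly the chain length at 𝑻[𝒉(𝑖)], which is why collisions are the central concern of the next slides.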

Collision
Definition: Two elements 𝑖, 𝑗 ∈ 𝑼 are said to collide under hash function 𝒉 if 𝒉(𝑖) = 𝒉(𝑗).
Worst case time complexity of searching an item 𝑖: the number of elements in 𝑺 colliding with 𝑖.
A discouraging fact: no single hash function can be good for all 𝑺.
Proof: By the pigeonhole principle, at least 𝑚/𝑛 elements of 𝑼 are mapped to a single index of 𝑻; choosing 𝑺 from among them forces O(𝑠) search time.

Hashing
A very popular heuristic since the 1950s. Achieves O(1) search time in practice, but its worst case guarantee on search time is O(𝒔).
Question: Can we have a hashing scheme ensuring O(1) worst case search time, O(𝒔) space, and expected O(𝒔) preprocessing time?
The following result answered the question in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM, 31(3), 1984.

Why does hashing work so well in practice?

Why does hashing work so well in practice?
Question: What is the simplest hash function 𝒉: 𝑼 → [𝒏]?
Answer: 𝒉(𝑖) = 𝑖 mod 𝑛.
Hashing works so well in practice because the set 𝑺 is usually close to a uniformly random subset of 𝑼. Let us give a theoretical justification of this fact.

Why does hashing work so well in practice?
Let 𝑦₁, 𝑦₂, …, 𝑦ₛ denote 𝑠 elements selected uniformly at random from 𝑼 to form 𝑺.
Question: What is the expected number of elements colliding with 𝑦₁?
Answer: Suppose 𝑦₁ takes value 𝑖. P(𝑦ⱼ collides with 𝑦₁) = ??
Hint: How many possible values can 𝑦ⱼ take? How many of those values can collide with 𝑖 (namely …, 𝑖 − 𝑛, 𝑖 + 𝑛, 𝑖 + 2𝑛, 𝑖 + 3𝑛, …)?

Why does hashing work so well in practice?
Answer: Suppose 𝑦₁ takes value 𝑖. Under 𝒉(𝑥) = 𝑥 mod 𝑛, the values that may collide with 𝑖 are …, 𝑖 − 𝑛, 𝑖 + 𝑛, 𝑖 + 2𝑛, 𝑖 + 3𝑛, … — about ⌈𝑚/𝑛⌉ − 1 of the 𝑚 − 1 remaining values. Hence
P(𝑦ⱼ collides with 𝑦₁) = (⌈𝑚/𝑛⌉ − 1)/(𝑚 − 1) ≈ 1/𝑛.
Expected number of elements of 𝑺 colliding with 𝑦₁ = (𝑠 − 1) · (⌈𝑚/𝑛⌉ − 1)/(𝑚 − 1) ≈ (𝑠 − 1)/𝑛 = O(1) for 𝑛 = Ω(𝑠).
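The calculation above can be checked empirically; the following sketch (not part of the slides; the function name and parameters are illustrative) draws a random 𝑺 and measures how many of its elements collide with 𝑦₁ under 𝒉(𝑖) = 𝑖 mod 𝑛:

```python
# Empirical check: for a uniformly random S of size s and table size
# n = s, the average number of elements colliding with y1 under
# h(i) = i mod n should be about (s-1)/n, i.e. O(1).
import random

def avg_collisions_with_y1(m, s, n, trials=300):
    total = 0
    for _ in range(trials):
        S = random.sample(range(m), s)    # y1, ..., ys without replacement
        y1 = S[0]
        total += sum(1 for y in S[1:] if y % n == y1 % n)
    return total / trials

# With m = 10**6 and s = n = 1000, the average comes out close to 1.
print(avg_collisions_with_y1(10**6, 1000, 1000))
```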

Why does hashing work so well in practice?
Conclusion: 𝒉(𝑖) = 𝑖 mod 𝑛 works so well because, for a uniformly random subset of 𝑼, the expected number of collisions at an index of 𝑻 is O(1).
However, it is easy to fool this hash function into O(𝑠) search time (do it as a simple exercise).
This makes us ask: how can we achieve worst case O(1) search time for a given set 𝑺?
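One solution to the exercise, as a sketch: choose 𝑺 to consist of multiples of 𝑛, so every element lands in the same slot and a search must scan a chain of length 𝑠.

```python
# Fooling h(i) = i mod n: take S = {n, 2n, 3n, ...}.  All s elements
# are congruent to 0 mod n, so they all collide in slot 0.
n, s = 100, 50
S = [n * j for j in range(1, s + 1)]
slots = {i % n for i in S}
print(slots)                               # {0}: one slot holds all of S
```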

How to achieve worst case O(1) search time

Key idea to achieve worst case O(1) search time
Observation: Of course, no single hash function is good for every possible 𝑺. But we may strive for a hash function that is good for a given 𝑺.
A promising direction: find a set 𝑯 of hash functions such that, for any given 𝑺, many of them are good; then select a function at random from 𝑯 and try it on 𝑺.
The notion of goodness is captured formally by the universal hash family on the following slide.

Universal Hash Family

Universal Hash Family
Definition: A collection 𝑯 of hash functions is said to be universal if there exists a constant 𝑐 such that for any distinct 𝑖, 𝑗 ∈ 𝑼,
P_{𝒉 ∈ᵣ 𝑯}[𝒉(𝑖) = 𝒉(𝑗)] ≤ 𝑐/𝑛.
Fact: The set of all functions from 𝑼 to [𝒏] is a universal hash family (do it as homework).
Question: Can we use the set of all functions as a universal hash family in real life?
Answer: No. There are 𝑛^𝑚 possible functions, so merely specifying a randomly chosen one requires log(𝑛^𝑚) = 𝑚 log 𝑛 bits. The space occupied by the hash function would be far too large.
Question: Does there exist a universal hash family whose hash functions have a compact encoding?

Universal Hash Family
There indeed exist many 𝑐-universal hash families with compactly encoded hash functions.
Example: Let 𝑝 be a prime with 𝑝 ≥ 𝑚, and define 𝒉ₐ: 𝑼 → [𝒏] by 𝒉ₐ(𝑖) = (𝑎𝑖 mod 𝑝) mod 𝑛. Then 𝑯 = {𝒉ₐ | 1 ≤ 𝑎 ≤ 𝑝 − 1} is 𝑐-universal.
This may look complicated; in the next class we shall see that it is very natural and intuitive. For today's lecture you do not need it.
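A sketch of this family in code (the specific prime below is an illustrative choice for 𝑚 = 10^6; the function names are my own). Note the compact encoding: a function from the family is fully described by the single integer 𝑎.

```python
# The family H = { h_a : h_a(i) = ((a*i) mod p) mod n, 1 <= a <= p-1 }.
# Picking h uniformly from H amounts to picking the integer a uniformly,
# so storing h takes only O(log p) bits rather than m*log(n).
import random

def make_hash(p, n):
    a = random.randrange(1, p)             # h chosen uniformly at random from H
    return lambda i: ((a * i) % p) % n

p, n = 1_000_003, 1000                     # p prime, p >= m = 10**6
h = make_hash(p, n)
print(0 <= h(123456) < n)                  # True: h maps U into [n]
```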

Static hashing with worst case O(1) search time

The Journey
One milestone in our journey: a perfect hash function using a hash table of size O(𝑠²).
Tools needed: a 𝑐-universal hash family, where 𝑐 is a small constant, and elementary probability.

Perfect hashing using O(𝒔²) space
Let 𝑯 be a universal hash family, and let 𝑿 be the number of collisions for 𝑺 when 𝒉 ∈ᵣ 𝑯.
Question: What is E[𝑿]?
Define 𝑿ᵢⱼ = 1 if 𝒉(𝑖) = 𝒉(𝑗), and 0 otherwise, so that 𝑿 = Σ_{𝑖<𝑗; 𝑖,𝑗∈𝑺} 𝑿ᵢⱼ.
E[𝑿] = Σ_{𝑖<𝑗; 𝑖,𝑗∈𝑺} E[𝑿ᵢⱼ] = Σ_{𝑖<𝑗; 𝑖,𝑗∈𝑺} P[𝑿ᵢⱼ = 1] ≤ Σ_{𝑖<𝑗; 𝑖,𝑗∈𝑺} 𝑐/𝑛 = (𝑐/𝑛) · 𝑠(𝑠−1)/2.

Perfect hashing using O(𝒔²) space
Lemma 1: E[𝑿] ≤ (𝑐/𝑛) · 𝑠(𝑠−1)/2.
Question: How large should 𝒏 be to achieve E[𝑿] ≤ 1/2?
Answer: Pick 𝒏 = 𝒄𝒔², which gives E[𝑿] ≤ 𝑠(𝑠−1)/(2𝑠²) < 1/2.

Perfect hashing using O(𝒔²) space
Observation: E[𝑿] ≤ 1/2 when 𝒏 = 𝒄𝒔².
Question: What is the probability of no collision when 𝒏 = 𝒄𝒔²?
Answer: "No collision" is exactly the event "𝑿 = 0". By Markov's inequality, P(𝑿 ≥ 1) ≤ E[𝑿] ≤ 1/2, so
P(no collision) = P(𝑿 = 0) = 1 − P(𝑿 ≥ 1) ≥ 1 − 1/2 = 1/2.

Perfect hashing using O(𝒔²) space
Lemma 2: For 𝒏 = 𝒄𝒔², there is no collision with probability at least 1/2.
Algorithm 1 (perfect hashing for 𝑺):
Repeat: pick 𝒉 ∈ᵣ 𝑯; 𝒕 ← the number of collisions for 𝑺 under 𝒉; until 𝒕 = 0.
By Lemma 2, the expected number of iterations is at most 2.
Theorem: A perfect hash function for 𝑺 can be computed in expected O(𝒔²) time.
Corollary: We obtain a hash table occupying O(𝒔²) space with worst case O(1) search time.
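Algorithm 1 can be sketched as follows, reusing the family 𝒉ₐ(𝑖) = (𝑎𝑖 mod 𝑝) mod 𝑛 from the example slide; the prime 𝑝 and the choice 𝑐 = 1 are illustrative assumptions:

```python
# Algorithm 1: with table size n = c*s^2, keep drawing h uniformly
# from H until S has zero collisions under h.  By Lemma 2 each draw
# succeeds with probability >= 1/2, so the expected number of draws
# is at most 2.
import random
from itertools import combinations

def perfect_hash(S, p=1_000_003, c=1):
    s = len(S)
    n = c * s * s                          # table size O(s^2)
    while True:
        a = random.randrange(1, p)         # pick h uniformly from H
        h = lambda i, a=a: ((a * i) % p) % n
        t = sum(1 for i, j in combinations(S, 2) if h(i) == h(j))
        if t == 0:                         # no collision: h is perfect for S
            return h, n

S = random.sample(range(10**6), 30)
h, n = perfect_hash(S)
print(len({h(i) for i in S}) == len(S))    # True: S maps injectively into [n]
```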

Hashing with O(𝒔) space and O(1) worst case search time
We have completed almost 90% of our journey. To achieve the goal of O(𝒔) space and worst case O(1) search time, here is the sketch (the details will be given at the beginning of the next class):
Use the same hashing scheme as in Algorithm 1, except with 𝒏 = O(𝒔). Of course there will be collisions; use an additional level of hash tables to take care of them.
In the next class we shall complete the algorithm for hashing with O(𝒔) space and O(1) worst case search time, and present a very natural way to design various universal hash families.
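The space bound behind this sketch can be previewed empirically (a rough sketch only; the analysis comes next class): hash 𝑺 into 𝒏 = 𝑠 buckets with a first-level function, and observe that the total size of the per-bucket quadratic tables, Σ 𝑏ₖ², stays a small constant times 𝑠, i.e. O(𝒔).

```python
# First level: n = s buckets, collisions allowed.  A bucket holding
# b_k colliding elements would get its own second-level perfect table
# of size O(b_k^2), so total space is about the sum of b_k^2.
import random
from collections import Counter

S = random.sample(range(10**6), 1000)
n, p = len(S), 1_000_003                   # n = O(s); p prime, p >= m
a = random.randrange(1, p)
counts = Counter(((a * i) % p) % n for i in S)
print(sum(b * b for b in counts.values()))  # typically a few times s: O(s)
```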