Data Structures Hashing Uri Zwick January 2014.

Dictionaries
D ← Dictionary() – Create an empty dictionary
Insert(D,x) – Insert item x into D
Find(D,k) – Find an item with key k in D
Delete(D,k) – Delete the item with key k from D
(Predecessors, successors, etc., are not supported.)
Can use balanced search trees: O(log n) time per operation. Can we do better? YES!!!

Dictionaries with “small keys”: Direct addressing. Suppose all keys are in [m] = {0,1,…,m−1}, where m = O(n). (Assume different items have different keys.) We can then implement a dictionary using an array D of length m, indexed directly by key. Special case: for sets, D is simply a bit vector. O(1) time per operation (after initialization). What if m ≫ n? Use a hash function.
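Direct addressing is short enough to write out in full. A minimal Python sketch (class and method names are mine, not from the slides):

```python
# Direct-address table: keys come from [m] = {0,...,m-1}, so an
# array of length m indexed by the key gives O(1) operations.
class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m     # one slot per possible key

    def insert(self, key, value):
        self.slots[key] = value

    def find(self, key):
        return self.slots[key]      # None if absent

    def delete(self, key):
        self.slots[key] = None
```

For sets, the values can be dropped and `slots` becomes the bit vector mentioned above.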

Hashing: a hash function h maps keys from a huge universe U into a hash table of size m (cells 0,…,m−1). Distinct keys may be mapped to the same cell, causing collisions.

Hashing with chaining [Luhn (1953)] [Dumey (1956)]: each cell i of the table points to a linked list of the items hashed to it.
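A minimal Python sketch of chaining, supporting the dictionary operations from the first slide (names are illustrative; `h` is any hash function into [m]):

```python
# Hashing with chaining: each cell holds a list of (key, value)
# items whose keys hash to that cell.
class ChainedHashTable:
    def __init__(self, m, h):
        self.m, self.h = m, h
        self.cells = [[] for _ in range(m)]

    def insert(self, key, value):
        cell = self.cells[self.h(key)]
        for i, (k, _) in enumerate(cell):
            if k == key:
                cell[i] = (key, value)   # overwrite an existing key
                return
        cell.append((key, value))

    def find(self, key):
        for k, v in self.cells[self.h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        i = self.h(key)
        self.cells[i] = [(k, v) for k, v in self.cells[i] if k != key]
```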

Hashing with chaining with a random hash function behaves like the classical “balls in bins” experiment: throw n balls randomly into m bins.

Balls in Bins: throw n balls randomly into m bins. All throws are uniform and independent.

Balls in Bins: throw n balls randomly into m bins. The expected number of balls in each bin is n/m. When n = Θ(m), with probability at least 1 − 1/n, all bins contain at most O(log n / log log n) balls.

What makes a hash function good? It should behave like a “random function”, have a succinct representation, and be easy to compute. A single fixed hash function cannot satisfy the first condition.

Families of hash functions. We cannot choose a “truly random” hash function, and using a fixed hash function is usually not a good idea. Compromise: choose a random hash function h from a carefully chosen family H of hash functions. Each function h in H should have a succinct representation and should be easy to compute. Goal: for every sequence of operations, the expected running time of the data structure, over the random choice of h from H, is small.

Modular hash functions [Carter-Wegman (1979)], where p is a prime number: these form a “universal family” (see below) but require (slow) divisions.

Multiplicative hash functions [Dietzfelbinger-Hagerup-Katajainen-Penttonen (1997)]: h_a(x) = (a·x mod 2^w) div 2^(w−ℓ), where a is a random odd w-bit integer and the table size is m = 2^ℓ. These form an “almost-universal” family (see below) and are extremely fast in practice!
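The transcript lost the slide's formula; the sketch below implements the standard multiplicative scheme of Dietzfelbinger et al. (the word size `W` and output width `L` are illustrative constants, not from the slides):

```python
# Multiplicative hashing: h_a(x) = (a*x mod 2^w) >> (w - l),
# with a a random odd w-bit multiplier; the table has m = 2^l cells.
import random

W = 64    # word size (illustrative)
L = 10    # output bits, i.e. table size m = 2**L (illustrative)

def make_mult_hash(l=L, w=W, seed=None):
    rng = random.Random(seed)
    a = rng.randrange(1, 1 << w) | 1        # random odd multiplier
    mask = (1 << w) - 1
    def h(x):
        # multiply, keep low w bits, then take the top l of those
        return ((a * x) & mask) >> (w - l)
    return h
```

On real hardware the `mod 2^w` is free (machine-word overflow) and the shift is one instruction, which is why this family is so fast.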

Tabulation based hash functions [Patrascu-Thorup (2012)]: split the key into “bytes” x = x₁…x_c and let h(x) = h₁[x₁] ⊕ … ⊕ h_c[x_c], where each hᵢ can be stored in a small random table and ⊕ is bitwise XOR. A variant can also be used to hash strings. Very efficient in practice, with very good theoretical properties.
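A minimal sketch of simple tabulation (parameter names and sizes are illustrative; keys are assumed to fit in `num_chunks * bits_per_chunk` bits):

```python
# Tabulation hashing: split the key into c "bytes" and XOR together
# one entry from each of c small random tables.
import random

def make_tab_hash(num_chunks=4, bits_per_chunk=8, out_bits=32, seed=None):
    rng = random.Random(seed)
    tables = [[rng.getrandbits(out_bits) for _ in range(1 << bits_per_chunk)]
              for _ in range(num_chunks)]
    mask = (1 << bits_per_chunk) - 1
    def h(x):
        v = 0
        for i in range(num_chunks):
            chunk = (x >> (i * bits_per_chunk)) & mask   # i-th "byte"
            v ^= tables[i][chunk]
        return v
    return h
```

The only work per key is c table lookups and XORs, with no multiplications or divisions.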

Universal families of hash functions [Carter-Wegman (1979)]. A family H of hash functions from U to [m] is said to be universal if and only if for every pair of distinct keys x ≠ y, Pr[h(x) = h(y)] ≤ 1/m, where h is chosen uniformly from H. It is said to be almost universal if and only if Pr[h(x) = h(y)] ≤ 2/m.

k-independent families of hash functions. A family H of hash functions from U to [m] is said to be k-independent if and only if for every k distinct keys x₁,…,x_k and every k values y₁,…,y_k, Pr[h(x₁)=y₁ and … and h(x_k)=y_k] = 1/m^k. It is said to be almost k-independent if and only if this probability is at most 2/m^k.

A simple universal family [Carter-Wegman (1979)]: pick a prime p ≥ |U|, and for a ∈ {1,…,p−1}, b ∈ {0,…,p−1} let h_{a,b}(x) = ((ax + b) mod p) mod m. To represent a function from the family we only need two numbers, a and b. The size m of the hash table can be arbitrary.
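A minimal sketch of this family (the particular prime is an illustrative choice, assumed larger than every key):

```python
# Carter-Wegman universal family: h_{a,b}(x) = ((a*x + b) mod p) mod m,
# with p prime and p >= |U|, a in {1,...,p-1}, b in {0,...,p-1}.
import random

P = 2_147_483_647      # the Mersenne prime 2^31 - 1, assumed >= |U|

def make_cw_hash(m, p=P, seed=None):
    rng = random.Random(seed)
    a = rng.randrange(1, p)     # a != 0
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m
```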


Probabilistic analysis of chaining
n – number of elements in dictionary D
m – size of hash table
α = n/m – load factor
Assume that h is randomly chosen from a universal family H. Then:
Insert – expected O(1); a verified Insert (which first searches for the key) – expected O(1+α)
Unsuccessful Search – expected O(1+α)
Successful Search / Delete – expected O(1+α)
Worst case for all operations – O(n)

Chaining: pros and cons
Pros: simple to implement (and analyze); constant time per operation (O(1+α)); fairly insensitive to table size; simple hash functions suffice.
Cons: space wasted on pointers; dynamic allocations required; many cache misses.

Hashing with open addressing: hashing without pointers. Insert key k into the first free position among h(k,0), h(k,1), …, h(k,m−1), which is assumed to be a permutation of the table positions. If no room is found, the table is full. To search, follow the same probe order.


How do we delete elements? Caution: when we delete an element, do not set its cell back to null — later keys whose probe sequences passed through that cell would become unreachable. Instead, mark the cell as “deleted” (a tombstone): searches continue past it, and insertions may reuse it. This solution is problematic when there are many deletions…
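A minimal sketch of open addressing with tombstones, using linear probing as the probe sequence (names and structure are mine, not from the slides):

```python
# Open addressing with "deleted" markers: a deleted cell must not be
# reset to empty, or keys further along the same probe sequence
# become unreachable. Insertions may reuse tombstone cells.
EMPTY, DELETED = object(), object()   # distinct sentinel values

class OpenAddressTable:
    def __init__(self, m, h):
        self.m, self.h = m, h
        self.t = [EMPTY] * m

    def _probe(self, k, i):
        return (self.h(k) + i) % self.m   # linear probing

    def insert(self, k):
        first_free = None                 # earliest tombstone seen
        for i in range(self.m):
            j = self._probe(k, i)
            if self.t[j] is EMPTY:
                self.t[first_free if first_free is not None else j] = k
                return
            if self.t[j] is DELETED:
                if first_free is None:
                    first_free = j
            elif self.t[j] == k:
                return                    # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.t[first_free] = k

    def find(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.t[j] is EMPTY:        # probe sequence exhausted
                return False
            if self.t[j] == k:
                return True
        return False

    def delete(self, k):
        for i in range(self.m):
            j = self._probe(k, i)
            if self.t[j] is EMPTY:
                return
            if self.t[j] == k:
                self.t[j] = DELETED       # tombstone, not EMPTY
                return
```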

Probabilistic analysis of open addressing: n – number of elements in dictionary D; m – size of hash table; α = n/m – load factor (note: α ≤ 1). Uniform probing: assume that for every k, h(k,0),…,h(k,m−1) is a random permutation. Then the expected time for an unsuccessful search is at most 1/(1−α), and the expected time for a successful search is at most (1/α) ln(1/(1−α)).

Probabilistic analysis of open addressing. Claim: the expected number of probes in an unsuccessful search is at most 1/(1−α). If we probe a random cell in the table, the probability that it is full is α. The probability that the first i cells probed are all occupied is therefore at most α^i.
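The geometric-series step behind the claim can be written out, under the uniform-probing assumption:

```latex
\mathbb{E}[\text{probes}]
  \;\le\; \sum_{i \ge 0} \Pr[\text{first } i \text{ probes all occupied}]
  \;\le\; \sum_{i \ge 0} \alpha^i
  \;=\; \frac{1}{1-\alpha}.
```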

Open addressing variants: how do we define h(k,i)?
Linear probing: h(k,i) = (h(k) + i) mod m
Quadratic probing: h(k,i) = (h(k) + c1·i + c2·i²) mod m
Double hashing: h(k,i) = (h1(k) + i·h2(k)) mod m
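The three probe sequences as code (a sketch; `h`, `h1`, `h2` are auxiliary hash functions and `c1`, `c2` are arbitrary illustrative constants):

```python
# Standard open-addressing probe sequences; each returns a function
# probe(k, i) giving the i-th cell tried for key k.
def linear_probe(h, m):
    return lambda k, i: (h(k) + i) % m

def quadratic_probe(h, m, c1=1, c2=1):
    return lambda k, i: (h(k) + c1 * i + c2 * i * i) % m

def double_hash_probe(h1, h2, m):
    # h2(k) should be nonzero and relatively prime to m, so the
    # sequence visits every cell
    return lambda k, i: (h1(k) + i * h2(k)) % m
```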

Linear probing: “The most important hashing technique”. It makes more probes than uniform probing, as probe sequences “merge”, but it incurs far fewer cache misses and is extremely efficient in practice. Its analysis is more complicated (it requires 5-independence or tabulation hashing).

Linear probing – Deletions Can the key in cell j be moved to cell i?

Linear probing – Deletions: when an item is deleted, the hash table is left in exactly the state it would have been in had the item never been inserted!
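A sketch of the standard backward-shift deletion for linear probing (the slides give no code; the cyclic-interval test below is the answer to the “can the key in cell j be moved to cell i?” question — it can, iff its home position does not lie cyclically in (i, j]):

```python
# Backward-shift deletion for linear probing: empty the deleted cell,
# then scan forward, moving back any key whose home position allows it,
# so no tombstones are ever needed.
def lp_delete(table, h, m, key):
    i = h(key)
    while table[i] is not None and table[i] != key:
        i = (i + 1) % m
    if table[i] is None:
        return False                      # key not present
    table[i] = None                       # the hole
    j = (i + 1) % m
    while table[j] is not None:
        k = table[j]
        home = h(k)
        # k may move into the hole at i iff home is NOT in the
        # cyclic interval (i, j]
        blocked = (i < home <= j) if i <= j else (home > i or home <= j)
        if not blocked:
            table[i], table[j] = k, None  # shift k back; hole moves to j
            i = j
        j = (j + 1) % m
    return True
```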

Expected number of probes, assuming random hash functions:
Uniform probing: unsuccessful search ≈ 1/(1−α); successful search ≈ (1/α) ln(1/(1−α)).
Linear probing [Knuth (1962)]: unsuccessful search ≈ ½(1 + 1/(1−α)²); successful search ≈ ½(1 + 1/(1−α)).
When, say, α ≤ 0.6, all of these are small constants.

Expected number of probes (figure: a plot of the expected probe counts as a function of the load factor α).

Perfect hashing: suppose that D is static. We want to implement Find in O(1) worst-case time. Perfect hashing: no collisions. Can we achieve it?

Expected no. of collisions: if h is chosen from a universal family, the expected number of colliding pairs is at most C(n,2)·(1/m) < n²/(2m), by summing Pr[h(x)=h(y)] ≤ 1/m over all pairs x ≠ y.

Expected no. of collisions: if we are willing to use m = n², the expected number of collisions is less than 1/2, so by Markov’s inequality a random h collides with probability less than 1/2. Hence any universal family contains a perfect hash function. No collisions!

Two level hashing [Fredman, Komlós, Szemerédi (1984)]: hash the n keys into a first-level table of size n; resolve the collisions in each cell i with a second-level perfect hash table of size n_i², where n_i is the number of keys in cell i.


Two level hashing [Fredman, Komlós, Szemerédi (1984)]: assume that each h_i can be represented using 2 words. Total size: O(n + Σ_i n_i²) = O(n) words in expectation.

A randomized algorithm for constructing a perfect two-level hash table: choose a random h from H(p,n) and compute the number of collisions. If there are more than n collisions, repeat. For each cell i, if n_i > 1, choose a random hash function h_i from H(p, n_i²). If there are any collisions, repeat. Expected construction time – O(n). Worst-case search time – O(1).
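A minimal sketch of this construction, using the Carter-Wegman family from earlier (the prime and structure are illustrative, not tuned):

```python
# FKS two-level perfect hashing: level 1 hashes into n buckets and
# retries until few collisions; level 2 gives each bucket of size
# n_i its own collision-free table of size n_i^2.
import random

P = 2_147_483_647          # prime, assumed larger than every key

def cw(m, rng):
    # random member of the Carter-Wegman family into [m]
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys, seed=None):
    rng = random.Random(seed)
    n = len(keys)
    # level 1: retry until at most n colliding pairs
    while True:
        h = cw(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) * (len(b) - 1) // 2 for b in buckets) <= n:
            break
    # level 2: retry each bucket until its n_i^2 table is collision-free
    second = []
    for b in buckets:
        if not b:
            second.append(None)
            continue
        m2 = len(b) ** 2
        while True:
            g = cw(m2, rng)
            cells = [None] * m2
            ok = True
            for k in b:
                j = g(k)
                if cells[j] is not None:
                    ok = False
                    break
                cells[j] = k
            if ok:
                second.append((g, cells))
                break
    return h, second

def fks_find(struct, k):
    h, second = struct
    entry = second[h(k)]
    if entry is None:
        return False
    g, cells = entry
    return cells[g(k)] == k
```

Both retry loops succeed with constant probability per attempt, giving the O(n) expected construction time; `fks_find` makes exactly two table lookups.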

Cuckoo Hashing [Pagh-Rodler (2004)]: use two tables T1, T2 and two hash functions h1, h2. Each key k is stored either in T1[h1(k)] or in T2[h2(k)]. An insertion that finds both cells occupied evicts one occupant, which is then reinserted into its other possible cell, possibly evicting further keys.

Cuckoo Hashing [Pagh-Rodler (2004)] O(1) worst case search time! What is the (expected) insert time?

Cuckoo Hashing [Pagh-Rodler (2004)] – a difficult insertion: evictions can cascade through a chain of occupied cells. How likely are difficult insertions?


Cuckoo Hashing [Pagh-Rodler (2004)] A more difficult insertion


Cuckoo Hashing [Pagh-Rodler (2004)] – a failed insertion: if an insertion takes more than MAX steps, rehash the whole table with new hash functions.

Cuckoo Hashing [Pagh-Rodler (2004)]: with hash functions chosen at random from an appropriate family, the amortized expected insertion time is O(1).
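A sketch of cuckoo insertion, with the eviction loop and rehash-on-failure described above (table size, step bound, and the Carter-Wegman-style hash functions are illustrative choices, not from the slides):

```python
# Cuckoo hashing: each key lives in t[0][h0(k)] or t[1][h1(k)].
# Insertion evicts occupants back and forth; after MAX steps the
# whole table is rehashed with fresh hash functions.
import random

class Cuckoo:
    def __init__(self, m=101, seed=0):
        self.m = m
        self.rng = random.Random(seed)
        self.t = [[None] * m, [None] * m]
        self._new_hashes()

    def _new_hashes(self):
        P = 2_147_483_647                 # prime, assumed > all keys
        def make():
            a, b = self.rng.randrange(1, P), self.rng.randrange(P)
            return lambda x, a=a, b=b, m=self.m: ((a * x + b) % P) % m
        self.h = [make(), make()]

    def find(self, k):                    # O(1) worst case: two probes
        return self.t[0][self.h[0](k)] == k or self.t[1][self.h[1](k)] == k

    def insert(self, k, max_steps=50):
        if self.find(k):
            return
        side = 0
        for _ in range(max_steps):
            i = self.h[side](k)
            # place k in its cell, taking over whatever was there
            k, self.t[side][i] = self.t[side][i], k
            if k is None:                 # cell was free: done
                return
            side = 1 - side               # evicted key tries its other table
        self._rehash(k)                   # too many steps: rebuild

    def _rehash(self, pending):
        keys = [k for tab in self.t for k in tab if k is not None]
        keys.append(pending)
        self.t = [[None] * self.m for _ in range(2)]
        self._new_hashes()
        for k in keys:
            self.insert(k)
```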

Other applications of hashing: comparing files, cryptographic applications, …