Hashing TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA Course: Data Structures Lecturer: Haim Kaplan and Uri Zwick.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Hashing.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
© 2004 Goodrich, Tamassia Hash Tables1  
September 26, Algorithms and Data Structures Lecture VI Simonas Šaltenis Nykredit Center for Database Research Aalborg University
Hashing CS 3358 Data Structures.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary.
Dictionaries and Hash Tables1  
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
REPRESENTING SETS CSC 172 SPRING 2002 LECTURE 21.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Design and Analysis of Algorithms - Chapter 71 Hashing b A very efficient method for implementing a dictionary, i.e., a set with the operations: – insert.
Tirgul 9 Hash Tables (continued) Reminder Examples.
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Lecture 10: Search Structures and Hashing
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
L. Grewe. Computing hash function for a string Horner’s rule: (( … (a 0 x + a 1 ) x + a 2 ) x + … + a n-2 )x + a n-1 ) int hash( const string & key )
Data Structures Hashing Uri Zwick January 2014.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Lecture 17 April 11, 11 Chapter 5, Hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Ihab Mohammed and Safaa Alwajidi. Introduction Hash tables are dictionary structure that store objects with keys and provide very fast access. Hash table.
Hashing Chapter 7 Section 3. What is hashing? Hashing is using a 1-D array to implement a dictionary o This implementation is called a "hash table" Items.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
October 6, Algorithms and Data Structures Lecture VII Simonas Šaltenis Aalborg University
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
CHAPTER 9 HASH TABLES, MAPS, AND SKIP LISTS ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++,
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Data Structures Using C++ 2E
Hash table CSC317 We have elements with key and satellite data
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hashing Course: Data Structures Lecturer: Uri Zwick March 2008
Advanced Associative Structures
Randomized Algorithms CS648
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Hashing Course: Data Structures Lecturer: Uri Zwick March 2008
Presentation transcript:

Hashing TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA Course: Data Structures Lecturer: Haim Kaplan and Uri Zwick May 2010

2 Dictionaries D  Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k) – Find an item with key k in D Delete(D,k) – Delete item with key k from D Can use balanced search trees O(log n) time per operation (Predecessors and successors, etc., not required) Can we do better? YES !!!

3 Dictionaries with “small keys” Suppose all keys are in {0,1,…,m−1}, where m=O(n) Can implement a dictionary using an array D of length m. (Assume different items have different keys.) O(1) time per operation (after initialization) What if m>>n ?Use a hash function 01m-1 Special case: Sets D is a bit vector

Hashing Huge universe U Hash table 01m-1 Hash function h Collisions

Hashing with chaining Each cell points to a linked list of items 01m-1 i

What makes a hash function good? Should behave like a “random function” Should have a succinct representation Should be easy to compute Usually interested in families of hash functions Allows rehashing, resizing, …

Simple hash functions  The modular method The multiplicative method

Tabulation based hash functions … + … Can be used to hash strings h i can be stored in a small table “byte”

Universal families of hash functions A family H of hash functions from U to [m] is said to be universal if and only if

A simple universal family To represent a function from the family we only need two numbers, a and b. The size m of the hash table is arbitrary.

Probabilistic analysis of chaining n – number of elements in dictionary D m – size of hash table Assume that h is randomly chosen from a universal family H ExpectedWorst-case Successful Search Delete Unsuccessful Search (Verified) Insert  =n/m – load factor

Chaining: pros and cons Pros: Simple to implement (and analyze) Constant time per operation (O(n/m)) Fairly insensitive to table size Simple hash functions suffice Cons: Space wasted on pointers Dynamic allocations required Many cache misses

Hashing with open addressing Hashing without pointers Insert key k in the first free position among Assumed to be a permutation To search, follow the same order No room found  Table is full

Hashing with open addressing

How do we delete elements? Caution: When we delete elements, do not set the corresponding cells to null! “deleted” Problematic solution…

Probabilistic analysis of open addressing n – number of elements in dictionary D m – size of hash table Uniform probing: Assume that for every k, h(k,0),…,h(k,m-1) is random permutation  =n/m – load factor (Note:  1) Expected time for unsuccessful search Expected time for successful search

Probabilistic analysis of open addressing Claim: Expected no. of probes for an unsuccessful search is at most: If we probe a random cell in the table, the probability that it is full is . The probability that the first i cells probed are all occupied is at most  i.

Open addressing variants Linear probing: Quadratic probing: Double hashing: How do we define h(k,i) ?

Linear probing “The most important hashing technique” But, much less cache misses More probes than uniform probing, as probe sequences “merge” More complicated analysis (Universal hash families, as defined, do not suffice.)

Linear probing – Deletions Can the key in cell j be moved to cell i?

Expected number of probes Successful Search Unsuccessful Search Uniform Probing Linear Probing When, say,  0.6, all small constants

Expected number of probes

Perfect hashing Suppose that D is static. We want to implement Find is O(1) worst case time. Perfect hashing: No collisions Can we achieve it?

Expected no. of collisions

No collisions! If we are willing to use m=n 2, then any universal family contains a perfect hash function.

Two level hashing [Fredman, Komlós, Szemerédi (1984)]

Total size: Assume that each h i can be represented using 2 words Two level hashing [Fredman, Komlós, Szemerédi (1984)]

A randomized algorithm for constructing a perfect two level hash table: Choose a random h from H(n) and compute the number of collisions. If there are more than n collisions, repeat. For each cell i,if n i >1, choose a random hash function from H(n i 2 ). If there are any collisions, repeat. Expected construction time – O(n) Worst case search time – O(1)

Other applications of hashing Comparing files Cryptographic applications …