
CS6045: Advanced Algorithms Data Structures

Hash Tables
Motivation: symbol tables
– A compiler uses a symbol table to relate symbols to associated data
  - Symbols: variable names, procedure names, etc.
  - Associated data: memory location, call graph, etc.
– For a symbol table (also called a dictionary), we care about search, insertion, and deletion
– We typically don’t care about sorted order

Hash Tables
More formally:
– Given a table T and a record x, with key (= symbol), we need to support:
  - Insert(T, x)
  - Delete(T, x)
  - Search(T, x)
– We want these to be fast, but don’t care about sorting the records
The structure we will use is a hash table
– Supports all of the above in O(1) expected time!

Hashing: Keys
In the following discussions we will consider all keys to be (possibly large) natural numbers
– How can we convert floats to natural numbers for hashing purposes?
– How can we convert ASCII strings to natural numbers for hashing purposes?
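Both questions above have standard answers: reinterpret a float's bit pattern as an integer, and read a string as a number in a radix equal to the character-set size. The sketch below illustrates both (the helper names are ours, not part of any standard API):

```python
import struct

def float_key(x: float) -> int:
    # Pack the float into its 8-byte IEEE-754 representation,
    # then reinterpret those bytes as an unsigned integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def string_key(s: str) -> int:
    # Treat the string as a number in radix 128 (one "digit" per ASCII char).
    k = 0
    for ch in s:
        k = k * 128 + ord(ch)
    return k

print(string_key("pt"))   # 112 * 128 + 116 = 14452
```

Either way, the key becomes a natural number that the hash functions discussed below can consume directly.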

Direct Addressing
Suppose:
– The range of keys is 0..m-1
– Keys are distinct
The idea:
– Set up an array T[0..m-1] in which
  - T[i] = x if x ∈ T and key[x] = i
  - T[i] = NULL otherwise
– This is called a direct-address table
– Operations take O(1) time!
So what’s the problem?
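A direct-address table is just an array indexed by the key itself. A minimal sketch, assuming keys are distinct naturals in 0..m-1 (the class and record layout here are our own illustration):

```python
class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m       # T[i] = element with key i, or None (NULL)

    def insert(self, x):              # O(1): index directly by the key
        self.slots[x["key"]] = x

    def delete(self, x):              # O(1)
        self.slots[x["key"]] = None

    def search(self, k):              # O(1)
        return self.slots[k]

T = DirectAddressTable(10)
T.insert({"key": 3, "data": "x := y + z"})
print(T.search(3)["data"])            # prints "x := y + z"
```

Every operation is a single array access, which is exactly why direct addressing is O(1); the catch, as the next slide shows, is the size of the array.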

Direct Addressing
Direct addressing works well when the range m of keys is relatively small
But what if the keys are 32-bit integers?
– Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
– Problem 2: even if memory is not an issue, the set of keys actually stored may be so small that most of the allocated space would be wasted
Solution: map keys to a smaller range 0..m-1
This mapping is called a hash function

Hash Functions
Next problem: collision
(Figure: keys k1..k5 drawn from a universe U of keys, with K the set of actual keys, mapped into slots of T[0..m-1]; h(k2) = h(k5), a collision.)

Resolving Collisions
How can we solve the problem of collisions?
– Solution 1: chaining
– Solution 2: open addressing

Chaining
Chaining puts elements that hash to the same slot in a linked list:
(Figure: each slot of T holds the head of a linked list of the keys k1..k8 that hash to it; empty slots hold NULL.)

Chaining
How do we insert an element?
– What if the element was already inserted?

Chaining
How do we delete an element?
– Do we need a doubly-linked list for efficient delete?

Chaining
How do we search for an element with a given key?
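The three chaining questions above can be answered in one sketch. This uses Python lists in place of the linked lists in the figures (so delete works without a doubly-linked list; a real implementation would splice nodes instead). Insertion goes at the head in O(1), and an existing key is overwritten:

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]   # one chain per slot

    def _h(self, k):
        return hash(k) % self.m               # any hash function works here

    def insert(self, k, v):
        chain = self.table[self._h(k)]
        for i, (key, _) in enumerate(chain):
            if key == k:                      # element already inserted:
                chain[i] = (k, v)             # overwrite its value
                return
        chain.insert(0, (k, v))               # insert at head of list: O(1)

    def search(self, k):                      # walk the chain for slot h(k)
        for key, v in self.table[self._h(k)]:
            if key == k:
                return v
        return None

    def delete(self, k):                      # rebuild the chain without k
        j = self._h(k)
        self.table[j] = [(key, v) for key, v in self.table[j] if key != k]
```

Search and delete each cost time proportional to the length of one chain, which motivates the load-factor analysis that follows.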

Analysis of Chaining
Assume simple uniform hashing: each key in the table is equally likely to be hashed to any slot
Given n keys and m slots in the table, the load factor α = n/m = average number of keys per slot
What will be the average cost of an unsuccessful search for a key? A: O(1 + α)
What will be the average cost of a successful search? A: O(1 + α)

Analysis of Chaining Continued
So the cost of searching = O(1 + α)
If the number of keys n is proportional to the number of slots in the table, what is α?
A: α = n/m = O(1)
– In other words, we can make the expected cost of searching constant if we make α constant
The cost of searching = O(1) if we size our table appropriately

Choosing A Hash Function
Choosing the hash function well is crucial
– A bad hash function puts all elements in the same slot
– A good hash function:
  - Should distribute keys uniformly into slots
  - Should not depend on patterns in the data
We will discuss three methods:
– Division method
– Multiplication method
– Universal hashing

Hash Functions: The Division Method
h(k) = k mod m
– In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
– Example: m = 20, k = 91 → h(k) = ?
Advantage: fast
Disadvantage:
– What happens to elements with adjacent values of k?
– What happens if m is a power of 2 (say 2^p)?
– What if m is a power of 10?
Upshot: pick table size m = a prime number not too close to a power of 2 (or 10)
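The power-of-2 question above has a concrete answer worth seeing: with m = 2^p, h(k) keeps only the low p bits of k, so any keys that agree in their low bits collide. A quick sketch:

```python
def h_div(k, m):
    # Division method: slot = remainder of k divided by m.
    return k % m

print(h_div(91, 20))     # the slide's example: 91 mod 20 = 11

# With m = 2^4, only the low 4 bits of k matter:
m = 16
buckets = {h_div(k, m) for k in (0x10, 0x20, 0x1230, 0xABC0)}
print(buckets)           # all four keys collide in slot 0
```

The same effect with m = 10^p keeps only the low decimal digits, which is why the upshot recommends a prime m away from powers of 2 and 10.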

Hash Functions: The Multiplication Method
For a constant A, 0 < A < 1:
h(k) = ⌊m (kA − ⌊kA⌋)⌋
– What does the term (kA − ⌊kA⌋) represent? The fractional part of kA
Disadvantage: slower than the division method
Advantage: the value of m is not critical

Hash Functions: The Multiplication Method
For a constant A, 0 < A < 1:
h(k) = ⌊m (kA − ⌊kA⌋)⌋, where kA − ⌊kA⌋ is the fractional part of kA
Upshot:
– Choose m = 2^p
– Choose A not too close to 0 or 1
– Knuth: a good choice is A = (√5 − 1)/2 ≈ 0.618…
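The formula translates directly into code. A sketch using floating-point arithmetic (practical implementations often use fixed-point integer arithmetic instead, but floats keep the structure visible):

```python
import math

A = (math.sqrt(5) - 1) / 2        # Knuth's suggested constant, ~0.618

def h_mul(k, m):
    frac = (k * A) % 1.0          # the fractional part of kA
    return math.floor(m * frac)   # scale it into a slot number 0..m-1

m = 2 ** 14                       # m = 2^p is fine for this method
print(h_mul(123456, m))
```

Note that m being a power of 2 causes no clustering here, since the slot depends on the fractional part of kA rather than on the low bits of k.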

Hash Functions: Universal Hashing
As before, when attempting to foil a malicious adversary: randomize the algorithm
Universal hashing: pick a hash function randomly, in a way that is independent of the keys that are actually going to be stored
– Guarantees good performance on average, no matter what keys the adversary chooses
– Need a family of hash functions to choose from

Universal Hashing
Let H be a (finite) collection of hash functions
– …that map a given universe U of keys…
– …into the range {0, 1, …, m − 1}
H is said to be universal if:
– for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ H for which h(x) = h(y) is |H|/m
– In other words: with a random hash function from H, the chance of a collision between x and y is exactly 1/m (x ≠ y)
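One classic family with this property draws a and b at random and uses h_{a,b}(k) = ((a·k + b) mod p) mod m, for a prime p larger than any key. A sketch (the prime and helper name below are our choices for illustration):

```python
import random

P = 2_147_483_647                  # a Mersenne prime; assumes all keys < P

def make_hash(m, rng=random):
    # Picking a and b at random selects one member of the family H.
    a = rng.randrange(1, P)        # a in {1, ..., p-1}
    b = rng.randrange(0, P)        # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % P) % m

h = make_hash(100)                 # the function chosen once, before any keys arrive
print(0 <= h(123456) < 100)        # prints True: slots always land in 0..m-1
```

The key point is that the random draw happens before the adversary's keys are seen, so no fixed input set can force many collisions on average.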

Universal Hashing
Theorem 11.3:
– Choose h from a universal family of hash functions
– Hash n keys into a table T of m slots, n ≤ m
– Use chaining to resolve collisions
If key k is not in the table, the expected length of the list that k hashes to is α = n/m
If key k is in the table, the expected length of the list that holds k is 1 + α

Open Addressing
Basic idea (details in Section 11.4):
– To insert: if the slot is full, try another slot, …, until an open slot is found (probing)
– To search: follow the same sequence of probes as would be used when inserting the element
  - If we reach an element with the correct key, return it
  - If we reach a NULL pointer, the element is not in the table
Good for fixed sets (adding but no deletion)
The table needn’t be much bigger than n
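The probing idea above can be sketched with linear probing, the simplest probe sequence: slot h(k), then h(k)+1, h(k)+2, … mod m. Matching the fixed-set caveat, this sketch supports insert and search but not delete:

```python
class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.keys = [None] * m
        self.vals = [None] * m

    def insert(self, k, v):
        for i in range(self.m):
            j = (hash(k) + i) % self.m     # probe sequence: h(k), h(k)+1, ...
            if self.keys[j] is None or self.keys[j] == k:
                self.keys[j], self.vals[j] = k, v
                return
        raise RuntimeError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = (hash(k) + i) % self.m     # follow the same probe sequence
            if self.keys[j] is None:       # reached NULL: k is not in the table
                return None
            if self.keys[j] == k:
                return self.vals[j]
        return None

t = LinearProbingTable(8)
t.insert(1, "a")
t.insert(9, "b")                           # collides with 1, lands in the next slot
print(t.search(9))                         # prints "b"
```

Search correctness depends on never leaving a hole in a probe sequence, which is exactly why deletion is awkward in open addressing (it requires special "deleted" markers rather than plain NULLs).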