David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

David Luebke 1 6/7/2014 CS 332: Algorithms Skip Lists Introduction to Hashing.
David Luebke 1 6/7/2014 ITCS 6114 Skip Lists Hashing.
Hash Tables.
Hash Tables Introduction to Algorithms Hash Tables CSE 680 Prof. Roger Crawfis.
Hashing.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
© 2004 Goodrich, Tamassia Hash Tables1  
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Hashing CS 3358 Data Structures.
Data Structures – LECTURE 11 Hash tables
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Maps & Hashing Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Hashing Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Hashing Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing1 Hashing. hashing2 Observation: We can store a set very easily if we can use its keys as array indices: A: e.g. SEARCH(A,k) return A[k]
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing Amihood Amir Bar Ilan University Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hash Tables. 2 Exercise 2 /* Exercise 1 */ void mystery(int n) { int i, j, k; for (i = 1; i
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Instructor Neelima Gupta Expected Running Times and Randomized Algorithms Instructor Neelima Gupta
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
David Luebke 1 3/19/2016 CS 332: Algorithms Augmenting Data Structures.
Hashing Jeff Chastine.
Hash table CSC317 We have elements with key and satellite data
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Introduction to Algorithms 6.046J/18.401J
Hash Tables – 2 Comp 122, Spring 2004.
CSCE 3110 Data Structures & Algorithm Analysis
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
CS 5243: Algorithms Hash Tables.
CS 3343: Analysis of Algorithms
Hash Tables – 2 1.
Presentation transcript:

David Luebke 1 11/26/2015 Hash Tables

David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search, insertion, and deletion ■ We typically don’t care about sorted order ■ Example: given a student’s SSN, give their GPA.

David Luebke 3 11/26/2015 Hash Tables ● More formally: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support: ○ Insert (T, x) ○ Delete (T, x) ○ Search(T, x) ■ We want these to be fast, but don’t care about sorting the records ● The structure we will use is a hash table ■ Supports all the above in O(1) expected time!

David Luebke 4 11/26/2015 Hashing: Keys ● In the following discussions we will consider all keys to be (possibly large) natural numbers ● How can we convert ASCII strings to natural numbers for hashing purposes?

David Luebke 5 11/26/2015 Direct Addressing ● Suppose: ■ The range of keys is 0..m-1 ■ Keys are distinct ● The idea: ■ Set up an array T[0..m-1] in which ○ T[i] = xif x  T and key[x] = i ○ T[i] = NULLotherwise ■ This is called a direct-address table ○ Operations take O(1) time! ○ So what’s the problem?

David Luebke 6 11/26/2015 The Problem With Direct Addressing ● Direct addressing works well when the range m of keys is relatively small ● But what if the keys are 32-bit integers? ■ Problem 1: direct-address table will have 2 32 entries, more than 4 billion ■ Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be ● Solution: map keys to smaller range 0..m-1 ● This mapping is called a hash function

David Luebke 7 11/26/2015 Example: SSN’s ● Suppose we have student records, that we want to index by the key of social security numbers. ● Direct Addressing means having an array of 1,000,000,000 entries ● What if I just have an array of 5,000,000 entries and divide each number by two before putting the record in the array? ● Dividing by two is the hash function ● What’s the problem with doing this?

David Luebke 8 11/26/2015 Hash Functions ● Next problem: collision T 0 m - 1 h(k 1 ) h(k 4 ) h(k 2 ) = h(k 5 ) h(k 3 ) k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys)

David Luebke 9 11/26/2015 Resolving Collisions ● How can we solve the problem of collisions? ● Solution 1: chaining ● Solution 2: open addressing

David Luebke 10 11/26/2015 Open Addressing ● Basic idea ■ To insert: if slot is full, try another slot, …, until an open slot is found (probing) ■ To search, follow same sequence of probes as would be used when inserting the element ○ If reach element with correct key, return it ○ If reach a NULL pointer, element is not in table ● Good for fixed sets (adding but no deletion)

David Luebke 11 11/26/2015 Open Addressing Hashing ● Approach ■ Hash table contains objects ■ Probe  examine table entry ■ Collision ○ Move K entries past current location ○ Wrap around table if necessary ■ Find location for X Examine entry at A[ key(X) ] If entry = X, found If entry = empty, X not in hash table Else increment location by K, repeat

David Luebke 12 11/26/2015 Open Addressing Hashing ● Approach ■ Linear probing ○ Move one slot in the table at a time ○ May form clusters of contiguous entries ■ Deletions – the tricky part ○ Find location for X ○ If X inside cluster, leave non-empty marker ■ Insertion ○ Find location for X ○ Insert if X not in hash table ○ Can insert X at first non-empty marker

David Luebke 13 11/26/2015 Open Addressing Example ● Hash codes ■ H(A) = 6H(C) = 6 ■ H(B) = 7H(D) = 7 ● Hash table ■ Size = 8 elements ■  = empty entry ■ * = non-empty marker ● Linear probing ■ Collision  move 1 entry past current location 

David Luebke 14 11/26/2015 Open Addressing Example ● Operations (A=>6, B =>7, C=>6, D=>7) ■ Insert A, Insert B, Insert C, Insert D AA ABAB ABCABC DABCDABC

David Luebke 15 11/26/2015 Open Addressing Example ● Operations (A=>6, B =>7, C=>6, D=>7) ■ Find A, Find B, Find C, Find D DABCDABC DABCDABC DABCDABC DABCDABC

David Luebke 16 11/26/2015 Open Addressing Example ● Operations (A=>6, B =>7, C=>6, D=>7) ■ Delete A, Delete C, Find D, Insert C DCB*DCB* D*BCD*BC D*B*D*B* D*B*D*B*

David Luebke 17 11/26/2015 Types of Probing ● Linear: If there’s a collision, skip by a constant number of entries (we did 1 at a time) ■ Problem: creates clusters ● Quadratic probing: for k, the key, and i, the number of times I’ve probed so far, h(k,i) = (h’(k) + ci + di 2 ) mod m where c and d are constants and h’ is yet another hash function ■ Result: the more collisions I get, the more space I skip

David Luebke 18 11/26/2015 Types of Probing ● Double hashing h(k,i) = (h1(k) + i*h2(k)) mod m for h1 and h2 auxiliary hash functions. ■ first hash depends on h1 only ■ collisions skip different amounts ● Some of the best hash functions in practice are double hash functions

David Luebke 19 11/26/2015 Chaining ● Chaining puts elements that hash to the same slot in a linked list: —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 k4k4 —— k5k5 k2k2 k3k3 k8k8 k6k6 k7k7

David Luebke 20 11/26/2015 Chaining ● How do we insert an element? —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 k4k4 —— k5k5 k2k2 k3k3 k8k8 k6k6 k7k7

David Luebke 21 11/26/2015 Chaining —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 k4k4 —— k5k5 k2k2 k3k3 k8k8 k6k6 k7k7 ● How do we delete an element?

David Luebke 22 11/26/2015 Chaining ● How do we search for a element with a given key? —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 k4k4 —— k5k5 k2k2 k3k3 k8k8 k6k6 k7k7

David Luebke 23 11/26/2015 Analysis of Chaining ● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot ● Given n keys and m slots in the table: the load factor  = n/m = average # keys per slot ● What will be the average cost of an unsuccessful search for a key?

David Luebke 24 11/26/2015 Analysis of Chaining ● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot ● Given n keys and m slots in the table, the load factor  = n/m = average # keys per slot ● What will be the average cost of an unsuccessful search for a key? A: O(1+  )

David Luebke 25 11/26/2015 Analysis of Chaining ● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot ● Given n keys and m slots in the table, the load factor  = n/m = average # keys per slot ● What will be the average cost of an unsuccessful search for a key? A: O(1+  ) ● What will be the average cost of a successful search?

David Luebke 26 11/26/2015 Analysis of Chaining ● Assume simple uniform hashing: each key in table is equally likely to be hashed to any slot ● Given n keys and m slots in the table, the load factor  = n/m = average # keys per slot ● What will be the average cost of an unsuccessful search for a key? A: O(1+  ) ● What will be the average cost of a successful search? A: O(1 +  /2) = O(1 +  )

David Luebke 27 11/26/2015 Analysis of Chaining Continued ● So the cost of searching = O(1 +  ) ● If the number of keys (n) is proportional to the number of slots in the table (m), what is  ? ● A:  = O(1) ■ In other words, we can make the expected cost of searching constant if we make  constant

David Luebke 28 11/26/2015 Choosing A Hash Function ● Clearly choosing the hash function well is crucial ■ What will a worst-case hash function do? ■ What will be the time to search in this case? ● What are desirable features of the hash function? ■ Should distribute keys uniformly into slots ■ Should not depend on patterns in the data

David Luebke 29 11/26/2015 Hash Functions: Worst Case Scenario ● Scenario: ■ You are given an assignment to implement hashing ■ You will self-grade in pairs, testing and grading your partner’s implementation ■ In a blatant violation of the honor code, your partner: ○ Analyzes your hash function ○ Picks a sequence of “worst-case” keys, causing your implementation to take O(n) time to search ● What’s an honest CS student to do?

David Luebke 30 11/26/2015 Hash Functions: Universal Hashing ● As before, when attempting to foil an malicious adversary: randomize the algorithm ● Universal hashing: pick a hash function randomly in a way that is independent of the keys that are actually going to be stored ■ Guarantees good performance on average, no matter what keys adversary chooses

David Luebke 31 11/26/2015 Practice ● Demonstrate the insertion of the following keys into a hash table with collisions resolved by chaining. Let the table have 9 slots and let the hash function be h(k) = k mod 9. ● Keys are: 5,28,19,15,20,33,12,17,10 ● Do the same for an open address table with linear probing.