Hash table CSC317 We have elements with key and satellite data

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables.
Hash Tables CIS 606 Spring 2010.
Hashing.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Spring 2015 Lecture 6: Hash Tables
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
CSC317 Selection problem q p r Randomized‐Select(A,p,r,i)
Data Structures Using C++ 2E
Hashing Jeff Chastine.
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing Alexandra Stefan.
Hashing Alexandra Stefan.
Data Structures Using C++ 2E
Hash functions Open addressing
Hashing and Hash Tables
Computer Science 2 Hashing
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Hash Tables.
Data Structures and Algorithms
Dictionaries and Their Implementations
Hashing.
Introduction to Algorithms 6.046J/18.401J
Resolving collisions: Open addressing
Data Structures and Algorithms
Hash Tables – 2 Comp 122, Spring 2004.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
EE 312 Software Design and Implementation I
CS 5243: Algorithms Hash Tables.
Ch Hash Tables Array or linked list Binary search trees
Collision Handling Collisions occur when different elements are mapped to the same cell.
Hashing.
CS 3343: Analysis of Algorithms
Slide Sources: CLRS “Intro. To Algorithms” book website
EE 312 Software Design and Implementation I
Hash Tables – 2 1.
Lecture-Hashing.
Presentation transcript:

Hash table CSC317 We have elements with key and satellite data Operations performed: Insert, Delete, Search/lookup We don’t maintain order information We’ll see that all operations on average O(1) But worse case can be O(n) CSC317

Hash table Simple implementation: If universe of keys comes from a small set of integers [0..9], we can store directly in array using the keys as indices into the slots This is also called a direct-address table Search time just like in array – O(1)! CSC317

CSC317 Example: Array versus Hash table Imagine we have keys corresponding to friends that we want to store Could use huge array, with each friend’s name mapped to some slot in array (eg, one slot in array for every possible name; each letter one of 26 characters, n letters in each name…) We could insert, find key, and delete element in O(1) time – very fast! But huge waste of memory, with many slots empty in many applications! John = A[34] Jane = A[33334] CSC317

CSC317 Example: versus Linked List An alternative might be to use a linked list with all the friend names linked (e.g. John -> Jane -> Bill) Pro: This is not wasteful because we only store the names that we want Con: Search goes with O(n) Can we have an approach that is fast and not wasteful (like best of both worlds)? CSC317

Hash table Extremely useful alternative to static array for insert, search, and delete in O(1) time (on average) – VERY FAST Useful when the universe is large, but at any given time number of keys stored is small relative to total number of possible keys (not wasteful like a huge static array) We usually don’t store key directly as index into array, but rather compute a hash function of the key k, h(k), as index Problem: Collision (i.e. two keys map to the same slot) CSC317

CSC317 Collisions Guaranteed to happen, when more keys than slots Or if “bad” hashing function – all keys were hashed to just one slot of hash table (more later …) Even by chance, collisions are likely to happen. Consider keys that are birthdays. Recall the birthday paradox – room of 28 people, then 2 people have a 50 percent chance to have same birthday Resolution? Chaining: CSC317

Analysis Worst case: all n elements map to one slot (one big linked list…). O(n) Average case: Need to define: m number of slots n number of keys in hashtable alpha = load factor Intuitively, alpha is average number elements per linked list. CSC317

CSC317 Hash table analyses Example: let’s take n = m; alpha = 1 Good hash function: each element of hash table has one linked list Bad hash function: hash function always maps to first slot of hash table, one linked list size n, and all other slots are empty. Define: Unsuccessful search: new key searched for doesn’t exist in hash table (we are searching for a new friend, Sarah, who is not yet in hash table) Successful search: key we are searching for already exists in hash table (we are searching for Tom, who we have already stored in hash table) CSC317

CSC317 Hash table analyses Theorem: Assuming uniform hashing, unsuccessful search takes on average O(α + 1). Here the actual search time is O(α) and the added 1 is the constant time to compute a hash function on the key that is being searched. Interpretation: n = m, θ(1+1) = θ(1) n = 2m, θ(2+1) = θ(1) n = m3, θ(m2+1) ≠ θ(1) we say constant time on average when n and m similar order, but not generally guaranteed CSC317

CSC317 Hash table analyses Theorem: Intuitively: Search for key k, hash function will map onto slot h(k). We need to search through linked list in the slot mapped to, up until the end of the list (because key is not found = unsuccessful search). For n=2m, on average linked list is length 2. More generally, on average length is α, our load factor. CSC317

CSC317 Hash table analyses Proof with indicator random variables: Consider keys j in hash table (n of them), and key k not in hash table that we are searching for. For each j: As with indicator (binary) random variables: By our definition of the random variable: Since we assume uniform hashing: If key x hashes to same slot as key j (i.e. h(x) = h(j)) otherwise CSC317

CSC317 Hash table analyses Proof with indicator random variables: We want to consider key x with regards to every possible key j in hash table: Linearity of expectations: CSC317

CSC317 Hash table analyses We’ve proved that average search time is O(α) for unsuccessful search and uniform hashing. It is O(α+1) if we also count the constant time of mapping a hash function for a key CSC317

CSC317 Hash table analyses We’ve proved that average search time is O(α) for unsuccessful search and uniform hashing. It is O(α+1) if we also count the constant time of mapping a hash function for a key. Theorem: Assuming uniform hashing, successful search takes on average O(α+1). Here, the actual search time is O(α) and the added 1 is the constant time to compute a hash function on the key that is being searched. Intuitively, successful search should take less than unsuccessful. Proof in book with indicator random variables;; more involved than before; we won’t prove here. CSC317

What makes a good hash function? Uniform hashing: Each key equally likely to hash to each of m slots (1/m probability) Could be hard in practice: we don’t always know the distributions of keys we will encounter and want to store – could be biased We would like to choose hash functions that do a good job in practice at distributing keys evenly CSC317

CSC317 Example 1 for potential bias Key = phone number Bad hash function: First three digits of phone number (eg, what if all friends in Miami, and in any case likely to have patterns of regularity) Better although could still have patterns: last 3 digits of phone number CSC317

CSC317 Example 2 for potential bias Hash function: h(k) = k mod 100 with k the key For example, keys happen to be all even numbers key1 = 1986 key1 mod 100 = 86 key2 = 2014 key2 mod 100 = 14 Pattern: keys all map to even values. All odd slots of hash table unused! Question: How do we determine a (preferably good) hash function? CSC317

CSC317 How do we determine a (preferably good) hash function? Division method • h(k) = k mod m, m = number of slots, k = key Example: m=20; k=91, h(k) = k mod m = 11 Pros: Any key will indeed map to one of m slots (as we want from a hash function, mapping a key to one of m slots) Fast and simple Cons (pay attention to …): Need to avoid certain values of m to avoid bias (as in the even number example) Best practices: m that is a prime number and not too close to base of 2 or 10 CSC317

Resolution to collisions Chaining Another approach: open addressing Only one object per slot is allowed As a consequence because α load factor, n number of keys, m number of slots No pointers or linked lists Good when space is limited Deleting keys more tricky than chaining CSC317

Main approach: If collision, keep probing hash table until empty slot is found. To determine which slots to probe, we extend the hash function to include the probe number (starting from 0) as a second input. Formally, we require that for every key k, the probe sequence is a permutation of CSC317

CSC317 We could do the following: HASH-INSERT(T, k) 1 i=0 2 repeat 3 j = h(k, i) if T[j] ==NIL 5 T[j] =k 6 return j 7 else i = i + 1 8 until i == m 9 error ‘hash table overflow’ What’s the problem? Only works well when hashing is uniform (i.e. each permutation of has same probability). Better solutions: Linear probing Double hashing CSC317

Linear probing When inserting into hash table (also when searching)- If hash function results in collision, try the next available slot; if that results in collision try the next slot after that, and so on (if slot 3 is full, try slot 4, then slot 5, then slot 6, and so on) using the following auxiliary hash function: for all normal hash function as before to make sure it is mapped overall to one of the m slots in the hash table number of slots to skip for probe i CSC317

Linear probing CSC317 2 9 16 1 Example: 2 6 3 4 5 6 1 2 3 4 5 6 2 9 16 6 Example: Insert keys 2, 6, 9, 16 CSC317

Linear probing CSC317 Pro: easy to implement Con: can result in primary clustering keys can cluster by taking adjacent slots in hash table, since each time searching for next available slot when there is collision Consequence: Longer search time … What about if we insert 2 or 3 slots away instead of 1 slot away? CSC317

Linear probing What about if we insert 2 or 3 slots away instead of 1 slot away? Answer: still have problem that if two keys initially mapped to same hash slot, they have identical probe sequences, since offset of next slot to check doesn’t depend on key What to do? Make offset determined by another key function – double hashing! CSC317