Hash functions Open addressing

Presentation transcript:

Hash Tables
Hash functions, open addressing
March 07, 2018
Cinda Heeren / Geoffrey Tien

Hash functions
A hash function is a function that maps key values to array indexes. It is computed in two steps: first map the key value to an integer, then map that integer to a legal array index. Hash functions should be fast, deterministic, and uniform.

Hash function speed
Hash functions should be fast and easy to calculate, because access to a hash table should be nearly instantaneous and take constant time. Most common hash functions require a single division on the representation of the key. Converting the key to a number should also be quick.

Deterministic hash functions
A hash function must be deterministic: for a given input it must always return the same value. Otherwise it will not generate the same array index, and the item will not be found in the hash table. Hash functions should therefore not depend on system time, memory location, or pseudo-random numbers.

Scattering data
A typical hash function usually results in some collisions, where two different search keys map to the same index. The goal is to reduce the number and effect of collisions. To achieve this, the data should be distributed evenly over the table.

Possible values and expected inputs
Any set of values stored in a hash table is an instance of the universe of possible values. The universe of possible values may be much larger than the instance we wish to store. For example, there are 26^10 (about 1.4 × 10^14) possible combinations of 10 lowercase letters, but we might want a hash table to store only 1,000 names.

Uniformity
A good hash function generates each value in the output range with the same probability. That is, each legal hash table index has the same chance of being generated. This property should hold both for the universe of possible values and for the expected inputs, so the expected inputs should also be scattered evenly over the hash table.

A bad hash function
A hash table is to store 1,000 numeric estimates that can range from 1 to 1,000,000. The hash function is h(estimate) = estimate % n, where n = array size = 1,000. Is the distribution of values from the universe of all possible values uniform? What about the distribution of expected values?
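One way to explore the two questions on this slide is to count how often each index is produced. The sketch below is only an illustration: the name `estimate_hash` and the list `sample_estimates` are made up here, and the sample assumes that people tend to give estimates rounded to the nearest thousand, which the slide does not state.

```python
from collections import Counter

TABLE_SIZE = 1000  # n from the slide


def estimate_hash(estimate, table_size=TABLE_SIZE):
    """h(estimate) = estimate % n, as given on the slide."""
    return estimate % table_size


# Universe of possible values: every integer from 1 to 1,000,000.
universe_counts = Counter(estimate_hash(e) for e in range(1, 1_000_001))

# Hypothetical expected inputs (an assumption, not data from the slides):
# estimates rounded to the nearest thousand.
sample_estimates = [5_000, 12_000, 40_000, 250_000, 1_000_000]
sample_counts = Counter(estimate_hash(e) for e in sample_estimates)

print(set(universe_counts.values()))  # how many universe values hit each index
print(sample_counts)                  # which indexes the rounded sample hits
```

Under that assumption, the universe is spread evenly while every rounded estimate lands on index 0, which is the kind of mismatch the slide is asking about.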

Another bad hash function
A hash table is to store 676 names. The hash function considers just the first two letters of a name. Each letter is given a value, where a = 1, b = 2, … The function is h(name) = (value of 1st letter * 26 + value of 2nd letter) % 676. Is the distribution of values from the universe of all possible values uniform? What about the distribution of expected values?
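A minimal sketch of the two-letter hash function described above, to make the collision behaviour concrete. The function name `two_letter_hash` and the sample names are hypothetical, and the code assumes each name starts with at least two ASCII letters.

```python
def letter_value(ch):
    """Letter value from the slide: a = 1, b = 2, ..., z = 26 (case-insensitive)."""
    return ord(ch.lower()) - ord("a") + 1


def two_letter_hash(name, table_size=676):
    """(value of 1st letter * 26 + value of 2nd letter) % 676."""
    return (letter_value(name[0]) * 26 + letter_value(name[1])) % table_size


# Hypothetical sample names: any names sharing their first two letters collide.
for name in ("Stevens", "Stewart", "Stone", "Adams"):
    print(name, two_letter_hash(name))
```

Because only the first two letters are used, every name beginning with "St" maps to the same index, however different the rest of the name is.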

General principles
Use the entire search key in the hash function. If the hash function uses modulo arithmetic, make the table size a prime number. A simple and (usually) effective hash function is: convert the key value to an integer x, then compute h(x) = x mod tablesize, where tablesize is the first prime number larger than twice the number of expected values. Determining a good hash function is a complex subject; for now, just use the provided hash functions.
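The table-size rule above can be sketched as follows. This is one straightforward way to find the prime, using trial division; the names `is_prime`, `choose_table_size`, and `simple_hash` are illustrative and do not come from the slides.

```python
def is_prime(n):
    """Trial division; adequate for the table sizes used in this kind of exercise."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def choose_table_size(expected_items):
    """First prime number larger than twice the number of expected values."""
    candidate = 2 * expected_items + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def simple_hash(x, table_size):
    """h(x) = x mod tablesize, for a key already converted to an integer x."""
    return x % table_size


size = choose_table_size(1000)
print(size, simple_hash(123_456, size))
```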

Converting strings to integers
In the previous examples, we had a convenient numeric key that could easily be converted to an array index. What about non-numeric keys (e.g. strings)? Strings are already numbers, in a way, e.g. through 7/8-bit ASCII encoding. For "cat": 'c' = 0110 0011, 'a' = 0110 0001, 't' = 0111 0100. Read as a single 24-bit number, "cat" becomes 6,513,012.
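The "cat" conversion can be checked with a short sketch. It assumes the key contains only ASCII characters, and the function name `string_to_int` is made up for this illustration.

```python
def string_to_int(text):
    """Read the ASCII bytes of text as one big-endian unsigned integer."""
    return int.from_bytes(text.encode("ascii"), "big")


value = string_to_int("cat")
print(value)                  # decimal value of the three bytes
print(format(value, "024b"))  # the same value as 24 bits
```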

Strings to integers
If each letter of a string is represented as an 8-bit number, then for a string of length n:
value = ch_0 * 256^(n-1) + … + ch_(n-2) * 256^1 + ch_(n-1) * 256^0
For long strings, this value will be very large and may result in overflow (e.g. with a 64-bit integer, a 9-character string will overflow). The expression can be factored as
(…((ch_0 * 256 + ch_1) * 256 + ch_2) * …) * 256 + ch_(n-1)
This technique is called Horner's method, and it minimizes the number of arithmetic operations. Overflow can then be prevented by applying the modulo operator after each expression in parentheses.

Horner's method example
Consider the integer representation of some string, e.g. "Grom":
71*256^3 + 114*256^2 + 111*256^1 + 109*256^0
= 1,191,182,336 + 7,471,104 + 28,416 + 109
= 1,198,681,965
Factoring this expression results in
(((71*256 + 114) * 256 + 111) * 256 + 109) = 1,198,681,965
Assume that this key is to be hashed to an index using the hash function key % 23:
1,198,681,965 % 23 = 4
((((71 % 23) * 256 + 114) % 23 * 256 + 111) % 23 * 256 + 109) % 23 = 4
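A minimal sketch of Horner's method with the modulo applied at every step, so intermediate values stay small. The function name `horner_hash` and its `base` parameter are illustrative, and the code assumes ASCII keys. Python integers do not overflow, so here the per-step modulo only mirrors what a fixed-width language would need.

```python
def horner_hash(text, table_size, base=256):
    """Hash a string by Horner's method, reducing modulo table_size after each step."""
    index = 0
    for byte in text.encode("ascii"):
        # Shift the running value one "digit" left, add the next character, reduce.
        index = (index * base + byte) % table_size
    return index


# The "Grom" example from the slide, with table size 23.
full_value = int.from_bytes(b"Grom", "big")
print(full_value, full_value % 23, horner_hash("Grom", 23))
```

Both routes give the same index, because taking the remainder after each step does not change the final remainder.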

Open addressing: linear probing

Collision handling
A collision occurs when two different keys are mapped to the same index. Collisions may occur even when the hash function is good; by the pigeonhole principle they are inevitable whenever there are more possible keys than array indexes. There are two main ways of dealing with collisions: open addressing and separate chaining.

Open addressing
Idea: when an insertion results in a collision, look for an empty array element. Start at the index to which the hash function mapped the inserted item, then look for a free space in the array following a particular search pattern, known as probing. There are three major open addressing schemes: linear probing, quadratic probing, and double hashing.

Linear probing
The hash table is searched sequentially, starting with the original hash location. Each time the table is probed for a free location, add one to the index: search h(search key) + 1, then h(search key) + 2, and so on until an available location is found. If the sequence of probes reaches the last element of the array, wrap around to arr[0].
Linear probing leads to primary clustering: the table contains groups of consecutively occupied locations. These clusters tend to get larger as time goes on, reducing the efficiency of the hash table.

Linear probing example
The hash table is of size 23 (indexes 0 to 22). The hash function is h = x mod 23, where x is the search key value. The search key values currently in the table are shown below; all other indexes are empty.
Index:  6   9   12  21
Key:    29  32  58  21

Linear probing example
Insert 81: h = 81 mod 23 = 12, which collides with 58, so use linear probing to find a free space. First look at 12 + 1, which is free, so insert the item at index 13.
Index:  6   9   12  13  21
Key:    29  32  58  81  21

Linear probing example
Insert 35: h = 35 mod 23 = 12, which collides with 58, so use linear probing to find a free space. First look at 12 + 1, which is occupied, so look at 12 + 2 and insert the item at index 14.
Index:  6   9   12  13  14  21
Key:    29  32  58  81  35  21

Linear probing example
Insert 60: h = 60 mod 23 = 14. Note that even though the key doesn't hash to 12, it still collides with an item that did. First look at 14 + 1, which is free, so insert the item at index 15.
Index:  6   9   12  13  14  15  21
Key:    29  32  58  81  35  60  21

Linear probing example
Insert 12: h = 12 mod 23 = 12. Indexes 12 to 15 are all occupied, so the item will be inserted at index 16. Notice that primary clustering is beginning to develop, making insertions less efficient.
Index:  6   9   12  13  14  15  16  21
Key:    29  32  58  81  35  60  12  21
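The insertions in this example can be replayed with a short sketch of linear probing. The function name `linear_probe_insert` and the use of `None` to mark an empty slot are choices made for this illustration, not something the slides prescribe.

```python
def linear_probe_insert(table, key):
    """Insert key using h = key mod table size and linear probing; return the index used."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size  # wrap around past the last element
        if table[index] is None:      # None marks an empty slot
            table[index] = key
            return index
    raise RuntimeError("hash table is full")


table = [None] * 23
for key in (29, 32, 58, 21):          # initial contents from the example
    linear_probe_insert(table, key)

placed = {key: linear_probe_insert(table, key) for key in (81, 35, 60, 12)}
print(placed)
```

Replaying the four insertions places 81, 35, 60, and 12 at indexes 13 to 16, directly after 58 at index 12, which is the cluster the example describes.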

Try It!
Insert the following items into a hash table of 29 elements using linear probing: 61, 19, 32, 72, 3, 76, 5, 34
(a) Using the hash function h(x) = x mod 29
(b) Using the hash function h(x) = (x * 17) mod 29

Searching
Searching for an item is similar to insertion. Find 59: h = 59 mod 23 = 13. Index 13 does not contain 59, but is occupied, so use linear probing to find 59 or an empty space. Index 17 is the first empty space, so conclude that 59 is not in the table.
Index:  6   9   12  13  14  15  16  21
Key:    29  32  58  81  35  60  12  21
Search must use the same probe method as insertion. It terminates when the item is found, an empty space is reached, or the entire table has been searched.
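A minimal sketch of the search procedure, using the same probe sequence as insertion and the three termination conditions listed above. The function name `linear_probe_search` and the `None` marker for empty slots are assumptions made for this illustration.

```python
def linear_probe_search(table, key):
    """Return the index holding key, or None if key is not in the table."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size
        if table[index] is None:   # reached an empty space: key is absent
            return None
        if table[index] == key:    # item found
            return index
    return None                    # entire table searched without success


# Table state from the slide (index -> key); every other slot is empty.
table = [None] * 23
for index, key in {6: 29, 9: 32, 12: 58, 13: 81, 14: 35, 15: 60, 16: 12, 21: 21}.items():
    table[index] = key

print(linear_probe_search(table, 59))  # probes 13, 14, 15, 16, then hits empty 17
print(linear_probe_search(table, 12))
```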

Hash Table Efficiency
When analyzing the efficiency of hashing it is necessary to consider the load factor, λ:
λ = number of items / table size
As the table fills, λ increases, and the chance of a collision occurring also increases. Performance decreases as λ increases. Unsuccessful searches make more comparisons, because an unsuccessful search only ends when a free element is found. It is important to base the table size on the largest possible number of items. The table size should be selected so that λ does not exceed 1/2.
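The effect of λ on unsuccessful searches can be observed with a small simulation. This is a rough sketch under stated assumptions, not an analysis from the slides: the keys are pseudo-random integers drawn with a fixed seed (duplicates are not filtered out), the table size 1009 is an arbitrary prime, and the function names are made up for this illustration.

```python
import random


def average_unsuccessful_probes(table):
    """Average, over every start index, of the slots examined before an empty one is found."""
    size = len(table)
    total = 0
    for start in range(size):
        probes = 1
        index = start
        while table[index] is not None:
            index = (index + 1) % size
            probes += 1
        total += probes
    return total / size


def probes_by_load_factor(size, load_factors, seed=0):
    """Fill one table by linear probing, measuring at each requested load factor."""
    if any(lam >= 1 for lam in load_factors):
        raise ValueError("load factors must be below 1 so an empty slot remains")
    rng = random.Random(seed)
    table = [None] * size
    count = 0
    results = {}
    for lam in sorted(load_factors):
        while count / size < lam:
            key = rng.randrange(1_000_000)
            index = key % size
            while table[index] is not None:
                index = (index + 1) % size
            table[index] = key
            count += 1
        results[lam] = average_unsuccessful_probes(table)
    return results


results = probes_by_load_factor(1009, [0.25, 0.5, 0.9])
for lam, probes in results.items():
    print(f"load factor {lam}: {probes:.2f} probes per unsuccessful search")
```

The average can only grow as items are added, since each new item lengthens the probe sequence from at least one start index. The growth is typically modest up to λ = 1/2 and much steeper beyond it.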

Readings for this lesson
Carrano & Henry, Chapter 18.4.2 (Collision resolution)
Next class: Collision resolution (continued), Chapter 18.4.6 (Chaining)