Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
Data Structures Using C++ 2E
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing CS 3358 Data Structures.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Hash Table March COP 3502, UCF.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
1 Chapter 5 Hashing General ideas Methods of implementing the hash table Comparison among these methods Applications of hashing Compare hash tables with.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
Data Structures Week 6 Further Data Structures The story so far  We understand the notion of an abstract data type.  Saw some fundamental operations.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
1.  We’ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees.  The implementation of hash tables.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
1 Symbol Tables The symbol table contains information about –variables –functions –class names –type names –temporary variables –etc.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hash Tables - Motivation
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Hashing Lecture 10. More Efficient Searching Question: Can searching be more efficient than logarithmic O(log n) significantly? Example: a set of students.
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
CE 221 Data Structures and Algorithms
Hashing (part 2) CSE 2011 Winter March 2018.
CSCI 210 Data Structures and Algorithms
Hashing - resolving collisions
Hashing Alexandra Stefan.
Handling Collisions Open Addressing SNSCT-CSE/16IT201-DS.
Hashing Alexandra Stefan.
Hash Table.
Resolving collisions: Open addressing
CSE 326: Data Structures Hashing
Hashing Alexandra Stefan.
EE 312 Software Design and Implementation I
Hashing.
Data Structures and Algorithm Analysis Hashing
EE 312 Software Design and Implementation I
CSE 373: Data Structures and Algorithms
Presentation transcript:

Hashing 1 Hashing

Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks, queues, … n Nonlinear ones: trees, graphs (relations between elements are explicit) n Now for the case ‘relation is not important’, but want to be ‘efficient’ for searching (like in a dictionary)!   for example, on-line spelling check in words … * Generalizing an ordinary array, n direct addressing! n An array is a direct-address table * A set of N keys, compute the index, then use an array of size N n Key k at k -> direct address, now key k at h(k) -> hashing * Basic operation is in O(1)! n O(n) for lists n O(log n) for trees * To ‘hash’ (is to ‘chop into pieces’ or to ‘mince’), is to make a ‘map’ or a ‘transform’ …

Hashing 3 Example Applications * Compilers use hash tables (symbol table) to keep track of declared variables. * On-line spell checkers. After prehashing the entire dictionary, one can check each word in constant time and print out the misspelled word in order of their appearance in the document. * Useful in applications when the input keys come in sorted order. This is a bad case for binary search tree. AVL tree and B+-tree are harder to implement and they are not necessarily more efficient.

Hashing 4 Hash Table * Hash table is a data structure that support Finds, insertions, deletions (deletions may be unnecessary in some applications) * The implementation of hash tables is called hashing n A technique which allows the executions of above operations in constant average time * Tree operations that requires any ordering information among elements are not supported findMin and findMax n Successor and predecessor n Report data within a given range n List out the data in order

Hashing 5 Breaking comparison-based lower bounds * 5

Hashing 6 * 6

Hashing 7 * Reducing space 7

Hashing 8 * Collision Resolution by Chaining 8

Hashing 9 Analysis of Hashing with Chaining * 9

Hashing 10 Hash Functions * 10

Hashing 11 Dealing with non-numerical Keys * Can the keys be strings? * Most hash functions assume that the keys are natural numbers n if keys are not natural numbers, a way must be found to interpret them as natural numbers

Hashing 12 * Decimal expansion: 523 = 5*10^2+2*10^1+3*10^0 * Hexadecimal: 5B3 = 5*16^2 + 11*16^1 + 3*16^0 * So what might be interpreted as for a string like ‘smith’?

Hashing 13 * Method 1: Add up the ASCII values of the characters in the string n Problems:  Different permutations of the same set of characters would have the same hash value  If the table size is large, the keys are not distribute well. e.g. Suppose m=10007 and all the keys are eight or fewer characters long. Since ASCII value <= 127, the hash function can only assume values between 0 and 127*8=1016

Hashing 14 * Method 2 n If the first 3 characters are random and the table size is 10,0007 => a reasonably equitable distribution n Problem  English is not random  Only 28 percent of the table can actually be hashed to (assuming a table size of 10,007) 27 2 a,…,z and space

Hashing 15 * Method 3 n computes n involves all characters in the key and be expected to distribute well

Hashing 16 * Collison resolution: Open Addressing 16

Hashing 17 Open Addressing * 17

Hashing 18 Linear Probing * f(i) =i n cells are probed sequentially (with wrap-around) n h i (K) = (hash(K) + i) mod m * Insertion: n Let K be the new key to be inserted, compute hash(K) n For i = 0 to m-1  compute L = ( hash(K) + I ) mod m  T[L] is empty, then we put K there and stop. n If we cannot find an empty entry to put K, it means that the table is full and we should report an error.

Hashing 19 Linear Probing Example * h i (K) = (hash(K) + i) mod m * E.g, inserting keys 89, 18, 49, 58, 69 with hash(K)=K mod 10 To insert 58, probe T[8], T[9], T[0], T[1] To insert 69, probe T[9], T[0], T[1], T[2]

Hashing 20 Quadratic Probing Example * f(i) = i 2 * h i (K) = ( hash(K) + i 2 ) mod m * E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10 To insert 58, probe T[8], T[9], T[(8+4) mod 10] To insert 69, probe T[9], T[(9+1) mod 10], T[(9+4) mod 10]

Hashing 21 Quadratic Probing * Two keys with different home positions will have different probe sequences n e.g. m=101, h(k1)=30, h(k2)=29 n probe sequence for k1: 30,30+1, 30+4, 30+9 n probe sequence for k2: 29, 29+1, 29+4, 29+9 * If the table size is prime, then a new key can always be inserted if the table is at least half empty (see proof in text book) * Secondary clustering n Keys that hash to the same home position will probe the same alternative cells n Simulation results suggest that it generally causes less than an extra half probe per search n To avoid secondary clustering, the probe sequence need to be a function of the original key value, not the home position

Hashing 22 Double Hashing * To alleviate the problem of clustering, the sequence of probes for a key should be independent of its primary position => use two hash functions: hash() and hash2() * f(i) = i * hash2(K) n E.g. hash2(K) = R - (K mod R), with R is a prime smaller than m

Hashing 23 Double Hashing Example * h i (K) = ( hash(K) + f(i) ) mod m; hash(K) = K mod m * f(i) = i * hash2(K); hash2(K) = R - (K mod R), * Example: m=10, R = 7 and insert keys 89, 18, 49, 58, 69 To insert 49, hash2(49)=7, 2 nd probe is T[(9+7) mod 10] To insert 58, hash2(58)=5, 2 nd probe is T[(8+5) mod 10] To insert 69, hash2(69)=1, 2 nd probe is T[(9+1) mod 10]

Hashing 24 Choice of hash2() * Hash2() must never evaluate to zero * For any key K, hash2(K) must be relatively prime to the table size m. Otherwise, we will only be able to examine a fraction of the table entries. n E.g.,if hash(K) = 0 and hash2(K) = m/2, then we can only examine the entries T[0], T[m/2], and nothing else! * One solution is to make m prime, and choose R to be a prime smaller than m, and set hash2(K) = R – (K mod R) * Quadratic probing, however, does not require the use of a second hash function n likely to be simpler and faster in practice

Hashing 25 Deletion in Open Addressing * Actual deletion cannot be performed in open addressing hash tables n otherwise this will isolate records further down the probe sequence * Solution: Add an extra bit to each table entry, and mark a deleted slot by storing a special value DELETED (tombstone) or it’s called ‘lazy deletion’.

Hashing 26 Re-hashing * If the table is full * Double the size and re-hash everything with a new hashing function

Hashing 27 Analysis of Open Addressing * 27

Hashing 28 Comparison between BST and hash tables 28 BSTHash tables Comparison-basedNon-comparison-based Keys stored in sorted orderKeys stored in arbitrary order More operations are supported: min, max, neighbor, traversal Only search, insert, delete Can be augmented to support range queries Do not support range queries In C++: std::mapIn C++: std::unordered_map