Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,

Slides:



Advertisements
Similar presentations
Hash Tables CSC220 Winter What is strength of b-tree? Can we make an array to be as fast search and insert as B-tree and LL?
Advertisements

Preliminaries Advantages –Hash tables can insert(), remove(), and find() with complexity close to O(1). –Relatively easy to program Disadvantages –There.
Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Hashing. Hashing as a Data Structure Performs operations in O(c) –Insert –Delete –Find Is not suitable for –FindMin –FindMax –Sort or output as sorted.
Hashing The Magic Container. Interface Main methods: –Void Put(Object) –Object Get(Object) … returns null if not i –… Remove(Object) Goal: methods are.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
MA/CSSE 473 Day 28 Hashing review B-tree overview Dynamic Programming.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
Chapter 5: Hashing Collision Resolution: Separate Chaining Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova, Simpson College.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing as a Dictionary Implementation Chapter 19.
Hash Tables - Motivation
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
DS.H.1 Hashing Chapter 5 Overview The General Idea Hash Functions Separate Chaining Open Addressing Rehashing Extendible Hashing Application Example: Geometric.
Fundamental Structures of Computer Science II
CE 221 Data Structures and Algorithms
Data Structures Using C++ 2E
Hash Tables (Chapter 13) Part 2.
Handling Collisions Open Addressing SNSCT-CSE/16IT201-DS.
Data Structures Using C++ 2E
Quadratic probing Double hashing Removal and open addressing Chaining
Advanced Associative Structures
Hash Table.
Instructor: Lilian de Greef Quarter: Summer 2017
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
Hash Tables.
Collision Resolution Neil Tang 02/18/2010
Resolving collisions: Open addressing
CSE373: Data Structures & Algorithms Lecture 14: Hash Collisions
CS202 - Fundamental Structures of Computer Science II
Advanced Implementation of Tables
Tree traversal preorder, postorder: applies to any kind of tree
Pseudorandom number, Universal Hashing, Chaining and Linear-Probing
Collision Resolution Neil Tang 02/21/2008
Data Structures and Algorithm Analysis Hashing
DATA STRUCTURES-COLLISION TECHNIQUES
Hash Tables – 2 1.
Collision Resolution: Open Addressing Extendible Hashing
CSE 373: Data Structures and Algorithms
Presentation transcript:

Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova, Simpson College

Collision Resolution  Collision Resolution  Separate Chaining  Open Addressing  Linear Probing  Quadratic Probing  Double Hashing  Rehashing  Extendible Hashing

Open Addressing Invented by A. P. Ershov and W. W. Peterson in 1957 independently. Idea: Store collisions in the hash table. Table size - must be at least twice the number of the records

Open Addressing If collision occurs, next probes are performed following the formula: h i (x) = ( hash(x) + f(i) ) mod Table_Size where: h i (x) is an index in the table to insert x hash(x) is the hash function f(i) is the collision resolution function. i - the current attempt to insert an element

Open Addressing Problems with delete: a special flag is needed to distinguish deleted from empty positions. Necessary for the search function – if we come to a “deleted” position, the search has to continue as the deletion might have been done after the insertion of the sought key – the sought key might be further in the table.

Linear Probing f(i) = i Insert: If collision - probe the next slot. If unoccupied – store the key there. If occupied – continue probing next slot. Search: a) match – successful search b) empty position – unsuccessful search c) occupied and no match – continue probing. If end of the table - continue from the beginning

Key:A S E A R C H I N G E X A M P L E Hash S A E * A R C G H I N * E * * * * * X * * * A L M P * * * * * * E Example * - unsuccessful attempts

Linear Probing Disadvantage: “ Primary clustering” Large clusters tend to build up. Probability to fill a slot: ?? i filled slots slot a slot b slot a: (i+1)/M slot b: 1/M

Quadratic Probing Use a quadratic function to compute the next index in the table to be probed. The idea here is to skip regions in the table with possible clusters. f(i) = i 2

Quadratic Probing In linear probing we check the I-th position. If it is occupied, we check the I+1 st position, next I+2 nd, etc. In quadric probing, if the I-th position is occupied we check the I+1 st, next we check I+4 th, next I + 9 th, etc.

Double Hashing Purpose – to overcome the disadvantage of clustering. A second hash function to get a fixed increment for the “probe” sequence. hash2(x) = R - (x mod R) R: prime, smaller than table size. f(i) = i*hash2(x)

Rehashing Table size : M > N For small load factor the performance is much better, than for N/M close to one. Best choice: N/M = 0.5 When N/M > rehashing

Rehashing Build a second table twice as large as the original and rehash there all the keys of the original table. Expensive operation, running time O(N) However, once done, the new hash table will have good performance.

Extendible Hashing  external storage  N records in total to store,  M records in one disk block No more than two blocks are examined.

Extendible Hashing Idea: Keys are grouped according to the first m bits in their code. Each group is stored in one disk block. If some block becomes full, each group is split into two, and m+1 bits are considered to determine the location of a record.

Example 4 disk blocks, each can contain 3 records 4 groups of keys according to the first two bits directory

Example (cont.) New key to be inserted: Block2 is full, so we start considering 3 bits directory 000/ / /111 (still on same block)

Extendible Hashing Size of the directory : 2 D 2 D = O(N (1+1/M) / M) D - the number of bits considered. N - number of records M - number of disk blocks

Conclusion 1 Hashing is a search method, used when  sorting is not needed  access time is the primary concern

Conclusion 2 Time-space trade-off: No memory limitations – use the key as a memory address (minimum amount of time). No time limitations – use sequential search (minimum amount of memory) Hashing – gives a balance between these two extremes – a way to use a reasonable amount of both memory and time.

Conclusion 3 To choose a good hash function is a “black art”. The choice depends on the nature of keys and the distribution of the numbers corresponding to the keys.

Conclusion 4 Best course of action: separate chaining: if the number of records is not known in advance open addressing: if the number of the records can be predicted and there is enough memory available