Open Addressing: Quadratic Probing


ECE 250 Data Structures and Algorithms
Douglas Wilhelm Harder, M.Math. LEL
Department of Electrical and Computer Engineering, University of Waterloo
Copyright © 2006 by Douglas Wilhelm Harder. All rights reserved.

Outline
Problems with linear probing and primary clustering.
Outline of quadratic probing: insertions, searching, restrictions, deletions, and weaknesses.

Quadratic Probing
Primary clustering occurs with linear probing because every search follows the same linear pattern: if a bin is inside a cluster, then the next bin must either also be in that cluster or expand the cluster.
Instead of searching forward in a linear fashion, consider searching forward using a quadratic function.

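Quadratic Probing
Suppose that an element should appear in bin h. If bin h is occupied, then check the following sequence of bins:
h + 1², h + 2², h + 3², h + 4², h + 5², ...
that is,
h + 1, h + 4, h + 9, h + 16, h + 25, ...
with each bin number taken modulo the table size M.
(The slide illustrates this with M = 17; the figure is not reproduced in this transcript.)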
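The probe sequence described above can be sketched in a few lines of Python. The function name `probe_sequence` and its parameters are chosen here for illustration and do not come from the slides; the sketch assumes the i-th probe is (h + i²) mod M, with i = 0 being the home bin.

```python
def probe_sequence(h, M, count):
    """Return the first `count` bins checked when starting from bin h.

    The i-th probe is (h + i*i) mod M, so i = 0 is the home bin itself.
    """
    return [(h + i * i) % M for i in range(count)]


# Home bin 0 in a table of M = 17 bins (values chosen for illustration).
print(probe_sequence(0, 17, 9))
```

Calling `probe_sequence(23, 31, 16)` gives the bins checked for an element whose home bin is 23 in a table with 31 bins.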

Quadratic Probing
If one of the bins h + i² falls into a cluster, this does not imply that the next one will.

Quadratic Probing
For example, suppose an element was to be inserted in bin 23 in a hash table with 31 bins. The sequence in which the bins would be checked is:
23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0

Quadratic Probing
Even if two bins are initially close, the sequence in which subsequent bins are checked varies greatly. Again, with M = 31 bins, compare the first 16 bins which are checked starting with 22 and 23:
Starting at 22: 22, 23, 26, 0, 7, 16, 27, 9, 24, 10, 29, 19, 11, 5, 1, 30
Starting at 23: 23, 24, 27, 1, 8, 17, 28, 10, 25, 11, 30, 20, 12, 6, 2, 0

Quadratic Probing
Thus, quadratic probing solves the problem of primary clustering. Unfortunately, there is a second problem which must be dealt with.
Suppose we have M = 8 bins: 1² ≡ 1, 2² ≡ 4, 3² ≡ 1 (mod 8).
In this case, we check bin h + 1 twice, having checked only one other bin.

Quadratic Probing
Unfortunately, there is no guarantee that (h + i²) mod M will cycle through 0, 1, ..., M – 1.
Solution: require that M be prime. In this case, (h + i²) mod M for i = 0, ..., (M – 1)/2 will cycle through exactly (M + 1)/2 values before repeating.

Quadratic Probing
Example with M = 11: 0, 1, 4, 9, 16 ≡ 5, 25 ≡ 3, 36 ≡ 3
With M = 13: 0, 1, 4, 9, 16 ≡ 3, 25 ≡ 12, 36 ≡ 10, 49 ≡ 10
With M = 17: 0, 1, 4, 9, 16, 25 ≡ 8, 36 ≡ 2, 49 ≡ 15, 64 ≡ 13, 81 ≡ 13
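The counts in these examples can be checked with a small sketch. The helper `distinct_bins` is a name invented here; it simply counts how many different bins (h + i²) mod M reaches when i runs over a full range of M probes.

```python
def distinct_bins(h, M):
    """Count the distinct bins visited by (h + i*i) mod M for i = 0..M-1."""
    return len({(h + i * i) % M for i in range(M)})


# M = 8 is not prime; 11, 13 and 17 are the primes used in the examples.
for M in (8, 11, 13, 17):
    print(M, distinct_bins(0, M))
```

For the prime sizes the count equals (M + 1)/2, while the non-prime M = 8 reaches far fewer than half of its bins. The starting bin h only shifts the set of bins, so it does not change the count.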

Quadratic Probing
Thus, quadratic probing avoids primary clustering. Unfortunately, we are not guaranteed that we will use all the bins.
In reality, if the hash function is reasonable, this is not a significant problem until the load factor λ approaches 1.

Quadratic Probing
For example, with a hash table with M = 19 bins using quadratic probing, insert the following random 3-digit numbers:
086, 198, 466, 709, 973, 981, 374, 766, 473, 342, 191, 393, 300, 011, 538, 913, 220, 844, 565
using the number modulo 19 as the initial bin.

Quadratic Probing
The first two fall into their correct bins: 086 → 10, 198 → 8
The next already causes a collision: 466 → 10 → 11
The next four cause no collisions: 709 → 6, 973 → 4, 981 → 12, 374 → 13
Then another collision: 766 → 6 → 7

Quadratic Probing
At this point, two clusters have appeared and the load factor is λ = 0.42.

Quadratic Probing
The next three also go into their appropriate bins: 473 → 17, 342 → 0, 191 → 1
Then there is one more collision: 393 → 13 → 14
and 300 falls into its correct bin: 300 → 15

Quadratic Probing
With five more insertions, the load factor is λ = 0.68 with one large cluster.

Quadratic Probing
At this point, insertions become more tedious:
011 → 11 → 12 → 15 → 1 → 8 → 17 → 9
538 → 6 → 7 → 10 → 15 → 3
913 → 1 → 2
220 → 11 → ⋅⋅⋅ → 9 → 3 → 18
844 → 8 → 9 → 12 → 17 → 5

Quadratic Probing
To show how quadratic probing works, consider the addition of 538, starting in bin 6. The first four bins checked (6, 7, 10, and 15) all fall within the same cluster; however, the fifth bin checked (bin 3) falls far outside the cluster.

Quadratic Probing
At this point, the array is almost full (only bin 16 is open) and the load factor is λ = 0.95.
If we try to add the last number, 565, the sequence of bins checked is
14 → 15 → 18 → 4 → 11 → 1 → 12 → 6 → 2 → 0
which does not hit bin 16.
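The whole insertion sequence can be replayed with a short simulation. This is a minimal sketch, not the author's implementation: the names `quadratic_insert` and `run` are made up here, the table is a plain Python list with `None` for an empty bin, and a key is abandoned after M unsuccessful probes (an assumed stopping rule). The keys are written without leading zeros because Python does not accept literals such as `086`.

```python
KEYS = [86, 198, 466, 709, 973, 981, 374, 766, 473, 342,
        191, 393, 300, 11, 538, 913, 220, 844, 565]


def quadratic_insert(table, key):
    """Insert key using quadratic probing.

    Returns the list of bins probed, or None if M probes found no free bin.
    """
    M = len(table)
    h = key % M                      # initial bin: the number modulo M
    path = []
    for i in range(M):
        b = (h + i * i) % M          # i-th probe
        path.append(b)
        if table[b] is None:
            table[b] = key
            return path
    return None


def run(keys=KEYS, M=19):
    """Insert all keys into an empty table; return (table, keys that failed)."""
    table = [None] * M
    failed = []
    for key in keys:
        if quadratic_insert(table, key) is None:
            failed.append(key)
    return table, failed


table, failed = run()
print("open bins:", [b for b, k in enumerate(table) if k is None])
print("could not insert:", failed)
```

The final key cannot be placed even though one bin is still open, because its probe sequence only ever reaches (M + 1)/2 = 10 distinct bins.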

Quadratic versus Linear Probing
We can compare the number of probes required with that of linear probing:
086 → 10
198 → 8
466 → 10 → 11
709 → 6
973 → 4
981 → 12
374 → 13
766 → 6 → 7
473 → 17
342 → 0
191 → 1
393 → 13 → 14
300 → 15
011 → 11 → 12 → 13 → 14 → 15 → 16
538 → 6 → 7 → 8 → 9
913 → 1 → 2
220 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18
844 → 8 → 9 → 10 → 11 → 12 → 13 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3
565 → 14 → 15 → 16 → 17 → 18 → 0 → 1 → 2 → 3 → 4 → 5
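The linear-probing paths listed above can be regenerated with a sketch like the following. The function name `linear_insert` is hypothetical, and the table is again a plain list with `None` marking an empty bin; only the probe step (h + i instead of h + i²) is specific to linear probing.

```python
KEYS = [86, 198, 466, 709, 973, 981, 374, 766, 473, 342,
        191, 393, 300, 11, 538, 913, 220, 844, 565]


def linear_insert(table, key):
    """Insert key using linear probing; return the list of bins probed."""
    M = len(table)
    h = key % M
    path = []
    for i in range(M):
        b = (h + i) % M              # step forward one bin at a time
        path.append(b)
        if table[b] is None:
            table[b] = key
            return path
    return None                      # table is completely full


table = [None] * 19
paths = {key: linear_insert(table, key) for key in KEYS}
for key, path in paths.items():
    print(f"{key:03d}", " -> ".join(str(b) for b in path))
```

Unlike quadratic probing, linear probing always finds the last open bin, but the later insertions walk through long runs of occupied bins to get there.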

Quadratic Probing: Deletions
We have seen how we can perform insertions; next is deletions.
With linear probing, if we deleted the contents of a bin, we had to search ahead to determine if any nodes had to be moved back. This was easy with linear probing: we simply moved from bin to bin until an empty bin was located.

Quadratic Probing: Deletions
The nonlinear probing associated with quadratic probing does not allow us to do this efficiently.
For example, suppose we delete 466, which is currently in bin 11. The two other entries which pass through bin 11 were 011 and 220. We cannot (efficiently) find these entries.

Quadratic Probing: Deletions
Solution: associate with each bin a field which is either EMPTY, OCCUPIED, or DELETED.

Quadratic Probing: Deletions
Initially, all bins are marked EMPTY.
When a bin is filled, it is marked OCCUPIED.
If a bin is emptied (as a result of a remove), it is marked DELETED.
Note that a bin which is marked as being DELETED may once again be filled (and hence marked OCCUPIED).

Quadratic Probing: Deletions
For example, given a hash table with M = 11 bins, enter the values
135, 909, 246, 894, 518, 365
Table: bins 0 through 10 hold no entries and every bin is flagged E (EMPTY).

Quadratic Probing: Deletions
The first three are straightforward:
135 → 3
909 → 7
246 → 4
Table: bin 3 = 135, bin 4 = 246, bin 7 = 909, each flagged O (OCCUPIED); all other bins remain flagged E (EMPTY).
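The three-state scheme can be sketched as a small class. This is one possible reading of the slides rather than the course's reference code: the class name `QuadraticProbingTable`, its method names, and the decision to give up after M probes are all assumptions made for this illustration, and `insert` assumes the key is not already stored.

```python
EMPTY, OCCUPIED, DELETED = "EMPTY", "OCCUPIED", "DELETED"


class QuadraticProbingTable:
    def __init__(self, M):
        self.M = M
        self.keys = [None] * M
        self.flags = [EMPTY] * M         # initially, all bins are EMPTY

    def _probe(self, key):
        """Yield the bins (h + i*i) mod M for i = 0..M-1."""
        h = key % self.M
        for i in range(self.M):
            yield (h + i * i) % self.M

    def insert(self, key):
        """Fill the first bin that is not OCCUPIED; return it, or None."""
        for b in self._probe(key):
            if self.flags[b] != OCCUPIED:    # EMPTY or DELETED may be reused
                self.keys[b] = key
                self.flags[b] = OCCUPIED
                return b
        return None

    def find(self, key):
        """Return the bin holding key, or None. DELETED bins do not stop the search."""
        for b in self._probe(key):
            if self.flags[b] == EMPTY:
                return None
            if self.flags[b] == OCCUPIED and self.keys[b] == key:
                return b
        return None

    def remove(self, key):
        """Mark the bin holding key as DELETED; return True if it was found."""
        b = self.find(key)
        if b is None:
            return False
        self.keys[b] = None
        self.flags[b] = DELETED
        return True


t = QuadraticProbingTable(11)
for key in (135, 909, 246, 894, 518, 365):
    print(key, "->", t.insert(key))
```

Marking a bin DELETED instead of EMPTY is what keeps later searches correct: a search only stops at a bin that has never been used.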

Secondary Clustering
The phenomenon of primary clustering will not occur with quadratic probing. However, if multiple items all hash to the same initial bin, the same sequence of bins will be followed. This is termed secondary clustering.
The effect is less significant than that of primary clustering.

Secondary Clustering
Secondary clustering may be a problem if the hash function does not produce an even distribution of entries.
One solution to secondary clustering is double hashing: associating with each element an initial bin (defined by one hash function) and a skip (defined by a second hash function).
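To make the idea of an initial bin plus a skip concrete, here is a small sketch. The slides do not specify either hash function; the second hash used below, `1 + key % (M - 1)`, is a common textbook choice adopted here purely as an assumption, and the name `double_hash_sequence` is invented for this example.

```python
def double_hash_sequence(key, M, count):
    """Return the first `count` bins probed for key under double hashing."""
    start = key % M                  # first hash function: the initial bin
    skip = 1 + key % (M - 1)         # assumed second hash: a skip in 1..M-1
    return [(start + i * skip) % M for i in range(count)]


# 107 and 118 share the same initial bin in a table of 11 bins,
# but their skips differ, so their probe sequences diverge.
print(double_hash_sequence(107, 11, 5))
print(double_hash_sequence(118, 11, 5))
```

When M is prime, any skip between 1 and M – 1 is coprime to M, so under this choice the sequence reaches every bin before repeating.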

Example
Insert the 6 elements
14, 107, 31, 118, 34, 112
into an initially empty hash table of size 11 using quadratic probing.
Let the hash function be the number modulo 11.

Example: Insertion (14, 107, 31, 118, 34, 112)
The first three fall into bins 3, 8, and 9, respectively.
Table: bin 3 = 14, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Insertion (14, 107, 31, 118, 34, 112)
118 also falls into bin 8 (occupied). Thus, we check:
8 + 1 = 9 - occupied
8 + 4 = 12 ≡ 1 - unoccupied
Table: bin 1 = 118, bin 3 = 14, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Insertion (14, 107, 31, 118, 34, 112)
34 falls into bin 1, which is occupied; thus we check:
1 + 1 = 2 - unoccupied
Table: bin 1 = 118, bin 2 = 34, bin 3 = 14, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Insertion (14, 107, 31, 118, 34, 112)
112 falls into bin 2, which is now occupied; thus we check:
2 + 1 = 3 - occupied
2 + 4 = 6 - unoccupied
Table: bin 1 = 118, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Insertion
At this point, the hash table is over half full. We are no longer guaranteed that the insertion of a new element will be possible.
Solution: increase the size of the table (perhaps only after failing).
Problem: the new size must, too, be prime.
Table: bin 1 = 118, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.
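Choosing the next prime table size can be sketched as follows. The growth policy (the smallest prime that is at least double the current size) is an assumption made for this illustration and is not prescribed by the slides; the helper names `is_prime` and `next_table_size` are likewise hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; adequate for table sizes."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_table_size(M):
    """Return the smallest prime >= 2*M (an assumed growth policy)."""
    n = 2 * M
    while not is_prime(n):
        n += 1
    return n


print(next_table_size(11))
```

After resizing, every stored element has to be reinserted, because its initial bin depends on the new value of M.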

Example: Removal
To remove an element, we must simply mark it as deleted.
In our example, removing 118, we begin in bin 8, continue to check bin 9, and then bin 1. Mark that bin as having had an element deleted.
Table: bin 1 = DEL, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Finding
To find an element, we start by checking the bin it should have initially been in, and then continue checking bins following the quadratic probing sequence until either we find it, or we find a bin which is neither occupied nor deleted.
Table: bin 1 = DEL, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Finding
We find 14 in bin 3.
We don't find 34 in bin 1 (marked as deleted), so we check bin 1 + 1 = 2, and find it.
Table: bin 1 = DEL, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.

Example: Finding
We search for 19 in bin 8. Not finding it, we check:
8 + 1 = 9 - occupied
8 + 4 = 12 ≡ 1 - deleted
8 + 9 = 17 ≡ 6 - occupied
8 + 16 = 24 ≡ 2 - occupied
8 + 25 = 33 ≡ 0 - unoccupied: not found
Table: bin 1 = DEL, bin 2 = 34, bin 3 = 14, bin 6 = 112, bin 8 = 107, bin 9 = 31; all other bins are empty.
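The search steps above can be traced with a short sketch that hard-codes the table state from this example. The function name `find_trace` and the string labels are chosen here for illustration; the dictionary `entries` holds the occupied bins and the set `deleted` holds the bin marked DEL.

```python
M = 11
entries = {2: 34, 3: 14, 6: 112, 8: 107, 9: 31}   # bin -> stored element
deleted = {1}                                      # bin marked as deleted


def find_trace(key):
    """Return the (bin, status) pairs visited while searching for key."""
    h = key % M
    trace = []
    for i in range(M):
        b = (h + i * i) % M
        if b in entries:
            if entries[b] == key:
                trace.append((b, "found"))
                return trace
            trace.append((b, "occupied"))
        elif b in deleted:
            trace.append((b, "deleted"))       # keep searching past it
        else:
            trace.append((b, "unoccupied"))    # never used: key is absent
            return trace
    return trace


for key in (14, 34, 19):
    print(key, find_trace(key))
```

The search for 19 passes over the deleted bin and stops only at the first bin that has never held an element.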

Usage Notes
These slides are made publicly available on the web for anyone to use. If you choose to use them, or a part thereof, for a course at another institution, I ask only three things: that you inform me that you are using the slides; that you acknowledge my work; and that you alert me of any mistakes which I made or changes which you make, and allow me the option of incorporating such changes (with an acknowledgment) in my set of slides.
Sincerely,
Douglas Wilhelm Harder, MMath
dwharder@alumni.uwaterloo.ca