Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems
Sailesh Kumar, Patrick Crowley

Presentation transcript:

Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems Sailesh Kumar Patrick Crowley

2 - Sailesh Kumar - 11/26/2015 Overview
- Overview of Hash Tables and Segmented Hash Table
- Analysis and Limitations
  - Increased memory references
- Adding Bloom Filters per segment
- Selective Filter Insertion Algorithm
- Simulation Results and Analysis
- Conclusion

3 - Hash Tables
- Consider the problem of searching an array for a given value
  - If the array is not sorted, the search requires O(n) time
  - If the array is sorted, we can do a binary search in O(lg n) time
  - Can we do it in O(1) time? It doesn't seem like we could do much better

4 - Hash Tables
- Suppose we were to come up with a "magic function" that, given a value to search for, would tell us exactly where in the array to look
  - If it's in that location, it's in the array
  - If it's not in that location, it's not in the array
- If we look at the function's inputs and outputs, they probably won't "make sense"
- This function is called a hash function because it "makes hash" of its inputs

5 - Hash Tables
- How can we come up with this magic function?
- In general, we cannot: there is no such magic function
  - In a few specific cases, where all the possible values are known in advance, it has been possible to compute a perfect hash function
- What is the next best thing?
  - A perfect hash function would tell us exactly where to look
  - In general, the best we can do is a function that tells us where to start looking!

6 - Hash Tables
- Suppose our hash function gave us the following values:
  - hash("apple") = 5
  - hash("watermelon") = 3
  - hash("grapes") = 8
  - hash("cantaloupe") = 7
  - hash("kiwi") = 0
  - hash("strawberry") = 9
  - hash("mango") = 6
  - hash("banana") = 2
  - hash("honeydew") = 6
- "honeydew" hashes to the same index as "mango". This is called a collision
  - Now what?
[Figure: table array holding kiwi, banana, watermelon, apple, mango, cantaloupe, grapes, strawberry at their hashed indices]

7 - Collision Resolution Policies
- Linear Probing
  - Successively search for the first empty subsequent table entry
- Linear Chaining
  - Link all collided entries at any bucket as a linked list
- Double Hashing
  - Uses a second hash function to successively index the table
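The first two policies can be illustrated with a minimal Python sketch. It is not code from the talk: the names `SLIDE_HASH`, `linear_probe_insert`, and `ChainedTable` are made up for illustration, and the hash values are simply the ones listed in the fruit example, used as a lookup table in place of a real hash function.

```python
# Hypothetical stand-in for a hash function: the values from the fruit example.
SLIDE_HASH = {
    "apple": 5, "watermelon": 3, "grapes": 8, "cantaloupe": 7, "kiwi": 0,
    "strawberry": 9, "mango": 6, "banana": 2, "honeydew": 6,
}


def linear_probe_insert(table, key, index):
    """Linear probing: store key in the first free slot at or after index."""
    for step in range(len(table)):
        slot = (index + step) % len(table)  # wrap around at the end of the array
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")


class ChainedTable:
    """Chaining: every bucket keeps a list of all keys that hashed to it."""

    def __init__(self, num_buckets, hash_fn):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key) % len(self.buckets)]
        if key not in chain:
            chain.append(key)

    def distance(self, key):
        """Number of chain entries examined to find key, or None if absent."""
        chain = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for position, stored in enumerate(chain, start=1):
            if stored == key:
                return position
        return None
```

With the example values, "honeydew" collides with "mango" at index 6. Under linear probing it walks past the occupied slots 6, 7, 8, 9, and 0 before landing in slot 1; under chaining it sits at distance 2 in bucket 6.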

8 - Performance Analysis
- Average performance is O(1)
- However, worst-case performance is O(n)
- In fact, the likelihood that a key is at a distance > 1 is pretty high
[Figure annotations: some keys take twice the time to be probed; others take three times as long]
- There is a fairly high probability that throughput is one half or one third of the peak throughput

9 - Segmented Hashing
- Uses the power of multiple choices
  - Proposed earlier by Azar et al.
- An N-way segmented hash
  - Logically divides the hash table array into N equal segments
  - Maps each incoming key onto a bucket from each segment
  - Picks the bucket which is either empty or has the fewest keys
[Figure: a 4-way segmented hash table; keys k_i and k_i+1 are hashed with h( ) and each is mapped to one bucket]
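A rough Python sketch of the N-way insertion rule follows. It rests on assumptions not confirmed by the slides: a single hash function selects the same bucket index in every segment, collisions within a bucket are chained, and ties go to the lowest-numbered segment. The class name `SegmentedHash` and the SHA-256-based index are illustrative choices only.

```python
import hashlib


class SegmentedHash:
    """Sketch of an N-way segmented hash table with chained buckets."""

    def __init__(self, num_segments=4, buckets_per_segment=16):
        self.buckets_per_segment = buckets_per_segment
        # segments[s][b] is the chain of keys in bucket b of segment s.
        self.segments = [
            [[] for _ in range(buckets_per_segment)] for _ in range(num_segments)
        ]

    def _index(self, key):
        # Assumption: one hash function picks the same index in every segment.
        digest = hashlib.sha256(str(key).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets_per_segment

    def insert(self, key):
        """Insert into the candidate bucket holding the fewest keys."""
        index = self._index(key)
        best = min(
            range(len(self.segments)),
            key=lambda s: len(self.segments[s][index]),
        )
        self.segments[best][index].append(key)
        return best

    def lookup(self, key):
        """Return (segment or None, number of buckets probed)."""
        index = self._index(key)
        probes = 0
        for s, segment in enumerate(self.segments):
            probes += 1
            if key in segment[index]:
                return s, probes
        return None, probes
```

In this sketch a lookup for an absent key touches one bucket in every segment, so it costs N probes even when every chain is short.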

10 - Segmented Hash Performance
- More segments improve the probabilistic performance
  - With 64 segments, the probability that a key is inserted at distance > 2 is nearly zero even at 100% load
  - The improvement in average-case performance is still modest

11 - An Obvious Deficiency
- Even when the distance of every key is one, every query requires at least N memory probes
  - Average probes are O(N), compared to O(1) for a naive table
    - If things are bandwidth limited, throughput is N times lower
- To ensure O(1) operations, the segmented hash table uses on-chip Bloom filters
  - On-chip memory requirements are quite modest: 1-2 bytes per hash table bucket
- Each segment has a Bloom filter, which supports membership queries
  - These on-chip filters are queried before actually making an off-chip hash table memory reference

12 - Adding per Segment Filters
[Figure: key k_i is hashed with h( ) and can go to any of the 3 buckets; each segment's filter has m_b bits and is indexed by h_1(k_i), h_2(k_i), ..., h_k(k_i)]
- We can select any of the above three segments and insert the key into the corresponding filter
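The idea of querying a per-segment filter before touching off-chip memory can be sketched in Python as follows. This is an illustrative model, not the authors' implementation: `BloomFilter`, `segments_to_probe`, the filter size, and the SHA-256-derived bit positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: num_hashes bit positions per key over num_bits bits."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.bits = [False] * num_bits
        self.num_hashes = num_hashes

    def _positions(self, key):
        # Assumption: derive the k hash functions by salting SHA-256 with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.bits)

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = True

    def might_contain(self, key):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(self.bits[position] for position in self._positions(key))


def segments_to_probe(filters, key):
    """Segments whose filter reports the key as possibly present."""
    return [s for s, f in enumerate(filters) if f.might_contain(key)]
```

Only the segments returned by `segments_to_probe` would need an off-chip bucket read. With no false positives that is at most one segment per query instead of N.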

13 - False Positive Rates
- With Bloom filters, there is a likelihood of false positives
  - A filter might say that the key is present in its segment when the key is actually not present
- With N segments, the false positive rate will clearly be at least N times higher
  - In fact, it will be even higher, because we also have to consider several permutations of false positives
- We propose the Selective Filter Insertion algorithm, which reduces the false positive rate by several orders of magnitude

14 - Selective Filter Insertion Algorithm
[Figure: key k_i is hashed with h( ) and can go to any of the 3 buckets; each segment's filter has m_b bits and is indexed by h_1(k_i), h_2(k_i), ..., h_k(k_i)]
- Insert the key into segment 4, since fewer bits are set
- Fewer bits are set => lower false positive rate
- With more segments (or more choices), our algorithm sets far fewer bits in the Bloom filter

15 - Selective Filter Insertion Results

16 - Selective Filter Insertion Details
- First we build the set of segments where the arriving key can be inserted; we call it {minSet}
  - That is, these segments have the minimum, and equal, collision chain length at the corresponding hash index
- A naive or greedy algorithm will choose the segment where the fewest bits are set in the Bloom filter
  - Leads to unbalanced segments
  - An already loaded segment is likely to receive further keys because its filter array is more likely to have fewer transitions
  - Our simulations suggest that an enhancement to the insertion algorithm reduces the false positive rate further by up to an order of magnitude

17 - Selective Filter Insertion Enhancement
- Our aim is to keep the segments balanced while also reducing the bit transitions in the Bloom filters
1. Label a segment in {minSet} eligible if its occupancy is less than (1+δ) times the occupancy of the least occupied segment. Parameter δ is typically set at 0.1 to […]
2. If no segment remains eligible, select the least occupied segment from {minSet}
3. Otherwise choose a segment from {minSet} which has minimum bit transitions
4. If multiple such segments exist, choose the least occupied one
5. If multiple such segments are again found, break the tie with a round-robin arbitration policy
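One possible reading of these five steps is sketched below in Python. Several details are assumptions, since the slide does not pin them down: "the least occupied segment" is taken over all segments, step 3 is restricted to the eligible segments, a "bit transition" is a filter bit that would flip from 0 to 1, and round-robin arbitration is modelled as picking the tied segment closest after a rotating pointer. The function name `choose_segment` and its parameters are hypothetical.

```python
def choose_segment(min_set, occupancy, transitions, delta=0.1, rr_pointer=0):
    """Pick the segment for an arriving key.

    min_set:     segment ids with the shortest chain at the key's hash index
    occupancy:   list with the number of keys stored in each segment
    transitions: dict mapping segment id to the number of filter bits that
                 inserting the key would flip from 0 to 1 (assumed meaning)
    rr_pointer:  rotating start position used for round-robin tie-breaking
    """
    # Step 1: eligible segments are close to the least occupied one.
    # Assumption: "least occupied" is measured over all segments.
    least = min(occupancy)
    eligible = [s for s in min_set if occupancy[s] < (1 + delta) * least]

    # Step 2: nothing eligible, so fall back to the least occupied in min_set.
    if not eligible:
        return min(min_set, key=lambda s: occupancy[s])

    # Step 3: keep the eligible segments with the fewest bit transitions.
    fewest = min(transitions[s] for s in eligible)
    tied = [s for s in eligible if transitions[s] == fewest]

    # Step 4: among those, keep the least occupied.
    lowest = min(occupancy[s] for s in tied)
    tied = [s for s in tied if occupancy[s] == lowest]

    # Step 5: round-robin, modelled as the first tied segment at or after
    # rr_pointer in cyclic order.
    return min(tied, key=lambda s: (s - rr_pointer) % len(occupancy))
```

For example, with occupancies `[100, 105, 130, 100]`, `min_set = [0, 1, 2]`, and transitions `{0: 3, 1: 1, 2: 0}`, segment 2 would set the fewest bits but is not eligible because it is already overloaded, so the sketch returns segment 1. A caller would advance `rr_pointer` after each tie to rotate the arbitration.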

18 - Simulation Results
- 64K buckets, 32 bits/entry Bloom filter
- Simulation runs for 500 phases
  - During every phase, 100,000 random searches are performed
  - Between consecutive phases, 10,000 random keys are deleted and inserted

19 - Effectiveness of Modified Bloom Filters
- Plotting average memory references at different successful search rates
  - Fewer memory references reflect the effectiveness of the filters
  - Load is kept at 80%

20 - Questions?