Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems Sailesh Kumar Patrick Crowley.


2 - Sailesh Kumar - 4/30/2015 Problem Statement
• How to implement deterministic hash tables
• Goal: O(1) performance that stays deterministic, even near the worst case
• We are given a small amount of on-chip memory
• On-chip memory is limited to 1-2 bytes per table entry
• In this paper we tackle the above problem

3 - Hash Tables
• A hash table uses a hash function to index the table entries
  » hash("apple") = 5, hash("watermelon") = 3, hash("grapes") = 9, hash("cantaloupe") = 7, hash("kiwi") = 0, hash("mango") = 6, hash("banana") = 2
  » hash("honeydew") = 2
• This is called a collision
  » Now what?
[Figure: the table holds kiwi, banana, watermelon, apple, mango, cantaloupe and grapes; honeydew is resolved by linear probing, by double hashing with Hash2(honeydew) = 3, or by linear chaining.]
• The number of keys mapped to a bucket is called the collision chain length
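The collision example above can be replayed in a short Python sketch. The bucket numbers and Hash2(honeydew) = 3 come from the slide. Everything else is an assumption: a 10-bucket table, the insertion order, and a default second-hash step of 1 for the other fruits.

```python
from collections import defaultdict

# Bucket indices from the slide; a real table would compute these with a hash function.
SLIDE_HASH = {"apple": 5, "watermelon": 3, "grapes": 9, "cantaloupe": 7,
              "kiwi": 0, "mango": 6, "banana": 2, "honeydew": 2}
SLIDE_HASH2 = {"honeydew": 3}  # from the slide; other keys default to step 1 (assumption)
TABLE_SIZE = 10                # assumption: buckets 0-9


def insert_linear_probing(table, key):
    """Step to the next bucket until a free one is found; return the slot used."""
    start = SLIDE_HASH[key]
    for i in range(len(table)):
        slot = (start + i) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")


def insert_double_hashing(table, key):
    """Step by a key-specific amount Hash2(key) instead of 1."""
    start, step = SLIDE_HASH[key], SLIDE_HASH2.get(key, 1)
    for i in range(len(table)):
        slot = (start + i * step) % len(table)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on this probe sequence")


def insert_chaining(chains, key):
    """Append to the bucket's list; return that bucket's collision chain length."""
    bucket = chains[SLIDE_HASH[key]]
    bucket.append(key)
    return len(bucket)


def demo():
    fruits = ["apple", "watermelon", "grapes", "cantaloupe", "kiwi", "mango", "banana"]
    linear, double = [None] * TABLE_SIZE, [None] * TABLE_SIZE
    chains = defaultdict(list)
    for f in fruits:
        insert_linear_probing(linear, f)
        insert_double_hashing(double, f)
        insert_chaining(chains, f)
    # honeydew collides with banana in bucket 2 under all three schemes
    return (insert_linear_probing(linear, "honeydew"),
            insert_double_hashing(double, "honeydew"),
            insert_chaining(chains, "honeydew"))
```

demo() returns (4, 8, 2). Linear probing skips the occupied buckets 2 and 3. Double hashing tries 2, then 5, then 8. Chaining leaves bucket 2 with a collision chain of length 2.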

4 - Performance Analysis
• Average performance is O(1)
• However, worst-case performance is O(n)
• In fact, the probability of a collision chain longer than 1 is quite high
  » Keys in a chain of length 2 take twice as long to probe
  » Keys in a chain of length 3 take three times as long
  » So there is a fairly high probability that lookups are two or three times slower
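The claim that collisions are common can be checked with a small simulation. The sketch below assumes an ideal hash function, so each key lands in a uniformly random bucket. It compares the simulated fraction of keys that share a bucket with the standard approximation 1 - e^(-load).

```python
import math
import random


def fraction_in_collided_chains(n_keys, n_buckets, seed=1):
    """Fraction of keys whose bucket also holds at least one other key.

    Assumes an ideal hash: every key lands in a uniformly random bucket.
    """
    rng = random.Random(seed)
    counts = [0] * n_buckets
    for _ in range(n_keys):
        counts[rng.randrange(n_buckets)] += 1
    return sum(c for c in counts if c > 1) / n_keys


def predicted_fraction(load):
    """A key avoids all other n-1 keys with probability (1 - 1/m)^(n-1), roughly e^(-load)."""
    return 1.0 - math.exp(-load)


for load in (0.5, 1.0):
    m = 100_000
    sim = fraction_in_collided_chains(int(load * m), m)
    print(f"load {load:.0%}: simulated {sim:.3f}, predicted {predicted_fraction(load):.3f}")
```

The prediction is 1 - e^-0.5 ≈ 0.39 at 50% load and 1 - e^-1 ≈ 0.63 at full load. So under ideal hashing, a large share of keys sits in a chain of length 2 or more.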

5 - Segmented Hashing
• Uses the power of multiple choices
  » This idea has been proposed and used earlier by several authors
• An N-way segmented hash
  » Logically divides the hash table array into N equal segments
  » Maps each incoming key onto one bucket in each segment
  » Picks the bucket that is either empty or holds the fewest keys
[Figure: a 4-way segmented hash table; key k_i is hashed by h() to a bucket in each segment and mapped to the chosen bucket (1); key k_i+1 is handled the same way (2).]
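Here is a minimal sketch of the N-way insertion rule described above. It makes several assumptions that are not from the slides:
- Each segment gets its own independent hash, made by salting BLAKE2b with the segment number.
- Buckets hold chains, stored as Python lists.
- Ties go to the lowest-numbered segment.

The class name SegmentedHash is hypothetical.

```python
import hashlib


class SegmentedHash:
    """Hypothetical N-way segmented hash table; each bucket holds a chain (list) of keys."""

    def __init__(self, n_segments, buckets_per_segment):
        self.n = n_segments
        self.b = buckets_per_segment
        self.segments = [[[] for _ in range(buckets_per_segment)]
                         for _ in range(n_segments)]

    def _index(self, key, seg):
        # Assumption: an independent hash per segment, made by salting BLAKE2b with the segment number.
        d = hashlib.blake2b(f"{seg}|{key}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "big") % self.b

    def insert(self, key):
        # One candidate bucket per segment; take the shortest chain (lowest segment wins ties).
        chains = [self.segments[s][self._index(key, s)] for s in range(self.n)]
        min(chains, key=len).append(key)

    def lookup(self, key):
        """Return (found, memory_probes). The key may be in any segment, so all N buckets are read."""
        probes, found = 0, False
        for s in range(self.n):
            probes += 1
            found = found or key in self.segments[s][self._index(key, s)]
        return found, probes

    def max_chain(self):
        return max(len(chain) for seg in self.segments for chain in seg)


t = SegmentedHash(n_segments=4, buckets_per_segment=16)
t.insert("k_i")
t.insert("k_i+1")
print(t.lookup("k_i"))  # (True, 4)
```

The lookup method also shows the cost side of the design. Without any filtering, every query reads one bucket per segment.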

6 - Segmented Hash Performance
• More segments improve the probabilistic performance
  » With 64 segments, the probability of a collision chain longer than 2 is nearly zero, even at 100% load
  » The result is more deterministic hash table performance
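The effect of the segment count can be explored with a balls-into-bins simulation. This is a sketch, not the authors' experiment. It assumes ideal hashing, so each key draws an independent uniform bucket in every segment and joins the least-loaded candidate.

For a single segment at full load, the Poisson approximation predicts that 1 - 2/e ≈ 0.26 of keys sit in chains longer than 2.

```python
import random


def frac_in_long_chains(n_segments, total_buckets, load=1.0, threshold=2, seed=7):
    """Fraction of keys in chains longer than `threshold` for an N-way segmented hash.

    Assumes ideal hashing: each key gets an independent uniform bucket in every segment
    and is placed in the candidate with the fewest keys (lowest segment wins ties).
    """
    rng = random.Random(seed)
    per_seg = total_buckets // n_segments
    counts = [[0] * per_seg for _ in range(n_segments)]
    n_keys = int(load * total_buckets)
    for _ in range(n_keys):
        candidates = [(s, rng.randrange(per_seg)) for s in range(n_segments)]
        seg, idx = min(candidates, key=lambda c: counts[c[0]][c[1]])
        counts[seg][idx] += 1
    in_long = sum(c for seg in counts for c in seg if c > threshold)
    return in_long / n_keys


for n in (1, 2, 4, 64):
    print(n, round(frac_in_long_chains(n, 8192), 4))
```

With 64 segments the simulated fraction should come out at or very near zero, which is consistent with the slide's claim.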

7 - An Obvious Deficiency
• O(N) memory probes per query
  » Requires N times higher memory bandwidth
• How can we ensure O(1) memory probes per query?
• Use Bloom filters implemented in a small on-chip memory (they filter out unnecessary memory accesses)
• Before going further, a brief introduction to Bloom filters
[Figure: in the 4-way segmented hash, every query for k_i requires 4 probes.]

8 - Bloom Filter
[Figure: item X is hashed by H1, H2, H3, H4, ..., Hk into an m-bit array.]

9 - Bloom Filter
[Figure: item Y is hashed by H1, H2, H3, H4, ..., Hk into the same m-bit array.]

10 - Bloom Filter
[Figure: querying X; all of its bits are set, so the filter reports a match.]

11 - Bloom Filter
[Figure: querying W; all of its bits happen to be set by other items, so the filter reports a match (false positive).]
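The four Bloom filter slides can be summarised in a minimal Python sketch. To derive H1..Hk from one SHA-256 digest, it swaps in the Kirsch-Mitzenmacher double-hashing trick. That is an implementation choice for this example and is not taken from the slides. The filter size and k are also illustrative.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, m_bits, k):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits)  # one byte per bit, for readability rather than space

    def _positions(self, item):
        # Derive H1..Hk by double hashing (Kirsch-Mitzenmacher): h1 + i*h2 mod m.
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # True for every added item; may also be True for items never added.
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m_bits=1024, k=4)
bf.add("X")
bf.add("Y")
print("X" in bf, "Y" in bf)  # True True: Bloom filters have no false negatives
```

A query for an item like W can still return True if all of its positions were set by X and Y. That is the false positive on slide 11.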

12 - Adding per Segment Filters
[Figure: key k_i is hashed by h() and can go to any of 3 buckets; each segment has its own m_b-bit Bloom filter, indexed by h_1(k_i), h_2(k_i), ..., h_k(k_i).]
• We can select any of these three segments and insert the key into the corresponding filter
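Here is a minimal sketch of per-segment filters on the lookup path. On a query, all the small on-chip filters are checked first. Only segments whose filter matches trigger an off-chip bucket read.

This sketch makes several assumptions:
- M_BITS (m_b) and K are hypothetical values.
- The same h_1..h_k positions are reused in every segment's filter.
- The hashes are derived from BLAKE2b.

```python
import hashlib

M_BITS = 1024  # hypothetical m_b: bits in each segment's on-chip filter
K = 4          # hypothetical number of filter hash functions h_1..h_k


def _h(label, key):
    d = hashlib.blake2b(f"{label}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big")


class SegmentFilters:
    """One small Bloom filter per segment, kept on chip (hypothetical sketch)."""

    def __init__(self, n_segments, m_bits=M_BITS, k=K):
        self.m, self.k = m_bits, k
        self.filters = [bytearray(m_bits) for _ in range(n_segments)]

    def _positions(self, key):
        # Assumption: the same h_1(k)..h_k(k) positions are used in every segment's filter.
        return [_h(f"h{i}", key) % self.m for i in range(self.k)]

    def add(self, seg, key):
        """Record that `key` was stored in segment `seg`."""
        for p in self._positions(key):
            self.filters[seg][p] = 1

    def segments_to_probe(self, key):
        """Segments whose filter matches; only these need an off-chip memory access."""
        pos = self._positions(key)
        return [s for s, f in enumerate(self.filters) if all(f[p] for p in pos)]


sf = SegmentFilters(n_segments=4)
sf.add(3, "k_i")                    # k_i could have gone to any free segment; index 3 chosen here
print(sf.segments_to_probe("k_i"))  # [3]
```

Which segment receives the key is left to the caller. Any segment with a free candidate bucket is valid, which is exactly the freedom the next slides exploit.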

13 - False Positive Rates
• With Bloom filters, there is a likelihood of false positives
  » A false positive means an unnecessary memory access
• With N segments, the false positive rate will clearly be at least N times higher
  » In fact, it will be even higher, because several permutations of false positives must also be considered
• We use the Selective Filter Insertion algorithm, which reduces the false positive rates by several orders of magnitude
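The N-fold growth can be made concrete with the classic Bloom filter approximation (1 - e^(-kn/m))^k. This is a sketch under simplifying assumptions: every segment filter has the same bits per key and errs independently. It counts only the expected number of wasted off-chip probes and ignores the permutation effects the slide mentions.

```python
import math


def bloom_fp_rate(bits_per_key, k):
    """Classic approximation (1 - e^(-k*n/m))^k, with m/n written as bits_per_key."""
    return (1.0 - math.exp(-k / bits_per_key)) ** k


def expected_wasted_probes(n_segments, bits_per_key, k, key_present=True):
    """Expected off-chip probes triggered by filter false positives in one lookup.

    Assumes every segment filter has the same bits per key and errs independently.
    A present key can only cause false positives in the other N - 1 filters;
    an absent key can cause one in any of the N filters.
    """
    p = bloom_fp_rate(bits_per_key, k)
    candidates = n_segments - 1 if key_present else n_segments
    return candidates * p


for n in (1, 4, 64):
    print(n, round(expected_wasted_probes(n, bits_per_key=8, k=6, key_present=False), 3))
```

With 8 bits per key and k = 6, a single filter errs about 2.2% of the time. With 64 segments, a miss then costs roughly 1.4 wasted probes on average, which is why plain per-segment filters are not enough.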

14 - Selective Filter Insertion Algorithm
[Figure: key k_i is hashed by h() and can go to any of 3 buckets; each segment's m_b-bit filter is indexed by h_1(k_i), h_2(k_i), ..., h_k(k_i).]
• Insert the key into segment 4, since fewer bits are set there
• Fewer bits set => lower false positive rate
• With more segments (or more choices), our algorithm sets far fewer bits in the Bloom filter

15 - Selective Filter Insertion Details
• Greedy policy
• For every arriving key
  » We choose the segment whose Bloom filter gets the fewest bits set by the insertion
• We show that this leads to unbalanced segments
  » This reduces performance
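The greedy policy above can be sketched as a comparison against a filter-blind baseline. The class FilteredSegmentedHash and the helper _digest are hypothetical, and the sketch simplifies in several ways:
- Each bucket holds at most one key.
- Every segment's filter uses the same k bit positions.
- Ties are broken toward the less loaded segment.
- The baseline simply picks the least-loaded free segment.

This is not the paper's exact algorithm.

```python
import hashlib


def _digest(label, key):
    """Deterministic 64-bit hash of key under a label (hypothetical helper)."""
    d = hashlib.blake2b(f"{label}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(d, "big")


class FilteredSegmentedHash:
    """Hypothetical N-way segmented hash with one m_b-bit Bloom filter per segment.

    Simplifications (assumptions, not from the slides): each bucket holds one key,
    and the same k bit positions are used in every segment's filter.
    """

    def __init__(self, n_segments, buckets_per_segment, filter_bits, k, selective=True):
        self.n, self.b, self.m, self.k = n_segments, buckets_per_segment, filter_bits, k
        self.selective = selective
        self.buckets = [[None] * buckets_per_segment for _ in range(n_segments)]
        self.filters = [bytearray(filter_bits) for _ in range(n_segments)]
        self.load = [0] * n_segments

    def _bucket(self, key, seg):
        return _digest(f"bucket{seg}", key) % self.b

    def _bit_positions(self, key):
        return [_digest(f"bit{i}", key) % self.m for i in range(self.k)]

    def insert(self, key):
        bits = self._bit_positions(key)
        free = [s for s in range(self.n) if self.buckets[s][self._bucket(key, s)] is None]
        if not free:
            raise RuntimeError("no free candidate bucket")  # a full design would handle this

        def new_bits(s):
            # Filter bits that would flip from 0 to 1 if the key went to segment s.
            return len({p for p in bits if not self.filters[s][p]})

        if self.selective:
            # Greedy selective insertion: fewest new bits, then least-loaded segment.
            seg = min(free, key=lambda s: (new_bits(s), self.load[s]))
        else:
            # Baseline: ignore the filters and pick the least-loaded segment.
            seg = min(free, key=lambda s: self.load[s])
        self.buckets[seg][self._bucket(key, seg)] = key
        for p in bits:
            self.filters[seg][p] = 1
        self.load[seg] += 1
        return seg

    def bits_set(self):
        return sum(sum(f) for f in self.filters)


greedy = FilteredSegmentedHash(8, 512, 1024, k=4, selective=True)
plain = FilteredSegmentedHash(8, 512, 1024, k=4, selective=False)
for i in range(1000):
    greedy.insert(f"key{i}")
    plain.insert(f"key{i}")
print(greedy.bits_set(), plain.bits_set(), greedy.load)
```

Comparing bits_set() shows how many filter bits each policy spends. Comparing the load lists is one way to observe the segment imbalance that the slide attributes to the greedy policy.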

16 - Selective Filter Insertion Algorithm
[Figure: inserting key k_1, hashed by h() and h_1, ..., h_k]

17 - Selective Filter Insertion Algorithm
[Figure: inserting key k_2]

18 - Selective Filter Insertion Algorithm
[Figure: inserting key k_3]

19 - Selective Filter Insertion Algorithm
[Figure: inserting key k_4]

20 - Selective Filter Insertion Algorithm
[Figure: inserting key k_5; the number of choices is reduced]

21 - Selective Filter Insertion Enhancement
• The objective is to keep segments balanced
• We might need to make sub-optimal choices at times
• One way is to avoid the most loaded segment
  » This reduces the number of choices by 1
• However, it leads to situations where two segments alternately lead
• Things get complicated
  » A more detailed version of the algorithm can be found in the paper

22 - Selective Filter Insertion Results

23 - Simulation Results
• 64K buckets, Bloom filter with 32 bits/entry
• The simulation runs for 500 phases
  » During every phase, 100,000 random searches are performed
  » Between two phases, 10,000 random keys are deleted and inserted

24 - Conclusion
• We presented a way to implement
  » Hash tables with deterministic performance
  » We utilize small on-chip memory to achieve it
  » We also show that on-chip memory requirements are modest
  » Well within what Moore's law allows
  » For example, a hash table with 1M entries needs 1-2 MB of on-chip memory
• Questions?