Fast and deterministic hash table lookup using discriminative Bloom filters. Author: Kun Huang, Gaogang Xie, Publisher: 2013 ELSEVIER Journal of Network.

Presentation transcript:

Fast and deterministic hash table lookup using discriminative Bloom filters
Author: Kun Huang, Gaogang Xie,
Publisher: 2013 ELSEVIER Journal of Network and Computer Applications
Presenter: Yuen-Shuo Li
Date: 2013/06/26

Introduction
- Hash table
  - A data structure for fast lookups that associates a set of keys with a set of values.
  - Achieves O(1) average memory accesses for query, insert, and delete operations at moderate loads.
  - Because of this excellent average-case performance, it is widely used in networking, for example in IP route lookup, packet classification, and deep packet inspection.
- These applications are typically deployed on the critical data paths of high-speed routers and switches, so the hash table must perform well in both the average and the worst case.

Motivation
- Collisions in a hash table
  - A collision increases access time and makes performance non-deterministic.
  - Well-known collision resolution policies maintain good average-case performance. Nevertheless, at high loads with frequent collisions, worst-case performance degrades sharply and becomes highly non-deterministic.
- The problem of non-determinism
  - It can considerably hurt the performance and scalability of hash tables in multi-threaded parallel systems.
  - Each thread performs hash table lookups with the same algorithm, but lookup times differ because of the non-determinism.
  - The slowest thread becomes the bottleneck and determines the overall throughput of the system.
- Hence, it is critical to make hash operations faster and more deterministic.

Motivation
- Large memory requirements of hash tables
  - Because of their large memory requirements, hash tables are often stored not in small high-speed memory (e.g. on-chip SRAM) but in slower off-chip DRAM.
  - To achieve high speed and determinism, it is important to minimize the memory and bandwidth requirements of hash tables.

Background
- Multiple-choice hashing
  - A simple and efficient technique that places each element in one of d ≥ 2 possible buckets of the hash table.
  - It distributes elements more evenly among the buckets than traditional schemes that use a single hash function, which reduces both the average-case and worst-case costs of the hash table.
[Figure: a key hashed into the hash table]
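A minimal Python sketch of multiple-choice hashing may make the idea concrete. It assumes a "least-loaded bucket" placement rule and derives the d hash functions by salting BLAKE2b; the names bucket_index, insert_d_choice, lookup_d_choice, and max_load are illustrative and do not come from the paper.

```python
import hashlib

def bucket_index(key: str, seed: int, num_buckets: int) -> int:
    """Hash `key` with the seed-th hash function into [0, num_buckets)."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets

def insert_d_choice(table, key, d=2):
    """Place `key` in the least-loaded of its d candidate buckets (assumed policy)."""
    candidates = [bucket_index(key, i, len(table)) for i in range(d)]
    best = min(candidates, key=lambda b: len(table[b]))
    table[best].append(key)
    return best

def lookup_d_choice(table, key, d=2):
    """A lookup has to inspect all d candidate buckets."""
    return any(key in table[bucket_index(key, i, len(table))] for i in range(d))

def max_load(num_keys, num_buckets, d):
    """Fill a fresh table and report the size of its fullest bucket."""
    table = [[] for _ in range(num_buckets)]
    for i in range(num_keys):
        insert_d_choice(table, f"key-{i}", d)
    return max(len(bucket) for bucket in table)
```

Comparing max_load(n, n, 1) with max_load(n, n, 2) for a few values of n is one way to observe the more even distribution that the slide describes.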

Background
- Bloom filters
  - A simple, space-efficient randomized data structure that represents a set and supports fast approximate membership queries.
  - Because of these properties, Bloom filters are well suited to representing the summary of a hash table.
  - A standard Bloom filter allows easy insertion but not deletion, so a Counting Bloom Filter (CBF) is used.
- The false positive probability is a function of n (number of elements), m (number of bits in the vector), and k (number of hash functions).
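As an illustration, here is a small Counting Bloom Filter sketch in Python. The class and function names are hypothetical, the k hash functions are simulated by salting BLAKE2b, and false_positive_rate uses the standard textbook approximation f ≈ (1 - e^(-kn/m))^k, which may differ in form from the expression shown on the original slide.

```python
import hashlib
import math

class CountingBloomFilter:
    """Bloom filter with one counter per position, so deletion is possible."""

    def __init__(self, m: int, k: int):
        self.m = m                 # number of counters
        self.k = k                 # number of hash functions
        self.counters = [0] * m

    def _positions(self, item: str):
        # Simulate k independent hash functions with k different salts.
        for i in range(self.k):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Only meaningful for items that were inserted earlier; deleting a
        # false-positive item would corrupt the counters.
        if item in self:
            for p in self._positions(item):
                self.counters[p] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))

def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation (1 - e^(-k*n/m))^k for n elements, m bits, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

For example, false_positive_rate(1000, 16000, 11) corresponds to the m/n = 16, k = 11 configuration that appears in the experimental slides and evaluates to a value on the order of 10^-4.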

Background
- Off-chip memory accesses are expensive and memory bandwidth is scarce, so these schemes keep a small summary in on-chip memory to significantly reduce off-chip accesses to the underlying multiple-choice hash table.
- On-chip Bloom filters filter out most unnecessary off-chip accesses, which improves lookup performance.
[Figure: Bloom filter (on-chip) in front of the hash table (off-chip)]

Background
- Collision-free hashing
  - A promising way to combat non-determinism and non-randomness.
  - Hashes each element to a unique bucket of the hash table, without any collision.
- A collision-free hashing scheme
  - It is a variant of multiple-choice hashing.
  - Each element is given a few additional bits, a c-bit value called the discriminator, and the element plus its discriminator is mapped by a single hash function to one possible bucket.
  - Without knowledge of the discriminator, this scheme needs 2^c memory accesses per lookup to check a query, which lowers throughput and raises bandwidth requirements.
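The discriminator idea can be sketched in Python as follows. This is a simplified illustration, not the paper's construction: it assumes a cuckoo-style random-walk insertion to assign discriminators, and the names bucket, build_cht, and lookup_all_discriminators are hypothetical.

```python
import hashlib
import random

def bucket(key: str, disc: int, num_buckets: int) -> int:
    """Single hash function applied to the key plus its discriminator."""
    digest = hashlib.blake2b(
        key.encode(), digest_size=8, salt=disc.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets

def build_cht(keys, num_buckets, c, max_kicks=500, seed=0):
    """Assign each (distinct) key a c-bit discriminator so that no two keys
    share a bucket. Returns (table, disc); table[b] holds at most one key."""
    rng = random.Random(seed)
    table = [None] * num_buckets
    disc = {}
    for key in keys:
        cur = key
        for _ in range(max_kicks):
            choices = [(d, bucket(cur, d, num_buckets)) for d in range(1 << c)]
            free = [(d, b) for d, b in choices if table[b] is None]
            if free:
                d, b = free[0]
                table[b], disc[cur] = cur, d
                break
            # All candidate buckets are taken: evict one occupant and retry it.
            d, b = rng.choice(choices)
            victim = table[b]
            table[b], disc[cur] = cur, d
            cur = victim
        else:
            raise RuntimeError("construction failed; increase num_buckets or c")
    return table, disc

def lookup_all_discriminators(table, key, c):
    """Lookup without a discriminator table: probe up to 2**c buckets."""
    accesses = 0
    for d in range(1 << c):
        accesses += 1
        if table[bucket(key, d, len(table))] == key:
            return True, accesses
    return False, accesses
```

A query for an absent key always costs 2^c probes in this sketch, which is the overhead that an on-chip discriminator table is meant to remove.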

Background
- In this paper, we employ the Cuckoo hashing scheme to construct a collision-free hash table (CHT).

Our approach
- In this paper, we propose two approaches to constructing an efficient discriminator table for fast and deterministic hash table lookup.
- The first scheme directly uses a single Bloom filter to construct the discriminator table. It eliminates most unnecessary off-chip memory accesses and improves collision-free lookup performance, but it still needs 2^c memory accesses per lookup.
- The second scheme uses Discriminative Bloom Filters (DBFs). The DBF is stored in on-chip memory; it not only filters out irrelevant off-chip memory accesses but also identifies a possible discriminator value for a queried element.
  - This scheme performs a single memory access per lookup instead of 2^c.

Direct approach using a single Bloom filter
- The Direct Collision-free Hash Table (DCHT) is composed of a front-end on-chip Bloom filter and an underlying off-chip CHT.
- The Bloom filter is used to construct a discriminator table, eliminating most unnecessary off-chip memory accesses to the underlying CHT.
- For an irrelevant element that is not in the CHT, the Bloom filter can drop the lookup, significantly reducing off-chip memory accesses.
- For an element that passes the Bloom filter, however, DCHT still requires 2^c off-chip memory accesses to check for it,
  - because the Bloom filter cannot identify a unique discriminator value for the element.
- Because of Bloom filter false positives, DCHT also requires many additional off-chip memory accesses to validate matches, which limits lookup throughput.

Direct approach using a single Bloom filter
- Bloomier filter
  - This solution can provide a possible discriminator value for a queried element, resulting in one off-chip memory access per lookup.
  - However, it has two issues: large memory requirements and poor support for dynamically changing elements.
  - First, each bucket of the Bloomier filter needs at least c bits to store a c-bit discriminator, instead of the single bit of a standard Bloom filter.
  - Second, the Bloomier filter supports only a static set of elements.

Fast approach using DBF
- This scheme uses a DBF and a CHT to implement a fast and deterministic hash table called the Fast Collision-free Hash Table (FCHT).
- Discriminative Bloom Filter (DBF)
  - Instead of a single Bloom filter, a DBF comprises an array of parallel Bloom filters organized by discriminator value.
  - The DBF serves as a summary from which an efficient discriminator table is built; it not only eliminates most unnecessary off-chip memory accesses but also identifies a possible discriminator value for a queried element.
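The DBF structure described above can be sketched in Python as an array of 2^c Bloom filters, one per discriminator value. This is a minimal illustration under assumptions: standard (non-counting) Bloom filters, one byte per bit for readability, and hypothetical class names BloomFilter and DiscriminativeBloomFilter.

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)   # one byte per bit, for clarity only

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))

class DiscriminativeBloomFilter:
    """Array of 2**c Bloom filters; filter d summarizes the keys whose
    discriminator in the collision-free hash table equals d."""

    def __init__(self, c: int, m: int, k: int):
        self.filters = [BloomFilter(m, k) for _ in range(1 << c)]

    def add(self, key: str, disc: int) -> None:
        self.filters[disc].add(key)

    def candidates(self, key: str):
        """Discriminator values whose filter reports `key`.
        Empty list: skip the off-chip access entirely.
        One value: a single off-chip probe suffices.
        Several values: false positives force extra probes."""
        return [d for d, bf in enumerate(self.filters) if key in bf]
```

In hardware the 2^c filters would be queried in parallel; the list comprehension above only models the result of that parallel query.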

Fast approach using DBF
- Using Counting Bloom filters
  - To handle incremental updates of FCHT, the on-chip DBF is composed of an array of parallel CBFs rather than standard Bloom filters.
  - However, CBFs require more memory.
  - Several techniques (Bonomi et al., 2006a; Bonomi et al., 2006b; Hua et al., 2008; Ficara et al., 2008) have been proposed for reducing the space required, generally at the cost of additional computation and shuffling of memory, while still keeping constant worst-case time bounds on the primitive operations.
  - Some of these efforts (Hua et al., 2008; Ficara et al., 2008) exploit a hierarchical structure to compress the large amount of space wasted on zero counters.
- Using Cuckoo hashing

Fast approach using DBF: incremental update of FCHT

Fast approach using DBF
- False positive probability analysis
  - The DBF may produce multiple possible discriminator values for an element. FCHT then needs additional memory accesses to the underlying CHT to find the exact match.
  - The expected number of memory accesses E is expressed in terms of f, the false positive probability.
  - The analysis of the Cuckoo hashing scheme (Pagh and Rodler, 2004) has shown that a small constant value of c suffices if M is slightly greater than n.
  - For example, if M = 1.1n, then c = 2 can ensure a perfect matching with high probability.
  - Recent work (Kumar et al., 2007; Ficara et al., 2009) has also shown that when M = n, a discriminator of O(log log n) bits can guarantee that a perfect hash table exists and that it supports fast updates.
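The expression for E is not preserved in the transcript. One plausible model, offered here as an assumption rather than as the paper's formula, treats the 2^c filters as independent with the same false positive probability f and counts one off-chip access per candidate discriminator reported by the DBF:

```latex
% Assumed model: 2^c independent filters, each with false positive probability f.
% Stored element: its own filter always answers, the other 2^c - 1 may answer falsely.
E_{\text{stored}} \le 1 + (2^{c} - 1)\, f
% Absent element: every reported candidate is a false positive.
E_{\text{absent}} = 2^{c} f
% Illustrative numbers (hypothetical): c = 2, f = 10^{-3}
E_{\text{stored}} \le 1 + 3 \times 10^{-3} = 1.003, \qquad
E_{\text{absent}} = 4 \times 10^{-3}
```

The bound for stored elements is reached only if every candidate is probed; a lookup that stops at the first exact match needs fewer accesses on average.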

Network application of DBFs
- We explore two network functions that use DBFs in high-speed routers: IP route lookup and deep packet inspection (DPI).
- Parallel Bloom Filters (PBFs)
  - This solution consists of an on-chip PBF and an array of off-chip hash tables.
  - The PBF is an array of standard Bloom filters organized by rule length, e.g. prefix length for IP route lookup and signature string length for DPI.
  - All rules in a database are partitioned by length into subsets, and each subset of equal-length rules is inserted into both the corresponding Bloom filter of the PBF and an off-chip hash table.
  - Each on-chip Bloom filter of the PBF is paired with one hash table, which uses a single hash function and validates the match.

Network application of DBFs
- To reduce the off-chip memory accesses of the PBF-based solution, we propose a novel DBF-based architecture for high-speed IP route lookup and DPI.

Experimental results
- Two categories of experiments are used for performance evaluation.
- In the first, we synthesize a storage set that is inserted into a hash table and a testing set that is queried against it. The testing set contains ten times as many elements as the storage set. Each element is a 4-byte string generated at random from the alphabet {'a'-'z', 'A'-'Z'}. Between 20% and 80% of the testing set consists of true elements, i.e. elements stored in the storage set.
- In the second, we obtain a storage set of equal-sized IP prefixes and Snort signatures from real-world networks, and synthesize a testing set of IP addresses and payload strings for query, in which 40% and 80% of the elements are true elements.
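The synthetic workload of the first experiment could be generated roughly as follows. This is a sketch of the setup as the slide describes it, not the authors' generator; the function name make_workload, the seeding, and the choice to draw true elements with replacement are assumptions.

```python
import random
import string

def make_workload(num_stored: int, true_fraction: float, seed: int = 0):
    """Return (storage_set, testing_list).

    storage_set : num_stored distinct 4-character strings over a-z, A-Z
    testing_list: 10 * num_stored queries, of which `true_fraction`
                  are drawn from the storage set (with replacement)
    """
    rng = random.Random(seed)
    alphabet = string.ascii_letters          # 'a'-'z' and 'A'-'Z'

    def random_element() -> str:
        return "".join(rng.choice(alphabet) for _ in range(4))

    storage = set()
    while len(storage) < num_stored:
        storage.add(random_element())

    stored_list = sorted(storage)            # fixed order keeps runs reproducible
    num_queries = 10 * num_stored
    num_true = round(true_fraction * num_queries)

    queries = [rng.choice(stored_list) for _ in range(num_true)]
    while len(queries) < num_queries:
        candidate = random_element()
        if candidate not in storage:         # false elements must be absent
            queries.append(candidate)
    rng.shuffle(queries)
    return storage, queries
```

Calling make_workload(n, 0.2) through make_workload(n, 0.8) would cover the 20% to 80% range of true elements mentioned on the slide.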

Experimental results
[Figure; m: bucket size, n: element size]

Experimental results
- Notation: DCHT(16, 11) means m/n = 16 and k = 11.

Experimental results: update overhead
[Figure panels: Deletion, Insertion]

Experimental results