Cuckoo Filter: Practically Better Than Bloom Author: Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher Publisher: ACM CoNEXT 2014 Presenter:

Presentation transcript:

Cuckoo Filter: Practically Better Than Bloom Author: Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher Publisher: ACM CoNEXT 2014 Presenter: Yi-Hao Lai Date: 2015/10/14 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Introduction
Many databases, caches, routers, and storage systems use approximate set membership tests to decide if a given item is in a (usually large) set, with some small false positive probability. The most widely used data structure for this test is the Bloom filter, which has been studied extensively due to its memory efficiency. A limitation of standard Bloom filters is that one cannot remove existing items without rebuilding the entire filter (or possibly introducing generally less desirable false negatives).
National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction
We propose the cuckoo filter, a practical data structure that provides four major advantages:
1. It supports adding and removing items dynamically.
2. It provides higher lookup performance than traditional Bloom filters, even when close to full (e.g., 95% space utilized).
3. It is easier to implement than alternatives such as the quotient filter.
4. It uses less space than Bloom filters in many practical applications, if the target false positive rate ε is less than 3%.

Bloom filter
A Bloom filter provides a compact representation of a set of items and supports two operations: Insert and Lookup. It allows a tunable false positive rate ε, so that a query returns either "definitely not" or "probably yes". The lower ε is, the more space the filter requires.

Bloom filter (insert)
[Figure: each input is hashed by hash I and hash II; the set grows from { 13 } to { 13, 22 }.]

Bloom filter (lookup)
[Figure: input 16 is hashed by hash I and hash II and checked against the filter built from the set { 13, 22, 6, 2 }; the possible answers are "definitely not" and "probably yes".]
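The insert and lookup operations on the two slides above can be sketched as follows. This is a minimal illustration, not the presenter's code: the class name `BloomFilter`, the 64-bit array size, and the use of salted SHA-256 as "hash I" and "hash II" are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Assumption: derive k positions by salting SHA-256 with the hash
        # index. The slides do not say which hash functions were used.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def lookup(self, item: str) -> bool:
        # False means "definitely not"; True means "probably yes".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m_bits=64, k_hashes=2)
for value in (13, 22, 6, 2):
    bf.insert(str(value))
print(bf.lookup("13"))  # an inserted item is never reported absent
print(bf.lookup("16"))  # False = definitely not, True = a false positive
```

Because bits are only ever set, there is no way to undo one insertion without possibly clearing a bit that another item depends on, which is the deletion limitation mentioned in the introduction.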

Bloom filter

Bloom filter and Variants

Cuckoo Hash Tables
A basic cuckoo hash table consists of an array of buckets, where each item has two candidate buckets determined by hash functions h1(x) and h2(x). The lookup procedure checks both buckets to see if either contains the item. The table supports insert and delete.
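A minimal sketch of the basic scheme described above, with one item per bucket. The class name `CuckooHashTable`, the salted SHA-256 hash functions standing in for h1 and h2, and the `max_kicks` limit are assumptions for illustration only.

```python
import hashlib
import random


class CuckooHashTable:
    """Basic cuckoo hash table: one item per bucket, two candidate buckets."""

    def __init__(self, num_buckets: int, max_kicks: int = 500) -> None:
        self.n = num_buckets
        self.max_kicks = max_kicks
        self.buckets = [None] * num_buckets

    def _h(self, salt: str, item) -> int:
        digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def _candidates(self, item):
        # h1(x) and h2(x): the only two buckets where x may live.
        return self._h("h1", item), self._h("h2", item)

    def lookup(self, item) -> bool:
        return any(self.buckets[i] == item for i in self._candidates(item))

    def delete(self, item) -> bool:
        for i in self._candidates(item):
            if self.buckets[i] == item:
                self.buckets[i] = None
                return True
        return False

    def insert(self, item) -> bool:
        if self.lookup(item):
            return True
        i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if self.buckets[i] is None:
                self.buckets[i] = item
                return True
        # Both buckets are taken: evict a resident and move it to its
        # alternate bucket, repeating until an empty bucket is found.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            item, self.buckets[i] = self.buckets[i], item
            a, b = self._candidates(item)
            i = b if i == a else a
            if self.buckets[i] is None:
                self.buckets[i] = item
                return True
        # Gave up: the item still in hand is dropped in this sketch.
        # A real table would rehash or grow instead.
        return False


table = CuckooHashTable(num_buckets=1024)
for key in range(10):
    table.insert(key)
print(all(table.lookup(key) for key in range(10)))
```

Lookup and delete inspect at most two buckets regardless of how many items are stored; only insertion may need a chain of relocations.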

Cuckoo Hash Tables (insert)

Cuckoo Hash Tables
Cuckoo hashing ensures high space occupancy because it refines earlier item-placement decisions when inserting new items. Most practical implementations of cuckoo hashing extend the basic description above by using buckets that hold multiple items. With proper configuration of cuckoo hash table parameters, the table space can be 95% filled with high probability.

Cuckoo Filter
Cuckoo filters improve hash table performance with an optimization called partial-key cuckoo hashing. To reduce the hash table size, each item is first hashed into a constant-sized fingerprint before being inserted into the hash table. The basic unit of the cuckoo hash tables used for our cuckoo filters is called an entry. Each entry stores one fingerprint. The hash table consists of an array of buckets, where a bucket can have multiple entries.

Cuckoo Filter (insert)

Cuckoo Filter (lookup)

Cuckoo Filter (delete)
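The figures for these three slides did not survive in the transcript, so the following is a hedged sketch of insert, lookup, and delete built on the fingerprint-and-bucket layout described earlier. It assumes the usual partial-key construction, in which the alternate bucket is the current bucket index XORed with a hash of the fingerprint, so that a stored fingerprint can be relocated without the original item. The class name, parameter defaults, and SHA-256-based hashing are illustrative choices, not taken from the slides.

```python
import hashlib
import random


def _hash64(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8,
                 max_kicks=500):
        # XOR-based alternate indices need a power-of-two bucket count.
        assert num_buckets & (num_buckets - 1) == 0
        assert 1 <= fp_bits <= 32
        self.n, self.b, self.f = num_buckets, bucket_size, fp_bits
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        return _hash64(b"fp:" + item.encode()) & ((1 << self.f) - 1)

    def _alt_index(self, i: int, fp: int) -> int:
        # Partial-key step: depends only on the bucket and the fingerprint.
        return i ^ (_hash64(b"alt:" + fp.to_bytes(4, "big")) % self.n)

    def _indices(self, item: str):
        fp = self._fingerprint(item)
        i1 = _hash64(b"idx:" + item.encode()) % self.n
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: relocate existing fingerprints.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.b)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.b:
                self.buckets[i].append(fp)
                return True
        return False  # filter considered full

    def lookup(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        # Only safe for items that were actually inserted.
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = CuckooFilter()
for n in range(100):
    cf.insert(f"item-{n}")
print(all(cf.lookup(f"item-{n}") for n in range(100)))
```

Deletion removes a single matching fingerprint, so two items that share a bucket and a fingerprint remain indistinguishable; removing one of them still leaves the other findable.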

Asymptotic Behavior

Minimum Fingerprint Size

Minimum Fingerprint Size

Minimum Fingerprint Size

Minimum Fingerprint Size

Empirical Evaluation
For the experiments, we varied the fingerprint size f from 1 to 20 bits. Random 64-bit keys are inserted into an empty filter until a single insertion relocates existing fingerprints more than 500 times.

Space Optimization
Although each entry of the hash table stores one fingerprint, not all entries are occupied. As a result, each item effectively costs more to store than a fingerprint. The amortized space cost C for each item is C = f / α bits, where f is the fingerprint length in bits and α is the load factor (the fraction of entries that are occupied).

Optimal Bucket Size
Larger buckets improve table occupancy. The load factor α is 50% when the bucket size b = 1, but increases to 84%, 95%, or 98% with bucket size b = 2, 4, or 8, respectively.
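Higher occupancy is not free, because a lookup compares against up to 2b stored fingerprints and so needs longer fingerprints to hold the same false positive rate. The sketch below makes that trade-off concrete under two assumptions that are not spelled out on this slide: the false positive rate is approximated by ε ≈ 2b / 2^f, and the per-item cost is the fingerprint length divided by the load factor. The function names are hypothetical.

```python
import math

# Load factors quoted on the slide for each bucket size b.
LOAD_FACTOR = {1: 0.50, 2: 0.84, 4: 0.95, 8: 0.98}


def min_fingerprint_bits(b: int, eps: float) -> int:
    # Assumed bound: eps ~ 2b / 2^f, hence f >= log2(2b / eps).
    return math.ceil(math.log2(2 * b / eps))


def bits_per_item(b: int, eps: float) -> float:
    # Assumed amortized cost: fingerprint bits divided by load factor.
    return min_fingerprint_bits(b, eps) / LOAD_FACTOR[b]


for b in sorted(LOAD_FACTOR):
    f = min_fingerprint_bits(b, 0.01)
    print(b, f, round(bits_per_item(b, 0.01), 2))
```

Under these assumptions and a 1% target rate, b = 4 gives the lowest cost of the four options (about 10.5 bits per item), ahead of b = 2 (about 10.7), b = 8 (about 11.2), and b = 1 (16).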

Optimal Bucket Size

Semi-sorting Buckets
This subsection describes a technique for cuckoo filters with b = 4 entries per bucket that saves one bit per item. Assume each bucket contains b = 4 fingerprints and each fingerprint is f = 4 bits. An uncompressed bucket occupies 4 × 4 = 16 bits. If we sort all four 4-bit fingerprints stored in this bucket, there are only 3876 possible outcomes in total. If these outcomes are precomputed and stored in a table, each bucket can be represented by a 12-bit index into that table (2^12 = 4096 ≥ 3876) instead of 16 bits, which saves 4 bits per bucket, or 1 bit per fingerprint.
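The count of 3876 is the number of multisets of size 4 drawn from 16 values, C(19, 4). A small sketch that enumerates them and encodes a bucket as a table index follows. The helper names are hypothetical, and the sketch assumes every bucket holds exactly four fingerprints; how empty entries are represented is left out.

```python
from itertools import combinations_with_replacement

# Every sorted bucket of four 4-bit fingerprints, in lexicographic order.
SORTED_BUCKETS = list(combinations_with_replacement(range(16), 4))
INDEX_OF = {bucket: i for i, bucket in enumerate(SORTED_BUCKETS)}


def encode_bucket(fingerprints):
    """Map four 4-bit fingerprints (any order) to a table index."""
    return INDEX_OF[tuple(sorted(fingerprints))]


def decode_bucket(index):
    """Recover the sorted fingerprints from a table index."""
    return SORTED_BUCKETS[index]


print(len(SORTED_BUCKETS))              # 3876
print(len(SORTED_BUCKETS) <= 2 ** 12)   # True: a 12-bit index suffices
print(decode_bucket(encode_bucket([9, 3, 3, 15])))  # (3, 3, 9, 15)
```

The order of fingerprints inside a bucket carries no information for lookup, which is why discarding it through sorting loses nothing.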

Space and lookup cost

Comparison with Bloom filter
Space Efficiency
Number of Memory Accesses (Bloom filter: k = 2 when ε = 25%, but k is 7 when ε = 1%)
Value Association
Maximum Capacity
Limited Duplicates
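The two k values quoted for the Bloom filter match the standard space-optimal setting k = log2(1/ε), rounded up. The sketch below reproduces them and also reports the matching bits per item, log2(1/ε) / ln 2 ≈ 1.44 · log2(1/ε). Both formulas are the textbook results for a space-optimal Bloom filter rather than something stated on the slide, and the function names are hypothetical.

```python
import math


def bloom_optimal_k(eps: float) -> int:
    # Number of hash functions (memory probes) for a space-optimal filter.
    return math.ceil(math.log2(1 / eps))


def bloom_bits_per_item(eps: float) -> float:
    # Bits per item for a space-optimal Bloom filter.
    return math.log2(1 / eps) / math.log(2)


for eps in (0.25, 0.01):
    print(eps, bloom_optimal_k(eps), round(bloom_bits_per_item(eps), 2))
```

A Bloom filter lookup therefore touches more memory locations as ε shrinks, whereas a cuckoo filter lookup as sketched earlier reads two buckets whatever the target rate.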

Experiment

Experiment

Experiment

Experiment

Experiment

Experiment