
An Improved Construction for Counting Bloom Filters Flavio Bonomi Michael Mitzenmacher Rina Panigrahy Sushil Singh George Varghese Presented by: Sailesh Kumar

2 - Sailesh Kumar - 5/23/2015 - Bloom Filter
- Store a set S = {x_1, x_2, x_3, …, x_n} drawn from some universe U, so that we can answer queries of the form:
  » Is x a member of S?
- A Bloom filter is a technique that can answer this with:
  » A small amount of space, independent of element size
  » Constant query time
  » A false positive probability (some probability of a wrong answer)
- An alternative to hashing with some interesting trade-offs

3 - Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, …, Hk into the m-bit array.]

4 - Bloom Filter
[Figure: element Y is hashed by H1, H2, H3, H4, …, Hk into the m-bit array.]

5 - Bloom Filter
[Figure: query for X; H1, H2, H3, H4, …, Hk probe the m-bit array: match.]

6 - Bloom Filter
[Figure: query for W; H1, H2, H3, H4, …, Hk probe the m-bit array: match (false positive).]
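The insert and query steps shown in the figures above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the class name `BloomFilter` and the use of salted SHA-256 as a stand-in for H1 … Hk are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Assumption: H1..Hk are simulated by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Insert: set every bit the k hash functions point at.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Query: a match requires all k probed bits to be set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=4)
bf.add("X")
bf.add("Y")
print("X" in bf)  # inserted elements always match (no false negatives)
print("W" in bf)  # usually a miss, but can be a false positive
```

An element that was never inserted, such as W, matches only if all k of its positions happen to have been set by other elements.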

7 - How many Hash Functions?
- k = no. of hash functions
- n = total no. of elements
- m = no. of bits in the array
- The objective is to pick k so that the false positive probability (fpp) is minimized
- It is fairly simple to derive that the optimum is k = (ln 2) m/n
  » For the optimal k, the fpp is approx. (0.6185)^(m/n)
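The slide states the result without the derivation. A sketch of the standard argument follows, under the usual assumption (not stated on the slide) that the k hash functions are independent and uniform over the m positions; p and f(k) are names introduced here for the probability that a bit is still 0 and for the false positive probability.

```latex
\begin{align*}
p &= \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m}
  && \text{(a given bit is still 0 after $n$ insertions)} \\
f(k) &= (1 - p)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}
  && \text{(all $k$ probed bits are 1)} \\
\frac{d}{dk}\,\ln f(k) &= 0
  \;\Longrightarrow\; e^{-kn/m} = \tfrac{1}{2}
  \;\Longrightarrow\; k = (\ln 2)\,\frac{m}{n} \\
f_{\min} &= \left(\tfrac{1}{2}\right)^{(\ln 2)\,m/n}
  = \left(2^{-\ln 2}\right)^{m/n} \approx (0.6185)^{m/n}
\end{align*}
```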

8 - How many Hash Functions?
Example: for m/n = 8, the optimal k = 8 ln 2 ≈ 5.5
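Since k must be an integer in practice, the m/n = 8 example can be checked numerically for the two neighbouring values. The sketch below uses the usual approximation fpp ≈ (1 − e^(−kn/m))^k; the function names are illustrative.

```python
import math


def optimal_k(bits_per_element: float) -> float:
    """Real-valued optimum k = (ln 2) * m/n."""
    return math.log(2) * bits_per_element


def false_positive_rate(bits_per_element: float, k: int) -> float:
    """Approximate fpp (1 - e^(-k n/m))^k for k hash functions."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


ratio = 8  # m/n, as in the slide
print(f"optimal k = {optimal_k(ratio):.3f}")
for k in (5, 6):  # the integers on either side of the optimum
    print(f"k = {k}: fpp = {false_positive_rate(ratio, k):.4f}")
print(f"(0.6185)^(m/n) = {0.6185 ** ratio:.4f}")
```

Both integer choices land close to the bound (0.6185)^8, so rounding k costs little.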

9 - Counting Bloom Filter
- Bloom filters do not support deletes
  » Use a counting Bloom filter instead
- Use counters instead of bits in the array
  » Instead of setting bits, increment the counters
- During a query, a counter > 0 means the corresponding bit is set

10 - Counting Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, …, Hk into the m-counter array.]

11 - Bloom Filter
[Figure: element Y is hashed by H1, H2, H3, H4, …, Hk into the m-counter array.]
Deletes are straightforward: just decrement the counters.
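A minimal sketch of the counting variant, showing increment on insert and decrement on delete. The class name, the salted SHA-256 stand-in for H1 … Hk, and the choice to let a saturated counter stay stuck at its maximum are assumptions made for this example, not details from the slides.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with small counters instead of bits, so deletes work."""

    def __init__(self, m: int, k: int, counter_bits: int = 4) -> None:
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: H1..Hk are simulated by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < self.max_count:
                self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only call this for items that were actually added.
        for pos in self._positions(item):
            # Design choice for this sketch: a saturated counter is left
            # alone, since its true value is no longer known.
            if 0 < self.counters[pos] < self.max_count:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # counter > 0 plays the role of "bit is set".
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=4)
cbf.add("X")
cbf.add("Y")
cbf.remove("X")
print("Y" in cbf)  # Y is unaffected by deleting X
```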

12 - Improved Counting Bloom Filter
- 4-bit counters ensure, with very high probability, that counters do not overflow
  » 4x increase in space compared to a Bloom filter
- Construct an alternative that is 2 times more compact than the CBF
  » Based on d-left hashing and a fingerprinting technique
- We need to understand d-left hashing and fingerprinting

13 - Fingerprinting
- Temporarily assume that we have a perfect hash function h
  » Use some random function to compute c-bit fingerprints
  » F : U -> [2^c]
  » False positive prob. = 1/2^c
  » 2x more compact than a Bloom filter
  » It is not easy to compute the perfect hash function h
    – Use near-perfect hashing (d-left)
[Figure: h maps Element 1 … Element 5 to cells holding Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3).]

14 - d-left hashing
- Use d equal-sized tables
- Use d different hash functions and choose one bucket from each table
- A bucket can store multiple elements
- Store the element in the least loaded bucket (break ties to the left)
- Interesting properties:
  » Very small maximum load, O(log log n)
  » Maximum load is close to the average load even for small d such as 4
  » 80% space utilization with d = 4
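The insertion policy above can be tried out with a small simulation. This is a sketch under stated assumptions: random bucket choices stand in for the d hash functions, and the default sizes (4 tables, 2048 buckets, 3×2^14 items) are borrowed from the simulation slide later in the talk. The function names are illustrative.

```python
import random


def d_left_insert(tables, choices):
    """Place one item; tables[t][b] is the load of bucket b in table t.

    choices holds one candidate bucket index per table. min() returns the
    first minimum it sees, which breaks ties to the left.
    """
    best = min(range(len(tables)), key=lambda t: tables[t][choices[t]])
    tables[best][choices[best]] += 1
    return best


def simulate(d=4, buckets=2048, items=3 * 2**14, seed=1):
    rng = random.Random(seed)
    tables = [[0] * buckets for _ in range(d)]
    for _ in range(items):
        # Assumption: uniform random choices stand in for d hash functions.
        choices = [rng.randrange(buckets) for _ in range(d)]
        d_left_insert(tables, choices)
    loads = [load for table in tables for load in table]
    return max(loads), sum(loads) / len(loads)


max_load, avg_load = simulate()
print(f"average load = {avg_load}, maximum load = {max_load}")
```

Comparing the printed maximum with the average gives a feel for how tightly d-left hashing balances the buckets.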

15 - Improved Counting Bloom Filter
- Use d-left hashing
- d hash tables, each containing B buckets
  » Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter
- In order to store an element, we compute its fingerprint
  » The fingerprint consists of two components
    – Bucket index in [1, B]
    – Remainder in [1, R], thus log_2 R bits, stored explicitly
  » We use a separate bucket index for each table but identical remainders
  » Use the d-left insertion policy and augment fingerprints with counters; if the fingerprint matches a stored one, increment its counter

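16 - Improved Counting Bloom Filter
Example with two tables, reading each pair as (bucket, remainder) for one table:
- Element x: H(x) = (3, 7), (4, 7); we store the element in the first table (remainder 7)
- Element y: H(y) = (1, 5), (5, 5); we store the element in the first table (remainder 5)
- Element z: H(z) = (1, 7), (4, 7); we store the element in the second table (remainder 7)
Now, if we try to delete x, we do not know whether the fingerprint in table 1 or the one in table 2 has to be removed: remainder 7 is present in both of x's candidate buckets.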
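The ambiguity can be reproduced directly from the numbers in the example. The sketch below assumes each pair is (bucket, remainder) and models a table as a mapping from bucket index to the set of remainders stored there; both are illustrative choices.

```python
# State after inserting x, y and z as in the example.
table1 = {3: {7}, 1: {5}}  # x stored at (3, 7), y stored at (1, 5)
table2 = {4: {7}}          # z stored at (4, 7)

# Candidate locations of x: (3, 7) in table 1 and (4, 7) in table 2.
x_candidates = [(table1, 3, 7), (table2, 4, 7)]

matches = [
    index + 1
    for index, (table, bucket, remainder) in enumerate(x_candidates)
    if remainder in table.get(bucket, set())
]
print(matches)  # [1, 2]: both tables hold a matching remainder
```

Removing the match in table 2 would silently delete z instead of x, which is why the naive scheme cannot support deletes.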

17 - Improved Counting Bloom Filter
- Solve the problem by breaking the hash operation into 2 phases
- 1st phase: compute a single true fingerprint
- 2nd phase: to obtain the d locations, apply permutations P_1, …, P_d to the true fingerprint
- A permutation of a set is a one-to-one map of the set onto itself
- This simple modification enables proper delete operations

18 - Improved Counting Bloom Filter
- Claim. When deleting an element in the set, only one remainder corresponding to the element will exist in the table.
- Proof:
  » Suppose not. Then there is some element x ∈ S to be deleted whose remainder is stored in table j, and at the same time another element y ∈ S such that P_i(f_x) = P_i(f_y) is stored in table i, for some i ≠ j.
  » Since the P_i are permutations, we must have f_x = f_y, so x and y share the same true fingerprint.
  » Suppose x was inserted before y; in this case, when y is inserted, the counter in table j associated with the remainder of x would be incremented instead of a second copy being stored in table i, contradicting our assumption.
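The two-phase scheme and the delete it enables can be sketched as follows. Several things here are assumptions for illustration only: the class name `DLeftCBF`, SHA-256 as the source of the true fingerprint, the specific odd multipliers, and the use of multiplication by an odd constant modulo a power of two as the permutations P_1 … P_d. Bucket capacity and counter width are not enforced, to keep the sketch short.

```python
import hashlib


class DLeftCBF:
    """Sketch of a d-left counting Bloom filter with permuted fingerprints."""

    # Hypothetical odd constants: multiplying by an odd number modulo 2^w
    # is a one-to-one map on [0, 2^w), so each one defines a permutation.
    MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F)

    def __init__(self, bucket_bits: int = 11, remainder_bits: int = 14) -> None:
        self.r = remainder_bits
        self.mask = (1 << (bucket_bits + remainder_bits)) - 1
        # tables[t][bucket] maps remainder -> counter.
        self.tables = [
            [dict() for _ in range(1 << bucket_bits)] for _ in self.MULTIPLIERS
        ]

    def _locations(self, item: str):
        # Phase 1: a single true fingerprint.
        digest = hashlib.sha256(item.encode()).digest()
        f = int.from_bytes(digest[:8], "big") & self.mask
        # Phase 2: P_i(f) split into (bucket, remainder) for table i.
        permuted = [(a * f) & self.mask for a in self.MULTIPLIERS]
        return [(p >> self.r, p & ((1 << self.r) - 1)) for p in permuted]

    def _find(self, locs):
        for t, (bucket, rem) in enumerate(locs):
            if rem in self.tables[t][bucket]:
                return t
        return None

    def add(self, item: str) -> None:
        locs = self._locations(item)
        t = self._find(locs)
        if t is None:  # no existing copy: least loaded bucket, ties to the left
            t = min(range(len(locs)), key=lambda i: len(self.tables[i][locs[i][0]]))
        bucket, rem = locs[t]
        cells = self.tables[t][bucket]
        cells[rem] = cells.get(rem, 0) + 1

    def remove(self, item: str) -> bool:
        locs = self._locations(item)
        t = self._find(locs)
        if t is None:
            return False
        bucket, rem = locs[t]
        cells = self.tables[t][bucket]
        cells[rem] -= 1
        if cells[rem] == 0:
            del cells[rem]
        return True

    def __contains__(self, item: str) -> bool:
        return self._find(self._locations(item)) is not None


dl = DLeftCBF()
for item in ("x", "y", "z"):
    dl.add(item)
dl.remove("x")
print("y" in dl, "z" in dl)  # deleting x leaves y and z in place
```

Because insertion first looks for an existing copy in all d candidate buckets, a fingerprint is never stored in two tables at once, which is the property the claim relies on.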

19 - Simulation Results
- Target is fpp < [value missing]
- dlCBF configuration
  » d = 4 tables with 2048 buckets each
  » Each bucket has 8 cells
  » Target load = 0.75 (6 items per bucket)
  » 14-bit fingerprint, r
  » 2-bit counter to handle identical fingerprints
  » Total size of structure = 2^20 bits. Total items = 3×2^14
- CBF configuration
  » 13.5 counters per element (9 hash functions)
  » For 3×2^14 elements, we will need 2.5×2^20 bits, 2.5 times the dlCBF
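The sizes quoted above can be re-derived with a few lines of arithmetic. Two assumptions are made that the slide does not state explicitly: the CBF uses the 4-bit counters mentioned earlier in the talk, and the dlCBF false positive probability is estimated by a simple union bound over the d × 6 stored fingerprints a query is compared against. Variable names are illustrative.

```python
import math

# dlCBF configuration from the slide.
d, buckets, cells_per_bucket = 4, 2048, 8
items_per_bucket = 6               # target load 0.75 of 8 cells
fingerprint_bits, counter_bits = 14, 2

dlcbf_bits = d * buckets * cells_per_bucket * (fingerprint_bits + counter_bits)
items = d * buckets * items_per_bucket

# CBF configuration from the slide; 4-bit counters are an assumption here.
counters_per_element, cbf_counter_bits, k = 13.5, 4, 9
cbf_bits = counters_per_element * cbf_counter_bits * items
ratio = cbf_bits / dlcbf_bits

# Approximate false positive probabilities.
cbf_fpp = (1 - math.exp(-k / counters_per_element)) ** k
dlcbf_fpp = d * items_per_bucket / 2**fingerprint_bits  # union bound (assumption)

print(f"dlCBF size = 2^{math.log2(dlcbf_bits):.0f} bits, items = {items}")
print(f"CBF size = {cbf_bits / 2**20:.2f} x 2^20 bits, ratio = {ratio:.2f}")
print(f"CBF fpp ~ {cbf_fpp:.5f}, dlCBF fpp ~ {dlcbf_fpp:.5f}")
```

Under these assumptions the two structures reach a similar false positive probability while the CBF needs roughly 2.5 times the space.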

20 - Questions?