Published by Nigel Stokes. Modified over 9 years ago.
1
An Improved Construction for Counting Bloom Filters
Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, George Varghese
Presented by: Sailesh Kumar
2
2 - Sailesh Kumar - 5/23/2015
Bloom Filter
• Store a set S = {x_1, x_2, x_3, …, x_n} drawn from some universe U, so that we can answer queries of the form:
  » Is x a member of S?
• A Bloom filter is a technique that can answer this with:
  » A small amount of space, independent of element size
  » Constant query time
  » A false positive probability (some probability of a wrong "yes" answer)
• An alternative to hashing, with some interesting trade-offs
3
Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, …, Hk, and the selected bits of the m-bit array are set to 1.]
4
Bloom Filter
[Figure: element Y is hashed by the same H1, H2, H3, H4, …, Hk, setting further bits of the m-bit array to 1.]
5
Bloom Filter
[Figure: querying X; every bit of the m-bit array selected by H1, H2, H3, H4, …, Hk is 1, so the filter reports a match.]
6
Bloom Filter
[Figure: querying W, which was never inserted; every bit selected by H1, H2, H3, H4, …, Hk happens to be 1, so the filter reports a match (false positive).]
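The insert and query steps pictured on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the class name `BloomFilter` is hypothetical, and the k hash functions H1 … Hk are simulated by salting SHA-256 with the function index, which is an assumption made only to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: H1..Hk are simulated by salting SHA-256 with the index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        # Insertion sets every bit selected by the k hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # A "match" requires every probed bit to be 1; it may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=4)
bf.add("X")
bf.add("Y")
print("X" in bf)  # True: inserted elements are always reported as members
print("W" in bf)  # usually False, but True would be a false positive
```

A query can return a wrong "yes" but never a wrong "no", because insertion only ever sets bits.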
7
How many Hash Functions?
• k = number of hash functions
• n = total number of elements
• m = number of bits in the array
• The objective is to pick k so that we minimize the false positive probability (fpp)
• It is fairly simple to derive that the optimum is $k = (\ln 2)\, m/n$
  » For the optimal k, the fpp is approximately $(0.6185)^{m/n}$
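The derivation the slide alludes to is the standard one and can be written out as follows. It assumes independent, uniform hash functions; the symbols p (probability that a given bit is still 0) and f (false positive probability) are introduced here for illustration and do not appear on the slide.

```latex
% Probability that a given bit is still 0 after inserting n elements with k hashes each:
p = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}

% A false positive needs all k probed bits to be 1:
f = (1 - p)^k \approx \left(1 - e^{-kn/m}\right)^k

% Substituting k = -\frac{m}{n}\ln p gives
\ln f = -\frac{m}{n}\,\ln p \,\ln(1 - p),

% which is symmetric in p and 1 - p and is minimized at p = 1/2. Hence
k = (\ln 2)\,\frac{m}{n},
\qquad
f = \left(\tfrac{1}{2}\right)^{k} = \left(2^{-\ln 2}\right)^{m/n} \approx (0.6185)^{m/n}.
```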
8
How many Hash Functions?
• m/n = 8
• Optimal k = 8 ln 2 ≈ 5.5
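The m/n = 8 case can be checked numerically. The sketch below uses the standard approximation fpp ≈ (1 - e^(-kn/m))^k, which is not stated on the slide; the function names are hypothetical. Since k must be an integer in practice, it compares the two integers on either side of 5.5.

```python
import math


def optimal_k(bits_per_element: float) -> float:
    """Real-valued optimum k = (ln 2) * m/n."""
    return math.log(2) * bits_per_element


def false_positive_prob(bits_per_element: float, k: int) -> float:
    """Standard approximation (1 - e^{-k n / m})^k, assuming uniform hashing."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


ratio = 8  # m/n, bits per element
print(round(optimal_k(ratio), 3))  # 5.545
for k in (5, 6):
    print(k, false_positive_prob(ratio, k))
print(0.6185 ** ratio)  # the slide's approximation for the optimal k
```

Both k = 5 and k = 6 give a false positive probability of roughly 2%, close to the (0.6185)^8 bound, so rounding k either way costs little.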
9
Counting Bloom Filter
• Bloom filters do not support deletes (clearing a bit could also remove other elements that share it)
  » Use a counting Bloom filter instead
• Use counters instead of bits in the array
  » Instead of setting bits, increment the counters
• During a query, a counter > 0 means the corresponding bit is set
10
Counting Bloom Filter
[Figure: element X is hashed by H1, H2, H3, H4, …, Hk, and the selected counters of the m-counter array are incremented to 1.]
11
Counting Bloom Filter
[Figure: element Y is hashed by H1, H2, H3, H4, …, Hk; counters shared with an earlier element now read 2, the others read 1.]
• Deletes are straightforward: just decrement the counters
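A counting Bloom filter with deletes can be sketched as below. As before, this is an illustration under stated assumptions rather than the authors' implementation: the class name is hypothetical, the hash functions are simulated with salted SHA-256, and the choice to saturate counters at their maximum (instead of wrapping around) is an assumption of this sketch.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter: m small counters probed by k hash functions."""

    def __init__(self, m: int, k: int, counter_bits: int = 4) -> None:
        self.m = m
        self.k = k
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Assumption: H1..Hk are simulated by salting SHA-256 with the index i.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Assumption: saturate at the maximum instead of overflowing.
            if self.counters[pos] < self.max_count:
                self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Only valid for items that were previously added.
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        # A counter > 0 plays the role of a set bit.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=4)
cbf.add("X")
cbf.add("Y")
cbf.remove("X")
print("Y" in cbf)  # True: deleting X only decrements, so Y's counters stay positive
```

Removing an item that was never added would corrupt the counters, so callers are expected to delete only elements known to be in the set.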
12
Improved Counting Bloom Filter
• 4-bit counters ensure, with very high probability, that counters do not overflow
  » 4x increase in space compared to a Bloom filter
• Construct an alternative counting Bloom filter that is 2 times more compact than the CBF
  » Based on d-left hashing and fingerprinting
• We first need to understand d-left hashing and fingerprinting
13
Fingerprinting
• Temporarily assume that we have a perfect hash function h
  » Use some random function F to compute c-bit fingerprints
  » $F : U \to [2^c]$
  » False positive probability = $1/2^c$
  » 2x more compact than a Bloom filter
  » It is not easy to compute the perfect hash function h
    – Use near-perfect hashing (d-left) instead
[Figure: the perfect hash function h maps Elements 1 to 5 to distinct cells; the cells hold, in order, Fingerprint(4), Fingerprint(5), Fingerprint(2), Fingerprint(1), Fingerprint(3).]
14
d-left hashing
• Use d equal-sized tables
• Use d different hash functions to choose one candidate bucket from each table
• A bucket can store multiple elements
• Store the element in the least loaded candidate bucket (break ties to the left)
• Interesting properties:
  » Very small maximum load, O(log log n)
  » Maximum load is close to the average load even for small d, such as 4
  » 80% space utilization with d = 4
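The d-left insertion policy described above can be sketched as a small simulation. The function names, the table sizes, and the use of salted SHA-256 as the d hash functions are all assumptions chosen for illustration; the average load of 6 items per bucket is picked to match the configuration quoted later in the deck.

```python
import hashlib


def candidate_buckets(item: str, d: int, buckets: int) -> list[int]:
    """One candidate bucket per table; salted SHA-256 stands in for d hash functions."""
    return [
        int.from_bytes(hashlib.sha256(f"{t}:{item}".encode()).digest()[:8], "big") % buckets
        for t in range(d)
    ]


def d_left_insert(tables: list[list[list[str]]], item: str) -> int:
    """Place item in its least loaded candidate bucket and return the table used."""
    choices = candidate_buckets(item, len(tables), len(tables[0]))
    # min() returns the first minimum, so ties are broken toward the leftmost table.
    table = min(range(len(tables)), key=lambda t: len(tables[t][choices[t]]))
    tables[table][choices[table]].append(item)
    return table


d, buckets = 4, 64
tables = [[[] for _ in range(buckets)] for _ in range(d)]
first = d_left_insert(tables, "item-0")  # all loads are 0, so table 0 wins the tie
for i in range(1, d * buckets * 6):
    d_left_insert(tables, f"item-{i}")

loads = [len(bucket) for table in tables for bucket in table]
print("average load:", sum(loads) / len(loads))
print("maximum load:", max(loads))
```

Running it shows how close the maximum bucket load stays to the average, which is the property that lets buckets be sized with only a little slack.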
15
Improved Counting Bloom Filter
• Use d-left hashing
• d hash tables, each containing B buckets
  » Note that a bucket contains multiple cells; a cell can store a fingerprint and a small counter
• To store an element, we compute its fingerprint
  » The fingerprint consists of two components
    – Bucket index in [1, B]
    – Remainder in [1, R], thus $\log_2 R$ bits, stored explicitly
  » We use a separate bucket index for each table but identical remainders
  » Use the d-left insertion policy and augment each fingerprint with a counter; if the fingerprint is already stored, increment its counter
16
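Improved Counting Bloom Filter
Example with d = 2, writing each choice as (bucket index, remainder):
• Element x: H(x) = (3, 7), (4, 7); we store x in the first table (bucket 3, remainder 7)
• Element y: H(y) = (1, 5), (5, 5); we store y in the first table (bucket 1, remainder 5)
• Element z: H(z) = (1, 7), (4, 7); we store z in the second table (bucket 4, remainder 7)
• Now, if we try to delete x, both of its candidate cells hold remainder 7 (its own in table 1 and z's in table 2), so we do not know whether the fingerprint in table 1 or table 2 has to be removed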
17
Improved Counting Bloom Filter
• Solve the problem by breaking the hash operation into 2 phases
• 1st phase: compute a single true fingerprint
• 2nd phase: to obtain the d locations, apply permutations $P_1, \dots, P_d$ to the true fingerprint
• A permutation of a set is a one-to-one map of the set onto itself
  » Example: (1, 2, 3, 4, 5) → (3, 5, 1, 2, 4)
• This simple modification enables proper delete operations
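The two-phase scheme can be sketched as a small d-left counting Bloom filter. Everything named here is hypothetical and chosen for illustration: the class `DLeftCBF`, the SHA-256 true fingerprint, and in particular the permutations, which this sketch takes to be multiplication by an odd constant modulo 2^q (a bijection on q-bit values because odd numbers are invertible modulo a power of two). Counter width limits are ignored for brevity.

```python
import hashlib


class DLeftCBF:
    """Sketch of a d-left counting Bloom filter with permuted fingerprints."""

    def __init__(self, d: int = 4, bucket_bits: int = 6,
                 remainder_bits: int = 14, cells_per_bucket: int = 8) -> None:
        self.d = d
        self.remainder_bits = remainder_bits
        self.fp_bits = bucket_bits + remainder_bits
        self.cells_per_bucket = cells_per_bucket
        # Assumption: P_i(f) = a_i * f mod 2^q with distinct odd a_i.
        self.multipliers = [2 * i + 3 for i in range(d)]
        # tables[i][bucket] maps a stored remainder to its counter.
        self.tables = [[{} for _ in range(1 << bucket_bits)] for _ in range(d)]

    def _locations(self, item: str) -> list[tuple[int, int]]:
        # Phase 1: a single true fingerprint of bucket_bits + remainder_bits bits.
        digest = hashlib.sha256(item.encode()).digest()
        f = int.from_bytes(digest[:8], "big") % (1 << self.fp_bits)
        # Phase 2: permute it once per table, then split into (bucket, remainder).
        return [
            divmod((a * f) % (1 << self.fp_bits), 1 << self.remainder_bits)
            for a in self.multipliers
        ]

    def _find(self, locs: list[tuple[int, int]]):
        # Return the index of the table that already holds this fingerprint, if any.
        for i, (bucket, rem) in enumerate(locs):
            if rem in self.tables[i][bucket]:
                return i
        return None

    def add(self, item: str) -> None:
        locs = self._locations(item)
        i = self._find(locs)
        if i is not None:
            bucket, rem = locs[i]
            self.tables[i][bucket][rem] += 1  # same fingerprint: bump the counter
            return
        # d-left policy: least loaded candidate bucket, ties broken to the left.
        i = min(range(self.d), key=lambda t: len(self.tables[t][locs[t][0]]))
        bucket, rem = locs[i]
        if len(self.tables[i][bucket]) >= self.cells_per_bucket:
            raise OverflowError("all candidate buckets are full")
        self.tables[i][bucket][rem] = 1

    def remove(self, item: str) -> None:
        locs = self._locations(item)
        i = self._find(locs)
        if i is None:
            raise KeyError(item)
        bucket, rem = locs[i]
        self.tables[i][bucket][rem] -= 1
        if self.tables[i][bucket][rem] == 0:
            del self.tables[i][bucket][rem]

    def __contains__(self, item: str) -> bool:
        return self._find(self._locations(item)) is not None


f = DLeftCBF()
for name in ("x", "y", "z"):
    f.add(name)
f.remove("x")
print("y" in f, "z" in f)  # True True
```

Because every table location is derived from the same true fingerprint through a bijection, two elements collide in one table only if they collide in all of them, and in that case the insert path increments an existing counter instead of creating a second copy.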
18
Improved Counting Bloom Filter
• Claim: When deleting an element of the set, only one remainder corresponding to the element will exist in the table.
• Proof:
  » Suppose not. Then there is some element x ∈ S to be deleted whose remainder is stored in table j, and at the same time another element y ∈ S such that $P_i(f_x) = P_i(f_y)$ for some $i \neq j$.
  » Since the $P_i$ are permutations, we must have $f_x = f_y$, so x and y share the same true fingerprint.
  » Suppose x was inserted before y; in this case, when y is inserted, the counter in table j associated with the remainder of x would be incremented, contradicting our assumption.
19
Simulation Results
• Target is fpp < 0.002
• dlCBF configuration
  » d = 4 tables with 2048 buckets each
  » Each bucket has 8 cells
  » Target load = 0.75 (6 items per bucket)
  » 14-bit fingerprint, r
  » 2-bit counter to handle identical fingerprints
  » Total size of structure = $2^{20}$ bits; total items = $3 \times 2^{14}$
• CBF configuration
  » 13.5 counters per element (9 hash functions)
  » For $3 \times 2^{14}$ elements, we need about $2.5 \times 2^{20}$ bits, 2.5 times the dlCBF
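The sizes quoted above follow directly from the configuration and can be re-derived with a few lines of arithmetic. The sketch assumes 4-bit CBF counters, as stated earlier in the deck, and uses the standard (1 - e^(-kn/m))^k approximation for the CBF false positive probability; variable names are illustrative.

```python
import math

# dlCBF configuration from the slide
d, buckets, cells = 4, 2048, 8
fingerprint_bits, counter_bits = 14, 2
load = 0.75

total_cells = d * buckets * cells
dlcbf_bits = total_cells * (fingerprint_bits + counter_bits)
items = int(load * total_cells)

# CBF configuration: 13.5 counters per element, assumed 4 bits each, k = 9
cbf_bits = 13.5 * 4 * items
cbf_fpp = (1 - math.exp(-9 / 13.5)) ** 9

print(dlcbf_bits == 2**20)    # True
print(items == 3 * 2**14)     # True
print(cbf_bits / dlcbf_bits)  # 2.53125
print(cbf_fpp < 0.002)        # True
```

The ratio of about 2.5 is the space saving the slide reports for the dlCBF at the same false positive target.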
20
Questions?