
1 Bloom Filters: Lookup questions: does item “x” exist in a set or multiset? The data set may be very big or expensive to access, so filter out lookup questions with negative results before accessing the data. Allow false positive errors, as they only cost us an extra data access. Don’t allow false negative errors, because they result in wrong answers.

2 Bloom Filters: Bloom Filter [B70]. Encoding an attribute a ∈ U: maintain a bit vector V of size m and use k hash functions h_1..h_k, with h_i : U → [1..m]. Encoding: for item x, “turn on” bits V[h_1(x)]..V[h_k(x)]. Lookup: check bits V[h_1(x)]..V[h_k(x)]. If all equal 1, return “Probably Yes”; otherwise return “Definitely No”.
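The encoding and lookup rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the class name `BloomFilter`, the method names, and the use of salted SHA-256 to stand in for the k hash functions are all assumptions, and positions are 0-based rather than the slide's [1..m].

```python
import hashlib


class BloomFilter:
    """Bit vector V of size m with k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # V[0..m-1], all bits initially off

    def _positions(self, item):
        # Assumption: h_i(x) is simulated by salting SHA-256 with the index i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, item):
        # Encoding: turn on bits V[h_1(x)]..V[h_k(x)].
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "Probably Yes"; False means "Definitely No".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
urls = ["a.example/index", "b.example/page", "c.example/img"]
for url in urls:
    bf.add(url)
print(bf.might_contain("a.example/index"))  # inserted items always report True
```

An item that was inserted can never be reported as absent, because its bits are only ever turned on; an item that was never inserted may still be reported as present if other items happened to set all of its bits.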

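3 Bloom Filters: Bloom Filter. [Figure: bit vector V, positions V_0 to V_{m-1}, with example contents 010001010000010; an item x is mapped by h_1(x), h_2(x), h_3(x), ..., h_k(x) to bits of the vector.]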

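4 Bloom Filters: Bloom Errors. [Figure: the same bit vector V_0 to V_{m-1} (010001010000010), labeled with items a, b, c, d; h_1(x), h_2(x), h_3(x), ..., h_k(x) point at bits of the vector.] x didn’t appear, yet its bits are already set.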

5 Bloom Filters: Error Estimation (k = # hash functions, n = # keys, m = vector length). Assumption: hash functions are perfectly random. Probability of a bit being 0 after hashing all elements: $(1 - 1/m)^{kn} \approx e^{-kn/m}$. Let $p = e^{-kn/m}$; the probability of a false positive (k random bits are all of value 1) is $(1 - p)^k = (1 - e^{-kn/m})^k$. Assuming we are given m and n, the optimal k is $k = (m/n) \ln 2$.
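As a worked example of the standard estimates for this setting (zero-bit probability (1 - 1/m)^(kn) ≈ e^(-kn/m), false positive probability (1 - e^(-kn/m))^k, optimal k = (m/n) ln 2), the following sketch evaluates them numerically. The function names and the sizes m = 8000, n = 1000 are hypothetical, chosen only for illustration.

```python
import math


def zero_bit_probability(m, n, k):
    # Exact probability that a given bit is still 0 after n*k random bit settings.
    return (1.0 - 1.0 / m) ** (k * n)


def false_positive_rate(m, n, k):
    # Approximation p = e^(-kn/m); a false positive needs k bits that are all 1.
    p = math.exp(-k * n / m)
    return (1.0 - p) ** k


def optimal_k(m, n):
    # Real-valued k that minimizes the false positive rate for given m and n.
    return (m / n) * math.log(2)


m, n = 8000, 1000  # hypothetical sizes: 8 bits per key
k_opt = optimal_k(m, n)
print("optimal k (real-valued):", k_opt)
for k in (2, 4, round(k_opt), 8, 12):
    print(k, false_positive_rate(m, n, k))
```

In practice k has to be an integer, so the real-valued optimum is rounded; the loop shows how the estimated error rate rises on both sides of it.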

6 Bloom Filters: Bloom Filter Tradeoffs (k = # hash functions, n = # keys, m = vector length). Three factors: m, k and n. Normally, n and m are given, and we select k. Small k: fewer computations; the number of bits actually set (at most nk) is smaller, so the chance of a “step over” (landing on a bit that is already set) is smaller too; however, fewer bits (k) need to be stepped over to generate an error. For big k, the exact opposite holds. Not surprisingly, when k is optimal, the “hit ratio” (ratio of bits turned on in the array) is exactly 0.5.
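The claim about the hit ratio can be checked with a small simulation. This is a sketch under the slide deck's assumption of perfectly random hash functions, modeled here by a seeded pseudo-random generator; the function name and the sizes m = 10000, n = 1000 are hypothetical. For these sizes (m/n) ln 2 is about 6.93, so k = 7 is the nearest integer choice.

```python
import math
import random


def simulated_fill_ratio(m, n, k, seed=0):
    """Fraction of bits turned on after inserting n keys with k random hashes each."""
    rng = random.Random(seed)
    bits = [0] * m
    for _ in range(n * k):
        bits[rng.randrange(m)] = 1  # one "perfectly random" hash value
    return sum(bits) / m


m, n = 10000, 1000
for k in (2, 7, 14):
    predicted = 1.0 - math.exp(-k * n / m)  # 1 - p from the error estimate
    print(k, simulated_fill_ratio(m, n, k), predicted)
```

Because k must be an integer, the simulated ratio at k = 7 lands close to 0.5 rather than exactly on it; smaller k leaves the vector mostly empty and larger k fills it well past half.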

7 Bloom Filters: Summary Cache [FCAB00]. Proxy servers maintain a local cache to minimize expensive Internet requests. A proxy must maintain an efficient lookup method into the cache. The lookup structure must be stored in DRAM for performance, and it must be compact, as DRAM is expensive and is also used for “Hot Items” storage and more. Pages are usually replaced in the cache using an LRU algorithm.

8 Bloom Filters: ICP – Request Handling. [Figure: a client, four proxy caches, and the Internet.]

9 Bloom Filters: Internet Cache Protocol (ICP). Allows for scaling out when using proxies. A protocol that supports discovery and retrieval of documents from neighboring caches. Establishes a hierarchy of proxy caches. If a page is not found in the local proxy cache, the proxy searches for it in neighboring proxies. If the page is not found anywhere, it is fetched from the Internet.

10 Bloom Filters: ICP – Request Handling. [Figure: a client, four proxy caches, and the Internet.]

11 Bloom Filters: Summary Cache. Each proxy maintains a Bloom Filter representing its local cache. It also holds Bloom Filters representing the caches of other proxies. Updates to Bloom Filters are exchanged periodically or after a certain percentage of the documents in the cache has been replaced. An ICP request is sent only to proxies whose Bloom Filter indicates that they hold the requested document.
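The routing decision described on this slide can be sketched as follows. This is an illustration only: the proxy names, URLs, summary size `M`, hash count `K`, and the salted SHA-256 hashing are all assumptions, and the real exchange of summaries between proxies is left out.

```python
import hashlib

M, K = 256, 4  # hypothetical summary size (bits) and number of hash functions


def positions(url):
    # Assumption: k positions derived from salted SHA-256.
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(K)
    ]


def build_summary(cached_urls):
    """Bloom Filter (list of bits) summarizing one proxy's cache contents."""
    bits = [0] * M
    for url in cached_urls:
        for pos in positions(url):
            bits[pos] = 1
    return bits


def proxies_to_query(url, neighbor_summaries):
    """Names of neighbors whose summary answers 'Probably Yes' for url."""
    return [
        name
        for name, bits in neighbor_summaries.items()
        if all(bits[pos] for pos in positions(url))
    ]


neighbor_summaries = {
    "proxy-b": build_summary(["http://example.org/a", "http://example.org/b"]),
    "proxy-c": build_summary(["http://example.org/c"]),
}
targets = proxies_to_query("http://example.org/c", neighbor_summaries)
print(targets)  # always includes "proxy-c"; any other name would be a false hit
```

If the returned list is empty, no neighbor is queried and the page is fetched from the Internet directly, which is where the saving over plain ICP comes from.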

12 Bloom Filters: ICP – With Summary Cache. [Figure: a client, the Internet, and four proxy caches.]

13 Bloom Filters: Summary Cache – Bloom Filters. To support deletions and updates, the proxy maintains the Bloom Filter V and also an array of counters C, initially set to 0. The Bloom Filter is filled with the contents of the cache. Each bit in the Bloom Filter has a 4-bit counter. On insertion of item i, all C[h_j(i)] are increased (to a maximum of 15). On deletion of item i, the same counters are decreased. When a counter C[l] increases from 0 to 1, V[l] is turned on. When C[l] decreases from 1 to 0, V[l] is turned off.
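A counter array of this kind can be sketched as below. The class and method names and the salted SHA-256 hashing are assumptions; the sketch also simply stops incrementing at 15, so a counter that saturated may later be decremented too far, a corner case the slide does not address.

```python
import hashlib


class CountingBloomFilter:
    """Bit vector V plus one saturating 4-bit counter C per bit (sketch)."""

    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m  # C, initially all 0
        self.bits = [0] * m      # V, the filter that would be exchanged

    def _positions(self, item):
        # Assumption: k hash values derived from salted SHA-256.
        return [
            int.from_bytes(hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()[:8], "big") % self.m
            for j in range(self.k)
        ]

    def add(self, item):
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1
                if self.counters[pos] == 1:  # 0 -> 1: turn the bit on
                    self.bits[pos] = 1

    def remove(self, item):
        for pos in self._positions(item):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1
                if self.counters[pos] == 0:  # 1 -> 0: turn the bit off
                    self.bits[pos] = 0

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.add("http://example.org/a")
cbf.add("http://example.org/b")
cbf.remove("http://example.org/a")
print(cbf.might_contain("http://example.org/b"))  # still present after the other deletion
```

Only the bit vector needs to leave the proxy; the counters exist so that evicting a document does not clear bits that other cached documents still rely on.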

14 Bloom Filters: Summary Cache – Bloom Filters. Hashing scheme: generate 128 bits using MD5 on the URL; divide them into segments of M bits (usually 32); calculate the modulus of each segment by m, providing 128/M hash values (4, for 32-bit segments); if 128 bits are not enough, calculate MD5 of the URL concatenated with itself. Bloom Filter exchange: the header contains the MD5 properties and the size of the array; if the refresh rate is high, send only deltas, otherwise send the entire Bloom Filter; the counters are internal and not exchanged.
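The hashing scheme on this slide might look roughly like this in Python, using the standard `hashlib.md5`. The function name, the big-endian interpretation of each segment, and the rule of adding one more copy of the URL per extra round (the slide only mentions the URL concatenated with itself) are assumptions made for the sketch.

```python
import hashlib


def md5_hash_values(url, m, k, segment_bytes=4):
    """Derive k hash values in [0, m) from MD5 of the URL, 32 bits at a time."""
    data = url.encode("utf-8")
    values = []
    copies = 1
    while len(values) < k:
        # 16 bytes = 128 bits; round 2 hashes url+url, as the slide describes.
        digest = hashlib.md5(data * copies).digest()
        for start in range(0, len(digest) - segment_bytes + 1, segment_bytes):
            segment = int.from_bytes(digest[start:start + segment_bytes], "big")
            values.append(segment % m)  # modulus of the segment by m
        copies += 1
    return values[:k]


four = md5_hash_values("http://www.squid-cache.org/", m=1000, k=4)
six = md5_hash_values("http://www.squid-cache.org/", m=1000, k=6)
print(four)
print(six)  # first four values match the line above; the rest come from url+url
```

With 32-bit segments a single MD5 digest yields four hash values, so the second digest is only computed when more than four are requested.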

15 Bloom Filters: Summary Cache – Errors. False Misses: the requested document is cached at some remote proxy, but the summary does not reflect that fact; the hit ratio is reduced and a redundant Internet access is performed. False Hits: the document is not at a remote proxy, but the summary suggests that it is; an inter-proxy query message is wasted. Remote Stale Hits: the document is cached at a remote proxy, but is stale; this occurs in both ICP and Summary Cache, and might not be a totally wasted effort, as delta compression can be used.

16 Bloom Filters: Implementation – Squid. Squid is publicly available web proxy cache software (http://www.squid-cache.org). Summary Cache is implemented in Squid v1.1.14. A variation called cache digest is implemented in Squid 1.2b20.

