Hash Functions for Network Applications (II)

1 Hash Functions for Network Applications (II)
Yaxuan Qi, NSLab, RIIT, Tsinghua University

2 Outline
Hash functions: Concept and Theory (1~2); Applications (3~4)
Bloom Filters: Concept and Theory (1~2); Applications (3~4)

3 Basic Idea
Packet: header and payload (L3-L4 header).
Forwarding engines (routers, firewalls, IPS) match packets against a rule set using exact, prefix, or range matching, and each rule has an action (drop/accept). A rule set may contain thousands of rules, so how can the table be looked up efficiently?

4 Technique

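5 False Positive
n: number of messages; m: number of Bloom bits; k: number of hash functions.
False positive: $p(y \text{ is a false positive}) = p(y \notin X) \cdot p(\text{the } k \text{ bits for } y \text{ are all } 1) = p(\text{the } k \text{ bits for } y \text{ are all } 1)$, where the first factor is 1 for a query y outside X. Consider the probability that the specific k bits corresponding to y have all been set (by the elements of X). First, consider the probability that one specified bit is set (by X)…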
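The definitions above can be made concrete with a minimal Python sketch of a standard Bloom filter. The class name `BloomFilter` is hypothetical, and deriving the k hash functions from SHA-256 with an index prefix is an assumption standing in for k independent random hash functions; the sketch is not tuned for speed.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str):
        # Assumption: SHA-256 with k different prefixes stands in for
        # k independent random hash functions.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # Reported present only if all k bits are 1; for y not in X
        # this is exactly the false-positive event.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1000, k=7)
members = [f"flow-{i}" for i in range(100)]
for x in members:
    bf.add(x)

outsiders = [f"other-{i}" for i in range(10000)]
fp_rate = sum(y in bf for y in outsiders) / len(outsiders)
print(f"empirical false-positive rate: {fp_rate:.4f}")
```

Members are always found (no false negatives); only non-members can be misreported.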

6 Math (I)
n: number of messages; m: number of Bloom bits; k: number of hash functions.
Two potential assumptions: m: big enough… kn/m: constant…
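Written out, the derivation these assumptions support is the standard one. The symbol p for the probability that a given bit is still 0 is introduced here for convenience, and f denotes the false positive rate:

$$
\begin{aligned}
p' &= \left(1 - \tfrac{1}{m}\right)^{kn} \approx e^{-kn/m} = p \\
f' &= \left(1 - p'\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k} = (1 - p)^{k} = f
\end{aligned}
$$

The first line counts the kn bit writes made while inserting X, each of which misses a given bit with probability 1 - 1/m; the approximation $(1 - 1/m)^{kn} \approx e^{-kn/m}$ is accurate when m is large. The second line treats the k bits probed for a non-member y as independent, each being 1 with probability 1 - p.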

7 n: number of messages; m: number of Bloom bits; k: number of hash functions.
In practice, if the number of 0 bits in the array is substantially less than expected, then the probability of a false positive will be higher than the quantity f that we computed.
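One way to act on this is to estimate f from the fraction of 0 bits actually observed in a filled filter rather than from the expected fraction. The helper names below are hypothetical, and the estimate assumes a non-member's k probes land uniformly and independently.

```python
import math


def expected_zero_fraction(m: int, n: int, k: int) -> float:
    """Expected fraction of 0 bits after n insertions: (1 - 1/m)^(kn) ≈ e^(-kn/m)."""
    return (1 - 1 / m) ** (k * n)


def observed_zero_fraction(bits: list[int]) -> float:
    """Fraction of 0 bits actually present in a filled filter."""
    return bits.count(0) / len(bits)


def fp_from_zero_fraction(zero_fraction: float, k: int) -> float:
    """False-positive estimate: each of the k probes (assumed uniform and
    independent) hits a 1 with probability 1 - zero_fraction."""
    return (1 - zero_fraction) ** k


m, n, k = 1000, 100, 7
p = expected_zero_fraction(m, n, k)
print(f"expected zero fraction {p:.4f}, e^(-kn/m) = {math.exp(-k * n / m):.4f}")
print(f"f from expected zeros: {fp_from_zero_fraction(p, k):.5f}")
# Fewer zeros than expected (here 40%) pushes the estimate up
print(f"f from 40% zeros:      {fp_from_zero_fraction(0.40, k):.5f}")
```

In a deployed filter, `observed_zero_fraction(bits)` would replace the 0.40 placeholder.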

8 Optimal Number of Hash Functions
Given m and n, choose k to minimize f as a function of k. There are two competing forces: (from the view of search) a larger k gives more chances to find a 0 bit for an element that is not a match; (from the view of construction) a smaller k increases the fraction of 0 bits in the array.
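The trade-off can be resolved with calculus, using the approximation $f = (1 - e^{-kn/m})^k$ and writing $g = \ln f = k \ln(1 - e^{-kn/m})$:

$$
\frac{dg}{dk} = \ln\!\left(1 - e^{-kn/m}\right) + \frac{kn}{m}\cdot\frac{e^{-kn/m}}{1 - e^{-kn/m}}
$$

At $k = \frac{m}{n}\ln 2$ we have $e^{-kn/m} = \tfrac12$, so

$$
\frac{dg}{dk} = \ln\tfrac12 + \ln 2 \cdot \frac{1/2}{1/2} = 0,
\qquad
f = \left(\tfrac12\right)^{k} = \left(\tfrac12\right)^{(m/n)\ln 2} \approx 0.6185^{m/n}.
$$

This is the global minimum: substituting $p = e^{-kn/m}$ gives $\ln f = -\frac{m}{n}\ln p \,\ln(1-p)$, which is symmetric in p and 1 - p and smallest at p = 1/2, i.e. when half of the bits are 0.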

9 Math (II) In practice, k must be an integer, and a smaller, suboptimal k might be preferred since this reduces the number of hash functions that have to be computed.
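A small sketch of the integer choice: since f has a single minimum in k, the best integer is either the floor or the ceiling of $(m/n)\ln 2$. The function names are hypothetical, and the formula is the usual approximation $f \approx (1 - e^{-kn/m})^k$.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false-positive rate f = (1 - e^{-kn/m})^k."""
    return (1 - math.exp(-k * n / m)) ** k


def best_integer_k(m: int, n: int) -> int:
    """Better of floor/ceil of the real optimum (m/n) ln 2.

    f is unimodal in k, so the best integer is one of these two neighbours.
    """
    k_real = (m / n) * math.log(2)
    candidates = {max(1, math.floor(k_real)), max(1, math.ceil(k_real))}
    return min(candidates, key=lambda k: false_positive_rate(m, n, k))


m, n = 8000, 1000  # 8 bits per element (illustrative values)
for k in (4, 5, 6):
    print(k, round(false_positive_rate(m, n, k), 5))
print("best integer k:", best_integer_k(m, n))
```

Comparing the printed rates shows how much a smaller, suboptimal k such as 4 costs relative to the best integer k, in exchange for fewer hash computations per operation.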

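10 Optimization: Summary
Assumption: we have good hash functions that look random.
Given m bits for the filter and n elements, choose the number k of hash functions to minimize false positives. Let $p = \Pr[\text{a given bit is } 0] = (1 - 1/m)^{kn} \approx e^{-kn/m}$ and $f = \Pr[\text{false positive}] = (1 - p)^k \approx (1 - e^{-kn/m})^k$. As k increases, there are more chances to find a 0, but also more 1s in the array.
Conclusion: by calculus, the optimum is at $k = (\ln 2)\, m/n$.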

11 Partial Bloom Filters
The total number of bits is still m, but the bits are divided equally among the k hash functions: each hash function has a range of m/k consecutive bits, which allows the array accesses to be parallelized. Although the probability of a false positive is always at least as large with this division, the difference is small...
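A minimal sketch of the partitioned layout, plus the two false-positive formulas side by side. The class and function names are hypothetical, the sketch assumes k divides m, and SHA-256 with index prefixes again stands in for k independent hash functions. Because $(1 - 1/m)^k \ge 1 - k/m$, a slice keeps slightly fewer 0 bits than the shared array, which is why the partitioned rate is never lower.

```python
import hashlib


class PartitionedBloomFilter:
    """m bits split into k slices of m/k consecutive bits; hash i only touches slice i."""

    def __init__(self, m: int, k: int):
        if m % k:
            raise ValueError("this sketch assumes k divides m")
        self.k = k
        self.slice_size = m // k
        self.bits = [0] * m

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            offset = int.from_bytes(digest[:8], "big") % self.slice_size
            # Slice i covers bits [i * slice_size, (i + 1) * slice_size), so the
            # k accesses hit disjoint regions and could proceed in parallel.
            yield i * self.slice_size + offset

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def fp_standard(m: int, n: int, k: int) -> float:
    # Shared array: a given bit stays 0 with probability (1 - 1/m)^(kn)
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_partitioned(m: int, n: int, k: int) -> float:
    # Each slice has m/k bits and receives exactly n writes
    return (1 - (1 - k / m) ** n) ** k


pbf = PartitionedBloomFilter(m=6000, k=6)
for i in range(750):
    pbf.add(f"rule-{i}")
print(fp_standard(6000, 750, 6), fp_partitioned(6000, 750, 6))
```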

12 Counting Bloom Filters: Idea

13 Counting Bloom filters: Implementation
4 bits per counter is enough...
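A minimal counting Bloom filter sketch with 4-bit counters, which makes deletion possible. The class name is hypothetical, and the overflow policy (saturate at 15 and never decrement a saturated counter) is an assumption of this sketch rather than something stated on the slide.

```python
import hashlib

COUNTER_MAX = (1 << 4) - 1  # 15, the largest value a 4-bit counter can hold


class CountingBloomFilter:
    """Bloom filter whose bits are replaced by small counters, so deletions work."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = [0] * m  # each entry models a 4-bit counter

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            # Saturate instead of wrapping around (assumed overflow policy)
            if self.counters[pos] < COUNTER_MAX:
                self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Caller must only remove items that were inserted, or false negatives appear
        for pos in self._positions(item):
            # A saturated counter's true value is unknown, so leave it alone
            if 0 < self.counters[pos] < COUNTER_MAX:
                self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1000, k=4)
cbf.add("10.0.0.1")
cbf.add("10.0.0.2")
cbf.remove("10.0.0.1")
print("10.0.0.2" in cbf)  # True
```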

14 Compressed Bloom Filters: Problem

15 Compressed Bloom Filters: Motivation
Insight: a Bloom filter is not just a data structure; it is also a message. If the Bloom filter is a message, it is worthwhile to compress it, further reducing the traffic of URL exchanges. Compressing bit vectors is easy: arithmetic coding gets close to the entropy. But can Bloom filters be compressed? A Bloom filter looks like a random string.
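The "random string" point can be quantified with the binary entropy of the bit vector. This sketch assumes bits are independent with $P(\text{bit} = 0) = p \approx e^{-kn/m}$ and treats the entropy bound as the size an ideal coder such as arithmetic coding would approach; the function names are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """H(q) in bits per symbol for a biased bit."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def ideal_compressed_bits(m: int, n: int, k: float) -> float:
    """Approximate size of an m-bit Bloom filter under an ideal entropy coder."""
    p = math.exp(-k * n / m)  # expected fraction of 0 bits
    return m * binary_entropy(p)


m, n = 8000, 1000
k_opt = (m / n) * math.log(2)  # makes p = 1/2: the filter looks like a random string
print(round(ideal_compressed_bits(m, n, k_opt)))  # about m: nothing to gain
print(round(ideal_compressed_bits(m, n, 2)))      # fewer 1s, so it compresses
```

At the optimal k the filter is essentially incompressible, so compression only pays off when the filter is configured differently.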

16 Compression: Technique

17 Compression: Results, z/n = 8
(Figure: original vs. compressed Bloom filter.)
At k = m (ln 2)/n, false positives are maximized with a compressed Bloom filter. The best case without compression is the worst case with compression, so compression always helps. Side benefit: compression allows fewer hash functions, a possible speedup (depending on whether the bottleneck is memory or the link).
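A sketch of why the optimum flips: fix the compressed size at z bits and let p be the fraction of 0 bits. Assuming an ideal coder, the uncompressed filter can have $m = z / H(p)$ bits and $k = (m/n)\ln(1/p)$ hash functions. The function name is hypothetical; z/n = 8 matches the slide.

```python
import math


def binary_entropy(q: float) -> float:
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def compressed_fp(z_over_n: float, p: float) -> tuple[float, float]:
    """(false-positive rate, k) when the compressed size z is fixed.

    p is the fraction of 0 bits (p ≈ e^{-kn/m}); assumes ideal compression,
    so m/n = (z/n) / H(p) and k = (m/n) ln(1/p).
    """
    m_over_n = z_over_n / binary_entropy(p)
    k = m_over_n * math.log(1 / p)
    return (1 - p) ** k, k


z_over_n = 8
for p in (0.5, 0.7, 0.9):
    fp, k = compressed_fp(z_over_n, p)
    print(f"p={p}: k={k:.2f}, false positives={fp:.4f}")
```

At p = 1/2 the filter is incompressible (m = z, k = (m/n) ln 2), which reproduces the best uncompressed rate; any other p does better once compressed, and p > 1/2 also uses fewer hash functions.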

18 Bloom Filter vs. Perfect Hash
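If the set X of n elements is fixed, one can find a perfect hash function for X plus a fully random hash function. Then build a table with n entries of j bits each: the perfect hash maps each element of X to its own entry, which stores a j-bit fingerprint of the element computed with the random hash function. A non-member is a false positive only if its fingerprint equals the one stored at its entry, so the false positive probability is exactly $(1/2)^j$. With $m = nj$ bits this is $(1/2)^{m/n}$, which matches the lower bound for any structure storing n elements in m bits (compared with $0.6185^{m/n}$ for an optimized Bloom filter). HOWEVER, any change in the set X requires an expensive recomputation of the perfect hash function.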
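A toy sketch of the perfect-hash-plus-fingerprint table. The class name is hypothetical, and the perfect hash is found by brute-force seed search, an assumption that only works for very small n (expected tries grow like $n^n/n!$); real constructions use dedicated perfect hashing algorithms. Tagged SHA-256 calls stand in for the independent hash functions.

```python
import hashlib


def _hash(tag: str, item: str) -> int:
    """64-bit hash; different tags play the role of independent hash functions."""
    digest = hashlib.sha256(f"{tag}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class PerfectHashFingerprints:
    """n entries of j bits each for a fixed set X (hypothetical, illustrative)."""

    def __init__(self, items: list[str], j: int, max_tries: int = 1_000_000):
        self.n = len(items)
        self.j = j
        # Brute-force a seed whose slot hash is injective on X (minimal perfect hash)
        for seed in range(max_tries):
            slots = [_hash(f"slot{seed}", x) % self.n for x in items]
            if len(set(slots)) == self.n:
                self.seed = seed
                break
        else:
            raise RuntimeError("no perfect hash found; X too large for brute force")
        self.table = [0] * self.n
        for x, slot in zip(items, slots):
            self.table[slot] = self._fingerprint(x)

    def _fingerprint(self, item: str) -> int:
        # j-bit value from a separate hash (the "fully random" hash function)
        return _hash("fp", item) & ((1 << self.j) - 1)

    def __contains__(self, item: str) -> bool:
        slot = _hash(f"slot{self.seed}", item) % self.n
        return self.table[slot] == self._fingerprint(item)


X = [f"host-{i}" for i in range(8)]
table = PerfectHashFingerprints(X, j=4)
queries = [f"probe-{i}" for i in range(20000)]
rate = sum(y in table for y in queries) / len(queries)
print(f"false-positive rate {rate:.4f} vs (1/2)^j = {0.5 ** 4}")
```

Adding or removing a single element of X changes the slot assignment, which is the expensive recomputation the slide warns about.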

19 Bloom Filter: Tricks
Union (combining two BFs): with the same m and the same hash functions, just OR the two bit vectors of the original Bloom filters.
Shrinking (halving a big BF): just OR the first and second halves together; when hashing into the halved filter, the highest-order bit of the index can be masked.
Intersection (estimation)
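The union and shrinking tricks can be checked directly with a small sketch. It assumes m is a power of two and that each index is the low log2(m) bits of a hash, so halving the filter is the same as masking the highest-order index bit; the helper names are hypothetical.

```python
import hashlib


def positions(item: str, k: int, m: int):
    """k bit indices in [0, m); m must be a power of two (assumption of this sketch)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        yield h & (m - 1)  # keep the low log2(m) bits of the hash


def build(items, k: int, m: int) -> list[int]:
    bits = [0] * m
    for x in items:
        for pos in positions(x, k, m):
            bits[pos] = 1
    return bits


def contains(bits: list[int], item: str, k: int) -> bool:
    return all(bits[pos] for pos in positions(item, k, len(bits)))


def union(a: list[int], b: list[int]) -> list[int]:
    # Valid only when both filters use the same m and the same hash functions
    return [x | y for x, y in zip(a, b)]


def halve(bits: list[int]) -> list[int]:
    # Bit i of the result is bit i OR bit i + m/2 of the original;
    # queries then use m/2, i.e. the top index bit is masked off.
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


k, m = 4, 1024
A = [f"a{i}" for i in range(50)]
B = [f"b{i}" for i in range(50)]
fa, fb = build(A, k, m), build(B, k, m)
```

Both operations give exactly the filter that would have been built directly: `union(fa, fb)` equals `build(A + B, k, m)`, and `halve(fa)` equals `build(A, k, m // 2)`.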

20 Applications

21 Questions?

