Hash Functions for Network Applications (II)

Hash Functions for Network Applications (II) Yaxuan Qi NSLab, RIIT Tsinghua University

Outline Concept and Theory (1~2): Hash functions, Bloom Filters. Applications (3~4)

Basic Idea Packet: header and payload (L3-L4 header). Forwarding engine: router, firewall, IPS. Rule set: exact match; prefix match; range match. Action: drop/accept. A rule set may contain thousands of rules, so how can the table be looked up efficiently?

Technique

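False Positive n: number of messages; m: number of Bloom filter bits; k: number of hash functions. P(y is a false positive) = P(y is not in X) * P(all k bits for y are 1) = P(all k bits for y are 1), taking y not in X as given. Consider the probability that the k specific bits corresponding to y have all been set (by elements of X). First consider the probability that one given bit has been set (by X)…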
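
The chain of reasoning on this slide can be completed with the standard Bloom filter analysis. The sketch below assumes the k hash functions behave like independent, uniformly random functions (an assumption the slide does not state), and treats the k bits of y as independent, which is why the last line is an approximation:

```latex
\begin{align*}
\Pr[\text{one hash of one message sets a given bit}] &= \frac{1}{m} \\
\Pr[\text{a given bit is still 0 after all } n \text{ messages}] &= \left(1 - \frac{1}{m}\right)^{kn} \\
\Pr[\text{a given bit is set by } X] &= 1 - \left(1 - \frac{1}{m}\right)^{kn} \\
\Pr[y \text{ is a false positive}] &\approx \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k}
\end{align*}
```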

Math (I) n: number of messages; m: number of Bloom filter bits; k: number of hash functions. Two potential assumptions: m: big enough… kn/m: constant…
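
A small numeric check of why these two assumptions matter: when m is large and kn/m is held constant, the zero-bit probability (1 - 1/m)^(kn) approaches e^(-kn/m). The function names below are illustrative and not from the slides.

```python
import math


def zero_bit_prob_finite(n, m, k):
    """Probability that a given bit is still 0, for a finite array of m bits."""
    return (1.0 - 1.0 / m) ** (k * n)


def zero_bit_prob_limit(n, m, k):
    """Large-m limit of the same probability, with kn/m held constant."""
    return math.exp(-k * n / m)


def false_positive(p_zero, k):
    """False positive estimate: all k probed bits happen to be 1."""
    return (1.0 - p_zero) ** k


if __name__ == "__main__":
    k = 5
    # Same ratio kn/m in every row; only the array size grows.
    for n, m in [(10, 80), (1_000, 8_000), (100_000, 800_000)]:
        finite = zero_bit_prob_finite(n, m, k)
        limit = zero_bit_prob_limit(n, m, k)
        print(n, m, finite, limit, abs(finite - limit))
```

The gap between the two expressions shrinks as m grows, which is what justifies working with the exponential form.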

In practice: if the number of 0 bits in the array is substantially less than expected, then the probability of a false positive will be higher than the quantity f that we computed.
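
One way to see this in practice is to build a filter, measure the actual fraction of 0 bits, and compare the observed false positive rate with the prediction. This is a minimal sketch: the salted SHA-256 hashing, the class name and the `simulate` helper are assumptions made for illustration, not the hash functions used in the slides.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, item):
        # k hash values derived by salting SHA-256 with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

    def zero_fraction(self):
        return self.bits.count(0) / self.m


def simulate(n, m, k, queries):
    """Insert n members, then probe `queries` non-members."""
    bf = BloomFilter(m, k)
    for i in range(n):
        bf.add(f"member-{i}".encode())
    hits = sum(f"other-{i}".encode() in bf for i in range(queries))
    return bf.zero_fraction(), hits / queries


if __name__ == "__main__":
    n, m, k = 2000, 16000, 5
    zero_frac, fp_rate = simulate(n, m, k, 20000)
    print("zero fraction:", zero_frac, "expected:", math.exp(-k * n / m))
    print("observed fp:", fp_rate, "predicted:", (1 - math.exp(-k * n / m)) ** k)
```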

Optimal Number of Hash Functions Given m and n, find the k that minimizes f as a function of k. Two competing forces: a larger k (from the view of search) gives more chances to find a 0 bit for an element that is not a match; a smaller k (from the view of construction) increases the fraction of 0 bits in the array.

Math (II) In practice, k must be an integer, and a smaller, suboptimal k might be preferred since this reduces the number of hash functions that have to be computed.
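
A short sketch of choosing k, assuming the usual estimate f = (1 - e^(-kn/m))^k and its real-valued minimizer k = (m/n) ln 2. The function names are illustrative.

```python
import math


def false_positive(n, m, k):
    """Estimated false positive probability f for the given n, m and k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k_real(n, m):
    """Real-valued minimizer of f: k = (m / n) * ln 2."""
    return (m / n) * math.log(2)


def best_integer_k(n, m):
    """Compare the two integers around the real optimum and keep the better one."""
    k_star = optimal_k_real(n, m)
    candidates = {max(1, math.floor(k_star)), max(1, math.ceil(k_star))}
    return min(candidates, key=lambda k: false_positive(n, m, k))


if __name__ == "__main__":
    n, m = 1000, 8000  # 8 bits per element
    print("real optimum:", optimal_k_real(n, m))
    print("best integer k:", best_integer_k(n, m))
    for k in range(3, 9):
        print(k, false_positive(n, m, k))
```

With 8 bits per element the curve is flat near the optimum, so a smaller k such as 4 or 5 costs little in false positives while saving hash computations.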

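Optimization: Summary Assumption: we have good hash functions that look random. Given m bits for the filter and n elements, choose the number k of hash functions to minimize false positives. Let p ≈ e^(-kn/m) be the probability that a given bit is 0, and f = (1 - p)^k the false positive probability. As k increases, there are more chances to find a 0, but also more 1's in the array. Conclusion: the optimum is at k = (ln 2) m/n.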

Partial Bloom Filters The total number of bits is still m, but the bits are divided equally among the k hash functions. Each hash function has its own range of m/k consecutive bits, which makes it possible to parallelize the array accesses. Though the probability of a false positive is actually always at least as large with this division, the difference is small...
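
The division described on this slide can be sketched as k separate slices of m/k bits, one per hash function. The class name and the salted SHA-256 hashing are assumptions for illustration; because each probe touches a different slice, the k accesses could be issued to separate memory banks in parallel.

```python
import hashlib


class PartitionedBloomFilter:
    def __init__(self, m, k):
        if m % k != 0:
            raise ValueError("m must be divisible by k")
        self.k = k
        self.slice_len = m // k
        # Slice i is written and read only by hash function i.
        self.slices = [bytearray(self.slice_len) for _ in range(k)]

    def _offset(self, i, item):
        digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
        return int.from_bytes(digest[:8], "big") % self.slice_len

    def add(self, item):
        for i in range(self.k):
            self.slices[i][self._offset(i, item)] = 1

    def __contains__(self, item):
        return all(self.slices[i][self._offset(i, item)] for i in range(self.k))


if __name__ == "__main__":
    pbf = PartitionedBloomFilter(m=8000, k=5)
    for i in range(10):
        pbf.add(f"flow-{i}".encode())
    print(b"flow-3" in pbf)
```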

Counting Bloom Filters: Idea

Counting Bloom filters: Implementation 4 bits is enough...
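
A minimal sketch of a counting Bloom filter with 4-bit counters, assuming the usual design in which each bit is replaced by a small counter so that elements can be deleted. The class name, the salted SHA-256 hashing and the choice to leave saturated counters untouched are assumptions for illustration.

```python
import hashlib


class CountingBloomFilter:
    COUNTER_MAX = 15  # largest value a 4-bit counter can hold

    def __init__(self, m, k):
        self.m, self.k = m, k
        # One byte per counter for readability; values never exceed 15.
        self.counters = bytearray(m)

    def _positions(self, item):
        return [
            int.from_bytes(
                hashlib.sha256(i.to_bytes(4, "big") + item).digest()[:8], "big"
            ) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        for pos in self._positions(item):
            if self.counters[pos] < self.COUNTER_MAX:
                self.counters[pos] += 1

    def remove(self, item):
        """Only call this for items that were previously added."""
        for pos in self._positions(item):
            # A saturated counter is left alone: its true count is unknown,
            # and decrementing it could cause false negatives later.
            if 0 < self.counters[pos] < self.COUNTER_MAX:
                self.counters[pos] -= 1

    def __contains__(self, item):
        return all(self.counters[pos] > 0 for pos in self._positions(item))


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=1024, k=4)
    cbf.add(b"10.0.0.1")
    cbf.add(b"10.0.0.2")
    cbf.remove(b"10.0.0.1")
    print(b"10.0.0.2" in cbf)
```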

Compressed Bloom Filters: Problem

Compressed Bloom Filters: Motivation Insight: a Bloom filter is not just a data structure, it is also a message. If the Bloom filter is a message, it is worthwhile to compress it, further reducing the traffic of URL exchange. Compressing bit vectors is easy: arithmetic coding gets close to the entropy. Can Bloom filters be compressed? A Bloom filter looks like a random string.

Compression: Technique

Compression: Results (table comparing original and compressed filters at z/n = 8). At k = m (ln 2)/n, false positives are maximized with a compressed Bloom filter: the best case without compression is the worst case with compression, so compression always helps. Side benefit: fewer hash functions are needed with compression, giving a possible speedup (depending on whether the bottleneck is memory or the link).
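
The claim can be explored numerically under two assumptions that are not spelled out on the slide: each bit is 0 with probability e^(-kn/m) independently, and an ideal compressor reaches the binary entropy of the bit vector. Here z/n is taken to be the transmitted (compressed) bits per element, and the function names are illustrative.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a biased coin with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def false_positive(bits_per_element, k):
    """f for an array of m/n = bits_per_element bits per element."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


def compressed_bits_per_element(bits_per_element, k):
    """z/n after ideal compression of an array with m/n bits per element."""
    p_zero = math.exp(-k / bits_per_element)
    return bits_per_element * binary_entropy(p_zero)


if __name__ == "__main__":
    # Uncompressed optimum: half the bits are 0, so nothing can be squeezed out.
    k_opt = 8 * math.log(2)
    print(compressed_bits_per_element(8, k_opt), false_positive(8, k_opt))
    # A larger, sparser array with fewer hash functions compresses well.
    print(compressed_bits_per_element(14, 2), false_positive(14, 2))
```

Under these assumptions, a 14-bits-per-element array with k = 2 compresses to just under 8 bits per element while giving a lower false positive estimate than the uncompressed 8-bits-per-element optimum.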

Bloom Filter vs. Perfect Hash If the set X of n elements is fixed, one can find a perfect hash function for X plus a fully uniform random hash function, then build a table with n entries of j bits each, mapping each element of X to a j-bit value stored in its own entry; the false positive probability is thus exactly (1/2)^j. With m = nj bits, this matches the lower bound for Bloom filters, f ≥ (1/2)^(m/n). HOWEVER, any change in the set X would require an expensive recomputation of the perfect hash function.
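
The space gap between the two schemes can be made explicit. The sketch below assumes the Bloom filter uses the optimal k = (m/n) ln 2 and writes ε for the target false positive probability (a symbol introduced here for illustration):

```latex
\begin{align*}
\text{Perfect hash table:} \quad & \varepsilon = \left(\tfrac{1}{2}\right)^{m/n}
  &&\Longrightarrow\quad \frac{m}{n} = \log_2 \frac{1}{\varepsilon} \\
\text{Bloom filter at optimal } k\text{:} \quad & \varepsilon = \left(\tfrac{1}{2}\right)^{(m/n)\ln 2} \approx 0.6185^{\,m/n}
  &&\Longrightarrow\quad \frac{m}{n} = \frac{1}{\ln 2}\,\log_2 \frac{1}{\varepsilon} \approx 1.44\,\log_2 \frac{1}{\varepsilon}
\end{align*}
```

So for the same false positive probability the Bloom filter needs roughly 44% more bits per element, which is the price paid for being able to add elements without recomputing anything.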

Bloom Filter: Tricks Union (combining two BFs): given the same m and the same hash functions, just OR the two bit vectors of the original Bloom filters. Shrinking (halving a big BF): just OR the first and second halves together; the highest-order bit of each hash value can then be masked. Intersection (estimation)
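
The union and shrinking tricks can be sketched with Python integers as bit vectors. The helper names and the salted SHA-256 hashing are assumptions for illustration; positions are computed as a hash value modulo m, so halving m (with m a power of two) amounts to masking the highest-order bit of each position.

```python
import hashlib


def positions(item, m, k):
    """k bit positions for item in an m-bit filter."""
    return [
        int.from_bytes(
            hashlib.sha256(i.to_bytes(4, "big") + item).digest()[:8], "big"
        ) % m
        for i in range(k)
    ]


def build(items, m, k):
    """Build an m-bit filter, stored as a Python int."""
    bits = 0
    for item in items:
        for pos in positions(item, m, k):
            bits |= 1 << pos
    return bits


def contains(bits, item, m, k):
    return all((bits >> pos) & 1 for pos in positions(item, m, k))


def union(bits_a, bits_b):
    """Valid only if both filters share the same m and hash functions."""
    return bits_a | bits_b


def halve(bits, m):
    """OR the upper half onto the lower half; the result is an m/2-bit filter."""
    half = m // 2
    low_mask = (1 << half) - 1
    return (bits & low_mask) | (bits >> half)


if __name__ == "__main__":
    m, k = 1024, 4
    a = build([b"alpha", b"beta"], m, k)
    b = build([b"gamma"], m, k)
    u = union(a, b)
    print(contains(u, b"gamma", m, k))
    small = halve(u, m)
    print(contains(small, b"alpha", m // 2, k))
```

Both operations give exactly the filter that would have been built directly, with the usual cost that a smaller or fuller filter has a higher false positive rate.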

Applications

Questions?