Author: Haoyu Song, Fang Hao, Murali Kodialam, T.V. Lakshman Publisher: IEEE INFOCOM 2009 Presenter: Chin-Chung Pan Date: 2009/12/09.


Outline Introduction; DLB-BF for IP Lookups; Basic Architecture; Further Improvements; Ad Hoc Prefix Expansion; Off-chip Prefix Table Optimization; Performance Analysis; Implementation Considerations

Introduction Given a set of n items and an m-bit array, each item sets up to k bits in the array using k independent hashes to index into the array. [Figure: a prefix is passed through hash functions 1 to k, each of which indexes into the Bloom filter.]
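
The definition above can be illustrated with a minimal Python sketch. It is not the presenters' implementation: the class name, the byte-per-bit storage, and the use of salted SHA-256 to stand in for k independent hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """An m-bit array in which each item sets up to k bits."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for readability only

    def _indexes(self, item: str):
        # Assumption: k hash functions are emulated by salting SHA-256
        # with the hash function number.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(m=1024, k=3)
bf.add("10.1.0.0/16")  # hypothetical prefix
```

An inserted item is always reported as present, while an item that was never inserted may occasionally be reported as present too, which is why the later slides keep an off-chip prefix table to confirm each match.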

IP Lookup using Bloom Filters (1/2)

IP Lookup using Bloom Filters (2/2) If multiple Bloom filters report false positives, the off-chip prefix table has to be searched multiple times (as many times as there are prefix lengths, in the worst case). The number of prefixes in each prefix-length group is highly variable and changes dynamically with routing updates. One-cycle lookups assume that the Bloom filters are implemented using k-port memory, where k is the number of hash functions; this is impractical unless k is 2 or 3.

Basic Architecture (1/2)

Basic Architecture (2/2) If we use just one Bloom filter, we avoid both the prefix expansion problem and the problem of managing memory allocation for Bloom filters that vary widely in the number of elements stored (from a few prefixes to several hundred thousand). However, a single Bloom filter has a serious drawback: it does not permit parallel searches, so each prefix length has to be searched sequentially.
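
To make the sequential-search drawback concrete, here is a small Python sketch of a longest-prefix lookup against a single filter. The function names are hypothetical, and a plain Python set stands in for the Bloom filter (so it never yields false positives); the point is only the probe count.

```python
def prefix_key(addr: int, length: int) -> tuple:
    """Mask a 32-bit IPv4 address down to its first `length` bits."""
    mask = ((1 << length) - 1) << (32 - length) if length else 0
    return (addr & mask, length)


def sequential_lpm(addr, bloom, table, lengths):
    """Probe one filter length by length, longest first.

    Returns (next_hop, number_of_probes).
    """
    probes = 0
    for length in sorted(lengths, reverse=True):
        probes += 1
        key = prefix_key(addr, length)
        # The table check is what rejects a false positive from the filter.
        if key in bloom and key in table:
            return table[key], probes
    return None, probes


# Hypothetical routing entries: 10.0.0.0/8 -> B and 10.1.0.0/16 -> A.
table = {prefix_key(0x0A000000, 8): "B", prefix_key(0x0A010000, 16): "A"}
bloom = set(table)  # assumption: an exact set stands in for the Bloom filter
result = sequential_lpm(0x0A010203, bloom, table, lengths=[8, 16, 24])
```

With one filter the probes are issued one after another, so the worst-case lookup time grows with the number of prefix lengths.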

Distributed and Load Balanced Bloom Filters

Further Improvements If each SRAM block receives no more than r access requests, the search on each DLB-BF can be done in just one clock cycle.

Ad Hoc Prefix Expansion (1/3) The real concern is that once a packet happens to hit a bad case (i.e., multiple false positives are generated), all subsequent packets having the same address prefix are also problematic unless a longer prefix hits a real match. We use an ad hoc prefix expansion scheme for this. When a packet causes several false positives and the longest false positive match is of length k, we extract the k-bit prefix of the packet’s IP address and insert it into the off-chip prefix table along with the next hop information.

Ad Hoc Prefix Expansion (2/3) For example, let us say we only allow two false positive matches, but a packet's address results in three false matches, and the first (also the longest) false match happens at length 24. To cope with this bad case, we insert a new “expanded” /24 prefix into the prefix table. This new prefix is associated with the same next hop information as the real matching prefix. [Figure: example showing a /16 prefix, a false positive, and next hop A.]

Ad Hoc Prefix Expansion (3/3) Our prefix expansion scheme is invoked dynamically, generates only one new prefix, and is used only when absolutely necessary. We only need to insert one new prefix in the off-chip prefix table. The new prefix neither changes the Bloom filter load nor affects its false positive probability. The observed false positive rate is therefore a function of the number of unique flows (i.e., unique destination IP addresses), not of the number of arriving packets.
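
The three slides above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the address 192.168.1.1, and the list of lengths reported by the filters are all hypothetical, and the off-chip table is modeled as a dictionary.

```python
def prefix_key(addr: int, length: int) -> tuple:
    """Mask a 32-bit IPv4 address down to its first `length` bits."""
    mask = ((1 << length) - 1) << (32 - length) if length else 0
    return (addr & mask, length)


def lookup(addr, reported_lengths, table, max_false=2):
    """Check the off-chip table for each reported length, longest first.

    Returns (next_hop, number_of_false_positives).
    """
    false_lengths = []
    next_hop = None
    for length in sorted(reported_lengths, reverse=True):
        key = prefix_key(addr, length)
        if key in table:
            next_hop = table[key]      # real match: stop searching
            break
        false_lengths.append(length)   # filter said yes, table says no
    if next_hop is not None and len(false_lengths) > max_false:
        # Bad case: insert one expanded prefix at the longest false-positive
        # length, carrying the next hop of the real match.
        table[prefix_key(addr, false_lengths[0])] = next_hop
    return next_hop, len(false_lengths)


ADDR = 0xC0A80101                        # hypothetical address 192.168.1.1
table = {prefix_key(ADDR, 16): "A"}      # the only real prefix, next hop A
reported = [24, 20, 18, 16]              # assumed filter output: three are false
first = lookup(ADDR, reported, table)    # three false positives, then the /16
second = lookup(ADDR, reported, table)   # the expanded /24 now matches at once
```

Only the table changes in this sketch; the filter output is the same for both lookups, which mirrors the point that the expanded prefix does not add load to the Bloom filter.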

Off-chip Prefix Table Optimization (1/2) Each prefix is hashed using two hash functions, and the prefix is stored in the less loaded bucket. Consequently, a prefix lookup needs to access the hash table twice, once per hash function. All prefixes stored in the two accessed buckets need to be compared to find the match.
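
A minimal Python sketch of this two-choice scheme is given below. The bucket count, the function names, and the salted SHA-256 hashes are assumptions chosen for illustration; the slide does not specify them.

```python
import hashlib

NUM_BUCKETS = 8  # assumption: a tiny table, for illustration only


def bucket_index(prefix: str, which: int) -> int:
    # Assumption: two hash functions are emulated by salting SHA-256.
    digest = hashlib.sha256(f"{which}:{prefix}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def insert(buckets, prefix: str, next_hop: str) -> None:
    """Store the prefix in the less loaded of its two candidate buckets."""
    b1, b2 = bucket_index(prefix, 0), bucket_index(prefix, 1)
    target = b1 if len(buckets[b1]) <= len(buckets[b2]) else b2
    buckets[target].append((prefix, next_hop))


def lookup(buckets, prefix: str):
    """Access both candidate buckets and compare every stored prefix."""
    for which in (0, 1):
        for stored, next_hop in buckets[bucket_index(prefix, which)]:
            if stored == prefix:
                return next_hop
    return None


buckets = [[] for _ in range(NUM_BUCKETS)]
routes = {"10.0.0.0/8": "A", "10.1.0.0/16": "B", "192.168.0.0/24": "C"}
for prefix, hop in routes.items():  # hypothetical routes
    insert(buckets, prefix, hop)
```

Choosing the less loaded of two buckets keeps the maximum bucket load low, at the cost of two memory accesses per lookup.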

Off-chip Prefix Table Optimization (2/2) Because each lookup needs two memory accesses and each memory access takes two clock cycles, a 500MHz SRAM can support 125M lookups per second. This is a little short of the worst-case 150Mpps lookup rate required for 100GbE line cards. We can get around this problem in two ways: • We can use faster SRAM devices. A 600MHz SRAM device can satisfy the worst-case requirement. • We can use two 36-Mbit or 18-Mbit SRAM devices in parallel, with each addressed by one hash function.
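
The arithmetic behind these figures can be checked with a short Python sketch. The function name is hypothetical, and the parallel-device case assumes that each device then serves only one access per lookup, which the slide implies but does not state.

```python
def lookups_per_second(clock_hz: int, accesses_per_lookup: int = 2,
                       cycles_per_access: int = 2) -> int:
    """Lookup rate of one SRAM device under the slide's cost model."""
    return clock_hz // (accesses_per_lookup * cycles_per_access)


single_500 = lookups_per_second(500_000_000)   # 125,000,000 lookups/s
single_600 = lookups_per_second(600_000_000)   # 150,000,000 lookups/s
# Assumption: with two devices in parallel, each handles one access per lookup.
parallel_500 = lookups_per_second(500_000_000, accesses_per_lookup=1)
```

Under that assumption, two 500MHz devices in parallel would also clear the 150Mpps worst case.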

Non-stop Forwarding To update a prefix, first insert it into, or delete it from, the off-chip hash table, and then modify the on-chip DLB-BFs. This ordering guarantees error-free updates. A prefix update requires at most one memory access to each DLB-BF, and all of these memory accesses can be conducted in parallel, so the impact on system throughput is minimized.

Performance Analysis (1/3) Theorem 1: The SBF is identical to the DLB-BF in terms of the false positive probability. Theorem 2: The DLB-BF is identical to the PBF in its ideal configuration in terms of the false positive probability and the number of hash functions used.

Performance Analysis (2/3) We analyze the performance of a partitioned DLB-BF implementation that uses multiple 2-port memory blocks.

Performance Analysis (3/3) Hash Table Bucket Load with 64K Buckets and 256 Hash Functions.

Comparison with TCAMs Compared to the TCAM solution, the DLB-BF algorithm has more than 3× cost advantage and 11× power advantage. The footprint of the TCAM-based solution is at least 3× larger than that of the DLB-BF algorithm.

Bloom Filter False Positive Rate With Fast Hash Function The table summarizes the Bloom filter false positive rate for different m/n ratios and different numbers of hash functions (k), where m is the number of buckets, n is the number of keys, and k is the number of hash functions.
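
For reference, the standard textbook approximation of the Bloom filter false positive rate, p ≈ (1 − e^(−kn/m))^k, can be evaluated with the Python sketch below. This is the idealized formula for perfectly random hashes, so it is not necessarily what the slide's table reports for the fast hash function; the function name is hypothetical.

```python
import math


def false_positive_rate(m_over_n: float, k: int) -> float:
    """Idealized Bloom filter false positive rate for a given m/n and k."""
    return (1.0 - math.exp(-k / m_over_n)) ** k


# Sweep a few m/n ratios and hash counts, in the spirit of the slide's table.
rates = {
    (ratio, k): false_positive_rate(ratio, k)
    for ratio in (8, 10, 16)
    for k in (2, 4, 7)
}
```

For a fixed k, a larger m/n ratio always lowers the rate; for a fixed ratio, the rate is minimized near k = (m/n) ln 2.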

DLB-BF Memory Port Scheduling When more than two requests target the same memory block, only the requests for the two longest prefixes are granted. The remaining requests are simply discarded. This simplified scheduler gives some preference to the longer prefixes. Under this strategy, the requests for the top two prefix lengths, 32 and 30, are always granted.
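
The scheduling rule can be sketched in Python as below. The request representation, a (prefix length, block id) pair, and the function name are assumptions made for illustration; the slide does not say how discarded requests are handled afterwards, so the sketch only reports them.

```python
from collections import defaultdict


def schedule(requests, ports: int = 2):
    """Grant at most `ports` requests per memory block, longest prefixes first.

    `requests` is a list of (prefix_length, block_id) pairs.
    Returns (granted, discarded).
    """
    by_block = defaultdict(list)
    for length, block in requests:
        by_block[block].append(length)

    granted, discarded = [], []
    for block, lengths in by_block.items():
        lengths.sort(reverse=True)  # longer prefixes get priority
        granted += [(length, block) for length in lengths[:ports]]
        discarded += [(length, block) for length in lengths[ports:]]
    return granted, discarded


# Hypothetical cycle: three requests collide on block 0, one goes to block 1.
granted, discarded = schedule([(24, 0), (32, 0), (30, 0), (16, 1)])
```

Because the two longest prefix lengths outrank every other request in any block they map to, their requests can never be discarded under this rule.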

Prototype and Simulation The design is aimed at a 40G line card with an IPv4 lookup rate of 60Mpps. Each of the 16 DLB-BFs includes 32 8K-bit 2-port memory blocks. The design uses 50% of the logic resources and 25% of the block memory resources. The synthesized circuit can run at a 150MHz clock rate. The algorithm was also evaluated with a real Internet packet trace. For this case, we observed 38 false positive occurrences and a false positive rate of 1.4 × 10^−7. This means that, out of seven million flows, only one causes a false positive with its first packet.