A Small IP Forwarding Table Using Hashing Yeim-Kuan Chang and Wen-Hsin Cheng Dept. of Computer Science and Information Engineering National Cheng Kung University Tainan, Taiwan R.O.C.

Outline
- Introduction
- Existing Schemes Designed for Compaction
  - Techniques used
  - Performance evaluation
- Proposed Data Structure using Hashing
  - Optimizations
  - Performance
- Conclusions

Introduction
- Increasing demand for high bandwidth on the Internet
  - Next-generation routers must forward multiple millions of packets per second
  - This requires fast routing table lookups

Introduction
- Routing table lookup problem
  - The most important operation on the critical path
  - Find the longest prefix that matches the input IP address (there may be more than one matching prefix)
- Solutions:
  - Reduce the number of memory references
  - Reduce the memory required, so that the forwarding table fits in an application-specific integrated circuit (ASIC) or in the L1/L2 caches

Introduction: Binary Trie
(Figure: an example routing table of prefixes such as 00010*, 0001*, 01*, 10*, 1011*, and 110*, and the corresponding binary trie with nodes labeled A through O; node F is the longest prefix match for the example IP address.)

Existing Schemes Designed for Compaction
- Small Forwarding Table (SFT) (Degermark et al.)
  - Classified as
- The compressed 16-x scheme (Huang et al.)
  - Classified as 16-x, x = 0 to 16
- The Level Compressed (LC) Trie (Nilsson et al.)
  - Variable stride
- Technique summary
  - Run-length encoding
  - Local indices into a pre-allocated array

Memory Sizes
(Table: total memory and structural statistics — numbers of nodes and segments, node sizes, and segmentation tables — for SFT, LC trie, compressed 16-x, binary trie, BSD trie, binary range, and multiway range, all built from the original routing table of 120,635 prefixes.)

Lookup latency
(Table: minimum/maximum number of memory accesses, average lookup time in clock cycles and microseconds, and latency percentiles for SFT, LC trie, compressed 16-x, binary trie, BSD trie, binary range, and multiway range.)

Other results for compressed 16-x
- The memory size of compressed 16-x is not proportional to the number of prefixes in the routing table
  - It depends on the number of prefixes with length longer than 24
- Therefore, it is difficult to predict the amount of memory required by compressed 16-x
- Therefore, we only compare our proposed scheme with SFT in this paper

  Routing table:   Oix-80k     Oix-120k    Oix-150k
  Prefixes:        89,088      120,635     151,511
  Memory size:     1,838 KB    1,147 KB    3,182 KB

Proposed Data Structure
- The binary trie is the simplest data structure
- Reason for the high memory usage of the binary trie: space for the left and right pointers
  - 8 bytes for the left and right pointers
  - 1 byte for the next-hop port number
  - a total of 9 bytes per node
- Example routing table: 120,635 entries
  - ~370K trie nodes in a binary trie
  - 370K x 9 bytes = 3,330K ≈ 3.3 Mbytes
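A minimal sketch of this node layout, assuming the 32-bit pointers implied by the "8 bytes for two pointers" figure; the struct and field names are illustrative only:

```c
#include <stdint.h>

/* Plain binary-trie node: two child pointers (4 bytes each on a 32-bit
 * platform) plus a 1-byte next-hop port -> 9 bytes of payload per node. */
struct trie_node {
    struct trie_node *left;   /* branch taken on bit value 0 */
    struct trie_node *right;  /* branch taken on bit value 1 */
    uint8_t           port;   /* next-hop port; a reserved value can mean "no prefix ends here" */
};
```

With roughly 370K such nodes, the 9-byte payload gives the 3.3 MB total quoted above (compilers normally pad the struct, which only makes the real footprint larger).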

Pre-allocated array of trie nodes
- If all the trie nodes are organized in an array, array indices can be used as pointers
  - The trie nodes are physically stored in a sequential array but are logically structured as a binary trie
- Assuming no more than one million trie nodes:
  - 20 bits are sufficient for a pointer
  - 40 bits are required for the two pointers, plus 8 bits for the next-hop port number
  - a total of 6 bytes per trie node
- 370K nodes need around 2.2 Mbytes
- Note: the LC trie implementation used this idea
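A sketch of this index-based node under the one-million-node assumption from the slide; the bit-field packing and the names are illustrative:

```c
#include <stdint.h>

#define MAX_NODES (1u << 20)      /* 20-bit indices cover ~1M nodes */

/* 20 + 20 + 8 = 48 bits (6 bytes) of payload per node.  The bit-field is
 * used here for readability; a real table would pack nodes into a byte
 * array to actually realize the 6-byte node size. */
struct packed_node {
    uint64_t left  : 20;          /* index of the left child in node_pool */
    uint64_t right : 20;          /* index of the right child in node_pool */
    uint64_t port  : 8;           /* next-hop port number */
};

static struct packed_node node_pool[MAX_NODES];   /* pre-allocated array */
```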

Proposed Data Structure (clustering)
- Simple technique to reduce memory size: clustering
  - Nodes in a cluster are laid out in an array
  - Node size can be reduced by using only one global pointer; the other pointers become local indices into the array corresponding to the cluster

Proposed Data Structure (clustering)
- When a cluster consists of only one trie node, the node size can be reduced by using only one global pointer
  - Two additional bits are sufficient to indicate whether the left and right children exist
- The node size is thus reduced from 48 (20*2+8) bits to 30 (20+2+8) bits
  - (Note that the number of trie nodes remains the same)
- For the binary trie of 370K trie nodes, the total memory requirement is now 1.4 Mbytes
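A sketch of the 30-bit node described above; the convention for locating the two children relative to the single global pointer is an assumption for illustration, not spelled out on the slide:

```c
#include <stdint.h>

/* 20-bit global pointer + 2 child-presence bits + 8-bit port = 30 bits,
 * down from the 48-bit two-pointer node. */
struct one_node_cluster {
    uint32_t children  : 20;  /* global index where this node's children are stored */
    uint32_t has_left  : 1;   /* 1 if the left child exists */
    uint32_t has_right : 1;   /* 1 if the right child exists */
    uint32_t port      : 8;   /* next-hop port number */
};

/* Example convention: the left child (if present) lives at index `children`,
 * and the right child at `children + has_left`. */
```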

Proposed Data Structure (clustering)
- Generalization: the nodes in a subtree of k levels are grouped into a cluster, and any node inside the cluster is accessed through its local index within the cluster
- Figure I shows the generic data structures for global and local nodes (GNode and LNode)
  - A GNode takes 31 + 4k bits and an LNode takes 12 + 2k bits
  - The field N is one bit wider than the field M because every node in the cluster can be a local node, while only the nodes at the bottom level of the k levels can be global nodes

Proposed Data Structure (clustering)
- The number of local nodes increases as k increases
  - What is a good choice of k to minimize the memory required for building the binary trie?
- The following table shows the memory requirements for different k using a large routing table
  - The best value of k is 3; in fact, k = 2 and k = 4 are also very good
  - The table also shows the number of internal nodes that are not prefixes, called non-prefix nodes

Memory usage for clustering
(Table: k, the number of levels of the binary trie per cluster, versus the number of non-prefix nodes and the memory usage in Mbytes.)

Multi-bit trie and Hashing
- Multi-bit trie: improves lookup performance but increases the required memory
  - Overhead: unused pointers
- Consider a list of eight 4-bit numbers: 0000, 0001, 0010, 0011, 0100, 1000, 1001, and 1011
- A traditional hash function: H(b_{n-1} ... b_0) = sum over i of b_i * 2^i
  - This is a perfect hash, but not minimal (the 8 keys are spread over 16 slots)
  - In array form: the row for b_i = 0 is [0, ..., 0, 0] and the row for b_i = 1 is [2^{n-1}, ..., 2, 1]

We can construct the arrays V_0 = [3, 0, 0, 1] and V_1 = [0, 3, 2, 0] (listed from bit position 3 down to bit position 0). We then have:
  H(0000) = 4, H(0001) = 3, H(0010) = 6, H(0011) = 5,
  H(0100) = 7, H(1000) = 1, H(1001) = 0, H(1011) = 2.
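A small check of this example, using the hash form defined on the next slide, H(b_{n-1} ... b_0) = |sum of V_{b_i}[i]|. The C arrays below are indexed by bit position, so they appear reversed relative to the slide's [bit 3 ... bit 0] notation:

```c
#include <stdio.h>
#include <stdlib.h>

/* V0[i] / V1[i] hold the value added when bit i of the key is 0 / 1. */
static const int V0[4] = {1, 0, 0, 3};   /* slide: V0 = [3, 0, 0, 1] */
static const int V1[4] = {0, 2, 3, 0};   /* slide: V1 = [0, 3, 2, 0] */

static int hash4(unsigned key)
{
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += ((key >> i) & 1) ? V1[i] : V0[i];
    return abs(sum);                     /* H(b3..b0) = | sum of V_{b_i}[i] | */
}

int main(void)
{
    const unsigned keys[8] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x8, 0x9, 0xB};
    for (int i = 0; i < 8; i++) {
        unsigned k = keys[i];
        printf("H(%u%u%u%u) = %d\n",
               (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1, hash4(k));
    }
    return 0;   /* prints the eight distinct values 4, 3, 6, 5, 7, 1, 0, 2 */
}
```

The eight keys map one-to-one onto slots 0 through 7, so this V_0/V_1 pair is a minimal perfect hash for the key set.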

A near-minimal perfect hash function
- The proposed hash function H for an n-bit number b_{n-1} ... b_0 is defined as
  H(b_{n-1} ... b_0) = | sum over i = 0 to n-1 of V_{b_i}[i] |
  where b_i = 0 or 1, and V_0[n-1 ... 0] and V_1[n-1 ... 0] are two precomputed arrays such that
  - at least one of V_0[i] and V_1[i] is zero, and
  - the non-zero elements lie in the range -MinSize to MinSize, where MinSize = min(H_Size - 1, 2^{n-1}).

A near-minimal perfect hash function
- Computing V_0[n-1 ... 0] and V_1[n-1 ... 0]: the construction algorithm for the arrays V_0 and V_1 is based on exhaustive search
- Briefly, for each cell in the arrays V_0 and V_1, a number in the range -MinSize to MinSize is tried, one at a time, until the hash values of all keys are unique

A near-minimal perfect hash function
- Step 1: sort the keys based on the frequencies of occurrence of 0 and 1, from bit position 0 to n-1
  - Assume the keys are in the order k_0, ..., k_{m-1} after sorting; the next two steps process the keys in this order

A near-minimal perfect hash function
- Step 2: compute the cells in the arrays V_0 and V_1 that the current key controls
  - The key b_{n-1} ... b_0 controls V_{b_i}[i] if V_{b_i}[i] is not yet controlled by one of the preceding keys, for i = n-1 down to 0

A near-minimal perfect hash function
- Step 3: use the following rules to assign a number in the range -MinSize to MinSize to each cell controlled by the current key
  - If the hash value H(b_{n-1} ... b_0) of the key is already taken by a preceding key, or is larger than MinSize or smaller than -MinSize, every controlled cell must be re-assigned a new number
  - If no number can be found after exhausting all possible numbers for the cells controlled by the key, backtrack to the previous key, re-assign it a new number, and continue the same procedure
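The following is a deliberately naive stand-in for this construction for n = 4: it drops the key ordering of Step 1 and the controlled-cell bookkeeping of Steps 2 and 3, and simply tries cell values exhaustively under the constraints stated earlier (one of each V_0[i]/V_1[i] pair is zero, values in [-MinSize, MinSize]), accepting the first assignment whose hash values are distinct and fall inside the table. All names are illustrative:

```c
#include <stdio.h>
#include <stdlib.h>

#define N 4                                /* 4-bit keys, as in the paper */

static int V0[N], V1[N];

static int hash4(unsigned key)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += ((key >> i) & 1) ? V1[i] : V0[i];
    return abs(sum);
}

static int all_distinct(const unsigned *keys, int m, int h_size)
{
    int used[1 << N] = {0};
    for (int i = 0; i < m; i++) {
        int h = hash4(keys[i]);
        if (h >= h_size || used[h])
            return 0;                      /* collision, or value outside the table */
        used[h] = 1;
    }
    return 1;
}

/* Try every value for bit position pos, keeping one of V0[pos]/V1[pos] zero. */
static int search(int pos, const unsigned *keys, int m, int h_size, int min_size)
{
    if (pos == N)
        return all_distinct(keys, m, h_size);
    for (int v = -min_size; v <= min_size; v++) {
        V0[pos] = v;  V1[pos] = 0;
        if (search(pos + 1, keys, m, h_size, min_size)) return 1;
        V0[pos] = 0;  V1[pos] = v;
        if (search(pos + 1, keys, m, h_size, min_size)) return 1;
    }
    return 0;
}

int main(void)
{
    const unsigned keys[] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x8, 0x9, 0xB};
    int m = 8, h_size = 8;                 /* ask for a minimal (size-8) table */
    int min_size = h_size - 1;             /* MinSize = min(H_Size - 1, ...) */
    if (search(0, keys, m, h_size, min_size))
        printf("V0 = [%d,%d,%d,%d], V1 = [%d,%d,%d,%d]\n",
               V0[3], V0[2], V0[1], V0[0], V1[3], V1[2], V1[1], V1[0]);
    return 0;
}
```

Because many valid V_0/V_1 pairs exist, the arrays this brute force finds may differ from the example on the earlier slide; the point is only to make the search space concrete.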

Analysis of the 4-bit hash table
- Since building the hash table uses exhaustive search, it is not feasible for large n
- Therefore, we select the hash table with n = 4 as the building block for creating a large routing table
- We use exhaustive search to check whether a minimal perfect hash function exists for a set of N 4-bit numbers, for N = 1 to 16

Analysis of the hash table
- We find that a minimal perfect hash function exists in all cases except some rare cases with N = 10 or 11
- For those rare cases, a perfect hash function exists once the minimal hash table size is increased by one

Example: minimal perfect hash

Example: no minimal perfect hash
- The size of the hash table is one more than that of the minimal perfect hash table.

Larger n
- The advantage: the hash table for 8-bit addresses is 10 bytes (8x8 + 8 + 8 bits) and thus fits in a cache block of 16 bytes or larger
- The obvious drawback is the time-consuming computation of the hash table
  - The existence of a minimal perfect hash cannot be known in advance
  - Exponential computation time

4-bit building blocks
(Figure: the additional data structure for IP lookup built from 4-bit hash-table blocks; each entry stores a count and a prefix, with a 20-bit numerical representation.)

The Routing table

Building and updating
- Similar to a 4-bit trie, except that some internal nodes in each 4-bit subtrie are replaced by the recursive hash tables of 4-bit addresses
- Maintain the 4-bit trie and then compute the corresponding hash tables and the associated pointer tables
- When a prefix is deleted from or added to the routing table, the 4-bit trie is updated first, and the corresponding part of the proposed routing table is then changed accordingly

IP lookup performance
- Based on the distribution of prefix lengths, 99.9% of the routing entries have a prefix length less than or equal to 24
  - Thus, the number of memory references is 1 to 8
- The hash tables at level 16 are used to compute the index into the level-16 pointer array for accessing the level-24 hash tables, if needed; the operations are similar for the level-24 hash tables and pointer table
- Note that accessing the level-16 and level-24 hash tables is counted as two memory references each, because two hash tables of 4-bit addresses must be accessed

Optimizations: Level compression
- The level-8 pointer array can be combined with the left-most hash table of the level-16 hash tables, marked '*a' in Figure II
- Similarly, the level-16 pointer table can be combined with the left-most hash table of the level-24 hash tables, marked '*b'
- Additionally, the level-24 pointer table can be combined with the level-32 port table, marked '*c', at the cost of possibly wasted memory
- With these optimizations, the total number of memory accesses becomes 1 to 5

Optimizations: Maptable
- Pre-compute the hash tables by exhaustive search
  - 217 different hash tables cover all possible combinations of N 4-bit values
- (1111, index into the array of pre-computed hash tables): 12 bits
- (1111, index into an array of 16 pre-computed hashed values): avoids computing the hashed values
- Costs one more memory access to the maptable
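A sketch of one plausible reading of the 12-bit encoding above: a 4-bit marker 1111 followed by an 8-bit index into the array of 217 pre-computed hash tables. The 4 + 8 split and the helper names are assumptions, not given on the slide:

```c
#include <stdint.h>

#define MAPTABLE_MARKER 0xFu              /* the 4-bit pattern 1111 */

/* True if a 12-bit field is a reference into the pre-computed maptable. */
static inline int is_maptable_ref(uint16_t field12)
{
    return ((field12 >> 8) & 0xFu) == MAPTABLE_MARKER;
}

/* Low 8 bits select one of the 217 pre-computed 4-bit hash tables. */
static inline unsigned maptable_index(uint16_t field12)
{
    return field12 & 0xFFu;
}
```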

Performance Evaluation
- Simulation methodology
  - Input traffic: randomly permuted IP parts of the prefixes in the routing table
  - Measure the lookup latency one IP at a time
  - Use the Pentium instruction rdtsc to get the time in clock cycles
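A minimal sketch of this timing method using GCC-style inline assembly; lookup_one_ip() is a placeholder for whichever lookup routine is being measured:

```c
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));   /* read time-stamp counter */
    return ((uint64_t)hi << 32) | lo;
}

extern uint8_t lookup_one_ip(uint32_t ip);   /* placeholder: scheme under test */

/* Latency of a single lookup, in clock cycles. */
static uint64_t time_one_lookup(uint32_t ip)
{
    uint64_t start = rdtsc();
    (void)lookup_one_ip(ip);
    return rdtsc() - start;
}
```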

Performance Evaluation

Average lookup latencies in clock cycles

Performance Evaluation
(Figure: lookup-latency comparison of Proposed + LC, Proposed + LC + Maptable, and SFT.)

Performance Evaluation
Figure 8: IP lookup latency for the Oix-120k routing table (Proposed + LC and Proposed + LC + Maptable).

Conclusions and future work
- A new hash table to compact the routing table
  - Hash tables of 4-bit addresses are used as the building blocks to construct a hierarchical routing table
- Simulations showed that the proposed scheme is
  - smaller than any existing scheme
  - faster than most schemes, except the compressed 16-x scheme
- Future work: hashing without 4-bit expansion, and a fast construction algorithm for n larger than 4

The End Thank you