1 DRES:Dynamic Range Encoding Scheme for TCAM Coprocessors Authors: Hao Che, Zhijun Wang, Kai Zheng and Bin Liu Publisher: IEEE Transactions on Computers,

Presentation transcript:

1 DRES: Dynamic Range Encoding Scheme for TCAM Coprocessors Authors: Hao Che, Zhijun Wang, Kai Zheng and Bin Liu Publisher: IEEE Transactions on Computers, 2008 Presenter: Chen-Yu Lin Date: July 1, 2008

2 Outline Introduction and goal of DRES Rule implementation in TCAM Range encoding Encoded range update process Performance evaluation

3 Introduction and Goal of DRES (1/2) DRES is proposed to significantly improve the TCAM storage efficiency for range matching. A rule that involves multiple range fields will cause a multiplicative expansion of the rule expressed in TCAM. Our statistical analysis of real-world rule databases shows that the TCAM storage efficiency can be as low as 16% due to the existence of a significant number of rules with port ranges. Rule encoding: –Use a bit to represent a range in a field. Hence, each rule can be translated to a sequence of encoded bits.

4 Introduction and Goal of DRES (2/2) Search key encoding: –A search key based on the information extracted from the header is preprocessed to generate an encoded search key. Range selection: –Selects the ranges to be encoded to maximize the TCAM storage efficiency. Database update: –Minimize its impact on the rule matching process. The P2C rule encoding schemes are the most effective schemes, as they can encode N nonoverlapping ranges using only log2(N+1) bits.

5 Rule implementation in TCAM (1/4) TCAM Coprocessor: –It works as a look-aside processor for packet classification on behalf of a network processing unit (NPU) or network processor. When a packet is to be classified, the NPU generates a search key based on the information extracted from the packet header and passes it to the TCAM coprocessor for classification.

6 Rule implementation in TCAM (2/4) Noncompact ranges: –Ranges that cannot be exactly implemented using one rule entry in a TCAM. –Ex: { >1023 }, which needs six rule entries to be expressed. Compact ranges: –Ranges that can be exactly implemented in one rule entry in TCAM. –Ex: { <1024 }. (Figure: ternary pattern for { <1024 } and the six subranges covering 1024 ~ 65535.)
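The six-entry figure for { >1023 } can be checked with a small sketch of range-to-prefix expansion. This is a generic illustration, not code from the paper; the function name `range_to_prefixes` and the 16-bit port width are assumptions made for the example.

```python
def range_to_prefixes(lo, hi, width=16):
    """Split the integer range [lo, hi] into ternary prefix patterns.

    Each pattern is a string of '0'/'1' followed by '*' wildcards and
    corresponds to one TCAM rule entry for that field.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo > 0 else 1 << width
        # ... and that does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1          # number of wildcard bits
        head = format(lo >> wild, "b").zfill(width - wild) if wild < width else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes


compact = range_to_prefixes(0, 1023)          # { <1024 }
noncompact = range_to_prefixes(1024, 65535)   # { >1023 }
print(compact)
print(len(noncompact), noncompact)
```

Under these assumptions { <1024 } yields a single pattern, while { >1023 } yields six. A rule with { >1023 } in both port fields would then need 6 x 6 = 36 entries, which is the multiplicative expansion mentioned on slide 3.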

7 Rule implementation in TCAM (3/4) (Figure with labels R1, R2, R3, R4, R5, R6, R7.)

8 Rule implementation in TCAM (4/4) (Figure of the rule entry layout; surviving labels: L1, 24 bit.)

9 Range encoding (1/12) The details in this section: –How rules and search key are encoded. –How ranges are selected for encoding. Subsections in this section: –Structure of encoded rule and encoded search key –TCAM-based search key encoding process –Dynamic range selection algorithm –Code vector and index vector encoding algorithm

10 Range encoding (2/12) Structures of encoded rule / search key(1/2) –Instead of replacing a rule field altogether by a sequence of code bits, we design a hybrid encoding approach for DRES. –The hybrid encoding approach retains all of the fields in a rule and appends a sequence of code bits, called the code vector. –Due to the slotted TCAM structure, there will usually be some free bits left in each rule entry. (24 free bits in our example)

11 Range encoding (3/12) Structures of encoded rule / search key(2/2) –Rule with no encoded range: the code vector is wildcarded and the rule itself remains unchanged. –Rule with any encoded range: that field is wildcarded and the corresponding code vector is encoded based on the encoding rules.

12 Range encoding (4/12) TCAM-based search key encoding process(1/3) –Assume that m_k ranges from the k-th rule field (for k = 1,2,…,K) in a rule database are selected for encoding. –Then, K search key fields must be matched against the corresponding K range tables to generate an index vector and, hence, an encoded search key. –We propose using the TCAM coprocessor itself for sequential search key encoding. –Note that each range in a range table must be represented by multiple TCAM entries, and the corresponding intermediate index vector must be duplicated for every entry belonging to the same range.

13 Range encoding (5/12) TCAM-based search key encoding process(2/3) K+1 tables are allocated in TCAM.

14 Range encoding (6/12) TCAM-based search key encoding process(3/3) –In summary, a rule table lookup with range encoding requires K range table lookups for search key encoding, plus one encoded rule table lookup. –We quantify the performance impact of using TCAM for sequential search key encoding. The TCAM runs at 133 MHz, that is, 133 million lookups per second. Wire-speed forwarding at a 10 Gbps line rate requires handling up to 31.3 million packets per second. Each packet is therefore allowed 133/31.3 ≈ 4.25 TCAM lookups. –If both the source and destination port fields have ranges to be encoded, that is K = 2, each PF table matching requires 4 TCAM lookups.

15 Range encoding (7/12) Dynamic range selection algorithm(1/3) –We use the bitmap scheme to encode ranges, that is, each unique range is mapped to a unique bit. –The figure shows selecting m ranges for encoding out of n ranges. For each range it gives: the number of subranges needed to exactly implement the range; the number of rule entries needed to implement all of the rules that contain the range; and the encoding gain, that is, the number of rule entries that can be eliminated if the range is encoded.

16 Range encoding (8/12) Dynamic range selection algorithm(2/3) –(1) The values of E and G are calculated for every range. The range with the maximum G is selected as the first range for encoding. Suppose that R1 is selected. –(2) E and G for all of the ranges, except for R1, are updated. Then, the range with the maximum G is chosen to be the second encoded range. –The same step is repeated until m ranges have been selected. The computational complexity of this algorithm is O(nm).
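The greedy selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a rule is modeled as a tuple of range names (one per port field), `expansion[r]` is the number of subranges needed to exactly implement range r, an encoded range costs a single entry, E is read as the number of rule entries used by all rules containing the range, and G as the number of entries saved by encoding it. All names and the toy data are hypothetical.

```python
from math import prod


def rule_entries(rule, expansion, encoded):
    """TCAM entries needed for one rule: product of per-field expansions."""
    return prod(1 if r in encoded else expansion[r] for r in rule)


def select_ranges(rules, expansion, m):
    """Greedily pick up to m ranges, each time taking the largest gain G."""
    encoded = []
    for _ in range(m):
        best, best_gain = None, 0
        for r in expansion:
            if r in encoded:
                continue
            containing = [rule for rule in rules if r in rule]
            # E: entries currently used by the rules that contain r.
            e = sum(rule_entries(rule, expansion, set(encoded)) for rule in containing)
            # G: entries eliminated if r is encoded as well.
            g = e - sum(rule_entries(rule, expansion, set(encoded) | {r})
                        for rule in containing)
            if g > best_gain:
                best, best_gain = r, g
        if best is None:          # nothing left that saves entries
            break
        encoded.append(best)
    return encoded


# Hypothetical toy database: (source port range, destination port range).
expansion = {"src_gt1023": 6, "dst_gt1023": 6, "dst_r2": 2, "src_any": 1, "dst_80": 1}
rules = [
    ("src_gt1023", "dst_gt1023"),
    ("src_gt1023", "dst_80"),
    ("src_any", "dst_r2"),
    ("src_any", "dst_gt1023"),
]
chosen = select_ranges(rules, expansion, m=2)
before = sum(rule_entries(r, expansion, set()) for r in rules)
after = sum(rule_entries(r, expansion, set(chosen)) for r in rules)
print(chosen, before, after)
```

Note that the gains are recomputed after every pick, as in step (2): once one `>1023` range is encoded, the gain of the other drops because the rules it appears in have already shrunk. The sketch rescans the rules on each pass for clarity, so its running time also depends on the number of rules; the O(nm) figure on the slide counts range evaluations only.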

17 Range encoding (9/12) Dynamic range selection algorithm(3/3) –There are a total of n = 7 ranges in the destination and source port fields, and m = 3. R1 = { }, R2 = { }, R3 = { }, R4 = { >1023 }, R5 = { }, R6 = { >1023 }, R7 = { } (Figure label: Destination field.)

18 Range encoding (10/12) Code vector and Index vector encoding algorithm(1/3) –In this paper, the bit-map (BM) range encoding algorithms are fully leveraged for code vector and index vector encoding. –The most efficient BM algorithm is the P2C algorithm, which allows N ranges to be encoded by using only log2(N+1) bits in the best case. –In BM, each bit in a code vector is assigned to a specific encoded range, which can come from any field in a rule. Suppose that the code vector has 8 bits and the i-th bit is assigned to R_i. –The code vector for R_1 is 1*******.
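As a rough illustration of the bitmap code vector, the sketch below builds the ternary string for a rule and checks it against a binary index vector. The helper names `code_vector` and `ternary_match` are hypothetical, and the bit numbering (bit 1 is the leftmost character) is an assumption chosen to agree with the `1*******` example.

```python
def code_vector(encoded_bits, width=8):
    """Ternary code vector: '1' at each encoded range's bit, '*' elsewhere."""
    return "".join("1" if i in encoded_bits else "*" for i in range(1, width + 1))


def ternary_match(pattern, key):
    """True if every non-wildcard position of pattern equals the key bit."""
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )


cv_r1 = code_vector({1})          # rule that uses encoded range R_1 only
cv_r1_r3 = code_vector({1, 3})    # rule that uses R_1 and R_3
print(cv_r1, cv_r1_r3)
print(ternary_match(cv_r1_r3, "11100000"), ternary_match(cv_r1_r3, "11000000"))
```

A rule with no encoded range gets an all-wildcard code vector, so it matches whatever index vector the search key carries.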

19 Range encoding (11/12) Code vector and Index vector encoding algorithm(2/3) –Ranges from different fields must be encoded using different range tables. –Similarly to the code vector, the i-th bit in the index vector is assigned to range R_i. –The encoding rules used to generate the index vectors are as follows: 1. For R_i, the i-th bit in the index vector must be set to 1. 2. If R_i is a subrange of R_j, its index vector must have its j-th bit set to 1. 3. For n overlapping ranges R_r1, R_r2, …, R_rn, the overlap R_r1,r2,…,rn needs to be expressed as a separate range if it is a new range other than any existing encoded ranges. 4. All other bits in the index vector must be set to 0. The weight or match priority for a range is equal to the number of 1s in the corresponding index vector.
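One way to read the index vector rules is that, for a given field value, bit i ends up set exactly when the value falls inside encoded range R_i, since the matching range table entry carries its own bit plus the bits of every range that contains it. The sketch below computes that net effect directly from the range bounds instead of building the range table; it is a simplification, and the range values, the bit assignment, and the function names are hypothetical.

```python
def index_vector(value, range_table, width=8):
    """Binary index vector for one field value.

    range_table maps a bit position i (1-based, leftmost first) to the
    inclusive bounds (lo, hi) of encoded range R_i for this field.
    """
    bits = ["0"] * width
    for i, (lo, hi) in range_table.items():
        if lo <= value <= hi:
            bits[i - 1] = "1"
    return "".join(bits)


def weight(vector):
    """Match priority: the number of 1s in the index vector."""
    return vector.count("1")


# Hypothetical source port ranges: R_2 is a subrange of R_1.
source_port_ranges = {1: (1024, 65535), 2: (2000, 3000), 3: (0, 1023)}

iv_inner = index_vector(2500, source_port_ranges)   # inside R_1 and R_2
iv_outer = index_vector(5000, source_port_ranges)   # inside R_1 only
iv_low = index_vector(80, source_port_ranges)       # inside R_3 only
print(iv_inner, weight(iv_inner), iv_outer, iv_low)
```

The weight explains the ordering in the range table: the entry for the subrange R_2 has more 1s than the entry for R_1, so it must take priority when both match.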

20 Range encoding (12/12) Code vector and Index vector encoding algorithm(3/3) Assume that the NPU generates a search key sk = { , , 1025, 1028, 17}. The index vectors of the source port and destination port fields are { } and { }. The final index vector is { }.

21 Encoded range update process(1/9) The details in this section: –We propose a lock-free encoded range update algorithm, which allows the encoded range update and the search key / PF table lookup processes to occur simultaneously without impacting the lookup performance. –The basic idea is to maintain consistent and error-free rule and range tables throughout the update process, thus eliminating the need for locking the tables. Subsections of this section: –Encoding a newly selected range –Releasing encoded ranges –Encoded range update delay

22 Encoded range update process(2/9) Updating a TCAM database without locking may generate two possible types of incorrect TCAM lookups. –Erroneous: If a TCAM rule gets a match while the rule or its corresponding action is partially updated. –Inconsistent: When a match takes place in the middle of a database update process and there is no guarantee of table consistency until the process finishes. In general, each TCAM slot has a valid bit field associated with it. The key to avoiding erroneous lookup is to avoid directly overwriting rule fields and/or the corresponding action when that rule entry is active.

23 Encoded range update process(3/9) Any write operation for a rule/action over an existing rule/action must be decomposed into a write process consisting of 3 operations: –Inactivate the rule. –Write the rule/action. –Activate the rule. Any operation to move a rule-action pair to a new TCAM-associated memory location must be decomposed into a move process consisting of: –Using a write process to write the pair to the new location. –Inactivating the rule at the old location.
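The write and move processes can be sketched with a toy table in which each slot carries a valid bit. This is only an illustration of the ordering of operations, not a model of real TCAM hardware or of concurrent access; the `Slot` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    valid: bool = False
    rule: str = ""      # ternary pattern, e.g. "10**"
    action: str = ""


def matches(pattern, key):
    return len(pattern) == len(key) and all(
        p in ("*", k) for p, k in zip(pattern, key)
    )


def lookup(table, key):
    """Return the action of the first valid matching slot, or None."""
    for slot in table:
        if slot.valid and matches(slot.rule, key):
            return slot.action
    return None


def write_process(slot, rule, action):
    slot.valid = False                      # 1. inactivate the rule
    slot.rule, slot.action = rule, action   # 2. write the rule/action
    slot.valid = True                       # 3. activate the rule


def move_process(table, src, dst):
    # Write the pair to the new location first, so the rule never disappears.
    write_process(table[dst], table[src].rule, table[src].action)
    table[src].valid = False                # then inactivate the old copy


table = [Slot() for _ in range(4)]
write_process(table[2], "10**", "permit")
move_process(table, src=2, dst=0)
print(lookup(table, "1011"), table[2].valid)
```

Because the slot is invalid while its contents change, a lookup can never match a half-written rule, and because the new copy is activated before the old one is inactivated, a lookup during a move always finds one complete copy.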

24 Encoded range update process(4/9) Encoding a newly selected range(1/2) –For a newly selected range to be encoded, the range that appeared in any rule in the original encoded rule table in the TCAM is exactly implemented. –In our algorithm, the range table is updated first, followed by the rule table update. –Note that only the range table associated with the field to which the newly selected range belongs needs to be updated.

25 Encoded range update process(5/9) Encoding a newly selected range(2/2) –There are 2 steps for the range table update: 1. Consistently move the ranges and their index vectors from top (bottom) to bottom (top) while leaving the entries for the newly selected range and corresponding subranges empty. 2. Write the newly selected range, the associated subranges, and their index vectors to the preallocated locations in decreasing priority order.

26 Encoded range update process(6/9) *Example of updating a rule: –Write L2 to a new location. –Delete the rule entries at its old locations.

27 Encoded range update process(7/9) Releasing encoded ranges(1/2) –If no free bit is left in the index and code vectors, the encoded range with the least encoding gain is unencoded to release a free bit. –To unencode a range, the corresponding field in every rule with this encoded range needs to be exactly implemented, which increases the number of rule entries in the table. –However, the increase in the number of rule entries must be less than the reduction obtained by encoding the newly selected range. –To release an encoded range, the rule table is updated first, followed by the range table update.

28 Encoded range update process(8/9) Releasing encoded ranges(2/2) –For the rule table update: Changes the encoded range into an exactly implemented range in all of the rule entries having this encoded range. –For the range table update: Both the encoded range and the derived subranges need to be deleted.

29 Encoded range update process(9/9) Encoded range update delay(1/1) –We only consider the rule table update delay for doing the encoded range update. –Assume there are N_er rule entries in the rule table. –All of the rule entries in the table are moved once for adding a newly encoded range and once for releasing an encoded range. Hence, the number of rule entry writes and deletes is 2N_er for each encoded range update. –Assume each write and delete costs 100 ns. For a table with rule entries, the update delay is 0.02 seconds.
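The delay estimate is a one-line calculation: 2N_er operations at 100 ns each. The sketch below applies it; the function name is hypothetical, and the entry count of 100,000 is not taken from the slide but is simply the value that makes this formula give 0.02 seconds.

```python
def update_delay_seconds(n_er, op_cost_ns=100):
    """Rule table update delay: 2 * N_er writes/deletes at op_cost_ns each."""
    return 2 * n_er * op_cost_ns / 1e9


# Assumed entry count, chosen so the formula reproduces the 0.02 s figure.
delay = update_delay_seconds(100_000)
print(delay)
```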

30 Performance evaluation (1/5) The performance of DRES is evaluated and compared with Liu’s algorithm, called the CE algorithm, based on four real-world five-tuple PF databases.

31 Performance evaluation (2/5) (Table: frequency of each range in the source port field, in the destination port field, and in both port fields, together with the number of subranges needed to exactly implement the range.)

32 Performance evaluation (3/5) Each rule entry has 24 free bits, which is much larger than 7, the maximum number of unique ranges found in the four databases. Hence, no extra slot is needed for range encoding. For the four databases, the sizes of the range table for the source (destination) port are 12(12), 12(12), 29(20), 22(10) (in slots). In practice, due to possible encoded range updates, a range table must be configured to be much larger than the maximum size of 29. –Assume that 60 slots are allocated for each range table.

33 Performance evaluation (4/5) After encoding ranges in both source and destination fields in DRES, each rule takes one TCAM rule entry.

34 Performance evaluation (5/5) If only the source (destination) port range is encoded, the numbers of TCAM entries for the encoded rules of the 4 databases are as follows: –389(389) –243(243) –516(762) –2124(1595) In summary, DRES can significantly improve the overall TCAM storage efficiency for range matching.