Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit


Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit
Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University, Japan)
Koji YAMAMOTO (Renesas Design Corporation, Japan)
Yasuto KURODA, Kazunari INOUE (Renesas Electronics Corporation, Japan)

Outline
– Background
– Objective
– Proposed hardware architecture
– Hardware architecture evaluation
– FPGA implementation
– Hardware evaluation
– Conclusion

What is TCAM?
TCAM = Ternary Content Addressable Memory
Features
– Very high-speed searching
– Takes the data to match as input and outputs the memory address of the matching entry
– A third matching state, "don't care", in addition to 1s and 0s
Application
– Looking up the routing table in IP routers
[Figure: a routing table with address and prefix columns; the lookup input is matched against the prefixes and the output is the matching address]
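The ternary matching described above can be sketched in software. This is a minimal illustration rather than the presenters' design: each entry is modeled as a hypothetical (value, mask) pair in which mask bits set to 0 act as "don't care", and the sequential loop stands in for the parallel comparison a real TCAM performs. The helper names `ip`, `prefix_entry` and `tcam_search` and the sample prefixes are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Each entry is (value, mask): mask bits set to 1 must match,
# mask bits set to 0 are "don't care".
Entry = Tuple[int, int]


def ip(text: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix_entry(text: str, length: int) -> Entry:
    """Build a ternary entry for an IPv4 prefix of the given length."""
    mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
    return (ip(text) & mask, mask)


def tcam_search(entries: List[Entry], key: int) -> Optional[int]:
    """Return the address (index) of the first matching entry, or None.

    A hardware TCAM compares all entries at once; the loop is only a model.
    """
    for address, (value, mask) in enumerate(entries):
        if (key & mask) == value:
            return address
    return None


# Entries are ordered longest prefix first, so the first hit is the
# longest-prefix match (sample prefixes are made up for illustration).
table = [
    prefix_entry("192.168.1.0", 24),
    prefix_entry("192.168.0.0", 16),
    prefix_entry("10.0.0.0", 8),
]
```

With this table, `tcam_search(table, ip("192.168.1.7"))` returns address 0, while an address covered only by the /16 entry returns address 1.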

TCAM problems
Manufacturing cost
– Cost per bit is 4 times that of SRAM.
Power consumption
– All logic gates must be energized for every search.
Capacity
– High price per bit and power-saving constraints
– Denser TCAM is hard to pursue

                Search performance   Manufacturing cost   Power consumption   Capacity
Requirements    High                 Low                  Low                 High
TCAM            High                 High                 High                Low

Objective
Propose a new hardware architecture
– Focus on address lookup in the routing table of routers
– RAM-based design
– Named "Custom Memory"
Hardware design of the Custom Memory
Verify the effectiveness of the Custom Memory
– Effectiveness of our architecture
– Dramatic reduction of cost and power consumption
Implementation on an FPGA

                Speed   Cost   Power   Capacity
Custom Memory   High    Low    Low     High
TCAM            High    High   High    Low

Design concepts
Divide the memory area into equal-sized tables
– Low-power RAM-based design
– Low cost, low power, high capacity
Lookup operation by a single access
– High search performance
Same physical user interface as TCAM
– Aims to replace TCAM in the market

                Speed   Cost   Power   Capacity   Interface
Custom Memory   High    Low    Low     High       Same as TCAM

Architectural overview
– Same physical user interface as TCAM
– RAM-based design
– Divide into subtables
[Figure: block diagram of the Custom Memory. Command, address, IP address and prefix inputs feed search devices #0 to #N; each search device contains a RAM divided into tables (Table #0, Table #1, ...) and a comparator.]

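Search device partitioning
How is the search device that stores a prefix chosen?
– Partitioning is based on prefix length
– Search device #0 holds prefixes of length 8, search device #1 holds prefixes of length 9, ..., search device #N holds prefixes of length 32
[Figure: example prefixes assigned to search devices by their length; the prefix values are not recoverable from the transcript]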

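Table partitioning
How is the table that stores a prefix chosen?
– A fixed number of bits of the prefix are extracted as "index bits".
– The remainder bits are stored in the table selected by the index bits.
How are the index bits determined?
– Extract the last bits of the prefix
[Figure: example with 8 index bits in the search device for prefix length 16. A /16 prefix is split into remainder bits and index bits; the index bits select one table in the RAM and the remainder bits are stored there, while the other tables shown are empty.]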
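The two partitioning steps can be sketched as follows. This is a minimal model under stated assumptions, not the presenters' circuit: the device number is taken as prefix length minus 8 (from the slide's mapping of device #0 to length 8 and device #1 to length 9), the index width of 8 bits follows the slide's example, and the function names and the sample prefix are hypothetical.

```python
from typing import Tuple

INDEX_BITS = 8        # assumed, matching the 8-bit example on the slide
MIN_PREFIX_LEN = 8    # assumed: search device #0 holds prefixes of length 8


def ip(text: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def device_for_length(length: int) -> int:
    """Pick the search device from the prefix length alone."""
    return length - MIN_PREFIX_LEN


def split_prefix(address: int, length: int,
                 index_bits: int = INDEX_BITS) -> Tuple[int, int]:
    """Split a prefix into (index, remainder).

    The index is the last `index_bits` bits of the prefix and selects the
    table; the remainder is what would be stored inside that table.
    """
    prefix_bits = address >> (32 - length)       # keep only the prefix
    index = prefix_bits & ((1 << index_bits) - 1)
    remainder = prefix_bits >> index_bits
    return index, remainder


# Sample prefix chosen for illustration only.
device = device_for_length(16)
index, remainder = split_prefix(ip("192.168.0.0"), 16)
```

For the sample /16 prefix, the last 8 bits (168) select the table and the leading 8 bits (192) are the stored remainder.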

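Search operation
[Figure: data path of a search inside the Custom Memory. A search command and the destination IP address enter through the input-output controller; the index calculator derives a table number from the destination IP address for each search device (prefix length 8, prefix length 9, ..., prefix length 32); each device reads the selected table from its RAM and its comparator checks the stored entries; the LPM comparator selects among the hits and outputs the hit address.]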
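The search flow in the figure can be approximated in software. The sketch below is an assumption-laden model rather than the hardware itself: it uses one search device per prefix length from 8 to 32, 8 index bits, Python sets in place of RAM tables, and it returns the matched prefix length instead of a hit address. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

INDEX_BITS = 8             # assumed index width
MIN_LEN, MAX_LEN = 8, 32   # assumed: one search device per prefix length


def ip(text: str) -> int:
    """Convert a dotted-quad IPv4 string to a 32-bit integer."""
    a, b, c, d = (int(part) for part in text.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


class CustomMemoryModel:
    """Software model: devices keyed by prefix length, each a list of tables."""

    def __init__(self) -> None:
        self.devices: Dict[int, List[Set[int]]] = {
            length: [set() for _ in range(1 << INDEX_BITS)]
            for length in range(MIN_LEN, MAX_LEN + 1)
        }

    @staticmethod
    def _split(address: int, length: int) -> Tuple[int, int]:
        """Index calculator: last INDEX_BITS bits of the prefix pick the table."""
        bits = address >> (32 - length)
        return bits & ((1 << INDEX_BITS) - 1), bits >> INDEX_BITS

    def insert(self, address: int, length: int) -> None:
        index, remainder = self._split(address, length)
        self.devices[length][index].add(remainder)

    def search(self, destination: int) -> Optional[int]:
        """Return the longest matching prefix length, or None on a miss."""
        hits = []
        for length, tables in self.devices.items():
            # One table read per device, then compare the stored remainders.
            index, remainder = self._split(destination, length)
            if remainder in tables[index]:
                hits.append(length)
        # LPM comparator: the longest matching prefix wins.
        return max(hits, default=None)


memory = CustomMemoryModel()
memory.insert(ip("10.0.0.0"), 8)
memory.insert(ip("10.1.0.0"), 16)
```

In hardware the devices would operate in parallel; the loop over `self.devices` only models that behavior sequentially.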

Evaluation of partitioning
Which bits are better to use as index bits?
– The distribution of prefixes across tables affects the cost.
Evaluation metric
– Maximum number of prefixes in a table
[Figure: a RAM table whose word lines each feed a comparator; the number of prefixes in a table determines the number of word lines and comparators needed. Index bits are extracted from the last bits of the prefix.]

Effectiveness of indexing
Plotted series
– Top k bits: using the top bits of the prefix as index bits
– Proposal: using the last bits of the prefix as index bits
– Bottom: ideal value (unrealizable)
[Figure: maximum number of prefixes in a table versus prefix length]
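The evaluation metric can be computed with a few lines of code. The sketch below is illustrative only: the prefix list is synthetic (64 made-up 16-bit prefixes that share their top byte), not the routing table used in the presentation, and the "ideal" value is assumed here to mean a perfectly even spread of prefixes over the tables, which the slides do not confirm. The function names are hypothetical.

```python
from collections import Counter
from math import ceil
from typing import Iterable


def max_table_load(prefixes: Iterable[int], length: int,
                   index_bits: int, use_last_bits: bool) -> int:
    """Maximum number of prefixes that land in any single table.

    `prefixes` holds the prefix bits only (already shifted down to `length` bits).
    """
    counts: Counter = Counter()
    for bits in prefixes:
        if use_last_bits:
            index = bits & ((1 << index_bits) - 1)   # last bits of the prefix
        else:
            index = bits >> (length - index_bits)    # top bits of the prefix
        counts[index] += 1
    return max(counts.values(), default=0)


def even_spread_load(count: int, index_bits: int) -> int:
    """Assumed ideal: prefixes spread perfectly evenly over all tables."""
    return ceil(count / (1 << index_bits))


# Synthetic data: 64 prefixes of length 16 that all start with 0xC0.
synthetic = [0xC000 + i for i in range(64)]

top_load = max_table_load(synthetic, 16, 8, use_last_bits=False)
last_load = max_table_load(synthetic, 16, 8, use_last_bits=True)
ideal_load = even_spread_load(len(synthetic), 8)
```

On this clustered toy input, indexing by the top bits puts every prefix in one table, while indexing by the last bits spreads them out. Real routing tables would need to be measured the same way before drawing conclusions.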

FPGA implementation
– Altera Stratix IV GX FPGA Development Kit
– Verilog HDL
Parameters
– 4 search devices
– 256 tables/device
– 128 prefixes/table
[Figure: search devices #0 to #3, each with a comparator and a RAM holding Table #0 to Table #255, with 128 prefixes per table]
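As a quick check of what these parameters imply, the arithmetic can be written out. The derived values are not stated on the slide: the index width assumes that the tables in a device are selected directly by the index bits, and the total is an upper bound that is reached only if every table is completely full.

```python
# Parameters taken from the slide.
SEARCH_DEVICES = 4
TABLES_PER_DEVICE = 256
PREFIXES_PER_TABLE = 128

# Assumption: tables are addressed directly by the index bits,
# so 256 tables correspond to 8 index bits.
index_bits = TABLES_PER_DEVICE.bit_length() - 1

# Upper bound on stored prefixes, reached only if every table is full.
total_prefix_slots = SEARCH_DEVICES * TABLES_PER_DEVICE * PREFIXES_PER_TABLE

print(index_bits, total_prefix_slots)
```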

Hardware evaluation

                Search performance       Power consumption (mA)   Chip area (ratio)
Custom Memory   every clock (125 MHz)    6.53 (52%)               62%
TCAM            every clock (360 MHz)    [value missing]          [value missing]

Measurement conditions: 4k-bit array, Vdd = 1.0 V, room temperature, 125 Msps
[Figure: operation area per search. In the TCAM, every RAM cell and comparator is active; in the Custom Memory, only the word lines and comparators of the selected RAM table are active.]

FPGA experiment
– Examine the hardware operation
– Use raw data (a BGP routing table)

Conclusion
Designed a RAM-based fast forwarding engine
– Hardware architecture
– FPGA implementation
Reduced cost and power
– 62% of the cost (compared with TCAM)
– 52% of the power consumption (compared with TCAM)
Future work
– Optimization of implementation parameters
– Handling of table overflow