
Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University,


1 Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit
Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University, Japan)
Koji YAMAMOTO (Renesas Design Corporation, Japan)
Yasuto KURODA, Kazunari INOUE (Renesas Electronics Corporation, Japan)

2 Outline
Background
Objective
Proposed hardware architecture
Hardware architecture evaluation
FPGA implementation
Hardware evaluation
Conclusion

3 What is TCAM?
TCAM = Ternary Content Addressable Memory
Feature
  – Very high-speed searching
  – Input: data to match; output: memory address
  – A third matching state, "don't care", in addition to 1s and 0s
Application
  – Looking up the routing table in IP routers
Routing table
  Addr.  Prefix
  1      192.168.*.*
  2      192.168.100.*
  3      192.168.101.*
  …      …
Input: 192.168.101.1 → Output: 3
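The lookup on this slide can be mimicked in software. The following is a minimal sketch, not a description of real TCAM circuitry: `parse_ternary` and `tcam_lookup` are hypothetical helper names, and choosing the match with the fewest "don't care" bits is an assumption made so that the result agrees with the slide's output of 3. A hardware TCAM compares all entries in parallel; the loop here only imitates the result.

```python
import ipaddress

def parse_ternary(pattern):
    """Turn a dotted pattern such as '192.168.*.*' into (value, mask).

    Mask bits set to 1 must match; mask bits set to 0 are "don't care".
    """
    value = mask = 0
    for octet in pattern.split("."):
        value <<= 8
        mask <<= 8
        if octet != "*":
            value |= int(octet)
            mask |= 0xFF
    return value, mask

def tcam_lookup(table, ip):
    """Return the address of the matching entry with the most cared-for bits."""
    key = int(ipaddress.IPv4Address(ip))
    best_addr, best_care = None, -1
    for addr, pattern in table.items():
        value, mask = parse_ternary(pattern)
        care = bin(mask).count("1")
        if (key & mask) == value and care > best_care:
            best_addr, best_care = addr, care
    return best_addr

# Routing table from the slide: address -> ternary prefix
ROUTING_TABLE = {1: "192.168.*.*", 2: "192.168.100.*", 3: "192.168.101.*"}

print(tcam_lookup(ROUTING_TABLE, "192.168.101.1"))  # prints 3
```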

4 TCAM problems
Manufacturing cost
  – Cost per bit is 4 times that of SRAM.
Power consumption
  – All logic gates must be energized for every search.
Capacity
  – High price per bit and power-saving efforts
  – Hard to pursue denser TCAM

                Search performance  Manufacturing cost  Power consumption  Capacity
  Requirements  High                Low                 Low                High
  TCAM          High                High                High               Low

5 Objective
Propose a new hardware architecture
  – Focus on address lookup in the routing table of routers
  – RAM-based design
  – Named "Custom Memory"
Hardware design of the Custom Memory
Verify the effectiveness of the Custom Memory
  – Effectiveness of our architecture
  – Dramatically reduce cost and power consumption
Implementation on an FPGA

                 Speed  Cost  Power  Capacity
  Custom Memory  High   Low   Low    High
  TCAM           High   High  High   Low

6 Design concepts
Divide the memory area into equal-sized tables
  – Low-power RAM-based design
  – Low cost, low power, high capacity
Lookup operation by single access
  – High search performance
Same physical user interface as TCAM
  – Aim to replace TCAM in the market

                 Speed  Cost  Power  Capacity  Interface
  Custom Memory  High   Low   Low    High      Same as TCAM

7 Architectural overview
[Figure: block diagram of the Custom Memory. Inputs: command, address, IP address, prefix. Search devices #0 to #N each contain RAM divided into tables (Table #0, Table #1, ・・・) and a comparator. Annotations: same physical user interface as TCAM; RAM-based design; divide into subtables.]

8 Search device partitioning
How is the search device that stores a prefix decided?
  – Partitioning based on prefix length
  – Search device #0 (prefix length 8), search device #1 (prefix length 9), ・・・, search device #N (prefix length 32)
Example
  6.0.0.0/8, 24.128.0.0/9, 62.30.0.0/16, 112.63.240.0/20, 184.128.191.0/24, 232.95.225.1/32
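As a rough illustration of partitioning by prefix length, the sketch below maps each example prefix to a search device number. It assumes one device per prefix length with consecutive numbering from length 8 (device #0) to length 32, which is how the slide's labels read; `search_device_for` is a hypothetical name, not part of the presented design.

```python
import ipaddress

MIN_PREFIX_LEN = 8    # search device #0 on the slide
MAX_PREFIX_LEN = 32   # search device #N on the slide

def search_device_for(prefix):
    """Return the search device number for a prefix such as '62.30.0.0/16'.

    Assumes device numbers follow prefix lengths consecutively.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    if not MIN_PREFIX_LEN <= net.prefixlen <= MAX_PREFIX_LEN:
        raise ValueError(f"prefix length {net.prefixlen} has no search device")
    return net.prefixlen - MIN_PREFIX_LEN

EXAMPLES = ["6.0.0.0/8", "24.128.0.0/9", "62.30.0.0/16",
            "112.63.240.0/20", "184.128.191.0/24", "232.95.225.1/32"]

for p in EXAMPLES:
    print(p, "-> search device #%d" % search_device_for(p))
```

Under this assumption N would be 24, since lengths 8 through 32 give 25 devices.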

9 Table partitioning
How is the table that stores a prefix decided?
  – A fixed number of bits in the prefix are extracted as "index bits".
  – The remainder bits are stored.
How are the index bits determined?
  – Extract the last bits from the prefix
Example (8 index bits), search device for prefix length 16
  154.1.0.0/16 → 10011010.00000001
  Index bits: 00000001 → table #1
  Remainder bits: 10011010 (stored in table #1)
[Figure: RAM of the search device with its tables, from #0 up to the last table; some tables hold remainder bits such as 01101011, 01011000, 01001111 and 00110111, others are empty.]
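The split into index bits and remainder bits can be reproduced with a few bit operations. This is a minimal sketch under the slide's example setting of 8 index bits taken from the end of the prefix; `split_prefix` and its parameter names are illustrative, not taken from the presentation.

```python
import ipaddress

def split_prefix(prefix, index_bits=8):
    """Split a prefix into (table index, remainder bits, remainder width).

    The last `index_bits` bits of the prefix select the table; the bits in
    front of them are the remainder that would be stored in that table.
    """
    net = ipaddress.ip_network(prefix, strict=False)
    length = net.prefixlen
    if length < index_bits:
        raise ValueError("prefix is shorter than the number of index bits")
    bits = int(net.network_address) >> (32 - length)  # keep only the prefix bits
    index = bits & ((1 << index_bits) - 1)            # last index_bits bits
    remainder = bits >> index_bits                    # everything before them
    return index, remainder, length - index_bits

index, remainder, width = split_prefix("154.1.0.0/16")
print("table #%d" % index)                  # table #1
print(format(remainder, "0%db" % width))    # 10011010
```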

10 Search operation
[Figure: search data path in the Custom Memory. Elements shown: search command and destination IP address as inputs; input-output controller; index calculator producing a table # from the destination IP address; search devices (prefix length 8, prefix length 9, ・・・, prefix length 32), each with RAM tables and a comparator; LPM comparator producing the hit address.]
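A behavioural model may help to follow the search path. The sketch below is a software approximation under several assumptions: 8 index bits per device, one device per prefix length from 8 to 32, and a hit reported as (prefix length, table #, slot). The class and method names are hypothetical, and the sequential loops stand in for what the hardware would do in parallel within a single access.

```python
import ipaddress

INDEX_BITS = 8  # assumed; matches the 8-bit example on the table partitioning slide

class SearchDevice:
    """Holds all prefixes of one length, split into 2**INDEX_BITS tables."""

    def __init__(self, prefix_len):
        self.prefix_len = prefix_len
        self.tables = [[] for _ in range(1 << INDEX_BITS)]

    def _split(self, address):
        # Index calculator: last INDEX_BITS bits of the prefix pick the table.
        bits = address >> (32 - self.prefix_len)
        return bits & ((1 << INDEX_BITS) - 1), bits >> INDEX_BITS

    def insert(self, address):
        index, remainder = self._split(address)
        self.tables[index].append(remainder)

    def search(self, address):
        index, remainder = self._split(address)
        # Comparator: only the one selected table is read and compared.
        for slot, stored in enumerate(self.tables[index]):
            if stored == remainder:
                return index, slot
        return None

class CustomMemoryModel:
    def __init__(self):
        self.devices = {n: SearchDevice(n) for n in range(8, 33)}

    def insert(self, prefix):
        net = ipaddress.ip_network(prefix, strict=False)
        self.devices[net.prefixlen].insert(int(net.network_address))

    def search(self, ip):
        address = int(ipaddress.IPv4Address(ip))
        # LPM comparator: prefer the hit from the longest prefix length.
        for length in range(32, 7, -1):
            hit = self.devices[length].search(address)
            if hit is not None:
                return (length, *hit)
        return None

memory = CustomMemoryModel()
memory.insert("154.0.0.0/8")
memory.insert("154.1.0.0/16")
print(memory.search("154.1.2.3"))   # (16, 1, 0)
print(memory.search("154.2.0.1"))   # (8, 154, 0)
```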

11 Evaluation of partitioning
Which bits are better to use as index bits?
  – The distribution of prefixes across tables affects the cost.
Evaluation metric
  – Maximum number of prefixes in a table
[Figure: RAM word lines and comparators of one table, alongside a chart of the number of prefixes per table # when the last bits of the prefix are extracted as index bits.]

12 Effectiveness of indexing
  – Top k bits: using the top bits as index bits
  – Proposal: using the last bits as index bits
  – Bottom: ideal value (unrealizable)
[Figure: maximum number of prefixes in a table versus prefix length.]
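The metric behind this comparison, the maximum number of prefixes that land in one table, is easy to compute for a given prefix set. The sketch below uses synthetic toy data (256 prefixes of length 16 that all share the same first octet), not the routing table from the presentation, and it assumes that the "ideal value" means a perfectly even spread, that is, the ceiling of prefixes divided by tables. Function names are illustrative.

```python
from collections import Counter
from math import ceil

def max_table_load(prefixes, prefix_len, index_bits, use_last_bits):
    """Largest number of prefixes sharing one table index.

    `prefixes` holds the prefix bits as right-aligned integers.
    """
    counts = Counter()
    for bits in prefixes:
        if use_last_bits:
            index = bits & ((1 << index_bits) - 1)      # last bits of the prefix
        else:
            index = bits >> (prefix_len - index_bits)   # top bits of the prefix
        counts[index] += 1
    return max(counts.values(), default=0)

def ideal_load(num_prefixes, index_bits):
    """Assumed ideal: prefixes spread evenly over all 2**index_bits tables."""
    return ceil(num_prefixes / (1 << index_bits))

# Synthetic /16 prefixes 192.0/16 ... 192.255/16 (toy data for illustration only)
toy = [(192 << 8) | second_octet for second_octet in range(256)]

print(max_table_load(toy, 16, 8, use_last_bits=False))  # 256: all in one table
print(max_table_load(toy, 16, 8, use_last_bits=True))   # 1: one per table
print(ideal_load(len(toy), 8))                          # 1
```

Real prefixes of one length tend to share leading bits more often than trailing ones, which is the intuition the toy data exaggerates.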

13 FPGA implementation
Altera Stratix IV GX FPGA Development Kit
Verilog-HDL
Parameters
  – 4 search devices
  – 256 tables/device
  – 128 prefixes/table
[Figure: search devices #0 to #3, each with RAM holding Table #0 to Table #255 (128 prefixes per table) and a comparator.]
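The parameters above fix the size of the prototype, and the arithmetic can be spelled out as follows. This is a back-of-the-envelope sketch: the total is an upper bound that would only be reached if every table filled completely, and the constant names are illustrative.

```python
SEARCH_DEVICES = 4
TABLES_PER_DEVICE = 256
PREFIXES_PER_TABLE = 128

# Number of index bits needed to address every table in one device.
index_bits = (TABLES_PER_DEVICE - 1).bit_length()

# Upper bound on stored prefixes, assuming no table overflows or stays partly empty.
capacity_per_device = TABLES_PER_DEVICE * PREFIXES_PER_TABLE
total_capacity = SEARCH_DEVICES * capacity_per_device

print(index_bits)           # 8
print(capacity_per_device)  # 32768
print(total_capacity)       # 131072
```

The 8 index bits agree with the 8-bit example used on the table partitioning slide.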

14 Hardware evaluation

                 Search performance     Power consumption (mA)  Chip area (ratio)
  Custom Memory  every clock (125 MHz)  6.53 (52%)              62%
  TCAM           every clock (360 MHz)  12.43                   100%

Conditions: RAM (4k-bit array, Vdd = 1.0 V, room temperature, 125 Msps)
[Figure: operation area per search in TCAM and in the Custom Memory, drawn as RAM word lines and comparators.]

15 FPGA experiment
  – Examine the hardware operation
  – Use raw data (BGP routing table)

16 Conclusion
Design of a RAM-based fast forwarding engine
  – Hardware architecture
  – FPGA implementation
Reduced cost and power
  – Cost: 62% of TCAM
  – Power consumption: 52% of TCAM
Future work
  – Implementation parameter optimization
  – Handling of table overflow

