
1 NORDUnet 2003 © 2003, Cisco Systems, Inc. All rights reserved.
High-end Routers & Modern Supercomputers
Bob Newhall & Dan Lenoski, Cisco Systems, Routing Technology Group
NORDUnet 2003, Reykjavik, August 2003

2 Agenda
- Traditional Routers and Supercomputers
- Modern Routers and Supercomputers
- Comparison of Subsystems
- Conclusions

3 What’s a Router? Traditionally…
[Block diagram; labels: CPU (MIPS), Secondary Cache SRAM, System Controller, SDRAM (256 MB), CPU Bus, PB, PCI Bus0, PCI Bus1, PCI Bus2, PA-1 to PA-6, I/O Bus, ROM, Flash, NVRAM, Con/Aux, FE, PCMCIA-1, PCMCIA-2]
Architecturally, routers have been like normal computers except:
- Mechanical form factors, especially for IO
- Embedded forwarding and routing SW

4 What’s a Supercomputer? Traditionally…
Cray Y-MP
- 250 Gbyte/sec of interconnect bandwidth (Cray Y-MP C90)

5 Evolution of High-End Routers
Increasing bandwidth of external connections:
- T1 -> DS3 -> OC3 -> OC12 -> OC48 -> OC192 -> OC768
- ~1.5 Mbit/sec -> 40 Gbit/sec
Line speed increases require changes in router architecture: the central memory bottleneck is removed and replaced with distributed memories and a central interconnect fabric
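
The progression on this slide can be tabulated with a minimal sketch. It assumes the standard nominal rates (T1 at 1.544 Mbit/s, DS3 at 44.736 Mbit/s, and OC-n at n times the 51.84 Mbit/s SONET base rate); the names `OC_BASE_MBPS`, `oc_rate_mbps` and `LINE_RATES_MBPS` are illustrative and do not come from the presentation.

```python
# Nominal line rates in Mbit/s. T1 and DS3 are fixed by their standards;
# OC-n is n times the 51.84 Mbit/s SONET base rate.
OC_BASE_MBPS = 51.84


def oc_rate_mbps(n: int) -> float:
    """Nominal rate of an OC-n link in Mbit/s."""
    return n * OC_BASE_MBPS


LINE_RATES_MBPS = {"T1": 1.544, "DS3": 44.736}
for n in (3, 12, 48, 192, 768):
    LINE_RATES_MBPS[f"OC{n}"] = oc_rate_mbps(n)

for name, mbps in LINE_RATES_MBPS.items():
    print(f"{name:>6}: {mbps:>10.3f} Mbit/s")

# Ratio between the two ends of the progression on the slide.
growth = LINE_RATES_MBPS["OC768"] / LINE_RATES_MBPS["T1"]
print(f"OC768 is roughly {growth:,.0f}x the rate of a T1")
```

The growth of more than four orders of magnitude is what makes a single shared central memory untenable as the forwarding path.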

6 Evolution of High-End Routers
- Increased computational power for routing, forwarding and feature processing
- Larger systems (more line cards) desired by end customers to exploit DWDM capabilities and simplify operation of POPs

7 What’s a High-End Router today?
- Switch Fabric
- Route Processor(s)
- Linecards (8-16)
- T1 to OC-192 Interfaces
- Distributed Architecture with Crossbar Switch Fabric
- Multi-Gigabit Switching Capacity

8 The Next Generation of High-End Routers
- Switch Fabric
- Route Processor(s)
- Linecards (100s to 1000s)
- T1 to OC-768 Interfaces
- Multi-Terabit Switching Capacity
- Multi-Chassis, Distributed Architecture with Multi-Stage Switch Fabric

9 Evolution of Supercomputers
Move from globally clocked ECL vector processors to distributed-memory, microprocessor-based multiprocessors
- 250 MHz C90 to 1-2 GHz Pentium 4, Alpha, Power3
This architecture change was driven by:
- Complexity and economics of building the highest-performance processors
- Commoditization of smaller-scale computers
- Not by the programming desires of end users
Note that state-of-the-art processors can generate less than 10 Gbit/sec of communication data

10 What’s a Supercomputer today?
ASCI White at LLNL
- 8K processors in 512 nodes, 12 TFLOPS
- Interconnect has connection BW of 1 TByte/sec
Diagram and photo from LLNL ASCI webpage

11 Major components of a Router
- Distributed Control Plane
  - Used to run routing protocols (= distributed computer)
- Distributed Data Plane
  - Packet Processing: examine L2-L7 protocol information (determine QoS, VPN ID, policy, etc.)
  - Packet Forwarding: make appropriate routing, switching, and queuing decisions
- System Interconnect
  - Control Plane: can be combined with the data plane or dedicated
  - Data Interconnect: at least the sum of external BW required

12 Major components of a Supercomputer
- Distributed Control / Computational nodes
  - Small number of processor nodes (4-16) with local memory
- Distributed IO Subsystem
  - Typically tied to a subset of nodes, but if fully distributed these can be viewed as a sink/source of external bandwidth, similar to a router’s external connections
- System interconnect
  - BW driven primarily by data sharing requirements and often limited by the CPU’s ability to generate data

13 Router-Supercomputer Analogy
High-End Router     Supercomputer
Route Processors    CPU Nodes
Line Cards          I/O Nodes
Switch Fabric       Interconnection Network

14 Route Processors ~ CPU Nodes
Route Processors execute routing protocols and maintain routing and forwarding information bases
- Large networks dictate gigabytes of memory to hold the routing and interface databases
- Also require high peak computation rates to reconverge network topology and download table updates to line cards
- 1000 MIPS per eight 40 Gbit/sec interfaces for the control plane
CPU nodes in a supercomputer run applications and source and sink processor communication traffic
- 1-2 Gflops and 1000 MIPS per processor
- 1-2 Gbytes of memory per processor

15 Router Line Card ~ SC I/O Node
Packet forwarding, classification and feature processing require complex look-ups and queuing decisions to be made on a per-packet basis
- Even with HW assist (TCAMs, etc.), approximately 500 instructions per packet
- At 40 Gbps and minimum-size packets => 100 MPPS
- Total of 50,000 MIPS / 40 Gbps line rate
Queuing and TCP/IP congestion semantics imply 200 millisec of buffering on ingress and egress
- 0.2 sec x 40 Gbps x 2 = 16 Gbits = 2 Gbyte / 40 Gbps line rate
- Fragmentation typically requires 4x BW for queuing
- 40 Gbps => 160 Gbps per queue x 2 (ingress & egress) => 320 Gbps
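
The per-line-card arithmetic on this slide can be reproduced with a short sketch. All inputs are the figures quoted on the slide; the function name `line_card_budget` and its parameter names are illustrative only, and the implied packet size is a derived value, not one the slide states.

```python
def line_card_budget(
    line_rate_gbps: float = 40.0,    # line rate per card (slide figure)
    instr_per_packet: int = 500,     # instructions per packet with HW assist
    packet_rate_mpps: float = 100.0, # minimum-size packet rate at 40 Gbps
    buffer_seconds: float = 0.2,     # 200 ms of buffering per direction
    directions: int = 2,             # ingress and egress
    frag_factor: int = 4,            # queuing BW multiplier for fragmentation
) -> dict:
    """Recompute the slide's per-line-card requirements."""
    buffer_gbit = buffer_seconds * line_rate_gbps * directions
    return {
        # Mpps x instructions/packet gives millions of instructions/sec.
        "data_mips": packet_rate_mpps * instr_per_packet,
        "buffer_gbit": buffer_gbit,
        "buffer_gbyte": buffer_gbit / 8,
        "queue_bw_gbps": frag_factor * line_rate_gbps * directions,
        # Derived: packet size implied by 100 Mpps at 40 Gbps.
        "implied_packet_bytes": line_rate_gbps * 1e9 / (packet_rate_mpps * 1e6) / 8,
    }


budget = line_card_budget()
for key, value in budget.items():
    print(f"{key:>22}: {value:,.1f}")
```

With the default inputs this yields 50,000 MIPS, 16 Gbit (2 Gbyte) of buffer memory and 320 Gbps of queuing bandwidth, matching the slide; the 100 MPPS figure corresponds to packets of about 50 bytes.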

16 Distributed Memory Router Line Card
[Block diagram; labels: Optics, Framer, L2 Buffering, Receive Fwd Engine, Input Queuing, To Fabric, From Fabric, Fabric Re-Assem., Transmit Fwd Engine, Output Queuing, Table SRAM (x2), Fwd/Class TCAMs (x2), RTT Buffer Mem (1GB)+ pointer SRAM (x2), Linecard Control CPU, Control CPU Mem Control, 512+MB DRAM]

17 Supercomputer I/O Nodes
- Disk and network attachment dominate requirements
- Computational requirements on data typically limit effective throughput
- 52 nodes of 512 on ASCI White, each with approx. 1-2 Gbyte/sec of IO BW
- Data must be moved from IO to local node memory and then IPC’d to other computational nodes
  - Limited by node-to-interconnect BW limits

18 Router Switch Fabric ~ SC Interconnect Network
Critical design parameters are:
- Throughput
- Traffic Isolation
- Fault-Tolerance
Router switch fabric must have over-speed of fabric BW relative to line BW to provide traffic isolation and deal with packet fragmentation
- Minimum 1.5x, with at least 2x line rate desirable
- Gbps per 40Gbps line rate
Depending on the size of the system, topology varies:
- Crossbar
- Multistage Network (e.g., Benes, Clos)
- Must be symmetric: all-to-all (like an old-style supercomputer)

19 Supercomputer Interconnect Network
Critical parameters are:
- Throughput
- Latency (end-to-end)
Actual supercomputer interconnects vary substantially, but usually provide <1 Gbyte/sec per processor
Topology varies, but generally exploits locality:
- Hypercube
- Torus or Mesh
- Multi-stage networks

20 Overall Comparison
Feature                   512-Linecard Router (40 Gbps/LC)   512-node, 8K-processor ASCI White Supercomputer
Control MIPS              64 GIPS                            8000 GIPS
Data MIPS                 25600 GIPS                         N/A
Total Memory Storage      1024 Gbytes                        4096 Gbytes
Total Memory Bandwidth    20 Tbyte/sec                       8 Tbyte/sec
Interconnect Bandwidth    4 Tbyte/sec                        2 Tbyte/sec
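
The router column of this comparison appears to follow from scaling the per-line-card figures on the earlier slides to 512 line cards. The sketch below shows that scaling under stated assumptions: the function name `router_totals` is illustrative, the 1.5x fabric over-speed is the minimum quoted on the switch-fabric slide (the deck does not say which factor was used for the table), and the table's bandwidth entries look rounded.

```python
def router_totals(
    line_cards: int = 512,
    line_rate_gbps: float = 40.0,
    control_mips_per_8_cards: float = 1000.0,  # route-processor slide
    data_mips_per_card: float = 50_000.0,      # line-card slide
    buffer_gbyte_per_card: float = 2.0,        # line-card slide
    queue_bw_gbps_per_card: float = 320.0,     # line-card slide
    fabric_overspeed: float = 1.5,             # assumed: minimum over-speed
) -> dict:
    """Scale per-line-card requirements up to a full system."""
    return {
        "control_gips": line_cards / 8 * control_mips_per_8_cards / 1000,
        "data_gips": line_cards * data_mips_per_card / 1000,
        "memory_gbytes": line_cards * buffer_gbyte_per_card,
        # Gbit/s -> Gbyte/s (divide by 8) -> Tbyte/s (divide by 1000).
        "memory_bw_tbytes_per_sec": line_cards * queue_bw_gbps_per_card / 8 / 1000,
        "interconnect_tbytes_per_sec": (
            line_cards * line_rate_gbps * fabric_overspeed / 8 / 1000
        ),
    }


totals = router_totals()
for key, value in totals.items():
    print(f"{key:>28}: {value:,.2f}")
```

Under these assumptions the sketch gives 64 GIPS of control processing, 25,600 GIPS of data-plane processing and 1024 Gbytes of memory, in line with the table, plus about 20.5 Tbyte/sec of memory bandwidth and 3.84 Tbyte/sec of interconnect bandwidth, which the table lists as 20 and 4 Tbyte/sec.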

21 Overall Technology Required
Traditionally, networking equipment exploited off-the-shelf silicon, FPGAs, and standard ASIC technology
High-end routers with OC-192 support are approaching supercomputers
- 0.25 µm and 0.18 µm ASICs shipped in early 2001
High-end routers with OC-768 support require the leading edge of technology
- ASICs using 0.13 µm technology and >1500-pin packages
- Latest memory technology: Rambus, FCRAM and RLDRAM, QDR SRAM
- Power per rack comparable to the 9.5 kW for IBM’s SP2

22 Conclusions
Explosive data rates and optics capabilities have pushed router technology tremendously in the last decade
- From embedded single-board computers in the 1980s
- To distributed-memory computers with specialized forwarding, queuing and feature processing capabilities
In nearly every metric of system technology, today’s high-end routers match or exceed the capability of an equivalent supercomputer
In addition, high-end routers have a critical requirement of system fault-tolerance
Going forward, advances in high-end routers and supercomputers are technology-limited

23 Thank you! Bob Newhall,

