
P. R. Schulz, University of Mannheim, Nov. 4th, PDCS 2002: ATOLL - Performance and Cost Optimization of a SAN Interconnect, Dipl.-Inf. Patrick R. Schulz.


1 ATOLL - Performance and Cost Optimization of a SAN Interconnect
Dipl.-Inf. Patrick R. Schulz, schulz@uni-mannheim.de
Computer Architecture Group, University of Mannheim, Germany

2 Presentation Outline
- Design considerations and goals
- Basic architecture of ATOLL
- Optimization for performance and cost
- Special features of ATOLL
- Performance results
- Future developments and conclusion

3 ATOLL SAN
Design considerations for ATOLL:
- Design for highest performance and lowest cost
- Minimization of communication latency
- Optimization of bandwidth for small and large messages
- Realization of basic communication functions in hardware
- Simplification of program access to the NIC
- Avoidance of software overhead

4 ATOLL NIC
Design goals for ATOLL:
- Integration of all network components into a single chip
  => the external switch moves onto the NIC
- 4 replicated, independent NI devices on the host side serve 2/4-way SMP nodes without OS intervention
- 4 bidirectional link ports to the SAN
- User-level communication
- Hardware message handler
- Many support functions for parallel processing (atomic message startup, thread synchronization, ...)

5 ATOLL Basic Architecture
ATOLL chip:
- 4.5 million transistors
- 0.18 µm CMOS process
- 5.7 x 5.7 mm die
- Fastest and second-biggest chip design of a European university

6 ATOLL HW Architecture: PCI Interface
- 64-bit / 66, 100, 133 MHz, PCI-X 1.0 compliant
- also runs as a 32-bit/33 MHz PCI interface (3.3 V)
- master (DMA) and slave (PIO) functionality
- capable of combining several transactions into one burst where applicable
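The theoretical peak of each bus mode listed above is simply bus width times clock. A minimal Python sketch of that arithmetic, using nominal clock values (PCI-X 133 actually runs at 133.33 MHz, and sustained throughput is lower because of protocol overhead):

```python
def pci_peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bus bandwidth in MB/s: bytes per transfer x transfers per microsecond."""
    return width_bits / 8 * clock_mhz


# Bus modes named on the slide, with nominal clocks.
MODES = [(64, 133), (64, 100), (64, 66), (32, 33)]

if __name__ == "__main__":
    for width, clock in MODES:
        print(f"{width}-bit @ {clock} MHz: {pci_peak_mb_per_s(width, clock):.0f} MB/s")
```

The 32-bit/33 MHz fallback caps a node at about 132 MB/s, which is below a single 250 MB/s link direction; the 64-bit PCI-X modes (528 to 1064 MB/s) are what let the host keep up with the links.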

7 ATOLL HW Architecture: Host Port (Network Interface)
- four fully featured devices
- running at 250 MHz
- PIO mode for efficient send/receive of small messages, utilizing write-combining and read-prefetching
- DMA engines for autonomous transfer of large messages
- small NI context of two cache lines, fully loadable (virtual interfaces)
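The split between PIO for small messages and DMA for large ones can be sketched as a per-message decision. This is a hypothetical illustration: the slides give neither the crossover size nor the host cache line size, so `PIO_THRESHOLD_BYTES`, `CACHE_LINE_BYTES`, and `plan_send` are assumptions, not the ATOLL driver interface.

```python
import math

# Hypothetical parameters: the slides do not state the PIO/DMA crossover point,
# and the cache line size depends on the host CPU.
PIO_THRESHOLD_BYTES = 1024
CACHE_LINE_BYTES = 64


def plan_send(msg_bytes: int,
              pio_threshold: int = PIO_THRESHOLD_BYTES,
              cache_line_bytes: int = CACHE_LINE_BYTES) -> tuple[str, int]:
    """Pick a transfer mode for one message (hypothetical policy).

    Small messages: the CPU writes the payload directly (PIO), one
    write-combined burst per cache line.
    Large messages: the CPU posts a single descriptor and the DMA engine
    moves the data autonomously.
    """
    if msg_bytes <= pio_threshold:
        return ("PIO", math.ceil(msg_bytes / cache_line_bytes))
    return ("DMA", 1)
```

The design choice behind the two modes: PIO avoids DMA setup latency, which dominates for short messages, while DMA frees the CPU once the payload is large enough that copying it word by word would cost more than posting a descriptor.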

8 ATOLL HW Architecture: 4 x 4 Bidirectional Crossbar
- fully integrated on-chip network switch
- running at 250 MHz
- 2 GBytes/s bisection bandwidth
- fully pipelined, wormhole routing
- fall-through latency of 6 cycles (24 ns)
- reverse flow control through the crossbar
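Why wormhole routing keeps latency low (and needs no full-message buffer on the NIC) shows up in a first-order latency model. A sketch assuming no contention, a per-hop delay equal to the 24 ns crossbar fall-through (the measured per-hop latency on the test system is higher), and 250 MB/s links; the function names are illustrative:

```python
def serialization_ns(msg_bytes: int, link_mb_per_s: float = 250.0) -> float:
    """Time to push msg_bytes through one link (1 MB/s = 1 byte per microsecond)."""
    return msg_bytes * 1000.0 / link_mb_per_s


def wormhole_ns(hops: int, msg_bytes: int, hop_delay_ns: float = 24.0,
                link_mb_per_s: float = 250.0) -> float:
    # The header pays the routing delay at every switch; the body streams
    # right behind it, so the message is serialized only once.
    return hops * hop_delay_ns + serialization_ns(msg_bytes, link_mb_per_s)


def store_and_forward_ns(hops: int, msg_bytes: int, hop_delay_ns: float = 24.0,
                         link_mb_per_s: float = 250.0) -> float:
    # Each switch buffers the whole message before forwarding it, so
    # serialization is paid again at every hop (and needs a full-size buffer).
    return hops * (hop_delay_ns + serialization_ns(msg_bytes, link_mb_per_s))


if __name__ == "__main__":
    for size in (64, 1024):
        print(size, wormhole_ns(3, size), store_and_forward_ns(3, size))
```

For a 64-byte packet over 3 hops the model gives 328 ns for wormhole versus 840 ns for store-and-forward, and the gap grows with message size.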

9 ATOLL HW Architecture: Link Interface
- bidirectional, byte-wide LVDS links (2 x 250 MBytes/s)
- running at 250 MHz
- reverse flow control characters are exchanged to prevent buffer overflow
- CRC protection and automatic retransmission for 64-byte link packets
  => guaranteed message delivery after injection into the network
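The idea behind CRC-protected 64-byte link packets with automatic retransmission can be sketched as follows. This is a simplified stop-and-wait illustration, not ATOLL's link protocol: it assumes CRC-32 (Python's `zlib.crc32`) because the slides do not name the polynomial, and `frame`, `check`, and `send_reliably` are hypothetical helpers. Real link layers usually keep several packets in flight.

```python
import zlib

PACKET_BYTES = 64  # link packet size from the slide


def packetize(data: bytes) -> list[bytes]:
    return [data[i:i + PACKET_BYTES] for i in range(0, len(data), PACKET_BYTES)]


def frame(payload: bytes) -> bytes:
    # Append a 4-byte CRC-32 (assumption: the actual ATOLL link CRC may differ).
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def check(framed: bytes):
    """Return the payload if the CRC matches, otherwise None (receiver NACKs)."""
    payload, crc = framed[:-4], int.from_bytes(framed[-4:], "little")
    return payload if zlib.crc32(payload) == crc else None


def send_reliably(data: bytes, corrupt_once: set[int]) -> tuple[bytes, int]:
    """Stop-and-wait: resend each packet until the receiver's CRC check passes.

    corrupt_once holds packet indices whose first transmission gets a
    simulated bit error on the wire.
    """
    received, retransmissions = [], 0
    for seq, payload in enumerate(packetize(data)):
        first_try = True
        while True:
            wire = bytearray(frame(payload))
            if first_try and seq in corrupt_once:
                wire[0] ^= 0xFF  # simulated link error
            first_try = False
            ok = check(bytes(wire))
            if ok is not None:
                received.append(ok)
                break
            retransmissions += 1  # sender retransmits after the NACK
    return b"".join(received), retransmissions


if __name__ == "__main__":
    out, n = send_reliably(bytes(range(200)), {1, 3})
    print(out == bytes(range(200)), n)
```

Because errors are caught and repaired hop by hop in hardware, software above the NIC never has to detect or resend lost data, which is what "guaranteed delivery after injection" buys.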

10 ATOLL 2D Torus Topology Example
[Figure: 2D torus built from nodes, each with an ATOLL NIC]
All topologies that fit the 4 link ports are supported...
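In a 2D torus each NIC uses its four link ports as +X, -X, +Y and -Y. The slides do not say which routing algorithm ATOLL uses; the sketch below shows dimension-order routing, a common choice for tori, taking the shorter way around each ring. Port names and functions are hypothetical, and deadlock avoidance (e.g. virtual channels or a dateline scheme) is deliberately omitted.

```python
# Hypothetical mapping of the four ATOLL link ports onto torus directions.
STEP = {"+X": (1, 0), "-X": (-1, 0), "+Y": (0, 1), "-Y": (0, -1)}


def next_port(cur: tuple[int, int], dst: tuple[int, int], dims: tuple[int, int]) -> str:
    """Dimension-order routing: correct X first, then Y, using wrap-around links."""
    (x, y), (dx, dy), (w, h) = cur, dst, dims
    if x != dx:
        fwd = (dx - x) % w                      # hops going in the + direction
        return "+X" if fwd <= w - fwd else "-X"
    if y != dy:
        fwd = (dy - y) % h
        return "+Y" if fwd <= h - fwd else "-Y"
    return "LOCAL"                              # arrived: deliver to the host port


def route(src: tuple[int, int], dst: tuple[int, int], dims: tuple[int, int]) -> list[str]:
    path, cur = [], src
    while (port := next_port(cur, dst, dims)) != "LOCAL":
        path.append(port)
        sx, sy = STEP[port]
        cur = ((cur[0] + sx) % dims[0], (cur[1] + sy) % dims[1])
    return path


if __name__ == "__main__":
    print(route((0, 0), (3, 1), (4, 4)))
```

On a 4 x 4 torus, going from (0, 0) to (3, 1) takes the wrap-around link (-X) once and then one +Y hop.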

11 ATOLL Tree Topology Example
[Figure: tree topology built from ATOLL NICs]

12 Optimization for Performance and Cost
Regarding cost:
- the wormhole philosophy eliminates memory on the NIC
- link cables and connectors (HD 68-pin), PCB, and chip package (custom BGA) are highly optimized for routability => only a 2+2-layer PCB and a single-layer package
- LVDS signalling => high speed, low power, low EMI
- I/O cells (LVDS, PCI-X) designed by a partner university
- free standard cell library (VST, 0.18 µm)
- low-cost backend service, wire-length-driven, traditional design flow

13 Optimization for Performance and Cost
Regarding performance:
- hardware retransmission => low software overhead
- PCI-X => high-performance node interface
- user-level communication (multiple devices) => low latency
- high clock frequency (250 MHz) => high bandwidth (2 GB/s)
- low latency (3 clock cycles for crossbar arbitration)
- no kernel traps or IRQs when accessing the device, and no polling on the PCI bus
- important status registers are mirrored in main memory using cache coherence
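Mirroring status registers in main memory means software can check the NIC's progress with an ordinary cached load instead of a slow PCI read. A minimal sketch of that pattern for a send ring, assuming the NIC writes its consumer counter into host memory; `SendRing` and its methods are hypothetical, and the NIC side is simulated by a method call.

```python
class SendRing:
    """Hypothetical send ring whose consumer counter is mirrored into host memory.

    head counts descriptors posted by software; mirrored_tail counts descriptors
    the NIC has consumed. Both only grow, so head - mirrored_tail is the number
    of slots currently in use.
    """

    def __init__(self, slots: int):
        self.slots = slots
        self.head = 0           # producer counter, written only by host software
        self.mirrored_tail = 0  # consumer counter, written only by the NIC (simulated)

    def free_slots(self) -> int:
        # Reads ordinary cached host memory: no PCI read, no kernel trap.
        return self.slots - (self.head - self.mirrored_tail)

    def post(self, n: int = 1) -> bool:
        """Post n descriptors if they fit; return False instead of blocking."""
        if n > self.free_slots():
            return False
        self.head += n
        return True

    def nic_consumed(self, n: int) -> None:
        # Stand-in for the NIC writing its updated tail counter into host
        # memory; cache coherence makes the new value visible to the CPU.
        self.mirrored_tail += n
```

A sender that finds the ring full simply re-reads `mirrored_tail`; while nothing changes the value stays in the CPU cache, so this spinning generates no PCI bus traffic.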

14 Optimization for Performance and Cost

15 Special Hardware Features
Regarding performance and cost:
- programmable clock (14 MHz steps) => speed grades
- cables with controlled impedance and low skew => transmission-line characteristics => wave pipelining
- double-pumped data on the cables => only one frequency, no phase shift

16 ATOLL Bandwidth
[Figure: link utilization [%] vs. message size [bytes]; 100% = 250 MByte/s; annotated values ~225 MByte/s and >100 MByte/s]
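The two bandwidth annotations can be read as fractions of the 250 MByte/s link peak: ~225 MByte/s is about 90% utilization and 100 MByte/s is 40%. The conversion as a tiny Python helper (the name is illustrative):

```python
LINK_PEAK_MB_PER_S = 250.0  # 100% link utilization, as stated on the slide


def link_utilization_pct(measured_mb_per_s: float) -> float:
    """Express a measured link bandwidth as a percentage of the 250 MB/s peak."""
    return 100.0 * measured_mb_per_s / LINK_PEAK_MB_PER_S


if __name__ == "__main__":
    for bw in (225.0, 100.0):
        print(f"{bw} MB/s -> {link_utilization_pct(bw):.0f}%")
```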

17 ATOLL Latency
Only 27 clock cycles (~100 ns) latency per hop.
Test system: Pentium III 1000 MHz (ServerWorks chipset), 64-bit/66 MHz PCI, ATOLL at 245 MHz
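Converting the cycle count to time: 27 cycles at the 245 MHz test clock is about 110 ns (108 ns at the nominal 250 MHz), consistent with the "~100 ns" on the slide. A short sketch; the multi-hop helper is a first-order estimate that simply multiplies the per-hop figure and ignores contention and host-side overhead.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert clock cycles to nanoseconds (one cycle lasts 1000 / f[MHz] ns)."""
    return cycles * 1000.0 / clock_mhz


def path_latency_ns(hops: int, cycles_per_hop: int = 27, clock_mhz: float = 245.0) -> float:
    """First-order latency of a multi-hop path: per-hop latency times hop count."""
    return hops * cycles_to_ns(cycles_per_hop, clock_mhz)


if __name__ == "__main__":
    print(f"{cycles_to_ns(27, 245):.1f} ns per hop at 245 MHz")
    print(f"{cycles_to_ns(27, 250):.1f} ns per hop at 250 MHz")
```

The same conversion reproduces the crossbar figure from the architecture slide: 6 cycles at 250 MHz is 24 ns.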

18 Cost Comparison
Interconnect     Performance           Configuration                  Cost
Fast-Ethernet    100 Mb/s (12 MB/s)    4x NIC + 1x 4-port switch      ~ $540
Myrinet 2000     4 Gb/s (0.5 GB/s)     1x NIC + 1x 4-port switch      ~ $2700
ATOLL            16 Gb/s (2 GB/s)      1x NIC                         ~ $900
ATOLL vs. Myrinet 2000: 4x the performance at ~0.3x (1/3) the cost
=> cost-effectiveness of 4 x 3 = 12x that of Myrinet
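The 12x figure is bandwidth per dollar of ATOLL divided by that of Myrinet 2000. Note that "0.3x" on the slide is a rounding of $900 / $2700 = 1/3; dividing by 0.3 literally would give about 13.3, not 12. Exact rational arithmetic with Python's `fractions` module makes this explicit (the function name is illustrative):

```python
from fractions import Fraction


def cost_effectiveness_ratio(perf_a: int, cost_a: int, perf_b: int, cost_b: int) -> Fraction:
    """Bandwidth per dollar of A divided by bandwidth per dollar of B."""
    performance_ratio = Fraction(perf_a, perf_b)  # 2000 / 500 = 4
    cost_ratio = Fraction(cost_a, cost_b)         # 900 / 2700 = 1/3
    return performance_ratio / cost_ratio


# Figures from the slide: bandwidth in MB/s, cost in US dollars.
ATOLL = (2000, 900)
MYRINET_2000 = (500, 2700)

if __name__ == "__main__":
    print(cost_effectiveness_ratio(*ATOLL, *MYRINET_2000))  # prints 12
```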

19 ATOLL Team
- Uni Mannheim, LS Rechnerarchitektur (Chair of Computer Architecture): Ulrich Brüning, Lambert Schälicke, Patrick R. Schulz, Holger Fröning, Lars Rzymianowicz
- Uni Kaiserslautern, LS Schaltungstechnik (Chair of Circuit Design): Prof. Tielert, Mark Wegener
- Thanks to: IMEC Belgium (Carl Das), SUN Microsystems, Synopsys
Contributions: layout backend service, I/O cells, basic architecture, HW implementation, architectural enhancements

20 Future Development
Future of ATOLL hardware development:
- Optical link interconnect
  - based on a high-performance SERDES chip (2 x 250 MB/s to 2.5 Gb/s)
  - short-distance (up to 100 m) serial optical interconnect
  - plug-compatible with the electrical interface
  - very cost-effective implementation
- ATOLL 2
  - 500 MHz clock
  - higher-dimensional crossbar for multidimensional interconnection network (IN) structures
  - multithreaded, cached host interface
  - memory management support
  - command extension for direct memory operations (put, get, …) => MPI-2
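One way to see how a 250 MB/s byte-wide link maps onto a 2.5 Gb/s serial line: 250 MB/s is 2 Gb/s of payload, and a block code that sends 10 line bits per 8 data bits raises that to 2.5 Gbaud. The slide does not name the encoding; 8b/10b is an assumption here, chosen because it is the common code for SERDES parts in this speed class.

```python
def serial_line_rate_gbaud(payload_mb_per_s: float,
                           code_bits: int = 10, data_bits: int = 8) -> float:
    """Line rate needed to carry a byte stream over one serial lane.

    Assumes a block code sending code_bits line symbols per data_bits of
    payload (8b/10b by default; the slide does not state the encoding).
    """
    payload_gbit_per_s = payload_mb_per_s * 8 / 1000  # MB/s -> Gb/s
    return payload_gbit_per_s * code_bits / data_bits


if __name__ == "__main__":
    print(serial_line_rate_gbaud(250.0))  # one link direction
```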

21 Conclusion
- A radically new design approach leads to a single-chip solution integrating a whole network on a chip.
- Low-budget design implemented from architecture to chip.
- It is now reality (we are lucky: it was first-time right).

22 ATOLL: A New Contender in the System Area Network Market
Further information: www.atoll-net.de, schulz@uni-mannheim.de
Thank you for your attention! Questions?

23 Chip Photo

24 Interconnect

