Peter Wegner, DESY CHEP03, 25 March 2003 1 LQCD benchmarks on cluster architectures M. Hasenbusch, D. Pop, P. Wegner (DESY Zeuthen), A. Gellrich, H.Wittig.


1 LQCD benchmarks on cluster architectures
Peter Wegner, DESY
M. Hasenbusch, D. Pop, P. Wegner (DESY Zeuthen), A. Gellrich, H. Wittig (DESY Hamburg)
CHEP03, 25 March 2003, Category 6: Lattice Gauge Computing
Motivation
PC Cluster @DESY
Benchmark architectures: DESY Cluster, E7500 systems, Infiniband blade servers, Itanium2
Benchmark programs, Results
Future
Conclusions, Acknowledgements

2 PC Cluster Motivation: LQCD, Stream Benchmark, Myrinet Bandwidth
32/64-bit Dirac Kernel, LQCD (Martin Lüscher, (DESY) CERN, 2000):
  P4, 1.4 GHz, 256 MB Rambus, using SSE1(2) instructions incl. cache pre-fetch
  Time per lattice point:
  0.926 microseconds (1503 Mflops, 32-bit arithmetic)
  1.709 microseconds (814 Mflops, 64-bit arithmetic)
Stream Benchmark, Memory Bandwidth:
  P4 (1.4 GHz, PC800 Rambus): 1.4 … 2.0 GB/s
  PIII (800 MHz, PC133 SDRAM): 400 MB/s
  PIII (400 MHz, PC133 SDRAM): 340 MB/s
Myrinet, external Bandwidth:
  2.0+2.0 Gb/s optical connection, bidirectional, ~240 MB/s sustained
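The two timings and the two Mflops figures quoted for the Dirac kernel can be cross-checked against each other: multiplying the time per lattice point by the sustained rate gives the number of floating-point operations the kernel is credited with per lattice point. The following is a small sketch of that arithmetic, using only the numbers on the slide; the helper name `flops_per_site` is made up for illustration and is not part of the benchmark code.

```python
# Figures quoted on the slide (P4, 1.4 GHz, SSE kernel).
measurements = {
    "32-bit": {"time_us": 0.926, "mflops": 1503.0},
    "64-bit": {"time_us": 1.709, "mflops": 814.0},
}


def flops_per_site(time_us, mflops):
    """Operations per lattice point implied by a timing and a rate.

    microseconds * (10^6 flop/s) = flop, so the factors of 10^6 cancel.
    """
    return time_us * mflops


implied = {name: flops_per_site(**m) for name, m in measurements.items()}

for name, value in implied.items():
    print(f"{name}: about {value:.0f} operations per lattice point")
```

Both precisions come out at roughly the same operation count, which suggests the Mflops figures were derived from the measured times with one fixed count per lattice point.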

3 Benchmark Architectures - DESY Cluster Hardware
Nodes:
  Mainboard Supermicro P4DC6
  2 x XEON P4, 1.7 (2.0) GHz, 256 (512) kByte Cache
  1 Gbyte (4x 256 Mbyte) RDRAM
  IBM 18.3 GB DDYS-T18350 U160 3.5” SCSI disk
  Myrinet 2000 M3F-PCI64B-2 Interface
Network:
  Fast Ethernet Switch Gigaline 2024M, 48x100BaseTX ports + GIGAline2024 1000BaseSX-SC
  Myrinet Fast Interconnect M3-E32 5 slot chassis, 2xM3-SW16 Line cards
Installation:
  Zeuthen: 16 dual CPU nodes, Hamburg: 32 dual CPU nodes

4 Benchmark Architectures - DESY Cluster i860 chipset problem
[Block diagram of the Intel i860 chipset. Recoverable labels: Xeon Processor (x2); 400MHz System Bus, 3.2 GB/s; MCH; Dual Channel RDRAM, MRH (x2), up to 4 GB of RDRAM; AGP4X Graphics; P64H (x2), 800MB/s, 64 bit PCI, PCI Slots (66 MHz, 64bit); >1GB/s; Intel® Hub Architecture, 266 MB/s; ICH2; PCI Slots (33 MHz, 32bit), 133 MB/s; 4 USB ports; LAN Connection Interface; ATA 100 MB/s (dual IDE Channels); 6 channel audio; 10/100 Ethernet]
bus_read (send) = 227 MBytes/s, bus_write (recv) = 315 MBytes/s, of max. 528 MBytes/s
External Myrinet bandwidth: 160 Mbytes/s, 90 Mbytes/s bidirectional
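The "max. 528 MBytes/s" on this slide matches the theoretical peak of a 66 MHz, 64-bit PCI bus, assuming one full-width transfer per clock cycle. A minimal sketch of that calculation, and of the fraction of the peak reached by the measured rates, is given below; the function name `pci_peak_mb_per_s` is illustrative only.

```python
def pci_peak_mb_per_s(clock_mhz, width_bits):
    """Theoretical PCI peak, assuming one transfer of width_bits per cycle.

    MHz * bytes per transfer gives MB/s directly.
    """
    return clock_mhz * width_bits / 8


peak = pci_peak_mb_per_s(66, 64)  # 528.0

# Measured rates quoted on the slide, in MBytes/s.
measured = {"bus_read (send)": 227.0, "bus_write (recv)": 315.0}
fractions = {name: rate / peak for name, rate in measured.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of the {peak:.0f} MB/s peak")
```

On these numbers the i860 boards reach well under two thirds of the nominal PCI rate in either direction, which is the "chipset problem" the slide title refers to.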

5 Benchmark Architectures - Intel E7500 chipset

6 Benchmark Architectures - E7500 system
Par-Tec (Wuppertal), 4 nodes:
  Intel(R) Xeon(TM) CPU 2.60GHz
  2 GB ECC PC1600 (DDR-200) SDRAM
  Super Micro P4DPE-G2
  Intel E7500 chipset
  PCI 64/66
  2 x Intel(R) PRO/1000 Network Connection
  Myrinet M3F-PCI64B-2

7 Benchmark Architectures
Leibniz-Rechenzentrum Munich (single CPU tests):
  Pentium IV, 3.06 GHz, with ECC Rambus
  Pentium IV, 2.53 GHz, with Rambus 1066 memory
  Xeon, 2.4 GHz, with PC2100 DDR SDRAM memory (probably FSB400)
Megware:
  8 nodes dual XEON, 2.4 GHz, E7500
  2 GB DDR ECC memory
  Myrinet2000
  Supermicro P4DMS-6GM
University of Erlangen:
  Itanium2, 900 MHz, 1.5 MB Cache, 10 GB RAM
  zx1 chipset (HP)

8 Benchmark Architectures - Infiniband
Megware: 10 Mellanox ServerBlades
  Single Xeon 2.2 GHz
  2 GB DDR RAM
  ServerWorks GC-LE chipset
  InfiniBand 4X HCA
  RedHat 7.3, Kernel 2.4.18-3
  MPICH-1.2.2.2 and OSU patch for VIA/InfiniBand 0.6.5
  Mellanox Firmware 1.14
  Mellanox SDK (VAPI) 0.0.4
  Compiler GCC 2.96

9 Dirac Operator Benchmark (SSE), 16 x 16^3, single P4/XEON CPU
[Chart: MFLOPS for the Dirac operator and for linear algebra]

10 Parallel (1-dim) Dirac Operator Benchmark (SSE), even-odd preconditioned, 2 x 16^3, XEON CPUs, single CPU performance
[Chart: MFLOPS]
Myrinet2000 bandwidth: i860: 90 MB/s, E7500: 190 MB/s

11 Parallel (1-dim) Dirac Operator Benchmark (SSE), even-odd preconditioned, 2 x 16^3, XEON CPUs, single CPU performance, 2, 4 nodes
Performance comparisons (MFLOPS):
  Single node: SSE2 446, non-SSE 328 (74%)
  Dual node: SSE2 330, non-SSE 283 (85%)
Parastation3 software non-blocking I/O support (MFLOPS, non-SSE):
  blocking 308, non-blocking I/O 367 (119%)

12 Maximal Efficiency of external I/O
Columns: MFLOPS (without communication) | MFLOPS (with communication) | Maximal Bandwidth (MB/s) | Efficiency
Myrinet (i860), SSE: 579 | 307 | 90 + 90 | 0.53
Myrinet/GM (E7500), SSE: 631 | 432 | 190 + 190 | 0.68
Myrinet/Parastation (E7500), SSE: 675 | 446 | 181 + 181 | 0.66
Myrinet/Parastation (E7500), non-blocking, non-SSE: 406 | 368 | hidden | 0.91
Gigabit Ethernet, non-SSE: 390 | 228 | 100 + 100 | 0.58
Infiniband, non-SSE: 370 | 297 | 210 + 210 | 0.80
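The efficiency column on this slide appears to be the ratio of the MFLOPS measured with communication to the MFLOPS measured without it. The sketch below recomputes that ratio from the two MFLOPS columns and compares it with the quoted value; the definition is inferred from the numbers rather than stated on the slide, and the names `rows` and `efficiency` are illustrative.

```python
# (configuration, MFLOPS without comm., MFLOPS with comm., quoted efficiency)
rows = [
    ("Myrinet (i860), SSE", 579, 307, 0.53),
    ("Myrinet/GM (E7500), SSE", 631, 432, 0.68),
    ("Myrinet/Parastation (E7500), SSE", 675, 446, 0.66),
    ("Myrinet/Parastation (E7500), non-blocking, non-SSE", 406, 368, 0.91),
    ("Gigabit Ethernet, non-SSE", 390, 228, 0.58),
    ("Infiniband, non-SSE", 370, 297, 0.80),
]


def efficiency(without_comm, with_comm):
    """Fraction of the communication-free performance that is retained."""
    return with_comm / without_comm


computed = [round(efficiency(a, b), 2) for _, a, b, _ in rows]
quoted = [q for _, _, _, q in rows]

for (name, _, _, q), c in zip(rows, computed):
    print(f"{name}: computed {c:.2f}, quoted {q:.2f}")
```

All six recomputed values agree with the quoted column to two decimal places.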

13 Parallel (1-dim) Dirac Operator Benchmark (SSE), even-odd preconditioned, 2 x 16^3, XEON/Itanium2 CPUs, single CPU performance, 4 nodes
4 single CPU nodes, Gbit Ethernet, non-blocking switch, full duplex
P4 (2.4 GHz, 0.5 MB cache):
  SSE: 285 MFLOPS, 88.92 + 88.92 MB/s
  non-SSE: 228 MFLOPS, 75.87 + 75.87 MB/s
Itanium2 (900 MHz, 1.5 MB cache):
  non-SSE: 197 MFLOPS, 63.13 + 63.13 MB/s
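One way to read the three Gbit Ethernet results together is to divide the sustained bandwidth per direction by the sustained MFLOPS, which gives the bytes sent per floating-point operation. This ratio is not shown on the slide; the sketch below is only an illustration of that arithmetic on the quoted numbers, and the name `bytes_per_flop` is made up.

```python
# (configuration, MFLOPS, MB/s per direction), as quoted on the slide.
results = [
    ("P4 2.4 GHz, SSE", 285.0, 88.92),
    ("P4 2.4 GHz, non-SSE", 228.0, 75.87),
    ("Itanium2 900 MHz, non-SSE", 197.0, 63.13),
]


def bytes_per_flop(mflops, mb_per_s):
    """MB/s divided by Mflop/s: the factors of 10^6 cancel."""
    return mb_per_s / mflops


ratios = {name: bytes_per_flop(f, bw) for name, f, bw in results}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} bytes sent per floating-point operation")
```

The ratio stays within a narrow band for all three configurations, which is what one would expect if the communication volume per lattice sweep is fixed by the lattice decomposition and only the sweep rate changes.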

14 Infiniband interconnect
Link: High Speed Serial 1x, 4x, and 12x
Host Channel Adapter (HCA): Protocol Engine, moves data via messages queued in memory
Switch: Simple, low cost, multistage network
Target Channel Adapter (TCA): Interface to I/O controller (SCSI, FC-AL, GbE, ...)
[Diagram: two hosts, each with CPU, Host Bus, Mem Cntlr, Sys Mem and HCA, connected by Links through a Switch; TCAs attach I/O Cntlrs to the Switch]
http://www.infinibandta.org
up to 10GB/s Bi-directional
Chips: IBM, Mellanox
PCI-X cards: Fujitsu, Mellanox, JNI, IBM

15 Infiniband interconnect

16 Parallel (2-dim) Dirac Operator Benchmark (Ginsparg-Wilson-Fermions), XEON CPUs, single CPU performance, 4 nodes
Infiniband vs Myrinet performance, non-SSE (MFLOPS):
Columns: XEON 1.7 GHz, Myrinet, i860 chipset (32-Bit | 64-Bit) | XEON 2.2 GHz, Infiniband, E7500 chipset (32-Bit | 64-Bit)
8 x 8^3 lattice, 2x2 processor grid: 370 | 281 | 697 | 477
16 x 16^3 lattice, 2x4 processor grid: 338 | 299 | 609 | 480

17 Future - Low Power Cluster Architectures ?

18 Future Cluster Architectures - Blade Servers ?
NEXCOM – Low voltage blade server:
  200 low voltage Intel XEON CPUs (1.6 GHz – 30W) in a 42U Rack
  Integrated Gbit Ethernet network
Mellanox – Infiniband blade server:
  Single XEON Blades connected via a 10 Gbit (4X) Infiniband network
  MEGWARE, NCSA, Ohio State University

19 Conclusions
PC CPUs have an extremely high sustained LQCD performance using SSE/SSE2 (SIMD + pre-fetch), assuming a sufficiently large local lattice.
Bottlenecks are the memory throughput and the external I/O bandwidth; both components are improving (Chipsets: i860 → E7500 → E7505 → …, FSB: 400 MHz → 533 MHz → 667 MHz → …, external I/O: Gbit-Ethernet → Myrinet2000 → QSnet → Infiniband → …).
Non-blocking MPI communication can improve the performance when an adequate MPI implementation is used (e.g. ParaStation).
32-bit architectures (e.g. IA32) have a much better price/performance ratio than 64-bit architectures (Itanium, Opteron ?).
Large, dense, low voltage blade clusters could play an important role in LQCD computing (low voltage XEON, CENTRINO ?, …).
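The benefit of non-blocking communication can be illustrated with a very simple timing model, which is an assumption of this sketch and not something taken from the talk: with blocking I/O the time per unit of work is compute time plus communication time, and with non-blocking I/O part of the communication time is hidden behind computation. Taking the non-SSE ParaStation figures from slides 11 and 12 (406 MFLOPS without communication, 308 blocking, 368 non-blocking) and assuming they refer to the same configuration, the hidden share of the communication time can be estimated as follows. The function name `hidden_fraction` is hypothetical.

```python
def hidden_fraction(no_comm, blocking, nonblocking):
    """Share of communication time hidden by non-blocking I/O.

    Assumed model: time per Mflop is 1 / MFLOPS, and the blocking run
    pays the full communication time on top of the compute time.
    """
    t_compute = 1.0 / no_comm
    t_blocking = 1.0 / blocking
    t_nonblocking = 1.0 / nonblocking
    t_comm = t_blocking - t_compute  # full communication cost
    return (t_blocking - t_nonblocking) / t_comm


# MFLOPS figures quoted on slides 11 and 12 (non-SSE, ParaStation, E7500).
share = hidden_fraction(no_comm=406.0, blocking=308.0, nonblocking=368.0)
print(f"about {share:.0%} of the communication time is hidden")
```

Under this model roughly two thirds of the communication cost is overlapped with computation; a fully overlapped run would reach the communication-free rate.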

20 Acknowledgements
We would like to thank Martin Lüscher (CERN) for the benchmark codes and the fruitful discussions about PCs for LQCD, and Isabel Campos Plasencia (Leibniz-Rechenzentrum Munich), Gerhard Wellein (Uni Erlangen), Holger Müller (Megware), Norbert Eicker (Par-Tec), and Chris Eddington (Mellanox) for the opportunity to run the benchmarks on their clusters.

