Performance measurement with ZeroMQ and FairMQ

1 Performance measurement with ZeroMQ and FairMQ
Mohammad Al-Turany 20/02/15 CWG13 Meeting

2 ZeroMQ performance test suite
ZeroMQ ships tools for measuring the bandwidth and latency of the network. The following executables are built by default and located in the perf subdirectory: local_lat, remote_lat, local_thr, remote_thr.

3 ØMQ performance test suite
The latency test consists of local_lat and remote_lat, which are placed on the two boxes between which you wish to measure latency. We have not performed this test so far.
$ local_lat tcp://eth0:
$ remote_lat tcp:// :
message size: 1 [B]
roundtrip count:
average latency: [us]
The latency reported is the one-way latency.
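To make the "one-way latency" remark concrete, here is a minimal sketch of the arithmetic. It assumes the elapsed time for all roundtrips is divided by twice the roundtrip count, which is how the ZeroMQ perf tools compute it to the best of my knowledge. The function name and the sample numbers are hypothetical and are not measurements from this talk.

```python
def one_way_latency_us(elapsed_us: float, roundtrip_count: int) -> float:
    """Average one-way latency in microseconds.

    Each roundtrip crosses the network twice, so the total elapsed time
    is divided by 2 * roundtrip_count.
    """
    if roundtrip_count <= 0:
        raise ValueError("roundtrip_count must be positive")
    return elapsed_us / (roundtrip_count * 2)


# Hypothetical input values, for illustration only.
latency = one_way_latency_us(elapsed_us=600_000.0, roundtrip_count=10_000)
print(f"average latency: {latency:.3f} [us]")
```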

4 ØMQ performance test suite
The throughput test consists of local_thr and remote_thr, which are placed on the two boxes between which you wish to measure throughput.
$ local_thr tcp://eth0:
$ remote_thr tcp:// :
message size: 1 [B]
message count:
mean throughput: [msg/s]
mean throughput: [Mb/s]
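The two throughput figures are related by the message size. The sketch below shows the conversion, assuming Mb/s means 10^6 bits per second and the elapsed time is measured in microseconds. The function name and the input values are hypothetical, not results from the test cluster.

```python
def throughput(message_size_bytes: int, message_count: int, elapsed_us: float):
    """Return (messages per second, megabits per second)."""
    if elapsed_us <= 0:
        raise ValueError("elapsed_us must be positive")
    msgs_per_s = message_count * 1_000_000 / elapsed_us
    # 8 bits per byte, 10^6 bits per megabit
    megabits_per_s = msgs_per_s * message_size_bytes * 8 / 1_000_000
    return msgs_per_s, megabits_per_s


# Hypothetical input values, for illustration only.
msgs, mbits = throughput(message_size_bytes=1000,
                         message_count=1_000_000,
                         elapsed_us=2_000_000.0)
print(f"mean throughput: {msgs:.0f} [msg/s]")
print(f"mean throughput: {mbits:.3f} [Mb/s]")
```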

5 Running the Zero MQ performance test on the DAQ test cluster

6 Running the Zero MQ performance test on the DAQ test cluster

7 Running the Zero MQ performance test on the DAQ test cluster

8 Running the Zero MQ performance test on the DAQ test cluster

9 Performance test with FairMQ FLP 2 EPN
Nodes: aidrefma02, aidrefma01
Push-Pull pattern
Message size = 10 MByte
Throughput = 2.6 GByte/s

10 Performance test with FairMQ FLP 2 EPN
Nodes: aidrefma02, aidrefma01
Push-Pull pattern
Message size = 10 MByte
Throughput = 3.7 GByte/s

11 Performance test with FairMQ FLP 2 EPN
Nodes: aidrefma03, aidrefma01
Push-Pull pattern
Message size = 10 MByte
Throughput = 4.8 GByte/s

12 Is a node that uses 3 (4) cores to receive data via Ethernet or IP over IB at a rate of more than 4 GByte/s still usable for reconstruction?

13 STREAM: Sustainable Memory Bandwidth in High Performance Computers
A simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels. It is specifically designed to work with datasets much larger than the available cache on any given system, so that the results are (presumably) more indicative of the performance of very large, vector-style applications.

14 STREAM settings
This system uses 8 bytes per array element.
Array size = (elements), Offset = 0 (elements)
Memory per array = MiB (= 1.5 GiB).
Total memory required = MiB (= 4.5 GiB).
Each kernel will be executed 10 times. The *best* time for each kernel (excluding the first iteration) will be used to compute the reported bandwidth.
Number of Threads requested = 12
Number of Threads counted = 12

15 STREAM is intended to measure the bandwidth from main memory

16 Performance and bandwidth test with FairMQ FLP 2 EPN
aidrefma02
CERN DAQ Lab system: 40 G Ethernet, dual-socket Intel Sandy Bridge-EP, dual 2.90 GHz, 2x8 hw cores (32 threads), 64 GB RAM

Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:
Scale:
Add:
Triad:

name     kernel                  bytes/iter   FLOPS/iter
COPY:    a(i) = b(i)             16           0
SCALE:   a(i) = q*b(i)           16           1
SUM:     a(i) = b(i) + c(i)      24           1
TRIAD:   a(i) = b(i) + q*c(i)    24           2
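The bytes/iter column is what turns a kernel's best time into the reported "Best Rate MB/s". A minimal sketch of that conversion follows, assuming 8-byte array elements and MB meaning 10^6 bytes. The function name, array length and timing are hypothetical and are not the values measured on aidrefma02.

```python
# Bytes moved per loop iteration for 8-byte elements, as in the kernel table.
BYTES_PER_ITER = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mb_s(kernel: str, n_elements: int, best_time_s: float) -> float:
    """Bandwidth in MB/s (10^6 bytes per second) from the best kernel time."""
    if best_time_s <= 0:
        raise ValueError("best_time_s must be positive")
    total_bytes = BYTES_PER_ITER[kernel] * n_elements
    return total_bytes / best_time_s / 1e6


# Hypothetical array length and timing, for illustration only.
for name in ("Copy", "Scale", "Add", "Triad"):
    rate = stream_rate_mb_s(name, n_elements=10_000_000, best_time_s=0.02)
    print(f"{name + ':':8s}{rate:12.1f}")
```

Add and Triad touch three arrays per iteration instead of two, so for the same time they report 1.5 times the bandwidth of Copy and Scale.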

17 Performance and bandwidth test with FairMQ FLP 2 EPN
Nodes: aidrefma01, aidrefma02. Devices: FLP, EPN.
CERN DAQ Lab system: 40 G Ethernet, dual-socket Intel Sandy Bridge-EP, dual 2.90 GHz, 2x8 hw cores (32 threads), 64 GB RAM
8 MB messages, 4.7 GByte/s
Function / Best Rate MB/s: Copy, Scale, Add, Triad
Change in best rate: -16 %, -18 %, -15 %

18 Performance and bandwidth test with FairMQ FLP 2 EPN
CPU time in seconds needed to simulate 1000 events, 10 protons, in FairRoot example 3
Nodes: aidrefma01, aidrefma02. Devices: FLP, EPN.
CERN DAQ Lab system: 40 G Ethernet, dual-socket Intel Sandy Bridge-EP, dual 2.90 GHz, 2x8 hw cores (32 threads), 64 GB RAM
4 MB messages: 4.5 GByte/s; 8 MB messages: 4.7 GByte/s
Run 12 processes
Columns: Without MQ | With 4 MB Messages | With 8 MB Messages
Values (s): 54 61 68 58 64 66 62 56 57 55 63 67 60 65
Averages (s): 57.3 | 62.1 | 61.2
Difference: 5% | 4%
Diagram labels: Geant

19 Performance and bandwidth test with FairMQ FLP 2 EPN
CPU time in seconds needed to simulate 1000 events, 100 protons, in FairRoot example 3
Nodes: aidrefma01, aidrefma02. Devices: FLP, EPN.
CERN DAQ Lab system: 40 G Ethernet, dual-socket Intel Sandy Bridge-EP, dual 2.90 GHz, 2x8 hw cores (32 threads), 64 GB RAM
8 MB messages: 4.7 GByte/s, 2.8 TByte total data transfer
Run 12 processes
Columns: Without MQ | With 8 MB Messages
Values (s): 565 605 573 615 570 598 603 602 563 601 619 576 616 574 606 567 609 577 595
Averages (s): 570.2 | 605.6
Difference: 6%
Diagram labels: Geant
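The 6% and 2.8 TByte figures on this slide can be cross-checked with simple arithmetic. The sketch below rests on assumptions of mine: that 570.2 s is the average without MQ and 605.6 s the average with 8 MB messages, that the transfer ran at 4.7 GByte/s for the full 605.6 s, and that 1 TByte = 1000 GByte. The function names are hypothetical.

```python
def relative_overhead_percent(baseline_s: float, with_transfer_s: float) -> float:
    """CPU-time increase relative to the run without data transfer."""
    return 100.0 * (with_transfer_s - baseline_s) / baseline_s


def total_transfer_tbyte(rate_gbyte_s: float, duration_s: float) -> float:
    """Data volume moved at a constant rate, in TByte (1 TByte = 1000 GByte)."""
    return rate_gbyte_s * duration_s / 1000.0


overhead = relative_overhead_percent(570.2, 605.6)
volume = total_transfer_tbyte(4.7, 605.6)
print(f"overhead: {overhead:.1f} %")        # prints "overhead: 6.2 %"
print(f"transferred: {volume:.1f} TByte")   # prints "transferred: 2.8 TByte"
```

Under these assumptions both numbers agree with the slide after rounding.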

20 Backup and Discussion

21 Run on STREAM version $Revision: 5.10 $
This system uses 8 bytes per array element.
Array size = (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = MiB (= 0.2 GiB).
Each kernel will be executed 10 times. The *best* time for each kernel (excluding the first iteration) will be used to compute the reported bandwidth.
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of microseconds. (= clock ticks)
Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer.
Function   Best Rate MB/s   Avg time   Min time   Max time
Copy:
Scale:
Add:
Triad:
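The memory figures in this output follow directly from the array length. The sketch below assumes 10,000,000 elements per array (the array size is not preserved in this transcript; this value is my assumption, chosen because it reproduces the 76.3 MiB shown) and three arrays a, b and c. The function name is hypothetical.

```python
BYTES_PER_ELEMENT = 8      # "This system uses 8 bytes per array element"
MIB = 1024 ** 2
GIB = 1024 ** 3


def stream_memory(n_elements: int, n_arrays: int = 3):
    """Return (MiB per array, total MiB, total GiB) for the STREAM arrays."""
    per_array_bytes = n_elements * BYTES_PER_ELEMENT
    total_bytes = per_array_bytes * n_arrays
    return per_array_bytes / MIB, total_bytes / MIB, total_bytes / GIB


# Assumed array length; not taken from the slide.
per_array_mib, total_mib, total_gib = stream_memory(10_000_000)
print(f"Memory per array = {per_array_mib:.1f} MiB")   # prints 76.3 MiB
print(f"Total memory required = {total_gib:.1f} GiB")  # prints 0.2 GiB
```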

