Parallel Scaling of parsparsecircuit3.c Tim Warburton.


1 Parallel Scaling of parsparsecircuit3.c Tim Warburton

2 1 process per node: In these tests we use only one of the two processors on each node.

3 blackbear: 16 processors, 16 nodes

4 Apart from the mpi_allreduce calls, this is an almost perfect picture of parallelism

5 2 Processes Per Node: We use both processors on each node.

6 blackbear 8 nodes, 16 processes: Notice the prevalence of waitany. Clearly this code is not working as well as it does when running with 1 process per node.

7 blackbear 8 nodes, 16 processes (zoom in): I suspect that the threaded MPI communicators for the nonblocking isend and irecv calls are competing for CPU time with the user code. There could also be competition between the processors for the memory bus and the network bus.

8 Timings for M=1024 (N=1024^2) (blackbear, -O3)

nodes  Nprocs  wallclock time
1      2       19.4909
2      4       9.85369
4      8       5.01486
8      16      3.19801
16     32      3.77791

1      1       19.2675
2      2       10.2486
4      4       5.43999
8      8       2.79451
16     16      1.43782

9 Timings for Two Processes Per Node on Los Lobos

nodes  Nprocs  wallclock time
1      2       8.9453
2      4       4.47474
4      8       2.17246
8      16      1.15644

Timings courtesy of Zhaoxian Zhou

