1 SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Advanced User Support for MPCUGLES code at University of Minnesota
October 09, 2008
Mahidhar Tatineni (SDSC), Lonnie Crosby (NICS), John Cazes (TACC)

2 Overview of MPCUGLES Code
MPCUGLES is an unstructured-grid large eddy simulation code (written in Fortran 90/MPI), developed by Prof. Krishnan Mahesh's group at the University of Minnesota, that can be used for very complex geometries.
The incompressible flow algorithm employs a staggered approach, with face-normal velocities stored at the centroids of faces and velocity and pressure stored at cell centroids. The non-linear terms are discretized such that discrete energy conservation is imposed.
The code also uses the HYPRE library (developed at LLNL), a set of high-performance preconditioners, to help solve the sparse linear systems of equations that arise in the main algorithm (see the usage sketch below).
MPCUGLES has been run at scale using up to 2048 cores and 50 million control volumes on Blue Gene (SDSC), DataStar (SDSC), Ranger (TACC), and Kraken (NICS).
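For context, the following is a minimal sketch of how HYPRE's BoomerAMG solver is typically set up and applied to a sparse linear system. It is written in C against HYPRE's documented ParCSR interface and is not taken from MPCUGLES itself (which is Fortran 90 and links HYPRE 1.8.2b, so the exact calls and defaults may differ); the assembled matrix and vectors (parcsr_A, par_b, par_x) are assumed to come from elsewhere.

/* Sketch: solve A x = b with HYPRE's BoomerAMG (ParCSR objects).
 * Assumes parcsr_A, par_b, par_x were already assembled, e.g. via the
 * HYPRE IJ interface. Error checking is omitted for brevity. */
#include "HYPRE.h"
#include "HYPRE_parcsr_ls.h"

void solve_pressure_amg(HYPRE_ParCSRMatrix parcsr_A,
                        HYPRE_ParVector    par_b,
                        HYPRE_ParVector    par_x)
{
    HYPRE_Solver amg;

    HYPRE_BoomerAMGCreate(&amg);
    HYPRE_BoomerAMGSetTol(amg, 1.0e-8);    /* relative residual tolerance (illustrative) */
    HYPRE_BoomerAMGSetMaxIter(amg, 100);   /* cap on AMG V-cycles (illustrative) */
    HYPRE_BoomerAMGSetPrintLevel(amg, 1);  /* brief convergence output */

    HYPRE_BoomerAMGSetup(amg, parcsr_A, par_b, par_x);  /* build the multigrid hierarchy */
    HYPRE_BoomerAMGSolve(amg, parcsr_A, par_b, par_x);  /* iterate to tolerance */

    HYPRE_BoomerAMGDestroy(amg);
}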

3 General Requirements
Grid generation, initial condition generation, and partitioning for the runs are done using the METIS software. For the larger grids the experimental metis-5.0pre1 version is required (a previous ASTA project uncovered a problem with the metis-4.0 version for large-scale cases).
The I/O in the code is done using NetCDF: each processor writes its own files in NetCDF format, so there is no MPI-IO or parallel-netCDF requirement (see the per-rank output sketch below).
The HYPRE library (from LLNL) provides high-performance preconditioners, featuring parallel multigrid methods for both structured and unstructured grid problems. The code is compiled against HYPRE version 1.8.2b and uses the algebraic multigrid solver (HYPRE_BoomerAMG) from the library; MPCUGLES also has the option of using a conjugate-gradient method as an alternative.
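The per-rank output pattern described above can be illustrated with a minimal sketch. It is written in C with hypothetical file and variable names (flowfield.*.nc, pressure, n_local); the actual MPCUGLES I/O is written in Fortran 90 and its file layout is not described in the slides.

/* Sketch: each MPI rank writes its own NetCDF file using the serial
 * netCDF library -- no MPI-IO or parallel netCDF needed. File and
 * variable names are illustrative only; error checking omitted. */
#include <stdio.h>
#include <mpi.h>
#include <netcdf.h>

void write_rank_file(int n_local, const double *pressure)
{
    int rank, ncid, dimid, varid;
    char fname[64];

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    snprintf(fname, sizeof(fname), "flowfield.%05d.nc", rank);  /* one file per rank */

    nc_create(fname, NC_CLOBBER, &ncid);
    nc_def_dim(ncid, "cv", (size_t)n_local, &dimid);            /* local control volumes */
    nc_def_var(ncid, "pressure", NC_DOUBLE, 1, &dimid, &varid);
    nc_enddef(ncid);
    nc_put_var_double(ncid, varid, pressure);                   /* write this rank's data */
    nc_close(ncid);
}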

4 Porting to Ranger and Kraken
The code was recently ported to both of the available Track 2 systems (Ranger and Kraken). Compiling the code on both machines was relatively straightforward: both Ranger and Kraken already had the netCDF libraries installed, and the needed versions of the HYPRE library (v1.8.2b) and METIS (v5.0pre1) were easy to install on both machines.
The grid and initial condition generation codes are currently serial. For the current scaling studies they were run on Ranger (1 process/node, 32 GB) or DataStar (1 process/p690 node, 128 GB). This is a potential bottleneck for larger runs (>50 million CVs), and part of the current AUS project will focus on parallelizing this step so that much larger grid sizes can be considered.

5 Performance on Ranger

Strong Scaling (257^3 grid):
  Cores    4-way           8-way
  16       2298s (2-way)   -
  32       1004s           -
  64       577s            633s
  128      353s            494s
  256      304s            503s
  512      -               678s

Weak Scaling (64k CVs/task):
  Cores    Total CVs    4-way    8-way
  16       2097152      287s     308s
  32       4194304      417s     453s
  64       8388608      396s     433s
  128      16777216     353s     494s
  256      33554432     560s     -

6 Performance on Kraken

Strong Scaling (257^3 grid):
  Cores    1-way    2-way
  16       -        -
  32       -        -
  64       514s     -
  128      285s     365s
  256      187s     280s
  512      157s     268s

Weak Scaling (64k CVs/task):
  Cores    Total CVs    1-way    2-way
  16       2097152      275s     301s
  32       4194304      365s     405s
  64       8388608      337s     379s
  128      16777216     285s     365s
  256      33554432     428s     -

7 Comments on Performance
Strong scaling for the 16 million control volume case is acceptable up to 256 cores on Ranger and 512 cores on Kraken. The primary factor is the network bandwidth available per core (higher on Kraken).
Overall, the code scales reasonably well when there are roughly 32-64K CVs per task, which is consistent with previous results on DataStar (see the worked per-core counts below).
The code should exhibit good weak scaling based on the communication pattern seen in older runs (mostly nearest-neighbor). The results are acceptable up to 256 cores but show a jump in run times beyond that. One likely problem is that the underlying solver takes longer to converge as the number of CVs increases (this is not an isotropic problem; it is a wall-bounded channel flow).
Weak scaling runs at 64K CVs/task and above 512 cores are currently restricted by grid-size limitations; this needs to be addressed.
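As a quick check of these thresholds, assuming the 257^3 grid corresponds to roughly 256^3 ≈ 16.8 million control volumes (consistent with the "16 million control volume" figure above), the per-core work at the break-even points is:

\[
256^3 = 16{,}777{,}216 \ \text{CVs}, \qquad
\frac{16{,}777{,}216}{256\ \text{cores}} = 65{,}536 \approx 64\text{K CVs/core}, \qquad
\frac{16{,}777{,}216}{512\ \text{cores}} = 32{,}768 \approx 32\text{K CVs/core}
\]

so the 256-core (Ranger) and 512-core (Kraken) points sit right at the ~32-64K CVs/task range where the code is expected to scale well.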

8 Future Work
Near term:
- Redo the weak scaling runs with an isotropic case to see whether that helps avoid the extra work in the underlying solver.
- Run at larger processor counts on both Ranger and Kraken with profiling/performance tools to analyze the performance.
Long term:
- Parallelize the initial condition and grid generation codes to enable scaling to much larger processor counts.
- Investigate the performance implications of changing the underlying linear solver and see whether any improvements can be made. For example, the CG algorithm scales much better (tests on Kraken already show this) but takes longer to converge, so there is a tradeoff; a sketch of this alternative is given below.
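The solver tradeoff mentioned above can be sketched as follows. This is a minimal illustration in C using HYPRE's ParCSR PCG solver with a lightweight diagonal-scaling preconditioner; the slides do not say how the CG option is actually implemented in MPCUGLES (it may be a built-in CG rather than a HYPRE call), so all names and settings here are assumptions.

/* Sketch: conjugate-gradient alternative -- cheap, communication-light
 * preconditioning (diagonal scaling) in exchange for more iterations
 * than BoomerAMG. Illustrative only; error checking omitted. */
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_krylov.h"
#include "HYPRE_parcsr_ls.h"

void solve_pressure_cg(HYPRE_ParCSRMatrix parcsr_A,
                       HYPRE_ParVector    par_b,
                       HYPRE_ParVector    par_x)
{
    HYPRE_Solver pcg;

    HYPRE_ParCSRPCGCreate(MPI_COMM_WORLD, &pcg);
    HYPRE_PCGSetTol(pcg, 1.0e-8);       /* relative residual tolerance (illustrative) */
    HYPRE_PCGSetMaxIter(pcg, 2000);     /* CG typically needs many more iterations than AMG */
    HYPRE_PCGSetTwoNorm(pcg, 1);        /* use the 2-norm stopping test */
    HYPRE_PCGSetPrintLevel(pcg, 1);

    /* Diagonal scaling is very cheap per iteration, which is why a
     * CG-based solve can scale better even if it converges more slowly
     * than the multigrid solver. */
    HYPRE_PCGSetPrecond(pcg,
                        (HYPRE_PtrToSolverFcn) HYPRE_ParCSRDiagScale,
                        (HYPRE_PtrToSolverFcn) HYPRE_ParCSRDiagScaleSetup,
                        NULL);

    HYPRE_ParCSRPCGSetup(pcg, parcsr_A, par_b, par_x);
    HYPRE_ParCSRPCGSolve(pcg, parcsr_A, par_b, par_x);

    HYPRE_ParCSRPCGDestroy(pcg);
}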

