NATIONAL ENERGY RESEARCH SCIENTIFIC COMPUTING CENTER
Porting from the Cray T3E to the IBM SP
Jonathan Carter, NERSC User Services


2 Overview
Focus is on Fortran programs using MPI for communication
Outline common pitfalls:
– f90 vs. xlf Fortran compiler
– Cray vs. IBM MPI library
– Math libraries
– System libraries
– I/O

3 f90 vs. xlf - Main Differences
f90
– compiles for parallel (MPI) automatically
– accepts file suffixes .f90 and .F90
– default optimization is -O2
– allows access to full memory on a PE by default
xlf
– compiler is accessed by several names; each name “packages” options together
– by default, only file suffixes .f and .F are allowed
– default is no optimization
– restricted amount of memory available by default

4 xlf Compiler Options
Compiler name can have three parts:
– optional prefix “mp” indicates that the MPI library is automatically linked
– compiler name xlf, xlf90, or xlf95 indicates the language mode
– optional suffix “_r” indicates thread and OpenMP capability
Examples:
– mpxlf90 - Fortran 90 language compiler with MPI library available
– mpxlf_r - Fortran 77 language compiler with MPI library, threads, and OpenMP capability available
If you want to use MPI I/O, the thread-capable compiler must be used.

5 xlf Compiler Options
To use different file suffixes, e.g. .f90 and .F90:
– -qsuffix=f=f90,F=F90
For optimization we recommend:
– -O3 -qtune=pwr3 -qarch=pwr3 -qstrict
xlf defaults to 32 Kbytes for stack space and 128 Mbytes for heap space. To increase to the maximums of 256 Mbytes for stack and 2 Gbytes for heap:
– -bmaxstack:0x10000000 -bmaxdata:0x80000000

6 Default Datatypes
Double Complex is a language extension
Assume -dp flag for f90
The xlf compiler has -qrealsize=8 to promote all default reals and real constants to 8 bytes, and -qintsize=8 to promote all integers and logicals.
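An alternative to relying on promotion flags is to request the precision explicitly in the source. The following is a minimal sketch, not taken from the slides; the module name kinds and the parameters wp and i8 are hypothetical names chosen for illustration.

```fortran
! Sketch: declare precision explicitly so that results do not depend on
! compiler promotion flags such as -dp (f90) or -qrealsize=8 (xlf).
! The names kinds, wp, and i8 are illustrative only.
module kinds
  implicit none
  ! At least 12 significant decimal digits (an 8-byte real on IEEE machines)
  integer, parameter :: wp = selected_real_kind(12)
  ! At least 18 decimal digits of range (an 8-byte integer)
  integer, parameter :: i8 = selected_int_kind(18)
end module kinds

program show_kinds
  use kinds
  implicit none
  real(wp)    :: x
  integer(i8) :: n

  x = 1.0_wp / 3.0_wp      ! constants carry the kind suffix as well
  n = huge(n)
  print *, 'decimal digits of precision: ', precision(x)
  print *, 'largest integer of kind i8:  ', n
end program show_kinds
```

With kinds declared this way, the same source should give the same precision under f90 and xlf without any promotion option, although arguments passed to MPI or BLAS routines still have to match the chosen kind.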

7 Available Datatypes
Fortran 77 “*” syntax is also available to explicitly define a datatype

8 MPI Differences
Different default datatypes between T3E and SP
More error checking of arguments on the SP
Default amount of buffering is different
Different subset of MPI I/O implemented

9 Available MPI Datatypes

10 Default MPI Datatypes

11 MPI - Argument Checking
The T3E MPI library has several collective routines which do not check arguments in accordance with the MPI standard. The SP does check arguments.
Examples:
– MPI_Bcast “count” argument is not checked for consistency on T3E
– MPI_Gatherv array of “counts” is not checked for consistency on T3E

12 MPI - Buffering
If your program depends on the buffering of standard MPI sends and receives, you may see different behavior between the T3E and the SP. Classic case:
      ...
      if (mype.eq.0) then
         call mpi_send(buf,count,type,1,tag,MPI_COMM_WORLD,ierr)
         call mpi_recv(buf,count,type,1,tag,MPI_COMM_WORLD,status,ierr)
      else if (mype.eq.1) then
         call mpi_send(buf,count,type,0,tag,MPI_COMM_WORLD,ierr)
         call mpi_recv(buf,count,type,0,tag,MPI_COMM_WORLD,status,ierr)
      end if
      ...
Both processes send before they receive, so the exchange completes only if the library buffers the outgoing message.
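One way to make such an exchange independent of buffering is MPI_SENDRECV, which lets the library pair the send and the receive. The following is a minimal sketch and not code from the slides; the program name, the buffer names, and the message length n are assumptions for illustration.

```fortran
! Sketch: exchange between ranks 0 and 1 that does not rely on buffering.
! The names exchange, sendbuf, recvbuf, and n are illustrative only.
program exchange
  implicit none
  include 'mpif.h'
  integer, parameter :: n = 100000
  double precision :: sendbuf(n), recvbuf(n)
  integer :: mype, npes, other, tag, ierr
  integer :: status(MPI_STATUS_SIZE)

  call mpi_init(ierr)
  call mpi_comm_rank(MPI_COMM_WORLD, mype, ierr)
  call mpi_comm_size(MPI_COMM_WORLD, npes, ierr)

  tag = 0
  if (npes >= 2 .and. mype < 2) then
     other = 1 - mype               ! rank 0 pairs with 1, and vice versa
     sendbuf = dble(mype)
     ! Separate send and receive buffers; the library orders the transfer
     call mpi_sendrecv(sendbuf, n, MPI_DOUBLE_PRECISION, other, tag, &
                       recvbuf, n, MPI_DOUBLE_PRECISION, other, tag, &
                       MPI_COMM_WORLD, status, ierr)
  end if

  call mpi_finalize(ierr)
end program exchange
```

Reversing the order of send and receive on one of the two ranks, or using nonblocking calls, would also remove the dependence on buffering.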

13 MPI - Buffering
On the T3E, messages of up to 4 Kbytes are buffered. This can be changed by setting the environment variable MPI_BUFFER_MAX.
On the SP, the default size depends on the number of processors:
   Processors      Default size (bytes)
   1 to 16         4096
   17 to 32        2048
   33 to 64        1024
   65 to 128       512
   129 to 256      256
   257 and over    128
This can be changed by setting the environment variable MP_EAGER_LIMIT.

14 Cray SciLib and IBM ESSL
Both vendors provide libraries of commonly used linear algebra subroutines
On the T3E the library is linked by default; on the SP use “-lessl”
These libraries are faster than the public domain BLAS, LAPACK, etc.

15 Using BLAS
BLAS levels 1 through 3 are completely compatible between the two machines
Note which precision of BLAS is being called:
– On the T3E
      real*8 a(n), b(n), x
      …
      x = sdot(n,a,1,b,1)
– On the SP
      real*8 a(n), b(n), x
      …
      x = ddot(n,a,1,b,1)

16 Using BLAS
Instead of changing program source, loader options can be used to map one routine to another
To resolve a call to sdot by a call to ddot on the SP:
      xlf -o a.out -brename:sdot,ddot b.f
To resolve a call to ddot by a call to sdot on the T3E:
      f90 -o a.out -Wl"-Dequiv(DDOT)=SDOT" b.f

17 LAPACK routines
Most other linear algebra routines in Cray SciLib and IBM ESSL are compatible with LAPACK.
In ESSL there are a few incompatibilities (x may be C, D, S, Z):
– xGEEV
– xSPEV
– xSPSV
– xHPEV
– xHPSV
– xGEGV
– xSYGV
Use the installed LAPACK library for these.

18 ScaLAPACK library
Cray SciLib and IBM PESSL support pieces of the standard ScaLAPACK library.
Check precision of routines:
– For real*8 on the T3E, routine names start with “PS”
– For real*8 on the SP, routine names start with “PD”
On the SP, you must call BLACS_GET followed by either BLACS_GRIDINIT or BLACS_GRIDMAP. On the T3E, only a call to one of the latter two routines is required.
Public domain ScaLAPACK is also installed on both machines.
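The call sequence required on the SP can be sketched as follows, using the standard BLACS routines. This is an illustration rather than code from the slides; the program name and the 1 x nprocs grid shape are assumptions.

```fortran
! Sketch: BLACS grid setup using BLACS_GET followed by BLACS_GRIDINIT.
! The program name and the 1 x nprocs grid shape are illustrative only.
program grid_setup
  implicit none
  integer :: ictxt, iam, nprocs, nprow, npcol, myrow, mycol

  call blacs_pinfo(iam, nprocs)          ! my process number, total count

  nprow = 1
  npcol = nprocs

  call blacs_get(0, 0, ictxt)            ! obtain the default system context
  call blacs_gridinit(ictxt, 'R', nprow, npcol)   ! row-major process grid
  call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

  print *, 'process', iam, 'is at grid position', myrow, mycol

  call blacs_gridexit(ictxt)             ! release the grid
  call blacs_exit(0)                     ! shut down BLACS
end program grid_setup
```

Calling BLACS_GET first is part of the standard BLACS usage, so a program written this way follows the sequence the SP requires.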

19 System Libraries
Generally, these are any routines which interact with the operating system and provide extensions to the Fortran language.
Cray provides very many such routines. Some are available on the SP, for example:

20 System Libraries
A more comprehensive list is available at: http://hpcf.nersc.gov/computers/SP/port.html
Some routines have changed names and slightly different arguments.
There are sometimes identically or similarly named routines on the SP which are designed to be called from C only. Calling them from Fortran will cause unexpected behavior. For example, calling exit instead of exit_ will cause the program to end without flushing any Fortran I/O buffer.

21 Fortran I/O
Unformatted I/O
– The primitive datatypes on the T3E and SP are compatible (provided they are of the same length), but control words inserted by the Fortran language I/O layer prevent transferability of sequential access files.
– Direct access files can be freely transferred between the two machines, as can MPI I/O files.
Namelist Input/Output
– Users familiar with assign -f77 on the T3E, which causes old-style namelist input to be written or read, can set the following environment variable on the SP to obtain the same effect:
      setenv XLFRTEOPTS "namelist=old"
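A direct access unformatted file of the kind described above can be written and read as in the following sketch. It is illustrative only; the file name data.bin, the unit number, and the array length are assumptions. The record length is obtained with INQUIRE(IOLENGTH=...) because the units of RECL can differ between compilers.

```fortran
! Sketch: write and read a direct access unformatted file.
! The file name data.bin, unit 10, and the array length are illustrative.
program direct_io
  implicit none
  integer, parameter :: dp = selected_real_kind(12)
  integer, parameter :: n = 4
  real(dp) :: a(n), b(n)
  integer  :: reclen, i

  ! Ask the compiler for the record length in its own RECL units
  inquire(iolength=reclen) a

  a = (/ (real(i, dp), i = 1, n) /)

  open(unit=10, file='data.bin', access='direct', form='unformatted', &
       recl=reclen, status='replace')
  write(10, rec=1) a              ! record 1
  write(10, rec=2) 2.0_dp * a     ! record 2
  close(10)

  open(unit=10, file='data.bin', access='direct', form='unformatted', &
       recl=reclen, status='old')
  read(10, rec=2) b               ! records can be read in any order
  close(10)

  print *, b
end program direct_io
```

Because each record holds only the data itself, with no control words added by the I/O layer, the file layout depends only on the datatype lengths.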

22 Further Information
T3E and SP webpages and software webpages contain further information and links to vendor documentation:
http://hpcf.nersc.gov/computers
http://hpcf.nersc.gov/software

