1. I/O Strategies for the T3E
Jonathan Carter, NERSC User Services
National Energy Research Scientific Computing Center (NERSC)

2. T3E Overview
The T3E is a set of Processing Elements (PEs) connected by a fast 3D torus.
PEs do not have local disk.
All PEs access all filesystems equivalently.
The path for I/O generally looks like:
– user buffer space
– system buffer space
– I/O device buffer space

3. Filesystems
/usr/tmp
– fast
– subject to a 14-day purge, not backed up
– check quota with: quota -s /usr/tmp (usually 75 GB and 6000 inodes)
$TMPDIR
– fast
– purged at the end of the job or session
– shares its quota with /usr/tmp
$HOME
– slower
– permanent, backed up
– check quota with: quota (usually 2 GB and 3500 inodes)

4. Types of I/O
– Language I/O: Fortran or C (ANSI or POSIX)
– Cray FFIO library (can be used from Fortran or C)
– MPI I/O
– Cray extensions to Fortran and C I/O (mostly for compatibility with PVP systems)

5. I/O Strategies - Exclusive access files
Each PE reads and writes a separate file (a sketch follows this list):
– Language I/O
– MPI I/O
– Increase language I/O performance with the FFIO library (C must use the POSIX-style calls)
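A minimal Fortran sketch of this approach, assuming mpif.h has been included and MPI_Init has already been called; the filename pattern, unit number, and array size are illustrative:

      ! Each PE writes its own slice to a file whose name contains its rank,
      ! so no coordination between PEs is needed.
      integer :: me, ierr
      real    :: b(1000)                       ! per-PE slice (illustrative size)
      character(len=16) :: fname

      call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)
      write(fname, '(a,i4.4)') 'restart.', me  ! e.g. restart.0003 on PE 3

      open(unit=10, file=fname, form='unformatted')
      write(10) b
      close(10)

Reading uses the same naming pattern, and assigning the bufa layer to unit 10 (see the assign examples below) speeds this up without any source change.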

6. I/O Strategies - Communication and I/O PE
One PE coordinates reading and writing and passes data to and from the other PEs via message passing (a sketch follows this list):
– Language I/O
– MPI I/O
– Increase language I/O performance with the FFIO library
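A minimal sketch of the I/O-PE pattern, assuming mpif.h has been included, that m is the per-PE slice length, and that btot on PE 0 can hold all of the slices (the names are illustrative):

      ! PE 0 collects every PE's slice with MPI_GATHER and writes a single file.
      call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

      call MPI_GATHER(b, m, MPI_REAL, btot, m, MPI_REAL, 0, &
                      MPI_COMM_WORLD, ierr)

      if (me .eq. 0) then
         open(unit=11, file='vector', form='unformatted')
         write(11) btot(1:m*nprocs)
         close(11)
      end if

Reading reverses the pattern: PE 0 reads the file and hands the slices out with MPI_SCATTER.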

7. I/O Strategies - Shared files
All PEs read and write the same file simultaneously:
– Language I/O with the FFIO library global layer
– MPI I/O
– Language I/O with the FFIO library global layer plus Cray extensions for additional flexibility

8. Cray FFIO library
FFIO is a set of I/O layers tuned for different I/O characteristics:
– buffering of data (configurable size)
– caching of data (configurable size)
– available to regular Fortran I/O without reprogramming (see the sketch below)
– available to C through POSIX-like calls, e.g. ffopen, ffwrite
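To illustrate "without reprogramming": the fragment below is ordinary sequential unformatted Fortran I/O with no FFIO-specific calls, and it picks up the bufa layer purely through the assign command shown on slide 10 (the file name and array are illustrative):

      ! Plain Fortran I/O on unit 10.  Running
      !     assign -F bufa:128:2 u:10
      ! before the program starts routes this unit through the bufa layer.
      real :: results(100000)

      open(unit=10, file='history.dat', form='unformatted')
      write(10) results
      close(10)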

9. The assign command
The assign command controls:
– which FFIO layer is active
– striping across multiple partitions
– lots more
The scope of an assign can be:
– a file name
– a Fortran unit number
– a file type (e.g. all sequential unformatted files)

10. assign Examples
Read and write the file restart.file from all PEs by using the FFIO library global layer:
assign -F global:128:2 f:restart.file
Use the FFIO library bufa layer to improve performance for the file opened on Fortran unit 10:
assign -F bufa:128:2 u:10
Use the FFIO library bufa layer to improve performance for all unformatted sequential Fortran files:
assign -F bufa:128:2 g:su

11. assign Examples (cont.)
To see all active assigns:
assign -V
To remove all active assigns:
assign -R

12. bufa FFIO layer
bufa is an asynchronous buffering layer; it performs read-ahead and write-behind.
Specify the buffer size with -F bufa:bs:nbufs, where bs is the buffer size in units of 4-Kbyte blocks and nbufs is the number of buffers.
The buffer space increases your application's memory requirements.
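As a concrete example, the assignment -F bufa:128:2 used on slide 10 gives two buffers of 128 x 4 Kbytes = 512 Kbytes each, so roughly 1 Mbyte of buffer space is added to the application's memory requirement.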

13. global FFIO layer
global is a caching and buffering layer which enables multiple PEs to read and write the same file.
If one PE has already read the data, an additional read request from another PE results in a remote memory copy.
A file open is a synchronizing event.
By default, all PEs must open a global file; this can be changed by calling GLIO_GROUP_MPI(comm).
Specify the buffer size with -F global:bs:nbufs, where bs is the buffer size in units of 4-Kbyte blocks and nbufs is the number of buffers per PE.

14. File positioning with the global FFIO layer
Positioning of a read or write is your responsibility; file pointers are private to each PE.
Fortran:
– use a direct access file and read/write(..., rec=num) (see the sketch below)
– use the Cray extensions setpos and getpos to position the file pointer (not portable)
C:
– use ffseek
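A minimal Fortran sketch of the direct-access approach, assuming restart.file has been given the global layer with the assign command from slide 10, that each PE owns exactly one record, and that m is the per-PE slice length (the unit number and names are illustrative):

      ! All PEs open the same globally cached file; each PE reads and writes
      ! only its own record, selected with rec=, so no shared file pointer is needed.
      integer :: me, ierr, lrec

      call MPI_COMM_RANK(MPI_COMM_WORLD, me, ierr)

      inquire(iolength=lrec) b(1:m)             ! record length of one slice
      open(unit=12, file='restart.file', form='unformatted', &
           access='direct', recl=lrec)          ! the open is a synchronizing event

      write(12, rec=me+1) b(1:m)
      read (12, rec=me+1) b(1:m)
      close(12)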

15. FFIO considerations
The examples above use an unblocked file structure; normal Fortran files are blocked. To read the file without the global or bufa layers you must use:
assign -s unblocked f:filename
bufa and global do not allow backspace or skipping over a partially read record. You can allow this behavior by using the cos layer in addition to bufa or global, but then setpos does not work:
assign -F cos:128,bufa:128:2 f:filename

16. More on FFIO
There are many other FFIO layers, some pretty obscure:
– the cache and cachea layers, good for random-access files
man intro_ffio gives a terse description.
Cray Publication: Application Programmer's I/O Guide

17. More on assign
– many text-processing options
– switching between Fortran 77 and Fortran 90 namelist
– file pre-allocation
– file striping

18. Further Information
– I/O on the T3E tutorial by Richard Gerber at http://home.nersc.gov/training/tutorials
– Cray Publication: Application Programmer's I/O Guide
– Cray Publication: Cray T3E Fortran Optimization Guide
– man assign

19. MPI I/O
Part of MPI-2: an interface for high-performance parallel I/O.
– data partitioning
– collective I/O
– asynchronous I/O
– portability and interoperability

20. MPI I/O Definitions
An MPI file is an ordered collection of MPI types.
A file may be opened individually or collectively by a group of processes.
The fileview defines a template for accessing the file and is used to partition the file amongst processes.

21. Fileviews
A fileview is composed of three pieces:
– a displacement (in bytes) from the beginning of the file
– an elementary datatype (etype), which is the unit of data access and positioning within the file
– a filetype, which defines a template for accessing the file; a filetype can contain etypes or holes of the same extent as etypes

22. Fileviews (cont.)
The filetype pattern is repeated, "tiling" the file.
Only the non-empty slots are available to read or write.
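As a worked example: with a displacement of 0, an etype of MPI_REAL, and a filetype consisting of two reals followed by a hole four reals long, the pattern tiles the file every six reals, so the process sees reals 1-2, 7-8, 13-14, and so on; the reals under the holes are typically covered by the fileviews of other processes.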

23. Fileviews (cont.)
Each process can have a different filetype.
(The original slide illustrates this with a diagram of the fileviews of Process 0, Process 1, and Process 2.)

24. MPI_File_set_view
Called after MPI_File_open to set the fileview:
MPI_File_set_view(fh, disp, etype, filetype, datarep, info)
– fh is a file handle
– disp, etype, and filetype define the fileview
– datarep is one of "native", "internal", or "external32"
– info is a set of hints to optimize performance

25. MPI Info object
An info object bundles up a set of parameters:
      integer finfo
      call MPI_Info_create(finfo, ierr)
      call MPI_Info_set(finfo, 'access_style', 'write_mostly', ierr)
MPI I/O defines a set of parameters used to help optimize I/O performance.
MPI_INFO_NULL can be used instead of an info object.

26. Open and Close
MPI_File_open(comm, filename, amode, info, fh)
– comm: the open is collective over this communicator
– filename: a string or character variable
– amode: the file access mode, e.g. MPI_MODE_RDONLY, MPI_MODE_RDWR
– info: an info object used to pass hints to the open
– fh: the returned file handle
MPI_File_close(fh)
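A minimal Fortran sketch of a collective open and close (the filename and handle name are illustrative; the fuller example on slide 36 passes an info object instead of MPI_INFO_NULL):

      ! Every process in MPI_COMM_WORLD takes part in the open.
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'vector', &
                         MPI_MODE_RDWR + MPI_MODE_CREATE, &
                         MPI_INFO_NULL, fhv, ierr)

      ! ... set the fileview, read and write ...

      call MPI_FILE_CLOSE(fhv, ierr)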

27. Utility routines
– MPI_File_delete
– MPI_File_set_size
– MPI_File_preallocate
– MPI_File_set_info

28. Query routines
– MPI_File_get_size
– MPI_File_get_group
– MPI_File_get_amode
– MPI_File_get_info
– MPI_File_get_view

29. Data access routines
Positioning:
– explicit: each call has an offset
– individual: each PE maintains an individual file pointer
– shared: the file pointer is maintained globally
Synchronism:
– blocking: the routine returns when complete
– non-blocking: a termination routine must be called to ensure completion
Coordination:
– non-collective
– collective
(A sketch of the explicit, blocking, non-collective case follows.)
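A minimal sketch of the explicit, blocking, non-collective combination (the "ebn" case in the naming convention on slide 34) using MPI_File_write_at; the offset arithmetic assumes the default fileview, 4-byte reals, and that each PE writes m elements, all of which are illustrative:

      ! Explicit positioning: the offset accompanies every call, so neither a
      ! fileview nor a file pointer is needed.  Under the default fileview the
      ! offset is measured in bytes.
      integer(kind=MPI_OFFSET_KIND) :: offset
      integer :: status(MPI_STATUS_SIZE)

      offset = int(me, MPI_OFFSET_KIND) * m * 4   ! 4-byte reals (illustrative)
      call MPI_FILE_WRITE_AT(fhv, offset, b, m, MPI_REAL, status, ierr)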

30. Summary of access routines
(The original slide contains a table of the data access routines, organized by positioning, synchronism, and coordination; the table is not reproduced in this transcript.)

31. Summary of access routines (cont.)
– MPI_File_seek
– MPI_File_get_position
– MPI_File_get_byte_offset
– MPI_File_seek_shared (collective)
– MPI_File_get_position_shared

32. T3E Implementation
– no shared file pointers
– no non-blocking collective (split collective) operations
– an SPR (Software Problem Report) has been filed on non-blocking reads
– work in progress

33. Examples
All the program fragments are available as working programs on the T3E.
Do "module load training", then look in $EXAMPLES/mpi_io.
All examples are of a distributed dot product:
– initialize data with random numbers
– compute the dot product of the whole vector
– write out the data into a shared file
– read it back in and check the dot product
(The original slide shows the vector distributed across PE 0, PE 1, and PE 2.)

34. Naming convention
– the first letter is the positioning: explicit, individual, or shared
– the second letter is the synchronism: blocking or non-blocking
– the third letter is the coordination: non-collective or collective
ebn.f90 is the explicit, blocking, non-collective example.
There are several "ibn" examples dealing with different fileviews.

35. Filetype Example
(The original slide shows a diagram of the filetypes used by Process 0, Process 1, and Process 2, each covering a contiguous slice of the file as set up by the code on the next slide.)

36. Filetype Example

      filemode = MPI_MODE_RDWR + MPI_MODE_CREATE
      call MPI_INFO_CREATE(finfo, ierr)
      call MPI_INFO_SET(finfo, 'access_style', 'write_mostly', ierr)

      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'vector', filemode, &
                         finfo, fhv, ierr)

      ! each PE's filetype is a contiguous slice of m reals starting at m*me
      call MPI_TYPE_CREATE_SUBARRAY(1, m*nprocs, m, m*me, &
                                    MPI_ORDER_FORTRAN, MPI_REAL, mpi_fileslice, ierr)
      call MPI_TYPE_COMMIT(mpi_fileslice, ierr)

      disp = 0
      call MPI_FILE_SET_VIEW(fhv, disp, MPI_REAL, mpi_fileslice, &
                             'native', MPI_INFO_NULL, ierr)

37. Individual, blocking, non-collective

      ! write this PE's slice through the fileview set up on the previous slide
      call MPI_FILE_WRITE(fhv, b, m, MPI_REAL, status, ierr)

      ! check the dot product of the distributed vector
      lresult = sdot(m, b, 1, b, 1)
      call MPI_REDUCE(lresult, result, 1, MPI_REAL, MPI_SUM, 0, &
                      MPI_COMM_WORLD, ierr)
      if (me .eq. 0) then
         write(6,*) 'dot product: ', result
      end if

      ! zero the vector and read it back in
      b = 0.0
      disp = 0
      call MPI_FILE_SEEK(fhv, disp, MPI_SEEK_SET, ierr)
      call MPI_FILE_READ(fhv, b, m, MPI_REAL, status, ierr)

38. Further Information on MPI I/O
MPI - The Complete Reference:
– Volume 1, The MPI Core
– Volume 2, The MPI Extensions

