
1 Parallel HDF5 Introductory Tutorial
May 19, 2008
Kent Yang, The HDF Group
help@hdfgroup.org

2 Outline
- Overview of basic HDF5 concepts
- Overview of Parallel HDF5 design
- MPI-IO vs. Parallel HDF5
- Overview of the Parallel HDF5 programming model
- The benefits of using Parallel HDF5
- Situations where Parallel HDF5 may not work well

3 Overview of Basic HDF5 Concepts

4 What is HDF5?
- A file format for managing any kind of data
- Software (library and tools) for accessing data in that format
- Especially suited for large and/or complex data collections
- Platform independent
- C, F90, C++, and Java APIs

5 Example HDF5 File
(Figure: an example HDF5 file with a root group "/" and a group "/foo"; the objects include raster images, a palette, a 3-D array, a 2-D array, and a table with columns lat, lon, and temp.)

6 Special Storage Options
- Chunked: better subsetting access time; allows datasets to be extendable
- Compressed: improves storage efficiency and transmission speed
- Extendable: arrays can be extended in any direction
- Split file: metadata in one file, raw data in another (e.g., the metadata for dataset "Fred" is stored in File A while its data is stored in File B)

7 Virtual File I/O Layer
- Allows the HDF5 format address space to map to disk, the network, memory, or a user-defined device
(Figure: the virtual file I/O drivers — Stdio, File Family, MPI I/O, Memory, Network — sit between the HDF5 "file" abstraction and the underlying storage.)

8 Overview of Parallel HDF5 Design

9 PHDF5 Requirements
- MPI programming
- PHDF5 files compatible with serial HDF5 files
  - Shareable between different serial or parallel platforms
- Single file image presented to all processes
  - A one-file-per-process design is undesirable: expensive post-processing, and not usable by a different number of processes
- Standard parallel I/O interface
  - Must be portable to different platforms

10 PHDF5 Implementation Layers
- Application
- Parallel computing system (e.g., IBM AIX): compute nodes
- I/O library (HDF5)
- Parallel I/O library (MPI-IO)
- Parallel file system (e.g., GPFS), reached through the switch network and I/O servers
- Disk architecture and layout of data on disk
PHDF5 is built on top of the standard MPI-IO API.

11 Parallel Environment Requirements
- MPI with MPI-IO, e.g., MPICH2 with ROMIO, or a vendor's MPI-IO (IBM, SGI, etc.)
- A parallel file system, e.g., GPFS or Lustre

12 MPI-IO vs. HDF5
- MPI-IO is an input/output API. It treats the data file as a "linear byte stream", and each MPI application needs to provide its own file view and data representations to interpret those bytes.
- All data stored is machine dependent except for the "external32" representation.
  - external32 is defined as big-endian, so little-endian machines have to convert data on both read and write operations.
  - 64-bit data types may lose information.

13 MPI-IO vs. HDF5 (cont.)
- HDF5 is self-describing data management software. It stores data and metadata according to the HDF5 file format definition.
- Each machine can store data in its own native representation for efficient I/O; any necessary data representation conversion is done automatically by the HDF5 library.
- 64-bit data types do not lose information.

14 Programming Restrictions
- Most PHDF5 APIs are collective
- PHDF5 opens a parallel file with a communicator
  - Returns a file handle
  - Future access to the file goes through the file handle
  - All processes must participate in collective PHDF5 calls
  - Different files can be opened with different communicators

15 Examples of PHDF5 API
- Examples of collective PHDF5 APIs:
  - File operations: H5Fcreate, H5Fopen, H5Fclose
  - Object creation: H5Dcreate, H5Dopen, H5Dclose
  - Object structure: H5Dextend (increase dimension sizes)
- Array data transfer can be collective or independent
  - Dataset operations: H5Dwrite, H5Dread
(A short sketch of a collective dataset creation follows below.)
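As a sketch only: a collective dataset creation might look like the following, assuming a file already opened with an MPI-IO file access property list. The names file_id, dims, and the dataset name "IntArray" are illustrative, and the five-argument H5Dcreate form is the HDF5 1.6-era signature this tutorial's vintage uses (later releases take two additional property-list arguments).

  /* Every MPI process must make these collective calls with identical arguments. */
  hsize_t dims[2] = {8, 6};                            /* global dataset size (example values) */
  hid_t filespace = H5Screate_simple(2, dims, NULL);   /* dataspace describing the whole dataset */
  hid_t dset_id   = H5Dcreate(file_id, "IntArray",     /* collective: all ranks create the dataset */
                              H5T_NATIVE_INT, filespace, H5P_DEFAULT);
  H5Sclose(filespace);
  /* ... each rank later writes its own piece via H5Dwrite ... */
  H5Dclose(dset_id);                                    /* collective close */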

16 PHDF5 API Languages
- C and F90 language interfaces
- Platforms supported: most platforms with MPI-IO support, e.g., IBM SP, Linux clusters, Cray XT3, SGI Altix

17 How to Compile PHDF5 Applications
- h5pcc – HDF5 C compiler command, similar to mpicc
- h5pfc – HDF5 F90 compiler command, similar to mpif90
To compile:
  % h5pcc h5prog.c
  % h5pfc h5prog.f90
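To build a named executable and launch it under MPI, an invocation might look like the following; the launcher name and process count are illustrative, and your site may use mpirun, srun, or a batch system instead:

  % h5pcc -o h5prog h5prog.c
  % mpiexec -n 4 ./h5prog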

18 Overview of Parallel HDF5 Programming Model

19 Creating and Accessing a File
Programming model:
- HDF5 uses an access template object (property list) to control the file access mechanism
- General model for accessing an HDF5 file in parallel:
  1. Set up the MPI-IO access template (file access property list)
  2. Open the file
  3. Access data
  4. Close the file

20 Parallel File Create
  /* Set up the MPI-IO file access property list. */
  plist_id = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(plist_id, comm, info);

  /* Create the file collectively. */
  file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);

  /* Close the file. */
  H5Fclose(file_id);

  MPI_Finalize();
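For context, a minimal self-contained version of this create/close sequence might look like the following sketch; the file name "SDS.h5" and the use of MPI_COMM_WORLD and MPI_INFO_NULL are illustrative choices, not taken from the slide:

  #include <mpi.h>
  #include "hdf5.h"

  #define H5FILE_NAME "SDS.h5"   /* illustrative file name */

  int main(int argc, char **argv)
  {
      MPI_Comm comm = MPI_COMM_WORLD;
      MPI_Info info = MPI_INFO_NULL;
      hid_t plist_id, file_id;

      MPI_Init(&argc, &argv);

      /* File access property list with parallel (MPI-IO) access. */
      plist_id = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_fapl_mpio(plist_id, comm, info);

      /* Create the file collectively; every rank makes this call. */
      file_id = H5Fcreate(H5FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);

      H5Pclose(plist_id);
      H5Fclose(file_id);

      MPI_Finalize();
      return 0;
  }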

21 Writing and Reading Hyperslabs
Programming model:
- Distributed memory model: data is split among processes
- PHDF5 uses the hyperslab model
  - Each process defines memory and file hyperslabs
  - Each process executes a partial write/read call, using either collective or independent calls

22 Hyperslab Example: Writing a Dataset by Columns
(Figure: the dataset in the file is divided by columns between processes P0 and P1.)

23 Writing a Dataset by Column
(Figure: memory and file layouts for P0 and P1, annotated with block[0], block[1], offset[1] for each process, stride[1], dimsm[0], and dimsm[1].)

24 Writing a Dataset by Column
  /*
   * Each process defines a hyperslab in the file.
   */
  count[0]  = 1;
  count[1]  = dimsm[1];
  offset[0] = 0;
  offset[1] = mpi_rank;
  stride[0] = 1;
  stride[1] = 2;
  block[0]  = dimsm[0];
  block[1]  = 1;

  /*
   * Each process selects its hyperslab.
   */
  filespace = H5Dget_space(dset_id);
  H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride, count, block);
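The slide refers to dimsm (the size of each process's in-memory block) but does not show the matching memory dataspace. A minimal sketch of that missing piece follows; dimsf and mpi_size are assumed names for the global dataset dimensions and the number of MPI ranks, and the dimsm values shown are one plausible way to split columns evenly:

  /* Memory dataspace: each process holds a dimsm[0] x dimsm[1] block. */
  dimsm[0] = dimsf[0];              /* dimsf (assumed name): global dataset dimensions */
  dimsm[1] = dimsf[1] / mpi_size;   /* mpi_size (assumed name): number of MPI ranks    */
  memspace = H5Screate_simple(2, dimsm, NULL);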

25 Writing a Dataset by Column (cont.)
(Figure: the same memory and file layout diagram as on slide 23.)

26 Dataset Collective Write
  /* Create property list for collective dataset write. */
  plist_id = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);

  status = H5Dwrite(dset_id, H5T_NATIVE_INT,
                    memspace, filespace, plist_id, data);

  H5Dclose(dset_id);
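To request the independent data transfer mode instead (compared against collective mode on the next slides), only the mode constant changes; a one-line sketch:

  H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_INDEPENDENT);

Independent transfer is also the default when no transfer property list is supplied to H5Dwrite or H5Dread.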

27 My PHDF5 Application's I/O Is Slow
If my application's I/O performance is slow, what can I do?
- Use larger I/O data sizes
- Choose between independent and collective I/O
- Pass system-specific I/O hints
- Increase I/O bandwidth
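One way to pass system-specific hints is through the MPI_Info object handed to H5Pset_fapl_mpio. A minimal sketch follows; the hint keys shown (romio_cb_write, cb_buffer_size) are standard ROMIO hints used purely as examples, and the right hints and values depend on your MPI implementation and file system:

  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "romio_cb_write", "enable");   /* enable collective buffering on writes */
  MPI_Info_set(info, "cb_buffer_size", "4194304");  /* 4 MB collective buffer (example value) */
  H5Pset_fapl_mpio(plist_id, MPI_COMM_WORLD, info);
  MPI_Info_free(&info);                             /* HDF5 keeps its own copy of the hints */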

28 Independent vs. Collective Access
- A user reported that the independent data transfer mode was much slower than the collective data transfer mode
- The data array was tall and thin: 230,000 rows by 6 columns

29 Independent vs. Collective Write (6 processes, IBM p-690, AIX, GPFS)

  # of Rows   Data Size (MB)   Independent (sec.)   Collective (sec.)
     16384         0.25               8.26               1.72
     32768         0.50              65.12               1.80
     65536         1.00             108.20               2.68
    122918         1.88             276.57               3.11
    150000         2.29             528.15               3.63
    180300         2.75             881.39               4.12

30 Independent vs. Collective Write (cont.)

31 Parallel Tools
- ph5diff: parallel version of the h5diff tool
- h5perf: performance measurement tool showing I/O performance for different I/O APIs

32 ph5diff
- A parallel version of the h5diff tool
- Supports all features of h5diff
- An MPI-parallel tool: the manager process (rank 0) coordinates the remaining processes (workers) to "diff" one dataset at a time, then collects the output from each worker and prints it
- Works best when the files contain many datasets with few differences
- Available in v1.8
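Being an MPI program, ph5diff is launched through the MPI job launcher. An illustrative invocation, where the launcher name, process count, and file names are assumptions:

  % mpiexec -n 4 ph5diff file1.h5 file2.h5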

33 h5perf
- An I/O performance measurement tool
- Tests three file I/O APIs:
  - POSIX I/O (open/write/read/close, ...)
  - MPI-IO (MPI_File_{open,write,read,close})
  - PHDF5
    - H5Pset_fapl_mpio (using MPI-IO)
    - H5Pset_fapl_mpiposix (using POSIX I/O)

34 APIs That Applications Can Use to Achieve Better Performance
- H5Pset_dxpl_mpio_chunk_opt
- H5Pset_dxpl_mpio_chunk_opt_num
- H5Pset_dxpl_mpio_chunk_opt_ratio
- H5Pset_dxpl_mpio_collective_opt
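These routines all act on a dataset transfer property list. A minimal sketch of how they might be applied; the option values chosen here are illustrative only, and the appropriate settings depend on your chunk layout and access pattern:

  hid_t dxpl_id = H5Pcreate(H5P_DATASET_XFER);
  H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE);
  /* Do one linked collective I/O call for all chunks instead of multi-chunk I/O. */
  H5Pset_dxpl_mpio_chunk_opt(dxpl_id, H5FD_MPIO_CHUNK_ONE_IO);
  /* Keep collective MPI-IO calls rather than falling back to independent I/O. */
  H5Pset_dxpl_mpio_collective_opt(dxpl_id, H5FD_MPIO_COLLECTIVE_IO);
  status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, dxpl_id, data);
  H5Pclose(dxpl_id);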

35 The Benefits of Using HDF5
- Self-describing
  - Allows tools to access the file without knowledge of the application that produced it
- Flexible design
  - Move between system architectures and between serial and parallel applications
- Flexible application control
  - An application can choose how to store data in HDF5 to achieve better performance
- Advanced features
  - Support for complex selections
  - More user control for performance tuning

36 Situations Where Parallel HDF5 May Not Work Well
- BlueGene is not supported
- Misusing HDF5 can cause bad performance, for example with chunked storage and collective I/O; see http://www.hdfgroup.uiuc.edu/papers/papers/ParallelIO/HDF5-CollectiveChunkIO.pdf

37 Questions?

38 Questions for the Audience
- Any suggestions on general improvements or new features for parallel HDF5 support?
- Any suggestions on other tools for parallel HDF5?

39 Useful Parallel HDF Links
- Parallel HDF information site: http://hdfgroup.org/HDF5/PHDF5/
- Parallel HDF5 tutorial: http://hdfgroup.org/HDF5/Tutor/
- HDF help email address: help@hdfgroup.org

