On evaluating GPFS: research work done at HLRS by Alejandro Calderon (presentation transcript)


Slide 1: On evaluating GPFS. Research work done at HLRS by Alejandro Calderon.

Slide 2: On evaluating GPFS (outline)
- Short description
- Metadata evaluation: fdtree
- Bandwidth evaluation: Bonnie, Iozone, IODD, IOP

Slide 3: GPFS description (http://www.ncsa.uiuc.edu/UserInfo/Data/filesystems/index.html)
General Parallel File System (GPFS) is a parallel file system package developed by IBM.
History: originally developed for IBM's AIX operating system, then ported to Linux systems.
Features:
- Appears to work just like a traditional UNIX file system from the user application level.
- Provides additional functionality and enhanced performance when accessed via parallel interfaces such as MPI-IO.
- GPFS obtains high performance by striping data across multiple nodes and disks. Striping is performed automatically at the block level, so all files larger than the designated block size are striped.
- Can be deployed in NSD or SAN configurations.
- Clusters hosting a GPFS file system can allow other clusters at different geographical locations to mount that file system.
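To make block-level striping concrete, here is a minimal sketch of round-robin placement of logical blocks across disks. The block size, disk count, and the round-robin policy itself are illustrative assumptions, not GPFS's actual allocator:

```c
/* Minimal sketch of round-robin block-level striping, in the spirit of
 * how GPFS-like systems distribute data. Block size and disk count are
 * illustrative assumptions, not GPFS internals. */
#include <stdio.h>

#define BLOCK_SIZE (256 * 1024)  /* assumed file system block size */
#define NUM_DISKS  4             /* assumed number of NSDs */

int main(void) {
    long long offsets[] = {0, 100000, 300000, 1000000, 5000000};
    for (int i = 0; i < 5; i++) {
        long long block = offsets[i] / BLOCK_SIZE;   /* logical block index */
        int disk = (int)(block % NUM_DISKS);         /* round-robin placement */
        printf("offset %lld -> block %lld on disk %d\n",
               offsets[i], block, disk);
    }
    return 0;
}
```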

Slide 4: GPFS (simple NSD configuration). [Diagram of a simple NSD configuration.]

Slide 5: GPFS evaluation (metadata): fdtree
- Used for testing the metadata performance of a file system.
- Creates several directories and files, across several levels (see the sketch below).
Used on:
- Computers: noco-xyz
- Storage systems: local, GPFS
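As an illustration of the kind of workload fdtree generates, here is a minimal sketch that recursively creates a tree of directories and small files. The depth, fan-out, file count, and scratch path are assumptions for illustration, not fdtree's actual defaults:

```c
/* Minimal sketch of an fdtree-like metadata workload: recursively
 * create DEPTH levels, each with DIRS subdirectories and FILES small
 * files. All parameters and the scratch path are illustrative. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

#define DEPTH 3
#define DIRS  2
#define FILES 4

static void make_tree(const char *base, int level) {
    char path[4096];
    if (level >= DEPTH)
        return;
    for (int f = 0; f < FILES; f++) {       /* small files at this level */
        snprintf(path, sizeof(path), "%s/file.%d", base, f);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) close(fd);
    }
    for (int d = 0; d < DIRS; d++) {        /* recurse into subdirectories */
        snprintf(path, sizeof(path), "%s/dir.%d", base, d);
        mkdir(path, 0755);
        make_tree(path, level + 1);
    }
}

int main(void) {
    mkdir("/tmp/fdtree-test", 0755);        /* assumed scratch location */
    make_tree("/tmp/fdtree-test", 0);
    return 0;
}
```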

Slide 6: fdtree results [local, NFS, GPFS].

Slide 7: fdtree on GPFS (Scenario 1)
Command: ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...
Scenario 1: several nodes, several processes per node, different subtrees, many small files.
[Diagram: processes P1..Pm on each node, each working in its own subtree.]

Slide 8: fdtree on GPFS (Scenario 1) results.

Slide 9: fdtree on GPFS (Scenario 2)
Command: ssh {x,...} fdtree.bash -l 1 -d 1 -f 1000 -s 500 -o /gpfs...
Scenario 2: several nodes, one process per node, same subtree, many small files.
[Diagram: one process per node (P1..Px), all working in the same subtree.]

Slide 10: fdtree on GPFS (Scenario 2) results.

Slide 11: Metadata cache on the GPFS 'client'
Working in a GPFS directory with 894 entries, ls -als needs to get each file's attributes from the GPFS metadata server. Within a couple of seconds, the contents of the cache seem to disappear.
hpc13782 noco186.nec 304$ time ls -als | wc -l
894
real 0m0.466s  user 0m0.010s  sys 0m0.052s
hpc13782 noco186.nec 305$ time ls -als | wc -l
894
real 0m0.222s  user 0m0.011s  sys 0m0.064s
hpc13782 noco186.nec 306$ time ls -als | wc -l
894
real 0m0.033s  user 0m0.009s  sys 0m0.025s
hpc13782 noco186.nec 307$ time ls -als | wc -l
894
real 0m0.034s  user 0m0.010s  sys 0m0.024s
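A minimal sketch of this experiment in C: repeat a stat() pass over a directory and time each pass, so the warm-cache speedup (and its later expiry) becomes visible. The directory path is an assumed placeholder:

```c
/* Minimal sketch of the cache experiment above: stat() every entry in
 * a directory several times and time each pass. The path is assumed. */
#include <stdio.h>
#include <dirent.h>
#include <sys/stat.h>
#include <time.h>

int main(void) {
    const char *dir = "/gpfs/some/dir";          /* assumed test directory */
    for (int pass = 0; pass < 4; pass++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        DIR *d = opendir(dir);
        if (!d) return 1;
        struct dirent *e;
        struct stat st;
        char path[4096];
        long n = 0;
        while ((e = readdir(d)) != NULL) {       /* attribute fetch per entry */
            snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
            if (stat(path, &st) == 0) n++;
        }
        closedir(d);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("pass %d: %ld entries in %.3f s\n", pass, n, s);
    }
    return 0;
}
```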

Slide 12: fdtree results. Main conclusions:
- Contention at the directory level: if two or more processes from a parallel application need to write data, make sure each one uses a different subdirectory of the GPFS workspace.
- Better results than NFS (but lower than the local file system).

Slide 13: GPFS performance (bandwidth): Bonnie
- Reads and writes a 2 GB file.
- Measures write, rewrite, and read (see the sketch below).
Used on:
- Computers: Cacau1, Noco075
- Storage systems: GPFS
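As an illustration of what Bonnie measures, here is a minimal Bonnie-style sketch that sequentially writes and then reads a test file in fixed-size chunks and reports bandwidth. The file size (scaled down from Bonnie's 2 GB), chunk size, and path are assumptions, and the rewrite phase is omitted for brevity:

```c
/* Minimal Bonnie-style sketch: sequential write then sequential read
 * of one test file, reporting MB/s per phase. Sizes and path assumed. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define CHUNK   (1024 * 1024)              /* 1 MB access size */
#define NCHUNKS 256                        /* 256 MB test file (scaled down) */

static double now(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void) {
    const char *path = "/gpfs/tmp/bonnie-sketch.dat"; /* assumed path */
    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    double t0 = now();
    for (int i = 0; i < NCHUNKS; i++)      /* sequential write phase */
        write(fd, buf, CHUNK);
    fsync(fd);                             /* flush so we time real I/O */
    close(fd);
    printf("write: %.1f MB/s\n", NCHUNKS / (now() - t0));

    fd = open(path, O_RDONLY);
    t0 = now();
    for (int i = 0; i < NCHUNKS; i++)      /* sequential read phase */
        read(fd, buf, CHUNK);
    close(fd);
    printf("read:  %.1f MB/s\n", NCHUNKS / (now() - t0));

    free(buf);
    unlink(path);
    return 0;
}
```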

Slide 14: Bonnie on GPFS [write + rewrite]. [Charts: GPFS and NFS.]

Slide 15: Bonnie on GPFS [read]. [Charts: GPFS and NFS.]

Slide 16: GPFS performance (bandwidth): Iozone
- Writes and reads with several file sizes and access sizes.
- Measures write and read bandwidth (see the sketch below).
Used on:
- Computers: Noco075
- Storage systems: GPFS
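In the same spirit, a minimal Iozone-style sketch that repeats a sequential write with several access (record) sizes; the sizes and path are assumptions, and real Iozone also varies the file size and measures reads:

```c
/* Minimal Iozone-style sketch: write a fixed-size file with several
 * access (record) sizes and report bandwidth per size. Assumed values. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define FILE_SIZE (64LL * 1024 * 1024)     /* 64 MB per run (scaled down) */

static double now(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec / 1e9;
}

int main(void) {
    size_t sizes[] = {4096, 65536, 131072, 1048576};  /* 4 KB .. 1 MB */
    const char *path = "/gpfs/tmp/iozone-sketch.dat"; /* assumed path */
    for (int s = 0; s < 4; s++) {
        char *buf = malloc(sizes[s]);
        memset(buf, 'x', sizes[s]);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        double t0 = now();
        for (long long off = 0; off < FILE_SIZE; off += sizes[s])
            write(fd, buf, sizes[s]);      /* one record per access */
        fsync(fd);
        close(fd);
        double mb = FILE_SIZE / (1024.0 * 1024.0);
        printf("record %7zu B: %.1f MB/s\n", sizes[s], mb / (now() - t0));
        free(buf);
    }
    unlink(path);
    return 0;
}
```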

Slide 17: Iozone on GPFS [write] results.

Slide 18: Iozone on GPFS [read] results.

Slide 19: GPFS evaluation (bandwidth): IODD
- Evaluates disk performance using several nodes, exercising both disks and networking.
- A dd-like command that can be run from MPI (see the sketch below).
Used on:
- 2 and 4 nodes; 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node) that write a file of 1, 2, 4, 8, 16, or 32 GB.
- Using both the POSIX interface and the MPI-IO interface.
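A minimal sketch of the file-per-process variant, assuming each rank performs a dd-like sequential write to its own file via MPI-IO. The file size, chunk size, and paths are illustrative, not IODD's actual code:

```c
/* Minimal sketch of an IODD-like run: each MPI rank does a dd-like
 * sequential write to its own file through MPI-IO. Values assumed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FILE_SIZE (64LL * 1024 * 1024)      /* 64 MB per rank (scaled down) */
#define CHUNK     (1024 * 1024)             /* 1 MB per write, like dd bs= */

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char path[256];
    snprintf(path, sizeof(path), "/gpfs/tmp/iodd-%d.dat", rank);

    MPI_File fh;                            /* one private file per rank */
    MPI_File_open(MPI_COMM_SELF, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    double t0 = MPI_Wtime();
    for (long long off = 0; off < FILE_SIZE; off += CHUNK)
        MPI_File_write_at(fh, off, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    double mb = FILE_SIZE / (1024.0 * 1024.0);
    printf("rank %d: %.1f MB/s\n", rank, mb / (MPI_Wtime() - t0));

    free(buf);
    MPI_Finalize();
    return 0;
}
```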

Slide 20: How IODD works
[Diagram: m processes (P1..Pm) spread across x nodes, each writing a file of n bytes.]
- nodes (x): 2 and 4 nodes
- processes (m): 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node)
- file size (n): 1, 2, 4, 8, 16, and 32 GB

Slide 21: IODD on 2 nodes [MPI-IO] results.

Slide 22: IODD on 4 nodes [MPI-IO] results.

Slide 23: Differences between APIs. [Charts: GPFS on 2 nodes with POSIX vs. GPFS on 2 nodes with MPI-IO.]

Slide 24: IODD on 2 GB [MPI-IO, same directory] results.

Slide 25: IODD on 2 GB [MPI-IO, different directories] results.

Slide 26: IODD results. Main conclusions:
- Bandwidth decreases with the number of processes per node. Beware of multithreaded applications with medium-to-high I/O bandwidth requirements per thread.
- It is very important to use MPI-IO, because this API lets users get more bandwidth.
- Bandwidth also decreases with more than 4 nodes. With large files, metadata management seems not to be the main bottleneck.

Slide 27: GPFS evaluation (bandwidth): IOP
- Measures the bandwidth obtained by writing and reading in parallel from several processes.
- The file size is divided by the number of processes, so each process works on an independent part of the file.
Used on:
- GPFS through MPI-IO (ROMIO on Open MPI).
- Two nodes writing 2 GB files in parallel: on independent files (non-shared) and on the same file (shared).

Slide 28: How IOP works
Setup: 2 nodes, m = 2 processes (1 per node), n = 2 GB file size.
[Diagram: file per process (non-shared): each process P1..Pm writes its own n-byte file. Segmented access (shared): the processes write disjoint segments of one n-byte file.]
(A minimal sketch of the segmented pattern follows below.)
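A minimal sketch of the segmented (shared) pattern, assuming all ranks write disjoint contiguous segments of one file via MPI_File_write_at; for the non-shared pattern each rank would instead open a private file, as in the IODD sketch above. The sizes and path are illustrative:

```c
/* Minimal sketch of IOP's shared-file (segmented) pattern: all ranks
 * open one file and each writes its own contiguous segment, offset by
 * rank * segment. Sizes and path are illustrative assumptions. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define FILE_SIZE (64LL * 1024 * 1024)     /* 64 MB total (scaled down) */
#define CHUNK     (128 * 1024)             /* 128 KB access size */

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_File fh;                            /* one file shared by all ranks */
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/tmp/iop-sketch.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    char *buf = malloc(CHUNK);
    memset(buf, 'x', CHUNK);

    long long segment = FILE_SIZE / nprocs; /* independent area per rank */
    long long base = (long long)rank * segment;
    for (long long off = 0; off < segment; off += CHUNK)
        MPI_File_write_at(fh, base + off, buf, CHUNK,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```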

Slide 29: IOP: differences between shared and non-shared access (results).

Slide 30: IOP: differences between shared and non-shared access (results, continued).

Slide 31: [Charts: GPFS writing to non-shared files vs. GPFS writing to a shared file.]

Slide 32: GPFS writing to a shared file: the 128 KB magic number.

Slide 33: IOP results. Main conclusions:
- If several processes write to the same file, even in independent areas, performance decreases.
- With several independent files, results are similar across tests; with a shared file, they are more irregular.
- A magic number appears: 128 KB. It seems that at that access size the internal algorithm changes and the bandwidth increases.


