On evaluating GPFS: research work done at HLRS by Alejandro Calderon.

Presentation transcript:

Slide 1: On evaluating GPFS: research work done at HLRS by Alejandro Calderon.

Slide 2: On evaluating GPFS (outline)
- Short description
- Metadata evaluation: fdtree
- Bandwidth evaluation: Bonnie, Iozone, IODD, IOP

Slide 3: GPFS description
General Parallel File System (GPFS) is a parallel file system package developed by IBM.
History: originally developed for IBM's AIX operating system, then ported to Linux systems.
Features:
- Appears to work just like a traditional UNIX file system from the user application level.
- Provides additional functionality and enhanced performance when accessed via parallel interfaces such as MPI-I/O.
- High performance is obtained by striping data across multiple nodes and disks. Striping is performed automatically at the block level, so all files larger than the designated block size are striped.
- Can be deployed in NSD or SAN configurations.
- Clusters hosting a GPFS file system can allow other clusters at different geographical locations to mount that file system.

Slide 4: GPFS (simple NSD configuration) [diagram]

Slide 5: GPFS evaluation (metadata)
fdtree:
- Used for testing the metadata performance of a file system.
- Creates several directories and files, over several levels.
Used on:
- Computers: noco-xyz
- Storage systems: local, GPFS
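To make this access pattern concrete, here is a minimal sketch (not the fdtree code itself) of such a metadata test in C: it builds a small directory tree of empty files and times the operations. The tree shape, file counts and the /gpfs path are illustrative assumptions, not fdtree's actual parameters.

/* Minimal fdtree-style metadata test (illustrative sketch, not fdtree itself):
 * build a small directory tree of empty files and time the operations.
 * DIRS, FILES and the /gpfs path are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define DIRS  5    /* directories created under the base directory */
#define FILES 10   /* empty files created in each directory        */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    const char *base = "/gpfs/scratch/fdtree_test";   /* assumed path */
    char dir[256], file[512];
    double t0 = now();
    int ops = 0;

    mkdir(base, 0755);
    for (int d = 0; d < DIRS; d++) {
        snprintf(dir, sizeof dir, "%s/dir.%d", base, d);
        mkdir(dir, 0755);
        ops++;
        for (int f = 0; f < FILES; f++) {
            snprintf(file, sizeof file, "%s/file.%d", dir, f);
            int fd = open(file, O_CREAT | O_WRONLY, 0644);   /* empty file: pure metadata work */
            if (fd >= 0)
                close(fd);
            ops++;
        }
    }
    printf("%d metadata operations in %.3f s\n", ops, now() - t0);
    return 0;
}

Because the files are empty, the measured time is dominated by directory and inode updates rather than by data transfer.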

Slide 6: fdtree [local, NFS, GPFS] (chart)

Slide 7: fdtree on GPFS (scenario 1)
Command: ssh {x,...} fdtree.bash -f 3 -d 5 -o /gpfs...
Scenario 1:
- several nodes,
- several processes per node,
- different subtrees,
- many small files.
[Diagram: processes P1..Pm running on each node, each working on its own subtree.]

Slide 8: fdtree on GPFS (scenario 1) results (chart)

Slide 9: fdtree on GPFS (scenario 2)
Command: ssh {x,...} fdtree.bash -l 1 -d 1 -f s 500 -o /gpfs...
Scenario 2:
- several nodes,
- one process per node,
- same subtree,
- many small files.
[Diagram: one process per node (P1..Px), all working on the same subtree.]

Slide 10: fdtree on GPFS (scenario 2) results (chart)

Slide 11: Metadata cache on the GPFS 'client'
Working in a GPFS directory with 894 entries: ls -als needs to get each file's attributes from the GPFS metadata server. After a couple of seconds, the contents of the cache seem to disappear.

hpc13782 noco186.nec 304$ time ls -als | wc -l
894
real 0m0.466s   user 0m0.010s   sys 0m0.052s

hpc13782 noco186.nec 305$ time ls -als | wc -l
894
real 0m0.222s   user 0m0.011s   sys 0m0.064s

hpc13782 noco186.nec 306$ time ls -als | wc -l
894
real 0m0.033s   user 0m0.009s   sys 0m0.025s

hpc13782 noco186.nec 307$ time ls -als | wc -l
894
real 0m0.034s   user 0m0.010s   sys 0m0.024s
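The same effect can be reproduced programmatically. The sketch below (an illustration, not part of the original evaluation) stats every entry of a directory twice and times both passes; on a GPFS client the second pass is expected to be served largely from the local attribute cache. The directory path is an assumption.

/* Attribute-cache demo: stat every entry of a directory twice and time
 * both passes. The directory path is an assumed example. */
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static double stat_all(const char *path)
{
    char buf[4096];
    struct stat st;
    struct dirent *e;
    double t0 = now();
    DIR *d = opendir(path);

    while (d && (e = readdir(d)) != NULL) {
        snprintf(buf, sizeof buf, "%s/%s", path, e->d_name);
        lstat(buf, &st);               /* forces an attribute lookup */
    }
    if (d)
        closedir(d);
    return now() - t0;
}

int main(void)
{
    const char *dir = "/gpfs/scratch/some_dir";    /* assumed path */
    printf("cold pass: %.3f s\n", stat_all(dir));  /* attributes fetched remotely */
    printf("warm pass: %.3f s\n", stat_all(dir));  /* mostly local cache hits     */
    return 0;
}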

Slide 12: fdtree results
Main conclusions:
- Contention at the directory level: if two or more processes of a parallel application need to write data, make sure each one uses a different subdirectory of the GPFS workspace (see the sketch below).
- Better results than NFS (but lower than the local file system).
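A minimal sketch of that recommendation, assuming an MPI application and a hypothetical /gpfs/workspace/job path: each rank creates and uses its own subdirectory, so no two ranks update the metadata of the same directory.

/* One private subdirectory per MPI rank, to avoid directory-level
 * contention. Paths and file counts are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    char dir[256], file[512];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* private subdirectory for this rank under the shared workspace */
    snprintf(dir, sizeof dir, "/gpfs/workspace/job/rank.%d", rank);
    mkdir(dir, 0755);

    for (int i = 0; i < 100; i++) {
        snprintf(file, sizeof file, "%s/out.%d", dir, i);
        int fd = open(file, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) {
            (void)write(fd, "x", 1);
            close(fd);
        }
    }

    MPI_Finalize();
    return 0;
}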

Slide 13: GPFS performance (bandwidth)
Bonnie:
- Reads and writes a 2 GB file.
- Measures write, rewrite and read bandwidth.
Used on:
- Computers: Cacau1, Noco075
- Storage systems: GPFS
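For reference, a rough sketch of the access pattern Bonnie exercises: sequentially write a large file, rewrite it, then read it back, reporting a bandwidth figure per phase. The block size, file size and path are illustrative assumptions, and Bonnie's real rewrite phase is more involved (it reads and modifies each block before writing it back), which this sketch omits.

/* Bonnie-like write / rewrite / read pattern with per-phase bandwidth.
 * Sizes and path are assumptions for illustration only. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK   (1 << 20)     /* 1 MiB per request */
#define NBLOCKS 2048          /* 2 GiB total       */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void phase(const char *name, const char *path, int flags, int do_read)
{
    char *buf = malloc(BLOCK);
    memset(buf, 'a', BLOCK);
    int fd = open(path, flags, 0644);
    double t0 = now();
    for (long i = 0; i < NBLOCKS; i++) {
        if (do_read) read(fd, buf, BLOCK);
        else         write(fd, buf, BLOCK);
    }
    if (!do_read)
        fsync(fd);            /* flush dirty data before timing stops */
    close(fd);
    printf("%-8s %.1f MB/s\n", name, NBLOCKS * (BLOCK / 1e6) / (now() - t0));
    free(buf);
}

int main(void)
{
    const char *path = "/gpfs/scratch/bonnie_like.dat";   /* assumed path */
    phase("write",   path, O_CREAT | O_WRONLY | O_TRUNC, 0);
    phase("rewrite", path, O_WRONLY, 0);                  /* overwrite existing blocks */
    phase("read",    path, O_RDONLY, 1);
    return 0;
}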

Slide 14: Bonnie on GPFS [write + re-write] (chart: GPFS vs. NFS)

Slide 15: Bonnie on GPFS [read] (chart: GPFS vs. NFS)

Slide 16: GPFS performance (bandwidth)
Iozone:
- Writes and reads with several file sizes and access sizes.
- Reports write and read bandwidth.
Used on:
- Computers: Noco075
- Storage systems: GPFS
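A compact sketch of an Iozone-style sweep over access (record) sizes for a fixed amount of data, reporting the write bandwidth for each record size. The total size, the record sizes and the path are illustrative assumptions, not the parameters used in the runs above.

/* Write the same amount of data with different record sizes and print
 * the resulting bandwidth for each. Sizes and path are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    const char *path = "/gpfs/scratch/iozone_like.dat";   /* assumed path */
    const size_t total = 256UL << 20;                     /* 256 MiB per run */
    const size_t recs[] = { 4 << 10, 64 << 10, 128 << 10, 1 << 20, 4 << 20 };

    for (size_t r = 0; r < sizeof recs / sizeof recs[0]; r++) {
        size_t rec = recs[r];
        char *buf = malloc(rec);
        memset(buf, 'a', rec);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        double t0 = now();
        for (size_t done = 0; done < total; done += rec)
            write(fd, buf, rec);
        fsync(fd);
        close(fd);
        printf("record %7zu B: %.1f MB/s\n", rec, total / 1e6 / (now() - t0));
        free(buf);
    }
    return 0;
}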

Slide 17: Iozone on GPFS [write] (chart)

Slide 18: Iozone on GPFS [read] (chart)

Slide 19: GPFS evaluation (bandwidth)
IODD:
- Evaluates disk performance using several nodes, exercising both disk and networking.
- A dd-like command that can be run from MPI.
Used on:
- 2 and 4 nodes; 4, 8, 16, and 32 processes (1, 2, 3, and 4 per node) that write a file of 1, 2, 4, 8, 16, or 32 GB.
- Using both the POSIX interface and the MPI-IO interface.

Slide 20: How IODD works…
[Diagram: processes P1..Pm spread over the nodes, each writing blocks a, b, ..., n to a file.]
- nodes: x = 2, 4
- processes: m = 4, 8, 16, and 32 (1, 2, 3, 4 per node)
- file size: n = 1, 2, 4, 8, 16 and 32 GB
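A minimal MPI-IO sketch of a dd-like write in the spirit of IODD, assuming (as one plausible reading of the setup) that each rank writes its own file under /gpfs in fixed-size chunks. The path, chunk size and per-rank file size are illustrative, and the real IODD tool may divide the work differently.

/* dd-like MPI-IO write, one output file per rank.
 * Build with: mpicc iodd_like.c -o iodd_like */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (4 << 20)                        /* 4 MiB per write */

int main(int argc, char **argv)
{
    int rank;
    char path[256];
    MPI_File fh;
    double t0, t1;
    long per_rank = 1L << 30;                  /* 1 GiB per rank (assumed) */
    char *buf = malloc(CHUNK);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 'a', CHUNK);

    snprintf(path, sizeof path, "/gpfs/scratch/iodd_like.%d", rank);
    MPI_File_open(MPI_COMM_SELF, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    t0 = MPI_Wtime();
    for (long off = 0; off < per_rank; off += CHUNK)
        MPI_File_write_at(fh, off, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    t1 = MPI_Wtime();

    printf("rank %d: %.1f MB/s\n", rank, per_rank / 1e6 / (t1 - t0));

    MPI_Finalize();
    free(buf);
    return 0;
}

The POSIX variant of the test would simply replace the MPI_File_* calls with open/write/close on the same per-rank files.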

Slide 21: IODD on 2 nodes [MPI-IO] (chart)

Slide 22: IODD on 4 nodes [MPI-IO] (chart)

Slide 23: Differences between APIs [charts: GPFS (2 nodes, POSIX) vs. GPFS (2 nodes, MPI-IO)]

Slide 24: IODD on 2 GB [MPI-IO, same directory] (chart)

Slide 25: IODD on 2 GB [MPI-IO, different directories] (chart)

Slide 26: IODD results
Main conclusions:
- The bandwidth decreases with the number of processes per node. Beware of multithreaded applications with medium-to-high I/O bandwidth requirements per thread.
- It is very important to use MPI-IO, because this API lets users obtain more bandwidth.
- The bandwidth also decreases with more than 4 nodes. With large files, metadata management does not seem to be the main bottleneck.

Slide 27: GPFS evaluation (bandwidth)
IOP:
- Measures the bandwidth obtained by writing and reading in parallel from several processes.
- The file size is divided by the number of processes, so each process works on an independent part of the file.
Used on:
- GPFS through MPI-IO (ROMIO on Open MPI).
- Two nodes writing a 2 GB file in parallel: on independent files (non-shared) and on the same file (shared).

Slide 28: How IOP works…
- 2 nodes; m = 2 processes (1 per node); n = 2 GB file size.
[Diagram, file per process (non-shared): each process P1..Pm writes its own file of blocks a..x.]
[Diagram, segmented access (shared): the blocks a..x of a single shared file are split into one contiguous segment per process P1..Pm.]
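A minimal sketch of the segmented shared-file case, using independent MPI_File_write_at calls (the slides do not say whether IOP uses independent or collective I/O): all ranks open the same file and each one writes only its own contiguous segment. The path, file size and chunk size are assumptions. The non-shared case corresponds to the earlier sketch in which every rank opens its own file with MPI_COMM_SELF.

/* Segmented access to one shared file: rank i writes only segment i.
 * Assumes the file size is a multiple of size * CHUNK.
 * Build with: mpicc iop_like.c -o iop_like */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK (1 << 20)                        /* 1 MiB per write */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_File fh;
    char *buf = malloc(CHUNK);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    memset(buf, 'a', CHUNK);

    MPI_Offset total   = (MPI_Offset)2 << 30;  /* 2 GiB shared file (assumed) */
    MPI_Offset segment = total / size;         /* one contiguous segment per rank */
    MPI_Offset base    = rank * segment;

    /* all ranks open the same file collectively */
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/iop_like.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    for (MPI_Offset off = 0; off < segment; off += CHUNK)
        MPI_File_write_at(fh, base + off, buf, CHUNK, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    free(buf);
    return 0;
}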

Slide 29: IOP: differences between shared and non-shared access (chart)

Slide 30: IOP: differences between shared and non-shared access (chart)

Slide 31: GPFS writing to non-shared files vs. writing to a shared file (charts)

Slide 32: GPFS writing to a shared file: the 128 KB magic number (chart)

Slide 33: IOP results
Main conclusions:
- If several processes write to the same file, even in independent areas, the performance decreases.
- With several independent files the results are similar across tests, but with a shared file they are more irregular.
- A magic number appears: 128 KB. It seems that at that point the internal algorithm changes and the bandwidth increases.
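One mitigation commonly applied in this situation, though it is not something claimed on the slides, is to place each rank's segment of the shared file on boundaries aligned to the observed 128 KB threshold (or to the file system block size), so that no two ranks ever write inside the same aligned region. The helper below is an illustrative sketch; the constant, file size and rank count are assumptions.

/* Compute per-rank segments of a shared file whose boundaries are aligned
 * to a 128 KB threshold. Standalone demo; values are assumptions. */
#include <stdio.h>

#define ALIGN (128 * 1024)    /* 128 KB, the observed threshold */

typedef long long off64;

/* round x up to the next multiple of ALIGN */
static off64 align_up(off64 x)
{
    return (x + ALIGN - 1) / ALIGN * ALIGN;
}

int main(void)
{
    off64 total = 2LL << 30;  /* 2 GB shared file, as in the IOP runs */
    int   nproc = 3;          /* deliberately not a divisor of the size */

    for (int rank = 0; rank < nproc; rank++) {
        off64 segment = align_up(total / nproc);
        off64 start   = (off64)rank * segment;
        off64 end     = start + segment < total ? start + segment : total;
        printf("rank %d: [%lld, %lld)\n", rank, start, end);
    }
    return 0;
}

With segments laid out this way, neighbouring ranks never touch the same 128 KB-aligned region of the shared file, which is one plausible way to reduce the irregularity seen in the shared-file tests.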