1.3 ON ENHANCING GridFTP AND GPFS PERFORMANCES
A. Cavalli, C. Ciocca, L. dell'Agnello, T. Ferrari, D. Gregori, B. Martelli, A. Prosperini, P. Ricci, E. Ronchieri¹, V. Sapunenko¹, A. Sartirana², D. Vitlacil, S. Zani
¹ INFN CNAF, Bologna, Italy; ² GRIF, Paris, France


Abstract

Many High Energy Physics experiments must transfer large volumes of data. Maximizing data throughput is therefore a key issue, requiring detailed analysis and careful tuning of the underlying infrastructure and services. In Grid computing, the GridFTP data transfer protocol is widely used to move data efficiently on top of various types of file systems. We focus on the interaction and performance issues of a setup that combines a GridFTP server with the IBM General Parallel File System (GPFS), adopted for storage management and capable of handling petabytes of data and billions of files. A typical issue is the size of the data blocks that GridFTP server version 2.3 reads from disk, which can potentially limit the data-transfer rate achievable with an IBM GPFS data block. We propose an experimental deployment of the GridFTP server on a Scientific Linux CERN 4 (SLC4) 64-bit platform, with the GridFTP server accessing IBM GPFS over a Storage Area Network (SAN) infrastructure, aimed at improving data throughput and serving distributed remote Grid sites. We present the results of data-transfer measurements, such as CPU load, network utilization, and data read and write rates, obtained from several tests performed at the INFN Tier-1, where the described deployment has been set up. During this activity we verified a significant improvement (of almost 50%) in GridFTP performance on SLC4 64-bit over SAN, saturating the Gigabit link with a very low CPU load.

Test Descriptions

Several tests were performed to independently evaluate the performance of the different layers of a typical GPFS-based storage system. All tests were run varying the relevant parameters, such as the number of files transferred in parallel and the number of streams per file. The tests fall into three groups:

Group 1: bare GPFS performance on a SAN node. This test measures the throughput of a simple file copy on a node included in the SAN. Files can be read, written, or simultaneously read and written on the GPFS file system. Up to 5 parallel file copies were tested.

Group 2: GridFTP performance on a SAN-enabled FTP server. This test measures the read (or write) performance of FTP transfers from (or to) GPFS storage on an FTP server included in the SAN. Up to 20 parallel files and 10 streams per file were tested.

Group 3: FTP transfers between two SAN-enabled FTP servers. This test measures the performance of FTP transfers over the 1 Gb + 1 Gb link between a pair of servers, both included in the SAN. Both unidirectional and bidirectional transfers were tested, with up to 20 parallel files and 10 streams per file.

Tests were performed on a 32 TB GPFS file system. The servers were SAN-enabled SLC4 64-bit machines running the GridFTP server (Globus 2.3), each with two quad-core CPUs and 16 GB of memory.
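As an illustration of how such a parameter sweep (number of parallel files, number of streams per file) can be driven with standard Globus tooling, the minimal Python sketch below launches N concurrent globus-url-copy transfers with S parallel data streams each and reports the aggregate rate. It is not the harness used for the measurements reported here; the server name, paths, file size and TCP buffer setting are assumptions.

    #!/usr/bin/env python
    # Hedged sketch of a test driver in the spirit of the Group 2/3 tests:
    # launch N parallel globus-url-copy transfers, each with S data streams,
    # and report the aggregate throughput.  Host names, paths and file size
    # are hypothetical; the actual Tier-1 test harness is not described here.
    import subprocess
    import time

    GRIDFTP_SERVER = "gridftp01.example.infn.it"   # hypothetical server name
    REMOTE_DIR = "/gpfs/testfs/transfers"          # hypothetical GPFS path
    LOCAL_FILE = "/tmp/testfile_1GB"               # hypothetical 1 GB test file
    FILE_SIZE_MB = 1024

    def run_transfers(n_parallel_files, n_streams):
        """Start n_parallel_files concurrent uploads, each using n_streams
        parallel TCP streams (-p), and return the aggregate rate in MB/s."""
        cmds = []
        for i in range(n_parallel_files):
            dest = "gsiftp://%s%s/file_%d" % (GRIDFTP_SERVER, REMOTE_DIR, i)
            cmds.append(["globus-url-copy",
                         "-p", str(n_streams),      # parallel streams per file
                         "-tcp-bs", "2097152",      # 2 MB TCP buffer (assumed value)
                         "file://" + LOCAL_FILE, dest])
        start = time.time()
        procs = [subprocess.Popen(c) for c in cmds]
        for p in procs:
            p.wait()
        elapsed = time.time() - start
        return n_parallel_files * FILE_SIZE_MB / elapsed

    if __name__ == "__main__":
        # Sweep the same kind of parameter space as in the tests above:
        # number of parallel files and number of streams per file.
        for files in (1, 5, 10, 20):
            for streams in (1, 5, 10):
                rate = run_transfers(files, streams)
                print("%2d files x %2d streams: %.1f MB/s" % (files, streams, rate))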
Conclusions

The tests provided a great deal of useful information on the behaviour and performance of access to a typical-size file on GPFS storage, both by direct POSIX access and via FTP. The figures listed below cover some of the most relevant tests performed.

Group 1: the tests in this group measured the bare read/write performance from a node included in the SAN. GPFS showed unidirectional read/write performance up to … MB/s. Simultaneous reading and writing on the file system can be sustained at ~300 MB/s; this rate decreases smoothly to ~150 MB/s as the number of parallel file copies grows to 5.

Group 2: the tests in this group measured the read/write performance to/from a single FTP server included in the SAN. Performance varies from … MB/s read/write rate with 1-2 parallel transfers down to … MB/s with 5-10 parallel transfers, and appears to be fairly independent of the number of streams used in a single FTP transfer.

Group 3: the tests in this group measured transfers over the LAN between two SAN-node FTP servers, both reading and writing on the same GPFS file system. Unidirectional transfers between the two servers can be sustained saturating the 1 Gb Ethernet link, independently of the number of parallel transfers and of the number of streams per file. Bidirectional transfers between the two servers were likewise able to saturate the two 1 Gb network interfaces, with a ~120 MB/s read/write rate; saturation is reached with 5 or more parallel transfers, while a single transfer yields an overall read/write rate of ~80 MB/s. The dependence of the performance on the number of parallel files can be explained by the use of the operating system buffer cache; this needs further investigation.

Note on the plots of the SAN-attached GridFTP tests: network throughput ≈ 0 B/s, since the GridFTP server is directly linked to the storage over Fibre Channel (FC).

Table: average GPFS throughput (MB/s) in write and in read for the copy tests gpfs -> gpfs, gpfs -> /tmp and /tmp -> gpfs.

Fig. 1: average GPFS throughput for "cp" write operations.
Fig. 2: average GPFS throughput for "cp" read operations.
Fig. 3: average GPFS throughput for globus-url-copy write operations.
Fig. 4: average GPFS throughput for globus-url-copy read operations.
Fig. 5: read/write GPFS throughput with 1, 5, 10 parallel transfers and 1 stream per transfer.
Fig. 6: read/write GPFS throughput with 10 parallel transfers and 10 streams per transfer.
Fig. 7: received/sent network throughput with 10 parallel transfers and 10 streams per transfer.
Fig. 8: received/sent network throughput with 10 parallel transfers and 10 streams per transfer from/to the two GridFTP servers.
Fig. 9: read/write GPFS throughput with 10 parallel transfers and 10 streams per transfer from/to the two GridFTP servers.
Fig. 10: read/write GPFS throughput with 1, 5, 10 parallel transfers and 20 streams per transfer from/to the two GridFTP servers.
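The abstract notes that the block size read from disk by GridFTP server 2.3 can interact badly with the GPFS block size. As a purely illustrative sketch, assuming the standard GPFS mmlsfs command and globus-url-copy are available on the node, the snippet below queries the GPFS file-system block size and passes a matching value to globus-url-copy via its -bs (block size) option. The device name, URLs and the parsing of the mmlsfs output are assumptions and may need adjusting to the local setup; this is not the configuration used for the measurements above.

    #!/usr/bin/env python
    # Illustrative sketch (not taken from the poster): read the GPFS block
    # size with the standard GPFS command "mmlsfs <device> -B" and use it as
    # the block size (-bs) for globus-url-copy, so that reads from disk are
    # aligned with GPFS blocks.  Device name, paths and output parsing are
    # assumptions; mmlsfs usually lives under /usr/lpp/mmfs/bin and its
    # output formatting may differ between GPFS versions.
    import subprocess

    GPFS_DEVICE = "testfs"                      # hypothetical GPFS device name

    def gpfs_block_size(device):
        """Return the file-system block size in bytes as reported by mmlsfs."""
        out = subprocess.check_output(["mmlsfs", device, "-B"])
        for line in out.decode().splitlines():
            fields = line.split()
            if fields and fields[0] == "-B":
                return int(fields[1])
        raise RuntimeError("block size not found in mmlsfs output")

    def copy_with_matching_block(src_url, dst_url):
        bs = gpfs_block_size(GPFS_DEVICE)
        # -bs sets the buffer/block size used by globus-url-copy for the transfer.
        subprocess.check_call(["globus-url-copy", "-vb",
                               "-bs", str(bs), src_url, dst_url])

    if __name__ == "__main__":
        copy_with_matching_block(
            "file:///gpfs/testfs/source/file_1GB",                         # hypothetical path
            "gsiftp://gridftp01.example.infn.it/gpfs/testfs/dest/file_1GB")  # hypothetical host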