GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid

Slides:



Advertisements
Similar presentations
The Globus Striped GridFTP Framework and Server Bill Allcock 1 (presenting) John Bresnahan 1 Raj Kettimuthu 1 Mike Link 2 Catalin Dumitrescu 2 Ioan Raicu.
Advertisements

1 Reliable File Transfer Service Ravi K Madduri Argonne National Laboratory, University of Chicago.
GridFTP Challenges In Data Transport John Bresnahan Argonne National Laboratory The University of Chicago.
Cross-site data transfer on TeraGrid using GridFTP TeraGrid06 Institute User Introduction to TeraGrid June 12 th by Krishna Muriki
Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.
Esma Yildirim Department of Computer Engineering Fatih University Istanbul, Turkey DATACLOUD 2013.
Data Grids Darshan R. Kapadia Gregor von Laszewski
GridFTP: File Transfer Protocol in Grid Computing Networks
GridFTP Introduction – Page 1Grid Forum 5 GridFTP Steve Tuecke Argonne National Laboratory.
Application of GRID technologies for satellite data analysis Stepan G. Antushev, Andrey V. Golik and Vitaly K. Fischenko 2007.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
Milos Kobliha Alejandro Cimadevilla Luis de Alba Parallel Computing Seminar GROUP 12.
Grids and Grid Technologies for Wide-Area Distributed Computing Mark Baker, Rajkumar Buyya and Domenico Laforenza.
Grid Services at NERSC Shreyas Cholia Open Software and Programming Group, NERSC NERSC User Group Meeting September 17, 2007.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
ORNL is managed by UT-Battelle for the US Department of Energy Globus: Proxy Lifetime Endpoint Lifetime Oak Ridge Leadership Computing Facility.
GridFTP Guy Warner, NeSC Training.
Part Three: Data Management 3: Data Management A: Data Management — The Problem B: Moving Data on the Grid FTP, SCP GridFTP, UberFTP globus-URL-copy.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
Globus GridFTP: What’s New in 2007 Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
Globus Data Replication Services Ann Chervenak, Robert Schuler USC Information Sciences Institute.
Reliable Data Movement Framework for Distributed Science Environments Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
ARGONNE  CHICAGO Ian Foster Discussion Points l Maintaining the right balance between research and development l Maintaining focus vs. accepting broader.
1 Introduction to Grid Computing. 2 What is a Grid? Many definitions exist in the literature Early definitions: Foster and Kesselman, 1998 “A computational.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
Why GridFTP? l Performance u Parallel TCP streams, optimal TCP buffer u Non TCP protocol such as UDT u Order of magnitude greater l Cluster-to-cluster.
Secure, Collaborative, Web Service enabled and Bittorrent Inspired High-speed Scientific Data Transfer Framework.
Topaz : A GridFTP extension to Firefox M. Taufer, R. Zamudio, D. Catarino, K. Bhatia, B. Stearn University of Texas at El Paso San Diego Supercomputer.
1 Use of SRMs in Earth System Grid Arie Shoshani Alex Sim Lawrence Berkeley National Laboratory.
Current Testbed : 100 GE 2 sites (NERSC, ANL) with 3 nodes each. Each node with 4 x 10 GE NICs Measure various overheads from protocols and file sizes.
Reliable Data Movement using Globus GridFTP and RFT: New Developments in 2008 John Bresnahan Michael Link Raj Kettimuthu Argonne National Laboratory and.
Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
Reliable Data Movement Framework for Distributed Petascale Science Raj Kettimuthu Argonne National Laboratory and The University of Chicago.
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
UDT as an Alternative Transport Protocol for GridFTP Raj Kettimuthu Argonne National Laboratory The University of Chicago.
High Performance GridFTP Transport of Earth System Grid (ESG) Data 1 Center for Enabling Distributed Petascale Science.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Communicating Security Assertions over the GridFTP Control Channel Rajkumar Kettimuthu 1,2, Liu Wantao 3,4, Frank Siebenlist 1,2 and Ian Foster 1,2,3 1.
Ames Research CenterDivision 1 Information Power Grid (IPG) Overview Anthony Lisotta Computer Sciences Corporation NASA Ames May 2,
Plethora: A Wide-Area Read-Write Storage Repository Design Goals, Objectives, and Applications Suresh Jagannathan, Christoph Hoffmann, Ananth Grama Computer.
GVis: Grid-enabled Interactive Visualization State Key Laboratory. of CAD&CG Zhejiang University, Hangzhou
LEGS: A WSRF Service to Estimate Latency between Arbitrary Hosts on the Internet R.Vijayprasanth 1, R. Kavithaa 2,3 and Raj Kettimuthu 2,3 1 Coimbatore.
Data Management and Transfer in High-Performance Computational Grid Environments B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman,
Reliable File Transfer: Lessons Learned Bill Allcock, ANL Ravi Madduri, ANL.
CEOS Working Group on Information Systems and Services - 1 Data Services Task Team Discussions on GRID and GRIDftp Stuart Doescher, USGS WGISS-15 May 2003.
GridFTP Richard Hopkins
Globus online Software-as-a-Service for Research Data Management Steve Tuecke Deputy Director, Computation Institute University of Chicago & Argonne National.
BNL Service Challenge 3 Status Report Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard USATLAS Computing Facility Brookhaven.
AERG 2007Grid Data Management1 Grid Data Management GridFTP Carolina León Carri Ben Clifford (OSG)
The Globus eXtensible Input/Output System (XIO): A protocol independent IO system for the Grid Bill Allcock, John Bresnahan, Raj Kettimuthu and Joe Link.
Securing the Grid & other Middleware Challenges Ian Foster Mathematics and Computer Science Division Argonne National Laboratory and Department of Computer.
ALCF Argonne Leadership Computing Facility GridFTP Roadmap Bill Allcock (on behalf of the GridFTP team) Argonne National Laboratory.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
EGI-Engage Data Services and Solutions Part 1: Data in the Grid Vincenzo Spinoso EGI.eu/INFN Data Services.
Globus Data Storage Interface (DSI) - Enabling Easy Access to Grid Datasets Raj Kettimuthu, ANL and U. Chicago DIALOGUE Workshop August 2, 2005.
GridFTP Guy Warner, NeSC Training Team.
1 GridFTP and SRB Guy Warner Training, Outreach and Education Team, Edinburgh e-Science.
Protocols and Services for Distributed Data- Intensive Science Bill Allcock, ANL ACAT Conference 19 Oct 2000 Fermi National Accelerator Laboratory Contributors:
New Development Efforts in GridFTP Raj Kettimuthu Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.
A Sneak Peak of What’s New in Globus GridFTP John Bresnahan Michael Link Raj Kettimuthu (Presenting) Argonne National Laboratory and The University of.
PARALLEL AND DISTRIBUTED PROGRAMMING MODELS U. Jhashuva 1 Asst. Prof Dept. of CSE om.
EMI is partially funded by the European Commission under Grant Agreement RI Common Authentication Library Daniel Kouril, for the CaNL PT EGI CF.
Data Infrastructure in the TeraGrid Chris Jordan Campus Champions Presentation May 6, 2009.
FILE TRANSFER SPEEDS OVER HTTP AND FTP Yibiao Li 06/01/2009 Christmas Meeting 2008/09.
Computing Clusters, Grids and Clouds Globus data service
Introduction to Data Management in EGI
Study course: “Computing clusters, grids and clouds” Andrey Y. Shevel
Enabling High Speed Data Transfer in High Energy Physics
Part Three: Data Management
Presentation transcript:

GridFTP GUI: An Easy and Efficient Way to Transfer Data in Grid Wantao Liu1,2 Raj Kettimuthu2,3, Brian Tieman3, Ravi Madduri2,3, Bo Li1, and Ian Foster2,3 1Beihang University, Beijing, China 2The University of Chicago, Chicago, USA 3Argonne National Laboratory, Argonne, USA

Outline GridFTP overview GridFTP Challenges Commonly used GridFTP clients Zero configure GUI client Experimental results

GridFTP A secure, robust, fast, efficient, standards based, widely accepted data transfer protocol We also supply a reference implementation: Server Client tools (globus-url-copy) Development Libraries Multiple independent implementations can interoperate University of Virginia and Fermi Lab have home grown servers that work with ours. Lots of people have developed clients independent of the Globus Project. Widely used high performance data movement protocol. Based on FTP protocol - defines extensions for high-performance operation and security. Standardized through Open Grid Forum (OGF)

GridFTP Two channel protocol like FTP Control Channel Data Channel Communication link (TCP) over which commands and responses flow Low bandwidth; encrypted and integrity protected by default Data Channel Communication link(s) over which the actual data of interest flows High Bandwidth; authenticated by default; encryption and integrity protection optional Supports both client/server transfers and 3rd party transfers (remote client can initiate a transfer between 2 servers)

Striping GridFTP offers a powerful feature called striped transfers (cluster-to-cluster transfers) If each node at the source and destination has a 1 Gbps NIC and if the capacity of the network connecting the source and destination cluster is on the order of 10 Gbps, this configuration can be used to achieve throughputs close to 10 Gbps

GridFTP Servers Around the World Created by Lydia Prieto ; G. Zarrate; Anda Imanitchi (Florida State University) using MaxMind's GeoIP technology (http://www.maxmind.com/app/ip-locate).

GridFTP in production Many Scientific communities rely on GridFTP High Energy Physics – tiered data movement infrastructure for the LHC computing Grid LIGO routinely uses GridFTP to move 1 TB a day Southern California Earthquake Center (SCEC), Earth Systems Grid (ESG), Relativistic Heavy Ion Collider (RHIC), European Space Agency, BBC use GridFTP for data movement GridFTP facilitates an average of more than 5 million data transfers every day

Challenges Past success Current and future Standard – big selling point for adoption Throughput – GridFTP was sold on speed Robustness – has to work all the time Current and future Ease-of-use Zero configuration clients Firewall Scalable Extensible

Globus-url-copy globus-url-copy [options] srcURL dstURL Commonly used command line scriptable client globus-url-copy [options] srcURL dstURL URL format - protocol://[user:pass@][host]/path Users can do client/server and 3rd party transfers using globus-url-copy

Other clients UberFTP Reliable file transfer service Custom clients using globus C and Java client libraries All these clients require non-trivial configuration Security setup None of these clients provide graphical user interface

GridFTP GUI Zero configuration Fault tolerant Drag and drop Zero configuration Integrated with myproxy Automatically trusts the CAs part of IGTF distribution Fault tolerant Transfer status monitoring Optimized for performance

Snapshot of the GUI

Fault tolerant Better fault tolerance than other GridFTP clients Like other clients, GUI can recover from transient server and network failures Globus-url-copy can not recover from its own failures GUI can recover from its own failures Unlike RFT, stores information on the local file system

Lots of small files Scientific experiments produce huge volume of data the individual file size is modest, on the order of kilobytes or megabytes hundreds of thousands of files to transfer every day the size of the entire dataset is tremendous, from hundreds of gigabytes to hundreds of terabytes

Advanced Photon Source Advanced Photon Source at Argonne dozens of samples may be acquired for one experiment every day each sample generates about 2,000 raw data files after processing, each sample produces additional 2,000 reconstructed files each file is 8 to 16 MB in size

Lots of small files Transfer threads pool Move multiple files concurrently Maximize the utilization of network bandwidth Improve the transfer performance Two windows for status information Directory window lists all directories and their transfer status File window lists all files under the active directory

Experiment Setup We conducted all of our experiments using TeraGrid NCSA nodes and the University of Chicago nodes GridFTP GUI is compared with scp and globus-url-copy TCP is configured as the underlying data transport protocol

Experiment Results The time consumed to transfer a single file Eight file sizes were used: 1 MB, 5 MB, 10 MB, 50 MB, 100 MB, 300 MB, 500 MB, and 1000 MB Data was moved from NCSA to the University of Chicago

Experiment Results(cont.) Data transfer time of a directory with tens of thousands of small files Each file is 1 MB Directories of 10,000, 20,000, 30,000, 40,000 and 50,000 files are created Data flowed from the University of Chicago to NCSA

Questions