Presentation is loading. Please wait.

Presentation is loading. Please wait.

Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago.

Similar presentations


Presentation on theme: "Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago."— Presentation transcript:

1 Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago

2 What is GridFTP? l High-performance, reliable data transfer protocol optimized for high-bandwidth wide-area networks l Based on FTP protocol - defines extensions for high- performance operation and security l We supply a reference implementation: u Server u Client tools (globus-url-copy) u Development Libraries l Multiple independent implementations can interoperate u Fermi Lab and U. Virginia have home grown servers that work with ours.

3 GridFTP l Two channel protocol like FTP l Control Channel u Communication link (TCP) over which commands and responses flow u Low bandwidth; encrypted and integrity protected by default l Data Channel u Communication link(s) over which the actual data of interest flows u High Bandwidth; authenticated by default; encryption and integrity protection optional

4 Globus GridFTP l Performance u Parallel TCP streams u Non TCP protocol such as UDT u Order of magnitude greater l Cluster-to-cluster data movement u Another order of magnitude l Support for reliable and restartable transfers l Multiple security options u Anonymous, password, SSH, GSI l Modular and easy to optimize for various storage u HPSS, SRB

5 Cluster-to-Cluster transfers Control node Data node

6 Performance l Mem. transfer between Urbana, IL and San Diego, CA

7 Performance l Disk transfer between Urbana, IL and San Diego, CA

8 Users l HEP community is basing its entire tiered data movement infrastructure for the LHC computing Grid on GridFTP l Southern California Earthquake Center (SCEC), Laser Interferometer Gravitational Wave Observatory (LIGO), Earth Systems Grid (ESG) use GridFTP for data movement l European Space Agency, Disaster Recovery Center in Japan move large volumes of data using GridFTP l An average of more than 2 million data transfers happen with GridFTP every day

9 New Features l GUI client l SSH security for GridFTP l GridFTP over UDT l Pipelining l Multicasting / Overlay Routing l Scalability l Lotman Storage plugin l Anomaly and bottleneck detection using Netlogger

10 A GUI client for GridFTP l An alpha version is available at http://www.globus.org/cog/demo/ l Java web start application l Integrated with myproxy-logon u Certificates can be completely hidden from the user l If certificates are in place, proxy can be generated through the GUI l Provides support for RFT as well

11 SSH Security for GridFTP sshd Client GridFTP Server Port 22 ROOT USER ssh Stdin/out

12 SSH Security for GridFTP l Client support for using SSH is automatically enabled l On the server side (where you intend the client to remotely execute a server) u setup-globus-gridftp-sshftp -server l In order to use SSH as a security mechanism, the user must provide urls that begin with sshftp:// as arguments. u globus-url-copy sshftp:// : / file:/ u is the port in which sshd listens on the host referred to by (the default value is 22).

13 GridFTP over UDT l GridFTP uses XIO for network I/O operations l XIO presents a POSIX-like interface to many different protocol implementations GSI TCP Default GridFTP GridFTP over UDT GSI UDT

14 GridFTP over UDT Argonne to NZ Throughput in Mbit/s Argonne to LA Throughput in Mbit/s Iperf – 1 stream19.774.5 Iperf – 8 streams40.3117.0 GridFTP mem TCP – 1 stream16.463.8 GridFTP mem TCP – 8 streams40.2112.6 GridFTP disk TCP – 1 stream16.359.6 GridFTP disk TCP – 8 streams37.4102.4 GridFTP mem UDT179.3396.6 GridFTP disk UDT178.6428.3 UDT mem201.6432.5 UDT disk162.5230.0

15 Lots of Small Files (LOSF) Problem l Traditional transfer pattern SenderReceiver Client Send Receive Data ACK

16 Pipelining l Allow many outstanding transfer requests l Send next request before previous completes u Latency is overlapped with the data transfer l Backward compatible u Wire protocol doesn’t change u Client side sends commands sooner

17 Pipelining Traditional Pipelining l Significant performance improvement for LOSF File Request 1 File Request 2 File Request 3 DATA 1 DATA 2 DATA 3 ACK 1 ACK 2 ACK 3 File Request 1 File Request 2 File Request 3 DATA 1 DATA 2 DATA 3 ACK 1 ACK 2 ACK 3

18 Multicast / Overlay Routing l Enable GridFTP to transfer single data set to many locations or act as an intermediate routing node

19 Scalability l Data nodes can be added dynamically - need more throughput, add more data nodes Control node Data node

20 Storage Plugin l Destination storage might run out of space in the middle of a GridFTP transfer l Lotman - tool from univ. of wisconsin that manages storage l Developed plugin for GridFTP to interact with Lotman l Space availability (for individual file transfers) determined ahead of transfers to Lotman enabled storage

21 GridFTP with Lotman GridFTP Server Client Lotman

22 Anomaly and Bottleneck Detection using Netlogger l GridFTP server can be instrumented with Netlogger l Log messages which can be post processed using Netlogger tools l Fine grained disk and net I/O characteristics can then be visualized and analyzed

23 Reliable File Transfer Service ( RFT) l GridFTP - on demand transfer service u Not a queuing service l RFT - GridFTP client u Queues requests u Orchestrates transfers on client’s behalf u Third party transfers u Interacts with many GridFTP servers u Retry requests on failure u Recovers from GridFTP and RFT service failures

24 RFT RFT Service RFT Client SOAP Messages Notifications (Optional) GridFTP Server GridFTP Server CC DC Persistent Store

25 RFT - Connection Caching l Control channel connections (and thus the data channels associated with it) are cached to reuse later (by the same user) RFT Service GridFTP Server GridFTP Server CC DC

26 RFT - Connection Caching l Reusing connections eliminate authentication overhead on the control and data channels l Measured performance improvement for jobs submitted using Condor-G l For 500 jobs - each job requiring file stageIn, stageOut and cleanup (RFT tasks) u 30% improvement in overall performance u No timeout due to overwhelming connection requests to GridFTP servers


Download ppt "Globus GridFTP and RFT: An Overview and New Features Raj Kettimuthu Argonne National Laboratory and The University of Chicago."

Similar presentations


Ads by Google