Presentation is loading. Please wait.

Presentation is loading. Please wait.

The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory,

Similar presentations


Presentation on theme: "The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory,"— Presentation transcript:

1 The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, U.S.A.

2 12/04/07National Tsing Hua University2 Outline l Motivation l GridFTP Protocol l Globus GridFTP design and architecture l Performance l New features l Summary

3 12/04/07National Tsing Hua University3 Motivation

4 12/04/07National Tsing Hua University4 Motivation l Science is increasingly data-driven l Geographically distributed communities of scientists need to access and analyze large amounts of data u Simulation science applications such as climate modeling u Experimental science applications such as high-energy physics l Rapid increase in raw capacity of wide area network - feasible to move large amounts of data across WAN

5 12/04/07National Tsing Hua University5 Motivation l NSF TeraGrid links large clusters and storage systems at nine sites u With a network providing up to 30 Gbit/s l In principle, move data across this network at > 3 Gbyte/s or 10 Tbyte/hr l In practice orchestration of such transfers is technically challenging u Exploit parallelism in multiple dimension u Deal with failures of various sorts

6 12/04/07National Tsing Hua University6 Motivation l Effective end-to-end data transfers thus demand a systems approach u File systems, computers, network interfaces, network protocols - managed in an integrated fashion to meet performance & robustness goals. l These considerations motivate the work that I describe here u Design, implementation & evaluation of a modular & extensible data transfer system suitable for wide area and high-performance environments.

7 12/04/07National Tsing Hua University7 The GridFTP Protocol

8 12/04/07National Tsing Hua University8 What is GridFTP? l A secure, robust, fast, efficient, standards based, widely accepted data transfer protocol l A Protocol u Multiple Independent implementation can interoperate l This works. Fermi Lab has an implementation with their DCache system and U. Virginia has a.Net implementation that work with ours. l Lots of people have developed clients independent of the Globus Project. l The Globus Toolkit supplies a reference implementation: u Server u Client tools (globus-url-copy) u Development Libraries

9 12/04/07National Tsing Hua University9 GridFTP: The Protocol l Existing standards u RFC 959: File Transfer Protocol u RFC 2228: FTP Security Extensions u RFC 2389: Feature Negotiation for the File Transfer Protocol u Draft: FTP Extensions u GridFTP: Protocol Extensions to FTP for the Grid l Grid Forum Recommendation l GFD.20 l http://www.ggf.org/documents/GWD-R/GFD-R.020.pdf

10 12/04/07National Tsing Hua University10 Understanding GridFTP l GridFTP (and normal FTP) use (at least) two separate socket connections l Control Channel u Command/Response u Basic file system operations eg. mkdir, delete etc u Used to establish data channels l Data channel u Pathway over which file is transferred u Many different underlying protocols can be used l MODE command determines the protocol

11 12/04/07National Tsing Hua University11 Simple Two Party Transfer l GridFTP (and normal FTP) has 3 distinct components: u Client and server protocol interpreters which handle control channel protocol u Data Transfer Process which handles the accessing of actual data and its movement via the data channel DTPServer PI Client PI DTP Standard FTP Client Standard FTP Server

12 12/04/07National Tsing Hua University12 Simple Third Party Transfer l Client initiates data transfer between 2 servers l Client forms CC with 2 servers. l Commands routed through the client to establish DC between the two servers. l Data flows directly between servers u Client is notified by each server PI when the transfer is complete DTPServer PI DTPServer PI Client PI

13 12/04/07National Tsing Hua University13 Control Channel Establishment l Server listens on a well-known port (2811) l Client form a TCP Connection to server l Banner message l Authentication u Anonymous u Clear text USER /PASS u Base 64 encoded GSI handshake - Grid Security Infrastructure based on PKI l Accepted / Rejected

14 12/04/07National Tsing Hua University14 Data Channel Establishment l PASV command u Sent to the passive side of the transfer u Listen for a connection and reply with the contact information (IP address and port) l PORT IP,PORT u Actively establish a connection to a given passive listener

15 12/04/07National Tsing Hua University15 Data Channel Establishment DTP Server PI Client PI Connect IP:PORT

16 12/04/07National Tsing Hua University16 Data Channel Protocols l MODE Command u Allows the client to select the data channel protocol l MODE S u Stream mode, no framing u Legacy RFC959 l MODE E u GridFTP extension u Parallel TCP streams u Data channel caching Descriptor Size Offset (8 bits) (64 bits) (64 bits)

17 12/04/07National Tsing Hua University17 What issues are we addressing? l Striping u Storage systems are often clusters, and we need to be able to utilize all of that parallelism l Collective Operations u Essentially, the striping should be invisible to the outside world l Uniform interface u Ideally, any data source can be treated the same way

18 12/04/07National Tsing Hua University18 What issues are we addressing? l Network Protocol Independence u TCP has well known issues with high Bandwidth-Delay Product networks u Need to be able to take advantage of aggressive protocols on circuits. l Diverse Failure Modes u Much happening under the covers, so must be resilient to failures l End-to-End Performance u We need to be able to manage performance for a wide range of resources

19 12/04/07National Tsing Hua University19 What did the GridFTP protocol add? l Extended Block Mode u Data is sent in “packets” with a header containing a 64 bit offset and length u Allows out-of-order reception of packets l Restart and Performance Markers u Allows for robust restart and perf monitoring l SPAS/SPOR u Striped PASV and striped PORT u Allows a list of IP/ports to be returned

20 12/04/07National Tsing Hua University20 What did the GridFTP Protocol add? l Data Channel Authentication u Needed since in third party transfer, you don’t know who will connect to the listener. l ESTO/ERET u Allows for additional processing on the data prior to storage/transmission u We use this for partial file transfers l SBUF/ABUF u Manual and automatic TCP buffer tuning l Options to set parallelism/striping parameters

21 12/04/07National Tsing Hua University21 Parallelism vs Striping

22 12/04/07National Tsing Hua University22 Architecture / Design of our Implementation

23 12/04/07National Tsing Hua University23 Data Channel Server PI DTP Description of transfer: completely server-internal communication. Protocol is unspecified and left up to the implementation. Server PI DTP Internal IPC API Client PI Info on transfer: restart markers, performance markers, etc. Server PI optionally processes these, then sends them to the client PI Control Channels Overall Architecture

24 12/04/07National Tsing Hua University24 Possible Configurations Control Data Typical Installation Control Data Separate Processes Striped Server Control Data Control Striped Server (future) Data

25 12/04/07National Tsing Hua University25 Data Storage Interface Data Processing Module Data Channel Protocol Module Data source or sink Data channel Data Transfer Processor

26 12/04/07National Tsing Hua University26 Data Storage Interface l This is a very powerful abstraction l Several can be available and loaded dynamically via the ERET/ESTO commands l Anything that can implement the interface can be accessed via the GridFTP protocol l We have implemented u POSIX file (used for performance testing) u HPSS (tape system; IBM) u Storage Resource Broker (SRB; SDSC) u NeST (disk space reservation; UWis/Condor)

27 12/04/07National Tsing Hua University27 Globus Extensible IO (XIO) System l XIO framework presents a standard open/close/read/write interface to many different protocol implementations u including TCP, UDP, HTTP -- and now UDT l The protocol implementations are called drivers. u A driver can be dynamically loaded and stacked by any Globus XIO application.

28 12/04/07National Tsing Hua University28 Network Protocol Globus XIO Approach Application Disk Network Protocol Special Device Globus XIO Driver

29 12/04/07National Tsing Hua University29 Globus XIO Framework l Moves the data from user to driver stack. l Manages the interactions between drivers. l Assist in the creation of drivers. u Asynchronous support. u Close and EOF Barriers. u Error checking u Internal API for passing operations down the stack. User API Framework Driver Stack Transform Transport

30 12/04/07National Tsing Hua University30 Performance Results

31 12/04/07National Tsing Hua University31 Experimental results l Three settings u LAN - 0.2 msec RTT and 622 Mbit/s u MAN - 2.2 msec RTT and 1 Gbit/s u WAN - 60 msec RTT and 30 Gbit/s u MAN - Distributed Optical Testbed in the Chicago area u WAN - TeraGrid link between NCSA in Illinois and SDSC in California - each individual has a 1Gbit/s bottleneck link

32 12/04/07National Tsing Hua University32 Experimental results - LAN

33 12/04/07National Tsing Hua University33 Experimental results - MAN

34 12/04/07National Tsing Hua University34 Experimental results - WAN

35 12/04/07National Tsing Hua University35 Memory to Memory Striping Performance

36 12/04/07National Tsing Hua University36 Disk to Disk Striping Performance

37 12/04/07National Tsing Hua University37 Scalability tests l Evaluate performance as a function of the number of clients l DiPerf test framework to deploy the clients l Ran server on a 2-processor 1125 MHz x86 machine running Linux 2.6.8.1 u 1.5 GB memory and 2 GB swap space u 1 Gbit/s Ethernet network connection and 1500 B MTU l Clients created on hosts distributed over PlanetLab and at the University of Chicago (UofC)

38 12/04/07National Tsing Hua University38 Scalability Results Left axis - load, response time, memory allocated Right axis - Throughput and CPU %

39 12/04/07National Tsing Hua University39 Scalability results l 1800 clients mapped in a round robin fashion on 100 PlanetLab hosts and 30 UofC hosts l A new client created once a second and ran for 2400 seconds u During this time, repeatedly requests the transfer of 10 Mbyte file from server’s disk to client’s /dev/null l Total of 150.7 Gbytes transferred in 15,428 transfers

40 12/04/07National Tsing Hua University40 Scalability results l Server sustained 1800 concurrent requests with 70% CPU and 0.94 Mbyte memory per request l CPU usage, throughput, response time remain reasonable even when allocated memory exceeds physical memory u Meaning paging is occuring

41 12/04/07National Tsing Hua University41 New features

42 12/04/07National Tsing Hua University42 SSH security mechanism for GridFTP l GridFTP traditionally uses GSI for establishing secure connections l In some situations, preferable to use SSH security mechanism l Leverages the fact that an SSH client can remotely execute programs by forming a secure connection with SSHD

43 12/04/07National Tsing Hua University43 SSHFTP Interactions sshd CPI GridFTP Server 2811 Port 22 ROOT USER ssh Stdin/out

44 12/04/07National Tsing Hua University44 GridFTP over UDT l UDT is an application-level data transport protocol that uses UDP to transfer data l Implement its own reliability and congestion control mechanisms l Achieves good performance on high-bandwidth, high-delay networks where TCP has significant limitations l GridFTP uses Globus XIO interface to invoke network I/O operations u Created an XIO driver for UDT reference implementation u Enabled GridFTP to use it as an alternate transport protocol

45 12/04/07National Tsing Hua University45 GridFTP over UDT Argonne to NZ Throughput in Mbit/s Argonne to LA Throughput in Mbit/s Iperf – 1 stream19.774.5 Iperf – 8 streams40.3117.0 GridFTP mem TCP – 1 stream16.463.8 GridFTP mem TCP – 8 streams40.2112.6 GridFTP disk TCP – 1 stream16.359.6 GridFTP disk TCP – 8 streams37.4102.4 GridFTP mem UDT179.3396.6 GridFTP disk UDT178.6428.3 UDT mem201.6432.5 UDT disk162.5230.0

46 12/04/07National Tsing Hua University46 Summary l The GridFTP protocol provides a good set of features for data movement requirements in the Grid. l The Globus implementation of this protocol provides a flexible design / architecture for integrating with different communities, storage systems, and protocols. l Our implementation is robust and performant over a range of environments.

47 12/04/07National Tsing Hua University47 Questions?


Download ppt "The Globus GridFTP Framework and Server John Bresnahan, Mike Link and Raj Kettimuthu (Presenting) Math & Computer Science Division, Argonne National Laboratory,"

Similar presentations


Ads by Google