Protocols and Services for Distributed Data-Intensive Science. Bill Allcock, ANL. ACAT Conference, 19 Oct 2000, Fermi National Accelerator Laboratory.


1 Protocols and Services for Distributed Data-Intensive Science
Bill Allcock, ANL
ACAT Conference, 19 Oct 2000, Fermi National Accelerator Laboratory
Contributors: Ian Foster, Carl Kesselman, Steve Tuecke, Ann Chervenak

2 The Globus Data Grid
Two major components:
1. Data Transport and Access
   - Common protocol
   - Secure, efficient, flexible, extensible data movement
   - Family of tools supporting this protocol
2. Replica Management Architecture
   - Simple scheme for managing:
     - multiple copies of files
     - collections of files
APIs, white papers: http://www.globus.org

3 GridFTP Data Transport and Access

4 GridFTP: Basic Approach
- FTP is defined by several IETF RFCs
- Start with most commonly used subset
  - Standard FTP: get/put etc., 3rd-party transfer
- Implement standard but often unused features
  - GSS binding, extended directory listing, simple restart
- Extend in various ways, while preserving interoperability with existing servers
  - Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic & manual TCP buffer setting, progress monitoring, extended restart
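Parallel transfers and partial file transfers both come down to moving byte ranges of a file independently. The following is a minimal sketch of that idea only, not the GridFTP wire protocol: the function name `split_ranges` is hypothetical, and how a real client assigns ranges to TCP streams is not stated on the slide.

```python
def split_ranges(file_size, streams):
    """Split [0, file_size) into at most `streams` contiguous (offset, length) ranges.

    Hypothetical helper for illustration: each range could be sent over its
    own TCP stream, or requested on its own as a partial file transfer.
    """
    if file_size < 0 or streams < 1:
        raise ValueError("file_size must be >= 0 and streams must be >= 1")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` ranges carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # fewer bytes than streams: skip empty ranges
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    for offset, length in split_ranges(10, 3):
        print(f"stream sends bytes [{offset}, {offset + length})")
```

The ranges are contiguous and non-overlapping, so a receiver can write each one at its offset regardless of the order in which the streams finish.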

5 The GridFTP Family of Tools
- Patches to existing FTP code
  - GSI-enabled versions of existing FTP client and server, for high-quality production code
- Custom-developed libraries
  - Implement full GridFTP protocol, targeting custom use, high performance
- Custom-developed tools
  - Servers and clients with specialized functionality and performance

6 Family of Tools: Patches to Existing Code
- Patches to standard FTP clients and servers
  - gsi-ncftp: Widely used client
  - gsi-wuftpd: Widely used server
  - GSI-modified HPSS pftpd
  - GSI-modified Unitree ftpd
- Provide high-quality, production-ready FTP clients and servers
- Integration with common mass storage systems
- Do not support the full GridFTP protocol

7 Family of Tools: Custom Developed Libraries
- Custom developed libraries
  - globus_ftp_control: Low-level FTP driver
    - Client & server protocol and connection management
  - globus_ftp_client: Simple, reliable FTP client
  - globus_gass_copy: Simple URL-to-URL copy library, supporting (Grid-)ftp, http(s), file URLs
- Implement full GridFTP protocol
- Various levels of libraries, allowing implementation of custom clients and servers
- Tuned for high performance on WAN
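A URL-to-URL copy library has to choose a transfer routine from the schemes of the source and destination URLs. The sketch below illustrates that dispatch in Python using only the standard library. It is not the globus_gass_copy API: `url_copy` and `HANDLERS` are hypothetical names, and only the `file` to `file` case is implemented, so other scheme pairs are rejected.

```python
import shutil
from urllib.parse import urlparse
from urllib.request import url2pathname


def _copy_file_to_file(src_url, dst_url):
    """Copy between two file:// URLs on the local filesystem."""
    src_path = url2pathname(urlparse(src_url).path)
    dst_path = url2pathname(urlparse(dst_url).path)
    shutil.copyfile(src_path, dst_path)


# Hypothetical registry keyed by (source scheme, destination scheme).
# A fuller library would add entries for ftp, http(s), and GridFTP URLs.
HANDLERS = {("file", "file"): _copy_file_to_file}


def url_copy(src_url, dst_url, handlers=HANDLERS):
    """Dispatch a copy to the handler registered for the two URL schemes."""
    key = (urlparse(src_url).scheme.lower(), urlparse(dst_url).scheme.lower())
    handler = handlers.get(key)
    if handler is None:
        raise ValueError(f"no handler registered for {key[0]} -> {key[1]}")
    handler(src_url, dst_url)
```

Keeping the handlers in a table means a new protocol is supported by adding entries rather than changing `url_copy` itself.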

8 Family of Tools: Custom Developed Programs
- Simple production client
  - globus-url-copy: Simple URL-to-URL copy
- Experimental FTP servers
  - Modified WUFTPD with parallel channels
  - Striped FTP server (à la DPSS)
  - Firewall FTP proxy: Securely and efficiently allow transfers through firewalls

9 Replica Management Architecture

10 Replica Management
- Maintain a mapping between logical names for files and collections and one or more physical locations
- We define a replica to be a "managed copy of a file".
  - The replica management system controls where and when copies are created, and provides information about where copies are located. However, the system makes no statements about file consistency: copies can get out of date with respect to one another if a user chooses to modify a copy.
- Based on the LDAP protocol

11 A Model Architecture for Data Grids
[Figure: model architecture diagram. Components: Application, Metadata Catalog, Replica Catalog, Replica Selection, MDS, NWS, Replica Locations 1-3 (Tape Library, Disk Cache, Disk Array, Disk Cache). Labeled flows: Attribute Specification; Logical Collection and Logical File Name; Multiple Locations; Performance Information and Predictions; Selected Replica. Service labels: Reliable Transport, Reliable Replication.]
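The replica selection step in this architecture takes the multiple locations returned by the replica catalog, plus performance information and predictions, and yields one selected replica. A minimal sketch of one possible policy follows. The function name, the use of predicted bandwidth as the only criterion, and the numbers in the usage example are assumptions for illustration; the slide does not specify a selection algorithm.

```python
def select_replica(locations, predicted_mbps):
    """Return the location with the highest predicted bandwidth.

    `locations` is a list of replica location names from the catalog.
    `predicted_mbps` maps a location name to a predicted transfer rate;
    locations without a prediction are ignored. Both are hypothetical inputs.
    """
    candidates = [loc for loc in locations if loc in predicted_mbps]
    if not candidates:
        raise LookupError("no performance prediction for any replica location")
    return max(candidates, key=lambda loc: predicted_mbps[loc])


if __name__ == "__main__":
    # Made-up predictions, for illustration only.
    predictions = {"jupiter.isi.edu": 12.0, "sprite.llnl.gov": 45.5}
    print(select_replica(["jupiter.isi.edu", "sprite.llnl.gov"], predictions))
```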

12 Replica Manager Components
- Replica catalog definition
  - LDAP object classes for representing logical-to-physical mappings in an LDAP catalog
- Low-level replica catalog API
  - globus_replica_catalog library
  - Manipulates replica catalog: add, delete, etc.
  - URL: http://www.globus.org
- High-level reliable replication API
  - globus_replica_manager library
  - Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.
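The high-level API is described as combining file transfer calls with low-level catalog calls. One way to picture that combination is sketched below: copy the file, then register the new copy, and remove the copy again if registration fails so the catalog never lags behind storage. This is an illustration under assumptions, not the globus_replica_manager interface; `replicate` and the three callables it receives are hypothetical, and the rollback step is my own addition.

```python
def replicate(logical_name, src_url, dst_url, copy, register, delete):
    """Create a new replica and record it in the catalog.

    copy(src_url, dst_url)            -- hypothetical file transfer operation
    register(logical_name, dst_url)   -- hypothetical low-level catalog call
    delete(dst_url)                   -- hypothetical cleanup of a physical copy
    """
    copy(src_url, dst_url)
    try:
        register(logical_name, dst_url)
    except Exception:
        # Registration failed: remove the unmanaged copy, then re-raise.
        delete(dst_url)
        raise
    return dst_url
```

Passing the operations in as callables keeps the sketch independent of any particular transfer or catalog library.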

13 Replica Catalog Structure: A Climate Modeling Example
[Figure: replica catalog tree, reconstructed from the flattened slide text]
Replica Catalog
  Logical Collection: CO2 measurements 1998
    Filename: Jan 1998
    Filename: Feb 1998
    …
    Location: jupiter.isi.edu
      Filename: Mar 1998
      Filename: Jun 1998
      Filename: Oct 1998
      Protocol: GridFTP
      UrlConstructor: GridFTP://jupiter.isi.edu/nfs/v6/climate
    Location: sprite.llnl.gov
      Filename: Jan 1998
      …
      Filename: Dec 1998
      Protocol: ftp
      UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
    Logical File Parent
    Logical File: Jan 1998
    Logical File: Feb 1998 (Size: 1468762)
  Logical Collection: CO2 measurements 1999
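The catalog structure in this example can be mirrored with a small in-memory data model: a logical collection holds locations, and each location lists the filenames it stores plus a URL constructor prefix. The sketch below is illustrative only. The class names are hypothetical, the assignment of filenames to locations is my reading of the flattened slide, and forming a physical URL by joining the UrlConstructor and a percent-encoded filename is an assumption rather than something the slide states. The real catalog is stored in LDAP.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    host: str
    protocol: str
    url_constructor: str
    filenames: set = field(default_factory=set)

    def url_for(self, filename):
        # Assumption: physical URL = UrlConstructor + "/" + encoded filename.
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


@dataclass
class LogicalCollection:
    name: str
    locations: list = field(default_factory=list)

    def physical_urls(self, filename):
        """Return a URL for every location that holds `filename`."""
        return [loc.url_for(filename)
                for loc in self.locations
                if filename in loc.filenames]


# Data as read from the slide; filenames elided on the slide are left out.
collection = LogicalCollection(
    name="CO2 measurements 1998",
    locations=[
        Location("jupiter.isi.edu", "GridFTP",
                 "GridFTP://jupiter.isi.edu/nfs/v6/climate",
                 {"Mar 1998", "Jun 1998", "Oct 1998"}),
        Location("sprite.llnl.gov", "ftp",
                 "ftp://sprite.llnl.gov/pub/pcmdi",
                 {"Jan 1998", "Dec 1998"}),
    ],
)

if __name__ == "__main__":
    print(collection.physical_urls("Mar 1998"))
```

A lookup for a filename that no location holds returns an empty list, which matches the point that a logical file may have partial replicas across locations.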

14 Outstanding Issues
- What write consistency should we support?
- Methodology for handling updates
- Access control
- Intermediate feedback required (callbacks)
- Timing
- Replicating the replica catalog
- Replication of partial files
- Alternate catalog views: files belong to more than one logical collection

15 Status
- GridFTP and Replica Catalog API and tools in alpha test
- Applications with climate data, intended for production use
- Replica Management API under design
- Grid-based access control strategy under design

16 Globus Data-Intensive Services Architecture
[Figure: layered architecture diagram. Legend: Program, Library, Released In Alpha.]
Programs: globus-url-copy, Replica Programs, Custom Servers, Custom Clients
Libraries: globus_replica_manager, globus_replica_catalog, globus_gass_copy, globus_gass_transfer, globus_gass, globus_ftp_client, globus_ftp_control, globus_io, globus_common, GSI (security), OpenLDAP client

17 The End

