Globus DataGrid Overview Bill Allcock, ANL GridPP Meeting 30 June 2003.

1 Globus DataGrid Overview Bill Allcock, ANL GridPP Meeting 30 June 2003

2 Sources of Information / Support
• Me
  – Definitive source of information about GridFTP
  – Responsible for requirements gathering, feature prioritization, getting developer resources, directing the development work, etc.
• Mailing list
  – Extensive archive that is worth searching
  – GridFTP developers monitor it and are good about answering, but are not required to.
• Bugzilla
  – Used for submitting bugs

3 GridFTP Feature Set
  – GSI, Kerberos security
  – Third-party transfers
  – Parameter set/negotiate
  – Partial file access
  – Reliability/restart
  – Large file support
  – Data channel reuse
  – De facto standard on the Grid
  – Integrated instrumentation
  – Logging/audit trail
  – Parallel transfers
  – Striping
  – TCP buffer size control
  – Policy-based access control
  – Server-side computation
  – Based on standards

4 GridFTP at SC2000: Long-Running Dallas-Chicago Transfer (chart annotations: SciNet power failure; other demos starting up (congestion); parallelism increases (demos); backbone problems on the SC floor; DNS problems; transition between files (not zero due to averaging))

5 Reliable File Transfer
• Note that I said any *REMOTE* resource can fail
• Local failure would mean loss of state, since it is held in the client's memory.
• Could modify the restart plug-in to write state to disk.
• We opted for a service that accepts data transfer jobs and uses a database.
• This provides increased robustness AND allows a client to initiate a long-running job without tying up the local computer to keep it running.
• We call this server the Reliable File Transfer (RFT) service
• One test ran 54 hours, moved 0.3 TB, and survived multiple failures, both natural and intentional
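The idea of keeping transfer state in a database rather than in client memory can be sketched as follows. This is only an illustration, not the RFT implementation: the table layout, the function names, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open (or create) the job table. Pass a file path so state survives a crash."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, "
        "bytes_done INTEGER DEFAULT 0, status TEXT DEFAULT 'pending')"
    )
    db.commit()
    return db

def submit(db, src, dst):
    """Record a transfer job and return its id; the client may now disconnect."""
    cur = db.execute("INSERT INTO jobs (src, dst) VALUES (?, ?)", (src, dst))
    db.commit()
    return cur.lastrowid

def checkpoint(db, job_id, bytes_done):
    """Persist a restart marker (hypothetical: a simple byte offset)."""
    db.execute(
        "UPDATE jobs SET bytes_done = ?, status = 'active' WHERE id = ?",
        (bytes_done, job_id),
    )
    db.commit()

def finish(db, job_id):
    """Mark a job as completed."""
    db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    db.commit()

def resumable(db):
    """Jobs to pick up after a restart, with the offset to resume from."""
    return db.execute(
        "SELECT id, src, dst, bytes_done FROM jobs "
        "WHERE status != 'done' ORDER BY id"
    ).fetchall()
```

After a crash, a service built this way would call `resumable()` on startup and continue each job from its last checkpoint instead of from byte zero.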

6 GridFTP: Standards Based
• Existing standards
  – RFC 959: File Transfer Protocol
  – RFC 2228: FTP Security Extensions
  – RFC 2389: Feature Negotiation for the File Transfer Protocol
  – Draft: FTP Extensions
• New drafts
  – GridFTP: Protocol Extensions to FTP for the Grid
    > Grid Forum GridFTP Working Group
    > Submitted for public comment
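Feature negotiation (RFC 2389) works by the client sending `FEAT` and reading a multi-line `211` reply in which each feature line starts with a single space. A minimal sketch of parsing such a reply is shown here; the function name `parse_feat` and the sample reply are assumptions for illustration, not part of any Globus client.

```python
def parse_feat(reply):
    """Parse an RFC 2389 FEAT reply into {feature name: parameter string}."""
    lines = reply.splitlines()
    # A single-line 211 reply means the server lists no extensions.
    if len(lines) == 1 and lines[0].startswith("211 "):
        return {}
    if (len(lines) < 2
            or not lines[0].startswith("211-")
            or not lines[-1].startswith("211 ")):
        raise ValueError("not a 211 FEAT reply")
    features = {}
    for line in lines[1:-1]:
        # Each feature line begins with exactly one space.
        if not line.startswith(" "):
            raise ValueError("malformed feature line: %r" % line)
        name, _, params = line[1:].partition(" ")
        features[name.upper()] = params
    return features
```

A client could then check, for example, `"REST" in parse_feat(reply)` before attempting a restartable transfer.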

7 GridFTP: Future Work
• New Server Beta in August (wuftp replacement) w/ transport and security
• Striping functionality and HPSS released in Q1/Q2 2004 with HPSS 5.2b and logging.
• Other features based on demand.
• Improved testing and documentation
• Inclusion of protocol extensions from GGF
• Interface in server for policy engine, e.g., allocate one stripe per 100 MB of file size
• New web services control channel protocol
• Utilization of non-TCP network protocols
• Bandwidth limiting
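The "one stripe per 100 MB" policy mentioned above amounts to a small calculation. The sketch below is hypothetical: rounding up, the minimum of one stripe, the optional cap, and the choice of 1 MB = 2**20 bytes are assumptions, not anything specified by the server's policy interface.

```python
MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes

def stripes_for(file_size_bytes, bytes_per_stripe=100 * MB, max_stripes=None):
    """Return a stripe count: one stripe per started 100 MB, at least one."""
    # Integer ceiling division avoids floating-point rounding on large files.
    stripes = max(1, (file_size_bytes + bytes_per_stripe - 1) // bytes_per_stripe)
    if max_stripes is not None:
        # Hypothetical cap, e.g. the number of stripe nodes available.
        stripes = min(stripes, max_stripes)
    return stripes
```

Under these assumptions a 250 MB file would be given three stripes, and a cap of two would reduce that to two.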

8 Basic Layout of GridFTP for HPSS

9 eXtensible IO Library (xio)
• Abstract away the transport layer
• Define standard function signatures for Read/Write/Open/Close
• Two types of drivers: transport and transform
• Transport has to be the first pushed on the stack
• Can have an arbitrary number of transform drivers
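The driver-stack idea can be illustrated with a toy model. This is not the Globus XIO C API: the classes `Stack`, `MemoryTransport`, and `XorTransform` are invented for this sketch, and the XOR "transform" merely stands in for real work such as encryption.

```python
class MemoryTransport:
    """Stand-in transport driver: sits at the bottom and holds the bytes."""
    kind = "transport"

    def __init__(self):
        self.buffer = bytearray()
        self.is_open = False

    def open(self):
        self.is_open = True

    def write(self, data):
        self.buffer.extend(data)

    def read(self):
        data = bytes(self.buffer)
        self.buffer.clear()
        return data

    def close(self):
        self.is_open = False


class XorTransform:
    """Toy transform driver: alters the data, then calls the driver below."""
    kind = "transform"

    def __init__(self, below, key):
        self.below = below
        self.key = key

    def open(self):
        self.below.open()

    def write(self, data):
        self.below.write(bytes(b ^ self.key for b in data))

    def read(self):
        return bytes(b ^ self.key for b in self.below.read())

    def close(self):
        self.below.close()


class Stack:
    """Builds a driver stack; the transport must be pushed first."""

    def __init__(self):
        self.top = None

    def push(self, driver_cls, **attrs):
        if self.top is None:
            if driver_cls.kind != "transport":
                raise ValueError("a transport driver must be pushed first")
            self.top = driver_cls(**attrs)
        else:
            if driver_cls.kind != "transform":
                raise ValueError("only transform drivers go above the transport")
            self.top = driver_cls(self.top, **attrs)
        return self
```

Because every driver exposes the same four calls, the application only ever talks to `stack.top`, and swapping one transform for another does not change the calling code.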

10 Transform Driver Example (gsi)
• Open does the authentication and, if specified via an attribute, delegation.
• Read/Write could be a simple pass-through or, if requested, might do encryption or integrity checking.
• Close in this case is a no-op.
• Kerberos *should* be easier. Simply pop gsi and push kerberos.

11 Planned xio drivers
• Basics: TCP, UDP, file, gsi
• GridFTP: Make it simple for an application to access files under the control of a GridFTP server.
  – Note that xio drivers can call xio drivers: The GridFTP driver will call sockets which will call TCP
• MultiStream Data Channel Protocol
• HTTP
• SABUL
• Rate Limiting

12 Transport Stack in Globus (diagram layers: Reliable File Transfer Service; New GridFTP Server; Extensible IO System (under all of Globus)). Client / User App can poke down the stack as necessary

13 Replica Management

14 Replica Catalog Structure: A Climate Modeling Example (diagram contents)
• Replica Catalog
  – Logical Collection: CO2 measurements 1998
    > Filename: Jan 1998, Filename: Feb 1998, …
    > Location: Filename: Mar 1998, Filename: Jun 1998, Filename: Oct 1998; Protocol: gsiftp; UrlConstructor: gsi nfs/v6/climate
    > Location: Filename: Jan 1998 … Filename: Dec 1998; Protocol: ftp; UrlConstructor: pub/pcmdi
    > Logical File Parent
    > Logical File Jan 1998
    > Logical File Feb 1998 (Size: 1468762)
  – Logical Collection: CO2 measurements 1999

15 A Replica Location Service
• A Replica Location Service (RLS) is a distributed registry service that records the locations of data copies and allows discovery of replicas
• Maintains mappings between logical identifiers and target names
  – Physical targets: Map to exact locations of replicated data
  – Logical targets: Map to another layer of logical names, allowing storage systems to move data without informing the RLS
• RLS was designed and implemented in a collaboration between the Globus project and the DataGrid project
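At its core, the registry described above is a many-to-many mapping from logical names to target names. A minimal in-memory sketch is given here; the class and method names are hypothetical and do not correspond to the RLS client API.

```python
from collections import defaultdict

class LocalReplicaCatalog:
    """Toy catalog: maps each logical name to the set of its target names."""

    def __init__(self):
        self._targets = defaultdict(set)

    def add(self, logical, target):
        """Register one replica of a logical name."""
        self._targets[logical].add(target)

    def remove(self, logical, target):
        """Unregister a replica; drop the logical name once no replicas remain."""
        self._targets[logical].discard(target)
        if not self._targets[logical]:
            del self._targets[logical]

    def lookup(self, logical):
        """Return all known targets for a logical name (empty list if none)."""
        return sorted(self._targets.get(logical, ()))

    def logical_names(self):
        """Return every logical name currently registered."""
        return sorted(self._targets)
```

A target here is just a string, so it can be a physical URL or another logical name, which matches the two kinds of targets listed on the slide.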

16 Local Replica Catalogs (LRCs) and Replica Location Indexes (RLIs) (diagram)
• LRCs contain consistent information about logical-to-target mappings on a site
• RLI nodes aggregate information about LRCs
• Soft state updates from LRCs to RLIs: relaxed consistency of index information, used to rebuild index after failures
• Arbitrary levels of RLI hierarchy

17 A Flexible RLS Framework
Five elements:
1. Consistent Local State: Records mappings between logical names and target names and answers queries
2. Global State with relaxed consistency: Global index supports discovery of replicas at multiple sites
3. Soft state mechanisms for maintaining global state: LRCs send information about their mappings (state) to RLIs using soft state protocols
4. Compression of state updates (optional): reduce communication and storage overheads
5. Membership service: for location of participating LRCs and RLIs and dealing with changes in membership
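Soft state means the index keeps an LRC's information only as long as that LRC keeps refreshing it. Stale entries time out, and a restarted index is rebuilt from the next round of updates. The sketch below assumes a simple per-LRC timeout and passes the current time in explicitly; the class name and its methods are illustrative, not the RLS protocol.

```python
class ReplicaLocationIndex:
    """Toy RLI: remembers which LRCs hold a logical name until the state expires."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._state = {}  # lrc_id -> (frozenset of logical names, time received)

    def update(self, lrc_id, logical_names, now):
        """Accept a full soft state update, replacing what the LRC sent before."""
        self._state[lrc_id] = (frozenset(logical_names), now)

    def expire(self, now):
        """Discard state from LRCs that have not refreshed within the timeout."""
        stale = [lrc for lrc, (_, received) in self._state.items()
                 if now - received > self.timeout]
        for lrc in stale:
            del self._state[lrc]

    def query(self, logical, now):
        """Return the LRCs that (recently) claimed to hold this logical name."""
        self.expire(now)
        return sorted(lrc for lrc, (names, _) in self._state.items()
                      if logical in names)
```

An answer from such an index is only a hint: the client still has to ask the named LRC, which holds the consistent local state, for the actual targets.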

18 An RLS with No Redundancy, Partitioning of Index by Storage Sites (diagram: Local Replica Catalogs (LRCs) and Replica Location Indexes (RLIs))

19 An RLS with Redundancy

20 Replica Location Service In Context
• The Replica Location Service is one component in a layered data management architecture
• Provides a simple, distributed registry of mappings
• Consistency management provided by higher-level services

21 Components of RLS Implementation
• Front-End Server
  – Multi-threaded
  – Supports GSI Authentication
  – Common implementation for LRC and RLI
• Back-end Server
  – MySQL Relational Database
  – Holds logical name to target name mappings
• Client APIs: C and Java

22 Implementation Features
• Two types of soft state updates from LRCs to RLIs
  – Complete list of logical names registered in LRC
  – Bloom filter summaries of LRC
• User-defined attributes
  – May be associated with logical or target names
• Partitioning
  – Divide LRC soft state updates among RLI index nodes using pattern matching of logical names
• Membership service
  – Static configuration only
  – Eventually use OGSA registration techniques
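A Bloom filter summary lets an LRC send a fixed-size bit array instead of its complete list of logical names. The price is occasional false positives, but never false negatives. The following is a generic Bloom filter sketch, not the RLS implementation: the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions chosen for the example.

```python
import hashlib

class BloomFilter:
    """Generic Bloom filter over strings, using a Python int as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, name):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, name)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        """Set the bits for one logical name."""
        for pos in self._positions(name):
            self.bits |= 1 << pos

    def __contains__(self, name):
        """True if the name may be present; False means definitely absent."""
        return all((self.bits >> pos) & 1 for pos in self._positions(name))
```

In this scheme an LRC would add every registered logical name and ship only `bits` to the index. A query that hits the filter still needs confirmation from the LRC, since a hit can be a false positive.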
