Protocols and Services for Distributed Data-Intensive Science. Bill Allcock, ANL. ACAT Conference, 19 Oct 2000, Fermi National Accelerator Laboratory.

Protocols and Services for Distributed Data-Intensive Science
Bill Allcock, ANL
ACAT Conference, 19 Oct 2000, Fermi National Accelerator Laboratory
Contributors: Ian Foster, Carl Kesselman, Steve Tuecke, Ann Chervenak

The Globus Data Grid
Two major components:
1. Data Transport and Access
   • Common protocol
   • Secure, efficient, flexible, extensible data movement
   • Family of tools supporting this protocol
2. Replica Management Architecture
   • Simple scheme for managing:
     – multiple copies of files
     – collections of files
APIs, white papers:

GridFTP Data Transport and Access

GridFTP: Basic Approach
• FTP is defined by several IETF RFCs
• Start with most commonly used subset
  – Standard FTP: get/put etc., 3rd-party transfer
• Implement standard but often unused features
  – GSS binding, extended directory listing, simple restart
• Extend in various ways, while preserving interoperability with existing servers
  – Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic & manual TCP buffer setting, progress monitoring, extended restart
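To make the "parallel transfers", "partial file transfers", and "TCP buffer setting" items concrete, here is a minimal Python sketch. It is not the GridFTP wire protocol: the function names `split_ranges` and `tcp_buffer_bytes` are hypothetical, the even static partitioning is only one possible way to divide a file among streams, and the buffer estimate uses the standard bandwidth-delay product rule of thumb rather than anything stated on the slide.

```python
def split_ranges(size, streams):
    """Divide the byte range [0, size) into contiguous (offset, length) pieces.

    Each piece could be requested as a partial file transfer on its own
    TCP stream. Hypothetical helper, not part of any Globus library.
    """
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def tcp_buffer_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic.

    A common rule of thumb for sizing a TCP buffer on a long, fast path.
    """
    return bandwidth_bits_per_s * rtt_ms // 8000
```

For example, `split_ranges(10, 3)` yields three ranges that together cover all ten bytes, and `tcp_buffer_bytes(100_000_000, 50)` estimates the buffer for a 100 Mbit/s path with a 50 ms round-trip time.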

The GridFTP Family of Tools
• Patches to existing FTP code
  – GSI-enabled versions of existing FTP client and server, for high-quality production code
• Custom-developed libraries
  – Implement full GridFTP protocol, targeting custom use, high-performance
• Custom-developed tools
  – Servers and clients with specialized functionality and performance

Family of Tools: Patches to Existing Code
• Patches to standard FTP clients and servers
  – gsi-ncftp: Widely used client
  – gsi-wuftpd: Widely used server
  – GSI modified HPSS pftpd
  – GSI modified Unitree ftpd
• Provide high-quality, production-ready FTP clients and servers
• Integration with common mass storage systems
• Do not support the full GridFTP protocol

Family of Tools: Custom Developed Libraries
• Custom developed libraries
  – globus_ftp_control: Low level FTP driver
    > Client & server protocol and connection management
  – globus_ftp_client: Simple, reliable FTP client
  – globus_gass_copy: Simple URL-to-URL copy library, supporting (Grid-)ftp, http(s), file URLs
• Implement full GridFTP protocol
• Various levels of libraries, allowing implementation of custom clients and servers
• Tuned for high performance on WAN

Family of Tools: Custom Developed Programs
• Simple production client
  – globus-url-copy: Simple URL-to-URL copy
• Experimental FTP servers
  – Modified WUFTPD with parallel channels
  – Striped FTP server (à la DPSS)
  – Firewall FTP proxy: Securely and efficiently allow transfers through firewalls

Replica Management Architecture

Replica Management
• Maintain a mapping between logical names for files and collections and one or more physical locations
• We define a replica to be a “managed copy of a file”.
  – The replica management system controls where and when copies are created, and provides information about where copies are located. However, the system makes no statements about file consistency. In other words, copies can get out of date with respect to one another if a user chooses to modify a copy.
• Based on the LDAP protocol
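The logical-to-physical mapping described above can be illustrated with a small in-memory Python sketch. This is only an illustration under stated assumptions: the real catalog is LDAP-based, and the class `ReplicaCatalog`, its method names, and the example.org URLs used in the note below are all hypothetical.

```python
class ReplicaCatalog:
    """Toy mapping from a logical file name to its physical locations.

    It records where copies are, but (as on the slide) makes no claim
    that the copies are consistent with one another.
    """

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, physical_url):
        """Record that a managed copy of logical_name exists at physical_url."""
        urls = self._locations.setdefault(logical_name, [])
        if physical_url not in urls:
            urls.append(physical_url)

    def unregister(self, logical_name, physical_url):
        """Forget one copy; the logical name disappears with its last copy."""
        urls = self._locations.get(logical_name, [])
        if physical_url in urls:
            urls.remove(physical_url)
        if not urls:
            self._locations.pop(logical_name, None)

    def lookup(self, logical_name):
        """Return a list of all known physical locations (possibly empty)."""
        return list(self._locations.get(logical_name, []))
```

A caller might register `ftp://a.example.org/data/jan1998` and `ftp://b.example.org/data/jan1998` under the logical name `jan1998`, then call `lookup("jan1998")` to obtain both locations.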

A Model Architecture for Data Grids
[Figure: architecture diagram. Labels: Application; Attribute Specification; Metadata Catalog; Logical Collection and Logical File Name; Replica Catalog; Multiple Locations; Replica Selection; Performance Information and Predictions (MDS, NWS); Selected Replica; Replica Location 1, Replica Location 2, Replica Location 3 (Tape Library, Disk Cache, Disk Array, Disk Cache); Reliable Transport; Reliable Replication]
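The Replica Selection step in this architecture picks one of several locations using performance information and predictions. A minimal Python sketch of that idea follows. It assumes the predictions are already available as a plain dictionary of estimated throughput per URL; in the architecture they would come from services such as NWS and MDS, whose interfaces are not modeled here. The function name `select_replica` and the "highest predicted throughput wins" policy are assumptions for illustration.

```python
def select_replica(replica_urls, predicted_mbps):
    """Return the replica URL with the highest predicted throughput.

    replica_urls: candidate physical locations from the replica catalog.
    predicted_mbps: dict mapping URL -> predicted throughput (assumed input).
    Replicas with no prediction are treated as 0 Mbit/s.
    """
    if not replica_urls:
        raise ValueError("no replicas to choose from")
    return max(replica_urls, key=lambda url: predicted_mbps.get(url, 0.0))
```

Other policies (lowest latency, least-loaded storage system) would fit the same shape by changing the key function.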

Replica Manager Components
• Replica catalog definition
  – LDAP object classes for representing logical-to-physical mappings in an LDAP catalog
• Low-level replica catalog API
  – globus_replica_catalog library
  – Manipulates replica catalog: add, delete, etc.
  – URL:
• High-level reliable replication API
  – globus_replica_manager library
  – Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.
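The high-level API is described as combining file transfer operations with low-level catalog calls. A rough Python sketch of one way such a combination could be ordered is shown below. It does not reproduce the globus_replica_manager interface: the function `replicate` and its `transfer`, `register`, and `remove` callables are hypothetical stand-ins supplied by the caller.

```python
def replicate(logical_name, source_url, dest_url, transfer, register, remove):
    """Create a new replica so that the catalog never lists a missing copy.

    transfer(source_url, dest_url): copies the file, raising on failure.
    register(logical_name, dest_url): adds the new location to the catalog.
    remove(dest_url): deletes a copy that could not be registered.
    """
    # Step 1: move the data. If this raises, the catalog is untouched.
    transfer(source_url, dest_url)
    try:
        # Step 2: only advertise the copy once it actually exists.
        register(logical_name, dest_url)
    except Exception:
        # Step 3: undo the copy so no unmanaged file is left behind.
        remove(dest_url)
        raise
```

Ordering the transfer before the catalog update means a failure at either step leaves the catalog describing only copies that exist.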

Replica Catalog Structure: A Climate Modeling Example
[Figure: catalog tree, reconstructed from the slide labels]
Replica Catalog
  Logical Collection: CO2 measurements 1998
    Filename: Jan 1998
    Filename: Feb 1998
    …
    Location: jupiter.isi.edu
      Filename: Mar 1998
      Filename: Jun 1998
      Filename: Oct 1998
      Protocol: GridFTP
      UrlConstructor: GridFTP://jupiter.isi.edu/nfs/v6/climate
    Location: sprite.llnl.gov
      Filename: Jan 1998
      …
      Filename: Dec 1998
      Protocol: ftp
      UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
    Logical File Parent
      Logical File: Jan 1998
        Size:
      Logical File: Feb 1998
  Logical Collection: CO2 measurements 1999
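As a sketch of how the location entries in this example could be used, the Python below turns a logical file name into physical URLs. The host names, protocols, UrlConstructor values, and file names are taken from the slide; only the file names actually shown are listed, since the slide elides the rest. The rule for combining a UrlConstructor with a file name (join with "/" and percent-encode the name) is an assumption, as is the function name `physical_urls`.

```python
from urllib.parse import quote

# Location entries as shown on the slide; elided file names are not filled in.
LOCATIONS = {
    "jupiter.isi.edu": {
        "protocol": "GridFTP",
        "url_constructor": "GridFTP://jupiter.isi.edu/nfs/v6/climate",
        "filenames": ["Mar 1998", "Jun 1998", "Oct 1998"],
    },
    "sprite.llnl.gov": {
        "protocol": "ftp",
        "url_constructor": "ftp://sprite.llnl.gov/pub/pcmdi",
        "filenames": ["Jan 1998", "Dec 1998"],
    },
}


def physical_urls(filename, locations=LOCATIONS):
    """Return one URL per location that holds a copy of `filename`.

    Assumed rule: UrlConstructor + "/" + percent-encoded file name.
    """
    urls = []
    for entry in locations.values():
        if filename in entry["filenames"]:
            urls.append(entry["url_constructor"] + "/" + quote(filename))
    return urls
```

A file listed at no location, such as one of the elided names, simply yields an empty list.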

Outstanding Issues
• What write consistency should we support?
• Methodology for handling updates
• Access control
• Intermediate feedback required (callbacks)
• Timing
• Replicating the replica catalog
• Replication of partial files
• Alternate catalog views: files belong to more than one logical collection

Status
• GridFTP and Replica Catalog API and tools in alpha test
• Applications with climate data, intended for production use
• Replica Management API under design
• Grid-based access control strategy under design

Globus Data-Intensive Services Architecture
[Figure: layered diagram of programs and libraries. Legend: Program; Library; Released in Alpha. Programs: globus-url-copy, Replica Programs, Custom Servers, Custom Clients. Libraries: globus_replica_manager, globus_replica_catalog, globus_gass_copy, globus_ftp_client, globus_ftp_control, globus_gass_transfer, globus_gass, globus_io, globus_common, GSI (security), OpenLDAP client]

The End