Globus Data Storage Interface (DSI) - Enabling Easy Access to Grid Datasets Raj Kettimuthu, ANL and U. Chicago DIALOGUE Workshop August 2, 2005.


What is GridFTP?
- In Grid environments, access to distributed data is very important.
- Distributed scientific and engineering applications require:
  - Transfers of large amounts of data between storage systems, and
  - Access to large amounts of data by many geographically distributed applications and users for analysis, visualization, etc.
- GridFTP: a secure, robust, efficient, standards-based data transfer protocol
- Features
  - Standard FTP get/put, etc.
  - Third-party control of data transfer

What is GridFTP? (continued)
- Features (continued)
  - Grid Security Infrastructure and Kerberos support
  - Parallel data transfer (multiple transport streams between 2 network endpoints)
  - Striped data transfer (1 or more transport streams between m network endpoints on the sending side and n network endpoints on the receiving side)
  - Partial file transfer
  - Manual/automatic control of TCP buffer sizes
  - Support for reliable and restartable data transfer
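
To make the striped transfer feature above concrete, here is a minimal sketch of one possible block layout: fixed-size blocks assigned to stripes round-robin. The slides do not say how GridFTP actually distributes blocks across endpoints, so the layout itself, the function name `stripe_for_offset`, and its parameters are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any Globus API.
 * Returns the index of the stripe (0 .. n_stripes-1) that owns the block
 * containing byte `offset`, assuming blocks of `block_size` bytes are
 * handed out to stripes in round-robin order. */
size_t stripe_for_offset(size_t offset, size_t block_size, size_t n_stripes)
{
    assert(block_size > 0 && n_stripes > 0);
    return (offset / block_size) % n_stripes;
}
```

With 1024-byte blocks and 4 stripes, bytes 0 to 1023 would belong to stripe 0, bytes 1024 to 2047 to stripe 1, and byte 4096 would wrap around to stripe 0 again.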

New GT4 GridFTP Implementation
- NOT based on wu-ftpd
- Striping support has been added
- IPv6 support is included (EPRT, EPSV)
- Extremely modular, to allow integration with a variety of data sources (files, mass stores, etc.)
- Based on the Globus eXtensible Input/Output System (XIO)
  - Simple Open/Close/Read/Write (OCRW) API for byte-stream IO

New Server Architecture
- GridFTP (and normal FTP) use (at least) two separate socket connections:
  - A control channel for carrying the commands and responses
  - A data channel for actually moving the data
- GridFTP (and normal FTP) have 3 distinct components:
  - Client and server protocol interpreters, which handle the control channel protocol
  - A Data Transfer Process, which handles access to the actual data and its movement via the data channel

New Server Architecture (continued)
- The Protocol Interpreter and the Data Transfer Process can (optionally) be completely separate processes.
- A single protocol interpreter can have multiple data transfer processes behind it.
  - This is how a striped server works.
- Architecturally, the Data Transfer Process consists of 3 distinct pieces:
  - Protocol handler, Data Storage Interface, and data processing module

New Server Architecture (continued)
- The protocol handler: talks to the network and understands the data channel protocol.
- The Data Storage Interface (DSI): provides an interface to data sources and sinks.
- The data processing module: provides the ability to manipulate the data prior to transmission.
  - Currently handled via the DSI
  - In future we plan to make this a separate module

The Data Storage Interface (DSI)
- A number of storage systems are in use by the scientific and engineering community:
  - Distributed Parallel Storage System (DPSS)
  - High Performance Storage System (HPSS)
  - Distributed File System (DFS)
  - Storage Resource Broker (SRB)
  - HDF5
- These systems use incompatible protocols for accessing data and require the use of their own clients.

The Data Storage Interface (DSI) (continued)
- The DSI provides a modular, pluggable interface to data storage systems.
- Conceptually, the DSI is very simple.
- A DSI consists of several function signatures and a set of semantics.
- When a new DSI is created, the programmer implements the functions to provide the semantics associated with them.

The Data Storage Interface (DSI) (continued)
- The DSI author is not expected to know the intimate details involved in a GridFTP transfer.
- A set of API functions is provided that allows the DSI to interact with the server itself.
- This API provides functions for reading and writing data to and from the network.

The Data Storage Interface (DSI) (continued)
- A DSI could be given significant functionality, such as caching, proxy, backend allocation, etc.
- DSIs can be loaded and switched at runtime.
- When the GridFTP server requires action from the storage system (be it data, meta-data, directory creation, etc.), it passes a request to the loaded DSI module.
- The DSI then services that request and notifies the server when it is finished.
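
The pluggable, runtime-switchable design described above can be sketched as a table of function pointers that the server dispatches through. This is a toy model only: `toy_dsi_t`, the two backends, and the `toy_server_*` functions are hypothetical names invented for illustration, and the single `stat_func` signature here is an assumption, since the slides do not show the real function signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a DSI: a named table of function pointers.
 * Hypothetical; the real table is globus_gfs_storage_iface_t. */
typedef struct {
    const char *name;
    /* Returns 0 on success and stores the object's size in *size_out. */
    int (*stat_func)(const char *path, long *size_out);
} toy_dsi_t;

/* Backend 1: pretends every path is a 100-byte file. */
int file_like_stat(const char *path, long *size_out)
{
    (void)path;
    *size_out = 100;
    return 0;
}

/* Backend 2: pretends to know only paths under /archive/. */
int archive_like_stat(const char *path, long *size_out)
{
    if (strncmp(path, "/archive/", 9) != 0)
        return -1;
    *size_out = 4096;
    return 0;
}

const toy_dsi_t toy_file_dsi    = { "file",    file_like_stat };
const toy_dsi_t toy_archive_dsi = { "archive", archive_like_stat };

/* The "server" keeps a pointer to whichever DSI is currently loaded. */
const toy_dsi_t *loaded_dsi = &toy_file_dsi;

/* Switch the storage backend at runtime. */
void toy_server_load(const toy_dsi_t *dsi)
{
    assert(dsi != NULL && dsi->stat_func != NULL);
    loaded_dsi = dsi;
}

/* The server never touches storage itself; it forwards to the DSI. */
int toy_server_stat(const char *path, long *size_out)
{
    return loaded_dsi->stat_func(path, size_out);
}
```

Calling `toy_server_load(&toy_archive_dsi)` changes how every later `toy_server_stat` call is serviced without any change to the server code, which is the property the slides attribute to DSIs.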

Developer Implemented Functions

    typedef struct globus_gfs_storage_iface_s
    {
        int                                 descriptor;

        /* session initiating functions */
        globus_gfs_storage_init_t           init_func;
        globus_gfs_storage_destroy_t        destroy_func;

        /* transfer functions */
        globus_gfs_storage_transfer_t       list_func;
        globus_gfs_storage_transfer_t       send_func;
        globus_gfs_storage_transfer_t       recv_func;
        globus_gfs_storage_trev_t           trev_func;

        /* data conn funcs */
        globus_gfs_storage_data_t           active_func;
        globus_gfs_storage_data_t           passive_func;
        globus_gfs_storage_data_destroy_t   data_destroy_func;

        globus_gfs_storage_command_t        command_func;
        globus_gfs_storage_stat_t           stat_func;

        globus_gfs_storage_set_cred_t       set_cred_func;
        globus_gfs_storage_buffer_send_t    buffer_send_func;
    } globus_gfs_storage_iface_t;

Striped Data Transfer
[Diagram. Labeled components: FTP Client; Protocol Interpreter with Master DSI; IPC Link; IPC Receiver with Slave DSI; Data Channel.]

Master and Slave DSI
- If you wish to support striping, you will need two DSIs.
- The master DSI will be in the control process, or front end.
  - Usually, this is relatively trivial and involves minor processing and then "passing" the command over the IPC channel to the slave DSI.
- The slave DSI does the real work. It typically implements the following functions:
  - send_func: This function is used to send data from the DSI to the server (get or RETR).

Master and Slave DSI (continued)
- Slave DSI functions (continued):
  - recv_func: This function is used to receive data from the server (put or STOR).
  - stat_func: This function performs a Unix stat, i.e. it returns file info. Used by the list function.
  - command_func: This function handles simple (succeed/fail or single-line response) file system operations such as mkdir, site chmod, etc.
- The master should implement all functions. Besides the above functions, it implements:
  - active_func: This is for when the DSI will be doing a TCP connect. The master figures out who gets what IP/port info and then passes it through.

Master and Slave DSI (continued)
- Master DSI functions (continued):
  - passive_func: The counterpart to active_func, for when the DSI will be the listener.
  - list_func: This should be passed through and will handle LIST, NLST, MLST, etc.
- There are also some utility functions the master should implement:
  - trev_func: This handles the restart and performance markers, but should be a simple pass-through.

IPC Calls
- These calls are how the master DSI "passes" the call to the slave DSI.
- These calls implement an internal protocol to transfer the necessary structures between the front end and the back end.
- The IPC receiver receives the message and then invokes the appropriate DSI call.
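
The master-to-slave hand-off described above can be sketched as follows. The message layout, the operation codes, and every `toy_*` name are hypothetical; the slides describe the internal IPC protocol only in outline, and this sketch passes the message in memory rather than over a real connection.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical operation codes carried in the message. */
enum { TOY_OP_SEND = 1, TOY_OP_RECV = 2 };

/* Hypothetical message layout: an operation code plus a path. */
typedef struct {
    int  op;
    char path[64];
} toy_ipc_msg_t;

/* Records what the slave DSI was asked to do, so the flow is observable. */
char toy_slave_log[128];

void toy_slave_send(const char *path)
{
    snprintf(toy_slave_log, sizeof toy_slave_log, "send %s", path);
}

void toy_slave_recv(const char *path)
{
    snprintf(toy_slave_log, sizeof toy_slave_log, "recv %s", path);
}

/* Front end: the master DSI does minimal processing, then packs the
 * call into a message for the back end. */
toy_ipc_msg_t toy_master_pass(int op, const char *path)
{
    toy_ipc_msg_t msg;
    assert(path != NULL);
    memset(&msg, 0, sizeof msg);
    msg.op = op;
    /* The buffer was zeroed above, so the copy stays NUL-terminated. */
    strncpy(msg.path, path, sizeof msg.path - 1);
    return msg;
}

/* Back end: the IPC receiver decodes the message and invokes the
 * matching slave DSI call. Returns 0 on success, -1 for an unknown op. */
int toy_ipc_receive(const toy_ipc_msg_t *msg)
{
    switch (msg->op) {
    case TOY_OP_SEND: toy_slave_send(msg->path); return 0;
    case TOY_OP_RECV: toy_slave_recv(msg->path); return 0;
    default:          return -1;
    }
}
```

The point of the split is that the master never performs storage I/O: it only forwards the request, and the slave DSI selected by the receiver does the real work.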

Helper Functions that should be used
- When implementing the DSI functions, the following helper functions should be called:
  - _finished: This tells the server that a specific function (such as recv) has completed.
  - register_[read|write]: This is how file data is transferred between the DSI and the server.
  - bytes_written: This should be called any time the DSI successfully completes a write to its own storage system. This allows performance and restart markers to be generated.

Helper Functions that should be used (continued)
  - get_blocksize: This indicates the buffer size that you should exchange with the server via register_[read|write].
  - get_[read|write]_range: This tells the DSI which data it should be sending. This handles striping (this DSI only needs to send a portion of the file) and partial files.
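
As a rough sketch of how these helpers fit together inside a send_func-style routine: ask for the range, ask for the block size, hand the data to the server one block at a time, then report completion. Everything below is a toy model with hypothetical names and signatures (`toy_transfer_t`, `toy_register_write`, `toy_finished`, `toy_send`); the slides give only abbreviated helper names, and this sketch is synchronous purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the per-transfer state the server would hold. */
typedef struct {
    size_t blocksize;     /* what a get_blocksize-style helper reports   */
    size_t range_offset;  /* what a get_read_range-style helper reports  */
    size_t range_length;
    char   network[256];  /* bytes handed to the server so far           */
    size_t network_len;
    int    finished;      /* set once the DSI reports completion         */
} toy_transfer_t;

/* Stand-in for register_write: give one buffer to the server. */
void toy_register_write(toy_transfer_t *op, const char *buf, size_t len)
{
    assert(op->network_len + len <= sizeof op->network);
    memcpy(op->network + op->network_len, buf, len);
    op->network_len += len;
}

/* Stand-in for the _finished helper. */
void toy_finished(toy_transfer_t *op)
{
    op->finished = 1;
}

/* Send only the requested range of `storage`, in blocks no larger
 * than the advertised block size, then signal completion. */
void toy_send(toy_transfer_t *op, const char *storage, size_t storage_len)
{
    size_t offset = op->range_offset;
    size_t end    = op->range_offset + op->range_length;

    assert(op->blocksize > 0);
    if (end > storage_len)
        end = storage_len;          /* clamp the range to the data we have */

    while (offset < end) {
        size_t n = end - offset;
        if (n > op->blocksize)
            n = op->blocksize;
        toy_register_write(op, storage + offset, n);
        offset += n;
    }
    toy_finished(op);
}
```

Because the range comes from the server rather than from the DSI, the same loop serves whole-file, partial-file, and striped transfers; only the offset and length it is given change.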

Existing DSIs
- DSIs exist for:
  - File systems accessible via the standard POSIX API
  - Storage Resource Broker (SRB)
  - High Performance Storage System (HPSS)
  - NeST from the Condor team

Summary
- DSIs confer benefits on both the keepers of large datasets and the users of these datasets.
- Dataset providers would gain a broader user base, because their data would be available to any client.
- Dataset users would gain access to a broader range of storage systems and data.