Globus – Part II Sathish Vadhiyar. Globus Information Service.


Globus – Part II Sathish Vadhiyar

Globus Information Service

MDS
- Metacomputing Directory Service, also known as the Monitoring and Discovery Service
- Publishes and provides access to system and application data
- Access to MDS information can be restricted using GSI
- Interacts with local information services (hourglass mechanism)
- Caches information so that data that is still up to date need not be transferred again, which lessens network overhead

MDS
- Integrates existing systems while providing a uniform and extensible data model
- Uniform API
- Adopts its data representation, API, query language and protocol from the LDAP directory service
- Uses 2 protocols:
  - GRIP: for providing information about entities
  - GRRP: for registering entities
- The LDAP query language supports:
  - Search
  - Enquiry
  - Subscription

MDS Architecture
- GIIS: Grid Index Information Service
- GRIS: Grid Resource Information Service
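The relationship between the two services, together with the caching mentioned earlier, can be sketched roughly as follows. This is only an illustration: the class and method names, the time-to-live parameter and the explicit `now` argument are assumptions made for the sketch and are not the MDS API.

```python
class GRIS:
    """Stand-in for the information service running on one resource."""

    def __init__(self, name, provider):
        self.name = name
        self._provider = provider  # callable returning a dict of attributes
        self.queries = 0           # counts how often the resource was contacted

    def query(self):
        self.queries += 1
        return dict(self._provider())


class GIIS:
    """Stand-in for an index service that aggregates registered GRIS servers."""

    def __init__(self, cache_ttl):
        self.cache_ttl = cache_ttl  # hypothetical freshness window, in seconds
        self._registered = {}       # resource name -> GRIS
        self._cache = {}            # resource name -> (timestamp, info)

    def register(self, gris):
        # Registration step: the resource announces itself to the index.
        self._registered[gris.name] = gris

    def lookup(self, name, now):
        # Serve from the cache while the stored copy is still fresh.
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self.cache_ttl:
            return cached[1]
        # Otherwise ask the resource itself and remember the answer.
        info = self._registered[name].query()
        self._cache[name] = (now, info)
        return info
```

Passing the time in explicitly keeps the sketch deterministic; a real service would read a clock.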

MDS
- Support for multiple information service providers: information providers are specified on a per-attribute basis
- MDS data:
  - System information: architecture, OS
  - Network information
  - Load status
- Additional information sent to GIIS by the GRAM reporter:
  - Job status
  - Queue information
- Information can be viewed through a web browser or web client commands

MDS
- Contains entries, where each entry is associated with one or more attribute:value pairs
- Each entry is associated with a distinguished name
- Object classes are associated with entries to describe their object types
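A minimal sketch of this data model is given below. The distinguished names, object class names and attribute names in the usage note are hypothetical, and the search is a simplified stand-in for an LDAP subtree search, not an LDAP client.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    dn: str                                   # distinguished name
    object_classes: list                      # object types of this entry
    attributes: dict = field(default_factory=dict)  # attribute -> list of values


class Directory:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        if entry.dn in self._entries:
            raise ValueError(f"duplicate distinguished name: {entry.dn}")
        self._entries[entry.dn] = entry

    def search(self, base_dn, object_class=None, **equals):
        """Return entries at or below base_dn that match the given filters."""
        results = []
        for entry in self._entries.values():
            in_subtree = entry.dn == base_dn or entry.dn.endswith("," + base_dn)
            if not in_subtree:
                continue
            if object_class is not None and object_class not in entry.object_classes:
                continue
            # Every requested attribute must contain the requested value.
            if all(v in entry.attributes.get(k, []) for k, v in equals.items()):
                results.append(entry)
        return results
```

For example, after adding an entry with the (hypothetical) name `hn=a.example.org,o=Grid`, object class `computeHost` and attribute `os: Linux`, the call `search("o=Grid", object_class="computeHost", os="Linux")` returns that entry.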

Distinguished name example

Another Example

Distinguished names for Networks

Globus Data Grid

Data Grid
Challenges:
- Terabytes to petabytes of data
- Query management over this huge volume of data
- Cache management
- Providing gigabit/sec QoS
- Coscheduling data transfers and computation
- Selection of dataset replicas
- Maximizing the use of scarce storage, computation and network resources

Data Grid Motivation
Application requirements:
1. A reliable, secure, high-performance data transfer protocol
2. Management of multiple copies of files and collections of files

Data Grid Architecture

GridFTP
- Secure file transfer over the Grid
- Multiple data channels for parallel transfers: multiple TCP streams are used in parallel to improve aggregate bandwidth
- Partial file transfers
- Third-party (direct server-to-server) transfers: GSSAPI security is added to the existing third-party data transfers in the FTP standard, in which a transfer between 2 servers is mediated by a third-party client
- GSSAPI operations authenticate the third party to the source and destination machines of the data transfer
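The idea behind parallel and partial transfers can be illustrated with a small sketch: a file is divided into byte ranges, and each range is moved by its own worker. Threads copying between in-memory buffers stand in for TCP streams here; the function names are hypothetical and nothing in this block uses the GridFTP protocol or its libraries.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, streams):
    """Divide total_size bytes into at most `streams` (offset, length) ranges."""
    if streams < 1:
        raise ValueError("at least one stream is required")
    base, extra = divmod(total_size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source, streams=4):
    """Copy `source` (bytes) by moving each byte range in a separate worker."""
    dest = bytearray(len(source))

    def move(byte_range):
        # A partial transfer: only this range of the file is moved.
        offset, length = byte_range
        dest[offset:offset + length] = source[offset:offset + length]
        return length

    with ThreadPoolExecutor(max_workers=streams) as pool:
        moved = sum(pool.map(move, split_ranges(len(source), streams)))
    if moved != len(source):
        raise RuntimeError("incomplete transfer")
    return bytes(dest)
```

Each range is independent of the others, which is what allows the ranges to travel over separate channels and be reassembled by offset at the destination.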

GridFTP contd…
- Authenticated data channels: both GSI and Kerberos security
- Reusable data channels
- Striped data transfers
- 2 libraries:
  - globus_ftp_control_library: implements the control channel API
  - globus_ftp_client_library: implements the GridFTP API
- Plugin mechanisms for fault tolerance, performance monitoring, and extended data processing

Globus Replica Management Architecture
Replica management:
- For better performance or availability of accesses
- Mainly for access to “published” resources (read-only model)
Functions and architecture:
- Lower level replica catalog API
- Higher level replica management API

Replica catalog
- Provides a mapping between logical names of files/locations and physical objects on storage systems
- Stores 3 kinds of entries:
  - Logical collection: user-defined collection of files (file aggregation)
  - Location entries: physical locations of files
  - Logical files: globally unique names
- The replica catalog API provides operations on the replica catalog
- The replica management API provides session management, catalog creation, file maintenance and access control
- Implemented with LDAP
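The three kinds of entries can be sketched as an in-memory structure, shown below. The class, its method names and the example URLs are hypothetical; the Globus catalog itself is stored in LDAP and is accessed through its own API, which this sketch does not reproduce.

```python
class ReplicaCatalog:
    def __init__(self):
        self._collections = {}  # logical collection -> set of logical file names
        self._locations = {}    # logical collection -> {URL prefix: set of files held}

    def create_collection(self, collection, logical_files):
        """Register a list of logical files as a logical collection."""
        self._collections[collection] = set(logical_files)

    def add_location(self, collection, url_prefix, logical_files):
        """Register a physical location holding all or part of a collection."""
        missing = set(logical_files) - self._collections[collection]
        if missing:
            raise KeyError(f"not in collection {collection}: {sorted(missing)}")
        self._locations.setdefault(collection, {})[url_prefix] = set(logical_files)

    def find_replicas(self, collection, logical_file):
        """Map a logical file name to the physical copies that hold it."""
        locations = self._locations.get(collection, {})
        return sorted(
            f"{prefix}/{logical_file}"
            for prefix, files in locations.items()
            if logical_file in files
        )
```

A location entry may hold only a subset of the collection, so a lookup returns just the locations that contain the requested logical file.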

Replica management
- Replica management means managing the copying and placement of files in a distributed computing system so as to improve the performance of data analysis.
- Globus Replica Management integrates the Globus Replica Catalog (for keeping track of replicated files) and GridFTP (for moving data) and provides replica management capabilities for data grids.
- The globus_replica_management library provides client functions that allow files to be registered with the replica management service, published to replica locations, and moved among multiple locations.

Replica management service - functions
- Registration of files with the replica management service
- Creation and deletion of replicas of previously registered files
- Enquiries concerning the location and performance characteristics of replicas
- Replica selection based on performance characteristics

Replica management
- The replica management API combines storage system operations with calls to low-level catalog API functions
- The replica management system controls where and when copies are created and provides information about copies
- It does not, however, ensure file consistency

RM API
- Session management:
  - Session handles and attributes
  - Restart
  - Rollback
- Catalog creation and file management:
  - Creating catalog entries
  - Registering files
  - Publishing files
  - Copying, deleting files
- Future ideas:
  - Incorporating advance reservation
  - Automatic replica selection and creation
- Data grid projects
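One way to picture session rollback is a session that records an undo step for every completed operation and replays those steps in reverse when a later operation fails. The sketch below assumes this design; the `Session` class and the `publish` helper are hypothetical, and a dict and a set stand in for the storage system and the catalog.

```python
class Session:
    def __init__(self):
        self._undo = []  # undo callables for the operations completed so far

    def do(self, action, undo):
        action()                 # if this raises, no undo step is recorded
        self._undo.append(undo)

    def rollback(self):
        # Undo completed operations in reverse order.
        while self._undo:
            self._undo.pop()()

    def commit(self):
        self._undo.clear()


def publish(session, storage, catalog, name, data, fail_registration=False):
    """Copy a file to storage, then register it; undo the copy on failure."""

    def register():
        if fail_registration:    # simulated catalog failure, for illustration
            raise RuntimeError("catalog unavailable")
        catalog.add(name)

    try:
        session.do(lambda: storage.__setitem__(name, data),
                   lambda: storage.pop(name, None))
        session.do(register, lambda: catalog.discard(name))
        session.commit()
        return True
    except RuntimeError:
        session.rollback()
        return False
```

The point of pairing the storage operation with the catalog call in one session is that a failed registration does not leave an unregistered copy behind.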

Replica Catalog Illustration

Replica Selection in Globus Data Grid (Vazhkudai et al.)
- Replica selection uses MDS for information about the characteristics of storage systems
- LDAP information is organized as a DIT (Directory Information Tree)
- Each storage resource in the Data Grid incorporates a GRIS
- The LDAP server can execute shell scripts in the background to obtain dynamic attributes such as availableSpace and mountPoint
- Static attributes such as seek times can be entered by the system administrator
- Attributes such as data transfer rates across networks to clients can be obtained from past performance, i.e., historical data
- ClassAds can also be used for expressing storage attributes
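As a rough illustration of deriving a transfer-rate attribute from historical data, the sketch below averages the most recent observed rates for each storage/client pair. The slides do not say how the history is summarized, so the windowed mean, the class name and its parameters are all assumptions.

```python
from collections import deque
from statistics import mean


class TransferHistory:
    def __init__(self, window=5):
        self._window = window  # number of recent transfers to remember
        self._rates = {}       # (storage, client) -> recent rates in bytes/sec

    def record(self, storage, client, bytes_moved, seconds):
        """Store the observed rate of one completed transfer."""
        if seconds <= 0:
            raise ValueError("duration must be positive")
        key = (storage, client)
        history = self._rates.setdefault(key, deque(maxlen=self._window))
        history.append(bytes_moved / seconds)

    def predicted_rate(self, storage, client):
        """Estimate the rate from past performance; None if nothing is known."""
        rates = self._rates.get((storage, client))
        return mean(rates) if rates else None
```

Only the last `window` observations are kept, so older measurements age out as new transfers are recorded.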

Directory for Storage GRIS

Metadata Specification

Performance Data Specification

Steps in Replica Management
1. Application queries metadata expressing desired characteristics of logical files
2. A logical file is returned
3. Application queries replica catalog for replica instances for the logical file
4. Storage broker helps to choose a particular replica

Replica Selection

Storage Architecture steps
1. The application presents classAds describing its replica requirements to the storage broker (SB)
2. The SB performs a search:
   1. Queries the replica catalog for the list of all replicas
   2. Queries the GRIS of each replica about its characteristics
   3. Collects all information and proceeds to matching
3. Match:
   1. Converts replica capabilities to replica classAds
   2. Matches application classAds to replica classAds
4. The application accesses the file using GridFTP
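The search and match phases above can be sketched as follows. Real ClassAds are written in their own expression language; here plain dictionaries and Python callables stand in for them. The function name and the `readBandwidth` attribute are hypothetical, and the two lookup functions are passed in as parameters rather than calling any real catalog or GRIS.

```python
def select_replica(logical_file, request, list_replicas, query_gris):
    """Return the URL of the best matching replica, or None if none qualifies.

    request       -- dict with a "requirements" predicate and a "rank" function
    list_replicas -- callable: logical file name -> list of replica URLs
    query_gris    -- callable: replica URL -> dict of storage attributes
    """
    # Search: collect the replicas and the characteristics of each one.
    replica_ads = []
    for url in list_replicas(logical_file):
        ad = dict(query_gris(url))  # replica capabilities become a replica "ad"
        ad["url"] = url
        replica_ads.append(ad)

    # Match: keep replicas satisfying the requirements, then take the best rank.
    eligible = [ad for ad in replica_ads if request["requirements"](ad)]
    if not eligible:
        return None
    return max(eligible, key=request["rank"])["url"]
```

The returned URL is what the application would then fetch in the final step; the transfer itself is outside this sketch.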

Globus References / sources / credits
- Grid Information Services for Distributed Resource Sharing. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August
- Usage of LDAP in Globus. I. Foster, G. von Laszewski. This short note describes the use of LDAP in the Globus toolkit. It answers three questions: What is LDAP? Where is it used? Why is it used in Globus?
- A Directory Service for Configuring High-Performance Distributed Computations. S. Fitzgerald, I. Foster, C. Kesselman, G. von Laszewski, W. Smith, S. Tuecke. Proc. 6th IEEE Symposium on High-Performance Distributed Computing, pp , Describes the Metacomputing Directory Service used to maintain information about Globus components.

Globus References / sources / credits
- The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke. Journal of Network and Computer Applications, 23: , 2001 (based on conference publication from Proceedings of NetStore Conference 1999).
- Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke. IEEE Mass Storage Conference, Presents the design and performance characteristics of two fundamental technologies for data management.
- Replica Selection in the Globus Data Grid. S. Vazhkudai, S. Tuecke, I. Foster. Proceedings of the First IEEE/ACM International Conference on Cluster Computing and the Grid (CCGRID 2001), pp , IEEE Computer Society Press, May Discusses a high-level replica selection service that uses information regarding replica location and user preferences to guide selection from among storage replica alternatives.

JUNK !!

RFT (Reliable File Transfer)
- Treats the movement of multiple files as a single job
- Accepts transfer requests and manages them reliably
- OGSI compliant
- Transfers data reliably between two GridFTP servers
- Uses Grid Service Handles (GSH)
- Acts as a proxy for the user: acts as a client on the user’s behalf for third-party transfers

RFT
- Client submits a SOAP description of the data transfer job
- Maintains checkpoints in databases
- Supports both “push” and “pull” mechanisms
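The role of checkpoints can be illustrated with a sketch in which a multi-file job records each completed file, so that a restarted job skips what has already been moved. The `TransferJob` class is hypothetical, and a plain dict stands in for the database; nothing here reflects the actual RFT interfaces.

```python
class TransferJob:
    """A multi-file transfer treated as one job, with per-file checkpoints."""

    def __init__(self, job_id, files, checkpoint_store):
        self.job_id = job_id
        self.files = list(files)
        self.store = checkpoint_store         # stand-in for a database
        self.store.setdefault(job_id, [])     # keep checkpoints from earlier runs

    def run(self, transfer):
        """Transfer every file not yet checkpointed; `transfer` may raise."""
        done = self.store[self.job_id]
        for name in self.files:
            if name in done:
                continue                      # already moved in an earlier run
            transfer(name)                    # a failure leaves checkpoints intact
            done.append(name)                 # checkpoint after each success
        return list(done)
```

Because the checkpoint is written only after a file completes, rerunning the job after a failure resumes at the first file that did not finish.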

Data Grid Replica Services
- Need for metadata services of various kinds:
  - Application metadata
  - Replica metadata
  - System configuration metadata
- Replica management:
  - For better performance or availability of accesses
  - Mainly for access to “published” resources (read-only model)

Replica Catalog
- Provides mappings between logical names for files or collections and one or more copies of those objects on physical systems
- Services provided by the replica catalog:
  - Registering a list of files as a logical collection
  - Registering the physical location of a complete or partial replica of a logical collection
  - Registering information about a particular logical file in a logical collection
  - Modifying the contents of registered entities of the catalog
  - Responding to queries of the catalog
- The Globus Replica Catalog supports replica management by providing mappings between logical names for files and one or more copies of the files on physical storage systems