CYBERINFRASTRUCTURE FOR THE GEOSCIENCES www.geongrid.org Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.

Presentation transcript:

CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center

Outline
Motivation
Data Replication Service (DRS)
Components for DRS
  – RLS, GridFTP, RFT
DRS Deployment
DRS setup on GEON
Next Steps

Motivation
Science domains spend considerable effort collecting and managing large amounts of data.
Science domains develop customized data management services that vary with the type of application.
Common data management requirements:
  – Publish and replicate large datasets
  – Register data replicas in catalogs and discover them
  – Perform metadata-based discovery of datasets
  – May require the ability to validate the correctness of replicas

Motivation (cont.)
These systems demand considerable resources to design, implement and maintain
  – Typically cannot be reused by other applications
Need for a long-term solution
  – Generalize the functionality provided by these data management systems
  – Provide a suite of application-independent services
Design and build on lower-level grid services
  – Globus Reliable File Transfer (RFT) service
  – Replica Location Service (RLS)
  – GridFTP

A possible solution: Data Replication Service (DRS)
A higher-level data management service built on lower-level data management components such as RLS and RFT.
Its primary functionality is to
  – Allow users to identify a set of desired files that exist in their grid environment
  – Make local replicas of those files by transferring them from one or more source locations
  – Register the new replicas in a Replica Location Service

Replica Location Service (RLS)
A simple registry that keeps track of where replicas exist on physical storage systems.
Users or services register files in RLS when the files are created.
Clients query RLS servers to find these replicas.
RLS can be a distributed registry, consisting of multiple servers at different sites.
A distributed RLS scales further and stores more mappings than would be possible in a single, centralized catalog.

RLS (cont.)
A logical file name is a unique identifier for the contents of a file.
A physical file name is the location of a copy of the file on a storage system.
RLS maintains mappings between logical file names and one or more physical file names of replicas.
Users can provide a logical file name to an RLS server and ask for all the registered physical file names of replicas.
Users can also query an RLS server to find the logical file name associated with a particular physical file location.
[Figure: logical file name XYZ mapped to XYZ replica 1, XYZ replica 2 and XYZ replica 3 at Site 1, Site 2 and Site 3]
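The two-way mapping described on this slide can be sketched in a few lines of Python. This is a toy in-memory illustration, not the Globus RLS API: the class name `ReplicaCatalog`, its methods, and the `example.org` URLs are all assumptions made for the example.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog (hypothetical, not the RLS API) mapping logical to physical file names."""

    def __init__(self):
        self._lfn_to_pfns = defaultdict(set)  # logical name -> set of physical names
        self._pfn_to_lfn = {}                 # physical name -> logical name

    def register(self, lfn, pfn):
        """Record that a replica of logical file `lfn` exists at `pfn`."""
        self._lfn_to_pfns[lfn].add(pfn)
        self._pfn_to_lfn[pfn] = lfn

    def replicas(self, lfn):
        """Return all registered physical names for a logical name, sorted."""
        return sorted(self._lfn_to_pfns.get(lfn, ()))

    def logical_name(self, pfn):
        """Return the logical name for a physical location, or None if unknown."""
        return self._pfn_to_lfn.get(pfn)


# Hypothetical replica locations, one per site as in the slide's figure.
catalog = ReplicaCatalog()
for site in ("site1", "site2", "site3"):
    catalog.register("XYZ", f"gsiftp://{site}.example.org/data/XYZ")

print(catalog.replicas("XYZ"))
print(catalog.logical_name("gsiftp://site2.example.org/data/XYZ"))
```

Keeping a reverse dictionary makes the physical-to-logical query as cheap as the forward one, at the cost of storing each mapping twice.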

RLS (cont.)
Two servers: RLI, LRC
The Local Replica Catalog (LRC) stores mappings between logical names for data items and the physical locations of replicas.
Query the LRC to discover replicas associated with a logical name.
The Replica Location Index (RLI) server collects information about the logical name mappings stored in one or more LRCs.
The RLI returns a list of all the LRCs it is aware of that contain mappings for the logical name contained in a query.
The client then queries these LRCs to find the physical locations of replicas.
[Figure: Replica Location Index (RLI) nodes above Local Replica Catalogs (LRC)]
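The two-step lookup (ask the RLI which LRCs know a logical name, then ask those LRCs for physical locations) might look like the following sketch. Plain dictionaries stand in for the servers; the function names, site names, and URLs are hypothetical and not part of RLS.

```python
def build_index(lrcs):
    """Play the RLI's role: summarise which LRCs hold mappings for each logical name."""
    index = {}
    for lrc_name, mappings in lrcs.items():
        for lfn in mappings:
            index.setdefault(lfn, set()).add(lrc_name)
    return index


def find_replicas(index, lrcs, lfn):
    """Step 1: ask the index which LRCs know `lfn`. Step 2: query only those LRCs."""
    locations = []
    for lrc_name in sorted(index.get(lfn, ())):
        locations.extend(lrcs[lrc_name].get(lfn, []))
    return locations


# Hypothetical LRC contents at two sites (logical name -> physical locations).
lrcs = {
    "lrc-asu": {"lidar-001": ["gsiftp://asu.example.org/lidar/001"]},
    "lrc-sdsc": {
        "lidar-001": ["gsiftp://sdsc.example.org/lidar/001"],
        "lidar-002": ["gsiftp://sdsc.example.org/lidar/002"],
    },
}
rli = build_index(lrcs)

print(sorted(rli["lidar-001"]))
print(find_replicas(rli, lrcs, "lidar-002"))
```

The index holds only logical names and LRC identifiers, never physical locations, which is what lets it stay small while the catalogs grow.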

RLS in Context
The RLS is one component in a layered data management architecture.
Consistency management is provided by higher-level services.
[Figure: layered data management architecture. Components shown: Replica Consistency Management Services, Replica Location Service, Reliable Data Transfer Service, GridFTP, Metadata Service, Reliable Replication Service]

GridFTP
The GridFTP protocol provides for the secure, robust, fast and efficient transfer of (especially bulk) data.
Globus Toolkit provides the most commonly used implementation of the protocol, though others exist.
The Globus Toolkit provides
  – a server implementation called globus-gridftp-server
  – a scriptable command-line client called globus-url-copy
  – a set of development libraries for custom clients
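Because globus-url-copy is scriptable, a transfer can be driven from a small wrapper. The sketch below only builds and prints the command line; it assumes the client's `-vb` (performance output) and `-p` (parallel streams) options, and the host names and paths are placeholders rather than real GEON endpoints.

```python
import shlex


def build_copy_command(source_url, dest_url, parallel_streams=4):
    """Build an argument list for globus-url-copy; nothing is executed here."""
    return [
        "globus-url-copy",
        "-vb",                        # report transfer performance while copying
        "-p", str(parallel_streams),  # number of parallel data streams
        source_url,
        dest_url,
    ]


# Hypothetical endpoints used purely for illustration.
cmd = build_copy_command(
    "gsiftp://source.example.org/data/lidar.tar",
    "gsiftp://dest.example.org/archive/lidar.tar",
)
print(shlex.join(cmd))
```

On a host with the Globus client installed and a valid proxy credential, the list could be passed to `subprocess.run(cmd, check=True)` to perform the transfer.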

Reliable File Transfer (RFT)
A WSRF-compliant web service that provides "job scheduler"-like functionality for data movement.
You provide a list of source and destination URLs (including directories or files); the service then writes your job description into a database and moves the files on your behalf.

RFT (cont.)
Accepts a SOAP description of a desired transfer
Provides service methods for querying the transfer status
Provides WSRF tools to subscribe to notifications of state-change events
Supports all the same options as globus-url-copy (buffer size, etc.)
Increases reliability because state is stored in a database
Supports concurrency: multiple files are transferred at once for better performance
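The reliability argument (transfer state lives in a database, so unfinished work can be picked up again) can be illustrated with a small queue built on Python's standard `sqlite3` module. This is a sketch of the idea only, not RFT's interface or schema: the table layout, state names, function names, and the simulated `flaky_copy` are all assumptions.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open (or create) the transfer table; a file path would make the state survive restarts."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        "id INTEGER PRIMARY KEY, src TEXT, dst TEXT, state TEXT)"
    )
    return db


def submit(db, pairs):
    """Persist a job description: one row per (source URL, destination URL) pair."""
    db.executemany(
        "INSERT INTO transfers (src, dst, state) VALUES (?, ?, 'Pending')", pairs
    )
    db.commit()


def run_pending(db, copy):
    """Attempt every transfer that is not yet Done and record the outcome of each."""
    rows = db.execute(
        "SELECT id, src, dst FROM transfers WHERE state != 'Done' ORDER BY id"
    ).fetchall()
    for transfer_id, src, dst in rows:
        try:
            copy(src, dst)
            state = "Done"
        except OSError:
            state = "Failed"
        db.execute("UPDATE transfers SET state = ? WHERE id = ?", (state, transfer_id))
        db.commit()


def status(db):
    """Return a {state: count} summary, the kind of answer a status query would give."""
    return dict(db.execute("SELECT state, COUNT(*) FROM transfers GROUP BY state"))


attempts = {}


def flaky_copy(src, dst):
    """Simulated mover: the first attempt on b.dat fails, later attempts succeed."""
    attempts[src] = attempts.get(src, 0) + 1
    if src.endswith("b.dat") and attempts[src] == 1:
        raise OSError("simulated network failure")


db = open_queue()
submit(db, [
    ("gsiftp://src.example.org/a.dat", "gsiftp://dst.example.org/a.dat"),
    ("gsiftp://src.example.org/b.dat", "gsiftp://dst.example.org/b.dat"),
])
run_pending(db, flaky_copy)
first_pass = status(db)
run_pending(db, flaky_copy)   # the retry touches only the transfer that failed
second_pass = status(db)
print(first_pass, second_pass)
```

Committing after every state change is the design choice that matters here: a process that dies halfway through can reopen the same database and continue from the recorded states instead of starting over.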

Globus Services
WSRF Services
  – Data Replication Service
  – Delegation Service
  – Reliable File Transfer Service
Pre-WSRF Components
  – Replica Location Service (Local Replica Catalog, Replica Location Index)
  – GridFTP Server
[Figure: local site. A Web Service Container hosts the Data Replication Service (Replicator Resource), the Reliable File Transfer Service (RFT Resource) and the Delegation Service (Delegated Credential), alongside the Local Replica Catalog, Replica Location Index and GridFTP Server]

DRS Deployment
Local storage system
GridFTP server for file transfer
Replica Location Service:
  – LRCs store mappings from logical names to storage locations
  – RLI collects state summaries from LRCs
RFT: WSRF service to perform data transfer
DRS: the master replication service
[Figure: site deployment. DRS Service, RFT Service ("Create a Transfer request"), GridFTP Server, Database, Site Storage System, Replica Location Index, Local Replica Catalog]

[Figure: a client requests a file at the Local Site, whose Web Service Container hosts the Data Replication Service (Replicator Resource), Reliable File Transfer Service (RFT Resource) and Delegation Service (Delegated Credential), alongside a Local Replica Catalog, Replica Location Index and GridFTP Server. Remote Sites 1…N run the same set of components.]

DRS Functionality
Initiate a DRS request
Create a delegated credential (delegate authority)
Create a Replicator resource (Replication Service)
Monitor the Replicator resource (status)
Discover replicas of files in RLS and select among them
Start data transfer to the local site with the RFT service
Check status
Register new replicas in RLS catalogs
Allow client inspection of DRS results
Destroy the Replicator resource
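The core of this sequence (discover, select, transfer, register, report) can be sketched as a single function. It is a simplified illustration under stated assumptions: credential delegation and the Replicator resource lifecycle are left out, a dictionary stands in for RLS, the `transfer` callable stands in for RFT, and every name and URL is hypothetical.

```python
def replicate(lfns, catalog, transfer, local_prefix):
    """Create local replicas of the requested logical files and register them.

    `catalog` maps each logical name to a list of physical locations (stand-in for RLS);
    `transfer(src, dst)` moves one file and raises OSError on failure (stand-in for RFT).
    """
    results = {}
    for lfn in lfns:
        sources = catalog.get(lfn, [])       # discover existing replicas
        if not sources:
            results[lfn] = "no replica found"
            continue
        source = sources[0]                  # trivial selection policy: first one listed
        destination = local_prefix + lfn
        try:
            transfer(source, destination)    # move the data to the local site
        except OSError as err:
            results[lfn] = f"failed: {err}"
            continue
        catalog[lfn].append(destination)     # register the new replica
        results[lfn] = "replicated"
    return results                           # left for the client to inspect


moved = []


def recording_transfer(src, dst):
    """Stand-in transfer that only records what it was asked to move."""
    moved.append((src, dst))


catalog = {"lidar-001": ["gsiftp://asu.example.org/lidar/lidar-001"]}
results = replicate(
    ["lidar-001", "lidar-404"],
    catalog,
    recording_transfer,
    "gsiftp://sdsc.example.org/replicas/",
)
print(results)
print(catalog["lidar-001"])
```

Registration happens only after the transfer succeeds, so the catalog never advertises a replica that was not actually written.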

GEON DRS Test Setup
[Figure: two sites, ASU and SDSC. Each runs a Globus Container with a DRS Service, RFT Service ("Create a Transfer request"), GridFTP Server, Database, Site Storage System, Replica Location Index and Replica Location Catalog. Data transfer runs between the two sites.]

Next Tasks
Transfer LIDAR data from ASU to an SDSC resource (HPSS, etc.).
Extend the testbed to include more nodes.
Benchmark data movement.
Package DRS and its components with GEON software stack version 2.0.

Acknowledgement
Ann Chervenak & Robert Schuler (ISI) (slides)

Questions?