Wide Area Data Replication for Scientific Collaborations
Ann Chervenak, Robert Schuler, Carl Kesselman (USC Information Sciences Institute)
Scott Koranda (Univa Corporation)
Brian Moe (University of Wisconsin Milwaukee)

Motivation
- Scientific application domains spend considerable effort managing large amounts of experimental and simulation data
- They have developed customized, higher-level Grid data management services
- Examples:
  - Laser Interferometer Gravitational Wave Observatory (LIGO) Lightweight Data Replicator system
  - High Energy Physics projects: the EGEE system, gLite, LHC Computing Grid (LCG) middleware
  - Portal-based coordination of services (e.g., the Earth System Grid)

Motivation (cont.)
- Data management functionality varies by application
- Applications share several requirements:
  - Publish and replicate large datasets (millions of files)
  - Register data replicas in catalogs and discover them
  - Perform metadata-based discovery of datasets
  - May require the ability to validate correctness of replicas
  - In general, data updates and replica consistency services are not required (i.e., accesses are read-only)
- These systems provide production data management services to individual scientific domains
- Each project spends considerable resources to design, implement, and maintain its data management system
  - Typically it cannot be re-used by other applications

Motivation (cont.)
- Long-term goals:
  - Generalize the functionality provided by these data management systems
  - Provide a suite of application-independent services
- Paper describes one higher-level data management service: the Data Replication Service (DRS)
- DRS functionality is based on the publication capability of the LIGO Lightweight Data Replicator (LDR) system
- DRS ensures that a set of files exists on a storage site
  - Replicates files as needed, registers them in catalogs
- DRS builds on lower-level Grid services, including:
  - Globus Reliable File Transfer (RFT) service
  - Replica Location Service (RLS)

Outline
- Description of the LDR data publication capability
- Generalization of this functionality
  - Define characteristics of an application-independent Data Replication Service (DRS)
- DRS design
- DRS implementation in the GT4 environment
- Evaluation of DRS performance in a wide area Grid
- Related work
- Future work

A Data-Intensive Application Example: The LIGO Project
- Laser Interferometer Gravitational Wave Observatory (LIGO) collaboration
- Seeks to measure gravitational waves predicted by Einstein
- Collects experimental datasets at two LIGO instrument sites in Louisiana and Washington State
- Datasets are replicated at other LIGO sites
- Scientists analyze the data and publish their results, which may be replicated
- Currently LIGO stores more than 40 million files across ten locations

The Lightweight Data Replicator
- LIGO scientists developed the Lightweight Data Replicator (LDR) system for data management
- Built on top of standard Grid data services:
  - Globus Replica Location Service
  - GridFTP data transport protocol
- LDR provides a rich set of data management functionality, including:
  - a pull-based model for replicating necessary files to a LIGO site
  - efficient data transfer among LIGO sites
  - a distributed metadata service architecture
  - an interface to local storage systems
  - a validation component that verifies that files on a storage system are correctly registered in a local RLS catalog

LIGO Data Publication and Replication
Two types of data publishing:
1. Detectors at Livingston and Hanford produce data sets
   - Approx. a terabyte per day during LIGO experimental runs
   - Each detector produces a file every 16 seconds
   - Files range in size from 1 to 100 megabytes
   - Data sets are copied to the main repository at CalTech, which stores them in a tape-based mass storage system
   - LIGO sites can acquire copies from CalTech or from one another
2. Scientists also publish new or derived data sets as they perform analysis on existing data sets
   - E.g., data filtering or calibration may create new files
   - These new files may also be replicated at LIGO sites
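As a rough consistency check on the numbers quoted above (a back-of-envelope estimate, not a figure from the paper), the per-detector file rate and the daily run volume can be related as follows:

```python
# Back-of-envelope check of the LIGO ingest numbers on this slide.
# Assumes two detectors, one file every 16 seconds each, ~1 TB/day total.
SECONDS_PER_DAY = 86_400
files_per_detector_per_day = SECONDS_PER_DAY // 16          # 5,400 files
files_per_day_total = 2 * files_per_detector_per_day        # 10,800 files
avg_file_size_mb = 1e12 / files_per_day_total / 1e6          # decimal TB assumed

print(files_per_day_total, round(avg_file_size_mb, 1))
# ~10,800 files/day at roughly 90 MB on average, consistent with the
# quoted 1 to 100 MB file size range.
```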

Some Terminology
- A logical file name (LFN) is a unique identifier for the contents of a file
  - Typically, a scientific collaboration defines and manages the logical namespace
  - Guarantees uniqueness of logical names within that organization
- A physical file name (PFN) is the location of a copy of the file on a storage system
  - The physical namespace is managed by the file system or storage system
- The LIGO environment currently contains:
  - More than six million unique logical files
  - More than 40 million physical files stored at ten sites
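A minimal sketch of the logical-to-physical name distinction, using an in-memory dictionary in place of a real RLS catalog; the file and host names below are made up for illustration:

```python
# Toy LFN -> PFN mapping; a real deployment would use the RLS
# Local Replica Catalog, not a dictionary.
replica_catalog = {
    # one logical file name (LFN) can map to many physical file names (PFNs)
    "H-R-8157_4096.gwf": [
        "gsiftp://storage1.example.org/data/H-R-8157_4096.gwf",
        "gsiftp://storage2.example.org/frames/H-R-8157_4096.gwf",
    ],
}

def lookup_pfns(lfn: str) -> list[str]:
    """Return all known physical copies of a logical file."""
    return replica_catalog.get(lfn, [])
```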

Components at Each LDR Site
- Local storage system
- GridFTP server for file transfer
- Metadata Catalog: associations between logical file names and metadata attributes
- Replica Location Service:
  - Local Replica Catalog (LRC) stores mappings from logical names to storage locations
  - Replica Location Index (RLI) collects state summaries from LRCs
- Scheduler and transfer daemons
- Prioritized queue of requested files
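The per-site deployment could be captured in a small configuration structure; the sketch below simply mirrors the component list above, and the field names are assumptions rather than LDR's actual configuration format:

```python
from dataclasses import dataclass, field

@dataclass
class LDRSiteConfig:
    """One LDR site, mirroring the components listed above (names hypothetical)."""
    storage_root: str                 # local storage system mount point
    gridftp_url: str                  # GridFTP server for file transfer
    metadata_catalog_url: str         # LFN -> metadata attributes
    local_replica_catalog_url: str    # RLS LRC: LFN -> local PFNs
    replica_location_index_url: str   # RLS RLI: summaries from remote LRCs
    transfer_queue: list = field(default_factory=list)  # prioritized LFN queue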

LDR Data Publishing
- Scheduling daemon runs at each LDR site
  - Queries the site's metadata catalog to identify logical files with specified metadata attributes
  - Checks the RLS Local Replica Catalog to determine whether copies of those files already exist locally
  - If not, puts the logical file names on a priority-based scheduling queue
- Transfer daemon also runs at each site
  - Checks the queue and initiates data transfers in priority order
  - Queries the RLS Replica Location Index to find sites where the desired files exist
  - Randomly selects a source file from among the available replicas
  - Uses the GridFTP transport protocol to transfer the file to the local site
  - Registers the newly-copied file in the RLS Local Replica Catalog
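A condensed sketch of the two daemons described above. The catalog, queue, and GridFTP calls are stand-in names (assumptions), but the control flow follows the slide:

```python
import random

def scheduling_daemon(site):
    """Find logical files the site wants but does not yet have locally."""
    wanted = site.metadata_catalog.query(site.metadata_filter)   # hypothetical API
    for lfn in wanted:
        if not site.local_replica_catalog.has(lfn):              # hypothetical API
            site.queue.push(lfn, priority=site.priority_of(lfn))

def transfer_daemon(site):
    """Pull queued files from some remote replica, then register them."""
    while not site.queue.empty():
        lfn = site.queue.pop()                                   # highest priority first
        remote_catalogs = site.replica_location_index.lookup(lfn)
        pfns = [pfn for c in remote_catalogs for pfn in c.lookup(lfn)]
        source = random.choice(pfns)                             # random source selection, as in LDR
        local_pfn = gridftp_fetch(source, site.storage_root)     # hypothetical GridFTP wrapper
        site.local_replica_catalog.add(lfn, local_pfn)           # register the new replica
```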

Generalizing the LDR Publication Scheme
- Want to provide a similar capability that is:
  - Independent of the LIGO infrastructure
  - Useful for a variety of application domains
- Capabilities include:
  - An interface to specify which files are required at the local site
  - Use of Globus RLS to discover whether replicas exist locally and where they exist in the Grid
  - Use of a selection algorithm to choose among available replicas
  - Use of the Globus Reliable File Transfer service and GridFTP data transport protocol to copy data to the local site
  - Use of Globus RLS to register new replicas
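One way to read the "selection algorithm" capability is as a pluggable policy: random choice reproduces LDR's behavior, while other policies can be swapped in. A sketch under that assumption (all names illustrative):

```python
import random
from typing import Callable

ReplicaSelector = Callable[[list[str]], str]

def random_selector(pfns: list[str]) -> str:
    """LDR-style selection: pick any available replica at random."""
    return random.choice(pfns)

def prefer_site_selector(preferred_host: str) -> ReplicaSelector:
    """Example alternative policy: prefer replicas on a given host."""
    def select(pfns: list[str]) -> str:
        for pfn in pfns:
            if preferred_host in pfn:
                return pfn
        return random.choice(pfns)
    return select
```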

Relationship to Other Globus Services
At the requesting site, deploy:
- WS-RF services:
  - Data Replication Service
  - Delegation Service
  - Reliable File Transfer Service
- Pre-WS-RF components:
  - Replica Location Service (Local Replica Catalog, Replica Location Index)
  - GridFTP server

DRS Functionality
- Initiate a DRS request
- Create a delegated credential
- Create a Replicator resource
- Monitor the Replicator resource
- Discover replicas of the desired files in RLS, select among replicas
- Transfer data to the local site with the Reliable File Transfer Service
- Register new replicas in RLS catalogs
- Allow client inspection of DRS results
- Destroy the Replicator resource
DRS is implemented in Globus Toolkit Version 4 and complies with the Web Services Resource Framework (WS-RF).
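Putting the steps above in order, a DRS client interaction might look roughly like the following. The client object and method names are placeholders for the GT4 WS-RF client stubs, not the actual API:

```python
def replicate(client, requests, lifetime_s=3600):
    """requests: list of (logical_file_name, destination_url) pairs."""
    # 1. Delegate a credential and get back its endpoint reference (EPR)
    cred_epr = client.delegation.create_credential(lifetime=lifetime_s)

    # 2. Create a Replicator resource, passing the credential EPR
    replicator = client.drs.create_replicator(requests, credential=cred_epr,
                                              termination=lifetime_s)

    # 3. Poll resource properties until the request finishes
    #    (discover -> transfer -> register stages, per the slides that follow)
    while replicator.get_rp("Status") != "Finished":
        print("stage:", replicator.get_rp("Stage"))

    # 4. Inspect per-file results, then destroy the resource
    results = replicator.get_rp("ResultStatus")
    replicator.destroy()
    return results
```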

WSRF in a Nutshell
- Service
- State management:
  - Resource
  - Resource Property (RP)
- State identification:
  - Endpoint Reference (EPR)
- State interfaces:
  - GetRP, QueryRPs, GetMultipleRPs, SetRP
- Lifetime interfaces:
  - SetTerminationTime
  - ImmediateDestruction
- Notification interfaces:
  - Subscribe
  - Notify
- ServiceGroups
(Slide diagram: a client holds an EPR to a Resource with RPs behind a Service exposing GetRP, GetMultRPs, SetRP, QueryRPs, Subscribe, SetTermTime, and Destroy.)
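A minimal, framework-free sketch of the WS-RF pattern summarized above: a stateless service front end, stateful resources addressed by endpoint references, resource properties, and a termination time. This is conceptual Python, not the GT4 Java implementation:

```python
import time, uuid

class Resource:
    """Stateful WS-RF resource with resource properties (RPs) and a lifetime."""
    def __init__(self, termination_time):
        self.epr = str(uuid.uuid4())           # stand-in for an endpoint reference
        self.rps = {}                          # resource properties
        self.termination_time = termination_time

    def get_rp(self, name):                    # GetResourceProperty
        return self.rps.get(name)

    def set_rp(self, name, value):             # SetResourceProperties
        self.rps[name] = value

    def set_termination_time(self, t):         # WS-ResourceLifetime
        self.termination_time = t

class Service:
    """Stateless service that creates resources and dispatches to them by EPR."""
    def __init__(self):
        self.resources = {}

    def create(self, lifetime_s):
        r = Resource(time.time() + lifetime_s)
        self.resources[r.epr] = r
        return r.epr

    def destroy_expired(self):                 # scheduled destruction
        now = time.time()
        self.resources = {e: r for e, r in self.resources.items()
                          if r.termination_time > now}
```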

Create Delegated Credential
(The slide diagrams in this sequence show the requesting site's GT4 service container with Delegation, Data Replication, and RFT services, RLS Replica Index and Replica Catalogs, GridFTP servers, and MDS.)
- Client initializes a user proxy certificate
- Client asks the Delegation service to create a delegated credential resource
- Client sets the resource's termination time
- The credential EPR is returned to the client

Create Replicator Resource
- Client creates a Replicator resource in the Data Replication Service
- Client passes the delegated credential EPR and sets the termination time
- The Replicator accesses the delegated credential resource
- The Replicator EPR is returned to the client

Monitor Replicator Resource
- Client periodically polls the Replicator RPs via GetRP or GetMultRP
- The Replicator resource is added to the MDS Information service Index
- Client subscribes to ResourceProperty changes for the "Status" RP and "Stage" RP
- Conditions may trigger alerts or other actions (Trigger service not pictured)
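Monitoring can therefore be done either by polling resource properties or by subscribing to change notifications. A sketch of both styles, again with placeholder method names rather than the real GT4 client API:

```python
import time

def poll_replicator(replicator, interval_s=5.0):
    """Periodically read the 'Status' and 'Stage' resource properties."""
    while True:
        status, stage = replicator.get_rp("Status"), replicator.get_rp("Stage")
        print(f"status={status} stage={stage}")
        if status == "Finished":
            return
        time.sleep(interval_s)

def on_stage_change(old_value, new_value):
    """Callback used with a WS-Notification style subscription."""
    if new_value == "transfer":
        print("replication entered the transfer stage")

# subscription-style alternative (hypothetical subscribe() helper):
# replicator.subscribe("Stage", on_stage_change)
```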

Query Replica Information
- Notification that the "Stage" RP value changed to "discover"
- The Replicator queries the RLS Replica Index to find catalogs that contain the desired replica information
- The Replicator queries the RLS Replica Catalog(s) to retrieve mappings from logical name to target name (URL)

Transfer Data
- Notification that the "Stage" RP value changed to "transfer"
- The Replicator creates a Transfer resource in RFT, passes the credential EPR, and sets the termination time
- The Transfer resource EPR is returned; the Transfer resource accesses the delegated credential resource
- RFT sets up GridFTP server transfers of the file(s); data moves between GridFTP server sites
- The Replicator periodically polls the "ResultStatus" RP via GetRP
- When "Done", it retrieves state information for each file transfer

Register Replica Information
- Notification that the "Stage" RP value changed to "register"
- The Replicator registers the new file mappings in the RLS Replica Catalog
- The RLS Replica Catalog sends an update of the new replica mappings to the Replica Index

Client Inspection of State
- Notification that the "Status" RP value changed to "Finished"
- The client inspects the Replicator state information for each replication in the request

Resource Termination
- The termination time (set by the client) eventually expires
- The resources are destroyed (Credential, Transfer, Replicator)

Performance Measurements: Wide Area Testing
- The destination for the pull-based transfers is located in Los Angeles
  - Dual-processor, 1.1 GHz Pentium III workstation with 1.5 GB of memory and 1 Gbit Ethernet
  - Runs a GT4 container and deploys services including RFT and DRS, as well as GridFTP and RLS
- The remote site where the desired data files are stored is located at Argonne National Laboratory in Illinois
  - Dual-processor, 3 GHz Intel Xeon workstation with 2 GB of memory and 1.1 TB of disk
  - Runs a GT4 container as well as GridFTP and RLS services

DRS Operations Measured
- Create the DRS Replicator resource
- Discover source files for replication using the local RLS Replica Location Index and remote RLS Local Replica Catalogs
- Initiate a Reliable File Transfer operation by creating an RFT resource
- Perform the RFT data transfer(s)
- Register the new replicas in the RLS Local Replica Catalog
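These operations lend themselves to simple client-side wall-clock timing; a trivial harness of that kind might look like this (illustrative only, not the instrumentation actually used in the paper):

```python
import time
from contextlib import contextmanager

timings_ms = {}

@contextmanager
def timed(operation):
    """Record wall-clock time for one DRS operation, in milliseconds."""
    start = time.perf_counter()
    yield
    timings_ms[operation] = (time.perf_counter() - start) * 1000.0

# usage, mirroring the operations listed above:
# with timed("Create Replicator Resource"):
#     replicator = client.drs.create_replicator(requests, credential=cred_epr)
```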

Experiment 1: Replicate 10 Files of Size 1 Gigabyte
Component of operation / time (milliseconds):
- Create Replicator Resource: 317.0
- Discover Files in RLS:
- Create RFT Resource:
- Transfer Using RFT:
- Register Replicas in RLS:
Observations:
- Data transfer time dominates
- Wide area data transfer rate of 67.4 Mbits/sec

Experiment 2: Replicate 1000 Files of Size 10 Megabytes
Component of operation / time (milliseconds):
- Create Replicator Resource:
- Discover Files in RLS: 9.8
- Create RFT Resource:
- Transfer Using RFT:
- Register Replicas in RLS:
Observations:
- Time to create the Replicator and RFT resources is larger
  - Need to store state for 1000 outstanding transfers
- Data transfer time still dominates
- Wide area data transfer rate of 85 Mbits/sec
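As a sanity check on the two reported rates (the total transfer times themselves are not preserved in this transcript, so these are derived estimates rather than the paper's measured values):

```python
def transfer_seconds(total_bytes, rate_mbit_per_s):
    """Time implied by a payload size and an average wide-area transfer rate."""
    return total_bytes * 8 / (rate_mbit_per_s * 1e6)

# Experiment 1: 10 files x 1 GB at 67.4 Mbit/s
print(transfer_seconds(10 * 1e9, 67.4) / 60)   # ~19.8 minutes
# Experiment 2: 1000 files x 10 MB at 85 Mbit/s
print(transfer_seconds(1000 * 1e7, 85) / 60)   # ~15.7 minutes
```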

Future Work
- We will continue performance testing of DRS:
  - Increasing the size of the files being transferred
  - Increasing the number of files per DRS request
- Add and refine DRS functionality as it is used by applications
  - E.g., add a push-based replication capability
- We plan to develop a suite of general, configurable, composable, high-level data management services