Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project. CHEP 2003, 24-28 March 2003. DataGrid is a project funded by the European Union.


Slide 1: Grid Data Management in Action
Experience in Running and Supporting Data Management Services in the EU DataGrid Project
Flavia Donno (Former EDG WP2, LCG)
CHEP 2003, March 2003
DataGrid is a project funded by the European Union

Slide 2: Talk Outline
- Introduction
- Replication Tools
- Architecture Overview
- GDMP and edg-replica-manager details
- History and Deployment
- Summary and Future Work

Authors:
- Heinz Stockinger (CERN/EP, CMS)
- Flavia Donno (CERN/IT LCG and INFN Pisa)
- Erwin Laure, Shazhad Muzaffar (CERN/EP)
- Giuseppe Andronico (INFN Catania)
- Peter Kunszt (CERN/IT)
- Paul Millar (PPARC)

Slide 3: Introduction
- Data management: large amounts of data at distributed sites
- Assumption: data is read-only
- Replication is required between Storage Elements (SEs)
- In a Grid environment:
  - File transfer from User Interface and Computing Nodes to Storage resources
  - Upload of files into the Grid

Slide 4: Replication Tools
- We have designed, developed and deployed two major replication packages:
  - GDMP (Grid Data Mirroring Package)
  - edg-replica-manager
- GDMP was a pioneering effort, started in the CMS collaboration. It later became a joint project between EDG and PPDG. It mirrors data between Storage Elements through a host subscription method.
- edg-replica-manager handles point-to-point replication of single files. The tool is built around the Globus Replica Manager and the Replica Catalogue/Replica Location Service libraries.

Slide 5: GDMP in detail
[Diagram: a GDMP client, Storage Elements 1, 2 and 3, and the Globus Replica Catalog or Replica Location Service]

Slide 6: Subscription Model
- All the sites that subscribe to a particular site get notified whenever there is an update in its catalog.
[Diagram: Site 2 and Site 3 subscribe to Site 1; each site keeps a subscriber list]
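The subscription model described on this slide can be sketched in a few lines of Python. This is an illustrative toy, not GDMP code: the `Site` class, its attributes, and the method names are hypothetical, and the "replication" is only an update of in-memory sets.

```python
class Site:
    """Hypothetical stand-in for a site with a local file catalog."""

    def __init__(self, name):
        self.name = name
        self.catalog = set()    # files held locally
        self.subscribers = []   # sites to notify when the catalog changes
        self.pending = []       # (source site name, file name) notifications

    def subscribe_to(self, producer):
        """Add this site to the producer's subscriber list."""
        if self not in producer.subscribers:
            producer.subscribers.append(self)

    def publish(self, filename):
        """Record a new file locally and notify every subscriber."""
        self.catalog.add(filename)
        for site in self.subscribers:
            site.pending.append((self.name, filename))

    def replicate_pending(self, sites_by_name):
        """Fetch every announced file still present at its source site."""
        for source, filename in self.pending:
            if filename in sites_by_name[source].catalog:
                self.catalog.add(filename)
        self.pending.clear()


# Sites 2 and 3 subscribe to site 1; site 1 then publishes a file.
site1, site2, site3 = Site("site1"), Site("site2"), Site("site3")
site2.subscribe_to(site1)
site3.subscribe_to(site1)
site1.publish("fileA")

sites = {site.name: site for site in (site1, site2, site3)}
site2.replicate_pending(sites)  # site 2 acts on its notification now
```

In this sketch notification and transfer are separate steps: site 3 has been told about `fileA` but holds no copy until it chooses to call `replicate_pending`.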

Slide 7: Architecture Overview
[Diagram: a GDMP client on a UI or WN issues gdmp_replicate_get fileA; GDMP servers on the SEs transfer the file via GridFTP; the SEs have MSS back-ends; the Globus Replica Catalog or Replica Location Service records the replicas]

GDMP Pros
- Very stable and scalable architecture
- Reliable and robust replication
  - retries on error
  - file checksumming
  - complex logging
- Users can control file transfer via local catalogues
- Back-ends available for actions to be performed on replication (MSS hooks, automatic replication, post-replication actions, ...)
- MSS interface

GDMP Cons
- Designed to handle mirroring among sites, not point-to-point replication
- Several steps involved in replication
- Configuration is difficult; it can be improved with the introduction of new Grid services
- No space management provided, since that is the responsibility of the SE service
- Error messages not always clear
- Recovery from errors sometimes requires manual intervention
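The "retries on error" and "file checksumming" points can be illustrated with a minimal sketch. It assumes a local file copy (`shutil.copyfile`) as a stand-in for a GridFTP transfer; the function names, the retry count, and the simulated failure are all hypothetical and do not reflect how GDMP is implemented.

```python
import os
import shutil
import tempfile
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def replicate(source, destination, max_attempts=3, copy=shutil.copyfile):
    """Copy a file, retrying until size and CRC match the source.

    Returns the number of attempts used. `copy` stands in for the real
    transfer mechanism (an assumption made for this sketch).
    """
    expected_size = os.path.getsize(source)
    expected_crc = crc32_of(source)
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, destination)
        except OSError:
            continue  # transfer failed: retry
        if (os.path.getsize(destination) == expected_size
                and crc32_of(destination) == expected_crc):
            return attempt
    raise RuntimeError(f"replication failed after {max_attempts} attempts")


def run_demo():
    """Replicate a temporary file with a transfer that fails once."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "fileA")
    destination = os.path.join(workdir, "fileA.replica")
    with open(source, "wb") as handle:
        handle.write(b"event data" * 1000)

    calls = {"count": 0}

    def flaky_copy(src, dst):
        calls["count"] += 1
        if calls["count"] == 1:
            raise OSError("simulated transfer failure")
        shutil.copyfile(src, dst)

    try:
        return replicate(source, destination, copy=flaky_copy)
    finally:
        shutil.rmtree(workdir)
```

Here `run_demo()` returns 2: the first transfer fails, and the second passes both the size and the CRC check.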

Slide 8: edg-replica-manager in detail
- Extends the Globus replica manager
- Client-side tool only
- Allows for replication (copy) and registration of files in the Replica Catalog (RC)
  - works with the LDAP-based Globus Replica Catalog and the Replica Location Service
- Keeps the RC consistent with stored data
- Uses GDMP's staging interface to stage to MSS
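The copy-and-register idea can be sketched as follows. This is a toy model under stated assumptions, not the edg-replica-manager implementation: the catalog is an in-memory mapping from logical file names to physical locations, storage is a plain dictionary, and every name and location string is hypothetical.

```python
class ReplicaCatalog:
    """Toy catalog mapping a logical file name to its physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, pfn):
        """Record that the physical file `pfn` is a replica of `lfn`."""
        self._replicas.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn):
        """Return the known physical locations of `lfn`, sorted."""
        return sorted(self._replicas.get(lfn, ()))


def copy_and_register(catalog, storage, lfn, source_pfn, dest_pfn):
    """Copy a file, then register the new replica.

    The catalog entry is added only after the copy has succeeded, so a
    failed copy never leaves a catalog entry pointing at missing data.
    """
    if source_pfn not in storage:
        raise FileNotFoundError(source_pfn)
    storage[dest_pfn] = storage[source_pfn]  # stand-in for a file transfer
    catalog.register(lfn, dest_pfn)


# Hypothetical locations on two storage elements.
storage = {"se1:/data/fileA": b"payload"}
catalog = ReplicaCatalog()
catalog.register("lfn:fileA", "se1:/data/fileA")
copy_and_register(catalog, storage, "lfn:fileA",
                  "se1:/data/fileA", "se2:/data/fileA")
```

Ordering the two steps this way is the simplest route to consistency in the sketch; it does not cover the case where registration itself fails after a successful copy.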

Slide 9: Architecture Overview
[Diagram: edg-replica-manager copies fileB and edg-rm-creg registers fileA, alongside a GDMP client; GDMP servers on the SEs transfer files via GridFTP; the SEs have MSS back-ends; the Globus Replica Catalog or Replica Location Service records the replicas]

Edg-rm/edg-rc Pros
- User-friendly interface
- Functional
- Third-party transfer available
- GSI authorization available for RM and RC
- Easy configuration

Edg-rm/edg-rc Cons
- RM: Error messages not always clear
- RM: No roll-back; no transactions
- RM: No complete interface to schema
- RC: Performance deteriorates with the number of entries
- RC: Centralized, non-scalable
- RC: No high-level user CLI for browsing
- RC: Schema not flexible

Slide 10: GDMP vs edg-replica-manager

GDMP (client-server)
- Replicates sets of files
- Replication between SEs only
- Mass storage interface
- Logical file attributes (size, timestamp, etc.; extensible)
- Subscription model
- Event notification
- CRC file size check
- Support for Objectivity/DB
- Automatic retries
- Support for multiple VOs

Replica Manager (client side only)
- Replicates single files
- Replication between SEs, or from UI or CE to SE
- Uses GDMP's Mass Storage interface at the SE

Slide 11: History: Replication tool development

GDMP 1.x (September 2000)
- First prototype of basic SE-SE replication of Objectivity files
- Based on Globus

GDMP 2.x (October 2001)
- General file replication tools (not only Objectivity files)
- Uses GridFTP + Globus Replica Catalog
- Full Mass Storage support

GDMP 3.x (April 2002)
- Split into client-side and server-side tools
- Improved server functionality/security
- Support for multiple VOs

Edg-replica-manager 1.x (May 2002)
- Based on the globus-replica-management and globus-replica-catalog libs

Edg-replica-manager 2.x (December 2002)
- Several improvements; Replica Location Service binding

GDMP 3.2.x (October 2002)
- RLS + several improvements

GDMP 4.0 (October 2002)
- Globus, RH 7.3, gcc 3.2

Slide 12: Deployment
- GDMP first used for High Level Trigger studies ("production") of HEP experiments in 2000/2001
  - Replication between SEs
- Later also introduced in the European DataGrid testbed:
  - Requirements changed: all user commands needed to be executed from a User Interface machine or from Worker Nodes of a Computing Element
  - Caused some redesign
- Both tools (GDMP and edg-replica-manager) are used in European and US testbeds
  - EDG: ATLAS, CMS, Alice and LHCb stress tests
  - WorldGrid: first transatlantic testbed, with interoperable tools
  - LCG-0: deployed and interoperable with WorldGrid and GLUE testbeds

We thank our user community for valuable feedback.

Slide 13: Summary and Future Work
- The first generation of EDG replica management tools satisfies the basic use cases and requirements
- Client-only tools are simple to use but provide no server-side logging
- Limitations of certain services were demonstrated: Globus and EDG are working together to design and implement new tools
- A lot of experience gained: new software tools are under development (see the talk "Next-Generation EU DataGrid Data Management Services")

Thanks to the EU and our national funding agencies for their support of this work.