Don Quijote: Data Management for the ATLAS Automatic Production System
Miguel Branco – CERN ATC


Don Quijote - Status & Plans, 10/05/2004

Slide 2: Overview

* Don Quijote
    o New Focus
* Functionalities
    o POOL
* Architecture
* Current Status
    o NorduGrid
    o US Grid 3(+)
    o LCG-2
    o Integration with ATLAS prodsys
* Future plans

Slide 3: Don Quijote

* Data Management for the ATLAS Automatic Production System
* Allow transparent registration and movement of replicas between all grid “flavors” used by ATLAS
    o US Grid
    o NorduGrid
    o LCG
    o (support for legacy systems might be introduced soon)
* Avoid creating yet another catalog
    o which grid middleware wouldn't recognize (e.g. Resource Brokers)
    o use existing catalogs and data management tools
    o find common features between tools and catalogs
    o bridge them and provide a unified interface
* Accessible as a service
    o lightweight clients
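The "bridge them and provide a unified interface" idea can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the method signatures and the in-memory backend are stand-ins chosen for illustration, not the actual Don Quijote code or any real catalog client.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class ReplicaCatalog(ABC):
    """Hypothetical common interface; not the actual Don Quijote API."""

    @abstractmethod
    def search(self, lpn: str) -> list[str]:
        """Return the replica URLs registered for a logical name."""

    @abstractmethod
    def add(self, lpn: str, url: str) -> None:
        """Register a replica URL under a logical name."""


class InMemoryCatalog(ReplicaCatalog):
    """Stand-in for a real per-flavor catalog backend; keeps a dict."""

    def __init__(self) -> None:
        self._replicas: dict[str, list[str]] = {}

    def search(self, lpn: str) -> list[str]:
        return list(self._replicas.get(lpn, []))

    def add(self, lpn: str, url: str) -> None:
        self._replicas.setdefault(lpn, []).append(url)


class UnifiedCatalog:
    """Routes calls to one backend per grid flavor and merges the results."""

    def __init__(self, backends: dict[str, ReplicaCatalog]) -> None:
        self._backends = backends

    def add(self, flavor: str, lpn: str, url: str) -> None:
        self._backends[flavor].add(lpn, url)

    def search(self, lpn: str) -> dict[str, list[str]]:
        # Ask every flavor, then keep only the flavors that hold a replica.
        hits = {name: cat.search(lpn) for name, cat in self._backends.items()}
        return {name: urls for name, urls in hits.items() if urls}


# Example with made-up names and URLs.
dq = UnifiedCatalog({
    "lcg": InMemoryCatalog(),
    "nordugrid": InMemoryCatalog(),
    "grid3": InMemoryCatalog(),
})
dq.add("lcg", "dc2/evgen.0001.root", "gsiftp://se.example.org/data/evgen.0001.root")
print(dq.search("dc2/evgen.0001.root"))
```

The point of the design is that a client only sees `UnifiedCatalog`; each grid flavor keeps its own existing catalog behind the common interface, so no additional catalog is created.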

Slide 4: Don Quijote – new focus

* Provide end-users with a single tool to manage data files
    o Integrates all the tools that users would otherwise have to know about into a single one. E.g.:
        - FCpublish, FCregister, … (POOL File Catalogs)
        - edg-rm, edg-rmc, edg-lrc, … (EDG)
        - globus-rls-cli, globus-url-copy, … (Globus)
        - ldapsearch, … (querying information system)
        - rfdir, rfcp, … (common use of Castor)
* Acts as a POOL-aware Replica Manager
* Eases security requirements for end-users
    o Temporarily!

Slide 5: Functionalities

LPN = Logical Collection Name + Logical File Name (unique)

Replica Catalogs Manipulation
* search | fullSearch | searchHosts ( lpn )
* add[Restricted] ( lpn, url [, guid, fsize, md5sum ] )
* addTemporary[Restricted] ( lpn, url, nrhours [, guid, fsize, md5sum ] )
* keepUntil ( url, nrhours )
* makePermanent ( url )
* removeReplica ( url )
* remove ( lpn )
* rename ( old lpn, new lpn )

File Movement
* stageOut ( url )
* getToDestination ( src SE, lpn, dest )
* putToSE ( src turl, lpn, dest SE [, guid, md5sum ] )
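To make the catalog operations concrete, here is a toy in-memory model in Python. The method names mirror the slide, but the semantics are assumptions made for illustration (in particular, that a temporary replica simply stops being returned by a search once its lifetime in hours has passed); the real tool talks to grid catalogs rather than a dict.

```python
import time


class ToyReplicaCatalog:
    """Toy model: operation names follow the slide, behavior is assumed."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # lpn -> {url: expiry timestamp, or None for a permanent replica}
        self._replicas = {}

    def add(self, lpn, url):
        self._replicas.setdefault(lpn, {})[url] = None

    def add_temporary(self, lpn, url, nrhours):
        self._replicas.setdefault(lpn, {})[url] = self._clock() + nrhours * 3600

    def keep_until(self, url, nrhours):
        # Reset the lifetime of an existing replica, counted from now.
        for urls in self._replicas.values():
            if url in urls:
                urls[url] = self._clock() + nrhours * 3600

    def make_permanent(self, url):
        for urls in self._replicas.values():
            if url in urls:
                urls[url] = None

    def search(self, lpn):
        now = self._clock()
        urls = self._replicas.get(lpn, {})
        return sorted(u for u, exp in urls.items() if exp is None or exp > now)

    def remove_replica(self, url):
        for urls in self._replicas.values():
            urls.pop(url, None)

    def remove(self, lpn):
        self._replicas.pop(lpn, None)

    def rename(self, old_lpn, new_lpn):
        self._replicas[new_lpn] = self._replicas.pop(old_lpn)


# Example with a fake clock (seconds) and made-up names and URLs.
now = [0.0]
cat = ToyReplicaCatalog(clock=lambda: now[0])
cat.add("dc2/hits.0001", "gsiftp://se1.example.org/hits.0001")
cat.add_temporary("dc2/hits.0001", "gsiftp://se2.example.org/hits.0001", nrhours=2)
print(cat.search("dc2/hits.0001"))  # both replicas are visible
now[0] = 3 * 3600                   # three hours later
print(cat.search("dc2/hits.0001"))  # only the permanent replica remains
```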

Slide 6: Functionalities - POOL

* Integrates file movement with POOL XML File Catalogs
    o Uses DQ + POOL FC command line tools
    o Python scripts
* Use-cases:
    o Get local copy of file and generate or update corresponding PoolFileCatalog.xml
        - (to provide input data and input POOL XML catalog for a job)
    o Copy and register a local copy of a file to a grid flavor given UUID in the local PoolFileCatalog.xml
        - (to register output data from a job)
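The "generate or update the corresponding PoolFileCatalog.xml" use-case could look roughly like the sketch below. It is written against a simplified catalog layout that is assumed here (a root element holding `File` entries keyed by an `ID` attribute, each with `physical`/`pfn` and `logical`/`lfn` children); a real POOL catalog carries more detail, and the slides say the actual scripts call the POOL FC command line tools rather than editing XML directly. The function name and example values are hypothetical.

```python
import xml.etree.ElementTree as ET


def _child(parent, tag):
    """Return the first child with this tag, creating it if absent."""
    node = parent.find(tag)
    return node if node is not None else ET.SubElement(parent, tag)


def register_local_copy(catalog_xml, guid, pfn, lfn):
    """Add or update one File entry in a simplified POOL-style catalog.

    Takes and returns the catalog as an XML string; the layout is an
    assumption made for this sketch.
    """
    root = ET.fromstring(catalog_xml)
    entry = next((f for f in root.findall("File") if f.get("ID") == guid), None)
    if entry is None:
        entry = ET.SubElement(root, "File", ID=guid)
    physical = _child(entry, "physical")
    logical = _child(entry, "logical")
    # Only add names that are not already recorded, so reruns are harmless.
    if not any(p.get("name") == pfn for p in physical.findall("pfn")):
        ET.SubElement(physical, "pfn", name=pfn)
    if not any(l.get("name") == lfn for l in logical.findall("lfn")):
        ET.SubElement(logical, "lfn", name=lfn)
    return ET.tostring(root, encoding="unicode")


# Example with a made-up GUID, file path and logical name.
empty = "<POOLFILECATALOG></POOLFILECATALOG>"
updated = register_local_copy(
    empty, "0A1B-EXAMPLE-GUID", "file:/tmp/evgen.0001.root", "dc2/evgen.0001.root"
)
print(updated)
```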

Slide 7: Architecture

* Python Client
    o C++ client library
    o Configuration file indicating endpoint of each server
* Servers
    o Per grid-flavor
    o GSI and insecure
    o Configuration file

User interface tool written in Python
Servers and client library written in C++
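A client-side configuration file listing one server endpoint per grid flavor might be read as in this sketch. The INI layout, the section and key names and the URLs are all invented for illustration; the slides do not describe the real file format.

```python
import configparser

# Hypothetical configuration: one section per grid flavor.
EXAMPLE_CONFIG = """
[lcg]
endpoint = https://dq-lcg.example.org:8443
secure = yes

[nordugrid]
endpoint = http://dq-ng.example.org:8080
secure = no
"""


def load_endpoints(text):
    """Return {flavor: (endpoint, uses_secure_transport)} from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: (parser[name]["endpoint"], parser[name].getboolean("secure"))
        for name in parser.sections()
    }


endpoints = load_endpoints(EXAMPLE_CONFIG)
for flavor, (url, secure) in endpoints.items():
    print(flavor, url, "GSI" if secure else "insecure")
```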

Slide 8: Changes on Server-side

* Why was server-side code rewritten?
    o Partly because of CMS experience
        - Persistent connections were necessary
        - Connection pooling mechanism
        - Each request could not instantiate a connection to the grid catalog – too slow!
    o Partly from our initial experience
        - Flexible security mechanism: either provide a single certificate for all, or delegate credentials
* Initial version:
    o A command line tool for each grid flavor with the same syntax and same “output”
    o The Clarens server forked a process that executed each request by calling the command line tool
    o This proved to be inefficient and too restrictive – e.g. could not maintain persistent connections across multiple requests!
* Therefore,
    o Server code was built by extending the command line tools – each tool is now a daemon
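The motivation for persistent connections and pooling can be shown with a small Python sketch. This is a generic connection pool, not the server's actual C++ implementation; `fake_connect` is a hypothetical stand-in for opening a (slow) connection to a grid catalog.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Opens a fixed number of connections once and reuses them."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free, then hands it back afterwards.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)


opened = []


def fake_connect():
    """Stand-in for an expensive catalog connection; records each open."""
    opened.append(object())
    return opened[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):          # ten requests served by the daemon
    with pool.connection() as conn:
        pass                 # a real server would query the catalog here
print(len(opened))           # 2: connections were opened once, not per request
```

With the fork-per-request design described on the slide, the same ten requests would have paid the connection cost ten times.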

Slide 9: Current Status

* Current structure (component diagram):
    o DqCore
    o DqFakePoolFileCatalog, DqLcgPoolFileCatalog
    o DqGlobusRls, DqPoolRls
    o DqClassicReplicaAccess, DqLcgReplicaAccess
    o DqLcgInfoService, DqVdtInfoService, DqNgInfoService
    o DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI
    o DqServerLcg, DqServerNg, DqServerVdt
    o dms.py
* Diagram labels: Python Module; C++ → Python wrapper (user interface); C++ Client Module

Slide 10: NorduGrid

* Globus RLS 2.x
* Only Classic Storage Elements (GridFTP servers)
* Information System
    o Connects to LDAP
    o Special attributes in the RLS

Components: DqCore, DqFakePoolFileCatalog, DqGlobusRls, DqClassicReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqNgInfoService, DqServerNg

Slide 11: LCG-2

* EDG/LCG RLS (v2.2)
* GFAL support:
    o SRM/Castor support
    o SRM/dCache support
    o Classic Storage Element support
* Information System:
    o LDAP-based (MDS)
* Native POOL Support
    o Using POOL

Components: DqCore, DqLcgPoolFileCatalog, DqPoolRls, DqLcgReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqLcgInfoService, DqServerLcg

Slide 12: US Grid 3(+)

* Globus RLS 2.x
* At the moment DQ supports only Classic Storage Elements (GridFTP servers)
* No “information system” interface
    o DQ creates a “dummy” information system which consists of a local configuration file

Components: DqCore, DqFakePoolFileCatalog, DqGlobusRls, DqClassicReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqVdtInfoService, DqServerVdt
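The component lists on the three per-flavor slides can be summarized as a lookup table. The component names below are taken from the slides; the role labels (`file_catalog`, `replica_catalog`, and so on) and the helper functions are assumptions added here to make the comparison readable.

```python
# Components that appear on every per-flavor slide.
COMMON = ("DqCore", "DqConfigFile", "DqFactory", "DqInterface", "DqMonitor", "DqUI")

# Flavor-specific components; the role keys are labels assumed for this sketch.
FLAVORS = {
    "NorduGrid": {
        "file_catalog": "DqFakePoolFileCatalog",
        "replica_catalog": "DqGlobusRls",
        "replica_access": "DqClassicReplicaAccess",
        "info_service": "DqNgInfoService",
        "server": "DqServerNg",
    },
    "LCG-2": {
        "file_catalog": "DqLcgPoolFileCatalog",
        "replica_catalog": "DqPoolRls",
        "replica_access": "DqLcgReplicaAccess",
        "info_service": "DqLcgInfoService",
        "server": "DqServerLcg",
    },
    "US Grid 3(+)": {
        "file_catalog": "DqFakePoolFileCatalog",
        "replica_catalog": "DqGlobusRls",
        "replica_access": "DqClassicReplicaAccess",
        "info_service": "DqVdtInfoService",
        "server": "DqServerVdt",
    },
}


def components(flavor):
    """Full component list for one grid flavor."""
    return list(COMMON) + list(FLAVORS[flavor].values())


def differing_roles(a, b):
    """Roles filled by different components in flavors a and b."""
    return sorted(r for r in FLAVORS[a] if FLAVORS[a][r] != FLAVORS[b][r])


print(differing_roles("NorduGrid", "US Grid 3(+)"))  # ['info_service', 'server']
```

Read this way, NorduGrid and US Grid 3(+) share their catalog and storage access components and differ only in the information service and server, while LCG-2 swaps every flavor-specific component.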

Slide 13: Integration with ATLAS prodsys

* Executors are using their “native” grid tools to do file registration
    o But they add the extra metadata attributes required by DQ
    o This allows integration with DQ
* Windmill is using DQ
    o To locate replicas of files
    o To rename logical files to their final names (after validation)
    o This week: move files across grids so that each executor finds at least one replica of every file required by its jobs
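The cross-grid step ("each executor finds at least one replica of every file required by its jobs") amounts to a small planning problem, sketched below in Python. The function, its data shapes and the rule for picking a source grid (alphabetically first holder) are assumptions for illustration, not how Windmill or DQ actually decide.

```python
def plan_transfers(required, replicas):
    """Work out which files must be copied to which grid.

    required: {grid flavor: set of logical names its jobs need}
    replicas: {logical name: set of grid flavors already holding a copy}
    Returns a list of (logical name, source flavor, destination flavor).
    """
    transfers = []
    for dest, lpns in sorted(required.items()):
        for lpn in sorted(lpns):
            holders = replicas.get(lpn, set())
            if dest in holders:
                continue  # this grid already has a replica
            if not holders:
                raise LookupError(f"no replica of {lpn} on any grid")
            # Arbitrary but deterministic source choice for the sketch.
            transfers.append((lpn, min(holders), dest))
    return transfers


# Example with made-up file and flavor names.
required = {"lcg": {"a", "b"}, "nordugrid": {"a"}}
replicas = {"a": {"lcg"}, "b": {"grid3", "nordugrid"}}
print(plan_transfers(required, replicas))
# [('b', 'grid3', 'lcg'), ('a', 'lcg', 'nordugrid')]
```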

Slide 14: Future plans

* Better integration with POOL
    o Must come from end-user experience
* Better end-user documentation and support
    o For now, focus has been only on the Automatic Production System
* Get “best” replica (not high priority)
    o within a grid
    o between grids
* Monitoring
    o Still being discussed…
* Reliable transfer service
    o Using MySQL database to manage transfers and automatic retries
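A database-backed transfer queue with automatic retries could be organized as in the sketch below. Note the substitution: the slide plans a MySQL database, while this example uses Python's built-in `sqlite3` so that it runs without a server. The table layout, function names, retry limit and the `flaky_copy` stand-in are all hypothetical.

```python
import sqlite3


def open_queue():
    """Create an in-memory queue table (SQLite standing in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE transfers (
               id INTEGER PRIMARY KEY,
               src TEXT, dest TEXT,
               attempts INTEGER DEFAULT 0,
               state TEXT DEFAULT 'queued')"""
    )
    return db


def enqueue(db, src, dest):
    db.execute("INSERT INTO transfers (src, dest) VALUES (?, ?)", (src, dest))


def run_pending(db, copy, max_attempts=3):
    """Try every queued transfer once; failures stay queued until the limit."""
    rows = db.execute(
        "SELECT id, src, dest, attempts FROM transfers WHERE state = 'queued'"
    ).fetchall()
    for tid, src, dest, attempts in rows:
        try:
            copy(src, dest)
            state = "done"
        except OSError:
            state = "queued" if attempts + 1 < max_attempts else "failed"
        db.execute(
            "UPDATE transfers SET attempts = attempts + 1, state = ? WHERE id = ?",
            (state, tid),
        )
    db.commit()


calls = []


def flaky_copy(src, dest):
    """Stand-in for a real file copy: fails on the first call only."""
    calls.append(src)
    if len(calls) == 1:
        raise OSError("simulated transfer failure")


db = open_queue()
enqueue(db, "gsiftp://se1.example.org/f", "gsiftp://se2.example.org/f")
run_pending(db, flaky_copy)  # first attempt fails, the row stays queued
run_pending(db, flaky_copy)  # the retry succeeds
print(db.execute("SELECT attempts, state FROM transfers").fetchall())
# [(2, 'done')]
```

Keeping the attempt count and state in the database means a restarted service can pick up where it left off, which is the main reason to back the queue with persistent storage.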

Slide 15: Future plans

* Release command line tools appropriate for end-users
    o Request has been made to provide such tools for the Combined Test Beam effort
* Provide servers as Pacman-caches
* Much to improve
    o Reliability
    o Easy installation of client tool for users outside “grid”
        - Get local copies of files to non-grid machine
        - ? wrap in Pacman the minimal Globus GridFTP libraries
* As true interoperability comes, Don Quijote goes…
    o Common information schema & similar catalogs
    o Common interface to storage resource “managers”