Don Quijote Data Management for the ATLAS Automatic Production System Miguel Branco – CERN ATC


1 Don Quijote
Data Management for the ATLAS Automatic Production System
Miguel Branco – CERN ATC
miguel.branco@cern.ch

2 Overview (Don Quijote - Status & Plans, 10/05/2004)
• Don Quijote
  o New focus
• Functionalities
  o POOL
• Architecture
• Current status
  o NorduGrid
  o US Grid 3(+)
  o LCG-2
  o Integration with ATLAS prodsys
• Future plans

3 Don Quijote
• Data Management for the ATLAS Automatic Production System
• Allow transparent registration and movement of replicas between all grid “flavors” used by ATLAS
  o US Grid
  o NorduGrid
  o LCG
  o (support for legacy systems might be introduced soon)
• Avoid creating yet another catalog
  o which grid middleware (e.g. Resource Brokers) wouldn't recognize
  o use existing catalogs and data management tools
  o find common features between tools and catalogs
  o bridge them and provide a unified interface
• Accessible as a service
  o lightweight clients

4 Don Quijote – new focus
• Provide a single tool to end-users to manage data files
  o Integrates all tools that users would have to know about into a single one. E.g.:
    - FCpublish, FCregister, … (POOL File Catalogs)
    - edg-rm, edg-rmc, edg-lrc, … (EDG)
    - globus-rls-cli, globus-url-copy, … (Globus)
    - ldapsearch, … (querying information system)
    - rfdir, rfcp, … (common use of Castor)
• Acts as a POOL-aware Replica Manager
• Eases security requirements for end-users
  o Temporarily!

5 Functionalities
LPN = Logical Collection Name + Logical File Name (unique)
Replica Catalogs Manipulation
• search | fullSearch | searchHosts ( lpn )
• add[Restricted] ( lpn, url [, guid, fsize, md5sum ] )
• addTemporary[Restricted] ( lpn, url, nrhours [, guid, fsize, md5sum ] )
• keepUntil ( url, nrhours )
• makePermanent ( url )
• removeReplica ( url )
• remove ( lpn )
• rename ( old lpn, new lpn )
File Movement
• stageOut ( url )
• getToDestination ( src SE, lpn, dest )
• putToSE ( src turl, lpn, dest SE [, guid, md5sum ] )
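To make the catalog operations listed above concrete, here is a minimal in-memory sketch in Python. It is not the Don Quijote implementation: the class name `ToyCatalog`, the snake_case method names, the `now` parameter and the expiry rule (a temporary replica stops being returned by `search` once `nrhours` have passed) are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    url: str
    expires_at: Optional[float] = None  # None means permanent (assumed convention)


class ToyCatalog:
    """In-memory stand-in for a replica catalog; NOT the Don Quijote code."""

    def __init__(self) -> None:
        # Maps an LPN (collection name + file name) to its known replicas.
        self._replicas: Dict[str, List[Replica]] = {}

    def add(self, lpn: str, url: str) -> None:
        self._replicas.setdefault(lpn, []).append(Replica(url))

    def add_temporary(self, lpn: str, url: str, nrhours: float,
                      now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._replicas.setdefault(lpn, []).append(Replica(url, now + nrhours * 3600))

    def search(self, lpn: str, now: Optional[float] = None) -> List[str]:
        # Return the URLs of replicas that are permanent or not yet expired.
        now = time.time() if now is None else now
        return [r.url for r in self._replicas.get(lpn, [])
                if r.expires_at is None or r.expires_at > now]

    def make_permanent(self, url: str) -> None:
        for replicas in self._replicas.values():
            for replica in replicas:
                if replica.url == url:
                    replica.expires_at = None

    def remove_replica(self, url: str) -> None:
        for lpn in list(self._replicas):
            self._replicas[lpn] = [r for r in self._replicas[lpn] if r.url != url]

    def rename(self, old_lpn: str, new_lpn: str) -> None:
        # Move all replicas from the old logical name to the new one.
        self._replicas[new_lpn] = self._replicas.pop(old_lpn)
```

Keying every replica on the LPN mirrors the slide's point that the LPN is unique, so `rename` only has to move one dictionary entry.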

6 Functionalities - POOL
• Integrates file movement with POOL XML File Catalogs
  o Uses DQ + POOL FC command line tools
  o Python scripts
• Use-cases:
  o Get local copy of file and generate or update corresponding PoolFileCatalog.xml
    - (to provide input data and input POOL XML catalog for a job)
  o Copy and register a local copy of a file to a grid flavor given UUID in the local PoolFileCatalog.xml
    - (to register output data from a job)
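The first use-case (generate or update a local PoolFileCatalog.xml after fetching a file) can be sketched as below. The slides say the real scripts call the POOL FC command line tools; this sketch instead edits an XML tree directly with Python's standard library. The element and attribute names (`POOLFILECATALOG`, `File`, `physical`, `pfn`, `logical`, `lfn`, `filetype="ROOT_All"`) follow the commonly seen POOL XML catalog layout and should be treated as an assumption to check against a real catalog; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def register_in_pool_catalog(root, guid, pfn, lfn):
    """Add or update the <File> entry for `guid` in an in-memory catalog tree.

    Assumed layout (verify against a real PoolFileCatalog.xml):
      <POOLFILECATALOG>
        <File ID="guid">
          <physical><pfn filetype="ROOT_All" name="..."/></physical>
          <logical><lfn name="..."/></logical>
        </File>
      </POOLFILECATALOG>
    """
    entry = next((f for f in root.findall("File") if f.get("ID") == guid), None)
    if entry is None:
        entry = ET.SubElement(root, "File", ID=guid)
        ET.SubElement(entry, "physical")
        ET.SubElement(entry, "logical")

    # Record the local copy as a physical file name, once.
    physical = entry.find("physical")
    if not any(p.get("name") == pfn for p in physical.findall("pfn")):
        ET.SubElement(physical, "pfn", filetype="ROOT_All", name=pfn)

    # Record the logical file name, once.
    logical = entry.find("logical")
    if not any(l.get("name") == lfn for l in logical.findall("lfn")):
        ET.SubElement(logical, "lfn", name=lfn)
    return entry
```

Calling the function twice with the same arguments leaves a single entry, which is the "generate or update" behavior the use-case asks for.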

7 Architecture
• Python Client
  o C++ client library
  o Configuration file indicating endpoint of each server
• Servers
  o Per grid-flavor
  o GSI and insecure
  o Configuration file
User interface tool written in Python
Servers and client library written in C++
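The slides do not show the client configuration file, so the following is only a sketch of what "one endpoint per grid-flavor server" could look like. The INI layout, the section names, the `endpoint` and `secure` keys and the example.org hosts are all hypothetical.

```python
import configparser

# Hypothetical client configuration: one section per grid flavor.
EXAMPLE_CONFIG = """
[lcg]
endpoint = https://dq-lcg.example.org:8443
secure = yes

[nordugrid]
endpoint = https://dq-ng.example.org:8443
secure = yes

[usgrid3]
endpoint = http://dq-vdt.example.org:8080
secure = no
"""


def load_endpoints(text):
    """Return {flavor: (endpoint, uses_gsi)} parsed from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        flavor: (parser[flavor]["endpoint"], parser[flavor].getboolean("secure"))
        for flavor in parser.sections()
    }


endpoints = load_endpoints(EXAMPLE_CONFIG)
```

The boolean flag stands in for the "GSI and insecure" server variants mentioned on the slide.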

8 Changes on Server-side
• Why was server-side code rewritten?
  o Partly because of CMS experience
    - Persistent connections were necessary
    - Connection pooling mechanism
    - Each request could not instantiate a connection to the grid catalog – too slow!
  o Partly from our initial experience
    - Flexible security mechanism: either provide a single certificate for all, or delegate credentials
• Initial version:
  o A command line tool for each grid flavor with the same syntax and same “output”
  o Clarens server was forking out a process that executed the request by calling the command line tool
  o This proved to be inefficient and too restrictive – e.g. could not maintain persistent connections across multiple requests!
• Therefore,
  o Server code was built by extending the command line tools – each tool is now a daemon
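The connection pooling idea above (open catalog connections once in a long-running daemon and reuse them across requests) can be illustrated with a small generic pool. The actual servers are written in C++; this Python sketch, including the `ConnectionPool` name and the `connect` callback, is an assumption for illustration only.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keeps a fixed set of open connections and lends them to requests."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable that opens a catalog
        # connection; it is called `size` times, once, at start-up.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self):
        # Block until a connection is free, then hand it back afterwards
        # even if the request raised an exception.
        conn = self._idle.get()
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

A request handler would wrap its catalog calls in `with pool.connection() as conn:`, so no request pays the connection set-up cost that made the forked command line tools slow.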

9 Current Status
Current structure (diagram; only the labels survive):
• Components: DqCore, DqFakePoolFileCatalog, DqGlobusRls, DqLcgPoolFileCatalog, DqClassicReplicaAccess, DqLcgReplicaAccess, DqPoolRls, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI
• Information services: DqLcgInfoService, DqVdtInfoService, DqNgInfoService
• Servers: DqServerLcg, DqServerNg, DqServerVdt
• Other labels: dms.py; Python Module; C++ to Python wrapper (user interface); C++ Client Module

10 NorduGrid
• Globus RLS 2.x
• Only Classic Storage Elements (GridFTP servers)
• Information System
  o Connects to LDAP
  o Special attributes in the RLS
Components (diagram): DqCore, DqFakePoolFileCatalog, DqGlobusRls, DqClassicReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqNgInfoService, DqServerNg

11 LCG-2
• EDG/LCG RLS (v2.2)
• GFAL support:
  o SRM/Castor support
  o SRM/dCache support
  o Classic Storage Element support
• Information System:
  o LDAP-based (MDS)
• Native POOL Support
  o Using POOL-1.6.5
Components (diagram): DqCore, DqLcgPoolFileCatalog, DqPoolRls, DqLcgReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqLcgInfoService, DqServerLcg

12 US Grid 3(+)
• Globus RLS 2.x
• At the moment DQ supports only Classic Storage Elements (GridFTP servers)
• No “information system” interface
  o DQ creates a “dummy” information system which consists of a local configuration file
Components (diagram): DqCore, DqFakePoolFileCatalog, DqGlobusRls, DqClassicReplicaAccess, DqConfigFile, DqFactory, DqInterface, DqMonitor, DqUI, DqVdtInfoService, DqServerVdt
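The three per-grid slides list which components each server is built from. The sketch below simply records those lists as Python sets so the overlap can be computed; the component names come from the slides, while the dictionary keys and the `components_for` helper are hypothetical.

```python
# Components that appear on all three per-grid slides.
COMMON = {"DqCore", "DqConfigFile", "DqFactory", "DqInterface", "DqMonitor", "DqUI"}

# Components that differ per grid flavor (keys are illustrative labels).
FLAVOR_SPECIFIC = {
    "nordugrid": {"DqFakePoolFileCatalog", "DqGlobusRls", "DqClassicReplicaAccess",
                  "DqNgInfoService", "DqServerNg"},
    "lcg": {"DqLcgPoolFileCatalog", "DqPoolRls", "DqLcgReplicaAccess",
            "DqLcgInfoService", "DqServerLcg"},
    "usgrid3": {"DqFakePoolFileCatalog", "DqGlobusRls", "DqClassicReplicaAccess",
                "DqVdtInfoService", "DqServerVdt"},
}


def components_for(flavor):
    """Return every component listed on the slide for one grid flavor."""
    return COMMON | FLAVOR_SPECIFIC[flavor]


# NorduGrid and US Grid 3 share their catalog and storage access components
# and differ only in the information service and the server.
shared_ng_vdt = FLAVOR_SPECIFIC["nordugrid"] & FLAVOR_SPECIFIC["usgrid3"]
```

Read this way, the slides show NorduGrid and US Grid 3 reusing the same Globus RLS and Classic SE components, while LCG-2 has its own catalog and replica access classes.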

13 Integration with ATLAS prodsys
• Executors are using their “native” grid tools to do file registration
  o But are adding the extra metadata attributes required by DQ
  o This allows integration with DQ
• Windmill is using DQ
  o To locate replicas of files
  o To rename logical files to their final names (after validation)
  o This week: move files across grids so that each executor finds at least one replica of every file required by its jobs
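The last point above (each executor should find at least one replica of every file its jobs need) amounts to a set difference per grid. A minimal sketch, assuming the required files and replica locations are already known as plain Python dictionaries; the function name and data shapes are hypothetical and not taken from Windmill or DQ.

```python
def missing_replicas(required, locations):
    """Work out which files must be copied to which grid.

    required:  {grid: set of LPNs needed by jobs running on that grid}
    locations: {LPN: set of grids that already hold a replica}
    Returns {grid: sorted list of LPNs with no replica on that grid}.
    """
    plan = {}
    for grid, lpns in required.items():
        missing = sorted(lpn for lpn in lpns
                         if grid not in locations.get(lpn, set()))
        if missing:
            plan[grid] = missing
    return plan
```

Each entry in the returned plan would then translate into a cross-grid copy followed by a registration in the destination grid's catalog.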

14 Future plans
• Better integration with POOL
  o Must come from end-users' experience
• Better end-user documentation and support
  o For now, focus has been only on the Automatic Production System
• Get “best” replica (not high priority)
  o within a grid
  o between grids
• Monitoring
  o Still being discussed…
• Reliable transfer service
  o Using MySQL database to manage transfers and automatic retries
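The planned reliable transfer service (a database that tracks transfers and retries them automatically) is not described further in the slides. The sketch below shows one way such a queue could work. It uses Python's built-in `sqlite3` purely as a stand-in for the MySQL database named on the slide, and the table layout, function names and retry limit are all assumptions.

```python
import sqlite3


def make_queue():
    """Create an in-memory transfer queue (sqlite3 stands in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers ("
        " id INTEGER PRIMARY KEY, src TEXT, dest TEXT,"
        " attempts INTEGER DEFAULT 0, status TEXT DEFAULT 'pending')"
    )
    return db


def enqueue(db, src, dest):
    db.execute("INSERT INTO transfers (src, dest) VALUES (?, ?)", (src, dest))
    db.commit()


def run_pending(db, copy, max_attempts=3):
    """Try every pending transfer once; give up after max_attempts failures.

    `copy` is any callable copy(src, dest) that raises OSError on failure.
    """
    rows = db.execute(
        "SELECT id, src, dest, attempts FROM transfers WHERE status = 'pending'"
    ).fetchall()
    for transfer_id, src, dest, attempts in rows:
        try:
            copy(src, dest)
            status = "done"
        except OSError:
            # Leave the row pending so the next pass retries it,
            # unless the retry budget is exhausted.
            status = "failed" if attempts + 1 >= max_attempts else "pending"
        db.execute(
            "UPDATE transfers SET attempts = ?, status = ? WHERE id = ?",
            (attempts + 1, status, transfer_id),
        )
    db.commit()
```

Because the state lives in the database rather than in the process, a daemon that restarts can pick up pending transfers where it left off, which is the main reason to back the queue with a database at all.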

15 Future plans
• Release command line tools appropriate for end-users
  o Request has been made to provide such tools for the Combined Test Beam effort
• Provide servers as Pacman-caches
• Much to improve
  o Reliability
  o Easy installation of client tool for users outside “grid”
    - Get local copies of files to non-grid machine
    - ? wrap in Pacman the minimal Globus GridFTP libraries
• As true interoperability comes, Don Quijote goes…
  o Common information schema & similar catalogs
  o Common interface to storage resource “managers”

