
1 MAGDA
Roger Jones, UCL, 16th December 2002

2 MAGDA
RWL Jones, Lancaster University
- Main authors: Wensheng Deng, Torre Wenaus
- Magda is a distributed data manager prototype for grid-resident data.
- The system is designed for rapid and flexible evolution of the database schema and surrounding infrastructure, and for integration and interchange of third-party components.
- Web service (via Perl SOAP) and command-line interfaces
- C++, Java and Perl APIs for access to all components of the database
  - The C++ and Java APIs are autogenerated by Perl scripts from the MySQL database, so they are always synchronised with it
- Developed as part of PPDG
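The point about autogenerated APIs can be illustrated with a small schema-driven generator. This is a hypothetical sketch in Python rather than Magda's actual Perl scripts: the table name, column names and type mapping are assumptions, and the schema is hard-coded instead of being read from MySQL (a real generator might query information_schema.COLUMNS).

```python
# Hypothetical sketch: generate a Java accessor class from a table schema.
# Magda's generators were Perl scripts reading the live MySQL schema; here the
# schema is a hard-coded list so the example runs on its own.

# Assumed mapping from MySQL column types to Java types.
SQL_TO_JAVA = {
    "int": "int",
    "bigint": "long",
    "varchar": "String",
    "datetime": "java.sql.Timestamp",
}


def camel(name: str) -> str:
    """Turn 'file_size' into 'FileSize' for class and accessor names."""
    return "".join(part.capitalize() for part in name.split("_"))


def generate_java_class(table: str, columns: list[tuple[str, str]]) -> str:
    """Emit a Java class with one private field plus getter/setter per column."""
    lines = [f"public class {camel(table)} {{"]
    for col, sql_type in columns:
        lines.append(f"    private {SQL_TO_JAVA[sql_type]} {col};")
    for col, sql_type in columns:
        jtype = SQL_TO_JAVA[sql_type]
        lines.append(f"    public {jtype} get{camel(col)}() {{ return {col}; }}")
        lines.append(f"    public void set{camel(col)}({jtype} v) {{ this.{col} = v; }}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical columns for the 'logical' table (not the real Magda schema).
schema = [("lfn", "varchar"), ("file_size", "bigint"), ("moddate", "datetime")]
print(generate_java_class("logical", schema))
```

Because the class text is derived entirely from the column list, rerunning the generator after a schema change keeps the API in step with the database, which is the property the slide emphasises.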

3 Magda Documents and Installation Status
- User guide (in preparation; suggestions are welcome): http://www.atlasgrid.bnl.gov/magdadoc/userguide.htm
- Useful introduction: SC2002 handout, http://www.atlasgrid.bnl.gov/magdademo/sc2002_poster.ppt
- Servers (web and database) at BNL
- AFS clients available at:
  - /afs/usatlas.bnl.gov/project/magda/current
  - /afs/cern.ch/atlas/maxidisk/d94/wenaus/wdeng/atlas_magda/magda_setup
- Installation document: http://www.atlasgrid.bnl.gov/magdadoc/userguide.htm#3

4 Magda Usage
- Total: 327k files occupying 26 TB
- 38k DC1 files (mostly gathered using spiders)
- More than 4 TB of data transferred between CERN Castor and BNL HPSS since the start of DC1
- 4k U.S. Grid Testbed DC1 files and replicas registered using Magda tools
- Tokyo and Lyon were tried for DC1; other sites are being added progressively
  - RAL is a priority (large store)
- GDMP and Reptor integration is now under way, but we need these production-level tools now
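As a rough sanity check on the usage totals, the average file size follows from dividing total volume by file count. This assumes decimal units ("k" = 1000, "TB" = 10**12 bytes); since both inputs are rounded on the slide, the result is only approximate.

```python
# Average file size implied by the usage totals on this slide.
# Assumption: "k" = 1000 and "TB" = 10**12 bytes (decimal units).
total_files = 327_000
total_bytes = 26 * 10**12

avg_bytes = total_bytes / total_files
print(f"average file size ~ {avg_bytes / 10**6:.1f} MB")  # prints "average file size ~ 79.5 MB"
```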

5 Stores currently accessed
- NFS and AFS disk areas at the US ATLAS Tier 1 and CERN
- ATLAS pools in the CERN staging system
- CERN Castor mass store (ATLAS storage areas, e.g. testbeam data)
- US ATLAS Tier 1 HPSS 'rftp' service (the HPSS access mode currently available to US ATLAS)
- ATLAS code repository contents
- Personal data areas
- MSS locations at US ATLAS grid testbed sites (ANL, LBNL, Boston, Indiana)
- Also Lyon, Tokyo, …

6 MAGDA Entities
- prime: file catalog. Catalogs all instances of all files in the system.
- logical: logical filename catalog. Metadata about logical files (associated keys) that is not specific to particular physical instances.
- site: a computing facility, which may have many data stores, e.g. CERN CASTOR.
- location: data locations (e.g. a directory or staging pool).
  - Associated with a particular site.
  - A given location is designated as either a 'prime' or a 'replica' location.
- host: computers on which the system runs or which provide access.
  - This is how the spider knows where it is and which locations it can scan.
- collection: collections of logical files.
- collectionContent: logical file lists for collections.
- task: catalog of replication tasks.
- generic_sig: a generic 'data signature' sufficient for regeneration; identifies equivalent data sets.
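The relationships between these entities can be sketched as a small in-memory model. This is a hypothetical illustration, not Magda's schema: the class and field names are assumptions, task and generic_sig are left out, and the prime/replica roles given to the two example instances (taken from the magda_getfile example) are assumed.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of some Magda entities; field names are
# illustrative, not the real MySQL column names.


@dataclass
class Site:
    name: str  # a computing facility, e.g. "CERN CASTOR"


@dataclass
class Location:
    site: Site  # every location belongs to exactly one site
    path: str   # a directory, staging pool, ...
    role: str   # "prime" or "replica"

    def __post_init__(self):
        if self.role not in ("prime", "replica"):
            raise ValueError(f"bad role: {self.role}")


@dataclass
class Host:
    name: str
    scannable: list[Location] = field(default_factory=list)  # what its spider may scan


@dataclass
class LogicalFile:
    lfn: str
    keys: dict[str, str] = field(default_factory=dict)  # instance-independent metadata


@dataclass
class FileInstance:  # one entry in the 'prime' catalog
    logical: LogicalFile
    location: Location


@dataclass
class Collection:
    name: str
    contents: list[LogicalFile] = field(default_factory=list)  # collectionContent


def instances_of(lfn: str, catalog: list[FileInstance]) -> list[FileInstance]:
    """All physical instances of one logical file."""
    return [inst for inst in catalog if inst.logical.lfn == lfn]


# Two instances from the magda_getfile example; roles assumed for illustration.
log = LogicalFile("dc1.002107.simul.0024.hlt.eta_scan.log")
rftp = Location(Site("usatlasrftp"), "/home/grid_a/simul/002107/log", "prime")
farm = Location(Site("utatlasfarm"), "/opt/testbed/cache/replica", "replica")
catalog = [FileInstance(log, rftp), FileInstance(log, farm)]
print([i.location.site.name for i in instances_of(log.lfn, catalog)])
# prints ['usatlasrftp', 'utatlasfarm']
```

Keeping logical metadata separate from physical instances is what lets one logical file have several replicas without duplicating its keys.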

7 Magda Architecture
[Figure: Magda Architecture. Components shown: Host 1 and Host 2; sites with disk, cache and mass-store locations; spider; MySQL database (synch via DB, catalog updates); replication tasks (collection of logical files to replicate, source-to-cache stage-in, source-to-destination transfer via gridftp, bbftp or scp, register replicas).]

8 Magda Command-line Tools
- Run a tool without parameters to get usage information.
- The tools call 'globus-url-copy' internally, and 'globus-job-run' to interact with HPSS.
- magda_findfile: searches the Magda database
- magda_putfile: recently extended to work with Lyon HPSS
- magda_getfile
- magda_delete, usage:
  - magda_delete filerecord
  - magda_delete location
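The kind of lookup magda_findfile performs can be sketched with an SQL query. This is a hypothetical illustration: SQLite stands in for MySQL, the single fileinstance table is an assumption, file names are stored without the LFN://atlas.org/ prefix, and the substring mode only mimics the --sub option seen in the examples. The last row is made up so the search has something to exclude.

```python
import sqlite3

# Hypothetical stand-in for the Magda MySQL catalog (SQLite, one table).
# Rows come from the magda_findfile/magda_getfile examples, except the last.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fileinstance (lfn TEXT, site TEXT)")
conn.executemany(
    "INSERT INTO fileinstance VALUES (?, ?)",
    [
        ("test.dc1.002107.simul.0024.hlt.eta_scan.zebra", "usatlasrftp"),
        ("dc1.002107.simul.0024.hlt.eta_scan.zebra", "utatlasfarm"),
        ("dc1.002107.simul.0024.hlt.eta_scan.log", "usatlasrftp"),
        ("dc1.002107.simul.0024.hlt.eta_scan.log", "utatlasfarm"),
        ("dc1.999999.simul.0001.hypothetical.zebra", "utatlasfarm"),
    ],
)


def find_file(conn, name, substring=False):
    """Exact-name lookup, or substring match when substring=True (cf. --sub)."""
    if not substring:
        sql, arg = "SELECT lfn, site FROM fileinstance WHERE lfn = ?", name
    else:
        # Escape LIKE wildcards so '_' in names such as 'eta_scan' matches literally.
        esc = name.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
        sql = "SELECT lfn, site FROM fileinstance WHERE lfn LIKE ? ESCAPE '\\'"
        arg = f"%{esc}%"
    return conn.execute(sql + " ORDER BY lfn, site", (arg,)).fetchall()


for lfn, site in find_file(conn, "dc1.002107.simul.0024", substring=True):
    print(f"{lfn} site={site}")
```

Passing the pattern as a bound parameter rather than pasting it into the SQL string avoids quoting problems with user-supplied names.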

9 Magda Examples
- $ magda_findfile dc1.002107.simul.0024 --sub
    LFN://atlas.org/test.dc1.002107.simul.0024.hlt.eta_scan.zebra site=usatlasrftp path=… size=28188000 primary
    LFN://atlas.org/dc1.002107.simul.0024.hlt.eta_scan.zebra site=utatlasfarm path=… size=28188000
    (also shows the .his and .log files)
- $ magda_getfile dc1.002107.simul.0024.hlt.eta_scan.log
    …
    Instance at usatlasrftp:/home/grid_a/simul/002107/log remotely accessible.
    Instance at utatlasfarm:/opt/testbed/cache/replica remotely accessible.
    globus-url-copy -p 3 gsiftp://atlas000.uta.edu/opt/testbed/cache/replica/dc1.002107.simul.0024.hlt.eta_scan.log file:///tmp/dc1.002107.simul.0024.hlt.eta_scan.log 2>&1
    File dc1.002107.simul.0024.hlt.eta_scan.log staged into local directory
- LFNs follow the EDG form
- Multiple versions are handled
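The magda_getfile output shows the tool picking a remotely accessible instance and running globus-url-copy with three parallel streams (-p 3). The sketch below rebuilds that command line from a list of instances. The site-to-GridFTP-host table is hypothetical apart from the utatlasfarm entry taken from the example, and the rule "use the first instance whose site has a known GridFTP host" is an assumption, not Magda's documented selection logic.

```python
import posixpath

# Hypothetical mapping from Magda site name to its GridFTP server.
# Only the utatlasfarm entry is taken from the slide's example.
GRIDFTP_HOST = {"utatlasfarm": "atlas000.uta.edu"}


def getfile_command(filename, instances, dest_dir="/tmp", streams=3):
    """Build (but do not run) a globus-url-copy command for the first
    instance whose site has a known GridFTP host."""
    for site, path in instances:
        host = GRIDFTP_HOST.get(site)
        if host is None:
            continue  # e.g. an HPSS site reached another way
        src = f"gsiftp://{host}{posixpath.join(path, filename)}"
        dst = f"file://{posixpath.join(dest_dir, filename)}"
        return ["globus-url-copy", "-p", str(streams), src, dst]
    raise LookupError(f"no reachable instance of {filename}")


cmd = getfile_command(
    "dc1.002107.simul.0024.hlt.eta_scan.log",
    [("usatlasrftp", "/home/grid_a/simul/002107/log"),
     ("utatlasfarm", "/opt/testbed/cache/replica")],
)
print(" ".join(cmd))
```

To actually perform the copy, the argument list could be passed to subprocess.run(cmd, check=True, stderr=subprocess.STDOUT), which corresponds to the 2>&1 redirection in the slide's output.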

10 Magda Replication
- Automated file replication is supported.
- A replication task is defined by:
  - the collection of files to be replicated
  - information on the source location, including a cache collection if needed
  - information on the file transport mechanism (currently gridftp, bbftp or scp)
  - information on the destination location, including a destination-side cache if necessary
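A replication task as defined above can be modelled as a record that expands into per-file steps. This is a hypothetical sketch: the class, the step names and the location names are illustrative. The "migrate" step from a destination-side cache into the final store is an assumption, not something the slide states.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

TRANSPORTS = {"gridftp", "bbftp", "scp"}  # mechanisms named on the slide


@dataclass
class ReplicationTask:
    """Hypothetical model of a Magda replication task."""
    collection: list[str]                    # logical files to replicate
    source: str                              # source location
    transport: str                           # one of TRANSPORTS
    destination: str                         # destination location
    source_cache: Optional[str] = None       # optional source-side cache
    destination_cache: Optional[str] = None  # optional destination-side cache

    def __post_init__(self) -> None:
        if self.transport not in TRANSPORTS:
            raise ValueError(f"unsupported transport: {self.transport}")

    def plan(self) -> Iterator[tuple]:
        """Yield (action, lfn, from, to) steps; action names are illustrative."""
        for lfn in self.collection:
            src = self.source
            if self.source_cache:  # e.g. stage out of a mass store first
                yield ("stagein", lfn, self.source, self.source_cache)
                src = self.source_cache
            dst = self.destination_cache or self.destination
            yield (self.transport, lfn, src, dst)
            if self.destination_cache:  # assumed: move cache copy to final store
                yield ("migrate", lfn, self.destination_cache, self.destination)
            yield ("register", lfn, self.destination, None)


# Hypothetical location names.
task = ReplicationTask(
    collection=["dc1.002107.simul.0024.hlt.eta_scan.zebra"],
    source="cern-castor", transport="gridftp", destination="bnl-hpss",
    source_cache="cern-stage-pool", destination_cache="bnl-disk-cache",
)
for step in task.plan():
    print(step)
```

Generating the steps lazily keeps the task definition separate from execution, so a worker could carry out and record each step independently.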

11 Magda File Spider
- File spider processes run as cron jobs on distributed hosts to fill the catalog and keep it up to date.
- Based on the host it is running on, the spider determines which sites and locations are accessible and updates them.
- A catalog entry is deleted if its file is removed.
- Run 'crontab -e' to set the spider up as a cron job; useful information is in /afs/usatlas.bnl.gov/project/magda/current/*.cron
- The spider can also be invoked from the command line: dyFileSpider.pl [site:location]
- magda_putfile is preferred for positive registration in production scripts.
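One spider pass over a single location can be sketched as a set reconciliation between what is on disk and what the catalog holds. This is a hypothetical Python sketch, not dyFileSpider.pl: the real spider consults the host table to decide which locations to scan and writes to the MySQL catalog, whereas here the catalog is a plain set of relative paths and the file names are made up.

```python
import os
import tempfile


def spider(root: str, catalog: set[str]) -> tuple[set[str], set[str]]:
    """Bring `catalog` (paths relative to root) in line with what is on disk.

    New files are added; entries whose files have disappeared are deleted.
    Returns (added, removed).
    """
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            on_disk.add(os.path.relpath(os.path.join(dirpath, name), root))
    added = on_disk - catalog
    removed = catalog - on_disk
    catalog |= added    # in-place updates, so the caller's set changes
    catalog -= removed
    return added, removed


# Demo on a throwaway directory (file names are hypothetical).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.zebra", os.path.join("sub", "b.log")):
        open(os.path.join(root, rel), "w").close()
    catalog = {"a.zebra", "old.zebra"}
    added, removed = spider(root, catalog)
    print(sorted(added), sorted(removed), sorted(catalog))
```

Because a pass like this only notices files when it runs, there is a window in which new files are unregistered, which is one reason explicit registration with magda_putfile suits production scripts better.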

12 Magda in U.S. Grid Testbed DC1
[Diagram: production workflow between a remote Linux farm, mass storage and the MySQL server.]
Steps:
1. submit jobs
2. check status
3. move outputs and catalog them
4. check partition (may repeat step 3)
5. list statistics
Tools used:
- magda_putfile: third-party transfer, putting files directly into BNL HPSS and registering them with the Magda database
- magda_putfile: copying replicas to a disk store and registering them with the Magda database
- magda_findfile: searching the Magda database
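Step 4, checking partitions, amounts to comparing the partitions expected for a dataset with those whose outputs are catalogued. The sketch below assumes, based only on names such as dc1.002107.simul.0024.hlt.eta_scan.zebra, that the second field is the dataset and the fourth is the partition number; the regular expression and function name are hypothetical.

```python
import re

# Assumption: DC1 names look like dc1.<dataset>.simul.<partition>.<...>,
# as in dc1.002107.simul.0024.hlt.eta_scan.zebra; the pattern is illustrative.
NAME_RE = re.compile(r"^dc1\.(\d{6})\.simul\.(\d{4})\.")


def missing_partitions(filenames, dataset, expected):
    """Return expected partition numbers with no catalogued file."""
    found = set()
    for name in filenames:
        m = NAME_RE.match(name)
        if m and m.group(1) == dataset:
            found.add(int(m.group(2)))
    return sorted(set(expected) - found)


# Hypothetical catalogue contents for dataset 002107, partitions 1-4.
names = [
    "dc1.002107.simul.0001.hlt.eta_scan.zebra",
    "dc1.002107.simul.0003.hlt.eta_scan.zebra",
    "dc1.002108.simul.0002.hlt.eta_scan.zebra",  # different dataset, ignored
]
print(missing_partitions(names, "002107", range(1, 5)))  # prints [2, 4]
```

The partitions returned are the ones for which step 3 would be repeated.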

13 Magda Production Database
- The Magda production database capability was used in the U.S. Grid Testbed for DC1.
- Jobinfo: filename, submithost, processhost, joburl, moddate, primestore
- Jobstatus: project, dataset, step, partition, finished, joburl, started, group, filename, dirname, extra
- A very useful feature for general ATLAS DC production management
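The Jobstatus column list is enough to sketch the "list statistics" step of the testbed workflow. In this hypothetical example SQLite stands in for MySQL, the column types are assumptions, finished is assumed to be a 0/1 flag, and the sample rows are invented. Note that "partition" and "group" are SQL keywords, so they must be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names from the slide; the types are assumptions. "partition" and
# "group" are SQL keywords, so they are quoted.
conn.execute("""
    CREATE TABLE jobstatus (
        project TEXT, dataset TEXT, step TEXT, "partition" INTEGER,
        finished INTEGER, joburl TEXT, started TEXT, "group" TEXT,
        filename TEXT, dirname TEXT, extra TEXT
    )""")

# Hypothetical rows: partitions 1 and 2 finished, partition 3 not yet.
conn.executemany(
    'INSERT INTO jobstatus (dataset, "partition", finished) VALUES (?, ?, ?)',
    [("002107", 1, 1), ("002107", 2, 1), ("002107", 3, 0)],
)


def dataset_stats(conn, dataset):
    """Return (finished, unfinished) job counts for one dataset."""
    return conn.execute(
        """SELECT COALESCE(SUM(finished = 1), 0), COALESCE(SUM(finished = 0), 0)
           FROM jobstatus WHERE dataset = ?""",
        (dataset,),
    ).fetchone()


print(dataset_stats(conn, "002107"))  # prints (2, 1)
```

COALESCE makes a dataset with no rows report (0, 0) instead of (None, None).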

14 Magda Future Plans
- Integration with the ATLAS MetaData Interface for DC analysis
- Integration of the Hierarchical Resource Manager (HRM) with the command-line tools
- Management of files distributed across the local disks of the nodes of a Linux farm
- Scalability becomes important once file records reach the order of millions; the grid catalog service (RLS) will be investigated.
- Magda is being evaluated by other experiments (STAR).
