ATLAS distributed data management. David Adams, BNL. February 22, 2005. Database working group, ATLAS software workshop.


1 ATLAS distributed data management
  David Adams
  BNL
  February 22, 2005
  Database working group
  ATLAS software workshop

2 Contents
  Files and datasets
  Metadata
  Logical and physical files
  VO’s, sites and users
  LSE
  Timeouts
  Lifetimes
  Other operation parameters
  Claims
  VO FMS
  File transfer service
  Datasets
  Dataset catalogs
  DSMS
  Conclusions

3 Files and datasets
  The units of data management are files and datasets
    File is the smallest unit of data transfer
    Dataset specifies a collection of data
      –More later
  DMS (data management system) can be decomposed into
    FMS – file management system and
    DSMS – dataset management system
  The DSMS depends on the FMS

4 Metadata
  For production or analysis, user specifies an input dataset
    Not just a list of files
  Existence of a dataset implies some consistency
    Same type of data for all events
    Similar provenance for all data
    No duplicate events
    Someone has decided these data belong together
  Search for a dataset can be done with a query
    On dataset metadata
    Not file metadata
    Need well-designed DSC (dataset selection catalog)
  Limited need for file metadata
    Intrinsic parameters (size, checksum,…) for validation

5 Logical and physical files
  A file is typically assigned a logical identity (LFID) when it is entered into the FMS
    LFID = LFN or GUID (please choose)
  The FMS may later be used to access a physical replica of this file using this LFID
  Physical files come in many flavors:
    Posix file, i.e. directly accessible from OS
    Castor, dCache, gLite, …
    SRM
    Transfer protocols: ftp, http, gsiftp, …
  Assume FMS replicates file when it is entered
    User responsible for deleting the original file
    FMS manages (deletes) the first and all subsequent replicas

6 VO’s, sites and users
  Logical files are associated with a VO (virtual organization)
    E.g. ATLAS
    VO must guarantee uniqueness of LFID’s
  Sites are responsible for managing replicas
    Contract with VO to archive replicas
      –If the last replica is deleted, the logical file is inaccessible!
      –VO may archive a replica at multiple sites
        >Ensure data survival
        >Increase availability
    Provide service to transfer and stage files
      –Staged file is accessible to applications which support its protocol
      –Call this LSE: logical storage element
        >SE with logical file interface
  User interacts with LSE to register or access a file

7 LSE
  The LSE is a service running at a site
    Provides the interface with which users interact with the FMS
    Service rather than just command line interface because users may be remote
  Should provide the following
  Put (register)
    Input:
      –Accessible file reference (gsiftp, nfs, dcap, …)
      –Hint for creating LFID (e.g. the value)
      –Lifetime for logical file
    LSE copies file, assigns LFID and returns LFID to user
      –May take management rather than copy
    LSE then archives file until VO archives it elsewhere

8 LSE (cont)
  Get (stage)
    Input:
      –LFID
      –Ordered list of acceptable protocols for staging
      –Lifetime for staging
    LSE stages the file in accordance with one of the protocols and returns a corresponding URL
      –Until the file is released or the lifetime exceeded
    May be necessary to retrieve file from another site before staging
  Copy (archive)
    Input:
      –LFID
      –Lifetime for archiving
    LSE retrieves file from another site and archives locally for the specified lifetime
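
The put, get and copy operations listed on the two LSE slides can be sketched as a small in-memory Python class. This is only an illustration under stated assumptions: the class name `ToyLSE`, the method signatures, the use of `uuid` to generate an LFID when no hint is given, and the URL layout are all hypothetical, since the slides do not define a concrete API. The sketch also does not retrieve a file from another site inside `get`; that step is left to an explicit `copy`.

```python
import time
import uuid


class ToyLSE:
    """Hypothetical in-memory stand-in for a logical storage element."""

    def __init__(self, site, supported_protocols):
        self.site = site
        self.supported = list(supported_protocols)
        self.archive = {}  # LFID -> (original file reference, archive expiry)
        self.staged = {}   # LFID -> (staging URL, staging expiry)

    def put(self, file_ref, lifetime, lfid_hint=None):
        """Register a file: record it, assign an LFID and return the LFID."""
        lfid = lfid_hint or str(uuid.uuid4())
        if lfid in self.archive:
            raise ValueError(f"LFID already registered: {lfid}")
        self.archive[lfid] = (file_ref, time.time() + lifetime)
        return lfid

    def get(self, lfid, protocols, lifetime):
        """Stage a file with the first acceptable protocol; return its URL."""
        if lfid not in self.archive:
            raise KeyError(f"{lfid} is not archived at {self.site}")
        for proto in protocols:  # the caller's list is ordered by preference
            if proto in self.supported:
                url = f"{proto}://{self.site}/{lfid}"
                self.staged[lfid] = (url, time.time() + lifetime)
                return url
        raise ValueError("no acceptable protocol is supported at this site")

    def copy(self, lfid, source, lifetime):
        """Archive locally a file that is archived at another ToyLSE."""
        file_ref, _ = source.archive[lfid]
        self.archive[lfid] = (file_ref, time.time() + lifetime)
```

A caller would register a file once with `put`, then use the returned LFID for every later `get` or `copy`, which is the logical-file interface the slides describe.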

9 Timeouts
  All operations will sometimes require a long time to complete
  Caller should be able to specify:
    Timeout – time after which to abort if operation is not complete
    Blocking mode – indicates whether operation may return before it is complete
  Possible implementation:
    User provides
      –Tb = blocking timeout
      –Tnb = non-blocking timeout
    Operation must return within Tb
    Operation may continue for a time Tnb after return
      –Non blocking if Tnb = 0
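
One way to read the two rules "operation must return within Tb" and "operation may continue for a time Tnb after return" is sketched below with a worker thread. The function name `run_with_timeouts`, the modelling of an operation as repeated calls to a `step` function, and the cooperative abort check between steps are assumptions made for this illustration and do not come from the slides.

```python
import threading
import time


def run_with_timeouts(step, n_steps, tb, tnb):
    """Run step() n_steps times; return after at most tb seconds.

    The work may go on in the background until tb + tnb seconds have
    passed, after which it is aborted between steps.
    Returns (complete_on_return, state, thread).
    """
    state = {"done": 0, "aborted": False}
    deadline = time.monotonic() + tb + tnb
    finished = threading.Event()

    def worker():
        for _ in range(n_steps):
            if time.monotonic() >= deadline:
                state["aborted"] = True  # overall timeout reached
                break
            step()
            state["done"] += 1
        finished.set()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # Block the caller for at most tb seconds.
    complete = finished.wait(timeout=tb) and not state["aborted"]
    return complete, state, thread
```

The caller receives control after at most `tb` seconds and can poll `state` or join the returned thread to learn whether the operation finished or was aborted.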

10 Lifetimes
  Files should be assigned lifetimes
    To avoid cluttering catalogs and filling disks
  Files at all scopes
    –Logical (VO) file
    –Archived file replica
    –Staged file
  Multiple users will reserve a given file
    So a single lifetime will not work
  Introduce claims…

11 Other operation parameters
  User ID
    Need to know who registered, made claims, etc.
    Use the DN from authentication to look up the user
  File sets
    All operations should take a list of files in place of a single file
    Optimize bulk registrations, transfers, etc.

12 Claims
  Usage
    User request with lifetime is assigned a claim
      –Put, get, copy, …
    User may add claims on existing files
      –Logical, archived or site
    Claim owner may (should) release claim when done
    Claim owner may extend lifetime of claim
  Behavior
    Each claim has an expiration time (now plus lifetime)
    Claim is active until released or expired
    File may have multiple active claims
    File should not be deleted while claim is active
  Accounting
    Claims provide mechanism for accounting
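
The claim behavior listed above (expiration at now plus lifetime, active until released or expired, no deletion while any claim is active) can be sketched in a few lines of Python. The class and method names here (`Claim`, `ClaimedFile`, `may_delete`) are hypothetical, and the current time is passed in explicitly as `now` so that the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    owner: str
    expires_at: float       # expiration time = time of request + lifetime
    released: bool = False

    def is_active(self, now):
        """A claim is active until it is released or has expired."""
        return not self.released and now < self.expires_at

    def extend(self, extra_seconds):
        """The claim owner may extend the lifetime of the claim."""
        self.expires_at += extra_seconds


@dataclass
class ClaimedFile:
    lfid: str
    claims: list = field(default_factory=list)

    def add_claim(self, owner, lifetime, now):
        claim = Claim(owner, now + lifetime)
        self.claims.append(claim)
        return claim

    def may_delete(self, now):
        """The file may be deleted only when no claim is active."""
        return not any(c.is_active(now) for c in self.claims)
```

Because every claim records its owner, the same list could also feed the accounting mentioned on the slide, for example by summing claimed lifetimes per owner.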

13 VO FMS
  Motivation
    Most FMS functionality provided by site service (LSE)
    There are also data and operations that are not site specific
    → VO file management system
  Functionality
    Catalog LFIDs and their attributes
    Copy of VO claims on archived files
      –So VO knows where files are supposed to be archived
      –Other replicas may exist
        >User claims
        >Local caching without claims
    Could also keep a more complete replica catalog
      –Do we want a comprehensive replica catalog?

14 File transfer service
  Motivation
    Desire for peak transfer rate will often exceed available bandwidth
    Need to prioritize requests from different users and VO’s
    Manage file transfers with FTS
  FTS: file transfer service
    Runs at site and has responsibility for transferring files to the site
    Maintains queues and assigns priorities like a batch system
    Primary ATLAS user would be the LSE
      –Which handles logical file registration after transfer
    Implementations from gLite, Condor, DQ (?), …
  If site does not provide FTS
    LSE must use gsiftp or
    Provide rudimentary FTS functionality
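
The "rudimentary FTS functionality" an LSE might have to provide could be as simple as a priority queue of pending transfers. The sketch below uses Python's standard `heapq` module; the class name `ToyTransferQueue`, the convention that a lower number means higher priority, and the first-come-first-served tie-break are assumptions for illustration and do not describe any of the implementations named on the slide.

```python
import heapq
import itertools


class ToyTransferQueue:
    """Hypothetical queue of transfers into one site, ordered by priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # preserves submission order on ties

    def submit(self, lfid, source_site, priority):
        """Queue a transfer; a lower priority number is served first."""
        entry = (priority, next(self._counter), lfid, source_site)
        heapq.heappush(self._heap, entry)

    def next_transfer(self):
        """Return (lfid, source_site) for the next transfer, or None if idle."""
        if not self._heap:
            return None
        _, _, lfid, source_site = heapq.heappop(self._heap)
        return lfid, source_site
```

A real service would also need to cap concurrent transfers and account for usage per user and VO, which this sketch leaves out.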

15 Datasets
  Files are not the data view we want to present to users
    Typical data collection spans a large number of files
    Physics metadata is properly associated with a collection
      –All files have the same physics metadata
  Introduce datasets
    May span multiple files
    Need not include all the data in these files
    May be hierarchical (composed of other datasets)
    May be virtual, i.e. have multiple representations
      –E.g. selected events in these files or a copy of those events into a file
    Portable description of each dataset (XML, C++, Python)
  Catalog dataset metadata
    Replicates some of the data in the dataset descriptions

16 Dataset catalogs
  Dataset repository
    XML descriptions indexed by ID
    At present cannot query on attributes
  DSC: Dataset selection catalog
    Datasets of interest to physicists are named and assigned attributes
    Physicist can query DSC to find a dataset of interest
    Need to identify the relevant attributes
    Current (rudimentary) implementation, see query page at
      –http://www.atlasgrid.bnl.gov/dialds/dlShowMain.pl
  Dataset placement catalogs
    Sites where datasets (all their files) can be found
    Distinguish archived and staged?

17 DSMS
  As with FMS, most of the DSMS is site-based.
  Dataset placement service
    For users and processing system
    Find which datasets are installed at a site
    Methods to put, get and copy
      –Layer over same methods in FMS
      –Operate on all the files in the dataset
      –For virtual datasets, system may choose between different representations (file sets)
    Datasets have lifetimes like files
      –Recorded using claims
      –Distinct for dataset defined, placed and maybe staged
        >Analogous to logical, archived and staged files
      –File may be deleted when no datasets hold references
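
The last rule on this slide, that a file may be deleted when no datasets hold references to it, amounts to reference tracking between placed datasets and their files. A minimal Python sketch follows; the class name `ToyDatasetPlacement` and its methods are hypothetical, and dataset claims and lifetimes are reduced here to an explicit `remove` call.

```python
class ToyDatasetPlacement:
    """Hypothetical record of which datasets are placed at one site."""

    def __init__(self):
        self.datasets = {}  # dataset ID -> set of LFIDs it references

    def put(self, dataset_id, lfids):
        """Place a dataset, i.e. record a reference to each of its files."""
        self.datasets[dataset_id] = set(lfids)

    def remove(self, dataset_id):
        """Drop a dataset; return the files no remaining dataset references."""
        files = self.datasets.pop(dataset_id)
        still_referenced = set().union(*self.datasets.values())
        return files - still_referenced
```

Files shared between datasets survive until the last referencing dataset is removed, which is how a dataset lifetime can control file lifetime.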

18 Conclusions
  Have seen a high-level outline of a DMS (data management system)
    DSMS (dataset management system) layered on top of FMS (file management system)
  Three scopes for files
    Logical, archived and staged
  Same for datasets
    Defined, placed and staged
  Provide lifetime management at all scopes
    Claim mechanism
      –Support multiple users of a file or dataset
      –Able to release a claim, extend it or let it expire
    Use dataset lifetime to control file lifetime
      –I.e. dataset claims its files

19 Conclusions (cont)
  Mostly site based services
    Scalable
    Keep service close to the relevant metadata
    Lightweight clients
  Also need a VO-based component
  File transfer service
    Balance load
    Prioritize
  More information
    ADA documents page
      –http://www.usatlas.bnl.gov/ADA/docs
    Datasets for the grid
    File management on the grid
    Dataset management

