ATLAS distributed data management
David Adams, BNL
February 22, 2005
Database working group, ATLAS software workshop

Presentation transcript:


David Adams (ATLAS), ATLAS SW Workshop, ATLAS Distributed Data Mgmt, 22feb05

Contents
  Files and datasets
  Metadata
  Logical and physical files
  VO’s, sites and users
  LSE
  Timeouts
  Lifetimes
  Other operation parameters
  Claims
  VO FMS
  File transfer service
  Datasets
  Dataset catalogs
  DSMS
  Conclusions

Files and datasets
  The units of data management are files and datasets
    File is the smallest unit of data transfer
    Dataset specifies a collection of data (more later)
  DMS (data management system) can be decomposed into
    FMS – file management system
    DSMS – dataset management system
  The DSMS depends on the FMS

Metadata
  For production or analysis, user specifies an input dataset
    Not just a list of files
  Existence of a dataset implies some consistency
    Same type of data for all events
    Similar provenance for all data
    No duplicate events
    Someone has decided these data belong together
  Search for a dataset can be done with a query
    On dataset metadata
    Not file metadata
    Need well-designed DSC (dataset selection catalog)
  Limited need for file metadata
    Intrinsic parameters (size, checksum, …) for validation
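A minimal sketch of what a query on dataset metadata (rather than file metadata) could look like. The record layout, the `select` function and the attribute names (`data_type`, `release`) are hypothetical and chosen only for illustration; the slides do not specify the DSC schema.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One entry in a hypothetical dataset selection catalog (DSC)."""
    dataset_id: str
    attributes: dict = field(default_factory=dict)


def select(catalog, **criteria):
    """Return the IDs of datasets whose metadata match every criterion."""
    return [
        record.dataset_id
        for record in catalog
        if all(record.attributes.get(key) == value
               for key, value in criteria.items())
    ]


# Made-up catalog content: the query never looks at individual files.
catalog = [
    DatasetRecord("ds-001", {"data_type": "AOD", "release": "r1"}),
    DatasetRecord("ds-002", {"data_type": "ESD", "release": "r1"}),
    DatasetRecord("ds-003", {"data_type": "AOD", "release": "r2"}),
]
matches = select(catalog, data_type="AOD", release="r1")
```

The user gets back dataset identifiers and hands one of them to production or analysis; resolving the dataset to files is left to the data management system.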

Logical and physical files
  A file is typically assigned a logical identity (LFID) when it is entered into the FMS
    LFID = LFN or GUID (please choose)
    The FMS may later be used to access a physical replica of this file using this LFID
  Physical files come in many flavors:
    Posix file, i.e. directly accessible from OS
    Castor, dCache, gLite, …
    SRM
    Transfer protocols: ftp, http, gsiftp, …
  Assume FMS replicates file when it is entered
    User responsible for deleting the original file
    FMS manages (deletes) the first and all subsequent replicas

VO’s, sites and users
  Logical files are associated with a VO (virtual organization)
    E.g. ATLAS
    VO must guarantee uniqueness of LFID’s
  Sites are responsible for managing replicas
    Contract with VO to archive replicas
      – If the last replica is deleted, the logical file is inaccessible!
      – VO may archive a replica at multiple sites
        > Ensure data survival
        > Increase availability
    Provide service to transfer and stage files
      – Staged file is accessible to applications which support its protocol
      – Call this LSE: logical storage element
        > SE with logical file interface
  User interacts with LSE to register or access a file

LSE
  The LSE is a service running at a site
    Provides the interface with which users interact with the FMS
    Service rather than just command line interface because users may be remote
  Should provide the following
  Put (register)
    Input:
      – Accessible file reference (gsiftp, nfs, dcap, …)
      – Hint for creating LFID (e.g. the value)
      – Lifetime for logical file
    LSE copies file, assigns LFID and returns LFID to user
      – May take management rather than copy
    LSE then archives file until VO archives it elsewhere

LSE (cont)
  Get (stage)
    Input:
      – LFID
      – Ordered list of acceptable protocols for staging
      – Lifetime for staging
    LSE stages the file in accordance with one of the protocols and returns a corresponding URL
      – Until the file is released or the lifetime exceeded
    May be necessary to retrieve file from another site before staging
  Copy (archive)
    Input:
      – LFID
      – Lifetime for archiving
    LSE retrieves file from another site and archives locally for the specified lifetime
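The put, get and copy operations described for the LSE can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the interface of any real service: the class `ToyLSE`, its method signatures, the site names and the URL layout are all hypothetical, no data is actually moved, and LFID uniqueness is checked only locally although the slides assign that guarantee to the VO.

```python
import uuid


class ToyLSE:
    """In-memory stand-in for a logical storage element at one site."""

    def __init__(self, site, supported_protocols):
        self.site = site
        self.supported = set(supported_protocols)
        self.archive = {}  # LFID -> (source reference, lifetime in seconds)
        self.remote = {}   # LSEs at other sites, keyed by site name

    def put(self, source_url, lfid_hint=None, lifetime=3600):
        """Register a file: assign an LFID, archive it, return the LFID."""
        if lfid_hint and lfid_hint not in self.archive:
            lfid = lfid_hint            # the hint is free, so use it
        else:
            lfid = str(uuid.uuid4())    # otherwise fall back to a GUID
        self.archive[lfid] = (source_url, lifetime)
        return lfid

    def copy(self, lfid, lifetime=3600):
        """Archive locally a file that is archived at another site."""
        if lfid in self.archive:
            return
        for lse in self.remote.values():
            if lfid in lse.archive:
                source, _ = lse.archive[lfid]
                self.archive[lfid] = (source, lifetime)
                return
        raise KeyError(f"no known replica of {lfid}")

    def get(self, lfid, protocols, lifetime=600):
        """Stage a file; return a URL for the first acceptable protocol."""
        if lfid not in self.archive:
            # Simplification: the staging lifetime is reused for the copy.
            self.copy(lfid, lifetime)
        for protocol in protocols:      # the caller's order is respected
            if protocol in self.supported:
                return f"{protocol}://{self.site}/staged/{lfid}"
        raise ValueError("none of the acceptable protocols is supported")


site_a = ToyLSE("site-a.example", ["dcap", "gsiftp"])
site_b = ToyLSE("site-b.example", ["gsiftp"])
site_b.remote["site-a.example"] = site_a

lfid = site_a.put("gsiftp://user.example/data/file1.root", lfid_hint="file1")
url = site_b.get(lfid, ["dcap", "gsiftp"])
```

In this run `site_b` has no local replica, so `get` first copies the file from `site_a` and then stages it with `gsiftp`, the first protocol in the caller's list that `site_b` supports.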

Timeouts
  All operations will sometimes require a long time to complete
  Caller should be able to specify:
    Timeout – time after which to abort if operation is not complete
    Blocking mode – indicates whether operation may return before it is complete
  Possible implementation:
    User provides
      – Tb = blocking timeout
      – Tnb = non-blocking timeout
    Operation must return within Tb
    Operation may continue for a time Tnb after return
      – Non blocking if Tnb = 0
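One way to read the two rules "operation must return within Tb" and "operation may continue for a time Tnb after return" is sketched below with a worker thread. The function `call_with_timeouts`, the cooperative `stop` event and the pretend transfer are assumptions made for this example; the slides do not prescribe an implementation.

```python
import threading
import time


def call_with_timeouts(operation, tb, tnb):
    """Run operation(stop) so that the caller waits at most tb seconds.

    Returns (finished, state). If the operation is still running after tb,
    it may continue in the background for up to tnb more seconds, after
    which the `stop` event is set to ask it to abort.
    """
    stop = threading.Event()
    state = {"done": False, "result": None}

    def worker():
        result = operation(stop)
        if not stop.is_set():           # ignore results of aborted work
            state["result"] = result
            state["done"] = True

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(tb)                     # block for at most Tb
    if not thread.is_alive():
        return True, state
    timer = threading.Timer(tnb, stop.set)  # abort request after Tnb more
    timer.daemon = True
    timer.start()
    return False, state


def slow_transfer(stop, steps=5, step_time=0.05):
    """Pretend transfer that checks for an abort request between steps."""
    for _ in range(steps):
        if stop.wait(step_time):        # True as soon as abort is requested
            return None
    return "transferred"


# Returns after Tb, then has enough background time to complete.
finished, state = call_with_timeouts(slow_transfer, tb=0.01, tnb=1.0)
# Returns after Tb, then is aborted before it can complete.
cut_finished, cut_state = call_with_timeouts(slow_transfer, tb=0.01, tnb=0.05)
time.sleep(0.6)                         # let both background workers settle
```

The total time before an abort is Tb + Tnb, which corresponds to the single "timeout" parameter in the caller's view. The abort is cooperative here: the operation has to check the event itself.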

Lifetimes
  Files should be assigned lifetimes
    To avoid cluttering catalogs and filling disks
    Files at all scopes
      – Logical (VO) file
      – Archived file replica
      – Staged file
  Multiple users will reserve a given file
    So a single lifetime will not work
    Introduce claims…

Other operation parameters
  User ID
    Need to know who registers files, makes claims, etc.
    Use the DN from authentication to look up the user
  File sets
    All operations should take a list of files in place of a single file
    Optimize bulk registrations, transfers, etc.

Claims
  Usage
    User request with lifetime is assigned a claim
      – Put, get, copy, …
    User may add claims on existing files
      – Logical, archived or site
    Claim owner may (should) release claim when done
    Claim owner may extend lifetime of claim
  Behavior
    Each claim has an expiration time (now plus lifetime)
    Claim is active until released or expired
    File may have multiple active claims
    File should not be deleted while claim is active
  Accounting
    Claims provide mechanism for accounting
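The claim behavior listed above can be sketched as a small registry. The names `Claim`, `ClaimRegistry` and `may_delete` are hypothetical, the current time is passed in explicitly to keep the example deterministic, and "extend" is interpreted here as adding seconds to the expiration time, which is an assumption.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: int
    owner: str
    lfid: str
    expires: float          # expiration time = time of request + lifetime
    released: bool = False

    def active(self, now):
        """A claim is active until it is released or has expired."""
        return not self.released and now < self.expires


class ClaimRegistry:
    def __init__(self):
        self._ids = itertools.count(1)
        self.claims = {}

    def add(self, owner, lfid, lifetime, now):
        claim = Claim(next(self._ids), owner, lfid, now + lifetime)
        self.claims[claim.claim_id] = claim
        return claim.claim_id

    def release(self, claim_id):
        self.claims[claim_id].released = True

    def extend(self, claim_id, extra, now):
        """Push back the expiration of a claim that is still active."""
        claim = self.claims[claim_id]
        if claim.active(now):
            claim.expires += extra

    def may_delete(self, lfid, now):
        """A file may be deleted only when no claim on it is active."""
        return not any(c.lfid == lfid and c.active(now)
                       for c in self.claims.values())

    def active_claims_by_owner(self, now):
        """Simple accounting: number of active claims per owner."""
        counts = {}
        for claim in self.claims.values():
            if claim.active(now):
                counts[claim.owner] = counts.get(claim.owner, 0) + 1
        return counts


registry = ClaimRegistry()
first = registry.add("alice", "file1", lifetime=100, now=0)
second = registry.add("bob", "file1", lifetime=50, now=0)
registry.release(first)
deletable_early = registry.may_delete("file1", now=10)   # bob's claim holds
deletable_late = registry.may_delete("file1", now=60)    # bob's claim expired
```

Because each user holds a separate claim, one user releasing a file does not affect another user's reservation, which is the reason the slides give for replacing a single file lifetime with claims.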

VO FMS
  Motivation
    Most FMS functionality provided by site service (LSE)
    There are also data and operations that are not site specific
    → VO file management system
  Functionality
    Catalog LFIDs and their attributes
    Copy of VO claims on archived files
      – So VO knows where files are supposed to be archived
      – Other replicas may exist
        > User claims
        > Local caching without claims
    Could also keep a more complete replica catalog
      – Do we want a comprehensive replica catalog?

File transfer service
  Motivation
    Desire for peak transfer rate will often exceed available bandwidth
    Need to prioritize requests from different users and VO’s
    Manage file transfers with FTS
  FTS: file transfer service
    Runs at site and has responsibility for transferring files to the site
    Maintains queues and assigns priorities like a batch system
    Primary ATLAS user would be the LSE
      – Which handles logical file registration after transfer
    Implementations from gLite, Condor, DQ (?), …
  If site does not provide FTS
    LSE must use gsiftp or
    Provide rudimentary FTS functionality
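The "queues and priorities like a batch system" idea can be illustrated with a priority queue and a limit on concurrent transfers. `ToyFTS`, its methods and the convention that a lower number means a higher priority are assumptions for this sketch and do not describe the gLite or Condor implementations.

```python
import heapq
import itertools


class ToyFTS:
    """Queue incoming transfer requests and start them in priority order."""

    def __init__(self, max_active):
        self.max_active = max_active    # stand-in for limited bandwidth
        self._queue = []
        self._order = itertools.count() # keeps equal priorities first-in first-out
        self.active = []

    def submit(self, lfid, source_site, priority):
        """Queue a transfer to this site; lower number = higher priority."""
        entry = (priority, next(self._order), lfid, source_site)
        heapq.heappush(self._queue, entry)

    def schedule(self):
        """Start queued transfers until the concurrency limit is reached."""
        started = []
        while self._queue and len(self.active) < self.max_active:
            _, _, lfid, _ = heapq.heappop(self._queue)
            self.active.append(lfid)
            started.append(lfid)
        return started

    def finish(self, lfid):
        """Mark a transfer as complete, freeing a slot."""
        self.active.remove(lfid)


fts = ToyFTS(max_active=2)
fts.submit("file-a", "site-x.example", priority=5)
fts.submit("file-b", "site-y.example", priority=1)
fts.submit("file-c", "site-x.example", priority=5)
first_batch = fts.schedule()
fts.finish("file-b")
second_batch = fts.schedule()
```

In the scheme on the slide the LSE would be the caller of `submit`, and it would register the logical file once the transfer is reported finished.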

Datasets
  Files are not the data view we want to present to users
    Typical data collection spans a large number of files
    Physics metadata is properly associated with a collection
      – All files have the same physics metadata
  Introduce datasets
    May span multiple files
    Need not include all the data in these files
    May be hierarchical (composed of other datasets)
    May be virtual, i.e. have multiple representations
      – E.g. selected events in these files or a copy of those events into a file
    Portable description of each dataset (XML, C++, Python)
  Catalog dataset metadata
    Replicates some of the data in the dataset descriptions
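A hierarchical dataset can be pictured as a tree whose leaves reference files. The sketch below is a hypothetical Python description (the class `Dataset` and its fields are not taken from the slides) and covers only the hierarchical case, not virtual datasets with multiple representations.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A dataset that references files directly and/or other datasets."""
    dataset_id: str
    files: list = field(default_factory=list)      # LFIDs held directly
    children: list = field(default_factory=list)   # constituent datasets

    def all_files(self):
        """Return every LFID reachable from this dataset, without repeats."""
        collected = list(self.files)
        for child in self.children:
            collected.extend(child.all_files())
        # dict.fromkeys removes duplicates while keeping first-seen order
        return list(dict.fromkeys(collected))


run1 = Dataset("run1", files=["f1", "f2"])
run2 = Dataset("run2", files=["f2", "f3"])
combined = Dataset("combined", children=[run1, run2])
combined_files = combined.all_files()
```

Two datasets may reference the same file because a dataset need not include all the data in a file; the file list is therefore de-duplicated when the hierarchy is flattened.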

Dataset catalogs
  Dataset repository
    XML descriptions indexed by ID
    At present cannot query on attributes
  DSC: Dataset selection catalog
    Datasets of interest to physicists are named and assigned attributes
    Physicist can query DSC to find a dataset of interest
    Need to identify the relevant attributes
    Current (rudimentary) implementation, see query page at –
  Dataset placement catalogs
    Sites where datasets (all their files) can be found
    Distinguish archived and staged?

DSMS
  As with FMS, most of the DSMS is site-based.
  Dataset placement service
    For users and processing system
    Find which datasets are installed at a site
    Methods to put, get and copy
      – Layer over same methods in FMS
      – Operate on all the files in the dataset
      – For virtual datasets, system may choose between different representations (file sets)
    Datasets have lifetimes like files
      – Recorded using claims
      – Distinct for dataset defined, placed and maybe staged
        > Analogous to logical, archived and staged files
      – File may be deleted when no datasets hold references
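The rule "file may be deleted when no datasets hold references" amounts to reference counting at a site. The sketch below assumes a hypothetical `SitePlacement` class and tracks only placed datasets; a fuller version would record the references as claims with lifetimes.

```python
from collections import defaultdict


class SitePlacement:
    """Track which placed datasets reference which files at one site."""

    def __init__(self):
        self.placed = {}                  # dataset ID -> set of LFIDs
        self.refs = defaultdict(set)      # LFID -> IDs of referring datasets

    def put(self, dataset_id, lfids):
        """Place a dataset: every one of its files gains a reference."""
        self.placed[dataset_id] = set(lfids)
        for lfid in lfids:
            self.refs[lfid].add(dataset_id)

    def remove(self, dataset_id):
        """Drop a dataset; return files that no dataset references now."""
        deletable = []
        for lfid in sorted(self.placed.pop(dataset_id)):
            self.refs[lfid].discard(dataset_id)
            if not self.refs[lfid]:
                del self.refs[lfid]
                deletable.append(lfid)
        return deletable


site = SitePlacement()
site.put("dsA", ["f1", "f2"])
site.put("dsB", ["f2", "f3"])
after_a = site.remove("dsA")   # f2 is still referenced by dsB
after_b = site.remove("dsB")
```

Removing `dsA` frees only `f1`; the shared file `f2` becomes deletable when `dsB` is removed as well.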

Conclusions
  Have seen a high-level outline of a DMS (data management system)
    DSMS (dataset management system) layered on top of FMS (file management system)
  Three scopes for files
    Logical, archived and staged
  Same for datasets
    Defined, placed and staged
  Provide lifetime management at all scopes
    Claim mechanism
      – Support multiple users of a file or dataset
      – Able to release a claim, extend it or let it expire
    Use dataset lifetime to control file lifetime
      – I.e. dataset claims its files

Conclusions (cont)
  Mostly site based services
    Scalable
    Keep service close to the relevant metadata
    Lightweight clients
  Also need a VO-based component
  File transfer service
    Balance load
    Prioritize
  More information
    ADA documents page –
    Datasets for the grid
    File management on the grid
    Dataset management