Data Placement Intro Dirk Duellmann WLCG TEG Workshop Amsterdam 24. Jan 2012.


1 Data Placement Intro Dirk Duellmann WLCG TEG Workshop Amsterdam 24. Jan 2012

2 POOL – from A. Valassi
LHCb stopped using POOL (will collect statement for report)
"ATLAS will continue to need support for POOL, including any relevant software patches and releases, for as long as the 2012 production version of the ATLAS software is actively used. It is based on the LCG61 configuration, which includes a version of POOL centrally built and maintained by the core team in IT-ES (using ROOT 5.30). ATLAS will no longer need support for POOL from IT-ES for their releases based on the LCG62 configuration (using ROOT 5.32), where a custom software package derived from POOL is built and maintained by ATLAS as part of their internal software. The first such release already exists and will be used as an ATLAS development release in 2012; this will eventually become the production version of the ATLAS software, by end 2012-beg 2013."

3 Data Placement
Data has been placed (pushed to sites) from the start of WLCG
– Traditional transfer method implemented by FTS
All experiments are placing data – ALICE is pushing data (via xroot)
Pre-placement has unbeatably low latency, but is constrained by available local space
Optimizing efficiency requires knowledge of the expected frequency of access (popularity), which follows a steeply falling distribution with a long tail
Pre-placement is not expected to disappear
– But it is now complemented by the new federation concept in ALICE, ATLAS and CMS

4 Placement Responsibilities
Experiment – Data Management
– Define the geographical distribution of data (sets) within available resources according to current priorities
Site – Storage Management
– Maintain stable storage for placed data …but experiments regularly take over repair tasks
– Support access from experiment jobs
Bandwidth, availability and access latency (e.g. tape) have a direct impact on the CPU/wall-time ratio
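The latency point above can be made concrete with a small illustration: a job's efficiency at a site is the CPU time divided by the total wall-clock time, so every second spent waiting on storage (a tape recall, a slow remote read) lowers the ratio. The numbers below are illustrative assumptions, not measurements from the deck.

```python
# Illustrative sketch (assumed numbers) of how storage access latency
# degrades the CPU/wall-time ratio reported for experiment jobs.

def cpu_wall_ratio(cpu_seconds, io_wait_seconds):
    """Efficiency = CPU time / total wall-clock time of the job."""
    return cpu_seconds / (cpu_seconds + io_wait_seconds)

# A job doing 3600 s of computation with no I/O stalls is 100% efficient;
# the same job stalled for 1800 s waiting on a tape recall drops to 2/3.
busy = cpu_wall_ratio(3600, 0)
stalled = cpu_wall_ratio(3600, 1800)
```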

5 Placement on Disk
Which data is on disk at Tier 1 sites?
– This used to be a storage management decision – well, at least it was optimized by the storage element s/w
– Today it is a conscious data management (experiment) decision
The source of a data transfer to disk may be not only a local tape but also another T1 disk
– Another reason to question the traditional HSM approach

6 Today/Tomorrow
The split between data management and storage management is actually a (still rather minimal) model
– Useful to define the (people/component) responsibility split
– And to expose interface problems:
E.g. dark data can make it hard for sites to leave the experiments full data management responsibility without getting stuck (disk full)
(Single) sites cannot easily check the consistency of data file content, or consistency with non-local catalogues
What are the "data/storage repair" responsibilities / agreements between sites and experiments?

7 Maybe not today – but better soon
Where is the rest of the model?
– If there wasn't a concrete one when the current system was designed, can we produce a strawman model for what we think is in place now?
Relationship between data management, transfer components, storage management, storage components
Implied responsibility split between experiments & sites
Areas where joint operation is required, and how the s/w supports the organization of these joint tasks
– And how are recent changes "changing the model"?

8 LHCb input to DM and SM TEGs

9 Remarks to DM and SM TEGs
Introduction
– We have already provided some input during our dedicated session of the TEG
– Here is a list of questions we want to address for each session
  o Not exhaustive, limited to one slide per session
– It would also have been useful to hear what the proposals for the evolution of SM and DM are
  o These should be proposals, not diktats
  o Extensive discussions are needed with users and sites before embarking on full-scale "prototypes" that are no longer prototypes
– Whatever the future is, WLCG must ensure the long-term support (no external "firm")
  o Do not underestimate the amount of work needed for users to adapt
  o Therefore plan well ahead, gather functionality requirements…
F2F Data and Storage Management TEG, Amsterdam

10 Remarks to DM and SM TEGs
Data Placement and Federation
– Does it imply replica catalogs are no longer needed?
  o How is brokering of jobs done?
  o Random, or "where most files are"?
– When is the data transfer performed?
  o By the WMS: implies a priori brokering, incompatible with pilot jobs
  o By the job: inefficient use of slots
  o Is it a cache (with what lifetime?) or just a download facility?
– What is the advantage compared to placement using popularity?
  o Limited number of "master replicas" (e.g. 2)
  o Add replicas when popularity increases
  o Remove replicas when popularity decreases
  o … but still with a catalog and job brokering
– What is the minimal number of sites for which it becomes worthwhile?
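The popularity-based alternative sketched in the bullets above (a fixed number of master replicas, more copies as access counts rise, fewer as they fall) can be written down in a few lines. This is a hedged illustration of the idea only: the thresholds, the linear accesses-per-replica rule and all names are assumptions, not LHCb or WLCG policy.

```python
# Sketch of popularity-driven replica placement: keep a fixed number of
# master replicas, add copies as a dataset's recent access count grows,
# and trim them back when interest fades. All constants are illustrative.

MASTER_REPLICAS = 2         # always-kept copies (the slide suggests e.g. 2)
ACCESSES_PER_REPLICA = 100  # hypothetical: one extra replica per 100 accesses
MAX_REPLICAS = 10           # cap so popular data cannot consume all disk

def target_replicas(recent_accesses):
    """Desired replica count for a dataset given its recent popularity."""
    extra = recent_accesses // ACCESSES_PER_REPLICA
    return min(MASTER_REPLICAS + extra, MAX_REPLICAS)

def placement_actions(datasets):
    """datasets: {name: (current_replicas, recent_accesses)}.
    Returns {name: delta} where positive means add copies, negative remove."""
    plan = {}
    for name, (current, accesses) in datasets.items():
        target = target_replicas(accesses)
        if target != current:
            plan[name] = target - current
    return plan
```

Note that, as the slide stresses, this scheme still relies on a replica catalog and on job brokering to the sites holding the copies; it differs from a federation mainly in *when* and *why* replicas move.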

