Report: TEG WLCG Data and Storage Management
Giacinto Donvito, INFN-IGI
Workshop INFN CCR - GARR, 14/05/2012

Outlook
- What are the WLCG TEGs and why do we need them?
- How are they composed?
- What do they deal with?
- How have they worked?
- Reports and recommendations
- Conclusions and future work

What are the WLCG TEGs and why do we need them?
Mandate: to reassess the implementation of the grid infrastructures that we use in the light of the experience with LHC data and of technology evolution, never forgetting the important successes and lessons, and ensuring that any evolution does not disrupt our successful operation.
The work should:
- Document a strategy for evolution of the technical implementation of the WLCG distributed computing infrastructure. This strategy should provide a clear statement of needs for WLCG, which can also be used to provide input to any external middleware and infrastructure projects.
Deliverables:
- Assessment of the current situation with middleware, operations, and support structure.
- Strategy document setting out a plan and needs for the next 2-5 years.

How are they composed?
- Several groups; the most relevant here are:
  - Data Management (chairs: Brian Bockelman, Dirk Duellmann)
  - Storage Management (chairs: Wahid Bhimji, Daniele Bonacorsi)
- Several INFN people are involved: Daniele Bonacorsi, Giacinto Donvito, Luca Dell'Agnello, Riccardo Zappi, Vladimir Sapunenko.
- The DM and SM TEGs started their activity with a survey of all members, where each could provide feedback on issues and success stories from his/her point of view.
- A very detailed survey was also submitted to the four LHC experiments.

What do they deal with? Data Management topics:
- DM.1 Review of the Data Management demonstrators from summer
- DM.2 Dataset management and data placement (policy-based or dynamic)
- DM.3 Data federation strategies
- DM.4 Transfers and WAN access protocols (HTTP, xrootd, gsiftp)
- DM.5 Data transfer management (FTS)
- DM.6 Understanding data accessibility and security requirements/needs
- DM.7 POOL
- DM.8 ROOT, PROOF
- DM.9 Namespace management
- DM.10 Management of catalogues (LFC, future directions)

What do they deal with? Storage Management topics:
- SM.1 Experiment I/O usage patterns
- SM.2 Requirements and evolution of storage systems
- SM.3 Separation of archives and disk pools/caches
- SM.4 Storage system interfaces to the Grid
- SM.5 Filesystems/protocols (standards?)
- SM.6 Security/access controls
- SM.7 Site-run services

What do they deal with?
- From the first meeting it was clear that the two TEGs overlap in many areas.
- Most of the meetings and activities were carried out jointly, such as the face-to-face meeting held in Amsterdam (Jan 24th-25th).

Process
- Phases: information gathering; synthesis / exploration / orientation; refinement.
- Initial questionnaire; defined topics [TopicsDataStorageTEG].
- The Data and Storage TEGs soon effectively merged.
- Questions to the experiments: experiment presentations and Twikis [ALICE; ATLAS; CMS; LHCb].
- Storage middleware presentations.
- Face-to-face session for each topic, plus broader discussions.
- Developed: layer diagram (overarching picture); recommendations under each topic (see final and draft report); "emerging" recommendations.
- Timeline: Nov 2011: questionnaire; Jan 2012: face-to-face; Feb 2012: GDB; Apr 2012: reports.

Data Placement

Archive Layer
Description/status: the archive layer has two different responsibilities:
- Cost-effective scaling in storage volume, not in client access bandwidth.
- Increasing reliability by provisioning a separate copy of files on different storage media, to decrease the risk of data loss (due to software or operational mistakes).
Expected evolution:
- Tape as a medium may get progressively replaced by disk-based solutions (reducing latency, but increasing the power budget), but this is not guaranteed, as tape densities keep increasing.
- In the long term, the majority of the archive layer will likely move with the consumer market to random-access bulk storage.
- We expect that the WLCG storage volume will shrink relative to typical market volumes.

Placement Layer
- Using placed data has the important benefit of not introducing additional transfer latency or wide-area network (WAN) activity for jobs that process this data.
- Predicting which datasets will be popular is a non-trivial task, and initially resulted in an inefficient use of the available storage space and network capacity. Therefore, the experiments have introduced active monitoring systems which collect and aggregate information on file popularity.
- Traditionally, the disk pools of the placement layer have been integrated via a hierarchical storage management (HSM) system with local archive resources, to transparently manage the files available on disk.
- The WLCG usage pattern involves data scans, which are well known to perform poorly with LRU caches.
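The last point can be made concrete with a toy model (an illustration added here, not taken from the TEG report): when a sequential scan touches more files than the disk cache can hold, LRU eviction removes exactly the files the next pass will need, so repeated scans get no cache hits at all. The cache size and file names below are arbitrary.

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU disk cache: keeps the most recently accessed files."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()
        self.hits = self.misses = 0

    def access(self, name):
        if name in self.files:
            self.hits += 1
            self.files.move_to_end(name)        # mark as most recently used
        else:
            self.misses += 1                    # file would have to be recalled/staged
            self.files[name] = True
            if len(self.files) > self.capacity:
                self.files.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=1000)
dataset = [f"file_{i:05d}" for i in range(1500)]  # dataset larger than the cache

for _ in range(3):            # three full sequential scans of the dataset
    for f in dataset:
        cache.access(f)

print(cache.hits, cache.misses)  # hit count stays at 0: each pass evicts what the next pass needs
```

This behaviour is one motivation for popularity-guided placement rather than purely reactive caching for scan-style workloads.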

Placement Layer (continued)
- At this point in time, all experiments can work with both scenarios: HSM-coupled storage, or split disk and archive components.
- Some experiments have a preference for extending the split architecture into a strategic direction over the next years.

Placement with Federation

Federation Layer
- Pre-placement data movement has more recently been complemented by the federation approach: the redirection capabilities of the client access protocol are used to increase data availability beyond the level a single site can provide.
- This hides local unavailability due to service problems for the small subset of jobs which would otherwise fail and be resubmitted.
- Requirements:
  - a common client access protocol;
  - a deterministic mapping from site-local file names for replicas to the experiment's namespace (see the sketch below);
  - federated data is read-only, and replicas of a file are identical.
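As a concrete illustration (hypothetical site prefix and namespace layout, not taken from the report), a purely rule-based name translation of the kind used in xrootd federations makes the relation between site-local replica paths and the experiment namespace deterministic in both directions:

```python
# Hypothetical rule-based name translation for a federated store.
# SITE_PREFIX and the /store/... namespace layout are illustrative only.
SITE_PREFIX = "/pnfs/example-site.org/data/myexperiment"

def lfn_to_local(lfn: str) -> str:
    """Map an experiment logical file name to the site-local replica path."""
    assert lfn.startswith("/store/"), "unexpected namespace"
    return SITE_PREFIX + lfn

def local_to_lfn(path: str) -> str:
    """Inverse mapping: recover the logical name from a site-local path."""
    assert path.startswith(SITE_PREFIX)
    return path[len(SITE_PREFIX):]

lfn = "/store/data/Run2011A/AOD/0000/ABCD1234.root"
pfn = lfn_to_local(lfn)
assert local_to_lfn(pfn) == lfn   # deterministic in both directions
print(pfn)
```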

Recommendations: Federation
- Focused work on an HTTP plugin for xrootd.
- Establish monitoring of the aggregate network bandwidth used via federation mechanisms.
- Launch, and keep alive, storage working groups to follow up a list of technical topics:
  - detailing the process of publishing new data into the read-only placement layer;
  - investigating a stricter separation of read-only and read-write data;
  - feasibility of moving a significant fraction of the current (read-only) data to world-readable access;
  - investigating federation as a repair mechanism for placed data.

Storage Element Components

Examples of current SEs

Examples of (possible) future SEs

Catalogues and Namespaces
- We observe that each LHC experiment has implemented its own cataloguing software for the dataset namespace.
- For the file namespace, ALICE and CMS again have unique, internally developed cataloguing software. LHCb and ATLAS currently share a common piece of software, the LFC, supported by EMI; both plan to phase out their use of it in the medium term.
- WLCG should plan for the LFC to become experiment-specific software, and eventually unused as an experiment catalogue in the medium term. In particular, EMI (and subsequent projects) should be advised of this fact. In the meantime, maintenance will likely be needed.
- We believe the LFC may be repurposed by subsequent EMI projects, e.g. as a central redirector for a federation system.

Archive/Disk Separation
- All of the LHC experiments seem to be working fine with (or towards) splitting disk caches from tape archives: ALICE, ATLAS and LHCb are split, while CMS has a work plan in progress. The experiments require a separation between archives and disk caches.
- At first glance, FTS seems to provide the functionality needed to transfer data from the disk cache to the disk buffer in front of the tape system. We hope to verify this in the next version, but thought should be given to whether FTS is the most appropriate tool for this scheduling, or whether a different concept or architecture should be developed.
- None of the experiments want to drop the useful functionality of an HSM in managing a disk buffer in front of the archive.
- Management of archive storage in the model described potentially moves from within a single storage system to involving the transfer layer. Experiment workflows may need to be adapted accordingly, and tools such as FTS should support the required features, as in the Managed Transfer recommendations.

Managed Transfer
Requirements on the transfer management layer (retries and back-pressure are illustrated in the sketch after this list):
- Flexible integration with the experiment workflow management systems, supporting different scheduling strategies.
- Ability to manage a large number of endpoints, including the management of fair-share and priorities.
- Management of the staging process to move data from the Archive to the Placement Layer.
- Fault-tolerant behaviour: resuming interrupted transfers, retries, and the use of replicas (in case source files are not available but replicas exist at other sites).
- The ability to handle back-pressure.
- Support for sites not providing an SRM interface.
- Support for additional transfer protocols, such as HTTP and xrootd.
- Detailed monitoring of transfers, to allow optimization of storage endpoints and networks.
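The retry and back-pressure requirements can be sketched as follows. This is a toy model in Python, not FTS code: the endpoint names, the per-endpoint concurrency limit, and the simulated failure rate are all invented for illustration.

```python
import concurrent.futures
import random
import threading
import time

MAX_ACTIVE_PER_ENDPOINT = 4   # illustrative back-pressure limit per destination
MAX_RETRIES = 3

endpoint_slots = {}           # destination endpoint -> Semaphore
slots_lock = threading.Lock()

def slot_for(endpoint):
    with slots_lock:
        if endpoint not in endpoint_slots:
            endpoint_slots[endpoint] = threading.Semaphore(MAX_ACTIVE_PER_ENDPOINT)
        return endpoint_slots[endpoint]

def do_transfer(src, dst):
    """Pretend to run one transfer; fail randomly to exercise the retry path."""
    time.sleep(random.uniform(0.01, 0.05))
    return random.random() > 0.2            # ~80% success in this toy model

def transfer_with_policy(src, dst):
    endpoint = dst.split("/")[2]
    for attempt in range(1, MAX_RETRIES + 1):
        with slot_for(endpoint):             # back-pressure: cap concurrent transfers per endpoint
            if do_transfer(src, dst):
                return f"OK {src} -> {dst} (attempt {attempt})"
    return f"FAILED {src} -> {dst} after {MAX_RETRIES} attempts"

jobs = [(f"gsiftp://site-a.example/data/file{i}", f"https://site-b.example/data/file{i}")
        for i in range(20)]

with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    for result in pool.map(lambda j: transfer_with_policy(*j), jobs):
        print(result)
```

In a real transfer service the per-endpoint limit would be driven by feedback from the storage endpoints themselves (queue depth, observed throughput) rather than a fixed constant.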

Storage Management Interfaces
- Not all storage is manageable through SRM; in particular, storage implementations outside the HEP niche do not integrate SRM.
- Not all Storage Element implementations provide a complete implementation of the SRM specification, and not all of the SRM v2.2 specification has proved useful.
- Standards exist that provide comparable levels of functionality: CDMI is an emerging standard interface for managing storage, and WebDAV provides much of the functionality of SRM (see the sketch below).
Recommendations:
- Maintain SRM at archive sites.
- Experiments, middleware experts and sites should agree on alternatives to be considered for testing and deployment.
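To make the WebDAV point concrete, the sketch below issues two WebDAV namespace operations that cover part of what SRM is commonly used for (directory creation and metadata listing). The endpoint URL, proxy path, and CA directory are placeholders, and this is not a recommendation of a specific client.

```python
# Illustration only: WebDAV namespace operations roughly analogous to
# srmMkdir and srmLs. All paths and the endpoint are hypothetical.
import requests

ENDPOINT = "https://se.example-site.org:443/webdav/myvo"
cert = ("/tmp/x509up_u1000", "/tmp/x509up_u1000")   # grid proxy used as client cert/key

# Create a directory (roughly the role of srmMkdir)
r = requests.request("MKCOL", ENDPOINT + "/new_dataset",
                     cert=cert, verify="/etc/grid-security/certificates")
print("MKCOL:", r.status_code)

# Query file/directory metadata one level deep (roughly the role of srmLs)
r = requests.request("PROPFIND", ENDPOINT + "/new_dataset",
                     headers={"Depth": "1"},
                     cert=cert, verify="/etc/grid-security/certificates")
print("PROPFIND:", r.status_code)
print(r.text[:300])   # multistatus XML listing the entries
```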

Site Storage Performance
- Benchmarking and I/O requirement gathering.
- Protocol support and evolution: both remote I/O (direct reading from the site storage) and streaming of the file (copy to the worker node) should be supported in the short/medium term; the trend towards remote I/O should be encouraged by both experiments and storage solution providers (the two modes are sketched below).
- I/O error management and resilience.
- Future technology review.
- High-throughput computing research: possibilities for much higher-throughput computing should be investigated. This research should not be restricted to ROOT data structures and should fully exploit cutting-edge industry technologies, such as Hadoop data processing or its successors, building on existing exploration activity.
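A sketch of the two access modes mentioned above, assuming a PyROOT installation with xrootd support and the xrdcp client on the path; the server name, file path, and tree name are placeholders.

```python
# Sketch only: contrast remote I/O with copy-to-worker-node access.
import subprocess
import ROOT

REMOTE = "root://xrootd.example-site.org//store/data/run1234/events.root"

# Mode 1: remote I/O -- the job reads the file in place over the network,
# fetching only the branches/baskets it actually touches.
f = ROOT.TFile.Open(REMOTE)
tree = f.Get("Events")                 # "Events" is a placeholder tree name
print("remote entries:", tree.GetEntries())
f.Close()

# Mode 2: streaming / copy to the worker node -- the whole file is copied
# to local disk first, then read locally.
subprocess.check_call(["xrdcp", "-f", REMOTE, "/tmp/events.root"])
g = ROOT.TFile.Open("/tmp/events.root")
print("local entries:", g.Get("Events").GetEntries())
g.Close()
```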

Site Storage Operations
- Site involvement in protocol and requirement evolution.
- Expectations on data availability and access, and the handling of data losses: where additional reliability is required, sites shall use 'smart' techniques and redundancy at the block or file level to set up and operate this storage.
- Improved activity monitoring: monitoring of file accesses and access frequency; catalogue-level monitoring.
- Storage accounting: we recommend that WLCG agree to use the EMI StAR accounting record.

POOL Persistency
- POOL has become experiment-specific software, and will become unnecessary in the medium term. No future development is foreseen.

Security
- Remove backdoors from CASTOR.
- Check actual permissions implemented by storage systems.
- Resolve issues raised with data ownership.

Conclusions and Future
- Reports have been delivered under the "Technical Evolution Strategy" folder.
- The goal now is to have an initial summary for the WLCG workshop at CHEP. At CHEP:
  - summary of the important recommendations, and an initial proposal of priorities;
  - summary of areas where open questions have not been (fully) addressed;
  - summary of areas where more work needs to happen, or discussions need to finish;
  - initial proposal of working groups that WLCG should set up (for GDB and pre-GDB slots);
  - proposal of general topics that could be dealt with at HEPiX (for example).