CMS: T1 Disk/Tape separation
Nicolò Magini, CERN IT/SDC
Oliver Gutsche, FNAL
November 11th, 2013

Outline
- Motivation: gains in operations
- Impact on data federation
- Progress and technical issues
- Changes in operations and procedures

Introduction
- CMS asked the Tier-1 sites to change their storage setup to gain more flexibility and control over the available disk and tape resources
- Old setup:
  - One MSS system controlling both disk and tape
  - Automatic migration of new files to tape
  - Disk pool automatically purges unpopular files to make room for more popular ones
  - Automatic recall from tape when a file without a disk copy is accessed
- Several disadvantages:
  - Pre-staging needed for organized processing was not 100% efficient, because the system was still allowed to purge files automatically if needed
  - User analysis was not allowed at Tier-1 sites, to protect the tape drives from chaotic user access patterns

Disk/Tape separation
- CMS asked the Tier-1 sites to separate disk and tape and to base the management of both on PhEDEx
- Sites were asked to deploy two independent [*] PhEDEx endpoints (see the query sketch below):
  - “Large” [**] persistent disk
  - Tape archive with a “small” [**] disk buffer
- All file access will be restricted to the disk endpoint
- All processing will write only to the disk endpoint
- [*] A file can be written/deleted on disk only, on tape only, or on both simultaneously
- [**] “small” ~ 10% of “large”, but can be sized according to the expected rates to tape
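As a concrete illustration of the two-endpoint layout, the sketch below compares which blocks of a dataset are resident at a Tier-1 disk endpoint versus its tape endpoint, using the PhEDEx data service. This is a minimal sketch, not part of the original slides: the node names (T1_US_FNAL_Disk, T1_US_FNAL_MSS), the dataset name, and the assumption that the blockreplicas call is reachable from this client are illustrative placeholders.

```python
# Minimal sketch (not from the slides): compare what is resident on the disk
# endpoint vs. the tape endpoint of a Tier-1 via the PhEDEx data service.
# Node names and the dataset are hypothetical placeholders; the cmsweb
# instance may require a grid certificate/proxy for access.
import json
import urllib.parse
import urllib.request

DATASVC = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/blockreplicas"

def complete_blocks(node, dataset):
    """Return the names of blocks of `dataset` fully resident at PhEDEx node `node`."""
    params = urllib.parse.urlencode({"node": node, "dataset": dataset, "complete": "y"})
    with urllib.request.urlopen(f"{DATASVC}?{params}") as resp:
        payload = json.load(resp)
    return {block["name"] for block in payload["phedex"]["block"]}

if __name__ == "__main__":
    dataset = "/SomePrimaryDataset/SomeProcessing-v1/AOD"   # hypothetical dataset
    on_disk = complete_blocks("T1_US_FNAL_Disk", dataset)   # "large" persistent disk
    on_tape = complete_blocks("T1_US_FNAL_MSS", dataset)    # tape archive endpoint
    print(f"{len(on_disk)} blocks on disk, {len(on_tape)} blocks on tape")
    print(f"{len(on_tape - on_disk)} blocks are on tape only and would need staging")
```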

Motivation
- Increase flexibility for Tier-1 processing
- Enable user analysis at Tier-1s
- Enable remote access to Tier-1 data

Processing at Tier-1s: Location independence
- Use case:
  - Organized processing needs to access input samples stored custodially on tape at one of the Tier-1 sites
- Old model:
  - Jobs needed to run close to the tape endpoint hosting the input and output data (the custodial location)
- New model:
  - Jobs can run against any disk endpoint, not necessarily close to the tape endpoint hosting the input or output data
- Benefit of the new model:
  - Custodial distribution optimizes tape space utilization, taking into account the processing capacities of the Tier-1 sites
  - Not all data is accessed at the same time, which leads to uneven utilization of processing resources
  - Location independence makes it possible to use both tape and processing resources efficiently at the same time

Processing at Tier-1s: Pre-staging and Pinning
- Use case:
  - Staging and pinning input files to local disk for organized processing is required to optimize CPU efficiency
  - Input files need to be released from disk when processing is done
- Old model:
  - Pre-staging via SRM or Savannah tickets was used to convince the MSS to have input files available on disk
  - Release of input relied on automatic purging within the MSS
- New model:
  - CMS will centrally subscribe, and therefore pre-stage, input files so that they are available on disk before jobs start (a hypothetical subscription sketch follows below)
  - CMS will permanently keep input files on disk for regular activities
- Benefit of the new model:
  - CMS is in control of what is on disk at the Tier-1 sites and can optimize disk utilization (CMS will have to actively manage the disk space through PhEDEx)
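In the new model, pre-staging amounts to a PhEDEx transfer subscription to the Tier-1 disk endpoint. The sketch below is an assumption-laden illustration of how such a request could be placed through the PhEDEx data service subscribe call: the parameter names, the minimal data XML, the node name, and the certificate handling are assumptions to be checked against the data service documentation, not a prescription from the slides.

```python
# Hypothetical sketch: pre-stage a dataset by subscribing it to a Tier-1 disk
# endpoint through the PhEDEx data service. Parameter names, the minimal data
# XML, and the certificate paths are assumptions -- consult the PhEDEx data
# service documentation before using anything like this for real.
import requests  # third-party HTTP library, assumed available

SUBSCRIBE_URL = "https://cmsweb.cern.ch/phedex/datasvc/json/prod/subscribe"

def prestage_to_disk(dataset, node="T1_US_FNAL_Disk"):
    """Request a non-custodial subscription of `dataset` at the disk node."""
    data_xml = (
        '<data version="2.0">'
        '<dbs name="https://cmsweb.cern.ch/dbs/prod/global/DBSReader">'
        f'<dataset name="{dataset}" is-open="y"/>'
        "</dbs></data>"
    )
    payload = {
        "node": node,
        "data": data_xml,
        "level": "dataset",
        "priority": "normal",
        "custodial": "n",   # disk copies are not custodial
        "move": "n",
        "comments": "pre-staging input for organized processing (sketch)",
    }
    # Authentication with a grid certificate/proxy is assumed; paths are placeholders.
    resp = requests.post(
        SUBSCRIBE_URL,
        data=payload,
        cert=("/path/to/usercert.pem", "/path/to/userkey.pem"),
        verify="/etc/grid-security/certificates",
    )
    resp.raise_for_status()
    return resp.json()
```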

Processing at Tier-1s: Output from central processing
- Use case:
  - Central processing produces output which needs to be archived on tape
- Old model:
  - Output of an individual workflow could only be produced at one site, the site of the custodial location
- New model:
  - Output can be produced at one or more disk endpoints, then migrated to tape only at a single final custodial location
- Benefit of the new model:
  - CMS can optimize processing resource utilization
  - Tier-1s with no free tape are no longer idle
  - CMS can validate data before the final tape migration, reducing unnecessary tape usage

Impact on data federation
- CMS would like to benefit from a fully deployed CMS data federation
  - Tier-1s need to publish the files on the disk endpoints in the Xrootd federation
  - Eventually, all popular data will be accessible through the federation (a read-access sketch follows below)
- Benefits:
  - Further optimize processing resource utilization by processing input files without the need to relocate samples through PhEDEx
  - Enables processing not only at remote Tier-1 sites through the LHCOPN but also at Tier-2 sites
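To make the federation use case concrete, the sketch below opens a file by its logical name through an Xrootd redirector with PyROOT. The redirector hostname and the /store path are placeholders, and a valid grid proxy is assumed for authentication; this illustrates federated read access in general, not a specific CMS configuration from the slides.

```python
# Minimal sketch of federated read access over Xrootd with PyROOT.
# The redirector host and the /store path are hypothetical placeholders;
# a valid grid proxy (X509_USER_PROXY) is assumed for authentication.
import ROOT

REDIRECTOR = "root://cms-xrootd-global.cern.ch/"                  # placeholder redirector
LFN = "/store/data/Run2012D/SomeDataset/AOD/v1/0000/FILE.root"    # placeholder file

def open_via_federation(lfn):
    """Open a file through the federation; the redirector locates a disk replica."""
    f = ROOT.TFile.Open(REDIRECTOR + lfn)
    if not f or f.IsZombie():
        raise IOError(f"could not open {lfn} through the federation")
    return f

if __name__ == "__main__":
    f = open_via_federation(LFN)
    print("opened", f.GetName(), "containing", f.GetNkeys(), "keys")
    f.Close()
```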

Technical implementation
- Sites and storage providers are free to choose the implementation
- Two possibilities identified in practice (an LFN-to-PFN sketch for the second option follows below):
  - Two independent storage endpoints: CERN, FNAL
  - Single storage endpoint with two different trees in the namespace: RAL, KIT, CNAF, CCIN2P3, PIC
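For the single-endpoint option, disk and tape are distinguished only by where an LFN lands in the namespace. The sketch below shows a hypothetical LFN-to-PFN mapping of the kind a trivial-file-catalog rule might express; the SRM endpoint, mount points, and file name are invented for illustration and do not correspond to any site's real configuration.

```python
# Hypothetical sketch of the "two trees in one namespace" layout: the same
# /store LFN maps to different PFN prefixes for the disk and tape endpoints.
# The prefixes and protocol are invented placeholders, not any site's real TFC.

PREFIXES = {
    "disk": "srm://se.example-t1.org:8443/srm/managerv2?SFN=/pnfs/example-t1.org/data/cms/disk",
    "tape": "srm://se.example-t1.org:8443/srm/managerv2?SFN=/pnfs/example-t1.org/data/cms/tape",
}

def lfn_to_pfn(lfn: str, endpoint: str) -> str:
    """Map a CMS LFN (starting with /store) to a PFN on the chosen endpoint."""
    if not lfn.startswith("/store/"):
        raise ValueError(f"not a /store LFN: {lfn}")
    return PREFIXES[endpoint] + lfn

if __name__ == "__main__":
    lfn = "/store/mc/SomeCampaign/SomeDataset/AODSIM/v1/0000/FILE.root"  # placeholder
    print("disk PFN:", lfn_to_pfn(lfn, "disk"))
    print("tape PFN:", lfn_to_pfn(lfn, "tape"))
```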

Internal transfers
- Currently using standard tools for disk ↔ tape buffer transfers at all sites, e.g. FTS, xrdcp (an FTS submission sketch follows below)
- No bottleneck seen so far
- If needed, internal optimizations are possible with a single endpoint
  - e.g. on a single dCache endpoint, the internal data flow can be delegated to the pools
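Since the slide mentions FTS for disk ↔ tape-buffer transfers, here is a minimal sketch of submitting one such transfer with the FTS3 REST "easy" Python bindings. The FTS endpoint and the SURLs are placeholders, and the exact binding names and options should be checked against the installed fts3-rest client version; this is an illustration, not the tooling actually used by the sites.

```python
# Minimal sketch: submit a single disk -> tape-buffer transfer via FTS3,
# assuming the fts3-rest "easy" Python bindings are installed and a grid
# proxy is available. Endpoint and SURLs are hypothetical placeholders.
import fts3.rest.client.easy as fts3

FTS_ENDPOINT = "https://fts3.cern.ch:8446"  # placeholder FTS3 server

source = ("srm://se.example-t1.org/pnfs/example-t1.org/data/cms/disk"
          "/store/data/Run2012D/SomeDataset/AOD/v1/0000/FILE.root")
destination = ("srm://se.example-t1.org/pnfs/example-t1.org/data/cms/tape"
               "/store/data/Run2012D/SomeDataset/AOD/v1/0000/FILE.root")

context = fts3.Context(FTS_ENDPOINT)                  # authenticates with the grid proxy
transfer = fts3.new_transfer(source, destination)
job = fts3.new_job([transfer], verify_checksum=True, retry=3)
job_id = fts3.submit(context, job)
print("submitted FTS job", job_id)
print("status:", fts3.get_job_status(context, job_id)["job_state"])
```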

Site concerns
- The main site concern has been the duplication of space between the disk endpoint and the tape buffer
  - Should not be a big effect, given the “small” size of the buffer in front of tape
- For dCache, a solution is planned:
  - A “flush-on-demand” command creating a hard link in the tape namespace instead of a copy
  - The development schedule will depend on need; for now, gather experience with the current version

Current status
- DONE:
  - RAL, CNAF
  - KIT (in commissioning last week)
- ~DONE:
  - CERN (except for Tier-0 streamers and user)
- IN PROGRESS:
  - PIC, CCIN2P3, FNAL

Issues
- At sites:
  - No blocking technical issues
  - Not stress-tested yet: challenge in 2014?
- In CMS software:
  - Minor update needed in PhEDEx to handle disk → tape moves
  - Need to settle data location for job matching (PhEDEx node vs. SE…)
  - CMS internal, in progress

Changes in operations and procedures
- The Tier-1 disk endpoint is a centrally managed space:
  - CMS will manage subscriptions and deletions on disk
- Tape endpoint subscriptions remain subject to approval by the Tier-1 data managers (a role held by site-local colleagues)
- CMS would like to auto-approve disk subscription and deletion requests in order to reduce latencies

Changes in operations and procedures
- Tape families:
  - Together with the Tier-1 sites, CMS optimized the placement of files on tape for reading by requesting tape families
  - In the old model, tape family requests had to be made before processing started, which could lead to complications if they were forgotten
  - The new model allows processing on disk endpoints without the need for tape families
    - A PhEDEx subscription archives the output to tape and needs to be approved by the site-local data manager
    - Tape family requests by CMS are no longer needed; sites can create tape families before approving archival PhEDEx subscriptions
  - CMS is happy to work with the sites to optimize the rules for tape family creation
  - CMS would like to evolve the tape family procedure from requests for individual families to a dialogue with the sites that defines tape family setups and rules

Changes in site readiness
- Site readiness metrics for Tier-1s will evolve to take the separated disk and tape PhEDEx endpoints into account:
  - SAM tests only on CEs close to the disk endpoint
  - SAM tests for SRM on both the disk and the tape endpoints
- More links to monitor:
  - disk ↔ WAN
  - tape ↔ WAN
  - disk ↔ tape

Conclusions
- Hosting Tier-1 data on disk will increase flexibility in all computing workflows
- Technical solutions identified for all sites
- Deployment in progress with no blocking issues; completion at all sites expected by the beginning of 2014
- For more details: