
Improving CMS data transfers among its distributed Computing Facilities. N. Magini, CERN IT-ES-VOS, Geneva, Switzerland; J. Flix, Port d'Informació Científica (PIC), Barcelona, Spain; A. Sartirana, École Polytechnique, Palaiseau, France. On behalf of the CMS experiment. CHEP 2010 - International Conference on Computing in High Energy and Nuclear Physics, 19 October 2010, Academia Sinica, Taipei, Taiwan.

Outline
CMS data transfer workflows
Measuring the performance of CMS data transfers
Using data transfer statistics to improve transfer operations

CMS Computing Model
[Diagram: the CMS detector feeds the Tier-0 at CERN and the CAF; the Tier-0 is connected through the WLCG Computing Grid infrastructure to 7 Tier-1s and ~50 Tier-2s, with nominal rates of ~900 MB/s aggregate out of the Tier-0, 50-500 MB/s on the Tier-1 links and 10-20 MB/s towards the Tier-2s.]
Tier-0 (the accelerator centre): data acquisition and initial processing; long-term mass data storage; distribution of data to the Tier-1 centres.
CMS CERN Analysis Facility (CAF): latency-critical data processing, high-priority analysis.
7 Tier-1s (“online” to the DAQ): high-availability centres; custodial mass storage of a share of the data; data reconstruction and reprocessing; data skimming and selection; distribution of analysis data to the Tier-2s.
~50 Tier-2s in ~20 countries: end-user physics analyses; detector studies; Monte Carlo simulation, exported to the Tier-1s.

CMS Transfer Workflow
The CMS transfer management system, PhEDEx, drives the WLCG middleware layers: FTS, SRM and gridFTP.

CMS Transfer Workflow Transfer request is placed through PhEDEx web interface

CMS Transfer Workflow PhEDEx central agents create and distribute transfer tasks to site agents

CMS Transfer Workflow PhEDEx Download agent submits transfer batch job to FTS server

CMS Transfer Workflow FTS contacts source and destination SRMs to get transfer URLs

CMS Transfer Workflow
FTS executes the transfer as a third-party copy with gridFTP (FTP extension with GSI security and parallel streams). Other configurations are also used: srmCopy started by the FTS server, or srmCopy started directly by the PhEDEx Download agent with an SRM client.
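To make the workflow concrete, here is a minimal sketch of how a download-agent-like client could hand a batch of source/destination SURL pairs to an FTS server and poll the resulting job. It is not PhEDEx code: the endpoint URL is hypothetical, and the use of the gLite-era glite-transfer-submit / glite-transfer-status commands with a bulk-submission file is an assumption about the client tools of that period (exact option names may differ).

```python
# Illustrative sketch only: how a PhEDEx-like download agent might hand a batch
# of files to FTS. The endpoint URL is hypothetical, and the gLite-era CLI
# options (bulk submission via a file of "source destination" pairs) are an
# assumption; they may differ between FTS client versions.
import subprocess
import tempfile

FTS_ENDPOINT = "https://fts.example.org:8443/glite-data-transfer-fts/services/FileTransfer"

def submit_batch(file_pairs):
    """file_pairs: list of (source SURL, destination SURL) tuples."""
    # Write one "source destination" pair per line into a bulk-submission file.
    with tempfile.NamedTemporaryFile("w", suffix=".bulk", delete=False) as f:
        for src, dst in file_pairs:
            f.write(f"{src} {dst}\n")
        bulk = f.name
    # Submit the whole batch as a single FTS job; the server picks the channel
    # matching the source/destination endpoints and the VO of the submitter.
    out = subprocess.run(
        ["glite-transfer-submit", "-s", FTS_ENDPOINT, "-f", bulk],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()        # the FTS job identifier

def poll_job(job_id):
    """Return the overall job state reported by the server (e.g. Active, Done)."""
    out = subprocess.run(
        ["glite-transfer-status", "-s", FTS_ENDPOINT, job_id],
        capture_output=True, text=True, check=True)
    return out.stdout.strip()
```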

File Transfer Service
Provides scheduling of multiple asynchronous file transfers on CHANNELS: single-direction transfer queues between two endpoints, not tied to a physical network path. Each endpoint (source and destination) can be: a single site, e.g. CERN-RAL, IN2P3-BELGIUMULB; a group of sites (“cloud”), e.g. RALLCG2-CLOUDCMSITALY, CLOUDCMSFRANCE-RALLCG2; all sites (“star”), e.g. CNAF-STAR, STAR-FNAL.
Speaker notes: for FTS, a channel is a transfer management queue; once a job is submitted, it is assigned the most suitable channel according to its source and destination endpoints, the VO of the user, and the channel topology configured on the server. The concept of a channel is not related to a physical network path. All file transfers on the same channel are served as part of the same queue; on this queue it is possible to set shares between VOs (e.g. ATLAS gets 75%, CMS gets the rest) or priorities within a VO. Each channel can be configured to use a particular transfer method (gridFTP or srmCopy) and has its own parameters (number of concurrent files running, number of streams, TCP buffer size, etc.). It is also possible to set limits on the concurrent transfers on the same storage element at a given time.
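The channel-assignment rule described in the notes can be illustrated with a small sketch: a job is matched to the most specific channel whose endpoints cover its source and destination (dedicated site-site first, then cloud, then star). This is a simplified reconstruction of the behaviour described above, not the actual FTS code; the cloud memberships are invented examples.

```python
# Simplified illustration of FTS channel matching, based on the description
# above: prefer a dedicated site-site channel, then a cloud channel, then a
# star channel. Cloud memberships are invented examples.
CLOUDS = {"CLOUDCMSITALY": {"INFN-T2-BARI", "INFN-T2-PISA"},
          "CLOUDCMSFRANCE": {"GRIF", "IN2P3-SUBATECH"}}

CHANNELS = ["CERN-RAL", "IN2P3-BELGIUMULB",
            "RALLCG2-CLOUDCMSITALY", "CLOUDCMSFRANCE-RALLCG2",
            "CNAF-STAR", "STAR-FNAL"]

def endpoint_matches(endpoint, site):
    if endpoint == "STAR":
        return True                      # star endpoint matches any site
    if endpoint in CLOUDS:
        return site in CLOUDS[endpoint]  # cloud endpoint matches its members
    return endpoint == site              # dedicated endpoint matches one site

def specificity(endpoint):
    # dedicated site = 2, cloud = 1, star = 0
    return 0 if endpoint == "STAR" else (1 if endpoint in CLOUDS else 2)

def assign_channel(source_site, dest_site):
    """Return the most specific channel serving this source/destination pair."""
    candidates = []
    for name in CHANNELS:
        src_ep, dst_ep = name.split("-", 1)
        if endpoint_matches(src_ep, source_site) and endpoint_matches(dst_ep, dest_site):
            candidates.append((specificity(src_ep) + specificity(dst_ep), name))
    return max(candidates)[1] if candidates else None

print(assign_channel("CERN", "RAL"))            # -> CERN-RAL (dedicated channel)
print(assign_channel("CNAF", "INFN-T2-BARI"))   # -> CNAF-STAR (star fallback)
```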

FTS server deployment
At the Tier-0: a dedicated channel to each of the Tier-1s.
At each Tier-1: a dedicated channel from each of the other Tier-1s; dedicated channels to and from each of the associated Tier-2s; CLOUD and/or STAR channels to/from the other Tier-2s; STAR-T2 channels for each associated Tier-2.
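As a concrete (and entirely hypothetical) illustration of this deployment scheme, the channel list for an FTS server hosted at an invented Tier-1 "T1EXAMPLE" could be built along these lines; all site names are placeholders.

```python
# Hypothetical channel list for an FTS server hosted at a Tier-1 "T1EXAMPLE",
# following the deployment scheme above. All names are invented placeholders.
OTHER_T1S = ["T1_A", "T1_B", "T1_C"]
ASSOCIATED_T2S = ["T2_X", "T2_Y"]

channels = (
    [f"{t1}-T1EXAMPLE" for t1 in OTHER_T1S]                    # from each other Tier-1
    + [f"T1EXAMPLE-{t2}" for t2 in ASSOCIATED_T2S]             # to each associated Tier-2
    + [f"{t2}-T1EXAMPLE" for t2 in ASSOCIATED_T2S]             # from each associated Tier-2
    + ["T1EXAMPLE-CLOUDOTHERT2S", "CLOUDOTHERT2S-T1EXAMPLE"]   # cloud channels for other Tier-2s
    + [f"STAR-{t2}" for t2 in ASSOCIATED_T2S]                  # STAR-T2 for each associated Tier-2
)
print(channels)
```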

FTS channels
FTS channel configuration defines:
Transfer limits: maximum number of concurrent active transfers, to protect the network and storage; shared among VOs according to policy.
Transfer priorities: between users in the same VO on a channel.
Transfer parameters: number of parallel TCP streams, buffer size, timeouts.
Overall throughput for a link in a channel: link throughput = rate/stream * streams/file * active transfers/link.
In a dedicated channel: expect an approximately constant rate/stream up to saturation, and a fixed number of available active transfer slots per link.
In a cloud or star channel: the rate/stream can be significantly different for links in the same channel, and the available active transfer slots per link depend on the overall channel occupancy; slow links keep transfer slots busy for longer (see the sketch below).
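The throughput relation and the slot-occupancy effect can be illustrated with a small worked example; all numbers are invented, and the shared-channel part is a toy model that assumes files are dispatched to the links of the channel at the same rate.

```python
# Worked example of: link throughput = rate/stream * streams/file * active transfers/link.
# All numbers are invented for illustration.
def link_throughput(rate_per_stream_mb_s, streams_per_file, active_files):
    return rate_per_stream_mb_s * streams_per_file * active_files

# Dedicated channel: 2 MB/s per stream, 5 streams per file, 20 concurrent files.
print(link_throughput(2.0, 5, 20))   # -> 200 MB/s

# Shared channel with 20 slots and a fixed file size: in this toy model the
# share of slots each link holds is proportional to its transfer time, i.e.
# inversely proportional to its per-file rate.
file_size_mb = 2000.0
per_file_rate = {"fast_link": 2.0 * 5, "slow_link": 0.2 * 5}   # MB/s per file
transfer_time = {k: file_size_mb / v for k, v in per_file_rate.items()}
total = sum(transfer_time.values())
slots = 20
for name, t in transfer_time.items():
    share = slots * t / total
    print(name, f"~{share:.1f} slots busy on average")
# The slow link ends up holding most of the slots, throttling the fast one.
```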

Evolution of transfer workflows
The scale and complexity of CMS data transfers has been steadily increasing, thanks to a focused effort on improving transfer quality and throughput.
2007: ~300 links (T0-T1, T1-T1, and T1-“associated T2” links).

Evolution of transfer workflows
2010: ~3000 links (T0-T1, T1-T1, T1-T2 for all T2s, and T2-T2 for all T2s).
As more and more data transfer links are commissioned, the sites start competing for the same slots in the FTS channels. Making optimal use of the bandwidth requires identifying and isolating the problematic links.

FTS Monitor
The FTS server database contains detailed transfer information: a wealth of knowledge that can be used to spot issues. The information is exposed through the FTS Monitor pages.
Example view: transfer summary.

FTS Monitor
Example view: channel configuration details.

FTS Monitor
Example view: individual transfer details.

FTS monitor parser
A tool to extract data from the FTS monitors worldwide. Full statistics about the transfers are extracted daily, and summary reports are produced. Several views are available.
Global view: e.g. the transfer rate per file and per stream on all T1-T1 channels.

FTS monitor parser
Site view: e.g. the rate per stream on all CNAF-T2 and T2-CNAF channels.

FTS monitor parser
Historical view: e.g. the evolution of the rate per stream on the IN2P3-PIC channel.
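The kind of aggregation behind these views can be sketched as follows: given per-transfer records scraped from the FTS monitor pages, compute the average rate per stream for each link and day. The record fields and their names are assumptions for illustration, not the actual FTS Monitor schema.

```python
# Illustrative aggregation of per-transfer records into daily rate-per-stream
# summaries per link. Field names are assumptions, not the FTS Monitor schema.
from collections import defaultdict

def daily_rate_per_stream(transfers):
    """
    transfers: iterable of dicts with keys
      'date' (YYYY-MM-DD), 'source', 'dest', 'bytes', 'duration_s', 'streams'.
    Returns {(date, source, dest): average rate per stream in MB/s}.
    """
    sums = defaultdict(lambda: [0.0, 0])   # key -> [sum of per-stream rates, count]
    for t in transfers:
        if t["duration_s"] <= 0 or t["streams"] <= 0:
            continue                        # skip failed or zero-length records
        rate_per_stream = t["bytes"] / t["duration_s"] / t["streams"] / 1e6
        key = (t["date"], t["source"], t["dest"])
        sums[key][0] += rate_per_stream
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

# Example with two invented records:
records = [
    {"date": "2010-10-01", "source": "IN2P3", "dest": "PIC",
     "bytes": 2_000_000_000, "duration_s": 400, "streams": 5},
    {"date": "2010-10-01", "source": "IN2P3", "dest": "PIC",
     "bytes": 2_000_000_000, "duration_s": 800, "streams": 5},
]
print(daily_rate_per_stream(records))   # {('2010-10-01', 'IN2P3', 'PIC'): 0.75}
```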

FTS channel optimization
Using data extracted from FTS monitoring to improve transfer operations. Example: PIC → T2 exports. Massive PIC → T2 transfers in early October, following a processing campaign, were clogged by slow links on the PIC-STAR FTS channel; the links with a low rate-per-stream were identified.

FTS channel optimization
Example: PIC → T2 exports (continued). “Cloud” FTS channels were created for the “fast” and the “slow” links (a classification sketch is shown below). Result: improved FTS channel occupancy, an increased number of transfer attempts, and improved overall export throughput.
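The split into “fast” and “slow” cloud channels can be sketched as a simple classification on the measured rate-per-stream; the threshold and the measured numbers below are invented, and the site names are only examples.

```python
# Sketch of splitting destination Tier-2s into "fast" and "slow" cloud channels
# based on the measured rate per stream. Threshold and numbers are invented.
THRESHOLD_MB_S = 1.0   # assumed cut on rate per stream

measured = {             # destination T2 -> average rate/stream from PIC, MB/s
    "T2_ES_CIEMAT": 3.2,
    "T2_US_MIT":    2.1,
    "T2_XX_SLOW1":  0.3,
    "T2_XX_SLOW2":  0.1,
}

fast_cloud = sorted(site for site, r in measured.items() if r >= THRESHOLD_MB_S)
slow_cloud = sorted(site for site, r in measured.items() if r < THRESHOLD_MB_S)

print("PIC-CLOUDFAST members:", fast_cloud)
print("PIC-CLOUDSLOW members:", slow_cloud)
# Slow links then compete only among themselves for the CLOUDSLOW slots,
# leaving the CLOUDFAST slots free for the well-performing links.
```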

Identifying infrastructure issues
The wealth of data available makes it possible to spot potential issues in the site or network infrastructure. Example: PIC import/export asymmetry. The rate-per-stream is lower for exports than for imports on most links, and the effect does not seem to depend on distance: a potential site issue? A possible explanation is a known limitation of the kernel used on the disk servers; dedicated testing will reveal more. A sketch of such an asymmetry check is shown below.
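Such an import/export comparison can be sketched as follows: for each remote site, compare the rate-per-stream of transfers into and out of the site under study and flag strongly asymmetric links. All numbers are invented; in practice they would come from the monitor parser summaries.

```python
# Sketch of an import/export asymmetry check for one site ("PIC" here).
# All rates are invented; in practice they come from the FTS monitor parser.

imports = {"IN2P3": 2.0, "CNAF": 1.8, "FNAL": 1.5}   # remote -> PIC, MB/s per stream
exports = {"IN2P3": 0.6, "CNAF": 0.5, "FNAL": 0.4}   # PIC -> remote, MB/s per stream

for remote in sorted(set(imports) & set(exports)):
    ratio = exports[remote] / imports[remote]
    flag = "  <-- asymmetric" if ratio < 0.5 else ""
    print(f"{remote}: import {imports[remote]:.1f}, export {exports[remote]:.1f}, "
          f"export/import ratio {ratio:.2f}{flag}")
# A consistently low ratio across remote sites, independent of distance,
# points at the exporting site (e.g. disk-server tuning) rather than at the
# network paths.
```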

Expanding the scope
First results have been presented, with a lot of potential to expand the analysis:
Gather data for the FTS servers not yet included.
Identify “reference” statistics and publish the corresponding plots to monitor regularly, for central shifters and for site administrators; spot problems in sites and the network and assist site administrators with troubleshooting.
Include more statistics: distributions by file size, transfer preparation times, channel occupancy, …
Integrate with other VOs.

Summary
PhEDEx ensures reliable data transfers with FTS. The scale and complexity of CMS transfers has constantly increased over the years. The FTS Monitor offers detailed information on the transfers. Extracting and analyzing the transfer statistics provides useful insight to improve transfer operations.