Project: COMP_01 R&D for ATLAS Grid computing


1 Project: COMP_01 R&D for ATLAS Grid computing
Tetsuro Mashimo, International Center for Elementary Particle Physics (ICEPP), The University of Tokyo, on behalf of the COMP_01 project team.
2016 Joint Workshop of FKPPL and TYL/FJPPL, May 18-20, Institute for Advanced Study, Seoul

2 COMP_01 “R&D for ATLAS Grid computing”
Cooperation between French and Japanese teams in R&D on ATLAS distributed computing for the LHC Run 2 era (2015~2018).
Goal: tackle important challenges of the next years: new computing model, hardware, software, and networking issues.
Partners: the International Center for Elementary Particle Physics (ICEPP), the University of Tokyo (the WLCG (Worldwide LHC Computing Grid) Japanese Tier-2 center for ATLAS) and French Tier-2 centers / the Tier-1 center in Lyon.

3 COMP_01: members
(* = leader)
French group (lab): E. Lançon* (Irfu), L. Poggioli (IN2P3), R. Vernet, M. Jouvin, S. Jézéquel, C. Biscarat, J.-P. Meyer, E. Vamvakopoulos
Japanese group (lab): T. Mashimo* (ICEPP), I. Ueda (KEK), T. Nakamura, N. Matsui, H. Sakamoto, T. Kawamoto

4 LHC Run 2 (2015~2018)
Started in 2015 with a collision energy of 13 TeV, but the goal of the year 2015 was to establish important running parameters of the LHC for Run 2.
Integrated luminosity delivered to ATLAS in 2015: ~ 4.2 fb-1.
2016: ~ 25 fb-1 for ATLAS will put more burden on computing than 2015.
Run 3 (2021 ~ ): even bigger challenges.
Cooperation must be strengthened on R&D for the basic day-to-day technical/operational aspects of the Grid.

5 COMP_01 “R&D for ATLAS Grid computing”
Cooperation between French and Japanese teams on ATLAS distributed computing has been ongoing for 10 years (previous projects “LHC_02” (2006 ~ 2012) and “LHC_07” (2013 ~ 2014)).
The ICEPP Tier-2 provides resources for ATLAS only (it is one of the largest Tier-2 centers for ATLAS) and is “associated with” the Tier-1 center in Lyon.
A main interest in the past was the efficient use of the wide area network, especially for transfers between the ICEPP Tier-2 and remote sites in Europe, etc.; e.g. the Round Trip Time (RTT) is ~ 300 msec between the ICEPP Tier-2 and the Tier-1 in Lyon.

6 Network between Lyon and Tokyo (in an earlier era)
[Diagram: route from Tokyo over SINET, GEANT and RENATER via New York to Lyon, 10 Gb/s, RTT = 300 ms; other destinations shown: ASGC (Taiwan), BNL (USA, Long Island), TRIUMF (Canada, Vancouver).]
Exploiting the bandwidth is not a trivial thing: packet loss at various places leads to lower transfer speed, there is directional asymmetry in transfer performance, performance fluctuates in time, …
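The difficulty can be made concrete with the usual bandwidth-delay-product argument: a single TCP stream is capped at roughly window_size / RTT, and random packet loss lowers the achievable rate further (a simplified Mathis et al. estimate, constant factor omitted). The sketch below is purely illustrative; the window sizes and loss rate are example values, not measurements from the project.

# Illustrative estimate of single-stream TCP throughput on a long-RTT path.
# Window sizes and loss rate below are example values, not measurements.
from math import sqrt

def window_limited_gbps(window_bytes, rtt_s):
    """Throughput cap from the bandwidth-delay product: window / RTT."""
    return window_bytes * 8 / rtt_s / 1e9

def loss_limited_gbps(mss_bytes, rtt_s, loss):
    """Simplified Mathis et al. estimate ~ MSS / (RTT * sqrt(loss))."""
    return mss_bytes * 8 / (rtt_s * sqrt(loss)) / 1e9

rtt = 0.300                                    # ~300 ms Tokyo <-> Lyon
print(window_limited_gbps(16 * 2**20, rtt))    # 16 MiB window  -> ~0.45 Gb/s
print(window_limited_gbps(256 * 2**20, rtt))   # 256 MiB window -> ~7.2 Gb/s
print(loss_limited_gbps(1460, rtt, 1e-5))      # 1e-5 loss rate -> ~0.012 Gb/s

# Filling a 10 Gb/s link at this RTT therefore needs very large windows,
# very low loss, and/or many parallel streams, as Grid transfer tools use.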

7 ATLAS Computing Model - Tiers

8 Implementation of the ATLAS computing model: tiers and clouds
Hierarchical tier organization based on the MONARC network topology; sites are grouped into clouds for organizational reasons.
Possible communications: T0-T1 and T1-T1 over the Optical Private Network; intra-cloud T1-T2 over national networks.
Restricted communications: inter-cloud T1-T2 and inter-cloud T2-T2 over the general public network.
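As a reading aid, the communication rules above can be written down explicitly. The sketch below is a toy model with illustrative site names, not the real ATLAS topology configuration:

# Toy model of the MONARC-style communication rules on this slide.
# Cloud and site names are illustrative placeholders.
TIER0 = "CERN-T0"
CLOUDS = {
    "FR": {"tier1": "CC-IN2P3-Lyon", "tier2s": ["GRIF", "TOKYO-LCG2"]},
    "DE": {"tier1": "FZK-Karlsruhe", "tier2s": ["DESY"]},
}

def tier_and_cloud(site):
    if site == TIER0:
        return 0, None
    for cloud, members in CLOUDS.items():
        if site == members["tier1"]:
            return 1, cloud
        if site in members["tier2s"]:
            return 2, cloud
    raise KeyError(site)

def allowed(src, dst):
    """True if the strict model allows direct src -> dst traffic."""
    (ts, cs), (td, cd) = tier_and_cloud(src), tier_and_cloud(dst)
    if 0 in (ts, td) and 1 in (ts, td):
        return True          # T0 <-> T1 over the Optical Private Network
    if ts == td == 1:
        return True          # T1 <-> T1
    if {ts, td} == {1, 2} and cs == cd:
        return True          # intra-cloud T1 <-> T2 over national networks
    return False             # inter-cloud T1-T2 and T2-T2 are restricted

print(allowed("CC-IN2P3-Lyon", "TOKYO-LCG2"))   # True  (same cloud)
print(allowed("DESY", "TOKYO-LCG2"))            # False (inter-cloud T2-T2)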

9 Detector Data Distribution
RAW and reconstructed data are generated at CERN (Tier-0) and dispatched to T1s in O(2-4 GB) files (with exceptions); reconstructed data are further replicated downstream to T2s of the SAME cloud.
[Diagram: Tier-0 feeding several Tier-1s, each Tier-1 feeding its Tier-2s.]

10 Data distribution after Reprocessing and Monte Carlo Reconstruction
RAW data are re-processed at T1s to produce a new version of derived data.
Derived data are replicated to the T2s of the same cloud.
Derived data are also replicated to a few other T1s (or CERN) and, from there, to the T2s of those clouds.
[Diagram: Tier-0, Tier-1s and Tier-2s exchanging O(2-4 GB) files (with exceptions).]
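The fan-out described on slides 9-10 can be summarized as a small placement sketch; the topology and the number of extra Tier-1 copies below are illustrative assumptions, not the actual ATLAS replication policy settings:

# Simplified fan-out of derived data produced at one Tier-1 (illustrative).
import random

CLOUDS = {
    "FR": {"tier1": "CC-IN2P3-Lyon", "tier2s": ["GRIF", "TOKYO-LCG2"]},
    "DE": {"tier1": "FZK-Karlsruhe", "tier2s": ["DESY", "MPPMU"]},
    "UK": {"tier1": "RAL", "tier2s": ["QMUL"]},
}

def replication_targets(producing_cloud, n_extra_t1s=1, rng=random):
    """Sites receiving a new derived-data version produced in `producing_cloud`."""
    targets = list(CLOUDS[producing_cloud]["tier2s"])      # T2s of the same cloud
    other = [c for c in CLOUDS if c != producing_cloud]
    for cloud in rng.sample(other, n_extra_t1s):           # a few other T1s ...
        targets.append(CLOUDS[cloud]["tier1"])
        targets.extend(CLOUDS[cloud]["tier2s"])            # ... then their T2s
    return targets

print(replication_targets("FR"))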

11 Monte Carlo production
Simulation (and some reconstruction) runs at T2s.
Input data hosted at T1s are transferred to (and cached at) the T2s; output data are copied and stored back at T1s.
For reconstruction, derived data are replicated to a few other T1s (or CERN) and, from there, to the T2s of those clouds.
[Diagram: INPUT flowing from Tier-1s to Tier-2s and OUTPUT flowing back.]

12 Analysis
The paradigm is “jobs go to data”, i.e. no WAN is involved:
Jobs are brokered to sites where the data have been pre-placed.
Jobs access data only from the local storage of the site where they run.
Jobs store their output in the storage of the site where they run.
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)
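A toy version of this “jobs go to data” brokerage, with a made-up replica catalogue and dataset names, could look like the sketch below (not the actual PanDA brokerage code):

# Toy "jobs go to data" brokerage; catalogue and dataset names are made up.
REPLICA_CATALOGUE = {        # dataset -> sites where it has been pre-placed
    "data15_13TeV.AOD.example": ["TOKYO-LCG2", "GRIF"],
    "mc15_13TeV.AOD.example":   ["DESY"],
}

def broker(input_dataset, site_load):
    """Send the job to the least-loaded site that already hosts its input."""
    candidates = REPLICA_CATALOGUE.get(input_dataset, [])
    if not candidates:
        raise RuntimeError("no replica available; the data must be pre-placed first")
    return min(candidates, key=lambda s: site_load.get(s, 0))

site = broker("data15_13TeV.AOD.example", {"TOKYO-LCG2": 120, "GRIF": 300})
print(site)   # -> TOKYO-LCG2: the job reads and writes local storage only, no WAN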

13 Issues - I
You need data at some T2 (normally “your” T2), but the inputs are at some other T2 in a different cloud.
Examples: outputs of analysis jobs; replication of particular samples on demand.
According to the model, the transfer should go via the Tier-1s of the two clouds.
[Diagram: Tier-2 → Tier-1 → Tier-1 → Tier-2 path.]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)

14 Issues - II
You need to process data available only at a given T1, but all sites of that cloud are very busy, so you assign jobs to some T2 of a different cloud.
According to the model, the INPUT and OUTPUT have to be routed via the Tier-1s.
[Diagram: INPUT and OUTPUT flows between Tier-1, Tier-1 and a Tier-2 of another cloud.]
(by Simone Campana, ATLAS TIM, Tokyo, May 2013)

15 Evolution of the ATLAS computing model
ATLAS decided to relax the “MONARC model”:
Allow T1-T2 and T2-T2 traffic between different clouds (thanks to the growth of network bandwidth).
Any site can exchange data with any site if the system believes it is convenient.
In the past ATLAS asked (large) T2s to be well connected to their T1 and to the T2s of their cloud.
Now ATLAS asks large T2s to be well connected to all T1s and to foresee non-negligible traffic from/to other (large) T2s.
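Relative to the strict rules, the relaxation amounts to allowing any pair of sites to exchange data directly whenever the system finds it convenient. The sketch below (same kind of illustrative toy topology as before) just counts how many direct channels open up:

# Before/after comparison of direct-transfer permissions (toy topology).
from itertools import permutations

SITES = {   # site -> (tier, cloud); placeholder topology
    "CC-IN2P3-Lyon": (1, "FR"), "TOKYO-LCG2": (2, "FR"), "GRIF": (2, "FR"),
    "FZK-Karlsruhe": (1, "DE"), "DESY": (2, "DE"),
}

def allowed_strict(a, b):
    (ta, ca), (tb, cb) = SITES[a], SITES[b]
    return (ta == tb == 1) or ({ta, tb} == {1, 2} and ca == cb)

def allowed_relaxed(a, b, convenient=lambda a, b: True):
    # Any site may talk to any site if the system believes it is convenient.
    return allowed_strict(a, b) or convenient(a, b)

pairs = list(permutations(SITES, 2))
print(sum(allowed_strict(*p) for p in pairs))    # 8 direct channels (MONARC rules)
print(sum(allowed_relaxed(*p) for p in pairs))   # 20: full mesh among these 5 sites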

16 COMP_01: R&D for ATLAS Grid computing
Networking therefore remains a very important issue, especially for a large Tier-2 like ICEPP, which now takes on various roles (some of which were previously mainly the responsibility of Tier-1s): careful monitoring is necessary, e.g. with the “perfSONAR” tool (see the sketch after this list).
Other topics addressed by the collaboration:
Use of virtual machines for operating WLCG services
Improvement of the reliability of the storage middleware
Evolution toward a new batch system
Performance of data access from analysis jobs through various protocols (HTTP, FAX (Federated ATLAS storage systems using XRootD))
Preparation of the evolution needed for Run 3
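perfSONAR publishes regular latency, loss and throughput measurements between participating sites; the query interface is not described here, so the sketch below simply assumes measurement records in a hypothetical format and flags links whose loss or RTT exceeds illustrative thresholds:

# Hypothetical post-processing of perfSONAR-style measurements.
# Record format, numbers and thresholds are assumptions for illustration only.
MEASUREMENTS = [
    {"src": "TOKYO-LCG2", "dst": "CC-IN2P3-Lyon", "loss": 0.0001, "rtt_ms": 292},
    {"src": "TOKYO-LCG2", "dst": "GRIF",          "loss": 0.004,  "rtt_ms": 300},
]

def degraded(rec, max_loss=0.001, max_rtt_ms=350):
    """Flag links with packet loss or latency above the illustrative thresholds."""
    return rec["loss"] > max_loss or rec["rtt_ms"] > max_rtt_ms

for rec in MEASUREMENTS:
    if degraded(rec):
        print(f"check link {rec['src']} -> {rec['dst']}: "
              f"loss={rec['loss']:.4f}, rtt={rec['rtt_ms']} ms")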

17 New internet connections for Tokyo Tier-2
“SINET” (Science Information Network), the Japanese academic backbone network provided by the National Institute of Informatics (NII), has been renewed: “SINET4” → “SINET5” (started in April 2016).
Backbone network inside Japan: “SINET4”: 40 Gbps + 10 Gbps → “SINET5”: 100 Gbps.
International connections: direct connection from Japan to Europe (10 Gbps x 2 via Siberia, instead of via the US as in SINET4); Round Trip Time (RTT) to Lyon: ~ 290 msec → ~ 190 msec; Japan to US: 100 Gbps + 10 Gbps.
The ICEPP Tier-2’s connection to the outside: soon 10 Gbps → 20 Gbps.
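A rough back-of-the-envelope view of what these figures imply, using only the numbers quoted on this slide and an illustrative 100 TB dataset (ideal rates, no protocol overhead):

# Back-of-the-envelope effect of the SINET5 upgrade (figures from this slide).
old_rtt, new_rtt = 0.290, 0.190      # Tokyo <-> Lyon round-trip time, seconds
old_link, new_link = 10e9, 20e9      # ICEPP uplink, bit/s

print(old_rtt / new_rtt)             # ~1.5x higher per-stream TCP throughput cap
print(new_link / old_link)           # 2x total uplink capacity

dataset_tb = 100                     # illustrative dataset size
for rate in (old_link, new_link):
    hours = dataset_tb * 1e12 * 8 / rate / 3600
    print(f"{dataset_tb} TB at {rate / 1e9:.0f} Gb/s ~ {hours:.1f} h (ideal)")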

18 WAN for TOKYO Tier-2 (“SINET4” era)
[Diagram: WAN for the Tokyo Tier-2 in the SINET4 era: links via Osaka (40 Gbps / 20 Gbps / 10 Gbps), trans-Pacific and trans-Atlantic 10 Gbps routes via LA and WIX towards Amsterdam and Geneva (dedicated line), an additional new 10 Gbps line since the end of March 2013, and peer sites ASGC, BNL, TRIUMF, NDGF, RAL, CC-IN2P3, CERN, CNAF, PIC, SARA, NIKHEF.]
LHCONE: new dedicated (virtual) network for Tier-2 centers, etc.
The “perfSONAR” tool has been put in place for network monitoring.

19 [Figure-only slide; no transcript text.]

20 Budget plan in the year 2016
Unit costs: travel 1,000 Euro (160 kYen) per trip; per-diem 235 Euro (22.7 kYen) per day.
Supported by IN2P3: 3 travels = 3,000 Euro; 15 days per-diem = 3,525 Euro.
Supported by Irfu: 2 travels = 2,000 Euro; 10 days per-diem = 2,350 Euro.
Supported by ICEPP: 3 travels = 480 kYen; 12 days per-diem = 272 kYen.
Total: 10,875 Euro + 752 kYen.
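The totals can be cross-checked from the unit costs (assuming, consistently with the 480 kYen figure, that three ICEPP-supported trips are foreseen):

# Cross-check of the 2016 budget figures quoted above.
travel_eur, perdiem_eur = 1000, 235        # per trip / per day, in Euro
travel_kyen, perdiem_kyen = 160, 22.7      # per trip / per day, in kYen

eur_total = (3 * travel_eur + 15 * perdiem_eur     # supported by IN2P3
             + 2 * travel_eur + 10 * perdiem_eur)  # supported by Irfu
kyen_total = 3 * travel_kyen + 12 * perdiem_kyen   # supported by ICEPP

print(eur_total)    # 10875 Euro
print(kyen_total)   # 752.4 -> ~752 kYen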

21 Cost of the project
No cost for hardware is needed: the project uses the existing computing facilities at the Tier-1 and Tier-2 centers in France and Japan and the existing network infrastructure provided by the NRENs and GEANT, etc.
Good communication is the key issue: e-mails and video-conferences are widely used, but face-to-face meetings are necessary, usually once per year (a small workshop), hence the cost for travel and stay.

