CMS massive transfers
Artem Trunov
CMS site roles

Tier0
- Initial reconstruction
- Archive RAW + RECO from the first reconstruction
- Analysis, detector studies, etc.

Tier1
- Archive a fraction of RAW (2nd copy)
- Subsequent reconstruction
- "Skimming" (off AOD)
- Archiving sim data produced at T2s
- Serving AOD data to other T1s and T2s
- Analysis

Tier2
- Simulation production
Data Distribution
- T0 – T1: RAW + first-pass RECO, AOD
- T1 – T1: subsequent RECO passes, AOD exchange
- T1 – T2: AOD and other data
- T2 – T1: MC upload
Storage at a typical T1 (tape)
Dave's estimate:
1027 = SIM + tapefill * (AnaStore + Nstreams * (streamRAW + NReco * (sRECO + sAOD + AnaGroupSpace))) + Tape2007
1027 = ... * (50 + 5 * (36 + 3 * ( ... ))) = ... (incomplete on the slide)

Custodial: RAW + re-Reco + MC
537 = Dave's total without Tape2007, AnaStore and AnaGroupSpace:
- SIM: 249
- RAW: 180 (of first reco & aod?)
- RECO: 3 passes x 30 = 90
- AOD: 3 passes x 6 = 18
Tape subtotal: custodial tape = 827

Tape for AOD exchange?
- AOD exchange worth of 1 year: 4 x 54 = 216
- Or current RECO exchange = 270
With AOD exchange: 1043
With RECO exchange: (not given on the slide)
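The arithmetic on this slide is easier to check when spelled out. Below is a minimal sketch of the sums, assuming all figures are in TB (the unit used elsewhere in the talk); the 827 subtotal and the exchange volumes are taken directly from the slide.

```python
# Minimal sketch of the custodial-tape arithmetic above.
# Assumption: all figures are in TB, as elsewhere in the talk.

sim  = 249          # SIM archived for T2s
raw  = 180          # RAW share (2nd copy)
reco = 3 * 30       # 3 re-reco passes x 30
aod  = 3 * 6        # 3 AOD passes x 6

# "Dave's total" without Tape2007, AnaStore and AnaGroupSpace
print(sim + raw + reco + aod)          # 537

custodial_tape = 827                   # custodial tape subtotal from the slide
aod_exchange   = 4 * 54                # one year's worth of AOD exchange = 216
reco_exchange  = 270                   # current RECO exchange

print(custodial_tape + aod_exchange)   # 1043, "with AOD exchange"
print(custodial_tape + reco_exchange)  # "with RECO exchange" (left blank on the slide)
```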
Nominal rates for a typical T1
T0 – T1 RAW: ~30 MB/s (2100 TB, 92 days of running, 260 MB/s out of CERN during run days)
T1 – T1 AOD exchange:
- In: ~50 MB/s (60 TB sample from other T1s during 14 days)
- Out: ~50 MB/s (6 TB x 7 sites for 14 days)
T1 – T2: ~100 MB/s (60 TB x 30 T2s / 7 T1s / 30 days)
T2 – T1 MC in: ~10 MB/s (250 TB for 356 days)
Summary: up to 100 MB/s incoming, always, to tape + disk; up to 150 MB/s outgoing, mostly from disk.
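For reference, a minimal sketch of how these rates follow from the quoted volumes and durations, assuming 1 TB = 10^6 MB and 86 400 s per day (the slide rounds more coarsely):

```python
# Rough derivation of the nominal rates quoted on this slide.
# Assumptions: 1 TB = 10**6 MB, 86 400 seconds per day.

def rate_mb_per_s(volume_tb, days):
    return volume_tb * 1e6 / (days * 86_400)

print(rate_mb_per_s(2100, 92))         # ~264 MB/s out of CERN during run days (slide: 260 MB/s)
print(rate_mb_per_s(60, 14))           # T1 - T1 AOD exchange in: ~50 MB/s
print(rate_mb_per_s(6 * 7, 14))        # T1 - T1 AOD exchange out: ~35 MB/s (slide quotes ~50 MB/s)
print(rate_mb_per_s(60 * 30 / 7, 30))  # T1 - T2: ~99 MB/s
print(rate_mb_per_s(250, 356))         # T2 - T1 MC upload: ~8 MB/s (slide quotes ~10 MB/s)
```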
PhEDEx: the CMS data transfer tool
- The only tool that is required to run at all sites.
  - Work in progress to ease this requirement: with SRM, fully remote operation is possible. However, local support is still needed to debug problems.
- A set of site-customizable agents that perform various transfer-related tasks:
  - download files to the site
  - produce SURLs of local files for other sites to download
  - follow migration of files to the MSS
  - stage files from the MSS
  - remove local files
- Uses a 'pull' model of transfers, i.e. transfers are initiated at the destination site by the PhEDEx instance running there.
- Uses Oracle at CERN to keep its state information.
- Can use FTS or srmcp to perform transfers, or another mechanism such as direct GridFTP, but CMS requires SRM at sites.
- One of the oldest and most stable software components of CMS.
  - Secret of success: development is carried out by the CERN and site people who are or were involved in daily operations.
- Uses someone's personal proxy certificate.
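To illustrate the 'pull' model described above, here is a hypothetical sketch (not actual PhEDEx code): a download agent at the destination site polls the central database for files subscribed to it and initiates the copies itself. The site name, SRM prefix and the `pending_transfers` helper are made up for illustration; in PhEDEx the agents, transfer backend (FTS or srmcp) and central Oracle schema are configured per site.

```python
# Hypothetical illustration of a destination-side "pull" download agent.
# NOT actual PhEDEx code; names and helpers below are placeholders.

import subprocess
import time

SITE = "T1_FR_IN2P3"                               # hypothetical site name
LOCAL_PREFIX = "srm://example-se.example.org/cms/" # hypothetical local SRM prefix

def pending_transfers(site):
    """Placeholder for a query against the central transfer database:
    returns (source_surl, logical_file_name) pairs subscribed to `site`."""
    return []

while True:
    for source_surl, lfn in pending_transfers(SITE):
        dest_surl = LOCAL_PREFIX + lfn
        # The destination initiates the copy ("pull"); srmcp could equally
        # be replaced by a submission to FTS.
        subprocess.run(["srmcp", source_surl, dest_surl], check=False)
    time.sleep(60)  # poll the central database again later
```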
Results so far – SC4 and CSA06
[Plots: transfer rates from FNAL and from CERN]
IN2P3 transfer rate – CSA06
23 MB/s average; the goal was stability.
(Plot annotation: problems with new HW.)
T1 – T1 and T1 – T2
CMS has not yet demonstrated great performance in this area. So far, tests involved simultaneous T1 and T2 transfers, and broken links with hanging transfers sometimes "jam" sites, so real performance cannot be determined. Besides, the overwhelming number of errors simply prevents debugging.
This year started with a new format of LoadTest, where dedicated transfer links are centrally activated and subscribed; this will bring a much better understanding of individual link performance.
Problems – SRM instability
A very complex system with a poor implementation.
How did we get there?
- Choice of authentication model (GSI) → GridFTP
  - But the protocol requires so many open ports and is difficult to deal with from behind NAT.
- Interoperability → SRM
  - Did not initially target the LHC; incompatible implementations, obscure semantics.
- Transfer management → FTS (developed at CERN)
  - Does not fully address bandwidth management.
Problems – FTS deployment scheme
In this scheme CERN takes advantage of managing its own traffic, both in and out. But a T1 site only manages its incoming traffic! Your outgoing traffic is not under your control!
Big issue: one has to account for a potential increase in traffic and take measures to guarantee incoming traffic:
- increase the number of servers in transfer pools
- make separate write pools (import) and read pools (export)
(Diagram: CERN traffic managed by CERN FTS; T1 traffic managed by T1 FTS; T2.)
Nor does PhEDEx manage outgoing traffic!
Throughput tests T0 – T1
When it comes to records, PhEDEx-driven transfers reached 250 MB/s from CERN into a common SC buffer in the absence of other VO transfers. The credit is shared between the Lyon storage admins, the CERN FTS people and the CMS coordination people.
Plans until Fall’07 Transfer tests inline with LCG plans
Feb-March – 65% of 2008 rates ~21MB/s April-May – same with SRM 2.2 LoadTest for testing inter-site links - permanent CSA07 – July 50MB/s from CERN to tape 50MB/s aggregate from T1s to tape 50MB/s aggregate to T1s 100MB/s aggregate to (5) T2s in 8h bursts 50MB/s from (5) T2s to tape Basically testing 2008 numbers during July