1 BNL Service Challenge 3 Site Report
Xin Zhao, Zhenping Liu, Wensheng Deng, Razvan Popescu, Dantong Yu and Bruce Gibbard
USATLAS Computing Facility, Brookhaven National Lab

2 Services at BNL
- FTS (version 2.3.1) client + server, with its backend Oracle and MyProxy servers.
  - FTS handles reliable file transfer from CERN to BNL.
  - Most functionalities were implemented. It became reliable in controlling data transfers after several rounds of redeployment for bug fixing: a short timeout value caused excessive failures, and there was an incompatibility with dCache/SRM.
  - It does not support DIRECT data transfer from CERN to the BNL dCache data pool servers (dCache SRM third-party data transfer). Transfers actually go through a few dCache GridFTP door nodes at BNL, which presents a scalability issue; we had to move these door nodes to non-blocking network ports to distribute the traffic.
  - Both BNL and RAL discovered that the number of streams per file could not be more than 10 (possibly a bug).
- Networking to CERN:
  - The network for dCache was upgraded to 2x1 Gbps around June.
  - Shared link with a long round-trip time: >140 ms, while the RTT from European sites to CERN is about 20 ms.
  - Occasional packet losses were observed along the BNL-CERN path.
  - 1.5 Gbps aggregated bandwidth observed by iperf with 160 TCP streams (see the sketch below for why so many streams are needed).
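A rough way to see why so many parallel TCP streams are needed on this path is the bandwidth-delay product: with an untuned TCP window a single stream over a 140 ms RTT tops out at a few Mbit/s. The sketch below uses the RTT, stream count, and observed aggregate rate quoted on this slide; the 64 KB default window is an assumption for illustration, not a measured value.

```python
# Back-of-the-envelope bandwidth-delay-product numbers for the BNL-CERN path.
# The RTT, stream count, and aggregate rate come from the slide; the 64 KB
# default TCP window is an assumed value, not a measurement.
RTT_S = 0.140                  # round-trip time: >140 ms
DEFAULT_WINDOW = 64 * 1024     # assumed untuned per-stream TCP window (bytes)
STREAMS = 160                  # iperf streams used in the test
OBSERVED_BPS = 1.5e9           # 1.5 Gbps aggregate observed with iperf

# A single TCP stream cannot exceed window / RTT.
per_stream_bps = DEFAULT_WINDOW * 8 / RTT_S
print(f"untuned per-stream ceiling: {per_stream_bps / 1e6:.1f} Mbit/s")

# Window each of the 160 streams must sustain to reach the observed 1.5 Gbps.
needed_window = (OBSERVED_BPS / STREAMS) / 8 * RTT_S
print(f"per-stream window implied by 1.5 Gbps over 160 streams: "
      f"{needed_window / 1024:.0f} KB")
```

With these numbers an untuned stream is limited to roughly 3.7 Mbit/s, so reaching 1.5 Gbps with 160 streams implies each stream was effectively running with a window of about 160 KB.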

3 Services at BNL
- dCache/SRM (v1.6.5-2, with the SRM 1.1 interface; 332 nodes in total (3.06 GHz, 2 GB memory and three 140 GB SCSI drives each) with about 170 TB of disk; multiple GridFTP, SRM, and dCap doors): the USATLAS production dCache system.
  - All nodes run Scientific Linux 3 with the XFS module compiled.
  - We experienced high load on the write pool servers during large data transfers; this was fixed by replacing the EXT file systems with XFS.
  - The core server crashed once; the reason was identified and fixed.
  - Small buffer space (1.0 TB) for data written into the dCache system.
  - dCache can now deliver up to 200 MB/second for input/output (limited by network speed).
- LFC (1.3.4) client and server installed as the BNL replica catalog server.
  - The server was installed and the basic functionalities (lfc-ls, lfc-mkdir, etc.) were tested; a minimal check is sketched below.
  - We will populate LFC with the entries from our production Globus RLS server.
- The ATLAS VO Box (DDM + LCG VO box) was deployed at BNL.
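A minimal smoke test of the basic LFC operations mentioned above could look like the sketch below, driving the same lfc-mkdir and lfc-ls command-line tools named on the slide. The catalog host name and the namespace path are placeholders for illustration, not the actual BNL configuration.

```python
import os
import subprocess

# Hypothetical LFC endpoint and namespace path; substitute the real values.
os.environ["LFC_HOST"] = "lfc.example.bnl.gov"
TEST_DIR = "/grid/atlas/sc3-smoke-test"

def run(cmd):
    """Run an LFC command-line tool and report success or failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    status = "OK" if result.returncode == 0 else f"FAILED: {result.stderr.strip()}"
    print(f"{' '.join(cmd)} -> {status}")
    return result.returncode == 0

# Exercise the basic functionality noted on the slide: mkdir, then list it.
run(["lfc-mkdir", TEST_DIR])
run(["lfc-ls", "-l", TEST_DIR])
```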

4 BNL dCache Configuration (diagram): dCap, SRM, and GridFTP doors provide the control channel into the dCache system (PnfsManager and PoolManager); read and write pools carry the data channel between dCap, GridFTP, and SRM clients, the Oak Ridge batch system, and the HPSS tape back end.

5 CERN Storage System

6 Data Transfer from CERN to BNL (ATLAS Tier 1)

7 Transfer Plots (Castor2 LSF plugin problem)

8 BNL SC3 data transfer (monitored at BNL): all data are actually routed through the GridFTP doors.

9 Data Transfer Status
- BNL stabilized FTS data transfer with a high successful completion rate, as shown in the left image.
- We attained a 150 MB/second rate for about one hour with a large number (>50) of parallel file transfers. CERN FTS limited each channel to 50 files, which is not enough to fill up the CERN-to-BNL channel; a rough estimate of the concurrency needed is sketched below.
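As a rough consistency check of the 50-files-per-channel limit, one can divide the observed aggregate rate by the number of concurrent files to get a per-file rate, then ask how many files would be needed to saturate the 2x1 Gbps link. This is a back-of-the-envelope sketch using only the figures on this slide; it ignores SRM negotiation overhead and per-transfer ramp-up.

```python
# Estimate how many concurrent FTS file transfers are needed to fill the link.
# All inputs come from the slide; the result is only an order-of-magnitude guide.
observed_rate_mb_s = 150           # aggregate rate sustained for ~1 hour
concurrent_files = 50              # FTS per-channel file limit at CERN
link_capacity_mb_s = 2 * 1000 / 8  # 2 x 1 Gbps expressed in MB/s (= 250 MB/s)

per_file_mb_s = observed_rate_mb_s / concurrent_files
files_to_fill_link = link_capacity_mb_s / per_file_mb_s

print(f"per-file rate: {per_file_mb_s:.1f} MB/s")
print(f"concurrent files needed for {link_capacity_mb_s:.0f} MB/s: "
      f"{files_to_fill_link:.0f}")
```

At roughly 3 MB/s per file, on the order of 80 concurrent transfers would be needed to fill the channel, well above the 50-file limit.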

10 Final Data Transfer Reports

11 Lessons Learned From SC2
- Four file transfer servers with a 1 Gigabit WAN network connection to CERN.
- Met the performance/throughput challenge (70-80 MB/second disk to disk).
- Enabled data transfer between dCache/SRM and the CERN SRM at openlab.
  - Designed our own script to control SRM data transfer.
- Enabled data transfer between BNL GridFTP servers and CERN openlab GridFTP servers, controlled by the Radiant software.
- Many components need to be tuned:
  - 250 ms RTT and a high packet drop rate; we had to use multiple TCP streams and multiple concurrent file transfers to fill the network pipe.
  - Sluggish parallel file I/O with EXT2/EXT3: many processes stuck in the "D" (uninterruptible sleep) state, and the more concurrent file streams, the worse the file system performance (a simple way to reproduce this is sketched below).
  - Slight improvement with XFS; file system parameters still need tuning.
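The parallel-write slowdown described above can be reproduced with a simple benchmark that writes several large files concurrently and reports the aggregate rate; running it against an EXT3 mount and an XFS mount shows the difference. This is a generic sketch, not the script used during SC2, and the stream count, file size, and target directory are arbitrary illustration values.

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Illustrative parameters; point TARGET_DIR at the mount under test.
TARGET_DIR = sys.argv[1] if len(sys.argv) > 1 else "/tmp"
STREAMS = 8                    # number of concurrent writers
FILE_MB = 256                  # size of each file in MB
CHUNK = b"\0" * (1024 * 1024)  # 1 MB write unit

def write_one(i):
    """Write one large file sequentially, fsync it, then remove it."""
    path = os.path.join(TARGET_DIR, f"bench_{i}.dat")
    with open(path, "wb") as f:
        for _ in range(FILE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())
    os.remove(path)

start = time.time()
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    list(pool.map(write_one, range(STREAMS)))
elapsed = time.time() - start

total_mb = STREAMS * FILE_MB
print(f"{STREAMS} streams wrote {total_mb} MB in {elapsed:.1f} s "
      f"({total_mb / elapsed:.1f} MB/s aggregate)")
```

Raising STREAMS while watching per-stream throughput (and the number of processes in the "D" state in top) reproduces the degradation the slide describes on EXT2/EXT3.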

12 Some Issues
- The Service Challenge also challenges resources:
  - Tuned network pipes and optimized the configuration and performance of the BNL production dCache system and its associated OS and file systems.
  - Required more than one staff member's involvement to stabilize the newly deployed FTS, dCache, and network infrastructure.
  - The staffing level decreased as the services became stable.
- Limited resources are shared by experiments and users.
  - At CERN, the SC3 infrastructure is shared by multiple Tier 1 sites.
  - Due to the heterogeneous nature of the Tier 1 sites, data transfer for each site should be optimized non-uniformly, based on each site's characteristics: network RTT, packet loss rate, experiment requirements, etc.
  - At BNL, the network and dCache are also used by production users.
  - We need to closely monitor SRM and the network to avoid impacting production activities (a minimal door-availability check is sketched below).
- At CERN, James Casey alone handles answering email, setting up the system, reporting problems, and running data transfers. He provides 7/16 support by himself.
  - How do we scale to 7/24 production support and a production center?
  - How do we handle the time difference between the US and CERN?
  - CERN support phone (tried once, but the operator did not speak English).
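Close monitoring of the SRM and GridFTP doors can start with something as simple as a periodic TCP connectivity check against each door. The host names below are placeholders, and the ports (2811 for GridFTP, 8443 for SRM) are commonly used defaults assumed for illustration rather than taken from the BNL configuration.

```python
import socket

# Placeholder door list; replace with the real BNL door nodes and ports.
DOORS = [
    ("gridftp-door1.example.bnl.gov", 2811),  # GridFTP control port (assumed)
    ("gridftp-door2.example.bnl.gov", 2811),
    ("srm-door.example.bnl.gov", 8443),       # typical dCache SRM port (assumed)
]

def door_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to the door succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in DOORS:
    state = "UP" if door_is_up(host, port) else "DOWN"
    print(f"{host}:{port} {state}")
```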

13 What has been done
- SC3 Tier 2 data transfer: data were transferred to three selected Tier 2 sites.
- SC3 tape transfer: tape data transfer was stabilized at 60 MB/second with loaned tape resources.
- Met the goal defined at the beginning of the Service Challenge.
- The full chain of data transfer was exercised.

14 ATLAS SC3 Service Phase

15 ATLAS SC3 Service Phase goals
- Exercise the ATLAS data flow.
- Integrate the data flow with the ATLAS Production System.
- Tier-0 exercise.
- More information: https://uimon.cern.ch/twiki/bin/view/Atlas/DDMSc3

16 ATLAS-SC3 Tier-0
- Quasi-RAW data are generated at CERN and reconstruction jobs run at CERN.
- No data are transferred from the pit to the computer centre.
- "Raw data" and the reconstructed ESD and AOD data are replicated to Tier 1 sites using agents on the VO Boxes at each site.
- Exercising the use of:
  - CERN infrastructure: Castor 2, LSF
  - the LCG Grid middleware: FTS, LFC, VO Boxes
  - the ATLAS Distributed Data Management (DDM) software

17 ATLAS Tier-0 data flow (diagram: EF to CASTOR to the CPU farm and out to the Tier 1 sites), with the following nominal rates per data type (the arithmetic is checked in the snippet below):
- RAW: 1.6 GB/file, 0.2 Hz, 17K files/day, 320 MB/s, 27 TB/day
- ESD: 0.5 GB/file, 0.2 Hz, 17K files/day, 100 MB/s, 8 TB/day
- AOD: 10 MB/file, 2 Hz, 170K files/day, 20 MB/s, 1.6 TB/day
- AODm: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
The diagram arrows carry RAW, ESD (2x), and AODm (10x) copies out to the Tier 1 sites, with aggregate flows of 0.44 Hz / 37K files/day / 440 MB/s; 1 Hz / 85K files/day / 720 MB/s; 0.4 Hz / 190K files/day / 340 MB/s; and 2.24 Hz / 170K files/day (temporary), 20K files/day (permanent) / 140 MB/s.
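The per-data-type rates above are internally consistent: the MB/s figure is just file size times event rate, and the files/day and TB/day figures follow from multiplying by 86,400 seconds. The short check below recomputes them from the file sizes and rates on the slide.

```python
# Recompute the Tier-0 per-data-type rates from file size (MB) and rate (Hz).
DATA_TYPES = {
    "RAW":  (1600.0, 0.20),
    "ESD":  (500.0,  0.20),
    "AOD":  (10.0,   2.00),
    "AODm": (500.0,  0.04),
}
SECONDS_PER_DAY = 86_400

for name, (size_mb, rate_hz) in DATA_TYPES.items():
    mb_per_s = size_mb * rate_hz
    files_per_day = rate_hz * SECONDS_PER_DAY
    tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6
    print(f"{name:5s} {mb_per_s:6.0f} MB/s  {files_per_day:8.0f} files/day  "
          f"{tb_per_day:4.1f} TB/day")
```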

18 ATLAS-SC3 Tier-0
- The main goal is a 10% exercise: reconstruct "10%" of the number of events ATLAS will get in 2007, using "10%" of the full resources that will be needed at that time.
- Tier-0:
  - ~300 kSI2k
  - "EF" to CASTOR: 32 MB/s
  - Disk to tape: 44 MB/s (32 for RAW and 12 for ESD+AOD)
  - Disk to WN: 34 MB/s
  - T0 to each T1: 72 MB/s
  - 3.8 TB to "tape" per day
- Tier-1 (on average):
  - ~8,500 files per day
  - at a rate of ~72 MB/s
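Two of the figures above follow directly from the others: 44 MB/s to tape sustained over a day gives the quoted 3.8 TB, and 72 MB/s to each Tier-1 corresponds to roughly 6 TB per day spread over the ~8,500 files. The snippet below is just that arithmetic; the implied mean file size is a derived figure, not one stated on the slide.

```python
SECONDS_PER_DAY = 86_400

tape_mb_s = 44           # disk-to-tape rate from the slide
t1_mb_s = 72             # rate to each Tier-1 from the slide
t1_files_per_day = 8500  # average files per day per Tier-1

tape_tb_per_day = tape_mb_s * SECONDS_PER_DAY / 1e6
t1_tb_per_day = t1_mb_s * SECONDS_PER_DAY / 1e6
avg_file_gb = t1_tb_per_day * 1000 / t1_files_per_day  # derived, not on slide

print(f"to tape:    {tape_tb_per_day:.1f} TB/day")   # ~3.8 TB/day
print(f"to each T1: {t1_tb_per_day:.1f} TB/day")     # ~6.2 TB/day
print(f"implied mean file size: {avg_file_gb:.2f} GB")
```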

19 ATLAS DDM Monitoring: 24 hours before the 4-day intervention (29/10 - 1/11). We achieved quite a good rate in the testing phase, sustaining 20-30 MB/s to three sites (PIC, BNL and CNAF).

20 Data Distribution
- Used a generated "dataset" containing 6,035 files (3 TB) and tried to replicate it to BNL, CNAF, and PIC.
- BNL: data transfer is under way.
- PIC: 3,600 files copied and registered
  - 2,195 'failed replication' after 5 retries by us x 3 FTS retries (problem under investigation)
  - 205 'assigned', still waiting to be copied
  - 31 'validation failed' since the SE is down
  - 4 'no replicas found' (LFC connection error)
- CNAF: 5,932 files copied and registered
  - 89 'failed replication'
  - 14 'no replicas found'
In both cases the per-state counts sum to the 6,035 files in the dataset; see the check below.
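The per-state file counts reported for PIC and CNAF each account for the full 6,035-file dataset, which is a useful sanity check when reading these status reports. The snippet below just totals the counts taken from the slide.

```python
DATASET_FILES = 6035

site_states = {
    "PIC":  {"copied and registered": 3600, "failed replication": 2195,
             "assigned": 205, "validation failed": 31, "no replicas found": 4},
    "CNAF": {"copied and registered": 5932, "failed replication": 89,
             "no replicas found": 14},
}

for site, states in site_states.items():
    total = sum(states.values())
    ok = "matches" if total == DATASET_FILES else "does NOT match"
    print(f"{site}: {total} files accounted for ({ok} the "
          f"{DATASET_FILES}-file dataset)")
```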

21 General view of SC3
- When everything is running smoothly, ATLAS gets good results.
- The middleware (FTS) is stable, but there are still lots of compatibility issues:
  - FTS does not work with the new version of dCache/SRM (version 1.3).
  - ATLAS DDM software dependencies can also cause problems when sites upgrade middleware.
  - We have not managed to exhaust anything (production s/w; LCG m/w).
- Still far from concluding the exercise, and not running stably in any way.
- The exercise will continue, adding new sites.

22 BNL Service Challenge 4 Plan
- Several steps are needed to set up each piece of hardware or service (e.g. choose, procure, start install, end install, make operational):
  - LAN, tape system, computing farm, disk storage
  - dCache/SRM, FTS, LFC, DDM
- Continue to maintain and support the services under the defined SLA (Service Level Agreement).
- December 2005: begin installation of the expanded LAN and the new tape system, and make the new installation operational.
- January 2006: begin data transfer with the newly upgraded infrastructure (target rate 200 MB/second) and deploy all required baseline software.

23 BNL Service Challenge 4 Plan
- April 2006: establish stable data transfer at 200 MB/second to disk and 200 MB/second to tape.
- May 2006: disk and computing farm upgrades.
- June 1, 2006: stable data transfer driven by the ATLAS production system and ATLAS data management infrastructure between T0 and T1 (200 MB/second), and provide services that satisfy the SLA (Service Level Agreement).
- Details of involving the Tier 2 sites are being planned as well.

