Published by Kristopher Boone. Modified over 9 years ago.
1
AMOD Report
Doug Benjamin
Duke University
2
Hourly jobs running during the last week
[Plot: vertical axis from 0 to 140 K jobs]
Legend:
o Blue: MC simulation
o Yellow: data processing
o Red: user analysis
o Magenta: group production
o Grey: group analysis
3
DDM data flows during the last week
[Plots: axis labels recovered from the slide are 0 TB, 10 TB and 800 TB]
4
Notable activities
Monday: recovering from slow T0 export to RAL and TRIUMF over the weekend
 o Both switched over to the backup OPN over the weekend; the cause was never understood
 o TRIUMF: slower link; RAL: asymmetric link
Tuesday: SARA T0 export and T1 stage-from-tape issues
Wednesday: RAL unplanned power cut; CERN LSF job submission slowness
Thursday: RAL power restored, recovering from the outage; CERN LSF job submission slowness continues
Friday: CERN LSF job submission slowness
Saturday: rain, lots of it (flooding: R1, my office building, SPS; took beam offline)
5
Other notable events
ND cloud local storage problems
 o Currently trying to recover 70k files to avoid declaring them lost. Resubmitting most tasks, and Rob subscribed to the missing RAW input files.
RAL worked to recover several ATLAS pools affected by the power cut (159 files declared lost).
6
Bulk reprocessing
 o Originally planned to start with period D, then B, then A and then C
 o Instead, period D was started, then periods B, A and C, to keep jobs running in all clouds, but... this processing pattern has caused disk space problems at the Tier 1 sites
 o Stopped the early submission of periods A and C; periods D and B continue
As of Sunday: period D 98.5% done (before merge), period B 68% done
Over the weekend, disk space at the Tier 1 sites became an issue.
7
T1 data disk space
Due to low free disk space, PIC, SARA and FZK were all removed from SANTA CLAUS; 4 T1 sites are now excluded (DE, ES, NL, IT clouds).
Saturday: Stephane Jezequel triggered cleaning (Victor has been running very slowly recently).
 o Situation at FZK and SARA improved.
 o Monday (12-Nov): SARA will migrate 60 TB from scratch to data disk.
 o PIC still an issue as of Sunday night.
 o Stephane is moving MC datasets away.
8
LSF
LSF job dispatch speed caused problems all week.
[Plot: axis labels recovered from the slide are 6 K and 60 K]
9
GGUS tickets
10
Conclusions
Thanks to the experts, sites and shifts (Comp@p1, ADCOS, ADCOS expert).
Bulk reprocessing is proceeding relatively smoothly.
LSF job submission speed is causing the Tier 0 team headaches.
DATA disk space at the Tier 1 sites is an issue. It needs to be monitored so as not to affect bulk reprocessing.