ATLAS DC2 seen from Prague Tier2 center - some remarks Atlas sw workshop September 2004.


1 chudoba@fzu.cz
ATLAS DC2 seen from Prague Tier2 center - some remarks
ATLAS sw workshop, September 2004

2 Hardware in Prague available for ATLAS

Golias:
  - 32 dual-CPU nodes, PIII 1.13 GHz, 1 GB RAM
  - upgraded since July: + 49 dual-CPU Xeon 3.06 GHz, 2 GB RAM (WN)
  - 3 TB disk space reserved for ATLAS
  - PBSPro batch system
  - lcgatlasprod queue reserved for atlas VO members, high priority

Skurut:
  - 16 dual-CPU nodes, PIII 700 MHz, 1 GB RAM
  - OpenPBS batch system
  - queues: lcgpbs-short, long, infinite; used mainly by ATLAS

2 independent CEs in LCG2

3 Jobs waiting for input or output replication, sometimes hanging ‘forever’

Example:

Job Id        Queue         User      Node       CPUTime   WallTime
34031.golias  lcgatlasprod  atlas001  golias30   03:09:28  43:30:39
34035.golias  lcgatlasprod  atlas002  golias03   04:17:38  43:19:18
34113.golias  lcgatlasprod  atlas002  golias10   03:00:41  41:52:11
34127.golias  lcgatlasprod  atlas001  golias11   04:19:11  41:21:46
34583.golias  lcgatlasprod  atlassgm  goliasx56  00:00:17  26:01:14
...

Not yet cured: running jobs, 20.9.2004:

Job Id        Queue         User      Node       CPUTime   WallTime
55162.golias  lcgatlasprod  atlassgm  goliasx42  00:00:03  102:19:45
58528.golias  lcgatlasprod  atlas001  golias02   11:22:40  11:33:13
58529.golias  lcgatlasprod  atlas001  golias03   00:00:16  11:33:49
...

Usually such long jobs are killed either by the administrator or by the PBS time limit.
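The hanging jobs in the listings above stand out because their CPU time is tiny compared with their wall time. A minimal sketch of how such jobs could be flagged automatically, assuming the listing is available as whitespace-separated text in the column order shown on the slide. The function names and both thresholds (6 hours of wall time, 50% CPU efficiency) are illustrative assumptions, not something the slides specify.

```python
from dataclasses import dataclass


def to_seconds(hms: str) -> int:
    """Convert 'HH:MM:SS' (hours may exceed 99) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Job:
    job_id: str
    queue: str
    user: str
    node: str
    cpu_s: int
    wall_s: int

    @property
    def efficiency(self) -> float:
        """CPU time divided by wall time (0.0 if no wall time yet)."""
        return self.cpu_s / self.wall_s if self.wall_s else 0.0


def parse_jobs(listing: str) -> list:
    """Parse job rows; the header, blank lines and '...' rows are skipped."""
    jobs = []
    for line in listing.splitlines():
        fields = line.split()
        # A job row has six fields, the last two shaped like HH:MM:SS.
        if len(fields) != 6 or fields[4].count(":") != 2 or fields[5].count(":") != 2:
            continue
        job_id, queue, user, node, cpu, wall = fields
        jobs.append(Job(job_id, queue, user, node, to_seconds(cpu), to_seconds(wall)))
    return jobs


def suspect_jobs(jobs, min_wall_hours=6.0, max_efficiency=0.5):
    """Return jobs that have run long enough but used little CPU (assumed thresholds)."""
    return [
        job for job in jobs
        if job.wall_s >= min_wall_hours * 3600 and job.efficiency <= max_efficiency
    ]


# Rows copied from the 20.9.2004 listing on the slide.
LISTING = """\
Job Id Queue User Node CPUTime WallTime
55162.golias lcgatlasprod atlassgm goliasx42 00:00:03 102:19:45
58528.golias lcgatlasprod atlas001 golias02 11:22:40 11:33:13
58529.golias lcgatlasprod atlas001 golias03 00:00:16 11:33:49
"""

if __name__ == "__main__":
    for job in suspect_jobs(parse_jobs(LISTING)):
        print(f"{job.job_id} on {job.node}: {job.efficiency:.1%} CPU efficiency")
```

With these assumed thresholds, the job on golias02 is left alone because its CPU time nearly matches its wall time, while the two jobs with only seconds of CPU time are flagged.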

4 July 1 – September 21

GOLIAS             jobs   CPU (days)   Elapsed (days)
all                4811   1653         1992
long (cpu>100s)    2377   1653         1881
short              2434   4.4          111

SKURUT             jobs   CPU (days)   Elapsed (days)
all                1446   1507         1591
long (cpu>100s)    870    1507         1554
short              576    0.2          37

number of jobs in DQ: 1349 done + 1231 failed = 2580 jobs
number of jobs in DQ: 362 done + 572 failed = 934 jobs
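The ratio of CPU days to elapsed days in the figures above gives a rough measure of how much of the wall time the jobs spent computing rather than waiting. A minimal sketch of that arithmetic, assuming the flattened numbers read as (jobs, CPU days, elapsed days) per row. The names `STATS`, `cpu_efficiency` and `efficiency_table` are illustrative and do not come from the slides.

```python
# (jobs, CPU days, elapsed days) per cluster and job class, as read from the slide.
STATS = {
    "GOLIAS": {
        "all": (4811, 1653, 1992),
        "long": (2377, 1653, 1881),
        "short": (2434, 4.4, 111),
    },
    "SKURUT": {
        "all": (1446, 1507, 1591),
        "long": (870, 1507, 1554),
        "short": (576, 0.2, 37),
    },
}


def cpu_efficiency(cpu_days: float, elapsed_days: float) -> float:
    """Fraction of elapsed time that was spent on the CPU."""
    return cpu_days / elapsed_days


def efficiency_table(stats: dict) -> dict:
    """Compute CPU efficiency for every cluster and job class."""
    return {
        cluster: {
            job_class: cpu_efficiency(cpu, elapsed)
            for job_class, (_jobs, cpu, elapsed) in classes.items()
        }
        for cluster, classes in stats.items()
    }


if __name__ == "__main__":
    for cluster, classes in efficiency_table(STATS).items():
        for job_class, eff in classes.items():
            print(f"{cluster:7s} {job_class:6s} {eff:.2f}")
```

For the "all" rows this gives roughly 0.83 for GOLIAS and 0.95 for SKURUT. The short jobs (cpu below 100 s) account for almost no CPU time on either cluster but still occupy many elapsed days.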

5 Job distribution
  - almost always not enough ATLAS jobs on GOLIAS
  - SKURUT usage much better

6 Memory usage
[Figure: memory usage of ATLAS jobs on GOLIAS, July – September (part) 2004]

7 CPU time
[Figure: CPU time plots, in hours, for PIII 1.13 GHz, Xeon 3.06 GHz and PIII 700 MHz nodes]
Queue limit: 48 hours, later changed to 72 hours

8 Miscellaneous
  - no job name in the local batch system, so jobs are difficult to identify
  - no (?) documentation on where to look for log files or which logs are relevant
  - jobs lost due to the CPU time limit, with no warning
  - jobs lost due to one misconfigured node (spotted from local logs, and by Simone too)
  - some jobs loop forever: where should this information be sent?

