London Tier2 Status
O. van der Aa
Slide 2 LT 2 21/03/2007 London Tier2 Status
Current Resource Status
7 GOC sites using sge, pbs, pbspro:
– UCL: Central, Hep
– Imperial: Hep, LeSC, ICT
– Queen Mary
– Royal Holloway
– Brunel
Total:
– CPU: 2.6 MSI2K
– Disk: 94 TB (DPM and dCache)
Slide 3
MoU, where are we?
For the disk we are at 48% of what was promised.
But the KSI2K/TB ratio is 28!
Chart label: Sept 2007 CPU target
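The ratio quoted on this slide can be checked against the totals on slide 2. The following is a minimal sketch, assuming the 2.6 MSI2K and 94 TB totals are the inputs to the ratio and that the 48% figure refers to the same 94 TB. The function and variable names are illustrative only.

```python
def cpu_disk_ratio(cpu_ksi2k: float, disk_tb: float) -> float:
    """Return the CPU-to-disk ratio in KSI2K per TB."""
    if disk_tb <= 0:
        raise ValueError("disk_tb must be positive")
    return cpu_ksi2k / disk_tb


# Totals from the resource status slide (2.6 MSI2K = 2600 KSI2K).
CPU_KSI2K = 2600.0
DISK_TB = 94.0

ratio = cpu_disk_ratio(CPU_KSI2K, DISK_TB)

# Assumption: the 94 TB online is the 48% quoted on this slide,
# so the implied disk pledge is 94 / 0.48.
implied_disk_pledge_tb = DISK_TB / 0.48

print(f"{ratio:.0f} KSI2K/TB")             # 28 KSI2K/TB
print(f"{implied_disk_pledge_tb:.0f} TB")  # 196 TB
```

The implied pledge is a derived figure under the stated assumption, not a number taken from the MoU.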
Slide 4
London CPU Load
Usage = (APEL CPU Time) / (Potential CPU Time)
Potential CPU Time = (KSI2K Online) * (hours in a month)
Monthly potential = 1.7 MSI2K*hours
Gives a view of how well we perform with respect to CPU.
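The usage definition above can be written as a small function. This is a sketch only: the function name, the 30-day month and the example inputs are assumptions for illustration, not APEL output.

```python
def cpu_usage(apel_cpu_time: float, ksi2k_online: float, hours_in_month: float) -> float:
    """Usage = (APEL CPU time) / (potential CPU time).

    apel_cpu_time is expected in KSI2K*hours, matching the units of
    ksi2k_online * hours_in_month.
    """
    potential = ksi2k_online * hours_in_month
    if potential <= 0:
        raise ValueError("potential CPU time must be positive")
    return apel_cpu_time / potential


# Hypothetical inputs: 1000 KSI2K online for a 30-day month (720 hours),
# with 468000 KSI2K*hours accounted by APEL.
usage = cpu_usage(468_000.0, 1000.0, 720.0)
print(f"{usage:.0%}")  # 65%
```

Because the potential only counts capacity that is online, a site that is down for part of the month would need its online KSI2K adjusted before the two numbers are compared.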
Slide 5
CPU Time per VO
1) Biomed stopped in December.
2) Recovered with lhcb/cms.
– Supporting 21 VOs helps to keep your CPUs busy.
Slide 6
CPU Time: Site contributions
Slide 7
What is our contribution among the UK Tier2s?
Slide 8
New resources online
In the last quarter, both Imperial and Brunel brought new resources online:
– Imperial: 440 KSI2K, 60 TB (dCache)
– Brunel: 208 KSI2K, 6 TB (DPM). Now has a second 1Gb connection.
Slide 9
New resources to come
Imperial ICT shared resources: we will get 300 KSI2K out of it.
– It runs pbspro.
– We'll use the IC-HEP SE.
– What is currently there?
  We have one frontend with a VM to run RHEL3/i386 for the CE installation.
  All RPMs installed.
– What needs to be done?
  Accounting.
  Adapt the GIP plugins.
Slide 10
New resources to come
RHUL new cluster
– Will be located at [location missing in the transcript].
– 265 kSI2k of CPU, 126 TB of storage.
– Remotely managed, but there will be staff on site who can reboot machines and change disks.
– The existing resources will also move there.
– UL-CC is the SJ5 POP.
Slide 11
New VOs
NGS enabled at Imperial LeSC, but:
– The test suite failed on globus submission without a queue parameter.
– This does not seem to be a problem on the sge jobmanager side.
Camont and Total enabled on our RB.
– The RB has difficulty coping with the CMS production.
Slide 12
What do we need to improve?
Storage is our weak point.
– Tune the DPM installs at all London sites.
– Start with the biggest sites (QMUL).
  Install more pools to distribute the load.
  Make sure we use the latest kernels.
  Allocate individual pools for big VOs.
– Stress the SE using CMS merge jobs or the ATLAS equivalent.
Cross-site support
– Becoming more and more important. Example: helping to get ATLAS data out of UCL.
– Almost all sites agreed to give access to others.
  But the level of access is not uniform.
  Needs to be implemented.
  How do we handle tickets?
Slide 13
What to improve: better monitoring
Every site admin has too many sources of monitoring:
– SAM, Gstat, CMS Dashboard,
– GridLoad, LogWatches, Dirac Monitoring.
Need to aggregate the different sources in one place.
– Nagios is a good candidate. Possibly one instance in London.
Example (plot of #Aborted Jobs, annotated "Home dir full" and "Problem solved")
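A full home directory, the failure in the example on this slide, is the kind of condition that could be reported to Nagios. The following is a minimal sketch of a Nagios-style check, assuming the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The thresholds, message format and function names are hypothetical and not taken from any London site's configuration.

```python
import shutil

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_fraction, warn=0.80, crit=0.95):
    """Map a disk usage fraction (0..1) to a Nagios status code and message."""
    percent = used_fraction * 100
    if used_fraction >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent:.0f}% used"
    if used_fraction >= warn:
        return WARNING, f"DISK WARNING - {percent:.0f}% used"
    return OK, f"DISK OK - {percent:.0f}% used"


def check_path(path):
    """Check the filesystem holding `path` and return (status code, message)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, "DISK UNKNOWN - filesystem reports zero size"
    return classify(usage.used / usage.total)


# Illustration with a made-up usage value of 97%.
code, message = classify(0.97)
print(code, message)  # 2 DISK CRITICAL - 97% used
```

A real plugin would print the message from `check_path` for the home directory filesystem and exit with the returned code, which is how Nagios reads the status.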
Slide 14
Conclusion
CPU
– Monthly > 1 MSI2k*hours.
– Utilization around 65%.
– Will get an additional 565 KSI2K.
Disk
– We really need more focus.
– Tune our DPM setups.
– Improve our CPU/disk ratio (more disk per unit of CPU).
– Test with real cms/atlas jobs.
Availability
– Cross-site support.
– Integrate the existing monitoring tools into Nagios.
16 April: LT2 Workshop at Imperial to encourage non-HEP users on to the Grid.
Slide 15
Thanks to all of the Team
M. Aggarwal, D. Colling, A. Chamberlin, S. George, K. Georgiou, M. Green, W. Hay, P. Hobson, P. Kyberd, A. Martin, G. Mazza, D. Rand, G. Rybkine, G. Sciacca, K. Septhon, B. Waugh
Slide 16
Backup slide: LCG RB backlog
Matching too slow.
A lot of jobs waiting to be matched.
What is the cure? Move to the gLite WMS?