Northgrid Status
  Alessandra Forti
  GridPP20, Dublin, 12 March 2008
Layout
  General status
  Manpower
  Other VOs
  Atlas shifts
  Sites news
  Conclusions
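General Status (1)
  Lancaster: middleware gLite 3.1; OS SL4; SRM 2.2 / space tokens: yes; SRM brand: dCache -> DPM; average availability: 82%
  Liverpool: middleware gLite 3.1; OS SL4; SRM 2.2: installing; space tokens: working on it; SRM brand: dCache; average availability: 80%
  Manchester: middleware gLite 3.1; OS: upgrading; SRM 2.2: installing; space tokens: working on it; SRM brand: dCache; average availability: 83%
  Sheffield: middleware gLite 3.1; OS: upgrading; SRM 2.2 / space tokens: yes; SRM brand: DPM; average availability: 90%
  The CPU (kSI2K), Storage (TB) and Used Storage (TB) columns have no values in this transcript.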
General Status (2)
General Status (3)
Manpower
  Lancaster:
    –Brian Davies
    –Matt Doidge, Peter Love
  Liverpool:
    –Pawel Trepka
    –Rob Fay, John Bland
  Manchester:
    –Colin Morey
    –Owen McShane, Stuart Wild, Sergey Dolgodobrov
  Sheffield:
    –Dominic Wilson
    –Elena Korolkova, Matt Robinson
Other VOs
  A Northgrid VO has been created for VO-less users and is being installed.
    –Some users have already subscribed to it
    –Users now in gridpp will be moved to northgrid
  Other VOs are running on our systems
    –~24 enabled across all sites
    –hone, dzero and biomed lead the CPU usage of other VOs
Atlas Shifts
  Northgrid is among the biggest suppliers of shifters: 5 people from all sites already involved
    –Carl Gwilliams: expert shifter
    –Peter Love and Alessandra Forti: senior shifters
    –Mark Hodgkinson, Paul Hogson: trainees
  Benefits are evident: site managers get an inside perspective on Atlas problems, and Atlas benefits from the feedback of sysadmin shifters.
Lancaster news
  UKLight link to RAL had problems, which affected Atlas upload into RAL (cause: bad 10G card with core Ciena kit in Reading)
    –http://www.gridpp.rl.ac.uk/blog/2008/02/29/a-week-of-woe-on-the-lancaster-link/
  Power cut toasted dCache system disk
    –Forced a fresh install and upgrade to SL4
    –dCache install not smooth
  Migrating from dCache to DPM
    –DPM installation trivial, up and running with no problems
    –Atlas production now on DPM
    –space tokens in place
    –data migration underway
  This weekend, FTS problems from RAL (diagnosis ongoing)
    –Active transfers still not normal: New data centre, hoarding going up on site
Liverpool News
  Cluster upgraded to SL4
  Working on the dCache upgrade and enabling Space Tokens.
    –dCache installation not smooth
  Installed a new, more powerful CE and SE
  Upgraded the rack software servers to 250GB RAID1 to cope with the >100GB size of the ATLAS code.
  Still testing Puppet as preferred fabric management solution.
Manchester News
  Manchester setup has been completely reorganised
    –cfengine configuration rewritten according to tasks rather than host type
    –All the quick and dirty extra steps have been cleaned up and are now handled by cfengine
    –Test and trash machines have been reorganised, and installation doesn't require any special handling inside or outside cfengine.
    –All the certificates have been renewed in one go thanks to the new bulk request/renewal script
    –Still in the process of upgrading the first cluster, as dCache proved to be more complicated than it should be. The WN+pools and CEs are ready to go though. Plan to go ahead and deal with the dCache head node more slowly
  Tickets from GGUS a sorer point than ever
    –GGUS opens a ticket in RT at each reply…
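The idea of organising configuration by task rather than by host type can be sketched as below. This is a minimal Python illustration, not cfengine syntax and not Manchester's actual setup; every host, task and action name is hypothetical.

```python
# Hypothetical task and action names, for illustration only.
TASK_ACTIONS = {
    "worker_node": ["install_wn_packages", "configure_batch_client"],
    "storage_pool": ["mount_pool_disks", "configure_pool_service"],
    "compute_element": ["install_ce_packages", "configure_batch_server"],
}

# Each host lists the tasks it performs instead of having a single "type".
HOST_TASKS = {
    "node001": ["worker_node", "storage_pool"],  # a WN that also hosts a pool
    "ce01": ["compute_element"],
}


def actions_for(host):
    """Collect a host's actions from its tasks, in order, without duplicates."""
    actions = []
    for task in HOST_TASKS.get(host, []):
        for action in TASK_ACTIONS[task]:
            if action not in actions:
                actions.append(action)
    return actions
```

With this layout a combined WN+pool machine needs no special case: it simply carries two tasks, and a host with no tasks gets no actions.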
Sheffield news
  Benefited from the staff change
    –Elena is located in the physics department.
  Upgraded to SRM 2.2
  Enabled Space Tokens for Atlas
  Added 2.5 TB of storage
  Problems with APEL accounting due to APEL using the wrong batch system
  Problems with biomed jobs hanging for >70 because there is no timeout when a remote server doesn't reply.
    –Still handled with manual monitoring
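The hang described for the biomed jobs is the usual result of a client waiting indefinitely on a silent server. A minimal Python sketch of a client-side timeout follows. It is a generic illustration only: the biomed software itself is not described in the slides, and the function name and default values are assumptions.

```python
import socket


def read_reply(host, port, timeout_s=5.0, max_bytes=4096):
    """Return the server's first reply, or None if it stays silent too long."""
    try:
        # The timeout given to create_connection() stays set on the socket,
        # so it bounds both the connect and the recv() below.
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            return conn.recv(max_bytes)
    except socket.timeout:
        # No answer within timeout_s: give up instead of blocking forever.
        return None
```

With a bound like this, a job waiting on an unresponsive server fails fast and can be retried or reported, rather than occupying a batch slot until someone notices.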
Conclusions
  Northgrid is in a healthy state
  Upgrades to SL4 and SRM 2.2, and the enabling of space tokens, are under way.
    –We should make the deadlines
  The main problems at the moment
    –sysadmin turnover
    –dCache installation/upgrade and setup is not smooth
  Well integrated with Atlas and good exploitation by other user communities.