London Tier 2 Status Report, GridPP 13, Durham, 4th July 2005. Owen Maroney, David Colling.


1 London Tier 2 Status Report
GridPP 13, Durham, 4th July 2005
Owen Maroney, David Colling

2 Brunel
2 WN PBS @ LCG-2_4_0
–R-GMA and APEL installed
–RH7.3 LCFG installed
Additional farm being installed
–SL3
–Private networked WN
–16 nodes
–Expected to move into production after 2_6_0 upgrade
–Hoping to bring further resources over the summer
–Recruiting support post with RHUL (job offer made)

3 Imperial College London
Appointment of Mona Aggarwal to GridPP Hardware Support Post
52 CPU Torque HEP farm @ LCG-2_5_0
–RGMA and APEL installed
–OS RHEL 3
IC HEP participating in SC3 as the UK CMS site
–dCache SRM installed with 2.6TB storage + 6TB on order
–Another 6TB on order
Numerous power outages (scheduled and unscheduled) have caused availability problems
London e-Science Centre
–SAMGrid installed across HEP and LeSC
Certified for D0 data reprocessing
186 Job Slots
–SGE farm, 64bit RHEL
Globus-jobmanager installed
Beta version of SGE plug-in to generic information provider
Firewall issues had blocked progress but this has now been resolved. Testing will start soon.
–“Community of Interest” mailing list established for sites interested in SGE integration with LCG
–coi-sge-lcg@imperial.ac.uk
19 subscribers from sites in UK, Italy, Spain, Germany, France, Russia

4 Queen Mary
320 CPU Torque farm
–After difficulties with Fedora 2, have moved LCG WN to SL3
–Departure of key staff member just as LCG-2_4_0 released led to manpower problems
GridPP Hardware Support post filled
Guiseppe Mazza start(ed) 1st July
–RGMA and APEL installed early in June.

5 Royal Holloway
Little change: 148 CPU Torque farm
–LCG 2_4_0
–OS SL3
–RGMA installed
Problems with APEL default installation
Gatekeeper and batch server on separate nodes
Little manpower available
–Shared GridPP Hardware Support post with Brunel still in recruitment process
Job offer made?

6 University College London
UCL-HEP
20 CPU PBS farm @ LCG-2_4_0
–OS SL3
–RGMA installed
Problems with APEL default installation
Separate batch server to gatekeeper
UCL-CCC
88 CPU Torque farm @ LCG-2_4_0
–OS SL3
–RGMA and APEL installed
–Main cluster is SGE farm
Interest in putting SGE farm into LCG and integrating nodes into single farm

7 Current site status summary
Columns: Site; Service nodes; Worker nodes; Local network connectivity; Site connectivity; SRM; Days SFT failed; Days in scheduled maintenance
Brunel: RH7.3 LCG2.4.0; 1Gb; 100Mb; No; 21; 16
Imperial: RHEL3 LCG2.5.0; 1Gb; dCache; 26; 28
QMUL: SL3 LCG2.4.0; SL3 LCG2.4.0; 1Gb; 100Mb; No; 45; 12
RHUL: RHEL3 LCG2.4.0; 1Gb; No; 22; 29
UCL (HEP): SL3 LCG2.4.0; SL3 2.4.0; 1Gb; No; 9; 30
UCL (CCC): SL3 LCG2.4.0; SL3 LCG2.4.0; 1Gb; No; 129
1) Local network connectivity is that to the site SE
2) It is understood that SFT failures do not always result from site problems, but it is the best measure currently available.

8 LCG resources
Site | Estimated for LCG: total job slots | Estimated: CPU (kSI2K) | Estimated: storage (TB) | Currently delivering to LCG: total job slots | Current: CPU (kSI2K) | Current: storage (TB)
Brunel | 60 | 60 | 1 | 4 | 4 | 0.4
IC | 66 | 33 | 16 | 52 | 26 | 3
QMUL | 572 | 247 | 13.5 | 464 | 200 | 0.1
RHUL | 142 | 167 | 3.2 | 148 | 167 | 7.7
UCL | 204 | 108 | 0.8 | 186 | 98 | 0.8
Total | 1044 | 615 | 34.5 | 854 | 495 | 12
1) The estimated figures are those that were projected for LCG planning purposes: http://lcg-computing-fabric.web.cern.ch/LCG-Computing-Fabric/GDB_resource_infos/Summary_Institutes_2004_2005_v11.htm
2) Current total job slots are those reported by EGEE/LCG gstat page.
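As a rough cross-check of the resource table, the per-site figures can be summed and compared against the Total row. The sketch below assumes a particular grouping of the run-together digits in the transcript. The transcript shows only a single 60 for Brunel's estimated job slots and CPU, so the sketch treats the CPU entry as unknown and derives the value implied by the stated total. The names `SITES`, `TOTAL`, `COLUMNS` and `check_columns` are illustrative, not part of the original slides.

```python
# Per-site figures read from the flattened table; the digit grouping is an
# assumption. Order: estimated (slots, kSI2K, TB), current (slots, kSI2K, TB).
# None marks Brunel's estimated CPU, which is not separately legible.
SITES = {
    "Brunel": (60, None, 1.0, 4, 4, 0.4),
    "IC": (66, 33, 16.0, 52, 26, 3.0),
    "QMUL": (572, 247, 13.5, 464, 200, 0.1),
    "RHUL": (142, 167, 3.2, 148, 167, 7.7),
    "UCL": (204, 108, 0.8, 186, 98, 0.8),
}
TOTAL = (1044, 615, 34.5, 854, 495, 12.0)
COLUMNS = ("est_slots", "est_cpu", "est_storage",
           "cur_slots", "cur_cpu", "cur_storage")


def check_columns(sites, total):
    """Return {column: (sum_of_known_values, stated_total, implied_missing)}.

    implied_missing is the value needed to reach the stated total when
    exactly one site entry is unknown, otherwise None.
    """
    report = {}
    for i, name in enumerate(COLUMNS):
        known = [row[i] for row in sites.values() if row[i] is not None]
        col_sum = round(sum(known), 1)  # one decimal place, as in the table
        n_missing = len(sites) - len(known)
        implied = round(total[i] - col_sum, 1) if n_missing == 1 else None
        report[name] = (col_sum, total[i], implied)
    return report


report = check_columns(SITES, TOTAL)
for name, (col_sum, stated, implied) in report.items():
    print(name, col_sum, stated, implied)
```

Under this reading, every fully populated column agrees with the Total row, which is some evidence that the digit grouping is the intended one.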

9 Resources used per VO over quarter (kSI2K hours)
Columns: Site; CPU per VO (ALICE, ATLAS, BABAR, CMS, LHCB, ZEUS); Total
Brunel: 6; 149; total 155
Imperial: 19; 848; 221; 4,863; 312; total 6,263
QMUL: 41; 116; 82,697; total 82,854
RHUL: 1,124; 1,840; 79; 42,218; total 45,261
UCL: 6,982; 126; 14,115; total 21,223
Total: 1,143; 9,711; 548; 144,042; 312; total 155,756
Data taken from APEL
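To put the per-site totals in proportion, each site's share of the quarter's kSI2K hours can be computed directly. This is a minimal sketch that assumes one particular grouping of the run-together digits in the transcript (the one under which the site totals add up to the stated overall total). The names `SITE_TOTALS`, `STATED_TOTAL` and `site_shares` are illustrative.

```python
# Site totals in kSI2K hours, read from the flattened table; the digit
# grouping is an assumption, since the transcript runs the cells together.
SITE_TOTALS = {
    "Brunel": 155,
    "Imperial": 6_263,
    "QMUL": 82_854,
    "RHUL": 45_261,
    "UCL": 21_223,
}
STATED_TOTAL = 155_756


def site_shares(totals):
    """Return each site's percentage share of the summed total."""
    grand_total = sum(totals.values())
    return {site: 100.0 * value / grand_total for site, value in totals.items()}


shares = site_shares(SITE_TOTALS)
largest = max(shares, key=shares.get)

print("sum matches stated total:", sum(SITE_TOTALS.values()) == STATED_TOTAL)
for site, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{site}: {pct:.1f}%")
print("largest contributor:", largest)
```

With these figures QMUL accounts for a little over half of the total, which matches the later remark that QMUL provided the most processing power during the quarter.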

10 Expressed as a pie chart
[Pie chart: Njobs, percentage of jobs]
Number of jobs: 51,209 according to APEL

11 Site Experiences
LCG-2_4_0 was the first release with a “scheduled” release date
–Despite a slippage of 1 week in the release (and an overlap with the EGEE conference), all LT2 sites upgraded within 3 weeks
Some configuration problems for a week after
–Overall experience was better than in the past
Farms are not fully utilised
–This is true of the grid as a whole
–Will extend the range of VOs supported
Overall improvement in Scheduled Downtime (SD) compared to previous quarter.
–QMUL had manpower problems
NB: Although QMUL had the highest number of (SFT failure + SD) days, it provided the most actual processing power during the quarter!
–IC had several scheduled power outages, plus two unscheduled power failures
Caused knock-on failures for sites using BDII hosted at IC
IC installed dCache SRM in preparation for SC3
–Installation configuration not simple: default configuration was not suitable for most Tier 2 sites and changing from the default was hard
–Some security concerns: installations not Secure by Default
Coordinator: Owen leaving in two weeks
Have made an offer

