1 London Tier 2 Status Report. GridPP 11, Liverpool, 15 September 2004. Ben Waugh on behalf of Owen Maroney

2 LT2 Sites
Brunel University
Imperial College London
–(including London e-Science Centre)
Queen Mary University of London
Royal Holloway University of London
University College London

3 LT2 Management
Internal LT2 MoU signed by all institutes
MoU with GridPP signed by David Colling as acting chair of the Management Board
Management Board being formed but has not yet met
Technical Board meets every three to four weeks

4 Contribution to LCG2
'Snapshot' taken on 26 August: number of WN CPUs in use by LCG.

Site       WNs    Jobs Running    Jobs Waiting
IC          66         66              21
QMUL*      320        287              43
RHUL       148        148              70
UCL-HEP     20         21              40
UCL-CCC     88        176              23
Total      646        698             197

*QMUL have since turned on hyperthreading and now allow up to 576 jobs.
Brunel joined LCG2 on 3 September.

5 Brunel
Test system (1 WN): PBS farm @ LCG-2_2_0
Joined Testzone on 3 September
–Completely LCFG installed
In process of adding 60 WNs
–LCFG installation
–Private network
–Some problems with SCSI drives and network booting with LCFG
Have had problems with local firewall restrictions
–These now seem to be resolved
–GLOBUS_TCP_PORT_RANGE is not the default range (see the sketch below)
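A minimal sketch of the kind of fix involved, with illustrative values rather than Brunel's actual settings: GLOBUS_TCP_PORT_RANGE tells Globus which inbound TCP ports its callbacks and GridFTP data channels may use, and the site firewall must allow the same range.

    # Hedged sketch: the port range below is a placeholder, not Brunel's
    # real configuration. Globus advertises this range for callbacks and
    # data channels...
    export GLOBUS_TCP_PORT_RANGE="20000,25000"
    # ...so the firewall must accept inbound TCP on the matching ports.
    iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT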

6 Imperial College London
66 CPU PBS HEP farm @ LCG-2_1_1
Joined LCG2 prior to 1 April
–Completely LCFG installed
–In core zone
–Early adopter of R-GMA
London e-Science Centre has a 900 CPU cluster
–Cluster runs a locally patched RH7.2 version
–Shared facility: no possibility of changing the operating system
–Could not install LCG2 on RH7.2
–Will run RH7.3 under User Mode Linux to install LCG2 (see the sketch below)
–Batch system (Sun Grid Engine) is not currently supported by LCG
–LeSC have already provided a globus-jobmanager for SGE
–Work is in progress on updating this for the LCG jobmanager
–Information provider is being developed by Durham
–Interest in SGE from other sites
LeSC cluster will soon be SAMGrid enabled
–LCG2 to follow
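A minimal sketch of the User Mode Linux approach, assuming illustrative file names and sizes rather than LeSC's actual setup: a UML guest is an ordinary user process on the RH7.2 host that boots its own kernel against a root filesystem image, so an RH7.3 image can host LCG2 without touching the shared host OS.

    # Hedged sketch: image name and memory size are assumptions.
    # 'linux' is the UML guest kernel binary; ubd0 points it at an
    # RH7.3 root filesystem image, booted entirely in user space.
    ./linux ubd0=rh73_root_fs.img mem=256M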

7 Queen Mary
348 CPU Torque+Maui farm @ LCG-2_1_0
Joined Testzone on 6 July
Private networked WNs
Existing Torque server
–"Manual" installation on WNs (local automated procedure)
–LCFG installed CE and SE
–Configure CE to be a client to the Torque server (see the sketch below)
OS is Fedora 2
–Only site in LCG2 not running RH7.3!
Recently turned on hyperthreading
–Offers 576 job slots
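A minimal sketch of what "CE as client" means in Torque terms, with placeholder names: the CE runs only the batch client tools, pointed at the existing server via the server_name file (whose exact path depends on how Torque was built).

    # Hedged sketch: hostname and spool path are placeholders, not QMUL's.
    # Point the CE's Torque client tools at the existing batch server:
    echo "torque-server.example.ac.uk" > /var/spool/pbs/server_name
    # If the CE can reach the server, the farm's queues should now appear:
    qstat -q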

8 Queen Mary Fedora 2 Port
CE and SE are LCFG installed RH7.3
LCG-2_0_0 was installed on the Fedora 2 WNs:
–tar up the /opt directory from an LCFG installed RH7.3 node
–untar it on the Fedora 2 WN
–Only needed to recompile the Globus toolkit
–Also needed the jar files in /usr/share/java
–Everything worked! (A sketch of the procedure follows below.)
For LCG-2_1_0 this method failed!
–The upgraded edg-rm functions no longer worked
–Recompiling the Globus toolkit did not help
–LCG could not provide SRPMs for edg-rm
Current status:
–LCG-2_1_0 on SE and CE, with LCG-2_0_0 on the WNs
–Seems to work, but is clearly not ideal…
With the LCG-2_2_0 upgrade will test the lcg-* utilities
–These will replace the edg-rm functions
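A minimal sketch of the copy-across procedure described above; the tarball name is arbitrary and the commands are a reconstruction, not QMUL's recorded script.

    # On an LCFG-installed RH7.3 node: archive the LCG middleware tree.
    tar czf /tmp/lcg-opt.tgz -C / opt
    # On the Fedora 2 worker node: unpack it in place.
    tar xzf /tmp/lcg-opt.tgz -C /
    # The Globus toolkit then had to be recompiled against the Fedora
    # libraries, and the jar files under /usr/share/java copied over too.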

9 Royal Holloway
148 CPU PBS+Maui farm @ LCG-2_1_1
Joined Testzone on 19 July
Private networked WNs
Existing PBS server
–Manual installation on WNs
–LCFG installed CE and SE
–Configure CE to be a client to the PBS server
Shared NFS /home directories
–Uses the pbs jobmanager, not the lcgpbs jobmanager
–Needed to configure WNs so that jobs run in a scratch area, not /home: there is not enough space in /home for the whole farm (see the sketch below)
Some problems still under investigation
–Stability problems with Maui
–Large, compressed files sometimes become corrupted when copied to the SE; looks like a hardware problem
Also: 80 CPU BaBar farm running BaBarGrid
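A minimal sketch of one way to steer jobs off the NFS /home, assuming the LCG-2 job wrapper honours EDG_WL_SCRATCH; the variable name and path are assumptions here, not RHUL's recorded configuration.

    # Hedged sketch: variable name and scratch path are assumptions.
    # If set on the WN, the job wrapper creates its working directory
    # under the local scratch area instead of the NFS-shared $HOME.
    echo 'export EDG_WL_SCRATCH=/scratch' > /etc/profile.d/lcg-scratch.sh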

10 University College London UCL-HEP
20 CPU PBS farm @ LCG-2_1_1
Joined Testzone on 18 June
Existing PBS server
–Manual installation of WNs
–LCFG installed CE and SE
–Configure CE to be a client to the PBS server
Shared /home directories
–Uses the pbs jobmanager, not the lcgpbs jobmanager
–So far no problems with space on the shared /home
Hyperthreading allows up to 76 jobs, but grid queues are restricted to fewer than this (see the sketch below)
Stability problems with OpenPBS
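A minimal sketch of how such a queue restriction might be applied in OpenPBS; the queue name and limit are illustrative, not UCL-HEP's actual values.

    # Hedged sketch: the 'lcg' queue name and limit of 60 are placeholders.
    # Cap concurrently running grid jobs below the 76 hyperthreaded slots:
    qmgr -c "set queue lcg max_running = 60"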

11 University College London UCL-CCC
88 CPU PBS farm @ LCG-2_2_0
Joined Testzone on 24 June
–A power failure took the farm offline from 4 to 25 August
Originally had a cluster of 192 CPUs running Sun Grid Engine under RH9
–UCL central computing services agreed to reinstall half of the farm with RH7.3 for LCG, using LCFG
Hyperthreading allows 176 jobs (44 dual-CPU WNs)

12 Contribution to GridPP: Promised vs. Delivered

                   Promised                 Delivered
Site           CPU    kSI2K    TB       CPU    kSI2K    TB
Brunel          30       –      –        14       4     0.1
IC (HEP)       170      79     1.0       78      39     0.4
IC (LeSC)      916*   341.6     –         –       –      –
QMUL           444*    317    13.5      356*    275    25.0
RHUL           230     204    13.2      230     204     5.6
UCL-HEP          –       –      –        25      20     1.9
UCL-CCC        192*     60     0.8       96     100     0.5
Total         1982    1030    34.5      761     642    33.5

*CPU count includes shared resources where CPUs are not 100% dedicated to Grid/HEP; the kSI2K value takes this sharing into account.

13 Site Experience
Storage Elements are all 'classic' GridFTP servers
–Cannot pool large TB RAID arrays to deploy large disk spaces
Many farms are shared facilities
–Existing batch queues: manual installation of WNs needs to be automated for large farms! The CE becomes a client to the batch server
–Private networked WNs: needed additional Replica Manager configuration
–Some OS constraints
Lack of all SRPMs is still a problem
Most sites were taken by surprise by the lack of warning of new releases
–Problems scheduling workload
–Documentation has improved
–But communication could be improved further!
The default LCFG installed farms (IC-HEP, UCL-CCC) have been amongst the most stable and easily upgraded
–But this is not an option for most significant Tier 2 resources

14 Summary
LT2 sites have managed to contribute a significant amount of resources to LCG
–Still more to come!
This has required a significant amount of (unfunded) effort from staff, HEP and IT, at the institutes
–2.5 GridPP2-funded support posts to be appointed soon
–These will help!
Any deviation from a "standard" installation comes at a price
–Installation, upgrades, maintenance
–But the large resources at Tier 2s tend to be shared facilities: sites don't have the freedom to install a "standard" OS, whatever it might be
LCG moving from RH7.3 to Scientific Linux will not necessarily help!

