1
London Tier 2 Status Report
GridPP 12, Brunel, 1st February 2005
Owen Maroney
2
LT2 Sites
Brunel University
Imperial College London
– (including London e-Science Centre)
Queen Mary University of London
Royal Holloway University of London
University College London
3
LT2 Management
The management board had its first meeting on 3rd December 2004
– Next meeting: 9th March 2005
Members of the management board:
– Brunel: Paul Kyberd
– IC: John Darlington (& Steve McGough)
– RHUL: Michael Green
– QMUL: Alex Martin
– UCL: Ben Waugh
– Chair: David Colling
– Secretary: Owen Maroney
4
Brunel
1 WN PBS farm @ LCG-2_2_0
– R-GMA installed, but not APEL
In the process of adding 60 WNs
– Issues with private networking; attempted to resolve these with LCG-2_2_0
– Will now proceed directly to LCG-2_3
– Investigating installation of SL on the nodes
  If this goes well, will use YAIM
  If it goes badly, will use RH7.3 LCFG
5
Imperial College London
66 CPU PBS HEP farm @ LCG-2_2_0
– APEL installed
– Upgrading to LCG-2_3_0 (this week!)
– Will still use RH7.3 LCFGng
HEP computing undergoing re-organisation
– LCG nodes will be incorporated into the SGE cluster and made available to LCG (dependency on LeSC SGE integration)
– Will re-install with a RHEL OS at that time
London e-Science Centre
– Problems over internal re-organisation
– SGE farm, 64-bit RHEL
  Problems with the default installation tool (APT) supplied by LCG
  Also, LCG-2_3 is not supported on 64-bit systems
  Working on deploying LCG-2_3 on 32-bit frontend nodes using YUM and RHEL
  Tarball install on the WNs; hope this is binary compatible!
– Then need to work on an SGE information provider (a sketch of what such a provider might look like follows below)
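To illustrate the SGE information-provider work flagged above: a provider's job is to query the batch system and publish GLUE attributes into the information system. Below is a minimal sketch, not IC's or LeSC's actual code, assuming SGE's 'qstat -g c' cluster-queue summary and a GLUE 1.x style CE entry; the CE identifier, the column layout and the choice of attributes are illustrative assumptions only.

#!/usr/bin/env python
# Hypothetical sketch of an SGE information provider for an LCG CE.
# Assumes the SGE 6 'qstat -g c' column layout (USED in column 3,
# AVAIL in column 5) and a GLUE 1.x style LDIF entry; CE_ID below is
# an illustrative placeholder, not a real endpoint.
import subprocess

CE_ID = "ce.example.ac.uk:2119/jobmanager-sge-long"   # assumed identifier

def sge_queue_summary():
    """Return (used_slots, available_slots) from 'qstat -g c'."""
    out = subprocess.run(["qstat", "-g", "c"], capture_output=True,
                         text=True, check=True).stdout.splitlines()
    used = avail = 0
    for line in out[2:]:                 # skip the two header lines
        fields = line.split()
        if len(fields) >= 6:
            used += int(fields[2])       # USED column
            avail += int(fields[4])      # AVAIL column
    return used, avail

def main():
    used, avail = sge_queue_summary()
    # Emit a few GLUE CEState attributes as LDIF; a real provider would
    # also publish waiting/total job counts, response-time estimates, etc.
    print("dn: GlueCEUniqueID=%s,mds-vo-name=local,o=grid" % CE_ID)
    print("GlueCEStateFreeCPUs: %d" % avail)
    print("GlueCEStateRunningJobs: %d" % used)

if __name__ == "__main__":
    main()

Run on the CE, output like this would be merged into the site information system in place of the PBS provider's answer; the harder part in practice is mapping SGE queues onto the CE objects the schema expects.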
6
Queen Mary
320 CPU Torque farm; OS is Fedora 2
– Currently running LCG-2_1_1 on the frontend and LCG-2_0_0 on the WNs. More up-to-date versions of LCG were not binary compatible with Fedora.
– Trinity College Dublin have recently provided a Fedora port of LCG-2_2_0 and are working on a port of LCG-2_3_0.
– Will install the LCG-2_3_0 frontends as SL3 machines, using YAIM; install LCG-2_2_0 on the Fedora WNs; upgrade the WNs to 2_3_0 when the TCD port is ready. (A sketch of a simple binary-compatibility check appears below.)
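On the binary-compatibility worry above (middleware built for RH7.3/SL running on Fedora nodes): one cheap sanity check, purely illustrative and not a tool mentioned in the talk, is to walk the installed tree with ldd and flag binaries or libraries whose shared-library dependencies do not resolve on the target node. The install prefix below is an assumption.

#!/usr/bin/env python
# Illustrative check: walk a middleware install tree and report files
# whose shared-library dependencies cannot be resolved on this node.
# The prefix below is an assumed example, not a guaranteed LCG layout.
import os
import subprocess

INSTALL_ROOT = "/opt/lcg"    # hypothetical tarball/RPM install prefix

def missing_libs(path):
    """Return ldd's 'not found' lines for path, or [] for non-ELF files."""
    proc = subprocess.run(["ldd", path], capture_output=True, text=True)
    if proc.returncode != 0:
        return []
    return [l.strip() for l in proc.stdout.splitlines() if "not found" in l]

def main():
    for dirpath, _, filenames in os.walk(INSTALL_ROOT):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # only look at executables and shared libraries
            if not os.access(full, os.X_OK) and ".so" not in name:
                continue
            for line in missing_libs(full):
                print("%s: %s" % (full, line))

if __name__ == "__main__":
    main()

A clean run does not prove compatibility (symbol versions and library behaviour can still differ), but unresolved libraries are caught immediately rather than at job run time.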
7
Royal Holloway
Little change: 148 CPU PBS farm
– APEL installed, but no data reported!
Very little manpower available
Currently running LCG-2_2_0
– Had hoped to upgrade to LCG-2_3_0 during February
Late-breaking news: the RHUL PBS server has been hacked and taken offline…
8
University College London
UCL-HEP: 20 CPU PBS farm @ LCG-2_2_0
– In the process of upgrading to LCG-2_3_0
– Frontends on SL3, using YAIM
– WNs stay on RH7.3
UCL-CCC: 88 CPU PBS farm @ LCG-2_2_0
– Running APEL
– Upgrade to LCG-2_3_0 on SL3 during February
9
Contribution to GridPP
Promised vs. Delivered: no change since GridPP11

Site       |      Promised        |      Delivered
           | CPU   kSI2K   TB     | CPU   kSI2K   TB
Brunel     |  30     –      –     |  14     4     0.1
IC (HEP)   | 170    79     10     |  78    39     0.4
IC (LeSC)  | 916*  341      6     |   –     –      –
QMUL       | 444*  317     13.5   | 356*  275    25.0
RHUL       | 230   204     13.2   | 230   204     5.6
UCL-HEP    |   –     –      –     |  25    20     1.9
UCL-CCC    | 192*   60      0.8   |  96   100     0.5
Total      | 1982  1030    34.5   | 761   642    33.5

* CPU count includes shared resources where the CPUs are not 100% dedicated to Grid/HEP; the kSI2K value takes this sharing into account.
10
Usage by VO (APEL)

Jobs    | Nov 2004 | Dec 2004 | Jan 2005
alice   |    0     |      0   |      0
atlas   |    0     |     40   |    117
cms     |    0     |      0   |      0
dteam   |    0     |    949   |   1430
lhcb    |    0     |    447   |    412
zeus    |    0     |    336   |     75

CPU     | Nov 2004 | Dec 2004   | Jan 2005
alice   |    0     |          0 |          0
atlas   |    0     |  2,710,276 |  4,120,225
cms     |    0     |          0 |          0
dteam   |    0     |    204,777 |      5,379
lhcb    |    0     | 36,337,227 | 14,322,962
zeus    |    0     |    124,585 |  2,983,753
11
1 st February 2005GridPP 12: London Tier 2 Status Usage by VO (Jobs)
12
1 st February 2005GridPP 12: London Tier 2 Status Usage by VO (CPU)
13
Site Experiences (I)
Storage Elements are all 'classic' GridFTP servers
– Still waiting for a deployment release of an SRM solution
Problems with the experiments' use of Tier 2 storage
– Assumption: a Tier 2 SE is used as an import/export buffer for the local farm
  Input data staged in for jobs on the farm
  Output data staged out to long-term storage at Tier 0/1
  Tier 2 is not permanent storage: no backup!
– In practice:
  The Grid does not distinguish between SEs. No automatic data-migration tools. No SE 'clean-up' tools.
  All SEs are advertised as "Permanent" by default.
  – "Volatile" and "Durable" settings only appropriate for SRM?
  SEs fill up with data and become 'read-only' data servers
  – Some data files are left on an SE without an entry in RLS: dead space!
  – One VO can fill an SE, blocking all other VOs
    » Disk-quota integration with the information provider?
Clean-up tools are needed to deal with files older than "x" weeks (see the sketch below):
– Delete from the SE, and the entry in RLS, if another copy exists
– Migrate to a different (nearest Tier 1?) SE if it is the only copy
– But the site admin needs to be in all VOs to do this!
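To make the clean-up idea above concrete, here is a minimal sketch of what such a tool might look like: scan the SE's data area for files older than a cut-off and, per file, apply the policy from the list above (delete if another replica exists, migrate if it is the only copy). It is purely illustrative: the storage path is an assumption, and count_replicas() and migrate_to_tier1() are hypothetical hooks standing in for whatever catalogue (RLS) and transfer commands a site, with the right VO credentials, would actually use.

#!/usr/bin/env python
# Illustrative sketch of an SE clean-up tool; not an LCG-supplied utility.
# Assumes files live under STORAGE_ROOT on a 'classic' GridFTP SE and that
# the two stub functions are replaced with site-specific commands.
import os
import time

STORAGE_ROOT = "/storage"             # assumed SE data area
MAX_AGE_SECONDS = 6 * 7 * 24 * 3600   # "x" weeks, here x = 6

def count_replicas(path):
    """Hypothetical hook: ask the replica catalogue (e.g. RLS) how many
    copies of this file exist Grid-wide. Returns None until a real
    catalogue lookup is wired in; 0 would mean 'not registered at all'."""
    return None

def migrate_to_tier1(path):
    """Hypothetical hook: copy the file to a Tier 1 SE and register the
    new replica before the local copy is removed."""
    print("would migrate %s to a Tier 1 SE" % path)

def clean(dry_run=True):
    now = time.time()
    for dirpath, _, filenames in os.walk(STORAGE_ROOT):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_mtime < MAX_AGE_SECONDS:
                continue                          # young enough, keep it
            replicas = count_replicas(path)
            if replicas is None:
                print("%s: old, but replica count unknown - skipped" % path)
            elif replicas > 1:
                print("%s: another copy exists - delete here" % path)
                if not dry_run:
                    os.remove(path)   # a real tool would also drop the RLS entry
            elif replicas == 1:
                print("%s: only copy - migrate, then delete here" % path)
                if not dry_run:
                    migrate_to_tier1(path)
                    os.remove(path)
            else:
                print("%s: no catalogue entry (dead space) - flag for review" % path)

if __name__ == "__main__":
    clean(dry_run=True)    # report only; pass dry_run=False to act

Defaulting to a dry run matters here: as the slide notes, deleting or migrating replicas touches the experiments' catalogues, so the site admin would need the relevant VO memberships (and a lot of confidence in the replica counts) before letting such a tool act.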
14
Site Experiences (II)
The timing and release of LCG-2_3_0 could still have been improved
– Information flow around the (pre-)release is still a problem
– But at least a long upgrade period was allowed!
– The structure of the documentation changed
  Generally an improvement
  Some documents were clearly not proof-read before release
BUT: no LT2 sites have managed to upgrade yet! Why not?
– Lots of absence over the Christmas/New Year period: not really 2 months
– Perception that the YAIM installation tool was not mature: lots of 'bugs'
  Bugs were fixed quickly, but there was still the temptation to let other sites 'go first'
  YAIM did not originally handle a separate CE and PBS server
  – The most common configuration in LT2!
– Still need to schedule time against other constraints
  Hardware support posts still not appointed
  Sites still supported on an unfunded 'best-effort' basis
– Uncertainty at sites over whether the experiments were ready to use SL
The new release schedule proposed by LCG Deployment at CERN should help
– As should the appointment of the hardware support posts
15
Summary
Little change since GridPP11
– R-GMA and APEL installations
– Additional resources (Brunel, LeSC) still to come online
– Failure to upgrade to LCG-2_3_0 rapidly
Significant effort over Summer 2004 put a lot of resources into LCG
– But the manpower was coming from unfunded 'best-effort' work
– When term-time starts, much less effort is available!
  Maintenance is manageable
  Upgrades are difficult
  Major upgrades are very difficult!
The use of the resources in practice is turning out to be different from expectations!