Presentation on theme: "RAL Tier1: 2001 to 2011, James Thorne, GridPP 19, 30th August 2007." Presentation transcript:
RAL Tier1: 2001 to 2011. James Thorne, GridPP 19, 30th August 2007
30/08/2007 email@example.com 2001 to 2007: Sorry GridPP, I'm afraid I can't do that!
Result of GridPP3 for Tier1: Good result: –Effort increases from 16.5 to 20.4 FTE –£6.8M hardware budget (cf. £2.3M in GridPP2). Extra fault management/hardware staff as the size of the farm increases. A good result, but the team remains thinly stretched; hardware is just sufficient to meet the experiments' requirements.
Estimated number of Disk Servers
Estimated number of Spinning Drives
Approximate H/W Value Allocated to Experiments in 2008
Hardware: CPU, Disk, Tape. Further procurements in FY08, FY09 and FY10.
New Machine Room: Order placed and contractor has started work. 800 m² can accommodate 300 racks + 5 robots. 2.3 MW power/cooling capacity (some UPS). Office accommodation for all E-Science staff. Scheduled to be available for September 2008.
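As a rough sanity check on these figures, the quoted capacity can be turned into per-rack averages. The sketch below assumes the full 2.3 MW and 800 m² are shared evenly across the 300 racks. The slide does not say this (the robots, cooling overhead and any non-rack space will take a share), so the results are upper bounds, not design values.

```python
# Figures quoted on the slide.
FLOOR_AREA_M2 = 800
RACKS = 300
POWER_MW = 2.3


def per_rack_averages(area_m2, racks, power_mw):
    """Return (kW per rack, m^2 per rack).

    Assumes an even split across racks, which is an illustrative
    assumption and not something the slide states.
    """
    kw_per_rack = power_mw * 1000 / racks
    m2_per_rack = area_m2 / racks
    return kw_per_rack, m2_per_rack


kw, m2 = per_rack_averages(FLOOR_AREA_M2, RACKS, POWER_MW)
print(f"{kw:.1f} kW per rack, {m2:.2f} m^2 per rack")
# prints "7.7 kW per rack, 2.67 m^2 per rack"
```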
Staffing: Lex Holt has left the Tier1. James Adams is moving from hardware support to Fabric Team system admin. Plan to recruit: –Replacement hardware repair position –Two experiment support posts: one ATLAS, one CMS –Raja Nandakumar as honorary team member from LHCb –Will also shortly commence GridPP3 recruitment.
CASTOR: Operational issues mentioned at GridPP 18 were the tip of the iceberg, and the CASTOR 2.1.2 service was found to be inoperable. A massive amount of re-engineering has been carried out since March, with much effort from the CASTOR team. –Huge progress –Areas of concern. We are optimistic that CASTOR will be a success.
SL4: 20% of the batch farm is now running SL4. Negotiating with the LHC experiments to agree the move of their capacity from SL3 to SL4. Once the LHC migration is completed, the remaining capacity will follow within a few weeks. Depends on the experiments, but termination of the SL3 service should be expected in September.
Reliability: March: invested a lot of effort without much gain. Continuing to prioritise reliability and making progress. Recently exceeded target; now must maintain it. Start Sysadmin On Duty in September. Start on-call later this year.
CPU Efficiencies: CPU efficiency much improved. August fall still being investigated. March minimum when CASTOR was broken.
CPU Efficiencies
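The slides do not define CPU efficiency. A common batch-farm definition, assumed here, is CPU time consumed divided by the wall-clock time the jobs occupied. Under that assumption, jobs stalled waiting on storage (as during the March CASTOR problems) pull the ratio down. A minimal sketch with made-up job records; `Job` and `cpu_efficiency` are hypothetical names, not part of any Tier1 tooling:

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpu_seconds: float   # CPU time actually consumed by the job
    wall_seconds: float  # wall-clock time the job occupied its slot


def cpu_efficiency(jobs):
    """Aggregate CPU efficiency: total CPU time / total wall-clock time."""
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return 0.0  # no wall time recorded, so report zero instead of dividing by zero
    return sum(j.cpu_seconds for j in jobs) / total_wall


# Illustrative values only: one healthy job, one mostly waiting on I/O.
jobs = [Job(cpu_seconds=3400, wall_seconds=3600),
        Job(cpu_seconds=900, wall_seconds=3600)]
print(f"{cpu_efficiency(jobs):.2%}")  # prints "59.72%"
```

Summing before dividing weights each job by its wall-clock time, so long-running jobs dominate the figure; averaging per-job ratios would be a different metric.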
Termination of GridPP use of ADS Service: GridPP funding and use of the legacy Atlas Datastore (ADS) service is scheduled to end at the end of March 2008. RAL will continue to operate the ADS service, and experiments are free to purchase capacity directly from the ADS Team.
dCache Closure: dCache is still supported and working. We will give 6 months' notice before terminating the dCache service. No notice of termination has been given yet. Aiming to end the service by the end of GridPP2 (March 2008). Also cannot terminate the ADS service until dCache ceases.
Grid Only: Move to Grid-only access postponed until December 2007. No new local accounts. In January 2008: –Batch job submission through RB/CE only (no qsub, some exceptions) –No local login to UIs (some exceptions) –AFS service will end.
Conclusions: Positioning ourselves for LHC production. A lot of good progress with CASTOR; we expect to meet the needs of the ATLAS M4 run and CMS's CSA07. Reliability has finally improved.