RAL Tier1: 2001 to 2011. James Thorne, GridPP 19, 30th August 2007.
30/08/2007 2001 to 2007: Sorry GridPP, I'm afraid I can't do that!
30/08/2007 Result of GridPP3 for Tier1. Good result: –Effort increases from 16.5 to 20.4 FTE –£6.8M hardware budget (cf. £2.3M in GridPP2). Extra fault management/hardware staff as the size of the farm increases. A good result, but the team remains thinly stretched; hardware is just sufficient to meet the experiments' requirements.
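To put the GridPP3 figures in proportion, the short sketch below works out the relative change in effort and hardware budget. It uses only the four numbers quoted on the slide; the variable names are illustrative and not taken from the talk.

```python
# Figures quoted on the slide (FTE and hardware budget in millions of pounds).
fte_gridpp2 = 16.5
fte_gridpp3 = 20.4
hw_budget_gridpp2_m = 2.3
hw_budget_gridpp3_m = 6.8

# Relative increase in staff effort, as a percentage.
fte_increase_pct = (fte_gridpp3 - fte_gridpp2) / fte_gridpp2 * 100

# Ratio of the GridPP3 hardware budget to the GridPP2 one.
hw_budget_ratio = hw_budget_gridpp3_m / hw_budget_gridpp2_m

print(f"Effort increase: {fte_increase_pct:.1f}%")
print(f"Hardware budget ratio: {hw_budget_ratio:.2f}x")
```

On these numbers the hardware budget grows roughly three-fold while effort grows by about a quarter, which is consistent with the remark that the team remains thinly stretched.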
30/08/2007 Planned Tier1 Storage Capacity (TiB)
30/08/2007 Planned Tier1 CPU Capacity (KSI2K)
30/08/2007 Estimated Rack Count
30/08/2007 Estimated number of Disk Servers
30/08/2007 Estimated number of Spinning Drives
30/08/2007 Approximate H/W Value Allocated to Experiments in 2008
30/08/2007 Hardware: CPU, Disk, Tape. Further procurements in FY08, FY09 and FY10.
30/08/2007 New Machine Room. Order placed and contractor has started work. 800 m² can accommodate 300 racks + 5 robots. 2.3 MW power/cooling capacity (some UPS). Office accommodation for all E-Science staff. Scheduled to be available for September 2008.
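As a rough sanity check on the machine room figures, the sketch below derives an average power and floor-area budget per rack. It assumes, purely for illustration, that the full 2.3 MW and 800 m² are shared evenly across the 300 racks; the slide does not say how much is reserved for the tape robots, cooling plant or walkways, so these are upper bounds rather than design values.

```python
# Figures quoted on the slide.
floor_area_m2 = 800.0
rack_count = 300
power_capacity_mw = 2.3

# ASSUMPTION: all power and floor space is divided evenly among the racks,
# ignoring the 5 robots and any cooling overhead. This gives an upper bound.
power_per_rack_kw = power_capacity_mw * 1000 / rack_count
area_per_rack_m2 = floor_area_m2 / rack_count

print(f"Power budget per rack: {power_per_rack_kw:.1f} kW (upper bound)")
print(f"Floor area per rack:   {area_per_rack_m2:.2f} m^2 (upper bound)")
```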
30/08/2007 Staffing. Lex Holt has left the Tier1. James Adams is moving from hardware support to Fabric Team system admin. Plan to recruit: –Replacement hardware repair position –Two experiment support posts: one ATLAS, one CMS –Raja Nandakumar as honorary team member from LHCb –Will also shortly commence GridPP3 recruitment
30/08/2007 CASTOR. The operational issues mentioned at GridPP 18 were the tip of the iceberg, and the CASTOR service was found to be inoperable. A massive amount of re-engineering has been carried out since March, with much effort from the CASTOR team. –Huge progress –Areas of concern. We are optimistic that CASTOR will be a success.
30/08/2007 SL4. 20% of the batch farm is now running SL4. Negotiating with the LHC experiments to agree the move of their capacity from SL3 to SL4. Once the LHC migration is completed, the remaining capacity will follow within a few weeks. Timing depends on the experiments, but expect termination of the SL3 service in September.
30/08/2007 Reliability. March: invested a lot of effort without much gain. Continuing to prioritise reliability and making progress. Recently exceeded target; now must maintain it. Start Sysadmin On Duty in September. Start on-call later this year.
30/08/2007 RAL-LCG2 Availability/Reliability
30/08/2007 CPU Efficiencies. CPU efficiency much improved. August fall still being investigated. March minimum occurred when CASTOR was broken.
30/08/2007 CPU Efficiencies
30/08/2007 Termination of GridPP use of ADS Service. GridPP funding and use of the legacy Atlas Datastore (ADS) service is scheduled to end at the end of March. RAL will continue to operate the ADS service, and experiments are free to purchase capacity directly from the ADS Team.
30/08/2007 dCache Closure. dCache is still supported and working. We will give 6 months' notice before terminating the dCache service. No notice of termination has been given yet. Aiming to end the service by the end of GridPP2 (March 2008). Also, the ADS service cannot be terminated until dCache ceases.
30/08/2007 Grid Only. Move to Grid-only access postponed until December 2007. No new local accounts. In January 2008: –Batch job submission through RB/CE only (no qsub, some exceptions) –No local login to UIs (some exceptions) –AFS service will end
30/08/2007 Conclusions. Positioning ourselves for LHC production. A lot of good progress with CASTOR; we expect to meet the needs of the ATLAS M4 run and CMS's CSA07. Reliability has finally improved.