The RAL Tier-1 and the 3D Deployment Andrew Sansum 3D Meeting 22 March 2006.


1 The RAL Tier-1 and the 3D Deployment Andrew Sansum 3D Meeting 22 March 2006

2 Overview
A bit of general stuff about the Tier-1 at RAL
Show how the Tier-1 support and procurement model impact 3D deployment
Some thoughts on the 3D deployment as seen from a Tier-1 project manager’s perspective.

3 Part of GRIDPP
Tier-1 is part of the UK GRIDPP2 project: http://www.gridpp.ac.uk/
Funded by GRIDPP2 with some support by CCLRC/RAL.
GRIDPP2 runs September 2004-August 2007 (may extend to March 2008).
GRIDPP3 bid underway for LHC exploitation
16.5 FTE of effort pa.
£2.3M hardware budget over 3 years

4 So What is a Tier1?
Large Computing Resource (Yes but so are Tier2s)
Large Storage Resource (rarer in Tier2s) and if necessary will keep the data safe.
Able to commit to and deliver high quality of service.
Respond rapidly to service faults
Contributing to collaborative services/core infrastructure
Providing high quality technical support
Able to make long term service commitments
Fulfil specific role within main experiment production...
3D Project clearly fits these criteria well

5 Hardware
CPU Farm
–About 550 systems
–About 1100 KSI2K
Disk Service – mainly “standard” configuration
–Commodity hardware (IDE and SATA)
–Mix of external RAID arrays and PCI based RAID
–About 2000 spinning drives (when recent delivery comes online)
–300TB usable space
Further big capacity upgrade in next 6 months.

6 Robot Upgrade
Old Robot
–6000 slots
–Early 1990s hardware – but still pretty good
–1 robot arm
–Supports most recent generation drives but end of line
–Still operational but migrate drives shortly and close
New Robot: STK SDL8530
–10,000 slots
–Life expectancy of at least 2 drive generations
–Up to 8 mini robots mounting tapes – faster – resilient
–T10K drives/media in FY06: faster drives & bigger tapes

7 Capacity Planning
Detailed financial/technical capacity planning model allows planning of commitments to LCG.
Detailed machine room planning model to ensure infrastructure meets our capacity ramp up.
LCG MoU commitments for capacity in FY06/07/..
–Needs capacity purchases early in financial year (ie deliver in Q2).
Capacity needs to be about 80% of spend.
M&O 10-15% of spend
Non capacity about 5-10% (do you get the picture)
Money is tight for things such as an H/A Oracle service.
Keen to see:
–Appropriate hardware solutions (3D was about 6% of H/W spend in 2005)
–Accurate capacity estimates
We need to know early in the FY what we must spend on Oracle to tension against other stuff.

8 Staff Effort
Tier-1 project funds 16.5 FTE
–1.5 Tier-1 Project Management/UK ROC management
–2 to support the infrastructure (machine room/net)
–1 to maintain the hardware (fix faults – RMA etc)
–4.5 Grid Infrastructure and experiment support
–4.5 to sysadmin the disk/CPU and core (eg 3D)
–2.5 to run the tape service (and deploy CASTOR)
–0.5 Oracle support (funds part of Gordon’s team)
Oracle team at CCLRC E-Science (Gordon’s Group): 2.5 DBAs
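As a quick consistency check on the breakdown above, the per-area figures can be summed and compared with the 16.5 FTE headline. This is only an illustrative sketch: the dictionary keys are shortened labels, not names from the slide, and the Oracle share is a derived figure rather than one quoted in the talk.

```python
from fractions import Fraction

# FTE per area, copied from the slide; labels are abbreviated for illustration.
effort_fte = {
    "project/ROC management": "1.5",
    "infrastructure (machine room/net)": "2",
    "hardware maintenance": "1",
    "grid infrastructure and experiment support": "4.5",
    "sysadmin disk/CPU and core": "4.5",
    "tape service and CASTOR": "2.5",
    "Oracle support": "0.5",
}

# Fractions avoid any floating-point rounding when adding halves.
total_fte = sum(Fraction(v) for v in effort_fte.values())
oracle_share = Fraction(effort_fte["Oracle support"]) / total_fte

print(f"total funded effort: {float(total_fte)} FTE")
print(f"Oracle support share: {float(oracle_share):.1%}")
```

The items do add up to the stated 16.5 FTE, with Oracle support at roughly 3% of funded effort.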

9 Oracle Support
Effort seems tight
–0.5 FTE seems tight to fund the management of a production 3D (plus other T1 Oracle) service. Maybe we underestimated – but if so what else should we cut?
–2.5 FTE of Oracle team seems tight to operate a production service meeting LCG MoU commitments.
Keen to see:
–Simple solutions
–Standard installations
–Shared support between Tier-1s and T0

10 3D Hardware
Four servers
–Dual AMD Opteron 250
–4GB ECC RAM
–Dual hot-swap 250GB SATA system drives on hardware RAID1 mirror
–Single port 2Gb/s FCAL adapter
–In 2U rackmount with dual redundant HS PSUs
1 storage array
–Infortrend EonStor A16F-G2221-M2
–16 bay SATA hot-swap with dual 2Gb/s FCAL host
–Li-ION backup modules
–256MB cache
–Dual redundant PSUs
–Fitted with 16 x Western Digital Raid Edition 250GB SATA drives

11 3D Hardware
1 SAN switch
–Qlogic SANbox 8 port FCAL switch (fairly cheap and stackable)
All on 3 years NBD on-site
Estimated array space for file system
–250GB x 14 (RAID 5 + hot spare) = 3.5TB (10^12)
–Drop 6% for filesystem and convert to actual bytes for true space
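The space estimate above can be worked through as a short calculation. This is a sketch under stated assumptions: the 16 bays are taken to lose one drive to RAID 5 parity and one to the hot spare (which gives the 14 on the slide), and "actual bytes" is read as conversion to binary units (TiB, 2^40 bytes). The function and parameter names are illustrative only.

```python
GB = 10**9    # decimal gigabyte, as used by drive vendors
TIB = 2**40   # binary tebibyte

def usable_capacity(bays=16, drive_gb=250, parity_drives=1,
                    hot_spares=1, fs_overhead=0.06):
    """Return (data drives, raw bytes, bytes left after filesystem overhead)."""
    data_drives = bays - parity_drives - hot_spares
    raw_bytes = data_drives * drive_gb * GB
    usable_bytes = raw_bytes * (1 - fs_overhead)
    return data_drives, raw_bytes, usable_bytes

data_drives, raw_bytes, usable_bytes = usable_capacity()
print(f"{data_drives} data drives, raw {raw_bytes / 10**12:.1f} TB (10^12)")
print(f"after 6% filesystem overhead: about {usable_bytes / TIB:.2f} TiB")
```

Under these assumptions the 3.5TB headline figure shrinks to roughly 3 TiB of space usable by the file system.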

12 3D Database Hardware Structure

13 Hardware Future
Still unclear about 3D capacity (and functionality) requirements for FY06
Hardware requirement changes (Sep06) too late for us to deploy H/W in FY06.
Tier-1 spend plan finalised in the next couple of weeks.
–Too early to assess recently purchased hardware
–Have to make a guess – plan to allocate only £5-10K to either double storage capacity, increase storage resilience, or provision more servers
Then have to last until April 2007
–Some scope for changes by retrofitting F/C HBA to batch workers.
–Buying single disks etc

14 3D Operation
Moving from stand-alone test system to core service.
Need (at least) partial integration of monitoring with existing Tier-1 services
–Nagios and Ganglia (not Lemon)
–Recognise that Oracle has its own specialist monitoring tools
Oracle team becoming increasingly involved in Tier-1 weekly operation – no longer considered development.
Boundaries between teams can be a problem
Need to get Tier-1 support staff integrated.
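To illustrate what partial integration with Nagios might look like, here is a minimal sketch of a Nagios-style check that times a TCP connection to a database listener. It is not taken from the talk: the thresholds, function names, and the use of port 1521 (Oracle's default listener port) are assumptions. Only the exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) is standard Nagios plugin behaviour.

```python
import socket
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def classify(connect_seconds, warn=1.0, crit=5.0):
    """Map a connect time (or None on failure) to a Nagios status and message."""
    if connect_seconds is None:
        return CRITICAL, "CRITICAL - listener not reachable"
    if connect_seconds >= crit:
        return CRITICAL, f"CRITICAL - connect took {connect_seconds:.2f}s"
    if connect_seconds >= warn:
        return WARNING, f"WARNING - connect took {connect_seconds:.2f}s"
    return OK, f"OK - connect took {connect_seconds:.2f}s"

def time_connect(host, port, timeout=10.0):
    """Return seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections, DNS errors and timeouts
        return None

def main(argv):
    if len(argv) < 2:
        print("UNKNOWN - usage: check_listener.py HOST [PORT]")
        return UNKNOWN
    host = argv[1]
    port = int(argv[2]) if len(argv) > 2 else 1521  # assumed default port
    status, message = classify(time_connect(host, port))
    print(message)
    return status

# A wrapper script would finish with: sys.exit(main(sys.argv))
```

A check like this only confirms that the listener port answers; database-level health would still be left to Oracle's own monitoring tools, as the slide notes.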

15 UK Deployment
GRIDPP has a well developed national deployment team.
Has an excellent track record (eg recent throughput testing)
Would probably be good at deploying 3D to Tier-2 sites and carrying out integration tests.
Still appear to need to understand how the Tier-2s will take part

16 Conclusions
3D is clearly a key service and meets the role of the Tier-1
It’s going to be challenging to support a production Oracle service with limited human resources.
Although requirements seem to be firming up, this is rather late in our procurement cycle and further finance is limited in FY06
Beginning to work to integrate the 3D team into Tier-1 operations

