
1 UKI-SouthGrid Overview and Oxford Status Report – Pete Gronbech, SouthGrid Technical Coordinator – HEPIX 2009, Umea, Sweden, 26th May 2009

2 SouthGrid Tier 2
The UK is split into four geographically distributed Tier 2 centres.
SouthGrid comprises all the southern sites not in London.
New sites are likely to join.

3 UK Tier 2 reported CPU – historical view to present

4 SouthGrid sites accounting, as reported by APEL

5 Site Upgrades in the last 6 months
RALPPD: increase of 640 cores (1568 KSI2K) + 380 TB
Cambridge: 32 cores (83 KSI2K) + 20 TB
Birmingham: 64 cores on the PP cluster and 128 cores on the HPC cluster, adding ~430 KSI2K
Bristol: original cluster replaced by new quad-core systems, 16 cores + an increased share of the HPC cluster, 53 KSI2K + 44 TB
Oxford: extra 208 cores, 540 KSI2K + 60 TB
JET: extra 120 cores, 240 KSI2K

6 New SouthGrid totals, Q1 2009

Site          CPU (kSI2K)   Storage (TB)   GridPP % of MoU CPU   % of MoU Disk
EDFA-JET           483            1.5              –                    –
Birmingham         728           90              304.35%             142.86%
Bristol            120           55               96.77%             343.75%
Cambridge          455           60              469.07%             230.77%
Oxford             972          160              592.68%             363.64%
RALPPD            2743          633              329.63%             374.56%
Totals            5501          999.5            377.47%             314.31%

7 Site Setup Summary

Site          Cluster(s)                     Installation method                         Batch system
Birmingham    Dedicated & shared HPC         PXE, Kickstart, CFEngine; tarball for HPC   Torque
Bristol       Small dedicated & shared HPC   PXE, Kickstart, CFEngine; tarball for HPC   Torque
Cambridge     Dedicated                      PXE, Kickstart, custom scripts              Condor
JET           Dedicated                      Kickstart, custom scripts                   Torque
Oxford        Dedicated                      PXE, Kickstart, CFEngine                    Torque
RAL PPD       Dedicated                      PXE, Kickstart, CFEngine                    Torque

8 Oxford Central Physics
Centrally supported Windows XP desktops (~500)
Physics-wide Exchange Server for email
–BES to support BlackBerrys
Network services for Mac OS X
–Astro converted entirely to central Physics IT services (120 OS X systems)
–Started experimenting with Xgrid
Media services
–Photocopiers/printers replaced – much lower costs than other departmental printers
Network
–The network is too large; looking to divide it into smaller pieces for better management and easier scaling to higher performance
–Wireless: introduced eduroam on all Physics WLAN base stations
–Identified a problem with a 3Com 4200G switch that caused a few connections to run very slowly; now fixed
–Improved the network core and computer room with redundant pairs of 3Com 5500 switches

9 Oxford Tier 2 Report – Major Upgrade 2007
The lack of a decent computer room with adequate power and A/C held back upgrading our 2004 kit until Autumn 2007.
11 systems, 22 servers, 44 CPUs, 176 cores: Intel 5345 Clovertown CPUs provide ~430 KSI2K, with 16 GB of memory per server. Each server has a 500 GB SATA HD and an IPMI remote KVM card.
11 storage servers each provide 9 TB of usable storage after RAID 6 (total ~99 TB), using 3ware 9650-16ML controllers.
Two racks, 4 redundant management nodes, 4 APC 7953 PDUs, 4 UPSs.

10 Oxford Physics now has two computer rooms
Oxford's Grid cluster was initially housed in the departmental computer room in late 2007, and later moved to the new shared University room at Begbroke (5 miles up the road).

11 Oxford Upgrade 2008
13 systems, 26 servers, 52 CPUs, 208 cores: Intel 5420 Harpertown CPUs provide ~540 KSI2K, with 16 GB of low-voltage FB-DIMM memory per server. Each server has a 500 GB SATA HD.
3 storage servers each provide 20 TB of usable storage after RAID 6 (total ~60 TB), using Areca controllers.
More of the same, but better!

12 November 2008 upgrade to the Oxford Grid cluster at Begbroke Science Park

13 Newer generation Intel quads take less power
Tested using one cpuburn process per core on both sides of a twin, killing one process every 5 minutes.
Electrical power consumption:
Intel 5345 – busy 645 W, idle 410 W
Intel 5420 – busy 490 W, idle 320 W
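A minimal sketch of that load-stepping procedure is shown below, assuming the cpuburn package's "burnP6" binary is on the PATH and that power is read manually from the rack PDU at each step; the talk does not say how the original test was scripted.

```python
# Sketch of the load-step test: start one burner per core, then terminate one
# every five minutes so the PDU power draw can be noted at each step.
import os
import subprocess
import time

BURN_CMD = ["burnP6"]      # assumed burner binary; any 100%-CPU loop would do
STEP_SECONDS = 5 * 60      # one burner is killed every 5 minutes

def run_load_steps() -> None:
    cores = os.cpu_count() or 1
    burners = [subprocess.Popen(BURN_CMD) for _ in range(cores)]
    print(f"{cores} burners running: note the 'busy' PDU reading now")
    try:
        while burners:
            time.sleep(STEP_SECONDS)
            burners.pop().terminate()
            print(f"{len(burners)} burners left: note the PDU reading")
    finally:
        for proc in burners:   # clean up if interrupted part-way through
            proc.terminate()
    print("All burners stopped: note the 'idle' PDU reading")

if __name__ == "__main__":
    run_load_steps()
```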

14 Electricity Costs*
We have to pay for the electricity used at the Begbroke computer room.
The electricity cost to run the old (4-year-old) Dell nodes (~79 KSI2K) is ~£8600 per year.
The replacement cost in new twins is ~£6600, with an electricity cost of ~£1100 per year.
So the saving is ~£900 in the first year and ~£7500 per year thereafter.
The conclusion is that it is not economically viable to run kit older than 4 years.
* Jan 2008 figures
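As a rough check of the arithmetic, the sketch below reproduces the comparison. The unit electricity price and the power draws are assumptions chosen to be consistent with the quoted annual costs; only the ~£8600, ~£6600 and ~£1100 figures come from the talk.

```python
# Rough reproduction of the cost comparison above (Jan 2008 figures).
HOURS_PER_YEAR = 24 * 365

def annual_cost(power_kw: float, price_per_kwh: float) -> float:
    """Annual electricity cost for a constant load, in pounds."""
    return power_kw * HOURS_PER_YEAR * price_per_kwh

price = 0.10                         # assumed £/kWh, including cooling overhead
old_cost = annual_cost(9.8, price)   # assumed ~9.8 kW for the old Dell nodes
new_cost = annual_cost(1.25, price)  # assumed ~1.25 kW for the replacement twins
capital = 6600.0                     # replacement cost quoted in the talk

first_year_saving = old_cost - (new_cost + capital)
ongoing_saving = old_cost - new_cost
print(f"old: £{old_cost:.0f}/yr, new: £{new_cost:.0f}/yr")
print(f"saving: £{first_year_saving:.0f} in year 1, £{ongoing_saving:.0f}/yr after")
```

With these assumed inputs the script prints roughly £8600 versus £1100 per year, a ~£900 saving in the first year and ~£7500 per year thereafter, matching the figures on the slide.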

15 IT-related power saving
Shutting down desktops when idle
–Must be idle: logged off, no shared printers or disks, no remote access, etc.
–140 machines regularly shut down
–Automatic power-up early in the morning, using Wake-on-LAN, to apply patches and be ready for users (see the sketch below)
Old cluster nodes removed or replaced with more efficient servers
Virtualisation reduces the number of servers and the power used
Computer room temperatures raised to improve A/C efficiency (from 19°C to 23-25°C)
Windows Server 2008 allows control of new power-saving options on more modern desktop systems
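The Wake-on-LAN step is straightforward to script: a "magic packet" is 6 bytes of 0xFF followed by the target MAC address repeated 16 times, sent as a UDP broadcast. The sketch below shows the idea; the MAC address is a placeholder and the actual tooling Oxford used for the morning wake-up is not described in the talk.

```python
# Minimal Wake-on-LAN sender: broadcast a magic packet to wake one desktop.
import socket

def wake_on_lan(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send a WoL magic packet (6x 0xFF + MAC repeated 16 times) via UDP broadcast."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

if __name__ == "__main__":
    wake_on_lan("00:11:22:33:44:55")   # placeholder MAC of a desktop to wake
```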

16 CPU Benchmarking – HEPSPEC06

Hostname    CPU type          Memory   Cores   HEPSPEC06   HEPSPEC06/core
node10      2.4 GHz Xeon      4 GB     2          7.0          3.5
node10      2.4 GHz Xeon      4 GB     2          6.96         3.48
t2wn61      E5345 2.33 GHz    16 GB    8         57.74         7.22
pplxwn16    E5420 2.5 GHz     16 GB    8         64.88         8.11
pplxint3    E5420 2.5 GHz     16 GB    8         64.71         8.09

These figures match closely with those published at http://www.infn.it/CCR/server/cpu2006-hepspec06-table.jpg

17 Cluster Usage at Oxford
Roughly equal share between LHCb and ATLAS for CPU hours: ATLAS runs many short jobs, LHCb longer jobs.
Cluster occupancy is approximately 70%, so there is still room for more jobs.
Local contribution to ATLAS MC storage.

18 Network rate capping
(Graph: inbound/outbound network traffic, with the 200 Mb/s cap marked.)
Oxford recently had its network link rate-capped to 100 Mb/s. This was a result of continuous 300-350 Mb/s traffic caused by CMS commissioning stress testing. As it happens, the test completed at the same time as we were capped, so we passed the test, and current normal use is not expected to be this high.
Oxford's JANET link is actually 2 × 1 Gbit links, which had become saturated. The short-term solution is to rate-cap only JANET traffic, to 200 Mb/s, which does not impact normal working (for now); all other on-site traffic remains at 1 Gb/s. The long-term plan is to upgrade the JANET link to 10 Gb/s within the year.

19 gridppnagios
We have set up a Nagios monitoring site for the UK, which several other sites use to get advance warning of failures.
https://gridppnagios.physics.ox.ac.uk/nagios/
https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo
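For context, a Nagios check is simply a small program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below only illustrates that plugin convention by probing a URL; the real gridppnagios instance runs the standard WLCG/SAM grid-service probes, not this script, and the URL argument is just an example.

```python
# Illustration of the Nagios plugin convention: print one status line,
# exit 0 for OK or 2 for CRITICAL.
import sys
import urllib.request

OK, CRITICAL = 0, 2

def check_url(url: str, timeout: float = 10.0) -> int:
    """Probe one URL and return a Nagios-style exit code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            print(f"OK - {url} answered with HTTP {response.status}")
            return OK
    except Exception as exc:
        print(f"CRITICAL - {url} unreachable: {exc}")
        return CRITICAL

if __name__ == "__main__":
    url = sys.argv[1] if len(sys.argv) > 1 else \
        "https://gridppnagios.physics.ox.ac.uk/nagios/"
    sys.exit(check_url(url))
```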

20 The End
But, since some of you may remember my old pictures of computer rooms with parquet floors and computers running in basements without A/C, here are some pictures showing the building of the Oxford Physics local infrastructure computer room.

21 Local Oxford DWB computer room
Completely separate from the Begbroke Science Park, a computer room with 100 kW of cooling and >200 kW of power was built as local Physics department infrastructure, with ~£150K of Oxford Physics money; it was completed in September 2007.
It relieves the local computer rooms and housed Tier 2 equipment until the Begbroke room was ready; racks that were in unsuitable locations can be re-housed.

