RALPP Site Report: HEP Sys Man, 11th May 2012, Rob Harper


1 RALPP Site Report
HEP Sys Man, 11th May 2012
Rob Harper

2 My talk will be...
Where we’re at now
Our new stuff, including
  – GridPP purchases
  – DRI networking kit
Benchmarking and hyperthreads
Virtual machine infrastructure
Managing configuration and stuff: cfEngine vs Puppet
Future stuff

3 RALPP For Dummies
Part of SouthGrid
Staff
  – Chris Brew (part)
  – Rob Harper (part)
One cluster serving Tier 2 (85%) and Tier 3 (15%), managed by Torque/Maui
dCache storage

4 RALPP CPU

5 Cluster is currently nominally:
2,872 job slots
26,409 HS06
Where available, hyperthreads are used to offer job slots equal to 150% of the physical cores
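The 150% rule and the cluster totals above can be turned into a quick sanity check. This is a minimal sketch, not the site's own tooling: the function names are hypothetical, and only the 1.5 factor and the two totals come from the slide.

```python
def job_slots(physical_cores: int, factor: float = 1.5) -> int:
    """Job slots offered by a hyperthreaded node at `factor` times its physical cores."""
    return int(physical_cores * factor)


def hs06_per_slot(total_hs06: float, total_slots: int) -> float:
    """Average HS06 available to each job slot."""
    return total_hs06 / total_slots


# Cluster-wide nominal figures quoted on the slide.
TOTAL_SLOTS = 2872
TOTAL_HS06 = 26409

if __name__ == "__main__":
    # Example node sizes are illustrative, not an inventory of the cluster.
    for cores in (8, 12):
        print(f"{cores} physical cores -> {job_slots(cores)} job slots")
    print(f"average HS06 per slot: {hs06_per_slot(TOTAL_HS06, TOTAL_SLOTS):.2f}")
```

The average comes out at a little over 9 HS06 per slot, which is a useful yardstick when comparing older and newer worker nodes.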

6 RALPP Storage [chart; axis label: TB]

7 RALPP Storage
1,060 TB in production
Soon to be 1,260 TB

8 New Stuff: GridPP Purchases
CPU:
  – 9 * Viglen/Supermicro Twin² chassis
    Intel E5645 based
    48 GB / node
    Using hyperthreads => 648 job slots, 6208 HS06
Disk:
  – 5 * Viglen/Supermicro 24 bay storage nodes
    => 200 TB of disk pool
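The purchase figures above can be cross-checked with a little arithmetic. The sketch below assumes the chassis are Supermicro's four-node Twin² form factor and that each node is dual-socket; neither is stated on the slide, but together with the six-core E5645 and the 1.5 hyperthreading factor they reproduce the quoted 648 job slots. All constant and function names are hypothetical.

```python
# Figures taken from the slides.
CHASSIS = 9
HT_FACTOR = 1.5            # job slots per physical core where hyperthreads are used
NEW_HS06 = 6208
STORAGE_NOW_TB = 1060
NEW_DISK_NODES = 5
NEW_DISK_TB = 200

# Assumptions, NOT stated on the slides.
NODES_PER_CHASSIS = 4      # assumed: Twin² chassis hold four nodes each
SOCKETS_PER_NODE = 2       # assumed: dual-socket nodes
CORES_PER_SOCKET = 6       # the Intel Xeon E5645 is a six-core part


def new_cpu_slots() -> int:
    """Job slots provided by the new worker nodes under the assumptions above."""
    nodes = CHASSIS * NODES_PER_CHASSIS
    cores_per_node = SOCKETS_PER_NODE * CORES_PER_SOCKET
    return int(nodes * cores_per_node * HT_FACTOR)


def storage_after_purchase() -> int:
    """Total production storage in TB once the new disk pool is added."""
    return STORAGE_NOW_TB + NEW_DISK_TB


if __name__ == "__main__":
    slots = new_cpu_slots()
    print(f"new job slots: {slots}")
    print(f"HS06 per new slot: {NEW_HS06 / slots:.2f}")
    print(f"TB per new disk node: {NEW_DISK_TB / NEW_DISK_NODES:.0f}")
    print(f"storage after purchase: {storage_after_purchase()} TB")
```

Under these assumptions the slot count matches the slide, and the storage total matches the "soon to be 1,260 TB" figure given earlier.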

9 New Stuff: Networking
DRI money bought us:
  – 5 * Force10 S4810 switches
  – A heap of 10Gb NICs for older disk pool nodes
  – A heap of 10Gb cables
Coming soon: a much reconfigured network...

10 New Network Layout

11 Benchmarking & Hyperthreads
We ran the HS06 benchmark on a heap of nodes with varying numbers of concurrent benchmark jobs
Going past the number of physical cores did give us some gains
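One way to summarise a scan like this is to compare the total score at each job count against the score at one job per physical core. The helper below is a hypothetical sketch of that comparison; the slides do not say how the results were analysed, and no measured values are included here.

```python
from typing import Dict


def gain_beyond_physical(total_score_by_jobs: Dict[int, float],
                         physical_cores: int) -> Dict[int, float]:
    """Relative change in total score versus one job per physical core.

    `total_score_by_jobs` maps the number of concurrent benchmark jobs to the
    node's summed score at that concurrency. Only job counts above
    `physical_cores` are returned; 0.2 means a 20% gain over the baseline.
    """
    baseline = total_score_by_jobs[physical_cores]
    return {
        jobs: score / baseline - 1.0
        for jobs, score in sorted(total_score_by_jobs.items())
        if jobs > physical_cores
    }


def best_concurrency(total_score_by_jobs: Dict[int, float]) -> int:
    """Job count that gave the highest total score."""
    return max(total_score_by_jobs, key=total_score_by_jobs.get)
```

Fed with a node's own measurements, `gain_beyond_physical(scores, physical_cores=12)` shows how much extra throughput each additional slot buys, which is the number that matters when choosing a slot multiplier.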

12 Benchmarking & Hyperthreads
So we committed 1.5 * physical cores as job slots for some nodes and ran real jobs
No significant drop in efficiency
More work done
Many details on SouthGrid blog at http://bit.ly/Iu7BfS
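The slide does not say how efficiency was measured. A common definition for batch jobs is CPU time divided by wall-clock time, and the sketch below assumes that definition; the function name and input shape are hypothetical.

```python
from typing import Iterable, Tuple


def cpu_efficiency(jobs: Iterable[Tuple[float, float]]) -> float:
    """Aggregate CPU efficiency for a set of jobs.

    Each job is a (cpu_seconds, wall_seconds) pair. The result is total CPU
    time over total wall time, so long jobs weigh more than short ones.
    """
    total_cpu = 0.0
    total_wall = 0.0
    for cpu_seconds, wall_seconds in jobs:
        total_cpu += cpu_seconds
        total_wall += wall_seconds
    if total_wall == 0:
        raise ValueError("no wall-clock time recorded")
    return total_cpu / total_wall
```

Computing this separately for nodes at 1.0 and 1.5 slots per core, over the same period and job mix, gives a like-for-like check of whether the extra slots cost anything.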

13 Virtual Machines
Current set-up:
  – Xen VMs spread between a couple of servers
  – Local storage, nothing clever
Currently in test:
  – Cluster running Hyper-V
    Yes, we’ll be running Linux VMs on Windows
  – EqualLogic storage
    iSCSI
    Mirroring, etc.

14 Configuration Management
Already much discussed yesterday, but here’s our perspective...
We currently rely on cfEngine v2
This is not supported natively on SL6 (or at all)
Main options seem to be:
  – Crowbar in legacy cfEngine
  – cfEngine v3 – will need configs rewritten
  – Switch to Puppet – will need configs rewritten

15 Puppet
Puppet seems to be a strong choice
Particularly as other Tier 2s are coming to the same decision
We have not got far yet
We have a working Puppet Master with some basic manifests set up
We have an SL6 client for test purposes
Planning to use Puppet for SL6 hosts as we set them up
  – leaving SL5 kit on cfEngine

16 Puppet
Our cfEngine config relies massively on EditFiles functionality
Puppet does not have this
  – Can run scripts to do edits
  – Can use modules (e.g. iptables) that do the work for you
We need to learn to think in a different way to take advantage of Puppet
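For the "run scripts to do edits" route, the script needs to be idempotent so that repeated agent runs leave the file alone once it is correct. The sketch below shows one such edit in Python; `ensure_line` is a hypothetical helper, not part of Puppet or cfEngine, and it assumes a plain-text file where a whole-line match is good enough.

```python
from pathlib import Path


def ensure_line(path: str, line: str) -> bool:
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed, False if it was already correct.
    The file is created if it does not exist.
    """
    target = Path(path)
    existing = target.read_text().splitlines() if target.exists() else []
    if line in existing:
        return False  # already present: nothing to do
    existing.append(line)
    target.write_text("\n".join(existing) + "\n")
    return True
```

A script like this could be called from a Puppet `exec` resource, but it still edits a file in place, which is the cfEngine way of thinking. Managing the whole file from a template or a module is closer to the shift in approach described above.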

17 Things to come...
Getting network configuration updated
Start deploying VMs in Hyper-V
Getting Puppet configuration management running properly
Start using SL6 as a standard install for services where we have no reason not to
Improved monitoring

