3 ACRC Resources
Phase 1 – ~March 07
–384 cores – AMD Opteron 2.6 GHz dual-socket, dual-core systems, 8 GB memory
–MVB server room
–CVOS and SL 4 on WNs
–GPFS, Torque/Maui, QLogic InfiniPath
Phase 2 – ~May 08
–3328 cores – Intel Harpertown 2.8 GHz dual-socket, quad-core, 8 GB memory
–PTR server room – ~600 m from the MVB server room
–CVOS and SL? on WNs
–GPFS, Torque/Moab, QLogic InfiniPath
Storage Project (2008–2011)
–Initial purchase of an additional 100 TB for the PP and Climate Modelling groups
–PTR server room
–Operational by ~Sep 08
–GPFS will be installed on the initial 100 TB
4 ACRC Resources
184 registered users
54 projects
5 faculties
–Engineering
–Science
–Social Science
–Medicine & Dentistry
–Medical & Veterinary
5 PP Resources
Initial LCG/PP setup
–SE (DPM), CE and a 16-core PP cluster, MON and UI
–CE for HPC (plus SE and GridFTP servers for use with the ACRC facilities)
HPC Phase 1
–PP have a 5% target fair-share and up to 32 concurrent jobs
–New CE, but uses the existing SE – accessed via NAT (and slow)
–Operational since end of Feb 08
HPC Phase 2
–SL 5 will limit PP exploitation in the short term
–Exploring virtualisation – but this is a medium- to long-term solution
–PP to negotiate a larger share of the Phase 1 system to compensate
Storage
–50 TB to arrive shortly, operational ~Sep 08
–Additional networking necessary for short/medium-term access
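The NAT path from the HPC worker nodes back to the existing SE is ordinarily just IP masquerading on the gateway box between the private cluster network and the campus network. A minimal sketch – the interface name and subnet are invented for illustration, not the actual Bristol configuration:

```shell
# On the gateway between the HPC private network and the campus network:
# enable IP forwarding, then masquerade outbound traffic from the WN
# subnet (10.10.0.0/16 and eth0 are illustrative placeholders).
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 10.10.0.0/16 -o eth0 -j MASQUERADE
```

Every WN-to-SE transfer then funnels through this single box, which is one reason the NAT route is slow compared with direct access.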
6 Storage
Storage cluster
–Separate from the HPC cluster
–Will run GPFS
–Being installed and configured as we speak
Running a test StoRM SE
–This is the second time, due to changes in the underlying architecture
–Passing simple SAM SE tests, but now removed from the BDII
–Direct access between storage and WNs through multi-cluster GPFS (rather than NAT)
Test and real systems may differ in the following ways…
–Real system will have a separate GridFTP server
–Possibly an NFS export for the Physics cluster
–10 Gb NICs (Myricom Myri10G PCI-Express)
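The direct WN-to-storage access via multi-cluster GPFS is typically wired up with the mmauth/mmremotecluster/mmremotefs commands: the owning (storage) cluster authorises the remote cluster, and the accessing (HPC) cluster registers the remote filesystem and mounts it. A sketch – the cluster names, node names, filesystem name and mount point below are placeholders, not the actual configuration:

```shell
# On the owning (storage) cluster: generate a key, enable authentication,
# and grant the HPC cluster access to the filesystem (names are placeholders).
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add hpc.example.ac.uk -k /tmp/hpc_id_rsa.pub
mmauth grant hpc.example.ac.uk -f gpfs_pp

# On the accessing (HPC) cluster: register the remote cluster and
# filesystem, then mount it on all nodes.
mmremotecluster add storage.example.ac.uk -n store01,store02 -k /tmp/storage_id_rsa.pub
mmremotefs add gpfs_pp -f gpfs_pp -C storage.example.ac.uk -T /gpfs/pp
mmmount gpfs_pp -a
```

Once mounted, the WNs see the storage as a local POSIX filesystem, removing the NAT bottleneck entirely.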
17 SoC
Separation of Concerns
–Storage/compute managed independently of the Grid interfaces
–Storage/compute managed by dedicated HPC experts
–Tap into storage/compute in the manner the electricity-grid analogy suggests
Provide PP with centrally managed compute and storage
–Tarball WN install on the HPC cluster
–StoRM writing files to a remote GPFS mount (devs and tests confirm this works)
In theory this is a good idea – in practice it is hard to achieve
–(Originally) an implicit assumption that the admin has full control over all components
–Software now allows for (mainly) non-root installations
–Depend on others for some aspects of support – impact on turn-around times for resolving issues (SLAs?!)
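The tarball WN install mentioned above is what makes the non-root approach possible: the middleware is unpacked into a shared software area rather than installed as RPMs on each node. Roughly – the directory, tarball name and environment-script path are hypothetical, not the actual deployment:

```shell
# Unpack the WN tarball into a shared (GPFS-visible) software area;
# no root access on the HPC nodes is required. Paths are placeholders.
cd /gpfs/shared/grid
tar -xzf glite-WN-tarball.tar.gz

# Each Grid job then sources the environment before using Grid tools
# (script location depends on the tarball layout).
source /gpfs/shared/grid/etc/profile.d/grid-env.sh
```

This keeps the HPC nodes' OS images untouched, which is exactly the separation of concerns the slide argues for.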
18 General Issues
Limit the number of tasks that we pass on to the HPC admins
–Set up user and admin accounts (sudo) and shared software areas
–Torque – allow a remote submission host (i.e. our CE)
–Maui – ADMIN3 access for certain users (all users are A3 anyway)
–NAT
Most other issues are solvable with fewer privileges
–SSH keys
–RPM or rsync for certificate updates
–WN tarball for software
Other issues
–APEL accounting assumes ExecutingCE == SubmitHost (bug report filed)
–Work-around for the Maui client – key embedded in the binaries! (now changed)
–Home directory path has to be exactly the same on the CE and the cluster
–Static route into the HPC private network
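The Torque, Maui and routing items above amount to small one-off changes on the HPC side. A hedged sketch – hostnames, usernames and addresses are invented for illustration:

```shell
# Torque: allow our CE to submit jobs remotely
# (hostname is a placeholder for the real CE).
qmgr -c "set server submit_hosts += ce.phy.bris.example"

# Maui: grant ADMIN3 (query-level) rights to named users in maui.cfg:
#   ADMIN3  ppuser1 ppuser2

# CE side: static route into the HPC private network via the gateway
# box (network and gateway addresses are illustrative only).
ip route add 10.10.0.0/16 via 192.168.1.254 dev eth1
```

Everything else on the slide (SSH keys, rsync'd CRLs/certificates, the WN tarball) needs no privileges on the HPC side at all.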
19 Qs?
Any questions…
https://webpp.phy.bris.ac.uk/wiki/index.php/Grid/HPC_Documentation
http://www.datadirectnet.com/s2a-storage-systems/capacity-optimized-configuration
http://www.datadirectnet.com/direct-raid/direct-raid
hepix.caspur.it/spring2006/TALKS/6apr.dellagnello.gpfs.ppt