SouthGrid Status 2001-2011 Pete Gronbech: 30th August 2007 GridPP 19 Ambleside
RAL PPD
2005: 30 Xeon CPUs and 6.5TB of storage, supplemented by an upgrade in mid 2006.
2006: installed a large upgrade of 200 CPU cores (Opteron 270), equivalent to an extra 260 KSI2k, plus 86TB of storage. The 50TB that was loaned to the RAL Tier 1 is now being returned.
10Gb/s connection to the RAL backbone.
2007 upgrade, disk and CPU:
–13 x 6TB SATA disk servers, 3Ware RAID controllers, 14 x 500GB WD disks
–32 x dual Intel 5150 dual-core CPU nodes with 8GB RAM
Will be installed in the Atlas Centre, due to power/cooling issues in R1.
RAL PPD (2)
Supports 22 different VOs, of which 18 have run jobs in the last year. RAL PPD has always supported a large number of VOs, which has helped ensure the cluster is fully utilised.
Yearly upgrades are planned for the foreseeable future. The current computer room will have incremental upgrades to house the increased capacity. The RAL Tier 1 computer room can be used for overflow when needed.
Status at Cambridge
2001: 20 x 1.3GHz P3 for EDG
2002: 3TB storage
2004: CPUs: 32 x 2.8GHz Xeon
2005: DPM enabled
2006: Local computer room upgraded
Christmas 2006: 32 Intel Woodcrest servers, giving 128 CPU cores, equivalent to 358 KSI2k
June 2007: Storage upgrade of 40TB running DPM on SL4 64-bit
Condor version 6.8.5 is being used.
Cambridge Futures
CamGrid: 430 CPUs across campus, mainly running Debian and Condor. Upgrades expected in 2008.
Special projects:
–CAMONT VO supported at Cambridge, Oxford and Birmingham. Job submission by Karl Harrison and David Sinclair.
–LHCb on Windows project (Ying Ying Li): code ported to Windows
–HEP 4-node cluster
–MS Research Lab 4-node cluster (Windows compute cluster)
Code running on a server at Oxford, expansion on OERC Windows cluster.
Bristol History
Early days with Owen Maroney on the UK EDG testbed.
Bristol started in GridPP in July 2005 with 4 LCG service nodes and one 2-CPU WN.
2006: upgraded to 8 dual-CPU WNs; more may be added.
2007: 10TB storage upgrade.
When LCG is integrated with the Bristol HPC cluster (Blue Crystal) very soon, there will be a new CE and SE providing access to 2048 2.6GHz cores, and StoRM will be used to make over 50TB of GPFS storage available to the Grid. This is a ClusterVision / IBM SRIF-funded project.
The HPC WN number should be closer to 3712 cores (96 WNs with 2 x dual-core Opterons (4 cores/WN) + 416 WNs with 2 x quad-core Opterons (8 cores/WN)).
Large water-cooled computer room being built on the top floor of the physics building.
Currently integrating the first phase of the HPC cluster (Baby Blue) with the LCG software.
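The 3712-core figure quoted for the Bristol HPC cluster can be checked from the worker-node breakdown given on the slide. The following is only a minimal arithmetic sketch; the variable names are illustrative and not part of the original talk.

```python
# Worker-node groups as quoted on the slide: (number of WNs, cores per WN).
node_groups = [
    (96, 4),   # 96 WNs, each with 2 x dual-core Opterons
    (416, 8),  # 416 WNs, each with 2 x quad-core Opterons
]

# Total cores is the sum of nodes * cores-per-node over both groups.
total_cores = sum(nodes * cores for nodes, cores in node_groups)
print(total_cores)  # 3712
```

The two groups contribute 384 and 3328 cores respectively, which matches the total stated on the slide.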
Status at Bristol: current GridPP cluster as at August 2007
Status at Birmingham
Currently SL3 with gLite 3.
CPUs: 28 x 2.0GHz Xeon (+ 98 x 800MHz).
10TB DPM storage service.
BaBar farm will be phased out as the new HPC cluster comes online.
Runs the Pre-Production Service, which is used for testing new versions of the middleware.
SouthGrid hardware support (Yves Coppens) is based here.
Birmingham Futures
Large SRIF-funded ClusterVision / IBM University HPC cluster, named Blue Bear.
It has 256 64-bit AMD Opteron 2.6GHz dual-core sockets (1024 processing cores) with 8.0GB each.
GridPP should get at least 10% of the cluster usage.
A second phase is planned for 2008.
Early beginnings at Oxford: grid.physics.ox.ac.uk circa 2000-2003
Followed attendance at the RAL course given by Ian Foster's Globus team, 21-23rd June 2000.
Initial installations on the grid test system used Globus 1.1.3, installed using the Globus method (Aug - Oct 2000).
Later reinstalled from Andrew McNab's RPMs: 1.1.3-5 (July 2001) and 1.1.3-6 with UK host certificate (Nov 2001).
Grid machine modified to be front end for LHCb Monte Carlo; OpenAFS, Java, OpenPBS and OpenSSH installed (Nov 2001).
First attempt used the kickstart (RH6.2) method. Crashed with anaconda errors.
–Read on the TB-support mailing list that the kickstart method was no longer supported. Decided to try the manual EDG method.
–Pulled all CE RPMs to my NFS server. Tried a simple rpm -i *.rpm, which failed.
Converted to using the LCFG method.
Oxford goes to production status 2004-2007
Early 2004 saw the arrival of two racks of Dell equipment providing: CPUs: 80 x 2.8GHz, 3.2TB of disk storage (£60K investment, local Oxford money).
–Compute Element: 37 worker nodes, 74 job slots, 67 KSI2K
–37 dual 2.8GHz P4 Xeon, 2GB RAM
–DPM SRM Storage Element: 2 disk servers, 3.2TB disk space (1.6TB DPM server plus a second 1.6TB DPM disk pool node)
–Mon, LFC and UI nodes
–GridMon network monitor
1Gb/s connectivity to the Oxford backbone
–Oxford currently connected at 1Gb/s to TVM
Submission from the Oxford CampusGrid via the NGS VO is possible. Working towards NGS affiliation status.
Planned upgrades for 2005 and 2006 were hampered by the lack of a decent computer room with sufficient power and cooling.
Oxford Upgrade 2007
11 systems, 22 servers, 44 CPUs, 176 cores. Intel 5345 Clovertown CPUs provide ~350 KSI2K.
11 servers each providing 9TB usable storage after RAID 6, total ~99TB.
Two racks, 4 redundant management nodes, 4 PDUs, 4 UPSs.
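The Oxford upgrade totals follow directly from the per-unit figures on the slide. Below is a small sketch of that arithmetic; the per-core KSI2K value is a derived estimate based on the slide's approximate ~350 KSI2K figure, not a number stated in the talk, and the variable names are illustrative.

```python
# Figures quoted on the slide.
cpus = 44
cores_per_cpu = 4            # 176 cores / 44 CPUs (the 5345 is a quad-core part)
storage_servers = 11
usable_tb_per_server = 9     # usable capacity after RAID 6
total_ksi2k = 350            # approximate figure from the slide

total_cores = cpus * cores_per_cpu
total_storage_tb = storage_servers * usable_tb_per_server

# Derived estimate only: average compute power per core.
ksi2k_per_core = total_ksi2k / total_cores

print(total_cores)       # 176
print(total_storage_tb)  # 99
print(round(ksi2k_per_core, 2))
```

On these numbers each core contributes roughly 2 KSI2K.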
Two new computer rooms will provide excellent infrastructure for the future 2007-2011
The new computer room being built at Begbroke Science Park, jointly for the Oxford Super Computer and the Physics department, will provide space for 55 (11KW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre.
This £1.5M project is funded by SRIF and a contribution of ~£200K from Oxford Physics.
All new Physics HPC clusters, including the Grid, will be housed here when it is ready in October / November 2007.
Local Oxford DWB computer room
Completely separate from the Begbroke Science Park, a computer room with 100KW cooling and >200KW power is being built with ~£150K of Oxford Physics money.
This local Physics department infrastructure computer room (100KW) has been agreed and will be complete next week (Sept 2007).
It will relieve local computer rooms and house T2 equipment until the Begbroke room is ready.
Racks that are currently in unsuitable locations can be rehoused.
Summary
SouthGrid is set for substantial expansion following significant infrastructure investment at all sites.
Birmingham: existing HEP and PPS clusters are running well; the new University cluster will be utilised shortly.
Bristol: the small cluster is stable; the new University HPC cluster is starting to come online.
Cambridge: cluster upgraded as part of the CamGrid SRIF3 bid.
Oxford: resources will be upgraded in the coming weeks, being installed into the new local computer room.
RAL PPD: expanded last year and this year, way above what was originally promised in the MoU. Continued yearly expansion is planned.
SouthGrid: striding out into the future, to reach the summit of our ambitions for the Grid users of the future!
Enjoy your walks, and recruit some new GridPP members.