SouthGrid Status
Pete Gronbech, 21st March 2007
GridPP 18, Glasgow
RAL PPD
Existing 30 Xeon CPUs and 6.5 TB of storage, supplemented by an upgrade in mid 2006.
RAL PPD installed a large upgrade: 200 CPU cores (Opteron 270), equivalent to an extra 260 kSI2k, plus 86 TB of storage.
The 50 TB that was loaned to the RAL Tier 1 is now being returned.
10 Gb/s connection to the RAL backbone
–RAL currently connected at 1 Gb/s to TVM
–Will be connected at 10 Gb/s to the SJ5 backbone by 01/04/2007
RAL PPD (2)
Supports 22 different VOs, of which 18 have run jobs in the last year.
1,000,000 kSI2k hours delivered in the last 12 months.
2007 upgrade ordered, disk and CPU:
–13 x 6 TB SATA disk servers, 3Ware RAID controllers, 14 x 500 GB WD disks
–32 x dual Intel 5150 dual-core CPU nodes with 8 GB RAM
Orders placed; delivery expected in the next 7 days.
Will be installed in the Atlas Centre, due to power/cooling issues in R1.
Status at Cambridge
Currently gLite 3 on SL3
CPUs: 32 x 2.8 GHz Xeon
3 TB storage
–DPM enabled Oct 05
Upgrade arrived Christmas 2006: 32 Intel Woodcrest-based servers, giving 128 CPU cores, equivalent to approx. 358 kSI2k.
Local computer room upgraded.
Storage upgrade to 40-60 TB expected this summer.
Condor version 6.8.4 is in use, but the latest LCG updates have a dependency on condor-6.7.10-1. That is a development release and should not be used in a production environment; LCG/gLite should not require it.
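As a rough cross-check of the Cambridge upgrade figures, the core count and an average per-core rating can be worked out from the numbers on the slide. This is only a sketch: the layout of two dual-core sockets per Woodcrest server is an assumption (it is what makes 32 servers give 128 cores), and the function names are illustrative.

```python
def total_cores(servers, sockets_per_server, cores_per_socket):
    """Number of CPU cores in a homogeneous set of servers."""
    return servers * sockets_per_server * cores_per_socket


def ksi2k_per_core(total_ksi2k, cores):
    """Average kSI2k rating per core."""
    return total_ksi2k / cores


# Assumption: each Woodcrest server has 2 sockets with 2 cores each.
cores = total_cores(servers=32, sockets_per_server=2, cores_per_socket=2)
per_core = ksi2k_per_core(total_ksi2k=358, cores=cores)

print(f"{cores} cores, about {per_core:.2f} kSI2k per core")
# prints "128 cores, about 2.80 kSI2k per core"
```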
Cambridge (2)
CAMONT VO supported at Cambridge, Oxford and Birmingham. Job submission by Karl Harrison and David Sinclair.
LHCb on Windows project (Ying Ying Li)
–Code ported to Windows
–HEP 4-node cluster
–MS Research Lab 4-node cluster (Windows Compute Cluster)
–Code running on a server at Oxford, with possible expansion onto the OERC Windows cluster
–Possible Bristol nodes soon
Status at Bristol
Status
–All nodes running SL3.0.5 with gLite 3
–DPM enabled Oct 05, LFC installed Jan 06
Existing resources
–GridPP nodes plus local cluster nodes used to bring the site online. Local cluster being integrated.
New resources
–10 TB of storage coming online soon.
–Bristol expects to have a percentage of the new campus cluster from early 2007. This includes CPU, high-quality and scratch disk resources.
Jon Wakelin is working 50% on GPFS and StoRM.
IBM loan kit
Status at Birmingham
Currently SL3 with gLite 3
CPUs: 28 x 2.0 GHz Xeon (+98 x 800 MHz)
1.9 TB DPM, being replaced by a new 10 TB array
BaBar farm starting to become unreliable due to many disk and PSU failures.
Runs the Pre-Production Service, which is used for testing new versions of the middleware.
Birmingham will have a percentage of the new campus cluster, due May/June 2007. First phase: 256 nodes, each with two dual-core Opteron CPUs.
Status at Oxford
Currently gLite 3 on SL3.0.5
CPUs: 80 x 2.8 GHz
–Compute Element: 37 worker nodes, 74 job slots, 67 kSI2k
–37 dual 2.8 GHz P4 Xeon, 2 GB RAM
–DPM SRM Storage Element: 2 disk servers, 3.2 TB disk space (1.6 TB DPM server plus a second 1.6 TB DPM disk pool node). A bug in DPM stopped load balancing across pools; it will be fixed with the latest gLite update.
–Logical File Catalogue
–Mon and UI nodes
–GridMon network monitor
1 Gb/s connectivity to the Oxford backbone
–Oxford currently connected at 1 Gb/s to TVM
Submission from the Oxford CampusGrid via the NGS VO is possible.
Usage
Oxford supports 20 VOs, 17 of which have run jobs in the last year.
Most active VOs are LHCb (38.5%), ATLAS (21.3%) and Biomed (21%).
300,000 kSI2k hours delivered in the last 12 months.
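To put the Oxford usage percentages in absolute terms, they can be applied to the 300,000 kSI2k hours total. The sketch below assumes the quoted percentages are shares of delivered kSI2k hours (the slide does not say whether they refer to CPU time or job counts); the names are illustrative.

```python
# Assumption: the percentages are shares of delivered kSI2k hours.
TOTAL_KSI2K_HOURS = 300_000
VO_SHARE_PERCENT = {"LHCb": 38.5, "ATLAS": 21.3, "Biomed": 21.0}


def hours_by_vo(total_hours, share_percent):
    """Split a total into per-VO hours, with the remainder as 'other VOs'."""
    hours = {vo: total_hours * pct / 100 for vo, pct in share_percent.items()}
    hours["other VOs"] = total_hours - sum(hours.values())
    return hours


vo_hours = hours_by_vo(TOTAL_KSI2K_HOURS, VO_SHARE_PERCENT)
for vo, h in vo_hours.items():
    print(f"{vo}: {h:,.0f} kSI2k hours")
```

Under that assumption the three named VOs account for roughly 115,500, 63,900 and 63,000 kSI2k hours, leaving about 57,600 for the remaining VOs.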
New Computer Room
The new computer room being built at Begbroke Science Park, jointly for the Oxford Supercomputer and the Physics department, will provide space for 55 (11 kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre.
Disk and CPU (planned purchase for summer 07)
–32 x dual Intel 5150 dual-core CPU nodes with 8 GB RAM, giving 353 kSI2k
–10 x 12 TB SATA disk servers, giving 105 TB usable (after RAID 6)
Quad-core CPUs will be benchmarked, for both SPEC rates and power consumption.
Newer 1 TB disks will be more commonplace by the summer.
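The step from 120 TB raw to 105 TB usable follows from RAID 6 reserving two disks' worth of capacity per array for parity. The sketch below assumes one RAID 6 array per server with no hot spares and ignores filesystem overhead; the 16 x 750 GB layout it arrives at is inferred from the ratio, not stated on the slide, and the function names are illustrative.

```python
def raid6_usable_tb(n_disks, disk_tb):
    """Usable capacity of one RAID 6 array: two disks' worth goes to parity."""
    if n_disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (n_disks - 2) * disk_tb


def raid6_disks_for_ratio(raw_tb, usable_tb):
    """Disk count n implied by usable/raw = (n - 2) / n."""
    return 2 / (1 - usable_tb / raw_tb)


# Oxford plan: 10 servers x 12 TB raw = 120 TB, 105 TB usable.
n_disks = raid6_disks_for_ratio(raw_tb=120, usable_tb=105)    # 16.0
per_server = raid6_usable_tb(int(n_disks), disk_tb=12 / 16)   # 10.5 TB
print(n_disks, per_server, 10 * per_server)                   # 16.0 10.5 105.0
```

The same formula gives 6 TB for 14 x 500 GB disks, which matches the 6 TB quoted for the RAL PPD disk servers, although the slides do not say that those servers use RAID 6.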
Oxford DWB Computer Room
A local Physics department infrastructure computer room (100 kW) has been agreed. It should be ready in May/June 07.
This will relieve local computer rooms and possibly house T2 equipment until the Begbroke room is ready.
Racks that are currently in unsuitable locations can be rehoused.
Other SouthGrid sites
Other groups within the SouthGrid EGEE area are:
–EFDA-JET, with 40 CPUs up and running
–The Advanced Computing and Emerging Technologies (ACET) Centre, School of Systems Engineering, University of Reading, which started setting up its cluster in Dec 06
SouthGrid CPU delivered
SouthGrid provided 1.4 million kSI2k hours in the year March 06 to March 07.
Site Monitoring
Grid-wide provided monitoring
–GSTAT
–SAM
–GOC Accounting
–Steve Lloyd's ATLAS test page
Local site monitoring
–ganglia
–pakiti
–torque/maui monitoring CLIs
–Investigating MonAMI
Developing
–Nagios: RAL PPD have developed many plugins; other SouthGrid sites are just setting up
Summary
SouthGrid continues to run well, and its resources are set to expand throughout this year.
Birmingham: the new University cluster will be ready in the summer.
Bristol: the small cluster is stable, and the new University cluster is starting to come online.
Cambridge: cluster upgraded as part of the CamGrid SRIF3 bid.
Oxford will be able to expand resources this summer when the new computer room is built.
RAL PPD expanded last year and this year, well above what was originally promised in the MoU.