1 RAL Site Report
Andrew Sansum, e-Science Centre, CCLRC-RAL
HEPiX, May 2004

2 Overview
– GRIDPP Tier1 Service
– Particle Physics Department Tier2
– Grid Operations Centre (GOC)
– Other e-Science systems

3 Tier1 in GRIDPP2 (2004-2007)
The Tier-1 Centre will provide GRIDPP2 with a large computing resource of a scale and quality that can be categorised as an LCG Regional Computing Centre.
– January 2004: GRIDPP2 confirmed RAL as host of the Tier1 Service
– GRIDPP2 commences September 2004
Tier1 hardware budget:
– £2.3M over 3 years
Staff:
– Increase from 12.1 to 16.5 (FTE) by September

4 Current Tier1 Hardware
CPU:
– 350 dual-processor Intel PIII and Xeon servers, mainly rack-mounted
– About 400 KSI2K
Disk service (mainly a "standard" configuration):
– Dual-processor server
– Dual-channel SCSI interconnect
– External IDE/SCSI RAID arrays (Accusys and Infortrend)
– ATA drives (mainly Maxtor)
– About 80TB of disk
– Cheap and (fairly) cheerful
Tape service:
– STK Powderhorn 9310 silo with 8 9940B drives

5 Layout

6 New Hardware Arrives 7th June
CPU capacity (500 KSI2K):
– 256 dual-processor 2.8GHz Xeons
– 2/4GB memory
– 120GB HDA
Disk capacity (140TB):
– Infortrend SATA/SCSI RAID arrays
– 16 x 250GB Western Digital SATA drives per array
– Two arrays per server

7 Development Areas
Storage architecture:
– 70 disk servers by July; management is becoming harder
– Wish to decouple storage hardware from middleware (which middleware?)
– Still using ext2/ext3; considering alternatives such as ClusterFS
– Maybe (at last) need a SAN of some sort
– Becoming interested in iSCSI (or maybe Fibre Channel)
Fabric management:
– About 800 systems by July; LCG nodes are managed by LCFG, but most are still managed with "simple" kickstart. This is getting harder.
New LAN backbone needed soon:
– But too early for 10 Gigabit
– Maybe a new backbone router or switch stack for the LAN, depending on iSCSI plans
Simplify the cluster configuration; clean up the spaghetti diagram of services and interfaces.
Upgrade from Red Hat 7.3 to Red Hat Enterprise?

8 RAL PP Tier2
Run by the Particle Physics Department; acts as a peer of the other UK university Tier2 systems.
– Currently 30 nodes running LCG2
Hardware upgrades are expected each year:
– An additional 24 systems and 8TB of disk in July
– 50 CPUs and 5TB of disk each year

9 GRID Operations
CCLRC is involved in Grid operations for:
– LCG
– GridPP
– NGS
– CCLRC
– EGEE
This means different things for different grids.

10 LCG GOC Monitoring
Within the scope of LCG, we are responsible for monitoring how the grid is running: who is up, who is down, and why.
– Identifying problems, contacting the right people, suggesting actions
– Providing scalable solutions that allow other people to monitor resources
– Managing site information: the definitive source of information
– Accounting: aggregate job throughput (per site, per VO)
Established at CCLRC (RAL). Status of the LCG2 grid: http://goc.grid-support.ac.uk/
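The slides do not say how the up/down checks are implemented. As a minimal, hypothetical sketch of the idea, the snippet below probes each site's Globus gatekeeper port (2119, the port noted on the accounting overview slide) with a plain TCP connect; the hostnames are placeholders, and the real GOC monitoring takes its site list from the site-information database and applies richer tests.

```python
import socket

# Placeholder gatekeeper hostnames -- a real check would take these from the
# GOC site-information database rather than a hard-coded dictionary.
GATEKEEPERS = {
    "SITE-A": "ce01.site-a.example.org",
    "SITE-B": "ce01.site-b.example.org",
}

GRAM_PORT = 2119  # Globus gatekeeper port (see the accounting overview slide)


def gatekeeper_up(host, port=GRAM_PORT, timeout=5.0):
    """Return True if a TCP connection to the gatekeeper port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for site, host in sorted(GATEKEEPERS.items()):
        status = "UP" if gatekeeper_up(host) else "DOWN"
        print("%-10s %-35s %s" % (site, host, status))
```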

11 LCG2 Core Sites
Status: 12th May 2004, 10:20. ~30 sites.
http://goc.grid-support.ac.uk/

12 LCG Accounting Overview
(Diagram: CE with PBS/LSF jobmanager log; gatekeeper listening on port 2119 with GRAM authentication; GIIS LDAP information server; MON node with R-GMA database.)
We have an accounting solution, provided by R-GMA. At each site, log-file data from the different sources is processed and published into a local database.
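As an illustration only, the sketch below shows the shape of that per-site step: parse batch-system (PBS) accounting records and publish one row per finished job into a local database. It assumes a simplified PBS "E"-record format and uses SQLite as a stand-in for the MON node's R-GMA-fed database; the real publisher also correlates gatekeeper and jobmanager logs to attach grid identities, which is omitted here.

```python
import re
import sqlite3

# Minimal sketch of the per-site accounting flow described on the slide:
# read the batch system's accounting log and publish one row per finished job.
PBS_END_RECORD = re.compile(r"^(?P<date>[^;]+);E;(?P<jobid>[^;]+);(?P<attrs>.*)$")


def parse_end_record(line):
    """Return (jobid, attribute dict) for a PBS 'E' (job end) record, else None."""
    m = PBS_END_RECORD.match(line.strip())
    if not m:
        return None
    attrs = dict(kv.split("=", 1) for kv in m.group("attrs").split() if "=" in kv)
    return m.group("jobid"), attrs


def publish(log_path, db_path="accounting.db", site="RAL-LCG2"):
    """Publish finished jobs from a PBS accounting log into a local database."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs "
        "(site TEXT, jobid TEXT PRIMARY KEY, vo TEXT, cpu_time TEXT, wall_time TEXT)"
    )
    with open(log_path) as log:
        for line in log:
            rec = parse_end_record(line)
            if rec is None:
                continue
            jobid, a = rec
            # Mapping the unix group to a VO name is a simplification; the real
            # accounting derives the VO from the job's grid credentials.
            db.execute(
                "INSERT OR REPLACE INTO jobs VALUES (?, ?, ?, ?, ?)",
                (site, jobid, a.get("group", "unknown"),
                 a.get("resources_used.cput"), a.get("resources_used.walltime")),
            )
    db.commit()
```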

13 LCG Accounting – How it Works
The GOC provides an interface to produce accounting plots "on demand":
– Total number of jobs per VO per site (working)
– Total number of jobs per VO aggregated over all sites (to be done)
– Plots tailored to the requirements of the user community
(Example plot: ~1000 ALICE jobs at Taipei, statistics for Feb/Mar.)
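Continuing the hypothetical sketch from the previous slide, a per-VO-per-site job count is then a simple aggregation over the published rows. The real GOC plots are generated from the data collected via R-GMA, so this query only illustrates the idea.

```python
import sqlite3


def jobs_per_vo_per_site(db_path="accounting.db"):
    """Count published job records, grouped by (site, VO)."""
    db = sqlite3.connect(db_path)
    return db.execute(
        "SELECT site, vo, COUNT(*) FROM jobs "
        "GROUP BY site, vo ORDER BY site, vo"
    ).fetchall()


if __name__ == "__main__":
    for site, vo, njobs in jobs_per_vo_per_site():
        print("%-15s %-10s %8d" % (site, vo, njobs))
```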

14 Other Major Developments
Major new scientific compute facilities:
NGS: UK National Grid Service storage node
– 18TB SAN
– 40 CPUs
– Myrinet and ORACLE
SCARF: 128-node Opteron cluster (Myrinet)
– 64-bit scientific compute service for CCLRC science
Two more large 64-bit computational chemistry services by Christmas.
About 500KW of equipment by Christmas 2004; power and cooling are currently under review.

