RAL PPD Site Report
Chris Brew, SciTech/PPD

Outline
Hardware
 – Current: Grid, User
 – New
Machine Room Issues
 – Power, Air Conditioning & Space
Plans
 – Tier 3
 – Configuration Management
 – Common Backup
Issues
 – Log processing
Windows

Current Grid Cluster
CPU:
 – 52 x dual Opteron 270 dual-core CPUs, 4GB RAM
 – 40 x dual PIV Xeon 2.8GHz, 2GB RAM
 – All running SL3 glite-WN
Disk:
 – 8 x 24-slot dCache pool servers
   - Areca ARC RAID cards
   - 22 x WD5000YS in RAID 6 (storage) – 10TB
   - 2 x WD1600YD in RAID 1 (system)
   - 64-bit SL4, single large XFS file system
Misc:
 – GridPP front ends running Torque, LFC/NFS, R-GMA and the dCache head node
 – Ex-WNs running the CE and a DHCPD/TFTP PXE-boot server
Network is now at 10Gb/s, but the external link is still limited by the firewall.

Current User Cluster
User Interfaces:
 – 7 ex-WNs, from dual 1.4GHz PIII to dual 2.8GHz PIV
   - 6 x SL3 (1 test, 2 general, 3 experiment)
   - 1 SL4 test UI
Disk servers: 2 x Dell PowerEdge 1850
 – Dell PERC 4/DC RAID card
 – 6 x 300GB disks in a Dell PowerVault 220 SCSI shelf
 – Serve home and experiment areas via NFS
   - Master copy on one server, rsync'd to the backup server 1-4 times daily (a sketch of this mirror job follows)
   - Home area backed up to the ADS daily
 – Same hardware as the Windows solution, so common spares
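
The mirror is just a scheduled rsync; the sketch below shows the general shape, assuming rsync over SSH between the primary and backup NFS servers. The host name and exported paths are placeholders, not the real configuration.

```python
#!/usr/bin/env python
"""Minimal sketch of the home/experiment-area mirror job, assuming rsync over
SSH between the primary and backup NFS servers; host name and paths are
placeholders, not the site's real layout."""
import subprocess
import sys

AREAS = ["/export/home", "/export/expt"]   # hypothetical exported areas
BACKUP_HOST = "ppd-nfs-backup"             # hypothetical backup server

def mirror(area):
    # -a preserves ownership and permissions; --delete keeps the copy exact
    cmd = ["rsync", "-a", "--delete", area + "/",
           "%s:%s/" % (BACKUP_HOST, area)]
    return subprocess.call(cmd)

if __name__ == "__main__":
    failures = sum(1 for area in AREAS if mirror(area) != 0)
    sys.exit(1 if failures else 0)
```

Run from cron between one and four times a day, something like this keeps the backup server ready to take over serving the areas if the master fails.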

Other Miscellaneous Boxen
Extra boxes:
 – Install/scratch/internal web server
 – Monitoring server
 – External web server
 – MINOS CVS server
 – NIS master
 – Security box (central logger and Tripwire)
New kit (undergoing burn-in now):
 – 32 x dual Intel Woodcrest 5130 dual-core CPUs, 8GB RAM (Streamline)
 – 13 x Viglen HS160a disk servers

Machine Room Issues
Too much equipment for our small departmental computer room, so we have taken over the adjacent "Display" area:
 – Historically part of the computer room
 – Already has a raised floor and three-phase power, though a new distribution panel is needed for the latter
 – Shares air conditioning with the computer room
Refurbished the power distribution, installed the kit and powered on:
 – Temperature in the new area rose to 26°C; temperature in the old area fell by 1°C
 – A consulting engineer was called in by Estates to "rebalance" the air conditioning. Very successful: old/new areas are now at 21.5/22.7°C
 – The engineer also calculated the total plant capacity at 50kW of cooling; we are currently using ~30kW (a quick headroom calculation follows)
Next step is to refurbish the power in the old machine room to reinstate the three-phase supply.
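
For illustration only, the headroom implied by those figures, with an assumed (not measured) draw of about 300W per dual-socket worker node:

```python
# Back-of-the-envelope cooling headroom; the 300 W per worker node figure is
# an assumption for illustration, not a measured value.
PLANT_CAPACITY_KW = 50.0   # total cooling capacity quoted by the engineer
CURRENT_LOAD_KW = 30.0     # approximate current heat load
WATTS_PER_NODE = 300.0     # assumed draw of one dual-socket worker node

headroom_kw = PLANT_CAPACITY_KW - CURRENT_LOAD_KW
print("Cooling headroom: %.0f kW" % headroom_kw)
print("Room for roughly %d more worker nodes" %
      int(headroom_kw * 1000 / WATTS_PER_NODE))
```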

Monitoring
Two different monitoring systems:
 – Ganglia: monitors per-host metrics and records histories to produce graphs; good for trending and for viewing current and historic status
 – Nagios: monitors "services" and issues alerts; good for raising alerts and for seeing "what's currently bad" (see other talk)
In view of the current lack of effort, there is a programme to get as much monitoring as possible into Nagios so that it can be automatically alerted on.
 – Recently added alerts for SAM tests and Yumit/Pakiti updates (a sketch of one such check follows)
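
A minimal sketch of the kind of Nagios check involved, assuming some other job caches the time of the last successful SAM test in a local status file; the file path, thresholds and service name are assumptions, not the checks actually deployed.

```python
#!/usr/bin/env python
"""Sketch of a Nagios-style check: warn or go critical if the last recorded
SAM pass is too old. The status file and thresholds are hypothetical."""
import os
import sys
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_FILE = "/var/cache/sam/last_ok"   # hypothetical: touched on each pass

def main():
    try:
        age_hours = (time.time() - os.path.getmtime(STATUS_FILE)) / 3600.0
    except OSError:
        print("SAM UNKNOWN - no status file at %s" % STATUS_FILE)
        return UNKNOWN
    if age_hours > 24:
        print("SAM CRITICAL - last pass %.1f hours ago" % age_hours)
        return CRITICAL
    if age_hours > 6:
        print("SAM WARNING - last pass %.1f hours ago" % age_hours)
        return WARNING
    print("SAM OK - last pass %.1f hours ago" % age_hours)
    return OK

if __name__ == "__main__":
    sys.exit(main())
```

Nagios only cares about the exit code (0 OK, 1 warning, 2 critical, 3 unknown) and the one-line status message, so the same pattern covers the Yumit/Pakiti alert as well.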

Plans 1: Tier 3
Physicists seem to want access to batch computing other than through the Grid, so we need to provide local access. Rather than run two batch systems, we want to give local users access to the Grid batch workers.
Need to:
 – Merge the Grid and user cluster account databases
   - Modify YAIM to use NIS pool accounts (a sketch of generating such accounts follows)
 – Change the Maui settings to fairshare: Grid/non-Grid before VO before users
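
A sketch of what feeding YAIM-style pool accounts into NIS might look like: generate the passwd entries once and push them into the NIS maps rather than letting YAIM create local accounts. The VO names, UID ranges, group and home-directory layout below are assumptions, not the site's actual scheme.

```python
#!/usr/bin/env python
"""Generate passwd-style entries for Grid pool accounts (atlas001..atlas050
and so on) to be merged into the NIS master's passwd map. All values here
are illustrative assumptions."""

VOS = {"atlas": 30000, "cms": 31000, "lhcb": 32000}   # assumed base UIDs
POOL_SIZE = 50
GID = 5000                                            # assumed pool group

def passwd_lines():
    for vo, base_uid in sorted(VOS.items()):
        for i in range(1, POOL_SIZE + 1):
            user = "%s%03d" % (vo, i)
            yield "%s:x:%d:%d::/home/%s:/bin/bash" % (
                user, base_uid + i, GID, user)

if __name__ == "__main__":
    for line in passwd_lines():
        print(line)
```

The output would be merged into the NIS master's passwd map, with YAIM's own local account-creation step skipped.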

Plans 2: cfengine
There are getting to be too many worker nodes to manage with the current ad hoc system, so we need to move towards a full configuration management system.
 – After asking around, we decided upon cfengine
 – Test deployment is promising
 – Working on re-implementing the worker node install in cfengine
 – Still need to find a good solution for secure key distribution to newly installed nodes (one possible approach is sketched below)
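
One possible shape for the key-distribution step, not something we have settled on: pre-seed trust from the policy server during the kickstart post-install phase instead of trusting whichever host answers first. The host name, address and use of scp below are placeholders; the paths follow cfengine 2's usual ppkeys layout.

```python
#!/usr/bin/env python
"""Sketch of pre-seeding cfengine 2 trust on a freshly installed node.
Host name, address and transport (scp) are placeholders."""
import socket
import subprocess

POLICY_SERVER = "cfengine.example.org"   # hypothetical policy host name
POLICY_SERVER_IP = "192.0.2.10"          # hypothetical policy host address
PPKEYS = "/var/cfengine/ppkeys"

def seed_trust():
    my_ip = socket.gethostbyname(socket.getfqdn())
    # Generate this node's key pair (cfkey ships with cfengine 2).
    subprocess.call(["cfkey"])
    # Install the policy server's public key locally so it is trusted
    # from the start, rather than on first connect.
    subprocess.call(["scp",
                     "%s:%s/localhost.pub" % (POLICY_SERVER, PPKEYS),
                     "%s/root-%s.pub" % (PPKEYS, POLICY_SERVER_IP)])
    # Push our public key back so the policy server trusts the new node.
    subprocess.call(["scp", "%s/localhost.pub" % PPKEYS,
                     "%s:%s/root-%s.pub" % (POLICY_SERVER, PPKEYS, my_ip)])

if __name__ == "__main__":
    seed_trust()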

Plans 3: Common Backup
Current backup of important files for Unix is to the Atlas Data Store (ADS).
 – Not sure how much longer the ADS is going to be around, so we need to look for another solution
Was intending to look at Amanda, but…
 – The department bought a new 30-slot tape robot for Windows backup
 – The Veritas backup software in use on Windows supports Linux clients
Just starting tests on a single node. Will keep you posted.

Plan 4: Reliable Hardware
Plan to purchase a new class of "more reliable" worker-node-type machines:
 – Dual system disks in hot-swap caddies
 – Possibly redundant hot-swap power supplies
Use this type of machine for running Grid services, local services (databases, web servers, etc.) and user interfaces.

Issues 1: Log Processing
Already running a central syslog server (soon to be expanded to two hosts for redundancy). As with our Tripwire, it is a fairly passive system:
 – We hope to get enough information off it to be useful after the event
Would like some system to monitor these logs and flag "interesting" events; would prefer little or no training required. (A minimal example of such a scanner follows.)
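
A minimal sketch of the sort of scanner we have in mind, run over the central syslog and reporting anything that matches a hand-maintained pattern list; the log path and patterns are illustrative assumptions.

```python
#!/usr/bin/env python
"""Scan a syslog file and print lines matching 'interesting' patterns.
The log path and the pattern list are assumptions, not a deployed config."""
import re
import sys

LOG_FILE = "/var/log/messages"   # hypothetical central syslog file
PATTERNS = [
    re.compile(r"Failed password for (?:invalid user )?\S+"),
    re.compile(r"authentication failure"),
    re.compile(r"POSSIBLE BREAK-IN ATTEMPT"),
    re.compile(r"segfault"),
]

def interesting(line):
    return any(p.search(line) for p in PATTERNS)

def main(path):
    hits = 0
    with open(path) as log:
        for line in log:
            if interesting(line):
                hits += 1
                sys.stdout.write(line)
    return 0 if hits == 0 else 1

if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else LOG_FILE))
```

Something like this could run from cron and feed its output into Nagios or email; existing tools such as logwatch or swatch cover much the same ground with less maintenance.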

Windows, etc.
Still using Windows XP, with Office 2003 and Hummingbird Exceed.
 – Looking at Vista and Office 2007, but not yet seriously, and with no plans for rollout yet
Windows is now managed at business-unit level rather than by the department.
Looking for synergies between Unix and Windows support:
 – Common file server hardware
 – Common backup solution
Recently equipped the PPD meeting room with a Polycom rollabout videoconferencing system.