Tier1A Status Martin Bly 28 April 2003

CPU Farm
Older hardware:
– 108 dual-processor machines (450, 600 and 1GHz)
– 156 dual-processor 1400MHz PIII machines
Recent delivery:
– 80 dual 2.66GHz P4 Xeon machines
– 533MHz FSB, 2GB memory
Next delivery expected in the summer.

Operating Systems
– Red Hat 6.2 service will close in May.
– Red Hat 7.2 service has been in production for BaBar for 6 months.
– New Red Hat 7.3 service now available for LHC and other experiments.
Increasing demand for security updates is becoming problematic.

Disk Farm (last year)
Last year: 26 servers, each with 2 external RAID arrays, 1.7TB disk per server.
– Excellent performance, well balanced system.
– Problems with a bad batch of Maxtor drives: many failures and a high error rate. All 620 drives now replaced by Maxtor.
– Still outstanding problems with the Accusys controller failing to eject bad drives from the RAID set.

Disk Farm (this year)
Recent upgrade to the disk farm:
– 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays.
– 12 Maxtor 200GB DiamondMax Plus 9 drives per array.
Not yet in production: a few snags.
– The originally tendered Maxtor MaXLine Plus II drive was found not to exist.
– The Infortrend array has a 2TB limit per RAID set, so some (~10%) space is wasted.
Contact Nick White for more details.
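As a rough back-of-envelope check on the wasted-space figure (a sketch only: the RAID level and the use of vendor decimal gigabytes are assumptions, not stated on the slide):

# Rough estimate of space stranded by the 2TB-per-RAID-set limit.
# Assumes one RAID 5 set per 12-drive array and decimal (vendor) gigabytes.
drives_per_array = 12
drive_gb = 200                                  # Maxtor 200GB drives
usable_gb = (drives_per_array - 1) * drive_gb   # RAID 5 loses one drive to parity
raid_set_limit_gb = 2000                        # 2TB limit per RAID set on the array
wasted_gb = max(0, usable_gb - raid_set_limit_gb)
print(f"usable {usable_gb}GB, capped at {raid_set_limit_gb}GB, "
      f"wasted {wasted_gb}GB ({wasted_gb / usable_gb:.0%})")
# -> roughly 200GB of 2200GB, about 9%, in line with the ~10% quoted above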

New Projects
– Basic fabric performance monitoring (Ganglia)
– Resource CPU accounting (based on PBS accounting/MySQL)
– New CA in production
– New batch scheduler (MAUI)
– Deploy new helpdesk (May)

Ganglia Monitoring
Urgently needed live performance and utilisation monitoring:
– RAL Ganglia Monitoring (live)
– RAL Ganglia Monitoring (static)
Scalable solution based on multicast.
Very rapidly deployable, with reasonable support on all Tier1A hardware.
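To illustrate why the multicast design scales: every node periodically announces its own metrics to a multicast group and every listener on the segment hears them, so no central poller is needed. The sketch below is not Ganglia's real wire format (which is XDR over UDP); the group and port are the customary Ganglia defaults but are assumptions here, and the payload is deliberately simplistic.

# Minimal sketch of the multicast announce pattern Ganglia relies on.
# Illustrative only: not Ganglia's actual protocol or payload format.
import os
import socket
import struct
import time

GROUP, PORT = "239.2.11.71", 8649   # assumed Ganglia-style defaults

def announce(metric, value):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # keep the announcement on the local segment
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", 1))
    payload = f"{os.uname().nodename} {metric} {value}".encode()
    sock.sendto(payload, (GROUP, PORT))
    sock.close()

while True:
    announce("load_one", os.getloadavg()[0])   # 1-minute load average
    time.sleep(10)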

PBS Accounting Software
Need to keep track of system CPU and disk usage.
Home-grown PBS accounting package (Derek Ross):
– Upload PBS and disk stats into MySQL.
– Process with a Perl DBI script.
– Serve via Apache.
Contact Derek for more details.
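The production package described above is Perl/DBI feeding MySQL; the following is only a hypothetical Python sketch of the first stage, showing how per-user CPU time can be totalled from the standard PBS accounting log format (semicolon-separated records, with job-end 'E' records carrying key=value fields). The log filename is illustrative.

# Hypothetical sketch: parse a PBS accounting log and total CPU time per user.
from collections import defaultdict

def hms_to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

def cpu_seconds_by_user(logfile):
    totals = defaultdict(int)
    with open(logfile) as f:
        for line in f:
            # PBS accounting record: date;type;jobid;key=value key=value ...
            parts = line.rstrip("\n").split(";", 3)
            if len(parts) < 4 or parts[1] != "E":   # only job-end records
                continue
            fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
            user = fields.get("user")
            cput = fields.get("resources_used.cput")
            if user and cput:
                totals[user] += hms_to_seconds(cput)
    return totals

if __name__ == "__main__":
    # illustrative path: PBS names accounting logs by date
    for user, secs in sorted(cpu_seconds_by_user("accounting/20030428").items()):
        print(f"{user}: {secs / 3600:.1f} CPU hours")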

MAUI/PBS
The Maui scheduler has been in production for the last 3 months.
Allows extremely flexible scheduling with many features. But...
– Not all of it works: we have done much work with the developers on fixes.
– Major problem: Maui schedules on wall-clock time, not CPU time. We had to bodge it!
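The slide does not say what the workaround was; purely to illustrate the mismatch, one crude translation is to inflate a CPU-time request into the wall-clock limit that Maui actually enforces, using an assumed CPU efficiency. The factors below are hypothetical, not RAL's actual settings.

# Hypothetical illustration of the wall-clock vs CPU-time mismatch only.
def wallclock_limit(cpu_seconds, cpu_efficiency=0.85, safety=1.1):
    """Wall-clock seconds to request for a job needing cpu_seconds of CPU time."""
    return int(cpu_seconds / cpu_efficiency * safety)

# e.g. a job needing 10 CPU-hours would request ~12.9 wall-clock hours
print(wallclock_limit(10 * 3600) / 3600)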

New Helpdesk Software
The old helpdesk is mail-based and unfriendly. With additional staff, we urgently need to deploy a new solution.
Expect the new system to be based on free software, probably Request Tracker.
Hope that the deployed system will also meet the needs of the Testbed and may also satisfy Tier 2 sites.
Expect deployment by the end of May.

Outstanding Issues/Worries
We have to run many distinct services, for example Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG...
Farm management is getting very complex. We need better tools and automation.
Security is becoming a big concern again.