Tier1A Status – Andrew Sansum, GRIDPP 8, 23 September 2003

Contents
– GRID stuff: clusters and interfaces
– Hardware and utilisation
– Software and utilities

Layout

EDG Status (1) (Steve Traylen)
– EDG 2.0.x has been deployed on the production testbed since early September. Provides:
  – EDG R-GMA information catalogue
  – RLS for lhcb, biom, eo, wpsix, tutor and babar
– EDG 2.1 deployed on the development testbed; VOMS integration work is underway.
– May prove useful for small GridPP experiments (e.g. NA48, MICE and MINOS).

EDG Status (2)
– The EDG 1.4 gatekeeper continues to provide a gateway into the main CSF production farm. It gives access for a small amount of BaBar and ATLAS work and is being prepared for forthcoming D0 production via SAMGrid.
– Along with IN2P3, CSFUI provides the main UI for EDG.
– Many WP3 and WP5 mini testbeds.
– Further Grid integration into the production farm will be via LCG, not EDG.

LCG Integration (M. Bly)
– LCG-0 mini testbed deployed in July; upgraded to LCG-1 in September. Consists of:
  – Lcgwst regional GIIS
  – RB
  – CE, SE, UI, BDII, PROXY
  – Five worker nodes
– Soon need to make important decisions about how much hardware to deploy into LCG: whatever the experiments/EB want.

LCG Experience
– Mainly known issues:
  – Installation and configuration are still difficult for non-experts.
  – Documentation is still thin in many places.
  – Support is often very helpful, but answers are not always forthcoming for some problems.
  – Not everything works all of the time.
– Beginning to discuss internally how to interoperate with the production farm.

SRB Service for CMS
– A considerable learning experience for the Datastore team (and CMS)!
– SRB MCAT for the whole of CMS production. Consists of enterprise-class Oracle servers and a "thin" MCAT Oracle client.
– SRB interface into the Datastore.
– SRB-enabled disk server to handle data imports.
– SRB clients on disk servers for data moving.

New Hardware (March)
– 80 dual-processor 2.66GHz P4 Xeon nodes.
– 11 disk servers, 40TB of IDE disk:
  – 11 dual-P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
  – 12 Maxtor 200GB DiamondMax Plus 9 drives per array.
– Major Datastore upgrade over the summer.

P4 Operation Problematic
– Disappointing performance with gcc: hoped for a 2.66GHz P4 / 1.4GHz P3 ratio of about 1.5.
– More can be obtained by exploiting hyper-threading, but Linux CPU scheduling causes difficulties (ping-pong effects), and CPU accounting now depends on the number of jobs running (see the sketch below).
– Beginning to look closely at Opteron solutions.
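To make the accounting point concrete, here is a toy calculation in Python. It is only a sketch: the assumed 1.25x combined throughput gain from running two hyper-threaded jobs on one physical CPU is an illustrative figure, not a RAL measurement.

```python
# Toy illustration: the 1.25x hyper-threading throughput gain is an assumed
# figure, not a measured RAL number.
single_job_speed = 1.0          # relative speed of one job alone on a physical CPU
ht_combined_throughput = 1.25   # assumed total throughput with two HT jobs

for jobs in (1, 2):
    total = single_job_speed if jobs == 1 else ht_combined_throughput
    per_job = total / jobs
    print(f"{jobs} job(s) on one CPU: effective per-job speed = {per_job:.3f}")

# One job runs at 1.000, two jobs at 0.625 each, yet each process still
# accumulates a full second of CPU time per second of wall clock: the
# accounting therefore has to be scaled by the number of jobs running.
```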

Datastore Upgrade
– STK 9310 robot, 6000 slots:
  – IBM 3590 drives being phased out (10GB, 10MB/s)
  – STK 9940B drives in production (200GB, 30MB/s)
– 4 IBM 610+ servers with two FC connections and Gbit networking on PCI-X:
  – 9940 drives FC-connected via 2 switches for redundancy
  – SCSI RAID 5 disk with hot spare, giving 1.2TB of cache space

[Diagram: STK 9310 "Powder Horn" robot with 9940B drives (rmt1-rmt8), connected via two FC switches (fsc0/fsc1) to RS6000 servers, with Gbit networking and 1.2TB of disk cache]

Operating Systems
– Red Hat 6.2 finally closed in August.
– Red Hat 7.2 remains in production for BaBar. All batch workers will migrate to Red Hat 7.3 shortly.
– The Red Hat 7.3 service is now the main workhorse for the LHC experiments.
– Need to start looking at Red Hat 9/10.
– Need to deploy Red Hat Advanced Server.

Next Procurement
– Based on the experiments' expected demand profile (as best they can estimate it).
– Exact numbers are still being finalised, but about:
  – 250 dual-processor CPU nodes
  – 70TB of available disk
  – 100TB of tape

CPU Requirements (KSI2K)

New Helpdesk
– Need to deploy a new helpdesk (previously used Remedy). Wanted:
  – Web based.
  – Free, open source.
  – Multiple queues and personalities.
– Looked at Bugzilla, OTRS and Request Tracker; finally selected Request Tracker.
– Available for other Tier 2 sites and other GridPP projects if needed.

YUMIT: RPM Monitoring
– Many nodes on the farm; need to make sure RPMs are up to date.
– Wanted a lightweight solution until full fabric management tools are deployed.
– Package written by Steve Traylen (an illustrative sketch of the idea follows this slide):
  – Yum installed on hosts.
  – Nightly comparison with the yum database, uploaded to a MySQL server.
  – Simple web-based display utility in Perl.
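The real collector is Steve Traylen's Perl/yum package; the snippet below is only an illustrative Python sketch of the same idea, and the `packages` table and its columns are assumptions rather than the real YUMIT schema. Each host would run something like this nightly (e.g. from cron), and the central web display would compare the uploaded rows against the current repository contents.

```python
#!/usr/bin/env python3
"""Illustrative sketch only (the real YUMIT collector is written in Perl):
dump the locally installed RPM list and emit rows for upload to a central
MySQL table.  The 'packages' table and its columns are assumptions."""

import socket
import subprocess
from datetime import datetime, timezone

def installed_rpms():
    """Return (name, version-release) pairs from the local RPM database."""
    out = subprocess.run(
        ["rpm", "-qa", "--qf", "%{NAME}\t%{VERSION}-%{RELEASE}\n"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split("\t", 1)) for line in out.splitlines() if line]

def emit_sql(rows, host, stamp):
    # A real collector would push these through a MySQL client library;
    # printing the statements keeps the sketch dependency-free.
    for name, version in rows:
        print(
            "INSERT INTO packages (host, package, version, checked_at) "
            f"VALUES ('{host}', '{name}', '{version}', '{stamp}');"
        )

if __name__ == "__main__":
    emit_sql(installed_rpms(), socket.gethostname(),
             datetime.now(timezone.utc).isoformat())
```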

Exception Monitoring: Nagios
– Already have an exception-handling system (CERN's SURE coupled with the commercial Automate).
– Looking at alternatives: no firm plans yet, but currently looking at Nagios (a generic example check is sketched below).
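For context, a Nagios check is just a small program that prints one status line and exits with a standard code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). The sketch below is a generic disk-usage check with arbitrary thresholds, shown purely to illustrate the plugin convention; it is not something deployed on the Tier1A.

```python
#!/usr/bin/env python3
"""Generic Nagios-style check (illustration only): warn on filesystem usage.
Exit codes follow the plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN."""

import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        print(f"DISK UNKNOWN - cannot stat {path}: {err}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    status, label = OK, "OK"
    if used_pct >= crit_pct:
        status, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        status, label = WARNING, "WARNING"
    # Nagios records the first line of output and acts on the exit code.
    print(f"DISK {label} - {path} is {used_pct:.1f}% used")
    return status

if __name__ == "__main__":
    sys.exit(check_disk(sys.argv[1]) if len(sys.argv) > 1 else check_disk())
```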

Summary: Outstanding Issues
– Many new developments and new services deployed this year.
– We have to run many distinct services, for example Fermi Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG, CMS DC03, SRB, etc.
– Waiting to hear when the experiments want LCG in volume.
– The Pentium 4 processor is performing poorly.
– Red Hat's changing policy is a major concern.