Presentation transcript:

BaBar Cluster
- Had been unstable, mainly because of failing disks
- Very few (<20 out of 120) healthy worker nodes left
- Many workers died during two shutdowns (no power to the motherboards?)
- Very time consuming to maintain
- Recently purchased 4 twin Viglen quad-core workers; two will go to the grid (2 twin quad-core nodes = 3 racks with 120 nodes!)
- BaBar cluster withdrawn from the Grid, as the effort is better spent getting the new resources online

Atlas Farm
- Added 12 local workers to the grid: 21 workers in total -> 42 job slots
- Will provide 60 job slots after the local twin boxes are installed
- Upgraded to SL4; installed with kickstart / Cfengine and maintained with Cfengine
- Supported VOs: alice atlas babar biomed calice camont cms dteam fusion gridpp hone ilc lhcb ngs.ac.uk ops vo.southgrid.ac.uk zeus
- Several broken CPU fans are being replaced
- Monitoring (Pakiti, Nagios) is being moved from a BaBar farm node to the Mon Box (a simple worker-node check is sketched after this list)
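
As an illustration of the kind of worker-node check the Mon Box could run, here is a minimal Nagios-style plugin sketch. It assumes Torque's pbsnodes command is available on the host; the warning/critical thresholds are arbitrary illustrative values, not numbers from the slides.

    #!/usr/bin/env python
    # Minimal Nagios-style check: count Torque worker nodes that are down/offline.
    # Assumes `pbsnodes` is on the PATH; the thresholds below are illustrative only.
    import subprocess
    import sys

    WARN, CRIT = 2, 5  # hypothetical thresholds, not taken from the site report

    def dead_nodes():
        # `pbsnodes -l` lists nodes that are down, offline or unknown,
        # one "nodename  state" pair per line.
        out = subprocess.check_output(["pbsnodes", "-l"]).decode()
        return [line.split()[0] for line in out.splitlines() if line.strip()]

    def main():
        try:
            dead = dead_nodes()
        except (OSError, subprocess.CalledProcessError) as err:
            print("UNKNOWN: could not run pbsnodes (%s)" % err)
            sys.exit(3)
        msg = "%d worker node(s) down/offline: %s" % (len(dead), " ".join(dead) or "none")
        if len(dead) >= CRIT:
            print("CRITICAL: " + msg)
            sys.exit(2)
        if len(dead) >= WARN:
            print("WARNING: " + msg)
            sys.exit(1)
        print("OK: " + msg)
        sys.exit(0)

    if __name__ == "__main__":
        main()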

Grid Storage
- 1 DPM SL3 head node with 10 TB attached to it
- Mainly dedicated to Atlas; not used by Alice yet, but... the latest SL4 DPM provides the xrootd access needed by Alice
- Have just bought an extra 40 TB
- Upgrade strategy: the current DPM head node will be migrated to a new SL4 server, then a DPM pool node will be deployed on the new DPM head node
- Performance issues were observed when deleting files on ext3 filesystems -> should we move to XFS? (a benchmark sketch follows this list)
- SRMv2.2 published, with a 3 TB space token reservation for Atlas
- Latest SRMv2.2 clients (not in gLite yet) installed on the BlueBear UI but not on the PP desktops
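
To put numbers behind the ext3 vs. XFS question, a rough benchmark along the following lines could be run on test directories sitting on the two filesystems. This is only a sketch: the file count and file size are placeholders, and the paths passed on the command line are whatever scratch directories are free on the pools.

    #!/usr/bin/env python
    # Rough benchmark: time creation and deletion of many files in a directory,
    # to compare e.g. an ext3 pool partition with an XFS one.
    # File count and file size are placeholder values.
    import os
    import sys
    import time

    def bench(target_dir, nfiles=5000, size=1024 * 1024):
        os.makedirs(target_dir)
        chunk = "x" * size
        t0 = time.time()
        for i in range(nfiles):
            f = open(os.path.join(target_dir, "f%06d" % i), "w")
            f.write(chunk)
            f.close()
        t1 = time.time()
        for i in range(nfiles):
            os.unlink(os.path.join(target_dir, "f%06d" % i))
        t2 = time.time()
        os.rmdir(target_dir)
        return t1 - t0, t2 - t1

    if __name__ == "__main__":
        # e.g. ./fsbench.py /ext3_pool/benchtmp /xfs_pool/benchtmp
        for path in sys.argv[1:]:
            create_s, delete_s = bench(path)
            print("%s: create %.1f s, delete %.1f s" % (path, create_s, delete_s))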

eScience Cluster
- 31 nodes (servers included), each with 2 Xeon 3.06 GHz CPUs and 2 GB of RAM, hosted by IS
- All on a private network apart from one NAT node; the Torque server is on the private network
- Connected to the grid via an SL4 CE in Physics; more testing needed
- Serves as a model for the gLite deployment on the BlueBear cluster -> the installation assumes no root access to the workers
- Setup uses the gLite relocatable distribution and is similar to the installation we performed on BlueCrystal in Bristol
- Aimed to have it passing SAM tests by GridPP20, but may not meet that target as it was delayed by the security challenge and by helping to set up Atlas on BlueBear
- Software area is not large enough to meet the Atlas 100 GB requirement :( (a quick space check is sketched after this list)
- ~150 cores will be allocated to the Grid on BlueBear
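
One quick way to see how far the software area falls short of the 100 GB Atlas requirement is a statvfs check like the sketch below. The default path is only a guess at a typical VO software directory; the real location of the experiment software area on the cluster would be substituted.

    #!/usr/bin/env python
    # Report free space in a VO software area against a required minimum.
    # The default path is a hypothetical example; pass the real software
    # area (e.g. the value of VO_ATLAS_SW_DIR) on the command line.
    import os
    import sys

    REQUIRED_GB = 100  # Atlas software-area requirement quoted in the slides

    def free_gb(path):
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize / float(1024 ** 3)

    if __name__ == "__main__":
        sw_dir = sys.argv[1] if len(sys.argv) > 1 else "/opt/exp_soft/atlas"
        avail = free_gb(sw_dir)
        print("%s: %.1f GB free (need %d GB)" % (sw_dir, avail, REQUIRED_GB))
        sys.exit(0 if avail >= REQUIRED_GB else 1)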

Issues?
- How can SouthGrid become more pro-active with VOs (Atlas)? Alice is very specific about its VOBOX.
- Will need to publish the architecture type soon: the eScience nodes are 32-bit while BlueBear runs x86_64 SL4 -> can we publish the information with one CE?
- Considering the importance of accounting, do we need independent cross-checks (one possibility is sketched after this list)? Are there manpower issues supporting APEL?
- Bham PPS nodes are broken -> PPS service suspended :( What strategy should SouthGrid adopt (the PPS needs to do 64-bit testing)?
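
On the accounting cross-check point, one lightweight option would be to count completed jobs directly from the Torque accounting logs and compare the totals with what APEL publishes. The sketch below rests on assumptions: the standard Torque accounting-log location, the semicolon-separated record format with 'E' records marking job completions, and a naive use of the Unix group field as a stand-in for the VO.

    #!/usr/bin/env python
    # Count completed jobs per Unix group from Torque accounting logs, as an
    # independent cross-check against APEL-published job numbers.
    # Assumes the usual log location and the "timestamp;type;jobid;key=value ..."
    # record format, with 'E' records marking job completions.
    import glob
    import sys
    from collections import defaultdict

    LOG_GLOB = "/var/spool/pbs/server_priv/accounting/*"  # assumed default location

    def jobs_per_group(paths):
        counts = defaultdict(int)
        for path in paths:
            for line in open(path):
                fields = line.rstrip("\n").split(";", 3)
                if len(fields) < 4 or fields[1] != "E":
                    continue  # only count completed-job records
                attrs = dict(kv.split("=", 1) for kv in fields[3].split() if "=" in kv)
                counts[attrs.get("group", "unknown")] += 1
        return counts

    if __name__ == "__main__":
        paths = sys.argv[1:] or glob.glob(LOG_GLOB)
        for group, njobs in sorted(jobs_per_group(paths).items()):
            print("%-16s %6d jobs" % (group, njobs))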