Tier1 Status Report – Martin Bly, RAL, 27/28 April 2005

Topics
– Hardware
– Atlas DataStore
– Networking
– Batch services
– Storage
– Service Challenges
– Security

Hardware
Approximately 550 CPU nodes
– ~980 processors deployed in batch
– Remainder are service nodes, servers, etc.
220TB disk space: ~60 servers, ~120 arrays
Decommissioning
– Majority of the P3/600MHz systems decommissioned Jan 05
– P3/1GHz systems to be decommissioned in July/Aug 05, after commissioning of the Year 4 procurement
– Babar SUN systems decommissioned by end Feb 05
– CDF IBM systems decommissioned and sent to Oxford, Liverpool, Glasgow and London
Next procurement
– 64-bit AMD or Intel CPU nodes; power and cooling are constraints
– Dual cores possibly too new
– Infortrend arrays / SATA disks / SCSI connect
Future
– Evaluate new disk technologies, dual-core CPUs, etc.

Atlas DataStore
Evaluating new disk systems for staging cache
– FC-attached SATA arrays
– Additional 4TB/server, 16TB total
– Existing IBM/AIX servers
Tape drives
– Two additional 9940B drives, FC attached
– 1 for the ADS, 1 for a test CASTOR installation
Developments
– Evaluating a test CASTOR installation
– Stress testing ADS components to prepare for the Service Challenges
– Planning for a new robot
– Considering the next generation of tape drives
– SC4 (2006) requires a step in cache performance
– Ancillary network rationalised

Networking
Planned upgrades to the Tier1 production network
– Started November 04
– Based on Nortel T switch `stacks' for large groups of CPU and disk server nodes (up to 8 units/stack, 384 ports)
– High-speed backbone inter-unit interconnect (40Gb/s bi-directional) within stacks
– Multiple 1Gb/s uplinks aggregated to form the backbone: currently 2 x 1Gb/s, maximum 4 x 1Gb/s (see the worked example below)
– Upgrade to 10Gb/s uplinks and head node as costs fall
– Uplink configuration, with links to separate units within each stack and to the head switch, will provide resilience
– Ancillary links (APCs, disk arrays) on a separate network
Connected to UKLight for SC2 (see Service Challenges below)
– 2 x 1Gb/s links aggregated from the Tier1
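The uplink figures invite a quick back-of-the-envelope check. The sketch below is an editor's illustration, not part of the original slides: it computes the worst-case oversubscription of a fully populated stack, assuming every edge port could drive its full 1Gb/s simultaneously (an illustrative upper bound); the 384-port and uplink figures come from the slide.

```python
# Worked example: uplink capacity vs. worst-case demand for one switch stack.
# Slide figures: up to 384 x 1 Gb/s edge ports per stack; uplinks of 2 x 1 Gb/s
# now, 4 x 1 Gb/s maximum, with 10 Gb/s as the planned upgrade.
# "Every port at line rate" is an illustrative upper bound, not measured load.

EDGE_PORTS = 384          # ports in a fully populated 8-unit stack
PORT_RATE_GBPS = 1.0      # Gb/s per edge port

for label, uplink_gbps in [("current 2 x 1Gb/s", 2.0),
                           ("maximum 4 x 1Gb/s", 4.0),
                           ("planned 10Gb/s", 10.0)]:
    demand = EDGE_PORTS * PORT_RATE_GBPS       # worst-case offered load (Gb/s)
    oversubscription = demand / uplink_gbps    # ratio of demand to uplink
    print(f"{label:>18}: {uplink_gbps:4.0f} Gb/s uplink, "
          f"worst-case oversubscription {oversubscription:5.1f}:1")
```

Even at 4 x 1Gb/s the stacks remain heavily oversubscribed in the worst case, which is why the resilience and the 10Gb/s upgrade path matter.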

Batch Services
Worker node configuration based on traditional-style batch workers with the LCG configuration on top
– Running SL with LCG-2_4_0
– Provisioning by PXE/Kickstart
– YUM/Yumit, Yaim, Sure, Nagios, Ganglia…
All rack-mounted workers are dual purpose, accessed via a single batch system: a PBS server (Torque)
The scheduler (Maui) allocates resources for LCG, Babar and other experiments using Fair Share allocations from the User Board
Jobs are able to spill into allocations for other experiments, and from one `side' to the other, when spare capacity is available, to make best use of the capacity
Some issues with jobs that use excess memory (memory leaks) not being killed by Maui or Torque – under investigation (a possible watchdog approach is sketched below)
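On the memory-leak issue, one possible stop-gap, purely illustrative and not something the slides say the Tier1 ran, is a watchdog that parses `qstat -f` output and reports jobs whose resident memory exceeds a threshold, leaving any kill decision to an operator. The `resources_used.mem = ...kb` attribute shown matches common Torque releases, but exact field names can vary, so treat this strictly as a sketch.

```python
# Hedged sketch: flag Torque jobs whose reported memory use exceeds a threshold.
# Assumes `qstat -f` prints blocks starting "Job Id: <id>" containing a line
# like "resources_used.mem = 1234567kb"; attribute names may differ by release.
import re
import subprocess

MEM_LIMIT_KB = 2 * 1024 * 1024   # illustrative 2 GB threshold

def jobs_over_limit():
    out = subprocess.run(["qstat", "-f"], capture_output=True, text=True).stdout
    over = []
    job_id, mem_kb = None, 0
    for line in out.splitlines():
        if line.startswith("Job Id:"):
            if job_id is not None and mem_kb > MEM_LIMIT_KB:
                over.append((job_id, mem_kb))
            job_id, mem_kb = line.split(":", 1)[1].strip(), 0
        m = re.search(r"resources_used\.mem = (\d+)kb", line)
        if m:
            mem_kb = int(m.group(1))
    if job_id is not None and mem_kb > MEM_LIMIT_KB:
        over.append((job_id, mem_kb))
    return over

if __name__ == "__main__":
    for jid, kb in jobs_over_limit():
        print(f"{jid}: {kb / 1024 / 1024:.1f} GB used - candidate for operator action")
```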

Service Systems
Service systems migrated to SL 3
– Mail hub, NIS servers, UIs
– Babar UIs configured as a DNS triplet
NFS / data servers
– Customised RH7.n – driver issues
– NFS performance of SL 3 uninspiring compared with 7.n
– dCache systems at SL 3
LCG service nodes at SL 3, LCG-2_4_0
– Need to migrate to LCG-2_4_0 or lose work

Storage
Moving from NFS to SRMs for data access
– dCache successfully deployed in production: used by CMS, ATLAS… (see talk by Derek Ross)
– Xrootd deployed in production: used by Babar
Two `redirector' systems handle Xrootd requests (see the sketch below)
– Selected by DNS pair
– Hand off requests to the appropriate server
– Reduces NFS load on disk servers
Load issues with the Objectivity server
– Two additional servers being commissioned
Project to look at SL 4 for servers
– 2.6 kernel, journaling file systems: ext3, XFS
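To illustrate how selection "by DNS pair" looks from a client's point of view, here is a minimal lookup sketch; the alias name is invented purely for illustration and is not a real RAL hostname.

```python
# Minimal sketch: resolve a round-robin DNS alias that fronts the two
# redirector hosts.  The alias below is hypothetical; the real Tier1
# hostnames are not given in the slides.
import socket

ALIAS = "xrootd-redirector.example.ac.uk"   # hypothetical alias

try:
    name, aliases, addresses = socket.gethostbyname_ex(ALIAS)
    # With a DNS pair, `addresses` should hold two A records; the resolver
    # (or client library) picks one, spreading requests across the redirectors.
    print(f"{ALIAS} -> {addresses}")
except socket.gaierror as err:
    print(f"lookup failed: {err}")
```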

Service Challenges I
The Service Challenges are a programme of infrastructure trials designed to test the LCG fabric at increasing levels of stress/capacity in the run-up to LHC operation.
SC2 – March/April 05:
– Aim: T0 -> T1s aggregate of >500MB/s sustained for 2 weeks
– 2Gb/s link via UKLight to CERN
– RAL sustained 80MB/s for two weeks to a dedicated (non-production) dCache, 11/13 gridftp servers; limited by issues with the network
– Internal testing reached 3.5Gb/s (~400MB/s) aggregate disk-to-disk (see the rate arithmetic below)
– Aggregate to the 7 participating sites: ~650MB/s
SC3 – July 05 – Tier1 expects:
– CERN -> RAL at 150MB/s sustained for 1 month
– T2s -> RAL (and RAL -> T2s?) at a yet-to-be-defined rate: Lancaster, Imperial…; some on UKLight, some via SJ4
– Production phase Sept–Dec 05
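As a sanity check on the rates quoted above, the short calculation below (editor's arithmetic, not from the slides) converts the 3.5Gb/s internal test rate into MB/s and estimates the volume moved in the two-week 80MB/s run; it ignores protocol overheads.

```python
# Sanity-check arithmetic for the SC2 figures quoted above (editor's example,
# not part of the original slides).

def gbps_to_MBps(gbps):
    """Convert a link rate in Gb/s to an approximate payload rate in MB/s."""
    return gbps * 1000 / 8          # ignores protocol overhead

print(f"3.5 Gb/s ~= {gbps_to_MBps(3.5):.0f} MB/s")   # consistent with the ~400 MB/s quoted

# Data moved in the two-week sustained run at 80 MB/s:
seconds = 14 * 24 * 3600
volume_TB = 80 * seconds / 1e6      # MB -> TB (decimal)
print(f"80 MB/s for two weeks ~= {volume_TB:.0f} TB")
```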

Service Challenges II
SC4 – April 06
– CERN-RAL T0-T1 expects 220MB/s sustained for one month
– RAL expects T2-T1 traffic at N x 100MB/s simultaneously
June 06 – Sept 06: production phase
Longer term:
– There is some as-yet-undefined T1 -> T1 capacity needed; this could add 50 to 100MB/s
– CMS production will require 800MB/s combined and sustained from batch workers to the storage systems within the Tier1
– At some point there will be a sustained double-rate test: 440MB/s T0-T1 plus whatever is then needed for T2-T1
It is clear that the Tier1 will be able to keep a significant part of a 10Gb/s link busy continuously, probably from late 2006.

Security
The Badguys™ are out there
– Users are vulnerable to losing authentication data anywhere
– Still some less-than-ideal practices
– All local privilege escalation exploits must be treated as a high-priority must-fix
– Continuing programme of locking down and hardening exposed services and systems
– You can only be more secure
See talk by Roman Wartel