UK Status and Plans Catalin Condurache – STFC RAL ALICE Tier-1/Tier-2 Workshop University of Torino, 23-25 February 2015.


UK Status and Plans Catalin Condurache – STFC RAL ALICE Tier-1/Tier-2 Workshop University of Torino, 23-25 February 2015

Content
UK GridPP Collaboration
Tier2s Status and Plans
  – Birmingham
  – Oxford
RAL Tier-1 Centre
  – Last 12 months
  – Status
  – Plans and future
Updates on IPv6
ALICE Tier-1/2 Workshop, Torino, February 2015

UK Grid Collaboration GridPP is a collaboration of 19 UK universities + RAL + CERN with the primary goal of providing computing resources to LHC particle physics experiments ALICE Tier-1/2 Workshop, Torino, February 2015

GridPP resources

Site                       | CPU kSI2k (vs 03/2014) | Storage TB (vs 03/2014)
RAL                        | (24,107)               | 10, ,612 (10, ,390)
Edinburgh                  | 21,624 (9,015)         | 1,186 (355)
Glasgow                    | 10,992 (8,472)         | 1,402
Queen Mary, London         | 7,946 (7,946)          | 1,675
Imperial College, London   | 26,872 (6,833)         | 2,872
Lancaster                  | 7,820 (5,210)          | 1,729 (1,135)
Manchester                 | 8,920 (5,798)          | 1,430 (951)
Sheffield                  | 3,264 (2,540)          | 521 (363)
RALPP                      | 8,940 (6,200)          | 2,686 (1,607)
Royal Holloway, London     | 6,946 (3,704)          | 1,768 (728)
Brunel, London             | 5,412 (4,027)          | 634
Oxford                     | 4,197 (3,107)          | 1,219 (709)
Liverpool                  | 4,337 (2,840)          | 784 (544)
Birmingham                 | 2,130 (2,130)          | 600 (389)
Cambridge                  | 932 (707)              | 350 (278)
Sussex                     | 502 (502)              | 82 (77)
Bristol                    | 2,610 (1,467)          | 209 (120)
University College, London | 502 (502)              | 215
EFDA Jet                   | 332 (332)              | 2
Durham                     | 2,491 (2,419)          | 53
TOTAL                      | 162,365 (91,566)       | 29,733+13,612 (23, ,390)

from Gstat – 1st Feb 2015
ALICE Tier-1/2 Workshop, Torino, February 2015

Tier2s Status and Plans
Birmingham
– in the last 12 months: from 816 cores to 1216 cores (8121 HS06 to … HS06), from 389 TB to 600 TB
– ~69% of UK T2 ALICE CPU allocation
– currently CREAM, ready for ARC if ALICE is happy
– storage for ALICE from 120 TB to 280 TB, native XRootD (see the sketch below)
ALICE Tier-1/2 Workshop, Torino, February 2015
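Not part of the slides: a minimal sketch of how such a native XRootD endpoint might be probed from a client, using the pyxrootd Python bindings. The host name and namespace path are hypothetical, and ALICE's token-based authorization is ignored.

```python
# Minimal sketch: ping a native XRootD endpoint and stat an exported path.
# Host and path are hypothetical placeholders; ALICE token authorization
# is not handled here.
from XRootD import client

fs = client.FileSystem("root://alice-se.example.ac.uk:1094")

status, _ = fs.ping()
print("ping:", "OK" if status.ok else status.message)

status, info = fs.stat("/alice/data")
if status.ok:
    print("size:", info.size, "flags:", info.flags)
else:
    print("stat failed:", status.message)
```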

Tier2s Status and Plans
Birmingham
– 58% overall CPU fairshare for ALICE; ATLAS 28%, LHCb 9%, others 5%
ALICE Tier-1/2 Workshop, Torino, February 2015

Tier2s Status and Plans Birmingham –running and queued jobs since October 2014 ALICE Tier-1/2 Workshop, Torino, February 2015

Tier2s Status and Plans
Oxford
– need to supplement the support given by Birmingham
– ~31% of UK T2 ALICE CPU allocation
– no storage provided

Oxford
– soft limit – 150 jobs, no hard upper limit (if the cluster is free); very efficient at filling empty job slots (see the sketch below)
– gradual move from CREAM+Torque to ARC+HTCondor: two thirds of the Grid cluster now running HTCondor behind an ARC CE, the remaining third running legacy Torque/Maui driven by a CREAM CE
– will continue to provide only CPU
ALICE Tier-1/2 Workshop, Torino, February 2015
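Not part of the slides: one way such a soft limit can be watched on the HTCondor side is to count the ALICE jobs known to the schedd. A minimal sketch using the htcondor Python bindings follows; the pool account name, the 150-job target and running this directly on the schedd host are all assumptions.

```python
# Minimal sketch: compare the ALICE job load on an HTCondor schedd against
# a 150-job soft target. The owner name 'alicesgm' and the target value are
# assumptions, not taken from the slides.
import htcondor

SOFT_TARGET = 150
ads = htcondor.Schedd().query('Owner == "alicesgm"', ["JobStatus"])

running = sum(1 for ad in ads if ad.get("JobStatus") == 2)  # 2 = Running
idle = sum(1 for ad in ads if ad.get("JobStatus") == 1)     # 1 = Idle

print(f"ALICE jobs: {running} running, {idle} idle "
      f"(soft target {SOFT_TARGET}, no hard cap while the cluster is free)")
```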

RAL Tier-1 Centre
…in the last 12 months – Hardware
– CPU: ~127k HS06 (~13k cores)
– storage: ~13 PB disk
– FY14/15 procurement: 6 PB and 42k HS06 expected; storage capable of both Castor and CEPH
– extra SSDs for CEPH journals, 10GbE for WNs
– tape: 10k-slot SL8500, migration to T10KD tapes (finished – Jan 2014)
– RAL Tier-1 network upgrade – new pair of routers
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre
…also in the last 12 months – Middleware
– migration to EMI3 – early 2014
– CREAM-CE still kept for ALICE and small VOs; only ALICE since January 2015
– FTS2 stopped – September 2014
– various developments on HTCondor, ARC
– ALICE jobs increased – August – since October 2014
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre
…more in the last 12 months – Storage
– Castor upgrade to new major version (2.1.14) – Jun 2014: various improvements (disk rebalancing, xroot internal protocol); upgrade complete
– Castor logging system with ElasticSearch
– draining disk servers still slow – major production problem
– CEPH: evaluations continue on small test cluster; SSDs for journals installed on cluster nodes – mixed performance results, needs more study
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– CPU fairshare – 1.9% (2200 HS06)
– spare cycles up to 3500 jobs
– CPU efficiencies – 90–91% since August – 2nd among LHC VOs
– 200 TB disk
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– good use of a significant amount of opportunistic CPU
– in the graph below, the expectation is 100 for everyone
– average for ALICE – at least 10 times the fairshare!!
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– room to increase the share up to 4K+ jobs, but…
– high numbers of ALICE zombie processes: the risk of triggering alarms for too many 'zombies' per WN made RAL Tier-1 stay conservative
– some feedback from ALICE, but no final resolution
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– vulnerable Apache and OpenSSL releases within the ALICE software
– every now and then, scans query specific ports on RAL Tier-1 hosts; reports are issued for each host found vulnerable (i.e. the ALICE VOBOX nodes) and actions are expected to be taken (see the sketch below)
– some feedback from ALICE…
ALICE Tier-1/2 Workshop, Torino, February 2015
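Not part of the slides: a minimal sketch of the kind of check such scans perform, grabbing the HTTP Server banner from a host and port and flagging Apache/OpenSSL versions that look out of date. The host, port and minimum versions are illustrative assumptions only.

```python
# Minimal banner-grab sketch: fetch the HTTP Server header from a host:port
# and flag Apache/OpenSSL versions older than an assumed minimum. Host, port
# and the minimum versions below are hypothetical.
import re
import ssl
import http.client

HOST, PORT = "alice-vobox.example.ac.uk", 8443
MIN_VERSIONS = {"Apache": (2, 4, 10), "OpenSSL": (1, 0, 1)}

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE          # banner check only, no identity check

conn = http.client.HTTPSConnection(HOST, PORT, context=ctx, timeout=10)
conn.request("HEAD", "/")
banner = conn.getresponse().getheader("Server", "")
conn.close()
print("Server banner:", banner or "<none>")

# e.g. "Apache/2.2.15 (CentOS) OpenSSL/1.0.1e"
for product, version in re.findall(r"(Apache|OpenSSL)/([\d.]+)", banner):
    found = tuple(int(x) for x in version.split(".") if x.isdigit())
    if found < MIN_VERSIONS[product]:
        print(f"WARNING: {product} {version} is older than the required minimum")
```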

RAL Tier-1 Centre – More on ALICE
– after weeks and months of testing… first production jobs on ARC – Wed 8 October 2014
– smooth migration of production jobs from CREAMs to ARCs… but…
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– …ALICE SAM tests still rely on CREAM (and WMS) at RAL
– not really worthwhile to keep 2 CREAMs and one CLUSTER node (+ another one in standby) in production (and part of the call-out system) for ~30 jobs
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More on ALICE
– deadline to get rid of CREAM not yet met (mid January, February, March 201…)
– the best outcome of this workshop for RAL Tier-1:
  – decision to terminate use of CREAM at RAL with immediate effect (or by 1st March)
  – any chance?
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – immediate future
Storage for ALICE
– 356 TB (from 200 TB) – by April 2015
– the extra capacity will be in Castor
CPU for ALICE
– 10% more CPU
– expect the amount of work to drop though… as less spare CPU capacity is available and other LHC experiments will increase their usage
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – More Plans
Deployment of a new storage endpoint
– CEPH as backend
– additional storage capacity could be provided, but not all the traditional Grid functionality
– only GridFTP and xrootd (ALICE-specific xrootd security model?)
– also S3 and Swift API access (see the sketch below)
ALICE Tier-1/2 Workshop, Torino, February 2015
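Not part of the slides: a minimal sketch of what S3 API access to such a CEPH-backed endpoint could look like from a client, assuming a RADOS Gateway speaking plain S3. The endpoint URL, credentials and bucket name are placeholders.

```python
# Minimal sketch of S3-style access to a CEPH (RADOS Gateway) endpoint.
# Endpoint URL, credentials, bucket and object names are hypothetical.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://ceph-gw.example.ac.uk",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload a local file, then list the bucket contents
s3.upload_file("results.root", "alice-test", "run2015/results.root")
for obj in s3.list_objects_v2(Bucket="alice-test").get("Contents", []):
    print(obj["Key"], obj["Size"])
```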

RAL Tier-1 Centre – Future
GridPP project
– currently GridPP4+ (1-year extension of GridPP4) – until March 2016
– GridPP5 proposal underway – April 2016 to March 2020
ALICE Tier-1/2 Workshop, Torino, February 2015

RAL Tier-1 Centre – Future (UK & RAL Tier-1)
– by the time GridPP5 starts, ALICE will have been supported for 3 years without any explicit funding to do so
– however, funding is in place until March 2016
– HW and other resources available during Run2 in 2016
– the experiment features in the GridPP5 proposal: ~2% of the Tier-1 and Tier-2 global requirements
– do not yet know the outcome
ALICE Tier-1/2 Workshop, Torino, February 2015

IPv6 Updates
RAL
– new IPv6 testbed, separate from the main production network; plan to install a dual-stacked UI box and a perfSONAR box
– the requirement on all Tier-1s to provide IPv6 connectivity to the perfSONAR machines by April 2015 will not be met at RAL: first need to move the OPN links from the UK Light router to the Tier-1 routers (see the connectivity-check sketch below)
ALICE Tier-1/2 Workshop, Torino, February 2015
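Not part of the slides: a minimal sketch of the basic IPv6 reachability check behind the perfSONAR requirement, resolving a host's AAAA records and attempting a TCP connection. The host name and port are placeholders.

```python
# Minimal IPv6 reachability sketch: resolve AAAA records for a host and try
# a TCP connection to the first one. Host name and port are placeholders.
import socket

HOST, PORT = "perfsonar.example.ac.uk", 443

try:
    addrs = socket.getaddrinfo(HOST, PORT, socket.AF_INET6, socket.SOCK_STREAM)
except socket.gaierror:
    addrs = []

if not addrs:
    print(f"{HOST}: no IPv6 (AAAA) address found")
else:
    family, socktype, proto, _, sockaddr = addrs[0]
    with socket.socket(family, socktype, proto) as s:
        s.settimeout(5)
        try:
            s.connect(sockaddr)
            print(f"{HOST} reachable over IPv6 at {sockaddr[0]} port {sockaddr[1]}")
        except OSError as exc:
            print(f"{HOST} has an IPv6 address but the connection failed: {exc}")
```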

IPv6 Updates
Birmingham
– IPv6 addresses available at University level, but not yet for the Tier-2
– plans to get some services running in the next 6 months; will greatly depend on central IT (routing etc.)
Oxford
– IPv6 service is bandwidth limited
– some test services available, either dual-stack or IPv6 only
– plans for a production-ready IPv6 system in ~1 year
ALICE Tier-1/2 Workshop, Torino, February 2015

Any (other) questions? Thank You! ALICE Tier-1/2 Workshop, Torino, February 2015