CHIPP - CSCS F2F meeting, ETHZ, Zürich, September 1st, 2016

Tier 2 status and plans – CSCS
- Statistics: Availability/Reliability, CPU usage, Storage usage
- Operations: Updates, Main Issues
- Plans: Phases of Phoenix, Pledges
- Open discussion: Memory Usage, Support Changes

Statistics

Statistics
- Two maintenances in one month (dCache issue)
- Incident/maintenance on the IB network

Statistics – CPU usage

Statistics – Storage usage (2/2)
Allocated space:
  atlas  1377.6 TB
  cms    1383.6 TB
  lhcb    789.8 TB

Operations

Operations – Updates: Phoenix Services Overview

Operations – Updates since March 2016:
- All partitions are shared between the VOs (see the sinfo sketch below):

  Partition   WNs           CPU                                Cores
  arc01       wn[15-79]     Xeon(R) CPU E5-2670 0 @ 2.60GHz    32
  arc02       wn[80-128]    Xeon(R) CPU E5-2680 v2 @ 2.80GHz   40
  arc03       wn[129-167]   Xeon(R) CPU E5-2680 v4 @ 2.40GHz   56
  TOTAL       149 WNs                                          6080

- Cluster completely re-installed with Puppet (VO boxes still missing)
- All VM services now running on the CSCS VMware infrastructure
- New worker nodes installed and updated to v4
- New Slurm (v15.08) and ARC CE setup; CREAM CE decommissioned
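
A minimal sketch of how the shared partitions could be checked from a login node, assuming the Slurm client tools are available and that the partition names match the table above; the totals should line up with the ~149 WNs / 6080 cores reported there (exact numbers depend on which nodes are actually in production):

```python
from collections import defaultdict
import subprocess

# Ask Slurm for partition name, node count and CPUs per node for the three shared partitions.
out = subprocess.run(
    ["sinfo", "--noheader", "-o", "%P %D %c", "-p", "arc01,arc02,arc03"],
    capture_output=True, text=True, check=True,
)

nodes, cores = defaultdict(int), defaultdict(int)
for line in out.stdout.splitlines():
    part, n, cpus_per_node = line.split()
    part = part.rstrip("*")                 # the default partition is flagged with '*'
    nodes[part] += int(n)
    cores[part] += int(n) * int(cpus_per_node)

for part in sorted(cores):
    print(f"{part:8s} {nodes[part]:4d} WNs  {cores[part]:6d} cores")
print(f"TOTAL    {sum(nodes.values()):4d} WNs  {sum(cores.values()):6d} cores")
```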

Operations – Updates since March 2016:
- BDII reinstalled in HA mode with Keepalived and floating IPs: 3 BDIIs with 3 floating IPs and round-robin load distribution; in case of failure the IP is moved to a surviving node (see the check below)
- perfSONAR puppetized and reinstalled
- Using the CSCS central log/dashboards/metrics facility: mirroring to log.lcg.cscs.ch, kibana.lcg.cscs.ch and graylog.lcg.cscs.ch; local Elasticsearch instances / local Kibana dashboards (demo)
- Allocated 1 PB to dCache
- Integrated the storage into the CSCS SAN
- New CHIPPonCRAY Scratch
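
A minimal sketch of how the round-robin BDII setup could be verified, assuming a hypothetical alias name bdii.lcg.cscs.ch (the real alias is not given in the slides) and the standard BDII LDAP port 2170:

```python
import socket

BDII_ALIAS = "bdii.lcg.cscs.ch"   # hypothetical alias name, for illustration only
BDII_PORT = 2170                  # standard BDII LDAP port

# Collect every address the alias resolves to; with round-robin DNS over the three
# Keepalived floating IPs, all three addresses should show up here.
addresses = {info[4][0] for info in
             socket.getaddrinfo(BDII_ALIAS, BDII_PORT, proto=socket.IPPROTO_TCP)}
print("addresses behind the alias:", sorted(addresses))

# Every floating IP should answer on the BDII port; if a BDII node dies, Keepalived
# moves its IP to a surviving node, so the check should still pass for all addresses.
for addr in sorted(addresses):
    try:
        with socket.create_connection((addr, BDII_PORT), timeout=5):
            print(addr, "OK")
    except OSError as exc:
        print(addr, "FAILED:", exc)
```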

GPFS Reads last year

GPFS Writes last year

GPFS IOPS last year

Scratch performance during high load

Operations – Main issues
- Inodes – April 2016: set per-user quota limits based on filesets
- Net-IB – June: replaced a few cables; OFED upgraded and aligned across the cluster; datagram mode enabled everywhere
- lcg-util/Scratch – July: moved to gfal2-util (VO-related config; see the sketch below); GPFS tuning for the new load
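
The lcg-util to gfal2-util move mentioned above amounts to swapping command-line tools; a minimal sketch of the equivalents, using a hypothetical local file and SURL (the real endpoint paths are not shown in the slides):

```python
import subprocess

# Hypothetical source file and destination SURL, for illustration only.
SRC = "file:///tmp/testfile"
DST = "srm://se01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/testfile"

# Old lcg-util commands (retired): lcg-cp, lcg-ls, lcg-del.
# gfal2-util equivalents, driven from Python via subprocess:
subprocess.run(["gfal-copy", "-v", SRC, DST], check=True)   # replaces lcg-cp
subprocess.run(["gfal-ls", "-l", DST], check=True)          # replaces lcg-ls
subprocess.run(["gfal-rm", DST], check=True)                # replaces lcg-del
```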

Plans

Plans – Phases of Phoenix
Timeline 2013–2017: Phase H → Phase J → Phase K → Phase L → Phase M (NOW = 2016)

Plans – Pledges: plans for 2017
Need to set pledges for Phase M.

Storage:
- Today: 3.4 PB
- Decommission: ~0.8 PB
- Buy: ~1 PB
- Total: 3.6 PB (pledge: 3.5 PB)
- Cost: ~250K CHF

Compute:
- Today: 69K HS06
- Decommission: 3% (~2K HS06)
- Buy: ~12K HS06
- Total: 79K HS06 (pledge: 62K HS06)
- Cost: ~200K CHF
(Worked arithmetic in the sketch after this slide.)

Pledges per phase:
  Phase                  Compute power pledged [HS06]   Storage [TB]
  Phase H – April 2014   26000                          1800
  Phase J – April 2015   35000                          2300
  Phase K – April 2016   49000                          3070
  Phase L – April 2017   62000                          3500
  Phase M – April 2018   72000                          4000

Scratch: ~6 new NSD servers for Scratch (cost ~40K CHF); GPFS license cost ~60K CHF
Network: replacement of old IB switches (cost ~30K CHF)
VMware (cost ~20K CHF)

Meeting note (08/03/16 12:04): change the Phase J storage pledge to 2300 TB instead of 2000.
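
The planning figures above reduce to simple arithmetic; a short worked version using only the numbers from the slide (costs are not recomputed):

```python
# Storage in PB, compute in kHS06; all figures copied from the slide.
storage_today, storage_decomm, storage_buy = 3.4, 0.8, 1.0
compute_today, compute_decomm, compute_buy = 69, 2, 12        # 2 kHS06 ~= 3% of 69

storage_total = storage_today - storage_decomm + storage_buy  # 3.6 PB
compute_total = compute_today - compute_decomm + compute_buy  # 79 kHS06

print(f"Storage after purchase: {storage_total:.1f} PB (pledge 3.5 PB)")
print(f"Compute after purchase: {compute_total} kHS06 (pledge 62 kHS06)")
```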

Discussion

Memory Usage
If a job asks for more than 2.0 GB on arc01/arc03, does it make sense that Slurm allocates 2 CPUs? (See the sketch below.)
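
A likely explanation, sketched under the assumption that arc01/arc03 are configured with MaxMemPerCPU of about 2 GB (the actual slurm.conf values are not shown in the slides): Slurm raises a job's CPU count until cores × MaxMemPerCPU covers the requested memory, so any request above 2.0 GB ends up with 2 CPUs.

```python
import math

MAX_MEM_PER_CPU_MB = 2000   # assumed partition setting, ~2.0 GB per core

def cpus_allocated(mem_request_mb, cpus_requested=1):
    """CPUs Slurm ends up allocating: enough cores so that
    cores * MaxMemPerCPU covers the requested memory."""
    return max(cpus_requested, math.ceil(mem_request_mb / MAX_MEM_PER_CPU_MB))

print(cpus_allocated(2000))   # 1 -> a 2.0 GB request fits on one core
print(cpus_allocated(2100))   # 2 -> anything above 2.0 GB doubles the allocation
print(cpus_allocated(4500))   # 3
```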

Support Changes
In order to improve the support CSCS is delivering, we would like to propose:
- Bi-weekly operations meetings (30' phone call)
- A common list of ALL open issues (tickets): discuss and re-prioritize (or even reject) requests based on priorities
- A closer VO-Rep to CSCS relationship: periodic 1-day on-site visits (Bern/PSI/anywhere) for CSCS to see the "world" through the VO-Rep's eyes; strengthen the understanding of each individual's role; develop ways to find problems sooner
- Close the VO-Box discussion
Please send us your feedback!

Thank you for your attention.