CHIPP - CSCS F2F meeting, ETHZ, Zürich, September 1st, 2016

Tier 2 status and plans – CSCS
- Statistics: Availability/Reliability, CPU usage, Storage usage
- Operations: Updates, Main Issues
- Plans: Phases of Phoenix, Pledges
- Open discussion: Memory Usage, Support Changes

Statistics

Statistics
- Two maintenances in one month (dCache issue)
- Incident/maintenance on the IB network

Statistics – CPU usage

Statistics – Storage usage (2/2)
Allocated space:
  atlas  1377.6 TB
  cms    1383.6 TB
  lhcb    789.8 TB

Operations

Operations – Updates: Phoenix Services Overview

Operations – Updates since March 2016:
- All partitions are shared between the VOs (see the sinfo sketch below):

  Partition   WNs           CPU                                Cores
  arc01       wn[15-79]     Xeon(R) CPU E5-2670 0 @ 2.60GHz    32
  arc02       wn[80-128]    Xeon(R) CPU E5-2680 v2 @ 2.80GHz   40
  arc03       wn[129-167]   Xeon(R) CPU E5-2680 v4 @ 2.40GHz   56
  TOTAL       149 WNs                                          6080

- Cluster completely re-installed with Puppet (VO boxes still missing)
- All VM services now running on the CSCS VMware infrastructure
- New worker nodes installed and updated to v4
- New Slurm (v15.08) and ARC CE setup; CREAM CE decommissioned
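
A minimal sketch of how the shared partitions could be checked from a login node, assuming the Slurm client tools are available and that the partition names match the table above; the totals should line up with the ~149 WNs / 6080 cores reported there (exact numbers depend on which nodes are actually in production):

```python
from collections import defaultdict
import subprocess

# Ask Slurm for partition name, node count and CPUs per node for the three shared partitions.
out = subprocess.run(
    ["sinfo", "--noheader", "-o", "%P %D %c", "-p", "arc01,arc02,arc03"],
    capture_output=True, text=True, check=True,
)

nodes, cores = defaultdict(int), defaultdict(int)
for line in out.stdout.splitlines():
    part, n, cpus_per_node = line.split()
    part = part.rstrip("*")                 # the default partition is flagged with '*'
    nodes[part] += int(n)
    cores[part] += int(n) * int(cpus_per_node)

for part in sorted(cores):
    print(f"{part:8s} {nodes[part]:4d} WNs  {cores[part]:6d} cores")
print(f"TOTAL    {sum(nodes.values()):4d} WNs  {sum(cores.values()):6d} cores")
```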

Operations – Updates since March 2016:
- BDII reinstalled in HA mode with Keepalived and floating IPs: 3 BDIIs with 3 floating IPs and round-robin load distribution; in case of failure the IP is moved to a surviving node (see the check below)
- perfSONAR puppetized and reinstalled
- Using the CSCS central log/dashboards/metrics facility: mirroring to log.lcg.cscs.ch, kibana.lcg.cscs.ch and graylog.lcg.cscs.ch; local Elasticsearch instances / local Kibana dashboards (demo)
- Allocated 1 PB to dCache
- Integrated the storage into the CSCS SAN
- New CHIPPonCRAY Scratch
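
A minimal sketch of how the round-robin BDII setup could be verified, assuming a hypothetical alias name bdii.lcg.cscs.ch (the real alias is not given in the slides) and the standard BDII LDAP port 2170:

```python
import socket

BDII_ALIAS = "bdii.lcg.cscs.ch"   # hypothetical alias name, for illustration only
BDII_PORT = 2170                  # standard BDII LDAP port

# Collect every address the alias resolves to; with round-robin DNS over the three
# Keepalived floating IPs, all three addresses should show up here.
addresses = {info[4][0] for info in
             socket.getaddrinfo(BDII_ALIAS, BDII_PORT, proto=socket.IPPROTO_TCP)}
print("addresses behind the alias:", sorted(addresses))

# Every floating IP should answer on the BDII port; if a BDII node dies, Keepalived
# moves its IP to a surviving node, so the check should still pass for all addresses.
for addr in sorted(addresses):
    try:
        with socket.create_connection((addr, BDII_PORT), timeout=5):
            print(addr, "OK")
    except OSError as exc:
        print(addr, "FAILED:", exc)
```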

GPFS Reads last year

GPFS Writes last year

GPFS IOPS last year

Scratch performance during high load

Operations – Main issues
- Inodes – April 2016: set per-user quota limits based on filesets
- Net-IB – June: replaced a few cables; OFED upgraded and aligned across the cluster; datagram mode enabled everywhere
- lcg-util/Scratch – July: moved to gfal2-util (VO-related config; see the sketch below); GPFS tuning for the new load
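
The lcg-util to gfal2-util move mentioned above amounts to swapping command-line tools; a minimal sketch of the equivalents, using a hypothetical local file and SURL (the real endpoint paths are not shown in the slides):

```python
import subprocess

# Hypothetical source file and destination SURL, for illustration only.
SRC = "file:///tmp/testfile"
DST = "srm://se01.lcg.cscs.ch/pnfs/lcg.cscs.ch/dteam/testfile"

# Old lcg-util commands (retired): lcg-cp, lcg-ls, lcg-del.
# gfal2-util equivalents, driven from Python via subprocess:
subprocess.run(["gfal-copy", "-v", SRC, DST], check=True)   # replaces lcg-cp
subprocess.run(["gfal-ls", "-l", DST], check=True)          # replaces lcg-ls
subprocess.run(["gfal-rm", DST], check=True)                # replaces lcg-del
```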

Plans

Plans – Phases of Phoenix
Timeline 2013–2017: Phase H → Phase J → Phase K → Phase L → Phase M (NOW = 2016)

Plans – Pledges: plans for 2017
Need to set pledges for Phase M.

Storage:
- Today: 3.4 PB
- Decommission: ~0.8 PB
- Buy: ~1 PB
- Total: 3.6 PB (pledge: 3.5 PB)
- Cost: ~250K CHF

Compute:
- Today: 69K HS06
- Decommission: 3% (~2K HS06)
- Buy: ~12K HS06
- Total: 79K HS06 (pledge: 62K HS06)
- Cost: ~200K CHF
(Worked arithmetic in the sketch after this slide.)

Pledges per phase:
  Phase                  Compute power pledged [HS06]   Storage [TB]
  Phase H – April 2014   26000                          1800
  Phase J – April 2015   35000                          2300
  Phase K – April 2016   49000                          3070
  Phase L – April 2017   62000                          3500
  Phase M – April 2018   72000                          4000

Scratch: ~6 new NSD servers for Scratch (cost ~40K CHF); GPFS license cost ~60K CHF
Network: replacement of old IB switches (cost ~30K CHF)
VMware (cost ~20K CHF)

Meeting note (08/03/16 12:04): change the Phase J storage pledge to 2300 TB instead of 2000.
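
The planning figures above reduce to simple arithmetic; a short worked version using only the numbers from the slide (costs are not recomputed):

```python
# Storage in PB, compute in kHS06; all figures copied from the slide.
storage_today, storage_decomm, storage_buy = 3.4, 0.8, 1.0
compute_today, compute_decomm, compute_buy = 69, 2, 12        # 2 kHS06 ~= 3% of 69

storage_total = storage_today - storage_decomm + storage_buy  # 3.6 PB
compute_total = compute_today - compute_decomm + compute_buy  # 79 kHS06

print(f"Storage after purchase: {storage_total:.1f} PB (pledge 3.5 PB)")
print(f"Compute after purchase: {compute_total} kHS06 (pledge 62 kHS06)")
```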

Discussion

Memory Usage
If a job asks for more than 2.0 GB on arc01/arc03, does it make sense that Slurm allocates 2 CPUs? (See the sketch below.)
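
A likely explanation, sketched under the assumption that arc01/arc03 are configured with MaxMemPerCPU of about 2 GB (the actual slurm.conf values are not shown in the slides): Slurm raises a job's CPU count until cores × MaxMemPerCPU covers the requested memory, so any request above 2.0 GB ends up with 2 CPUs.

```python
import math

MAX_MEM_PER_CPU_MB = 2000   # assumed partition setting, ~2.0 GB per core

def cpus_allocated(mem_request_mb, cpus_requested=1):
    """CPUs Slurm ends up allocating: enough cores so that
    cores * MaxMemPerCPU covers the requested memory."""
    return max(cpus_requested, math.ceil(mem_request_mb / MAX_MEM_PER_CPU_MB))

print(cpus_allocated(2000))   # 1 -> a 2.0 GB request fits on one core
print(cpus_allocated(2100))   # 2 -> anything above 2.0 GB doubles the allocation
print(cpus_allocated(4500))   # 3
```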

Support Changes
In order to improve the support CSCS is delivering, we would like to propose:
- Bi-weekly operations meetings (30' phone call)
- A common list of ALL open issues (tickets): discuss and re-prioritize (or even reject) requests based on priorities
- A closer VO-Rep to CSCS relationship: periodic 1-day on-site visits (Bern/PSI/anywhere) for CSCS to see the "world" through the VO-Rep's eyes; strengthen the understanding of each individual's role; develop ways to find problems sooner
- Close the VO-Box discussion
Please send us your feedback!

Thank you for your attention.