OPERATIONS REPORT JUNE – SEPTEMBER 2015 Stefan Roiser CERN.

Presentation transcript:

CPU Time Provided by Sites
Not much news: as usual, the T0 and T1 sites plus Yandex are the top providers.
Still, some news: a cloud site (CERN) is among the top 20 contributors for the first time; it was 28th in the last report.
Naturally, there has been no more OFFLINE processing at the HLT farm since the start of Run 2.
14 Sep '15 - NCB Operations Report - StR

Job Types Distribution & CPU Efficiency
The main job types continue to be executed with very high CPU efficiency.
The period is still dominated by Simulation jobs.
DataReconstruction is picking up.
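As an illustration of the quantity behind this slide, CPU efficiency is commonly taken as consumed CPU time divided by wall-clock time. The following is a minimal sketch under that assumption; the function name, the record layout and all numbers are hypothetical and are not taken from the report or from any LHCb tool.

```python
from collections import defaultdict


def cpu_efficiency_by_type(jobs):
    """Return {job_type: total CPU seconds / total wall-clock seconds}.

    `jobs` is an iterable of (job_type, cpu_seconds, wall_seconds) tuples.
    This record layout is an assumption made for the example.
    """
    cpu = defaultdict(float)
    wall = defaultdict(float)
    for job_type, cpu_seconds, wall_seconds in jobs:
        cpu[job_type] += cpu_seconds
        wall[job_type] += wall_seconds
    # Skip job types with no recorded wall-clock time to avoid dividing by zero.
    return {t: cpu[t] / wall[t] for t in cpu if wall[t] > 0}


# Made-up accounting records, for illustration only.
example_jobs = [
    ("Simulation", 9500.0, 10000.0),
    ("Simulation", 4800.0, 5000.0),
    ("DataReconstruction", 3600.0, 4000.0),
]
efficiencies = cpu_efficiency_by_type(example_jobs)
for job_type, value in sorted(efficiencies.items()):
    print(f"{job_type}: {value:.1%}")
```

Summing CPU and wall-clock time per job type before dividing weights long jobs more heavily than short ones, which is usually what a site-level efficiency figure is meant to express.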

Running Jobs
The average of 20k running jobs is below usual.
A problem with Dirac job submission is now fixed.
Note: the same fluctuations occurred in summer 2014, although the all-year average was then significantly higher.
Plot annotation: average for summer 2014.

Job Success Rates
The job success rate (Done + Completed) decreased over the reference period to ~90 %.
It used to be at ~95 % for a very long time.
The increase in “stalled” jobs is due to Simulation productions for Trigger Upgrade studies with high μ and occupancy.
Failure causes are under investigation.
Most of the requests are finished by now -> a sneak preview of Run 3 data processing.
Work is also ongoing on the problem of calculating the queue time left.
Plot annotation: increase in stalled MC jobs.
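The success rate quoted above counts jobs in the Done and Completed states as successful. A minimal sketch of that arithmetic follows; the function name, the "Failed" state label and all counts are assumptions chosen for illustration (the counts are picked so the result matches the ~90 % figure, they are not the report's data).

```python
def success_rate(status_counts, ok_states=("Done", "Completed")):
    """Fraction of jobs whose final status is one of `ok_states`."""
    total = sum(status_counts.values())
    if total == 0:
        raise ValueError("no jobs in the reference period")
    ok = sum(status_counts.get(state, 0) for state in ok_states)
    return ok / total


# Made-up job counts per final status, for illustration only.
example_counts = {"Done": 850, "Completed": 50, "Failed": 60, "Stalled": 40}
rate = success_rate(example_counts)
print(f"success rate: {rate:.0%}")
```

With counts like these, moving a few percent of jobs from Done to Stalled is enough to take the rate from ~95 % to ~90 %, which is the size of the change described on the slide.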

CERN Slow Worker Nodes
Some CERN worker nodes process data slower by factors.
In general we are interested in data throughput, which is OK, but this is an additional hassle for the production team, which has to “wait for the last job” before, e.g., closing a production.
CERN/LSF is looking into it with highest priority.
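The "wait for the last job" effect can be sketched numerically: a production can only be closed once its slowest job ends, so a single job on a slow node sets the completion time even when overall throughput is fine. The sketch below is illustrative only; the function names, the median-based threshold and the durations are assumptions, not the method used by the production team.

```python
from statistics import median


def find_stragglers(durations, factor=3.0):
    """Return jobs that ran longer than `factor` times the median duration.

    `durations` maps a job identifier to its run time in hours. The
    threshold rule is an assumption chosen for this example.
    """
    typical = median(durations.values())
    return {job: d for job, d in durations.items() if d > factor * typical}


def completion_time(durations):
    """A production can be closed only after its slowest job has ended."""
    return max(durations.values())


# Made-up run times in hours; "job-5" stands for a job on a slow worker node.
example_durations = {
    "job-1": 2.0,
    "job-2": 2.1,
    "job-3": 1.9,
    "job-4": 2.0,
    "job-5": 9.5,
}
stragglers = find_stragglers(example_durations)
print("stragglers:", stragglers)
print("production closes after", completion_time(example_durations), "hours")
```

In this made-up case four of the five jobs finish in about two hours, yet the production stays open for 9.5 hours because of one slow job.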

Run 2 Data Validation & Production
41 Bookkeeping processing passes were produced for Run 2 data validation and production.
Timeline labels: Early Measurements, 25ns ramp.
Processing passes shown: Reco15/Stripping22, Reco15a/Stripping22, Reco15/Turbo01a, Reco15/Turbo01b, Reco15RDST, Turbo01c, Reco15c, Reco15em/Stripping22 & Turbo01em, Reco15em/Stripping22a, Reco14/…/MDST01, Reco15, Reco15a, Turbo01, Reco15b/Stripping23, Reco15b/Stripping23a, Reco15pne, Reco15c/Stripping23b (run range extension).

Verification of Processed Run 2 Validation Data Quality by Analysts
4 % of the file accesses on the grid during this period were made by analysts for quality checking of the validation data.
Plot annotation: access of validation data.

ONLINE Calibration & Turbo Stream
Major change in the data processing workflow.
Detector calibration and alignment are done in the pit, which allows “physics quality” data reconstruction in the pit.
The Turbo stream produced this way, for part of the HLT-selected data, is shipped to storage sites and is ready for analysis right away.
Until the end of the year, RAW information is also exported and reconstructed to check ONLINE/OFFLINE equivalence.
Validation of the Turbo (and Turbo Validation) workflows was carried out successfully.

Run 2 Data Processing Lessons Learnt
The validation chain is essential and works very well for both infrastructure and data quality checking.
Turnaround times are fast (O(day(s))).
Communication between ONLINE, OFFLINE and analysts is very good and fluent.
This allowed issues to be detected early on.
Until the end of 2015 it is unlikely that we need T2 sites for data processing.
“Mesh processing” was tested and works as expected.
It was initially foreseen to have additional data processed at T2 sites, but due to lower LHC luminosity this is not needed for now.

Summary
LHCb continues to use the pledged resources at very high CPU efficiency.
¾ of the resources are used for Simulation; Data Processing is increasing.
Usage is currently below average but picking up after the summer.
One issue with killed batch jobs is understood and being worked on.
Lots of work went into the validation of Run 2 data.
Successful validation of the distributed computing infrastructure, applications and quality of processed data.
The Turbo data workflow was successfully validated.
Now looking forward to moving on to stable operations …