Maria Girone, CERN: Q1 2014 WLCG Resource Utilization; Commissioning the HLT for data reprocessing and MC production; Preparing for Run II; Data Management and Analysis Tools evolution.


- Q1 2014 WLCG Resource Utilization
- Commissioning the HLT for data reprocessing and MC production
- Preparing for Run II
  - Data Management, Analysis Tools evolution
  - Progress on the Tier-0 and the AI
  - Data Federation
  - Dynamic Data Placement
  - CRAB3
  - The CMS summer 2014 Computing, Software and Analysis challenge (CSA14)
- Outlook

- Resources fully used so far, with high-priority requests for selected physics results (Heavy Ion, precision W mass measurement, ...), CSA14 preparation samples and detector upgrade studies
  - Tier-1 pledge utilization has been close to 125%
  - 89% CPU efficiency for organized processing activities
- Tier-2s still carry the bulk of the first step of simulation production and remain the primary analysis resource
  - 99.8% pledge utilization
  - 80% average CPU efficiency, factoring in the analysis efficiency as well
- Opening the Tier-0 to production tasks like a Tier-1 and to analysis activity like a Tier-2 has given CMS the opportunity to start commissioning the CERN AI infrastructure, building on the success of the HLT farm commissioning

- Computing is utilizing the resources for the high-priority tasks of CMS
- Simulation is divided into GEN-SIM, which is concentrated at Tier-2s, and reconstruction (AOD), which currently runs at the Tier-1s
  - Events may be reconstructed more than once for different pile-up conditions
- The system is extremely busy with many production tasks running, all high priority!
  - Upgrade, W mass measurements, CSA14 sample preparation, and Higgs and Exotica studies

(Plots: monthly production volumes of 650M evts/month and 1B evts/month)

- Excellent test of the thorough commissioning work of 6000 virtual cores, set up as OpenStack cloud resources over the past 8 months
  - During the winter shutdown the HLT farm will increase the Tier-1 reprocessing resources by roughly 40%
  - All virtual machines on the HLT can be started quickly, compatible with usage in inter-fill periods, which can provide some contingency with additional resources during the year (see the provisioning sketch below)
- The HLT, with an AI contribution, has been used very successfully for a high-priority reprocessing of the 2011 Heavy Ion data
  - Met a very tight schedule of one month before Quark Matter 2014
  - Whole-node cloud provisioning makes it possible to provide higher-memory slots for Heavy Ion
  - The upgraded network allows high CPU efficiency for data reprocessing using the whole farm; no bottleneck reading from EOS

(Plots: events processed for the Heavy Ion 2011 data; HLT CPU efficiency)
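Purely as an illustration of booting a batch of worker VMs on an OpenStack cloud with the Python nova client (this is not the actual CMS/HLT provisioning machinery; the client version, credentials, image and flavor names are all placeholders):

```python
from novaclient import client

# Placeholder credentials and Keystone endpoint for an OpenStack tenant.
nova = client.Client("2", "username", "password", "hlt-cloud-tenant",
                     "https://keystone.example.org:5000/v2.0")

image = nova.images.find(name="cms-worker-image")   # placeholder image name
flavor = nova.flavors.find(name="m1.xlarge")        # placeholder flavor name

# Request 50 identical worker VMs in a single call; the cloud boots them in parallel.
nova.servers.create(name="hlt-cloud-worker", image=image, flavor=flavor,
                    min_count=50, max_count=50)
```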

CMS computing has evolved around two concepts:
- Flexibility in where resources are used
  - We are developing the ability to share workflows across sites and to access the input from remote storage
    - Data Federation and disk/tape separation
    - Dedicated cloud resources from the AI (Tier-0) and the HLT
  - Opportunistic computing
    - Bootstrapping the CMS environment on any resource using pilots, utilizing allocations on supercomputers and leadership-class machines
  - Academic cloud-provisioned resources
    - Commissioned a new opportunistic cloud resource in Taiwan
- Efficiency in how resources are used
  - Dynamic data placement and cache release
  - Single global queue for analysis and production with glideinWMS
  - New distributed analysis tools for users: CRAB3

Data Management milestone: 30 April 2014
- Demonstrate the functionality and scale of the improved Data Management system
  - Disk and tape separated, to manage Tier-1 disk resources and control tape access
  - Data federation and data access (AAA)

Analysis milestone: 30 June 2014
- Demonstrate the full production scale of the new CRAB3 distributed analysis tools
  - Profiting from the refactoring done in the Common Analysis Framework project
  - Main changes are reduced job failures in the handling of user data products, improved job tracking, and automatic resubmission

Organized Production milestone: 31 October 2014
- Exercise the full system for organized production
  - Utilize the Agile Infrastructure (IT-CC and Wigner) at the 12k-core scale for the Tier-0
  - Run with multi-core at Tier-0 and Tier-1 for data reconstruction (see the sketch below)
  - Demonstrate shared workflows to improve latency and speed
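To illustrate the multi-core reconstruction item above, a minimal sketch of how a CMSSW job configuration enables multi-threading; the process name and the thread/stream counts are placeholder values, and the real Tier-0/Tier-1 settings are chosen per workflow:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECO")   # placeholder process name

# Run the framework with several threads sharing concurrent event streams
# (values here are illustrative, not the production choice).
process.options = cms.untracked.PSet(
    numberOfThreads = cms.untracked.uint32(4),
    numberOfStreams = cms.untracked.uint32(4),
)
```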

- We are planning to provision the Tier-0 resources directly on OpenStack through glideinWMS
  - We currently cannot reach the same ramp-up performance as on the HLT, due to a high rate of VM failures on the AI
  - A discussion with the AI experts is planned
- The Wigner CPU efficiency has been measured for both CMS production and analysis jobs
  - We see only a small decrease in CPU efficiency for I/O-bound applications reading data from EOS at Meyrin
  - CMSSW is optimized for high-latency I/O using advanced TTree secondary-caching techniques (see the sketch below)

(Plots: ramp-up test requesting 100 VMs of various flavors; Wigner CPU efficiency, with a 6% drop for analysis jobs)
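The TTree caching mentioned above can be sketched with plain PyROOT; CMSSW sets this up internally, so this is only an illustration of the technique, and the file URL and cache size are placeholders:

```python
import ROOT

# Placeholder URL: any ROOT file served over the network (e.g. from EOS via xrootd).
f = ROOT.TFile.Open("root://eoscms.cern.ch//store/placeholder/file.root")
tree = f.Get("Events")

# Enable the TTree read cache so that many baskets are fetched in one
# round trip instead of one network request per basket.
tree.SetCacheSize(20 * 1024 * 1024)        # 20 MB cache (illustrative value)
tree.AddBranchToCache("*", True)           # cache all branches
tree.SetCacheEntryRange(0, tree.GetEntries())

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                       # reads are served from the prefetched cache
```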

- Small penalty in CPU efficiency for analysis jobs that read data remotely through the data federation
- Provide access to all CMS data for analysis over the wide area through Xrootd
  - Access 20% of the data across the wide area: 60k files/day, O(100 TB)/day
- File-opening test complete (a toy timing sketch follows below)
  - The sustained rate is 200 Hz, much higher than what we expect in production use
- File-reading test: we see about 400 MB/s (35 TB/day) read over the wide area
  - Thousands of active transfers

(Plots: CPU efficiency for purely local reading vs. purely federation reading)
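A toy version of the file-opening measurement, using PyROOT against a federation redirector; the redirector host and file names are placeholders, and a real test would cycle over thousands of files:

```python
import time
import ROOT

# Placeholder redirector and logical file names.
files = [
    "root://cms-xrd-global.cern.ch//store/placeholder/fileA.root",
    "root://cms-xrd-global.cern.ch//store/placeholder/fileB.root",
]

for url in files:
    start = time.time()
    f = ROOT.TFile.Open(url)       # the redirector locates a site that holds the file
    elapsed = time.time() - start
    ok = f is not None and not f.IsZombie()
    print("%s opened=%s in %.2f s" % (url, ok, elapsed))
    if ok:
        f.Close()
```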

- CMS would like to make use of the data federation for reprocessing use cases (a fallback sketch follows below)
  - Allowing shared production workflows to use CPU at multiple sites, with the data served over the data federation
- We had an unintended proof of principle: FNAL accidentally pointed to an obsolete DB and production traffic failed over to Xrootd
  - Data was read from CNAF disk at 400 MB/s for 2 days
- Regular tests have been added to our 2014 schedule, also during CSA14; larger reprocessing campaigns will be executed in Q3-Q4

(Plot: wide-area read traffic between FNAL and CNAF during the failover)
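A minimal sketch of the fallback idea, not the actual CMSSW site-local-config mechanism: try the local storage first and, if the open fails, retry through a federation redirector. The storage prefixes and file name are placeholders.

```python
import ROOT

LOCAL_PREFIX = "root://local-se.example.org/"        # placeholder local storage endpoint
FALLBACK_PREFIX = "root://cms-xrd-global.cern.ch/"   # placeholder federation redirector

def open_with_fallback(lfn):
    """Open a logical file name locally, falling back to the data federation."""
    for prefix in (LOCAL_PREFIX, FALLBACK_PREFIX):
        f = ROOT.TFile.Open(prefix + lfn)
        if f and not f.IsZombie():
            return f
    raise IOError("could not open %s locally or via the federation" % lfn)

events = open_with_fallback("/store/placeholder/file.root").Get("Events")
```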

- The development of dynamic data placement and clean-up is progressing
  - Prototype placement and clean-up algorithms are in place (an illustrative clean-up sketch follows below)
  - The current CSA14 and upgrade samples are placed at Tier-2 sites automatically, and the centrally controlled space is under automatic management
  - The next step is to clear the caches of underutilized samples more aggressively
  - This will be exercised during the CSA14 challenge, monitoring that the storage utilization efficiency improves
- Due to effort redirection in CERN-IT, CMS has been building up expertise
  - Exploring with ATLAS the possibility of expanding the scope of central operations
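To make the clean-up idea concrete, a toy popularity-based sketch; this is not the CMS production algorithm, only an illustration of releasing the least-recently-accessed replicas until a space target is met, with invented dataset names and sizes:

```python
from collections import namedtuple

Replica = namedtuple("Replica", ["dataset", "size_tb", "last_access_days"])

def select_for_deletion(replicas, space_needed_tb):
    """Pick the least-recently-accessed replicas until enough space is freed."""
    freed, to_delete = 0.0, []
    # Oldest accesses first: the best candidates for cache release.
    for rep in sorted(replicas, key=lambda r: r.last_access_days, reverse=True):
        if freed >= space_needed_tb:
            break
        to_delete.append(rep)
        freed += rep.size_tb
    return to_delete

# Invented example: three replicas cached at one Tier-2 site.
cache = [
    Replica("/DatasetA/CSA14/AODSIM", 12.0, 90),
    Replica("/DatasetB/CSA14/AODSIM", 30.0, 5),
    Replica("/DatasetC/Upgrade/AODSIM", 8.0, 200),
]
print(select_for_deletion(cache, 15.0))   # frees DatasetC (200 d), then DatasetA (90 d)
```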

- CRAB3 is currently in integration and validation tests before being handed to users for CSA14 (a sample configuration sketch follows below)
  - Improved elements for job resubmission and task completion, output data handling, and monitoring
  - 20k running cores have been achieved in the validation; the production target is 50% higher
  - Physics power users are expected to give feedback this month
- The underlying services have also been updated
  - FTS3 for output file handling
  - Central Condor queue for scheduling
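For orientation, a minimal sketch of what a CRAB3 task configuration looks like; the exact import path and attribute set vary with the CRAB3 version, and the request, dataset, pset and site names are placeholders:

```python
from WMCore.Configuration import Configuration

config = Configuration()

config.section_("General")
config.General.requestName = "csa14_validation_test"   # placeholder task name

config.section_("JobType")
config.JobType.pluginName = "Analysis"
config.JobType.psetName = "analysis_cfg.py"            # placeholder CMSSW config file

config.section_("Data")
config.Data.inputDataset = "/Placeholder/CSA14-Sample/AODSIM"
config.Data.splitting = "FileBased"
config.Data.unitsPerJob = 10

config.section_("Site")
config.Site.storageSite = "T2_XX_Placeholder"          # placeholder output site
```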

- CMS is preparing for the nominal LHC beam conditions of Run II
  - This is a unique opportunity for early discovery, so we need to make sure that all the key components are in place and working from the very beginning; to ensure this, CMS is performing a Computing, Software and Analysis challenge in July and August
- Users will perform analysis exclusively using CRAB3, the data federation and dynamic data placement techniques
- Production will exercise access through the data federation
- System performance will be monitored
- Preparatory work is ongoing, including preparation of 2B-event samples at 50 ns with average pile-up ⟨μ⟩ = 40 and 20, and at 25 ns with ⟨μ⟩ = 20

- We believe we are on schedule with the improvement work that will allow CMS to function in the challenging 2015 environment and within the resources requested
  - The schedule had little contingency, but we are making steady progress
  - The CSA14 challenge will be important to expose the users to the new data access techniques and the new analysis tools
- The effort needed to sustain the progress on the improvements continues to be an issue
- CMS has a brainstorming workshop in mid-June to discuss upgrade activities
  - There are innovative ideas, but a coherent direction is still forming

Backup Slides
