Status of CMS Matthew Nguyen Rencontres LCG-France December 1st, 2014 *Mostly based on information from CMS Offline & Computing Week November 3-7.

Winding down Run 1 physics  Mostly the 'discovery physics' groups (Higgs, SUSY, exotica) are winding down Run 1 publications this year  Others (standard model, b-physics, heavy ions) continue to publish Run 1 physics into 2015  No remaining large MC requests or reprocessing  No major software developments  Continued need for analysis resources into 2015 [Plot: number of publications vs. time]

LHC schedule (as of today, from LHC_Schedule_2015.pdf)  June 1: start of collisions for physics  3 weeks of 50 ns bunch spacing (as in Run 1)  Few-week technical stop for "beam scrubbing"  6 weeks at 25 ns, large β*  3-week technical stop / special runs  7 weeks at 25 ns, nominal β*  1-month heavy-ion run (PbPb)

Run 2 physics goals  Energy = 13 TeV (still TBC)  Modest upgrades for CMS from LS1: muon coverage completed, new beampipe, etc.  Discovery potential depends on the channel, specifically on the 13 TeV / 8 TeV cross-section ratio  E.g., high-mass dijet resonances will be interesting from Day 1  Precision Higgs measurements by the end of Run 2  In the early days we will have to verify SM predictions at the new energy

Challenges for Run 2  As for all experiments, data volumes are increasing faster than resources (Moore's law, flat budgets, etc.)  Conditions are becoming more demanding, e.g., larger pile-up, with a sizeable component coming 'out-of-time' at 25 ns bunch spacing  LS1 was dedicated to evolving the CMS computing model and software to meet these needs

CSA14  Computing, Software and Analysis challenge: large-scale tests of the complete data processing, software and analysis chain  Injection of large samples of simulated events, analyzed using the new computing tools and data access techniques  Samples include different bunch spacing and pile-up conditions  Spanned July – September, with > 150 users

2014 Computing Milestones  Data Management milestone: 30 April 2014 (done) o More transparent data access: disk and tape separated to manage Tier-1 disk resources and control tape access; data federation and data access (AAA); developing Dynamic Data Placement for handling centrally managed disk space  Analysis milestone: 30 June 2014 (done) o Demonstrate the full scale of the new CRAB3 distributed analysis tool: reduce job failures in the handling of data, improved job tracking and automatic resubmission  Organized Production milestone: ongoing o Exercise the full system for organized production: cloud-based Tier-0 using the Agile Infrastructure (IT-CC and Wigner); run with multi-core at Tier-0 and Tier-1 for data reconstruction

Data Federation  Relax the paradigm of data locality, taking advantage of better network bandwidth and reliability than anticipated: Any data, Anytime, Anywhere (AAA)  Implementation: o xrootd data federation o 'fall-back' mechanism o disk/tape separation at T1s  Advantages: o slight loss of CPU efficiency, but overall more efficient use of resources o sites with storage failures can continue to operate o can imagine disk-less T2s  An early example: "legacy" reprocessing of the 2012 data samples.
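To make the AAA access concrete, here is a minimal PyROOT sketch of reading a file through the CMS global xrootd redirector. The redirector hostname cms-xrd-global.cern.ch is the commonly used global entry point, ROOT must be built with xrootd support, and the /store path is a placeholder, not a real file.

```python
import ROOT

# Open a file via the global xrootd redirector: the federation locates a
# replica at whichever site holds the file, so the job does not need the
# data on local storage. (The CMSSW 'fall-back' mechanism does the reverse:
# it retries via the redirector only after a local open fails.)
url = "root://cms-xrd-global.cern.ch//store/example/path/file.root"  # placeholder path
f = ROOT.TFile.Open(url)

if f and not f.IsZombie():
    tree = f.Get("Events")          # CMS data files store events in an "Events" tree
    if tree:
        print("Entries:", tree.GetEntries())
    f.Close()
else:
    print("Remote open failed")
```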

Federation scalability  GRIF_LLR is scaling very well, at least up to 800 jobs  The strategy to optimize performance depends on the storage system, i.e., it may be different at our T2s (DPM) vs. the T1 (dCache)

Dynamic Data Management  CMS manages ~100 PB of disk at 50 computing centers  ~100k datasets (mostly MC), which were distributed essentially by hand  Idea: instead, use data popularity to determine dataset replication and deletion o release the least popular cached copies once 90% of the space is used, until usage drops to 80% o create new copies when data become popular  CMSSW reports file-level access information as of version 6
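The deletion policy described above is essentially watermark-based cache cleaning driven by popularity. A minimal sketch, assuming 'popularity' is just a per-replica access count (the real CMS system combines the popularity service with central placement tools, and its bookkeeping schema differs):

```python
def clean_cache(replicas, capacity_tb, high=0.90, low=0.80):
    """Release the least popular cached replicas once usage exceeds 90% of
    the managed space, and keep deleting until usage drops below 80%.
    `replicas` is a list of dicts with 'size_tb' and 'accesses' keys
    (illustrative fields only)."""
    used = sum(r["size_tb"] for r in replicas)
    if used <= high * capacity_tb:
        return replicas                      # below the high watermark: nothing to do

    # Sort most-popular first; evict from the tail (least popular)
    # until usage falls below the low watermark.
    kept = sorted(replicas, key=lambda r: r["accesses"], reverse=True)
    while kept and used > low * capacity_tb:
        victim = kept.pop()
        used -= victim["size_tb"]
    return kept
```

The complementary step, creating additional replicas of datasets that become popular, would be a separate placement decision and is not shown here.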

CRAB3 (CMS Remote Analysis Builder)  New grid submission tool; enables the option to ignore data locality, i.e., to use AAA  Tested by artificially forcing jobs to run with remote access  During CSA14: 20k cores in production, 200k jobs/day, an average of 300 users/week  Improved handling of read failures and improved monitoring
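For illustration, a CRAB3 task configuration might look roughly like the sketch below. Parameter names follow the CRAB3 Python configuration, but the request name, dataset, CMSSW config file and storage site are placeholders, and the exact module path can differ between CRAB3 versions.

```python
from CRABClient.UserUtilities import config  # module path may vary with CRAB3 version
config = config()

config.General.requestName = 'csa14_example'          # hypothetical task name
config.JobType.pluginName  = 'Analysis'
config.JobType.psetName    = 'analysis_cfg.py'        # hypothetical CMSSW config file

config.Data.inputDataset   = '/SomePD/SomeEra-SomeProc/MINIAODSIM'  # placeholder dataset
config.Data.splitting      = 'FileBased'
config.Data.unitsPerJob    = 10
config.Data.ignoreLocality = True   # decouple job placement from data location; read via AAA

config.Site.storageSite    = 'T2_XX_Example'          # placeholder output site
```

The task would then be submitted with `crab submit -c <configfile>`; `ignoreLocality` is the switch that lets jobs run anywhere and stream their input remotely.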

A new data tier: mini-AOD  New thin data tier  10x compression, kB-scale event size  Process 2B events with 100 slots in 24 h  800M events produced in CSA14  Updated centrally ~monthly
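As an illustration of how light the new tier is to work with, mini-AOD files can be read interactively with the FWLite Python bindings. This is a sketch only: it assumes a CMSSW 7.x environment, the file name is a placeholder, and the library-enabling call changed name between releases (see the comment).

```python
import ROOT
ROOT.gSystem.Load("libFWCoreFWLite.so")
ROOT.AutoLibraryLoader.enable()   # in later releases: ROOT.FWLiteEnabler.enable()

from DataFormats.FWLite import Events, Handle

events = Events("miniAOD_example.root")        # placeholder file name
muons  = Handle("std::vector<pat::Muon>")      # mini-AOD stores slimmed PAT objects

for i, event in enumerate(events):
    event.getByLabel("slimmedMuons", muons)    # slimmed muon collection in mini-AOD
    for mu in muons.product():
        print(mu.pt(), mu.eta())
    if i >= 9:                                 # only inspect the first 10 events
        break
```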

Multicore  Full thread-safety achieved in CMS software version 7 (CMSSW 7)  Different levels of concurrency: module and sub-module level  TBB: Intel Threading Building Blocks  Multi-core CMSSW available in nightly builds since July  Reconstruction using threads tested at the Tier-0  Work ongoing on simulation  Performance bottlenecks: o modules that must be run sequentially o lumi-block and run boundaries  Substantial gains in memory consumption and network load (ACAT2014 talk, Sexton-Kennedy)
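For reference, the threaded framework is enabled through the job's process options; a minimal sketch is shown below (the thread and stream counts are just example values, and this only runs inside a CMSSW environment).

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("RECO")

# Run four event streams on four TBB threads; these process.options
# parameters control the multi-threaded framework introduced in CMSSW 7.
process.options = cms.untracked.PSet(
    numberOfThreads = cms.untracked.uint32(4),
    numberOfStreams = cms.untracked.uint32(4),
)
```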

Multicore deployment  Scaling to large-scale multi-core production is not trivial  Production tool (WMAgent) still being tested  Many challenges: pilot optimization, job monitoring, etc.  Multi-core pilots at the CCIN2P3 T1; I believe testing is also underway at GRIF  Follow the project here: CMSMulticoreSchedulingProject

HLT as a cloud resource  With 10k cores, the HLT farm represents a massive resource  Implemented cloud middleware (OpenStack) to use the HLT farm during downtime  The heavy-ion reprocessing campaign in April 2014 was a successful test case  Steady use for production afterwards  Use of the HLT during inter-fill periods is under investigation

Prompt Reco on the cloud  Prompt reco requires nearly all of the Tier-0 CPU resources and a good fraction of the T1 CPU  Move to OpenStack-based, cloud-like virtualized resources located at CERN (1/3 of CPUs, 2/3 of disk storage) and Wigner (2/3 of CPUs, 1/3 of disk storage)  Based on the Agile Infrastructure  New method of resource allocation: GlideinWMS  Work underway to commission this system

Towards physics  1B events simulated by June 1st  Digitized with both 25 ns and 50 ns bunch spacing  Reprocessing during technical stops, e.g., to update o machine parameters o alignment and calibrations o reconstruction developments  4B MC events by the end of 2015

Resource Utilization  Tier 1: utilization beyond WLCG pledges; main usage mostly reprocessing; now open for user analysis jobs  Tier 2: also mostly beyond WLCG pledges; main usage: analysis and MC production

Resource demands  [Plots: T1 pledge breakdown; evolution of resource demands]

Conclusions  Run 2 is around the corner  It presents new challenges for computing  Lots of effort went into preparing for this challenge during LS1: o data federation o multicore processing o new grid submission tools o use of cloud resources  Large-scale production of simulation is underway  Plenty of work left to do in terms of commissioning