Maria Girone, CERN
CMS in a High-Latency Environment
- CMSSW I/O optimizations for high latency
- CPU efficiency in a real-world environment: HLT, Wigner, AAA
- Outlook

Maria Girone, CERN
What follows is the work of a number of people:
- Optimization and development by CMS Offline (especially Brian Bockelman)
  - More details in 'Optimizing High-Latency I/O in CMSSW', CHEP 2013
- CMS Commissioning and Operations (especially David Colling, James Letts, Nicolò Magini, Ian Fisk and Daniele Bonacorsi)
- All the work from the CMS data federation activity (especially Ken Bloom)

Maria Girone, CERN
- Even the two halves of the Tier-0 are separated by 30 ms
- To read data efficiently over high-latency connections, CMS has invested effort over the last three years in code improvements and I/O optimizations, using ROOT best practices plus additional improvements
- (Map on original slide: typical wide-area latencies of 130 ms, 250 ms and 300 ms)

Maria Girone, CERN
- CMS has built on the work of the ROOT team on TTreeCache
- ROOT does a good job of reading ahead the objects/branches that are frequently accessed, using a cache that works well in a local environment
- The cache training technique does not work well in a high-latency environment: during the training phase (the first 20 events) every object is read with a separate network access
- A typical CMS file has more than 1000 branches; reading half of them at 130 ms latency means the first event takes more than a minute to read, and the full training phase takes about 20 minutes (illustrated in the sketch below)
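
To make the problem concrete, the following standalone ROOT macro is a minimal sketch of the default behaviour described above; the file URL, tree name and settings are illustrative assumptions, not taken from the slide.

// default_cache.C -- sketch of ROOT's default TTreeCache learning phase.
// URL, tree name and numbers are illustrative assumptions.
#include "TFile.h"
#include "TTree.h"

void default_cache() {
   // Over the WAN, every basket not already in the cache is a remote read.
   TFile *f = TFile::Open("root://some.remote.host//store/example.root");
   TTree *t = nullptr;
   f->GetObject("Events", t);

   t->SetCacheSize(20 * 1024 * 1024);   // 20 MB TTreeCache
   t->SetCacheLearnEntries(20);         // learn from the first 20 entries, as on the slide

   for (Long64_t i = 0; i < t->GetEntries(); ++i) {
      // While the cache is still learning (entries 0-19), each branch that is
      // accessed issues its own network request; with ~1000 branches and
      // 130 ms RTT this alone can take ~20 minutes.
      t->GetEntry(i);
   }
}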

Maria Girone, CERN
- The two highest-impact techniques used by CMS both rely on building a secondary TTreeCache with a small number of calls
- To speed up the cache training, CMS loads all branches for the first 20 events in a single call (sketched below)
  - They are loaded into a small (<20 MB) memory cache, which is then used for access
- More data than needed is read for the first 20 events, but with far fewer calls than fetching individual baskets from every branch
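
A minimal sketch of the same idea using plain TTree/TTreeCache calls follows; it is an approximation of the optimization, not the CMSSW implementation, and the URL and tree name are again placeholders. All branches are registered with the cache up front, the cache is limited to the training window, and the learning phase is skipped, so the baskets for the first 20 events arrive in a few large, merged reads instead of one read per branch.

// startup_prefill.C -- sketch of "read everything for the first 20 events in
// one go"; an approximation of the CMSSW startup optimization, not its code.
#include "TFile.h"
#include "TTree.h"

void startup_prefill() {
   TFile *f = TFile::Open("root://some.remote.host//store/example.root");
   TTree *t = nullptr;
   f->GetObject("Events", t);

   t->SetCacheSize(20 * 1024 * 1024);   // <20 MB cache, as quoted on the slide
   t->SetCacheEntryRange(0, 20);        // cover only the training window
   t->AddBranchToCache("*", kTRUE);     // register *all* branches with the cache...
   t->StopCacheLearningPhase();         // ...and skip the per-branch learning

   for (Long64_t i = 0; i < 20; ++i) {
      // The first GetEntry() triggers a prefetch of all registered baskets in a
      // few large, merged requests; later reads in the window come from memory.
      t->GetEntry(i);
   }
}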

Maria Girone, CERN
- A common analysis application reads only some of the branches to make a selection (the "trigger" branches) and then writes out all the objects for a limited number of selected events
- The default TTreeCache deals well with the triggering quantities, but issues a single read for each non-trigger branch that was not in the cache
- When CMSSW requests a non-trigger branch, instead of passing the read request straight to ROOT, it creates a temporary TTreeCache and trains it on the non-trigger branches (sketched below)
- The temporary TTreeCache then fetches the objects from all of these branches in one network request
- (Diagram on original slide: ~1000 branches unused and uncached, but written out if the event passes the selection; a few branches used for the selection)
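
The sketch below imitates that access pattern with plain ROOT calls rather than the CMSSW code itself: the selection ("trigger") branches stay cached, and when an event is selected the remaining branches for that single entry are pulled in through one cache-driven bulk request. The branch name "triggerBits" and the URL are hypothetical.

// skim_cache.C -- bulk-reading the non-trigger branches only for selected
// events; an approximation of the CMSSW technique, not its actual code.
#include "TFile.h"
#include "TTree.h"

void skim_cache() {
   TFile *f = TFile::Open("root://some.remote.host//store/example.root");
   TTree *t = nullptr;
   f->GetObject("Events", t);

   t->SetCacheSize(20 * 1024 * 1024);
   t->AddBranchToCache("triggerBits", kTRUE);   // cache the selection branch only
   t->StopCacheLearningPhase();

   Int_t triggerBits = 0;
   t->SetBranchStatus("*", 0);                  // read nothing by default
   t->SetBranchStatus("triggerBits", 1);
   t->SetBranchAddress("triggerBits", &triggerBits);

   for (Long64_t i = 0; i < t->GetEntries(); ++i) {
      t->GetEntry(i);                           // cheap: selection branch only
      if (triggerBits == 0) continue;           // event not selected

      // Event selected: enable everything and let the cache fetch the baskets
      // of all remaining branches for this entry in one bulk request.
      t->SetBranchStatus("*", 1);
      t->SetCacheEntryRange(i, i + 1);
      t->AddBranchToCache("*", kTRUE);
      t->GetEntry(i);                           // ...write the full event out here...

      // Back to the cheap configuration for the following events.
      t->SetBranchStatus("*", 0);
      t->SetBranchStatus("triggerBits", 1);
   }
}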

Maria Girone, CERN
- CMS ran two tests reading 30% of the branches for every event and then reading all branches for every 50th event (the access pattern is sketched below)
- This is a sparse-selection skimming application, and it is I/O intensive
- The tests read 1000 events, once from a local server (0.3 ms RTT) and once from CERN servers accessed from Nebraska (137 ms RTT)
- The results of removing the most significant improvements are shown in the table, normalized to the local read with all improvements enabled
- The startup optimization gives a 20% improvement even in the local environment
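
For reference, the access pattern of that benchmark can be written down in a few lines; this is only a schematic of the pattern described on the slide (30% of the branches every event, all branches every 50th event, 1000 events), with a hypothetical wildcard for the selected branches.

// benchmark_pattern.C -- schematic of the test's access pattern only;
// not the actual CMS benchmark. "sel_*" is a hypothetical branch wildcard.
#include "TFile.h"
#include "TTree.h"

void benchmark_pattern(const char *url) {
   TFile *f = TFile::Open(url);          // local server, or CERN read from Nebraska
   TTree *t = nullptr;
   f->GetObject("Events", t);

   t->SetBranchStatus("*", 0);
   t->SetBranchStatus("sel_*", 1);       // ~30% of the branches, read every event

   for (Long64_t i = 0; i < 1000; ++i) { // 1000 events, as in the test
      if (i % 50 == 0) {                 // every 50th event: read all branches
         t->SetBranchStatus("*", 1);
         t->GetEntry(i);
         t->SetBranchStatus("*", 0);
         t->SetBranchStatus("sel_*", 1);
      } else {
         t->GetEntry(i);                 // only the selection branches
      }
   }
}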

Maria Girone, CERN
- We use CPU efficiency (defined below) to demonstrate how well resources are used
  - It is not a perfect metric, because it is coupled with the speed of the CPU
  - A high value only tells us that there is sufficient bandwidth for the particular resource being measured
  - A low value tells us the application is waiting for data
- We look at three environments:
  - HLT at CERN Point 5 (P5): 60 Gb/s link from P5 to the computer centre (low latency, 0.7 ms)
  - Wigner in Budapest: 200 Gb/s link from Wigner to the computer centre (medium latency, 30 ms)
  - AAA, with large variations of latency, up to 300 ms
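
For concreteness, the metric referred to above is the usual job-monitoring ratio; this definition is standard WLCG practice rather than something spelled out on the slide:

$$\text{CPU efficiency} = \frac{T_{\text{CPU}}}{N_{\text{cores}} \times T_{\text{wall}}}$$

with N_cores = 1 for single-core jobs. A value near 1 means the cores are kept busy; a low value means they are waiting, typically for data.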

Maria Girone, CERN
- The High Level Trigger (HLT) farm is 6k cores, expected to grow to ~15k cores, representing about 40% of the total Tier-1 resources
  - The same order of magnitude as the Tier-0 AI
- It is a resource fully available only during shutdown periods, and it is fully configured as an OpenStack cloud
- We will try to use it opportunistically during the year, during inter-fill periods and machine studies
- CPU efficiency is close to 100% for reconstruction applications

Maria Girone, CERN
- CERN split the Tier-0 between two physical centres: the computer centre (CC) in Meyrin and Wigner in Budapest
- Physical disk resources are currently located entirely in Meyrin, so any CPU in Wigner reads with 30 ms latency
- We measure the results with the dashboard logs, comparing similar applications running at CERN and at Wigner
  - Same OS (SLC6), and all virtual machines
- Production jobs (little data read): 3% increase; analysis jobs: 6% drop

Maria Girone, CERN
- US-CMS pioneered this effort with the "Any Data, Anytime, Anywhere" (AAA) project
- The federation should allow data serving to be shared with processing resources across sites
- By summer 2014, CMS will complete the deployment and testing of the data federation in preparation for Run 2
  - All Tier-1s and 90% of Tier-2s serving data
  - Nearly all files from data collected or derived in 2015 should be accessible interactively (an example of such access is sketched below)
- Scale tests are currently ongoing: file opening at 250 Hz! Now moving to file-reading scale tests
- Desired capability: access 20% of data across the wide area; 200k jobs/day, 60k files/day, O(100 TB)/day
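
From the application's point of view, reading through the federation is just an xrootd URL pointing at a redirector instead of a local path; the sketch below uses a placeholder redirector hostname and file path, not values from the slide.

// aaa_open.C -- opening a file through an xrootd data federation.
// Hostname and path are illustrative placeholders.
#include "TFile.h"
#include "TTree.h"
#include <iostream>

void aaa_open() {
   // The redirector locates a site that holds the file and sends the client
   // there; the application code is identical to reading a local file.
   TFile *f = TFile::Open(
      "root://xrootd-redirector.example.org//store/data/Run2015/Example/file.root");
   if (!f || f->IsZombie()) {
      std::cerr << "could not open the file via the federation\n";
      return;
   }
   TTree *t = nullptr;
   f->GetObject("Events", t);
   if (t) std::cout << "entries: " << t->GetEntries() << "\n";
}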

Maria Girone, CERN
- About 7% of CMS analysis jobs read data (at least one file) over the WAN during 2013
- This includes jobs deliberately "overflowed" to other sites, as well as "fallback" for locally unreadable files

Maria Girone, CERN
- There is a slight efficiency cost for remote access
- However, "fallback" saves jobs which would otherwise have failed and wasted CPU time…
- And "overflow" uses CPU cycles which might otherwise have sat idle, allowing tasks to be completed more quickly

Maria Girone, CERN
- CMSSW makes use of advanced caching techniques (available to the community)
  - The first technique takes advantage of understanding how ROOT works in a high-latency environment
  - The second technique takes advantage of understanding the data access pattern of CMS analysis applications
- These improvements have allowed CMS to achieve good CPU efficiency when reading in environments with O(100 ms) latencies (AAA, within regions, …)
- We are confident we can increase the scale of remote data access over the WAN to the targeted 20%
- CMS would like to make use of data federations for production use cases too, allowing shared production workflows to use CPU at multiple sites with data served over the data federation
- CMS would like the data federation to be an enabling technology for physics discovery in 2015

Maria Girone, CERN

Assuming mostly local access:
- CMS refreshes the data samples roughly twice per year. A nominal Tier-2 site has ~1 PB of disk space; refreshing 1 PB of disk in 10 days requires nearly all of a 10 Gb/s link (see the arithmetic below)
- A user or group in CMS frequently requests samples of O(10 TB). If we say a user is willing to wait 24 hours for the transfer to complete, this also requires ~2.5 Gb/s of networking. A nominal Tier-2 supports 40 users, and if each user made only one request per business day per month, on average 5 Gb/s would be needed
- Assuming some provisioning factor, we can say a nominal Tier-2 site in CMS needs 10 Gb/s of networking for 14 kHS06 of processing and 1 PB of disk. If each of the 7 Tier-1s should support 5 nominal Tier-2s, the export rate from a Tier-1 should evolve to 50 Gb/s during Run 2
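
As a quick check of the first figure (my arithmetic, not shown on the slide), moving 1 PB in 10 days corresponds to

$$\frac{10^{15}\,\text{bytes} \times 8\,\text{bit/byte}}{10 \times 86\,400\,\text{s}} \approx 9.3\ \text{Gb/s},$$

i.e. essentially the full capacity of a 10 Gb/s link.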

Maria Girone, CERN
Assuming data federation access:
- The average read rate of analysis applications is 300 kB/s
  - Averaged over everything users submit to the grid
  - Reconstruction applications are similar, including RAW data, code, and conditions
- A nominal Tier-2 in 2015 has 2k cores. For analysis the average reading rate is 300 kB/s, so if half the site were performing analysis, ~2.5 Gb/s would be needed to keep 1000 cores busy (see below)
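
The last number follows directly (again my arithmetic, consistent with the slide's figures):

$$1000\ \text{cores} \times 300\ \text{kB/s} = 300\ \text{MB/s} = 2.4\ \text{Gb/s} \approx 2.5\ \text{Gb/s}.$$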