1 LHCbComputing: LHCb computing model in Run 1 & Run 2. Concezio Bozzi, Bologna, Feb 19th 2015

2 LHCb Computing Model (TDR)
- RAW data: 1 copy at CERN, 1 copy distributed (6 Tier1s)
  - First pass reconstruction runs democratically at CERN + Tier1s
  - End of year reprocessing of the complete year's dataset
    - Also at CERN + Tier1s
- Each reconstruction followed by a "stripping" pass
  - Event selections by physics groups, several thousand selections in ~10 streams
  - Further stripping passes scheduled as needed
- Stripped DSTs distributed to CERN and all 6 Tier1s
  - Input to user analysis and further centralised processing by analysis working groups
    - User analysis runs at any Tier1
    - Users do not have access to RAW data or unstripped reconstruction output
- All disk located at CERN and Tier1s
  - Tier2s dedicated to simulation
    - And to analysis jobs requiring no input data
  - Simulation DSTs copied back to CERN and 3 Tier1s

3 Problems with TDR model
- Tier1 CPU power sized for end of year reprocessing
  - Large peaks, increasing with accumulated luminosity
- Processing model makes inflexible use of CPU resources
  - Only simulation can run anywhere
- Data management model very demanding on storage space
  - All sites treated equally, regardless of available space

4 Changes to processing model in 2012
- In 2012, doubling of integrated luminosity compared to 2011
  - New model required to avoid doubling Tier1 power
- Allow reconstruction jobs to be executed on a selected number of Tier2 sites (a sketch of this workflow follows below)
  - Download the RAW file (3 GB) from a Tier1 storage
  - Run the reconstruction job at the Tier2 site (~24 hours)
  - Upload the reconstruction output file to the same Tier1 storage
- Rethink first pass reconstruction & reprocessing strategy
  - First pass processing mainly for monitoring and calibration
    - Used also for fast availability of data for 'discovery' physics
  - Reduce first pass to < 30% of the RAW data bandwidth
    - Used exclusively to obtain final calibrations within 2-4 weeks
  - Process the full bandwidth with a 2-4 week delay
    - Makes the full dataset available for precision physics without the need for end of year reprocessing
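
The Tier2 reconstruction workflow above amounts to a download-process-upload loop. The sketch below is illustrative only: the helper functions are placeholders invented for this example, not the real DIRAC/LHCbDIRAC API.

```python
# Illustrative sketch of the 2012 Tier2 reconstruction workflow.
# The helpers below are placeholders, NOT the real DIRAC/LHCbDIRAC implementation.
import tempfile
from pathlib import Path


def download_raw(raw_lfn: str, tier1_se: str, dest: Path) -> None:
    # Placeholder: in reality the ~3 GB RAW file is fetched from the Tier1 storage element.
    dest.write_bytes(b"")


def run_reconstruction(raw_file: Path, reco_out: Path) -> None:
    # Placeholder: the real reconstruction job runs for roughly 24 hours at the Tier2.
    reco_out.write_bytes(b"")


def upload_output(reco_out: Path, tier1_se: str) -> None:
    # Placeholder: the FULL.DST output is uploaded back to the *same* Tier1 storage.
    pass


def process_raw_at_tier2(raw_lfn: str, tier1_se: str) -> None:
    with tempfile.TemporaryDirectory() as workdir:
        raw_file = Path(workdir) / "input.raw"
        reco_out = Path(workdir) / "output.full.dst"
        download_raw(raw_lfn, tier1_se, raw_file)
        run_reconstruction(raw_file, reco_out)
        upload_output(reco_out, tier1_se)


if __name__ == "__main__":
    process_raw_at_tier2("/lhcb/data/2012/RAW/example.raw", "CERN-RAW")  # hypothetical names
```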

5 "Reco14" processing of 2012 and 2011 data
- 45% of the reconstruction CPU time was provided by 44 additional Tier2 sites, but also by resources outside WLCG (Yandex)
- "Stop and go" for the 2012 data, as it needed to wait for calibration data from the first pass processing
- The power required for continuous processing of the 2012 data is roughly equivalent to the power required for reprocessing the 2011 data at the end of the year
[Plot: 2012 data and 2011 data]

6 2015: suppression of reprocessing
- During LS1, major redesign of the LHCb HLT system
  - HLT1 (displaced vertices) will run in real time
  - HLT2 (physics selections) deferred by several hours
    - Run continuous calibration in the online farm to allow use of calibrated PID information in HLT2 selections
    - HLT2 reconstruction becomes very similar to offline
- Automated validation of online calibration for use offline
  - Includes validation of alignment
  - Removes the need for "first pass" reconstruction
- Green light from validation triggers the 'final' reconstruction
  - Foresee up to two weeks' delay to allow correction of any problems flagged by the automatic validation
  - No end of year reprocessing, just restripping
- If resources are insufficient, foresee 'parking' a fraction of the data for processing after the run
  - Unlikely to be needed before 2017, but commissioned from the start

7 Going beyond the Grid paradigm
- In 2014, ~10% of CPU resources came from the LHCb HLT and Yandex farms
- Vac infrastructure
  - Virtual machines created and contextualised for virtual organisations by remote resource providers
- Clouds
  - Virtual machines running on cloud infrastructures, collecting jobs from the LHCb central task queue
- Volunteer computing
  - Use of the BOINC infrastructure to enable payload execution on arbitrary compute resources
- DIRAC allows easy integration of non-WLCG resources

8 Changes to data management model
- Increases in trigger rate and an expanded physics programme put strong pressure on storage resources
- Tape shortages mitigated by a reduction in archive volume
  - Archives of all derived data exist as a single tape copy
    - Forced to accept the risk of data loss
  - Re-introduce a second tape copy in Run 2, to cope with data preservation "obligations"
    - Regeneration in case of data loss is an operational nightmare and an overload of computing resources
- Disk shortages addressed by:
  - Introduction of disk at Tier2s
  - Reduction of the event size in derived data formats
  - Changes to data replication and data placement policies
  - Measurement of data popularity to guide decisions on replica removals

9 Tier2Ds
- Tier2Ds are a limited set of Tier2 sites which are allowed to provide disk capacity for LHCb
  - Introduced in 2013 to circumvent the shortfall of disk storage
    - Provide disk storage for physics analysis files (MC and data)
    - Run user analysis jobs on the data stored at these sites
- Blurs even more the functional distinction between Tier1 and Tier2
  - A large Tier2D is a small Tier1 without tape
- Status (Jan 18th 2015): 2.4 PB available, 0.83 PB used
[Plots: Tier2D disk usage, Monte Carlo and real data]

10 Data formats
- The highly centralised LHCb data processing model allows data formats to be optimised for operational efficiency
- Large shortfalls in disk and tape storage (due to larger trigger rates and an expanded physics programme) drive efforts to reduce the data formats for physics:
  - DST used by most analyses in 2010 (~120 kB/event)
    - Contains a copy of RAW and the full reconstruction information
  - Strong drive towards the μDST (~13 kB/event); a back-of-the-envelope size comparison follows below
    - Saves information for the signal only
    - Suitable for most exclusive analyses, but many iterations required to get the content correct
    - User-defined data can be added on demand (tagging, isolation, ...)
  - "Legacy" stripping campaign of Run 1 data just completed
    - Will allow the μDST to be tested
  - MDST.DST == FULL.DST of all events passing a μDST stream; a temporary format (2015-2016) to allow regeneration of the μDST in case of missing information, without running the stripping again
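
To see why the move from DST to μDST matters for disk space, here is a back-of-the-envelope comparison using the per-event sizes quoted above (~120 kB vs ~13 kB). The event count is purely an illustrative assumption, not an LHCb figure.

```python
# Rough storage comparison of DST vs microDST, using the per-event sizes from this slide.
# The number of events is an illustrative assumption, not an official LHCb number.
DST_KB_PER_EVENT = 120.0
MDST_KB_PER_EVENT = 13.0

events = 10e9  # assume 10 billion stripped events, for illustration only

dst_pb = events * DST_KB_PER_EVENT / 1e12   # kB -> PB (1 PB = 1e12 kB)
mdst_pb = events * MDST_KB_PER_EVENT / 1e12

print(f"DST:  {dst_pb:.2f} PB per disk replica")   # ~1.2 PB
print(f"uDST: {mdst_pb:.2f} PB per disk replica")  # ~0.13 PB, roughly a factor 9 smaller
```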

11 Data placement of (μ)DSTs
- Data-driven automatic replication
  - Archive systematically all analysis data (T1D0)
  - Real data: 4 disk replicas, 1 (→ 2) archives
  - MC: 3 disk replicas, 1 (→ 2) archives
- Selection of disk replication sites (a sketch of the weighted choice follows below):
  - Keep whole runs together (for real data)
    - Random choice per file for MC
  - Choose the storage element depending on free space
    - Random choice, weighted by the free space
    - Should avoid disk saturation
      - Exponential fall-off of the free space
      - As long as there are enough non-full sites!
- Removal of replicas
  - For processing n-1: reduce to 2 disk replicas (chosen randomly)
    - Possibility to preferentially remove replicas from sites with less free space
  - For processing n-2: only keep the archive replicas
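
The free-space-weighted site selection described above can be sketched in a few lines. This is not the actual DIRAC data-management code; the site names and free-space values are invented for the example.

```python
# Minimal sketch of choosing a disk storage element with probability proportional
# to its free space, as described on the slide. Not the real DIRAC implementation.
import random


def choose_storage_element(free_space_tb: dict[str, float]) -> str:
    """Pick one SE at random, weighted by its free space, skipping full sites."""
    candidates = {se: free for se, free in free_space_tb.items() if free > 0}
    sites = list(candidates)
    weights = list(candidates.values())
    return random.choices(sites, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Free space in TB, invented for illustration. Sites with more free space are
    # chosen more often, so free space drains roughly exponentially everywhere
    # and no single site saturates first.
    free_space = {"CERN": 350.0, "CNAF": 120.0, "GRIDKA": 80.0, "RAL": 0.0}
    print(choose_storage_element(free_space))
```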

12 Data popularity
- Recording of popularity information enabled as of May 2012
- Information recorded for each job:
  - Dataset (path)
  - Number of files used by the job
  - Storage element used
- Currently allows unused datasets to be identified by visual inspection
- Plan (a sketch of the summary follows below):
  - Establish, per dataset:
    - Last access date
    - Number of accesses in the last n months (1 < n < 12)
    - Number of dataset accesses normalised to its size
  - Prepare summary tables per dataset:
    - Access summary (above)
    - Storage usage at each site
  - Allow replica removal to be triggered when space is required
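
The per-dataset summary outlined in the plan above (last access date, accesses in the last n months, normalised to dataset size) could be computed along the following lines. The access-record layout is assumed for illustration and is not the real LHCb accounting schema.

```python
# Sketch of the per-dataset popularity summary described on the slide.
# The access-record layout below is an assumption made for this example.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Access:
    dataset: str   # dataset path
    when: date     # date of the job access
    n_files: int   # number of files read by the job


def popularity_summary(accesses: list[Access], dataset_size_tb: dict[str, float],
                       months: int = 6, today: date = date(2015, 2, 19)) -> dict:
    """Per dataset: last access date, accesses in the last n months, accesses per TB."""
    window_start = today - timedelta(days=30 * months)
    summary = {}
    for path, size_tb in dataset_size_tb.items():
        hits = [a for a in accesses if a.dataset == path]
        recent = sum(a.n_files for a in hits if a.when >= window_start)
        summary[path] = {
            "last_access": max((a.when for a in hits), default=None),
            f"accesses_last_{months}m": recent,
            "accesses_per_tb": recent / size_tb if size_tb else 0.0,
        }
    return summary
```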

13 Examples of popularity plots
[Plots: popularity of 2011 and 2012 datasets]
- We are also working on a classifier that, based on all metadata and the popularity history, classifies datasets into those that are likely to be used within the next n weeks and those that are not (a sketch follows below).
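
A minimal sketch of such a dataset-usage classifier is shown below, assuming scikit-learn; the feature set (dataset age, size, recent access counts) and the tiny training sample are invented for illustration, and the real work lies in building features and labels from the LHCb popularity records.

```python
# Minimal sketch of a "will this dataset be accessed in the next n weeks?" classifier.
# Assumes scikit-learn; features and training data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Features per dataset: [age_weeks, size_tb, accesses_last_13w, accesses_last_26w]
X = np.array([
    [10, 2.0, 120, 200],
    [80, 5.0,   0,   3],
    [30, 1.0,  40,  60],
    [150, 8.0,  0,   0],
])
y = np.array([1, 0, 1, 0])  # 1 = accessed within the following n weeks, 0 = not

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[20, 3.0, 10, 15]]))  # predicted usage for a new dataset
```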

14 Usage of datasets
[Plots: dataset usage over the last 13, 26 and 52 weeks]
- ~1 PB of disk space recovered by purging old data

15 Summary of 2015-2017 requests
[Tables: running assumptions, CPU and disk requests]

16 Breakdown of CPU requests
[Table: CPU requests for pp and heavy-ion (HI) running]

17 Breakdown of DISK requests
[Table: disk requests for pp and heavy-ion (HI) running]

18 Tape
[Table: tape requests for pp and heavy-ion (HI) running]
- Please note: WLCG estimates of tape costs include a 10% cache disk. This is too large for our purposes.

19 Comparison with "flat budget"
- Definition of flat budget: the same money will buy
  - 20% more CPU
  - 15% more disk
  - 25% more tape
- CPU projections ~OK to 2017
- Disk ~OK
- Tape explodes
(A simple projection of this compounding is sketched below.)
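
The "flat budget" assumption turns into a simple compound projection: each year the same money buys 20% more CPU, 15% more disk and 25% more tape. The snippet below only illustrates that arithmetic; the 2015 starting capacities are normalised placeholders, not the actual LHCb requests.

```python
# Compound "flat budget" projection: each year the same money buys 20% more CPU,
# 15% more disk and 25% more tape. Starting capacities are placeholders (normalised
# to 1.0 in 2015), not the real LHCb requests.
GROWTH = {"cpu": 1.20, "disk": 1.15, "tape": 1.25}
affordable = {"cpu": 1.0, "disk": 1.0, "tape": 1.0}

for year in (2016, 2017):
    affordable = {k: v * GROWTH[k] for k, v in affordable.items()}
    print(year, {k: round(v, 2) for k, v in affordable.items()})
# 2016 {'cpu': 1.2, 'disk': 1.15, 'tape': 1.25}
# 2017 {'cpu': 1.44, 'disk': 1.32, 'tape': 1.56}
```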

20 Mitigation strategies
- The increase in the tape request is beyond the flat-budget expectation
  - Ask resource providers for advance purchases in order to ease the ramp-up
  - Trade some other resources for tape
    - But the lever arm is short!
- Remove the second tape copy of derived datasets?
  - Regeneration of even a small portion of the data implies massive tape recalls and computing load, which might jeopardise other production activities
- Continue developing data popularity algorithms and data placement strategies
  - Potentially significant savings on disk space
- Continue using available CPU resources "parasitically"

21 LHCbComputing: LHCb computing plans for Run 3

22 Towards the LHCb Upgrade (Run 3, 2020)
- We do not plan a revolution for LHCb Upgrade computing
- Rather an evolution to fit the following boundary conditions:
  - Luminosity levelling at 2×10^33
    - Factor 5 compared to Run 2
  - 100 kHz HLT output rate for the full physics programme
    - Factor 8-10 more than in Run 2
  - Flat funding for offline computing resources
- Computing milestones for the LHCb upgrade (shift proposed to gain experience from 2015-2016 data taking):
  - TDR: 2017 Q1, shifting to 2017 Q3
  - Computing model: 2018 Q3, shifting to 2019 Q1
- Therefore only brainstorming at this stage, to devise a model that stays within the boundary conditions

23 Run 2: computing resources
- Tape requirement driven by:
  - Two copies of RAW
    - Incompressible, but ~never accessed
  - One copy of the reconstruction output (FULL.DST)
- Flat funding cannot accommodate the order of magnitude more data expected in Run 3: need new ideas
- Reminder of the flat budget: the same money will buy 20%, 15%, 25% more CPU, disk, tape
  - CPU projections ~OK to 2017; disk ~OK; tape explodes

24 Evolution of the LHCb data processing model
- Run 1:
  - Loose selection in the HLT (no PID)
  - First pass offline reconstruction
  - Stripping
    - Selects ~50% of the HLT output rate for physics analysis
  - Offline calibration
  - Reprocessing and restripping
- Run 2:
  - Online calibration
  - Deferred HLT2 (with PID)
  - Single pass offline reconstruction
    - Same calibration as the HLT
    - No reprocessing before LS2
  - Stripping and restripping
    - Selects ~90% of the HLT output rate for physics analysis
- Given sufficient resources in the HLT farm, the online reconstruction could be made ~identical to offline

25 Run 2: reconstruction streams
- Full stream: prompt reconstruction as soon as the RAW data appears offline
- Parked stream: safety valve, probably not needed until 2017
- Turbo stream: no offline reconstruction, analysis objects produced in the HLT
  - Important test for Run 3

26 TurboDST: brainstorming for Run 3
- In Run 2, the online (HLT) reconstruction will be very similar to offline (same code, same calibration, fewer tracks)
  - If it can be made identical, why then write RAW data out of the HLT rather than the reconstruction output?
- In Run 2 LHCb will record 2.5 kHz of "TurboDST"
  - RAW data plus the result of the HLT reconstruction and HLT selection
  - Equivalent to a microDST (MDST) from the offline stripping
- Proof of concept: can a complete physics analysis be done based on an MDST produced in the HLT?
  - i.e. no offline reconstruction
    - No offline realignment, reduced opportunity for PID recalibration
  - RAW data remains available as a safety net
- If successful, can we drop the RAW data?
  - The HLT writes out ONLY the MDST???
- Currently just ideas, but this would allow a 100 kHz HLT output rate without an order of magnitude more computing resources (a rough volume comparison follows below).
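
To illustrate why writing only an HLT-produced MDST makes a 100 kHz output rate conceivable, one can compare the yearly data volume for MDST-sized events against RAW-plus-reconstruction-sized ones. All the numbers below are rough assumptions made for the sake of the arithmetic (event sizes loosely based on the Run 1 figures quoted earlier, live time guessed), not LHCb projections.

```python
# Back-of-the-envelope volume comparison for a 100 kHz HLT output rate.
# ALL numbers are assumptions for illustration, not LHCb projections.
HLT_RATE_HZ = 100e3
LIVE_SECONDS_PER_YEAR = 5e6     # assumed effective data-taking time per year
MDST_KB = 13.0                  # microDST-like event written by the HLT (Run 1 figure)
RAW_PLUS_RECO_KB = 100.0        # assumed RAW + reconstruction output per event

events = HLT_RATE_HZ * LIVE_SECONDS_PER_YEAR
for label, kb in [("MDST only", MDST_KB), ("RAW + reco", RAW_PLUS_RECO_KB)]:
    print(f"{label}: {events * kb / 1e12:.1f} PB/year")
# MDST only: 6.5 PB/year
# RAW + reco: 50.0 PB/year
```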

27 Simulation
- LHCb offline CPU usage is dominated by simulation
  - Already true in Run 2: simulation > 60% of CPU needs in 2016
    - Many measurements start to be limited by simulation statistics
- Simulation is suited for execution on heterogeneous resources
  - Pursue efforts to interface the DIRAC framework to multiple computing platforms
    - Allow opportunistic and scheduled use of new facilities
  - Extend use of the HLT farm during LHC stops
- Several approaches to reduce the CPU time per event
  - Code optimisation, vectorisation, etc.
    - Contribute to and benefit from community-wide activities, e.g. for faster transport
  - Fast simulations
    - Not appropriate for many detailed studies for LHCb precision measurements
    - Nevertheless many generator-level studies are possible
  - Hybrid approach
    - Full simulation for signal candidates only
    - Fast techniques for the rest, e.g. skip the calorimeter simulation for out-of-time pileup
- To avoid being limited by disk space
  - Deploy the MDST format also for simulated data

28 Conclusions
- The LHCb event output rate will be an order of magnitude larger in Run 3 (2020)
- Currently brainstorming on ideas for reducing the data rate without reducing the physics reach
  - Run 2 as a test bed
- Computing efforts concentrated on:
  - Code optimisation, e.g. vectorisation and GPU usage in the HLT
  - Opportunistic use of diverse resources

