Status report from the Deferred Trigger Study Group
John Baines, Giovanna Lehmann Miotto, Wainer Vandelli, Werner Wiedenmann, Eric Torrence, Armin Nairz
Use cases

Deferred Triggers: a subset of events is stored in the DAQ system & processed later in the run, in a separate stream - potentially useful when CPU is saturated at the start of a fill.

Broadly two different classes of use case:
- Deferred HLT processing: deferred stream based on L1. Build the event, cache it, then run the HLT later.
  - Caching at ~5-10 kHz, deferred processing ~1 s/event, rejection ~50-100
  - High cache rate => need high replay rate => shorter per-event processing time
  - EB rate for deferred + prompt must fit in the budget (~20 kHz for 2nd-generation ROS)
  - e.g. cache all L1 multi-jet events (3-5 kHz for 4J20 at 2-3x10^34) & run topoclustering
  - e.g. cache L1 multi-jet and/or high-pT dilepton triggers & run HLT tracking for a displaced-vertex trigger
- Post-HLT processing: deferred stream based on the HLT result. Very similar to the L4 case.
  - Caching at ~0.5-1 kHz, deferred processing ~10 s/event, rejection ~5-10
  - Lower cache rate => lower replay rate => longer per-event processing time allowed
  - Could be used to increase efficiency for the same Tier-0 rate: apply a looser selection in the HLT, then the deferred trigger runs a slower offline-style selection & applies tighter cuts
  - e.g. a deferred stream for triggers requiring full-event EF tracking, e.g. MET, b-jet, tau
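As a rough cross-check of why these streams must be deferred rather than processed live, a minimal sketch (Python; the helper and its outputs are illustrative, while the rates, times, and rejections are the figures above):

```python
# Back-of-envelope check: processing these streams live at the quoted rates
# would need thousands of dedicated cores, hence the deferral.
def deferred_load(cache_rate_hz, proc_time_s, rejection):
    """Return (rate surviving selection [Hz], cores needed to keep up live)."""
    return cache_rate_hz / rejection, cache_rate_hz * proc_time_s

# Deferred HLT processing: cache on L1 at ~5 kHz, ~1 s/event, rejection ~50
print(deferred_load(5_000, 1.0, 50))    # -> (100.0, 5000.0)

# Post-HLT processing: cache on HLT result at ~1 kHz, ~10 s/event, rejection ~10
print(deferred_load(1_000, 10.0, 10))   # -> (100.0, 10000.0)
```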
Deferred Trigger Processing Options

Processing options:
- Inter-fill processing: only process the deferred stream between physics fills
  - Different processes from the normal prompt triggers (file-based, like offline debug-stream recovery)
  - Started between fills when the farm is relatively idle; stopped when a new fill starts
  => Baseline option
- In-fill plus inter-fill processing: attempt to also make use of spare CPU capacity later in the run
  - Little gain if the LHC uses luminosity levelling
  - Could be in competition with end-of-fill triggers
  - Dynamic partitioning of the HLT farm: different processes for normal prompt processing & for processing cached events
    - Need to dynamically vary the partitioning as CPU usage for prompt processing changes
    - Delays to reconfigure the partition and start/stop processes
    - Significant difficulties for DAQ
    => Disfavoured option
  - Variable deferral fraction: still only inter-fill processing of cached events, but add the ability to process some fraction of the deferred triggers promptly in the normal trigger processes
    - Mechanism similar to pre-scales used to update the fraction of events cached during the run
    - Relatively small change for online, but events from the same LB are split between prompt and deferred files
    => Significant additional complexity for Tier-0 => Disfavoured option
Storage options

Distributed storage: local disks of the HLT nodes
+ Potentially large, ~1600 TB
+ Distributed => play-back not limited by data rates from disk
- Not RAID disk => not secure
- Book-keeping & operational difficulties; can't balance the load for playback
=> Disfavoured option

Central storage: expand the existing SFO
+ Secure storage; much higher fault tolerance
+ Can balance the load across the farm during play-back
+ Straightforward book-keeping
+ Minimizes the changes needed to the current system
- Playback limited to data rates of ~5 GB/s (2.5 kHz event rate)
=> Baseline option

Clustered storage: per-rack SFO-like disk-servers
+ Lower number of disks than the distributed scheme => retains some of the advantages of the central scheme
+ More distributed than the central scheme => higher playback rates
=> Solution for higher rates
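The 2.5 kHz quoted for central storage follows directly from the bandwidth limit once an event size is assumed; a quick check (Python; the ~2 MB event size is an inference from the rate/bandwidth pairs used throughout these slides, e.g. 1 kHz paired with 2 GB/s on the requirements slide):

```python
# Playback rate implied by the central-storage read-bandwidth limit.
# The ~2 MB event size is an assumption consistent with the rate/bandwidth
# pairs quoted elsewhere in this deck.
EVENT_SIZE_BYTES = 2e6
playback_bandwidth = 5e9                       # bytes/s
print(playback_bandwidth / EVENT_SIZE_BYTES)   # -> 2500.0 Hz, i.e. 2.5 kHz
```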
Order of Magnitude Cost Estimate

Baseline: inter-fill processing, central storage
- 1 kHz caching rate, 2.5 kHz playback
- 8 s/event processing time
- 210 TB disk cache
- Hardware cost: ~100 kCHF
- Possible use case: EF full-scan for MET/tau/b-jet

High-rate system (baseline x 10): inter-fill processing, clustered storage
- 10 kHz caching rate, 25 kHz playback
- 0.8 s/event processing time (a factor 10 less)
- 2100 TB disk cache
- Hardware cost: ~1 MCHF
- Possible use cases:
  - Multi-jets: topoclustering
  - Displaced-vertex trigger: L2 ID full-scan for multi-jets or high-pT muons

Processing power equivalent to 40% of the current farm capacity (for both systems).
Wall-time to process: <30 hours based on 2012 fill data; could be longer for a more efficient LHC.

Effort needed: 3.5 SY for online sw infrastructure changes + 0.25 SY for Tier-0 sw infrastructure changes
- excludes the effort to develop, configure & install hardware, and operational effort
Time-scale: 1 year for sw development + commissioning during an extended break, e.g. a winter shutdown
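The disk-cache sizes scale straightforwardly with the caching rate; a minimal sizing sketch (Python; the ~2 MB event size and the ~29 hours of back-to-back caching are assumptions chosen to be consistent with the 210 TB / 2100 TB figures above):

```python
# Rough disk-cache sizing: the cache fills at (caching rate x event size) and
# is only drained between fills, so the worst case is roughly the data cached
# over the longest stretch of back-to-back fills.
EVENT_SIZE_BYTES = 2e6      # ~2 MB/event (assumption)
HOURS_CACHED = 29           # ~a day of back-to-back fills (assumption)

for cache_rate_hz in (1_000, 10_000):   # baseline, high-rate
    tb = cache_rate_hz * EVENT_SIZE_BYTES * HOURS_CACHED * 3600 / 1e12
    print(f"{cache_rate_hz / 1000:.0f} kHz -> ~{tb:.0f} TB")
# -> "1 kHz -> ~209 TB" and "10 kHz -> ~2088 TB", close to the figures above
```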
Summary

A deferred stream could have significant benefits for a CPU-limited farm, BUT:
- Deferred-stream processing is only suitable for specific use cases (low rate, high processing time) - much less flexible than normal prompt processing
  - Preferable to address the need for added CPU by upgrading nodes or adding racks
- Significant cost: both hardware & effort

The preferred scheme is inter-fill processing:
- In-fill processing is unattractive due to the added complexity: online for dynamic partitioning, or offline for the variable deferral fraction
- Central or clustered storage preferred

A baseline infrastructure could provide:
- up to 2.5 kHz deferred-stream rate
- 8 s/event for processing
- processing completed within 48 hrs (under 2012 operating conditions)
In the case of a more efficient LHC, the deferred-stream rate would need to be lowered.
Additional Material
Introduction

Deferred Triggers: a subset of events stored in the DAQ system & processed later in the run.

Two processing options considered:
- Inter-fill processing: only process the deferred stream between physics fills
- Dynamic processing: process both in-fill and inter-fill - attempt to also make use of spare CPU capacity later in the run
  - Potential competition with end-of-fill triggers
  - ~50% decrease after 4 hours
Assumptions

Events are built before being cached
- may contain an intermediate HLT result if the HLT is run before caching

The deferred stream consists of a specific subset of triggers:
- must not include triggers needed by the calibration stream to produce constants for bulk processing

Deferred triggers are output to a separate stream. The deferred stream needs:
- Different constants - possibly from a different run
- Separate monitoring - relates to past, not current, conditions
- Independence from the state of the on-going run

Separate processes are needed for deferred-stream processing; file-based processing is the most straightforward.

Need to partition the farm between prompt and deferred processing and dynamically balance resources:
- Relatively straightforward in the inter-fill scheme
- Difficult in the dynamic scheme
=> The inter-fill scheme is the baseline
Disk size & Total Processing time

Inter-fill scheme: includes the delays due to the pausing of reprocessing during subsequent physics fills.

[Plots: Disk Usage by Deferred Stream (TB); Wall-time to process deferred stream (hours)]

Results of Eric's model based on 2012 fill information.
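A toy version of what such a model does - accumulate during fills, drain between them - might look like this (Python; the fill schedule and all names are illustrative, not Eric's actual model, which used the recorded 2012 fill data):

```python
# Toy inter-fill model: the cache grows at the caching rate during fills and
# drains at the playback rate between them.
EVENT_MB = 2.0   # assumed event size

def simulate(schedule, cache_hz, playback_hz):
    """schedule: list of (in_fill, duration_s) segments.
    Returns (peak disk usage [TB], wall-time to finish processing [h])."""
    backlog = peak = t = 0.0         # backlog counted in events
    for in_fill, dt in schedule:
        rate = cache_hz if in_fill else -playback_hz
        backlog = max(0.0, backlog + rate * dt)
        peak = max(peak, backlog)
        t += dt
    t += backlog / playback_hz       # drain what is left after the last fill
    return peak * EVENT_MB / 1e6, t / 3600.0

# e.g. three 12-hour fills separated by 4-hour inter-fill gaps
schedule = [(True, 12 * 3600), (False, 4 * 3600)] * 3
print(simulate(schedule, cache_hz=1_000, playback_hz=2_500))
```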
Time to process

[Plots of the time to process for three configurations: cache 0.5 kHz / playback 2.5 kHz; cache 1 kHz / playback 2.5 kHz; cache 2.5 kHz / playback 2.5 kHz]

Inter-fill scheme: includes the delays due to the pausing of reprocessing during subsequent physics fills.
Disk Usage

[Plots of the disk usage for three configurations: cache 0.5 kHz / playback 2.5 kHz; cache 1 kHz / playback 2.5 kHz; cache 2.5 kHz / playback 2.5 kHz]

Inter-fill scheme: includes the effect of the delays due to the pausing of reprocessing during subsequent physics fills.
Requirements - some examples: inter-fill processing

| Caching [kHz (GB/s)] | Playback [kHz (GB/s)] | Max. wall-time to process [h] | Max. disk usage [TB] | Average HLT processing time [s/event] | Effective increase in farm processing capacity |
|---|---|---|---|---|---|
| 0.5 (1) | 2.5 (5) | 23 | 85 | 8 | 20% |
| 1 (2) - Baseline | 2.5 (5) | 29 | 210 | 8 | 40% |
| 2.5 (5) | 2.5 (5) | 49 | 660 | 8 | 100% |
| 10 (20) - High-rate | 25 (50) | 29 | 2100 | 0.8 | 40% |
| 10 (20) | 10 (20) | 49 | 2640 | 2 | 100% |

Wall-time and disk usage are from the model. Average HLT processing time = 20k cores / playback rate. Effective increase = HLT processing time x caching rate / 20k = caching rate / playback rate.
The 10 kHz rows require clustered storage.
Current SFO: 6x21 TB + 3x10 TB disks => 156 TB; write: 1.6 GB/s; read: 2 GB/s.
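The last two columns follow from the two relations quoted under the table; a minimal check (Python; the ~20k-core farm size is the slide's own figure, the function names are illustrative):

```python
# The two relations quoted on the slide, with the ~20k-core farm size:
#   average HLT processing time = 20k cores / playback rate
#   effective capacity increase = caching rate / playback rate
FARM_CORES = 20_000

def hlt_proc_time_s(playback_hz):
    return FARM_CORES / playback_hz

def capacity_increase(caching_hz, playback_hz):
    return caching_hz / playback_hz

for caching_hz, playback_hz in [(500, 2_500), (1_000, 2_500), (2_500, 2_500),
                                (10_000, 25_000), (10_000, 10_000)]:
    print(f"{caching_hz / 1e3} kHz / {playback_hz / 1e3} kHz: "
          f"{hlt_proc_time_s(playback_hz)} s/event, "
          f"{capacity_increase(caching_hz, playback_hz):.0%}")
# reproduces the 8 / 8 / 8 / 0.8 / 2 s/event and 20/40/100/40/100% columns
```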
In-fill & Inter-fill processing

Dynamic partitioning of the farm has to take changes in the CPU requirement into account dynamically. Each change imposes delays to configure & start/abort processes => hard!
Relatively small potential gains (except in a special case), assuming 20% of the farm is used for prompt processing after 4 hours:

| Caching [kHz (GB/s)] | Playback [kHz (GB/s)] | Max. wall-time to process [h] | Max. disk usage [TB] |
|---|---|---|---|
| 0.5 (1) | 2.5 (5) | 0.8 c.f. 23 | 14 c.f. 85 |
| 1 (2) | 2.5 (5) | 25 c.f. 29 | 113 c.f. 210 |
| 1.5 (3) | 2.5 (5) | 31 | 253 |

Special case: in-fill processing rate = caching rate.

Would it be possible to use a mechanism similar to end-of-fill triggers? Define a variable deferral fraction (sketched below):
- Set to 1 at the start of the run
- Set to e.g. 0.8 during the run => 80% of deferred triggers cached, 20% processed promptly
- Big disadvantage: events from the same lumi block end up in output files produced up to 48 hrs apart
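A minimal sketch of such a prescale-like deferral fraction (Python; the class and its methods are hypothetical, not an existing online interface):

```python
import random

# Hypothetical prescale-like deferral fraction, as floated on this slide;
# purely illustrative, not an existing online mechanism.
class DeferralRouter:
    def __init__(self, fraction=1.0):
        self.fraction = fraction               # 1.0 => cache everything

    def update(self, fraction):
        """Analogous to a pre-scale update at a lumi-block boundary."""
        self.fraction = fraction

    def route(self):
        """Decide the fate of one deferred-trigger event."""
        return "cache" if random.random() < self.fraction else "prompt"

router = DeferralRouter(1.0)   # start of run: defer everything
router.update(0.8)             # later: 80% cached, 20% processed promptly
print(router.route())
```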
DAQ & HLT

Activation of deferred-stream processing should be automatic
- but it can be stopped/aborted by an expert

Error handling should not normally require operator intervention
- but alert an expert if the system cannot restart correctly

Must be possible to rapidly stop the partition when needed
- and to re-start again from that point when CPU becomes available

Need to define the action in case the disks become full:
- stop the deferred stream
- exceptionally, transfer events unprocessed to Tier-0? (if the rate is ~500 Hz)

An extensive book-keeping framework is needed:
- to drive the play-back
- to account for possible data losses
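For flavour, the kind of per-file record such book-keeping might track - a minimal sketch (Python; every field and state name here is an illustrative assumption, not an existing schema):

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative per-file book-keeping record for driving play-back and
# accounting for losses.
class FileState(Enum):
    CACHED = "cached"          # written to the cache, awaiting play-back
    PROCESSING = "processing"  # assigned to HLT processes between fills
    DONE = "done"              # fully processed, safe to delete
    LOST = "lost"              # e.g. disk failure - must be accounted for

@dataclass
class CachedFile:
    run: int
    lumi_block: int
    path: str
    n_events: int
    state: FileState = FileState.CACHED
```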
Tier0

While it is technically possible to deal with delays > 48 hours, anything that deviates from the standard work-flow is significant extra work => should stay within 48 hours except in very rare exceptions.

Important that the output files are LB-aware, i.e. closed at LB boundaries (a sketch follows this slide).

In the case of the clustered or distributed options, a significant addition to the T0 would be needed to merge files:
- Multi-step RAW-file merging needed (more complicated than the current 1-step process)
- Currently ~10 files per LB; could be ~200 smaller files for clustered storage (even more for distributed storage)

Completeness of the dataset is an issue: completeness is relied on in many places
- e.g. the RAW merging job is only defined for complete data
- Would need to adapt the T0 workflow to enable processing of the prompt stream with only partially complete LBs

Extra infrastructure is needed if, in exceptional circumstances, unprocessed events are streamed to Tier-0:
- Complete HLT processing & re-streaming needed offline - similar to the debug reprocessing but on a much bigger scale (~10M events c.f. a few hundred)
- Retro-active insertion of the processed data into the handshake DB
- Merging of the many small files produced
- Need to add these to the files from the truncated online processing before bulk reconstruction
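The LB-aware requirement just means a writer rolls its output file whenever the lumi block changes; a minimal sketch (Python; the class is illustrative, not the SFO's actual implementation):

```python
# Illustrative LB-aware writer: the output file is closed and a new one
# opened whenever the lumi block changes, so no file spans an LB boundary.
class LBAwareWriter:
    def __init__(self, prefix):
        self.prefix = prefix
        self.current_lb = None
        self.f = None

    def write(self, lumi_block, event_bytes):
        if lumi_block != self.current_lb:
            self.close()
            self.current_lb = lumi_block
            self.f = open(f"{self.prefix}.lb{lumi_block:04d}.data", "ab")
        self.f.write(event_bytes)

    def close(self):
        if self.f is not None:
            self.f.close()
            self.f = None

# usage: w = LBAwareWriter("deferred"); w.write(lb, raw_event); w.close()
```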
DQ

Online monitoring should be separate.

Offline, it should be possible to treat the deferred stream in the same way as the other streams:
- Deferred triggers adequately represented in the express stream
- Deferred stream available for bulk processing within 48 hours of run-end
- Need a stream-dependent good-run list