Computing model and data handling

1 Computing model and data handling
Paolo Valente, INFN Roma
Summary of the current scheme + some questions and open issues

2 Rates
Take into account the duty cycle: 9.6 s flat top in a 27.6 s (or longer) SPS super-cycle. Is this still valid? Are there “special” configurations to be taken into account?
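As a quick cross-check of these numbers, a minimal sketch (plain Python, not NA62 software) of how the duty cycle turns burst rates into sustained averages; the 1 MHz L0 burst rate is taken from slide 4:

```python
# Minimal sketch (not NA62 software): duty-cycle-averaged rates.
# 9.6 s of flat top per 27.6 s SPS super-cycle, from this slide.
flat_top = 9.6            # s of beam per super-cycle
super_cycle = 27.6        # s; "or longer" per the slide
duty_cycle = flat_top / super_cycle      # ~0.35 here; the deck rounds to 0.3

l0_burst_rate = 1.0e6     # Hz at L0 during the flat top (slide 4)
print(f"duty cycle      : {duty_cycle:.2f}")
print(f"average L0 rate : {l0_burst_rate * duty_cycle / 1e3:.0f} kHz")
```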

3 Data flow
Are the steps RAW → RECO → THIN enough?
Consider further “thinning” (Super-THIN)? Consider the L3 possibility (even in auto-pass): F(iltered)-RAW, D(ownscaled)-RAW. How many formats? (RAW, F-RAW, D-RAW, RECO, THIN, S-THIN…) + calibration data. How many, and which DB?
[Data-flow diagram: L0/L1/L2 trigger data, including the downscaled auto-pass stream, form RAW; reconstruction (with calibration data alongside) produces RECO; data stripping produces THIN for analysis.]

4 Trigger rates and latencies
L0: 1 MHz → L1: 100 kHz → L2: 15 kHz … plus the common TDAQ questions & issues (simulation, algorithms, computing power, use of GPUs, etc.). This has an impact on the farm size/topology. We tried to be flexible in the infrastructure, and there is room for improvement/changes (octopus/hexapus)… but we need to buy PCs, sooner or later.

5 Event size
The LKr calorimeter will not be zero-suppressed in hardware, but some data reduction can be foreseen at a later stage. Huge amount of data: 40 MHz × 14 bits × 13248 cells ≈ 930 GB/s. Buffer in local memories and download on positive L1 decision only: 8 samples × 14 bits × 13248 cells ≈ 185 kB/event (18.5 GB/s at L2 input, at 100 kHz).
Gigatracker: silicon pixel tracker, 3 stations × 18000 pixels, 180 Mhit/s (≈0.9 GB/s) per station; in a 75 ns window: 3 × 135 hits × 32 bits/hit, on average 2.2 kB/event (2.2 GB/s at L1 input).
CEDAR: 75 ns window, 4-12 particles × 18 hits × 32 bits = 0.3/0.9 kB/event.
Need to cross-check this table! Maybe more information is available wrt a few months ago on some detectors.
Event-size tally so far: CEDAR 0.3/0.9 kB, GTK 2.2 kB, LKr 185 kB (+ IRC, ChANTI on the following slides).
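The LKr figures above can be reproduced from the sampling parameters; a minimal sketch, where the 13248-cell count is an inference (it reproduces both the 930 GB/s and the 185 kB numbers):

```python
# Sketch reproducing the LKr numbers quoted above.
CELLS = 13248      # inferred: reproduces both the 930 GB/s and 185 kB figures
BITS = 14          # ADC bits per sample
SAMPLING = 40e6    # Hz, continuous digitization

raw_stream = SAMPLING * BITS * CELLS / 8        # bytes/s, un-suppressed
event_size = 8 * BITS * CELLS / 8               # 8 samples kept per event
print(f"un-suppressed stream : {raw_stream/1e9:.0f} GB/s")      # ~930 GB/s
print(f"event size           : {event_size/1e3:.0f} kB")        # ~185 kB
print(f"at 100 kHz L2 input  : {event_size*1e5/1e9:.1f} GB/s")  # ~18.5 GB/s
```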

6 Event size
ChANTI (charged veto): 276 channels, 8 × 6 layers = 48 hits × 2 × 32 bits ≈ 400 B/event (0.4 GB/s at L1 input).
LAV (Large Angle Veto): 12 stations × 7 hits × 6 words ≈ 500 B/event (0.5 GB/s at L1 input).
Inner Ring + Small Angle Calorimeters (IRC+SAC): 0.5 (or 1.0) GHz × 14 bits × (16+4) channels = 14 (28) GB/s; 75 samples (× 1 ns) × 14 bits × (16+4) channels ≈ 2.7 kB/event (2.7 GB/s at L1 input).
Anything changed?
Event-size tally so far: CEDAR 0.3/0.9 kB, GTK 2.2 kB, ChANTI 0.4 kB, LAV 0.5 kB, IRC+SAC 2.7 kB, LKr 185 kB.

7 Event size
RICH: 25 hits × 8 bytes ≈ 0.2 kB/event (200 MB/s at L1 input).
STRAW tracker: 40 bits/hit × 4 straws/view × 4 views × 4 chambers × 2 ≈ 5.4 kB/event (5.4 GB/s at L1 input).
MUV: 560 channels total, 3 stations × 32 bits × 2 × 10 hits/station ≈ 0.2 kB/event (200 MB/s at L1 input).
Anything changed?
Full event-size tally: CEDAR 0.3/0.9 kB, GTK 2.2 kB, ChANTI 0.4 kB, LAV 0.5 kB, STRAW 5.4 kB, RICH 0.2 kB, MUV 0.2 kB, IRC+SAC 2.7 kB, LKr 185 kB.
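Summing this tally (LKr excluded, since it is read out only on L1 accept) against the 1 MHz L0 rate gives the aggregate input to the L1 farm, consistent with the ≈12 GB/s quoted on slide 10; a sketch using the upper CEDAR estimate:

```python
# Sketch: aggregate bandwidth into the L1 PCs at 1 MHz L0 rate.
# Per-detector event sizes in kB from slides 5-7; LKr excluded (L1-accept only).
event_kB = {"CEDAR": 0.9, "GTK": 2.2, "ChANTI": 0.4, "LAV": 0.5,
            "STRAW": 5.4, "RICH": 0.2, "MUV": 0.2, "IRC+SAC": 2.7}
L0_RATE = 1.0e6   # Hz
total_GBps = sum(event_kB.values()) * 1e3 * L0_RATE / 1e9
print(f"L1 input: {total_GBps:.1f} GB/s")   # ~12 GB/s, cf. slide 10
```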

8 L0 trigger
Each sub-detector sends summary information of the event (L0 primitives) to a central processor (L0TP). Anything changed?
[Diagram: underground area vs computing room; the sub-detector ADC/TDC boards (TEL62), the LKr CREAM modules (digital sums, RAM) and the GTK readout send their L0 primitives over 1 Gbit Ethernet to the L0TP; a side column lists which detectors contribute to L0 (IRC, ChANTI, …).]

9 L0 selection
More input from simulation?

10 L1 data flow
When L0 is distributed, data are transmitted from the sub-detector readout (underground area) to the L1 PCs (computing room). Sub-detector L1 algorithms are run on the sub-event, and “primitives” are sent to the L1 trigger processor (L1TP); the L1TP will then produce the L1 decision. Cross-check with the online software being developed now…
Input to L1: CEDAR 0.5 GB/s, GTK 2.25 GB/s, LAV 0.5 GB/s, STRAW 5.4 GB/s, RICH 0.2 GB/s, MUV 0.25 GB/s, IRC+SAC 2.7 GB/s, ChANTI 0.4 GB/s; LKr not read at L1; total ≈ 12.2 GB/s.
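Purely as an illustration of the scheme described above, and not the NA62 online software, an L1 PC could assemble a sub-event by matching fragments on their L0 timestamp before running the sub-detector L1 algorithm; all names in this sketch are invented:

```python
# Hypothetical sketch of timestamp-matched sub-event building in an L1 PC.
# Not the NA62 online software; sources and functions are invented names.
from collections import defaultdict

EXPECTED = {"CEDAR", "GTK", "STRAW", "RICH"}   # assumed fragment sources
pending = defaultdict(dict)                    # timestamp -> {source: payload}

def on_fragment(timestamp, source, payload):
    """Collect fragments; fire the L1 algorithm once the sub-event is complete."""
    pending[timestamp][source] = payload
    if EXPECTED <= pending[timestamp].keys():
        subevent = pending.pop(timestamp)
        primitive = l1_algorithm(subevent)     # sub-detector L1 selection
        send_to_l1tp(timestamp, primitive)     # L1TP combines these into a decision

def l1_algorithm(subevent):
    return {"accept": bool(subevent)}          # placeholder selection

def send_to_l1tp(timestamp, primitive):
    print(timestamp, primitive)                # stand-in for the network send
```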

11 L1 selection
… but you need to reconstruct tracks, either with the spectrometer or with RICH+hodoscope, i.e. more computing power at L1. What about the computing power needed at L1?
[Plot: vertex distribution for L0-triggered events, in m]

12 Connected to the previous one: what about the computing power needed at L2?
[Diagram: the L1 trigger decision is sent to the L1 PCs with the accepted time-stamps… and to the CREAM system; L1-accepted sub-events (including the GTK sub-events) and the LKr fragments are sent to the L2 PCs in the computing room, where the full event is built.]

13 Total event size
L2 rate = 15 kHz, to be averaged with a duty cycle of 0.3 (≈4×10^10 events over a ~100-day run, ≈100 πνν events). Reduce the calorimeter cells from 13248 to about 1000 at L2: 185 kB → 30 kB/event; 30 kB/event × 15 kHz = 450 MB/s. Duty cycle = 30% → tape speed = 150 MB/s → 13 TB/day of RAW data. Farm disk storage = 200 TB. Some insight from the dry/technical run: we need non-suppressed LKr events.
Input to L1 (1 MHz) and input to L2 (100 kHz, MB/s): CEDAR 0.5 GB/s → 50; GTK 2.25 GB/s → 225; LAV 0.5 GB/s → 50; STRAW 5.4 GB/s → 540; RICH 0.2 GB/s → 20; MUV 0.25 GB/s → 25; IRC+SAC 2.7 GB/s → 270; ChANTI 0.4 GB/s → 40; LKr 18500 at L2 only; total ≈ 20 GB/s at L2 input.
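The tape-budget chain on this slide as a sketch; the only number added is 86400 s/day:

```python
# Sketch of the RAW-data budget: event size after LKr cell reduction,
# L2 output rate, SPS duty cycle -> sustained tape bandwidth and volume.
event_size = 30e3     # B/event with ~1000 LKr cells kept
l2_rate = 15e3        # Hz during the flat top
duty = 0.3
burst = event_size * l2_rate        # 450 MB/s on spill
tape = burst * duty                 # 150 MB/s sustained
daily = tape * 86400                # ~13 TB/day
print(f"{burst/1e6:.0f} MB/s burst, {tape/1e6:.0f} MB/s to tape, "
      f"{daily/1e12:.0f} TB/day, {daily*100/1e15:.1f} PB per 100 days")
```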

14 Downstairs network
Electronics barrack, link to the surface: OK. “Local” switches: 24 × 1 GbE ports, 2 × 10 GbE uplinks.
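At face value each local switch is slightly oversubscribed, which is acceptable as long as the detector links do not all run at wire speed simultaneously; a trivial check:

```python
# Sketch: oversubscription of a "local" switch as drawn on this slide.
downlinks = 24 * 1      # Gb/s offered by the 1 GbE detector-side ports
uplinks = 2 * 10        # Gb/s of 10 GbE toward the surface
print(f"oversubscription: {downlinks/uplinks:.1f}:1")   # 1.2:1 worst case
```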

15 Farm network
Link to CERN-IT for the dry/technical run: the 10 Gb pieces are missing. Use the old 1 Gb fiber?

16 Data processing

17 Analysis tasks
[Workflow diagram: TDAQ and DCS produce RAW data together with the online configuration/conditions (CONF, COND); a calibration task produces CALB; reconstruction reads RAW plus the offline conditions (COST, COND) and produces RECO; skimming turns RECO into THIN.]

18 Data handling
Three tiers are assumed: the NA62 computing room (online farm: some CPU & disk, no tapes); the CERN computing center (plenty of tapes, some disk and CPU); off-site computing centers (large centers with plenty of CPU and disk, some tapes).

19 Data handling: option A
Reconstruct/reprocess off-site; use CERN for tape storage only. [Diagram: the online farm builds RAW; RAW goes to CERN tapes and to the off-site center, which runs the processing and produces RECO/THIN for analysis.]

20 Data handling: option B
Reconstruct/reprocess at CERN; use off-site resources for analysis only. [Diagram: the online farm builds RAW; CERN runs the processing and produces RECO/THIN; RECO/THIN are copied off-site for analysis.]

21 Data handling: option A/B
Reconstruct at CERN; distribute RAW/RECO off-site. [Diagram: the online farm builds RAW; CERN runs the processing, producing RECO/THIN for analysis; the off-site center keeps a RAW copy and runs re-processing, producing RECO-1/THIN-1 for analysis.]

22 Data handling: option C
Reconstruct/reprocess at the online farm; problem: retrieving RAW from tapes. Not a “practical” hypothesis: one would need to wait for the end of data taking before starting processing, it requires huge disk space, and there would be no tape recording until the end of data taking. Still, it is possible to find a good use for the farm's computing power when not taking data. [Diagram: the online farm builds and processes RAW into RECO/THIN; RAW/RECO/THIN go to CERN; off-site centers receive RECO/THIN for analysis.]

23 Reconstruction
NA62RECO, LAV stations only: 0.4 HS06·s/event (no optimization for L2). Full detector: 1.2 HS06·s/event.
L1 farm: 0.01 HS06·s × 1 Mevents/1 s = 10 kHS06 (2500 kSi2k); at 25 HS06/core → 400 cores (10 machines × 4 processors × 10 cores).
L2 farm: 1.2 HS06·s × 100 kevents/20 s = 6 kHS06 (1500 kSi2k); at 25 HS06/core → 240 cores (6 machines × 4 processors × 10 cores).
After trigger simulation optimization, NA62RECO seems much better, but still without the final bells & whistles.
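The quoted core counts follow from dividing the required capacity by the assumed 25 HS06/core; a sketch of the same arithmetic:

```python
# Sketch of the L1/L2 farm sizing above (25 HS06/core as on the slide).
HS06_PER_CORE = 25
farms = [("L1", 0.01, 1.0e6),        # HS06*s/event, events/s
         ("L2", 1.2, 1.0e5 / 20)]    # 100 kevents spread over 20 s
for name, cost, rate in farms:
    hs06 = cost * rate
    print(f"{name} farm: {hs06/1e3:.0f} kHS06 -> {hs06/HS06_PER_CORE:.0f} cores")
```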

24 Reprocessing
RAW: 185 kB → 30 kB/event × 15 kHz = 450 MB/s; duty cycle = 30% → tape speed = 150 MB/s; 150 MB/s × 86400 s = 13 TB/day × 100 days = 1.3 PB/year of RAW data.
Reconstruction: 1.2 HS06·s/event → ≈6×10^5 HS06·days (+ stripping) for a reprocessing. In 3 months → 6600 HS06 (about 300 cores), 80 MB/s read, 20 MB/s write.
THIN reconstructed data: 350 TB total (a factor 4-5 reduction).
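A sketch of the reprocessing sizing, taking the ≈4×10^10 events/year from slide 13 and assuming a 90-day window; it lands close to the slide's 6600 HS06 / ~300 cores:

```python
# Sketch: CPU needed to reprocess one year of data in three months.
events_per_year = 4e10          # from slide 13
cost = 1.2                      # HS06*s/event (reconstruction, before stripping)
hs06_days = events_per_year * cost / 86400     # ~5.6e5 HS06*days
window = 90                     # days ("in 3 months")
hs06 = hs06_days / window
print(f"{hs06_days:.2e} HS06*days -> {hs06:.0f} HS06 "
      f"(~{hs06/25:.0f} cores at 25 HS06/core)")
```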

25 Farm/Processing open points
Event size is known with some uncertainty: review all sub-detectors. The CPU for processing/reprocessing is a less precise estimate, and the CPU needed for the L1/L2 algorithms even less precise. Are about 30 machines enough for the farm? If not, we need network trunking. How do multi-core and many-core machines perform with our L1/L2 software? Not really known: the reconstructed event size. More questions: how much downscaled auto-pass data? L3 filtering? Total tapes/disk needed? The option of distributing RAW data and running processing/reprocessing outside CERN.

26 Monte Carlo
NA62MC generation: today about 230 HS06·s/event for π+π0 with the full detector; 50 times less for Kμ2. Average event size: 1 MB (not optimized). A 10^8-event production was run in 2011 using UK resources (200 kHS06·days, 100 TB). We will need larger productions later on.

27 MC: open points
How much Monte Carlo do we need in the next 2 years? How much CPU and disk? Where/how to access it? Prepare for Grid usage (UK, Italy?). Example: 10^9 events: 5×10^6 HS06·days (550 cores), 1 PB. Especially if we plan to use distributed resources for data handling/analysis, Monte Carlo is the natural exercise to start with.
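The MC sizing follows the same pattern; in this sketch the straight multiplication gives ≈2.7×10^6 HS06·days for 10^9 events, within a factor ~2 of the 5×10^6 quoted above, which presumably includes margins:

```python
# Sketch of the MC production sizing from slides 26-27.
COST = 230      # HS06*s/event, full-detector pi+pi0 (NA62MC)
SIZE = 1e6      # B/event, not optimized
for n in (1e8, 1e9):
    hs06_days = n * COST / 86400
    print(f"{n:.0e} events: {hs06_days:.1e} HS06*days, {n*SIZE/1e12:.0f} TB, "
          f"~{hs06_days/365/25:.0f} cores for a year at 25 HS06/core")
```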

