Feb. 19, 2015 David Lawrence JLab Counting House Operations.

1 Feb. 19, 2015 David Lawrence JLab Counting House Operations

2 Data Rates
Diagram stages: ROC, Event Builder, Event Recorder, Tape Library
ROC: Spec: 100MB/sec, Tested: ~30MB/sec
Event Builder: Spec: 3000MB/sec, Tested: 600MB/sec
Event Recorder: Spec: 300MB/sec, Tested: 600MB/sec
Tape Library: Spec: 300MB/sec, Tested: 450MB/sec
“Tested” means with actual data while it was being acquired. In some cases, offline testing has achieved significantly higher rates.
Other diagram labels: 72TB x2 RAID disk; (L3 farm)
125.9TB in 147,355 files written to tape in the 2014 commissioning run
Online Status -- David Lawrence
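As a rough illustration of what the tape totals imply, the short Python sketch below derives an average file size and the time a continuous copy would take at the tested RAID-to-tape rate. It assumes decimal units (1 TB = 10^6 MB) and an uninterrupted copy, neither of which is stated on the slide; the variable names are illustrative only.

```python
# Rough arithmetic on the tape totals quoted on the slide.
# Assumptions (not from the slide): decimal units, continuous copying.

TOTAL_TB = 125.9        # data written to tape in the 2014 commissioning run
N_FILES = 147_355       # number of files written to tape
TAPE_MB_S = 450         # tested RAID-to-tape rate

total_mb = TOTAL_TB * 1e6                 # 1 TB = 1e6 MB (decimal)
avg_file_mb = total_mb / N_FILES          # mean file size in MB
copy_days = total_mb / TAPE_MB_S / 86_400 # seconds -> days

print(f"average file size: {avg_file_mb:.1f} MB")
print(f"continuous copy at {TAPE_MB_S} MB/s: {copy_days:.2f} days")
```

Under these assumptions the average file is a little under 1 GB, and the full data set corresponds to a few days of continuous copying at the tested rate.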

3 [Figure: Mode 7 (fADC integrals) and Mode 8 (fADC full samples); labels: 232 kB/event, 69 kB/event]

4 [Figure: panels Mode 7 (fADC integrals) and Mode 7 (full samples); detectors FCAL, BCAL, FDC, CDC; modules fADC250, fADC125, fADC250/F1TDC]

5 The profile of the 2014 commissioning data, adjusted for recent or planned firmware upgrades, is used to estimate the event size of future production data. (Additional compression is expected when disentangled data is rebuilt after L3 into an as-yet-undetermined format.) (18kB/event from simulation is used to estimate resources for the computer center.)

6 EVIO Formatted Raw Data Files
File format specified in detail by CODA group (https://coda.jlab.org/drupal/system/files/coda/onlineFormat/eventbuilding.pdf)
Some corrupted events encountered
  – Problem was due to a race condition in the ER and occurs only at high rates. It has since been fixed in CODA.
  – Wrote new EVIO parser code
      Error recovery (detects and skips bad blocks/events)
      Mechanism to efficiently grow buffer size
      Some “features” still need ironing out (e.g. memory leak)
      Event parsing implements disentangling in parallel
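The "detects and skips bad blocks/events" idea can be sketched in a few lines of Python. This is not the parser described on the slide and it does not use the real EVIO block layout: the two-word header (length, marker) is a deliberately simplified, hypothetical format, and `make_block` and `parse_blocks` are names invented for this illustration. The sketch only shows the resynchronization pattern: when a header fails validation, step forward one word at a time until a plausible header is found.

```python
import struct

# Marker word used to recognize a block header in this toy format.
# The layout below (length word, marker word, payload) is hypothetical
# and much simpler than a real EVIO block header.
MAGIC = 0xC0DA0100


def make_block(payload_words):
    """Pack a toy block: [total length in words, MAGIC, payload...]."""
    length = 2 + len(payload_words)
    return struct.pack(f"<{length}I", length, MAGIC, *payload_words)


def parse_blocks(buf):
    """Return (list of payloads, number of 32-bit words skipped)."""
    nwords = len(buf) // 4
    words = struct.unpack(f"<{nwords}I", buf[:nwords * 4])
    blocks, skipped, i = [], 0, 0
    while i + 1 < nwords:
        length = words[i]
        header_ok = words[i + 1] == MAGIC and 2 <= length <= nwords - i
        if header_ok:
            blocks.append(list(words[i + 2:i + length]))
            i += length          # jump to the next block header
        else:
            i += 1               # resynchronize: slide forward one word
            skipped += 1
    skipped += nwords - i        # count any trailing partial header
    return blocks, skipped


# Two good blocks with three words of corruption between them.
good1 = make_block([1, 2, 3])
junk = struct.pack("<3I", 7, 0xDEADBEEF, 5)
good2 = make_block([9])
blocks, skipped = parse_blocks(good1 + junk + good2)
print(blocks, skipped)
```

A word-by-word scan can be fooled if payload data happens to look like a header, so a production parser would need stronger validation than this sketch applies.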

7 Online Monitoring

8 Online Monitoring
System did not run consistently
  – Sometimes sluggish or non-responsive
  – Processes would crash on some nodes, with error logs that were difficult to access
  – ROOT archive files often empty or corrupt
  – Slow event rate seemed to result in tiny processing rate due to “burst” effect
These issues are currently being addressed

9 Preparations for next run
L1 coincidence trigger
~10kHz DAQ rate (requires f125 multiblock)
  – Sync events (will require offline mapping)
L3 infrastructure test w/ event tagging
Secondary ET system for monitoring
Run info database integration/enhancement
Auxiliary run data packaging for tape storage
  – Auto-deletion and RAID disk swapping
Controls
  – Scaler readout into EPICS being reworked to be more efficient
  – Goniometer
  – Voltage controls

10 Summary
126 TB written to RAID and copied to tape
600 MB/s written to RAID from DAQ while taking data
450 MB/s copy from RAID to tape
Electronic Logbook used successfully
  – https://logbooks.jlab.org
Event size larger than expected, but currently being addressed
Several items still need to be addressed prior to 2015 commissioning
  – Many things were done “by hand” and need either to be automated or to be replaced by a better procedure for long-term operations, to ensure the integrity/consistency of data over a long period of time and efficient use of human resources

11 Backup Slides

12 Counting house computer systems
Computer(s)  | Processor                | Networks (General Purpose, DAQ, I.B.) | Comments
gluonfs1     | N/A                      | X                                     | ~1.6TB with snapshot backup
gluonraid1-2 | Intel E5-2630 v2 @2.6GHz | XXX                                   | RAID disk host ER process
gluon01-05   | i5-3570 @3.4GHz          | X                                     | Shift taker consoles
gluon20-23   | AMD 2347                 | X                                     | Controls; 8core
gluon24-30   | E5-2420 @1.9GHz          | X                                     | Controls (gluon24 is web/DB/cMsg server); 12core + 12ht
gluon40-43   | AMD 6380                 | XXX                                   | 16core + 16”ht”
gluon46-49   | E5-2650 v2 @2.6GHz       | XX (gluon47 & 49) X                   | 16core + 16ht
gluon100-111 | E5-2650 v2 @2.6GHz       | XX                                    | 16core + 16ht
rocdev1      | Pentium 4 @2.8GHz        | X                                     | RHEL5 system for compiling ROLs for DAQ
hdguest0-3   | i5-3470 @3.2GHz          | X (outside network)                   | Guest consoles in cubicles (outside network)

13 Rough Specs. Review
10^8 γ/s on LH2 target -> ~400kHz hadronic rate
L1 trigger goal is to cut away ~50% leaving 200kHz
L3 trigger goal is to reduce by ~90% leaving 20kHz
Early simulation suggested ~15kB/event
Design specs*:
  – 15kB/event @ 200 kHz = 3000 MB/s (front end)
  – L3 reduction by factor of 10 = 300MB/s to RAID disk
  – 3 days storage on RAID = 300MB/s*3days = 78TB
  – Maintain 300MB/s transfer from RAID to tape
*L3 not officially part of 12GeV upgrade project
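The design-spec arithmetic on this slide can be checked with a few lines of Python. The sketch assumes decimal units (1 kB = 10^3 B, 1 MB = 10^6 B, 1 TB = 10^12 B), which the slide does not state explicitly; the constant names are chosen here for illustration.

```python
# Back-of-the-envelope check of the design-spec numbers.
# Assumption: decimal units throughout.

EVENT_SIZE_BYTES = 15_000   # ~15 kB/event from early simulation
L1_RATE_HZ = 200_000        # 200 kHz after the L1 trigger
L3_REDUCTION = 10           # L3 reduces the rate by a factor of 10
RAID_DAYS = 3               # days of storage wanted on RAID

front_end_mb_s = EVENT_SIZE_BYTES * L1_RATE_HZ / 1e6
to_raid_mb_s = front_end_mb_s / L3_REDUCTION
raid_tb = to_raid_mb_s * 1e6 * RAID_DAYS * 86_400 / 1e12
l3_output_khz = L1_RATE_HZ / L3_REDUCTION / 1e3

print(f"front end:   {front_end_mb_s:.0f} MB/s")
print(f"to RAID:     {to_raid_mb_s:.0f} MB/s")
print(f"RAID volume: {raid_tb:.2f} TB for {RAID_DAYS} days")
print(f"L3 output:   {l3_output_khz:.0f} kHz")
```

With these inputs the three-day RAID volume comes out at 77.76 TB, which the slide rounds to 78TB.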

14 Mode 7 (fADC Integrals), Mode 8 (fADC full samples)
Each 32-bit word in the EVIO file was tallied to identify what the file space is being used for
Comparison made between mode 7 and mode 8 data
Example: some of the fADC250 word types

16 Event Size
Simulation was consistent with initial estimate of event size
Actual data was more than 4x larger
Much of the data was taken in “raw” mode where fADC samples were saved

17 DAQ to Detector Translation Table
The Translation Table is used to convert from DAQ system coordinates (rocid, slot, channel) into detector-specific coordinates (e.g. BCAL module, layer, sector, end)
~23k channels defined in SQLite DB file
Stored in CCDB as XML string for offline analysis with complete history:
  – /Translation/DAQ2detector
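A lookup of this kind might be sketched as follows with Python's standard `sqlite3` module. The table name, column names, and the two sample rows are hypothetical and invented for this illustration; the slide does not describe the actual schema of the SQLite file or the XML stored in CCDB.

```python
import sqlite3


def build_demo_table(conn):
    """Create a toy translation table (hypothetical schema and rows)."""
    conn.execute(
        """CREATE TABLE translation (
               rocid INTEGER, slot INTEGER, channel INTEGER,
               detector TEXT, module INTEGER, layer INTEGER,
               sector INTEGER, det_end TEXT,
               PRIMARY KEY (rocid, slot, channel))"""
    )
    rows = [  # made-up example entries, not real channel assignments
        (31, 3, 0, "BCAL", 1, 1, 1, "upstream"),
        (31, 3, 1, "BCAL", 1, 1, 2, "upstream"),
    ]
    conn.executemany(
        "INSERT INTO translation VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )


def daq_to_detector(conn, rocid, slot, channel):
    """Map DAQ coordinates to detector coordinates, or None if unmapped."""
    cur = conn.execute(
        "SELECT detector, module, layer, sector, det_end "
        "FROM translation WHERE rocid = ? AND slot = ? AND channel = ?",
        (rocid, slot, channel),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
build_demo_table(conn)
print(daq_to_detector(conn, 31, 3, 0))
print(daq_to_detector(conn, 99, 0, 0))
```

Keying the table on (rocid, slot, channel) keeps each lookup an indexed query, which matters little for two rows but would for the ~23k channels mentioned on the slide.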

18 hdmon Monitoring Plugins
Plugins: BCAL_online, CDC_online, DAQ_online, FCAL_online, FDC_online, PS_online, ST_online, TAGH_online, TAGM_online, TOF_online, rootspy
Each detector system provides 1 or more plugins that create histograms for monitoring
All plugins are attached to a common DANA process (hdmon)
A “rootspy” plugin publishes all histograms to the network

19 Raw Data Formatted Files (from simulated data)
[Diagram labels: CCDB; hdgeant_smeared.hddm; run0002.evio (data file in the same format as will be produced by the CODA DAQ system); roc002.evio, roc003.evio, roc004.evio, ......]

20 L3 and monitoring architecture
L3 and monitoring processes are decoupled. They could run on the same nodes, though, if desired.
[Diagram labels: EB, ER, gluon53, gluonraid1, gluon46, farm manager. Data flows from left to right.]

21 hdmongui
Multiple “levels” supported
Processes run multi-threaded

25 Current code

26 [Figure panels: All pool maximums increased x10; Only TrackHit pool max increased x10]

