Feb. 19, 2015 David Lawrence JLab Counting House Operations.

Feb. 19, 2015 David Lawrence JLab Counting House Operations

Data Rates
ROC -> Event Builder -> Event Recorder -> Tape Library
ROC: Spec 100 MB/s, Tested ~30 MB/s
Event Builder: Spec 3000 MB/s, Tested 600 MB/s
Event Recorder: Spec 300 MB/s, Tested 600 MB/s
Tape Library: Spec 300 MB/s, Tested 450 MB/s
"Tested" means with actual data while it was being acquired. In some cases, offline testing has achieved significantly higher rates.
72 TB x2 RAID disk (L3 farm)
125.9 TB in 147,355 files written to tape in the 2014 commissioning run
Online Status -- David Lawrence 2

Mode 7 (fADC integrals): 69 kB/event
Mode 8 (fADC full samples): 232 kB/event
Online Status -- David Lawrence 3

[Figure: event-size breakdown by module type (fADC250, fADC125, fADC250/F1TDC) and detector (FCAL, BCAL, FDC, CDC), for Mode 7 (fADC integrals) and full-sample data]
Online Status -- David Lawrence 4

The profile of the 2014 commissioning data, adjusted for recent or planned firmware upgrades, is used to estimate the event size for future production data. (Additional compression is expected when disentangled data is rebuilt after L3 into an as-yet-undetermined format.) (18 kB/event from simulation is used to estimate resources for the computer center.)
Online Status -- David Lawrence 5

EVIO Formatted Raw Data Files
File format specified in detail by the CODA group
Some corrupted events encountered
– Problem due to a race condition in the ER; it only occurs at high rates and has since been fixed in CODA
– Wrote new EVIO parser code (a rough sketch of the block-scanning idea follows)
  – Error recovery (detects and skips bad blocks/events)
  – Mechanism to efficiently grow buffer size
  – Some "features" still need ironing out (e.g. memory leak)
  – Event parsing implements disentangling in parallel
Online Status -- David Lawrence 6
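For illustration, here is a rough sketch of the error-recovery idea in the new parser. It is not the actual GlueX/CODA code; it assumes the EVIO v4 block layout (an 8-word block header whose first word is the block length in 32-bit words and whose last word is the magic number 0xc0da0100). Consult the CODA EVIO documentation for the authoritative format.

```cpp
// Sketch: walk the buffer block by block and, if a block header looks
// corrupt, scan forward for the next EVIO magic word and resume there.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

static const uint32_t kEvioMagic = 0xc0da0100;
static const size_t   kHeaderLen = 8;   // 32-bit words in an EVIO v4 block header

// Returns word indices of plausible block starts, skipping corrupted regions.
std::vector<size_t> ScanBlocks(const std::vector<uint32_t>& buf)
{
    std::vector<size_t> blocks;
    size_t pos = 0;
    while (pos + kHeaderLen <= buf.size()) {
        uint32_t blocklen = buf[pos];                 // block length in words
        bool ok = (buf[pos + 7] == kEvioMagic) &&
                  (blocklen >= kHeaderLen) &&
                  (pos + blocklen <= buf.size());
        if (ok) {
            blocks.push_back(pos);
            pos += blocklen;                          // jump to next block
        } else {
            // Corrupt header: resync by scanning one word at a time until the
            // magic word appears in the expected header position again.
            std::fprintf(stderr, "bad block header at word %zu, resyncing\n", pos);
            ++pos;
            while (pos + kHeaderLen <= buf.size() && buf[pos + 7] != kEvioMagic)
                ++pos;
        }
    }
    return blocks;
}
```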

Online Monitoring
Online Status -- David Lawrence 7

Online Monitoring System did not run consistently
– Sometimes sluggish or non-responsive
– Processes would crash on some nodes, with difficult-to-access error logs
– ROOT archive files often empty or corrupt
– Slow event rate seemed to result in a tiny processing rate due to a "burst" effect
These issues are currently being addressed
Online Status -- David Lawrence 8

Preparations for next run
L1 coincidence trigger
~10 kHz DAQ rate (requires f125 multiblock)
– Sync events (will require offline mapping)
L3 infrastructure test w/ event tagging
Secondary ET system for monitoring
Run info database integration/enhancement
Auxiliary run data packaging for tape storage
– Auto-deletion and RAID disk swapping
Controls
– Scaler readout into EPICS being reworked to be more efficient
– Goniometer
– Voltage controls
Online Status -- David Lawrence 9

Summary
126 TB written to RAID and copied to tape
600 MB/s written to RAID from DAQ while taking data
450 MB/s copied from RAID to tape
Electronic Logbook used successfully
– Event size larger than expected, but currently being addressed
Several items still need to be addressed prior to 2015 commissioning
– Many things were done "by hand" and need to either be automated or get a better-defined procedure, so that long-term operations ensure the integrity/consistency of the data and make efficient use of human resources
Online Status -- David Lawrence 10

Backup Slides
Online Status -- David Lawrence 11

Counting house computer systems
Computer(s) | Processor | GP net | DAQ net | IB net | Comments
gluonfs1 | N/A | X | | | ~1.6 TB with snapshot backup
gluonraid1-2 | Intel E | X | X | X | RAID disk host; ER process
gluon01-05 | | X | | | Shift taker consoles
gluon20-23 | AMD 2347 | X | | | Controls; 8 core
gluon24-30 | | X | | | Controls (gluon24 is web/DB/cMsg server); 12 core + 12 HT
gluon40-43 | AMD 6380 | X | X | X | 16 core + 16 "HT"
gluon46-49 | E | X | X | X (gluon47 & 49) | 16 core + 16 HT
gluon | E | X | X | | 16 core + 16 HT
rocdev1 | Pentium | X | | | RHEL5 system for compiling ROLs for DAQ
hdguest0-3 | | X (outside network) | | | Guest consoles in cubicles (outside network)
(GP = General Purpose network, IB = InfiniBand network)
Online Status -- David Lawrence 12

Rough Specs. Review
10^8 γ/s on LH2 target -> ~400 kHz hadronic rate
L1 trigger goal is to cut away ~50%, leaving 200 kHz
L3 trigger goal is to reduce by ~90%, leaving 20 kHz
Early simulation suggested ~15 kB/event
Design specs*:
– 200 kHz = 3000 MB/s (front end)
– L3 reduction by factor of 10 = 300 MB/s to RAID disk
– 3 days storage on RAID = 300 MB/s * 3 days = 78 TB
– Maintain 300 MB/s transfer from RAID to tape
*L3 not officially part of 12 GeV upgrade project
Online Status -- David Lawrence 13
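A quick worked check of the arithmetic behind those design numbers, using the quoted ~15 kB/event estimate:

```latex
\begin{align*}
200~\text{kHz} \times 15~\text{kB/event} &= 3000~\text{MB/s (front end)}\\
3000~\text{MB/s} \div 10~\text{(L3 rejection)} &= 300~\text{MB/s to RAID}\\
300~\text{MB/s} \times 3~\text{days} &= 300~\text{MB/s} \times 259\,200~\text{s} \approx 78~\text{TB}
\end{align*}
```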

Mode 7 (fADC integrals) vs. Mode 8 (fADC full samples)
Each 32-bit word in the EVIO file was tallied to identify what the file space is being used for
Comparison made between mode 7 and mode 8 data
Example: some of the fADC250 word types (a rough tallying sketch follows)
Online Status -- David Lawrence 14
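A minimal sketch of what such a word-type tally can look like. It is illustrative only: it assumes the JLab module-data convention in which bit 31 marks a "data type defining" word and bits 30-27 carry a 4-bit type code, with continuation words inheriting the most recent type. The authoritative word layout is in the fADC250 data-format documentation.

```cpp
// Tally 32-bit words in a raw-data buffer by their (assumed) 4-bit type code.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

std::map<int, uint64_t> TallyWordTypes(const std::vector<uint32_t>& words)
{
    std::map<int, uint64_t> counts;   // type code -> number of 32-bit words
    int current_type = -1;            // -1 until the first defining word is seen
    for (uint32_t w : words) {
        if (w & 0x80000000) current_type = (w >> 27) & 0xF; // defining word
        counts[current_type]++;       // continuation words inherit the type
    }
    return counts;
}

int main()
{
    // Hypothetical buffer standing in for words read from an EVIO file.
    std::vector<uint32_t> buffer = { 0x80000002, 0x00000010, 0xB8000001 };
    for (auto& [type, n] : TallyWordTypes(buffer))
        std::printf("type %2d : %llu words\n", type, (unsigned long long)n);
    return 0;
}
```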

Online Status -- David Lawrence 15

Event Size
Simulation was consistent with the initial estimate of event size
Actual data was more than 4x larger
Much of the data was taken in "raw" mode, where fADC samples were saved
Online Status -- David Lawrence 16

DAQ-to-Detector Translation Table
The Translation Table is used to convert from DAQ system coordinates (rocid, slot, channel) into detector-specific coordinates (e.g. BCAL module, layer, sector, end)
~23k channels defined in an SQLite DB file
Stored in CCDB as an XML string for offline analysis, with complete history:
– /Translation/DAQ2detector
Online Status -- David Lawrence 17
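A minimal sketch of the kind of lookup the translation table provides. The struct, class, and field names below are illustrative stand-ins, not the actual GlueX classes; the real table is loaded from the SQLite/CCDB sources listed above.

```cpp
// Map DAQ coordinates (rocid, slot, channel) to detector-specific coordinates.
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

struct DetectorCoord {
    std::string detector;          // e.g. "BCAL"
    int module, layer, sector, end;
};

// Key: (rocid, slot, channel) as read from the raw data.
using DAQKey = std::tuple<uint32_t, uint32_t, uint32_t>;

class TranslationTable {
public:
    void Add(DAQKey key, DetectorCoord coord) { table_[key] = coord; }

    // Returns true and fills 'coord' if the DAQ channel is defined.
    bool Lookup(DAQKey key, DetectorCoord& coord) const {
        auto it = table_.find(key);
        if (it == table_.end()) return false;
        coord = it->second;
        return true;
    }
private:
    std::map<DAQKey, DetectorCoord> table_;   // ~23k entries in practice
};
```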

hdmon Monitoring Plugins
Plugins: BCAL_online, CDC_online, DAQ_online, FCAL_online, FDC_online, PS_online, ST_online, TAGH_online, TAGM_online, TOF_online, rootspy
Each detector system provides 1 or more plugins that create histograms for monitoring
All plugins are attached to a common DANA process (hdmon)
A "rootspy" plugin publishes all histograms to the network
Online Status -- David Lawrence 18
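Schematically, each detector plugin follows the same pattern: book ROOT histograms once, fill them for every event, and let the rootspy plugin publish whatever is registered. The sketch below uses a simplified, made-up processor interface in place of the real DANA/JANA classes; only the overall pattern reflects the system described above.

```cpp
#include <TH1D.h>   // plain ROOT histogramming

class MonitoringProcessor {            // stand-in for the framework base class
public:
    virtual ~MonitoringProcessor() = default;
    virtual void Init() = 0;                          // called once at startup
    virtual void ProcessEvent(/* event data */) = 0;  // called for every event
};

class BCAL_online : public MonitoringProcessor {
public:
    void Init() override {
        // Histograms are created once; the separate rootspy plugin later
        // publishes everything registered with ROOT to the network.
        hEnergy = new TH1D("bcal_hit_energy", "BCAL hit energy;E (GeV)", 100, 0.0, 2.0);
    }
    void ProcessEvent(/* event data */) override {
        double E = 0.5;                // placeholder: would come from BCAL hits
        hEnergy->Fill(E);
    }
private:
    TH1D* hEnergy = nullptr;
};
```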

Raw Data Formatted Files (from simulated data)
[Diagram: CCDB + hdgeant_smeared.hddm -> roc002.evio, roc003.evio, roc004.evio, ... -> run0002.evio]
run0002.evio is a data file in the same format as will be produced by the CODA DAQ system
Online Status -- David Lawrence 19

L3 and monitoring architecture
[Diagram: EB -> ER across gluon46, gluon53, gluonraid1, with a farm manager; data flows from left to right]
L3 and monitoring processes are decoupled; they could run on the same nodes if desired.
Online Status -- David Lawrence 20

hdmongui
Multiple "levels" supported
Processes run multi-threaded
Online Status -- David Lawrence 21

Online Status -- David Lawrence 22

Online Status -- David Lawrence 23

Online Status -- David Lawrence 24

Current code
Online Status -- David Lawrence 25

All pool maximums increased x10
Only TrackHit pool max increased x10
Online Status -- David Lawrence 26