Presentation transcript:

Slide 1 – Performance of the ATLAS DAQ DataFlow system
Gökhan Ünel, on behalf of the ATLAS TDAQ Group / CHEP 2004, Interlaken
– Introduction/Generalities: presentation of the ATLAS DAQ components
– Functionality & Performance Measurements: prototype setup; Event Building, ROI collection, combined systems; at2sim: discrete event simulation
– Conclusions: from the prototype setup & simulations
– Outlook

Slide 2 – Generalities: ATLAS DAQ
– Level-1 (L1) rate: 75 kHz minimum, upgradeable to 100 kHz
– Level-2 (L2) rate per ROS: 20 kHz; L2 time budget per event: 10 ms
– Event Building (EB) rate: ~3 kHz for 1.5–2 MByte events
– Recording rate: 200 Hz for 1.5–2 MByte events
[Diagram: DataFlow components ROS, L2PU, L2SV, DFM, pROS and SFI, with the message flow: ROI data (100 kHz), L2 details, L2 decision, assign event, request data, event data (100 kHz), event clear, end of event; output to the EventFilter at 3 kHz]
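A minimal sketch (Python, not from the original slides) of the aggregate throughput implied by these rates; the rates and event sizes are taken from the bullets above, while the bandwidth figures are derived here:

```python
# Aggregate-throughput sketch from the rates quoted on this slide.
# Event sizes of 1.5 and 2 MB bracket the expected ATLAS event size.
event_sizes_mb = (1.5, 2.0)

eb_rate_khz = 3.0        # Event-Building rate (~3 kHz)
rec_rate_hz = 200.0      # recording rate to mass storage

for size in event_sizes_mb:
    eb_bw = eb_rate_khz * 1e3 * size   # MB/s into Event Building
    rec_bw = rec_rate_hz * size        # MB/s to storage
    print(f"{size} MB events: EB ~{eb_bw:.0f} MB/s, recording ~{rec_bw:.0f} MB/s")
```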

Slide 3 – Matching requirements
– DataFlowManager (DFM), L2SuperVisor (L2SV): previous work (TDR) has shown that currently available hardware can match the requirements.
– ReadOutSystem (ROS), SubFarmInput (SFI): the latest studies will be presented in this talk.
– L2ProcessingUnit (L2PU): since the physics algorithms for event selection are not finalized, only the time to fetch fragments from the ROS is compared to the computation budget.
– Networking: a discrete event simulation tool is used to scale from the prototype setup up to the final ATLAS size.

Slide 4 – EB / L2 Setups
– EB: up to 16 SFIs, up to 24 ROSs
– L2: up to 14 L2PUs, up to 6 L2SVs, up to 8 ROSs, a few fast ROSs
– Switches: FastIron (64 ports), T6 (31 ports)

Slide 5 – Event Building Rate
– ROS: 12 emulated input channels, 1 kB/channel; SFI: no output to EF; more ROSs = bigger events!
– Solid lines: 2 GHz ROS; dashed line: 3 GHz ROS
– 8.55 kHz × 12.4 kB = 106 MB/s → ROS CPU limit
– 9.66 kHz × 12.4 kB = 120 MB/s → ROS NIC limit
– 110 MB/s per SFI → NIC limit
– Small and large systems have the same maximum EB rate → no penalty as the event size grows
– A 24 ROS vs 16 SFI EB system runs stably
– A faster ROS does a better job (we hit the I/O limit)
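The throughput arithmetic on this slide can be reproduced with a small sketch; the 12.4 kB served per ROS per built event is the figure quoted above:

```python
# Per-ROS output bandwidth at the measured maximum EB rates.
# 12 emulated input channels x 1 kB/channel plus headers ~= 12.4 kB
# served by each ROS per built event.
ros_fragment_kb = 12.4

for label, eb_rate_khz in (("2 GHz ROS (CPU limit)", 8.55),
                           ("3 GHz ROS (NIC limit)", 9.66)):
    bw_mb_s = eb_rate_khz * ros_fragment_kb   # kHz * kB = MB/s
    print(f"{label}: {eb_rate_khz} kHz x {ros_fragment_kb} kB ~= {bw_mb_s:.0f} MB/s")
```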

Slide 6 – Scaling in EB throughput
– EB throughput scales linearly with the number of SFIs
– No show-stoppers
– Possible to estimate the rate of any EB system in the prototype setup

Slide 7 – Determining the Number of SFIs
– Requirement: ~3 kHz of EB; curves show 60% and 90% bandwidth usage per SFI as a function of event size (typical ATLAS event size marked)
– At the typical event size of 1.5 MB, 60 SFIs (2.4 GHz SMP) are enough
– Output to the EF plus extra SFIs for a safety margin should be considered → 100 SFIs (2.4 GHz SMP) would easily handle ~3 kHz of 1.5–2 MB events
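A rough sketch of how the SFI count follows from the numbers above; the ~125 MB/s Gigabit Ethernet line rate per SFI is an assumption of this sketch, not a figure from the slide:

```python
# Rough SFI-count estimate for ~3 kHz Event Building of 1.5 MB events.
# Assumes a Gigabit Ethernet line rate of ~125 MB/s per SFI; the 60%/90%
# bandwidth-usage targets correspond to the two curves on the slide.
eb_rate_hz = 3000.0
event_size_mb = 1.5
line_rate_mb_s = 125.0                      # assumed GbE line rate per SFI

total_bw = eb_rate_hz * event_size_mb       # total EB bandwidth in MB/s
for usage in (0.60, 0.90):
    n_sfi = total_bw / (line_rate_mb_s * usage)
    print(f"{usage:.0%} bandwidth usage per SFI -> ~{n_sfi:.0f} SFIs")
```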

Slide 8 – Level-2 Rate
– Dummy algorithms in the L2PUs; 6 concurrent ROI collections per L2PU
– Linear scaling when the ROS is not the limiting factor (plateau: ROS CPU limited)

Slide 9 – L2 Time budget
– Requirement: 10 ms/event for the L2 decision; ROI fetch time << 10 ms
– If 500 L2PUs (3 GHz SMP) are used: 10 ms/event at 100 kHz L1 rate is available for the L2 decision
– Worst case (longest ROI fetch): 16 ROLs, all from different ROSs, takes < 0.8 ms
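A short sketch of where the 10 ms budget and the later 8% figure come from, assuming each of the 500 L2PU nodes contributes two CPUs (the dual-CPU interpretation of "SMP" is an assumption here):

```python
# Where the 10 ms/event Level-2 budget comes from (sketch).
# Assumes dual-CPU (SMP) L2PU nodes, i.e. 2 processing slots per node.
l1_rate_hz = 100_000.0
n_l2pu_nodes = 500
cpus_per_node = 2                      # dual-processor assumption

rate_per_cpu = l1_rate_hz / (n_l2pu_nodes * cpus_per_node)   # events/s per CPU
budget_ms = 1e3 / rate_per_cpu                               # ms available per event
fetch_ms = 0.8                                               # worst-case ROI fetch
print(f"budget ~{budget_ms:.0f} ms/event; ROI fetch uses {fetch_ms / budget_ms:.0%}")
```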

Slide 10 – Combined setups: EB + L2
[Diagram: ROS01-ROS24, SFI01-SFI(O), L2P01-L2P14, L2SV01-L2SV06, pROS and DFM interconnected through the Foundry FastIron 800, BATM T6 and Foundry EdgeIron switches]

Slide 11 – Small system: 3 ROS × 2 SFI × up to 12 L2PUs
– Since the maximum rates for EB and L2 are known, the plateau region (ROS CPU limit) is used to calculate the ROS CPU utilization for the "clear" task

Slide 12 – Analysis of the ROS CPU
CPU = R_EB × CPU_EB + R_L2 × CPU_L2 + R_L1 × CPU_Cl
– CPU_EB is the CPU power spent by the ROS on 1 kHz of Event Building
– CPU_L2 is the CPU power spent by the ROS on 1 kHz of Level-2 ROI requests
– CPU_Cl is the CPU power spent by the ROS on 1 kHz of event clears
– Requirement: 100 kHz L1, 20 kHz L2, ~3 kHz EB, including clears (using 2 NICs simultaneously)
– 2 GHz ROS needs: 20×… + …×0.0074 = 2.6 GHz > 2.0 GHz available → not sufficient
– 3 GHz ROS needs: 20×… + …×0.0083 = 2.55 GHz < 3.06 GHz available → sufficient
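The budget formula can be expressed as a small helper; only the structure of the calculation follows the slide, and the per-kHz cost values used in the example call are hypothetical placeholders rather than the measured coefficients:

```python
# Sketch evaluating the ROS CPU budget formula from this slide:
#   CPU = R_EB * CPU_EB + R_L2 * CPU_L2 + R_L1 * CPU_Cl
# Rates are in kHz, per-kHz costs in GHz/kHz. The cost values passed
# below are placeholders, not the measured coefficients.
def ros_cpu_ghz(cpu_eb, cpu_l2, cpu_cl, r_eb=3.0, r_l2=20.0, r_l1=100.0):
    """Return the GHz of ROS CPU needed at the given rates and per-kHz costs."""
    return r_eb * cpu_eb + r_l2 * cpu_l2 + r_l1 * cpu_cl

# Example: compare the needed CPU against the 2.0 GHz and 3.06 GHz nodes.
needed = ros_cpu_ghz(cpu_eb=0.30, cpu_l2=0.05, cpu_cl=0.0074)  # placeholder costs
for available in (2.0, 3.06):
    print(f"{available} GHz ROS sufficient: {needed <= available}")
```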

Slide 13 – Combined system
– Largest possible system using 2 GHz ROSs: 18 ROS × 16 SFI × 12 L2PU runs stably

Slide 14 – Meeting requirements with a 3 GHz ROS
– Good agreement between data and simulation
– A 3 GHz ROS can do 20 kHz L2 and 3 kHz EB at 100 kHz L1
[Plot conditions: L1 = 100 kHz, L2 = 20 kHz, EB = 3 kHz, accept fraction = 3%]

Slide 15 – Final system Simulation
– ROS × 110 SFI × N L2PU, using concentrating switches for the L2PUs (6 → 1)
– Realistic trigger menu and ROI distribution
[Plot markers at 75 kHz and 95 kHz]

Slide 16 – Final system Simulation (2)
– at2sim: 127 ROSs, 110 SFIs, 504 L2PUs with concentrator switches
– The final-size system runs smoothly with fast ROSs (3.06 GHz)
[Plots vs time (s): L1 rate (kHz), EB latency (ms), number of events in L2, slowest ROS queue]
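For illustration, a minimal discrete-event queueing sketch in the spirit of the at2sim modelling above (this is not the actual at2sim code; the request rate and service time are assumed values). It tracks the maximum depth of the request queue at one ROS, the analogue of the "slowest ROS queue" observable on this slide:

```python
import heapq
import random

# Single ROS serving data requests: Poisson arrivals at request_rate_hz,
# fixed service_s per request, maximum queue depth reported.
def simulate_ros(request_rate_hz=20_000, service_s=40e-6, duration_s=1.0, seed=1):
    random.seed(seed)
    queue = max_queue = 0          # requests waiting or in service
    next_free = 0.0                # time at which the ROS becomes idle
    events = [(random.expovariate(request_rate_hz), 1)]  # (time, 1=arrival, 0=done)
    while events:
        t, is_arrival = heapq.heappop(events)
        if t > duration_s:
            break
        if is_arrival:
            queue += 1
            heapq.heappush(events, (t + random.expovariate(request_rate_hz), 1))
            if next_free <= t:                     # ROS idle: serve immediately
                next_free = t + service_s
                heapq.heappush(events, (next_free, 0))
        else:
            queue -= 1
            if queue > 0:                          # start the next queued request
                next_free = t + service_s
                heapq.heappush(events, (next_free, 0))
        max_queue = max(max_queue, queue)
    return max_queue

print("maximum ROS request-queue depth:", simulate_ros())
```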

Slide 17 – Conclusions I
– A 3 GHz ROS can do 3 kHz EB and 20 kHz L2; we need ~140 such nodes
– A dual 2.4 GHz SFI can do 3 kHz EB at 60% of line speed; we need ~100 such nodes
– A dual 3 GHz L2PU can do ROI collection in less than 8% of its time budget; we need ~500 such nodes
– The largest test system was 18 × 16 × 12; no scalability or functionality problems were observed

Slide 18 – Conclusions II
– at2sim of the final setup: 160 × 100 × up to 500
– Scaling from 20% to 100%: no surprises, no queues, no anomalies
– Network: we can handle the extreme traffic caused by ultra-fast L2PUs without algorithms (prototype L2PUs run at 12.5 kHz, ~25 times faster than in the final system)

Slide 19 – Next Steps
– Test prototype custom hardware with 2 input channels
– Preseries: a 10%-scale setup down in the ATLAS cavern; a bigger switch (128 ports) will be bought; merge with the existing prototype setup; time scale: Q2/2005
– Networking aspects, scalability & performance: separate test bed; dedicated hardware (any frame size); stress testing of candidate switches

Slide 20 – Backup slides

Slide 21 – Hardware inventory
– Networking: 1 EB switch: Foundry FastIron 800 (62 ports); 1 L2 switch: BATM T6 (31 ports); 1 cross-over switch: Foundry EdgeIron (10 ports)
– PCs (Intel Xeon, 64-bit/66 MHz PCI):
  – 31 tower uni-processor (2.0 GHz): 25 used as ROSs for scaling studies, 6 used as L2SVs, 1 used as DFM
  – 16 tower dual-processor (3.06 GHz): used as L2PUs; 5 used as ROSs for performance studies
  – 16 rack-mountable dual-processor (2.4 GHz): used as SFIs

Slide 22 – EFD setup
[Diagram: ROS1, ROS2 and the DFM connected to an SFI, which feeds EFD1 through EFD15]

Slide 23 – EFD Studies
– Single SFI: small events, worst case
– No EF output
– 40% performance loss

Slide 24 – DFM & L2SV performance

Slide 25 – ROS input emulation vs prototype hardware
[Plot comparing hardware input and data emulation]