Gökhan Ünel / CHEP Interlaken ATLAS
Performance of the ATLAS DAQ DataFlow system
Gökhan Ünel on behalf of the ATLAS TDAQ Group
Introduction/Generalities
–Presentation of the ATLAS DAQ components
Functionality & Performance Measurements
–Prototype setup
–Event Building, ROI collection, combined systems
–at2sim: discrete event simulation
Conclusions
–From prototype setup & simulations
Outlook
Generalities: ATLAS DAQ
–Level1 (L1) rate: 75 kHz min, upgradeable to 100 kHz
–Level2 (L2) rate per ROS: 20 kHz; L2 time budget per event: 10 ms
–EventBuilding (EB) rate: kHz for 1.5-2 MByte events
–Recording rate: 200 Hz for 1.5-2 MByte events
[Diagram: SFI, L2PU, L2SV, DFM, pROS, ROS. Flow labels: ROI data (100 kHz), Event data (100 kHz), L2 decision, To EventFilter (3 kHz), ROI data, Event Clear, Assign event, Request data, L2 details, L2 decision, End of event]
Matching requirements
–DataFlowManager (DFM), L2SuperVisor (L2SV): previous work (TDR) has shown that currently available hardware can match the requirements.
–ReadOutSystem (ROS), SubFarmInput (SFI): the latest studies are presented in this talk.
–L2ProcessingUnit (L2PU): since the physics algorithms for event selection are not finalized, only the time to fetch fragments from the ROS is compared to the computation budget.
–Networking: a discrete event simulation tool is used to scale from the prototype setup up to the final ATLAS size.
EB / L2 Setups
–EB: up to 16 SFIs, up to 24 ROSs; FastIron – 64 ports
–L2: up to 14 L2PUs, up to 6 L2SVs, up to 8 ROSs; T6 – 31 ports
–A few fast ROSs
EventBuilding Rate
–Solid lines: ROS = 2 GHz; dashed line: ROS = 3 GHz
–ROS: 12 emulated input channels, 1 kB/channel; SFI: no output to EF
–More ROSs = bigger events!
–ROS CPU limit: 8.55 kHz x 12.4 kB = 106 MB/s
–ROS NIC limit: 9.66 kHz x 12.4 kB = 120 MB/s
–SFI NIC limit: 110 MB/s per SFI
–Small & large systems have the same maximum EB rate: no penalty as event size grows
–A 24 ROS vs 16 SFI EB system runs stably
–A faster ROS does a better job (we hit the I/O limit)
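The two limits quoted on this slide are the EB rate multiplied by the data each ROS ships per event. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 1000 kB) and reading "12.4k" as 12.4 kB per ROS fragment; the function name is illustrative and not part of the DataFlow software:

```python
def throughput_mb_per_s(rate_khz: float, fragment_kb: float) -> float:
    """Data rate in MB/s for an event rate in kHz and a fragment size in kB.

    kHz * kB = (1000 / s) * (1000 B) = MB / s, so the plain product is
    already in MB/s (assumes decimal units: 1 MB = 1000 kB).
    """
    return rate_khz * fragment_kb


FRAGMENT_KB = 12.4  # per-ROS fragment size quoted on the slide

ros_cpu_limit = throughput_mb_per_s(8.55, FRAGMENT_KB)  # rate at the ROS CPU limit
ros_nic_limit = throughput_mb_per_s(9.66, FRAGMENT_KB)  # rate at the ROS NIC limit

print(f"ROS CPU limit: {ros_cpu_limit:.0f} MB/s")
print(f"ROS NIC limit: {ros_nic_limit:.0f} MB/s")
```

Both products round to the figures on the slide (106 MB/s and 120 MB/s).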
Scaling in EB throughput
–EB throughput scales linearly with the number of SFIs
–No show-stoppers
–Possible to estimate the rate of any EB system in the prototype setup
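One way to read "estimate the rate of any EB system" is as a simple bottleneck model: the EB rate is capped either by what a single ROS can send or by what all SFIs together can receive. The sketch below is an assumption-laden illustration, not the fit used for the slide. It takes the 106 MB/s ROS limit, the 110 MB/s per-SFI NIC limit and the 12.4 kB fragment size from the EventBuilding Rate slide, and assumes perfectly linear scaling and one fragment per ROS per event. The function name and defaults are hypothetical.

```python
def eb_rate_khz(n_ros: int, n_sfi: int,
                fragment_kb: float = 12.4,
                ros_limit_mb_s: float = 106.0,
                sfi_limit_mb_s: float = 110.0) -> float:
    """Estimated maximum EB rate (kHz) under a two-bottleneck model.

    Assumptions (not from the slides): linear scaling with the number of
    SFIs, one fragment of fragment_kb per ROS per event, decimal units.
    """
    if n_ros <= 0 or n_sfi <= 0:
        raise ValueError("n_ros and n_sfi must be positive")
    # Each ROS sends one fragment per event: MB/s divided by kB gives kHz.
    ros_bound = ros_limit_mb_s / fragment_kb
    # All SFIs together must absorb one full event (n_ros fragments) per event.
    event_kb = n_ros * fragment_kb
    sfi_bound = n_sfi * sfi_limit_mb_s / event_kb
    return min(ros_bound, sfi_bound)


for n_sfi in (4, 8, 16):
    print(n_sfi, "SFIs:", round(eb_rate_khz(24, n_sfi), 2), "kHz")
```

In this model the rate doubles with the number of SFIs until the ROS bound is reached, which is the linear-scaling behaviour the slide describes.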
Determining Number of SFIs
–Requirement: kHz of EB for % bandwidth usage per SFI
–Plot labels: 60% bw, 90% bw, typical ATLAS event size
–At the typical event size of 1.5 MB, 60 SFIs (2.4 GHz SMP) are enough
–Output to EF + extra SFIs for safety margin should be considered
–100 SFIs (2.4 GHz SMP) would easily handle kHz 1.5-2 MB events
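The SFI count follows from dividing the total event-building bandwidth by the bandwidth each SFI is allowed to use. A rough sketch, assuming the 3 kHz EB rate quoted in the conclusions and a Gigabit Ethernet line speed of 125 MB/s per SFI (the line speed is an assumption, it is not stated on the slide); the function name is illustrative:

```python
import math


def sfis_needed(eb_rate_khz: float, event_mb: float, usage: float,
                line_speed_mb_s: float = 125.0) -> int:
    """Number of SFIs needed to build events at eb_rate_khz.

    usage is the allowed fraction of the line speed per SFI (e.g. 0.6).
    line_speed_mb_s defaults to Gigabit Ethernet (assumed, not from the slide).
    """
    total_mb_s = eb_rate_khz * 1000.0 * event_mb   # aggregate EB bandwidth
    per_sfi_mb_s = usage * line_speed_mb_s         # usable bandwidth per SFI
    # round() guards against floating-point noise just above an integer
    return math.ceil(round(total_mb_s / per_sfi_mb_s, 9))


print(sfis_needed(3.0, 1.5, 0.6))  # 1.5 MB events at 60% usage
print(sfis_needed(3.0, 2.0, 0.6))  # 2 MB events at 60% usage
```

Under these assumptions, 1.5 MB events at 60% usage give 60 SFIs, matching the number on the slide.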
Level2 Rate
–Dummy algorithms in L2PUs
–6 concurrent ROI collections per L2PU
–Plot annotation: ROS CPU limited
–Linear scaling when the ROS is not the limiting factor
L2 Time budget
–Requirement: 10 ms per event for L2 decision, ROI fetch time << 10 ms
–If 500 L2PUs (3 GHz SMP) are used: 10 ms/event at 100 kHz L1 rate for L2 decision
–Worst case of 16 ROLs, all from different ROSs: < 0.8 ms
–Longest ROI fetch: ROL
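The 10 ms budget can be recovered from the farm size and the L1 rate. The sketch below assumes that "SMP" means dual-CPU nodes (as the conclusions describe the L2PUs) and that each CPU handles one event at a time. Both are simplifying assumptions, and the function name is illustrative.

```python
def l2_budget_ms(n_nodes: int, cpus_per_node: int, l1_rate_khz: float) -> float:
    """Average time available per event on one CPU, in ms.

    With N CPUs sharing an input rate R, each CPU sees a new event every
    N / R; CPUs divided by kHz gives milliseconds.
    """
    return n_nodes * cpus_per_node / l1_rate_khz


budget = l2_budget_ms(n_nodes=500, cpus_per_node=2, l1_rate_khz=100.0)
worst_fetch_ms = 0.8  # upper bound for 16 ROLs from different ROSs (slide)

print(f"budget per event: {budget:.1f} ms")
print(f"worst-case ROI fetch share: {worst_fetch_ms / budget:.0%}")
```

With these inputs the budget is 10 ms and the worst-case fetch takes at most 8% of it.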
Combined setups: EB + L2
[Diagram: switches Foundry FastIron 800, BATM T6, Foundry EI; nodes SFI01 … SFI(O), ROS01 … ROS18, ROS19 … ROS24, L2P01 … L2P14, L2SV01 … L2SV06, pROS, DFM]
Small system: 3 ROS x 2 SFI x ..12 L2PU
–Since the maximum rates for EB and L2 are known, use the plateau region to calculate the ROS CPU utilization for the "clear" task
–Plateau: ROS CPU limit
Analysis for ROS cpu
$\mathrm{CPU} = R_{EB} \times \mathrm{CPU}_{EB} + R_{L2} \times \mathrm{CPU}_{L2} + R_{L1} \times \mathrm{CPU}_{Cl}$
–$\mathrm{CPU}_{EB}$ is the CPU power spent by the ROS on 1 kHz of Event Building
–$\mathrm{CPU}_{L2}$ is the CPU power spent by the ROS on 1 kHz of Level 2 ROI
–$\mathrm{CPU}_{Cl}$ is the CPU power spent by the ROS on 1 kHz of Event Clears
–Requirement: 100 kHz L1, 20 kHz L2, kHz EB + including clears** using 2 NICs simultaneously
–2 GHz ROS needs: 20x x x0.0074 = 2.6 > 2.0
–3 GHz ROS needs: 20x x x0.0083 = 2.55 < 3.06
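The linear model on this slide can be written as a small function. This is a sketch only: the per-kHz costs for EB and L2 are not legible in this transcript, so they are left as parameters rather than guessed. The example uses only the totals quoted on the slide and reads the trailing factors 0.0074 and 0.0083 as the clear cost multiplied by the 100 kHz L1 rate, with results in GHz. That reading and the function names are assumptions.

```python
def ros_cpu_demand_ghz(r_eb_khz: float, r_l2_khz: float, r_l1_khz: float,
                       cpu_eb: float, cpu_l2: float, cpu_cl: float) -> float:
    """CPU = R_EB * CPU_EB + R_L2 * CPU_L2 + R_L1 * CPU_Cl.

    Rates are in kHz; each cost is the CPU power (assumed GHz) spent per kHz.
    """
    return r_eb_khz * cpu_eb + r_l2_khz * cpu_l2 + r_l1_khz * cpu_cl


def fits(demand_ghz: float, clock_ghz: float) -> bool:
    """True if the required CPU power does not exceed the ROS clock."""
    return demand_ghz <= clock_ghz


# Clear term alone, assuming 0.0074 is the 2 GHz clear cost per kHz of L1
clear_2ghz = ros_cpu_demand_ghz(0.0, 0.0, 100.0, 0.0, 0.0, 0.0074)
print(f"clear task on a 2 GHz ROS: {clear_2ghz:.2f} GHz")

# Totals quoted on the slide, compared against the available clock
print("2 GHz ROS sufficient:", fits(2.6, 2.0))
print("3 GHz ROS sufficient:", fits(2.55, 3.06))
```

The comparison reproduces the slide's conclusion: the 2 GHz ROS falls short of the requirement, while the 3.06 GHz ROS meets it.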
Combined system
–Largest possible system using 2 GHz ROS
–18 ROS x 16 SFI x 12 L2PU runs stably
Meeting requirements with 3 GHz ROS
–Plot labels: L1 = 100 kHz, L2 = 20 kHz, EB = 3 kHz, acc = 3%
–Good agreement between data and simulation
–A 3 GHz ROS can do 20 kHz L2 & 3 kHz EB at 100 kHz L1
Final system Simulation
–ROS x 110 SFI x N L2PU
–Using concentrating switches for PUs (6 to 1)
–Realistic trigger menu & ROI distribution
–Plot labels: 75 kHz, 95 kHz
Final system Simulation - 2
–at2sim: 127 ROS, 110 SFIs, 504 L2PUs with concentrator switches
–Plot axes: time (s); L1 rate (kHz), EB latency (ms), # events in L2, slowest ROS Q
–Final size system runs smoothly with fast ROSs (3.06 GHz)
Conclusions - I
–A 3 GHz ROS can do 3 kHz EB & 20 kHz L2: we need ~140 such nodes
–A dual 2.4 GHz SFI can do 3 kHz EB at 60% of line speed: we need ~100 such nodes
–A dual 3 GHz L2PU can do ROI collection in less than 8% of its time budget: we need ~500 such nodes
–The largest test system was 18x16x12: no scalability/functionality problems observed
Conclusions - II
–at2sim of the final setup: 160x100x..500
–Scaling from 20% to 100%: no surprises, no queues, no anomalies
–Network: we can handle the extreme traffic caused by ultra-fast L2PUs without algorithms
–Prototype L2PUs run at 12.5 kHz, ~25 times faster than in the final system
Next Steps
Test: prototype custom hardware with 2 input channels
Preseries: 10% setup down in the ATLAS cavern
–A bigger switch (128 ports) will be bought
–Merge with existing prototype setup
–Time scale: Q2 / 2005
Networking aspects: scalability & performance
–Separate test bed
–Dedicated hardware (any frame size)
–Stress testing candidate switches
Backup slides
Hardware inventory
–Networking
  1 EB switch: Foundry FastIron 800 – 62 ports
  1 L2 switch: BATM T6 – 31 ports
  1 X-over switch: Foundry EdgeIron – 10 ports
–PCs (Intel Xeon, 64 bit/66 MHz PCI)
  31 tower uni-processor (2.0 GHz)
    –25 used as ROS for scaling studies
    –06 used as L2SVs
    –01 used as DFM
  16 tower dual-processor (3.06 GHz)
    –Used as L2PUs
    –5 used as ROS for performance studies
  16 rack-mountable dual-processor (2.4 GHz)
    –Used as SFIs
EFD setup
[Diagram: DFM, ROS1, ROS2, SFI, EFD1, EFD2, EFD15]
EFD Studies
–40% performance loss
–No EF output
–Single SFI: small events, worst case
DFM & L2SV performance
ROS input emulation vs Prototype Hardware
–Plot labels: Data, Emulation