Experience with multi-threaded C++ applications in the ATLAS DataFlow Szymon Gadomski University of Bern, Switzerland and INP Cracow, Poland on behalf.


1 Experience with multi-threaded C++ applications in the ATLAS DataFlow
Szymon Gadomski, University of Bern, Switzerland, and INP Cracow, Poland, on behalf of the ATLAS Trigger/DAQ DataFlow, CHEP 2003 conference
Performance problems found and solved:
– STL containers
– thread scheduling
– other

2 ATLAS DataFlow software
CHEP, March 03, S. Gadomski, "Experience with multi-threaded C++ in ATLAS DataFlow"
Flow of data in the ATLAS DAQ system:
– data to LVL2 (part of the event), to the EF (the whole event) and to mass storage;
– see the talks by Giovanna Lehman (overview of DataFlow) and by Stefan Stancu (networking).
PCs running standard Linux, applications written in C++ (so far compiled only with gcc), standard network technology (Gigabit Ethernet).
A "soft" real-time system with no guaranteed response time: what matters is the average response time.
Common tasks (exchanging messages, state machine, access to the configuration database, error reporting, …) are handled by a framework (well, actually two…).

3 ATLAS DataFlow software (2)
State of the project:
– development done mostly in 2001-2002;
– measurements for the Technical Design Report: performance;
– preparation for beam test support: stability, robustness and deployment.
7 kinds of applications (+3 kinds of controllers).
Always several threads (independent flows of execution within one application that share its resources rather than having their own).
The roles, challenges and use of threads differ greatly between applications.
This short talk gives only a few examples:
– use of threads, problems, solutions.

4 Testbed at CERN
[Figure: testbed with 1U PCs (>= 2 GHz), 4U PCs (>= 2 GHz) and FPGA traffic generators]

5 LVL2 processing unit (L2PU) – role
[Diagram: the L2PU, a DataFlow application with an interface to the control software, exchanges messages with the L2SV, the ROSs/ROBs holding the detector data, and the pROS, which leads to mass storage. Indicative multiplicities shown: 1600x, 140x, up to 500x, 10x, 1x. Multiplicities are indicative only.]
The L2PU:
– gets the LVL1 decision (L1 + RoI data) from the L2SV;
– asks the ROSs for data (RoI only) and gets it;
– makes the LVL2 decision and sends it to the L2SV;
– sends the detailed LVL2 result to the pROS.

6 L2PU design
[Diagram: inside the L2PU, one input thread and several worker threads; the L2SV sends the LVL1 result and receives the LVL2 decision, the ROSs answer RoI data requests with RoI data, and the pROS receives the LVL2 result.]
– Input thread: assembles the RoI data and adds events to the event queue.
– Worker thread: gets the next event from the queue and runs the LVL2 selection code; requests data and waits for the RoI data, then continues the selection code; sends the decision and, if the event is accepted, sends the result; when complete, restarts.
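The input-thread/worker-thread split above can be sketched with C++11 threads and a condition-variable queue. This is a minimal, hypothetical model, not the L2PU code (which predates C++11 and used pthreads): `Lvl1Result`, `EventQueue`, `run_selection` and `Counters` are illustrative names, the data requests to the ROSs are left out, and the placeholder selection simply accepts even event IDs.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical LVL1 result handed from the input thread to the workers.
struct Lvl1Result { int event_id; };

// Thread-safe FIFO: the input thread pushes, worker threads pop.
class EventQueue {
public:
    void push(Lvl1Result r) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(r); }
        cv_.notify_one();                          // wake one idle worker
    }
    void close() {                                 // no more events: wake all workers
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    bool pop(Lvl1Result& out) {                    // blocks until an event or close()
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;              // closed and drained
        out = q_.front();
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Lvl1Result> q_;
    bool closed_ = false;
};

// Placeholder for the LVL2 selection (the real code requests RoI data from
// the ROSs and waits for it); here it just accepts even event IDs.
bool run_selection(const Lvl1Result& r) { return r.event_id % 2 == 0; }

struct Counters { std::atomic<int> decisions{0}; std::atomic<int> accepts{0}; };

// Worker thread: get next event, run selection, "send" the decision and,
// if accepted, the detailed result; then restart with the next event.
void worker(EventQueue& q, Counters& c) {
    Lvl1Result ev{0};
    while (q.pop(ev)) {
        bool accept = run_selection(ev);
        ++c.decisions;                             // stands in for sending to the L2SV
        if (accept) ++c.accepts;                   // stands in for sending to the pROS
    }
}

// One input thread feeding n_events into n_workers worker threads.
void run_l2pu(int n_events, int n_workers, Counters& c) {
    assert(n_workers > 0);
    EventQueue q;
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back([&q, &c] { worker(q, c); });
    std::thread input([&q, n_events] {             // LVL1 results -> event queue
        for (int id = 0; id < n_events; ++id) q.push(Lvl1Result{id});
        q.close();
    });
    input.join();
    for (auto& w : workers) w.join();
}
```

Each worker blocks only on the shared queue, and the queue's mutex is held just for the push or pop itself, so the selection of different events can run in parallel.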

7 Sub-farm Interface (SFI) – role
[Diagram: the SFI, a DataFlow application with an interface to control, receives event assigns from the DFM (which collects LVL2 accepts and rejects and sends clears), sends data requests to the ROSs, returns End of Event (EoE) to the DFM, and sends complete events to the EF on request, with mass storage downstream. Indicative multiplicities shown: 50x, 140x, 1x. Multiplicities are indicative only.]
The SFI:
– gets the event ID (LVL2 accept) from the DFM;
– asks for all the event data and gets it;
– builds the complete event;
– buffers it;
– sends it to the Event Filter on request.

8 SFI design
[Diagram: inside the SFI, an input thread, a request thread, an assembly thread and an Event Handler; event assigns come from the DFM, data requests (and re-asks for missing fragment IDs) go to the ROSs, ROS fragments come back, End of Event goes to the DFM and full events go to the EF. EB rate per SFI ~50 Hz.]
– Different threads for requesting and for receiving data.
– Threads for assembly and for sending to the Event Handler.
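Event assembly can be modelled as bookkeeping of which ROS fragments have arrived for each assigned event. A minimal sketch under stated assumptions: `EventAssembler` and its methods are hypothetical, each event is assumed to need exactly one fragment from each of `n_ros` ROSs numbered 0..n_ros-1, and a single mutex protects the table because requesting and receiving run in different threads. The request thread could use `missing()` to decide what to re-ask.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <set>
#include <vector>

class EventAssembler {
public:
    explicit EventAssembler(int n_ros) : n_ros_(n_ros) { assert(n_ros > 0); }

    // On an event assign from the DFM: start waiting for all fragments.
    void assign(unsigned event_id) {
        std::lock_guard<std::mutex> lk(m_);
        pending_[event_id];                       // creates an empty fragment set
    }

    // Input thread: returns true when this fragment completes the event
    // (which would then be handed to the thread sending to the EF).
    bool add_fragment(unsigned event_id, int ros_id) {
        assert(ros_id >= 0 && ros_id < n_ros_);
        std::lock_guard<std::mutex> lk(m_);
        auto it = pending_.find(event_id);
        if (it == pending_.end()) return false;   // unknown or already built
        it->second.insert(ros_id);                // duplicate fragments are ignored
        if (static_cast<int>(it->second.size()) < n_ros_) return false;
        pending_.erase(it);
        ++built_;
        return true;
    }

    // Request thread: fragment IDs still missing, i.e. candidates for a re-ask.
    std::vector<int> missing(unsigned event_id) {
        std::lock_guard<std::mutex> lk(m_);
        std::vector<int> out;
        auto it = pending_.find(event_id);
        if (it == pending_.end()) return out;
        for (int r = 0; r < n_ros_; ++r)
            if (!it->second.count(r)) out.push_back(r);
        return out;
    }

    int built() {
        std::lock_guard<std::mutex> lk(m_);
        return built_;
    }

private:
    std::mutex m_;
    int n_ros_;
    std::map<unsigned, std::set<int>> pending_;   // event ID -> fragments received
    int built_ = 0;
};
```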

9 Lesson with L2PU and SFI – STL containers
[Figure: VisualThreads timeline (# threads vs. time) with threads blocked.]
Although the code had no apparent dependence between threads, the threads were observed not to run independently: adding more threads had no effect.
VisualThreads, using an instrumented pthread library, showed why:
– STL containers use a memory pool, by default one per executable;
– the pool has a lock, so threads may block each other.
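The contention can be reproduced with a deliberately simple model: an STL-compatible allocator whose every allocation and deallocation, from every thread, takes one global lock. `GlobalPool` and `SharedPoolAllocator` are hypothetical, and this is not how the gcc STL pool of the time was written; the explicit mutex only makes the serialization point visible. Each `std::list` node costs one locked allocation and one locked deallocation, so threads working on unrelated events still queue on the same mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

// Simplified model of "one memory pool per executable, protected by a lock".
struct GlobalPool {
    static std::mutex& lock() { static std::mutex m; return m; }
    static long& lock_count() { static long n = 0; return n; }   // guarded by lock()

    static void* allocate(std::size_t bytes) {
        std::lock_guard<std::mutex> guard(lock());   // every thread serializes here
        ++lock_count();
        return ::operator new(bytes);
    }
    static void deallocate(void* p) {
        std::lock_guard<std::mutex> guard(lock());   // ...and here
        ++lock_count();
        ::operator delete(p);
    }
};

// Minimal STL-compatible allocator that always goes through the shared pool.
template <class T>
struct SharedPoolAllocator {
    using value_type = T;
    SharedPoolAllocator() = default;
    template <class U> SharedPoolAllocator(const SharedPoolAllocator<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(GlobalPool::allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t) { GlobalPool::deallocate(p); }
};
template <class T, class U>
bool operator==(const SharedPoolAllocator<T>&, const SharedPoolAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const SharedPoolAllocator<T>&, const SharedPoolAllocator<U>&) { return false; }

// Per-event work: a short-lived list, one node allocation per fragment.
void process_events(int events, int fragments) {
    for (int e = 0; e < events; ++e) {
        std::list<int, SharedPoolAllocator<int>> frags;
        for (int f = 0; f < fragments; ++f) frags.push_back(f);
        assert(frags.size() == static_cast<std::size_t>(fragments));
    }
}

// Runs the same workload on several threads; they all contend for one mutex.
void run_threads(int n_threads, int events, int fragments) {
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back(process_events, events, fragments);
    for (auto& th : threads) th.join();
}
```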

10 Lesson with L2PU and SFI – STL containers (2)
[Figure: the same threads after the change, blocked less often.]
The solution is to use the pthread allocator:
– independent memory pools for each thread, so no lock and no blocking;
– use it for all containers used at event rate;
– be careful with creating objects in one thread and deleting them in another.
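The per-thread-pool idea can be sketched with `thread_local` free lists. This is an illustration under assumptions, not the pthread allocator the SFI and L2PU actually used: `PerThreadPool` and `PerThreadAllocator` are hypothetical names, blocks are grouped into 16-byte size classes up to 256 bytes, and larger requests go straight to `::operator new`. Allocation and deallocation only touch the calling thread's lists, so no lock is taken. The comment in `deallocate` shows why cross-thread deletion needs care: a block freed by another thread ends up in that thread's cache, so memory drifts between pools.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>
#include <new>
#include <thread>
#include <vector>

// Hypothetical per-thread pool: each thread caches freed blocks in its own
// free lists (16-byte size classes), so no lock is ever taken.
class PerThreadPool {
public:
    static constexpr std::size_t kGranule = 16;
    static constexpr std::size_t kClasses = 16;      // pooled sizes: up to 256 bytes

    static void* allocate(std::size_t bytes) {
        std::size_t c = size_class(bytes);
        if (c >= kClasses) return ::operator new(bytes);   // large: not pooled
        PerThreadPool& self = local();
        if (!self.free_[c].empty()) {                       // reuse a cached block
            void* p = self.free_[c].back();
            self.free_[c].pop_back();
            ++self.reused_;
            return p;
        }
        return ::operator new((c + 1) * kGranule);
    }

    static void deallocate(void* p, std::size_t bytes) {
        std::size_t c = size_class(bytes);
        if (c >= kClasses) { ::operator delete(p); return; }
        // Cached by the *calling* thread: a block freed by another thread
        // migrates into that thread's pool.
        local().free_[c].push_back(p);
    }

    static long reused() { return local().reused_; }

    ~PerThreadPool() {                                      // runs at thread exit
        for (auto& list : free_)
            for (void* p : list) ::operator delete(p);
    }

private:
    static std::size_t size_class(std::size_t bytes) {
        return bytes == 0 ? 0 : (bytes - 1) / kGranule;
    }
    static PerThreadPool& local() {
        thread_local PerThreadPool pool;                    // one pool per thread
        return pool;
    }
    std::array<std::vector<void*>, kClasses> free_;
    long reused_ = 0;
};

template <class T>
struct PerThreadAllocator {
    using value_type = T;
    PerThreadAllocator() = default;
    template <class U> PerThreadAllocator(const PerThreadAllocator<U>&) {}
    T* allocate(std::size_t n) { return static_cast<T*>(PerThreadPool::allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t n) { PerThreadPool::deallocate(p, n * sizeof(T)); }
};
template <class T, class U>
bool operator==(const PerThreadAllocator<T>&, const PerThreadAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const PerThreadAllocator<T>&, const PerThreadAllocator<U>&) { return false; }

// Event-rate work on short-lived containers; returns this thread's reuse count.
long build_events(int events, int fragments) {
    for (int e = 0; e < events; ++e) {
        std::list<int, PerThreadAllocator<int>> frags;
        for (int f = 0; f < fragments; ++f) frags.push_back(f);
        assert(frags.size() == static_cast<std::size_t>(fragments));
    }
    return PerThreadPool::reused();
}

// Runs the workload on several threads and sums their reuse counts.
long run_threads(int n_threads, int events, int fragments) {
    std::vector<long> reused(n_threads, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&reused, t, events, fragments] {
            reused[t] = build_events(events, fragments);
        });
    for (auto& w : workers) w.join();
    long total = 0;
    for (long r : reused) total += r;
    return total;
}
```

After the first event, every list node in a thread is served from that thread's own cache, which is the "no lock, no blocking" property the slide describes.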

11 SFI history
Date | Change | EB | EB + output to EF
30 Oct '02 | First integration on testbed | 0.5 MB/s | -
13 Nov | Sending data requests at a regular pace | 8.0 MB/s | -
14 Nov | Reduce the number of threads | 15 MB/s | -
20 Nov | Switch off hyper-threading | 17 MB/s | -
21 Nov | Introduce credit-based traffic shaping | 28 MB/s | -
13 Dec | First try on throughput | - | 14 MB/s
17 Jan | Chose pthread allocator for STL objects | 53 MB/s | 18 MB/s
29 Jan | DC buffer recycling when sending | 56 MB/s | 19 MB/s
05 Feb | IOVec storage type in the EFormat library | 58 MB/s | 46 MB/s
21 Feb | Buffer pool per thread | 64 MB/s | 48 MB/s
21 Feb | Grouping interthread communication | 73 MB/s | 51 MB/s
26 Feb | Avoiding one system call per message | 80 MB/s | 55 MB/s
Most improvements (and most problems) are related to threads.

12 Lessons from SFI
– Traffic shaping (limiting the number of outstanding requests for data) eliminates packet loss.
– Grouping interthread communication decreases the frequency of thread activation.
– Some improvements came in more predictable areas: avoiding copies and system calls, avoiding object creation by recycling buffers, and avoiding contention by giving each thread its own buffers.
– Optimizations were driven by measurements with full functionality.
– Effective development: the developer works on a good testbed and tests and optimizes in a short cycle.
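Credit-based traffic shaping can be sketched as a counter of outstanding data requests: a request is sent only while a credit is free, and each reply (or timeout) returns one. A minimal single-threaded model: `CreditShaper` is hypothetical, the credit limit is a free parameter rather than the SFI's actual setting, and "sending" a request just moves it from the backlog to the in-flight count.

```cpp
#include <cassert>
#include <deque>

// Limits the number of data requests in flight; excess requests wait in a
// backlog instead of being sent at once.
class CreditShaper {
public:
    explicit CreditShaper(int limit) : limit_(limit) { assert(limit > 0); }

    // Queue a data request for ROS `ros_id`; returns how many were sent now.
    int request(int ros_id) {
        backlog_.push_back(ros_id);
        return drain();
    }

    // A reply (or timeout) frees one credit, which may release a waiting request.
    int on_reply() {
        assert(in_flight_ > 0);
        --in_flight_;
        return drain();
    }

    int in_flight() const { return in_flight_; }
    int waiting() const { return static_cast<int>(backlog_.size()); }
    int max_in_flight() const { return max_in_flight_; }

private:
    // Sends queued requests while credits remain (credits = limit - in flight).
    int drain() {
        int sent = 0;
        while (in_flight_ < limit_ && !backlog_.empty()) {
            backlog_.pop_front();          // a real SFI would send the request here
            ++in_flight_;
            ++sent;
        }
        if (in_flight_ > max_in_flight_) max_in_flight_ = in_flight_;
        return sent;
    }

    int limit_;
    int in_flight_ = 0;
    int max_in_flight_ = 0;
    std::deque<int> backlog_;              // ROS IDs of requests not yet sent
};
```

With a limit of 4 and 10 queued requests, at most 4 are ever in flight and each reply releases the next one from the backlog; this bounds how many replies can arrive at the SFI at once, which is the mechanism behind the reduced packet loss.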

13 Performance of the SFI
[Plot: EB-only throughput vs. #ROLs/ROS on a 2.4 GHz CPU, with a CPU-limited region and the I/O limit at 95 MB/s.]
– Reaching the I/O limit at 95 MB/s; otherwise CPU limited.
– 35% performance gain with at least 8 ROLs/ROS.
– With a faster CPU, will approach the I/O limit for 1 ROL/ROS.

14 Readout System (ROS) – role
[Diagram: LVL2 or EB data requests reach the ROS I/O Manager, which requests data from the ROBins (~12 buffers for data) and returns it; a ROS controller handles control.]
RoI collection and partial event building. Not exactly like the SFI:
Quantity | ROS | SFI
Request rate | 24 kHz (LVL2), 3 kHz (EB) | 50 Hz
Data per request | 2 kB (LVL2), 8 kB (EB) | 1.5 MB
Data rate | 72 MB/s | 75 MB/s
All numbers are approximate.
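The data rates in the table follow from request rate times data per request (approximate numbers, 1 MB = 1000 kB):

```latex
\begin{aligned}
R_{\mathrm{ROS}} &= 24\ \mathrm{kHz} \times 2\ \mathrm{kB} + 3\ \mathrm{kHz} \times 8\ \mathrm{kB}
                  = 48\ \mathrm{MB/s} + 24\ \mathrm{MB/s} = 72\ \mathrm{MB/s},\\
R_{\mathrm{SFI}} &= 50\ \mathrm{Hz} \times 1.5\ \mathrm{MB} = 75\ \mathrm{MB/s}.
\end{aligned}
```

The two applications move similar data volumes, but the ROS handles about 27 kHz of requests against the SFI's 50 Hz, roughly 540 times as many.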

15 IOManager in ROS
[Diagram (legend: thread, process, Linux scheduler): requests from the trigger (L2, EB, Delete) enter a request queue; request handler threads serve them by accessing the RobIns; control and error handling are separate.]
The number of request handlers is configurable.
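The IOManager structure can be sketched as one request queue served by a configurable pool of handler threads. Hypothetical names throughout (`RequestType`, `IOManagerModel`); the handlers here only count what they serve, whereas the real request handlers access the RobIns, and the control/error path is omitted. A condition variable wakes one idle handler per submitted request.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

enum class RequestType { L2, EB, Delete };

// Model of the IOManager: requests go into one queue and `n_handlers`
// (configurable) request-handler threads take them out and serve them.
class IOManagerModel {
public:
    explicit IOManagerModel(int n_handlers) {
        assert(n_handlers > 0);
        for (int i = 0; i < n_handlers; ++i)
            handlers_.emplace_back([this] { handle_loop(); });
    }
    ~IOManagerModel() { stop(); }

    void submit(RequestType t) {                   // trigger side
        { std::lock_guard<std::mutex> lk(m_); queue_.push_back(t); }
        cv_.notify_one();
    }

    void stop() {                                  // handlers drain the queue, then exit
        { std::lock_guard<std::mutex> lk(m_); stopping_ = true; }
        cv_.notify_all();
        for (auto& h : handlers_)
            if (h.joinable()) h.join();
    }

    int served(RequestType t) const { return served_[static_cast<int>(t)].load(); }

private:
    void handle_loop() {
        for (;;) {
            RequestType t;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !queue_.empty() || stopping_; });
                if (queue_.empty()) return;        // stopping and nothing left
                t = queue_.front();
                queue_.pop_front();
            }
            // A real handler would read from the RobIns (L2, EB) or
            // release buffers (Delete); this model only counts.
            ++served_[static_cast<int>(t)];
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<RequestType> queue_;
    bool stopping_ = false;
    std::array<std::atomic<int>, 3> served_{};
    std::vector<std::thread> handlers_;            // declared last: started after the rest
};
```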

16 Thread scheduling problem
– The system runs without interrupts: poll and yield.
– The standard Linux scheduler puts a yielding thread away until the next time slice: up to 10 ms.
– The solution is to change the scheduling in the kernel:
  – for 2.4.9 kernels there is an unofficial patch (tested on CERN RH7.2);
  – for CERN RH7.3 there is a CERN-certified patch, linux_2.4.18_18_sched.yield.patch.
– 20 µs latency for getting data.
– This is an evolving field: thread-related changes in Linux kernels need to be evaluated continuously.
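The poll-and-yield pattern can be sketched as below. `SimulatedRobIn` is a hypothetical stand-in for an interrupt-less device, modelled as an atomic flag that another thread sets after a simulated latency; `std::this_thread::yield()` plays the role of the yield call (on Linux it is typically implemented with `sched_yield()`). The sketch does not reproduce the 2.4-kernel behaviour; it only shows where the scheduler enters the loop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a RobIn that raises no interrupt: the request
// handler has to poll a "data ready" flag.
struct SimulatedRobIn {
    std::atomic<bool> ready{false};
};

// Poll and yield: while no data is ready, give up the CPU so other handlers
// can run. With the stock 2.4 scheduler described above, a yielding thread
// could be put away until the next time slice (up to 10 ms), so the measured
// wait could be far longer than the device latency itself.
long poll_and_yield(const SimulatedRobIn& robin, std::chrono::microseconds& waited) {
    auto start = std::chrono::steady_clock::now();
    long polls = 0;
    while (!robin.ready.load(std::memory_order_acquire)) {
        ++polls;
        std::this_thread::yield();
    }
    waited = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    return polls;
}

// Simulates a device that becomes ready after `latency` and measures how
// long a polling handler actually waited.
std::chrono::microseconds measure(std::chrono::microseconds latency) {
    SimulatedRobIn robin;
    std::thread device([&robin, latency] {
        std::this_thread::sleep_for(latency);
        robin.ready.store(true, std::memory_order_release);
    });
    std::chrono::microseconds waited{0};
    poll_and_yield(robin, waited);
    device.join();
    assert(robin.ready.load());
    return waited;
}
```

Comparing `measure()` against the simulated latency on a given kernel gives a rough idea of how much the scheduler adds to each yield.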

17 Conclusions
– The DataFlow of the ATLAS DAQ consists of a set of applications managing the flow of data.
– All prototypes exist, have been optimized, are used for performance measurements and are prepared for the beam test.
– Standard technology (Gigabit Ethernet, PCs, standard Linux, multi-threaded C++ compiled with gcc) meets the ATLAS requirements.
– A few lessons were learned, notably about STL container allocators and thread scheduling.

18 Backup slides

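19 Data Flow Manager (DFM) – role
[Diagram: the DFM, a DataFlow application with an interface (I/F) to the Online Software, receives LVL2 accepts and rejects from the L2SVs, assigns events to the SFIs, receives End of Event (EoE) from them and sends clears to the ROSs; the SFIs send data requests to the ROSs and pass events on to the EF, with SFOs writing disk files for mass storage. Indicative multiplicities shown: 100x, 200x, 16x, 30x, 1x. Multiplicities are indicative only.]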

20 DFM design
– The bulk of the work is done in the I/O thread.
– A cleanup thread identifies timed-out events.
– Fully embedded in the DC framework.
Threads allow independent and parallel processing within an application.
[Diagram: the DFM I/O thread (load balancing, bookkeeping) receives L2 decisions from the L2SV and End of Event from the SFIs, and sends event assigns to the SFIs and clears to the ROSs; the cleanup thread handles timeouts. I/O rate ~4 kHz.]
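The split between the I/O thread (load balancing, bookkeeping) and the cleanup thread (timeouts) can be sketched around one mutex-protected table that both threads call into. Everything here is hypothetical: `DfmBookkeeper`, the least-outstanding-events load-balancing policy (the slide does not say which policy the DFM uses) and the timeout value are assumptions. Time is passed in explicitly so the bookkeeping can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <vector>

using Clock = std::chrono::steady_clock;

// Shared bookkeeping: the I/O thread calls assign() and end_of_event(),
// the cleanup thread calls timed_out(), hence the mutex.
class DfmBookkeeper {
public:
    explicit DfmBookkeeper(int n_sfis) : load_(n_sfis, 0) { assert(n_sfis > 0); }

    // I/O thread, on an LVL2 accept: pick the SFI with the fewest events in
    // flight (assumed policy) and record the assignment.
    int assign(unsigned event_id, Clock::time_point now) {
        std::lock_guard<std::mutex> lk(m_);
        int best = 0;
        for (int s = 1; s < static_cast<int>(load_.size()); ++s)
            if (load_[s] < load_[best]) best = s;
        ++load_[best];
        events_[event_id] = Entry{best, now};
        return best;
    }

    // I/O thread, on End of Event from an SFI: the event is done
    // (a real DFM would now send clears to the ROSs).
    bool end_of_event(unsigned event_id) {
        std::lock_guard<std::mutex> lk(m_);
        auto it = events_.find(event_id);
        if (it == events_.end()) return false;
        --load_[it->second.sfi];
        events_.erase(it);
        return true;
    }

    // Cleanup thread: remove and return events assigned longer than `timeout` ago.
    std::vector<unsigned> timed_out(Clock::time_point now, Clock::duration timeout) {
        std::lock_guard<std::mutex> lk(m_);
        std::vector<unsigned> expired;
        for (auto it = events_.begin(); it != events_.end();) {
            if (now - it->second.assigned > timeout) {
                expired.push_back(it->first);
                --load_[it->second.sfi];
                it = events_.erase(it);
            } else {
                ++it;
            }
        }
        return expired;
    }

    int outstanding(int sfi) {
        std::lock_guard<std::mutex> lk(m_);
        return load_[sfi];
    }

private:
    struct Entry { int sfi; Clock::time_point assigned; };
    std::mutex m_;
    std::vector<int> load_;                  // events in flight per SFI
    std::map<unsigned, Entry> events_;       // event ID -> assignment
};
```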

21 STL containers (3)

22 SFI performance
– Input up to 95 MB/s (~3/4 of the 1 Gb line).
– Input and output at 55 MB/s (~1/2 line speed).
With all the logic of event building and all the objects involved, the performance is already close to the network limit (on a 2.4 GHz PC).
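The line-speed fractions follow from 1 Gb/s = 1000 Mb/s ÷ 8 = 125 MB/s, ignoring Ethernet framing overhead (the rates here are in megabytes per second, as on the SFI performance slide):

```latex
\frac{95\ \mathrm{MB/s}}{125\ \mathrm{MB/s}} = 0.76 \approx \tfrac{3}{4},
\qquad
\frac{55\ \mathrm{MB/s}}{125\ \mathrm{MB/s}} = 0.44 \approx \tfrac{1}{2}
```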

23 Performance of Event Building
[Plot: EB rate vs. N SFIs, with 1 DFM and hardware emulators of the ROS.]
Maximum EB rate with 8 SFIs: ~350 Hz (17% of the ATLAS EB rate).

24 After the patch
[Plot: L2 request rate (kHz) vs. # request handlers for simulated I/O latencies of 2, 5, 10, 20, 50, 100 and 1000 µs. Xeon 2 GHz, Linux 2.4.18 + CERN scheduling patch; 100% L2 requests, 1 ROL per L2 request, release grouping = 100.]

25 Flow of messages
[Sequence diagram between RoIB, L2SV, L2PU (1..n), ROS/ROB (1..i), pROS, DFM, SFI (1..n) and EF, with messages:
1a: L2SV_LVL1Result, 1b: L2PU_LVL2Decision;
2a: L2PU_DataRequest, 2b: ROS/ROB_Fragment;
3a: L2PU_LVL2Result, 3b: pROS_Ack;
4a: L2SV_LVL2Decision, 4b: DFM_Ack;
5a: DFM_Decision, 5a': DFM_SFIAssign, 5b: SFI_EoE;
6a: SFI_DataRequest, 6b: ROS/ROB_EventFragment;
7: DFM_Clear;
plus DFM_FlowControl and SFI_FlowControl. Annotations: wait for LVL2 decision or time out; receive or time out; sequential processing or time out; build event; wait for EoE, reassign or time out the event.]
Note: 6a: SFI_DataRequest, associated with 5a: DFM_Decision, is used for error recovery.

26 Deployment view
[Diagram: RODs feeding the RO{B,S}; LVL2 switch and LVL2 processors; RoIB, LVL2 supervisors and SV switch; EB switch; DFMs and DFM switch; SFIs, sub-farm switches and EF switch leading to local EF farms and to a remote EF farm.]

