
1 DAQ System Realization. DAQ Data Flow Review, Sep. 11-12th, 2001. Niko Neufeld, CERN EP

2 Overview
- FEM/RU complex: how many?
- Readout network: how big, and which components?
- Level 2/3 farm & Sub-farm Controllers: how fast, and how many?
- Conclusions

3 Acronyms
- Readout Unit: RU
- Readout Network: RN
- Level 1: L1
- Front-end Multiplexer: FEM
- Gigabit Ethernet: GigE
- More non-sense acronyms: MNSA
- Sub-farm Controller: SFC
- Network Processor: NP
- SpecInt95 (SI95): benchmark obtained from a standard test-suite of applications, normalised to the performance of a SPARCstation 10/40 (40 MHz SuperSPARC); that machine takes 48 h to run the suite

4 DAQ Architecture

5 Basic Parameters & Features: Recap
- L1 trigger rate 40 kHz, upgradeable to 100 kHz
- Total raw data rate from all L1 boards approximately 4 GB/s
- Asynchronous push-through data flow
- Strict separation between control and data paths
- Overflow avoidance via central throttling

6 FEM/RU Complex (LHCb dataflow diagram: detector front-ends and L0/L1 triggers at 40 MHz / 1 MHz / 40-100 kHz, Front-End Multiplexers (FEM) and Readout Units (RU) feeding the Readout Network at 6-15 GB/s, Sub-Farm Controllers (SFC) and the Level 2 & 3 event filter with variable latencies of ~10 ms (L2) and ~200 ms (L3), storage at 50 MB/s, plus timing & fast control, control & monitoring and throttle paths)

7 The FEM/RU complex
- Transports data from the L1 links to the Readout Network
- Performs some multiplexing
- Assigns destinations (SFCs)
- Is segmented according to the partitioning scheme of LHCb
- All numbers shown in this section are for a system that puts 40 MB/s onto the output of an RU
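The slides state that the FEM/RU complex assigns each event an SFC destination but do not spell out the algorithm. Below is a minimal sketch of one plausible scheme, static round-robin over the event number; the SFC count and the function name are hypothetical.

```python
# Hypothetical sketch: how an RU (or FEM/RU module) could assign an SFC
# destination to each event without a central arbiter. The talk does not
# specify the algorithm; static round-robin over the event number is the
# simplest scheme that keeps all RUs of one event pointing at the same SFC.

N_SFC = 100  # assumed number of Sub-farm Controllers (order of magnitude from the talk)

def sfc_for_event(event_number: int) -> int:
    """Every RU computes the same destination from the event number alone."""
    return event_number % N_SFC

# Example: fragments of event 4711 from all RUs converge on the same SFC.
assert sfc_for_event(4711) == 4711 % N_SFC
```

Because every RU derives the destination from the event number alone, all fragments of one event converge on the same SFC without any coordination traffic.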

8 "Generic" sub-detector: from L1 links to the FEM/RU (diagram: 1…108 L1 board output links feed 0…25 FEMs, each FEM taking up to 7 input links; 373 links from the L1 front-end in total)

9 "Generic" sub-detector: from FEM/L1 to the Readout Unit (diagram: 1…60 L1/FEM output links feed 1…30 RUs, each RU taking 1…7 inputs and driving one GigE link to the RN; ~95 links to the RN in total)

10 Sub-Detector FEM/RU System
A "generic sub-detector", from the dataflow system's point of view, is one of: VELO, IT, OT, RICH1, RICH2, SPD/PS, ECAL, HCAL, MUON, L0-Trigger, L1-Trigger, Readout Supervisor.
A "generic sub-detector" has:
- 1…108 output links from Level 1
- 0…25 Front-end Multiplexers, with a multiplexing factor between 2 and 4
- 1…30 Readout Units, with a multiplexing factor between 1 and 7

11 Building the FEM/RU System
Baseline for the building block is a Network Processor based module with 4 or 8 Gigabit Ethernet ports (each module consists of 1 or 2 mezzanine cards, each with 1 NP and 4 GigE ports, plus 1 carrier board). The multiplexing factor is thus between 1 and 7.
System design proceeds by (a simplified sizing sketch follows below):
1. fixing the output bandwidth from an RU
2. optimising the number of 4-port carrier cards
3. taking into account partitioning (two partitions must not share an RU)
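As a rough illustration of steps 1-3 (not the actual design tool behind slides 12 and 13), the sketch below sizes the number of RUs for one hypothetical partition from its link count, an assumed per-link rate, the 40 MB/s output cap and the maximum multiplexing factor of 7.

```python
import math

# Simplified sizing sketch for one sub-detector partition: choose the number
# of RUs so that (a) no RU multiplexes more than 7 input links and (b) the
# average RU output stays below the chosen bandwidth cap. Input rates here
# are illustrative assumptions, not numbers from the talk.

MAX_MUX = 7           # max inputs per NP-based RU module (slides 10/11)
RU_CAP_MB_S = 40.0    # chosen output bandwidth per RU link

def readout_units_needed(n_links: int, mb_s_per_link: float) -> int:
    by_mux = math.ceil(n_links / MAX_MUX)
    by_bandwidth = math.ceil(n_links * mb_s_per_link / RU_CAP_MB_S)
    return max(by_mux, by_bandwidth)

# Hypothetical partition: 24 L1/FEM output links at 7 MB/s each
print(readout_units_needed(24, 7.0))   # -> 5 RUs (bandwidth-limited)
```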

12 FEM/RU System for 40 MB/s Output Bandwidth

13 FEM/RU System for 60 MB/s Output Bandwidth

14 RU/FEM System Summary
- The 373 L1 boards give a total average data rate of 2835 MB/s at 40 kHz (average event size 71 kB)
- Fixing the average output bandwidth on the RU link to 40 (60) MB/s results in 95 (67) output links to the Readout Network
- This number takes into account:
  - partitioning at the level of sub-detectors
  - multiplexing factors up to 7
  - minimisation of NP-carrying mezzanine cards, i.e. cost (228/183 in total)
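A back-of-envelope check of the figures quoted on this slide (all inputs are the slide's own numbers):

```python
# Consistency check of the quoted numbers.
total_rate_mb_s = 2835.0     # from all 373 L1 boards
trigger_rate_hz = 40e3       # L1 accept rate

event_size_kb = total_rate_mb_s * 1000 / trigger_rate_hz   # ~71 kB
min_links_40 = total_rate_mb_s / 40.0                       # ~71 links, pure bandwidth limit
# The quoted 95 (67) links exceed this lower bound because partitioning and
# the integer multiplexing factors (<= 7) prevent perfect packing.
print(round(event_size_kb), round(min_links_40))            # -> 71 71
```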

15 Readout Network (diagram: the RUs feed the Readout Network at 6-15 GB/s; Sub-Farm Controllers (SFC) pass events to the Level 2 & 3 event filter with variable latencies of ~10 ms (L2) and ~200 ms (L3); storage at 50 MB/s; control & monitoring and throttle paths shown)

16 The Readout Network
- Must connect ~95 RUs to ~100 SFCs
- Consists of point-to-point GigE links
- Uses a custom light-weight connection-less protocol on top of raw Ethernet frames
- Is asynchronous and relies on back-pressure (via flow control) to avoid buffer overflows
- Must be able to perform non-blocking switching at least up to O(10) GB/s
- Is, from the RU/SFC point of view, just a ~100 x 100 port Gigabit Ethernet switch
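The custom light-weight protocol is only characterised here as connection-less and carried in raw Ethernet frames; its actual layout is not given. The sketch below is purely illustrative: it packs a made-up fragment header (event number, source RU, payload length) behind an assumed local-experimental EtherType.

```python
import struct

# Hypothetical illustration only: the talk specifies a light-weight,
# connection-less protocol in raw Ethernet frames, but not its layout.
# This sketch prepends a minimal fragment header to the data.

ETHERTYPE_LHCB_DAQ = 0x88B5          # local-experimental EtherType, assumed
HDR = struct.Struct("!I H H")        # event number, source RU id, payload length

def build_frame(dst_mac: bytes, src_mac: bytes,
                event_no: int, ru_id: int, payload: bytes) -> bytes:
    header = HDR.pack(event_no, ru_id, len(payload))
    return dst_mac + src_mac + struct.pack("!H", ETHERTYPE_LHCB_DAQ) + header + payload
```

A connection-less header of this kind is all the receiving SFC needs to sort fragments by event; no per-link state or acknowledgements are kept, matching the push-through, back-pressure-based design.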

17 Building a large GigE Switch
Options:
- Monolithic switches of this size are still not very common and are very expensive (but they do exist, e.g. from ALCATEL and CISCO)
- Medium-size commercial switch (e.g. Foundry FastIron) with 120 GigE ports
- Small switches like our standard NP-based module (8 GigE ports)
- Possibly future custom modules based on next-generation NPs (up to 20 ports)
Any building block has to fulfil the basic requirements: non-blocking, flow control, full line speed.
- For our NP-based module we know that it complies
- For commercial switches this must be / has been tested
If the requirements are met, the only criterion is the cost per usable port.

18 Topology of the Switching Network
- All numbers in the following are based on a Banyan network topology (= fully connected, equal-sized layers), assuming a maximum load of 40 MB/s on each output from an RU
- We have seen (J-P. Dufey's presentation) that one can do better by taking into account the uni-directional data flow
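For orientation, the sketch below estimates the number of elementary k x k switching elements in an idealised Banyan fabric (ceil(log_k N) stages of N/k elements each); the counts quoted on the following slides come from the actual design study and differ slightly.

```python
import math

# Idealised element count for an N x N Banyan-like fabric built from
# k x k switching elements: ceil(log_k N) stages of ceil(N/k) elements.
def banyan_elements(n_ports: int, k: int) -> int:
    stages = math.ceil(math.log(n_ports, k))
    per_stage = math.ceil(n_ports / k)
    return stages * per_stage

# 96 x 96 fabric from 4 x 4 NP elements: 4 stages x 24 elements = 96,
# of the same order as the 92 modules quoted on slide 21.
print(banyan_elements(96, 4))
```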

19 Evolution of relative costs

20 Number of elementary switching elements needed to go from 4 to 12 GB/s (chart; annotations: 200 modules, 4 switches)

21 Cost per usable port
For a Banyan 96 x 96 port system at 40 MB/s load:
- using Foundry BigIron: 4 switches needed → 120 x 120 usable ports: 2200 $ (copper) or 2500 $ (SX fibre) per port (list price)
- using the 4 x 4 NP-based module: 92 modules needed → 92 x 92 usable ports: ~2400 $ (SX fibre) per port (estimate)
Note however:
- BigIron does not seem to fulfil our requirements (flow control!)
- the NP-based system can do the final event building (see later)

22 Readout Network Summary
- Fairly large ~100 x 100 Gigabit Ethernet switch, most likely not monolithic
- Need non-blocking, line-speed switching, flow control and reasonably large buffers
- Optical connectors very much preferred (but price!)
- NP-based modules fulfil all requirements
- An optimised topology saves switch ports
- The ultimate decision will evidently be based on price per usable port (provided the other requirements are met)

23 L2/L3 Farm (LHCb dataflow diagram as on slide 6, here highlighting the Trigger Level 2 & 3 event filter: Sub-Farm Controllers (SFC), CPU farm and storage at 50 MB/s, with variable latencies of ~10 ms (L2) and ~200 ms (L3))

24 Event building and Level 2 & 3 Farm
- Data from the RN are delivered to a specific part of the Level 2 & 3 farm
- The entry point towards the RN is the Sub-farm Controller (SFC); RUs know only about SFCs
- The SFC is also the gateway to the Storage Controller(s) (SC)
- Immediately before or after an event enters the SFC, the final event building must be performed
- Load on the farm must be balanced (a sketch of one possible policy follows below)
- The farm nodes must be controlled
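The slide requires load balancing but does not prescribe a policy. A minimal sketch of one plausible SFC strategy, dispatching each built event to the node with the fewest events in flight, is given below; the class and node names are invented for illustration.

```python
# Plausible SFC load-balancing policy (not from the talk): send each fully
# built event to the sub-farm node with the fewest events currently in flight.

class SubfarmController:
    def __init__(self, node_names):
        self.in_flight = {name: 0 for name in node_names}   # events being processed

    def dispatch(self, event) -> str:
        node = min(self.in_flight, key=self.in_flight.get)  # least-loaded node
        self.in_flight[node] += 1
        # ... send event to node over the sub-farm network ...
        return node

    def done(self, node: str):
        self.in_flight[node] -= 1                            # node reported a result

sfc = SubfarmController([f"node{i:02d}" for i in range(20)])  # ~20 nodes per sub-farm
print(sfc.dispatch({"event_no": 1}))                          # -> "node00"
```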

25 Bird's-eye view of a sub-farm (diagram showing: the Readout Network main switch, a Sub-farm Controller, a sub-farm aggregation switch serving 10-20 sub-farm nodes, a Controls PC with a controls-network aggregation switch, the Storage Controller, and up-links to CERN and to the Controls Network)

26 Anatomy of an SFC (diagram: a "server-like" PC with CPU, memory, a local bus and local PCI bridges at 66 MHz/64-bit and 33 MHz/32-bit; one GigE NIC towards the Readout Network (~60 MB/s), one GigE NIC towards the sub-farm network (~60 MB/s) and a 100BaseT NIC towards the Controls Network (~0.5 MB/s); annotations: one of the GigE NICs could do the final event building, and a large buffer is needed for load balancing, though this is not critical if event building is done elsewhere; a server like this can be bought today for ~5 kCHF)

27 Subfarm Node
- is disk-less and network-booted
- needs 2 network interfaces, for controls and for data
- needs a remote reset facility
- needs lots of memory and CPU power
- must be "cheap" in terms of:
  - price per MIPS
  - floor-space
  - cooling, power, maintenance
- possible physical realizations include:
  - rack-mounted (1U) servers
  - standard boxes, "pizza-boxes"
  - "naked motherboards" on a carrier board (crate-based)
  - micro-server blades
  - etc.

28 Moore's Law (chart; annotations: 608 SpecInt2000, roughly ~60 SpecInt95; a complete system (standard box) costs ~2400 CHF today!)

29 Further Components of the L2/L3 Farm
- Storage Controller: one or more multi-TByte disk servers with a connection to the high-bandwidth link to the permanent storage facility
- Controls and sub-farm aggregation switches: edge switches with typically two 1000BaseT up-links and ~20 100BaseT links (these are already almost commodity items)
- Controls PC: server PC to control an entire sub-farm; will run the standard ECS/SCADA system (if needed for performance reasons, several Controls PCs can share the control of a sub-farm)

30 Final Event Building
Concatenation of the fragments from the RUs into one event can be done:
- using the SFC CPU (sorting & memory copy)
- using "smart", i.e. programmable, NICs (event building done during DMA)
- using a final stage of NP-based modules as 4-to-4 event builders
(Diagram: NP-based 8-port module, SFC, CPU)
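As an illustration of the first option (event building in the SFC CPU), the sketch below collects fragments by event number and concatenates them once every RU has reported; the fragment bookkeeping and the RU count are placeholders, not the real protocol.

```python
# Sketch of option 1 above: collect fragments per event number and
# concatenate them once all RUs have contributed.

from collections import defaultdict

N_RU = 95   # expected number of fragments per event (one per RU output link)

pending = defaultdict(dict)          # event_no -> {ru_id: fragment bytes}

def add_fragment(event_no: int, ru_id: int, data: bytes):
    pending[event_no][ru_id] = data
    if len(pending[event_no]) == N_RU:               # all fragments arrived
        fragments = pending.pop(event_no)
        # concatenate in RU order: this is the "sorting & memory copy" step
        return b"".join(fragments[r] for r in sorted(fragments))
    return None                                       # event still incomplete
```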

31 Size of L2/L3 farm
- Assuming 10000 SI95 for L2, 25000 SI95 for L3 and 50000 SI95 for reconstruction results in ~850 SI95 per sub-farm
- Assuming 55 SI95 per farm node → 20 nodes per sub-farm (including a comfortable safety margin)
- → 100 SFCs (500 kCHF), 100 edge switches (300 kCHF), 2000 farm nodes (4000 kCHF) = 4.8 MCHF total cost of the farm (today!)
- These numbers are approximate, and the demand for CPU will perhaps be higher, but a high-performing farm could be built today at reasonable cost
(The arithmetic is reproduced in the sketch below.)
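The slide's arithmetic can be reproduced directly (all inputs are the slide's own assumptions):

```python
# Reproducing the farm-sizing arithmetic quoted on this slide.
total_si95 = 10_000 + 25_000 + 50_000   # L2 + L3 + reconstruction
n_subfarms = 100
si95_per_subfarm = total_si95 / n_subfarms          # 850 SI95
nodes_per_subfarm = si95_per_subfarm / 55           # ~15.5 -> 20 with safety margin

cost_kchf = 100 * 5 + 100 * 3 + 2000 * 2            # SFCs + edge switches + farm nodes
print(si95_per_subfarm, round(nodes_per_subfarm, 1), cost_kchf / 1000, "MCHF")
# -> 850.0 15.5 4.8 MCHF
```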

32 L2/L3 Farm Summary
- The L2/L3 farm is composed of sub-farms
- It maintains the separation between control and data networks
- It consists of ~100 SFCs, strong in I/O, ~O(2000) nodes, strong in CPU and memory, and an aggregation switch per sub-farm
- It is scalable, hierarchically organised and uniform, hence easy to configure, control and monitor

33 Conclusions (1)
The data flow system is based on Gigabit Ethernet, most likely over cheap multi-mode fibres (1000BaseSX). It consists of 3 main parts:
1. the FEM/RU complex, which consists of NP-based modules and multiplexes several L1 links onto one output link from a Readout Unit
2. a Gigabit Ethernet switch fabric, most likely composed of several smaller sub-units
3. a large compute farm, decomposed into sub-farms, load-balanced by Sub-farm Controllers

34 Conclusions (2)
- The FEM/RU complex will be built of NP-based modules, whose performance has been established to be largely sufficient
- The main switching network will be built in an optimised topology, using either commercial switches or NP-based modules, depending on cost and performance
- The sub-farms will be implemented with server PCs as Sub-farm Controllers and PC-like farm nodes, connected by moderate-sized edge switches
- All these components exist at reasonable prices already today

