
1 Management of the LHCb DAQ Network
Guoming Liu*†, Niko Neufeld*
* CERN, Switzerland  † University of Ferrara, Italy

2 Outline
- Introduction to the LHCb DAQ system
- Network monitoring based on a SCADA system
- Network configuration
- Network debugging
- Status of the LHCb network installation and deployment

3 LHCb online system
The LHCb Online system consists of three major components:
- Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage
- Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector, between the front-end electronics and the online processing farm
- Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger farm, the Detector Control System, the experiment's infrastructure, etc.

4 LHCb online system

5 LHCb online network
Two large-scale Ethernet networks:
- DAQ network: dedicated to data acquisition
- Control network: for the instruments and computers of the LHCb experiment
In total: ~170 switches, ~9000 ports

6 LHCb DAQ network
- The DAQ works in a push mode
- Components:
  - Readout boards (TELL1/UKL1): ~330 in total
  - Aggregation switches
  - Core DAQ switch: Force10 E1200i, supporting up to 1260 GbE ports with a switching capacity of 3.5 Tb/s
  - Edge switches: 50
[Diagram: ~330 readout boards, aggregation switches, core switch, 50 edge switches, HLT CPUs, storage aggregation, CASTOR]

7 LHCb DAQ network
[Diagram as on the previous slide: readout boards, aggregation switches, core switch, edge switches, HLT CPUs, storage, CASTOR]
- Protocols:
  - Readout: MEP, a light-weight datagram protocol over IP
  - Storage: standard TCP/IP
- Network throughput:
  - Readout: ~35 GByte/s (~280 Gb/s), from an L0 trigger accept rate of 1 MHz and an average event size of ~35 kByte
  - Storage: ~70 MByte/s (~560 Mb/s), from an HLT accept rate of ~2 kHz
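The readout and storage figures above follow directly from the trigger rates and the average event size. A quick cross-check in Python, using only the numbers quoted on the slide:

```python
# Cross-check of the DAQ throughput figures quoted above.
l0_rate_hz = 1_000_000       # L0 trigger accept rate: 1 MHz
hlt_rate_hz = 2_000          # HLT accept rate: ~2 kHz
event_size_bytes = 35_000    # average event size: ~35 kByte

readout = l0_rate_hz * event_size_bytes   # bytes per second into the HLT farm
storage = hlt_rate_hz * event_size_bytes  # bytes per second into storage

print(f"Readout: {readout / 1e9:.0f} GByte/s = {readout * 8 / 1e9:.0f} Gb/s")  # 35 GByte/s = 280 Gb/s
print(f"Storage: {storage / 1e6:.0f} MByte/s = {storage * 8 / 1e6:.0f} Mb/s")  # 70 MByte/s = 560 Mb/s
```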

8 Network Monitoring
- Part of the LHCb ECS:
  - uses the same tools and framework
  - provides the same operation interface
- Implementation:
  - monitoring and integration: PVSS and the JCOP framework
  - data collection: various front-end processors
  - data exchange: Distributed Information Management (DIM)

9 Network Monitoring
- Monitoring the status of the LHCb DAQ network at different levels:
  - topology
  - IP routing
  - traffic
  - hardware/system
[Figure: Architecture of the network monitoring]

10 Network Monitoring
- Monitoring the status of the LHCb DAQ network at different levels:
  - topology
  - IP routing
  - traffic
  - hardware/system
[Figure: Structure of the finite state machine for network monitoring]

11 Network Monitoring: Topology
- The topology is quite “static”
- NeDi: an open-source tool for discovering the network
  - Discovers the network topology based on the Link Layer Discovery Protocol (LLDP): it queries the neighbors of a seed device, then the neighbors of those neighbors, and so on, until all devices in the network have been discovered (see the sketch below)
  - Discovers the network nodes
- All information is stored in the database and can be queried by PVSS
- PVSS monitors only the topology (the uplinks between the switches); the nodes themselves are monitored by Nagios
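The neighbor-walking discovery performed by NeDi can be pictured as a breadth-first search over LLDP neighbor tables. The sketch below only illustrates that idea and is not NeDi code; get_lldp_neighbors is a hypothetical helper that would read a switch's LLDP MIB (e.g. via SNMP) and return the names of its directly connected devices.

```python
from collections import deque

def get_lldp_neighbors(device):
    """Hypothetical helper: query the LLDP MIB of `device` (e.g. via SNMP)
    and return the list of directly connected devices."""
    raise NotImplementedError

def discover_topology(seed):
    """Breadth-first discovery starting from a seed switch, in the spirit of NeDi."""
    links = []                  # discovered (device, neighbor) pairs, i.e. the uplinks
    visited = {seed}
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        for neighbor in get_lldp_neighbors(device):
            links.append((device, neighbor))
            if neighbor not in visited:   # follow each device only once
                visited.add(neighbor)
                queue.append(neighbor)
    return links
```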

12 Network Monitoring: IP routing
- The status of the routing is monitored with the Internet Control Message Protocol (ICMP), specifically “ping”
- Three stages of the DAQ are covered:
  - The read-out path from the readout boards to the HLT farm. ICMP is not fully implemented in the readout boards, so a general-purpose computer is inserted to stand in for a readout board: the status of the readout boards is tested with “arping”, the availability of the HLT nodes with “ping” (see the sketch below)
  - Selected events from the HLT to the LHCb online storage
  - From the online storage to CERN CASTOR
- A front-end script collects the results and exchanges messages with PVSS using DIM
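A minimal sketch of such a reachability check, calling the standard Linux ping and arping utilities from Python (the host names and the interface are placeholders; the actual front-end script reports its results to PVSS via DIM rather than printing them):

```python
import subprocess

def ping_ok(host, timeout_s=2):
    """Return True if `host` answers a single ICMP echo request."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def arping_ok(host, interface, timeout_s=2):
    """Return True if `host` answers an ARP request on `interface`.
    Used for the readout boards, which do not fully implement ICMP."""
    result = subprocess.run(["arping", "-c", "1", "-w", str(timeout_s), "-I", interface, host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

# Placeholder host lists; the real script would publish these results via DIM.
readout_boards = ["tell1-01", "tell1-02"]
hlt_nodes = ["hlt-node-001", "hlt-node-002"]

status = {h: arping_ok(h, "eth0") for h in readout_boards}
status.update({h: ping_ok(h) for h in hlt_nodes})
print(status)
```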

13 Network Monitoring: traffic
- Front-end processors:
  - collect all the interface counters from the network devices using SNMP: input and output traffic, input and output errors, discards
  - exchange the data as a DIM server
- PVSS:
  - receives the data via the PVSS-DIM bridge
  - analyzes the traffic and archives it
  - displays the current status and the trend of the bandwidth utilization
  - issues an alarm in case of error
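A minimal sketch of the counter-collection step, here using the net-snmp command-line tool snmpget from Python (community string, device name and interface index are placeholders; the real front-end processor publishes the values as a DIM service instead of printing them):

```python
import subprocess

# Standard IF-MIB counters: 64-bit octet counters plus error/discard counters.
OIDS = {
    "in_octets":   "IF-MIB::ifHCInOctets",
    "out_octets":  "IF-MIB::ifHCOutOctets",
    "in_errors":   "IF-MIB::ifInErrors",
    "out_errors":  "IF-MIB::ifOutErrors",
    "in_discards": "IF-MIB::ifInDiscards",
}

def read_counters(host, if_index, community="public"):
    """Read one interface's counters from a network device with snmpget (net-snmp)."""
    counters = {}
    for name, oid in OIDS.items():
        out = subprocess.check_output(
            ["snmpget", "-v2c", "-c", community, "-Ovq", host, f"{oid}.{if_index}"],
            text=True)
        counters[name] = int(out.strip())
    return counters

# Placeholder device and interface index; the real script loops over all
# devices and ports and forwards the values to PVSS through DIM.
print(read_counters("sw-daq-01", 1))
```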

14 Network Monitoring: traffic

15 Network Monitoring: hardware/system
- A syslog server is set up to receive and parse the syslog messages from the network devices. When a network device runs into a problem, an error message is generated and sent to the syslog server, as configured on the device
  - Hardware: temperature, fan status, power supply status
  - System: CPU, memory, login authentication, etc.
- Syslog can collect some information not covered by SNMP
- All collected messages are forwarded to PVSS
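A minimal sketch of such a syslog receiver, using only the Python standard library (UDP port 514 and the severity extraction from the <PRI> prefix follow the syslog convention; the real server also parses the device-specific message text and forwards the result to PVSS):

```python
import re
import socketserver

PRI_RE = re.compile(r"^<(\d+)>")   # syslog <PRI> prefix: PRI = facility * 8 + severity

class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _ = self.request                     # UDP: (payload, socket)
        message = data.decode("utf-8", errors="replace").strip()
        match = PRI_RE.match(message)
        severity = int(match.group(1)) % 8 if match else None
        # Severities 0-3 (emergency..error) would be escalated to PVSS as alarms.
        print(f"device={self.client_address[0]} severity={severity} msg={message}")

if __name__ == "__main__":
    # Standard syslog UDP port; binding to 514 requires sufficient privileges.
    with socketserver.UDPServer(("0.0.0.0", 514), SyslogHandler) as server:
        server.serve_forever()
```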

16 Network Configuration
- The LHCb online network system is quite large, with different devices running different operating systems and command sets
- Luckily it is also quite static: only a few features are essential for configuring the network devices
- Currently a set of Python scripts is used to configure the network devices, using the pexpect module for interactive CLI access (see the sketch below), covering:
  - initial setup of newly installed switches
  - firmware upgrades
  - configuration file backup and restore
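A minimal sketch of this kind of pexpect-driven CLI session, here backing up the running configuration of a switch over SSH. Host name, credentials, prompt patterns and CLI commands are placeholders and depend on the device type; the actual LHCb scripts are not shown in the presentation.

```python
import pexpect

def backup_config(host, user, password, outfile):
    """Log into a switch over SSH and save its running configuration to a file."""
    child = pexpect.spawn(f"ssh {user}@{host}", timeout=30)
    child.expect("[Pp]assword:")
    child.sendline(password)
    child.expect("#")                      # placeholder prompt pattern
    child.sendline("terminal length 0")    # disable paging (device-dependent command)
    child.expect("#")
    child.sendline("show running-config")
    child.expect("#")
    config = child.before.decode()         # output printed before the next prompt
    child.sendline("exit")
    child.close()
    with open(outfile, "w") as f:
        f.write(config)

# Placeholder switch and credentials.
backup_config("sw-daq-01", "admin", "secret", "sw-daq-01.cfg")
```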

17 Network Configuration: NeDi CLI access
- Web-based interface
- Possible to select a set of switches by type, IP, name, etc.
- Can apply a batch of commands to the selected switches

18 Network Diagnostics Tools
- sFlow sampler
  - sFlow is a mechanism to capture packet headers and collect statistics from the device, especially in high-speed networks
  - It samples the packets on a switch port and displays the header information, which is very useful for debugging packet-loss problems, e.g. those caused by a wrong IP or MAC address
- Relatively high-speed traffic monitoring
  - Queries the counters of selected interfaces using SNMP or the CLI with a finer time resolution (see the sketch below)
  - Shows the utilization of the selected interfaces
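A minimal sketch of such a finer-grained utilization monitor: it polls an interface's octet counter twice via snmpget (net-snmp) and converts the difference into a link utilization. The switch name, interface index, link speed and polling interval are placeholders.

```python
import subprocess
import time

def in_octets(host, if_index, community="public"):
    """Read an interface's received-octet counter with snmpget (net-snmp)."""
    out = subprocess.check_output(
        ["snmpget", "-v2c", "-c", community, "-Ovq",
         host, f"IF-MIB::ifHCInOctets.{if_index}"], text=True)
    return int(out.strip())

def utilization(host, if_index, link_speed_bps=1_000_000_000, interval_s=1.0):
    """Poll the counter twice and return input utilization as a fraction of link speed."""
    first = in_octets(host, if_index)
    time.sleep(interval_s)
    second = in_octets(host, if_index)
    return (second - first) * 8 / (link_speed_bps * interval_s)

# Placeholder switch and interface index; prints roughly one reading per second.
while True:
    print(f"sw-daq-01 port 1: {utilization('sw-daq-01', 1):.1%}")
```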

19 Status of Network Installation and Deployment
- Current setup:
  - 2 aggregation switches
  - only 2 line cards inserted in the core DAQ switch
  - sufficient for an L0 trigger rate of ~200 kHz
- Upgrade for the full-speed 1 MHz readout:
  - Core DAQ switch: Force10 E1200i with 14 line cards (1260 GbE ports), ready at the end of June
  - Upgrade from TeraScale to ExaScale: doubles the switch capacity, and all ports run at line rate
  - All readout boards will be connected directly to the core DAQ switch


