Management of the LHCb DAQ Network
Guoming Liu*†, Niko Neufeld*
* CERN, Switzerland; † University of Ferrara, Italy

Guoming Liu, RT2009

Outline
 Introduction to the LHCb DAQ system
 Network monitoring based on a SCADA system
 Network configuration
 Network debugging
 Status of LHCb network installation and deployment

LHCb online system
The LHCb online system consists of three major components:
 Data Acquisition (DAQ): transfers the event data from the detector front-end electronics to permanent storage
 Timing and Fast Control (TFC): drives all stages of the data readout of the LHCb detector, between the front-end electronics and the online processing farm
 Experiment Control System (ECS): controls and monitors all parts of the experiment: the DAQ system, the TFC system, the High Level Trigger (HLT) farm, the Detector Control System, the experiment's infrastructure, etc.

LHCb online system
[Slide shows an overview diagram of the LHCb online system]

LHCb DAQ network
 Components:
   Readout boards: TELL1/UKL1, ~330 in total
   Aggregation switches
   Core DAQ switch: Force10 E1200i, supporting up to 1260 GbE ports with a switch capacity of 3.5 Tb/s
   50 edge switches
[Diagram: DAQ network topology: ~330 readout boards, aggregation switches, core switch, 50 edge switches, HLT CPUs, storage, CASTOR]

LHCb DAQ network
 The DAQ works in push mode
 Protocols:
   Readout: MEP, a light-weight datagram protocol over IP
   Storage: standard TCP/IP
 Network throughput:
   Readout: ~35 GByte/s (~280 Gb/s), from the first-level trigger accept rate of 1 MHz and an average event size of ~35 kByte
   Storage: ~70 MByte/s (~560 Mb/s), from the HLT accept rate of ~2 kHz
[Diagram: data flow from the readout boards through the core switch to the HLT farm and storage]
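The throughput figures follow directly from the trigger rates and the average event size; a quick back-of-the-envelope check in Python, using only the numbers quoted above:

```python
# Readout and storage throughput implied by the LHCb trigger parameters.
l0_rate_hz = 1_000_000      # first-level (L0) trigger accept rate: 1 MHz
hlt_rate_hz = 2_000         # HLT accept rate: ~2 kHz
event_size_bytes = 35_000   # average event size: ~35 kByte

readout_bps = l0_rate_hz * event_size_bytes   # bytes/s into the HLT farm
storage_bps = hlt_rate_hz * event_size_bytes  # bytes/s to storage

print(f"Readout: {readout_bps / 1e9:.0f} GByte/s "
      f"(~{readout_bps * 8 / 1e9:.0f} Gbit/s)")   # 35 GByte/s, ~280 Gbit/s
print(f"Storage: {storage_bps / 1e6:.0f} MByte/s "
      f"(~{storage_bps * 8 / 1e6:.0f} Mbit/s)")   # 70 MByte/s, ~560 Mbit/s
```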


Network monitoring
 DAQ network:
   A large-scale LAN: ~54 switches, ~3500 GbE ports
   Performance critical
 The status of the whole DAQ network must be monitored at different levels, for different requirements
 DAQ network monitoring is part of the LHCb ECS:
   Uses the same tools: PVSS (a commercial SCADA system) and the JCOP framework (Joint Controls Project for the LHC experiments)
   Provides the same operation interface

Network monitoring: architecture
 Monitored items:
   Topology
   Traffic
   IP routing
   Hardware/system
 Tools:
   Data collectors: various front-end processors based on SNMP and syslog
   Data communication: the DIM client/server mechanism: a server publishes information as named services; clients subscribe to those services
[Diagram: architecture of the network monitoring: front-end collectors (SNMP/syslog) feeding PVSS via DIM]
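The DIM client/server pattern can be illustrated with a minimal in-process sketch. This is not the real DIM API (DIM is a CERN C/C++ library with its own name server and network transport); it only shows the publish/subscribe flow described above, with a hypothetical service name:

```python
# Minimal stand-in for the DIM service pattern: a server publishes named
# services, clients subscribe and receive value updates via callbacks.
class ServiceBroker:
    """Plays the role of the DIM name server for this sketch."""
    def __init__(self):
        self._subscribers = {}  # service name -> list of callbacks

    def subscribe(self, service, callback):
        self._subscribers.setdefault(service, []).append(callback)

    def publish(self, service, value):
        for callback in self._subscribers.get(service, []):
            callback(service, value)

broker = ServiceBroker()
received = []

# A PVSS-side client subscribing to a (hypothetical) traffic-counter service.
broker.subscribe("DAQ/SWITCH01/IF_IN_OCTETS",
                 lambda svc, val: received.append((svc, val)))

# The SNMP collector, acting as a DIM server, publishes a new counter value.
broker.publish("DAQ/SWITCH01/IF_IN_OCTETS", 123456789)

print(received)  # [('DAQ/SWITCH01/IF_IN_OCTETS', 123456789)]
```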

Network monitoring (1): topology
 NeDi: an open-source tool to discover the network
 Discovers the network topology based on the Link Layer Discovery Protocol (LLDP): seed  its neighbors  the neighbors of those neighbors  ... until all devices are discovered
 Discovers the network nodes
 Certain modifications have been made for the LHCb network environment
 All information is sent to PVSS
 PVSS monitors any change of the topology
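The seed-and-expand discovery described above is a breadth-first walk over the LLDP neighbour relation. A sketch, where the hard-coded neighbour table stands in for what NeDi would actually query from each device:

```python
from collections import deque

# Hypothetical LLDP neighbour tables, keyed by device name.
LLDP_NEIGHBOURS = {
    "core":   ["aggr1", "aggr2"],
    "aggr1":  ["core", "edge01", "edge02"],
    "aggr2":  ["core", "edge03"],
    "edge01": ["aggr1"], "edge02": ["aggr1"], "edge03": ["aggr2"],
}

def discover(seed):
    """Breadth-first walk from a seed device until no new devices appear."""
    seen, queue = {seed}, deque([seed])
    while queue:
        device = queue.popleft()
        for neighbour in LLDP_NEIGHBOURS.get(device, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(sorted(discover("core")))
# ['aggr1', 'aggr2', 'core', 'edge01', 'edge02', 'edge03']
```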

Network monitoring (2): traffic
 Traffic monitoring is based on SNMP (Simple Network Management Protocol)
 The SNMP driver provided by PVSS has low performance, so custom SNMP collectors are used:
   Collect all the interface counters from the network devices: input and output traffic, input and output errors and discards
   Publish the data for PVSS as a DIM server
 PVSS:
   Receives the data via the PVSS-DIM bridge
   Analyzes the traffic and archives it
   Displays the current status and the trend of the bandwidth utilization
   Issues an alarm in case of error
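Turning raw interface counters into a bandwidth figure is the core of such a collector: take two counter samples, subtract, and convert octets to bits per second, taking care of counter wrap-around. A sketch with invented sample values (a real collector would read the IF-MIB `ifInOctets`/`ifHCInOctets` objects over SNMP):

```python
COUNTER_MAX = 2**32  # a 32-bit ifInOctets counter wraps at this value

def rate_bps(old_octets, new_octets, interval_s):
    """Average bit rate between two counter samples, handling one wrap."""
    delta = new_octets - old_octets
    if delta < 0:                   # counter wrapped between the samples
        delta += COUNTER_MAX
    return delta * 8 / interval_s   # octets -> bits, per second

# Two samples 10 s apart: 125 MB transferred -> 100 Mbit/s average.
print(rate_bps(1_000_000, 126_000_000, 10))   # 100000000.0

# A sample pair where the 32-bit counter wrapped around.
print(rate_bps(COUNTER_MAX - 500, 500, 1))    # 8000.0
```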

Network monitoring (2): traffic
[Slide shows the traffic-monitoring display]

Network monitoring (3): IP routing
 The status of the routing is monitored using "ping"
 Three stages are checked along the DAQ path:
 1. From the readout boards to the HLT farm (ICMP is not fully supported by the readout boards, so a normal computer is inserted to simulate a readout board)
 2. From the HLT farm to the LHCb online storage
 3. From the online storage to CERN CASTOR
 The front-end script gathers the results and sends a summary to PVSS using DIM
[Diagram: DAQ data path with the three monitored stages marked 1-3]
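A front-end script of this kind essentially iterates over the three stages and reduces each one to an OK/failed flag. A sketch, with hypothetical host names and a stand-in reachability test (the real script would issue ICMP pings and publish the summary to PVSS over DIM rather than printing it):

```python
# Host pairs to probe for each stage of the DAQ data path (invented names).
STAGES = {
    "readout -> HLT farm": [("readout-proxy-pc", "hlt-node01")],
    "HLT farm -> storage": [("hlt-node01", "storage01")],
    "storage -> CASTOR":   [("storage01", "castor-gw")],
}

def ping_ok(source, target):
    # Stand-in for a real reachability test (e.g. one ICMP echo request
    # issued from `source` towards `target`); here every pair succeeds.
    return True

def check_routing():
    """Return a {stage: ok} summary over all configured host pairs."""
    return {stage: all(ping_ok(src, dst) for src, dst in pairs)
            for stage, pairs in STAGES.items()}

print(check_routing())
```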

Network monitoring (4): hardware/system
 Syslog can collect information not covered by SNMP
 A syslog server is set up to receive the syslog messages from the network devices and parse them. Alarm information includes:
   Hardware: temperature, fan status, power-supply status
   System: CPU, memory, login authentication, etc.
 All messages with a priority of "warning" or higher are sent to PVSS for further processing
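The severity filter comes down to decoding the `<PRI>` field that prefixes every syslog message (facility × 8 + severity, per the BSD syslog convention). A sketch of such a parser, with an invented sample message:

```python
import re

# Standard syslog severity names, ordered from most to least severe (0-7).
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

def parse_priority(line):
    """Return (facility, severity name) from a syslog line, or None."""
    match = re.match(r"<(\d+)>", line)
    if not match:
        return None
    pri = int(match.group(1))
    return pri // 8, SEVERITIES[pri % 8]

def forward_to_pvss(line):
    """Forward messages of severity 'warning' (4) or more severe."""
    parsed = parse_priority(line)
    return parsed is not None and SEVERITIES.index(parsed[1]) <= 4

msg = "<188>Mar 10 12:00:00 switch01: %FAN-4-FAILED: fan tray 2 failure"
print(parse_priority(msg))   # (23, 'warning')  -> facility local7
print(forward_to_pvss(msg))  # True
```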


Network configuration
 The LHCb online network system is quite large, with different devices running different operating systems and command sets
 But it is quite static: only a few features are essential for configuring the network devices
 Currently a set of Python scripts is used to configure the network devices:
   Initial setup of newly installed switches
   Firmware upgrades
   Configuration-file backup and restore
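The backup task can be sketched in a few lines: fetch the running configuration, store it, and diff it against the previous backup so that unexpected changes stand out. Here `fetch_running_config` is a hard-coded stand-in; the actual scripts would retrieve the configuration from the switch (e.g. over SSH or TFTP):

```python
import difflib
import pathlib
import tempfile

def fetch_running_config(switch):
    # Stand-in for pulling the running config from the device.
    return f"hostname {switch}\ninterface ge-0/1\n  mtu 9000\n"

def backup(switch, backup_dir):
    """Save the running config and return the diff against the last backup."""
    path = pathlib.Path(backup_dir) / f"{switch}.cfg"
    old = path.read_text().splitlines() if path.exists() else []
    new = fetch_running_config(switch).splitlines()
    path.write_text("\n".join(new) + "\n")
    return list(difflib.unified_diff(old, new,
                                     "previous", "current", lineterm=""))

with tempfile.TemporaryDirectory() as backup_dir:
    first = backup("edge01", backup_dir)   # first backup: everything is new
    second = backup("edge01", backup_dir)  # unchanged config: empty diff
    print(len(first) > 0, second == [])    # True True
```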


Network debugging tools
 Motivation: debugging DAQ network problems, mainly packet drops
 1. High-speed traffic monitoring:
   Queries the counters of selected interfaces using SNMP or the CLI, with better time resolution
   Shows the bandwidth utilization of the selected interfaces
 2. sFlow sampler:
   sFlow is a mechanism to capture packet headers and collect statistics from a device, especially in high-speed networks
   Very useful for debugging packet drops caused by a wrong IP or MAC address
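Once sFlow has delivered sampled packet headers, finding a wrong address is a matter of decoding the Ethernet and IP headers and comparing against the expected values. A sketch for a plain Ethernet II + IPv4 frame; the sample frame and the "known good" tables are invented for illustration:

```python
import struct

KNOWN_MACS = {"00:11:22:33:44:55"}   # expected destination MACs
KNOWN_IPS = {"10.0.1.20"}            # expected destination IPs

def parse_frame(frame):
    """Return (dst MAC, dst IPv4) from an Ethernet II frame."""
    dst_mac = ":".join(f"{b:02x}" for b in frame[0:6])
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype != 0x0800:                        # not IPv4
        return dst_mac, None
    # Destination IP sits at bytes 16-19 of the IP header (frame offset 30).
    dst_ip = ".".join(str(b) for b in frame[30:34])
    return dst_mac, dst_ip

# Minimal test frame: dst MAC, zeroed src MAC, IPv4 EtherType, 20-byte header
# whose destination address is 10.0.1.99.
ip_header = bytes(16) + bytes([10, 0, 1, 99])
frame = bytes.fromhex("001122334455") + bytes(6) + b"\x08\x00" + ip_header

dst_mac, dst_ip = parse_frame(frame)
print(dst_mac in KNOWN_MACS, dst_ip in KNOWN_IPS)  # True False -> suspect IP
```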


Status of network installation and deployment
 Current setup:
   2 aggregation switches
   Only 2 linecards inserted in the core DAQ switch
   Sufficient for an L0 trigger rate of ~200 kHz
 Upgrade for 1 MHz full-speed readout:
   Core DAQ switch: Force10 E1200i with 14 linecards (1260 GbE ports), ready at the end of June; the upgrade from TeraScale to ExaScale doubles the switch capacity and lets all ports run at line rate
   All readout boards will be connected to the core DAQ switch directly
[Diagram: upgraded DAQ network topology]
