R&D on data transmission FPGA → PC using UDP over 10-Gigabit Ethernet Domenico Galli Università di Bologna and INFN, Sezione di Bologna XII SuperB Project.


R&D on data transmission FPGA → PC using UDP over 10-Gigabit Ethernet
Domenico Galli, Università di Bologna and INFN, Sezione di Bologna
XII SuperB Project Workshop, Annecy-le-Vieux, 18th March 2010

Commodity Links
Commodity links are used more and more often in HEP for DAQ, Event Building and High Level Trigger systems:
– Limited costs;
– Maintainability;
– Upgradability.
The demand for data throughput in HEP is increasing, driven by:
– Physical event rate;
– Number of electronic channels;
– Reduction of the on-line event filter (trigger) stages.
Industry has moved on since the design of the DAQ for the LHC experiments:
– 10 Gigabit Ethernet is well established;
– 4x DDR Infiniband (16 Gb/s) is ready;
– 100 Gigabit Ethernet is being actively worked on.
DOMENICO GALLI - R&D on data transmission FPGA → PC using UDP over 10-Gigabit Ethernet 2

Evaluation of New Commercial Link Technologies
The Bologna group, in its spare time, is constantly evaluating new commodity link technologies:
– With a view to their use in DAQ/EB/HLT.
Evaluated parameters:
– Maximum throughput;
– Maximum datagram rate;
– CPU load;
– Datagram loss rate.
Recently tested links:
– Gigabit Ethernet (presented at IEEE RT-05);
– 10-Gigabit Ethernet (presented at IEEE RT-09);
– Infiniband (2010).
The choice of technology for the experiment must be delayed as long as possible.

10-GbE Point-to-Point Tests
We start the technology evaluation with PC-to-PC tests:
– NICs mounted on the PCI-E bus of commodity PCs act as transmitters and receivers.
In real operating conditions, the maximum transfer rate is limited not only by the capacity of the link itself, but also:
– by the capacity of the data buses (PCI and FSB/QPI);
– by the ability of the CPUs and of the OS to handle, in due time, the packet processing and the interrupt rates raised by the network interface cards.
Link: 10GBase-SR.

10-GbE Network I/O
“Fast network, slow host” scenario:
– Already seen in the transition to 1 Gigabit Ethernet.
Three major system bottlenecks may limit the efficiency of high-performance I/O adapters:
– The peripheral bus bandwidth: PCI-X (peak throughput 8.5 Gbit/s in the 133 MHz flavor) has been replaced by PCI-E (20 Gbit/s peak throughput in the x8 flavor).
– The memory bandwidth: the FSB clock rose from 533 MHz to 1600 MHz, and the FSB was then replaced by AMD HyperTransport and Intel QuickPath Interconnect.
– The CPU utilization: multi-core architectures.
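The two peak figures quoted for the peripheral bus can be reproduced with simple arithmetic. The sketch below assumes a 64-bit PCI-X bus and first-generation PCI-E lanes at 2.5 Gbit/s with 8b/10b line coding; the bus width and the coding overhead are not stated on the slide, and the function names are illustrative only.

```python
# Peak-throughput arithmetic for the peripheral buses named above.
# Assumptions (not on the slide): PCI-X is 64 bits wide; PCI-E lanes run
# at 2.5 Gbit/s each and use 8b/10b line coding.

def pcix_peak_gbps(width_bits: int = 64, clock_mhz: float = 133.0) -> float:
    """Parallel bus: one transfer of `width_bits` per clock cycle."""
    return width_bits * clock_mhz * 1e6 / 1e9


def pcie_raw_gbps(lanes: int = 8, lane_rate_gbps: float = 2.5) -> float:
    """Serial bus: raw signalling rate summed over all lanes, one direction."""
    return lanes * lane_rate_gbps


def pcie_after_8b10b_gbps(lanes: int = 8, lane_rate_gbps: float = 2.5) -> float:
    """Data rate left once 8b/10b coding (8 data bits per 10 line bits) is removed."""
    return pcie_raw_gbps(lanes, lane_rate_gbps) * 8 / 10


if __name__ == "__main__":
    print(f"PCI-X 133 MHz : {pcix_peak_gbps():.2f} Gbit/s")
    print(f"PCI-E x8 raw  : {pcie_raw_gbps():.2f} Gbit/s")
    print(f"PCI-E x8 data : {pcie_after_8b10b_gbps():.2f} Gbit/s")
```

Under these assumptions the raw x8 figure matches the 20 Gbit/s on the slide, while the rate available for data after line coding is lower, which still leaves headroom above a 10 Gbit/s link.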

CPU Affinity Settings

CPU Affinity Settings (II)
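As a rough illustration of what a process affinity setting looks like on Linux, the sketch below pins a process to a single core with the standard `os.sched_setaffinity` call. The choice of core (the lowest-numbered allowed one) and the helper name `pin_to_one_cpu` are assumptions made for the example; this is not the affinity layout used in the tests.

```python
import os


def pin_to_one_cpu(pid: int = 0) -> set:
    """Pin process `pid` (0 = calling process) to one CPU it may already use.

    Linux-only: os.sched_getaffinity / os.sched_setaffinity are not
    available on every platform. The core chosen here (the lowest-numbered
    allowed one) is an arbitrary choice for illustration.
    """
    allowed = os.sched_getaffinity(pid)
    target = min(allowed)
    os.sched_setaffinity(pid, {target})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    print("now running on CPUs:", pin_to_one_cpu())
```

Interrupt affinity is a separate setting on Linux, typically written to `/proc/irq/<n>/smp_affinity`, and is not covered by this sketch.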

UDP protocol
UDP/IP is the simplest IP protocol that can be implemented in an FPGA:
– It does not hide network problems at the lower layers.
– SCTP/IP (Stream Control Transmission Protocol) could be an alternative.
– TCP/IP is too complex:
  – thousands of connections (and buffers) would have to be kept open on the FPGA side;
  – too many mechanisms that slow down the data flow would have to be tuned (congestion control, slow start, sliding windows, retransmission timer, Nagle’s algorithm, etc.);
  – large protocol overhead;
  – the retransmission timer would have to be tuned to keep the latency low.
Experience in DAQ shows that a protocol stack as complete as possible is very useful to simplify debugging in the commissioning phase:
– Including ARP, RARP, ICMP (ping), etc.
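Part of what makes UDP simple to implement in hardware is its fixed 8-byte header of four 16-bit fields. The sketch below builds that header in software purely to show the layout; the function name and the port numbers are illustrative, and the checksum is left at zero, which over IPv4 means "not computed".

```python
import struct


def udp_header(src_port: int, dst_port: int, payload_len: int,
               checksum: int = 0) -> bytes:
    """Return the 8-byte UDP header for a payload of `payload_len` bytes.

    Fields, all 16-bit big-endian: source port, destination port,
    length (header + payload), checksum (0 = not computed, IPv4 only).
    """
    length = 8 + payload_len
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


if __name__ == "__main__":
    hdr = udp_header(5000, 6000, payload_len=1472)  # ports are placeholders
    print(hdr.hex())
```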

UDP – Standard Frames
1500 B MTU (Maximum Transmission Unit).
UDP datagrams are sent as fast as they can be sent.
Bottleneck: sender CPU core 2 (sender process at 100% system load).
[Plot annotations: ~4.8 Gb/s; ~440 kHz; datagram-size marks at 2, 3 and 4 frames; CPU load split into User, System, IRQ, Soft IRQ and Total; 100% (bottleneck); fake softIRQ; softIRQ (4/5), IRQ (1/5); softIRQ (~50%), system (~50%).]
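"Sent as fast as they can be sent" can be pictured as a tight loop around `sendto` with no pacing. The following is a minimal sketch of such a sender, not the benchmark program used for the measurements; the function name, the default payload size (1472 B, the largest UDP payload that fits a 1500 B MTU with a 20 B IPv4 header and an 8 B UDP header) and the datagram count are assumptions.

```python
import socket
import time


def blast_udp(host: str, port: int, payload_size: int = 1472,
              count: int = 10000):
    """Send `count` UDP datagrams back to back, with no pacing.

    Returns (datagrams sent, datagram rate in Hz, payload rate in Gbit/s)
    as seen by the sender; datagrams may still be lost downstream.
    """
    payload = bytes(payload_size)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    start = time.perf_counter()
    try:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
    finally:
        sock.close()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return sent, sent / elapsed, sent * payload_size * 8 / elapsed / 1e9


if __name__ == "__main__":
    n, rate_hz, gbps = blast_udp("127.0.0.1", 9999)  # placeholder destination
    print(f"{n} datagrams, {rate_hz / 1e3:.0f} kHz, {gbps:.2f} Gbit/s")
```

A loop of this kind spends most of its time in the kernel's send path, so the sending process tends to show up as system load on the core it runs on.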

UDP – Jumbo Frames
9000 B MTU.
Considerable improvement with respect to the 1500 B MTU.
[Plot annotations: ~9.7 Gb/s; ~440 kHz; datagram-size marks at 2, 3 and 4 frames and at 2 and 3 PCI-E frames; CPU load split into User, System, IRQ, Soft IRQ and Total; 100% (bottleneck); fake softIRQ; softIRQ (4/5), IRQ (1/5); softIRQ (~50%), system (~50%).]
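For comparison with the measured rates, the wire-level limits of a 10 Gbit/s link for the two MTU values can be worked out as below. The sketch assumes IPv4 without options, no VLAN tag, and the usual Ethernet per-frame overhead of 38 B (preamble, start-of-frame delimiter, header, FCS and inter-frame gap); the constant and function names are illustrative.

```python
# Wire-level limits of 10-Gigabit Ethernet for UDP over IPv4.
# Assumptions: no VLAN tag, no IP options.
ETH_OVERHEAD_B = 7 + 1 + 14 + 4 + 12   # preamble, SFD, header, FCS, gap = 38
IP_UDP_HEADERS_B = 20 + 8              # IPv4 header + UDP header


def max_frame_rate_hz(mtu: int, line_rate_bps: float = 10e9) -> float:
    """Frames per second when full-MTU frames are sent back to back."""
    return line_rate_bps / ((mtu + ETH_OVERHEAD_B) * 8)


def max_udp_goodput_gbps(mtu: int, line_rate_bps: float = 10e9) -> float:
    """UDP payload rate in Gbit/s at that frame rate."""
    payload_bits = (mtu - IP_UDP_HEADERS_B) * 8
    return max_frame_rate_hz(mtu, line_rate_bps) * payload_bits / 1e9


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {max_frame_rate_hz(mtu) / 1e3:.1f} kHz, "
              f"{max_udp_goodput_gbps(mtu):.2f} Gbit/s UDP payload")
```

Under these assumptions the link could carry roughly 9.57 Gbit/s of UDP payload at a 1500 B MTU and roughly 9.93 Gbit/s at 9000 B. The ~4.8 Gb/s measured with standard frames is therefore well below the wire limit, which is consistent with the sender CPU being the bottleneck, while the ~9.7 Gb/s measured with jumbo frames is close to it.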

UDP – Jumbo Frames, 2 Sender Processes
Doubled availability of CPU cycles on the sender PC.
10GbE fully saturated.
Receiver (playing against 2 senders) not yet saturated.
[Plot annotations: ~10 Gb/s; ~600 kHz; ~3 KiB; ~5 KiB; datagram-size marks at 2, 3 and 4 frames; CPU load split into User, System, IRQ, Soft IRQ and Total; (bottleneck); no more CPU bottleneck; fake softIRQ; softIRQ (4/5), IRQ (1/5); softIRQ (25-75%), system (75-90%).]

R&D Project
An R&D project (PRIN) has been funded by the Italian Ministry of Education and Research (MIUR):
– TeraDAQ: prototype demonstrator of a high-performance data acquisition system based on a PC cluster and using ultra-high-speed networking standards. The project targets particle physics experiments at next-generation accelerators of very high luminosity.
– INFN Bologna, Bologna University and Roma Tor Vergata University.
– 51,700 €.

Electronics
Evaluation kit Xilinx ML605:
– Equipped with a latest-generation Xilinx Virtex-6 FPGA;
– FPGA Mezzanine Connector (FMC).
Connectivity board FMC XM104:
– 10-GbE CX4 connector.
[Diagram labels: PC; 10GBASE-CX4 (max 10 m); 10 GbE connector CX4; Xilinx mezzanine FMC XM104 connectivity card; FMC, 10 Gb/s; Xilinx Virtex-6 FPGA on the ML605 evaluation board; on the FPGA: software VHDL UDP/IP, software core 10-GbE MAC, software core XAUI SERDES.]

Electronics (II)
FMC XM104 Connectivity Card:
– Designed to provide access to eight serial transceivers on the FMC HPC connector found on Xilinx FMC-supported boards, including the Virtex-6 ML605.
[Picture label: ML605 board.]

Software
XAUI SERDES and 10-GbE MAC:
– Available free of charge as evaluation software.
UDP/IP:
– Possible solutions are being evaluated.

Domenico Galli Dipartimento di Fisica, Alma Mater Studiorum - Università di Bologna and INFN, Sezione di Bologna

Test Platform
Motherboard: IBM X3650
Processor type: Intel Xeon E5335
Processors x cores x clock (GHz): 2 x 4 x 2.00
L2 cache (MiB): 8
L2 speed (GHz): 2.00
FSB speed (MHz): 1333
Chipset: Intel 5000P
RAM: 4 GiB
NIC: Myricom 10G-PCIE-8A-S
NIC DMA speed (Gbit/s), ro / wo / rw: 10.44 / / 19.07

Settings
net.core.rmem_max (B):
net.core.wmem_max (B):
net.ipv4.tcp_rmem (B): 4096 / /
net.ipv4.tcp_wmem (B): 4096 / /
net.core.netdev_max_backlog:
Interrupt coalescence (μs): 25
PCI-E speed (Gbit/s): 2.5
PCI-E width: x8
Write combining: enabled
Interrupt type: MSI
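The kernel parameters listed above are exposed on Linux under `/proc/sys`, with the dots of the sysctl name mapped to directory separators. The sketch below only reads the current values on the machine where it runs; it does not reproduce the values used in the tests. The helper names are illustrative, and a key that is absent on a given system is reported as `None`.

```python
from pathlib import Path

# Kernel parameters named on the slide.
KEYS = (
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.core.netdev_max_backlog",
)


def read_sysctl(key: str, root: str = "/proc/sys"):
    """Return the current value of sysctl `key` as a string, or None if unreadable."""
    path = Path(root) / key.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def current_settings() -> dict:
    """Collect the current values of all parameters in KEYS."""
    return {key: read_sysctl(key) for key in KEYS}


if __name__ == "__main__":
    for key, value in current_settings().items():
        print(f"{key} = {value}")
```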