Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links Stefan Haas, Markus Joos CERN Wieslaw Iwanski Henryk Niewodniczański Institute.

Presentation transcript:

Design and Performance of a PCI Interface with four 2 Gbit/s Serial Optical Links
Stefan Haas, Markus Joos (CERN)
Wieslaw Iwanski (Henryk Niewodniczański Institute of Nuclear Physics)
LECC, Sept. 2004, Boston

S. Haas, 14. Sept. '04, LECC
Outline
● Introduction
● Interface Card Hardware
● Firmware Description
● Software
● Performance Measurements
● Summary

Introduction
● DAQ systems for current and future experiments depend on reliable high-speed data transmission
● The S-LINK specification addresses this type of application:
  ► Point-to-point data link, bandwidth 160 MB/s (40 MHz)
  ► Flow control (XON/XOFF)
  ► Error detection (e.g. CRC)
  ► Self-test mode & return line signals
  ► CMC mezzanine card format
● ATLAS Read-Out Link (ROL)
  ► ROL implementation is based on S-LINK
  ► Connects front-end electronics interface modules (Read-Out Drivers) to the Read-Out System (ROS)
  ► ROS is based on commodity PCs and custom PCI interface cards (ROBin)
  ► ~1650 ROLs will be used in ATLAS

ROL Source Card
● High-speed Optical Link for ATLAS (HOLA)
● Standard S-LINK mezzanine card
● Industry standard pluggable (SFP) 850nm F/O transceiver
● Serial link speed 2 Gb/s with 8B10B line encoding
● Low-power: ~2W typical
[Figure: HOLA card, labelled SERDES, S-LINK protocol FPGA, CMC mezzanine connector (160MB/s, 40MHz) and cage for SFP F/O transceiver]

Quad S-LINK PCI Interface (FILAR)
● FILAR Features:
  ► Four 2 Gb/s HOLA link channels integrated on-board
  ► 64-bit/66MHz PCI interface (3.3V slots only)
  ► Move data between 4 link interfaces and the host PC memory
  ► Based on S32PCI64 interface design: one slot for S-LINK mezzanine card
● Applications: small readout systems for lab & test beam
● FPGA-based (in-system reconfigurable)
  ► PCI I/F implemented using a commercial PCI IP core
● Firmware versions:
  ► Quad S-LINK receiver (S-LINK to PCI)
  ► Quad S-LINK transmitter (PCI to S-LINK)
  ► Quad S-LINK data source (for performance measurements)

FILAR Hardware
[Figure: FILAR card, labelled HOLA interface FPGA, SFP fiber optic transceiver, SERDES, PCI interface FPGA and 64-bit/66MHz PCI interface (3.3V only!)]

Receiver Firmware Operation
● Host processor:
  ► 1) Fills a request FIFO on the interface card with addresses of free memory buffer pages
  ► 5) Reads the results from the acknowledge FIFO and processes the data
● Interface card:
  ► 2) Transfers data fragments from S-LINK to host memory as bus master using PCI bursts of up to 1kB for maximum performance
  ► 3) Stores length, status and control words for received fragments in an acknowledge FIFO
  ► 4) Asserts an interrupt (optional)
● Protocol overhead of ~2 PCI single-cycles (SC) per data fragment and channel:
  ► Write address of buffer memory page
  ► Read length and status of received fragment

Receiver Firmware Block Diagram
[Figure: block diagram showing, for each of the four channels, an S-LINK input, input buffer FIFO, PCI burst FIFO, request FIFO and acknowledge FIFO; these connect through the backend control logic, control & status registers and DMA engine to the 64-bit PCI IP core on the 66 MHz 64-bit PCI bus (528MB/s)]
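The 528MB/s label in the diagram matches the theoretical peak of a 64-bit, 66 MHz PCI bus. The short calculation below is a cross-check, not a figure taken from the slides beyond that label; it also shows why four channels running at the full 160 MB/s S-LINK rate would exceed what one such bus can carry.

```latex
\begin{align*}
B_{\mathrm{PCI}} &= \frac{64\ \text{bit}}{8\ \text{bit/byte}} \times 66\ \text{MHz} = 528\ \text{MB/s} \\
4 \times 160\ \text{MB/s} &= 640\ \text{MB/s} > B_{\mathrm{PCI}}
\end{align*}
```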

Firmware Optimization
● Single-cycles do not use the PCI bus efficiently
● A performance-optimized version of the receiver firmware was developed (DMA protocol firmware):
  ► Interface card transfers request and acknowledge data using DMA
  ► CPU prepares a descriptor block with buffer addresses for one or more channels in system memory
  ► Firmware fetches the block using DMA and fills the on-board request FIFOs
  ► Firmware transfers a block with the length and status information from the acknowledge FIFOs to the system memory using DMA when a threshold is reached
  ► Requires additional memory resources in the FPGA, so only 3 receive channels can be implemented on the current hardware
● Reduces PCI bus overhead and CPU load

Software
● FILAR software package:
  ► Linux device driver (loadable module)
  ► Library provides an easy-to-use programming API for applications
  ► Test and benchmarking programs
● Software written in C
● Separate drivers for the different receiver firmware versions
● Supports multiple channels & PCI cards
● Interrupt driven: device driver is called when a predefined number of fragments are available in any channel
● Code optimised for maximum throughput
  ► Manage the card with minimal attention from the application layer
  ► Reduce the number of context switches
● Fully integrated into the ATLAS DataFlow software
● Requires cmem driver/library for allocation of contiguous memory
● Similar package available for the transmitter firmware

Measurement Setup
● PC with Supermicro server motherboard (ServerWorks GC-LE chipset)
● 4 independent 64-bit PCI bus segments
● Intel Xeon CPU (3 GHz)
● S-LINK input channels driven by HOLA data sources
● Chipset architecture is important to obtain the maximum performance

Performance: Single-Cycle Firmware
● FILAR receiver with SC firmware
● Sawtooth structure due to overhead for setting up a PCI burst (1kB)
● Performance for one channel is limited by link bandwidth
● Throughput with 3 channels is limited by PCI interface
● Maximum throughput is ~450MB/s
[Plot annotations: 145MB/s per channel, 187MB/s per channel, 1kB]

Performance: DMA Protocol Firmware
● FILAR receiver with DMA firmware
● Better performance than SC firmware, in particular for short fragments
● 25% improvement for 3 channels at 1kB fragment length
● Performance for long fragments is similar for both firmware versions
[Plot annotation: 1kB]

Throughput: Multiple FILAR cards
● DMA protocol F/W
● Maximum throughput of 1.1GB/s with 3 receiver cards
● Throughput scales with the number of channels for fragments of 2kB and more
● For fragments of 500B and less the system is rate limited
[Plot annotations: 147MB/s per channel, 145MB/s per channel, 140MB/s per channel]

Fragment Rate: Multiple FILAR cards
● Received data fragment frequency per channel vs. fragment length
● Fragment rates of 100kHz can be sustained with 3 cards for fragments of less than 1kB
[Plot annotation: 1kB]

S-LINK Transmitter Performance
● Transmitter connected to a FILAR receiver in another PC
● PCI interface is saturated with 2 active channels
● Maximum throughput obtained is 360MB/s
● PCI memory read performance is not as good as write

Summary
● FILAR, a high-performance PCI interface card with 4 on-board 2 Gb/s S-LINK channels (HOLA), has been designed
● Quad S-LINK receiver, transmitter and data source firmware versions have been developed and optimized
● Software package with Linux device driver and API library is integrated in the ATLAS DataFlow software
● Maximum throughput for one receiver card ~450MB/s
● Aggregate data rate of > 1GB/s to system memory has been measured with 3 receiver cards
● Event rates of over 100kHz can be achieved for 1kB fragments
● FILAR applications and users:
  ► Test readout of front-end electronics interface modules
  ► ATLAS subdetector groups (LAr, SCT, TileCal, TRT, Pixel, LVL1 Calo), DAQ & ROBin, MDT chamber tests
  ► Readout system for the ATLAS combined test beam
  ► Stable design, ~50 cards produced so far