
Slide 1: Using FPGAs to Generate Gigabit Ethernet Data Transfers & The Network Performance of DAQ Protocols
Dave Bailey, Richard Hughes-Jones, Marc Kelly, The University of Manchester
IEEE Real Time 2007, Fermilab, 29 April – 4 May
www.hep.man.ac.uk/~rich/ then "Talks"

Slide 2: Collecting Data over the Network
[Diagram: detector elements (e.g. calorimeter planks) → custom links → concentrators → Ethernet switches → processing nodes; one burst per node; the switch output link is the bottleneck queue]
- Aim: a general-purpose DAQ solution for CALICE, the CAlorimeter for the LInear Collider Experiment.
- Take the ECAL as an example.
- At the end of the beam spill the planks send all their data to the concentrators.
- The concentrators 'pack' the data and send it to one processing node.
- This is the classic bottleneck problem for the switch.

Slide 3: XpressFX Virtex4 Network Test Board
- XpressFX development card from PLD Applications:
  - 8-lane PCI-e card
  - Xilinx Virtex4FX60 FPGA
  - DDR2 memory
  - 2 SFP cages (1 GigE)
  - 2 HSSDC connectors

Slide 4: Overview of the Firmware Design
- The Virtex4FX60 has: 16 RocketIO multi-gigabit transceivers, large internal memory, and 2 PPC CPUs.
- Ethernet interface: embedded MAC, RocketIO.
- Packet buffers & logic: allows routing of input and prioritising of output.
- Packet state machines: packet generator state machines.
- A VHDL model of an HC11 CPU (Green Mountain Computer Systems) controls the MAC and the state machines.
- The PPCs are reserved for data processing.

Slide 5: The State Machine Blocks
- Packet generator: CSRs (set by the HC11) for packet length, packet count, inter-packet delay and destination address.
- Request-Response:
  - RX state machine: decode the request packet, RFC 768 checksum, act on memory writes, queue other requests.
  - FIFO between the RX and TX machines.
  - TX state machine: process the request, construct the reply, fragment if needed, checksum.
- Packet analyser: packet analyser state machine.
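The checksum step the RX and TX machines perform is the one's-complement sum of RFC 768 (the UDP checksum). A minimal Python sketch of that calculation (the function name and test payload are illustrative, not from the firmware):

```python
def rfc768_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as specified in RFC 768 (UDP)."""
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)     # fold the carry back in
    return (~total) & 0xFFFF

# A packet carrying its own checksum sums to 0xFFFF before the complement,
# so re-checksumming payload + stored checksum yields 0 — the RX-side check.
payload = b"\xde\xad\xbe\xef"
csum = rfc768_checksum(payload)
assert rfc768_checksum(payload + csum.to_bytes(2, "big")) == 0
```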

Slide 6: The Receive State Machine
[State diagram] States: Idle, Read Header, Read Cmd, Check Cmd, Do Cmd, Write Mem, Fill Fifo, Empty Packet.
- Idle → Read Header when a packet is in the queue.
- Read Header → Read Cmd on the correct packet type; → Empty Packet on a wrong packet type.
- Read Cmd → Check Cmd once all bytes are received.
- Check Cmd → Do Cmd on a good cmd; → Empty Packet on a bad cmd.
- Do Cmd → Write Mem if the cmd is a memory write; → Fill Fifo if not.
- Write Mem → Fill Fifo when the write is finished.
- Fill Fifo → Idle once the FIFO is written; the FIFO holds the address and cmd.
- Empty Packet → Idle at the end of the packet.
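The receive state machine can be expressed as a transition table. This Python sketch uses one plausible wiring of the states and conditions listed on the slide (the exact arcs are inferred, and the state/event names are ours):

```python
# Hypothetical transition table mirroring the slide's RX state machine.
RX_TRANSITIONS = {
    ("IDLE",         "packet_in_queue"): "READ_HEADER",
    ("READ_HEADER",  "correct_type"):    "READ_CMD",
    ("READ_HEADER",  "wrong_type"):      "EMPTY_PACKET",
    ("READ_CMD",     "all_bytes_rx"):    "CHECK_CMD",
    ("CHECK_CMD",    "good_cmd"):        "DO_CMD",
    ("CHECK_CMD",    "bad_cmd"):         "EMPTY_PACKET",
    ("DO_CMD",       "is_mem_write"):    "WRITE_MEM",
    ("DO_CMD",       "not_mem_write"):   "FILL_FIFO",
    ("WRITE_MEM",    "write_done"):      "FILL_FIFO",
    ("FILL_FIFO",    "fifo_written"):    "IDLE",
    ("EMPTY_PACKET", "end_of_packet"):   "IDLE",
}

def run(events, state="IDLE"):
    """Step the machine through a sequence of events; return the final state."""
    for ev in events:
        state = RX_TRANSITIONS[(state, ev)]
    return state

# A well-formed memory-write request returns the machine to IDLE.
assert run(["packet_in_queue", "correct_type", "all_bytes_rx",
            "good_cmd", "is_mem_write", "write_done", "fifo_written"]) == "IDLE"
```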

Slide 7: The Transmit State Machine
[State diagram] States: Idle, Send Header & Cmd, Check Cmd, Send Memory, All Sent?, Update Counter, Send Xsum, End Pkt. Transition conditions: cmd in FIFO; header & cmd sent; cmd needs no data / cmd requires data; max packet size reached or byte count done; all bytes have been sent / more data to send; Xsum sent; end of packet.
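The "fragment if needed" step of the transmit side splits a large reply across several frames. A minimal sketch of that splitting (the 1472-byte payload limit matches the frame size used in the later tests; the function name is ours):

```python
MAX_PAYLOAD = 1472  # typical max raw-Ethernet payload with a 1500-byte MTU

def fragment(reply: bytes, max_payload: int = MAX_PAYLOAD):
    """Split a reply into max_payload-sized fragments; the last is shorter."""
    return [reply[i:i + max_payload] for i in range(0, len(reply), max_payload)]

# A 10,000-byte response (the request size used on slide 14) needs 7 frames.
frags = fragment(bytes(10_000))
assert len(frags) == 7                         # ceil(10000 / 1472)
assert sum(len(f) for f in frags) == 10_000
assert len(frags[-1]) == 10_000 - 6 * 1472     # 1168-byte final fragment
```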

Slide 8: The Test Network
[Diagram: requesting node, FPGA, concentrator and responding nodes connected through a Cisco 7609 with 1 GE and 10 GE blades]
- Used for testing raw Ethernet frame generation by the FPGA.
- Used to test data collection with Request-Response protocols.

Slide 9: Request-Response Latency, 1 GE
- Request sent from a PC: Linux kernel 2.6.20-web100_pktd-plus, Intel e1000 NIC, interrupt coalescence OFF, MTU 1500 bytes.
- Response frames generated by the FPGA code.
- Latency 19.7 µs and well behaved; latency slope 0.018 µs/byte.
- Back-to-back expectation: 0.0182 µs/byte (memory 0.0004 + PCI-e 0.0018 + 1 GigE 0.008 + FPGA 0.008).
- Smooth out to 35,000 bytes.
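The back-to-back expectation above is just the sum of the per-byte delay components, and the measurement is a straight line in message size. A sketch of the arithmetic (the function name is ours; the numbers are the slide's):

```python
# Per-byte delay budget quoted on the slide (µs/byte).
components = {"memory": 0.0004, "PCI-e": 0.0018, "1GigE": 0.008, "FPGA": 0.008}
expected_slope = sum(components.values())      # 0.0182 µs/byte

def latency_us(nbytes: int, slope: float = 0.018, intercept: float = 19.7) -> float:
    """Fitted linear latency model: measured intercept plus slope * size."""
    return intercept + slope * nbytes
```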

Slide 10: FPGA → PC, ethCal_recv: Frame Jitter
- 25 µs frame spacing.
- 12 µs frame spacing (line speed): peak separation 4-5 µs with no coalescence; packet loss observed.

Slide 11: Test the Frame Spacing from the FPGA
- Frames generated by the FPGA code; interrupt coalescence OFF on the PC; frame size 1472 bytes; 1M packets sent.
- Plot the mean observed frame spacing vs the requested spacing.
- There appears to be an offset of -1 µs; the slope is close to 1, as expected.
- Packet loss decreases with packet rate; the packets are lost in the receiving host.
- The effect is larger than for UDP/IP packets, whose losses are linked to scheduling.
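The spacing plot described above amounts to a straight-line fit of observed versus requested spacing. A minimal least-squares sketch; the data points are invented purely to illustrate the reported behaviour (slope near 1, small negative offset):

```python
# Illustrative data only: requested vs mean observed inter-frame spacing (µs).
requested = [10.0, 20.0, 30.0, 40.0, 50.0]
observed  = [9.1, 19.0, 29.2, 38.9, 49.0]

# Ordinary least-squares fit: observed = slope * requested + offset.
n = len(requested)
mx = sum(requested) / n
my = sum(observed) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(requested, observed))
         / sum((x - mx) ** 2 for x in requested))
offset = my - slope * mx   # the slide reports slope ~1, offset ~ -1 µs
```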

Slide 12: The Test Network
[Diagram: requesting node, FPGA, concentrator and responding nodes connected through a Cisco 7609 with 1 GE and 10 GE blades]
- Used for testing raw Ethernet frame generation by the FPGA.
- Used to test data collection with Request-Response protocols.
- This time the hosts are 10 GE. But does 10 GE work on a PC?

Slide 13: 10 GigE Back-to-Back: UDP Throughput
- Motherboard: Supermicro X7DBE; kernel 2.6.20-web100_pktd-plus; NIC: Myricom 10G-PCIE-8A-R fibre, rx-usecs=25, coalescence ON; MTU 9000 bytes.
- Max throughput 9.4 Gbit/s; note the rate for 8972-byte packets.
- ~0.002% packet loss over 10M packets, in the receiving host.
- Sending host: 3 CPUs idle; 1 CPU is 90% in kernel mode, including ~10% soft interrupts.
- Receiving host: 3 CPUs idle; for packets spaced under 8 µs, 1 CPU is 70-80% in kernel mode, including ~15% soft interrupts.
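The quoted rate can be compared with the wire-rate ceiling. With a 9000-byte MTU the UDP payload is 8972 bytes inside a 9038-byte wire slot (frame plus Ethernet header/FCS, preamble and inter-frame gap), giving a theoretical payload ceiling just under 9.93 Gbit/s; the measured 9.4 Gbit/s sits below that, consistent with a host-side limit. A sketch of the arithmetic (the overhead constants are standard Ethernet values, not from the slide):

```python
payload = 8972              # UDP payload: 9000-byte MTU minus 28 bytes UDP/IP headers
wire = 9000 + 18 + 8 + 12   # frame + Ethernet header/FCS + preamble + inter-frame gap
ceiling_gbps = 10.0 * payload / wire   # payload ceiling on a 10 Gbit/s link
```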

Slide 14: Scaling of Request-Response Messages
- Requests sent from the 10 GE system; interrupt coalescence OFF on the PC; frame size 1472 bytes; 1M packets sent.
- Each request asks for 10,000 bytes of data; the host does fragment collection, like the IP layer.
- Sequential requests: the time to receive all responses scales with the round-trip time, as expected.
- Grouped requests: the collection time increases by 24.6 µs per node; from the network alone one would expect 1 + 12.3 = 13.3 µs.
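The grouped-request numbers can be laid out explicitly. Reading the slide's "1 + 12.3 = 13.3 µs" as a 1 µs request term plus the 12.3 µs occupancy of a 1500-byte response frame on 1 GE (that attribution is our interpretation), the gap to the measured 24.6 µs per node is host overhead:

```python
response_1ge = 12.3   # µs: occupancy of a 1500-byte response frame on 1 GE
request_term = 1.0    # µs: the slide's "1" (assumed to be the request leg)
network_expectation = request_term + response_1ge        # 13.3 µs

measured_per_node = 24.6                                  # µs, measured
host_overhead = measured_per_node - network_expectation   # ~11.3 µs unexplained
```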

Slide 15: Sequential Request-Response
- Interrupt coalescence OFF on the PCs; MTU 1500 bytes; 10,000 packets sent.
- The histograms are similar: a strong first peak, a second peak 5 µs later, and a small group ~25 µs later.
- Ethernet occupancy for 1500 bytes: 12.3 µs at 1 Gig, 1.2 µs at 10 Gig.
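The occupancy figures follow from the frame's total footprint on the wire. A sketch of the calculation (the function name is ours; the 18 + 8 + 12 bytes of overhead are standard Ethernet header/FCS, preamble and inter-frame gap):

```python
def occupancy_us(payload_bytes: int, rate_gbps: float) -> float:
    """Wire occupancy of one Ethernet frame: payload plus 18 bytes of
    header/FCS, 8 bytes of preamble and a 12-byte inter-frame gap."""
    wire_bytes = payload_bytes + 18 + 8 + 12
    return wire_bytes * 8 / (rate_gbps * 1000.0)   # bits over bits-per-µs
```

For a 1500-byte payload this gives 12.3 µs at 1 Gbit/s and 1.2 µs at 10 Gbit/s, matching the slide.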

Slide 16: Grouped Request-Response
- Interrupt coalescence OFF on the PCs; MTU 1500 bytes; 10,000 packets sent.
- The histograms are multi-modal: a second peak ~7 µs later and a small group ~25 µs later.

Slide 17: Conclusions
- Implemented the MAC and PHY layers inside a Xilinx Virtex4 FPGA.
- The learning curve was steep: had to overcome issues with the Xilinx "CoreGen" design and with clock generation & stability on the PCB.
- The FPGA easily drives 1 Gigabit Ethernet at line rate; packet dynamics on the wire are as expected; the loss of raw Ethernet frames in the end host is being investigated.
- Request-Response style data collection is promising.
- Developing a simple network test system; a planned upgrade will operate at 10 Gbit/s.
- Work performed in collaboration with the ESLEA UK e-Science and EU EXPReS projects.

Slide 18: Any Questions?

Slide 19: 10 GigE UDP Throughput vs Packet Size
- Motherboard: Supermicro X7DBE; Linux kernel 2.6.20-web100_pktd-plus; Myricom NIC 10G-PCIE-8A-R fibre; myri10ge v1.2.0 + firmware v1.4.10; rx-usecs=0, coalescence ON, MSI=1, checksums ON, tx_boundary=4096.
- Steps at 4060 and 8160 bytes, within 36 bytes of the 2^n boundaries.
- Model the data transfer time as t = C + m*Bytes, where C includes the time to set up transfers. The fit is reasonable: C = 1.67 µs, m = 5.4e-4 µs/byte. The steps are consistent with C increasing by 0.6 µs.
- The Myricom driver segments the transfers, limiting each DMA to 4096 bytes. PCI-e chipset dependent!
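The stepped model can be written down directly: a fixed setup cost, an extra 0.6 µs per additional DMA segment, and a per-byte term. A sketch under the slide's fitted values (reading the slide's "5.4 e4" as 5.4e-4 µs/byte, which matches a ~15 Gbit/s DMA rate; the function name is ours):

```python
import math

C = 1.67          # µs: per-transfer setup time (from the slide's fit)
M = 5.4e-4        # µs/byte (slide's "5.4 e4" read as 5.4e-4)
STEP = 0.6        # µs added per extra DMA segment
DMA_LIMIT = 4096  # the driver splits DMA transfers at this boundary

def transfer_time_us(nbytes: int) -> float:
    """t = C + m*Bytes, with C growing by 0.6 µs per extra DMA segment."""
    segments = math.ceil(nbytes / DMA_LIMIT)
    return C + STEP * (segments - 1) + M * nbytes
```

Crossing the 4096-byte boundary adds one segment, so the time jumps by roughly the 0.6 µs step, reproducing the observed staircase.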

