What happens when you try to build a low latency NIC? Mario Flajslik.

Overview ● Motivation ● Setup ● Receive path NIC latency ● Transmit path NIC latency ● Results ● Conclusion

RAMCloud Overview (slide by John Ousterhout) ● Storage for datacenters ● 1000-10000 commodity servers ● 64 GB DRAM/server ● All data always in RAM ● Durable and available ● Performance goals: high throughput (1M ops/sec/server); low-latency access (5-10µs RPC) [Diagram labels: Application Servers, Storage Servers, Datacenter]

Datacenter network topology ● Current low latency switches ~600ns ● Next generation switches promise ~300ns...

Test setup [Diagram labels: xupv5 FPGA NIC, ICH, MCH, CPU, RAM, ethernet, PCIe 1.1 x1, DMA rd/wr]
DMA read latencies:
size    latency
64B     2424 ns
128B    2808 ns
256B    3464 ns
512B    4984 ns

Receiving packets ● Pre-allocate buffers (sk_buff) and give them to the NIC ● On packet receive, DMA the data into a buffer and notify the CPU [Diagram labels: CPU, meta data, sk_buff …, RAM, packet in, DMA write, write length, interrupt, NIC]
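The receive flow on this slide (pre-allocated buffers, DMA write, length write, interrupt) can be illustrated with a small software simulation. This is only a sketch: the class `SimulatedNic`, its method names, and the callback standing in for the interrupt are hypothetical and do not come from the talk or from any real driver API.

```python
from collections import deque


class SimulatedNic:
    """Toy model of the RX path: hypothetical names, not a real driver API."""

    def __init__(self):
        self.free_buffers = deque()  # buffers the host has handed to the NIC
        self.completed = deque()     # (buffer, length) pairs awaiting the CPU

    def post_buffer(self, buf):
        # Host side: pre-allocate a buffer and give it to the NIC.
        self.free_buffers.append(buf)

    def receive(self, packet, notify_cpu):
        # NIC side: drop the packet if no buffer is available or it does not fit.
        if not self.free_buffers or len(packet) > len(self.free_buffers[0]):
            return False
        buf = self.free_buffers.popleft()
        buf[:len(packet)] = packet                  # stands in for the DMA write
        self.completed.append((buf, len(packet)))   # stands in for the length write
        notify_cpu()                                # stands in for the interrupt
        return True
```

A host would call `post_buffer(bytearray(2048))` ahead of time, then read `(buffer, length)` pairs from `completed` when notified.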

RX path [Diagram labels: PCIe core, CTRL, length (256 ns delay), TX/RX, MAC 0, MAC 1, MAC 2, MAC 3, other sources]

PCIe RX latency breakdown ● 128B receive (+16B PCIe header) ● PCIe 1.1: 250 MB/s -> 576 ns wire time for 144B ● All times in nanoseconds [Timing diagram: rows app, trn_tx, pcie_mim_wr, pcie_mim_rd, wire; time marks 0, 272, 416, 746, 1156, 1322, 580, 128, 560]
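The 576 ns wire time follows directly from the numbers on the slide: 128B of payload plus a 16B PCIe header at 250 MB/s. A minimal sketch of that arithmetic, assuming 1 MB = 10^6 bytes (the function name and defaults are illustrative):

```python
def pcie_wire_time_ns(payload_bytes, header_bytes=16, bandwidth_mb_per_s=250):
    """Time on the wire for one transfer, in nanoseconds.

    Assumes 1 MB = 1e6 bytes, so 1 MB/s moves one byte every 1000/MB ns.
    """
    total_bytes = payload_bytes + header_bytes
    return total_bytes * 1000 / bandwidth_mb_per_s


print(pcie_wire_time_ns(128))  # 576.0
```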

Transmitting packets ● NIC writes back a flag indicating that the packet has been sent and the sk_buff can be deallocated ● The data written back is the DMA read latency [Diagram labels: CPU, meta data, sk_buff …, RAM, reg write, DMA read, write back, packet out, NIC] reg write: rsvd(2) end(1) id(5) size(16) addr(40)
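The register write carries five fields whose widths add up to 64 bits: rsvd(2), end(1), id(5), size(16), addr(40). The sketch below packs them into one 64-bit word. The bit positions are an assumption: the fields are taken as listed from most significant to least significant, so addr occupies the low 40 bits. The slide gives only the widths, and the function names are hypothetical.

```python
# Field widths from the slide, listed here from least significant bit upward.
# ASSUMPTION: the slide lists fields MSB-first, so addr sits in bits 0..39.
FIELDS = (("addr", 40), ("size", 16), ("id", 5), ("end", 1), ("rsvd", 2))


def pack_tx_descriptor(addr, size, buf_id, end):
    """Pack the TX register fields into a single 64-bit integer."""
    values = {"addr": addr, "size": size, "id": buf_id, "end": int(end), "rsvd": 0}
    word = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_tx_descriptor(word):
    """Split a 64-bit word back into its named fields."""
    fields = {}
    for name, width in FIELDS:
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

Round-tripping a descriptor through `pack_tx_descriptor` and `unpack_tx_descriptor` returns the original field values, which is a quick check that the widths and shifts agree.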

TX path [Diagram labels: PCIe core, CTRL, TX/RX, MAC 0, MAC 1, MAC 2, MAC 3, MEM]

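PCIe RX latency breakdown [Timing diagram 1: rows app, trn_tx, pcie_mim_wr, pcie_mim_rd, wire; time marks 0, 128, 96, 240, 272, 316, 380, 446, 510] [Timing diagram 2: rows pcie_mim_rd, pcie_mim_wr, app, trn_rx, wire; time marks 176, 576, 752, 832, 1120, 1376, 1664, 1424, 1680, 0]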

Results ● RTT of DMA read from the NIC ● Xilinx PCIe core latency (128B) = 2126 ns ● Latency = 2000 ns + size * 6 ns ● Smaller latency for x8 PCIe
size    latency
64B     2424 ns
128B    2808 ns
256B    3464 ns
512B    4984 ns
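The linear model on this slide can be checked against the four measured DMA read latencies. The sketch below evaluates 2000 ns + size * 6 ns for each size and reports the difference from the measurement; the names are illustrative, and the data is copied from the table.

```python
# Measured DMA read round-trip latencies from the slide, in nanoseconds.
MEASURED_NS = {64: 2424, 128: 2808, 256: 3464, 512: 4984}


def modeled_latency_ns(size_bytes, base_ns=2000, per_byte_ns=6):
    """Linear latency model from the slide: 2000 ns + size * 6 ns."""
    return base_ns + size_bytes * per_byte_ns


def model_errors_ns():
    """Model minus measurement for each transfer size, in nanoseconds."""
    return {size: modeled_latency_ns(size) - measured
            for size, measured in MEASURED_NS.items()}


for size, error in model_errors_ns().items():
    print(f"{size}B: model {modeled_latency_ns(size)} ns, error {error:+d} ns")
```

The model stays within 100 ns of every measurement, so it reads as a rounded fit, not an exact law.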

Evolution of PCI ● Experiment done with register DMA reads (4 byte DMA reads) ● Data from: "Motivating Future Interconnects: A Differential Measurement Analysis of PCI Latency", David Miller, Philip M Watts, Andrew W. Moore (University of Cambridge, United Kingdom)
PCI version      DMA read latency
33 MHz PCI       324 ns
66 MHz PCI-X     174 ns
133 MHz PCI-X    84 ns
PCIe x1          2106 ns
PCIe x8          252 ns

PHY & MAC ● XAUI is the interconnect protocol between the Virtex-5 and the PHY chip ● Given times are for a copper cable ● Optical cable times are slightly better (~10 ns better) [Diagram labels: MAC, PHY, MAC, 10G NetFPGA, NIC, XAUI; timeline marks 0 ns, 192 ns, 403 ns, 607 ns, 819 ns; segment times 109 ns, 83 ns, 211 ns, 83 ns, 90 ns]

Conclusion ● Current Intel 10G NICs (RX+TX): ~5μs ● Current Intel 1G NICs (RX+TX): ~9μs ● Infiniband (RX+TX): ~2.2μs ● There is room for improvement
