1 Passive monitoring of 10 Gb/s lines with PC hardware
Sven Ubik, Petr Žejdl, CESNET
TNC2008, Bruges, 19 May 2008

2 Active vs. passive monitoring
Active monitoring – a probe:
- uses test packets
- results are truly applicable to the test packets only
Passive monitoring – a view:
- does not send anything; provides characteristics of real traffic
- many characteristics are inherent to real traffic and cannot be obtained from test packets: traffic volume, protocol usage, burstiness, real packet loss, anomalies, security attacks, …

3 Hardware vs. software
- Hardware processing is often considered "fast" and software processing "slow", but software runs on top of hardware and hardware is often programmed; there is no clear line between HW and SW processing.
- Hardware programming is sometimes considered "design-time" and software programming "run-time", but dynamically reconfigurable HW exists and software also needs to be designed.
- There is often a difference in flexibility of programming (SW better than HW).
- What is more "powerful": an FPGA, a network processor or a multi-core CPU?

4 NICs and monitoring cards
- 10GE NICs are now commonly available and relatively inexpensive: ~$1300 / port including an XFP transceiver (was ~$4000 four years ago).
- 10GE monitoring cards are few and expensive – DAG (Endace), Napatech, COMBO (Invea-Tech): ~6500 Euro / port and more, incl. XFP transceiver (was over 20000 Euro two years ago).
- Two main differences between NICs and monitoring cards:
  - some hardware acceleration (filtering, header classification, simple packet statistics)
  - a large packet buffer and block DMA transfer – the key difference (see the sketch below)
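
To make the block-DMA point concrete, here is a minimal conceptual sketch of block-oriented packet consumption: the card DMAs many packets into one large contiguous buffer and software walks the whole block per wakeup, instead of paying a per-packet interrupt and copy. The record layout and names are my own assumptions, not the DAG, Napatech or COMBO API.

    /* Conceptual sketch only – not a real monitoring-card API. */
    #include <stddef.h>
    #include <stdint.h>

    struct pkt_record {          /* fixed header prepended to each captured packet */
        uint64_t timestamp;
        uint16_t wire_len;       /* length of the packet on the wire */
        uint16_t rec_len;        /* total record length, including this header */
        uint8_t  data[];         /* captured bytes follow */
    };

    /* Walk one DMA block that the card filled with many back-to-back records.
     * Packets are parsed in place; there is no per-packet copy or interrupt. */
    size_t process_block(const uint8_t *block, size_t block_len,
                         void (*handle)(const struct pkt_record *))
    {
        size_t off = 0, npkts = 0;
        while (off + sizeof(struct pkt_record) <= block_len) {
            const struct pkt_record *rec = (const struct pkt_record *)(block + off);
            if (rec->rec_len < sizeof(struct pkt_record) ||
                off + rec->rec_len > block_len)
                break;                       /* truncated record at the end of the block */
            handle(rec);
            off += rec->rec_len;
            npkts++;
        }
        return npkts;                        /* the whole block is released in one call afterwards */
    }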

5 10 Gb/s cards that we tested
x8 PCI-Express NICs:
- Myricom Myri-10G
- Neterion Xframe II
64-bit/133 MHz PCI-X NICs:
- Intel PRO/10GbE
- Neterion Xframe
Monitoring card:
- DAG 8.2X (PCI-E)
Theoretical bus throughput:
- 20 Gb/s for x8 PCI-E
- 8 Gb/s for PCI-X
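
The bus figures follow directly from the link parameters; the quick check below is my own arithmetic, not from the slides. Note that 20 Gb/s for x8 PCI-E is the raw signaling rate; after 8b/10b encoding the usable bandwidth is about 16 Gb/s, and the 64-bit/133 MHz PCI-X peak works out to roughly 8.5 Gb/s on a shared bus.

    #include <stdio.h>

    int main(void)
    {
        /* PCI Express 1.x: 2.5 Gb/s raw signaling per lane, x8 link */
        double pcie_raw_gbps     = 8 * 2.5;                    /* 20 Gb/s before 8b/10b encoding */
        double pcie_payload_gbps = pcie_raw_gbps * 8.0 / 10.0; /* ~16 Gb/s usable */

        /* PCI-X: 64-bit wide bus clocked at 133 MHz */
        double pcix_gbps = 64 * 133e6 / 1e9;                   /* ~8.5 Gb/s peak */

        printf("PCI-E x8: %.1f Gb/s raw, %.1f Gb/s after encoding\n",
               pcie_raw_gbps, pcie_payload_gbps);
        printf("PCI-X 64-bit/133 MHz: %.1f Gb/s peak\n", pcix_gbps);
        return 0;
    }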

6 Test setup
- RFC 2544 – Benchmarking Methodology for Network Interconnect Devices
- Frame sizes: 1518, 1280, 1024, 512, 256, 128 and 64 bytes
- DUT – device under test, difficult to isolate in the case of a PC card:
  - tested card
  - PC hardware
  - NAPI driver for NICs
  - Linux 2.6 with the standard IP stack
  - MAPI middleware
  - test application (header filter and packet counter; see the stand-in sketch below)
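
The test application in the slides was built on MAPI. As a rough stand-in for readers without MAPI, this is what an equivalent header filter plus packet counter looks like with libpcap; it only illustrates the workload, it is not the code used in the tests, and the interface name and filter expression are placeholders.

    /* Minimal header filter + packet counter (libpcap stand-in, not the MAPI application). */
    #include <pcap/pcap.h>
    #include <stdio.h>

    static unsigned long npkts;

    static void count_packet(u_char *user, const struct pcap_pkthdr *h, const u_char *bytes)
    {
        (void)user; (void)h; (void)bytes;
        npkts++;                                   /* just count the packets that pass the filter */
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        pcap_t *p = pcap_open_live("eth2", 96, 1, 100, errbuf);  /* interface name is a placeholder */
        if (!p) { fprintf(stderr, "pcap_open_live: %s\n", errbuf); return 1; }

        struct bpf_program prog;
        if (pcap_compile(p, &prog, "tcp and port 80", 1, PCAP_NETMASK_UNKNOWN) == 0)
            pcap_setfilter(p, &prog);              /* the "header filter" part */

        pcap_loop(p, -1, count_packet, NULL);      /* runs until pcap_breakloop() or an error */
        printf("%lu packets matched\n", npkts);
        pcap_close(p);
        return 0;
    }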

7 Processing throughput
Maximum IP-layer throughput in Gb/s with zero-loss processing. Myricom was the best among the NICs, but only marginally; the DAG card handled 100% of line rate.

Maximum load at zero loss [Gb/s]:

Packet size [B] | Packets/s at 10 Gb/s | Myricom | Intel | Xframe | Xframe II | DAG (for comparison)
           1518 |              812,744 |     8.5 |   7.5 |    6.0 |       6.0 | 10
           1280 |              961,538 |     7.0 |   6.5 |    6.5 |       5.5 | 10
           1024 |            1,197,318 |     6.0 |   5.5 |    5.0 |       5.5 | 10
            512 |            2,349,624 |     3.5 |   3.0 |    3.0 |       3.0 | 10
            256 |            4,528,986 |     2.0 |   1.0 |    1.5 |       1.5 | 10
            128 |            8,445,946 |     1.0 |   0.5 |    1.0 |       1.0 | 10
             64 |           14,880,952 |     0.5 |   0.1 |    0.5 |         – | 10
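
The "Packets/s at 10 Gb/s" column is just the Ethernet line-rate frame rate: each frame occupies its own length plus 8 bytes of preamble and 12 bytes of inter-frame gap on the wire. A quick reproduction of that column (my own arithmetic, matching the table):

    #include <stdio.h>

    int main(void)
    {
        const double line_rate_bps = 10e9;     /* 10 Gb/s Ethernet */
        const int overhead = 8 + 12;           /* preamble + inter-frame gap, in bytes */
        const int sizes[] = { 1518, 1280, 1024, 512, 256, 128, 64 };

        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
            double pps = line_rate_bps / ((sizes[i] + overhead) * 8.0);
            printf("%5d-byte frames: %10.0f packets/s\n", sizes[i], pps);
        }
        return 0;
    }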

8 Processing frame rates
(frame-rate plots for each tested card)
- Myricom Myri-10G (PCI-E)
- Neterion Xframe II (PCI-E)
- Neterion Xframe (PCI-X)
- Intel PRO/10GbE (PCI-X)

9 CPU load
- CPU load at the maximum zero-loss throughput
- The CPU was not the bottleneck for any of the cards
- The DAG anomaly for larger frames is being investigated

10 Traffic processing in a PC
Example of a modern mainboard: Supermicro X7DB8 with the Intel 5000P chipset.
In a modern PC, the bandwidth of the PCI bus, memory and FSB is sufficient for sustained processing of 10 Gb/s of data; the bottlenecks are the CPUs and the NICs. In our case the NICs were most likely the bottleneck, with a limit of ~1.3 million packets/s.

11 Cycles per packet
10 Gb/s in 64-byte packets = 14.8 * 10^6 packets / second
With 3 GHz CPUs, the per-packet budget is:
- 4 cores – 806 cycles / packet
- 8 cores – 1612 cycles / packet
- 16 cores – 3224 cycles / packet
(a quick check of these numbers follows below)
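
The budgets above are simply core count times clock rate divided by packet rate; the check below is my own arithmetic. The slide rounds the 4-core figure to 806 cycles and doubles it for 8 and 16 cores, so its numbers differ from the exact values by a cycle or two.

    #include <stdio.h>

    int main(void)
    {
        const double clock_hz = 3e9;                  /* 3 GHz cores */
        const double pps = 10e9 / ((64 + 20) * 8.0);  /* 64-byte frames at 10 Gb/s: ~14.88 Mpps */
        const int core_counts[] = { 4, 8, 16 };

        for (size_t i = 0; i < sizeof core_counts / sizeof core_counts[0]; i++) {
            double cycles = core_counts[i] * clock_hz / pps;
            printf("%2d cores: %.0f cycles per packet\n", core_counts[i], cycles);
        }
        return 0;
    }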

12 Packet sizes in live traffic
Example: GN2 – CESNET link:
- ~40% of packets near 64 bytes
- ~40% of packets near 1518 bytes
- ~20% of packets in between (~3% near 600 bytes)
- average packet size: 790 bytes

13 Traffic classification into application-layer protocols
- Based on MAPI and the trackflib library
- Each protocol requires a combination of header filtering and payload searching (a conceptual sketch follows below)
- 2x dual-core 3 GHz Xeon: ~3.5 Gb/s of live traffic with zero-loss monitoring (=> 4x quad-core: ~14 Gb/s)
- Example application: ABW, 3.6 Gb/s
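
As a rough illustration of the header-filter-plus-payload-search pattern for one protocol (this is not trackflib; the HTTP signatures below are a simplified assumption):

    /* Conceptual protocol classifier: a header filter narrows the candidates,
     * a payload search confirms the application protocol. Not trackflib. */
    #include <stdint.h>
    #include <string.h>

    /* step 1: header filter – only consider TCP segments to or from port 80 */
    static int header_matches_http(uint16_t src_port, uint16_t dst_port)
    {
        return src_port == 80 || dst_port == 80;
    }

    /* step 2: payload search – look for an HTTP method or status line at the start of the payload */
    static int payload_looks_like_http(const uint8_t *payload, size_t len)
    {
        static const char *const sigs[] = { "GET ", "POST ", "HEAD ", "HTTP/1." };
        for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
            size_t slen = strlen(sigs[i]);
            if (len >= slen && memcmp(payload, sigs[i], slen) == 0)
                return 1;
        }
        return 0;
    }

    /* a packet is classified as HTTP only if both steps agree */
    int classify_http(uint16_t src_port, uint16_t dst_port,
                      const uint8_t *payload, size_t len)
    {
        return header_matches_http(src_port, dst_port)
            && payload_looks_like_http(payload, len);
    }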

14 Many-core processing
Tilera – TILExpress-64 and TILExpress-20G cards: 64 cores, 1 or 2 XAUI connectors (Infiniband-style).
Other many-core cards exist, but without a high-speed network interface (e.g., the 128-core NVIDIA Tesla C870 GPU processing board).

15 Distribution into multiple cores
1. In hardware: some monitoring cards have firmware that copies packets into multiple memory buffers based on user-defined load balancing (DSM – Data Stream Management in DAG cards, but more than two buffers are available only in NinjaBoxes).
2. In software: one core runs a packet scheduler that creates virtual buffers (packets are not copied) without splitting flows; the other cores serve the virtual buffers. Still in development; a sketch of the idea follows below.
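
A minimal sketch of that software scheme, under my own assumptions about the data structures (the real implementation was still in development at the time of the talk): the scheduler hashes each packet's flow key so that a flow always lands in the same virtual buffer, and only a descriptor (pointer and length) is enqueued, never a copy of the packet.

    /* Sketch of software distribution to cores: one scheduler core fills per-worker
     * "virtual buffers" with packet descriptors; flows are kept together by hashing
     * the flow key. Data structures are assumptions, not the talk's implementation,
     * and a real single-producer/single-consumer ring needs proper memory barriers. */
    #include <stddef.h>
    #include <stdint.h>

    #define NWORKERS 4
    #define RING_SZ  4096

    struct pkt_desc {                  /* descriptor only – the packet stays in the capture buffer */
        const uint8_t *data;
        uint16_t len;
    };

    struct vbuf {                      /* one ring per worker core */
        struct pkt_desc ring[RING_SZ];
        volatile uint32_t head, tail;
    };

    static struct vbuf vbufs[NWORKERS];

    /* keep each flow on one core: hash the addresses and ports */
    static unsigned pick_worker(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
    {
        uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);
        h ^= h >> 16;
        return h % NWORKERS;
    }

    /* scheduler core: enqueue a descriptor, drop the packet if the worker's ring is full */
    int schedule_packet(const uint8_t *data, uint16_t len,
                        uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
    {
        struct vbuf *v = &vbufs[pick_worker(saddr, daddr, sport, dport)];
        uint32_t next = (v->head + 1) % RING_SZ;
        if (next == v->tail)
            return -1;                 /* virtual buffer full */
        v->ring[v->head] = (struct pkt_desc){ data, len };
        v->head = next;
        return 0;
    }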

16 Conclusion
Complex zero-loss processing of a 10 Gb/s packet stream is possible in a modern PC when two conditions are satisfied:
- Packets are copied from the network to the PC's memory efficiently (the CPU must not be loaded by this task); this is currently not possible with NICs, but it is possible with monitoring cards.
- Packets are distributed among multiple cores.

17 Thank you for your attention Questions? ubik@cesnet.cz

