1 Understanding Venus Performance (Tentative Update 2003-11-04)
Shih-Hao Hung
Performance and Availability Eng.
Sun Microsystems Inc.
Sun Confidential/Proprietary – Internal Use Only

2 Overview
Venus is a PCI card that provides the following functions for Solaris/SPARC platforms:
- High-performance Gigabit Ethernet interface
- High-speed cryptographic engine
  - SSL acceleration
  - IPsec acceleration
Our goals are to make sure that:
- Software (apps, OS, and drivers) utilizes Venus as efficiently as possible.
- Venus performs well under mixed workloads.
11/20/2018 Sun Confidential/Proprietary - Internal Use Only

3 Venus Gigabit Ethernet
Venus uses the Cassini chip, which is also used by other Sun Gigabit Ethernet cards such as BGCC, Kuheen, etc.
One major difference between Venus and other Cassini-based cards: Venus can interrupt only ONE host processor, due to a limitation of the Intel bridge chip on the Venus card.

4 Venus Gigabit Ethernet TCP Performance Measurement
Netperf Test                                    Uni-directional Throughput
Gigabit Ethernet TCP 1-connection RX            620 Mbps
Gigabit Ethernet TCP 1-connection TX            804 Mbps
Gigabit Ethernet TCP Multi-connection RX        945 Mbps
Gigabit Ethernet TCP Multi-connection TX        841 Mbps

The peak throughput of Venus is on par with the other Cassini-based cards (without MDT).

5 New Gigabit Ethernet Features Proposed for Venus 1.1
- Hardware Checksumming: Should help reduce CPU consumption, but has a bug at this moment ( ).
- Jumbo Frames (JF)*: Data show that jumbo frames improve IPsec acceleration by ~3X with SysKonnect cards. Support for jumbo frames may be put in Venus 1.1.
- Multi-Data Transfer (MDT)*: MDT is already in Solaris 9 Update 4 for Cassini, saving CPU cycles and improving efficiency (i.e., Mbps/MHz ratio) by up to 60%. Data show a 5-10% performance gain on SPECweb99 and Netperf with an MDT-enabled Cassini card and driver.
* JF and MDT may not be supported in 1.1.

6 Venus Crypto Engine
Venus has two Broadcom 5821 crypto chips. It is possible for Venus hardware to offload the following crypto operations:
- Public-key ops: RSA (512-bit, 1024-bit, 2048-bit) and DSA
- Bulk encryption ops: RC4*, DES, 3DES
- Hash ops: SHA1 and MD5
* RC4 support is disabled in the Venus 1.0 driver.
Software crypto is available for fail-over and small tasks.

7 Our Performance Data
We have conducted performance testing on the following platforms:
- 2-way 900 MHz Sun Fire 280R
- 8-way 900 MHz Sun Fire V880
- 12-way 1200 MHz Sun Fire 6800
The per-CPU numbers presented are based on the 900 MHz UltraSPARC III Cu processor.

8 The Venus Crypto Engine

9 Venus Crypto Hardware Performance Measurement
Performance Metric                              Measured     Comments
RSA performance (1024-bit) per CPU              4500 ops/s   Need 20% CPU
RSA performance (1024-bit) per Card
SHA1 performance per CPU (16KB chunks)          1215 Mbps
SHA1 performance per CPU (1.5KB chunks)         198 Mbps
3DES performance per CPU (16KB chunks)          686 Mbps
3DES performance per CPU (1.5KB chunks)         190 Mbps
SHA1 performance per Card (16KB chunks)         1373 Mbps
SHA1 performance per Card (1.5KB chunks)        413 Mbps     Improved 50% in 1.1
3DES performance per Card (16KB chunks)         887 Mbps
3DES performance per Card (1.5KB chunks)        382 Mbps
SHA1+3DES performance per Card (16KB chunks)    540 Mbps
MD5 performance per Card (16KB chunks)          1411 Mbps
MD5 performance per Card (1.5KB chunks)         411 Mbps

10 Venus Crypto Software Performance Measurement
Performance Metric                              Measured     Comments
RSA performance (1024-bit) per CPU              240 ops/s    Much better than stock OpenSSL (120
RC4 performance per CPU (16KB chunks)           877 Mbps     All tests stress CPU to 0% idle
3DES performance per CPU (16KB chunks)          72 Mbps
3DES performance per CPU (1.5KB chunks)         70 Mbps
SHA1 performance per CPU (16KB chunks)          428 Mbps
SHA1 performance per CPU (1.5KB chunks)         392 Mbps
MD5 performance per CPU (16KB chunks)           682 Mbps
MD5 performance per CPU (1.5KB chunks)          623 Mbps
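The per-CPU figures on slides 9 and 10 can be put side by side to see where offloading pays off. The sketch below is only arithmetic on the numbers quoted in those two tables; the dictionary names and the `offload_ratios` helper are made up for illustration, and it assumes the hardware and software rows were measured under comparable conditions, which the slides do not state (the hardware RSA row, for instance, is noted as needing only 20% CPU).

```python
# Per-CPU figures copied from the hardware (slide 9) and software (slide 10) tables.
# Units: ops/s for RSA, Mbps for everything else.
HW_PER_CPU = {
    "RSA 1024-bit": 4500,
    "SHA1 16KB": 1215,
    "SHA1 1.5KB": 198,
    "3DES 16KB": 686,
    "3DES 1.5KB": 190,
}
SW_PER_CPU = {
    "RSA 1024-bit": 240,
    "SHA1 16KB": 428,
    "SHA1 1.5KB": 392,
    "3DES 16KB": 72,
    "3DES 1.5KB": 70,
}


def offload_ratios(hw, sw):
    """Return hardware/software throughput ratios for metrics present in both tables."""
    return {name: hw[name] / sw[name] for name in hw if name in sw}


ratios = offload_ratios(HW_PER_CPU, SW_PER_CPU)
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    verdict = "hardware ahead" if ratio > 1 else "software ahead"
    print(f"{name:14s} {ratio:6.2f}x  ({verdict})")
```

On these numbers RSA and large-chunk 3DES gain the most from hardware, while SHA1 on 1.5KB chunks comes out slower in hardware than in software.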

11 The Venus SSL Performance

12 Venus SSL Support (Tentative, check when spec. is final)
[Table: cipher support by web server. Columns: Cipher; Sun One Web Server (NSS), Software and Hardware; Apache Web Server (OpenSSL), Software and Hardware. Rows: RSA, RC4, 3DES, SHA1, MD5. Surviving cell text, in row order: RSA: Yes, default; RC4: No, No; 3DES: Disabled]

13 Venus SSL Performance
Performance Metric, with Target / Measured / Comments where given:

SSL Transaction Rate per Card
    Target: 6000 HTTPS/s
    Measured: 4300 HTTPS/s (BugID , Firmware Issue)
SSL Transaction Rate per CPU
    Target: n/a
    Measured: 350 HTTPS/s (S1WS 6.0 SP5, on par w/Deimos)
    Measured: 425 HTTPS/s (Apache , on par w/Deimos)
SSL 3DES Bulk Encryption (SW) per CPU
    28 Mbps (Apache)
SSL 3DES Bulk Encryption (HW) per CPU
    100.5 Mbps
    19.2 Mbps (S1WS 6.0 SP5)
    77.6 Mbps (S1WS 6.0 SP5*)
SSL 3DES Bulk Encryption (HW) per Card
    Target: 900 Mbps?
    Measured: 835 Mbps
SSL 3DES+SHA1 Bulk Encr. (HW) per Card
    528 Mbps (S1WS 6.0 SP5*)

* HW bulk encryption support is disabled by default for S1WS for Venus 1.0.

14 Venus SSL Additional Performance Issues
- Enabling HW bulk encryption support causes extra overhead for key management operations:
  - SSL handshake performance is reduced by 33% (BugID ).
  - Short-term fix: disable HW bulk encryption by default; offer a mechanism for users to enable the support.
  - RFE# : Should find a way to reduce the key management overhead.
  - Update ( ): The gap has been shrunk to ~14% with the latest Venus 1.1 software.
- Enabling HW bulk encryption support may limit the SSL throughput:
  - Affects mostly large systems.
  - Customers may choose to disable the support, or buy additional cards.

15 The Venus IPsec Performance

16 Solaris IPsec Performance Issues with 3DES
The stock Solaris 9 (Update 3) IPsec-3DES is slow and does not scale:
- 3DES code is not optimized.
- 3DES jobs are done synchronously.
- Packets are processed sequentially.
- 28 Mbps on a 2-way 900 MHz E280R; only one CPU is utilized.

17 Venus IPsec Design Considerations
- Accelerates DES/3DES encryption/decryption via:
  - Asynchronous processing by the KCL2 job scheduler
  - Performance-optimized software crypto
  - Hardware offloading engine
- Must process jobs at Ethernet packet size, 1460 bytes, which is much smaller than the SSL chunk size.
  - A big constraint for hardware offloading, and a big issue for IPsec acceleration compared to SSL acceleration.
- Impacted by hardware offloading overhead:
  - Packets < 512 bytes are not offloaded; the overhead is too costly.
  - Lightweight algorithms such as MD5 and SHA1 benefit less from hardware offloading.

18 Venus IPsec Implementations
Venus accelerates IPsec in one of the following two forms:
- Out-of-band:
  - Packets are sent to Venus crypto for encryption, and then sent to any NIC for transmission.
  - Packets are received from any NIC, and then sent to Venus crypto for decryption.
- In-band (pending Solaris 9 Update 5):
  - Packets are sent to Venus crypto for encryption and transmitted via the Venus NIC in one trip.
  - Packets are received by the Venus NIC and decrypted by Venus crypto before entering the host.
The in-band implementation will best reflect the strength of Venus, but it requires significant changes to the network stack.

19 Venus Out-of-Band IPsec
Venus out-of-band IPsec requires minor changes to an existing system:
- New modules replace the encrdes/encr3des modules.
- For pkt < 512 bytes, swcrypto handles 3DES.
- For pkt >= 512 bytes, KCL handles 3DES:
  - KCL sends jobs to vca for hardware offload when a Venus card is available.
  - KCL sends jobs to its software crypto when hardware offloading is not available.
[Diagram: ipsecesp -> Venus encr3des; pkt < 512 -> swcrypto (software 3DES); pkt >= 512 -> KCL; KCL with no hardware -> software 3DES; KCL with hardware ok -> vca -> hardware 3DES]
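The dispatch rule on this slide can be sketched as a small decision function. This is an illustration of the routing described above, not the actual kernel module code: the function name, the return labels, and the `hardware_available` flag are hypothetical, and only the 512-byte threshold and the swcrypto/KCL/vca roles come from the slide.

```python
# Hypothetical sketch of the out-of-band 3DES dispatch rule described on the slide.
# Names and return labels are illustrative; only the 512-byte threshold and the
# swcrypto / KCL / vca roles are taken from the slide text.
OFFLOAD_THRESHOLD_BYTES = 512


def choose_3des_path(pkt_len, hardware_available, threshold=OFFLOAD_THRESHOLD_BYTES):
    """Return which component would handle 3DES for a packet of pkt_len bytes."""
    if pkt_len < threshold:
        # Small packets: offload overhead is too costly, so stay in swcrypto.
        return "swcrypto"
    if hardware_available:
        # KCL hands the job to the vca driver for hardware offload.
        return "KCL -> vca (hardware 3DES)"
    # KCL falls back to its own software crypto when no card is usable.
    return "KCL -> software 3DES"


for length, hw in [(64, True), (512, True), (1460, True), (1460, False)]:
    print(f"{length:5d} bytes, hardware={hw}: {choose_3des_path(length, hw)}")
```

Passing the threshold as a parameter mirrors the fact that the cut-off is tunable (the latency slide mentions Encr3DesTuning with a default of 512).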

20 Venus IPsec Performance Benefits
- Accelerates IPsec-3DES throughput:
  - To 105 Mbps on a 1-way 900 MHz E280R.
  - That is 375% of the stock S9u3 throughput (3.75x).
- Improves throughput scalability:
  - Asynchronous crypto processing scales throughput to 210 Mbps on an 8-way 900 MHz V880.
  - That is 750% of the stock S9u3 throughput (7.5x).
- Reduces IPsec latency:
  - Asynchronous crypto processing improves parallelism and hence reduces the latency in 3DES encryption/decryption.
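The percentages on this slide appear to be plain ratios against the 28 Mbps stock Solaris 9 Update 3 figure quoted on the earlier IPsec slide. A minimal check, assuming that baseline applies to both lines (the `speedup` helper is just for illustration):

```python
# Baseline and accelerated IPsec-3DES throughput figures quoted on the slides (Mbps).
STOCK_S9U3_MBPS = 28
VENUS_1WAY_E280R_MBPS = 105
VENUS_8WAY_V880_MBPS = 210


def speedup(accelerated, baseline):
    """Throughput ratio of the accelerated path over the baseline."""
    return accelerated / baseline


for label, mbps in [("1-way E280R", VENUS_1WAY_E280R_MBPS),
                    ("8-way V880", VENUS_8WAY_V880_MBPS)]:
    ratio = speedup(mbps, STOCK_S9U3_MBPS)
    print(f"{label}: {ratio:.2f}x ({ratio * 100:.0f}% of stock throughput)")
```

Read this way, "375%" and "750%" are multiples of the baseline (3.75x and 7.5x), not percentage increases over it.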

21 Venus IPsec TCP Unidirectional RX Throughput
Performance Metric                              Measured     Comments
IPsec 3DES (Stock Solaris Software) per CPU     28 Mbps      Stock Solaris 3DES implementation is not good
IPsec 3DES (Venus Software) per CPU             54 Mbps      Venus 3DES software implementation is better
IPsec 3DES (Venus HW out-of-band) per CPU       105 Mbps     Significant overhead to drive Venus hardware
IPsec 3DES (Venus HW out-of-band) per Card      260 Mbps     IP does not scale beyond 4 processors
IPsec 3DES (Venus HW in-band)                   N/A          Pending Solaris 9 Update 5

- Per-CPU numbers measured on a 900 MHz E280R. Per-Card number measured on a 12-way 1.2 GHz SF6800.
- The TX or bi-directional throughput is similar to RX, but is ~15-20% slower.
- The on-going FireEngine project may be able to address the IP scaling issue by making IP MT-hot.

22 IPsec Latency
IPsec adds substantial latency, and thus mostly affects:
- Applications that demand low network latency.
- The transaction rate of single-threaded applications.
Venus reduces IPsec latency via fast and asynchronous crypto processing.
- The graph shows latency reduction by Venus software and hardware.
- Tuning can be applied through Encr3DesTuning and by unloading the vca module to minimize latency for specific apps.
Note: Encr3DesTuning is set to 256 in this set of data. Default is 512.

23 Jumbo Frames and Venus IPsec Acceleration
Venus IPsec acceleration is sensitive to packet size:
- Significant overhead for regular Ethernet packets (MTU=1500).
- Overhead reduced for bigger MTU (jumbo frames).
- Performance data measured with a SysKonnect 9821 Ethernet card and Venus out-of-band IPsec acceleration show a ~3X throughput improvement with jumbo frames.

2-way SF280R        MTU (bytes)   Throughput (Mbps)
Regular Frames      1500          145
Jumbo Frames        9000          393

8-way V880          MTU (bytes)   Throughput (Mbps)
Regular Frames      1500          210
Jumbo Frames        9000          600
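The "~3X" figure can be checked against the two tables on this slide. The sketch below only divides the jumbo-frame throughput by the regular-frame throughput for each platform; the data structure and the helper name are illustrative, not part of any Venus tooling.

```python
# Throughput (Mbps) from the jumbo-frame tables: (MTU 1500, MTU 9000) per platform.
JUMBO_FRAME_DATA = {
    "2-way SF280R": (145, 393),
    "8-way V880": (210, 600),
}


def jumbo_gain(regular_mbps, jumbo_mbps):
    """Ratio of jumbo-frame throughput to regular-frame throughput."""
    return jumbo_mbps / regular_mbps


gains = {name: jumbo_gain(*pair) for name, pair in JUMBO_FRAME_DATA.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}x with MTU 9000 vs MTU 1500")
```

Both platforms land a little under 3X (about 2.7X and 2.9X), which is consistent with the rounded claim.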

24 The Venus Performance under Mixed Workload

25 Venus Performance Under Mixed Workload
Possible scenarios:
- Mixed non-IPsec and IPsec traffic
- Mixed non-IPsec and SSL traffic
Would NIC operations interfere with crypto operations?
- Yes, because both the NIC and the crypto chips share one interrupt line.
- The NIC can generate interrupts much more rapidly than the crypto chips typically do.
- BugID:

26 Venus Performance Under Mixed Workload (cont.)
- Crypto performance suffers when network traffic is high:
  - 30% to 90% 3DES performance degradation (hurts IPsec)
  - 50% to 80% RSA performance degradation (hurts SSL)
- The ideal (long-term) fix would be to have separate interrupt lines for crypto and NIC.
- A workaround is available:
  - Use rx-intr-pkts and rx-intr-time to limit the interrupt rate from the NIC.
  - However, it reduces NIC performance by up to 30%.
- Still working on bug fixes in 1.1 ( ).
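To see why capping the NIC interrupt rate helps on a shared interrupt line, here is a toy model of receive interrupt coalescing. It assumes, as a simplification that may not match the Cassini driver exactly, that an interrupt is raised once a packet-count limit is reached or once a time limit has passed since the first pending packet. The function and its parameters are hypothetical stand-ins for what rx-intr-pkts and rx-intr-time control, and the time unit here is microseconds rather than the driver's own tick unit.

```python
def count_interrupts(arrivals_us, max_pkts, max_wait_us):
    """Count interrupts for sorted packet arrival times (microseconds).

    Simplified model: an interrupt fires when max_pkts packets are pending,
    or when max_wait_us has elapsed since the first pending packet arrived.
    The timer is only checked on packet arrival, which is enough for counting.
    """
    interrupts = 0
    pending = 0
    first_pending = 0
    for t in arrivals_us:
        if pending and t - first_pending >= max_wait_us:
            interrupts += 1  # timer expired before this packet arrived
            pending = 0
        if pending == 0:
            first_pending = t
        pending += 1
        if pending >= max_pkts:
            interrupts += 1  # packet-count limit reached
            pending = 0
    if pending:
        interrupts += 1  # flush whatever is still pending at the end
    return interrupts


# 1000 packets arriving every 10 microseconds (about 100k packets/s).
arrivals = list(range(0, 10_000, 10))
print("interrupt per packet:", count_interrupts(arrivals, max_pkts=1, max_wait_us=1000))
print("coalesce 8 packets:  ", count_interrupts(arrivals, max_pkts=8, max_wait_us=1000))
print("35 us timer only:    ", count_interrupts(arrivals, max_pkts=1000, max_wait_us=35))
```

In this model, raising either limit cuts the number of NIC interrupts competing with the crypto chips, at the cost of packets waiting longer before they are serviced, which fits the NIC performance penalty noted above.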

27 Summary


29 Extra Materials for Technical Discussions

30 IPsec TCP_RR Latency
[Table: Netperf TCP_RR transaction latency (sec) by message size (1, 16, 64, 128, 256, 512, 1024, 1460, 4096, 8192, 16384). Column labels: ClearText, IPsec SW-SW, SW-HW, HW-HW, Sol9_ipsec, Improved]

31 Netperf TCP_RR Latency

