Presentation on theme: "IP Platforms Best Practices for Performance"— Presentation transcript:
1 IP Platforms Best Practices for Performance
Pierre Lamy, Technical Lead, Ottawa TAC
April 2010
2 Intro and revision history
This document describes methods and techniques that users can apply on various Check Point IP Security Appliances to achieve optimal performance.
- Version 1.0, October 2009: Word format
- Version 1.1, January 2010: Word + PPT, minor revisions
- Version 1.2, April 2010: Updates
3 General Performance Best Practices
These guidelines are Appliance independent and do not require any special tuning.
- Always use the latest versions of Check Point products, and always upgrade to the most recent HFA (HotFix Accumulator) for a given version.
- Create a small block of rules near the top of your rulebase containing the most heavily used rules. These rules should be fully accelerated with SecureXL.
- Keep the rulebase simple and small. Reduce the number of rules by combining similar rules. Rules which disable SecureXL acceleration should be placed very low in the rulebase.
- If VPN (encryption) is not used on a module, make sure the VPN product is disabled on that module.
4 General Performance Best Practices
- Do not use QoS from IPSO or Floodgate.
- Avoid using Domain objects; each DNS lookup takes additional CPU cycles.
- Avoid using UFP (URI Filtering Protocol), as it is resource intensive.
- Use networks instead of address ranges for Network Address Translation.
- Keep logging to a minimum. Only business-critical rules whose logs will actually be analyzed should have logging enabled. Drop rules, Accept rules, Stealth rules, Cleanup rules and Implied rules should not log unless there is a clear business case and the customer intends to analyze the logs on a regular basis. Otherwise, logging should only be used for debug purposes.
- IP Cluster members should have exactly the same package lists; dissimilar packages can cause state sync issues, resulting in reduced performance.
5 General Performance Best Practices
When you install an ADP interface module in an appliance, the network processor on the card performs all VPN encryption and decryption, even for VPN packets that ingress or egress through non-ADP interfaces. The built-in Nokia encryption accelerator continues to accelerate IKE traffic but does not perform any other processing. If VPN traffic ingresses or egresses through a non-ADP interface, throughput suffers because the packets must transit the appliance's backplane to reach the network processor in the ADP module. Configure VPNs to use only ADP interfaces to avoid this performance loss.
6 General Performance Best Practices
- Uniprocessor systems (IP152, IP292, IP395, IP565, IP1265) should use IPSO 4.2 (correct as of April 2010), while multiprocessor systems (IP695, IP1285, IP2455) should use IPSO 6.2. Always use the latest build of any major release.
- Multiprocessor systems should have Check Point R70 installed to take full advantage of CoreXL technology; sk40465 has more details.
- Use R70 + IPSO 6.2 on uniprocessor systems where there is a need for a specific feature.
- Do not persist in using IPSO 4.2 or R65 once support is no longer offered for these products.
7 General Performance Best Practices
- Interface flow control can reduce network throughput on busy interfaces; we do not suggest enabling it.
- Avoid using SmartView Monitor to constantly monitor system performance or collect historical data, as SmartView Monitor itself has an impact on performance.
- Avoid using custom scripts on systems which have performance issues, as the scripts themselves consume CPU resources.
8 General Performance Best Practices
A CST from the IPSO system, as well as a cpinfo from the management station, are CRITICAL to provide to Check Point Support when opening a case for assistance in troubleshooting performance issues. Without at least those files, Check Point Support will be unable to assist the customer. For systems with extremely high CPU, or where running CST may cause problems, it is recommended to run it as "nice +20 cst". The command may take hours to complete but will not divert critical system resources from processing traffic in a live environment.
Any cpinfo provided to Check Point Support MUST be generated using the latest cpinfo tool downloaded from the Support site. This requires uninstalling the old cpinfo and installing the latest one. Providing old cpinfo output to Check Point Support will delay the Support response.
9 General Performance Best Practices
- Avoid Standalone installations. Separate the management station from the enforcement point by running the management station on another system.
- Use the default settings in the capacity optimization tab of the enforcement point's properties, changing only the total connections number.
- General recommendations for these platforms are to use the onboard quad ports for:
  - Security Gateway state synchronization traffic
  - Cluster protocol network traffic
  - Policy and Appliance management traffic
  - A path from the enforcement point to the Check Point log server
- SecureXL options should be matched between SecureXL (IPSO) and the Security Gateway settings; for example, Sequence Validation and Delayed Notifications.
10 System Resources
High performance relies on the availability of key system resources: CPU, memory, and network interface bandwidth. Tuning involves making better use of the current hardware, not simply upgrading it.
11 System Resources - CPU
Small packet size traffic: The amount of traffic any network device can process is determined not by byte throughput, but by packets per second. A small packet uses as many resources as a full-size Ethernet packet, because CPU utilization is incurred per packet rather than per byte. A system processing a large number of small packets therefore works as hard as a system processing the same number of large packets, even though the apparent byte throughput of the two systems may differ enormously.
State synchronization is particularly CPU intensive because the nodes in a cluster synchronize every connection they encounter. This ensures high availability but causes high CPU usage. Consider this especially when deciding whether to synchronize short-lived connection types such as DNS on a VRRP pair: the VRRP failover time exceeds the DNS timeout, so it would not be advisable to sync DNS connections in that scenario.
12 System Resources - CPU
- A high number of logging rules affects CPU. Logging uses CPU cycles and is discouraged where there is no need for it.
- The Active Log feature in SmartView Tracker will severely compromise the ability of an enforcement point to process traffic and, for performance reasons, should not be used.
- Accounting logging: By default, accounting logging produces two kinds of log tracking (one in the Log Viewer and one in the Account Log View) for the same connection. Alerts as a tracking option also have a significant negative impact on performance.
13 System Resources - CPU
- Any configuration which disables SecureXL or forces traffic into the slowpath affects CPU. This includes rule configurations, SmartDefense protections, and Floodgate.
- NAT traffic incurs slightly more CPU impact than non-NATed traffic. NAT in an ADP environment is strongly discouraged, as connection-rate acceleration does not work on NAT traffic.
- Traffic which is not connection-rate accelerated uses more CPU resources than traffic which is. Connection establishment and teardown uses relatively many CPU cycles, even when all SecureXL acceleration is in use.
14 System Resources - Memory
Check Point recommends upgrading any Appliance's memory to its full capacity to improve performance. The main factors that demand high RAM usage are:
- Concurrent connections
- Concurrent VPN tunnels
- NAT connections
- Security Servers
Use the Web User Interface to determine how much memory the Appliance has installed: in the navigation tree, select Monitor --> System Utilization --> CPU-Memory Live Utilization and look for the "Total Real Memory" value. On recent IPSO versions, the top command can also show this information on the command line.
The amount of memory allocated to the Check Point Security Gateway for processing network traffic is determined under the capacity optimization tab of the gateway object properties. The Automatic memory values should always be used; manual memory allocation should NEVER be used unless directed by Check Point Support. The connections value, however, should always be set to the maximum supported by the platform and memory configuration, as detailed in the IPSO release notes. There is no drawback to doing this, but leaving the low defaults in place will result in insufficient memory on busy systems.
15 System Resources - Memory
Two factors can reduce the number of Security Gateway connections that can be supported:
- Concurrent VPN tunnels depend on the amount of memory available in the Appliance. As you add more VPN tunnels, the number of Security Gateway connections the Appliance can support decreases.
- Security Servers reduce the maximum number of supported connections, as they write to temporary files and use 8 entries in the connections table.
16 System Resources - Network Interface Bandwidth
When Security Gateway performance reaches the limit of the interface bandwidth while the CPU is still not fully utilized, the bottleneck is the interface. One option to increase performance in this case is to use more ports via Link Aggregation.
The limit of a network interface is determined by the number of packets per second it can process. Assuming 1518-byte frames, a 100 Mbit NIC port can sustain ~8234 packets per second (pps) in each direction (full duplex), and a 1 Gbit NIC port can sustain ~82,340 pps in each direction (full duplex). While a network port may sustain much higher pps than these figures, and this is often seen in the field and in QA, there is no guarantee that it WILL support more than these standard pps numbers for a given link speed.
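The pps figures above can be reproduced with simple arithmetic. This sketch uses the slide's simplification of dividing raw link speed by the frame size in bits; real links also spend bits on the Ethernet preamble and inter-frame gap, so actual maximum pps is slightly lower.

```python
# Max packets per second per direction on a full-duplex link,
# using the slide's simplification: link_bps / (frame_bytes * 8).
# Ignores preamble and inter-frame gap overhead.

FULL_SIZE_FRAME = 1518  # full-size Ethernet frame, in bytes

def max_pps(link_bps: int, frame_bytes: int = FULL_SIZE_FRAME) -> int:
    return int(link_bps / (frame_bytes * 8))

print(max_pps(100_000_000))    # 100 Mbit port: ~8234 pps
print(max_pps(1_000_000_000))  # 1 Gbit port: ~82345 pps
```

The same function also shows why small packets matter: with 64-byte frames, the same 100 Mbit port must handle roughly 195,000 pps, over twenty times the per-packet load at the same byte throughput.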
17 System Resources - Network Interface Bandwidth
If the number of packets per second exceeds those numbers, Link Aggregation can combine 2 or more links together, increasing the amount of bandwidth in pps that can be processed on that logical network interface.
IPSO sync should not need more than a 1 Gbit interface, and there can be problems when IPSO sync is run over a Link Aggregation group. The speed of the IPSO and Check Point sync interface should be as fast as the fastest NIC port on the system; in a 10 Gbit environment, 1 Gbit for sync is typically sufficient. State sync should always be on an isolated VLAN or network segment. Note that it is NOT supported to run Check Point state sync over any VLAN interface other than 0 / 1 / untagged. Any VLAN'ing done on the switch access port is fine, but trunking is not supported, as Check Point state sync is not designed to support the extra frame sizes.
Note: Security Gateway state sync link aggregated interfaces should be directly connected using crossover cables, unless you are using 3 or more cluster members. IPSO and Check Point state sync should always be on an isolated VLAN or network segment.
18 IP Clustering
IP Clustering provides both high availability and scalability. It is useful when one system alone cannot provide the desired level of performance. For example, when an Appliance CPU reaches ~30%, it is recommended to add another Appliance to form a two-member cluster that can scale the Security Gateway performance. This is a capacity planning exercise that Check Point Sales engineers can help with; the 30% figure is considered an industry standard indicator that more capacity should be added.
IP Clustering is especially beneficial when using SmartDefense features. With all SmartDefense features enabled, a two-member cluster's HTTP transaction rate is about 40% higher than a Standalone Appliance's.
19 IP Clustering
- Use dedicated interfaces for cluster protocol networks and state synchronization; do not share interfaces with production traffic.
- It is strongly recommended to use separate interfaces for cluster protocol network and Security Gateway synchronization traffic so that they are in separate broadcast domains.
- Use a bandwidth of at least 100 Mbps full duplex for the IPSO sync interface(s); 1 Gbit is recommended.
- Use switches, not hubs, and never use crossover cables for IP Clustering protocol networks.
20 IP Clustering
- Do not use IP Clustering Forwarding mode when performance is a concern. Unicast and Multicast modes provide better performance and lower latency; Forwarding mode is a fallback for when feature-poor network switches are in use.
- If IGMP snooping is in use on the switch, disable it or configure static CAM entries to allow Multicast traffic on the specific ports.
- Use dynamic cluster work assignment for optimum load balancing. This allows the cluster to periodically rebalance the load by moving active connections between nodes.
- Use delayed synchronization if your system processes many short-lived connections, you are in VRRP or Standalone, and SecureXL templates are in use. A 30-second delay in synchronizing connections can boost performance by about 20%. If you use Check Point delayed notifications, you must also enable SecureXL delayed notifications.
21 ADP
Adding ADP will increase the performance of the appliance, with some limitations explained below. The decision to purchase ADP add-on cards should be made in consultation with Check Point Sales. ADP should be considered if the desired performance improvement falls into one of the following categories:
- Packet throughput performance, specifically for small packets.
- Performance improvements for packet streams with mixed packet sizes.
- Encrypted traffic (VPN) forwarding.
- Long-lived connections performance, for example data transfer rates for protocols like FTP and HTTP.
- NAT performance, but only for long-lived NAT connections. (ADP accelerates NAT throughput; connection-rate acceleration is not currently supported for NAT connections. XMC cards should be used instead for high NAT and CPS requirements.)
- Latency for both unencrypted and encrypted traffic.
- Multicast throughput performance.
22 ADP
Performance issues with mixing ADP and non-ADP interfaces: The best performance is obtained by not mixing traffic between ADP and non-ADP interfaces; running in mixed mode has performance impacts. When run in dual mode, with separate ADP traffic flows and separate non-ADP flows, Appliance performance scales better than when no ADP interfaces are used at all.
23 ADP Benefits
Throughput Acceleration: The first packet in a connection is sent up the stack to the Security Gateway, which validates the packet against the defined rulebase. Once the packet is validated, the Security Gateway application tells IPSO, via the SecureXL API, to handle future packets in the same connection. IPSO then instructs the ADP subsystem to create a bi-directional flow for that connection, and all future packets for that connection are processed by the ADP subsystem.
The following protocols benefit from SecureXL and ADP throughput acceleration:
- TCP, UDP, and traffic carried over those protocols
- IPSec VPN acceleration
- Multicast forwarding
- PIM (from IPSO 3.9 for IP2250 and IP2255; from IPSO 4.2 for all platforms)
- GRE and ESP
24 ADP Benefits
Connection Rate Acceleration: The first packet in a connection is validated by the Security Gateway application. Once it is validated, the Security Gateway instructs IPSO to create a template so that IPSO can validate future connections in which only the source port differs. A template consists of the following attributes: SrcAddr, SrcPort, Proto, DestAddr, and DestPort. IPSO compares the first packet of the next connection against its template table. If the packet matches a template, IPSO adds the connection to its table, instructs ADP to create a bi-directional flow for the connection, and lastly informs the Security Gateway about the new connection. All future packets are processed by the ADP module.
The following protocols benefit from SecureXL and ADP connection-rate acceleration:
- Unencrypted TCP, UDP, and traffic carried over those protocols
- Particularly effective on HTTP 1.1 traffic, and even more effective on HTTP 1.0 traffic (HTTP 1.0 opens a separate connection for each HTTP component)
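The template lookup described above can be sketched roughly as follows. This is an illustrative model, not the actual SecureXL implementation: the key point is that the template ignores the source port, so a follow-on connection that differs only in SrcPort skips the full rulebase validation.

```python
# Illustrative model of SecureXL connection-rate acceleration:
# the template keys on everything except the source port.
from typing import NamedTuple

class Conn(NamedTuple):
    src_addr: str
    src_port: int
    proto: str
    dst_addr: str
    dst_port: int

templates = set()   # keys: (src_addr, proto, dst_addr, dst_port)
slowpath_hits = 0   # connections that needed full Security Gateway validation

def handle_new_connection(c: Conn) -> str:
    """Return 'accelerated' if a template matches, else validate and template."""
    global slowpath_hits
    key = (c.src_addr, c.proto, c.dst_addr, c.dst_port)
    if key in templates:
        return "accelerated"   # flow set up without consulting the gateway
    slowpath_hits += 1         # first packet goes up to the Security Gateway
    templates.add(key)         # gateway instructs IPSO to build a template
    return "validated"

first  = handle_new_connection(Conn("10.0.0.1", 32768, "tcp", "192.0.2.10", 80))
second = handle_new_connection(Conn("10.0.0.1", 32769, "tcp", "192.0.2.10", 80))
print(first, second)  # validated accelerated
```

This also shows why HTTP 1.0 benefits most: each page component opens a new connection differing only in source port, so all but the first hit the template.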
25 ADP Best Practices
- Configure traffic to flow in and out of the same ADP subsystem; traversing to another ADP subsystem, or worse, to a non-ADP interface, will negatively impact throughput performance. There is 10 Gbit of full-duplex bandwidth over the crossbar between two ADP subsystems.
- Do not use ports connected to the ADP subsystems for the cluster protocol network or for Security Gateway state synchronization. Use onboard ports for Security Gateway state synchronization; this guarantees that the synchronization data gets its own channel and avoids sync packets being lost. It also prevents the sync data from disrupting the data passing between the ADPs and the main CPU. Likewise, do not log to the management station via the ADP interfaces.
26 ADP Best Practices
- Note that the backplane connecting the ADP subsystems to the main CPU has limited bandwidth; this bottleneck will impact throughput performance when there is a lot of non-accelerated traffic.
- Do not combine non-ADP ports and ADP ports in a link aggregation group or redundancy group.
- Do not include interfaces on different ADP I/O cards in the same link aggregation or redundancy group. IP Security Appliances do not support cross-ADP link aggregation.
- SecureXL is enabled by default. It is critical not to disable SecureXL, because SecureXL is required for ADPs to function.
27 ADP Best Practices
Avoid performing tcpdump or fw monitor on ADP interfaces while the interfaces are under heavy load. Capturing on an ADP interface forces all traffic received or transmitted by the ADP subsystem to be copied and piped to IPSO through the backplane. SecureXL will still be used and the ADP will still accelerate, but the backplane will be choked with data, causing a significant degradation in performance due to constricted backplane capacity.
28 Limitations of ADP
Traffic that is not throughput or connection-rate accelerated will not benefit from ADP acceleration. All limitations of SecureXL apply to ADP.
Transparent mode will accelerate traffic normally; however, a special design consideration is that there must be routes pointing out of the xmode interfaces, as SecureXL depends on caching route table lookups.
29 Limitations of ADP
Enabling the sequence verifier: this solution requires enabling the sequence verifier option on IPSO as well as in SmartDashboard.
This solution was suggested after analyzing a CST, where it was observed that most (60 million out of 80 million) of the TCP connections were being closed with RSTs instead of the usual 3-way handshake for terminating TCP connections. As part of stateful inspection, the Security Gateway must monitor all TCP RSTs if the sequence verifier is not turned ON, because they are categorized as untrusted RSTs. This way of terminating TCP connections causes additional load on the ADP backplane interfaces, where the packet drops were observed.
With the sequence verifier turned ON, ADP performs sequence verification on all TCP connections, thereby validating even the RSTs used to terminate them. Once a TCP RST is validated and accepted by ADP, there is no need to send the packet to the Security Gateway, reducing both the backplane traffic and the overhead of the Security Gateway having to inspect these packets.
30 Limitations of ADP
Once sequence verification is turned ON, you should see a significant reduction in packets going over the backplane to the Security Gateway. This can be monitored with:
  ipsctl -i net:dev:adp:if:stats
with the "r" option to monitor the rate of tcp_rst.
As a result of the reduction in TCP RSTs going over the backplane, you should also observe fewer drops of data packets on the backplane, which can be monitored with:
  ipsctl -i net:dev:bp:msg:stats
with the "r" option to monitor the rate of rx_fc_drops.
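The "r" option computes per-second rates from successive samples of a cumulative counter; the same arithmetic can be done by hand from two snapshots. This sketch is purely illustrative, and the sample counter values are made up:

```python
# Per-second rate from two snapshots of a monotonic counter,
# mirroring what the "r" option computes from successive
# ipsctl reads. Sample values below are illustrative only.

def counter_rate(old_value: int, new_value: int, interval_s: float) -> float:
    """Rate of change per second between two cumulative counter reads."""
    return (new_value - old_value) / interval_s

# e.g. a hypothetical tcp_rst counter read twice, 10 seconds apart
rate = counter_rate(60_000_000, 60_000_500, 10.0)
print(rate)  # 50.0 RSTs per second crossing the backplane
```

Remember that counters like qdrops and tcp_rst are cumulative since boot, so a large absolute value alone says nothing; only the rate between two snapshots indicates a current problem.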
31 Limitations of ADP
Too many control messages queued up on eth1 will result in data loss on the data channels eth2-4. The queue depth for the control channel is tunable; the default is 64 in IPSO 6.2, and potential values are 128 and 256:
  ipsctl -w net:dev:bp:msg:delay_drop_limit 128
You should then see a decrease in the rate of dropped packets on the data channel, which can be monitored with:
  ipsctl -i net:dev:bp:msg:stats
with the "r" option to monitor the rate of rx_fc_drops.
If this solution does not yield the desired result, the delay_drop_limit can easily be set back to its default value of 64. Setting these ipsctl variables takes effect immediately and is non-intrusive.
32 Limitations of ADP
Turning off the delay_drop variable: this solution changes the default value of delay_drop, an ipsctl tunable in IPSO. IPSO proactively drops data packets when the control channel is congested. This option can be turned off completely with:
  ipsctl -w net:dev:bp:msg:delay_drop 0
The option of dropping data packets when the control channel is congested was developed under certain performance benchmarking conditions, where the box is tested to its limits and aggressive load conditions persist for an extended period of time. This is why the delay_drop_limit defaults to an aggressive 64. Unfortunately, this condition also comes into effect when there is transient congestion on the control channel; by turning the feature off, data packets are not dropped prematurely.
The current and maximum congestion levels on the control channel can be monitored with:
  ipsctl -i net:dev:bp:msg:stats
with the "r" option to monitor the rate of bms_scheds and bms_scheds_max.
If this solution does not yield the expected result, you can revert to the default behavior immediately with:
  ipsctl -w net:dev:bp:msg:delay_drop 1
33 Limitations of ADP
The PSL acceleration feature should be enabled on multicore systems using SecureXL and CoreXL, with or without ADP. The ipsctl tunable can be found with:
  ipsctl -a net:sxl
PSL acceleration allows full acceleration of all but the last packet containing the application-level Protocol Data Unit. IPS / SmartDefense then makes the go/no-go decision to drop denied connections.
34 Security Gateway Performance Tuning - NAT
IP Appliances do not support Network Address Translation (NAT) connection acceleration. The first packets of the first connection on a given service are forwarded to the Security Gateway application; a "template" of that connection is then created so that subsequent TCP establishments on the same service, where only the source port differs, will be accelerated by SecureXL. NAT connection setup and teardown cannot be accelerated because NAT templates are not supported.
While each connection uses two entries in the flows table, connections involving NAT use four entries. NAT connections therefore use more CPU and memory resources than ordinary connections.
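The entry accounting above lends itself to a quick capacity estimate. This sketch simply applies the slide's figures (2 flow entries per plain connection, 4 per NATed connection); the traffic mix in the example is hypothetical.

```python
# Rough flows-table sizing using the slide's figures:
# plain connections consume 2 entries, NATed connections 4.

PLAIN_ENTRIES = 2
NAT_ENTRIES = 4

def flow_entries(plain_conns: int, nat_conns: int) -> int:
    """Total flows-table entries for a given connection mix."""
    return plain_conns * PLAIN_ENTRIES + nat_conns * NAT_ENTRIES

# Hypothetical mix: 100k plain + 50k NATed concurrent connections
print(flow_entries(100_000, 50_000))  # 400000 entries
```

The same mix with no NAT would need only 300,000 entries, which is one way to quantify the memory cost of NATing a portion of the traffic.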
35 Security Gateway Performance Tuning - Rulebase Size
Although there is no limit to the number of rules in a Security Gateway database, there is a performance impact as the number of rules grows: the more rules an Appliance has, the more they cost in compilation time and runtime efficiency. Rulebase size affects connection-rate performance, and rulebase order can also affect performance. Use the following guidelines for organizing the rulebase:
- Keep the rulebase as simple as possible. With fewer rules, the rulebase is more efficient and less error prone.
- When creating a rule, be specific: narrow down the source, destination, and service, and avoid using "Any" in the service field.
- Place the most active NAT rules at the top of the NAT rulebase.
- Define Group objects for networks; this allows the policy compiler to treat the networks as a superset in the actual rule, for a performance gain.
- Configure anti-spoofing for all the Security Gateway interfaces.
- Avoid using "negate" in the rulebase (for example, a network exclusion).
36 Performance Troubleshooting
Follow these guidelines to troubleshoot performance issues, and use the best practices outlined above to optimize overall Appliance performance.
- Do NOT use fw monitor for performance troubleshooting. Capturing via a span port on a switch is preferred to tcpdump.
- Traffic captures sent to Check Point Support should not exceed 80 MB uncompressed. The customer should reference the particular lines within a packet capture as needed.
- Check free disk space using the "df -k" command line tool.
37 Performance Troubleshooting
Check current CPU statistics using the "vmstat 1" tool. The last 3 columns are the significant ones; customers should not normally concern themselves with the other columns. The very last column is CPU idle time: the percentage of free CPU cycles since the last vmstat iteration. The second-to-last column is system CPU usage in percent, which includes the IPSO and Check Point kernels as well as interrupts. The third-from-last column is user CPU utilization, which is usually due to policy installation, SmartDefense, Security Servers, and user scripts or commands.
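When post-processing collected vmstat output, the three CPU columns can be pulled out of each data line as shown below. The sample line is illustrative only; column layout varies between vmstat builds, so treat this as a sketch:

```python
# Extract the last three columns (user, system, idle CPU %) from a
# vmstat data line. Column layout differs across vmstat builds;
# the sample line below is illustrative, not from a real system.

def cpu_columns(vmstat_line: str):
    """Return (user, system, idle) percentages from one vmstat data line."""
    fields = vmstat_line.split()
    user, system, idle = (int(f) for f in fields[-3:])
    return user, system, idle

sample = " 1 0 0  81664  9172  0 0 0 0 0 0  523  14  35  51"
print(cpu_columns(sample))  # (14, 35, 51)
```

A line where user + system stays near 100 and idle near 0 across many iterations is the pattern that warrants the SecureXL tuning described in the following slides.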
38 Performance Troubleshooting
The "top" command line utility is available on recent IPSO versions. It provides per-process CPU and memory utilization, global statistics, and more granular CPU statistics such as the percentage of interrupts. The interrupt percentage includes both software and hardware interrupts; hardware interrupts are virtually never the cause of performance issues, which are virtually always caused by software interrupts.
If it can be shown that the performance problem is due to a high ratio of interrupts compared to overall CPU utilization, the first fix is always to properly tune SecureXL.
39 Performance Troubleshooting
SecureXL acceleration statistics can be verified using the following commands:
  fwaccel stat
  fwaccel stats
  fwaccel stats -s
  fwaccel templates -s
These commands provide a good overview of how much SecureXL is in use. The SecureXL and Nokia IPSO Guide (http://downloads.checkpoint.com/dc/download.htm?ID=10036) should be used to help tune the rulebase and ensure that as many connections and packets as possible are accelerated.
40 Performance Troubleshooting
If dropped packets are a concern, a snapshot can be taken in IPSO by running "ipsctl -a ifphys | grep qdrop". This indicates which interfaces are dropping traffic. Note that these counters are cumulative since system boot time. For more information about qdrops, consult sk39462.
To view realtime statistics for a particular counter, run ipsctl -i <full-counter-path> and use the "r" command to toggle the per-second rate counter refresh.
Any qdrop which is logged will have a corresponding reason code incremented in:
  ifphys:<interface_name>:errors
  ifphys:<interface_name>:stats
Common drops are rx_mpc, the Receive Missed Packet Count, and symerrs, Symbol Errors.
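When collecting the grep output from several systems, it can help to script the filtering of nonzero qdrop counters. The "path = value" line format assumed here is a simplification; adjust the parsing to match the actual ipsctl output on your IPSO build:

```python
# Pick out nonzero qdrop counters from `ipsctl -a ifphys` style
# output. The "path = value" line format is an assumption about
# the output shape, not a documented guarantee.

def nonzero_qdrops(ipsctl_output: str) -> dict:
    """Map counter path -> count for every nonzero qdrop line."""
    drops = {}
    for line in ipsctl_output.splitlines():
        if "qdrop" not in line or "=" not in line:
            continue
        path, _, value = line.partition("=")
        count = int(value.strip())
        if count > 0:
            drops[path.strip()] = count
    return drops

sample = """\
ifphys:eth1:stats:qdrops = 0
ifphys:eth2:stats:qdrops = 1742
ifphys:eth3:stats:qdrops = 0
"""
print(nonzero_qdrops(sample))  # {'ifphys:eth2:stats:qdrops': 1742}
```

Because the counters are cumulative since boot, compare two such snapshots taken minutes apart before concluding that an interface is actively dropping.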
41 Performance Troubleshooting
rx_mpc occurs when the operating system cannot flush the receive buffer for the interface fast enough. The receive buffer queues incoming data received on the physical media and is flushed every time there is an interrupt. The interrupt is triggered under two conditions: when the rx_ring has reached 1/4 full, or after a timer expires. More information about these processes is detailed in a Support knowledge base (sk) article. There are advanced tunable variables that can be used, under the direction of Check Point Support, to influence this behavior.
Symbol Errors are due to a bad fiber network cable, or a dirty or dusty NIC port or fiber connector. They can also be due to a bad NIC card or switch port, but this is very uncommon. Symbol errors only increment for data received on the local side, not sent data. More information about symerrs is detailed in sk39733.
42 Performance Troubleshooting
A large number of short-lived connections transiting the enforcement point can cause slowdowns, as connection establishment and teardown incurs CPU utilization. This can be partly mitigated by ensuring templates can be created for the most heavily used rules. You may also be able to use the Fast Expire SecureXL feature, which should be used primarily for short-lived connections such as DNS.
43 Performance Troubleshooting
Use ifconfig to verify that no interfaces have the PROMISC flag set. An interface in promiscuous mode forwards all frames seen on the physical media to the operating system for Layer 2 filtering. An interface in non-promiscuous mode uses the MAC chip to filter Layer 2 frames, ensuring that only frames destined for the local machine are passed to the operating system; this is determined from the Unicast/Broadcast/Multicast MAC address lists in the Receive Address High and Low registers. PROMISC is set for Transparent Mode, and this is normal behavior.
Customers may wish to view the Security Gateway connection tables in human-readable format to help with rules optimization. Check Point Support has internal-only tools to read the output of "fw tab -u". Customers may be interested in an unsupported third-party script; Check Point makes no guarantees about it and provides this information as reference only.
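Checking many systems for stray PROMISC flags can be scripted by parsing ifconfig output. Flag formatting differs across ifconfig versions, so this is a sketch against one representative BSD-style format, and the sample output is invented for illustration:

```python
# Report interfaces whose ifconfig flags line includes PROMISC.
# ifconfig output formats vary by OS and version; the sample
# below is a representative BSD-style layout, not IPSO-verified.
import re

def promiscuous_interfaces(ifconfig_output: str) -> list:
    """Return names of interfaces with PROMISC in their flags line."""
    promisc = []
    for line in ifconfig_output.splitlines():
        m = re.match(r"^(\S+?):?\s+flags=", line)
        if m and "PROMISC" in line:
            promisc.append(m.group(1))
    return promisc

sample = """\
eth1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>
eth2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST>
"""
print(promiscuous_interfaces(sample))  # ['eth2']
```

Any interface reported here that is not part of a Transparent Mode configuration is worth investigating, since promiscuous reception pushes every frame on the segment up to the operating system.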
44 Advanced debugging and tuning
Advanced debugging and tuning should only be carried out under the direction of the Escalations group or Development.