Ten Commandments Of TCP/IP Performance
Inside Products, Inc. www.inside-products.com (831) 659-8360 sales@inside-products.com
Nalini Elkins, Inside Products, Inc. Nalini_elkins@inside-products.com
How Does TCP/IP Work? To find problems in TCP/IP, let's start by thinking about what TCP/IP is. TCP/IP helps the network get information (packets) from one place to another through network equipment such as routers and switches. What's so hard about this? (Diagram: Packet 1 and Packet 2 moving across the network.)
Networks are Complex There are hundreds of thousands, even millions, of connections in a network. Finding just which one has problems is a daunting task.
Network diagnostics involves decoding multiple layers of protocols: TCP, IP, MQSeries, HTTP, LDAP, LPR/LPD, FTP, TN3270, UDP, IPv6.
Ten Commandments
1. Thou shalt monitor thy application backlog queue
2. Thou shalt not kill thy network by many short connections
3. Thou shalt drop unused connections
4. Thou shalt honor thy TCP duplicate ACKs and thy retransmissions
5. Thou shalt relate thy TCP resets to the cause
6. Thou shalt not fail in watching thy TCP attempt fails
7. Thou shalt delve deeply into UDP no ports errors
8. Thou shalt address the reason for your IP address errors
9. Thou shalt not convert thy applications directly from multi-dropped SDLC
10. Thou shalt not use two packets when one will do
Thou shalt monitor thy application backlog queue
How does the backlog queue work? Example:
–The application can have 5 active connections
–10 connections can be in the backlog queue
–When the 16th connection comes in, it is rejected
How to monitor?
–A portion of the Netstat All display shows the queues
–The SNMP MIB will also show queues
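The 5 + 10 example can be walked through with a small toy model. This is only a sketch of the counting logic, not how any real TCP/IP stack is implemented; the `Listener` class and `simulate` function are hypothetical names introduced for illustration.

```python
from collections import deque


class Listener:
    """Toy model: a fixed number of active slots plus a bounded backlog queue."""

    def __init__(self, max_active, max_backlog):
        self.max_active = max_active
        self.max_backlog = max_backlog
        self.active = []
        self.backlog = deque()

    def connect(self, conn_id):
        """Return what happens to an incoming connection request."""
        if len(self.active) < self.max_active:
            self.active.append(conn_id)
            return "active"
        if len(self.backlog) < self.max_backlog:
            self.backlog.append(conn_id)
            return "queued"
        return "rejected"


def simulate(n_requests, max_active=5, max_backlog=10):
    """Send n_requests connections at a listener that never finishes any work."""
    listener = Listener(max_active, max_backlog)
    return [listener.connect(i) for i in range(1, n_requests + 1)]


if __name__ == "__main__":
    for number, outcome in enumerate(simulate(16), start=1):
        print(number, outcome)
```

With the values from the example, requests 1 to 5 become active, 6 to 15 wait in the backlog, and request 16 is rejected.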
What is going on? The user is a U.S. state government. Using a CICS accounting application, the user sometimes gets a Connection Refused; other times, the session initiation just hangs. Technical support has no idea what to do. Users are angry! The problem has been escalated to just below the governor's office.
Backlog Queues We looked at the application backlog queues. We see that the backlog queues are being exceeded and connections are dropped.
Connection Refused… When the backlog queue is exceeded, the users get a Connection Refused message. This is actually better than the alternative: if the user is stuck in the backlog queue, they just see an hourglass on the terminal and appear to be hung! This is even more frustrating.
The Real Problem… The installation tried a number of ways to speed up the application. The vendor of the application is being contacted for assistance.
How to Find This! - One way is via the z/OS SNMP MIB. - The counts are also shown per socket on the Netstat All command. If you do a Netstat all (IPAddr 0.0.0.0 then you may see all the listener connections. - But, remember, these are all just snapshots. You should be monitoring these continuously.
SOMAXCONN SOMAXCONN is used in conjunction with the backlog queue value specified in application programs. When a socket connection request arrives at the TCP/IP stack and the server is busy processing a previous request, the new request is queued, up to the amount specified with the SOMAXCONN parameter. When that number is exceeded, TCP/IP connection requests will time out and get refused. The value specified for the backlog queue cannot exceed that of SOMAXCONN. No error will be given, but the value of SOMAXCONN will be used. SOMAXCONN is set per listener. In other words, the SOMAXCONN value is not cumulative for all listener ports. SOMAXCONN: Specifies the maximum length of the connection request queue created by the socket call listen(). Sample: SOMAXCONN 10 This is the length of the backlog queue.
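The silent capping described above amounts to taking the smaller of two numbers, per listener. A minimal sketch follows; `effective_backlog` and the listener names are hypothetical, and the value 10 is taken from the sample statement.

```python
def effective_backlog(requested, somaxconn):
    """Backlog actually used: the application's request, capped at SOMAXCONN.

    No error is raised when the request is too large; the cap is applied silently.
    """
    return min(requested, somaxconn)


if __name__ == "__main__":
    somaxconn = 10  # from the sample "SOMAXCONN 10"
    # Hypothetical listeners and the backlog each one asks for on listen().
    requested = {"listener_a": 5, "listener_b": 50}
    for name, backlog in requested.items():
        # The limit applies to each listener separately, not to the sum.
        print(name, effective_backlog(backlog, somaxconn))
```

An application that asks for a backlog of 50 under `SOMAXCONN 10` therefore gets 10 and is not told about it, which is one reason to check the queue depth actually in effect.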
Thou shalt not kill thy network by many short connections Each connection establishment requires a flow of packets. If there is a lot of data to send, it is better to have one long connection with many packets than multiple short connections.
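A rough count shows why. The sketch below assumes the usual three-packet handshake (SYN, SYN-ACK, ACK) and a four-packet close (FIN, ACK, FIN, ACK); real stacks may combine some of these, so treat the constants as assumptions. The function names are hypothetical.

```python
HANDSHAKE_PACKETS = 3  # SYN, SYN-ACK, ACK
TEARDOWN_PACKETS = 4   # FIN, ACK, FIN, ACK (assumed; some stacks combine these)


def overhead_packets(connections):
    """Packets spent only on opening and closing the given number of connections."""
    return connections * (HANDSHAKE_PACKETS + TEARDOWN_PACKETS)


def total_packets(connections, data_packets):
    """Data packets plus connection setup and teardown overhead."""
    return data_packets + overhead_packets(connections)


if __name__ == "__main__":
    data = 100  # the same 100 data packets either way
    print("one long connection:  ", total_packets(1, data))
    print("100 short connections:", total_packets(100, data))
```

Under these assumptions, sending 100 data packets over one connection costs 107 packets, while one connection per data packet costs 800.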
TCP Many New Connections We were led to the problem of possible unneeded sessions by noticing that hundreds of new connections were made and terminated for TCP port 23. We investigated the possible sources for this activity.
Many Connections for Port 23 When we look at the IP addresses with connections to port 23, some stand out. IP address 10.111.1.190, in particular, was responsible for 53% of the connections.
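A breakdown like this can be produced from any list of remote addresses seen on a port. The sketch below uses made-up addresses, not the data from this case; `connection_share` is a hypothetical helper.

```python
from collections import Counter


def connection_share(remote_addresses):
    """Return (address, count, percent of total) tuples, busiest address first."""
    counts = Counter(remote_addresses)
    total = sum(counts.values())
    return [(addr, n, 100.0 * n / total) for addr, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical sample: one entry per new connection to the port.
    sample = ["10.0.0.1"] * 5 + ["10.0.0.2"] * 3 + ["10.0.0.3"] * 2
    for addr, n, pct in connection_share(sample):
        print(f"{addr:<12} {n:>4} {pct:5.1f}%")
```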
Thou shalt drop unused connections
1. Many unused connections come from Voice Response Units (VRUs).
2. Others may be from scripts which use TN3270.
3. Minor modifications may help: fewer connections at longer intervals. (Instead of 100 connections every 3 minutes, 50 connections every 10 minutes.)
4. Investigate persistent connections and connection pooling.
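The effect of the change in item 3 is easy to work out as an hourly rate. A small sketch, with `connections_per_hour` as a hypothetical helper:

```python
def connections_per_hour(batch_size, interval_minutes):
    """Connections opened per hour when batch_size connections start every interval."""
    return batch_size * 60 / interval_minutes


if __name__ == "__main__":
    before = connections_per_hour(100, 3)   # 100 connections every 3 minutes
    after = connections_per_hour(50, 10)    # 50 connections every 10 minutes
    reduction = 100 * (before - after) / before
    print(f"before: {before:.0f}/hour, after: {after:.0f}/hour, "
          f"reduction: {reduction:.0f}%")
```

The change takes the rate from 2,000 connections per hour to 300, a reduction of 85%.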
Thou shalt honor thy TCP duplicate ACKs and thy retransmissions
What do the dup ACKs and retransmissions have in common?
–The same subnet
–The same time of day
–The same socket application
–The same route (set of hardware)
(Diagram: ACK 101, ACK 201)
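One way to test the "same subnet" question is to group the remote addresses of the affected connections by network. The sketch below assumes a /24 prefix, which may not match how a given network is actually subnetted; the addresses and the `count_by_subnet` name are hypothetical.

```python
import ipaddress
from collections import Counter


def count_by_subnet(addresses, prefix_len=24):
    """Count events per subnet, given the remote IPv4 address of each event."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets a host address stand in for its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical remote addresses, one per retransmission observed.
    retransmits = ["10.1.1.5", "10.1.1.77", "10.1.1.200", "10.2.0.9"]
    for subnet, n in count_by_subnet(retransmits).most_common():
        print(subnet, n)
```

The same grouping can be repeated with hour of day, port, or route as the key to check the other items in the list.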
TCP Retransmits By Port Notice that port 23 is responsible for 96% of the retransmits. Let's look at the remote addresses.
TCP Retransmits By Remote Address Five remote addresses are responsible for over 80% of the retransmits. Duplicate acknowledgments show a similar pattern to the retransmits.
Thou shalt relate thy TCP resets to the cause A RESET packet is sent by TCP to abort a connection. This may or may not be a problem: closing an idle connection is proper. On the other hand, if an application is refusing connections because it is out of resources, then you may see many RESETs. Let's look at the next commandment for an example.
Thou shalt not fail in watching thy TCP attempt fails It may be that there is a TCP application which is not active. A packet sent to a TCP port which is not active will be answered with a TCP RESET packet. The count of TCP Attempt Fails will be incremented. (Diagram: a packet to TCP port 445 on the host is answered with a TCP Reset packet, and the count of TCP Attempt Fails is incremented.)
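The reset behavior can be observed from the client side with a few lines of Python. This is a sketch for a test machine's loopback interface, not a monitoring tool: `find_closed_port` and `probe` are hypothetical helpers, and the result labels are made up for illustration.

```python
import socket


def find_closed_port():
    """Bind to an OS-chosen port, then close it so that nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def probe(port, host="127.0.0.1", timeout=2.0):
    """Try to connect and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered the SYN with a reset: no application on that port.
        return "refused"
    except OSError:
        # Timeouts and unreachable hosts end up here.
        return "no answer"


if __name__ == "__main__":
    print(probe(find_closed_port()))
```

A "refused" result corresponds to the reset case above, while "no answer" suggests the packet or its reply was dropped somewhere along the way.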
Thou shalt delve deeply into UDP no ports errors UDP No Ports is the equivalent of TCP Attempt Fails. It may be that there is a UDP application which is not active. If all UDP sockets are active, then it may be that UDP traffic is coming in at too high a rate for a particular port. We have seen this error correlated with the ICMP Destination Unreachable error, subtype Port Unreachable. (Diagram: a packet to UDP port 161 on the host produces an ICMP Destination Unreachable, Port Unreachable reply, and the count of UDP No Ports is incremented.)
UDP Port Unreachable In the case above, no application was listening on port 161, so this generated the ICMP error. Since the packet was for a UDP port, it also generated a UDP No Ports error. If this is just a mistake and happens thousands of times a day because some application is not properly configured…
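Error counts such as UDP No Ports are cumulative, so a daily rate has to be derived from the difference between two samples. The sketch below assumes the counter behaves like an SNMP Counter32 and wraps at 2^32; the function names and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # assumed wrap point of an SNMP Counter32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two samples of a cumulative counter, allowing one wrap."""
    return (current - previous) % modulus


def errors_per_day(previous, current, interval_seconds):
    """Scale the increase seen over one sampling interval to a 24-hour rate."""
    return counter_delta(previous, current) * 86400 / interval_seconds


if __name__ == "__main__":
    # Hypothetical samples of a UDP No Ports counter taken 3 minutes apart.
    print(errors_per_day(120_000, 120_010, interval_seconds=180))
```

Ten errors in three minutes looks harmless in a single snapshot, but at that rate it is 4,800 per day.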
Thou shalt address the reason for your IP address errors Many UDP applications send packets to a broadcast address. The mainframe does not recognize such addresses. The packets are dropped and noted as address errors. Such packets may also come from a router if routing is misconfigured. We have seen millions per day. (Diagram: Host 18.104.22.168; packet to 22.214.171.124; count of Address Errors; count of IP Discards.)
Misdirected Packets This analysis only lasted a few minutes. Hundreds of packets were sent from many UDP NetBIOS connections. Some were from improperly configured SQL servers. These packets were dropped by the mainframe. Why clog up the network and make the mainframe do extra work for no reason?
Thou shalt not convert thy applications directly from multi-dropped SDLC On a multi-dropped SDLC link, it makes sense to have small packets so that no one dominates traffic. (Diagram: a multi-dropped SDLC link carrying interleaved packets for PU 1, PU 2, and PU 3, next to a TCP virtual circuit between a remote host and a local host.) On a TCP virtual circuit, small packets mean overhead:
–IP header: 20 bytes
–TCP header: 20 bytes
–Data: 8 bytes
Sample Application This application was converted directly from SDLC. It suffered from poor response time.
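The cost of the 8-byte packet can be put as a ratio. The sketch below uses the header sizes from the slide (20 bytes each, that is, headers without options) and ignores link-layer framing; the function name and the 1,000-byte comparison are hypothetical.

```python
IP_HEADER_BYTES = 20   # from the slide; IPv4 header without options
TCP_HEADER_BYTES = 20  # from the slide; TCP header without options


def payload_efficiency(data_bytes):
    """Fraction of each packet that is application data rather than headers."""
    total = IP_HEADER_BYTES + TCP_HEADER_BYTES + data_bytes
    return data_bytes / total


if __name__ == "__main__":
    for size in (8, 1000):
        print(f"{size:>5} data bytes: {payload_efficiency(size):.1%} of the packet")
```

With 8 bytes of data, only about 17% of each 48-byte packet is useful payload; the other 40 bytes are headers.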
Thou shalt not use two packets when one will do (Diagram: Packet 1 carries data; Packet 2 carries 0 bytes of data with PSH set in the TCP header.) The PSH bit on in the header indicates that data transmission is complete. The PSH bit could have been turned on in Packet 1. Packet 2 does not need to be sent. A small mistake. If you do it a million times a day, it becomes a big mistake.
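This pattern can be searched for in a trace once each segment is reduced to its payload length and TCP flags. The sketch below assumes the segments of one direction of one connection have already been extracted in order; the tuple layout and the function name are hypothetical, while 0x08 and 0x10 are the standard PSH and ACK flag bits.

```python
PSH = 0x08  # TCP push flag
ACK = 0x10  # TCP acknowledgment flag


def find_empty_push_segments(segments):
    """Return indexes of zero-length PSH segments that follow data sent without PSH.

    segments: ordered list of (payload_length, flags) tuples for one direction
    of one connection.
    """
    hits = []
    for i in range(1, len(segments)):
        prev_len, prev_flags = segments[i - 1]
        length, flags = segments[i]
        if length == 0 and (flags & PSH) and prev_len > 0 and not (prev_flags & PSH):
            hits.append(i)
    return hits


if __name__ == "__main__":
    trace = [(8, ACK), (0, ACK | PSH), (8, ACK | PSH), (0, ACK)]
    print(find_empty_push_segments(trace))
```

Each hit marks a packet that carried only headers and could have been avoided by setting PSH on the preceding data segment.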
Tuning TCP Saves Money
Eliminate errors and unneeded traffic and benefit from:
–Lower CPU usage
–Less frequent hardware upgrades
–Lower costs for MIPS-based software charges
–Increased bandwidth availability
–Increased technical staff productivity
Inside the Stack is the only TCP/IP monitor focused on problem solving and tuning.
Data from a recent Network Health Check reveal TCP, UDP, ICMP, and listener errors for both systems. Over 2,000 errors per 3-minute interval. With tuning, these numbers fall significantly. Errors contribute to TCP/IP SRB usage.
After a Health Check and tuning efforts lasting 2-3 weeks, the listener and UDP errors for both systems have been completely eliminated. The ICMP errors for both systems are nearly eliminated. The TCP errors have been cut to 1/4 to 1/3 of what they used to be. TCP dropped from 2nd highest user of CPU to 4th highest user of CPU (SRBs).
The Silent Killer –You may not even realize you have problems with TCP/IP. –Just as cholesterol in the heart can be a silent killer, retransmissions, excessive connections, and unneeded traffic can clog up the network. –And… these problems are preventable!
How Can We Help? ESAI and Inside Products are TCP/IP specialists. We can help you with: –Training –Tools –Consulting
Inside the Stack Inside the Stack provides: –Real time monitoring –Historical reports –Alerting –Connection monitoring –TCP stack diagnostics –There are hundreds of reports possible!
TCP Problem Finder The product most directed to the serious diagnostician: TCP Problem Finder allows you to:
–Find problems in diagnostic traces, which can consist of thousands or hundreds of thousands of packets
–See the exact flow in a connection, from a high-level overview or in detail
–We use this product ourselves in consulting. IBM subcontracts to us to help with TCP problem resolution; we could not do it without TCP Problem Finder!
Network Health Check When you are serious about tuning your network, our Network Health Check can help to:
–Identify response time problems for applications (host or network)
–Identify response time problems for individual connections (host or network)
–Identify congestion or network traffic errors on subnets
–Identify paging, queues, high CPU usage for TCP sockets or TCP address space
–Analyze TCP profile
–Identify paging, queues, high CPU usage for individual FTPs
–Identify routes and applications with packet fragmentation
–Identify excessive idle or hanging connections
–Identify connections in frequent error status
–Identify application configuration problems (keepalive required, etc.)
TCP Classes We offer many classes in TCP/IP including:
–Security, IPSec, Policy Agent
–IPv6 (Addressing, Multi-platform)
–TCP Tuning and Performance Analysis
–Trace Analysis and Diagnostics
For more information on:
–Inside the Stack
–TCP Problem Finder
–TCP Response Time Monitor
–Availability Checker
–Network Health Check
–TCP/IP classes
Coming soon!
–EE Health Check
Please contact us! 1-831-659-8360 or 1-866-464-3724, sales@ESAIGroup.com
Australia: Blueline Software
UK: FitzSoftware
BENELUX: Adinsec BvBa