NET0183 Networks and Communications
Lectures 17 and 18: Measurements of internet traffic (IP)
8/25/2009 NET0183 Networks and Communications by Dr Andy Brooks
http://portal.acm.org/citation.cfm?id=1298321
2.1 Collection of Traces
Traces were collected on an SDH ring running Packet over SONET (PoS).
– “Packet over SONET/SDH, abbreviated POS, is a communications protocol for transmitting packets in the form of the Point to Point Protocol (PPP) over SDH or SONET, which are both standard protocols for communicating digital information using lasers or light emitting diodes (LEDs) over optical fibre at high line rates.” (Wikipedia, 16-Feb-10)
– SONET: Synchronous Optical Networking. SDH: Synchronous Digital Hierarchy.
2.1 Collection of Traces
“On the two OC-192 links (two directions) we use optical splitters attached to two Endace DAG6.2SE cards. The DAG cards captured the first 120 bytes of each frame to ensure that the entire network and transport header information is preserved.”
– Endace is a supplier of high-speed network traffic capture technology.
Network measurements: in software or in hardware?
Software tools interact with the operating system and device drivers to obtain copies of network packets. Software tools are limited, however: they can't capture every packet on a high-speed link.
Special hardware is designed for high-speed links such as an internet backbone. It captures traffic directly, for example by using optical splitters.
Software tools are typically inexpensive, while special hardware is typically expensive.
Network measurements: online or offline processing?
Full packet tracing creates large amounts of data. Offline processing is not time critical, and the data can be re-analyzed in various ways.
Online processing can involve:
– filtering according to hosts, port numbers or other properties
– sampling, where every nth packet is recorded
– packet truncation, where only a fixed number of bytes of each packet is stored
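The three online reduction techniques listed above can be sketched in a few lines of Python. This is only an illustration: packets are modelled as (destination port, raw bytes) tuples, and the helper names `filter_by_port`, `sample_every_nth` and `truncate` are hypothetical, not taken from any capture tool.

```python
from typing import Iterable, Iterator, Tuple

Packet = Tuple[int, bytes]  # (destination port, raw packet bytes): a toy model


def filter_by_port(packets: Iterable[Packet], port: int) -> Iterator[Packet]:
    """Filtering: keep only packets addressed to one port."""
    return (p for p in packets if p[0] == port)


def sample_every_nth(packets: Iterable[Packet], n: int) -> Iterator[Packet]:
    """Sampling: record the nth, 2nth, 3nth, ... packet in arrival order."""
    return (p for i, p in enumerate(packets, start=1) if i % n == 0)


def truncate(packets: Iterable[Packet], snap_len: int) -> Iterator[Packet]:
    """Truncation: store only the first snap_len bytes of each packet."""
    return ((port, data[:snap_len]) for port, data in packets)


# Four made-up packets; the payloads are just zero bytes of the given length.
packets = [(80, bytes(1500)), (53, bytes(90)), (80, bytes(40)), (25, bytes(600))]

web_only = list(filter_by_port(packets, 80))
sampled = list(sample_every_nth(packets, 2))
truncated = list(truncate(packets, 120))  # 120 bytes, the capture length used for the traces
```

Truncation keeps every packet but bounds the storage per packet, whereas filtering and sampling drop whole packets.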
2.1 Collection of Traces
Very roughly speaking, the measurements were taken on links between the region of Göteborg and the rest of the Internet.
Figure labels: E-mail traffic, E-commerce traffic, Online gaming traffic, Search engine traffic, Social-networking traffic, ...
2.1 Collection of Traces
“Optical Carrier is a standardized set of specifications of transmission speeds that describe a range of digital signals that can be carried on Synchronous Optical Networking (SONET) fiber optic networks. The number attached to the Optical Carrier abbreviation, e.g., OC-48, is directly proportional to the data rate of the bitstream of the digital signal. The rule for calculating the speed of optical-carrier-classified lines is that a specification given as OC-n designates a speed of n × 51.84 Mbit/s.” (Wikipedia, 16-Feb-10)
OC-192 = 192 × 51.84 Mbit/s = 9,953.28 Mbit/s.
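The OC-n rule quoted above is a single multiplication; a minimal sketch (the function name `oc_rate_mbit_s` is just an illustrative choice):

```python
OC1_MBIT_S = 51.84  # base rate of OC-1 in Mbit/s, from the rule quoted above


def oc_rate_mbit_s(n: int) -> float:
    """Line rate of an OC-n link in Mbit/s."""
    return n * OC1_MBIT_S


for n in (3, 48, 192):
    print(f"OC-{n}: {oc_rate_mbit_s(n):,.2f} Mbit/s")
```

For OC-192 this gives 192 × 51.84 = 9,953.28 Mbit/s, i.e. roughly 10 Gbit/s.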
“The data collection was performed between the 7th of April 2006, 2AM and the 26th of April 2006, 10AM. During this period, we simultaneously for both directions collected four traces of 20 minutes each day at identical times. The times (2AM, 10AM, 2PM, 8PM) were chosen to cover business, non-business and nighttime hours.”
– 20 consecutive days
– 4 traces per day × 2 directions × 20 days = 160 traces
2.2 Processing and Analysis
The DAG cards discarded 20 frames within 12 traces due to receiver errors or HDLC CRC errors.
– High-Level Data Link Control (HDLC) is a data link layer protocol.
– Point to Point Protocol (PPP) is based on HDLC.
After storing the data on disk, the payload beyond the transport layer was removed.
Respect the user's privacy. Delete any user data. The first 120 bytes might include some user data.
http://windowsbootdisks.com/ip_images/ipmodelen.gif
2.2 Processing and Analysis
A total of 71 frames within 30 traces had to be discarded due to IP checksum errors.
– Single checksum errors are minor errors.
“Trace sanitization refers to the process of checking and ensuring that the collected traces are free from logical inconsistencies and are suitable for further analysis.”
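To make "IP checksum error" concrete, here is a minimal sketch of the IPv4 header checksum: the one's-complement sum of the header's 16-bit words, as described in RFC 1071. The 20-byte header below is a made-up illustrative example, not a frame from the traces, and the function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Complemented one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def header_is_valid(header: bytes) -> bool:
    """With the checksum field included, an intact header sums to zero."""
    return internet_checksum(header) == 0


# Illustrative IPv4 header (UDP, TTL 64); bytes 10-11 hold the checksum 0xB861.
good = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

# Flip one bit in the TTL byte to simulate a transmission error.
corrupt = bytearray(good)
corrupt[8] ^= 0x01

print(header_is_valid(good))            # True
print(header_is_valid(bytes(corrupt)))  # False
```

A frame whose header fails this check would be among those discarded during sanitization.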
Sanity checks can include:
Do timestamps on packets increase monotonically?
– A packet should not have a timestamp earlier than that of a packet before it in the trace.
Are the interarrival times between packets in keeping with packet sizes and the line speed of the carrier?
– First, calculate how many packets you should be detecting on average each second.
Are there any identical IP headers within consecutive frames?
Do counts before and after desensitization agree?
– Anonymizing the data should not change the results.
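The first two sanity checks can be sketched as follows. The sketch makes several assumptions that are not from the paper: a trace is a list of (timestamp, frame size) records, timestamps mark the start of each frame, and the nominal OC-192 rate is used without subtracting SONET overhead. All names are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[float, int]  # (timestamp in seconds, frame size in bytes)

OC192_BIT_S = 192 * 51.84e6  # nominal line rate; SONET overhead is ignored here


def timestamps_monotonic(trace: List[Record]) -> bool:
    """No frame may carry a timestamp earlier than the frame before it."""
    return all(a[0] <= b[0] for a, b in zip(trace, trace[1:]))


def transmission_time_s(size_bytes: int, line_rate_bit_s: float) -> float:
    """Time needed to put one frame of the given size onto the link."""
    return size_bytes * 8 / line_rate_bit_s


def interarrivals_plausible(trace: List[Record], line_rate_bit_s: float,
                            tolerance_s: float = 1e-9) -> bool:
    """The next frame cannot start before the previous one has been sent."""
    for (t0, size0), (t1, _) in zip(trace, trace[1:]):
        if (t1 - t0) + tolerance_s < transmission_time_s(size0, line_rate_bit_s):
            return False
    return True


def max_frames_per_second(mean_size_bytes: float, line_rate_bit_s: float) -> float:
    """Upper bound on the frame rate for a given mean frame size."""
    return line_rate_bit_s / (mean_size_bytes * 8)


good_trace = [(0.0, 1500), (2.0e-6, 40), (2.1e-6, 1500)]
bad_trace = [(0.0, 1500), (0.5e-6, 40)]  # second frame arrives impossibly early

print(interarrivals_plausible(good_trace, OC192_BIT_S))  # True
print(interarrivals_plausible(bad_trace, OC192_BIT_S))   # False
```

A 1500-byte frame needs about 1.2 microseconds at this rate, so a gap of 0.5 microseconds after such a frame points to a timestamping or capture problem.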
2.2 Processing and Analysis
Trace desensitization refers to the process of ensuring privacy and confidentiality.
– Packet payloads beyond the transport layer were removed very early in the processing.
IP addresses were anonymized using the prefix-preserving Crypto-PAn tool.
– “In Crypto-PAn, the IP address anonymization is prefix-preserving. That is, if two original IP addresses share a k-bit prefix, their anonymized mappings will also share a k-bit prefix.”
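The prefix-preserving property can be illustrated with a toy anonymizer. This is NOT the Crypto-PAn implementation: it substitutes HMAC-SHA256 from the Python standard library for Crypto-PAn's own cryptographic construction, so its output is not compatible with the real tool. It only demonstrates the idea that each output bit depends on the original bit and on the bits before it.

```python
import hashlib
import hmac
import ipaddress


def anonymize(addr: str, key: bytes) -> str:
    """Toy prefix-preserving mapping of an IPv4 address (illustration only)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i in range(32):
        # The decision to flip bit i depends only on the i bits before it,
        # so addresses sharing a prefix get identical flips along that prefix.
        digest = hmac.new(key, f"{i}:{bits[:i]}".encode(), hashlib.sha256).digest()
        flip = digest[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()


key = b"demo-key"  # hypothetical secret; a real deployment needs a strong random key
a, b = "192.168.1.10", "192.168.1.77"
print(common_prefix_len(a, b))
print(common_prefix_len(anonymize(a, key), anonymize(b, key)))
```

Both printed prefix lengths are equal, so subnet structure survives anonymization while the actual addresses are hidden.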
3. RESULTS
“The 148 traces analyzed sum up to 10.77 billion PoS frames, containing a total of 7.6 TB of data. 99.97% of the frames contain IPv4 packets, summing up to 99.99% of the carried data. The remaining traffic consists of different routing protocols (BGP, CLNP, CDP).”
– 12 discarded traces: 12/160 = 7.5%
– 91 discarded frames: 91/10.77 billion ≈ 8.4 × 10^-9
“The results in the remainder of this paper are based on IPv4 traffic only.”
BGP
“Short for Border Gateway Protocol, an exterior gateway routing protocol that enables groups of routers (called autonomous systems) to share routing information so that efficient, loop-free routes can be established. BGP is commonly used within and between Internet Service Providers (ISPs). The protocol is defined in RFC 1771.”
http://www.webopedia.com/TERM/B/BGP.html
CDP
“The Cisco Discovery Protocol (CDP) was developed by Cisco Systems. It's primarily used to obtain the protocol addresses of neighboring devices and also to discover the platform of those devices. It can also be used to show information about the interfaces your router uses. CDP is media and protocol-independent, and runs on all Cisco-manufactured equipment including routers, bridges, access servers, and switches. CDP runs only over the data link layer enabling two systems that support different network-layer protocols to learn about each other. CDP Version-2 (CDPv2) is the most recent release of the protocol and provides more intelligent device tracking features.”
http://www.webopedia.com/TERM/B/BGP.html
3.1.1 IP packet size distribution
Earlier measurements showed the distribution of IPv4 packet lengths to be trimodal:
– ≈ 40 bytes (TCP acknowledgements)
– 576 bytes, default IP datagram size (RFC 879)
– ≈ 1500 bytes, Ethernet Maximum Transmission Unit
Between 1997 and 2002, studies reported that the fraction of packets with the default IP datagram size ranged between 10% and 40%.
In 2004, a study by Pentikousis et al. found the distribution to be bimodal, with the default IP datagram size accounting for only 3.8% of all packets.
Figure 1. The Cumulative IPv4 Packet Size Distribution
The distribution is bimodal: 44% of packets are between 40 and 100 bytes, and 37% are between 1400 and 1500 bytes.
The default IP datagram size of 576 bytes now represents only 0.95% of the traffic and is no longer among the top three modes.
– “This is caused by the predominance of Path MTU Discovery in today's TCP implementations...”
Path MTU Discovery (Wikipedia, 17-Feb-10)
“Path MTU discovery works by setting the DF (Don't Fragment) option bit in the IP headers of outgoing packets. Then, any device along the path whose MTU is smaller than the packet will drop it, and send back an ICMP "Fragmentation Needed" (Type 3, Code 4) message containing its MTU, allowing the source host to reduce its path MTU appropriately. The process repeats until the MTU is small enough to traverse the entire path without fragmentation.”
Many networks block ICMP for security reasons…
“A robust method for PMTUD that relies on TCP or some other packetization layer to probe the path with progressively larger packets has been standardized in RFC 4821 (Packetization Layer Path MTU Discovery).”
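The ICMP-based procedure quoted above can be simulated in a few lines. This is a simplified sketch, not a network implementation: the path is modelled as a list of hypothetical link MTUs, every ICMP "Fragmentation Needed" message is assumed to arrive (no ICMP blocking), and the function names are illustrative.

```python
from typing import List, Optional


def first_blocking_mtu(size: int, link_mtus: List[int]) -> Optional[int]:
    """Send a DF-marked packet of the given size along the path.

    Returns the MTU reported by the first link that is too small (as the
    ICMP 'Fragmentation Needed' message would), or None if the packet
    reaches the destination.
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_path_mtu(initial_mtu: int, link_mtus: List[int]) -> int:
    """Shrink the packet size until it traverses the whole path."""
    size = initial_mtu
    while True:
        reported = first_blocking_mtu(size, link_mtus)
        if reported is None:
            return size
        size = reported  # always smaller than before, so the loop terminates


path = [1500, 1400, 1500, 1300, 1500]  # hypothetical link MTUs in bytes
print(discover_path_mtu(1500, path))    # 1300
```

With this path the sender tries 1500, then 1400, then 1300 bytes. If the ICMP messages were blocked, the sender would never learn the smaller MTU, which is the problem RFC 4821 addresses.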
Figure 1. The Cumulative IPv4 Packet Size Distribution
Two modes appear at 628 bytes and 1300 bytes, representing 1.76% and 1.1% of the traffic.
Analysing the TCP flows, it was found that packets of 628 bytes usually came after full-sized packets and had the PUSH flag set.
“We suspect that they are sent by applications doing ’TCP layer fragmentation’ on 2KB blocks of data, indicating the end of a data block by PUSH.”
– The arithmetic fits: 2048 bytes = 1460 bytes carried in a full-sized 1500-byte packet + 588 bytes, and 588 bytes plus 40 bytes of IP and TCP headers (no options) gives 628 bytes.
“A look at the TCP destination ports revealed that large fractions of this traffic are indeed sent to ports known to be used for popular file-sharing protocols like Bittorrent and DirectConnect.”
Figure 1. The Cumulative IPv4 Packet Size Distribution
“The notable step at 1300 bytes on the other hand could be explained by the recommended IP MTU for IPsec VPN tunnels.”
IPsec: Short for IP Security, a set of protocols developed by the IETF to support secure exchange of packets at the IP layer. IPsec has been deployed widely to implement Virtual Private Networks (VPNs). http://www.webopedia.com/TERM/I/IPsec.html
Virtual private network: A virtual private network (VPN) is a computer network that is layered on top of an underlying computer network. The term VPN can be used to describe many different network configurations and protocols. http://en.wikipedia.org/wiki/Virtual_private_network
Large packets
0.15% of traffic was larger than 1500 bytes.
– The standard Ethernet MTU is 1500 bytes.
Of the 0.15%, 99.7% were 4470 bytes. Packet sizes up to 8192 bytes were observed.
“A minor part of the >1500 byte sized packets represents BGP updates between backbone or access routers.”
– The majority of the large packet traffic was identified as customized data transfer from a space observatory to a data center using jumbo frames over Ethernet (non-standard Ethernet).
http://www.fatpipe.org/~mjb/Drawings/
Andy comments: To check the checksum calculation, we need to know what the pseudo header is.
3.2.1 IP type of service
The TOS field can be used for Explicit Congestion Notification (ECN) and Differentiated Services.
83.1% of the packets stored a value of zero in the TOS field, indicating that the field was not being used.
“In our data only 1.0 million IPv4 packets provide ECN capable transport (either one of the ECT bits set) and additionally 1.1 million packets actually show ’congestion experienced’ (both bits set). This means that ECN is implemented in only around 0.02% of the IPv4 traffic.”
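A minimal sketch of how the TOS byte splits into its two uses: the upper six bits are the Differentiated Services codepoint (DSCP) and the lower two bits are the ECN field, with the ECN codepoints as defined in RFC 3168. The function name is hypothetical. The last lines redo the 0.02% estimate, using the 10.77 billion frame count as an approximation of the number of IPv4 packets.

```python
from typing import Tuple

ECN_NAMES = {
    0b00: "Not-ECT",  # sender is not ECN-capable
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced (both bits set)
}


def decode_tos(tos: int) -> Tuple[int, str]:
    """Split the 8-bit TOS value into (DSCP, ECN codepoint name)."""
    dscp = tos >> 2
    ecn = tos & 0b11
    return dscp, ECN_NAMES[ecn]


print(decode_tos(0x00))  # (0, 'Not-ECT'): field unused
print(decode_tos(0xB8))  # (46, 'Not-ECT'): DSCP 46 without ECN
print(decode_tos(0x03))  # (0, 'CE')

# Share of packets with any ECN bit set, from the figures quoted above.
ecn_packets = 1.0e6 + 1.1e6
share_percent = ecn_packets / 10.77e9 * 100
print(f"{share_percent:.2f}%")  # 0.02%
```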
3.2.1 IP type of service
Medina et al. found that the fraction of ECN-capable web servers went from 1.1% in 2000 to 2.1% in 2004.
“... suggesting that the number of ECN-aware routers is still very small.”
“Valid ’Pool 1’ DiffServ Codepoints (RFC 2474) account for 16.8% of all TOS fields.”
– Different applications have different needs.
Andy comments: It would be useful to find a more recent study on ECN and Differentiated Services.
3.2.2 IP Options
Only 68 packets carried IP Options.
– 68 out of 10.77 billion
One 20-minute trace contained 45 packets with IP Option 7 (Record Route). Three traces had 12 packets with IP Option 148 (Router Alert).
“The analysis of IP options showed that they are virtually not used.”
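The option numbers 7 and 148 are values of the one-byte option type field, which RFC 791 divides into a copied flag (1 bit), an option class (2 bits) and an option number (5 bits). A small sketch of that decoding, with a hypothetical function name:

```python
from typing import Tuple


def decode_option_type(option_type: int) -> Tuple[int, int, int]:
    """Split an IPv4 option type byte into (copied flag, class, number)."""
    copied = option_type >> 7                # copy option into every fragment?
    option_class = (option_type >> 5) & 0b11
    number = option_type & 0b11111
    return copied, option_class, number


print(decode_option_type(7))    # (0, 0, 7): Record Route
print(decode_option_type(148))  # (1, 0, 20): Router Alert
```

The value 148 is therefore option number 20 with the copied flag set, which is why it looks so much larger than 7.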
3.2.3 IP fragmentation
In 2000, McCreary et al. found an increase in the fraction of packets carrying fragmented traffic from 0.03% to 0.15%.
In 2002, Shannon et al. found the fraction of packets carrying fragmented traffic to be 0.67%.
“Contrary to this trend, we found a much smaller fraction of 0.06% of fragmented traffic in the analyzed data.”
3.2.3 IP fragmentation
63% of the outgoing fragmented traffic was IPsec ESP (Encapsulating Security Payload) traffic (RFC 4303), observed between exactly one source and one receiver.
A fragment series comprised one full-length Ethernet MTU packet followed by a 72-byte fragment.
“This can easily be explained by an unsuitably configured host/VPN combination transmitting 1532 bytes (1572 – 40 bytes IP and TCP header) instead of the Ethernet MTU due to the additional ESP header.”
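The 1500 + 72 byte pattern can be reproduced with a short sketch of IPv4 fragmentation sizes. The assumptions here are mine, not the paper's: a 20-byte IP header without options, and a hypothetical original datagram of 1552 bytes (20-byte header plus 1532 bytes of payload), which is one reading consistent with the numbers quoted above.

```python
from typing import List


def fragment_sizes(total_len: int, mtu: int, header_len: int = 20) -> List[int]:
    """Total length of each fragment when a datagram exceeds the MTU."""
    if total_len <= mtu:
        return [total_len]  # fits, no fragmentation needed
    payload = total_len - header_len
    # Every fragment except the last carries a multiple of 8 payload bytes.
    max_payload = (mtu - header_len) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk + header_len)  # each fragment gets its own header
        payload -= chunk
    return sizes


print(fragment_sizes(1552, 1500))  # [1500, 72]
```

The two fragments total 1572 bytes on the wire, and removing the two 20-byte IP headers leaves 1532 bytes of payload.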
3.2.4 IP flags
91.3% of the packets have the don't fragment bit (D or DF bit) set, “as proposed by Path MTU Discovery (RFC 1191)”.
0.04% of the packets have the more fragments bit (M or MF bit) set.
– fragmented traffic was 0.06%
8.65% of the packets use neither DF nor MF.
3.2.4 IP flags
27,474 IPv4 packets from 70 distinct IP sources had DF and MF set simultaneously!
– an invalid combination according to the IP specification (RFC 791)...
“Looking at the traffic pattern and considering that UDP port 53 is used, it seems to be obvious that there is a DNS server using improper protocol stacks inside the Göteborg region.”
DNS: “(1) Short for Domain Name System (or Service or Server), an Internet service that translates domain names into IP addresses.” “The DNS system is, in fact, its own network. If one DNS server doesn't know how to translate a particular domain name, it asks another one, and so on, until the correct IP address is returned.” http://www.webopedia.com/TERM/D/DNS.html
3.2.4 IP flags
233 packets from 126 distinct sources had the reserved bit set.
“According to the IP standard (RFC 791) the reserved bit must be zero, so this behavior has to be regarded as misbehavior.”
Misbehaviour can be caused by bugs in software or by network attacks exploiting protocol vulnerabilities.
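The flag anomalies discussed in section 3.2.4 can be detected by decoding the 16-bit word at bytes 6 and 7 of the IPv4 header, which holds the reserved bit, DF, MF and the 13-bit fragment offset. The following is a minimal sketch with hypothetical function names; it flags the two anomalies named in the slides.

```python
from typing import List, Tuple


def decode_flags(word: int) -> Tuple[bool, bool, bool, int]:
    """Return (reserved, DF, MF, fragment offset in bytes)."""
    reserved = bool(word & 0x8000)
    df = bool(word & 0x4000)
    mf = bool(word & 0x2000)
    offset_bytes = (word & 0x1FFF) * 8  # offset is stored in 8-byte units
    return reserved, df, mf, offset_bytes


def flag_anomalies(word: int) -> List[str]:
    """List the anomalies described in the slides for this flags word."""
    reserved, df, mf, _ = decode_flags(word)
    found = []
    if reserved:
        found.append("reserved bit set")
    if df and mf:
        found.append("DF and MF both set")
    return found


print(decode_flags(0x4000))    # (False, True, False, 0): DF only, the common case
print(decode_flags(0x20B9))    # (False, False, True, 1480): a middle fragment
print(flag_anomalies(0x6000))  # ['DF and MF both set']
print(flag_anomalies(0x8000))  # ['reserved bit set']
```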