Presentation is loading. Please wait.

Presentation is loading. Please wait.

11 Network Measurements Les Cottrell – SLAC Lecture # 3 presented at the Workshop on Scientific Information in the Digital Age: Access and Dissemination.

Similar presentations


Presentation on theme: "11 Network Measurements Les Cottrell – SLAC Lecture # 3 presented at the Workshop on Scientific Information in the Digital Age: Access and Dissemination."— Presentation transcript:

1 11 Network Measurements Les Cottrell – SLAC Lecture # 3 presented at the Workshop on Scientific Information in the Digital Age: Access and Dissemination ICTP, Trieste, Italy October, 2009 www.slac.stanford.edu/grp/scs/net/talk09/ictp-measure.ppt

2 2 Overview Why is measurement difficult yet important? LAN vs WAN SNMP Effects of measurement interval Passive Active –Tools Trouble shooting –Tools, how to find things & who to tell

3 3 Why is measurement difficult? Internet's evolution as a composition of independently developed and deployed protocols, technologies, and core applications Diversity, highly unpredictable, hard to find “invariants” Rapid evolution & change, no equilibrium so far –Findings may be out of date Measurement not high on vendors list of priorities –Resources/skill focus on more interesting an profitable issues –Tools lacking or inadequate –Implementations are flaky & not fully tested with new releases ISPs worried about providing access to core, making results public, & privacy issues The phone connection oriented model (Poisson distributions of session length etc.) does not work for Internet traffic (heavy tails, self similar behavior, multi-fractals etc.)

4 4 Why is measurement important? End users & network managers need to be able to identify & track problems Choosing an ISP, setting a realistic service level agreement, and verifying it is being met Choosing routes when more than one is available Setting expectations: –Deciding which links need upgrading –Deciding where to place collaboration components such as a regional computing center, software development –How well will an application work (e.g. VoIP)

5 5 LAN vs WAN Measuring the LAN –Network admin has control so: Can read MIBs from devices Can within limits passively sniff traffic Know the routes between devices –Manually for small networks –Automated for large networks Measuring the WAN –No admin control, unless you are an ISP Cant read information out of routers May not be able to sniff/trace traffic due to privacy/security concerns Don’t know route details between points, may change, not under your control, may be able to deduce some of it –So typically have to make do with what can be measured from end to end with very limited information from intermediates equipment hops.

6 6 SNMP ( Simple Network Management Protocol) Example of an Application, usually built on UDP Defacto standard for network management Created by IETF to address short term needs of TCP/IP Consists of: –Management Information Bases (MIBs) Store information about managed object (host, router, switch etc.) – system &status info, performance & configuration data –Remote Network Monitoring (RMON) is a management tool for passively watching line traffic –SNMP communication protocol to read out data and set parameters Polling protocol, manager asks questions & agent responds

7 7 SNMP Model NMS contains manager software to send & receive SNMP messages to Agents Agent is a software component residing on a managed node, responds to SNMP queries, performs updates & reports problems MIB resides on nodes and at NMS and is a logical description of all network management data. TCP/IP net Agent MIB Agent MIB Agent MIB Agent MIB Agent MIB Agent MIB Network Management Station(NMS)

8 8 SNMP Examples Using MRTG to display Router bits/s MIB variable CERN trans- Atlantic traffic

9 9 Averaging intervals Typical measurements of utilization are made for 5 minute intervals or longer in order not to create much impact. Interactive human interactions require second or sub-second response So it is interesting to see the difference between measurement made with different time frames.

10 10 Averages vs maxima Maximum of all 5 sec samples can be factor of 2 or more greater than the average over 5 minutes

11 11 Passive vs. Active Monitoring Active injects traffic on demand Passive watches things as they happen –Network device records information Packets, bytes, errors … kept in MIBs retrieved by SNMP –Devices (e.g. probe) capture/watch packets as they pass Router, switch, sniffer, host in promiscuous (tcpdump) Complementary to one another: –Passive: does not inject extra traffic, measures real traffic Polling to gather data generates traffic, also gathers large amounts of data –Active: provides explicit control on the generation of packets for measurement scenarios testing what you want, when you need it. Injects extra artificial traffic Can do both, e.g. start active measurement and look at passively

12 12 Passive tools SNMP Hardware probes: e.g. Sniffer, can be stand-alone or remotely access from a central management station Software probes: snoop, WireShark, tcpdump, require promiscous access to NIC card, i.e. root/sudo access Flow measurement: SFlow, OCxMon/CoralReef, Cisco/Netflow

13 13 Example : Passive site border monitoring Use Cisco Netflow in Catalyst 6509 on SLAC border Gather about 200MBytes/day of flow data The raw data records include source and destination addresses and ports, the protocol, packet, octet and flow counts, and start and end times of the flows –Much less detailed than saving headers of all packets, but good compromise –Top talkers history and daily (from & to), tlds, vlans, protocol and application utilization Use for network & security

14 14 E.g. SLAC Traffic by collabo ration site BNL (LHC ATLAS) IN2P3 CNAF MPI Last 2 weeks in May 2009 1.0 0.0 1.0 Gbits/s OUT IN

15 15 E.g. Top talkers by protocol Hostname MBytes/day (log scale) 100110000 Volume dominated by single Application - bbftp

16 16 Flow sizes Heavy tailed, in ~ out, UDP flows shorter than TCP, packet~bytes 75% TCP-in < 5kBytes, 75% TCP-out < 1.5kBytes (<10pkts) UDP 80% < 600Bytes (75% < 3 pkts), ~10 * more TCP than UDP Top UDP = AFS (>55%), Real(~25%), SNMP(~1.4%) Just 2 parameters power law slope & intercept characterize traffic flows SNMP Real A/V AFS file server

17 17 Flow lengths 60% of TCP flows less than 1 second Would expect TCP streams longer lived –But 60% of UDP flows over 10 seconds, maybe due to heavy use of AFS

18 18 Some Active Measurement Tools Ping connectivity, RTT, loss, jitter, reachability –flavors of ping, fping –but blocking & rate limiting Alternative synack, but can look like DoS attack Traceroute –How it works, what it provides –Reverse traceroute servers –Traceroute archives Combining ping & traceroute, –traceping, pingroute Pathchar, pchar, pipechar, bprobe etc. Iperf, netperf, ttcp, FTP …

19 19 Ping ICMP client/server application built on IP –Client send ICMP echo request, server sends reply –Server usually in kernel, so reliable & fast User can specify number of data bytes. Client puts timestamp in data bytes. Compares timestamp with time when echo comes back to get RTT Many flavors (e.g. fping) and options –packet length, number of tries, timeout, separation … Ping localhost (127.0.0.1) first, then gateway IP address etc. Type=8CodeChecksum 081631 IdentifierSequence number Optional data 24

20 20 Ping example syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com PING thumper.bellcore.com (128.96.41.1): 64 data bytes 72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms 72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms 72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms 72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms 72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms --- thumper.bellcore.com ping statistics --- 6 packets transmitted, 5 packets received, 16% packet loss round-trip min/avg/max = 482.1/880.5/1447.4 ms 30cottrell@pinger:~>ping www.whitehouse.gov PING e2561.g.akamaiedge.net (72.246.106.135) 56(84) bytes of data. 64 bytes from a72-246-106-135.deploy.akamaitechnologies.com (72.246.106.135): icmp_seq=0 ttl=54 time=1.51 ms Repeat count Packet sizeRemote host RTT Missing seq # Summary Names differ (why) Loss

21 How to Diagnose with Ping 1.to localhost (127.0.0.1), 2.ping to gateway (use route or traceroute ( tracert on Windows) to find gateway), 3.ping to well known host 4.& to relevant remote host –Use IP address to avoid nameserver problems –Look for connectivity, loss, RTT, jitter, dups –May need to run for a long time to see some pathologies (e.g. bursty loss due to DSL loss of sync) –Try flood pings if suspect rate limited –Use telnet- see if blocked; synack if ICMP blocked www-iepm.slac.stanford.edu/tools/synack/ 21

22 Main Ping Unreachable Messages ICMP Code Value Message SubtypeDescription 0/1 Network/host Unreachable The datagram could not be delivered to the network specified in the network ID portion of the IP address/specific host. Usually means a problem with routing but could also be caused by a bad address. 7 Destination Host Unknown The host specified is not known. This is usually generated by a router local to the destination host and usually means a bad address. 9/10 Communication with Destination Network/Host is Administratively Prohibited The source device is not allowed to send to the network where the destination device is located/is allowed to send to the network where the destination device is located, but not that particular device. 13 Communication Administratively Prohibited The datagram could not be forwarded due to filtering that blocks the message based on its contents. 22 Not ICMP but DNS not resolving name gives Unknown Host

23 Anomalies: 23 12cottrell@pinger:~>ping www.lbl.gov PING www.lbl.gov (128.3.41.105) 56(84) bytes of data. --- www.lbl.gov ping statistics --- 4 packets transmitted, 0 received, 100% packet loss, time 3018ms Yet: 15cottrell@pinger:~>synack -p 80 -k 5 www.lbl.govwww.lbl.gov SYN-ACK to www.lbl.gov (128.3.41.105), 5 Packets connected to www.lbl.gov : Seq = 0, RTT = 2.241 ms Also: 16cottrell@pinger:~>telnet www.lbl.gov 80 Trying 128.3.41.105... Connected to www.lbl.gov. Escape character is '^]'. 9cottrell@pinger:~>ping mail.unza.zm PING impala.unza.zm (196.46.196.4) 56(84) bytes of data. 64 bytes from 196.46.196.4: icmp_seq=0 ttl=38 time=404 ms Yet 10cottrell@pinger:~>ping impala.unzm.za ping: unknown host impala.unzm.za Confused DNS Pings blocked Try> ping 198.46.223.4 and ping www.cern.ch: explainwww.cern.ch

24 Ping from your own host to the world www-iepm.slac.stanford.edu/tools/pingworld –Linux: –Windows: Unless paranoid push Run on certificate warning 24

25 25 Traceroute UDP/ICMP tool to show route packets take from local to remote host 17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets 1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms 2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms 3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms 4 snv-slac.es.net (134.55.208.30) 1.377 ms 5 nyc-snv.es.net (134.55.205.22) 75.536 ms 6 nynap-nyc.es.net (134.55.208.146) 80.629 ms 7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms 8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms 9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms 10 207.45.205.18 (207.45.205.18) 128.648 ms 11 210.56.31.94 (210.56.31.94) 762.150 ms 12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms 13 * 14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms Probes/hop Max hops Remote host No response: Lost packet or router ignores

26 26 Traceroute technical details Rough traceroute algorithm ttl=1; #To 1 st router port=33434; #Starting UDP port while we haven’t got UDP port unreachable & ttl<max { send UDP packet to host:port with ttl get response if time exceeded note roundtrip time else if UDP port unreachable quit print output ttl++; port++ } Can appear as a port scan –SLAC gets about one complaint every 2 weeks.

27 27 Reverse traceroute servers Reverse traceroute server runs as CGI script in web server Allow measurement of route from other end. Important for asymmetric routes. See e.g. –www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.htmlwww.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html Also cities.lk.net/trlist.html#Listscities.lk.net/trlist.html#Lists Visual Traceroute server: visualroute.visualware.com/visualroute.visualware.com/ Map at www.caida.org/research/routing/reversetrace/www.caida.org/research/routing/reversetrace/

28 Warning Some Linux versions have bug that incorrectly IDs cksum error on MPLS links. Make Pkt length>=140 –Traceroute 140 28

29 29 Ping along the route Run traceroute, then ping each router n times –helps identify where in route the problems start to occur Routers may not respond to pings, or may treat pings directed at them, differently to other packets Get Matt’s TraceRoute MTR from http://www.bitwizard.nl/mtr/ or pathping (built into windows) http://www.bitwizard.nl/mtr/

30 How is my host doing? www.speedtest.net,alsowww.speedtest.net www.bandwidth-test.net For problem diagnosis also: –netspeed.stanford.edunetspeed.stanford.edu Special TCP kernel on server, Java on client –Up & down link speeds + IDs: Duplex mismatch, excessive loss from faulty cables, checks for middle boxes, FWs; needs Java on client Also hints on setting TCP buffer sizes 30

31 31 Path characterization Pathchar –sends multiple packets of varying sizes to each router along route –measures minimum response time –plot min RTT vs packet size to get bandwidth –calculate differences to get individual hop characteristics –measures for each hop: BW, queuing, delay/hop –can take a long time Pipechar (many derivatives) –Also sends back-to-back packets and measures separation on return –Much faster –Finds bottleneck Bottleneck Min spacing At bottleneck Spacing preserved On higher speed links

32 32 Network throughput Iperf (& thrulay, netperf, ttcp…) –Client generates & sends UDP or TCP packets –Server receives receives packets –Can select port, maximum window size, port, duration, Mbytes to send etc. –Client/server communicate packets seen etc. –Reports on throughput Requires sever to be installed at remote site, i.e. friendly administrators or logon account and password

33 33 Iperf example 25cottrell@flora06:~>iperf -p 5008 -w 512K -P 3 -c sunstats.cern.ch ------------------------------------------------------------ Client connecting to sunstats.cern.ch, TCP port 5008 TCP window size: 512 KByte ------------------------------------------------------------ [ 6] local 134.79.16.101 port 57582 connected with 192.65.185.20 port 5008 [ 5] local 134.79.16.101 port 57581 connected with 192.65.185.20 port 5008 [ 4] local 134.79.16.101 port 57580 connected with 192.65.185.20 port 5008 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec [ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec [ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec Total throughput =3*15.3Mbits/s = 45.9Mbits/s TCP port 5006Max window size 3 parallel streams Remote host

34 34 PingER Monitors >40 in 23 countriesPI – 1 @ ICTP, 4 in Africa, Algeria, Burkina Faso, South Africa, Zambia, Beacons ~ 90 Remote sites (~740) –50 African Countries –~ 99% of world’s population, >160 countries Measurements go back to Jan-95 Reports on RTT, loss, reachability, jitter, reorders, duplicates … Uses ubiquitous “ping”

35 35 PingER Methodology very Simple Internet 10 ping request packets each 30 mins Remote Host (typically a server) Monitoring host > ping remhost Ping response packets Measure Round Trip Time & Loss Data Repository @ SLAC Once a Day Uses ubiquitous ping

36 Example PingER Output ICTP>Kenya Uses Smokeping –Blue median RTT, background color = loss –Smokiness = jitter 36 Median RTT drops 780ms to 225ms, i.e. cut by 2/3rds (3.5 times improvement)

37 37 Trouble shooting Ping to localhost, ping to gateway & to remote host –Use IP address to avoid nameserver problems –Look for connectivity, loss & RTT –May need to run for a long time to see some pathologies (e.g. bursty loss dues to DSL loss of sync) –Use synack if ICMP blocked Traceroute to remote host Reverse traceroute from remote host to you Ping routers along route (pingroute helps) Look at history plots (PingER, AMP, Surveyor), when did problem start, how big an effect is it? Look at own connectivity NDT (netspeed.stanford.edu)netspeed.stanford.edu

38 38 “Where is” a host – cont. Find the Autonomous System (AS) administering –Use reverse traceroute server with AS identification, e.g.: www.slac.stanford.edu/cgi-bin/nph-traceroute.pl … 14 lhr.comsats.net.pk (210.56.16.10) [AS7590 - COMSATS] 711 ms (ttl=242) –Get contacts for ISPs (if know ISP or AS): http://puck.nether.net/netops/nocs.cgi Gives ISP name, web page, phone number, email, hours etc. –Review list of AS's ordered by Upstream AS Adjacency www.telstra.net/ops/bgp/bgp-as-upsstm.txt Tells what AS is upstream of an ISP –Look at real-time information about the global routing system from the perspectives of several different locations around the Internet Use route views at www.antc.uoregon.edu/route-views/www.antc.uoregon.edu/route-views/

39 39 “Where is” a host Look at name (use dig or nslookup if have IP addr) 47cottrell@netflow:~>nslookup 210.56.16.10 Server: localhost Address: 127.0.0.1 Name: lhr.comsats.net.pk Address: 210.56.16.10 –Look at TLD and use www.iana.org/domain-names.htmwww.iana.org/domain-names.htm –Sometimes name (e.g. construct.haifa.ac.il)gives hint Use www-wanmon.slac.stanford.edu/cgi-bin/nph- traceroute.pl to traceroute to www.ictp.it, what cities does the route go thru?www-wanmon.slac.stanford.edu/cgi-bin/nph- traceroute.plwww.ictp.it Visit site’s www server, often location in home page Use whois, e.g. www.allwhois.comwww.allwhois.com May be able to get lat & long (GeoIPTool): –www.geoiptool.com/ or via: geotool.flagfox.net/www.geoiptool.com/geotool.flagfox.net/

40 Who are you gonna tell Local network support people Internet Service Provider (ISP) usually done by local networker –Usually will know immediate one, e.g. trouble@es.net –Use puck.nether.net/netops/nocs.cgi to find ISPpuck.nether.net/netops/nocs.cgi –Use www.telstra.net/ops/bgp/bgp-as-upsstm.txt to find upstream ISPswww.telstra.net/ops/bgp/bgp-as-upsstm.txt Well managed sites and ISPs maintain a list of email addresses such as abuse@ or postmaster@, that one can send email to, for example to complain about spam etc. –This follows an Internet recommendation (RFC 2142).RFC 2142 –Some less helpful sites do not provide such services, for more on these, see RFC-ignorant.orgRFC-ignorant.org Give them the ping & traceroute results 40

41 41 More Information Tutorial on monitoring (getting a bit dusty) –www.slac.stanford.edu/comp/net/wan-mon/tutorial.htmlwww.slac.stanford.edu/comp/net/wan-mon/tutorial.html RFC 2151 on Internet tools –www.freesoft.org/CIE/RFC/Orig/rfc2151.txtwww.freesoft.org/CIE/RFC/Orig/rfc2151.txt Network monitoring tools –www.slac.stanford.edu/xorg/nmtf/nmtf-tools.htmlwww.slac.stanford.edu/xorg/nmtf/nmtf-tools.html Ping –http://www.ping127001.com/pingpage.htmhttp://www.ping127001.com/pingpage.htm IEPM/PingER home site –www-iepm.slac.stanford.edu/pingerwww-iepm.slac.stanford.edu/pinger IEEE Communications, May 2000, Vol 38, No 5, pp 130-136

42 IP Addresses pingable June 2003 42 Grey= not allocated Black= not pingable Companies own class A

43 Growth 2003-2006 43 June 2003 Nov 2006 More areas allocated, Existing areas more colorful

44 44 Not your normal Internet site Ames IXP: approximately 60-65% was HTTP, about 13% was NNTP Uwisc: 34% HTTP, 24% FTP, 13% Napster

45 45 SNMP version 1 limitations Authentication is inadequate: –Password (community string) placed in clear in SNMP messages MIB variables must be polled separately, i.e. entire MIB cannot be fetched with single command SNMPv2 and v3 attempt to address these and other limitations Despite limitations, SNMP has been a big success –Provides device and link utilization (byte, packets) and errors –Lot of facilities/tools built around SNMP to provide reports for sites –Security concerns limit access typically to very limited set of owner/admins E.g. ISPs won’t let you poll their devices

46 46 Lot of heavy FTP activity The difference depends on traffic Only 20% difference in max & average

47 47 Simplified SLAC DMZ Network, 2001 swh-root Swh-dmz rtr-msfc-dmz slac-rt1.es.net ESnet Stanford Internet2 OC12 link 622Mbps 155Mbps OC3 link(*) SLAC Internal Network (*) Upgrade to OC12 has been requested (#) This link will be replaced with a OC48 POS card for the 6500 when available Etherchannel 4 gbps 10Mbps Ethernet 1Gbps Ethernet NTON 2.4Gbps OC48 link (#) 100Mbps Ethernet Dial up &ISDN

48 48 Time series UDP TCP Outgoing Incoming Cat 4000 802.1q vs. ISL

49 49 Flow lengths Distribution of netflow lengths for SLAC border –Log-log plots, linear trendline = power law –Netflow ties off flows after 30 minutes –TCP, UDP & ICMP “flows” are ~log-log linear for longer (hundreds to 1500 seconds) flows (heavy-tails) –There are some peaks in TCP distributions, timeouts? Web server CGI script timeouts (300s), TCP connection establishment (default 75s), TIME_WAIT (default 240s), tcp_fin_wait (default 675s) TCPUDP ICMP

50 50 Power law fit parameters by time Just 2 parameters provide a reasonable description of the flow size distributions

51 51 Surveyor & RIPE, NIMI Surveyor & RIPE use dedicated PCs with GPS clocks for synchronization –Measure 1 way delays and losses –Surveyor mainly for Internet 2 –RIPE mainly for European ISPs NIMI (National Internet Measurement Infrastructure) more of an infrastructure for measurements and some tools (I.e. currently does not have public available data,regularly updated) –Mainly full mesh measurements on demand

52 52 Skitter Makes ping & route measurements to tens of thousands of sites around the world. Site selection varies based on web site hits. –Provide loss & RTTs –Skitter & PingER are main 2 sites to monitor developing world.

53 53 Active Measurement Projects PingER Amp Surveyor, RIPE NIMI NWS Skitter All projects measure routes For a detailed comparison see: –www.slac.stanford.edu/comp/net/wan-mon/iepm-cf.htmlwww.slac.stanford.edu/comp/net/wan-mon/iepm-cf.html

54 54 PingER and AMP Both use ping to get RTT, loss etc. PingER 30 monitor sites, ~600 remote sites, 72 countries, > 3000 monitor-remote site pairs –Mainly for HENP, ESnet & XIWT http://amp.nlanr.net/AMP/ –AMP uses dedicated PCs as monitors, ~ 115 (June, 2001) –Oriented to Internet 2, ~ 10 countries –Does mainly full mesh pinging

55 55 PingER cont. Monitor timestamps and sends ping to remote site at regular intervals (typically about every 30 minutes) Remote site echoes the ping back Monitor notes current and send time and gets RTT Discussing installing monitor site in Pakistan –provide real experience of using techniques –get real measurements to set expectations, identify problem areas, make recommendations –provide access to data for developing new analysis techniques, for statisticians etc.

56 56 Trouble shooting Try user application Iperf to test throughput

57 57 Utilization with different averaging times Same data, measured Mbits/s every 5 secs Average over different time intervals Does not get a lot smoother May indicate multi-fractal behavior 5 secs 5 mins 1 hour


Download ppt "11 Network Measurements Les Cottrell – SLAC Lecture # 3 presented at the Workshop on Scientific Information in the Digital Age: Access and Dissemination."

Similar presentations


Ads by Google