Internet Monitoring - Results

Presentation transcript:

Internet Monitoring - Results Les Cottrell SLAC <cottrell@slac.stanford.edu> Presented at the ICFA Meeting, CERN, Mar 1998 Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM) 3/4/98 \\pcbackup\users\cottrell\icfa\icfa-mar98.ppt

Outline of Talk: What, why and how are we (the ESnet/HENP community) measuring? What PingER measurement reports are available and what do they show: short, intermediate and long term reports; grouping and multi-site visualization. Traffic volume and traceroute measurements. Summary: deployment/development, Internet performance, next steps. Collaborations: NIMI, IPWT. I won't talk about the actual tools and will only briefly cover the method (they were covered in Dave Martin's presentation). I will mainly dwell on the long term trend reports and on how we use the results of the tools to better understand the Internet.

Why go to the effort? The apparent quality of the Internet is getting worse as its size and demands increase. The Internet is woefully under-measured and under-instrumented. The Internet is very diverse: no single path is typical. Users need: realistic expectations and planning information; guidelines for setting and validating SLAs; information to help in identifying problems; help to decide where to apply resources. Demands are driven by: the increase in the number of users; the increase in power available at the desktop and in servers; newer applications (more graphics based, video, voice etc.); the need for better QoS.

Importance of Response Time: Time is the scarcest and most valuable commodity. Studies in the late 1970s and early 1980s showed the economic value of rapid response time: 0-0.4 s, high productivity interactive response; 0.4-2 s, fully interactive regime; 2-12 s, sporadically interactive regime; 12-600 s, break in contact regime; > 600 s, batch regime. There is a threshold around 4-5 s where complaints increase rapidly. Note that the TCP/IP timeout caused by a packet loss is of the order of 4-5 seconds. For some newer Internet applications there are other thresholds; for example, for voice a threshold appears at about 100 ms: above that point the delay causes difficulty for people trying to have a conversation, and frustration grows. Also see: http://rescomp.stanford.edu/~cheshire/papers/LatencyQuest.html
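The response-time regimes listed on this slide can be expressed as a small lookup. This is only an illustrative sketch: the function name `response_regime` is hypothetical, and treating each upper bound as inclusive is an assumption, since the slide does not say which regime a boundary value belongs to.

```python
# Regime boundaries (upper limit in seconds) as listed on the slide.
# Assumption: a value exactly on a boundary falls in the faster regime.
REGIMES = [
    (0.4, "high productivity interactive response"),
    (2.0, "fully interactive regime"),
    (12.0, "sporadically interactive regime"),
    (600.0, "break in contact regime"),
]


def response_regime(seconds):
    """Return the slide's regime label for a response time in seconds."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    for upper_limit, label in REGIMES:
        if seconds <= upper_limit:
            return label
    return "batch regime"


if __name__ == "__main__":
    # A 4-5 s TCP retry timeout lands in the sporadically interactive regime.
    print(response_regime(4.5))
```

With these boundaries, a single lost packet that triggers a 4-5 second timeout moves an otherwise interactive session out of the fully interactive regime.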

Perception of Poor Packet Loss: Above 4-6% packet loss, video conferencing becomes irritating and non-native language speakers become unable to communicate. The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows. Above 10-12% packet loss there is an unacceptable level of back-to-back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable. The scarcest and most valuable commodity is time: studies in the late 1970s and early 1980s by Walt Doherty of IBM and others showed the economic value of rapid response time, with a threshold around 4-5 s where complaints increase rapidly and, for voice, a threshold at about 100 ms above which conversation becomes difficult.

Our Main Metric is Ping: "Universally available" and easy to understand; no software for clients to install. Low network impact. Provides useful real world measures of loss, response time, reachability and unpredictability. Avoid pinging routers: they drop pings addressed to the router when busy. Prefer lightly loaded or consistently loaded hosts, e.g. a name server or mail gateway.

Ping Response vs Web Response 1/2 (plot axes: HTTP GET response (ms) vs minimum ping response (ms)): R**2 ~ 0.6, i.e. 60% of the variation in the GET response can be explained by the ping response. More importantly, there is a lower limit around y=2*x. This is related to the GET taking 2 round trips (SYN/ACK and GET/response) versus ping taking a single round trip. The lower limit shows that, given the minimum ping response, one can get an idea of the best possible Web response.
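As a rough illustration of the y=2*x lower limit, the sketch below estimates the best possible GET response from a minimum ping time. The function name `best_case_get_ms` is hypothetical, and the estimate ignores server processing and transfer time, so it is a floor rather than a prediction.

```python
def best_case_get_ms(min_ping_ms, round_trips=2):
    """Lower bound on HTTP GET response time from the minimum ping RTT.

    The default of 2 round trips follows the slide: one for SYN/ACK and
    one for GET/response. Server time and payload size are ignored.
    """
    if min_ping_ms < 0:
        raise ValueError("ping time cannot be negative")
    return round_trips * min_ping_ms


if __name__ == "__main__":
    # A path with a 150 ms minimum ping cannot serve a GET in under ~300 ms.
    print(best_case_get_ms(150))
```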

Ping Response vs Web Response 2/2: Interquartile distance is ~ 250 ms (green lines show quartiles). FWHM ~ 120 ms.

Ranked packet loss for 3 months (plot labels: Stanford, Rome, UK, Cincinnati): Note that the X axis scale changes; the plots show ESnet sites are much better for SLAC. Note the big variations from month to month. The poor U of Cincinnati performance was a cause for concern since SLAC has a strong collaboration with them; we worked with various people to try to improve it. The appearance of SLAC as the worst in November is an anomaly caused by problems on one particular day between the monitoring host and the host being monitored. The difference between the U of Colorado to SLAC link and the Colorado State link is quite evident: U Colorado has a vBNS connection with good peering to ESnet, but Colorado State does not. Poor performance between SLAC and the Stanford U Medical Center (separated by 2 miles) was due to poor connectivity at MAE-West; this has now been bypassed with a microwave link. The bad connectivity to Rome is not reflected by other Italian sites. UK sites (RL and Glasgow) have worse performance than that between SLAC and Beijing or SLAC and Novosibirsk.

Sawtooth Effect (plot annotations: 2 * capacity (+ 2 Mbps); added 45 Mbps (quadrupled capacity); 3 * capacity + 9 Mbps; holidays): Adding extra capacity to the UK-US link in April 96, Feb 97 and Jul 97 improves (reduces) packet loss, but the slack is soon taken up. There is also a distinct dip around the New Year holiday season. The TEN-34 link improved access to Europe and mirror sites; also at this time the ANS/Sprint link balancing was improved. We recently added a second UK site (University of Glasgow) to ensure that the effects are not unique to RL (pair-wise comparison).

RAL Last 180 Days plot: Lines are simply cubic spline fits to aid the eye. Upper green and black points are response time in ms. Red and blue are weekday loss; cyan is weekend loss. Note the weekend/weekday differences (cyan vs blue). Note the Xmas/New Year lull. Also note the quick onset of saturation at the end of August and in September.

Italian sites look similar to each other

Representative International HENP Site Loss, Jan-95 thru Nov-97: Note that RL (UK) saw-tooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97). This indicates the importance of keeping a log of what happened on routes, possibly via regular pathchars, or at least traceroutes.

Aggregation: Group measurements, for example: by area (e.g. N. America E, N. America W, W. Europe/Japan, others) or by country; by trans-oceanic or intercontinental links; by separation, e.g. number of hops, time zones crossed, IXPs crossed; by ISP (ESnet, vBNS/I2, ...); by monitoring site; one site seen from multiple sites; by common interest/affiliation (XIWT, HENP ...); user selectable.

Group Selection (all sites monitoring CERN): Select one of these groups: CMU, CNAF, RL, FNAL, SLAC, DESY, Carleton, RMKI, CERN, KEK. Allows the user to select which group of links (out of > 500) to display results for. Note that some collection sites ping multiple hosts at a given site, which provides checks for consistency.

Group Response Time, Jan-95 thru Nov-97: Improved between 1 and 2.5% / month. Response and loss show similar improvements. Take care with new sites. Measurements are for prime time (7am - 7pm weekdays) as seen from SLAC. The increase in international response was caused by the addition of IHEP, Novosibirsk and FZU (in CZ). If we remove these additions we get just under 1% improvement/month (i.e. pretty much like the others). This points out the need to examine results for biases.

Network Quiescence: The frequency of zero packet loss (for all time, not cut on prime time). E.g. a network that is busy 8 working hours per weekday and quiescent otherwise would have a quiescence of about 75%, i.e. (total hrs/wk - 5 weekdays/wk * 8 hrs/day) / total hrs/wk. It is clear that connectivity between SLAC and ESnet sites is best. This is a bit similar to the phone companies' idea of error free seconds, except that it is a frequency rather than a number.
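The arithmetic behind the ~75% figure, and one way the metric might be computed from loss samples, can be sketched as follows. Both function names are hypothetical, and the assumption that each sample is a packet-loss percentage for one measurement interval is ours, not the slide's.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def expected_quiescence(busy_hours_per_day=8, busy_days_per_week=5):
    """Quiescence of an idealized network that loses packets only when busy."""
    busy_hours = busy_hours_per_day * busy_days_per_week
    return (HOURS_PER_WEEK - busy_hours) / HOURS_PER_WEEK


def measured_quiescence(loss_samples):
    """Fraction of samples (loss percentages) that show zero packet loss."""
    if not loss_samples:
        raise ValueError("need at least one sample")
    zero_loss = sum(1 for loss in loss_samples if loss == 0)
    return zero_loss / len(loss_samples)


if __name__ == "__main__":
    # (168 - 40) / 168 = 0.762, which the slide rounds to ~75%.
    print(round(expected_quiescence(), 3))
    print(measured_quiescence([0, 0, 1.5, 0]))
```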

Ping Loss Quality: We want a quick-to-grasp indicator of link quality. Loss is the most sensitive indicator: loss of a packet requires a ~ 4 sec TCP retry timeout. Studies on the economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase. Quality levels: 0-1% = Good; 1-2.5% = Acceptable; 2.5-5% = Poor; 5-12% = Very Poor; > 12% = Bad.
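The quality levels above map directly onto a threshold function. The sketch below is illustrative only: `loss_quality` is a hypothetical name, and treating each upper bound as inclusive is an assumption chosen to be consistent with "> 12% = Bad".

```python
def loss_quality(loss_percent):
    """Map a packet loss percentage to the slide's quality label.

    Assumption: upper bounds are inclusive, so exactly 12% is "Very Poor".
    """
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss must be a percentage between 0 and 100")
    if loss_percent <= 1:
        return "Good"
    if loss_percent <= 2.5:
        return "Acceptable"
    if loss_percent <= 5:
        return "Poor"
    if loss_percent <= 12:
        return "Very Poor"
    return "Bad"


if __name__ == "__main__":
    for loss in (0.3, 2.0, 4.0, 8.0, 20.0):
        print(loss, loss_quality(loss))
```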

Quality Distributions: The ESnet median is good quality; all other groups are poor or very poor. It is critical to have good peering. The poor performance of non-ESnet sites (seen from SLAC) is due to poor performance as traffic traverses interchanges between ESnet and the rest of the Internet.

Multi Collection Site Visualization (table labels: Collection Sites; Remote Sites; Median ping loss on link; Remote sites ordered by number of collection sites that ...): Can select: grouping by site, by TLD or by continent; which metric to display (loss, response, quiescence, unpredictability, unreachability); which month's data to view. Placing the mouse over the ? in each box provides the number of links included in the data.

Intercontinental Grouping (Loss): Move the mouse over the ? to see the number of links. Looks pretty bad for intercontinental use.

Top Level Domain Grouping (Loss): Mouseover on the red dots gives more information on the TLD (e.g. ch=Switzerland). Diagonals are within a TLD.

TLD (Response Time)

Grouping Details: Select metric. Select group. Sort. Color for quality. Also provides Excel for DIY at bottom.

Recent Transoceanic trends

By Monitoring Site

CERN Monitoring TLDs

ESnet bytes accepted by site for Jan '98 (plot labels: Exchanges, LBL/ESnet): After eliminating the exchanges, LBL (5 Mbps averaged over the month) and ESnet, the top 10 are: LANL, LLNL, BNL, FNAL, DOE1, ORNL, CEBAF, SLAC, PNL, GA.

US HENP Traffic Growth: Exponential growth of 3-6% per month. Measuring this needs SNMP access to the router, i.e. administrative rights, so it is either done for a site's external router links or done by the ISP (in this case ESnet). In some controlled cases, e.g. the CERN transatlantic link, one can look at the traffic carried and note the relation between it and the bandwidth available at the bottleneck when congestion appears. LBL 6% growth, BNL 5.4% growth, SLAC 4.8% growth, ANL 4.1% growth, FNAL 3.2% growth, CEBAF 3.1% growth. ANL traffic is probably going directly into the ATM cloud and so is not being measured at the router. Note the CEBAF growth as it turns on.
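To put monthly growth rates of this size in perspective, the sketch below converts a compound monthly rate into a doubling time and an annual growth factor. The function names are hypothetical, and the calculation assumes the growth compounds steadily every month, which real traffic will only approximate.

```python
import math


def doubling_time_months(monthly_growth):
    """Months for traffic to double at a steady compound monthly growth rate."""
    if monthly_growth <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + monthly_growth)


def annual_factor(monthly_growth):
    """Multiplicative growth over 12 months of compound monthly growth."""
    return (1 + monthly_growth) ** 12


if __name__ == "__main__":
    for rate in (0.03, 0.06):
        print(f"{rate:.0%}/month: doubles in {doubling_time_months(rate):.1f} months, "
              f"x{annual_factor(rate):.2f} per year")
```

Under that assumption, 6% per month roughly doubles traffic in a year, while 3% per month takes close to two years.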

Multi Router Traffic Grapher (MRTG): CERN-US E1 (2 Mbps) link; a 2nd 2 Mbps link was added. Can see weekend vs weekday utilization differences. This link is heavily used; the other E1 link is more lightly used. They will balance better when the two US ends are colocated. The monthly peak/average ratio for CERN (cgate1) is about 3 to 4; for SLAC it is about 16. The peak/average ratio may be useful for indicating link congestion. It is useful to compare peaks with the capacity available. One can compare the monthly average with the ESnet monthly octets accepted from sites. Need a summary of how often the link is close to full utilization.

Traffic Volume for Germany (DFN): DFN T1 utilization for 15 Jan '98 (5 min averages; green = to US, blue = from US), and number of 2 min periods in Dec-96 with peak utilization > y % (axis: # samples; curves: from US, to US). The upper graph, from MRTG, shows the line is reasonably loaded. The lower graph gives an idea of how often the line is at 100% utilization etc.: it shows the number of 2 minute intervals in a month when the link was observed to be busier than x% for a 2 minute period. The area under the curve is an indicator of how busy/saturated the link is. Note the traffic imbalance, typically a factor of 2-3x going out as coming in; also the loads are not typically during U.S. prime hours (they are sucking us dry).
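The lower graph is in effect an exceedance count: for each threshold x, how many sampling intervals had utilization above x%. A minimal sketch of that calculation follows; the function name `exceedance_counts` and the 10% threshold spacing are assumptions for illustration, and the sample values are made up.

```python
def exceedance_counts(utilization_percent, thresholds=range(0, 101, 10)):
    """Count the samples whose utilization is strictly above each threshold.

    utilization_percent: one utilization value (0-100) per sampling interval,
    e.g. one per 2 minute period over a month.
    """
    samples = list(utilization_percent)
    return {t: sum(1 for u in samples if u > t) for t in thresholds}


if __name__ == "__main__":
    # Hypothetical utilization samples, not measured data.
    samples = [5, 50, 95, 100, 72, 30]
    for threshold, count in exceedance_counts(samples).items():
        print(f"> {threshold:3d}%: {count}")
```

A link that is rarely saturated shows counts that fall away quickly as the threshold rises; a saturated link keeps high counts close to 100%.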

Capacity/Load Ratios: Looking at the ratio of link capacity to average load, most ESnet links show ratios of a few to several tens. The international links (CERN-Perryman (~4), DFN (~5), Italy (~4), KEK (~10), Canada (15)) show ratios of 4-15. The worst link appears to be the MAE-W-ESnet link, with a ratio of about 1.5; however, this may not be the bottleneck link. For shared networks without some form of quality of service guarantees, one way to ensure excellent performance is to over-provision the links so they have sufficient capacity to handle instantaneous loads.
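The ratio itself is simple to compute once capacity and average load are known in the same units. In the sketch below the function name `capacity_load_ratio` is hypothetical and the example numbers are illustrative, not taken from the measurements.

```python
def capacity_load_ratio(capacity_mbps, average_load_mbps):
    """Ratio of link capacity to average carried load (same units for both)."""
    if capacity_mbps <= 0 or average_load_mbps <= 0:
        raise ValueError("capacity and load must be positive")
    return capacity_mbps / average_load_mbps


if __name__ == "__main__":
    # Illustrative only: a 2 Mbps link carrying 0.5 Mbps on average.
    print(capacity_load_ratio(2.0, 0.5))
```

A ratio near 1 means the average load alone nearly fills the link, leaving no headroom for instantaneous peaks.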

Bottleneck Identification: Traceroute from/to multiple sites can identify common path segments in the maps. One can see the onset of losses with traceping. Pathchar can identify bottlenecks. Then need to work on: avoiding bottlenecks (new peering); getting bottleneck owners to improve. This is difficult: there are lots of potential bottlenecks, bottlenecks move, and they are not under our control.

TracePing (Oxford): Runs traceroute to remote sites each hour, then pings along the route. Archives the data. Can see route changes (short and long term) and the onset of problems in time and space. Multiple routes are seen. Written by John MacAllister of Oxford U. Being converted from VMS/DCL to Windows/Perl.

Traceroute: Reverse traceroute servers. Traceping. TopologyMap (from TRIUMF): ellipses show nodes on the route; an open ellipse is a measurement node; a blue ellipse is not reachable. Keeping history.

GUI Traceroute (e.g. VisualRoute)

Summary: Deployment and Development. ESnet/HENP has 14 collection sites in 8 countries collecting data on > 500 links involving 22 countries. XIWT/IPWT has deployed ~ 10 collection sites using the PingER tools. 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis. HEPNRC is gathering and archiving; long term reports are being ported to HEPNRC from SLAC. Long term analysis today usually requires a tool like SAS. XIWT/IPWT want to: measure the performance of members' own networks; get tests to validate and understand what to recommend to other commercial customers and for what purposes; build a community within XIWT so it can evolve to address harder issues. They have chosen the PingER tools for deployment. Collection sites (Mar-98): West Group, Bell South (2), Digital (2), HP, Intel, Hughes, NIST, SBC. They are looking for an analysis/archive site. SAS/Oracle can cost several tens of thousands of dollars; indexing is needed for rapid lookup; it is usually a bit of overkill as an analysis tool (we don't use much in the way of sophisticated statistics).

Summary: Internet Performance. Performance within ESnet is good. Performance between ESnet and other sites is poor to very poor on average; one of the main causes is congestion points, so peering is critical. Intercontinental performance is very poor to bad. ESnet traffic accepted from major HENP labs is growing by 3-6% per month. Response time is improving by 1-2% / month. Packet loss between SLAC and other sites is improving by 3% / month.

Summary: Internet Performance (continued). Links to sites outside N. America vary from good (KEK) to bad. Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic; there are some surprises, such as the UK. CERN, France and Germany are acceptable to poor. Development: provide monthly summary tables with lots of statistical measures to allow faster generation of long term reports, and more robust metrics; extend grouping, e.g. by AS, country, time zones crossed, more geographic regions, user selectable, by experiment, by community, by collection site; summaries (c.f. Weather Map, top 10s, weekly, Consumer Reports).

Summary: Next Steps. Improve tools: make long term reports at the analysis site available and understandable; look into prediction (extrapolations, develop models, configure and validate with data). Pursue IETF Surveyor and NIMI deployment. Need consistent measurements of link loading, e.g. MRTG at end sites' external routers, and also at ATM switches and internal (e.g. ESnet) routers. Need to know the capacity of the link being monitored. Is anyone interested in passive measurements, i.e. measuring the performance of real traffic? These would need to be correlated with other measurements. Need thruput measures and correlations to simpler measures such as ping response or packet loss, e.g. via NIMI.

National Internet Measurement Infrastructure (NIMI): A secure, scalable infrastructure for scheduling monitoring and gathering data, with a minimal amount of human intervention. Inexpensive probe built on a PC FreeBSD platform. Dynamic: can add/modify measurement suites, which initially include Traceroute, TReno (measures bulk transfer thruput) and Poip (one way ping). Based on Vern Paxson's NPD work, which ran at 30 sites in 1994/1995. Security uses public key pairs for authentication and encryption. By design, control is decentralized and configuration and maintenance are simple. FNAL claim it took a couple of hours to install the software; after that, folks from PSC administer it by remote control. The hardware is cheap ($2-3K) for a 200MHz, 64MB, 4GB PC with Ethernet, modem and optional GPS. A standard dedicated platform reduces concerns about biases caused by server loading. PSC, LBNL and FNAL are in place, SLAC's is being configured, and we are working with CERN (CH), RAL (UK) and KEK (JP). We want to get NIMIs placed at strategic network points to get a better idea of overall network performance.

Asymmetric One-way Delays (plot panels: Advanced to U Chicago and U Chicago to Advanced; loss axis 0% to 20%, delay axis 0 ms to 300 ms): N.b. one way response time is very important for voice: it needs to be better than 100 msec or people start stepping into one another's conversations. PC hardware with GPS is located at ANS and 23 CSG partner sites. It measures one way loss and response time using clock synchronization, with metrics defined by IETF/IPPM. 8 sites are now operational, monitoring 56 paths ((N-1)*N). Results show there can be big asymmetries (asymmetric loading and routing). They are willing to deploy (at their cost) at 5 DOE sites. For more see http://www.advanced.org/csg-ippm/
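Because one-way measurements treat A to B and B to A as separate paths, a full mesh of N sites has (N-1)*N directed paths. The short sketch below checks the 56-path figure for 8 sites; the function names and the placeholder site names are hypothetical.

```python
from itertools import permutations


def directed_path_count(site_count):
    """Number of one-way (source, destination) paths in a full mesh."""
    if site_count < 0:
        raise ValueError("site count cannot be negative")
    return site_count * (site_count - 1)


def directed_paths(sites):
    """All ordered (source, destination) pairs of distinct sites."""
    return list(permutations(sites, 2))


if __name__ == "__main__":
    sites = [f"site{i}" for i in range(1, 9)]  # placeholder names
    print(directed_path_count(len(sites)), len(directed_paths(sites)))
```

The quadratic growth is worth keeping in mind when planning deployment: a full mesh over all 24 locations mentioned (ANS plus 23 partner sites) would be 552 one-way paths.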

NIMI: Deployed at PSC, LBL and FNAL; platforms are being configured at SLAC and CERN. As NIMI becomes more real we will start to use it as infrastructure for IPPM Surveyors. Security allows full policy control over any box you own, or delegation of all or subsets; it uses ACLs with authentication for requests, and encryption to prevent sniffing. Host identification is accomplished through the use of public key/private key technology. Authentication and encryption use the RSA reference library. They are looking at additional security options to better support its use outside the U.S.: they can provide 2 distributions, one with full security and one with none, and are looking at possible support for 40 bit keys (crackable in 2 hours on a PC, but the session is probably over by then, and a new key is then used), or in early deployment simply turning off encryption.

Summary: Lots of collaboration. SLAC & HEPNRC; 14 collection sites, ~ 400 remote sites; collection site tools; CERN & CNAF/ICFA; Oxford/TracePing; MapPing/MAPNet/NLANR; TRIUMF Traceroute topology map; NIMI/LBNL & Surveyor/IETF; XIWT/IPWT. Talks at IETF, XIWT, ICFA, ESCC ...

More Information: ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code): http://www.slac.stanford.edu/xorg/icfa/ntf/home.html. WAN Monitoring at SLAC (has lots of links): http://www.slac.stanford.edu/comp/net/wan-mon.html. Tutorial on WAN Monitoring: http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html. MapPing tool: http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html. NIMI: http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html