High Performance Network Monitoring for UltraLight

Slides:



Advertisements
Similar presentations
Pathload A measurement tool for end-to-end available bandwidth Manish Jain, Univ-Delaware Constantinos Dovrolis, Univ-Delaware Sigcomm 02.
Advertisements

BZUPAGES.COM 1 User Datagram Protocol - UDP RFC 768, Protocol 17 Provides unreliable, connectionless on top of IP Minimal overhead, high performance –No.
Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers Tsung-i (Mark) Huang Jaspal Subhlok University of Houston GAN ’ 05 / May 10, 2005.
QoS Solutions Confidential 2010 NetQuality Analyzer and QPerf.
1 Traceanal: a tool for analyzing and representing traceroutes Les Cottrell, Connie Logg, Ruchi Gupta, Jiri Navratil SLAC, for the E2Epi BOF, Columbus.
1 Correlating Internet Performance & Route Changes to Assist in Trouble- shooting from an End-user Perspective Les Cottrell, Connie Logg, Jiri Navratil.
1 SLAC Internet Measurement Data Les Cottrell, Jerrod Williams, Connie Logg, Paola Grosso SLAC, for the ISMA Workshop, SDSC June,
1 Evaluation of Techniques to Detect Significant Performance Problems using End-to-end Active Network Measurements Les Cottrell, SLAC 2006 IEEE/IFIP Network.
Network Traffic Measurement and Modeling CSCI 780, Fall 2005.
1 Tools for High Performance Network Monitoring Les Cottrell, Presented at the Internet2 Fall members Meeting, Philadelphia, Sep
Internet Bandwidth Measurement Techniques Muhammad Ali Dec 17 th 2005.
Sven Ubik, CESNET TNC2004, Rhodos, 9 June 2004 Performance monitoring of high-speed networks from NREN perspective.
Network Performance Measurement Atlas Tier 2 Meeting at BNL December Joe Metzger
Network Monitoring School of Electronics and Information Kyung Hee University. Choong Seon HONG Selected from ICAT 2003 Material of James W. K. Hong.
KEK Network Qi Fazhi KEK SW L2/L3 Switch for outside connections Central L2/L3 Switch A Netscreen Firewall Super Sinet Router 10GbE 2 x GbE IDS.
PingER: Research Opportunities and Trends R. Les Cottrell, SLAC University of Malaya.
workshop eugene, oregon What is network management? System & Service monitoring  Reachability, availability Resource measurement/monitoring.
1 Using Netflow data for forecasting Les Cottrell SLAC and Fawad Nazir NIIT, Presented at the CHEP06 Meeting, Mumbai India, February
Comparison of Public End-to-End Bandwidth Estimation tools on High- Speed Links Alok Shriram, Margaret Murray, Young Hyun, Nevil Brownlee, Andre Broido,
Integration of AMP & Tracenol By: Qasim Bilal Lone.
1 ESnet/HENP Active Internet End-to-end Performance & ESnet/University performance Les Cottrell – SLAC Presented at the ESSC meeting Albuquerque, August.
1 Overview of IEPM-BW - Bandwidth Testing of Bulk Data Transfer Tools Connie Logg & Les Cottrell – SLAC/Stanford University Presented at the Internet 2.
IEPM-BW: Bandwidth Change Detection and Traceroute Analysis and Visualization Connie Logg, Joint Techs Workshop February 4-9, 2006.
1 Network Measurement Summary ESCC, Feb Joe Metzger ESnet Engineering Group Lawrence Berkeley National Laboratory.
1 High Performance Network Monitoring Challenges for Grids Les Cottrell, SLAC Presented at the International Symposium on Grid Computing 2006, Taiwan
Internet Connectivity and Performance for the HEP Community. Presented at HEPNT-HEPiX, October 6, 1999 by Warren Matthews Funded by DOE/MICS Internet End-to-end.
1 Internet Traffic Measurement and Modeling Carey Williamson Department of Computer Science University of Calgary.
1 WAN Monitoring Prepared by Les Cottrell, SLAC, for the Joint Engineering Taskforce Roadmap Workshop JLab April 13-15,
1 Lessons Learned Monitoring Les Cottrell, SLAC ESnet R&D Advisory Workshop April 23, 2007 Arlington, Virginia Partially funded by DOE and by Internet2.
Sven Ubik, Aleš Friedl CESNET TNC 2009, Malaga, Spain, 11 June 2009 Experience with passive monitoring deployment in GEANT2 network.
1 IEPM / PingER project & PPDG Les Cottrell – SLAC Presented at the NGI workshop, Berkeley, 7/21/99 Partially funded by DOE/MICS Field Work Proposal on.
1 Terapaths: DWMI: Datagrid Wide Area Monitoring Infrastructure Les Cottrell, SLAC Presented at DoE PI Meeting BNL September
Connect communicate collaborate Performance Metrics & Basic Tools Robert Stoy, DFN EGI TF, Madrid September 2013.
1 Performance Network Monitoring for the LHC Grid Les Cottrell, SLAC International ICFA Workshop on Grid Activities within Large Scale International Collaborations,
1 Network Measurement Challenges LHC E2E Network Research Meeting October 25 th 2006 Joe Metzger Version 1.1.
Toward a Measurement Infrastructure. Warren Matthews (SLAC) Presented at the e2e Workshop Miami, FL, February 2003.
1 High Performance Network Monitoring Challenges for Grids Les Cottrell, Presented at the Internation Symposium on Grid Computing 2006, Taiwan
Topics discussed in this section:
Les Cottrell & Yee-Ting Li, SLAC
Lessons Learned Monitoring the WAN
Monitoring 10Gbps and beyond
Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
A Deterministic End to End Performance Verification Architecture
Networking between China and Europe
Hubs Hubs are essentially physical-layer repeaters:
Deployment & Advanced Regular Testing Strategies
Tools for High Performance Network Monitoring
Terapaths: DWMI: Datagrid Wide Area Monitoring Infrastructure
High Speed File Replication
Using Netflow data for forecasting
Transport Protocols Relates to Lab 5. An overview of the transport protocols of the TCP/IP protocol suite. Also, a short discussion of UDP.
Connie Logg, Joint Techs Workshop February 4-9, 2006
ESnet Network Measurements ESCC Feb Joe Metzger
Prepared by Les Cottrell & Hadrien Bullot, SLAC & EPFL, for the
Wide Area Networking at SLAC, Feb ‘03
Connie Logg February 13 and 17, 2005
End-to-end Anomalous Event Detection in Production Networks
Experiences in Traceroute and Available Bandwidth Change Analysis
Evaluation of Techniques to Detect Significant Performance Problems using End-to-end Active Network Measurements Mahesh Chhaparia & Les Cottrell, SLAC.
High Performance Network Monitoring for UltraLight
Experiences in Traceroute and Available Bandwidth Change Analysis
Forecasting Network Performance
Network Performance Measurement
Advanced Networking Collaborations at SLAC
IEPM. Warren Matthews (SLAC)
Wide-Area Networking at SLAC
Correlating Internet Performance & Route Changes to Assist in Trouble-shooting from an End-user Perspective Les Cottrell, Connie Logg, Jiri Navratil SLAC.
pathChirp Efficient Available Bandwidth Estimation
pathChirp Efficient Available Bandwidth Estimation
Summer 2002 at SLAC Ajay Tirumala.
Presentation transcript:

High Performance Network Monitoring for UltraLight Connie Logg, Les Cottrell, SLAC Presented at UltraLight meeting at Caltech 24-26 October, 2005 www.slac.stanford.edu/grp/scs/net/talk05/ultralight-oct05.ppt Partially funded by DOE/MICS for Internet End-to-end Performance Monitoring (IEPM)

Goals Develop/deploy/use a high performance network monitoring tailored to HEP needs (tiered site model): Evaluate, recommend, integrate best measurement probes including for >=10Gbps & dedicated circuits Develop and integrate tools for long-term forecasts Develop tools to detect significant/persistent loss of network performance, AND provide alerts Integrate with other infrastructures, share tools, make data available

Using Active IEPM-BW measurements Focus on high performance for a few hosts needing to send data to a small number of collaborator sites, e.g. HEP tiered model Makes regular measurements with tools, now supports Ping (RTT, connectivity), traceroute pathchirp, ABwE, pathload (packet pair dispersion) iperf (single & multi-stream), thrulay, Bbftp, bbcp (file transfer applications) Looking at GridFTP but complex requiring renewing certificates Lots of analysis and visualization Running at major HEP sites: CERN, SLAC, FNAL, BNL, Caltech to about 40 remote sites http://www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html

Development Improved management: easier install/updates, more robust, less manual attention New probes: Thrulay, several packet pair dispersion, pathneck, look at owamp, integration with OSCARS Event detection and alerts Visualization (new plots, MonALISA integration) Pathneck is a packet train example, suggest deployment in UltraLight Cache the optimum settings and only measure for non slow-start

Active problems Packet pair problems at 10Gbits/s, timing in host and NIC offloading Use packet trains, turn off NIC offloading, integrate with NIC Recommend trying pathneck on UltraLight Traffic required for throughput (e.g. > 5GBytes, 1 minute), also requires scheduling Cache optimum settings only measure for non slowstart E.g. Quick iperf http://moat.nlanr.net/PAM2003/PAM2003papers/3801.pdf Use bwctl to avoid interference Add OSCARS scheduling to reserve paths

Passive benefits Evaluating effectiveness of using passive (Netflow) No passwords/keys/certs, no reservations, no extra traffic, real applications, real partners… ~30K large (>1MB) flows/day at SLAC border with ~ 70 remote sites 90% sites have no seasonal variation so only need typical value In a month 15 sites may have enough flows to use seasonal methods Validated that results agree with active, flow aggregation easy

But Apps use dynamic ports, need to use indicators to ID interesting apps Throughputs often depend on non-network factors: Host interface speeds (DSL, 10Mbps Enet, wireless) Configurations (window sizes, hosts) Applications (disk/file vs mem-to-mem) Looking at distributions by site, often multi-modal Provide percentiles, max, count etc. Need access to border router Real user driven file transfer results available for about a dozen critical sites. Estimates within a factor of 2-10 for most, can characterize, e.g. max, median,IQR BBFTP: In2p3, DL, ANL, Milan = 1 mode (clear results); fzk, DESY, CERN = 2 modes; APAN=3 modes; TRIUMF, NASA, CESnet, BNL, Stanford = big data spread

Forecasting Over-provisioned paths should have pretty flat time series Short/local term smoothing Long term linear trends Seasonal smoothing But seasonal trends (diurnal, weekly need to be accounted for) on about 10% of our paths Use Holt-Winters triple exponential weighted moving averages Predicting how long a file transfer will take, requires forecasting network and application performance. However, such forecasting is beset with problems. These include seasonal (e.g. diurnal) variations in the measurements, the increasing difficulty of making accurate active low network intrusiveness measurements especially on high speed (>1 Gbits/s) networks and with Network Interface Card (NIC) offloading, the intrusivenss of making more realistic active measurements on the network, the differences in network and large file transfer performance, and the difficulty of getting sufficient relevant passive measurements to enable forecasting. We will discuss each of these problems, compare and contrast the effectiveness of various solutions, look at how some of the methods may be combined, and identify practical ways to move forward.

Event detection Thrulay SLAC to Caltech U Florida min-RTT Affects multi-metrics Event Packet pair & ping RTT Capacity Problem caused by fan failure in a DWDM multiplexor between Stanford and Sunnyvale Available bandwidth Affects multi-paths Change in min-RTT

Alerts, e.g. Often not simple, simple RTT steps often fail: <5% route changes cause noticeable thruput changes ~40% thruput changes NOT associated with route change Use multiple metrics User cares about throughput SO need iperf/thrulay &/or a file transfer app, BUT heavy net impact Packet pair available bandwidth, lightweight but noisy, needs timing (hard at > 1Gbits/s and TCP Offload in NICs) Min ping RTT & route changes may have no effect on throughput Look at multiple routes Fixed thresholds poor (need manual setting), need automation Some routes have seasonal effects

Collaborations HEP sites: BNL, Caltech, CERN, FNAL, SLAC, NIIT ESnet/OSCARS – Chin Guok BNL/QoS- Dantong Yu Development – Maxim Grigoriev/FNAL, NIIT/Pakistan Integrate our traceroute analysis/visualization into AMP (NLANR), share measurements – Tony McGregor Integrate IEPM measurements into MonALISA – Iosif Legrand/Caltech/CERN Probe developers: Shalunov/I2, Vinay Ribiero/Rice, Bill Allcock/ANL, Andy Hanushevsky/SLAC

More Information Case studies of performance events IEPM-BW site www.slac.stanford.edu/grp/scs/net/case/html/ IEPM-BW site www-iepm.slac.stanford.edu/ www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html OSCARS measurements http://www-iepm.slac.stanford.edu/dwmi/oscars/ Forecasting and event detection www.acm.org/sigs/sigcomm/sigcomm2004/workshop_papers/nts26-logg1.pdf Traceroute visualization www.slac.stanford.edu/cgi-wrap/pubpage?slac-pub-10341 http://monalisa.cacr.caltech.edu/ Clients=>MonALISA Client=>Start MonALISA GUI => Groups => Test => Click on IEPM-SLAC Pathneck packet train method http://www.cs.cmu.edu/~hnn/pathneck/

Extra Slides

Achievable Throughput Use TCP or UDP to send as much data as can memory to memory from source to destination Tools: iperf (bwctl/I2), netperf, thrulay (from Stas Shalunov/I2), udpmon … Pseudo file copy: Bbcp and GridFTP also have memory to memory mode

Iperf vs thrulay Iperf has multi streams Maximum RTT Iperf has multi streams Thrulay more manageable & gives RTT They agree well Throughput ~ 1/avg(RTT) Average RTT RTT ms Minimum RTT Achievable throughput Mbits/s 1/RTT since with one stream thrulay does not saturate the bandwidth (window has not grown large enough for most of the time), so pipe is NOT full, so have to wait for ACKs and that depends on RTT.

BUT… At 10Gbits/s on transatlantic path Slow start takes over 6 seconds To get 90% of measurement in congestion avoidance need to measure for 1 minute (5.25 GBytes at 7Gbits/s (today’s typical performance) Needs scheduling to scale, even then … It’s not disk-to-disk or application-to application So use bbcp, bbftp, or GridFTP

AND … For testbeds such as UltraLight, UltraScienceNet etc. have to reserve the path So the measurement infrastructure needs to add capability to reserve the path (so need API to reservation application) OSCARS from ESnet developing a web services interface (http://www.es.net/oscars/): For lightweight have a “persistent” capability For more intrusive, must reserve just before make measurement

Visualization & Forecasting

Visualization http://monalisa.cacr.caltech.edu/ MonALISA (monalisa.cacr.caltech.edu/) Caltech tool for drill down & visualization Access to recent (last 30 days) data For IEPM-BW, PingER and monitor host specific parameters Adding web service access to ML SLAC data http://monalisa.cacr.caltech.edu/ Clients=>MonALISA Client=>Start MonALISA GUI => Groups => Test => Click on IEPM-SLAC

ML example

Esnet-LosNettos segment in the path Changes in network topology (BGP) can result in dramatic changes in performance Hour Samples of traceroute trees generated from the table Los-Nettos (100Mbps) Remote host Snapshot of traceroute summary table Notes: 1. Caltech misrouted via Los-Nettos 100Mbps commercial net 14:00-17:00 2. ESnet/GEANT working on routes from 2:00 to 14:00 3. A previous occurrence went un-noticed for 2 months 4. Next step is to auto detect and notify Drop in performance (From original path: SLAC-CENIC-Caltech to SLAC-Esnet-LosNettos (100Mbps) -Caltech ) Back to original path Dynamic BW capacity (DBC) Changes detected by IEPM-Iperf and AbWE Mbits/s Available BW = (DBC-XT) Cross-traffic (XT) Esnet-LosNettos segment in the path (100 Mbits/s) ABwE measurement one/minute for 24 hours Thurs Oct 9 9:00am to Fri Oct 10 9:01am

Alerting Have false positives down to reasonable level, so sending alerts Experimental Typically few per week. Currently by email to network admins Adding pointers to extra information to assist admin in further diagnosing the problem, including: Traceroutes, monitoring host parms, time series for RTT, pathchirp, thrulay etc. Plan to add on-demand measurements (excited about perfSONAR)

Integration Integrate IEPM-BW and PingER measurements with MonALISA to provide additional access Working to make traceanal a callable module Integrating with AMP When comfortable with forecasting, event detection will generalize

Passive - Netflow Need replacement for active packet pair and throughput measurements. Evaluating whether we can use Netflow for this.

Netflow et. al. Switch identifies flow by sce/dst ports, protocol Cuts record for each flow: src, dst, ports, protocol, TOS, start, end time Collect records and analyze Can be a lot of data to collect each day, needs lot cpu Hundreds of MBytes to GBytes No intrusive traffic, real: traffic, collaborators, applications No accounts/pwds/certs/keys No reservations etc Characterize traffic: top talkers, applications, flow lengths etc. Internet 2 backbone http://netflow.internet2.edu/weekly/ SLAC: www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html NetraMet, SCAMPI are a couple of non-commercial flow projects. IPFIX is an IETF standardization effort for Netflow type passive monitoring.

Typical day’s flows Very much work in progress Look at SLAC border >100KB flows ~ 28K flows/day ~ 75 sites with > 100KByte bulk-data flows Few hundred flows > GByte

Forecasting? Collect records for several weeks Filter 40 major collaborator sites, big (> 100KBytes) flows, bulk transport apps/ports (bbcp, bbftp, iperf, thrulay, scp, ftp Divide by remote site, aggregate parallel streams Fold data onto one week, see bands at known capacities and RTTs ~ 500K flows/mo

Netflow et. al. Peaks at known capacities and RTTs RTTs might suggest windows not optimized

How many sites have enough flows? In May ’05 found 15 sites at SLAC border with > 1440 (1/30 mins) flows Enough for time series forecasting for seasonal effects Three sites (Caltech, BNL, CERN) were actively monitored Rest were “free” Only 10% sites have big seasonal effects in active measurement Remainder need fewer flows So promising

Compare active with passive Predict flow throughputs from Netflow data for SLAC to Padova for May ’05 Compare with E2E active ABwE measurements

Netflow limitations Use of dynamic ports. GridFTP, bbcp, bbftp can use fixed ports P2P often uses dynamic ports Discriminate type of flow based on headers (not relying on ports) Types: bulk data, interactive … Discriminators: inter-arrival time, length of flow, packet length, volume of flow Use machine learning/neural nets to cluster flows E.g. http://www.pam2004.org/papers/166.pdf Aggregation of parallel flows (not difficult) SCAMPI/FFPF/MAPI allows more flexible flow definition See www.ist-scampi.org/ Use application logs (OK if small number) For FTP port 21 only used as a control channel.

More challenges Throughputs often depend on non-network factors: Host interface speeds (DSL, 10Mbps Enet, wireless) Configurations (window sizes, hosts) Applications (disk/file vs mem-to-mem) Looking at distributions by site, often multi-modal Predictions may have large standard deviations How much to report to application

Conclusions Traceroute dead for dedicated paths Some things continue to work Ping, owamp Iperf, thrulay, bbftp … but Packet pair dispersion needs work, its time may be over Passive looks promising with Netflow SNMP needs AS to make accessible Capture expensive ~$100K (Joerg Micheel) for OC192Mon

More information Comparisons of Active Infrastructures: www.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html Some active public measurement infrastructures: www-iepm.slac.stanford.edu/ e2epi.internet2.edu/owamp/ amp.nlanr.net/ www-iepm.slac.stanford.edu/pinger/ Capture at 10Gbits/s www.endace.com (DAG), www.pam2005.org/PDF/34310233.pdf www.ist-scampi.org/ (also MAPI, FFPF), www.ist-lobster.org Monitoring tools www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html www.caida.org/tools/ Google for iperf, thrulay, bwctl, pathload, pathchirp

Extra Slides Follow

Visualizing traceroutes One compact page per day One row per host, one column per hour One character per traceroute to indicate pathology or change (usually period(.) = no change) Identify unique routes with a number Be able to inspect the route associated with a route number Provide for analysis of long term route evolutions Route # at start of day, gives idea of route stability Multiple route changes (due to GEANT), later restored to original route Period (.) means no change

Pathology Encodings Change but same AS No change Probe type Change in only 4th octet End host not pingable Hop does not respond Stutter Multihomed ICMP checksum ! Annotation (!X)

Navigation traceroute to CCSVSN04.IN2P3.FR (134.158.104.199), 30 hops max, 38 byte packets 1 rtr-gsr-test (134.79.243.1) 0.102 ms … 13 in2p3-lyon.cssi.renater.fr (193.51.181.6) 154.063 ms !X #rt# firstseen lastseen route 0 1086844945 1089705757 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx 1 1087467754 1089702792 ...,192.68.191.83,171.64.1.132,137,...,131.215.xxx.xxx 2 1087472550 1087473162 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx 3 1087529551 1087954977 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx 4 1087875771 1087955566 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,(n/a),131.215.xxx.xxx 5 1087957378 1087957378 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx 6 1088221368 1088221368 ...,192.68.191.146,134.55.209.1,134.55.209.6,...,131.215.xxx.xxx 7 1089217384 1089615761 ...,192.68.191.83,137.164.23.41,(n/a),...,131.215.xxx.xxx 8 1089294790 1089432163 ...,192.68.191.83,137.164.23.41,137.164.22.37,(n/a),...,131.215.xxx.xxx

History Channel

AS’ information

Top talkers by application/port Hostname 1 100 10000 Volume dominated by single Application - bbcp MBytes/day (log scale)

Flow sizes SNMP Real A/V AFS file server 60% of TCP flows less than 1 second Would expect TCP streams longer lived But 60% of UDP flows over 10 seconds, maybe due to heavy use of AFS Heavy tailed, in ~ out, UDP flows shorter than TCP, packet~bytes 75% TCP-in < 5kBytes, 75% TCP-out < 1.5kBytes (<10pkts) UDP 80% < 600Bytes (75% < 3 pkts), ~10 * more TCP than UDP Top UDP = AFS (>55%), Real(~25%), SNMP(~1.4%)

Passive SNMP MIBs

Apply forecasts to Network device utilizations to find bottlenecks Get measurements from Internet2/ESnet/Geant perfSONAR project ISP reads MIBs saves in RRD database Make RRD info available via web services Save as time series, forecast for each interface For given path and duration forecast most probable bottlenecks Use MPLS to apply QoS at bottlenecks (rather than for the entire path) for selected applications NSF proposal SONET monitoring (from Endace): The PHYMON occupies 1U (OC3/12) or 2U (OC48/192) of vertical rack space and is equipped with two 10/100/1000 copper Ethernet interfaces for control and reporting via LAN. Key Features Monitors up to two OC3/OC12/OC48/OC192 network links Detects link-layer failures: LOS-S, LOF-S, AIS-L, REI-L, RDI-L, AIS-P, LOP_P, UNEQ-P, REI-P, RDI-P Derive errors: CV, ES, ESA, ESB, SES and UAS according to Bellcore GR-253, Issue 2 Rev 2 standard Sends SNMP traps for all failures and error thresholds according to user configuration Reports current status in real time via telnet, ssh or serial connection Reports accumulated status for 15m, 1h, 8h, 24h, 7d intervals Retains historical data for 35 days Supplies all the underlying data for SNMP SONET MIB (RFC2558)

Passive – Packet capture

10G Passive capture Endace (www.endace.net ): OC192 Network Measurement Cards = DAG 6 (offload vs NIC) Commercial OC192Mon, non-commercial SCAMPI Line rate, capture up to >~ 1Gbps Expensive, massive data capture (e.g. PB/week) tap insertion D.I.Y. with NICs instead of NMC DAGs Need PCI-E or PCI-2DDR, powerful multi CPU host Apply sampling See www.uninett.no/publikasjoner/foredrag/scampi-noms2004.pdf Also have tcpdump/libpcap to capture CoralReef captures and analyzes packets NetraMet/Flowscan, latter analyzes and reports MAPI/FFPF with Intel IXP1200, plus LOBSTER Endace Key features Endace DAGMON 3U rackmount server system Pair of Single Channel DAG6.2SE OC192c Network Measurement Cards CPU, memory and disks according to configuration options Linux and DAG device drivers preinstalled, FreeBSD available Conditioned clock with 1PPS input and local synchronization capability Captures up to 1.064GBps of network traffic Applications Network monitoring systems Traffic characterization and delay measurement Packet header capture Network delay measurements Network security applications, such as Intrusion Detection Systems DAG card capable of full-packet capture, uses zero copy (shared memory, reduces interrupts), has high precision time stamp, large static circular buffer DAGs expensive (list for 10GE NMC ~ $50K) Intel Pro 1GE NIC in dual 1.8GHz AMD Athlon, @ 700Mbits/s ~ 50 loss free flows can be captured with libpcap Over DAG ~ 100

LambdaMon / Joerg Micheel NLANR Tap G709 signals in DWDM equipment Filter required wavelength Can monitor multiple λ‘s sequentially 2 tunable filters

LambdaMon Place at PoP, add switch to monitor many fibers More cost effective Multiple G.709 transponders for 10G Low level signals, amplification expensive Even more costly, funding/loans ended …

Ping/traceroute Ping still useful (plus ca reste …) Is path connected? RTT, loss, jitter Great for low performance links (e.g. Digital Divide), e.g. AMP (NLANR)/PingER (SLAC) Nothing to install, but blocking OWAMP/I2 similar but One Way But needs server installed at other end and good timers Traceroute Needs good visualization (traceanal/SLAC) Little use for dedicated λ layer 1 or 2 However still want to know topology of paths

Packet Pair Dispersion Bottleneck Min spacing At bottleneck Spacing preserved On higher speed links Send packets with known separation See how separation changes due to bottleneck Can be low network intrusive, e.g. ABwE only 20 packets/direction, also fast < 1 sec From PAM paper, pathchirp more accurate than ABwE, but Ten times as long (10s vs 1s) More network traffic (~factor of 10) Pathload factor of 10 again more http://www.pam2005.org/PDF/34310310.pdf IEPM-BW now supports ABwE, Pathchirp, Pathload