
1 /afs/slac/u/sf/cottrell/talk/escc/oct971 ESnet NMTF/NMFG - Status
Les Cottrell, SLAC & Dave Martin, HEPNRC
cottrell@slac.stanford.edu, dem@hep.net
Presented at the ESCC Meeting, JLAB, Oct 1997

2 Outline of Talk
What happened to the NMTF/NMFG?
What are we measuring?
How are we measuring?
Tools we are using/developing
Coordination with others
Next Steps
Summary

3 What happened to the NMTF/NMFG?
It evolved
  – Some of original members (BNL & ORNL) were unable to continue effort
  – SLAC & HEPNRC retained focus on monitoring
  – ICFA concerned about impact of network performance on HENP research
    Created NTF with various WG, one on Monitoring
    More focus on HENP issues and International links
    Embraced work done by NMTF/NMFG and supported continued development
    Brought in new partners, in particular INFN, CERN as well as other collection sites

4 Mission etc. of the ICFA-NTF WG on Monitoring
Mission of Group:
  – Obtain as uniform a picture as possible of the present performance of the connectivity used by the ICFA community
Two meetings so far, CHEP97 (Apr-97) & Santa Fe (Sep-97)
Produced an interim status report for Sep-97
Will update for Dec-97, with a final report Apr-98.

5 Our Main Metric is Ping
“Universally available”, easy to understand
  – no software for clients to install
Low network impact
Provides loss, response time, reachability, unpredictability
Select hosts carefully, concerns over routers, loaded hosts etc. (provide guidelines)
Does provide useful measures

6 Ping Response Time vs Bytes

7 Ping Response vs Web Response
[Plot axes: HTTP GET Response (ms), Minimum Ping Response (ms)]

8 Method – Measurement
Each collection site keeps a list of remote hosts to ping at sites it is interested in
Every 30 mins ping each remote host with 11 * 100 byte followed by 10 * 1000 byte pings
Min separation of pings is 1 second, timeout 20 seconds
Throw away first ping
Measure response, packet loss, host unreachable (no answer to any ping)
Record data and make available
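
The measurement rules on this slide can be illustrated with a minimal Python sketch. It is not the timeping code (which the talk describes as a Perl script); the names `CycleSummary` and `summarize_cycle` are hypothetical, and the input is assumed to be one round-trip time in milliseconds per ping sent, with `None` standing for a ping that timed out. The sketch drops the first ping, then reports loss, response time and whether the host answered at all.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class CycleSummary:
    sent: int                      # pings counted after discarding the first
    received: int                  # pings answered among those counted
    loss_percent: float
    min_rtt_ms: Optional[float]    # None when nothing was answered
    avg_rtt_ms: Optional[float]
    unreachable: bool              # True when no ping at all was answered


def summarize_cycle(rtts_ms: Sequence[Optional[float]]) -> CycleSummary:
    """Summarize one burst of pings to one remote host.

    rtts_ms has one entry per ping sent, in order; None marks a timeout.
    """
    if not rtts_ms:
        raise ValueError("need at least one ping result")

    # "Host unreachable" on the slide means no answer to any ping,
    # so this check uses every ping, including the first.
    unreachable = all(r is None for r in rtts_ms)

    # Throw away the first ping before computing loss and response time.
    kept = list(rtts_ms[1:])
    answered = [r for r in kept if r is not None]
    sent = len(kept)
    loss = 100.0 * (sent - len(answered)) / sent if sent else 0.0

    return CycleSummary(
        sent=sent,
        received=len(answered),
        loss_percent=loss,
        min_rtt_ms=min(answered) if answered else None,
        avg_rtt_ms=mean(answered) if answered else None,
        unreachable=unreachable,
    )
```

For example, `summarize_cycle([50.0, 20.0, None, 22.0] + [21.0] * 7)` models eleven 100 byte pings with one timeout: the first result is discarded, leaving ten counted pings of which nine were answered.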

9 Architecture
Three Types of Sites
  – Remote Sites - need only to respond to ping packets
  – Collecting Sites
    Collecting Data: Perl Script Pings Nodes, Records Data in common documented format
    Serving Data: CGI/Perl Script makes Data Available to Analysis Sites
    WWW CGI tools make reports available
  – Analysis Sites
    Retrieving Data: Perl Script Retrieves Data from Collecting Sites
    Analysis: SAS Program Analyzes Data and Generates Graphs
    Reports: WWW Form Makes Customized Reports Available

10 Architecture
[Diagram labels: WWW, Analysis, Collecting, Remote, HTTP, Pings, E.g. HEPNRC, E.g. SLAC, Archive, Reports & Data, Cache]

11 Available Tools - Data Collection
Collect data (timeping)
  – HEPNRC rearchitected, developed & documented
  – Deployed at 12 sites in 6 countries
    ARM, BNL, CERN, CMU, DoE/GMTN, HEPNRC/FNAL, INFN/CNAF, KEK, Hungary, RAL, SLAC, UMD
  – DESY, IN2P3, TRIUMF, MSU, Beijing also expressed interest, plus commercial sites
Data available (pingdata) in common format
  – Data collected available from collection site via HTTP
  – Allows data for specific times to be retrieved

12 Current Deployment
[Map. Legend: ESnet Site (monitored from SLAC), N. American Site (monitored from SLAC), International Site (monitored from SLAC), Monitoring Site. Labeled sites: HEPNRC/FNAL, RAL, INFN/CNAF, CERN, RMKI/KFKI, BNL, KEK, CMU, UMD, SLAC, DESY]

13 Analysis / Archive Site
Gathers & archives data
  – HEPNRC gathers data from collection sites a few times daily
  – Archives the data (200 Mbytes/month)
  – Works with collection sites to resolve problems
  – Provide Web access to archive data via form (ping_data.pl)

14 Access to Raw Data

15 Analysis / Archive Site
Gathers & archives data
  – HEPNRC gathers data from collection sites a few times daily
  – Archives the data (200 Mbytes/month)
  – Works with collection sites to resolve problems
  – Provide Web access to archive data via form (ping_data.pl)
Provide Web form to allow simple plotting (graph_pings.pl), uses SAS for speed

16 Form to Select Analysis Graphs

18 Analysis Tools for Collection Sites
Short-term analysis / reports
  – Recent data (e.g. last 30 days cached)
    Web sortable table of latest measurements, colored for quality

19 Ping Loss Quality
0-1% Good, 1-5% Acceptable, 5-12% Poor, 12-25% Poor, > 25% Unusable
Similar to Internet Weather Report ( 12%)
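
The loss bands on this slide map naturally onto a small lookup. The Python sketch below is an illustration only, not the code behind connectivity.pl; the names `LOSS_BANDS` and `loss_quality` are hypothetical. Two assumptions are made: the band labels are copied exactly as the transcript gives them (it shows "Poor" for both 5-12% and 12-25%), and a value that falls on a boundary is assigned to the lower band, since the slide does not say which side a boundary belongs to.

```python
from typing import List, Tuple

# (upper bound in percent, label), as listed on the slide.
# Assumption: upper bounds are inclusive.
# The transcript labels both 5-12% and 12-25% "Poor"; kept verbatim here.
LOSS_BANDS: List[Tuple[float, str]] = [
    (1.0, "Good"),
    (5.0, "Acceptable"),
    (12.0, "Poor"),
    (25.0, "Poor"),
]


def loss_quality(loss_percent: float) -> str:
    """Return the quality label for a packet loss percentage."""
    if not 0.0 <= loss_percent <= 100.0:
        raise ValueError("loss_percent must be between 0 and 100")
    for upper, label in LOSS_BANDS:
        if loss_percent <= upper:
            return label
    return "Unusable"   # anything above 25%
```

A table-coloring script could call `loss_quality(3.0)` to decide that a link with 3% loss belongs in the "Acceptable" band.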

20 Analysis Tools for Collection Sites
Short-term analysis / reports
  – Recent data (e.g. last 30 days cached)
    Web sortable table of latest measurements, colored for quality, with output (TSV) for Excel (connectivity.pl)

21 Latest Ping Measurements

22 Raw Data from last 24 Hours

23 Latest Ping Measurements

24 Ping Performance for Last 180 Days

25 Analysis Tools for Collection Sites
Short-term analysis / reports
  – Recent data (e.g. last 30 days cached)
    Web sortable table of latest measurements, colored for quality, with output (TSV) for Excel (connectivity.pl)
    Web form to select sites and time frames to be plotted (ping_data_plot.pl)

26 Request Plot of Collection Site Data

27 Plot from Collection Site

28 Tools in Development
Re-engineering SLAC long term reports
  – exception report

29 Exception Reports
[Screenshot: Last 10 Weeks Ping Data. Callouts: Color highlights extent of exception; Click here to burrow down to more information; Click to sort by column]

30 Tools in Development
Re-engineering SLAC long term reports
  – exception report
  – last 180 days

31 180 Days SLAC - Stanford
[Plot, Feb-97 to Aug-97. Annotations: Uwave & Routing problems; Direct connect; 20 ms; 5.5ms; Loss < 1%; Via ESnet; Loss 3-6%; 30ms]

32 Tools in Development
Re-engineering SLAC long term reports
  – exception report
  – last 180 days
  – monthly points going back for years in tabular form with quality coloring, sorting & hyperlinks
    Loss (by site, and by group of sites)
    Response (by site, and by group of sites)
    Reachability (by site, and by group of sites)
    % time network “Quiescent” or “Busy”

33 Ping Loss History

34 TSV Output to Excel for Further Analysis

35 Ping Response by Group

36 Prime-time Packet Loss by Group

37 “Quiescent” Frequency by Group

38 International Site “Busy” Frequency
[Plot annotations: RL.UK; UK - US link upgraded; Italian nodes track & look good; CERN & IN2P3 track]

39 Tools in Development
Re-engineering SLAC long term reports
  – exception report
  – last 180 days
  – monthly points going back for years in tabular form with quality coloring, sorting & hyperlinks
    Loss (by site, and by group of sites)
    Response (by site, and by group of sites)
    Reachability (by site, and by group of sites)
    % time network “Quiescent” or “Busy”
    Ten Worst links in HEP

40 Ten Worst HEP Links Ranked by % Packets Lost
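
A ranking like the one named on this slide can be sketched in a few lines of Python. This is an illustration, not the SAS or Perl code used for the real report; the function name `worst_links` and the input layout (a mapping from a link name to a pair of packets sent and packets lost) are assumptions made for the example.

```python
import heapq
from typing import Dict, List, Tuple


def worst_links(counts: Dict[str, Tuple[int, int]], n: int = 10) -> List[Tuple[str, float]]:
    """Return the n links with the highest packet loss, worst first.

    counts maps a link name to (packets sent, packets lost).
    Links with nothing sent are skipped because their loss is undefined.
    """
    loss = [
        (name, 100.0 * lost / sent)
        for name, (sent, lost) in counts.items()
        if sent > 0
    ]
    # nlargest returns the items sorted from largest to smallest loss.
    return heapq.nlargest(n, loss, key=lambda item: item[1])
```

With made-up link names, `worst_links({"a": (100, 5), "b": (100, 30), "c": (200, 2)}, n=2)` returns link "b" first and link "a" second.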

41 What are Typical Uses
Setting Expectations
Service Level Contract
Choosing ISPs
Identifying problems, and verifying solutions
Planning for upgrades

42 Summary to Help Choose Upgrades

43 Prime Time Packet Loss Jun-Aug 97

44 Coordination etc.
XIWT/IPWT Interest/deployment

45 XIWT/IPWT interest
Austin meeting in Sep-97
  – available tools presented by developers: IWR, CAIDA/NLANR, Intel, Auto Industry/Bellcore, IETF/IPPM Surveyor …
XIWT/IPWT want to:
  – Measure performance of members' own networks
  – Get tests to validate and understand what to recommend to other commercial customers and for what purposes.
  – Build a community within XIWT so can evolve it to address harder issues.
Selected our tools to initially deploy at 6 sites
  – includes Intel, SBC, HAI, BellSouth, CNRI, NIST

46 Coordination etc.
XIWT/IPWT Interest/deployment
MICS funded joint SLAC/LBL proposal on Internet End-to-end performance monitoring for 1 year
LBL/NIMI project

47 NIMI (1)
NIMI = National Internet Measurement Infrastructure, collaboration LBL/PSC (V. Paxson, M. Mathis, J. Mahdavi).
It is a software suite (not hardware).
Deploy on “measurement hosts” around the Internet for black box infrastructure measurements.
Ready for deployment Nov-97.
Perl daemon with treno, Poisson packet generation for loss & delays.
Hooks for other tools such as pathchar, tcpanaly.
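
The slide mentions Poisson packet generation without further detail. As a generic illustration of that idea (not NIMI code), the Python sketch below draws the gaps between probe packets from an exponential distribution, which makes the send times a Poisson process. The function name `poisson_send_times` and its parameters are hypothetical.

```python
import random
from typing import List, Optional


def poisson_send_times(
    mean_interval_s: float,
    duration_s: float,
    seed: Optional[int] = None,
) -> List[float]:
    """Return probe send times (seconds from start) within duration_s.

    Gaps between probes are exponentially distributed with the given mean,
    so the number of probes in any window follows a Poisson distribution.
    """
    if mean_interval_s <= 0 or duration_s <= 0:
        raise ValueError("mean_interval_s and duration_s must be positive")

    rng = random.Random(seed)          # seed only to make runs repeatable
    rate = 1.0 / mean_interval_s       # expovariate takes the rate, not the mean

    times: List[float] = []
    t = rng.expovariate(rate)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate)
    return times
```

Calling `poisson_send_times(10.0, 3600.0)` gives a schedule averaging one probe every 10 seconds over an hour, with irregular spacing so the probes do not lock onto periodic behaviour in the network.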

48 NIMI (2)
Challenges: accurate clock synchronization (one way measurements), scaling to millions of nimids (n.b. end-to-end measurement strategies are usually not cost free, some things may be over-measured), data retrieval, new measurement strategies.
There is no central management.
Both HEPNRC & SLAC plan to install NIMI hosts (PCs running FreeBSD) at their sites

49 Coordination etc.
XIWT/IPWT interest/deployment
MICS funded joint SLAC/LBL proposal on Internet End-to-end performance monitoring for 1 year
LBL/NIMI project
Proposed joint work with NLANR to extend Mapnet Java tools to view our data

50 NLANR Mapnet Tool
Java Applet
Zoom & pan
Select ISPs
Color:
  – ISP
  – bandwidth
Mouse over
  – link details
  – node details

51 Maproute (from NLANR)
Shapes show function
  – router at NAP, at transit backbone, at ISP
Color shows variance of transit time
Meshes of paths to destination show flaps
Can zoom in to get site information etc.

52 Coordination etc.
XIWT/IPWT interest/deployment
MICS funded joint SLAC/LBL proposal on Internet End-to-end performance monitoring for 1 year
LBL/NIMI project
Proposed joint work with NLANR to extend Mapnet Java tools to view our data
Will submit paper to IETF for this December
Surveyor installation proposed at ESnet sites

53 Surveyor
PC Hardware with GPS located at ANS & 23 CSG partner sites
Measure one way loss & response time using clock synchronization, metrics defined by IETF/IPPM
8 sites now operational, monitor 56 paths ((N-1)*N)
Results show can have big asymmetries (asymmetric loading & routing)
Willing to deploy (at their cost) at 5 DOE sites
For more see http://www.advanced.org/csg-ippm/
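
The path count on this slide follows from measuring each direction separately: with N sites there are N*(N-1) ordered pairs, so 8 sites give 56 one-way paths. The short Python sketch below enumerates them; the site names are placeholders and `one_way_paths` is a hypothetical helper, not part of Surveyor.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def one_way_paths(sites: Sequence[str]) -> List[Tuple[str, str]]:
    """List every (source, destination) pair with source != destination.

    A to B and B to A are separate entries because one-way measurements
    can differ in each direction.
    """
    return list(permutations(sites, 2))


# Placeholder names standing in for the 8 operational sites.
sites = [f"site{i}" for i in range(1, 9)]
paths = one_way_paths(sites)
print(len(paths))  # N * (N - 1) for N = 8
```

Keeping both directions of each pair is what lets the asymmetries mentioned on the slide show up in the results.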

54 Asymmetric One-way Delays
[Plots: Advanced to U Chicago; U Chicago to Advanced. Axes: Loss (0% to 20%), Delay (0ms to 300ms), horizontal axis 0 to 24]

55 Next Steps
Longer term reports (10 week exceptions, 180 days, monthly going back forever)
Provide monthly summary tables with lots of statistical measures to allow faster generation of long term reports, and more robust metrics
Extend grouping, e.g. by AS, country, time zones crossed, more geographic regions, user selectable, by experiment, by community, by collection site
Summaries (c.f. Weather Map, top 10s, weekly, Consumer Reports)
NIMI/Surveyor install, NLANR tools, help XIWT

56 Summary
12 sites, 6 countries collecting data on > 400 links
Need care selecting remote sites
Deployment of data collection went well
Collection sites easy to maintain after initial install
Biggest effort at the moment (> 1 FTE) is in:
  – Tool definition & development
  – Data gathering & archiving (looking after pathologies)
Gearing up to extend SAS tools and attendant scripts
Lot of interest & collaboration outside ESnet

57 To Join
Collection site needs:
  – perl5 & HTTP server
  – install timeping & pingdata (need only cgi-bin access, not root)
  – Decide on links to monitor
  – Get an analysis site to retrieve & generate graphs, or at least get connectivity.pl & ping_data_plot.pl
Need volunteers to work on analysis scripts, some of it will require SAS, also need Java applets to visualize

58 More Information
Monitoring WG home page (includes links to the status report, meeting notes, how to access data, and get & install code etc.)
  – http://www.slac.stanford.edu/xorg/icfa/ntf/home.html
WAN Monitoring at SLAC has lots of links
  – http://www.slac.stanford.edu/comp/net/wan-mon.html
Tutorial on WAN Monitoring
  – http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

