Internet2 E2E piPEs Project


1 Internet2 E2E piPEs Project
Eric L. Boyd 7 December 2018

2 Internet2 E2E piPEs Project: End-to-End Performance Initiative Performance Environment System (E2E piPEs) Approach: Collaborative project combining the best work of many organizations, including DANTE/GEANT, Daresbury, EGEE, GGF NMWG, NLANR/DAST, UCL, Georgia Tech, etc. 12/7/2018

3 Internet2 E2E piPEs Goals
Enable end-users & network operators to: determine E2E performance capabilities locate E2E problems contact the right person to get an E2E problem resolved. Enable remote initiation of partial path performance tests Make partial path performance data publicly available Interoperable with other performance measurement frameworks 12/7/2018

4 Measurement Infrastructure Components
For any given “partial path” segment (dotted black line) … We might run regularly scheduled tests and/or on-demand tests. This means we require the ability to make a test request, have test results stored in a database, request those test results, and retrieve the test results.
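The four capabilities listed on this slide (make a test request, store results, request results, retrieve results) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the names `MeasurementRequest` and `ResultStore`, the field layout, and the host names are hypothetical and are not the piPEs schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MeasurementRequest:
    """Hypothetical description of one partial-path test."""
    src: str   # measurement node at one end of the segment
    dst: str   # measurement node at the other end
    tool: str  # e.g. "owamp" or "bwctl"


@dataclass
class ResultStore:
    """Hypothetical stand-in for the results database."""
    _results: Dict[MeasurementRequest, List[Tuple[float, float]]] = field(
        default_factory=dict
    )

    def store(self, request: MeasurementRequest, timestamp: float, value: float) -> None:
        # Append one (timestamp, value) sample for this request.
        self._results.setdefault(request, []).append((timestamp, value))

    def retrieve(self, request: MeasurementRequest, since: float = 0.0) -> List[Tuple[float, float]]:
        # Return the stored samples taken at or after `since`.
        return [(t, v) for (t, v) in self._results.get(request, []) if t >= since]


# Usage: a regularly scheduled test and an on-demand test share the same store.
store = ResultStore()
req = MeasurementRequest(src="edge-host", dst="middle-node", tool="bwctl")
store.store(req, timestamp=100.0, value=940.0)  # scheduled run (Mb/s)
store.store(req, timestamp=200.0, value=520.0)  # on-demand run (Mb/s)
recent = store.retrieve(req, since=150.0)
```

Keying results by the request itself keeps scheduled and on-demand samples for the same segment in one series, which is what a later "request those results" step would query.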

5 Sample piPEs Deployment
Deployment is an inside-out approach. Start with regularly scheduled tests inside, make sure it plays well with regularly scheduled tests outside. Hope that projects working on the end nodes will meet us in the middle. 12/7/2018

6 Project Phases
Phase 1: Tool Beacons: BWCTL (Complete), OWAMP (Complete), NDT (Complete)
Phase 2: Measurement Domain Support: General Measurement Infrastructure (Prototype), Abilene Measurement Infrastructure Deployment (Complete)
Phase 3: Federation Support: AA (Prototype – optional AES key, policy file, limits file); Discovery (Measurement Nodes, Databases) (Prototype – nearest NDT server, web page); Test Request/Response Schema Support (Prototype – GGF NMWG Schema)

7 BWCTL (Jeff Boote) http://e2epi.internet2.edu/bwctl

8 OWAMP (Jeff Boote) http://e2epi.internet2.edu/owamp

9 NDT (Rich Carlson) Network Diagnostic Tester
Developed at Argonne National Lab Ongoing integration into piPEs framework Redirects from well-known host to “nearest” measurement node Detects common performance problems in the “first mile” (edge to campus DMZ) In deployment on Abilene: 12/7/2018

10 piPEs Deployment

11 Example piPEs Use Cases
Edge-to-Middle (On-Demand) Automatic 2-Ended Test Set-up Middle-to-Middle (Regularly Scheduled) Raw Data feeds for 3rd-Party Analysis Tools Quality Control of Network Infrastructure Edge-to-Edge (Regularly Scheduled) Quality Control of Application Communities Edge-to-Campus DMZ (On-Demand) Coupled with Regularly Scheduled Middle-to-Middle End User determines who to contact about performance problem, armed with proof 12/7/2018

12 Test from the Edge to the Middle
Divide and conquer: Partial Path Analysis Install OWAMP and / or BWCTL Begin testing!: Key Required No Key Required 12/7/2018

13 Example piPEs Use Cases
Edge-to-Middle (On-Demand) Automatic 2-Ended Test Set-up Middle-to-Middle (Regularly Scheduled) Raw Data feeds for 3rd-Party Analysis Tools Quality Control of Network Infrastructure Edge-to-Edge (Regularly Scheduled) Quality Control of Application Communities Edge-to-Campus DMZ (On-Demand) Coupled with Regularly Scheduled Middle-to-Middle End User determines who to contact about performance problem, armed with proof 12/7/2018

14 Abilene Measurement Domain
Part of the Abilene Observatory: Regularly scheduled OWAMP (1-way latency) and BWCTL/Iperf (Throughput, Loss, Jitter) Tests Web pages displaying: Latest results “Weathermap” Worst 10 Performing Links Data available via web service: The E2E team is building the piPEs measurement framework. Internet2 has deployed an instance of that framework, the Abilene Measurement Domain (AMD). AMD is part of the Abilene Observatory. Currently, the AMD consists of regularly scheduled OWAMP and BWCTL tests, plus the ability of a user “on the edge” to test “to the middle” (a crude divide-and-conquer approach to diagnosing E2E problems). A network monitoring prototype (which will eventually be released) is live and allows simple analysis of network monitoring data across the backbone. In addition, we’ve made that data available via a web service (conforming to the schemata of the GGF NMWG). Other tools, such as NLANR’s Advisor and the HENP community’s MonALISA tool, can now consume that data.

15 Quality Control of Abilene Measurement Infrastructure (1)
Problem Solving Approach Ongoing measurements start detecting a problem Ad-hoc measurements used for problem diagnosis On-going Measurements Expect Gbps flows on Abilene Stock TCP stack (albeit tuned) Very sensitive to loss “Canary in a coal mine” Web100 just deployed for additional reporting Skeptical eye Apparent problem could reflect interface contention 12/7/2018

16 Quality Control of Abilene Measurement Infrastructure (2)
Regularly Scheduled Tests Track TCP and UDP Flows (BWCTL/Iperf) Track One-way Delays (OWAMP) IPv4 and IPv6 Observe: Worst 10 TCP flows First percentile TCP flow Fiftieth percentile TCP flow What percentile breaks 900 Mbps threshold General Conclusions: On Abilene, IPv4 and IPv6 statistically indistinguishable Consistently low values to one host or across one path indicates a problem 12/7/2018
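The statistics named on this slide (worst flows, first and fiftieth percentile, and the share of flows under a 900 Mbps threshold) can be computed as in the sketch below. The nearest-rank percentile definition and the sample throughput values are assumptions made for illustration; the slides do not say which percentile method the Abilene pages use, and the numbers are not measured data.

```python
import math
from typing import List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percent_below(values: List[float], threshold: float) -> float:
    """Percentage of samples strictly below the threshold."""
    return 100.0 * sum(1 for v in values if v < threshold) / len(values)


def worst(values: List[float], n: int = 10) -> List[float]:
    """The n lowest samples, lowest first."""
    return sorted(values)[:n]


# Hypothetical TCP throughput samples in Mb/s (illustrative only).
throughput_mbps = [980, 975, 960, 940, 910, 905, 870, 650, 985, 990]

p1 = percentile(throughput_mbps, 1)            # 650
p50 = percentile(throughput_mbps, 50)          # 940
below_900 = percent_below(throughput_mbps, 900)  # 20.0
worst_three = worst(throughput_mbps, 3)        # [650, 870, 905]
```

In this toy sample the median stays high while the first percentile is pulled down by a single slow flow, which mirrors the pattern described on the following slides where the 50th percentile is robust and the 1st percentile exposes problems.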

17 A (Good) Day in the Life of Abilene

18 First two weeks in March 50th percentile right at 980 Mb/s
1st percentile about 900 Mb/s Take it as a baseline. 12/7/2018

19 Beware the Ides of March 1st percentile down to 522 Mb/s
Circuit problems along west coast. nb: 50th percentile very robust. 12/7/2018

20 Recovery – sort of; life through 29 April
1st percentile back up to mid-800s, lower and shakier. nb: 50th percentile still very robust. 12/7/2018

21 Ah, sudden improvement through 5-May
1st percentile back up above 900 Mb/s and more stable. But why?? 12/7/2018

22 Then, while Matt Z is tearing up the tracks
1st percentile back down to the 500s. Diagnosis: something is killing Seattle. Oh, and Sunnyvale is off the air. 12/7/2018

23 Matt fixes Sunnyvale, and things get (slightly) worse
Both Seattle and Sunnyvale are bad. 1st percentile right at 500 Mb/s. Diagnosis: web100 interaction.

24 Matt fixes the web100 interaction.
1st percentile cruising through 700 Mb/s. Life is good. 12/7/2018

25 Friday the (almost) 13th
JUNOS upgrade induces packet loss for about four hours along many links. 1st percentile falls to 63 Mb/s. Long-distance paths chiefly impacted.

26 A “Known” Problem Mid-May: routers all got a new software load to enable a new feature Everything seemed to come up, but on some links, utilization did not rebound Worst-10 reflected very low performance across those links QoS parameter configuration format change… 12/7/2018

27 12/7/2018

28 Nice weekend.
1st percentile rises to 968 Mb/s. But why??

29 12/7/2018

30 We Found It First Streams over SNVA-LOSA link all showed problems
NOC responded: Found errors on SNVA-LOSA link (NOC is now tracking errors more closely…) 12/7/2018
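The reasoning on this slide (every stream crossing one link shows problems, so suspect that link) is a form of partial path analysis. A minimal sketch follows, assuming each flow's path is known as a list of links. Only the SNVA-LOSA link comes from the slides; the other node codes, flow names, and the function `suspect_links` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def suspect_links(paths: Dict[str, List[Link]], bad_flows: Set[str]) -> Set[Link]:
    """Links shared by every underperforming flow and used by no healthy flow."""
    bad_sets = [set(paths[flow]) for flow in bad_flows]
    if not bad_sets:
        return set()
    common = set.intersection(*bad_sets)
    healthy: Set[Link] = set()
    for flow, links in paths.items():
        if flow not in bad_flows:
            healthy.update(links)
    return common - healthy


# Hypothetical topology; each link is written with its endpoints in sorted order.
paths = {
    "STTL->LOSA": [("SNVA", "STTL"), ("LOSA", "SNVA")],
    "SNVA->HSTN": [("LOSA", "SNVA"), ("HSTN", "LOSA")],
    "STTL->DNVR": [("DNVR", "STTL")],
    "LOSA->HSTN": [("HSTN", "LOSA")],
}
bad = {"STTL->LOSA", "SNVA->HSTN"}
suspects = suspect_links(paths, bad)
```

Subtracting links that carry healthy flows is a simplification: it assumes a faulty link degrades every flow that crosses it, which will not hold for intermittent or direction-specific errors.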

31 Example piPEs Use Cases
Edge-to-Middle (On-Demand) Automatic 2-Ended Test Set-up Middle-to-Middle (Regularly Scheduled) Raw Data feeds for 3rd-Party Analysis Tools Quality Control of Network Infrastructure Edge-to-Edge (Regularly Scheduled) Quality Control of Application Communities Edge-to-Campus DMZ (On-Demand) Coupled with Regularly Scheduled Middle-to-Middle End User determines who to contact about performance problem, armed with proof 12/7/2018

32 Example Application Community: VLBI (1)
Very-Long-Baseline Interferometry (VLBI) is a high-resolution imaging technique used in radio astronomy. VLBI techniques involve using multiple radio telescopes simultaneously in an array to record data, which is then stored on magnetic tape and shipped to a central processing site for analysis. Goal: Using high-bandwidth networks, electronic transmission of VLBI data (known as “e-VLBI”). 12/7/2018

33 Example Application Community: VLBI (2)
Haystack <-> Onsala Abilene, Eurolink, GEANT, NorduNet, SUNET User: David Lapsley, Alan Whitney Constraints Lack of administrative access (needed for Iperf) Heavily scheduled, limited windows for testing Problem Insufficient performance Partial Path Analysis with BWCTL/Iperf Isolated packet loss to local congestion in Haystack area Upgraded bottleneck link 12/7/2018

34 Example Application Community: VLBI (3)
Result First demonstration of real-time, simultaneous correlation of data from two antennas (32 Mbps, work continues) Future Optimize time-of-day for non-real-time data transfers Deploy BWCTL at 3 more sites beyond Haystack, Onsala, and Kashima 12/7/2018

35 Example Application Community: ESnet / Abilene (1)
3+3 Group US Govt. Labs: LBL, FNAL, BNL Universities: NC State, OSU, SDSC Observed: 400 usec 1-way Latency Jump Noticed by Joe Metzger Detected: Circuit connecting router in the CentaurLab to the NCNI edge router moved to a different path on metro DWDM system 60 km optical distance increase Confirmed by John Moore 12/7/2018
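As a rough plausibility check on the two numbers above (a 400 usec one-way latency jump and a 60 km increase in optical distance), the propagation delay added by extra fibre can be estimated. The refractive index of 1.47 is an assumed typical value for single-mode fibre and does not come from the slides.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in vacuum
ASSUMED_FIBRE_INDEX = 1.47            # assumption: typical single-mode fibre


def fibre_delay_us(extra_km: float, index: float = ASSUMED_FIBRE_INDEX) -> float:
    """One-way propagation delay, in microseconds, added by extra_km of fibre."""
    metres = extra_km * 1000.0
    seconds = metres * index / SPEED_OF_LIGHT_M_PER_S
    return seconds * 1e6


extra_delay_us = fibre_delay_us(60)  # roughly 294 microseconds
```

Under that assumption, 60 km of additional fibre accounts for roughly 294 usec, the same order of magnitude as the observed 400 usec jump. The sketch does not attempt to explain the remaining difference.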

36 Example Application Community: ESnet / Abilene (2)

37 Example piPEs Use Cases
Edge-to-Middle (On-Demand) Automatic 2-Ended Test Set-up Middle-to-Middle (Regularly Scheduled) Raw Data feeds for 3rd-Party Analysis Tools Quality Control of Network Infrastructure Edge-to-Edge (Regularly Scheduled) Quality Control of Application Communities Edge-to-Campus DMZ (On-Demand) Coupled with Regularly Scheduled Middle-to-Middle End User determines who to contact about performance problem, armed with proof 12/7/2018

38 American / European Collaboration Goals
Awareness of ongoing Measurement Framework Efforts / Sharing of Ideas (Good / Not Sufficient) Interoperable Measurement Frameworks (Minimum) Common means of data extraction Partial path analysis possible along transatlantic paths Open Source Shared Development (Possibility, In Whole or In Part) End-to-end partial path analysis for transatlantic research communities VLBI: Haystack, Mass. <-> Onsala, Sweden HENP: Caltech, Calif. <-> CERN, Switzerland

39 American / European Collaboration Achievements
UCL E2E Monitoring Workshop 2003 Transatlantic Performance Monitoring Workshop 2004 Caltech <-> CERN Demo Haystack, USA <-> Onsala, Sweden piPEs Software Evaluation (In Progress) Architecture Reconciliation (In Progress) 12/7/2018

40 How Can you Participate?
Set up BWCTL, OWAMP, NDT Beacons Set up a measurement domain Now: Place tool beacons “intelligently” Determine locations Determine policy Determine limits “Register” beacons Future: Install piPEs software Run regularly scheduled tests Store performance data Make performance data available via web service Make visualization CGIs available Solve Problems / Alert us to Case Studies 12/7/2018

41 12/7/2018

42 Extra Slides 12/7/2018

43 American/European Demonstration Goals
Demonstrate ability to do partial path analysis between “Caltech” (Los Angeles Abilene router) and CERN. Demonstrate ability to do partial path analysis involving nodes in the GEANT network. Compare and contrast measurement of a “lightpath” versus a normal IP path. Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA 12/7/2018

44 Demonstration Details
Path 1: Default route between LA and CERN is across Abilene to Chicago, then across Datatag circuit to CERN Path 2: Announced addresses so that route between LA and CERN traverses GEANT via London node Path 3: “Lightpath” (discussed earlier by Rick Summerhill) Each measurement “node” consists of a BWCTL box and an OWAMP box “next to” the router. 12/7/2018

45 All Roads Lead to Geneva

46 Results BWCTL: OWAMP: MONALISA NLANR Advisor 12/7/2018

47 Insights (1) Even with shared source and a single team of developer-installers, inter-administrative domain coordination is difficult. Struggled with basics of multiple paths. IP addresses, host configuration, software (support source addresses, etc.) Struggled with cross-domain administrative coordination issues. AA (accounts), routes, port filters, MTUs, etc. Struggled with performance tuning measurement nodes. host tuning, asymmetric routing, MTUs We had log-in access … still struggled with IP addresses, accounts, port filters, host tuning, host configuration (using the proper paths), software. Port filters, MTUs. 12/7/2018

48 Insights (2) Connectivity takes a large amount of coordination and effort; performance takes even more of the same. Current measurement approaches have limited visibility into “lightpaths.” Having hosts participate in the measurement is one possible solution. 12/7/2018

49 Insights (3) Consider interaction with security; lack of end-to-end transparency is problematic. Security filters are set up based on expected traffic patterns Measurement nodes create new traffic Lightpaths bypass expected ingress points 12/7/2018

