
Connect. Communicate. Collaborate
Hades – Going Operational
Roland Karch, RRZE FAU Erlangen-Nürnberg
JRA1 Montpellier Meeting, October 2006



2 Hades Implementation Status List
IPv6 measurements (up and running at more than half of the JRA1 locations)
Multicast measurements (in implementation)
Alerts
– Packet loss maps (implemented, deployed for X-WiN)
– SNMP traps (server still needs to be set up)
– Generic web interface (under evaluation)
Maintenance
– To be integrated with Alerts into a single interface

3 IPv6 Measurements
Running in:
– Amsterdam (SURFnet)
– Athens (GRNET)
– Ljubljana (ARNES)
– Paris (RENATER) (currently offline)
– Prague (CESNET)
– Sofia (ISTF)
– Zagreb (CARNET)
Do you own a JRA1 Hades measurement box and an IPv6-capable network, but aren't on the list? Contact us!

4 Hades weather map (GEANT/NRENs, geographical)

5 Hades weather maps (abstract, domain-specific)

6 Alerts – Packet Loss Maps
One map showing the observed packet loss on all Hades-monitored links
Colour coding on the links to distinguish short and long outages
Still in development; not yet available in the European context
Maps for other metrics are under consideration, but the details of those metrics are yet to be determined (see the statistical analysis)
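
The colour coding could work roughly as in the following minimal Python sketch. It assumes each link has a series of per-interval loss fractions and that an interval with 100 % loss counts as an outage; the slides do not define "short" and "long", so the interval threshold and the colours are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the slides do not say what a "short" or "long"
# outage is, so these values are placeholders only.
SHORT_OUTAGE_MAX_INTERVALS = 2   # up to 2 consecutive outage intervals = short
LOSS_THRESHOLD = 1.0             # an interval counts as an outage at 100 % loss


@dataclass
class LinkStatus:
    link: str
    longest_outage: int  # longest run of consecutive outage intervals
    colour: str


def longest_run(loss_per_interval, threshold=LOSS_THRESHOLD):
    """Length of the longest run of intervals whose loss fraction >= threshold."""
    best = current = 0
    for loss in loss_per_interval:
        if loss >= threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def colour_for(run_length):
    """Map an outage length (in intervals) to a map colour."""
    if run_length == 0:
        return "green"
    if run_length <= SHORT_OUTAGE_MAX_INTERVALS:
        return "yellow"  # short outage
    return "red"         # long outage


def classify_links(loss_by_link):
    """loss_by_link: dict of link name -> list of loss fractions (0.0 .. 1.0)."""
    result = []
    for link, series in sorted(loss_by_link.items()):
        run = longest_run(series)
        result.append(LinkStatus(link, run, colour_for(run)))
    return result
```

For example, a link with two consecutive fully lossy intervals would be drawn yellow, one with four would be drawn red.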

7 Alerts – SNMP traps
Problem: data in the measurement archive is between 0 and 90 minutes old
To get up-to-date information for alerts, there are two options:
– Increase the frequency of data polling (causes management network overhead and load on both the measurement point and the archive)
– Analyse the data on the measurement point in real time (CPU load on the measurement point only, but raises the question of how to deliver decentralized alerts)
Solution: decentralized analysis, with SNMP traps for alerting
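
The decentralized analysis could look roughly like the following minimal Python sketch, assuming the measurement point knows how many packets were sent and received in each measurement interval. The window size, the threshold and the `send_alert` callback are hypothetical; in a real deployment the actual SNMP trap sender would sit behind that callback. Only state changes trigger a notification, so a persistent problem produces one "raised" and one "cleared" message instead of a steady stream.

```python
from collections import deque


class LossAlarm:
    """Sliding-window packet-loss detector running on the measurement point.

    Hypothetical sketch: window size and threshold are placeholders, and
    `send_alert` stands in for whatever actually emits the SNMP trap.
    """

    def __init__(self, send_alert, window=10, threshold=0.5):
        self.send_alert = send_alert
        self.window = deque(maxlen=window)  # most recent per-interval loss values
        self.threshold = threshold
        self.alarm_active = False

    def observe(self, sent, received):
        """Feed one measurement interval: packets sent and packets received."""
        loss = 1.0 - received / sent if sent else 0.0
        self.window.append(loss)
        mean_loss = sum(self.window) / len(self.window)
        raised = mean_loss >= self.threshold
        # Notify only when the alarm state changes.
        if raised != self.alarm_active:
            self.alarm_active = raised
            self.send_alert("loss-raised" if raised else "loss-cleared", mean_loss)
```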

8 Alerts – SNMP traps
Multiple potential use cases for traps:
– A central visualization could subscribe to all alerts in order to build a comprehensive map and/or an alert list with history
– NOCs might subscribe to alerts for their uplinks or for sensitive paths to important locations (they typically already run SNMP-capable monitoring facilities)
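
The two use cases differ only in which paths a subscriber is interested in. A minimal Python sketch of that routing, assuming alerts are keyed by a (source, destination) pair; the `Subscriber` type and the example site names are illustrative only and are not part of Hades.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subscriber:
    name: str
    # None means "all paths" (e.g. a central visualization); otherwise a set
    # of (source, destination) pairs, e.g. a NOC's uplinks.
    paths: Optional[set] = None
    received: list = field(default_factory=list)

    def wants(self, path):
        return self.paths is None or path in self.paths


def dispatch(alert_path, message, subscribers):
    """Deliver an alert for alert_path to every interested subscriber.

    Returns the names of the subscribers that received it.
    """
    delivered = []
    for sub in subscribers:
        if sub.wants(alert_path):
            sub.received.append((alert_path, message))
            delivered.append(sub.name)
    return delivered
```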

9 Alerts – SNMP traps
Benefits
– Only causes network traffic when necessary
– Real-time data is available for analysis on the measurement point
– SNMP MP usable?
Drawbacks
– SNMP is very often filtered on the way into user networks (a web visualisation acting as an intermediate server might solve this)
– No alert gets through when the reporting path is itself affected by the network problem

10 Alerts – Statistics
A higher level of statistical analysis of the measurement data might help to determine a "connection footprint" and to show changes in it caused by routing changes. Possible numbers to play with:
– Line-inherent delay (the minimal delay experienced by all measurement packets, or by a high percentile of them)
– Regular IPDV (the blurry zone in a plot: the difference between the line-inherent delay and the maximum delay of 90 percent of the measurements, i.e. the 90th percentile)
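
Both numbers can be computed directly from a list of one-way delays. The following minimal Python sketch uses the nearest-rank percentile definition; the choice of percentile method, the `coverage` parameter and the use of integer microseconds are assumptions for illustration, not the Hades implementation.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def line_inherent_delay(owd, coverage=100.0):
    """Delay floor that `coverage` percent of the packets are at or above.

    coverage=100 gives the plain minimum; a slightly lower coverage
    ignores a few outliers below the floor.
    """
    return percentile(owd, 100.0 - coverage)


def regular_ipdv(owd, coverage=100.0, upper=90.0):
    """Spread between the line-inherent delay and the `upper` percentile."""
    return percentile(owd, upper) - line_inherent_delay(owd, coverage)
```

With delays of 11400 to 11499 µs, the line-inherent delay is 11400 µs and the regular IPDV is 89 µs; a single bogus low sample can be excluded by lowering `coverage`.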

11 Alerts – Statistics – Key values
– 11.4 ms minimal delay subtracted: "network intrinsic delay"
– 1 µs gap: timestamp precision
– Lower boundary: timer precision
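
The first two key values can be reproduced from raw delays: subtract the minimal delay, then look at the gaps between distinct values, which are multiples of the timestamp resolution when timestamps are quantized. A minimal Python sketch, assuming delays are given as integer microseconds (to avoid floating-point rounding); the smallest gap is only an upper bound on the resolution if too few samples are available.

```python
def normalize(owd_us):
    """Subtract the minimal delay ("network intrinsic delay") from every sample.

    Returns the shifted samples and the subtracted floor.
    """
    floor = min(owd_us)
    return [d - floor for d in owd_us], floor


def estimate_resolution(owd_us):
    """Smallest non-zero gap between distinct delay values.

    With quantized timestamps, distinct delays differ by multiples of the
    timestamp resolution, so this gap is the resolution or a multiple of it.
    """
    distinct = sorted(set(owd_us))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return min(gaps) if gaps else None
```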

12 Alerts – Statistics – Pathfinders
First packet in every group of 5: about 7 µs longer delay
Most probable reason: the receiver process has to be loaded into the CPU cache before it can process the first packet
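
The first-packet effect shows up when the delays are averaged per position within a group. A minimal Python sketch, assuming the delays are ordered by send time and that groups of 5 start at index 0; the function names are illustrative.

```python
from statistics import mean


def delay_by_position(owd, group_size=5):
    """Mean one-way delay for each packet position within a group.

    Assumes `owd` is ordered by send time, groups start at index 0 and
    there is at least one complete group.
    """
    return [mean(owd[pos::group_size]) for pos in range(group_size)]


def first_packet_penalty(owd, group_size=5):
    """Extra delay of the first packet in a group versus the others, on average."""
    per_pos = delay_by_position(owd, group_size)
    return per_pos[0] - mean(per_pos[1:])
```

On data like that on the slide, `first_packet_penalty` would come out at roughly 7 µs.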

13 Alerts – Statistics – Path fingerprint
Comparison of paths in different networks (hardware, lines and configuration differ)
Both: small OWD, narrow delay distribution
Path 2: longer distribution tail
Path 1: reordering!
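
A path fingerprint along these lines could combine a few delay percentiles with a reordering count. The following minimal Python sketch is an illustration only: the chosen percentiles, the tail definition (99th minus 50th percentile) and the simple reordering rule (a packet counts as reordered if it arrives after a packet with a higher sequence number) are assumptions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def count_reordered(arrival_seq):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = None
    reordered = 0
    for seq in arrival_seq:
        if highest is not None and seq < highest:
            reordered += 1
        else:
            highest = seq
    return reordered


def fingerprint(owd, arrival_seq):
    """Summarize a path: delay floor, spread, tail length and reordering."""
    p50, p90, p99 = (percentile(owd, p) for p in (50, 90, 99))
    return {
        "min": min(owd),
        "p50": p50,
        "p90": p90,
        "tail": p99 - p50,  # a longer tail, as for path 2, increases this
        "reordered": count_reordered(arrival_seq),
    }
```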

14 Maintenance
Most important part of "going operational"
Current status:
– Daily check, via the web visualization, of which measurement lines are down (up to 24 hours delay)
– Scripts catch most anomalies (clock status, old data)
– perfSONAR MAs are monitored externally (ISTF)
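
An "old data" check is one of the simplest of these scripts, and running it more often than daily would shorten the up-to-24-hour detection delay. A minimal Python sketch, assuming a mapping from measurement line to the timestamp of its newest stored result; the two-hour limit is a hypothetical placeholder.

```python
from datetime import datetime, timedelta


def find_stale_lines(last_seen, now, max_age=timedelta(hours=2)):
    """Return the measurement lines whose newest result is older than max_age.

    last_seen: dict mapping a line name (e.g. "src->dst") to the datetime of
    the newest measurement stored for it. max_age is a placeholder value.
    """
    return sorted(line for line, ts in last_seen.items() if now - ts > max_age)
```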

15 Maintenance
Evaluation of Nagios [1]
– Could serve as a common platform for alert and maintenance visualization
– Provides a front end for both SNMP and scripted surveillance
[1] http://nagios.org/
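
Scripted surveillance plugs into Nagios through its plugin convention: the check prints one status line (optionally followed by performance data after "|") and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal Python sketch of such a check for packet loss; the thresholds, the "HADES LOSS" label and the source of `loss_percent` are hypothetical.

```python
# Nagios plugin return codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_loss(loss_percent, warn=1.0, crit=5.0):
    """Return (exit_code, status_line) for a packet-loss value in percent.

    Thresholds are placeholders; loss_percent would come from the Hades
    measurement data for one line (None if no recent data exists).
    """
    if loss_percent is None:
        return UNKNOWN, "HADES LOSS UNKNOWN - no recent data"
    if loss_percent >= crit:
        state, code = "CRITICAL", CRITICAL
    elif loss_percent >= warn:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    # Text after "|" is performance data that Nagios can store and graph.
    perfdata = f"loss={loss_percent}%;{warn};{crit}"
    return code, f"HADES LOSS {state} - {loss_percent:.1f}% loss | {perfdata}"
```

A wrapper script run by Nagios would print the returned line and call `sys.exit()` with the returned code.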

16 Maintenance
Goals
– Highest possible level of automation
– Simple problems fixed either fully automatically (e.g. restarting measurements) or via scripts that can be triggered on the web server
– Transparency for users
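
The split between fully automated fixes and operator-triggered scripts could be expressed as two lookup tables. A minimal Python sketch; the problem names, `restart_measurement` and `resync_clock.sh` are hypothetical placeholders, and every action is logged so that what happened stays visible to users.

```python
from typing import Callable, Dict


def restart_measurement(line):
    """Hypothetical placeholder for the real restart procedure."""
    return f"restarted measurement {line}"


# Problems considered safe to fix without a human (hypothetical mapping).
AUTOMATIC_FIXES: Dict[str, Callable[[str], str]] = {
    "measurement-stopped": restart_measurement,
}

# Problems for which the web server only offers a script an operator can trigger.
MANUAL_FIXES: Dict[str, str] = {
    "clock-unsynchronized": "resync_clock.sh",
}


def handle(problem, line, log):
    """Fix automatically, offer a manual script, or escalate; log every step."""
    if problem in AUTOMATIC_FIXES:
        log.append(AUTOMATIC_FIXES[problem](line))
        return "fixed"
    if problem in MANUAL_FIXES:
        log.append(f"{problem} on {line}: operator can run {MANUAL_FIXES[problem]}")
        return "needs-operator"
    log.append(f"{problem} on {line}: no known fix")
    return "escalate"
```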

17 Questions / Discussion / Want to contact us?
Website: http://www.win-labor.dfn.de/
Email: win-labor@dfn.de

