Connect. Communicate. Collaborate Hades – Going Operational Roland Karch, RRZE FAU Erlangen-Nürnberg JRA1 Montpellier Meeting, October 2006.

Presentation transcript:

Hades Implementation Status List
- IPv6 Measurements (up and running in more than half of the JRA1 locations)
- Multicast Measurements (implementation)
- Alerts
  - Packet Loss Maps (implemented, deployed for X-WiN)
  - SNMP Traps (server needs to be set up)
  - Generic Web Interface (evaluation)
- Maintenance
  - To be integrated into one interface with Alerts

IPv6 Measurements
Running in:
- Amsterdam (SURFnet)
- Athens (GRNET)
- Ljubljana (ARNES)
- Paris (RENATER) (currently offline)
- Prague (CESNET)
- Sofia (ISTF)
- Zagreb (CARNET)
Do you own a JRA1 Hades measurement box and an IPv6-capable network but aren't on the list? Contact us!

Hades weather map (GEANT/NRENs, geographic view)

Hades weather maps (abstract, domain-specific view)

Alerts – Packet Loss Maps
- One map shows the observed packet loss on all Hades-monitored links
- Colour coding on links distinguishes short and long outages
- Still in development; not yet available in the European context
- Maps for other metrics are under consideration, but the details of those metrics are yet to be determined (see statistical analysis)
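The colour coding could be sketched roughly as follows. This is only an illustration: the slides do not say where "short" ends and "long" begins, so the 60-second threshold, the colour names and the `LinkLoss` record are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical threshold: the slides do not define "short" or "long" outages.
SHORT_OUTAGE_MAX_S = 60.0


@dataclass
class LinkLoss:
    """Loss summary for one monitored link (illustrative structure)."""
    link: str                # label of the monitored link
    lost_packets: int        # packets lost in the reporting period
    longest_outage_s: float  # longest continuous loss period in seconds


def link_colour(loss: LinkLoss) -> str:
    """Map a link's loss summary to a map colour (colour names are assumed)."""
    if loss.lost_packets == 0:
        return "green"       # no loss observed
    if loss.longest_outage_s <= SHORT_OUTAGE_MAX_S:
        return "yellow"      # short outage
    return "red"             # long outage
```

A map renderer would call `link_colour` once per link and draw the line in the returned colour.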

Alerts – SNMP traps
- Problem: data on the measurement archive is between 0 and 90 minutes old
- To ensure up-to-date information for alerts, the options are either:
  - Increase the frequency of data polling (causes management network overhead and load on the measurement point and archive)
  - Do the analysis on the measurement point in real time (CPU load on the measurement point only, but raises the problem of how to deliver decentralized alerts)
- Solution: decentralized analysis, with SNMP traps for alerting
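A minimal sketch of the decentralized idea, assuming the measurement point sees one loss ratio per measurement interval. The 5 % threshold and the event names `RAISE` and `CLEAR` are hypothetical, and the actual trap transmission is deliberately left out. The point is that an event is produced only when the state changes, so nothing is sent while a link stays healthy or stays broken.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical threshold: alert when more than 5 % of packets are lost.
LOSS_THRESHOLD = 0.05


def alert_events(
    loss_ratios: Iterable[float], threshold: float = LOSS_THRESHOLD
) -> Iterator[Tuple[int, str]]:
    """Yield (interval index, event) only when the alert state changes.

    Each yielded event is where a measurement point would send one trap.
    """
    alerting = False
    for index, ratio in enumerate(loss_ratios):
        if ratio > threshold and not alerting:
            alerting = True
            yield index, "RAISE"   # loss crossed the threshold
        elif ratio <= threshold and alerting:
            alerting = False
            yield index, "CLEAR"   # loss fell back below the threshold
```

For the interval series `[0.0, 0.1, 0.2, 0.0, 0.0, 0.3]` this produces three events instead of six polled values.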

Alerts – SNMP traps
- Multiple potential use cases for traps
  - A central visualization subscribes to all alerts in order to create a powerful map and/or an alert list with history
  - NOCs might subscribe for their uplinks or sensitive paths to important locations (they typically already run SNMP-capable monitoring facilities)

Alerts – SNMP traps
- Benefits
  - Only causes network traffic when necessary
  - Real-time data for analysis is available on the measurement point
  - SNMP MP usable?
- Drawbacks
  - SNMP is very often filtered on the way into user networks (using the web visualisation as an intermediate server might solve that)
  - Won't alert when the reporting path is itself affected by the network problem

Alerts – Statistics
- A higher level of statistical analysis of the measurement data might help to determine a "connection footprint" and show changes in it due to routing changes
- Possible numbers to play with:
  - Line-inherent delay (the minimal delay that covers all, or a high percentile of, the measurement packets)
  - Regular IPDV (the blurry zone in a plot: the delta between the line-inherent delay and the maximum delay of 90 percent of the measurements)
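The two candidate numbers can be computed from a list of one-way delays along these lines. This is one reading of the slide, not the Hades implementation: it uses a nearest-rank percentile, takes the line-inherent delay as the value that all (or `covered_percent` of) the packets meet or exceed, and takes the regular IPDV as the 90th percentile minus that value. All function names are illustrative.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("no measurements")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def line_inherent_delay(delays_ms: Sequence[float],
                        covered_percent: float = 100.0) -> float:
    """Delay that covered_percent of the packets meet or exceed.

    With the default of 100 this is simply the minimum delay.
    """
    if covered_percent >= 100.0:
        return min(delays_ms)
    return percentile(delays_ms, 100.0 - covered_percent)


def regular_ipdv(delays_ms: Sequence[float],
                 covered_percent: float = 100.0) -> float:
    """Width of the 'blurry zone': 90th percentile minus line-inherent delay."""
    return percentile(delays_ms, 90.0) - line_inherent_delay(
        delays_ms, covered_percent)
```

Using the 90th percentile as the upper edge keeps a single outlier from stretching the zone, which is presumably why the slide picks 90 percent rather than the maximum.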

Alerts – Statistics – Key values
- 11.4 ms minimal delay subtracted: "network intrinsic delay"
- 1 µs gap: timestamp precision
- Lower boundary: timer precision

Alerts – Statistics – Pathfinders
- First packet in every group of 5: ~7 µs longer delay
- Most probable reason: the receiver process has to be loaded into the CPU cache before it processes the first packet
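One way to quantify the pathfinder effect is to compare the mean delay of the first packet of each group with the mean of the remaining packets. The sketch below assumes the delays are stored in sending order and that groups are back to back; the function name is illustrative.

```python
from statistics import mean
from typing import Sequence

GROUP_SIZE = 5  # from the slide: the effect shows up in every group of 5


def pathfinder_offset(delays_us: Sequence[float],
                      group_size: int = GROUP_SIZE) -> float:
    """Mean delay of each group's first packet minus the mean of the rest."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    # Ignore a trailing incomplete group.
    usable = len(delays_us) - len(delays_us) % group_size
    if usable == 0:
        raise ValueError("need at least one complete group")
    first = [delays_us[i] for i in range(0, usable, group_size)]
    rest = [delays_us[i] for i in range(usable) if i % group_size != 0]
    return mean(first) - mean(rest)
```

A result close to 7 µs on real data would match the observation on the slide.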

Alerts – Statistics – Path fingerprint
- Comparison of paths on different networks (hardware, lines and configuration differ)
- Both: small OWD, narrow distribution of delay
- Path 2: longer distribution tail
- Path 1: reordering!
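Reordering like that seen on path 1 can be detected from the sequence numbers in arrival order. The sketch below counts a packet as reordered when its sequence number is lower than one that has already arrived; this is a common definition, assumed here rather than taken from the slides.

```python
from typing import Iterable


def count_reordered(arrival_sequence: Iterable[int]) -> int:
    """Count packets that arrive after a packet with a higher sequence number."""
    reordered = 0
    highest_seen = None
    for seq in arrival_sequence:
        if highest_seen is not None and seq < highest_seen:
            reordered += 1       # arrived late: a newer packet got here first
        else:
            highest_seen = seq   # in order (or a duplicate of the newest)
    return reordered
```

For the arrival order 1, 2, 4, 3, 5 the function reports one reordered packet (number 3).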

Maintenance
- Most important part of "going operational"
- Current status:
  - Daily check, via the web visualization, of which measurement lines are down (up to 24 hours delay)
  - Scripts are run to catch most anomalies (clock status, old data)
  - perfSONAR MAs are monitored externally (ISTF)

Maintenance
- Evaluation of Nagios [1]
- Could serve as a common platform for alert and maintenance visualization
- Provides a front end for both SNMP and scripted surveillance
[1]
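Scripted surveillance under Nagios usually means a plugin that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows what an "old data" check could look like in that style. The thresholds and the function name are assumptions, and how the newest data timestamp is obtained is left open.

```python
from typing import Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds, chosen above the 0 to 90 minute archive delay.
WARN_AGE_S = 2 * 3600
CRIT_AGE_S = 24 * 3600


def check_data_age(newest_data_ts: float, now: float,
                   warn_s: float = WARN_AGE_S,
                   crit_s: float = CRIT_AGE_S) -> Tuple[int, str]:
    """Return (plugin return code, status line) for the age of the newest data."""
    age_s = now - newest_data_ts
    if age_s < 0:
        return UNKNOWN, "UNKNOWN - newest data is timestamped in the future"
    minutes = int(age_s // 60)
    if age_s >= crit_s:
        return CRITICAL, f"CRITICAL - newest data is {minutes} min old"
    if age_s >= warn_s:
        return WARNING, f"WARNING - newest data is {minutes} min old"
    return OK, f"OK - newest data is {minutes} min old"
```

A small wrapper script would print the status line and call `sys.exit` with the return code so that Nagios can pick up the result.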

Maintenance
- Goals
  - Highest possible level of automation
  - Simple problems are fixed either fully automatically (e.g. by restarting measurements) or via scripts that can be triggered on the web server
  - Transparency for users

Questions / Discussion / Want to contact us?
Website: