
1 An Alarms Service for Federated Networks
Charaka Palansuriya, Jeremy Nowell, Florian Scharinger, Kostas Kavoussanakis, Arthur Trew
Charaka Palansuriya, EPCC, The University of Edinburgh
charaka@epcc.ed.ac.uk
http://www.npm-alarms.org/
TNC 2009, Málaga, 8 June 2009

2 Overview
Background
Challenges of Monitoring Federated Networks
Monitoring Federated Networks
Why an Alarms Service?
Architecture
Examples
Current Status and Future Work

3 Background
EPCC: the supercomputing centre of The University of Edinburgh
– Hosts the UK national academic HPC service
– Performs technology transfer to industry
– Provides both academic and industrial software consultancy
– http://www.epcc.ed.ac.uk/
EPCC has been working in the area of Grid Network Performance Monitoring (NPM) for over 5 years
– First within the EGEE and EGEE2 projects
– Now continuing development of related software

4 Federated Networks: Composition

5 Federated Networks

6 Challenges
Federated Network Monitoring
– Types: backbone, end-to-end
– Tools: iperf, ping, netflow, perfSONAR
– User Groups: NOC, GOC, End user
– Data Formats: RRD, SQL, Flat file
– Administrative Domains: project, NREN, MAN

7 Federated Network Monitoring Strategy
Use existing tools and data
– Do not try to force adoption of a single tool across large multi-administrative domains
– Instead provide a framework for accessing distributed data
Use standards-based solutions where possible
– Allow uniform access to data in different administrative domains
– Allow interoperability between grids, projects and networks

8 Why an Alarms Service?
Monitoring frameworks such as perfSONAR provide access to historical data
– E.g., via Measurement Archives (MAs)
However, historical data is only useful for diagnosing problems when you already know something is wrong
What network operators really need are… ALARMS

9 Requirements
A network Alarms Service should
– Allow timely detection of problems
– Be able to notify users via
  – A web interface
  – Emails etc.
– Give an “at a glance” view of network status
– Allow reconfigurable alarm definition

10 Specific Requirements
Motivated by the LHCOPN
– 10 Gb/s private network for moving data generated by the LHC
– perfSONAR-based monitoring solution deployed and operated by DANTE
Need the following alarms as a minimum
– Route Change
– Routing Out of Private Network
– Router Interface Errors
– Router Interface Congestion

11 Strategy
1. Query Monitoring Data
2. Analyse Alarm Conditions
3. Notify Status
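The three steps above can be read as one polling cycle. The following is a minimal Python sketch of that cycle, not the service's actual code: the `Measurement` and `Status` types, the function names and the fixed sample data are all hypothetical, and the real service queries perfSONAR Measurement Archives rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Measurement:
    source: str   # hypothetical identifier, e.g. a router interface name
    metric: str   # hypothetical metric name
    value: float


@dataclass
class Status:
    source: str
    level: str    # "OK" or "ALARM" in this sketch
    message: str


def run_cycle(
    query: Callable[[], Iterable[Measurement]],
    analyse: Callable[[Measurement], Status],
    notify: Callable[[Status], None],
) -> List[Status]:
    """Run one query -> analyse -> notify cycle and return the statuses."""
    statuses = [analyse(m) for m in query()]   # steps 1 and 2
    for status in statuses:                    # step 3
        notify(status)
    return statuses


def query_fixed() -> List[Measurement]:
    # Stand-in for a real monitoring-data query (assumption: fixed data).
    return [
        Measurement("if-a", "input_errors", 0.0),
        Measurement("if-b", "input_errors", 12.0),
    ]


def analyse_errors(m: Measurement) -> Status:
    # Assumed alarm condition for illustration: any non-zero error count.
    level = "ALARM" if m.value > 0 else "OK"
    return Status(m.source, level, f"{m.metric}={m.value}")
```

Passing the three steps in as functions keeps each one replaceable, for example `run_cycle(query_fixed, analyse_errors, print)` prints one status per measurement.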

12 Architecture

13 Implementation Details
Query
– NM-WG standard queries to perfSONAR RRD and HADES Measurement Archives
  – Passive router data: interface errors, packet drops, utilisation
  – Traceroute information
Analyse
– Rules-based mechanism to process data against rules defined in configuration files
  – DROOLS library from JBoss
Notify Status
– Output status in a form usable by Nagios
  – Status display, notifications, history
– Easily implement more status notifiers
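To make the Analyse and Notify steps concrete, here is a small Python sketch of the general idea: thresholds kept in configuration, evaluated against a measurement, and reported in the style of a Nagios plugin (state names mapped to exit codes 0 to 3, status text followed by `|` and performance data). It is not Drools and not the service's rule format; the JSON layout, metric names, thresholds and function names are assumptions made for illustration.

```python
import json

# Standard Nagios plugin return codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Hypothetical rule configuration; the real service defines its rules
# in its own configuration files, whose format is not shown here.
RULES_JSON = """
[
  {"metric": "input_errors", "warning": 1, "critical": 100},
  {"metric": "utilisation_percent", "warning": 80, "critical": 95}
]
"""


def load_rules(text):
    """Parse the rule list into a dict keyed by metric name."""
    return {rule["metric"]: rule for rule in json.loads(text)}


def evaluate(rules, metric, value):
    """Return the Nagios state name for one measurement."""
    rule = rules.get(metric)
    if rule is None:
        return "UNKNOWN"          # no rule configured for this metric
    if value >= rule["critical"]:
        return "CRITICAL"
    if value >= rule["warning"]:
        return "WARNING"
    return "OK"


def nagios_line(service, metric, value, state, rule=None):
    """Format plugin-style output: status text, then '|' and perfdata."""
    perf = f"{metric}={value}"
    if rule is not None:
        perf += f";{rule['warning']};{rule['critical']}"
    return f"{service} {state} - {metric}={value} | {perf}"
```

Keeping the thresholds in data rather than code is what allows alarm definitions to be reconfigured without changing the service itself; a plugin wrapper would print the line and exit with `NAGIOS_CODES[state]`.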

14 Examples: Interface Errors Alarm

15 Examples

16 Examples: Interface Errors Alarm

17 Examples: Route Change Alarm
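A route change alarm, and the related "routing out of private network" alarm, both come down to inspecting traceroute hop lists. The sketch below shows one plausible way to express the two checks in Python. It assumes hops arrive as lists of IP address strings and that the private network is described by a list of prefixes; the function names and the documentation-range addresses are hypothetical and do not describe the LHCOPN.

```python
import ipaddress


def route_changed(previous_hops, current_hops):
    """True if the ordered hop sequence differs between two traceroutes."""
    return list(previous_hops) != list(current_hops)


def hops_outside(hops, allowed_prefixes):
    """Return the hops that fall outside every allowed prefix."""
    networks = [ipaddress.ip_network(p) for p in allowed_prefixes]
    outside = []
    for hop in hops:
        address = ipaddress.ip_address(hop)
        if not any(address in network for network in networks):
            outside.append(hop)
    return outside


# Example data using reserved documentation address ranges (assumption).
ALLOWED = ["192.0.2.0/24", "198.51.100.0/24"]
BEFORE = ["192.0.2.1", "192.0.2.9", "198.51.100.7"]
AFTER = ["192.0.2.1", "203.0.113.5", "198.51.100.7"]
```

With this data, `route_changed(BEFORE, AFTER)` is true and `hops_outside(AFTER, ALLOWED)` names the single hop that left the allowed prefixes, so both alarms would fire.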

18 Current Status
Deployed by DANTE to monitor the LHCOPN
– Monitors those Tier1 sites for which data is available
– Plan to add the rest of the Tier1 sites and the Tier0 as they become available
– Gathering feedback from the users
Plan to deploy a version for the DEISA2 network to allow technical evaluation
The service will be further refined and extended based on user feedback
Actively looking for other users

19 Future Work
Implement more alarm conditions
Send status information to other consumers, e.g., network weather map
Think about data processing
– e.g. “cleaning” of data to remove bad data points
– Statistical processing etc.
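As an illustration of what "cleaning" might involve, the Python sketch below drops samples that are missing, not-a-number, or outside a plausible range before any rules are applied. What counts as a bad data point is an assumption here, as are the function name and the bounds; the talk does not specify a cleaning method.

```python
import math


def clean(points, lower=0.0, upper=None):
    """Return the samples that are present, numeric and within bounds.

    Assumed definition of a bad point for this sketch: None, NaN,
    below `lower`, or above `upper` when an upper bound is given.
    """
    kept = []
    for point in points:
        if point is None:
            continue                      # missing sample
        if isinstance(point, float) and math.isnan(point):
            continue                      # unknown value reported as NaN
        if point < lower:
            continue                      # e.g. a negative counter value
        if upper is not None and point > upper:
            continue                      # e.g. utilisation above 100 %
        kept.append(point)
    return kept
```

For example, cleaning a utilisation series with `upper=100.0` removes gaps and impossible readings while leaving valid samples in their original order.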

20 Summary
Monitoring of federated networks is a challenge
An Alarms Service is essential for problem discovery
The LHCOPN is being monitored using an initial version
– and will be developed further to be deployed to monitor the whole network

21 Thank you for listening
For further details: http://www.npm-alarms.org/
Email: npm@epcc.ed.ac.uk
Acknowledgements
– Funding
  – UK Joint Information Systems Committee (JISC)
  – DEISA2 (RI-222919)
– Collaboration: DANTE DFN WiN-Labor Erlangen LHC-OPN

