NGOP J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.

Slides:



Advertisements
Similar presentations
How We Manage SaaS Infrastructure Knowledge Track
Advertisements

Complete Event Log Viewing, Monitoring and Management.
Welcome to Middleware Joseph Amrithraj
ActiveXperts Network Monitor Monitors servers, workstations and devices for availability Alerts and corrects.
Complete Event Log Viewing, Monitoring and Management.
ActiveXperts Network Monitor Monitors servers, workstations and devices for availability Alerts and corrects.
1 CHEP 2000, Roberto Barbera Roberto Barbera (*) Grid monitoring with NAGIOS WP3-INFN Meeting, Naples, (*) Work in collaboration with.
Operating-System Structures
Network+ Guide to Networks, Fourth Edition
Radko Zhelev, IPP BAS Generic Resource Framework for Cloud Systems 1 Generic Resource Framework for Cloud Systems.
Web Server Hardware and Software
Software Frameworks for Acquisition and Control European PhD – 2009 Horácio Fernandes.
Hands-On Microsoft Windows Server 2003 Networking Chapter 7 Windows Internet Naming Service.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 8: Implementing and Managing Printers.
1 NGOP Overview Jim Fromm Farms and Clustered Systems Group Computing Division Fermilab.
Slide 1 of 9 Presenting 24x7 Scheduler The art of computer automation Press PageDown key or click to advance.
Hands-On Microsoft Windows Server 2008 Chapter 8 Managing Windows Server 2008 Network Services.
UNIT-V The MVC architecture and Struts Framework.
Hands-On Microsoft Windows Server 2008 Chapter 1 Introduction to Windows Server 2008.
“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”
BMC Software confidential. BMC Performance Manager Will Brown.
M. Taimoor Khan * Java Server Pages (JSP) is a server-side programming technology that enables the creation of dynamic,
Network+ Guide to Networks, Fourth Edition Chapter 1 An Introduction to Networking.
2 Copyright © 2009, Oracle. All rights reserved. Getting Started with Warehouse Builder.
Framework for Automated Builds Natalia Ratnikova CHEP’03.
WP4-install task report WP4 workshop Barcelona project conference 5/03 German Cancio.
An Introduction to IBM Systems Director
Robert Fourer, Jun Ma, Kipp Martin Copyright 2006 An Enterprise Computational System Built on the Optimization Services (OS) Framework and Standards Jun.
Course Presentation EEL5881, Fall, 2003 Project: Network Reliability Tests Project: Network Reliability Tests Team: Gladiator Team: Gladiator Shuxin Li.
Section 1: Introducing Group Policy What Is Group Policy? Group Policy Scenarios New Group Policy Features Introduced with Windows Server 2008 and Windows.
Fundamentals of Database Chapter 7 Database Technologies.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
Overview of MSS System Human Actors Non-Human Actors In-house developed components Third party products.
1 Vulnerability Analysis and Patches Management Using Secure Mobile Agents Presented by: Muhammad Awais Shibli.
Chapter 2: Operating-System Structures. 2.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 2: Operating-System Structures Operating.
Scalable Systems Software Center Resource Management and Accounting Working Group Face-to-Face Meeting October 10-11, 2002.
National Center for Supercomputing Applications NCSA OPIE Presentation November 2000.
TELE 301 Lecture 10: Scheduled … 1 Overview Last Lecture –Post installation This Lecture –Scheduled tasks and log management Next Lecture –DNS –Readings:
CSCI 530 Lab Intrusion Detection Systems IDS. A collection of techniques and methodologies used to monitor suspicious activities both at the network and.
Contents 1.Introduction, architecture 2.Live demonstration 3.Extensibility.
Computer Emergency Notification System (CENS)
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
Fermilab Distributed Monitoring System (NGOP) Progress Report J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.
ALICE, ATLAS, CMS & LHCb joint workshop on
Page 1 © 2001, Epicentric - All Rights Reserved Epicentric Modular Web Services Alan Kropp Web Services Architect WSRP Technical Committee – March 18,
NGOP Overview J.Fromm K.Genser T.Levshina M.Mengel.
Lemon Monitoring Miroslav Siket, German Cancio, David Front, Maciej Stepniewski CERN-IT/FIO-FS LCG Operations Workshop Bologna, May 2005.
NGOP Prototype Status Report T.Levshina. N ext G eneration O peration GROUP Integrated Systems Development Department Krzysztof.
System Center Lesson 4: Overview of System Center 2012 Components System Center 2012 Private Cloud Components VMM Overview App Controller Overview.
CSI 3125, Preliminaries, page 1 SERVLET. CSI 3125, Preliminaries, page 2 SERVLET A servlet is a server-side software program, written in Java code, that.
Ceilometer + Gnocchi + Aodh Architecture
TOPIC 7.0 LINUX SERVICES AND CONFIGURATION. ROOT USER Root user is called “super user” because it has power far beyond those of mortal user. As root,
ECHO A System Monitoring and Management Tool Yitao Duan and Dawey Huang.
D4Science and ETICS Building and Testing gCube and gCore Pedro Andrade CERN EGEE’08 Conference 25 September 2008 Istanbul (Turkey)
SAN DIEGO SUPERCOMPUTER CENTER Welcome to the 2nd Inca Workshop Sponsored by the NSF September 4 & 5, 2008 Presenters: Shava Smallen
VOX Project Tanya Levshina. 05/17/2004 VOX Project2 Presentation overview Introduction VOX Project VOMRS Concepts Roles Registration flow EDG VOMS Open.
EGEE is a project funded by the European Union under contract IST Installation and configuration of gLite services Robert Harakaly, CERN,
Interstage BPM v11.2 1Copyright © 2010 FUJITSU LIMITED INTERSTAGE BPM ARCHITECTURE BPMS.
A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.
SQL Database Management
Integrating ArcSight with Enterprise Ticketing Systems
Integrating ArcSight with Enterprise Ticketing Systems
TrueSight Operations Management 11.0 Architecture
Chapter 2: System Structures
GLAST Release Manager Automated code compilation via the Release Manager Navid Golpayegani, GSFC/SSAI Overview The Release Manager is a program responsible.
Chapter 2: System Structures
HC Hyper-V Module GUI Portal VPS Templates Web Console
Web Application Development Using PHP
Presentation transcript:

NGOP J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov

3/27/2003NGOP CHEP'032 What is NGOP and who is using it? What: –A Distributed Monitoring System that scales to the anticipated requirements for Run II (up to 10,000 nodes during next 5 years) –Provides active monitoring of software and hardware –Provides customizable service-level reporting –Facilitates early error detection and problem prevention –Provides persistent storage of collected data –Executes corrective actions and sending notifications –Offers a framework to create Monitoring Agents for monitoring the overall state of computers and software that are running on them. Who: –System administrators –Software administrators –Help Desk and computer center personnel –Management –Developers (the most curious ones) –End users

3/27/2003NGOP CHEP'033 NGOP Architecture System Status Page System Status Page

3/27/2003NGOP CHEP'034 NGOP Data Flow Local Agent Central Server CPU Load is normal Archive CPU Load is normal Status Engine Operator Status Engine SysAdmin CPU Load is normal Sys Admins Operators Action Server CPU Load is too high Send CPU Load is too high Send CPU Load is too high CPU Load is high

3/27/2003NGOP CHEP'035 NGOP Central Services NGOP Central Server (NCS): –collects messages from multiple monitoring agents –provides clients with requested information –forwards requests to Action Server to perform action –forwards all events to Archive Service Configuration Files Management Service (CFMS) –provides a central repository for all configuration files. –performs configuration sanity check –provides clients with component subscription list –allows dynamic reconfiguration –notifies clients about new configuration Action Server: –gets configuration information from the CFMS –gets action requests from the NCS –verifies user authorization to request the actions –verifies that monitored object associated with an action is not marked as “known bad” –performs actions –notifies the NCS about success/failure of performed actions

3/27/2003NGOP CHEP'036 Configuration Language The NGOP configuration language provides a framework for creating monitoring tools. NGOP configuration language –written in XML –allows to describe monitoring object –allows the creation of hierarchies of monitored objects –describes rules to determine the status of the object –defines when and what kind of actions should be performed –uses expansion mechanism that allows the replication of a particular fragment of an XML document –uses conditions simplified handling of various fragments of XML that are relevant for a particular “role”

3/27/2003NGOP CHEP'037 Monitoring Agents (I) Monitoring Agent (MA) is process that monitors the characteristics of a particular monitored object and report a state to the NCS. MA can monitor multiple objects. MA can perform local actions or request NCS to perform centralized actions. NGOP provides a framework for creation of the MAs: either by using the MA API or the PlugIns Agent. PlugIns Agent: –runs on the local node –allows the monitoring of software or hardware components utilizing existing scripts or executables (plug-ins) –plug-ins should be able to measure and print some quantitative characteristics of the monitored objects. –uses template configuration file

3/27/2003NGOP CHEP'038 Monitoring Agents (II) Ping Agent: –runs on the central node, pinging remote nodes –sends ICMP packets to nodes listed in its configuration file. –performs route discovery and has an ability to distinguish failure to ping the node from the failure to ping the switch, as well as discovery of simultaneous multiple failures. –determines the boot time of a node as well as it’s cpu load if rstatd daemon is running on remote node Swatch Agent –runs on the local node – watches a log file for lines matching a regular expression URL Agent –runs on the central node –scans given URL ’ s for reachability and content

3/27/2003NGOP CHEP'039 Status Engine, Rules and Roles The Status Engine is the component that collects selected information from the NCS and processes it according to the specific rules. Multiple Status Engines can be running simultaneously each configured in such a way that reflects the interests of one particular group of people (role). Rules define the status of the monitored object –A Generic Rule sets the monitored object status based on the event received from the NCS. –A Dependent Rule sets the monitored element status based on the event received from the NCS and the status of each dependent monitored object in some group. Roles define what subset of the configuration will be seen by a particular group of users and what rules will be used to define the status of the monitored objects A full python API is provided allowing users to retrieve information about a particular monitored object. Web and Java Monitors are using API as well.

3/27/2003NGOP CHEP'0310 Snapshots (Web GUI)

3/27/2003NGOP CHEP'0311 Snapshots (Java Gui)

3/27/2003NGOP CHEP'0312 NGOP Archiver Responsible for storing/retrieving messages generated by NGOP. Data stored in Oracle database Cleanup process runs daily – 14 days of data is available. Archive server caches messages from the NGOP Central Server. A separate process (Database Interface) periodically reads cached messages and puts them in Oracle. “Best effort” used to store messages. Some messages may be dropped. Web based interface

3/27/2003NGOP CHEP'0313 Snapshots (Archiver)

3/27/2003NGOP CHEP'0314 WEB Admin tool, Remedy Web Admin Tool can mark any monitored object as known to be out of service, so this object will be excluded from determination of the status of the dependent monitored objects –Schedules maintenance in advance –Provides multiple maintenance intervals –Provides “cron” like maintenance intervals –Shows hierarchy of clusters/nodes, and system/elements –Provides search for particular host/clusters –Provides secure access for authorized users –Keeps change log NGOP is interfacing Remedy Help Desk using Remedy API to generate help desk tickets.

3/27/2003NGOP CHEP'0315 Snapshots (Web Admin Tool)

3/27/2003NGOP CHEP'0316 Scope of deployment Monitoring a total of 1420 nodes Number of Monitored Objects ~ 32,000 Number of agents ~ 2,500 Number of Status Engines 6 Average rate of events per day ~ 3,000 Two dedicated computers: –ngopsrv Central Server CFMS Action Server Ping Agents URL Agents –ngopcli Status Engines Web Admin Tool Web Service

3/27/2003NGOP CHEP'0317 Implementation Details Written primarily in Python (some modules in C, NGOP Monitor in Java) –Compatible with python 2.1 –Java and higher –Python code ( ~18,000 lines), C code (~ 350 lines), Java (~ 3,000 lines) Uses XML (and partially MATHML) for all configuration files. DTD files are provided with distribution. –Central configuration ( ~ 8,000 lines) –Central Agents (URL, Ping) configuration ( ~ 8,000 lines) Uses Oracle Database for event logging Product availability: –Monitoring Agents are available on Linux, Irix, Solaris “PlugIns” Agent was ported to Windows –NGOP Central Services, Web Admin Tool run on Linux –NGOP Web GUI is available via any Web Browser, NGOP Java Monitor runs on Linux, Windows and Sun

3/27/2003NGOP CHEP'0318 Who else is using it and how you can use it too? Working installation (beta – release) in IN2P3 Lyon (P. Olivero) –779 hosts –7 roles –40 Applications –9 Printers queues –42 drives-status NGOP version v2_1 is in Fermi Tools, could be download via anonymous ftp More info: – Documentations Tutorials –

3/27/2003NGOP CHEP'0319 Summary A comprehensive framework was created to fulfill monitoring needs of system administrators, operators and end users. A structured framework was provided to collect events, alarms and actions. NGOP Service has already proven itself in helping to increase the systems uptime and efficiency. NGOP interface to the Fermilab Remedy Help Desk system provides means for possible future complete automation of the notification process. Comprehensive documentation is provided. Creating configuration and rules is quite complicated and time consuming procedure. It requires knowledge of XML and NGOP configuration language. The tools that shield end users from these do not exist.