Fermilab Distributed Monitoring System (NGOP) Progress Report J.Fromm K.Genser T.Levshina M.Mengel V.Podstavkov.

04/18/2002 NGOP HEPIX

NGOP Working Group
Integrated Systems Development: E.Berman, T.Jones, I.Mandrichenko, D.Petravick
Operating Systems Support: T.Dawson, L.Giacchetti, K.Schumacher, S.Timm
Computing Services: M.Stolz, R.Thies, R.Thompson

What is NGOP and who is using it?
What:
- A distributed monitoring system that scales to the anticipated requirements for Run II (up to 10,000 nodes during the next 5 years)
- Facilitates problem diagnostics and provides ways for early error detection
- Provides centralized data collection
- Executes corrective and notification actions
- Offers a framework for creating Monitoring Agents that monitor the overall state of computers and the software running on them
- Provides means to define the status of services
Who:
- System administrators
- Software administrators
- Help Desk and computer center personnel
- Management
- Developers (the most curious ones)
- End users

NGOP Project Phases (since last HEPIX)
09/ /2001:
- First production release.
- Different sets of configuration for operators and system administrators ("roles").
- Interface to the Remedy Help Desk System.
12/ /2002:
- Deployment of Web Admin Tools that allow modification of host/cluster "known status" via the Web and scheduling of /Remedy ticket generation startup/shutdown.
- Automatic propagation of "known status" modifications to the NGOP monitor.
- New options added to agent actions.
- XML configuration language extended with "If" and "Else" to describe roles.
03/2002 – :
- Installation of a designated server machine for NGOP Central Services.
- Web Admin Tools expansion and improvements.
- URL Agent: an agent that watches for the presence of a web page and checks its content.
- NGOP Monitor improvements.

Scope of NGOP deployment
Production Installation:
- Monitoring a total of 705 nodes
- ~1015 Monitoring Agents:
  - 24 Ping Agents
  - 3 URL Agents
  - 492 OS Health Agents (IRIX, SUN, Linux)
  - 466 Swatch Agents (Linux)
  - ~30 Custom Agents (FBS Agent, Enstore Cron Agent, ...)
- Number of Monitored Objects: ~15,000
- About 10 instances of "NGOP monitor" (GUI) are running simultaneously
Test Installation (CDF Analysis Farm Cluster):
- Monitoring a total of 45 nodes

New Features (URL Monitoring Agent)
- URL Monitoring Agent scans given URLs for reachability and content
- Uses the Monitoring Agent API
- Behavior is defined by XML configuration:
    <URLFailRule ActionLocal=" _cdweb" href=" RegExp="Fermilab" />
    <URLFailRule ActionLocal=" _cdweb" href=" bin/telephone.script?format=text&name=wolbers&which=last&exact=&output=name" RegExp="WOLBERS"/>
- Can check for a particular entry on the web page
- Performs several retries
- Verifies that the web server is up before generating an event and action
- Runs on the central node
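The reachability-and-content check with retries described on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not NGOP's implementation: the function names `http_fetch` and `check_url`, the injectable `fetch` parameter, and the default of three retries are all assumptions, and the "web server is up" verification and the Monitoring Agent API are not modelled.

```python
import re
import urllib.request


def http_fetch(url, timeout=10.0):
    """Fetch a page body as text (hypothetical helper).

    Network failures surface as OSError (urllib's URLError is a subclass).
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_url(url, pattern, fetch=http_fetch, retries=3):
    """Return None if the page is reachable and matches `pattern`.

    Otherwise return a short failure reason after `retries` attempts.
    `fetch` is injectable so the check can be exercised without a network.
    """
    reason = None
    for _ in range(retries):
        try:
            body = fetch(url)
        except OSError as exc:
            # Page could not be retrieved on this attempt; try again.
            reason = f"unreachable: {exc}"
            continue
        if re.search(pattern, body):
            return None  # reachable and content matches: no event needed
        reason = f"content mismatch: {pattern!r} not found"
    return reason
```

A caller would raise an event and run the configured action only when `check_url` returns a reason rather than `None`, which mirrors the slide's point that several retries happen before anything is reported.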

New Features (Configuration Language Expansion)
- Introduced conditions (<If>, <Else>)
  - simplified handling of the fragments of XML that are relevant for a particular "role"
  - a "role" can be defined in any part of the configuration files by using the <If> and <Else> XML tags
- A role reflects the requirements of a particular group of people:
  - Cluster administrators (CMS, Farm, Enstore), operators, default
- A role defines what subset of the configuration a particular user sees and what rules are used to define the status of the monitored objects
- "cmsadmin" user will see:
- All other users will see:
  - Only "cmsadmin" will see CMS R&D and CMS Reference system views:
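How role conditions might select a subset of a configuration can be sketched in Python as below. The actual NGOP XML schema is not shown in the slides, so the `Role` attribute, the `Config` and `View` element names, and the "Summary" and "Farms" views are hypothetical; only the "If"/"Else" idea, the "cmsadmin" role, and the CMS R&D and CMS Reference views come from the presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment; element and attribute names are assumed.
CONFIG = """
<Config>
  <If Role="cmsadmin">
    <View Name="CMS R&amp;D"/>
    <View Name="CMS Reference"/>
  </If>
  <Else>
    <View Name="Summary"/>
  </Else>
  <View Name="Farms"/>
</Config>
"""


def resolve(parent, role):
    """Return a copy of `parent` with If/Else wrappers resolved for `role`."""
    out = ET.Element(parent.tag, parent.attrib)
    last_if_matched = None  # outcome of the immediately preceding <If>
    for child in parent:
        if child.tag == "If":
            last_if_matched = child.get("Role") == role
            if last_if_matched:
                # Splice the wrapper's (resolved) children into the output.
                out.extend(resolve(child, role))
        elif child.tag == "Else":
            # An <Else> applies only when the preceding <If> did not match.
            if last_if_matched is False:
                out.extend(resolve(child, role))
            last_if_matched = None
        else:
            out.append(resolve(child, role))
            last_if_matched = None
    return out


def visible_views(role):
    """List the view names a user with `role` would see."""
    root = resolve(ET.fromstring(CONFIG), role)
    return [view.get("Name") for view in root.iter("View")]
```

With this fragment, `visible_views("cmsadmin")` yields the two CMS views plus the unconditional one, while any other role gets the `Else` branch instead.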

New Features (NGOP Monitor)
- Automatic propagation of "known status" modification
- Increased speed of event handling
- "Time Stamp" indicator (last update from NCS)
- Modified color setting dialog
- Modified default monitor display layout

Web Admin Tool (Known Status)
- Secure access by authorized users
- Displays the hierarchy of Cluster/Hosts or Clusters/Systems
- Allows changing the status of any object or host service type
- Allows scheduling an out-of-service time period (start date, end date/duration and comments)
- Provides search
- Keeps a change log
- Displays all out-of-service objects
- Provides a multi-user locking mechanism
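The out-of-service scheduling listed above (start date, end date or duration, comments) could be represented along these lines in Python. This is only an illustrative data model: the `Outage` class, its field names, and the `out_of_service` helper are assumptions, and access control, the change log, and locking are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Outage:
    """One scheduled out-of-service period for a host, cluster or service."""
    target: str
    start: datetime
    end: Optional[datetime] = None
    duration: Optional[timedelta] = None
    comment: str = ""

    def __post_init__(self):
        # Accept either an explicit end date or a duration, not both.
        if (self.end is None) == (self.duration is None):
            raise ValueError("give exactly one of end or duration")
        if self.end is None:
            self.end = self.start + self.duration

    def covers(self, when):
        """True if `when` falls inside the half-open window [start, end)."""
        return self.start <= when < self.end


def out_of_service(outages, when):
    """Sorted names of all targets that are out of service at `when`."""
    return sorted({o.target for o in outages if o.covers(when)})
```

A monitor consulting such a schedule could treat failures on a covered target as expected rather than raising alarms or tickets for them.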

Known Status Interface Implementation (Zope Technology)
- PCGI (persistent CGI): circumvents launch overhead by using pcgi-wrapper and sending the request to the PCGI application via a Unix socket
- ZPublisher: web interface for Python objects
Diagram components: Web Browser, Web Server, PCGI Wrapper, ZPublisher, Unix Socket or INET Port, User App

Current Architecture
Diagram components: Monitor, WebAdmin, Administrator, Central Server, Configuration File Management Service, Persistent Config. Data, Archive Service, Archive, Report Generator, Plugins MA, Swatch MA, URL MA, Ping MA, Custom MA, Web Admin Service, Remedy Help Desk Tickets, Action Client

Why do we need a new GUI?
NGOP monitor has some major deficiencies:
- Large memory requirements
- Sometimes CPU intensive
- Shows its own sequence of events that depends on the start time and on acknowledged and deleted events and alarms
- Generates a separate window for each level of hierarchy
- Plain ugly
We have some new requirements:
- Access via Web
- Ability to "clone" monitors: identical view of all events if monitors are started by the same user with the same role

GUI Redesign
Diagram components: Central Server, WebMonitor, Monitor, Persistent Config. Data, Status Engine (IF Role = farmadmin), Web Monitor Service, API

GUI New Look (work in progress)

Summary
- A comprehensive framework was created to fulfill the monitoring needs of system administrators, operators and end users.
- The current version has proven itself in helping to increase system uptime and efficiency.
- Work has started to improve the NGOP monitor and to provide the same functionality via the Web.
- The NGOP interface to the Fermilab Remedy Help Desk system provides a means for possible future complete automation of the notification process.
- NGOP could be used as a GRID fabric component, as a low-level site monitoring tool.
- More information can be found at: URL: