A monitoring tool for a GRID operation center Sergio Andreozzi (INFN CNAF), Sergio Fantinel (INFN Padova), David Rebatto (INFN Milano), Gennaro Tortone.


1 A monitoring tool for a GRID operation center Sergio Andreozzi (INFN CNAF), Sergio Fantinel (INFN Padova), David Rebatto (INFN Milano), Gennaro Tortone (INFN Napoli), Luca Vaccarossa (INFN Milano) CHEP2003 - March 24-28, 2003 - La Jolla, California

2 Summary
- introduction to the DataTAG project
- monitoring of grid elements
- first implementation: WorldGRID resources monitoring
- the evolution: DataTAG WP4 resources monitoring
- future activities

3 Introduction

4 DataTAG project
DataTAG is an EU-funded project that will create a large-scale intercontinental Grid testbed focusing on advanced networking issues and on interoperability between intercontinental Grid domains. The project will address issues in high-performance inter-Grid networking, including sustained and reliable high-performance data replication, end-to-end advanced network services, and novel monitoring techniques. It will also directly address interoperability between Grid middleware layers, such as information and security services. The advances made will be disseminated to each of the associated Grid projects. Detailed information: http://www.datatag.org

5 DataTAG Work Package 4
The task of DataTAG WP4 (Interoperability between Grid domains) is to address middleware interoperability between the European and US Grid domains and to enable a selected set of applications to run on the Transatlantic Grid Testbed. Main activities include:
- Grid Resource Model for Computing and Storage resources (GLUE schema)
- Virtual Organisation Membership Service implementation
- Grid Monitoring
- Resource Discovery
- LHC experiment applications integration

6 Monitoring of grid elements (1/2)
Grid elements: Computing Element, Storage Element, Worker Node, Resource Broker, Information Index, Replica Manager, Replica Catalog, […]
LOW LEVEL measurements: CPU load, memory usage, disk usage (per partition), network activity, number of processes, number of users (UI), …
Grid services checks: gatekeeper, gsiftp, gris, gdmp, RB/LB, …
“GRID” measurements: number of total CPUs, number of free CPUs, number of running jobs, number of waiting jobs, SE free disk space, …

7 Monitoring of grid elements (2/2)
Sources of information:
- LOW LEVEL measurements: plugins/sensors installed on each machine
- SERVICE checks: sensors installed on the monitoring server
- GRID measurements: sensors installed on the monitoring server
Aggregate information (monitoring server side):
- per Virtual Organisation
- per site
- …
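The server-side aggregation per site and per Virtual Organisation can be illustrated with a minimal sketch. The sample records, field names, and the `aggregate` helper are hypothetical and are not taken from the tool itself; the sketch only shows how per-resource metrics might be summed by a grouping key.

```python
from collections import defaultdict

# Hypothetical per-resource samples; names and values are illustrative only.
samples = [
    {"site": "site-a", "vo": "vo-x", "total_cpus": 40, "free_cpus": 10, "running_jobs": 30},
    {"site": "site-a", "vo": "vo-y", "total_cpus": 20, "free_cpus": 5, "running_jobs": 15},
    {"site": "site-b", "vo": "vo-x", "total_cpus": 16, "free_cpus": 16, "running_jobs": 0},
]

def aggregate(samples, key):
    """Sum every integer metric over the samples sharing the same value of `key`."""
    totals = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        for name, value in sample.items():
            if isinstance(value, int):  # skip the string-valued grouping fields
                totals[sample[key]][name] += value
    return {group: dict(metrics) for group, metrics in totals.items()}

per_site = aggregate(samples, "site")
per_vo = aggregate(samples, "vo")
```

Grouping by a key passed as a parameter keeps the same routine usable for both views listed on the slide.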

8 first implementation: WorldGRID resources monitoring

9 WorldGRID testbed
WorldGRID is a “transatlantic grid” based on the existing European and American Grids. Its goal is to offer transparent access to the distributed computing infrastructure needed by modern “data-intensive” applications. The WorldGRID testbed was successfully demonstrated at the WorldGRID demos at SuperComputing 2002 (Baltimore) and IST 2002 (Copenhagen), where real HEP application jobs were transparently submitted from the US and Europe and ran wherever resources were available, independently of their location.

10 WorldGRID monitoring
- based on Nagios, a host and service monitoring engine (detailed information: http://www.nagios.org)
- host local plug-ins collect info from the OS: CPU load, RAM, disk, jobs
- MDS plug-ins collect aggregate info from the GRIS: number of running/waiting jobs, number of total/free CPUs
- history graphs for all monitoring metrics
- aggregate info/graphs per Site and Virtual Organisation
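A Nagios plug-in reports its result through a one-line status message and an exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The sketch below shows what the decision logic of a free-CPU check might look like. The function name, the thresholds, and the message wording are assumptions made for illustration; this is not the code of the WorldGRID plug-ins.

```python
# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_free_cpus(total, free, warn_ratio=0.2, crit_ratio=0.05):
    """Return (exit_code, status_line) for the fraction of free CPUs.

    The thresholds are hypothetical defaults, not values from the presentation.
    """
    if total <= 0 or free < 0 or free > total:
        return UNKNOWN, f"CPUS UNKNOWN - inconsistent values total={total} free={free}"
    ratio = free / total
    if ratio <= crit_ratio:
        code, label = CRITICAL, "CRITICAL"
    elif ratio <= warn_ratio:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # The text after '|' is performance data, which can feed history graphs.
    return code, f"CPUS {label} - {free}/{total} free | free_cpus={free}"
```

A real plug-in would obtain `total` and `free` from the GRIS, print the status line, and exit with the returned code.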


13 the evolution: DataTAG-WP4 implementation for resources monitoring

14 Description
GOAL: The objective of the task is to develop software for use in the Grid Operations Centres to monitor the overall functioning of the grid. The software should enable grid administrators to quickly identify problems in the operation of the grid and take appropriate action to rectify them.
People involved: Sergio Andreozzi (INFN CNAF), Vincenzo Ciaschini (INFN CNAF), Sergio Fantinel (INFN Padova), Antonia Ghiselli (INFN CNAF), Flavia Donno (CERN-LCG), Gennaro Tortone (INFN Napoli), Cristina Vistoli (INFN CNAF)

15 Requirements (1/2)
Features required:
- scalability
- very low intrusiveness
- automatic resource discovery
- fault detection and notification
- metrics graphs
The GOC administrator should be presented with an integrated view of the grid, showing the overall functional status of the grid and of the individual sites at various levels of detail.

16 Requirements (2/2)
- The system should provide a facility for defining alarms on specific conditions, expressed as functions of the monitored parameters. When such an alarm is triggered, the administrator should be alerted appropriately.
- The system should poll all the sites and gather static and dynamic information about the resources at each site. Static information covers parameters such as the number of computing elements, total storage capacity, and total memory. Dynamic information covers parameters such as the number of running jobs, number of jobs in queue, free memory, free storage space, and load average.
- All interfaces should be web based.
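The alarm requirement, conditions defined as functions of the monitored parameters, can be sketched as follows. The `Alarm` class, the metric names, and the thresholds are hypothetical; the sketch only illustrates one way such rules could be expressed and evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]

@dataclass
class Alarm:
    name: str
    condition: Callable[[Metrics], bool]  # returns True when the alarm should fire

def triggered(alarms: List[Alarm], metrics: Metrics) -> List[str]:
    """Return the names of all alarms whose condition holds for `metrics`."""
    return [alarm.name for alarm in alarms if alarm.condition(metrics)]

# Illustrative rules; names and thresholds are assumptions.
alarms = [
    Alarm("queue backlog", lambda m: m["waiting_jobs"] > 2 * m["total_cpus"]),
    Alarm("storage nearly full", lambda m: m["free_storage_gb"] < 10),
    Alarm("high load", lambda m: m["load_average"] > 8.0),
]

sample = {"waiting_jobs": 120, "total_cpus": 40, "free_storage_gb": 250, "load_average": 9.5}
fired = triggered(alarms, sample)
```

Expressing each condition as a function lets a single alarm combine a static parameter (total CPUs) with a dynamic one (waiting jobs), as in the first rule.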

17 Features provided by current implementation
The new Grid Monitoring Tool is based on the Grid Information System, implemented over the LDAP protocol with the GLUE schema. It provides several monitoring levels:
- host level (through the GLUE schema monitoring extension)
- fabric level (through the DataGRID WP4 monitoring framework)
- Virtual Organisation level (through automatic resource discovery and check scheduling)
It also provides a historical database for generating graphs or reports of selected measurements.

18 GLUE schema (host level monitoring)
A conceptual model of grid resources, used as the base schema of the GIS (Grid Information Service) for discovery and monitoring purposes:
- model of computing resources (CE)
- model of storage resources (SE)
- model of the relationships among them (close CE/SE)
Implementation status (v. 1.0, for Globus MDS):
- LDAP schema (DataTAG WP4.1)
- information providers (CE/SE)
We implemented an extension that includes all monitoring metrics (“host level” added to the GLUE schema).


20 DataGrid-WP4 monitoring framework (fabric level monitoring)
The framework provides a client (Monitoring Sensor Agent, MSA) that runs sensors (Monitoring Sensors, MS) on each monitored node, and a central server (Fabric Monitoring Server, fmonServer) that collects the data. The server receives samples as the MSA measures them and stores them in a flat file or an Oracle database. The client ships with a sensor (sensorLinuxProc) that uses the /proc file system to measure various basic quantities on Linux (CPU load, network, etc.).
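To illustrate how a sensor can derive CPU load from the /proc file system, here is a minimal sketch that parses `/proc/loadavg`. It is not the code of sensorLinuxProc; the function names and the output field names are assumptions chosen for the example.

```python
def parse_loadavg(text):
    """Parse the content of /proc/loadavg into a dict of samples.

    The file holds the 1, 5 and 15 minute load averages, the number of
    runnable/total scheduling entities, and the most recently created PID.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load_1min": float(fields[0]),
        "load_5min": float(fields[1]),
        "load_15min": float(fields[2]),
        "runnable_tasks": int(runnable),
        "total_tasks": int(total),
    }

def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load average on a Linux host."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Keeping the parsing separate from the file access makes the sensor logic testable on machines without a /proc file system.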

21 EDG-WP4 monitoring framework
[Diagram; surviving labels: local farm element, computing element]

22 Discovery process (Virtual Organisation level monitoring)
- Through the GIIS, via LDAP, we obtain the CE/SE available at a given time.
- Using a DB, we compare the info from the GIIS with the previous availability status of the resources (an object can be new, disappeared, or re-available).
- Through the GRIS of each CE/SE we obtain SITE/HOSTS info: we repeat the discovery process at site level to get site resources/info (queues, worker nodes, network adapters, disk partitions, supported transfer protocols, …).
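The comparison between the current GIIS answer and the previously stored status can be sketched with plain Python collections. The `classify` function and the shape of its inputs are assumptions for illustration; the actual tool keeps this state in its monitoring database.

```python
def classify(previous, current):
    """Classify resources by comparing stored status with the current answer.

    previous: dict mapping resource id -> last known availability (True/False),
              standing in for the content of the monitoring DB.
    current:  set of resource ids returned by the GIIS now.
    Resources whose status did not change are omitted from the result.
    """
    changes = {}
    for rid in current:
        if rid not in previous:
            changes[rid] = "new"
        elif not previous[rid]:
            changes[rid] = "re-available"
    for rid, was_available in previous.items():
        if was_available and rid not in current:
            changes[rid] = "disappeared"
    return changes

# Hypothetical identifiers, for illustration only.
previous = {"ce1": True, "ce2": True, "se1": False}
current = {"ce1", "se1", "ce3"}
changes = classify(previous, current)
```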

23 Discovery process: base schema
[Diagram. Components: Monitoring Server, Monitoring DB (SQL), GIIS Server (GIIS), Computing Element / Storage Element (GRIS), connected via LDAP]
1: LDAP Query
2: available CE/SE
3: LDAP Query
4: CEIDs, WNs
Steps 3 and 4 are repeated for every CE/SE.
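The four numbered steps can be sketched as a loop in which the LDAP queries are replaced by stand-in callables and the Monitoring DB by an in-memory SQLite database. Everything here is hypothetical: the `discover` function, the table layout, the host names, and the stub query results are assumptions, and a real implementation would issue LDAP searches against the GIIS and GRIS instead.

```python
import sqlite3

def discover(query_giis, query_gris, db):
    """Run steps 1-4 once and record the results in `db`.

    query_giis()     -> list of CE/SE host names        (steps 1-2)
    query_gris(host) -> list of (kind, identifier)      (steps 3-4)
    Both callables are stand-ins for the real LDAP queries.
    """
    db.execute("CREATE TABLE IF NOT EXISTS resource (host TEXT, kind TEXT, ident TEXT)")
    for host in query_giis():
        # Steps 3 and 4 are repeated for every CE/SE returned by the GIIS.
        for kind, ident in query_gris(host):
            db.execute("INSERT INTO resource VALUES (?, ?, ?)", (host, kind, ident))
    db.commit()

# Stub data, for illustration only.
def fake_giis():
    return ["ce.example.org", "se.example.org"]

def fake_gris(host):
    data = {
        "ce.example.org": [("CEID", "ce.example.org/queue-short"), ("WN", "wn01.example.org")],
        "se.example.org": [("protocol", "gsiftp")],
    }
    return data[host]

db = sqlite3.connect(":memory:")
discover(fake_giis, fake_gris, db)
rows = db.execute("SELECT host, kind, ident FROM resource ORDER BY host, kind").fetchall()
```

Passing the query functions as parameters keeps the discovery loop independent of the LDAP client library used.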

24 [Architecture diagram; surviving labels]
- worker node (shown twice): /proc filesystem, WP4 sensor, EDG-WP4 monitoring agent (run, read, metric output)
- computing element: EDG-WP4 fmonserver, farm monitoring archive, information providers, GRIS (GLUE schema) (run, read, write, ldif output)
- information index: GIIS (GLUE schema)
- monitoring server: discovery service, monitoring service, Central Monitoring Database, web interface (ldap query)


26 Future activities

27
- job monitoring
- evaluation of OGSA monitoring service(s)
- evaluation/usage of the SOAP interface provided by the DataGrid WP4 monitoring framework, in order to implement a distributed archive for monitoring metrics

