A System for Monitoring and Management of Computational Grids Warren Smith Computer Sciences Corporation NASA Ames Research Center.

1 A System for Monitoring and Management of Computational Grids
Warren Smith
Computer Sciences Corporation, NASA Ames Research Center

2 Motivation (GGF8 PGM Workshop)
- Computational grids:
  - Many different types of resources
  - Services deployed on those resources
  - Applications executed by users
- There will be failures
  - Failures need to be observed
  - Observations of failures need to be communicated
- A grid must be managed
  - Failure management
  - General administration

3 Approach
- Develop a general framework for observation and control
  - Observe and control a variety of resources and services
  - Operate in a distributed environment
  - Secure
  - Scalable
- Use this framework to monitor and manage grids
  - Observe computer systems, storage systems, networks
  - Observe job submission, information, file transfer services
  - Start, stop, and configure services
  - Notify administrators of problems
- Help develop and be compatible with standards
  - Global Grid Forum

4 Why not use an existing system?
- Commercial systems
  - Many fully-featured tools available
  - Cost that could be too high for smaller partners
  - Incompatibility between different tools
  - Incompatible with grid security and authentication mechanisms
- Open source systems
  - Not as many features
  - Incompatible with each other
  - Not compatible with grid security mechanisms
- Either
  - Want a testbed for standardization

5 High-Level Architecture
[Diagram. Components: Manager, Observer, Actor, Directory Service. Arrow labels: Events, Commands, Advertise, Search.]

6 Monitoring and Managing a Cluster
- Management Host: Cluster Manager
  - Receive observations
  - Decide if any actions need to be taken
  - Ask for actions
  - Log any problems
- Host 1 through Host N, each with:
  - Host Observer: CPU load, Disk space, Memory use
  - Host Actor: Kill process, Clean temp disk
- Directory Service

7 Observer
- Sensor
  - Performs a measurement and reports results
- Sensor manager
  - Manages sensors, subscriptions, and queries
- Event Producer
  - Subscribe
  - Query
  - Available events
  - Event schemas
- Service Hosting Environment
[Diagram: an Observer made up of Sensors, a Sensor Manager, and an Event Producer in a Service Hosting Environment. Key: User-written, Higher-level, Low-level.]
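
The observer structure on this slide can be sketched in a few lines. This is only an illustration of the sensor / sensor manager split with subscribe and query operations; every class and method name below is hypothetical, since the slides do not show the framework's real interfaces.

```python
from typing import Callable, Dict, List

# Hypothetical names throughout; the slides do not give the real API.
Event = Dict[str, object]


class Sensor:
    """Performs one measurement and reports the result as an event."""

    def __init__(self, name: str, measure: Callable[[], float]) -> None:
        self.name = name
        self._measure = measure

    def read(self) -> Event:
        return {"type": self.name, "value": self._measure()}


class SensorManager:
    """Manages sensors, subscriptions, and one-off queries."""

    def __init__(self) -> None:
        self._sensors: Dict[str, Sensor] = {}
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def add_sensor(self, sensor: Sensor) -> None:
        self._sensors[sensor.name] = sensor
        self._subscribers.setdefault(sensor.name, [])

    def available_events(self) -> List[str]:
        return sorted(self._sensors)

    def query(self, event_type: str) -> Event:
        """Take a single reading on demand."""
        return self._sensors[event_type].read()

    def subscribe(self, event_type: str,
                  callback: Callable[[Event], None]) -> None:
        self._subscribers[event_type].append(callback)

    def poll(self) -> None:
        """Read each subscribed sensor once and push the event out."""
        for name, sensor in self._sensors.items():
            if self._subscribers[name]:
                event = sensor.read()
                for callback in self._subscribers[name]:
                    callback(event)
```

A real event producer would call something like `poll()` on a timer and send events over the network; here a subscriber is just a local callable.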

8 Actor
- Actuator
  - Performs an action
- Actuator Manager
  - Handles requests for actions by calling actuators
- Actor
  - Request action (RPC)
  - Available actions
  - Action schemas
- Service Hosting Environment
[Diagram: an Actor made up of Actuators, an Actuator Manager, and an Actor component in a Service Hosting Environment. Key: User-written, Higher-level, Low-level.]
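
As a rough sketch of the actor side, an actuator manager can be modeled as a registry that maps action names to callables and reports the outcome of each request. All names are hypothetical, and `clean_temp_dir` is only a stand-in for the "clean temp disk" action mentioned earlier in the talk.

```python
import os
from typing import Callable, Dict, List

# All names here are hypothetical; the slides do not show the real interfaces.
Result = Dict[str, object]


def clean_temp_dir(path: str) -> str:
    """Actuator: delete regular files directly inside `path`."""
    with os.scandir(path) as it:
        entries = list(it)
    removed = 0
    for entry in entries:
        if entry.is_file(follow_symlinks=False):
            os.remove(entry.path)
            removed += 1
    return f"removed {removed} files"


class ActuatorManager:
    """Handles requests for actions by calling the matching actuator."""

    def __init__(self) -> None:
        self._actuators: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, actuator: Callable[..., str]) -> None:
        self._actuators[name] = actuator

    def available_actions(self) -> List[str]:
        return sorted(self._actuators)

    def request_action(self, name: str, **params: object) -> Result:
        """RPC-style entry point: run one action and report the outcome."""
        actuator = self._actuators.get(name)
        if actuator is None:
            return {"action": name, "ok": False, "detail": "unknown action"}
        try:
            return {"action": name, "ok": True, "detail": actuator(**params)}
        except Exception as exc:  # report failures instead of crashing
            return {"action": name, "ok": False, "detail": str(exc)}
```

Returning a result record rather than raising keeps a failed action from taking down the actor, which seems a reasonable choice for a remote-call interface.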

9 Manager
- Two external interface components
  - Event Consumer Client
  - Actor Client
- 2 approaches to higher-level components
  - User writes management logic
  - User writes management rules and uses an expert system
[Diagram: two Manager variants, each with an Event Consumer Client and an Actor Client. One contains Management Logic; the other contains an Expert System driven by Management Rules. Key: User-written, Higher-level, Low-level.]
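
The rule-based approach can be illustrated without a real expert system. The sketch below treats a rule as a condition on an incoming event plus the action to request when it holds. It is a plain-Python stand-in, not CLIPS, and the rule names, event fields, and thresholds are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
ActionRequest = Tuple[str, Dict[str, object]]


@dataclass
class Rule:
    """One management rule: if `condition` holds, request `action`."""
    name: str
    condition: Callable[[Event], bool]
    action: str
    params: Callable[[Event], Dict[str, object]]


def evaluate(rules: List[Rule], event: Event) -> List[ActionRequest]:
    """Return the action requests triggered by a single event."""
    return [(r.action, r.params(event)) for r in rules if r.condition(event)]


# Hypothetical rules; thresholds are illustrative only.
RULES = [
    Rule(
        name="temp disk nearly full",
        condition=lambda e: e["type"] == "disk_free_fraction"
        and e["value"] < 0.10,
        action="clean_temp_disk",
        params=lambda e: {"host": e["host"]},
    ),
    Rule(
        name="memory nearly exhausted",
        condition=lambda e: e["type"] == "memory_free_fraction"
        and e["value"] < 0.05,
        action="kill_process",
        params=lambda e: {"host": e["host"]},
    ),
]
```

In this picture the event consumer client would feed events into `evaluate`, and the actor client would send out the resulting requests.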

10 Directory Service
- Information about observers and actors
  - Contact location and protocol
  - Available events and actions
  - Who has access
- Dictionary
  - Event and action schemas
- Future: Information about event consumers
  - Archives
  - Channels
- Experimental component
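
A minimal in-memory sketch of the advertise and search operations follows, assuming an advertisement carries the fields listed on the slide (contact location, protocol, available events or actions, and who has access). The record layout and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Advertisement:
    """What an observer or actor publishes about itself (hypothetical)."""
    kind: str                    # "observer" or "actor"
    contact: str                 # where to reach it
    protocol: str                # e.g. "tcp", "udp", "ssl"
    offers: FrozenSet[str]       # available event or action names
    allowed_subjects: FrozenSet[str] = frozenset()  # who has access


class DirectoryService:
    def __init__(self) -> None:
        self._ads: List[Advertisement] = []

    def advertise(self, ad: Advertisement) -> None:
        self._ads.append(ad)

    def search(self, kind: str,
               offer: Optional[str] = None) -> List[Advertisement]:
        """Find observers or actors, optionally by one offered name."""
        return [
            ad for ad in self._ads
            if ad.kind == kind and (offer is None or offer in ad.offers)
        ]
```

A manager would call `search("observer", "cpu_load")` to find where to subscribe, then contact the observer directly.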

11 Security
- GSI security
- Encrypted communication
  - SSL/TLS
- Authentication
  - X.509 certificates
  - Proxy certificates
- Authorization
  - Per-observer and per-actor
  - Pluggable user-defined authorization module
    - Module for X.509 subject-based access control lists available
  - Future: per-sensor and per-actuator
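
The pluggable authorization idea can be sketched as a callable that an observer consults before answering. The example below shows a subject-based access control list under the assumption that authentication has already produced a verified X.509 subject string. All names are hypothetical, and mapping a proxy certificate back to its owner's identity is deliberately left out.

```python
from typing import Callable, Dict, Iterable

# An authorizer takes an authenticated subject name and says yes or no.
Authorizer = Callable[[str], bool]


def make_subject_acl(allowed_subjects: Iterable[str]) -> Authorizer:
    """Build an authorizer backed by a fixed list of allowed subjects."""
    allowed = frozenset(allowed_subjects)

    def authorize(subject: str) -> bool:
        return subject in allowed

    return authorize


class GuardedObserver:
    """Per-observer authorization: one authorizer guards every query."""

    def __init__(self, authorizer: Authorizer,
                 read_event: Callable[[], Dict[str, object]]) -> None:
        self._authorize = authorizer
        self._read_event = read_event

    def query(self, subject: str) -> Dict[str, object]:
        if not self._authorize(subject):
            raise PermissionError(f"subject not authorized: {subject}")
        return self._read_event()
```

Because the observer only depends on the `Authorizer` signature, a site could swap in a different policy without touching the observer itself.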

12 Basic GUI

13 Monitoring and Managing a Cluster
[Diagram: detailed view of the cluster example.]
- Management Host: Cluster Manager (Management Logic, Event Consumer Client, Director Client)
  - Receive observations
  - Decide if any actions need to be taken
  - Ask for actions
  - Log any problems
- Host 1 through Host N, each with:
  - Host Observer: Event Producer, Sensor Manager, CPU Load Sensor, Disk Space Sensor, Memory Sensor
  - Host Actor: Actor, Kill Process Actuator, File Deletion Actuator
  - Each runs in a Service Hosting Environment
- Directory Service

14 Implementation
- Communicates using TCP, UDP, or SSL
- XML encoding of messages
- C++ version
  - pthreads
  - Xerces XML parser
  - Globus I/O for authenticated and secure communication
  - Currently runs under IRIX, Solaris, Linux
  - CLIPS expert system
- Java version
  - Xerces XML parser
  - Globus Java CoG for authenticated and secure communication
  - JDK 1.3.x or 1.4.x
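
To make "XML encoding of messages" concrete, here is a small round-trip sketch using Python's standard `xml.etree.ElementTree` rather than Xerces. The element and attribute names are invented for illustration; the slides do not show the actual event schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def encode_event(event_type: str, host: str,
                 timestamp: str, value: float) -> bytes:
    """Serialize one event to XML bytes (hypothetical schema)."""
    root = ET.Element("event", {"type": event_type})
    ET.SubElement(root, "host").text = host
    ET.SubElement(root, "timestamp").text = timestamp
    ET.SubElement(root, "value").text = repr(value)
    return ET.tostring(root, encoding="utf-8")


def decode_event(data: bytes) -> Dict[str, object]:
    """Parse XML bytes produced by encode_event back into a dict."""
    root = ET.fromstring(data)
    return {
        "type": root.get("type"),
        "host": root.findtext("host"),
        "timestamp": root.findtext("timestamp"),
        "value": float(root.findtext("value")),
    }
```

The resulting bytes could be sent over any of the listed transports; the encoding is independent of whether TCP, UDP, or SSL carries it.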

15 Grid Management System
- Things to observe:
  - Resource status and usage
    - Computer systems and networks
  - Grid services
    - GRAM, MDS
    - Includes processes, log files, and test queries
- Things to control:
  - Add/remove user mappings in grid-mapfiles
  - Starting and stopping MDS servers
  - Add/remove/update CA certificates
- Provide a nice GUI to do all this

16 Grid Management System
[Diagram. Components: Management GUI, GRAM Management Agent, MDS Management Agent, Directory Service, Event Archive (experimental component). Arrow labels: "Query for events that describe problems"; "1. Events describing current state, 2. Action requests"; "Advertise existence"; "Find managers and archive"; "1. Subscribe, 2. Events with problems".]

17 Management Agent
- Management agents:
  - Perform observations
  - Perform actions
  - Manage local problems
    - Not doing any management right now
    - Handle local problems locally
[Diagram: a Management Agent containing an Observer, an Actor, and a Manager. Labels: Local management, Local actions, Local and remote observations.]

18 GRAM Management Agent
- Observes:
  - Network latency between GRAM hosts: ping
  - Available network bandwidth between hosts: IPerf
  - CPU load: Unix uptime, PBS qstat, LSF bjobs
  - Available memory: vmstat?
  - Available disk space: df
  - The Globus GRAM service: Log files
- Performs actions:
  - Modify Globus grid-mapfile
  - Start/stop IPerf server
  - Send email
- In the future will manage local problems
  - Receive local observations
  - Perform local actions when necessary
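
Two of the observations listed here, available disk space and CPU load, are easy to sketch. The agent described on the slide runs `df` and `uptime`; the version below is a stand-in that reads the same quantities through Python's standard library instead, and the event field names are assumptions.

```python
import os
import shutil
import time
from typing import Dict


def disk_space_sensor(path: str = "/") -> Dict[str, object]:
    """Report free and total bytes for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return {
        "type": "disk_space",
        "path": path,
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "time": time.time(),
    }


def cpu_load_sensor() -> Dict[str, object]:
    """Report the 1, 5, and 15 minute load averages (Unix only)."""
    one, five, fifteen = os.getloadavg()
    return {
        "type": "cpu_load",
        "load_1min": one,
        "load_5min": five,
        "load_15min": fifteen,
        "time": time.time(),
    }
```

Sensors for `ping`, IPerf, or batch-queue load would more likely wrap the external commands and parse their output, which is omitted here.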

19 MDS Management Agent
- Observes:
  - Network connectivity between GIS hosts: ping
  - CPU load: uptime
  - Available memory: vmstat?
  - Available disk space: df
  - The status of the LDAP server
    - The LDAP server process: ps
    - If LDAP queries are successful: ldap_search()
- Performs actions:
  - Start and stop LDAP server
  - Send email
- In the future will manage local problems

20 Event Archive
- Allows events to be archived and searched
- An XML database
  - Currently Xindice
  - Compatible with our XML-based events
- Queried using the XPath language
- Use for all events, just errors, …
- Experimental component
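
To give a feel for XPath queries over archived events, the sketch below searches a small in-memory XML document. It uses the limited XPath subset in Python's `xml.etree.ElementTree`, not Xindice, and the archive layout and attribute names are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical archive contents; the real event schema is not shown.
ARCHIVE_XML = """\
<archive>
  <event type="ldap_query" severity="error">
    <host>mds1.example.org</host>
  </event>
  <event type="cpu_load" severity="info">
    <host>gram1.example.org</host>
  </event>
  <event type="disk_space" severity="error">
    <host>gram1.example.org</host>
  </event>
</archive>
"""


def search(archive: ET.Element, xpath: str) -> List[ET.Element]:
    """Return the archived events matching an XPath expression."""
    return archive.findall(xpath)


archive = ET.fromstring(ARCHIVE_XML)

# Only the events that describe problems.
errors = search(archive, "./event[@severity='error']")

# Every event reported for one host.
on_gram1 = search(archive, "./event[host='gram1.example.org']")
```

The first query corresponds to archiving or retrieving "just errors"; dropping the predicate would return all events.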

21 Grid Management GUI

22 Grid Management GUI
- Similar to many you’ve seen before
- Java program
- Load on systems
  - System up or down
- Latency and bandwidth of network
  - Network up or down
- XML configuration file defines GUI
  - Which systems to monitor
  - Which sensors to use on each system
  - Where to place information on the screen
- More detailed information available as dialogs
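
A configuration file of the kind described might look like the example embedded below, which names the systems to monitor, the sensors to use on each, and a screen position. The element names, attributes, and grid-style coordinates are invented for illustration; the actual GUI is a Java program and its file format is not shown in the slides.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical configuration format.
EXAMPLE_CONFIG = """\
<gui>
  <system host="host1.example.org" x="0" y="0">
    <sensor name="cpu_load"/>
    <sensor name="ping_latency"/>
  </system>
  <system host="host2.example.org" x="1" y="0">
    <sensor name="cpu_load"/>
  </system>
</gui>
"""


@dataclass
class SystemPanel:
    host: str                  # which system to monitor
    position: Tuple[int, int]  # where to place it on the screen
    sensors: List[str]         # which sensors to use on that system


def load_config(text: str) -> List[SystemPanel]:
    """Parse the GUI configuration into one panel per system."""
    root = ET.fromstring(text)
    panels = []
    for system in root.findall("system"):
        panels.append(SystemPanel(
            host=system.get("host"),
            position=(int(system.get("x")), int(system.get("y"))),
            sensors=[s.get("name") for s in system.findall("sensor")],
        ))
    return panels
```

Keeping the layout in a file like this means the set of monitored systems can change without rebuilding the GUI.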

23 Standardization
- Performance Working Group of the Grid Forum
  - Architecture
  - Event representations
  - Directory service schema
  - Producer-consumer communication protocols
- Grid Monitoring Architecture Working Group
- DAMED Working Group
- Grid Event Service Working Group?
  - BOF at next GGF, hopefully

24 Status and Future Work
- Current Status:
  - Worldwide noncommercial release expected Real Soon Now
  - Release quality
    - CODE framework used day-to-day in the IPG
  - Preliminary grid management system
- Our future plans include:
  - Define and be compatible with Grid Forum standards
  - Use in the IPG (Need a web interface)
  - Develop more sensors and actuators
  - Sensors and actuators as programs as well as classes
  - More sophisticated event service
    - Event routing network, more subscription models and options
  - OGSI as hosting environment
  - Work with IPG (and other) administrators to improve the grid management system
  - A public release! Open source!

