Grid Discovery and Monitoring Systems Laura Pearlman USC/Information Sciences Institute With materials from Ben Clifford and others from the Globus Project.


Outline
- Overview of information systems
- Some real implementations
  - Globus MDS2 / BDII
  - Globus MDS4
  - Inca
  - GMA / R-GMA

Discovery and Monitoring
- Discovery: finding resources that exist at a given moment, possibly meeting some criteria
  - E.g., “find Linux boxes with Java 1.5 installed”
- Monitoring: determining the state of one or more resources
  - E.g., “how much memory is free on machine X?”
- “Monitoring” and “Discovery” information sometimes overlap
  - “find me machines with 2G memory” vs. “how much memory does Machine X have?”
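The distinction above can be illustrated with a minimal sketch. The inventory, host names, and attribute names below are hypothetical and only serve to show that the same data can answer both kinds of question.

```python
# Hypothetical inventory; hosts and attribute names are illustrative only.
RESOURCES = {
    "nodeA": {"os": "linux", "java": "1.5", "free_memory_mb": 512},
    "nodeB": {"os": "linux", "java": "1.4", "free_memory_mb": 2048},
    "nodeC": {"os": "solaris", "java": "1.5", "free_memory_mb": 1024},
}

def discover(resources, predicate):
    """Discovery: which resources meet the criteria?"""
    return sorted(name for name, attrs in resources.items() if predicate(attrs))

def monitor(resources, name, attribute):
    """Monitoring: what is the state of one known resource?"""
    return resources[name][attribute]

# "find Linux boxes with Java 1.5 installed"
linux_java15 = discover(RESOURCES, lambda a: a["os"] == "linux" and a["java"] == "1.5")
# "how much memory is free on machine X?"
free_on_a = monitor(RESOURCES, "nodeA", "free_memory_mb")
```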

Examples of Useful Information
- Characteristics of a compute resource
  - Software available, networks connected to, load, type of CPU, disk space
- Characteristics of a network
  - Bandwidth and latency, protocols
- Information about a service
  - Contact info, version number, etc.

Who uses this information?
- Individual users, trying to pick the ‘best’ resource
- Brokers or workflow systems trying to find suitable resources
- VO administrators who want to know the state of every resource
  - System administrators may use this information, but probably also have local site monitoring systems in place

What Interfaces are Needed?
- Graphic and command-line interfaces for individual users and administrators
- Programmatic interfaces for brokers, workflow systems, etc.
- Asynchronous notifications for administrators
  - “send me mail when we’re almost out of disk space”

Monitoring/Discovery Problems in Grids
- Dynamic in nature
  - VOs come and go
  - Resources join and leave VOs
  - Resources change status and fail
- Geographically distributed users
- Geographically distributed resources
- Heterogeneous implementations

Grid Information: Facts of Life
- Information is always old
- Distributed state is hard to obtain
- Components will fail
  - We must deal with this gracefully
- Scalability and overhead
- Many different usage scenarios

Resource Discovery/Monitoring
- Distributed users and resources
- Variable resource status
- Variable grouping
[Figure: resources (R) and dispersed users spread across VO-A, VO-B, and the network]

Resource Discovery/Monitoring
- Some resources have failed
- A network partition has occurred
- Still, some work can get done…
[Figure: the same diagram of resources (R), dispersed users, VO-A, VO-B, and the network]

Scalability
- Large numbers
  - Many resources
  - Many users
- Independence
  - Resources shouldn’t affect one another
  - VOs shouldn’t affect one another
- Graceful degradation of service
  - “As much function as possible”
  - Tolerate partitions, prune failures

Failure Scenarios
- User is disconnected
- Resource fails or is disconnected
- Discovery service fails or is disconnected
- Network partition

When a user is disconnected
- This should not adversely affect other users
- Some state (such as the user’s subscriptions) may need to be cleaned up
- Some systems use soft state to deal with this issue:
  - Subscriptions are valid for a limited time and must be periodically refreshed
  - If the user does not come back in time to refresh the subscription, it will be removed automatically
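The soft-state idea described above can be sketched in a few lines. This is an illustrative sketch only: the class name, the lifetime value, and the injectable clock are assumptions made for the example, not part of any system named in these slides.

```python
import time

class SoftStateRegistry:
    """Hypothetical soft-state table: entries vanish unless refreshed."""

    def __init__(self, lifetime_seconds, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock      # injectable so the behaviour is easy to test
        self._expiry = {}       # subscriber id -> expiry timestamp

    def refresh(self, subscriber):
        # Creating and renewing a subscription are the same operation.
        self._expiry[subscriber] = self.clock() + self.lifetime

    def prune(self):
        # Drop every subscription whose lifetime has run out.
        now = self.clock()
        expired = [s for s, t in self._expiry.items() if t <= now]
        for s in expired:
            del self._expiry[s]
        return expired

    def active(self):
        self.prune()
        return sorted(self._expiry)

# Usage with a fake clock: alice renews in time, bob never comes back.
now = [0.0]
registry = SoftStateRegistry(lifetime_seconds=60, clock=lambda: now[0])
registry.refresh("alice")
registry.refresh("bob")
now[0] = 45.0
registry.refresh("alice")
now[0] = 70.0
remaining = registry.active()
```

No explicit "unsubscribe" is needed: a disconnected user simply stops refreshing, and the entry is cleaned up on the next prune.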

When a resource disappears
- Monitoring services should indicate that the resource is no longer there
- Discovery services should stop advertising the resource
- Neither of these can be guaranteed to happen instantaneously

When a discovery service dies
- Users cannot discover new resources
- They may have old information cached; this data is still useful, although it degrades in quality and usefulness over time
- Users can contact the resources directly and determine their status
- Some implementations allow for mirroring of discovery services

When the network is partitioned
- This can be seen as a generalization of the previous scenarios: each of them can be modelled as an appropriate network partition
- If there is a discovery service in a user’s partition, the user should be able to discover resources in that partition

Information Systems
- We sometimes refer to Discovery and Monitoring as “Information Systems”
  - This is misleading, as we’re not including general-purpose database systems
- Discovery and Monitoring information is:
  - Often stale as soon as it’s reported
  - Sometimes inconsistent
  - Often updated by running probes, either on demand or periodically

Discovery Services
- Used to locate monitoring services with information about resources
- May cache some resource data
  - May even cache enough resource data to act as a monitoring system
- Generally involve a database-like query interface
  - Languages like LDAP, XPath, SQL
- Usually only a small number (maybe even just one, or one with a mirror) are deployed in a VO

Two Models for Discovery Services
[Figure: diagram with boxes labeled Discovery Service, Monitoring & Discovery Service, and several Monitoring Service boxes]

Monitoring Services
- Used to monitor the state of a resource
- Service interface usually involves database-like queries
  - With languages like LDAP, XPath, SQL
  - Often also provides for asynchronous notification
- Typically also includes a back-end provider interface
  - Allows locally written scripts, programs, etc. to collect information for the monitoring service
- Typically deployed on each host that houses a resource

How Different Implementations Differ
- Overall architecture
  - Are monitoring and discovery separate?
- Wire protocol
  - LDAP, Web Services, custom
- Query language
  - LDAP, XPath, SQL
- Caching strategies
- Schemas
  - Really more a deployment issue

MDS2 / BDII history
- MDS2 was developed as part of the Globus Toolkit
  - It’s now superseded by MDS4, which has a different architecture
- BDII is a reimplementation of MDS2 by EGEE, and is still in use

MDS2 Architecture Overview
- The Grid Resource Information Service (GRIS) collects information about a local resource and responds to requests for that information
  - Uses pluggable information providers
- The Grid Index Information Service (GIIS) aggregates information from various GRIS servers
- Users may query the GIIS for aggregated information or query the GRIS servers directly
- GIIS servers may be arranged hierarchically

MDS2 Architecture
[Figure: three GRIS servers, each paired with an information provider (IP), and one GIIS]

MDS2 GIIS
- Grid Index Information Service (GIIS) servers aggregate information from GRIS servers and other GIIS servers
  - These other servers register themselves with the GIIS server
  - Registrations must be periodically refreshed
- GIIS servers cache information (results from previous queries)
- If a GIIS server receives a query for which there is no fresh cached information, it forwards the query to its registered servers
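The cache-or-forward behaviour described above can be sketched as follows. This is not the MDS2 implementation: the class, the time-to-live policy, and the representation of registered servers as Python callables are all assumptions for illustration, and registration expiry is left out for brevity.

```python
import time

class CachingAggregator:
    """Hypothetical GIIS-like aggregator; a sketch, not MDS2 code."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.sources = {}   # name -> callable(query) returning a list of results
        self.cache = {}     # query -> (timestamp, results)

    def register(self, name, source):
        self.sources[name] = source

    def query(self, q):
        hit = self.cache.get(q)
        if hit is not None and self.clock() - hit[0] < self.ttl:
            return hit[1]                       # fresh cached answer
        results = []
        for source in self.sources.values():    # forward to registered servers
            results.extend(source(q))
        self.cache[q] = (self.clock(), results)
        return results
```

A longer time-to-live reduces load on the registered servers at the cost of staler answers, which is the trade-off the "information is always old" slide points at.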

MDS2 GRIS
- A Grid Resource Information Server (GRIS):
  - Runs on each host that has resources to be monitored
  - Accepts requests for information about local resources
    - May come from users or GIIS servers
  - Runs a local “information provider” to collect and format the information
    - Unless the requested information is cached and relatively fresh
  - Caches the information and replies to the request

MDS2 Query Language
- Both the GIIS and GRIS servers use LDAP as the service protocol and query language

LDAP Basics
- Hierarchical data model
- Each entry has a distinguished name (DN) and a set of attribute/value pairs
- Distinguished name
  - Is a collection of name-value pairs
  - Must be unique
  - Determines the entry’s place in the hierarchy
    - Each entry’s DN must include its parent’s DN
- Queries
  - Can search on attributes or DNs
  - Results can include children (or not) or include only certain attributes
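The DN rules above can be made concrete with a small pure-Python sketch. It does not talk to an LDAP server; the entries, attribute names, and DN components are hypothetical, and the parsing ignores escaped commas and multi-valued components that real DNs allow.

```python
def parse_dn(dn):
    """Split a DN into (name, value) pairs, most specific component first."""
    pairs = []
    for component in dn.split(","):
        name, _, value = component.strip().partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs

def parent_dn(dn):
    """The parent's DN is the entry's DN minus its first component."""
    _, _, rest = dn.partition(",")
    return rest.strip()

def search(entries, base, wanted_attrs=None):
    """Return entries at or below base, optionally keeping only some attributes."""
    base_pairs = parse_dn(base)
    results = {}
    for dn, attrs in entries.items():
        pairs = parse_dn(dn)
        under_base = (len(pairs) >= len(base_pairs)
                      and pairs[-len(base_pairs):] == base_pairs)
        if under_base:
            if wanted_attrs is None:
                results[dn] = dict(attrs)
            else:
                results[dn] = {k: v for k, v in attrs.items() if k in wanted_attrs}
    return results

# Hypothetical directory contents.
ENTRIES = {
    "vo=example, o=grid": {"description": "example VO"},
    "host=nodeA, vo=example, o=grid": {"os": "linux", "freemem": "512"},
    "host=nodeB, vo=other, o=grid": {"os": "linux"},
}
subtree = search(ENTRIES, "vo=example, o=grid", wanted_attrs=["os"])
```

Because every DN contains its parent's DN as a suffix, a subtree search reduces to a suffix comparison on the parsed components.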

MDS4 Overview
- MDS4 is a redesign of MDS
- The MDS4 Index Service acts as both a monitoring and discovery service
  - Uses WSRF standard resource property queries as its query interface
- A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met
  - E.g., “send email when a remote system appears to be down”
- MDS4 uses WSRF standards for its query and registration interfaces

WS-Resource Review
- A WS-Resource is a Web Service that exposes internal state as Resource Properties
  - Each Resource Property is an XML element of arbitrary complexity
- Each WS-Resource has a Resource Property Document
  - An XML document that includes all its Resource Properties
- Example: The WS-GRAM service advertises information about its associated queues and clusters as a resource property

Retrieving Resource Properties
- GetResourceProperty
  - Gets a single named resource property
- GetMultipleResourceProperties
  - Gets a set of named resource properties
- QueryResourceProperty
  - Returns the results of a query against a resource’s resource property set
- Subscription/notification
  - Clients subscribe and get periodic or occasional notifications

What this means…
- Standard requests can be used to get state information from any WS-Resource
- This means that every WS-Resource is also a monitoring service!
  - But not necessarily monitoring anything (i.e., providing any interesting state)
- We sometimes want information from sources other than WS-Resources
  - Non-WSRF services
  - General system information
  - Catalogues of installed software

Service Groups Review
- A service group is a service that represents a group of other services or resources
- Service groups contain Service Group Entries (SGEs), which consist of:
  - The address of the SGE itself,
  - The address of the Service Group that the SGE belongs to, and
  - A Content element consisting of arbitrarily formatted data
- SGEs are created via the Service Group Add request

The MDS4 Index Service
- Acts as a Discovery Service
  - Gathers information from other WS-Resources
  - Including other Index Servers
- Acts as a Monitoring Service
  - Caches all the information it gathers
  - Also has a pluggable interface for Information Providers
    - Programs or Java classes that gather information

An MDS4 Index Deployment
[Figure: deployment diagram with boxes labeled Index, GRAM, RFT, and IP]

The MDS4 Index Data Model
- The Index Service keeps its data as a Service Group
  - Registering a new resource to be monitored is accomplished by adding a service group entry to the service group
- The data in each SGE contains both:
  - Configuration information
    - E.g., “query the X resource property from server Y”
  - and the actual collected data

Index Data Model (simplified)
[Figure: tree diagram with nodes labeled Index Service Group, SGE, SG EPR, SGE EPR, Content, Config, Data, GLUECE, Queue, Cluster, Name, State, Name, OS, RP, EPR, GetRP]

Data Model continued
- In the Index Service data model, data is grouped with its configuration information
- The “same” data can appear in two different places in the tree, if it was acquired from two different information sources
  - E.g., information about a host’s load average from two different GRAM servers running on that host
- Relatively easy to find where each piece of data came from

How the Index Updates its Data
- Periodically, the Index Service examines each SGE in its Service Group
- If the SGE’s registration has expired and not been renewed, it is destroyed
- Otherwise, the Index
  - looks at the Config part of the SGE content,
  - gathers data as specified by that config information, and
  - updates the data in the Data part of the SGE content
- Data is updated periodically, not on demand
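One pass of the update cycle described above might look like the following sketch. The `Entry` class and the `fetch` callback are hypothetical stand-ins for a Service Group Entry and for whatever mechanism actually gathers the data; none of this is Globus Toolkit code.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    """Hypothetical stand-in for a Service Group Entry."""
    config: dict                 # what to collect, e.g. which property from which server
    expires_at: float            # end of the registration lifetime
    data: dict = field(default_factory=dict)

def refresh_index(entries, fetch, now):
    """One periodic pass: drop expired entries, refresh data for the rest."""
    kept = []
    for entry in entries:
        if entry.expires_at <= now:
            continue                        # registration lapsed: entry is destroyed
        entry.data = fetch(entry.config)    # gather as specified by the config
        kept.append(entry)
    return kept
```

Queries are then answered from the stored `data` fields, which is why results reflect the last periodic pass rather than the moment of the query.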

Querying the Index Service
- The Index Service advertises its service group as a resource property
  - You can fetch the whole thing with GetRP or GetMultipleRPs
  - Most people use QueryRP to query it
- QueryRP allows you to specify a dialect and a query
  - Currently, only XPath is supported as a dialect

XPath Queries
- Search an XML document and return some subset of the XML entities
- If an entity is included in the results, it’s included in its entirety
  - Unlike LDAP, no way to leave out attributes or children
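A short sketch of what "included in its entirety" means in practice, using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset. The XML document and its element names are invented for the example and do not follow any MDS4 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource property document; element names are illustrative.
DOC = """
<ResourceProperties>
  <Host name="nodeA"><OS>linux</OS><FreeMemoryMB>512</FreeMemoryMB></Host>
  <Host name="nodeB"><OS>solaris</OS><FreeMemoryMB>2048</FreeMemoryMB></Host>
</ResourceProperties>
"""

root = ET.fromstring(DOC)

# Select Host elements that have an OS child whose text is "linux".
matches = root.findall("./Host[OS='linux']")

for host in matches:
    # Each match is the whole element, children and attributes included.
    print(ET.tostring(host, encoding="unicode").strip())
```

The query selects on `OS`, yet the matched element still carries `FreeMemoryMB` and the `name` attribute; trimming the result is left to the client.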

MDS4 Trigger Service
- A second monitoring service in MDS4
- The Index is geared more towards queries intended for resource location and selection
- The Trigger Service is intended to alert people to problems
  - Can be configured to take action (e.g., send mail to an administrator) when issues arise

MDS4 Trigger Service
- Maintains information in a service group, like the Index Service
- SGE config information also includes an XPath query and an action
  - The action is the name of a program to run
- Periodically, the Trigger Service looks at each SGE in its service group:
  - It evaluates the SGE’s XPath query against the SGE’s data
  - If the query returns true, it runs the program specified by the action
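The evaluate-then-act loop described above can be sketched as follows. Two simplifications are assumed: the action is a Python callable rather than the name of an external program, and a query counts as "true" when it matches at least one element (the standard library's XPath subset has no boolean expressions). The XML shape is invented for the example.

```python
import xml.etree.ElementTree as ET

def run_triggers(triggers, data_xml):
    """triggers: list of (path, action) pairs; action is called when path matches."""
    root = ET.fromstring(data_xml)
    fired = []
    for path, action in triggers:
        if root.findall(path):      # a non-empty match is treated as "true"
            action()
            fired.append(path)
    return fired

# Usage: record an alert when any host reports state "down".
alerts = []
DATA = "<Status><Host name='nodeA'><State>down</State></Host></Status>"
fired = run_triggers(
    [("./Host[State='down']", lambda: alerts.append("notify administrator"))],
    DATA,
)
```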

MDS4 WebMDS
- Provides a simple HTTP interface to query an MDS Index Service
  - Really, to query resource properties of any WS-Resource
- Optionally applies XSLT transforms to the query results
- Designed as a user interface, to be used with a web browser
  - But some people are using it to provide a REST-like interface to MDS4

INCA
- Monitoring system developed at SDSC
- Users define tests for Inca to run
- Inca runs them and stores the results in a database
- Users can view the results on a web page
- Can be configured to send mail if tests fail, etc.
- Can run tests using the user’s credentials

[Figure from the Inca 2.1 User’s Guide]

Inca Query Interface
- Uses an SQL database internally
- End users can query using a web page or receive notifications via email
- A web-services interface is also available
  - Uses a custom query language
- Overall a nice monitoring/testing framework
- Not designed as a discovery service

GMA (Grid Monitoring Architecture)
- Proposed architecture with three components:
  - Producers produce information
  - Consumers consume information
  - Directories keep track of what information is available
    - That is, which producers can be queried, not the actual data
[Figure: diagram from “A Grid Monitoring Architecture”, B. Tierney et al., didc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-2.pdf]

R-GMA
- Relational Grid Monitoring Architecture
- Implements the GMA model
  - Except that users never interact with the directory service (called a “registry” in R-GMA)
  - A consumer service does that instead, and users query the consumer service
- Uses SQL as its query language

An R-GMA Query
[Figure: diagram from “R-GMA: Architectural Design” at consumers.html]
- Client sends SQL query to Consumer Service
- Consumer Service contacts registry for list of producers to contact
- Consumer Service queries producers and buffers results
- Client retrieves results from Consumer Service
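The four steps above can be sketched with an in-memory stand-in. This is a simplification under stated assumptions, not R-GMA code: producers are plain Python callables that return rows, the registry maps a table name to producers, and the consumer service buffers the rows in a temporary SQLite database and runs the client's SQL there instead of forwarding the query to the producers.

```python
import sqlite3

class Registry:
    """Knows which producers publish which table, not the data itself."""
    def __init__(self):
        self._producers = {}

    def register(self, table, producer):
        self._producers.setdefault(table, []).append(producer)

    def producers_for(self, table):
        return list(self._producers.get(table, []))

class ConsumerService:
    """Answers client SQL by consulting the registry and the producers."""
    def __init__(self, registry):
        self.registry = registry

    def query(self, table, columns, sql):
        # Table and column names are trusted here; this is a sketch only.
        db = sqlite3.connect(":memory:")
        db.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        for producer in self.registry.producers_for(table):   # step 2
            rows = producer()                                  # step 3: query and buffer
            db.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        result = db.execute(sql).fetchall()                    # step 4: client's answer
        db.close()
        return result

# Usage with two hypothetical producers publishing CPU load.
registry = Registry()
registry.register("cpuload", lambda: [("nodeA", 0.4), ("nodeB", 2.5)])
registry.register("cpuload", lambda: [("nodeC", 0.9)])
consumer = ConsumerService(registry)
idle = consumer.query(
    "cpuload", ["host", "cpu_load"],
    "SELECT host FROM cpuload WHERE cpu_load < 1.0 ORDER BY host",
)
```

The client never sees the registry, which mirrors the point that R-GMA users interact only with the consumer service.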

For More Information
- Globus:
- Inca:
- R-GMA:
- XML / XPath / XSLT: