Grid Monitoring and Information Services: Globus Toolkit MDS4 & TeraGrid Inca Jennifer M. Schopf Argonne National Lab UK National eScience Center (NeSC)


1 Grid Monitoring and Information Services: Globus Toolkit MDS4 & TeraGrid Inca
Jennifer M. Schopf
Argonne National Lab
UK National eScience Center (NeSC)

2 Overview (Nov 2, 2004)
- Brief overview of what I mean by “Grid monitoring”
- Tool for monitoring/discovery:
  - Globus Toolkit MDS4
- Tool for monitoring/status tracking:
  - Inca from the TeraGrid project
- Just added: GLUE schema in a nutshell

3 What do I mean by monitoring?
- Discovery and expression of data
- Discovery:
  - Registry service
  - Contains descriptions of the data that is available
  - Sometimes also where the last value of the data is kept (caching)
- Expression of data:
  - Access to sensors, archives, etc.
  - Producer (in the consumer-producer model)

4 What do I mean by Grid monitoring?
- Grid-level monitoring concerns data that is:
  - Shared between administrative domains
  - For use by multiple people
  - Often summarized
  - (think scalability)
- Different levels of monitoring needed:
  - Application specific
  - Node level
  - Cluster/site level
  - Grid level
- Grid monitoring may contain summaries of lower-level monitoring

5 Grid Monitoring Does Not Include…
- All the data about every node of every site
- Years of utilization logs to use for planning the next hardware purchase
- Low-level application progress details for a single user
- Application debugging data (except perhaps notification of a heartbeat failure)
- Point-to-point sharing of all data over all sites

6 What monitoring systems look like
[Diagram: GMA architecture with Consumer, Producer, and Registry components; arrows labeled "register" and "lookup"]

7 Compound Producer-Consumers
- To go beyond plain data sources and simple sinks, some approaches combine the two roles in one component
[Diagram: chained Consumer/Producer components]

8 Pieces of a Grid Monitoring System
- Producer
  - Any component that publishes monitoring data (also called a sensor, data source, information provider, etc.)
- Consumer
  - Any component that requests data from a producer
- Registry or directory service
  - A construct (database?) containing information on which producer publishes which events, and what the schemas for those events are
  - Some approaches cache data (last value) as well
- Higher-level services
  - Aggregation, trigger services, archiving
- Client tools
  - APIs, visualization services, etc.
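
The producer, consumer, and registry roles listed on this slide can be sketched in a few lines of Python. This is only an illustration of the register/lookup pattern, not code from MDS4, Inca, or any GMA implementation; the class names, the `announce` and `consume` helpers, and the `cpu_load` event name are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Registry:
    """Hypothetical registry: maps an event name to the producers publishing it."""
    entries: Dict[str, List["Producer"]] = field(default_factory=dict)

    def register(self, event: str, producer: "Producer") -> None:
        self.entries.setdefault(event, []).append(producer)

    def lookup(self, event: str) -> List["Producer"]:
        return list(self.entries.get(event, []))


@dataclass
class Producer:
    """Hypothetical producer: a named set of sensors, one callable per event."""
    name: str
    sensors: Dict[str, Callable[[], float]]

    def announce(self, registry: Registry) -> None:
        # Tell the registry which events this producer can serve.
        for event in self.sensors:
            registry.register(event, self)

    def query(self, event: str) -> float:
        return self.sensors[event]()


def consume(registry: Registry, event: str) -> Dict[str, float]:
    """Consumer: find producers through the registry, then query each one directly."""
    return {p.name: p.query(event) for p in registry.lookup(event)}
```

Note that the registry only stores who publishes what; the data itself still flows directly from producer to consumer.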

9 PGI Monitoring Defined Use Cases
- Joint PPDG, GriPhyN and iVDGL effort to define monitoring requirements
- http://www.mcs.anl.gov/~jms/pg-monitoring
- 19 use cases from ~9 groups
- Roughly 4 categories
  - Health of system (network, servers, CPUs, etc.)
  - System upgrade evaluation
  - Resource selection
  - Application-specific progress tracking

10 Why So Many Monitoring Systems?
- There is no ONE tool for this job
  - Nor would you ever get all sites to agree to deploy it if there were
- Best you can hope for is
  - An understanding of overlap
  - Standard-defined interactions when possible

11 Things to Think About When Comparing Systems
- What is the main use case your system addresses?
- What is the base set of sensors shipped with the system?
- How does that set get extended?
- What are you doing for discovery/registry?
- What schema are you using (or interacting with)?
- Is this system meant to monitor a machine, a cluster, or send data between sites, or some combination of the above?
- What kind of scalability testing has been done? (There are several pieces to this: how often data is updated, how many users, how many data sources, how many sites, etc.)

12 Two Systems To Consider
- Globus Toolkit Monitoring and Discovery System 4 (MDS4)
  - WSRF-compatible
  - Resource discovery
  - Service status
- Inca test harness and reporting framework
  - TeraGrid project
  - Service agreement monitoring: software stack, service up/down, performance

13 Monitoring and Discovery Service in GT4 (MDS4)
- WS-RF compatible
- Monitoring of basic service data
- Primary use case is discovery of services
- Starting to be used for up/down statistics

14 MDS4 Producers: Information Providers
- Code that generates resource property information
  - Were called service data providers in GT3
- XML based, not LDAP
- Basic cluster data
  - Interface to Ganglia
  - GLUE schema
- Some service data from GT4 services
  - Start, timeout, etc.
- Soft-state registration
- Push and pull data models

15 MDS4 Registry: Aggregator
- Aggregator is both registry and cache
- Subscribes to information providers
  - Data, datatype, data provider information
- Caches last value of all data
- Held in memory by default
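
To make the combination of soft-state registration and last-value caching concrete, here is a minimal in-memory sketch. It is not the MDS4 Aggregator API: the `Aggregator` class, its method names, the `lifetime_s` parameter, and the injectable `clock` are assumptions chosen for the illustration. The idea shown is that a registration lapses unless the provider renews it, and that only the most recent value per provider is kept.

```python
import time
from typing import Any, Callable, Dict, Optional


class Aggregator:
    """Hypothetical registry-plus-cache with soft-state (expiring) registrations."""

    def __init__(self, lifetime_s: float, clock: Callable[[], float] = time.monotonic):
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._expiry: Dict[str, float] = {}   # provider -> time the registration lapses
        self._last: Dict[str, Any] = {}       # provider -> last value published

    def register(self, provider: str) -> None:
        # Registering again before expiry simply renews the lifetime.
        self._expiry[provider] = self.clock() + self.lifetime_s

    def is_registered(self, provider: str) -> bool:
        return self._expiry.get(provider, float("-inf")) > self.clock()

    def publish(self, provider: str, value: Any) -> None:
        # Values from unregistered or expired providers are ignored.
        if self.is_registered(provider):
            self._last[provider] = value

    def last_value(self, provider: str) -> Optional[Any]:
        if not self.is_registered(provider):
            # Soft state: drop everything known about a lapsed provider.
            self._expiry.pop(provider, None)
            self._last.pop(provider, None)
            return None
        return self._last.get(provider)
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping; a real deployment would also need to decide how renewals are transported, which this sketch leaves out.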

16 MDS4 Trigger Service
- Compound consumer-producer service
- Subscribes to a set of resource properties
- Runs a set of tests on incoming data streams to evaluate trigger conditions
- When a condition matches, email is sent to a pre-defined address
- GT3 tech-preview version in use by ESG
- An alpha GT4 version is in the currently available GT4 alpha release
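
The trigger pattern described on this slide (subscribe to properties, test each incoming value, act on a match) can be sketched as follows. This is an assumption-laden illustration rather than the MDS4 Trigger Service interface: `TriggerService`, `subscribe`, `on_update`, and the `free_disk_gb` property are hypothetical names, and the action is a plain callback standing in for the email notification.

```python
from typing import Callable, Dict, List

Condition = Callable[[float], bool]
Action = Callable[[str, float], None]


class TriggerService:
    """Hypothetical trigger evaluator: conditions per property, one shared action."""

    def __init__(self, action: Action):
        self.action = action                      # e.g. a function that sends email
        self.rules: Dict[str, List[Condition]] = {}

    def subscribe(self, prop: str, condition: Condition) -> None:
        self.rules.setdefault(prop, []).append(condition)

    def on_update(self, prop: str, value: float) -> int:
        """Evaluate every condition for this property; return how many fired."""
        fired = 0
        for condition in self.rules.get(prop, []):
            if condition(value):
                self.action(prop, value)
                fired += 1
        return fired
```

For example, `svc.subscribe("free_disk_gb", lambda v: v < 5)` would invoke the action whenever an update reports less than 5 GB free.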

17 MDS4 Archive Service
- Compound consumer-producer service
- Subscribes to a set of resource properties
- Data is put into a database (Xindice)
- Other consumers can contact the database archive interface
- Will be in the GT4 beta release

18 MDS4 Clients
- Command line, Java and C APIs
- MDSWeb Viz service
  - Tech preview in current alpha (3.9.3 last week)

20 Coming Up Soon…
- Extend MDS4 information providers
  - More data from GT4 services (GRAM, RFT, RLS)
  - Interface to other tests (Inca, GRASP)
  - Interface to archiver (PinGER, Ganglia, others)
- Scalability testing and development
- Additional clients
- If tracking job stats is of interest, this is something we can talk about

21 TeraGrid Inca
- Originally developed for the TeraGrid project to verify its software stack
- Now part of the NMI GRIDS center software
- Now performs automated verification of service-level agreements
  - Software versions
  - Basic software and service tests, local and cross-site
  - Performance benchmarks
- Best use: CERTIFICATION
  - Is this site Project Compliant?
  - Have upgrades taken place in a timely fashion?

22 Inca Producers: Reporters
- Over 100 tests deployed on each TG resource (9 sites)
  - Load on host systems less than 0.05% overall
- Primarily specific software versions and functionality tests
  - Versions rather than functionality, because functionality is an open question
  - Grid service capabilities cross-site
  - GT 2.4.3 GRAM job submission & GridFTP
  - OpenSSH
  - MyProxy
- Soon to be deployed: SRB, VMI, BONNIE benchmarks, LAPACK benchmarks

23 Support Services
- Distributed controller
  - Runs on each client resource
  - Controls the local data collection through the reporters
- Centralized controller
  - System administrators can change data collection rates and deployment of the reporters
- Archive system (depot)
  - Collects all the reporter data using a round-robin database scheme
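
The core idea behind a round-robin database scheme is a fixed number of slots in which the newest sample overwrites the oldest, so storage never grows. The sketch below shows only that idea using a bounded `collections.deque`; it is not how the Inca depot is implemented, it omits the consolidation into coarser averages that real round-robin databases perform, and the `RoundRobinArchive` name and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class RoundRobinArchive:
    """Hypothetical fixed-size archive: the oldest sample is dropped when full."""

    def __init__(self, slots: int):
        # deque(maxlen=...) discards from the left once the bound is reached.
        self._data: Deque[Tuple[float, float]] = deque(maxlen=slots)

    def store(self, timestamp: float, value: float) -> None:
        self._data.append((timestamp, value))

    def history(self) -> List[Tuple[float, float]]:
        """Return retained samples, oldest first."""
        return list(self._data)
```

With three slots, storing five samples leaves only the last three, which is the bounded-history trade-off that makes the scheme cheap but limits how far back it can report.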

25 Interfaces
- Command line, C, and Perl APIs
- Several GUI clients
- Executive view
  - http://tech.teragrid.org/inca/TG/html/execView.html
- Overall status
  - http://tech.teragrid.org/inca/TG/html/stackStatus.html

26 Example Summary View Snapshot
[Figure: history of the percentage of tests passed in the “Grid” category over a one-week period]
Key:
- All tests passed: 100%
- One or more tests failed: < 100%
- Tests not applicable to the machine, or not yet ported
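
The percentage shown in a summary view like this one can be computed along the following lines. This is a guess at the arithmetic implied by the key, not Inca's reporting code: the `percent_passed` function is hypothetical, and it assumes that tests marked not applicable (represented here as `None`) are excluded from both the numerator and the denominator.

```python
from typing import Iterable, Optional


def percent_passed(results: Iterable[Optional[bool]]) -> Optional[float]:
    """Percentage of applicable tests that passed.

    True = passed, False = failed, None = not applicable or not yet ported.
    Returns None when no test applies, since no percentage is defined then.
    """
    applicable = [r for r in results if r is not None]
    if not applicable:
        return None
    return 100.0 * sum(applicable) / len(applicable)
```

Under this assumption a resource with two passes, one failure, and one non-applicable test scores two out of three, not two out of four.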

28 Inca Future Plans
- Paper being presented at SC04
  - Scalability results (soon to be posted here)
  - www.mcs.anl.gov/~jms/Pubs/jmspubs.html
- Extending information and sites
- Restructuring depot (archiving) for added scalability (RRDB won’t meet future needs)
- Cascading reporters: trigger more info on failure
- Discussions with several groups to consider adoption/certification programs
  - NEES, GEON, UK NGS, others

29 GLUE Schema
- Why do we need a fixed schema?
  - Communication between projects
- Condor doesn’t have one, so why do we need one?
  - Condor has a de facto schema
  - OS won’t match to OpSys: a major problem when matchmaking between sites
- What about doing updates?
  - Schema updates should NOT be done on the fly if you want to maintain compatibility
  - On the other hand, they don’t need to be, since by definition they include deploying new sensors to gather data
  - Whether or not software has to be restarted after a deployment is an implementation issue, not a schema issue
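
The OS versus OpSys problem from this slide is easy to reproduce: two sites describe the same property under different attribute names, so a naive matchmaker finds no match. The sketch below illustrates that failure and how an agreed mapping removes it. It does not use Condor ClassAds or the actual GLUE schema; the `normalize` and `matches` functions and the alias table are hypothetical.

```python
from typing import Dict

# Hypothetical alias table: maps each site's attribute name to one agreed name.
ALIASES: Dict[str, str] = {"OS": "OpSys"}


def normalize(ad: Dict[str, str], aliases: Dict[str, str]) -> Dict[str, str]:
    """Rename attributes to the agreed names; unknown names pass through unchanged."""
    return {aliases.get(key, key): value for key, value in ad.items()}


def matches(request: Dict[str, str], resource: Dict[str, str],
            aliases: Dict[str, str]) -> bool:
    """True if every requested attribute has the same value on the resource."""
    req = normalize(request, aliases)
    res = normalize(resource, aliases)
    return all(res.get(key) == value for key, value in req.items())
```

Calling `matches({"OS": "LINUX"}, {"OpSys": "LINUX"}, {})` fails because the names differ, while passing `ALIASES` makes the same pair match. A shared schema is in effect an alias table that everyone has agreed on in advance.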

30 GLUE Schema
- Does a schema have to define everything?
  - No: GLUE schema v1 was in use and by plan did NOT define everything
  - It had extensible pieces so we could get more hands-on use
  - This is what projects have been doing since it was defined 18 months ago

31 Extending the GLUE Schema
- Sergio Andreozzi proposed extending the GLUE schema to take into account project-specific details
  - We now have hands-on experience
  - Every project has added its own extension
  - We need to unify them
- Mailman list
  - www.hicb.org/mailman/listinfo/glue-schema
- Bugzilla-like system for tracking the proposed changes
  - infnforge.cnaf.infn.it/projects/glueinfomodel/
  - Currently only used by Sergio :)
- Mail this morning suggesting better requirements gathering and a phone call/meeting to move forward

32 Ways Forward
- Sharing of tests between infrastructures
- Help contribute to the GLUE schema
- Share use cases and scalability requirements
- The hardest thing in Grid computing isn’t technical; it’s socio-political and communication

33 For More Information
- Jennifer Schopf
  - jms@mcs.anl.gov
  - http://www.mcs.anl.gov/~jms
- Globus Toolkit MDS4
  - http://www.globus.org/mds
- Inca
  - http://tech.teragrid.org/inca
- Scalability comparison of MDS2, Hawkeye, R-GMA
  - www.mcs.anl.gov/~jms/Pubs/xuehaijeff-hpdc2003.pdf
- Monitoring Clusters, Monitoring the Grid (ClusterWorld)
  - http://www.grids-center.org/news/clusterworld/

