Grid Monitoring and Information Services: Globus Toolkit MDS4 & TeraGrid Inca
Jennifer M. Schopf, Argonne National Lab / UK National eScience Center (NeSC)

Overview
- Brief overview of what I mean by “Grid monitoring”
- Tool for monitoring/discovery:
  - Globus Toolkit MDS4
- Tool for monitoring/status tracking:
  - Inca from the TeraGrid project
- Just added: the GLUE schema in a nutshell

What do I mean by monitoring?
- Discovery and expression of data
- Discovery:
  - Registry service
  - Contains descriptions of the data that are available
  - Sometimes also where the last value of the data is kept (caching)
- Expression of data:
  - Access to sensors, archives, etc.
  - The producer role in the consumer-producer model

What do I mean by Grid monitoring?
- Grid-level monitoring concerns data that are:
  - Shared between administrative domains
  - For use by multiple people
  - Often summarized (think scalability)
- Different levels of monitoring are needed:
  - Application-specific
  - Node level
  - Cluster/site level
  - Grid level
- Grid monitoring may contain summaries of lower-level monitoring

Grid Monitoring Does Not Include…
- All the data about every node of every site
- Years of utilization logs for planning the next hardware purchase
- Low-level application progress details for a single user
- Application debugging data (except perhaps notification of a heartbeat failure)
- Point-to-point sharing of all data across all sites

What monitoring systems look like: the GMA architecture
[Diagram: a Producer registers with the Registry; a Consumer looks the Producer up in the Registry, then requests data directly from the Producer.]
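To make the three GMA roles concrete, here is a minimal in-process sketch; the class and method names are invented for illustration, and a real GMA implementation (R-GMA, MDS, etc.) uses web-service or LDAP protocols rather than direct calls.

```python
# Minimal in-process sketch of the GMA roles: a Producer registers itself in a
# Registry, and a Consumer looks the Producer up, then pulls data from it
# directly.  All names here are invented for illustration.
from typing import Callable, Dict

class Registry:
    """Holds descriptions of available data and who produces them."""
    def __init__(self) -> None:
        self._entries: Dict[str, "Producer"] = {}

    def register(self, topic: str, producer: "Producer") -> None:
        self._entries[topic] = producer

    def lookup(self, topic: str) -> "Producer":
        return self._entries[topic]

class Producer:
    """Wraps a sensor and answers consumer requests."""
    def __init__(self, sensor: Callable[[], object]) -> None:
        self._sensor = sensor

    def read(self) -> object:
        return self._sensor()

class Consumer:
    """Finds a producer via the registry, then talks to it directly."""
    def __init__(self, registry: Registry) -> None:
        self._registry = registry

    def get(self, topic: str) -> object:
        return self._registry.lookup(topic).read()

registry = Registry()
registry.register("cpu-load", Producer(lambda: 0.42))   # stand-in sensor
print(Consumer(registry).get("cpu-load"))
```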

Compound Producer-Consumers
- To build more than plain data sources and simple sinks, systems combine the two roles: a component acts as a consumer of one data stream and a producer of another.
[Diagram: a combined consumer/producer component sits between an upstream producer and a downstream consumer.]

Pieces of a Grid Monitoring System
- Producer
  - Any component that publishes monitoring data (also called a sensor, data source, information provider, etc.)
- Consumer
  - Any component that requests data from a producer
- Registry or directory service
  - A construct (often a database) describing which producer publishes which events, and what the event schemas are for those events
  - Some approaches cache data (last value) as well
- Higher-level services
  - Aggregation, trigger services, archiving
- Client tools
  - APIs, visualization services, etc.

PGI Monitoring Defined Use Cases
- Joint PPDG, GriPhyN, and iVDGL effort to define monitoring requirements
- monitoring
- 19 use cases from ~9 groups
- Roughly 4 categories:
  - Health of the system (network, servers, CPUs, etc.)
  - System upgrade evaluation
  - Resource selection
  - Application-specific progress tracking

Why So Many Monitoring Systems?
- There is no ONE tool for this job
  - Nor would you ever get agreement among sites to deploy it everywhere if there were
- The best you can hope for is:
  - An understanding of the overlap
  - Standards-defined interactions where possible

Things to Think About When Comparing Systems
- What is the main use case your system addresses?
- What is the base set of sensors shipped with the system?
- How does that set get extended?
- What are you doing for discovery/registry?
- What schema are you using (or do you interact with)?
- Is the system meant to monitor a machine, a cluster, to send data between sites, or some combination of the above?
- What kind of scalability testing has been done (several pieces to this: how often is data updated, how many users, how many data sources, how many sites, etc.)?

Two Systems to Consider
- Globus Toolkit Monitoring and Discovery System 4 (MDS4)
  - WSRF-compatible
  - Resource discovery
  - Service status
- Inca test harness and reporting framework
  - From the TeraGrid project
  - Service-agreement monitoring: software stack, service up/down, performance

Monitoring and Discovery Service in GT4 (MDS4)
- WSRF-compatible
- Monitoring of basic service data
- Primary use case is discovery of services
- Starting to be used for up/down statistics

MDS4 Producers: Information Providers
- Code that generates resource property information
  - Called service data providers in GT3
- XML-based, not LDAP
- Basic cluster data
  - Interface to Ganglia
  - GLUE schema
- Some service data from GT4 services
  - Start time, timeout, etc.
- Soft-state registration
- Push and pull data models
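The following stand-alone sketch illustrates what an information provider does, gathering local facts and emitting them as GLUE-style XML; it does not use the actual GT4 provider API, and the element names are simplified rather than exact GLUE schema elements.

```python
# Illustrative sketch only: MDS4 information providers are written against the
# GT4 WSRF resource-property framework; this stand-alone script just shows the
# idea of a provider that gathers host data and emits GLUE-style XML.
# Element and attribute names below are simplified, not the exact GLUE schema.
import os
import socket
import xml.etree.ElementTree as ET

def collect_host_info() -> ET.Element:
    """Gather a few basic facts about the local host (the 'sensor' part).

    Uses Unix-only calls (os.getloadavg, os.uname) for brevity.
    """
    host = ET.Element("Host", Name=socket.getfqdn())
    load1, load5, load15 = os.getloadavg()
    ET.SubElement(host, "ProcessorLoad", Last1Min=f"{load1:.2f}",
                  Last5Min=f"{load5:.2f}", Last15Min=f"{load15:.2f}")
    ET.SubElement(host, "OperatingSystem", Name=os.uname().sysname,
                  Release=os.uname().release)
    return host

if __name__ == "__main__":
    # An MDS4 provider would publish this as a WSRF resource property;
    # here we simply print the XML document.
    print(ET.tostring(collect_host_info(), encoding="unicode"))
```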

MDS4 Registry: Aggregator
- The aggregator is both registry and cache
- Subscribes to information providers
  - Data, data type, and data-provider information
- Caches the last value of all data
- In-memory by default
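A rough sketch of the aggregator idea, registry plus last-value cache, assuming in-process callables as "providers"; the real aggregator subscribes to remote WSRF services and keeps its cache in memory.

```python
# Minimal sketch of the aggregator idea (registry + last-value cache); names
# are invented for illustration, not the MDS4 API.
import time
from typing import Callable, Dict, Tuple

class Aggregator:
    def __init__(self) -> None:
        # provider name -> callable that returns the current value
        self._providers: Dict[str, Callable[[], object]] = {}
        # provider name -> (timestamp, last value): the cache
        self._cache: Dict[str, Tuple[float, object]] = {}

    def register(self, name: str, provider: Callable[[], object]) -> None:
        """Soft-state registration: providers re-register periodically in MDS4."""
        self._providers[name] = provider

    def refresh(self) -> None:
        """Pull the latest value from every registered provider."""
        for name, provider in self._providers.items():
            self._cache[name] = (time.time(), provider())

    def query(self, name: str) -> object:
        """Consumers read the cached last value rather than hitting the source."""
        return self._cache[name][1]

# Example use with a trivial stand-in provider
agg = Aggregator()
agg.register("queue-length", lambda: 42)
agg.refresh()
print(agg.query("queue-length"))
```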

MDS4 Trigger Service
- Compound consumer-producer service
- Subscribes to a set of resource properties
- Runs a set of tests on the incoming data streams to evaluate trigger conditions
- When a condition matches, a notification is sent to a pre-defined address
- GT3 tech-preview version in use by ESG
- GT4 alpha version is in the currently available GT4 alpha release
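A minimal sketch of trigger evaluation, with invented condition and notification hooks; the real trigger service subscribes to WSRF resource properties and its actions are configured per trigger.

```python
# Sketch of the trigger idea; function names and the notification hook are
# made up for illustration.
from typing import Callable, Iterable

def run_triggers(values: Iterable[float],
                 condition: Callable[[float], bool],
                 notify: Callable[[str], None]) -> None:
    """Evaluate each incoming value and fire the notify action on a match."""
    for v in values:
        if condition(v):
            notify(f"trigger condition matched: value={v}")

# Example: alert when free disk space (GB) on a monitored host drops below 10
incoming_stream = [55.0, 32.5, 9.2, 8.7]          # stand-in for subscribed data
run_triggers(incoming_stream,
             condition=lambda gb: gb < 10.0,
             notify=print)                         # a real trigger would email/log
```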

MDS4 Archive Service
- Compound consumer-producer service
- Subscribes to a set of resource properties
- Data are put into a database (Xindice)
- Other consumers can contact the archive's database interface
- Will be in the GT4 beta release
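The archive idea in miniature: timestamp each subscribed update, store it, and expose a history query. SQLite is used below only as a stand-in for the Xindice XML database the slide mentions, and the function names are invented.

```python
# Sketch of the archive idea: subscribe, timestamp, and store values so that
# other consumers can query history.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE archive (ts REAL, property TEXT, value TEXT)")

def archive_value(prop: str, value: str) -> None:
    """Called for each update received from a subscribed resource property."""
    conn.execute("INSERT INTO archive VALUES (?, ?, ?)", (time.time(), prop, value))

def history(prop: str):
    """What a downstream consumer would ask the archive interface for."""
    return conn.execute(
        "SELECT ts, value FROM archive WHERE property = ? ORDER BY ts", (prop,)
    ).fetchall()

archive_value("cpu-load", "0.73")
archive_value("cpu-load", "0.41")
print(history("cpu-load"))
```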

MDS4 Clients
- Command line, Java, and C APIs
- MDSWeb visualization service
  - Tech preview in the current alpha (3.9.3 as of last week)


Coming Up Soon…
- Extend the MDS4 information providers
  - More data from GT4 services (GRAM, RFT, RLS)
  - Interfaces to other tests (Inca, GRASP)
  - Interfaces to archivers (PingER, Ganglia, others)
- Scalability testing and development
- Additional clients
- If tracking job statistics is of interest, this is something we can talk about

TeraGrid Inca
- Originally developed for the TeraGrid project to verify its software stack
- Now part of the NMI GRIDS Center software
- Performs automated verification of service-level agreements:
  - Software versions
  - Basic software and service tests, local and cross-site
  - Performance benchmarks
- Best use: CERTIFICATION
  - Is this site project compliant?
  - Have upgrades taken place in a timely fashion?

Inca Producers: Reporters
- Over 100 tests deployed on each TeraGrid resource (9 sites)
  - Load on host systems less than 0.05% overall
- Primarily tests of specific software versions and functionality
  - Versions rather than functionality, because functionality is an open question
  - Grid service capabilities cross-site
  - GT GRAM job submission & GridFTP
  - OpenSSH
  - MyProxy
- Soon to be deployed: SRB, VMI, Bonnie benchmarks, LAPACK benchmarks
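As an illustration of what a reporter does, here is a hypothetical version check; real Inca reporters follow the project's reporter API and emit structured output for the depot, so the function name and the dictionary output format below are invented.

```python
# Sketch of an Inca-style version reporter: check that a tool is installed
# and capture its reported version string.
import shutil
import subprocess

def report_version(command: str, version_flag: str = "--version") -> dict:
    """Return a PASS/FAIL record for a software-version check."""
    path = shutil.which(command)
    if path is None:
        return {"test": f"{command}-version", "status": "FAIL",
                "detail": f"{command} not found on PATH"}
    out = subprocess.run([path, version_flag], capture_output=True, text=True)
    text_out = (out.stdout + out.stderr).strip()
    return {"test": f"{command}-version", "status": "PASS",
            "detail": text_out.splitlines()[0] if text_out else ""}

# Example: verify the ssh client is present and record its version string
print(report_version("ssh", "-V"))
```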

Support Services
- Distributed controller
  - Runs on each client resource
  - Controls the local data collection through the reporters
- Centralized controller
  - Lets system administrators change data-collection rates and the deployment of the reporters
- Archive system (depot)
  - Collects all the reporter data using a round-robin database scheme
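A toy sketch of the round-robin idea behind the depot: keep a bounded window of recent samples so storage never grows without limit. Python's deque stands in here for the actual round-robin database; class and method names are invented.

```python
# Fixed-size store that overwrites the oldest samples (the round-robin idea).
from collections import deque
import time

class RoundRobinStore:
    def __init__(self, max_samples: int) -> None:
        self._samples = deque(maxlen=max_samples)  # old entries fall off the end

    def add(self, reporter: str, status: str) -> None:
        self._samples.append((time.time(), reporter, status))

    def recent(self):
        return list(self._samples)

store = RoundRobinStore(max_samples=3)
for status in ["PASS", "PASS", "FAIL", "PASS"]:
    store.add("gridftp-copy-test", status)
print(store.recent())   # only the 3 most recent results are kept
```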


Interfaces
- Command line, C, and Perl APIs
- Several GUI clients
- Executive view – http://tech.teragrid.org/inca/TG/html/execView.html
- Overall status – http://tech.teragrid.org/inca/TG/html/stackStatus.html

Example Summary View Snapshot
[Screenshot: history of the percentage of tests passed in the “Grid” category over a one-week period. Key: all tests passed = 100%; one or more tests failed = < 100%; remaining entries = tests not applicable to the machine or not yet ported.]


Inca Future Plans
- Paper being presented at SC04
  - Scalability results (soon to be posted here) –
- Extending information and sites
- Restructuring the depot (archiving) for added scalability (the RRDB won’t meet future needs)
- Cascading reporters – trigger more detailed info on failure
- Discussions with several groups about adoption/certification programs
  - NEES, GEON, UK NGS, others

GLUE Schema
- Why do we need a fixed schema?
  - Communication between projects
- Condor doesn’t have one – why do we need one?
  - Condor has a de facto schema
  - "OS" won’t match "OpSys" – a major problem when matchmaking between sites
- What about doing updates?
  - Schema updates should NOT be done on the fly if you want to maintain compatibility
  - On the other hand, they don’t need to be, since by definition they involve deploying new sensors to gather the data
  - Whether the software has to be restarted after a deployment is an implementation issue, not a schema issue
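The "OS won't match OpSys" point can be shown in a few lines: if two parties publish the same fact under different attribute names, naive matchmaking fails. The attribute names below are illustrative, not the exact GLUE or Condor ClassAd definitions.

```python
# Why a shared schema matters: the same fact under two different attribute
# names does not match.
site_a_resource = {"OS": "Linux", "Memory": 4096}      # one site's naming
site_b_request  = {"OpSys": "Linux"}                   # another project's naming

def matches(resource: dict, request: dict) -> bool:
    """A requirement matches only if the resource uses the same attribute name."""
    return all(resource.get(attr) == value for attr, value in request.items())

print(matches(site_a_resource, site_b_request))   # False: "OpSys" is missing,
# even though the resource really does run Linux -- hence the need for an
# agreed schema such as GLUE.
```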

GLUE Schema
- Does a schema have to define everything?
  - No – GLUE schema v1 was put into use and, by design, did NOT define everything
  - It had extensible pieces so we could get more hands-on use
  - This is what projects have been doing since it was defined 18 months ago

Extending the GLUE Schema
- Sergio Andreozzi proposed extending the GLUE schema to take project-specific details into account
  - We now have hands-on experience
  - Every project has added its own extension
  - We need to unify them
- Mailman list –
- Bugzilla-like system for tracking the proposed changes
  - infnforge.cnaf.infn.it/projects/glueinfomodel/
  - Currently only used by Sergio :)
- Mail this morning suggesting better requirements gathering and a phone call/meeting to move forward

Ways Forward
- Sharing of tests between infrastructures
- Help contribute to the GLUE schema
- Share use cases and scalability requirements
- The hardest thing in Grid computing isn’t technical; it’s socio-political and communication

For More Information
- Jennifer Schopf –
- Globus Toolkit MDS4 –
- Inca –
- Scalability comparison of MDS2, Hawkeye, R-GMA
- Monitoring Clusters, Monitoring the Grid – ClusterWorld –