Monitoring and Discovery in a Web Services Framework: Functionality and Performance of Globus Toolkit MDS4 Jennifer M. Schopf Argonne National Laboratory.


Monitoring and Discovery in a Web Services Framework: Functionality and Performance of Globus Toolkit MDS4 Jennifer M. Schopf Argonne National Laboratory UK National eScience Centre (NeSC) May 17, 2006

2 What is a Grid
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  - Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic org structures
  - Large or small, static or dynamic
- Made more difficult by the lack of central control, shared resources, and increased need for communication

3 What Is Grid Monitoring?
- Sharing of community data between sites using a standard interface for querying and notification
  - Data of interest to more than one site
  - Data of interest to more than one person
  - Summary data can help scalability
- Must tolerate failures
  - Of both information sources and servers
- Data is likely to be somewhat inaccurate
  - It generally must be acceptable for data to be slightly out of date

4 Grid Monitoring
- User-level information
- Static and dynamic data
- Generally coarse-grained
  - Queue length from 5 minutes ago, not 5 ms ago

5 What is MDS4?
- Grid-level monitoring system, used most often for resource selection
  - Helps a user or agent identify the host(s) on which to run an application
- Part of the Globus Toolkit v4
- Uses standard interfaces to provide data publishing, discovery, and data access, including subscription/notification
  - WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
- Functions as an hourglass, providing a common interface to lower-level monitoring tools

6 [Architecture diagram: information users (schedulers, portals, warning systems, etc.) access MDS4 through WS standard interfaces for subscription, registration, and notification; standard schemas (e.g. the GLUE schema) describe data drawn from cluster monitors (Ganglia, Hawkeye, Clumon, Nagios), services (GRAM, RFT, RLS), and queuing systems (PBS, LSF, Torque)]

7 MDS4 Components
- Information providers
  - Monitoring is a part of every WSRF service
  - Non-WS services can also be used
- Higher-level services
  - Index Service: a way to aggregate data
  - Trigger Service: a way to be notified of changes
  - Both built on a common aggregator framework
- Clients
  - WebMDS
- All of the tools are schema-agnostic, but interoperability needs a well-understood common language

8 Information Providers
- Data sources for the higher-level services
- Some are built into services
  - Any WSRF-compliant service publishes some data automatically
  - WS-RF gives us standard Query/Subscribe/Notify interfaces
  - GT4 services: the ServiceMetaDataInfo element includes start time, version, and service type name
  - Most of them also publish additional useful information as resource properties

9 Information Providers (2)
- Other sources of data
  - Any executable
  - Other (non-WS) services
  - Interfaces to another archive or data store
  - File scraping
- Just need to produce a valid XML document
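Because the only requirement is a well-formed XML document, an external information provider can be as small as a script. The sketch below is illustrative only: the element names (HostInfo, LoadAvg1Min) are invented for this example, not taken from the GLUE schema a real deployment would use.

```python
import os
import sys
import xml.etree.ElementTree as ET

def host_info_xml():
    """Emit a minimal, well-formed XML document describing this host.

    Element names here are hypothetical; a real provider would follow
    whatever schema its index expects (e.g. the GLUE schema).
    """
    root = ET.Element("HostInfo")
    ET.SubElement(root, "Name").text = os.uname().nodename
    # 1-minute load average, one of the dynamic values a monitor publishes
    ET.SubElement(root, "LoadAvg1Min").text = str(os.getloadavg()[0])
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    sys.stdout.write(host_info_xml())
```

Anything that can print such a document (an executable, a file scraper, a bridge to another archive) can feed the higher-level services.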

10 Information Providers: GT4 Services
- Reliable File Transfer Service (RFT)
  - Service status data, number of active transfers, transfer status, information about the resource running the service
- Community Authorization Service (CAS)
  - Identifies the VO served by the service instance
- Replica Location Service (RLS)
  - Note: not a WS
  - Location of replicas on physical storage systems (based on user registrations) for later queries

11 Information Providers: Cluster and Queue Data
- Interfaces to Hawkeye, Ganglia, CluMon, Nagios
  - Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data
  - Some Condor/cluster-specific data
  - This can also be done for sub-clusters, not just at the host level
- Interfaces to PBS, Torque, LSF
  - Queue information, number of CPUs available and free, job count information, some memory statistics, and host info for the head node of the cluster

12 Higher-Level Services
- Index Service
  - Caching registry
- Trigger Service
  - Warns on error conditions
- Archive Service
  - Database store for history (in development)
- All of these have common needs and are built on a common framework

13 Common Aggregator Framework
- Basic framework for higher-level functions
  - Subscribe to information provider(s)
  - Perform some action
  - Present standard interfaces

14 Aggregator Framework Features
1) Common configuration mechanism
   - Specify what data to get, and from where
2) Self-cleaning
   - Registrations have lifetimes that must be refreshed
3) Soft consistency model
   - Published information is recent, but not guaranteed to be the absolute latest
4) Schema neutral
   - Only a valid XML document is needed
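Features 2 and 3 can be illustrated with a toy registry: each registration carries a lifetime and is swept away unless refreshed, so stale entries disappear on their own while recently published data may lag slightly behind reality. This is a sketch of the idea only; the class and method names are invented and do not come from the GT4 code base.

```python
import time

class Registration:
    def __init__(self, name, lifetime_s):
        self.name = name
        self.expires = time.time() + lifetime_s

    def refresh(self, lifetime_s):
        # A provider must periodically renew, or its entry is swept away.
        self.expires = time.time() + lifetime_s

class Aggregator:
    """Toy model (hypothetical names) of the self-cleaning behavior:
    entries carry a lifetime and vanish unless refreshed."""

    def __init__(self):
        self.registrations = {}

    def register(self, name, lifetime_s):
        self.registrations[name] = Registration(name, lifetime_s)

    def sweep(self, now=None):
        # Drop every registration whose lifetime has lapsed.
        now = time.time() if now is None else now
        self.registrations = {
            n: r for n, r in self.registrations.items() if r.expires > now
        }
```

Between sweeps a reader may still see an entry whose source has died, which is exactly the soft-consistency trade-off the slide describes.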

15 MDS4 Index Service
- Index Service is both registry and cache
  - Datatype and data provider info, like a registry (UDDI)
  - Last value of data, like a cache
- In-memory default approach
  - DB backing store currently being developed to allow for very large indexes
- Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only
- Can be a multi-rooted hierarchy
  - No *global* index

16 MDS4 Trigger Service
- Subscribe to a set of resource properties
- Evaluate that data against a set of pre-configured conditions (triggers)
- When a condition matches, an action occurs
  - Email is sent to a pre-defined address
  - Website updated
- Similar functionality in Hawkeye

17 WebMDS User Interface
- Web-based interface to WSRF resource property information
- User-friendly front end to the Index Service
- Uses standard resource property requests to query resource property data
- XSLT transforms format and display the results
- Customized pages are created simply by using HTML form options and writing your own XSLT transforms
- Sample page:
  - ds?info=indexinfo&xsl=servicegroupxsl

18 WebMDS Service

19

20 Any questions before I walk through two current deployments?
- Grid Monitoring Definitions
- MDS4
  - Information Providers
  - Higher-level services
  - WebMDS
- Deployments
  - Metascheduling Data for TeraGrid
  - Service Failure Warning for ESG

21 Working with TeraGrid
- Large US project across 9 different sites
  - Different hardware, queuing systems, and lower-level monitoring packages
- Starting to explore metascheduling approaches
  - GRMS (Poznan)
  - W. Smith (TACC)
  - K. Yashimoto (SDSC)
  - User Portal
- Need a common source of data with a standard interface for basic scheduling info

22 Data Collected
- Provide data at the subcluster level
  - A sys admin defines a subcluster; we query one node of it to dynamically retrieve relevant data
- Can also list per-host details
- Interfaces to Ganglia, Hawkeye, CluMon, and Nagios available now
  - Other cluster monitoring systems can write into a .html file that we then scrape
- Also collect basic queuing data and some TeraGrid-specific attributes

23

24 Status
- Demo system running since November
  - Queuing data from SDSC and NCSA
  - Cluster data using the CluMon interface
- All sites deploying
  - Queue data from 7 sites reporting in
  - Cluster data coming online
- General patch for 4.0.x deployments available now

25 ESG Use of the MDS4 Trigger Service
- Need a way to notify system administrators and users of the status of their services
- In particular, interested in:
  - Replica Location Service (RLS)
  - Storage Resource Manager service (SRM)
  - OpenDAP
  - Web Server (HTTP)
  - GridFTP file servers

26 Trigger Service and ESG, Cont.
- The Trigger Service periodically checks whether services are up and running
- If a service has gone down or is unavailable for any reason, an action script is executed
  - Sends email to administrators
  - Updates the portal status page
- Has been in use for over a year (the GT3 version was used previously)

27

28 Contributing to MDS4
- Globus is opening up its development environment, similar to Apache Jakarta
- MDS4 is a project in the new scheme
- Contact me for more details

29 Thanks
- MDS4 Core Team: Mike D'Arcy (ISI), Laura Pearlman (ISI), Neill Miller (UC), Jennifer Schopf (ANL/NeSC)
- MDS4 additional development help: Eric Blau, John Bresnahan, Mike Link
- Students: Ioan Raicu, Xuehai Zhang
- This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W Eng-38, and NSF NMI Award SCI. This work was also supported by the DOE ESG SciDAC grant, iVDGL from NSF, and others.

30 For More Information
- Jennifer Schopf
- Globus Toolkit MDS4
- Monitoring and Discovery in a Web Services Framework: Functionality and Performance of the Globus Toolkit's MDS4
  - mds4.hpdc06.pdf

31 [Deployment diagram: resources at Sites 1-3 (Ganglia/PBS and Ganglia/LSF nodes with GRAM, RFT, RLS, and Hawkeye) register into per-site Index Services, which feed a West Coast Index and a VO Index; a Trigger Service fires actions from the VO Index, and WebMDS sits on top for display]

32 [Same deployment diagram, highlighting registration: services such as RFT and GRAM register with the Index in their local container]

33

34

35 [Deployment diagram repeated from slide 31]

36 Web Service Resource Framework (WS-RF)
- Defines standard interfaces and behaviors for distributed system integration, especially (for us):
  - Standard XML-based service information model
  - Standard interfaces for push- and pull-mode access to service data
    - Notification and subscription

37 MDS4 Uses Web Service Standards
- WS-ResourceProperties
  - Defines a mechanism by which Web Services can describe and publish resource properties, or sets of information about a resource
  - Resource property types are defined in the service's WSDL
  - Resource properties can be retrieved using WS-ResourceProperties query operations
- WS-BaseNotification
  - Defines a subscription/notification interface for accessing resource property information
- WS-ServiceGroup
  - Defines a mechanism for grouping related resources and/or services together as service groups
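As a concrete illustration of pull-mode access, a client fetches one named property out of the XML document a service publishes. The document below is made up for this sketch (the real property set and namespaces are defined in each service's WSDL), though the ServiceMetaDataInfo element with start time, version, and service type name is the one mentioned earlier in the talk; the local XPath lookup stands in for a WS-ResourceProperties query operation.

```python
import xml.etree.ElementTree as ET

# Invented resource-properties document for illustration only.
RP_DOC = """
<ResourceProperties>
  <ServiceMetaDataInfo>
    <startTime>2006-05-17T09:00:00Z</startTime>
    <version>4.0.1</version>
    <serviceTypeName>ReliableFileTransferService</serviceTypeName>
  </ServiceMetaDataInfo>
  <activeTransfers>3</activeTransfers>
</ResourceProperties>
"""

def query_property(doc, path):
    """Fetch one property by path, as a client would with a
    WS-ResourceProperties query (here just a local XPath lookup)."""
    node = ET.fromstring(doc).find(path)
    return None if node is None else node.text
```

Subscription (WS-BaseNotification) inverts this flow: instead of the client asking, the service pushes new property values to registered listeners.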

38 OUTLINE
- Grid Monitoring and Use Cases
- MDS4
  - Index Service
  - Trigger Service
  - Information Providers
- Deployments
  - Metascheduling Data for TeraGrid
  - Service Failure Warning for ESG
- Performance Numbers

39 Scalability Experiments
- MDS Index
  - Dual 2.4 GHz Xeon processors, 3.5 GB RAM
  - Index sizes: 1, 10, 25, 50, 100 entries
- Clients
  - 20 nodes, also dual 2.6 GHz Xeon, 3.5 GB RAM
  - Client counts: 1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 384, 512, 640, 768, 800
- Nodes connected via a 1 Gb/s network
- Each data point is an average over 8 minutes
  - Runs lasted 10 minutes, but the first 2 were spent getting clients up and running
  - Error bars are the standard deviation over the 8 minutes
- Experiments by Ioan Raicu, U of Chicago, using DiPerf

40 Size Comparison
- In our current TeraGrid demo:
  - 17 attributes from 10 queues at SDSC and NCSA
  - Host data: 3 attributes for approx. 900 nodes
  - 12 attributes of sub-cluster data for 7 subclusters
  - ~3,000 attributes, ~1,900 XML elements, ~192 KB
- Tests here use 50 sample entries
  - Element count of 1,113
  - ~94 KB in size

41 MDS4 Throughput

42 MDS4 Response Time

43 MDS4 Stability [Table: version, index size, time up (days), queries processed, queries per second, round-trip time (ms)]

44 Index Maximum Size [Table: heap size (MB), approximate maximum index entries, index size (MB)]

45 Performance
- Is this enough?
  - We don't know!
  - Currently gathering usage statistics to find out what people need
- Bottleneck examination
  - In the process of doing an in-depth performance analysis of what happens during a query
  - MDS code, implementation of WS-N, WS-RP, etc.
- (These numbers are in an HPDC submission)