Published by Terrance Goodrick. Modified over 9 years ago.
1
Monitoring and Discovery in a Web Services Framework: Functionality and Performance of Globus Toolkit MDS4
Jennifer M. Schopf
Argonne National Laboratory
UK National eScience Centre (NeSC)
May 17, 2006
2
What Is a Grid?
• Resource sharing
  ◦ Computers, storage, sensors, networks, …
  ◦ Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving
  ◦ Beyond client-server: distributed data analysis, computation, collaboration, …
• Dynamic, multi-institutional virtual orgs
  ◦ Community overlays on classic org structures
  ◦ Large or small, static or dynamic
• Made more difficult by the lack of central control, the sharing of resources, and the increased need for communication
3
What Is Grid Monitoring?
• Sharing of community data between sites using a standard interface for querying and notification
  ◦ Data of interest to more than one site
  ◦ Data of interest to more than one person
  ◦ Summary data can be used to help scalability
• Must deal with failures
  ◦ Both of information sources and of servers
• Data is likely to be inaccurate
  ◦ Users generally need to accept that data will be dated
4
Grid Monitoring
• User-level information
• Static and dynamic data
• Generally coarse-grained
  ◦ Queue length from 5 minutes ago, not 5 ms ago
5
What Is MDS4?
• Grid-level monitoring system used most often for resource selection
  ◦ Helps a user or agent identify host(s) on which to run an application
• Part of the Globus Toolkit v4
• Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
  ◦ WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
• Functions as an hourglass to provide a common interface to lower-level monitoring tools
6
[Figure labels]
• Information users: schedulers, portals, warning systems, etc.
• WS standard interfaces for subscription, registration, notification
• Standard schemas (e.g., GLUE schema)
• Cluster monitors (Ganglia, Hawkeye, CluMon, and Nagios)
• Services (GRAM, RFT, RLS)
• Queuing systems (PBS, LSF, Torque)
7
MDS4 Components
• Information providers
  ◦ Monitoring is a part of every WSRF service
  ◦ Non-WS services can also be used
• Higher-level services
  ◦ Index Service: a way to aggregate data
  ◦ Trigger Service: a way to be notified of changes
  ◦ Both built on a common aggregator framework
• Clients
  ◦ WebMDS
• All of the tools are schema-agnostic, but interoperability needs a well-understood common language
8
Information Providers
• Data sources for the higher-level services
• Some are built into services
  ◦ Any WSRF-compliant service publishes some data automatically
  ◦ WS-RF gives us standard Query/Subscribe/Notify interfaces
  ◦ GT4 services: ServiceMetaDataInfo element includes start time, version, and service type name
  ◦ Most of them also publish additional useful information as resource properties
9
Information Providers (2)
• Other sources of data
  ◦ Any executables
  ◦ Other (non-WS) services
  ◦ Interface to another archive or data store
  ◦ File scraping
• Just need to produce a valid XML document
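To make "just need to produce a valid XML document" concrete, here is a minimal sketch of what an executable information provider might look like. The element names (`HostInfo`, `Name`, `Load1Min`) and the function `build_host_document` are hypothetical; they do not follow the GLUE schema or any MDS4 interface.

```python
import xml.etree.ElementTree as ET


def build_host_document(name: str, load_1min: float) -> str:
    """Return a small, well-formed XML document describing one host.

    Element names are illustrative assumptions, not an MDS4 or GLUE schema.
    """
    root = ET.Element("HostInfo")
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "Load1Min").text = f"{load_1min:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # An executable provider would simply write the document to stdout.
    print(build_host_document("node01.example.org", 0.42))
```

Because the output is built with an XML library rather than string concatenation, it stays well-formed even if a value contains characters such as `<` or `&`.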
10
Information Providers: GT4 Services
• Reliable File Transfer Service (RFT)
  ◦ Service status data, number of active transfers, transfer status, information about the resource running the service
• Community Authorization Service (CAS)
  ◦ Identifies the VO served by the service instance
• Replica Location Service (RLS)
  ◦ Note: not a Web service
  ◦ Location of replicas on physical storage systems (based on user registrations) for later queries
11
Information Providers: Cluster and Queue Data
• Interfaces to Hawkeye, Ganglia, CluMon, Nagios
  ◦ Basic host data (name, ID), processor information, memory size, OS name and version, file system data, processor load data
  ◦ Some Condor- and cluster-specific data
  ◦ This can also be done for sub-clusters, not just at the host level
• Interfaces to PBS, Torque, LSF
  ◦ Queue information, number of CPUs available and free, job count information, some memory statistics, and host info for the head node of the cluster
12
Higher-Level Services
• Index Service
  ◦ Caching registry
• Trigger Service
  ◦ Warn on error conditions
• Archive Service
  ◦ Database store for history (in development)
• All of these have common needs, and are built on a common framework
13
Common Aggregator Framework
• Basic framework for higher-level functions
  ◦ Subscribe to Information Provider(s)
  ◦ Do some action
  ◦ Present standard interfaces
14
Aggregator Framework Features
1) Common configuration mechanism
  ◦ Specify what data to get, and from where
2) Self-cleaning
  ◦ Services have lifetimes that must be refreshed
3) Soft consistency model
  ◦ Published information is recent, but not guaranteed to be the absolute latest
4) Schema-neutral
  ◦ Only a valid XML document is needed
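The self-cleaning and soft-consistency features can be sketched as a registry whose entries expire unless they are refreshed. This is an illustration only, not the MDS4 implementation: the class `SoftStateRegistry`, its method names, and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SoftStateRegistry:
    """Registrations expire unless refreshed (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (last published XML document, expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def register(self, name: str, document: str, lifetime_s: float) -> None:
        """Add or refresh an entry; refreshing extends its lifetime."""
        self._entries[name] = (document, self._clock() + lifetime_s)

    def sweep(self) -> None:
        """Drop every entry whose lifetime has run out."""
        now = self._clock()
        self._entries = {n: e for n, e in self._entries.items() if e[1] > now}

    def lookup(self, name: str) -> Optional[str]:
        """Return the last cached document, which may be slightly stale."""
        self.sweep()
        entry = self._entries.get(name)
        return entry[0] if entry else None
```

A lookup returns whatever was last published, so readers see recent but not necessarily current data, and a provider that stops refreshing simply disappears once its lifetime ends.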
15
MDS4 Index Service
• Index Service is both registry and cache
  ◦ Datatype and data provider info, like a registry (UDDI)
  ◦ Last value of data, like a cache
• In-memory by default
  ◦ DB backing store currently being developed to allow for very large indexes
• Can be set up for a site or set of sites, a specific set of project data, or for user-specific data only
• Can be a multi-rooted hierarchy
  ◦ No *global* index
16
MDS4 Trigger Service
• Subscribe to a set of resource properties
• Evaluate that data against a set of pre-configured conditions (triggers)
• When a condition matches, an action occurs
  ◦ Email is sent to a pre-defined address
  ◦ Website is updated
• Similar functionality in Hawkeye
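The evaluate-then-act loop described on this slide can be sketched as follows. The `Trigger` dataclass, the `evaluate` function, and the property names used in the test data are hypothetical; the real Trigger Service is configured differently and operates on resource property documents rather than Python dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    name: str
    # Returns True when the trigger should fire for the given property values.
    condition: Callable[[Dict[str, float]], bool]
    # Side effect to run on a match, e.g. send email or update a web page.
    action: Callable[[str], None]


def evaluate(triggers: List[Trigger], properties: Dict[str, float]) -> List[str]:
    """Run each trigger's action when its condition matches; return fired names."""
    fired = []
    for trig in triggers:
        if trig.condition(properties):
            trig.action(trig.name)
            fired.append(trig.name)
    return fired
```

Keeping conditions and actions as plain callables means the same loop works whether the action sends email, rewrites a status page, or just appends to a log during testing.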
17
WebMDS User Interface
• Web-based interface to WSRF resource property information
• User-friendly front-end to Index Service
• Uses standard resource property requests to query resource property data
• XSLT transforms to format and display them
• Pages can be customized simply by using HTML form options and creating your own XSLT transforms
• Sample page:
  ◦ http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl
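The sample page URL carries two query parameters, `info` and `xsl`, which appear to select the data source and the XSLT transform. A small sketch of composing such a URL follows; the helper `webmds_url` is hypothetical, and the only parameter values known from the slide are `indexinfo` and `servicegroupxsl` (the URL is read here with the stray space in `webmds` removed).

```python
from urllib.parse import urlencode


def webmds_url(base: str, info: str, xsl: str) -> str:
    """Compose a WebMDS request URL from the two query parameters on the slide."""
    return f"{base}?{urlencode({'info': info, 'xsl': xsl})}"


# Rebuild the sample page address from its parts.
SAMPLE = webmds_url(
    "http://mds.globus.org:8080/webmds/webmds", "indexinfo", "servicegroupxsl"
)
```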
18
WebMDS Service
20
Any questions before I walk through two current deployments?
• Grid Monitoring Definitions
• MDS4
  ◦ Information Providers
  ◦ Higher-level services
  ◦ WebMDS
• Deployments
  ◦ Metascheduling Data for TeraGrid
  ◦ Service Failure Warning for ESG
21
Working with TeraGrid
• Large US project across 9 different sites
  ◦ Different hardware, queuing systems, and lower-level monitoring packages
• Starting to explore metascheduling approaches
  ◦ GRMS (Poznan)
  ◦ W. Smith (TACC)
  ◦ K. Yashimoto (SDSC)
  ◦ User Portal
• Need a common source of data with a standard interface for basic scheduling info
22
Data Collected
• Provide data at the subcluster level
  ◦ Sys admin defines a subcluster; we query one node of it to dynamically retrieve relevant data
• Can also list per-host details
• Interfaces to Ganglia, Hawkeye, CluMon, and Nagios available now
  ◦ Other cluster monitoring systems can write into an .html file that we then scrape
• Also collect basic queuing data and some TeraGrid-specific attributes
24
Status
• Demo system running since November
  ◦ Queuing data from SDSC and NCSA
  ◦ Cluster data using CluMon interface
• All sites deploying
  ◦ Queue data from 7 sites reporting in
  ◦ Cluster data coming online
• General patch for 4.0.x deployments available now
25
ESG Use of MDS4 Trigger Service
• Need a way to notify system administrators and users of the status of their services
• In particular, interested in
  ◦ Replica Location Service (RLS)
  ◦ Storage Resource Manager service (SRM)
  ◦ OpenDAP
  ◦ Web Server (HTTP)
  ◦ GridFTP fileservers
26
Trigger Service and ESG (Cont.)
• The Trigger Service periodically checks to see if services are up and running
• If a service has gone down or is unavailable for any reason, an action script is executed
  ◦ Sends email to administrators
  ◦ Updates portal status page
• In use for over a year (the GT3 version was used previously)
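One pass of the periodic availability check might look like the sketch below. It is not the ESG action script: the function `check_services`, the probe callables, and the `on_failure` hook are hypothetical stand-ins for whatever tests each service and whatever sends the email or updates the portal.

```python
from typing import Callable, Dict, List


def check_services(
    probes: Dict[str, Callable[[], bool]],
    on_failure: Callable[[str], None],
) -> List[str]:
    """Run each probe once; call on_failure for every service that is down."""
    down = []
    for name, probe in probes.items():
        try:
            alive = probe()
        except Exception:
            # A probe that raises counts as "unavailable for any reason".
            alive = False
        if not alive:
            on_failure(name)
            down.append(name)
    return down
```

A scheduler such as cron or a simple sleep loop would call this at the chosen interval; treating probe exceptions as failures keeps one broken check from hiding the others.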
28
Contributing to MDS4
• Globus is opening up its development environment, similar to Apache Jakarta
• MDS4 is a project in the new scheme
• Contact me for more details
  ◦ jms@mcs.anl.gov
• http://dev.globus.org
29
Thanks
• MDS4 Core Team: Mike D’Arcy (ISI), Laura Pearlman (ISI), Neill Miller (UC), Jennifer Schopf (ANL/NeSC)
• MDS4 additional development help: Eric Blau, John Bresnahan, Mike Link
• Students: Ioan Raicu, Xuehai Zhang
• This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W-31-109-Eng-38, and NSF NMI Award SCI-0438372. This work was also supported by DOESG SciDAC Grant, iVDGL from NSF, and others.
30
For More Information
• Jennifer Schopf
  ◦ jms@mcs.anl.gov
  ◦ http://www.mcs.anl.gov/~jms
• Globus Toolkit MDS4
  ◦ http://www.globus.org/toolkit/mds
• Monitoring and Discovery in a Web Services Framework: Functionality and Performance of the Globus Toolkit's MDS4
  ◦ http://www.mcs.anl.gov/~jms/Pubs/mds4.hpdc06.pdf
31
[Figure: example MDS4 deployment. Labels: WebMDS; VO Index; West Coast Index; App B Index; Site 1 Index, Site 2 Index, Site 3 Index; resources Rsc1.a to Rsc1.d, Rsc2.a, Rsc2.b, Rsc3.a, Rsc3.b; information sources GRAM (PBS), GRAM (LSF), Ganglia/PBS, Ganglia/LSF, Hawkeye, RFT, RLS; Trigger Service and trigger action; callouts A to F.]
32
[Figure: the same deployment diagram with Site 1 expanded. Additional labels: Index, Container, Service, Registration.]
36
Web Service Resource Framework (WS-RF)
• Defines standard interfaces and behaviors for distributed system integration, especially (for us):
  ◦ Standard XML-based service information model
  ◦ Standard interfaces for push and pull mode access to service data
• Notification and subscription
37
MDS4 Uses Web Service Standards
• WS-ResourceProperties
  ◦ Defines a mechanism by which Web Services can describe and publish resource properties, or sets of information about a resource
  ◦ Resource property types defined in service’s WSDL
  ◦ Resource properties can be retrieved using WS-ResourceProperties query operations
• WS-BaseNotification
  ◦ Defines a subscription/notification interface for accessing resource property information
• WS-ServiceGroup
  ◦ Defines a mechanism for grouping related resources and/or services together as service groups
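To give a feel for what "retrieving a resource property" means, the sketch below queries a made-up resource property document with Python's standard XML library. It does not speak the WS-ResourceProperties protocol. Only the `ServiceMetaDataInfo` name comes from these slides; the other element names, the sample values, and the helper `get_property` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical resource property document. Only ServiceMetaDataInfo is named
# in the slides; the child elements and ActiveTransfers are invented here.
RP_DOCUMENT = """
<ResourceProperties>
  <ServiceMetaDataInfo>
    <startTime>2006-05-17T09:00:00Z</startTime>
    <version>4.0.1</version>
    <serviceTypeName>ReliableFileTransferService</serviceTypeName>
  </ServiceMetaDataInfo>
  <ActiveTransfers>3</ActiveTransfers>
</ResourceProperties>
"""


def get_property(document: str, path: str) -> list:
    """Return the text of every element matching a simple ElementTree path."""
    root = ET.fromstring(document)
    return [el.text for el in root.findall(path)]
```

For example, `get_property(RP_DOCUMENT, "ServiceMetaDataInfo/version")` selects the version element, in the same spirit as a query against a service's published properties.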
38
OUTLINE
• Grid Monitoring and Use Cases
• MDS4
  ◦ Index Service
  ◦ Trigger Service
  ◦ Information Providers
• Deployments
  ◦ Metascheduling Data for TeraGrid
  ◦ Service Failure Warning for ESG
• Performance Numbers
39
Scalability Experiments
• MDS index
  ◦ Dual 2.4 GHz Xeon processors, 3.5 GB RAM
  ◦ Index sizes: 1, 10, 25, 50, 100
• Clients
  ◦ 20 nodes, also dual 2.6 GHz Xeon, 3.5 GB RAM
  ◦ Number of clients: 1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256, 384, 512, 640, 768, 800
• Nodes connected via 1 Gb/s network
• Each data point is an average over 8 minutes
  ◦ Ran for 10 minutes, but the first 2 were spent getting clients up and running
  ◦ Error bars are the standard deviation over the 8 minutes
• Experiments by Ioan Raicu, U of Chicago, using DiPerf
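The way each data point is derived (drop the two warm-up minutes, then average the remaining eight and take their standard deviation) can be sketched as below. The function `summarize` is hypothetical, the per-minute granularity is an assumption, and the slides do not say whether the sample or population standard deviation was used; the sample form is assumed here.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize(per_minute: List[float], warmup_minutes: int = 2) -> Tuple[float, float]:
    """Drop warm-up minutes, then return (mean, sample standard deviation)."""
    steady = per_minute[warmup_minutes:]
    return mean(steady), stdev(steady)
```

Given ten per-minute throughput readings, the first two are ignored and the pair returned is what would be plotted as the point and its error bar.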
40
Size Comparison
• In our current TeraGrid demo
  ◦ 17 attributes from each of 10 queues at SDSC and NCSA
  ◦ Host data: 3 attributes for approximately 900 nodes
  ◦ 12 attributes of sub-cluster data for each of 7 subclusters
  ◦ ~3,000 attributes, ~1,900 XML elements, ~192 KB
• Tests here: 50 sample entries
  ◦ Element count of 1,113
  ◦ ~94 KB in size
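The "~3,000 attributes" figure follows from the three counts on the slide if each count is read as per queue, per node, and per subcluster. That reading is an assumption, checked by the short calculation below.

```python
queue_attrs = 17 * 10      # 17 attributes for each of 10 queues
host_attrs = 3 * 900       # 3 attributes for roughly 900 nodes
subcluster_attrs = 12 * 7  # 12 attributes for each of 7 subclusters

total_attrs = queue_attrs + host_attrs + subcluster_attrs
print(total_attrs)  # 2954, which rounds to the ~3,000 quoted on the slide
```

Host data dominates the total, so the index size in this deployment scales mainly with the number of nodes.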
41
MDS4 Throughput
42
MDS4 Response Time
43
MDS4 Stability
Version | Index Size | Time Up (Days) | Queries Processed | Queries per Sec. | Round-trip Time (ms)
4.0.1   | 25         | 66+            | 81,701,925        | 14               | 69
4.0.1   | 50         | 66+            | 49,306,104        | 8                | 115
4.0.1   | 100        | 33             | 14,686,638        | 5                | 194
4.0.0   | 1          | 14             | 93,890,248        | 76               | 13
4.0.0   | 1          | 96             | 623,395,877       | 74               | 13
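As a consistency check on the stability numbers, the queries-per-second column should be close to queries processed divided by uptime in seconds. The sketch below assumes each row reads as (version, index size, days up, queries processed, queries per second, round-trip ms) and treats "66+" as exactly 66 days, so the computed rates for those rows are upper bounds.

```python
SECONDS_PER_DAY = 86_400

# (version, index size, days up, queries processed, reported queries/sec)
ROWS = [
    ("4.0.1", 25, 66, 81_701_925, 14),
    ("4.0.1", 50, 66, 49_306_104, 8),
    ("4.0.1", 100, 33, 14_686_638, 5),
    ("4.0.0", 1, 14, 93_890_248, 76),
    ("4.0.0", 1, 96, 623_395_877, 74),
]


def average_qps(queries: int, days_up: float) -> float:
    """Average query rate over the whole uptime."""
    return queries / (days_up * SECONDS_PER_DAY)


for version, size, days, queries, reported in ROWS:
    print(version, size, round(average_qps(queries, days), 1), reported)
```

Under this reading every computed rate lands within two queries per second of the reported value.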
44
Index Maximum Size
Heap Size (MB) | Approx. Max. Index Entries | Index Size (MB)
64             | 600                        | 1.0
128            | 1275                       | 2.2
256            | 2650                       | 4.5
512            | 5400                       | 9.1
1024           | 10800                      | 17.7
1536           | 16200                      | 26.18
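The heap-size figures suggest that the maximum number of index entries grows roughly linearly with heap size. The sketch below computes entries per MB of heap, assuming the rows read as (heap MB, maximum entries, index MB); the helper name is hypothetical.

```python
from typing import List, Tuple

# (heap size in MB, approximate maximum index entries)
HEAP_TO_ENTRIES: List[Tuple[int, int]] = [
    (64, 600), (128, 1275), (256, 2650),
    (512, 5400), (1024, 10800), (1536, 16200),
]


def entries_per_heap_mb(rows: List[Tuple[int, int]]) -> List[float]:
    """Ratio of maximum index entries to heap size for each row."""
    return [entries / heap for heap, entries in rows]


RATIOS = entries_per_heap_mb(HEAP_TO_ENTRIES)
```

The ratio rises from about 9.4 at 64 MB to about 10.5 from 512 MB upward, so a rule of thumb of roughly 10 entries per MB of heap fits this data.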
45
Performance
• Is this enough?
  ◦ We don’t know!
  ◦ Currently gathering usage statistics to find out what people need
• Bottleneck examination
  ◦ In the process of doing an in-depth performance analysis of what happens during a query
  ◦ MDS code, implementation of WS-N, WS-RP, etc.
(These numbers are in an HPDC submission)