GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting.

1 GridView - A Monitoring & Visualization tool for LCG Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav B.A.R.C. BARC-CERN/LCG Meeting 02.03.2007

2 GridView: New Developments (16th September to 1st March) New developments have been made in the following areas: –Transport Mechanism for Gridftp Data –File Transfer Monitoring –Service Availability Monitoring –Job Monitoring –Version Management

3 Transport Mechanism for Gridftp Data Loss of tuples and instabilities in R-GMA severely affected the data transfer rates displayed by GridView. As a quick solution, we developed a new Archiver Module to –periodically copy Gridftp logs from CERN hosts –insert the data directly into the GridView database. This new module has been in production for the last 3 months, with no data loss.
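
The copy-and-insert cycle described on this slide could be sketched roughly as follows. This is only an illustration, not the Archiver Module itself: the three-field log line (`host finished_at nbytes`), the `transfers` table and the use of SQLite are all assumptions made for the example. The point it shows is that a uniqueness constraint lets the same log be re-copied periodically without inserting duplicates.

```python
import sqlite3


def archive_lines(conn, lines):
    """Insert log lines into the database, skipping rows already archived.

    Each line is assumed (hypothetically) to look like:
        <host> <finished_at> <nbytes>
    Returns the number of rows newly inserted.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transfers (
               host TEXT,
               finished_at TEXT,
               nbytes INTEGER,
               UNIQUE (host, finished_at, nbytes)
           )"""
    )
    before = conn.total_changes
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed format
        host, finished_at, nbytes = parts
        # OR IGNORE makes re-processing an already copied log harmless.
        conn.execute(
            "INSERT OR IGNORE INTO transfers VALUES (?, ?, ?)",
            (host, finished_at, int(nbytes)),
        )
    conn.commit()
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    first_copy = ["host1 2007-02-01T10:00:00 1048576"]
    second_copy = first_copy + ["host2 2007-02-01T10:05:00 2048"]
    print(archive_lines(conn, first_copy))
    print(archive_lines(conn, second_copy))
```

Because each periodic copy may contain lines that were already seen, the second call only adds the line that is new.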

4 WS based Transport Mechanism We developed a Web Services based transport mechanism as an alternative to R-GMA for collecting data in GridView. We deployed it earlier for collecting SAM data, and it is working reliably. We have now developed a Web Services based solution for collecting Gridftp data as well.

5 Development of WS based Transport Development of the WS based transport for Gridftp data involved –Development of a Server Module to archive the data –Development of a WS Client Module to publish the data –Packaging of the client module as a full-fledged RPM that handles upgrade, erase, static configuration, and deployment on both i386 and x86_64 systems

6 Deployment of WS based Transport Deployment of the WS based transport for Gridftp data involved –Deployment of the WS Server Module to archive data on the Validation DB –Testing of the WS Client Module with dummy data –Deployment of the WS Client, initially on 4 Gridftp servers at CERN, to publish live data –Setup of comparison scripts to validate the data received via the WS transport against the original source –Large scale deployment of the WS Client on all Gridftp servers at CERN (200+ servers) –Validation of the data, followed by a series of bug fixes and enhancements –Finally, deployment of the WS Server Module to archive data to the production DB
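
A comparison step of the kind mentioned on this slide might look like the sketch below. The slides do not describe the actual scripts, so the record layout (a tuple of host, timestamp and byte count) and the function name are hypothetical. The sketch treats both sides as multisets, so it reports records that were lost in transport as well as records that arrived more often than they were sent.

```python
from collections import Counter


def compare_records(source_records, received_records):
    """Compare records from the original source with those received.

    Records are hashable tuples (hypothetical layout: host, timestamp, bytes).
    Returns two Counters: records missing on the receiving side, and records
    received that have no counterpart at the source (for example duplicates).
    """
    source = Counter(source_records)
    received = Counter(received_records)
    missing = source - received
    unexpected = received - source
    return missing, unexpected


if __name__ == "__main__":
    a = ("host1", "2007-02-01T10:00:00", 1048576)
    b = ("host1", "2007-02-01T10:05:00", 2048)
    c = ("host2", "2007-02-01T10:00:00", 512)

    # b was lost in transport and c was delivered twice.
    missing, unexpected = compare_records([a, b, c], [a, c, c])
    print("missing:", dict(missing))
    print("unexpected:", dict(unexpected))
```

Empty results on both sides for a given time window would indicate that the two data paths agree for that window.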

7 WS based Transport: Current Status The WS based transport for Gridftp is fully in production for CERN servers. The direct-copy based Archiver Module has now been stopped. GridView no longer relies on R-GMA for Gridftp data from CERN servers. Data from sites outside CERN is still received via R-GMA. The WS based solution should be integrated with the gLite distribution for deployment at other sites. The WS based transport will also be deployed for collection of Job Status data (we have to go through similar development and deployment cycles).

8 File Transfer Monitoring Implemented Weekly and Monthly Reports for VO-wise data transfers (Hourly and Daily reports were implemented earlier). Implemented Weekly and Monthly Reports for Site-wise data transfers for the following cases (missing earlier): –Transfers from All Sites to All Sites –Transfers from All Sites to a particular site

9 File Transfer Monitoring : VO-wise Weekly Report

10 Service Availability Monitoring Developed Graphs and Reports presenting SAM Test Results at various levels of detail: –Bar Graphs indicating status –Summary tables displaying result summaries –Detailed results displaying the output of the tests (useful for troubleshooting). Implemented Traceability from the Service Availability Graphs to the corresponding test results, providing full transparency in the Availability numbers generated.

11 Service Availability Monitoring : Site Detail Availability

12 Service Availability Monitoring : Bar Graphs for Test Results

13 Service Availability Monitoring : Test Summary Table

14 Job Monitoring Fixed a few bugs and made some changes based on user feedback. Added tool-tips for Job Status Graphs. Developed a report for RB-wise classification of jobs lost from monitoring (due to records missing from R-GMA or other problems).

15 Version Management Implemented Version Management and Display in GridView: –Individual version numbers for the modules –Overall version number for the project –Modules are tagged in CVS –Stable versions are deployed to the production instance

16 Participation in WLCG Monitoring Workgroup Participating in the WLCG Monitoring Workgroup as a member of the core working group. The Monitoring Workgroup is working on standardization of Grid Sensors, Transport, Repository/Schema, Visualization, Interfaces between monitoring tools/components etc. Implementing the recommendations from the workgroup in GridView.

17 Ongoing Work: SAM/GridView Integration A few similar and some complementary components/features were present in SAM and GridView. In order to maintain integrity of data and avoid duplication of work, it was decided to integrate SAM and GridView. We had a series of meetings and discussions and agreed upon an integration strategy with clearly defined roles for each tool.

18 SAM/GridView Integration Strategy SAM and GridView will be complementary tools constituting an integral whole. The SAM and GridView databases are tightly coupled, sharing tables with each other. SAM is using GOCDB and other related tables from GridView. SAM will basically act as a test framework, and its database will host test related tables and test results. All derived metrics like Service/Site Availability, Downtimes and Reliability will be computed/maintained only by GridView but will be accessed by SAM. GridView will be the primary interface and the entry-point for Service Availability Visualization. GridView will develop a common controller interface to integrate the SAM portal with GridView.

19 Planned Future Work Deployment of the Web Services based transport for collection of Job Status data in GridView. Design and implementation of a common controller interface in order to integrate the SAM portal with GridView. Improved navigation within the GridView Service Availability pages and across GridView and SAM portal components. Wiki pages for GridView Documentation/FAQs. Exploration of ways to collect data for jobs submitted directly to a CE (possibly from CE logs).

20 New Requirements GridView is now widely used in WLCG/EGEE. Many requests for new features have come from different user groups like –Site Admins –VO Admins –WLCG Management –Service Challenges –Monitoring Working Group –EGEE. We are currently interacting with users and understanding, analysing and prioritizing the new requirements.

21 Requirements from WLCG Management To compute and visualize a few new metrics like Site Reliability, Scheduled Downtime etc. To improve the Service Availability computation by taking into consideration some additional factors like –Scheduled Downtimes for sites –Occasional unavailability of SAM test results. To provide a PDF generation option in GridView pages. To automate the generation of the “Site Reliability/Availability Report” circulated to the LCG Management Board.
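
The slides do not give the formulas behind these metrics, so the sketch below is only one plausible reading of the two factors listed: periods with no SAM test results are left out of the denominator, and scheduled downtime is additionally excluded when computing reliability. The function name and the hour-based inputs are assumptions made for illustration.

```python
def availability_metrics(hours_up, hours_scheduled_down, hours_unknown, hours_total):
    """Return (availability, reliability) as fractions, or None where undefined.

    Assumed definitions (not taken from the slides):
      availability = up time / (total time - time with no test results)
      reliability  = up time / (total time - unknown time - scheduled downtime)
    """
    known = hours_total - hours_unknown
    if known <= 0:
        return None, None  # no test results at all for the period
    availability = hours_up / known
    schedulable = known - hours_scheduled_down
    reliability = hours_up / schedulable if schedulable > 0 else None
    return availability, reliability


if __name__ == "__main__":
    # One week: 140 h up, 14 h scheduled downtime, 8 h without test results.
    availability, reliability = availability_metrics(140, 14, 8, 168)
    print(f"availability = {availability:.3f}")
    print(f"reliability  = {reliability:.3f}")
```

Under these assumed definitions a site is not penalised in its reliability figure for downtime it announced in advance, while its availability figure still reflects it.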

22 Requirements from WLCG Management To automate the generation of the LCG MB report on “Data Transfer Performance Targets achieved by Tier 1 sites” in order to verify the 2007 targets. To generate reports on data transfers from Tier1s to their associated Tier2s. To export data in CSV (Comma Separated Values) or Excel format for all data transfer graphs in GridView. To modify the Data Transfer GUI to enable selection of T2 sites by their associated T1s and VOs.
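
As a rough idea of what the CSV export requirement involves, the sketch below serialises the rows behind a graph with Python's standard `csv` module. The column names, site names and values are hypothetical placeholders, not GridView's actual schema or data.

```python
import csv
import io


def transfers_to_csv(rows, fieldnames):
    """Serialise a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical columns and illustrative values only.
    fields = ["source_site", "destination_site", "vo", "throughput_mb_s"]
    rows = [
        {"source_site": "CERN", "destination_site": "SITE-A",
         "vo": "vo1", "throughput_mb_s": 85.2},
        {"source_site": "CERN", "destination_site": "SITE-B",
         "vo": "vo2", "throughput_mb_s": 40.0},
    ]
    print(transfers_to_csv(rows, fields), end="")
```

Using a writer from the standard library rather than joining strings by hand means that values containing commas or quotes are escaped correctly.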

23 Requirements from Monitoring Working Group To explore a transport mechanism that would be suitable for use by multiple tools, scalable, reliable and deployable grid-wide –We have been asked to evaluate ActiveMQ, a Java Message Service (JMS) based product from Apache, as a transport mechanism. To provide standardized URL based access to GridView as decided by the Monitoring Working Group. To provide a programmatic interface to GridView so that sites can access their relevant metrics.

24 Requirements from VO Admins To compute and plot success rates for every individual SAM test (whether critical or not), aggregated by site and by period (hourly, daily, weekly, monthly). To display test results for VO specific tests.
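
The per-test success rates requested here amount to grouping results by site, test and time bucket. A minimal sketch is shown below; the input tuple layout, the site and test names, and the bucket labels (ISO week numbers for the weekly case) are choices made for this example only.

```python
from collections import defaultdict
from datetime import datetime


def bucket(timestamp, period):
    """Map a timestamp to the label of the hour, day, ISO week or month it falls in."""
    if period == "hourly":
        return timestamp.strftime("%Y-%m-%d %H:00")
    if period == "daily":
        return timestamp.strftime("%Y-%m-%d")
    if period == "weekly":
        year, week, _ = timestamp.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "monthly":
        return timestamp.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")


def success_rates(results, period):
    """Aggregate (site, test, timestamp, passed) tuples into success rates.

    Returns a dict mapping (site, test, bucket label) to passed / total.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for site, test, timestamp, ok in results:
        key = (site, test, bucket(timestamp, period))
        total[key] += 1
        if ok:
            passed[key] += 1
    return {key: passed[key] / total[key] for key in total}


if __name__ == "__main__":
    # Hypothetical site and test names.
    results = [
        ("SITE-A", "test-x", datetime(2007, 2, 26, 10), True),
        ("SITE-A", "test-x", datetime(2007, 2, 26, 11), False),
        ("SITE-A", "test-x", datetime(2007, 2, 27, 9), True),
        ("SITE-A", "test-x", datetime(2007, 3, 1, 9), True),
    ]
    for period in ("daily", "weekly", "monthly"):
        print(period, success_rates(results, period))
```

The same function serves all four periods because only the bucketing step differs between them.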

25 Other New Requirements To make all GridView pages bookmarkable so that sites can directly view their relevant pages (Site Admins). To develop a GridView sensor for the dCache SE (Service Challenge). To visualize SE statistics like Used Space/Free Space etc. To visualize some EGEE metrics for Service Availability. To visualize FTS statistics.

26 Thank You Your comments and suggestions please
