GridView - A Monitoring & Visualization Tool for LCG
Rajesh Kalmady, Phool Chand, Kislay Bhatt, D. D. Sonvane, Kumar Vaibhav
B.A.R.C.
BARC-CERN/LCG Meeting

GridView: New Developments (16th September to 1st March)
New developments have been made in the following areas:
– Transport mechanism for GridFTP data
– File Transfer Monitoring
– Service Availability Monitoring
– Job Monitoring
– Version Management

Transport Mechanism for GridFTP Data
Loss of tuples and instabilities in R-GMA severely affected the data transfer rates displayed by GridView.
As a quick solution, we developed a new Archiver Module to:
– periodically copy GridFTP logs from CERN hosts
– insert the data directly into the GridView database
This new module has been in production for the last 3 months, and there has been no data loss since.
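
A minimal sketch of what such an archiver does, assuming each copied log line is a whitespace-separated list of KEY=VALUE fields containing START, HOST, DEST and NBYTES (a simplified, assumed format), and using an in-memory SQLite table as a stand-in for the GridView database. The table and column names are hypothetical. The UNIQUE constraint combined with INSERT OR IGNORE makes re-copying an already archived log harmless:

```python
import sqlite3

def parse_line(line):
    """Parse one log line of KEY=VALUE fields (assumed, simplified format)."""
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    return (fields["START"], fields["HOST"], fields["DEST"], int(fields["NBYTES"]))

def archive(lines, conn):
    """Insert parsed transfers into the database; return the number of new rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transfers ("
        " start_time TEXT, src TEXT, dest TEXT, nbytes INTEGER,"
        " UNIQUE(start_time, src, dest, nbytes))"
    )
    rows = [parse_line(line) for line in lines if line.strip()]
    before = conn.total_changes
    # Rows already archived by a previous copy are silently skipped.
    conn.executemany("INSERT OR IGNORE INTO transfers VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn.total_changes - before
```

Running the archiver twice over the same copied log inserts the rows only once.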

WS-based Transport Mechanism
We developed a Web Services (WS) based transport mechanism as an alternative to R-GMA for collecting data into GridView.
It was deployed earlier for collecting SAM data and is working reliably.
We have now developed a WS-based solution for collecting GridFTP data as well.

Development of WS-based Transport
Development of the WS-based transport for GridFTP data involved:
– developing a server module to archive the data
– developing a WS client module to publish the data
– packaging the client module as a full-fledged RPM that handles upgrade, erase and static configuration, for deployment on both i386 and x86_64 systems
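
The publishing side of such a client can be sketched as a buffer that discards records only after a batch has been accepted, so a failed Web Service call does not lose data. This is an assumed design, not the actual GridView client: `send` is a hypothetical callable standing in for the WS request, and JSON is assumed as the payload encoding.

```python
import json

class Publisher:
    """Buffers records and publishes them in batches via a caller-supplied send().

    `send` is a hypothetical stand-in for the Web Service call: it receives a
    JSON string and returns True if the server accepted the batch.
    """

    def __init__(self, send, batch_size=100):
        self.send = send
        self.batch_size = batch_size
        self.pending = []

    def add(self, record):
        self.pending.append(record)

    def flush(self):
        """Try to publish all pending records; return how many were accepted."""
        sent = 0
        while self.pending:
            batch = self.pending[: self.batch_size]
            if not self.send(json.dumps(batch)):
                break  # keep the batch and retry on the next flush
            del self.pending[: len(batch)]
            sent += len(batch)
        return sent
```

Calling `flush()` periodically (for example from cron) retries anything left over from an earlier failure.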

Deployment of WS-based Transport
Deployment of the WS-based transport for GridFTP data involved:
– deploying the WS server module to archive data to the validation DB
– testing the WS client module with dummy data
– deploying the WS client initially on 4 GridFTP servers at CERN to publish live data
– setting up comparison scripts to validate the data received via the WS transport against the original source
– large-scale deployment of the WS client on all GridFTP servers at CERN (more than 200 servers)
– validating the data, followed by a series of bug fixes and enhancements
– finally, deploying the WS server module to archive data to the production DB
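
A comparison script of this kind can be sketched as a multiset difference between the records read from the original source and those archived via the WS transport, which catches duplicated records as well as missing ones. The record layout (hashable tuples) and the function name are assumptions for illustration:

```python
from collections import Counter

def compare(source_records, received_records):
    """Return (missing, extra) as {record: count} dicts.

    missing: records present in the source but not received via WS.
    extra:   records received more often than they occur in the source.
    """
    src = Counter(source_records)
    rcv = Counter(received_records)
    # Counter subtraction keeps only positive counts.
    return dict(src - rcv), dict(rcv - src)
```

An empty result on both sides over a validation window indicates that the WS transport delivered exactly the source data.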

WS-based Transport: Current Status
The WS-based transport for GridFTP is fully in production for CERN servers.
The direct-copy Archiver Module has now been stopped.
GridView no longer relies on R-GMA for GridFTP data from CERN servers.
Data from sites outside CERN is still received via R-GMA.
The WS-based solution should be integrated into the gLite distribution for deployment at other sites.
The WS-based transport will also be deployed for collecting Job Status data (this will require similar development and deployment cycles).

File Transfer Monitoring
Implemented weekly and monthly reports for VO-wise data transfers (hourly and daily reports were implemented earlier).
Implemented weekly and monthly reports for site-wise data transfers for the following cases (missing earlier):
– transfers from all sites to all sites
– transfers from all sites to a particular site
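
Weekly and monthly VO-wise reports amount to re-bucketing the transfer records by period. A minimal sketch, assuming each record is a (timestamp, VO, bytes) tuple and that weeks follow ISO-8601 numbering (both assumptions, not necessarily what GridView uses):

```python
from collections import defaultdict
from datetime import datetime

def period_key(ts: datetime, period: str) -> str:
    """Map a timestamp to its weekly (ISO week) or monthly reporting bucket."""
    if period == "weekly":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "monthly":
        return ts.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")

def vo_report(transfers, period: str) -> dict:
    """Sum transferred bytes per (period bucket, VO)."""
    totals = defaultdict(int)
    for ts, vo, nbytes in transfers:
        totals[(period_key(ts, period), vo)] += nbytes
    return dict(totals)
```

The same grouping with (source site, destination site) in place of the VO gives the site-wise variants.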

File Transfer Monitoring: VO-wise Weekly Report

Service Availability Monitoring
Developed graphs and reports presenting SAM test results at various levels of detail, ranging from:
– bar graphs indicating status
– summary tables displaying result summaries
– detailed results displaying the output of the tests (useful for troubleshooting)
Implemented traceability from the Service Availability graphs to the corresponding test results, making the generated availability numbers fully transparent.
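
Traceability can be sketched by carrying the identifiers of the underlying test results along with each derived status, so every graph cell can link to its detailed results. This assumes, for illustration only, that a site's status is the worst result among its critical tests; the class, field names and severity order are hypothetical:

```python
from dataclasses import dataclass

# Severity ordering assumed for this sketch (higher value = worse).
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

@dataclass(frozen=True)
class ProbeResult:
    result_id: int  # key of the detailed result (target of a drill-down link)
    site: str
    test: str
    status: str

def site_status(results, critical_tests):
    """Return {site: (status, [result_ids])}.

    The status is the worst result among the site's critical tests; the
    result_ids list keeps every critical result that was considered.
    """
    out = {}
    for r in results:
        if r.test not in critical_tests:
            continue  # non-critical tests do not affect the status
        status, ids = out.get(r.site, ("OK", []))
        if SEVERITY[r.status] > SEVERITY[status]:
            status = r.status
        out[r.site] = (status, ids + [r.result_id])
    return out
```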

Service Availability Monitoring: Site Detail Availability

Service Availability Monitoring: Bar Graphs for Test Results

Service Availability Monitoring: Test Summary Table

Job Monitoring
Fixed a few bugs and made some changes based on users' feedback.
Added tool-tips to the Job Status graphs.
Developed a report classifying, per Resource Broker (RB), the jobs lost from monitoring (due to records missing from R-GMA or other problems).
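
One way to express such a report, assuming a job counts as lost when none of its recorded states is terminal (the terminal-state set below is an assumption based on the gLite job states) and that each record is a (job_id, RB host, state) tuple:

```python
from collections import Counter

# Terminal job states assumed for this sketch.
TERMINAL = {"Done", "Aborted", "Cancelled", "Cleared"}

def lost_jobs_by_rb(status_records):
    """Count, per RB, the jobs that never reached a terminal state.

    status_records: iterable of (job_id, rb_host, state) tuples.
    """
    rb_of = {}
    finished = set()
    for job_id, rb, state in status_records:
        rb_of[job_id] = rb
        if state in TERMINAL:
            finished.add(job_id)
    return Counter(rb for job, rb in rb_of.items() if job not in finished)
```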

Version Management
Implemented version management and version display in GridView:
– individual version numbers for the modules
– an overall version number for the project
– modules are tagged in CVS
– stable versions are deployed to the production instance

Participation in WLCG Monitoring Workgroup
We are participating in the WLCG Monitoring Workgroup as a member of its core working group.
The Monitoring Workgroup is working on the standardization of grid sensors, transport, repository/schema, visualization, interfaces between monitoring tools/components, etc.
We are implementing the workgroup's recommendations in GridView.

Ongoing Work: SAM/GridView Integration
SAM and GridView had some similar and some complementary components/features.
In order to maintain data integrity and avoid duplication of work, it was decided to integrate SAM and GridView.
After a series of meetings and discussions, we agreed upon an integration strategy with clearly defined roles for each tool.

SAM/GridView Integration Strategy
SAM and GridView will be complementary tools constituting an integral whole.
The SAM and GridView databases are tightly coupled, sharing tables with each other.
SAM uses the GOCDB and other related tables from GridView.
SAM will act primarily as a test framework; its database will host the test-related tables and test results.
All derived metrics, such as service/site availability, downtimes and reliability, will be computed and maintained only by GridView, but will be accessed by SAM.
GridView will be the primary interface and entry point for Service Availability visualization.
GridView will develop a common controller interface to integrate the SAM portal with GridView.

Planned Future Work
Deploy the Web Services based transport for collecting Job Status data in GridView.
Design and implement a common controller interface to integrate the SAM portal with GridView.
Improve navigation within the GridView Service Availability pages and across GridView and SAM portal components.
Provide Wiki pages for GridView documentation/FAQs.
Explore ways of collecting data for jobs submitted directly to the CE (possibly from CE logs).

New Requirements
GridView is now widely used in WLCG/EGEE.
Many requests for new features have come from different user groups, such as:
– Site Admins
– VO Admins
– WLCG Management
– Service Challenges
– Monitoring Working Group
– EGEE
We are currently interacting with users and understanding, analysing and prioritizing the new requirements.

Requirements from WLCG Management
Compute and visualize a few new metrics, such as Site Reliability and Scheduled Downtime.
Improve the Service Availability computation by taking into account some additional factors:
– scheduled downtimes of sites
– occasional unavailability of SAM test results
Provide a PDF generation option in GridView pages.
Automate the generation of the “Site Reliability/Availability Report” circulated to the LCG Management Board.
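
The effect of the two additional factors can be illustrated with a commonly used WLCG-style formulation, assumed here rather than taken from GridView: intervals with no SAM results (unknown) are excluded from both denominators, and scheduled downtime is additionally excluded when computing reliability. The status labels are hypothetical:

```python
from collections import Counter

def availability_and_reliability(bins):
    """Compute (availability, reliability) from per-interval site statuses.

    Each element is "UP", "DOWN", "SCHED_DOWN" (scheduled downtime) or
    "UNKNOWN" (no SAM results). Assumed formulas:
      availability = up / (total - unknown)
      reliability  = up / (total - scheduled_down - unknown)
    A metric is None when its denominator is zero.
    """
    c = Counter(bins)
    total = len(bins)
    up, sched, unknown = c["UP"], c["SCHED_DOWN"], c["UNKNOWN"]
    avail_den = total - unknown
    rel_den = total - sched - unknown
    availability = up / avail_den if avail_den else None
    reliability = up / rel_den if rel_den else None
    return availability, reliability
```

For a day of 24 hourly bins with 18 up, 2 down, 2 in scheduled downtime and 2 unknown, this gives an availability of 18/22 and a reliability of 18/20 = 0.9, so planned maintenance does not count against reliability.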

Requirements from WLCG Management (continued)
Automate the generation of the LCG MB report on “Data Transfer Performance Targets achieved by Tier-1 sites” in order to verify the 2007 targets.
Generate reports on data transfers from Tier-1s to their associated Tier-2s.
Export data in CSV (Comma Separated Values) or Excel format for all data transfer graphs in GridView.
Modify the data transfer GUI to enable selection of T2 sites according to their associated T1s and VOs.
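
CSV export of the data behind a graph can be sketched with Python's standard `csv` module, assuming the graph data is available as a list of dicts; the column names in the usage test are hypothetical:

```python
import csv
import io

def to_csv(rows, columns):
    """Serialize graph data (a list of dicts) to CSV text with a header row.

    Missing values are written as empty fields; extra keys are ignored.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({c: row.get(c, "") for c in columns})
    return buf.getvalue()
```

Plain CSV also opens directly in Excel, which may cover both requested formats with a single export.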

Requirements from Monitoring Working Group
Explore a transport mechanism that is suitable for use by multiple tools, scalable and reliable, and that could be deployed grid-wide.
– We have been asked to evaluate Apache ActiveMQ, a Java message broker supporting the Java Message Service (JMS), as a transport mechanism.
Provide standardized URL-based access to GridView, as decided by the Monitoring Working Group.
Provide a programmatic interface to GridView so that sites can access their relevant metrics.

Requirements from VO Admins
Compute and plot success rates for every individual SAM test (critical or not), aggregated by site and by period (hourly, daily, weekly, monthly).
Display test results for VO-specific tests.
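
Per-test success rates can be sketched as pass/total counters keyed by (site, test, period bucket). This assumes each result is a (timestamp, site, test name, passed) tuple; the bucket labels and ISO week numbering are illustrative choices:

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns for the calendar-based periods (assumed bucket labels).
PERIOD_FORMATS = {"hourly": "%Y-%m-%d %H:00", "daily": "%Y-%m-%d", "monthly": "%Y-%m"}

def bucket(ts: datetime, period: str) -> str:
    """Label the hourly, daily, weekly (ISO week) or monthly bucket of ts."""
    if period == "weekly":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    return ts.strftime(PERIOD_FORMATS[period])

def success_rates(results, period):
    """Return {(site, test, bucket): fraction of passed runs} for every test.

    No critical/non-critical filtering is applied.
    """
    counts = defaultdict(lambda: [0, 0])  # [passed, total] per key
    for ts, site, test, passed in results:
        c = counts[(site, test, bucket(ts, period))]
        c[0] += int(passed)
        c[1] += 1
    return {key: p / n for key, (p, n) in counts.items()}
```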

Other New Requirements
Make all GridView pages bookmarkable so that sites can directly view their relevant pages (Site Admins).
Develop a GridView sensor for dCache SEs (Service Challenge).
Visualize SE statistics such as used space/free space.
Visualize some EGEE metrics for service availability.
Visualize FTS statistics.
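
Bookmarkable pages require the full page state to be carried in the URL rather than in a session. A minimal sketch using the standard `urllib.parse` functions; the base URL and parameter names are hypothetical, chosen only for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical base URL and parameter names, for illustration only.
BASE = "https://gridview.example.org/availability"

def page_url(site, vo, period, start):
    """Encode the complete page state in the query string so it can be bookmarked."""
    params = {"site": site, "vo": vo, "period": period, "start": start}
    return BASE + "?" + urlencode(params)

def page_state(url):
    """Recover the page state from a bookmarked URL."""
    query = parse_qs(urlparse(url).query)
    return {key: values[0] for key, values in query.items()}
```

The same scheme would also serve the standardized URL-based access requested by the Monitoring Working Group.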

Thank You
Your comments and suggestions, please.