
CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Internet Services GS group meeting 07.03.08 Monitoring and Dashboards section Activity.


1 CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it, Internet Services
GS group meeting 07.03.08
Monitoring and Dashboards section: Activity Overview

2 Monitoring and Dashboards section: Members
Julia Andreeva, James Casey, Catalin Cirstoiu, Benjamin Gaidioz, Anastasia Ivanchenko, Gerhild Maier, Andrey Nechaevskiy, Daniel Rodrigues, Ricardo Rocha, Pablo Saiz, Irina Sidorova, Alexander Uzhinskiy, Sergey Belov
5 staff, 4 project associates, 1 Openlab fellow funded by EDS, 1 PhD student, 1 technical student, 1 visitor (collaboration with Dubna)

3 Main directions of work
Dashboard project
– Covering 4 LHC experiments, various areas of activities and monitoring aspects: job monitoring, data management monitoring, monitoring of sites and services
Architecture of monitoring solutions for WLCG
Coordination of collaboration with GridView and monitoring activity in OSG
CCRC08 monitoring

4 Julia (Section Leader)
Coordinating the Dashboard project
Management and support of the Dashboard for CMS
Contributing to the development of the CMS Dashboard applications
- Job monitoring
- Site availability based on the results of the SAM tests
- CMS MC production monitoring
Redesign of the Dashboard job monitoring application (schema, collectors, UI) to support pilot jobs
Chairing the System Analysis Working Group
Coordinating development of the common application for monitoring the LHC experiments' workflows for CCRC08 and beyond

5 James
Architecture of monitoring solutions for WLCG. Focus on:
– Site monitoring with Nagios
– ActiveMQ messaging system as transport layer
– APIs/protocols to present the information to other tools
Manage the Gridview collaboration for CERN
– With Rajesh Kalmady from BARC
Collaborate with OSG on interoperation of monitoring for WLCG
CCRC’08
– ServiceMap
– MoU reporting

6 Ricardo
Dashboard Framework
– Development of the common components
– Configuration and logging, database access, command line tools, messaging and RPC APIs, web application, agent startup and management, …
Dashboard Build
– Python oriented, based on distutils
– Enforces common procedures on developers: module structure, package naming and versioning; no need for direct CVS interaction for tagging or branching
– Gives back automatic generation of binary tools (like CLIs), documentation, deliverables
– Multiple release branches with RPMs and tarballs, APT and YUM repositories
Support of both
– Within the dashboard team
– Within the ATLAS DDM team, which uses the Dashboard Build and several framework components: configuration/logging, messaging, agent configurator

7 Ricardo
ATLAS DDM Dashboard
– Monitoring of the ATLAS Distributed Data Management system
– Single entry point to get an overview of dataset subscriptions, transfer throughput, transfer and registration errors, site services health
– But also detailed information regarding individual transfer attempts
Coordination of ATLAS Monitoring activities
– Tools like the Dashboards: DDM, Production system, Job and Task monitoring, Panda
– Integration with other components like SAM, software installations, file consistency checks, WLCG monitoring

8 Benjamin
Dashboard activities:
– Job monitoring: reimplementation two years ago; maintenance for ATLAS, LHCb and ALICE (with Pablo Saiz); installation guide (installed and maintained outside CERN by VleMED in NIKHEF).
– ATLAS dashboard: production monitoring: assistance to shifters, CLI, API, user's guide, tutorials, etc.
– Framework: level: guru; a couple of code contributions, also developer's guide and tutorials.

9 Pablo in the Dashboard
One of the dashboard developers:
– Grid reliability for the 4 LHC VOs. Thanks to Eamonn!!
– CMS Site status board
– CMS Input Collections
Monthly site efficiency report
- Taken over from Massimo

10 Pablo in ALICE
Main developer of several AliEn components:
– File & metadata catalogue
– TaskQueue and JobAgent system
– File Transfer Daemon
Support for the previous components
One of the ‘on-call grid experts’ (together with Patricia and Fabrizio)

11 Pablo’s current activities
Please, don’t hate me… (please excuse me for not presenting this myself…)

12 Daniel
Daniel Rodrigues :: Openlab fellow funded by EDS.
Objectives:
– Validation and testing of a new grid messaging system for deployment within WLCG
– Re-engineering of components within the WLCG Service to use the new messaging system
Tasks summary:
– Testing of Apache ActiveMQ performance and features for usage as broker: https://twiki.cern.ch/twiki/bin/view/LCG/GridPublisherDevelopment
– Integrating existing components to use the messaging system: Gridview gridftp logs + Dashboard. https://twiki.cern.ch/twiki/bin/view/LCG/GridPublisherSpecificationGridView
– Writing best practices for using a messaging system within the WLCG context for other developers.
Presentation: Openlab, January 08: http://dfrodrig.web.cern.ch/dfrodrig/AnOverviewOnAMessagingSystemForTheGrid_v1.2.pps
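To make the messaging idea concrete, here is a minimal sketch of how a monitoring record might be wrapped for a broker such as ActiveMQ. It assumes STOMP as the wire protocol (ActiveMQ supports it, but the slides do not say which protocol was used). The destination name and the record fields are hypothetical, and nothing is sent over the network.

```python
import json


def build_stomp_send_frame(destination: str, record: dict) -> bytes:
    """Encode a record as a STOMP SEND frame.

    Frame layout: command line, header lines, blank line, body, NUL byte.
    """
    body = json.dumps(record, sort_keys=True).encode("utf-8")
    headers = [
        f"destination:{destination}",
        "content-type:application/json",
        f"content-length:{len(body)}",
    ]
    head = "SEND\n" + "\n".join(headers) + "\n\n"
    return head.encode("utf-8") + body + b"\x00"


# Hypothetical destination and record, for illustration only.
frame = build_stomp_send_frame(
    "/topic/grid.monitoring.example",
    {"site": "EXAMPLE-SITE", "metric": "gridftp.transfer", "bytes": 1048576},
)
```

Keeping the frame builder separate from any socket code means producers such as a log parser can be tested without a running broker.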

13 Catalin
Development of the monitoring system for ALICE based on MonALISA
MonALISA-related support for Dashboards (ATLAS and CMS), including installation and support of the MonALISA servers and repositories
Work on PhD “Optimization Framework for Data Intensive Applications in Large Scale Distributed Systems”, a framework for optimization of data transfers based on MonALISA
Unfortunately, Catalin is leaving us soon. GOOD LUCK!

14 Irina: Activities 2007
Implemented a central repository for CMS MC production monitoring information
Set up procedures for aggregation of monitoring data in summary tables used by the UI
Developed the API for data retrieval in XML format
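As an illustration of the aggregation and XML retrieval steps listed on this slide, the following is a minimal sketch only. The record fields (`site`, `status`) and the XML element names are assumptions made for the example, not the actual Dashboard schema or API.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def summarize(jobs):
    """Count jobs per (site, status); a stand-in for a summary table."""
    return Counter((job["site"], job["status"]) for job in jobs)


def summary_to_xml(summary):
    """Serialize the summary as XML, one <row> element per (site, status)."""
    root = ET.Element("summary")
    for (site, status), count in sorted(summary.items()):
        ET.SubElement(root, "row", site=site, status=status, count=str(count))
    return ET.tostring(root, encoding="unicode")


# Hypothetical raw monitoring records.
jobs = [
    {"site": "SiteA", "status": "done"},
    {"site": "SiteA", "status": "failed"},
    {"site": "SiteA", "status": "done"},
    {"site": "SiteB", "status": "done"},
]
xml_text = summary_to_xml(summarize(jobs))
```

Pre-aggregating into per-site counts lets a UI read a handful of rows instead of scanning every job record on each request.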

15 Irina: Current activities and plans
Take part in the redesign of the Dashboard schema for job monitoring. The new schema should support pilot job submission, which is increasingly used by the LHC experiments
Take part in the development of data collection for the Dashboard job monitoring

16 Andrey and Alexander: Operations Procedures
Daily log – tracking of current problems and open issues
Weekly Report – summary report for the Joint Operations Meeting
Weekly Tier-0 – summary of issues noticed on the Castor Tier-0 service

17 Andrey & Alexander: Summary
– Operating activities
– Dashboard installation
– New FTS SLC4 pilot has been installed
– Currently working on the development of a new schema or schema patch (it will supposedly be separate from the FTS schema and installed as a module), with the DB part for the monitoring prototype
– Next plans: test the new pilot; start implementation of our monitoring tools in the Dashboard

18 Gerhild: PhD Work
– Automatic Detection of Error Sources of Failed Grid Jobs
– reported exit codes ≠ description of error source
– first: distinction between user’s fault and site’s fault
[Diagram: Dashboard Database → ? → Patterns → Rules → Report, Web Page, Alert System, …
 Stages: Data → Data Mining → Additional Knowledge → Representation
 Examples: User, Site, Exit Code, … → Looking at the data, … → All jobs of user X fail, … → List of observations]
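The first distinction named on this slide, user's fault versus site's fault, can be sketched with two simple rules. This is a toy illustration under assumptions, not the method of the PhD work: the thresholds, the field names (`id`, `user`, `site`, `ok`), and the rules themselves are hypothetical.

```python
from collections import defaultdict


def classify_failures(jobs, min_jobs=3):
    """Label each failed job 'user', 'site', or 'unknown' with two rules."""
    by_user = defaultdict(list)
    by_site = defaultdict(list)
    for job in jobs:
        by_user[job["user"]].append(job)
        by_site[job["site"]].append(job)

    def all_failed(group):
        # Require a minimum sample so one unlucky job proves nothing.
        return len(group) >= min_jobs and all(not j["ok"] for j in group)

    # Rule 1: all jobs of a user fail, on more than one site -> user's fault.
    bad_users = {u for u, g in by_user.items()
                 if all_failed(g) and len({j["site"] for j in g}) > 1}
    # Rule 2: all jobs at a site fail, for more than one user -> site's fault.
    bad_sites = {s for s, g in by_site.items()
                 if all_failed(g) and len({j["user"] for j in g}) > 1}

    labels = {}
    for job in jobs:
        if job["ok"]:
            continue
        if job["user"] in bad_users:
            labels[job["id"]] = "user"
        elif job["site"] in bad_sites:
            labels[job["id"]] = "site"
        else:
            labels[job["id"]] = "unknown"
    return labels


# Hypothetical job records.
jobs = [
    {"id": 1, "user": "X", "site": "A", "ok": False},
    {"id": 2, "user": "X", "site": "B", "ok": False},
    {"id": 3, "user": "X", "site": "C", "ok": False},
    {"id": 4, "user": "Y", "site": "A", "ok": True},
    {"id": 5, "user": "Y", "site": "B", "ok": True},
    {"id": 6, "user": "Y", "site": "D", "ok": False},
    {"id": 7, "user": "Y", "site": "D", "ok": False},
    {"id": 8, "user": "Z", "site": "D", "ok": False},
    {"id": 9, "user": "W", "site": "B", "ok": False},
    {"id": 10, "user": "W", "site": "A", "ok": True},
]
labels = classify_failures(jobs)
```

Here the jobs of user X are labelled as the user's fault, the failures at site D as the site's fault, and the isolated failure of user W stays unknown, which is the case where the reported exit code alone does not identify the error source.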

19 Gerhild: Dashboard Work
Web interfaces in production for CMS:
Daily Job Summary: information about jobs visualized with plots
− terminated, submitted, pending, running jobs
− status of terminated jobs
− failed jobs by reason (grid errors, application errors)
− status of site load
− parallel running jobs
Task Monitoring: detailed information about a user’s tasks (also in production for ATLAS)
SAM Test Result Visualization
− latest test results, historical test results
− site and service availability plots
− test history
Future work: maintenance, improvements

20 Anastasia
Current tasks:
● implement job summary plots using GraphTools library
● implement Dashboard home page
Future tasks:
● take part in the development of the common system for monitoring of the workflows for the LHC experiments

21 Sergey: Job monitoring for Condor
Goal is to obtain extra information about jobs submitted via Condor-G, even before the job starts
Extended job information from the Condor event log
– The event of interest is a job status change
– It’s possible to get user-specified attributes from the ClassAd
Tool runs as a job on the Condor submission host
Data is prepared to be consistent with other Dashboard job information
Job information is sent to the collection server (using the messaging system or MonALISA)
Current task is to finalize the developments and to provide the tool for real tests on production sites
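The status-change extraction described on this slide can be sketched as follows. This is a minimal illustration, not the actual tool: it assumes the classic text layout of a Condor user log (three-digit event code, job id in parentheses, date and time, message, with `...` separator lines), and the sample log and output record fields are made up for the example.

```python
import re

# Event codes treated as job status changes in this sketch.
EVENT_NAMES = {
    "000": "submitted",
    "001": "executing",
    "005": "terminated",
    "009": "aborted",
    "012": "held",
}

# Event header: code, (cluster.proc.subproc), date and time, then the message.
HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) (\S+ \S+) ")


def status_changes(log_text):
    """Return one record per status-change event found in the log text."""
    records = []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match and match.group(1) in EVENT_NAMES:
            code, cluster, proc, stamp = match.groups()
            records.append({
                "job": f"{int(cluster)}.{int(proc)}",
                "time": stamp,
                "status": EVENT_NAMES[code],
            })
    return records


# Made-up sample in the assumed layout; 006 (image size) is not a status change.
SAMPLE_LOG = """\
000 (123.000.000) 03/07 10:00:00 Job submitted from host: <192.0.2.1:9618>
...
001 (123.000.000) 03/07 10:05:12 Job executing on host: <192.0.2.2:9618>
...
006 (123.000.000) 03/07 10:10:00 Image size of job updated: 1024
...
005 (123.000.000) 03/07 10:45:30 Job terminated.
...
"""
events = status_changes(SAMPLE_LOG)
```

Each record could then be mapped to the fields used by the other Dashboard job information before being handed to the messaging system or MonALISA.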

