ARDA Experiment Dashboard
Ricardo Rocha (ARDA – CERN), on behalf of the Dashboard Team
Enabling Grids for E-sciencE (EGEE), INFSO-RI-508833, www.eu-egee.org

Outline
Background
Dashboard Framework
VO Monitoring Applications
  – Job Monitoring
  – Site Monitoring / Efficiency
  – Data Management Monitoring
  – Additional Applications
Conclusion and Future Work

Background
Started in 2005 inside the EGEE/ARDA group
First application: Grid Job / Application Monitoring for the CMS experiment
  – implementation in PHP / Python
Redesign early 2006
  – fully Python-based solution
  – more modular / structured approach
  – easily extensible
Additional application areas: data management (ATLAS DDM), site efficiency monitoring, …

Dashboard Framework
Dashboard Clients
  – Scripts: pycurl, …
  – Command line tools (optparse + pycurl)
  – Shell based: curl, …
Web Application
  – Apache + mod_python
  – Model View Controller (MVC) pattern
  – multiple output formats: plain text, CSV, XML, XHTML
  – GSI support using gridsite
Agents
  – collectors: RGMA, ICXML, BDII, …
  – stats generation, alert managers, …
  – Service Configurator pattern
  – common configuration (XML file) and management: stop, start, status, list
  – common monitoring mechanism
Data Access Layer (DAO)
  – interfaces available to different backends (mainly Oracle and PostgreSQL; additional ones are easy to add)
  – connection pooling
[Diagram: Web / HTTP Interface, Data Access Layer (DAO), Agents]
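The "multiple output formats" point of the web application could be sketched as follows. This is a minimal illustration, not the Dashboard's actual code: the renderer names, the content-type keys, and the row layout (a list of flat dictionaries) are all assumptions made for the example.

```python
import csv
import io
from xml.sax.saxutils import escape


def render_plain(rows):
    """Render each row as space-separated key=value pairs, one row per line."""
    return "".join(
        " ".join(f"{key}={value}" for key, value in row.items()) + "\n"
        for row in rows
    )


def render_csv(rows):
    """Render rows as CSV with a header taken from the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def render_xml(rows):
    """Render rows as a flat XML document (element names come from the keys)."""
    items = []
    for row in rows:
        fields = "".join(
            f"<{key}>{escape(str(value))}</{key}>" for key, value in row.items()
        )
        items.append(f"<row>{fields}</row>")
    return "<rows>" + "".join(items) + "</rows>"


# Hypothetical mapping from requested content type to view function.
RENDERERS = {
    "text/plain": render_plain,
    "text/csv": render_csv,
    "application/xml": render_xml,
}


def render(rows, content_type="text/plain"):
    """Pick the view for the requested format; the model (rows) is unchanged."""
    try:
        renderer = RENDERERS[content_type]
    except KeyError:
        raise ValueError(f"unsupported output format: {content_type}")
    return renderer(rows)
```

Keeping the data (model) independent of the renderers (views) is what lets the same query serve a browser, a script, or a command line tool.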

Dashboard Framework
Build and development environment
  – based on Python distutils (with several extensions)
  – covers code validation, binaries and documentation generation, unit testing and reports
  – automatic build for each of the release branches
  – packaging uses RPMs – APT repository available
Release procedure
  – three main branches: nightly, unstable, stable
  – releases per component
  – enforced versioning scheme (no manual versioning or tagging, all done via distutils command extensions)
Interesting links
  – Developers guide:
  – Savannah Project

Job Monitoring
Real time and summary views over the virtual organization (VO) grid jobs
Several instances in production serving different communities: CMS, ATLAS, LHCb, ALICE, VLMED
Various grid information sources used:
  – RGMA
  – GridPP XML files collection
  – LCG BDII
Value added: VO specific information
  – through job instrumentation, using MonALISA's ApMon (CMS), Panda and Ganga monitoring (ATLAS)
  – directly querying VO databases (e.g. the ATLAS production database)
Key Features:
  – sensible merging of information from different sources
  – advanced filtering for different usages (VO manager, site admin, community user)
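The merging and filtering features named above could look roughly like this. It is a sketch under stated assumptions: the source names, their precedence order, and the record fields (job_id, source, status, site, …) are hypothetical and do not reflect the Dashboard's real schema.

```python
# Hypothetical precedence: sources later in the list override earlier ones,
# so VO-specific instrumentation wins over generic grid information.
SOURCE_PRIORITY = ["bdii", "rgma", "vo_instrumentation"]


def merge_job_records(records):
    """Merge per-source job records into one dictionary per job_id.

    Each record is a dict with at least 'job_id' and 'source'. Fields that
    are None are ignored so that a source never erases known information.
    """
    rank = {name: index for index, name in enumerate(SOURCE_PRIORITY)}
    merged = {}
    for record in sorted(records, key=lambda r: rank[r["source"]]):
        job = merged.setdefault(record["job_id"], {})
        for key, value in record.items():
            if key != "source" and value is not None:
                job[key] = value
    return merged


def filter_jobs(merged, **criteria):
    """Keep only the jobs whose fields match every given criterion."""
    return {
        job_id: job
        for job_id, job in merged.items()
        if all(job.get(key) == value for key, value in criteria.items())
    }
```

A site admin view would then be something like filter_jobs(merged, site="CERN"), while a VO manager would filter on application or status instead.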

Job Monitoring
Task Monitoring
  – deployed and used in CMS
Integration with SAM tests
  – already using the new LCG standards
  – prototype in place
Alert mechanism
  – in development
HTTP API for publishing job information
  – very easy to integrate with existing tools
  – similar to the mechanism used for data management
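Publishing job information over HTTP might be done from a client along these lines. The slides do not describe the API itself, so the URL path, the parameter names, and the form encoding below are assumptions chosen only to illustrate the idea; the example uses the Python standard library and builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_request(base_url, job_id, status, site):
    """Build (but do not send) an HTTP POST carrying one job status update.

    The '/jobs' path and the field names are hypothetical placeholders.
    """
    body = urlencode({"job_id": job_id, "status": status, "site": site})
    return Request(
        base_url.rstrip("/") + "/jobs",
        data=body.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to urllib.request.urlopen would send it; an existing tool only needs an HTTP client to integrate this way, which is the ease-of-integration point the slide makes.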

Grid / Site Efficiency
Built on top of the job monitoring data
Main goal: identify reasons for job failures in sites
Uses the information coming from RGMA
Available today for the same set of communities: ATLAS, CMS, LHCb, ALICE, VLMED
Provides both summary and detailed information
Current ongoing work
  – provide a generic (non VO-specific) view over the data
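A summary view of this kind can be derived from job records with a simple aggregation. The sketch below assumes each job is a dictionary with hypothetical 'site', 'status', and optional 'reason' fields; the real data model is not given in the slides.

```python
from collections import Counter, defaultdict


def site_efficiency(jobs, top=3):
    """Return per-site success ratio and the most frequent failure reasons."""
    totals = Counter()
    successes = Counter()
    failures = defaultdict(Counter)
    for job in jobs:
        site = job["site"]
        totals[site] += 1
        if job["status"] == "success":
            successes[site] += 1
        else:
            # Jobs without an explicit reason are grouped under 'unknown'.
            failures[site][job.get("reason", "unknown")] += 1
    return {
        site: {
            "efficiency": successes[site] / totals[site],
            "top_failures": failures[site].most_common(top),
        }
        for site in totals
    }
```

The "efficiency" value gives the summary view, and "top_failures" is the starting point for the detailed view of why jobs fail at a site.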

Data Management
Tied to the ATLAS Distributed Data Management (DDM) system
Used successfully both in the production and Tier0 test environments
Data sources:
  – DDM site services: the main source, providing all the transfer and placement information
  – SAM tests: for correlation of DDM results with the state of the grid fabric services
  – Storage space availability: from BDII but soon including other available tools
Views over the data:
  – Global: site overview covering different metrics (throughput, files / datasets completed, ...); summary of the most common errors (transfer and placement)
  – Detailed: starting from the dataset state, to the state of each of its files, to the history of each single file placement (all state changes)
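The detailed view drills down from dataset to file to placement history. One way to model that hierarchy is sketched below; the class names and the state labels ("DONE", "COMPLETE", …) are hypothetical and are not taken from the DDM system.

```python
from dataclasses import dataclass, field


@dataclass
class FilePlacement:
    """One file of a dataset with its full list of state changes."""
    name: str
    history: list = field(default_factory=list)  # (timestamp, state) tuples

    def record(self, timestamp, state):
        """Append a state change; nothing is ever overwritten."""
        self.history.append((timestamp, state))

    @property
    def state(self):
        """Current state is the most recent entry in the history."""
        return self.history[-1][1] if self.history else "UNKNOWN"


@dataclass
class Dataset:
    """A dataset whose state is derived from the states of its files."""
    name: str
    files: dict = field(default_factory=dict)

    def file(self, name):
        """Return the named file, creating it on first use."""
        return self.files.setdefault(name, FilePlacement(name))

    @property
    def state(self):
        states = {f.state for f in self.files.values()}
        return "COMPLETE" if states == {"DONE"} else "INCOMPLETE"
```

Because the dataset state is computed from the files and each file keeps every state change, the three levels of the detailed view always agree with each other.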

Data Management
Other features
  – periodic site behavior reports (sent by e-mail)
  – alerts (on specific errors, when a site goes below a certain threshold, ...)
Coming soon
  – user specific views (authentication via X509 certificates) – “my datasets”
  – better site summary data: overview of dataset / file states in the site (radar plots), average time in each placement step, additional error summaries
  – Python query API module
  – Python publish API module (open the tool to other applications / communities)
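The threshold-based alert mentioned above reduces to a comparison per site. In this sketch the metric (an efficiency between 0 and 1), the default threshold, and the message wording are all assumptions for illustration.

```python
def check_thresholds(site_metrics, threshold=0.8):
    """Return one alert message for every site whose metric is below threshold.

    site_metrics maps a site name to a value between 0 and 1.
    """
    return [
        f"ALERT: {site} efficiency {value:.0%} below {threshold:.0%}"
        for site, value in sorted(site_metrics.items())
        if value < threshold
    ]
```

The returned messages could feed the same channel as the periodic site reports; how alerts are actually delivered is not specified in the slides.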

Conclusion
The Dashboard monitors the grid from the point of view of its communities
  – and focuses on the interests of the different users (managers, admins, end users)
Grid information is not enough (additional VO information is invaluable)
Framework
  – Flexible and stable: proven by the variety of applications available in production
  – Effort put into installation / packaging paid off: the first external installation has already been done (VLMED)
Future work
  – integration with local monitoring systems (feed summaries back to the site admins)
  – improved alert system
  – adapt to recently defined data exchange / query standards