EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE www.eu-egee.org Ricardo Rocha CERN (IT/GS) EGEE’08, 22-26 September 2008, Istanbul, TURKEY Experiment.

Presentation transcript:

Experiment Dashboard for monitoring of the ATLAS computing activities
Ricardo Rocha, CERN (IT/GS)
EGEE’08, 22-26 September 2008, Istanbul, Turkey
EGEE-III INFSO-RI-222667, Enabling Grids for E-sciencE

Dashboards in ATLAS
All applications built on top of the dashboard framework
– Build and testing environment, persistent data access, messaging APIs, command line tools, agent management, plotting libraries, multiple output formats (CSV / XML / RSS / …)
– Some of these packages have been taken in ATLAS for other uses (build, messaging APIs, cli tools, agent management)
Some are generic Experiment Dashboards
– As seen also in other experiments, with minor additions
But others are very much ATLAS specific
– Developed in close collaboration with ATLAS application providers
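
The "multiple output formats" item above can be illustrated with a small sketch. This is not the dashboard framework's code: the function names (`to_csv`, `to_xml`, `render`) and the row layout are assumptions made for illustration, and only the Python standard library is used. The idea shown is that one query result is rendered by whichever formatter the client asks for.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_csv(rows):
    """Render a non-empty list of dicts as CSV text (header taken from the first row)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Render the same rows as a flat XML document."""
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical registry: one entry per supported output format.
RENDERERS = {"csv": to_csv, "xml": to_xml}


def render(rows, fmt):
    """Pick the formatter by name; the data itself is format-independent."""
    return RENDERERS[fmt](rows)


# Made-up example row; site name and value are placeholders.
rows = [{"site": "SITE_A", "efficiency": 0.5}]
csv_text = render(rows, "csv")
xml_text = render(rows, "xml")
```

Adding another format (RSS, for example) would then mean registering one more function, without touching the code that produces the rows.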

ATLAS Specific Dashboards
Features mainly driven by shifters’ needs
– With many additional features filling other use cases (e.g. overview plots for managers, many historic summaries)
Integration with ATLAS and GRID operations tools
– Both as input (CIC portal, SAM, BDII, …) and output (GGUS, Savannah, e-Logs, …)
Critical tools with extensive use in the ATLAS shifter effort (24/7)
[Diagram labels: GGUS, E-LOG, SAVANNAH]

ATLAS DDM Dashboard
Monitoring of data movement within clouds and individual sites
– Clouds being groups of sites in the ATLAS experiment topology, not the computing clouds we’ve heard about this week
Available Data
– Topology: clouds, sites, services, storage space tokens
– Dataset: content, location and completeness
– File: transfer attempt history, location, details on storage (src/dest surl, checksum, …)
– Statistics: throughput, efficiency, error summaries, avg transfer attempt number, dataset queued/completion time, …
[Diagram labels: DATASET CONTENTS, FILE TRANSFER ATTEMPTS, SERVICE ACCESS ERRORS, …, STATS GENERATION, NOTIFICATIONS, DOWNTIME RET., …]
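
As a rough illustration of how the statistics listed on this slide (throughput, efficiency, error summaries, average transfer attempt number) could be derived from file transfer attempt records, here is a minimal sketch. The record layout (`TransferAttempt`) and the `summarize` function are hypothetical; the slides do not describe the real DDM Dashboard schema or its aggregation code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransferAttempt:
    # Hypothetical record layout, assumed for this example only.
    file_id: str
    dest_site: str
    size_bytes: int
    succeeded: bool
    error: Optional[str] = None


def summarize(attempts, window_seconds):
    """Aggregate the attempts seen in one time window for one destination."""
    done = [a for a in attempts if a.succeeded]
    files = {a.file_id for a in attempts}
    return {
        # Only successfully transferred bytes count towards throughput.
        "throughput_MBps": sum(a.size_bytes for a in done) / 1e6 / window_seconds,
        # Efficiency here means successful attempts over all attempts.
        "efficiency": len(done) / len(attempts) if attempts else None,
        "avg_attempts_per_file": len(attempts) / len(files) if files else None,
        # Error summary: number of failed attempts per error message.
        "errors": Counter(a.error for a in attempts if not a.succeeded),
    }


# Made-up sample data; site name and error text are placeholders.
attempts = [
    TransferAttempt("f1", "SITE_A", 2_000_000_000, False, "SRM timeout"),
    TransferAttempt("f1", "SITE_A", 2_000_000_000, True),
    TransferAttempt("f2", "SITE_A", 1_000_000_000, True),
    TransferAttempt("f3", "SITE_A", 1_000_000_000, False, "SRM timeout"),
]
stats = summarize(attempts, window_seconds=3600)
```

Defining efficiency per attempt rather than per file is one possible choice; the slides do not say which definition the dashboard uses.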

ATLAS Prodsys Dashboard
Monitoring of production jobs in all ATLAS grids
– Centralized repository for activity in EGEE, OSG and NDGF
Data sources
– More heterogeneous, multiplicity of systems required more work
  – Panda database, ProdDB database, ARC collection
Available Data
– Topology: clouds, sites, services, computing queues
– Tasks: definition, contents, cloud assignment
– Jobs: attempt history, definition details (application, dataset, …)
– Statistics: progress, jobs run, grid and application execution error summaries
[Diagram labels: …, STATS GENERATION, NOTIFICATIONS, DOWNTIME RET., …]
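
The slide notes that the heterogeneous data sources required extra work. One common way to handle this, sketched below under stated assumptions, is to map each source's records onto a single common job representation before computing summaries. The status vocabularies, field names and functions here are illustrative guesses, not the real Panda, ProdDB or ARC schemas and not the Prodsys Dashboard's actual collectors.

```python
from collections import Counter

# Hypothetical status vocabularies per source; assumed for illustration.
STATUS_MAP = {
    "panda": {"finished": "done", "failed": "failed", "running": "running"},
    "proddb": {"DONE": "done", "FAILED": "failed", "RUNNING": "running"},
    "arc": {"FINISHED": "done", "FAILED": "failed", "INLRMS:R": "running"},
}


def normalize(source, record):
    """Map a source-specific record onto one common job dictionary."""
    return {
        "source": source,
        "cloud": record["cloud"],
        # Statuses missing from the map are kept visible as "unknown".
        "status": STATUS_MAP[source].get(record["status"], "unknown"),
        "error": record.get("error"),
    }


def summarize_by_cloud(jobs):
    """Count jobs per (cloud, status) and failed jobs per error message."""
    progress = Counter((j["cloud"], j["status"]) for j in jobs)
    errors = Counter(j["error"] for j in jobs if j["status"] == "failed")
    return progress, errors


# Made-up sample records, one per source.
raw = [
    ("panda", {"cloud": "US", "status": "finished"}),
    ("proddb", {"cloud": "DE", "status": "FAILED", "error": "stage-in failed"}),
    ("arc", {"cloud": "ND", "status": "INLRMS:R"}),
]
jobs = [normalize(source, record) for source, record in raw]
progress, errors = summarize_by_cloud(jobs)
```

Once every record has the same shape, the statistics and notification steps no longer need to know which grid a job ran on.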

Generic Dashboards
JOB MONITORING
Mostly analysis users
PANDA jobs collected directly from their db
GANGA jobs collected via the messaging API

Generic Dashboards
ACCOUNTING
Developed by an ATLAS collaborator who joined the dashboard team
Contribution now available to all experiments
Data gathered via APEL and GRATIA

Generic Dashboards
Service Availability and Monitoring
Initial effort targeted CMS requirements
Later adopted within ATLAS, with some new requirements (e.g. a different set of critical tests)
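
To make the "different set of critical tests" point concrete, the sketch below evaluates the same test results against per-experiment critical sets. The test names, the contents of `CRITICAL_TESTS` and the rule that a site is available only when all critical tests pass are assumptions for illustration; the slides do not list the actual tests or the exact availability rule.

```python
# Hypothetical test names and critical sets, for illustration only.
CRITICAL_TESTS = {
    "cms": {"job-submit", "software-area"},
    "atlas": {"job-submit", "storage-put", "storage-get"},
}


def site_available(vo, results):
    """Return True when every critical test of the given VO passed.

    `results` maps test name to True (passed) or False (failed); a
    critical test with no result is treated as a failure.
    """
    return all(results.get(test, False) for test in CRITICAL_TESTS[vo])


# Made-up results for one site.
results = {
    "job-submit": True,
    "software-area": True,
    "storage-put": False,
    "storage-get": True,
}
cms_ok = site_available("cms", results)
atlas_ok = site_available("atlas", results)
```

With these sample results the same site counts as available for one experiment and unavailable for the other, which is why each experiment needs its own critical set.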

Current focus / Ongoing work
Shifters will continue to be our main clients
– Making it easier to do shifts will mean more shifters
– Constant improvements to the DDM and Prodsys Dashboards
– Better integration of activity summary data
Physicists are already becoming more active
– New requirements will come regarding monitoring of individual user analysis jobs
– Better usability of the interfaces
– Better authentication / authorization
Additional developments
– Dashboard for Tier0 operations
– AGIS (ATLAS Information System), using the same common software framework