Monitoring the ATLAS Distributed Data Management System
Ricardo Rocha (CERN) on behalf of the ARDA Dashboard team
CERN - IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it


Monitoring the ATLAS Distributed Data Management System
Ricardo Rocha (CERN) on behalf of the ARDA Dashboard team

CHEP2007, Victoria, Canada

Outline
Dashboard project
ATLAS DDM system
DDM Dashboard
Monitoring for operators
Monitoring for end users
Conclusion

Dashboard Project
Started inside the ARDA group of the EGEE/LCG project in 2005
  – Initially covering only job monitoring for CMS
Evolved into a Python framework providing a set of flexible tools allowing coverage of other grid application areas
The framework consists of a set of different components
  – Data access layer (DAO)
  – Service configuration (agents)
  – Web application
  – Command line tools
  – APIs
Strong focus on allowing easy access to the information
  – HTTP query interface
  – Output in HTML (web interfaces), but also XML and CSV for integration with external tools
Applications currently cover job monitoring (for all HEP experiments + VLEMED/Biomed), data management, site efficiency / reliability, and many others
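The idea of serving one query result in several output formats can be sketched as follows. This is a minimal illustration using only the Python standard library, not the Dashboard framework's actual code: the row fields (`site`, `transfers`, `errors`), the site names and the function names are all assumptions made up for the example, and the rows are assumed to be non-empty.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rows, shaped as a dashboard query might return them.
ROWS = [
    {"site": "SITE_A", "transfers": 120, "errors": 3},
    {"site": "SITE_B", "transfers": 80, "errors": 0},
]


def to_csv(rows):
    """Render rows as CSV text with a header line (rows must be non-empty)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Render rows as a flat XML document: <rows><row><site>..</site>..</row></rows>."""
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def render(rows, fmt):
    """Pick a renderer by format name; an unknown format raises KeyError."""
    renderers = {"csv": to_csv, "xml": to_xml}
    return renderers[fmt](rows)
```

An external tool would then request the format it can parse most easily, for example `render(ROWS, "csv")` for a spreadsheet import or `render(ROWS, "xml")` for a structured consumer.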

ATLAS DDM System
Distributing all the data in the ATLAS infrastructure
Data is organized in datasets – collections of files
Users issue subscriptions on these datasets
Different agents take care of the several tasks required for the successful movement of the data
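The dataset and subscription concepts can be pictured with a small data model. This is only a sketch under stated assumptions: the class names, fields and methods below are hypothetical and do not reproduce the DQ2 implementation. A subscription here simply tracks which files of a dataset have arrived at a site.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """A named collection of files (hypothetical model)."""
    name: str
    files: list[str] = field(default_factory=list)


@dataclass
class Subscription:
    """A request to have a dataset available at a given site (hypothetical model)."""
    dataset: Dataset
    site: str
    transferred: set[str] = field(default_factory=set)

    def mark_transferred(self, filename):
        # Only files that belong to the dataset can be marked as transferred.
        if filename not in self.dataset.files:
            raise ValueError(f"{filename!r} is not part of {self.dataset.name!r}")
        self.transferred.add(filename)

    def is_complete(self):
        # Complete when every file of the dataset has arrived at the site.
        return set(self.dataset.files) <= self.transferred

    def progress(self):
        # Fraction of files transferred; an empty dataset counts as complete.
        total = len(self.dataset.files)
        return len(self.transferred) / total if total else 1.0
```

In this picture, the agents are the components that would call something like `mark_transferred` as each file movement succeeds.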

ATLAS DDM System
Each set of agents serves one site or a group of sites (typically related to each other, forming a cloud)
Initial deployment involved the setup of these services on the VO boxes at each Tier1
Debugging was extremely difficult (still is…)
  – Involved logging into each of the machines
  – And correlating this information…
Accounting and monitoring virtually impossible
A central point where all the information would be made available became vital

DDM Dashboard
Main focus on ATLAS specific services (DQ2 system), receiving information from the different agents via HTTP callbacks
  – Transfer state changes
  – Dataset complete
  – Transfer complete
  – Transfer / registration errors
But also on grid fabric services
  – Data management related services up and running
  – Storage space availability
Data is put together in a structured way
  – Oracle database at CERN
Different tools (agents) responsible for generating statistics and metrics
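The receiving side of such callbacks might look roughly like the sketch below. It assumes, purely for illustration, that each callback arrives as a URL-encoded body with an `event` field and an optional `site` field; the event names, field names and the in-memory `EventStore` (standing in for the Oracle database) are all hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs

# Hypothetical event names mirroring the callback types listed on the slide.
KNOWN_EVENTS = {
    "transfer_state",
    "transfer_complete",
    "dataset_complete",
    "transfer_error",
    "registration_error",
}


def parse_callback(body):
    """Decode a URL-encoded callback body into a flat dict and validate its event type."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    event = fields.get("event")
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {event!r}")
    return fields


class EventStore:
    """In-memory stand-in for the central database (assumption for this sketch)."""

    def __init__(self):
        self.events = []
        self.counts = Counter()

    def record(self, body):
        fields = parse_callback(body)
        self.events.append(fields)
        # Keep a running count per (site, event) pair for simple statistics.
        self.counts[(fields.get("site"), fields["event"])] += 1
        return fields
```

The per-site counters hint at how separate statistics tools could derive metrics from the raw events once they are stored in one place.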

DDM Dashboard
Serves different sets of use cases, coming from different types of users
Site / system operators
  – “How is the overall system doing?”
  – “How is site X doing?”
  – “What is the most common error, and what is triggering it?”
End users / production coordinators
  – “What is the status of this (set of) dataset subscription(s)?”
  – “When will the data become available?”
  – Essential to have real time information
How much data? A lot!
  – Millions of file transfers, each reporting the different steps
    An average week means 2 million hits (90%+ bulk reports with up to 500 items)
  – Especially critical when systems misbehave (more errors)
  – A lot of work on partitioning the data, optimizing the database and the web server setup (Apache)
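To put the load figures in perspective, the sketch below splits a list of events into bulk reports of at most 500 items, finds the most frequent error in a set of events, and converts the quoted 2 million hits per week into an average rate (roughly 3.3 hits per second). The event layout and function names are assumptions for illustration; only the 500-item limit and the weekly hit count come from the slide.

```python
from collections import Counter

MAX_ITEMS_PER_REPORT = 500  # upper bound on items per bulk report, from the slide


def batch(items, size=MAX_ITEMS_PER_REPORT):
    """Yield consecutive slices of at most `size` items, one per bulk report."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def most_common_error(events):
    """Return (error, count) for the most frequent error, or None if there are no errors."""
    errors = Counter(e["error"] for e in events if e.get("error"))
    return errors.most_common(1)[0] if errors else None


# Average request rate implied by 2 million hits in one week.
HITS_PER_WEEK = 2_000_000
SECONDS_PER_WEEK = 7 * 24 * 3600
average_hits_per_second = HITS_PER_WEEK / SECONDS_PER_WEEK
```

The average hides the bursts: as the slide notes, the volume grows when systems misbehave, so the peak rate matters more for sizing than this mean value.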

Monitoring for operators
“How is the whole system performing?”

Monitoring for operators
“What is wrong with site X?”

Monitoring for operators
“What files are causing error ‘…’?”

Monitoring for end users
“I subscribed to dataset X on site Y. What is the status?”

Conclusion
Essential tool for all ATLAS operations
Currently used by a large number of people, from those responsible for sites and systems to end users
Usage goes beyond the web interface
  – Data being queried by different external tools for automating operations (catalog cleanup, consistency checks, alarms and notifications, …)
What is coming next
  – Integration with the site specific monitoring tools
  – More alarms and notifications
  – Automated reaction to specific events
  – More focus on the end user
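An external tool that turns exported data into alarms could be as small as the sketch below. It assumes a CSV export with `site`, `transfers` and `errors` columns and a 10% error-rate threshold; the column names, the threshold and the function name are hypothetical and chosen only to illustrate the idea.

```python
import csv
import io


def sites_over_threshold(csv_text, threshold=0.1):
    """Return (site, error_rate) pairs for sites whose error rate exceeds the threshold."""
    alarms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        transfers = int(row["transfers"])
        errors = int(row["errors"])
        # Sites with no transfers are treated as having a zero error rate.
        rate = errors / transfers if transfers else 0.0
        if rate > threshold:
            alarms.append((row["site"], rate))
    return alarms
```

A notification step (mail, ticket, or an automated reaction) would then act on the returned list.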

Tier 0 export
Production
Homepage
Contact