XRootD Monitoring Report
A. Beche, D. Giordano
Federated Workshop, 10 April 2014

Outlines  Talk 1: XRootD Monitoring Dashboard  Context  Dataflow and deployment model  Database: storage & aggregation  User interface & use cases  Open issues & future work  Summary  Talk 2: Beyond XRootD monitoring  HTTP/WebDAV integration  Integration in the WLCG Transfers Dashboard 10 – April - 14 A.Beche – Federated Workshop 2

XRootD federation monitoring
- Activity started during summer 2012: 4 sites for FAX, 11 for AAA
- Monitoring data volumes increased accordingly:

              July 2012    March 2014
  AAA         606k         43M
  FAX         15k          22M

Why monitoring?
- Understand data flows to estimate data traffic
- Provide information for efficient operations
- Identify access patterns and propose data placement strategies

XRootD monitoring dataflow
[Diagram: XRootD servers in the federation send monitoring packets in real time over UDP to the GLED collector, which forwards them asynchronously over STOMP to ActiveMQ; a consumer writes raw data and 10-minute statistics to the database, and a web API serves the Dashboard UI and external applications.]
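For illustration, a minimal sketch of a consumer on the ActiveMQ leg of this chain, written with the stomp.py library. The broker host, credentials, topic name, and JSON payload are assumptions, not the actual GLED message format.

```python
# Hedged sketch: subscribe to XRootD monitoring messages on ActiveMQ
# via STOMP (stomp.py v8 API). Host, credentials, topic and payload
# layout are hypothetical.
import json
import time

import stomp


class XrdMonListener(stomp.ConnectionListener):
    def on_message(self, frame):
        # Assume each frame carries one file-access summary as JSON.
        record = json.loads(frame.body)
        print(record.get("site"), record.get("bytes_read"))

    def on_error(self, frame):
        print("broker error:", frame.body)


conn = stomp.Connection([("activemq.example.org", 61613)])  # hypothetical broker
conn.set_listener("", XrdMonListener())
conn.connect("monuser", "monpass", wait=True)               # hypothetical credentials
conn.subscribe(destination="/topic/xrootd.monitoring",      # hypothetical topic
               id="xrdmon-1", ack="auto")

while True:  # frames are delivered asynchronously to the listener
    time.sleep(60)
```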

GLED deployment model
[Diagram: GLED collectors and their input rates: AAA at UCSD (16 Hz), EOS at CERN (10 Hz), EOS at CERN (150 Hz), FAX US at SLAC (9 Hz), FAX EU at CERN (1 site reporting), plus a shared 5-node cluster at CERN.]

Consolidated dataflow
- Two uses of the raw data:
  - Dashboard monitoring
  - XRootD popularity
- Both now share the same database:
  - Storage optimization
  - Consistency guaranteed

Database
- Volumes (indexes excluded):
  - AAA: ~300 GB, ~1B records
  - FAX: ~600 GB, ~2B records
  - Daily insert: 2 GB / 6M rows
- Storage: raw data, statistics, metadata
- Tables partitioned daily, no global indexes

Database
- Raw data aggregation:
  - Done with PL/SQL procedures
  - Events arrive unordered
  - Stateless: every touched bin is fully recomputed each time (see the sketch below)
- Statistics are computed from the raw data in 10-minute bins
- 10-minute statistics are then aggregated into daily bins
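A minimal sketch of the stateless scheme described above: because events arrive out of order, nothing is updated incrementally; each 10-minute bin touched by a new batch is rebuilt entirely from raw data. The real implementation is PL/SQL in the database; this Python version with illustrative field names only shows the idea (each event is keyed by its end time here; the proportional split from the next slide is sketched separately below).

```python
BIN = 600  # 10-minute bins, in seconds

def touched_bins(new_events):
    """Bin start times affected by a batch of (possibly unordered) events."""
    return {e["end_time"] // BIN * BIN for e in new_events}

def recompute_bin(raw_events, bin_start):
    """Rebuild one bin's statistics entirely from the raw data."""
    stats = {"transfers": 0, "bytes": 0}
    for e in raw_events:
        if bin_start <= e["end_time"] < bin_start + BIN:
            stats["transfers"] += 1
            stats["bytes"] += e["bytes"]
    return stats

# On each run: recompute only the bins the new batch touched.
def aggregate(raw_events, new_events):
    return {b: recompute_bin(raw_events, b) for b in touched_bins(new_events)}
```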

Aggregation methods
[Table, built up across two slides: a set of transfers running between 2pm and 7pm, binned per hour. The "easy" method credits each transfer, and all of its bytes, to a single bin. The adopted method counts a transfer in every bin it overlaps and splits its bytes across those bins; the slide shows per-bin byte totals assembled from several transfers, e.g. 9+6 and 1+9+5.]
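A hedged sketch of the adopted method as we read the slide: a transfer spanning several bins contributes bytes to each bin in proportion to its time overlap with that bin. Bin size and field names are illustrative (the slide's example uses hourly bins; production uses 10 minutes).

```python
BIN = 3600  # hourly bins, matching the slide's example

def split_bytes(start, end, nbytes, bin_size=BIN):
    """Yield (bin_start, bytes) pairs with bytes split by overlap fraction."""
    duration = max(end - start, 1)
    b = start // bin_size * bin_size
    while b < end:
        lo, hi = max(start, b), min(end, b + bin_size)
        yield b, nbytes * (hi - lo) / duration
        b += bin_size

# A 15-byte transfer running 2pm-5pm lands as 5 bytes in each hour bin:
for bin_start, nbytes in split_bytes(14 * 3600, 17 * 3600, 15):
    print(bin_start // 3600, nbytes)
```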

Visualization Interface
[Screenshot of the Dashboard visualization interface]

Pre-defined set of views
[Screenshots of the pre-defined views]

Use case example
Understand site access patterns:
1. Which sites are reading from FNAL?
2. Zoom to a specific site to understand which users are reading.
3. Understand which files are read by a user.

Data popularity
- XRootD monitoring provides information about file access patterns:
  - Including non-official collections (e.g. user files)
  - Helps simplify disk resource usage and make it more efficient
- Popularity data analytics are built on this information:
  - Already adopted for CMS-EOS
  - Will be extended to all of AAA

Archive recommendation for CMS-EOS
- Helps manage EOS disk space, including user space:
  - There is no central bookkeeping system
- Unused files (created more than 4 months ago, no access in the last 3 months):
  - ~500 TB of space occupied but not used, 30% of the total for these areas
[Chart: unused space per area, in % and TB]
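A small sketch of the selection rule just described: flag files created more than 4 months ago and not read in the last 3, and report how much space they hold. The metadata source and field names are assumptions for illustration.

```python
import time

MONTH = 30 * 24 * 3600  # approximate month, in seconds
now = time.time()

def archive_candidates(files):
    """Files created > 4 months ago with no access in the last 3 months."""
    for f in files:
        if now - f["ctime"] > 4 * MONTH and now - f["atime"] > 3 * MONTH:
            yield f

def unused_space(files):
    """Unused bytes, and their share of the total, for a set of files."""
    unused = sum(f["size"] for f in archive_candidates(files))
    total = sum(f["size"] for f in files) or 1
    return unused, 100.0 * unused / total
```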

Open issues
- Missing servers:
  - dCache sites
- Servers should report their site name:
  - CMS: only 5 sites do (anon, BUDAPEST, Hephy-Vienna, T2_US_UCSD, UKI-LT2-Brunel)
  - No coherent naming convention
  - ATLAS: the GLED RPM is still to be deployed
- GLED collector improvements:
  - Reliability of the service:
    - Recovery time can be long due to time differences
    - GLED should be operated as a production service
  - Scalability:
    - To be addressed soon with automatic reconnection

Future work
- Strong requirement from ATLAS to understand efficiency:
  - Needs a concept of error/failure
  - How could the XRootD server be instrumented to report it?
- The European GLED collector is up and running:
  - Only one pilot site (CNAF) is reporting to it
  - Should we keep it?
- Data mining activity (not started yet):
  - Almost 2 years of raw data (~1 TB)

Data Mining
- Extract further knowledge from the data...
  - Detect inefficiencies
  - Propose deletion strategies
  - Define data placement
- ...by:
  - Understanding access patterns and data usage
  - Correlating data traffic with data access performance
- Possibility to automate some operations

Application usage
[Plots: daily usage of the dashboard applications for FAX and AAA]

Summary
- Monitoring federations is a challenge:
  - High rates of traffic & information
  - Met with data aggregation and scalable technologies
- The dashboard is not actively used:
  - Fewer than 10 daily users for FAX, fewer than 15 for AAA
  - Are any functionalities missing?
- Improvement work is ongoing
- New requests are coming
- XRootD monitoring is one piece of the entire data transfers puzzle:
  - See next talk

Beyond XRootD monitoring
A. Beche, D. Giordano

Outlines  Talk 1: XRootD Monitoring Dashboard  Context  Dataflow and deployment model  Database: storage & aggregation  User interface & use cases  Open issues & future work  Summary  Talk 2: Beyond XRootD monitoring  HTTP/WebDAV integration  Integration in the WLCG Transfers Dashboard 10 – April - 14 A.Beche – Federated Workshop 23

HTTP Federation is coming
- The HTTP protocol will be used in the future
- XRootD servers can be accessed over HTTP:
  - See Fabrizio's presentation on xrdhttp
- Two kinds of access:
  - Pure HTTP access (through Apache)
  - An HTTP gate to an XRootD server
- They cannot be monitored in the same way

Monitoring XRootD access protocol
- XRootD 4 now reports the protocol used by the client:
  - The whole monitoring chain needs to be updated to propagate it
  - The Dashboard DB and UI are fully ready
[Plot: accesses broken down by protocol, HTTP vs XRootD]
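A tiny sketch of what propagating the new field implies downstream: once each monitoring record carries a protocol tag, aggregation can split traffic by protocol for the UI. Field names are illustrative.

```python
from collections import Counter

def bytes_by_protocol(records):
    """Sum transferred bytes per access protocol (e.g. xroot vs http)."""
    totals = Counter()
    for r in records:
        totals[r.get("protocol", "unknown")] += r["bytes"]
    return totals
```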

HTTP/WebDAV federation monitoring
[Diagram, built up across five slides: jobs at a site read from the local SE through the XRootD federation, and the site's XRootD server reports to the site GLED collector, which feeds ActiveMQ. An HTTP federation is then drawn alongside: access through xrdhttp can report into the same GLED/ActiveMQ chain, while pure Apache access has no monitoring path yet (marked with a "?").]

How to compare data from different applications?

Data transfers & accesses monitoring tools
[Diagram: today, FTS, FAX, AAA and EOS each feed their own WLCG monitoring application, each with its own web API/UI.]

WLCG Transfers Dashboard federated approach
[Diagram: each technology-specific application (FTS, FAX, AAA, EOS) keeps its own web API/UI, and a WLCG Transfers Dashboard API/UI federates them on top.]
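One way to picture the federated approach is a thin WLCG-level API that fans a query out to the per-technology dashboards and merges their answers instead of copying their data. A hedged sketch; the endpoint URLs and JSON shape are invented for illustration.

```python
import requests

BACKENDS = {  # hypothetical per-technology dashboard APIs
    "fts": "https://fts-dashboard.example.org/api/transfers",
    "fax": "https://fax-dashboard.example.org/api/transfers",
    "aaa": "https://aaa-dashboard.example.org/api/transfers",
}

def transfers(date_from, date_to):
    """Merge per-backend transfer summaries into one federated answer."""
    merged = []
    for name, url in BACKENDS.items():
        resp = requests.get(url, params={"from": date_from, "to": date_to},
                            timeout=30)
        resp.raise_for_status()
        for row in resp.json():  # assume each backend returns a JSON list
            row["source_system"] = name  # tag each row with its origin
            merged.append(row)
    return merged
```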

Some plots
[Plots: transfer activity broken down by technology (FTS, XRootD) and by experiment (ATLAS, CMS, LHCb, ALICE)]

Summary
- A lot of effort has gone into the XRootD monitoring workflow and dashboard over the last 2 years:
  - A reliable system has been achieved
  - Many use cases are covered
- HTTP monitoring has already started:
  - Reaching the level of XRootD monitoring will require a lot of effort
- New WLCG Transfers Dashboard architecture:
  - Highly extensible system
  - Cross-VO and cross-technology analysis

Credits
- Julia Andreeva
- Lionel Cons
- Domenico Giordano
- Pablo Saiz
- Matevz Tadel
- David Tuckett
- Ilija Vukotic
- The AAA and FAX deployment teams
- ...

Useful links
- AAA Dashboard
- FAX Dashboard
- CHEP materials
- Xbrowse framework

Thanks for your attention