FAX UPDATE 26TH AUGUST 2013. Running issues, FAX failover, moving to new AMQ server, informing on endpoint status, monitoring developments, monitoring validation.

Presentation transcript:

FAX UPDATE 26TH AUGUST 2013

CONTENT
Running issues
FAX failover
Moving to new AMQ server
Informing on endpoint status
Monitoring developments
Monitoring validation
dCache monitor
Collector
Dashboard
50 shades of green
Ilija Vukotic

RUNNING ISSUES
Dead endpoints: Frascati, Manchester, LAL.
cmsd services are dead at: Taiwan-lcg2, LPSC, Protvino, SWT2_CPB.
/atlas/dq2/user/gangarbt lookups made half of the federation endpoints inaccessible from upstream redirectors. Johannes will explain this in more detail.
Remaining issues with x509: we are communicating our wish to get it turned on at BU, DESY-HH, DESY-ZN, FZK, LRZ-LMU, MPPMU, Freiburg, Wuppertal, Geogrid.

RUNNING ISSUES
Rather green considering it’s August!
Quite a bit of traffic considering it’s August!
New functional HC tests should not contribute much, AFAIK.

FAX FAILOVER
FAX failover works.
Developments:
  Cloud is shown and queue names are corrected
  Side menu
In the works:
  Filtering on user
  Graphing
To ponder: Site admins are not aware of this possibility. How do we communicate to them that it is in their best interest to turn it on?

FAX FAILOVER
FAX dedicated submenu. PanDA-brokered job statistics will be added here.
Production jobs failing over to FAX.

MOVING TO NEW AMQ SERVER
All FAX-related info was sent to pilot.msg.cern.ch.
There was no authentication.
Moved to the Dashboard test broker.
The consumer now uses STOMP+SSL.
This required a change to a new stomp version.
Will move to the production server this week.
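
To illustrate what a STOMP+SSL consumer involves, here is a minimal sketch using only the Python standard library. It is not the consumer used for FAX, which presumably relies on a STOMP client library. The host, port, destination, and certificate paths are hypothetical parameters, since the slides do not give them.

```python
import socket
import ssl


def build_frame(command, headers=None, body=""):
    """Serialize one STOMP frame: command, header lines, blank line, body, NUL."""
    lines = [command]
    for key, value in (headers or {}).items():
        lines.append(f"{key}:{value}")
    return ("\n".join(lines) + "\n\n" + body).encode("utf-8") + b"\x00"


def subscribe_over_ssl(host, port, destination, cert_file=None, key_file=None):
    """Open a TLS connection, send CONNECT, then SUBSCRIBE; return the socket.

    cert_file/key_file are optional client credentials (assumed to be needed
    by an authenticated broker; the slides do not say how authentication works).
    """
    context = ssl.create_default_context()
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    raw = socket.create_connection((host, port), timeout=10)
    conn = context.wrap_socket(raw, server_hostname=host)
    conn.sendall(build_frame("CONNECT", {"accept-version": "1.1", "host": host}))
    reply = conn.recv(4096)
    if not reply.startswith(b"CONNECTED"):
        conn.close()
        raise ConnectionError("broker did not answer with a CONNECTED frame")
    conn.sendall(build_frame(
        "SUBSCRIBE", {"id": "0", "destination": destination, "ack": "auto"}))
    return conn
```

Only `build_frame` runs without a broker. `subscribe_over_ssl` needs a reachable STOMP endpoint and is shown to make the order of operations concrete: TLS handshake first, then CONNECT, then SUBSCRIBE.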

INFORMING ON ENDPOINT STATUS
Mailing from SSB works and gives results.
Do we want SAM updates too? What would it take? Who would do it?

MONITORING DEVELOPMENTS
There is a need to check remotely whether cmsd works.
We had (and still have) sites showing as green for direct access and red for downstream redirection.
Investigation shows that the cmsd’s are actually dead or not responding.
We need a way to probe the cmsd’s directly. Andy will look at ways to do it.
New columns to develop for SSB:
  XRootD version
  Rucio support
  Monitoring status
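
As a crude first approximation of a remote cmsd check, one could test whether the cmsd port accepts TCP connections at all. The sketch below assumes that much and nothing more: an open port does not prove that the cmsd answers protocol requests, and the host names and port numbers passed in are the caller's assumptions, not values from the slides.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to its TCP reachability."""
    return {(host, port): port_is_open(host, port, timeout)
            for host, port in endpoints}
```

A site that is green for direct access but has a closed cmsd port would at least be flagged by this test. A cmsd that accepts connections but does not respond would still need a protocol-level probe.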

MONITORING VALIDATION
The first step is validating that the results shown by Matevz’s collector are correct.
I sent xrootd summary messages to the collector and checked what appeared in the plots.
Messages arrive and are shown, but something is wrong in how the summaries are calculated or plotted.


DCACHE MONITOR
dCache monitor mostly rewritten:
  dCache-compatible logging
  UDP messaging from the same ports
  Sends “=” stream
  Sends more data (substitutes DN \CN with username etc.)
  Made compatible with the collector
Tested at MWT2. Very good results.
At the end of the week an RPM will be produced and placed in the WLCG repository.
CMS will be informed about the new version.

COLLECTOR
New version being prepared by Matevz.
New AMQ version.
BIG ISSUE: Some CMS sites are sending info to our collector. Will be raised with Brian B.

DCACHE MONITOR
Now gives really important and actionable information.
Just during debugging I noticed:
  Files opened, a small percentage read, then kept open for hours.
  The same file opened twice in the same session (?!)
  Rather little use of vector reads.
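
The first two patterns noticed above could be flagged automatically once access records are available. The following is only a sketch: the record fields (`session`, `path`, `file_size`, `bytes_read`, `open_seconds`) and both thresholds are hypothetical and do not reflect the monitor's actual message format.

```python
from collections import Counter


def flag_accesses(records, min_read_fraction=0.1, max_open_seconds=3600):
    """Return (path, reason) pairs for suspicious file accesses.

    records: list of dicts with hypothetical keys session, path,
    file_size, bytes_read, open_seconds.
    """
    flags = []
    for rec in records:
        size = rec["file_size"]
        fraction = rec["bytes_read"] / size if size else 0.0
        if fraction < min_read_fraction and rec["open_seconds"] > max_open_seconds:
            flags.append((rec["path"], "small read, long open"))
    # Count how often each file was opened within the same session.
    opens = Counter((rec["session"], rec["path"]) for rec in records)
    for (_session, path), count in opens.items():
        if count > 1:
            flags.append((path, "opened more than once in one session"))
    return flags
```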

IN DASHBOARD
Why is there a difference between the table and the plots?
What is the idea of the “Site history” tab?
Need to investigate why CMS sites appear here (CERN-CMSTEST).

PANDA RE-BROKERING
Discussed at the last CERN S&C week.
We agreed to provide PanDA with an estimate of the cost of moving data over the WAN, so that it can re-broker jobs from very long queues to sites with free slots that have a good connection to the data.
The cost matrix exists in SSB.
Code that reads it from SSB, applies exponential decay smoothing, and sends the info to AGIS is running.
Have to check the scalability of the AGIS bulk update.
Waiting for Artem to code moving the data from AGIS to schedconfig.
The next step is for Tadashi to use that table from schedconfig and actually re-broker.
Finally, we’ll have to monitor it the same way we do with failover.
No developments.
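
The smoothing step can be pictured as follows. This is a minimal sketch that assumes simple exponential smoothing of a cost matrix keyed by (source, destination). The function name, the data layout, and the value of `alpha` are illustrative assumptions, not taken from the code that feeds AGIS.

```python
def smooth_costs(previous, latest, alpha=0.3):
    """Blend the latest cost measurements into the previous smoothed values.

    previous, latest: dicts mapping (source, destination) -> cost.
    alpha: weight of the newest measurement (assumed value).
    """
    smoothed = dict(previous)
    for link, cost in latest.items():
        if link in smoothed:
            smoothed[link] = alpha * cost + (1.0 - alpha) * smoothed[link]
        else:
            smoothed[link] = cost  # the first measurement seeds the average
    return smoothed


# Hypothetical site names, for illustration only.
history = {("SITE_A", "SITE_B"): 10.0}
update = {("SITE_A", "SITE_B"): 20.0, ("SITE_A", "SITE_C"): 5.0}
print(smooth_costs(history, update, alpha=0.5))
```

A larger `alpha` makes the published cost follow recent measurements more closely. A smaller one damps short spikes, which matters if brokering decisions should not flip on a single slow transfer.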

50 SHADES OF GREEN
Green in any of the FAX SSB monitor metrics is based on one and the same file, so a lot of cached information is involved.
We need to find the percentage of files successfully obtained from a much larger file pool while avoiding caching effects.
Simple code has been developed to test all endpoints that have FDR datasets.
It does _file0->ls() on each of the ~800 files, sequentially.
Currently run by hand.
You can find it in FAXtools/FAXtestsFDR of our CERN FAX git repo.
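
The shape of such a test can be sketched as below. This is not the code in FAXtools/FAXtestsFDR. The real test opens files through ROOT, whereas here the opener is passed in as a callable (`try_open`, a hypothetical name) so that the sketch runs without ROOT installed.

```python
def success_percentage(endpoint, file_paths, try_open):
    """Return the percentage of file_paths that try_open reports as readable.

    endpoint: host[:port] of the xrootd door being tested.
    file_paths: absolute paths such as "/atlas/...".
    try_open: callable taking a root:// URL and returning True on success;
              an exception counts as a failure.
    """
    if not file_paths:
        return 0.0
    succeeded = 0
    for path in file_paths:  # sequential, like the hand-run test
        try:
            if try_open(f"root://{endpoint}/{path}"):
                succeeded += 1
        except Exception:
            pass
    return 100.0 * succeeded / len(file_paths)
```

Reporting a percentage over many files, instead of a pass or fail on one file, is what would turn a single green into the "shades" of the slide title.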