

FAX UPDATE 1ST JULY 2013

Discussion points:
FAX failover summary and issues
Mailing issues
Panda re-brokering to sites using FAX cost and access
Issue with redirection to CERN
Recent deployments
Ilija Vukotic

FAILOVER TO FAX
Recall that the first FAX use case is failover: the pilot falls back to FAX when it cannot stage input files.
Turned on for all of the USA and two UK sites.
Documentation on how to enable a queue for failover:
No problems reported as yet.
Does it work? YES IT DOES!

FAX FAILOVER OBSERVATIONS
On one of the first days it saved 1K+ jobs from failing.
Most were on OU_OCHEP_SWT2 (700+) and SWT2_CPM (100+), with the rest distributed across other sites.
In some cases 5-10 files were recovered.
Information on these jobs can be obtained by constructing a URL like this:
9]{1}*&jobparam=JOBMETRICS&jobStatus=finished&hours=&tstart= :00:00&tend= :00
However, this is (naturally) quite slow.
We may also want daily summary statistics to assess operational performance.

FAX FAILOVER DEVELOPMENT
Need a better way to monitor how many jobs were saved, how many files were recovered, how many jobs still failed, and how much data was served by FAX.
After discussions with Valeri and Torre, the decision is to send this information to the Panda logger.
This will require pilot modifications to send failover records.
Also a Python plugin to create the web pages that we want.
Open questions:
What process should we follow for switching on a site? Note: any Panda production queue can be enabled.
When the pilot supports Rucio file names, will the fallback mechanism still work?
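The quantities listed on this slide (jobs saved, jobs still failed, files recovered, data served by FAX) could be aggregated from per-job failover records roughly as follows. This is only a sketch: the slides do not define the record format sent to the Panda logger, so `FailoverRecord` and all of its field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    """Hypothetical per-job failover record; field names are assumptions."""
    site: str
    files_recovered: int   # input files read through FAX after local staging failed
    bytes_from_fax: int    # data volume served by FAX for this job
    job_succeeded: bool    # True if the job finished after failing over


def summarize(records):
    """Aggregate failover records into overall totals and saved jobs per site."""
    summary = {
        "jobs_saved": 0,
        "jobs_still_failed": 0,
        "files_recovered": 0,
        "bytes_from_fax": 0,
    }
    saved_per_site = Counter()
    for rec in records:
        if rec.job_succeeded:
            summary["jobs_saved"] += 1
            saved_per_site[rec.site] += 1
        else:
            summary["jobs_still_failed"] += 1
        summary["files_recovered"] += rec.files_recovered
        summary["bytes_from_fax"] += rec.bytes_from_fax
    return summary, saved_per_site


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    demo = [
        FailoverRecord("SITE_A", 3, 3_000_000, True),
        FailoverRecord("SITE_A", 1, 1_000_000, False),
        FailoverRecord("SITE_B", 5, 7_000_000, True),
    ]
    print(summarize(demo))
```

Running the same aggregation once per day over that day's records would give the daily summary statistics mentioned earlier without querying individual jobs.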

STEPS TOWARD AUTOMATED OPERATIONAL NOTIFICATION
Moving towards production operation, sites and ADC shifters will need well-defined procedures and awareness of potential problems with endpoints.
Some failures are obvious; others will require intervention by experts.
Perhaps start with the SSB “Direct” and “Upstream” tests:
ssb.cern.ch/dashboard/request.py/siteview#currentView=FAX+endpoints&fullscreen=true&highlight=false
Align with existing site (cloud) notification channels.
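One possible way to turn the two SSB test results into a notification decision is sketched below. The routing rule is an assumption made for illustration, not a procedure stated in the slides: a failed "Direct" test is treated as a site-level problem, while a passing "Direct" test with a failing "Upstream" test is treated as something for experts to look at.

```python
def notification_target(direct_ok, upstream_ok):
    """Decide who to notify for one endpoint (hypothetical routing rule).

    direct_ok   -- result of the SSB "Direct" test for the endpoint
    upstream_ok -- result of the SSB "Upstream" test for the endpoint
    Returns None when both tests pass, otherwise "site" or "experts".
    """
    if direct_ok and upstream_ok:
        return None
    if not direct_ok:
        # Assumption: the endpoint itself is not serving files.
        return "site"
    # Assumption: the endpoint works, but redirection through it does not.
    return "experts"


if __name__ == "__main__":
    for direct, upstream in [(True, True), (False, True), (True, False)]:
        print(direct, upstream, notification_target(direct, upstream))
```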

PANDA RE-BROKERING
One of the original use cases, discussed again at the last CERN S&C week.
The idea is to re-broker jobs to sites with free CPUs or short queues, provided the transfer or direct-access read “cost” is reasonable.
The FAX team is responsible for providing Panda with an estimate of the cost to move data across the WAN.
A cost matrix exists in SSB, ready for AGIS integration.
The final step is for Tadashi to use that table from AGIS to actually re-broker.
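The re-brokering idea can be illustrated with a small selection function. This is a sketch under stated assumptions, not Panda's brokerage code: the cost matrix is modelled as a dictionary keyed by (source, destination) pairs, and the cost units, the `max_cost` threshold, and the site names are all hypothetical.

```python
def pick_rebroker_site(source, candidates, cost, free_slots, max_cost):
    """Pick the cheapest candidate site that has free CPUs.

    source     -- site that holds the input data
    candidates -- sites the job could be re-brokered to
    cost       -- dict mapping (source, destination) to a WAN access cost
    free_slots -- dict mapping site name to the number of free CPU slots
    max_cost   -- largest cost still considered "reasonable"
    Returns the chosen site name, or None if no candidate qualifies.
    """
    eligible = [
        site
        for site in candidates
        if free_slots.get(site, 0) > 0
        and (source, site) in cost
        and cost[(source, site)] <= max_cost
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda site: cost[(source, site)])


if __name__ == "__main__":
    # Illustrative numbers only.
    cost = {("SITE_A", "SITE_B"): 2.0, ("SITE_A", "SITE_C"): 0.5, ("SITE_A", "SITE_D"): 0.1}
    free = {"SITE_B": 40, "SITE_C": 10, "SITE_D": 0}
    print(pick_rebroker_site("SITE_A", ["SITE_B", "SITE_C", "SITE_D"], cost, free, max_cost=1.0))
```

In the example, SITE_D has the lowest cost but no free slots and SITE_B exceeds the threshold, so SITE_C is selected.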

ISSUE WITH REDIRECTION TO CERN
The high-level symptom is that downstream redirection to the CERN endpoint often fails:
xrdcp -f -d 1 root://atlas-xrd-eu.cern.ch:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.cern-prod/user.HironoriIto.xrootd.cern-prod-1M
i.e. the client gets redirected into an endless loop between lpsc-se-dpm-server.in2p3.fr and atlas-xrd-fr.cern.ch.
Note that IN2P3-LPSC has a deployed xrootd door, but it is not in AGIS. This might be the cause.
Reported to the FR cloud contact; awaiting clarification.
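A redirection loop of this kind can be spotted mechanically once the sequence of hosts the client was sent to is known. The helper below is a minimal sketch: it assumes the hop list has already been extracted (for example from the client's debug output), and the hop sequence in the example is illustrative rather than copied from an actual log.

```python
def find_redirect_loop(hops):
    """Return the repeating cycle of hosts, or None if no host repeats.

    hops -- hosts in the order the client was redirected to them
    """
    first_seen = {}
    for index, host in enumerate(hops):
        if host in first_seen:
            # Everything from the first visit up to this repeat forms the cycle.
            return hops[first_seen[host]:index]
        first_seen[host] = index
    return None


if __name__ == "__main__":
    # Illustrative hop sequence, not taken from a real trace.
    hops = [
        "atlas-xrd-eu.cern.ch",
        "atlas-xrd-fr.cern.ch",
        "lpsc-se-dpm-server.in2p3.fr",
        "atlas-xrd-fr.cern.ch",
    ]
    print(find_redirect_loop(hops))
```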

RECENT DEPLOYMENTS
PIC is in and validated.
IT cloud: is the site at Bologna coming online?
Will need to revisit at some point the strategy for Asian sites, including Australia.