FAX status

Overview
Status of endpoints and redirectors
Monitoring
Failover
Overflow

Endpoints
Status as of Sat. 15 Nov.
Gained one more site: RO-07-NIPNE.
Problems being worked on: CSCS.
Not working at all: Nikhef.
Flip-flopping: FZK-LCG2 and NDGF-T1.
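A minimal sketch of how such endpoint checks can be scripted, assuming the XRootD Python bindings (pyxrootd) are installed; the host names are illustrative placeholders, not the actual FAX probe targets:

# Probe a list of FAX endpoints with the XRootD Python bindings.
# Host names below are made-up placeholders.
from XRootD import client

ENDPOINTS = [
    "root://some-site-door.example.org:1094",   # hypothetical site endpoint
    "root://some-redirector.example.org:1094",  # hypothetical regional redirector
]

def is_alive(url):
    fs = client.FileSystem(url)
    status, _ = fs.ping(timeout=10)
    return status.ok

for url in ENDPOINTS:
    print(url, "OK" if is_alive(url) else "FAILED")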

Direct access
Problems seen: an expired certificate, a wrong configuration, and test jobs unable to obtain a proxy.
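Since an expired certificate or missing proxy is a recurring failure mode, here is a hedged sketch of the pre-flight check a test job could run (it assumes the standard voms-proxy-info CLI is available; the one-hour threshold is an arbitrary choice):

# Refuse to start a direct-access test without a usable VOMS proxy.
import subprocess

def proxy_seconds_left():
    # voms-proxy-info --timeleft prints the remaining proxy lifetime in seconds
    out = subprocess.run(["voms-proxy-info", "--timeleft"],
                         capture_output=True, text=True)
    return int(out.stdout.strip()) if out.returncode == 0 else 0

if proxy_seconds_left() < 3600:   # arbitrary safety margin
    raise SystemExit("No usable proxy; renew it before testing direct access")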

Upstream redirection

Downstream redirection
Redirectors moved to Agile Infrastructure (AI) machines.

Moving redirectors
Herve had to move all the EU redirectors to the Agile Infrastructure, simultaneously upgrading xrootd.
He started with the DE redirector, where access rules had to be re-implemented, and continued at a pace of two redirectors per day. However, the old machines got re-introduced, which confused everybody.
A new set of changes is being applied right now.
The situation is now clear, but sites need to restart their services since the redirectors' IPs changed.
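A small sketch of how a site could verify whether it is affected by the IP changes; the redirector alias and the cached pre-move address are invented for illustration:

# Compare a redirector's current DNS resolution against its pre-move address.
import socket

OLD_IPS = {"atlas-xrd-eu.example.cern.ch": "188.184.0.1"}  # hypothetical values

for host, old_ip in OLD_IPS.items():
    new_ip = socket.gethostbyname(host)
    if new_ip != old_ip:
        print(f"{host}: {old_ip} -> {new_ip}; restart xrootd services")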

Monitoring
The machine that receives information from AMQ and feeds it to the SSB etc. had to move to the Agile Infrastructure. This took much more time than expected, but it is done now.
EU sites were moved to sending their monitoring data to CERN. The current state can be seen here (thanks to Igor Pelevanyuk): xrootd-comp.cern.ch/cosmic/ATLASmigrationMonitoring/
A lot of effort is still needed to make the summary and detailed monitoring match: we have started a deeper analysis of the PanDA job information transported into Hadoop at CERN.
Further improvements in the SSB.
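For orientation, a minimal sketch of the AMQ-consuming side of such a pipeline, written with the stomp.py client; broker host, port, topic, credentials and message schema are all placeholders, not the real FAX collector setup:

# Consume monitoring messages from an AMQ broker and print a crude summary.
import json
import time
import stomp

class SummaryListener(stomp.ConnectionListener):
    def on_message(self, frame):                 # newer stomp.py passes one frame object
        record = json.loads(frame.body)          # assume JSON payloads
        print(record.get("site"), record.get("bytes_read"))

conn = stomp.Connection([("amq-broker.example.cern.ch", 61613)])
conn.set_listener("", SummaryListener())
conn.connect("user", "password", wait=True)
conn.subscribe(destination="/topic/xrootd.fax", id=1, ack="auto")
time.sleep(60)                                   # collect for one minute
conn.disconnect()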

Cost matrix
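The cost matrix is, in essence, a table of measured read rates from each FAX source to each analysis queue, used to decide where remote reading is affordable. A toy sketch of such a lookup (all names and numbers invented):

# Toy source->queue cost matrix; real values come from FAX measurements.
COST_MBPS = {
    ("MWT2", "ANALY_AGLT2_SL6"): 220.0,
    ("MWT2", "ANALY_SLAC"): 95.0,
    ("AGLT2", "ANALY_MWT2_SL6"): 240.0,
}

def best_source(queue, sources):
    # Pick the source with the highest measured rate into the queue.
    rate, src = max((COST_MBPS.get((s, queue), 0.0), s) for s in sources)
    return src if rate > 0 else None

print(best_source("ANALY_AGLT2_SL6", ["MWT2", "AGLT2"]))   # -> MWT2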

Overflow
Slowly expanding: BNL is still missing, even though the reverse proxy hardware is there.
Queues currently included:
ANALY_AGLT2_SL6    ANALY_INFN-T1
ANALY_CONNECT      ANALY_IN2P3-CC
ANALY_BU_ATLAS     ANALY_MPPMU
ANALY_MWT2_SL6     ANALY_DESY-HH
ANALY_OU_OCHEP     ANALY_QMUL_SL6
ANALY_SLAC         ANALY_SFU
Data from the rest of the EU cloud can't be used.

Sankey overflow plots – success

Sankey overflow plots – failures

Overflow – workload

Overflow – workload

Overflow – job efficiency

Overflow – CPU efficiency

Reactions
Up to now only two sites have noticed the overflows:
– TRIUMF: JEDI sent a lot of jobs to almost all US cloud sites, all reading from TRIUMF. This saturated their proxy (1 Gb/s); they have since upgraded it to 2 Gb/s.
– QMUL: Chris Walker noticed 5+ Gb/s at their NAT gateway, ~10 TB/day. Not a problem for now.
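As a sanity check on the QMUL numbers: ~10 TB/day averages out to just under 1 Gb/s, so the 5+ Gb/s figure is a peak rather than a sustained rate. The arithmetic:

# Average rate implied by ~10 TB/day of WAN reads.
tb_per_day = 10
avg_gbps = tb_per_day * 1e12 * 8 / 86400 / 1e9
print(f"~{avg_gbps:.2f} Gb/s average")   # ~0.93 Gb/s, vs the 5+ Gb/s peak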

Failover
Plot: jobs per 4 hours.