
Slide 1: Using FAX to test intra-US links
Ilija Vukotic, on behalf of the atlas-adc-federated-xrootd working group
Computing Integration and Operations meeting, February 19, 2014

Slide 2: Idea
US FAX infrastructure:
– all sites running the Rucio-enabled N2N
– all ready for tests
Check connectivity between sites using I/O-intensive jobs.
Check the bandwidth needs of a realistic physics analysis.

Slide 3: Connectivity tests
Test setup (a per-file sketch follows this list):
– 744 site-specific FDR files (2.7 TB)
– reading 10% of events with a 30 MB TTreeCache (TTC)
– jobs submitted using ATLAS Connect
– running at MWT2 UC only
– one job per file
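A minimal ROOT/C++ sketch of what one such per-file job might look like; the redirector URL, file path, and tree name are placeholders, since the actual test harness is not shown in the slides:

```cpp
// connectivity_probe.C -- a minimal sketch, not the actual test code.
// The redirector URL, file path, and tree name below are placeholders.
#include "TFile.h"
#include "TTree.h"

void connectivity_probe(const char *url =
        "root://fax-redirector.example.org//atlas/rucio/some.fdr.file.root")
{
   TFile *f = TFile::Open(url);               // direct xrootd read over the WAN
   if (!f || f->IsZombie()) return;

   TTree *t = nullptr;
   f->GetObject("CollectionTree", t);         // hypothetical tree name
   if (!t) { f->Close(); return; }

   t->SetCacheSize(30 * 1024 * 1024);         // 30 MB TTreeCache, as in the test
   t->AddBranchToCache("*", kTRUE);           // cache all branches

   const Long64_t n = t->GetEntries();
   for (Long64_t i = 0; i < n; i += 10)       // one way to touch ~10% of events
      t->GetEntry(i);

   f->Close();
}
```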

Slide 4: Connectivity tests – results – AGLT2, stress test 1

Slide 5: Connectivity tests – results – AGLT2, stress test 2 (stress jobs)

Slide 6: Connectivity tests – results – BNL

Slide 7: Connectivity tests – results – BNL
– We repeatedly tried to obtain higher bandwidth from BNL, using resources of MWT2 (UC, IU) + AGLT2 + Fresno.
– In the process we discovered and fixed several FAX endpoint configuration issues.
– Found that the BNL connection has a 10 Gbps limit.
– Will repeat the test once the 100 Gbps link is established.

Slide 8: Connectivity tests – results – BU

Slide 9: Connectivity tests – results – MWT2
Jobs running at IU reading from UC. Test performed before the bandwidth upgrade.

Slide 10: Connectivity tests – results – OU_OCHEP_SWT2

Slide 11: Connectivity tests – results – SWT2_CPB

Slide 12: Connectivity tests – results – WT2

Slide 13: Connectivity tests – summary
– Very low failure rate from all endpoints.
– Surprisingly small bandwidth from BU and WT2.

Slide 14: Realistic analysis tests
Friedrich's Higgs → WW analysis (cache settings sketched below):
– uses the RootCore framework
– all corrections applied
– reads 512 of 8543 branches (13% of the total size)
– writes out <1% of events plus a number of histograms/TTrees
– 10 MB TTC (suboptimal)
– default learning phase (100 events; suboptimal)
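A minimal sketch of the TTreeCache settings this slide lists, assuming a tree handle obtained elsewhere; the 512-of-8543 branch selection happens implicitly through whichever branches the analysis code actually reads:

```cpp
// cache_setup.C -- a minimal sketch of the cache settings listed above,
// not the actual RootCore analysis code.
#include "TTree.h"

void configure_cache(TTree *t)
{
   t->SetCacheSize(10 * 1024 * 1024);   // 10 MB TTC (the slide calls this suboptimal)
   t->SetCacheLearnEntries(100);        // ROOT's default learning phase: 100 entries
   // During the learning phase the cache watches which branches are used,
   // then prefetches only those (~13% of the file here) in large vectored reads.
}
```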

Slide 15: Realistic analysis tests – results
With the data file cached in memory:
– 1300 ev/s
– 20.2 MB/s of useful data (roughly 16 kB of useful data per event)
With the data on a local disk (single spindle, no other activity):
– 600 ev/s
– 8.6 MB/s of useful data
– CPU efficiency 58%

Slide 16: Realistic analysis tests – results
WAN access run at IU reading data from UC (5 ms RTT). Each job analyzes:
– 25 files
– 90.5 GB
– 832 kEvents
Slowly ramping up the number of concurrent jobs to see when we hit a bottleneck; each job's rate can be measured as sketched below.
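A sketch of how a single job's event rate and useful-data rate could be measured during the ramp-up, assuming an already-opened remote file and configured tree (not the actual harness used for these tests):

```cpp
// measure_rate.C -- a sketch of a per-job rate measurement: report ev/s
// and the MB/s actually pulled over the wire.
#include "TFile.h"
#include "TTree.h"
#include "TStopwatch.h"
#include <cstdio>

void measure_rate(TFile *f, TTree *t)
{
   TStopwatch sw;
   sw.Start();
   const Long64_t n = t->GetEntries();
   for (Long64_t i = 0; i < n; ++i)
      t->GetEntry(i);                   // the analysis loop would go here
   sw.Stop();

   const double secs = sw.RealTime();
   const double mb   = f->GetBytesRead() / (1024.0 * 1024.0);  // bytes read remotely
   printf("%lld events in %.1f s: %.0f ev/s, %.2f MB/s\n",
          n, secs, n / secs, mb / secs);
}
```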

Slide 17: Realistic analysis tests – results
[Results table: concurrent jobs vs. ev/s ± err and MB/s ± err; the numbers did not survive the transcript.]
– Basically flat up to … MB/s.
– Much larger scale needed to saturate the link now that we have confirmed 60 Gbps.
– Not much slower than local disk.
– With the default TTC and a shorter learning phase it should approach 10 MB/s. Will test.

Slide 18: Conclusions
– FAX endpoints at US sites are much more stable now; the stability and low failure rate make them good enough for direct-access use.
– The dedicated direct-access test was able to reach the link limits of most endpoints.
– The realistic analysis used less bandwidth (~5 MB/s) than the expected 10 MB/s.
– Will repeat the tests at larger scale, with and without further optimization.
– Will try to saturate the 100 Gbps MWT2–BNL link when it comes online.

Slide 19: Reserve

Slide 20: WAN testing in the DE cloud
Functional HammerCloud (HC) tests by Friedrich. We should expect similar numbers from the US.

Slide 21: WAN load test (200-job scale) – DE cloud
– Some uncertainty in the number of concurrently running jobs (not directly controllable).
– Indicates a reasonable opportunity for re-brokering.

Slide 22: WAN testing in the US cloud
Numbers are lower than in the DE cloud. Possible explanations:
– Diagonal elements: maybe we overprovision CPU at a larger scale than DE.
– Off-diagonal elements: suboptimal TTC size and learning phase result in a larger-than-optimal RTT penalty, and the geographical size of the US makes this price higher for the US cloud (see the model sketched below).
– …
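A back-of-the-envelope model of that RTT penalty, not code from the tests: each time the TTreeCache is exhausted the client pays roughly one round trip for the next vectored read, so a smaller cache means more stalls. The file size (~3.6 GB, i.e. 2.7 TB / 744 files) and the 50 ms RTT are assumptions; the UC–IU link above had only 5 ms.

```cpp
// rtt_penalty.C -- a back-of-the-envelope latency model, not test code.
// Assumptions: ~3.6 GB per file and a 50 ms representative US RTT.
#include <cstdio>

int main()
{
   const double fileMB = 3600.0;                 // assumed file size in MB
   const double rttMs  = 50.0;                   // assumed round-trip time
   const double caches[] = {10.0, 30.0, 100.0};  // TTreeCache sizes in MB

   for (double cacheMB : caches) {
      const double refills = fileMB / cacheMB;          // synchronous cache refills
      const double stallS  = refills * rttMs / 1000.0;  // seconds lost to latency
      printf("TTC %5.0f MB -> ~%4.0f refills, ~%5.1f s of RTT stalls per file\n",
             cacheMB, refills, stallS);
   }
   return 0;
}
```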

Slide 23: WAN load test (200-job scale) – US cloud
– Surprisingly low figures.
– Under investigation.