USATLAS Network/Storage and Load Testing. Jay Packard, Dantong Yu, Brookhaven National Lab.



2 Outline  USATLAS Network/Storage Infrastructures: the Platform for Performing Load Tests.  Load Test Motivation and Goals.  Load Test Status Overview.  Critical Components in Load Testing: Control, Monitoring, Network Monitoring, and Weather Maps.  Detailed Plots for Single vs. Multiple Host Load Tests.  Problems.  Proposed Solutions:  Network Research and Its Role in Dynamic Layer 2 Circuits between BNL and US ATLAS Tier 2 Sites.

3 BNL 20 Gig-E Architecture Based on Cisco 6513  20 Gbps LAN for the LHCOPN.  20 Gbps for production IP traffic.  Full redundancy: can survive the failure of any network switch.  No firewall for the LHCOPN, as shown by the green lines.  Two firewalls for all other IP networks.  The Cisco Firewall Services Module (FWSM), a line card plugged into the Cisco chassis with 5 x 1 Gbps capacity, allows outgoing connections.

4 dCache and Network Integration (diagram). Components shown: the 20 Gb/s HPSS Mass Storage System, dCache SRM and core servers, GridFTP doors (8 nodes), 2 x 10 Gb/s WAN links, the BNL LHC OPN VLAN and Tier 1 VLANs, write pools, the farm pool (434 nodes / 360 TB), the T0 export pool (>= 30 nodes), a new farm pool (80 nodes, 360 TB raw), Thumpers (30 nodes, 720 TB raw), ESnet, the load testing hosts, and the new Panda servers and Panda DB; logical connections (8 x 1 Gb/s and N x 1 Gb/s to the dCache pools) indicate the FTS-controlled and srmcp paths.

5 Tier 2 Network Example: ATLAS Great Lakes Tier 2

6 Need More Details of Tier 2 Network/Storage Infrastructures  We hope to see, in the site reports, architectural maps from each Tier 2 describing how the Tier 2 network is integrated with the production and testing storage systems.

7 Goal  Develop a toolkit for testing and viewing I/O performance at various middleware layers (network, GridFTP, FTS) in order to isolate problems.  Single-host transfer optimization at each layer.  120 MB/s is the ideal rate for memory-to-memory transfers and high-performance storage.  40 MB/s is the ideal rate for disk transfers to a regular worker node.  Multi-host transfer optimization for sites with 10 Gbps connectivity.  Starting point: a sustained 200 MB/s disk-to-disk transfer for 10 minutes between the Tier 1 and each Tier 2 is the goal (Rob Gardner).  Then increase disk-to-disk transfers to 400 MB/s.  For sites with a 1 Gbps bottleneck, we should max out the network capacity.
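A rough sanity check on these targets (simple unit conversion, not from the slides):
120 MB/s x 8 bits/byte ≈ 0.96 Gbps, essentially saturating a 1 Gbps NIC.
200 MB/s x 8 ≈ 1.6 Gbps, roughly 16% of a 10 Gbps WAN link.
400 MB/s x 8 ≈ 3.2 Gbps, roughly 32% of a 10 Gbps WAN link.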

Status Overview  A MonALISA control application has been developed for specifying single-host transfers: the protocol, duration, size, stream range, TCP buffer range, etc. It is currently run only by Jay Packard at BNL, but may eventually be run by multiple administrators at other sites within the MonALISA framework.  A MonALISA monitoring plugin has been developed to display current results in graphs. They are available in the MonALISA client and will soon be available on a web page.

Status Overview...  Have been performing single-host tests for the past few months.  Types:  Network memory to memory (using iperf)  GridFTP memory to memory (using globus-url-copy)  GridFTP memory to disk (using globus-url-copy)  GridFTP disk to disk (using globus-url-copy)  At least one host at each site has been TCP tuned, which has shown dramatic improvements at some sites in the graphs (e.g. 5 MB/s to 100 MB/s for iperf tests).  If a Tier 2 has 10 Gbps connectivity, there is a significant improvement for a single TCP stream, from 50 Mbps to close to 1 Gbps (IU, UC, BU, UMich).  If a Tier 2 has a 1 Gbps bottleneck, network performance can be improved with multiple TCP streams; simply tuning the TCP buffer size cannot improve single-stream performance due to bandwidth competition.  Discovered problems: dirty fiber, CRC errors on a network interface, and moderate TCP buffer sizes; details can be found in Shawn's talk.  Coordinating with Michigan and BNL (Hiro Ito, Shawn McKee, Robert Gardner, Jay Packard) to measure and optimize total throughput using FTS disk-to-disk transfers. We are trying to leverage high-performance storage (Thumper at BNL and Dell NAS at Michigan) to achieve our goal.
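The buffer-size effect above is the usual bandwidth-delay product argument (rough arithmetic, not from the slides; the 70 ms round-trip time is an assumed cross-country value): a single TCP stream can carry at most about window / RTT. Filling a 1 Gbps path at 70 ms needs roughly 1 Gbps x 0.07 s ≈ 8.75 MB of socket buffer, while a default buffer of a few hundred KB caps the same stream at a few MB/s, consistent with the 5 MB/s to 100 MB/s jump seen after tuning.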

MonALISA Control Application  Our Java class implements MonALISA's AppInt interface as a plug-in.  900 lines of code currently.  Does the following:  Generates and prepares source files for disk-to-disk transfers  Starts up the remote iperf server and the local iperf client, using globus-job-run remotely and ssh locally  Runs iperf or GridFTP for a period of time and collects the output  Parses the output for average and maximum throughput  Generates output understood by the monitoring plugin  Cleans up destination files  Stops the iperf servers  Flexible enough to account for heterogeneous sites (e.g., killing iperf is done differently on a managed fork gatekeeper; one site runs BWCTL instead of iperf). This flexibility requires frequently watching the application's output and augmenting the code to handle many circumstances.
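A minimal sketch of one of the steps above, launching a local iperf client and parsing the reported throughput. This is a hedged illustration, not the actual AppInt plug-in code: the IperfRunner class and runOnce method are hypothetical, and it assumes classic iperf output with a summary line ending in Mbits/sec or Gbits/sec.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the real MonALISA AppInt plug-in.
public class IperfRunner {
    // Matches e.g. "[  3]  0.0-120.0 sec  13.1 GBytes   938 Mbits/sec"
    private static final Pattern RATE =
        Pattern.compile("([0-9.]+)\\s+([MG])bits/sec");

    // Runs a local iperf client for 'seconds' against 'host' and returns Mbits/sec.
    public static double runOnce(String host, int streams, int seconds) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList(
            "iperf", "-c", host, "-P", String.valueOf(streams),
            "-t", String.valueOf(seconds)));
        pb.redirectErrorStream(true);
        Process p = pb.start();
        double mbps = 0.0;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m = RATE.matcher(line);
                if (m.find()) {
                    double v = Double.parseDouble(m.group(1));
                    mbps = m.group(2).equals("G") ? v * 1000 : v;  // keep the last summary value
                }
            }
        }
        p.waitFor();
        return mbps;
    }
}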

MonALISA Control Application...  Generates the average and maximum throughput during a 2-minute interval, which is required for the throughput to "ramp up".  Sample configuration for GridFTP memory-to-disk:
command=gridftp_m2d
startHours=4,16
envScript=/opt/OSG_060/setup.sh
fileSizeKB=
streams=1, 2, 4, 8, 12
repetitions=1
repetitionDelaySec=1
numSrcHosts=1
timeOutSec=120
tcpBufferBytes=
hosts=dct00.usatlas.bnl.gov, atlas-g01.bu.edu/data5/dq2-cache/test/, atlas.bu.edu/data5/dq2-cache/test/, umfs02.grid.umich.edu/atlas/data08/dq2/test/, umfs05.aglt2.org/atlas/data16/dq2/test/, dq2.aglt2.org/atlas/data15/mucal/test/, iut2-dc1.iu.edu/pnfs/iu.edu/data/test/, uct2-dc1.uchicago.edu/pnfs/uchicago.edu/data/ddm1/test/, gk01.swt2.uta.edu/ifs1/dq2_test/storageA/test/, tier2-02.ochep.ou.edu/ibrix/data/dq2-cache/test/, ouhep00.nhn.ou.edu/raid2/dq2-cache/test/, osgserv04.slac.stanford.edu/xrootd/atlas/dq2/tmp/

MonALISA Monitoring Application  A Java class that implements MonALISA's MonitoringModule interface.  Much simpler than the control application (only 180 lines of code).  Parses the log file produced by the control application, in the format (time, site name, module, host, statistic, value), for example:  , BNL_ITB_Test1, Loadtest, bnl->uta(dct00->ndt), network_m2m_avg_01s_08m, 6.42 (01s = 1 stream, 08m = TCP buffer size of 8 MB)  Data is pulled by the MonALISA server, which displays graphs on demand.
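As a rough illustration of that record format, the sketch below splits one comma-separated line into its six fields. It is a hedged example only; the LoadTestRecord class and its field names are hypothetical, not the actual MonitoringModule code.

// Hypothetical record for one load-test log entry; not the real MonitoringModule.
public class LoadTestRecord {
    public final String time, siteName, module, host, statistic;
    public final double value;

    public LoadTestRecord(String time, String siteName, String module,
                          String host, String statistic, double value) {
        this.time = time; this.siteName = siteName; this.module = module;
        this.host = host; this.statistic = statistic; this.value = value;
    }

    // Parses a line of the form: time, site name, module, host, statistic, value
    public static LoadTestRecord parse(String line) {
        String[] f = line.split(",");
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields: " + line);
        }
        return new LoadTestRecord(f[0].trim(), f[1].trim(), f[2].trim(),
                                  f[3].trim(), f[4].trim(),
                                  Double.parseDouble(f[5].trim()));
    }
}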

Single-host Tests. There are too many graphs to show them all, so two key graphs follow. For one stream:

Single-host Tests... For 12 streams (notice disk-to-disk improvement):

Multi-host tests  Using FTS to perform tests from BNL to Michigan initially, and then to other Tier 2 sites.  The goal is a sustained 200 MB/s disk-to-disk transfer for 10 minutes from the Tier 1 to each Tier 2; this can be in addition to existing traffic.  Trying to find the optimal number of streams and TCP buffer size by finding the optimum for a single-host transfer between two high-performance machines.  Low disk-to-disk, one-stream performance of 2 MB/s from BNL's Thumper to Michigan's Dell NAS, whereas one-stream iperf memory-to-memory gives 66 MB/s between the same hosts (Nov 21, 2007). Should this be higher for one stream?  Found that more streams give higher throughput, but one cannot use too many, especially with a large TCP buffer size, or applications will crash.  Disk-to-disk throughput is currently so low that a larger TCP buffer doesn't matter.

Multi-host Tests and Monitoring  Monitoring uses NetFlow graphs rather than MonALISA.  Some sites will likely require the addition of more storage pools and doors, each TCP tuned, to achieve the goal.
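Rough arithmetic (not from the slides) on why extra pools and doors help: if each tuned pool or worker-node disk sustains on the order of 40 MB/s (the per-node figure from the Goal slide), then the 200 MB/s goal needs roughly 200 / 40 = 5 pools or doors transferring in parallel, and about 10 for the later 400 MB/s target, assuming the WAN path and the aggregate disk back end are not the bottleneck.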

Problems  Getting reliable test results amidst existing traffic.  Each test runs for a couple of minutes and produces several samples, so hopefully a window exists when the traffic is low, during which the maximum is attained.  The applications could be changed to output the maximum of the last few tests (tricky to implement).  Use dedicated network circuits: TeraPaths.  Disk-to-disk bottleneck.  Not sure whether the problem is the hardware or the storage software (e.g. dCache, Xrootd). FUSE (Filesystem in Userspace), or a filesystem in memory, could help isolate storage-software degradation; Bonnie could help isolate hardware degradation.  Is there anyone who could offer disk performance expertise?  Discussed in Shawn McKee's presentation, "Optimizing USATLAS Data Transfers".  Progress is happening slowly due to a lack of in-depth coordination, scheduling difficulties, and a lack of manpower (Jay is at ~1/3 FTE). There is too much on the agenda at the Computing Integration and Operations meeting to allow for in-depth coordination.  Ideas for improvement?
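As a hedged illustration of separating hardware from storage-software effects, the sketch below times a plain sequential write to a local filesystem, bypassing dCache/Xrootd entirely. It is a crude stand-in for a real benchmark such as Bonnie, and the output path is a placeholder.

import java.io.File;
import java.io.FileOutputStream;

// Crude sequential-write probe; not the tool used by the team.
public class DiskWriteProbe {
    public static void main(String[] args) throws Exception {
        File target = new File(args.length > 0 ? args[0] : "/tmp/diskprobe.dat");
        int totalMB = 1024;                 // write 1 GB
        byte[] chunk = new byte[1 << 20];   // 1 MB buffer

        long start = System.nanoTime();
        try (FileOutputStream out = new FileOutputStream(target)) {
            for (int i = 0; i < totalMB; i++) {
                out.write(chunk);
            }
            out.getFD().sync();  // flush to disk so the page cache doesn't flatter the result
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Wrote %d MB in %.1f s: %.1f MB/s%n",
                          totalMB, seconds, totalMB / seconds);
        target.delete();
    }
}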

TeraPaths and Its Role in Improving Network Connectivity between BNL and US ATLAS Tier 2 Sites  The problem: support efficient, reliable, and predictable petascale data movement in modern high-speed networks.  Multiple data flows with varying priority.  Default "best effort" network behavior can cause performance and service disruption problems.  Solution: enhance network functionality with QoS features to allow prioritization and protection of data flows.  Treat the network as a valuable resource.  Schedule network usage (how much bandwidth, and when).  Techniques: DiffServ (DSCP), PBR, MPLS tunnels, dynamic circuits (VLANs).  Collaboration with ESnet (OSCARS) and Internet2 (DRAGON) to dynamically create end-to-end paths and dynamically forward traffic into these paths. The software is being deployed to US ATLAS Tier 2 sites.  Option 1: Layer 3: MPLS tunnels (UMich and SLAC).  Option 2: Layer 2: VLANs (BU, UMich, demonstrated at SC'07).
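To illustrate the DiffServ technique named above: a flow's packets carry a DSCP codepoint so routers along the path can prioritize them. The sketch below is only a hedged example of marking from the application side in Java; TeraPaths itself configures marking and policing in the network devices, and the host and port here are placeholders (the BNL load-test host and the default iperf port).

import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal illustration of DSCP marking on a TCP flow; not TeraPaths code.
public class DscpMarkingExample {
    public static void main(String[] args) throws Exception {
        // DSCP "Expedited Forwarding" is codepoint 46; the IP TOS byte
        // carries the DSCP in its upper six bits, so shift left by 2.
        int dscpEF = 46;
        int tos = dscpEF << 2;  // 184

        try (Socket s = new Socket()) {
            // Request the traffic class before connecting; the OS may
            // ignore or override it depending on privileges and site policy.
            s.setTrafficClass(tos);
            s.connect(new InetSocketAddress("dct00.usatlas.bnl.gov", 5001), 5000);
            // ... transfer data over the marked connection ...
        }
    }
}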

Northeast Tier 2 Dynamic Network Links

Questions?