Point-to-point Architecture topics for discussion

Remote I/O as a data access scenario
Remote I/O is a scenario that, for the first time, puts the WAN between the data and the executing analysis programs.
– Previously, the data was staged to the site where the compute resources were located, and data access by programs was from local, or at least site-resident, disks.
– Inserting the WAN is a change that potentially requires a virtual circuit service to ensure the smooth flow of data between disk and computing system, and therefore the "smooth" job execution needed to make effective use of the compute resources. (Though it may be that raw bandwidth is a bigger issue – see below.)

Remote I/O as a data access scenario
The simplistic model is that each RIO operation involves setting up a circuit between the compute system and the data system.
– If circuits are set up and torn down when remote files are opened and closed, and if the example is typical, then circuit duration is short – of order 10 minutes.
– This is almost certainly impractical just based on the number of jobs being executed today.
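A back-of-envelope sketch of why per-file circuits look impractical. The job counts and file-open rates below are illustrative assumptions, not measured values; only the "order 10 minutes" circuit lifetime comes from the slide.

```python
# Rough estimate of circuit churn if every remote file open creates a circuit.
# concurrent_jobs and files_per_job_per_hour are assumed values for illustration.

concurrent_jobs = 50_000          # assumed concurrently running analysis jobs
files_per_job_per_hour = 6        # assumed remote file opens per job per hour
circuit_lifetime_min = 10         # "of order 10 minutes" per circuit (from the slide)

setups_per_hour = concurrent_jobs * files_per_job_per_hour
setups_per_second = setups_per_hour / 3600
concurrent_circuits = setups_per_hour * circuit_lifetime_min / 60

print(f"Circuit setups per second: {setups_per_second:.0f}")               # ~83/s
print(f"Concurrent circuits at steady state: {concurrent_circuits:,.0f}")  # ~50,000
# Tens of circuit setups per second and tens of thousands of simultaneous
# circuits is far beyond what per-circuit provisioning is designed to handle.
```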

Remote I/O as a data access scenario
Regarding the potential intensity and density of circuit usage, what is the potential clustering and what are its parameters?
– Is analysis organized around chunks of interesting data, such that a fairly large number of analysis jobs will work on that data for some limited period of time?
– If so, then we would expect a 1 x N sort of clustering, where "1" is the data set and "N" is the set of locations of compute resources that will operate on that data. It is likely that N is small-ish – maybe of order 10?
– What is the duration of this "cluster"? That is, how long will the jobs spend processing such a "chunk" of data?
– How many simultaneous clusters can we expect? That is, how many chunks of data are being analyzed simultaneously?
– How many different sites will be involved? (Presumably all of the Tier 2 sites.)
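If the 1 x N clustering model holds, a rough sizing of circuit demand follows directly from these parameters. The values below are placeholders for discussion, not measurements; N = 10 simply echoes the "maybe of order 10" guess above.

```python
# Rough sizing of circuit demand under the 1 x N clustering model.
# simultaneous_clusters and cluster_duration_hours are assumed values.

simultaneous_clusters = 20   # assumed number of data "chunks" analyzed at once
sites_per_cluster = 10       # N: compute locations reading each chunk ("order 10")
cluster_duration_hours = 12  # assumed lifetime of one cluster

concurrent_circuits = simultaneous_clusters * sites_per_cluster
setups_per_day = concurrent_circuits * 24 / cluster_duration_hours

print(f"Concurrent circuits: {concurrent_circuits}")    # 200
print(f"Circuit setups per day: {setups_per_day:.0f}")  # 400
# A few hundred longer-lived circuits is a very different (and far more
# tractable) regime than the per-file-open case on the previous slide.
```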

Remote I/O as a data access scenario
Mix of remote I/O and bulk data transfer
– What will be the mix of RIO-based access and bulk data transfer?
– The change to remote I/O is, to a certain extent, aimed at lessening the use of the bulk data transfers that use GridFTP, so is optimizing for GridFTP addressing a dying use case?
– How much bulk transfer will there be in a mature RIO scenario, and between what mix of sites?

Remote I/O as a data access scenario
Performance
– Some of the ATLAS initial testing using HTTP as the remote I/O protocol shows a 40x decrease in program I/O throughput compared to local disk.
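One way to read the 40x figure: the impact on total job time depends on how much of the job's wall time is spent in I/O. A simple Amdahl-style estimate, with assumed I/O fractions (not ATLAS measurements), is sketched below.

```python
# Amdahl-style estimate of how a 40x I/O slowdown affects total job time.
# The I/O wall-time fractions are illustrative assumptions, not ATLAS measurements.

io_slowdown = 40.0  # from the slide: remote I/O ~40x slower than local disk

def job_slowdown(io_fraction: float) -> float:
    """Overall slowdown if only the I/O portion of the job runs io_slowdown times slower."""
    return (1 - io_fraction) + io_fraction * io_slowdown

for io_fraction in (0.05, 0.20, 0.50):
    print(f"I/O fraction {io_fraction:.0%}: job runs {job_slowdown(io_fraction):.1f}x slower")
# 5% -> ~3.0x, 20% -> ~8.8x, 50% -> ~20.5x: how tolerable remote I/O is
# depends heavily on how I/O-bound the analysis actually is.
```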

Remote I/O as a data access scenario
– Is this example typical of the amount of data consumed by an analysis program?
– Is this typical of the amount of data read compared to the amount of CPU time used?
– Will circuits be used primarily to secure bandwidth? (If so, this exercise may be more about the available underlying network bandwidth than about circuits, per se.)
– If circuit setup fails, does the program proceed and automatically fall back to the VRF that the computing system is embedded in (or to the general IP infrastructure)? (A sketch of such a fallback is given below.)
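A minimal sketch of what that fallback might look like from the job or framework side, under the assumption that there is some circuit-reservation call to try first. reserve_circuit() and open_remote() are hypothetical placeholder names, not an existing provisioning or data-access API; the only point is that the job proceeds over the routed VRF / general IP path when no circuit can be obtained.

```python
# Hypothetical fallback from a reserved circuit to the routed VRF / general IP
# path. reserve_circuit() and open_remote() are placeholder names used for
# illustration, not a real provisioning or data-access API.

import logging

class CircuitUnavailable(Exception):
    """Raised when the circuit service cannot satisfy the reservation."""

def reserve_circuit(src_site: str, dst_site: str, gbps: float) -> str:
    """Placeholder: ask a (hypothetical) circuit service for a path; returns an ID."""
    raise CircuitUnavailable("no capacity between endpoints")  # always fails in this demo

def open_remote(url: str, circuit_id=None):
    """Placeholder: open a remote file, optionally pinned to a circuit."""
    return f"handle({url}, circuit={circuit_id})"

def open_with_fallback(url: str, src: str, dst: str, gbps: float):
    try:
        circuit = reserve_circuit(src, dst, gbps)
        logging.info("using dedicated circuit %s", circuit)
    except CircuitUnavailable:
        # The job continues over the VRF / general IP infrastructure rather than failing.
        logging.warning("circuit setup failed; falling back to routed path")
        circuit = None
    return open_remote(url, circuit)

print(open_with_fallback("root://t2.example/data.root", "BNL", "T2-example", 2.0))
```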

Remote I/O as a data access scenario
Interactions of VRF and circuits (? – maybe an operations issue)
– It may be that aggregations of compute and disk resources feeding a small number of long-lived circuits (site-to-site, cluster-to-cluster, etc.) could meet all user requirements. This would reduce the "intensity" of use of the service and obviate the need for users to deal directly with circuits.
– On the other hand, could the existing LHCONE VRF environment make it difficult to aggregate resources in this way? By way of example, in ESnet many OSCARS circuits are used for interconnecting routers that, at the site end, aggregate resources, e.g. by routing for a cluster on a LAN. This is the sort of thing that might be hard to do if that cluster is also accessed via the VRF, because of how the address space is managed.

Point-to-point Architecture topics for discussion

The LHCOPN as a circuit scenario
Requirements:
– guaranteed delivery of data over a long period of time, since the source at CERN is essentially a continuous, real-time data source;
– long-term use of substantial fractions of the link capacity;
– a well-understood and deterministic backup / fail-over mechanism;
– guaranteed capacity that does not impact other uses of the network;
– a clear cost model with long-term capacity associated with specific sites (the Tier 1s);
– a mechanism that is easily integrated into a production operations environment (specifically, the LCG trouble ticket system), that monitors circuit health, has established troubleshooting and resolution responsibilities, and provides for problem tracking and reporting.

The LHCOPN as a circuit scenario
There has been discussion of moving the LHCOPN to a virtual circuit service for several reasons:
– VCs can be moved around on an underlying physical infrastructure to better use available capacity and, potentially, to provide greater robustness in the face of physical circuit outages;
– VCs have the potential to allow sharing of a physical link when the VC is idle or used below the committed bandwidth.
Therefore these are requirements (?) for circuits:
– Topological flexibility
– A circuit implementation that allows sharing of the underlying physical link: that is, bandwidth committed to, but not used by, a circuit is available for other traffic.

The LHCOPN as a circuit scenario
Other useful semantics
– Although the virtual circuits are rate-limited at the ingress to hold utilization to what the user requested, they are permitted to burst above the allocated bandwidth if idle capacity is available. This must be done without interfering with other circuits or with other uses of the link, such as general IP traffic, for example by marking the over-allocation bandwidth as low-priority traffic. (A sketch of this sharing policy is given below.)
– A user can request a second circuit that is diversely routed from the first circuit, in order to provide high reliability via a backup circuit.
Why is this interesting?
– With the rise of a general infrastructure that is 100G per link, using dedicated 10G links for T0 – T1 becomes increasingly inefficient.
– Shifting the OPN circuits to virtual circuits on the general (or LHCONE) infrastructure could facilitate sharing while meeting the minimum required guaranteed OPN bandwidth.
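A minimal Python model of the "committed rate plus opportunistic burst" semantics described in the first bullet above. It is an abstract policy sketch only, not the LHCOPN or any router implementation; the circuit names, committed rates, and demands are assumed example values.

```python
# Toy model of "committed rate + opportunistic burst" sharing on one link.
# Not an LHCOPN implementation; just the policy from the slide: each circuit
# is guaranteed its committed rate, and otherwise-idle capacity may be borrowed
# as low-priority (burst) traffic without hurting other circuits.

def allocate(link_gbps: float, demands: dict, committed: dict):
    """Return per-circuit [guaranteed, burst] allocations in Gb/s."""
    alloc = {}
    # 1. Every circuit first gets min(demand, committed rate) as guaranteed bandwidth.
    for name, want in demands.items():
        alloc[name] = [min(want, committed[name]), 0.0]
    idle = link_gbps - sum(g for g, _ in alloc.values())
    # 2. Idle capacity is shared among circuits that still want more, carried
    #    as low-priority burst traffic that could be preempted at any time.
    wanting = {n: demands[n] - alloc[n][0] for n in demands if demands[n] > alloc[n][0]}
    while idle > 1e-9 and wanting:
        share = idle / len(wanting)
        for name, extra in list(wanting.items()):
            grant = min(extra, share)
            alloc[name][1] += grant
            idle -= grant
            wanting[name] -= grant
            if wanting[name] <= 1e-9:
                del wanting[name]
    return alloc

# Example: 100G link, three T0-T1 circuits with 20G committed each (assumed values).
print(allocate(100, {"T1-A": 35, "T1-B": 10, "T1-C": 20},
               {"T1-A": 20, "T1-B": 20, "T1-C": 20}))
# T1-A gets its 20G guarantee plus 15G of low-priority burst; the rest stays idle.
```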

Point-to-point Architecture topics for discussion

Cost models, allocation management
The reserved bandwidth of a circuit is a scarce commodity, and this commodity must be manageable from the point of view of the network providers.
– What sorts of manageability does a user community require?
– What does the user community need to control in terms of circuit creation?