
1 Networking for High Energy Physics and US LHCNet
In partnership with ESnet, Internet2, Starlight, SURFnet and GLIF
Harvey Newman (Caltech, PI), Artur Barczyk (Caltech, Lead Engineer), David Foster (CERN, Co-PI)
US LHCNet Briefing at CERN, July 8, 2013

2 Discovery of a Higgs-like Boson, July 4, 2012
Theory: 1964; LHC + experiments concept: 1984; construction: 2001; operation: 2009
US LHCNet, a highly reliable, high-capacity transatlantic network, had an essential role in the discovery

3 US LHCNet Background: a 25+ Year Collaboration with CERN
- 1982: Caltech initiated transatlantic networking for HEP; 1982-85: first HEP experience with packet networks (US-DESY)
- 1985-86: Networks for Science: NSFNET, IETF, National Academy Panel
- 1986: Assigned by DOE to operate LEP3Net, the first US-CERN leased-line, multiprotocol network for HEP (9.6-64 kbps)
- 1987-88: Convinced IBM to provide the first T1 transatlantic link US-CERN ($3M/yr)
- 1989-1995: Upgrades to LEP3Net (X.25, DECnet, TCP/IP): 64-512 kbps
- 1996-2005: USLIC Consortium (Caltech, CERN, IN2P3, WHO, UNICC) based on 2 Mbps links, then ATM
- 1997: Hosted a visit by the Internet2 CEO; CERN became Internet2's first international member
- 1996-2000: Created the LHC Computing Model (MONARC); Tier2 concept
- 2002-present: HN serves on ICFA as Chair of the SCIC: network issues, roadmaps
- 2006-present: US LHCNet, co-managed by CERN and Caltech; links at 2.5G, then 10G, then 2, 4, 6 10G links. Resilient service.
- Spring 2006-present: Caltech took over the primary operation and management responsibility, including the roadmaps and periodic RFPs

4 LHC Data Grid Hierarchy: A Worldwide System Invented and Developed at Caltech (1999)
[Diagram: the tiered LHC data grid, from the online system (~PByte/sec) and the CERN Tier 0+1 center (PBs of disk, tape robot) through 11 Tier1 and 160 Tier2 centers plus ~300 Tier3 centers, down to Tier4 institute workstations; link speeds range from ~300-1500 MBytes/sec at the experiment to 1-10 Gbps, 10 to N x 10 Gbps, and 10-40-100 Gbps on the core data networks, with 100s of Petabytes stored by 2012; example sites shown include Chicago, London, Paris, Taipei and CACR]
A global dynamic system; a new generation of networks: LHCONE, ANSE. Synergy with US LHCNet: state-of-the-art data networks.

5 The Core of LHC Networking: LHCOPN and Partners
Dark fiber core among 19 countries: Austria, Belgium, Croatia, Czech Republic, Denmark, Finland, France, Germany, Hungary, Ireland, Italy, Netherlands, Norway, Slovakia, Slovenia, Spain, Sweden, Switzerland, UK
Plus ESnet and NRENs in Europe and Asia; Internet2, NLR, Latin America, Australia/NZ
US LHCNet and ESnet: simple and highly reliable, for Tier0 and Tier1 operations (LHCOPN, GEANT)

6 Networking for the Future of Science
ESnet and the OSCARS Virtual Circuit Service: Motivation, Design, Deployment and Evolution of a Guaranteed Bandwidth Network Service (CERN, June 10, 2010)
William E. Johnston, Senior Scientist (wej@es.net); Chin Guok, Engineering and R&D (chin@es.net); Evangelos Chaniotakis, Engineering and R&D (haniotak@es.net)
Energy Sciences Network, www.es.net, Lawrence Berkeley National Lab
Presented at the First LHCONE Workshop (June 2010), which was originated by Caltech and CERN

7 What Drives ESnet's (and US LHCNet's) Network Architecture, Services, Bandwidth, and Reliability?
For details, see endnote 3. (W. Johnston, ESnet)

8 ESnet Response to the Requirements (2010)
1. Design a new network architecture and implementation optimized for very large data movement; this resulted in ESnet4 (built in 2007-2008 and described above)
2. Design and implement a new network service to provide reservable, guaranteed bandwidth; this resulted in the OSCARS virtual circuit service
Paralleled by the US LHCNet Layer 1+2 implementation. ESnet and US LHCNet work together to provide this, with US LHCNet in the primary role for robust, resilient transatlantic capacity.

9 Log Plot of ESnet Monthly Accepted Traffic, January 1990 - December 2012 (W. Johnston, G. Bell)
[Chart: actual monthly accepted traffic (Terabytes/month, log scale) with an exponential fit and a 12-month projection; milestones at 1 TBy/mo (Oct 1993), 10 TBy/mo (Jul 1998), 100 TBy/mo (Nov 2001), 1 PBy/mo (Apr 2007) and 12 PBy/mo (Dec 2012), spaced roughly 38-57 months apart]
The long-term growth trends are unlikely to change. ESnet traffic has increased 10x every 4.25 years for 20+ years; the remarkable historical trend continued in 2012. 12 PBytes/mo in ~Dec 2012 is equal to 40 Gbps continuous. Projection to 2016: 100 PBy/mo. Average annual growth: 72%. (A short consistency check follows below.)
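As a cross-check of the quoted figures (a back-of-the-envelope calculation added here, not taken from the slide itself), the growth rate and the "40 Gbps continuous" equivalence are mutually consistent:

```latex
% 10x growth every 4.25 years, expressed as an average annual rate r:
r = 10^{1/4.25} - 1 \approx 0.72 \;\;\Rightarrow\;\; \text{about } 72\%\ \text{per year}

% 12 PBytes/month expressed as a continuous rate:
\frac{12\times 10^{15}\,\text{B} \times 8\,\text{bit/B}}{30 \times 86400\,\text{s}}
  \approx 3.7\times 10^{10}\,\text{bit/s} \approx 40\ \text{Gbps}
```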

10 W. Johnston, ESnet Manager (2008), on circuit-oriented network services:
- Traffic isolation
- Security
- Deadline scheduling
- High utilization
- Fairness

11 Why Mission Orientation?
- In the same way, US LHCNet plays the crucial role of understanding the LHC requirements and providing a technically sound and cost-effective solution
- Based on in-house expertise in networking and LHC computing
- Interfacing with the R&E networking community at large
- DOE and CERN have long recognized the immense value of a mission-oriented network, and why this cannot simply be outsourced to a general network provider
(W. Johnston, 2010, Workshop on Transatlantic Networks for HEP)

12 Large-Scale Flows are Handled by Circuits, Using OSCARS Software by ESnet and Collaborators
[Chart: ESnet accepted traffic 2000-2012, with traffic on circuits shown in PBytes/month]
Half of the 100 Petabytes accepted by ESnet in 2012 was handled by virtual circuits with guaranteed bandwidth. ESnet's response to traffic growth and large flows: circuits with bandwidth guarantees.
ATLAS traffic: +70% in 2012 over 2011; CMS traffic: +45%

13 Grid-based analysis in Summer 2012: >1000 different users; >45M analysis jobs. Excellent Grid performance was crucial for the fast discovery of the "Higgs-like" boson.
Total throughput of ATLAS data in 2012: 12.8 GBytes/s (102 Gbps) peak; 11 GBytes/s peak for HCP; 7.9 GBytes/s peak for ICHEP. 2012 vs. 2011: +70% average, +180% peak. Design: ~2 GBytes/s.

14 CMS Data Transfer Volume (Feb. 2012 - Jan. 2013)
42 PetaBytes transferred over 12 months = 10.6 Gbps average (~25 Gbps peak). 2012 versus 2011: +45%. Higher trigger rates and larger events (pileup) mean larger data flows in 2015.
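As a quick check of the quoted average rate (a back-of-the-envelope conversion, not part of the original slide):

```latex
\frac{42\times 10^{15}\,\text{B} \times 8\,\text{bit/B}}{365 \times 86400\,\text{s}}
  \approx 1.07\times 10^{10}\,\text{bit/s} \approx 10.6\ \text{Gbps}
```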

15 The OSCARS L3 Fail-Over Mechanism, FNAL to CERN (normal operating state)
[Diagram: BGP-routed paths between FNAL and CERN across US LHCNet and ESnet SDN nodes (AoA, St1, Ch1, F1, F2): primary-1 at 10G, primary-2 at 10G, and a 3G backup]

16 US LHCNet Peak Utilization: Many Peaks of 5 to 9+ Gbps (for hours) on each link

17 US LHCNet and ESnet: Strong Synergy and Complementarity
- The same service requirements that drive ESnet drive US LHCNet, recognizing that US LHCNet works in the more difficult transatlantic context: guaranteed bandwidth, very high uptime, resilient operation
- DOE-mandated metric: 99.95+% service uptime
- Experience and field tests in 2006-07 showed conclusively that a stable topology in a multilink network cannot be maintained by Layer 2 methods (e.g. MPLS and VLSR) in the presence of "flapping" links
- US LHCNet therefore investigated the market (Nortel, Ciena) and in 2007 engineered a far more robust and cost-effective solution: Layer 1 (optical) + Layer 2 (switched), with minimal necessary routing
- This solution offers much higher performance and seamless resilience (invisible to the users) in the case of single or multiple link outages
- It has been shown to be much lower cost than carrier-class routers, which can only "emulate" Layer 1, non-seamlessly
- Evaluated in depth in a 2010 comparative review: US LHCNet was selected on the basis of field-proven performance, experience and cost
(Single-link downtimes: >10x greater on average; multiple-link outages: 100x greater, or more)

18 Accomplishing the US LHCNet Missions: R&D Programs
- US LHCNet's R&D program has the medium- and long-term goal of integrating emerging network technologies with the experiments' workflows
  - Controlling and treating networks as a schedulable resource, along with computing and storage
  - For more predictable, higher-efficiency operations in ATLAS and CMS
- LHCONE: the LHC Open Network Environment
  - A global collaboration between the R&E network providers and the LHC community
  - A field of application for software-defined networks
- DYNES: an NSF-funded project to create a CyberInstrument of dynamic circuits linking ~50 campuses to Internet2, US LHCNet and other R&E networks
- OLiMPS (OpenFlow Link-layer Multipath Switching)
  - Developing an OpenFlow-based switch fabric to interconnect network nodes or endpoints with high performance, over multiple paths
- ANSE (Advanced Network Services for Experiments): the Caltech US LHCNet team, Michigan, Vanderbilt and UT Arlington
  - Overall goal: raise the operational efficiency of data distribution and analysis in both ATLAS and CMS

19 LHCONE in Brief
- In a nutshell, LHCONE was born out of the 2010 transatlantic workshop at CERN to address two main issues:
  - To ensure that the services to the science community maintain their quality and reliability, with a focus on Tier2/3 operations
  - To protect existing R&E infrastructures against the potential "threats" of very large data flows
- LHCONE is expected to:
  - Provide some guarantees of performance: large data flows across managed bandwidth, providing better determinism than shared IP networks
  - Segregate LHC traffic from competing flows; limit the impact on general-purpose networks and their cost
  - Manage capacity as (number of sites) x (max flow per site) x (number of flows) increases
  - Provide ways to better utilize resources, using all available resources, especially transatlantic
  - Provide traffic engineering and flow management capability
  - Leverage investments being made in advanced networking

20 LHCONE Initial Architecture: the Basic Idea at 30,000 ft (LHCOPN Meeting, Lyon, February 2011)
[Diagram: sets of Open Exchange Points interconnecting the participating networks]

21 LHCONE: A Global Infrastructure for LHC Tier1 Data Center - Tier2 Analysis Center Connectivity (Bill Johnston, ESnet; US LHCNet)
[Map: the LHCONE VPN domain spanning ESnet, Internet2, CANARIE, GEANT and the European NRENs (NORDUnet, DFN, GARR, RedIRIS, RENATER, SARA), Asian networks (ASGC, TWAREN, KREONET2, KISTI), CUDI (Mexico) and TIFR (India); end sites are LHC Tier2 or Tier3 centers unless indicated as Tier1 (e.g. BNL-T1, FNAL-T1, TRIUMF-T1, NDGF-T1, DE-KIT-T1, CNAF-T1, PIC-T1, NIKHEF-T1, CC-IN2P3-T1, ASGC-T1, CERN-T1); regional R&E communication nexus points in Seattle, Chicago, New York, Washington, Amsterdam and Geneva; data communication links of 10, 20 and 30 Gb/s. See http://lhcone.net for details.]

22 LHCONE in Internet2: Flows to/from US Sites (Aggregate LHCONE VRF Traffic)
[Chart: Internet2 LHCONE VRF traffic, averaging ~5 Gbps; compare the US LHCNet sustained average of 13 Gbps, Jan. 21-27, 2013]
VRF issues: complexity, scale limitations; load balancing on aggregated links (LAG) is not effective. (Dale Finkelson, LHCONE Meeting, January 2013)

23 DYNES: Dynamic Circuits Nationwide (Created by Caltech; Led by Internet2)
- DYNES will transition to full-scale operations in 2013
- DYNES is extending circuit capabilities to ~50 US campuses; this turns out to be nontrivial
- It will be an integral part of the point-to-point service in LHCONE
- Partners: Internet2, Caltech, Michigan, Vanderbilt; working with ESnet on dynamic circuit software
http://internet2.edu/dynes

24 ANSE: Advanced Network Services for Experiments
- NSF-funded, 4 US institutes: Caltech, Vanderbilt, Michigan, UT Arlington; a US ATLAS / US CMS collaboration
- Goal: provide more efficient, deterministic workflows
- Method: interface advanced network services, including dynamic circuits, with the LHC data management systems: PanDA in (US) ATLAS, PhEDEx in (US) CMS
- Includes key personnel for the data production systems: Kaushik De (PanDA lead), Tony Wildish (PhEDEx lead)

25 Accomplishing the US LHCNet Missions: External Collaborations with Internet2, ESnet and GLIF
- US LHCNet (and its precursors) have been close partners with Internet2 since Internet2's launch in 1996, and with ESnet since 1998
- A common services portfolio for transatlantic network operations was worked out between ESnet, Internet2, GEANT, CANARIE and US LHCNet (Caltech and CERN) in 2009-12, in the context of the DICE collaboration
- US LHCNet participates in the Global Lambda Integrated Facility (GLIF), which promotes the "lightpath" paradigm for data-intensive science
- GLIF resources include GLIF Open Lightpath Exchanges (GOLEs), open exchange points where R&E networks can interconnect with lambdas
- US LHCNet is collocated with 4 GOLEs and uses them as exchange points to interconnect with some of its partners: MANLAN in New York, Starlight in Chicago, NetherLight in Amsterdam, and CERNLight in Geneva

26 Accomplishing the US LHCNet Missions: External Collaboration with Internet2
- Our team partnered closely with Internet2 in its early days in 1996, and helped trigger its global involvement in international networks in 1997, based on earlier joint work on the NSFNet in 1986-1993
- Internet2's strong support for the LHC program is rooted in this partnership
- Internet2 is now the principal partner in the US for Tier2 and Tier3 connectivity, as the principal national backbone for R&E networking in the US, together with National LambdaRail
- Many of the LHC Tier2 and Tier3 centers in the US are connected to the Internet2 backbone through regional networks
- Caltech and US LHCNet in particular have a long-standing track record of collaboration with Internet2 on R&D projects, through Internet2's support of the NSF-funded UltraLight network in 2003-10, and now also through direct collaboration in the NSF-funded DYNES project led by Internet2 and Caltech
- The US LHCNet management team is also part of Internet2's policy and direction-setting processes, through its Network Policy, Operations and Architecture Group (Newman) and its Network Technical Advisory Council (Newman and Barczyk)

27 Accomplishing the US LHCNet Missions: A Strategic and Tactical "Systems" Approach
- Develop, evolve and maintain cost-effective network capacity, equipment and topology roadmaps
  - Track usage and outage experience, market trends and cost drivers such as available and foreseen transatlantic capacity and competition
  - Track emerging network technologies and standards, equipment trends and vendor plans
- Develop cost-effective implementation and operational plans following the roadmaps, to meet annual milestones
  - Issue RFPs, then acquire, deploy and test new circuits
  - Upgrade system equipment and operations as needed: capacity; models
  - Plan, test and deploy next-generation equipment as needed
- Develop and integrate new network and high-throughput technologies as needed, to meet advancing needs; occasionally create new concepts
- Develop and maintain excellent vendor and R&E network partner relationships; engage in joint next-generation technology projects to gain experience, validate roadmaps, and obtain the best prices
- Negotiate to optimize deployments within a given cost envelope

28 HEP Networking and US LHCNet: Summary
- The US LHCNet team and its principals brought international networking to the field of HEP, and have developed and maintained the concepts, roadmaps and implementations since 1982
- We also developed the LHC Computing Model, global autonomous real-time systems, and the state of the art in long-range data transport
- The team continues to shape policy and architecture in the ICFA SCIC, Internet2 NPOAG and NTAC, as well as in R&D projects and field trials
- Dedicated mission-oriented networks are the central elements of the network engineering solutions that have successfully served DOE-supported science for decades, including ESnet and US LHCNet, most notably those serving the LHC program
- CERN created the LHCOPN for Tier0-Tier1 operations; more recently, Caltech and CERN have taken the lead roles in creating and developing LHCONE, supporting Tier2 and Tier3 operations, in close partnership with ESnet and Internet2 [dynamic circuits; SDN]
- The Caltech team continues to hold a unique lead role in ongoing and near-future advanced network developments serving HEP, through the DYNES, OLiMPS and ANSE projects, and the SC conferences

29 HEP Networking and US LHCNet: Relationships
- The US LHCNet team has developed the necessary skills and expertise in the networking area over the last 25+ years
- It has constructed long-term relationships and built credibility with many key bodies on a global scale
- The services and relationships that US LHCNet provides to the LHC experiments are strategically important:
  - A focused, single voice towards all the network suppliers
  - A technical implementation for transatlantic networking proven to provide such a high level of service that major issues with specific cable systems over the years have had little to no impact on the service delivered
- The current planning for the evolution of the US LHCNet infrastructure is driven by the requirements of service reliability and by market trends in the transatlantic network field
- It is a plan that continues to develop the engagement and relationships of US LHCNet on behalf of the LHC experiments on the one hand, and the required technical and managerial skills of the team on the other
- These relationships are an essential, strategic asset, built over many years, to respond effectively to both rapid and longer-term changes in networking

30 HEP Networking and US LHCNet: Vendor Independence
- It is important to note that US LHCNet, as a service provider to the LHC experiments as well as a leading team member of the HEP community, is entirely dedicated to the experiments' needs
- US LHCNet is therefore entirely unbiased by other agendas
- While it may be possible for other entities to construct similar infrastructure, the loss of supplier independence would represent a large risk
- Example: the CERN LHCOPN activity in Europe was put under considerable pressure by one supplier to cede complete control of the network to them. CERN resisted this, and as a result a much richer and more cost-effective network infrastructure emerged, with multiple suppliers investing independently to innovate
- We have the same situation today with US LHCNet, in that we are able to maintain diversity and thus ensure the best and most innovative solutions from a range of suppliers
- We believe this to be an important strategic asset for the future

31 HEP Networking and US LHCNet: Way Forward, Strategy and Risk
- Strategy: we have developed a carrier-class operational model with cost-effective robustness features essential for transoceanic networking. It was remarkably successful during LHC Run 1, and it is the production-ready model for Run 2 at 13 TeV, with potential new discoveries
- Risk: as experts in the field we have investigated and tracked possible alternatives continually, especially since 2005. No alternative provides an acceptable level of robustness, matching the operational needs, within a feasible budget envelope
- Innovation: we have led the way in terms of innovation, bringing the high-throughput and global system tools proven by us, and some by ESnet on a US footprint, into a larger context and to a broader community: in US LHCNet itself as well as in the DYNES, OLiMPS and ANSE projects and LHCONE
- Training: our strategic and systems view, and the creation of state-of-the-art infrastructures and methods, have been fertile ground for training the next generation of leading network and systems engineers
- Openness: we are ready to consider other models, as long as they are mature enough to be viable candidates for production use in the next round
- It is important to recognize the technical and strategic realities in judging possible solutions. Here again, US LHCNet's (Caltech's and CERN's) experience and expertise is crucial

32 US LHCNet Status and Plan
Harvey Newman (PI), Artur Barczyk (Lead Engineer), California Institute of Technology
US LHCNet Briefing at CERN, July 8, 2013

33 US LHCNet: Introduction
- A transatlantic mission-oriented network managed by Caltech and CERN since 2005 [plus the LEP3Net experience 1985-1995 and USLIC 1995-2005]
- In partnership with ESnet, Internet2, NLR, SURFnet, GEANT and the NRENs in Europe
- Funded by DOE/OHEP with a contribution from CERN (in-kind)
- Mission statement: to provide resilient, cost-effective transatlantic networking adequate to support the principal needs of the LHC physics program, with a focus on the US
- Technical missions:
  (1) Provide highly reliable, dedicated, high-bandwidth connectivity among the US Tier1 centers and CERN [uptime goal: 99.95%], in the context of the LHCOPN
  (2) Support the large data flows between US and European Tier1 and Tier2 centers, according to the US LHCNet AUP
  (3) From mid-2010: support the new LHC computing models, including more general Tier2 and Tier3 operations, in the context of LHCONE

34 US LHCNet Implementation
- US LHCNet is a multi-vendor, multi-layer (Layer 1, 2, 3) network
  - Six path-diverse transatlantic links on five undersea cables, with terrestrial interconnects in the US (NY-CHI) and Europe (GVA-AMS)
  - Core equipment: Ciena optical multiplexers, Force10/Brocade switch routers
  - Offering fully resilient, seamless services to the LHC experiments
- A real-time system designed for non-stop operation
  - A carefully managed set of circuits, virtualized across multiple undersea cables with automated fallback, provides graceful degradation in the case of single or multiple outages
  - Real-time monitoring and some automated operations (MonALISA)
- US LHCNet Network Operations Center (NOC):
  - 24x7x365 coverage: on-call 3-line support; office hours in CET and PDT
  - Distributed NOC: main locations at CERN/Geneva and Caltech/Pasadena
  - A small, talented team: 4 engineers with a full range of skills
  - NOC engineers are able to perform tasks from any location worldwide
- Equipment and US PoP diversity mitigates the effects of equipment or site outages

35 US LHCNet Activities
- Production operations and support: 24x7x365
- Pre-production development and deployment
- Technical coordination and administration
- Management, planning, architectural design

36 US LHCNet Activity 1: Operations and Support
The primary activity: 24x7x365 operations and support of a highly reliable (seamlessly resilient at Layer 1), high-performance production network service, including:
- Equipment configuration, troubleshooting and maintenance
- Configuring and maintaining routes and peerings with all the major R&E networks
- Monitoring of peak and average traffic; precisely logging and tracking throughput and downtimes; troubleshooting
- Interaction with the physics computing groups; strict monitoring of the performance of network transfers, as well as handling various requests and trouble tickets from users
- Integration with the LHCOPN and LHCONE
- Periodic capacity and equipment upgrades as needed

37 US LHCNet Activities: Pre-Production Development and Deployment
- A pre-production infrastructure for network and Grid development
- Prepare each year for the transition to the production configuration of the next cycle (every 1-3 years)
  - Commission and test new equipment and methods, then move them into production
  - Keep up with the emergence of more cost-effective technologies and/or new standards
- The process includes:
  1. Demonstrating, and in some cases optimizing, the reliability and performance of new architectures and software in short-term field tests, and then
  2. Completing longer-lasting production-readiness tests prior to release of the new technologies as part of the next-round production service
- SC conferences may be the first occasion for large-scale tests, with strong vendor support and little or no cost to the US LHC program

38 US LHCNet Activities: Technical Coordination and Administration (1)
- Consists of the day-to-day oversight of the team's operations and development activities, and technical responsibility for project milestones and deliverables
- Excellent coordination is also required, given the number of partners and the variety of network-intensive activities by the LHC experiments and associated Grid projects
- The Caltech and CERN network teams also have a central role in the coordinated planning, evaluation and development of both the present and next-generation transatlantic network solutions
- In cooperation with our partner teams at the DOE labs, Internet2, ESnet, NLR, regional networks, leading US universities (Michigan, Nebraska, Vanderbilt, University of Florida, FIU, MIT and many others), and the research and education network teams of Canada (CANARIE), the Netherlands (SURFnet), GEANT and other international partners

39 US LHCNet Activities: Technical Coordination and Administration (2)
The administration task also includes:
1. Negotiating contracts with telecom providers and hardware manufacturers, leveraging R&D activities to lower costs where possible
2. Formulating and managing the Requests for Proposals for the US LHCNet circuits
3. Maintaining and periodically renewing the contracts for equipment maintenance, network interconnections and peerings (where these entail charges)
4. Maintaining and periodically renewing the contracts for collocation of our equipment at our New York, Chicago and Amsterdam PoPs

40 US LHCNet Activities: Management, Planning and Architectural Design
This includes:
1. Overall management of the team, its evolution, training and professional development
2. Developing and implementing the strategy and planning for US LHCNet operations and development, in consultation with our partners
3. Examining technology options, making design choices, and developing the architecture and site designs at our PoPs (CERN, Starlight, MANLAN, and Amsterdam) for the next upgrade of the network
4. Tracking and evaluating current requirements for transatlantic networking, and preparing roadmaps projecting future requirements, with input from the HEP user and network communities, while also taking current and emerging technology trends into account
5. Preparing funding proposals and reviews; reviewing and updating technical coordination plans, including the major milestones
6. Developing relationships and joint R&D programs with partner programs, as well as with leading network equipment and circuit vendors as appropriate, to maximize the overall benefit to the U.S. and international HEP community within a given funding envelope

41 Working Methodology
- Production network: networks for HEP research; LHC and other experiments (LHCOPN, LHCONE); Grid applications (e.g. WLCG, OSG, DISUN, ANSE); interconnection of US and EU Grid domains; high performance, high bandwidth, reliable network; HEP and DOE roadmaps
- Testbed for network services development (pre-production), used to develop and build the next-generation networks: transatlantic testbed to 100G; lightpath technologies (DYNES, OESS/PSS, OSCARS, DRAC, AutoBAHN); new transport protocols; Software Defined Networking; Ultralight / Station / Terapaths / PlaNetS / OLiMPS / ANSE; vendor partnerships; SC02-13
- The R&D efforts are tailored to the specific needs of the HEP community, with direct feedback into the high-performance production network

42 US LHCNet Status Summary
- US LHCNet continues to provide a highly available, reliable, dedicated high-capacity network in support of the LHC program
- Joint work with ESnet: a similar approach, but US LHCNet operates the more difficult transoceanic parts of the paths to the US Tier1s
- In 2013 we transitioned to a new 2-year contract with Level 3 for all 6 circuits, resulting in an annual cost savings of $476k, or 36%
- Robust operation since 2007, meeting the uptime goal of 99.95% for the highest-priority services, has been achieved with the current hardware
  - This was accomplished through careful equipment and architectural design and implementation, in-depth monitoring and automated fallback
  - The LHC experiments were thus shielded from frequent undersea transatlantic cable outages; this was especially crucial at the time of the Higgs discovery
- We hope and expect there will be new discoveries in 2015-17
  - The data flow challenge will be greater: there will be larger events (more pileup) and higher trigger rates, and the DAQ systems have been upgraded
  - The challenge will be both highly reliable network capacity and throughput
- US LHCNet's multifaceted expertise will play an important role

43 US LHCNet Near-Term Outlook: 2013-15
- Refresh the US LHCNet core with current-generation equipment
  - Layer 2: Force10 to Brocade switch routers in 2013-14
  - Layer 1: Ciena CoreDirectors to 5410 Converged Packet Optical platforms; testing in 2013-14, transition to production in 2014-15
  - Both the Layer 1 and Layer 2 equipment support OpenFlow
  - In time for the LHC restart, by the winter of 2014-15
- Remain at the same circuit capacity (6 10G links) until there is a clear need
- Work towards more efficient use of the available transatlantic network resources, with other major network providers in LHCONE and with the LHC experiments in ANSE
  - Layer 1 will continue to provide robust services at the best cost
  - SDN will be explored to provide smarter management of the services
- As now, US LHCNet will remain the main network capacity resource for the LHC experiments

44 US LHCNET STATUS TODAY

45 US LHCNet
- US LHCNet has been designed to deliver the required levels of service availability (99.95%) on transatlantic routes
- Resilience has been engineered into the US LHCNet architecture to ensure non-stop operation of the service, based on:
  - Adequate path diversity
  - Equipment redundancy
  - PoP diversity: two strategically located PoPs on each side of the Atlantic
  - Explicit path- and PoP-diverse backup paths for each of the US Tier1 centers
[Figure: US LHCNet basic connectivity diagram]

46 US LHCNet in 2012/2013: Non-stop Operation; Circuit-Oriented Services
[Network diagram with annotations:]
- Performance-enhancing standard extensions: VCAT, LCAS
- US LHCNet, ESnet, BNL and FNAL: equipment and link redundancy
- Core: optical multiservice switches that provide resilience [*]
- [*] Dynamic circuit-oriented network services with bandwidth guarantees, with robust fallback at Layer 1: a hybrid optical network
- (One element new since May 2013)

47 Today's Core Network Technology
- Multiservice switches (Ciena CoreDirectors), aka optical multiplexers
  - Layer 1/2; SONET (line) and Ethernet (client) interfaces
  - VCAT + LCAS: flexible mapping of SONET time slots to Ethernet ("Layer 1.5")
  - Allows hitless capacity adjustment and additional protection against transatlantic link failures
- Layer 2 / Layer 3: Force10 TeraScale switch/routers
  - Traffic aggregation
  - IP peerings with partner networks
  - Routing of management traffic
(A minimal sketch of the VCAT/LCAS idea follows below.)
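To illustrate the VCAT/LCAS idea described above, here is a simplified conceptual model; the member granularity, per-member rate and class names are illustrative assumptions, not the Ciena implementation:

```python
# Conceptual sketch of a VCAT group with LCAS-style member management.
# Rates and behavior are simplified for illustration; not vendor code.

STS3C_GBPS = 0.15  # approx. payload of one STS-3c member (assumption)

class VcatGroup:
    """An Ethernet client carried over a group of SONET members."""
    def __init__(self, name, members):
        self.name = name
        self.members = set(members)          # active time-slot members

    def capacity_gbps(self):
        return len(self.members) * STS3C_GBPS

    def add_member(self, slot):
        # LCAS lets members join without disturbing traffic on the others,
        # so capacity grows "hitlessly".
        self.members.add(slot)

    def fail_members(self, failed):
        # On a transatlantic link failure, only the members routed over that
        # link are removed; the group keeps running at reduced capacity.
        self.members -= set(failed)

if __name__ == "__main__":
    grp = VcatGroup("CERN-FNAL-primary", [f"sts3c-{i}" for i in range(57)])
    print(f"nominal capacity: {grp.capacity_gbps():.2f} Gbps")
    grp.fail_members([f"sts3c-{i}" for i in range(40, 57)])   # partial outage
    print(f"degraded capacity: {grp.capacity_gbps():.2f} Gbps")
```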

48 Current Capacity Allocation Summary

Tier0-Tier1 and Tier1-Tier1 (primary, secondary, special requests):
- CERN-FNAL (Geneva-Chicago): 2 x 0.9 OC-192 = 2 x 8.567 Gbps
- CERN-BNL (Geneva-New York): 2 x 0.9 OC-192 = 2 x 8.567 Gbps
- FNAL-FZK (Chicago-Amsterdam): 0.2 OC-192 = 2.100 Gbps
LHCONE Tier1/2/3:
- New York-Amsterdam: 1.0 OC-192 = 9.620 Gbps
Tier1 backup, GPN and other peerings:
- GPN / FNAL backup (Geneva-New York): 0.4 OC-192 = 4.208 Gbps
- GPN / BNL backup, FNAL-TIFR (Geneva-Chicago): 0.4 OC-192 = 4.208 Gbps
Total allocation: 5.6 OC-192 links = 54.404 Gbps

Today: very little capacity (6.7%, < 4 Gbps) remains for protection. (A quick arithmetic check follows below.)
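A quick arithmetic check of the table above (a sketch only; the ~9.62 Gbps usable payload per OC-192 link is taken from the table itself, and the six-link total from the network description earlier in this briefing):

```python
# Sketch: sum the allocations in the table and compare with the total
# usable transatlantic capacity (6 links x ~9.62 Gbps OC-192 payload).
allocations_gbps = {
    "CERN-FNAL (2 circuits)":       2 * 8.567,
    "CERN-BNL (2 circuits)":        2 * 8.567,
    "FNAL-FZK":                     2.100,
    "LHCONE Tier1/2/3":             9.620,
    "GPN / FNAL backup":            4.208,
    "GPN / BNL backup, FNAL-TIFR":  4.208,
}
total_alloc = sum(allocations_gbps.values())            # ~54.4 Gbps
total_capacity = 6 * 9.620                              # six OC-192 links
headroom = total_capacity - total_alloc
print(f"allocated: {total_alloc:.3f} Gbps of {total_capacity:.2f} Gbps")
print(f"headroom for protection: {headroom:.2f} Gbps")  # < 4 Gbps, as on the slide
```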

49 Evolution of Utilization, 2009-2013
[Chart: link utilization history; note that the graphical presentation uses ~2-day averages]

50 Supporting Important Special Cases
- Large but poorly connected sites need special solutions, e.g. TIFR
- In collaboration with CERN, TIFR is cross-connected to US LHCNet via a 1 Gbps FNAL-TIFR point-to-point circuit
- Benefits to FNAL, TIFR, and the CMS collaboration at large
- This is an archetype of the (static) point-to-point service in LHCONE
[Chart: FNAL-TIFR traffic flow. Another important point-to-point circuit, FNAL/DE-KIT, is not shown]

51 MonALISA Real-Time Monitoring Toolset
- 1 minute to create an SLA report
- Customized real-time weathermap
- Real-time and historical link availability
- Real-time and historical utilization graphs
- Precise and extremely robust; also supports Grid dashboards and operations

52 Transatlantic Service Availability Issues
- Transatlantic links are more complex than purely terrestrial ones
  - Longer distance: more fibre, more spans, more equipment
  - Typically constructed from segments from multiple owners
  - Submarine segments: hostile environment, hard access
- Comparative remarks on outages:
  - Terrestrial spans: shorter Time To Repair (TTR), measured in hours; "easier access" for repairs, but also for diggers, copper thieves, ...
  - Submarine segments: less frequent outages, but much longer TTR, measured in days and sometimes weeks; "difficult access" means longer repair times. Potential hazards: ship anchors, trawlers, geological events, sharks. Repair speed depends on the time to arrival of the repair fleet and the problem location; weather conditions can stall repairs for days to weeks
- High-availability transoceanic solutions therefore require multiple links with carefully planned path redundancy

53 Link Availability: Typical Uptimes 97-99.5%; Bad Cases Do Occur
[Chart: measured availability of individual links, mostly in the 97.3-99.95% range, with one bad case at 83.11%]
Outages are unavoidable; good design is needed to minimize the impact on service levels.

54 Service Definitions
US LHCNet implements services at Layer 1 (SONET), Layer 2 (Ethernet) and Layer 3 (IPv4 and IPv6). Through the underlying operation at Layer 1 and the use of the SONET standard extensions VCAT and LCAS, US LHCNet is able to provide a variety of service levels to its users and partner networks. These can be classified as:
- Protected: guaranteed bandwidth through linear or mesh protection
- Partially protected: reduced bandwidth in the case of an outage, through the use of VCAT/LCAS and mesh protection
- Unprotected: full bandwidth normally; zero in the case of failure
All of these classes can be provided for virtual circuits (at Layer 2) or IP transit connections (at Layer 3), in combination with bandwidth requirements. Service levels are agreed with the users or peering partners of the service.

55 High Service Availability: Impact and Expected [*] Duration of Multiple Simultaneous Link Outages

1 of 6 transatlantic links failed:
- Impact on US Tier1 services (primary and secondary): NO IMPACT; service protected, 16.8 Gbps operational per Tier1
- Impact on Tier2 and other unprotected services: degraded, but operational
- [*] Calculated expected average duration within one year: 95 days/year
2 of 6 links failed:
- US Tier1 services: degraded, available bandwidth per Tier1: 8.4-16.8 Gbps
- Tier2 and other unprotected services: degraded OR not operational
- Expected average duration: 7 days/year
3 of 6 links failed:
- US Tier1 services: degraded, at least 8.4 Gbps (i.e. one primary virtual circuit) available per Tier1
- Tier2 and other unprotected services: degraded OR not operational
- Expected average duration: 5 hours/year

The small protection capacity in US LHCNet is enough to protect the highest-priority services against single link outages. (A sketch of the kind of estimate involved follows below.)
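The expected durations above can be estimated with a simple independence model; this is a sketch only, the per-link unavailability used here is an assumption chosen to be in the range of the measured uptimes on the previous slide, and the actual calculation presumably accounts for correlated outages and repair times, so it will not reproduce the slide's numbers exactly:

```python
from math import comb

# Sketch: expected time per year with at least k of the 6 transatlantic
# links down simultaneously, assuming independent links with the same
# unavailability q (an illustrative assumption, not the actual model).
N_LINKS = 6
q = 0.045               # ~95.5% per-link availability (assumed)
HOURS_PER_YEAR = 365 * 24

def p_at_least(k, n=N_LINKS, q=q):
    """Probability that at least k of n links are down at once."""
    return sum(comb(n, i) * q**i * (1 - q)**(n - i) for i in range(k, n + 1))

for k in (1, 2, 3):
    hours = p_at_least(k) * HOURS_PER_YEAR
    print(f">= {k} links down: ~{hours:8.1f} hours/year (~{hours/24:.1f} days)")
```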

56 New Circuit Leases: Cost Reduction Without Reduction in Robustness
- In August 2012, a new circuit lease contract was signed with Level(3), in response to a bundle offer and after competitive due diligence
- Highlights:
  - Continue the 3 existing Level(3) circuits
  - Add 3 new circuits, replacing the previous T-Systems circuits
  - Circuit routing designed to meet two conditions: no site shall be isolated due to a single event, and a single failure shall not impact more than 2 circuits
  - Two-year contract term per circuit, starting from the circuit acceptance date: GVA-CHI and AMS-NYC on November 15, 2012; GVA-NYC on January 23, 2013; ongoing circuits on September 15, 2013
  - Bundle pricing started at the agreed commit date (September 15, 2013)
- Circuit cost reduction of 36%: from $1,320k to $844k per year (including the NLR NYC-CHI circuit)

57 Strong Collaboration with CERN
- CERN has been our partner in US LHCNet for many (~18) years
- CERN's direct in-kind contributions today: two 10 Gbps waves on the Geneva-Amsterdam dark fibre, and a /24 IPv4 subnet
- CERN also provided new shared equipment in Amsterdam: in May 2013, CERN replaced the Force10 E600 router in Amsterdam with a new Brocade MLXe16 switch router
  - It belongs to CERN and is co-managed with Caltech
  - US LHCNet uses a number of ports (16) in Layer 2 mode
  - It can be used in OpenFlow mode, on a per-port basis
  - CERN provides the hardware and maintenance costs; US LHCNet provides the co-location cost (space + power)

58 SYNERGISTIC PROJECTS: ANSE, DYNES, LHCONE, OLiMPS

59 ANSE Objectives and Approach
- Integrate advanced network-aware tools into the mainstream production workflows of ATLAS and CMS
  - Use tools and deployed installations where they exist
  - Extend the functionality of the tools to match the experiments' needs
  - Identify and develop tools and interfaces where they are missing
- Deterministic, optimized workflows
  - Use network resource allocation along with storage and CPU resource allocation in planning data and job placement
  - Use information about the network that is as accurate as possible to optimize workflows
- Improve overall throughput and task times to completion
- Build on several years of invested manpower, tools and ideas (some since the MONARC era, circa 2000-2006)

60 ANSE Tool Categories
- Monitoring (alone): allows reactive use, reacting to "events" (state changes) or situations in the network
  - Throughput measurements; possible actions: (1) raise an alarm and continue, (2) abort/restart transfers, (3) choose a different source
  - Topology (plus site and path performance) monitoring; possible actions: (1) influence source selection, (2) raise an alarm (e.g. in extreme cases such as site isolation)
- Network control: allows pro-active use
  - Reserve bandwidth dynamically: prioritize transfers, remote access flows, etc.
  - Co-scheduling of CPU, storage and network resources
  - Create custom topologies: optimize the infrastructure to match operational conditions (deadlines, work profiles), e.g. during LHC running periods vs. reconstruction/re-distribution
(A minimal sketch of a reactive source-selection policy follows below.)
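To make the "reactive use" idea above concrete, here is a minimal sketch of a monitoring-driven source-selection policy; the threshold, site names and measurement feed are illustrative assumptions, not part of ANSE or its actual interfaces:

```python
# Sketch of a reactive transfer policy driven by monitoring data.
# Thresholds and the measurement feed are illustrative assumptions.

MIN_ACCEPTABLE_MBPS = 200     # below this, the transfer is considered stalled

def pick_source(replica_sites, measured_throughput_mbps):
    """Choose the replica site with the best recently measured path throughput."""
    return max(replica_sites, key=lambda s: measured_throughput_mbps.get(s, 0.0))

def react(current_source, replica_sites, measured_throughput_mbps, alarm):
    """Reactive policy: alarm, restart from the same source, or switch source."""
    rate = measured_throughput_mbps.get(current_source, 0.0)
    if rate >= MIN_ACCEPTABLE_MBPS:
        return current_source                  # (1) nothing to do, keep going
    alarm(f"low throughput from {current_source}: {rate:.0f} Mbps")
    best = pick_source(replica_sites, measured_throughput_mbps)
    if best != current_source:
        return best                            # (3) choose a different source
    return current_source                      # (2) abort/restart from same source

if __name__ == "__main__":
    rates = {"SiteA": 90.0, "SiteB": 850.0, "SiteC": 400.0}  # e.g. from perfSONAR
    src = react("SiteA", list(rates), rates, alarm=print)
    print("transfer now sourced from", src)
```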

61 ANSE Current Activities
- Initial sites: UMich, UTA, Caltech, Vanderbilt, CERN, UVic
- Monitoring information for workflow and transfer management
  - Define the path characteristics to be provided to FAX and PhEDEx, on an NxN mesh of source/destination pairs
  - Use information gathered in perfSONAR
  - Use LISA agents to gather end-system information
- Dynamic circuit systems
  - Working with DYNES at the outset: monitoring dashboard, full-mesh connection setup and bandwidth tests
- Deploy a prototype PhEDEx instance for development and evaluation of integration with network services
- Potentially use LISA agents for pro-active end-system configuration

62 LHCONE Activities
There are 4 (originally 5) activity areas today:
1. VRF-based IP service: a shared, general-purpose-style multipoint service, with logical separation from non-LHC traffic
2. Layer 2 point-to-point: development of a dynamic-circuit service for LHC data movement. Subtask (the main thrust today): develop a working prototype environment between a set of selected end-sites, spanning multiple domains
3. Software-Defined Networking (SDN): coordinate SDN/OpenFlow activities related to LHC data movement; requires R&D; led by Caltech
4. Diagnostic infrastructure: each site to have the ability to perform end-to-end performance tests with all other LHCONE sites
Plus, overarching: investigate the impact of LHCONE dynamic circuits and OpenFlow on the LHC data analysis workflow

63 Re-organized LHCONE Activities
- Short term: VRF peering fabric
- Challenges: scalability, load balancing; no traffic management
- A better solution is needed: point-to-point circuits

64 Software Defined Networking (SDN) in a Nutshell
- SDN is a new paradigm being widely adopted by both major R&E networks and equipment manufacturers
- Simply put, SDN is based on the physical separation of the control and data planes
- The control plane is managed by an external software controller, typically running on a PC
- OpenFlow is a protocol between the controller entity and the network devices
- Mission: develop and apply SDN/OpenFlow to solve problems where no commercial solution exists
[Diagram: a controller application (on a PC) uses the OpenFlow protocol to program the flow tables (MAC src, MAC dst, ..., ACTION) of an OpenFlow switch in the data plane]
(A minimal sketch of the flow-table idea follows below.)
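The control/data-plane split can be illustrated with a toy flow-table model; this is a conceptual sketch only, not the OpenFlow wire protocol or any real controller's API, and the match fields and actions are illustrative:

```python
# Conceptual sketch: an external "controller" installs match/action rules
# into a switch's flow table; the switch only does table lookups.

from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Match:
    mac_src: Optional[str] = None        # None acts as a wildcard
    mac_dst: Optional[str] = None

    def hits(self, pkt):
        return ((self.mac_src is None or pkt["mac_src"] == self.mac_src) and
                (self.mac_dst is None or pkt["mac_dst"] == self.mac_dst))

@dataclass
class Switch:
    """Data plane: matches packets against rules pushed by the controller."""
    flow_table: list = field(default_factory=list)   # (priority, Match, action)

    def install(self, priority, match, action):      # controller -> switch
        self.flow_table.append((priority, match, action))
        self.flow_table.sort(key=lambda rule: -rule[0])

    def forward(self, pkt):
        for _, match, action in self.flow_table:
            if match.hits(pkt):
                return action
        return "send-to-controller"                   # table miss

# Control plane: an external program decides the rules.
sw = Switch()
sw.install(100, Match(mac_dst="aa:bb:cc:dd:ee:01"), "output:port1")
sw.install(10,  Match(),                            "drop")
print(sw.forward({"mac_src": "x", "mac_dst": "aa:bb:cc:dd:ee:01"}))  # output:port1
print(sw.forward({"mac_src": "x", "mac_dst": "aa:bb:cc:dd:ee:99"}))  # drop
```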

65 OLiMPS: OpenFlow Link-layer Multipath Switching
- A 2-year project funded by DOE OASCR; PI: Harvey Newman; Technical Lead: Artur Barczyk; Lead Developer: Michael Bredel
- Research focus: efficient interconnection between nodes in a network
- First use case: a solution to the LHCONE multipath problem
- Allows per-flow multipath switching, which increases robustness, increases efficiency, and simplifies the management of Layer 2 network resources
- Constructs a robust multipath system without modifications to the Layer 2 frame structure, using central out-of-band software control
- A big plus: using OpenFlow, there is no need for new hardware or feature support (other than OpenFlow itself)
- Caveat: coding is required, and it is not for the faint-hearted (no, we cannot just buy a controller)

66 A Possible OpenFlow Use Case in LHCONE: Solving the Multipath Problem
- Initiated by Caltech and SARA; now continued by Caltech with SURFnet
  - Caltech: the OLiMPS project (DOE OASCR), implementing multipath control functionality using OpenFlow
  - SARA: investigations of the use of MPTCP
- Basic idea: flow-based load balancing over multiple paths
  - Initially: use a static topology and/or bandwidth allocation (e.g. NSI)
  - Later: comprehensive real-time information from the network (utilization, topology changes) as well as an interface to applications; MPTCP on end-hosts
- Demonstrated at GLIF 2012, SC'12 and TNC 2012
[Diagram: multiple paths interconnecting Geneva, Amsterdam and Chicago]

67 OLiMPS: An Example of Interesting Results (Michael Bredel, Caltech)
- Local experimental setup: 5 link-disjoint paths, 5 OpenFlow switches, 1 to 10 parallel transfers
- A single transfer (with multiple files) takes approximately 90 minutes; file sizes between 1 and 40 GByte (Zipf-distributed), 500 GByte in total; exponentially distributed inter-transfer waiting times
- Compared 5 different flow-mapping algorithms
- Application-aware or number-of-flows path mapping gives the best results
(A minimal sketch of a number-of-flows path mapper follows below.)
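As an illustration of the "number-of-flows" style of path mapping mentioned above (a simplified sketch only; the path names and data structures are illustrative assumptions, and this is not the OLiMPS controller code):

```python
# Sketch: map each new flow to the path currently carrying the fewest flows.
# Illustrates "number-of-flows" path mapping over link-disjoint paths.

from collections import defaultdict

class FlowMapper:
    def __init__(self, paths):
        self.paths = list(paths)                      # e.g. 5 link-disjoint paths
        self.flows_on_path = defaultdict(set)

    def assign(self, flow_id):
        """Pick the least-loaded path (by active flow count) for a new flow."""
        path = min(self.paths, key=lambda p: len(self.flows_on_path[p]))
        self.flows_on_path[path].add(flow_id)
        return path

    def release(self, flow_id):
        """Remove a finished flow so its path becomes less loaded again."""
        for flows in self.flows_on_path.values():
            flows.discard(flow_id)

if __name__ == "__main__":
    mapper = FlowMapper([f"path-{i}" for i in range(1, 6)])
    for f in range(8):                                # 8 parallel transfers
        print(f"flow {f} -> {mapper.assign(f)}")
```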

68 Next Steps: the OLiMPS/OSCARS Interface
- The user (or application) requests a network setup from the OLiMPS controller
- OLiMPS requests the setup of multiple paths from the OSCARS IDC
- OLiMPS connects the OpenFlow switches to the OSCARS termination points, i.e. VLANs
- OLiMPS transparently maps the site traffic onto the VLANs

69 Applying OLiMPS in US LHCNet Operations
- Currently building the US LHCNet OpenFlow pre-production environment
  - OLiMPS controller
  - Using Dell Z9000 switches with OpenFlow 1.0 firmware
  - Initially separate from the production environment
  - Gradually add production traffic after the development and initial test phase
- End target: include the Brocade MLXe16 and Ciena 5410 OpenFlow capabilities
  - Additional Ciena modules are required; kindly provided on loan by Internet2

70 US LHCNET PLANS: The Planned Evolution

71 US LHCNet Refresh: Equipment and Methods for 2013-18
- Time scale: 2013-2014, in time for the LHC restart at 13 TeV
- Mission and major technical requirements:
  - Robust, flexible, scalable and cost-effective
  - Low overhead, simple management
  - Equipment able to support the emerging technologies of 2013-20
  - Build on current and emerging technologies and methods, opening the way to new, innovative capabilities of benefit to the LHC program
  - Integrated into, and part of, a global system: the LHCONE fabric
  - Integrated into the LHC experiments' workload management systems
- The present US LHCNet core equipment is 6-7 years old
  - Layer 2: the Force10 switch/routers are out of maintenance (urgent; replacement starting now)
  - Layer 1: the CoreDirectors are declared Manufacture-Discontinued and will be End-of-Life'd (EOL'd) soon, but are supported for the next 5+ years (less urgent, but blocks >10 Gbps deployment)
- It is now time to refresh the Layer 1 and Layer 2 platforms and concepts

72 US LHCNet Refresh: Action Plan and Opportunities
- Remain at current capacity; be ready to increase if needed (according to the experiments' plans and/or projections) in time for the LHC restart
- The actions, building on current experience plus our R&D projects:
  - Layer 2 refresh in 2013-14: Brocade switch routers
  - Layer 1 refresh in 2014-15: replace the Ciena CoreDirectors with Ciena 5410 Converged Packet Optical platforms; testing in 2013-14
  - Capable of operations through ~2020, including 40/100G (then 400G)
  - Deploy new Software Defined Networking (SDN) techniques, in concert with our LHCONE partners; led by Caltech/US LHCNet in LHCONE
- Examples of major advantages in addition to Layer 1+2 functionality:
  - Network virtualization, used widely in data centers today: flexible and scalable solutions ("elastic cloud"); but inter-data-center, inter-cloud virtualization in the WAN requires new approaches (OpenFlow)
  - RDMA: very high-throughput, low-CPU-overhead data transfers (cf. SC12); requires Layer 2 end-to-end, and performance needs dedicated bandwidth

73 New US LHCNet Architecture: Basic Approach
- Keep the solid carrier-grade Layer 1 basis
  - Necessary for robust operation at Layer 2 and Layer 3; key for reaching high availability and seamless operation
  - The solid basis for transatlantic advanced operations and services
  - A hardware upgrade is needed to keep up with technology: 40 and 100 Gbps (Ethernet, WAN); OTN (which replaces SONET as the long-haul technology); and to replace aging (end-of-manufacturing) equipment
- Keep a solid but simplified Layer 2 and Layer 3 configuration
  - Centralize routing in one PoP
  - Routed IP will stay: for management, GPN traffic, and backup
  - Hardware upgrade: the current platform has reached End of Life (EOL); need support for 40/100G, VRF, etc.
- Move towards SDN solutions in the production network
  - Flow management based on the OLiMPS project
  - A robust controller solution is needed (we will contribute)

74 Transition to the New Architecture in 2013-2014
- Simplification and cost reduction through:
  - Consolidation/reduction of routers: routing done at a single PoP, in Geneva: 1 optical multiplexer + 1 core router + 1 (small) backup router
  - A next-generation aggregation switch platform: cheaper, small-form-factor (1U, 2U) 10/40/100GE-capable switches
[Diagrams: switching and routing, current vs. next generation; aggregation switches not shown]

75 Layer 2 and Layer 3 Services and Peerings
[Diagram; aggregation switches not shown]

76 Device Resiliency in Geneva (I)
- Full-fledged resiliency would require duplicate routers and optical switches at the Geneva PoP (CERN)
- A cost-effective way to have sufficient resiliency is to back up the main router with a small secondary router, with very few interfaces (4 or 8 10GE interfaces) but a full-sized CAM for the entire Internet routing table
- This configuration is:
  - Cost-efficient (~$10-20k)
  - Robust: loss of the main router will result in some degradation of the routed service, but no interruption
- Note: the main services in US LHCNet are high-capacity Layer 2 circuits

77 Device Resiliency in Geneva (II)
- The Ciena 5410 optical switch is a carrier-grade device, with built-in redundancy for each critical component: control modules, switching fabric, power supplies
- In the unlikely case of a complete device failure, we will profit from our collaboration with CERN and fall back on the 100 Gbps connection between CERN's switch/routers at CERN and Amsterdam
[Diagrams: normal operation and the (unlikely) fallback case]

78 Implementation Plan (2013-2015)
- Step 1 [Q3 2013]: Install the Brocade MLXe16 router at CERN; verify, prepare for transition
- Step 2 [Q3 2013]: Remove the Force10 switch/routers at CERN, MANLAN and Starlight; single routing instance, verify
- Step 3 [Q3 2013]: Install 2 Ciena 5410 optical nodes at Geneva and Chicago (evaluation agreement); transport (OTN) functionality only; run in parallel with the CoreDirector network
- Step 4 [Q4 2013]: Install Layer 2 (ESLM) modules in the 5410s; evaluate full Layer 1 and 2 functionality
- Step 5 [Summer 2014 - Winter 2014-15]: Install 2 Ciena 5410 optical nodes at New York and Amsterdam; complete the transition, possibly using higher-capacity interfaces (100G)

79 Manpower Table for 2013
- 4 FTEs are required; 2013 includes the start of the network refresh
- Also involvement in LHCONE, working with the NRENs and the LHC experiments

Name, role, tasks and 2013 FTE:
- Harvey Newman (PI), Principal Investigator: overall project management, supervision and planning [0.20]
- Artur Barczyk, Lead Engineer: operations management, planning, design and coordination: 1.00
- Azher Mughal, Network Engineer: operations, planning, deployment: 0.70
- Ramiro Voicu, Network Software Engineer: operations support, pre-production testing, monitoring/software development: 0.70
- Sandor Rozsa (until 03/31/2013), Network Engineer: operations, deployment: 0.25
- Indira Kassymkhanova (since 06/06/2013), Network Engineer: operations, deployment: 0.60
- Michael Bredel, Systems Engineer: operations support: 0.20
- Iosif Legrand, Software Architect and Network Engineer: monitoring/software development: 0.10
- New-hire engineer: operations, software development: 0.45
Total engineering: 4.0

80 2013 Expenditure Plan
The plan for 2013 is based on the required 6 transatlantic circuits and 4.0 FTEs.

Item and cost in k$:
- Circuit lease: 900
- Switching and routing equipment: 155
- Colocation and HW and SW maintenance: 234
- Salaries for network engineers and admin support (incl. overhead): 626
- Tooling, monitoring HW, technical and office supplies: 35
- Travel: 50
Total: 2,000

81 Cost Outlook Through 2016: Upgrade
- A rough estimate of the upgrade cost is $1.0M, based on:
  - Some known cost figures: device common-platform items and modules available today
  - Several estimates, e.g. modules with nearby release dates and the evolution of circuit costs
  - A particular capacity evolution scenario, keeping the total expense for circuits constant (more bang for the buck)
- This can be covered with a stable spending profile over the next 3-4 years

Proposed stable spending profile, costs/year in k$:
- 2010: 2,328; 2011-2012: 2,200; 2013: 2,000; 2014-2016: 2,200
- Of which for upgrade: 2013: 155; 2014: 324; 2015: 298; 2016: 271

- This scenario provides $155k for the Layer 2&3 refresh and $893k for the Layer 1 refresh: a total upgrade budget of $1.048M over a 4-year period
- The upgrade is starting this summer, to be mainly completed by the restart
- Other scenarios are open for discussion; this corresponds to an equipment refresh cycle of roughly every 7 years

82 Upgrade Schedule
We can do the hardware refresh just in time for the LHC restart:
- Step 1 [Q3 2013]: Install the Brocade MLXe16 router at CERN
- Step 2 [Q3 2013]: Remove the Force10 switch/routers at CERN, MANLAN and Starlight
- Step 3 [Q3 2013]: Install 2 Ciena 5410 optical nodes at Geneva and Chicago
- Step 4 [Q4 2013]: Install Layer 2 (ESLM) modules in the 5410s
- Step 5 [Q2 2014]: Install 2 Ciena 5410 optical nodes at New York and Amsterdam
This plan requires us to start on its implementation as soon as possible. We need Operations Management's go-ahead now, in order to be ready in time for the end of LS1.

83 US LHCNet Summary and Conclusions
- US LHCNet continues to provide the robust, cost-effective transatlantic network supporting the LHC experiments
- The well-designed high-availability features were crucial in shielding the experiments from the impact of frequent cable outages, including at the time of the Higgs discovery
- We are looking forward to, and preparing for, more discoveries in 2015-17
- It is time for the equipment refreshes, to be completed by the restart
  - Current-generation equipment for 2014-20, including higher bandwidth
  - Layer 2 in 2013-14; Layer 1 in 2014-15, with testing starting this Fall
- We are leading and working with partner projects on deploying the key next-generation technologies and methods: dynamic circuits, SDN-driven production operations
- We are integrating these advanced methods and tools with the mainstream data distribution and analysis systems of ATLAS and CMS
- All of the above aspects and activities are of strategic importance, and crucial to the continued success of the LHC program

84 BACKUP SLIDES

85 The OLiMPS Solution: a Robust and Flexible Software-Driven Multipath System
- Constructs a robust multipath system without modifications to the Layer 2 frame structure, using central out-of-band software control
- A big plus: using OpenFlow, there is no need for new hardware or feature support (other than OpenFlow itself)
- Addresses the problem of topology limitations in large-scale Layer 2 networks, removing the necessity of a tree-structured topology, typically achieved through the use of the Spanning Tree Protocol
- Allows per-flow multipath switching, which increases robustness, increases efficiency, and simplifies the management of Layer 2 network resources

86 LHCONE Virtual Routing and Forwarding (VRF) IP Service
- VRF domains interconnect through BGP peerings at Open Exchange Points
- VRF serves to (logically) separate LHC traffic within a domain, and provides a "hook" for traffic engineering
- It has been implemented in several NREN domains: Internet2, ESnet, GEANT, EU NRENs
- The VRF domains peer at multiple locations, which raises the question of efficient use of the multiple available paths
  - Tools available: Multi-Exit Discriminators (MEDs), AS-path prepending, local preferences
  - Short term: MEDs will be exchanged between the peering partners, agreed to be favored over AS-path prepending (when possible)
  - Long term: a better solution is needed, as scalability and optimization are not guaranteed by any of the above tools
- Challenges: scalability, load balancing, traffic engineering
(A minimal sketch of how these BGP attributes interact in path selection follows below.)
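To illustrate why these knobs have to be coordinated among the peering partners, here is a simplified sketch of the standard BGP best-path comparison, reduced to the three attributes named above; real routers apply more tie-breakers (origin, eBGP/iBGP, router ID, ...), and the peers and values below are illustrative:

```python
# Simplified sketch of BGP best-path selection restricted to local preference,
# AS-path length (affected by prepending) and MED. Not a full BGP decision process.

from dataclasses import dataclass

@dataclass
class Route:
    peer: str
    local_pref: int      # set locally; higher wins, considered first
    as_path_len: int     # prepending makes this longer, so the path loses
    med: int             # set by the advertising neighbor; lower wins, considered last

def best_path(routes):
    # The sort key mirrors the (partial) BGP decision order.
    return min(routes, key=lambda r: (-r.local_pref, r.as_path_len, r.med))

candidates = [
    Route(peer="ExchangeA", local_pref=100, as_path_len=3, med=50),
    Route(peer="ExchangeB", local_pref=100, as_path_len=3, med=10),  # wins on MED
    Route(peer="ExchangeC", local_pref=100, as_path_len=5, med=0),   # loses to prepending
]
print(best_path(candidates).peer)   # ExchangeB
```

The sketch shows that MEDs are only consulted after local preference and AS-path length tie, which is why the peering partners need to agree on which attribute they will actually use to steer traffic.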

87 SC12, November 14-15, 2012: Caltech, Victoria, Michigan, Vanderbilt; BNL
- FDT memory-to-memory: 300+ Gbps (in + out) sustained from Caltech, Victoria and UMich; up to 3 PBytes per day
- FDT storage-to-storage: 175 Gbps (186 Gbps peak)
- Extensive use of FDT and servers with 40G interfaces, plus RDMA/Ethernet; one server pair reached 80 Gbps (2 x 40GE)
- The US LHCNet team and partners have defined the state of the art in high-throughput, long-range transfers since 2002
http://monalisa.caltech.edu/FDT/

88 MonALISA Today: Running 24x7 at 370 Sites
- Monitoring 50,000 computers and >100 links on major research and education networks
- Using intelligent agents
- Tens of thousands of Grid jobs running concurrently
- Collecting >1,500,000 parameters in real time
- Monitoring the worldwide LHC Grid: state-of-the-art technologies developed at Caltech
- MonALISA: Monitoring Agents in a Large Integrated Services Architecture; a global autonomous real-time system

89 Usage Characteristics of Instruments and Facilities (W. Johnston, ESnet)
Fairly consistent requirements are found across the large-scale sciences. Large-scale science uses distributed application systems in order to:
- Couple existing pockets of code, data, and expertise into "systems of systems"
- Break up the task of massive data analysis into elements that are physically located where the data, compute, and storage resources are located
Identified types of use include:
- Bulk data transfer with deadlines: the most common request; large data files must be moved in a length of time consistent with the process of science
- Inter-process communication in distributed workflow systems: a common requirement in large-scale data analysis, such as the LHC Grid-based analysis systems
- Remote instrument control, coupled instrument and simulation, remote visualization, real-time video: hard real-time bandwidth guarantees are required for periods of time (e.g. 8 hours/day, 5 days/week for two months); the required bandwidths in the identified applications are moderate, a few hundred Mb/s
- Remote file system access: a commonly expressed requirement, but with very little experience yet

90 Example: VL3500 (FNAL Primary-1)
[Chart: circuit traffic (3.0G scale) during a fiber cut]

91 VL3506 (FNAL Primary-2)
[Chart: circuit traffic during the fiber cut]

92 VL3501 (FNAL Backup)
[Chart: backup path circuit traffic during the fiber cut]

93 FNAL (LHC Tier1 Site) Outbound Traffic (courtesy Phil DeMar, Fermilab)
[Chart: monthly accepted traffic in Terabytes/month (scale to 6000); red bars = top 1000 site-to-site workflows, orange bars = OSCARS virtual circuit flows; no flow data available for the earliest period]
- Overall ESnet traffic tracks the very large science use of the network
- Starting in mid-2005, a small number of large data flows dominate the network traffic
- Note: as the fraction of large flows increases, the overall traffic increases become more erratic, since the total tracks the large flows
- Observation: a small number of large data flows now dominate the network traffic; this is one factor motivating virtual circuits as a network service, for traffic management

94 Open Exchange Points: the NetherLight Example
- 1 x 100G, 3 x 40G, and 30+ 10G lambdas; use of dark fiber
- Convergence of many partners on common lightpath concepts: Internet2, ESnet, GEANT, US LHCNet; NL, CZ, RU, BE, PL, ES, TW, KR, HK, IN, Nordic (www.glif.is)
- Inspired other open lightpath exchanges: Daejeon (KR), Hong Kong, Tokyo, Praha (CZ), Seattle, Chicago, Miami, New York
- 2013: dynamic lightpaths and IP services above 10G

95 Service Characteristics of Instruments and Facilities (W. Johnston, ESnet)
Such distributed application systems are:
- Data-intensive and high-performance, frequently moving terabytes a day for months at a time
- High duty-cycle, operating most of the day for months at a time in order to meet the requirements for data movement
- Widely distributed, typically spread over continental or intercontinental distances
- Dependent on network performance and availability
  - However, these characteristics cannot be taken for granted, even in well-run networks, when the multi-domain network path is considered
  - Therefore end-to-end monitoring is critical

96 Service Requirements of Instruments and Facilities (W. Johnston, ESnet)
- Get guarantees from the network: the distributed application system elements must be able to get guarantees from the network that there is adequate bandwidth to accomplish the task at the requested time
- Get real-time performance information from the network: the distributed application systems must be able to get real-time information from the network that allows graceful failure, auto-recovery, and adaptation to unexpected network conditions that are short of outright failure
- Available in an appropriate programming paradigm: these services must be accessible within the Web Services / Grid Services paradigm of the distributed application systems
See, e.g., [ICFA SCIC]

97 The OSCARS L3 Fail-Over Mechanism, FNAL to CERN
- The OSCARS circuits provided to FNAL are used as pseudowires, on top of which FNAL implements a routed IP network
- Primary-1 and primary-2 are BGP-costed the same, and so share the load in normal operation, with a potential for 20G total
- Secondary-1 (a much longer path) is costed higher and so only gets traffic when primary-1 and primary-2 are down
- This is controlled by FNAL and may be changed by them at any time
- A dedicated mission-oriented network, as implemented by US LHCNet and ESnet, is the essential, central element of the network engineering solution for the LHC program
- This is also the leading exemplar of how the major DOE and other networks serve leading-edge science, with tangible benefits to HEP
(A minimal sketch of the costing/fail-over logic follows below.)
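The fail-over behaviour described above can be sketched as a toy model of equal-cost load sharing with a higher-cost backup; the path names, costs and capacities are illustrative, and the real behaviour is configured in FNAL's BGP policy rather than in code like this:

```python
# Toy model of the described fail-over: traffic is shared across all
# lowest-cost paths that are up; the higher-cost backup only carries
# traffic when every primary path is down.

def active_paths(paths):
    """paths: {name: (cost, up, capacity_gbps)} -> names carrying traffic."""
    up = {name: v for name, v in paths.items() if v[1]}
    if not up:
        return []
    best_cost = min(cost for cost, _, _ in up.values())
    return [name for name, (cost, _, _) in up.items() if cost == best_cost]

def report(paths):
    names = active_paths(paths)
    capacity = sum(paths[n][2] for n in names)
    print(f"carrying traffic: {names} ({capacity:g} Gbps potential)")

paths = {
    "primary-1":   (10, True, 10.0),
    "primary-2":   (10, True, 10.0),
    "secondary-1": (20, True,  3.0),
}
report(paths)                                  # both primaries share load, 20 Gbps
paths["primary-1"] = (10, False, 10.0)         # one fiber cut
report(paths)                                  # primary-2 only, 10 Gbps
paths["primary-2"] = (10, False, 10.0)         # both primaries down
report(paths)                                  # secondary-1 only, 3 Gbps
```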

98 The ESnet [and US LHCNet] Planning Process
1. Observing current and historical network traffic patterns: what do the trends in network patterns predict for future network needs?
2. Exploring the plans and processes of the major stakeholders (the Office of Science programs, scientists, collaborators, and facilities):
   2a. Data characteristics of scientific instruments and facilities: what data will be generated by instruments and supercomputers coming online over the next 5-10 years?
   2b. Examining the future process of science: how and where will the new data be analyzed and used; that is, how will the process of doing science change over 5-10 years?
- Note that we do not ask users what network services they need; we ask them what they are trying to accomplish and how (and where all of the pieces are), and then the network engineers derive the network requirements
- Users do not know how to use high-speed networks, so ESnet assists with a knowledge base (fasterdata.es.net) and direct assistance for big users

