
1 ESnet Update
Joint Techs Meeting, July 19, 2004
William E. Johnston, ESnet Dept. Head and Senior Scientist; R. P. Singh, Federal Project Manager; Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads; and the ESnet Team
Lawrence Berkeley National Laboratory

2 ESnet Connects DOE Facilities and Collaborators

[Network map: 42 end-user sites on the ESnet IP core: Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), laboratory sponsored (6), and other sponsored (NSF LIGO, NOAA). The ESnet core is a packet-over-SONET optical ring with hubs at SNV, ELP, CHI, NYC, ATL, DC, SEA, and ALB; IPv6 runs on the backbone and to numerous peers. Peering points include MAE-E, MAE-W, NY-NAP, PAIX-E, PAIX-W, Fix-W, Starlight, Chi NAP, Equinix, and PNWG. Link speeds range from T1 (1.5 Mb/s) through T3 (45 Mb/s), OC3 (155 Mb/s), OC12 (622 Mb/s), Gigabit Ethernet (1 Gb/s), and OC48 (2.5 Gb/s) up to OC192 (10 Gb/s). International peers include GEANT (Germany, France, Italy, UK, etc.), SInet (Japan), a Japan-Russia (BINP) link, CA*net4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC and TANet2), KDDI (Japan), France, Switzerland, Australia, and Singaren.]

3 ESnet Logical Infrastructure Connects the DOE Community With its Collaborators

ESnet provides complete access to the Internet by managing the full complement of global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international networks.

[Peering diagram: commercial peering at the NYC hubs, CHI NAP (distributed 6TAP, 19 peers), EQX-ASH and EQX-SJ (Equinix), MAE-E, MAE-W, FIX-W, PAIX-W, PAIX-E, NY-NAP, Starlight, and the SEA and ATL hubs; research-and-education peering with Abilene (plus 7 universities directly), CENIC/CalREN2, PNW-GPOP, MAX GPOP, SDSC, and LANL TECHnet; international peering with GEANT (Germany, France, Italy, UK, etc.), SInet/KEK (Japan), the Japan-Russia (BINP) link, CA*net4, CERN, MREN, Netherlands, Russia, StarTap, Taiwan (ASCC and TANet2), KDDI (Japan), France, Australia, and Singaren.]

4 ESnet New Architecture Goal

MAN rings provide dual site and hub connectivity. A second backbone ring will multiply connect the MAN rings to protect against hub failure.

[Diagram: DOE sites attached via MAN rings to the ESnet core/backbone hubs at New York (AOA), Chicago (CHI), Sunnyvale (SNV), Atlanta (ATL), Washington, DC (DC), and El Paso (ELP), with links to Europe and Asia-Pacific.]

5 First Step: SF Bay Area ESnet MAN Ring

Increased reliability and site connection bandwidth.

Phase 1:
o Connects the primary Office of Science labs (NERSC, LBNL, Joint Genome Institute, SLAC) in a MAN ring
Phase 2:
o Adds LLNL, SNL, and UC Merced

Open issues:
o The ring should not connect directly into the ESnet SNV hub (still working on physical routing for this)
o Both legs of the Qwest / ESnet hub mini ring have not yet been identified

[Topology sketch, phase 1: the SF Bay Area MAN ring joins the existing ESnet core ring toward Chicago and El Paso; NLR / UltraScienceNet waves run Seattle to Chicago and LA to San Diego via a Level 3 hub.]

6 Traffic Growth Continues

Annual growth over the past five years has increased from 1.7x to just over 2.0x. ESnet is currently transporting about 250 terabytes/month.

[Chart: ESnet monthly accepted traffic, in TBytes/month.]
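As a rough illustration of what sustained doubling implies, the growth factor (2.0x/year) and the starting volume (about 250 TB/month) below are taken from the slide; the projection horizon is an assumption added for illustration:

```python
# Project ESnet monthly traffic under sustained 2.0x annual growth.
# The 250 TB/month starting point and 2.0x rate are from the slide;
# the 5-year horizon is an illustrative assumption.

def project_traffic(start_tb_per_month: float, annual_factor: float, years: int) -> float:
    """Compound the monthly traffic volume by annual_factor for `years` years."""
    return start_tb_per_month * annual_factor ** years

for year in range(6):
    volume = project_traffic(250.0, 2.0, year)
    print(f"mid-{2004 + year}: ~{volume:,.0f} TB/month")
```

At 2.0x per year the network would carry roughly 8,000 TB/month within five years, which is why the slide pairs the growth figure with the new-architecture plans.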

7 Who Generates Traffic, and Where Does it Go?

ESnet inter-sector traffic summary, Jan 2003 / Feb 2004 (1.7x overall traffic increase, 1.9x OSC increase). Note that more than 90% of ESnet traffic is Office of Science (OSC) traffic. The international traffic is increasing due to BaBar at SLAC and the LHC tier 1 centers at FNAL and BNL.

ESnet Appropriate Use Policy (AUP): all ESnet traffic must originate and/or terminate at an ESnet site (no transit traffic is allowed).

DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers.

[Diagram: traffic coming into ESnet shown in green, traffic leaving ESnet in blue; percentages are of total ingress or egress traffic. Sectors shown: DOE sites, peering points, commercial, R&E (mostly universities, including DOE collaborator traffic and data), and international, with ingress/egress shares of 21/14%, 17/10%, 9/26%, 14/12%, 10/13%, 4/6%, ~25/18%, 72/68%, and 53/49%.]

8 ESnet Top 20 Data Flows, 24 hrs., 2004-04-20

A small number of science users account for a significant fraction of all ESnet traffic. The top flows include (the chart's scale marker is 1 terabyte/day):
o Fermilab (US) → CERN
o SLAC (US) → IN2P3 (FR)
o SLAC (US) → INFN Padova (IT)
o Fermilab (US) → U. Chicago (US)
o CEBAF (US) → IN2P3 (FR)
o INFN Padova (IT) → SLAC (US)
o U. Toronto (CA) → Fermilab (US)
o DFN-WiN (DE) → SLAC (US)
o DOE Lab → DOE Lab
o SLAC (US) → JANET (UK)
o Fermilab (US) → JANET (UK)
o Argonne (US) → Level3 (US)
o Argonne (US) → SURFnet (NL)
o IN2P3 (FR) → SLAC (US)
o Fermilab (US) → INFN Padova (IT)

9 Top 50 Traffic Flows Monitoring, 24 hr

o 2 international and 2 commercial peering points
o 10 flows > 100 GBy/day
o More than 50 flows > 10 GBy/day

10 Disaster Recovery and Stability

The network must be kept available even if, e.g., the West Coast is disabled by a massive earthquake.

Reliable operation of the network involves:
o remote Network Operation Centers (3), with remote engineers and partial duplicate infrastructure at sites including LBNL, TWC, BNL, PPPL, and AMES
o replicated support infrastructure
o generator-backed UPS power at all critical network and infrastructure locations
o high physical security for all equipment

Currently deploying full replication of the NOC databases and servers and the Science Services databases in the NYC Qwest carrier hub. The duplicated infrastructure, supported by engineers and a 24x7 Network Operations Center with generator-backed power, covers: Spectrum (net management system), DNS (name to IP address translation), the engineering, load, and config databases, public and private web, e-mail (server and archive), the PKI certificate repository and revocation lists, and the collaboratory authorization service.

Non-interruptible core: the ESnet core operated without interruption through
o the N. Calif. power blackout of 2000
o the 9/11/2001 attacks, and
o the Sept. 2003 NE states power blackout

11 Disaster Recovery and Stability

Duplicate NOC infrastructure to the AoA hub in two phases, complete by end of the year:
o 9 servers: dns, www, www-eng and noc5 (eng. databases), radius, aprisma (net monitoring), tts (trouble tickets), pki-ldp (certificates), mail

12 Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack

A phased response to cyberattack is being implemented to protect the network and the ESnet sites. The phased response ranges from blocking certain site traffic to a complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks. It:
o Separates ESnet core routing functionality from external Internet connections by means of a "peering" router that can have a policy different from the core routers
o Provides a rate-limited path to the external Internet that will ensure site-to-site communication during an external denial-of-service attack
o Provides "lifeline" connectivity to the external Internet (e.g., e-mail, DNS, HTTP, HTTPS, SSH) for downloading patches, exchanging e-mail, and viewing web pages prior to full isolation of the network
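The escalation described above can be pictured as a mapping from response phase to peering-router policy. The phase numbers and policy field names below are hypothetical illustration only; the real mechanism is router filter policy, not application code:

```python
# Illustrative sketch of the phased response described above.
# Phase numbers and policy fields are hypothetical; ESnet's actual
# mechanism is filter policy on a separate peering router.

from dataclasses import dataclass

@dataclass
class PeeringPolicy:
    block_attack_sources: bool   # filter specific external attack traffic
    rate_limit_external: bool    # throttle the path to the external Internet
    lifeline_only: bool          # allow only e-mail, DNS, HTTP(S), SSH

def policy_for_phase(phase: int) -> PeeringPolicy:
    """Map an escalation phase (1-3) to a peering-router policy."""
    if phase == 1:               # filter traffic to assist a site
        return PeeringPolicy(True, False, False)
    if phase == 2:               # rate-limit the external path
        return PeeringPolicy(True, True, False)
    if phase == 3:               # near-isolation: lifeline services only
        return PeeringPolicy(True, True, True)
    raise ValueError("phase must be 1, 2, or 3")

print(policy_for_phase(3))
```

The point of the structure is that each phase strictly tightens the previous one, so sites keep communicating among themselves even at full lockdown.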

13 Phased Response to Cyberattack

o Lab first response: filter incoming traffic at the Lab's ESnet gateway router
o ESnet first response: filters to assist a site
o ESnet second response: filter traffic from outside of ESnet at the peering router
o ESnet third response: shut down the main peering paths and provide only limited-bandwidth paths for specific "lifeline" services

[Diagram: attack traffic entering via the peering router toward the ESnet router, Lab border router, and Lab gateway router at LBNL, with filters (X) applied at each response stage.]

The Sapphire/Slammer worm infection created a Gb/s of traffic on the ESnet core until filters were put in place (both into and out of sites) to damp it out.

14 Phased Response to Cyberattack

Architecture to allow phased response to cybersecurity attacks and lifeline communications during lockdown conditions.

Milestone                                   Completion criterion                               Target
Design the architecture                     Software; site, core, and peering router           1Q04
                                            topology; and hardware configuration
Design and test lifeline filters            Configuration of filters specified                 4Q04
Configure and test fail-over and filters    Fail-over configuration is successful              4Q04
In production                               The backbone and peering routers have a            1Q05
                                            cyberattack-defensive configuration

15 Grid Middleware Services

ESnet is the natural provider for some "science services" – services that support the practice of science:
o ESnet is trusted, persistent, and has a large (almost comprehensive within DOE) user base
o ESnet has the facilities to provide reliable access and high availability through assured network access to replicated services at geographically diverse locations

However, a service must be scalable in the sense that as its user base grows, ESnet's interaction with the users does not grow (otherwise it is not practical for a small organization like ESnet to operate).

16 Grid Middleware Requirements (DOE Workshop)

A DOE workshop examined science-driven requirements for networks and middleware and identified twelve high-priority middleware services (see www.es.net/#research). Some of these services have a central management component and some do not. Most of the services that have central management fit the criteria for ESnet support. These include, for example:
o A production, federated RADIUS authentication service
o PKI federation services
o Virtual Organization management services to manage organization membership, member attributes, and privileges
o Long-term PKI key and proxy credential management
o End-to-end monitoring for Grid / distributed application debugging and tuning
o Some form of authorization service (e.g., based on RADIUS)
o Knowledge management services that have the characteristics of an ESnet service are also likely to be important (future)

17 Science Services: PKI Support for Grids

Public key infrastructure supports cross-site, cross-organization, and international trust relationships that permit sharing computing and data resources and other Grid services. The DOEGrids Certification Authority service, which provides X.509 identity certificates to support Grid authentication, is an example of this model:
o The service requires a highly trusted provider and a high degree of availability
o Federation: ESnet, as service provider, is a centralized agent for negotiating trust relationships, e.g., with European CAs
o The service scales by adding site-based or Virtual Organization-based Registration Agents that interact directly with the users
o See DOEGrids CA (www.doegrids.org)

18 ESnet PKI Project

DOEGrids project milestones:
o DOEGrids CA in production, June 2003
o Retirement of the initial DOE Science Grid CA (Jan 2004)
o "Black rack" installation completed for the DOEGrids CA (Mar 2004)

New Registration Authorities:
o FNAL (Mar 2004)
o LCG (LHC Computing Grid) catch-all: near completion
o NCC-EPA: in progress

Also underway: deployment of the NERSC "myProxy" CA and a Grid Integrated RADIUS Authentication Fabric pilot.

19 PKI Systems

[Diagram: layered security for DOEGrids: a vaulted root CA with HSM inside secure racks, within a secure data center protected by building security and LBNL site security; Internet access is mediated by a firewall and Bro intrusion detection; RAs and certificate applicants connect from outside.]

20 Science Services: Public Key Infrastructure

The rapidly expanding customer base of this service will soon make it ESnet's largest collaboration service by customer count.

Registration Authorities: ANL, LBNL, ORNL, DOESG (DOE Science Grid), ESG (Climate), FNAL, PPDG (HEP), Fusion Grid, iVDGL (NSF-DOE HEP collab.), NERSC, PNNL.

21 ESnet PKI Project (2)

New CA initiatives:
o FusionGrid CA
o ESnet SSL Server Certificate CA
o Mozilla browser CA cert distribution

Script-based enrollment.

Global Grid Forum documents:
o Policy Management Authority charter
o OCSP (Online Certificate Status Protocol) Requirements for Grids
o CA Policy Profiles

22 Grid Integrated RADIUS Authentication Fabric

RADIUS routing of authentication requests, to support one-time-password (OTP) initiatives:
o Gateway Grid and collaborative uses: standard UI and API
o Provide a secure federation point with O(n) agreements
o Support multiple vendor / site OTP implementations
o One token per user (an SSO-like solution) for OTP

A collaboration between ESnet, NERSC, and a RADIUS appliance vendor; PNNL and ANL are also involved, and others are welcome. A white paper/report is due around 01 Sep 2004 to support early implementers, then proceed to a pilot. Project pre-proposal: http://www.doegrids.org/CA/Research/GIRAF.pdf
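The slide does not say which OTP scheme the pilot adopted; as one concrete possibility, an HMAC-based one-time password (the HOTP construction standardized in RFC 4226) can be sketched in a few lines. The choice of HOTP here is an assumption for illustration, not a statement about the pilot:

```python
# Sketch of an HMAC-based one-time password (HOTP, RFC 4226 style).
# The slide does not specify the OTP algorithm; this only illustrates
# the general hardware-token technique.
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Derive a truncated decimal OTP from a shared secret and counter."""
    msg = struct.pack(">Q", counter)                      # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                            # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 4226 test secret; counter 0 yields the published vector "755224"
print(hotp(b"12345678901234567890", 0))  # -> 755224
```

The server side only needs the shared secret and the counter, which is what makes a central RADIUS federation point with one token per user workable.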

23 Collaboration Service

H.323 videoconferencing is showing a dramatic increase in usage.

24 Grid Network Services Requirements (GGF, GHPN)

The Grid High Performance Networking Research Group's "Networking Issues of Grid Infrastructures" (draft-ggf-ghpn-netissues-3) describes what networks should provide to Grids:
o High-performance transport for bulk data transfer (over 1 Gb/s per flow)
o Performance controllability to provide ad hoc quality of service and traffic isolation, with dynamic network resource allocation and reservation
o High availability when expensive computing or visualization resources have been reserved
o Security controllability to provide a trusted and efficient communication environment when required
o Multicast to efficiently distribute data to groups of resources
o Integrated wireless networks and sensor networks in the Grid environment

25 Priority Service

So, practically, what can be done? With available tools we can provide a small number of provisioned, bandwidth-guaranteed circuits:
o secure and end-to-end (system to system)
o various qualities of service possible, including minimum latency
o a certain amount of route reliability (if redundant paths exist in the network)
o end systems can manage these circuits as single high-bandwidth paths or as multiple lower-bandwidth paths (with application-level shapers)
o non-interfering with production traffic, so aggressive protocols may be used

26 Guaranteed Bandwidth as an ESnet Service

A DOE Network R&D funded project.

[Diagram: user system 1 at site A reaches user system 2 at site B through a policer, with resource managers and a bandwidth broker handling authorization along the path; phase 1 and phase 2 stages are indicated.]

o Allocation will probably be relatively static and ad hoc
o There will probably be service level agreements among transit networks allowing a fixed amount of priority traffic, so the resource manager does minimal checking and no authorization
o The policer will act only at the full bandwidth of the service agreement (for self-protection)
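The policer in the diagram is conventionally a token bucket that admits priority traffic up to the agreed rate and rejects (or demotes) the excess. A minimal sketch follows; the rate and burst values are arbitrary illustration numbers, not ESnet service parameters:

```python
# Minimal token-bucket policer sketch for the bandwidth-guaranteed
# service described above. The 1 Mb/s rate and 15 kB burst are
# arbitrary illustration values.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0        # refill rate in bytes/second
        self.capacity = burst_bytes       # maximum bucket depth (burst size)
        self.tokens = burst_bytes         # start full
        self.last = 0.0                   # timestamp of the previous packet

    def conforms(self, packet_bytes: int, now: float) -> bool:
        """Admit the packet if enough tokens have accumulated since `last`."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False                      # out of profile: drop or demote

bucket = TokenBucket(rate_bps=1_000_000, burst_bytes=15_000)
print(bucket.conforms(1500, now=0.0))   # first packet is within the burst
```

Policing "only at the full bandwidth of the service agreement" corresponds to setting the bucket's rate to the SLA rate and doing nothing finer-grained per flow.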

27 Network Monitoring System

Alarms and data reduction:
o From June 2003 through April 2004 the total number of NMS up/down alarms was 16,342, or 48.8 per day.
o Path-based outage reporting automatically isolated 1,448 customer-relevant events during this period, an average of 4.3 per day and more than a 10-fold reduction.
o Based on total outage duration in 2004, approximately 63% of all customer-relevant events have been categorized as either "Planned" or "Unplanned" and as one of "ESnet", "Site", "Carrier", or "Peer".

This gives us a better handle on the availability metric.
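The per-day figures above follow directly from the length of the June 2003 through April 2004 window; the day count is computed here rather than stated on the slide:

```python
# Check the alarm-reduction arithmetic from the slide. The 335-day
# length of the June 2003 - April 2004 window is derived here, not
# stated on the slide.
from datetime import date

days = (date(2004, 5, 1) - date(2003, 6, 1)).days    # 335 days
alarms, events = 16_342, 1_448

print(f"raw alarms/day:      {alarms / days:.1f}")    # ~48.8
print(f"isolated events/day: {events / days:.1f}")    # ~4.3
print(f"reduction factor:    {alarms / events:.1f}x") # ~11.3x, i.e. >10-fold
```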

28 2004 Availability by Month

[Chart: unavailable minutes per site, Jan.-June 2004, corrected for planned outages, with sites grouped into >99.9% available and <99.9% available. More from Mike O'Connor.]
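For reference, the 99.9% threshold used to group sites translates into a small monthly outage budget; the 30-day month below is an assumption for illustration:

```python
# Convert an availability target into allowed downtime per month,
# assuming a 30-day month for illustration.

def downtime_minutes_per_month(availability: float, days: int = 30) -> float:
    """Minutes of allowed unavailability for a given availability fraction."""
    return (1.0 - availability) * days * 24 * 60

print(f"{downtime_minutes_per_month(0.999):.1f} min/month at 99.9%")  # 43.2
```

So a site above the 99.9% line on the chart lost under about 45 minutes in a month, planned outages excluded.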

29 ESnet Abilene Measurements

We want to ensure that the ESnet/Abilene cross-connects are serving the needs of users in the science community who are accessing DOE facilities and resources from universities, or accessing university facilities from DOE labs.

Measurement sites in place:
o 3 ESnet participants: LBL, FERMI, BNL
o 3 Abilene participants: SDSC, NCSU, OSU

More from Joe Metzger.

30 OWAMP One-Way Delay Tests Are Highly Sensitive

[Plot: one-way delay to NCSU, y-axis 41.5-42.0 ms. A metro DWDM fiber reroute adds about 350 microseconds.]
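A level shift like the ~350 microsecond reroute step is easy to pull out of OWAMP samples by comparing mean delay before and after a candidate change point. The sample values below are synthetic, loosely modeled on the 41.5-42.0 ms range of the plot:

```python
# Detect a one-way-delay level shift, like the ~350 microsecond step
# produced by the metro DWDM reroute. Sample values are synthetic.
from statistics import mean

def shift_microseconds(samples_ms: list[float], change_idx: int) -> float:
    """Mean delay difference (in microseconds) across a change point."""
    before = mean(samples_ms[:change_idx])
    after = mean(samples_ms[change_idx:])
    return (after - before) * 1000.0

samples = [41.55, 41.56, 41.54, 41.55, 41.90, 41.91, 41.89, 41.90]
print(f"{shift_microseconds(samples, 4):.0f} microseconds")
```

The sensitivity claim on the slide is exactly this: one-way measurements resolve sub-millisecond path changes that round-trip pings would bury in noise.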

31 ESnet Trouble Ticket System

The TTS is used to track problem reports for the network, ECS, DOEGrids, asset management, NERSC, and other services. It runs a Remedy ARSystem server and an Oracle database on a Sun Ultra workstation.
o Total external tickets: 11,750 (1995-2004), approx. 1,300/year
o Total internal tickets: 1,300 (1999-2004), approx. 250/year

32 Conclusions

o ESnet is an infrastructure that is critical to DOE's science mission; it serves all of DOE but is focused on the Office of Science labs
o ESnet is addressing the DOE mission's science networking requirements with several new initiatives and a new architecture
o QoS service is hard, but we believe we have enough experience to do pilot studies
o Middleware services for large numbers of users are hard, but they can be provided if careful attention is paid to scaling
