1 ESnet Network Requirements
ASCAC Networking Sub-committee Meeting, April 13, 2007
Eli Dart, ESnet Engineering Group, Lawrence Berkeley National Laboratory

2 Overview
- Requirements are primary drivers for ESnet – science focused
- Sources of requirements:
  - Office of Science (SC) Program Managers
  - Direct gathering through interaction with science users of the network
    - Example case studies (updated 2005/2006): Magnetic Fusion, Large Hadron Collider (LHC), Climate Modeling, Spallation Neutron Source
  - Observation of the network
  - Other requirements
- Requirements aggregation
  - Convergence on a complete set of network requirements

3 Requirements from SC Program Managers
- SC Program Offices have determined that ESnet future priorities must address the requirements for:
  - Large Hadron Collider (LHC), CERN
  - Relativistic Heavy Ion Collider (RHIC), BNL, US
  - Large-scale fusion (ITER), France
  - High-speed connectivity to Asia-Pacific (Climate and Fusion)
- Other priorities and guidance from SC will come from upcoming per-Program Office requirements workshops, beginning this summer
- Modern science infrastructure is too large to be housed at any one institution
  - The structure of DOE science assumes the existence of a robust, high-bandwidth, feature-rich network fabric that interconnects scientists, instruments, and facilities such that collaboration may flourish

4 Direct Gathering Through Interaction with Stakeholders
- SC selected a representative set of applications for the 2002 Workshop
- Case studies were created for each application at the Workshop in order to consistently characterize the requirements
- The requirements collected from the case studies form the foundation for the current ESnet4 architecture
  - Bandwidth, connectivity scope / footprint, services
  - We do not ask that our users become network experts in order to communicate their requirements to us
  - We ask what tools the researchers need to conduct their science, synthesize the necessary networking capabilities, and pass that back to our constituents for evaluation
- Per-Program Office workshops continue this process
  - Workshops established as a result of the ESnet baseline Lehman Review
  - Workshop survey process extended to ESnet sites via Site Coordinators
- ESnet has a much larger user base (~50k to 100k users) than a typical supercomputer center (~3k users) and so has a more diffuse relationship with individual users
  - Requirements gathering focused on key Principal Investigators, Program Managers, scientists, etc., rather than a broad survey of every computer user within DOE
  - Laboratory CIOs and their designates also play a key role in requirements input

5 Case Studies For Requirements
- Advanced Scientific Computing Research (ASCR): NERSC, NLCF
- Basic Energy Sciences: Advanced Light Source (macromolecular crystallography), Chemistry/Combustion, Spallation Neutron Source
- Biological and Environmental: Bioinformatics/Genomics, Climate Science
- Fusion Energy Sciences: Magnetic Fusion Energy/ITER
- High Energy Physics: LHC
- Nuclear Physics: RHIC
- There is a high level of correlation between network requirements for large- and small-scale science – the only difference is bandwidth
  - Meeting the requirements of the large-scale stakeholders will cover the smaller ones, provided the required services set is the same

6 Case Studies Requirements Gathering
- For all the science cases, the following were identified by examining the science environment:
  - Instruments and facilities
    - Location and use of facilities, instruments, computational resources, etc.
    - Data movement and storage requirements
  - Process of science
    - Collaborations
    - Network services requirements
    - Noteworthy patterns of use (e.g. duty cycle of instruments)
  - Near-term needs (now to 12 months)
  - 5 year needs (relatively concrete)
  - 5-10 year needs (more uncertainty)

7 Example Case Study Summary Matrix: Fusion
The matrix considers instrument and facility requirements, the process-of-science drivers, and the resulting network and middleware requirements, cross cut with timelines.

Near-term:
- Instruments and facilities: Each experiment gets only a few days per year, so high productivity is critical. Experiment episodes ("shots") generate 2-3 GB every 20 minutes, which must be delivered to the remote analysis sites within two minutes in order to analyze it before the next shot.
- Process of science: Highly collaborative experiment and analysis environment; real-time data access and analysis for experiment steering (the more you can analyze between shots, the more effective you can make the next shot); shared visualization capabilities.
- Network services and middleware: PKI certificate authorities that enable strong authentication of community members and the use of Grid security tools and services; directory services that can provide the naming root and high-level (community-wide) indexing of shared, persistent data that transforms into community information and knowledge; efficient means to sift through large data repositories to extract meaningful information from unstructured data.

5 years:
- Instruments and facilities: 10 GB generated by the experiment every 20 minutes (the time between shots), to be delivered in two minutes; GB-scale subsets of much larger simulation datasets to be delivered in two minutes for comparison with experiment; simulation data scattered across the United States.
- Process of science: Transparent security; global directory and naming services needed to anchor all of the distributed metadata; support for "smooth" collaboration in a high-stress environment; real-time data analysis for experiment steering combined with simulation interaction yields a big productivity increase; real-time visualization and interaction among collaborators across the United States; integrated simulation of the several distinct regions of the reactor will produce a much more realistic model of the fusion process.
- Network: Network bandwidth and data analysis computing capacity guarantees (quality of service) for inter-shot data analysis; Gbits/sec for 20 seconds out of every 20 minutes, guaranteed; 5 to 10 remote sites involved in data analysis and visualization; parallel network I/O between simulations, data archives, experiments, and visualization.
- Network services and middleware: High-quality, 24x7 PKI identity authentication infrastructure; end-to-end quality of service and QoS management; secure/authenticated transport to ease access through firewalls; reliable data transfer; transient and transparent data replication for real-time reliability; support for human collaboration tools.

5+ years:
- Instruments and facilities: Simulations generate 100s of TB; ITER produces a TB per shot and petabytes per year.
- Process of science: Real-time remote operation of the experiment; comprehensive integrated simulation.
- Network: Quality of service for network latency and reliability, and for co-scheduling computing resources.
- Network services and middleware: Management functions for network QoS that provide the request and access mechanisms for the experiment run-time periodic traffic noted above.

8 Requirements from Instruments and Facilities
- This is the 'hardware infrastructure' of DOE science – the types of requirements can be summarized as follows:
  - Bandwidth: quantity of data produced, requirements for timely movement
  - Connectivity: geographic reach – location of instruments, facilities, and users, plus the network infrastructure involved (e.g. ESnet, Internet2, GEANT)
  - Services: guaranteed bandwidth, traffic isolation, etc.; IP multicast
- Data rates and volumes from facilities and instruments (bandwidth, connectivity, services)
  - Large supercomputer centers (NERSC, NLCF)
  - Large-scale science instruments (e.g. LHC, RHIC)
  - Other computational and data resources (clusters, data archives, etc.)
- Some instruments have special characteristics that must be addressed, e.g. Fusion (bandwidth, services)
- Next generation of experiments and facilities, and upgrades to existing facilities (bandwidth, connectivity, services)
  - Addition of facilities increases bandwidth requirements
  - Existing facilities generate more data as they are upgraded
  - Reach of collaboration expands over time
  - New capabilities require advanced services

9 Requirements from Examining the Process of Science (1)
- The geographic extent and the size of the user base of scientific collaboration are continuously expanding
  - DOE's US and international collaborators rely on ESnet to reach DOE facilities
  - DOE scientists rely on ESnet to reach non-DOE facilities nationally and internationally (e.g. LHC, ITER)
  - In the general case, the structure of modern scientific collaboration assumes the existence of a robust, high-performance network infrastructure interconnecting collaborators with each other and with the instruments and facilities they use
  - Therefore, close collaboration with other networks is essential for end-to-end service deployment, diagnostic transparency, etc.
- Robustness and stability (network reliability) are critical
  - Large-scale investment in science facilities and experiments makes network failure unacceptable when the experiments depend on the network
  - Dependence on the network is the general case

10 Requirements from Examining the Process of Science (2)
- Science requires several advanced network services for different purposes
  - Predictable latency and quality of service guarantees
    - Remote real-time instrument control
    - Computational steering
    - Interactive visualization
  - Bandwidth guarantees and traffic isolation
    - Large data transfers (potentially using TCP-unfriendly protocols)
    - Network support for deadline scheduling of data transfers
- Science requires other services as well – for example:
  - Federated Trust / Grid PKI for collaboration and middleware
    - Grid authentication credentials for DOE science (researchers, users, scientists, etc.)
    - Federation of international Grid PKIs
  - Collaboration services such as audio and video conferencing

11 Science Network Requirements Aggregation Summary
(Each entry: science driver; end-to-end reliability; connectivity; 2006 end-to-end bandwidth; 2010 end-to-end bandwidth; traffic characteristics; network services.)
- Advanced Light Source: reliability not specified; DOE sites, US universities, industry; 2006: 1 TB/day (300 Mbps); 2010: 5 TB/day (1.5 Gbps); bulk data, remote control; guaranteed bandwidth, PKI / Grid.
- Bioinformatics: reliability not specified; DOE sites, US universities; 2006: 625 Mbps (12.5 Gbps in two years); 2010: 250 Gbps; bulk data, remote control, point-to-multipoint; guaranteed bandwidth, high-speed multicast.
- Chemistry / Combustion: reliability not specified; DOE sites, US universities, industry; 2006: not specified; 2010: 10s of gigabits per second; bulk data; guaranteed bandwidth, PKI / Grid.
- Climate Science: reliability not specified; DOE sites, US universities, international; 2006: not specified; 2010: 5 PB per year (5 Gbps); bulk data, remote control; guaranteed bandwidth, PKI / Grid.
- High Energy Physics (LHC): 99.95+% reliability (less than 4 hours/year of downtime); US Tier1 (DOE), US Tier2 (universities), international (Europe, Canada); 2006: 10 Gbps; 2010: 60 to 80 Gbps (30-40 Gbps per US Tier1); bulk data, remote control; guaranteed bandwidth, traffic isolation, PKI / Grid.

12 Science Network Requirements Aggregation Summary (continued)
- Magnetic Fusion Energy: 99.999% reliability (impossible without full redundancy); DOE sites, US universities, industry; 2006: 200+ Mbps; 2010: 1 Gbps; bulk data, remote control; guaranteed bandwidth, guaranteed QoS, deadline scheduling.
- NERSC: reliability not specified; DOE sites, US universities, industry, international; 2006: 10 Gbps; 2010: 20 to 40 Gbps; bulk data, remote control; guaranteed bandwidth, guaranteed QoS, deadline scheduling, PKI / Grid.
- NLCF: reliability not specified; DOE sites, US universities, industry, international; 2006 and 2010: backbone bandwidth parity; bulk data.
- Nuclear Physics (RHIC): reliability not specified; DOE sites, US universities, international; 2006: 12 Gbps; 2010: 70 Gbps; bulk data; guaranteed bandwidth, PKI / Grid.
- Spallation Neutron Source: high reliability (24x7 operation); DOE sites; 2006: 640 Mbps; 2010: 2 Gbps; bulk data.

13 Example Case Studies
- By way of example, four of the cases are discussed here:
  - Magnetic Fusion
  - Large Hadron Collider
  - Climate Modeling
  - Spallation Neutron Source
- Categorization of case study information: quantitative vs. qualitative
  - Quantitative requirements from instruments, facilities, etc.
    - Bandwidth requirements
    - Storage requirements
    - Computational facilities
    - Other 'hardware infrastructure'
  - Qualitative requirements from the science process
    - Bandwidth and service guarantees
    - Usage patterns

14 Magnetic Fusion Energy

15 Magnetic Fusion Requirements – Instruments and Facilities
- Three large experimental facilities in the US (General Atomics, MIT, Princeton Plasma Physics Laboratory)
  - 3 GB data set per pulse today, 10+ GB per pulse in 5 years
  - 1 pulse every 20 minutes, 25-35 pulses per day
  - Guaranteed bandwidth requirement: 200+ Mbps today, ~1 Gbps in 5 years (driven by the science process; see the sketch below)
- Computationally intensive theory/simulation component
  - Simulation runs at supercomputer centers, post-simulation analysis at ~20 other sites
  - Large data sets (1 TB+ in 3-5 years)
  - 10s of TB of data in distributed archives
- ITER
  - Located in France
  - Groundbreaking soon, production operations in 2015
  - 1 TB of data per pulse, 1 pulse per hour
  - Petabytes of simulation data per year
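The stated bandwidth figures follow directly from the per-pulse data volumes and the two-minute delivery window discussed under the process of science below. A minimal sketch of the arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes):

```python
# Bandwidth needed to deliver one pulse's data within the inter-shot window.
# Assumption: decimal units (1 GB = 1e9 bytes) and a 2-minute delivery window.

def required_mbps(gigabytes: float, window_seconds: float = 120) -> float:
    """Throughput in Mbps needed to move `gigabytes` within `window_seconds`."""
    return gigabytes * 1e9 * 8 / window_seconds / 1e6

print(required_mbps(3))    # ~200 Mbps  -- matches "200+ Mbps today"
print(required_mbps(10))   # ~667 Mbps  -- consistent with "~1 Gbps in 5 years"
```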

16 Magnetic Fusion Requirements – Process of Science (1)
- Experiments today
  - Interaction between large groups of local and remote users and the instrument during experiments – highly collaborative
  - Data from the current pulse is analyzed to provide input parameters for the next pulse
  - Requires guaranteed network and computational throughput on short time scales (see the time budget sketched below):
    - Data transfer in 2 minutes
    - Computational analysis in ~7 minutes
    - Science analysis in ~10 minutes
    - Experimental pulses are 20 minutes apart
    - ~1 minute of slack – this amounts to a 99.999% uptime requirement
  - Network reliability is critical, since each experiment gets only a few days of instrument time per year
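A quick sketch of the inter-shot time budget quoted above, using only the slide's figures:

```python
# Inter-shot time budget for fusion experiments (all values in minutes).
transfer, analysis, science = 2, 7, 10
shot_interval = 20

used = transfer + analysis + science
print(used)                  # 19 of the 20 minutes in the cycle are spoken for
print(shot_interval - used)  # ~1 minute of slack -- a network delay longer than
                             # this effectively costs the experiment a shot
```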

17 Magnetic Fusion Requirements – Process of Science (2)
- Simulation
  - Large, geographically dispersed data sets, more so in the future
  - New long-term initiative (Fusion Simulation Project, FSP) – an integrated simulation suite
  - FSP will increase the computational requirements significantly in the future, resulting in increased bandwidth needs between fusion users and the SC supercomputer centers
- Both experiments and simulations rely on middleware that uses ESnet's federated trust services to support authentication
- ITER
  - Scale will increase substantially
  - Close collaboration with the Europeans is essential for DOE science

18 Magnetic Fusion – Network Requirements
- Experiments
  - Guaranteed bandwidth requirement: 200+ Mbps today, ~1 Gbps in 5 years (driven by the science process)
  - Reliability (99.999% uptime)
  - Deadline scheduling
  - Service guarantees for remote steering and visualization
- Simulation
  - Bulk data movement (310 Mbps end-to-end to move 1 TB in 8 hours; see the check below)
- Federated Trust / Grid PKI for authentication
- ITER
  - Large guaranteed bandwidth requirement (pulsed operation and science process as today, but much larger data sets)
  - Large bulk data movement for simulation data (petabytes per year)
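The 310 Mbps figure is easy to reconstruct; it appears to assume binary terabytes (1 TiB = 2^40 bytes), since a decimal terabyte over 8 hours comes out closer to 278 Mbps. A small check:

```python
# Check of the "310 Mbps end-to-end to move 1 TB in 8 hours" figure.

def mbps_for_transfer(num_bytes: float, hours: float) -> float:
    """Average throughput in Mbps to move `num_bytes` in `hours`."""
    return num_bytes * 8 / (hours * 3600) / 1e6

print(mbps_for_transfer(2**40, 8))  # ~305 Mbps for a binary terabyte (close to 310 Mbps)
print(mbps_for_transfer(1e12, 8))   # ~278 Mbps for a decimal terabyte
```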

19 Large Hadron Collider at CERN

20 LHC Requirements – Instruments and Facilities
- Large Hadron Collider at CERN
  - Networking requirements of two experiments have been characterized – CMS and Atlas
  - Petabytes of data per year to be distributed
- LHC networking and data volume requirements are unique to date
  - First in a series of DOE science projects with requirements of unprecedented scale
  - Driving ESnet's near-term bandwidth and architecture requirements
  - These requirements are shared by other very-large-scale projects that are coming on line soon (e.g. ITER)
- Tiered data distribution model
  - The Tier0 center at CERN processes raw data into event data
  - Tier1 centers receive event data from CERN
    - FNAL is the CMS Tier1 center for the US
    - BNL is the Atlas Tier1 center for the US
    - CERN to US Tier1 data rates: 10 Gbps by 2007, 30-40 Gbps by 2010/11
  - Tier2 and Tier3 sites receive data from Tier1 centers
    - Tier2 and Tier3 sites are end-user analysis facilities
    - Analysis results are sent back to Tier1 and Tier0 centers
    - Tier2 and Tier3 sites are largely universities in the US and Europe

21 LHCNet Security Requirements
- Security for the LHC Tier0-Tier1 network is being defined by CERN in the context of the LHC Network Operations forum
- Security is to be achieved by filtering packets at CERN and the Tier1 sites to enforce routing policy (only approved hosts may send traffic)
- In providing circuits for the LHC, providers must make sure that these policies cannot be circumvented

22 LHC Requirements – Process of Science
- The strictly tiered data distribution model is only part of the picture
  - Some Tier2 scientists will require data not available from their local Tier1 center
  - This will generate additional traffic outside the strict tiered data distribution tree
  - CMS Tier2 sites will fetch data from all Tier1 centers in the general case
  - CMS traffic patterns will depend on data locality, which is currently unclear
- Network reliability is critical for the LHC
  - Data rates are so large that buffering capacity is limited
  - If an outage is more than a few hours in duration, the analysis could fall permanently behind
    - Analysis capability is already maximized – little extra headroom
- CMS/Atlas require DOE federated trust for credentials and federation with LCG
- Service guarantees will play a key role
  - Traffic isolation for unfriendly data transport protocols
  - Bandwidth guarantees for deadline scheduling
- Several unknowns will require ESnet to be nimble and flexible
  - Tier1 to Tier1, Tier2 to Tier1, and Tier2 to Tier0 data rates could add significant additional requirements for international bandwidth
  - Bandwidth will need to be added once requirements are clarified
  - This drives architectural requirements for scalability and modularity

23 LHC Ongoing Requirements Gathering Process
- ESnet has been an active participant in LHC network planning and operation
  - Active participant in the LHC network operations working group since its creation
  - Jointly organized the US CMS Tier2 networking requirements workshop with Internet2
  - Participated in the US Atlas Tier2 networking requirements workshop
  - Participated in all 5 US Tier3 networking workshops

24 LHC Requirements Identified To Date
- 10 Gbps "light paths" from FNAL and BNL to CERN
  - CERN / USLHCnet will provide 10 Gbps circuits to Starlight, to 32 AoA, NYC (MAN LAN), and between Starlight and NYC
  - 10 Gbps each in the near term, additional lambdas over time (3-4 lambdas each by 2010)
- BNL must communicate with TRIUMF in Vancouver
  - This is an example of Tier1 to Tier1 traffic
  - 1 Gbps in the near term
  - The circuit is currently being built
- Additional bandwidth requirements between US Tier1s and European Tier2s
  - To be served by the USLHCnet circuit between New York and Amsterdam
- Reliability
  - 99.95%+ uptime (a small number of hours of downtime per year)
  - Secondary backup paths: SDN for the US and possibly GLIF (Global Lambda Integrated Facility) for transatlantic links
  - Tertiary backup paths: virtual circuits through the ESnet, Internet2, and GEANT production networks
- Tier2 site connectivity
  - Characteristics TBD – this is the focus of the Tier2 workshops
  - At least 1 Gbps required (this is already known to be a significant underestimate for large US Tier2 sites)
  - Many large Tier2 sites require direct connections to the Tier1 sites – this drives bandwidth and virtual circuit deployment (e.g. UCSD)
- Ability to add bandwidth as additional requirements are clarified

25 Identified US Tier2 Sites
Atlas (BNL clients):
- Boston University
- Harvard University
- Indiana University Bloomington
- Langston University
- University of Chicago
- University of New Mexico Alb.
- University of Oklahoma Norman
- University of Texas at Arlington
- Calibration site: University of Michigan

CMS (FNAL clients):
- Caltech
- MIT
- Purdue University
- University of California San Diego
- University of Florida at Gainesville
- University of Nebraska at Lincoln
- University of Wisconsin at Madison

26 LHC Tier 0, 1, and 2 Connectivity Requirements Summary
[Map slide: ESnet IP core and Science Data Network footprint with Internet2/GigaPoP nodes, USLHCNet nodes, and cross-connects with Internet2. Shows direct Tier0-Tier1-Tier2 connectivity via USLHCNet to ESnet to Internet2, with backup connectivity via SDN, GLIF, and virtual circuits. Tier1 centers: FNAL (CMS), BNL (Atlas), and TRIUMF (Atlas, Canada, via CANARIE); GÉANT provides the European connections.]

27 LHC ATLAS Bandwidth Matrix as of April 2007
(Each entry: Site A – Site Z; ESnet endpoints; 2007 bandwidth; 2010 bandwidth)
- CERN – BNL; AofA (NYC) – BNL; 10 Gbps; 20-40 Gbps
- BNL – U. of Michigan (calibration); BNL (LIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- BNL – Boston University and Harvard University; BNL (LIMAN) – Internet2 / NLR peerings; 3 Gbps; 10 Gbps (Northeastern Tier2 Center)
- BNL – Indiana U. at Bloomington and U. of Chicago; BNL (LIMAN) – Internet2 / NLR peerings; 3 Gbps; 10 Gbps (Midwestern Tier2 Center)
- BNL – Langston University, U. Oklahoma Norman, and U. of Texas Arlington; BNL (LIMAN) – Internet2 / NLR peerings; 3 Gbps; 10 Gbps (Southwestern Tier2 Center)
- BNL – Tier3 aggregate; BNL (LIMAN) – Internet2 / NLR peerings; 5 Gbps; 20 Gbps
- BNL – TRIUMF (Canadian ATLAS Tier1); BNL (LIMAN) – Seattle; 1 Gbps; 5 Gbps

28 LHC CMS Bandwidth Matrix as of April 2007
(Each entry: Site A – Site Z; ESnet endpoints; 2007 bandwidth; 2010 bandwidth)
- CERN – FNAL; Starlight (CHIMAN) – FNAL (CHIMAN); 10 Gbps; 20-40 Gbps
- FNAL – U. of Michigan (calibration); FNAL (CHIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- FNAL – Caltech; FNAL (CHIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- FNAL – MIT; FNAL (CHIMAN) – AofA (NYC) / Boston; 3 Gbps; 10 Gbps
- FNAL – Purdue University; FNAL (CHIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- FNAL – U. of California at San Diego; FNAL (CHIMAN) – San Diego; 3 Gbps; 10 Gbps
- FNAL – U. of Florida at Gainesville; FNAL (CHIMAN) – SOX; 3 Gbps; 10 Gbps
- FNAL – U. of Nebraska at Lincoln; FNAL (CHIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- FNAL – U. of Wisconsin at Madison; FNAL (CHIMAN) – Starlight (CHIMAN); 3 Gbps; 10 Gbps
- FNAL – Tier3 aggregate; FNAL (CHIMAN) – Internet2 / NLR peerings; 5 Gbps; 20 Gbps

29 Estimated Aggregate Link Loadings, 2007-08
[Map slide: the ESnet IP core and Science Data Network core with committed bandwidth figures (in Gb/s) on major links, including existing site-supplied circuits, lab-supplied links, LHC-related links, MAN links, and international IP connections; unlabeled links are 10 Gb/s.]

30 ESnet4 2007-8 Estimated Bandwidth Commitments
[Map slide: ESnet4 topology with estimated bandwidth commitments; all circuits are 10 Gb/s. Metropolitan area networks shown include the San Francisco Bay Area MAN (LBNL, SLAC, JGI, LLNL, SNLL, NERSC), the West Chicago MAN (FNAL at 600 W. Chicago, Starlight, ANL), the Long Island MAN (BNL, 32 AoA NYC), and Newport News-Elite (JLab, ELITE, ODU, MATP, Washington DC). USLHCNet connects FNAL and BNL to CERN at 10 Gb/s each.]

31 Estimated Aggregate Link Loadings, 2010-11
[Map slide: the ESnet IP core and Science Data Network core in the 2010-11 timeframe, with core link capacities of 40-50 Gb/s and committed bandwidth figures (in Gb/s) on major links; unlabeled links are 10 Gb/s.]

32 ESnet4 2010-11 Estimated Bandwidth Commitments
[Map slide: ESnet4 topology for 2010-11 with estimated bandwidth commitments, showing multiple lambdas per core segment, Internet2 circuit numbers, and large committed flows to the LHC Tier1 sites (FNAL via 600 W. Chicago / Starlight / ANL, and BNL via 32 AoA, NYC, both connected to CERN via USLHCNet); unlabeled links are 10 Gb/s.]

33 Climate Modeling

34 Climate Modeling Requirements – Instruments and Facilities
- Climate science is a large consumer of supercomputer time
- Data is produced in direct proportion to CPU allocation
  - As supercomputers increase in capability and models become more advanced, model resolution improves
  - As model resolution improves, data sets increase in size
  - CPU allocation may increase due to increased interest from policymakers
  - Significant data set growth is likely in the next 5 years, with a corresponding increase in the network bandwidth required for data movement (current data volume is ~200 TB, with 1.5 PB/year expected by 2010)
- Primary data repositories are co-located with compute resources
  - Secondary analysis is often geographically distant from the data repositories, requiring data movement

35 Climate Modeling Requirements – Process of Science
- Climate models are run many times
  - Analysis → improved model → analysis is the typical cycle
  - Repeated runs of models are required to generate sufficient data for analysis and model improvement
- Current analysis is done by transferring model output data sets to the scientist's home institution for local study
- Recent trend is to make data from many models widely available
  - Less efficient use of network bandwidth, but a huge scientific win
  - PCMDI (Program for Climate Model Diagnosis and Intercomparison) generated 200 papers in a year
- Wide sharing of data is expected to continue
  - The PCMDI paradigm of wide sharing from central locations will require significant bandwidth and excellent connectivity at those locations
  - If the trend of sharing data continues, more data repositories will be opened, requiring more bandwidth resources

36 Climate Modeling Requirements
- Data movement
  - Large data sets must be moved to remote analysis resources
  - Central repositories collect and distribute large data volumes
    - Hundreds of terabytes today
    - Petabytes by 2010 (see the rate sketch below)
- Analysis cycle
  - Steady growth in network usage as models improve
- Increased use of supercomputer resources
  - As computational systems increase in capability, data set sizes increase
  - Increased demand from policymakers may result in increased data production
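For a sense of scale, the yearly volumes above can be converted to average sustained transfer rates. This is a rough sketch assuming decimal petabytes and data spread evenly over the year; real transfers are bursty, which is why the aggregation summary lists 5 Gbps against 5 PB/year:

```python
# Average sustained rate implied by yearly climate data volumes.
# Assumption: decimal petabytes (1 PB = 1e15 bytes), moved evenly over a year.

SECONDS_PER_YEAR = 365 * 24 * 3600

def average_gbps(petabytes_per_year: float) -> float:
    return petabytes_per_year * 1e15 * 8 / SECONDS_PER_YEAR / 1e9

print(average_gbps(1.5))  # ~0.4 Gbps for the ~1.5 PB/year expected by 2010 (slide 34)
print(average_gbps(5.0))  # ~1.3 Gbps for the 5 PB/year in the aggregation summary
```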

37 Spallation Neutron Source (SNS) at ORNL

38 SNS Requirements – Instruments and Facilities
- SNS is the latest instrument for neutron science
  - Most intense pulsed neutron beams available for research
  - Wide applicability to materials science, medicine, etc.
  - Users from DOE, industry, and academia
  - In the process of coming into full production (full-power Accelerator Readiness Review imminent as of April 2007)
- SNS detectors produce 160 GB/day of data in production
  - The operating schedule results in about 50 TB/year
  - Network requirements are 640 Mbps peak (see the sketch below)
  - This will increase to 10 Gbps peak within 5 years
- A neutron science data repository is being considered
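The 640 Mbps figure is a peak, not an average; a quick sketch (assuming decimal gigabytes) shows the difference and why burst capability drives the requirement:

```python
# SNS data rates: daily average vs. the stated 640 Mbps peak.
# Assumption: decimal gigabytes (1 GB = 1e9 bytes).

GB_PER_DAY = 160

avg_mbps = GB_PER_DAY * 1e9 * 8 / 86400 / 1e6
print(avg_mbps)  # ~15 Mbps if 160 GB/day were spread evenly over the day

# At the 640 Mbps peak rate, a full day's data moves in roughly half an hour --
# the requirement is set by these bursts, not by the daily average.
minutes_at_peak = GB_PER_DAY * 1e9 * 8 / 640e6 / 60
print(minutes_at_peak)  # ~33 minutes
```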

39 SNS Requirements – Process of Science
- Productivity is critical
  - Scientists are expected to get just a few days per year of instrument time
    - Drives the requirement for reliability
  - Real-time analysis is used to tune the experiment in progress
    - Linkage with remote computational resources
    - 2 Gbps network load for real-time remote visualization
- Most analysis of instrument data is expected to be done using remote computational resources
  - Data movement is necessary
  - Workflow management software (possibly based on Grid tools) will be necessary
- There is interest from the SNS community in ESnet's Federated Trust services for Grid applications

40 SNS Requirements
- Bandwidth
  - 2 Gbps today
  - 10 Gbps in 5 years
- Reliability
  - Instrument time is a scarce resource
  - Real-time instrument interaction
- Data movement
  - Workflow tools
  - Potential neutron science data repository
- Federated Trust
  - User management
  - Workflow tools

41 Aggregation of Requirements from All Case Studies
Analysis of diverse programs and facilities yields dramatic convergence on a well-defined set of requirements:
- Reliability
  - Fusion – 1 minute of slack during an experiment (99.999%)
  - LHC – a small number of hours of downtime per year (99.95+%)
  - SNS – limited instrument time makes outages unacceptable
  - Drives the requirement for redundancy, both in site connectivity and within ESnet (the downtime budgets are worked out below)
- Connectivity
  - Geographic reach equivalent to that of scientific collaboration
  - Multiple peerings to add reliability and bandwidth to interdomain connectivity
  - Critical both within the US and internationally
- Bandwidth
  - 10 Gbps site-to-site connectivity today
  - 100 Gbps backbone by 2010
  - Multiple 10 Gbps R&E peerings
  - Ability to easily deploy additional 10 Gbps lambdas and peerings
  - Per-lambda bandwidth of 40 Gbps or 100 Gbps should be available by 2010
- Bandwidth and service guarantees
  - All R&E networks must interoperate as one seamless fabric to enable end-to-end service deployment
  - Flexible-rate bandwidth guarantees
- Collaboration support (federated trust, PKI, AV conferencing, etc.)
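The availability percentages above translate into annual downtime budgets as follows (a simple conversion, assuming a 365-day year):

```python
# Annual downtime allowed by the availability figures quoted above.

def downtime_per_year(availability_percent: float) -> float:
    """Allowed downtime in hours per year for a given availability."""
    return (1 - availability_percent / 100) * 365 * 24

print(downtime_per_year(99.95))        # ~4.4 hours/year  (LHC: "small number of hours")
print(downtime_per_year(99.999) * 60)  # ~5.3 minutes/year (fusion)
```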

42 Additional Bandwidth Requirements Matrix – April 2007
(Each entry: Site A – Site Z; ESnet endpoints; 2007 bandwidth; 2010 bandwidth)
- ANL (ALCF) – ORNL; ANL (CHIMAN) – ORNL (Atlanta and Chicago); 10 Gbps (2008); 20 Gbps
- ANL (ALCF) – NERSC; ANL (CHIMAN) – NERSC (BAMAN); 10 Gbps (2008); 20 Gbps
- BNL (RHIC) – CC-J, RIKEN, Japan; BNL (LIMAN) – NYC (MANLAN); 1 Gbps; 3 Gbps

Notes:
- The Argonne Leadership Computing Facility requirement is for a large-scale distributed filesystem linking the ANL, NERSC, and ORNL supercomputer centers
- BNL-to-RIKEN traffic is a subset of the total RHIC requirements, and is subject to revision as the impact of RHIC detector upgrades becomes clearer

43 Requirements Derivation from Network Observation
ESnet observes several aspects of network traffic on an ongoing basis:
- Load
  - Network traffic load continues to grow exponentially
- Flow endpoints
  - Network flow analysis shows a clear trend toward the dominance of large-scale science traffic and wide collaboration
- Traffic patterns
  - Traffic pattern analysis indicates a trend toward circuit-like behaviors in science flows

44 Network Observation – Bandwidth
[Chart: ESnet monthly accepted traffic in terabytes/month, January 2000 – June 2006, broken out by the top 100 site-to-site workflows.]
- ESnet is currently transporting more than 1 petabyte (1000 terabytes) per month
- More than 50% of the traffic is now generated by the top 100 site-to-site workflows — large-scale science dominates all ESnet traffic

45 ESnet Traffic has Increased by 10X Every 47 Months, on Average, Since 1990
[Log plot of ESnet monthly accepted traffic (terabytes/month), January 1990 – June 2006, with order-of-magnitude milestones:]
- Aug. 1990: 100 MBy/month
- Oct. 1993: 1 TBy/month (38 months later)
- Jul. 1998: 10 TBy/month (57 months later)
- Nov. 2001: 100 TBy/month (40 months later)
- Apr. 2006: 1 PBy/month (53 months later)

46 Requirements from Network Utilization Observation
- In 4 years, we can expect a 10x increase in traffic over current levels without the addition of production LHC traffic
  - The nominal average load on the busiest backbone links is greater than 1 Gbps today
  - In 4 years that figure will be over 10 Gbps if current trends continue (see the projection sketched below)
- Measurements of this kind are science-agnostic
  - It doesn't matter who the users are; the traffic load is increasing exponentially
- Bandwidth trends drive the requirement for a new network architecture
  - The new ESnet4 architecture was designed with these drivers in mind
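A minimal sketch of that projection, assuming the historical trend from the previous slide (roughly 10x every 47 months) simply continues:

```python
# Traffic projection from the long-term trend: ~10x growth every 47 months.
# Assumption: exponential growth continues at the historical average rate.

def projected_gbps(current_gbps: float, months_ahead: float,
                   tenfold_period_months: float = 47) -> float:
    return current_gbps * 10 ** (months_ahead / tenfold_period_months)

# Busiest backbone links average > 1 Gbps today; four years out (48 months):
print(projected_gbps(1.0, 48))  # ~10.5 Gbps -- consistent with "over 10 Gbps"
```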

47 Requirements from Traffic Flow Observations
- Most ESnet science traffic has a source or sink outside of ESnet
  - Drives the requirement for high-bandwidth peering
  - Reliability and bandwidth requirements demand that peering be redundant
  - Multiple 10 Gbps peerings today; must be able to add more flexibly and cost-effectively
- Bandwidth and service guarantees must traverse R&E peerings
  - "Seamless fabric"
  - Collaboration with other R&E networks on a common framework is critical
- Large-scale science is becoming the dominant user of the network
  - Satisfying the demands of large-scale science traffic into the future will require a purpose-built, scalable architecture
  - Traffic patterns are different from those of the commodity Internet
- Since large-scale science will be the dominant user going forward, the network should be architected to serve large-scale science

48 Aggregation of Requirements from Network Observation
- Traffic load continues to increase exponentially
  - The 15-year trend indicates an increase of 10x in the next 4 years
  - This means backbone traffic load will exceed 10 Gbps within 4 years, requiring increased backbone bandwidth
  - A new architecture is needed – ESnet4
- Large science flows typically cross network administrative boundaries, and are beginning to dominate
  - Requirements such as bandwidth capacity, reliability, etc. apply to peerings as well as to ESnet itself
  - Large-scale science is becoming the dominant network user

49 Other Networking Requirements
- Production ISP service for Lab operations
  - Captured in workshops and in discussions with the SLCCC (Lab CIOs)
  - Drivers are an enhanced set of standard business networking requirements
  - Traditional ISP service, plus enhancements (e.g. multicast)
  - Reliable, cost-effective networking for business, technical, and research operations
- Collaboration tools for the DOE science community
  - Audio conferencing
  - Video conferencing

50 Required Network Services Suite for DOE Science
We have collected requirements from diverse science programs, program offices, and network analysis – the following summarizes them:
- Reliability
  - 99.95% to 99.999% reliability
  - Redundancy is the only way to meet the reliability requirements
    - Redundancy within ESnet
    - Redundant peerings
    - Redundant site connections where needed
- Connectivity
  - Geographic reach equivalent to that of scientific collaboration
  - Multiple peerings to add reliability and bandwidth to interdomain connectivity
  - Critical both within the US and internationally
- Bandwidth
  - 10 Gbps site-to-site connectivity today
  - 100 Gbps backbone by 2010
  - Multiple 10+ Gbps R&E peerings
  - Ability to easily deploy additional lambdas and peerings
- Service guarantees
  - All R&E networks must interoperate as one seamless fabric to enable end-to-end service deployment
  - Guaranteed bandwidth, traffic isolation, quality of service
  - Flexible-rate bandwidth guarantees
- Collaboration support
  - Federated trust, PKI (Grid, middleware)
  - Audio and video conferencing
- Production ISP service

51 Questions? Thanks for listening!

