
1 SDN-SF LANL Tasks

2 LANL Research Tasks
Explore parallel file system networking (e.g. LNet peer credits) in order to give preferential treatment to isolated routes within the storage area network. This may be done by adding virtual routes to existing LNet routers. LANL will also expose this tuning as a service that can be invoked by a custom software-based network controller.
Explore dynamically adjusting Lustre network request schedulers (NRS) to allow preferential enqueuing of storage operations.
Replicate the SDN-SF rack at the local site and collaborate with ORNL in developing and customizing the emulation test bed.
Evaluate the role of Data Center TCP in ensuring that high-performance flows can be constructed within the data center infrastructure.
Extend the concepts within Data Center TCP to high-speed networking technology. In particular, LANL plans to develop a practical reservation-aware congestion control for Data Center TCP, and then extend Data Center TCP techniques to the Lustre networking (LNet) protocol to alleviate bottlenecks (e.g. the parking lot problem).
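For reference, the standard Data Center TCP sender reaction that the last task builds on can be sketched as follows. This is a minimal illustration of the published DCTCP update rule (RFC 8257), not the reservation-aware variant LANL proposes and not anything LNet-specific. The function names and the `min_cwnd` floor are assumptions made for the example.

```python
def dctcp_update_alpha(alpha, marked_bytes, acked_bytes, g=1 / 16):
    """Update the running estimate of the fraction of ECN-marked bytes.

    alpha        -- previous estimate, in [0, 1]
    marked_bytes -- bytes acknowledged with an ECN mark in the last window
    acked_bytes  -- all bytes acknowledged in the last window
    g            -- estimation gain (RFC 8257 recommends 1/16)
    """
    frac = marked_bytes / acked_bytes if acked_bytes else 0.0
    return (1 - g) * alpha + g * frac


def dctcp_reduce_cwnd(cwnd, alpha, min_cwnd=2.0):
    """Shrink the congestion window in proportion to observed congestion.

    With alpha == 1 this halves the window like classic TCP; with a small
    alpha the reduction is gentle. min_cwnd is a hypothetical floor.
    """
    return max(min_cwnd, cwnd * (1 - alpha / 2))


if __name__ == "__main__":
    alpha, cwnd = 0.0, 100.0
    # Hypothetical windows: (marked bytes, acked bytes)
    for marked, acked in [(0, 1000), (500, 1000), (1000, 1000)]:
        alpha = dctcp_update_alpha(alpha, marked, acked)
        if marked:
            cwnd = dctcp_reduce_cwnd(cwnd, alpha)
        print(f"alpha={alpha:.4f} cwnd={cwnd:.2f}")
```

The proportional back-off is what distinguishes DCTCP from loss-based TCP. A reservation-aware scheme would presumably adjust this reaction for flows holding a reservation, but the slides do not specify how.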

3 LANL Task Timetable
Year 1: I/O and File System Orchestrator module; Lustre performance optimization for intra-datacenter transfers
Year 2: I/O and FS testing; SDN-SF site installation
Year 3: I/O testing with remote computation

4 LANL Year 1 Overview
Frequent data movement within the data center
  – We’ve examined ~30 scientist allocations for LANL’s data center
  – 3 basic types of science: simulation, uncertainty quantification, and high-throughput computing
  – Each generates massive long-lived data sets that flow throughout the data center
  – (Published in workflow report with APEX)
Goal: Develop I/O and File System Orchestrator module to improve/reserve storage performance during inter/intra-datacenter transfers

5 LANL Year 1 Progress To Date: Measuring existing transfers
LANL uses a cluster of file transfer agents (FTAs) to move data between platforms, file systems, archive, DTN staging area, etc.
  – Transfers use a scheduled pftool session; pftool is an MPI-based data mover
pftool on the production FTAs is instrumented with Darshan, an I/O tracing framework
  – Capture and profile all data movement between storage systems within LANL’s center

6 LANL’s Turquoise Enclave
[Diagram. Labeled components: Wolf Cluster, Mustang Cluster, Pinto Cluster, L1, L2, L3, WAN Staging, Tape Archive, FTA Cluster, Campaign Storage, Platform Storage, I/O Backbone Network]

7 LANL’s Turquoise Enclave
[Diagram. Labeled components: Wolf Cluster, Mustang Cluster, Pinto Cluster, L1, L2, L3, WAN Staging, Tape Archive, FTA Cluster, Campaign Storage, Platform Storage, I/O Backbone Network, with an added "Darshan Instrumentation" label]

8 Simulation Science Workflow: Data Center Transfers
[Diagram: Simulation Science Pipeline]
Axis: Data Retention Time, from Temporary to Forever
Phases: S1, S2, S3, S4, S5
Stages: Setup/Parameterize/Create Geometry, Simulate Physics, Down-Sample, Post-Process, Viz
Data sets: Initial Input Deck, Sim Input Deck, Initial State, Checkpoint Dump, Timestep Data Set, Sampled Data Set, Analysis Data Set
Annotations: Job Begin, Job End, Campaign, Γ*JMTTI, 4-8x per week, 5-15x per pipeline, 5-10x per week

9 Data-Intensive Science Workflow: Data Center Transfers
[Diagram: HTC Science Pipeline or UQ Science Pipeline]
Axis: Data Retention Time, from Temporary to Forever
Phases: H1/U1, H2/U2, H3/U3
Stages: Generate and/or Gather Input Data, HTC Analysis or UQ Simulation, Analysis
Data sets: Shared Input, Private Input, Checkpoint Dump, File-based Comm., Analysis Data Sets
Annotations: Campaign, 4-8x per week, 5-15x per pipeline

10 LANL Year 1: Develop orchestration
FTA cluster provides a natural mechanism to orchestrate science flows within the data center
  – Collect data to describe quantities of flows for provisioning
  – Techniques for guaranteeing flow QoS using FTAs/scheduler/pftool
Opportunity to replay pftool traces to measure approaches for limiting performance variability
  – Measure multiple pilot approaches
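One way the replay comparison on this slide could be scored is by the spread of achieved throughput across repeated replays of the same trace. The sketch below uses the coefficient of variation as the variability metric. That choice, the function name, and the input format are assumptions for illustration; the slides do not say which metric LANL uses.

```python
import statistics


def throughput_variability(durations_s, bytes_moved):
    """Summarize repeated replays of one transfer trace.

    durations_s -- wall-clock seconds for each replay (all > 0)
    bytes_moved -- bytes transferred by the trace (same for every replay)

    Returns mean throughput in bytes/s and the coefficient of variation
    (stdev / mean); a lower CV means more predictable performance.
    """
    rates = [bytes_moved / d for d in durations_s]
    mean = statistics.mean(rates)
    stdev = statistics.stdev(rates) if len(rates) > 1 else 0.0
    return {"mean": mean, "cv": stdev / mean if mean else 0.0}


if __name__ == "__main__":
    # Hypothetical replay timings for two pilot approaches.
    baseline = throughput_variability([95.0, 140.0, 110.0, 180.0], 10**12)
    candidate = throughput_variability([120.0, 125.0, 118.0, 130.0], 10**12)
    print("baseline CV:", round(baseline["cv"], 3))
    print("candidate CV:", round(candidate["cv"], 3))
```

Comparing the CV of each pilot approach against an unmanaged baseline gives a single number per approach, which keeps the comparison simple when many traces are replayed.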

11 LANL Year 1: Develop orchestration
Candidate pilot approaches possible due to FTA control of data transfers
  – File mix (small/large) co-scheduling
      – Small files limited by MDS throughput
      – May still generate significant interference
  – Manage total number of transfer-I/O threads
  – Network and storage watermarking
Measurements via Darshan critical
  – PFS modifications can be years-long efforts, and then take longer to reach production usage
  – Currently, LNet route changes are expensive; would making them cheaper be worth the effort?
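The "manage total number of transfer-I/O threads" approach can be illustrated with a shared budget that every transfer worker must acquire before touching storage. This is a single-process sketch under stated assumptions: `TransferThreadBudget` and `copy_fn` are hypothetical names, and a real deployment across an FTA cluster running pftool over MPI would need a distributed equivalent.

```python
import threading


class TransferThreadBudget:
    """Cap how many transfer I/O operations run at once (hypothetical helper)."""

    def __init__(self, max_threads):
        self._sem = threading.BoundedSemaphore(max_threads)
        self._lock = threading.Lock()
        self.active = 0  # operations currently inside the budget
        self.peak = 0    # highest concurrency observed so far

    def run(self, copy_fn, *args):
        """Run copy_fn(*args) once a slot in the budget is free."""
        with self._sem:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return copy_fn(*args)
            finally:
                with self._lock:
                    self.active -= 1


if __name__ == "__main__":
    import time
    from concurrent.futures import ThreadPoolExecutor

    budget = TransferThreadBudget(max_threads=2)

    def fake_copy(i):
        time.sleep(0.01)  # stand-in for a real file copy
        return i

    # Eight workers compete, but at most two copies run concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        done = list(pool.map(lambda i: budget.run(fake_copy, i), range(8)))
    print(done, "peak concurrency:", budget.peak)
```

Tracking `peak` makes it easy to confirm in a replay experiment that the cap actually held, independent of how many workers the scheduler launched.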

12 Deliverables
Year 1-3
  – Share pftool data across the complex
      – Anonymize data from some LANL enclaves
Year 1
  – Comparison of performance isolation/maximization techniques
Year 2
  – Integrate SDN rack into Darwin
  – Orchestrate techniques identified as valuable
      – Goal is to feed this information into the PFS community, so that feasible algorithms are implemented in NRS, etc.
Year 3
  – Isolation techniques for remote burst buffer access
  – BB software is immature, influence possible
      – All are based on PLFS
      – Chance to get small QoS hooks added; just need to know what to request

13 Closing Questions

