A Software-Defined Storage for Workflow Applications


1 A Software-Defined Storage for Workflow Applications
Samer Al-Kiswany (The University of Waterloo) and Matei Ripeanu (The University of British Columbia)

2 Opportunity: Application-Optimized Storage
Application  Storage System file access pattern Storage System  Application file location There is a great opportunity in optimizing systems through optimizing storage systems using application information. For instance, we can optimize storage caching or replication if we know application future access patterns. Similarly storage systems can provide information to optimize upper layer operations.

3 Current Storage System Architecture Limitation
Limits building application-optimized storage: it limits the flow of information across layers, and it prohibits optimizing storage operations.
Because of these limitations, the current distributed storage architecture limits our ability to harvest this opportunity.

4 Our Solution: Software-Defined Storage Architecture
FlexStore: Flexible and Extensible Storage System. Provides application control of storage operations. Extensible. Evaluation highlights, with synthetic and real science applications: up to 6x higher performance and up to 10x lower network load.
We solve these problems in FlexStore. FlexStore demonstrates a new software-defined architecture for building storage systems: it enables application-optimized storage operations and is extensible. We built a prototype and evaluated it using synthetic and real applications; the evaluation highlights are shown here.

5 Outline
Target Application Domain; Software-Defined Storage Architecture (Architecture, FlexStore); Evaluation; Summary

6 Workload Characteristics - Workflow Applications
[Figure: Montage workflow. Legend: computation, file dependency. Annotated patterns: local access, reduce.]
We target scientific workflow applications. Montage, shown here, is one example of such applications. A workflow application is composed of a set of executables (circles); each executable consumes one or more files and produces a file (arrows). We can see some patterns, for instance … This is a local access pattern: we can optimize caching for it to increase access locality. The figure also marks a reduce pattern.
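Both patterns named on this slide can be read off the workflow description before any task runs. The sketch below is a hypothetical illustration, not FlexStore code: it assumes the workflow is given as a dict mapping each task to its input and output files, and it uses its own simple rules, labeling a produced file "local" when exactly one task reads it and that task has no other input, "reduce" when its single reader consumes several files, and "default" otherwise. The task and file names are made up.

```python
from collections import defaultdict


def classify_files(tasks):
    """Label each produced file with a hypothetical access-pattern hint.

    tasks maps a task name to (input_files, output_files).
    Files that no task produces (staged-in inputs) are not labeled.
    """
    producers = {}
    readers = defaultdict(list)
    for task, (inputs, outputs) in tasks.items():
        for f in outputs:
            producers[f] = task
        for f in inputs:
            readers[f].append(task)

    patterns = {}
    for f in producers:
        consumers = readers.get(f, [])
        if len(consumers) == 1 and len(tasks[consumers[0]][0]) > 1:
            patterns[f] = "reduce"   # the one reader also gathers other files
        elif len(consumers) == 1:
            patterns[f] = "local"    # the one reader has no other input
        else:
            patterns[f] = "default"  # no readers, or several readers
    return patterns


# Hypothetical Montage-like workflow: two pipelines feeding one reduce step.
workflow = {
    "project_1": (["raw_1"], ["proj_1"]),
    "project_2": (["raw_2"], ["proj_2"]),
    "background_1": (["proj_1"], ["bg_1"]),
    "background_2": (["proj_2"], ["bg_2"]),
    "add": (["bg_1", "bg_2"], ["mosaic"]),
}
print(classify_files(workflow))
# {'proj_1': 'local', 'proj_2': 'local', 'bg_1': 'reduce', 'bg_2': 'reduce', 'mosaic': 'default'}
```

Such labels are what a workflow engine could hand to the storage system as per-file hints.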

7 System Architecture
[Figure: the workflow runtime engine passes application hints (e.g., access patterns) to the storage and receives storage hints (e.g., location information). Tasks run on compute nodes, each with local storage, and access a shared intermediate storage through the POSIX API. Data is staged in and out from a backend file system (e.g., GPFS, NFS).]
Here is how such an application is deployed. We use large clusters. A scheduler that knows the workflow dependencies schedules tasks on compute nodes, and the tasks access the storage …
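A scheduler that receives location information from the storage can place each task next to its data. A minimal sketch, assuming the location and size of each intermediate file are already known to the scheduler (for example, obtained as a storage hint); the function name and data layout are hypothetical. It picks the compute node that already holds the most bytes of the task's inputs.

```python
def pick_node(task_inputs, file_location, file_size, nodes):
    """Return the node holding the most bytes of the task's input files.

    file_location maps file -> node (assumed to come from a storage hint);
    file_size maps file -> size in bytes. max() keeps the first of equal
    candidates, so a task with no locally stored inputs lands on nodes[0].
    """
    local_bytes = {node: 0 for node in nodes}
    for f in task_inputs:
        node = file_location.get(f)
        if node in local_bytes:
            local_bytes[node] += file_size.get(f, 0)
    return max(nodes, key=lambda node: local_bytes[node])


# Hypothetical placement of two intermediate files.
location = {"bg_1": "n2", "bg_2": "n3"}
size = {"bg_1": 10_000_000, "bg_2": 30_000_000}
print(pick_node(["bg_1", "bg_2"], location, size, ["n1", "n2", "n3"]))  # n3
```

A real scheduler would also weigh node load; the sketch isolates only the data-locality decision.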

8 Application-Optimized Storage Challenges
Passing hints while maintaining the current API; designing an extensible storage system.

9 Outline
Target Application Domain; Software-Defined Storage Architecture (Architecture, FlexStore); Evaluation; Summary

10 Solution: Hints through Custom Attributes
Application-optimized storage: Application  Storage System file access pattern Storage System  Application file location Custom Metadata File System API Our solution to the first problem Custom attributes where introduced in Linux kernel for improving search and navigability of large systems. We used to pas hints ….

11 Solution: Hints through Custom Attributes
Advantages: application-agnostic; maintains the benefits of the layered architecture; maintains the standard API; provides an incremental adoption path.
[Figure: cross-layer and old applications, and cross-layer and old storage systems, all communicate through the same custom metadata file system API.]
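The incremental adoption path can be made concrete on the storage side. When a hint-aware storage component serves an old application, the hint is simply absent: reading it fails with ENODATA, or with ENOTSUP where the underlying file system has no user attributes. A minimal sketch, using the same hypothetical user.flexstore. namespace, that treats both errors as "no hint" and falls back to default behavior while still surfacing real errors such as a missing file.

```python
import errno
import os

HINT_PREFIX = "user.flexstore."  # hypothetical attribute namespace

# errno values meaning "this file carries no such hint"; looked up
# defensively because not every platform defines all three names.
_NO_HINT = {
    getattr(errno, name)
    for name in ("ENODATA", "ENOTSUP", "EOPNOTSUPP")
    if hasattr(errno, name)
}


def hint_or_default(path, key, default="default"):
    """Read one hint, falling back to `default` when it is absent."""
    getxattr = getattr(os, "getxattr", None)  # os.getxattr exists only on Linux
    if getxattr is None:
        return default
    try:
        return getxattr(path, HINT_PREFIX + key).decode("utf-8")
    except OSError as exc:
        if exc.errno in _NO_HINT:
            return default
        raise  # other failures, e.g. a missing file, are still errors


print(hint_or_default(".", "pattern"))
```

Run against a directory that nobody has tagged, the call returns "default", which is exactly the behavior an old application should get.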

12 FlexStore: Flexible and Extensible Storage System
[Figure: FlexStore components: metadata manager, scheduler, storage nodes, and client.]

13 FlexStore Architecture
Flexibility through external control  Software defined storage architecture Control Plane The operation plan has multiple versions of the same operations. The schedule hints are used to select a specific operation version, per file. Primitives plan simplifies building new versions of the operations through encapsulating common storage primitives. Scheduler control Operations Plane Versions of the same operation …. Op. 1 Op. 2 Primitives Plane …. Pr. 1 Pr. 2

14 FlexStore Architecture
Flexibility through external control  Software defined storage architecture Extensibility  common primitives, isolate operations. Control Plane Scheduler control Operations Plane Versions of the same operation …. Op. 1 Op. 2 Primitives Plane …. Pr. 1 Pr. 2

15 FlexStore Design
[Figure: scheduler control; dispatch-based design; message tagging.]
The design is complex, so here is a high-level view. This is the design of the manager; the client and the storage nodes follow the same design. We use the dispatch design pattern: the dispatcher forwards each request based on the file's attributes and the operation type. To simplify building distributed optimizations, the design tags file-related messages with the file's custom attributes.
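A minimal sketch of dispatch with message tagging, under stated assumptions: the Message fields, the manager-side attribute table, and the handler strings are all hypothetical. Tagging copies a file's custom attributes onto the message, so whichever component receives it (manager, client, or storage node) can pick a handler by operation type and attribute without a separate metadata lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    op: str                                   # operation type, e.g. "write"
    path: str
    tags: dict = field(default_factory=dict)  # file custom attributes carried along


def tag(msg, attributes):
    """Message tagging: copy the file's custom attributes onto the message."""
    msg.tags.update(attributes.get(msg.path, {}))
    return msg


class Dispatcher:
    """Route each request by (operation type, file's pattern attribute)."""

    def __init__(self):
        self.handlers = {}

    def register(self, op, pattern, handler):
        self.handlers[(op, pattern)] = handler

    def dispatch(self, msg):
        pattern = msg.tags.get("pattern", "default")
        handler = self.handlers.get((msg.op, pattern))
        if handler is None:
            # Untagged file or unknown pattern: use the default version.
            handler = self.handlers[(msg.op, "default")]
        return handler(msg)


attributes = {"/wf/proj_1": {"pattern": "local"}}  # hypothetical attribute table
d = Dispatcher()
d.register("write", "default", lambda m: f"default write {m.path}")
d.register("write", "local", lambda m: f"local write {m.path}")

print(d.dispatch(tag(Message("write", "/wf/proj_1"), attributes)))  # local write /wf/proj_1
print(d.dispatch(tag(Message("write", "/wf/mosaic"), attributes)))  # default write /wf/mosaic
```

Keying handlers on (operation, pattern) keeps each optimization in its own handler, which is what lets new versions be added without touching existing ones.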

16 Evaluation
Montage workflow. Platform: 102-node cluster. Workload: 10 stages, ~4,000 tasks, 6,770 files generated (27 GB in total).
[Chart comparing Ceph, FlexStore-D, and FlexStore; GlusterFS did not finish in reasonable time.]
Up to 25% performance gain; 2x gain with synthetic benchmarks.

17 Summary
Contributions: cross-layer communication through tags; design of the first software-defined, extensible storage system.
Evaluation: up to 6x higher performance with synthetic benchmarks; up to 10x lower network load; up to 70% higher performance with real applications.

18 Thank you

19 Layered System Architecture
Problem: limits the flow of information [Patil HotCloud ’09, Grider CMU Report ’06, Seltzer HotOS ’09, HECE Working Group].
Proposed solutions: a new API [UrsaMinor, BitDew, HDFS, GreenStore]; modifying or extending the existing API [Mesnier SIGOPS ’11, Patterson SOSP ’95].

20 Cross-Layer Optimization Challenges
Passing hints while maintaining the current API; designing an extensible storage system.

21 Opportunity
Application-optimized storage: Application → Storage System: file access pattern; Storage System → Application: file location. [Figure: custom metadata file system API.]

