Give Your Data the Edge A Scalable Data Delivery Platform


1 Give Your Data the Edge A Scalable Data Delivery Platform
University of Arizona, University of North Carolina, Open Networking Lab, Princeton University

2 Data Management Challenge
Diagram labels: Distributed Set of Collaborators; Data Management Experts; Share; Pre-Stage; Write-Back; Institutional Resources; Commodity Cloud Storage; S3; CyVerse; DropBox; XSEDE.
Speaker notes: Emphasize the value, and the limitations, of existing resources: (1) R/W performance of local disk, (2) scalable read bandwidth, (3) persistent storage, (4) popular data sets. Then introduce the UG, RG, and AG, all tied together by (1) an HTTP data plane and (2) the MS. The result is a shared/global volume.

3 Our Goal
Enable a scalable number of collaborators (and their applications) to share access to data independent of where it is stored, in a way that (1) minimizes the operational burden on users and (2) maximizes the use of commodity infrastructure.

4 Syndicate Solution
Diagram labels: CDN; Metadata Service; Shared Volume; SG; CyVerse; DropBox; XSEDE.
Speaker notes: Talk to the value of a CDN. Leverage the same service as Netflix, but for scientific data.

5 Syndicate Solution
Diagram labels: Shared Volume; SG; Metadata Service; CDN; S3; CyVerse; DropBox; XSEDE.
Callouts:
– Manages data consistency and key distribution
– Bridges application workflow and HTTP transport; e.g., Jupyter, Hadoop
– Acquires data from existing data stores; e.g., CyVerse, XSEDE
– Treats cloud storage as a block device
Speaker notes: as on slide 2.
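The callout "treats cloud storage as a block device" can be illustrated with a small sketch. This is not Syndicate's code: the `DictObjectStore` class, the `write_file` and `read_range` helpers, the key naming scheme, and the tiny block size are all hypothetical, chosen only to show how a file might be split into fixed-size blocks that each live as one object in a put/get store.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


class DictObjectStore:
    """Hypothetical stand-in for a cloud object store (put/get by key)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]


def write_file(store, name, data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks, store each as one object, return a manifest."""
    manifest = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = f"{name}/block-{offset // block_size}"
        store.put(key, block)
        manifest.append({"key": key, "sha256": hashlib.sha256(block).hexdigest()})
    return manifest


def read_range(store, manifest, offset, length, block_size=BLOCK_SIZE):
    """Read `length` bytes at `offset`, fetching only the blocks that overlap the range."""
    if length <= 0:
        return b""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    out = bytearray()
    for entry in manifest[first:last + 1]:
        block = store.get(entry["key"])
        # Verify each block against the manifest before using it.
        if hashlib.sha256(block).hexdigest() != entry["sha256"]:
            raise ValueError(f"block {entry['key']} failed its integrity check")
        out += block
    start = offset - first * block_size
    return bytes(out[start:start + length])
```

Because the store only needs `put` and `get`, the same two helpers would work over any backend that offers those operations, which is one plausible reading of how a block-device view keeps the design independent of the storage provider.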

6 Syndicate Solution
Diagram labels: Shared Volume; SG; Metadata Service; CDN; S3; CyVerse; DropBox; XSEDE.
Callouts:
– As easy as mounting Dropbox
– Auto-mount in Cloud VMs
Speaker notes: as on slide 2.

7 OpenCloud – Service Delivery Platform
Diagram labels: Shared Volume; SG; Metadata Service; S3; CyVerse; DropBox; XSEDE.
Speaker notes: as on slide 2.

8 The “Value-Add” Strategy
Syndicate = CDN + Object Store + NoSQL DB
Value-Add Storage Service:
– Scalable Read Bandwidth (Akamai HyperCache & RequestRouter)
– Data Durability (S3, Glacier, DropBox, Box, Swift)
– Data Consistency (Google App Engine)
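One common way to combine a cache, a durable store, and a consistency service is to give every write a new version and to cache only immutable (path, version) entries. The sketch below illustrates that idea under stated assumptions; it is not Syndicate's protocol, and `MetadataService`, `Origin`, `EdgeCache`, `write`, and `read` are hypothetical names invented for this example.

```python
class MetadataService:
    """Hypothetical consistency layer: maps each path to its current version."""

    def __init__(self):
        self._versions = {}

    def current_version(self, path):
        return self._versions.get(path, 0)

    def set_version(self, path, version):
        self._versions[path] = version


class Origin:
    """Hypothetical durable store keyed by (path, version)."""

    def __init__(self):
        self._data = {}
        self.fetches = 0  # counts reads that reached the origin

    def put(self, path, version, data):
        self._data[(path, version)] = data

    def get(self, path, version):
        self.fetches += 1
        return self._data[(path, version)]


class EdgeCache:
    """Stands in for a CDN edge: entries are immutable, so they never go stale."""

    def __init__(self, origin):
        self._origin = origin
        self._entries = {}

    def get(self, path, version):
        key = (path, version)
        if key not in self._entries:
            self._entries[key] = self._origin.get(path, version)
        return self._entries[key]


def write(ms, origin, path, data):
    version = ms.current_version(path) + 1
    origin.put(path, version, data)  # make the data durable first
    ms.set_version(path, version)    # then advertise the new version
    return version


def read(ms, cache, path):
    # Ask the metadata service which version is current, then read through the cache.
    return cache.get(path, ms.current_version(path))
```

In this arrangement repeated reads of an unchanged file are served from the cache, while a write changes the version and therefore the cache key, so readers never see a stale cached copy and no cache invalidation is needed.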

9 Value-Add Storage Service
Diagram labels: OpenCloud; Commodity Clouds; Private Clouds; Internet2 Backbone; Regional & Campus; End Users; HPC; Amazon AWS; S3; iRODS; RR; Google Cloud Platform; MS.
Latency matters. Shared state matters. Sufficient resources matter.

10 Syndicate Value Proposition
Cloud-Ready – Allows users to mount shared volumes into cloud-hosted virtual machines (VMs) with minimal operational overhead.
Scalable Read Bandwidth – Provides scalable read bandwidth (i.e., supports a scalable number of users) with minimal operational overhead.
Provider Independence – Allows users to take advantage of cost/performance tradeoffs among multiple storage providers (as well as spread risk across those providers) with minimal operational overhead.

11 Syndicate Value Proposition
Secure-by-Default – Allows users to securely share files across organizational boundaries, at scale, with minimal operational overhead.
Adapt to Existing Workflows – Makes it easy to integrate existing user workflows, datasets, and toolkits, as well as extend and customize to meet specific community requirements (e.g., privacy).
Sustainable Design – Provides a general-purpose storage platform that leverages commodity storage and network caches at every opportunity. Commodity!!!
Value to NSF: No up-front capital investment. Pay-as-you-go approach.

