Future Data Architectures Big Data Workshop – April 2018

Earthdata Cloud 2021
Kevin Murphy
April 2018

Earthdata Cloud 2021
- Improve the efficiency of NASA’s data systems operations
- Prepare for planned high-data-rate missions
- Increase opportunity for researchers and commercial users to process PBs of data quickly without the need for data management/movement

Focused on evaluation and planning for a cloud migration in 4 areas:
- Compliance, Security, Cost Tracking
- Core Archive Functionality and Processing
- End-User Application Migration
- Commercial Cloud Partnerships

Data Rates Drive System Evolution

Enabling Analytics in the Cloud for Earth Science Data

- 80 TBs/day generation
- 400 TBs/day reprocessing
- 300 GB granules
- 150 PBs @ 50 Gbps processing speed for months
- Most networks can’t handle sustained 50 Gbps
- Processing times for creating scenes, times to create time-series
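The figures above can be sanity-checked with a little arithmetic. The sketch below is not from the slides: it assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), a link that runs at its full nominal rate, and the hypothetical helper names `sustained_gbps` and `transfer_days`.

```python
SECONDS_PER_DAY = 86_400


def sustained_gbps(terabytes_per_day: float) -> float:
    """Average line rate (Gbps) needed to move a daily volume given in decimal TB."""
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


def transfer_days(petabytes: float, gbps: float) -> float:
    """Days needed to move a volume in decimal PB at a constant rate in Gbps."""
    total_bits = petabytes * 1e15 * 8
    seconds = total_bits / (gbps * 1e9)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"80 TB/day generation    : {sustained_gbps(80):.2f} Gbps sustained")
    print(f"400 TB/day reprocessing : {sustained_gbps(400):.2f} Gbps sustained")
    print(f"150 PB at 50 Gbps       : {transfer_days(150, 50):.1f} days")
```

Under these assumptions, 400 TB/day works out to roughly 37 Gbps sustained, and moving 150 PB at 50 Gbps takes about 278 days, which is consistent with the "for months" on the slide.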

EOSDIS Cloud Architecture - 2021
- Users access and can process PBs of data quickly without the need for data management/movement
- CEOS, USGS, NOAA
- Organized, well-documented, consistently formatted, and error-free data lake
- Discipline-specific support and tools (all data)
- Workflow specialization by DAACs
- Processing next to data for anyone
- Clear integration path for new technology
- Data reformatting tools available via APIs
- Supports global distribution
- Open Data + Open Source Software + Open Architecture

So we made this thing.
Getting Started with Cumulus: https://cumulus-nasa.github.io/
Cumulus Code Base: https://github.com/cumulus-nasa

What is Cumulus?
Lightweight, cloud-native framework for data ingest, archive, distribution and management
Goals:
- Provide core DAAC functionality in a configurable manner
- Enable DAACs to help each other with re-usable, compatible containers (e.g. data retrieval, metadata extraction, metrics delivery)
- Enable DAAC-specific customizations

Cumulus Major System Components
A lightweight framework consisting of:
- Tasks: discrete actions in a workflow, invoked as a Lambda function or EC2 service; a common protocol supports chaining
- Orchestration engine (AWS Step Functions): controls invocation of tasks in a workflow
- Database: stores status, logs, and other system state information
- Workflow(s): file(s) that define the ingest, processing, publication, and archive operations (JSON)
- Dashboard: create and execute workflows, monitor the system
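How these components fit together can be illustrated with a toy, in-process sketch. This is not the Cumulus API and not the AWS Step Functions definition language: the task names, the JSON layout, and `run_workflow` are all hypothetical, chosen only to show tasks that share a common event-in/event-out protocol, a workflow defined in JSON, an engine that chains the tasks, and a status log standing in for the database.

```python
import json
from typing import Callable, Dict, List

Event = dict
Task = Callable[[Event], Event]


# Hypothetical tasks. Each takes an event dict and returns a new event dict,
# which is the "common protocol" that lets them be chained.
def discover(event: Event) -> Event:
    return {**event, "granules": [f"{event['collection']}.granule-001"]}


def extract_metadata(event: Event) -> Event:
    metadata = {granule: {"size_bytes": 0} for granule in event["granules"]}
    return {**event, "metadata": metadata}


def publish(event: Event) -> Event:
    return {**event, "published": sorted(event["metadata"])}


TASKS: Dict[str, Task] = {
    "discover": discover,
    "extract_metadata": extract_metadata,
    "publish": publish,
}

# Hypothetical workflow file content: an ordered list of task names.
WORKFLOW_JSON = """
{"name": "IngestGranule", "steps": ["discover", "extract_metadata", "publish"]}
"""


def run_workflow(definition: str, event: Event, log: List[dict]) -> Event:
    """Toy orchestration engine: invoke each task in order, recording status."""
    workflow = json.loads(definition)
    for step in workflow["steps"]:
        event = TASKS[step](event)
        log.append({"workflow": workflow["name"], "step": step, "status": "ok"})
    return event


if __name__ == "__main__":
    status_log: List[dict] = []
    result = run_workflow(WORKFLOW_JSON, {"collection": "EXAMPLE"}, status_log)
    print(result["published"])
    print(status_log)
```

In this sketch, changing what a workflow does means editing the JSON definition rather than the task code, which is the kind of configurability the component list describes.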

AKA the big picture
Direct:
- Reduced O&M costs (AWS negotiations)
- Minimize data movement to compute
- Ability to scale to increasing data streams
Indirect:
- Efficiencies gained via sharing
- Reduce design, development, purchase of redundant code/components/infrastructure
- Transparency of processes
- Improve knowledge sharing

Avoiding Vendor Lock-in (who owns this data? what if you have to move it?)

Data Transfer Risk
What if you have to move the data?
Right now, AWS is the only NASA-approved commercial cloud vendor. As more options become available we will investigate them.

Application Transfer Risk
Step Functions is an AWS-specific product!
Cumulus’ backbone is a workflow processing engine. This is not a unique problem that Amazon alone has solved. There are (many) free and open source alternatives. We own the boxes, always; the arrows between those boxes are replaceable.
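The "boxes and arrows" point can be sketched in code. This is an illustration under assumptions, not how Cumulus is implemented: `WorkflowEngine`, `InProcessEngine`, `QueueDrivenEngine`, and the two tasks are hypothetical names. The tasks (boxes) are plain functions that know nothing about the engine, and two interchangeable engines (arrows) produce the same result.

```python
from collections import deque
from typing import Callable, Dict, List, Protocol

Event = dict
Task = Callable[[Event], Event]


# The "boxes": plain functions with no dependency on any orchestration engine.
def validate(event: Event) -> Event:
    return {**event, "valid": bool(event.get("granule_id"))}


def archive(event: Event) -> Event:
    return {**event, "archived": event["valid"]}


TASKS: Dict[str, Task] = {"validate": validate, "archive": archive}


class WorkflowEngine(Protocol):
    """The "arrows": anything that can run named steps in order."""

    def run(self, steps: List[str], event: Event) -> Event: ...


class InProcessEngine:
    """Calls each task directly, one after another."""

    def run(self, steps: List[str], event: Event) -> Event:
        for name in steps:
            event = TASKS[name](event)
        return event


class QueueDrivenEngine:
    """Passes each intermediate result through a queue as a message."""

    def run(self, steps: List[str], event: Event) -> Event:
        queue = deque([(0, event)])
        while True:
            index, payload = queue.popleft()
            if index == len(steps):
                return payload
            queue.append((index + 1, TASKS[steps[index]](payload)))


def ingest(engine: WorkflowEngine, granule_id: str) -> Event:
    """Application code depends only on the WorkflowEngine interface."""
    return engine.run(["validate", "archive"], {"granule_id": granule_id})


if __name__ == "__main__":
    print(ingest(InProcessEngine(), "g1"))
    print(ingest(QueueDrivenEngine(), "g1"))
```

Swapping one engine for the other requires no change to `validate`, `archive`, or `ingest`, which is the property that limits application transfer risk.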

Infrastructure Transfer Risk
Compute, Serverless, Queueing, etc.
Again, this is not a unique problem. Every major competitor in the cloud space has alternatives:
- Serverless: Qinling, Google Cloud Functions
- Queues: Zaqar, RabbitMQ
- etc., etc.

Knowledge Transfer Risk
We are training everyone in AWS.
This is a real problem. Effectively leveraging the AWS console is its own skillset. People may become unwilling to be retrained if we have to migrate. But we have faced this problem before.