VIFI : Virtual Information Fabric for Data-Driven Discovery from Distributed Fragmented Repositories PI: Dr. Ashit Talukder Bank of America Endowed Chair.


1 VIFI : Virtual Information Fabric for Data-Driven Discovery from Distributed Fragmented Repositories
PI: Dr. Ashit Talukder Bank of America Endowed Chair in IT Web:

2 VIFI Concept
Novel VIFI cyberinfrastructure that facilitates data-driven discovery from distributed, fragmented datasets without requiring movement of massive amounts of data and without exposing sensitive raw datasets to end users.
Overarching Goals:
Open source middleware tools
Evaluate and demonstrate on multiple domains: Earth Science, Astronomy, Health Informatics, Resilient Human-Building Ecosystems
Useful in domains involving massive or heterogeneous data streams, with novel edge analytics and fog computing

3 Traditional Data Fabric: Limitations
Complex and time-consuming processes, standards, APIs, and MOUs, which may include format conversion, DB import, select field encryption, data redaction or de-identification, etc.
Given appropriate authorizations and consideration for data privacy, bulk datasets are transported across bandwidth-limited connections.
After bulk data ingest is staged, analytics differentiates valuable information from irrelevant data.
The volume of irrelevant data often eclipses that of the usable information.
1:45 pm to 3:45 pm, Room 232

4 VIFI Proof of Concept: Early Stage Demonstrations
Demonstrate initial core components in VIFI proof of concept use-case:
User interface and visualization of distributed data and VIFI features
Portable analytics container (PAC): prepare self-contained analytics scripts and algorithms
Docker swarm: deploy, monitor, and execute portable analytics (PAC) on remote repositories
Orchestration of distributed infrastructure
Distributed computation and analytics without moving distributed repositories
User visualization of analytics and data-driven insights
Demonstrate on pilot Earth science use-case for climate and weather precipitation model prediction from distributed earth science repositories
Demonstrate on pilot Astronomy use-case for detecting specific statistical patterns from distributed astronomy data
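The PAC and Docker swarm items above can be pictured with a small sketch. The slides do not describe VIFI's actual deployment calls, so the `Pac` dataclass, its field names, the `repo` node label, the image name, and the paths below are all hypothetical. Only the `docker service create` flags (`--name`, `--restart-condition`, `--constraint`, `--mount`) are standard Docker CLI options. The function builds the command line and does not run it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pac:
    """Hypothetical description of a portable analytics container (PAC)."""
    name: str        # service name shown by the swarm
    image: str       # Docker image holding the script and its dependencies
    data_dir: str    # directory on the repository host that holds the data
    node_label: str  # value of the (assumed) swarm node label "repo"


def swarm_command(pac: Pac) -> List[str]:
    """Build a `docker service create` command that runs the PAC once on the
    node that already stores the data, with the data mounted read-only."""
    return [
        "docker", "service", "create",
        "--name", pac.name,
        "--restart-condition", "none",  # run to completion, do not restart
        "--constraint", f"node.labels.repo=={pac.node_label}",
        "--mount", f"type=bind,source={pac.data_dir},target=/data,readonly",
        pac.image,
    ]


model_pac = Pac(
    name="model-pac",
    image="example/model-pac:latest",  # placeholder image name
    data_dir="/srv/model-data",        # placeholder path
    node_label="model",
)
command = swarm_command(model_pac)
print(" ".join(command))
```

Placing the analytics next to the data through a node constraint, rather than copying data to the analytics, is the point the slide makes; the exact labels and mounts would depend on how each repository is set up.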

5 VIFI POC Use Case: Hourly Precipitation datasets over the Great Plains
When it rains somewhere in the Great Plains, is there a probability density function that forecasts how strongly, how long, or how much it rains?
The example uses rainfall data for one day from NASA’s observational dataset (GPM) and from model datasets at three different spatial resolutions.
[Figure: region map from [Bukovsky 2011]; region 10: Northern Plains]

Dataset                 Resolution           Date
Observation: GPM        0.1°, 30 minutes     06/01/2015
RCM: NASA-Unified Weather Research and Forecasting model (NU-WRF)
  WRF24                 24 km, hourly        06/01/2002
  WRF12                 12 km, hourly
  WRF04                 4 km, hourly
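As a minimal illustration of the question posed on this slide, the sketch below estimates a probability density of rain rate from a histogram, conditioned on hours when it rains. The rain rates and bin edges are made-up values for illustration, not GPM or NU-WRF data, and the function name `empirical_pdf` is hypothetical.

```python
from typing import List, Sequence


def empirical_pdf(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Histogram-based density estimate over the bins defined by `edges`.

    Each density is count / (total_in_range * bin_width), so density times
    bin width sums to 1 over the covered range.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            # Bins are half-open [lo, hi), except the last one, which is closed.
            if edges[i] <= v < edges[i + 1] or (last and v == edges[i + 1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / (total * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


# Made-up hourly rain rates in mm/h; zeros are dry hours.
rates = [0.0, 0.0, 0.4, 1.2, 3.5, 0.8, 0.0, 6.1, 2.2, 0.3]
wet = [r for r in rates if r > 0.0]  # condition on "when it rains"
edges = [0.0, 1.0, 2.0, 4.0, 8.0]    # unequal bin widths, in mm/h
pdf = empirical_pdf(wet, edges)
print(pdf)
```

Dividing by bin width matters here because the bins are unequal; without it the values would be probabilities per bin rather than densities.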

6 VIFI Motivation: Traditional Data Fabric Architecture
Components: Model Server (Model Data), Observation Server (Observations Data), User Node (or Server)
1. Download Model data
2. Download Obs data
3. Observational data re-gridded to the same resolution as the model data (if necessary)
4. JPDF is computed for the observational data
5. JPDF is computed for the model data
6. Observed and simulated JPDFs are used to compute an Evaluation Metric
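The JPDF and evaluation steps of this workflow (steps 4 to 6) can be sketched as follows. The slides do not state which variables enter the JPDF or which evaluation metric is used, so this sketch assumes rain events described by (duration, peak intensity) pairs and a simple histogram-overlap score. The event values and bin edges are made up, and re-gridding (step 3) is not shown.

```python
from typing import Dict, Sequence, Tuple

Event = Tuple[float, float]  # assumed: (duration in hours, peak intensity in mm/h)
Bin = Tuple[int, int]


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Index of the half-open bin [edges[i], edges[i+1]) holding value, or -1."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return -1


def joint_pdf(events: Sequence[Event], d_edges: Sequence[float],
              i_edges: Sequence[float]) -> Dict[Bin, float]:
    """Joint histogram normalised so the per-bin probabilities sum to 1."""
    counts: Dict[Bin, int] = {}
    for duration, intensity in events:
        key = (bin_index(duration, d_edges), bin_index(intensity, i_edges))
        if -1 not in key:  # ignore events outside the binned range
            counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def overlap(p: Dict[Bin, float], q: Dict[Bin, float]) -> float:
    """Assumed metric: shared probability mass, 1.0 means identical histograms."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


d_edges = [0.0, 2.0, 4.0, 8.0]   # duration bins, hours
i_edges = [0.0, 3.0, 6.0, 12.0]  # intensity bins, mm/h
obs_events = [(1.0, 2.0), (2.0, 5.0), (1.0, 1.0), (3.0, 8.0)]    # made up
model_events = [(1.0, 2.0), (2.0, 4.0), (4.0, 9.0), (1.0, 1.0)]  # made up

obs_jpdf = joint_pdf(obs_events, d_edges, i_edges)      # step 4
model_jpdf = joint_pdf(model_events, d_edges, i_edges)  # step 5
score = overlap(obs_jpdf, model_jpdf)                   # step 6
print(score)
```

In the traditional architecture all three calls run on the User Node after both raw datasets have been downloaded there.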

7 VIFI Motivation: Traditional Data Fabric Architecture
Disadvantages:
Long time for transferring massive datasets to the User Node
High requirements for storing massive datasets on the User Node
All computations are executed on the same server
All data are transferred to the User Node, including data that might not be relevant for the analysis in question
Scientist must manually install the algorithms (including all dependencies) on the User Node

8 VIFI Motivation: ViFi Enabled Data Fabric Architecture
Components: Model Server (Model Data), Observation Server (Observations Data), Docker Hub, User Node (or Server)
1. Send Model PAC Script
2. Send Obs PAC Script
3. Request Model PAC Docker Image
4. Request Obs PAC Docker Image
5. Execute Model PAC
6. Execute Obs PAC
7. Send Model Results
8. Send Obs Results
9. Request Evaluation PAC Docker Image
10. Execute Evaluation PAC
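The ten steps of this architecture can be mimicked in a single process to show the flow of control. This is only a toy simulation: the `DataNode` class, the `REGISTRY` dictionary standing in for Docker Hub, and the mean and bias computations are all hypothetical, and the data values are made up. In the slides the real orchestration is done with NIFI and Docker Swarm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Stand-in for Docker Hub: image name -> callable playing the role of a PAC.
REGISTRY: Dict[str, Callable[[List[float]], float]] = {
    "model-pac": lambda data: sum(data) / len(data),
    "obs-pac": lambda data: sum(data) / len(data),
}


class DataNode:
    """Hypothetical server that holds data and runs PACs next to it."""

    def __init__(self, name: str, data: List[float]) -> None:
        self.name = name
        self._data = data  # raw data never leaves the node

    def run_pac(self, image: str) -> float:
        pac = REGISTRY[image]   # steps 3-4: fetch the PAC image
        return pac(self._data)  # steps 5-6: execute; 7-8: return only the result


def evaluate(model_result: float, obs_result: float) -> float:
    """Steps 9-10: evaluation PAC on the user node (here a simple bias)."""
    return model_result - obs_result


model_node = DataNode("model-server", [2.0, 4.0, 6.0])
obs_node = DataNode("observation-server", [1.0, 3.0, 5.0])

with ThreadPoolExecutor(max_workers=2) as pool:
    # Steps 1-2: dispatch both PACs; the two servers work in parallel.
    model_future = pool.submit(model_node.run_pac, "model-pac")
    obs_future = pool.submit(obs_node.run_pac, "obs-pac")
    bias = evaluate(model_future.result(), obs_future.result())

print(bias)
```

The user node only ever sees two small numbers, which is the contrast with the traditional architecture where both full datasets are downloaded first.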

9 VIFI Motivation: ViFi Enabled Data Fabric Architecture
Advantages:
All phases of the scientific analysis lifecycle (compute and data transfer) are executed by a single agent (NIFI), without any manual intervention or a priori knowledge on the scientist's part.
Science algorithms are encapsulated in reusable PACs, which can be seamlessly deployed and run on any ViFi-enabled node.
Computations are distributed onto multiple servers, which have direct access to the data (NO NEED TO MOVE DATA).
Only a subset of the data (i.e., the results of the Model and Observation PACs) is transferred over the network, drastically reducing data transfer times.
The overall infrastructure scales to any new data source by simply installing the ViFi software.
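A back-of-the-envelope calculation makes the data transfer point concrete. The dataset size, result size, and link speed below are illustrative assumptions only, not measurements from the VIFI proof of concept, and the helper `transfer_seconds` is hypothetical. Protocol overhead and latency are ignored.

```python
def transfer_seconds(size_bytes: int, bandwidth_bits_per_s: int) -> float:
    """Ideal transfer time: payload bits divided by link speed."""
    return size_bytes * 8 / bandwidth_bits_per_s


LINK = 100_000_000             # assumed 100 Mbit/s connection
RAW_DATASET = 10 ** 12         # assumed 1 TB of raw model output
PAC_RESULTS = 10 * 10 ** 6     # assumed 10 MB of PAC results

raw_time = transfer_seconds(RAW_DATASET, LINK)     # move the data to the user
result_time = transfer_seconds(PAC_RESULTS, LINK)  # move only the results
print(f"raw data: {raw_time / 3600:.1f} h, results only: {result_time:.1f} s")
```

Under these assumed numbers the raw transfer takes about 22 hours while the results take under a second; the real ratio depends entirely on how much the PAC reduces the data.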

10 VIFI User Interface
[Screenshot; panel labels: PAC script Upload, PAC script Write, Visualization Types, Results]

11 NIFI at User Site

12 NIFI at Server(s) Site(s)
NIFI at Model Server NIFI at Observation Server

13 PoC Current Status
Open source (extensible and portable across infrastructures)
Initial deployment on AWS (for speed of demonstration; portable and easy to deploy on local managed infrastructure if needed)
AWS virtual machines
AWS S3 bucket to keep results
First Datacenter hosts Model data + NIFI + Docker Swarm
Second Datacenter hosts Observation data + NIFI + Docker Swarm
User node with NIFI + Docker Swarm
Docker Image of Apache OCW at Docker Hub
User interface and visualization base functionalities

14 PoC Future Work Expand Pilot, commence HLA design
Integration between UI and NIFI.
Common NIFI workflow design for most datacenters (i.e., not only for JPL): identification of common attributes of users as well as datacenters.
Data search and virtualization separate from PAC scripts.
Data governance, data management, search and query.
Workflow scheduling and optimization (e.g., DAWN, IReS).
Security integration (authentication, authorization, audit, provenance).
Encryption integration (encrypt relevant data and run computations on encrypted data).
Demonstrate, evaluate, and benchmark on multiple application domains.
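For the workflow scheduling item, one simple starting point is to express the PAC steps as a dependency graph and let a topological sort group the steps that can run in parallel. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). It is not DAWN or IReS, and the step names and dependencies are hypothetical labels loosely based on the ViFi-enabled workflow slide.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps that must finish before it can start.
deps: Dict[str, Set[str]] = {
    "run_model_pac": {"send_model_script"},
    "run_obs_pac": {"send_obs_script"},
    "run_evaluation_pac": {"run_model_pac", "run_obs_pac"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()  # also raises CycleError if the graph has a cycle

batches: List[List[str]] = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # steps whose prerequisites are done
    batches.append(ready)               # these could be dispatched in parallel
    sorter.done(*ready)                 # mark finished to unlock successors

for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

A real scheduler would also weigh data location, node load, and cost when placing each batch; this sketch covers only the ordering constraint.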
