
1 Research in Middleware Systems For In-Situ Data Analytics and Instrument Data Analysis
Gagan Agrawal The Ohio State University (Joint work with Yi Wang, Yu Su, Tekin Bicer and others)

2 Outline
12/4/2018
Middleware Systems
Work on In Situ Analysis
Analysis of Instrument Data
Compression/Summarization of Streaming Data
Post-analysis using just the summary
Before I introduce our method, let's first look at what in-situ analysis is and why we need it. By definition, in-situ analysis is the process of transforming data at run time: each time data is collected or simulated in memory, we apply several types of transforms before the data is written to disk or transferred over the network.

3 In Situ Analysis – Simulation Data
Algorithm/Application Level – In-Situ Algorithms
No disk I/O
Indexing, compression, visualization, statistical analysis, etc.
Platform/System Level – In-Situ Resource Scheduling Systems
Enhance resource utilization
Simplify the management of analytics code
GoldRush, GLEAN, DataSpaces, FlexIO, etc.
These two levels have been extensively studied, but are they seamlessly connected? Is there anything to do between the two?

4 Opportunity
Explore the Programming-Model Level in the In-Situ Environment
Sits between the application level and the system level
Hides all parallelization complexity behind a simplified API
A prominent example: MapReduce + In Situ

5 Challenges
Hard to Adapt MR to the In-Situ Environment
MR is not designed for in-situ analytics
4 Mismatches:
Data Loading Mismatch
Programming View Mismatch
Memory Constraint Mismatch
Programming Language Mismatch

6 System Overview
In-Situ System = Shared-Memory System + Combination = Distributed System − Partitioning
Equivalently: Distributed System = Partitioning + Shared-Memory System + Combination
This is because the input data is already partitioned automatically by the simulation code.
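The identity above can be illustrated with a minimal sketch (hypothetical code, not the actual middleware API): because each simulation thread already owns its partition of the data, the runtime needs only a per-thread local reduction plus a final combination step, with no partitioning phase.

```python
import threading

# Hypothetical sketch of "in-situ = shared-memory reduction + combination":
# each thread already owns its slice of simulation data (no partitioning
# step), reduces it locally, and the results are combined at the end.

def in_situ_reduce(partitions, local_reduce, combine):
    results = [None] * len(partitions)

    def worker(i, part):
        results[i] = local_reduce(part)  # per-thread map + local reduce

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(partitions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    out = results[0]
    for r in results[1:]:                # combination phase
        out = combine(out, r)
    return out

# Example: a global sum over data each "simulation thread" already holds.
total = in_situ_reduce([[1, 2], [3, 4], [5]], sum, lambda a, b: a + b)
print(total)  # 15
```

The same structure, minus the combination step across nodes, is what a distributed MapReduce runtime would do after an explicit partitioning phase.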

7 Two In-Situ Modes
Space Sharing Mode: enhances resource utilization when the simulation reaches its scalability bottleneck
Time Sharing Mode: minimizes memory consumption

8 Smart vs. Spark
To Make a Fair Comparison:
Bypass the programming view mismatch: run on an 8-core node (multi-threaded but not distributed)
Bypass the memory constraint mismatch: use a simulation emulator that consumes little memory
Bypass the programming language mismatch: rewrite the simulation in Java and compare only computation time
Setup: 40 GB input, 0.5 GB per time step
Results: Smart outperforms Spark by 62x on K-Means and 92x on Histogram

9 Smart vs. Low-Level Implementations
Setup:
Smart: time sharing mode; Low-Level: OpenMP + MPI
Apps: K-Means and Logistic Regression
1 TB input on 8–64 nodes
Programmability:
55% and 69% of the parallel code is either eliminated or converted into sequential code
Performance:
Up to 9% extra overhead for K-Means: the low-level implementation can store reduction objects in contiguous arrays, whereas Smart stores data in map structures and requires extra serialization for each synchronization
Nearly unnoticeable overhead for Logistic Regression: only a single key-value pair is created
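The overhead difference above can be sketched in a few lines (illustrative code, not Smart's actual implementation): a map-structured reduction object pays hashing and serialization costs per update, while contiguous arrays do not.

```python
# Sketch (illustrative, not Smart's actual code) of why map-structured
# reduction objects cost more than contiguous arrays for k-means:
# every point updates the entry for its nearest centroid (1-D here).

def kmeans_step_map(points, centroids):
    # Map-structured reduction object: {cluster_id: (sum, count)}.
    red = {}
    for x in points:
        k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        s, c = red.get(k, (0.0, 0))
        red[k] = (s + x, c + 1)
    return red

def kmeans_step_array(points, centroids):
    # Contiguous arrays: no hashing, and trivially serializable for the
    # per-synchronization exchange mentioned above.
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in points:
        k = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[k] += x
        counts[k] += 1
    return sums, counts

pts, cents = [1.0, 2.0, 9.0, 10.0], [0.0, 8.0]
print(kmeans_step_map(pts, cents))    # {0: (3.0, 2), 1: (19.0, 2)}
print(kmeans_step_array(pts, cents))  # ([3.0, 19.0], [2, 2])
```

For logistic regression the reduction object is a single key-value pair (one gradient accumulator), so the map-versus-array difference nearly vanishes.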

10 Tomography at the Advanced Photon Source
Data deluge is happening in almost all science domains. Light-source facilities in the US now generate several TBs of data per day, and this volume is poised to increase by two orders of magnitude in the next few years. The figure shows the tomography system installed at APS: a sample (inserted by a robot for high-throughput operation) is rotated while synchrotron radiation passes through it, and images are captured by a CCD camera. The current camera-storage bus allows data to be transferred to disk at 100 frames per second (fps), corresponding to 820 MB/s or 67.6 TB/day. Scientific discovery at experimental facilities such as light sources is often limited by computational and computing capabilities. EuroPar'15

11 Tomographic Image Reconstruction
Analysis of tomographic datasets is challenging:
Long image reconstruction/analysis time, e.g., 12 GB of data takes 12 hours on 24 cores
Different reconstruction algorithms imply longer computation times
Input dataset < output dataset: 73 MB vs. 476 MB
Parallelization using MATE+, the predecessor of the Smart system
Beam time at light-source facilities is a precious commodity, booked months in advance and typically for only a relatively short period (a day or perhaps a week). The data need to be analyzed rapidly so that results from one experiment can guide selection of the next, or even influence the course of a single experiment. Otherwise, users collect as much as they can in the given time: sometimes the data is not useful, and sometimes they collect more data than they need. Iterative reconstruction algorithms are computationally intensive, and the output of the reconstruction is larger than the input. EuroPar'15

12 Mapping to a MapReduce-like API
Inputs:
IS: assigned projection slices
Recon: reconstruction object
dist: subsetting distance
Output:
Recon: final reconstruction object

/* (Partial) iteration i */
For each assigned projection slice, is, in IS {
  IR = GetOrderedRaySubset(is, i, dist);
  For each ray, ir, in IR {
    (k, off, val) = LocalRecon(ir, Recon(is));
    ReconRep(k) = Reduce(ReconRep(k), off, val);
  }
}
/* Combine updated replicas */
Recon = PartialCombination(ReconRep)
/* Exchange and update adjacent slices */
Recon = GlobalCombination(Recon)

[Figure: within iteration i, each thread on each node runs a local reconstruction phase on its replica ReconRep, a partial combination phase merges replicas within a node, and a global combination phase across nodes turns Recon[i-1] and the projections into Recon[i].]

Tomography datasets are stored using a scientific data format such as HDF5 [6]. Before beginning reconstruction, our middleware reads metadata from the input dataset and allocates the resources required for the output dataset. Our middleware eliminates race conditions by creating a replica of the output object for each thread, which we refer to as ReconRep in Fig. 3(a). Local reconstruction corresponds to the map phase of the MapReduce processing structure: the user implements the reconstruction algorithm in the local reconstruction function (LocalRecon in Fig. 3(a)), and our middleware applies this function to each assigned data chunk. Each data chunk can be a set of slices (for per-slice parallelization) or a subset of rays in a slice (for in-slice parallelization). After all rays in the assigned data chunks are processed, the threads working on the same slices synchronize and combine their replicas using a user-defined function. EuroPar'15
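The per-thread replica pattern described above can be sketched in Python (illustrative only; `local_recon` here is a placeholder for the user's reconstruction kernel, not a real algorithm):

```python
# Sketch of the thread-private replica pattern (not the actual MATE+ code):
# each thread accumulates into its own copy of the reconstruction object,
# eliminating races, and replicas are summed in a PartialCombination step.

def local_recon(ray, recon_len):
    # Placeholder for the user's reconstruction kernel: returns an
    # (offset, value) update for the reconstruction object.
    return ray % recon_len, 1.0

def partial_combination(replicas):
    # Element-wise sum of all thread-private replicas.
    combined = [0.0] * len(replicas[0])
    for rep in replicas:
        for i, v in enumerate(rep):
            combined[i] += v
    return combined

def reconstruct(ray_chunks, recon_len=4):
    replicas = []
    for chunk in ray_chunks:          # one chunk per thread
        rep = [0.0] * recon_len       # thread-private ReconRep
        for ray in chunk:
            off, val = local_recon(ray, recon_len)
            rep[off] += val           # race-free: only this thread writes rep
        replicas.append(rep)
    return partial_combination(replicas)

print(reconstruct([[0, 1, 4], [2, 5]]))  # [2.0, 2.0, 1.0, 0.0]
```

In the real middleware, a further GlobalCombination would exchange and update adjacent slices across nodes; here the node-local phase is enough to show why the replicas make the updates race-free.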

13 In Situ Analysis
How do we decide what data to save? This analysis cannot take too much time or memory:
Simulations already consume most available memory
Scientists cannot accept much slowdown for analytics
How can insights be obtained in situ?
Must be memory- and time-efficient
What representation should be used for data stored on disk?
Effective for analysis/visualization
Disk/network efficient

14 Specific Issues
Bitmaps as Data Summarization
Utilize extra compute power for data reduction
Save memory usage, disk I/O, and network transfer time
In-Situ Data Reduction
Generate bitmaps in situ
Bitmap generation is time-consuming
Bitmaps before compression have a large memory cost
In-Situ Data Analysis
Time-step selection
Can bitmaps support time-step selection?
Efficiency of time-step selection using bitmaps
Offline Analysis
Only keep bitmaps instead of the data
Types of analysis supported by bitmaps
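The idea of bitmaps as a summary can be shown with a minimal sketch (not the paper's implementation, which would use compressed bitmap indexes): bin each value and keep one bitvector per bin, so value-based queries run on the bitmaps instead of the raw data.

```python
# Minimal sketch of bitmap indexing as a data summary: bin each value,
# keep one bitvector per bin (a Python int used as a bitvector here).
# Real systems compress these bitmaps (e.g., run-length encoding).

def build_bitmaps(values, edges):
    # edges define bins [edges[b], edges[b+1]); one bitmap per bin.
    nbins = len(edges) - 1
    bitmaps = [0] * nbins
    for pos, v in enumerate(values):
        for b in range(nbins):
            if edges[b] <= v < edges[b + 1]:
                bitmaps[b] |= 1 << pos   # set the bit for this position
                break
    return bitmaps

def query(bitmaps, bins):
    # OR together the bitmaps of the requested bins (value-range query).
    out = 0
    for b in bins:
        out |= bitmaps[b]
    return out

vals = [0.1, 0.7, 0.4, 0.9]
bm = build_bitmaps(vals, [0.0, 0.5, 1.0])
# Positions with value in [0.0, 0.5) are indices 0 and 2:
print(bin(bm[0]))  # 0b101
```

Once the raw data is discarded, any analysis expressible over bin membership (subsetting, histograms, region queries) can still run on the bitmaps alone.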

15 Time-Step Selection

16 Efficiency Comparison for In-Situ Analysis – MIC
MIC: more cores, lower bandwidth
Full Data (original): huge data-writing time
Bitmaps: good scalability of both bitmap generation and time-step selection using bitmaps; much smaller data-writing time
Overall speedup: 0.81x to 3.28x
Setup: simulation Heat3D; processor: MIC; select 25 out of 100 time steps; 1.6 GB per time step (200×1000×1000); metric: conditional entropy
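The conditional-entropy metric used for time-step selection can be sketched as follows (an illustrative estimator over bin assignments, which can be read off the bitmaps; not the paper's exact formulation): a high value of H(curr | prev) means the new time step carries information not predictable from the previous one, making it worth saving.

```python
import math

# Illustrative sketch: estimate H(curr | prev) from the joint histogram
# of per-point bin ids at two time steps. Zero means the new step adds
# nothing beyond the previous one; larger values mean more new information.

def conditional_entropy(prev_bins, curr_bins):
    n = len(prev_bins)
    joint, marg = {}, {}
    for a, b in zip(prev_bins, curr_bins):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        marg[a] = marg.get(a, 0) + 1
    h = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n                 # joint probability P(prev=a, curr=b)
        p_b_given_a = c / marg[a]    # conditional probability P(curr=b | prev=a)
        h -= p_ab * math.log2(p_b_given_a)
    return h

# Identical bin assignments -> zero conditional entropy (nothing new):
print(conditional_entropy([0, 1, 0, 1], [0, 1, 0, 1]))  # 0.0
# Bins reshuffled independently -> positive conditional entropy:
print(conditional_entropy([0, 0, 1, 1], [0, 1, 0, 1]) > 0)  # True
```

Selecting the 25 of 100 time steps that maximize such a score keeps the steps that differ most from what was already saved.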

