Compiler Supported High-level Abstractions for Sparse Disk-resident Datasets
Renato Ferreira, Gagan Agrawal, Joel Saltz
Ohio State University

General Motivation
Computing is playing an increasingly significant role in a variety of scientific areas
Traditionally, the focus was on simulating scientific phenomena or processes
  Software tools motivated by various computational solvers
Recently, analysis of data is being considered key to advances in the sciences
  Data from computational simulations
  Digitized images
  Data from sensors

Challenges in Supporting Processing
Massive amounts of data are becoming common
  Data from simulations on large grids, parameter studies
  Sensors collecting high-resolution data over long periods of time
Datasets can be quite complex
Application scientists need high performance as well as ease of implementing and modifying analyses

Motivating Application: Satellite Data Processing
[Figure: satellite data chunks along the time dimension, Time[t]]
Data collected by satellites is a collection of chunks, each of which captures an irregular section of the earth at time t
The entire dataset comprises multiple pixels for each point on earth, at different times, but not for all times
Typical processing is a reduction along the time dimension, which is hard to write against the raw data format
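A minimal sketch of the kind of time-dimension reduction described above, in plain Python. The chunk layout and the max-over-time operation are hypothetical stand-ins; the slides only specify "a reduction along the time dimension":

```python
# Hypothetical layout: each chunk is a dict keyed by (lat, long), holding
# pixel values observed at one time step. The reduction keeps, for each
# ground point, the maximum value seen across all times.
def reduce_over_time(chunks):
    best = {}  # (lat, long) -> max pixel value over time
    for chunk in chunks:  # chunks may arrive in any order
        for (lat, lon), value in chunk.items():
            if (lat, lon) not in best or value > best[(lat, lon)]:
                best[(lat, lon)] = value
    return best
```

Because the combining operation is associative and commutative, chunks can be processed in whatever order they come off disk, which is exactly what makes the later execution strategies legal.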

Supporting High-level Abstractions
[Figure: AbsDomain as a 3-d array over (lat, long, time)]
View the dataset as a dense 3-d array in which many values can be zero
Simplify the specification of processing on the datasets
Challenge: how do we achieve efficient processing?
  Locality in accessing data
  Avoiding unnecessary computations

Outline
Compiler front-end
Execution strategy for irregular/sparse applications
Supporting compiler analyses
Performance enhancements
  Dense applications
  Code motion for conditionals
Experimental results
Conclusion

Programming Interface
Multi-dimensional collections
  Domain
  RectDomain
Foreach loop
  Iterates over the elements of a collection
Reduction interface
  Defines reduction variables
  Updated within the foreach
  Associative and commutative operations
  Only used for self-updates
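The reduction-variable discipline can be mimicked in plain Python. This is a sketch, not the actual interface: the class name and method are hypothetical, and the point is only that the update operation must be an associative, commutative self-update:

```python
class SumReduction:
    """Reduction variable: the only allowed update is a self-update
    through an associative, commutative operation (here, addition)."""
    def __init__(self):
        self.value = 0

    def accumulate(self, v):
        # order-independent: elements may be accumulated in any order
        self.value += v

r = SumReduction()
for v in [3, 1, 4]:
    r.accumulate(v)
```

Restricting updates this way is what lets the runtime reorder and parallelize the foreach loop freely.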

Satellite Data Processing

public class Element {
  short bands[5];
  short lat, long;
}

public class SatelliteApp {
  SatelliteData satdata;
  OutputData output;

  public static void main(String[] args) {
    Point[2] q;
    pixel val;
    RectDomain[3d] AbsDomain = ...;
    foreach (q in AbsDomain) {
      if (val = satdata.getData(q)) {
        Point[2] p = (q[1], q[2]);
        output[p].Accumulate(val);
      }
    }
  }
}

[Figure: AbsDomain over (lat, long, time)]

Sparse Execution Strategy
Iterating over AbsDomain:
  Sparse domain
  Poor locality
Iterating over input elements:
  Need to map each element to its loop iterations

Foreach element e
  I = Iters(e)
  Foreach i in I
    If (i in the Input Range)
      Perform computation for e
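The pseudocode above can be made concrete with a small Python sketch. The helper names (`iters`, `in_range`, `compute`) are hypothetical; the point is that the loop runs over the elements actually present, not over the sparse abstract domain:

```python
# Sketch of the sparse strategy: iterate over the input elements that
# exist on disk, and map each one back to the loop iterations it
# contributes to, instead of scanning the (mostly empty) abstract domain.
def sparse_execute(elements, iters, in_range, compute):
    for e in elements:        # only elements actually present
        for i in iters(e):    # iterations this element contributes to
            if in_range(i):   # skip iterations outside the query range
                compute(i, e)
```

This preserves locality (each element is touched once, in storage order) at the cost of computing the inverse mapping Iters.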

Computing the Function Iters()
Iters: element -> abstract domain
  l-value of element =
  r-value of element =
  Iters(elem) =
Find the dominating constraints for the return statements within the functions in the low-level data layout (getData)

(Chunk-wise) Dense Strategy
Exploit the regularity of the dataset
Eliminate the overhead of the sparse strategy
Simpler, more efficient implementation

Foreach input block
  Extract D (descriptor of the data)
  I = Iters(D) ∩ Input Range
  Foreach i in I
    Perform computation for Input[i]
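A runnable sketch of this chunk-wise strategy, assuming a hypothetical block layout in which each block's descriptor is a contiguous 1-d index range (the real system uses multi-dimensional descriptors):

```python
# Dense strategy sketch: intersect each block's descriptor with the
# query range once, then iterate directly over the intersection with
# no per-element membership tests.
def dense_execute(blocks, query_lo, query_hi, compute):
    for block in blocks:
        lo, hi = block["descriptor"]           # range covered by this block
        start = max(lo, query_lo)              # I = Iters(D) ∩ Input Range
        stop = min(hi, query_hi)
        for i in range(start, stop + 1):
            compute(i, block["data"][i - lo])  # dense: direct indexing
```

The intersection is computed once per block, which is where the strategy's efficiency over the sparse one comes from.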

Other Implementation Issues
Generating code for efficient execution
  ADR run-time system
Memory requirements
  Tiling of the output
Extract subscript and range functions from the user application
  Program slicing (ICS 2000)
Compiler and runtime communication analysis (PACT 2001)

Active Data Repository
Specialized run-time support for processing disk-based multi-dimensional datasets
  Push processing into the storage manager
  Asynchronous operations
Dataset is divided into blocks
  Distributed across the nodes of a parallel machine
  Spatial indexing mechanism
Customizable for a variety of applications
  Through virtual functions supplied by the compiler

Experimental Results: Sparse Application
Cluster of 400MHz Pentium II nodes running Linux
  256MB main memory
  18GB local disk
  Gigabit switch
Total data of 2.7GB; processes about 1.9GB; output of 446MB
5 to 10 times faster

Experimental Results: Dense Application
Multi-grid Virtual Microscope
  Based on VMScope
  Stores data at different resolutions
Total data of 3.3GB; processes about 3GB; output of 1.6GB
2 to 3 times faster

Improving the Performance
Virtual Microscope with subsampling
Extra conditionals:
  From the execution strategy
  From the application

for (i0 = low1; i0 <= hi1; i0++)
  for (i1 = low2; i1 <= hi2; i1++) {
    ipt[0] = i0; ipt[1] = i1;
    opt[0] = (i0-v0)/2; opt[1] = (i1-v1)/2;
    if ((tlow1 <= opt[0]) && (opt[0] <= thi1) &&
        (tlow2 <= opt[1]) && (opt[1] <= thi2))
      if ((i0 % 2 == 0) && (i1 % 2 == 0))
        O[opt].Accum(I[ipt]);
  }

Conditional Motion
Eliminate redundant conditionals
Views of a conditional:
  Syntactically different conditions
  Dominating constraints
Downward propagation
Upward propagation
Omega Library
  Generates code for a set of conditionals

Conditional Motion Example

Before:
for (i0 = low1; i0 <= hi1; i0++)
  for (i1 = low2; i1 <= hi2; i1++) {
    ipt[0] = i0; ipt[1] = i1;
    opt[0] = (i0-v0)/2; opt[1] = (i1-v1)/2;
    if ((tlow1 <= opt[0]) && (opt[0] <= thi1) &&
        (tlow2 <= opt[1]) && (opt[1] <= thi2))
      if ((i0 % 2 == 0) && (i1 % 2 == 0))
        O[opt].Accum(I[ipt]);
  }

After:
if (2*low2 <= -v1+thi2 && low2 <= v1+2*thi2)
  for (t1 = max(2*((v0+2*tlow1+1)/2), 2*((low1+1)/2)); t1 <= min(v0+2*thi1, hi1); t1 += 2)
    for (t2 = max(2*((2*tlow2+v1+1)/2), 2*((low2+1)/2)); t2 <= min(v1+2*thi2, hi2); t2 += 2)
      s1(t1, t2);
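The equivalence of the guarded loop and the hoisted loop can be checked with concrete bounds. Below is a small Python check (bounds chosen arbitrarily; it assumes v0 and v1 are even so the integer divisions are exact on even indices, and it hoists the guards into loop bounds in a simplified form rather than reproducing the Omega-generated code):

```python
def guarded(low1, hi1, low2, hi2, v0, v1, tlow1, thi1, tlow2, thi2):
    # original form: both conditionals tested on every iteration
    hits = []
    for i0 in range(low1, hi1 + 1):
        for i1 in range(low2, hi2 + 1):
            o0, o1 = (i0 - v0) // 2, (i1 - v1) // 2
            if tlow1 <= o0 <= thi1 and tlow2 <= o1 <= thi2:
                if i0 % 2 == 0 and i1 % 2 == 0:
                    hits.append((i0, i1))
    return hits

def hoisted(low1, hi1, low2, hi2, v0, v1, tlow1, thi1, tlow2, thi2):
    # guards folded into the loop bounds, with a stride of 2 for the
    # parity tests (valid when v0 and v1 are even)
    lo0 = max(low1, 2 * tlow1 + v0)
    lo0 += lo0 % 2                    # round up to even
    up0 = min(hi1, 2 * thi1 + v0)
    lo1 = max(low2, 2 * tlow2 + v1)
    lo1 += lo1 % 2
    up1 = min(hi2, 2 * thi2 + v1)
    return [(i0, i1)
            for i0 in range(lo0, up0 + 1, 2)
            for i1 in range(lo1, up1 + 1, 2)]
```

Both versions visit exactly the same iterations; the hoisted one does so without any per-iteration tests.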

Input to Omega Library

R := {[i0,i1] : low1 <= i0 <= hi1 and low2 <= i1 <= hi2
      and exists (i00 : i00*2 = i0) and exists (i11 : i11*2 = i1)};
S := {[i0,i1] : tlow1*2 + v0 <= i0 <= thi1*2 + v0
      and tlow2*2 + v1 <= i1 <= thi2*2 + v1};
U := R intersection S;
codegen U;

Conditional Motion: Results
[Chart: performance with and without conditional motion for mg-vscope with subsampling, vscope, and satellite]

Related Work
Parallelizing irregular applications
  Disk-resident datasets, different class of applications
Out-of-core compilers
  High-level abstractions; different applications, language, and runtime system
Data-centric locality transformations
  Focus on disk-resident datasets
Synthesizing sparse applications from dense ones
  Different class of applications; disk-resident datasets
Code motion techniques
  Target eliminating redundant conditionals

Conclusion
High-level abstractions simplify application development
Data-centric execution strategies help support efficient processing
The data-parallel framework is convenient for describing the applications
The choice of strategy has a substantial impact on performance

Application Loops
Loop fission techniques to create canonical loops
Program slicing techniques to extract the functions

Foreach (r ∈ R) {
  O1[SL1(r)] = F1(O1[SL1(r)], I1[SR1(r)], ..., In[SRn(r)]);
  ...
  Om[SLm(r)] = Fm(Om[SLm(r)], I1[SR1(r)], ..., In[SRn(r)]);
}

Canonical Loops
Facilitate the task of the run-time system
Left-hand-side subscript functions:
  Output collections are congruent; or
  All output collections fit in main memory
Right-hand-side subscript functions:
  Input collections are congruent

Fi(Oi, I1, I2, ..., In) = g0(Oi) op1 g1(I1) op2 g2(I2) ... opn gn(In)
where op1 to opn are commutative and associative
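A tiny Python instance of this canonical form may help. The particular functions g0, g1, g2 below are hypothetical placeholders; what matters is the shape of the update, with '+' serving as the commutative, associative operator:

```python
# Canonical-form update: Fi(Oi, I1, I2) = g0(Oi) op1 g1(I1) op2 g2(I2),
# with op1 = op2 = '+' (associative and commutative).
def update(o, i1, i2):
    g0 = lambda x: x          # applied to the current output value
    g1 = lambda x: x * x      # per-input transformations (placeholders)
    g2 = lambda x: 2 * x
    return g0(o) + g1(i1) + g2(i2)
```

Because the operators are associative and commutative, contributions from different input elements can be folded into the output in any order, which is the property the run-time system relies on.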

Program Slicing

public class VMPixel {
  char[3] colors;
  void Initialize() {
    colors[0] = colors[1] = colors[2] = 0;
  }
  void Accum(VMPixel p, int avg) {
    colors[0] += p.colors[0]/avg;
    colors[1] += p.colors[1]/avg;
    colors[2] += p.colors[2]/avg;
  }
}

public class VMPixelOut extends VMPixel implements Reducinterface;

public class VMScope {
  static Point[2] lowpoint = [0,0];
  static Point[2] hipoint = [MaxX-1, MaxY-1];
  static RectDomain[2] VMSlide = [lowpoint:hipoint];
  static VMPixel[2d] Vscope = new VMPixel[VMSlide];

  public static void main(String[] args) {
    Point[2] lowend = [args[0], args[1]];
    Point[2] hiend = [args[2], args[3]];
    RectDomain[2] querybox = [lowend:hiend];
    int subsamp = args[4];
    RectDomain[2] OutDomain = [[0,0]:(hiend-lowend)/subsamp];
    VMPixelOut[2d] Output = new VMPixelOut[OutDomain];
    Point[2] p;
    foreach (p in OutDomain)
      Output[p].Initialize();
    foreach (p in querybox) {
      Point[2] q = (p - lowend)/subsamp;
      Output[q].Accum(Vscope[p], subsamp*subsamp);
    }
  }
}