LCPC02 Wei Du Renato Ferreira Gagan Agrawal

Towards Compiler Support for Data Intensive Applications on Distributed Heterogeneous Resources
Wei Du, Renato Ferreira, Gagan Agrawal
Ohio State University

Motivation
· Grid environment
  · geographically distributed
  · heterogeneous resources
· Scientific and commercial data-intensive applications
  · generalized reduction operations are very common in the processing structure
· No compiler support for high-level languages for grid application development
7/26/02, Ohio State University

Software Architecture
Data Parallel Java
· assumes all data are available in a flat memory
· assumes all computation is done on a single processor
· extensions of Java:
  · domain & rectdomain
  · foreach loop
  · reduction variables
Diagram: Data Parallel Java -> Compiler Support -> Filter-stream program on DataCutter

Example code

    public class kNN {
        // Holds the K nearest points found so far.
        static buffer kbuffer;

        public static void main(String[] args) {
            double dis;
            Point<3> lowend = [0, 0, 0];
            Point<3> hiend = [Integer.parseInt(args[0]),
                              Integer.parseInt(args[1]),
                              Integer.parseInt(args[2])];
            Point<3> p;
            RectDomain<3> InputDomain = [lowend:hiend];
            kPoint[3d] Input = new kPoint[InputDomain];
            // R (query range) and W (query point) are not declared on the slide.
            foreach (p in InputDomain) {
                // Only points inside the range contribute to the reduction.
                if (Input[p].inRange(R)) {
                    dis = Input[p].distance(W);
                    kbuffer.insert(Input[p], dis);
                }
            }
        }
    }

DataCutter — Grid development tool
· ongoing project at University of Maryland / OSU (Beynon, Kurc, Sussman, Saltz et al.)
· targets distributed, heterogeneous environments
· decomposes application-specific data processing operations into a set of interacting processes
· provides a set of interfaces:
  · filter
  · stream
  · layout & placement
Diagram: filter1, filter2 and filter3 connected by stream1 and stream2
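
The filter and stream interfaces listed above can be pictured with a small, self-contained sketch. It uses plain Java threads and queues, not DataCutter's actual interfaces. The class name, the END marker, and the choice of a linear three-filter pipeline are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: plain Java threads and queues, not DataCutter's interfaces.
class FilterStreamSketch {
    // End-of-stream marker (an assumption of this sketch; inputs must not contain it).
    static final int END = Integer.MIN_VALUE;

    // filter2: forwards only even values from its input stream to its output stream.
    static void selectEven(BlockingQueue<Integer> in, BlockingQueue<Integer> out)
            throws InterruptedException {
        for (int v = in.take(); v != END; v = in.take()) {
            if (v % 2 == 0) out.put(v);
        }
        out.put(END);
    }

    // filter3: adds up everything that arrives on its input stream.
    static long sum(BlockingQueue<Integer> in) throws InterruptedException {
        long total = 0;
        for (int v = in.take(); v != END; v = in.take()) total += v;
        return total;
    }

    // Wires filter1 -> stream1 -> filter2 -> stream2 -> filter3 and runs it.
    static long runPipeline(List<Integer> input) {
        BlockingQueue<Integer> stream1 = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> stream2 = new LinkedBlockingQueue<>();
        long[] result = new long[1];

        // filter1: reads the input and writes each element to stream1.
        Thread filter1 = new Thread(() -> {
            for (int v : input) stream1.add(v);
            stream1.add(END);
        });
        Thread filter2 = new Thread(() -> {
            try {
                selectEven(stream1, stream2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread filter3 = new Thread(() -> {
            try {
                result[0] = sum(stream2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        filter1.start();
        filter2.start();
        filter3.start();
        try {
            filter1.join();
            filter2.join();
            filter3.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(Arrays.asList(1, 2, 3, 4, 5, 6)));
    }
}
```

Each filter only sees its own streams, so in principle any of them could run on a different machine; here they are threads in one process purely to keep the sketch runnable.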

Compiler Overview
Data Parallel Java
· Data Centric Transformation
· Loop Fission
· Global Reduction Analysis
· Filter Enumeration
· Granularity Selection
· Final Code Generation
Filter-Stream programming
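
The slide names loop fission as a phase but does not show how the compiler applies it. As a generic illustration of the transformation only, the sketch below splits one loop that both selects and reduces into two loops joined by an intermediate buffer. The class and method names are hypothetical; treating the buffer as a stand-in for a stream between two candidate filters is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Generic loop-fission illustration; not the compiler's actual output.
class LoopFissionSketch {
    // Original form: selection and reduction share one loop body.
    static double fused(double[] data, double threshold) {
        double sum = 0.0;
        for (double v : data) {
            if (v >= threshold) sum += v * v;
        }
        return sum;
    }

    // After fission: the first loop selects, the second loop reduces.
    // The intermediate list is where a stream could be placed.
    static double fissioned(double[] data, double threshold) {
        List<Double> selected = new ArrayList<>();
        for (double v : data) {
            if (v >= threshold) selected.add(v);
        }
        double sum = 0.0;
        for (double v : selected) sum += v * v;
        return sum;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        System.out.println(fused(data, 2.5) == fissioned(data, 2.5));
    }
}
```

Both versions visit the selected elements in the same order, so they compute the same value; what changes is that each loop can now be considered as a separate unit of work.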

Experience with a Mining Algorithm
K-Nearest Neighbors
Given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point ω = (a, b, c), we want to find the K nearest neighbors of ω within R.
Processing structure:
· Range_Query: Input[p].inRange? If not, discard
· Local reduction: Dis = … ; Kbuffer.insert(…, …)
Filters: Range_query (reads data) -> R-S stream -> Select -> S-C stream -> Combine
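
The problem statement above can be written as a sequential sketch in plain Java. This is not the Data Parallel Java source or the generated filter code. The class name, the use of a bounded max-heap as the k-buffer, and returning only distances are assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Sequential sketch of range-restricted k-nearest-neighbor search.
class KnnSketch {
    // True if p lies inside the axis-aligned box [lo, hi] (bounds inclusive).
    static boolean inRange(double[] p, double[] lo, double[] hi) {
        for (int i = 0; i < 3; i++) {
            if (p[i] < lo[i] || p[i] > hi[i]) return false;
        }
        return true;
    }

    // Euclidean distance between two 3-D points.
    static double distance(double[] p, double[] w) {
        double s = 0.0;
        for (int i = 0; i < 3; i++) {
            double d = p[i] - w[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }

    // Returns the distances of the k nearest in-range points, in ascending order.
    static List<Double> nearestDistances(double[][] points, double[] lo, double[] hi,
                                         double[] w, int k) {
        // Max-heap: the largest of the kept distances sits on top and is evicted first.
        PriorityQueue<Double> kbuffer = new PriorityQueue<>(Collections.reverseOrder());
        for (double[] p : points) {
            if (!inRange(p, lo, hi)) continue;      // range query: discard
            kbuffer.add(distance(p, w));            // local reduction: insert
            if (kbuffer.size() > k) kbuffer.poll();
        }
        List<Double> out = new ArrayList<>(kbuffer);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0, 0}, {1, 0, 0}, {2, 0, 0}, {9, 9, 9} };
        double[] lo = {0, 0, 0}, hi = {5, 5, 5}, w = {0, 0, 0};
        System.out.println(nearestDistances(points, lo, hi, w, 2));
    }
}
```

The range test and the buffer insert correspond to the two stages named on the slide, which is what makes them natural candidates for separate filters.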

Experimental Results
Comparing manual version and compiler-generated version
· Experimented on a LAN
· Data is available on 2 machines; results are on a third machine
· Dataset contains 3-D points, of size 1.2M and 12M
· K is 20
· Consider 0, 1, 4, 16 other computation-intensive jobs on machines hosting data
· Running time is in milliseconds
· Performance difference of the two versions is within 15%

Experience with Image Querying
Virtual Microscope
· Input: a digitized image, a rectangular region R, a subsampling factor
· Output: an enlarged image for the specified image region with a particular sampling factor

    querybox = [lowend, highend]
    foreach (p in querybox) {
        if (VScope[p].is_sampled(lowend, subsampling_factor)) {
            q = (p - lowend) / subsampling_factor;
            Output[q].Assign(VScope[p]);
        }
    }
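
The index arithmetic in the loop above can be checked with a plain Java sketch on a 2-D integer array. The class name is hypothetical, and the interpretation of is_sampled (a pixel is sampled when its offset from lowend is a multiple of the subsampling factor in both dimensions) is an assumption; the slide does not define it.

```java
// Sequential sketch of region subsampling on a 2-D array.
class SubsampleSketch {
    // Copies every factor-th pixel of the inclusive region
    // [lowRow..highRow] x [lowCol..highCol] into a compact output array.
    static int[][] subsample(int[][] image, int lowRow, int lowCol,
                             int highRow, int highCol, int factor) {
        int outRows = (highRow - lowRow) / factor + 1;
        int outCols = (highCol - lowCol) / factor + 1;
        int[][] out = new int[outRows][outCols];
        for (int r = lowRow; r <= highRow; r++) {
            for (int c = lowCol; c <= highCol; c++) {
                // Assumed meaning of is_sampled: offset from lowend divisible by factor.
                boolean sampled = (r - lowRow) % factor == 0 && (c - lowCol) % factor == 0;
                if (sampled) {
                    // q = (p - lowend) / subsampling_factor
                    out[(r - lowRow) / factor][(c - lowCol) / factor] = image[r][c];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] image = new int[4][4];
        for (int r = 0; r < 4; r++) {
            for (int c = 0; c < 4; c++) image[r][c] = r * 10 + c;
        }
        int[][] out = subsample(image, 0, 0, 3, 3, 2);
        System.out.println(java.util.Arrays.deepToString(out));
    }
}
```

Each output element is written at most once and depends on a single input element, so the loop iterations are independent of one another.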

Experimental Results
Comparing manual version and compiler-generated version
· Experimented on a LAN
· Data is available on a machine other than the one where the results are to be displayed
· Image is of size 800k
· Sampling factors are 4 and 16
· Consider 0, 1, 4, 16 other computation-intensive jobs on machines hosting data
· Running time is in milliseconds
· Performance difference of the two versions is within 15%

Summary and Future Work
· Aims at developing compiler support for using heterogeneous and distributed resources for processing geographically distributed datasets
· Experimented on simple data-intensive codes with simple filter generation heuristics; initial results are quite encouraging
· Future work:
  · more applications (visualization)
  · more compiler analysis (sophisticated heuristics, loop fission, data-centric code generation, …)

Thank you !!!


Data Parallel Java
· assumes all data are available in a flat memory
· assumes all computation is done on a single processor
· extensions of Java, to give the compiler information about independent collections of objects, parallel loops and reduction operations:
  · domain & rectdomain
  · foreach loop
  · reduction variables:
    · can only be updated inside a foreach loop, by operations that are associative & commutative
    · intermediate values of the reduction variables may not be used within the loop, except for self-updates
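
The two restrictions on reduction variables are what let iterations run in any order. The plain Java sketch below mimics them: the accumulator is only ever self-updated, by max and by increment, and its value is not read inside the loop. The MaxAndCount class and its methods are hypothetical names, not part of the Data Parallel Java dialect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrates why associative, commutative self-updates make iteration order irrelevant.
class ReductionVariableSketch {
    // Stand-in for a reduction variable: the only mutation is a self-update.
    static final class MaxAndCount {
        private int max = Integer.MIN_VALUE;
        private long count = 0;

        void update(int value) {
            max = Math.max(max, value); // max is associative and commutative
            count++;                    // so is addition
        }

        int max() { return max; }
        long count() { return count; }
    }

    // The loop body updates the accumulator but never reads its intermediate value.
    static MaxAndCount reduce(List<Integer> elements) {
        MaxAndCount acc = new MaxAndCount();
        for (int v : elements) acc.update(v);
        return acc;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(3, 9, 4, 1));
        MaxAndCount forward = reduce(data);
        Collections.reverse(data);
        MaxAndCount backward = reduce(data);
        // Same result regardless of the order in which elements were visited.
        System.out.println(forward.max() == backward.max()
                && forward.count() == backward.count());
    }
}
```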

Why DataCutter?
Typical DDM algorithm:
· local reduction on geographically dispersed data
· global reduction to combine the results
DataCutter features:
· decomposition of application into a set of filters
· filters are location independent
· filters interact with each other via streams
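
The local-then-global reduction pattern can be sketched in a few lines of plain Java. Here each inner array stands in for the data held at one site, and the statistic being computed (a mean, via per-site sum and count) is an arbitrary choice for illustration. All names are hypothetical.

```java
// Two-level reduction: per-site partial results, then a global combine.
class LocalGlobalReductionSketch {
    // Local reduction: a site summarizes its own data as {sum, count}.
    static long[] localReduce(int[] siteData) {
        long sum = 0;
        for (int v : siteData) sum += v;
        return new long[] { sum, siteData.length };
    }

    // Global reduction: combine the partial results from all sites.
    static long[] globalReduce(long[][] partials) {
        long sum = 0;
        long count = 0;
        for (long[] p : partials) {
            sum += p[0];
            count += p[1];
        }
        return new long[] { sum, count };
    }

    // Mean over all sites; only the small partial results cross site boundaries.
    static double mean(int[][] sites) {
        long[][] partials = new long[sites.length][];
        for (int i = 0; i < sites.length; i++) {
            partials[i] = localReduce(sites[i]);
        }
        long[] total = globalReduce(partials);
        return (double) total[0] / total[1];
    }

    public static void main(String[] args) {
        int[][] sites = { {1, 2, 3}, {4, 5} };
        System.out.println(mean(sites));
    }
}
```

In a filter-stream setting, localReduce would plausibly run in a filter placed next to each dataset and globalReduce in a filter near the consumer, though that placement is an assumption of this note rather than something the slide states.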

K-nearest neighbor search algorithm on DataCutter — A case study
Problem definition: Given a 3-D range R = <(x1, y1, z1), (x2, y2, z2)> and a point ω = (a, b, c), we want to find the K nearest neighbors of ω within R.
Solution: Range_query (reads data) -> R-S stream -> Select -> S-C stream -> Combine
