Exploiting Domain-Specific High-level Runtime Support for Parallel Code Generation Xiaogang Li Ruoming Jin Gagan Agrawal Department of Computer and Information.


1 Exploiting Domain-Specific High-level Runtime Support for Parallel Code Generation
Xiaogang Li, Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences, Ohio State University

2 Motivation
- Languages, compilers, and runtime systems for high-end computing
  - Typically focus on scientific applications
- Can commercial applications benefit?
  - A majority of top 500 parallel configurations are used as database servers
- Is there a role for parallel systems research?
  - Parallel relational databases – probably not
  - Data mining, OLAP, decision support – quite likely

3 Data Mining
- Extracting useful models or patterns from large datasets
- Includes a variety of tasks: mining associations, sequences, clustering data, building decision trees, predictive models; several algorithms proposed for each
- Both compute and data intensive
- Algorithms are well suited for parallel execution
- High-level interfaces can be useful for application development

4 Project Overview

5 Project Components
- A middleware system called FREERIDE (Framework for Rapid Implementation of Datamining Engines) (SDM 01, SDM 02)
- Performance modeling and prediction (for parallelization strategy selection) (SIGMETRICS 2002)
- Runtime and compiler support for shared memory parallelization (LCPC 02)
- Translation from mining operators (not yet)
- Focus on language and compiler support for distributed memory parallelization in this talk

6 Common Processing Structure
Structure of Common Data Mining Algorithms

    {* Outer Sequential Loop *}
    While () {
        {* Reduction Loop *}
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }

- Applies to major association mining, clustering and decision tree construction algorithms
- Parallelization approach
  - Compute local copy of reduction objects
  - Perform global reduction
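The structure above can be sketched in plain Java. This is only an illustration, not code from the talk: the class name ReductionSketch, the histogram example, and the split of the input into two partitions are assumptions made for the example. Each partition updates its own local copy of the reduction object, and the copies are then combined with the same associative and commutative operator (here, addition).

```java
import java.util.Arrays;

// Hypothetical illustration of the local-then-global reduction pattern.
class ReductionSketch {
    // Reduction loop over one partition: (i, val) = process(e); Reduc(i) = Reduc(i) op val.
    // Here process(e) is assumed to map e to bucket (e mod buckets) with value 1.
    static int[] localReduction(int[] partition, int buckets) {
        int[] reduc = new int[buckets];
        for (int e : partition) {
            int i = Math.floorMod(e, buckets);
            int val = 1;
            reduc[i] = reduc[i] + val;
        }
        return reduc;
    }

    // Global reduction: combine the local copies element by element with the same operator.
    static int[] globalReduction(int[][] localCopies, int buckets) {
        int[] result = new int[buckets];
        for (int[] local : localCopies) {
            for (int i = 0; i < buckets; i++) {
                result[i] = result[i] + local[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] partitions = { {1, 2, 3, 4}, {5, 6, 7} };
        int buckets = 3;
        int[][] locals = new int[partitions.length][];
        for (int p = 0; p < partitions.length; p++) {
            locals[p] = localReduction(partitions[p], buckets);
        }
        System.out.println(Arrays.toString(globalReduction(locals, buckets))); // prints [2, 3, 2]
    }
}
```

Because the operator is associative and commutative, the merged result does not depend on how the elements were split across partitions.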

7 Middleware Support for Distributed Memory Parallelization
- Interface requires:
  - Specification of an iterator and termination condition
  - Local reduction for each parallel loop
  - Global reduction for each loop
- Functionality
  - Fetch data elements chunk by chunk, apply local reduction
  - Broadcast the reduction object after finishing one pass on data
  - Perform global reduction, broadcast the results
  - Check termination condition, move to next iteration
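One way to picture the interface and functionality listed above is the following plain-Java sketch. It is not the FREERIDE API: the names ReductionTask, SumTask, MiddlewareSketch and run are hypothetical, and the nodes are simulated sequentially in one process, so adding a local object to a list stands in for the broadcast step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical interface: what an application would have to supply.
interface ReductionTask<E, R> {
    R newReductionObject();
    void localReduce(R local, List<E> chunk);        // applied chunk by chunk
    R globalReduce(List<R> locals);                  // combines every node's object
    boolean isDone(R global, int passesCompleted);   // termination condition
}

// Example task (an assumption for illustration): sum all integers in one pass.
class SumTask implements ReductionTask<Integer, long[]> {
    public long[] newReductionObject() { return new long[1]; }
    public void localReduce(long[] local, List<Integer> chunk) {
        for (int e : chunk) local[0] += e;
    }
    public long[] globalReduce(List<long[]> locals) {
        long[] global = new long[1];
        for (long[] local : locals) global[0] += local[0];
        return global;
    }
    public boolean isDone(long[] global, int passesCompleted) { return passesCompleted >= 1; }
}

class MiddlewareSketch {
    // nodes.get(n) holds the chunks stored on simulated node n.
    static <E, R> R run(ReductionTask<E, R> task, List<List<List<E>>> nodes) {
        int passes = 0;
        while (true) {
            List<R> locals = new ArrayList<>();
            for (List<List<E>> chunks : nodes) {
                R local = task.newReductionObject();
                for (List<E> chunk : chunks) {
                    task.localReduce(local, chunk);   // fetch a chunk, apply local reduction
                }
                locals.add(local);                    // stands in for communicating the object
            }
            R global = task.globalReduce(locals);     // perform global reduction
            passes++;
            if (task.isDone(global, passes)) {        // check termination condition
                return global;
            }
        }
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> nodes = Arrays.asList(
            Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
            Arrays.asList(Arrays.asList(4, 5)));
        System.out.println(run(new SumTask(), nodes)[0]); // prints 15
    }
}
```

An iterative algorithm such as k-means would keep its per-pass state (for example the current centers) inside the task object and update it in globalReduce; that part is left out of the sketch.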

8 Compilation Approach
- Support a general high-level language
- Use middleware functionality in compilation
- Exploit the domain-specific common structure
  - Reduction loop with associative and commutative operations
  - Disk-resident input datasets, smaller output

9 Language Support
- A data parallel dialect of Java, to give the compiler information about independent collections of objects, parallel loops and reduction operations
  - domain & rectdomain
  - foreach loop
  - reduction variables:
    - can only be updated inside a foreach loop by operations that are associative & commutative
    - intermediate value of the reduction variables may not be used within the loop, except for self-updates

10 Example code

    public class kNN {
        static buffer kbuffer;
        public static void main(String[] args) {
            double dis;
            Point lowend = …
            Point hiend = …
            Point p;
            RectDomain InputDomain = [lowend:hiend];
            kPoint[3d] Input = new kPoint[InputDomain];
            foreach (p in InputDomain) {
                if (Input[p].inRange(R)) {
                    dis = Input[p].distance(W);
                    kbuffer.insert(Input[p], dis);
                }
            }
        }
    }

11 Compilation Task
- Extract local reduction function
  - Simple from body of data parallel loop
- Extract an iterator and termination condition
  - Simple from the overall code
- Extract a global reduction function
  - Can be quite challenging in the presence of complex control flow and data structures
  - A new algorithm developed

12 Extracting Global Reduction from Local Reduction: Motivating Example

    I = k - 1;
    While (newdis = 0) {
        if (I > 0) {
            x1[I] = x1[I-1];
            x2[I] = x2[I-1];
            …
        }
        I = I - 1;
    }
    If (I < k-1) {
        x1[I+1] = kpoint.x1;
        x2[I+1] = kpoint.x2;
        …
    }

    I = k - 1;
    While (kpoint.dis = 0) {
        if (I > 0) {
            x1[I] = x1[I-1];
            x2[I] = x2[I-1];
            …
        }
        I = I - 1;
    }
    If (I < k-1) {
        x1[I+1] = kpoint.x1;
        x2[I+1] = kpoint.x2;
        …
    }

    For (j = 0; j < k; j++) {
        I = k - 1;
        While (buf.dis[j] = 0) {
            if (I > 0) {
                x1[I] = x1[I-1];
                x2[I] = x2[I-1];
                …
            }
            I = I - 1;
        }
        If (I < k-1) {
            x1[I+1] = buf.x1[j];
            x2[I+1] = buf.x2[j];
            …
        }
    }
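The idea behind the motivating example can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the generated code from the talk: the class KBuffer and its fields are hypothetical, only one coordinate (x1) is stored, and the loop condition is the sketch's own choice because the conditions in the transcript did not survive intact. The local reduction inserts one point into a buffer kept sorted by distance; the global reduction applies the same insertion to every entry of another buffer.

```java
import java.util.Arrays;

// Hypothetical k-nearest buffer, sorted by ascending distance.
class KBuffer {
    final int k;
    final double[] dis;  // distances; unused slots hold +infinity
    final double[] x1;   // one coordinate per kept point (further coordinates would be handled the same way)

    KBuffer(int k) {
        this.k = k;
        this.dis = new double[k];
        this.x1 = new double[k];
        Arrays.fill(dis, Double.POSITIVE_INFINITY);
    }

    // Local reduction: insert one input point, shifting farther entries down one slot.
    void insert(double px1, double newdis) {
        int i = k - 1;
        if (newdis >= dis[i]) {
            return;                      // not among the k nearest seen so far
        }
        while (i > 0 && newdis < dis[i - 1]) {
            dis[i] = dis[i - 1];
            x1[i] = x1[i - 1];
            i = i - 1;
        }
        dis[i] = newdis;
        x1[i] = px1;
    }

    // Global reduction: the same insertion, with the input point replaced by
    // entry j of the other buffer, repeated for every j.
    void merge(KBuffer buf) {
        for (int j = 0; j < buf.k; j++) {
            insert(buf.x1[j], buf.dis[j]);
        }
    }

    public static void main(String[] args) {
        KBuffer a = new KBuffer(2);
        a.insert(10.0, 4.0);
        a.insert(11.0, 1.0);
        KBuffer b = new KBuffer(2);
        b.insert(20.0, 2.0);
        b.insert(21.0, 9.0);
        a.merge(b);
        System.out.println(Arrays.toString(a.dis)); // prints [1.0, 2.0]
    }
}
```

Empty slots in the other buffer carry an infinite distance, so the early return in insert skips them without a separate check.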

13 Overall Approach
- Classify each assignment to a data member of the reduction object into the following types:
  - O.x = g(e), where e is the input element
  - O.x = O.x op g(e), where op is an associative and commutative operator
  - Expression involving loop constants and other members of the reduction object
- Classify control dependence on any of the above assignment statements as:
  - Loop constant
  - Non-loop constant

14 Code Generation: Handling Different Types of Assignment Statements
- Three types of assignment statements:
  - O.x = g(e) (Type a)
    - If x can represent many fields, iterate over all of them
  - O.x = O.x op g(e) (Type b)
    - Replace by O.x = O.x op O1.x
    - If x can represent many fields, iterate over all of them
  - Expression involving loop constants and other data members (Type c)
    - Keep as it is
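As a small worked case of the Type (b) and Type (c) rules, consider a reduction object that keeps a per-dimension sum, a count, and a mean. The class Stats and its method names are hypothetical, and the global reduction below was written by hand following the rules on the slide, so it should be read as an illustration of the rewrite rather than as compiler output.

```java
// Hypothetical reduction object used to illustrate the rewrite rules.
class Stats {
    final double[] sum;   // Type (b): sum[d] = sum[d] + e[d], one field per dimension
    long count;           // Type (b): count = count + 1
    final double[] mean;  // Type (c): depends only on other data members

    Stats(int dims) {
        sum = new double[dims];
        mean = new double[dims];
    }

    // Local reduction for one input element e.
    void localReduce(double[] e) {
        count = count + 1;
        for (int d = 0; d < sum.length; d++) {
            sum[d] = sum[d] + e[d];
            mean[d] = sum[d] / count;
        }
    }

    // Global reduction derived from localReduce; o1 plays the role of O1.
    void globalReduce(Stats o1) {
        count = count + o1.count;            // Type (b): g(e) replaced by O1.count
        for (int d = 0; d < sum.length; d++) {
            sum[d] = sum[d] + o1.sum[d];     // Type (b), iterated over all fields
            mean[d] = sum[d] / count;        // Type (c): kept as it is
        }
    }

    public static void main(String[] args) {
        Stats a = new Stats(2);
        a.localReduce(new double[] {1.0, 2.0});
        a.localReduce(new double[] {3.0, 4.0});
        Stats b = new Stats(2);
        b.localReduce(new double[] {5.0, 6.0});
        a.globalReduce(b);
        System.out.println(a.count + " " + a.mean[0] + " " + a.mean[1]); // prints 3 3.0 4.0
    }
}
```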

15 Handling Control Flow
- Control predicates for Type (b) assignments:
  - Remove non-loop constant control predicates
  - Keep loop constant control predicates
- Control predicates for Type (a) and Type (c) statements:
  - Keep loop constant control predicates
  - Classify non-loop constant predicates into two types:
    - Predicate involves a value that is assigned to a data member: replace that value by the data member
    - Other predicates: simply remove
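The predicate rules can be illustrated with a hand-written before-and-after pair. The class RangeStats, its fields, and the example predicates are hypothetical. The sketch also reads "replace that value by the data member" as substituting the corresponding field of the second reduction object (called o1 here), which is an interpretation of the slide rather than something it states explicitly.

```java
// Hypothetical reduction object illustrating how control predicates are handled.
class RangeStats {
    final double threshold;   // loop constant
    final boolean trackMin;   // loop constant
    long inRange;             // updated by a Type (b) assignment
    double min = Double.POSITIVE_INFINITY;  // updated by a Type (a) assignment

    RangeStats(double threshold, boolean trackMin) {
        this.threshold = threshold;
        this.trackMin = trackMin;
    }

    // Local reduction for one input element e.
    void localReduce(double e) {
        if (e < threshold) {          // non-loop-constant predicate (depends on e)
            inRange = inRange + 1;    // Type (b)
        }
        if (trackMin) {               // loop-constant predicate
            if (e < min) {            // non-loop-constant, involves the value being assigned
                min = e;              // Type (a)
            }
        }
    }

    // Global reduction written by applying the rules to localReduce.
    void globalReduce(RangeStats o1) {
        inRange = inRange + o1.inRange;  // Type (b): predicate on e removed
        if (trackMin) {                  // loop-constant predicate kept
            if (o1.min < min) {          // e replaced by the data member o1.min
                min = o1.min;
            }
        }
    }

    public static void main(String[] args) {
        RangeStats a = new RangeStats(5.0, true);
        a.localReduce(1.0);
        a.localReduce(7.0);
        a.localReduce(3.0);
        RangeStats b = new RangeStats(5.0, true);
        b.localReduce(9.0);
        b.localReduce(0.5);
        a.globalReduce(b);
        System.out.println(a.inRange + " " + a.min); // prints 3 0.5
    }
}
```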

16 Experimental Platform
Cluster of Workstations
- Sun Ultra Enterprise 450
- 250 MHz Ultra-II processors
- 1 GB of 4-way interleaved main memory
- Myrinet as the interconnect

17 Results from k-means clustering
- 1 GB dataset with 3-dimensional points
- k = 3

18 Results from Apriori Association Mining
- 3 GB dataset

19 Results from k-nearest neighbors
- 1 GB dataset with 3-dimensional points
- k = 100

20 Summary
- Focus on a new class of applications
- Exploit the common structure within the class
- Develop a runtime system supporting this structure
- Use it as a compiler target
- Very simple compiler implementation (< 1000 lines of code)
- A new algorithm for synthesizing global reduction functions
- Performance of compiler generated code is very competitive

