
1 Controlling parallelization in the IBM XL Fortran and C/C++ parallelizing compilers
Priya Unnikrishnan, IBM Toronto Lab (priyau@ca.ibm.com)
CASCON 2005

2 Overview
- Parallelization in IBM XL compilers
- Outlining
- Automatic parallelization
- Cost analysis
- Controlled parallelization
- Future work

3 Parallelization
- The IBM XL compilers support Fortran 77/90/95, C, and C++.
- They implement both OpenMP and auto-parallelization.
- Both target SMP (shared-memory parallel) machines.
- Non-threadsafe code is generated by default.
  – Use the _r invocations (xlf_r, xlc_r, ...) to generate threadsafe code.

4 Parallelization options

-qsmp=noopt    Parallelizes code with minimal optimization, to allow better debugging of OpenMP applications.
-qsmp=omp      Parallelizes code containing OpenMP directives.
-qsmp=auto     Automatically parallelizes loops.
-qsmp=noauto   No auto-parallelization; IBM and OpenMP parallel directives are still processed.
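For example (the source file name is ours), a typical invocation combining auto-parallelization with the threadsafe _r driver:

    xlc_r -O3 -qsmp=auto myprog.c -o myprog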

5 Outlining: the parallelization transformation (illustrated on the next slide).

6 Outlining

Original code:

    int main() {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

After outlining, the parallel loop becomes a runtime call plus an outlined routine (compiler pseudo-listing):

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                      @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)   /* runtime call */
        endif
        return main;
    }

    /* outlined routine */
    void main@OL@1(unsigned @LB, unsigned @UB) {
        @CIV1 = 0;
        do {
            a[(long)@LB + @CIV1] = const;
            ......
            @CIV1 = @CIV1 + 1;
        } while ((unsigned)@CIV1 < (@UB - @LB));
        return;
    }

7 SMP parallel runtime

    _xlsmpParallelDoSetup_TPO(&main@OL@1, 0, n, ...)
        main@OL@1(0, 9)
        main@OL@1(10, 19)
        main@OL@1(20, 29)
        main@OL@1(30, 39)

The outlined function is parameterized: it can be invoked for different ranges of the iteration space.
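To make the dispatch concrete, here is a minimal sketch (ours, using POSIX threads rather than the actual XL SMP runtime) of how a parameterized outlined routine can be driven over chunks of the iteration space; chunk_worker and run_parallel_do are hypothetical names. Compile with -lpthread:

    #include <pthread.h>
    #include <stdio.h>

    #define N 40
    #define NTHREADS 4

    static int a[N];

    /* stand-in for the compiler-generated outlined routine main@OL@1;
       bounds are inclusive, matching the (0,9), (10,19), ... calls above */
    static void outlined(unsigned lb, unsigned ub) {
        for (unsigned i = lb; i <= ub; i++)
            a[i] = 42;
    }

    struct chunk { unsigned lb, ub; };

    static void *chunk_worker(void *arg) {
        struct chunk *c = arg;
        outlined(c->lb, c->ub);
        return NULL;
    }

    /* stand-in for _xlsmpParallelDoSetup_TPO: split [0, n) among threads */
    static void run_parallel_do(unsigned n) {
        pthread_t tid[NTHREADS];
        struct chunk ck[NTHREADS];
        unsigned per = n / NTHREADS;
        for (unsigned t = 0; t < NTHREADS; t++) {
            ck[t].lb = t * per;
            ck[t].ub = (t == NTHREADS - 1) ? n - 1 : (t + 1) * per - 1;
            pthread_create(&tid[t], NULL, chunk_worker, &ck[t]);
        }
        for (unsigned t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
    }

    int main(void) {
        run_parallel_do(N);
        printf("a[%d] = %d\n", N - 1, a[N - 1]);
        return 0;
    }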

8 Auto-parallelization
- Integrated framework for OpenMP and auto-parallelization.
- Auto-parallelization is restricted to loops.
- Auto-parallelization is done at the link step when possible, which allows various interprocedural analyses and optimizations to run before automatic parallelization.

9 Auto-parallelization transformation

    int main() {
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

The loop is first marked as auto-parallel, then outlined exactly as before:

    int main() {
        #auto-parallel-loop
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

10 We can auto-parallelize OpenMP applications while skipping user-parallel code, which is a good thing!

    int main() {
        for (int i = 0; i < n; i++) {   /* serial loop: auto-parallel candidate */
            a[i] = const;
            ......
        }
        #pragma omp parallel for        /* user-parallel loop: skipped */
        for (int j = 0; j < n; j++) {
            b[j] = a[j];
        }
    }

After analysis, the first loop is marked #auto-parallel-loop and outlined; the second is parallelized according to the user's directive.

11 Pre-parallelization phase
- Loop normalization (normalize countable loops)
- Scalar privatization
- Array privatization
- Reduction-variable analysis
- Loop interchange (where it helps parallelization)

A small illustration of privatization and reduction follows the list.
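As an illustration (hand-written OpenMP equivalents of what scalar privatization and reduction-variable analysis infer; not compiler output):

    #include <stdio.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], sum = 0.0;
        for (int i = 0; i < N; i++)
            b[i] = (double)i;

        /* t is privatized: each thread gets its own copy, removing the
           storage conflict on the temporary across iterations */
        double t;
        #pragma omp parallel for private(t)
        for (int i = 0; i < N; i++) {
            t = b[i] * 2.0;
            a[i] = t + 1.0;
        }

        /* sum is recognized as a reduction variable: partial sums are
           accumulated per thread and combined at the end */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }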

12 Cost analysis
- Automatic parallelization applies two tests:
  – Dependence analysis: is it safe to parallelize?
  – Cost analysis: is it worthwhile to parallelize?
- Cost analysis estimates the total workload of the loop:
  LoopCost = IterationCount * ExecTimeOfLoopBody
- When the cost is known at compile time, the decision is trivial; runtime cost analysis is more complex.
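For example (the numbers are ours): for a loop for (i = 0; i < 1000; i++) with an estimated body cost of 8 time units, LoopCost = 1000 * 8 = 8000, which can be compared against the parallelization threshold entirely at compile time. If the trip count n is unknown, the expression n * 8 must instead be evaluated by an emitted runtime check, as the next slide shows.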

13 Conditional parallelization

Source loop:

    int main() {
        for (int i = 0; i < n; i++) {
            a[i] = const;
            ......
        }
    }

Transformed code, with a runtime check comparing the loop cost against a threshold; loops that are too cheap call the outlined routine serially:

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (loop_cost > threshold) {               /* runtime check */
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)
            } else {
                main@OL@1(0, 0, (unsigned)n, 0)        /* run serially */
            }
        endif
        return main;
    }

The outlined routine main@OL@1 is the same as before.

14 Runtime cost analysis challenges

Runtime checks must balance several competing requirements:
– Lightweight: they must not introduce noticeable overhead in applications that are mostly serial.
– Overflow-safe: the cost expression, e.g.
      loopcost = (((c1*n1) + (c2*n2) + const) * n3) * ...
  can overflow, leading to an incorrect, and costly, decision.
– Restricted to integer operations.
– Accurate.
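A minimal sketch (ours, assuming 64-bit unsigned arithmetic) of a saturating integer evaluation that sidesteps the overflow problem: an overflowing estimate clamps to the maximum value, which simply reads as "certainly worth parallelizing":

    #include <stdint.h>
    #include <stdio.h>

    /* saturating multiply: clamp to UINT64_MAX instead of wrapping */
    static uint64_t sat_mul(uint64_t x, uint64_t y) {
        if (x != 0 && y > UINT64_MAX / x)
            return UINT64_MAX;
        return x * y;
    }

    /* saturating add */
    static uint64_t sat_add(uint64_t x, uint64_t y) {
        return (y > UINT64_MAX - x) ? UINT64_MAX : x + y;
    }

    int main(void) {
        /* loopcost = (((c1*n1) + (c2*n2) + konst) * n3), per the slide;
           "konst" stands in for the slide's "const" (a C keyword) */
        uint64_t c1 = 12, n1 = 1u << 20, c2 = 7, n2 = 1u << 22, konst = 100;
        uint64_t n3 = 1u << 30;
        uint64_t cost = sat_mul(sat_add(sat_add(sat_mul(c1, n1),
                                                sat_mul(c2, n2)), konst), n3);
        uint64_t threshold = 10000;   /* hypothetical value */
        printf("parallelize = %d\n", cost > threshold);
        return 0;
    }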

15 Runtime dependence test (work by Peng Zhao): when static dependence analysis is inconclusive, a runtime dependence test is combined with the cost check. (The dependence condition itself is elided in the listing.)

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (/* runtime dependence test */ && loop_cost > threshold) {
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, 0)
            } else {
                main@OL@1(0, 0, (unsigned)n, 0)
            }
        endif
        return main;
    }

The source loop and the outlined routine main@OL@1 are the same as before.


17 Controlled parallelization
- Cost analysis selects big loops, but selection alone is not enough.
- Parallel performance depends on both the amount of work and the number of processors used.
- Using a large number of processors on a small loop causes huge degradations!

18 (Performance chart, measured on a 64-way Power5 system: "Small is good!")

19 Controlled parallelization
- Introduce another runtime parameter, IPT (minimum iterations per thread).
- IPT is passed to the SMP runtime.
- The SMP runtime limits the number of threads working on the parallel loop based on IPT.
- IPT = function(loop_cost, memory access info, ...)

20 Controlled parallelization

    long main() {
        @_xlsmpEntry0 = _xlsmpInitializeRTE();
        if (n > 0) then
            if (loop_cost > threshold) {
                IPT = func(loop_cost)
                _xlsmpParallelDoSetup_TPO(2208, &main@OL@1, 0, n, 5, 0,
                                          @_xlsmpEntry0, 0, 0, 0, 0, 0, IPT)   /* runtime parameter */
            } else {
                main@OL@1(0, 0, (unsigned)n, 0)
            }
        endif
        return main;
    }

The source loop and the outlined routine main@OL@1 are the same as before; IPT is passed as the last argument of the setup call.

21 SMP parallel runtime

    _xlsmpParallelDoSetup_TPO(&main@OL@1, 0, n, ..., IPT) {
        threadsUsed = IterCount / IPT;
        if (threadsUsed > threadsAvailable)
            threadsUsed = threadsAvailable;
        .....
    }
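In C, the limiting rule amounts to the following (a sketch with a hypothetical helper name, not the runtime's actual code):

    #include <stdio.h>

    /* cap the team size so each thread gets at least IPT iterations */
    static unsigned threads_for_loop(unsigned iter_count, unsigned ipt,
                                     unsigned threads_available) {
        unsigned t = iter_count / ipt;   /* at most one thread per IPT iterations */
        if (t == 0)
            t = 1;                       /* always at least one thread (serial) */
        if (t > threads_available)
            t = threads_available;
        return t;
    }

    int main(void) {
        /* 1000 iterations, at least 400 iterations per thread,
           64 threads available: only 2 threads are used */
        printf("%u\n", threads_for_loop(1000, 400, 64));
        return 0;
    }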

22 Controlled parallelization for OpenMP
- Improves performance and scalability.
- Allows fine-grained control at loop-level granularity.
- Can be applied to OpenMP loops as well: the number of threads is adjusted when the environment variable OMP_DYNAMIC is turned on (see the example below).
- There are issues with threadprivate data.
- Encouraging results on galgel.
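For reference, dynamic thread adjustment is requested through standard OpenMP, either via the environment (export OMP_DYNAMIC=TRUE) or the API; this small example is ours, not from the presentation:

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        omp_set_dynamic(1);   /* allow the runtime to choose fewer threads */
        #pragma omp parallel
        {
            #pragma omp single
            printf("team size chosen by runtime: %d\n", omp_get_num_threads());
        }
        return 0;
    }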

23 (Performance chart, measured on a 64-way Power5 system.)

24 Future work
- Improve the cost analysis algorithm and fine-tune the heuristics.
- Implement interprocedural cost analysis.
- Extend cost analysis and controlled parallelization to non-loop constructs in user-parallel code, for scalability.
- Implement interprocedural dependence analysis.

