Computer Aided Hand Tuning
Antoine Monsifrot, François Bodin
CAPS Team
June 2001


2 Overview
– Why CBR-driven code tuning?
– Approach
– System overview
– Tuning cases
– Examples
– Conclusion

3 Introduction
Execution speed depends
– on the code structure
– on the processor architecture
Compiler optimizations frequently fail because compilers
– are unable to analyze the programs (aliasing, ...)
– must preserve program semantics
– have little knowledge of the application or the target architecture
– ignore most existing libraries

4 CBR-Driven Code Tuning?
Case-based reasoning (CBR)
– no knowledge formalization needed
– four main operations: identification, retrieval, reuse, retention
Defining a tuning case
– abstracting loop performance properties
User interaction
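To make the retrieval and retention steps concrete, here is a minimal C sketch of a generic case base. It is not CAHT's implementation. It assumes each case is a fixed-length vector of numeric indices paired with the name of a transformation, and it uses nearest-neighbour retrieval with a squared Euclidean distance, which is a common CBR choice. The slides do not say CAHT measures similarity this way; slide 9 suggests every case is checked instead. The names `Case`, `CaseBase`, `cbr_retrieve` and `cbr_retain`, and the sizes `N_INDICES` and `MAX_CASES`, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_INDICES 4   /* hypothetical number of indices per case */
#define MAX_CASES 64  /* hypothetical capacity of the case base */

/* A stored case: an index vector and the transformation that worked for it. */
typedef struct {
    double indices[N_INDICES];
    const char *transformation;
} Case;

typedef struct {
    Case cases[MAX_CASES];
    size_t count;
} CaseBase;

/* Squared Euclidean distance between two index vectors. */
static double distance2(const double *a, const double *b) {
    double d = 0.0;
    for (size_t i = 0; i < N_INDICES; i++) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

/* Retrieval: return the stored case closest to the identified indices,
   or NULL when the case base is empty. */
const Case *cbr_retrieve(const CaseBase *cb, const double *query) {
    const Case *best = NULL;
    double best_d = 0.0;
    for (size_t i = 0; i < cb->count; i++) {
        double d = distance2(cb->cases[i].indices, query);
        if (best == NULL || d < best_d) {
            best = &cb->cases[i];
            best_d = d;
        }
    }
    return best;
}

/* Retention: store a new case once its transformation has proved useful. */
void cbr_retain(CaseBase *cb, const double *indices, const char *transformation) {
    assert(cb->count < MAX_CASES);
    Case *c = &cb->cases[cb->count++];
    for (size_t i = 0; i < N_INDICES; i++)
        c->indices[i] = indices[i];
    c->transformation = transformation;
}
```

Identification (computing the indices of a loop nest) and reuse (applying the retrieved transformation, possibly with user interaction) stay outside this sketch.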

5 System Overview

6 A Tuning Case
– A goal and a target machine
– A program transformation
– A set of indices
  – data about the code that indicate the optimization opportunity
  – an abstraction of code properties
High probability of recognizing a code structure we know how to optimize
– whereas compilers need to be conservative
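One way to hold the parts of a tuning case listed above is a plain C struct. The following is a sketch under simplifying assumptions. It treats indices as boolean properties packed into a bitmask, although some real indices, such as strides or nest depth, are numeric. A case applies when a loop nest shows every required index. The `IDX_*` flags, `TuningCase` and `case_matches` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boolean indices, one bit each. */
enum {
    IDX_AFFINE_LOOP     = 1u << 0,
    IDX_ROW_ACCESSES    = 1u << 1,
    IDX_COLUMN_ACCESSES = 1u << 2,
    IDX_NONNEG_DEPS     = 1u << 3,
    IDX_PERFECT_NEST    = 1u << 4
};

typedef struct {
    const char *goal;            /* e.g. "reduce TLB misses" */
    const char *target_machine;  /* e.g. "SGI Onyx" */
    const char *transformation;  /* e.g. "tiling" */
    unsigned required;           /* indices that must all be present */
} TuningCase;

/* A case is applicable when the loop nest exhibits every required index. */
bool case_matches(const TuningCase *tc, unsigned loop_indices) {
    assert(tc != NULL);
    return (loop_indices & tc->required) == tc->required;
}
```

Requiring all indices, rather than scoring a partial match, keeps the decision easy to explain to the user. The price is that a loop nest missing a single index is never proposed for that case.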

7 Abstract Performance Indices
Based on the code properties that determine execution time
– data locality
– parallelism
– floating-point operations
– libraries
Abstractions
– data accesses
– data dependencies
– arithmetic expressions
– code patterns

8 Static Indices
– Loop nest structure: depth, gotos, function calls
– Array accesses: access strides
– Expression patterns: div/div, power, sparse accesses, ...
– Loop patterns: BLAS, LU, Jacobi, SOR
– Parallelism: data dependencies
Dynamic Indices
– Execution time and frequency: etime, tcov
Example loop nest:

      do k = 1,npts
        do j = 2,npts
          a(j,k) = a(j-1,k) + a(j,k)**2
          if (a(j,k) .eq. 0) then
            goto 4
          endif
          a(j,k) = a(j,k) + 1
    4     a(j,k) = a(j-1,k) / a(j,k)
        enddo
      enddo
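The "access strides" index can be computed from affine subscripts. Fortran stores arrays in column-major order, so one step in dimension d moves past the product of the extents of all earlier dimensions. The C sketch below assumes each subscript is affine in the loop variable, with coefficient `coeff[d]` in dimension d. It also assumes the declared extents are known. The function name `access_stride` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stride, in array elements, of a column-major (Fortran) array access
   with respect to one loop variable v.
   coeff[d]  : coefficient of v in the subscript of dimension d
   extent[d] : declared size of dimension d */
long access_stride(size_t ndims, const long *coeff, const long *extent) {
    long stride = 0;
    long dim_size = 1; /* elements skipped by one step in dimension d */
    for (size_t d = 0; d < ndims; d++) {
        assert(extent[d] > 0);
        stride += coeff[d] * dim_size;
        dim_size *= extent[d];
    }
    return stride;
}
```

Suppose the example array is declared `a(npts,npts)`; the slide does not show the declaration. Then `a(j,k)` and `a(j-1,k)` have coefficient vector {1, 0} for the inner loop variable `j`, which gives unit stride. For the outer variable `k` the vector is {0, 1}, which gives stride `npts`.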

9 Computing Cases
For each loop, all cases are checked:

      char *ComputeCase1(Indices[]){ …}
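The checking step can be sketched in C as a table of case functions applied to every loop nest. The slide's `ComputeCase1` returns a `char *`. Here each function returns a `const char *` piece of advice, or `NULL` when the case does not apply. The `Indices` fields, the two rules and all function names are hypothetical illustrations, not CAHT's actual cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical index record for one loop nest. */
typedef struct {
    int depth;           /* loop nest depth */
    int has_goto;        /* 1 if the body contains a goto */
    int unit_stride;     /* 1 if the innermost accesses are unit stride */
    int parallel_outer;  /* 1 if the outer loop carries no dependence */
} Indices;

/* Each case returns an advice string when it applies, NULL otherwise. */
typedef const char *(*CaseFn)(const Indices *ix);

static const char *case_parallel(const Indices *ix) {
    return (ix->parallel_outer && !ix->has_goto) ? "parallelize the outer loop" : NULL;
}

static const char *case_tiling(const Indices *ix) {
    /* Illustrative rule: a nest of depth >= 2 with non-unit strides. */
    return (ix->depth >= 2 && !ix->unit_stride) ? "consider tiling" : NULL;
}

static const CaseFn all_cases[] = { case_parallel, case_tiling };

/* Check every case against one loop nest; print each hit and count them. */
int check_loop_nest(const char *name, const Indices *ix) {
    assert(ix != NULL);
    int hits = 0;
    for (size_t i = 0; i < sizeof all_cases / sizeof all_cases[0]; i++) {
        const char *advice = all_cases[i](ix);
        if (advice != NULL) {
            printf("%s: %s\n", name, advice);
            hits++;
        }
    }
    return hits;
}
```

With a table of function pointers, adding a case means writing one function and appending it to `all_cases`. The driver itself does not change.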

10 Tiling for TLB: Case Example
Indices: affine loop, row array accesses, column array accesses
– + no negative component in dependence vectors → tiling
– + uniform dependencies → skewing + tiling
– + no perfect loop nest, large body → distribution + tiling
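A matrix transpose is a simple C illustration of the access pattern this case targets: one array is read along rows while the other is written along columns. Transposition carries no dependences, so tiling is always legal for it. In the general case the legality conditions listed above still apply. The tile size `b` is a free parameter, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose an n x n row-major matrix: dst[j][i] = src[i][j].
   dst is written with stride n, so for large n nearly every
   inner iteration touches a different page. */
void transpose_naive(size_t n, const double *src, double *dst) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Same computation with both loops tiled by b: each b x b tile
   touches at most b rows of src and b rows of dst, so fewer
   distinct pages (TLB entries) are in use at a time. */
void transpose_tiled(size_t n, size_t b, const double *src, double *dst) {
    assert(b > 0);
    for (size_t ii = 0; ii < n; ii += b)
        for (size_t jj = 0; jj < n; jj += b)
            for (size_t i = ii; i < ii + b && i < n; i++)
                for (size_t j = jj; j < jj + b && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```

The `i < n` and `j < n` guards handle matrices whose size is not a multiple of the tile size. A good value of `b` depends on the page size and the number of TLB entries of the target machine.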

11 Loop Benchmark

      DO 3200 I = 1,NSIZE2
        DO 3170 J = 1,NSIZE1
          IF (B2(J,I) .EQ. 0.0) GO TO 3130
          A2(J,I) = C2(J,I)*B2(J,I)
          GO TO 3170
 3130     CONTINUE
          B2(J,I) = C2(J,I)*A2(J,I)
 3170   CONTINUE
 3200 CONTINUE

3.3 Mflops vs. 54.1 Mflops

64 loop nests
– 44 are compiler friendly
– 40 are improved by KAP
– 13 do not exhibit a case
– 12 exhibit a case:
  – 5 parallel loops not parallelized by KAP
  – 1 sorted else if
  – 1 condition on the loop index
  – 3 loop nests with loops to merge
  – 2 matrix multiplies
http://www.netlib.org/benchmark/parallel

12 An Application Example: DeFT
– A real application: Gaussian Density Functional Program
– 75,863 lines of Fortran code (comments included)
– Two main routines:
  – gridwork: 47.5%, 1015 lines
  – x_annihilate: 29.7%, 269 lines
http://www.ccl.net/cca/software/SOURCES/FORTRAN/DeFT/index.shtml

13 DeFT Examples
Examples of cases found:

      do 1029 k = 1,n
        ...
        do 1029 j = istart(myid+1),iend(myid+1)
          do 1029 i = 1,n
 1029       overlap(i,j) = overlap(i,j) + coeff(i,k)*coeff(j,k)

Matrix multiplication (BLAS), timed on a 4-processor SGI Onyx:
– Sequential: 121 s
– KAP: 140 s
– CAHT: 85 s

      do 1012 i=1,ihits
        ii=iwkvec(i)
        …...
        do 1012 j=1,ihits
          jj=iwkvec(j)
          …...
          do 1015 k=1,npts
 1015       wf(k,ii)=wf(k,ii)+factor*fv(k,jj)
          if((nfunctional.gt.0).and.(ipart.eq.0)) then
            do 1016 k=1,npts
              wfx(k,ii)=wfx(k,ii)+factor*fvx(k,jj)
              wfy(k,ii)=wfy(k,ii)+factor*fvy(k,jj)
 1016         wfz(k,ii)=wfz(k,ii)+factor*fvz(k,jj)
          endif
 1012 continue

      do 1011 k=istart(myid+1),iend(myid+1)
 1011   veci(k)=coeff(k,i)
      do 1012 k=istart(myid+1),iend(myid+1)
 1012   vecj(k)=coeff(k,j)
      do 1013 k=istart(myid+1),iend(myid+1)
 1013   coeff(k,i)=coeff(k,i)+s(i)*(vecj(k)-tau(i)*veci(k))
      do 1014 k=istart(myid+1),iend(myid+1)
 1014   coeff(k,j)=coeff(k,j)-s(i)*(veci(k)+tau(i)*vecj(k))
      do 1015 k=istart(myid+1),iend(myid+1)
 1015   veci(k)=smat(k,i)
      do 1016 k=istart(myid+1),iend(myid+1)
 1016   vecj(k)=smat(k,j)

Cases shown: fusion, parallel loop
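Fusion, one of the cases named on this slide, merges consecutive loops with identical bounds into a single loop. The C sketch below is loosely modeled on the DeFT loops labeled 1011 and 1012. Those loops run over the same range `k=istart(myid+1),iend(myid+1)` and copy columns of `coeff` into `veci` and `vecj`. The sketch makes three assumptions: the columns are passed as separate pointers, the range is half-open `[lo, hi)` rather than Fortran's inclusive one, and the arrays do not overlap. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Before: two loops with identical bounds, each copying one column. */
void copy_separate(size_t lo, size_t hi, const double *col_i,
                   const double *col_j, double *veci, double *vecj) {
    for (size_t k = lo; k < hi; k++)
        veci[k] = col_i[k];
    for (size_t k = lo; k < hi; k++)
        vecj[k] = col_j[k];
}

/* After fusion: a single pass over k. Legal here because, assuming the
   arrays do not overlap, neither body reads what the other writes. */
void copy_fused(size_t lo, size_t hi, const double *col_i,
                const double *col_j, double *veci, double *vecj) {
    assert(lo <= hi);
    for (size_t k = lo; k < hi; k++) {
        veci[k] = col_i[k];
        vecj[k] = col_j[k];
    }
}
```

Fusion removes one loop overhead and, in a parallel version, one synchronization point per merged loop. Fusing the later loops of the same fragment is less direct: loops 1013 and 1014 read `veci` and `vecj` and update `coeff`, so their dependences must be checked first.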

14 Conclusion
– Case-based reasoning provides a promising framework for code tuning
– Tuning the cases may be difficult:
  – taking the compiler into account (e.g., unrolling)
  – integrating dynamic data and assembly code properties
  – learning techniques for case tuning
